2025/04

[Paper] Cross-speaker Emotion Disentangling and Transfer for End-to-End Speech Synthesis
https://arxiv.org/abs/2109.06733
"The cross-speaker emotion transfer task in text-to-speech (TTS) synthesis particularly aims to synthesize speech for a target speaker with the emotion transferred from reference speech recorded by another (source) speaker. During the emotion transfer proce…"
This post was written based on the paper above. (Lab Study, 2025. 4. 14.)

[Paper] StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings
https://arxiv.org/abs/2309.07592
"Voice conversion (VC) transforms an utterance to sound like another person without changing the linguistic content. A recently proposed generative adversarial network-based VC method, StarGANv2-VC, is very successful in generating natural-sounding conversio…"
This post was written based on the paper above. (Lab Study, 2025. 4. 4.)

[Paper] FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
https://arxiv.org/abs/2006.04558
"Non-autoregressive text to speech (TTS) models such as FastSpeech can synthesize speech significantly faster than previous autoregressive models with comparable quality. The training of FastSpeech model relies on an autoregressive teacher model for duratio…"
This post was written based on the paper above. (Lab Study, 2025. 4. 1.)