[Paper] CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training
https://arxiv.org/abs/2305.10763
Improving text representation has attracted much attention to achieve expressive text-to-speech (TTS). However, existing works only implicitly learn the prosody with masked token reconstruction tasks, which leads to low training efficiency and difficulty i...
After reading this paper..
Lab study
2024. 7. 1.