[Paper] Face-StyleSpeech: Improved Face-to-Voice Latent Mapping for Natural Zero-shot Speech Synthesis from a Face Image
https://arxiv.org/abs/2311.05844
Generating a voice from a face image is crucial for developing virtual humans capable of interacting using their unique voices, without relying on pre-recorded human speech. In this paper, we propose Face-StyleSpeech, a zero-shot Text-To-Speech (TTS) sy..
Lab Study
2024. 5. 15.