[Paper] Hearing Faces: Target Speaker Text-to-Speech Synthesis from a Face
https://ieeexplore.ieee.org/document/9687866
The existence of a learnable cross-modal association between a person's face and their voice is recently becoming more and more evident. This provides the basis for the task of target speaker text-to-speech (TTS) synthesis from face reference. In this pap…
Notes written after reading this paper.
Lab study
2024. 5. 19.