Jeongwooyeol

2024/05

[Paper] Face-StyleSpeech: Improved Face-to-Voice Latent Mapping for Natural Zero-shot Speech Synthesis from a Face Image
https://arxiv.org/abs/2311.05844
Generating a voice from a face image is crucial for developing virtual humans capable of interacting using their unique voices, without relying on pre-recorded human speech. In this paper, we propose Face-StyleSpeech, a zero-shot Text-To-Speech (TTS) sy..
Lab Study, 2024. 5. 15.

A Brief Explanation of Voice Conversion
Voice Conversion (VC): VC is a technique that changes the speaker of an utterance to a target speaker while preserving the linguistic content information. In the past, VC required paired data. In recent years, a variety of models that use non-parallel data have appeared. DGAN-VC is trained adversarially to separate content information from speaker information. StarGAN-VC uses a conditional input to perform many-to-many voice conversion. However, for both models, what was seen during training ..
Lab Study, 2024. 5. 13.

[Paper] One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization
https://arxiv.org/abs/1904.05742
Recently, voice conversion (VC) without parallel data has been successfully adapted to multi-target scenario in which a single model is trained to convert the input voice to many different speakers. However, such model suffers from the limitation that it c..
Lab Study, 2024. 5. 12.

[Paper] Emotion Intensity and its Control for Emotional Voice Conversion
https://arxiv.org/abs/2201.03967
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while preserving the linguistic content and speaker identity. In EVC, emotions are usually treated as discrete categories overlooking the fact that speech also conveys em..
This post was written after reading the paper above. Abstract: emotional ..
Lab Study, 2024. 5. 10.

[Paper] Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations
https://arxiv.org/abs/1804.02812
Recently, cycle-consistent adversarial network (Cycle-GAN) has been successfully applied to voice conversion to a different speaker without parallel data, although in those approaches an individual model is needed for each target speaker. In this paper,..
Lab Study, 2024. 5. 9.
