2025/02

[Paper] DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
https://arxiv.org/abs/2401.08095
Emotional voice conversion (EVC) involves modifying various acoustic characteristics, such as pitch and spectral envelope, to match a desired emotional state while preserving the speaker's identity. Existing EVC methods often rely on text transcript..
Lab Study | 2025. 2. 25.

[Paper] Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion
https://arxiv.org/abs/2005.07025
Emotional voice conversion aims to convert the emotion of speech from one state to another while preserving the linguistic content and speaker identity. The prior studies on emotional voice conversion are mostly carried out under the assumption that emotio..
Written after reading this paper.
Lab Study | 2025. 2. 19.

[Paper] Identity Conversion for Emotional Speakers: A Study for Disentanglement of Emotion Style and Speaker Identity
https://www.researchgate.net/publication/355496277_Identity_Conversion_for_Emotional_Speakers_A_Study_for_Disentanglement_of_Emotion_Style_and_Speaker_Identity
Written after reading this paper. Abstract: Expressive voice conversion converts speaker identity and speaker-dependent emotion style at the same time. Because of the hierarchical structure of speech emotion, disentangling the speaker-dependent emotional style is difficult. Using a variational autoencoder (VAE), speaker d..
Lab Study | 2025. 2. 17.

[Paper] Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation
https://arxiv.org/abs/2204.10020
Data augmentation via voice conversion (VC) has been successfully applied to low-resource expressive text-to-speech (TTS) when only neutral data for the target speaker are available. Although the quality of VC is crucial for this approac.. (a minimal pitch-shift augmentation sketch follows the post list below)
Lab Study | 2025. 2. 15.

[Paper] Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing
https://arxiv.org/abs/2302.10536
The primary goal of an emotional voice conversion (EVC) system is to convert the emotion of a given speech signal from one style to another without modifying the linguistic content of the signal. Most of the state-of-the-art approaches..
Lab Study | 2025. 2. 14.

[Paper] Whispered Speech Recognition Using Deep Denoising Autoencoder and Inverse Filtering
https://ieeexplore.ieee.org/document/8114355
Due to the profound differences between the acoustic characteristics of neutral and whispered speech, the performance of traditional automatic speech recognition (ASR) systems trained on neutral speech degrades significantly when whisper is applied. In order t..
Lab Study | 2025. 2. 13.
[Paper] SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
https://arxiv.org/abs/2312.08676
Zero-shot voice conversion (VC) aims to transfer the source speaker timbre to an arbitrary unseen target speaker timbre, while keeping the linguistic content unchanged. Although the voice of generated speech can be controlled by providing the speaker embeddin..
Written after reading this paper. Abstract.. (a minimal cross-attention sketch follows the post list below)
Lab Study | 2025. 2. 4.
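The pitch-shift data augmentation mentioned in the Cross-Speaker Emotion Transfer entry above can be pictured with a short sketch. This is a minimal sketch, not the paper's pipeline: it only shows the generic step of producing pitch-shifted copies of neutral recordings before any VC-based augmentation, and the directory names and semitone range are assumptions made for illustration.

# Minimal pitch-shift augmentation sketch (illustrative only, not the paper's exact pipeline).
# Assumptions: a folder of neutral WAV files; the semitone shifts (+/- 1, 2) are arbitrary choices.
import librosa
import soundfile as sf
from pathlib import Path

IN_DIR = Path("neutral_wavs")      # hypothetical input directory
OUT_DIR = Path("augmented_wavs")   # hypothetical output directory
OUT_DIR.mkdir(exist_ok=True)

for wav_path in IN_DIR.glob("*.wav"):
    y, sr = librosa.load(wav_path, sr=None)          # keep the original sampling rate
    for n_steps in (-2, -1, 1, 2):                   # shift down/up by 1 and 2 semitones
        shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)
        sf.write(OUT_DIR / f"{wav_path.stem}_shift{n_steps:+d}.wav", shifted, sr)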
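For the SEF-VC entry, the cross-attention idea can be sketched as content frames attending directly to reference mel frames instead of to a pooled speaker embedding. This is a minimal sketch under assumed dimensions and module names, not the paper's architecture; it only demonstrates torch.nn.MultiheadAttention used as cross-attention between a content sequence and a reference utterance.

# Minimal cross-attention sketch (dimensions and wiring are assumptions, not the SEF-VC design).
import torch
import torch.nn as nn

class CrossAttentionConditioner(nn.Module):
    """Content frames (queries) attend to reference mel frames (keys/values)."""
    def __init__(self, content_dim=256, ref_dim=80, num_heads=4):
        super().__init__()
        self.ref_proj = nn.Linear(ref_dim, content_dim)   # project reference mel to the model dim
        self.attn = nn.MultiheadAttention(content_dim, num_heads, batch_first=True)

    def forward(self, content, ref_mel):
        # content: (batch, T_content, content_dim); ref_mel: (batch, T_ref, ref_dim)
        kv = self.ref_proj(ref_mel)
        out, _ = self.attn(query=content, key=kv, value=kv)
        return content + out                              # residual: content plus gathered timbre info

# Usage with random tensors (shapes are assumptions for illustration)
content = torch.randn(2, 120, 256)   # e.g. frame-level content features
ref_mel = torch.randn(2, 300, 80)    # reference utterance mel-spectrogram
print(CrossAttentionConditioner()(content, ref_mel).shape)   # torch.Size([2, 120, 256])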