[Paper] A Full-duplex Speech Dialogue Scheme Based on Large Language Models
https://arxiv.org/abs/2405.19487
We present a generative dialogue system capable of operating in a full-duplex manner, allowing for seamless interaction. It is based on a large language model (LLM) carefully aligned to be aware of a perception module, a motor function module, and the conc..
Written after reading this paper. Abstract: full-du..
Lab Study, 2025. 4. 15.

[Paper] Cross-speaker Emotion Disentangling and Transfer for End-to-End Speech Synthesis
https://arxiv.org/abs/2109.06733
The cross-speaker emotion transfer task in text-to-speech (TTS) synthesis particularly aims to synthesize speech for a target speaker with the emotion transferred from reference speech recorded by another (source) speaker. During the emotion transfer proce..
Written after reading this paper. Ab..
Lab Study, 2025. 4. 14.

[Paper] StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings
https://arxiv.org/abs/2309.07592
Voice conversion (VC) transforms an utterance to sound like another person without changing the linguistic content. A recently proposed generative adversarial network-based VC method, StarGANv2-VC is very successful in generating natural-sounding conversio..
Written after reading this paper. Abs..
Lab Study, 2025. 4. 4.

[Paper] FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
https://arxiv.org/abs/2006.04558
Non-autoregressive text to speech (TTS) models such as FastSpeech can synthesize speech significantly faster than previous autoregressive models with comparable quality. The training of FastSpeech model relies on an autoregressive teacher model for duratio..
Written after reading this paper. Abstract: Like FastSpeech..
Lab Study, 2025. 4. 1.
[Paper] Revealing Emotional Clusters in Speaker Embeddings: A Contrastive Learning Strategy for Speech Emotion Recognition
https://arxiv.org/html/2401.11017v1
License: CC BY 4.0. arXiv:2401.11017v1 [eess.AS], 19 Jan 2024.
Abstract: Speaker embeddings carry valuable emotion-related info..
Lab Study, 2025. 3. 31.

[Paper] VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-shot Voice Conversion
https://arxiv.org/abs/2106.10132
One-shot voice conversion (VC), which performs conversion across arbitrary speakers with only a single target-speaker utterance for reference, can be effectively achieved by speech representation disentanglement. Existing work generally ..
Lab Study, 2025. 3. 26.

[Paper] Speech Emotion Diarization: Which Emotion Appears When?
https://arxiv.org/abs/2306.12991
Speech Emotion Recognition (SER) typically relies on utterance-level solutions. However, emotions conveyed through speech should be considered as discrete speech events with definite temporal boundaries, rather than attributes of the entire utterance. To r..
Written after reading this paper. Abstract: speech emotion reco..
Lab Study, 2025. 3. 12.

[Paper] Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features
https://arxiv.org/abs/2211.04710
Voice conversion for highly expressive speech is challenging. Current approaches struggle with the balancing between speaker similarity, intelligibility and expressiveness. To address this problem, we propose Expressive-VC, a novel end-to-end voice conve..
Lab Study, 2025. 3. 10.