
Jeongwooyeol


Lab Study

[Paper] MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis
https://arxiv.org/abs/2404.18398
Emotional Text-to-Speech (E-TTS) synthesis has gained significant attention in recent years due to its potential to enhance human-computer interaction. However, current E-TTS approaches often struggle to capture the complexity of human emotions, primarily..
This paper ..
Lab Study, 2024. 6. 17.

[Paper] Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
https://arxiv.org/abs/1803.09017
In this work, we propose "global style tokens" (GSTs), a bank of embeddings that are jointly trained within Tacotron, a state-of-the-art end-to-end speech synthesis system. The embeddings are trained with no explicit labels, yet learn to model a large rang..
This paper ..
Lab Study, 2024. 6. 16.

[Paper] Speech Synthesis with Mixed Emotions
https://arxiv.org/abs/2208.05890
Emotional speech synthesis aims to synthesize human voices with various emotional effects. The current studies are mostly focused on imitating an averaged style belonging to a specific emotion type. In this paper, we seek to generate speech with a mixture..
Written after reading this paper. Abstract: emotional speech synthesis ... various emotion..
Lab Study, 2024. 6. 14.

[Paper] LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
https://arxiv.org/abs/2306.03258
Lip-to-speech involves generating a natural-sounding speech synchronized with a soundless video of a person talking. Despite recent advances, current methods still cannot produce high-quality speech with high levels of intelligibility for challenging and r..
Written after reading this paper. Abstract: Lip-t..
Lab Study, 2024. 6. 13.

[Paper] Let There Be Sound: Reconstructing High Quality Speech from Silent Videos
https://arxiv.org/abs/2308.15256
The goal of this work is to reconstruct high quality speech from lip motions alone, a task also known as lip-to-speech. A key challenge of lip-to-speech systems is the one-to-many mapping caused by (1) the existence of homophenes and (2) multiple speech va..
Written after reading this paper. Abstract: This..
Lab Study, 2024. 6. 11.

[Paper] Lip-to-Speech Synthesis in the Wild with Multi-Task Learning
https://arxiv.org/abs/2302.08841
Recent studies have shown impressive performance in Lip-to-speech synthesis that aims to reconstruct speech from visual information alone. However, they have been suffering from synthesizing accurate speech in the wild, due to insufficient supervision for..
Written after reading this paper. Abstract: Recent studies ... visual ..
Lab Study, 2024. 6. 11.

[Paper] VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
https://arxiv.org/abs/2104.11178
We present a framework for learning multimodal representations from unlabeled data using convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer (VATT) takes raw signals as inputs and extracts multimodal representations t..
Written after reading this paper..
Lab Study, 2024. 6. 10.

[Paper] Multimodal Learning with Transformers: A Survey
https://arxiv.org/abs/2206.06488
Transformer is a promising neural network learner, and has achieved great success in various machine learning tasks. Thanks to the recent prevalence of multimodal applications and big data, Transformer-based multimodal learning has become a hot topic in AI..
Written after reading this paper. Multimodal Learning (MML): In the last few years..
Lab Study, 2024. 6. 5.
