Jeongwooyeol

Lab Study

[Paper] Multimodal Learning with Transformers: A Survey
https://arxiv.org/abs/2206.06488
Transformer is a promising neural network learner, and has achieved great success in various machine learning tasks. Thanks to the recent prevalence of multimodal applications and big data, Transformer-based multimodal learning has become a hot topic in AI
This post was written after reading the paper. Multimodal Learning (MML): In the last few years..
Lab Study 2024. 6. 5.

[Paper] VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis
https://arxiv.org/abs/2403.08764
We propose VLOGGER, a method for audio-driven human video generation from a single input image of a person, which builds on the success of recent generative diffusion models. Our method consists of 1) a stochastic human-to-3d-motion diffusion model, and 2)
This post was written after reading the paper. Abstract: For VLOGGER, the authors..
Lab Study 2024. 5. 27.

[Paper] Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
https://arxiv.org/abs/2106.06103
Several recent end-to-end text-to-speech (TTS) models enabling single-stage training and parallel sampling have been proposed, but their sample quality does not match that of two-stage TTS systems. In this work, we present a parallel end-to-end TTS method
After reading this paper..
Lab Study 2024. 5. 23.

[Paper] Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis
https://arxiv.org/abs/2005.08209
Humans involuntarily tend to infer parts of the conversation from lip movements when the speech is absent or corrupted by external noise. In this work, we explore the task of lip to speech synthesis, i.e., learning to generate natural speech given only the
This post was written after reading the paper. Abstract: sp..
Lab Study 2024. 5. 22.

[Paper] Voice-Face Cross-modal Matching and Retrieval: A Benchmark
https://arxiv.org/abs/1911.09338
Cross-modal associations between voice and face from a person can be learnt algorithmically, which can benefit a lot of applications. The problem can be defined as voice-face matching and retrieval tasks. Much research attention has been paid on these task
This post was written after reading the paper. Abstract: Between voice and face..
Lab Study 2024. 5. 20.

[Paper] Hearing Faces: Target Speaker Text-to-Speech Synthesis from a Face
https://ieeexplore.ieee.org/document/9687866
The existence of a learnable cross-modal association between a person's face and their voice is recently becoming more and more evident. This provides the basis for the task of target speaker text-to-speech (TTS) synthesis from face reference. In this pap
Written after reading this paper..
Lab Study 2024. 5. 19.

[Paper] CLAPSpeech: Learning Prosody from Text Context with Contrastive Language-Audio Pre-training
https://arxiv.org/abs/2305.10763
Improving text representation has attracted much attention to achieve expressive text-to-speech (TTS). However, existing works only implicitly learn the prosody with masked token reconstruction tasks, which leads to low training efficiency and difficulty i
After reading this paper..
Lab Study 2024. 5. 18.

[Paper] What Does Your Face Sound Like? 3D Face Shape towards Voice
https://dl.acm.org/doi/abs/10.1609/aaai.v37i11.26628
Proceedings of the Thirty-Seventh AAAI Conference on Artificial In
Abstract: Face-based speech synthesis provides a practical solution to generate voices from human faces. However, directly using 2D face images leads to the problems of uninterpretability and entanglement. In this pape..
Lab Study 2024. 5. 17.
