[Paper] VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis
https://arxiv.org/abs/2403.08764
"We propose VLOGGER, a method for audio-driven human video generation from a single input image of a person, which builds on the success of recent generative diffusion models. Our method consists of 1) a stochastic human-to-3d-motion diffusion model, and 2)" (arxiv.org)
I wrote this post after reading the paper above. Abstract: The authors propose VLOGGER ..
Lab study
2024. 5. 27.