[Paper] VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
https://arxiv.org/abs/2104.11178

Written after reading this paper ..
Lab study
2024. 6. 10.