Meta-Transformer: A Unified Framework for Multimodal Learning
Yiyuan Zhang*,
Kaixiong Gong*, Kaipeng Zhang†, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue†
Preprint, 2023
A single foundation model handles 12 modalities and supports a wide range of applications.
PDF | Project |
Code | Video
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Yiyuan Zhang,
Xiaohan Ding, Kaixiong Gong, Yixiao Ge, Ying Shan, Xiangyu Yue†
CVPR, 2024
A flexible multimodal architecture that reveals cross-modal interactions and their benefits; an early exploration of learning from unpaired multimodal data.
PDF | Project |
Code | Video
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
Xiaohan Ding*, Yiyuan Zhang*,
Yixiao Ge, Sijie Zhao, Lin Song, Xiangyu Yue, Ying Shan
CVPR, 2024
The first CNN-based architecture for general-purpose multimodal learning, highlighting its universal perception ability.
PDF | Project |
Code | Zhihu
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han, Kaixiong Gong, Yiyuan Zhang,
Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue
CVPR, 2024
A simple framework to align 8 modalities with large language models.
PDF | Project |
Code
Services
Conference Reviewer: ECCV'22, '24; ACM MM'23, '24; NeurIPS'23; ICLR'24; CVPR'24; ICML'24.
Journal Reviewer: IEEE TIP