Yiyuan Zhang

I am a first-year Ph.D. student at the Multimedia Lab (MMLab) of The Chinese University of Hong Kong. My research focuses on foundation models for multimodal learning and AI for Science.

Prior to MMLab, I received my B.S. degree from Beijing Institute of Technology in 2023. I worked as a research intern at Megvii (Face++), Alibaba Group, and Shanghai AI Lab.

Email  /  Google Scholar  /  GitHub  /  Zhihu




I'm interested in Multimodal Foundation Models (architectural designs, pretraining frameworks, and generative models). I'm also curious about research topics in AI for Science. (Research collaboration is welcome; please drop me an email.)

Meta-Transformer: A Unified Framework for Multimodal Learning
Yiyuan Zhang*, Kaixiong Gong*, Kaipeng Zhang†, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue†
Preprint, 2023

A single foundation model that handles 12 modalities and supports a wide range of applications.

PDF | Project | Code | Video

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong, Yixiao Ge, Ying Shan, Xiangyu Yue†
CVPR, 2024

A flexible multimodal architecture that reveals the interactions between modalities and their benefits; an early exploration of learning from unpaired multimodal data.

PDF | Project | Code | Video

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
Xiaohan Ding*, Yiyuan Zhang*, Yixiao Ge, Sijie Zhao, Lin Song, Xiangyu Yue, Ying Shan
CVPR, 2024

The first CNN-based architecture for general-purpose multimodal learning, highlighting its universal perception ability.

PDF | Project | Code | Zhihu

OneLLM: One Framework to Align All Modalities with Language
Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue
CVPR, 2024

A simple framework to align 8 modalities with large language models.

PDF | Project | Code


  • Conference Reviewer: ECCV'22/'24, ACM MM'23/'24, NeurIPS'23, ICLR'24, CVPR'24, and ICML'24.
  • Journal Reviewer: IEEE TIP
Awards and Honors

  • 36th XuTeLi Scholarship, 19/8000+.
  • SenseTime Scholarship 2022, 30 recipients in China.
  • HUAWEI Future Star Scholarship 2021, 10 recipients at Beijing Institute of Technology.
Teaching

  • CSCI-2100B Data Structures
  • IERG-2470 Probability Models and Applications
