Yiyuan Zhang | 张懿元

PhD Candidate @ CUHK MMLab | ByteDance Seed

I am currently a third-year PhD candidate at MMLab, CUHK. I work on pretraining and modeling at ByteDance Seed, where I have been fortunate to contribute to frontier foundation models, including Seed 2.0, Seed 1.8, and Seed1.5-VL.

Driven by a philosophy of Simplicity, Scalability, and Systematization, my research bridges architectural innovation and large-scale pretraining to build end-to-end systems with real-world capabilities.

On the job market in 2027, open to industry and postdoc opportunities. Please feel free to reach out!

News

  • [03/2026] Two papers accepted to CVPR 2026!
  • [02/2026] We present Seed 2.0 Pro, which ranks 3rd on the Vision Arena.
  • [07/2025] Four papers accepted to ICCV 2025!
  • [06/2025] The journal extension of UniRepLKNet is accepted to IEEE TPAMI.
  • [02/2024] Four papers accepted to CVPR 2024.
  • [10/2023] Meta-Transformer was featured by MIT Technology Review.

Work Experience

ByteDance, Seed-VL Team

  • Core contributor to Doubao-1.5-Pro-Vision, Seed1.5-VL, Seed-1.8, and Seed-2.0.
  • VLM foundation model team @ ByteDance Seed.
  • Working on VLM architecture, pretraining, video understanding, and unified understanding & generation.
Seed1.5-VL
Core Contributor

Foundation VLM model focusing on advanced visual understanding.

Seed-2.0 & Seed-1.8
Core Contributor

Towards Intelligence Frontier for Real-World Complexity

Education

The Chinese University of Hong Kong

PhD, Department of Information Engineering (MMLab)
08/2023 - 06/2027 (Expected)
Hong Kong

Beijing Institute of Technology

Bachelor's degree in Artificial Intelligence
36th XuTeLi Scholarship (Top 10/8000+)
09/2019 - 06/2023
Beijing

Research Publications

Categorized by theme. Full list on Google Scholar.

Multimodal Scalability
Learning Beyond Still Frames: Scaling Vision-Language Models with Video

Yiyuan Zhang, Handong Li, Jing Liu, Xiangyu Yue

ICCV 2025. IEEE
Explores the scaling laws of integrating continuous video data into existing VLM architectures.
Scaling Omni-modal Pretraining with Multimodal Context: Advancing Universal Representation Learning Across Modalities

Yiyuan Zhang, Handong Li, Jing Liu, Xiangyu Yue

ICCV 2025. IEEE
The first scalable omni-modal pretraining scheme. Featured by MIT Technology Review.
Scaling Up Your Kernels: Large Kernel Design in ConvNets towards Universal Representations

Yiyuan Zhang, Xiaohan Ding, Xiangyu Yue

IEEE Transactions on Pattern Analysis and Machine Intelligence 2025.
Comprehensive analysis and structural design principles for ultra-large kernels in modern ConvNets.
Multimodal Simplicity
[1] Meta-Transformer: A Unified Framework for Multimodal Learning

Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Wanli Ouyang, Yu Qiao, Xiangyu Yue

arXiv preprint
We propose a unified framework for multimodal learning across 12 modalities. Featured by MIT Technology Review.
[2] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

Xiaohan Ding*, Yiyuan Zhang*, Yixiao Ge, Sijie Zhao, Lin Song, Xiangyu Yue, Ying Shan

CVPR 2024. IEEE
Going beyond Transformers, we propose a universal-perception large-kernel ConvNet for multimodal recognition.
[3] OneLLM: One Framework to Align All Modalities with Language

Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue

CVPR 2024. IEEE
We propose an elegant framework to align diverse modalities seamlessly with LLMs.
[4] OneThinker: All-in-one Reasoning Model for Image and Video

Kaituo Feng, Manyuan Zhang, Hongyu Li, Kaixuan Fan, Shuang Chen, Yilei Jiang, Dian Zheng, Peiwen Sun, Yiyuan Zhang, Haoze Sun, Yan Feng, Peng Pei, Xunliang Cai, Xiangyu Yue

CVPR 2026. IEEE
We propose a unified reasoning model for image and video understanding, which achieves state-of-the-art performance on various vision-language reasoning tasks.
Multimodal System
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong, Yixiao Ge, Ying Shan, Xiangyu Yue

CVPR 2024. IEEE
Demonstrates how irrelevant data across modalities can structurally improve transformer representations.
Asymmetric Convolution: An Efficient and Generalized Method to Fuse Feature Maps in Multiple Vision Tasks

Wencheng Han, Xingping Dong, Yiyuan Zhang, David Crandall, Cheng-Zhong Xu, Jianbing Shen

IEEE Transactions on Pattern Analysis and Machine Intelligence 2024.
A generalized feature-fusion method that accelerates multiple downstream vision tasks.
Seed1.5-VL Technical Report

Bytedance Seed Team

Technical Report
Technical report of Seed1.5-VL, which ranks 1st on the Chinese Vision Arena.
Seed2.0 Model Card: Towards Intelligence Frontier for Real-World Complexity

Bytedance Seed Team

Model Card
Model card of Seed2.0, which ranks 3rd on the international Vision Arena.

Open-Source Impact

Simple Scaling-S1

Main Contributor

6.6k stars

Meta-Transformer

Project Owner

1.7k stars

UniRepLKNet

Project Owner

1.1k stars

OneLLM

Main Contributor

665 stars

Professional Service

  • Conference Reviewer: CVPR, NeurIPS, ICML, ICLR, ICCV, ECCV, AAAI, etc.
  • Journal Reviewer: IEEE TPAMI, TIP, TNNLS, IJCV, etc.
  • Invited Talks: OpenMMLab, TechBeat, etc.