About Me
I am a Research Engineer at Alibaba DAMO Academy. I lead a small (but great) team building multimodal foundation models that perceive, reason, and act in the physical world.
Research Interests
- Multimodal LLMs for Robot Perception & Manipulation
- Generalist Vision-Language-Action Models
- Action Policies Beyond 2D Vision
Education
Experience
- Mar 2021-present, Research Engineer, Alibaba DAMO Academy.
- Jul 2020-Feb 2021, Research Intern, Huawei Noah’s Ark Lab. Mentor: Dr. Yi Liao.
- Jan 2018-Apr 2018, Research Intern, Tencent AI Lab. Mentors: Dr. Lidong Bing and Dr. Piji Li.
- Jul 2015-Jun 2016, Research Intern, Microsoft Research Asia. Mentors: Dr. Chin-Yew Lin and Dr. Jing Liu.
- Sep 2014-Jul 2015, Research Assistant, SentiNet Group, Sun Yat-Sen University. Supervisor: Prof. Rao Yanghui.
Publications
- RynnBrain: Open Embodied Foundation Models
RynnBrain Team.
[code (RynnScale)][code (eval & infer)][checkpoints & demos]
- RynnVLA-002: A Unified Vision-Language-Action and World Model
Jun Cen*, Siteng Huang*, Yuqian Yuan*, Kehan Li*, Hangjie Yuan, Chaohui Yu, Yuming Jiang, Jiayan Guo, Xin Li, Hao Luo, Fan Wang, Deli Zhao, Hao Chen.
[code][checkpoint]
- RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation
Yuming Jiang, Siteng Huang, Shengke Xue, Yaxi Zhao, Jun Cen, Sicong Leng, Kehan Li, Jiayan Guo, Kexiang Wang, Mingxiu Chen, Fan Wang, Deli Zhao, Xin Li.
To appear in ICRA 2026 (Full paper)
[code][blog][checkpoint]
- RynnEC: Bringing MLLMs into Embodied World
Ronghao Dang*, Yuqian Yuan*, Yunxuan Mao*, Kehan Li*, Jiangpin Liu, Zhikai Wang, Fan Wang, Deli Zhao, Xin Li.
arXiv:2508.14160
[code][checkpoints & demos]
- VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
Boqiang Zhang*, Kehan Li*, Zesen Cheng*, Zhiqiang Hu*, Yuqian Yuan*, Guanzheng Chen*, Sicong Leng*, Yuming Jiang*, Hang Zhang*, Xin Li*, Peng Jin, Wenqi Zhang, Fan Wang, Lidong Bing, Deli Zhao.
arXiv:2501.13106
[code][checkpoints & demos]
- EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?
Yuqian Yuan, Ronghao Dang, Long Li, Wentong Li, Dian Jiao, Xin Li, Deli Zhao, Fan Wang, Wenqiao Zhang, Jun Xiao, Yueting Zhuang.
In NeurIPS D&B Track 2025 (Full paper, poster)
[code][benchmark]
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Wenqi Zhang, Hang Zhang, Xin Li, Jiashuo Sun, Yongliang Shen, Weiming Lu, Deli Zhao, Yueting Zhuang, Lidong Bing.
In ICCV 2025 (Full paper, highlight)
[code][dataset]
- VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Yuqian Yuan, Hang Zhang, Wentong Li, Zesen Cheng, Boqiang Zhang, Long Li, Xin Li, Deli Zhao, Wenqiao Zhang, Yueting Zhuang, Jianke Zhu, Lidong Bing.
In CVPR 2025 (Full paper, poster)
[code (new)][videorefer-700k][videorefer-bench][videorefer-videollama3 (new)]
- ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark
Ronghao Dang, Yuqian Yuan, Wenqi Zhang, Yifei Xin, Boqiang Zhang, Long Li, Liuyi Wang, Qinyang Zeng, Xin Li, Lidong Bing.
In CVPR 2025 (Full paper, poster)
[code][dataset]
Honors & Awards
- AAAI Student Scholarship, 2020
- Outstanding Graduates Award, Sun Yat-Sen University.
- Excellent Undergraduate Thesis Award, Sun Yat-Sen University.
Professional Activities
- Reviewer (or PC Member):
- ACL 2020-2023, EMNLP 2018-2023, NAACL 2021
- AAAI 2019-2020
- WSDM 2023, CIKM 2021
- ACM Transactions on Knowledge Discovery from Data (TKDD)
- ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP)
- IEEE Transactions on Multimedia (TMM)
- Neurocomputing
- Computational Intelligence
Some Useful Notes & Links
Hobbies
- Playing basketball
- Swimming
- Hiking