|
Ao Li (李奡)
Hi! I am a master's student in Artificial Intelligence at Tsinghua University, advised by Prof. Yansong Tang and Prof. Jiwen Lu at the THU-IVG Lab.
I received my B.S. degree in Artificial Intelligence from Beijing Normal University in 2024. Before that, I worked as an intern at the BNU-IVC Lab under the supervision of Prof. Yongzhen Huang and Prof. Saihui Hou, conducting research on gait recognition.
My research interests include Embodied AI and human-robot interaction.
Email / GitHub / Google Scholar
|
|
|
Tsinghua University, 2024.09 - Present
M.S. in Artificial Intelligence
Shenzhen International Graduate School
|
|
Beijing Normal University, 2020.09 - 2024.06
B.S. in Artificial Intelligence
School of Artificial Intelligence
|
|
JD Joy Future Academy, Shenzhen, China. 2026.03 - Present
Project: Egocentric Human Videos for VLA/WAM Pretraining.
Working with Dr. Zhihao Yuan
|
|
Tencent Robotics-X, Shenzhen, China. 2025.05 - 2026.03
Project: VLM for Human-Robot Interaction.
Worked with Dr. Yonggen Ling
|
|
News
2026-06: JoyAI-Sim was released! See our technical report for more details.
2026-06: One paper on Human-Robot Interaction was accepted to ECCV 2026.
2026-04: JoyAI-RA 0.1 was released! See our technical report for more details.
2026-01: One paper on Efficient Image Enhancement was accepted to ICLR 2026.
2025-06: One paper on Human-Object Interaction Reconstruction was accepted to ICCV 2025.
2025-01: One paper on Efficient Image Enhancement was accepted to ICLR 2025.
2024-06: Invited as a Spotlight Presenter at the MANGO workshop at CVPR 2024.
2024-04: Our work FlowIE was selected for an oral presentation at CVPR 2024!
2024-02: Two papers on Human Mesh Recovery and Image Enhancement were accepted to CVPR 2024.
|
|
Selected Publications
(*Equal contribution, #Corresponding author)
|
|
JoyAI-Sim: A Simulation-Enabled Interconversion Toolchain for the Embodied Data Pyramid
JD Joy Future Academy (core contributor)
arXiv, 2026
[Tech Report]
[Project Page]
A simulation-based data conversion toolchain (Robot ⇌ Simulation ⇌ Human) built upon the embodied data pyramid.
|
|
JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy
JD Joy Future Academy (core contributor)
arXiv, 2026
[Tech Report]
[Project Page]
A vision-language-action (VLA) embodied foundation model tailored for generalizable robotic manipulation.
|
|
TAIHRI: Task-Aware 3D Human Keypoints Localization for Close-Range Human-Robot Interaction
Ao Li*, Yonggen Ling*, Yiyang Lin, Yuji Wang, Yong Deng, Yansong Tang
European Conference on Computer Vision (ECCV), 2026
[Paper]
[Code]
We propose TAIHRI, the first vision-language model (VLM) tailored for close-range HRI perception, capable of understanding users' motion commands and directing the robot's attention to the most task-relevant keypoints.
|
|
VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution
Yixuan Zhu*, Shilin Ma*, Haolin Wang, Ao Li, Yanzhe Jing, Yansong Tang#, Lei Chen, Jiwen Lu, Jie Zhou
The Fourteenth International Conference on Learning Representations (ICLR), 2026
[Paper]
[Code]
[Project Page]
We introduce VARestorer, a one-step VAR distillation framework for real-world image super-resolution that mitigates error accumulation.
|
|
ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion
Ao Li, Jinpeng Liu, Yixuan Zhu, Yansong Tang
IEEE/CVF International Conference on Computer Vision (ICCV), 2025
[arXiv]
[Code]
We propose ScoreHOI, a framework for human-object interaction reconstruction that uses score-guided diffusion to improve physical plausibility.
|
|
InstaRevive: One-Step Image Enhancement via Dynamic Score Matching
Yixuan Zhu, Haolin Wang, Ao Li, Wenliang Zhao, Yansong Tang, Jingxuan Niu, Lei Chen, Jie Zhou, Jiwen Lu
The Thirteenth International Conference on Learning Representations (ICLR), 2025
[Paper]
We propose InstaRevive, a straightforward yet powerful image enhancement framework that employs score-based diffusion distillation to leverage strong generative capabilities and reduce the number of sampling steps.
|
|
DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
Yixuan Zhu*, Ao Li*, Yansong Tang#, Wenliang Zhao, Jie Zhou, Jiwen Lu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[arXiv]
[Code]
[Project Page]
We propose a new method that exploits diffusion priors for human mesh recovery (HMR) in occluded and crowded scenarios.
|
|
FlowIE: Efficient Image Enhancement via Rectified Flow
Yixuan Zhu, Wenliang Zhao, Ao Li, Yansong Tang#, Jie Zhou, Jiwen Lu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Oral Presentation
[arXiv]
[Code]
We propose a unified framework that uses generative diffusion priors for efficient image enhancement across various tasks.
|
|
Selected Honors and Awards
First-Class Scholarship (Top 3%), Tsinghua University, 2025.
Outstanding Bachelor Graduate of Beijing, 2024.
"Jingshi" First-Class Scholarship (Top 10%), Beijing Normal University, 2021-2023.
Potential Star Award - Meituan Second Low-Altitude Economy UAV Management Challenge (Innovation Track), 2024.
|
|