Ao Li (李奡)

Hi! I am a master's student in Artificial Intelligence at Tsinghua University, advised by Prof. Yansong Tang and Prof. Jiwen Lu at the THU-IVG Lab. I received my B.S. degree in Artificial Intelligence from Beijing Normal University in 2024. Before that, I worked as an intern at the BNU-IVC Lab under the supervision of Prof. Yongzhen Huang and Prof. Saihui Hou, conducting research on gait recognition.

My research interests include Embodied AI and human-robot interaction.

Email  /  GitHub  /  Google Scholar

Education

Tsinghua University, 2024.09 - Present
  • M.S. in Artificial Intelligence
  • Shenzhen International Graduate School

Beijing Normal University, 2020.09 - 2024.06
  • B.S. in Artificial Intelligence
  • School of Artificial Intelligence

Industry Experience

JD Joy Future Academy, Shenzhen, China. 2026.03 - Present
  • Project: Egocentric Human Videos for VLA/WAM Pretraining.
  • Working with Dr. Zhihao Yuan

Tencent Robotics-X, Shenzhen, China. 2025.05 - 2026.03
  • Project: VLM for Human-Robot Interaction.
  • Worked with Dr. Yonggen Ling

News

  • 2026-06: JoyAI-Sim was released! See our technical report for more details.
  • 2026-06: One paper on Human-Robot Interaction was accepted to ECCV 2026.
  • 2026-04: JoyAI-RA 0.1 was released! See our technical report for more details.
  • 2026-01: One paper on Efficient Image Enhancement was accepted to ICLR 2026.
  • 2025-06: One paper on Human-Object Interaction Reconstruction was accepted to ICCV 2025.
  • 2025-01: One paper on Efficient Image Enhancement was accepted to ICLR 2025.
  • 2024-06: Invited as a Spotlight Presenter at the MANGO workshop at CVPR 2024.
  • 2024-04: Our work FlowIE was selected for an oral presentation at CVPR 2024!
  • 2024-02: Two papers on Human Mesh Recovery and Image Enhancement were accepted to CVPR 2024.
Selected Publications

    (*Equal contribution, #Corresponding author)

    JoyAI-Sim: A Simulation-Enabled Interconversion Toolchain for the Embodied Data Pyramid
    JD Joy Future Academy (core contributor)
    arXiv, 2026
    [Tech Report] [Project Page]
    A simulation data transformation toolchain, Robot ⇌ Simulation ⇌ Human, built upon the embodied data pyramid.

    JoyAI-RA 0.1: A Foundation Model for Robotic Autonomy
    JD Joy Future Academy (core contributor)
    arXiv, 2026
    [Tech Report] [Project Page]
    A vision-language-action (VLA) embodied foundation model tailored for generalizable robotic manipulation.

    TAIHRI: Task-Aware 3D Human Keypoints Localization for Close-Range Human-Robot Interaction
    Ao Li*, Yonggen Ling*, Yiyang Lin, Yuji Wang, Yong Deng, Yansong Tang
    European Conference on Computer Vision (ECCV), 2026
    [Paper] [Code]
    We propose TAIHRI, the first vision-language model (VLM) tailored for close-range HRI perception, capable of understanding users' motion commands and directing the robot's attention to the most task-relevant keypoints.

    VARestorer: One-Step VAR Distillation for Real-World Image Super-Resolution
    Yixuan Zhu*, Shilin Ma*, Haolin Wang, Ao Li, Yanzhe Jing, Yansong Tang#, Lei Chen, Jiwen Lu, Jie Zhou
    The Fourteenth International Conference on Learning Representations (ICLR), 2026
    [Paper] [Code] [Project Page]
    We introduce VARestorer, a one-step VAR distillation framework for real-world image super-resolution that mitigates error accumulation.

    ScoreHOI: Physically Plausible Reconstruction of Human-Object Interaction via Score-Guided Diffusion
    Ao Li, Jinpeng Liu, Yixuan Zhu, Yansong Tang
    IEEE International Conference on Computer Vision (ICCV), 2025
    [arXiv] [Code]
    We propose ScoreHOI, a framework for human-object interaction reconstruction that uses score-guided diffusion to enhance physical plausibility.

    InstaRevive: One-Step Image Enhancement via Dynamic Score Matching
    Yixuan Zhu, Haolin Wang, Ao Li, Wenliang Zhao, Yansong Tang, Jingxuan Niu, Lei Chen, Jie Zhou, Jiwen Lu
    The Thirteenth International Conference on Learning Representations (ICLR), 2025
    [Paper]
    We propose InstaRevive, a straightforward yet powerful image enhancement framework that employs score-based diffusion distillation to leverage strong generative capabilities and reduce the number of sampling steps.

    DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery
    Yixuan Zhu*, Ao Li*, Yansong Tang#, Wenliang Zhao, Jie Zhou, Jiwen Lu
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
    [arXiv] [Code] [Project Page]
    We propose a new method to exploit diffusion priors for human mesh recovery (HMR) in occluded and crowded scenarios.

    FlowIE: Efficient Image Enhancement via Rectified Flow
    Yixuan Zhu, Wenliang Zhao, Ao Li, Yansong Tang#, Jie Zhou, Jiwen Lu
    IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
    Oral Presentation
    [arXiv] [Code]
    We propose a unified framework for various efficient image enhancement tasks with generative diffusion priors.

Selected Honors and Awards

  • First-Class Scholarship (Top 3%), Tsinghua University, 2025.
  • Outstanding Bachelor Graduate of Beijing, 2024.
  • "Jingshi" First-Class Scholarship (Top 10%), Beijing Normal University, 2021-2023.
  • Potential Star Award - Meituan Second Low-Altitude Economy UAV Management Challenge (Innovation Track), 2024.