Siheng Zhao

I am a first-year CS Ph.D. student at the University of Southern California, advised by Prof. Yue Wang. I received my Bachelor's degree with the highest honors from the School of Artificial Intelligence, Nanjing University.

Previously, I was honoured to work with Prof. Tao Yu, Prof. Yanchao Yang, Prof. Lin Shao, and Dr. Jiangmiao Pang at the University of Hong Kong, National University of Singapore, and Shanghai AI Lab.

Email  /  Google Scholar  /  Semantic Scholar  /  GitHub  /  Twitter

News
  • [Sep, 2024] OSWorld is accepted by NeurIPS 2024.
  • [Sep, 2024] TieBot is accepted by CoRL 2024 as Oral.
  • [Aug, 2024] Formally joined USC as a Ph.D. student.
  • [Jan, 2024] kNN-BOX is accepted by EACL 2024.
  • [Jan, 2024] Text2Reward and Lemur are accepted by ICLR 2024 as Spotlight.
  • [Dec, 2023] Awarded SenseTime Scholarship (awarded to 30 undergraduates in the field of AI in China).
  • [Dec, 2023] Awarded Nanjing University Top-Grade Scholarship (the highest honor at Nanjing University).
  • [Jul, 2023] ClothesNet is accepted by ICCV 2023.
  • [Jun, 2023] DiffClothAI is accepted by IROS 2023.
Research

My research interests lie at the intersection of language-conditioned generative models and embodied intelligence. My long-term goal is to build the next generation of autonomous and intelligent agents that can proactively sense, plan, and interact with both the physical and digital worlds.

Current topics:

  • Video/Motion Generation Models for Embodied Control.
  • LLMs/VLMs for Embodied Control: Text2Reward (ICLR'24 Spotlight), GRUtopia (arXiv'24).

Previous topics:

  • Language and multimodal digital agents: Lemur (ICLR'24 Spotlight), OSWorld (NeurIPS'24).
  • Simulation, perception and manipulation of deformable objects: DiffClothAI (IROS'23), ClothesNet (ICCV'23), TieBot (CoRL'24 Oral).

Selected Publications

* denotes equal contribution. For the full publication list, please refer to my Google Scholar.

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Tianbao Xie, Danyang Zhang, Jixuan Chen, Xiaochuan Li, Siheng Zhao, Ruisheng Cao, Toh Jing Hua, Zhoujun Cheng, Dongchan Shin, Fangyu Lei, Yitao Liu, Yiheng Xu, Shuyan Zhou, Silvio Savarese, Caiming Xiong, Victor Zhong, Tao Yu
Advances in Neural Information Processing Systems (NeurIPS) 2024
[arxiv] [project]

Text2Reward: Reward Shaping with Language Models for Reinforcement Learning
Siheng Zhao*, Tianbao Xie*, Chen Henry Wu, Yitao Liu, Qian Luo, Victor Zhong, Yanchao Yang, Tao Yu
International Conference on Learning Representations (ICLR) 2024 Spotlight
[arxiv] [project]

TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach
Weikun Peng, Jun Lv, Yuwei Zeng, Haonan Chen, Siheng Zhao, Jichen Sun, Cewu Lu, Lin Shao
Conference on Robot Learning (CoRL) 2024 Oral
[arxiv] [project]

DiffClothAI: Differentiable Cloth Simulation with Intersection-free Frictional Contact and Differentiable Two-Way Coupling with Articulated Rigid Bodies
Siheng Zhao*, Xinyuan Yu*, Siyuan Luo, Gang Yang, Lin Shao
International Conference on Intelligent Robots and Systems (IROS) 2023
[paper] [project]

Education
University of Southern California
Ph.D. in Computer Science (2024 - )
Advisor: Prof. Yue Wang
Nanjing University
B.Eng. in Artificial Intelligence (2020 - 2024)
Overall GPA: 94.4/100, Ranking: 1/97
National University of Singapore
Exchange in Computer Science (2023)
Overall GPA: 4.0/4.0
Work Experience
Shanghai AI Lab, OpenRobot Group
Research Intern (2024)
Advisor: Dr. Jiangmiao Pang
The University of Hong Kong, NLP Group
Research Assistant (2023)
Advisor: Prof. Tao Yu
Services
  • Conference Reviewer:
    • International Conference on Learning Representations (ICLR) 2025
    • Annual Meeting of the Association for Computational Linguistics (ACL) 2024
    • Neural Information Processing Systems (NeurIPS) 2024
    • European Conference on Computer Vision (ECCV) 2024
    • IEEE International Conference on Robotics and Automation (ICRA) 2024
  • Journal Reviewer:
Talks
  • OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments, Shanghai AI Lab, 2024
  • Text2Reward: Reward Shaping with Language Models for Reinforcement Learning, SenseTime, 2024
  • Text2Reward: Reward Shaping with Language Models for Reinforcement Learning, Shanghai AI Lab, 2023, 2024
  • Large Language Model for Robotics: a High-level Planner, Nanjing University NLP Group, 2023
Honors & Awards
  • University of Southern California PhD Fellowship, 2024
  • Outstanding Graduate of Nanjing University, 2024
  • Travel Award of the International Conference on Learning Representations, 2024
  • Jiangsu Province Study Abroad Scholarship, 2024
  • SenseTime Scholarship, 2023, awarded to 30 undergraduates in the field of AI in China
  • Nanjing University Top-Grade Scholarship, 2023, the highest honor at Nanjing University
  • Bao Gang Scholarship & Special Prize Nomination, 2023
  • Heng Fang Scholarship, 2022
  • National Scholarship, 2021, the highest honor for university students in China
  • China Telecom Scholarship, 2021

The design of this homepage is based on Jon Barron's website, and the site is deployed on GitHub Pages.

Copyright 2024 © Siheng Zhao