Iaroslav V. Ponomarenko

Hi! I’m Iaroslav Ponomarenko, a third-year master’s student in Computer Science at the Center on Frontiers of Computing Studies, Peking University, where I’m supervised by Professor Hao Dong. I’m also a visiting student researcher at the Mohamed Bin Zayed University of Artificial Intelligence, mentored by Professor Yoshihiko Nakamura.

Before coming to Peking University, I earned a Master of Science degree in Information Systems and Technologies from the Voronezh Institute of High Technologies and a Bachelor of Science in Automated Information Processing and Control Systems from the Borisoglebsk College of Informatics and Computer Engineering.

Research focus

I’m fascinated by how we can build machines that don’t just act, but understand—how they can learn not only to pick, push, or move, but to reason about what they’re doing, why it matters, and how the world might respond. My research lies at the intersection of perception, reasoning, and action in embodied agents, with a focus on grounding robotic behavior in structured models of affordance, causality, and intent.

To bridge the gap between low-level control and high-level understanding, I design systems that recognize where and how to interact with objects, anticipate the consequences of their actions, and adapt fluidly to context using visual and language cues. This vision has taken shape through a series of contributions to embodied learning: from predicting SE(3)-invariant affordances for articulated objects [5], to selecting informative viewpoints using only RGB inputs [6], to answering spatial questions grounded in robot memory [8], and forecasting future interactions through keyframe-conditioned planning [9].

In parallel, I explore how large vision-language-action models can perform open-vocabulary manipulation through in-context prompts [10] and reason about embodied tasks in language-driven environments [7].

These threads converge toward a broader goal: developing a unified framework for spatiotemporal planning and theory-of-mind-based control. The aim is to enable agents that not only perceive and act, but also abstract, explain, and anticipate—learning to behave with an awareness of structure, intent, and purpose.

Publications

(*) Equal contribution. (†) Corresponding author. Entries marked Spotlight or Oral Pitch were selected for special recognition.

  1. Kim, T., Bae, H., Li, Z., Li, X., Ponomarenko, I., Wu, R. & Dong, H. ManipGPT – is affordance segmentation by large vision models enough for articulated object manipulation? Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2025).

  2. Li, X., Xu, L., Zhang, M., Liu, J., Shen, Y., Ponomarenko, I., Xu, J., Heng, L., Huang, S., Zhang, S. & Dong, H. CrayonRobo – object-centric prompt-driven vision-language-action model for robotic manipulation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2025).

  3. Cai, W.*, Ponomarenko, I.*, Yuan, J., Li, X., Yang, W., Dong, H. & Zhao, B. SpatialBot – precise spatial understanding with vision language models. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2025).

  4. Huang, S.*, Ponomarenko, I.*, Jiang, Z., Li, X., Hu, X., Gao, P., Li, H. & Dong, H. ManipVQA – injecting robotic affordance and physically grounded information into multi-modal large language models. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2024). Oral Pitch

  5. Li, X., Wang, Y., Shen, Y., Ponomarenko, I., Lu, H., Wang, Q., An, B., Liu, J. & Dong, H. ImageManip – image-based robotic manipulation with affordance-guided next view selection. Preprint (2023).

  6. Ju, Y., Geng, H., Yang, M., Geng, Y., Ponomarenko, I., Kim, T., Wang, H. & Dong, H. Learning part-aware visual actionable affordance for 3D articulated object manipulation. Proceedings of the CVPR Workshop on 3D Vision and Robotics (3DVR 2023), Vancouver, Canada, 18 June 2023. Spotlight

  7. Sukhanov, A. A. & Ponomarenko, I. V. Application of block periodization in the design of health-prolonging training cycles. Proceedings of the Interregional Final Scientific Student Conferences: "Student Science" and "Young Scientists of SCOLIPE" 354, 275–279 (2017).

  8. Sukhanov, A. A., Ponomarenko, I. V. & Rubin, V. S. The potential of instrumental methods for medical soft tissue diagnostics in physical education and health-improving training. Fitness-Aerobics–2016: Proceedings of the All-Russian Scientific Online Conference 226, 97–98 (2016).

  9. Sukhanov, A. A., Ponomarenko, I. V. & Rubin, V. S. The study of methodological approaches to intermuscular coordination and strength development in women of early adulthood engaged in health-improving training. Fitness-Aerobics–2016: Proceedings of the All-Russian Scientific Online Conference 226, 98–100 (2016).

  10. Sukhanov, A. A. & Ponomarenko, I. V. Assessment of the muscle condition as one of the physical health indicators in the framework of physical education and health-improving training. Proceedings of Students and Young Scientists of the Russian State University of Physical Education, Sport, Youth and Tourism 279, 78–80 (2016).

Service

Conference Reviewer: Robotics: Science and Systems (RSS 2025); International Conference on Robotics and Automation (ICRA 2025)

Graduate Student Member: IEEE (2024–present); CAAI (2024–2029)

Teaching Assistant: Fundamentals of AI, Peking University (Spring 2024)