Iaroslav V. Ponomarenko

Hi! I’m Iaroslav Ponomarenko, a third-year master’s student in Computer Science at the Center on Frontiers of Computing Studies, Peking University, where I’m supervised by Professor Hao Dong. I’m also a visiting student researcher at the Mohamed Bin Zayed University of Artificial Intelligence, mentored by Professor Yoshihiko Nakamura.

Before coming to Peking University, I earned a Master of Science degree in Information Systems and Technologies from the Voronezh Institute of High Technologies and a Bachelor of Science in Automated Information Processing and Control Systems from the Borisoglebsk College of Informatics and Computer Engineering.

Research focus

I’m fascinated by how we can build machines that don’t just act, but understand—how they can learn not only to pick, push, or move, but to reason about what they’re doing, why it matters, and how the world might respond. My research lies at the intersection of perception, reasoning, and action in embodied agents, with a focus on grounding robotic behavior in structured models of affordance, causality, and intent.

To bridge the gap between low-level control and high-level understanding, I design systems that recognize where and how to interact with objects, anticipate the consequences of their actions, and adapt fluidly to context using visual and language cues. This vision has taken shape through a series of contributions to embodied learning: from predicting SE(3)-invariant affordances for articulated objects [5], to selecting informative viewpoints using only RGB inputs [6], to answering spatial questions grounded in robot memory [8], and forecasting future interactions through keyframe-conditioned planning [9].

In parallel, I explore how large vision-language-action models can perform open-vocabulary manipulation through in-context prompts [10] and reason about embodied tasks in language-driven environments [7].

These threads converge toward a broader goal: developing a unified framework for spatiotemporal planning and theory-of-mind-based control. The aim is to enable agents that not only perceive and act, but also abstract, explain, and anticipate—learning to behave with an awareness of structure, intent, and purpose.

Publications

(*) Equal contribution. (†) Corresponding author. Entries marked Spotlight or Oral Pitch were selected for special recognition.

  1. Kim, T., Bae, H., Li, Z., Li, X., Ponomarenko, I., Wu, R. & Dong, H. ManipGPT – is affordance segmentation by large vision models enough for articulated object manipulation? Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2025).

  2. Li, X., Xu, L., Zhang, M., Liu, J., Shen, Y., Ponomarenko, I., Xu, J., Heng, L., Huang, S., Zhang, S. & Dong, H. CrayonRobo – object-centric prompt-driven vision-language-action model for robotic manipulation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2025).

  3. Cai, W.*, Ponomarenko, I.*, Yuan, J., Li, X., Yang, W., Dong, H. & Zhao, B. SpatialBot – precise spatial understanding with vision language models. Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (2025).

  4. Huang, S.*, Ponomarenko, I.*, Jiang, Z., Li, X., Hu, X., Gao, P., Li, H. & Dong, H. ManipVQA – injecting robotic affordance and physically grounded information into multi-modal large language models. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2024). Oral Pitch

  5. Li, X., Wang, Y., Shen, Y., Ponomarenko, I., Lu, H., Wang, Q., An, B., Liu, J. & Dong, H. ImageManip – image-based robotic manipulation with affordance-guided next view selection. Preprint (2023).

  6. Ju, Y., Geng, H., Yang, M., Geng, Y., Ponomarenko, I., Kim, T., Wang, H. & Dong, H. Learning part-aware visual actionable affordance for 3D articulated object manipulation. Proceedings of the CVPR Workshop on 3D Vision and Robotics (3DVR 2023), Vancouver, Canada, 18 June 2023. Spotlight

  7. Sukhanov, A. A. & Ponomarenko, I. V. Application of block periodization in the design of health-prolonging training cycles. Proceedings of the Interregional Final Scientific Student Conferences: "Student Science" and "Young Scientists of SCOLIPE" 354, 275–279 (2017).

  8. Sukhanov, A. A., Ponomarenko, I. V. & Rubin, V. S. The potential of instrumental methods for medical soft tissue diagnostics in physical education and health-improving training. Fitness-Aerobics–2016: Proceedings of the All-Russian Scientific Online Conference 226, 97–98 (2016).

  9. Sukhanov, A. A., Ponomarenko, I. V. & Rubin, V. S. The study of methodological approaches to intermuscular coordination and strength development in women of early adulthood engaged in health-improving training. Fitness-Aerobics–2016: Proceedings of the All-Russian Scientific Online Conference 226, 98–100 (2016).

  10. Sukhanov, A. A. & Ponomarenko, I. V. Assessment of the muscle condition as one of the physical health indicators in the framework of physical education and health-improving training. Proceedings of Students and Young Scientists of the Russian State University of Physical Education, Sport, Youth and Tourism 279, 78–80 (2016).

Service

Conference Reviewer: Robotics: Science and Systems (RSS 2025); International Conference on Robotics and Automation (ICRA 2025)

Graduate Student Member: IEEE (2024–present); CAAI (2024–2029)

Teaching Assistant: Fundamentals of AI, Peking University (Spring 2024)