Previously, I obtained an Engineering degree in
Information Systems and Technologies from
Voronezh Institute of High Technologies,
as well as a Technician degree in Automated Information
Processing and Control Systems from Borisoglebsk College
of Informatics and Computer Engineering.
My research focuses on the intersection of embodied AI, visual perception of
the world, reasoning, and robotic control. Specifically, I investigate how to
enable embodied agents to obtain environmental awareness through vision,
including affordance understanding [1, 2]
and spatial reasoning [3], in order
to perform complex manipulation tasks. Currently, I am investigating these
areas using large multimodal foundation models.
News
2024-10-17
Presented ManipVQA [2] at IROS 2024 (Abu Dhabi, United Arab Emirates).
2024-08-12
Presented ManipVQA [2] at Microsoft Research Asia Tech Fest (Beijing, China).
2024-06-30
🎉 Our paper ManipVQA [2] has been accepted for publication at IROS 2024.
Service
Reviewer for the IEEE International Conference on Robotics and Automation (ICRA 2025).
Selected Publications
(*) indicates equal contribution, while (†) denotes the corresponding author.
We introduce SpatialBot, a framework designed to improve the spatial reasoning of Vision Language Models (VLMs) by leveraging both RGB and depth images. To train VLMs for depth perception, we present the SpatialQA and SpatialQA-E datasets, which feature depth-related questions at multiple levels. In addition, we release models fine-tuned on the SpatialQA and SpatialQA-E datasets, and present SpatialBench, a comprehensive benchmark for evaluating the spatial understanding capabilities of VLMs.
CVPR Workshop on 3D Vision and Robotics (CVPR @ 3DVR), 2023
[Spotlight Presentation]
We introduce Part-aware Affordance Learning methods. Our approach first learns a prior over object parts and then generates an affordance map. To further improve precision, we incorporate a part-level scoring system that identifies the most suitable part for manipulation.
Last updated on Tuesday, November 26, 2024, at 05:34:20 AM.