Real-time Dynamic Scene Reconstruction
Autonomous interaction in real-world environments requires robots to build an internal representation of the observed scene and the objects therein. The goal of this project is to integrate efficient tracking of multiple (potentially) moving objects into an existing scene reconstruction framework.
Keywords: RGB-D Perception, Object Detection, Instance-Aware Semantic Segmentation, 3D reconstruction, Dynamic SLAM
Robots operating autonomously in unstructured, real-world environments cannot rely on a detailed and accurate a priori model of their surroundings. They must therefore be able to robustly perceive the complex surrounding scene and acquire task-relevant knowledge to guide subsequent interaction planning. Beyond building an internal representation of the observed geometry, the key to a truly functional understanding of the environment for autonomous mobile manipulation is the use of higher-level entities during mapping, such as individual object instances. Additionally, the design of the map representation should allow efficient tracking of multiple objects moving in the scene as a result of humans, or even the robot itself, interacting with them.
The goal of this project is to start from an existing Simultaneous Localization and Mapping (SLAM) framework that densely reconstructs the observed surface geometry in static environments [1] and adapt it to account for the presence of multiple rigid, independently moving objects in the scene. First, individual object instances need to be detected from the RGB-D stream of a 3D camera by combining an unsupervised geometric segmentation with an instance-aware semantic segmentation network [2]. Next, partial knowledge about the location and shape of the predicted segments in each frame will be incrementally fused into the dense environment map built by the SLAM framework. Finally, such an extended object-centric map representation will allow tracking the pose of each non-stationary element across frames while simultaneously reconstructing its dense 3D shape [3].
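As an illustration of the first step, combining geometric and semantic segments can be approached as a per-frame overlap matching. The following Python sketch (function names and the IoU threshold are hypothetical, not taken from the referenced frameworks) pairs each geometric segment with the best-overlapping semantic instance mask:

```python
import numpy as np

def match_segments_to_instances(geom_masks, inst_masks, iou_thresh=0.5):
    """Greedily assign each geometric segment to the semantic instance
    mask with the highest intersection-over-union (IoU) overlap.
    geom_masks, inst_masks: lists of boolean 2D arrays (per-pixel masks).
    Returns a dict {geometric segment index: instance index}."""
    matches = {}
    for gi, g in enumerate(geom_masks):
        best_iou, best_inst = 0.0, None
        for ii, m in enumerate(inst_masks):
            inter = np.logical_and(g, m).sum()
            union = np.logical_or(g, m).sum()
            iou = inter / union if union > 0 else 0.0
            if iou > best_iou:
                best_iou, best_inst = iou, ii
        if best_iou >= iou_thresh:
            matches[gi] = best_inst
    return matches
```

Segments left unmatched can still be kept in the map as unlabeled geometry, so that objects outside the semantic network's training classes are not discarded.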
References:
[1] Real-Time Large-Scale Dense 3D Reconstruction with Loop Closure, _Kähler et al._, 2016
[2] Volumetric Instance-Aware Semantic Mapping and 3D Object Discovery, _Grinvald et al._, 2019
[3] MID-Fusion: Octree-based Object-Level Multi-Instance Dynamic SLAM, _Xu et al._, 2018
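The incremental fusion step described above typically follows the standard weighted running-average TSDF update used by volumetric reconstruction pipelines such as [1]. A minimal per-voxel sketch under that assumption (names and the weight cap are illustrative):

```python
def fuse_tsdf(tsdf, weight, new_sdf, new_weight=1.0, max_weight=64.0):
    """Blend a new signed-distance observation into a voxel's stored
    value via a confidence-weighted running average; the weight is
    capped so the map can still adapt to changes in the scene."""
    fused = (weight * tsdf + new_weight * new_sdf) / (weight + new_weight)
    w = min(weight + new_weight, max_weight)
    return fused, w
```

In an object-centric extension, each tracked instance would own its own small TSDF volume updated with this rule in the object's local frame, so that moving objects do not corrupt the static background map.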
- Review relevant literature and familiarize with an existing real-time dense SLAM framework
- Integrate per-frame geometric and semantic segmentation into the map
- Implement efficient dynamic object tracking and map update strategies
- Evaluate on public datasets and compare against existing methods
- [Depending on progress] Integrate the framework on a real robot.
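For the dynamic object tracking task above, a common building block is estimating a least-squares rigid transform between matched 3D points of an object across frames (the Kabsch algorithm, also the inner step of ICP). A minimal Python sketch, not tied to any of the referenced systems:

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares rigid transform (Kabsch algorithm) mapping the
    (N, 3) point set src onto dst, assuming known correspondences.
    Returns rotation R (3x3) and translation t (3,) with dst ~ src @ R.T + t."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    # Sign correction to guarantee a proper rotation (det R = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_dst - R @ c_src
    return R, t
```

In practice the correspondences would come from projective data association against the object's TSDF volume, and a robust loss would down-weight outliers from segmentation errors.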
- Strong interest in computer vision
- Good programming skills in C++/Python
- Interest or experience with one or more of the following is a plus: 3D vision, GPU programming, ROS, Git.
Please apply with your CV and academic transcripts to Margarita Grinvald (margarita.grinvald@mavt.ethz.ch)