Deep-learning-enhanced trinocular depth optimization for obstacle avoidance on board a fixed-wing UAV
The goal of this project is to combine classical geometric and deep-learning-based depth estimation for binocular and trinocular camera setups.
Keywords: Computer Vision, Deep Learning, UAV
ASL has been working on fixed-wing UAVs since 2007, more recently moving from solar-powered UAVs to smaller and more agile planes. For high-speed maneuvers (ca. 15 m/s), reliable long-range depth estimates computed with minimal delay are essential. A lightweight and inexpensive approach is to use low-resolution cameras. For this purpose, ASL mounted three time-synchronized cameras on the UAV: one in the fuselage (center) and one on each wing tip. Furthermore, an inertial measurement unit (IMU) is rigidly attached to each camera. Since the wings are non-rigid, the visual and IMU measurements are required to compensate for vibrations and flapping motions and to estimate the relative transformations between the cameras. Based on the time-varying relative poses between two cameras, a depth map and point cloud can be computed. As three cameras impose additional geometric constraints, the depth reconstruction can be further improved.
Furthermore, deep learning approaches have demonstrated that it is possible to infer depth from a single image taken by a monocular camera. The goal of this project is to combine classical geometric depth reconstruction with deep-learning-based depth inference. The student can build on an existing simulation framework (Gazebo/RotorS) to generate ground-truth data (images, IMU, relative poses), as well as on existing hardware.
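As an illustrative sketch of the geometric step: for a rectified camera pair, metric depth follows from the pixel disparity, the focal length, and the baseline between the cameras. The focal length and baseline below are placeholder values for illustration, not the actual specs of the UAV's cameras.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, min_disp=1e-3):
    """Convert a disparity map (pixels) from a rectified camera pair
    into metric depth via Z = f * B / d.

    Pixels with (near-)zero disparity are set to inf (no valid depth).
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(disparity, np.inf)
    valid = disparity > min_disp
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Illustrative numbers: 400 px focal length, 2 m wing-tip baseline,
# 4 px disparity -> 200 m range.
d = disparity_to_depth(np.array([[4.0]]), focal_px=400.0, baseline_m=2.0)
```

Note the long-range motivation for the wing-tip cameras: depth error grows with distance at a fixed baseline, so the wide wing-tip baseline buys accuracy at range, while the time-varying relative pose (from the flexing wings) must be estimated online.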
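One simple form such a combination could take (a sketch only, not the project's prescribed method) is per-pixel inverse-variance fusion: where the geometric estimate is confident it dominates, and where it is unreliable (low texture, small disparity) the learned prior takes over.

```python
import numpy as np

def fuse_depths(z_geo, var_geo, z_dl, var_dl):
    """Per-pixel inverse-variance weighted fusion of a geometric depth
    map and a learned (monocular) depth map. All inputs may be scalars
    or arrays of the same shape; variances must be positive."""
    w_geo = 1.0 / np.asarray(var_geo, dtype=np.float64)
    w_dl = 1.0 / np.asarray(var_dl, dtype=np.float64)
    return (w_geo * z_geo + w_dl * z_dl) / (w_geo + w_dl)

# Geometric estimate 10 m (variance 1) fused with learned estimate
# 12 m (variance 4) -> 10.4 m, weighted toward the more certain source.
z = fuse_depths(10.0, 1.0, 12.0, 4.0)
```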
Related Literature
- Matteo Poggi et al., “Learning monocular depth estimation with unsupervised trinocular assumptions”, 2018
- Wang et al., “Learning Depth from Monocular Videos using Direct Methods”, 2018
- Andreas Jäger, “Real-Time Monocular Dense 3D Reconstruction For Robot Navigation”, 2015
Work Packages
- “Classical” geometric trinocular depth map optimization and reconstruction
- Deep learning (DL) enhanced trinocular depth map optimization. The framework should be able to deal with inaccurate relative pose estimates, challenging lighting conditions, and artefacts resulting from geometric-only reconstruction. Different approaches could be chosen:
- Compute the geometric trinocular depth map and improve it based on the learnt depth
- Rely solely on a deep learning architecture, i.e. directly use the three images and relative pose estimates as input to the DL architecture
- Efficient and probabilistically sound conversion of depth map into 3D or 2.5D georeferenced point cloud
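The last work package can be sketched as follows: back-project each pixel through the camera intrinsics using its depth, then transform the points into a world frame with the camera pose. The intrinsics and pose below are placeholders; a probabilistically sound version would additionally propagate per-pixel depth uncertainty.

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy, T_world_cam):
    """Back-project a depth map (pinhole model) into a 3D point cloud
    and transform it into a georeferenced (world) frame.

    depth:        (H, W) metric depth, inf/NaN marks invalid pixels
    fx, fy:       focal lengths in pixels
    cx, cy:       principal point in pixels
    T_world_cam:  4x4 homogeneous camera-to-world transform
    Returns an (N, 3) array of valid world-frame points.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = np.asarray(depth, dtype=np.float64)
    x = (u - cx) / fx * z
    y = (v - cy) / fy * z
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    valid = np.isfinite(pts_cam[:, 2]) & (pts_cam[:, 2] > 0)
    pts_world = (T_world_cam @ pts_cam[valid].T).T
    return pts_world[:, :3]
```

A 2.5D variant would additionally bin these points into a height map over a horizontal grid, which is often cheaper to maintain on board.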
- Courses and experience in computer vision and (deep) learning
- C++, Python
For more information, visit:
https://docs.google.com/presentation/d/1DVAy-Jl4dDyL4uEeAYm6aD9b2YS_-oTauESIQ7nvhxE/edit?usp=sharing
Please send your CV and transcript of records to hitimo@ethz.ch and tstastny@ethz.ch