Reinforcement Learning for Drone Maneuvers from Human Preferences
Learn complex drone maneuvers from human feedback using Reinforcement Learning (RL).
Keywords: Reinforcement Learning from human feedback (RLHF), Drones, Robotics
Traditionally, training drones for specific maneuvers relies on pre-defined reward functions meticulously crafted by domain experts. This approach limits the flexibility of learned behaviors and requires significant human effort. Additionally, defining reward functions for complex maneuvers like obstacle avoidance or acrobatics can be challenging.
Recent work has shown that human preferences can be used to fine-tune complex models, such as Large Language Models (LLMs), with significant efficiency gains. This approach allows a model to incorporate human feedback into its learned behavior.
This project aims to find novel methods for training drones to perform difficult maneuvers (e.g., obstacle avoidance, aerial acrobatics) with minimal human supervision and without pre-defining reward functions. We propose leveraging human preferences to guide the learning process, allowing the drone to learn desirable behaviors directly from human feedback.
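The project description does not commit to a specific algorithm. As an illustration only, one common way to learn from pairwise human preferences is to fit a reward model with a Bradley-Terry loss: a human compares two short flight segments, and the model is trained so that the preferred segment receives the higher summed reward. The sketch below assumes this formulation; the names `segment_return`, `preference_probability`, `preference_loss`, and `toy_reward` are hypothetical and are not part of the project.

```python
import math


def segment_return(reward_fn, segment):
    """Sum a (learned) per-step reward over a segment of (state, action) pairs."""
    return sum(reward_fn(state, action) for state, action in segment)


def _softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_probability(reward_fn, segment_a, segment_b):
    """Bradley-Terry probability that segment_a is preferred over segment_b."""
    diff = segment_return(reward_fn, segment_a) - segment_return(reward_fn, segment_b)
    # sigmoid(diff) written via softplus to avoid overflow for large |diff|
    return math.exp(-_softplus(-diff))


def preference_loss(reward_fn, segment_a, segment_b, human_prefers_a):
    """Negative log-likelihood of one human preference label."""
    diff = segment_return(reward_fn, segment_a) - segment_return(reward_fn, segment_b)
    # -log sigmoid(diff) if A is preferred, -log sigmoid(-diff) otherwise
    return _softplus(-diff) if human_prefers_a else _softplus(diff)


# Hypothetical toy reward: the state is the clearance to the nearest obstacle
# (in meters) and the action is ignored. A real reward model would be a
# trainable function of the drone's state and control inputs.
def toy_reward(state, action):
    return state


safe_segment = [(1.0, 0.0), (1.0, 0.0)]   # keeps 1 m clearance
risky_segment = [(0.0, 0.0), (0.0, 0.0)]  # touches the obstacle

p_safe = preference_probability(toy_reward, safe_segment, risky_segment)
loss_if_human_prefers_safe = preference_loss(toy_reward, safe_segment, risky_segment, True)
loss_if_human_prefers_risky = preference_loss(toy_reward, safe_segment, risky_segment, False)
```

In such a setup, the loss would be minimized over the reward model's parameters using many labeled comparisons, and the resulting reward would then drive a standard RL algorithm in place of a hand-crafted reward function.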