Reinforcement Learning for Drone Maneuvers from Human Preferences

Learn complex drone maneuvers from human feedback using Reinforcement Learning (RL).

Keywords: Reinforcement Learning from Human Feedback (RLHF), Drones, Robotics

Description
Traditionally, training drones to perform specific maneuvers relies on predefined reward functions meticulously crafted by domain experts. This approach limits the flexibility of the learned behaviors and requires significant human effort. Moreover, defining reward functions for complex maneuvers such as obstacle avoidance or acrobatics can be challenging.
Recent work has demonstrated the effectiveness of human preferences both for achieving significant efficiency gains and for fine-tuning complex models such as Large Language Models (LLMs). This approach allows a model to incorporate human feedback into its learned behavior.
Goal
This project aims to develop novel methods for training drones to perform difficult maneuvers (e.g., obstacle avoidance or aerial acrobatics) with minimal human supervision and without predefined reward functions. We propose leveraging human preferences to guide the learning process, allowing the drone to learn desirable behaviors directly from human feedback.
Contact Details
Ismail Geles [geles (at) ifi (dot) uzh (dot) ch], Angel Romero [roagui (at) ifi (dot) uzh (dot) ch], Jiaxu Xing [jixing (at) ifi (dot) uzh (dot) ch]

Calendar

Earliest start	2024-05-15
Latest end	2025-01-31

Location

Robotics and Perception (UZH)

Labels

Semester Project
Master Thesis

Topics

Information, Computing and Communication Sciences
Engineering and Technology