Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning

System Architecture

Abstract

The exploration versus exploitation dilemma is a significant problem in reinforcement learning (RL), particularly in complex environments with large state spaces and sparse rewards. When optimizing for a particular goal, solving simpler auxiliary tasks can often be a good way to learn additional information about the environment. Exploration methods have been used to sample better trajectories from the environment for improved performance, while auxiliary tasks have generally been incorporated where the reward is sparse. When little reward signal is available, the agent needs clever exploration strategies to reach the parts of the state space that contain relevant sub-goals, and that exploration must be balanced against the need to exploit the learned policy. This paper explores the idea of combining exploration with auxiliary task learning using General Value Functions (GVFs) and a directed exploration strategy. We provide a simple way to learn options (sequences of actions) instead of having to handcraft them, and demonstrate the performance advantage in three navigation tasks.
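As background for the abstract, the sketch below illustrates what a General Value Function is in the simplest setting: a prediction, learned by TD(0), of the discounted sum of an arbitrary "cumulant" signal rather than the task reward. This is a minimal illustrative example, not the paper's implementation; the tabular representation, the toy chain environment, and the use of the TD error as an exploration bonus are assumptions made for clarity.

```python
# Minimal sketch (not the paper's method): a tabular General Value Function (GVF)
# learned with TD(0). A GVF predicts the discounted sum of an arbitrary
# "cumulant" signal, so it can act as an auxiliary prediction whose error
# could, for example, serve as a signal for directed exploration.

import numpy as np


class TabularGVF:
    def __init__(self, n_states, gamma=0.9, alpha=0.1):
        self.v = np.zeros(n_states)   # GVF prediction per state
        self.gamma = gamma            # continuation/discount used by the GVF
        self.alpha = alpha            # TD step size

    def update(self, s, cumulant, s_next, done):
        # TD(0): move v(s) toward cumulant + gamma * v(s')
        target = cumulant + (0.0 if done else self.gamma * self.v[s_next])
        td_error = target - self.v[s]
        self.v[s] += self.alpha * td_error
        return abs(td_error)          # prediction error, usable as an exploration bonus


# Toy usage on a hypothetical random-walk chain: the cumulant is 1.0 in the
# rightmost state, so the GVF learns "nearness to the right end" without
# any task reward being involved.
rng = np.random.default_rng(0)
n_states = 10
gvf = TabularGVF(n_states)
for episode in range(200):
    s = 0
    for _ in range(50):
        s_next = min(max(s + rng.choice([-1, 1]), 0), n_states - 1)
        cumulant = 1.0 if s_next == n_states - 1 else 0.0
        gvf.update(s, cumulant, s_next, done=False)
        s = s_next
print(np.round(gvf.v, 2))
```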

Publication
Reinforcement Learning in Games, Association for the Advancement of Artificial Intelligence (AAAI)
