Efficient Exploration by Novelty Pursuit (DAI 2020)

This blog post summarizes our work on efficient exploration by novelty pursuit, which was presented at DAI 2020.

In our DAI 2020 paper, Efficient Exploration by Novelty-Pursuit [1], joint work with Xiong-Hui Chen, we consider exploration efficiency in reinforcement learning (RL). The goal of RL is to find the optimal policy by interacting with an unknown environment. Because of limited information, the agent needs to try various actions (i.e., exploration) to discover potential rewards.

Based on the ideas of intrinsically motivated goal exploration processes (IMGEP) [2] and maximum state entropy exploration (MSEE) [3], we propose to employ a goal-conditioned policy to explore efficiently. In particular, our method performs exploration in two stages: first, it selects a seldom-visited state as the goal so that the goal-conditioned policy can reach the boundary of the explored region; second, it takes random actions to explore the unexplored region. See Figure 1 below for an illustration.

Figure 1. Illustration of the exploration scheme in novelty pursuit.
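To make the two-stage procedure concrete, here is a minimal sketch of one exploration episode. The environment follows the common Gym-style interface, and `policy`, `select_novel_goal`, and `replay_buffer` are hypothetical placeholders rather than the exact implementation from the paper.

```python
def exploration_episode(env, policy, select_novel_goal, replay_buffer,
                        max_goal_steps=200, random_steps=50):
    """One episode of novelty-pursuit-style exploration (illustrative only)."""
    # Stage 1: pick a seldom-visited state as the goal and let the
    # goal-conditioned policy drive towards the boundary of the explored region.
    goal = select_novel_goal(replay_buffer.visited_states())
    state = env.reset()

    for _ in range(max_goal_steps):
        action = policy.act(state, goal)
        next_state, reward, done, info = env.step(action)
        replay_buffer.add(state, action, reward, next_state, done)
        state = next_state
        if done:
            return

    # Stage 2: from the boundary, take random actions to push into
    # the unexplored region.
    for _ in range(random_steps):
        action = env.action_space.sample()
        next_state, reward, done, info = env.step(action)
        replay_buffer.add(state, action, reward, next_state, done)
        state = next_state
        if done:
            return
```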

As Figure 1 shows, the key idea of our method is to distinguish explored states from unexplored ones, and then to visit novel states with the help of the goal-conditioned policy.
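One simple way to distinguish explored from seldom-visited states, purely for illustration, is a count-based novelty score over discretized states. The `discretize` helper and the scoring rule below are assumptions for this sketch; the actual novelty measure used in the paper may differ (e.g., a learned estimator).

```python
from collections import Counter

visit_counts = Counter()

def discretize(state, precision=1):
    """Coarsely bucket a continuous state into a grid cell (hypothetical choice)."""
    return tuple(round(float(x), precision) for x in state)

def update_counts(state):
    """Record one visit to the cell containing `state`."""
    visit_counts[discretize(state)] += 1

def select_novel_goal(candidate_states):
    """Pick the candidate whose cell has been visited least often."""
    return min(candidate_states, key=lambda s: visit_counts[discretize(s)])
```

In the episode sketch above, `select_novel_goal` would receive the states stored in the replay buffer as candidates.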

We test our method on several hard-exploration tasks such as Maze, MuJoCo robotic control, and SuperMarioBros video games. For instance, on the SuperMarioBros-1-3 task, our method outperforms the vanilla method and the method based on a reward bonus; see Figure 2.

Figure 2. Trajectory visualization on SuperMarioBros-1-3. Trajectories are plotted as green circles, using the same number of training samples. The agent starts from the leftmost part and needs to fetch the flag at the rightmost part. Top row: vanilla method; middle row: vanilla method + reward bonus; bottom row: novelty pursuit (ours).

See the following video for an illustration of the policies learned by our method on SuperMarioBros tasks.

[1] Ziniu Li and Xiong-Hui Chen. "Efficient Exploration by Novelty-Pursuit." DAI 2020.

[2] Sébastien Forestier et al. "Intrinsically motivated goal exploration processes with automatic curriculum learning." arXiv:1708.02190 (2017).

[3] Elad Hazan et al. "Provably efficient maximum entropy exploration." ICML 2019.