Li Zhang, 18 February 2020
Most existing deep reinforcement learning (DRL) approaches use a deep neural network to approximate the value function through trial-and-error learning, but do not sufficiently address explicit planning computation of the kind used in conventional model-based approaches. We propose Universal Value Iteration Networks (UVIN), which combine model-free learning and model-based planning in a common RL setting to improve long-term reasoning and inference.
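To make "explicit planning computation" concrete, here is a minimal sketch of classical tabular value iteration on a small grid maze. It is not the UVIN architecture, only the conventional model-based planning step that such networks build on. The function name `value_iteration`, the grid encoding, the reward of 1 for entering the goal, and the discount factor of 0.9 are all assumptions chosen for illustration.

```python
def value_iteration(walls, goal, gamma=0.9, iters=50):
    """Tabular value iteration on a 2D grid with deterministic 4-neighbour moves.

    walls: 2D list of booleans, True where a cell is blocked (hypothetical encoding).
    goal:  (row, col) of the terminal goal cell; entering it yields reward 1.
    """
    height, width = len(walls), len(walls[0])
    values = [[0.0] * width for _ in range(height)]
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    for _ in range(iters):
        new_values = [[0.0] * width for _ in range(height)]
        for r in range(height):
            for c in range(width):
                # Walls and the terminal goal keep a value of 0.
                if walls[r][c] or (r, c) == goal:
                    continue
                best = float("-inf")
                for dr, dc in moves:
                    nr, nc = r + dr, c + dc
                    blocked = not (0 <= nr < height and 0 <= nc < width) or walls[nr][nc]
                    if blocked:
                        nr, nc = r, c  # bumping into a wall or edge: stay in place
                    reward = 1.0 if (nr, nc) == goal else 0.0
                    # Bellman backup: best one-step reward plus discounted next value.
                    best = max(best, reward + gamma * values[nr][nc])
                new_values[r][c] = best
        values = new_values
    return values


# Usage: an open 3x3 maze with the goal in the bottom-right corner.
open_maze = [[False] * 3 for _ in range(3)]
maze_values = value_iteration(open_maze, goal=(2, 2))
```

In this sketch the value of a cell decays with its shortest-path distance to the goal, so an agent that greedily follows increasing values reaches the goal along a shortest path.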
Figures: Performance on Minecraft; Spatially-variant maze.
Minecraft is a popular sandbox video game in which players explore, gather, and craft in a 3D world. To collect the desired items in the inventory, players must plan whether to search for a new item or synthesize it, and how to do so. Minecraft is therefore a typical problem that requires long-term reasoning. UVIN significantly outperforms other state-of-the-art approaches (GPPN, GVIN, VIN, Rainbow) in Minecraft and in several maze-navigation variants that we introduce. UVIN has been accepted as a conference paper at AAAI-2020 for oral presentation.