DRL agent in Rabbids: 100% win rate against human players

Key people: Li Zhang

In a collaboration between our lab and Ubisoft, we designed a novel deep reinforcement learning (DRL) framework and used it to train agents in Rabbids: Journey To The West, a party game in Ubisoft's well-known Rabbids series.

To test our agents, we organized competitions in which teams of three human players collaborated against a single agent trained with our approach. The event attracted many participants, but no team beat our agent, giving it a 100% win rate.

UVIN: A Universal DRL Method to Combine Planning and Learning

Key people: Li Zhang

Most existing DRL approaches use deep neural networks to approximate the value function through trial-and-error learning, but they do not adequately address the explicit planning computation performed by conventional model-based approaches. We proposed Universal Value Iteration Networks (UVIN), which combine model-free learning and model-based planning in the general RL setting to improve long-term reasoning and inference.
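The planning computation that value-iteration-network models embed (VIN, one of the baselines UVIN is compared against, is the original example) is value iteration. As a hedged illustration of that planning step only, here is plain tabular value iteration on a hypothetical grid maze; it is not the UVIN architecture, and the maze encoding, rewards, and discount are assumptions chosen for readability.

```python
from typing import List, Tuple

# Hypothetical maze encoding: '#' = wall, 'G' = goal (terminal), '.' = free cell.
Grid = List[str]
MOVES: List[Tuple[int, int]] = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def next_cell(grid: Grid, r: int, c: int, move: Tuple[int, int]) -> Tuple[int, int]:
    """Deterministic transition: moving into a wall or off the grid leaves the agent in place."""
    nr, nc = r + move[0], c + move[1]
    if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != '#':
        return nr, nc
    return r, c


def value_iteration(grid: Grid, gamma: float = 0.9, step_reward: float = -1.0,
                    goal_value: float = 10.0, iters: int = 50) -> List[List[float]]:
    """Synchronous Bellman backups: V(s) <- max_a [step_reward + gamma * V(s')]."""
    rows, cols = len(grid), len(grid[0])
    v = [[0.0] * cols for _ in range(rows)]
    for _ in range(iters):
        new_v = [[0.0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == '#':
                    continue  # walls are never occupied; their value stays 0
                if grid[r][c] == 'G':
                    new_v[r][c] = goal_value  # terminal state with a fixed value
                    continue
                # Max over actions of the one-step lookahead value.
                new_v[r][c] = max(step_reward + gamma * v[nr][nc]
                                  for nr, nc in (next_cell(grid, r, c, m) for m in MOVES))
        v = new_v
    return v


def greedy_move(grid: Grid, v: List[List[float]], r: int, c: int) -> Tuple[int, int]:
    """Plan one step: pick the move whose successor cell has the highest value."""
    def successor_value(m: Tuple[int, int]) -> float:
        nr, nc = next_cell(grid, r, c, m)
        return v[nr][nc]
    return max(MOVES, key=successor_value)


maze = ["..G",
        ".#.",
        "..."]
values = value_iteration(maze)
print(greedy_move(maze, values, 0, 0))  # (0, 1): step right, toward the goal
```

In VIN-style models this backup is implemented with convolutions and a max over action channels so it can be trained end to end; the explicit loops here only show what is being computed.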

Figures: Performance on Minecraft; Spatially-variant maze

Minecraft is a popular sandbox video game that allows players to explore, gather, and craft in a 3D world. To collect desired items for their inventory, players must plan whether and how to search for or synthesize each new item, which makes Minecraft a typical problem requiring long-term reasoning. UVIN significantly outperforms state-of-the-art approaches (GPPN, GVIN, VIN, and Rainbow) on Minecraft and on several maze-navigation variants we introduced. The UVIN paper was accepted at AAAI-2020 for oral presentation.

ADRQN: On Improving Deep Reinforcement Learning for POMDPs

Key people: Pengfei Zhu

Most Deep Reinforcement Learning (DRL) methods focus on Markov Decision Processes (MDPs). The Partially Observable Markov Decision Process (POMDP) extends the MDP and naturally models planning tasks with uncertain action effects and partial state observability; however, finding an optimal POMDP policy is notoriously difficult. Inspired by the belief-state update based on Bayes' theorem, we proposed the Action-specific Deep Recurrent Q-Network (ADRQN) to improve DRL in POMDPs.
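For reference, the Bayesian belief update that motivates ADRQN is b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) b(s): the new belief depends on both the last action a and the new observation o. The sketch below implements this update for a small discrete POMDP whose transition model T and observation model O are assumed known; this is an illustration of the update, not ADRQN, which learns from histories with a recurrent network instead of using given models. The model layout and the two-state, tiger-style example are hypothetical.

```python
from typing import List

# Hypothetical discrete POMDP models, indexed as:
#   T[a][s][s_next] = P(s_next | s, a)   (transition model)
#   O[a][s_next][o] = P(o | s_next, a)   (observation model)
Matrix3 = List[List[List[float]]]


def belief_update(belief: List[float], action: int, obs: int,
                  T: Matrix3, O: Matrix3) -> List[float]:
    """Bayes filter: b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    n = len(belief)
    # Prediction step: push the belief through the transition model for the chosen action.
    predicted = [sum(T[action][s][s_next] * belief[s] for s in range(n))
                 for s_next in range(n)]
    # Correction step: weight each successor state by the likelihood of the observation.
    unnormalized = [O[action][s_next][obs] * predicted[s_next] for s_next in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return [p / total for p in unnormalized]


# Two hidden states; a single "listen" action (0) that does not change the state
# and reports the true state with probability 0.85.
T_listen: Matrix3 = [[[1.0, 0.0], [0.0, 1.0]]]
O_listen: Matrix3 = [[[0.85, 0.15], [0.15, 0.85]]]

b = belief_update([0.5, 0.5], 0, 0, T_listen, O_listen)  # ≈ [0.85, 0.15]
b = belief_update(b, 0, 0, T_listen, O_listen)           # ≈ [0.97, 0.03]
```

Because the update needs the action as well as the observation, a recurrent network that approximates it plausibly benefits from seeing both, which is the intuition behind making the network action-specific.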

Figures: Performance on Atari; Performance on Doom

The Atari 2600 is a classic video game platform whose games serve as benchmark tasks in much DRL research. We evaluated ADRQN on a flickering version of Atari, in which the entire screen is obscured with a fixed probability at each time step. We also evaluated our model on the 3D video game Doom. ADRQN outperforms the baseline methods (DQN, DRQN, and DDRQN) on both flickering Atari and Doom. ADRQN also appears to be the most competitive baseline in follow-up work by others, and the ADRQN paper has received about 30 citations so far.
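The flickering setting can be sketched as an observation transform applied independently at each time step. Assumptions for illustration: frames are 2D lists of pixel intensities, "obscured" means every pixel is set to zero, and the default p = 0.5 is a placeholder rather than the exact probability used in our experiments. The function names are hypothetical.

```python
import random
from typing import List

Frame = List[List[int]]  # hypothetical grayscale frame: rows of pixel intensities


def flicker(frame: Frame, p: float, rng: random.Random) -> Frame:
    """With probability p, return a fully obscured (all-zero) frame; otherwise the frame unchanged."""
    if rng.random() < p:
        return [[0] * len(row) for row in frame]
    return frame


def flickering_episode(frames: List[Frame], p: float = 0.5, seed: int = 0) -> List[Frame]:
    """Apply the flicker independently at every time step of an episode."""
    rng = random.Random(seed)  # seeded for reproducible experiments
    return [flicker(f, p, rng) for f in frames]


episode = [[[10, 20], [30, 40]]] * 8
observed = flickering_episode(episode, p=0.5, seed=0)
blanked = sum(1 for f in observed if all(px == 0 for row in f for px in row))
print(f"{blanked} of {len(observed)} frames obscured")
```

A blanked frame carries no information about the game state, so an agent can only act well by remembering earlier frames, for example through a recurrent network.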

Inverse Reinforcement Learning

Key people: Jie Huang

Inverse Reinforcement Learning (IRL) is mainly used for complex tasks whose reward function is difficult to specify by hand. IRL learns a reward function from expert demonstrations; the learned reward can be understood as an explanation of the expert's policy. When policies must be learned from samples of optimal trajectories, IRL can be combined with deep learning to improve both the accuracy of the learned reward function and the quality of the resulting policy.
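To make "learning the reward function from expert demonstrations" concrete, here is a minimal sketch of one classic IRL formulation, not necessarily the one used in our work. It assumes the reward is linear in hand-chosen state features, r(s) = w · φ(s), and nudges w so that features the expert visits more often than the learner receive higher reward; the step direction μ_E − μ_π matches the maximum-entropy IRL gradient for linear rewards. The features, trajectories, and function names are hypothetical, and in a full IRL loop μ_π would be recomputed by solving the forward RL problem under each new reward.

```python
from typing import List, Sequence

Features = Sequence[float]       # phi(s): hypothetical state feature vector
Trajectory = Sequence[Features]  # features of the states visited, in order


def feature_expectations(trajs: Sequence[Trajectory], gamma: float) -> List[float]:
    """Empirical discounted feature expectations: mean over trajectories of sum_t gamma^t * phi(s_t)."""
    dim = len(trajs[0][0])
    mu = [0.0] * dim
    for traj in trajs:
        for t, phi in enumerate(traj):
            for k in range(dim):
                mu[k] += (gamma ** t) * phi[k]
    return [m / len(trajs) for m in mu]


def update_reward_weights(w: List[float], mu_expert: List[float],
                          mu_learner: List[float], lr: float) -> List[float]:
    """One gradient step: raise the weight of features the expert visits more than the learner."""
    return [wi + lr * (me - ml) for wi, me, ml in zip(w, mu_expert, mu_learner)]


def reward(w: Sequence[float], phi: Features) -> float:
    """Linear reward r(s) = w . phi(s)."""
    return sum(wi * fi for wi, fi in zip(w, phi))


# Hypothetical features: [at_goal, in_hazard].
expert_trajs = [[[0.0, 0.0], [1.0, 0.0]]]   # expert reaches the goal on step 2
learner_trajs = [[[0.0, 1.0], [0.0, 1.0]]]  # current learner wanders into a hazard
mu_e = feature_expectations(expert_trajs, gamma=0.9)   # ≈ [0.9, 0.0]
mu_l = feature_expectations(learner_trajs, gamma=0.9)  # ≈ [0.0, 1.9]
w = update_reward_weights([0.0, 0.0], mu_e, mu_l, lr=1.0)
print(reward(w, [1.0, 0.0]) > reward(w, [0.0, 1.0]))  # True: goal now beats hazard
```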

Our work aims to use inverse reinforcement learning or imitation learning algorithms to solve such complex tasks and to transfer the policies learned in simulation to real robots. The robot (agent) is expected to learn from expert demonstrations in the training set and to match or exceed expert performance.
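Of the two families mentioned, imitation learning is the simpler to illustrate. Its most basic form, behavior cloning, treats expert demonstrations as supervised (state, action) pairs. The sketch below assumes discrete, hashable states and actions and uses a majority-vote lookup table; it is a generic baseline with a hypothetical function name, not our method, and real robot tasks with continuous sensor input would instead fit a function approximator such as a neural network.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple

Demo = List[Tuple[Hashable, Hashable]]  # one expert trajectory as (state, action) pairs


def behavior_cloning(demos: List[Demo]) -> Dict[Hashable, Hashable]:
    """Tabular behavior cloning: map each demonstrated state to the expert's most frequent action."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for demo in demos:
        for state, action in demo:
            counts[state][action] += 1
    # most_common(1) returns [(action, count)]; ties go to the action seen first.
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


demos = [
    [("start", "right"), ("mid", "right")],
    [("start", "right"), ("mid", "up")],
    [("mid", "right")],
]
policy = behavior_cloning(demos)
print(policy)  # {'start': 'right', 'mid': 'right'}
```

States that never appear in the demonstrations get no entry, a small-scale view of why pure behavior cloning can degrade once the robot drifts away from the states the expert visited.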


Figures: Robot Dog; Simulation environment