Start Date

13-5-2021 2:10 PM

End Date

13-5-2021 2:30 PM

Document Type

Full Paper

Keywords

Reinforcement learning (RL), proximal policy optimization (PPO), advantage actor-critic (A2C), MiniGrid navigation, neural networks

Description

This paper presents extended experiments on agent-environment interactions for two recent reinforcement learning (RL) algorithms, Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C). In addition to evaluating accumulated rewards in the MiniGrid environment, this work also assesses the learned value tables and other quantities. The experiment platform is extended with mazes of arbitrary shape, a customized training scheme, and results averaged over different initial seeds. The key RL formulas and hyperparameters are discussed explicitly in the context of the MiniGrid settings. Based on the obtained results, the conclusion summarizes the strengths and weaknesses of neural-network implementations of RL methods. This is an undergraduate project from summer 2020.
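
For illustration, the sketch below shows one way such a PPO/A2C experiment on MiniGrid could be set up. It assumes the gym-minigrid environment suite and the Stable-Baselines3 implementations of PPO and A2C; the environment name, hyperparameters, and seed loop are illustrative placeholders and not necessarily the configuration used in the paper.

```python
# Illustrative sketch only: assumes gym-minigrid and stable-baselines3,
# not the paper's actual codebase or hyperparameters.
import gym
import gym_minigrid  # noqa: F401  (registers the MiniGrid-* environments)
from gym_minigrid.wrappers import FlatObsWrapper
from stable_baselines3 import PPO, A2C


def train(algo_cls, env_id="MiniGrid-Empty-8x8-v0", seed=0, timesteps=100_000):
    # Flatten the dict observation so an MLP policy can consume it.
    env = FlatObsWrapper(gym.make(env_id))
    model = algo_cls(
        "MlpPolicy",
        env,
        learning_rate=3e-4,  # placeholder hyperparameters
        gamma=0.99,
        seed=seed,
        verbose=0,
    )
    model.learn(total_timesteps=timesteps)
    return model


# Run both algorithms and average results over several initial seeds,
# mirroring the seed-averaging described in the abstract.
for algo in (PPO, A2C):
    for seed in (0, 1, 2):
        train(algo, seed=seed)
```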

DOI

https://doi.org/10.5038/LZTZ6050


Experimental Evaluation of Proximal Policy Optimization and Advantage Actor-Critic RL Algorithms using MiniGrid Environment
