Start Date
13-5-2021 2:10 PM
End Date
13-5-2021 2:30 PM
Document Type
Full Paper
Keywords
Reinforcement learning (RL), proximal policy optimization (PPO), advantage actor-critic (A2C), MiniGrid navigation, neural networks
Description
This paper presents extended experiments on agent-environment interaction for two recent reinforcement learning (RL) algorithms: Proximal Policy Optimization (PPO) and Advantage Actor-Critic (A2C). Beyond evaluating accumulated rewards in the MiniGrid environment, the work also assesses the learned value tables, among other quantities. The experiment platform is extended with arbitrarily shaped mazes, a customized training scheme, and averaging of results over different initial seeds. The key RL formulas and hyperparameters are discussed explicitly in terms of the MiniGrid settings. Based on the results obtained, the conclusion summarizes the strengths and weaknesses of neural network implementations of RL methods. This is an undergraduate project from summer 2020.
DOI
https://doi.org/10.5038/LZTZ6050
Experimental Evaluation of Proximal Policy Optimization and Advantage Actor-Critic RL Algorithms using MiniGrid Environment