Adaptive Reinforcement Learning through Evolving Self-Modifying Neural Networks

Created on 2024-10-07T17:01:16-05:00

The paper adds plasticity controls to the network's neurons.

Compares the results of Proximal Policy Optimization (PPO) and an evolution strategies (ES) system, both with added plasticity parameters, so that various parts of the network adjust their learning rates during training.
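To make the idea of plasticity parameters concrete, here is a minimal sketch. It assumes a Hebbian-style rule in which each connection has a fixed weight plus a plastic component scaled by its own plasticity coefficient. These notes do not record the paper's exact rule, so `PlasticLayer`, `alpha`, `hebb`, and `eta` are hypothetical names for illustration only.

```python
import numpy as np

class PlasticLayer:
    """Toy layer with per-connection plasticity (illustrative assumption,
    not the paper's confirmed formulation)."""

    def __init__(self, n_in, n_out, eta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed weights: changed only by the outer optimizer (PPO or ES).
        self.w = rng.normal(0.0, 0.1, size=(n_out, n_in))
        # Plasticity coefficients: how strongly each connection can change
        # within an episode. Also set by the outer optimizer.
        self.alpha = rng.normal(0.0, 0.1, size=(n_out, n_in))
        # Hebbian trace: updated on every forward pass, reset per episode.
        self.hebb = np.zeros((n_out, n_in))
        self.eta = eta  # trace update rate (assumed scalar here)

    def reset(self):
        self.hebb[:] = 0.0

    def forward(self, x):
        # Effective weight = fixed part + plastic part.
        y = np.tanh((self.w + self.alpha * self.hebb) @ x)
        # Running average of post/pre activity products.
        self.hebb = (1.0 - self.eta) * self.hebb + self.eta * np.outer(y, x)
        return y
```

In this sketch a connection with `alpha` near zero behaves like an ordinary fixed weight, while a large `alpha` lets that connection adapt quickly during an episode, which is one way different parts of a network could end up with different effective learning rates.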

Total training time averaged around 214.8 minutes for the self-modifying OpenAI-ES and 968.8 minutes for the self-modifying PPO (roughly 4.5 times longer), running on a standard 6-core CPU.

It seems to compare PPO training on a small network against alternating a step of evolutionary perturbation ("scrambling") with a step of backprop.
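For reference, the evolutionary step can be sketched as a generic OpenAI-ES style update: perturb the parameters with Gaussian noise, score each perturbed copy, and move the parameters toward the better-scoring perturbations. This is a textbook version under stated assumptions, not the paper's code. The function name `es_step`, its hyperparameters, and the toy fitness function are all hypothetical, and the backprop step mentioned above is not modeled.

```python
import numpy as np

def es_step(theta, fitness_fn, rng, pop_size=50, sigma=0.1, lr=0.02):
    """One evolution-strategies update on a flat parameter vector."""
    # Sample one Gaussian perturbation per population member.
    eps = rng.standard_normal((pop_size, theta.size))
    # Evaluate each perturbed parameter vector.
    rewards = np.array([fitness_fn(theta + sigma * e) for e in eps])
    # Standardize rewards so the step size does not depend on reward scale.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Reward-weighted sum of perturbations approximates a gradient.
    grad = eps.T @ adv / (pop_size * sigma)
    return theta + lr * grad

def toy_fitness(x, target=np.array([1.0, -2.0, 0.5])):
    # Stand-in for an episode return: higher is better, peak at `target`.
    return -np.sum((x - target) ** 2)
```

Calling `es_step` repeatedly with `toy_fitness` and a `np.random.default_rng` generator should move `theta` toward the target. In a plastic network, `theta` would hold both the fixed weights and the plasticity coefficients.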