Adaptive Reinforcement Learning through Evolving Self-Modifying Neural Networks
Created on 2024-10-07T17:01:16-05:00
Adds plasticity controls to the network's neurons.
Compares Proximal Policy Optimization (PPO) with an evolution strategy (ES). In both cases plasticity parameters are added, so different parts of the network adjust their own learning rates during training.
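One way to picture "different parts of the network adjust their own learning rates" is a layer in which each output neuron has its own learning-rate parameter. The sketch below is an illustrative assumption, not the paper's plasticity rule. Each row of weights (one neuron) is updated at rate exp(log_rates[i]), which keeps every rate positive. How the log-rates themselves are learned, for example by being evolved alongside the weights, is not shown. The names `plastic_update` and `log_rates` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def plastic_update(weights: Matrix, grads: Matrix, log_rates: List[float]) -> Matrix:
    """Gradient step in which each output neuron (row) uses its own learning rate.

    log_rates[i] is a hypothetical per-neuron plasticity parameter. The rate
    actually applied is exp(log_rates[i]), so it is always positive.
    """
    if not (len(weights) == len(grads) == len(log_rates)):
        raise ValueError("weights, grads and log_rates need one entry per neuron")
    updated = []
    for row_w, row_g, log_lr in zip(weights, grads, log_rates):
        lr = math.exp(log_lr)  # per-neuron learning rate
        updated.append([w - lr * g for w, g in zip(row_w, row_g)])
    return updated


W = [[1.0, 2.0], [3.0, 4.0]]
G = [[1.0, 1.0], [1.0, 1.0]]
# Neuron 0 is fully plastic (rate e^0 = 1); neuron 1 is almost frozen (rate e^-10).
print(plastic_update(W, G, [0.0, -10.0]))
```

With these rates, the first neuron's weights move by a full gradient step and the second neuron's weights barely change. This is the kind of per-part modulation the plasticity parameters are described as providing.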
On a standard 6-core CPU, total training time averaged about 214.8 minutes for the self-modifying OpenAI-ES and 968.8 minutes for the self-modifying PPO.
It seems to compare PPO training of a small network against alternating an evolutionary perturbation step with a backpropagation step.
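The alternating scheme as read above (an evolutionary step followed by a gradient step) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. A toy quadratic loss with an analytic gradient stands in for backpropagation through a policy network. `es_step` is an OpenAI-ES-style estimator that uses Gaussian perturbations with mirrored sampling but omits rank-based fitness shaping. All names (`es_step`, `gradient_step`, `hybrid_train`, `TARGET`) are hypothetical.

```python
import random
from typing import Callable, List

Vector = List[float]


def es_step(theta: Vector, fitness: Callable[[Vector], float], sigma: float,
            lr: float, pop: int, rng: random.Random) -> Vector:
    """One ES update: estimate the fitness gradient from mirrored Gaussian perturbations."""
    grad = [0.0] * len(theta)
    for _ in range(pop):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        f_plus = fitness([t + sigma * e for t, e in zip(theta, eps)])
        f_minus = fitness([t - sigma * e for t, e in zip(theta, eps)])
        for j, e in enumerate(eps):
            grad[j] += (f_plus - f_minus) * e
    scale = lr / (2 * pop * sigma)
    return [t + scale * g for t, g in zip(theta, grad)]  # ascend fitness


def gradient_step(theta: Vector, grad_fn: Callable[[Vector], Vector], lr: float) -> Vector:
    """Plain gradient descent step; grad_fn stands in for backprop."""
    return [t - lr * g for t, g in zip(theta, grad_fn(theta))]


TARGET = [1.0, -2.0, 0.5]  # hypothetical optimum of the toy problem


def loss(theta: Vector) -> float:
    return sum((t - y) ** 2 for t, y in zip(theta, TARGET))


def loss_grad(theta: Vector) -> Vector:
    return [2.0 * (t - y) for t, y in zip(theta, TARGET)]


def hybrid_train(steps: int = 100, seed: int = 0) -> Vector:
    """Alternate one ES step (fitness = -loss) with one gradient step."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]
    for _ in range(steps):
        theta = es_step(theta, lambda th: -loss(th), sigma=0.1, lr=0.05, pop=50, rng=rng)
        theta = gradient_step(theta, loss_grad, lr=0.1)
    return theta


theta = hybrid_train()
print(f"final loss: {loss(theta):.2e}")
```

The ES step needs only fitness evaluations (2 × `pop` per update), while the gradient step needs a differentiable loss. The hybrid therefore pays for both on every iteration, which is one reason wall-clock costs like those quoted above can differ so much between methods.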