Adaptive Reinforcement Learning through Evolving Self-Modifying Neural Networks

Created on 2024-10-07T17:01:16-05:00

This card pertains to a resource available on the internet.

The paper adds plasticity controls to the network's neurons.

Compares the results of Proximal Policy Optimization (PPO) and an evolution strategy (ES) system. In both cases plasticity parameters are added, so that various parts of the network adjust their own learning rates during training.
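
To make "plasticity parameters" concrete for myself, here is a minimal sketch of one common formulation: each connection has a fixed weight plus a Hebbian trace scaled by its own plasticity coefficient. This is an assumption on my part; the card does not record which rule the paper actually uses, and the names plastic_step, alpha, hebb, and eta are hypothetical.

```python
import math

def plastic_step(x, w, alpha, hebb, eta=0.1):
    """One forward pass through a small plastic layer (illustrative only).

    x     : list of inputs, length n_in
    w     : fixed weights, n_out x n_in nested lists
    alpha : per-connection plasticity coefficients (assumed form)
    hebb  : running Hebbian trace, same shape as w
    eta   : trace update rate (assumed to be a single shared value)
    """
    n_in = len(x)
    y = []
    for j in range(len(w)):
        # Effective weight = fixed part + plastic part.
        total = sum((w[j][i] + alpha[j][i] * hebb[j][i]) * x[i]
                    for i in range(n_in))
        y.append(math.tanh(total))
    # Hebbian trace: decaying average of pre * post activity.
    new_hebb = [[(1 - eta) * hebb[j][i] + eta * x[i] * y[j]
                 for i in range(n_in)]
                for j in range(len(w))]
    return y, new_hebb

# Same input twice: the second output differs because the trace has changed.
x = [1.0, 0.5]
w = [[0.2, -0.1]]
alpha = [[0.5, 0.5]]
hebb = [[0.0, 0.0]]
y1, hebb = plastic_step(x, w, alpha, hebb)
y2, hebb = plastic_step(x, w, alpha, hebb)
```

With alpha set to zero the layer behaves like an ordinary fixed-weight layer, which is why the coefficients act as a per-connection control on how much the network rewires itself.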

Total training time averaged around 214.8 minutes for the self-modifying OpenAI-ES and 968.8 minutes for the self-modifying PPO, running on a standard 6-core CPU. That makes the PPO run roughly 4.5 times as long.

Seems to compare PPO training on a small network against an alternating scheme: a step of evolutionary scrambling (random perturbation of the parameters) followed by a step of backprop.
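
For reference, the "evolution scrambling" half can be sketched as a generic evolution-strategy update with mirrored perturbations. This is a toy under stated assumptions, not the paper's setup: the quadratic fitness function, the target, and all hyperparameters (n_pairs, sigma, lr) are made up for illustration, and the backprop half is not shown.

```python
import random

def fitness(theta, target=(1.0, -2.0)):
    # Toy objective (hypothetical): higher is better, maximum at the target.
    return -sum((t - g) ** 2 for t, g in zip(theta, target))

def es_step(theta, rng, n_pairs=20, sigma=0.1, lr=0.05):
    """One ES update using mirrored (antithetic) Gaussian perturbations."""
    grad = [0.0] * len(theta)
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = [t + sigma * e for t, e in zip(theta, eps)]
        minus = [t - sigma * e for t, e in zip(theta, eps)]
        # Fitness difference weights the perturbation direction.
        diff = fitness(plus) - fitness(minus)
        for i, e in enumerate(eps):
            grad[i] += diff * e
    scale = lr / (2 * n_pairs * sigma)
    return [t + scale * g for t, g in zip(theta, grad)]

rng = random.Random(0)
theta = [0.0, 0.0]
start = fitness(theta)
for _ in range(100):
    theta = es_step(theta, rng)
end = fitness(theta)
```

No gradients are backpropagated here: the update direction comes only from comparing the fitness of perturbed copies, which is what makes ES easy to spread across CPU cores.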