| ✅ Proximal Policy Optimization (PPO) | ppo.py, docs |
| | ppo_atari.py, docs |
| | ppo_continuous_action.py, docs |
| | ppo_atari_lstm.py, docs |
| | ppo_atari_envpool.py, docs |
| | ppo_procgen.py, docs |
| ✅ Deep Q-Learning (DQN) | dqn.py, docs |
| | dqn_atari.py, docs |
| ✅ Categorical DQN (C51) | c51.py, docs |
| | c51_atari.py, docs |
| ✅ Soft Actor-Critic (SAC) | sac_continuous_action.py, docs |
| ✅ Deep Deterministic Policy Gradient (DDPG) | ddpg_continuous_action.py, docs |
| ✅ Twin Delayed Deep Deterministic Policy Gradient (TD3) | td3_continuous_action.py, docs |
| ✅ Phasic Policy Gradient (PPG) | ppg_procgen.py, docs |