RL algorithms

The implemented algorithms are listed below with brief descriptions.

  • [x] Deep Q-Learning (DQN)
    • dqn.py
      • For discrete action spaces.
    • dqn_atari.py
      • For playing Atari games. It uses convolutional layers and common Atari-specific preprocessing techniques.
  • [x] Categorical DQN (C51)
    • c51.py
      • For discrete action spaces.
    • c51_atari.py
      • For playing Atari games. It uses convolutional layers and common Atari-specific preprocessing techniques.
    • c51_atari_visual.py
      • Adds return and Q-value visualization for dqn_atari.py.
  • [x] Proximal Policy Optimization (PPO)
    • All of the PPO implementations below include several code-level optimizations. See https://costa.sh/blog-the-32-implementation-details-of-ppo.html for details.
    • ppo.py
      • For discrete action spaces.
    • ppo_continuous_action.py
      • For continuous action spaces. Also implements MuJoCo-specific code-level optimizations.
    • ppo_atari.py
      • For playing Atari games. It uses convolutional layers and common Atari-specific preprocessing techniques.
  • [x] Soft Actor-Critic (SAC)
  • [x] Deep Deterministic Policy Gradient (DDPG)
  • [x] Twin Delayed Deep Deterministic Policy Gradient (TD3)
  • [x] Phasic Policy Gradient (PPG)
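
Several entries above are split by action space (for example ppo.py for discrete actions and ppo_continuous_action.py for continuous ones). The sketch below illustrates what that split typically means for action sampling. It is not taken from the listed scripts: it uses only the Python standard library, and the function names (`softmax`, `sample_discrete`, `sample_continuous`) and the example logits, means, and log standard deviations are hypothetical. It assumes a categorical policy for discrete actions and an independent Gaussian per dimension for continuous actions.

```python
import math
import random


def softmax(logits):
    """Convert unnormalized logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_discrete(logits, rng):
    """Discrete action space: pick one action index from a categorical distribution."""
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # The log-probability of the chosen action is what a policy-gradient loss uses.
    return action, math.log(probs[action])


def sample_continuous(means, log_stds, rng):
    """Continuous action space: draw each dimension from an independent Gaussian."""
    action, log_prob = [], 0.0
    for mu, log_std in zip(means, log_stds):
        std = math.exp(log_std)
        a = rng.gauss(mu, std)
        action.append(a)
        # Log-density of a univariate Gaussian, summed over action dimensions.
        log_prob += -0.5 * ((a - mu) / std) ** 2 - log_std - 0.5 * math.log(2 * math.pi)
    return action, log_prob


# Hypothetical policy outputs for one state (illustrative values only).
rng = random.Random(0)
discrete_action, discrete_logp = sample_discrete([2.0, 0.5, -1.0], rng)
continuous_action, continuous_logp = sample_continuous([0.0, 1.0], [-0.5, -0.5], rng)
print("discrete:", discrete_action, discrete_logp)
print("continuous:", continuous_action, continuous_logp)
```

In a real agent the logits, means, and log standard deviations would come from a neural network. The discrete case returns a single integer index, while the continuous case returns one real value per action dimension.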