Release v1.0 #69

Merged
merged 31 commits on Mar 11, 2021
31 commits
ac890a2
Update sac hyperparams
araffin Mar 1, 2021
d033c0a
Bump version
araffin Mar 1, 2021
fa335df
Move real robot hyperparams
araffin Mar 1, 2021
27edfcb
Update HER params
araffin Mar 2, 2021
1e0c8bd
Fix for HER action noise
araffin Mar 2, 2021
84bd43c
Update benchmark file
araffin Mar 2, 2021
1d3b8a7
Update formatting
araffin Mar 2, 2021
599ba3f
Catch errors when benchmarking
araffin Mar 3, 2021
cb509ac
Use subprocess only if needed
araffin Mar 3, 2021
df76707
Change default number of threads for bench
araffin Mar 3, 2021
994cc30
Add pre-trained agents
araffin Mar 4, 2021
6172d4a
Catch keyboard interrupt for enjoy
araffin Mar 5, 2021
4a912b4
Update benchmark
araffin Mar 5, 2021
2395593
Update README and changelog
araffin Mar 5, 2021
8a41755
Merge branch 'feat/release-v1.0rc0' of github.com:DLR-RM/rl-baselines…
araffin Mar 5, 2021
1b9611c
Tuned DDPG hyperparam
araffin Mar 5, 2021
ef04597
Update TD3 hyperparams
araffin Mar 5, 2021
597a304
Minor edit
araffin Mar 5, 2021
4c2acb1
Add Reacher
araffin Mar 5, 2021
164c2d1
Update table
araffin Mar 5, 2021
7441e47
Ugrade SB3
araffin Mar 6, 2021
4cadd46
Add support for loading saved models with python 3.8
araffin Mar 6, 2021
52693c6
Upgrade SB3
araffin Mar 6, 2021
9debc19
Add BipedalWalkerHardcore
araffin Mar 8, 2021
190adf4
Merge branch 'feat/release-v1.0rc0' of github.com:DLR-RM/rl-baselines…
araffin Mar 8, 2021
d5f75ff
Changed pybullet version in CI
araffin Mar 8, 2021
a02f4fb
Add more Atari games
araffin Mar 9, 2021
ae953f4
Update README
araffin Mar 9, 2021
40f3b5b
Add benchmark files
araffin Mar 9, 2021
1c903c6
Add QR-DQN Enduro
araffin Mar 11, 2021
11f6266
Update README + bug fix for HER enjoy
araffin Mar 11, 2021
1 change: 0 additions & 1 deletion .dockerignore

This file was deleted.

18 changes: 18 additions & 0 deletions .dockerignore
@@ -0,0 +1,18 @@
__pycache__/
logs
.pytest_cache/
.coverage
.coverage.*
.idea/
cluster_sbatch.sh
cluster_sbatch_mpi.sh
cluster_torchy.sh
logs/
rl-trained_agents/
.pytype/
htmlcov/
git_rewrite_commit_history.sh
.vscode/
# ignore for docker builds
rl-trained-agents/
.git/
4 changes: 3 additions & 1 deletion .github/workflows/ci.yml
@@ -30,7 +30,9 @@ jobs:
run: |
python -m pip install --upgrade pip
# cpu version of pytorch - faster to download
pip install torch==1.7.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
pip install torch==1.8.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
# temp fix: use pybullet 3.0.8 (issue with numpy for 3.0.9)
pip install pybullet==3.0.8
pip install -r requirements.txt
# Use headless version
pip install opencv-python-headless
7 changes: 4 additions & 3 deletions .github/workflows/trained_agents.yml
Expand Up @@ -16,8 +16,7 @@ jobs:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: [3.6, 3.7] # 3.8 not supported yet due to cloudpickle errors

python-version: [3.6, 3.7, 3.8]
steps:
- uses: actions/checkout@v2
with:
@@ -30,7 +29,9 @@
run: |
python -m pip install --upgrade pip
# cpu version of pytorch - faster to download
pip install torch==1.7.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
pip install torch==1.8.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
# temp fix: use pybullet 3.0.8 (issue with numpy for 3.0.9)
pip install pybullet==3.0.8
pip install -r requirements.txt
# Use headless version
pip install opencv-python-headless
18 changes: 18 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,21 @@
## Release 1.0rc2 (WIP)

### Breaking Changes
- Upgrade to SB3 >= 1.0rc2

### New Features
- Added 90+ trained agents + benchmark file
- Add support for loading saved model under python 3.8+ (no retraining possible)

### Bug fixes
- Bug fixes for `HER` handling action noise
- Fixed double reset bug with `HER` and enjoy script

### Documentation

### Other
- Updated `HER` hyperparameters

## Pre-Release 0.11.1 (2021-02-27)

### Breaking Changes
161 changes: 89 additions & 72 deletions README.md
@@ -2,11 +2,11 @@



# RL Baselines3 Zoo: a Collection of Pre-Trained Reinforcement Learning Agents
# RL Baselines3 Zoo: A Training Framework for Stable Baselines3 Reinforcement Learning Agents

<!-- <img src="images/BipedalWalkerHardcorePPO.gif" align="right" width="35%"/> -->

A collection of trained Reinforcement Learning (RL) agents, with tuned hyperparameters, using [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3).
A Training Framework for Reinforcement Learning (RL), together with a collection of trained agents, with tuned hyperparameters, using [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3).

We are **looking for contributors** to complete the collection!

@@ -19,37 +19,6 @@ Goals of this repository:

This is the SB3 version of the original SB2 [rl-zoo](https://github.com/araffin/rl-baselines-zoo).

## Enjoy a Trained Agent

**Note: to download the repo with the trained agents, you must use `git clone --recursive https://github.com/DLR-RM/rl-baselines3-zoo`** in order to clone the submodule too.


If the trained agent exists, then you can see it in action using:
```
python enjoy.py --algo algo_name --env env_id
```

For example, enjoy A2C on Breakout during 5000 timesteps:
```
python enjoy.py --algo a2c --env BreakoutNoFrameskip-v4 --folder rl-trained-agents/ -n 5000
```

If you have trained an agent yourself, you need to do:
```
# exp-id 0 corresponds to the last experiment, otherwise, you can specify another ID
python enjoy.py --algo algo_name --env env_id -f logs/ --exp-id 0
```

To load the best model (when using evaluation environment):
```
python enjoy.py --algo algo_name --env env_id -f logs/ --exp-id 1 --load-best
```

To load a checkpoint (here the checkpoint name is `rl_model_10000_steps.zip`):
```
python enjoy.py --algo algo_name --env env_id -f logs/ --exp-id 1 --load-checkpoint 10000
```

## Train an Agent

The hyperparameters for each environment are defined in `hyperparams/algo_name.yml`.
@@ -85,6 +54,46 @@ python train.py --algo sac --env Pendulum-v0 --save-replay-buffer
```
It will be automatically loaded if present when continuing training.

## Plot Scripts

Plot scripts (to be documented, see "Results" sections in SB3 documentation):
- `scripts/all_plots.py`/`scripts/plot_from_file.py` for plotting evaluations
- `scripts/plot_train.py` for plotting training reward/success

## Custom Environment

The easiest way to add support for a custom environment is to edit `utils/import_envs.py` and register your environment here. Then, you need to add a section for it in the hyperparameters file (`hyperparams/algo.yml`).
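A minimal sketch of the kind of entry you would add to `utils/import_envs.py` (the guarded-import pattern is an assumption about how optional env packages are handled; `my_custom_envs` is a hypothetical package whose top-level import is assumed to register your environments with Gym as a side effect):

```python
# Hypothetical addition to utils/import_envs.py.
# `my_custom_envs` is a made-up package name: importing it is assumed
# to call gym's register() for your environments as a side effect.
try:
    import my_custom_envs  # noqa: F401  (import registers the envs)
except ImportError:
    my_custom_envs = None  # package not installed: the zoo still works
```

After that, add a section keyed by your env id in `hyperparams/algo.yml` so `train.py` can find its hyperparameters.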

## Enjoy a Trained Agent

**Note: to download the repo with the trained agents, you must use `git clone --recursive https://github.com/DLR-RM/rl-baselines3-zoo`** in order to clone the submodule too.


If the trained agent exists, then you can see it in action using:
```
python enjoy.py --algo algo_name --env env_id
```

For example, enjoy A2C on Breakout during 5000 timesteps:
```
python enjoy.py --algo a2c --env BreakoutNoFrameskip-v4 --folder rl-trained-agents/ -n 5000
```

If you have trained an agent yourself, you need to do:
```
# exp-id 0 corresponds to the last experiment, otherwise, you can specify another ID
python enjoy.py --algo algo_name --env env_id -f logs/ --exp-id 0
```

To load the best model (when using evaluation environment):
```
python enjoy.py --algo algo_name --env env_id -f logs/ --exp-id 1 --load-best
```

To load a checkpoint (here the checkpoint name is `rl_model_10000_steps.zip`):
```
python enjoy.py --algo algo_name --env env_id -f logs/ --exp-id 1 --load-checkpoint 10000
```
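For illustration, here is a hypothetical reconstruction of how `--exp-id 1 --load-checkpoint 10000` could be resolved into a file path. Only the `rl_model_<steps>_steps.zip` file name comes from the text above; the folder layout (`logs/<algo>/<env_id>_<exp_id>/`) is an assumption, not taken from the source.

```python
from pathlib import Path

def checkpoint_path(folder: str, algo: str, env_id: str,
                    exp_id: int, steps: int) -> Path:
    # Assumed layout: <folder>/<algo>/<env_id>_<exp_id>/rl_model_<steps>_steps.zip
    return Path(folder) / algo / f"{env_id}_{exp_id}" / f"rl_model_{steps}_steps.zip"

print(checkpoint_path("logs", "sac", "Pendulum-v0", 1, 10_000))
```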

## Hyperparameter yaml syntax

@@ -160,7 +169,7 @@ for multiple, specify a list:
env_wrapper:
- utils.wrappers.DoneOnSuccessWrapper:
reward_offset: 1.0
- utils.wrappers.TimeFeatureWrapper
- sb3_contrib.common.wrappers.TimeFeatureWrapper
```

Note that you can easily specify parameters too.
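Each wrapper entry maps to a class that receives the environment plus the listed keyword arguments. A gym-free sketch of a `DoneOnSuccessWrapper`-style class follows; the success/bonus logic is an assumption for illustration, not the zoo's exact implementation:

```python
class DoneOnSuccessWrapperSketch:
    """Illustrative stand-in for utils.wrappers.DoneOnSuccessWrapper:
    ends the episode on success and adds a reward bonus."""

    def __init__(self, env, reward_offset=1.0):
        # `reward_offset: 1.0` from the YAML above is forwarded here.
        self.env = env
        self.reward_offset = reward_offset

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if bool(info.get("is_success", False)):
            reward += self.reward_offset  # success bonus (assumed semantics)
            done = True                   # terminate on success
        return obs, reward, done, info
```

The keyword arguments under a wrapper's YAML key are simply passed to its `__init__`.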
@@ -181,6 +190,8 @@ You can easily overwrite hyperparameters in the command line, using ``--hyperpar
python train.py --algo a2c --env MountainCarContinuous-v0 --hyperparams learning_rate:0.001 policy_kwargs:"dict(net_arch=[64, 64])"
```

Note: if you want to pass a string, you need to escape it like this: `my_string:"'value'"`
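A plausible reason for the extra quotes (an assumption about how `train.py` parses each `key:value` override; `ast.literal_eval` behaves this way):

```python
from ast import literal_eval

assert literal_eval("0.001") == 0.001      # bare number parses as a float
assert literal_eval("'value'") == "value"  # "'value'" parses as the string 'value'

# A bare, unquoted word is not a valid Python literal:
try:
    literal_eval("value")
except ValueError:
    print("bare string rejected")
```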

## Record a Video of a Trained Agent

Record 1000 steps:
@@ -190,9 +201,9 @@ python -m utils.record_video --algo ppo --env BipedalWalkerHardcore-v2 -n 1000
```


## Current Collection: to be added soon (after v1.0 release)
## Current Collection: 100+ Trained Agents!

Final performance of the trained agents can be found in `benchmark.md`. To compute them, simply run `python -m utils.benchmark`.
Final performance of the trained agents can be found in [`benchmark.md`](./benchmark.md). To compute them, simply run `python -m utils.benchmark`.

*NOTE: this is not a quantitative benchmark as it corresponds to only one run (cf [issue #38](https://github.com/araffin/rl-baselines-zoo/issues/38)). This benchmark is meant to check algorithm (maximal) performance, find potential bugs and also allow users to have access to pretrained agents.*

@@ -202,42 +213,47 @@ Final performance of the trained agents can be found in `benchmark.md`. To compu

| RL Algo | BeamRider | Breakout | Enduro | Pong | Qbert | Seaquest | SpaceInvaders |
|----------|--------------------|--------------------|--------------------|-------|-------|--------------------|--------------------|
| A2C | | | | | | | |
| PPO | | | | | | | |
| DQN | | | | | | | |

| A2C | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| PPO | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| DQN | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| QR-DQN | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |

Additional Atari Games (to be completed):

| RL Algo | MsPacman |
|----------|-------------|
| A2C | |
| PPO | |
| DQN | |
| RL Algo | MsPacman | Asteroids | RoadRunner |
|----------|-------------|-----------|------------|
| A2C | | :heavy_check_mark: | :heavy_check_mark: |
| PPO | | :heavy_check_mark: | :heavy_check_mark: |
| DQN | | :heavy_check_mark: | :heavy_check_mark: |
| QR-DQN | | :heavy_check_mark: | :heavy_check_mark: |


### Classic Control Environments

| RL Algo | CartPole-v1 | MountainCar-v0 | Acrobot-v1 | Pendulum-v0 | MountainCarContinuous-v0 |
|----------|--------------|----------------|------------|--------------|--------------------------|
| A2C | | | | | |
| PPO | | | | | |
| DQN | | | | N/A | N/A |
| DDPG | N/A | N/A | N/A | | |
| SAC | N/A | N/A | N/A | | |
| TD3 | N/A | N/A | N/A | | |
| A2C | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| PPO | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| DQN | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | N/A | N/A |
| QR-DQN | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | N/A | N/A |
| DDPG | N/A | N/A | N/A | :heavy_check_mark: | :heavy_check_mark: |
| SAC | N/A | N/A | N/A | :heavy_check_mark: | :heavy_check_mark: |
| TD3 | N/A | N/A | N/A | :heavy_check_mark: | :heavy_check_mark: |
| TQC | N/A | N/A | N/A | :heavy_check_mark: | :heavy_check_mark: |


### Box2D Environments

| RL Algo | BipedalWalker-v2 | LunarLander-v2 | LunarLanderContinuous-v2 | BipedalWalkerHardcore-v2 | CarRacing-v0 |
| RL Algo | BipedalWalker-v3 | LunarLander-v2 | LunarLanderContinuous-v2 | BipedalWalkerHardcore-v3 | CarRacing-v0 |
|----------|--------------|----------------|------------|--------------|--------------------------|
| A2C | | | | | |
| PPO | | | | | |
| DQN | N/A | | N/A | N/A | N/A |
| DDPG | | N/A | | | |
| SAC | | N/A | | | |
| TD3 | | N/A | | | |
| TRPO | | | | | |
| A2C | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |
| PPO | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |
| DQN | N/A | :heavy_check_mark: | N/A | N/A | N/A |
| QR-DQN | N/A | :heavy_check_mark: | N/A | N/A | N/A |
| DDPG | :heavy_check_mark: | N/A | :heavy_check_mark: | | |
| SAC | :heavy_check_mark: | N/A | :heavy_check_mark: | :heavy_check_mark: | |
| TD3 | :heavy_check_mark: | N/A | :heavy_check_mark: | :heavy_check_mark: | |
| TQC | :heavy_check_mark: | N/A | :heavy_check_mark: | :heavy_check_mark: | |

### PyBullet Environments

@@ -248,23 +264,23 @@

| RL Algo | Walker2D | HalfCheetah | Ant | Reacher | Hopper | Humanoid |
|----------|-----------|-------------|-----|---------|---------|----------|
| A2C | | | | | | |
| PPO | | | | | | |
| DDPG | | | | | | |
| SAC | | | | | | |
| TD3 | | | | | | |
| TRPO | | | | | | |
| A2C | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |
| PPO | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |
| DDPG | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |
| SAC | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |
| TD3 | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |
| TQC | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | |

PyBullet Envs (Continued)

| RL Algo | Minitaur | MinitaurDuck | InvertedDoublePendulum | InvertedPendulumSwingup |
|----------|-----------|-------------|-----|---------|
| A2C | | | | |
| PPO | | | | |
| PPO | | | | |
| DDPG | | | | |
| SAC | | | | |
| TD3 | | | | |
| TRPO | | | | |
| SAC | | | | |
| TD3 | | | | |
| TQC | | | | |

### MiniGrid Envs

@@ -281,7 +297,7 @@ A simple, lightweight and fast Gym environments implementation of the famous gri

There are 19 environment groups (variations for each) in total.

Note that you need to specify --gym-packages gym_minigrid with enjoy.py and train.py as it is not a standard Gym environment, as well as installing the custom Gym package module or putting it in python path.
Note that you need to specify `--gym-packages gym_minigrid` with `enjoy.py` and `train.py` as it is not a standard Gym environment, as well as installing the custom Gym package module or putting it in python path.

```
pip install gym-minigrid
```
@@ -310,7 +326,8 @@ You can train agents online using [colab notebook](https://colab.research.google

### Stable-Baselines3 PyPi Package

Min version: stable-baselines3[extra] >= 0.6.0
Min version: stable-baselines3[extra] >= 1.0
and sb3_contrib >= 1.0

```
apt-get install swig cmake ffmpeg
```
@@ -364,7 +381,7 @@ make type

To cite this repository in publications:

```
```bibtex
@misc{rl-zoo3,
author = {Raffin, Antonin},
title = {RL Baselines3 Zoo},
```