Add files via upload
OpenAI Gym
OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. This is the gym open-source library, which gives you access to a standardized set of environments.

See What's New section below

gym makes no assumptions about the structure of your agent, and is compatible with any numerical computation library, such as TensorFlow or Theano. You can use it from Python code, and soon from other languages.

If you're not sure where to start, we recommend beginning with the docs on our site. See also the FAQ.

A whitepaper for OpenAI Gym is available at http://arxiv.org/abs/1606.01540, and here's a BibTeX entry that you can use to cite it in a publication:

@misc{1606.01540,
  Author = {Greg Brockman and Vicki Cheung and Ludwig Pettersson and Jonas Schneider and John Schulman and Jie Tang and Wojciech Zaremba},
  Title = {OpenAI Gym},
  Year = {2016},
  Eprint = {arXiv:1606.01540},
}
Contents of this document

OpenAI Gym
Basics
Installation
Environments
Examples
Testing
What's new
Basics
There are two basic concepts in reinforcement learning: the environment (namely, the outside world) and the agent (namely, the algorithm you are writing). The agent sends actions to the environment, and the environment replies with observations and rewards (that is, a score).

The core gym interface is Env, which is the unified environment interface. There is no interface for agents; that part is left to you. The following are the Env methods you should know:

reset(self): Reset the environment's state. Returns observation.
step(self, action): Step the environment by one timestep. Returns observation, reward, done, info.
render(self, mode='human', close=False): Render one frame of the environment. The default mode will do something human friendly, such as pop up a window. Passing the close flag signals the renderer to close any such windows.
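The call pattern of these three methods can be sketched without installing anything. The example below is a minimal illustration, not part of gym: `CoinFlipEnv` and `run_episode` are hypothetical names, the class does not subclass `gym.Env`, and it only mimics the shape of `reset`, `step`, and `render` described above.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment mimicking the reset/step/render shape of Env."""

    def __init__(self, episode_length=5, seed=None):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0
        self.coin = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.t = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin

    def step(self, action):
        # Reward 1.0 when the action matches the coin that was last observed.
        reward = 1.0 if action == self.coin else 0.0
        self.t += 1
        self.coin = self.rng.randint(0, 1)
        done = self.t >= self.episode_length
        return self.coin, reward, done, {}

    def render(self, mode='human'):
        # A text-only stand-in for a "human friendly" display.
        print('t={} coin={}'.format(self.t, self.coin))


def run_episode(env, policy):
    """Run one episode with the given policy and return the total reward."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward
```

An agent here is just a function from observation to action, which reflects the point that gym defines no agent interface. For instance, `run_episode(CoinFlipEnv(), lambda obs: obs)` plays a policy that repeats the observed coin.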
Installation
You can perform a minimal install of gym with:

git clone https://github.com/openai/gym.git
cd gym
pip install -e .
If you prefer, you can do a minimal install of the packaged version directly from PyPI:

pip install gym
You'll be able to run a few environments right away:

algorithmic
toy_text
classic_control (you'll need pyglet to render though)
We recommend playing with those environments at first, and then later installing the dependencies for the remaining environments.

Installing everything
To install the full set of environments, you'll need to have some system packages installed. We'll build out the list here over time; please let us know what you end up installing on your platform. Also, take a look at the docker files (test.dockerfile.xx.xx) to see the composition of our CI-tested images.

On OSX:

brew install cmake boost boost-python sdl2 swig wget
On Ubuntu 14.04 (non-mujoco only):

apt-get install libjpeg-dev cmake swig python-pyglet python3-opengl libboost-all-dev \
        libsdl2-2.0.0 libsdl2-dev libglu1-mesa libglu1-mesa-dev libgles2-mesa-dev \
        freeglut3 xvfb libav-tools
On Ubuntu 16.04:

apt-get install -y python-pyglet python3-opengl zlib1g-dev libjpeg-dev patchelf \
        cmake swig libboost-all-dev libsdl2-dev libosmesa6-dev xvfb ffmpeg
On Ubuntu 18.04:

apt install -y python3-dev zlib1g-dev libjpeg-dev cmake swig python-pyglet python3-opengl libboost-all-dev libsdl2-dev \
    libosmesa6-dev patchelf ffmpeg xvfb
MuJoCo has a proprietary dependency we can't set up for you. Follow the instructions in the mujoco-py package for help.

Once you're ready to install everything, run pip install -e '.[all]' (or pip install 'gym[all]').

Supported systems
We currently support Linux and OS X running Python 2.7 or 3.5. Some users on OSX + Python3 may need to run

brew install boost-python --with-python3
If you want to access Gym from languages other than python, we have limited support for non-python frameworks, such as lua/Torch, using the OpenAI Gym HTTP API.

Pip version
To run pip install -e '.[all]', you'll need a semi-recent pip. Please make sure your pip is at least at version 1.5.0. You can upgrade using the following: pip install --ignore-installed pip. Alternatively, you can open setup.py and install the dependencies by hand.

Rendering on a server
If you're trying to render video on a server, you'll need to connect a fake display. The easiest way to do this is by running under xvfb-run (on Ubuntu, install the xvfb package):

xvfb-run -s "-screen 0 1400x900x24" bash
Installing dependencies for specific environments
If you'd like to install the dependencies for only specific environments, see setup.py. We maintain the lists of dependencies on a per-environment group basis.

Environments
The code for each environment group is housed in its own subdirectory gym/envs. The specification of each task is in gym/envs/__init__.py. It's worth browsing through both.

Algorithmic
These are a variety of algorithmic tasks, such as learning to copy a sequence.

import gym
env = gym.make('Copy-v0')
env.reset()
env.render()
Atari
The Atari environments are a variety of Atari video games. If you didn't do the full install, you can install dependencies via pip install -e '.[atari]' (you'll need cmake installed) and then get started as follows:

import gym
env = gym.make('SpaceInvaders-v0')
env.reset()
env.render()
This will install atari-py, which automatically compiles the Arcade Learning Environment. This can take quite a while (a few minutes on a decent laptop), so just be prepared.

Box2d
Box2d is a 2D physics engine. You can install it via pip install -e '.[box2d]' and then get started as follows:

import gym
env = gym.make('LunarLander-v2')
env.reset()
env.render()
Classic control
These are a variety of classic control tasks, which would appear in a typical reinforcement learning textbook. If you didn't do the full install, you will need to run pip install -e '.[classic_control]' to enable rendering. You can get started with them via:

import gym
env = gym.make('CartPole-v0')
env.reset()
env.render()
MuJoCo
MuJoCo is a physics engine which can do very detailed efficient simulations with contacts. It's not open-source, so you'll have to follow the instructions in mujoco-py to set it up. You'll have to also run pip install -e '.[mujoco]' if you didn't do the full install.

import gym
env = gym.make('Humanoid-v2')
env.reset()
env.render()
Robotics
MuJoCo is a physics engine which can do very detailed efficient simulations with contacts and we use it for all robotics environments. It's not open-source, so you'll have to follow the instructions in mujoco-py to set it up. You'll have to also run pip install -e '.[robotics]' if you didn't do the full install.

import gym
env = gym.make('HandManipulateBlock-v0')
env.reset()
env.render()
You can also find additional details in the accompanying technical report and blog post. If you use these environments, you can cite them as follows:

@misc{1802.09464,
  Author = {Matthias Plappert and Marcin Andrychowicz and Alex Ray and Bob McGrew and Bowen Baker and Glenn Powell and Jonas Schneider and Josh Tobin and Maciek Chociej and Peter Welinder and Vikash Kumar and Wojciech Zaremba},
  Title = {Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research},
  Year = {2018},
  Eprint = {arXiv:1802.09464},
}
Toy text
Toy environments which are text-based. There's no extra dependency to install, so to get started, you can just do:

import gym
env = gym.make('FrozenLake-v0')
env.reset()
env.render()
Examples
See the examples directory.

Run examples/agents/random_agent.py to run a simple random agent.
Run examples/agents/cem.py to run an actual learning agent (using the cross-entropy method).
Run examples/scripts/list_envs to generate a list of all environments.
Testing
We are using pytest for tests. You can run them via:

pytest
What's new
2018-02-28: Release of a set of new robotics environments.

2018-01-25: Made some aesthetic improvements and removed unmaintained parts of gym. This may seem like a downgrade in functionality, but it is actually a long-needed cleanup in preparation for some great new things that will be released in the next month.

Now your Env and Wrapper subclasses should define step, reset, render, close, seed rather than underscored method names.
Removed the board_game, debugging, safety, parameter_tuning environments since they're not being maintained by us at OpenAI. We encourage authors and users to create new repositories for these environments.
Changed MultiDiscrete action space to range from [0, ..., n-1] rather than [a, ..., b-1].
No more render(close=True), use env-specific methods to close the rendering.
Removed scoreboard directory, since site doesn't exist anymore.
Moved gym/monitoring to gym/wrappers/monitoring
Add dtype to Space.
No longer using Python's built-in logging module; gym.logger is used instead.
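To make the MultiDiscrete change in the list above concrete: each component of an action now ranges over [0, ..., n-1] for its own n. The sketch below is an illustration only; `sample_multi_discrete` is a hypothetical helper written in plain Python, not the gym `MultiDiscrete` class or its API.

```python
import random


def sample_multi_discrete(nvec, rng=random):
    """Sample one action where component i lies in [0, nvec[i] - 1].

    Hypothetical helper illustrating the post-change range convention.
    """
    return [rng.randrange(n) for n in nvec]


# Three independent discrete choices with 3, 2 and 5 options respectively.
action = sample_multi_discrete([3, 2, 5], random.Random(0))
```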
2018-01-24: All continuous control environments now use mujoco_py >= 1.50. Versions have been updated accordingly to -v2, e.g. HalfCheetah-v2. Performance should be similar (see openai/gym#834) but there are likely some differences due to changes in MuJoCo.

2017-06-16: Make env.spec into a property to fix a bug that occurs when you try to print out an unregistered Env.

2017-05-13: BACKWARDS INCOMPATIBILITY: The Atari environments are now at v4. To keep using the old v3 environments, keep gym <= 0.8.2 and atari-py <= 0.0.21. Note that the v4 environments will not give identical results to existing v3 results, although differences are minor. The v4 environments incorporate the latest Arcade Learning Environment (ALE), including several ROM fixes, and now handle loading and saving of the emulator state. While seeds still ensure determinism, the effect of any given seed is not preserved across this upgrade because the random number generator in ALE has changed. The *NoFrameSkip-v4 environments should be considered the canonical Atari environments from now on.

2017-03-05: BACKWARDS INCOMPATIBILITY: The configure method has been removed from Env. configure was not used by gym, but was used by some dependent libraries including universe. These libraries will migrate away from the configure method by using wrappers instead. This change is on master and will be released with 0.8.0.

2016-12-27: BACKWARDS INCOMPATIBILITY: The gym monitor is now a wrapper. Rather than starting monitoring as env.monitor.start(directory), envs are now wrapped as follows: env = wrappers.Monitor(env, directory). This change is on master and will be released with 0.7.0.

2016-11-01: Several experimental changes to how a running monitor interacts with environments. The monitor will now raise an error if reset() is called when the env has not returned done=True. The monitor will only record complete episodes where done=True. Finally, the monitor no longer calls seed() on the underlying env, nor does it record or upload seed information.
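The reset rule described in this entry can be sketched with a small wrapper. This is not the gym monitor; `EpisodeGuard` and `CountdownEnv` are hypothetical names, and the sketch only assumes an environment whose `step` returns `observation, reward, done, info`.

```python
class CountdownEnv:
    """Hypothetical environment that ends after a fixed number of steps."""

    def __init__(self, length=3):
        self.length = length
        self.remaining = length

    def reset(self):
        self.remaining = self.length
        return self.remaining

    def step(self, action):
        self.remaining -= 1
        done = self.remaining <= 0
        return self.remaining, 1.0, done, {}


class EpisodeGuard:
    """Hypothetical wrapper: rejects reset() mid-episode, counts complete episodes."""

    def __init__(self, env):
        self.env = env
        self.episode_running = False
        self.completed_episodes = 0

    def reset(self):
        if self.episode_running:
            raise RuntimeError('reset() called before the episode returned done=True')
        self.episode_running = True
        return self.env.reset()

    def step(self, action):
        observation, reward, done, info = self.env.step(action)
        if done:
            # Only episodes that reach done=True are counted as complete.
            self.episode_running = False
            self.completed_episodes += 1
        return observation, reward, done, info
```

Wrapping rather than subclassing keeps the guard independent of any particular environment, which matches the wrapper-based direction of the later entries in this changelog.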

2016-10-31: We're experimentally expanding the environment ID format to include an optional username.

2016-09-21: Switch the Gym automated logger setup to configure the root logger rather than just the 'gym' logger.

2016-08-17: Calling close on an env will also close the monitor and any rendering windows.

2016-08-17: The monitor will no longer write manifest files in real-time, unless write_upon_reset=True is passed.

2016-05-28: For controlled reproducibility, envs now support seeding (cf #91 and #135). The monitor records which seeds are used. We will soon add seed information to the display on the scoreboard.
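What seeding buys you can be shown with a toy example: two rollouts with the same seed see identical observations. The sketch below is an assumption-laden illustration; `NoisyEnv` and `rollout` are hypothetical, and only the idea of a `seed()` method on the environment comes from the text above.

```python
import random


class NoisyEnv:
    """Hypothetical environment whose observations are random numbers."""

    def __init__(self):
        self.rng = random.Random()

    def seed(self, seed=None):
        # Re-create the generator so that later draws depend only on the seed.
        self.rng = random.Random(seed)
        return [seed]

    def reset(self):
        return self.rng.random()

    def step(self, action):
        return self.rng.random(), 0.0, False, {}


def rollout(seed, steps=5):
    """Collect the observations of one seeded rollout."""
    env = NoisyEnv()
    env.seed(seed)
    observations = [env.reset()]
    for _ in range(steps):
        observation, reward, done, info = env.step(0)
        observations.append(observation)
    return observations
```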
ghost-999 authored Jan 18, 2019
0 parents commit 8a1623b
Showing 18 changed files with 1,294 additions and 0 deletions.
6 changes: 6 additions & 0 deletions cart.py
@@ -0,0 +1,6 @@
import gym
env = gym.make('CartPole-v0')
env.reset()
for _ in range(1000):
    env.render()
    env.step(env.action_space.sample()) # take a random action
181 changes: 181 additions & 0 deletions cartpole-v1.py
@@ -0,0 +1,181 @@
import gym
import random
import numpy as np
import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression
from statistics import median, mean
from collections import Counter

LR = 1e-3
env = gym.make("CartPole-v0")
env.reset()
goal_steps = 500
score_requirement = 50
initial_games = 10000
# [2017-03-02 18:38:06,633] Making new env: CartPole-v0
# Now, let's just get a quick impression of what a random agent looks like.

def some_random_games_first():
    # Each of these is its own game.
    for episode in range(5):
        env.reset()
        # this is each frame, up to 200...but we wont make it that far.
        for t in range(200):
            # This will display the environment
            # Only display if you really want to see it.
            # Takes much longer to display it.
            env.render()

            # This will just create a sample action in any environment.
            # In this environment, the action can be 0 or 1, which is left or right
            action = env.action_space.sample()

            # this executes the environment with an action,
            # and returns the observation of the environment,
            # the reward, if the env is over, and other info.
            observation, reward, done, info = env.step(action)
            if done:
                break

some_random_games_first()


def initial_population():
    # [OBS, MOVES]
    training_data = []
    # all scores:
    scores = []
    # just the scores that met our threshold:
    accepted_scores = []
    # iterate through however many games we want:
    for _ in range(initial_games):
        score = 0
        # moves specifically from this environment:
        game_memory = []
        # previous observation that we saw
        prev_observation = []
        # for each frame in 200
        for _ in range(goal_steps):
            # choose random action (0 or 1)
            action = random.randrange(0,2)
            # do it!
            observation, reward, done, info = env.step(action)

            # notice that the observation is returned FROM the action
            # so we'll store the previous observation here, pairing
            # the prev observation to the action we'll take.
            if len(prev_observation) > 0 :
                game_memory.append([prev_observation, action])
            prev_observation = observation
            score+=reward
            if done: break

        # IF our score is higher than our threshold, we'd like to save
        # every move we made
        # NOTE the reinforcement methodology here.
        # all we're doing is reinforcing the score, we're not trying
        # to influence the machine in any way as to HOW that score is
        # reached.
        if score >= score_requirement:
            accepted_scores.append(score)
            for data in game_memory:
                # convert to one-hot (this is the output layer for our neural network)
                if data[1] == 1:
                    output = [0,1]
                elif data[1] == 0:
                    output = [1,0]

                # saving our training data
                training_data.append([data[0], output])

        # reset env to play again
        env.reset()
        # save overall scores
        scores.append(score)

    # just in case you wanted to reference later
    training_data_save = np.array(training_data)
    np.save('saved.npy',training_data_save)

    # some stats here, to further illustrate the neural network magic!
    print('Average accepted score:',mean(accepted_scores))
    print('Median score for accepted scores:',median(accepted_scores))
    print(Counter(accepted_scores))

    return training_data


def neural_network_model(input_size):

    network = input_data(shape=[None, input_size, 1], name='input')

    network = fully_connected(network, 128, activation='relu')
    network = dropout(network, 0.8)

    network = fully_connected(network, 256, activation='relu')
    network = dropout(network, 0.8)

    network = fully_connected(network, 512, activation='relu')
    network = dropout(network, 0.8)

    network = fully_connected(network, 256, activation='relu')
    network = dropout(network, 0.8)

    network = fully_connected(network, 128, activation='relu')
    network = dropout(network, 0.8)

    network = fully_connected(network, 2, activation='softmax')
    network = regression(network, optimizer='adam', learning_rate=LR, loss='categorical_crossentropy', name='targets')
    model = tflearn.DNN(network, tensorboard_dir='log')

    return model


def train_model(training_data, model=False):

    X = np.array([i[0] for i in training_data]).reshape(-1,len(training_data[0][0]),1)
    y = [i[1] for i in training_data]

    if not model:
        model = neural_network_model(input_size = len(X[0]))

    model.fit({'input': X}, {'targets': y}, n_epoch=5, snapshot_step=500, show_metric=True, run_id='openai_learning')
    return model


training_data = initial_population()


model = train_model(training_data)



scores = []
choices = []
for each_game in range(10):
    score = 0
    game_memory = []
    prev_obs = []
    env.reset()
    for _ in range(goal_steps):
        env.render()

        if len(prev_obs)==0:
            action = random.randrange(0,2)
        else:
            action = np.argmax(model.predict(prev_obs.reshape(-1,len(prev_obs),1))[0])

        choices.append(action)

        new_observation, reward, done, info = env.step(action)
        prev_obs = new_observation
        game_memory.append([new_observation, action])
        score+=reward
        if done: break

    scores.append(score)

print('Average Score:',sum(scores)/len(scores))
print('choice 1:{} choice 0:{}'.format(choices.count(1)/len(choices),choices.count(0)/len(choices)))
print(score_requirement)
12 changes: 12 additions & 0 deletions copy-v0.py
@@ -0,0 +1,12 @@
import gym
env = gym.make('MountainCarContinuous-v0') # try different environments
observation = env.reset()
for t in range(100):
    env.render()
    print(observation)
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    print(observation, reward, done, info)
    if done:
        print("Finished after {} timesteps".format(t+1))
        break
65 changes: 65 additions & 0 deletions draw.py
@@ -0,0 +1,65 @@
# -*- coding: utf-8 -*-
"""draw.ipynb
Automatically generated by Colaboratory.
Original file is located at
https://colab.research.google.com/drive/1-egFuLTzA4K1hjsZQIyp78bJMd5vKX7Y
"""

# Import libraries for simulation
import tensorflow as tf
import numpy as np

# Imports for visualization
import PIL.Image
from io import BytesIO
from IPython.display import Image, display

def DisplayFractal(a, fmt='jpeg'):
    """Display an array of iteration counts as a
    colorful picture of a fractal."""
    a_cyclic = (6.28*a/20.0).reshape(list(a.shape)+[1])
    img = np.concatenate([10+20*np.cos(a_cyclic),
                          30+50*np.sin(a_cyclic),
                          155-80*np.cos(a_cyclic)], 2)
    img[a==a.max()] = 0
    a = img
    a = np.uint8(np.clip(a, 0, 255))
    f = BytesIO()
    PIL.Image.fromarray(a).save(f, fmt)
    display(Image(data=f.getvalue()))

sess = tf.InteractiveSession()

# Use NumPy to create a 2D array of complex numbers

Y, X = np.mgrid[-1.3:1.3:0.005, -2:1:0.005]
Z = X+1j*Y

xs = tf.constant(Z.astype(np.complex64))
zs = tf.Variable(xs)
ns = tf.Variable(tf.zeros_like(xs, tf.float32))

tf.global_variables_initializer().run()

# Compute the new values of z: z^2 + x
zs_ = zs*zs + xs

# Have we diverged with this new value?
not_diverged = tf.abs(zs_) < 100

# Operation to update the zs and the iteration count.
#
# Note: We keep computing zs after they diverge! This
# is very wasteful! There are better, if a little
# less simple, ways to do this.
#
step = tf.group(
    zs.assign(zs_),
    ns.assign_add(tf.cast(not_diverged, tf.float32))
)

for i in range(2): step.run()

DisplayFractal(ns.eval())
1 change: 1 addition & 0 deletions estimator.py

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions estimator.pyn

Large diffs are not rendered by default.

124 changes: 124 additions & 0 deletions fmnist.py
@@ -0,0 +1,124 @@
# -*- coding: utf-8 -*-
"""Untitled3.ipynb
Automatically generated by Colaboratory.
Original file is located at
https://colab.research.google.com/drive/164Q-XbnSJZS5VuKssweyn8Lqwvq438qS
"""

# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras

# Helper libraries
import numpy as np
import matplotlib.pyplot as plt

print(tf.__version__)

fashion_mnist = keras.datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

train_images.shape

plt.figure()
plt.imshow(train_images[0])
plt.colorbar()
plt.grid(False)

train_images = train_images / 255.0

test_images = test_images / 255.0

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])

model.compile(optimizer=tf.train.AdamOptimizer(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(train_images, train_labels, epochs=5)

test_loss, test_acc = model.evaluate(test_images, test_labels)

print('Test accuracy:', test_acc)

predictions = model.predict(test_images)

predictions[0]

np.argmax(predictions[0])

def plot_image(i, predictions_array, true_label, img):
    predictions_array, true_label, img = predictions_array[i], true_label[i], img[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])

    plt.imshow(img, cmap=plt.cm.binary)

    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'

    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                         100*np.max(predictions_array),
                                         class_names[true_label]),
               color=color)

def plot_value_array(i, predictions_array, true_label):
    predictions_array, true_label = predictions_array[i], true_label[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    thisplot = plt.bar(range(10), predictions_array, color="#777777")
    plt.ylim([0, 1])
    predicted_label = np.argmax(predictions_array)

    thisplot[predicted_label].set_color('red')
    thisplot[true_label].set_color('blue')

i = 0
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions, test_labels, test_images)
plt.subplot(1,2,2)
plot_value_array(i, predictions, test_labels)

i = 12
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions, test_labels, test_images)
plt.subplot(1,2,2)
plot_value_array(i, predictions, test_labels)

# Plot the first X test images, their predicted label, and the true label
# Color correct predictions in blue, incorrect predictions in red
num_rows = 5
num_cols = 3
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
    plt.subplot(num_rows, 2*num_cols, 2*i+1)
    plot_image(i, predictions, test_labels, test_images)
    plt.subplot(num_rows, 2*num_cols, 2*i+2)
    plot_value_array(i, predictions, test_labels)
1 change: 1 addition & 0 deletions game_bot_AI.py
@@ -0,0 +1 @@
{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"game_bot_AI.ipynb","version":"0.3.2","provenance":[]},"kernelspec":{"name":"python3","display_name":"Python 3"},"accelerator":"GPU"},"cells":[{"metadata":{"id":"p9n6eN3vjGN3","colab_type":"code","colab":{"base_uri":"https://localhost:8080/","height":35},"outputId":"4080075d-9e67-4194-d512-93c8a2f46eba","executionInfo":{"status":"ok","timestamp":1543311769499,"user_tz":-330,"elapsed":5766,"user":{"displayName":"RISHABH CHAKARABARTY","photoUrl":"","userId":"12942336028013708336"}}},"cell_type":"code","source":["!git clone https://github.com/openai/universe.git\n"],"execution_count":10,"outputs":[{"output_type":"stream","text":["fatal: destination path 'universe' already exists and is not an empty directory.\n"],"name":"stdout"}]},{"metadata":{"id":"_OC3DAJZjGPv","colab_type":"code","colab":{"base_uri":"https://localhost:8080/","height":35},"outputId":"eb341705-4896-49e7-aa62-e0d499af37b7","executionInfo":{"status":"ok","timestamp":1543311781984,"user_tz":-330,"elapsed":1648,"user":{"displayName":"RISHABH CHAKARABARTY","photoUrl":"","userId":"12942336028013708336"}}},"cell_type":"code","source":["cd universe\n"],"execution_count":11,"outputs":[{"output_type":"stream","text":["/content/universe/universe\n"],"name":"stdout"}]},{"metadata":{"id":"660UxxpbjGR9","colab_type":"code","colab":{"base_uri":"https://localhost:8080/","height":35},"outputId":"de81faad-9b34-45a9-ede5-12cb0620abfe","executionInfo":{"status":"ok","timestamp":1543311798940,"user_tz":-330,"elapsed":4852,"user":{"displayName":"RISHABH CHAKARABARTY","photoUrl":"","userId":"12942336028013708336"}}},"cell_type":"code","source":["!pip install -e .\n"],"execution_count":13,"outputs":[{"output_type":"stream","text":["\u001b[31mDirectory '.' is not installable. File 'setup.py' not found.\u001b[0m\n"],"name":"stdout"}]}]}