Run OpenAI Baselines on Kubernetes with Fiber
In this example, we'll show you how to integrate Fiber with OpenAI Baselines by changing just one line of code.
If your project already uses Python's multiprocessing, integrating it with Fiber is straightforward. Here, we use OpenAI Baselines as an example to show how code written with multiprocessing can run on Kubernetes.
Prepare the code
First, we clone Baselines from GitHub, create a new branch, and set up our local environment:
git clone https://github.com/openai/baselines
cd baselines
git checkout -b fiber
virtualenv -p python3 env
. env/bin/activate
echo "env" > .dockerignore
pip install "tensorflow<2"
pip install -e .
pip install fiber
Test that the environment works:
python -m baselines.run --alg=ppo2 --env=CartPole-v0 --network=mlp --num_timesteps=10000
If it works, you should see something like this in your output:
---------------------------------------
| eplenmean               | 23.8      |
| eprewmean               | 23.8      |
| fps                     | 1.95e+03  |
| loss/approxkl           | 0.000232  |
| loss/clipfrac           | 0         |
| loss/policy_entropy     | 0.693     |
| loss/policy_loss        | -0.00224  |
| loss/value_loss         | 48.4      |
| misc/explained_variance | -0.000784 |
| misc/nupdates           | 1         |
| misc/serial_timesteps   | 2.05e+03  |
| misc/time_elapsed       | 1.05      |
| misc/total_timesteps    | 2.05e+03  |
---------------------------------------
OpenAI Baselines has a SubprocVecEnv that, according to its documentation, runs multiple environments in parallel in subprocesses and communicates with them via pipes. This is the piece we will modify to work with Fiber.
Fiberization (or adapting your code to run with Fiber)
Open baselines/common/vec_env/subproc_vec_env.py and change this line:
import multiprocessing as mp
to:
import fiber as mp
Let's do a quick test to see if this change works:
python -m baselines.run --alg=ppo2 --env=CartPole-v0 --network=mlp --num_timesteps=10000 --num_env 2
We add --num_env 2 to make sure Baselines uses SubprocVecEnv. If everything works, we should see output similar to the previous run.
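To make the pattern behind this change concrete, here is a minimal sketch of the kind of code SubprocVecEnv is described as using: worker subprocesses that receive commands and send results over pipes. It is not the Baselines implementation; the worker function, the command names ("step", "close") and run_workers are hypothetical names chosen for illustration. The sketch uses the standard library so it runs anywhere; the assumption is that, with Fiber installed, the same import swap shown above would apply to it.

```python
# Standard library by default. Assumption: with Fiber installed, the swap
# described in this tutorial would be `import fiber as mp`.
import multiprocessing as mp


def worker(conn):
    """Serve commands from the parent until told to close (hypothetical protocol)."""
    total = 0
    while True:
        cmd, data = conn.recv()
        if cmd == "step":
            total += data          # stand-in for stepping an environment
            conn.send(total)
        elif cmd == "close":
            conn.close()
            break


def run_workers(num_workers, actions):
    """Start one worker per pipe, send every action to every worker, collect replies."""
    pipes = [mp.Pipe() for _ in range(num_workers)]
    procs = [mp.Process(target=worker, args=(child,)) for _, child in pipes]
    for p in procs:
        p.start()
    results = []
    for action in actions:
        for parent, _ in pipes:
            parent.send(("step", action))
        results.append([parent.recv() for parent, _ in pipes])
    for parent, _ in pipes:
        parent.send(("close", None))
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    print(run_workers(2, [1, 2, 3]))
```

Because the code only touches the module through the mp alias, the choice of backend stays confined to the import line, which is what makes a one-line change possible.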
Containerize the application
OpenAI Baselines already has a Dockerfile, so we only need to add Fiber to it with the line RUN pip install fiber. After the modification, the Dockerfile looks like this:
FROM python:3.6

RUN apt-get -y update && apt-get -y install ffmpeg
# RUN apt-get -y update && apt-get -y install git wget python-dev python3-dev libopenmpi-dev python-pip zlib1g-dev cmake python-opencv

ENV CODE_DIR /root/code

COPY . $CODE_DIR/baselines
WORKDIR $CODE_DIR/baselines

# Clean up pycache and pyc files
RUN rm -rf __pycache__ && \
    find . -name "*.pyc" -delete && \
    pip install 'tensorflow < 2' && \
    pip install -e .[test]

RUN pip install fiber

CMD /bin/bash
It's a good habit to make sure everything works locally before submitting the job to a larger cluster, because this saves a lot of debugging time. So we first build our Docker image locally:
docker build -t fiber-openai-baselines .
When Fiber starts new Docker containers locally, it mounts your home directory into each container. By default, Baselines writes its logs to /tmp, which is not shared between the Fiber master process and its subprocesses, so we add the argument --log_path=logs to make it write logs to a shared location. We also add --num_env 2 to make sure Baselines uses SubprocVecEnv so that Fiber processes can be launched.
FIBER_BACKEND=docker FIBER_IMAGE=fiber-openai-baselines:latest python -m baselines.run --alg=ppo2 --env=CartPole-v0 --network=mlp --num_timesteps=10000 --num_env 2 --log_path=logs
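If you would rather launch this run from a script than type it each time, the following is a minimal sketch of assembling the same command and environment in Python. The helper names baselines_command and fiber_env are hypothetical; the only Fiber-specific pieces are the FIBER_BACKEND and FIBER_IMAGE variables and the flags already used in the command above.

```python
import os
import sys


def baselines_command(num_timesteps, num_env=2, log_path="logs"):
    """Return the argv list for the PPO2 CartPole run used in this tutorial."""
    return [
        sys.executable, "-m", "baselines.run",
        "--alg=ppo2", "--env=CartPole-v0", "--network=mlp",
        "--num_timesteps={}".format(num_timesteps),
        "--num_env", str(num_env),
        "--log_path={}".format(log_path),
    ]


def fiber_env(backend, image, base=None):
    """Copy an environment and set the two Fiber variables from the command above."""
    env = dict(os.environ if base is None else base)
    env["FIBER_BACKEND"] = backend
    env["FIBER_IMAGE"] = image
    return env


if __name__ == "__main__":
    cmd = baselines_command(10000)
    env = fiber_env("docker", "fiber-openai-baselines:latest")
    # Print the equivalent shell line instead of running it.
    # To launch for real: subprocess.run(cmd, env=env, check=True)
    print("FIBER_BACKEND={} FIBER_IMAGE={} {}".format(
        env["FIBER_BACKEND"], env["FIBER_IMAGE"], " ".join(cmd)))
```

Keeping the command as a list avoids shell quoting issues, and copying the environment leaves the parent process's own variables untouched.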
Running on Kubernetes
Now let's run our fiberized OpenAI Baselines on Kubernetes. This time we run 1e7 time steps. We also want to store the output of the run on persistent storage, which we can do with the fiber command's support for mounting persistent volumes.
$ fiber run -v fiber-pv-claim python -m baselines.run --alg=ppo2 --env=CartPole-v0 --network=mlp --num_timesteps=1e7 --num_env 2 --log_path=/persistent/baselines/logs/
It should output something like this:
Created pod: baselines-d00eb2ef
After the job is done, you can copy the logs with this command:
$ fiber cp fiber-pv-claim:/persistent/baselines/logs baselines-logs
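Once the logs are on your machine, a quick way to check the result is to read the last logged row. The sketch below assumes the Baselines logger wrote a progress.csv file into the log directory and that it ended up directly under baselines-logs; both the filename and the location are assumptions, so adjust them to what you actually find. The last_row helper is a hypothetical name.

```python
import csv
import os


def last_row(log_dir, filename="progress.csv"):
    """Return the final row of the CSV log as {column: float}, or None if it has no rows."""
    path = os.path.join(log_dir, filename)
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return None
    # Skip blank or missing cells; the metrics shown in this tutorial are numeric.
    return {
        key: float(value)
        for key, value in rows[-1].items()
        if key is not None and value not in ("", None)
    }


if __name__ == "__main__":
    log_dir = "baselines-logs"  # destination used in the fiber cp command above
    if os.path.isfile(os.path.join(log_dir, "progress.csv")):
        print(last_row(log_dir))
    else:
        print("progress.csv not found under", log_dir)
```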