Deep Q-Networks and Practical Reinforcement Learning with TensorFlow
This blog post highlights what to know when applying reinforcement learning with TensorFlow, as discussed at one of the sessions at TensorBeat 2017. You will find out which toolkit simplifies working with an environment, how to handle the pitfalls of distributed training, how to boost performance across multiple environments, and more.
Making reinforcement learning work
Illia Polosukhin, a co-founder of XIX.ai, provided some practical insights into reinforcement learning with TensorFlow.
As Illia puts it, reinforcement learning does not actually require a prepared training data set. Instead, one derives different types of observations from an environment, performs actions, etc.
To do so, one can employ OpenAI Gym, which is a toolkit for developing and comparing reinforcement learning algorithms. It features a library of environments for games, classical control systems, etc. to aid developers in creating algorithms of their own. Each of the environments has the same API. The library also enables users to compare / share the results.
Illia demonstrated sample code for a simple agent acting within an environment.
He also showed the code behind an agent.
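The code from the slides is not reproduced here, so the following is only a minimal sketch of the agent-environment loop. To stay self-contained, it uses a hypothetical toy environment, `CorridorEnv`, that mimics the classic Gym-style `reset()` / `step(action)` interface (four return values from `step`) rather than OpenAI Gym itself. `RandomAgent` and `run_episode` are likewise illustrative names, not part of any library.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment with a Gym-style reset/step interface."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start every episode at the left end of the corridor.
        self.position = 0
        return self.position

    def step(self, action):
        # action: 0 = move left, 1 = move right (the left wall blocks movement).
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        done = self.position >= self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


class RandomAgent:
    """Picks a uniformly random action and ignores the observation."""

    def __init__(self, num_actions, seed=None):
        self.num_actions = num_actions
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.randrange(self.num_actions)


def run_episode(env, agent, max_steps=1000):
    """Run one episode and return the total (undiscounted) reward."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(observation)
        observation, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

Because every environment exposes the same interface, swapping `CorridorEnv` for another environment with the same `reset` / `step` shape should leave `run_episode` unchanged.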
So, what makes it all work?
The set of states and actions, coupled with the rules for transitioning from one state to another, makes up a Markov decision process (MDP). One episode of this process (e.g., one game) produces a finite sequence of states, actions, and rewards.
What one has to define is:
- Return: the total discounted reward
- Policy: the agent’s behaviour (deterministic or stochastic)
- Value functions: the expected return starting from a particular state (state-value function) or from a particular state and action (action-value function)
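As a small illustration of the first item, the return of an episode can be computed by folding the rewards from the last step backwards. The function name `discounted_return` and the default discount factor `gamma=0.99` are assumptions made for this sketch, not values from the talk.

```python
def discounted_return(rewards, gamma=0.99):
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one episode."""
    total = 0.0
    # Walking backwards lets each step reuse the already discounted tail.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

For example, with rewards `[1.0, 1.0, 1.0]` and `gamma=0.5`, the result is `1 + 0.5 + 0.25 = 1.75`.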
One of the ways to approach reinforcement learning is deep Q-learning, a model-free, off-policy technique. Model-free means that the agent neither learns nor approximates the MDP itself. Observations are stored in a replay buffer and later used as training data for the model. Off-policy means that the policy being learned is independent of the actions the agent takes while collecting experience.
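A replay buffer can be sketched in a few lines. This is an illustrative version only: the class name `ReplayBuffer`, the fixed capacity, and the uniform sampling are assumptions, and the implementation shown in the talk may differ.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=None):
        # A bounded deque silently drops the oldest transition when full.
        self.storage = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up the correlation
        # between consecutive transitions of the same episode.
        batch_size = min(batch_size, len(self.storage))
        return self.rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)
```

Since the stored transitions may come from older versions of the agent, training on them is only valid for an off-policy method such as Q-learning.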
Illia then demonstrated what the Q-network code looks like, as well as how to run the optimization.
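The slide code is not reproduced here. As a stand-in, the sketch below swaps the neural network for a linear action-value function (one weight vector per action) so that the Q-learning update stays visible without any framework. All names (`q_values`, `td_target`, `q_learning_step`) and the hyperparameters are assumptions for illustration; a deep Q-network would replace `q_values` with a network and the manual update with an optimizer step.

```python
def q_values(weights, state):
    """Linear stand-in for a Q-network: one weight vector per action."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def td_target(weights, reward, next_state, done, gamma=0.99):
    """Bootstrapped target: r + gamma * max_a' Q(s', a'), or r if terminal."""
    if done:
        return reward
    return reward + gamma * max(q_values(weights, next_state))


def q_learning_step(weights, transition, learning_rate=0.1, gamma=0.99):
    """Apply one gradient step on the squared TD error, in place."""
    state, action, reward, next_state, done = transition
    target = td_target(weights, reward, next_state, done, gamma)
    td_error = target - q_values(weights, state)[action]
    # The target is treated as a constant, so only the weights of the
    # action that was taken receive a gradient.
    weights[action] = [
        w + learning_rate * td_error * x
        for w, x in zip(weights[action], state)
    ]
    return td_error
```

Each transition passed to `q_learning_step` has the same `(state, action, reward, next_state, done)` layout that a replay buffer would typically store.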
More examples can be found in this GitHub repo.
A monitored session
As one of the tricks at hand when training a TensorFlow model, Illia suggested using MonitoredSession for:
- handling pitfalls of distributed training
- saving and restoring checkpoints
- injecting computation into the TensorFlow training loop via hooks
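A minimal sketch of the pattern follows. It assumes the TensorFlow 1.x graph-mode API, reached through `tf.compat.v1` so that it can also run on TensorFlow 2.x, and it uses `MonitoredTrainingSession`, the convenience factory that builds a monitored session. The toy loss, the variable `w`, and the `train` helper are made up for illustration and are not from the talk.

```python
import tensorflow as tf

tf1 = tf.compat.v1  # TensorFlow 1.x graph-mode API


def train(num_steps=5, checkpoint_dir=None):
    """Run a toy training loop under a monitored session."""
    iterations = 0
    with tf.Graph().as_default():
        global_step = tf1.train.get_or_create_global_step()
        # Toy problem: move a single scalar weight towards 3.0.
        w = tf1.get_variable("w", shape=[], initializer=tf1.zeros_initializer())
        loss = tf.square(w - 3.0)
        optimizer = tf1.train.GradientDescentOptimizer(learning_rate=0.1)
        train_op = optimizer.minimize(loss, global_step=global_step)

        # Hooks inject behaviour into the loop; this one requests a stop
        # once the global step reaches num_steps.
        hooks = [tf1.train.StopAtStepHook(last_step=num_steps)]

        # If checkpoint_dir is set, the session restores from it on start
        # and saves checkpoints there periodically.
        with tf1.train.MonitoredTrainingSession(
                checkpoint_dir=checkpoint_dir, hooks=hooks) as sess:
            while not sess.should_stop():
                sess.run(train_op)
                iterations += 1
    return iterations
```

The training loop itself contains no stopping or checkpointing logic: both are delegated to the session and its hooks.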
Asynchronous Advantage Actor-Critic
To enhance reinforcement learning, the Asynchronous Advantage Actor-Critic (A3C) algorithm can be employed. In contrast to a deep Q-learning network, it employs multiple agents represented by multiple neural networks, which interact with multiple environments. Each of the agents interacts with its own copy of the environment and is independent of the experience of the other agents.
Furthermore, this algorithm estimates both a value function and a policy (a set of action probability outputs). The agent uses the value estimate (the critic) to update the policy (the actor) more intelligently than traditional policy gradient methods. Finally, the advantage estimates how much the observed outcome of an action differs from the expected one.
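The "advantage" in the algorithm's name is the gap between the return an agent actually observed and the critic's value estimate for the same state. The sketch below computes it for a short rollout. The function names, the bootstrapping from the value of the state after the rollout, and the default `gamma` are assumptions for illustration.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns for each rollout step, bootstrapped from the
    critic's value estimate of the state that follows the rollout."""
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Advantage per step: observed return minus the critic's estimate."""
    returns = n_step_returns(rewards, bootstrap_value, gamma)
    return [ret - value for ret, value in zip(returns, values)]
```

A positive advantage means the action turned out better than the critic expected, so the actor's probability of choosing it is increased. A negative advantage pushes that probability down.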
All of the above can be applied in such fields as robotics, finance, industrial optimization, and predictive assistance.
Want details? Watch the video!
- What Is Behind Deep Reinforcement Learning and Transfer Learning with TensorFlow?
- Learning Game Control Strategies with Deep Q-Networks and TensorFlow
- TensorFlow in Action: TensorBoard, Training a Model, and Deep Q-learning
- Performance of DL Frameworks: Caffe, Deeplearning4j, TensorFlow, Theano, and Torch
About the speaker
Illia Polosukhin is a chief scientist and a co-founder at XIX.ai. Prior to that, he worked as an engineering manager at Google. Illia is passionate about all things artificial intelligence and machine learning. He holds a master’s degree in Applied Math and Computer Science from Kharkiv Polytechnic Institute. You can check out his GitHub profile.