The application of reinforcement learning (RL) in an enterprise context has been limited. When Tractica blogged about RL in the enterprise in 2017, the discussion was mostly about exploring use cases and development platforms. For the most part, RL has been applied in manufacturing, energy, building automation, and automotive settings, where it is used to optimize and improve the performance of autonomous machines or control systems that can be simulated in an RL-driven environment. Rather than have an AI model learn from a predefined dataset and spot patterns in the data, RL enables the model to learn by interacting with its environment and receiving rewards for the actions it takes.
Almost 12 months after Tractica published that blog post, the needle has not moved much in terms of applying RL within the enterprise. The recent acquisition of Bonsai by Microsoft is a positive development for RL and should democratize and simplify the application of RL in the enterprise. However, the fruits of the Bonsai acquisition have yet to be seen across Microsoft’s Azure platform or its Cognitive Services portfolio.
As we head into 2019, there are some reasons to be optimistic about RL and its move beyond academic papers into enterprise production environments.
Facebook Open Sources its Reinforcement Learning Platform
Horizon is the RL platform that Facebook has been using internally to improve several of its applications, from managing the quality of 360° video to filtering suggestions for its Messenger application. RL work is often confined to “simulation” environments that map poorly to the real world. Facebook has sidestepped that limitation by using RL to automate production AI systems, with an emphasis on policy optimization.
Typically, in a machine learning (ML) or deep learning workflow, the AI model provides predictions based on a certain dataset, but as the data changes, the models and the predictions also need to change. This constant feedback loop that needs to be maintained between the data and the predictions has been a major challenge for AI development teams and platforms.
For mission-critical, large-scale AI applications, Horizon provides a framework for training the RL model on offline data first, using a technique known as counterfactual policy evaluation (CPE) to check that the RL decisions are accurate and admissible before the model is deployed in a live environment. Model training is performed in PyTorch, and the trained models are exported to Caffe2 using the Open Neural Network Exchange (ONNX) format, which Facebook co-developed, when they are ported to a production environment. Horizon also supports training across multiple graphics processing units (GPUs) and central processing units (CPUs).
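To make the idea of counterfactual policy evaluation more concrete, here is a minimal sketch of one standard estimator from the off-policy evaluation literature, inverse propensity scoring (IPS). It estimates how a candidate policy would have performed using only logged data from an older policy. This is an illustration under stated assumptions, not Horizon's implementation: the function name `ips_estimate`, the log layout, and the notification example are all hypothetical, and the text above does not say which estimators Horizon uses.

```python
from typing import Callable, Sequence, Tuple

# One logged interaction (hypothetical layout): context, action taken,
# reward observed, and the probability the logging policy gave that action.
LoggedStep = Tuple[str, str, float, float]


def ips_estimate(
    logs: Sequence[LoggedStep],
    target_prob: Callable[[str, str], float],
) -> float:
    """Estimate the average reward the target policy would have earned."""
    if not logs:
        raise ValueError("need at least one logged step")
    total = 0.0
    for context, action, reward, logging_prob in logs:
        if logging_prob <= 0.0:
            raise ValueError("logging probability must be positive")
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)


# Hypothetical logs from a policy that sends or skips a notification
# with equal probability, regardless of the user.
logs = [
    ("active_user", "send", 1.0, 0.5),
    ("active_user", "skip", 0.0, 0.5),
    ("idle_user", "send", 0.0, 0.5),
    ("idle_user", "skip", 1.0, 0.5),
]


def candidate_policy(context: str, action: str) -> float:
    # Deterministic candidate: notify active users, leave idle users alone.
    preferred = "send" if context == "active_user" else "skip"
    return 1.0 if action == preferred else 0.0


logged_average = sum(step[2] for step in logs) / len(logs)
estimate = ips_estimate(logs, candidate_policy)
print(f"logged policy: {logged_average:.2f}, candidate estimate: {estimate:.2f}")
```

In this toy data the candidate policy is estimated to earn twice the average reward of the logging policy, and that comparison is made without ever running the candidate against live users. IPS estimates can be noisy when logging probabilities are small, which is one reason production systems typically combine several estimators.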
Using the Horizon framework, Facebook was able to apply RL to large-scale, mission-critical applications like user notifications and 360° video, and to optimize policies in real time. Rather than have Facebook engineers manually change policies and models on a daily basis, Horizon allows for automated policy optimization and model training based on changing user data.
Expect More Reinforcement Learning Platforms in 2019
Facebook’s Horizon is a step in the right direction for enterprise adoption and application of RL. Until now, RL has been viewed through a narrow lens, mostly focused on “simulation-based” applications. While simulation is still a key component of any RL development, Horizon extends the applicability of RL into a large-scale AI production environment where RL models can be initialized using offline data, after which they can be integrated into a live environment where models are trained in a constant feedback loop. Horizon has not just automated policy optimization at Facebook; it has also allowed Facebook to use a Deep Q-Network (DQN)-based RL model to improve the performance of notifications without sacrificing quality.
While Horizon has been built on Facebook-friendly frameworks like PyTorch and Caffe2, the ONNX capability allows for integration with almost all ML frameworks, including TensorFlow, MXNet, CNTK, and PaddlePaddle. Despite Facebook’s best intentions of driving open source tools, Tractica expects Google, Microsoft, and Amazon to follow up with their own RL-based development environments in 2019.
Tractica has been speaking to many developers and platform vendors in the AI space and there is a growing need for tools that support the automated retraining of models based on changing datasets. Horizon’s ability to use RL to support “live AI models” is likely to be replicated across other platforms and is a great application of RL.
Rather than have RL target a specific application area or set of use cases, it could become a standard feature of AI platforms going forward. Facebook’s RL model support includes DQN, parametric DQN, and deep deterministic policy gradient (DDPG) models, all of which are neural network or deep learning variants of RL. As the platforms grow and as AI developers start to apply RL across a wide variety of use cases, expect more RL models to emerge.
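For readers unfamiliar with DQN, the core of the method is a regression target built from the observed reward plus the discounted value of the best action in the next state. The sketch below computes that one-step target for a small batch in plain Python. It is a simplified illustration, not Facebook's code: the function name `dqn_targets` and the sample numbers are hypothetical, and a full DQN would also include a neural network that produces the Q-values, a replay buffer, and a separate target network.

```python
from typing import List, Sequence


def dqn_targets(
    rewards: Sequence[float],
    next_q_values: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """Compute one-step Q-learning targets: r + gamma * max_a' Q(s', a')."""
    targets = []
    for reward, q_next, done in zip(rewards, next_q_values, dones):
        # No future value is added when the episode ended at this step.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(reward + bootstrap)
    return targets


# Hypothetical batch of two transitions with two possible actions each.
targets = dqn_targets(
    rewards=[1.0, 0.0],
    next_q_values=[[0.5, 2.0], [3.0, 1.0]],
    dones=[False, True],
    gamma=0.9,
)
print(targets)
```

During training, the network's prediction for the action actually taken is pushed toward these targets. Parametric DQN and DDPG build on the same idea for large or continuous action spaces.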