A quiet revolution is taking place in artificial intelligence (AI) software. Until now, AI developers have proved their mettle by using trial and error to choose the algorithm, or ensemble of algorithms, best suited to a specific problem. Model selection then brings its own optimization work: tuning the parameter space. This can be a major issue for deep learning in terms of development time and cost, as training deep learning networks can occupy multiple graphics processing unit (GPU) clusters. For example, Google Brain is known to have spent more than 18 million processor hours on one of its papers presented at the International Conference on Learning Representations (ICLR) 2017, while Google used 800 GPUs for the experiments in another paper.
Mind-Boggling Costs and Compute Requirements
AI cloud processing costs approximately $0.70 per hour, so the compute bill for a single paper can run into the millions of dollars. It is no surprise that the best papers in AI today come from hyperscale companies like Google and Facebook, which have built their own GPU cloud rigs. Today, most state-of-the-art algorithms in AI have a billion or more parameters to tweak. While the primary reason for increasing compute requirements for state-of-the-art AI algorithms is the increasing size of the workload driven by large datasets, the impact of model engineering, especially testing and tuning different models, hyperparameters, and network architectures, adds to the development cost and time. The design of a deep learning network involves manual processes that quickly become bottlenecks, and the deeper the network, the more intensive they become. For example, a 10-layer deep neural network with 10 design choices per layer has 10^10 candidate network combinations, a mind-boggling search space for any data scientist to work with.
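The combinatorics above are easy to verify with back-of-the-envelope arithmetic. The 10-choices-per-layer figure and the one-GPU-minute-per-candidate evaluation cost are illustrative assumptions, not numbers from any specific paper:

```python
# Search-space arithmetic for the 10-layer example above.
# Assumption (illustrative): each of the 10 layers can take one of
# 10 design choices (layer type, width, activation, etc.).
choices_per_layer = 10
layers = 10
candidates = choices_per_layer ** layers
print(candidates)  # 10000000000 candidate architectures

# Assumption (illustrative): one GPU-minute to evaluate each candidate.
# Exhaustive search would then take:
gpu_years = candidates / 60 / 24 / 365
print(f"{gpu_years:,.0f} GPU-years")  # 19,026 GPU-years
```

Even with these generous assumptions, exhaustive search is hopeless, which is why the smarter search strategies discussed below matter.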
We are now starting to see techniques like reinforcement learning and evolutionary algorithms being used to choose the best network architectures, something that Google has been developing with its automatic machine learning (AutoML) feature specifically for deep learning. Microsoft’s AutoML challenge has invited teams over the last few years to perform fully automatic machine learning (mostly classical machine learning like regression or classification) without any human intervention. Although there is room for improvement, the processes of parameter tuning and model selection are slowly becoming automated.
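To make the evolutionary approach concrete, here is a minimal sketch of evolutionary architecture search. Everything in it is a toy: an "architecture" is just a list of layer widths, and `fitness()` is a stand-in for the validation accuracy a real system would obtain by actually training each candidate network.

```python
import random

random.seed(0)

WIDTHS = [16, 32, 64, 128]  # allowed layer widths (toy search space)

def random_architecture(depth=4):
    """Sample a random architecture: one width per layer."""
    return [random.choice(WIDTHS) for _ in range(depth)]

def fitness(arch):
    # Placeholder objective: reward capacity, penalize parameter count.
    # A real system would train the network and measure accuracy instead.
    capacity = sum(arch)
    params = sum(a * b for a, b in zip(arch, arch[1:]))
    return capacity - 0.001 * params

def evolve(generations=20, pop_size=10):
    population = [random_architecture() for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half, replace the rest with mutated copies.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            child[random.randrange(len(child))] = random.choice(WIDTHS)
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)

best = evolve()
print(best, fitness(best))
```

The loop structure (sample, score, select, mutate) is the essence of the approach; production systems like Google's differ mainly in the richness of the search space and the cost of the fitness evaluation.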
Automatically Identifying Model Selection and Problem Domains
Several open-source tools, such as TPOT, Auto-Sklearn, Auto-WEKA, and machineJS, provide AutoML features; most come from university research labs. While Google develops its own AutoML capability, Facebook has already incorporated AutoML into its FBLearner Flow architecture for hyperparameter optimization and hopes to expand its use further. Companies like Veritone use AutoML for model selection, drawing on a database of 100+ machine learning engines to solve different classes of problems, including language transcription, face recognition, object recognition, sentiment analysis, and language translation. Commercial solutions for enterprise-grade AutoML are also becoming available, such as DataRobot, and other enterprise-grade AI platform providers like Skytree, Ayasdi, and H2O now support AutoML features for hyperparameter tuning or model selection.
AutoML has been shown to reduce development time significantly, from multiple hours to a few minutes, in areas like model selection. Most problem domains, such as image recognition, natural language processing (NLP), and game playing, favor a specific class of algorithms: image recognition generally uses convolutional neural networks (CNNs), NLP uses recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, and game playing uses reinforcement learning. However, once we dive deeper into a class of problems, there may be models and algorithms that work best on a specific class of images, languages or terminologies, or types of games. Different neural network architectures could be deployed in deep learning, such as different combinations of previous and hidden states. One class of algorithm, like a CNN, could work in combination with reinforcement learning or LSTMs to achieve the right results. The model ensemble approach becomes easier with AutoML, which breaks away from the limitations of a human data scientist’s time, skills, and creativity.
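The model-selection loop that AutoML tools automate can be hand-rolled in a few lines with scikit-learn: try several algorithm families on the same data and keep the one with the best cross-validated score. This is a sketch only; the candidate list and the toy dataset are illustrative, not any vendor's actual pipeline.

```python
# Manual model selection: the loop AutoML automates.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate algorithm families (illustrative choices).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

# Score each candidate with 5-fold cross-validation and keep the best.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```

Tools like TPOT or Auto-Sklearn extend this loop to search over preprocessing steps, hyperparameters, and ensembles as well, which is exactly where a human data scientist's time and patience run out.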
Increasing Processing Capabilities and Decreasing Compute Costs
AutoML capabilities will grow over time. While some compute capacity is saved by not having to train and retrain models to find the best fit, as workloads grow and problems become more complex, it will become common to throw heavy compute resources at having the AI find the best ensemble of models. Over time, Tractica expects processing capabilities to increase and compute costs to decrease, especially with the emergence of customized application-specific integrated circuit (ASIC) architectures for AI. It will become commonplace for AI to use massive compute capabilities at the edge or in the cloud to find the best ensemble of models for a specific problem.
Therefore, Tractica expects a rebalancing of compute demand in the medium to long term. In the short term, there is likely to be an impact from one-shot learning techniques and pre-trained models that reduce the need for compute resources, and from compressed frameworks that allow existing hardware to process advanced algorithms at the edge; all of these techniques optimize software for a given hardware. In the medium to long term, hardware will catch up: rather than worrying about compute costs and time, systems will be capable of performing large-scale model selection and parameter tuning, becoming smart about how to train and run inference on AI algorithms.
The rise of AutoML also moves us one step closer to realizing strong AI, or generalized intelligence, as AI starts to design AI itself. Rather than having humans code specific algorithms for specific problem domains, AI can control the ensemble of models and apply them to a family of problems.