NVIDIA’s Inference Push for AI


NVIDIA’s graphics processing unit (GPU) architecture has revolutionized AI processing and made the company a dominant player in AI hardware. NVIDIA’s dominance has come largely on the back of the rise of deep learning, because its massively parallel architecture speeds up the training portion of the deep learning workflow. Deep learning model training involves feeding large datasets into a neural network, and GPUs have been the main workhorse, delivering a 10X to 100X performance gain over central processing units (CPUs). Moreover, the more GPUs you put to the task, the faster training runs, cutting training times from weeks to a few hours.
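A common way to spread training across GPUs is data parallelism: each GPU processes a slice of every training batch, and the partial gradients are combined before the weights are updated. The Python sketch below illustrates the idea on a toy one-parameter linear model. The function names and the model are illustrative assumptions, the "workers" run sequentially rather than on real GPUs, and the sketch ignores the communication (all-reduce) cost that real multi-GPU frameworks pay.

```python
# Sketch of synchronous data-parallel training on a toy model y = w * x with
# mean-squared-error loss. Each "worker" stands in for one GPU; here the
# workers run one after another, whereas real GPUs would run concurrently.


def shard(batch, num_workers):
    """Split a batch into at most num_workers contiguous, near-equal shards."""
    size = -(-len(batch) // num_workers)  # ceiling division
    return [batch[i:i + size] for i in range(0, len(batch), size)]


def local_gradient(w, samples):
    """Sum of d/dw (w*x - y)**2 over one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in samples)


def data_parallel_step(w, batch, num_workers, lr):
    """One SGD step: workers compute partial gradients on their shards, the
    partials are summed (the role an all-reduce plays across GPUs), and the
    total is averaged over the whole batch before updating w."""
    partials = [local_gradient(w, s) for s in shard(batch, num_workers)]
    grad = sum(partials) / len(batch)
    return w - lr * grad


# Usage: recover the slope of y = 3x using 4 simulated workers.
batch = [(float(x), 3.0 * x) for x in range(1, 9)]
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, batch, num_workers=4, lr=0.01)
print(round(w, 4))  # prints 3.0
```

Because the combined gradient is the same no matter how many shards the batch is split into, adding workers leaves the result unchanged while dividing the per-device work. That is the intuition behind the weeks-to-hours speedups, before communication overhead is taken into account.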

NVIDIA Aims to Ensure Performance Upgrades

NVIDIA’s revenue from its cloud data center business, which supplies GPUs for AI workloads, has grown 524% in the last 4 years and is now estimated to make up more than 20% of its total revenue. NVIDIA is transitioning from a gaming hardware company to an AI hardware company. However, most of this AI revenue has come from training. For inference, Intel’s Xeon x86 architecture has been the workhorse, with most data centers around the world already running Xeons. But NVIDIA is steadily encroaching on Xeon territory.

NVIDIA has had some inference capability, mainly from its lower-precision Tesla P4 and Tesla P40 GPUs, both released in 2016. Hyperscalers have since been adopting NVIDIA’s Tesla chips because of their performance advantage over CPUs and field programmable gate arrays (FPGAs). Video is the main driver of inference workloads, followed by speech recognition, search, image recognition, and maps. Though the inference portion of NVIDIA’s revenue is growing, it is still a very small share of its overall AI revenue, which remains heavily weighted toward training.

NVIDIA has now upgraded its inference lineup with the Tesla T4 accelerator. Based on the new Turing architecture, the T4 raises performance significantly over the Tesla P4: roughly 10X for floating-point workloads and 6X for INT8. Moving from INT8 to INT4 precision doubles throughput again, to 260 tera operations per second (TOPS).
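The doubling from INT8 to INT4 reflects operand width: an INT4 value takes half the bits of an INT8 value, so hardware with a fixed datapath can, in the ideal case, process twice as many of them per cycle, at the cost of coarser numeric resolution. The Python sketch below uses symmetric linear quantization, one common scheme and not necessarily the one NVIDIA’s software stack applies, to show that trade-off. The function names are hypothetical, the throughput helper assumes perfectly linear scaling with bit width, and the 130 INT8 figure is simply half of the 260 peak quoted above.

```python
# Sketch of symmetric linear quantization, one common scheme (an assumption,
# not NVIDIA's documented method), plus an idealized throughput estimate.


def quantize_symmetric(values, num_bits):
    """Map floats to signed ints in [-qmax, qmax] using a single scale factor."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for INT8, 7 for INT4
    max_abs = max(abs(v) for v in values) or 1.0   # avoid a zero scale
    scale = max_abs / qmax
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Approximately recover the original floats."""
    return [x * scale for x in q]


def max_error(values, num_bits):
    """Worst-case absolute error after a quantize/dequantize round trip."""
    q, scale = quantize_symmetric(values, num_bits)
    return max(abs(v - r) for v, r in zip(values, dequantize(q, scale)))


def ideal_throughput(ops_at_base, base_bits, new_bits):
    """Idealized peak: halving operand width doubles operations per second."""
    return ops_at_base * base_bits / new_bits


weights = [0.1, -0.5, 0.25, 1.0, -0.75]
print(max_error(weights, 8) < max_error(weights, 4))  # prints True
print(ideal_throughput(130, base_bits=8, new_bits=4))  # prints 260.0
```

Real accelerators rarely hit this ideal exactly, and INT4 is usually reserved for models that tolerate the larger rounding error shown by `max_error`.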

(Source: NVIDIA)

NVIDIA is clearly gaining customers, with Microsoft seeing a 60X reduction in video recognition latency and SAP seeing a 40X performance gain on image recognition tasks. Hyperscalers like Microsoft and Google that prioritize inference latency are likely to see latency improvements of an order of magnitude or more compared with CPUs or even the older Tesla P4 GPU.

(Source: NVIDIA)

Inference Hardware Market Holds Great Potential

NVIDIA believes that inference hardware (not chipsets) will be a $15 billion market by 2020, bigger than the training market. Tractica’s own analysis supports the thesis that inference will be bigger than training, but Tractica sees the inference market as much larger. According to Tractica’s Deep Learning Chipsets report, by 2025 cloud-based AI chipsets will account for $14.6 billion in revenue, while edge-based AI chipsets will account for $51.6 billion.

Essentially, inference will be 3.5X bigger than training in terms of market potential, with most of that inference being driven at the edge. Most of this edge-based chipset revenue will come from mobile phones, smart speakers, drones, AR/VR headsets, robots, security cameras, and other devices, all of which are going to need AI-based edge processing.
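The 3.5X multiple appears to come from dividing Tractica’s 2025 edge-chipset forecast by its cloud-chipset forecast, treating edge revenue as a rough proxy for inference:

$$\frac{\$51.6\ \text{billion (edge)}}{\$14.6\ \text{billion (cloud)}} \approx 3.53 \approx 3.5\times$$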

Deep Learning Chipset Revenue by Market Sector, World Markets: 2016-2025

(Source: Tractica)

NVIDIA’s T4, with its 70 W power envelope, is targeted at the data center rather than the edge. NVIDIA does have some inference business in the automotive, smart city, and robotics sectors through its Xavier and Jetson platforms; however, competition in that space is going to heat up. GPUs have yet to prove their mettle in low-power embedded applications, where their power consumption puts them at a disadvantage. Jetson, NVIDIA’s lowest-power GPU module, has a 7.5 W power budget. That might work for a plugged-in security camera, but not for drones, mobile phones, augmented reality (AR)/virtual reality (VR) headsets, or smart speakers, which typically require 5 W or less.

Can NVIDIA Adapt to Meet the Changing Needs of the AI Hardware Market?

Alternative chipset architectures such as system-on-chip (SoC) accelerators and, eventually, application-specific integrated circuits (ASICs) are expected to lead the edge inference market. Tractica’s latest report on Artificial Intelligence for Edge Devices examines the various chipset architectures and the pros and cons of each, the device categories driving edge AI, and the future of GPUs. The report suggests that today’s regime of cloud-based training and inference will give way to a more decentralized approach: training of larger models will still happen in the cloud, but most AI models will be trained and run at the edge.

Based on NVIDIA’s current portfolio and its latest inference push, it is hard to see NVIDIA maintaining its stronghold on the overall AI hardware market as that market shifts away from the data center. NVIDIA will either have to invent new techniques that deliver higher performance within low power budgets or acquire one of the many AI hardware startups that have emerged, many of which are focused on the edge AI opportunity, especially low-power embedded applications.
