Race for AI Chips Begins
Deep learning has continued to drive the computing industry’s agenda in 2016. But come 2017, experts say the Artificial Intelligence community will intensify its demand for higher-performance, more power-efficient “inference” engines for deep neural networks.
Source: Marc Duranton
Current deep learning systems rely on large amounts of computing power to define the network, on big data sets for training, and on access to large computing systems to accomplish their goals.
Unfortunately, the efficient execution of deep learning is not so easy on embedded systems (such as cars, drones and Internet of Things devices), whose processing power, memory size and bandwidth are usually limited.
This problem leaves the door wide open for innovative technologies that can put deep neural network power into end devices.
“Deploying Artificial Intelligence at the edge [of the network] is becoming a massive trend,” Movidius CEO, Remi El-Ouazzane, told us a few months ago.
Asked what’s driving AI to the edge, Marc Duranton, Fellow of CEA's Architecture, IC Design and Embedded Software division, cited three factors in a recent interview with EE Times: “safety, privacy and economy.” These are prompting the industry to process data at the end node. Duranton sees a growing demand to “transform data into information as early as possible.”
Think autonomous cars, he said. If the goal is safety, autonomous functions shouldn’t rely on always-on connectivity to the network. When an elderly person falls at home, the incident should be detected and recognized locally. That’s important for privacy reasons, said Duranton. But not having to transmit all the images collected from 10 cameras installed at home just to trigger an alarm can also reduce “power, cost and data size,” Duranton added.
Race is on
In many ways, chip vendors are fully cognizant of this increasing demand for better inference engines.
Semiconductor suppliers like Movidius (armed with Myriad 2), Mobileye (EyeQ 4 & 5) and Nvidia (Drive PX) are racing to develop ultra-low-power, higher-performance hardware accelerators that can run deep learning more effectively on embedded systems.
Their SoC work illustrates that inference engines are already becoming “a new target” for many semiconductor companies in the post-mobile era, observed Duranton.
Google’s Tensor Processing Units (TPUs), unveiled earlier this year, marked a turning point for an engineering community eager for innovations in machine learning chips.
At the time of the announcement, the search giant described TPUs as offering “an order of magnitude higher performance per Watt than commercial FPGAs and GPUs.” Google revealed that the accelerators were used for the AlphaGo system, which beat a human Go champion. However, Google has never discussed the details of TPU architecture, and the company won’t be selling TPUs on the commercial market.
Many SoC designers believe Google’s move made the case that machine learning needs a custom architecture. But as they attempt to design a custom machine-learning chip, they wonder what its architecture should look like. More important, they want to know if the world already has a benchmarking tool to gauge deep neural network (DNN) performance on different types of hardware.
Tools are coming
CEA said it’s fully prepared to explore different hardware architectures for inference engines. CEA developed a software framework, called N2D2, enabling designers to explore and generate DNN structures. “We developed this as a tool to select the right hardware target for DNN,” said Duranton. N2D2 will become available as open source in the first quarter of 2017, he promised.