AI Chips for Training and Inference
GPU (Graphics Processing Unit) chips were originally developed to render 3D graphics onscreen. However, GPUs have proved well suited to certain specialized computational tasks because they can perform massively parallel computation in a way that CPUs cannot.
How are GPUs different from CPUs? CPUs perform serial tasks very quickly but offer very little parallelism: a mid-range CPU may have a handful of cores, while a mid-range GPU has several thousand. Individual GPU cores are much slower and less powerful, but they run in parallel. This parallelism makes GPUs a good fit for neural networks because of the kind of math they perform: matrix multiplication.
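For concreteness, here is a minimal sketch using PyTorch (one popular ML framework, not something this page prescribes) that runs the same matrix multiplication on the CPU and then on a GPU. The matrix size and the use of CUDA are assumptions chosen purely for illustration.

```python
import time
import torch

# Two large matrices; multiplications like this dominate neural network workloads.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

start = time.time()
cpu_result = a @ b                      # executes on a handful of CPU cores
print(f"CPU: {time.time() - start:.3f}s")

if torch.cuda.is_available():           # only runs if an NVIDIA GPU and CUDA are present
    a_gpu, b_gpu = a.cuda(), b.cuda()
    start = time.time()
    gpu_result = a_gpu @ b_gpu          # executes across thousands of GPU cores in parallel
    torch.cuda.synchronize()            # GPU kernels launch asynchronously; wait before timing
    print(f"GPU: {time.time() - start:.3f}s")
```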
GPUs were popularized in the ML community after breakthroughs in 2009 and 2012 in which researchers co-opted NVIDIA GPUs, together with NVIDIA's CUDA library, to train image recognition models orders of magnitude faster than was previously possible.
Anecdote: GPUs were also popularized by cryptocurrency mining for the same reason: they can substantially outpace CPUs on tasks that benefit from parallel computation.
NVIDIA shares the GPU market with AMD, but NVIDIA dominates the ML segment of the market because of its CUDA (and, later, cuDNN) libraries, which have gained widespread adoption.
For performance reasons, CPUs are not well suited to training models. That said, CPUs are often used for inference, where the massive parallelism of a GPU can be more than the task requires.
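As an illustration, the following sketch runs a single inference pass entirely on the CPU with PyTorch; the choice of ResNet-18 and the random input tensor are stand-ins for illustration, not something this page prescribes.

```python
import torch
from torchvision import models

# Load a pretrained image classifier and run one forward pass on the CPU.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()                                 # inference mode: disable training-only behavior

image_batch = torch.randn(1, 3, 224, 224)    # stand-in for a real preprocessed image

with torch.no_grad():                        # skip gradient bookkeeping needed only for training
    logits = model(image_batch)
    predicted_class = logits.argmax(dim=1)
```

For many workloads a forward pass like this is fast enough on a CPU that a GPU would add cost without a corresponding benefit.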
Although GPUs are much faster than CPUs for training ML models, they still include capabilities that are irrelevant to the ML user, such as the hardware for shaders, physics computation, and rendering 3D environments. As a result, several purpose-built AI chips are currently under development by tech giants and startups alike:
FPGAs (field-programmable gate arrays) are purpose-built yet reprogrammable, making them generic enough to accommodate multiple types of tasks, from encryption to encoding. Example: Microsoft Brainwave.
ASICs (application-specific integrated circuits) are typically designed for a single, specific task. Example: Google TPU (see the sketch after this list).
Other examples include Intel Nervana, Cerebras, Graphcore, SambaNova, Wave Computing, and Groq.
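For a sense of what targeting an ASIC looks like in practice, here is a minimal sketch of preparing a model to run on a Google TPU using TensorFlow's TPUStrategy. The availability of an attached TPU runtime and the toy model are assumptions made for illustration only.

```python
import tensorflow as tf

# Locate and initialize the TPU system (only works where a TPU runtime is attached).
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# TPUStrategy replicates the model across the TPU's cores.
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # Any Keras model built inside this scope is compiled for and executed on the TPU.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```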
To deal with latency-sensitive applications or devices that may experience intermittent or no connectivity, models can also be deployed to edge devices.
Smartphone chips and dedicated parts like the Google Edge TPU are examples of very small AI chips used for ML. They typically perform only the inference side of ML due to their limited power and performance. Deployment environments include driverless cars, robots, and IoT devices.
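As one concrete path, edge inference is commonly done with a compiled model format such as TensorFlow Lite (the format the Edge TPU consumes). The sketch below assumes a converted model file named `model.tflite`, which is a hypothetical placeholder.

```python
import numpy as np
import tensorflow as tf

# Load a converted TensorFlow Lite model and run one inference pass on-device.
interpreter = tf.lite.Interpreter(model_path="model.tflite")   # hypothetical file name
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Stand-in input that matches the model's expected shape and dtype.
input_data = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])

interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()                                            # runs inference
prediction = interpreter.get_tensor(output_details[0]["index"])
```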
Gradient supports GPUs and CPUs natively in both the hosted and customer-managed environments. Gradient recently announced upcoming support for the Intel Nervana chip. Other chips will most likely be supported in the future.