The Tensor Processing Unit (TPU) is a high-performance ASIC chip that is purpose-built to accelerate machine learning workloads. Models that previously took weeks to train on general purpose chips like CPUs and GPUS can train in hours on TPUs. The TPU was developed by Google and is only available in Google Cloud.
There are a few drawbacks to be aware of:
The topology is unlike other hardware platforms and is not trivial to work with for those not familiar with DevOps and the idiosyncrasies of the TPU itself
The TPU only supports TensorFlow currently, although other frameworks may be supported in the future
Certain TensorFlow operations (e.g. customer operations written in C++) are not supported
TPUs are optimal for large models with very large batch sizes and workloads that are dominated by matrix-multiplication. Models dominated by algebra will not perform well.
Here's a full rundown of the architecture and a performance benchmark: