TL;DR
- Google has announced the eighth generation of its Tensor Processing Units (TPUs) for its data centers.
- The new generation is split by workload, with separate units for training and inference.
- Google says this reduces the energy each workload consumes, which should benefit the environment.
At its Google Cloud Next event last year, Google announced the Ironwood class of tensor processing units (TPUs) that power its data centers. Designed with the AI era in mind, those TPUs focused largely on inference: generating predictions from what a model has learned in training (essentially what chatbots do), without knowing the answer in advance. This year, Google has pushed its TPU hardware further and is now splitting training and inference onto separate silicon.
At Cloud Next 2026, Google announced its eighth generation of TPUs, with different architectures for different purposes. The new lineup includes the TPU 8T, built for training AI models, and the TPU 8i, specialized for inference.
Google says the split addresses the different power and computing requirements of the two processes. The approach should help its data centers cut energy consumption, reducing operating costs as well as AI’s environmental footprint. In other words, your Gemini queries may soon (hopefully!) consume a lot less of the water used to keep data centers cool.
Training a neural network demands high-bandwidth memory and large arrays of processing units because it requires updating billions of parameters every second. Training relies on a process called “backward propagation of errors,” which runs countless feedback loops that measure and shrink the network’s error on the training set until its answers become accurate. It’s basically like quizzing a person until they reliably give the correct answer.
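To make that concrete, here’s a minimal sketch of one such feedback loop in JAX, the framework Google pairs with its TPUs. The toy model, data, and learning rate below are hypothetical stand-ins for illustration, not anything from Google’s actual stack:

```python
import jax
import jax.numpy as jnp

# A tiny linear model: the "billions of parameters" in miniature
# (here, just a weight matrix and a bias).
def predict(params, x):
    w, b = params
    return x @ w + b

# Mean squared error between predictions and the known answers.
def loss_fn(params, x, y):
    return jnp.mean((predict(params, x) - y) ** 2)

# One step of "backward propagation of errors": jax.grad measures
# how much each parameter contributed to the error, and we nudge
# every parameter against its gradient.
@jax.jit  # compiled for the accelerator (TPU/GPU/CPU)
def train_step(params, x, y, lr=0.01):
    grads = jax.grad(loss_fn)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

# Hypothetical toy data: learn y = 2x + 1.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (256, 1))
y = 2.0 * x + 1.0

params = (jnp.zeros((1, 1)), jnp.zeros(1))
for _ in range(500):  # the feedback loop, repeated until accurate
    params = train_step(params, x, y)
```

Every pass through train_step computes gradients for all parameters at once, which is the memory- and bandwidth-hungry work that training-oriented chips are built around.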

Meanwhile, inference is less intensive and can run on less capable hardware with much lower memory requirements. Using the same hardware for both jobs therefore inflates the effective cost of inference-related tasks.
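Continuing the hypothetical sketch above, inference is just the forward pass: no gradients to compute and no parameters to update, which is why it gets by with far less memory and compute:

```python
# Inference with the trained parameters from the sketch above:
# a single forward pass, no gradients, no updates.
@jax.jit
def infer(params, x):
    return predict(params, x)

new_x = jnp.array([[3.0]])
print(infer(params, new_x))  # roughly [[7.0]] for the toy y = 2x + 1 model
```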
Google previously introduced the TPU v5e (where the “e” stands for efficiency) for very small-scale operations. The new TPU 8i looks like a major optimization of that earlier hardware. Amazon is chasing a similar split with its AWS Inferentia chips.
While Google has pointed to the environmental benefits of a dedicated inference TPU, we haven’t seen any promises about lower prices. It remains to be seen whether Google will pass some of the savings on to consumers or keep the benefits for itself and its enterprise customers.
