(CNA Reporter Chang Hsin-yu, Las Vegas, 22nd, Exclusive Dispatch) The era of AI agents has arrived. Betting that inference will become the largest source of future computing demand, Google today unveiled the eighth generation of its AI chip, the TPU. Unlike previous generations, the new generation consists of two products: the TPU 8t, which focuses on training and significantly shortens model training time, and the TPU 8i, which focuses on inference and reduces data access latency.
As artificial intelligence (AI) moves from the conversational era into the agentic era, market demand for inference continues to expand. As the market had anticipated, AI leader Google today unveiled its next-generation in-house chip, the TPU (Tensor Processing Unit), at the Google Cloud Next conference in Las Vegas.
The new-generation TPU comes in two versions: the TPU 8t, built specifically for training, and the TPU 8i, built specifically for inference.
Compared to the previous generation Ironwood TPU, both chips offer up to a 2x improvement in performance per watt.
Before the conference officially began, Google showcased its past generations of TPUs at a media-only event. The display ran from the first-generation chip launched in 2015 to the two custom chips unveiled this year for the AI agent era, and camera flashes went off non-stop.
Amin Vahdat, Google's Chief Technology Officer for AI and Infrastructure, stated that Google's pace of innovation continues to accelerate, moving from a new generation every 3 years, to 2 years, to 1 year. He also noted: 'The Google team realized two years ago that one chip a year is not enough; this is our first attempt at introducing two high-performance, specialized AI chips.'
For large-scale training, the TPU 8t offers a 2.8x improvement in cost-performance. In terms of memory, it carries 216 GB of High Bandwidth Memory (HBM) and 128 MB of Static Random-Access Memory (SRAM).
A single TPU 8t Superpod can scale up to 9,600 chips.
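To put the reported figures in perspective, the short sketch below multiplies the per-chip HBM capacity by the maximum Superpod size. It is a back-of-the-envelope estimate rather than a figure Google has published in this report, and it assumes every chip in a full Superpod carries the full 216 GB; the function and variable names are illustrative only.

```python
# Rough estimate only: assumes each of the 9,600 chips in a full
# TPU 8t Superpod carries the full 216 GB of HBM cited in this report.
HBM_PER_CHIP_GB = 216        # per-chip HBM reported for the TPU 8t
CHIPS_PER_SUPERPOD = 9_600   # maximum Superpod size reported


def aggregate_hbm_gb(chips: int, hbm_per_chip_gb: int) -> int:
    """Total HBM across all chips, in gigabytes."""
    return chips * hbm_per_chip_gb


total_gb = aggregate_hbm_gb(CHIPS_PER_SUPERPOD, HBM_PER_CHIP_GB)
total_pb = total_gb / 1_000_000  # decimal petabytes (1 PB = 1,000,000 GB)

print(f"{total_gb:,} GB, or about {total_pb:.2f} PB")
```

Under these assumptions, a fully populated Superpod would hold roughly 2 PB of HBM in aggregate.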
Google also announced a network architecture named Virgo, which is crucial for training ultra-large models using the TPU 8t.
The inference-focused TPU 8i has higher memory bandwidth, significantly reducing inference latency. It is equipped with 288 GB of HBM and 384 MB of SRAM, which helps it break through the 'memory wall', the latency and high energy consumption caused by frequent data movement.
Notably, the TPU 8i utilizes a new network topology design called Boardfly to improve communication efficiency between chips.
Vahdat indicated that Google's two new chips will be available to cloud customers later this year.
Google's TPUs have historically been co-developed with Broadcom, but rumors suggest MediaTek has secured a large order for the new-generation inference chip. In response to inquiries from CNA, Google said it would not publicly discuss details regarding its supply chain partners. (Editor: Chang Chih-hsuan) 1150422
FACT BOX
- Source: CNA (Central News Agency)
- Category: New Product
- Organizations: Broadcom
- Products / services: TPU 8t / TPU 8i