CUDA and tensor cores made Nvidia the AI hardware standard
This article analyzes Nvidia's strategic evolution from a graphics chip company to a dominant force in AI, focusing on the development of its GPU architecture and CUDA software platform. It details how architectural innovations such as unified shaders (G80) and specialized Tensor Cores (Volta, Hopper, Blackwell) made its GPUs foundational for deep learning and large language models. It also highlights CUDA as a critical competitive advantage that fostered a robust ecosystem for parallel computing and AI research.
Key Takeaways
- GeForce 256 in 1999 moved Transform and Lighting onto the GPU, creating the first mass-market chip for geometric processing.
- The G80 architecture in 2006 replaced separate vertex and pixel shaders with unified Streaming Multiprocessors, making the GPU broadly parallel and programmable.
- CUDA launched in November 2006 and was shipped across Nvidia’s GPU lineup, from $3,000 workstation cards to $50 budget cards.
- AlexNet was trained in 2012 on two GeForce GTX 580 3GB GPUs and reached a 15.3% top-5 error rate on ImageNet, versus 26.2% for the runner-up.
- Blackwell B200 introduced a dual-die package, a 10 TB/s die-to-die interface, and FP4 precision for large-scale inference.
Why It Matters
Nvidia’s lead now rests as much on CUDA and software libraries like cuDNN and TensorRT as on chip design, and the article shows how each architecture change widened the gap for AI training and inference. That matters for the broader AI stack: PyTorch and TensorFlow are deeply optimized for CUDA, while competitors such as AMD’s MI300X and Intel’s Gaudi 3 still have to contend with that installed base. The clearest signal to watch next is the pace of Blackwell adoption, especially the NVL72 rack, which the article says draws 120 kilowatts and requires liquid cooling.
Read full article at crvscience.com
