Google introduces Gemma 3, a large language model that can run on a single GPU
Google has unveiled a new version of its family of open large language models (LLMs), Gemma 3, which builds on the technology and research behind Gemini 2.0. It can run on a single GPU or tensor processing unit (TPU) while outperforming models from DeepSeek, OpenAI, and Meta.
The new model offers out-of-the-box support for over 35 languages, with pretrained support for over 140. It can analyze text, images, and short videos, and its 128,000-token context window lets it process and reason over large amounts of data in a single request.
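As a minimal sketch of what single-GPU use looks like in practice, the snippet below loads a small instruction-tuned Gemma 3 checkpoint with Hugging Face transformers; the model id, dtype, and generation settings here are illustrative, and a transformers release with Gemma 3 support is assumed:

```python
# Minimal sketch: running a small Gemma 3 instruction-tuned checkpoint
# on one GPU via Hugging Face transformers. Model id and settings are
# illustrative, not a definitive setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-1b-it"  # assumed id of a small text-only variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half the memory of float32 weights
    device_map="auto",           # place the whole model on the single GPU
)

messages = [{"role": "user", "content": "Summarize Gemma 3 in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```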
Gemma 3 also supports function calling and structured output, enabling task automation and agent-based systems. In addition, official quantized versions of the model reduce its size and computational requirements while maintaining high accuracy.
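The official quantized releases ship as separate checkpoints, but as an illustration of the general idea (not Google's exact method), a model can be loaded in 4-bit precision with transformers and bitsandbytes:

```python
# Illustrative only: loading weights in 4-bit to cut memory use, in the
# spirit of the quantized Gemma 3 releases (which are separate checkpoints).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # store weights in 4-bit, compute in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it",  # assumed model id
    quantization_config=quant_config,
    device_map="auto",
)
```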
In the Chatbot Arena Elo rating, where humans vote on anonymized head-to-head battles between LLMs, Gemma 3 outperforms models such as DeepSeek-V3, OpenAI o3-mini, Meta's Llama 405B, and Mistral Large. It is also far more efficient: where DeepSeek's models are estimated to require 32 accelerators, Gemma 3 performs the same tasks (and sometimes better) on a single NVIDIA H100 chip.
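For reference, Elo ratings of the kind Chatbot Arena reports follow the standard logistic formula; below is a minimal sketch of a single rating update (the K-factor and ratings are illustrative, not the Arena's actual parameters):

```python
# Standard Elo: the expected score is logistic in the rating gap.
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """score_a is 1.0 for a win, 0.5 for a tie, 0.0 for a loss."""
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# Example: a 1340-rated model beats a 1300-rated one.
print(update(1340, 1300, 1.0))  # the favorite gains less than K/2 points
```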
Google also notes that the Gemma family of models is now a year old and shares some statistics: the open LLMs have been downloaded over 100 million times, and developers have created more than 60,000 variants within the Gemmaverse ecosystem.