the battle for AI continues

Google launches Gemini—a powerful AI model it says can surpass GPT-4

Google claims Gemini beats GPT-4 in “30 of the 32 widely used academic benchmarks.”

Benj Edwards – Dec 6, 2023 1:01 pm

The Google Gemini logo. Credit: Google

On Wednesday, Google announced Gemini, a multimodal AI model family it hopes will rival OpenAI’s GPT-4, which powers the paid version of ChatGPT. Google claims that the largest version of Gemini exceeds “current state-of-the-art results on 30 of the 32 widely used academic benchmarks used in large language model (LLM) research and development.” It’s a follow-up to PaLM 2, an earlier AI model that Google hoped would match GPT-4 in capability.

A specially tuned English version of its mid-level Gemini model is available now in over 170 countries as part of the Google Bard chatbot—although not in the EU or the UK due to potential regulation issues.

Like GPT-4, Gemini can handle multiple types (or “modes”) of input, making it multimodal. That means it can process text, code, images, and even audio. The goal is to make a type of artificial intelligence that can accurately solve problems, give advice, and answer questions in various fields—from the mundane to the scientific. Google says this will power a new era in computing, and it hopes to tightly integrate the technology into its products.

“Gemini 1.0’s sophisticated multimodal reasoning capabilities can help make sense of complex written and visual information,” writes Google. “Its remarkable ability to extract insights from hundreds of thousands of documents through reading, filtering, and understanding information will help deliver new breakthroughs at digital speeds in many fields from science to finance.”

Google says Gemini will be available in three sizes: Gemini Ultra (“for highly complex tasks”), Gemini Pro (“for scaling across a wide range of tasks”), and Gemini Nano (“for on device tasks,” running on hardware like Google’s Pixel 8 Pro smartphone). Each is likely separated in complexity by parameter count. More parameters mean a bigger neural network that is generally capable of executing more complex tasks but requires more computational power to run. As a result, Nano, the smallest, is designed to run locally on consumer devices, while Ultra can only run on data center hardware.
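To give a rough sense of why parameter count decides where a model can run, here is a minimal back-of-the-envelope sketch. The article does not give parameter counts for any Gemini model, so the three sizes below are purely hypothetical placeholders, and the function name `weight_memory_gb` is ours. The only assumption is that storing the weights takes roughly (number of parameters) times (bytes per parameter), ignoring activations and other runtime overhead.

```python
# Rough estimate of memory needed just to hold a model's weights.
# All parameter counts below are HYPOTHETICAL placeholders, not Gemini figures.

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

# Hypothetical model sizes, chosen only to illustrate the scaling.
HYPOTHETICAL_SIZES = {
    "small (phone-class)": 2e9,
    "medium": 50e9,
    "large (data-center-class)": 500e9,
}

for name, params in HYPOTHETICAL_SIZES.items():
    fp16 = weight_memory_gb(params, 2.0)   # 16-bit weights: 2 bytes each
    int4 = weight_memory_gb(params, 0.5)   # 4-bit quantized: half a byte each
    print(f"{name}: {fp16:.1f} GB at 16-bit, {int4:.1f} GB at 4-bit")
```

Under these assumed numbers, only the smallest model fits in the few gigabytes of memory a phone can spare, which is the general reason small models target devices and large ones stay in data centers.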

Google Gemini promotional video from Google.

“These are the first models of the Gemini era and the first realization of the vision we had when we formed Google DeepMind earlier this year,” wrote Google CEO Sundar Pichai in a statement. “This new era of models represents one of the biggest science and engineering efforts we’ve undertaken as a company. I’m genuinely excited for what’s ahead and for the opportunities Gemini will unlock for people everywhere.”

Although Gemini will come in three sizes, only the mid-level model is available for public use. As mentioned above, Google Bard now runs a specially tuned version of Gemini Pro. From our informal testing so far, Gemini Pro appears to perform much better than the previous version of Bard, which was based on Google’s PaLM 2 language model.

Google also claims that Gemini is more scalable and efficient than its previous AI models when run on Google’s custom Tensor Processing Units (TPU). “On TPUs,” Google says, “Gemini runs significantly faster than earlier, smaller and less-capable models.”

And it’s purportedly great at coding. Google trained a special coding-centric version of Gemini called AlphaCode 2, which “excels at solving competitive programming problems that go beyond coding to involve complex math and theoretical computer science,” according to Google. Gemini is also excellent at inflating Google’s PR language: If the models were any less capable and revolutionary, would the marketing copy be any less breathless? It’s doubtful.

In battle with GPT-4

Gemini isn’t Google’s first attempt to catch up to OpenAI’s ever-evolving GPT-4 model (which is now “GPT-4 Turbo”). The aforementioned PaLM 2, launched in May, was originally supposed to meet that goal. According to Google, Gemini Ultra does outperform GPT-4 on paper, but not everyone is impressed. As MIT Technology Review notes skeptically in its Gemini write-up, “Google DeepMind claims that Gemini outmatches GPT-4 on 30 out of 32 standard measures of performance. And yet the margins between them are thin… To judge from demos, it does many things very well—but few things that we haven’t seen before.”

How thin are the margins? In Google’s press materials, the company provides a chart of eight machine learning benchmarks (MMLU, Big-Bench Hard, DROP, HellaSwag, GSM8K, MATH, HumanEval, and Natural2Code) that aim to measure abilities like Python coding, reading comprehension, multi-step reasoning, commonsense reasoning, basic arithmetic, and general knowledge in 57 subjects. In all metrics except one (the superbly named “HellaSwag”) Gemini Ultra edged out GPT-4 with scores like 83.6 percent vs. 83.1 percent or 74.4 percent vs. 67.0 percent. Here’s the chart:

A Google Gemini benchmark performance chart provided by Google. Credit: Google
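For a quick sense of scale, the two score pairs quoted above can be turned into margins with a few lines of arithmetic. This is only a sketch: the labels `pair_a` and `pair_b` and the helper names are ours, and the article's text does not say which benchmark each pair belongs to.

```python
# Margins for the two Gemini Ultra vs. GPT-4 score pairs quoted in the article.
# Labels are placeholders; the text does not tie each pair to a benchmark.

QUOTED_SCORES = {
    "pair_a": (83.6, 83.1),  # (Gemini Ultra %, GPT-4 %)
    "pair_b": (74.4, 67.0),
}

def margin_points(gemini: float, gpt4: float) -> float:
    """Absolute gap in percentage points, rounded to one decimal."""
    return round(gemini - gpt4, 1)

def relative_gain_percent(gemini: float, gpt4: float) -> float:
    """Gap expressed as a percentage of the GPT-4 score, rounded to one decimal."""
    return round(100 * (gemini - gpt4) / gpt4, 1)

for label, (gemini, gpt4) in QUOTED_SCORES.items():
    print(
        f"{label}: +{margin_points(gemini, gpt4)} points, "
        f"{relative_gain_percent(gemini, gpt4)}% relative to GPT-4"
    )
```

The first pair differs by half a percentage point, while the second differs by 7.4 points, so "thin" describes some of the quoted results far better than others.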

In particular, Google says Gemini Ultra’s score of 90 percent on the MMLU (massive multitask language understanding—testing knowledge of 57 subjects such as math, physics, history, law, medicine, and ethics) makes it the first AI model to outperform human experts on that benchmark.

But what does it all mean? To the average person asking Bard or ChatGPT-4 a question, maybe not much. Google hopes this benchmark performance will translate into more useful and accurate answers. Let’s say you show Bard (using Gemini) a picture of your broken bicycle and hope it can tell you how to fix it. Will it actually be able to do that yet? And if not, do benchmark leads of around 2 percent over GPT-4 actually matter? That’s a value conundrum in the AI space at the moment.

Even to machine learning researchers, the efficacy of machine learning benchmarks is a subject of ongoing research and debate, and their use is sometimes controversial due to the potential of testing an AI model on material that may be found in its data set. So, it’s important to take any metrics like these with a huge grain of salt.

For now, Google hopes that Gemini will be the opening salvo in a new chapter of the battle to control AI assistants in the future, opposing firms like Anthropic, Meta, and the in-tandem duo of Microsoft and OpenAI. Google DeepMind’s website has more information on how Gemini works in detail and how it sees its potential in scientific fields.

Google says that aside from the Pro version now available in Bard, Gemini 1.0 access will roll out over time. It will be part of its Pixel 8 Pro smartphone, which can run Gemini Nano on-device, and in the coming months, Gemini will be integrated into Search, Ads, Chrome, and Duet AI. And beginning December 13, developers and enterprise customers can use Gemini Pro through the Gemini API in Google AI Studio or Google Cloud Vertex AI.


Benj Edwards Senior AI Reporter

Benj Edwards was a reporter at Ars Technica covering artificial intelligence and technology history.

80 Comments

Staff Picks

ronamadeo

So Google admits that Gemini is still worse than ChatGPT right now, but sometime next year, the "Ultra" version might be fractionally better than what ChatGPT is now. Cool. Great.

December 6, 2023 at 6:34 pm

UseServ

Very much so. Their own benchmarks show that the version running on Bard falls way short of GPT-4 and even is outshone by PaLM 2 on many measures.

December 6, 2023 at 7:29 pm