On Wednesday, Google announced Gemini, a multimodal AI model family it hopes will rival OpenAI’s GPT-4, which powers the paid version of ChatGPT. Google claims that the largest version of Gemini exceeds “current state-of-the-art results on 30 of the 32 widely used academic benchmarks used in large language model (LLM) research and development.” It’s a follow-up to PaLM 2, an earlier AI model that Google hoped would match GPT-4 in capability.
A specially tuned English version of Google’s mid-level Gemini model is available now in over 170 countries as part of the Google Bard chatbot, although not in the EU or the UK due to potential regulatory issues.
Like GPT-4, Gemini can handle multiple types (or “modes”) of input, making it multimodal. That means it can process text, code, images, and even audio. The goal is to make a type of artificial intelligence that can accurately solve problems, give advice, and answer questions in various fields—from the mundane to the scientific. Google says this will power a new era in computing, and it hopes to tightly integrate the technology into its products.
“Gemini 1.0’s sophisticated multimodal reasoning capabilities can help make sense of complex written and visual information,” writes Google. “Its remarkable ability to extract insights from hundreds of thousands of documents through reading, filtering, and understanding information will help deliver new breakthroughs at digital speeds in many fields from science to finance.”
Google says Gemini will be available in three sizes: Gemini Ultra (“for highly complex tasks”), Gemini Pro (“for scaling across a wide range of tasks”), and Gemini Nano (“for on device tasks” on hardware like Google’s Pixel 8 Pro smartphone). The three are likely separated in complexity by parameter count. More parameters mean a bigger neural network that is generally more capable of executing complex tasks but requires more computational power to run. That is why Nano, the smallest, is designed to run locally on consumer devices, while Ultra can only run on data center hardware.
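To get a rough feel for why parameter count dictates where a model can run, consider a back-of-the-envelope estimate of the memory needed just to hold a model's weights. The sketch below is illustrative only: the parameter counts are hypothetical placeholders, not figures for any Gemini model, and the 16-bit (2 bytes per parameter) storage format is an assumption.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate memory (in GiB) needed to store the weights alone.

    Ignores activations, caches, and runtime overhead, so real
    requirements are higher than this estimate.
    """
    return num_params * bytes_per_param / 2**30


# Hypothetical parameter counts, chosen only to show the scale difference.
# They are NOT the sizes of Gemini Nano, Pro, or Ultra.
hypothetical_models = {
    "small (phone-class)": 2_000_000_000,
    "large (data-center-class)": 500_000_000_000,
}

for name, params in hypothetical_models.items():
    gib = weight_memory_gib(params)
    print(f"{name}: roughly {gib:.1f} GiB of weights at 16-bit precision")
```

Under these assumptions, the small model's weights would fit in a few gigabytes of memory, which is plausible for a high-end phone, while the large model's would need hundreds of gigabytes spread across multiple data center accelerators.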


