OpenAI’s reign at the forefront of generative AI might be approaching its conclusion, with Google officially unveiling its most advanced large language model to date on Wednesday, named Gemini 1.0. Described by Google’s CEO Sundar Pichai as the inaugural member of “a new generation of AI models, inspired by the way people understand and interact with the world,” Gemini 1.0 marks a significant milestone.
Reflecting on his personal journey from programming AI for computer games as a teenager to his years as a neuroscience researcher studying the intricacies of the brain, Pichai expressed his enduring belief in the potential of building more intelligent machines for the betterment of humanity.
Born out of a collaborative effort between Google’s DeepMind and Research divisions, Gemini 1.0 is equipped with all the advanced features characteristic of cutting-edge generative AIs. According to Pichai, its capabilities are deemed state-of-the-art across nearly every domain.
The system has been meticulously developed as an integrated multimodal AI from the ground up. While many foundational models are essentially composed of smaller models stacked together, each individually trained for specific functions within the larger framework, this approach proves effective for simpler tasks like image descriptions but falls short for intricate reasoning assignments.
In contrast, Google took a different approach with Gemini, opting for pre-training and fine-tuning “from the start on different modalities.” This allows Gemini to seamlessly comprehend and reason across various inputs, surpassing the capabilities of existing multimodal models, particularly in addressing more challenging subjects such as physics, as stated by Pichai.
Gemini’s versatility extends to coding proficiency; it reportedly excels in popular programming languages like Python, Java, C++, and Go. Google has utilized a specialized version of Gemini to develop AlphaCode 2, the successor to the previous year’s competition-winning generative AI. According to the company, AlphaCode 2 outperformed its predecessor by solving twice as many challenge questions, positioning its performance above that of an estimated 85 percent of the participants in the previous competition.
While Google has not immediately disclosed the number of parameters Gemini can employ, the company emphasizes the model’s operational adaptability and its capacity to function across various form factors, ranging from large data centers to local mobile devices. To achieve this transformative capability, Gemini is offered in three sizes: Nano, Pro, and Ultra.
Nano, as expected, is the smallest of the trio, primarily designed for on-device tasks. Pro represents the next tier, offering greater versatility than Nano and is set to integrate into various Google products, including Bard.
Commencing from Wednesday, Bard will utilize a specially tuned version of Pro, promising enhanced reasoning, planning, understanding, and more. The upgraded Bard chatbot will be accessible in the same 170 countries and territories as the regular Bard, with plans to expand availability for the new version throughout 2024. In the upcoming year, with the introduction of Gemini Ultra, Google will introduce Bard Advanced, a more robust AI with additional features.
Pro’s capabilities will be conveniently accessible through API calls using either Google AI Studio or Google Cloud Vertex AI. In the upcoming months, Gemini functionality will be integrated into features of various Google offerings, including Search (specifically SGE), Ads, Chrome, and Duet AI.
Gemini Ultra is expected to be unavailable until at least 2024, as it undergoes additional red-team testing before being cleared for release to a select group of customers, developers, partners, safety experts, and responsibility experts for testing and feedback. When it eventually becomes available, Ultra is anticipated to be an immensely powerful tool for advancing AI development.