Gemini 2.0 Flash: Google's Multimodal AI Powerhouse

Enhanced Capabilities

Google has launched Gemini 2.0 Flash, a new AI model capable of generating text, images, and audio. It also supports tool use, including Google Search and third-party apps and services, extending its functionality beyond text generation.

Improved Performance

Gemini 2.0 Flash is twice as fast as Gemini 1.5 Pro and delivers stronger math skills and factuality. It also excels at coding and image analysis, making it Google's new flagship Gemini model.

Multimodal Generation

This model can generate and modify images alongside text. It can also analyze photos, videos, and audio recordings and answer questions about their content. The audio generation feature offers eight voices spanning different accents and languages, and users can adjust speech speed. To address concerns about deepfakes, Google watermarks all generated audio and images with its SynthID technology.

Multimodal Live API

Google has also released the Multimodal Live API, which lets developers build real-time multimodal apps with audio and video input. The API supports tool integration and handles natural conversation patterns, including interruptions.
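As a rough illustration, a Live API session is typically opened over a WebSocket and begins with a setup frame naming the model and the desired response modalities. The field names below (`setup`, `response_modalities`) and the model identifier `gemini-2.0-flash-exp` are illustrative assumptions, not a verified schema; consult the official API reference before building against it.

```python
import json

def live_setup_message(model: str, modalities: list[str]) -> str:
    """Build the initial configuration frame for a live session.

    NOTE: field names here are illustrative assumptions, not a
    verified wire format.
    """
    return json.dumps({
        "setup": {
            "model": f"models/{model}",
            "generation_config": {"response_modalities": modalities},
        }
    })

# A client would send this frame first, then stream audio/video chunks
# and receive model responses (including interruption handling) in turn.
print(live_setup_message("gemini-2.0-flash-exp", ["AUDIO"]))
```

The setup-first design keeps per-chunk messages small: the model and modality choices are fixed once, and subsequent frames carry only media data.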

Availability

An experimental version of 2.0 Flash is available through the Gemini API, AI Studio, and Vertex AI. Audio and image generation features are initially limited to early access partners, with a wider rollout planned for January. Google will integrate 2.0 Flash into products like Android Studio, Chrome DevTools, and Firebase in the coming months.
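For developers trying the experimental model through the Gemini API, a text request is a plain JSON body posted to a generateContent endpoint. The sketch below only assembles that body; the endpoint path and the model name `gemini-2.0-flash-exp` are assumptions based on the release described above, and an API key would still be needed to actually send the request.

```python
import json

# Assumed endpoint for the experimental model; verify against the
# official Gemini API documentation before use.
API_URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-2.0-flash-exp:generateContent"
)

def build_request(prompt: str) -> dict:
    """Assemble the JSON body for a single-turn text request."""
    return {
        "contents": [
            {"role": "user", "parts": [{"text": prompt}]},
        ]
    }

body = build_request("Summarize Gemini 2.0 Flash in one sentence.")
print(json.dumps(body, indent=2))
```

Sending it would be an HTTP POST with the API key supplied in a header or query parameter; only the body shape is exercised here.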