Google Rolls Out Updated AI Model Capable Of Handling Longer Text, Video

Gemini 1.5 can process far more data than the latest AI models from OpenAI can handle, Google says.

The Gemini logo on a smartphone arranged in New York, US, on Saturday, Dec. 9, 2023. Alphabet's Google said Gemini is its largest, most capable and flexible AI model to date, replacing PaLM 2, released in May.

Alphabet Inc.’s Google is rolling out a new version of its powerful artificial intelligence model that it says can handle larger amounts of text and video than products made by competitors.

The updated AI model, called Gemini 1.5 Pro, will be available on Thursday to cloud customers and developers so they can test its new features and eventually create new commercial applications. Google and its rivals have spent billions to ramp up their capabilities in generative AI and are keen to attract corporate clients to show their investments are paying off.

“We’re focusing first and foremost today on delivering you the research that enabled this model,” Oriol Vinyals, a Google vice president and co-technical lead of Gemini, said in a briefing with reporters. “Tomorrow, we’re excited to see what the world will make of the new capabilities.” The mid-size version of the new AI model, Gemini 1.5 Pro, performs at a level similar to the larger Gemini 1.0 Ultra model, Google said.

Since OpenAI’s runaway success in late 2022 with its conversational chatbot ChatGPT, Google has been angling to show that it, too, is a force in cutting-edge generative AI technology, which can create new text, images or even video based on user prompts. More companies have been experimenting with the technology, which can be used to automate tasks like coding, summarizing reports or creating marketing campaigns.

Google released its AI model Gemini in December in three versions, allowing it to be customized to the task at hand and to run on everything from mobile devices to large-scale data centers. Gemini is Google’s response to the allied forces of Microsoft Corp. and OpenAI, which some say have been quicker to take advantage of the current AI boom, including among cloud customers and developers.

Now, Google is seeking to lure those users into its ecosystem with even more powerful tools. Gemini 1.5 can be trained faster and more efficiently, and it can process a huge amount of information each time it’s prompted, according to Vinyals. For example, developers can use Gemini 1.5 Pro to query up to an hour's worth of video, 11 hours of audio or more than 700,000 words in a document, an amount of data that Google says is the “longest context window” of any large-scale AI model yet. Gemini 1.5 can process far more data than the latest AI models from OpenAI and Anthropic can handle, according to Google.
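To give a sense of what such a long-context query might look like in practice, here is a minimal sketch using Google's generative AI Python SDK, the kind of access available through AI Studio. The model identifier, file name and prompt are assumptions for illustration, not details from the article.

```python
# Minimal sketch of a long-context query against Gemini 1.5 Pro via the
# google-generativeai Python SDK. The model name, file name and prompt
# below are illustrative assumptions, not details taken from the article.
import google.generativeai as genai

genai.configure(api_key="YOUR_AI_STUDIO_API_KEY")  # placeholder key

# Load a very long document; Google says the model's context window can
# hold more than 700,000 words in a single prompt.
with open("long_transcript.txt", encoding="utf-8") as f:
    transcript = f.read()

model = genai.GenerativeModel("gemini-1.5-pro-latest")  # assumed model identifier
response = model.generate_content(
    [transcript, "Quote three notable moments from this transcript."]
)
print(response.text)
```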

In a pre-recorded video demonstration for reporters, Google showed off how engineers asked Gemini 1.5 Pro to ingest a 402-page PDF transcript of the Apollo 11 moon landing, and then prompted it to find quotes that showed “three funny moments.” One of the answers from the AI model noted that, five hours into the Apollo 11 mission transcript, astronaut Michael Collins told Mission Control, “If we’re late in answering you, it’s because we’re munching sandwiches.”

In another pre-recorded demo, Google engineers asked Gemini 1.5 Pro to find a particular scene in a 44-minute Buster Keaton film, providing the AI model with a rough sketch of the scene they remembered. Gemini found the scene successfully, noting that it appears around 15 minutes into the video.

Google cautioned, however, that as with all generative AI models, the responses aren’t always perfect. Gemini 1.5 Pro is still prone to hallucinations, works slowly at times and doesn’t always understand the intent of users, forcing them to rephrase their questions before the model comes up with the right response. Vinyals said the company is “working to optimize” the performance of Gemini 1.5 to make it faster and that it’s “still in an experimental stage and in a research stage.”

The company said developers can explore Gemini 1.5 Pro using Google’s AI Studio, while some cloud customers can access the AI model in private preview on its enterprise platform, Vertex AI. Google also said on Thursday that it would expand access to its large-scale Gemini 1.0 Ultra, opening the model up to a wider number of global customers on Vertex AI.
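For cloud customers with preview access, calling the model through Vertex AI might look roughly like the sketch below. The project ID, region and model identifier are placeholders and assumptions, and exact module paths vary by SDK version and access level.

```python
# Rough sketch of calling Gemini 1.5 Pro on Vertex AI for enterprise customers
# with preview access. Project, region and model identifiers are placeholders;
# exact module paths and model names vary by SDK version.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="your-gcp-project", location="us-central1")  # assumed values

model = GenerativeModel("gemini-1.5-pro")  # assumed model identifier
response = model.generate_content("Summarize the key points of the attached report.")
print(response.text)
```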


©2024 Bloomberg L.P.
