Google is trying to make waves with Gemini, its flagship suite of generative AI models, apps, and services. Gemini appears promising in some respects, but it falls short in others, as our informal review found.
So what is Gemini? How can you use it? And how does it stack up to the competition?
To make it easier to keep up with the latest Gemini developments, we’ve put together this handy guide, which we’ll keep updating as new Gemini models and features are released.
What is Gemini?
Gemini is Google’s long-promised, next-generation GenAI model family, developed by Google’s AI research labs DeepMind and Google Research. It comes in three flavors:
- Gemini Ultra, the flagship Gemini model.
- Gemini Pro, a “lite” Gemini model.
- Gemini Nano, a smaller “distilled” model that runs on mobile devices like the Pixel 8 Pro.
All Gemini models were trained to be “natively multimodal” – in other words, able to work with and use more than just words. They were pre-trained and refined on a variety of audio, images and videos, a large set of codebases, and text in different languages.
This is what differentiates Gemini from Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything other than text (e.g., essays, email drafts), but that isn’t the case with the Gemini models.
What is the difference between Gemini Apps and Gemini models?
Google has proven once again that branding isn’t its strong suit: it wasn’t clear from the start that Gemini is separate and distinct from the Gemini apps on web and mobile (formerly Bard). The Gemini apps are simply an interface through which certain Gemini models can be accessed – think of them as a client for Google’s GenAI.
Incidentally, the Gemini apps and models are also entirely independent of Imagen 2, Google’s text-to-image model that’s available in some of the company’s dev tools and environments. Don’t worry – you’re not the only one confused by this.
What can Gemini do?
Because Gemini models are multimodal, they can theoretically perform a range of multimodal tasks, from transcribing speech to captioning images and videos to producing artwork. Some of these capabilities haven’t yet reached the product stage (more on that later), but Google is promising all of them – and more – at some point in the near future.
Of course, it’s a little hard to take the company at its word. Google seriously underdelivered with the original Bard launch, and it recently ruffled feathers with a video purporting to show Gemini’s capabilities that turned out to be heavily doctored and more or less aspirational.
Still, assuming Google is more or less truthful in its claims, here’s what the different levels of Gemini will be able to do once they reach their full potential:
Gemini Ultra
Google says that Gemini Ultra – thanks to its multimodality – can be used to help with things like physics homework, solving problems on worksheets step by step, and pointing out potential mistakes in already-filled-in answers.
Google says Gemini Ultra can also be applied to tasks like identifying scientific papers relevant to a particular problem, extracting information from those papers, and “updating” a chart by generating the formulas needed to recreate it with more recent data.
Gemini Ultra technically supports image generation, as mentioned earlier. But that capability hasn’t made its way into a productized version of the model yet – perhaps because the mechanism is more complex than the way apps like ChatGPT generate images. Rather than feeding prompts to an image generator (DALL-E 3, in ChatGPT’s case), Gemini outputs images “natively,” without an intermediary step.
Gemini Ultra is available as an API through Vertex AI, Google’s fully managed AI developer platform, and through AI Studio, Google’s web-based tool for app and platform developers. It also powers the Gemini apps – but not for free. Access to Gemini Ultra through what Google calls Gemini Advanced requires subscribing to the Google One AI Premium plan, which costs $20 per month.
The AI Premium plan also connects Gemini to your broader Google Workspace account – think emails in Gmail, documents in Docs, presentations in Slides, and Google Meet recordings. This is useful for, say, summarizing emails or having Gemini capture notes during a video call.
Gemini Pro
Google says Gemini Pro is superior to LaMDA in its reasoning, planning, and understanding capabilities.
An independent study by researchers at Carnegie Mellon and BerriAI found that Gemini Pro is indeed better than OpenAI’s GPT-3.5 at handling longer and more complex reasoning chains. But the study also found that, like all large language models, Gemini Pro particularly struggles with math problems involving multiple digits, and users have found plenty of examples of bad reasoning and outright mistakes.
However, Google has promised improvements – and the first arrived in the form of Gemini 1.5 Pro.
Designed as a drop-in replacement, Gemini 1.5 Pro (currently in limited private preview) is improved over its predecessor in several areas, perhaps most significantly in the amount of data it can process. Gemini 1.5 Pro can take in ~700,000 words, or ~30,000 lines of code – 35 times the amount Gemini 1.0 Pro can handle. And – because the model is multimodal – it’s not limited to text: Gemini 1.5 Pro can analyze up to 11 hours of audio or an hour of video in a variety of languages, albeit slowly (searching for a scene in an hour-long video reportedly takes 30 seconds to a minute).
Gemini Pro is also available via API in Vertex AI, accepting text as input and generating text as output. An additional endpoint, Gemini Pro Vision, can process text and imagery – including photos and video – and output text, along the lines of OpenAI’s GPT-4 with Vision model.
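As an illustration of that text-in, text-out flow, here is a minimal sketch of a `generateContent` call against the AI Studio flavor of the API. The endpoint shape follows Google’s public v1beta REST docs; the prompt and the `GOOGLE_API_KEY` environment variable are assumptions for this example, and the request is only sent when a key is actually configured:

```python
import json
import os
import urllib.request

# Assumed: an API key from AI Studio, exported as GOOGLE_API_KEY.
API_KEY = os.environ.get("GOOGLE_API_KEY")
URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"gemini-pro:generateContent?key={API_KEY}"
)

# Text in: a single-turn prompt in the v1beta request shape.
payload = {"contents": [{"parts": [{"text": "Summarize what a multimodal model is in one sentence."}]}]}

if API_KEY:
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    # Text out: the first candidate's generated text.
    print(reply["candidates"][0]["content"]["parts"][0]["text"])
else:
    print("Set GOOGLE_API_KEY to send the request; the payload would be:")
    print(json.dumps(payload, indent=2))
```

Vertex AI exposes the same models behind its own endpoints and authentication, so this sketch is closest to the AI Studio path.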
Within Vertex AI, developers can adapt Gemini Pro to specific contexts and use cases using a fine-tuning or “grounding” process. Gemini Pro can also be connected to external, third-party APIs to perform special functions.
AI Studio, meanwhile, offers workflows for creating structured chat prompts with Gemini Pro. Developers have access to both the Gemini Pro and Gemini Pro Vision endpoints, and can adjust the model temperature to control the output’s creative range, provide examples to dictate tone and style, and tune the safety settings as well.
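A sketch of what such a structured request might contain, assuming the v1beta REST schema’s `generationConfig` and `safetySettings` field names; the helper function, prompt, and examples here are hypothetical:

```python
import json

def build_request(prompt: str, examples: list[tuple[str, str]], temperature: float) -> dict:
    # Fold tone/style examples into a multi-turn "contents" list,
    # alternating user and model turns, then append the real prompt.
    contents = []
    for user_text, model_text in examples:
        contents.append({"role": "user", "parts": [{"text": user_text}]})
        contents.append({"role": "model", "parts": [{"text": model_text}]})
    contents.append({"role": "user", "parts": [{"text": prompt}]})
    return {
        "contents": contents,
        # Higher temperature widens the creative range of the output.
        "generationConfig": {"temperature": temperature, "maxOutputTokens": 256},
        # One of the documented safety categories/thresholds, as an example.
        "safetySettings": [
            {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"}
        ],
    }

request = build_request(
    "Write a product blurb for a camera.",
    examples=[("Write a blurb for a phone.", "Small phone. Big ideas.")],
    temperature=0.9,
)
print(json.dumps(request, indent=2))
```

AI Studio’s UI exposes these same knobs as sliders and form fields, then hands you the equivalent code to export.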
Gemini Nano
Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it’s efficient enough to run directly on (some) phones rather than sending tasks to a server. So far it powers two features on the Pixel 8 Pro: summaries in Recorder and smart replies in Gboard.
The Recorder app, which lets users press a button to record and transcribe audio, includes a Gemini-powered summary of your recorded conversations, interviews, presentations, and other snippets. Users get these summaries even if they don’t have a signal or Wi-Fi connection available – and for the sake of privacy, no data leaves their phone in the process.
Gemini Nano is also in Gboard, Google’s keyboard app, as a developer preview. There, it powers a feature called Smart Reply, which suggests the next thing you might want to say while chatting in a messaging app. Google says the feature initially works only with WhatsApp, but it will come to more apps in 2024.
Is Gemini better than OpenAI’s GPT-4?
Google has repeatedly touted Gemini’s superiority on benchmarks, claiming that Gemini Ultra exceeds current state-of-the-art results on “30 of the 32 widely used academic benchmarks used in large language model research and development.” The company says Gemini Pro, meanwhile, is more capable than GPT-3.5 at tasks like summarizing, brainstorming, and writing content.
But leaving aside the question of whether benchmarks really indicate a better model, the scores Google reports appear to be only marginally better than those of OpenAI’s corresponding models. And – as mentioned earlier – some of the early impressions haven’t been good, with users and academics pointing out that Gemini Pro gets basic facts wrong, struggles with translations, and makes poor coding suggestions.
How much will Gemini cost?
Gemini Pro is free for use in Gemini Apps and, for now, AI Studio and Vertex AI.
Once Gemini Pro exits preview in Vertex, input will cost $0.0025 per character while output will cost $0.00005 per character. Vertex customers pay per 1,000 characters (about 140 to 250 words) and, in the case of models like Gemini Pro Vision, per image ($0.0025).
Let’s say a 500-word article contains 2,000 characters. Summarizing that article with Gemini Pro would cost $5, while generating an article of similar length would cost $0.10.
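A quick sanity check of that arithmetic, using the per-character prices quoted above:

```python
# Back-of-the-envelope cost check for Gemini Pro's quoted per-character
# pricing: $0.0025 per input character, $0.00005 per output character.
def gemini_pro_cost(characters: int, rate_per_char: float) -> float:
    return characters * rate_per_char

ARTICLE_CHARS = 2_000  # the article's example: a 500-word piece

summarize_cost = gemini_pro_cost(ARTICLE_CHARS, 0.0025)   # article fed in as input
generate_cost = gemini_pro_cost(ARTICLE_CHARS, 0.00005)   # article produced as output
print(f"Summarizing: ${summarize_cost:.2f}")  # $5.00
print(f"Generating:  ${generate_cost:.2f}")   # $0.10
```

Note how lopsided the quoted rates are: by these numbers, feeding text in costs 50 times more than getting the same amount of text out.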
Ultra pricing has not been announced yet.
Where can you try Gemini?
Gemini Pro
The easiest place to experience Gemini Pro is the Gemini apps, where Pro and Ultra answer questions in a range of languages.
Gemini Pro and Ultra are also accessible in preview in Vertex AI via an API. The API is currently free to use “within limits” and supports certain regions, including Europe, as well as features like chat functionality and filtering.
Elsewhere, Gemini Pro and Ultra can be found in AI Studio. Using the service, developers can iterate on prompts and Gemini-based chatbots and then get API keys to use them in their apps – or export the code to a more fully featured IDE.
Duet AI for Developers, Google’s suite of AI-powered assistance tools for code completion and generation, now uses Gemini models. And Google has brought Gemini models to its dev tools for Chrome and its Firebase mobile dev platform.
Gemini Nano
Gemini Nano is on the Pixel 8 Pro – and it will come to other devices in the future. Developers interested in incorporating the model into their Android apps can sign up for a sneak peek.