Mastering Gemini 1.5 Pro: Use Cases and Code Samples

Published On Wed May 08 2024

A tour of Gemini 1.5 Pro samples - Atamel.Dev

Back in February, Google announced Gemini 1.5 Pro with its impressive 1 million token context window. Larger context size means that Gemini 1.5 Pro can process vast amounts of information in one go — 1 hour of video, 11 hours of audio, 30,000 lines of code or over 700,000 words and the good news is that there’s good language support.

Samples Utilizing Gemini 1.5 Pro

In this blog post, I will point out some samples utilizing Gemini 1.5 Pro in Google Cloud’s Vertex AI in different use cases and languages (Python, Node.js, Java, C#, Go).

Audio Processing

Gemini 1.5 Pro can understand audio. For example, listen to this audio file. It’s 10:28 long but maybe you don’t have time or patience to listen to it fully. You can use Gemini to summarize it with Python in gemini_audio.py.

AZULLE Access4 Pro Zoom Mini PC Stick 4GB/64GB with ...

Transcribing Audio Files

If you want to transcribe the whole audio file instead, you can do it with Node.js in gemini-audio-transcription.js.

Video Processing

Take this 57 seconds long video for example. You can describe the video and everything people said in the video in Java with VideoInputWithAudio.java.

Multimodal Processing

Gemini Reshaping the NLP Task for Extracting Knowledge in Text ...

You can go even further and process images, video, audio, and text at the same time. Here’s how to do it in C# with MultimodalAllInput.cs.

Handling Pdf Files

Gemini 1.5 can even handle Pdf files. Here’s a Go example in pdf.go that summarizes a given PDF with the help of Gemini.

System Instructions Support

Gemini 1.5 supports system instructions. System instructions enable users to direct the behavior of the model based on their specific needs and use cases. It’s an additional context to understand the task over the full user interaction with the model.

Introducing Gemini 1.5, Google's next-generation AI model

For example, here’s a Python example in gemini_system_instruction.py on how to set system instructions. And with that instruction, the model answers in French.

Gemini 1.5 Pro is quite impressive with its multimodal nature and large context size. In this blog post, I provided you pointers to samples for different use cases in different languages.

If you want to learn more, here’s a list of further resources:

As always, for any questions or feedback, feel free to reach out.