Unleashing the Power of Gemini Models with Long Context

Published On Thu Dec 05 2024

Re: Gemini Pro and Flash 002 suddenly shorter cont...

Hello, I can no longer use the Vertex AI API for Gemini models with long context. This is the error I get:

run with [gemini-1.5-pro-002] failed:

Unable to submit request because the input token count is 53163 but model only supports up to 32768. Reduce the input token count and try again. You can also use the CountTokens API to calculate prompt token count and billable characters. Learn more: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models

The same code and the same models used to work (as expected: per the documentation, Pro and Flash 002 should have context windows of at least 1M tokens). I wonder if I have been enrolled in some live experiment with a model that only supports short context windows (like the most recent Gemini experimental model, which supports only a 32K context window).
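For reference, the CountTokens API mentioned in the error message lets you verify the prompt size before submitting a request. Below is a minimal sketch, assuming the Vertex AI Python SDK (vertexai) with placeholder project and region values, not the poster's actual code:

import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholder project and region; replace with your own.
vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-1.5-pro-002")
prompt = "..."  # the long prompt that triggers the error

# count_tokens reports the token count and billable characters without
# generating a response, so it confirms what the service thinks you are sending.
count = model.count_tokens(prompt)
print("total_tokens:", count.total_tokens)
print("total_billable_characters:", count.total_billable_characters)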

Hi @AskingQuestions, welcome to the Google Cloud Community!

The error message indicates that the context length you provided (53,163 tokens) exceeds the maximum token limit (32,768 tokens) supported by the Gemini 1.5 Pro model. It's possible that you're now using a version of the model with a smaller token limit than you expected. As a temporary workaround, you might consider exploring other models that offer similar context lengths, or testing with a smaller prompt to confirm that the issue is indeed related to the context window size and not to something else in your request.

If the issue persists, I suggest contacting Google Cloud Support; they can check whether the behavior you've encountered is a known issue or something specific to your project. I hope the above information is helpful.
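One way to run the suggested check is to trim the prompt until it counts below the 32,768-token limit reported in the error and resend it; if the trimmed request succeeds, the failure is tied to input length rather than to something else in the request. A rough sketch, again assuming the Vertex AI Python SDK and placeholder values:

import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholder project and region; replace with your own.
vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-1.5-pro-002")
long_prompt = "..."  # the prompt that produced the 53,163-token error

# Trim the prompt by characters until the counted tokens fall under the
# 32,768-token limit reported in the error, then retry the request.
trimmed = long_prompt
while trimmed and model.count_tokens(trimmed).total_tokens > 32_000:
    trimmed = trimmed[: int(len(trimmed) * 0.9)]

result = model.generate_content(trimmed)
print(result.text)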

As mentioned in the initial submission, the models I am using (Gemini 1.5 Pro 002 and Flash 002) have much longer context windows (over a million tokens) than what I am sending (under 60,000 tokens). This suggests that my requests are sometimes being routed to a different model behind the scenes, since the very same request often works without issues.