10 Catchy Titles for an Offline Token-Counter Dilemma

Published On Fri Mar 28 2025
Solved: Re: Can we please get an offline token-counter so ...

I'm quite proud of my "chunker" for my custom RAG, which has some elegant recursive mechanisms based on tiktoken:

`tokenizer = tiktoken.encoding_for_model("text-embedding-3-large")`
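For context, a recursive token-budget chunker along these lines can be sketched independently of any particular tokenizer. This is an illustrative assumption, not the poster's actual code: the counting function is injected, so tiktoken today (or an offline Gemini tokenizer, once one exists) can be plugged in without touching the splitting logic.

```python
from typing import Callable, List, Sequence

def chunk_text(text: str,
               count_tokens: Callable[[str], int],
               max_tokens: int,
               separators: Sequence[str] = ("\n\n", "\n", ". ", " ")) -> List[str]:
    """Recursively split `text` until every chunk fits the token budget.

    Tries the coarsest separator first, recursing with finer ones, then
    greedily merges adjacent pieces back together while under budget.
    """
    if count_tokens(text) <= max_tokens:
        return [text]
    for i, sep in enumerate(separators):
        parts = text.split(sep)
        if len(parts) > 1:
            pieces: List[str] = []
            for part in parts:
                pieces.extend(chunk_text(part, count_tokens, max_tokens,
                                         separators[i + 1:]))
            # Greedy re-merge so chunks are as large as the budget allows.
            chunks = [pieces[0]]
            for piece in pieces[1:]:
                merged = chunks[-1] + sep + piece
                if count_tokens(merged) <= max_tokens:
                    chunks[-1] = merged
                else:
                    chunks.append(piece)
            return chunks
    return [text]  # no separator fits: return the oversized chunk as-is

# Stand-in counter for the sketch; in real use, swap in e.g.
# len(tiktoken.encoding_for_model("text-embedding-3-large").encode(s)).
word_count = lambda s: len(s.split())

print(chunk_text("one two three four five six", word_count, max_tokens=3))
# → ['one two three', 'four five six']
```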

I want to upgrade to gemini-embedding-exp-03-07, but there's no way to count tokens without the roughly 1,000,000x slowdown of online API calls.

Is there an official local library, guaranteed to count tokens correctly, that we can use without exhausting our API limits and slowing down all our code unnecessarily?

I'm asking not just for myself, but for everyone who seriously wants to work with these models: simple "hello world" examples where token counts are basically ignored are nice and all, but a production environment depends on robust tooling existing. Not having (or releasing) those tools makes all the Gemini-* models a non-starter for serious business use cases.

Get text embeddings | Generative AI | Google Cloud

Specifically, the `location=location` requirement of the library's `count_tokens` method needs to be removed (or the entire `vertexai.init(project=project_id, location=location)` call).

Solution

Here is (or should be) the answer:

Hi @cndg, Welcome to Google Cloud Community! There's currently no offline token counting method for Gemini embeddings like gemini-embedding-exp-03-07, making precise chunking for RAG applications difficult.

Here's what you can do: submit a feature request so that our Engineering Team can look into it. Please note that I cannot say when this enhancement will be implemented. For future updates, I recommend monitoring the issue tracker and the release notes regularly.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

Looks like a new offline feature to do this has just been released:

https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/list-token