Counting Gemini text tokens locally
The Vertex AI SDK for Python now offers local tokenization. This feature allows you to calculate the number of tokens in your text input before sending requests to Gemini. Let’s see how this feature works…
Large language models (LLMs) use a fundamental unit: the token. In other words, they process input tokens and generate output tokens. A text token can represent characters, words, or phrases. On average, one token represents approximately four characters in English text. When you send a query to Gemini, your text input is transformed into tokens. This step is called tokenization. Gemini generates output tokens that are then converted back into text using the reverse operation.
To use Gemini tokenizers, you need the latest `google-cloud-aiplatform` package, installed with the `tokenization` extra.
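The install command should look like this (the quotes keep the brackets safe in shells like zsh):

```sh
pip install --upgrade "google-cloud-aiplatform[tokenization]"
```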
Ensure you have version `1.57.0` or later installed.
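If you're not sure which version you have, one quick check is to print the SDK's version string from Python (the attribute below is how the package reports it):

```python
from google.cloud import aiplatform

# Local tokenization shipped with google-cloud-aiplatform 1.57.0
print(aiplatform.__version__)
```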
Create a tokenizer for the Gemini model you’re using and call the `count_tokens()` method on your text input.
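Here is a minimal sketch; the model name is just an example, and `get_tokenizer_for_model()` lives in the SDK's preview `tokenization` module:

```python
from vertexai.preview.tokenization import get_tokenizer_for_model

# Use the same model name you plan to send your requests to
tokenizer = get_tokenizer_for_model("gemini-1.5-flash-001")

result = tokenizer.count_tokens("Hello World! How many tokens is this sentence?")
print(result.total_tokens)
```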
Now, let’s try with a larger document.
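Here is a sketch, assuming a local file named `large_document.txt` (a placeholder for any long text of your own):

```python
from pathlib import Path

from vertexai.preview.tokenization import get_tokenizer_for_model

tokenizer = get_tokenizer_for_model("gemini-1.5-flash-001")

# "large_document.txt" is a placeholder: any long text file works here
text = Path("large_document.txt").read_text()

result = tokenizer.count_tokens(text)
print(f"{result.total_tokens:,} tokens")
```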
🚀 Perfect! The number of tokens is computed in a fraction of a second. In the tested version (`1.57.0`), local token counting is only supported for text inputs. For multimodal inputs (images, video, audio, documents), check out the documentation for details on how different media account for different token counts.
In all cases, you can send a request using the Vertex AI API as usual.
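For example (project ID and location below are placeholders), a text request returns usage metadata with the server-side token counts, which should line up with your local count for the prompt:

```python
import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholders: set your own project ID and region
vertexai.init(project="your-project-id", location="us-central1")

model = GenerativeModel("gemini-1.5-flash-001")
response = model.generate_content("Hello World! How many tokens is this sentence?")

# Token counts as computed server-side (prompt, response, and total)
print(response.usage_metadata)
```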
Text tokenizers use a fixed data file, also called the LLM vocabulary. This vocabulary determines how to encode a text string into a sequence of tokens and, conversely, how to decode a sequence of tokens back into a text string.
Here is what happens under the hood:
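In essence, on first use, the SDK fetches the model's vocabulary file, caches it locally, and then encodes your text entirely on your machine, with no API call involved. Recent SDK versions also expose a `compute_tokens()` method that makes the encoding visible; here is a sketch (the exact result structure may vary across versions):

```python
from vertexai.preview.tokenization import get_tokenizer_for_model

tokenizer = get_tokenizer_for_model("gemini-1.5-flash-001")

# compute_tokens() returns the token IDs and the matching vocabulary entries
result = tokenizer.compute_tokens("Hello World!")
for tokens_info in result.tokens_info:
    print(tokens_info.token_ids)  # IDs in the model vocabulary
    print(tokens_info.tokens)     # the byte strings they stand for
```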
Key points to remember:
- Local token counting requires `google-cloud-aiplatform` version `1.57.0` or later, installed with the `tokenization` extra.
- In this version, local token counting works for text inputs only.
- Knowing how many tokens your text represents lets you check that an input fits within the model’s context window and anticipate the size of your requests before sending them.
You now have a new local tool to manage your inputs before sending requests to Gemini! 🖖 Follow me on Twitter/X or LinkedIn for more cloud explorations!