
Questions tagged [token]

A token is the smallest unit of text processed by a large language model. Use on questions about tokens. For general questions about using tokens with LLMs, combine this tag with "llm". For questions about tokens in a specific LLM or GenAI tool, use a more specific tag along with "token".

1 vote
1 answer
40 views

I have word lists of different sizes (100, 300, 3k+ words) and I want an LLM to generate stories sticking very closely (>90%) to the vocabulary specified in this list, without derivations / ...
l2poca • 13
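One way to quantify the ">90% of the vocabulary" requirement in this question is to score each generated story against the allowed word list. A minimal sketch (the word-splitting regex and the exact-match rule are assumptions; the question explicitly excludes derivations, which this check also rejects):

```python
import re

def vocabulary_coverage(story: str, allowed_words: set[str]) -> float:
    """Fraction of words in `story` that appear in `allowed_words`
    (case-insensitive, exact match only, so derivations count as misses)."""
    words = re.findall(r"[A-Za-z']+", story.lower())
    if not words:
        return 1.0
    in_vocab = sum(1 for w in words if w in allowed_words)
    return in_vocab / len(words)

allowed = {"the", "cat", "sat", "on", "a", "mat"}
story = "The cat sat on a red mat"
coverage = vocabulary_coverage(story, allowed)  # 6 of 7 words are allowed
```

A score like this can drive a retry loop: regenerate (or ask the model to revise) until coverage exceeds the 0.9 threshold.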
1 vote
1 answer
47 views

I need to have abstract references in my LLM prompt. Say this is my prompt: Out of this list: item 183346: blue, heavy, English, <... add more characteristics ...> item 311296: green, light, ...
Michel de Ruiter
2 votes
1 answer
35 views

The original problem is about how we can force AI to generate JSON. This is usually done with prompt engineering, or certain settings of the API, like response_format. However, this is only limited to ...
ZhenRanZR • 123
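When response_format (or grammar-constrained decoding) is not available, a common fallback is to extract and validate JSON from the raw response and retry on failure. A minimal sketch of that extract-and-validate step, assuming the model wraps a single JSON object in surrounding prose:

```python
import json
import re

def extract_json(response: str):
    """Pull the first {...} block out of a model response and parse it.
    Returns None if nothing parses; callers can then retry the request."""
    match = re.search(r"\{.*\}", response, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

reply = 'Sure! Here is the data: {"name": "fox", "legs": 4}'
data = extract_json(reply)  # a dict on success, None on failure
```

This validates structure only; schema checks (required keys, types) would sit on top of it.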
-1 votes
1 answer
26 views

I encountered "Error Invalid token" in Azure's Early access playground (Preview) with GPT o1. How can I easily figure out which token is invalid? (easily = not using dichotomy on prompt) ...
Franck Dernoncourt
1 vote
2 answers
192 views

There are many examples where language models can’t count letters in a word. I would assume this also means they cannot count syllables or figure out rhythmic properties of written text. These seem ...
Looft • 119
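Syllable counting, unlike letter counting, can at least be approximated in code rather than asked of the model. A rough vowel-group heuristic (this is an illustrative toy, not real syllabification, which needs a pronunciation dictionary such as CMUdict):

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, with a crude silent-'e' rule.
    Only meant to show the kind of character-level counting that token-based
    LLMs struggle with."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1 and not word.endswith("le"):
        count -= 1  # drop a trailing silent 'e', but keep "-le" endings
    return max(count, 1)

count_syllables("banana")  # counts the three vowel groups
```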
5 votes
1 answer
548 views

DeepSeek has a simple Python code snippet to calculate token usage offline, but I don't really understand what it is showing. For example: result = tokenizer.encode("This is a test") print(result) ...
Gabriel • 153
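What tokenizer.encode returns is a list of integer token IDs, one per vocabulary entry matched in the text. A toy stand-in makes the output shape concrete (the vocabulary and IDs below are invented; the real DeepSeek tokenizer uses a learned BPE vocabulary, but the result has the same form):

```python
def toy_encode(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-prefix-match encoder — a toy illustration of what
    `tokenizer.encode` produces: a list of integer token IDs."""
    ids = []
    i = 0
    while i < len(text):
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                ids.append(vocab[piece])
                i += length
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return ids

vocab = {"This": 0, " is": 1, " a": 2, " test": 3}
toy_encode("This is a test", vocab)  # → [0, 1, 2, 3]
```

The length of that list is the token count used for usage and billing.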
3 votes
1 answer
87 views

I have a really long string. How can I efficiently identify the boundaries of a fixed token length in the text? For example: text = "Quick silver brown fox jumped over the hedge" ...
Maximos • 43
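One way to find fixed-token-length boundaries is to record the character span of each token, then group spans. A sketch using whitespace-delimited words as stand-in tokens (with a real tokenizer such as tiktoken you would slice the token-ID list and decode each slice instead):

```python
import re

def chunk_by_tokens(text: str, max_tokens: int) -> list[tuple[int, int]]:
    """Return (start, end) character offsets for chunks of at most
    `max_tokens` tokens, where whitespace-delimited words stand in
    for real tokens."""
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    boundaries = []
    for i in range(0, len(spans), max_tokens):
        group = spans[i:i + max_tokens]
        boundaries.append((group[0][0], group[-1][1]))
    return boundaries

text = "Quick silver brown fox jumped over the hedge"
chunk_by_tokens(text, 3)  # three chunks of at most 3 "tokens" each
```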
5 votes
1 answer
1k views

Llama 2 is a multilingual model that supports multiple languages, including English, Spanish, French, German, Italian, Portuguese, and Dutch. The vocab size of Llama is 32K. How to know, out of 32K ...
Vinay Sharma
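A rough first cut at splitting a vocabulary by language is to classify each token by the Unicode script of its letters; it cannot separate English from Spanish or French (all Latin script), but it does isolate Cyrillic, CJK, etc. A sketch on invented tokens (the ▁ prefix follows the SentencePiece convention Llama's tokenizer uses):

```python
import unicodedata

def dominant_script(token: str) -> str:
    """Label a vocabulary token by the Unicode script of its first letter.
    Latin-script tokens cover English/Spanish/French/etc. alike, so script
    alone cannot split those languages apart."""
    for ch in token.lstrip("\u2581"):  # SentencePiece marks word starts with ▁
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            return name.split(" ")[0]  # e.g. "LATIN", "CYRILLIC", "CJK"
    return "OTHER"

tokens = ["\u2581hello", "\u2581привет", "\u2581你", "123"]
[dominant_script(t) for t in tokens]  # one script label per token
```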
3 votes
1 answer
774 views

From the question How long is a "token"? we learn that tokens are commonly around 4 characters. So it seems plausible that LLMs might therefore prefer to have word boundaries coincide with ...
Rebecca J. Stones
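Whether word and token boundaries coincide is directly measurable: in SentencePiece-style tokenizers a ▁ prefix marks a word start, so the fraction of tokens carrying it is the fraction of tokens that begin at a word boundary. A sketch on a hypothetical tokenization:

```python
def boundary_alignment(pieces: list[str]) -> float:
    """Fraction of tokens that start at a word boundary, assuming
    SentencePiece-style pieces where '▁' marks a word start."""
    starts = sum(1 for p in pieces if p.startswith("\u2581"))
    return starts / len(pieces)

# Hypothetical tokenization of "hello tokenization":
pieces = ["\u2581hello", "\u2581token", "ization"]
boundary_alignment(pieces)  # 2 of 3 tokens begin a word
```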
6 votes
1 answer
3k views

LLM max prompt length (e.g. GPT-4) and generation pricing (e.g. Azure) are both measured by the number of "tokens". How long is a "token"? Is it equivalent to a single character/...
SirBenet • 2,229
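The usual answer ("roughly 4 characters for English text") is just an average over a tokenization, which is easy to compute for any given text. A sketch over a hypothetical token list (the pieces below are invented for illustration):

```python
def chars_per_token(pieces: list[str]) -> float:
    """Average characters per token — the basis for the common
    '1 token ≈ 4 characters' rule of thumb for English text."""
    return sum(len(p) for p in pieces) / len(pieces)

# Hypothetical GPT-style tokenization of "The quick brown fox":
pieces = ["The", " quick", " brown", " fox"]
chars_per_token(pieces)  # averages to a bit under 5 characters per token
```

The ratio varies by language and content: code and non-English text typically get fewer characters per token, so the same character count costs more tokens.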