Questions tagged [token]
A token is a small unit of text that a large language model processes. Use on questions about tokens. For general questions about how tokens are used in LLMs, combine this tag with the tag "llm". For questions about tokens in a specific LLM or GenAI tool, use a more specific tag alongside "token".
10 questions
1
vote
1
answer
40
views
Generate Stories from fixed sets of words
I have word lists of different sizes (100, 300, 3k+ words) and I want an LLM to generate stories sticking very closely (>90%) to the vocabulary specified in this list without derivations / ...
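The >90% constraint in this question can at least be measured automatically. Below is a minimal sketch of an adherence checker; `vocabulary_adherence` is a hypothetical helper name, and the matching is deliberately exact (case-insensitive, no derivations), as the question requires.

```python
import re

def vocabulary_adherence(story: str, allowed_words: set[str]) -> float:
    """Return the fraction of story words that appear in the allowed list.

    Matching is case-insensitive and exact (no derivations), mirroring
    the >90% vocabulary constraint described in the question.
    """
    words = re.findall(r"[a-zA-Z']+", story.lower())
    if not words:
        return 1.0
    allowed = {w.lower() for w in allowed_words}
    in_vocab = sum(1 for w in words if w in allowed)
    return in_vocab / len(words)

allowed = {"the", "cat", "sat", "on", "a", "mat"}
print(vocabulary_adherence("The cat sat on a mat", allowed))  # 1.0
print(vocabulary_adherence("The dog sat on a mat", allowed))  # 5 of 6 words in vocab
```

A checker like this can sit in a generate-then-verify loop: regenerate (or ask the LLM to revise) whenever the score drops below the 0.9 threshold.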
1
vote
1
answer
47
views
How do I introduce references?
I need to have abstract references in my LLM prompt. Say this is my prompt:
Out of this list:
item 183346: blue, heavy, English, <... add more characteristics ...>
item 311296: green, light, ...
2
votes
1
answer
35
views
Is there some way to hard restrict the token choices that GPT generates, or to customize token selection?
The original problem is how to force an LLM to generate JSON. This is usually done with prompt engineering, or with certain API settings, like response_format.
However, this is only limited to ...
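Under the hood, hard-restricting token choices amounts to masking the model's logits before sampling, so that only allowed token ids can ever be picked (this is the mechanism behind grammar-constrained JSON decoding). A minimal sketch with made-up logits, not any particular API:

```python
import math

def constrained_sample(logits: dict[int, float], allowed: set[int]) -> int:
    """Greedy pick among an allowed token set only.

    `logits` maps token id -> raw score (a stand-in for a real model's
    output); ids outside `allowed` are set to -inf so they can never win.
    """
    masked = {t: (s if t in allowed else -math.inf) for t, s in logits.items()}
    return max(masked, key=masked.get)

# Hypothetical logits for 4 tokens; suppose only tokens 1 and 3 are grammar-legal.
logits = {0: 2.5, 1: 1.0, 2: 3.0, 3: 0.5}
print(constrained_sample(logits, allowed={1, 3}))  # 1
```

Hosted APIs expose this only indirectly (e.g. bias or structured-output settings); running the model locally gives full control over the mask at every decoding step.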
-1
votes
1
answer
26
views
"Error Invalid token" in Azure's Early access playground (Preview) with GPT o1. How can I easily figure out which token is invalid?
I encountered "Error Invalid token" in Azure's Early access playground (Preview) with GPT o1. How can I easily figure out which token is invalid? (easily = without binary-searching the prompt)
...
1
vote
2
answers
192
views
Why aren't there good large language models that have a small token count?
There are many examples where language models can’t count letters in a word. I would assume this also means they cannot count syllables or figure out rhythmic properties of written text.
These seem ...
5
votes
1
answer
548
views
How to understand DeepSeek's offline token usage estimator?
DeepSeek provides a simple Python snippet to calculate token usage offline, but I don't really understand what it shows.
For example:
result = tokenizer.encode("This is a test")
print(result)
...
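What `encode` returns is a list of integer ids indexing the tokenizer's vocabulary, and `decode` inverts that mapping. The toy tokenizer below illustrates that contract; real tokenizers (like DeepSeek's, loaded offline via a library such as transformers) use BPE pieces rather than whole-word lookup, but the id-to-string relationship is the same.

```python
class ToyTokenizer:
    """Whole-word toy tokenizer illustrating the encode/decode contract."""

    def __init__(self, vocab: list[str]):
        self.id_to_tok = vocab
        self.tok_to_id = {t: i for i, t in enumerate(vocab)}

    def encode(self, text: str) -> list[int]:
        # Each token becomes its integer index into the vocabulary.
        return [self.tok_to_id[w] for w in text.split()]

    def decode(self, ids: list[int]) -> str:
        # decode inverts encode: ids back to their vocab strings.
        return " ".join(self.id_to_tok[i] for i in ids)

tok = ToyTokenizer(["This", "is", "a", "test"])
ids = tok.encode("This is a test")
print(ids)              # [0, 1, 2, 3]
print(tok.decode(ids))  # This is a test
```

So the list a real estimator prints is the token count's raw material: billing and context limits are measured by `len(ids)`.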
3
votes
1
answer
87
views
What is the efficient way to tokenize a long string?
I have a really long string. How can I efficiently find the boundaries of fixed-token-length chunks in the text?
For example:
text = "Quick silver brown fox jumped over the hedge"
...
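One efficient pattern is to tokenize the long string once, slice the token list into fixed-size windows, and reassemble each window, rather than re-tokenizing repeatedly while searching for boundaries. The sketch below uses whitespace tokens as a stand-in; with a real tokenizer you would encode once to ids, slice, and decode each slice.

```python
def chunk_by_token_count(text: str, max_tokens: int) -> list[str]:
    """Split text into pieces of at most `max_tokens` tokens.

    Whitespace tokens stand in for real tokenizer output; the key point
    is the single tokenize pass followed by O(1) slicing per chunk.
    """
    tokens = text.split()
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

text = "Quick silver brown fox jumped over the hedge"
print(chunk_by_token_count(text, 3))
# ['Quick silver brown', 'fox jumped over', 'the hedge']
```

With subword tokenizers, decoding id slices (instead of joining strings) also keeps boundaries exact even when a word spans multiple tokens.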
5
votes
1
answer
1k
views
Llama2 Vocab contents
Llama2 is a multilingual model and supports multiple languages, including English, Spanish, French, German, Italian, Portuguese, and Dutch. The vocab size of Llama2 is 32K. How to know out of 32k ...
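A rough first pass at inspecting which languages a vocabulary covers is to bucket each entry by Unicode script. This is only a heuristic sketch (Llama2's real vocab holds BPE pieces, often byte-level fragments, so script tagging is approximate), but it shows the idea:

```python
import unicodedata

def script_of(token: str) -> str:
    """Rough script bucket for a vocab entry: 'latin', 'other', or 'symbol'.

    Heuristic only: checks whether every alphabetic character has 'LATIN'
    in its Unicode name. Subword fragments and byte-level entries make
    this approximate for a real BPE vocabulary.
    """
    letters = [c for c in token if c.isalpha()]
    if not letters:
        return "symbol"
    if all("LATIN" in unicodedata.name(c, "") for c in letters):
        return "latin"
    return "other"

# Hypothetical sample of vocab entries:
sample_vocab = ["hello", "mundo", "über", "привет", "123"]
print({t: script_of(t) for t in sample_vocab})
```

Separating the Latin-script languages from each other (English vs. Spanish vs. Dutch, all Latin script) would need a wordlist or language-ID model on top of this.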
3
votes
1
answer
774
views
Does the length of a token give LLMs a preference for words of certain lengths?
From the question How long is a "token"? we learn that tokens are commonly around 4 characters. So it seems plausible that LLMs might therefore prefer to have word boundaries coincide with ...
6
votes
1
answer
3k
views
How long is a "token"?
LLM max prompt length (e.g. GPT-4) and generation pricing (e.g. Azure) are both measured by the number of "tokens".
How long is a "token"? Is it equivalent to a single character/...
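Token length is a property of a tokenization, not a fixed constant, so it is usually reported as an average: source characters divided by token count (about 4 characters per token for typical English text with GPT-style tokenizers). A minimal sketch with a hypothetical hand-written tokenization:

```python
def avg_chars_per_token(text: str, tokens: list[str]) -> float:
    """Average source characters per token for a given tokenization.

    With a real tokenizer you would obtain `tokens` (or token ids) from
    the model's own encoder; English prose commonly averages ~4 chars/token.
    """
    return len(text) / len(tokens)

# Hypothetical split of one word into two subword tokens:
text = "tokenization"
tokens = ["token", "ization"]             # 12 characters over 2 tokens
print(avg_chars_per_token(text, tokens))  # 6.0
```

Because the average varies by language and content (code and non-English text tend to use more tokens per character), pricing and context-limit estimates should be computed with the target model's actual tokenizer.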