Questions tagged [token]
A token is a small unit of text that a large language model processes. Use on questions about tokens. For general questions about how tokens are used in LLMs, combine this tag with the tag "llm". For questions about tokens in a specific LLM or GenAI tool, use a more specific tag alongside "token".
10 questions
1
vote
1
answer
40
views
Generate Stories from fixed sets of words
I have word lists of different sizes (100, 300, 3k+ words) and I want an LLM to generate stories sticking very closely (>90%) to the vocabulary specified in this list without derivations / ...
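The >90% constraint in this question can at least be measured automatically. Below is a minimal sketch of an adherence checker; `vocabulary_adherence` is a hypothetical helper name, and the matching is deliberately exact (case-insensitive, no derivations), as the question requires.

```python
import re

def vocabulary_adherence(story: str, allowed_words: set[str]) -> float:
    """Return the fraction of story words that appear in the allowed list.

    Matching is case-insensitive and exact (no derivations), mirroring
    the >90% vocabulary constraint described in the question.
    """
    words = re.findall(r"[a-zA-Z']+", story.lower())
    if not words:
        return 1.0
    allowed = {w.lower() for w in allowed_words}
    in_vocab = sum(1 for w in words if w in allowed)
    return in_vocab / len(words)

allowed = {"the", "cat", "sat", "on", "a", "mat"}
print(vocabulary_adherence("The cat sat on a mat", allowed))  # 1.0
print(vocabulary_adherence("The dog sat on a mat", allowed))  # 5 of 6 words in vocab
```

A checker like this can sit in a generate-then-verify loop: regenerate (or ask the LLM to revise) whenever the score drops below the 0.9 threshold.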
1
vote
1
answer
47
views
How do I introduce references?
I need to have abstract references in my LLM prompt. Say this is my prompt:
Out of this list:
item 183346: blue, heavy, English, <... add more characteristics ...>
item 311296: green, light, ...
2
votes
1
answer
35
views
Is there some way to hard restrict the token choices that GPT generates, or to customize token selection?
The original problem is how to force an LLM to generate JSON. This is usually done with prompt engineering, or with certain API settings, like response_format.
However, this is only limited to ...
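Under the hood, hard-restricting token choices amounts to masking the model's logits before sampling, so that only allowed token ids can ever be picked (this is the mechanism behind grammar-constrained JSON decoding). A minimal sketch with made-up logits, not any particular API:

```python
import math

def constrained_sample(logits: dict[int, float], allowed: set[int]) -> int:
    """Greedy pick among an allowed token set only.

    `logits` maps token id -> raw score (a stand-in for a real model's
    output); ids outside `allowed` are set to -inf so they can never win.
    """
    masked = {t: (s if t in allowed else -math.inf) for t, s in logits.items()}
    return max(masked, key=masked.get)

# Hypothetical logits for 4 tokens; suppose only tokens 1 and 3 are grammar-legal.
logits = {0: 2.5, 1: 1.0, 2: 3.0, 3: 0.5}
print(constrained_sample(logits, allowed={1, 3}))  # 1
```

Hosted APIs expose this only indirectly (e.g. bias or structured-output settings); running the model locally gives full control over the mask at every decoding step.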
-1
votes
1
answer
26
views
"Error Invalid token" in Azure's Early access playground (Preview) with GPT o1. How can I easily figure out which token is invalid?
I encountered "Error Invalid token" in Azure's Early access playground (Preview) with GPT o1. How can I easily figure out which token is invalid? (easily = without binary-searching the prompt)
...
1
vote
2
answers
192
views
Why aren't there good large language models that have a small token count?
There are many examples where language models can’t count letters in a word. I would assume this also means they cannot count syllables or figure out rhythmic properties of written text.
These seem ...
5
votes
1
answer
548
views
How to understand DeepSeek's offline token usage estimator?
DeepSeek provides a simple Python snippet to calculate token usage offline, but I don't really understand what it shows.
For example:
result = tokenizer.encode("This is a test")
print(result)
...
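What `encode` returns is a list of integer ids indexing the tokenizer's vocabulary, and `decode` inverts that mapping. The toy tokenizer below illustrates that contract; real tokenizers (like DeepSeek's, loaded offline via a library such as transformers) use BPE pieces rather than whole-word lookup, but the id-to-string relationship is the same.

```python
class ToyTokenizer:
    """Whole-word toy tokenizer illustrating the encode/decode contract."""

    def __init__(self, vocab: list[str]):
        self.id_to_tok = vocab
        self.tok_to_id = {t: i for i, t in enumerate(vocab)}

    def encode(self, text: str) -> list[int]:
        # Each token becomes its integer index into the vocabulary.
        return [self.tok_to_id[w] for w in text.split()]

    def decode(self, ids: list[int]) -> str:
        # decode inverts encode: ids back to their vocab strings.
        return " ".join(self.id_to_tok[i] for i in ids)

tok = ToyTokenizer(["This", "is", "a", "test"])
ids = tok.encode("This is a test")
print(ids)              # [0, 1, 2, 3]
print(tok.decode(ids))  # This is a test
```

So the list a real estimator prints is the token count's raw material: billing and context limits are measured by `len(ids)`.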
3
votes
1
answer
87
views
What is the efficient way to tokenize a long string?
I have a really long string. How can I efficiently find the boundaries of fixed-token-length chunks in the text?
For example:
text = "Quick silver brown fox jumped over the hedge"
...
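One efficient pattern is to tokenize the long string once, slice the token list into fixed-size windows, and reassemble each window, rather than re-tokenizing repeatedly while searching for boundaries. The sketch below uses whitespace tokens as a stand-in; with a real tokenizer you would encode once to ids, slice, and decode each slice.

```python
def chunk_by_token_count(text: str, max_tokens: int) -> list[str]:
    """Split text into pieces of at most `max_tokens` tokens.

    Whitespace tokens stand in for real tokenizer output; the key point
    is the single tokenize pass followed by O(1) slicing per chunk.
    """
    tokens = text.split()
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

text = "Quick silver brown fox jumped over the hedge"
print(chunk_by_token_count(text, 3))
# ['Quick silver brown', 'fox jumped over', 'the hedge']
```

With subword tokenizers, decoding id slices (instead of joining strings) also keeps boundaries exact even when a word spans multiple tokens.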
5
votes
1
answer
1k
views
Llama2 Vocab contents
Llama2 is a multilingual model and supports multiple languages, including English, Spanish, French, German, Italian, Portuguese, and Dutch. The vocab size of Llama2 is 32K. How to know out of 32k ...
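A rough first pass at inspecting which languages a vocabulary covers is to bucket each entry by Unicode script. This is only a heuristic sketch (Llama2's real vocab holds BPE pieces, often byte-level fragments, so script tagging is approximate), but it shows the idea:

```python
import unicodedata

def script_of(token: str) -> str:
    """Rough script bucket for a vocab entry: 'latin', 'other', or 'symbol'.

    Heuristic only: checks whether every alphabetic character has 'LATIN'
    in its Unicode name. Subword fragments and byte-level entries make
    this approximate for a real BPE vocabulary.
    """
    letters = [c for c in token if c.isalpha()]
    if not letters:
        return "symbol"
    if all("LATIN" in unicodedata.name(c, "") for c in letters):
        return "latin"
    return "other"

# Hypothetical sample of vocab entries:
sample_vocab = ["hello", "mundo", "über", "привет", "123"]
print({t: script_of(t) for t in sample_vocab})
```

Separating the Latin-script languages from each other (English vs. Spanish vs. Dutch, all Latin script) would need a wordlist or language-ID model on top of this.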
3
votes
1
answer
774
views
Does the length of a token give LLMs a preference for words of certain lengths?
From the question How long is a "token"? we learn that tokens are commonly around 4 characters. So it seems plausible that LLMs might therefore prefer to have word boundaries coincide with ...
6
votes
1
answer
3k
views
How long is a "token"?
LLM max prompt length (e.g. GPT-4) and generation pricing (e.g. Azure) are both measured by the number of "tokens".
How long is a "token"? Is it equivalent to a single character/...
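Token length is a property of a tokenization, not a fixed constant, so it is usually reported as an average: source characters divided by token count (about 4 characters per token for typical English text with GPT-style tokenizers). A minimal sketch with a hypothetical hand-written tokenization:

```python
def avg_chars_per_token(text: str, tokens: list[str]) -> float:
    """Average source characters per token for a given tokenization.

    With a real tokenizer you would obtain `tokens` (or token ids) from
    the model's own encoder; English prose commonly averages ~4 chars/token.
    """
    return len(text) / len(tokens)

# Hypothetical split of one word into two subword tokens:
text = "tokenization"
tokens = ["token", "ization"]             # 12 characters over 2 tokens
print(avg_chars_per_token(text, tokens))  # 6.0
```

Because the average varies by language and content (code and non-English text tend to use more tokens per character), pricing and context-limit estimates should be computed with the target model's actual tokenizer.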