
Char/s vs Token/s in OSS

A heads-up for people using this model.

I ran some benchmarks, and compared to other models, OSS has weak tokens at about 1.2 characters per token. Most models I tested hover between 3.5 and 4.5 characters per token. This has two implications:
1) Context is measured in tokens, so you should use roughly 3x the context size you would use with other models for the same workload. If you use 4000 tokens in Llama 3.2, you need around 12000 tokens in OSS to do the same work.
2) Speed is measured in tokens per second; if you look just at that, the model appears a lot faster than it is, because of that same 3x factor on English text. When comparing across models, look at characters per second instead, with the caveat that the conversion rate varies quite a bit, especially on short text.
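To make those two conversions concrete, here is a minimal sketch. The ratios are the averages from my benchmark below; the model names and function names are just placeholders, and you should measure the ratio for your own workload before relying on it:

```python
# Average chars-per-token ratios measured in this benchmark (assumptions
# for any other model or workload -- measure your own).
CHARS_PER_TOKEN = {
    "oss20b": 1.198,
    "qwen3-14b": 4.031,
}

def equivalent_context(tokens: int, from_model: str, to_model: str) -> int:
    """Token budget on `to_model` that covers roughly the same text
    as `tokens` does on `from_model`."""
    chars = tokens * CHARS_PER_TOKEN[from_model]
    return round(chars / CHARS_PER_TOKEN[to_model])

def chars_per_second(tokens_per_second: float, model: str) -> float:
    """Convert a reported tok/s figure into chars/s for a fairer
    cross-model speed comparison."""
    return tokens_per_second * CHARS_PER_TOKEN[model]

# 4000 tokens on a ~4 char/token model needs roughly 3x more on OSS:
print(equivalent_context(4000, "qwen3-14b", "oss20b"))  # → 13459
```

The same caveat as above applies: the ratio is an average, so the conversion is only approximate on short text.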

Detailed stats below, over around 200 000 tokens. I don't expect many surprises for most English text. I haven't measured programming: it's plausible OSS compresses a lot better if you do mainly code, but I haven't measured that in this bench.

OSS20B

"n_char_per_token": {
"n_avg": 1.1982278934686585,
"n_rms": 1.2870897420857887,
"n_std": 0.4699467198482436,
"n_min": 0.449079754601227,
"n_max": 2.161705551086082,
},

Qwen 3 14B

"n_char_per_token": {
"n_avg": 4.030572019404191,
"n_rms": 4.082795976764328,
"n_std": 0.6509317815862128,
"n_min": 2.5442857142857145,
"n_max": 4.843283582089552,
},
