Earlier today, I sent this absolutely perfectly crafted piece of slop into GitHub Copilot…
Right, but i want thje patche sot be / and /* always
And, as I expected from having used these LLM-based coding agents and assistants continually throughout their evolution, the resulting change was exactly what I wanted, despite the poor instructions.
Now, I’m sure there is actually some variation in how reliably this works, and it likely depends on how relevant the typoed words are to the surrounding context, and how often such typos also appear in training data.
Why is this, you might ask?
These models work by recognizing statistical patterns and using the surrounding context, and “learned” context, to predict what’s next. I was going to say something along the lines of: the model saw ‘patche’ and ‘sot’ near coding terms like ‘/’ and ‘/*’ and made a highly probable guess that I meant ‘patches’ and ‘to be’.
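To make that “statistical pattern” idea a little more concrete, here is a minimal sketch. It assumes GPT-2 via Hugging Face transformers purely as a tiny stand-in model (Copilot and Grok are obviously far larger), and simply scores the typoed prompt against its intended reading to see which one the model finds more probable.

```python
# A minimal sketch (assumption: GPT-2 via Hugging Face transformers as a tiny
# stand-in model; Copilot/Grok are far larger, but the principle is the same).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def total_logprob(text: str) -> float:
    """Total log-probability the model assigns to `text`, token by token."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean negative log-likelihood per token
    return -loss.item() * (ids.shape[1] - 1)

# The typoed prompt vs. the intended reading: the cleaner reading should score
# higher, i.e. it is the more statistically probable pattern in this context.
print(total_logprob("Right, but i want thje patche sot be / and /* always"))
print(total_logprob("Right, but I want the paths to be / and /* always"))
```

The absolute numbers don’t matter here; the point is that the cleaned-up reading is the higher-probability pattern, which is effectively what the assistant exploits when it quietly “fixes” my typos.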
However, I then realized I could also just get the model to “explain” itself…
How did you manage to know what I meant despite my typos in the last prompt?
I understood your message through a combination of:
Conversation context – We were just discussing the CloudFront purge workflow and the paths parameter
Pattern recognition – “patche” → “paths”, “thje” → “the”, “sot” → “to” are common typos
Semantic understanding – The sentence structure “I want [X] to be [Y] always” was clear even with typos
Domain knowledge – In CloudFront invalidation, “/” and “/*” are standard paths for root and wildcard invalidation
When typos are minor and the overall meaning is unambiguous from context, I can usually infer the correct interpretation. If something is truly unclear, I’d ask for clarification!
…
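As an aside, for anyone unfamiliar with the “domain knowledge” point above: a CloudFront invalidation for those two paths looks roughly like the sketch below. This assumes boto3 with CloudFront permissions configured and uses a placeholder distribution ID; the actual purge workflow in my repository may well be a CLI step instead.

```python
# A minimal sketch (assumptions: boto3 with CloudFront permissions configured,
# and a hypothetical distribution ID; the real purge workflow may differ).
import time
import boto3

cloudfront = boto3.client("cloudfront")

def purge(distribution_id: str) -> None:
    """Invalidate the root object and everything under it ("/" and "/*")."""
    cloudfront.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            "Paths": {"Quantity": 2, "Items": ["/", "/*"]},
            # CallerReference must be unique per invalidation request
            "CallerReference": str(time.time()),
        },
    )

purge("E1234567890ABC")  # hypothetical distribution ID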
Now, I’m not sure it has described the tokenization correctly here, as most of the things it counts as tokens are full words, which is not always the case; but then again, what do I know about what is happening inside the “magic brain soup” of the Grok model I am using for this example…
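To illustrate why “one word = one token” doesn’t hold in general, here is a quick check. It assumes tiktoken’s cl100k_base encoding as a stand-in; whatever tokenizer Copilot or Grok actually uses will differ in detail, but the behaviour is similar: common words tend to be single tokens, while typos usually get split into sub-word pieces.

```python
# A minimal sketch (assumption: tiktoken's cl100k_base encoding as a stand-in;
# the tokenizer Copilot or Grok actually uses will differ in detail).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for word in ["the", "always", "thje", "patche", "sot"]:
    pieces = [enc.decode([t]) for t in enc.encode(word)]
    print(f"{word!r} -> {pieces}")
# Common words tend to map to a single token, but typos like 'thje' usually
# split into several sub-word pieces, so "one word = one token" is not a given.
```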