Earlier today, I sent this absolutely perfectly crafted piece of slop into GitHub Copilot…
Right, but i want thje patche sot be / and /* always
And, as I expected from having used these LLM-based coding agents and assistants continually throughout their evolution, the resulting change was exactly what I wanted, despite the poor instructions.
Now, I’m sure there is actually some variation in how reliably this works, and it likely depends on how relevant the typoed words are to the surrounding context, and how often such typos also appear in training data.
Why is this, you might ask?
These models work by recognizing statistical patterns and using the surrounding context, and “learned” context, to predict what’s next. I was going to say something along the lines of: the model saw ‘patche’ and ‘sot’ near coding terms like ‘/’ and ‘/*’ and made a highly probable guess that I meant ‘patches’ and ‘to be’.
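To make that “statistical pattern” idea a little more concrete, here is a minimal sketch. It assumes GPT-2 via Hugging Face transformers purely as a tiny stand-in model (Copilot and Grok are obviously far larger), and simply scores the typoed prompt against its intended reading to see which one the model finds more probable.

```python
# A minimal sketch (assumption: GPT-2 via Hugging Face transformers as a tiny
# stand-in model; Copilot/Grok are far larger, but the principle is the same).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def total_logprob(text: str) -> float:
    """Total log-probability the model assigns to `text`, token by token."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean negative log-likelihood per token
    return -loss.item() * (ids.shape[1] - 1)

# The typoed prompt vs. the intended reading: the cleaner reading should score
# higher, i.e. it is the more statistically probable pattern in this context.
print(total_logprob("Right, but i want thje patche sot be / and /* always"))
print(total_logprob("Right, but I want the paths to be / and /* always"))
```

The absolute numbers don’t matter here; the point is that the cleaned-up reading is the higher-probability pattern, which is effectively what the assistant exploits when it quietly “fixes” my typos.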
However, I then realized I could also just get the model to “explain” itself…
How did you manage to know what I meant despite my typos in the last prompt?
I understood your message through a combination of:
Conversation context – We were just discussing the CloudFront purge workflow and the paths parameter
Pattern recognition – “patche” → “paths”, “thje” → “the”, “sot” → “to” are common typos
Semantic understanding – The sentence structure “I want [X] to be [Y] always” was clear even with typos
Domain knowledge – In CloudFront invalidation, “/” and “/*” are standard paths for root and wildcard invalidation
When typos are minor and the overall meaning is unambiguous from context, I can usually infer the correct interpretation. If something is truly unclear, I’d ask for clarification!
…
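As an aside, for anyone unfamiliar with the “domain knowledge” point above: a CloudFront invalidation for those two paths looks roughly like the sketch below. This assumes boto3 with CloudFront permissions configured and uses a placeholder distribution ID; the actual purge workflow in my repository may well be a CLI step instead.

```python
# A minimal sketch (assumptions: boto3 with CloudFront permissions configured,
# and a hypothetical distribution ID; the real purge workflow may differ).
import time
import boto3

cloudfront = boto3.client("cloudfront")

def purge(distribution_id: str) -> None:
    """Invalidate the root object and everything under it ("/" and "/*")."""
    cloudfront.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch={
            "Paths": {"Quantity": 2, "Items": ["/", "/*"]},
            # CallerReference must be unique per invalidation request
            "CallerReference": str(time.time()),
        },
    )

purge("E1234567890ABC")  # hypothetical distribution ID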
Now, I’m not sure it has described the tokenization correctly here, as most of the things it counts as tokens are full words, which is not always the case; but then again, what do I know about what is happening inside the “magic brain soup” of the Grok model I am using for this example…
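To illustrate why “one word = one token” doesn’t hold in general, here is a quick check. It assumes tiktoken’s cl100k_base encoding as a stand-in; whatever tokenizer Copilot or Grok actually uses will differ in detail, but the behaviour is similar: common words tend to be single tokens, while typos usually get split into sub-word pieces.

```python
# A minimal sketch (assumption: tiktoken's cl100k_base encoding as a stand-in;
# the tokenizer Copilot or Grok actually uses will differ in detail).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for word in ["the", "always", "thje", "patche", "sot"]:
    pieces = [enc.decode([t]) for t in enc.encode(word)]
    print(f"{word!r} -> {pieces}")
# Common words tend to map to a single token, but typos like 'thje' usually
# split into several sub-word pieces, so "one word = one token" is not a given.
```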