<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Singularity]]></title><description><![CDATA[Singularity]]></description><link>https://blog.sevnlabs.space</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1761613779156/c93b4e9b-28e3-4928-934c-4d54d263ffa2.png</url><title>Singularity</title><link>https://blog.sevnlabs.space</link></image><generator>RSS for Node</generator><lastBuildDate>Sun, 26 Apr 2026 04:06:27 GMT</lastBuildDate><atom:link href="https://blog.sevnlabs.space/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[When Encryption Isn’t Enough: Token-Length Side Channel Attacks For AI Assistants]]></title><description><![CDATA[You’ve probably seen clips of the long-running TV show, Wheel of Fortune at least once on social media. (Just like the famous meme above. Can you guess the song?) Contestants stare at a board full of blank spaces, waiting for a lucky spin to reveal a...]]></description><link>https://blog.sevnlabs.space/when-encryption-isnt-enough</link><guid isPermaLink="true">https://blog.sevnlabs.space/when-encryption-isnt-enough</guid><category><![CDATA[Security]]></category><category><![CDATA[llm]]></category><category><![CDATA[vulnerability]]></category><category><![CDATA[encryption]]></category><category><![CDATA[Side channel attacks]]></category><dc:creator><![CDATA[Singularity]]></dc:creator><pubDate>Tue, 28 Oct 2025 03:28:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/SYofhg_IX3A/upload/f215370e7d040ee4b0c413fdd0ae06b0.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><img src="https://i.imgur.com/g9xAAE7.jpeg" alt="“Wheel of Fortune — an instant cultural puzzle. Can you guess the phrase?”" /></p>
<p>You’ve probably seen clips of the long-running TV show <em>Wheel of Fortune</em> at least once on social media. <em>(Just like the famous meme above. Can you guess the song?)</em> Contestants stare at a board full of blank spaces, waiting for a lucky spin to reveal a single letter. At first, it looks simple. Once a few letters appear, the answer often clicks into place. <strong>But when there are no letters at all, only the number of words and how long each one is, guessing the full phrase becomes almost impossible</strong>. Unless it’s something everyone knows, like a famous song lyric, you’re basically guessing in the dark, because <strong>an enormous number of phrases can fit the same pattern of blanks</strong>.</p>
<p>Now imagine AI trying to solve that kind of puzzle. And not as a game, but <strong>as a way to uncover parts of encrypted conversations between you and an AI assistant</strong> like ChatGPT, Gemini, or Claude. That probably sounds far-fetched, but a group of researchers recently showed that it can be done. In a paper published at the USENIX Security conference, one team demonstrated that <strong>it’s possible to look at encrypted data (no text, no keys) and still infer what the conversation was about, just by examining the size of the packets being sent</strong>.</p>
<p>In this post, I’ll explain what the researchers found, why and how the attack works, and what this means for anyone who develops AI models or uses AI assistants daily, especially in an era where LLMs are rapidly becoming part of our everyday lives.</p>
<h2 id="heading-encryption-hides-words-not-their-shape"><strong>Encryption hides words, not their shape</strong></h2>
<p>When you chat with an AI assistant, your messages are <strong>indeed encrypted using strong cryptographic algorithms</strong>. That means no one watching the network can see the actual words you're sending or receiving. On the surface, this sounds completely secure. But encryption doesn't hide everything. Some clues still slip through, things like the timing of each message, or the size of the data being sent back and forth.</p>
<p>The most important thing to remember is that most AI models do not send their reply all at once. Rather, they generate and stream the response <strong>piece by piece, token by token</strong>. A token is a small chunk of text, usually a few characters. Each time the model sends a new token, an encrypted packet arrives in your browser or app. Even though <strong>we can’t see what those packets contain, we can see how big they are</strong>.</p>
<p>When you line up the packet sizes, they reveal <strong>a sequence of numbers: the length of every token in the AI’s response</strong>. It’s like seeing the masked version of a sentence with none of the words filled in, just like the TV show game mentioned earlier.</p>
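<p>To make the leak concrete, here is a toy sketch (my own illustration, not the paper’s code) of what an eavesdropper actually gets: never the tokens themselves, only their lengths. The example reply and its tokenization are hypothetical.</p>

```python
def token_length_pattern(tokens):
    """Map a stream of tokens to the length sequence an observer sees."""
    return [len(t) for t in tokens]

def masked_view(tokens):
    """Render that pattern the way a puzzle board would: blanks only."""
    return " ".join("_" * len(t) for t in tokens)

# Hypothetical tokenization of an assistant's streamed reply.
reply_tokens = ["Certainly", "!", " Here", "'s", " how", " it", " works", "."]
print(token_length_pattern(reply_tokens))  # [9, 1, 5, 2, 4, 3, 6, 1]
print(masked_view(reply_tokens))
```

<p>Encryption scrambles the contents of each packet, but the sequence printed above survives intact on the wire.</p>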
<p>That’s the insight the researchers built on. They realized that <strong>these "token-length" patterns mirror the actual wording of the response</strong>. Through experimentation, they showed it is possible to reverse engineer parts of a conversation, even though the messages are encrypted, <strong>without ever decrypting them with the correct key</strong>. And to make this easier, instead of playing their own game of Wheel of Fortune, they trained an AI model that is specifically built to do the job for them.</p>
<h2 id="heading-the-attack-pipeline"><strong>The attack pipeline</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761620762311/777d9286-d65a-46a0-8e07-b99a1e181610.png" alt class="image--center mx-auto" /></p>
<p>Here’s how the researchers broke down their method:</p>
<ol>
<li><p><strong>Capture the traffic</strong></p>
<p> Researchers captured encrypted traffic from AI assistants such as ChatGPT and Copilot. The packets’ sizes and order were visible, but their contents were not.</p>
</li>
<li><p><strong>Find message boundaries</strong></p>
<p> They identified where each response began and ended. Since most AI models stream their answers in small chunks, each chunk corresponded roughly to a piece of generated text.</p>
</li>
<li><p><strong>Extract token lengths</strong></p>
<p> They measured how much the packet size changed each time. The difference showed how many characters were in each new token. Together, these differences formed a numeric sequence shaped like the AI's sentence rhythm.</p>
</li>
<li><p><strong>Split into segments</strong></p>
<p> They broke the full length sequence into sentence-sized segments. Short spikes in the pattern, which often correspond to punctuation marks, served as natural boundaries.</p>
</li>
<li><p><strong>Reconstruct the text</strong></p>
<p> In the end, the final length sequence was fed into a model that tried to translate it into natural language. The model generated many candidate sentences and picked the best-fitting one.</p>
</li>
</ol>
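<p>Steps 3 and 4 above can be sketched in a few lines. This is a simplified reconstruction, assuming each streamed packet carries the response so far, so consecutive size differences approximate token lengths; real traffic adds protocol overhead that the researchers had to filter out.</p>

```python
def extract_token_lengths(packet_sizes):
    """Step 3: successive packet-size differences ≈ each new token's length."""
    lengths, prev = [], 0
    for size in packet_sizes:
        lengths.append(size - prev)
        prev = size
    return lengths

def split_into_segments(lengths, boundary=1):
    """Step 4: very short tokens often mark punctuation, a natural boundary."""
    segments, current = [], []
    for n in lengths:
        current.append(n)
        if n <= boundary:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments

sizes = [9, 10, 15, 17, 21, 24, 30, 31]   # hypothetical observed packet sizes
lengths = extract_token_lengths(sizes)     # [9, 1, 5, 2, 4, 3, 6, 1]
print(split_into_segments(lengths))        # [[9, 1], [5, 2, 4, 3, 6, 1]]
```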
<h2 id="heading-how-it-works"><strong>How it works</strong></h2>
<p><em>So how does the trained AI model actually guess what the AI assistant said?</em></p>
<p>The researchers built their <strong>own AI (LLM)</strong> and trained it to read these token-length patterns as if they were a new language. Each token length became a special symbol: for instance, “_5” for a five-character token, and “_2” for a two-character one. The model learned to associate these sequences with likely words and phrases through the internalized statistics of how language behaves.</p>
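<p>That encoding is simple enough to sketch. A minimal version (function names are mine, not the paper’s) that turns a hypothetical token stream into the model’s source string:</p>

```python
def encode_lengths(token_lengths):
    """Turn each length into a pseudo-word like "_5" for the seq2seq model."""
    return " ".join(f"_{n}" for n in token_lengths)

def tokens_to_source(tokens):
    """Build the model's source string from a hypothetical token stream."""
    return encode_lengths(len(t) for t in tokens)

print(tokens_to_source(["Sure", ",", " here", " you", " go", "!"]))
# "_4 _1 _5 _4 _3 _1"
```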
<p>Two transformer models were trained, both based on Google’s T5 architecture.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761621853397/57f12a55-93d0-460a-b5e7-96333ef6b3d5.png" alt class="image--center mx-auto" /></p>
<p>The initial model, <strong>LLM-A</strong>, targeted the opening sentence, because first sentences often establish the topic of the whole response.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761621871160/1b9c0c10-053e-4871-8506-b9e88c57ab7f.png" alt class="image--center mx-auto" /></p>
<p>The second model, <strong>LLM-B</strong>, reconstructed each following sentence using the context recovered so far, keeping the output coherent and natural.</p>
<p>To train them, the researchers used the UltraChat dataset, which contains 1.5 million GPT-4 Turbo conversations. Every assistant response was <strong>tokenized</strong>, and every token was replaced with its character length, producing <strong>training pairs of a numeric pattern and the original text</strong>. The training data was also augmented with noise to mimic real network artifacts, such as batched tokens and streaming preambles. The models learned how AI assistants “sound” through tone, phrasing, and structure. Eventually, they could translate vague numeric patterns into very useful guesses about what the assistant actually said.</p>
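<p>A hedged sketch of that pair construction: the real pipeline used the assistant’s own tokenizer, while here a crude regex split stands in for it, and the merge probability is an illustrative value, not the paper’s.</p>

```python
import random
import re

def build_training_pair(response, rng=random.Random(0)):
    """Turn one assistant response into a (length pattern, text) pair."""
    tokens = re.findall(r"\s?\w+|\s?[^\w\s]", response)  # crude stand-in tokenizer
    lengths = [len(t) for t in tokens]
    # Noise augmentation: occasionally merge neighbors, mimicking the way
    # real streams batch several tokens into a single packet.
    noisy, i = [], 0
    while i < len(lengths):
        if i + 1 < len(lengths) and rng.random() < 0.1:
            noisy.append(lengths[i] + lengths[i + 1])    # two tokens, one packet
            i += 2
        else:
            noisy.append(lengths[i])
            i += 1
    source = " ".join(f"_{n}" for n in noisy)
    return source, response                               # (numeric pattern, text)
```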
<h2 id="heading-boosting-the-accuracy"><strong>Boosting the Accuracy</strong></h2>
<p>To boost the accuracy of the model’s predictions, the research team focused on an important, but rather classic idea.</p>
<p>Language isn’t random. Assistants make use of familiar wording, predictable sentence structures, and repeating templates. Due to this predictability, only a limited number of sentences are likely to fit any specific length pattern.</p>
<p>For example, ChatGPT often starts with <em>“Certainly!”</em> or <em>“Here’s how…”</em>. When the model identifies a similar pattern of tokens, <strong>it could make a confident guess</strong> that those words were there.</p>
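<p>A toy version of that prior might look like this. The phrase list, the stand-in tokenizer, and the helper names are my own illustration, not the paper’s ranking machinery.</p>

```python
import re

STOCK_OPENERS = ["Certainly!", "Here's how", "Sure, I can help with that."]

def length_pattern(text):
    """Crude stand-in tokenizer: words, apostrophe suffixes, punctuation."""
    return [len(t) for t in re.findall(r"\s?\w+|\s?'\w+|\s?[^\w\s]", text)]

def matching_openers(observed_lengths):
    """Return stock phrases whose length pattern prefixes the observation."""
    return [p for p in STOCK_OPENERS
            if observed_lengths[:len(length_pattern(p))] == length_pattern(p)]

print(matching_openers([9, 1, 5, 2, 4]))  # ['Certainly!']
```

<p>A pattern that starts with a nine-character token followed by a one-character token is already strong evidence the reply began with “Certainly!”.</p>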
<p><img src="https://www.indiewire.com/wp-content/uploads/2015/02/the-imitation-game-1.jpeg?w=600&amp;h=337&amp;crop=1" alt="How They Did It: Breaking the Enigma Code in 'The Imitation Game' (Video)" class="image--center mx-auto" /></p>
<p>As I read this, it reminded me of the famous story of <strong>Alan Turing</strong> and the team at Bletchley Park trying to decrypt Germany’s ‘unbreakable’ code system, <strong>Enigma</strong>. The key to breaking the code wasn’t brute force (which was technically impossible), but rather <strong>noticing patterns and relying on common, predictable phrases</strong>. Such known phrases included daily weather reports and the everyday “Heil Hitler,” which appeared in many communications.</p>
<p>What the researchers did here isn’t far off in spirit. The tools differ, but the logic is the same: language always leaves its mark, even in code.</p>
<h2 id="heading-how-accurate-is-it"><strong>How accurate is it?</strong></h2>
<p><em>The paper reports strong empirical performance.</em></p>
<ol>
<li><p>For ≈ 52.7 % of first sentences, the cosine similarity (φ) between the reconstructed sentence and the actual topic exceeded 0.5.</p>
</li>
<li><p>Near-exact phrasing was achieved for nearly a quarter of the output (higher φ).</p>
</li>
<li><p>Full-response inference works about 38% of the time on traffic from the ChatGPT-4 browser client and about 17% of the time on API streams, where tokens arrive in pairs.</p>
</li>
<li><p>Training on general text instead of assistant outputs dropped performance to <strong>~5 %</strong>, confirming the benefit of targeted tuning.</p>
</li>
<li><p>A model trained on GPT-4 data also performed well on Microsoft Copilot traffic, suggesting stylistic similarity.</p>
</li>
</ol>
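<p>For intuition about what φ measures: cosine similarity scores how closely two vectors point in the same direction, from 0 to 1 for non-negative vectors. The paper computes it over sentence embeddings; the bag-of-words version below is only meant to illustrate the metric itself, not reproduce the paper’s scoring.</p>

```python
from collections import Counter
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sentences as word-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

print(round(cosine_similarity(
    "Yes, I can help with your tax return.",
    "Sure, I can help with your tax filing."), 2))  # 0.75
```

<p>A reconstruction can miss the exact words and still score well above 0.5 when it lands on the right topic, which is exactly the kind of leak the numbers above quantify.</p>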
<p>These numbers show that, while not perfect, the attack reliably reveals topics and often recovers phrasing, which is <strong>sufficient to constitute meaningful privacy exposure</strong>.</p>
<h2 id="heading-can-it-be-fixed"><strong>Can it be fixed?</strong></h2>
<p>The short answer is <strong>yes</strong>, but <strong>at a cost</strong>.</p>
<p>The researchers suggest three main defenses:</p>
<ol>
<li><p><strong>Add random padding.</strong> Append a random number of bytes to every packet so its size no longer reveals the real token length.</p>
</li>
<li><p><strong>Batch tokens.</strong> Instead of streaming tokens one by one, send them in batches.</p>
</li>
<li><p><strong>Send the full response.</strong> Do not send anything until the entire output is generated.</p>
</li>
</ol>
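<p>The first two mitigations are easy to sketch. This is a generic illustration of the idea, assuming a simple token-streaming loop; the function names are mine, not any provider’s actual API.</p>

```python
import random

def pad_token(token, max_pad=8, rng=random.Random()):
    """Mitigation 1: random padding decouples packet size from token length."""
    return token + "\x00" * rng.randint(0, max_pad)

def batch_tokens(tokens, batch_size=4):
    """Mitigation 2: send tokens in groups, hiding per-token boundaries."""
    for i in range(0, len(tokens), batch_size):
        yield "".join(tokens[i:i + batch_size])

print(list(batch_tokens(["Hi", ",", " there", "!"], batch_size=2)))
# ['Hi,', ' there!']
```

<p>Either change destroys the one-packet-per-token mapping the attack depends on, at the cost of extra bytes or a chunkier streaming feel.</p>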
<p>All three measures make the attack harder, but they also slow down the streaming experience or consume more bandwidth. It’s a <strong>trade-off between performance and privacy</strong>.</p>
<p>Also, <strong>major LLM providers have already responded</strong>. The researchers responsibly disclosed their findings, and within weeks providers, including the biggest ones, rolled out patches to address the vulnerability. In other words, the attack demonstrated in this paper <strong>no longer works against recent versions of these services</strong>.</p>
<h2 id="heading-what-this-tells-us"><strong>What this tells us</strong></h2>
<p>The first takeaway is that <strong>encryption alone doesn’t guarantee privacy</strong>. What's fascinating, and a little unsettling, is that even with all the complex encryption and security layers in place, data can still slip out through <strong>something as small and unexpected</strong> as packet length. It's impressive, almost absurd, that a system designed to be airtight can leak information from such an ordinary detail. It makes you realize how fragile "security" can be when the weakest point isn't the algorithm itself, but the <strong>invisible side effects</strong> of how we make technology smooth and human-friendly.</p>
<p>It also shows <strong>how powerful and adaptable artificial intelligence is</strong>. The technology we built to understand and generate language can be turned around to expose hidden patterns we would otherwise miss. In this case, AI was used to read what humans couldn't. It's a reminder that AI isn't just expanding into new fields, but also reaching into areas beyond human perception, uncovering signals and structures we would never notice on our own.</p>
<p><em><s>BTW, I’m also curious whether if we could build a Wheel of Fortune LLM using this idea and see if it can beat the TV show.</s></em></p>
<h2 id="heading-references"><strong>References</strong></h2>
<p>Weiss, R., Ayzenshteyn, D., Amit, G., &amp; Mirsky, Y. (2024). <em>What Was Your Prompt? A Remote Keylogging Attack on AI Assistants.</em> Proceedings of the 33rd USENIX Security Symposium.</p>
<p><a target="_blank" href="https://www.usenix.org/conference/usenixsecurity24/presentation/weiss">https://www.usenix.org/conference/usenixsecurity24/presentation/weiss</a></p>
]]></content:encoded></item></channel></rss>