A harder-to-crack Wordle

Wordle is a game that took the Tweetosphere by storm, seemingly transforming Twitter into the now-defunct emoji-only social network Emojli. Countless nerds and word junkies likely litter your Twitter timeline with a daily stream of 30 green, grey, and yellow emoji boxes. So if you haven't joined the status-quo and engaged in this sweet new Internet addiction, here's the gist of the game. It's really quite simple. You have to guess the daily secret word with six tries. After submitting a guess, the letters in the correct position will light in green, those that appear in the answer but are in the wrong place will light in yellow, and those not in the word dwell in a dull grey. It's a very effective game to make you realize how few five-lettered words you remember or can spit out on demand.

Everyone and their mother have been presenting ideal strategies for playing Wordle. I want to do something different. I want to crack Wordle and exhibit fixes that make the game uncrackable. I know, I know, I'm cheating. But I like guarantees, and I like to play a game knowing it's guaranteed to be fair.

I'm not trying to condemn the authors of Wordle and its derivatives for not thinking about making the app uncrackable. Quite the contrary, they made a great game that I enjoy playing (and breaking) every day. Wordle was literally a labor of love. Here's a quote from the New York Times: "Josh Wardle, a software engineer in Brooklyn, knew his partner loved word games, so he created a guessing game for just the two of them". Me? I'm just the party-pooper who's here to tell everyone how to break (and possibly fix) it.

Wordle's setup

The game was such a hit that it spun an alternative for Spanish, and a couple for Math like Mathler and Nerdle. I argue these spin-offs are possible for two reasons: first, the code is easy to fork, and second, the backend is cheap.

The backend is so cheap that it could practically be removed entirely. The only interaction you'll have with the server is the initial page load. That's it. Zero contact is made to check if a word is in the dictionary. Zero contact is made to list the correctly-placed letters. Zero contact is made. Nil.

All but Nerdle can be downloaded locally and played independently of any server. Nerdle requires a server to download the daily answer. From here on, the post will assume that it's perfectly acceptable to run tiny daily computations on the server like in Nerdle's case.

Attacks on Wordle and derivatives

The setup of Wordle, and derivatives, implies that the logic and the data it needs: the dictionary and the daily word, are already present in the code. Both Wordle and Spanish Wordle provide the dictionary in clear text in the application's code (if you think about it, Mathler does not need a dictionary, the expression just needs to parse correctly). And both Wordle and Mathler put the answer in clear text in your web browser's local storage.

Ergo, it's easy to crack the answer; just read it from the browser's localStorage. The cracking code fits in half-a-tweet:

Stuck on wordle and want to see the solution? Make a bookmark with this URL:

javascript:alert(JSON.parse(localStorage.gameState).solution)

and open it in wordle

— GEZ (@_typeable) January 24, 2022

Nerdle does something strange. It does store the solution in the localStorage, but only after you've interacted with the page beyond loading it. So if you open the page and look in the localStorage, you won't find anything. Press a key or a button on the screen, and the solution is written right in the localStorage making Wordle's crack achievable here. Now I'm a lazy guy, and I want my cracks to work with the least amount of interaction with the page as possible.

However, if you look in the Networks tab, you'll find a request to a strange URL that returns an 8-character word. Nerdle is the only one that requires an 8-character solution. Could this be an encoding of the equation of the day? An invasive operation inserting a "debugger" statement with medical accuracy reveals that it's indeed the encoding. This URL is also the MD5 hash of the number of days between Nerdle's launch and today.

The crack, this time in Python 2, fits in four lines:


import hashlib, urllib, time
offset=int((time.time()*1000-16426368e5)/864e5)
url="https://nerdle.com/words/"+hashlib.md5(str(offset)).hexdigest()
print "".join([chr((ord(c)+113)%126) for c in urllib.urlopen(url).read()])

On the other hand, Spanish Wordle is harder to crack at first glance. The localStorage's "solution" entry is encrypted with what appears to be AES. The encryption key is buried deep in the code's logic, but nothing a surgically placed "debugger" placement can't step you through. Inspecting a curious line of code that contains "AES.decrypt(t,c)" that happened to be called with the encrypted solution reveals that the key and the salt (both pre-generated and not updated daily) are respectively "llanos" and "ibai." Parenthesis: I googled llanos and ibai out of curiosity. The results revealed what I assume to be Spanish Wordle's creator's alter-ego or favorite e-sport internet streamer: Ibai Llanos. How is he related to Spanish Wordle? Your guess is as good as mine. End Parenthesis.

So it's possible to crack, and the cracker still fits in a single tweet:

Wordle(ES) crack bookmark:

javascript:var d=document;var s=d.createElement("script");s.onload=()=>alert(CryptoJS.AES.decrypt(JSON.parse(localStorage.getItem("solution")),"llanos").toString(CryptoJS.enc.Utf8).slice("ibai".length));s.src="https://cdnjs.cloudflare.com/ajax/libs/crypto-js/3.1.2/rollups/aes.js";d.body.append(s)

— GEZ (@_typeable) February 7, 2022

The cracker is more involved, but here's the gist of it: it starts by loading CryptoJS from the CDNjs, since accessing the internal AES algorithm proved to be a nightmare. Once loaded, the script uses the library to decrypt the "solution" entry in localStorage, removes the salted string, and reveals it in the most anachronistic pop-up ever.

Post-script: As I was getting ready to publish this post, I came across 3Blue1Brow's video about finding the best strategy to find an answer in Wordle. Go watch it. What's relevant in this video is that Grant quickly mentions a possible attack at 4:47. Here's what he says: "The way that it's [the list of possible answers] visible in the source code is in the specific order in which answers come up from day-to-day. That you can always look up what tomorrow's answer will be."

This brings us to these fundamental questions: can we keep everything encrypted? Can we compute the properties of a guess without needing to store clear text versions anywhere? If there are encryption keys, can we avoid providing their generation algorithm? Can we do that in the least amount of memory possible? This sounds like a job for homomorphic encryption, but let's not get carried away and think of simpler alternatives.

To give you an idea of how much each application uses, Wordle's code is in a 173.34KB javascript file, Spanish Wordle's code is in a 639.04KB file, and Mathler's code is in a 198.71KB file.

Admissible attacks

If we can answer yes to all the questions above, we should be vulnerable to only two attacks.

The first is a naive daily dictionary attack, in which the attacker generates a rainbow table for each combination of letters and tries each individually. I say each five-letter word combination because the dictionary will be "encrypted," so the attacker will not know which words we use and which we don't.

In the second attack, the attacker utilizes the properties of the guess to improve the searching strategy. I argue that this is perfectly fine because the attacker is now playing the game, as intended!

The proposed solutions

Disclaimer: I am not an expert in cryptography or privacy-preserving techniques. If you are an expert and think my analysis is faulty, please shoot a tweet at me.

In the following, I present three techniques that I experimented with. All three proposals provide almost-complete secrecy about the guess and the dictionary and the zero-contact requirement of the setup. There is only some daily code that the backend must execute. If you want to read what I deem to be the best technique (shortest code, simplest conceptually, and most economical memory-wise), then head over to the last section. The order in which I present the solutions follows my chronology of discovering them.

Update (12/02/2022): I have implemented a proof-of-concept which is available on my Github. There is also a running instance that generates a new word every hour on my website. To encourage cracking this instance, no code was obfuscated or minimized, no library was used, and only the essential features were implemented.

Bloom Filters

If you are unaware of Bloom Filters, they are fundamentally a set with two operations: adding and checking membership. A Bloom filter starts as a fixed-length array of 0s. To add an item to the Bloom filter, you hash it k times, then put a 1 at the index equalling the hash. To check the membership of an item, you hash it k times and make sure that all the entries in the array at those hashes are 1s.

A Bloom filter of all the words in the dictionary will not store these words, nor their hashes. So they are perfect for hiding what the elements are.

Great, if the server encodes a Bloom filter in the code, then we can tell if a guess is a valid word or not. Next question: how do we check if the second letter of the guess is in the correct place?

I propose that we construct a Bloom filter for every word index. The words we will add to that filter are those valid words with the correct letter in that position. For example, if today's mystery word is "ghost," we will construct five filters. The first will contain words like "glass" and "gamer," the second words like "phase" and "rhyme," the third words like "khaki" and "twang," etc...

Notice that if a word is in all those five filters, then the guess is the word of the day. So nothing special has to be done.

Okay, now how do we check if a letter which appears in the solution is wrongly placed?

Even more bloom filters! To be precise, five more bloom filters. Each one will be associated with a word index. So in the first Bloom filter, we will put all the words that start with all the letters which occur in the answer that are not the first letter. More concretely, if today's word is again "ghost," the first Bloom filter will contain words like "honey," "octal," "state," and "taunt". The same is done for the other four indices.

Done. Problem solved. Probably though, right? Right. A Bloom filter is a probabilistic data structure. There will be some false positives. In other words, we may say "correct, the second letter is indeed an E," yet the actual answer's second letter is an S. Imagine the barrage of intimidating tweets you'll receive when this happens. This actually happened to me while testing out this solution. Not the stream of tweets, fortunately, but the false positives. The trivial fix would be to iterate over all 26⁵ possible words, check if the bloom filter has false positives, and regenerate it with new hashes if it's the case.

These bloom filters can be constructed in one pass over the dictionary. So the runtime is linear in the size of the dictionary. It should be fast to do by a server just before midnight every night.

On the other side, how much will our poor gamer have to download? Sadly no caching can be leveraged (as with all the solutions I present). This is a fantastic calculator to find the optimal parameters for a Bloom filter. Our dictionary will include the 13K words of Wordle and have 1/26⁵ false positive probability that boils down to 53.8KB. The remaining 10 Bloom filters will contain 13K words in the worst case with a chance of 1/13K of false positives that end up being 31.29KB each. Total: 366.7KB. Twice Wordle and some change.

Finite-Field Polynomials

Can we do better memory-wise? An optimal Bloom filter has 50% of its entries 0s. Can we get rid of those somehow?

This proposal relies on observing that a polynomial of the shape (x-x₁)...(x-x_n) precisely has these n roots: x₁, x₂, until x_n, and has n coefficients. As it so happens, this is also the case in finite fields such as Z_p where p is a prime, i.e., all the numbers between 0 (inclusive) and p (exclusive).

It's rather critical to work in Z_p and not good-ol' R. The reason has nothing to do with floating-point errors or compressing coefficients. Working in R allows an attacker to use Newton's method to find all the roots of a given polynomial in just a handful of applications. We want to hide as many properties about our polynomial as possible from the attacker. Luckily, root finding in Z_p is hard. The standard algorithm I'm aware of is Chien search which is essentially just a fancy brute-force search.

To detect whether a guess is in the dictionary or not, we start by choosing a perfect hashing function for all the possible 26⁵ words. Next, we compute the coefficients of (x-h(w₁))...(x-h(w_n)) where w₁...w_n are the words in our dictionary. When presented with a guess, we can evaluate the polynomial at the hash of the guess and check whether the answer is zero or not.

In my experience, this perfect hashing function can be found pretty quickly, often in less than 5 trials. Moreover, computing the polynomial's coefficients by expanding its factors can be done in quadratic time in the size of the dictionary. And the evaluation can be done in linear time in the size of the dictionary.

Since our domain must be Z_p, and we must support encoding 26⁵ words, we'll choose the prime 11881379. This is the first prime after 26⁵. So all our numbers, hashes, and coefficients can be stored in log₂(11881379) = 24 bits.

Next problem: how can we know whether the first letter is correct or exists in the answer? I propose to encode all these properties in a single small number. This is the scheme I adopted. Let's work in base 3, where 0 means a letter is correctly placed, 1 means the letter exists in the answer, and 2 means that the letter does not exist. For example, if today's word is once again "ghost," the guess "chogs" is encoded as 20011 in base 3 or 166 in base 10.

So we can construct another polynomial that interpolates the hash of every dictionary word and its encoding. However, this time, we need a perfect hashing function not for 26⁵ words but for 13K words. Luckily 13001 is a nice close prime number. This means all our numbers for this polynomial will fit in log₂(13001) = 14 bits.

This interpolation can be trivially done using Lagrange interpolation in cubic time. Which in practice will take a long time, hardly less than half an hour. But it can be squeezed down to a quadratic time which is just fast enough.

The first polynomial has 13K coefficients taking each 24 bits. The second one also has 13K coefficients, but each takes 14 bits. Total: 61.75KB. This is a reduction factor of 5 from the Bloom filter solution! Putting us at half the size of Wordle.

Hashes to Hashes

The motivation is as follows: there is still some waste in the finite field approach; the result of computing our second polynomial is 14 bits long, yet we only need 8 bits to store the biggest encoding (22222 in ternary = 242 in decimal). How to represent a function with a domain of 14 bits and a codomain of 8 bits? Hash-tables, D'uh! And do we really need that first polynomial now? Can't we just map these 24 bits directly to these 8 bits?

Of course, we can! But there's a catch, we should salt these 8 bits. Otherwise, we'd be vulnerable to a new attack. The number of words that hash into 21112 in ternary (203 in decimal) is a good enough hint to the answer. For example, if the attacker sees 103 numbers that hash into 203, they'll deduce that the answer must be one of only 31 possible words (tiara, angry, evoke, amigo, latex, to list a few). If they spot 27 203s, then the range of possible words is 174. If that number turns out to be still too large for the attacker, they can look at how many hashes map into 21211, for example, and take the intersection of those two lists. In this fashion, an attacker can avoid a full dictionary attack. Salting the property with the hash of the word (using a different hashing function than the key) eliminates this attack.

The whole construction fits neatly in this seven-line Python script:


import random
def n_bits(num, n): return num & ((1<<(n+1))-1)
words = list(filter(lambda x: len(x)==5, map(lambda x: x[:-1], open("dict.txt").readlines())))
k1 = random.randint(0, 1<<25)
k2 = random.randint(0, 1<<9)
answer = random.choice(words)
table = { n_bits(hash(w,k1), 24): compare(w,answer) + n_bits(hash(w,k2), 8) for w in words }

The code above clearly computes the hash table in one swoop over the dictionary. So its runtime is linear.

What about memory consumption? The key of the hash table is 24 bits long, and the values are 8 bits long. So for every entry, we only need 32 bits. In other words, one integer! How elegant! In total, our hashtable will be 52KB big. That's slightly less than the 58.3KB gzipped Wordle code.

Conclusion

It's possible to have versions of Wordle that do not expose either the dictionary or the word of the day to the player. The downside of the Bloom filter approach, besides its relatively porky size, is its probabilistic nature. So extra work has to be done to check that no player will complain.

Although the latter two methods are not probabilistic, they rely on perfect hashes. This undoubtedly means extra work. But where they shine is in their slim and graceful size.

Earlier I said these schemes provide almost-complete secrecy about the dictionary because there was one property that I couldn't hide: the size of the dictionary. Here, the Bloom filter has the advantage. Because it can only approximate the number of elements inside it. This parameter could be a clue for the attacker to factor in his evil scheme. But I believe that this can be fixed by randomly adding bogus elements that map to invalid properties to the data structures, making them appear larger than they actually are. However, this entails more bytes to be downloaded.

Asymptotically Typeable

Home Blog RSS