Cryptanalysis of the Sarah2 Pen-and-Paper Cipher

Disclaimer: Building strong cryptosystems is way harder than breaking them. Props to the system author for going for it and publishing the system – it’s a nice and simple design which is useful for learning cryptographic concepts. I hope this post helps inform future designs!

I recently heard about the Sarah2 pen-and-paper cipher, touted as a “strong pen-and-paper cipher” and posted publicly here: The cipher is quite simple in order to make it amenable for pen-and-paper use. When I first heard about the cipher, I immediately started thinking of ways to break it, and a few hours later I had two functional attacks that fully compromise the key with relatively small numbers of chosen-plaintext messages (in the best attack, less than 2000 4-character messages). This writeup explains the technical details of the attack. Because the cipher is quite simple, the attack is similarly simple to explain, and I hope it makes for a good illustration of a neat cryptanalytic technique.

The Sarah2 Cipher

Sarah2 encrypts strings of characters from the English alphabet, plus the “_” character for spacing and padding, for a total of 27 letters. The secret key is a permutation of all possible letter pairs (an S-Box, in cryptography parlance), mapping every pair of letters to another random pair. Thus, there are (27*27)! = 729! possible secret keys. This equates to a 5886.98-bit key – quite a bit larger than the tiny 128-bit keys commonly used in AES.

Because the cipher operates on pairs of letters, the input must be of even length, and is padded (with a “_” character) if it is not. Then, encryption proceeds as a series of rounds. In each round, every (non-overlapping) pair of characters is passed through the secret S-Box to yield a new pair of characters. Then, the characters are permuted using a kind of “inverse shuffle”, where the characters in the odd-numbered positions (1, 3, 5, …) are collected and then concatenated with the characters in the even-numbered positions (2, 4, 6, …), so that for example the string “12345678” would become “13572468”. This shuffle breaks up all the pairs to provide diffusion in the cipher.

The permutation is omitted in the final round as it is unnecessary (it would add no security and would make the algorithm harder to execute on pen and paper). Decryption simply inverts these steps – the characters are passed through the S-Box in reverse and shuffled each round.

Here’s a worked example to make things clearer. The secret S-Box we’ll use is


To use this, find the row for the first character, then the column for the second. For example, encrypting “cf” would yield “bm”. Now let’s take a plaintext, like “attack_at_dawn” and encrypt it through one round

We first translate each pair “at”, “ta”, “ck”, “_a”, … and translate via the table, to get “fmjheqgzzmxohd“. Then, we “inverse shuffle” this to get “fjegzxhmhqzmod“.

In the next round, we translate each pair “fj”, “eg”, “zx”, “hm”, … to get “cchvcsxwmkjmnh“. Then inverse shuffle to get “chcxmjncvswkmh“. You get the hang of it. Visually, on a four-character plaintext:

Four-round Sarah2 operating on a 4-character plaintext.

So, this is a simple cipher indeed. The author makes three recommendations about the number of rounds to use, based on the attack model:

log2n rounds: “I’m only encrypting a handful of messages, and adversaries will never get a chance to make me encrypt anything they choose.”

log2n+2 rounds: “I’m encrypting tons of messages, but adversaries will still probably not get a chance to make me encrypt anything they choose.”

2 x log2n rounds: “My adversary can encrypt and decrypt things using my key, but they don’t know the key.” My expectation is that this provides a fairly strong level of security, even against attacks involving a fair amount of computational power.

Let’s Do The (Electric) Slide Attack

The author notes that, after log2n rounds, a one-letter difference will have propagated through to all ciphertext characters. If you play with the cipher a bit, you might feel like it’s pretty strong; after all, if you encrypt attack_at_dawn you might get _fjqoaprokfpgu, but attack_at_down would yield the completely different wmhkpqfpspcrcb – and that’s after just 4 rounds. Going for the highest-security 2 x log2n rounds (8, in this case) seems like it should be very secure, indeed, especially if the S-Box is properly random. And, indeed, the author explicitly explains that 2 x log2n “provides a fairly strong level of security, even against attacks involving a fair amount of computational power”. However, there’s a super-powerful attack which will break this cipher no matter how many rounds are used!

The key insight is that this cipher, at its core, is the same round operation applied a whole bunch of times. If we add a final permutation onto the encryption operation (which we can do, because the permutation doesn’t depend on the key), an n-round Sarah2 encryption operation straightforwardly becomes n iterations of the substitution+permutation round. Let’s call the round function F, such that the encryption operation (with extra permutation) is F applied n times: Fn.

Suppose that we have the ability to encrypt anything we want but no access to the key (the third scenario outlined above): that is, for any message a of our choosing, we can encrypt it to get b = Fn(a). Now suppose that, somehow, we’re able to find (or guess) c such that c = F(a). In that case, we can ask for the encryption of c, which is d = Fn(c) = Fn+1(a) = F(Fn(a)) = F(b). This seemingly simple relationship is actually extremely powerful – if we know a single relation c = F(a) of the round function, we can obtain a second relationship d = F(b) which enables us to learn more about the round function. And with that second relationship, we could obtain a third, and so on and so forth. Note that these are relationships on one round of the ciphernot on the full multi-round encryption function. This will allow us to attack the round function directly – no matter how many rounds the encryption uses.

This technique is known as a slide attack, since it visually looks like “sliding” the cipher across itself:

Example of sliding the cipher. a = “abcd”, b = Fn(a) = “fmba”, c = F(a) = “fiwt”, d = Fn(c) = “qsai”.
If we were able to guess that the S-Box mapped ab to fw and cd to it, we could slide to find that fm mapped to qa and ba mapped to si – doubling the number of key entries we know.

So, if we can encrypt anything we want, we can effectively reduce this n-round cipher down to a single round. Is one round of Sarah2 secure? Unfortunately, no. A single round only performs a single substitution through the secret S-Box, and a simple fixed permutation.

Now, the only problem left is to guess a single relationship c = F(a). Luckily, we can just apply brute force. If the input a consists only of repeats of a single character pair (e.g. ababababab), one round will only use one entry of the S-Box.

Attack 1 – Long Messages

This leads us to attack idea #1: what if we encrypted a really long message, say, 10000 underscores (_)? Since the message has only a single character pair, we know that the first substitution will turn it into xyxyxyxy....xyxyxy for some xy, and then the first permutation will turn it into xxxxxxxx...yyyyyyyy. So, we could also encrypt every xxxxxxxx...yyyyyyyy for each of the 729 (27*27) pairs of characters x, y. How do we know which one is the right match?

If we tack on an extra permutation step to the first encryption (of ___...___), then the resulting ciphertext will be just one substition step away from the encryption of the correct xxxx...yyyy. The ciphertexts are basically random, but for long-enough messages they must contain duplicate pairs (by the Pigeonhole principle). For each candidate pairing of ciphertexts, we can check to see if the duplicates line up – if they don’t, then there’s no way they could be separated by a single substitution step.

For example, in the example below the encryption of ___…___ happens to start with the duplicate pair “ei ei” (although in general the duplicate pairs might occur anywhere in the text). The encryption of xx…yy starts with ddxh…, which is impossible since “ei” could not map to both “dd” and “xh”. But, encrypting the correct kk…zz produces a ciphertext that starts with “gu gu”, which is plausible given the “ei ei” in the original ciphertext.

If __ maps to kz, then encrypting ____..____ will produce the same “pattern” of duplicate pairs (e.g. eiei) as encrypting kkkk..zzzz (e.g. gugu).

So, by leveraging these duplicate pairs, we can get two ciphertexts that only differ by a single substition step. Since the messages are very long, we expect that almost all possible letter pairs exist, from which we can easily reconstruct the whole table.

With 10000 characters (5000 letter pairs), the probability that the ciphertext does not contain all 729 letter-pair combinations is (728/729)^5000 = 0.001; that is, with 99.9% probability the ciphertext contains every possible letter pair and therefore we get the entire secret S-Box. This only requires us to encrypt at most 730 messages of 10000 characters, or around 7,300,000 characters.

Attack 2 – Short Messages

We can also pull off the slide attack even if we restrict ourselves to short messages. Let’s restrict ourselves to 8-character (4 letter pair) messages. What we’ll do is obtain the encryptions of every message of the form abababab and every message of the form aaaabbbb – a total of 1431 messages (skipping duplicates of the form aaaaaaaa).

Let’s let A(ab) denote the encryption of “abababab” with an extra permutation (as with the long message attack), and B(ab) denote the encryption of “aaaabbbb“. Let S denote the secret substitution function; for example, S(abcd) = fwit for the S-box above. Then, for any letter-pair X, we have S(A(X)) = B(S(X)) – that is, A(X) and B(S(X)) are related through a single substitution (using the slide attack). All we need to do is figure out the pairings between A and B messages.

As with the long message attack, we can do this by looking for messages with duplicated letter pairs. For example, if A(ic) = cfcftlvc and B(gh) = bmbmsmlc, and no other B messages have this pattern (the same pair twice, followed by two different pairs), then we can be certain that S(ic) = gh, S(cf) = bm, S(tl) = sm and S(vc) = lc. Once we learn a new pair, such as S(cf) = bm, we can look at A(cf) and B(bm) to get even more pairs, and this way iteratively derive the entire key.

The probability that any A message has no duplicated letter pairs is (728/729)^4 = 0.9945, so the probability that every A message has no duplicate pairs is 0.9945^729 = 0.018. Therefore, this particular attack strategy succeeds over 98% of the time. We could improve this by using longer messages (e.g. with 16 characters it would succeed 99.9% of the time). We could also use a more brute-force approach where we guess pairings, propagating the resulting new pairings and continuing until we get a full key or hit a contradiction and backtrack.

This attack only requires 1431*8 = 11448 characters to be encrypted, or around 54,434 bits of information. Given that the key itself is around 5887 bits in size, this equates to only needing around 9 bits of information per bit of key – not a bad ratio! The theoretically optimal attack would require observing 5887 bits of encrypted (or decrypted) data, so this is pretty close.

Improving the Cipher

As we can see, the slide attack is a pretty clever attack that works whenever you iterate the same function over and over again in a cipher. So, one of the best ways of dealing with this is simply to add a little bit of per-round variability. The AES cipher and its predecessor DES both use the concept of subkeys – the main key is used to derive a series of subkeys using a key schedule, and each round uses a different subkey. This way, each round function is slightly different, which is enough to defeat the slide attack.

In Sarah2, this might be achieved by using the S-Box to derive some per-round modifications to use. As an illustrative (and perhaps not very strong) example, you could interpret a letter pair like “kz” to mean a Caesar shift (mapping k to z, l to a, m to b, etc.), and apply one Caesar shift per round according to a particular entry in the S-Box (e.g. the “__” entry for round 1, the “aa” entry for round 2, etc.).

The LC4 pen-and-paper cipher uses a similar secret S-Box, albeit a much smaller one (6×6). It encrypts characters one at a time (a stream cipher), and moves the entries of the S-Box around after encrypting each character. This ensures that the S-Box changes over the course of encryption rather than remaining static. A similar approach of shifting the entries in the Sarah2 S-Box after each round could help mitigate the slide attack.

Summary and Conclusions

Above, I’ve outlined two attacks on Sarah2 which reveal the entire secret key, under the assumption that the attacker can encrypt messages of their choice. While these attacks don’t break the cipher if the attacker can only passively observe messages, they should be taken as a strong indication that the Sarah2 cipher is not safe or secure. The slide attack principle underpinning these attacks means that the number of rounds is immaterial; these attacks work even if you were to do thousands of rounds, demonstrating that “confusion and diffusion” are not the only requirements for building solid cryptosystems.

The aim of this post has been to illustrate a practical slide attack on a simple, understandable cipher, and I hope it serves as a useful tool for learning more about this type of attack. The attacks aren’t hard to implement – I strongly recommend implementing them yourself if you want to understand the slide attack better! For reference, my implementations in Python can be found in this GitHub repo.

Previous Article
Next Article