# [Insomni’hack 2017] encryptor – reverse engineering

insomni’hack is an annual security conference in Geneva, Switzerland. Each year, they host a teaser competition prior to the conference. Here’s a writeup from one of the problems, which was to reverse engineer a cryptographic algorithm from a binary executable, worth 400 points (a hard problem). Our team was the only team out of 300+ teams to solve this challenge.

We are provided an encryption program and a ciphertext file. The encryption program takes no parameters (i.e. no external key) so we “only” have to implement a decryption program.

Opening the program in IDA, we see that it’s almost entirely AVX2 instructions, which IDA doesn’t decompile. We’ll have to attack the assembly directly. Below, a “yword” is a 256-bit (32-byte) quantity, named after the `ymmN` 256-bit AVX2 registers.

The `main` function has a fairly simple outer structure:

```load 19 constant ywords to [rbp-0x930..rbp-0x6d0] ("constants")
vzeroupper
call func1 to generate keys in [rbp-0x6d0..rbp-0x50] ("keys")

f = open("/tmp/plaintext")
while 1:
block = buf[pos:pos+0x80]
if len(block) < 0x80:
break
inblock = rearrange block with AVX2 instructions
outblock = func2(inblock, keys, constants)
buf[pos:pos+0x80] = rearrange outblock
pos += 0x80

open("/tmp/ciphertext", "wb").write(buf)
```

We can run the program in `gdb` and dump out the `constants`, `passkey`, and `keys` just before it reads the input file. Also, with a little bit of GDB sleuthing, we figure out that the block rearrangements just amount to reversing the 16-bit words in the block, i.e.

``````00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F
``````

becomes

``````1E 1F 1C 1D 1A 1B 18 19 16 17 14 15 12 13 10 11 0E 0F 0C 0D 0A 0B 08 09 06 07 04 05 02 03 00 01
``````

which is a self-inverting function. So, now all we have to do is reverse engineer `func2` (note that we didn’t even look at `func1`, because it’s only called to generate the `keys`). `func2` appears to take a single block of 0x80 = 128 bytes and encrypt it; no block chaining is used (so two consecutive identical blocks encrypt to the same identical output blocks).

`func2` is just a huge function with a normal function prologue and epilogue, with 625 AVX2 instructions in the middle – no jumps, conditionals, or even any non-AVX instructions in between. This seems like a super-daunting task to reverse – each instruction operates on 32 bytes of data, so plugging the whole mess into Z3 will probably fail (I didn’t even try).

So, what I did instead was just try to visualize how data flowed through these instructions. I wrote `analyze.py` to create a data flow graph, where nodes are the instructions (and `mov`s are elided for simplicity). The first few `analyze.py` attempts produced really bad graphs because I was leaving constants as separate inputs, even though they were used by many, many instructions (see `flow_bad.pdf`). After moving constants into the node labels, I got a much nicer graph (`flow.pdf`) which immediately revealed some high-level structure.

First, it’s obvious from the flow graph (at a high level, zoomed all the way out) that there are eight repeats of the same basic structure, suggesting an 8-round cipher (albeit one with 1024-bit block sizes). Second, one particular pattern stands out frequently: the use of a `vpmullw/vpmulhuw/vpor - vpsubusw/vpsubw - vpcmpeqw - vpsrlw/vpand/vpaddw/vpsubw` network to turn two inputs into a single output. Because every instruction in this pattern operates either word-wise or bitwise, we can analyze the network’s effect on a single pair of 16-bit inputs. As it turns out (through some lucky guessing and checking), this network computes `(x * y) % 65537`, but where 0 is swapped for 65536. It’s a very clever function, and it’s invertible if you know one of the inputs.

At this point, I started to code the cipher in the forward direction to check my understanding – see the first half of `decrypt.py`. By simply tracing the flow of data through one round and reimplementing the AVX instructions in Python, it was fairly easy to build the forward encryption algorithm (liberal amounts of debugging and tracing with `gdb` were used to check that each step was implemented correctly). After a couple hours, I got it working (producing identical output to the original program), so it was time to run the algorithm in reverse.

The round function breaks down as

```r0 = vmul65537(invecs, getvec(keys, keyoff + 0))
r1 =    vpaddw(invecs, getvec(keys, keyoff + 0x20))
r2 =    vpaddw(invecs, getvec(keys, keyoff + 0x40))
r3 = vmul65537(invecs, getvec(keys, keyoff + 0x60))
x0 = vpxor(r0, r2)
x1 = vpxor(r1, r3)
y0 = vmul65537(x0, getvec(keys, keyoff + 0x80))
y1 = vmul65537(vshufnet(vpaddw(y0, x1)), getvec(keys, keyoff + 0xa0))

o0 = vpxor(y1, r0)
o1 = vpshufb(vpxor(y1, r2), getvec(constants, 0x200))
o2 = vpshufb(vpxor(y2, r1), getvec(constants, 0x220))
o3 = vpshufb(vpxor(y2, r3), getvec(constants, 0x240))
```

where `vshufnet` is a complicated function mapping a single input to a single output involving a bunch of shuffles and XORs. The `vpshufb`s, which permute the first input’s bytes according to the second input, are all invertible thanks to the particular constants chosen.

By calculating `o0 ^ invshuf(o1)` we can recover `r0^r2 = x0`, which lets us get `y0`. Similarly, `invshuf(o2) ^ invshuf(o3)` gives `r1^r3 = x1`, which yields `y1` and then `y2` (just by running the forward calculations for `y0`, `y1`, and `y2`). With `y1` and `y2`, we can calculate `r0` through `r3` and thereby invert the round function.

Finally, we just invert all eight round functions to produce a decryptor – note that at no point do we have to invert `vshufnet`. When we’re done, we get a nice plaintext JPG, `plaintext.jpg`, with our flag:

``````INS{XuejiaLai_StudentOfJim_YOUROCK}
``````