This is just another write-up for the last challenge of the flareon 5 CTF I joined last year. In the core of the challenge are two nested VMs as many people wrote about it. The purpose of this blog post is to share my blackbox-style solution using an SMT solver. I don’t think using an SMT was a good decision, but I hope someone is interested.

I’d like to give a special thanks to (my friends) KT (@koczkatamas) and Alex (@alex_k_polyakov) for discussing the challenge.

Non-SMT Solution

My original strategy for the challenge was applying a taint analysis and an SMT solver to avoid reading the shitty VM code. But I had to give up that strategy after spending more than a week on it because of the time constraint and I switched to reading the first VM code. Eventually I found a way of bruteforcing the flag on the second VM emulator, without knowing the exact form of the constraints on the flag.

After the competition, I went back to the original strategy with an SMT solver and got the flag (again). I’m going to talk about the details of it.

Magic Words

The most important information that we get from an analysis on the distribution of words inside the VM code binary is a sequence of suspicious words that starts at the offset 0x24de. These words turn out to be an encoded form of the flag.

alt

Emulation

An emulator for the first VM (Subleq) can be so simple. The first version of my emulator was also simple and boring. By logging accesses to the magic words on the emulator with various inputs, we can find that only the first 15 words are actually used. This is indicative of the fact that the total length of the flag is 30 characters (including the trailing ‘@flare-on.com’ part).

Symbolization

Then I added a feature of taint analysis and symbolization to my poor emulator with Z3Py. I replaced each character of the password with bit vectors x_i (i = 0, 1, …) and each magic words with bit vectors y_j (j = 0, 1, …). The emulator tries to trace tainted values and express those with x_i and y_j.

In general when two values a and b are compared to see if they are equal, a - b (and also b - a) is stored into the second operand of a subleq instruction. If each magic word y_j is expressed with x_i, x_i and y_j should appear at the same time in an expression stored in the second operand. In other words, x_i and y_j are expected to appear in the same conditional jump.

Let’s see an example. If I provide ‘AAAAAAAAAAAAAAAAA@flare-on.com’ as a password and log the values of the second operands with the emulator, I get the following expression in Z3Py:

alt

The output means, y_00 (= 0xfc7f) is being compared with 56481 + Σx_i. What is 56481? With some experiments, we can conclude that the number depends only on the pair of characters x_0 and x_1. Formally expressed, we get the following assumption:

Making the same analysis on y_j (j > 0), we get a more general expression (or constraint):

The question is: what does f_j exactly look like? I guessed it with an SMT solver.

Guess What

The idea is so simple. I collected I/O pairs using a sampler and chose a candidate expression from the set of expressions generated by a generator, then solved the I/O constraints with an SMT solver.

The sampler is based on the VM emulator. It runs the emulator with a random password as an input and extract a number from a Z3 expression used to check the password, i.e. 56481 shown above. The sampling is so slow.

# python sampler.py 0 5
(101, 108) => 62021
(99, 98) => 60739
(78, 48) => 54318
(113, 48) => 54353
(74, 57) => 55466
{(74, 57): 55466, (101, 108): 62021, (99, 98): 60739, (113, 48): 54353, (78, 48): 54318}
#

The generator produces a series of expressions in the variables given based on the two arguments of the Generator.generate method. The first argument n controls the order of expressions generated, while the second argument m controls the number of constants involved.

My guesser program uses the generator and takes I/O pairs from the sampler.

# python guesser.py
f[x, y] = c0 + c0 + c58 + c1 + x + c1 + c2 + c59 + c3*y where [c0 = 11968,
 c3 = 128,
 c1 = 25632,
 c59 = 61440,
 c2 = 57024,
 c58 = 51040]
f[x, y] = 48096 + x + 128*y
#

Relations

In summary, I got the following results from the guesser:

f_0 = 48096 + x + 128*y
f_1 = 48128 + (33 ^ (x + 65504)) + 128*y
f_2 = 49024 + (64578 ^ (x + 96)) + 128*y
f_3 = 23328 + (24931 ^ x) + 128*y
f_4 = 31584 + (32772 ^ x) + 65408*(15998 ^ y)
f_5 = 21504 + (59557 ^ (96 + x)) + 128*(16129 ^ y)
f_6 = 58816 + (5062 ^ (1440 + x)) + 65408*(15998 ^ y)
f_7 = 19232 + (28903 ^ x) + 65408*(16382 ^ y)
f_8 = 46304 + (10248 ^ x)

Here f_8 doesn’t contain y as I assumed the 18th character of the flag was ‘@’.

Also note that the guesser can make mistakes especially when you don’t provide it with enough I/O samples. In my case, I collected 30+ samples for each function. I think it’s more than enough as at most 5 constants are involved in a function.

Solver

Based on the results above, I wrote the following code, which unveils the flag: Av0cad0_Love_2018@flare-on.com.

#!/usr/bin/python

import z3

bits = 16

x = [z3.BitVec('x_%02x' % i, bits) for i in xrange(0x11)]
y = [0xfc7f, 0xf30f, 0xf361, 0xf151, 0xf886, 0xf3d1, 0xdb57, 0xd9d5, 0xe26e]
f = [lambda x,y:48096 + x + 128*y,
     lambda x,y:48128 + (33 ^ (x + 65504)) + 128*y,
     lambda x,y:49024 + (64578 ^ (x + 96)) + 128*y,
     lambda x,y:23328 + (24931 ^ x) + 128*y,
     lambda x,y:31584 + (32772 ^ x) + 65408*(15998 ^ y),
     lambda x,y:21504 + (59557 ^ (96 + x)) + 128*(16129 ^ y),
     lambda x,y:58816 + (5062 ^ (1440 + x)) + 65408*(15998 ^ y),
     lambda x,y:19232 + (28903 ^ x) + 65408*(16382 ^ y),
     lambda x,y:46304 + (10248 ^ x)]

s = z3.Solver()
for bv in x:
    s.add(bv < 0x80)
    s.add(bv > 0x20)
    s.add(bv != 64)

_x = x + [z3.BitVecVal(64, bits)]

for j in xrange(9):
    s.add(f[j](_x[2*j], _x[2*j+1]) + z3.Sum(x) == y[j])

if s.check() == z3.sat:
    m = s.model()
    print ''.join(map(lambda x:chr(m[x].as_long()), x)) + '@flare-on.com'