By playing around with the service, it quickly becomes clear, that we can define different data chunks, which will be put together as soon as we affirm LSF.
Assembling them will create a new chunk and copy the previously defined chunks at the specified offsets into it.
The binary itself contained quite a lot of functions. Luckily enough, after getting a rough idea of how the chunks are allocated and filled, I took a break on reversing and started playing around with the heap instead. In the end, this worked out pretty well, since it was all about heap corruption.
Stripped pseudo-code:
So, when we specify a data chunk, the binary will create an input chunk and hex-encodes our input into it (AAAABBBB becomes 0xaaaabbbb). Then it creates a copy of that chunk (data chunk) and a pdu chunk, which contains the meta-information for this part of the sequence (offset, pointer to data, …). After creating the pdu chunk, the initial input chunk gets freed.
If we specified LSF with “Yes”, it will create a final chunk and copy the previously entered data into this at the corresponding offsets. Yet it misses to check, if the specified offset is inside the boundaries of this chunk.
This gives us the possibility to overwrite arbitrary values behind the chunk on the heap!
Let’s verify this:
For a quick poc, we’ll just overwrite 0x61a090, which is at offset 96 relative to our input chunk (which will serve as final chunk on assembly, since it gets freed before the final chunk is allocated and both have the same size)
So, we have an (almost) arbitrary write on the heap. Since badint allocates and frees a lot of fastbins, fastbin corruption shouldn’t be hard to achieve.
Leaking libc
The behaviour for the chunk assembly also introduces an use-after-free vulnerability. The final chunk won’t be zeroed out on creation. Thus, we can define mutliple data chunks at one offset (increasing the size of the final chunk), while letting others uninitialized (thanks to ebeip90 and nilch for pointing that out).
We can use that to obtain a libc address. Just need to create a data chunk, whose size is larger than fastbin size. When the input chunk gets freed, its FD/BK will get populated with pointers to main_arena. Again, since the input chunk gets freed before the final chunk gets allocated, the final chunk will occupy the same memory area.
With an appropriate offset, we’ll be able to leak the FD pointer.
Might be too verbose, but watching the heap should make it easier to understand, what’s happening, while the chunks get assembled.
At first, the input chunk will be created, and our input gets stored there:
After this, it will create the PDU chunk on the heap, and copy the input chunk into a new data chunk:
It will then free the input chunk and since it’s not a fastbin, it will be put into unsorted bin list and FD/BK will be populated with pointers to main_arena:
Since we chose to assemble the sequence immediately, it will now allocate the final chunk. Malloc will serve us the previous input chunk for that. The assembly function then copies our input from the data pointer to offset 8 in the final chunk, letting FD untouched.
Now that we have the base address of libc, we can start with corrupting the heap in order to get code execution.
Fastbin corruption
The idea behind fastbin corruption is to overwrite the FD pointer of a freed fastbin with an address of our choice. When this fastbin gets allocated another time, it will put our fake address into the fastbin list. This will mislead malloc into thinking, that there’s another freed fastbin at this address, which could be reused at a following allocation (as long as the fake chunk at this address fulfills some restrictions, that is).
If we succeed in tricking malloc to assume, that our fake address is a valid chunk, we can use this chunk to get an arbitrary write to that address. Used this to overwrite __malloc_hook, which will be called by malloc when allocating new chunks, thus calling our injected address.
Though, overwriting __malloc_hook is a little bit tricky, since we don’t control the data around it. Just passing the address of __malloc_hook into the fastbin list will fail, since it doesn’t have a valid size field.
But we can get around this by misaligning our chunk (uafio showed this in BabyHeap2017)
On allocation of a fastbin, malloc checks, that the size field of the target chunk matches the one of the index of the fastbin.
If we’d now just pass the address of malloc hook for our fake fastbin, it would interpret 0x7fbd777d5738 (=> 0x0) as the chunk size, which will make the malloc checks fail.
But by misaligning the address to 0x7fbd777d572d, malloc would interpret 0x7f as size, thus passing the checks and allocating this as a fastbin chunk.
Exploiting it
So much for the theory, let’s see, how this all can be applied to the challenge service.
First, we’ll need to forge an appropriate heap structure. Since we can only overwrite values behind the final chunk, we have to make sure that the fastbin, whose values we want to overwrite, is positioned behind it.
To ensure this, I first created a smaller chunk (size 0x16) and then a bigger chunk (size 0x60):
The final chunk for the second sequence was allocated at 0x61ad70 and then got freed. Thus, it was put into the fastbin list (see fastbinsY[5]). If we would now allocate another fastbin with the same size, malloc would give us this chunk again and puts its FD pointer (currently 0x0) into the fastbin list. But if we overwrite the FD pointer with the (misaligned) address of __malloc_hook before allocating a new fastbin, it will put that address into the fastbin list instead. The next allocation with a size around 0x70 will then return a chunk overlapping __malloc_hook, enabling us to overwrite it with arbitrary data.
So we just have to create a new (small) data chunk with offset 0x150 (0x61ad80 (target) - 0x61ac30 (input buffer for small chunks)) and assemble it. The assembled chunk will be created at 0x61ac30 and our data chunk will be copied to offset 0x150 from there, thus landing in 0x61ad80 (FD pointer for the chunk in fastbinsY[5])
The input buffer for the next sequence will now be created at 0x61ad70, stuffing our fake address into fastbin list. Allocating the final chunk will then serve a chunk overlapping __malloc_hook. When the binary assembles our sequence, it will copy our input to that chunk, thus overwriting the hook.
On the next allocation, malloc will execute __malloc_hook, jumping to the address we wrote there.
So, we’ve got code execution. Searched for a one_gadget, sure enough I’d be seeing a shell very soon. And so I did… locally…
While execve("/bin/sh", 0, 0) worked on my machine, popping a shell, the remote service wasn’t really happy about it:
Wuss? Luckily enough, Zach (ebeip90) pointed out, that the remote services are running busybox, which would be the reason that execve("/bin/sh", 0, 0) fails and we’d have to do an execve("/bin/sh", ["sh"], 0) instead.
So, one_gadget was no longer an option, since I didn’t have an address for an array containing “sh”. After some more failures trying to get the gadget working anyways, I finally switched to stack pivoting and preparing a rop chain on the stack.
The rop chain would basically read user input to a known address (0x604900). With that, we can setup the array for argv (pointer to “sh” / “sh”) and then call execve ("/bin/sh", 0x604900, 0).
A possible stack pivoting gadget can be found in libc:
This basically is add rsp, 0xa8, which will land in our input buffer.
Since our data gets read by fgets(&s, 768, stdin), our input will get stored onto the stack first. When malloc now tries to allocate the chunk to store our input, it will call __malloc_hook, which is now pointing to the add rsp, 0xa8 gadget. This will move the stack pointer to the start of our rop chain, thus executing it.
The rop chain will then wait for input to store at 0x604900, for which we’ll send the data to construct the argv array.
Now we should be able to call execve in a way, that hopefully executes even on the remote service: