Tataru was a heap challenge based on musl-libc, which gave me a hard time, since I had nearly no experience with musl-libc at all.
To be honest, even while doing this challenge, I avoided to really look into musl much, but solved this challenge in a more “ghetto way” instead by just analyzing everything in gdb, searching for good addresses and experimenting a lot.
So, let’s take a look at the different functionalities:
1.Summon a pet to fight with tataru.
For creating a pet, we first have to send 4 bytes, which will be the size for the spell to use (buffer) and then one byte, which will be the index, at which the pet will be stored.
It will then allocate a spell buffer for it and read our input into it. The length of our input will be stored in offset and the initial size in size.
And here’s already the main bug for this challenge: if we specify a size > 0x1000, it will store the size in the pet object, but then bail out of the function with the message no more magic without reallocating the object.
We can use this to create an object with a smaller size (for example 0x30) and then call this function again with a size of 0x1200, which will result in a pet, which has a 0x30 buffer allocated, but a size of 0x1200. This will come in handy later on.
2.Order pet to leave the battlefield.
This will just set the offset of the object to size. This would normally prevent us to append more data to the buffer, since the append function will check, that offset is smaller than size.
But as we’ve already seen, we will be able to manipulate size for the pet afterwards again, so we can abuse this to set offset to an arbitrary value.
3.Order pet to parpre for attacking the boss.
This option lets us append data to our pets spell buffer (at the current offset).
That check will however verify, that the current buffer offset is not bigger than libc base address, so no easy leaks or overwrites will be possible.
This also results in some quirky behaviour of the challenge, since musl will allocate some chunks between libc and stack and the challenge will terminate, if you try to write to those chunks (though they aren’t corrupted or anything), which was kinda weird in the beginning.
It’s also important to note here, that read will not segfault, if it tries to read into an unmapped memory region. We would just get a "failed" response from the challenge in that case.
4.Order pet to start attacking the boss.
This is pretty much a leaking function. It will print out the string at buffer+offset. But again, this will also check, that the address is not bigger than base address of libc, which made it quite hard to get any useful leaks except heap addresses.
5.Order pet to attack tataru.
This will just quit the application, which could be useful, if you’d manage to write a ropchain on the stack and return into it. But I didn’t find a way to leak a stack address, so didn’t use it all.
So, the main bug in this challenge is the fact, that we can change the size of an already allocated chunk without reallocating it.
With this we can simply overflow the spell buffer chunk. But combining this with option 2, which sets the offset to current size, we can even do an arbitrary “relative” oob write and read.
Allocate a chunk with size < 0x1000
Call option 1 again with size = relative offset to write to
Call option 2 to set offset to current size
Call option 1 again and set size to a bigger value
Can now write or read to it
Let’s start with a heap leak:
musl tries to optimize memory usage and will also reuse unused memory from bss.
After allocating 5 chunks (which were served from a region behind libc), the 6th chunk was allocated on bss.
By corrupting the size of the first pet, we can align it’s buffer to the heap address and leak it.
Since options 2-5 all had those pesky libc checks, we couldn’t use those to get a buffer in or behind libc.
Only option 1 doesn’t check the buffer address before writing to it, so I was determined that the only way to get a write into libc was to exploit musls allocation to serve us a buffer inside libc.
But for corrupting anything in musls arenas, we would need to be able to calculate the offset from a chunk on bss to an address on the heap, where musls metadata was stored. For this, we’d need to get a leak for bss first…
Sounds easy, but getting this leak took me the most time.
I allocated chunks in different ways for hours, trying to find a situation, in which a libc or address from the challenge might be somewhere behind it, so we could read it via the oob read, but no luck…
At some point, I got desparate and thought: Ok, screw it, with ASLR active, heap and bss address will “only” differ by 2 bytes, maybe give bruteforcing a try…
But, well yeah…. Bruteforcing two bytes will take wayyyy too long remote (every try took 10 seconds, so it might have finished in a week or so).
But thinking about bruteforcing lead me to another idea. Remember, that the read in option 3 will not segfault when reading into an unmapped region? Yep, we can totally use this, to create a “heap offset finder” :)
First we align the offset to the end of bss (0x0000562f10f4efa0 + 0x1060 = 0x562f10f50000).
Then we’ll just try to read from that offset. If it fails, we’ll add 0x1000 to the offset and try again, until the read succeeds. As soon as read doesn’t fail, we will have hit the start of heap and by this know the offset from the current buffer to heap.
The reason I added another 8 byte to the offset is, that we don’t want to write into the first qword on the heap, because there will be a canary, which musl uses to validate its arena regions. So, our first valid write will write to heap+0x8, which is totally fine.
As soon as we know the offset to the heap, we can calculate elf base with our previous heap leak.
Depending on ASLR, finding the offset can take a long time and I already feared, that this approach would be blocked by a timeout remote, but as it turned out, it worked totally fine (the dockerfile, that was released later, also showed, that the challenge had a 30 minute timeout).
So, armed with a heap and elf leak, we can now finally calculate the needed offset to overwrite stuff on the heap with our relative oob write.
Since I still didn’t want to dig deeper into musl, I went on with “dynamic analysis” >;)
I allocated some chunks (size 0x30) and checked where they got allocated. Then I restarted the exploit and paused it, before the chunk would get allocated and searched the heap for an address around the address, where the next chunk will be stored and then used the oob write to write another address to it before.
Doing this, I found, that the next arena would be created at the address stored at 0x55555555a118. So, let’s overwrite it with a pointer to bss.
After this, the next allocated chunk would overlap our ENTRIES table, so we can write arbitrary pointers into it.
I pointed slot 0 to the entry table itself, and slot 1 to read.got.
Since the first pet buffer points to the entry table itself, we can use it to overwrite the entry table again.
At this point, I prepared a “helper buffer” which can be accessed via slot 0. This will be needed later in the exploit. Will skip this for now, but we’ll get back to this.
Let’s first think about how we can achieve rip control. We don’t have a stack leak, and due to the checks in the binary, I’m not sure, if it’s even possible to get one, so we cannot put a ropchain on the stack and return to it.
I thought, musl might also be using some kind of exitfuncs, so I debugged through the code of exit and indeed it does. We can easily trigger exit by pointing a chunk to somewhere in or behind libc and access it (though it was bugging us the whole time, now it comes to our rescue).
At 0x7ffff7f666b4 it will read the address of the exitfuncs table from 0x7ffff7ffdd48.
At 0x7ffff7f666c4 it will read the count of entries from 0x7ffff7ffdf64.
If we set entry count to 1, rax will now be 0 and r12 will be read from [rdx+0x108] (which will be moved to rdi at 0x7ffff7f66707).
We can use this to set the argument for the function that will be called.
At 0x7ffff7f666fdrbp will be set to [rdx+8] and will then be called.
So to get an arbitrary call, we need to
Write the address of a fake exitfunc table to 0x7ffff7ffdd48
Write 0x1 (for rcx) to 0x7ffff7ffdf64
Write the address of the gadget we want to call into our fake exitfunc table
Write the argument (rdi) for our gadget into fake exitunc table +0x108
This will point the exitfunc table address to our previously created helper buffer and the second allocation will overwrite 0x7ffff7ffdf64 with 0x1, setting rcx to the correct value.
So, we need now to go back to the allocation of our helper buffer and prepare our fake exitfunc table there.
Let’s check exit again.
Perfect, we got rip control and even control the argument in rdi.
Could be finished already, but we won’t be able to call system("/bin/sh") or one_gadget since the challenge uses seccomp
So, we will need to do a ropchain. Tried to find a gadget for pivotting the stack into some controlled area, and musl libc just contained a wonderful gadget to do this:
Since we control rdi, we can just put an address at rdi+0x30 pointing to one of our buffers and put a simple ret gadget at rdi+x038. Calling this, will then pivot the stack into our buffer and continue execution there.
What a journey, but finally we’re ropping :)
So, all that’s left now, is to open/read/write the flag via our ropchain.
exit will now trigger the ropchain and print the flag.
Due to the offset finder, the exploit will take quite some time to execute remote, since depending on the random gap between bss and heap it will need 300-20000 requests to find the correct offset to the heap.
Since there were only challenge servers in China available, it would have taken hours to execute it, since the connection was extremely slow.
Thanks to the organizers here for setting up a challenge server in Hong Kong, after we pointed out the slow connection. Connecting to that one via an AWS instance, the exploit ran a lot smoother and finished in about 5 minutes.