regfuck was a brainfuck-like “mini vm”, which allowed moving a cell pointer, increasing, decreasing values and some basic branching.
Like most bf challenges it had an OOB vulnerability, that would allow us to overwrite got entries and thus gaining a shell.
The first challenge was to understand, how our code would be parsed and what we could do with it.
So, it reads 4 bytes for the data_size (used for storing cell values), 4 bytes for program_size (size of our code in bytes) and creates a memory region for our vm context (data + program).
Here’s the first caveat: While starting exploits, I’ll often disable ASLR to make debugging easier, not having to fiddle around with changing memory addresses. This is a very bad idea to do here ;-)
If ASLR is disabled, the heap will be allocated at 0x405000 and thus mmap will allocate a new region (for example at 0x7ffff7dc8000), which will make it impossible to exploit it (or at least way harder than needed).
With ASLR enabled, heap will be allocated at some random address, allowing mmap to create the region exactly at 0x405000, which is perfectly aligned to bss.
This will become important later on, but let’s first continue to analyze how to provide executable code to the vm.
The timer checks are a bit nasty, when debugging this, since it will break execution when doing break & continue (except you’re fast enough to debug everything in 2 seconds :-P). I just patched the binary to ignore the condition making debugging a lot easier.
Apart from this, this will just initialiaze an index variable (let’s call it vm instruction pointer), calling repeatedly process_code (passing vm_ip also as a reference, so the code could update it when branching).
So, our instruction pointer is basically a bit offset in our code, and process_code will check if the bit at the current ip is set to 1 or 0. When the bit is set, it will increase OP_Code, and if the bit is not set, it will use the current value in OP_Code to find the offset in OP_CODE_TABLE, execute the corresponding opcode and reset OP_Code back to 0 (start reading the next opcode).
There will be 9 instructions available. Thus to execute different opcodes, we have to create a “bit stream”, with n following bits set to 1 and a stop 0 bit for executing OPCODE(n).
Basic starter script:
With this, we can start writing vm code and execute the different available opcodes.
Won’t be getting too much into detail of reversing the opcodes itself, but as a short explanation:
1 : Increase current cell index (0x40123b)
2 : Decrease current cell index (0x40126f)
3 : Increase value at current cell (0x40129e)
4 : Decrease value at current cell (0x4012c8)
5 : Jump to IP stored in accumulator if current value > 0 (0x4012f2)
6 : Move value at current cell into accumulator (0x401335)
7 : putchar (current cell) (0x401364)
8 : Store current IP and Index array in current cell (0x40138d)
9 : Restore current cell index from value in accumulator (0x4013c9)
Calling function 8, will store the current instruction pointer in the upper 2 bytes of the current cell and the current cell index in the lower 2 bytes.
Interesting parts
There is a basic boundary check, when decreasing the current cell index, as it will validate, that the current cell index does not equal 0xffff (-1).
If cell index is -1, program execution will be aborted, so we cannot just decrease our way into bss.
But the boundaries won’t be checked in the “set array index function”. Thus we can move outside of the boundaries of our vm context, by first decreasing a cell value to a negative value, storing it in the accumulator, and then calling “set array index”, which will directly set the array index to the negative value (and as long as it doesn’t equals -1, we’re good to go).
Armed with this, I wrote a script to “walk” into got, and “decreased” putchar.got to point to a one_gadget. In CTF ghetto style, I only did as much as needed to write exactly that value to the got entry, which on the one hand worked and resulted in getting a local shell…
But well… failed miserably at the remote service, most likely due to a different stack layout.
Since my regfuck code at that point was even harder to read, than it was to write, every change in it just broke everything. While messing around with the script, it got quite late and I needed some sleep, so I called it a day…
Next morning, I just dropped everything and started from scratch again (though ending up in the end with the same script as the day before, but at least understanding again, what I was doing ;)).
While it seemed, that we could write an almost arbitrary size of vm code, only walking to the got entry and decreasing it 23894482 times (totally random value) seemed to break code execution, so I had to come up with a proper code including some counter variables.
Plan:
Create a negative value in a cell (so it would point below got table)
Move it to the accumulator
Set the array index to accumulator (which now holds our forged negative index in the lower 2 bytes and thus moved the array index backwards)
Move one cell up and write a multiplicator value there
Move back and store the current instruction pointer (and current cell value) to this cell
Reset index array (it will still point to the same cell at this point, but this is needed for making the loop work)
Move cell pointer to our destination address (putchar.got)
Increase/Decrease it by a factor value
Move back to our multiplicator value and decrease it by 1
Do a “jump if not zero” (on multiplicator) back to the stored instruction pointer (6) from the accumulator (The cell index is now pointing to the multiplicator cell and not the instruction pointer cell, as it was, when the loop was executed the first time. That’s the reason we’ll reset the array index in 6, so the state is the same for every loop)
When we finished increasing/decreasing the value m * f times, we once again walk back to the destination address and increase/decrease it by the remaining value
Go back into initial state (for starting more writes, which weren’t needed in the end)
Since the regfuck code might not be too easy to understand at this point, some more details, what’s happening in memory:
1 - Creating fake index in data context
2 - Moving fake index to accumulator
3 - Set the array index to accumulator
4 - Move one cell up and write a “multiplicator” value there
5 - Move back and store the current instruction pointer (and current cell value) to this cell
6 - Reset index array
7 - Move cell pointer to our destination address (putchar.got)
8 - Increase/Decrease it by a “factor” value
9 - Move back to our multiplicator value and decrease it by 1
10 - Do a “jump if not zero” (multiplicator value) back to the stored instruction pointer
After some repetitions
Until multiplicator hits zero and we’ll jump out of the loop
11 - Walk back to the destination address and increase/decrease it by the “remaining” value
So, at this point, putchar is now pointing to system. A /bin/sh string and a pointer to it at a known address, is all we need now to do a proper system("/bin/sh"). At first, I thought about also increasing values to forge the pointer and the address, but…
We can just append them to the end of our program code, which makes this last step much easier ;)
This way, we know that we’ll have a pointer to the string at 0x405c88 (and /bin/sh at 0x405c90). So we just need to get the current cell index point to the cell 0x405c88 - 0x405008 and trigger putchar.
But, we cannot just increase the cell index multiple times (because every increase would increase the program size and move our values further down, by which we’d end up in an endless catchup game).
In the end, I just did the same as for increasing the got entry multiple times and wrote another short vm code, which will create the needed array index, pointing to the /bin/sh pointer and set it via set array index.
After this, putchar is triggered
rdi holding 0x405c90 pointing to our /bin/sh string and now happily triggering system
Fortunately, this didn’t have the problems anymore like the one_gadget approach and also resulted in a shell when running remote :)