Part 2: Reverse-Engineering the API
We need to understand our two target functions well enough to be able to call them. We’ll start with QDecode because it’s smaller and
will be quicker to understand.
Tools
We’ll use radare2. I don’t like it most of the time because of the wacky syntax, but it is great for one-liners for guides like these. You are encouraged to use Ghidra/IDA Pro instead if you’re more comfortable in there. Just pull the binaries from the lab with SCP.
Aside: The i386 cdecl calling convention
Since we’re targeting a 32-bit x86 binary, I’ll refresh your memory on how calls work in that architecture.
Arguments are pushed onto the stack before a call. After the prologue
(push ebp; mov ebp, esp), the stack layout at the beginning of the function is:
ebp+0x10 │ 3rd argument │
ebp+0x0c │ 2nd argument │
ebp+0x08 │ 1st argument │
ebp+0x04 │ return address │
ebp+0x00 │ saved ebp │
[ebp+0x08] = first arg, [ebp+0x0c] = second, and so on. The return value goes in eax before the function returns.
Target function: Qdecode (quoted-printable decoder)
$ r2 -a x86 -b 32 -qc 'aaa; s sym.Qdecode; pdf' $ROOT/bin/mailscanner
The function signature from r2’s analysis, sym.Qdecode (int32_t arg_8h, int32_t arg_ch), shows two arguments.
Prologue and allocation
0x080f8acc push ebp
0x080f8acd mov ebp, esp
...
0x080f8ae0 mov eax, dword [arg_ch] ; eax = arg2 (length)
0x080f8ae3 mov esi, dword [arg_8h] ; esi = arg1 (input string)
0x080f8ae6 mov dword [size], eax ; size = length
...
0x080f8af4 mov eax, dword [size]
0x080f8af7 inc eax ; size + 1
0x080f8af8 push eax
0x080f8af9 call malloc ; output = malloc(length + 1)
0x080f8b01 mov dword [var_40h], eax ; var_40h = output buffer
0x080f8b04 mov dword [var_30h], 0 ; var_30h = write_offset = 0
Qdecode(char *input, int length) returns a malloc‘d buffer, likely to be used as an output buffer.
For this buffer it allocates length + 1 bytes. var_30h tracks how many bytes have been written, so
I refer to it as write_offset.
This might be just enough information to write an initial fuzzer to test it out. First though, since we’re in the mindset of doing reversing work, let’s take a look at the next target function.
Target function: mime_content_type_new_from_string
This is a more difficult function to understand than the last one. It presumably parses Content-Type headers like:
text/html; charset=utf-8; boundary="----=_Part_123"; name="file.txt"
Function prototype
$ r2 -e scr.color=0 -a x86 -b 32 -qc 'aaa; s sym.mime_content_type_new_from_string; pdf' \
$ROOT/bin/mailscanner | head -5
; CALL XREF from fcn.0807d536 @ +0x302(x)
; CALL XREF from fcn.0807f8de @ 0x807fa0c(x)
┌ 2520: sym.mime_content_type_new_from_string (int32_t arg_60h, int32_t arg_64h, int32_t arg_68h);
│ `- args(sp[0x4..0xc]) vars(13:sp[0x10..0x5b])
│ 0x08086093 55 push ebp
The signature line shows three pointer-size arguments: arg_60h, arg_64h, arg_68h.
We can learn how these arguments are used by looking at functions that call them.
The CALL XREF comments at the top list every function that calls this one. The
+0x302(x) means the call is 0x302 bytes past the start of fcn.0807d536 (0x0807d838).
Here’s the disassembly leading up to the first call site:
$ r2 -e scr.color=0 -a x86 -b 32 -qc 'aaa; s 0x0807d818; pd 13' $ROOT/bin/mailscanner
; CODE XREF from fcn.0807d536 @ +0x2a6(x)
0x0807d818 8b542408 mov edx, dword [esp + 8]
0x0807d81c 8b7a04 mov edi, dword [edx + 4]
0x0807d81f e889e8ffff call fcn.0807c0ad
0x0807d824 29c7 sub edi, eax
0x0807d826 83ec04 sub esp, 4
0x0807d829 8d57fe lea edx, [edi - 2]
0x0807d82c 52 push edx
0x0807d82d 6a00 push 0
0x0807d82f 8b54240c mov edx, dword [esp + 0xc]
0x0807d833 8d440202 lea eax, [edx + eax + 2]
0x0807d837 50 push eax
0x0807d838 e856880000 call sym.mime_content_type_new_from_string
0x0807d83d 894618 mov dword [esi + 0x18], eax
Reading the three pushes bottom-to-top (remember cdecl pushes args in reverse order):
- 1st argument (
push eax):lea eax, [edx + eax + 2]is pointer arithmetic, computing an address into a buffer. This is probably theContent-Typestring to parse. - 2nd argument (
push 0): literal0. The other call site passes1here, so this is probably a flag or boolean field. - 3rd argument (
push edx):lea edx, [edi - 2]computes a small integer. Given that arg1 is a string, this is likely its length.
For our harness, the prototype we’ll assume is:
void *mime_content_type_new_from_string(const char *string, int flags, int length).
We’ll pass the fuzz input as arg1, 0 as arg2, and strlen(input) as arg3.
Cleanup
The target function name ends in _new_from_string, suggesting it allocates and returns
something. There’s a matching mime_content_type_destroy in the exports:
$ nm -D $ROOT/bin/mailscanner | grep mime_content_type
08085f2b T mime_content_type_copy
08085e4c T mime_content_type_destroy
08085e14 T mime_content_type_get_boundary
08085ddc T mime_content_type_get_filename
08085a47 T mime_content_type_new
08086093 T mime_content_type_new_from_string
The _new/_destroy pair tells us that the parse function allocates something, and we
need to call the destroy function after each fuzz iteration to free it. Otherwise we
leak memory and the fuzzer (and host system) will slow to a crawl.
Summary
char *Qdecode(const char *input, int len)returns an allocated string after it has decoded itvoid *mime_content_type_new_from_string(const char *s, int flags, int len)returns an allocated structure of some type, which we’ll free with…void mime_content_type_destroy(void *ct)frees the structure allocated bymime_content_type_new_from_string