Part 2: Reverse-Engineering the functions
We need to understand our two target functions well enough to call them. We’ll start with Qdecode because it’s smaller and
quicker to understand.
Tools
We’ll use radare2. I don’t like it most of the time because of the wacky syntax, but it is great for one-liners in guides like this one. You are encouraged to use Ghidra/IDA Pro instead if you’re more comfortable there. Just pull the binaries from the lab with SCP.
Aside: The i386 cdecl calling convention
Since we’re targeting a 32-bit x86 binary, I’ll refresh your memory on how calls work in that architecture.
Arguments are pushed onto the stack before a call. After the prologue
(push ebp; mov ebp, esp), the stack layout at the beginning of the function is:
ebp+0x10 │ 3rd argument │
ebp+0x0c │ 2nd argument │
ebp+0x08 │ 1st argument │
ebp+0x04 │ return address │
ebp+0x00 │ saved ebp │
[ebp+0x08] = first arg, [ebp+0x0c] = second, and so on. The return value goes in eax before the function returns.
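To make the layout concrete, here is a small Python model of a cdecl frame. It is only a sketch: the argument values and the saved ebp are made up for illustration, and the return address is simply the address after a 5-byte call placed at 0x0807fa0c.

```python
import struct

def build_frame(args, return_address, saved_ebp):
    """Return the bytes starting at [ebp+0]: saved ebp, return address, args."""
    words = [saved_ebp, return_address, *args]
    # i386 is little-endian with 4-byte stack slots.
    return struct.pack("<%dI" % len(words), *words)

def read_arg(frame, index):
    """Read the index-th argument (0-based), i.e. the dword at [ebp + 8 + 4*index]."""
    return struct.unpack_from("<I", frame, 8 + 4 * index)[0]

# Hypothetical values: a string pointer, a flag, and a length.
frame = build_frame([0x0804A000, 1, 10],
                    return_address=0x0807FA0C + 5,
                    saved_ebp=0xFFFFD000)

for i in range(3):
    print("[ebp+0x%02x] = 0x%08x" % (8 + 4 * i, read_arg(frame, i)))
```

The caller pushes the last argument first, which is why the first argument ends up closest to the return address.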
Target function: Qdecode (quoted-printable decoder)
r2 -a x86 -b 32 -qc 'aaa; s sym.Qdecode; pdf' $ROOT/bin/mailscanner
The function signature from r2’s analysis, sym.Qdecode (int32_t arg_8h, int32_t arg_ch), shows two arguments.
Prologue and allocation
┌ 291: sym.Qdecode (int32_t arg_8h, int32_t arg_ch);
│ `- args(sp[0x4..0x8]) vars(11:sp[0x10..0x44])
│ 0x080f8acc 55 push ebp
│ 0x080f8acd 89e5 mov ebp, esp
| ...
│ 0x080f8ae0 8b450c mov eax, dword [arg_ch]
│ 0x080f8ae3 8b7508 mov esi, dword [arg_8h]
│ 0x080f8ae6 8945cc mov dword [size], eax
| ...
│ 0x080f8af4 8b45cc mov eax, dword [size]
│ 0x080f8af7 40 inc eax
│ 0x080f8af8 50 push eax
│ 0x080f8af9 e8728af6ff call sym.imp.malloc ; malloc(size + 1)
│ 0x080f8afe 83c410 add esp, 0x10
│ 0x080f8b01 8945c0 mov dword [var_40h], eax ; store allocated buf in var_40h
│ 0x080f8b04 c745d00000.. mov dword [var_30h], 0 ; store 0 in var_30h
Setting return value, epilogue
│ │ │ ; CODE XREFS from sym.Qdecode @ 0x80f8b0e(x), 0x80f8b3d(x)
│ └─└──> 0x080f8bcc 8b45c0 mov eax, dword [var_40h] ; storing allocated buf from start in return val
│ 0x080f8bcf 8b5dcc mov ebx, dword [size]
│ 0x080f8bd2 8b55e4 mov edx, dword [var_1ch]
│ 0x080f8bd5 6533151400.. xor edx, dword gs:[0x14]
│ 0x080f8bdc c6041800 mov byte [eax + ebx], 0 ; writing null byte at output_buffer[size]
│ ┌─< 0x080f8be0 7405 je 0x80f8be7
│ │ 0x080f8be2 e819570000 call fcn.080fe300
│ │ ; CODE XREF from sym.Qdecode @ 0x80f8be0(x)
│ └─> 0x080f8be7 8d65f4 lea esp, [var_ch]
│ 0x080f8bea 5b pop ebx
│ 0x080f8beb 5e pop esi
│ 0x080f8bec 5f pop edi
│ 0x080f8bed 5d pop ebp
└ 0x080f8bee c3 ret
Qdecode(char *input, int length) allocates an output buffer of length + 1 bytes with malloc and returns it.
var_30h is set to 0 right after the allocation and tracks how many bytes have been written, so
I refer to it as write_offset.
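The buffer contract can be sketched as a Python model. This is not the binary's decode loop, which is not shown here: the =XX hex-escape handling is an assumption based on standard quoted-printable, and qdecode_model is a hypothetical name. Only the length + 1 allocation, the write_offset counter, and the terminator written at buf[length] come from the disassembly.

```python
HEX_DIGITS = b"0123456789ABCDEFabcdef"

def qdecode_model(data: bytes, length: int):
    """Model Qdecode's buffer handling; returns (buffer, write_offset)."""
    out = bytearray(length + 1)   # malloc(size + 1); zero-filled here, unlike malloc
    write_offset = 0              # var_30h
    i = 0
    while i < length:
        is_escape = (
            data[i] == ord("=")
            and i + 2 < length
            and data[i + 1] in HEX_DIGITS
            and data[i + 2] in HEX_DIGITS
        )
        if is_escape:
            # Assumed behaviour: "=C3" decodes to the single byte 0xC3.
            out[write_offset] = int(data[i + 1:i + 3], 16)
            i += 3
        else:
            out[write_offset] = data[i]
            i += 1
        write_offset += 1
    out[length] = 0               # mov byte [eax + ebx], 0 in the epilogue
    return out, write_offset

buf, written = qdecode_model(b"caf=C3=A9", 9)
print(bytes(buf[:written]), written, len(buf))
```

In this model the output can never be longer than the input, so write_offset stays within the length + 1 byte buffer. Whether the real loop keeps that invariant on malformed input is the kind of question the fuzzer can help answer.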
This might be just enough information to write an initial fuzzer to test it out. First though, since we’re in the mindset of doing reversing work, let’s take a look at the next target function.
Target function: mime_content_type_new_from_string
This is a more difficult function to understand than the last one. It presumably parses Content-Type headers like:
text/html; charset=utf-8; boundary="----=_Part_123"; name="file.txt"
Function prototype
r2 -e scr.color=0 -a x86 -b 32 -qc 'aaa; s sym.mime_content_type_new_from_string; pdf' \
$ROOT/bin/mailscanner | head -5
; CALL XREF from fcn.0807d536 @ +0x302(x)
; CALL XREF from fcn.0807f8de @ 0x807fa0c(x)
┌ 2520: sym.mime_content_type_new_from_string (int32_t arg_60h, int32_t arg_64h, int32_t arg_68h);
│ `- args(sp[0x4..0xc]) vars(13:sp[0x10..0x5b])
│ 0x08086093 55 push ebp
The signature line shows three pointer-size arguments: arg_60h, arg_64h, arg_68h.
We can learn how these arguments are used by looking at the functions that call this one.
The CALL XREF comments at the top list every place that calls it.
Here’s the disassembly leading up to the 2nd call site (with some annotation):
r2 -a x86 -b 32 -qc 'aaa; s 0x807fa0c + 5; pd -20' $ROOT/bin/mailscanner
│ 0x0807f9c1 c746280000.. mov dword [esi + 0x28], 0
│ 0x0807f9c8 c746200000.. mov dword [esi + 0x20], 0
│ 0x0807f9cf c746240000.. mov dword [esi + 0x24], 0
│ 0x0807f9d6 83ec0c sub esp, 0xc
│ 0x0807f9d9 50 push eax ; int32_t arg_20h
│ 0x0807f9da e86d640000 call sym.mime_content_type_destroy
│ 0x0807f9df c746180000.. mov dword [esi + 0x18], 0
│ 0x0807f9e6 83c410 add esp, 0x10
│ ; CODE XREF from fcn.0807f8de @ 0x807f9bf(x)
│ 0x0807f9e9 8b3c24 mov edi, dword [esp]
│ 0x0807f9ec 85ff test edi, edi
│ ┌─< 0x0807f9ee 0f84ab000000 je 0x807fa9f
│ │ 0x0807f9f4 b9ffffffff mov ecx, 0xffffffff ; -1
│ │ 0x0807f9f9 b800000000 mov eax, 0
│ │ 0x0807f9fe f2ae repne scasb al, byte es:[edi] ; inline strlen
│ │ 0x0807fa00 f7d1 not ecx
│ │ 0x0807fa02 83ec04 sub esp, 4
│ │ 0x0807fa05 51 push ecx ; int32_t from inline strlen
│ │ 0x0807fa06 6a01 push 1 ; 1 ; int32_t arg_64h flag
│ │ 0x0807fa08 ff74240c push dword [var_ch_3] ; int32_t "from_string"
│ │ 0x0807fa0c e882660000 call sym.mime_content_type_new_from_string
This is a lot easier to parse in Ghidra’s pseudocode…
Reading the three pushes bottom-to-top (remember, cdecl pushes args in reverse order):
- 1st argument (push dword [var_ch_3]): used in the inline strlen above, likely the string to parse.
- 2nd argument (push 1): the literal 1. This is probably a flag or boolean field. We will try both 1 and 0 in our harness.
- 3rd argument (push ecx): the result of the inlined strlen above. Because not ecx is not followed by a dec ecx here, the value is strlen + 1, i.e. the length of the string including its terminating null byte.
For our harness, the prototype we’ll assume is:
void *mime_content_type_new_from_string(const char *string, int flags, int length).
We’ll pass the fuzz input as arg1, 1 as arg2, and strlen(input) + 1 as arg3 to match this call site.
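It is worth checking exactly what value the inline strlen idiom leaves in ecx, since that is what gets pushed as the third argument. The following Python model steps through mov ecx, 0xffffffff / repne scasb / not ecx on a byte string. The function name is made up for illustration, and the model ignores the direction flag by assuming a forward scan.

```python
MASK32 = 0xFFFFFFFF

def repne_scasb_not_ecx(memory: bytes, edi: int = 0) -> int:
    """Model `mov ecx, -1; mov eax, 0; repne scasb; not ecx` and return ecx."""
    ecx = MASK32                    # mov ecx, 0xffffffff
    al = 0                          # mov eax, 0
    while ecx != 0:
        ecx = (ecx - 1) & MASK32    # each scasb iteration decrements ecx
        byte = memory[edi]
        edi += 1
        if byte == al:              # ZF set: repne stops on the match
            break
    return ~ecx & MASK32            # not ecx

header = b"text/html"
pushed = repne_scasb_not_ecx(header + b"\x00")
print(pushed, len(header))
```

The scan consumes the terminating null byte as well, so not ecx comes out one larger than strlen. Compilers usually emit a dec ecx afterwards to get the plain length. This call site pushes ecx without one.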
Cleanup
The target function name ends in _new_from_string, suggesting it allocates and returns
something. There’s a matching mime_content_type_destroy in the exports:
nm -D $ROOT/bin/mailscanner | grep mime_content_type
08085f2b T mime_content_type_copy
08085e4c T mime_content_type_destroy
08085e14 T mime_content_type_get_boundary
08085ddc T mime_content_type_get_filename
08085a47 T mime_content_type_new
08086093 T mime_content_type_new_from_string
The _new/_destroy pair tells us that the parse function allocates something, and we
need to call the destroy function after each fuzz iteration to free it. Otherwise we
leak memory and the fuzzer (and host system) will slow to a crawl.
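The shape of one fuzz iteration can be sketched as follows. This is a model only: parse and destroy are placeholders for however the harness ends up reaching the two functions in the binary, the stubs below are hypothetical stand-ins so the sketch runs on its own, and skipping destroy on a NULL return is an assumption, not something confirmed from the disassembly.

```python
def fuzz_one(parse, destroy, data: bytes, flag: int = 1) -> bool:
    """Run one iteration: parse the input, then release whatever came back."""
    # Cut at the first NUL and re-terminate so the buffer is a valid C string.
    c_string = data.split(b"\x00", 1)[0] + b"\x00"
    # The observed call site pushes `not ecx` without a `dec ecx`,
    # i.e. strlen + 1, which equals len(c_string) here.
    obj = parse(c_string, flag, len(c_string))
    if obj is None:               # stand-in for a NULL return
        return False
    destroy(obj)                  # pair every successful _new with a _destroy
    return True

# Hypothetical stubs that track live allocations, standing in for the target.
live = set()

def stub_parse(s, flag, length):
    if b"/" not in s:
        return None
    handle = object()
    live.add(handle)
    return handle

def stub_destroy(handle):
    live.remove(handle)

inputs = (b"text/html", b"garbage", b"a/b\x00junk")
results = [fuzz_one(stub_parse, stub_destroy, d) for d in inputs]
print(results, len(live))
```

If the set of live handles is non-empty after a run, some iteration skipped its destroy call, which is the leak described above.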
Summary
- char *Qdecode(const char *input, int len) returns an allocated string after it has decoded it
- void *mime_content_type_new_from_string(const char *s, int flags, int len) returns an allocated structure of some type, which we’ll free with…
- void mime_content_type_destroy(void *ct) frees the structure allocated by mime_content_type_new_from_string