Part 2: Reverse-Engineering the Functions

We need to understand our two target functions well enough to call them. We’ll start with Qdecode because it’s smaller and quicker to understand.

Tools

We’ll use radare2. I don’t like it most of the time because of the wacky syntax, but it is great for one-liners in guides like this one. You are encouraged to use Ghidra or IDA Pro instead if you’re more comfortable there. Just pull the binaries from the lab with SCP.

Aside: The i386 cdecl calling convention

Since we’re targeting a 32-bit x86 binary, I’ll refresh your memory on how calls work in that architecture.

Arguments are pushed onto the stack before a call. After the prologue (push ebp; mov ebp, esp), the stack layout at the beginning of the function is:

ebp+0x10 │  3rd argument    │
ebp+0x0c │  2nd argument    │
ebp+0x08 │  1st argument    │
ebp+0x04 │  return address  │
ebp+0x00 │  saved ebp       │

[ebp+0x08] = first arg, [ebp+0x0c] = second, and so on. The return value goes in eax before the function returns.
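The layout above boils down to one line of arithmetic, which is also where r2’s arg_8h / arg_ch names come from. A minimal sketch, assuming every argument occupies a single 4-byte slot (true for ints and pointers, not for doubles or structs passed by value); the helper name cdecl_arg_offset is made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Offset from ebp of the n-th argument (1-based) in a 32-bit cdecl frame,
 * after the prologue has run. The first 8 bytes above ebp hold the saved
 * ebp and the return address; each argument then takes one 4-byte slot.
 * Assumption: all arguments are 4 bytes wide. */
static uint32_t cdecl_arg_offset(unsigned n) {
    assert(n >= 1);
    return 0x08u + 4u * (n - 1u);
}
```

So cdecl_arg_offset(2) is 0x0c, which matches the arg_ch name r2 gives the second argument.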

Target function: Qdecode (quoted-printable decoder)

r2 -a x86 -b 32 -qc 'aaa; s sym.Qdecode; pdf' $ROOT/bin/mailscanner

The function signature from r2’s analysis, sym.Qdecode (int32_t arg_8h, int32_t arg_ch), shows two arguments.

Prologue and allocation

┌ 291: sym.Qdecode (int32_t arg_8h, int32_t arg_ch);
│ `- args(sp[0x4..0x8]) vars(11:sp[0x10..0x44])
│           0x080f8acc      55             push ebp
│           0x080f8acd      89e5           mov ebp, esp
│           ...
│           0x080f8ae0      8b450c         mov eax, dword [arg_ch]
│           0x080f8ae3      8b7508         mov esi, dword [arg_8h]
│           0x080f8ae6      8945cc         mov dword [size], eax
│           ...
│           0x080f8af4      8b45cc         mov eax, dword [size]
│           0x080f8af7      40             inc eax
│           0x080f8af8      50             push eax
│           0x080f8af9      e8728af6ff     call sym.imp.malloc         ;  malloc(size + 1)
│           0x080f8afe      83c410         add esp, 0x10
│           0x080f8b01      8945c0         mov dword [var_40h], eax    ;  store allocated buf in var_40h
│           0x080f8b04      c745d00000..   mov dword [var_30h], 0      ;  store 0 in var_30h

Setting return value, epilogue

│    │ │    ; CODE XREFS from sym.Qdecode @ 0x80f8b0e(x), 0x80f8b3d(x)
│    └─└──> 0x080f8bcc      8b45c0         mov eax, dword [var_40h]    ; load the buf allocated at the start into eax as the return value
│           0x080f8bcf      8b5dcc         mov ebx, dword [size]
│           0x080f8bd2      8b55e4         mov edx, dword [var_1ch]
│           0x080f8bd5      6533151400..   xor edx, dword gs:[0x14]
│           0x080f8bdc      c6041800       mov byte [eax + ebx], 0     ; writing null byte at output_buffer[size]
│       ┌─< 0x080f8be0      7405           je 0x80f8be7
│       │   0x080f8be2      e819570000     call fcn.080fe300
│       │   ; CODE XREF from sym.Qdecode @ 0x80f8be0(x)
│       └─> 0x080f8be7      8d65f4         lea esp, [var_ch]
│           0x080f8bea      5b             pop ebx
│           0x080f8beb      5e             pop esi
│           0x080f8bec      5f             pop edi
│           0x080f8bed      5d             pop ebp
└           0x080f8bee      c3             ret

Qdecode(char *input, int length) allocates length + 1 bytes with malloc, uses that buffer for its output, and returns it. var_30h tracks how many bytes have been written, so I refer to it as write_offset.
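To make that concrete, here is a rough sketch of how a caller might drive a function with this shape. The qdecode_fn typedef is the inferred prototype, not a confirmed header. qdecode_stub is a hypothetical stand-in so the sketch runs without the target binary: it mirrors only the allocation and NUL-termination seen in the listing, and since the real decode loop is elided there, it just copies bytes. Freeing the result with free() is an assumption based on the malloc call in the prologue.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Prototype as inferred from the disassembly (an assumption). */
typedef char *(*qdecode_fn)(const char *input, int length);

/* Hypothetical stand-in for the real function. Mirrors only what the
 * listing shows: malloc(length + 1), a running write_offset, and a NUL
 * byte written at out[length]. The copy loop is a placeholder. */
static char *qdecode_stub(const char *input, int length) {
    char *out = malloc((size_t)length + 1);
    if (out == NULL) return NULL;
    int write_offset = 0;
    while (write_offset < length) {
        out[write_offset] = input[write_offset];
        write_offset++;
    }
    out[length] = '\0';
    return out;
}

/* One iteration: hand the bytes to the decoder, then release the result. */
static int call_qdecode_once(qdecode_fn decode, const char *data, size_t size) {
    assert(size <= INT_MAX);      /* length travels as a 32-bit int */
    char *out = decode(data, (int)size);
    if (out == NULL) return -1;
    free(out);                    /* assumes the caller owns the buffer */
    return 0;
}
```

Taking the decoder as a function pointer keeps the calling logic separate from how the real symbol eventually gets resolved.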

This might be just enough information to write an initial fuzzer to test it out. First though, since we’re in the mindset of doing reversing work, let’s take a look at the next target function.

Target function: mime_content_type_new_from_string

This is a more difficult function to understand than the last one. It presumably parses Content-Type headers like:

text/html; charset=utf-8; boundary="----=_Part_123"; name="file.txt"

Function prototype

r2 -e scr.color=0 -a x86 -b 32 -qc 'aaa; s sym.mime_content_type_new_from_string; pdf' \
    $ROOT/bin/mailscanner | head -5
            ; CALL XREF from fcn.0807d536 @ +0x302(x)
            ; CALL XREF from fcn.0807f8de @ 0x807fa0c(x)
┌ 2520: sym.mime_content_type_new_from_string (int32_t arg_60h, int32_t arg_64h, int32_t arg_68h);
│ `- args(sp[0x4..0xc]) vars(13:sp[0x10..0x5b])
│           0x08086093      55             push ebp

The signature line shows three pointer-sized arguments: arg_60h, arg_64h, arg_68h. We can learn how these arguments are used by looking at the functions that call this one. The CALL XREF comments at the top list every place that calls it.

Here’s the disassembly leading up to the 2nd call site (with some annotation):

r2 -a x86 -b 32 -qc 'aaa; s 0x807fa0c + 5; pd -20' $ROOT/bin/mailscanner
│           0x0807f9c1      c746280000..   mov dword [esi + 0x28], 0
│           0x0807f9c8      c746200000..   mov dword [esi + 0x20], 0
│           0x0807f9cf      c746240000..   mov dword [esi + 0x24], 0
│           0x0807f9d6      83ec0c         sub esp, 0xc
│           0x0807f9d9      50             push eax                    ; int32_t arg_20h
│           0x0807f9da      e86d640000     call sym.mime_content_type_destroy
│           0x0807f9df      c746180000..   mov dword [esi + 0x18], 0
│           0x0807f9e6      83c410         add esp, 0x10
│           ; CODE XREF from fcn.0807f8de @ 0x807f9bf(x)
│           0x0807f9e9      8b3c24         mov edi, dword [esp]
│           0x0807f9ec      85ff           test edi, edi
│       ┌─< 0x0807f9ee      0f84ab000000   je 0x807fa9f
│       │   0x0807f9f4      b9ffffffff     mov ecx, 0xffffffff         ; -1
│       │   0x0807f9f9      b800000000     mov eax, 0
│       │   0x0807f9fe      f2ae           repne scasb al, byte es:[edi] ; inline strlen
│       │   0x0807fa00      f7d1           not ecx
│       │   0x0807fa02      83ec04         sub esp, 4
│       │   0x0807fa05      51             push ecx                    ; int32_t from inline strlen
│       │   0x0807fa06      6a01           push 1                      ; 1 ; int32_t arg_64h flag
│       │   0x0807fa08      ff74240c       push dword [var_ch_3]       ; int32_t "from_string"
│       │   0x0807fa0c      e882660000     call sym.mime_content_type_new_from_string

This is a lot easier to parse in Ghidra’s pseudocode…

Reading the three pushes bottom-to-top (remember cdecl pushes args in reverse order):

  • 1st argument (push dword [var_ch_3]): used in strlen above, likely the string to parse
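  • 2nd argument (push 1): literal 1. This is probably a flag or boolean field. We will try both 1 and 0 in our harness.
  • 3rd argument (push ecx): result of the inlined strlen above. Note there is no dec ecx after the not ecx, so the value pushed is strlen + 1: the length of the string including its NUL terminator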

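For our harness, the prototype we’ll assume is: void *mime_content_type_new_from_string(const char *string, int flags, int length). We’ll pass the fuzz input as arg1, 1 as arg2, and strlen(input) + 1 as arg3, matching this call site (a not ecx with no following dec ecx yields the length including the NUL terminator).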

Cleanup

The target function name ends in _new_from_string, suggesting it allocates and returns something. There’s a matching mime_content_type_destroy in the exports:

nm -D $ROOT/bin/mailscanner | grep mime_content_type
08085f2b T mime_content_type_copy
08085e4c T mime_content_type_destroy
08085e14 T mime_content_type_get_boundary
08085ddc T mime_content_type_get_filename
08085a47 T mime_content_type_new
08086093 T mime_content_type_new_from_string

The _new/_destroy pair tells us that the parse function allocates something, and we need to call the destroy function after each fuzz iteration to free it. Otherwise we leak memory and the fuzzer (and host system) will slow to a crawl.
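The pairing can be sketched as a single iteration that always destroys whatever the parser returned. Both typedefs are inferred prototypes, not confirmed headers. The two stubs and the live_objects counter are hypothetical and exist only so the sketch runs without the target binary. Whether the real destroy function tolerates NULL is unknown, so the sketch never passes it one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Inferred prototypes (assumptions). */
typedef void *(*ct_new_fn)(const char *string, int flags, int length);
typedef void (*ct_destroy_fn)(void *ct);

static int live_objects = 0;   /* stub-only bookkeeping to spot leaks */

/* Hypothetical stand-in for mime_content_type_new_from_string. */
static void *ct_new_stub(const char *string, int flags, int length) {
    (void)string;
    (void)flags;
    void *ct = malloc(length > 0 ? (size_t)length : 1);
    if (ct != NULL) live_objects++;
    return ct;
}

/* Hypothetical stand-in for mime_content_type_destroy. */
static void ct_destroy_stub(void *ct) {
    free(ct);
    live_objects--;
}

/* One iteration: parse, then destroy the result so nothing accumulates. */
static int parse_once(ct_new_fn make, ct_destroy_fn destroy,
                      const char *header, int length) {
    assert(header != NULL);
    void *ct = make(header, 1, length);
    if (ct == NULL) return -1;
    destroy(ct);
    return 0;
}
```

With the stubs, live_objects is back at zero after every call to parse_once, which is the property we want from the real pair across fuzz iterations.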

Summary

  • char *Qdecode(const char *input, int len) returns a malloc’d buffer containing the decoded string
  • void *mime_content_type_new_from_string(const char *s, int flags, int len) returns an allocated structure of some type, which we’ll free with…
  • void mime_content_type_destroy(void *ct) frees the structure allocated by mime_content_type_new_from_string

Next: Part 3: Writing the Fuzzing Harness