Part 5: Optimizations: Speeding up the fuzzer
CMPLOG and COMPCOV from Part 4 help AFL++ get deeper coverage per execution. This part is about getting more executions per second in QEMU-mode. For that, we will take advantage of persistent mode.
Why persistent mode
By default, AFL++ in QEMU mode forks a fresh child for every input. Forking is cheap on Linux, but in QEMU the cost is much higher because the emulator state has to be re-initialized too.
Persistent mode keeps a single QEMU process alive and re-runs a chosen address range in a loop.
Between iterations it no longer forks or re-runs initialization code. The AFL++ docs cite a 2-5x speedup, and we’ll
see something in that range on harness_qdecode.
There are three new environment variables to set:
- AFL_QEMU_PERSISTENT_ADDR is the address where each iteration starts (typically the start of main or whatever per-input function you want to loop on).
- AFL_ENTRYPOINT is where the forkserver attaches. Place it after our dlopen/dlsym block so the initialization only runs once.
- AFL_QEMU_PERSISTENT_GPR=1 restores general-purpose registers at each iteration so things like argc/argv survive the loop. You will crash otherwise.
We also need to extend AFL_QEMU_INST_RANGES to include the harness binary, because that’s where
the persistent and entrypoint addresses live.
Finding the addresses
Disassemble main:
$ r2 -e scr.color=0 -a x86 -b 32 -qc 'aaa; s main; pdf' harness_qdecode | grep -C5 'mov byte.*, 1\|int main\|ret$'
┌ 502: int main (char **envp, int32_t argv);
│ `- args(sp[0x8..0xc]) vars(3:sp[0x1c..0x24])
| ; harness_qdecode.c:38int main(int argc, char **argv) {
│ 0x00001240 55 push ebp
│ 0x00001241 53 push ebx
│ 0x00001242 57 push edi
│ 0x00001243 56 push esi
│ 0x00001244 83ec1c sub esp, 0x1c
-- ... ... ... ...
│ │││││ 0x000012db e810feffff call sym.imp.dlsym
│ │││││ ; harness_qdecode.c:35:9 if (log_level) *log_level = 0;
│ │││││ 0x000012e0 85c0 test eax, eax
│ ┌──────< 0x000012e2 7406 je 0x12ea
│ ││││││ ; harness_qdecode.c:35:31 if (log_level) *log_level = 0;
│ ││││││ 0x000012e4 c70000000000 mov dword [eax], 0
│ ││││││ ; CODE XREF from main @ 0x12e2(x)
│ ││││││ ; harness_qdecode.c:41:35 if (!init) { load_lib(); init = 1; }
│ └──────> 0x000012ea c6834c0000.. mov byte [ebx + 0x4c], 1
│ │││││ ; CODE XREF from main @ 0x126a(x)
│ │││└──> 0x000012f1 8b442434 mov eax, dword [argv] ; harness_qdecode.c:0:35
│ │││ │ ; harness_qdecode.c:43:21 FILE *f = fopen(argv[1], "rb");
│ │││ │ 0x000012f5 8b4004 mov eax, dword [eax + 4]
│ │││ │ ; harness_qdecode.c:43:15 FILE *f = fopen(argv[1], "rb");
│ │││ │ 0x000012f8 8d8b14e0ffff lea ecx, [ebx - 0x1fec]
│ │││ │ 0x000012fe 894c2404 mov dword [format], ecx ; const char *mode
You’re looking for two offsets:
- The beginning of the function, which is where we’ll set AFL_ENTRYPOINT so that the forkserver starts past __libc_start_main and right in our harness code. The init guard makes sure load_lib() only runs once.
- The start of the per-iteration work. For this we look past the initialization/load_lib part and find where the testcase gets loaded from a file into memory. In this output that is 0x000012f1, just after the init block exits and before fopen.
Make sure to add 0x40000000 (the QEMU 32-bit base address) to each offset before exporting them. For example, if main starts at offset 0x1240 and the post-init instruction is at 0x12f1:
$ python3 -c "print(hex(0x40000000+0x12f1))"
0x400012f1
$ export AFL_QEMU_PERSISTENT_ADDR=0x400012f1
$ python3 -c "print(hex(0x40000000+0x1240))"
0x40001240
$ export AFL_ENTRYPOINT=0x40001240
Your offsets may be different. Compiler version, optimization, and even minor source changes shift them around.
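Since the offsets will change whenever the harness is rebuilt, it can help to keep the base-plus-offset arithmetic in one place. The following is a minimal sketch, not part of AFL++: the names qemu_addr, export_lines, and OFFSETS are made up for illustration, and the offsets are the ones read from the r2 output above, so substitute your own.

```python
# QEMU 32-bit base address, as stated in the text above.
QEMU_BASE_32 = 0x40000000

# Offsets taken from the r2 disassembly in this tutorial; yours will likely differ.
OFFSETS = {
    "AFL_ENTRYPOINT": 0x1240,            # start of main
    "AFL_QEMU_PERSISTENT_ADDR": 0x12F1,  # first instruction after the init block
}


def qemu_addr(offset, base=QEMU_BASE_32):
    """Return the runtime address for a file offset as a hex string."""
    return hex(base + offset)


def export_lines(offsets=OFFSETS):
    """Build one shell `export` line per variable/offset pair."""
    return [f"export {name}={qemu_addr(off)}" for name, off in offsets.items()]


# Print the lines so they can be pasted into a shell or a run script.
for line in export_lines():
    print(line)
```

With the offsets above, this prints the same two export lines shown earlier. After a rebuild, only the OFFSETS table needs updating.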
Test with afl-qemu-trace
Before launching afl-fuzz, run a single input through afl-showmap -Q, which drives afl-qemu-trace under the hood, to confirm
nothing crashes and the persistent loop survives one round:
AFL_USE_QASAN=1 \
AFL_QEMU_INST_RANGES=0x08048000-0x082da000,0x40001000-0x40002000 \
AFL_ENTRYPOINT=0x40001240 \
AFL_QEMU_PERSISTENT_ADDR=0x400012f1 \
AFL_QEMU_PERSISTENT_GPR=1 \
afl-showmap -Q -o /dev/null -- ./harness_qdecode corpus_qdecode/plain.txt
echo $?
Exit code 0 and no error output means the addresses are valid. If you see “forkserver was not
found”, the AFL_ENTRYPOINT or AFL_QEMU_PERSISTENT_ADDR is wrong.
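If you expect to repeat this check every time the harness is rebuilt, the same smoke test can be scripted. This is a rough sketch under assumptions: the helper names persistent_env, showmap_cmd, and smoke_test are hypothetical, and the environment variables and afl-showmap flags are simply the ones from the command above. It assumes afl-showmap is on your PATH.

```python
import os
import subprocess


def persistent_env(entrypoint, persistent_addr, inst_ranges):
    """Copy the current environment and add the persistent-mode settings."""
    env = dict(os.environ)
    env.update({
        "AFL_USE_QASAN": "1",
        "AFL_QEMU_INST_RANGES": inst_ranges,
        "AFL_ENTRYPOINT": hex(entrypoint),
        "AFL_QEMU_PERSISTENT_ADDR": hex(persistent_addr),
        "AFL_QEMU_PERSISTENT_GPR": "1",
    })
    return env


def showmap_cmd(harness, testcase):
    """Build the same afl-showmap invocation used in the manual test."""
    return ["afl-showmap", "-Q", "-o", "/dev/null", "--", harness, testcase]


def smoke_test(harness, testcase, env):
    """Run one input and return (exit code, stderr text) for inspection."""
    result = subprocess.run(
        showmap_cmd(harness, testcase),
        env=env,
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr
```

A call would look like smoke_test("./harness_qdecode", "corpus_qdecode/plain.txt", persistent_env(0x40001240, 0x400012f1, "0x08048000-0x082da000,0x40001000-0x40002000")). Treat a non-zero exit code as a sign that one of the addresses needs another look.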
Bake it into the run script
Once the addresses work, drop them into run_fuzz_qdecode_qasan.sh as exports. The actual
afl-fuzz invocation doesn’t change. Compare your exec/sec in the AFL++ status screen to a
run without persistent mode, and you should see a healthy increase.
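To put a number on the increase instead of eyeballing the status screen, you can compare the execs_per_sec field that afl-fuzz writes to the fuzzer_stats file in each output directory. The sketch below is illustrative only: parse_fuzzer_stats and speedup are hypothetical helper names, and the output directory names in the usage note are placeholders for wherever your two runs wrote their results.

```python
def parse_fuzzer_stats(text):
    """Parse 'key : value' lines from a fuzzer_stats file into a dict of strings."""
    stats = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as command_line may contain more.
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats


def speedup(baseline_text, persistent_text, field="execs_per_sec"):
    """Return the persistent-mode rate divided by the baseline rate."""
    baseline = float(parse_fuzzer_stats(baseline_text)[field])
    persistent = float(parse_fuzzer_stats(persistent_text)[field])
    return persistent / baseline
```

For example, with two runs written to out_baseline and out_persistent (placeholder names), read out_baseline/default/fuzzer_stats and out_persistent/default/fuzzer_stats with open(...).read() and pass both strings to speedup. Let both runs go for a comparable amount of time first, since the rate fluctuates early on.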
Optional knobs
- Use stdin instead of a file: File I/O takes up a lot of CPU cycles. Try replacing all the fopen, fclose, etc. with just an fread into a large buffer from stdin and note the performance increase you get. You’ll probably still want to malloc a buffer and memcpy from your large string buffer into that before calling the target function.
- **AFL_QEMU_PERSISTENT_CNT** determines how many iterations to reuse the same QEMU process before forking a fresh one. Default is 1000. Drop it to 100-500 if your target leaks memory or accumulates state and starts behaving oddly mid-loop. Crank it up to 10000 if the loop is perfectly clean.
- **AFL_QEMU_PERSISTENT_HOOK=/path/to/hook.so** bypasses the file read on each iteration and writes the input straight into the target’s memory. Big additional speedup, but it requires writing a small shared object. See AFL++’s [utils/qemu_persistent_hook](https://github.com/AFLplusplus/AFLplusplus/tree/stable/utils/qemu_persistent_hook) for an example and more information.
- **AFL_QEMU_PERSISTENT_RET** is an explicit end-of-loop address. Since PERSISTENT_ADDR is in the middle of a function rather than at its entry, we need to set it so QEMU knows where to end its loop. When the loop starts at the beginning of a function, QEMU mode automatically picks up the return address and ends the loop there.
Next: Part 6: Exercise