Part 5: Optimizations: Speeding up the fuzzer
CMPLOG and COMPCOV from Part 4 help AFL++ get deeper coverage per execution. This part is about getting more executions per second in QEMU mode, and for that we'll use persistent mode.
Why persistent mode
By default, AFL++ in QEMU mode forks a fresh child for every input. Forking is cheap on Linux, but in QEMU the cost is much higher because the emulator state has to be re-initialized too.
Persistent mode keeps a single QEMU process alive and re-runs a chosen address range in a loop. Between iterations it no longer forks or reruns initialization code. The AFL++ docs cite a 2-5x improvement, and we'll see something in that range on harness_qdecode.
There are three new environment variables to set:
- AFL_QEMU_PERSISTENT_ADDR is the address where each iteration starts (typically the start of main, or whatever per-input function you want to loop on).
- AFL_ENTRYPOINT is where the forkserver attaches; everything before it runs only once. Ideally it sits after our dlopen/dlsym block so the initialization only runs once. If it goes earlier, as it does below, the init guard around that block keeps it from re-running on every iteration.
- AFL_QEMU_PERSISTENT_GPR=1 restores general-purpose registers at each iteration so things like argc/argv survive the loop. You will crash otherwise.
We also need to extend AFL_QEMU_INST_RANGES to include the harness binary, because that’s where
the persistent and entrypoint addresses live.
Finding the addresses
Disassemble main:
r2 -e scr.color=0 -e asm.bytes=0 -a x86 -b 32 -qc 'aaa; s main; pdf' harness_qdecode | grep -C5 'mov byte.*, 1\|int main\|ret$'
┌ 502: int main (char **envp, int32_t argv);
│ `- args(sp[0x8..0xc]) vars(3:sp[0x1c..0x24])
│ 0x00001240 push ebp ; <--- beginning of function
│ 0x00001241 push ebx
│ 0x00001242 push edi
│ 0x00001243 push esi
│ 0x00001244 sub esp, 0x1c
│ 0x00001247 call 0x124c
│ ; CALL XREF from main @ 0x1247(x)
│ 0x0000124c pop ebx
│ 0x0000124d add ebx, 0x2da8
│ 0x00001253 mov esi, 1
│ 0x00001258 cmp dword [envp], 2 ; if (argc < 2) return 1;
--
│ │││││ 0x000012db call sym.imp.dlsym
│ │││││ 0x000012e0 test eax, eax ; if (log_level) *log_level = 0;
│ ┌──────< 0x000012e2 je 0x12ea
│ ││││││ 0x000012e4 mov dword [eax], 0 ; if (log_level) *log_level = 0;
│ ││││││ ; CODE XREF from main @ 0x12e2(x)
│ └──────> 0x000012ea mov byte [ebx + 0x4c], 1 ; if (!init) { load_lib(); init = 1; }
│ │││││ ; CODE XREF from main @ 0x126a(x)
│ │││└──> 0x000012f1 mov eax, dword [argv] ; <--- past load_lib
│ │││ │ 0x000012f5 mov eax, dword [eax + 4] ; FILE *f = fopen(argv[1], "rb");
│ │││ │ 0x000012f8 lea ecx, [ebx - 0x1fec] ; FILE *f = fopen(argv[1], "rb");
│ │││ │ 0x000012fe mov dword [format], ecx ; const char *mode
--
│ │││ 0x000013cd add esp, 0x1c
│ │││ 0x000013d0 pop esi
│ │││ 0x000013d1 pop edi
│ │││ 0x000013d2 pop ebx
│ │││ 0x000013d3 pop ebp
│ │││ 0x000013d4 ret ; <--- return
│ │││ ; CODE XREF from main @ 0x1288(x)
│ ││└───> 0x000013d5 mov eax, dword [ebx - 0x14] ; fprintf(stderr, "dlopen stubs: %s\n", dlerror()); _exit(1);
│ ││ 0x000013db mov esi, dword [eax]
│ ││ 0x000013dd call sym.imp.dlerror ; fprintf(stderr, "dlopen stubs: %s\n", dlerror()); _exit(1);
│ ││ 0x000013e2 mov dword [whence], eax ; fprintf(stderr, "dlopen stubs: %s\n", dlerror()); _exit(1);
For our speed optimizations we need two addresses:
- The beginning of the function. This is where we'll set AFL_ENTRYPOINT so that the forkserver starts past __libc_start_main, right in our harness code. The init guard makes sure load_lib() only runs once per process. We'll also set AFL_QEMU_PERSISTENT_ADDR here for now, since that's safer than re-using a possibly stale argv if we picked a spot later in the function.
- The return instruction, which marks the end of each iteration (AFL_QEMU_PERSISTENT_RET). In Qdecode that is 0x000013d4.
Make sure to add 0x40000000 (the QEMU 32-bit base address) to each offset before exporting it. For example, with main starting at offset 0x1240 and its ret at 0x13d4:
python3 -c "print(hex(0x40000000+0x1240))"
0x40001240
python3 -c "print(hex(0x40000000+0x13d4))"
0x400013d4
export AFL_ENTRYPOINT=0x40001240
export AFL_QEMU_PERSISTENT_ADDR=0x40001240
export AFL_QEMU_PERSISTENT_RET=0x400013d4
Your offsets may be different. Compiler version, optimization, and even minor source changes shift them around.
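If you'd rather not redo this arithmetic by hand each time the offsets shift, here is a small Python sketch of the same steps. It assumes the 0x40000000 base from above and 4 KiB pages; the page rounding is an assumption of this sketch, used to derive the harness entry for AFL_QEMU_INST_RANGES. Swap in your own offsets from r2.

```python
from typing import Tuple

QEMU_BASE_32 = 0x40000000  # QEMU load base for 32-bit targets, per the text above
PAGE_SIZE = 0x1000         # assumption: 4 KiB pages


def rebase(offset: int, base: int = QEMU_BASE_32) -> int:
    """Turn an r2 offset from the harness into a QEMU guest address."""
    return base + offset


def page_range(lo: int, hi: int, page: int = PAGE_SIZE) -> Tuple[int, int]:
    """Round lo down and hi up to page boundaries (end is one past hi's page)."""
    start = lo - (lo % page)
    end = (hi // page + 1) * page
    return start, end


if __name__ == "__main__":
    main_start = rebase(0x1240)  # first instruction of main (push ebp)
    main_ret = rebase(0x13d4)    # main's ret
    lo, hi = page_range(main_start, main_ret)
    print(f"export AFL_ENTRYPOINT={main_start:#x}")
    print(f"export AFL_QEMU_PERSISTENT_ADDR={main_start:#x}")
    print(f"export AFL_QEMU_PERSISTENT_RET={main_ret:#x}")
    print(f"harness range for AFL_QEMU_INST_RANGES: {lo:#x}-{hi:#x}")
```

With the offsets from this part it prints the same three exports as above, and the page rounding yields 0x40001000-0x40002000, the harness range used in the commands that follow.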
Test with afl-qemu-trace
Before launching afl-fuzz, run a single input through afl-showmap in QEMU mode (-Q, which drives afl-qemu-trace under the hood) to confirm nothing crashes and the persistent loop survives one round:
AFL_USE_QASAN=1 \
AFL_QEMU_INST_RANGES=0x08048000-0x082da000,0x40001000-0x40002000 \
AFL_ENTRYPOINT=0x40001240 \
AFL_QEMU_PERSISTENT_ADDR=0x40001240 \
AFL_QEMU_PERSISTENT_RET=0x400013d4 \
AFL_QEMU_PERSISTENT_GPR=1 \
afl-showmap -Q -o /dev/null -- ./harness_qdecode corpus_qdecode/plain.txt
echo $?
afl-showmap++4.41a by Michal Zalewski
[*] Executing './harness_qdecode'...
-- Program output begins --
-- Program output ends --
+++ Program timed off +++
[+] Hash of coverage map: 917471071e6c3e30
[+] Captured 67 tuples (map size 65536, highest value 7, total values 280) in '/dev/null'.
0
Exit code 0 and no error output means the addresses are valid. If you see “forkserver was not found”, AFL_ENTRYPOINT or AFL_QEMU_PERSISTENT_ADDR is wrong.
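The same single-input check can be scripted, which is handy once you start tweaking addresses. The Python sketch below is an illustration, not part of AFL++: persistent_env and showmap_cmd are hypothetical helper names, and it assumes afl-showmap is on your PATH and the harness and corpus paths from this part exist. It only launches the check if afl-showmap is found.

```python
import os
import shutil
import subprocess
from typing import Dict, List, Optional, Tuple


def persistent_env(entry: int, persist_addr: int,
                   inst_ranges: List[Tuple[int, int]],
                   ret_addr: Optional[int] = None,
                   gpr: bool = True) -> Dict[str, str]:
    """Collect the QEMU persistent-mode variables used in this part."""
    env = {
        "AFL_ENTRYPOINT": f"{entry:#x}",                   # forkserver attach point
        "AFL_QEMU_PERSISTENT_ADDR": f"{persist_addr:#x}",  # start of each iteration
        # Comma-separated start-end pairs; must cover the harness binary too.
        "AFL_QEMU_INST_RANGES": ",".join(
            f"{lo:#010x}-{hi:#010x}" for lo, hi in inst_ranges),
    }
    if gpr:
        env["AFL_QEMU_PERSISTENT_GPR"] = "1"  # restore GPRs each iteration
    if ret_addr is not None:
        env["AFL_QEMU_PERSISTENT_RET"] = f"{ret_addr:#x}"  # explicit loop end
    return env


def showmap_cmd(harness: str, input_path: str) -> List[str]:
    """afl-showmap in QEMU mode on one input, discarding the map."""
    return ["afl-showmap", "-Q", "-o", "/dev/null", "--", harness, input_path]


if __name__ == "__main__":
    env = persistent_env(
        entry=0x40001240,
        persist_addr=0x40001240,
        inst_ranges=[(0x08048000, 0x082da000), (0x40001000, 0x40002000)],
        ret_addr=0x400013d4,
    )
    if shutil.which("afl-showmap") is None:
        print("afl-showmap not on PATH; skipping the check")
    else:
        # QASan on, as in the shell command; inherit the rest of the environment.
        full_env = {**os.environ, "AFL_USE_QASAN": "1", **env}
        result = subprocess.run(
            showmap_cmd("./harness_qdecode", "corpus_qdecode/plain.txt"),
            env=full_env)
        print("exit code:", result.returncode)
```

Keeping the variables in one dict makes it easy to reuse them for the corpus-wide check and for afl-fuzz itself without retyping the addresses.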
After that, we check against all of our corpus files:
AFL_USE_QASAN=1 \
AFL_QEMU_INST_RANGES=0x08048000-0x082da000,0x40001000-0x40002000 \
AFL_ENTRYPOINT=0x40001240 \
AFL_QEMU_PERSISTENT_ADDR=0x40001240 \
AFL_QEMU_PERSISTENT_RET=0x400013d4 \
AFL_QEMU_PERSISTENT_GPR=1 \
afl-showmap -Q -o /tmp/showmap -i ./corpus_qdecode -- ./harness_qdecode @@
echo $?
afl-showmap++4.41a by Michal Zalewski
[*] Executing './harness_qdecode'...
[*] Reading from directory './corpus_qdecode'...
[*] Scanning './corpus_qdecode'...
...
+++ Program killed by signal 11 +++
-- Program output begins --
-- Program output ends --
[+] Processed 7 input files.
[+] Captured 71 tuples (map size 65536, highest value 5, total values 540) in '/tmp/showmap'.
Uh oh! We see a crash when running the corpus. Run the same command again with AFL_DEBUG=1 in front of it, and the output will name the file that caused the crash.
afl-showmap++4.41a by Michal Zalewski
[*] Executing './harness_qdecode'...
...
+++ Program killed by signal 11 +++
[!] WARNING: crashed: ./corpus_qdecode/truncated.txt
-- Program output begins --
-- Program output ends --
[+] Processed 7 input files.
[+] Captured 71 tuples (map size 65536, highest value 5, total values 540) in '/tmp/showmap'.
So the culprit is ./corpus_qdecode/truncated.txt. We may have accidentally found a bug in the program we're fuzzing, but for now we'll just remove it:
rm -f ./corpus_qdecode/truncated.txt
Now if you run the command above again, it should check out ok!
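With a larger corpus, scrolling through AFL_DEBUG=1 output gets tedious, so a small filter can pull out the offending paths. This sketch assumes the “WARNING: crashed:” line format shown above, which may differ between AFL++ versions; crashed_inputs and find_crashes.py are hypothetical names. It strips ANSI color codes first in case the output was captured from a colored terminal run.

```python
import re
import sys
from typing import List

ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")            # terminal color codes
CRASH_RE = re.compile(r"WARNING: crashed: (.+)$")  # format seen in the output above


def crashed_inputs(log: str) -> List[str]:
    """Return every input path that afl-showmap flagged as crashing."""
    paths = []
    for line in log.splitlines():
        match = CRASH_RE.search(ANSI_RE.sub("", line).rstrip())
        if match:
            paths.append(match.group(1))
    return paths


if __name__ == "__main__":
    # Usage: python3 find_crashes.py showmap.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
            for path in crashed_inputs(fh.read()):
                print(path)
```

Capture the debug run with something like `AFL_DEBUG=1 ... afl-showmap ... 2>&1 | tee showmap.log`, then pass showmap.log to the script.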
Bake it into the run script
Once the addresses work, add them to run_fuzz_qdecode_qasan.sh as exports. The afl-fuzz invocation itself doesn't change. Compare the exec speed on the AFL++ status screen to a run without persistent mode, and you should see a healthy increase, in line with the 2-5x the docs mention.
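To put a number on that increase, AFL++ also writes a fuzzer_stats file into the output directory, and its execs_per_sec field can be compared between a persistent and a non-persistent run. This sketch assumes each run used its own -o directory with the default instance name, so the file sits at <out>/default/fuzzer_stats; the directory names out_baseline and out_persistent are hypothetical.

```python
from pathlib import Path
from typing import Dict


def read_fuzzer_stats(text: str) -> Dict[str, str]:
    """Parse fuzzer_stats' 'key : value' lines into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a separator
            stats[key.strip()] = value.strip()
    return stats


def speedup(baseline_text: str, persistent_text: str) -> float:
    """execs_per_sec of the persistent run divided by that of the baseline."""
    base = float(read_fuzzer_stats(baseline_text)["execs_per_sec"])
    pers = float(read_fuzzer_stats(persistent_text)["execs_per_sec"])
    return pers / base


if __name__ == "__main__":
    # Hypothetical output directories; adjust to the -o paths you used.
    baseline = Path("out_baseline/default/fuzzer_stats")
    persistent = Path("out_persistent/default/fuzzer_stats")
    if baseline.exists() and persistent.exists():
        ratio = speedup(baseline.read_text(), persistent.read_text())
        print(f"persistent mode speedup: {ratio:.1f}x")
```

Compare runs of similar length so startup costs weigh the same in both.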
Optional knobs
- Use stdin instead of a file: file I/O takes up a lot of CPU cycles. Try replacing the fopen, fclose, etc. with a single fread from stdin into a large buffer and note the performance increase you get. You'll probably still want to malloc a buffer and memcpy the data into it before calling the target function. This also lets us move AFL_QEMU_PERSISTENT_ADDR a little further into main, since we no longer rely on argv being stable.
- AFL_QEMU_PERSISTENT_CNT determines how many iterations reuse the same QEMU process before forking a fresh one. The default is 1000. Drop it to 100-500 if your target leaks memory or accumulates state and starts behaving oddly mid-loop. Crank it up to 10000 if the loop is perfectly clean.
- AFL_QEMU_PERSISTENT_HOOK=/path/to/hook.so bypasses the file read on each iteration and writes the input straight into the target's memory. It's a big additional speedup, but it requires writing a small shared object. See AFL++'s utils/qemu_persistent_hook for an example and more information.
- AFL_QEMU_PERSISTENT_RET is an explicit end-of-loop address. If AFL_QEMU_PERSISTENT_ADDR is in the middle of a function rather than at its entry, you need to set it so QEMU knows where to end the loop. When the loop starts at the beginning of a function, QEMU mode automatically picks up the return address and ends the loop there.
Next: Part 6: Exercise