AFL Source Code Analysis

[toc]

File processing flow

Source code analysis

afl-gcc

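afl-gcc is a wrapper around gcc/g++: it rewrites the command line and then calls gcc/g++.
afl-gcc adds the `-B <directory>` option, pointing at the directory that contains AFL's `as` (a link to afl-as).
With `-B`, gcc/g++ looks in that directory first for the subprograms it runs (cpp, cc1/cc1plus, as, ld). Only `as` is found there, so the assembly step runs afl-as while the other steps use the normal tools.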

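```c
/*
  afl-gcc.c:main()
 */

find_as(argv[0]);          // locate the directory holding AFL's `as` (AFL_PATH, or the directory of argv[0], i.e. afl-gcc's own directory); used by edit_params()

edit_params(argc, argv);   // set up cc_params: the real compiler, the -B option and the remaining arguments

execvp(cc_params[0], (char**)cc_params);   // run the real gcc
```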

afl-as

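AFL's instrumentation is done by afl-as, after the source file has already been compiled to assembly.
afl-as is a wrapper around as.
afl-as inserts coverage-recording code at the appropriate places in the assembly and then calls the real as to assemble the result. The inserted code lives in afl-as.h; afl-as is responsible for finding each basic block and inserting the code from afl-as.h there. afl-as.c:main() mainly calls two functions: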

```c
/*
afl-as.c:main()
*/

edit_params(argc, argv); // adjust the arguments passed to the real assembler `as`

add_instrumentation(); // detect branch points and insert the coverage code
```

add_instrumentation()

Process input file, generate modified_file. Insert instrumentation in all the appropriate places.

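Finding the code section
Only the code (text) section is instrumented.
The section is recognized from the section directives in the assembly file (e.g. `.text` or `.section .text` versus `.data`, `.bss` or another `.section`).
While inside a code section, instr_ok is set to 1; entering any other section sets it back to 0.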

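Finding basic blocks

Method:
- Labels: a line that starts with a dot (.) and ends with a colon (:). In practice AFL only counts GCC branch labels of the form `.L<digit>...`, such as `.L2:`, and skips data labels such as `.LC0:`. A branch label is a potential jump target and therefore starts a basic block. Function entry labels such as `main:` (no leading dot) are treated the same way.
- Jump instructions:
  - A conditional jump (e.g. `jnz xxx`) usually follows a comparison and either jumps or falls through depending on the result, so the instruction after a conditional jump also starts a basic block.
  - For a conditional jump, the coverage code is inserted right after the jump instruction; an unconditional `jmp` is skipped.
- For labels, instrument_next is set to 1, and the coverage code is inserted before the next instruction rather than at the label itself.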

Instrumentation

```c
fprintf(outf, use_64bit ? trampoline_fmt_64 : trampoline_fmt_32, R(MAP_SIZE));
```

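fprintf() writes the formatted trampoline string into the output assembly file at the corresponding position.

R(MAP_SIZE) is a random number from 0 to MAP_SIZE - 1; it serves as the ID of the instrumented location and is baked into the trampoline.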

For example, trampoline_fmt_32:

```c
static const u8* trampoline_fmt_32 =
  "\n"
  "/* --- AFL TRAMPOLINE (32-BIT) --- */\n"
  "\n"
  ".align 4\n"
  "\n"
  "leal -16(%%esp), %%esp\n"
  "movl %%edi, 0(%%esp)\n"
  "movl %%edx, 4(%%esp)\n"
  "movl %%ecx, 8(%%esp)\n"
  "movl %%eax, 12(%%esp)\n"
  "movl $0x%08x, %%ecx\n"
  "call __afl_maybe_log\n"
  "movl 12(%%esp), %%eax\n"
  "movl 8(%%esp), %%ecx\n"
  "movl 4(%%esp), %%edx\n"
  "movl 0(%%esp), %%edi\n"
  "leal 16(%%esp), %%esp\n"
  "\n"
  "/* --- END --- */\n"
  "\n";
```

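This assembly snippet mainly does the following:

- saves edi, edx, ecx and eax on the stack;
- loads ecx with the value printed by fprintf() (the random location ID);
- calls __afl_maybe_log;
- restores the registers.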

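__afl_maybe_log is what the instrumentation actually executes; it is part of the main_payload_32/main_payload_64 code from afl-as.h, which afl-as appends to the end of the instrumented assembly file.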

fork server

Determinism -fork- child-process randomness

Instrumentation

Supplementary notes:

gcc/cc1

g++ is a compiler driver. It knows how to invoke the actual compiler (cc1plus), assembler and linker. It does not know how to parse or compile the sources.