CS8395 — Homework 3

Fuzzing

This assignment is due Monday, April 18 at 11:59PM Central time.

Warning: The Tools Can Take Hours
The tools you must use for this assignment can literally take multiple hours to run. Some students reported that it took over 14 hours! (However, others were able to finish in about five minutes. Regardless of how long it takes you, everything is fine.) Even if you are fast and can finish your own work at the last minute, the tools are not and cannot. On our high-powered multi-core rack-mounted RAID-storage test machine, it took 6.3 hours to run the AFL tool. However, as soon as you get enough data (see below), you can stop early.

In this assignment you will create a high-coverage test suite using an automated technique called fuzzing.

This assignment consists of two parts: (1) building and measuring coverage of a subject program, and (2) using a fuzzer to generate high-coverage inputs for that subject program.

The starter package is available here: https://kjl.name/cs8395/hw3.zip. The package contains two files: (1) afl-2.52b.tar.gz, the fuzzer, and (2) libpng-1.6.34.tar.gz, an image processing library.

Turn-in for HW3

Read this assignment document and complete the tasks described. The final deliverable is a zip file containing your written PDF report. There is no length requirement or limit for the report, but I expect this will take 3 pages or fewer (depending on the size of your screenshots where appropriate).

Ensure that your name, VUNetID, and email address appear at the top of each page.

Ensure that your submission has two sections, labeled Task 1 and Task 2.

I strongly recommend the use of LaTeX to complete this (and all) assignments. LaTeX is the lingua franca for scientific communication in computer science — every peer-reviewed publication I have submitted and reviewed has been written in LaTeX. You can use overleaf.com to help you write LaTeX documents. I have also created a template that you can copy, available here.

Background: Testing and Coverage

Recall from lecture that the predominant software development activity is testing, the act of writing inputs for programs and evaluating the outputs they produce. Developers write multiple test cases to form a test suite, which is then used to check if a program is doing the right thing.

From a software security perspective, developing high-quality test suites is an important part of minimizing the likelihood that software contains a vulnerability an attacker can exploit. But how do we make sure we have a good test suite? If our test suite does not check everything the program could do, there may be untested portions of code that contain vulnerabilities.

We often measure the quality of a test suite by checking coverage. Coverage refers to the fraction of statements that are executed by the test suite. Low coverage means that there are portions of the program's code that the test suite never executes. To compute coverage, we follow a few steps:

  1. We instrument the program to compute coverage.
    That is, we modify the program so that it records which statements are executed when it runs on a test case input.
  2. For each test case in the test suite, we
    1. run the program on that test case, and
    2. record which statements were executed.
  3. After finishing all test cases, we report the number of distinct statements executed by at least one test case, divided by the total number of statements in the program.
[Figure: Example for code coverage (panels: "Sample program" and "Test suite")]

Note: for this assignment, we are considering only statement coverage. Other types of coverage (branch, MC/DC, path, condition) are beyond the scope of this course.
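The three coverage steps above can be sketched in a few lines of bash. The input format is a hypothetical one chosen for illustration (gcov does not produce it): one file per test case, listing the IDs of the statements that test executed, one per line. The important detail is step 3: a statement counts once no matter how many tests execute it, so we take the union before dividing.

```shell
#!/usr/bin/env bash
# Statement coverage = |statements executed by at least one test| / |all statements|.
# Hypothetical input: each file lists the statement IDs one test executed, one per line.
statement_coverage() {
  local total="$1"
  shift
  local covered
  # awk 'NF' drops blank lines; sort -u takes the union across all tests.
  covered=$(awk 'NF' "$@" | sort -u | wc -l)
  awk -v c="$covered" -v t="$total" \
    'BEGIN { printf "%d/%d = %.2f%%\n", c, t, 100 * c / t }'
}
```

For example, if one test executes statements 1, 2, and 3 and another executes 2, 3, and 4 in a 10-statement program, statement_coverage 10 t1.txt t2.txt prints 4/10 = 40.00%.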

Background: Fuzzing

Fuzzing is an automated technique for generating test inputs. In practice, it is very difficult for human developers to manually create high-coverage test suites. Again, from a software security perspective, many software attacks involve the creation of an exploitative input to a program that causes unintended behavior (recall stack overflows from HW1). Thus, we may want to harden software by proactively generating test inputs that cover very specific paths of execution, paths for which a human would be unlikely to write a test input by intuition.

Fuzzing starts with a seed input and a subject program. The fuzzer automatically mutates the seed input, executes the subject program on the mutated input, and measures the resulting coverage. Given enough time, fuzzers can create many unique test inputs that cover many paths through the subject program. In this assignment, you will use a fuzzer to generate "interesting" inputs to an image processing library.
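To make the mutate-execute-measure loop concrete, here is a deliberately tiny bash sketch of coverage-guided mutational fuzzing. It is not how AFL is implemented. The subject program is a hypothetical function that prints an identifier for the path it took (standing in for real coverage instrumentation), and each mutation overwrites one random character. A mutant is kept only if it reaches a path that no earlier input reached.

```shell
#!/usr/bin/env bash
# Toy coverage-guided mutational fuzzer (illustration only, not AFL's algorithm).

# Hypothetical subject program: prints an ID for the execution path it takes.
subject() {
  local input="$1"
  if [[ ${input:0:1} == "P" ]]; then
    if [[ ${input:1:1} == "N" ]]; then
      echo "path-PN"
    else
      echo "path-P"
    fi
  else
    echo "path-other"
  fi
}

# Mutation: overwrite one random position with a random character from a small alphabet.
mutate() {
  local s="$1" alphabet="PNG0"
  local pos=$(( RANDOM % ${#s} ))
  local idx=$(( RANDOM % ${#alphabet} ))
  printf '%s' "${s:0:pos}${alphabet:idx:1}${s:pos+1}"
}

# Fuzzing loop: pick a queued input, mutate it, run the subject, and keep the
# mutant only if it produces a path ID that has not been seen before.
fuzz() {
  local seed="$1" iterations="$2"
  local -A seen=()
  local -a queue=("$seed")
  local i parent child path
  seen[$(subject "$seed")]=1
  for (( i = 0; i < iterations; i++ )); do
    parent=${queue[RANDOM % ${#queue[@]}]}
    child=$(mutate "$parent")
    path=$(subject "$child")
    if [[ -z ${seen[$path]:-} ]]; then
      seen[$path]=1
      queue+=("$child")
    fi
  done
  printf '%s\n' "${queue[@]}"   # the "interesting" inputs, seed first
}
```

Running fuzz xxxx 1000 should almost always print three inputs: the seed, one starting with P, and one starting with PN. Reaching PN takes two mutations in a row, which works only because the intermediate P input was kept in the queue; that is why fuzzers save inputs that increase coverage.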

Task 1: libPNG, the subject program, and coverage

For Task 1, you will use a utility called gcov to compute the coverage of a subject program. The subject program is libpng, the reference library for the Portable Network Graphics (PNG) file format. It is used for image manipulation and conversion.

We will be using the developer-provided pngtest driver. It takes a single .png file as input, attempts to read and parse it, and then writes a copy of the image to pngout.png.

We compute coverage for this program using the gcov test coverage program for gcc. Only statement coverage is considered.

The reference implementation (version 1.6.34) is contained within the starter package. It contains about 85,000 lines of code spread over about 70 files.

A test suite for this application is a collection of .png files. The reference implementation comes with over 150 such example files. Feel free to use an image manipulation or conversion program to create your own files or to use files or benchmarks you find on the Internet.

Building libPNG and Producing Coverage Details

The gcov utility usually comes with gcc, so you should not need to do anything specific to install it if you are working with your Kali VM. However, our subject program depends on the development version of a compression library (you probably already installed this during HW2):

$ sudo apt-get install libz-dev 
In addition, you will have to compile the project with coverage instrumentation (taking care to use static linking or to otherwise make all of the libpng source files, and not just the test harness, visible to coverage):
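One possible command sequence for this is sketched below as a bash function. It assumes gcc's --coverage option (which turns on the instrumentation gcov reads) together with the --disable-shared and -static choices used for the AFL build later in this document; treat the flags as a starting point rather than the required recipe, since what works can differ between setups. build_and_measure is a hypothetical helper name, not part of libpng.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build libpng with gcov instrumentation, run pngtest on
# one image, then print gcov's per-file and total statement coverage.
# The body is a subshell "( ... )", so cd and exit only affect the subshell.
build_and_measure() (
  src_dir="$1"   # e.g., the extracted libpng-1.6.34 directory
  image="$2"     # e.g., pngtest.png, which ships with libpng
  cd "$src_dir" || exit 1
  # --coverage: emit .gcno files at compile time and .gcda files when run.
  ./configure --disable-shared CFLAGS="--coverage -static" || exit 1
  make clean && make || exit 1
  ./pngtest "$image"
  gcov *.c
)
```

For example: build_and_measure ~/hw3/libpng-1.6.34 pngtest.png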

Note how gcov gives a separate report for each source file specified on the command line, but also gives a sum total for statement coverage (the final "Lines executed" line it reports). Your coverage numbers may be slightly different from the example ones listed above.
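If you only want the overall number, you can filter gcov's output. This sketch assumes the output shape described above (per-file "Lines executed:" lines followed by a final total); the exact wording can vary between gcc versions. total_coverage is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read gcov output on stdin and print only the last
# "Lines executed:" line, which is the overall total when gcov reports one.
total_coverage() {
  grep 'Lines executed:' | tail -n 1
}
```

Usage: gcov *.c 2>/dev/null | total_coverage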

Note that gcov *.c will report the sum total for statement coverage for all of the tests you execute. That is, if you use pngtest for multiple different images back to back, using gcov will report the collective statement coverage for all the previous tests! If you want to "reset" coverage, be sure to delete the *.gcda files!
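A small helper for that reset might look like this sketch (reset_coverage is a hypothetical name). It deletes only the run-time .gcda counters, searching subdirectories too; the compile-time .gcno files must stay, because gcov needs them to produce any report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reset accumulated gcov counters under a build directory.
# Deletes *.gcda (run-time execution counts) but keeps *.gcno (compile-time
# notes), which gcov still needs.
reset_coverage() {
  local dir="${1:-.}"
  find "$dir" -name '*.gcda' -type f -delete
}
```

Run reset_coverage in the libpng directory before each batch of test cases whose coverage you want to measure on its own.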

The png.c.gcov (etc.) files created by this process are marked-up source files that include coverage and visitation information. You can view them to determine which lines and branches were covered. Note that lcov, a graphical front-end to gcov, may help you interpret that output for large projects. While it is possible to obtain branch coverage using gcov (e.g., -b -c), we will not use it for this assignment.
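In gcov's marked-up output, each line is prefixed by an execution count, a '-' for lines with no executable code, or '#####' for executable lines that never ran. Assuming that format, this hypothetical helper lists the line numbers that no test has covered yet, which can suggest what kind of PNG to try next.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the line numbers that gcov marked "#####"
# (executable but never executed) in a .gcov file such as png.c.gcov.
uncovered_lines() {
  # Fields are separated by ':': count, line number, source text.
  awk -F: '$1 ~ /#####/ { gsub(/ /, "", $2); print $2 }' "$1"
}
```

Usage: uncovered_lines png.c.gcov | head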

Note that pngtest creates pngout.png. (This can confuse some students: if you collect coverage after running on *.png and then do it again, the second run also picks up the newly created pngout.png, so you may get a different answer if you're not careful!) I recommend deleting pngout.png between test cases.
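One way to follow that advice is to run pngtest once per image and delete pngout.png after each run, as in this sketch. It assumes pngtest writes pngout.png into the current directory, as described above; run_suite is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run pngtest on each image separately, deleting
# pngout.png after every run so it never becomes an extra test input.
run_suite() {
  local pngtest="$1"
  shift
  local img
  for img in "$@"; do
    if [[ ${img##*/} == pngout.png ]]; then
      continue   # a stale pngout.png is output, not a test case
    fi
    # Some inputs are expected to make pngtest fail; failures are ignored here.
    "$pngtest" "$img" > /dev/null 2>&1 || true
    rm -f pngout.png
  done
}
```

For example, from the libpng directory: rm -f *.gcda; run_suite ./pngtest *.png; gcov *.c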

Turn-in for Task 1: libPNG Coverage

For this task, you must document that you have built libPNG and that you can measure coverage using the gcov utility. Turn in the following:

  • Document that you are able to run the pngtest program against any of the built-in test cases. A screenshot of statement coverage suffices.
  • Demonstrate that you can increase coverage by incorporating at least 3 PNG images (you can use whatever images you like). Provide a screenshot where you show an increase in coverage compared to your original base coverage shown in the first point.
  • Write a paragraph explaining what you think leads to different coverage values for different test cases for pngtest.

Task 2: American Fuzzy Lop and test input generation

Now that you can collect coverage with manually curated PNG test inputs, we will use a fuzzing utility to automatically create PNG test inputs that improve coverage.

The associated test input generation tool is American Fuzzy Lop, version 2.52b. This version is available in the starter package. You can also visit the project webpage for documentation. As the (punny? bunny?) name suggests, it is a fuzz testing tool.

You will use AFL to create a test suite of png images to exercise the pngtest program, starting from all of the png images provided with the original tarball. Note that you can terminate AFL as soon as you reach 510 paths_total (also called "total paths" in the GUI); AFL does not stop on its own. You are finished with Task 2 once you reach 510 paths_total.
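Besides the GUI, afl-fuzz also writes its status to a plain-text fuzzer_stats file in the output directory, refreshed periodically rather than on every execution. We assume here the "paths_total : N" line layout used by AFL 2.x; check your own file. This sketch (paths_reached is a hypothetical name) tests the stopping condition from a script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed once AFL's fuzzer_stats reports at least
# TARGET paths_total (default 510, the stopping point for this assignment).
paths_reached() {
  local out_dir="$1" target="${2:-510}"
  local total
  total=$(awk -F: '$1 ~ /^paths_total/ { gsub(/ /, "", $2); print $2 }' \
    "$out_dir/fuzzer_stats" 2>/dev/null)
  [[ -n $total ]] && (( total >= target ))
}
```

Usage: until paths_reached findings_dir; do sleep 60; done; echo "done: stop afl-fuzz with Ctrl+C"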

Building AFL-compatible pngtest

Your goal here is to use AFL to fuzz inputs for the pngtest program. To do this, we need to use a special AFL-aware gcc to compile pngtest (consider for your writeup: why do we need a special compiler here? can we fuzz inputs without a specially-compiled subject program? Could you fuzz a random binary for which you have no source code?).

This will create a new version of the pngtest executable that is separate from the one you produced in Task 1. I strongly recommend copying your Task 1 pngtest somewhere else before you rerun the configure and make commands below. Indeed, consider moving the libpng directory from Task 1 elsewhere, then untar'ing the libpng tarball again. The newly-created AFL-instrumented pngtest will not report coverage in the same way with gcov. The Task 2 binary is used to create new test cases. Once you have a good test suite, you will run that suite against your Task 1 binary to measure how much better the coverage is!
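The directory shuffle suggested above can be scripted roughly as follows. The name libpng-task1 and the tarball location are assumptions; adapt them to where you unpacked the starter package. fresh_libpng_tree is a hypothetical helper that works in the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the Task 1 (gcov) build aside under a new name and
# extract a pristine libpng tree for the Task 2 (AFL) build.
fresh_libpng_tree() {
  local tarball="$1" tree="${2:-libpng-1.6.34}" saved="${3:-libpng-task1}"
  if [[ -e $saved ]]; then
    echo "refusing to overwrite existing $saved" >&2
    return 1
  fi
  mv "$tree" "$saved" && tar xzf "$tarball"
}
```

Usage: cd ~/hw3 && fresh_libpng_tree libpng-1.6.34.tar.gz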

AFL claims that one of its key advantages is ease of use. According to its quick start guide, you can extract the AFL tarball and run make in the afl-2.52b directory. Then, change back to your libpng-1.6.34 directory and re-compile libpng with a configure line like:

$ CC=/home/kali/hw3/afl-2.52b/afl-gcc_(don't_just_copy_this_unchanged) ./configure --disable-shared CFLAGS="-static" 
$ make clean; make

The "CC" bit will look something like (but maybe different for you) CC=/home/kali/hw3/afl-2.52b/afl-gcc — note that there is no trailing slash, it is referring to the special afl-gcc utility that helps instrument the target program for coverage. If you see configure: error: C compiler cannot create executables, double-check your spelling here.

Note that you will need to run make again after configure to actually compile.
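Two quick sanity checks can catch the most common build mistakes listed in the FAQ: a CC path that is not the afl-gcc executable itself (a directory, afl-gcc.c, or a trailing slash), and a pngtest that is a shell-script wrapper rather than an ELF binary. This sketch checks the ELF magic bytes directly instead of calling file; both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight checks for the Task 2 build.

# Succeeds if $1 names an executable regular file (so not afl-2.52b/,
# afl-gcc/ with a trailing slash, or afl-gcc.c).
check_afl_cc() {
  local cc="$1"
  if [[ $cc == */ || ! -f $cc || ! -x $cc ]]; then
    echo "not an executable file: $cc (point CC at .../afl-2.52b/afl-gcc)" >&2
    return 1
  fi
}

# Succeeds if $1 starts with the ELF magic bytes 0x7f 'E' 'L' 'F', i.e., it is a
# compiled binary rather than a shell-script wrapper.
is_elf() {
  [[ $(head -c 4 "$1" 2>/dev/null | od -An -c | tr -d ' \n') == '177ELF' ]]
}
```

For example, run check_afl_cc ~/hw3/afl-2.52b/afl-gcc before configure, and is_elf ./pngtest || echo "not an ELF binary (see FAQ 5)" after make.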

Running AFL against the pngtest program

AFL operates by manipulating input files contained in a starting test case directory (the one passed to afl-fuzz with -i, which we call testcase_dir). So, in your libpng directory, create a new directory called testcase_dir, then place some starting PNG images there to use as AFL's seed inputs. You can pick your favorite 3 images from Task 1, or you can copy over all the images you can find; you can even save PNG files from online and start there.
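For instance, this sketch gathers every .png under a source tree into testcase_dir. It copies only .png files, so the pngtest executable cannot end up among the seeds (see FAQ 1). make_seed_dir is a hypothetical helper; files with the same name in different subdirectories overwrite one another, which is harmless for seeding.

```shell
#!/usr/bin/env bash
# Hypothetical helper: collect PNG seed inputs for afl-fuzz into one directory.
make_seed_dir() {
  local src="$1" dest="${2:-testcase_dir}"
  mkdir -p "$dest" || return 1
  # Copy only *.png files, skipping anything already inside the destination.
  find "$src" -type f -name '*.png' ! -path "*/${dest##*/}/*" -exec cp {} "$dest"/ \;
  find "$dest" -maxdepth 1 -type f -name '*.png' | wc -l   # seeds collected
}
```

Usage, from the libpng directory: make_seed_dir . testcase_dir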

Next, we will use afl-fuzz to begin the fuzzing process with a command like this:

$ sudo su
# echo core >/proc/sys/kernel/core_pattern
# exit
$ /home/kali/hw3/afl-2.52b/afl-fuzz(_don't_just_copy_this_unchanged) -i testcase_dir -o findings_dir -- /path/to/pngtest_(not_.c_nor_.png_but_the_task_2_executable_you_built) @@

Go back and double-check the end of the previous line for @@. It is required, it is not a typo, and if you did not type it in (to tell AFL where its randomly-created arguments to pngtest go) you are likely to "get stuck" when enumerating paths and tests (see FAQ below).

Note that findings_dir is a new folder you make up: afl-fuzz will put its results there.
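If you launch afl-fuzz repeatedly, a small wrapper can guard against two mistakes covered in the FAQ: forgetting the trailing @@, and keeping the pngtest binary inside the seed directory. This is a sketch; run_afl is a hypothetical name, and it only assembles the same afl-fuzz command line shown above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the afl-fuzz command line shown above.
run_afl() {
  local afl_fuzz="$1" target="$2" in_dir="${3:-testcase_dir}" out_dir="${4:-findings_dir}"
  # FAQ 1: the target binary must not sit among the seed inputs.
  if [[ -e "$in_dir/$(basename "$target")" ]]; then
    echo "move $(basename "$target") out of $in_dir" >&2
    return 1
  fi
  # "@@" tells afl-fuzz where to substitute the path of each generated input.
  "$afl_fuzz" -i "$in_dir" -o "$out_dir" -- "$target" @@
}
```

Usage: run_afl /home/kali/hw3/afl-2.52b/afl-fuzz /path/to/pngtest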

Note that you must stop afl-fuzz yourself, otherwise it will run forever — it does not stop on its own. Read the Report instructions below for information on the stopping condition and knowing "when you are done."

Note also that you can resume afl-fuzz if it is interrupted or stopped in the middle (you don't "lose your work"). When you try to re-run it, it will give you a helpful message like:

To resume the old session, put '-' as the input directory in the command
line ('-i -') and try again.
Just follow its directions. Warning: when you resume AFL, it will overwrite your findings_dir/plot_data file (which you need for the final report), so be sure to save a copy of it somewhere before resuming.
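A sketch of that save-then-resume step, assuming the findings_dir name used above and the '-i -' resume convention from AFL's message; resume_afl is a hypothetical helper and the timestamped backup name is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy plot_data aside (resuming overwrites it), then
# restart afl-fuzz on the existing output directory with "-i -".
resume_afl() {
  local afl_fuzz="$1" target="$2" out_dir="${3:-findings_dir}"
  local backup
  backup="plot_data.$(date +%Y%m%d-%H%M%S)"
  if [[ -f "$out_dir/plot_data" ]]; then
    cp "$out_dir/plot_data" "$backup" || return 1
    echo "saved $out_dir/plot_data as $backup" >&2
  fi
  "$afl_fuzz" -i - -o "$out_dir" -- "$target" @@
}
```

Usage: resume_afl /home/kali/hw3/afl-2.52b/afl-fuzz /path/to/pngtest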

Note that afl-fuzz will likely abort the first few times you run it and ask you to change some system settings (e.g., echo core | sudo tee /proc/sys/kernel/core_pattern, echo core >/proc/sys/kernel/core_pattern etc.). For example, on Ubuntu systems it often asks twice. Just become root and execute the commands. Note that sudo may not work for some of the commands (e.g., sudo echo core >/proc/sys/kernel/core_pattern will fail because bash will do the > redirection before running sudo so you will not yet have permissions, etc.) — so just become root (e.g., sudo su) and then execute the commands in a root shell.

Examining test cases produced by AFL

Once you reach 510 paths_total, you can press Ctrl+C to exit AFL. The test cases produced by AFL are located in the findings_dir/queue/ directory. They will not have the .png extension (instead, they will have names like 000154,sr...pos/36,+cov), but you can add one. Consider writing a bash script to do this for you, something like for i in `ls findings_dir/queue`; do cp findings_dir/queue/$i findings_dir/queue/$i.png; done.
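A slightly more careful version of that loop copies the test cases into a separate directory, so the .png copies stay out of AFL's own queue directory, and quotes every path so unusual characters in file names cannot break it. export_queue and afl_pngs are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy every AFL-generated test case into a separate
# directory, adding a .png extension so image tools and *.png globs find them.
export_queue() {
  local queue="${1:-findings_dir/queue}" dest="${2:-afl_pngs}"
  local f
  mkdir -p "$dest" || return 1
  for f in "$queue"/*; do
    [[ -f $f ]] || continue          # skip subdirectories and an empty glob
    cp "$f" "$dest/$(basename "$f").png"
  done
}
```

Afterwards, delete old *.gcda files and run the copies in afl_pngs against your Task 1 pngtest before running gcov *.c.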

Help — Can't Open AFL's Output?

You will almost certainly find that AFL's output queue folder does not contain files with the .png extension. In addition, you will almost certainly find that most of the files produced by AFL are "invalid" PNG files that cause program executions to cover error cases (e.g., libpng read error).

This is normal.

Double-check all of the instructions here, and the explanations in the course lectures for how these tools work: there's no need to panic. In addition, you might try alternate image viewers (rather than just the default one). For example, multiple students have reported that uploading apparently-invalid images to GitHub (no, really) works well for viewing them.

Turn-in for Task 2: Coverage for AFL-generated inputs

For this task, you must document that you have created new test inputs with AFL, and that you can compute coverage with those new test cases. Provide the following.

  • Demonstrate that you have used AFL to create new test cases. Provide a screenshot of your ./findings_dir/queue directory.
  • Demonstrate that your newly-generated test cases achieve higher coverage. Run all of the generated images against your Task 1 pngtest binary and use gcov *.c to report the coverage. Provide a screenshot of the resulting coverage.
  • Write a paragraph commenting on the types of files AFL generated as inputs. Why are so many generated images not viewable by typical image viewers?
  • Write a paragraph explaining what you think afl-gcc is doing and why it is required when building the Task 2 pngtest binary. Could fuzzing work without a specially-compiled program? Why or why not?

What to turn in for HW3

You must submit a single .zip file called vunetid.zip (using your own VUNetID). While you can work with others in the class conceptually, please submit your own copy of the assignment. Your zip file must contain your written PDF report, with sections labeled Task 1 and Task 2.

Use the submission system (VU login required) to submit your zip file.

FAQ and Troubleshooting

In this section we detail previous student issues and resolutions:

  1. Question: Using AFL, I get:

    ERROR: PROGRAM ABORT : Test case 'xxxxxx/pngtest' is too big (2.25 MB, limit is 1.00 MB)
    

    Answer: You are mistakenly passing the pngtest executable to AFL as a test case for itself. Try putting your pngtest executable one directory above your testcase_dir. In other words, rather than having it in the same folder as your test images (testcase_dir), put it in the directory that contains testcase_dir, and adjust /path/to/pngtest accordingly.

  2. Question: My AFL session has 0 cycles done but the total paths counter does increment. I am worried.

    Answer: Everything is fine. It is entirely possible to complete the assignment with 0 cycles done. (AFL can enumerate quite a few candidate test cases — enough for this assignment — before doing a complete cycle.)

  3. Question: Using AFL, I get:

    [-]  SYSTEM ERROR : Unable to create './findings_dir/queue/id:000000,orig:pngbar.png'
    

    Answer: This is apparently a WSL issue, but students running Linux who ran into it were able to fix things by making a new, fresh VM.

  4. Question: Using AFL, I get:

    [-] PROGRAM ABORT : Program 'pngtest' not found or not executable
    
    or
    [-] PROGRAM ABORT : Program 'pngnow.png' is not an ELF binary
    

    Answer: You need to use the right /path/to/pngtest instead of just pngtest. You must point to the pngtest executable (produced by "make") and not, for example, pngtest.png.

  5. Question: Using AFL, I get:

    [-] PROGRAM ABORT: Program 'pngtest' is a shell script 
    

    Answer: You must recompile libpng carefully following the instructions above, including the explanation about "CC=..." and "--disable-shared" and the like. Example showing that a normal build produces a shell script while a more careful AFL-based build produces an executable:

     
    $ ./configure >& /dev/null ; make clean >& /dev/null ; make >& /dev/null ; file ./pngtest
    
    ./pngtest: Bourne-Again shell script, ASCII text executable
    
    
    $ CC=~/hw3/afl-2.52b/afl-gcc ./configure --disable-shared CFLAGS="-static" >& /dev/null ; make clean >& /dev/null ; make >& /dev/null ; file ./pngtest
    
    ./pngtest: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/l, for GNU/Linux 3.2.0, BuildID[sha1]=bec3dc8e4b3feff6660f9339368f5c1ec5f55ab9, with debug_info, not stripped
    
  6. Question: When I try to run AFL, I get:

    [-] PROGRAM ABORT : No instrumentation detected
    

    Answer: You are pointing AFL to the wrong pngtest executable. Double-check the instructions near $ CC=/path/to/afl-gcc ./configure --disable-shared CFLAGS="-static" , rebuild pngtest using "make", and then point to exactly that executable and not a different one.

  7. Question: When I try to run configure with AFL via something like CC=/home/kali/hw3/afl-2.52b/afl-gcc/ ./configure --disable-shared CFLAGS="-static" , I get:

    checking whether the C compiler works... no
    configure: error: in `/home/kali/hw3/libpng-1.6.34':
    configure: error: C compiler cannot create executables
    

    Answer: You need to specify afl-gcc, not afl-gcc.c or afl-gcc/ (note trailing slash!).

  8. Question: When I am running AFL, it gets "stuck" at 163 (or 72, or another small number) paths.

    Answer: In multiple instances, students had forgotten the @@ at the end of the AFL command. Double check the command you are running!

  9. Question: When I try to compile libpng with AFL, I get:

    configure: error: C compiler cannot create executables
    

    Answer: You need to provide the full path to the afl-gcc executable, not just the path to hw3/afl-2.52b/.

  10. Question: Some of the so-called "png" files that AFL produces cannot be opened by my image viewer and may not even be valid "png" files at all!

    Answer: Yes, you are correct. (Thought question: why are invalid inputs sometimes good test cases for branch coverage?)

  11. Question: How can I make AFL run faster?

    Answer: One anonymous student suggests:

    Is your AFL running slow? Are you getting less than 30/sec on the exec speed? Have you been running for 21+ hours like me and are frustrated that you haven't found any new paths in the last 4 hours?

    Try making a copy of your test image directory, then remove any "large" test images from this new directory (I deleted all test images over 100KB), and then try running a new AFL session with this new input directory, and a new output directory. Each "session" of AFL basically runs in a single thread, so it seems to be fine running two different sessions at once, with different input/output directories. I watched as my new run (with small test image files) consistently ran with an exec speed of 500-1000, and achieved 600 total paths in under 7 minutes, all while safely letting my old session continue to run.

    tl;dr Don't use lots of "large" images with AFL (large roughly being >100KB)
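The tip above can be scripted like this sketch (small_seeds is a hypothetical name). It relies on find's -size test; find rounds sizes up to whole kilobytes, so -size -100k keeps files of at most 99 KiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy only small seed images into a fresh input directory
# for a second, faster afl-fuzz session (the original session can keep running).
small_seeds() {
  local src="$1" dest="$2" limit="${3:-100k}"
  mkdir -p "$dest" || return 1
  # -maxdepth 1: top level only; -size -100k: rounded-up size below 100 KiB.
  find "$src" -maxdepth 1 -type f -size "-$limit" -exec cp {} "$dest"/ \;
}
```

Usage: small_seeds testcase_dir testcase_small, then start a second session with -i testcase_small -o findings_small so it does not collide with the first session's output directory.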