Integrating libFuzzer with a tightly coupled main function

In the Heroes program, we’re (understandably) encouraged to leave the upstream repository’s files unmodified in our fork where possible. But a fuzzer executable built with libFuzzer cannot include a main function from the project’s source files, since libFuzzer links in its own main function.

So when trying to integrate a libFuzzer build into an upstream repository’s existing build system, I often run into the situation where the upstream repository includes essential functionality in the same source file as the main function. Therefore, it is not possible to perform the libFuzzer build by simply excluding that source file from the main build. What is the suggested path forward here?

Some ideas I had, both of which unfortunately involve minor modifications to that source file:

  • Rename the main function in the fork, e.g. to non_mayhem_main, which should still allow upstream changes to the source file to be merged automatically into the fork
  • Remove the main function entirely in the fork
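A variant of the first idea that might avoid editing the file at all is to do the rename with the preprocessor at build time. Below is a minimal sketch, not something taken from any particular repo: upstream_main and count_digits are hypothetical names, and the marked section stands in for what would really be an #include of the unmodified upstream file.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Rename every later occurrence of `main` (hypothetical new name). */
#define main upstream_main

/* ---- stand-in for: #include "main.c" (the unmodified upstream file) ---- */
static size_t count_digits(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s >= '0' && *s <= '9') n++;
    }
    return n;
}

int main(int argc, char **argv) { /* compiled as upstream_main */
    return (argc > 1 && count_digits(argv[1]) > 0) ? 0 : 1;
}
/* ---- end of stand-in ---- */

#undef main

/* libFuzzer entry point: pass the fuzzed bytes as a single argument. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    char *arg = malloc(size + 1);
    if (arg == NULL) return 0;
    if (size > 0) memcpy(arg, data, size);
    arg[size] = '\0'; /* argv strings must be NUL-terminated */

    char prog[] = "fuzz";
    char *args[] = { prog, arg, NULL };
    (void)upstream_main(2, args);

    free(arg);
    return 0;
}
```

The same rename could presumably be done from the build system by passing -Dmain=upstream_main when compiling only that one file, which keeps the harness in its own source file.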

For more context, this situation applies to repositories that do not have a library build target.

Hi @rnshah9! For context, let’s take a look at a concrete example that has a pretty tightly coupled main file:

If we look at the main.c for this repo:

We see that there’s some significant lift required to run the various initializations for the application. One option would be to conditionally compile the main function as an LLVMFuzzerTestOneInput function, but as you mentioned, this would modify the upstream source. Another option would be to copy the main.c file to a separate fuzz-some-function.c, but then we have static copies of main and introduce drift into the fuzz targets. And as you mentioned earlier, this repo doesn’t have an option to “build as a library” (defining a main target):

If there is no library target, the next best option would be to create one.
One approach would be to create another Makefile for fuzzing that omits this main target and expects some application or target with a main defined. If we look closer in the repo, we see a couple of examples with this exact behavior:

So, to create a fuzzing harness, we can follow this example to build this repo as a library that gets called by a libFuzzer target. I’ve created an example here:

As you can see, I follow the hello-embed example pretty closely and call execute-from-string() on the fuzzed data from libFuzzer. The next part is the Makefile, which creates a library target for micropython:
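For readers without the linked example at hand, the general shape of such a harness is roughly the following. This is a minimal sketch under assumptions: run_script is a hypothetical stand-in for the library entry point that takes a C string, stubbed out here so the block compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the library call that executes a source
 * string. In a real build it would come from the library target. */
static int run_script(const char *src) {
    return (int)strlen(src);
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    /* libFuzzer's buffer is not NUL-terminated, so copy it first. */
    char *src = malloc(size + 1);
    if (src == NULL) return 0;
    if (size > 0) memcpy(src, data, size);
    src[size] = '\0';

    (void)run_script(src);

    free(src);
    return 0; /* libFuzzer expects 0 for a normally processed input */
}
```

The copy matters because passing the raw buffer to a function that expects a C string would read past the end of the input.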

Finally, we can call make on this fuzz target in our Dockerfile:

I found that this target in particular is pretty fragile (it crashes on inputs larger than 54 bytes), so it’s a good idea to take a look at the subroutines and see if some conditional data massaging can help get better results.
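As one illustration of what “conditional data massaging” could look like, here is a minimal sketch that filters inputs before they reach the target. The 54-byte cap is an assumption taken from the observation above, input_is_usable is a hypothetical helper, and the call into the target is left as a comment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_INPUT_LEN 54 /* assumption: based on the observation above */

/* Returns 1 if the input is worth passing to the target, 0 otherwise. */
static int input_is_usable(const uint8_t *data, size_t size) {
    if (size == 0 || size > MAX_INPUT_LEN) return 0;
    /* An embedded NUL would truncate the C string the target sees. */
    return memchr(data, '\0', size) == NULL;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    if (!input_is_usable(data, size)) return 0;

    char src[MAX_INPUT_LEN + 1];
    memcpy(src, data, size);
    src[size] = '\0';

    /* The call into the target would go here. */
    (void)src;
    return 0;
}
```

If the only goal is to cap the input size, libFuzzer’s -max_len option can do that without any harness code. It is also worth checking first whether the crash on larger inputs is a harness limitation or a genuine finding in the target.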

For targets that do not have any obvious paths to develop a library target version, you will probably have to create this functionality yourself. It might be helpful to look at the above Makefile as an example.

Thanks for the detailed response!

How can we create a libFuzzer harness if a repository has ALL of its source code (including a main function) in a single file, so that no library target that omits main can be created?

For example, mayhemheroes/TerminalImageViewer on GitHub (a small C++ program that displays images in a modern terminal using RGB ANSI codes and Unicode block graphics characters) has all of its source code in tiv.cpp.

Hmm, that’s a unique one! Taking a look at the project:

  1. My first instinct is that this doesn’t need harnessing (and indeed the integration shows this). Since it takes file input directly, you can easily fuzz it as is. But what if you wanted to avoid printing the result to the terminal, for speed for example?

  2. With all of the source in one file, the maintainers almost have to expect that upstream merge requests will modify the file in some way. So in this case, you could indeed have some #ifdef’d logic for building, say, a libFuzzer version of the target and a non-fuzzing version. While invasive, it’s certainly less invasive than option 3.

  3. Modularize the repo yourself. It’s not best practice for software repositories to become monolithic. This particular repo is a small enough tool that it doesn’t matter, but if it were a large repo with many contributors, having a monolithic build/source file would become problematic. You could offer to assist the maintainers by splitting up and modularizing functionality in the main source file. This is of course a much more invasive approach, and I wouldn’t recommend it for this particular target, but for targets that expect to cover a lot of material and have a large number of contributors, this is probably best practice.
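To make option 2 concrete, here is a minimal sketch of a single source file that builds either as a normal program or as a libFuzzer target. Everything in it is an assumption for illustration: FUZZ_TARGET_BUILD is a hypothetical macro and count_bright_pixels is a stand-in for the tool’s real logic. The macro is defined inline only so the sketch compiles as a harness by default; in a real build it would be passed on the compiler command line instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical switch, normally supplied by the build system
 * (e.g. -DFUZZ_TARGET_BUILD=1). Defaulted here for the sketch only. */
#ifndef FUZZ_TARGET_BUILD
#define FUZZ_TARGET_BUILD 1
#endif

/* Stand-in for the core logic that lives in the single source file. */
static size_t count_bright_pixels(const uint8_t *pixels, size_t n) {
    size_t bright = 0;
    for (size_t i = 0; i < n; i++) {
        if (pixels[i] > 127) bright++;
    }
    return bright;
}

#if FUZZ_TARGET_BUILD
/* Fuzzing build: exercise the logic directly, with no terminal output. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    (void)count_bright_pixels(data, size);
    return 0;
}
#else
/* Normal build: read a file named on the command line and print. */
int main(int argc, char **argv) {
    if (argc < 2) return 1;
    FILE *f = fopen(argv[1], "rb");
    if (f == NULL) return 1;
    uint8_t buf[4096];
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    printf("%zu\n", count_bright_pixels(buf, n));
    return 0;
}
#endif
```

Compiling with -DFUZZ_TARGET_BUILD=0 selects the ordinary main, so upstream behavior is unchanged when the macro is turned off. The fuzzing branch also skips the terminal output, which addresses the speed concern raised in option 1.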