Andreas Zeller is faculty at the CISPA Helmholtz Center for Information Security and professor for Software Engineering at Saarland University, both in Saarbrücken, Germany. His research on automated debugging, mining software archives, specification mining, and security testing has been highly influential. Zeller is an ACM Fellow and holds an ACM SIGSOFT Outstanding Research Award.

Mail: zeller@cispa.saarland

Phone: +49 681 302-70971

Twitter: @AndreasZeller

by Andreas Zeller and Sascha Just; with Kai Greshake

In 2018, Chen and Chen published the paper "Angora: Efficient Fuzzing by Principled Search" [1], featuring a new gray-box, mutation-based fuzzer called Angora. Angora showed extraordinary effectiveness, shining with stellar performance on the LAVA-M benchmark [3] and dramatically outperforming all competition.

The reason cited for this breakthrough and leap in effectiveness toward deeper penetration of target programs was the combination of four techniques: scalable byte-level taint tracking, context-sensitive branch coverage, input length exploration, and search based on gradient descent. The first three techniques had already been explored in earlier work; the novel key contribution, as also indicated by the paper title, was the search based on gradient descent.

One of our students was intrigued by this outstanding performance and wanted to investigate why and how optimization strategies affect fuzzing performance so much, with the initial goal of further improving on Angora. To properly measure the effect of his work, he worked directly on the Angora source code. During his study [2], he came across some confusing findings, which we summarize below.

Of course, this raises the question of why Angora is so successful. Magic byte extraction is not a new technique; but on this benchmark, in combination with taint tracking, good heuristics, and other improvements, it leads to spectacular results on LAVA-M. Gradient Descent, however, contributed little to that success.

These findings are confirmed by Angora's authors [5], who point out that if the fuzzer can extract the magic bytes and copy them to the right positions in the input, it can solve such constraints easily.

Since AFL uses random mutations to solve constraints, it is not surprising that all constraints that can easily be solved with random mutations have already been solved in the input corpus. This puts random search at an immediate disadvantage.
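To see why magic byte extraction dominates on LAVA-M-style checks, consider a toy example (ours, not from the paper; `target` and `solve_by_copying` are hypothetical names): a random mutation hits a 32-bit magic comparison with probability 2^-32 per execution, while a fuzzer that knows, via taint tracking, which input bytes reach the comparison solves it in a single execution by copying the operand into place.

```rust
// Toy example (ours): a "magic byte" check guarding a branch a fuzzer
// wants to reach -- the shape of constraint LAVA-M injects.
const MAGIC: [u8; 4] = [0xde, 0xad, 0xbe, 0xef];

fn target(input: &[u8]) -> bool {
    // Random byte mutations hit this branch with probability 2^-32 per try.
    input.len() >= 8 && input[4..8] == MAGIC
}

// Magic byte extraction: taint tracking reveals that bytes 4..8 flow into
// the comparison, and the comparison operand tells us what to write there.
fn solve_by_copying(input: &mut [u8]) {
    input[4..8].copy_from_slice(&MAGIC); // one execution instead of ~2^32
}
```

No search strategy is involved at all here, which is exactly why a comparison of search strategies on such benchmarks says little.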

While inspecting the code of Angora, we also found another issue. The evaluation ([1, Section 5.4]) states that Gradient Descent is compared against random search with Magic Byte extraction. However, Angora's Gradient Descent implementation itself makes use of Magic Byte extraction, thus guaranteeing that it always performs at least as well as the random search baseline it is compared against.

The Angora authors state that their "exploitation routines are a part of the enhancements aforementioned. During gradient descent, Angora first descends by the entire gradient. Only when this fails to solve the constraint does Angora try to descend by each partial derivative as a crude attempt of salvation." [5] According to our findings, however, these heuristics dominate the search of Gradient Descent for constraints with a large number of input dimensions – a fact that should have been discovered in an adequate evaluation setting.
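The two-stage descent described above can be sketched as follows. This is a minimal toy of ours, not Angora's actual code: the names `gradient`, `step`, and `descend` are hypothetical, and byte-wise finite differences stand in for Angora's taint-derived input dimensions. The fuzzer treats the branch distance as a black-box function `f` (0 means the constraint is solved), first steps against the entire gradient, and falls back to each partial derivative only when the full step fails to improve.

```rust
// Finite-difference gradient of a black-box branch-distance function:
// perturb each byte by one and record the change in f.
fn gradient(f: &dyn Fn(&[u8]) -> i64, x: &[u8]) -> Vec<i64> {
    let fx = f(x);
    (0..x.len())
        .map(|i| {
            let mut y = x.to_vec();
            y[i] = y[i].wrapping_add(1);
            f(y.as_slice()) - fx
        })
        .collect()
}

// Move one byte a single step against the sign of its partial derivative.
fn step(b: u8, g: i64) -> u8 {
    match g.signum() {
        1 => b.wrapping_sub(1),  // positive slope: decrease the byte
        -1 => b.wrapping_add(1), // negative slope: increase the byte
        _ => b,
    }
}

fn descend(f: &dyn Fn(&[u8]) -> i64, x: &mut Vec<u8>, budget: usize) -> bool {
    for _ in 0..budget {
        let fx = f(x.as_slice());
        if fx == 0 {
            return true; // constraint solved
        }
        let g = gradient(f, x.as_slice());
        // Stage 1: descend by the entire gradient at once.
        let full: Vec<u8> = x.iter().zip(&g).map(|(&b, &gi)| step(b, gi)).collect();
        if f(full.as_slice()) < fx {
            *x = full;
            continue;
        }
        // Stage 2 (fallback): descend by each partial derivative alone.
        let mut improved = false;
        for (i, &gi) in g.iter().enumerate() {
            let mut partial = x.clone();
            partial[i] = step(partial[i], gi);
            if f(partial.as_slice()) < fx {
                *x = partial;
                improved = true;
                break;
            }
        }
        if !improved {
            return false; // stuck in a local minimum: give up
        }
    }
    f(x.as_slice()) == 0
}
```

Note how, for constraints with many input dimensions, Stage 2 alone can account for most of the executions consumed, which is why such fallbacks must be evaluated separately from the full-gradient step.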

- The Gradient Descent implementation goes beyond what is described in the paper. It contains special exploitation routines that inject fixed values known to trigger overflows and numerical bugs. The algorithm also does not descend only by the entire gradient, but additionally by each partial derivative.
- Parameters like the optimization budget are set to fixed, arbitrary values without justification or documentation for the chosen parameters:
```rust
// SEARCH
pub const ENABLE_DET_MUTATION: bool = true;
pub const MAX_SEARCH_EXEC_NUM: usize = 376;
pub const MAX_EXPLOIT_EXEC_NUM: usize = 66;
pub const MAX_NUM_MINIMAL_OPTIMA_ROUND: usize = 8;
pub const MAX_RANDOM_SAMPLE_NUM: usize = 10;
pub const GD_MOMENTUM_BETA: f64 = 0.0;
pub const GD_ESCAPE_RATIO: f64 = 1.0;
pub const BONUS_EXEC_NUM: usize = 66;
```

Values like `MAX_SEARCH_EXEC_NUM` (376; the number of iterations to solve a constraint) or `BONUS_EXEC_NUM` (66; the number of additional iterations added under certain conditions) would be odd choices in an experiment design; computer scientists would normally prefer multiples of powers of 10, 2, or 5. These choices are not justified or documented; and in any case, the impact of these choices (compared to others) would have to be determined.

The value of 376 for `MAX_SEARCH_EXEC_NUM` is a particularly odd choice. Since the few constraints that Gradient Descent operated on were actually either solved after a maximum of 50 iterations or not at all, setting the iteration budget to at most 50 should actually increase the performance on LAVA-M.

In summary, we find that:

- Angora's novel Gradient Descent approach has little to no impact on its performance on LAVA-M;
- The performance of Angora is impacted by several decisions that are not documented, motivated, assessed, or evaluated individually or sufficiently; and
- The evaluation methodology is inadequate to support the paper's conclusions on Angora's performance.

It would be a mistake, however, to point at the Angora authors only. Chen and Chen have made their tool publicly available and readily answered our questions [5]. The underlying problem is a general one: fuzzing tools embody a multitude of design decisions that are rarely documented, let alone evaluated individually.

This calls for the community to discuss and question its procedures and criteria on what makes good research. Reviewers and researchers must not only focus on great results, but also assess how these results have been obtained, whether they have been rigorously evaluated, and how they translate into insights that will stand the test of time. How this can be achieved for programs with thousands of lines of undocumented code is a pressing question.

For now, the Angora case tells us how much we do not yet know about what makes fuzzers successful, and how important it is to validate not only results, but also the explanations behind them.

- [1] P. Chen and H. Chen, "Angora: Efficient Fuzzing by Principled Search." 2018 IEEE Symposium on Security and Privacy (IEEE S&P), San Francisco, CA, 2018, pp. 711-725.
- [2] K. Greshake, “Effective Search Algorithms for Grey Box Fuzzing,” BSc thesis, Saarland University, August 2019.
- [3] B. Dolan-Gavitt et al., "LAVA: Large-Scale Automated Vulnerability Addition," 2016 IEEE Symposium on Security and Privacy (IEEE S&P), San Jose, CA, 2016, pp. 110-121.
- [4] Valentin J.M. Manes, HyungSeok Han, Choongwoo Han, Sang Kil Cha, Manuel Egele, Edward J. Schwartz, Maverick Woo, "The Art, Science, and Engineering of Fuzzing: A Survey". arXiv:1812.00140
- [5] P. Chen, H. Chen, A. Zeller. "Questions on Angora tool vs Angora paper". E-mail exchange, September/October 2019.