(Almost) 6 Months of Rust Runtime Performance
After reading this issue, I decided to poke around in the history of
rustc’s runtime performance.
NOTE: The benchmarks are now normalized against the first result in each individual series. So a value of 1.0 means "roughly the same performance as the first measurement in the series."
After some feedback on the reddit thread where I posted this, I’ve made a few changes here:
- Posted the benchmark manifest and output JSON to https://github.com/anp/rust-runtime-benchmarks. Note that not every benchmark in the manifest appears in the graphs, because some had all of their benchmark functions ignored; secondstring doesn't yet have a way to control which flags benchmarks are run with.
- I changed all of the graphs to use a more sensible y-axis. There’s now a 50% window above and below the normalized values.
- Fixed some typos.
- Normalized all benchmark results against those from the first date the benchmark was successfully run (9/1/15 in most cases).
- Set the same x axis for all graphs (9/1/15 - 2/11/16).
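To make the normalization concrete, here is a minimal sketch of what "normalized against the first result in the series" means. The `normalize` helper is hypothetical (it is not secondstring's API); it just divides each measurement by the series' first value so the first point is 1.0.

```python
def normalize(series):
    """Divide every value by the first one, so the series starts at 1.0.

    Hypothetical helper for illustration, not part of secondstring.
    """
    baseline = series[0]
    return [v / baseline for v in series]

# e.g. raw benchmark times in ns for three nightly dates
print(normalize([200.0, 220.0, 180.0]))  # -> [1.0, 1.1, 0.9]
```

A value of 1.1 then reads directly as "10% slower than the first measurement," regardless of the benchmark's absolute timing.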
This data was generated using secondstring, a benchmark tracking tool I’m working on (it’s still pretty rudimentary right now). At the moment it only supports running specified crate versions and git repo commits against a range of nightly compiler dates. It doesn’t have any analysis built-in and just saves some JSON of the results. Soon I’d like to add automatic regression detection, but for now I just wanted to do a little graphing of the data.
Using my completely 100% scientific method of benchmark selection (a combination of finding benchmarks among the most downloaded crates and searching crates.io for "benchmark"), I picked recent commits from ~20 repositories that support running `cargo bench` in their root directory, and ran their benchmarks against nightly compilers from 9/1/15 to 2/11/16. Each repository's benchmarks were pinned to a single commit, so the same code runs across all of the compiler versions. All benchmarks were run on the same machine (i7-6700k) running Arch Linux.
I took the geometric mean of all benchmark functions for each day, and plotted them below. This should (although I’m certainly not a statistician) provide a performance index for each crate that varies independently of the relative benchmark times of functions (i.e. a 10% variation in a 1000ns function should affect the geometric mean about as much as a 10% variation in a 10ns function).
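A quick sketch of why the geometric mean behaves this way (this is an illustrative calculation, not secondstring's code): because the geometric mean multiplies its inputs, scaling any one input by a constant factor moves the mean by the same amount, whether that input is a 10ns or a 1000ns function.

```python
import math

def geometric_mean(values):
    # exp(mean(log x)): equivalent to (x1 * x2 * ... * xn) ** (1/n),
    # computed in log space to avoid overflow with many values
    return math.exp(sum(math.log(v) for v in values) / len(values))

baseline = geometric_mean([1000.0, 10.0])
slow_fn_regressed = geometric_mean([1100.0, 10.0])  # +10% in the slow function
fast_fn_regressed = geometric_mean([1000.0, 11.0])  # +10% in the fast function

# both regressions shift the index by the same factor, sqrt(1.1) ≈ 1.0488
print(slow_fn_regressed / baseline, fast_fn_regressed / baseline)
```

This scale-invariance is exactly the property that makes the geometric mean a reasonable way to combine benchmark functions with very different absolute runtimes (an arithmetic mean would be dominated by the slowest functions).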
I’d be very interested in hearing about a better way to reduce these benchmarks to a single index, as I’d eventually like to include a regression detector in secondstring. I initially tried graphing each bench function separately, but that produces hundreds of series and just isn’t practical.
Since I’m not terribly familiar with geometric means, I don’t know exactly what a given percentage change in the index corresponds to in actual performance. From the reading I’ve done, a 25% change in one of n benchmark functions shifts the index by a factor of 1.25^(1/n) (roughly 12% when n = 2, and less as n grows), and changes in multiple bench functions compound multiplicatively.
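The sensitivity of the index can be checked with a couple of lines (again, just an illustrative calculation): a single benchmark changing by a factor c moves the geometric mean of n benchmarks by c^(1/n).

```python
def index_shift(change, n):
    """Factor by which the geometric mean of n benchmarks moves
    when a single benchmark changes by `change` (0.25 means +25%)."""
    return (1.0 + change) ** (1.0 / n)

for n in (2, 5, 20):
    print(n, index_shift(0.25, n))
# 1.25 ** (1/2) ≈ 1.118 — about a 12% index shift with two functions;
# with 20 functions the same regression moves the index only ~1.1%
```

This suggests that for crates with many benchmark functions, a meaningful regression in one function can be nearly invisible in the combined index, which is worth keeping in mind for the eventual regression detector.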
All of the code for generating the graphics is at the end of this post.