Profile-Guided Optimization (PGO) benchmark report #7

zamazan4ik · 2024-09-06T08:06:15Z

Hi!

Recently I read the article about Candystore. As far as I can see, the project cares a lot about performance, so I decided to run some benchmarks with more advanced compiler optimizations.

As I have done many times before, I decided to test Profile-Guided Optimization (PGO) to improve the library's performance. For reference, results for other projects (including many optimized databases, parsers, compilers, etc.) are available at https://github.com/zamazan4ik/awesome-pgo . Since PGO has helped many other libraries, I applied it to Candystore to see whether it yields a performance win (or loss). Here are my benchmark results.

Test environment

Fedora 40
Linux kernel 6.10.7
AMD Ryzen 9 5900X
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler - Rustc 1.83.0-nightly
Candystore version: main branch on commit 263d385c1bd0ba00a5d813a1fbf789c72baf9ae8
Disabled Turbo boost

Benchmark

For benchmarking, I use the project's built-in benchmark, candy-perf. For PGO, I use the cargo-pgo tool. All measurements, both the benchmark runs and the PGO training run, use the same command: taskset -c 0 candy_perf.

taskset -c 0 is used for reducing the OS scheduler's influence on the results. All measurements are done on the same machine, with the same background "noise" (as much as I can guarantee).
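
The workflow described above can be sketched as a small shell script. This is a hedged sketch, not the exact commands used for this report: cargo pgo build and cargo pgo optimize are the cargo-pgo subcommands for the instrumented and the optimized build, while the BIN path (including the target triple), the run helper and the DRY_RUN switch are assumptions added for illustration. By default the script only prints the steps.

```shell
#!/bin/sh
# Sketch of a PGO workflow with cargo-pgo; prints the steps unless DRY_RUN=0.
set -e

# Assumed location of the benchmark binary: cargo-pgo builds with an explicit
# target triple, so artifacts land under target/<triple>/release/.
BIN="${BIN:-target/x86_64-unknown-linux-gnu/release/candy_perf}"

# Hypothetical helper: echo the command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

pgo_workflow() {
  run cargo pgo build        # 1. build the instrumented binary
  run taskset -c 0 "$BIN"    # 2. training run, pinned to CPU 0
  run cargo pgo optimize     # 3. rebuild using the collected profiles
  run taskset -c 0 "$BIN"    # 4. measure the PGO-optimized binary
}

pgo_workflow
```

Running it with DRY_RUN=0 executes the steps for real; that requires cargo-pgo to be installed and a BIN path that matches your target triple and binary name.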

I also enabled LTO for candy-perf (lto = true for [profile.release] in the root Cargo.toml), since it can help the compiler perform more aggressive optimizations.
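
For reference, the LTO setting mentioned above corresponds to a profile entry like the following in the root Cargo.toml. This is a minimal fragment; any other keys the project may already set in that profile are not shown.

```toml
# Root Cargo.toml: enable link-time optimization for release builds.
[profile.release]
lto = true
```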

Results

I got the following results:

Release: https://gist.github.com/zamazan4ik/3c48e2876d2808eb7fc92a5ff0ba745c
LTO: https://gist.github.com/zamazan4ik/7fdcd2b05498fb32eed90b26d360193d
LTO + PGO optimized compared to Release: https://gist.github.com/zamazan4ik/26e29e87fe5b94621039ee9369c43152
(just for reference) LTO + PGO instrumented compared to Release: https://gist.github.com/zamazan4ik/444d90bf4ba915f098cae7ab333ed2ad

According to the results, PGO improves Candystore's performance in many cases.

I also took quick measurements of the candy-perf binary size:

Release: 895 KiB
LTO: 719 KiB
LTO + PGO optimized: 733 KiB
LTO + PGO instrumented: 1.5 MiB
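
To put these sizes in relative terms, the change versus the Release build can be computed with a small helper. This is only an illustrative sketch: the pct_change function is hypothetical, and the instrumented size is taken as 1536 KiB, which assumes the rounded 1.5 MiB figure is exact.

```shell
# Relative size change versus a baseline, in percent (one decimal place).
pct_change() {
  awk -v base="$1" -v size="$2" \
    'BEGIN { printf "%.1f\n", (size - base) / base * 100 }'
}

pct_change 895 719    # LTO
pct_change 895 733    # LTO + PGO optimized
pct_change 895 1536   # LTO + PGO instrumented (1.5 MiB taken as 1536 KiB)
```

With the figures above this prints -19.7, -18.1 and 71.6: LTO shrinks the binary by about a fifth, PGO adds a little back, and the instrumented build is much larger, as expected for a binary carrying profiling counters.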

Further steps

I understand that the steps above can be time-consuming and hard to implement in practice. At the very least, the library's users can find this performance report and decide to enable PGO for their applications if they care about Candystore's performance in their workloads. Maybe a small note somewhere in the documentation will be enough to raise awareness about this work.

Thank you.

P.S. I created this as an issue because Discussions are disabled for the repo. Please don't treat it as a bug report; it's more of an improvement idea.

tomerfiliba · 2024-09-06T16:16:14Z

Thanks @zamazan4ik for this thorough review. I will link to it from the README.

Btw, I would never say no to more performance, but in the end the operations are dominated by a syscall and potential disk I/O, which would outweigh any optimizations the compiler/linker may perform.

zamazan4ik · 2024-09-07T12:29:21Z

> Btw, I would never say no to more performance, but in the end the operations are dominated by a syscall and potential disk I/O, which would outweigh any optimizations the compiler/linker may perform.

You are right: PGO optimizes only the CPU part of a workload. I/O won't magically go away, but the CPU part still improves, and the benchmarks above show some performance gains even from optimizing "only" the CPU part.

tomerfiliba closed this as completed Dec 10, 2024
