Hi!
I recently read the article about Candystore. As far as I can see, the project cares a lot about performance, so I decided to run some benchmarks with more advanced compiler optimizations.
As I have done many times before, I tested Profile-Guided Optimization (PGO) to improve the library's performance. For reference, results for other projects (including many optimized databases, parsers, compilers, etc.) are available at https://github.com/zamazan4ik/awesome-pgo . Since PGO has helped many other libraries, I applied it to Candystore to see whether it produces a performance win (or loss). Here are my benchmark results.
Test environment
Fedora 40
Linux kernel 6.10.7
AMD Ryzen 9 5900x
48 GiB RAM
SSD Samsung 980 Pro 2 TiB
Compiler: rustc 1.83.0-nightly
Candystore version: main branch on commit 263d385c1bd0ba00a5d813a1fbf789c72baf9ae8
Turbo Boost disabled
Benchmark
For benchmarking, I use the project's built-in benchmark, candy-perf. For PGO, I use the cargo-pgo tool. All runs, both the measurements and the PGO training, use the same command: taskset -c 0 candy_perf.
taskset -c 0 is used for reducing the OS scheduler's influence on the results. All measurements are done on the same machine, with the same background "noise" (as much as I can guarantee).
I also enabled LTO (lto = true for [profile.release] in the root Cargo.toml) for candy-perf, since it lets the compiler perform more aggressive optimizations.
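For readers who want to reproduce the setup, here is a minimal sketch of how the steps above can be chained with cargo-pgo. It is not the exact script used for this report. It assumes cargo-pgo and the llvm-tools-preview rustup component are installed, that lto = true is already set in the root Cargo.toml, and that the binary ends up under target/<triple>/release/ (the TARGET default below is an assumption for an x86-64 Linux host). The run helper and the DRY_RUN switch are hypothetical conveniences: by default the script only prints the commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Prerequisites (not run by this script):
#   cargo install cargo-pgo
#   rustup component add llvm-tools-preview

# Dry-run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# Assumption: host target triple, which determines cargo-pgo's output directory.
TARGET="${TARGET:-x86_64-unknown-linux-gnu}"
BIN="target/${TARGET}/release/candy_perf"

# Hypothetical helper: echo a command, and execute it unless in dry-run mode.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

pgo_workflow() {
  # 1. Build the instrumented binary.
  run cargo pgo build
  # 2. Training run: the instrumented binary writes the profile data.
  run taskset -c 0 "$BIN"
  # 3. Rebuild using the collected profiles.
  run cargo pgo optimize
  # 4. Measure the PGO-optimized binary, pinned to the same core.
  run taskset -c 0 "$BIN"
}

pgo_workflow
```

Running the training step and the final measurement with the same pinned command mirrors the setup described above; for a real application, the training workload should resemble production usage.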
Results
According to the results, PGO improves Candystore's performance in many cases.
I also took quick measurements of the candy-perf binary size:
Release: 895 KiB
LTO: 719 KiB
LTO + PGO optimized: 733 KiB
LTO + PGO instrumented: 1.5 MiB
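To put these sizes in perspective, the relative changes can be computed with a small helper. This is only an illustrative sketch: pct_change is a hypothetical function name, and the instrumented size is taken as 1536 KiB, which assumes the rounded 1.5 MiB figure is exact.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the relative change from a baseline size to a new size, as a signed percentage.
pct_change() {
  awk -v base="$1" -v new="$2" \
    'BEGIN { printf "%+.1f%%\n", (new - base) / base * 100 }'
}

# Sizes in KiB, taken from the list above.
pct_change 895 719    # LTO vs. Release: -19.7%
pct_change 895 733    # LTO + PGO optimized vs. Release: -18.1%
pct_change 719 733    # LTO + PGO optimized vs. LTO: +1.9%
pct_change 895 1536   # LTO + PGO instrumented vs. Release (approximate): +71.6%
```

In other words, the PGO-optimized binary is slightly larger than the LTO-only build but still noticeably smaller than the plain Release build; the large instrumented binary is only needed during the training phase.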
Further steps
I understand that the steps above can be time-consuming and hard to implement in practice. At the very least, the library's users can find this performance report and decide to enable PGO for their applications if they care about Candystore's performance in their workloads. Maybe a small note somewhere in the documentation will be enough to raise awareness about this work.
Thank you.
P.S. I created an Issue only because Discussions are disabled for the repo. Please don't treat it as a bug report; it's more of an improvement idea.
Thanks @zamazan4ik for this thorough review. I will link to it from the README.
Btw, I would never say no to more performance, but in the end the operations are dominated by a syscall and potential disk IO, which would outweigh any optimizations the compiler/linker may perform.
You are right: PGO optimizes only the CPU part of a workload. IO won't magically go away, but the CPU part still improves, and the benchmarks above show performance gains even from optimizing "only" that part.