Profile Guided Optimization (PGO)
Profile-guided Optimization is a compiler optimization technique that involves collecting typical execution data, including possible branches, during program execution. This collected data is then used to optimize various aspects of the code, such as inlining, conditional branches, machine code layout, and register allocation.
In our tests, PGO improves Databend performance by up to 10%. The actual benefit depends on how closely the profiled workload matches your typical workload, so your results may be better or worse.
For more information about PGO in Databend, see Issue #9387 | Feature: add PGO support.
Prerequisites
Before you build Databend with PGO, make sure the following requirements have been met:
- Install the PGO helper binary by adding the llvm-tools-preview component to your toolchain with rustup, then install cargo-pgo:
$ rustup component add llvm-tools-preview
$ cargo install cargo-pgo
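Before moving on, it can help to confirm that the tools are actually visible on your PATH. The following is a minimal sketch, not part of the Databend tooling: the helper names have and check_pgo_prereqs are made up for illustration, and it assumes cargo installed the cargo-pgo binary into a directory that is on PATH (typically ~/.cargo/bin). The final cargo pgo info call is cargo-pgo's own environment check.

```shell
#!/bin/sh
# Return success if the named command is found on PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Print one line per missing tool; the return status is the number missing.
check_pgo_prereqs() {
    missing=0
    for tool in rustup cargo cargo-pgo; do
        if ! have "$tool"; then
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

if check_pgo_prereqs; then
    # Let cargo-pgo report whether it can find the LLVM tools it needs.
    cargo pgo info
else
    echo "install the missing tools before building with PGO"
fi
```

If cargo-pgo is reported as missing right after installation, check that ~/.cargo/bin is on your PATH.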
Building Databend with PGO
Follow the steps below to build Databend:
- Download the source code.
git clone https://github.com/datafuselabs/databend.git
- Install dependencies and compile the source code.
cd databend
make setup -d
export PATH=$PATH:~/.cargo/bin
- Build Databend with cargo-pgo. Due to a known issue with PyO3, we need to skip bendpy during the build.
cargo pgo build -- --workspace --exclude bendpy
- Import the dataset and run a typical query workload.
# Run Databend in standalone mode, or you can try running it in cluster mode.
BUILD_PROFILE=<target-tuple>/release ./scripts/ci/deploy/databend-query-standalone.sh
# Databend's SQL logic tests are used here as a demonstration; replace them with a workload that matches your target use case.
ulimit -n 10000; ulimit -s 16384; cargo run -p databend-sqllogictests --release -- --enable_sandbox --parallel 16 --no-fail-fast
# Stop the running Databend instances so that the profile data is written out.
killall databend-query
killall databend-meta
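After stopping the instances, you may want to confirm that raw profiles were actually written before running the optimized build. The sketch below counts .profraw files; it assumes the profiles land in ./target/pgo-profiles (the directory used in the LLVM_PROFILE_FILE example in this guide), and count_profraw and PROFILE_DIR are illustrative names rather than anything provided by Databend or cargo-pgo.

```shell
#!/bin/sh
# Count .profraw files under a directory (prints 0 if the directory is missing).
count_profraw() {
    find "$1" -type f -name '*.profraw' 2>/dev/null | wc -l | tr -d ' '
}

# Assumed location of the raw profiles; override PROFILE_DIR if yours differs.
PROFILE_DIR="${PROFILE_DIR:-./target/pgo-profiles}"
n=$(count_profraw "$PROFILE_DIR")
if [ "$n" -gt 0 ]; then
    echo "found $n profile file(s) in $PROFILE_DIR"
else
    echo "no .profraw files in $PROFILE_DIR; check that the instrumented binaries ran and exited"
fi
```

Run it from the root of the databend checkout so that the relative path resolves correctly.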
Tip
- Check the target triple of your current build and replace <target-tuple> above with it, for example x86_64-unknown-linux-gnu.
- For more precise profiles, run the workload with the following environment variable set: LLVM_PROFILE_FILE=./target/pgo-profiles/%m_%p.profraw
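If you are unsure which target triple applies, one way to find it is to read the host field printed by rustc -vV. This is a sketch under the assumption that you are building for the host platform rather than cross-compiling; host_triple_from and TARGET_TUPLE are illustrative names.

```shell
#!/bin/sh
# Extract the value of the "host:" line from `rustc -vV` output read on stdin.
host_triple_from() {
    sed -n 's/^host: //p'
}

if command -v rustc >/dev/null 2>&1; then
    TARGET_TUPLE=$(rustc -vV | host_triple_from)
    # Print the value to use for BUILD_PROFILE in the deploy script invocation.
    echo "BUILD_PROFILE=${TARGET_TUPLE}/release"
else
    echo "rustc not found on PATH"
fi
```

The printed value can be used in place of <target-tuple>/release when starting Databend with the deploy script.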
- Build the optimized Databend binaries using the collected profiles.
cargo pgo optimize -- --workspace --exclude bendpy
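The three PGO phases above (instrumented build, workload run, optimized build) can be tied together in one script. The following is only a sketch: it runs in dry-run mode by default and simply prints the commands, and ./run_workload.sh is a hypothetical placeholder for whatever starts Databend, drives your workload, and stops Databend again. The run and pgo_pipeline helpers and the DRY_RUN variable are likewise illustrative names.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
# Set DRY_RUN=0 to execute the steps from the root of the databend checkout.
DRY_RUN="${DRY_RUN:-1}"

# Echo the command, and execute it only when dry-run mode is disabled.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || exit 1
    fi
}

pgo_pipeline() {
    # 1. Instrumented build (bendpy is excluded because of the PyO3 issue).
    run cargo pgo build -- --workspace --exclude bendpy
    # 2. Hypothetical placeholder: start Databend, run the workload, stop Databend.
    run ./run_workload.sh
    # 3. Rebuild using the collected profiles.
    run cargo pgo optimize -- --workspace --exclude bendpy
}

pgo_pipeline
```

Keeping the script in dry-run mode first makes it easy to review the exact commands before committing to a full instrumented and optimized build, both of which take a long time on a workspace of this size.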