
Auto tuning combined bandwidth

Hüseyin Tuğrul BÜYÜKIŞIK edited this page Feb 8, 2021 · 4 revisions

The "PcieBandwidthBenchmarker.h" header provides the PcieBandwidthBenchmarker class, which benchmarks each physical card in the system and computes relative multiplier constants that maximize the combined bandwidth when the cards back a virtual array.

The user chooses how much data each card transfers during the benchmark, which sets the precision of the measurement. Values that are too high cause buffer read/write errors, because the benchmark also tests the main card that serves video output for the OS. The development computer's 2 GB main card had about 1400 MB of room for benchmarking, but 128 MB was enough for the tests and finished sooner.

Usage:

#include "PcieBandwidthBenchmarker.h"
#include <vector>

// allow the benchmark to use 128 MB per card
PcieBandwidthBenchmarker bench(128);

// minimum number of data channels (virtual GPUs) per physical GPU;
// 2 in this example
std::vector<int> multipliers = bench.bestBandwidth(2);

On the development machine the output is {3,4,2}. The slowest connection is on the third PCIe bridge, so that card receives the requested minimum of 2, and the other cards are scaled in proportion to their own data-copy performance. A larger minimum keeps the same ratio:

std::vector<int> multipliers = bench.bestBandwidth(10); // multipliers = {15,20,10}
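The two results above fit a simple proportional rule: the slowest card gets the minimum, and every other card gets the minimum times its bandwidth relative to the slowest. The sketch below illustrates that rule only. `scaleMultipliers` is a hypothetical helper, not part of PcieBandwidthBenchmarker, and the bandwidth figures in the usage note are made-up values chosen to match the 3:4:2 ratio; the library's actual computation may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper (not the library's implementation): turns measured
// per-card bandwidths into integer multipliers so that the slowest card
// gets exactly minChannels and the others scale proportionally.
std::vector<int> scaleMultipliers(const std::vector<double>& bandwidths,
                                  int minChannels)
{
    std::vector<int> result;
    if (bandwidths.empty())
    {
        return result;
    }

    const double slowest =
        *std::min_element(bandwidths.begin(), bandwidths.end());
    if (slowest <= 0.0)
    {
        return result; // assumption: every measurement must be positive
    }

    for (double bw : bandwidths)
    {
        // round to the nearest whole number of channels
        result.push_back(
            static_cast<int>(std::lround(bw / slowest * minChannels)));
    }
    return result;
}
```

With assumed bandwidths of 3.0, 4.0 and 2.0 (in any common unit), `scaleMultipliers({3.0, 4.0, 2.0}, 2)` gives {3,4,2} and `scaleMultipliers({3.0, 4.0, 2.0}, 10)` gives {15,20,10}.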

The returned vector can be passed directly to the constructor of VirtualMultiArray:

VirtualMultiArray<Obj> data(..,..,..,..,bench.bestBandwidth(2));

The data array then uses the specified bandwidth ratios, which map well to the communication performance of the physical cards when element accesses are sufficiently concurrent.