
Conversation

@AadityaRavindran (Contributor)

Running the metric-sending path through TSan reveals data races when we try to clear the buffer. Adding a mutex lock makes it thread-safe.

@kevinkreiser (Collaborator)

i mentioned this over in an open issue about this point: #12 (comment)

tl;dr: the client is fairly lightweight, so you can simply use a separate client for every thread.

@AadityaRavindran (Contributor, Author) commented Apr 17, 2025

> i mentioned this over in an open issue about this point: #12 (comment)
>
> tl;dr: the client is fairly lightweight, so you can simply use a separate client for every thread.

Just curious then why m_buffer was made mutable and memory is reserved in construction, instead of just a local buffer variable.

// Avoid re-allocations by reserving a generous buffer
m_buffer.reserve(256);

Would a nice middle-ground be a way to turn on thread safety if someone needs it? Or the alternative could be to move m_buffer to a local variable in StatsdClient::send instead?

@kevinkreiser (Collaborator)

Everything was done with performance in mind. Doing a reservation upfront and allowing us to modify the buffer on send lets us reuse that chunk of memory and avoid de/reallocations. We want the client to be as fast as possible, since it's basically a logger that pushes its logs over the network. No one wants a logger they have to worry about perf-wise; everyone assumes it's basically free. So yeah, that, at least in my opinion, was a design goal.

@kevinkreiser (Collaborator)

Yeah, we totally could have opt-in thread safety. We could also just perf-test this and see if modern compilers are smart enough to know when it's used in a single-threaded context.

@AadityaRavindran (Contributor, Author)

Alright, I did some benchmarking. tl;dr: using a local buffer is much better than std::mutex, and adds little overhead over a mutable class variable while providing thread safety.

This is the test script:

```cpp
#include <chrono>
#include <cstdlib>  // std::rand, std::srand
#include <ctime>    // std::time
#include <iostream>
#include <string>
#include <thread>
#include <vector>
#include "cpp-statsd-client/StatsdClient.hpp"

const std::string host = "127.0.0.1";
const int port = 8125;
Statsd::StatsdClient statsd(host, port, "test");

void sendMetrics(int thread_id, int iteration_count) {
    for (int i = 0; i < iteration_count; ++i) {
        int value = std::rand() % 100000;
        statsd.gauge("test-metric", value, 1.0);
        // std::cout << "Thread " << thread_id << " sent metric: test-metric = " << value << std::endl;
    }
}

int main(int argc, char* argv[]) {
    if (argc < 3) {
        std::cerr << "usage: " << argv[0] << " <num_threads> <iteration_count>" << std::endl;
        return 1;
    }
    int num_threads = std::stoi(argv[1]);
    int iteration_count = std::stoi(argv[2]);
    std::srand(static_cast<unsigned int>(std::time(nullptr)));
    std::vector<std::thread> threads;
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back(sendMetrics, i, iteration_count);
    }
    for (auto& thread : threads) {
        thread.join();
    }
    auto end = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double, std::milli> duration = end - start;
    std::cout << "Total metrics sent: " << iteration_count * num_threads << std::endl;
    std::cout << "Time taken: " << duration.count() << " ms" << std::endl;
    std::cout << "Metrics sent per second: " << (iteration_count * num_threads / (duration.count() / 1000.0)) << std::endl;
    return 0;
}
```

Here are the results:
Test 1: no mutex, 1 thread

user@group:~$ g++ -o statsd_example statsd_example.cpp -I./cpp-statsd-client/ -lpthread && ./statsd_example 1 1000000
Total metrics sent: 1000000
Time taken: 3673.82 ms
Metrics sent per second: 272196

Test 2: no mutex, 10 threads (causes data races leading to stat-parsing errors on the server, e.g. `Splitting ':', unable to parse metric: test.test-metric818100|g`)

user@group:~$ g++ -o statsd_example statsd_example.cpp -I./cpp-statsd-client/ -lpthread && ./statsd_example 10 100000
Total metrics sent: 1000000
Time taken: 863.359 ms
Metrics sent per second: 1.15827e+06

Test 3: std::mutex, 1 thread

user@group:~$ g++ -o statsd_example statsd_example.cpp -I./cpp-statsd-client/ -lpthread && ./statsd_example 1 1000000
Total metrics sent: 1000000
Time taken: 3778.6 ms
Metrics sent per second: 264649

Test 4: std::mutex, 10 threads (no data races, obviously)

user@group:~$ g++ -o statsd_example statsd_example.cpp -I./cpp-statsd-client/ -lpthread && ./statsd_example 10 100000
Total metrics sent: 1000000
Time taken: 6998.56 ms
Metrics sent per second: 142887

Test 5: Local string buffer, 1 thread

user@group:~$ g++ -o statsd_example statsd_example.cpp -I./cpp-statsd-client/ -lpthread && ./statsd_example 1 1000000
Total metrics sent: 1000000
Time taken: 4002.7 ms
Metrics sent per second: 249831

Test 6: Local string buffer, 10 threads (no data races)

user@group:~$ g++ -o statsd_example statsd_example.cpp -I./cpp-statsd-client/ -lpthread && ./statsd_example 10 100000
Total metrics sent: 1000000
Time taken: 830.395 ms
Metrics sent per second: 1.20425e+06

@AadityaRavindran (Contributor, Author)

And if you instantiated a StatsdClient for every send (with the mutable std::string m_buffer), it would be roughly 10x slower:

user@group:~$ g++ -o statsd_example statsd_example.cpp -I./cpp-statsd-client/ -lpthread && ./statsd_example 1 1000000
Total metrics sent: 1000000
Time taken: 28197.7 ms
Metrics sent per second: 35463.9
user@group:~$ g++ -o statsd_example statsd_example.cpp -I./cpp-statsd-client/ -lpthread && ./statsd_example 10 100000
Total metrics sent: 1000000
Time taken: 4872.05 ms
Metrics sent per second: 205252

@AadityaRavindran (Contributor, Author)

Note that all of this testing was done with a Telegraf server running.

Comment on lines 289 to 290:

```cpp
std::string buffer;
buffer.reserve(256);
```

@kevinkreiser (Collaborator)

I still can't believe this would be faster than keeping the buffer around! Allocating once in the constructor and then using clear() (which is O(1)) should always be faster than this.

Maybe we could just make a static thread_local buffer to achieve the same result and avoid the allocations/deallocations? Something like:

Suggested change:

```diff
-std::string buffer;
-buffer.reserve(256);
+// the thread keeps this buffer around and reuses it; clear() is O(1) and
+// reserve() only has to allocate the first time, after that it's a no-op
+static thread_local std::string buffer;
+buffer.clear();
+buffer.reserve(256);
```

@AadityaRavindran (Contributor, Author)

It isn't, though. Single-threaded performance of allocating once in the constructor is 3673.82 ms, whereas local allocation is 4002.7 ms. And we can't trust the multi-threaded performance of allocating in the constructor because it does the wrong thing.

@AadityaRavindran (Contributor, Author)

But yeah, static thread_local is a lot better:

user@group:~$ g++ -o statsd_example statsd_example.cpp -I./cpp-statsd-client/ -lpthread && ./statsd_example 1 1000000
Total metrics sent: 1000000
Time taken: 2750.9 ms
Metrics sent per second: 363517
user@group:~$ g++ -o statsd_example statsd_example.cpp -I./cpp-statsd-client/ -lpthread && ./statsd_example 10 100000
Total metrics sent: 1000000
Time taken: 547.708 ms
Metrics sent per second: 1.82579e+06

@kevinkreiser (Collaborator)

> And we can't trust the multi-threaded performance of allocating in the constructor because it does the wrong thing.

Yeah, I wasn't concerned about multithreaded perf because testing thread-unsafe code with threads makes no sense 😄

@kevinkreiser (Collaborator)

@AadityaRavindran can you merge master into this branch?

@kevinkreiser (Collaborator) left a review comment:

lgtm thanks for trying my suggestion!

@kevinkreiser kevinkreiser merged commit 6ab917a into vthiery:master Apr 21, 2025
4 checks passed
@AadityaRavindran (Contributor, Author)

Thank you for the quick responses!
