Excessive lock contention #50
When thinking about optimizations, I think it's important to have a specific use case in mind. We intend to be a general purpose networking library, so I don't want to have very bad perf for an important use case. But right now lock contention hasn't been an issue in practice. We have had at least one user of this lib use a process model with a high number of threads and connections all within the same process. They were initially concerned about the global lock, as well, but we never did find any actual perf problems. It was an intentional design to keep the multi-threading design as simple as possible, and I'm hesitant to deviate from that design without a specific customer with a real world use case that is experiencing perf problems. Then we can look at more granular locking strategies.

At Valve, our approach is to use multi-process when we want to host thousands of players on a single box, and use a relatively small number of threads per process. This is how the SDR relays work -- they are single-threaded -- and this is also how most of the different types of Steam servers are designed to scale to fully utilize the CPU power of a box. Multiple threads in the same process is just much more complicated. I personally think multi-process is the better way to skin the cat, but I also know that it doesn't work everywhere.

Also, this is not visible in the opensource code, but the relayed connections involve a lot more sharing of global data structures. E.g. two different connections might share a connection to one relay, but each have a separate session with two other relays, and the information about ping times used for route selection is all global. More granular locking would be very complicated.
One question: what is the maximum number of players/messages per second a single dedicated server can now host for Valve games (determined via simulations or real-world experience)?
We don't really have any use cases that approach the limits for our own games. We had a partner use the lib with a multi-threaded server, with hundreds of connections, and that was the closest we got to having problems. I think a use case of 100s of connections to a server is within the design scope of this library. 1000 is probably not. (Depending on the nature of the connections, of course.)

For 100s of connections, my expectation is that lock contention will not be an issue unless the application code is designed to be accessing network functionality from many different threads at once. If the application only accesses the networking API from 1 or 2 threads, I don't think that there will be high contention, even if the number of connections is relatively high. To be specific, I would expect a battle royale server that doesn't access networking from many different threads to not have issues with lock contention. The APIs that receive multiple messages at once (and send -- if that API is added) could reduce the lock contention a lot, too, I suspect. Obviously these expectations could be wrong.
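The batched-receive point above can be sketched with a toy queue. The types and names here are hypothetical, not the library's actual API; the idea is just that draining up to N messages costs one lock acquisition instead of one per message:

```cpp
#include <deque>
#include <mutex>
#include <vector>

// Toy message queue (payloads reduced to ints). The single mutex stands
// in for the library's global lock.
struct MessageQueue {
    std::mutex m_lock;
    std::deque<int> m_messages;

    void Push(int msg) {
        std::lock_guard<std::mutex> guard(m_lock);
        m_messages.push_back(msg);
    }

    // Batched receive: one lock acquisition drains up to nMax messages,
    // instead of re-acquiring the lock once per message.
    int ReceiveMessages(std::vector<int> &out, int nMax) {
        std::lock_guard<std::mutex> guard(m_lock);
        int n = 0;
        while (n < nMax && !m_messages.empty()) {
            out.push_back(m_messages.front());
            m_messages.pop_front();
            ++n;
        }
        return n;
    }
};
```

Under contention, the batched form shrinks the number of lock handoffs between the API thread and the background thread roughly by the batch size.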
@fletcherdvalve Do I understand correctly that lock synchronizer in
That's right.
Ah, so this explains why my simulation fails at 70 concurrent connections within the same process... 😿
Is there any read-only functionality that could potentially benefit from an SRW lock instead of the mutex, in cases where read operations exceed write operations?
Yeah, for sure.
@fletcherdvalve Another thing that came to my mind: wouldn't the scoped lock in

Actually, this applies to all functions with the scoped lock that are frequently invoked. I believe that some sort of task-fair or phase-fair RW lock should be used there, at least to avoid starvation.
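For illustration of the starvation concern, here is a minimal writer-preferring RW lock built on a condition variable -- a much simpler relative of the task-fair/phase-fair locks mentioned above, not a real implementation of them. Once a writer is waiting, new readers queue behind it, so a steady stream of readers cannot starve the writer indefinitely:

```cpp
#include <condition_variable>
#include <mutex>

// Sketch only: writer-preferring RW lock. All names are illustrative.
class WriterPreferringRWLock {
    std::mutex m_mutex;
    std::condition_variable m_cv;
    int m_activeReaders = 0;
    int m_waitingWriters = 0;
    bool m_writerActive = false;

public:
    void LockRead() {
        std::unique_lock<std::mutex> lk(m_mutex);
        // Readers also wait while a writer is merely *waiting*; this is
        // what prevents writer starvation.
        m_cv.wait(lk, [&] { return !m_writerActive && m_waitingWriters == 0; });
        ++m_activeReaders;
    }
    void UnlockRead() {
        std::unique_lock<std::mutex> lk(m_mutex);
        if (--m_activeReaders == 0)
            m_cv.notify_all();
    }
    void LockWrite() {
        std::unique_lock<std::mutex> lk(m_mutex);
        ++m_waitingWriters;
        m_cv.wait(lk, [&] { return !m_writerActive && m_activeReaders == 0; });
        --m_waitingWriters;
        m_writerActive = true;
    }
    void UnlockWrite() {
        std::unique_lock<std::mutex> lk(m_mutex);
        m_writerActive = false;
        m_cv.notify_all();
    }
};
```

The trade-off: strict writer preference can hurt read throughput under frequent writes, which is why the phase-fair designs alternate reader and writer phases instead.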
This is used to poll many connections in a single function call. Previously, this was only possible if all of the connections were those accepted on the same listen socket (ReceiveMessagesOnListenSocket). But this left out at least two important use cases with known users:

- If you create more than one listen socket (because there is more than one way to contact your service, e.g. one for P2P, another for direct IP, and another for relayed connections), then you could not poll all of the connections efficiently.
- In P2P use cases, we may initiate many connections to peers, and we want to poll all of them at once.

This change is relevant to issue #49, issue #50, and issue #52. (But I don't think it really "fixes" any of them.)
This is in response to issue #50. I have a partner who wants to make sure that all calls from their "main" thread never stall waiting on the global lock, even for a few ms, and has documented some times when some stalls were visible due to contention. Using tons of locks to cover lots of different objects is really scary. A few high level strategies were used to make this possible:

- Document what locks protect what data. Sprinkle asserts liberally when accessing that data to ensure that the appropriate lock is held.
- Restrict lock acquisition patterns so that deadlocks are impossible, and add logic to detect violations of the expected lock pattern.
- Use "leaf" locks smartly. A leaf lock is one that only protects a relatively small piece of data, and for which the usage pattern is "lock, access data quickly, unlock". No other locks are allowed to be taken while a leaf lock is held, and so we don't have to worry about deadlocks.
- Major state changes, object creation and destruction, and other operations that are likely to touch multiple objects are still protected by the global lock. Some API calls still take the global lock. The most common API calls we expect to happen per frame and need to be fast, however, no longer take the global lock.
- Use scope lock objects, not only to take advantage of RAII unlocking, but also to clearly communicate lock ownership semantics between caller and callee.

Here are all the locks:

- Global lock. All background processing, and most "major" operations like creating and destroying objects, is done while holding this lock.
- Connection and poll group locks. You must hold this lock while queuing messages or pulling them off the queue.
- Handle table lock. The important API calls that need to be optimized begin by looking up a handle in a table and getting an object. We have a lock to protect these tables. (We use the same lock for both connection and poll group tables.) The access pattern for this lock is the most unique -- see the code for details.
- Thinker lock. Think times, and the PQ of thinkers, cannot be modified without holding this lock. This lock also has a relatively unique access pattern.
- A single global recv message queue lock. Received messages waiting to be fetched by the API are complicated because they may exist in multiple queues at the same time. Instead of trying to have a lock for each queue and dealing with safely acquiring multiple locks when necessary, we just use a single "leaf" lock to protect *all* queues. This is OK because the lock is only held very briefly.
- Misc other "leaf" locks.

If you are only using UDP connections, with each connection having its own UDP socket, this may seem like overkill. But the relationship between connection and UDP socket is most definitely not one-to-one in the case of SDR, where many connections utilize the same data structures. One connection may use one or more sessions to relays, and the server state data may be shared between multiple connections. These server data may use different UDP sockets, or the sockets may be shared. Basically, each connection doesn't simply have all its own data in such a way that it can be easily siloed.

There are still a few FIXMEs tagged in the code that I know need to be fixed, but I confirmed basic functionality works in the simple UDP case, as well as the much more complicated SDR case. So this seems like a good time to check it into a branch and get some broader testing on it.
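The "leaf lock" discipline described in that commit message can be sketched in a few lines. This is a simplified illustration, not the library's actual code (the real change tracks richer lock-order metadata): a thread-local counter asserts, in debug builds, that no leaf lock is already held when another leaf lock is acquired, which is the leaf-under-leaf case of the "no locks under a leaf lock" rule.

```cpp
#include <cassert>
#include <mutex>

// Per-thread count of leaf locks currently held (sketch; the real code
// would also instrument the global and per-object locks).
thread_local int t_leafLockDepth = 0;

class LeafLock {
    std::mutex m_mutex;

public:
    void Lock() {
        // Catches leaf-under-leaf violations of the lock discipline.
        assert(t_leafLockDepth == 0 && "no lock may be taken under a leaf lock");
        m_mutex.lock();
        ++t_leafLockDepth;
    }
    void Unlock() {
        --t_leafLockDepth;
        m_mutex.unlock();
    }
};

// Scoped RAII holder, in the spirit of the scope lock objects the commit
// describes: ownership is explicit, and the unlock cannot be forgotten
// on an early return.
class ScopedLeafLock {
    LeafLock &m_lock;

public:
    explicit ScopedLeafLock(LeafLock &l) : m_lock(l) { m_lock.Lock(); }
    ~ScopedLeafLock() { m_lock.Unlock(); }
    ScopedLeafLock(const ScopedLeafLock &) = delete;
    ScopedLeafLock &operator=(const ScopedLeafLock &) = delete;
};
```

Because a leaf lock can never be the outer lock in a nested acquisition, it can never participate in a lock-order cycle, which is why leaf locks are exempt from deadlock analysis.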
@nxrighthere I've got a change to implement fine-grained locking. I'd be very interested, if you try it out, to hear how performance behaves, or whether you hit any bugs. Let me know if you have a chance to look at this! https://github.com/ValveSoftware/GameNetworkingSockets/tree/fine-grained-locking
Hey @nxrighthere, just checking in to see if you think you might have any time to re-run your tests with the threading improvements. I am going to be testing these changes using Dota and CSGO, and then eventually the Steamworks SDK, and maybe one day Bungie will ship it, too, in Destiny. But having your test case, which is very different, would be really great, especially since your use case hammers the problem very directly and is a big stress test. We never really had issues with lock contention before in CSGO or Dota, but those apps only make networking calls from one (maybe 2) threads, so it isn't really a stress test. Let me know if you think you might end up re-testing this sometime. It'll help me set my expectations. Thanks again for your contributions!
I can't find this test on my machine, sadly. But I know guys who did something similar before, so I'll ask them if they can do some tests.
Sorry, it looks like every time I rebase and force push the fine-grained locking branch, it adds more notes to this issue. I'll see if there's a better way to do this. |
We've been using this on Dota internally for a while with no issues. I think it's stable enough to merge into master. I'll keep this bug open for a while longer. If somebody has a chance to really stress test the new code, please comment here. |
@fletcherdvalve, are you happy with how things look here? Looks like this issue has fallen into limbo pending feedback. |
Let's keep it open. Dota is going to ship with the fine-grained-locking fix soon. If that doesn't show any problems, CSGO will be soon thereafter. |
OK, well I'd love it if @nxrighthere was able to run some of his tests again (since he reported this issue). But, we have been doing some work sending a bunch of messages on different threads in the dota server, and this change has made a big difference in perf. So, it isn't crashing, and it's made Dota server perf quite a bit better. I think that's good enough to call this one closed. |
BTW, the latest steam client beta includes the new fine grained locking code https://steamcommunity.com/groups/SteamClientBeta/announcements/detail/3032590367259099679 (It's not mentioned in the patch notes.) |
Right now, thread safety is achieved via an excessive number of locks, which leads to performance degradation and kills scalability. While locks aren't slow, lock contention is. A practical solution: remove any shared state/data and re-design the core around inter-thread message-based communication without any contention. This approach scales perfectly for traditional networking on top of UDP sockets; Disruptor works best for this.
You can find additional information regarding multi-threading here.