Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

receive/handler: do not double lock #7124

Merged
merged 1 commit into from
Feb 9, 2024
Merged

receive/handler: do not double lock #7124

merged 1 commit into from
Feb 9, 2024

Conversation

GiedriusS
Copy link
Member

markPeerUnavailable was always taking a lock and in one case we were calling it with a lock already taken. Fix this.

markPeerUnavailable was always taking a lock and in one case we were
calling it with a lock already taken. Fix this.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Copy link
Contributor

@douglascamata douglascamata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we solve this in a better way? The existence of the two type of functions (with lock and without lock) are confusing. I would love to have all the locking contained in the functions that need it.

Maybe we should explicitly release the lock after trying to pull the connection from p.connections? The code just above using RLock does this.

Maybe we don't even need to try to get it twice (first with RLock and then later with the normal Lock)?

@GiedriusS
Copy link
Member Author

GiedriusS commented Feb 9, 2024

Can we solve this in a better way? The existence of the two type of functions (with lock and without lock) are confusing. I would love to have all the locking contained in the functions that need it.

Maybe we should explicitly release the lock after trying to pull the connection from p.connections? The code just above using RLock does this.

Maybe we don't even need to try to get it twice (first with RLock and then later with the normal Lock)?

It's not possible to "upgrade" a lock i.e. Lock() while RLock()ing: golang/go#38859. RUnlock()ing and Lock()ing is not what we want because in the background someone could try to establish another connection, we need to keep the lock at all times in getConnection to have a consistent view of p.connections.

I don't see a problem with this implementation because the unlocked version is available only to code in the same package and it is not visible anywhere else 😄

@douglascamata
Copy link
Contributor

I don't see a problem with this implementation because the unlocked version is available only to code in the same package and it is not visible anywhere else 😄

The problem is our future selves, including other contributors, that can much more easily make a mistake because they have to choose between the lock and lock-free versions of the function. The receive package is HUGE and doesn't make me feel as safe regarding bad use as if it was smaller.

But I'm not sure about how we could achieve this. 🤔

@GiedriusS
Copy link
Member Author

I agree, ideally, there's only one way of doing things that is 100% clear. However, I don't think there's a clear answer right now in the context of this PR and because this bug is quite devastating - Receiver gets stuck, goroutine count keeps growing but still shows that it is healthy - I think let's merge first and then iterate.

@GiedriusS GiedriusS merged commit 29831f8 into main Feb 9, 2024
19 of 20 checks passed
@GiedriusS GiedriusS deleted the do-not-douible-lock branch February 9, 2024 13:06
jnyi pushed a commit to jnyi/thanos that referenced this pull request Apr 4, 2024
markPeerUnavailable was always taking a lock and in one case we were
calling it with a lock already taken. Fix this.

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants