cache: added region cache warmup #250

Merged: 1 commit, May 8, 2024

Conversation

gogochickenleg
Contributor

This change adds an option in gohbase to prepopulate its region cache. It does a single scan of the meta table and caches all regions.

Collaborator

aaronbee left a comment

Thanks for the PR! I left a couple comments. Let me know if they make sense.

Collaborator

aaronbee commented May 1, 2024

@dethi Do you want to take a final look at this?

rpc.go (outdated)
}

// Start a goroutine to connect to the region
go c.establishRegion(reg, addr)
Collaborator

I'm not sure we should establish a connection immediately. I would think that just loading the cache is enough, and that once one of the regions is used, a connection will be established. No?

Contributor Author

In the SendRPC function, we first call getRegionFromCache to retrieve the region from the cache. If the region is not found in the cache, we then call findRegion to look up the region from the meta table. It seems that we do not establish a connection when calling getRegionFromCache, but we do establish one in findRegion. This leads me to think that we should establish a connection when loading a region into the cache. But I could be missing something.
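
A minimal sketch of the lookup order described in this comment. It uses toy types, not gohbase's real ones: the `client` struct, its counters, and the key-indexed map are assumptions made for illustration (the real cache is organised by region ranges), and only the names getRegionFromCache and findRegion come from the discussion.

```go
package main

import "fmt"

// region is a toy stand-in for a cached region entry (hypothetical type).
type region struct {
	name string
}

// client is a toy model of the lookup path discussed above (hypothetical type).
type client struct {
	cache       map[string]*region // simplified: keyed by row key
	metaLookups int                // simulated meta-table lookups
	established int                // simulated connection setups
}

// getRegionFromCache only consults the cache; it never connects.
func (c *client) getRegionFromCache(key string) *region {
	return c.cache[key]
}

// findRegion simulates the slow path: look up meta, cache the result,
// and set up the connection.
func (c *client) findRegion(key string) *region {
	c.metaLookups++
	reg := &region{name: "region-for-" + key}
	c.cache[key] = reg
	c.established++
	return reg
}

// lookup applies the cache-first order: cache hit, else meta lookup.
func (c *client) lookup(key string) *region {
	if reg := c.getRegionFromCache(key); reg != nil {
		return reg
	}
	return c.findRegion(key)
}

func main() {
	c := &client{cache: map[string]*region{}}
	c.lookup("row1") // miss: goes to meta and establishes a connection
	c.lookup("row1") // hit: served from the cache, no connection work
	fmt.Println("meta lookups:", c.metaLookups, "established:", c.established)
}
```

In this toy, a region that enters the cache by any route other than findRegion would never be connected, which is the gap the comment points at.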

Collaborator

@dethi's suggestion is a good one, to make the connection establishment lazy, but I don't think it would work well today.
What you could do is remove the MarkUnavailable() and establishRegion calls. Then, when an RPC is made, getRegionAndClientForRPC will see that reg.Client() == nil for that region and call reestablishRegion. The problem is that, as this works today, it calls establishRegion with "" as the addr, which results in looking the region up again from the meta region. So then we are back to a flurry of requests to the meta region during startup (though only one per region).

I think we should come up with a better fix for this in a later change.
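
A rough sketch of why the lazy variant falls back to meta lookups, under the behaviour described in this comment. The `region` and `cluster` types, the `connected` flag, and the counter are hypothetical; the only thing taken from the discussion is that an empty addr causes the region to be looked up again.

```go
package main

import "fmt"

// region is a toy cached region: warmed up, but possibly not yet connected.
type region struct {
	name      string
	addr      string // location learned during warmup
	connected bool   // stands in for reg.Client() != nil
}

// cluster counts simulated meta lookups (hypothetical type).
type cluster struct {
	metaLookups int
}

// establish models the described behaviour: an empty addr forces a meta lookup.
func (cl *cluster) establish(reg *region, addr string) {
	if addr == "" {
		cl.metaLookups++
		addr = reg.addr // pretend meta returned the same location
	}
	reg.addr = addr
	reg.connected = true
}

// rpc models the lazy path: a region without a client is re-established
// with "" as the addr before the request is sent.
func (cl *cluster) rpc(reg *region) {
	if !reg.connected {
		cl.establish(reg, "")
	}
}

func main() {
	cl := &cluster{}
	regions := []*region{
		{name: "r1", addr: "rs-a"},
		{name: "r2", addr: "rs-a"},
		{name: "r3", addr: "rs-b"},
	}
	for _, reg := range regions {
		cl.rpc(reg) // first use: triggers one meta lookup
		cl.rpc(reg) // second use: already connected
	}
	fmt.Println("meta lookups:", cl.metaLookups)
}
```

Each region costs one meta lookup on first use, which matches the "only one per region" remark, even though the warmup had already learned the address.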

Collaborator

dethi commented May 3, 2024

Yeah, my concern is that if you have 10k regions, this will start 10k goroutines all trying to establish a connection to maybe ~100 RegionServers. They will all get blocked behind a lock that ensures we establish only one client connection per RegionServer, so that part isn't a problem. It's more the impact of all these goroutines on the Go scheduler, the CPU usage of the client on startup, and what it would mean for the RegionServers if you have a few tens of clients doing the same thing at once. One of the things we do in establishRegion is a Get to validate that we have properly established the connection to the right RegionServer and that the region is online (see isRegionEstablished). That could cause quite a storm.
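
To make the shape of this concern concrete, here is a toy model: one goroutine per region, a mutex that allows only one client per server address, and one validation probe per region. The `pool` type, `warmup` function, and `rs-N` addresses are hypothetical and do not reflect gohbase's actual locking; the probe counter is updated under the lock only to keep the toy race-free.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a toy client registry (hypothetical type).
type pool struct {
	mu      sync.Mutex
	clients map[string]struct{} // one entry per server address
	probes  int                 // validation requests, one per region
}

// establish creates at most one client per address, then records the
// per-region validation probe.
func (p *pool) establish(addr string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, ok := p.clients[addr]; !ok {
		p.clients[addr] = struct{}{}
	}
	p.probes++
}

// warmup starts one goroutine per region and reports how many goroutines
// were started, how many clients exist, and how many probes were sent.
func warmup(regions, servers int) (goroutines, clients, probes int) {
	p := &pool{clients: map[string]struct{}{}}
	var wg sync.WaitGroup
	for i := 0; i < regions; i++ {
		addr := fmt.Sprintf("rs-%d", i%servers)
		wg.Add(1)
		go func(addr string) {
			defer wg.Done()
			p.establish(addr)
		}(addr)
	}
	wg.Wait()
	return regions, len(p.clients), p.probes
}

func main() {
	g, c, pr := warmup(10000, 100)
	fmt.Println("goroutines:", g, "clients:", c, "probes:", pr)
}
```

The lock keeps the client count at the number of servers, but the goroutine count and the number of validation probes still scale with the number of regions.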

Collaborator

Yeah, that sucks, and we are doing O(number of regions * number of RegionServers) work during this warmup because clientRegionCache.put is going to iterate over every RegionServer for each region added.

I still think we should try to merge this change as-is so that we can try out the tradeoffs and continue to work to improve things here. We might be starting 10k goroutines, but at least we aren't sending out 10k scan requests to the meta region.

Collaborator

sgtm, but need to get the CI to pass first

Collaborator

dethi commented May 2, 2024

Left a few comments. CI is also complaining about the mock missing the new function.

Collaborator

dethi commented May 3, 2024

Looks like a data race was detected in the test, related to marshalJSON.

gogochickenleg force-pushed the BUGFIX237 branch 5 times, most recently from 5aac5a1 to 78c56cf, on May 7, 2024 17:36
aaronbee merged commit ce0b353 into tsuna:master on May 8, 2024
2 checks passed