New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rate limits on router #102
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use a wrapper function and force the users to use a custom routing function instead of simply calling the RateLimiting function if rate limiting is enabled?
config/config.go
Outdated
@@ -105,6 +105,8 @@ func (c *Config) fillDefaultValues() { | |||
"pitaya.modules.bindingstorage.etcd.endpoints": "localhost:2379", | |||
"pitaya.modules.bindingstorage.etcd.leasettl": "1h", | |||
"pitaya.modules.bindingstorage.etcd.prefix": "pitaya/", | |||
"pitaya.router.ratelimiting.limit": 20, | |||
"pitaya.router.ratelimiting.interval": "1s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you set the default value for pitaya.router.ratelimiting.forcedisable as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not set, the default is false for bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file should contain all defaults for reference, so add it here and if you included any other config and it's default is not here, include it as well
Pull Request Test Coverage Report for Build 890
💛 - Coveralls |
Do you think rate limiting should be enabled by default? But it's a good point that defaultRouting also should be able do use rate limiting, I will work on that. |
router/rate_limiting_test.go
Outdated
}, | ||
"test_can_take_missing_one_to_limit": { | ||
before: func() { | ||
r.Take(now) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error return value of r.Take
is not checked (from errcheck
)
router/rate_limiting_test.go
Outdated
"test_can_take_missing_one_to_limit": { | ||
before: func() { | ||
r.Take(now) | ||
r.Take(now) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error return value of r.Take
is not checked (from errcheck
)
router/rate_limiting_test.go
Outdated
}, | ||
"test_can_take_when_oldest_request_expired": { | ||
before: func() { | ||
r.Take(now.Add(-2 * interval)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error return value of r.Take
is not checked (from errcheck
)
@henrod I don't think the routing function is the correct place to handle this. We're trying to avoid badly behaved clients, correct? If we put this logic in the routing function we're only limiting calls that need to be redirected to other servers. IMO rate limiting should be enforced as close as possible to the client, i.e. in the frontendend server she is connected to. In http APIs rate limiting are usually done in a proxy that is reached before the API. I would move this logic to somewhere in |
Good point! |
@cscatolini @andrehp So, I am thinking about checking for rateLimiting here. What do you think? Just before sending to localProcess and remoteProcess Also, I am not sure where to put *RateLimiting. Another file in service directory? Or a module? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good choice of algorithm @henrod!
I also agree with camila's point, however, I don't like the suggestion of putting this logic inside handlerService, it should remain responsible only to handling messages. I'ld like this better as a pluggable component, haven't gave it a lot of thought though yet.
config/config.go
Outdated
@@ -105,6 +105,8 @@ func (c *Config) fillDefaultValues() { | |||
"pitaya.modules.bindingstorage.etcd.endpoints": "localhost:2379", | |||
"pitaya.modules.bindingstorage.etcd.leasettl": "1h", | |||
"pitaya.modules.bindingstorage.etcd.prefix": "pitaya/", | |||
"pitaya.router.ratelimiting.limit": 20, | |||
"pitaya.router.ratelimiting.interval": "1s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file should contain all defaults for reference, so add it here and if you included any other config and it's default is not here, include it as well
docs/features.md
Outdated
@@ -29,6 +29,9 @@ When a server instance receives a client message, it checks the target server ty | |||
|
|||
By default the routing function chooses one instance of the target server type at random. Custom functions can be defined to change this behavior. | |||
|
|||
#### Rate limiting | |||
By adding a routing function, it is possible to wrap it with a rate limiting logic by using WithRateLimiting function. The requests counts and management is done on player's session, therefore it happens even before session bind. The used algorithm is the Leaky Bucket. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
include a link with a description of the leaky bucket algorithm
router/rate_limiting.go
Outdated
|
||
// WithRateLimiting wraps routingFunc and controls | ||
// player rate through session. | ||
func WithRateLimiting(c *config.Config, f RoutingFunc) RoutingFunc { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to use an implementation of a pitaya pipeline instead of a decorator function?
=== EDIT
pipelines are not run on the frontend server when the call is to be forwarded, so it wouldn't fit this case today
router/rate_limiting.go
Outdated
payload []byte, | ||
servers map[string]*cluster.Server, | ||
) (*cluster.Server, error) { | ||
if forceDisable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an optional feature that one had intentionally included, is the point of having this config just to make it easier to turn it off without having to make a new build?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. In case you notice that players are being blocked by rate limiting unintentionally and you want to disable it for now to test other values for limit and interval
router/rate_limiting.go
Outdated
|
||
sessionVal := ctx.Value(constants.SessionCtxKey) | ||
if sessionVal == nil { | ||
logger.Log.Warnf("session not found, cant rate limit, route: %s", r.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"can't"
also, when would this case happen (not having a session)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for safety, so the application doesn't crash. But I will verify if this can't happen at all and, if so, I remove this check
router/rate_limiting.go
Outdated
// Take saves the now as time taken or returns an error if | ||
// in the limit of rate limiting | ||
func (r *RateLimiting) Take(now time.Time) error { | ||
r.mutex.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think it's possible to remove this mutex?
@felipejfc I also thought about pipelines but it doesn't fit the requirement of blocking the message as soon as possible. If the handler service is the one responsible for handling messages, I think it is okay to make it responsibe for blocking messages. |
What I meant is that handlerService's only responsibility should be to receive messages from clients, decoding them and routing for processing. Today, apart from that it's also dealing with handshake packets, I'ld like to refactor that. |
@felipejfc If filtering after routing, the logic would be at |
But then you would need to add it at two distinct places in the code instead of only one |
Discussing it with @henrod I think we came up with a good solution, the idea is to decorate the acceptor with a wrapper that will act on the net.Conn objects before they reach the HandlerService, it could be used for rate limiting or other stuff like regional ip blocking for example. |
f14b7db
to
daacb4f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests are missing from this PR. An example would be nice as well.
docs/features.md
Outdated
|
||
Wrappers can be used on acceptors, like TCP and Websocket, to read and change incoming data before performing the message forwarding. To create a new wrapper just implement the Wrapper interface (or inherit the struct from BaseWrapper) and add it into your acceptor by using the WithWrappers method. Next there are some examples of acceptor wrappers. | ||
|
||
#### Rate limiting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here it should contain three hash symbols
return r.Conn.Read(b) | ||
} | ||
|
||
n, err = r.Conn.Read(b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the Read call returns an error the service/handler.go closes the connection immediately. I don't think this is the expected behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that's what happens now, see
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the only error possible is a broken pipe error, so it actually makes sense
acceptorwrapper/base_test.go
Outdated
Acceptor: mockAcceptor, | ||
connChan: make(chan net.Conn), | ||
wrapConn: func(c net.Conn) net.Conn { | ||
c.Read([]byte{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error return value of c.Read
is not checked (from errcheck
)
}, | ||
"test_can_take_missing_one_to_limit": { | ||
before: func() { | ||
r.take(now) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error return value of r.take
is not checked (from errcheck
)
"test_can_take_missing_one_to_limit": { | ||
before: func() { | ||
r.take(now) | ||
r.take(now) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error return value of r.take
is not checked (from errcheck
)
}, | ||
"test_can_take_when_oldest_request_expired": { | ||
before: func() { | ||
r.take(now.Add(-2 * interval)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error return value of r.take
is not checked (from errcheck
)
// after exceeded rate limiting in a connection | ||
func ReportExceededRateLimiting(reporters []Reporter) { | ||
for _, r := range reporters { | ||
r.ReportCount(ExceededRateLimiting, map[string]string{}, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error return value of r.ReportCount
is not checked (from errcheck
)
b6d10c3
to
8b60c25
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@henrod instead of having acceptorwrappers why don't we use connectionwrappers and send them to the acceptors like: NewTCP(bla,...connectionwrapper)?
EDIT: On a second thought I also prefer acceptorwrappers.
return r.Conn.Read(b) | ||
} | ||
|
||
n, err = r.Conn.Read(b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the only error possible is a broken pipe error, so it actually makes sense
|
||
// RateLimitingConn wraps net.Conn by applying rate limiting | ||
// and return empty if exceeded | ||
type RateLimitingConn struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename to RateLimiter
if err != nil { | ||
logger.Log.Errorf("Data=%s, Error=%s", b, err) | ||
metrics.ReportExceededRateLimiting(pitaya.GetMetricsReporters()) | ||
return 0, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure you want to return here, the default behaviour of net.Conn is that it only returns when it has data to return, returning here you're changing this behaviour for the ones who are consuming a wrapped connection, isn't it better to have this inside a while and have it continue on errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But on the other hand, from the client's point of view the messages are simply getting lost, the client isn't notified about the rate limiting happening, this might lead to hard to detect bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if the client should be notified, in what situations would a client hit the limiting?
- cheating/trying to attack us
- we inserted a but in clients code
in both case I think that it's irrelevant for us to answer anything for them, what do you guys think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@codingllama: any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it would be hard to detect because it's reported using metricsReporters and logged as an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I have enough context here. For example:
- Is dropping this data OK or could it cause bugs?
- Does rate-limiting at the connection level implies a 1:1 relationship between a Read and RPC, or could this cause corrupted/partial reads?
- Could we rate limit at a higher abstraction level? (eg, per RPC)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The effect of dropping would be a timeout error on the client side. So it depends on how the game deals with it. Maybe expliciting this behaviour on docs? So they can be prepared for this case.
- Currently, Read can have more than one RPC but never less (we don't fragment packets now). Even though we can have more than one, we think it's a good approximation to rate limit based on Read
- Yes we could. But in my discussion with @felipejfc we thought it was better to not mix the logic afterwards and rate limit as soon as possible. Meaning that we don't need to decode the packets to rate limit it. Also, putting the logic here it was clearer to make this logic "plugable"
acceptorwrapper/rate_limiter_test.go
Outdated
}, | ||
"test_can_take_missing_one_to_limit": { | ||
before: func() { | ||
r.take(now) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error return value of r.take
is not checked (from errcheck
)
acceptorwrapper/rate_limiter_test.go
Outdated
"test_can_take_missing_one_to_limit": { | ||
before: func() { | ||
r.take(now) | ||
r.take(now) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error return value of r.take
is not checked (from errcheck
)
acceptorwrapper/rate_limiter_test.go
Outdated
}, | ||
"test_can_take_when_oldest_request_expired": { | ||
before: func() { | ||
r.take(now.Add(-2 * interval)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Error return value of r.take
is not checked (from errcheck
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've taken a quick look, but I don't have a lot of context here, so let me know if the comments make sense.
@@ -0,0 +1,69 @@ | |||
// Copyright (c) TFG Co. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the year be present in the copyright notice? (Not legal advice, just curious.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be present, and this copyright should be present in all files as well, shouldn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhm, we don't have the year in almost any file in the project
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
=/
I think every file should contain every year it had a modification, but I don't know if modifying the whole project is worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used to do year of conception. Copyright can be tricky, it may be worth looking for proper advice.
If the entire project is like this, then I'd suggest leaving it as-is here and maybe adding the year in a separate PR.
"github.com/topfreegames/pitaya/metrics" | ||
) | ||
|
||
// RateLimitingConn wraps net.Conn by applying rate limiting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please document how the rate limiting works and what the limit
and interval
settings do. A reference to https://en.wikipedia.org/wiki/Leaky_bucket, like you did in the .md, would be good here.
if err != nil { | ||
logger.Log.Errorf("Data=%s, Error=%s", b, err) | ||
metrics.ReportExceededRateLimiting(pitaya.GetMetricsReporters()) | ||
return 0, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I have enough context here. For example:
- Is dropping this data OK or could it cause bugs?
- Does rate-limiting at the connection level implies a 1:1 relationship between a Read and RPC, or could this cause corrupted/partial reads?
- Could we rate limit at a higher abstraction level? (eg, per RPC)
|
||
// take saves the now as time taken or returns an error if | ||
// in the limit of rate limiting | ||
func (r *RateLimitingConn) take(now time.Time) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: take
is not a very descriptive name. I'd suggest something that alludes to rate limiting, maybe shouldRateLimit() bool
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree about the name.
Don't you think it's good to return the error here instead of only a bool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either works, no strong opinion here. I took it out because it seems it only exists to be logged, in which case you could just write the message directly on logger.Log.Errorf
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. And shouldRateLimit
makes more sense if returns a bool. I will change it.
@codingllama @felipejfc @cscatolini @andrehp any more thoughts? |
LGTM, but I think the GolangCI is still complaining about a few issues I liked the approach we ended up with :) |
The error on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach isn't my favorite, but it works well enough
@@ -0,0 +1,69 @@ | |||
// Copyright (c) TFG Co. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be present, and this copyright should be present in all files as well, shouldn't it?
No blockers from me, if the other reviewers are happy go for it. :) |
beee5be
to
1accfe1
Compare
Use WithRateLimiting function as RoutingFunc decorator
PIT-429
onresponse_time_ns
metric