Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rate limits on router #102

Merged
merged 1 commit into from May 29, 2019
Merged

rate limits on router #102

merged 1 commit into from May 29, 2019

Conversation

henrod
Copy link
Contributor

@henrod henrod commented May 22, 2019

Use WithRateLimiting function as RoutingFunc decorator

  • It's optional, without the decorator no rate limit is not applied
  • On disastrous situations in production, WithRateLimiting can be forcibly disabled by config
  • The number of blocked requests can be monitored by the code PIT-429 on response_time_ns metric
  • RateLimiting is saved on player's session, so no bind is necessary to control rate

@henrod henrod added the enhancement New feature or request label May 22, 2019
@henrod henrod requested a review from andrehp May 22, 2019 19:33
@henrod henrod self-assigned this May 22, 2019
Copy link
Contributor

@andrehp andrehp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use a wrapper function and force the users to use a custom routing function instead of simply calling the RateLimiting function if rate limiting is enabled?

docs/configuration.rst Outdated Show resolved Hide resolved
config/config.go Outdated
@@ -105,6 +105,8 @@ func (c *Config) fillDefaultValues() {
"pitaya.modules.bindingstorage.etcd.endpoints": "localhost:2379",
"pitaya.modules.bindingstorage.etcd.leasettl": "1h",
"pitaya.modules.bindingstorage.etcd.prefix": "pitaya/",
"pitaya.router.ratelimiting.limit": 20,
"pitaya.router.ratelimiting.interval": "1s",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you set the default value for pitaya.router.ratelimiting.forcedisable as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not set, the default is false for bool

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file should contain all defaults for reference, so add it here and if you included any other config and it's default is not here, include it as well

@topfreegames topfreegames deleted a comment from golangcibot May 22, 2019
@topfreegames topfreegames deleted a comment from golangcibot May 22, 2019
@topfreegames topfreegames deleted a comment from golangcibot May 22, 2019
@coveralls
Copy link

coveralls commented May 22, 2019

Pull Request Test Coverage Report for Build 890

  • 72 of 96 (75.0%) changed or added relevant lines in 7 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.3%) to 68.898%

Changes Missing Coverage Covered Lines Changed/Added Lines %
acceptorwrapper/base.go 17 21 80.95%
metrics/report.go 0 4 0.0%
acceptorwrapper/wrapper.go 0 5 0.0%
metrics/prometheus_reporter.go 0 11 0.0%
Files with Coverage Reduction New Missed Lines %
agent/agent.go 2 80.9%
Totals Coverage Status
Change from base Build 856: 0.3%
Covered Lines: 4406
Relevant Lines: 6395

💛 - Coveralls

@henrod
Copy link
Contributor Author

henrod commented May 22, 2019

Why use a wrapper function and force the users to use a custom routing function instead of simply calling the RateLimiting function if rate limiting is enabled?

Do you think rate limiting should be enabled by default?

But it's a good point that defaultRouting also should be able do use rate limiting, I will work on that.

},
"test_can_take_missing_one_to_limit": {
before: func() {
r.Take(now)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of r.Take is not checked (from errcheck)

"test_can_take_missing_one_to_limit": {
before: func() {
r.Take(now)
r.Take(now)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of r.Take is not checked (from errcheck)

},
"test_can_take_when_oldest_request_expired": {
before: func() {
r.Take(now.Add(-2 * interval))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of r.Take is not checked (from errcheck)

@cscatolini
Copy link
Contributor

@henrod I don't think the routing function is the correct place to handle this. We're trying to avoid badly behaved clients, correct? If we put this logic in the routing function we're only limiting calls that need to be redirected to other servers.

IMO rate limiting should be enforced as close as possible to the client, i.e. in the frontendend server she is connected to. In http APIs rate limiting are usually done in a proxy that is reached before the API. I would move this logic to somewhere in HandlerService, as soon as we have the client information so we can limit by client.

@henrod
Copy link
Contributor Author

henrod commented May 22, 2019

@henrod I don't think the routing function is the correct place to handle this. We're trying to avoid badly behaved clients, correct? If we put this logic in the routing function we're only limiting calls that need to be redirected to other servers.

IMO rate limiting should be enforced as close as possible to the client, i.e. in the frontendend server she is connected to. In http APIs rate limiting are usually done in a proxy that is reached before the API. I would move this logic to somewhere in HandlerService, as soon as we have the client information so we can limit by client.

Good point!

@henrod
Copy link
Contributor Author

henrod commented May 22, 2019

@cscatolini @andrehp So, I am thinking about checking for rateLimiting here. What do you think? Just before sending to localProcess and remoteProcess

Also, I am not sure where to put *RateLimiting. Another file in service directory? Or a module?

Copy link
Contributor

@felipejfc felipejfc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good choice of algorithm @henrod!
I also agree with camila's point, however, I don't like the suggestion of putting this logic inside handlerService, it should remain responsible only to handling messages. I'ld like this better as a pluggable component, haven't gave it a lot of thought though yet.

config/config.go Outdated
@@ -105,6 +105,8 @@ func (c *Config) fillDefaultValues() {
"pitaya.modules.bindingstorage.etcd.endpoints": "localhost:2379",
"pitaya.modules.bindingstorage.etcd.leasettl": "1h",
"pitaya.modules.bindingstorage.etcd.prefix": "pitaya/",
"pitaya.router.ratelimiting.limit": 20,
"pitaya.router.ratelimiting.interval": "1s",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file should contain all defaults for reference, so add it here and if you included any other config and it's default is not here, include it as well

docs/features.md Outdated
@@ -29,6 +29,9 @@ When a server instance receives a client message, it checks the target server ty

By default the routing function chooses one instance of the target server type at random. Custom functions can be defined to change this behavior.

#### Rate limiting
By adding a routing function, it is possible to wrap it with a rate limiting logic by using WithRateLimiting function. The requests counts and management is done on player's session, therefore it happens even before session bind. The used algorithm is the Leaky Bucket.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include a link with a description of the leaky bucket algorithm


// WithRateLimiting wraps routingFunc and controls
// player rate through session.
func WithRateLimiting(c *config.Config, f RoutingFunc) RoutingFunc {
Copy link
Contributor

@felipejfc felipejfc May 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to use an implementation of a pitaya pipeline instead of a decorator function?
=== EDIT
pipelines are not run on the frontend server when the call is to be forwarded, so it wouldn't fit this case today

payload []byte,
servers map[string]*cluster.Server,
) (*cluster.Server, error) {
if forceDisable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is an optional feature that one had intentionally included, is the point of having this config just to make it easier to turn it off without having to make a new build?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. In case you notice that players are being blocked by rate limiting unintentionally and you want to disable it for now to test other values for limit and interval


sessionVal := ctx.Value(constants.SessionCtxKey)
if sessionVal == nil {
logger.Log.Warnf("session not found, cant rate limit, route: %s", r.String())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"can't"

also, when would this case happen (not having a session)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for safety, so the application doesn't crash. But I will verify if this can't happen at all and, if so, I remove this check

// Take saves the now as time taken or returns an error if
// in the limit of rate limiting
func (r *RateLimiting) Take(now time.Time) error {
r.mutex.Lock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think it's possible to remove this mutex?

@cscatolini
Copy link
Contributor

@felipejfc I also thought about pipelines but it doesn't fit the requirement of blocking the message as soon as possible. If the handler service is the one responsible for handling messages, I think it is okay to make it responsibe for blocking messages.

@felipejfc
Copy link
Contributor

felipejfc commented May 23, 2019

@felipejfc I also thought about pipelines but it doesn't fit the requirement of blocking the message as soon as possible. If the handler service is the one responsible for handling messages, I think it is okay to make it responsibe for blocking messages.

What I meant is that handlerService's only responsibility should be to receive messages from clients, decoding them and routing for processing. Today, apart from that it's also dealing with handshake packets, I'ld like to refactor that.

@henrod
Copy link
Contributor Author

henrod commented May 23, 2019

@felipejfc If filtering after routing, the logic would be at removeService.remoteProcess method and handlerService.localProcess method. That would work.

@andrehp
Copy link
Contributor

andrehp commented May 23, 2019

But then you would need to add it at two distinct places in the code instead of only one

@felipejfc
Copy link
Contributor

felipejfc commented May 23, 2019

Discussing it with @henrod I think we came up with a good solution, the idea is to decorate the acceptor with a wrapper that will act on the net.Conn objects before they reach the HandlerService, it could be used for rate limiting or other stuff like regional ip blocking for example.
This way we keep the right responsibilities for each component and the code for the rate limiter will be explicitly "plugable" and easy to find and maintain.

Copy link
Contributor

@andrehp andrehp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are missing from this PR. An example would be nice as well.

docs/features.md Outdated

Wrappers can be used on acceptors, like TCP and Websocket, to read and change incoming data before performing the message forwarding. To create a new wrapper just implement the Wrapper interface (or inherit the struct from BaseWrapper) and add it into your acceptor by using the WithWrappers method. Next there are some examples of acceptor wrappers.

#### Rate limiting
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here it should contain three hash symbols

return r.Conn.Read(b)
}

n, err = r.Conn.Read(b)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the Read call returns an error the service/handler.go closes the connection immediately. I don't think this is the expected behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that's what happens now, see

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the only error possible is a broken pipe error, so it actually makes sense

Acceptor: mockAcceptor,
connChan: make(chan net.Conn),
wrapConn: func(c net.Conn) net.Conn {
c.Read([]byte{})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of c.Read is not checked (from errcheck)

},
"test_can_take_missing_one_to_limit": {
before: func() {
r.take(now)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of r.take is not checked (from errcheck)

"test_can_take_missing_one_to_limit": {
before: func() {
r.take(now)
r.take(now)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of r.take is not checked (from errcheck)

},
"test_can_take_when_oldest_request_expired": {
before: func() {
r.take(now.Add(-2 * interval))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of r.take is not checked (from errcheck)

// after exceeded rate limiting in a connection
func ReportExceededRateLimiting(reporters []Reporter) {
for _, r := range reporters {
r.ReportCount(ExceededRateLimiting, map[string]string{}, 1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of r.ReportCount is not checked (from errcheck)

Copy link
Contributor

@felipejfc felipejfc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@henrod instead of having acceptorwrappers why don't we use connectionwrappers and send them to the acceptors like: NewTCP(bla,...connectionwrapper)?

EDIT: On a second thought I also prefer acceptorwrappers.

return r.Conn.Read(b)
}

n, err = r.Conn.Read(b)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the only error possible is a broken pipe error, so it actually makes sense


// RateLimitingConn wraps net.Conn by applying rate limiting
// and return empty if exceeded
type RateLimitingConn struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename to RateLimiter

if err != nil {
logger.Log.Errorf("Data=%s, Error=%s", b, err)
metrics.ReportExceededRateLimiting(pitaya.GetMetricsReporters())
return 0, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure you want to return here, the default behaviour of net.Conn is that it only returns when it has data to return, returning here you're changing this behaviour for the ones who are consuming a wrapped connection, isn't it better to have this inside a while and have it continue on errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that makes sense

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But on the other hand, from the client's point of view the messages are simply getting lost, the client isn't notified about the rate limiting happening, this might lead to hard to detect bugs.

Copy link
Contributor

@felipejfc felipejfc May 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the client should be notified, in what situations would a client hit the limiting?

  1. cheating/trying to attack us
  2. we inserted a but in clients code
    in both case I think that it's irrelevant for us to answer anything for them, what do you guys think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codingllama: any thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would be hard to detect because it's reported using metricsReporters and logged as an error

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I have enough context here. For example:

  • Is dropping this data OK or could it cause bugs?
  • Does rate-limiting at the connection level implies a 1:1 relationship between a Read and RPC, or could this cause corrupted/partial reads?
  • Could we rate limit at a higher abstraction level? (eg, per RPC)

Copy link
Contributor Author

@henrod henrod May 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The effect of dropping would be a timeout error on the client side. So it depends on how the game deals with it. Maybe expliciting this behaviour on docs? So they can be prepared for this case.
  • Currently, Read can have more than one RPC but never less (we don't fragment packets now). Even though we can have more than one, we think it's a good approximation to rate limit based on Read
  • Yes we could. But in my discussion with @felipejfc we thought it was better to not mix the logic afterwards and rate limit as soon as possible. Meaning that we don't need to decode the packets to rate limit it. Also, putting the logic here it was clearer to make this logic "plugable"

},
"test_can_take_missing_one_to_limit": {
before: func() {
r.take(now)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of r.take is not checked (from errcheck)

"test_can_take_missing_one_to_limit": {
before: func() {
r.take(now)
r.take(now)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of r.take is not checked (from errcheck)

},
"test_can_take_when_oldest_request_expired": {
before: func() {
r.take(now.Add(-2 * interval))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error return value of r.take is not checked (from errcheck)

Copy link

@codingllama codingllama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken a quick look, but I don't have a lot of context here, so let me know if the comments make sense.

@@ -0,0 +1,69 @@
// Copyright (c) TFG Co. All Rights Reserved.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the year be present in the copyright notice? (Not legal advice, just curious.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be present, and this copyright should be present in all files as well, shouldn't it?

Copy link
Contributor Author

@henrod henrod May 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm, we don't have the year in almost any file in the project

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

=/
I think every file should contain every year it had a modification, but I don't know if modifying the whole project is worth it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used to do year of conception. Copyright can be tricky, it may be worth looking for proper advice.

If the entire project is like this, then I'd suggest leaving it as-is here and maybe adding the year in a separate PR.

"github.com/topfreegames/pitaya/metrics"
)

// RateLimitingConn wraps net.Conn by applying rate limiting

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document how the rate limiting works and what the limit and interval settings do. A reference to https://en.wikipedia.org/wiki/Leaky_bucket, like you did in the .md, would be good here.

if err != nil {
logger.Log.Errorf("Data=%s, Error=%s", b, err)
metrics.ReportExceededRateLimiting(pitaya.GetMetricsReporters())
return 0, nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I have enough context here. For example:

  • Is dropping this data OK or could it cause bugs?
  • Does rate-limiting at the connection level implies a 1:1 relationship between a Read and RPC, or could this cause corrupted/partial reads?
  • Could we rate limit at a higher abstraction level? (eg, per RPC)


// take saves the now as time taken or returns an error if
// in the limit of rate limiting
func (r *RateLimitingConn) take(now time.Time) error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: take is not a very descriptive name. I'd suggest something that alludes to rate limiting, maybe shouldRateLimit() bool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about the name.

Don't you think it's good to return the error here instead of only a bool?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either works, no strong opinion here. I took it out because it seems it only exists to be logged, in which case you could just write the message directly on logger.Log.Errorf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. And shouldRateLimit makes more sense if returns a bool. I will change it.

@henrod
Copy link
Contributor Author

henrod commented May 28, 2019

@codingllama @felipejfc @cscatolini @andrehp any more thoughts?

@cscatolini
Copy link
Contributor

LGTM, but I think the GolangCI is still complaining about a few issues

I liked the approach we ended up with :)

@henrod
Copy link
Contributor Author

henrod commented May 28, 2019

LGTM, but I think the GolangCI is still complaining about a few issues

I liked the approach we ended up with :)

The error on metrics/report.go we already don't check on the other cases =(

Copy link
Contributor

@andrehp andrehp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach isn't my favorite, but it works well enough

constants/errors.go Outdated Show resolved Hide resolved
@@ -0,0 +1,69 @@
// Copyright (c) TFG Co. All Rights Reserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be present, and this copyright should be present in all files as well, shouldn't it?

@codingllama
Copy link

@codingllama @felipejfc @cscatolini @andrehp any more thoughts?

No blockers from me, if the other reviewers are happy go for it. :)

@henrod henrod merged commit d1c207f into master May 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants