Lack of rate-limiting controls #2421

Open
afrittoli opened this issue Aug 24, 2022 · 4 comments
Labels
  • kind/security: Categorizes issue or PR as related to a security issue
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@afrittoli
Member

Describe the bug

From the Tekton security audit:

Tekton Dashboard does not enforce rate limiting of HTTP requests. As a result, we were able to issue over a thousand requests in just over a minute.

Processing requests sent at such a high rate can consume an inordinate amount of resources, increasing the risk of denial-of-service attacks through excessive resource consumption. In particular, we were able to create hundreds of running “import resources” pods that were able to consume nearly all the host’s memory in the span of a minute.

Expected behaviour

Short term, implement rate limiting on all API endpoints.

Long term, run stress tests to ensure that the rate limiting enforced by Tekton Dashboard is robust.

Environment details

  • Versions:
    • Tekton Dashboard: v0.24
@afrittoli afrittoli added the kind/security Categorizes issue or PR as related to a security issue label Aug 24, 2022
@AlanGreene
Member

AlanGreene commented Aug 24, 2022

The Dashboard is not exposed outside the cluster by default. In full read-write mode it should never be exposed publicly, and it should always be deployed behind a reverse proxy for authentication anyway. On that basis, it could be argued that kubectl, tkn, and other clients are also affected by this issue.

For read-only mode there's less of a concern, as new runs and related resources cannot be created in this mode. However, large numbers of concurrent requests, or a burst of requests in a short period to list / get / watch resources, can still cause high resource usage and slow responses.

I'm not sure if this is something that should be built into the Dashboard application itself or would be better handled by a well-tested reverse proxy solution. Would it make sense to document and/or provide a simple copy-paste example showing how to achieve this, for example with nginx + oauth2-proxy?

@tekton-robot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 23, 2022
@AlanGreene
Member

We should document an example for this. I'll try to put something together before the end of the year.
/lifecycle frozen

@tekton-robot tekton-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 24, 2022
@AlanGreene
Member

A few months ago I experimented with rewriting the Dashboard back-end entirely, as well as rearchitecting the client (some of which we've already adopted in #2452 and related issues).

The resulting (partial) rewrite of the app, with the new back-end, is in https://github.com/alangreene/dashboard-next

It includes rate-limiting support among other improvements. It involves some breaking changes in config, so it'll have to be introduced in a non-breaking manner over a number of releases. Over the next few weeks I'll be creating issues to track the various pieces of this in more detail.
