docs: k8s quickstart and observability with k8s #225
Conversation
Needs further testing; leaving it as a draft.
@JaredforReal this is great! Would you be available to follow up with another PR to add a GHA to deploy k8s and run validation? A quick solution could be to use the kind-action to create an env.
@rootfs Sure, I’ll work on it |
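For anyone following along, a minimal local equivalent of that CI validation might look like the sketch below; the manifest path, deployment name, and timeout are assumptions for illustration, not taken from this repo or PR.

```bash
# Create a throwaway cluster (in CI, the kind-action performs this step).
kind create cluster --name semantic-router-ci

# Apply the semantic router manifests; the kustomize path is an assumption.
kubectl apply -k deploy/kubernetes/

# Wait for the deployment to become ready, then do a quick sanity check.
kubectl wait --for=condition=Available deployment/semantic-router --timeout=300s
kubectl get pods -o wide

# Clean up once validation finishes.
kind delete cluster --name semantic-router-ci
```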
Added some review comments.
@@ -0,0 +1,224 @@
# Deployment Quickstart
I suggest renaming this page to "Containerized Deployment Quickstart". Since there is already an "Install in Local" guide and this one is specifically about running the semantic router in a container (with Docker or Kubernetes), it's better to make the distinction clear and signal that this guide is about containerized installation.
This is a good idea
# Deployment Quickstart

This unified guide helps you quickly run Semantic Router locally (Docker Compose) or in a cluster (Kubernetes) and explains when to choose each path. Both share the same configuration concepts: Docker is ideal for rapid iteration and demos, while Kubernetes is suited for long‑running workloads, elasticity, and upcoming Operator / CRD scenarios.
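As a rough sketch of the two paths that paragraph describes (compose invocation, namespace, and manifest path are assumptions, not quoted from the guide):

```bash
# Path 1: local iteration and demos with Docker Compose.
docker compose up -d --build

# Path 2: long-running cluster deployment with Kubernetes.
kubectl create namespace vllm-semantic-router --dry-run=client -o yaml | kubectl apply -f -
kubectl apply -k deploy/kubernetes/ -n vllm-semantic-router
kubectl rollout status deployment/semantic-router -n vllm-semantic-router
```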
This guide describes deployment of the Semantic Router as a containerized component using either Docker Compose or a Kubernetes cluster. It does not cover deployment of the Envoy router or the LLM endpoints inside the same containerized environment (for instance, the same Kubernetes cluster). To follow this guide, you must still deploy the Envoy gateway and LLM endpoints separately, following the instructions in the [`Install in Local`](../installation.md) guide. Future guides will cover additional deployment scenarios, including ones where other components such as the Envoy gateway also run in the Kubernetes cluster and use different types of controllers such as Istio or the Gateway API.
@JaredforReal BTW, I am working on issue #39, which covers the case of running with the Envoy gateway also in Kubernetes and using controllers like Istio or the Gateway API, so I will add documentation for those scenarios as part of the PR for that issue. In this PR, your documentation can cover the case where just the semantic router is in Kubernetes, while the rest of the deployment is the same as described in the "Install in Local" guide.
Thanks for the clarification! I’ll keep the scope of this PR to the case where only the semantic router is deployed in Kubernetes, with Envoy and the LLM endpoints still following the steps in the Install in Local guide.
But for Docker Compose, we’ve already automated the Envoy setup and also provide a testing profile with a mock vLLM, so that developers can get a lightweight but complete experience out of the box. I’ll continue improving the developer experience for both setups in future PRs, and I look forward to aligning with the work in issue #39 once it’s ready.
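A hedged sketch of that out-of-the-box flow, assuming the profile is named `testing` and the bundled Envoy listens on a local port (both assumptions based on the description above, not verified against the repo):

```bash
# Start the full local stack, including the mock vLLM, via the testing profile.
docker compose --profile testing up -d

# Send a sample request through the bundled Envoy listener
# (port and model name are placeholders).
curl -s http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "ping"}]}'

# Tear the stack down when done.
docker compose --profile testing down
```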
Sorry to trouble you, but please move the docs to the new path: https://vllm-semantic-router.com/docs/installation
Signed-off-by: JaredforReal <w13431838023@gmail.com>
@JaredforReal thanks for writing this up! We have questions in Slack about the Envoy proxy install; would you please follow up with instructions too? Thanks!
@rootfs Sure, will work on it
* fix typo & add k8s quickstart doc
* change docker to deploy quickstart
* refactor deploy-quickstart.md
* declare k8s needs separate llm endpoint and envoy set up
* add some reference in k8s requirement
* change docker to deploy quickstart

Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: liuhy <liuhongyu@apache.org>
What type of PR is this?
docs: k8s quickstart and observability with k8s
What this PR does / why we need it:
* fix: typo in `docker-compose.yml`.
* fix: command error in `tools/mock-vllm/Dockerfile`; COPY needs 2 parameters.
* docs: change `docker-quickstart.md` to `deploy-quickstart.md`, add k8s quickstart to it.
* docs: add k8s observability in `observability.md` (see the sketch after this list).
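For context on the observability item above, a quick way to spot-check metrics on Kubernetes might look like this; the service name, namespace, and port are assumptions, not taken from `observability.md`:

```bash
# Forward the router's metrics port to the local machine (names and port assumed).
kubectl port-forward svc/semantic-router 9190:9190 -n vllm-semantic-router &

# Confirm Prometheus-format metrics are exposed.
curl -s http://localhost:9190/metrics | head -n 20
```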
Which issue(s) this PR fixes:
A preparation for #48.
Thoughts on Docker Compose and k8s:
I will try to clarify this difference in the docs and improve both experiences in future PRs.
Love to hear any suggestions from the community!