-
Notifications
You must be signed in to change notification settings - Fork 18
feat: dedicated endpoint cli commands #254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments but we can merge this and tackle those in upcoming PRs
src/together/cli/api/endpoints.py
Outdated
def stop(client: Together, endpoint_id: str) -> None: | ||
"""Stop a dedicated inference endpoint.""" | ||
client.endpoints.update(endpoint_id, state="STOPPED") | ||
click.echo("Successfully stopped endpoint", err=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not instantly stop the endpoint. It moves to STOPPING
state and then after some time STOPPED
. We can change the message here to indicate that STOPPING is initiated. Maybe we can also add a --wait
option to wait for the STOPPED
state.
src/together/cli/api/endpoints.py
Outdated
def start(client: Together, endpoint_id: str) -> None: | ||
"""Start a dedicated inference endpoint.""" | ||
client.endpoints.update(endpoint_id, state="STARTED") | ||
click.echo("Successfully started endpoint", err=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as stoppped. In start it moves to PENDING
and the STARTING
and then STARTED
. So we can change the message and maybe add a --wait
option
src/together/resources/endpoints.py
Outdated
if min_replicas is not None or max_replicas is not None: | ||
current_min = min_replicas | ||
current_max = max_replicas | ||
if current_min is None or current_max is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is not needed as the api supports updating just one of min or max replicas. There is no harm in doing a check here but then we can avoid the get call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to update the OpenAPI spec in that case, because the autoscaling
object currently says min_replicas
and max_replicas
are required. So that's not the case?
YAY! The tests pass! Ultimately I had to include the generated OpenAPI client files in the repo, because whatever I tried, Poetry (or pip) would not include them. I tried |
8a8f537
to
e68aa19
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Have you read the Contributing Guidelines?
Describe your changes
Adds the ability to create, stop, delete start, list and update dedicated endpoints from the CLI:
This approach is a bit different than the other resources. In this case, I have generated an OpenAPI client from our public OpenAPI spec, and used that within the CLI (and resource) to call everything inside dedicated. Because of this, I have added amake
job to also generate the client, and I have updated the spec (here: togethercomputer/openapi#64). We'll need to merge both PRs for dedicated endpoint support.