-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add feedback trace server APIs #1718
Conversation
Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=502ad1986eaf1644dab0f45996a0989a5ed3090d |
74bf345
to
e0387bc
Compare
e0387bc
to
83f08b9
Compare
result = query.execute(self.ch_client) | ||
return tsi.FeedbackQueryRes(result=result) | ||
|
||
def feedback_purge(self, req: tsi.FeedbackPurgeReq) -> tsi.FeedbackPurgeRes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels extremely risky - a user would be able to purge an entire project's feedback in 1 call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a concern for me too. Batch deletion seems useful, but we could make it so we only allow purging a list of ids. We could also keep this as is but make the Python API more restrictive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we might want to heavily restrict this to a list of ids for now
feedback_id = generate_id() | ||
created_at = datetime.datetime.now(ZoneInfo("UTC")) | ||
# TODO: Any validation on weave_ref? | ||
# TODO: What should max payload size be? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1024 seems like a pretty sufficient cap for now - couple paragraphs in note form
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I picked a number rather arbitrarily - small because it is easier to loosen than tighten, but I'm open to alternatives.
@@ -0,0 +1,561 @@ | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<3 loving this!
39395e9
to
57dd455
Compare
class FeedbackQueryReq(BaseModel): | ||
project_id: str = Field(examples=["entity/project"]) | ||
fields: typing.Optional[list[str]] = Field( | ||
default=None, examples=[["id", "feedback_type", "payload.note", "count(*)"]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting idea to combine count(*)
in the field selection. I'm curious to read more below how this is handled - both from a sql injection perspective but also if you have count & fields, then you need 2 queries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't added GROUP BY
or HAVING
to ORM yet. Kind of hacked in count(*)
because it was useful for tests. Re: injection, I'm only allowing that specific string, we'd have to get fancier if we wanted to support functions more generally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it is just useful for tests, i might not advertise it as a primary example in case you want to change the way aggregations are specified when you are doing an agg operation
wb_user_id: str | ||
|
||
|
||
# The response provides the additional fields needed to convert a request |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean by this? Is there a back-and-forth here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The create request has a "partial" feedback object, the server does things like assign the id, determine the creation timestamp, possibly augment the payload (e.g. in the emoji case we augment it with the shortcode). So the server response includes all of the fields that are needed to make a complete "Feedback" object that will match what you would get if you had queried for all of the feedback.
So e.g. in the Python SDK, the response from creation allows us to update the list of feedback without having to do a separate roundtrip query or complete refresh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see - makes sense
|
||
class FeedbackQueryRes(BaseModel): | ||
# Note: this is not a list of Feedback because user can request any fields. | ||
result: list[dict[str, typing.Any]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too bad we can't have Partials
|
||
|
||
class FeedbackQueryRes(BaseModel): | ||
# Note: this is not a list of Feedback because user can request any fields. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting that we support cases where the fields are not present. alternative idea for the "fields" in the request to only control payload fields, which would result in a list of valid feedbacks (where the payload could be modified/restricted)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of feedback there are probably not a lot of cases (although supporting COUNT(*) is already useful) where you wouldn't want the payload, but if think about API patterns for more complicated objects, like calls, I think there are more cases where you want the ability to only pull down the fields you need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely agree that it is useful to filter fields. However, in practice, there are a bunch of id/date/name like fields that are low cost. Then there are big ones (inputs, output, attributes, summary). Similarly here you have a few simple fields, then a possibly huge field (payload). My mental model for calls has been to always return the "simple fields". Then essentially have "partials" for the others. In such cases, the typing would be consistent because the big fields are still dicts, just empty.
I don't think this is a blocking comment btw, but maybe something to consider (both ways) as we start to harden the API
class FeedbackPurgeReq(BaseModel): | ||
project_id: str = Field(examples=["entity/project"]) | ||
# TODO: Should we make this required? | ||
query: typing.Optional[Query] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't like this ... feels very easy to mistakenly delete everything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opinion on how I should mitigate that? Add a limit on how many can rows be deleted at once? Change the API to only allow deletion by ID? Keep it as a (required) Query parameter but raise if it is too generic (by some heuristic)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to retain the query capability, I'd prefer to just allow a single query shape that matches what you need (which i think could just be an "and" over ids)
return self.db_name or self.name | ||
|
||
def create_sql(self) -> str: | ||
sql_type = "TEXT" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be self.type
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seemed like in SQLite we were mapping all of the field types ORM currently supports (string, json, datetime) to a TEXT column, but this would need to be updated with an actual mapping to support other column types.
Internal Jira: https://wandb.atlassian.net/browse/WB-18697
Adds Trace Server APIs for create / query / purge. These APIs are based on a new "ORM" layer that helps us have more similar code between ClickHouse and SQLite.
Emoji utilities were previously reviewed in core, bringing them over to this repo to correspond to change in where validation logic lives.
@tssweeney - there is obviously a bunch of code duplication here with the mongo processing in Calls that we should figure out what to do about.