Mixpanel grouping improvements #2610
Quickstart template updates in |
LLM Finetuning template updates in |
IdentitySchema(
    id=str(GlobalConfiguration().user_id).replace("-", "")
)
deployment_id = os.getenv(
@stefannica I'm unsure whether this is the right place to use this environment variable. When implemented like this, the environment variable will only be respected when deploying the server for the first time. We could give it precedence over the DB entry by checking for the existence of the env variable in SQLZenStore.get_deployment_id(). What do you think?
I suggest not adding yet another environment variable for this; it just complicates things even more. We already have ZENML_USER_ID, which can be used to override GlobalConfiguration().user_id.
If I understand correctly, you're just trying to override the server identity as reported by the server (server_info.id) with the tenant ID when the server is deployed as a cloud tenant:
- use the ZENML_SERVER_EXTERNAL_SERVER_ID environment variable (or call server_config().external_server_id if set and if running on server), instead of this new ENV_ZENML_DEPLOYMENT_ID env var. This takes care of newly deployed tenants.
- for existing tenants, you can either manually update the IDs in the database to retrospectively match their tenant IDs, or you can also patch SQLZenStore.get_deployment_id() to use the same override. I prefer the former, because it keeps things nice and consistent and persists all information in the DB. It is also easier to apply without upgrades.
Please note that this change is disruptive in many ways, because the deployment ID is used in quite a few places:
- all existing API tokens are invalidated (users will have to re-login, running pipelines will fail unless they use service accounts)
- obviously, it's like a complete change of identity, so new telemetry events will be disconnected from the old ones, unless some other means are used to group them
I agree we should probably reuse the ZENML_SERVER_EXTERNAL_SERVER_ID env variable. But even then, my question remains the same: should I just use it once when the DB is first initialized, and from that point on use the value stored in the Identity table of the database? Or should this env variable be checked any time server_info.id is checked?
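To make the two options in this question concrete, here is a minimal sketch using only the standard library. It is not ZenML code: FakeIdentityStore, its methods, and the dict standing in for the Identity table are hypothetical, and only the environment variable name is taken from the discussion above.

```python
import os
import uuid
from typing import Dict, Optional

ENV_EXTERNAL_SERVER_ID = "ZENML_SERVER_EXTERNAL_SERVER_ID"


def _external_id() -> Optional[uuid.UUID]:
    """Return the externally provided server ID, if the env var is set."""
    raw = os.environ.get(ENV_EXTERNAL_SERVER_ID)
    return uuid.UUID(raw) if raw else None


class FakeIdentityStore:
    """Hypothetical stand-in for the Identity table; not the real store."""

    def __init__(self, fallback_id: uuid.UUID) -> None:
        # The env var is consulted here exactly once, when the row is created.
        seeded = _external_id() or fallback_id
        self._row: Dict[str, str] = {"id": seeded.hex}

    def deployment_id_seed_once(self) -> uuid.UUID:
        # Option A: always trust the persisted row after initialization.
        return uuid.UUID(self._row["id"])

    def deployment_id_env_first(self) -> uuid.UUID:
        # Option B: the env var wins over the persisted row on every read.
        return _external_id() or uuid.UUID(self._row["id"])
```

With option A, setting the variable on an already initialized deployment has no effect until the row is updated by hand; with option B the reported ID changes as soon as the variable is set, while the persisted row stays stale.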
E2E template updates in |
NLP template updates in |
…enml-io/zenml into feature/mixpanel-grouping-improvements
Aside from the comment regarding extra hyphens, this looks good
id_ = (
    ServerConfiguration.get_server_config().external_server_id
    or GlobalConfiguration().user_id
)
You missed the replace("-", "") part here. Is that intentional? In the database, UUIDs are stored without their hyphens.
Yeah, that was intentional. The UUID is stored as a string without its hyphens, but sqlmodel/pydantic takes care of that for us (we use UUID in lots of other places). I think what really happened here is the following:
- We convert the UUID to a string and replace the hyphens
- We create the pydantic/SQLModel object, which converts the string to a UUID.
- When storing in the DB, SQLModel again converts to a string and replaces the hyphens 😄
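The round trip described in these three steps can be reproduced with the standard library alone. This is only a sketch: it does not exercise SQLModel or pydantic, and it assumes the stored form is the 32-character hex string, as stated in the comments above. The UUID value is an arbitrary example.

```python
import uuid

user_id = uuid.UUID("12345678-1234-5678-1234-567812345678")

# Step 1: manual conversion to a string with the hyphens stripped.
stripped = str(user_id).replace("-", "")

# Step 2: a UUID-typed field parses the string back into a UUID;
# uuid.UUID accepts the hex form with or without hyphens.
parsed = uuid.UUID(stripped)

# Step 3: serializing for storage yields the hyphen-free form again
# (assumed here to be what ends up in the database column).
stored = parsed.hex
```

Since parsed equals user_id and stored equals stripped, the manual replace("-", "") adds nothing when the field is typed as a UUID.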
Describe changes
This PR contains two improvements for mixpanel groups:
- Using server_id instead of group_id in the segment group call
- Setting the server_id when deploying a server, so we can set it to the tenant ID when deploying cloud tenants
Pre-requisites
Please ensure you have done the following:
develop and the open PR is targeting develop. If your branch wasn't based on develop, read the Contribution guide on rebasing branch to develop.
Types of changes