control-service: introduce data job deployment entity #2659

Merged

mivanov1988
Contributor

@mivanov1988 mivanov1988 commented Sep 13, 2023

Why
As part of VEP-2272, we need to switch the source of truth from Kubernetes to the database.

What
Added a database table named "data_job_deployment" that stores all data associated with data job deployments.

Note: We've observed that data jobs load slowly because Hibernate runs N+1 select queries: one query to fetch all data jobs, plus one additional query per data job to load its deployment data. This slows down the CS APIs that depend on JobsRepository.findAll().

To address this issue, we introduced a custom findAll() method that retrieves all data jobs and their deployments in a single query.

Hibernate: select datajob0_.name as name1_0_, datajob0_.enabled as enabled2_0_, datajob0_.db_default_type as db_defau3_0_, datajob0_.description as descript4_0_, datajob0_.enable_execution_notifications as enable_e5_0_, datajob0_.generate_keytab as generate6_0_, datajob0_.name_deprecated as name_dep7_0_, datajob0_.notification_delay_period_minutes as notifica8_0_, datajob0_.notified_on_job_deploy as notified9_0_, datajob0_.notified_on_job_failure_platform_error as notifie10_0_, datajob0_.notified_on_job_failure_user_error as notifie11_0_, datajob0_.notified_on_job_success as notifie12_0_, datajob0_.schedule as schedul13_0_, datajob0_.team as team14_0_, datajob0_.last_execution_duration as last_ex15_0_, datajob0_.last_execution_end_time as last_ex16_0_, datajob0_.last_execution_status as last_ex17_0_, datajob0_.latest_job_deployment_status as latest_18_0_, datajob0_.latest_job_execution_id as latest_19_0_, datajob0_.latest_job_termination_status as latest_20_0_ from data_job datajob0_
Hibernate: select datajobdep0_.data_job_name as data_job1_1_0_, datajobdep0_.deployment_version_sha as deployme2_1_0_, datajobdep0_.enabled as enabled3_1_0_, datajobdep0_.git_commit_sha as git_comm4_1_0_, datajobdep0_.last_deployed_by as last_dep5_1_0_, datajobdep0_.last_deployed_date as last_dep6_1_0_, datajobdep0_.python_version as python_v7_1_0_, datajobdep0_.resources_cpu_limit as resource8_1_0_, datajobdep0_.resources_cpu_request as resource9_1_0_, datajobdep0_.resources_memory_limit as resourc10_1_0_, datajobdep0_.resources_memory_request as resourc11_1_0_, datajob1_.name as name1_0_1_, datajob1_.enabled as enabled2_0_1_, datajob1_.db_default_type as db_defau3_0_1_, datajob1_.description as descript4_0_1_, datajob1_.enable_execution_notifications as enable_e5_0_1_, datajob1_.generate_keytab as generate6_0_1_, datajob1_.name_deprecated as name_dep7_0_1_, datajob1_.notification_delay_period_minutes as notifica8_0_1_, datajob1_.notified_on_job_deploy as notified9_0_1_, datajob1_.notified_on_job_failure_platform_error as notifie10_0_1_, datajob1_.notified_on_job_failure_user_error as notifie11_0_1_, datajob1_.notified_on_job_success as notifie12_0_1_, datajob1_.schedule as schedul13_0_1_, datajob1_.team as team14_0_1_, datajob1_.last_execution_duration as last_ex15_0_1_, datajob1_.last_execution_end_time as last_ex16_0_1_, datajob1_.last_execution_status as last_ex17_0_1_, datajob1_.latest_job_deployment_status as latest_18_0_1_, datajob1_.latest_job_execution_id as latest_19_0_1_, datajob1_.latest_job_termination_status as latest_20_0_1_ from data_job_deployment datajobdep0_ left outer join data_job datajob1_ on datajobdep0_.data_job_name=datajob1_.name where datajobdep0_.data_job_name=?
Hibernate: select datajobdep0_.data_job_name as data_job1_1_0_, datajobdep0_.deployment_version_sha as deployme2_1_0_, datajobdep0_.enabled as enabled3_1_0_, datajobdep0_.git_commit_sha as git_comm4_1_0_, datajobdep0_.last_deployed_by as last_dep5_1_0_, datajobdep0_.last_deployed_date as last_dep6_1_0_, datajobdep0_.python_version as python_v7_1_0_, datajobdep0_.resources_cpu_limit as resource8_1_0_, datajobdep0_.resources_cpu_request as resource9_1_0_, datajobdep0_.resources_memory_limit as resourc10_1_0_, datajobdep0_.resources_memory_request as resourc11_1_0_, datajob1_.name as name1_0_1_, datajob1_.enabled as enabled2_0_1_, datajob1_.db_default_type as db_defau3_0_1_, datajob1_.description as descript4_0_1_, datajob1_.enable_execution_notifications as enable_e5_0_1_, datajob1_.generate_keytab as generate6_0_1_, datajob1_.name_deprecated as name_dep7_0_1_, datajob1_.notification_delay_period_minutes as notifica8_0_1_, datajob1_.notified_on_job_deploy as notified9_0_1_, datajob1_.notified_on_job_failure_platform_error as notifie10_0_1_, datajob1_.notified_on_job_failure_user_error as notifie11_0_1_, datajob1_.notified_on_job_success as notifie12_0_1_, datajob1_.schedule as schedul13_0_1_, datajob1_.team as team14_0_1_, datajob1_.last_execution_duration as last_ex15_0_1_, datajob1_.last_execution_end_time as last_ex16_0_1_, datajob1_.last_execution_status as last_ex17_0_1_, datajob1_.latest_job_deployment_status as latest_18_0_1_, datajob1_.latest_job_execution_id as latest_19_0_1_, datajob1_.latest_job_termination_status as latest_20_0_1_ from data_job_deployment datajobdep0_ left outer join data_job datajob1_ on datajobdep0_.data_job_name=datajob1_.name where datajobdep0_.data_job_name=?
Hibernate: select datajobdep0_.data_job_name as data_job1_1_0_, datajobdep0_.deployment_version_sha as deployme2_1_0_, datajobdep0_.enabled as enabled3_1_0_, datajobdep0_.git_commit_sha as git_comm4_1_0_, datajobdep0_.last_deployed_by as last_dep5_1_0_, datajobdep0_.last_deployed_date as last_dep6_1_0_, datajobdep0_.python_version as python_v7_1_0_, datajobdep0_.resources_cpu_limit as resource8_1_0_, datajobdep0_.resources_cpu_request as resource9_1_0_, datajobdep0_.resources_memory_limit as resourc10_1_0_, datajobdep0_.resources_memory_request as resourc11_1_0_, datajob1_.name as name1_0_1_, datajob1_.enabled as enabled2_0_1_, datajob1_.db_default_type as db_defau3_0_1_, datajob1_.description as descript4_0_1_, datajob1_.enable_execution_notifications as enable_e5_0_1_, datajob1_.generate_keytab as generate6_0_1_, datajob1_.name_deprecated as name_dep7_0_1_, datajob1_.notification_delay_period_minutes as notifica8_0_1_, datajob1_.notified_on_job_deploy as notified9_0_1_, datajob1_.notified_on_job_failure_platform_error as notifie10_0_1_, datajob1_.notified_on_job_failure_user_error as notifie11_0_1_, datajob1_.notified_on_job_success as notifie12_0_1_, datajob1_.schedule as schedul13_0_1_, datajob1_.team as team14_0_1_, datajob1_.last_execution_duration as last_ex15_0_1_, datajob1_.last_execution_end_time as last_ex16_0_1_, datajob1_.last_execution_status as last_ex17_0_1_, datajob1_.latest_job_deployment_status as latest_18_0_1_, datajob1_.latest_job_execution_id as latest_19_0_1_, datajob1_.latest_job_termination_status as latest_20_0_1_ from data_job_deployment datajobdep0_ left outer join data_job datajob1_ on datajobdep0_.data_job_name=datajob1_.name where datajobdep0_.data_job_name=?
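
For readers who want to reproduce the query pattern outside the control service, here is a minimal sketch of the difference between the N+1 access pattern in the log above and a single joined query. It is not the code from this PR: it uses Python's sqlite3 module as a stand-in for Hibernate and the real database, keeps only a few of the columns that appear in the log, and the names CountingConnection, build_db, find_all_n_plus_one, and find_all_joined are hypothetical.

```python
import sqlite3


class CountingConnection:
    """Thin wrapper (hypothetical) that counts how many queries are issued."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def build_db(n_jobs):
    """Create an in-memory database with a reduced version of the two tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE data_job (name TEXT PRIMARY KEY, team TEXT);
        CREATE TABLE data_job_deployment (
            data_job_name TEXT PRIMARY KEY REFERENCES data_job(name),
            python_version TEXT,
            enabled INTEGER
        );
        """
    )
    for i in range(n_jobs):
        conn.execute("INSERT INTO data_job VALUES (?, ?)", (f"job-{i}", "team-a"))
        if i % 2 == 0:  # assumption for the demo: only some jobs are deployed
            conn.execute(
                "INSERT INTO data_job_deployment VALUES (?, ?, ?)",
                (f"job-{i}", "3.9", 1),
            )
    return conn


def find_all_n_plus_one(db):
    """One query for the jobs, then one more query per job (the slow pattern)."""
    result = {}
    for (name,) in db.query("SELECT name FROM data_job ORDER BY name"):
        rows = db.query(
            "SELECT python_version FROM data_job_deployment WHERE data_job_name = ?",
            (name,),
        )
        result[name] = rows[0][0] if rows else None
    return result


def find_all_joined(db):
    """Jobs and their deployments fetched together in a single query."""
    rows = db.query(
        "SELECT j.name, d.python_version "
        "FROM data_job j "
        "LEFT JOIN data_job_deployment d ON d.data_job_name = j.name "
        "ORDER BY j.name"
    )
    return {name: python_version for name, python_version in rows}


if __name__ == "__main__":
    db = CountingConnection(build_db(3))
    find_all_n_plus_one(db)
    print("N+1 pattern queries:", db.queries)
    db.queries = 0
    find_all_joined(db)
    print("joined pattern queries:", db.queries)
```

The LEFT JOIN starts from data_job so that jobs without a deployment row are still returned, with the deployment columns set to NULL. In a Hibernate setup the equivalent would typically be a fetch join or an entity graph on the repository method, but which mechanism this PR uses is not stated here.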

Testing Done
Unit tests

Signed-off-by: Miroslav Ivanov miroslavi@vmware.com

@mivanov1988 mivanov1988 force-pushed the person/miroslavi/introduce-data-job-deployment-entity branch from a730a35 to ec695d3 Compare September 13, 2023 12:46
@mivanov1988 mivanov1988 enabled auto-merge (squash) September 13, 2023 12:47
@mivanov1988 mivanov1988 force-pushed the person/miroslavi/introduce-data-job-deployment-entity branch from fbd6974 to b755be8 Compare September 13, 2023 13:31
Collaborator

@antoniivanov antoniivanov left a comment

Looks good to me.

I don't think we should have explicit dependencies between the deployment table and the job definitions table, since I want the control service to have more of a modular monolith type of architecture ("micro"-like services within a single app).

But that's not something that I will block merging over.

@mivanov1988 mivanov1988 enabled auto-merge (squash) September 15, 2023 09:30
@mivanov1988 mivanov1988 merged commit a3311b8 into main Sep 15, 2023
8 of 9 checks passed
@mivanov1988 mivanov1988 deleted the person/miroslavi/introduce-data-job-deployment-entity branch September 15, 2023 09:51
5 participants