-
Notifications
You must be signed in to change notification settings - Fork 642
Description
Search before asking
- I had searched in the issues and found no similar issues.To pick up a draggable item, press the space bar. While dragging, use the arrow keys to move the item. Press space again to drop the item in its new position, or press escape to cancel.
What happened
While analyzing the data gathered from our GitHub integration, I stumbled upon an anomaly in a minor subset of deployments. Oddly enough, some deployments were reporting a deployment of incosistent time (in my example 30 weeks).
Given my familiarity with our GitHub operations, I was certain this wasn't accurate, prompting me to dig deeper. My investigation points towards a potential issue with the getDeploymentCommit function, particularly in relation to EnrichPrevSuccessDeploymentCommit.
To illustrate, I've attached an example PR
{
"pull_requests": [
{
"id" : "github:GithubPullRequest:4:1806504909",
"created_at" : "2024-11-12 08:34:13.134",
"updated_at" : "2024-11-12 08:34:13.134",
"_raw_data_params" : "{\"ConnectionId\":4,\"Name\":\"********\"}",
"_raw_data_table" : "_raw_github_api_pull_requests",
"_raw_data_id" : 4499,
"_raw_data_remark" : "",
"base_repo_id" : "github:GithubRepo:4:***********",
"base_ref" : "main",
"base_commit_sha" : "a51af0dabd2d9e9bd7b23e432688842cf086c248",
"head_repo_id" : "github:GithubRepo:4:*******",
"head_ref" : "feature\***************",
"head_commit_sha" : "e6a73f3095d1c97d4d4576e3c06719fb07938bd1",
"merge_commit_sha" : "b297c73092685c4b1a0ee62c2326d77115a86e94",
"status" : "MERGED",
"original_status" : "closed",
"type" : "",
"component" : "",
"title" : "****************************",
"description" : "null",
"url" : "https:\/\/github.com\/******\/*******\/pull\/1025",
"author_name" : "***************",
"author_id" : "github:GithubAccount:4:***********",
"parent_pr_id" : "",
"pull_request_key" : 1025,
"created_date" : "2024-04-04 09:08:03",
"merged_date" : "2024-04-05 09:00:16",
"closed_date" : "2024-04-05 09:00:16",
"additions" : 0,
"deletions" : 0,
"merged_by_name" : "",
"merged_by_id" : "github:GithubAccount:4:0",
"is_draft" : 0
}
]}
As you can see this PR was merged 2024-04-05 09:00:16 yet the deployment linked is incorrect, as it belongs to a different deployment.
{
"project_pr_metrics": [
{
"id" : "github:GithubPullRequest:4:1806504909",
"created_at" : "2024-11-12 08:38:29.797",
"updated_at" : "2024-11-12 08:38:29.797",
"_raw_data_params" : "",
"_raw_data_table" : "",
"_raw_data_id" : 0,
"_raw_data_remark" : "",
"project_name" : "*************",
"first_commit_sha" : "3726ef9fce54e1f95717ab0fbe695403cbf850ff",
"pr_coding_time" : 1,
"first_review_id" : "github:GithubPrReview:4:******",
"pr_pickup_time" : 4,
"pr_review_time" : 1429,
"deployment_commit_id" : "github:GithubRun:4:435823930:11612162783:https:\/\/github.com\/*****\/****",
"pr_deploy_time" : 301201,
"pr_cycle_time" : 302635,
"first_commit_authored_date" : "2024-04-04 09:07:51",
"pr_created_date" : "2024-04-04 09:08:03",
"first_comment_date" : "2024-04-04 09:11:44",
"pr_merged_date" : "2024-04-05 09:00:16",
"pr_deployed_date" : "2024-10-31 13:00:17"
}
]}
Manually checking in the cicd_deployment_commits for the merge_commit_sha (b297c73092685c4b1a0ee62c2326d77115a86e94) I can see that the deployment is there:
{
"cicd_deployment_commits": [
{
"id" : "github:GithubRun:4:435823930:8567339272:https:\/\/github.com\/l*****\/****",
"created_at" : "2024-11-12 11:17:32.066",
"updated_at" : "2024-11-12 11:17:33.602",
"_raw_data_params" : "{\"ConnectionId\":4,\"Name\":\"*****\/****\"}",
"_raw_data_table" : "_raw_github_api_runs",
"_raw_data_id" : 2338,
"_raw_data_remark" : "prev_deployment_commit_id_enricher,",
"cicd_scope_id" : "github:GithubRepo:4:435823930",
"cicd_deployment_id" : "github:GithubRun:4:435823930:8567339272",
"name" : "CI\/CD",
"result" : "SUCCESS",
"original_result" : "success",
"status" : "DONE",
"original_status" : "completed",
"environment" : "PRODUCTION",
"original_environment" : "",
"created_date" : "2024-04-05 09:00:19",
"queued_date" : null,
"queued_duration_sec" : null,
"started_date" : "2024-04-05 09:00:19",
"finished_date" : "2024-04-05 09:14:22",
"duration_sec" : 843.0,
"commit_sha" : "b297c73092685c4b1a0ee62c2326d77115a86e94",
"ref_name" : "main",
"repo_id" : "github:GithubRepo:4:435823930",
"repo_url" : "https:\/\/github.com\/*****\/****",
"prev_success_deployment_commit_id" : "",
"commit_msg" : "",
"subtask_name" : "dora.generateDeploymentCommits",
"display_title" : "Merge pull request #1025 from *****\/****",
"url" : "https:\/\/github.com\/*****\/****/actions\/runs\/8567339272"
}
]}
Upon inspection, I've noticed that the prev_success_deployment_commit_id field is empty in our data. I suspect this might be influencing the results of the query in the change_lead_time_calculator.go module.
I'm uncertain if the query should return null when faced with this condition, given that prev_success_deployment_commit_id is consistently empty for the first deployment recorded for the cicd_scope_id/repo_url/env grouping, as per code.
Interestingly, the linked "deployment_commit_id" : "github:GithubRun:4:435823930:11612162783 is the final entry in the cicd_deployment_commits table for the cicd_scope_id/repo_url/env grouping.
What do you expect to happen
PR deployment should be linked to the proper run for this example (github:GithubRun:4:435823930:8567339272:https:\/\/github.com\/l*****\/****)
How to reproduce
I will just describe what the status of that repo is, GitHub, deployment through CI/CD workflow, explore for the first PR deployment recorded by Devlake and evaluate the linked deployment.
This issue is only related to the first deployment for each cicd_scope_id/repo_url/env grouping.
Anything else
No response
Version
v1.0.1@e061ef2
Are you willing to submit PR?
- Yes I am willing to submit a PR!To pick up a draggable item, press the space bar. While dragging, use the arrow keys to move the item. Press space again to drop the item in its new position, or press escape to cancel.
Code of Conduct
- I agree to follow this project's Code of ConductTo pick up a draggable item, press the space bar. While dragging, use the arrow keys to move the item. Press space again to drop the item in its new position, or press escape to cancel.
Activity
dosubot commentedon Nov 12, 2024
Hello @kostas-petrakis! I'm here to help you with any bugs, questions, or contributions you have. Let me know how I can assist you while we wait for a human maintainer.
The issue you're encountering seems to be related to how the
getDeploymentCommitfunction handles cases where theprev_success_deployment_commit_idis empty. This situation typically occurs for the first deployment in acicd_scope_id/repo_url/envgrouping, as theEnrichPrevSuccessDeploymentCommitfunction resets this field to an empty string for the first deployment in a new set of consecutive deployments [1].In the
change_lead_time_calculator.gomodule, the logic for selecting the deployment commit might not be correctly handling cases whereprev_success_deployment_commit_idis empty, leading to incorrect deployment references being linked. This could explain why the deployment time is inaccurately reported as 30 weeks.To address this, you might need to adjust the logic in the
getDeploymentCommitfunction to ensure it correctly identifies and links the first deployment in such groupings. This could involve modifying the query to account for cases whereprev_success_deployment_commit_idis empty and ensuring it selects the appropriate deployment commit for the first deployment scenario.Since this issue is specific to the first deployment in each grouping, reviewing and potentially updating the logic in both the
getDeploymentCommitandEnrichPrevSuccessDeploymentCommitfunctions to handle these edge cases more effectively would be a good starting point.To continue talking to Dosu, mention @dosu.
Startrekzky commentedon Nov 27, 2024
Hi @kostas-petrakis , I saw your PR #8206
One of DevLake's strategies of
PR-deployment_commitmapping is to skip the mapping of the very firstdeployment_commitin DevLake. Why?Because in many cases,
PRsanddeployment_commitswithin a selected timeframe.precise commitsdeployed by the very first deployment_commit collected by DevLake.In your case, the PR
github:GithubPullRequest:4:1806504909might be deployed by the first deployement_commitgithub:GithubRun:4:435823930:8567339272:https:\/\/github.com\/l*****\/****for sure.However, if we allow the first deployment_commit mapped to PRs. If the
deploy timebetween the prior deployment to the first deployment collected by DevLake, many irrelevant PRs merged during this period will be mis-mapped to the deployment commit (some other users had encountered the problem).Finally, we adopted the above strategy of excluding the very first deployment when mapping to PR.
kostas-petrakis commentedon Dec 4, 2024
@Startrekzky thanks for the explanation. However, in my specific use case (and I assume to others looking at the upvotes), I've observed that PRs associated with the initial recorded commit are being mapped to an incorrect deployment. This seems to introduce a different kind of mis-mapping issue, which affects the accuracy of our metrics (as explained with the data above).
I'm wondering if there might be a way to refactor the PR-deployment mapping process to handle this situation more accurately.
github-actions commentedon Feb 3, 2025
This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.
marktuckcp commentedon Feb 7, 2025
We've been tracking this and it's still an issue for us! Although we're transitioning over to Github Actions we still have a lot of teams using Github for commits and an old Jenkins set up for deploy. Where devlake is finding PRs within Github but can't find a corresponding deploy it's choosing (we think) the first and so earliest deploy available to it, creating massive data skews of 10s of weeks.
If there is a PR that says it's been deployed but devlake can't find a deploy job we'd expect it to just not include that PR at all
kostas-petrakis commentedon Feb 26, 2025
We are in the process of analyzing some of the PRs that have been wrongly mapped to non relevant deployments. This is likely due to the previously mentioned query, which appears to be producing inaccurate results during transformation. We are currently investigating affected PRs.
marktuckcp commentedon Mar 25, 2025
Appreciate it @kostas-petrakis ! We'll keep an eye on this thread, thanks
nicolavolpini commentedon May 8, 2025
Hello @kostas-petrakis , we are also keeping track of this ticket since we are affected.
Have you found out anything more about your affected PRs? Thanks.
3 remaining items
kostas-petrakis commentedon May 23, 2025
Following up on this issue now that we've resolved the more urgent missing DORA calculations.
I can confirm that we're still seeing the same behavior on some of the newly onboarded services. As @hera-a-glitch pointed out, the behavior isn't entirely consistent—likely due to recalculation of DORA? (need to check)
From what I’ve observed, this issue only affects the first batch of collected commits. The current logic attempts to find a previous deployment; if none is found, it links the PR deployment to the latest available one (if I’m recalling the behavior correctly).
I'll dig deeper into this and explore potential solutions. Thanks for your patience!
github-actions commentedon Jul 23, 2025
This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.
github-actions commentedon Jul 30, 2025
This issue has been closed because it has been inactive for a long time. You can reopen it if you encounter the similar problem in the future.
petkostas commentedon Aug 23, 2025
I am re-opening this as it is still affecting some of our repositories and raising questions from engineering teams.
I took some time to think about a possible approach which would possibly reduce the effect.
@Startrekzky
I am thinking of the following approach:
Adjust
getDeploymentCommitto use a two-phase strategy:commit_shaequals the PR’s merge commit, return it. This is precise and does not risk the first-deployment over-mapping problem.dc.prev_success_deployment_commit_id <> '') to avoid mapping to the first deployment via diffs.The following is how this translates to the implementation (adding here for discussion before opening a PR):
Startrekzky commentedon Aug 25, 2025
Hi @petkostas , thanks for the proposal. I'll discuss with @klesh to figure out how to optimize this part.
github-actions commentedon Oct 25, 2025
This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.
petkostas commentedon Oct 25, 2025
Will test next week