Skip to content

[Bug][DORA-change_lead_time_calculator] Wrong deployment reference #8188

@kostas-petrakis

Description

@kostas-petrakis
Contributor

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

While analyzing the data gathered from our GitHub integration, I stumbled upon an anomaly in a minor subset of deployments. Oddly enough, some deployments were reporting a deployment of incosistent time (in my example 30 weeks).
Given my familiarity with our GitHub operations, I was certain this wasn't accurate, prompting me to dig deeper. My investigation points towards a potential issue with the getDeploymentCommit function, particularly in relation to EnrichPrevSuccessDeploymentCommit.

To illustrate, I've attached an example PR

{
"pull_requests": [
	{
		"id" : "github:GithubPullRequest:4:1806504909",
		"created_at" : "2024-11-12 08:34:13.134",
		"updated_at" : "2024-11-12 08:34:13.134",
		"_raw_data_params" : "{\"ConnectionId\":4,\"Name\":\"********\"}",
		"_raw_data_table" : "_raw_github_api_pull_requests",
		"_raw_data_id" : 4499,
		"_raw_data_remark" : "",
		"base_repo_id" : "github:GithubRepo:4:***********",
		"base_ref" : "main",
		"base_commit_sha" : "a51af0dabd2d9e9bd7b23e432688842cf086c248",
		"head_repo_id" : "github:GithubRepo:4:*******",
		"head_ref" : "feature\***************",
		"head_commit_sha" : "e6a73f3095d1c97d4d4576e3c06719fb07938bd1",
		"merge_commit_sha" : "b297c73092685c4b1a0ee62c2326d77115a86e94",
		"status" : "MERGED",
		"original_status" : "closed",
		"type" : "",
		"component" : "",
		"title" : "****************************",
		"description" : "null",
		"url" : "https:\/\/github.com\/******\/*******\/pull\/1025",
		"author_name" : "***************",
		"author_id" : "github:GithubAccount:4:***********",
		"parent_pr_id" : "",
		"pull_request_key" : 1025,
		"created_date" : "2024-04-04 09:08:03",
		"merged_date" : "2024-04-05 09:00:16",
		"closed_date" : "2024-04-05 09:00:16",
		"additions" : 0,
		"deletions" : 0,
		"merged_by_name" : "",
		"merged_by_id" : "github:GithubAccount:4:0",
		"is_draft" : 0
	}
]}

As you can see this PR was merged 2024-04-05 09:00:16 yet the deployment linked is incorrect, as it belongs to a different deployment.

{
"project_pr_metrics": [
	{
		"id" : "github:GithubPullRequest:4:1806504909",
		"created_at" : "2024-11-12 08:38:29.797",
		"updated_at" : "2024-11-12 08:38:29.797",
		"_raw_data_params" : "",
		"_raw_data_table" : "",
		"_raw_data_id" : 0,
		"_raw_data_remark" : "",
		"project_name" : "*************",
		"first_commit_sha" : "3726ef9fce54e1f95717ab0fbe695403cbf850ff",
		"pr_coding_time" : 1,
		"first_review_id" : "github:GithubPrReview:4:******",
		"pr_pickup_time" : 4,
		"pr_review_time" : 1429,
		"deployment_commit_id" : "github:GithubRun:4:435823930:11612162783:https:\/\/github.com\/*****\/****",
		"pr_deploy_time" : 301201,
		"pr_cycle_time" : 302635,
		"first_commit_authored_date" : "2024-04-04 09:07:51",
		"pr_created_date" : "2024-04-04 09:08:03",
		"first_comment_date" : "2024-04-04 09:11:44",
		"pr_merged_date" : "2024-04-05 09:00:16",
		"pr_deployed_date" : "2024-10-31 13:00:17"
	}
]}

Manually checking in the cicd_deployment_commits for the merge_commit_sha (b297c73092685c4b1a0ee62c2326d77115a86e94) I can see that the deployment is there:

{
"cicd_deployment_commits": [
	{
		"id" : "github:GithubRun:4:435823930:8567339272:https:\/\/github.com\/l*****\/****",
		"created_at" : "2024-11-12 11:17:32.066",
		"updated_at" : "2024-11-12 11:17:33.602",
		"_raw_data_params" : "{\"ConnectionId\":4,\"Name\":\"*****\/****\"}",
		"_raw_data_table" : "_raw_github_api_runs",
		"_raw_data_id" : 2338,
		"_raw_data_remark" : "prev_deployment_commit_id_enricher,",
		"cicd_scope_id" : "github:GithubRepo:4:435823930",
		"cicd_deployment_id" : "github:GithubRun:4:435823930:8567339272",
		"name" : "CI\/CD",
		"result" : "SUCCESS",
		"original_result" : "success",
		"status" : "DONE",
		"original_status" : "completed",
		"environment" : "PRODUCTION",
		"original_environment" : "",
		"created_date" : "2024-04-05 09:00:19",
		"queued_date" : null,
		"queued_duration_sec" : null,
		"started_date" : "2024-04-05 09:00:19",
		"finished_date" : "2024-04-05 09:14:22",
		"duration_sec" : 843.0,
		"commit_sha" : "b297c73092685c4b1a0ee62c2326d77115a86e94",
		"ref_name" : "main",
		"repo_id" : "github:GithubRepo:4:435823930",
		"repo_url" : "https:\/\/github.com\/*****\/****",
		"prev_success_deployment_commit_id" : "",
		"commit_msg" : "",
		"subtask_name" : "dora.generateDeploymentCommits",
		"display_title" : "Merge pull request #1025 from *****\/****",
		"url" : "https:\/\/github.com\/*****\/****/actions\/runs\/8567339272"
	}
]}

Upon inspection, I've noticed that the prev_success_deployment_commit_id field is empty in our data. I suspect this might be influencing the results of the query in the change_lead_time_calculator.go module.

I'm uncertain if the query should return null when faced with this condition, given that prev_success_deployment_commit_id is consistently empty for the first deployment recorded for the cicd_scope_id/repo_url/env grouping, as per code.

Interestingly, the linked "deployment_commit_id" : "github:GithubRun:4:435823930:11612162783 is the final entry in the cicd_deployment_commits table for the cicd_scope_id/repo_url/env grouping.

What do you expect to happen

PR deployment should be linked to the proper run for this example (github:GithubRun:4:435823930:8567339272:https:\/\/github.com\/l*****\/****)

How to reproduce

I will just describe what the status of that repo is, GitHub, deployment through CI/CD workflow, explore for the first PR deployment recorded by Devlake and evaluate the linked deployment.
This issue is only related to the first deployment for each cicd_scope_id/repo_url/env grouping.

Anything else

No response

Version

v1.0.1@e061ef2

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct

Activity

added
devopsSomething about CI/CD (devops)
severity/p1This bug affects functionality or significantly affect ux
on Nov 12, 2024
dosubot

dosubot commented on Nov 12, 2024

@dosubot

Hello @kostas-petrakis! I'm here to help you with any bugs, questions, or contributions you have. Let me know how I can assist you while we wait for a human maintainer.

The issue you're encountering seems to be related to how the getDeploymentCommit function handles cases where the prev_success_deployment_commit_id is empty. This situation typically occurs for the first deployment in a cicd_scope_id/repo_url/env grouping, as the EnrichPrevSuccessDeploymentCommit function resets this field to an empty string for the first deployment in a new set of consecutive deployments [1].

In the change_lead_time_calculator.go module, the logic for selecting the deployment commit might not be correctly handling cases where prev_success_deployment_commit_id is empty, leading to incorrect deployment references being linked. This could explain why the deployment time is inaccurately reported as 30 weeks.

To address this, you might need to adjust the logic in the getDeploymentCommit function to ensure it correctly identifies and links the first deployment in such groupings. This could involve modifying the query to account for cases where prev_success_deployment_commit_id is empty and ensuring it selects the appropriate deployment commit for the first deployment scenario.

Since this issue is specific to the first deployment in each grouping, reviewing and potentially updating the logic in both the getDeploymentCommit and EnrichPrevSuccessDeploymentCommit functions to handle these edge cases more effectively would be a good starting point.

To continue talking to Dosu, mention @dosu.

Startrekzky

Startrekzky commented on Nov 27, 2024

@Startrekzky
Contributor

Hi @kostas-petrakis , I saw your PR #8206

One of DevLake's strategies of PR-deployment_commit mapping is to skip the mapping of the very first deployment_commit in DevLake. Why?

Because in many cases,

  1. Users will collect both PRs and deployment_commits within a selected timeframe.
  2. When finding a PR's associated deployment_commit, we need to diff the commits between every consecutive deployment_commits, which means we don't know the precise commits deployed by the very first deployment_commit collected by DevLake.
  3. Hence, we don't know the PRs deployed by the very first deployment_commit.

In your case, the PR github:GithubPullRequest:4:1806504909 might be deployed by the first deployement_commit github:GithubRun:4:435823930:8567339272:https:\/\/github.com\/l*****\/**** for sure.

However, if we allow the first deployment_commit mapped to PRs. If the deploy time between the prior deployment to the first deployment collected by DevLake, many irrelevant PRs merged during this period will be mis-mapped to the deployment commit (some other users had encountered the problem).

Finally, we adopted the above strategy of excluding the very first deployment when mapping to PR.

kostas-petrakis

kostas-petrakis commented on Dec 4, 2024

@kostas-petrakis
ContributorAuthor

@Startrekzky thanks for the explanation. However, in my specific use case (and I assume to others looking at the upvotes), I've observed that PRs associated with the initial recorded commit are being mapped to an incorrect deployment. This seems to introduce a different kind of mis-mapping issue, which affects the accuracy of our metrics (as explained with the data above).
I'm wondering if there might be a way to refactor the PR-deployment mapping process to handle this situation more accurately.

github-actions

github-actions commented on Feb 3, 2025

@github-actions
Contributor

This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.

marktuckcp

marktuckcp commented on Feb 7, 2025

@marktuckcp

We've been tracking this and it's still an issue for us! Although we're transitioning over to Github Actions we still have a lot of teams using Github for commits and an old Jenkins set up for deploy. Where devlake is finding PRs within Github but can't find a corresponding deploy it's choosing (we think) the first and so earliest deploy available to it, creating massive data skews of 10s of weeks.

If there is a PR that says it's been deployed but devlake can't find a deploy job we'd expect it to just not include that PR at all

kostas-petrakis

kostas-petrakis commented on Feb 26, 2025

@kostas-petrakis
ContributorAuthor

We are in the process of analyzing some of the PRs that have been wrongly mapped to non relevant deployments. This is likely due to the previously mentioned query, which appears to be producing inaccurate results during transformation. We are currently investigating affected PRs.

marktuckcp

marktuckcp commented on Mar 25, 2025

@marktuckcp

Appreciate it @kostas-petrakis ! We'll keep an eye on this thread, thanks

nicolavolpini

nicolavolpini commented on May 8, 2025

@nicolavolpini

Hello @kostas-petrakis , we are also keeping track of this ticket since we are affected.
Have you found out anything more about your affected PRs? Thanks.

3 remaining items

kostas-petrakis

kostas-petrakis commented on May 23, 2025

@kostas-petrakis
ContributorAuthor

Following up on this issue now that we've resolved the more urgent missing DORA calculations.
I can confirm that we're still seeing the same behavior on some of the newly onboarded services. As @hera-a-glitch pointed out, the behavior isn't entirely consistent—likely due to recalculation of DORA? (need to check)
From what I’ve observed, this issue only affects the first batch of collected commits. The current logic attempts to find a previous deployment; if none is found, it links the PR deployment to the latest available one (if I’m recalling the behavior correctly).
I'll dig deeper into this and explore potential solutions. Thanks for your patience!

github-actions

github-actions commented on Jul 23, 2025

@github-actions
Contributor

This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.

github-actions

github-actions commented on Jul 30, 2025

@github-actions
Contributor

This issue has been closed because it has been inactive for a long time. You can reopen it if you encounter the similar problem in the future.

self-assigned this
on Aug 23, 2025
petkostas

petkostas commented on Aug 23, 2025

@petkostas
Contributor

I am re-opening this as it is still affecting some of our repositories and raising questions from engineering teams.
I took some time to think about a possible approach which would possibly reduce the effect.
@Startrekzky
I am thinking of the following approach:

Adjust getDeploymentCommit to use a two-phase strategy:

  • Direct match: if there exists a successful PRODUCTION deployment in the project whose commit_sha equals the PR’s merge commit, return it. This is precise and does not risk the first-deployment over-mapping problem.
  • Fallback to diff-based mapping: retain the current strategy (including the filter dc.prev_success_deployment_commit_id <> '') to avoid mapping to the first deployment via diffs.

The following is how this translates to the implementation (adding here for discussion before opening a PR):

// getDeploymentCommit takes a merge commit SHA, a repository ID, a list of deployment pairs, and a database connection as input.
// It returns the deployment pair related to the merge commit, or nil if not found.
func getDeploymentCommit(mergeSha string, projectName string, db dal.Dal) (*devops.CicdDeploymentCommit, errors.Error) {
	// Phase 1: direct mapping when the deployment run's commit_sha equals the PR's merge commit.
	// This is precise and safe, and should be allowed even if it's the first deployment in the collected window.
	direct := make([]*devops.CicdDeploymentCommit, 0, 1)
	err := db.All(
		&direct,
		dal.Select("dc.*"),
		dal.From("cicd_deployment_commits dc"),
		dal.Join("LEFT JOIN project_mapping pm ON (pm.table = 'cicd_scopes' AND pm.row_id = dc.cicd_scope_id)"),
		dal.Where("pm.project_name = ? AND dc.commit_sha = ? AND dc.result = ? AND dc.environment = 'PRODUCTION'", projectName, mergeSha, devops.RESULT_SUCCESS),
		dal.Orderby("dc.started_date, dc.id ASC"),
		dal.Limit(1),
	)
	if err != nil {
		return nil, err
	}
	if len(direct) > 0 {
		return direct[0], nil
	}

	// Phase 2: fall back to diff-based mapping (skipping the first deployment to avoid over-mapping).
	deploymentCommits := make([]*devops.CicdDeploymentCommit, 0, 1)
	// do not use `.First` method since gorm would append ORDER BY ID to the query which leads to an error
	err = db.All(
		&deploymentCommits,
		dal.Select("dc.*"),
		dal.From("cicd_deployment_commits dc"),
		dal.Join("LEFT JOIN cicd_deployment_commits p ON (dc.prev_success_deployment_commit_id = p.id)"),
		dal.Join("LEFT JOIN project_mapping pm ON (pm.table = 'cicd_scopes' AND pm.row_id = dc.cicd_scope_id)"),
		dal.Join("INNER JOIN commits_diffs cd ON (cd.new_commit_sha = dc.commit_sha AND cd.old_commit_sha = COALESCE (p.commit_sha, ''))"),
		dal.Where("dc.prev_success_deployment_commit_id <> ''"),
		dal.Where("dc.environment = 'PRODUCTION'"), // TODO: remove this when multi-environment is supported
		dal.Where("pm.project_name = ? AND cd.commit_sha = ? AND dc.RESULT = ?", projectName, mergeSha, devops.RESULT_SUCCESS),
		dal.Orderby("dc.started_date, dc.id ASC"),
		dal.Limit(1),
	)
	if err != nil {
		return nil, err
	}
	if len(deploymentCommits) == 0 {
		return nil, nil
	}
	return deploymentCommits[0], nil
}
Startrekzky

Startrekzky commented on Aug 25, 2025

@Startrekzky
Contributor

Hi @petkostas , thanks for the proposal. I'll discuss with @klesh to figure out how to optimize this part.

github-actions

github-actions commented on Oct 25, 2025

@github-actions
Contributor

This issue has been automatically marked as stale because it has been inactive for 60 days. It will be closed in next 7 days if no further activity occurs.

petkostas

petkostas commented on Oct 25, 2025

@petkostas
Contributor

Will test next week

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

Labels

devopsSomething about CI/CD (devops)severity/p1This bug affects functionality or significantly affect uxtype/bugThis issue is a bug

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

    Participants

    @petkostas@Startrekzky@nicolavolpini@kostas-petrakis@marktuckcp

    Issue actions

      [Bug][DORA-change_lead_time_calculator] Wrong deployment reference · Issue #8188 · apache/incubator-devlake