Skip to content

Add recipe to update SCM tag based on Git origin #5647

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jul 19, 2025

Conversation

e5LA
Copy link
Contributor

@e5LA e5LA commented Jun 20, 2025

What's changed?

  • Added a new recipe UpdateScmFromGitOrigin that updates the <scm> section in pom.xml based on the Git origin URL.
  • Added helper class ScmValues for URL generation logic.
  • Added search recipe FindScm used as a precondition to check if <scm> tag is present.

What's your motivation?

Anything in particular you'd like reviewers to focus on?

As it's my first recipe, I would say:

  • Use of Visitor + use of Preconditions.check(new FindScm(), ...)
  • Use of GitProvenance and initalization of git
  • Correctness / completeness of the ScmValues logic for various Git origin formats

Checklist

  • I've added unit tests to cover both positive and negative cases
  • I've read and applied the recipe conventions and best practices
  • I've used the IntelliJ IDEA auto-formatter on affected files

@e5LA e5LA force-pushed the update-scm-receipe branch from 8934852 to 0b2bc05 Compare June 20, 2025 14:23
*/
package org.openrewrite.maven.utilities;

public class ScmValues {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be used by future AddScm recipe.

Also, once higher java is supported, could be replaced by Record

@e5LA e5LA marked this pull request as draft June 20, 2025 15:25
@e5LA e5LA marked this pull request as ready for review June 20, 2025 20:03
@e5LA
Copy link
Contributor Author

e5LA commented Jun 20, 2025

Hmm, I'm not sure why these tests have failed. I don't think my changes impact them.
Are there any known issues or test flakiness that might explain this?

If someone with permissions could help re-trigger the build, that’d be appreciated.

@timtebeek timtebeek added the recipe Requested Recipe label Jun 23, 2025
@timtebeek timtebeek moved this from In Progress to Ready to Review in OpenRewrite Jun 23, 2025
@e5LA
Copy link
Contributor Author

e5LA commented Jun 23, 2025

Thanks for triggering the build, @timtebeek.
One of my tests was failing - must have missed it locally after some cleanup and refactoring..

Comment on lines 34 to 52
if (origin.startsWith("git@")) {
// SSH origin
String hostAndPath = cleanOrigin.substring("git@".length()).replaceFirst(":", "/");
url = "https://" + hostAndPath;
connection = "scm:git:https://" + hostAndPath + ".git";
developerConnection = "scm:git:" + origin;
} else if (origin.startsWith("http://") || origin.startsWith("https://")) {
// HTTPS origin
url = cleanOrigin;
connection = "scm:git:" + origin;
String sshPath = cleanOrigin
.replaceFirst("^https?://", "") // github.com/user/repo
.replaceFirst("/", ":"); // github.com:user/repo
developerConnection = "scm:git:git@" + sshPath + ".git";
} else {
url = cleanOrigin;
connection = "scm:git:" + origin;
developerConnection = "scm:git:" + origin;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder a bit about the logic here specifically for developerConnection, as we already have GitRemote to go in the other direction that seems to differ for some Services. Would it make sense to harmonize these here?

servers.add(new RemoteServer(Service.GitHub, "github.com", URI.create("https://github.com"), URI.create("ssh://git@github.com")));
servers.add(new RemoteServer(Service.GitLab, "gitlab.com", URI.create("https://gitlab.com"), URI.create("ssh://git@gitlab.com")));
servers.add(new RemoteServer(Service.BitbucketCloud, "bitbucket.org", URI.create("https://bitbucket.org"), URI.create("ssh://git@bitbucket.org")));
servers.add(new RemoteServer(Service.AzureDevOps, "dev.azure.com", URI.create("https://dev.azure.com"), URI.create("ssh://git@ssh.dev.azure.com")));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I feel like maybe we can keep ScmValues as private class of UpdateScmfromGitOrigin for now, as we're usually a bit hesitant to add public API surface before we've seen a few more use cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing and making the changes @timtebeek !

I have just looked closely at GitRemote class and to be honest, I'm not sure how we could harmonize it.

Tbh, more I look at the values, more confused I get..
AFAIK, in SCM we need:

<url>

which usually is http or https, without .git at the end of the path. So e.g.: <url>https://new-server.example.com/username/repo</url>

We can't use git.getOrigin() directly (because it may contain value like: git@new-server.example.com:username/repo.git) so we would need to do some String operations.

Looking at git.getGitRemote():

{GitRemote@5224}
 service = {GitRemote$Service@5225} "Unknown"
 url = "git@new-server.example.com:username/repo.git"
 origin = "new-server.example.com"
 path = "username/repo"
 organization = "username"
 repositoryName = "repo"

we could build url from origin + path + .git. We would also need to derive the protocol from url.
So it's kind of the same as we do in ScmValues, isn't it?

I also wonder if for origin that has ssh:// as protocol, shall we still use https or keep ssh:// in url?

<connection>

starts with scm:git: and then we could put value of <url> + .git. We can't use url directly.

<developerConnection>

starts with scm:git:. Then I'm not sure if we should always go with e.g. git@github.com:username/repo.git even if git.origin contains https as protocol. Because in fact 'write-access' could also go via https afaik.
For ssh:// I would assume it's: scm:git:ssh://git@github.com/username/repo.git

In any case, I'm not sure if GitRemote can help here. For reference, for https protocol we get:

result = {GitRemote@5222} 
 service = {GitRemote$Service@5223} "Unknown"
 url = "https://new-server.example.com/username/repo.git"
 origin = "new-server.example.com"
 path = "username/repo"
 organization = "username"
 repositoryName = "repo"

--
Did I miss anything? Are these SCM values Service -specific in any way? Could we easily derive them from GitRemote?
Another idea would be to extend GitRemote with the logic to derive the values, but not sure if scm-specific logic really belongs there.

Regarding making ScmValues class private - happy to do so if we decide to keep it.
Having that said, I wonder how we’ll reuse this logic when creating a new Recipe to add the tag (#5561). (But maybe that’s a tomorrow's problem :) )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @timtebeek, I've marked ScmValues as private for now.
Is there anything else I can do before we move this forward?

Copy link
Member

@timtebeek timtebeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see already! Shaping up nicely. I made some small changes already, and a suggestion to reduce the public API we start with; I think once addressed we could add this fairly soon after. Thanks again!

@timtebeek
Copy link
Member

timtebeek commented Jul 5, 2025

Great to see! I've pushed up some light polish, and I'm not spot checking the results against Apache projects. I'm finding a few patterns that I wonder if we should support, such as for instance
image

Notice how

  • we're stripping off the .git suffix on <connection>; should we retain the .git suffix there if present previously ?
  • switching the <developerConnection> from https to git@; should we allow <developerConnection>scm:git:https://... ?
  • correctly adding the incubator- prefix that was missing before.

image

Ideally I think we'd see minimal changes unless there's a host/path that needs to be corrected; that way we can run this recipe broadly and only have to review proper fixes as opposed to a broader set of potentially more contentious changes.

@timtebeek
Copy link
Member

If you'd like to similarly see the effects at scale, then you can follow these steps:

  1. Download the Moderne CLI
  2. Set your recipe as the active recipe (requires a license :( )
  3. mod git clone moderne apache/ Apache --metadata-only
  4. mod build apache/ --download-only
  5. mod run apache/ --active-recipe

You can also achieve step 5 without step 2 by using ./gradlew pTML followed by mod config recipes jar install org.openrewrite:rewrite-maven:LATEST.

Hope that helps!

@e5LA
Copy link
Contributor Author

e5LA commented Jul 8, 2025

Thanks @timtebeek for the feedback and instructions how to run the Recipe on apache repos. Really helpful.

After seeing the changes you got in the apache repo, I agree with you - we should indeed try to keep the changes minimal..

So, another approach I’m considering, would be: instead of generating the SCM url, connection and developerConnect values entirely from git.origin and then replacing the full value in the tags - I would rather extract just the host and path from git.origin and use that to replace only the corresponding part of the existing SCM values. This way, we keep the original protocol and .git suffix.

With this approach, we may not be able to update the protocol.. But it’s likely a safer option overall.

Would you agree with this approach, or do you see any other potential issues?

@timtebeek
Copy link
Member

That's indeed likely a safer option. One thing to keep an eye on then still is that some git hosts use a context-path like /scm which for instance github.com does not have. As always there's more edge cases the closer you look, and scale helps bring those to light. :) Appreciate your help here!

@e5LA e5LA force-pushed the update-scm-receipe branch from 775baa6 to 8f3460a Compare July 9, 2025 19:21
@timtebeek timtebeek self-requested a review July 9, 2025 21:29
@e5LA
Copy link
Contributor Author

e5LA commented Jul 10, 2025

I've updated the logic, now it looks a bit more complex..

I've also run it against few apache projects:
image

--

image

--

image

@timtebeek
Copy link
Member

Thanks a lot! I'll be camping over the weekend & back Wednesday, so review from my side might take a while longer, but know it's appreciated and I hope to get it in soon after I'm back.

Copy link
Member

@timtebeek timtebeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see! Thanks a lot for a neat addition for the Apache Maven best practices.

@timtebeek timtebeek merged commit 3db8e80 into openrewrite:main Jul 19, 2025
2 checks passed
@github-project-automation github-project-automation bot moved this from Ready to Review to Done in OpenRewrite Jul 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
recipe Requested Recipe
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Maven - Update the scm configuration to match the origin of the repository
2 participants