Skip to content

Allow gitlab URL link shortening from non-gitlab/github.com domains #2068

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

mattpitkin
Copy link

@mattpitkin mattpitkin commented Dec 4, 2024

This is a PR for #2065. It allows URL shortening for GitLab/GitHub links that use the gitlab_url/github_url given in html_context, e.g.,:

html_context = {
    "gitlab_url": "https://gitlab.mydomain.com/",
    ...
}

rather than just for GitLab/Github URLs that are in the git[lab/hub].com domain.

Update: I've expanded this PR to also add in the ability to shorten bitbucket URLs.

@mattpitkin mattpitkin changed the title Allow gitlab URL link shortening from non-gitlab.com domains Allow gitlab URL link shortening from non-gitlab/github.com domains Dec 4, 2024
@trallard trallard added the kind: enhancement New feature or request label Dec 10, 2024
@gabalafou gabalafou requested review from gabalafou June 17, 2025 11:46
@gabalafou
Copy link
Collaborator

Thanks @mattpitkin for this work! I felt that this work needed more tests, so I added tests, and that led to some slight refactoring.

@drammock, do you think you could give my additions a review? Please bear in mind that a lot of the test cases are marked with "TODO: this is wrong", but the same is true not just for the bitbucket URLs but for some of the GitHub and GitLab URLs, as well. I think those problems should maybe be tackled in a separate PR. What do you think?

@gabalafou gabalafou requested a review from drammock June 17, 2025 18:53
Comment on lines +37 to +38
# TODO: this is wrong
"pydata/pydata-sphinx-theme#issues",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should probably be pydata/pydata-sphinx-theme/issues I think. Let's do in a follow-up PR

Comment on lines +41 to +42
# TODO: should this be shortened the way that GitHub does it:
# pydata/pydata-sphinx-theme@3caf346
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would be ideal. Let's do in a follow-up PR

Comment on lines 47 to 49
# TODO, I belive this is wrong as both orgs/pydata/projects/2 and
# pydata/projects/issue/2 shorten to the same
("github", "https://github.com/orgs/pydata/projects/2", "pydata/projects#2"),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I agree this looks wrong. The collision is unlikely (would need a repo called "projects") but still, I think the # should only be used for issues and PRs, not projects. So this should be pydata/projects/2 I guess.

Comment on lines +105 to +119
elif self.platform == "bitbucket":
# split the url content
parts = path.split("/")

if len(parts) > 0:
text = parts[0] # organisation
if len(parts) > 1 and not (
parts[-2] == "workspace" and parts[-1] == "overview"
):
if len(parts) > 1:
text += f"/{parts[1]}" # repository
if len(parts) > 2:
if parts[2] in ["issues", "pull-requests"]:
itemnumber = parts[-1]
text += f"#{itemnumber}" # element number
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is hard to follow, and feels brittle to me. Why aren't we using regex for this? I worked up this gist as a POC and it seems to capture the behavior that we want:

https://gist.github.com/drammock/78d2d3c9837aafd1259866c7b936b9e4

presumably similar things will work for other forges (though IIRC gitlab is a bit more complex). @gabalafou WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. The code in this PR follows patterns already established for GitHub and GitLab, but I think those patterns are bad.

The way I think this should work is that we should be rather picky about which URL patterns we recognize/support and only shorten those. All the other URLs should be not be shortened. But right now, the code takes the opposite approach, as soon as it sees github.com or gitlab.com, it tries to shorten the URL.

For example, let's say we were just starting to support GitHub URLs. But in this scenario, let's start only with supporting pull request URLs.

Then we would convert the following link, like so:

https://github.com/pydata/pydata-sphinx-theme/101 
  => pydata/pydata-sphinx-theme#101

But we would not convert any of the following links (if we were only supporting pull request links):

https://github.com/pydata/pydata-sphinx-theme/issues
https://github.com/pydata/pydata-sphinx-theme/issues/
https://github.com/pydata/
https://github.com/pydata/pydata-sphinx-theme/commit/3caf346cacd2dad2a192a83c6cc9f8852e5a722e

None of those links would get shortened until we specifically add each type of URL that we want to support.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to go down the route of a rewrite of the link shortening logic, I would be happy to do it. In that case, I would just close this PR and open a new one.

I would abandon the feature in this PR to turn off link shortening because I suspect that part of the motivation for adding a config value to turn it off is because our current link shortener is bad. Perhaps with a better shortener, there would be little to no demand for a config setting to turn it off. What do you think, @mattpitkin?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in this PR follows patterns already established for GitHub and GitLab, but I think those patterns are bad. [...] If we want to go down the route of a rewrite of the link shortening logic, I would be happy to do it. In that case, I would just close this PR and open a new one.

Since I think I've basically solved it for bitbucket in the linked Gist, I feel like it might be the right call to just rewrite the others too. +1 to close and open a new PR to refactor what we have and also add bitbucket. See also #2215 which adds link shortening for codeberg/forgejo and gitea.

Co-authored-by: Daniel McCloy <dan@mccloy.info>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants