Allow gitlab URL link shortening from non-gitlab/github.com domains #2068

Open
wants to merge 17 commits into
base: main
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -39,6 +39,7 @@ repos:
hooks:
- id: djlint-jinja
types_or: ["html"]
exclude: ^tests/test_build/.*\.html$
Collaborator

The linter was complaining about an http link in a test fixture, so I excluded it here.


- repo: "https://github.com/PyCQA/doc8"
rev: v1.1.2
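
The `exclude` pattern added in this hunk can be tried against a few paths. pre-commit treats `exclude` as a Python regular expression; this sketch assumes it is applied with `re.search` to repo-relative paths, and the file names below are invented for illustration.

```python
import re

# Pattern copied from the `exclude` line in .pre-commit-config.yaml.
EXCLUDE = re.compile(r"^tests/test_build/.*\.html$")

# Hypothetical repo-relative paths, only for illustration.
paths = [
    "tests/test_build/navbar.html",
    "tests/test_build/sub/page.html",
    "tests/test_build/notes.txt",
    "docs/tests/test_build/page.html",
]

# Paths the hook would skip under the assumption above.
excluded = [p for p in paths if EXCLUDE.search(p)]
print(excluded)
```

Only the first two paths are skipped: the pattern is anchored at the start of the path and requires an `.html` suffix.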
2 changes: 2 additions & 0 deletions docs/user_guide/source-buttons.rst
@@ -4,6 +4,8 @@ Source Buttons

Source buttons are links to the source of your page's content (either on your site, or on hosting sites like GitHub).

.. _add-edit-button:

Add an edit button
==================

22 changes: 21 additions & 1 deletion docs/user_guide/theme-elements.md
@@ -212,7 +212,7 @@ All will end up as numbers in the rendered HTML, but in the source they look lik

## Link shortening for git repository services

Many projects have links back to their issues / PRs hosted on platforms like **GitHub** or **GitLab**.
Many projects have links back to their issues / PRs hosted on platforms like **GitHub**, **GitLab**, or **Bitbucket**.
Instead of displaying these as raw links, this theme does some lightweight formatting for these platforms specifically.

In **reStructuredText**, URLs are automatically converted to links, so this works automatically.
@@ -252,5 +252,25 @@ There are a variety of link targets supported, here's a table for reference:
- `https://gitlab.com/gitlab-org`: https://gitlab.com/gitlab-org
- `https://gitlab.com/gitlab-org/gitlab`: https://gitlab.com/gitlab-org/gitlab
- `https://gitlab.com/gitlab-org/gitlab/-/issues/375583`: https://gitlab.com/gitlab-org/gitlab/-/issues/375583
- `https://gitlab.com/gitlab-org/gitlab/-/merge_requests/174667`: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/174667

**Bitbucket**

- `https://bitbucket.org`: https://bitbucket.org
- `https://bitbucket.org/atlassian/workspace/overview`: https://bitbucket.org/atlassian/workspace/overview
- `https://bitbucket.org/atlassian/aui`: https://bitbucket.org/atlassian/aui
- `https://bitbucket.org/atlassian/aui/pull-requests/4758`: https://bitbucket.org/atlassian/aui/pull-requests/4758

Links provided with a text body won't be changed.

If you have links to GitHub, GitLab, or Bitbucket repository URLs that are on non-standard domains
(i.e., not on `github.com`, `gitlab.com`, or `bitbucket.org`, respectively), then these will be
shortened if the base URL is given in the `html_context` section of your `conf.py` file (see
{ref}`Add an edit button <add-edit-button>`), e.g.,

```python
html_context = {
"gitlab_url": "https://gitlab.mydomain.com", # your self-hosted GitLab
...
}
```
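
The other platforms would presumably follow the same `{platform}_url` key convention (`github_url`, `bitbucket_url`). The sketch below uses placeholder domains and shows the part of each value that matters for link matching, which `urllib.parse.urlparse` exposes as `netloc`.

```python
from urllib.parse import urlparse

# Hypothetical conf.py fragment; the domains are placeholders.
html_context = {
    "github_url": "https://github.mydomain.com",
    "gitlab_url": "https://gitlab.mydomain.com",
    "bitbucket_url": "https://bitbucket.mydomain.com",
}

# Only the domain (netloc) of each base URL is compared against link targets.
domains = {key: urlparse(value).netloc for key, value in html_context.items()}
print(domains)
```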
28 changes: 28 additions & 0 deletions src/pydata_sphinx_theme/__init__.py
@@ -275,6 +275,33 @@ def _fix_canonical_url(
context["pageurl"] = app.config.html_baseurl + target


def _add_self_hosted_platforms_to_link_transform_class(app: Sphinx) -> None:
    if not hasattr(app.config, "html_context"):
        return

    # Use list() to force the iterator to completion because the for-loop below
    # can modify the dictionary.
    platforms = list(short_link.ShortenLinkTransform.supported_platform.values())

    for platform in platforms:
        # {platform}_url -- e.g.: github_url, gitlab_url, bitbucket_url
        self_hosted_url = app.config.html_context.get(f"{platform}_url", None)
        if self_hosted_url is None:
            continue
        parsed = urlparse(self_hosted_url)
        if parsed.scheme not in ("http", "https"):
            raise Exception(
                f"If you provide a value for html_context option {platform}_url,"
                " it must begin with http or https."
            )
        if not parsed.netloc:
            raise Exception(
                f"Unsupported URL provided for html_context option {platform}_url."
                f" Could not get domain (netloc) from {self_hosted_url}."
            )
        short_link.ShortenLinkTransform.add_platform_mapping(platform, parsed.netloc)
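
To see which inputs the two checks reject, here is a minimal standalone sketch of the same validation. The helper names `netloc_or_error` and `check` are hypothetical, and it raises `ValueError` rather than the bare `Exception` used in the diff.

```python
from urllib.parse import urlparse


def netloc_or_error(option: str, url: str) -> str:
    """Hypothetical helper: return the domain of ``url`` or raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"{option} must begin with http or https, got {url!r}")
    if not parsed.netloc:
        raise ValueError(f"could not get a domain (netloc) from {url!r}")
    return parsed.netloc


def check(option: str, url: str) -> str:
    """Return the netloc, or the error message if validation fails."""
    try:
        return netloc_or_error(option, url)
    except ValueError as err:
        return f"error: {err}"


print(check("gitlab_url", "https://gitlab.mydomain.com"))
print(check("gitlab_url", "gitlab.mydomain.com"))  # no scheme
print(check("gitlab_url", "https://"))  # scheme but no domain
```

A value without a scheme fails the first check because `urlparse` puts the whole string in `path`, leaving `scheme` empty.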


def setup(app: Sphinx) -> Dict[str, str]:
"""Setup the Sphinx application."""
here = Path(__file__).parent.resolve()
@@ -286,6 +313,7 @@ def setup(app: Sphinx) -> Dict[str, str]:

app.connect("builder-inited", translator.setup_translators)
app.connect("builder-inited", update_config)
app.connect("builder-inited", _add_self_hosted_platforms_to_link_transform_class)
app.connect("html-page-context", _fix_canonical_url)
app.connect("html-page-context", edit_this_page.setup_edit_url)
app.connect("html-page-context", toctree.add_toctree_functions)
7 changes: 6 additions & 1 deletion src/pydata_sphinx_theme/assets/styles/base/_base.scss
@@ -48,7 +48,8 @@ a {

// set up a icon next to the shorten links from github and gitlab
&.github,
&.gitlab {
&.gitlab,
&.bitbucket {
&::before {
color: var(--pst-color-text-muted);
font: var(--fa-font-brands);
@@ -63,6 +64,10 @@ a {
&.gitlab::before {
content: var(--pst-icon-gitlab);
}

&.bitbucket::before {
content: var(--pst-icon-bitbucket);
}
}

%heading-style {
Original file line number Diff line number Diff line change
@@ -20,6 +20,7 @@ html {
--pst-icon-search-minus: "\f010"; // fa-solid fa-magnifying-glass-minus
--pst-icon-github: "\f09b"; // fa-brands fa-github
--pst-icon-gitlab: "\f296"; // fa-brands fa-gitlab
--pst-icon-bitbucket: "\f171"; // fa-brands fa-bitbucket
--pst-icon-share: "\f064"; // fa-solid fa-share
--pst-icon-bell: "\f0f3"; // fa-solid fa-bell
--pst-icon-pencil: "\f303"; // fa-solid fa-pencil
227 changes: 153 additions & 74 deletions src/pydata_sphinx_theme/short_link.py
@@ -1,7 +1,9 @@
"""A custom Transform object to shorten github and gitlab links."""

import re

from typing import ClassVar
from urllib.parse import ParseResult, urlparse, urlunparse
from urllib.parse import urlparse

from docutils import nodes
from sphinx.transforms.post_transforms import SphinxPostTransform
@@ -12,8 +14,8 @@

class ShortenLinkTransform(SphinxPostTransform):
"""
Shorten link when they are coming from github or gitlab and add an extra class to
the tag for further styling.
Shorten link when they are coming from github, gitlab, or bitbucket and add
an extra class to the tag for further styling.

Before:
.. code-block:: html
@@ -37,8 +39,13 @@ class ShortenLinkTransform(SphinxPostTransform):
supported_platform: ClassVar[dict[str, str]] = {
"github.com": "github",
"gitlab.com": "gitlab",
"bitbucket.org": "bitbucket",
}
platform = None

@classmethod
def add_platform_mapping(cls, platform, netloc):
"""Add domain->platform mapping to class at run-time."""
cls.supported_platform.update({netloc: platform})
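
Because `supported_platform` is a class attribute, a mapping added through the classmethod is visible to every instance. The sketch below shows that with a toy class; `Transform` is a stand-in, not the real `ShortenLinkTransform`, and the self-hosted domain is a placeholder.

```python
from typing import ClassVar


class Transform:
    """Toy stand-in for ShortenLinkTransform; not the real class."""

    supported_platform: ClassVar[dict[str, str]] = {
        "github.com": "github",
        "gitlab.com": "gitlab",
        "bitbucket.org": "bitbucket",
    }

    @classmethod
    def add_platform_mapping(cls, platform: str, netloc: str) -> None:
        """Add a domain -> platform entry to the shared class-level dict."""
        cls.supported_platform.update({netloc: platform})


first = Transform()
Transform.add_platform_mapping("gitlab", "gitlab.mydomain.com")
second = Transform()

# Both instances see the new domain because the dict lives on the class.
print(first.supported_platform.get("gitlab.mydomain.com"))
print(second.supported_platform.get("gitlab.mydomain.com"))
```

If the real class behaves like this sketch, added mappings also persist for the lifetime of the Python process, which may matter when several builds run in one process.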

def run(self, **kwargs):
"""Run the Transform object."""
@@ -50,74 +57,146 @@ def run(self, **kwargs):
# only act if the uri and text are the same
# if not the user has already customized the display of the link
if uri is not None and text is not None and text == uri:
uri = urlparse(uri)
parsed_uri = urlparse(uri)
# only do something if the platform is identified
self.platform = self.supported_platform.get(uri.netloc)
Collaborator

I'm not 100% sure about this part of the class refactor.

I got rid of the platform member. I also renamed the parse_url method and moved it out of the class.

If I am reading the code correctly, it seems odd to me that each time this class encounters a node, it changes self.platform to whatever is matched in that moment.

I felt like it would be cleaner and easier to test if I just passed platform as an argument to the shortener function.

if self.platform is not None:
node.attributes["classes"].append(self.platform)
node.children[0] = nodes.Text(self.parse_url(uri))

def parse_url(self, uri: ParseResult) -> str:
"""Parse the content of the url with respect to the selected platform.

Args:
uri: the link to the platform content

Returns:
the reformated url title
"""
path = uri.path
if path == "":
# plain url passed, return platform only
return self.platform

# if the path is not empty it contains a leading "/", which we don't want to
# include in the parsed content
path = path.lstrip("/")

# check the platform name and read the information accordingly
# as "<organisation>/<repository>#<element number>"
# or "<group>/<subgroup 1>/…/<subgroup N>/<repository>#<element number>"
if self.platform == "github":
# split the url content
parts = path.split("/")

if parts[0] == "orgs" and "/projects" in path:
# We have a projects board link
# ref: `orgs/{org}/projects/{project-id}`
text = f"{parts[1]}/projects#{parts[3]}"
else:
# We have an issues, PRs, or repository link
if len(parts) > 0:
text = parts[0] # organisation
if len(parts) > 1:
text += f"/{parts[1]}" # repository
if len(parts) > 2:
if parts[2] in ["issues", "pull", "discussions"]:
text += f"#{parts[-1]}" # element number

elif self.platform == "gitlab":
# cp. https://docs.gitlab.com/ee/user/markdown.html#gitlab-specific-references
if "/-/" in path and any(
map(uri.path.__contains__, ["issues", "merge_requests"])
):
group_and_subgroups, parts, *_ = path.split("/-/")
parts = parts.rstrip("/")
if "/" not in parts:
text = f"{group_and_subgroups}/{parts}"
else:
parts = parts.split("/")
url_type, element_number, *_ = parts
if not element_number:
text = group_and_subgroups
elif url_type == "issues":
text = f"{group_and_subgroups}#{element_number}"
elif url_type == "merge_requests":
text = f"{group_and_subgroups}!{element_number}"
else:
# display the whole uri (after "gitlab.com/") including parameters
# for example "<group>/<subgroup1>/<subgroup2>/<repository>"
text = uri._replace(netloc="", scheme="") # remove platform
text = urlunparse(text)[1:] # combine to string and strip leading "/"

return text
platform = self.supported_platform.get(parsed_uri.netloc)
if platform is not None:
short = shorten_url(platform, uri)
if short != uri:
Collaborator

This is the code change that prevents adding the platform class (and thereby the icon) unless the link is actually shortened.

node.attributes["classes"].append(platform)
node.children[0] = nodes.Text(short)
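
The effect of the `short != uri` guard can be sketched without docutils. Here a plain dict stands in for the link node, and `toy_shorten` is a hypothetical shortener that only recognizes GitHub pull request paths; neither is part of the theme.

```python
from urllib.parse import urlparse

SUPPORTED = {"github.com": "github"}


def toy_shorten(platform: str, url: str) -> str:
    """Hypothetical shortener: only knows GitHub pull request paths."""
    parts = urlparse(url).path.strip("/").split("/")
    if platform == "github" and len(parts) >= 4 and parts[2] == "pull":
        return f"{parts[0]}/{parts[1]}#{parts[3]}"
    return url  # unchanged means "could not shorten"


def transform(node: dict) -> dict:
    """Mimic the run() logic on a dict with 'uri', 'text' and 'classes' keys."""
    uri, text = node["uri"], node["text"]
    if text == uri:  # skip links whose text the author customized
        platform = SUPPORTED.get(urlparse(uri).netloc)
        if platform is not None:
            short = toy_shorten(platform, uri)
            if short != uri:  # add the class (and icon) only when shortened
                node["classes"].append(platform)
                node["text"] = short
    return node


pr_url = "https://github.com/pydata/pydata-sphinx-theme/pull/2068"
org_url = "https://github.com/pydata"
pr_node = transform({"uri": pr_url, "text": pr_url, "classes": []})
org_node = transform({"uri": org_url, "text": org_url, "classes": []})
print(pr_node["text"], pr_node["classes"])
print(org_node["text"], org_node["classes"])
```

In this sketch the pull request link gets both the short text and the `github` class, while the organization link is left untouched and gets no class.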


def shorten_url(platform: str, url: str) -> str:
    """Parse the content of the path with respect to the selected platform.

    Args:
        platform: "github", "gitlab", "bitbucket", etc.
        url: the full url to the platform content, beginning with http:// or https://

    Returns:
        short form version of the url,
        or the full url if it could not shorten it
    """
    if platform == "github":
        return shorten_github(url)
    elif platform == "bitbucket":
        return shorten_bitbucket(url)
    elif platform == "gitlab":
        return shorten_gitlab(url)

    return url


def shorten_github(url: str) -> str:
    """
    Convert a GitHub URL to a short form like owner/repo#123 or
    owner/repo@abc123.
    """
    path = urlparse(url).path

    # Pull request URL
    # - Example:
    #   - https://github.com/pydata/pydata-sphinx-theme/pull/2068
    #   - pydata/pydata-sphinx-theme#2068
    if match := re.match(r"/([^/]+)/([^/]+)/pull/(\d+)", path):
        owner, repo, pr_id = match.groups()
        return f"{owner}/{repo}#{pr_id}"

    # Issue URL
    # - Example:
    #   - https://github.com/pydata/pydata-sphinx-theme/issues/2176
    #   - pydata/pydata-sphinx-theme#2176
    elif match := re.match(r"/([^/]+)/([^/]+)/issues/(\d+)", path):
        owner, repo, issue_id = match.groups()
        return f"{owner}/{repo}#{issue_id}"

    # Commit URL
    # - Example:
    #   - https://github.com/pydata/pydata-sphinx-theme/commit/51af2a27e8a008d0b44ed9ea9b45311e686d12f7
    #   - pydata/pydata-sphinx-theme@51af2a2
    elif match := re.match(r"/([^/]+)/([^/]+)/commit/([a-f0-9]+)", path):
        owner, repo, commit_hash = match.groups()
        return f"{owner}/{repo}@{commit_hash[:7]}"

    # No match: return the original URL
    return url
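
The GitHub patterns are not anchored at the end, so sub-pages of a pull request should shorten the same way as the pull request itself. The sketch below restates two of the branches under the hypothetical name `demo_shorten_github` to show that, along with the 7-character commit truncation.

```python
import re
from urllib.parse import urlparse

PULL = re.compile(r"/([^/]+)/([^/]+)/pull/(\d+)")
COMMIT = re.compile(r"/([^/]+)/([^/]+)/commit/([a-f0-9]+)")


def demo_shorten_github(url: str) -> str:
    """Re-statement of two branches of shorten_github, for illustration only."""
    path = urlparse(url).path
    if match := PULL.match(path):
        owner, repo, pr_id = match.groups()
        return f"{owner}/{repo}#{pr_id}"
    if match := COMMIT.match(path):
        owner, repo, commit_hash = match.groups()
        return f"{owner}/{repo}@{commit_hash[:7]}"  # keep the first 7 hex digits
    return url  # no match: leave the URL as it is


base = "https://github.com/pydata/pydata-sphinx-theme"
print(demo_shorten_github(f"{base}/pull/2068/files"))
print(demo_shorten_github(f"{base}/commit/51af2a27e8a008d0b44ed9ea9b45311e686d12f7"))
print(demo_shorten_github(base))
```

In this sketch a bare repository URL comes back unchanged, so combined with the `short != uri` check in `run()` it would get neither short text nor an icon.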


def shorten_gitlab(url: str) -> str:
    """
    Convert a GitLab URL to a short form like group/project!123 or
    group/project@abcdef7.

    Only supports canonical ('/-/') GitLab URLs.
    """
    path = urlparse(url).path

    # Merge requests
    # - Example:
    #   - https://gitlab.com/gitlab-org/gitlab/-/merge_requests/195598
    #   - gitlab-org/gitlab!195598
    if match := re.match(r"^/(.+)/([^/]+)/-/merge_requests/(\d+)$", path):
        namespace, project, mr_id = match.groups()
        return f"{namespace}/{project}!{mr_id}"

    # Issues
    # - Example:
    #   - https://gitlab.com/gitlab-org/gitlab/-/issues/551885
    #   - gitlab-org/gitlab#551885
    #
    # TODO: support hash URLs, for example:
    # https://gitlab.com/gitlab-org/gitlab/-/issues/545699#note_2543533261
    if match := re.match(r"^/(.+)/([^/]+)/-/issues/(\d+)$", path):
        namespace, project, issue_id = match.groups()
        return f"{namespace}/{project}#{issue_id}"

    # Commits
    # - Example:
    #   - https://gitlab.com/gitlab-org/gitlab/-/commit/81872624c4c58425a040e158fd228d8f0c2bda07
    #   - gitlab-org/gitlab@8187262
    if match := re.match(r"^/(.+)/([^/]+)/-/commit/([a-f0-9]+)$", path):
        namespace, project, commit_hash = match.groups()
        return f"{namespace}/{project}@{commit_hash[:7]}"

    # No match: return the original URL
    return url
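
Two properties of the GitLab patterns are easy to check in isolation: the greedy `(.+)` group absorbs nested subgroups, and `urlparse` splits off the `#...` fragment before the pattern sees the path. The sketch below restates only the issue branch under the hypothetical name `demo_gitlab_issue`; the nested-group URL and its domain are invented.

```python
import re
from urllib.parse import urlparse

ISSUE = re.compile(r"^/(.+)/([^/]+)/-/issues/(\d+)$")


def demo_gitlab_issue(url: str) -> str:
    """Re-statement of the issue branch of shorten_gitlab, for illustration only."""
    path = urlparse(url).path  # the fragment (#note_...) is not part of .path
    if match := ISSUE.match(path):
        namespace, project, issue_id = match.groups()
        return f"{namespace}/{project}#{issue_id}"
    return url


print(demo_gitlab_issue("https://gitlab.com/gitlab-org/gitlab/-/issues/551885"))
# Invented self-hosted URL with a nested subgroup.
print(demo_gitlab_issue("https://gitlab.mydomain.com/group/subgroup/project/-/issues/7"))
# Hash URL from the TODO comment.
print(demo_gitlab_issue("https://gitlab.com/gitlab-org/gitlab/-/issues/545699#note_2543533261"))
```

In this sketch the hash URL is shortened to the plain issue reference, so the note anchor disappears from the link text. Whether that is the desired result is the open question behind the TODO.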


def shorten_bitbucket(url: str) -> str:
    """
    Convert a Bitbucket URL to a short form like team/repo#123 or
    team/repo@abc1234.
    """
    path = urlparse(url).path

    # Pull request URL
    # - Example:
    #   - https://bitbucket.org/atlassian/atlassian-jwt-js/pull-requests/23
    #   - atlassian/atlassian-jwt-js#23
    if match := re.match(r"^/([^/]+)/([^/]+)/pull-requests/(\d+)$", path):
        workspace, repo, pr_id = match.groups()
        return f"{workspace}/{repo}#{pr_id}"

    # Issue URL.
    # - Example:
    #   - https://bitbucket.org/atlassian/atlassian-jwt-js/issues/11/
    #   - atlassian/atlassian-jwt-js!11
    #
    # Deliberately not matching the end of the string because sometimes
    # Bitbucket issue URLs include a slug at the end, for example:
    # https://bitbucket.org/atlassian/atlassian-jwt-js/issues/11/nested-object-properties-are-represented
    elif match := re.match(r"^/([^/]+)/([^/]+)/issues/(\d+)", path):
        workspace, repo, issue_id = match.groups()
        return f"{workspace}/{repo}!{issue_id}"
Collaborator

Is the ! standard for Bitbucket issues? Above we use # for issues in both the GitHub and GitLab shorteners (GitLab uses ! for merge requests, not issues).

Collaborator

Honestly, I have no clue about Bitbucket. Never used it.


    # Commit URL
    # - Example:
    #   - https://bitbucket.org/atlassian/atlassian-jwt-js/commits/d9b5197f0aeedeabf9d0f8d0953a80be65743d8a
    #   - atlassian/atlassian-jwt-js@d9b5197
    elif match := re.match(r"^/([^/]+)/([^/]+)/commits/([a-f0-9]+)$", path):
        workspace, repo, commit_hash = match.groups()
        return f"{workspace}/{repo}@{commit_hash[:7]}"

    # No match: return the original URL
    return url
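
The Bitbucket pull request pattern ends with `$` and the issue pattern does not, so the two treat trailing path segments differently. The sketch below restates those two branches under the hypothetical name `demo_shorten_bitbucket`; the `/diff` suffix is an invented example of a trailing segment.

```python
import re
from urllib.parse import urlparse

PULL_REQUEST = re.compile(r"^/([^/]+)/([^/]+)/pull-requests/(\d+)$")
ISSUE = re.compile(r"^/([^/]+)/([^/]+)/issues/(\d+)")  # no "$": a slug may follow


def demo_shorten_bitbucket(url: str) -> str:
    """Re-statement of two branches of shorten_bitbucket, for illustration only."""
    path = urlparse(url).path
    if match := PULL_REQUEST.match(path):
        workspace, repo, pr_id = match.groups()
        return f"{workspace}/{repo}#{pr_id}"
    if match := ISSUE.match(path):
        workspace, repo, issue_id = match.groups()
        return f"{workspace}/{repo}!{issue_id}"
    return url


slug_url = (
    "https://bitbucket.org/atlassian/atlassian-jwt-js/issues/11/"
    "nested-object-properties-are-represented"
)
pr_url = "https://bitbucket.org/atlassian/aui/pull-requests/4758"
print(demo_shorten_bitbucket(slug_url))
print(demo_shorten_bitbucket(pr_url))
print(demo_shorten_bitbucket(pr_url + "/diff"))  # invented trailing segment
```

In this sketch the issue URL shortens despite its slug, while the pull request URL with an extra segment is returned unchanged.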