Add Gitlab Live V2 Importer #1910

Open
michaelehab wants to merge 1 commit into main from 1903-gitlab-importer-package-first

Conversation

michaelehab
Collaborator

@michaelehab michaelehab commented Jun 15, 2025

Solves #1903

  • Extract Gitlab API handling code from vulntotal gitlab datasource into a utils file.
  • Update vulntotal gitlab datasource to use the utils file.
  • Update vulntotal gitlab datasource tests to reflect the changes.
  • Add Gitlab Live V2 Importer.
  • Add Gitlab Live V2 Importer tests to test package-first mode.

@TG1999
Contributor

TG1999 commented Jul 1, 2025

@michaelehab
Collaborator Author

@TG1999 I modified the V2 importer as well

@michaelehab michaelehab force-pushed the 1903-gitlab-importer-package-first branch from a33f85b to c56e940 Compare July 4, 2025 15:29
@@ -31,6 +32,9 @@
from vulnerabilities.utils import build_description
from vulnerabilities.utils import get_advisory_url
from vulnerabilities.utils import get_cwe_id
from vulntotal.datasources.gitlab import get_casesensitive_slug
Contributor

@keshav-space what do you think: is it a good idea to import vulntotal functions in VCIO, or should we create separate functions here?

Contributor

@TG1999 TG1999 Jul 15, 2025

@michaelehab, the SPDX license identifier is also missing from the GitLab vulntotal datasource. We need to have that before using it in our VCIO importers. https://github.com/aboutcode-org/vulnerablecode/blob/main/vulntotal/datasources/gitlab.py#L30

Member

> what do you think: is it a good idea to import vulntotal functions in VCIO?

We can import them from VulnTotal for now. Later on, we can extract them into a common utility.
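
As a rough illustration of what such a common utility could hold, here is a minimal sketch of a slug-splitting helper that both VulnTotal and the VCIO importers could import. The function name `split_package_slug`, the `SlugParts` type, and the mapping values are assumptions made for this example; they are not the project's actual code and may not match its real scheme tables.

```python
# Hypothetical shared helper: all names and mapping values are illustrative assumptions.
from typing import NamedTuple, Optional

# Assumed subset of a GitLab-scheme -> purl-type table.
PURL_TYPE_BY_GITLAB_SCHEME = {
    "packagist": "composer",
    "pypi": "pypi",
    "npm": "npm",
}


class SlugParts(NamedTuple):
    purl_type: str
    namespace: Optional[str]
    name: str


def split_package_slug(package_slug: str) -> Optional[SlugParts]:
    """Split a GitLab package slug such as "packagist/amphp/http" into parts.

    Return None when the scheme is unknown or the slug has no package name.
    """
    scheme, _, rest = package_slug.partition("/")
    purl_type = PURL_TYPE_BY_GITLAB_SCHEME.get(scheme)
    if not purl_type or not rest:
        return None
    # Everything before the last "/" is treated as the namespace.
    namespace, _, name = rest.rpartition("/")
    if not name:
        return None
    return SlugParts(purl_type, namespace or None, name)
```

Keeping a helper like this free of importer-specific or VulnTotal-specific imports is what would let both sides depend on it without depending on each other.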

Collaborator Author

> @michaelehab, the SPDX license identifier is also missing from the GitLab vulntotal datasource. We need to have that before using it in our VCIO importers. https://github.com/aboutcode-org/vulnerablecode/blob/main/vulntotal/datasources/gitlab.py#L30

I believe we discussed this in an earlier meeting, where we agreed that the package-first API endpoint won't be enabled by default and that users have to enable it locally. In terms of licensing, that is like using vulntotal, which is why I reused the vulntotal functions in VCIO's package-first mode.

* Add Gitlab Live V2 Importer

* Add tests for the Gitlab Live V2 Importer

* Tested functionally using the Live Evaluation API in #1969

Signed-off-by: Michael Ehab Mikhail <michael.ehab@hotmail.com>
@michaelehab michaelehab force-pushed the 1903-gitlab-importer-package-first branch from bc7d2ea to 79429df Compare August 18, 2025 14:25
@michaelehab michaelehab changed the title Modify Gitlab Importer to support package-first mode Add Gitlab Live V2 Importer Aug 18, 2025
Member

@keshav-space keshav-space left a comment

Thanks @michaelehab, the pipeline steps are looking good. A few nits for your consideration.

Member

IMHO this should be inside vulnerabilities/pipelines/v2_importers/gitlab_importer.py

Comment on lines +125 to +135
def advisory_dict_to_advisory_data(
    advisory: dict,
    purl_type_by_gitlab_scheme,
    gitlab_scheme_by_purl_type,
    logger,
    purl=None,
    advisory_url=None,
):
    """
    Convert a GitLab advisory dict to AdvisoryDataV2.
    """
Member

This duplicates a lot of

def parse_gitlab_advisory(
    file, base_path, gitlab_scheme_by_purl_type, purl_type_by_gitlab_scheme, logger
):
    """
    Parse a Gitlab advisory file and return an AdvisoryData or None.
    These files are YAML. There is a JSON schema documented at
    https://gitlab.com/gitlab-org/advisories-community/-/blob/main/ci/schema/schema.json
    Sample YAML file:
    ---
    identifier: "GMS-2018-26"
    package_slug: "packagist/amphp/http"
    title: "Incorrect header injection check"
    description: "amphp/http isn't properly protected against HTTP header injection."
    pubdate: "2018-03-15"
    affected_range: "<1.0.1"
    fixed_versions:
    - "v1.0.1"
    urls:
    - "https://github.com/amphp/http/pull/4"
    cwe_ids:
    - "CWE-1035"
    - "CWE-937"
    identifiers:
    - "GMS-2018-26"
    """
    with open(file) as f:
        gitlab_advisory = saneyaml.load(f)
    if not isinstance(gitlab_advisory, dict):
        logger(
            f"parse_gitlab_advisory: unknown gitlab advisory format in {file!r} with data: {gitlab_advisory!r}",
            level=logging.ERROR,
        )
        return
    # refer to schema here https://gitlab.com/gitlab-org/advisories-community/-/blob/main/ci/schema/schema.json
    aliases = gitlab_advisory.get("identifiers")
    advisory_id = gitlab_advisory.get("identifier")
    package_slug = gitlab_advisory.get("package_slug")
    advisory_id = f"{package_slug}/{advisory_id}" if package_slug else advisory_id
    if advisory_id in aliases:
        aliases.remove(advisory_id)
    summary = build_description(gitlab_advisory.get("title"), gitlab_advisory.get("description"))
    urls = gitlab_advisory.get("urls")
    references = [ReferenceV2.from_url(u) for u in urls]
    cwe_ids = gitlab_advisory.get("cwe_ids") or []
    cwe_list = list(map(get_cwe_id, cwe_ids))
    date_published = dateparser.parse(gitlab_advisory.get("pubdate"))
    date_published = date_published.replace(tzinfo=pytz.UTC)
    advisory_url = get_advisory_url(
        file=file,
        base_path=base_path,
        url="https://gitlab.com/gitlab-org/advisories-community/-/blob/main/",
    )
    purl: PackageURL = get_purl(
        package_slug=package_slug,
        purl_type_by_gitlab_scheme=purl_type_by_gitlab_scheme,
        logger=logger,
    )
    if not purl:
        logger(
            f"parse_yaml_file: purl is not valid: {file!r} {package_slug!r}", level=logging.ERROR
        )
        return AdvisoryData(
            advisory_id=advisory_id,
            aliases=aliases,
            summary=summary,
            references_v2=references,
            date_published=date_published,
            url=advisory_url,
            original_advisory_text=json.dumps(gitlab_advisory, indent=2, ensure_ascii=False),
        )
    affected_version_range = None
    fixed_versions = gitlab_advisory.get("fixed_versions") or []
    affected_range = gitlab_advisory.get("affected_range")
    gitlab_native_schemes = set(["pypi", "gem", "npm", "go", "packagist", "conan"])
    vrc = RANGE_CLASS_BY_SCHEMES[purl.type]
    gitlab_scheme = gitlab_scheme_by_purl_type[purl.type]
    try:
        if affected_range:
            if gitlab_scheme in gitlab_native_schemes:
                affected_version_range = from_gitlab_native(
                    gitlab_scheme=gitlab_scheme, string=affected_range
                )
            else:
                affected_version_range = vrc.from_native(affected_range)
    except Exception as e:
        logger(
            f"parse_yaml_file: affected_range is not parsable: {affected_range!r} for: {purl!s} error: {e!r}\n {traceback.format_exc()}",
            level=logging.ERROR,
        )
    parsed_fixed_versions = []
    for fixed_version in fixed_versions:
        try:
            fixed_version = vrc.version_class(fixed_version)
            parsed_fixed_versions.append(fixed_version.string)
        except Exception as e:
            logger(
                f"parse_yaml_file: fixed_version is not parsable`: {fixed_version!r} error: {e!r}\n {traceback.format_exc()}",
                level=logging.ERROR,
            )
    if affected_version_range:
        vrc = affected_version_range.__class__
    fixed_version_range = vrc.from_versions(parsed_fixed_versions)
    if not fixed_version_range and not affected_version_range:
        return
    affected_package = AffectedPackageV2(
        package=purl,
        affected_version_range=affected_version_range,
        fixed_version_range=fixed_version_range,
    )
    return AdvisoryData(
        advisory_id=advisory_id,
        aliases=aliases,
        summary=summary,
        references_v2=references,
        date_published=date_published,
        affected_packages=[affected_package],
        weaknesses=cwe_list,
        url=advisory_url,
        original_advisory_text=json.dumps(gitlab_advisory, indent=2, ensure_ascii=False),
    )
We should reuse the existing code and, if needed, break parse_gitlab_advisory down into smaller functions.
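
One possible shape for that split, as a minimal sketch: a dict-based core that both the file-based importer and the package-first (live) importer call, plus a thin file wrapper. Everything here is a stand-in chosen for illustration. `Advisory` replaces `AdvisoryData`, `advisory_from_dict` and `advisory_from_file` are hypothetical names, and `json.load` substitutes for the YAML loader so the example runs with the standard library only. The early return on a missing `identifier` is a choice made for this sketch and is not in the quoted function.

```python
# Sketch only: Advisory stands in for AdvisoryData, json.load for the YAML loader.
import json
from dataclasses import dataclass, field
from typing import Callable, List, Optional, TextIO


@dataclass
class Advisory:
    advisory_id: str
    aliases: List[str] = field(default_factory=list)
    url: Optional[str] = None


def advisory_from_dict(data: object, advisory_url: Optional[str] = None) -> Optional[Advisory]:
    """Shared core: build an Advisory from an already-loaded advisory mapping."""
    if not isinstance(data, dict):
        return None
    identifier = data.get("identifier")
    if not identifier:
        # Sketch-specific guard; the quoted function does not check this.
        return None
    package_slug = data.get("package_slug")
    advisory_id = f"{package_slug}/{identifier}" if package_slug else identifier
    # Build a new list so the caller's data is not mutated.
    aliases = [a for a in (data.get("identifiers") or []) if a != advisory_id]
    return Advisory(advisory_id=advisory_id, aliases=aliases, url=advisory_url)


def advisory_from_file(
    path: str,
    advisory_url: Optional[str] = None,
    load: Callable[[TextIO], object] = json.load,
) -> Optional[Advisory]:
    """File-based entry point: load the file, then delegate to the shared core."""
    with open(path) as f:
        data = load(f)
    return advisory_from_dict(data, advisory_url=advisory_url)
```

With this layout, the package-first path would call `advisory_from_dict` directly on the dict returned by the API, while the repository-based path would keep going through the file wrapper, so the field handling lives in one place.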

Development

Successfully merging this pull request may close these issues.

Modify the GitLab importer to support package-first mode
3 participants