Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: add codeql analysis #32745

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ReenigneArcher
Copy link
Contributor

@ReenigneArcher ReenigneArcher commented Mar 14, 2024

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

Explanation of your pull request in arbitrary form goes here. Please make sure the description explains the purpose and effect of your pull request and is worded well enough to be understood. Provide as much context and examples as possible.

This PR adds CodeQL (security) analysis.

It will check security of the python code on the following events.

  • pull request to master branch
  • push into master branch
  • schedule (to detect new vulnerabilities in old code)

CodeQL will check for known vulnerabilities in the code. It will also check for exposed tokens.

This will enable the security tab of your repo, and create security events there similar to issues, however they are private to the repo owner (and probably specific people you specify).

It will also fail PRs if they introduce new security issues. You can setup branch protection rules to block these PRs from being merged, by forcing the job to pass (along with your standard CI) if you wanted to.

I highly suggest implementing CodeQL as it will alert you of issues, and help you have more secure code.

You can see the analysis results of this PR here once finished running ->
https://github.com/ytdl-org/youtube-dl/security/code-scanning?query=is%3Aopen+pr%3A32745

@dirkf dirkf mentioned this pull request Mar 15, 2024
11 tasks
@dirkf
Copy link
Contributor

dirkf commented Mar 15, 2024

I like the idea of automated testing but CodeQL was proposed before and, looking at the results linked above, I think the response still applies.

By all means implement this as a separate optional workflow. If there were

  1. a way of exposing that in the PR workflow, ideally restricted to the changed files, and
  2. a command-line equivalent that a dev can run without making a commit in GH (or downloading 1GB, or even 100MB, of CodeQL),

I'd be even happier. For instance, by specifying a single commit or range of commits, it should be possible to generate CodeQL whingediagnostics relevant to the changed lines in the specified commits.

That would have found things like the second "Inefficient RE" (?:[^>]*>\s*<[^>]*)*: extractor/gaskrank.py trying to match possibly the end of one tag, some space-separated complete tags, then opening a tag. It should probably be (?:[^>]*>\s*<)*?[^>]*?, but I can't tell without the command-line tool. To adapt https://www.regular-expressions.info/redos.html in line with Einstein's principle, "you should make your regular expressions as strict as possible_, but no stricter_."

@ReenigneArcher
Copy link
Contributor Author

a way of exposing that in the PR workflow, ideally restricted to the changed files, and

codeql will not fail a workflow unless the PR introduces NEW issues... you can always merge the PR anyway if you don't agree with the reason why it failed it

a command-line equivalent that a dev can run without making a commit in GH

I've never tried it, but there is a CLI option. The python run in GitHub workflows is pretty quick (~3 minutes), so never gave it much thought. https://docs.github.com/en/code-security/codeql-cli/getting-started-with-the-codeql-cli/setting-up-the-codeql-cli

As for the comments on your linked PR, you can always close the security issues if you don't agree them.
image

The security issues opened by codeql will be private, so anyone who you don't give explicit permission to view them won't be able to.

Anyway, I only opened this because I've added youtube-dl as a submodule (due to no updated pypi package), so the codeql I run in my repo will include issues found in the submodule... and It has found 24 possible issues in this project.

@dirkf
Copy link
Contributor

dirkf commented Mar 16, 2024

You can feel free to ignore/close off the issues that were flagged.

If this was enabled for PRs, I'd only want to flag issues found in the new/changed code. With the linter (flake8) workflow, the rules are secretly changed from time to time, causing a swathe of new diagnostics from previously checked code. No doubt this would be true with CodeQL too, but those diagnostics should only be shown to PR devs if the PR itself is affected.

Regarding PyPi, you can pip install ... from the master or nightly repos, either via git or just using the source archive in the Clone Repo menu (master) or Releases page (nightly). Maybe that would help?

@ReenigneArcher
Copy link
Contributor Author

If this was enabled for PRs, I'd only want to flag issues found in the new/changed code.

That's precisely how it works. The schedule, runs against your master branch to get new baselines (you can always adjust the frequency of this, right now it's on weekly). For PRs it only flags NEW potential issues.

Here's an example PR that would have introduced new security issues. The comments on the PR notify the submitter, and they can fix the issues before it's merged. It doesn't notify of existing issues (this would actually be a security risk because it would make the possible security vulnerabilities public). LizardByte/Themerr-plex#130

Regarding PyPi, you can pip install ... from the master or nightly repos

Dependabot can't keep those up to date, so I opted to use a submodule instead. Using a branch without specifying a commit is not an option. Builds would vary between one to the next and it would be impossible to track when something breaks.

@dirkf
Copy link
Contributor

dirkf commented Mar 16, 2024

You can specify a commit instead of a branch, or f'https://github.com/ytdl-org/youtube-dl/archive/{commit}.zip', but probably better to use a tagged nightly build, like https://github.com/ytdl-org/ytdl-nightly/archive/refs/tags/2024.03.13.zip.

@ReenigneArcher
Copy link
Contributor Author

You can specify a commit instead of a branch, or f'https://github.com/ytdl-org/youtube-dl/archive/{commit}.zip', but probably better to use a tagged nightly build, like https://github.com/ytdl-org/ytdl-nightly/archive/refs/tags/2024.03.13.zip.

I'm aware. The issue is that dependabot won't track it, unfortunately... And I don't have the capacity to manually update dependencies all the time. So I opted for the submodule route, which dependabot can track.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants