-
Notifications
You must be signed in to change notification settings - Fork 351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhancement: split out query packs #1367
Comments
Thanks for raising this issue. Do you have any examples of code scanning runs that are taking too long? We can investigate and see if you might benefit from larger runners? I would like to understand more about your issue. Some scans on larger projects can be resource intensive. Sometimes it's beneficial to run fewer queries at a time, but you may wind up spending time re-building the database on each run. Before trying to split the query suites, I would recommend first running using larger runners. Alternatively, you can use query filters to selectively run subsets of queries in a query suite. Please let me know if these suggestions help you. |
We can use this repo for example but the specific use case is in a private repo. https://github.com/OneGHEOrg/angular/actions We've given bigger runners and have only been able to reduce the time to 11 minutes (16 core 64 Gb ram). We're open to other ideas but we'd really like to find a solution that doesn't require tossing a 64-core machine at this. I'd also looked at using query filters to run a subset but this particular case requires security-extended which is due to needing more results that originally surfaced by the default query pack. I've thought about running a matrix job and having several different codeql-configs but that doesn't feel like a very maintainable solution since those query packs would require additional maintenance going forward as GitHub iterates on CodeQL. In the private repo, we've been able to reduce the runtime from 17 minutes to 11 but we'd ideally get it under 10. |
From your example repository, I have a few comments and suggestions. Not sure how applicable this is to your private repository:
The last query took almost 30s. There is a possibility that one or more of our queries are just performing poorly on your repo. We do a lot of internal benchmarking, but it's hard to do this on private repos (since we can't see the code). If you see a query that you think is taking longer than it should, let us know. |
I noticed one other thing after modifying the categories. The preview and show paths are unavailable when providing an alternate path in source-root during the init step. Is this the expected behavior or something that needs to be addressed? I'm linking an example but I believe you will need to accept the invite to be an outside collaborator in order to see it. |
When you specify a custom source-root, code-scanning (the server-side imlpementation that processes the results) does not know how to map the locations of the alerts to places in the code since the root is different. You should instead be using Please give me more time to think about your other questions. |
And thanks for pointing out the documentation for the ATM queries. We've just updated the README. They are being run for javascript codescanning runs. |
The ATM queries are still somewhat experimental. You may want to filter them from running if they are taking too much time. |
Sounds good and we actually found an alternative solution for source-root using mono-repos by using turbo which can pull a project and its dependent packages while maintaining the locations for visibility. We originally looked at using codeql-config but needed to deviate from this pattern since we wanted to run each app as part of a matrix job. As for the codeql pack split out. What are your initial thoughts on having some additional packs such as security-core, extended-only, and quality-only so we could run those as part of a matrix further reducing runtime? We can likely omit some of the long-running queries (ATM) but want to be careful about omitting too much. |
When you specify a matrix job, you can include some matrix parameters including which codeql-config.yml files to run. For example, your matrix currently looks like this:
It happens to be that each of your directories is just a single word, so you can use this to generate unique codeql-config files for each job. Eg-
Or something like that.
I will have to communicate with our product team, since this is not a technical decision and can potentially affect many customers. |
I'll take a look into this and see if it will work for the actual use case.
Sounds great. I understand there's a lift here but I can definitely see this strategy helping with some of these larger repos/apps that are concerned about performance. Especially when it's embedded into the PR workflow. |
We're having a conversation about expanding our set of query suites or query suite selectors (ie- building blocks for custom query suites). If we decide this is the right thing to do, we won't be able to implement in the short term, so I would recommend finding a solution here. If you would like to permanently opt out of the ATM queries, we can do so on our end. Just please let us know which organizations and repositories we should do this for. And if you ever change your mind, we can easily undo it. This is not much different from filtering them out using query-filters thought. So, the choice is yours. Another suggestion is that you avoid running |
One challenge we've seen with customers running CodeQL against large applications is the time to execute CodeQL scans. One potential solution is to split out the query packs to run separately.
ex.
This could be combined with a matrix strategy to run parallel jobs and optimize the total execution time of the workflow. I understand one limitation of this is new CodeQL packs would need to be published that support this pattern. However, it would be very beneficial for customers where PRs running over 10 minutes in their large legacy apps are considered a nonstarter for CodeQL as a long-term solution.
The text was updated successfully, but these errors were encountered: