Have you ever wished you could query your code the same way you query a SQL database? Well, that’s exactly what GitHub’s CodeQL enables you to do. It’s a semantic code analysis engine that transforms your code into a structured database that you can use to surface security vulnerabilities or discover new insights.
You don’t need to learn a thing about static analysis or structured queries to benefit from CodeQL. GitHub’s code scanning feature runs hundreds of predefined queries right out of the box—for free on public repositories, or as part of GitHub Advanced Security for enterprises. There are also many more niche “query packs” available that go far beyond the default scans. But while the number of ready-made queries is growing all the time, you can also create your own queries to meet your specific needs.
We’ve been writing custom CodeQL queries at Betsson for about two years, including ones to moderate package use, research and quantify code and quality metrics, and facilitate adherence to code structure and preferred architecture design. In this guide, I share some of what we’ve learned to help you get up and running with custom queries as quickly as possible.
In this guide, you will learn:
How to build a quick and minimal local setup.
How to create and run a simple custom query.
How to add CodeQL scans to your CI with GitHub Actions.
More advanced custom query possibilities.
Set up a simple local environment
In this guide, we will use JavaScript and Visual Studio Code, but you should be able to follow along regardless of your language and code editor of choice. I invite you to fork this small repository where I collected most of the setup used for this article, including a minimal application called “health-app” that we can scan.
You need the CodeQL command-line interface (CLI) tool to create and configure databases, a language pack for your programming language of choice to convert your code into a query-able database, and one or more query packs. You can find the CLI, packs, and the Visual Studio Code plugin on the CodeQL tools page. For help setting everything up, you can refer to the CodeQL CLI quick-start documentation.
You’ll do most of your CodeQL work in the plugin for Visual Studio Code, or a similar plugin for your code editor of choice. The Visual Studio Code plugin enables you to connect to different scan targets, design queries using IntelliSense, and run/view results of your scans from produced SARIF (Static Analysis Results Interchange Format) reports.
Create and run your first custom query
Before you can run a scan, you need the following:
The project’s source code/repository
A CodeQL database built from that repository
A CodeQL configuration file for the project
Remember, your CodeQL setup—which includes scripts, packages, and databases—will live in a separate directory from the project you’re scanning.
Let’s start by running all commands from the project’s root directory (if you’re using my health-app repository, all of this has been done already):
Initialize a CodeQL pack by running the following (“.” stands for the current directory):
codeql pack init -d . codeql
This will create the qlpack.yml file in a new subdirectory called codeql in the project’s root directory.
Configure qlpack.yml by adding the JavaScript language reference:
codeql pack add --dir ./codeql codeql/javascript-all
Create a database from your codebase:
codeql database create codeql/db -s . -l javascript
This creates a new subdirectory called db inside the codeql directory.
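After these steps, codeql/qlpack.yml should declare the pack and its JavaScript dependency. As a rough sketch of what to expect, the snippet below writes an illustrative copy to a scratch path. The pack name health-app/custom-queries, the version, and the "*" version range are assumptions for illustration; the file the CLI generates may contain additional fields.

```shell
#!/usr/bin/env sh
# Write an illustrative qlpack.yml to a scratch location so the real
# codeql/qlpack.yml produced by `codeql pack init` is left untouched.
example_dir="${TMPDIR:-/tmp}/codeql-qlpack-example"
mkdir -p "$example_dir"

cat > "$example_dir/qlpack.yml" <<'EOF'
# Hypothetical pack name; use your own scope/name.
name: health-app/custom-queries
version: 0.0.1
dependencies:
  # Entry added by `codeql pack add`; "*" accepts any published version.
  codeql/javascript-all: "*"
EOF

# Print the file so it can be compared with the generated one.
cat "$example_dir/qlpack.yml"
```

Comparing this with your generated file is a quick way to confirm that the language reference was added before you build the database.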
Now let’s make our first custom query! Create a new file in your code editor with the following:
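import javascript
// A PackageDependencies element is a dependency block in a package.json file.
from PackageDependencies deps, string name
// getADependency holds for each (name, version) pair; "_" ignores the version.
where deps.getADependency(name, _)
select deps, "Dependency found: '" + name + "'."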
This is a simple query that will return all of a project’s dependencies. Save it as a .ql file inside the newly created codeql subdirectory of the project’s directory. Of course, you can create far more interesting and sophisticated queries, but let’s start here.
From the VS Code plugin, select the db directory you just created. Then right-click anywhere within the .ql file to run the first scan. The query should produce a list of package.json dependencies.
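The same query can also be run without the editor. The sketch below assumes the query was saved as codeql/deps.ql (the file name used later in this guide) and that the codeql CLI is on your PATH; the output file name deps.bqrs is arbitrary.

```shell
#!/usr/bin/env sh
# Run the dependency query against the database and print the rows as CSV.
run_deps_query() {
  db="${1:-codeql/db}"            # database created earlier
  query="${2:-codeql/deps.ql}"    # assumed file name for the query
  out="${3:-deps.bqrs}"           # arbitrary name for the raw result file

  # `query run` evaluates a single query and stores the results as BQRS.
  codeql query run --database="$db" --output="$out" "$query" || return 1
  # `bqrs decode` converts the binary results into readable CSV.
  codeql bqrs decode --format=csv "$out"
}

# Only run when both the CLI and the database are available.
if command -v codeql >/dev/null 2>&1 && [ -d codeql/db ]; then
  run_deps_query
else
  echo "codeql CLI or codeql/db not found; skipping query run" >&2
fi
```

This is handy for quick checks in a terminal, while the editor plugin remains the more comfortable place to write and debug queries.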
You can perform many different types of scans with CodeQL. For example, you could block vulnerable log4j usage at scale by disallowing affected versions of the package. You could update the example query we created above to explicitly disallow any library (dotenv in our case) by assigning an appropriate security severity level (see the documentation on security severity and alert settings for available options).
/**
 * @name dependencies
 * @description finds and lists referenced dependencies
 * @kind problem
 * @problem.severity error
 * @security-severity 10.0
 * @tags setup_check
 * @id setup
 */
import javascript
from PackageDependencies deps, string name
where deps.getADependency(name, _) and name.matches("dotenv")
select deps, "Dependency found: '" + name + "'."
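Because this version carries query metadata (@kind and @id), it can also be run with codeql database analyze, which writes a SARIF report. A minimal sketch, assuming the query file is codeql/deps.ql and using results.sarif as an arbitrary output name; the jq summary is optional and only runs if jq is installed.

```shell
#!/usr/bin/env sh
# Produce a SARIF report for the metadata-annotated query, then list alerts.
analyze_to_sarif() {
  db="${1:-codeql/db}"
  query="${2:-codeql/deps.ql}"     # assumed file name
  sarif="${3:-results.sarif}"      # arbitrary output name

  codeql database analyze "$db" "$query" \
    --format=sarif-latest --output="$sarif" || return 1

  # SARIF stores alerts under runs[].results[]; print rule id and message.
  if command -v jq >/dev/null 2>&1; then
    jq -r '.runs[].results[] | "\(.ruleId): \(.message.text)"' "$sarif"
  fi
}

# Only run when both the CLI and the database are available.
if command -v codeql >/dev/null 2>&1 && [ -d codeql/db ]; then
  analyze_to_sarif
else
  echo "codeql CLI or codeql/db not found; skipping analysis" >&2
fi
```

The resulting SARIF file is the same kind of report the Visual Studio Code plugin can open, so it is a convenient way to check alert text and severity before moving to CI.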
You can learn more about static analysis and using CodeQL for vulnerability detection from GitHub’s recent tutorial. A more exotic use for CodeQL would be implementing fitness functions to proactively pursue architectural designs in a measurable way.
As you can see, running custom queries locally is quite simple. Now let’s take it up a level with GitHub Actions.
Automating CodeQL scans with GitHub Actions
The easiest way to run a custom query with GitHub Actions is with GitHub’s CodeQL Analysis workflow, which uses GitHub’s CodeQL action. It has three main components: setup, runner, and reporter. The setup and runner components are pretty self-explanatory. The reporter uploads scan results and a snapshot of your database to your repository context store, and makes them available in your Security tab. The best part is that you can download the database using a GitHub API call, should you want to investigate further or explore results in a semi-manual mode.
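As a sketch of that download, the code scanning REST API exposes the stored database per language. The example assumes the GitHub CLI (gh) is installed and authenticated; OWNER/REPO and the output file name are placeholders.

```shell
#!/usr/bin/env sh
# Download the CodeQL database that code scanning stored for a repository.
download_codeql_db() {
  repo="$1"                          # placeholder, e.g. OWNER/REPO
  lang="${2:-javascript}"
  out="${3:-codeql-db-$lang.zip}"    # arbitrary output name

  # Requesting application/zip returns the archive itself rather than
  # JSON metadata describing the database.
  gh api -H "Accept: application/zip" \
    "/repos/$repo/code-scanning/codeql/databases/$lang" > "$out"
}

# Usage (not executed here):
#   download_codeql_db OWNER/REPO javascript
```

Once unzipped, the archive can be selected in the Visual Studio Code plugin just like a locally built database.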
To run the custom dependency query we created above, be sure to add both the .ql and qlpack.yml files to your repository. Then set up the Actions workflow.
If you haven’t already enabled GitHub Actions for the repository, click Actions under your repository name. If you cannot see the Actions tab, select the “...” dropdown menu, then click Actions. Click the button that says “I understand my workflows, go ahead and enable them.”
On the Actions tab, click New workflow and search for CodeQL Analysis. There should be one result. Click the Configure button.
You should see an Actions workflow YAML file. Add this line to the file in the github/codeql-action/init section (remember to include the white space):
queries: +./${{ env.CI_TMP_DIR }}/codeql/deps.ql
Click Commit. This should kick off a CodeQL scan. When the scan is complete, the results should appear in the repository’s Security tab.
Note: If you’re using my health-app repository, please be aware that the included codeql-custom.yml workflow requires GitHub Advanced Security. If you don’t have Advanced Security, you can still test the custom workflow by following the steps above.
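For orientation, here is roughly where the queries line sits in a complete workflow. This is a pared-down sketch rather than the template GitHub generates: the file name, trigger branch, and action versions are assumptions, and the query path is written relative to the repository root instead of using env.CI_TMP_DIR.

```shell
#!/usr/bin/env sh
# Write a minimal CodeQL workflow that also runs the custom dependency query.
mkdir -p .github/workflows

cat > .github/workflows/codeql-deps-sketch.yml <<'EOF'
name: "CodeQL"
on:
  push:
    branches: [main]          # assumed default branch
jobs:
  analyze:
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      security-events: write  # needed to upload results to the Security tab
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
        with:
          languages: javascript
          # Line from this guide, with the path relative to the repo root.
          queries: +./codeql/deps.ql
      - uses: github/codeql-action/analyze@v3
EOF

echo "Wrote .github/workflows/codeql-deps-sketch.yml"
```

The indentation under with: is what the earlier reminder about white space refers to; a misaligned queries key is silently ignored or breaks the YAML.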
While this process will work for testing our workflow, in the long run, it’s better to use a custom CodeQL configuration file, not the Actions workflow, to manage which custom queries you run.
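A configuration file for this setup could look like the sketch below. The file location .github/codeql/codeql-config.yml and the configuration name are assumptions; the query path matches the deps.ql file used above.

```shell
#!/usr/bin/env sh
# Write a minimal CodeQL configuration file that lists the custom query.
mkdir -p .github/codeql

cat > .github/codeql/codeql-config.yml <<'EOF'
name: "Custom CodeQL configuration"
queries:
  # A single .ql file, a directory of queries, or a .qls suite can go here.
  - uses: ./codeql/deps.ql
EOF

# To use it, reference the file from the init step of the workflow:
#   - uses: github/codeql-action/init@v3
#     with:
#       config-file: ./.github/codeql/codeql-config.yml
echo "Wrote .github/codeql/codeql-config.yml"
```

With this in place, adding or removing custom queries becomes a change to the configuration file, and the workflow itself can stay untouched.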
Exploring further possibilities
You can create multiple-language or multiple-configuration setups to quickly gather more information from a single run or perform multiple scans at once. For example, instead of specifying languages up front, you can automatically detect which languages are used in a repository and spawn appropriate scans based on the results. Here is an example of working with the GitHub CLI to fetch information:
gh api repos/${{ env.CI_REPOSITORY }}/languages -q 'keys[]'
And this documentation details how to customize your CodeQL scans.
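To turn the language list from the GitHub API into scan targets, the names need to be mapped to CodeQL language identifiers. The sketch below shows one way to do that; the function names are hypothetical, and the mapping covers only commonly supported languages, so check it against the current list of supported languages before relying on it.

```shell
#!/usr/bin/env sh
# Map a GitHub language name to the CodeQL identifier that can scan it.
# Prints nothing for languages this sketch does not know how to map.
to_codeql_language() {
  case "$1" in
    JavaScript|TypeScript) echo "javascript" ;;
    Python)                echo "python" ;;
    Java|Kotlin)           echo "java" ;;
    C|C++)                 echo "cpp" ;;
    "C#")                  echo "csharp" ;;
    Go)                    echo "go" ;;
    Ruby)                  echo "ruby" ;;
    Swift)                 echo "swift" ;;
  esac
}

# Read language names (one per line) and print unique CodeQL identifiers.
detect_codeql_languages() {
  while IFS= read -r name; do
    to_codeql_language "$name"
  done | sort -u
}

# Usage with the GitHub CLI (OWNER/REPO is a placeholder):
#   gh api repos/OWNER/REPO/languages -q 'keys[]' | detect_codeql_languages
```

The output can then feed a job matrix so that each detected language gets its own scan.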
Make something cool? Share it!
Of course, we just scratched the surface of what can and should be done with CodeQL. There is much more to be discovered in the documentation and the application itself. As you explore this powerful platform, you’ll probably find yourself making things that other people can use. If you create a query that could be useful in practically all codebases, you can submit your query to the open source CodeQL query repository. If it’s a bit more niche—for example, a query that’s only applicable to actions written in JavaScript—you can create your own query pack and share it through GitHub Packages. I look forward to seeing what you come up with.