Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

codeql-action/init should explicitly delete an already present database (for self-hosted runners) #1722

Open
rseeton opened this issue Jun 9, 2023 · 7 comments
Labels
enhancement New feature or request

Comments

@rseeton
Copy link

rseeton commented Jun 9, 2023

We are running CodeQL as an action on self-hosted runners and have enabled the 'over-write' option to clean the CodeQL database before each run.

However, some environments fail with the following message:

Failure invoking /__w/_tool/CodeQL/2.13.3-20230524/x64/codeql/codeql with arguments database,init,--db-cluster,/__w/_temp/codeql_databases,--source-root=/__w/a1_aggregation_mtf/a1_aggregation_mtf,--language=java,--begin-tracing,--trace-process-name=Runner.Worker.exe,--codescanning-config=/__w/_temp/user-config.yaml,--overwrite.
 
       Exit code 2 and error was:

       A fatal error occurred: Ran with database overwriting enabled, but the directory does not appear to be a CodeQL database or database cluster. Please check you do indeed wish to delete it, and do so manually.

As suggested in the error message, we can manually delete the directory, allowing for a successful run.

We do not understand what is causing CodeQL to believe the directory is not a CodeQL database. We are not explicitly altering the contents or structure. As the directories involved are created by CodeQL within the workflow, the hope is to have CodeQL able to clean up it's workspace from within the workflow as well.

Ideally, we want to avoid having to implement an external cleanup step for the process.

We are curious to see what logic is being used to determine the 'CodeQL database' status of the targeted directory.

Regards,
Richard Seeton

@rseeton rseeton added the question Further information is requested label Jun 9, 2023
@smowton smowton changed the title General issue --overwrite option sometimes does not recognise an existing CodeQL database Jun 9, 2023
@smowton
Copy link
Contributor

smowton commented Jun 9, 2023

In brief, the directory is taken for a codeql database if it contains codeql-database.yml, or a database cluster if it contains exactly dirs log, working, temp and one or more directories that look like databases (i.e., which contain codeql-database.yml). I suggest logging what's really in the directory so you can see what is either unexpectedly missing or present when that error is raised.

@hmakholm
Copy link
Contributor

hmakholm commented Jun 9, 2023

I'm assuming you're using the github/codeql-action action on your self-hosted runner. But the action doesn't seem to have any overt feature for asking for the database directory to be overwritten. Did you use $CODEQL_ACTIONS_EXTRA_OPTIONS to inject the --overwrite option on the codeql database init command line?

The original use case for --overwrite was as a convenience feature for interactive use where one executes the same codeql database create command again and again after making changes to the source tree and/or extractor. In those cases, deleting the database manually is a low-cost workaround, so the "is this actually a database we're deleting" check deliberately tries to err in the direction of not deleting a completely wrong directory tree, e.g. because the interactive user misremembered the command-line syntax.

Looking through the code that implements the check, I can see a few theoretical possibilities where an earlier run that was killed a just the wrong time may have left the directory in a state that's not recognizable as a database. This would be a fairly rare occurrence -- but once a particular self-hosted runner has a directory in that state, there would be nothing that automatically clears it away. (In contrast, GitHub-hosted runners always start all just in a pristine container without anything left over from earlier jobs).

As Chris says, it's hard to say exactly what has go wrong without looking at the actual directory content.

Arguably the right fix ought to be that the action should clear out the database directory itself when it starts up rather than leaving it to the CLI to do so. (At least when it finds itself on a self-hosted runner, if that condition is easily detectable).

@hmakholm hmakholm changed the title --overwrite option sometimes does not recognise an existing CodeQL database codeql-action/init should explicitly delete an already present database (for self-hosted runners) Jun 9, 2023
@hmakholm hmakholm transferred this issue from github/codeql Jun 9, 2023
@hmakholm hmakholm added enhancement New feature or request and removed question Further information is requested labels Jun 9, 2023
@hmakholm
Copy link
Contributor

hmakholm commented Jun 9, 2023

(More precisely, if an earlier attempt to run codeql-action/init was killed after it had created a language subdirectory of the database cluster but before it wrote a codeql-database.yml into it, then I think codeql database init --overwrite would not recognize the resulting directory tree being safe to delete).

@hmakholm
Copy link
Contributor

hmakholm commented Jun 9, 2023

If this guess is right, there seems to be a good chance that cleaning out the offending directory on the affected runner manually could get your setup working again while we decide whether to change the behavior of the action.

@rseeton
Copy link
Author

rseeton commented Jun 10, 2023

Hello @hmakholm

Your assumption is correct - we are running a GitHub Enterprise environment with selfhosted runners calling the codeql-action. To enable the 'overwrite' functionality, we use the $CODEQL_ACTIONS_EXTRA_OPTIONS environment setup.

The problem occurs on every run of the workflow, regardless of whether the previous run completed successfully or not. As suggested above, if we force the deletion of the codeql_databases directory, we can get around the problem (this is being implemented in the runners using the ACTIONS_RUNNER_HOOK_JOB_STARTED option)

The concern/question though, is why is this external cleanup needed? On a successful execution the codeql_databases directory is populated and happily generates the sarif results, but the subsequent run aborts saying that if doesn't think the directory is a codeql_database - this suggests we are violating the rules used to identify a codeql_database, but we don't have clarity on what that rule requires.

A clean run produces the following directories and a codeql-database.yml file ... and yet this triggers a failure on the subsequent run (if not cleaned up):

**_pwd_**
/<runner_dir>/_work/_temp/codeql_databases

**_ls -lrta ./_**
total 0
drwxr-xr-x 3 root    root          20 Jun  9 23:21 diagnostic
drwxr-xr-x 3 root    root          32 Jun  9 23:21 temp
drwxr-xr-x 7 root    root          74 Jun  9 23:21 .
drwxr-xr-x 2 root    root          93 Jun  9 23:21 working
drwxr-xr-x 7 root    root         145 Jun  9 23:24 java
drwxr-xr-x 2 root    root         134 Jun  9 23:27 log
drwxr-xr-x 7 rseeton role-algodev 147 Jun  9 23:27 ..

**_$ ls -lrta ./java/codeql-database.yml_** 
-rw-r--r-- 1 root root 310 Jun  9 23:24 ./java/codeql-database.yml

We aren't blocked but we are confused.
Regards.

@hmakholm
Copy link
Contributor

Thanks, that was helpful.

The diagnostic directory is one we have started generating since the --overwrite logic was written, and it looks like we missed updating the latter accordingly to deal with it.

I'll file an internal note to get that fixed.

@rseeton
Copy link
Author

rseeton commented Jun 12, 2023

@hmakholm - Excellent, thanks.
Testing in our environments confirms that if we remove the codeql_databases/diagnostic library, the overwrite function recognizes the codeql_databases as a CodeQL database and allows the over-write operation to complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants