Description
The nature of the C# driver means that it needs to see all csc invocations for a build. Today that is achieved by disabling shared compilation during build. That unfortunately creates a significant performance penalty when building C# code. The larger the repo the more significant the penalty. The rule of thumb is that for solutions of at least medium size this will cause somewhere from a 3-4X build slow down. For larger repositories this can be even bigger.
Consider as a concrete example the dotnet/runtime repository. When their CodeQL pipeline runs and shared compilation is disabled it increases their build time by 1 hour and 38 minutes. Or alternatively building the dotnet/roslyn repository. Building that locally results in a ~4X perf penalty when shared compilation is disabled.
I understand from previous conversations that this is done because the extractor needs to see every C# compiler call in the build and potentially process it. An alternative approach to doing this is to have the build produce a binary log. That is a non-invasive change to builds and there is ample tooling for reading C# compiler calls from those once the build completes. The code for reading compiler calls is at the core very straight forward:
using var fileStream = new FileStream(binlogPath, FileMode.Open, FileAccess.Read, FileShare.Read);
List<CompilerCall> compilerCalls = BinaryLogUtil.ReadAllCompilerCalls(fileStream);
I have a sample here of integrating this approach into the existing extractor tool. On my machine I was able to use this sample to run over a full build of roslyn without any obvious issues. And most importantly, no changes to how I built roslyn 😄
Using libraries like this have other advantages because you could also offload the work of creating CSharpCompilation
objects. There is code in that library that will very faithfully recreate a CSharpCompilation
for every CompilerCall
instance. Furthermore, think there are other optimization opportunities for the extractor once you are running the analysis in bulk: caching MetadataReferences
, caching SourceText
, etc ...
More than happy to chat about this, the different ways binary logs can be used, moving analysis off machine, etc ... My big end goal is to get back to us using shared compilation so we can keep our build times down.