Improve ingestion of CPEs #510

ctron · 2024-07-05T07:40:59Z

For PURLs we have an optimized importer by now. The list of PURLs from an SBOM gets batch imported, with an upsert strategy.

However, for CPEs we still have the single "get or insert" strategy. As there seem to be a lot of CPEs in the SBOMs now, that hurts performance a lot.

The idea is to replicate the ingestion process from PURLs and apply the same pattern to CPEs: batch insertion, plus upsert. A quick check of a single RHEL-style SBOM shows that this should bring down the number of operations quite a bit, just by avoiding duplicates:

➜  sbom bzcat rhel-br-9.2.0.json.bz2 | grep cpe: | sort | wc
   4940    9880  348225
➜  sbom bzcat rhel-br-9.2.0.json.bz2 | grep cpe: | sort -u | wc
     27      54    1944

On the other hand, those CPEs are of type "security" and we can skip them at first. Also see: #509 … However, in the future we might want to ingest this information anyway. So we need to improve the CPE creation process.
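
To make the "batch insertion, plus upsert" idea concrete, here is a minimal sketch. It is not the project's importer: the `cpe` table, its `value` column, and the `ingest_cpes_batch` helper are hypothetical names, and an in-memory SQLite database only stands in for the real storage layer. It deduplicates the CPEs first, then writes them in chunks with a multi-row insert that ignores rows that already exist.

```python
import sqlite3


def ingest_cpes_batch(conn, cpes, chunk_size=500):
    """Deduplicate CPE strings, then upsert them in chunks.

    Hypothetical helper for illustration only. Returns the number of
    unique CPEs that were submitted to the database.
    """
    # Step 1: drop duplicates up front, so each CPE is sent at most once.
    unique = sorted(set(cpes))

    # Step 2: one multi-row INSERT per chunk instead of one round trip per CPE.
    for start in range(0, len(unique), chunk_size):
        chunk = unique[start:start + chunk_size]
        placeholders = ", ".join(["(?)"] * len(chunk))
        # ON CONFLICT ... DO NOTHING skips rows that are already present
        # (needs SQLite 3.24 or later).
        conn.execute(
            f"INSERT INTO cpe (value) VALUES {placeholders} "
            "ON CONFLICT (value) DO NOTHING",
            chunk,
        )
    conn.commit()
    return len(unique)


# Stand-in schema (assumption): a unique constraint on the CPE string.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cpe (id INTEGER PRIMARY KEY, value TEXT NOT NULL UNIQUE)"
)

# The same CPE repeated many times, as in the SBOM check above.
sample = ["cpe:/a:redhat:enterprise_linux:9::appstream"] * 3 + [
    "cpe:/o:redhat:enterprise_linux:9::baseos"
] * 2
submitted = ingest_cpes_batch(conn, sample)
stored = conn.execute("SELECT COUNT(*) FROM cpe").fetchone()[0]
```

Running the helper a second time with the same input leaves the table unchanged, which is the property the upsert is meant to provide.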

bobmcwhirter · 2024-07-05T15:49:27Z

I'm now using the pih CPEs to contextualize product-status from CSAF, so yes please, CPEs would be good. I'm currently relying upon graph.ingest_cpe(...)
