Releases: vthakore23/dicomlock
v0.8.0
DicomLock 0.8.0 brings the security fixes from the S8 and S9 falsification rounds into the published package. pip install dicomlock now installs 0.8.0.
Highlights in 0.8.0
- CDR escape closed. A payload hidden under an allowlisted vendor creator (e.g. GEMS_IDEN_01) no longer survives disarm. The exe-signature override now matches across a 4 KiB window plus a high-entropy gate, and the same classifier (scanner.file_security._private_payload_threat) is used by both the scanner and the CDR so detection and disarm cannot disagree.
- Polyglot detection broadened. OLE/CFBF (MSI / Office-macro), CAB, Zstd, plus WASM, DEX, RAR, 7z, Lua from the prior round.
- File Meta length validation. A bomb in group 0002 used to push the byte-walk past EOF unseen; now caught.
- Tiered T2 in the decompression-bomb check. Warns at 100 to 1000x amplification, blocks above 1000x.
- Mixed-compression FP fixes (S8). 0 false positives on conformant files across 12 transfer syntaxes. Four real FPs surfaced and fixed (explicit/implicit VR-mismatch length-walk desync, 1-bit packing, YBR_FULL_422 subsampling, legal trailing padding).
- Sandboxed codec decode (S7). The one third-party codec step runs in a resource-limited subprocess; the tool quarantines on crash, OOM, or hang.
- Statistical rigor in the bench (S9). Wilson 95% CIs, rule-of-three FP upper bound, McNemar paired test vs the parser matrix (pydicom, GDCM, dcmtk).
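The tiered decompression-bomb check described above can be pictured with a short sketch. This is an illustration only, not DicomLock's code: the names `Verdict` and `classify_amplification` are hypothetical, and the treatment of the exact boundaries (100x counts as a warning, exactly 1000x still warns, anything above blocks) is an assumption read from the wording of the release note.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    WARN = "warn"
    BLOCK = "block"


# Thresholds taken from the release note; boundary handling is assumed.
WARN_RATIO = 100.0
BLOCK_RATIO = 1000.0


def classify_amplification(compressed_size: int, expanded_size: int) -> Verdict:
    """Classify a stream by its expansion ratio (expanded / compressed).

    Hypothetical helper: sizes are in bytes, and `expanded_size` is whatever
    size the stream claims or is measured to decode to.
    """
    if compressed_size <= 0:
        raise ValueError("compressed_size must be positive")
    if expanded_size < 0:
        raise ValueError("expanded_size must be non-negative")

    ratio = expanded_size / compressed_size
    if ratio > BLOCK_RATIO:
        return Verdict.BLOCK  # above 1000x: refuse
    if ratio >= WARN_RATIO:
        return Verdict.WARN   # 100x to 1000x: surface a warning
    return Verdict.PASS


if __name__ == "__main__":
    for expanded in (50_000, 100_000, 1_000_000, 1_000_001):
        print(expanded, classify_amplification(1_000, expanded).value)
```

A two-tier scheme like this keeps highly compressible but legitimate images visible as warnings while reserving hard blocks for ratios that are very unlikely in conformant files.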
Validated on the bench corpus
- Detection 80/80, neutralization 80/80, CDR fidelity 72/72 bit-exact.
- False positives 0/605 (575 real CTs plus 30 curated; one-sided 95% upper bound 0.50%).
- Differentiation 39/62; McNemar chi-square 49.0, p < 1e-6 (51 files DicomLock flags that every toolkit accepts; 0 DicomLock blind spots vs the matrix).
- Pinned codec: a fuzzer-found malformed JPEG 2000 file OOM-kills OpenJPEG 2.3.0 + ASan when run raw; the CDR quarantines the carrier DICOM. This is a DoS-class failure, not memory corruption.
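For readers who want to check the headline statistics by hand, the sketch below implements the textbook formulas for the rule of three, the Wilson score interval, and McNemar's test. It is not the project's bench code, and the function names are hypothetical. As an observation from the arithmetic alone, 3/605 gives about 0.50%, and the continuity-corrected McNemar statistic for 51 versus 0 discordant pairs gives about 49.0, which appears consistent with the figures reported above.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% normal quantile


def rule_of_three_upper(n: int) -> float:
    """Approximate one-sided 95% upper bound on a rate when 0 events occur in n trials."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 3.0 / n


def wilson_interval(successes: int, n: int, z: float = Z95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def mcnemar_corrected(b: int, c: int) -> tuple[float, float]:
    """McNemar chi-square with continuity correction, from discordant counts b and c.

    Returns (statistic, p_value); the p-value uses the chi-square
    distribution with 1 degree of freedom, i.e. erfc(sqrt(x / 2)).
    """
    if b < 0 or c < 0 or b + c == 0:
        raise ValueError("need non-negative counts with at least one discordant pair")
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    print(f"rule of three, 0/605: {rule_of_three_upper(605):.4%}")
    lo, hi = wilson_interval(80, 80)
    print(f"Wilson 95% CI, 80/80: [{lo:.4f}, {hi:.4f}]")
    chi2, p = mcnemar_corrected(51, 0)
    print(f"McNemar (corrected), 51 vs 0: chi2={chi2:.2f}, p={p:.2e}")
```

The Wilson interval is used here rather than the simpler normal approximation because the latter collapses to zero width when the observed proportion is exactly 0 or 1, as with 80/80.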
Additions on main since 0.8.0 (not yet republished)
- Diverse-modality validation (bench.diverse_check): 0 false positives, 270/270 bit-exact across 120 brain MR (UPENN-GBM) + 150 chest radiographs (LIDC-IDRI). Across all real public TCIA data on disk now: 0 FP across 845 files / 3 modalities.
- CDR fidelity at scale (bench.fidelity): 623/623 native and lossless bit-exact across 13 transfer syntaxes; 20/20 lossy preserved as decoded.
- De-id first-class. CLI --deid renders a colored re-identification score bar with a per-channel breakdown; the web UI shows a teal re-identification risk card.
- B1 (bench.reid_vs_anonymizer): on 60 brain MR, dicognito 0.19 re-pseudonymizes 120/120 direct identifiers but leaves the pixels byte-identical 60/60, so the facial-geometry and burned-in re-id channels are provably unchanged by tag anonymization.
- B3 (bench.reid_audit): residual re-identification-risk audit across 845 public files. Facial-geometry channel fires on 96.7% of head MR (the Mayo concern); burned-in pixel text on 89.3% of chest radiographs; CT body imaging is low pixel-domain risk (face 0.3%, burn 8%).
- PREPRINT.md complete with 24 web-verified references and the diverse-modality numbers folded in. REPRODUCE.md maps each headline number to its exact command.
Install
pip install dicomlock
License
Apache-2.0