fix(section16): preserve null vs 0 for empty numeric leaves on Forms 3/4/5 (#116 follow-up)#117
Merged
Merged
Conversation
… distinction (#116 follow-up)
…n OwnershipDocument.storage.ts - Consolidated and reordered import statements for better readability. - Adjusted the extractor version to align with recent changes. - Improved formatting of function parameters for clarity. - Ensured consistent handling of transaction and holding saving logic.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The bug
EDGAR ownership filings (Forms 3, 4, 5, and their /A amendments) routinely emit empty XML elements for numeric fields the filer chose not to populate — e.g.
<transactionShares><value/></transactionShares>or<transactionPricePerShare/>. After XML parsing these arrive as{ value: "" }.OwnershipDocument.schema.tstyped the innervalueasType.Number(). TypeBox'sValue.Convert(run before our storage helpers) coerces""to0, so the empty element becomes a fabricated zero indistinguishable from a real reported zero by the time it reaches the storage layer'snum()helper. That zero then gets written to thesection16_transactions/section16_holdingsnumeric columns, silently corrupting:PR #116 fixed exactly this bug for Form 144 by typing numeric leaves as
Type.String()and letting storage coerce""→null. The same fix was not backported to the shared Section 16 schema. This PR does that backport.The fix
OwnershipDocument.schema.ts— change the innervaluefield of the sharedVALUE_NUMBERwrapper fromType.Optional(Type.Number())toType.Optional(Type.String()), with a comment mirroringForm_144.schema.tsexplaining why. This single change covers all 8 affected leaves:transactionAmounts.transactionSharestransactionAmounts.transactionPricePerSharepostTransactionAmounts.sharesOwnedFollowingTransactionpostTransactionAmounts.valueOwnedFollowingTransactionunderlyingSecurity.underlyingSecuritySharesunderlyingSecurity.underlyingSecurityValuederivativeTransaction.conversionOrExercisePricederivativeHolding.conversionOrExercisePriceOwnershipDocument.storage.ts— narrownum()'s parameter type to{ value?: string } | string | undefinedand drop the now-deadtypeof v === "number"branch. Behavior is identical: string-numeric still parses;""still maps tonull.extractor_versionbumped from"1.0.0"to"1.1.0".Five regression tests in
OwnershipDocument.storage.test.ts, mirroring the Form 144 pattern: load a real fixture, mutate one numeric leaf's.valueto"", run storage, assert the corresponding DB column isnull(not0).Migration strategy
Re-extraction via the
extractor_versionbump. The dispatch/extractor-runs machinery already triggers re-extraction whenever the recorded extractor version is behind the code's declared version.processOwnershipFormcallssection16Repo.clearTransactions(accession_number)/clearHoldings(accession_number)before writing, so re-running is idempotent and naturally heals the fabricated-zero rows in place.This covers Forms 3, 4, 5, 3/A, 4/A, 5/A — every form that flows through
processOwnershipForm.Why not a SQL migration?
A
UPDATE section16_transactions SET shares = NULL WHERE shares = 0(and similar for the other 7 columns) is unsafe and was rejected: legitimate zero values exist in the wild (e.g. a $0 acquisition price for a gift transaction, a position closed to exactly zero shares following a transaction). After this corruption ran in production, there is no way at the SQL layer to distinguish a fabricated-from-empty 0 from a genuine 0 — only re-extraction from the original XML can.Downstream consumers
No
=== 0sentinel consumers were found in this repo for any of the 8 affected columns. Out-of-repo downstream consumers that interpret these columns should switch from=== 0/IS 0checks toIS NULLfor the "not reported" case, since post-fix the column is properlyNULLrather than a fabricated0.What this PR explicitly does not touch
Section16Schema.ts— DB columns are already nullable; no schema change needed.Section16Repo.ts— repo just stores what storage hands it.Form_144.*— already correct (this PR is the backport).https://claude.ai/code/session_013keqifgtvU1cKdyvoNf1ua
Generated by Claude Code