fix: dedupe extracted comments in linear time#682
Conversation
The default comment-extraction path checked each preserved comment with Array.prototype.includes against a growing array, making deduplication O(n^2) in the number of distinct preserved comments. A crafted asset with many distinct comments could cause superlinear, CPU-bound build slowdown (GHSA-8cjx-vvr8-p635). Track membership with a Set for O(1) lookups while preserving output order. https://claude.ai/code/session_01XE82o4oLA4FdUP8r7yGd4r
|
🦋 Changeset detectedLatest commit: aac69d7 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
There was a problem hiding this comment.
Pull request overview
Improves the performance and resilience of the comment-extraction deduplication path by replacing an O(n²) duplicate check with O(1) membership tracking, mitigating a potential CPU-bound slowdown when processing assets with many distinct preserved comments (GHSA-8cjx-vvr8-p635).
Changes:
- Use a
Setto track seen extracted comment strings while preserving output order (applied to both Terser and UglifyJS paths). - Add
GHSAto cspell dictionary for clean spellchecking. - Add a changeset documenting the patch release and the performance/security fix.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
src/utils.js |
Replaces Array#includes-based dedupe with Set membership checks in comment extraction for linear-time behavior. |
.cspell.json |
Adds GHSA to allowed words and fixes trailing comma consistency. |
.changeset/dedupe-extracted-comments-linear.md |
Documents the patch-level fix and advisory reference for release notes. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #682 +/- ##
=======================================
Coverage 96.11% 96.11%
=======================================
Files 3 3
Lines 592 592
Branches 203 201 -2
=======================================
Hits 569 569
Misses 21 21
Partials 2 2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Asset filenames and the runtime chunk-loading code embed webpack content/chunk/full hashes. Those digests are not stable across environments - this plugin folds the minimizer version into the chunk hash for cache busting, so a different installed terser shifts every hash even when the emitted code is byte-identical. A snapshot serializer now replaces those digests with a placeholder so the snapshots assert on output that actually matters. https://claude.ai/code/session_01XE82o4oLA4FdUP8r7yGd4r
| module.exports = { | ||
| test(value) { | ||
| return typeof value === "string" ? hasHash(value) : isAssetMap(value); | ||
| }, | ||
| serialize(value, config, indentation, depth, refs, printer) { | ||
| if (typeof value === "string") { | ||
| return printer(strip(value), config, indentation, depth, refs); | ||
| } | ||
|
|
||
| const normalized = {}; | ||
|
|
||
| for (const key of Object.keys(value)) { | ||
| normalized[strip(key)] = strip(value[key]); | ||
| } | ||
|
|
||
| return printer(normalized, config, indentation, depth, refs); | ||
| }, |
| const isAssetMap = (value) => | ||
| value !== null && | ||
| typeof value === "object" && | ||
| !Array.isArray(value) && | ||
| Object.keys(value).length > 0 && | ||
| Object.keys(value).every((key) => typeof value[key] === "string") && | ||
| Object.keys(value).some((key) => hasHash(key) || hasHash(value[key])); | ||
|
|
npm 6 (bundled with Node <= 14) can't read the lockfileVersion 3 package-lock.json, so the install step re-resolves terser to the latest match of its range instead of the locked version. The newer terser produces different minified output, breaking the snapshots. Reinstall the exact locked version (read from the pristine committed lockfile) on those rows so their output matches the snapshots. https://claude.ai/code/session_01XE82o4oLA4FdUP8r7yGd4r
@swc/html (used by swcMinifyHtml / swcMinifyHtmlFragment) requires Node >= 14, so those cases fail on the Node 10/12 rows. Filter them out there with a Jest test-name pattern. `-u` drops the snapshots orphaned by the skipped tests so `--ci` doesn't fail on them; it only affects the throwaway CI workspace, and every Node >= 14 row still runs the full suite and validates those snapshots strictly. https://claude.ai/code/session_01XE82o4oLA4FdUP8r7yGd4r
| const HASH = /\b[0-9a-f]{20,}\b/g; | ||
| const PLACEHOLDER = "x".repeat(20); | ||
|
|
||
| const hasHash = (string) => /\b[0-9a-f]{20,}\b/.test(string); | ||
| const strip = (string) => string.replace(HASH, PLACEHOLDER); | ||
|
|
| const normalized = {}; | ||
|
|
||
| for (const key of Object.keys(value)) { | ||
| normalized[strip(key)] = strip(value[key]); | ||
| } |
| - name: Run tests for webpack version ${{ matrix.webpack-version }} | ||
| run: npm run test:coverage -- --ci | ||
| # `@swc/html` (swcMinifyHtml / swcMinifyHtmlFragment) requires Node | ||
| # >= 14, so skip those cases on Node 10/12. `-u` drops the snapshots | ||
| # orphaned by the skipped tests, which `--ci` would otherwise treat as | ||
| # a failure; it only touches the throwaway CI workspace, and every | ||
| # Node >= 14 row still runs the full suite and validates those | ||
| # snapshots strictly. | ||
| shell: bash | ||
| run: | | ||
| case "${{ matrix.node-version }}" in | ||
| 10.x | 12.x) | ||
| npm run test:coverage -- --ci -u -t '^(?!.*swcMinifyHtml).*$' | ||
| ;; | ||
| *) | ||
| npm run test:coverage -- --ci | ||
| ;; | ||
| esac |
- Snapshot serializer: use the `print` API (matching the existing serializer in the repo), compute `Object.keys` once, and map each distinct digest to a unique token so different hashes never collapse into the same value (which could drop asset entries when used as keys). - Move the `@swc/html` tests into a dedicated file that jest.config path-ignores on Node < 14, instead of filtering by test name with `-t`/`-u` in CI. This avoids orphaned snapshots (so `--ci` stays happy without `-u`) and keeps strict snapshot validation on every row, rather than rewriting snapshots on the Node 10/12 rows. https://claude.ai/code/session_01XE82o4oLA4FdUP8r7yGd4r
Deduplicate extracted comments in linear time.
The default comment-extraction path checked each preserved comment with
Array.prototype.includesagainst a growing array, so deduplication was O(n^2) in the number of distinct preserved comments. Membership is now tracked with aSet, keeping build time linear while preserving output order.Also bumps the
webpackdev dependency to^5.107.2.