
feat(pack): add tracing option to control default tracing output #2809

Merged: fireairforce merged 3 commits into `next` from `perf-log-output` on Apr 19, 2026

Conversation

@fireairforce
Member

Summary

Add an option to control @utoo/pack's tracing output.

issue: #2792

PR: umijs/umi#13297

Test Plan

@gemini-code-assist (Bot, Contributor) left a comment


Code Review

This pull request introduces a "tracing" configuration option across the CLI, shared configuration, and core build logic, allowing users to toggle default tracing logs. The option is integrated into the NAPI project structures, CLI command arguments for build and dev, and the internal bundle options where it defaults to true. I have no feedback to provide.
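The option described in the review can be sketched roughly as follows. This is a minimal sketch with hypothetical field and function names, not the actual @utoo/pack API; the real change also threads the flag through the NAPI project structs and the `build`/`dev` CLI arguments:

```rust
// Hypothetical sketch of the `tracing` option wiring; names are illustrative.
#[derive(Debug, Clone)]
pub struct BundleOptions {
    /// When false, the default tracing subscriber is not installed,
    /// silencing the bundler's built-in tracing output.
    pub tracing: bool,
}

impl Default for BundleOptions {
    fn default() -> Self {
        // Matches the PR: tracing output stays on unless explicitly disabled.
        Self { tracing: true }
    }
}

fn init_logging(opts: &BundleOptions) -> &'static str {
    if opts.tracing {
        // In a real implementation this would install a tracing subscriber.
        "tracing enabled"
    } else {
        "tracing disabled"
    }
}

fn main() {
    let default_opts = BundleOptions::default();
    println!("{}", init_logging(&default_opts)); // tracing enabled

    let quiet = BundleOptions { tracing: false };
    println!("{}", init_logging(&quiet)); // tracing disabled
}
```

The key design point from the discussion below: the flag defaults to `true`, so nothing changes for existing users; downstream consumers such as umi can opt out and surface build status their own way.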

@xusd320
Contributor

xusd320 commented Apr 19, 2026

Does this support turning tracing off completely? If so, how would the dev startup progress be surfaced?

@fireairforce
Member Author

> Does this support turning tracing off completely? If so, how would the dev startup progress be surfaced?

It is not turned off by default. This just passes a config option through, so the umi side can still get the dev-server ready status.

@github-actions

📊 Performance Benchmark Report (with-antd)

Utoopack Performance Report

Report ID: utoopack_performance_report_20260419_113154
Generated: 2026-04-19 11:31:54
Trace File: trace_antd.json (0.6GB, 1.61M spans)
Test Project: examples/with-antd


Executive Summary

| Metric | Value | Assessment |
| --- | --- | --- |
| Total Wall Time | 14,090.0 ms | Baseline |
| Total Thread Work (de-duped) | 35,984.1 ms | Non-overlapping busy time |
| Effective Parallelism | 2.6x | thread_work / wall_time |
| Working Threads | 5 | Threads with actual spans |
| Thread Utilization | 51.1% | ⚠️ Suboptimal |
| Total Spans | 1,608,406 | All B/E + X events |
| Meaningful Spans (>= 10us) | 523,256 | 32.5% of total |
| Tracing Noise (< 10us) | 1,085,150 | 67.5% of total |
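The derived metrics in the summary follow directly from the raw numbers. A quick sanity check, using the values copied from the table above:

```rust
// Re-derive Effective Parallelism and Thread Utilization from the report's
// raw figures (values taken from the Executive Summary table).
fn main() {
    let wall_time_ms = 14_090.0_f64;   // Total Wall Time
    let thread_work_ms = 35_984.1_f64; // Total Thread Work (de-duped)
    let working_threads = 5.0_f64;     // Working Threads

    // Effective parallelism = thread_work / wall_time
    let parallelism = thread_work_ms / wall_time_ms;

    // Utilization = busy time / (threads * wall time)
    let utilization = thread_work_ms / (working_threads * wall_time_ms) * 100.0;

    println!("parallelism = {:.1}x", parallelism); // 2.6x
    println!("utilization = {:.1}%", utilization); // 51.1%
}
```

So the 51.1% utilization is just the 2.6x parallelism spread over 5 working threads; the two figures flag the same scheduling gap.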

Build Phase Timeline

Shows when each build phase is active and how much CPU it consumes.
Self-Time is the time spent exclusively in that phase (excluding children).

| Phase | Spans | Inclusive (ms) | Self-Time (ms) | Wall Range (ms) |
| --- | --- | --- | --- | --- |
| Resolve | 124,951 | 5,017.2 | 4,075.8 | 8,015.0 |
| Parse | 12,135 | 2,244.6 | 1,905.4 | 13,972.5 |
| Analyze | 301,235 | 24,017.8 | 17,745.9 | 13,485.9 |
| Chunk | 30,179 | 2,790.4 | 2,557.5 | 11,135.8 |
| Codegen | 42,153 | 5,070.1 | 3,822.5 | 10,700.9 |
| Emit | 75 | 62.3 | 29.7 | 8,986.7 |
| Other | 12,528 | 1,930.8 | 1,670.0 | 14,090.0 |

Workload Distribution by Diagnostic Tier

| Category | Spans | Inclusive (ms) | % Work | Self-Time (ms) | % Self |
| --- | --- | --- | --- | --- | --- |
| P0: Scheduling & Resolution | 435,065 | 30,321.8 | 84.3% | 22,866.5 | 63.5% |
| P1: I/O & Heavy Tasks | 3,262 | 162.5 | 0.5% | 129.9 | 0.4% |
| P2: Architecture (Locks/Memory) | 0 | 0.0 | 0.0% | 0.0 | 0.0% |
| P3: Asset Pipeline | 83,143 | 10,118.8 | 28.1% | 8,299.1 | 23.1% |
| P4: Bridge/Interop | 0 | 0.0 | 0.0% | 0.0 | 0.0% |
| Other | 1,786 | 530.0 | 1.5% | 511.2 | 1.4% |

Top 20 Tasks by Self-Time

Self-time is the exclusive duration: time spent in the task itself, not in sub-tasks.
This is the most accurate indicator of where CPU cycles are actually spent.

| Self (ms) | Inclusive (ms) | Count | Avg Self (us) | P95 Self (ms) | Max Self (ms) | % Work | Task Name | Top Caller |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10,385.7 | 11,996.8 | 170,098 | 61.1 | 0.1 | 60.3 | 28.9% | module | write all entrypoints to disk (1%) |
| 3,643.0 | 4,991.7 | 39,382 | 92.5 | 0.2 | 263.5 | 10.1% | analyze ecmascript module | process module (76%) |
| 2,248.1 | 5,522.6 | 72,296 | 31.1 | 0.1 | 15.4 | 6.2% | process module | module (14%) |
| 2,188.1 | 2,375.6 | 62,603 | 35.0 | 0.0 | 20.1 | 6.1% | internal resolving | resolving (30%) |
| 1,877.6 | 2,631.5 | 61,677 | 30.4 | 0.0 | 16.8 | 5.2% | resolving | module (35%) |
| 1,791.3 | 3,038.9 | 23,287 | 76.9 | 0.3 | 44.4 | 5.0% | code generation | chunking (8%) |
| 1,716.0 | 1,716.0 | 16,784 | 102.2 | 0.4 | 36.5 | 4.8% | precompute code generation | code generation (44%) |
| 1,701.0 | 1,898.2 | 16,489 | 103.2 | 0.3 | 107.4 | 4.7% | chunking | write all entrypoints to disk (0%) |
| 1,559.4 | 1,640.0 | 8,783 | 177.6 | 0.5 | 68.6 | 4.3% | parse ecmascript | analyze ecmascript module (27%) |
| 1,282.4 | 1,282.4 | 16,223 | 79.0 | 0.3 | 133.7 | 3.6% | compute async module info | chunking (0%) |
| 1,099.8 | 1,336.1 | 9,929 | 110.8 | 0.0 | 287.8 | 3.1% | write all entrypoints to disk | None (0%) |
| 826.8 | 827.5 | 13,520 | 61.2 | 0.1 | 73.6 | 2.3% | compute async chunks | compute async chunks (0%) |
| 480.5 | 491.7 | 1,560 | 308.0 | 1.0 | 18.3 | 1.3% | webpack loader | parse css (10%) |
| 315.2 | 315.2 | 2,082 | 151.4 | 0.4 | 17.2 | 0.9% | generate source map | code generation (96%) |
| 255.8 | 514.4 | 836 | 306.0 | 1.5 | 22.6 | 0.7% | parse css | module (5%) |
| 104.0 | 104.0 | 1,192 | 87.2 | 0.0 | 28.3 | 0.3% | compute binding usage info | write all entrypoints to disk (0%) |
| 87.4 | 87.4 | 2,504 | 34.9 | 0.0 | 7.8 | 0.2% | read file | parse ecmascript (85%) |
| 64.5 | 64.5 | 2,017 | 32.0 | 0.0 | 20.9 | 0.2% | collect mergeable modules | compute merged modules (0%) |
| 59.0 | 64.8 | 813 | 72.6 | 0.1 | 6.4 | 0.2% | async reference | write all entrypoints to disk (1%) |
| 29.6 | 64.7 | 170 | 174.1 | 0.4 | 16.2 | 0.1% | make production chunks | chunking (2%) |

Critical Path Analysis

The longest sequential dependency chains that determine wall-clock time.
Focus on reducing the depth of these chains to improve parallelism.

| Rank | Self-Time (ms) | Depth | Path |
| --- | --- | --- | --- |
| 1 | 263.6 | 2 | process module → analyze ecmascript module |
| 2 | 129.6 | 2 | process module → analyze ecmascript module |
| 3 | 68.7 | 2 | analyze ecmascript module → parse ecmascript |
| 4 | 61.6 | 2 | code generation → generate source map |
| 5 | 47.1 | 2 | analyze ecmascript module → parse ecmascript |

Batching Candidates

High-volume tasks dominated by a single parent. If the parent can batch them,
it drastically reduces scheduler overhead.

| Task Name | Count | Top Caller (Attribution) | Avg Self | P95 Self | Total Self |
| --- | --- | --- | --- | --- | --- |
| analyze ecmascript module | 39,382 | process module (76%) | 92.5 us | 0.16 ms | 3,643.0 ms |
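The batching idea can be sketched in plain Rust. The `scheduled_calls` counter below is a stand-in for turbo_tasks' per-function scheduling cost; all names are illustrative, not the real @utoo/pack or turbo_tasks API:

```rust
// Per-item: each module analysis is its own scheduled task, paying
// scheduler overhead once per module (~39k times in this report).
fn analyze_each(modules: &[u32], scheduled_calls: &mut u32) -> Vec<u32> {
    modules
        .iter()
        .map(|m| {
            *scheduled_calls += 1; // one scheduler round-trip per module
            m * 2 // stand-in for the actual analysis work
        })
        .collect()
}

// Batched: the parent hands the whole slice to a single task,
// paying scheduler overhead exactly once.
fn analyze_batch(modules: &[u32], scheduled_calls: &mut u32) -> Vec<u32> {
    *scheduled_calls += 1; // a single scheduler round-trip
    modules.iter().map(|m| m * 2).collect()
}

fn main() {
    let modules: Vec<u32> = (0..1000).collect();

    let mut per_item_calls = 0;
    let a = analyze_each(&modules, &mut per_item_calls);

    let mut batched_calls = 0;
    let b = analyze_batch(&modules, &mut batched_calls);

    assert_eq!(a, b); // same results either way
    println!("per-item: {per_item_calls} scheduled calls"); // 1000
    println!("batched:  {batched_calls} scheduled call");   // 1
}
```

The trade-off is granularity: batching cuts scheduler overhead but coarsens incremental invalidation, so it pays off mainly for high-volume tasks dominated by a single parent, like the candidate above.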

Duration Distribution

| Range | Count | Percentage |
| --- | --- | --- |
| <10us | 1,085,150 | 67.5% |
| 10us-100us | 491,159 | 30.5% |
| 100us-1ms | 25,476 | 1.6% |
| 1ms-10ms | 6,341 | 0.4% |
| 10ms-100ms | 271 | 0.0% |
| >100ms | 9 | 0.0% |

Action Items

  1. [P0] Focus on tasks with the highest Self-Time — these are where CPU cycles are actually spent.
  2. [P0] Use Batching Candidates to identify callers that should use try_join or reduce #[turbo_tasks::function] granularity.
  3. [P1] Check Build Phase Timeline for phases with disproportionate wall range vs. self-time (= serialization).
  4. [P1] Inspect P95 Self (ms) for heavy monolith tasks. Focus on long-tail outliers, not averages.
  5. [P1] Review Critical Paths — reducing the longest chain depth directly improves wall-clock time.
  6. [P2] If Thread Utilization < 60%, investigate scheduling gaps (lock contention or deep dependency chains).

Report generated by Utoopack Performance Analysis Agent

@fireairforce fireairforce enabled auto-merge (squash) April 19, 2026 11:32
@fireairforce fireairforce merged commit 81c19cb into next Apr 19, 2026
33 checks passed
@fireairforce fireairforce deleted the perf-log-output branch April 19, 2026 11:42