Requirement Statement
ID: REQ-NF-PERF-NDIS-001
Type: Non-Functional Requirement
Priority: Critical
Phase: Phase 02 - Requirements Analysis & Specification
The Intel AVB Filter Driver shall achieve packet forwarding latency <1µs (microsecond) in the NDIS filter fast path to meet AVB/TSN timing requirements for Class A traffic (125µs end-to-end latency budget).
Traceability
Traces to: #31 (StR-NDIS-FILTER-001: NDIS Filter Driver Implementation)
Architecture Decisions
- Refined by: #121 (ADR-PERF-001: NDIS Fast Path Optimization)
Quality Scenarios
Test Cases
Rationale
Problem: AVB Class A traffic requires <125µs end-to-end latency (2ms observation window). Filter driver adds latency to packet path.
IEEE 802.1BA Budget:
- NIC ingress: 20µs
- NDIS filter processing: <1µs ← This requirement
- Stack processing: 50µs
- NIC egress: 20µs
- Network transit: 34µs
- Total: ~125µs
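The budget arithmetic above can be checked mechanically. The following is a minimal sketch in portable C; the table simply restates the figures from this section, and the names `kBudget`, `budget_total_ns`, and `filter_share_permille` are hypothetical helpers, not part of the driver.

```c
#include <assert.h>
#include <stddef.h>

/* One stage of the latency budget, in nanoseconds. */
struct budget_item { const char *stage; unsigned ns; };

/* Figures copied from the budget list in this section. */
static const struct budget_item kBudget[] = {
    {"NIC ingress",            20000},
    {"NDIS filter processing",  1000},  /* this requirement */
    {"Stack processing",       50000},
    {"NIC egress",             20000},
    {"Network transit",        34000},
};

/* Sum of all stages in nanoseconds. */
unsigned budget_total_ns(void) {
    unsigned total = 0;
    for (size_t i = 0; i < sizeof kBudget / sizeof kBudget[0]; i++)
        total += kBudget[i].ns;
    return total;
}

/* Share of the total consumed by the filter, in parts per thousand. */
unsigned filter_share_permille(void) {
    return kBudget[1].ns * 1000u / budget_total_ns();
}
```

Under these figures the filter is allowed under 1% of the total, which is why small per-packet costs matter in the sections that follow.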
Failure Impact: If filter exceeds 1µs, Class A traffic violates latency guarantees → audio/video quality degradation.
Detailed Requirements
PERF-NDIS-001.1: Fast Path Bypass for AVB Traffic
Filter shall detect AVB packets in FilterReceiveNetBufferLists and forward without deep inspection:
Fast Path Criteria:
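BOOLEAN IsAvbFastPath(PNET_BUFFER_LIST Nbl) {
// Pseudocode: EtherType, VlanPresent and Pcp are parsed from the frame header / 802.1Q data of Nbl
// Check EtherType == 0x22F0 (AVTP, Audio Video Transport Protocol, IEEE 1722)
if (EtherType == 0x22F0) return TRUE;
// Check VLAN PCP == 6 or 7 (internetwork control / network control)
if (VlanPresent && (Pcp == 6 || Pcp == 7)) return TRUE;
return FALSE;
}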
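The criteria above can be exercised outside the kernel. The sketch below is a user-mode illustration in portable C that classifies a raw Ethernet frame. It assumes the 802.1Q tag is still present in the frame bytes; on Windows the NIC often strips the tag and reports it out of band, in which case the PCP would come from the NBL's OOB data instead. `is_avb_frame` and `read_be16` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_VLAN 0x8100u  /* 802.1Q tag protocol identifier */
#define ETHERTYPE_AVTP 0x22F0u  /* IEEE 1722 AVTP */

/* Read a 16-bit big-endian (network byte order) value. */
static uint16_t read_be16(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 1 if the frame matches the fast path criteria, else 0. */
int is_avb_frame(const uint8_t *frame, size_t len) {
    if (frame == NULL || len < 14) return 0;   /* too short for an Ethernet header */
    uint16_t type = read_be16(frame + 12);
    if (type == ETHERTYPE_VLAN) {
        if (len < 18) return 0;                /* truncated 802.1Q tag */
        uint16_t tci = read_be16(frame + 14);
        unsigned pcp = tci >> 13;              /* PCP is the top 3 bits of the TCI */
        if (pcp == 6 || pcp == 7) return 1;
        type = read_be16(frame + 16);          /* EtherType after the tag */
    }
    return type == ETHERTYPE_AVTP;
}
```

Reading the EtherType byte by byte avoids any dependence on host byte order, which matters because the field is big-endian on the wire.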
Fast Path Actions:
- Tag NBL with AVB marker (OOB data)
- Forward immediately to next driver
- No packet inspection beyond the header check above
- No buffer copying
- No IOCTL synchronization
Slow Path (non-AVB):
- Full packet inspection allowed
- Statistics collection
- Logging/tracing
- IOCTL operations
Latency Target: Fast path <500ns (allows 500ns margin for NDIS overhead)
PERF-NDIS-001.2: Zero-Copy Packet Forwarding
Filter shall use NDIS zero-copy APIs (no packet buffer allocation):
Receive Path:
VOID FilterReceiveNetBufferLists(
NDIS_HANDLE FilterModuleContext,
PNET_BUFFER_LIST NetBufferLists,
NDIS_PORT_NUMBER PortNumber,
ULONG NumberOfNetBufferLists,
ULONG ReceiveFlags
) {
PFILTER_ADAPTER_CONTEXT ctx = (PFILTER_ADAPTER_CONTEXT)FilterModuleContext;
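// Fast path: Forward original NBLs without cloning.
// Note: only the first NBL is classified; a chain mixing AVB and non-AVB NBLs must be split before forwarding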
if (IsAvbFastPath(NetBufferLists)) {
TagAvbTraffic(NetBufferLists); // <100ns: Set OOB flag
NdisFIndicateReceiveNetBufferLists(
ctx->FilterHandle,
NetBufferLists, // Original NBLs (no clone)
PortNumber,
NumberOfNetBufferLists,
ReceiveFlags
);
return;
}
// Slow path: Inspect/modify if needed
// ...
}
Send Path:
VOID FilterSendNetBufferLists(
NDIS_HANDLE FilterModuleContext,
PNET_BUFFER_LIST NetBufferLists,
NDIS_PORT_NUMBER PortNumber,
ULONG SendFlags
) {
PFILTER_ADAPTER_CONTEXT ctx = (PFILTER_ADAPTER_CONTEXT)FilterModuleContext;
// Fast path: Forward without modification
if (IsAvbFastPath(NetBufferLists)) {
NdisFSendNetBufferLists(
ctx->FilterHandle,
NetBufferLists, // Original NBLs
PortNumber,
SendFlags
);
return;
}
// Slow path: Inspect/modify if needed
// ...
}
Latency Impact:
- Zero-copy: <100ns overhead (just function call + flag check)
- Clone NBL: ~2-5µs (unacceptable for AVB)
PERF-NDIS-001.3: Lock-Free Packet Statistics
Filter shall use atomic operations for statistics (no spinlocks in fast path):
Statistics Structure:
typedef struct _FILTER_STATS {
volatile LONG64 RxPackets; // Atomic increment
volatile LONG64 TxPackets;
volatile LONG64 RxAvbPackets; // AVB-specific counters
volatile LONG64 TxAvbPackets;
volatile LONG64 RxBytes; // Atomic add
volatile LONG64 TxBytes;
} FILTER_STATS;
Atomic Update (fast path):
// Good: Lock-free increment (a single LOCK-prefixed instruction on x64, no lock to acquire)
InterlockedIncrement64(&ctx->Stats.RxAvbPackets);
// Bad: Spinlock (50-200ns under contention, plus cache line bouncing on the lock)
KeAcquireSpinLockAtDpcLevel(&ctx->StatsLock);
ctx->Stats.RxAvbPackets++;
KeReleaseSpinLockFromDpcLevel(&ctx->StatsLock);
Latency Impact:
- Atomic operations: <10ns
- Spinlock acquisition: 50-200ns (unacceptable in fast path)
Query Statistics (slow path IOCTL):
// Read atomic counters (no locks needed)
Stats->RxPackets = InterlockedCompareExchange64(&ctx->Stats.RxPackets, 0, 0); // Atomic read
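The same lock-free pattern can be tried in user mode with C11 atomics, which stand in here for the `Interlocked*` kernel routines. This is a minimal sketch under that assumption; `stats_t`, `stats_on_rx_avb`, and `stats_read` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Two of the counters from FILTER_STATS, as C11 atomics. */
typedef struct {
    _Atomic int64_t rx_avb_packets;
    _Atomic int64_t rx_bytes;
} stats_t;

/* Fast path: one atomic add per counter, no lock taken. */
static inline void stats_on_rx_avb(stats_t *s, int64_t frame_bytes) {
    atomic_fetch_add_explicit(&s->rx_avb_packets, 1, memory_order_relaxed);
    atomic_fetch_add_explicit(&s->rx_bytes, frame_bytes, memory_order_relaxed);
}

/* Slow path: atomic read of a single counter. */
static inline int64_t stats_read(_Atomic int64_t *counter) {
    return atomic_load_explicit(counter, memory_order_relaxed);
}
```

Each counter is read atomically, but a snapshot of several counters is not mutually consistent: a packet may be counted in `rx_avb_packets` before its bytes appear in `rx_bytes`. For monitoring statistics that is usually acceptable.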
PERF-NDIS-001.4: CPU Cache Optimization
Filter shall align critical data structures to cache line boundaries:
Cache-Aligned Context:
typedef struct DECLSPEC_CACHEALIGN _FILTER_ADAPTER_CONTEXT {
// Hot path, read-mostly: first cache line (64 bytes)
NDIS_HANDLE FilterHandle; // Offset 0
PHARDWARE_OPS HwOps; // Offset 8
PVOID HwContext; // Offset 16
// Hot path, written per packet: own cache line.
// FILTER_STATS is 6 x 8 = 48 bytes; at offset 24 it would end at 72 and straddle two lines
DECLSPEC_CACHEALIGN FILTER_STATS Stats; // Offset 64
// Cold path: separate cache lines
DECLSPEC_CACHEALIGN NDIS_SPIN_LOCK Lock; // Offset 128
DECLSPEC_CACHEALIGN DEVICE_CONTEXT DeviceContext; // Offset 192
// ...
} FILTER_ADAPTER_CONTEXT;
Compiler Directive:
// Already provided by the WDK headers; shown here for reference
#define DECLSPEC_CACHEALIGN __declspec(align(64)) // x64 cache line size
Latency Impact:
- Cache-aligned: <5ns access (L1 cache hit)
- Unaligned: 50-200ns (cache line split, false sharing)
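Layout claims like the ones in this section can be verified at compile time rather than by comment. The sketch below uses C11 `alignas` and `static_assert` as a portable stand-in for `DECLSPEC_CACHEALIGN` and `C_ASSERT`; the type names are hypothetical, and the layout shown (counters on their own line) is one possible arrangement, not necessarily the driver's.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64

/* Six 64-bit counters: 6 x 8 = 48 bytes. */
typedef struct {
    int64_t rx_packets, tx_packets, rx_avb_packets, tx_avb_packets, rx_bytes, tx_bytes;
} stats_t;

typedef struct {
    alignas(CACHE_LINE) void *filter_handle;  /* read-mostly hot fields share line 0 */
    void *hw_ops;
    void *hw_context;
    alignas(CACHE_LINE) stats_t stats;        /* per-packet writes get their own line */
    alignas(CACHE_LINE) int lock;             /* cold data starts on a third line */
} adapter_ctx_t;

static_assert(sizeof(stats_t) == 48, "six 8-byte counters");
static_assert(24 + sizeof(stats_t) > CACHE_LINE, "stats at offset 24 would cross a line");
static_assert(offsetof(adapter_ctx_t, stats) == CACHE_LINE, "stats starts line 1");
static_assert(offsetof(adapter_ctx_t, lock) == 2 * CACHE_LINE, "lock starts line 2");
```

Keeping the written counters away from the read-mostly pointers means a counter update on one CPU does not invalidate the line other CPUs use to fetch `filter_handle`.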
PERF-NDIS-001.5: Inline Critical Functions
Filter shall inline fast path functions to reduce call overhead:
Inline Directives:
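__forceinline BOOLEAN IsAvbFastPath(PNET_BUFFER_LIST Nbl) {
// Map the 14-byte Ethernet header of the first NET_BUFFER; NULL means it is not contiguous
PUCHAR frame = (PUCHAR)NdisGetDataBuffer(NET_BUFFER_LIST_FIRST_NB(Nbl), 14, NULL, 1, 0);
if (frame == NULL) return FALSE; // Let the slow path handle it
// EtherType is big-endian on the wire (bytes 12-13); assemble it explicitly
USHORT etherType = (USHORT)((frame[12] << 8) | frame[13]);
return (etherType == 0x22F0); // Single comparison, no function call
}
__forceinline VOID TagAvbTraffic(PNET_BUFFER_LIST Nbl) {
// Note: NDIS also uses NetBufferListFilteringInfo for VMQ filtering data; confirm the slot is free on the target configuration
NET_BUFFER_LIST_INFO(Nbl, NetBufferListFilteringInfo) = (PVOID)AVB_MARKER;
}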
Compiler Optimization:
<PropertyGroup>
<WholeProgramOptimization>true</WholeProgramOptimization>
</PropertyGroup>
<ItemDefinitionGroup>
<ClCompile>
<Optimization>MaxSpeed</Optimization>
<InlineFunctionExpansion>AnySuitable</InlineFunctionExpansion>
<FavorSizeOrSpeed>Speed</FavorSizeOrSpeed>
</ClCompile>
<Link>
<LinkTimeCodeGeneration>UseLinkTimeCodeGeneration</LinkTimeCodeGeneration>
</Link>
</ItemDefinitionGroup>
Latency Impact:
- Inline: 0ns (code embedded in caller)
- Function call: 5-20ns (stack frame, return address, branch prediction)
PERF-NDIS-001.6: Prefetch Packet Headers
Filter shall prefetch packet headers to reduce cache misses:
Prefetch Directive:
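VOID FilterReceiveNetBufferLists(...) {
PNET_BUFFER_LIST nbl = NetBufferLists;
while (nbl) {
PNET_BUFFER_LIST next = NET_BUFFER_LIST_NEXT_NBL(nbl);
// Prefetch the next NBL's header while the current one is processed;
// prefetching data that is read immediately afterwards hides no latency
if (next) {
// Request only the 14-byte Ethernet header: asking for 64 bytes returns NULL for minimum-size frames
PVOID nextData = NdisGetDataBuffer(NET_BUFFER_LIST_FIRST_NB(next), 14, NULL, 1, 0);
if (nextData) _mm_prefetch((const char*)nextData, _MM_HINT_T0); // Prefetch to L1 cache
}
// Process current packet
if (IsAvbFastPath(nbl)) {
// ...
}
nbl = next;
}
}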
Latency Impact:
- With prefetch: <10ns header access (L1 cache)
- Without prefetch: 50-200ns (L3 cache or RAM)
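The prefetch-ahead pattern can be illustrated in user mode with a plain linked list standing in for an NBL chain. This is a sketch under stated assumptions: `packet_t` and `count_avtp` are hypothetical, `__builtin_prefetch` is the GCC/Clang counterpart of `_mm_prefetch`, and on other compilers the macro degrades to a no-op.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(p) __builtin_prefetch((p), 0, 3)  /* read access, keep in all cache levels */
#else
#define PREFETCH(p) ((void)(p))                    /* no-op fallback */
#endif

/* Stand-in for one NBL: a link and a pointer to at least 14 header bytes. */
typedef struct packet {
    struct packet *next;
    const uint8_t *header;
} packet_t;

/* Count AVTP frames, prefetching the next header while checking the current one. */
size_t count_avtp(const packet_t *head) {
    size_t n = 0;
    for (const packet_t *p = head; p != NULL; p = p->next) {
        if (p->next != NULL) PREFETCH(p->next->header);
        if (p->header[12] == 0x22 && p->header[13] == 0xF0) n++;
    }
    return n;
}
```

The prefetch only pays off when there is work to overlap with the memory fetch, so the benefit depends on how much processing each packet receives; it should be confirmed by measurement rather than assumed.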
PERF-NDIS-001.7: DPC Processing for Receive
Filter shall process receive packets at the IRQL at which NDIS delivers them (normally DISPATCH_LEVEL, in DPC context), with no IRQL transitions:
NDIS Receive Callback (runs at IRQL <= DISPATCH_LEVEL):
VOID FilterReceiveNetBufferLists(
NDIS_HANDLE FilterModuleContext,
PNET_BUFFER_LIST NetBufferLists,
NDIS_PORT_NUMBER PortNumber,
ULONG NumberOfNetBufferLists,
ULONG ReceiveFlags
) {
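// Normally called at DISPATCH_LEVEL (DPC context), but NDIS only guarantees IRQL <= DISPATCH_LEVEL;
// NDIS_RECEIVE_FLAGS_DISPATCH_LEVEL is set in ReceiveFlags when the caller is at DISPATCH_LEVEL
ASSERT(KeGetCurrentIrql() <= DISPATCH_LEVEL);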
// Fast path processing
// ...
}
Latency Impact:
- DISPATCH_LEVEL: No IRQL transition overhead
- PASSIVE → DISPATCH: 200-500ns per transition
PERF-NDIS-001.8: Avoid Memory Allocation in Fast Path
Filter shall pre-allocate all resources during initialization:
Pre-Allocated Pools:
typedef struct _FILTER_ADAPTER_CONTEXT {
NDIS_HANDLE NblPoolHandle; // Pre-allocated NBL pool (slow path only)
NDIS_HANDLE NbPoolHandle; // Pre-allocated NB pool
NDIS_HANDLE BufferPool; // Pre-allocated buffer pool
} FILTER_ADAPTER_CONTEXT;
Initialization:
NTSTATUS AllocateFilterPools(PFILTER_ADAPTER_CONTEXT ctx) {
NET_BUFFER_LIST_POOL_PARAMETERS nblParams = {0};
nblParams.Header.Type = NDIS_OBJECT_TYPE_DEFAULT;
nblParams.Header.Revision = NET_BUFFER_LIST_POOL_PARAMETERS_REVISION_1;
nblParams.Header.Size = NDIS_SIZEOF_NET_BUFFER_LIST_POOL_PARAMETERS_REVISION_1;
nblParams.ProtocolId = NDIS_PROTOCOL_ID_DEFAULT;
nblParams.ContextSize = 0;
nblParams.fAllocateNetBuffer = FALSE;
nblParams.PoolTag = 'LBVA'; // 'AVBL'
ctx->NblPoolHandle = NdisAllocateNetBufferListPool(ctx->FilterHandle, &nblParams);
if (!ctx->NblPoolHandle) return STATUS_INSUFFICIENT_RESOURCES;
// Pre-allocate 64 NBLs (reused from pool)
// ...
return STATUS_SUCCESS;
}
When an NBL is needed (slow path only; the fast path allocates nothing):
// Good: Reuse from pre-allocated pool (200-500ns)
PNET_BUFFER_LIST clone = NdisAllocateNetBufferList(ctx->NblPoolHandle, ...);
// Bad: Allocate dynamically (2-10µs, unacceptable)
PNET_BUFFER_LIST raw = ExAllocatePoolWithTag(NonPagedPool, sizeof(NET_BUFFER_LIST), 'LBVA');
Latency Impact:
- Pool allocation: 200-500ns
- Dynamic allocation: 2-10µs (10-20x slower)
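The reason a pre-allocated pool is cheap is that getting and returning a buffer reduces to a couple of pointer updates. The sketch below shows the idea with a fixed array and an intrusive free list in portable C. It is an illustration only: the NDIS pool APIs manage their own storage, the names (`pool_t`, `pool_get`, `pool_put`) and sizes are hypothetical, and this version is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 64    /* assumed pool depth */
#define SLOT_BYTES 256   /* assumed buffer size */

/* A free slot stores the link to the next free slot inside its own storage. */
typedef union slot {
    union slot *next_free;
    unsigned char bytes[SLOT_BYTES];
} slot_t;

typedef struct {
    slot_t slots[POOL_SLOTS];  /* all memory reserved up front */
    slot_t *free_list;
} pool_t;

/* Initialization time: thread every slot onto the free list. */
void pool_init(pool_t *p) {
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        p->slots[i].next_free = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* O(1) acquire; returns NULL when the pool is exhausted. */
void *pool_get(pool_t *p) {
    slot_t *s = p->free_list;
    if (s != NULL) p->free_list = s->next_free;
    return s;
}

/* O(1) release of a buffer previously returned by pool_get. */
void pool_put(pool_t *p, void *buf) {
    slot_t *s = (slot_t *)buf;
    s->next_free = p->free_list;
    p->free_list = s;
}
```

Exhaustion is reported as `NULL` instead of falling back to the general allocator, so the worst-case cost stays bounded.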
PERF-NDIS-001.9: Minimize Conditional Branches
Filter shall optimize branch prediction for fast path:
Branch Optimization:
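// Compiler hint for branch prediction. __builtin_expect exists only in GCC/Clang;
// MSVC (the WDK compiler) has no equivalent, so the macros fall back to the plain expression there
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#define LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define UNLIKELY(x) (x)
#define LIKELY(x) (x)
#endif
// Good: Predict AVB traffic is rare (fall-through)
if (UNLIKELY(IsAvbFastPath(nbl))) { // Hint: unlikely branch
FastPathForward(nbl);
return;
}
// Default path: Non-AVB processing
// ...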
Latency Impact:
- Correct prediction: <1ns (pipelined)
- Misprediction: 10-20ns (pipeline flush)
Error Scenarios
ES-PERF-NDIS-001: Fast Path Latency Exceeded
Condition: Packet forwarding takes >1µs (measured via TSC)
NTSTATUS: STATUS_TIMEOUT (0x00000102)
Recovery: Log event; switch to slow path for diagnostics
User Impact: AVB latency budget violated → audio/video glitches
Prevention: Performance profiling during development
Event ID: 17101 (Warning: Fast path latency exceeded)
Test: Timestamp packet entry/exit; verify <1µs
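For the TSC-based check in ES-PERF-NDIS-001, tick deltas have to be converted to nanoseconds before comparing against the 1µs limit. A minimal sketch of that conversion follows; it assumes an invariant TSC whose frequency `tsc_hz` is known, nonzero, and below roughly 18 GHz (so the intermediate product fits in 64 bits). `ticks_to_ns` and `exceeds_budget` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a TSC tick count to nanoseconds without overflowing 64 bits. */
uint64_t ticks_to_ns(uint64_t ticks, uint64_t tsc_hz) {
    const uint64_t NS_PER_S = 1000000000ull;
    uint64_t whole_seconds = ticks / tsc_hz;
    uint64_t remainder = ticks % tsc_hz;       /* always < tsc_hz */
    return whole_seconds * NS_PER_S + (remainder * NS_PER_S) / tsc_hz;
}

/* Returns 1 if the interval [start, end] took longer than budget_ns. */
int exceeds_budget(uint64_t start, uint64_t end, uint64_t tsc_hz, uint64_t budget_ns) {
    return ticks_to_ns(end - start, tsc_hz) > budget_ns;
}
```

At a 3 GHz TSC, the 1µs limit corresponds to 3000 ticks, so the measurement overhead itself (two timestamp reads) is a visible fraction of the budget and should be subtracted or calibrated.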
ES-PERF-NDIS-002: Memory Allocation in Fast Path
Condition: Fast path attempts dynamic allocation (detected via verifier)
NTSTATUS: STATUS_UNSUCCESSFUL (0xC0000001)
Recovery: Use pre-allocated pool instead
User Impact: Latency spike → potential packet drop
Prevention: Driver Verifier with low resource simulation
Event ID: 17102 (Error: Unexpected allocation in fast path)
Test: Enable Driver Verifier; verify no allocations during receive
ES-PERF-NDIS-003: Spinlock Contention in Fast Path
Condition: Multiple CPUs contend for statistics lock
NTSTATUS: N/A (performance degradation)
Recovery: Replace spinlock with atomic operations
User Impact: Latency increases 50-200ns per packet
Prevention: Lock-free atomic counters
Event ID: 17103 (Warning: Spinlock contention detected)
Test: Multi-core stress test; monitor lock wait time
ES-PERF-NDIS-004: Cache Line False Sharing
Condition: Multiple CPUs modify adjacent variables (same cache line)
NTSTATUS: N/A (performance degradation)
Recovery: Align hot variables to separate cache lines
User Impact: Latency increases 50-200ns (cache coherency traffic)
Prevention: DECLSPEC_CACHEALIGN on hot structures
Event ID: 17104 (Warning: Cache line contention detected)
Test: CPU cache profiler (Intel VTune); check false sharing
ES-PERF-NDIS-005: Packet Cloning in Fast Path
Condition: Fast path incorrectly clones NBLs instead of forwarding
NTSTATUS: N/A (performance degradation)
Recovery: Remove cloning; use zero-copy forward
User Impact: Latency increases 2-5µs per packet
Prevention: Code review; static analysis
Event ID: 17105 (Error: Unnecessary NBL cloning detected)
Test: ETW tracing; verify NdisAllocateCloneNetBufferList not called
ES-PERF-NDIS-006: IRQL Transition in Fast Path
Condition: Fast path changes IRQL (KeRaiseIrql/KeLowerIrql)
NTSTATUS: N/A (performance degradation)
Recovery: Keep processing at DISPATCH_LEVEL
User Impact: Latency increases 200-500ns per transition
Prevention: IRQL assertions in fast path
Event ID: 17106 (Error: IRQL transition in fast path)
Test: Instrument KeRaiseIrql; verify not called during receive
ES-PERF-NDIS-007: Branch Misprediction Penalty
Condition: Fast path has unpredictable branches (50/50 split)
NTSTATUS: N/A (performance degradation)
Recovery: Reorganize code for fall-through common case
User Impact: Latency increases 10-20ns per misprediction
Prevention: Profile branch statistics; optimize hot path
Event ID: 17107 (Info: Branch misprediction rate high)
Test: CPU performance counters; measure misprediction rate
ES-PERF-NDIS-008: TLB Miss in Packet Access
Condition: Page translation for the packet buffer is not cached in the TLB (e.g., buffers spread across many pages) → TLB miss
NTSTATUS: N/A (performance degradation)
Recovery: Prefetch packet data; use large pages if possible
User Impact: Latency increases 50-200ns on TLB miss
Prevention: Buffer placement is decided by the miniport and NDIS; the filter driver cannot control it
Event ID: 17108 (Info: TLB miss rate elevated)
Test: CPU performance counters (PMU); measure DTLB misses
ES-PERF-NDIS-009: Excessive Function Call Depth
Condition: Fast path has deep call stack (>5 levels)
NTSTATUS: N/A (performance degradation)
Recovery: Inline critical functions
User Impact: Latency increases 5-20ns per call
Prevention: __forceinline on hot path functions
Event ID: 17109 (Warning: Deep call stack in fast path)
Test: Call stack profiler; verify <3 levels in fast path
ES-PERF-NDIS-010: Cache Pollution from Fast Path Stores
Condition: Fast path writes pollute CPU cache
NTSTATUS: N/A (performance degradation)
Recovery: Use non-temporal stores for large buffers
User Impact: Latency increases 20-100ns (cache eviction)
Prevention: _mm_stream_si64 for non-temporal writes
Event ID: 17110 (Info: Cache pollution detected)
Test: Cache profiler; measure eviction rate
Performance Metrics
PM-PERF-NDIS-001: Fast Path Latency (Target)
Target: <1µs (1000ns) packet forwarding
Measurement: RDTSC timestamp at entry/exit of FilterReceiveNetBufferLists
Threshold: 95th percentile <1µs, 99th percentile <1.5µs
Test: High-rate AVB traffic (10,000 pps); measure per-packet latency
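The percentile thresholds in PM-PERF-NDIS-001 need a defined calculation. The sketch below uses the nearest-rank method on a buffer of per-packet latencies in nanoseconds; the choice of nearest-rank is an assumption (the requirement does not name a method), and `percentile_ns` and `meets_pm001` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort comparator for uint64_t, written to avoid overflow from subtraction. */
static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile. Sorts samples in place; requires n > 0 and 1 <= pct <= 100. */
uint64_t percentile_ns(uint64_t *samples, size_t n, unsigned pct) {
    qsort(samples, n, sizeof samples[0], cmp_u64);
    size_t rank = (n * pct + 99) / 100;   /* ceil(n * pct / 100) */
    return samples[rank - 1];
}

/* PM-PERF-NDIS-001 thresholds: p95 < 1000 ns and p99 < 1500 ns. */
int meets_pm001(uint64_t *samples, size_t n) {
    return percentile_ns(samples, n, 95) < 1000 &&
           percentile_ns(samples, n, 99) < 1500;
}
```

With 100 samples, a single outlier above 1.5µs still passes, because the 99th percentile is the 99th smallest value; two such outliers fail.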
PM-PERF-NDIS-002: Fast Path Throughput
Target: 1 Gbps line rate (AVB traffic)
Measurement: Measure packets/second with 1500-byte frames
Threshold: >80,000 pps (1 Gbps / 12,000 bits per packet)
Test: Packet generator; saturate filter with AVB traffic
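The 80,000 pps threshold follows from dividing the link rate by the bits per frame. The sketch below reproduces that arithmetic and also shows the figure once per-frame wire overhead is included. The 38-byte overhead (14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap) assumes the 1500 bytes are payload on an untagged frame; `frames_per_second` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Header 14 + FCS 4 + preamble/SFD 8 + inter-frame gap 12, untagged frame. */
#define ETH_WIRE_OVERHEAD_BYTES 38u

/* Whole frames per second that fit on a link of link_bps bits per second. */
uint64_t frames_per_second(uint64_t link_bps, uint32_t bytes_per_frame) {
    return link_bps / ((uint64_t)bytes_per_frame * 8u);
}
```

For 1 Gbps, `frames_per_second(1000000000, 1500)` gives 83,333 and `frames_per_second(1000000000, 1500 + ETH_WIRE_OVERHEAD_BYTES)` gives 81,274, so the >80,000 pps threshold sits just under the achievable rate in both readings.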
PM-PERF-NDIS-003: CPU Utilization
Target: <5% CPU per 1 Gbps AVB traffic
Measurement: Windows Performance Monitor (% Processor Time)
Threshold: <5% CPU on 4-core system
Test: Sustained 1 Gbps AVB traffic; measure driver CPU time
PM-PERF-NDIS-004: Cache Miss Rate
Target: <1% L1 cache miss rate in fast path
Measurement: CPU performance counters (PEBS/PMU)
Threshold: <1% L1 miss rate
Test: Intel VTune cache analysis during packet processing
PM-PERF-NDIS-005: Branch Misprediction Rate
Target: <0.5% branch mispredictions in fast path
Measurement: CPU performance counters (branch-misses event)
Threshold: <0.5% of all branches
Test: perf stat -e branch-misses (Linux) or VTune (Windows)
PM-PERF-NDIS-006: Memory Allocations
Target: Zero allocations in fast path
Measurement: Driver Verifier pool allocation tracking
Threshold: 0 allocations during packet forwarding
Test: Driver Verifier + ETW tracing; verify no ExAllocatePool calls
PM-PERF-NDIS-007: Spinlock Wait Time
Target: 0ns (no spinlocks in fast path)
Measurement: ETW kernel tracing (SpinlockAcquire events)
Threshold: 0 spinlock acquisitions in fast path
Test: Concurrency stress test; verify no spinlock events
PM-PERF-NDIS-008: Atomic Operation Latency
Target: <10ns per InterlockedIncrement64
Measurement: RDTSC micro-benchmark
Threshold: <10ns average
Test: Tight loop of InterlockedIncrement64; measure overhead
PM-PERF-NDIS-009: Packet Drop Rate
Target: 0% packet drops under load
Measurement: NDIS statistics (IfOutDiscards, IfInDiscards)
Threshold: 0 drops at 1 Gbps sustained
Test: 24-hour stress test at line rate; verify no drops
Acceptance Criteria (Gherkin Format)
Feature: Packet Forwarding Performance <1µs
As an AVB application
I need minimal filter latency
So that Class A traffic meets <125µs end-to-end latency
Scenario: Fast path latency under 1µs
Given AVB packet stream at 10,000 pps
When filter receives packets
Then 95% of packets forwarded in <1µs
And 99% of packets forwarded in <1.5µs
Scenario: Zero-copy forwarding
Given AVB packet with EtherType 0x22F0
When filter processes packet
Then original NBL forwarded (no clone)
And no memory allocation occurs
And latency <500ns
Scenario: Lock-free statistics
Given 4 CPUs processing packets concurrently
When updating packet counters
Then atomic operations used (no spinlocks)
And no cache line contention
And latency <10ns per counter update
Scenario: Line rate throughput
Given 1 Gbps AVB traffic (80,000 pps)
When sustaining load for 1 hour
Then 0% packet drops
And CPU utilization <5%
And latency remains <1µs
Scenario: Cache-optimized data structures
Given hot path variables in first cache line
When accessing FilterHandle, HwOps, Stats
Then all accesses hit L1 cache (<5ns)
And no cache line splits
And no false sharing between CPUs
Dependencies
Prerequisites:
Effort Estimation
Complexity: High (requires low-level optimization)
Estimated Effort: 40 hours (optimize + profile + test)
Status: Draft
Created: 2025-12-09
Enhanced: 2025-12-10 (Added 10 error scenarios, 9 performance metrics, Event IDs 17101-17110)