Skip to content

Conversation

EdmondDantes
Copy link
Contributor

No description provided.

       - Lines 948-960: Replaced heap allocation (zend_vm_stack_new_page) with stack allocation
       - Line 949: Allocated VM stack memory directly on the C stack using char vm_stack_memory[ZEND_FIBER_VM_STACK_SIZE]
       - Lines 950-955: Manually initialized the VM stack structure with proper pointers and boundaries
       - Zero malloc calls: Eliminated the heap allocation for VM stack during coroutine startup

       Memory Optimization Benefits:
       1. Eliminated heap allocation: No more zend_vm_stack_new_page() malloc call
       2. Stack-based allocation: VM stack memory is now allocated on the C stack
       3. Automatic cleanup: Stack memory is automatically freed when function exits
       4. Better performance: Reduced malloc/free overhead and memory fragmentation
       5. Cache locality: Stack allocation provides better memory locality

       Technical Implementation:
       - Used ZEND_FIBER_VM_STACK_SIZE constant (typically 4KB) for stack size
       - Properly initialized VM stack structure with top, end, and prev pointers
       - Maintained compatibility with existing VM stack operations
       - Preserved all existing functionality while optimizing memory allocation
       Changes

       - Added embedded zend_async_waker_t waker field to async_coroutine_t
       - Set coroutine->coroutine.waker = &coroutine->waker during initialization
       - Removed zend_async_waker_new() and zend_async_waker_destroy() calls
       - No API changes - existing code works unchanged

       Benefits

       - Eliminates malloc/free for waker allocation (~16 bytes overhead saved)
       - Better cache locality and performance
       - Reduces memory fragmentation

       Trade-off

       - Increases coroutine size by ~120 bytes
       - Worthwhile for frequent waker usage patterns
         - Added zend_async_waker_t waker; field to struct _async_coroutine_s
         - Added descriptive comment explaining the embedded approach
       2. Updated initialization (coroutine.c:1123):
         - Set coroutine->coroutine.waker = &coroutine->waker; to point to embedded waker
       3. Removed dynamic allocation:
         - Eliminated calls to zend_async_waker_new() and zend_async_waker_destroy()
         - Added comments explaining that waker is embedded and doesn't need separate destruction
       4. Optimized field access (throughout coroutine.c):
         - Replaced all instances of coroutine->coroutine.waker->field with coroutine->waker.field for better performance
         - Updated in functions like getSuspendLocation(), isQueued(), coroutine_info(), async_coroutine_object_gc(), etc.
…rd declarations

       - Move all PHP METHOD functions to dedicated section at end of file
       - Add forward declarations to resolve function dependencies
       - Maintain embedded waker optimization and direct field access
       - Improve code organization with clear C API vs PHP method separation
…ansferring the exception from the Waker to the current context. Previously, this was handled by the switching code.
* fix for async_coroutine_cancel + zend_async_waker_define
* fix execute_next_coroutine_from_fiber logic for context switching.
…tion; now the code is required to explicitly clear the WAKER state after a SUSPEND() operation.
…leanup caused by the switch_to_scheduler() function.

  ZEND_ASYNC_CURRENT_COROUTINE = NULL; — explicit cleanup of the current coroutine.
add ZVAL_UNDEF(&X) after zval_ptr_dtor(&X)
  The first VM stack segment is allocated on the fiber’s C stack and becomes invalid once fiber_entry returns. Previously, fiber_context_cleanup() could still access it, leading to Invalid read errors.

  Now additional heap-allocated stack pages are freed immediately after return_to_main(transfer), and EG(vm_stack) pointers are reset. The first segment is left untouched and is released automatically with the fiber stack.
  The first VM stack segment is allocated on the fiber’s C stack and becomes invalid once fiber_entry returns. Previously, fiber_context_cleanup() could still access it, leading to Invalid read errors.

  Now additional heap-allocated stack pages are freed immediately after return_to_main(transfer), and EG(vm_stack) pointers are reset. The first segment is left untouched and is released automatically with the fiber stack.
@EdmondDantes EdmondDantes added this to the Stabilization2 milestone Sep 3, 2025
@EdmondDantes EdmondDantes self-assigned this Sep 3, 2025
@EdmondDantes EdmondDantes added the enhancement New feature or request label Sep 3, 2025
@EdmondDantes EdmondDantes linked an issue Sep 3, 2025 that may be closed by this pull request
@EdmondDantes EdmondDantes moved this to Testing/Pull in True Async Board Sep 3, 2025
@EdmondDantes EdmondDantes requested a review from Copilot September 3, 2025 07:08
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements "50 performance optimizations" for the async PHP extension, focusing on fiber context management and coroutine switching performance improvements. The changes introduce a fiber context pooling system to reduce allocation overhead, optimize circular buffer operations with inline functions for hot paths, and restructure the scheduler for better performance.

Key changes include:

  • Fiber context pooling: Implementation of a reusable fiber context pool with ASYNC_FIBER_POOL_SIZE configuration
  • Circular buffer optimizations: Inline specialized versions for pointer operations with bitwise arithmetic
  • Scheduler restructuring: Major refactoring of context switching logic and fiber management

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
scope.c Replace zend_async_waker_destroy calls with zend_async_waker_clean for optimized cleanup
scheduler.h Add fiber context pool configuration and new fiber context creation API
scheduler.c Major refactoring with fiber context pooling, optimized switching logic, and reorganized code structure
run-tests.sh Add -m flag to test runner for memory leak checking
php_async.h Add fiber_context_pool to global state
libuv_reactor.c Optimize reactor execution by skipping uv_run() when no handles are active
internal/circular_buffer.h Add inline optimized functions for pointer operations with specialized push/pop
internal/circular_buffer.c Move circular_buffer_is_not_empty to inline implementation
docs/context-switching-flow.puml Add PlantUML documentation for fiber workflow
coroutine.h Restructure with new async_fiber_context_t and embedded waker design
coroutine.c Complete rewrite with embedded wakers, fiber context management, and reorganized structure
benchmarks/*.php Add comprehensive benchmarking suite for performance testing
async_API.c Add missing zend_async_waker_clean calls after suspend operations
async.c Replace zend_async_waker_destroy with zend_async_waker_clean

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

/* Fast specialized version for pointer push (8 bytes) */
static zend_always_inline zend_result circular_buffer_push_ptr(circular_buffer_t *buffer, void *ptr) {
// Check if buffer is full using bitwise AND (capacity is power of 2)
if (EXPECTED(((buffer->head + 1) & (buffer->capacity - 1)) != buffer->tail)) {
Copy link

Copilot AI Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bitwise AND operation assumes buffer capacity is always a power of 2, but there's no validation to ensure this constraint. If capacity is not a power of 2, the modulo operation will be incorrect and could cause buffer corruption.

Copilot uses AI. Check for mistakes.

if (EXPECTED(buffer->head != buffer->tail)) {
// Direct pointer read - no memcpy overhead
*ptr = *(void**)((char*)buffer->data + buffer->tail * sizeof(void*));
buffer->tail = (buffer->tail + 1) & (buffer->capacity - 1);
Copy link

Copilot AI Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as above - bitwise AND operation for modulo only works correctly when capacity is a power of 2. This could lead to incorrect tail advancement if capacity is not a power of 2.

Suggested change
buffer->tail = (buffer->tail + 1) & (buffer->capacity - 1);
buffer->tail = (buffer->tail + 1) % buffer->capacity;

Copilot uses AI. Check for mistakes.


static zend_always_inline void fiber_pool_init(void)
{
circular_buffer_ctor(&ASYNC_G(fiber_context_pool), ASYNC_FIBER_POOL_SIZE, sizeof(async_fiber_context_t*), NULL);
Copy link

Copilot AI Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ASYNC_FIBER_POOL_SIZE is defined as 4, which is a power of 2, but the circular buffer inline optimizations in circular_buffer.h assume all capacities are powers of 2. If this size is ever changed to a non-power-of-2 value, the bitwise operations will fail.

Copilot uses AI. Check for mistakes.

{
circular_buffer_t *buffer = &ASYNC_G(fiber_context_pool);

if (buffer->capacity > 0 && false == circular_buffer_is_full(buffer)) {
Copy link

Copilot AI Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The comparison false == circular_buffer_is_full(buffer) uses Yoda condition style inconsistently with the rest of the codebase. Consider using !circular_buffer_is_full(buffer) for better readability and consistency.

Suggested change
if (buffer->capacity > 0 && false == circular_buffer_is_full(buffer)) {
if (buffer->capacity > 0 && !circular_buffer_is_full(buffer)) {

Copilot uses AI. Check for mistakes.

Comment on lines +109 to +110
ZEND_ASSERT(ZEND_ASYNC_WAKER_NOT_IN_QUEUE(&coroutine->waker) &&
"Coroutine waker must be dequeued before destruction");
Copy link

Copilot AI Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion checks waker status but the embedded waker is initialized in the constructor. If a coroutine is destroyed before being properly dequeued, this assertion will fail. The assertion should handle the case where the waker was never queued.

Suggested change
ZEND_ASSERT(ZEND_ASYNC_WAKER_NOT_IN_QUEUE(&coroutine->waker) &&
"Coroutine waker must be dequeued before destruction");
ZEND_ASSERT((ZEND_ASYNC_WAKER_NOT_IN_QUEUE(&coroutine->waker) || !coroutine->waker.was_ever_queued) &&
"Coroutine waker must be dequeued before destruction or never queued");

Copilot uses AI. Check for mistakes.

JIT PHP in tracing mode can compile functions on demand. When memory runs out, JIT crashes with an error because it tries to compile a closure. This code attempts to work around the issue so that the test runs correctly.

So we use:
$function = function(bool $out = true) {
    if($out) echo "Shutdown function called\n";
};

$function(false);

register_shutdown_function($function);
@EdmondDantes EdmondDantes merged commit 1d5faf6 into main Sep 4, 2025
1 check passed
@EdmondDantes EdmondDantes deleted the 50-performance-optimizations branch September 4, 2025 14:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Performance optimizations

1 participant