Skip to content

Ensure Consistency among Data Accesses in the same Thread Block #551

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jul 3, 2025

Conversation

caiomcbr
Copy link
Contributor

The PR ensures that if two or more operations impact the same memory location, it will insert a mechanism to synchronize the thread block between theses operations, preventing any errors. For instance:

copy chunk 0 -> chunk 1
put chunk 1 -> chunk 2

Here we can see the copy operation and put operation have chunk 1 in common for writing and reading, respectively. In this case, I need to make sure the copy operation ends before starting the put operation. Therefore, we should have a synchronization mechanism between them to avoid any conflicts.

In this PR, we will insert synchronization mechanisms between the operations under the following conditions:

Operation using Write Data Access followed by Operation using Write Data Access to the same memory location
Operation using Read Data Access followed by Operation using Write Data Access to the same memory location
Operation using Write Data Access followed by Operation using Read Data Access to the same memory location

@caiomcbr caiomcbr requested a review from Binyang2014 June 17, 2025 23:23
@Binyang2014 Binyang2014 requested a review from Copilot June 20, 2025 17:11
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a conflict-detection pass to insert synchronization operations between thread‐block operations that read/write the same buffer intervals. It adds data‐access tracking (DataAccess), a buffering‐access analyzer (BuffersAccess), and hooks into the existing pipeline to resolve dependencies after instruction fusion.

  • Introduced DataAccessType and DataAccess in dsl_types and built buffer_access.py to detect interval conflicts
  • Extended every operation with unique IDs and local_data_access methods to annotate reads/writes
  • Added resolve_data_dependency through ThreadBlock, Gpu, and MSCCLPPProgram.post_process_operations

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
python/mscclpp/language/internal/dsl_types.py Added DataAccessType, DataAccess, reordered SyncType
python/mscclpp/language/internal/buffer_access.py New BuffersAccess class to scan and sync operations
python/mscclpp/language/internal/operations.py Gave operations UUIDs and implemented local_data_access
python/mscclpp/language/internal/threadblock.py Imported buffer_access and added resolve_data_dependency
python/mscclpp/language/internal/gpu.py Propagated resolve_data_dependency to all threadblocks
python/mscclpp/language/program.py Called resolve_data_dependency after adding_data_sync
python/mscclpp/language/internal/optmizer.py Included relaxed_signal/relaxed_wait in sync filter
python/mscclpp/language/tests/*.py Updated tests for new sync/DSL types and revised logic
python/mscclpp/language/collectives.py & channel.py Migrated imports to dsl_types
Comments suppressed due to low confidence (1)

python/mscclpp/language/tests/allreduce.py:13

  • [nitpick] The variable num_tb is ambiguous—does it represent the chunk factor, number of thread blocks, or something else? Consider renaming to a more descriptive identifier like chunk_factor or num_thread_blocks.
    num_tb = 8

@caiomcbr caiomcbr merged commit e9dfda6 into feature/dsl Jul 3, 2025
5 checks passed
@caiomcbr caiomcbr deleted the caiorocha/manage_data_access branch July 3, 2025 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants