Skip to content

Feature: Interval joins #924

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Jun 19, 2025
Merged

Feature: Interval joins #924

merged 25 commits into from
Jun 19, 2025

Conversation

gwaramadze
Copy link
Contributor

@gwaramadze gwaramadze commented Jun 9, 2025

Implement interval join

This PR introduces a new join_interval() method to StreamingDataFrame for performing stream-to-stream interval joins.

Interval Join

A new interval join feature has been added, allowing records from two streams to be joined based on a user-defined time interval around their timestamps.

Key Features:

  • Symmetric Interval: Configure the join with backward_ms and forward_ms to define a flexible time window for matching records between streams.
  • Join Types: Supports both inner and left joins.
  • Flexible Merging: Offers built-in strategies (raise, keep-left, keep-right) and allows custom functions for merging matched records.
  • Stateful Efficiency: Leverages an enhanced TimestampedStore that can now store and query multiple values per key and timestamp, which is crucial for handling multiple matches within an interval.

Example Usage:

# Join records from `left_sdf` with records from `right_sdf` that
# fall within a window of 1s before to 2s after the left record's timestamp.
joined_sdf = left_sdf.join_interval(
    right_sdf,
    how="inner",
    backward_ms=1000,
    forward_ms=2000,
)

State Store Enhancements

To support this feature, TimestampedStore has been updated with a get_interval() method to retrieve all records within a given time range.

@gwaramadze gwaramadze force-pushed the feature/interval-joins branch from d70e033 to db493c2 Compare June 10, 2025 12:25
@gwaramadze gwaramadze marked this pull request as ready for review June 10, 2025 13:05
@gwaramadze gwaramadze force-pushed the feature/interval-joins branch from db493c2 to 0a92b41 Compare June 12, 2025 11:24
@gwaramadze gwaramadze force-pushed the feature/interval-joins branch 2 times, most recently from 1481cab to eb574e7 Compare June 18, 2025 08:54
@gwaramadze gwaramadze force-pushed the feature/interval-joins branch from de8c91d to bc2de13 Compare June 19, 2025 07:40
@gwaramadze gwaramadze force-pushed the feature/interval-joins branch from c2df19b to f1e2d8b Compare June 19, 2025 11:59
@gwaramadze gwaramadze merged commit bb6e40b into main Jun 19, 2025
4 checks passed
@gwaramadze gwaramadze deleted the feature/interval-joins branch June 19, 2025 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants