Update Application.run() behavior #932

Merged: 7 commits, Jun 18, 2025

daniil-quix
Collaborator

Main changes

  1. Application.run() with the count param now counts outputs instead of messages.
    This makes the behavior easier for users to reason about and simplifies the internal logic.
    For example, messages from the repartition topics no longer need to be handled separately.

  2. Application.run() with the count or timeout parameter can now accumulate and return the processed outputs for further debugging.
    This is less manual than using a ListSink.

  3. StreamingDataFrame.group_by() now filters the data after sending it to the repartition topic to avoid double-counting.
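
To illustrate the difference between counting messages and counting outputs, here is a toy model in plain Python. It is not the Quix Streams implementation and does not call the library: `run_pipeline` and `expand_even` are hypothetical names made up for this sketch, timeouts are not modelled, and stopping in the middle of a message's outputs is an assumption of the sketch rather than documented behavior.

```python
from typing import Callable, Iterable, List


def run_pipeline(
    messages: Iterable[int],
    process: Callable[[int], List[int]],
    count: int = 0,
) -> List[int]:
    """Toy stand-in for a run loop (hypothetical, not the library code).

    Stops once `count` outputs have been produced (0 means no limit)
    and returns the collected outputs for inspection.
    """
    outputs: List[int] = []
    for message in messages:
        # One input message may yield zero, one, or several outputs.
        for result in process(message):
            outputs.append(result)
            if count and len(outputs) >= count:
                return outputs
    return outputs


def expand_even(value: int) -> List[int]:
    """Hypothetical processing step: drop odd values, emit two outputs per even value."""
    if value % 2:
        return []
    return [value, value * 10]


# Four input messages (1, 2, 3, 4) are consumed before three outputs exist.
collected = run_pipeline([1, 2, 3, 4, 5, 6], expand_even, count=3)
print(collected)  # [2, 20, 4]
```

In this model the stop condition depends only on what the pipeline emits, so filtered-out or expanded messages need no special accounting, and the returned list can be inspected directly instead of wiring up a separate sink.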

daniil-quix and others added 3 commits June 16, 2025 14:05
Co-authored-by: Remy Gwaramadze <gwaramadze@users.noreply.github.com>
gwaramadze previously approved these changes Jun 16, 2025
@daniil-quix daniil-quix merged commit 7363251 into main Jun 18, 2025
4 checks passed
@daniil-quix daniil-quix deleted the feature/app-run-count-update branch June 18, 2025 08:41