feat(sdk): use transaction log to cap memory usage #4724

Merged: 322 commits, Jan 6, 2023

@raubitsj (Member) commented Jan 5, 2023

Description

Use the transaction log (the .wandb file) as a disk-backed queue of work when network sending gets backed up.
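To illustrate the idea of a disk-backed queue, here is a minimal sketch. It is not the code from this PR: `SpillingQueue`, its methods, and the length-prefixed framing are all hypothetical names and choices made for the example. It assumes every record is appended to a log file first, and that records are only held in memory while the buffer is under a byte limit; once the limit is hit, later records are read back from the log by offset.

```python
import struct
from collections import deque
from typing import Deque, Optional

# 4-byte little-endian length prefix; an illustrative framing, not the .wandb format.
_HEADER = struct.Struct("<I")


class SpillingQueue:
    """Hypothetical FIFO queue that caps memory by falling back to a log file."""

    def __init__(self, log_path: str, max_buffered_bytes: int) -> None:
        self._writer = open(log_path, "ab")   # append-only handle
        self._reader = open(log_path, "rb")   # separate handle for replay
        self._max = max_buffered_bytes
        self._buffered: Deque[bytes] = deque()
        self.buffered_bytes = 0
        self._end = self._writer.tell()       # offset where the next record lands
        self._spill_offset: Optional[int] = None  # first record that is on disk only

    def put(self, record: bytes) -> None:
        offset = self._end
        # Persist first, so the log always holds every record.
        self._writer.write(_HEADER.pack(len(record)) + record)
        self._writer.flush()
        self._end += _HEADER.size + len(record)
        if self._spill_offset is not None:
            return  # already spilling: keep FIFO order by leaving it on disk
        if self.buffered_bytes + len(record) <= self._max:
            self._buffered.append(record)
            self.buffered_bytes += len(record)
        else:
            self._spill_offset = offset  # memory cap reached: start spilling

    def get(self) -> Optional[bytes]:
        # In-memory records are always older than spilled ones.
        if self._buffered:
            record = self._buffered.popleft()
            self.buffered_bytes -= len(record)
            return record
        if self._spill_offset is None:
            return None  # nothing pending
        self._reader.seek(self._spill_offset)
        (size,) = _HEADER.unpack(self._reader.read(_HEADER.size))
        record = self._reader.read(size)
        self._spill_offset += _HEADER.size + size
        if self._spill_offset >= self._end:
            self._spill_offset = None  # caught up: new records may use memory again
        return record

    def close(self) -> None:
        self._writer.close()
        self._reader.close()
```

In this sketch the consumer (for example, a sender thread) calls `get()` whenever it has capacity, so a slow network only grows the file on disk rather than the in-memory buffer.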

Implemented:

  • Move cancel logic to the writer thread.
  • Convert stop_status and network_status polling to use the mailbox.
  • Add new request wrappers for non-persistent versions of telemetry and summary records.
  • Add a new finite state machine library that supports describing state transitions with a data structure.
  • And so much more... :)
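As a rough illustration of describing state transitions with a data structure, the sketch below drives a small state machine from a transition table. It does not reproduce the API of wandb/sdk/lib/fsm.py: `Transition`, `TableFsm`, `build_flow_fsm`, the state names, and the thresholds are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Transition:
    """One table row: when `condition` holds, run `action` and move to `target`."""
    condition: Callable[[int], bool]
    target: str
    action: Optional[Callable[[int], None]] = None


class TableFsm:
    """Hypothetical state machine whose behavior lives entirely in `table`."""

    def __init__(self, initial: str, table: Dict[str, List[Transition]]) -> None:
        self.state = initial
        self._table = table

    def input(self, value: int) -> None:
        # Take the first transition of the current state whose condition matches.
        for transition in self._table.get(self.state, []):
            if transition.condition(value):
                if transition.action is not None:
                    transition.action(value)
                self.state = transition.target
                return


def build_flow_fsm(high: int, low: int, log: List[Tuple[str, int]]) -> TableFsm:
    """Example table: pause forwarding above `high` pending bytes, resume below `low`."""
    table = {
        "forwarding": [
            Transition(lambda pending: pending > high, "pausing",
                       lambda pending: log.append(("pause", pending))),
        ],
        "pausing": [
            Transition(lambda pending: pending < low, "forwarding",
                       lambda pending: log.append(("resume", pending))),
        ],
    }
    return TableFsm("forwarding", table)
```

Keeping the transitions in a table means new states or actions can be added by editing data rather than control flow, which also makes each transition easy to test in isolation.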

TODO:

  • add more to fsm tests (more state transitions and actions)
  • add more to flowcontrol tests (telemetry update, etc)
  • add more telemetry? maybe something ad hoc for manual testing
  • add telemetry for _sync and _offline

Future: ?

  • Run parts of CI with lowered limits for memory buffer size
  • Finish up relay merge into yea-wandb

Testing

How was this PR tested?

Checklist

  • Include reference to internal ticket "Fixes WB-NNNN" and/or GitHub issue "Fixes #NNNN" (if applicable)
  • Ensure PR title compliance with the conventional commits standards

@github-actions github-actions bot added cc-feat and removed cc-feat labels Jan 6, 2023
wandb/sdk/lib/fsm.py: 2 review threads (outdated, resolved)
wandb/sdk/internal/writer.py: 2 review threads (outdated, resolved)
@raubitsj raubitsj enabled auto-merge (squash) January 6, 2023 23:31
@raubitsj raubitsj merged commit f87c083 into main Jan 6, 2023
@raubitsj raubitsj deleted the feat-resiliency-3 branch January 6, 2023 23:33
3 participants