Activity

Disable use_cuda for local_sgd_integ_tests (#149)

Pull request merge
H-Huang pushed 1 commit to main • aa00ef5…6e17fb4
13 hours ago

Support generic quorum api on LighthouseClient (#150)

Pull request merge
fduwjj pushed 1 commit to main • 2b3cd8d…aa00ef5
yesterday

Fix documents

fduwjj pushed 1 commit to lighthouse_client • a6eacfc…456d1a3
yesterday

Include into coordination

fduwjj pushed 1 commit to lighthouse_client • 3dfacd9…a6eacfc
yesterday

Address comments

fduwjj pushed 1 commit to lighthouse_client • be3bbb5…3dfacd9
yesterday

Remove

fduwjj pushed 1 commit to lighthouse_client • 0c5221c…be3bbb5
yesterday

Fix sphinx-theme

fduwjj pushed 1 commit to lighthouse_client • f9fccdb…0c5221c
yesterday

Add code comment

fduwjj pushed 1 commit to lighthouse_client • 0e45087…f9fccdb
yesterday

Make output back to python dict again

fduwjj pushed 1 commit to lighthouse_client • 862b2cb…0e45087
yesterday

Deleted branch

d4l3k deleted d4l3k/async_err
yesterday

ProcessGroupNCCL,Manager: surface async abort errors correctly (#147)

Pull request merge
d4l3k pushed 1 commit to main • 3724f7c…2b3cd8d
yesterday

Fix unit test failure

fduwjj pushed 1 commit to lighthouse_client • 71dcd48…862b2cb
yesterday

Add unit test

fduwjj pushed 1 commit to lighthouse_client • f689429…71dcd48
yesterday

support generic quorum api on LighthouseClient

fduwjj created lighthouse_client • f689429
2 days ago

ProcessGroupNCCL,Manager: surface async abort errors correctly

Force push
d4l3k force pushed to d4l3k/async_err • 8b2581f…0e8bf01
6 days ago

ProcessGroupNCCL,Manager: surface async abort errors correctly

Force push
d4l3k force pushed to d4l3k/async_err • b06100f…8b2581f
6 days ago

ProcessGroupNCCL,Manager: surface async abort errors correctly

d4l3k created d4l3k/async_err • b06100f
6 days ago

Deleted branch

d4l3k deleted d4l3k/tokio_threads
6 days ago

tokio: limit number of threads and set names (#146)

Pull request merge
d4l3k pushed 1 commit to main • 538b219…3724f7c
6 days ago

tokio: limit number of threads and set names

Force push
d4l3k force pushed to d4l3k/tokio_threads • 8fd028c…4c662fe
6 days ago

tokio: limit number of threads and set names

Force push
d4l3k force pushed to d4l3k/tokio_threads • e602c05…8fd028c
6 days ago

TimeoutManager: delete cuda events on main thread (#142)

Pull request merge
d4l3k pushed 1 commit to main • 038d222…538b219
6 days ago

Deleted branch

d4l3k deleted d4l3k/recovery_stream
7 days ago

manager: use separate stream for recovery (#144)

Pull request merge
d4l3k pushed 1 commit to main • f0a4061…038d222
7 days ago

TimeoutManager: delete cuda events on main thread

Force push
d4l3k force pushed to d4l3k/del_queue • 3dae3c8…c1ab7d9
7 days ago

tokio: limit number of threads and set names

d4l3k created d4l3k/tokio_threads • e602c05
7 days ago

Deleted branch

d4l3k deleted d4l3k/pg_tcpstore_timeout
7 days ago

process_group: set timeout for TCPStore client connect (#145)

Pull request merge
d4l3k pushed 1 commit to main • 73a6f78…f0a4061
7 days ago

TimeoutManager: delete cuda events on main thread

Force push
d4l3k force pushed to d4l3k/del_queue • 75350d4…3dae3c8
7 days ago

manager: use separate stream for recovery

Force push
d4l3k force pushed to d4l3k/recovery_stream • f4afd82…33bae9e
7 days ago