{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":136989015,"defaultBranch":"master","name":"pytorch","ownerLogin":"xw285cornell","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2018-06-11T22:47:21.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/7795712?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1717377383.0","currentOid":""},"activityList":{"items":[{"before":"d8c8432df2f5058ca4c186b6b3515ad00ec5cf8b","after":"cb614b1868e5720ceca11c627a6a3bfd800c6fe9","ref":"refs/heads/export-D58047484","pushedAt":"2024-06-03T17:00:13.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"xw285cornell","name":"Xiaodong Wang","path":"/xw285cornell","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7795712?s=80&v=4"},"commit":{"message":"[AMD] Fix power_draw api (#127729)\n\nSummary:\n\naverage_socket_power only gives me NA. So we need to change it to current_socket_power\n\nTest Plan: Before `torch.cuda.power_draw` gives me NA, after it gives me the right power reading (e.g.441)\n\nReviewed By: nmacchioni\n\nDifferential Revision: D58047484","shortMessageHtmlLink":"[AMD] Fix power_draw api (pytorch#127729)"}},{"before":"bc36a0c435f03509fa559e3ebf3bc6c358d96eb4","after":"d8c8432df2f5058ca4c186b6b3515ad00ec5cf8b","ref":"refs/heads/export-D58047484","pushedAt":"2024-06-03T01:17:09.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"xw285cornell","name":"Xiaodong Wang","path":"/xw285cornell","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7795712?s=80&v=4"},"commit":{"message":"[AMD] Fix power_draw api (#127729)\n\nSummary:\n\naverage_socket_power only gives me NA. So we need to change it to current_socket_power\n\nTest Plan: Before `torch.cuda.power_draw` gives me NA, after it gives me the right power reading (e.g.441)\n\nDifferential Revision: D58047484","shortMessageHtmlLink":"[AMD] Fix power_draw api (pytorch#127729)"}},{"before":null,"after":"bc36a0c435f03509fa559e3ebf3bc6c358d96eb4","ref":"refs/heads/export-D58047484","pushedAt":"2024-06-03T01:16:23.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"xw285cornell","name":"Xiaodong Wang","path":"/xw285cornell","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7795712?s=80&v=4"},"commit":{"message":"[AMD] Fix power_draw api\n\nSummary: average_socket_power only gives me NA. 
Branch export-D54105053
  Pushed by Jeff Daily (jeffdaily)
  2024-05-30 22:10 UTC  push of 3388 commits -> 68758c8
  Commit message: Merge branch 'main' into export-D54105053

Branch export-D57711088: "[AMD] Fix deprecated amdsmi api" (pytorch#126962)
  Pushed by Xiaodong Wang (xw285cornell)
  2024-05-23 06:59 UTC  branch created at 05f2abd
  2024-05-24 02:41 UTC  force push -> d12ac9f
  2024-05-24 02:47 UTC  force push -> ce68a11
  2024-05-24 03:12 UTC  force push -> a04cda7
  Commit message:
    Summary: https://github.com/pytorch/pytorch/pull/119182 uses an API that has since been deprecated by https://github.com/ROCm/amdsmi/commit/c551c3caedbd903ba828e7fdffa5b56d475a15e7, so fix it in a backward-compatible way.
    Reviewed By: nmacchioni
    Differential Revision: D57711088
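The commit message does not name the amdsmi call that was deprecated, so the snippet below only illustrates the backward-compatible pattern it describes: prefer the current entry point and fall back to the old one. The `new_name`/`old_name` arguments are hypothetical placeholders, not the functions actually touched by D57711088.

```python
import amdsmi  # ROCm's amdsmi Python bindings


def call_with_fallback(new_name, old_name, *args, **kwargs):
    """Call the current amdsmi entry point if it exists, otherwise fall back
    to the deprecated name so older library versions keep working."""
    fn = getattr(amdsmi, new_name, None) or getattr(amdsmi, old_name)
    return fn(*args, **kwargs)
```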
Branch export-D56347560: "[comm] Ensure ncclComm is not aborted before checking exception" (pytorch#124466)
  Pushed by Xiaodong Wang (xw285cornell)
  2024-04-19 09:01 UTC  branch created at ca2c79b
  2024-04-20 03:13 UTC  force push -> 1616e27
  2024-05-02 18:27 UTC  force push -> 734c630
  2024-05-05 09:00 UTC  force push -> 6bc96d9, then -> 1eda3e9
  Commit message:
    Summary: More details are in https://github.com/pytorch/pytorch/issues/124468. There appears to be a race in the ProcessGroupNCCL shutdown logic. The repro is simple:
    ```
    for i in range(100):
        dist.all_to_all_single(tensor_out, tensor_in)
    dist.destroy_process_group()
    ```
    What can happen is this:
    1. dist.destroy_process_group() calls shutdown(), which calls abort(): https://github.com/pytorch/pytorch/blob/b2f6cfd9c061a212cde8c8768fda41cc75a3110c/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp#L1095
    2. abort() calls ncclCommAbort (not graceful, as far as I can tell) and also sets ncclAsyncErr_ = ncclSystemError: https://github.com/pytorch/pytorch/blob/b2f6cfd9c061a212cde8c8768fda41cc75a3110c/torch/csrc/distributed/c10d/NCCLUtils.hpp#L388
    3. The ncclWatchdog thread may not have woken up while this shutdown is happening, and shutdown() does not wait for the watchdog thread.
    4. The ProcessGroupNCCL destructor is called; it waits for the watchdog thread to join.
    5. The watchdog checks the work's isCompleted(), which calls checkAndSetException(). Because ncclAsyncErr_ was set to ncclSystemError, it errors out and makes it look like a NCCL error.
    We can mitigate this by checking whether the comm was aborted in work.isCompleted()/isStarted(). Longer-term discussion is in the issue.
    Test Plan: the repro above no longer errors out.
    Reviewed By: kwen2501, yoyoyocmu
    Differential Revision: D56347560
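The snippet in the commit message omits the setup; a more complete version of the repro, assuming a multi-GPU host and a launch via `torchrun --nproc_per_node=<n> repro.py`, would look roughly like this:

```python
import os

import torch
import torch.distributed as dist

# torchrun sets RANK / WORLD_SIZE / LOCAL_RANK and the rendezvous variables.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

world_size = dist.get_world_size()
tensor_in = torch.randn(world_size * 1024, device="cuda")
tensor_out = torch.empty_like(tensor_in)

for _ in range(100):
    dist.all_to_all_single(tensor_out, tensor_in)

# Without the fix, tearing the group down right after the asynchronous
# collectives can race with the NCCL watchdog thread and surface a spurious
# ncclSystemError.
dist.destroy_process_group()
```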
Branch export-D56923833: "[AMD] Fix cutlass path in inductor"
  Pushed by Xiaodong Wang (xw285cornell)
  2024-05-03 08:07 UTC  branch created at 43d5082
  Commit message:
    Summary: Trunk is broken because the fbcode triton-amd build doesn't have a cutlass path.
    Test Plan: It now runs.
    Differential Revision: D56923833
Branch export-D55602788: "Reduce warning msg in torch.profiler" (pytorch#124469)
  Pushed by Xiaodong Wang (xw285cornell)
  2024-04-19 09:27 UTC  branch created at 8b8565d
  2024-04-19 09:28 UTC  force push -> c7ffefe
  Commit message:
    Summary: This warning is quite noisy and my logs are full of this soft-assertion message. Maybe log it only once?
    Test Plan: On the AMD GPU side I get a lot of warnings like:
    ```
    W0415 01:40:45.109864 917160 collection.cpp:602] Warning: Memcpy ? (? -> ?) (function operator())
    ```
    So just suppress the excessive logs.
    Reviewed By: aaronenyeshi, yoyoyocmu
    Differential Revision: D55602788
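For context, the warning comes from the profiler's GPU activity collection; a minimal run that exercises that path, assuming a CUDA or ROCm device is available:

```python
import torch
from torch.profiler import profile, ProfilerActivity

x = torch.randn(1024, 1024, device="cuda")
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    y = (x @ x).cpu()  # device-to-host copy: the kind of memcpy the warning refers to
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=5))
```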
Branch export-D56183954: "[AMD] TunableOp take priority over DISABLE_ADDMM_HIP_LT" (pytorch#124161)
  Pushed by Xiaodong Wang (xw285cornell)
  2024-04-16 09:33 UTC  branch created at 626d7d8
  2024-04-19 09:24 UTC  force push -> 2e6a3d2
  Commit message:
    Summary: It is pretty confusing that if both DISABLE_ADDMM_HIP_LT and PYTORCH_TUNABLEOP_ENABLED are set, the former takes priority. This is because the former goes through gemm_and_bias, while TunableOp is integrated with the gemm path. Until we integrate TunableOp with gemm_and_bias, we will just let TunableOp take priority.
    Test Plan: Ran a simple linear program and verified.
    Reviewed By: nmacchioni
    Differential Revision: D56183954
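A sketch of the "simple linear program" check from the test plan, assuming a ROCm build of PyTorch; both environment variables have to be set before the first GEMM is dispatched:

```python
import os

# TunableOp tuning on, and addmm forced off the hipblasLt (gemm_and_bias) path.
# With this change, TunableOp should win when both variables are set.
os.environ["PYTORCH_TUNABLEOP_ENABLED"] = "1"
os.environ["DISABLE_ADDMM_HIP_LT"] = "1"

import torch

linear = torch.nn.Linear(128, 256).cuda()
x = torch.randn(64, 128, device="cuda")
y = linear(x)  # dispatches an addmm, which TunableOp should now tune
print(y.shape)
```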
Branch export-D56347940: "[comm] Ensure graceful shutdown by waiting watchdog thread to finish"
  Pushed by Xiaodong Wang (xw285cornell)
  2024-04-19 09:01 UTC  branch created at 4ef1855
  Commit message:
    Differential Revision: D56347940

Branch export-D56226746: "[cublas] Keep explicit workspace creation to avoid OOM"
  Pushed by Xiaodong Wang (xw285cornell)
  2024-04-17 03:14 UTC  branch created at 9c222c4
  Commit message:
    Summary: We explicitly set the cuBLAS workspace even though CUDA 12.2+ fixed the issue where memory usage increased during graph capture (original issue: https://github.com/pytorch/pytorch/pull/83461). This is because in CUDA 12.2+, cuBLAS's use of cudaMallocAsync allocates memory dynamically (even if the allocations are cheap) outside PyTorch's CUDA caching allocator. If the caching allocator has already used up all the memory, cuBLAS's cudaMallocAsync can return OOM.
    Test Plan: CI
    Differential Revision: D56226746
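The failure mode being guarded against is cuBLAS grabbing workspace memory outside PyTorch's caching allocator while a CUDA graph is being captured. As context only, a hedged sketch of capturing a GEMM (the kind of op that needs a cuBLAS workspace) under a CUDA graph:

```python
import torch

a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")

# Warm up on a side stream so cuBLAS handles and workspaces already exist
# before capture begins.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    torch.matmul(a, b)
torch.cuda.current_stream().wait_stream(s)

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    c = torch.matmul(a, b)  # the GEMM that uses the pre-created workspace

g.replay()
torch.cuda.synchronize()
print(c.sum().item())
```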
Branch export-D54528255: "[2/n] Make aten-cpu build gpu agnostic" (pytorch#124159)
  Pushed by Xiaodong Wang (xw285cornell)
  2024-04-16 08:44 UTC  branch created at 3ec9d43
  2024-04-16 08:45 UTC  force push -> 2ff8854
  Commit message:
    Summary: Again, it's not ideal for aten-cpu to diverge between CUDA and ROCm, which can cause weird issues, so let's kill the divergences one by one.
    Test Plan: CI
    Differential Revision: D54528255

Branch export-D56172660: "[AMD] use hip_pp_flags for ATen-cpu on AMD build"
  Pushed by Xiaodong Wang (xw285cornell)
  2024-04-16 08:31 UTC  branch created at 7610496
  Commit message:
    Summary: This is not great, but our ATen-cpu is not completely GPU agnostic. We previously worked on D54453492 (https://github.com/pytorch/pytorch/pull/121082) and D54528255, but a few things remain unresolved and they are surfacing here, so we'll continue to fix them until all are gone. This ROCm block is for ROCm 4.3, which is very old and should no longer be supported, so let's just kill the macro.
    Test Plan: CI
    Differential Revision: D56172660

Branch export-D55815513: "[AMD] Clean up hipify base module" (pytorch#123633)
  Pushed by Xiaodong Wang (xw285cornell)
  2024-04-09 08:41 UTC  branch created at c911257
  2024-04-09 16:00 UTC  force push -> 2717f73, then -> c323477
  2024-04-11 08:46 UTC  force push -> 4dbd109
  2024-04-11 09:22 UTC  force push -> 60bfe1c, then -> e9415df
  Commit message:
    Summary: Clean up the base module namespace and simplify the hipify script.
    Test Plan: CI, plus buck2 build mode/{opt,amd-gpu} //gen_ai/llm_inference/fb/llm:llama_disagg
    Reviewed By: aaronenyeshi
    Differential Revision: D55815513
Wang","path":"/xw285cornell","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7795712?s=80&v=4"},"commit":{"message":"[AMD] Clean up hipify base module (#123633)\n\nSummary:\n\nClean up the base module namespace and simplify the hipify script.\n\nTest Plan:\nCI \n buck2 build mode/{opt,amd-gpu} //gen_ai/llm_inference/fb/llm:llama_disagg\n\nReviewed By: aaronenyeshi\n\nDifferential Revision: D55815513","shortMessageHtmlLink":"[AMD] Clean up hipify base module (pytorch#123633)"}},{"before":null,"after":"c911257f2b4b98cb0ee46c126d2ecbece4fcb4e2","ref":"refs/heads/export-D55815513","pushedAt":"2024-04-09T08:41:37.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"xw285cornell","name":"Xiaodong Wang","path":"/xw285cornell","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7795712?s=80&v=4"},"commit":{"message":"[AMD] Clean up hipify base module\n\nSummary: Clean up the base module namespace and simplify the hipify script.\n\nTest Plan:\nCI \n buck2 build mode/{opt,amd-gpu} //gen_ai/llm_inference/fb/llm:llama_disagg\n\nReviewed By: aaronenyeshi\n\nDifferential Revision: D55815513","shortMessageHtmlLink":"[AMD] Clean up hipify base module"}},{"before":null,"after":"528ab106a6793af2eb301c1ea5ee4839a0c7a8ae","ref":"refs/heads/export-D54687690","pushedAt":"2024-03-08T18:18:43.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"xw285cornell","name":"Xiaodong Wang","path":"/xw285cornell","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7795712?s=80&v=4"},"commit":{"message":"[c10d] Improve logPrefix for ND-parallelism\n\nSummary: When we have 2+D parallelism, we have multiple PGs, inter- and intra-host. So just printing PG id (which is by creation order and it's hard to tell which logical PG it is). So i think it has values to log global PG.\n\nTest Plan: CI\n\nDifferential Revision: D54687690","shortMessageHtmlLink":"[c10d] Improve logPrefix for ND-parallelism"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEWy_mTAA","startCursor":null,"endCursor":null}},"title":"Activity · xw285cornell/pytorch"}