# Activity · z-a-f/pytorch

## Flatten out_proj in MultiHeadAttention (branch `mha_sans_outproj`, pushed 2024-05-17)

The MHA has an explicit `nn.Linear` layer for the output projection, which is inconsistent with the rest of the implementation (e.g. the input projections). It also makes `nn.MultiHeadAttention` dependent on the `nn.Linear` implementation and turns it into a nested module.

### Changes

1. Remove `MultiHeadAttention.out_proj`.
2. Add `MultiHeadAttention.out_proj_weight` and `MultiHeadAttention.out_proj_bias`; use the functional linear in `forward`.
3. Add initialization.
4. Change the expected string to hide the `out_proj`.
5. Add forward compatibility so that old models can still be loaded.

### Potential issues

* Initialization: `nn.Linear` initializes its weight with uniform Kaiming, while this PR uses uniform Xavier. In addition, the bias in `nn.Linear` is uniform based on fan-in/fan-out, while here it is a constant 0. This means the result will differ numerically from the original implementation.
  * *Option 1: Accept the current change* -- this is more consistent with the rest of the implementation.
  * *Option 2: Duplicate the initialization logic from `nn.Linear`* -- this is consistent with the initialization from before this PR.

### Tests

There are no new tests, as no new logic or change in functionality is introduced.

## Workflow for uploading additional test stats on workflow dispatch (#126080) (branch creation, 2024-05-17)

This is an experiment for uploading test stats during the run, and for the test dashboard so it can recalculate the info.

* Add a workflow, callable via workflow dispatch, for uploading additional test stats.
* Add a script that calculates only the additional info.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126080
Approved by: https://github.com/ZainRizvi
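The flattening described in changes 1–3 can be sketched as follows. This is a minimal illustrative module, not the actual PR diff: the parameter names (`out_proj_weight`, `out_proj_bias`) and the Xavier-uniform/constant-zero initialization come from the PR description, but the surrounding class and `embed_dim` argument are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlattenedOutProjAttention(nn.Module):
    """Sketch of an attention module whose output projection is stored as
    flat parameters instead of a nested nn.Linear submodule."""

    def __init__(self, embed_dim):
        super().__init__()
        # Instead of a nested nn.Linear (self.out_proj), keep the
        # projection's weight and bias as flat parameters on the module.
        self.out_proj_weight = nn.Parameter(torch.empty(embed_dim, embed_dim))
        self.out_proj_bias = nn.Parameter(torch.empty(embed_dim))
        self._reset_parameters()

    def _reset_parameters(self):
        # Uniform Xavier for the weight and constant 0 for the bias, as the
        # PR describes (nn.Linear would instead use uniform Kaiming for the
        # weight and a fan-in-based uniform bias).
        nn.init.xavier_uniform_(self.out_proj_weight)
        nn.init.constant_(self.out_proj_bias, 0.0)

    def forward(self, attn_output):
        # The functional linear replaces the nested submodule's forward.
        return F.linear(attn_output, self.out_proj_weight, self.out_proj_bias)
```

With this structure the module has no `nn.Linear` children, so its repr no longer shows an `out_proj` submodule (change 4), and the projection no longer depends on the `nn.Linear` implementation.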
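Change 5 (loading old models) amounts to remapping the nested checkpoint keys to the new flat names. A hedged sketch of that remapping, assuming old checkpoints store `out_proj.weight` / `out_proj.bias` (the keys a nested `nn.Linear` named `out_proj` would produce) and the helper name is hypothetical:

```python
def remap_old_state_dict(state_dict):
    """Rename nested out_proj keys from pre-flattening checkpoints to the
    flat parameter names, leaving all other keys untouched."""
    remapped = {}
    for key, value in state_dict.items():
        if key.endswith("out_proj.weight"):
            # "...out_proj.weight" -> "...out_proj_weight"
            key = key.replace("out_proj.weight", "out_proj_weight")
        elif key.endswith("out_proj.bias"):
            # "...out_proj.bias" -> "...out_proj_bias"
            key = key.replace("out_proj.bias", "out_proj_bias")
        remapped[key] = value
    return remapped
```

In PyTorch this kind of remapping is typically hooked into loading via the module's `_load_from_state_dict` override, so users can call `load_state_dict` on old checkpoints unchanged.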