Gradient requirements can now be set dynamically on modules. Previously, every parameter was assumed to require gradients, so the graph was reset after each module-level operation. This was problematic when we wanted to run operations on a module while still tracking them. To address this, a `fork` function has been added to modules: it signals that we want to send a module to another device with an empty graph.
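A minimal sketch of the idea, not the crate's actual API: `Device`, `Param`, and `Linear` below are hypothetical stand-ins that show per-parameter gradient tracking and a `fork` that moves a module while dropping the recorded graph.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Device {
    Cpu,
    Cuda(usize),
}

#[derive(Clone, Debug)]
struct Param {
    values: Vec<f32>,
    require_grad: bool, // now tracked per parameter instead of assumed `true`
    device: Device,
}

#[derive(Clone, Debug)]
struct Linear {
    weight: Param,
    bias: Param,
}

impl Linear {
    /// Toggle gradient tracking dynamically instead of assuming every
    /// parameter requires gradients.
    fn set_require_grad(&mut self, require_grad: bool) {
        self.weight.require_grad = require_grad;
        self.bias.require_grad = require_grad;
    }

    /// Move the module to `device` with an empty autodiff graph, while
    /// preserving each parameter's gradient setting. Unlike a plain
    /// `to_device`, this is explicit about discarding tracked operations.
    fn fork(mut self, device: Device) -> Self {
        self.weight.device = device;
        self.bias.device = device;
        // In a real autodiff backend, this is where the recorded graph
        // would be dropped; gradient requirements stay untouched.
        self
    }
}

fn main() {
    let mut layer = Linear {
        weight: Param { values: vec![0.1; 4], require_grad: true, device: Device::Cpu },
        bias: Param { values: vec![0.0; 2], require_grad: true, device: Device::Cpu },
    };
    layer.set_require_grad(false); // e.g. freeze this layer
    let layer = layer.fork(Device::Cuda(0));
    assert!(!layer.weight.require_grad); // the setting survives the fork
}
```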
The `detach` function has been removed from `Module` because it invites mistakes: the chain `to_device().detach().require_grad()` silently overrides an earlier `set_require_grad(false)`. Calling `fork` instead preserves the gradient settings configured when the module was created.
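A small self-contained sketch of the pitfall, using a hypothetical `ModuleState` whose method names mirror the removed chain:

```rust
#[derive(Clone, Copy, Debug)]
struct ModuleState {
    require_grad: bool,
}

impl ModuleState {
    // Old pattern: the trailing `require_grad()` re-enables gradients
    // unconditionally, clobbering an earlier `set_require_grad(false)`.
    fn to_device_detach_require_grad(self) -> Self {
        ModuleState { require_grad: true }
    }

    // New pattern: `fork` only resets the graph; the flag is preserved.
    fn fork(self) -> Self {
        self
    }
}

fn main() {
    let frozen = ModuleState { require_grad: false };
    assert!(frozen.to_device_detach_require_grad().require_grad); // bug: silently unfrozen
    assert!(!frozen.fork().require_grad); // still frozen, as intended
}
```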
The `from_inner` function has been removed from `ADModule`, since a static method cannot keep track of dynamic gradient configurations. The signature of `inner` has also changed from `inner(self)` to `inner(&self)`. In addition, `inner` may be renamed to `valid`, which better conveys the method's purpose and will be familiar to PyTorch users.
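A sketch of this change under the same caveat (hypothetical types, not the crate's actual API): borrowing in `inner(&self)` keeps the autodiff module, and its dynamic gradient configuration, alive, which is exactly the state a static `from_inner` constructor could not recover.

```rust
#[derive(Clone, Debug)]
struct InnerModule {
    weights: Vec<f32>,
}

#[derive(Debug)]
struct ADModule {
    inner: InnerModule,
    require_grad: bool, // dynamic configuration a static constructor can't recover
}

impl ADModule {
    /// Was `inner(self)`; borrowing keeps `self` (and its gradient
    /// settings) usable for further training.
    fn inner(&self) -> InnerModule {
        self.inner.clone()
    }

    /// Candidate new name for the same method: a gradient-free copy for
    /// validation, familiar to PyTorch users.
    fn valid(&self) -> InnerModule {
        self.inner()
    }
}

fn main() {
    let module = ADModule { inner: InnerModule { weights: vec![0.5; 3] }, require_grad: false };
    let _validation_copy = module.valid();
    // `module` is still usable here; with the old `inner(self)` it would
    // have been moved, and `from_inner` couldn't restore `require_grad`.
    assert!(!module.require_grad);
}
```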