# Differentiable Function in PyTorch
In PyTorch, a differentiable function is a mathematical operation or function for which PyTorch can compute the derivative with respect to its inputs. These functions are essential for training neural networks with gradient-based optimization techniques, which rely on backpropagation to obtain the gradients.

## Key Concepts:
### 1. Automatic Differentiation (Autograd):
* PyTorch uses a system called Autograd to automatically compute the gradients of tensors that require gradients. It does this by tracking operations performed on tensors to build a computational graph. Conceptually, each node in the graph corresponds to a tensor, and the edges represent the operations that were used to compute that tensor.
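
The tracking described here can be sketched in a few lines. The variable names (`w`, `x`, `loss`) are illustrative assumptions, not part of the original text:

```python
import torch

# Leaf tensors: only `w` asks Autograd to track operations on it.
w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])

# Each operation is recorded as it runs (the graph is built dynamically).
loss = (w * x).sum()  # 1*3 + 2*4 = 11

# Backpropagate through the recorded graph: d(loss)/dw = x.
loss.backward()
print(loss.item())  # 11.0
print(w.grad)       # tensor([3., 4.])
```

Because `x` was created without `requires_grad=True`, no gradient is stored for it and `x.grad` stays `None`.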

### 2. Differentiable Operations:
* For Autograd to compute gradients, the operations applied to the tensors must be differentiable. This means that PyTorch must be able to calculate the derivative of the function with respect to its inputs.
* Common differentiable operations in PyTorch include:
    * Basic arithmetic (addition, subtraction, multiplication, division)
    * Matrix operations (dot product, matrix multiplication)
    * Non-linear activation functions (ReLU, sigmoid, tanh)
    * Convolutions and pooling operations
    * Loss functions (e.g., MSELoss, CrossEntropyLoss)
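
As a rough illustration, several of the operations listed above can be chained and differentiated in one pass. This is a sketch with made-up shapes and a hypothetical weight matrix `W`, not a real model:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # fixed seed so the run is repeatable

x = torch.randn(4, 3)                      # a batch of 4 inputs
W = torch.randn(3, 2, requires_grad=True)  # hypothetical weight matrix
target = torch.zeros(4, 2)

# Matrix multiplication -> non-linear activation -> loss function.
pred = torch.tanh(x @ W)
loss = F.mse_loss(pred, target)

# Every step is differentiable, so the gradient reaches W.
loss.backward()
print(W.grad.shape)  # torch.Size([3, 2])
```

The gradient has the same shape as `W`, which is what an optimizer needs in order to update it.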

### 3. Backward Functions (grad_fn):
* Each differentiable operation in PyTorch has a corresponding backward function that computes the gradient during backpropagation. The **grad_fn** attribute of a tensor points to this function, allowing PyTorch to trace back through the computational graph to calculate gradients.
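
A small sketch of how different operations attach different backward functions. The class names printed here (for example `PowBackward0`) are what recent PyTorch releases typically show, but the exact names are an implementation detail and may vary between versions:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

# Each result remembers the backward function of the operation that made it.
results = {
    "x ** 2": x ** 2,
    "x * 3": x * 3,
    "x.sum()": x.sum(),
}

names = {}
for expr, t in results.items():
    names[expr] = type(t.grad_fn).__name__
    print(expr, "->", names[expr])
```

Inspecting the class name of **grad_fn** like this is a quick way to confirm which operation produced a given tensor.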

### 4. Requirement for Differentiability:
* For a tensor to have gradients calculated, it must have the attribute **requires_grad=True**. This tells PyTorch to track operations on this tensor and to compute gradients during backpropagation.
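
The effect of **requires_grad** can be checked directly. The following is a minimal sketch; the tensor names are illustrative:

```python
import torch

a = torch.tensor([1.0, 2.0])                      # requires_grad defaults to False
b = torch.tensor([1.0, 2.0], requires_grad=True)  # tracked by Autograd

out_a = a * 2
out_b = b * 2
print(out_a.requires_grad)  # False
print(out_b.requires_grad)  # True

# Tracking can be switched on in place for an existing leaf tensor.
a.requires_grad_(True)
out_a2 = a * 2
print(out_a2.requires_grad)  # True

# Inside torch.no_grad(), operations are not recorded even for tracked tensors.
with torch.no_grad():
    c = b * 2
print(c.requires_grad)  # False
```

A result requires gradients whenever at least one of its inputs does, unless tracking is disabled as in the `torch.no_grad()` block.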

### 5. Non-Differentiable Operations:
* Some operations are non-differentiable, meaning that no useful gradient can be computed for them. Examples include discrete operations (e.g., rounding, whose gradient is zero almost everywhere, and comparison operations, which return boolean tensors) and certain indexing operations, such as those that return integer indices.
* If a non-differentiable operation is applied, the computational graph may be cut at that point, and PyTorch will not be able to propagate gradients to the tensors that came before it. This is why differentiability is crucial in the context of neural networks.
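
The behaviour of two such operations can be observed with a short experiment. This is a sketch under the assumption that `torch.round` and the `>` comparison are representative of the discrete operations mentioned above:

```python
import torch

x = torch.tensor([1.4, 2.6], requires_grad=True)

# Rounding is piecewise constant, so the gradient that flows back is zero.
torch.round(x).sum().backward()
print(x.grad)  # tensor([0., 0.])

# A comparison returns a boolean tensor that is not part of the graph.
mask = x > 2.0
print(mask.requires_grad, mask.grad_fn)  # False None

# Calling backward() on a result with no graph raises a RuntimeError.
raised = False
try:
    mask.float().sum().backward()
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)
```

Rounding does not raise an error, but it silently stops learning because every gradient passing through it is zero; the comparison cuts the graph outright.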

## Example of a Differentiable Function:

In [1]:
import torch

# Create a tensor with requires_grad=True
x = torch.tensor([2.0, 3.0], requires_grad=True)

# Perform a differentiable operation
y = x ** 2 # y = x^2

# Calculate the gradients by backpropagation
y.sum().backward()

# x.grad now contains the gradient of y with respect to x
print(x.grad) # Output: tensor([4., 6.]) <- because the derivative of x^2 is 2x

tensor([4., 6.])


In this example:
* The operation **x \*\* 2** is differentiable. The gradient of **y** with respect to **x** is **2x**, so the gradients for the values in **x** will be **[4.0, 6.0]**.
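
The call to **y.sum()** in the example matters: **backward()** without arguments only works on a scalar. As a hedged alternative sketch, an explicit upstream gradient can be passed for a non-scalar **y**, and the result compared with the analytic derivative **2x**:

```python
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = x ** 2

# For a non-scalar y, pass the upstream gradient explicitly.
# A tensor of ones gives the same result as calling y.sum().backward().
y.backward(gradient=torch.ones_like(y))
print(x.grad)  # tensor([4., 6.])

# The result matches the analytic derivative 2x.
matches = torch.allclose(x.grad, 2 * x.detach())
print(matches)  # True
```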


## Summary:
* A **differentiable function** in PyTorch is one for which the derivative can be computed by PyTorch's Autograd system. These functions are fundamental for training neural networks since they allow the model to learn by adjusting parameters based on gradients.
* Most standard mathematical operations, activation functions, and loss functions in PyTorch are differentiable.
* **Autograd** enables automatic gradient calculation by maintaining a computational graph that tracks operations on tensors.


Understanding which operations are differentiable is crucial when constructing models and custom loss functions, as it ensures that backpropagation can be performed correctly.

# `grad_fn` Attribute of `torch.Tensor`

In PyTorch, **grad_fn** is a tensor attribute that references the function used to create the tensor. It is part of PyTorch's automatic differentiation engine, known as Autograd, which is responsible for computing gradients during backpropagation in neural networks.

## Key Points about `grad_fn`
### 1. Tracking the Computational Graph
* PyTorch builds a computational graph dynamically as you perform operations on tensors. Each tensor in this graph knows how it was created. The **grad_fn** attribute stores this information.

### 2. Function that Created the Tensor
* The **grad_fn** attribute points to the function that produced the tensor. For example, if a tensor is created as the result of adding two other tensors, its **grad_fn** will point to an **AddBackward0** object.
* This matters for any tensor that is the result of an operation: it lets PyTorch track how the tensor was derived from other tensors, which is essential for calculating gradients.

### 3. Leaf Tensors and Non-Leaf Tensors:
* **Leaf Tensors**: These are tensors that are created by the user, not as the result of an operation. They have **grad_fn** set to **None**.
* **Non-Leaf Tensors**: These are tensors that are created as the result of operations on other tensors. They have **grad_fn** pointing to the appropriate backward function.
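
The distinction can be checked with the **is_leaf** attribute. The sketch below also uses **retain_grad()**, which is not mentioned in the text above but is the standard way to keep the gradient of a non-leaf tensor:

```python
import torch

a = torch.tensor([2.0, 3.0], requires_grad=True)  # created by the user: leaf
b = a * 2                                         # created by an operation: non-leaf

print(a.is_leaf, a.grad_fn)          # True None
print(b.is_leaf, b.grad_fn is None)  # False False

# By default only leaf tensors keep their .grad after backward();
# retain_grad() asks Autograd to keep it for a non-leaf tensor as well.
b.retain_grad()
b.sum().backward()
print(a.grad)  # tensor([2., 2.])
print(b.grad)  # tensor([1., 1.])
```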

### 4. Example:

In [2]:
import torch

# Creating a tensor with requires_grad=True so that it tracks operations
a = torch.tensor([2.0, 3.0], requires_grad=True)

# Performing an operation on tensor `a`
b = a + 5

# Checking the grad_fn attribute
print(b.grad_fn)

<AddBackward0 object at 0x7d28ad715150>


In this example:
* **a** is a leaf tensor, so **a.grad_fn** is **None**.
* **b** is the result of adding 5 to **a**, so **b.grad_fn** points to the **AddBackward0** function, which represents the addition operation.

### 5. Use in Backpropagation:
* During backpropagation, PyTorch traverses the computational graph starting from the final loss tensor, using the **grad_fn** attributes to follow the chain of operations backward and compute the gradients.
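
The chain that backpropagation follows can be inspected by hand through the **next_functions** attribute of each backward node. The `walk` helper below is a hypothetical illustration written for this note, not a PyTorch API, and the printed class names may differ between versions:

```python
import torch

a = torch.tensor([2.0, 3.0], requires_grad=True)
loss = ((a + 5) * 2).sum()

def walk(fn, depth=0, seen=None):
    """Collect backward-node class names, starting from `fn` (hypothetical helper)."""
    seen = [] if seen is None else seen
    if fn is None:
        return seen
    seen.append(type(fn).__name__)
    print("  " * depth + seen[-1])
    # next_functions holds (node, input_index) pairs for the inputs of this node.
    for next_fn, _ in fn.next_functions:
        walk(next_fn, depth + 1, seen)
    return seen

visited = walk(loss.grad_fn)
```

The walk starts at the backward function of the final tensor and ends at an `AccumulateGrad` node, which is where the gradient is written into the leaf tensor's **grad** attribute.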

### 6. Relevance:
* If you're manually constructing or modifying parts of the computational graph, or debugging a model whose gradients are missing, understanding **grad_fn** can be crucial. However, in most cases, when simply building and training models, PyTorch handles this automatically.

## Summary:
* The **grad_fn** attribute in a PyTorch tensor provides a reference to the function that created the tensor. This is essential for PyTorch's automatic differentiation system to work, as it allows the framework to trace back through the operations that produced each tensor and compute the necessary gradients for optimization.