<a href="https://colab.research.google.com/github/yashpatel5400/CurveTorch/blob/main/tutorials/basic_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Overview
CurveTorch's optimizers can be used like any other PyTorch optimizer. The one difference from most optimizers you are likely used to is that you must pass a function closure to the optimizer's `step` call. This allows the optimizer to access Hessian information during the update step. We will see an example of how to do this further below. Let's start by importing the usual packages:

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim.lr_scheduler import StepLR
from torch.utils.tensorboard import SummaryWriter
from torchvision import datasets, transforms, utils

Next, let's import the CurveTorch package, which provides CurveSGD. If you have installed it globally, there is no need for the `sys.path.append` call:

In [None]:
# hack for importing local library: not necessary if installed globally
import sys
sys.path.append("..")
import curvetorch as curve

## CurveSGD Usage
To use CurveSGD, we need to define a model that we wish to optimize. CurveSGD can also be used in isolation to optimize functions outside the scope of neural networks, so long as the functions are defined using the PyTorch API and have autograd available. See the other tutorial if you are interested in using CurveSGD for purposes beyond optimizing neural networks. Here, we start by defining a very simple network, which will be run on MNIST:

In [None]:
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.dropout1 = nn.Dropout2d(0.25)
        # fc1's output is 2-D ([N, 4]); Dropout2d on 2-D input is deprecated, so use Dropout
        self.dropout2 = nn.Dropout(0.5)
        self.fc1 = nn.Linear(784, 4) 
        self.fc2 = nn.Linear(4, 10)

    def forward(self, x):
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
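
Before training, it can help to confirm the input and output shapes. The sketch below pushes a random batch shaped like MNIST (`[N, 1, 28, 28]`) through a stand-in with the same layer sizes as `Net`. `TinyNet` and the random input are illustrative assumptions so the snippet runs on its own; in the notebook you could call `Net()` instead.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):
    """Hypothetical stand-in with the same layer sizes as Net (dropout omitted)."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 4)
        self.fc2 = nn.Linear(4, 10)

    def forward(self, x):
        x = torch.flatten(x, 1)              # [N, 1, 28, 28] -> [N, 784]
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)


torch.manual_seed(0)
batch = torch.randn(8, 1, 28, 28)            # random data shaped like an MNIST batch
log_probs = TinyNet()(batch)

print(log_probs.shape)                       # one row of 10 log-probabilities per image
# log_softmax returns log-probabilities, so exp() of each row should sum to 1
row_sums = log_probs.exp().sum(dim=1)
print(torch.allclose(row_sums, torch.ones(8), atol=1e-5))
```

Because the network ends in `log_softmax`, the matching loss is `F.nll_loss`, which is what the training loop uses.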

Next, let's see how to use CurveSGD in an optimization loop (note: the `optimizer` referenced in the code below is an instance of CurveSGD, created further below):

In [None]:
def train(conf, model, device, train_loader, optimizer, epoch, writer):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        
        def closure():
            optimizer.zero_grad()
            output = model(data)
            loss = F.nll_loss(output, target)
            loss.backward(retain_graph=True, create_graph=True)
            return loss
        
        optimizer.step(closure)
        
        if batch_idx % conf.log_interval == 0:
            # Re-evaluate the loss after the update, for logging only (no graph needed)
            with torch.no_grad():
                loss = F.nll_loss(model(data), target).item()
            idx = batch_idx + epoch * (len(train_loader))
            writer.add_scalar('Loss/train', loss, idx)
            print(
                'Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                    epoch,
                    batch_idx * len(data),
                    len(train_loader.dataset),
                    100.0 * batch_idx / len(train_loader),
                    loss,
                )
            )

As mentioned at the beginning of the tutorial, one piece of this code is atypical of standard optimizers: the function closure. Most optimizers perform updates via `optimizer.step()`. In this case, however, we have:

```
def closure():
    optimizer.zero_grad()
    output = model(data)
    loss = F.nll_loss(output, target)
    loss.backward(retain_graph=True, create_graph=True)
    return loss
        
optimizer.step(closure)
```
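
To make the role of `create_graph=True` concrete, here is a minimal sketch of how PyTorch autograd can produce a Hessian-vector product by differentiating twice. It is a generic autograd illustration on a toy quadratic, not CurveSGD's internal code; the matrix `A` and the vectors `x` and `v` are made up for the example.

```python
import torch

# Toy quadratic f(x) = 0.5 * x^T A x with symmetric A, so its Hessian is exactly A.
A = torch.tensor([[2.0, 1.0],
                  [1.0, 3.0]])
x = torch.tensor([1.0, -1.0], requires_grad=True)
v = torch.tensor([0.5, 2.0])   # arbitrary direction

loss = 0.5 * x @ A @ x

# First differentiation. create_graph=True records the gradient computation
# itself, so the gradient can be differentiated again.
(grad,) = torch.autograd.grad(loss, x, create_graph=True)

# Second differentiation: d/dx (grad . v) = H v, without ever forming H.
(hvp,) = torch.autograd.grad(grad @ v, x)

print(hvp)     # Hessian-vector product from autograd
print(A @ v)   # analytic result for comparison
```

Without `create_graph=True`, the first gradient would be detached from the graph and the second call would fail, which is why the closure requests it.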

The closure is needed because the optimizer requires the full loss computation in order to access Hessian information within the optimization step. Other optimizers that require second-order information similarly make use of function closures, but they are less common than first-order methods, which can simply be invoked with `optimizer.step()`. With this setup complete, we can define a few remaining helper functions:

In [None]:
def prepare_loaders(conf, use_cuda=False):
    kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST(
            '../data',
            train=True,
            download=True,
            transform=transforms.Compose(
                [
                    transforms.ToTensor(),
                    transforms.Normalize((0.1307,), (0.3081,)),
                ]
            ),
        ),
        batch_size=conf.batch_size,
        shuffle=True,
        **kwargs,
    )

    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST(
            '../data',
            train=False,
            transform=transforms.Compose(
                [
                    transforms.ToTensor(),
                    transforms.Normalize((0.1307,), (0.3081,)),
                ]
            ),
        ),
        batch_size=conf.test_batch_size,
        shuffle=False,  # evaluation order does not matter, so no shuffling is needed
        **kwargs,
    )
    return train_loader, test_loader


class Config:
    def __init__(
        self,
        batch_size: int = 64,
        test_batch_size: int = 1000,
        epochs: int = 15,
        lr: float = 0.01,
        gamma: float = 0.7,
        no_cuda: bool = True,
        seed: int = 42,
        log_interval: int = 10,
    ):
        self.batch_size = batch_size
        self.test_batch_size = test_batch_size
        self.epochs = epochs
        self.lr = lr
        self.gamma = gamma
        self.no_cuda = no_cuda
        self.seed = seed
        self.log_interval = log_interval
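
`Config` carries a `gamma` field and the imports include `StepLR`, although the training loop in this tutorial does not use either. If you wanted to decay the learning rate each epoch, a sketch could look like the following. It uses `torch.optim.SGD` as a stand-in; whether `StepLR` behaves the same way with CurveSGD is an assumption that depends on it being a standard `torch.optim.Optimizer` subclass.

```python
import torch
from torch.optim.lr_scheduler import StepLR

# Stand-in parameter and optimizer; SGD is used here only for illustration.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.SGD([param], lr=0.01)           # lr matches Config.lr
scheduler = StepLR(optimizer, step_size=1, gamma=0.7)   # gamma matches Config.gamma

lrs = []
for epoch in range(3):
    # ... one epoch of training would go here ...
    param.grad = torch.zeros_like(param)   # dummy gradient so step() has something to apply
    optimizer.step()
    lrs.append(scheduler.get_last_lr()[0])  # learning rate used during this epoch
    scheduler.step()                        # multiply the learning rate by gamma

print(lrs)
```

With `step_size=1`, each epoch's learning rate is the previous one multiplied by `gamma`.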

Finally, let's set up the optimization loop. Note how the optimizer is constructed:

```
optimizer = curve.CurveSGD(model.parameters(), lr=conf.lr)
```

CurveSGD also accepts other parameters related to the exponential moving averages of the function, gradient, and Hessian-vector values; these are described in the accompanying full documentation. With that, let's define the optimization loop:

In [None]:
def run_optimizer():
    conf = Config()
    log_dir = 'runs/mnist_custom_optim'
    print('Tensorboard: tensorboard --logdir={}'.format(log_dir))

    with SummaryWriter(log_dir) as writer:
        use_cuda = not conf.no_cuda and torch.cuda.is_available()
        torch.manual_seed(conf.seed)
        device = torch.device('cuda' if use_cuda else 'cpu')
        train_loader, test_loader = prepare_loaders(conf, use_cuda)

        model = Net().to(device)

        # create grid of images and write to tensorboard
        images, labels = next(iter(train_loader))
        img_grid = utils.make_grid(images)
        writer.add_image('mnist_images', img_grid)

        # CurveSGD optimizer from the curvetorch package
        optimizer = curve.CurveSGD(model.parameters(), lr=conf.lr)

        for epoch in range(1, conf.epochs + 1):
            train(conf, model, device, train_loader, optimizer, epoch, writer)
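
`prepare_loaders` also returns a `test_loader`, which the loop above never uses. A minimal evaluation helper might look like the sketch below. The name `evaluate`, the toy model, and the synthetic batches are all hypothetical and exist only so the snippet runs on its own without downloading MNIST.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def evaluate(model, device, loader):
    """Return (mean NLL loss, accuracy) of `model` over `loader`."""
    model.eval()                        # disable dropout during evaluation
    total_loss, correct, count = 0.0, 0, 0
    with torch.no_grad():               # no gradients are needed for evaluation
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            total_loss += F.nll_loss(output, target, reduction='sum').item()
            correct += (output.argmax(dim=1) == target).sum().item()
            count += target.numel()
    return total_loss / count, correct / count


# Tiny synthetic check: a stand-in model and a "loader" made of two random batches.
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10), nn.LogSoftmax(dim=1))
toy_loader = [(torch.randn(4, 1, 28, 28), torch.randint(0, 10, (4,))) for _ in range(2)]
loss, acc = evaluate(toy_model, torch.device('cpu'), toy_loader)
print(f'loss={loss:.4f} accuracy={acc:.2%}')
```

Inside `run_optimizer`, one could call `evaluate(model, device, test_loader)` after each epoch. Since `evaluate` switches the model to eval mode, `train` must call `model.train()` again at the start of the next epoch, which it already does.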

And at last, we can run the optimization with:

In [None]:
run_optimizer()

And that's it! Notice that invoking CurveSGD is no different from invoking other optimizers in the PyTorch library, except that you must pass a function closure to `step` so that the optimizer can access Hessian information. For full documentation, see the associated website or Sphinx pages.
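
For comparison, the closure pattern is not unique to CurveSGD: PyTorch's built-in `torch.optim.LBFGS` also requires a closure passed to `step`. The sketch below fits a toy least-squares problem with it; the data and variable names are invented for illustration, and nothing here depends on CurveTorch.

```python
import torch

torch.manual_seed(0)

# Toy least-squares problem: recover w_true from noiseless observations.
X = torch.randn(32, 3)
w_true = torch.tensor([1.0, -2.0, 0.5])
y = X @ w_true

w = torch.zeros(3, requires_grad=True)
optimizer = torch.optim.LBFGS([w], lr=1.0, max_iter=50, line_search_fn='strong_wolfe')


def closure():
    optimizer.zero_grad()
    loss = ((X @ w - y) ** 2).mean()
    loss.backward()           # L-BFGS only needs first-order gradients
    return loss


initial_loss = closure().item()
optimizer.step(closure)       # L-BFGS may call the closure many times within one step
final_loss = closure().item()

print(initial_loss, final_loss)
```

The shape of the closure is the same as in the CurveSGD training loop: zero the gradients, compute the loss, call `backward`, and return the loss. The only difference is that the CurveSGD closure in this tutorial also passes `retain_graph=True, create_graph=True` to `backward`.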