# File I/O

At some point, we want to save our results for later use in various contexts. Additionally, when running a long training process, the best practice is to periodically save intermediate results (checkpointing), so that we do not lose several days' worth of computation if we trip over the power cord of our server.

## 1. Loading and Saving Tensors

In [2]:
import torch
from torch import nn
from torch.nn import functional as F

# store the tensor (the 'others/' directory must already exist)
x = torch.arange(4)
torch.save(x, 'others/x-file')

In [3]:
# read the tensor
x2 = torch.load('others/x-file')
x2

tensor([0, 1, 2, 3])

We can also store a list of tensors and read them back into memory.

In [4]:
y = torch.zeros(4)
torch.save([x, y], 'others/x-files')
x2, y2 = torch.load('others/x-files')
(x2, y2)

(tensor([0, 1, 2, 3]), tensor([0., 0., 0., 0.]))

We can even write and read a dictionary that maps from strings to tensors. This is convenient when we want to read or write all the weights in a model.

In [5]:
mydict = {'x': x, 'y': y}
torch.save(mydict, 'others/mydict')
mydict2 = torch.load('others/mydict')
mydict2

{'x': tensor([0, 1, 2, 3]), 'y': tensor([0., 0., 0., 0.])}
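To check that a dictionary of tensors really survives the round trip, the following is a minimal sketch. It assumes a temporary directory is acceptable as the storage location; the file name `mydict` inside it and the variable names `restored`, `same_keys`, `same_values`, and `same_dtypes` are illustrative, not part of the original notebook. The `map_location` argument of `torch.load` is used here to place the loaded tensors on the CPU.

```python
import os
import tempfile

import torch

# Tensors mirroring x and y from the cells above.
x = torch.arange(4)
y = torch.zeros(4)

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file location, used only for this sketch.
    path = os.path.join(tmpdir, 'mydict')
    torch.save({'x': x, 'y': y}, path)
    # map_location='cpu' maps all loaded tensors onto the CPU.
    restored = torch.load(path, map_location='cpu')

# Compare keys, values, and dtypes of the restored dictionary.
same_keys = set(restored) == {'x', 'y'}
same_values = torch.equal(restored['x'], x) and torch.equal(restored['y'], y)
same_dtypes = restored['x'].dtype == x.dtype and restored['y'].dtype == y.dtype
print(same_keys, same_values, same_dtypes)
```

Using `torch.equal` rather than `==` gives a single boolean per tensor and also checks that the shapes agree.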

## 2. Loading and Saving Model Parameters

More frequently, we want to save entire networks. However, we do not save the network itself, only its parameters. For example, if we have a 3-layer MLP, we need to specify the architecture separately. The reason is that models can contain arbitrary code and hence cannot be **serialized** naturally. Thus, in order to reinstate a model, **we need to generate the architecture in code and then load the parameters from disk**.

In [7]:
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.LazyLinear(256)
        self.output = nn.LazyLinear(10)

    def forward(self, X):
        return self.output(F.relu(self.hidden(X)))

net = MLP()
X = torch.randn((2, 20))
Y = net(X)



In [14]:
torch.save(net.state_dict(), 'others/mlp.params')
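What `state_dict()` returns is a mapping from parameter names to tensors, which is why it can be stored with `torch.save` like the dictionary earlier. The sketch below inspects such a mapping. It assumes a stand-in class `SmallMLP` that uses `nn.Linear` with an explicit input size of 20 (matching `X` above) instead of `nn.LazyLinear`, so that the shapes are known without a forward pass; the class name and the `shapes` variable are illustrative.

```python
import torch
from torch import nn
from torch.nn import functional as F


class SmallMLP(nn.Module):
    """Stand-in for the MLP above, with explicit input sizes (an assumption)."""

    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(20, 256)
        self.output = nn.Linear(256, 10)

    def forward(self, X):
        return self.output(F.relu(self.hidden(X)))


net = SmallMLP()
# Map each parameter name to the shape of its tensor.
shapes = {name: tuple(param.shape) for name, param in net.state_dict().items()}
for name, shape in shapes.items():
    print(name, shape)
```

The keys are built from the attribute names (`hidden`, `output`), so a model that loads these parameters needs layers with the same names and shapes.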

To recover the model, we instantiate a clone of the original MLP model.

In [15]:
clone = MLP()
clone.load_state_dict(torch.load('others/mlp.params'))
# set the module to evaluation mode
clone.eval()



MLP(
  (hidden): LazyLinear(in_features=0, out_features=256, bias=True)
  (output): LazyLinear(in_features=0, out_features=10, bias=True)
)

In [16]:
Y_clone = clone(X)
(Y_clone == Y).all()

tensor(True)
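The checkpointing practice mentioned at the start of this section can be sketched with the same tools. The example below is one possible layout, not a prescribed one: it bundles the model parameters, the optimizer state, and the epoch counter into a single dictionary. The helper names `save_checkpoint` and `load_checkpoint`, the file name `checkpoint.pt`, the tiny linear model, and the random data are all assumptions made for illustration.

```python
import os
import tempfile

import torch
from torch import nn


def save_checkpoint(path, model, optimizer, epoch):
    # Hypothetical helper: store everything needed to resume in one dictionary.
    torch.save({'epoch': epoch,
                'model': model.state_dict(),
                'optimizer': optimizer.state_dict()}, path)


def load_checkpoint(path, model, optimizer):
    # Hypothetical helper: restore the states in place, return the saved epoch.
    checkpoint = torch.load(path, map_location='cpu')
    model.load_state_dict(checkpoint['model'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    return checkpoint['epoch']


torch.manual_seed(0)
model = nn.Linear(20, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
X, target = torch.randn(8, 20), torch.randn(8, 10)  # toy data

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'checkpoint.pt')
    for epoch in range(3):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(X), target)
        loss.backward()
        optimizer.step()
        # Overwrite the checkpoint after every epoch.
        save_checkpoint(path, model, optimizer, epoch)

    # Simulate a restart: build fresh objects, then load the saved state.
    resumed = nn.Linear(20, 10)
    resumed_optimizer = torch.optim.SGD(resumed.parameters(), lr=0.1)
    last_epoch = load_checkpoint(path, resumed, resumed_optimizer)

params_match = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), resumed.state_dict().values())
)
print(last_epoch, params_match)
```

Saving the optimizer state alongside the parameters matters for optimizers that keep running statistics (such as momentum buffers), since training resumed without them would not continue from the same point.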