- Creating a tensor on the fly is all well and good, but **if the data inside is valuable**, we will want to save it to a file and load it back at some point.
- **PyTorch uses `pickle` under the hood to serialize the tensor object**, plus dedicated serialization code for the storage.
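A quick way to see this is to save a tensor into an in-memory buffer and list what ends up inside. This is a minimal sketch that assumes PyTorch 1.6 or later, where `torch.save` writes a zip archive by default. The exact entry names inside the archive are an implementation detail and may differ between versions.

```python
import io
import zipfile

import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

# Save into an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
torch.save(points, buffer)

# The default format is a zip archive: a pickled description of the object
# (an entry ending in data.pkl) plus the raw storage bytes in separate entries.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    entry_names = archive.namelist()

for name in entry_names:
    print(name)
```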

In [1]:
import torch

In [2]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

In [4]:
torch.save(points, '../data/p1ch3/ourpoints.t')

- As an alternative, we can pass a file object instead of the filename.

In [5]:
with open('../data/p1ch3/ourpoints.t', 'wb') as f:
    torch.save(points, f)
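Because `torch.save` and `torch.load` work with file-like objects and not only with paths, a tensor can also make a round trip entirely in memory. The `io.BytesIO` buffer here is an illustrative choice; the sketch assumes any binary object with the usual `write`, `read`, and `seek` methods behaves the same way.

```python
import io

import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

buffer = io.BytesIO()
torch.save(points, buffer)   # write into the in-memory "file"

buffer.seek(0)               # rewind before reading, as with a real file
restored = torch.load(buffer)

print(restored)
print(torch.equal(points, restored))
```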

In [6]:
points = torch.load('../data/p1ch3/ourpoints.t')

In [7]:
with open('../data/p1ch3/ourpoints.t', 'rb') as f:
    points = torch.load(f)

- While we can quickly save tensors this way if we only want to load them with PyTorch, **the file format itself is not interoperable**: we can't read the tensor with software other than PyTorch.
- So how can we save tensors in an interoperable way?
### 3.12.1 Serializing to HDF5 with h5py
- **HDF5 is a portable, widely supported format for representing serialized multidimensional arrays, organized in a nested key-value dictionary**.
- Python supports HDF5 through the `h5py` library, which accepts and returns data in the form of NumPy arrays.

In [8]:
import h5py

In [9]:
f = h5py.File('../data/p1ch3/ourpoints.hdf5', 'w')
dset = f.create_dataset('coords', data=points.numpy())
f.close()
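`h5py.File` can also be used as a context manager, which closes the file even if an exception is raised. The sketch below writes a plain NumPy array (standing in for `points.numpy()`) to a temporary file; the temporary path and the group names `experiments/run1` are hypothetical. A key containing slashes makes `h5py` create the intermediate groups automatically, which is one way to get nested keys.

```python
import os
import tempfile

import h5py
import numpy as np

points = np.array([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]], dtype=np.float32)

# Hypothetical location: a temporary directory stands in for ../data/p1ch3.
path = os.path.join(tempfile.mkdtemp(), "ourpoints.hdf5")

# The with-block closes the file when it ends, even on error.
with h5py.File(path, "w") as f:
    f.create_dataset("coords", data=points)
    # Slashes in the key create the groups "experiments" and "experiments/run1".
    f.create_dataset("experiments/run1/coords", data=points * 2)

with h5py.File(path, "r") as f:
    top_level_keys = sorted(f.keys())
    nested = f["experiments/run1/coords"][:]   # read the whole dataset into memory
    print(top_level_keys)
    print(nested)
```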

- Here `'coords'` is a **key** into the HDF5 file.
- We can have other keys, **even nested ones**.
- One of the interesting things in HDF5 is that we can **index the dataset while on disk and access only the elements we're interested in**.
- Let's suppose we want to load just the last two points in our dataset:

In [10]:
f = h5py.File('../data/p1ch3/ourpoints.hdf5', 'r')
dset = f['coords']
last_points = dset[-2:]

- **The data is not loaded when the file is opened or when the dataset is retrieved**.
- Rather, **the data stays on disk until we request the second and last rows in the dataset**.
- At that point, `h5py` reads just those two rows from disk and returns them as a NumPy array.
- Because of this, we can pass the returned object to the `torch.from_numpy` function to obtain a tensor directly.
    - Note that the data was already copied from the file into memory by the slicing; `torch.from_numpy` then shares the NumPy array's memory rather than copying it again:

In [11]:
last_points = torch.from_numpy(dset[-2:])
f.close()

In [12]:
last_points

tensor([[5., 3.],
        [2., 1.]])
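The copy from disk happens when the dataset is sliced, and `torch.from_numpy` does not copy a second time. The sketch below uses an ordinary NumPy array as a stand-in for the result of `dset[-2:]` (an assumption, so that it runs without an HDF5 file) and contrasts `torch.from_numpy` with `torch.tensor`, which always copies.

```python
import numpy as np
import torch

# Stand-in for the array returned by dset[-2:].
rows = np.array([[5.0, 3.0], [2.0, 1.0]], dtype=np.float32)

shared = torch.from_numpy(rows)   # no copy: the tensor views the array's buffer
independent = torch.tensor(rows)  # copies the data into new tensor storage

rows[0, 0] = 99.0                 # mutate the NumPy array in place

print(shared[0, 0].item())        # reflects the change
print(independent[0, 0].item())   # unaffected
```

If a tensor must stay independent of the array it came from, `torch.tensor` (or `.clone()` on the shared tensor) is the safer choice.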

- Closing the HDF5 file invalidates the datasets, and trying to access `dset` afterward will give an exception:

In [13]:
last_points_again = torch.from_numpy(dset[-2:])

ValueError: Invalid dataset identifier (invalid dataset identifier)
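One way to avoid this exception is to do all dataset access inside a `with` block and keep only the resulting tensor, which owns its data in memory. This sketch first writes a small file to a hypothetical temporary path so that it is self-contained.

```python
import os
import tempfile

import h5py
import numpy as np
import torch

# Hypothetical file location for the sketch.
path = os.path.join(tempfile.mkdtemp(), "ourpoints.hdf5")
data = np.array([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]], dtype=np.float32)
with h5py.File(path, "w") as f:
    f.create_dataset("coords", data=data)

with h5py.File(path, "r") as f:
    dset = f["coords"]
    # Slicing reads the last two rows from disk while the file is still open.
    last_points = torch.from_numpy(dset[-2:])

# The file is closed here, so dset can no longer be read,
# but last_points still holds its in-memory copy of the rows.
print(last_points)
```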