# 2.1 Data Manipulation

## 2.1.1 Getting Started
A **tensor** represents a (possibly multidimensional) array of numerical values.

In the one-dimensional case, i.e., when only one axis is needed for the data, a tensor is called a **vector**.

With two axes, a tensor is called a **matrix**.

In [31]:
import torch

In [32]:
x = torch.arange(12, dtype=torch.float32)
print("Tensor:", x)
print("Tensor size:", x.numel())
print("Tensor shape:", x.shape)

Tensor: tensor([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11.])
Tensor size: 12
Tensor shape: torch.Size([12])


A tensor can be reshaped without changing its size or the values of its elements.

In [33]:
X = x.reshape(3, 4)
X

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.]])
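When reshaping, one dimension can be left for PyTorch to infer by passing `-1`. The following is a minimal sketch of that idea; the names `X_rows` and `X_cols` are illustrative and do not appear elsewhere in this section.

```python
import torch

x = torch.arange(12, dtype=torch.float32)

# -1 asks PyTorch to infer that dimension from the total element count.
X_rows = x.reshape(3, -1)  # 12 elements / 3 rows    -> shape (3, 4)
X_cols = x.reshape(-1, 4)  # 12 elements / 4 columns -> shape (3, 4)

print(X_rows.shape, X_cols.shape)
```

Both calls should give the same result as `x.reshape(3, 4)`, since the missing dimension is fully determined by the other one.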

In [34]:
torch.zeros((2, 3, 4))

tensor([[[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]],

        [[0., 0., 0., 0.],
         [0., 0., 0., 0.],
         [0., 0., 0., 0.]]])

In [35]:
torch.ones((2, 3, 4))

tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]])

In [36]:
torch.tensor([[2, 1, 4, 3], [1, 2, 3, 4], [4, 3, 2, 1]])

tensor([[2, 1, 4, 3],
        [1, 2, 3, 4],
        [4, 3, 2, 1]])

The following snippet creates a tensor with elements drawn from a standard Gaussian (normal) distribution with mean 0 and standard deviation 1.

In [37]:
torch.randn(3, 4)

tensor([[ 0.7709, -0.0249,  1.0209,  2.5209],
        [ 1.7326, -2.0321,  0.5766, -0.7545],
        [-0.2726,  0.3639,  1.2308,  0.4624]])
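Because the values are random, rerunning the cell above will generally print different numbers. If repeatable results are wanted, one option is to seed the generator first. This is a small sketch; the seed value `0` and the names `first` and `second` are arbitrary choices made for illustration.

```python
import torch

torch.manual_seed(0)       # fix the state of the random number generator
first = torch.randn(3, 4)

torch.manual_seed(0)       # reset to the same state
second = torch.randn(3, 4)

# With identical seeds, the two draws should match element for element.
print(torch.equal(first, second))
```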

## 2.1.2 Indexing
If we want to assign multiple elements the same value, we apply the indexing on the left-hand side of the assignment operation. For instance, `[:2, :]` accesses the first and second rows, where `:` takes all the elements along axis 1 (the columns).

In [38]:
X[:2, :] = 12
X

tensor([[12., 12., 12., 12.],
        [12., 12., 12., 12.],
        [ 8.,  9., 10., 11.]])
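The same indexing syntax can be used to read elements or to write a single one. The sketch below rebuilds a fresh `3 x 4` tensor so that it does not depend on the assignment above; `last_row` and `middle` are illustrative names.

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)

last_row = X[-1]   # negative indices count from the end: the last row
middle = X[1:3]    # rows 1 and 2 (the end index is exclusive)
print(last_row)
print(middle)

X[1, 2] = 17       # write one element: row 1, column 2
print(X[1, 2].item())
```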

## 2.1.3 Operations
Elementwise operations apply a standard scalar operation to each element of a tensor. Most standard operators, including **unary** ones like e^x, can be applied elementwise.

In [39]:
torch.exp(x)

tensor([162754.7969, 162754.7969, 162754.7969, 162754.7969, 162754.7969,
        162754.7969, 162754.7969, 162754.7969,   2980.9580,   8103.0840,
         22026.4648,  59874.1406])
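The first eight entries above are all e^12 rather than e^0 through e^7. The likely reason is that `X = x.reshape(3, 4)` returned a view that shares storage with `x`, so the earlier assignment `X[:2, :] = 12` also changed `x`. A minimal sketch of this behaviour follows; `x_fresh` and `X_copy` are names introduced here for illustration.

```python
import torch

x = torch.arange(12, dtype=torch.float32)
X = x.reshape(3, 4)   # for a contiguous tensor, reshape returns a view
X[:2, :] = 12         # writes through to the storage shared with x
print(x)              # the first eight entries of x are now 12

x_fresh = torch.arange(12, dtype=torch.float32)
X_copy = x_fresh.reshape(3, 4).clone()  # clone() makes an independent copy
X_copy[:2, :] = 12
print(x_fresh)        # unchanged: still 0 through 11
```

Calling `clone()` is one way to keep the original tensor intact when the reshaped tensor is going to be modified.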

Likewise, binary scalar operators, which map pairs of real numbers to a (single) real number, can be applied elementwise to two tensors of the same shape.

In [40]:
x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])
x + y, x - y, x * y, x / y, x**y

(tensor([ 3.,  4.,  6., 10.]),
 tensor([-1.,  0.,  2.,  6.]),
 tensor([ 2.,  4.,  8., 16.]),
 tensor([0.5000, 1.0000, 2.0000, 4.0000]),
 tensor([ 1.,  4., 16., 64.]))

Sometimes, we want to construct a binary tensor via logical statements. Take `x == y` as an example. For each position `i`, if `x[i]` and `y[i]` are equal, then the corresponding entry in the result is `True`; otherwise it is `False`.

In [41]:
x == y

tensor([False,  True, False, False])
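If numeric 1/0 values are preferred over booleans, the mask can be converted, and summing it counts the matching positions. This is a small sketch; `mask`, `as_numbers`, and `n_equal` are illustrative names.

```python
import torch

x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2, 2, 2, 2])

mask = x == y                # boolean tensor, one entry per position
as_numbers = mask.float()    # True -> 1.0, False -> 0.0
n_equal = mask.sum().item()  # number of positions where x and y agree

print(as_numbers, n_equal)
```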

Summing all the elements in the tensor yields a tensor with only one element.

In [42]:
X.sum()

tensor(134.)
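A sum can also be taken along a single axis instead of over all elements. The sketch below rebuilds `X` in the state it has at this point in the section (first two rows set to 12); `total`, `col_sums`, and `row_sums` are illustrative names.

```python
import torch

X = torch.arange(12, dtype=torch.float32).reshape(3, 4)
X[:2, :] = 12

total = X.sum()          # sum over every element: a 0-dimensional tensor
col_sums = X.sum(dim=0)  # collapse axis 0 (rows): one sum per column
row_sums = X.sum(dim=1)  # collapse axis 1 (columns): one sum per row

print(total.item(), col_sums.shape, row_sums.shape)
```

`item()` converts a one-element tensor into a plain Python number.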

## 2.1.4 Broadcasting
By now, you know how to perform elementwise binary operations on two tensors of the same shape. Under certain conditions, even when shapes differ, we can still perform elementwise binary operations by invoking the broadcasting mechanism.

Broadcasting works according to the following two-step procedure:
1. Expand one or both tensors by copying elements along axes with length 1 so that both tensors have the same shape.
2. Perform an elementwise operation on the resulting tensors.

In [43]:
a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))
a, b

(tensor([[0],
         [1],
         [2]]),
 tensor([[0, 1]]))

- `a` is a `3 x 1` matrix
- `b` is a `1 x 2` matrix

Broadcasting produces a `3 x 2` matrix by replicating `a` along columns and `b` along rows before adding them elementwise.

In [44]:
a + b

tensor([[0, 1],
        [1, 2],
        [2, 3]])
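The two-step procedure can be written out by hand to check that it matches the broadcast result. This is a sketch using `Tensor.expand`; `a_big`, `b_big`, and `manual` are illustrative names.

```python
import torch

a = torch.arange(3).reshape((3, 1))
b = torch.arange(2).reshape((1, 2))

# Step 1: expand each tensor along its length-1 axis to the common shape (3, 2).
a_big = a.expand(3, 2)  # repeats a's single column across 2 columns
b_big = b.expand(3, 2)  # repeats b's single row across 3 rows

# Step 2: ordinary elementwise addition on tensors of equal shape.
manual = a_big + b_big

print(torch.equal(manual, a + b))
```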

## 2.1.5 Saving Memory
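If we write `y = y + x`, we dereference the tensor that `y` used to point to and instead point `y` at newly allocated memory. Python's `id()` function, which returns the identity of the referenced object, makes this visible.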

In [45]:
before = id(y)
y = y + x
id(y) == before

False

This might be undesirable for two reasons:

1. First, we do not want to run around allocating memory unnecessarily all the time. In machine learning, we often have hundreds of megabytes of parameters and update all of them multiple times per second. **Whenever possible, we want to perform these updates in place.**

2. Second, we might point at the same parameters from multiple variables. If we do not update in place, we must be careful to update all of these references, lest we spring a memory leak or **inadvertently refer to stale parameters.**

Fortunately, performing in-place operations is easy. We can assign the result of an operation to a previously allocated tensor `z` by using slice notation: `z[:] = <expression>`.

In [47]:
z = torch.zeros_like(y)
print("id(Z):", id(z))
z[:] = x + y
print("id(Z):", id(z))

id(Z): 6079004240
id(Z): 6079004240
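When the old value is not needed afterwards, an augmented assignment such as `y += x` is another way to update in place. The sketch below uses fresh floating-point tensors so that it runs on its own; `same_object` is an illustrative name.

```python
import torch

x = torch.tensor([1.0, 2, 4, 8])
y = torch.tensor([2.0, 2, 2, 2])

before = id(y)
y += x                         # in-place add; y.add_(x) is an equivalent spelling
same_object = id(y) == before  # y still refers to the same tensor object

print(same_object, y)
```

Note that both tensors are floating point here: an in-place operation cannot change the dtype of the tensor it writes into.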


Converting a tensor to a NumPy array (`ndarray`), or vice versa, is easy. The torch tensor and NumPy array will share their underlying memory, and changing one through an in-place operation will also change the other.

In [48]:
A = X.numpy()
B = torch.from_numpy(A)
type(A), type(B)

(numpy.ndarray, torch.Tensor)
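The shared memory can be observed directly. This is a minimal sketch that assumes a CPU tensor; `t` and `n` are illustrative names.

```python
import torch

t = torch.zeros(3)
n = t.numpy()  # NumPy array backed by the same memory as t

t += 1         # in-place change made through the tensor...
print(n)       # ...is visible through the array

n[0] = 5       # and a change made through the array...
print(t)       # ...is visible through the tensor
```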