# The Basics of NumPy Arrays

## NumPy Array Attributes

In [1]:
import numpy as np
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

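Each array has the attributes ``ndim`` (the number of dimensions), ``shape`` (the size of each dimension), and ``size`` (the total number of elements):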

In [2]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60


The ``dtype`` attribute gives the data type of the array's elements:

In [3]:
print("dtype:", x3.dtype)

dtype: int64


``itemsize`` gives the size in bytes of each array element, and ``nbytes`` gives the total size in bytes of the array, that is, how much memory its data occupies:

In [4]:
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

itemsize: 8 bytes
nbytes: 480 bytes


In general, ``nbytes`` equals ``itemsize`` times ``size``; here, 8 × 60 = 480 bytes.
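
As a quick check of this relationship, the sketch below builds arrays with a few different dtypes and compares ``nbytes`` against ``itemsize * size``. The shape and the dtypes are arbitrary choices made only for illustration.

```python
import numpy as np

# Arbitrary shape and dtypes, chosen only for illustration
for dtype in (np.int8, np.int32, np.float64):
    a = np.zeros((3, 4, 5), dtype=dtype)
    # nbytes should equal itemsize * size for every dtype
    print(a.dtype, a.itemsize, a.size, a.nbytes,
          a.nbytes == a.itemsize * a.size)
```

Choosing a smaller dtype is one straightforward way to reduce the memory footprint of a large array.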

## Array Indexing: Accessing Single Elements

As with Python lists, indices count from zero:

In [5]:
x1

array([5, 0, 3, 3, 7, 9])

In [6]:
x1[0]  # first element

5

In [7]:
x1[4]

7

To index from the end of the array, use negative indices:

In [8]:
x1[-1]

9

In [9]:
x1[-2]

7

In a multi-dimensional array, items are accessed with a comma-separated tuple of indices:

In [10]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [11]:
x2[0, 0]

3

In [12]:
x2[2, 0]

1

In [13]:
x2[2, -1]

7

Values can also be modified using any of the index notations above:

In [14]:
x2[0, 0] = 12
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])

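Keep in mind that, unlike Python lists, NumPy arrays have a fixed type. If you assign a floating-point value to an element of an integer array, for example, the value is silently truncated: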

In [15]:
x1[0] = 3.14159  # this will be truncated!
x1

array([3, 0, 3, 3, 7, 9])
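
If the fractional part matters, one option is to convert the array to a floating-point dtype before assigning. The following is a minimal sketch; the array contents and the variable names ``ints`` and ``floats`` are made up for illustration.

```python
import numpy as np

ints = np.array([5, 0, 3])      # integer dtype is inferred from the values
ints[0] = 3.14159               # the fractional part is dropped silently
print(ints)                     # [3 0 3]

floats = ints.astype(float)     # astype returns a new array with the requested dtype
floats[0] = 3.14159             # now the full value is stored
print(floats)
```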

## Array Slicing: Accessing Subarrays

``` python
x[start:stop:step]
```
If any of these are unspecified, they default to ``start=0``, ``stop=``*``size of dimension``*, ``step=1``.


### One-Dimensional Subarrays

In [16]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [17]:
x[:5]  # first five elements

array([0, 1, 2, 3, 4])

In [18]:
x[5:]  # elements from index 5 onward

array([5, 6, 7, 8, 9])

In [19]:
x[4:7]  # middle sub-array; the stop index is excluded

array([4, 5, 6])

In [20]:
x[::2]  # every other element

array([0, 2, 4, 6, 8])

In [21]:
x[1::2]  # every other element, starting at index 1

array([1, 3, 5, 7, 9])

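The step can also be negative. In that case the defaults for ``start`` and ``stop`` are swapped, which gives a convenient way to reverse an array: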

In [22]:
x[::-1]  # all elements, reversed

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [23]:
x[5::-2]  # every other element, walking backward from index 5 (indices 5, 3, 1)

array([5, 3, 1])
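
Negative steps can be puzzling at first, so it may help to write the implied start index out explicitly. The sketch below compares the short forms with a spelled-out equivalent; the variable names are chosen only for this illustration.

```python
import numpy as np

x = np.arange(10)

# With a negative step the defaults flip: the slice starts at the last
# element and runs backward past the first one.
reversed_all = x[::-1]
explicit = x[len(x) - 1::-1]    # the same slice with the start written out

# Starting at index 5 and stepping by -2 visits indices 5, 3, 1
from_five = x[5::-2]

print(reversed_all)
print(explicit)
print(from_five)
```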

### Multi-Dimensional Subarrays

Multi-dimensional slices work the same way, with one slice per dimension separated by commas:

In [24]:
x2

array([[12,  5,  2,  4],
       [ 7,  6,  8,  8],
       [ 1,  6,  7,  7]])

In [25]:
x2[:2, :3]  # two rows, three columns

array([[12,  5,  2],
       [ 7,  6,  8]])

In [26]:
x2[:3, ::2]  # all rows, every other column

array([[12,  2],
       [ 7,  8],
       [ 1,  7]])

With negative steps, both the rows and the columns can be reversed together:

In [27]:
x2[::-1, ::-1]

array([[ 7,  7,  6,  1],
       [ 8,  8,  6,  7],
       [ 4,  2,  5, 12]])

#### Accessing Array Rows and Columns

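A single colon (``:``) is an empty slice that selects everything along its axis. Combining it with an index picks out a single row or column: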

In [28]:
print(x2[:, 0])  # first column of x2: all rows, column 0

[12  7  1]


In [29]:
print(x2[0, :])  # first row of x2

[12  5  2  4]


For row access, the trailing empty slice can be omitted for a more compact syntax:

In [30]:
print(x2[0])  # equivalent to x2[0, :]

[12  5  2  4]
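
To make the "all rows, column 0" reading concrete, the following sketch checks the shapes that come out of each form. The 3×4 array ``a`` is hypothetical and used only for this illustration.

```python
import numpy as np

# Hypothetical 3x4 array used only for this illustration
a = np.arange(12).reshape((3, 4))

col = a[:, 0]    # all rows, column 0 -> one value per row
row = a[0, :]    # row 0, all columns -> one value per column

print(col.shape, row.shape)            # (3,) (4,)
print(np.array_equal(a[0], a[0, :]))   # True: the trailing empty slice is optional
```

Note that both results are one-dimensional: indexing with an integer removes that axis, while the empty slice keeps its axis.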


### Subarrays as No-Copy Views

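Array slices return *views* of the array data rather than *copies*. This differs from Python lists, where a slice is a copy. Consider the two-dimensional array from before: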

In [31]:
print(x2)

[[12  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


Extract a $2 \times 2$ subarray from it:

In [32]:
x2_sub = x2[:2, :2]
print(x2_sub)

[[12  5]
 [ 7  6]]


If we now modify an element of this subarray, the original array changes as well:

In [33]:
x2_sub[0, 0] = 99
print(x2_sub)

[[99  5]
 [ 7  6]]


In [34]:
print(x2)

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


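This default behavior is useful when working with large datasets: pieces of the data can be accessed and processed without copying the underlying buffer, which would waste memory.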
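
One way to confirm that a slice is a view is ``np.shares_memory``, which reports whether two arrays overlap in memory. The sketch below contrasts this with list slicing; the arrays and names are illustrative only.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))
sub = a[:2, :2]                     # slicing returns a view

print(np.shares_memory(a, sub))     # True: both use the same buffer
sub[0, 0] = 99
print(a[0, 0])                      # 99: the change is visible through a

# For comparison, slicing a Python list produces a new list
lst = list(range(12))
lst_sub = lst[:2]
lst_sub[0] = 99
print(lst[0])                       # 0: the original list is untouched
```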

### Creating Copies of Arrays

Sometimes a copy is what you need. In that case, call the ``copy()`` method explicitly:

In [35]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

[[99  5]
 [ 7  6]]


If we now modify this subarray, the original array is not touched:

In [36]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)

[[42  5]
 [ 7  6]]


In [37]:
print(x2)

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]
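
The difference between the two can be checked side by side. This is a small sketch with made-up names (``view`` and ``copied``) that compares a plain slice with an explicit copy of the same region.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))
view = a[:2, :2]                       # shares memory with a
copied = a[:2, :2].copy()              # owns its own buffer

copied[0, 0] = 42
print(a[0, 0])                         # 0: a is unaffected by changes to the copy
print(np.shares_memory(a, view))       # True
print(np.shares_memory(a, copied))     # False
```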


## Reshaping of Arrays

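The most flexible way to change an array's shape is the ``reshape`` method. For example, to put the numbers 1 through 9 in a $3 \times 3$ grid: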

In [38]:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


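For this to work, the size of the initial array must match the size of the reshaped array. Where possible, ``reshape`` returns a no-copy view of the initial array, but this is not guaranteed (for example, when the memory buffer is non-contiguous).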
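
The sketch below shows one case where ``reshape`` can return a view and one where it has to copy. Using a transposed array as the non-contiguous input is an example chosen for illustration, not the only situation in which a copy occurs.

```python
import numpy as np

a = np.arange(6)
b = a.reshape((2, 3))              # contiguous data: a view is possible
print(np.shares_memory(a, b))      # True

c = b.T                            # the transpose is a non-contiguous view
d = c.reshape(6)                   # this ordering cannot be expressed as a view
print(np.shares_memory(a, d))      # False: NumPy made a copy
print(d)                           # [0 3 1 4 2 5]
```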

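Another common reshaping pattern is converting a one-dimensional array into a two-dimensional row or column matrix. There are two ways to do this: with the ``reshape`` method, or with the ``newaxis`` keyword inside a slice operation: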

In [39]:
x = np.array([1, 2, 3])

# row vector via reshape
x.reshape((1, 3))

array([[1, 2, 3]])

In [40]:
# row vector via newaxis
x[np.newaxis, :]

array([[1, 2, 3]])

In [41]:
# column vector via reshape
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [42]:
# column vector via newaxis
x[:, np.newaxis]

array([[1],
       [2],
       [3]])
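
The two approaches give identical results, which the following sketch checks directly. It also uses ``-1`` in ``reshape``, which asks NumPy to infer that dimension from the array's size.

```python
import numpy as np

x = np.array([1, 2, 3])

row = x[np.newaxis, :]     # shape (1, 3)
col = x[:, np.newaxis]     # shape (3, 1)

print(row.shape, col.shape)
print(np.array_equal(row, x.reshape((1, 3))))    # True
print(np.array_equal(col, x.reshape((-1, 1))))   # True; -1 is inferred as 3
```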

## Array Concatenation and Splitting

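All of the preceding routines work on a single array. It is also possible to combine multiple arrays into one, and conversely to split a single array into several.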

### Concatenation of Arrays

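Joining arrays is primarily done with three routines: ``np.concatenate``, ``np.vstack``, and ``np.hstack``. ``np.concatenate`` takes a tuple or list of arrays as its first argument: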


In [43]:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

array([1, 2, 3, 3, 2, 1])

You can also concatenate more than two arrays at once:

In [44]:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))

[ 1  2  3  3  2  1 99 99 99]


It works on two-dimensional arrays as well:

In [45]:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

In [46]:
# concatenate along the first axis (the default)
np.concatenate([grid, grid])

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [47]:
# concatenate along the second axis (axes are zero-indexed, so axis=1)
np.concatenate([grid, grid], axis=1)

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])
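
A useful way to read the ``axis`` argument is that it names the dimension that grows while the others stay fixed. The sketch below prints the resulting shapes, and also shows that ``np.concatenate`` refuses inputs with different numbers of dimensions.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

rows = np.concatenate([grid, grid], axis=0)   # first dimension grows
cols = np.concatenate([grid, grid], axis=1)   # second dimension grows
print(rows.shape)   # (4, 3)
print(cols.shape)   # (2, 6)

# All inputs must have the same number of dimensions
try:
    np.concatenate([np.array([1, 2, 3]), grid])
except ValueError as err:
    print("concatenate refused mixed dimensions:", err)
```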

When working with arrays of mixed dimensions, it can be clearer to use ``np.vstack`` (vertical stack) and ``np.hstack`` (horizontal stack):

In [48]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [49]:
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])

Similarly, ``np.dstack`` stacks arrays along the third axis.
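
As a small illustration of ``np.dstack``, the sketch below stacks two $2 \times 2$ arrays (made up for this example) and inspects the result.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

stacked = np.dstack([a, b])
print(stacked.shape)    # (2, 2, 2)
print(stacked[0, 0])    # [1 5]: values at the same position are paired along axis 2
```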

### Splitting of Arrays

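The opposite of concatenation is splitting, which likewise has three routines: ``np.split``, ``np.hsplit``, and ``np.vsplit``. Each accepts a list of indices giving the split points: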

In [50]:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [99 99] [3 2 1]


Note that *N* split points produce *N + 1* subarrays. ``np.vsplit`` and ``np.hsplit`` work the same way:
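
The counting rule can be checked with a different number of split points. In this sketch the array and the split indices are arbitrary; it also shows that passing an integer instead of a list requests that many equal-sized pieces.

```python
import numpy as np

x = np.arange(10)

parts = np.split(x, [2, 5, 7])    # 3 split points -> 4 sub-arrays
print(len(parts))                 # 4
for p in parts:
    print(p)

# An integer asks for that many equal-sized pieces
# (the length must be evenly divisible)
halves = np.split(x, 2)
print(halves[0], halves[1])
```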

In [51]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [52]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [53]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


Similarly, ``np.dsplit`` splits arrays along the third axis.
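
For completeness, here is a minimal sketch of ``np.dsplit`` on a three-dimensional array. The array ``cube`` and the names ``front`` and ``back`` are hypothetical, chosen only for illustration.

```python
import numpy as np

cube = np.arange(16).reshape((2, 2, 4))

# Split along the third axis at index 2
front, back = np.dsplit(cube, [2])
print(front.shape, back.shape)    # (2, 2, 2) (2, 2, 2)

# Stacking the pieces back along the third axis recovers the original
print(np.array_equal(np.dstack([front, back]), cube))   # True
```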