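NumPy is a core library for scientific computing in Python. It provides a high-performance multidimensional array (matrix) object together with a large set of operations on it. Many computations in machine learning run very efficiently once the data has been vectorized.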

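## Creating NumPy arrays
### Initialization
#### From a list
* A NumPy array is a grid of elements that all share the same type. It can be initialized from a list or from nested lists: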

In [67]:
import numpy as np

a = np.array([1, 2, 3])  # 1-D NumPy array
print("numpy.array type: {}".format(type(a)))            # Prints "<class 'numpy.ndarray'>"
print("shape of array: {}".format(a.shape))          # Prints "(3,)"
print("values in array: \n{}\na[0]={}, a[1]={}, a[2]={}".format(a, a[0], a[1], a[2]))   # Prints "[1 2 3]", then "a[0]=1, a[1]=2, a[2]=3"
a[0] = 5                 # reassign one element
print("*** changed the value with a[0]=5, values in array: {}".format(a))                  # Prints "[5 2 3]"

b = np.array([[1,2,3],[4,5,6]])   # 2-D NumPy array
print("\nshape of new array: {}".format(b.shape))                    # Prints "(2, 3)"
print("values in array:\n{}\nb[0,0]={}, b[0,1]={}, b[1,0]={}".format(b, b[0,0], b[0,1], b[1,0]))   # Prints the array, then "b[0,0]=1, b[0,1]=2, b[1,0]=4"

numpy.array type: <class 'numpy.ndarray'>
shape of array: (3,)
values in array: 
[1 2 3]
a[0]=1, a[1]=2, a[2]=3
*** changed the value with a[0]=5, values in array: [5 2 3]

shape of new array: (2, 3)
values in array:
[[1 2 3]
 [4 5 6]]
b[0,0]=1, b[0,1]=2, b[1,0]=4
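
As a small complement to the cell above, the sketch below inspects a few basic attributes of an array built from nested lists. The variable names `nested` and `arr` are chosen here for illustration only.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]   # two rows of equal length
arr = np.array(nested)

print(arr.ndim)    # number of dimensions: 2
print(arr.shape)   # (2, 3)
print(arr.size)    # total number of elements: 6
print(arr.tolist() == nested)   # True: the array converts back to the same nested lists
```

The rows must all have the same length for the result to be a regular 2-D array.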


#### Using special functions
* NumPy provides dedicated functions for creating certain special arrays (matrices):

In [23]:
a = np.zeros((2,2))  # 2x2 array of zeros
print("array with zeros:\n{}".format(a))       # Prints "[[0. 0.]
                                              #          [0. 0.]]"
b = np.ones((1,2))   # 1x2 array of ones
print("\narray with ones:\n{}".format(b))        # Prints "[[1. 1.]]"

c = np.full((2,2), 7) # constant array; the dtype follows the fill value, so this one is integer
print("\narray with a constant:\n{}".format(c))  # Prints "[[7 7]
                                              #          [7 7]]"
d = np.eye(2)        # 2x2 identity matrix
print("\narray with eye:\n{}".format(d))         # Prints "[[1. 0.]
                                              #          [0. 1.]]"
e = np.random.random((2,2)) # 2x2 array of random values in [0, 1)
print("\narray with a random:\n{}".format(e))  # output differs on every run

array with zeros:
[[0. 0.]
 [0. 0.]]

array with ones:
[[1. 1.]]

array with a constant:
[[7 7]
 [7 7]]

array with eye:
[[1. 0.]
 [0. 1.]]

array with a random:
[[0.22607625 0.79457681]
 [0.24368967 0.69382384]]
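
A few other constructors are often used alongside the ones above. The sketch below is not part of the original notebook. It assumes a NumPy version that provides `np.random.default_rng` (1.17 or later), and all variable names are illustrative.

```python
import numpy as np

steps = np.arange(0, 10, 2)          # [0 2 4 6 8]; the stop value is excluded
grid = np.linspace(0.0, 1.0, 5)      # [0. 0.25 0.5 0.75 1.]; the stop value is included

sevens_int = np.full((2, 2), 7)      # integer dtype, inferred from the fill value
sevens_float = np.full((2, 2), 7.0)  # float64, because the fill value is a float

rng = np.random.default_rng(seed=0)  # seeded generator, so reruns give the same numbers
sample = rng.random((2, 2))          # uniform samples in [0, 1)

print(steps, grid, sevens_int.dtype, sevens_float.dtype, sample.shape)
```

Seeding the generator is one way to make a notebook's random output reproducible between runs.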


### Indexing and selecting values
#### Slicing
* As with Python lists, slicing extracts the part of the array you need.

In [84]:
# Create the following 3x4 NumPy array
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a=np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print("shape of array a: {}".format(a.shape))
print("values in array:\n{}".format(a))

# Use slicing to take the first two rows and the columns at index 1 and 2 (the 2nd and 3rd columns):
# [[2 3]
#  [6 7]]
b=a[:2, 1:3]
print("\nvalues in array b=a[:2, 1:3]:\n{}".format(b))

# Note that b is a view: it shares its data with the corresponding part of a.
print("\nvalue of a[0,1]={}".format(a[0,1]))   # Prints "2"
b[0,0]=77    # b[0, 0] and a[0, 1] are the same piece of data
print("*** changed the value with b[0,0]=77 ***")
print("value of a[0,1]={}".format(a[0,1]))   # a is modified as well, Prints "77"

shape of array a: (3, 4)
values in array:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

values in array b=a[:2, 1:3]:
[[2 3]
 [6 7]]

value of a[0,1]=2
*** changed the value with b[0,0]=77 ***
value of a[0,1]=77
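
When the aliasing shown above is not wanted, one option is to copy the slice explicitly. The following sketch uses `np.shares_memory` to check whether two arrays overlap in memory; the names `view` and `independent` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]                  # basic slicing returns a view onto a's data
independent = a[:2, 1:3].copy()    # .copy() allocates separate storage

print(np.shares_memory(a, view))          # True
print(np.shares_memory(a, independent))   # False

independent[0, 0] = 77             # modifies only the copy
print(a[0, 1])                     # still 2
```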


* Extracting row vectors / column vectors

In [86]:
print("shape of array a: {}".format(a.shape))
print("values in array:\n{}".format(a))

row_r1 = a[1, :]    # second row of a; the integer index drops that dimension, giving shape (4,)
print("\nshape with a[1,:]: {}".format(row_r1.shape))
print("values in array:\n{}".format(row_r1))   # Prints "[5 6 7 8] (4,)"

row_r2 = a[1:2, :]  # same row, but the slice keeps the dimension, giving shape (1, 4)
print("\nshape with a[1:2,:]: {}".format(row_r2.shape))
print("values in array:\n{}".format(row_r2)) # Prints "[[5 6 7 8]] (1, 4)"

col_r1 = a[:, 1]
print("\nshape with a[:,1]: {}".format(col_r1.shape))
print("values in array:\n{}".format(col_r1))   # Prints "[ 2  6 10] (3,)"

col_r2 = a[:, 1:2]
print("\nshape with a[:,1:2]: {}".format(col_r2.shape))
print("values in array:\n{}".format(col_r2))

shape of array a: (3, 4)
values in array:
[[ 1 77  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

shape with a[1,:]: (4,)
values in array:
[5 6 7 8]

shape with a[1:2,:]: (1, 4)
values in array:
[[5 6 7 8]]

shape with a[:,1]: (3,)
values in array:
[77  6 10]

shape with a[:,1:2]: (3, 1)
values in array:
[[77]
 [ 6]
 [10]]
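
The shapes above follow a general pattern: each integer index removes one dimension, while each slice keeps it. The sketch below illustrates this on a 3-D array, which is an example chosen here rather than one taken from the notebook.

```python
import numpy as np

t = np.arange(24).reshape(2, 3, 4)   # 3-D array holding the values 0..23

print(t[0].shape)              # (3, 4): one integer index, one dimension fewer
print(t[0, :, 1].shape)        # (3,): two integer indices, two dimensions fewer
print(t[0:1, :, 1:2].shape)    # (1, 3, 1): slices only, all three dimensions kept
print(t[0, :, 1])              # [1 5 9]
```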


* Extracting arbitrary elements (integer array indexing)

In [61]:
a = np.array([[1,2], [3, 4], [5, 6]])
print("shape with a: {}\nvalues in array a:\n{}".format(a.shape,a))

# Take the values at the three positions (0,0), (1,1) and (2,0)
a1=a[[0,1,2], [0,1,0]]
print("\na[[0,1,2], [0,1,0]]:\ntype: {}\nshape: {}\nvalues: {}".format(type(a1), a1.shape, a1))  # Prints "[1 4 5]"

# Equivalent to the above
a2=np.array([a[0,0], a[1,1], a[2,0]])
print("\nnp.array([a[0,0], a[1,1], a[2,0]]):\ntype: {}\nshape: {}\nvalues: {}".format(type(a2), a2.shape, a2))  # Prints "[1 4 5]"

# Take the value at position (0,1) twice; the same element may be selected repeatedly
a3=a[[0,0], [1,1]]
print("\na[[0,0], [1,1]]:\ntype: {}\nshape: {}\nvalues: {}".format(type(a3), a3.shape, a3))  # Prints "[2 2]"

# Equivalent to the above
a4=np.array([a[0,1], a[0,1]])
print("\nnp.array([a[0,1], a[0,1]]):\ntype: {}\nshape: {}\nvalues: {}".format(type(a4), a4.shape, a4))  # Prints "[2 2]"


print("\nnp.array([a[0,1], a[0,1]]):")
print("type with a[0,1]: {}".format( type(a[0,1]) ) )
print("type with [a[0,1], a[0,1]]: {}".format(type([a[0,1], a[0,1]])))
print("type with np.array([a[0,1], a[0,1]]): {}".format(type(np.array([a[0,1], a[0,1]]))))


shape with a: (3, 2)
values in array a:
[[1 2]
 [3 4]
 [5 6]]

a[[0,1,2], [0,1,0]]:
type: <class 'numpy.ndarray'>
shape: (3,)
values: [1 4 5]

np.array([a[0,0], a[1,1], a[2,0]]):
type: <class 'numpy.ndarray'>
shape: (3,)
values: [1 4 5]

a[[0,0], [1,1]]:
type: <class 'numpy.ndarray'>
shape: (2,)
values: [2 2]

np.array([a[0,1], a[0,1]]):
type: <class 'numpy.ndarray'>
shape: (2,)
values: [2 2]

np.array([a[0,1], a[0,1]]):
type with a[0,1]: <class 'numpy.int64'>
type with [a[0,1], a[0,1]]: <class 'list'>
type with np.array([a[0,1], a[0,1]]): <class 'numpy.ndarray'>
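
A common use of integer array indexing is to pick, or update, one element from each row. The sketch below assumes a 4x3 example array and an index array `cols`, both invented here for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
cols = np.array([0, 2, 0, 1])     # which column to pick in each row

rows = np.arange(a.shape[0])      # [0 1 2 3]
picked = a[rows, cols]            # elements (0,0), (1,2), (2,0), (3,1)
print(picked)                     # [ 1  6  7 11]

a[rows, cols] += 10               # update exactly those elements in place
print(a)
```

Unlike slicing, reading with integer arrays returns a copy, so changing `picked` afterwards would not modify `a`.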


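#### Conditional filtering: boolean indexing
* A condition produces a boolean NumPy array, which can then be used to select the values that satisfy the condition, as follows: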

In [90]:
a = np.array([[1,2], [3, 4], [5, 6]])
print("shape of a: {}".format(a.shape))
print("values:\n{}".format(a))

bool_idx = (a > 2)  # boolean array marking which elements of a are greater than 2
print("\nvalues by bool with bool_idx=(a>2):\n{}".format(bool_idx))

# Then use bool_idx to select the values we want
a1=a[bool_idx]
print("\nget the values by bool array a[bool_idx]:")
print("type: {}".format(type(a1)))
print("shape: {}".format(a1.shape))
print("values:\n{}".format(a1))  # Prints "[3 4 5 6]"

# Both steps can be combined into a single expression
a2=a[a>2]
print("\nget the values by condition with a[a>2]:")
print("type: {}".format(type(a2)))
print("shape: {}".format(a2.shape))
print("values:\n{}".format(a2))  # Prints "[3 4 5 6]"


shape of a: (3, 2)
values:
[[1 2]
 [3 4]
 [5 6]]

values by bool with bool_idx=(a>2):
[[False False]
 [ True  True]
 [ True  True]]

get the values by bool array a[bool_idx]:
type: <class 'numpy.ndarray'>
shape: (4,)
values:
[3 4 5 6]

get the values by condition with a[a>2]:
type: <class 'numpy.ndarray'>
shape: (4,)
values:
[3 4 5 6]
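
Boolean masks can also be combined and used for assignment. The following is a minimal sketch; the names `mask`, `zeroed` and `b` are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions with & (and) or | (or); the parentheses are required.
mask = (a > 2) & (a % 2 == 0)
print(a[mask])                    # [4 6]

# np.where builds a new array: 0 where the condition holds, a's value elsewhere.
zeroed = np.where(a > 2, 0, a)
print(zeroed)                     # [[1 2] [0 0] [0 0]]

# Assigning through a mask modifies the array in place, so work on a copy here.
b = a.copy()
b[b > 2] = -1
print(b)                          # [[ 1  2] [-1 -1] [-1 -1]]
```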


### Data types

In [96]:
x = np.array([1, 2])  
print("np.array([1, 2])")
print("the value type with x.dtype: {}".format(x.dtype))         # Prints "int64"

x = np.array([1.0, 2.0])
print("\nnp.array([1.0, 2.0])")
print("the value type with x.dtype: {}".format(x.dtype))         # Prints "float64"

x = np.array([1, 2], dtype=np.int64)  # request a specific dtype explicitly
print("\nnp.array([1, 2], dtype=np.int64)")
print("the value type with x.dtype: {}".format(x.dtype))         # Prints "int64"

np.array([1, 2])
the value type with x.dtype: int64

np.array([1.0, 2.0])
the value type with x.dtype: float64

np.array([1, 2], dtype=np.int64)
the value type with x.dtype: int64
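
The default integer dtype inferred for `np.array([1, 2])` can depend on the platform and NumPy version, so passing `dtype` explicitly is the safer choice when the width matters. The sketch below shows conversion with `astype`; the variable names are illustrative.

```python
import numpy as np

x = np.array([1, 2], dtype=np.int32)
as_float = x.astype(np.float64)     # astype returns a converted copy
print(x.dtype, as_float.dtype)      # int32 float64

mixed = np.array([1, 2.5])          # an int mixed with a float is stored as float64
print(mixed.dtype)                  # float64

truncated = np.array([1.7, -1.7]).astype(np.int64)
print(truncated)                    # [ 1 -1]: the fractional part is discarded
```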


## Operations on NumPy arrays
### Basic operations
* np.add, np.subtract, np.multiply, np.divide, np.sqrt

In [103]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)
print("values in x:\n{}".format(x))
print("values in y:\n{}".format(y))

print("\nvalues with x+y:\n{}".format(x+y))
print("\nvalues with np.add(x,y):\n{}".format(np.add(x,y)))

print("\nvalues with x-y:\n{}".format(x-y))
print("\nvalues with np.subtract(x,y):\n{}".format(np.subtract(x,y)))

# Elementwise product (not matrix multiplication)
print("\nvalues with x*y:\n{}".format(x*y))
print("\nvalues with np.multiply(x,y):\n{}".format(np.multiply(x,y)))

# Elementwise division
print("\nvalues with x/y:\n{}".format(x/y))
print("\nvalues with np.divide(x,y):\n{}".format(np.divide(x,y)))

# Elementwise square root
print("\nvalues with np.sqrt(x):\n{}".format(np.sqrt(x)))

values in x:
[[1. 2.]
 [3. 4.]]
values in y:
[[5. 6.]
 [7. 8.]]

values with x+y:
[[ 6.  8.]
 [10. 12.]]

values with np.add(x,y):
[[ 6.  8.]
 [10. 12.]]

values with x-y:
[[-4. -4.]
 [-4. -4.]]

values with np.subtract(x,y):
[[-4. -4.]
 [-4. -4.]]

values with x*y:
[[ 5. 12.]
 [21. 32.]]

values with np.multiply(x,y):
[[ 5. 12.]
 [21. 32.]]

values with x/y:
[[0.2        0.33333333]
 [0.42857143 0.5       ]]

values with np.divide(x,y):
[[0.2        0.33333333]
 [0.42857143 0.5       ]]

values with np.sqrt(x):
[[1.         1.41421356]
 [1.73205081 2.        ]]
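
The cell above uses float arrays. With integer arrays the same operators behave slightly differently, which the following sketch tries to illustrate using the same values stored as integers.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

true_div = x / y          # true division always yields floats
floor_div = y // x        # floor division keeps the integer dtype
squared = x ** 2          # elementwise power

print(true_div.dtype)     # float64
print(floor_div)          # [[5 3] [2 2]]
print(squared)            # [[ 1  4] [ 9 16]]
```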


### Inner product and matrix multiplication
* np.dot

In [106]:
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])
v = np.array([9,10])
w = np.array([11, 12])
print("values in x:\n{}".format(x))
print("values in y:\n{}".format(y))
print("values in v:\n{}".format(v))
print("values in w:\n{}".format(w))

# Inner product of two vectors, gives 219
print("\nvalues with v.dot(w):\n{}".format(v.dot(w)))
print("values with np.dot(v,w):\n{}".format(np.dot(v,w)))

# Matrix-vector product, gives [29 67]
print("\nvalues with x.dot(v):\n{}".format(x.dot(v)))
print("values with np.dot(x,v):\n{}".format(np.dot(x,v)))

# Matrix-matrix product, gives
# [[19 22]
#  [43 50]]
print("\nvalues with x.dot(y):\n{}".format(x.dot(y)))
print("values with np.dot(x,y):\n{}".format(np.dot(x,y)))

values in x:
[[1 2]
 [3 4]]
values in y:
[[5 6]
 [7 8]]
values in v:
[ 9 10]
values in w:
[11 12]

values with v.dot(w):
219
values with np.dot(v,w):
219

values with x.dot(v):
[29 67]
values with np.dot(x,v):
[29 67]

values with x.dot(y):
[[19 22]
 [43 50]]
values with np.dot(x,y):
[[19 22]
 [43 50]]
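
For 1-D and 2-D arrays like these, the `@` operator (Python 3.5 and later) gives the same results as `np.dot`. The short sketch below reuses the values from the cell above.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])
v = np.array([9, 10])
w = np.array([11, 12])

print(v @ w)    # 219, inner product of two vectors
print(x @ v)    # [29 67], matrix-vector product
print(x @ y)    # [[19 22] [43 50]], matrix-matrix product
print(x * y)    # [[ 5 12] [21 32]], elementwise product for comparison
```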


### Summation
* A particularly useful operation is np.sum, which sums over the whole array or along a given axis:

In [110]:
x = np.array([[1,2],[3,4]])
print("values in x:\n{}".format(x))

print("*** ALL ***")
print("np.sum(x): {}".format(np.sum(x))) # sum of all elements, gives "10"
print("*** COLUMNS ***")
print("np.sum(x, axis=0): {}".format(np.sum(x, axis=0))) # sum of each column, gives "[4 6]"
print("*** ROWS ***")
print("np.sum(x, axis=1): {}".format(np.sum(x, axis=1))) # sum of each row, gives "[3 7]"

values in x:
[[1 2]
 [3 4]]
*** ALL ***
np.sum(x): 10
*** COLUMNS ***
np.sum(x, axis=0): [4 6]
*** ROWS ***
np.sum(x, axis=1): [3 7]
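
One typical use of an axis sum is normalizing each row so that it sums to 1. The sketch below relies on the `keepdims=True` argument of `np.sum`, which keeps the summed axis with length 1 so that the division lines up row by row; the variable names are illustrative.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

row_sums = np.sum(x, axis=1, keepdims=True)   # shape (2, 1): [[3] [7]]
normalized = x / row_sums                     # each row divided by its own sum

print(row_sums.shape)                # (2, 1)
print(np.sum(normalized, axis=1))    # every row now sums to 1 (up to rounding)
print(np.mean(x, axis=0))            # column means: [2. 3.]
```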


### Transpose
* Another frequently used operation is the matrix transpose, available on NumPy arrays as .T:

In [111]:
x = np.array([[1,2], [3,4]])
print("values in x:\n{}".format(x))
print("values with x.T:\n{}".format(x.T))

# v is a 1-D array of shape (3,), not a 1xn matrix, so .T leaves it unchanged:
v = np.array([1,2,3])
print("\nvalues in v:\n{}".format(v))
print("values with v.T:\n{}".format(v.T))

values in x:
[[1 2]
 [3 4]]
values with x.T:
[[1 3]
 [2 4]]

values in v:
[1 2 3]
values with v.T:
[1 2 3]
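
To actually obtain a column vector from a 1-D array, the array first needs a second dimension. One possible way, sketched below, is to add an axis with `np.newaxis` and then transpose; the names `row` and `col` are illustrative.

```python
import numpy as np

v = np.array([1, 2, 3])
print(v.T.shape)           # (3,): transposing a 1-D array changes nothing

row = v[np.newaxis, :]     # shape (1, 3), an explicit row vector
col = row.T                # shape (3, 1), a column vector
print(row.shape, col.shape)

x = np.array([[1, 2], [3, 4]])
print(np.shares_memory(x, x.T))   # True: .T is a view, not a copy
```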


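## Broadcasting
NumPy has another very powerful mechanism. Suppose you have one large and one small array, and you want to apply the small array to the large one several times. As an example, say you want to add a 1*n array to every row of an m*n matrix: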

* Using a for loop

In [127]:
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
print("values in x:\n{}".format(x))
print("values in v:\n{}".format(v))

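# Add v to x row by row
# with a for loop (the result goes into y because we do not want to change the original x)
# This becomes slow when the loop runs many times
y = np.empty_like(x)   # allocate y with the same shape and dtype as x; its contents are uninitialized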
print("\nvalues in y:\n{}".format(y))
for i in range(4):
    y[i, :] = x[i, :] + v
print("values with x+v by for in each row:\n{}".format(y))

values in x:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
values in v:
[1 0 1]

values in y:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
values with x+v by for in each row:
[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


* Tiling

In [128]:
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
print("values in x:\n{}".format(x))
print("values in v:\n{}".format(v))

# Repeat v and stack the copies on top of each other
vv=np.tile(v, (4,1))
print("\nvalues in v with np.tile(v, (4,1)):\n{}".format(vv))
y=x+vv
print("values with x+v:\n{}".format(y))

values in x:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
values in v:
[1 0 1]

values in v with np.tile(v, (4,1)):
[[1 0 1]
 [1 0 1]
 [1 0 1]
 [1 0 1]]
values with x+v:
[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]


* Broadcasting

In [130]:
# Thanks to NumPy broadcasting, you can actually just do this directly
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
print("values in x:\n{}".format(x))
print("values in v:\n{}".format(v))

y=x+v
print("\nvalues with x+v:\n{}".format(y))

values in x:
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
values in v:
[1 0 1]

values with x+v:
[[ 2  2  4]
 [ 5  5  7]
 [ 8  8 10]
 [11 11 13]]
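
The three approaches above give the same result. The sketch below checks this and uses `np.broadcast_to` to show the stretched shape that broadcasting conceptually works with, without copying `v` four times. The names `looped`, `tiled`, `direct` and `stretched` are illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
v = np.array([1, 0, 1])

looped = np.empty_like(x)
for i in range(x.shape[0]):               # explicit loop over the rows
    looped[i, :] = x[i, :] + v

tiled = x + np.tile(v, (x.shape[0], 1))   # materializes 4 copies of v
direct = x + v                            # broadcasting, no explicit copies

stretched = np.broadcast_to(v, x.shape)   # read-only (4, 3) view of v
print(stretched.shape)                    # (4, 3)
print(np.array_equal(looped, tiled) and np.array_equal(tiled, direct))   # True
```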


In [139]:
# More broadcasting examples follow:
v = np.array([1,2,3])  # v has shape (3,)
w = np.array([4,5])    # w has shape (2,)
print("shape: {}, values in v:\n{}".format(v.shape, v))
print("shape: {}, values in w:\n{}".format(w.shape, w))

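# First reshape v into a column vector;
# np.reshape returns an array of shape (3, 1) and leaves v itself unchanged.
# Multiplying it by w broadcasts to a result of shape (3, 2), as follows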
# [[ 4  5]
#  [ 8 10]
#  [12 15]]
print("\nreshape v with np.reshape(v, (3, 1)):")
print(np.reshape(v, (3, 1)))

print("np.reshape(v, (3,1)) * w:")
print(np.reshape(v, (3, 1)) * w)

shape: (3,), values in v:
[1 2 3]
shape: (2,), values in w:
[4 5]

reshape v with np.reshape(v, (3, 1)):
[[1]
 [2]
 [3]]
np.reshape(v, (3,1)) * w:
[[ 4  5]
 [ 8 10]
 [12 15]]
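
The product computed above is the outer product of `v` and `w`. As a cross-check, the sketch below compares the reshape-and-broadcast form with `np.outer` and with indexing by `np.newaxis`; the variable names are illustrative.

```python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([4, 5])

by_reshape = np.reshape(v, (3, 1)) * w   # (3, 1) * (2,) broadcasts to (3, 2)
by_newaxis = v[:, np.newaxis] * w        # same idea, written with np.newaxis
by_outer = np.outer(v, w)                # dedicated outer-product function

print(by_outer)                          # [[ 4  5] [ 8 10] [12 15]]
print(np.array_equal(by_reshape, by_outer), np.array_equal(by_newaxis, by_outer))
```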


In [4]:
import numpy as np
x = np.array([[1,2,3], [4,5,6]])
v = np.array([1,2,3])  # v has shape (3,)
w = np.array([4,5])    # w has shape (2,)
print("shape: {}, values in x:\n{}".format(x.shape, x))
print("shape: {}, values in v:\n{}".format(v.shape, v))
print("shape: {}, values in w:\n{}".format(w.shape, w))
# Add v to each row of x, giving:
# [[2 4 6]
#  [5 7 9]]
print("\nvalues with x+v:\n{}".format(x+v))

# Add w to each column of x: transpose x, add w to each row, then transpose back, giving:
# [[ 5  6  7]
#  [ 9 10 11]]
print("\nvalues with (x.T + w).T:\n{}".format((x.T + w).T))
print("\nvalues with x + np.reshape(w, (2, 1)):\n{}".format(x + np.reshape(w, (2, 1))))

shape: (2, 3), values in x:
[[1 2 3]
 [4 5 6]]
shape: (3,), values in v:
[1 2 3]
shape: (2,), values in w:
[4 5]

values with x+v:
[[2 4 6]
 [5 7 9]]

values with (x.T + w).T:
[[ 5  6  7]
 [ 9 10 11]]

values with x + np.reshape(w, (2, 1)):
[[ 5  6  7]
 [ 9 10 11]]
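
The examples above follow from how shapes are compared: they are aligned from the trailing dimension, and two dimensions are compatible when they are equal or one of them is 1. The sketch below checks a few shape pairs with `np.broadcast_shapes`, which is assumed to be available (NumPy 1.20 or later), and shows that incompatible shapes raise a `ValueError`.

```python
import numpy as np

print(np.broadcast_shapes((4, 3), (3,)))   # (4, 3): a vector added to each row
print(np.broadcast_shapes((3, 1), (2,)))   # (3, 2): column vector times row vector
print(np.broadcast_shapes((2, 3), (2, 1))) # (2, 3): a column added to each column

x = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
w = np.array([4, 5])                   # shape (2,)

try:
    x + w                              # trailing dimensions 3 and 2 do not match
    failed = False
except ValueError:
    failed = True
print(failed)                          # True
```

This is why `w` has to be reshaped to `(2, 1)`, or `x` transposed, before the two can be added.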
