# Preface

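In the previous two posts I learned some Python basics: basic data types, container types, how to define functions in Python, how to call them, and so on. Now I can't wait to get into the article the author shared this time.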

> [盘一盘 Python Series 2 - NumPy (Part 1)](https://mp.weixin.qq.com/s/nWu_PE5U7EASwJLYlyZcNA)
>
> [NumPy official website](http://www.numpy.org)

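As with the earlier material, I will work through it in two posts. The main topics are:

+ Creating arrays (1-D/2-D/n-D arrays, three ways of creating them, and array properties)
+ Saving and loading arrays (.npy, .txt and .csv formats)
+ Accessing arrays (indexing and slicing: normal, boolean and fancy indexing)
+ Reshaping arrays (reshaping and flattening, merging and splitting, repeating elements and repeating whole arrays)
+ Computing with arrays (element-wise computation, linear algebra, broadcasting)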

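Before starting, here is a small example of why NumPy is worth learning: multiplying a million numbers by 2, ten times over, took 52 ms with an array but 1.57 s with a Python list comprehension in the run below (timings depend on the machine). A couple of transpose examples follow.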

In [1]:
import numpy as np

my_arr = np.arange(1000000)
my_list = list(range(1000000))

In [2]:
%time for _ in range(10): my_arr2 = my_arr * 2

Wall time: 52 ms


In [3]:
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

Wall time: 1.57 s


In [4]:
#a transpose example
arr = np.array([[1,2,3],[4,5,6]])
arr

array([[1, 2, 3],
       [4, 5, 6]])

In [5]:
arr.T

array([[1, 4],
       [2, 5],
       [3, 6]])

In [7]:
arr = np.arange(16).reshape((2,2,4))
arr

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

In [8]:
#swap axis 0 and axis 1 (axis 2 stays in place)
arr.transpose(1,0,2)

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])
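To see what `transpose(1, 0, 2)` actually does, it helps to use a non-square shape. The sketch below assumes a `(2, 3, 4)` array (chosen only so the swapped axes are visible) and checks that element `[i, j, k]` of the result is element `[j, i, k]` of the original, and that `.T` reverses all axes.

```python
import numpy as np

arr = np.arange(24).reshape((2, 3, 4))
swapped = arr.transpose(1, 0, 2)   # swap axis 0 and axis 1, keep axis 2

print(arr.shape, swapped.shape)    # (2, 3, 4) (3, 2, 4)
# element [i, j, k] of the result comes from [j, i, k] of the original
print(swapped[2, 1, 3] == arr[1, 2, 3])

# .T with no arguments reverses the order of all axes
print(arr.T.shape, np.array_equal(arr.T, arr.transpose(2, 1, 0)))
```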

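# 1. Creating arrays
## 1.1 First impressions
NumPy arrays can be 1-dimensional, 2-dimensional, ..., n-dimensional. Much like points, lines and planes, the dimensions are easiest to think of as axes: a 1-D array has a single axis, called axis 0; a 2-D array has axis 0 and axis 1; and so on: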

+ 第一维 = axis 0
+ 第二维 = axis 1
+ 第三维 = axis 2
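The axis numbers matter as soon as a function takes an `axis` argument. A minimal sketch with `sum` on a small 2-D array (my own example values): `axis=0` collapses the rows and gives one result per column, and `axis=1` collapses the columns and gives one result per row.

```python
import numpy as np

arr2d = np.array([[1, 2, 3],
                  [4, 5, 6]])
print(arr2d.ndim)               # 2 axes: axis 0 (rows) and axis 1 (columns)
col_sums = arr2d.sum(axis=0)    # add along axis 0 -> [5 7 9]
row_sums = arr2d.sum(axis=1)    # add along axis 1 -> [ 6 15]
print(col_sums, row_sums)
```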

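## 1.2 Creating arrays

+ Step by step: np.array()
+ Fixed step or fixed count: np.arange() and np.linspace()
+ All at once: np.ones(), np.zeros(), np.eye(), np.random.random()

### The step-by-step method

np.array() converts an existing list or tuple into an array: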

In [9]:
l = [3.5,5,2,8,4.2]
np.array(l)

array([3.5, 5. , 2. , 8. , 4.2])

In [10]:
t = (3.5,5,2,8,4.2)
np.array(t)

array([3.5, 5. , 2. , 8. , 4.2])
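`np.array()` also picks a single element type for the whole array: the list above mixes ints and floats, so everything is stored as `float64`. The sketch below (example values are my own) shows that upcasting, forcing the type with the `dtype` argument, and how a list of lists becomes a 2-D array.

```python
import numpy as np

mixed = np.array([3.5, 5, 2, 8, 4.2])    # ints are upcast to float64
ints = np.array([1, 2, 3], dtype=float)  # force float even for integer input
nested = np.array([[1, 2], [3, 4]])      # list of lists -> 2-D array

print(mixed.dtype, ints.dtype, nested.shape)   # float64 float64 (2, 2)
```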

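### The fixed-step / fixed-count method

+ arange() fixes the step (spacing) between elements
+ linspace() fixes the number of elements

Usage:
+ arange(start,stop,step): only stop is required (start defaults to 0 and step to 1), and stop itself is excluded
+ linspace(start,stop,num): num defaults to 50, and stop is included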

In [12]:
print(np.arange(8))
print(np.arange(2,8))
print(np.arange(2,8,2))#the third argument, 2, is the step

[0 1 2 3 4 5 6 7]
[2 3 4 5 6 7]
[2 4 6]


In [13]:
print(np.linspace(2,6,3))
print(np.linspace(3,8,11))

[2. 4. 6.]
[3.  3.5 4.  4.5 5.  5.5 6.  6.5 7.  7.5 8. ]
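One difference worth remembering: `arange` never includes `stop`, while `linspace` includes it unless `endpoint=False` is passed. With `endpoint=True` the spacing is `(stop - start) / (num - 1)`, which is why `linspace(3, 8, 11)` above steps by 0.5. For non-integer steps the NumPy documentation recommends `linspace`, because floating-point rounding can make `arange` produce one element more or less than expected. A small sketch:

```python
import numpy as np

a = np.arange(2, 8, 2)                    # stop=8 excluded -> [2 4 6]
b = np.linspace(2, 8, 4)                  # stop=8 included -> [2. 4. 6. 8.]
c = np.linspace(2, 8, 3, endpoint=False)  # stop excluded   -> [2. 4. 6.]
print(a, b, c)

# spacing of linspace with endpoint=True: (stop - start) / (num - 1)
print((8 - 3) / (11 - 1))                 # 0.5, as in linspace(3, 8, 11)
```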


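### The all-at-once method

+ zeros() creates an n-D array filled with 0
+ ones() creates an n-D array filled with 1
+ random() (np.random.random) creates an n-D array of random floats drawn uniformly from [0, 1)
+ eye() creates a 2-D array with ones on a diagonal (the identity matrix by default; the k argument shifts the diagonal up or down)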

In [16]:
print(np.zeros(5))
print(np.ones((2,3)))
print(np.random.random((2,3,4)))

[0. 0. 0. 0. 0.]
[[1. 1. 1.]
 [1. 1. 1.]]
[[[0.84243212 0.10665508 0.38931824 0.14421967]
  [0.87489327 0.57627455 0.76766105 0.88199034]
  [0.54985233 0.67220373 0.14233803 0.49116518]]

 [[0.48180999 0.17678358 0.69928024 0.26949941]
  [0.1262937  0.7704266  0.26925639 0.92594612]
  [0.42872654 0.40360784 0.63063398 0.71170859]]]
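Because `np.random.random` returns different numbers on every run, the output above cannot be reproduced as-is. One way to make examples repeatable is to fix the seed. The sketch below shows the legacy `np.random.seed` and the newer `np.random.default_rng` generator (both exist in current NumPy). The seed value 42 is an arbitrary assumption.

```python
import numpy as np

np.random.seed(42)
first = np.random.random((2, 3))
np.random.seed(42)                   # same seed -> same numbers
second = np.random.random((2, 3))
print(np.array_equal(first, second)) # True

rng = np.random.default_rng(42)      # newer Generator-based API
sample = rng.random((2, 3, 4))       # also uniform on [0, 1)
print(sample.shape, sample.min() >= 0, sample.max() < 1)
```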


In [17]:
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [18]:
np.eye(4, k=1)

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])

In [19]:
np.eye(4, k=-1)

array([[0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.]])
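The `k` argument moves the diagonal of ones: `k > 0` shifts it above the main diagonal, `k < 0` below it, and an `N x N` result then holds `N - |k|` ones. A sketch that checks this against `np.diag(v, k)`, which places the vector `v` on the k-th diagonal:

```python
import numpy as np

n = 4
for k in (-1, 0, 1):
    shifted = np.eye(n, k=k)
    # put n - |k| ones on the k-th diagonal with np.diag
    expected = np.diag(np.ones(n - abs(k)), k=k)
    print(k, np.array_equal(shifted, expected), int(shifted.sum()))
```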

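## 1.3 Array properties
In Python everything is an object, and a NumPy array is no exception: it carries attributes and methods, which dir() lists.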
### 一维数组

In [20]:
arr = np.array([3.5,5,2,8,4.2])
arr

array([3.5, 5. , 2. , 8. , 4.2])

In [21]:
dir(arr)

['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_finalize__',
 '__array_function__',
 '__array_interface__',
 '__array_prepare__',
 '__array_priority__',
 '__array_struct__',
 '__array_ufunc__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__complex__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__ilshift__',
 '__imatmul__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lshift__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdivmod__',
 ...]
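The `dir()` list is long mainly because of the double-underscore special methods. A small sketch (my own filter, not a NumPy feature) that keeps only the public names, which is where attributes such as `shape`, `ndim` and `dtype` live:

```python
import numpy as np

arr = np.array([3.5, 5, 2, 8, 4.2])
public = [name for name in dir(arr) if not name.startswith('_')]
wanted = ('T', 'dtype', 'ndim', 'shape', 'size', 'strides')
print([name for name in public if name in wanted])
```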

In [22]:
print('The type is ',type(arr))
print('The dimension is ', arr.ndim)
print('The length of array is ',len(arr))
print('The number of elements is ',arr.size)
print('The shape of array is ',arr.shape)
print('The stride of array is ',arr.strides)
print('The type of elements is ',arr.dtype)

The type is  <class 'numpy.ndarray'>
The dimension is  1
The length of array is  5
The number of elements is  5
The shape of array is  (5,)
The stride of array is  (8,)
The type of elements is  float64
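These attributes are related to each other: `size` is the product of the `shape` entries, `len()` only reports the length of the first axis, and the total memory `nbytes` equals `size * itemsize`, where `itemsize` is the byte width of one element (8 for `float64`). A quick check:

```python
import numpy as np

arr = np.array([3.5, 5, 2, 8, 4.2])
print(arr.size == int(np.prod(arr.shape)))  # size is the product of the shape
print(len(arr) == arr.shape[0])             # len() only looks at axis 0
print(arr.itemsize, arr.nbytes)             # 8 bytes per float64, 40 in total
```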


### 二维数组


In [23]:
l2 = [[1,2,3],[4,5,6]]
arr2d = np.array(l2)
arr2d

array([[1, 2, 3],
       [4, 5, 6]])

In [24]:
print('The type is ',type(arr2d))
print('The dimension is ',arr2d.ndim)
print('The length of array is ',len(arr2d))
print('The number of elements is ',arr2d.size)
print('The shape of array is ',arr2d.shape)
print('The stride of array is ',arr2d.strides)
print('The type of elements is ',arr2d.dtype)

The type is  <class 'numpy.ndarray'>
The dimension is  2
The length of array is  2
The number of elements is  6
The shape of array is  (2, 3)
The stride of array is  (12, 4)
The type of elements is  int32
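The `int32` element type, and hence the 4-byte strides `(12, 4)`, depends on the platform: with NumPy 1.x the default integer on Windows is 32-bit, while on Linux and macOS it is 64-bit (NumPy 2.0 switched Windows to 64-bit too). Passing `dtype` explicitly gives the same result everywhere, as this sketch shows:

```python
import numpy as np

l2 = [[1, 2, 3], [4, 5, 6]]
arr64 = np.array(l2, dtype=np.int64)  # 8 bytes per element on every platform
arr32 = np.array(l2, dtype=np.int32)  # 4 bytes per element
print(arr64.strides, arr32.strides)   # (24, 8) (12, 4)
```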


### n 维数组

In [26]:
arr4d = np.random.random((2,2,2,3))
arr4d

array([[[[0.46336051, 0.0866378 , 0.5695559 ],
         [0.84947661, 0.20548835, 0.86086956]],

        [[0.31074812, 0.7091611 , 0.43621513],
         [0.255793  , 0.19595474, 0.26927188]]],


       [[[0.80334628, 0.80431385, 0.52186245],
         [0.27531342, 0.46432267, 0.92982317]],

        [[0.07469738, 0.09499854, 0.86401962],
         [0.88901648, 0.32792331, 0.32183391]]]])

In [27]:
#these attributes are the important ones
print('The type is ',type(arr4d))
print('The dimension is ',arr4d.ndim)
print('The length of array is ',len(arr4d))
print('The number of elements is ',arr4d.size)
print('The shape of array is ',arr4d.shape)
print('The stride of array is ',arr4d.strides)
print('The type of elements is ',arr4d.dtype)

The type is  <class 'numpy.ndarray'>
The dimension is  4
The length of array is  2
The number of elements is  24
The shape of array is  (2, 2, 2, 3)
The stride of array is  (96, 48, 24, 8)
The type of elements is  float64


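Relationship between shape and strides (for the default C-ordered, i.e. row-major, layout): if the shape is

$$\text{shape} = (a_1, a_2, \dots, a_{n-1}, a_n)$$

and each element takes $s$ bytes (arr.itemsize: 8 for float64, 4 for int32), then

$$\text{strides} = (s\,a_2 a_3 \cdots a_n,\ \dots,\ s\,a_{n-1} a_n,\ s\,a_n,\ s), \qquad \text{stride}_k = s \prod_{j=k+1}^{n} a_j$$

For arr4d, $s = 8$ and the shape is $(2, 2, 2, 3)$, giving $(8\cdot2\cdot2\cdot3,\ 8\cdot2\cdot3,\ 8\cdot3,\ 8) = (96, 48, 24, 8)$; for the int32 array arr2d, $s = 4$ and the shape is $(2, 3)$, giving $(12, 4)$. Both match the outputs above.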
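The formula can also be checked in code. `c_strides` below is a hypothetical helper written for this note (not a NumPy function) that applies it from the last axis backwards. It only describes C-ordered arrays, which is what `np.array` and `np.random.random` produce by default.

```python
import numpy as np

def c_strides(shape, itemsize):
    """Byte strides of a C-ordered array: stride_k = itemsize * prod(shape[k+1:])."""
    strides = []
    step = itemsize
    for dim in reversed(shape):   # walk from the last axis to the first
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

arr4d = np.random.random((2, 2, 2, 3))
print(c_strides(arr4d.shape, arr4d.itemsize), arr4d.strides)  # (96, 48, 24, 8) twice
```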

# 2. Saving and loading arrays
How to save arrays to disk and load them back.
## NumPy's own .npy format

In [28]:
arr_disk = np.arange(8)
np.save('arr_disk',arr_disk) #saves arr_disk.npy (the .npy extension is added automatically)
arr_disk

array([0, 1, 2, 3, 4, 5, 6, 7])

In [29]:
#load
np.load('arr_disk.npy')

array([0, 1, 2, 3, 4, 5, 6, 7])
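Unlike the text formats below, `.npy` keeps the dtype and shape exactly. A related function, `np.savez`, stores several arrays in one `.npz` file, and `np.load` then returns a dict-like object keyed by the names given when saving. The sketch writes into a temporary directory so nothing is left behind; the file names and the keys `first`/`second` are arbitrary choices.

```python
import os
import tempfile
import numpy as np

arr = np.arange(6, dtype=np.int16).reshape(2, 3)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'arr_disk')
    np.save(path, arr)                      # writes arr_disk.npy
    loaded = np.load(path + '.npy')
    npz_path = os.path.join(tmp, 'bundle.npz')
    np.savez(npz_path, first=arr, second=arr * 2)
    with np.load(npz_path) as bundle:       # close the .npz file when done
        second = bundle['second']
print(loaded.dtype, loaded.shape, np.array_equal(loaded, arr))
print(second)
```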

## Plain-text .txt format

In [30]:
arr_text = np.array([[1.,2.,3.],[4.,5.,6.]])
np.savetxt('arr_from_text.txt',arr_text)

In [31]:
#load
np.loadtxt('arr_from_text.txt')

array([[1., 2., 3.],
       [4., 5., 6.]])
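By default `np.savetxt` writes every value in scientific notation (`fmt='%.18e'`), and `np.loadtxt` reads floats back. For an integer array, one option is `fmt='%d'` when saving and `dtype=int` when loading. A sketch, again using a temporary directory and a made-up file name:

```python
import os
import tempfile
import numpy as np

arr_int = np.array([[1, 2, 3], [4, 5, 6]])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'arr_int.txt')
    np.savetxt(path, arr_int, fmt='%d')     # each row becomes a line like "1 2 3"
    with open(path) as f:
        first_line = f.readline().strip()
    back = np.loadtxt(path, dtype=int)
print(first_line)                           # 1 2 3
print(back.dtype.kind, np.array_equal(back, arr_int))
```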

## Text .csv format

In [32]:
arr_csv = np.array([[1,2,3],[4,5,6],[7,8,9]])
np.savetxt('arr_from_csv.csv',arr_csv,delimiter=';')

In [33]:
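#delimiter not specified: genfromtxt splits on whitespace, so each ';'-separated line is one unparseable value (nan)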
np.genfromtxt('arr_from_csv.csv')

array([nan, nan, nan])

In [34]:
np.genfromtxt('arr_from_csv.csv',delimiter=';')

array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])
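Many CSV files use a comma and start with a header row. `np.savetxt` can write a header through `header=` (it is prefixed with `'# '` by default, so `comments=''` is passed here to keep a plain header line), and `np.genfromtxt` can skip it with `skip_header=`. A sketch with hypothetical column names `a,b,c`:

```python
import os
import tempfile
import numpy as np

arr_csv = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'arr.csv')
    # hypothetical header; comments='' keeps it free of the '# ' prefix
    np.savetxt(path, arr_csv, fmt='%d', delimiter=',', header='a,b,c', comments='')
    data = np.genfromtxt(path, delimiter=',', skip_header=1)
print(data)
```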

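# 3. Accessing arrays

Elements are accessed through indexing and slicing.

+ Slicing takes a range of elements (arr[start:stop:step])
+ Indexing takes the element(s) at specific positions (arr[index])

Indexing comes in three forms:
+ Normal indexing
+ Boolean indexing
+ Fancy indexing

Difference between slicing and indexing:
+ A slice is a view of the original array, so modifying the slice changes the original array
+ Boolean and fancy indexing return a copy, and an integer index that selects a single element returns a scalar, so modifying the result does not change the original array (an integer index that selects a whole row of a 2-D array, such as arr2d[0], still returns a view)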
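Whether a result is a view or a copy can be checked directly with `np.shares_memory`, and calling `.copy()` on a slice is the usual way to get an independent array. A sketch (variable names are illustrative):

```python
import numpy as np

arr = np.arange(10)
view = arr[6:8]                 # basic slice -> view
fancy = arr[[6, 7]]             # fancy index -> copy
print(np.shares_memory(arr, view), np.shares_memory(arr, fancy))  # True False

independent = arr[6:8].copy()   # explicit copy of a slice
independent[0] = -1
print(arr[6])                   # still 6: the copy is detached

mat = np.arange(9).reshape(3, 3)
row = mat[0]                    # one integer on a 2-D array -> row view
print(np.shares_memory(mat, row))  # True
```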

## 3.1 Normal indexing
### 1-D arrays

In [35]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [36]:
#access the element at the seventh position (index 6)
arr[6]

6

In [37]:
a = arr[6]  #a is a scalar taken out of arr
a = 100     #rebinding a does not touch arr
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [38]:
a

100

In [39]:
arr[6:8]

array([6, 7])

In [40]:
b = arr[6:8]  #b is a view of arr
b[1] = 12     #so writing into b also changes arr
arr

array([ 0,  1,  2,  3,  4,  5,  6, 12,  8,  9])

In [41]:
b

array([ 6, 12])
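Indices can also count from the end (`-1` is the last element), and slices accept a step, so `arr[::-1]` reverses an array (the result is again a view). A sketch on a fresh `np.arange(10)`:

```python
import numpy as np

arr = np.arange(10)
print(arr[-1])      # 9, the last element
print(arr[1:8:3])   # [1 4 7], every third element from index 1 up to (not including) 8
print(arr[::-1])    # [9 8 7 6 5 4 3 2 1 0]
print(arr[-3:])     # [7 8 9], the last three elements
```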

### 2-D arrays

In [42]:
arr2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [43]:
#indexing
#the row at index 2 along axis 0
arr2d[2]

array([7, 8, 9])

In [44]:
#index 0 on axis 0, then index 2 on axis 1
arr2d[0][2]

3

In [45]:
#equivalent: one index per axis, separated by a comma
arr2d[0,2]

3

In [46]:
#slicing
#the first two entries along axis 0, i.e. the first two rows
arr2d[:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [47]:
arr2d[:,[0,2]] #all rows, columns 0 and 2

array([[1, 3],
       [4, 6],
       [7, 9]])

In [48]:
#the first two elements of the second row
arr2d[1,:2]

array([4, 5])

In [49]:
arr2d[:2,2]

array([3, 6])
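A detail that often surprises: an integer index removes its axis, while a slice of length one keeps it. So `arr2d[:, 1]` gives a 1-D array, whereas `arr2d[:, 1:2]` gives a 3x1 2-D array. A quick check:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
col_1d = arr2d[:, 1]     # integer index drops axis 1 -> shape (3,)
col_2d = arr2d[:, 1:2]   # length-1 slice keeps axis 1 -> shape (3, 1)
print(col_1d.shape, col_2d.shape)
print(col_1d)            # [2 5 8]
```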

## 3.2 Boolean indexing

In [56]:
code = np.array(['BABA','FA','JD','BABA','JD','FB'])
price = np.array([[170,177,169],[150,159,153],
                 [24,27,26],[165,170,167],
                 [22,23,20],[155,116,157]])#open, high, close
price

array([[170, 177, 169],
       [150, 159, 153],
       [ 24,  27,  26],
       [165, 170, 167],
       [ 22,  23,  20],
       [155, 116, 157]])

In [57]:
#boolean
code == 'BABA'

array([ True, False, False,  True, False, False])

In [58]:
#select the BABA rows with the boolean mask
price[code == 'BABA']

array([[170, 177, 169],
       [165, 170, 167]])

In [60]:
#BABA's high and close prices: boolean mask on axis 0, slice on axis 1
price[code == 'BABA',1:]

array([[177, 169],
       [170, 167]])

In [61]:
#prices of JD and FB (combine masks with |)
price[(code=='FB')|(code=='JD')]

array([[ 24,  27,  26],
       [ 22,  23,  20],
       [155, 116, 157]])

In [62]:
#set every price below 25 to 0
price[price<25]=0
price

#pandas offers similar functionality

array([[170, 177, 169],
       [150, 159, 153],
       [  0,  27,  26],
       [165, 170, 167],
       [  0,   0,   0],
       [155, 116, 157]])
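Masks combine with `&` (and), `|` (or) and `~` (not); Python's `and`/`or` do not work element-wise on arrays. Each comparison needs its own parentheses because `&` and `|` bind more tightly than `==`. For membership in a list, `np.isin` builds the same mask more compactly, and summing a mask counts its `True` entries. A sketch reusing the ticker data from above (the price array before zeroing):

```python
import numpy as np

code = np.array(['BABA', 'FA', 'JD', 'BABA', 'JD', 'FB'])
price = np.array([[170, 177, 169], [150, 159, 153],
                  [24, 27, 26], [165, 170, 167],
                  [22, 23, 20], [155, 116, 157]])   # open, high, close

mask_or = (code == 'FB') | (code == 'JD')
mask_isin = np.isin(code, ['JD', 'FB'])      # same mask via a membership test
print(np.array_equal(mask_or, mask_isin))    # True
print(mask_isin.sum())                       # 3 matching rows
not_baba = price[~(code == 'BABA')]          # ~ negates a mask
print(not_baba.shape)                        # (4, 3)
```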

## 3.3 Fancy indexing

In [63]:
arr = np.arange(32).reshape(8,4)
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [65]:
#get the 5th, 4th and 7th rows (indices 4, 3, 6), in that order
arr[[4,3,6]]

array([[16, 17, 18, 19],
       [12, 13, 14, 15],
       [24, 25, 26, 27]])

In [66]:
#get the 4th, 3rd and 6th rows counted from the end
arr[[-4,-3,-6]]

array([[16, 17, 18, 19],
       [20, 21, 22, 23],
       [ 8,  9, 10, 11]])

In [67]:
arr[[1,5,7,2],[0,3,1,2]] #pairs the two lists element-wise: (1,0), (5,3), (7,1), (2,2)

array([ 4, 23, 29, 10])

In [68]:
#equivalent form
np.array([arr[1,0],arr[5,3],
         arr[7,1],arr[2,2]])

array([ 4, 23, 29, 10])
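Passing two index lists pairs them up element by element, as above. To select a whole sub-matrix instead (every chosen row crossed with every chosen column), one option is `np.ix_`, which turns the lists into an open mesh. A sketch:

```python
import numpy as np

arr = np.arange(32).reshape(8, 4)
pairs = arr[[1, 5], [0, 3]]            # elements (1,0) and (5,3) -> [ 4 23]
block = arr[np.ix_([1, 5], [0, 3])]    # rows 1, 5 crossed with columns 0, 3 -> 2x2
print(pairs)
print(block)                           # [[ 4  7]
                                       #  [20 23]]
```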

In [69]:
arr[:,[0,3,1,2]] #reorders the columns as 0, 3, 1, 2

array([[ 0,  3,  1,  2],
       [ 4,  7,  5,  6],
       [ 8, 11,  9, 10],
       [12, 15, 13, 14],
       [16, 19, 17, 18],
       [20, 23, 21, 22],
       [24, 27, 25, 26],
       [28, 31, 29, 30]])

# 4. Summary

+ How to create arrays
+ How to save and load them
+ How to access them
+ Reshaping
+ Computation
