<a id=menu><center><h1>Table of Contents</h1></center></a>

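1. [Introduction to NumPy Data Structures](#numpy)
        1.1 The ndarray Object
        1.2 Array Attributes
        1.3 Creating Arrays
        1.4 Slicing, Indexing, and Iterating over Arrays
        1.5 Basic Array Operations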

In [1]:
import pandas as pd
import numpy as np

In [2]:
np.__version__

'1.17.0'

<a id=numpy></a>
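# 1. Introduction to [NumPy](https://numpy.org/doc/) Data Structures
## 1.1 The ndarray Object
[Back to contents](#menu)

The most important feature of NumPy is its N-dimensional array object, ndarray: a collection of elements of the same type, indexed starting from 0.
- An ndarray is a multidimensional array that stores **elements of a single type**.
- Every element of an ndarray occupies **a block of memory of the same size**.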
![image.png](attachment:image.png)
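The two properties above can be checked directly. The following is a minimal sketch; the variable names `mixed` and `ints` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

# Mixing integers and a float: NumPy picks one common dtype for all elements.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)      # float64

# Every element occupies the same number of bytes, so the size of the data
# buffer is (number of elements) * (bytes per element).
ints = np.array([1, 2, 3], dtype=np.int32)
print(ints.itemsize)    # 4
print(ints.nbytes)      # 12
```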

In [3]:
# Create a one-dimensional array
arr = np.array([1, 2, 3])
arr

array([1, 2, 3])

In [6]:
# Create a multidimensional array
arr2 = np.array([[1, 2, 3], [4, 5, 5]])
arr2

array([[1, 2, 3],
       [4, 5, 5]])

In [7]:
# Set the minimum number of dimensions
a = np.array([1, 2, 3, 4, 5], ndmin=2) 
a

array([[1, 2, 3, 4, 5]])

In [8]:
# Set the element type
a = np.array([1,  2,  3], dtype=complex)  
print(a)
a = np.array([1, 2, 3], dtype='f')
print(a)

[1.+0.j 2.+0.j 3.+0.j]
[1. 2. 3.]


In [10]:
np.array?

## 1.2 Array Attributes
[Back to contents](#menu)


In [12]:
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
arr

array([[[1, 2, 3],
        [4, 5, 6]],

       [[1, 2, 3],
        [4, 5, 6]]])

In [13]:
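print(arr.ndim)  # rank: the number of axes (dimensions)
print(arr.shape)  # size of the array along each axis; for a matrix, (n rows, m columns)
print(arr.size)  # total number of elements, i.e. the product of the entries of .shape
print(arr.dtype)  # element type of the ndarray
print(arr.itemsize)  # size of each element in bytes
print(arr.flags)  # memory layout information of the ndarray
print(a.real)  # real part of each element (a is the float32 array from In [8])
print(a.imag)   # imaginary part of each element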

3
(2, 2, 3)
12
int64
8
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

[1. 2. 3.]
[0. 0. 0.]
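The attributes printed above are related to one another. The sketch below spells out those relationships; it pins `dtype=np.int64` explicitly, because the default integer type (and therefore `itemsize`) can differ between platforms.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]],
                [[1, 2, 3], [4, 5, 6]]], dtype=np.int64)

print(arr.ndim == len(arr.shape))                # True
print(arr.size == int(np.prod(arr.shape)))       # True: 2 * 2 * 3 == 12
print(arr.nbytes == arr.size * arr.itemsize)     # True: 12 * 8 == 96

# strides: number of bytes to step in memory to move by one along each axis
print(arr.strides)                               # (48, 24, 8)
```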


## 1.3 Creating Arrays
[Back to contents](#menu)

### 1.3.1 Creating Arrays from Existing Data

In [13]:
a = np.array([2,3,4])
print(a)
print(a.dtype)

[2 3 4]
int64


In [14]:
b = np.array([1.2, 3.5, 5])
print(b)
print(b.dtype)

[1.2 3.5 5. ]
float64


In [15]:
a = np.array(1, 2, 3, 4)  # ValueError: pass a single sequence instead, e.g. np.array([1, 2, 3, 4])

ValueError: only 2 non-keyword arguments accepted

In [16]:
np.array?

In [17]:
b = np.array([(1.5,2,3), (4,5,6)])
b

array([[1.5, 2. , 3. ],
       [4. , 5. , 6. ]])

In [18]:
# numpy.asarray
# Similar to numpy.array, but in NumPy 1.17 it takes only three parameters (a, dtype, order)
x = [1, 2, 3] 
a = np.asarray(x)  
print(a)

[1 2 3]


In [19]:
a

array([1, 2, 3])
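One practical difference between the two functions shows up when the input is already an ndarray: `np.array` copies by default, while `np.asarray` returns the input itself when no conversion is needed. A minimal sketch, with illustrative variable names:

```python
import numpy as np

src = np.array([1, 2, 3])

copied = np.array(src)      # independent copy of the data
aliased = np.asarray(src)   # same object, no copy made

src[0] = 99
print(copied[0])        # 1
print(aliased[0])       # 99
print(aliased is src)   # True
```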

### 1.3.2 Creating Special Arrays

In [20]:
# numpy.ones
# Create an array of the given shape, filled with ones
np.ones([2,3], dtype=int)

array([[1, 1, 1],
       [1, 1, 1]])

In [21]:
# numpy.zeros
# Create an array of the given shape, filled with zeros
np.zeros((2, 3, 4))

array([[[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]],

       [[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]]])

In [25]:
# numpy.empty
# Create an uninitialized array; its contents are whatever happened to be in memory, not meaningful random values
np.empty((2, 5))

array([[ 1.28822975e-231, -4.33009835e-311,  7.61402136e-010,
         9.04115219e+271,  2.59083755e+029],
       [ 9.32711684e-076,  1.28822975e-231, -3.11107873e+231,
         1.28824253e-231,  6.99665679e-309]])
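Because the values returned by `np.empty` are arbitrary, the array should be fully overwritten before it is read. The sketch below shows that pattern, with `np.full` as an alternative when a constant fill value is wanted.

```python
import numpy as np

buf = np.empty((2, 5))   # contents are arbitrary at this point
buf[:] = 0.5             # overwrite every element before reading
print(buf.sum())         # 5.0

# np.full allocates and fills in one step
filled = np.full((2, 5), 0.5)
print(np.array_equal(buf, filled))   # True
```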

In [26]:
# numpy.random
print(np.random.rand(10))  # array of 10 random numbers, uniform on [0, 1)
print(np.random.randn(10))  # 10 samples from the standard normal distribution
print(np.random.randint(10))  # one random integer in [0, 10)

[0.06357694 0.55707014 0.59835124 0.39635484 0.8699401  0.87631854
 0.10672789 0.1345227  0.75524729 0.54427151]
[-2.06050405  1.32533384  0.36502806 -0.2890505  -0.84928129  0.49433809
 -0.12240531 -1.72902004  0.46458297  1.31969143]
1
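The outputs above change on every run. When reproducible numbers are needed, a seeded generator can be used. This is a sketch using `np.random.default_rng`, which is available from NumPy 1.17 onward; the seed value 42 is arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same stream.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

x = rng_a.random(5)             # uniform on [0, 1)
y = rng_b.random(5)
print(np.array_equal(x, y))     # True

n = rng_a.integers(0, 10)       # one integer in [0, 10)
z = rng_a.standard_normal(3)    # three standard normal samples
```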


### 1.3.3 Creating Arrays from Numerical Ranges

In [28]:
range(1, 10, 2)

range(1, 10, 2)

In [30]:
# numpy.arange
# arange creates a range of evenly spaced values and returns it as an ndarray
# np.arange(10, 30, 5)
np.arange(0, 2, 0.3)  # floating-point arguments are accepted

array([0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])
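With a floating-point step, the number of elements `arange` returns is determined by rounding, which can be hard to predict near the end point. When the element count matters, `linspace` states it explicitly. A small sketch comparing the two for the range used above:

```python
import numpy as np

by_step = np.arange(0, 2, 0.3)        # length is implied by the step
by_count = np.linspace(0, 1.8, 7)     # length is given directly

print(len(by_step), len(by_count))    # 7 7
print(np.allclose(by_step, by_count)) # True
```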

In [33]:
# numpy.linspace
# numpy.linspace creates a one-dimensional array that forms an arithmetic sequence
# np.linspace(1, 10, 20)
np.linspace(1, 10, 20, endpoint=False)

array([1.  , 1.45, 1.9 , 2.35, 2.8 , 3.25, 3.7 , 4.15, 4.6 , 5.05, 5.5 ,
       5.95, 6.4 , 6.85, 7.3 , 7.75, 8.2 , 8.65, 9.1 , 9.55])

In [37]:
# numpy.logspace
# numpy.logspace creates a geometric sequence
# np.logspace(1.0, 2.0, num=10)
np.logspace(0, 9, 10, base=2)

array([  1.,   2.,   4.,   8.,  16.,  32.,  64., 128., 256., 512.])
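The relationship between the two functions can be made explicit: `logspace` raises `base` to exponents that are themselves evenly spaced, as `linspace` would produce them. A minimal sketch:

```python
import numpy as np

exponents = np.linspace(0, 9, 10)     # 0, 1, ..., 9
manual = 2.0 ** exponents             # base raised to each exponent
built_in = np.logspace(0, 9, 10, base=2)

print(np.allclose(manual, built_in))  # True

# Consecutive terms of a geometric sequence share a constant ratio.
ratios = built_in[1:] / built_in[:-1]
print(np.allclose(ratios, 2.0))       # True
```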

### 1.3.4 Reshaping Arrays with reshape

In [33]:
np.arange(24)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])

In [38]:
arr3 = np.arange(24).reshape(2, 3, 4)
arr3

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
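Two details of `reshape` are worth a quick sketch: one dimension can be given as `-1` and is then inferred from the total size, and for a contiguous array such as the result of `arange` the reshaped array is a view that shares data with the original. The names `flat` and `cube` are illustrative.

```python
import numpy as np

flat = np.arange(24)
cube = flat.reshape(2, 3, -1)    # -1 is inferred as 24 / (2 * 3) = 4
print(cube.shape)                # (2, 3, 4)

# The reshaped array shares memory with the original here.
cube[0, 0, 0] = 100
print(flat[0])                   # 100

# The total number of elements must stay the same.
try:
    flat.reshape(5, 5)
except ValueError as exc:
    print("reshape failed:", exc)
```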

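## 1.4 Slicing, Indexing, and Iterating over Arrays
[Back to contents](#menu)
### 1.4.1 Slicing
The contents of an ndarray can be accessed and modified by indexing or slicing, using the same syntax as Python list slicing. Unlike a list slice, a basic ndarray slice is a view onto the original data rather than a copy.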

In [57]:
[1,2,3]*3

[1, 2, 3, 1, 2, 3, 1, 2, 3]

In [56]:
a = np.arange(10)*3
a

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27])

In [58]:
# Slicing a one-dimensional array
print(a[0])
print(a[:8])  # half-open interval: start included, stop excluded
print(a[2::2])

0
[ 0  3  6  9 12 15 18 21]
[ 6 12 18 24]
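Although the slicing syntax matches that of a list, the result behaves differently: a list slice is a new list, whereas a basic ndarray slice refers to the same memory as the original array. A short sketch of the difference, with illustrative names:

```python
import numpy as np

# List slice: an independent copy.
nums = list(range(5))
part = nums[1:3]
part[0] = -1
print(nums)             # [0, 1, 2, 3, 4]

# ndarray slice: a view, so writes reach the original array.
arr = np.arange(5)
view = arr[1:3]
view[0] = -1
print(arr.tolist())     # [0, -1, 2, 3, 4]

# Call .copy() when an independent array is needed.
safe = arr[1:3].copy()
safe[0] = 7
print(arr[1])           # -1
```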


In [61]:
# Multidimensional indexing
print(arr2)
print('='*10)
print(arr2[:2, 1])
print(arr2[1:3, :])
print(arr2[1, 1])

[[1 2 3]
 [4 5 5]]
==========
[2 5]
[[4 5 5]]
5


In [62]:
# Slices can also use an ellipsis (...)
print(arr3)
print('='*10)
print(arr3.shape)
print(arr3[1, ...])  # equivalent to arr3[1, :, :] or arr3[1]
print(arr3[..., 2])  # equivalent to arr3[:, :, 2]

[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
==========
(2, 3, 4)
[[12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
[[ 2  6 10]
 [14 18 22]]


In [63]:
arr4 = np.arange(120).reshape(2, 3, 4, 5)
arr4

array([[[[  0,   1,   2,   3,   4],
         [  5,   6,   7,   8,   9],
         [ 10,  11,  12,  13,  14],
         [ 15,  16,  17,  18,  19]],

        [[ 20,  21,  22,  23,  24],
         [ 25,  26,  27,  28,  29],
         [ 30,  31,  32,  33,  34],
         [ 35,  36,  37,  38,  39]],

        [[ 40,  41,  42,  43,  44],
         [ 45,  46,  47,  48,  49],
         [ 50,  51,  52,  53,  54],
         [ 55,  56,  57,  58,  59]]],


       [[[ 60,  61,  62,  63,  64],
         [ 65,  66,  67,  68,  69],
         [ 70,  71,  72,  73,  74],
         [ 75,  76,  77,  78,  79]],

        [[ 80,  81,  82,  83,  84],
         [ 85,  86,  87,  88,  89],
         [ 90,  91,  92,  93,  94],
         [ 95,  96,  97,  98,  99]],

        [[100, 101, 102, 103, 104],
         [105, 106, 107, 108, 109],
         [110, 111, 112, 113, 114],
         [115, 116, 117, 118, 119]]]])

In [64]:
print(arr4[1, 2, ...])  # arr4[1,2,:,:]
print('*'*10)
print(arr4[..., 3])  # arr4[:, :, :, 3]
print('*'*10)
print(arr4[..., 3, :1])  # arr4[:, :, 3, :1]

[[100 101 102 103 104]
 [105 106 107 108 109]
 [110 111 112 113 114]
 [115 116 117 118 119]]
**********
[[[  3   8  13  18]
  [ 23  28  33  38]
  [ 43  48  53  58]]

 [[ 63  68  73  78]
  [ 83  88  93  98]
  [103 108 113 118]]]
**********
[[[ 15]
  [ 35]
  [ 55]]

 [[ 75]
  [ 95]
  [115]]]


### 1.4.2 Indexing

In [66]:
arr2

array([[1, 2, 3],
       [4, 5, 5]])

In [65]:
arr2[:3, [0, 2]]

array([[1, 3],
       [4, 5]])

In [67]:
# Boolean indexing
arr4[arr4>50]

array([ 51,  52,  53,  54,  55,  56,  57,  58,  59,  60,  61,  62,  63,
        64,  65,  66,  67,  68,  69,  70,  71,  72,  73,  74,  75,  76,
        77,  78,  79,  80,  81,  82,  83,  84,  85,  86,  87,  88,  89,
        90,  91,  92,  93,  94,  95,  96,  97,  98,  99, 100, 101, 102,
       103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115,
       116, 117, 118, 119])
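Boolean masks can be combined and can also appear on the left-hand side of an assignment. The sketch below uses a small illustrative array, `data`, instead of `arr4`. Conditions are joined with `&` and `|` (not `and` / `or`), and each one needs its own parentheses.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)

# Combine conditions element-wise.
mask = (data > 3) & (data % 2 == 0)
print(data[mask].tolist())      # [4, 6, 8, 10]

# A mask on the left-hand side modifies the selected elements in place.
data[data > 9] = 0
print(data[2].tolist())         # [8, 9, 0, 0]
```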

In [68]:
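# Fancy indexing
# Fancy indexing uses the values of an index array as positions along an axis of the target array.
# With a one-dimensional integer array as the index: if the target is one-dimensional, the result is the elements at those positions; if the target is two-dimensional, the result is the rows at those positions.
# Unlike slicing, fancy indexing always copies the data into a new array.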
x=np.arange(32).reshape((8,4))
x

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [59]:
x[[4,2,1,7], :]

array([[16, 17, 18, 19],
       [ 8,  9, 10, 11],
       [ 4,  5,  6,  7],
       [28, 29, 30, 31]])

In [60]:
x[[-4,-2,-1,-7]]

array([[16, 17, 18, 19],
       [24, 25, 26, 27],
       [28, 29, 30, 31],
       [ 4,  5,  6,  7]])
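Two consequences of fancy indexing are easy to miss, so here is a brief sketch. First, the result is a copy, so writing to it leaves the original untouched. Second, when two index lists are given they are paired element by element; `np.ix_` is one way to select the full cross product of rows and columns instead.

```python
import numpy as np

x = np.arange(32).reshape(8, 4)

# The result of fancy indexing is a copy.
rows = x[[4, 2]]
rows[0, 0] = -1
print(x[4, 0])                                  # 16

# Two index lists are paired: elements (1, 0) and (5, 3).
print(x[[1, 5], [0, 3]].tolist())               # [4, 23]

# np.ix_ builds the cross product: rows 1 and 5, columns 0 and 3.
print(x[np.ix_([1, 5], [0, 3])].tolist())       # [[4, 7], [20, 23]]
```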

In [69]:
df = pd.DataFrame(x, columns=list('abcd'))
df

    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19
5  20  21  22  23
6  24  25  26  27
7  28  29  30  31


In [70]:
df.loc[:, ['b', 'a', 'd']]

    b   a   d
0   1   0   3
1   5   4   7
2   9   8  11
3  13  12  15
4  17  16  19
5  21  20  23
6  25  24  27
7  29  28  31


### 1.4.3 Iteration

In [71]:
a

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27])

In [72]:
for i in a:
    print(i**(1/3))

0.0
1.4422495703074083
1.8171205928321397
2.080083823051904
2.2894284851066637
2.46621207433047
2.6207413942088964
2.7589241763811203
2.8844991406148166
3.0


In [73]:
arr2

array([[1, 2, 3],
       [4, 5, 5]])

In [74]:
for i in arr2:
    print(i)

[1 2 3]
[4 5 5]


In [75]:
list(arr2.flat)

[1, 2, 3, 4, 5, 5]

In [76]:
for i in arr2.flat:
    print(i)

1
2
3
4
5
5
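Explicit Python loops work, but for element-wise arithmetic a vectorized call usually expresses the same computation more compactly. The sketch below repeats the cube-root loop from above with `np.cbrt`, and uses `np.ndenumerate` for cases where the index of each element is needed during iteration.

```python
import numpy as np

a = np.arange(10) * 3

# Loop version and vectorized version of the cube root.
looped = np.array([v ** (1 / 3) for v in a])
vectorized = np.cbrt(a)
print(np.allclose(looped, vectorized))   # True

# ndenumerate yields (index tuple, value) pairs in row-major order.
arr2 = np.array([[1, 2, 3], [4, 5, 5]])
pairs = list(np.ndenumerate(arr2))
print(pairs[0][0], pairs[-1][0])         # (0, 0) (1, 2)
```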


## 1.5 Basic Array Operations
[Back to contents](#menu)


In [39]:
arr

array([[[1, 2, 3],
        [4, 5, 6]],

       [[1, 2, 3],
        [4, 5, 6]]])

In [40]:
print(arr.sum())
print(arr.min())
print(arr.max())

42
1
6


In [41]:
print(arr.sum(axis=0))
print(arr.min(axis=1))
print(arr.cumsum(axis=1))

[[ 2  4  6]
 [ 8 10 12]]
[[1 2 3]
 [1 2 3]]
[[[1 2 3]
  [5 7 9]]

 [[1 2 3]
  [5 7 9]]]
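The `axis` argument names the axis that is collapsed by the reduction. A sketch of how the result shape changes, using an illustrative (2, 3, 4) array, plus `keepdims=True`, which keeps the reduced axis with length 1 so the result can be combined with the original array:

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)

print(arr.sum(axis=0).shape)        # (3, 4): axis 0 removed
print(arr.sum(axis=1).shape)        # (2, 4): axis 1 removed
print(arr.sum(axis=(0, 2)).shape)   # (3,):   axes 0 and 2 removed

# keepdims=True keeps the reduced axis, giving shape (2, 1, 4).
totals = arr.sum(axis=1, keepdims=True)
normalized = arr / totals           # each slice along axis 1 now sums to 1
print(np.allclose(normalized.sum(axis=1), 1.0))   # True
```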


In [43]:
# Universal Function(ufunc)
B = np.arange(3)
print(B)

print(np.exp(B))

print(np.sqrt(B))

print(np.sin(B))

C = np.array([2., -1., 4.])
print(np.add(B, C))

[0 1 2]
[1.         2.71828183 7.3890561 ]
[0.         1.         1.41421356]
[0.         0.84147098 0.90929743]
[2. 0. 6.]


In [45]:
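# Broadcasting
# Adding a 4x3 two-dimensional array and a one-dimensional array of length 3 is equivalent to repeating b 4 times along the first axis and then adding
a = np.array([[ 0,  0,  0],
              [10, 10, 10],
              [20, 20, 20],
              [30, 30, 30]])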
b = np.array([1,2,3])
print(a + b)
print(np.add(a, b))

[[ 1  2  3]
 [11 12 13]
 [21 22 23]
 [31 32 33]]
[[ 1  2  3]
 [11 12 13]
 [21 22 23]
 [31 32 33]]


![image.png](attachment:image.png)
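The "repeat, then add" description of broadcasting can be verified by building the repeated array explicitly with `np.tile`. The sketch also shows the other direction: a length-4 array does not broadcast against a 4x3 array as is, but it does once it is turned into a column of shape (4, 1). The name `col` is illustrative.

```python
import numpy as np

a = np.array([[ 0,  0,  0],
              [10, 10, 10],
              [20, 20, 20],
              [30, 30, 30]])
b = np.array([1, 2, 3])

# Broadcasting gives the same result as explicitly repeating b four times.
tiled = np.tile(b, (4, 1))
print(np.array_equal(a + b, a + tiled))     # True

# Shapes (4, 3) and (4,) are incompatible: trailing sizes 3 and 4 differ.
col = np.array([1, 2, 3, 4])
try:
    a + col
except ValueError as exc:
    print("cannot broadcast:", exc)

# Shape (4, 1) broadcasts across the columns instead.
print((a + col[:, np.newaxis]).tolist())
# [[1, 1, 1], [12, 12, 12], [23, 23, 23], [34, 34, 34]]
```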

In [50]:
# numpy.maximum
# numpy.minimum
# Compare two sequences element-wise and take the larger (or smaller) value at each position
s1 = pd.Series(np.random.randn(50))
s2 = pd.Series(np.random.randn(50))
np.maximum(s1, s2)

0     0.036139
1     1.829724
2     2.034463
3     0.231515
4     0.323367
5     0.337779
6     1.955697
7    -0.103875
8     2.422660
9    -1.142374
10    1.022884
11    0.440264
12    0.773063
13    1.910343
14    0.681355
15    0.193537
16    1.093198
17    0.019240
18    0.211881
19    0.402606
20    2.823835
21   -0.502386
22    2.053976
23    0.816291
24    0.871699
25   -0.133470
26    0.150444
27    0.739432
28    1.073264
29    2.731059
30    2.125125
31    0.798887
32    0.144432
33    2.050917
34   -0.204420
35    1.217767
36   -0.037434
37    1.261830
38    0.640753
39    0.537765
40    0.045995
41   -0.079198
42    0.586919
43    1.692056
44    0.383487
45    0.301339
46    0.661765
47   -0.077029
48   -0.184859
49    1.795907
dtype: float64

In [52]:
# For two columns of a DataFrame, max(axis=1) gives the row-wise maximum directly
df = pd.DataFrame({'s1': s1, 's2': s2})
df.max(axis=1)

0     0.036139
1     1.829724
2     2.034463
3     0.231515
4     0.323367
5     0.337779
6     1.955697
7    -0.103875
8     2.422660
9    -1.142374
10    1.022884
11    0.440264
12    0.773063
13    1.910343
14    0.681355
15    0.193537
16    1.093198
17    0.019240
18    0.211881
19    0.402606
20    2.823835
21   -0.502386
22    2.053976
23    0.816291
24    0.871699
25   -0.133470
26    0.150444
27    0.739432
28    1.073264
29    2.731059
30    2.125125
31    0.798887
32    0.144432
33    2.050917
34   -0.204420
35    1.217767
36   -0.037434
37    1.261830
38    0.640753
39    0.537765
40    0.045995
41   -0.079198
42    0.586919
43    1.692056
44    0.383487
45    0.301339
46    0.661765
47   -0.077029
48   -0.184859
49    1.795907
dtype: float64
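The two approaches agree on the data above, which contains no missing values. With NaN present, the choice of function matters: `np.maximum` propagates NaN, while `np.fmax` ignores it when the other operand is a number. A minimal sketch on plain arrays with illustrative values:

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])
y = np.array([np.nan, 2.0, 0.5])

propagating = np.maximum(x, y)   # NaN wherever either input is NaN
ignoring = np.fmax(x, y)         # NaN ignored when the other value is a number

print(propagating.tolist())      # [nan, nan, 3.0]
print(ignoring.tolist())         # [1.0, 2.0, 3.0]
```

pandas reductions such as `DataFrame.max` skip NaN by default (`skipna=True`), so on data with missing values they behave like `np.fmax` rather than `np.maximum`.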

In [54]:
# np.where(condition, x, y)
# Where condition holds, output x; otherwise output y.
np.where(df['s1']>0, df['s2']+1, 0)

array([ 0.        ,  1.313727  ,  1.89987074,  0.        ,  0.16481431,
        0.87842528,  2.95569656,  0.        ,  0.42701451,  0.        ,
        0.        ,  0.75410912,  1.77306323,  0.        ,  1.68135507,
        0.36519386,  0.46089877,  0.01065607, -0.13481652, -0.55853824,
        0.51850169,  0.        ,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        , -0.02880578,  0.        ,  3.73105877,
       -0.04584066,  0.        ,  0.        ,  0.22103489,  0.        ,
        0.53513577,  0.        ,  2.26182974,  0.3646672 ,  0.        ,
        0.37026124,  0.        , -0.68156044,  0.65250765,  0.        ,
        0.34761805,  1.66176494,  0.        ,  0.        ,  0.71867474])
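Since the random data above makes the result hard to check by eye, here is the same pattern on small hand-picked arrays (the values are illustrative). The sketch also shows the one-argument form of `np.where`, which returns the positions where the condition is true.

```python
import numpy as np

s1 = np.array([-0.5, 1.5, 2.0, -1.0])
s2 = np.array([0.25, 0.5, 0.75, 1.0])

# Three-argument form: s2 + 1 where s1 > 0, otherwise 0.
out = np.where(s1 > 0, s2 + 1, 0)
print(out.tolist())                 # [0.0, 1.5, 1.75, 0.0]

# One-argument form: a tuple of index arrays, one per dimension.
positions = np.where(s1 > 0)[0]
print(positions.tolist())           # [1, 2]
```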

In [55]:
df.head()

         s1        s2
0 -0.342997  0.036139
1  1.829724  0.313727
2  2.034463  0.899871
3 -1.168189  0.231515
4  0.323367 -0.835186
