# <center>Chapter 8.1 NumPy Basics: Arrays and Vectorized Computation</center>

<br>



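## 8.1 What Is NumPy?
NumPy (short for Numerical Python) is the most important foundational package for numerical computing in Python. Most packages that provide scientific computing functionality use NumPy arrays as their building block.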

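NumPy provides, among other things:
+ ndarray, a fast and space-efficient multidimensional array with vectorized arithmetic and sophisticated broadcasting capabilities.
+ Standard mathematical functions for fast operations on entire arrays of data, with no need to write loops.
+ Tools for reading and writing array data to disk and for working with memory-mapped files.
+ Linear algebra, random number generation, and Fourier transform capabilities.
+ A C API for integrating code written in C, C++, Fortran, and other languages.

NumPy itself does not provide high-level data analysis functionality, but understanding NumPy arrays and array-oriented computing will help you use tools such as pandas much more effectively.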
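
As a small illustration of the features listed above, the following sketch shows vectorized arithmetic, broadcasting, a standard mathematical function, and the linear algebra, random number, and Fourier transform submodules. The array values and variable names are invented for this example and do not come from the original text.

```python
import numpy as np

# A 2x3 array; the values are arbitrary and only for illustration
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# Vectorized arithmetic: applied element by element, no explicit loop
doubled = data * 2
summed = data + data

# Broadcasting: the 1-D row is stretched across both rows of `data`
row = np.array([10.0, 20.0, 30.0])
shifted = data + row

# A standard mathematical function applied to the whole array
roots = np.sqrt(data)

# Linear algebra, random number generation, and Fourier transforms
product = data @ data.T                 # 2x2 matrix product
rng = np.random.default_rng(seed=0)     # seeded generator for repeatability
samples = rng.standard_normal(4)
spectrum = np.fft.fft(np.array([1.0, 0.0, 0.0, 0.0]))

print(shifted)
print(product)
```

Each operation acts on the array as a whole, which is the array-oriented style that tools such as pandas build on.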

<b>NumPy is a large topic. Appendix A covers more advanced NumPy features, such as broadcasting, for readers who want to go further.</b>

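### 8.1.1 The Main NumPy Operations
+ Fast vectorized array operations for data munging and cleaning, subsetting and filtering, and transformation.
+ Common array algorithms such as sorting, unique, and set operations.
+ Efficient descriptive statistics and aggregating/summarizing of data.
+ Data alignment and relational data manipulations for merging and joining heterogeneous datasets.
+ Expressing conditional logic as array expressions instead of loops with if-elif-else branches.
+ Group-wise data manipulations (aggregation, transformation, function application).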
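
Several of the operations in this list can be sketched in a few lines. The example below is only an illustration with made-up values; it covers filtering, sorting, unique values, a set operation, summary statistics, and conditional logic with `np.where`. The alignment, join, and group-wise items are more commonly handled with pandas and are not shown here.

```python
import numpy as np

# Arbitrary sample values, chosen only for illustration
values = np.array([3, -1, 4, -1, 5, -9, 2, 6])

# Subsetting and filtering with a boolean mask
positives = values[values > 0]

# Sorting, unique values, and a set operation
ordered = np.sort(values)
distinct = np.unique(values)
common = np.intersect1d(values, np.array([4, 5, 7]))

# Descriptive statistics over the whole array
total = values.sum()
mean = values.mean()

# Conditional logic as an array expression instead of an if/else loop:
# negative entries become 0, everything else is kept
clipped = np.where(values < 0, 0, values)

print(positives, ordered, distinct, common)
print(total, mean, clipped)
```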

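### 8.1.2 Why NumPy Handles Large Arrays Efficiently
+ NumPy stores data in a contiguous block of memory, independent of other built-in Python objects. NumPy's library of algorithms written in C can operate on this memory without any type checking or other overhead. NumPy arrays also use less memory than built-in Python sequences.
+ NumPy can perform complex computations on entire arrays without the need for Python for loops.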

In [3]:
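import numpy as np
# One million integers as a NumPy array and as an equivalent Python list
my_arr = np.arange(1000000)
my_list = list(range(1000000))
# Double every element ten times; %time is an IPython magic command
%time for _ in range(10): my_arr2 = my_arr * 2
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]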

CPU times: user 23.8 ms, sys: 15.3 ms, total: 39.1 ms
Wall time: 55.7 ms
CPU times: user 598 ms, sys: 157 ms, total: 755 ms
Wall time: 883 ms
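
The memory claim above can be checked in a similar way. The sketch below is a rough comparison rather than a precise measurement: it assumes CPython, where `sys.getsizeof` reports per-object sizes, and it sets the dtype explicitly because the default integer size depends on the platform. The variable names are chosen for this example.

```python
import sys
import numpy as np

n = 1_000_000
# dtype is set explicitly because the default integer size is platform dependent
arr = np.arange(n, dtype=np.int64)
lst = list(range(n))

# Bytes held by the array's data buffer: 8 bytes per int64 element
array_bytes = arr.nbytes

# Rough list footprint: the list's pointer table plus each int object
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

print(f"ndarray: {array_bytes / 1e6:.1f} MB")
print(f"list:    {list_bytes / 1e6:.1f} MB")
# A freshly created array is stored in one contiguous block
print("C-contiguous:", arr.flags["C_CONTIGUOUS"])
```

The list is larger because it stores a pointer to a separate Python object for every element, while the array stores the raw values side by side.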


In [8]:
a = 5
type(a)
a = "foo"
type(a)

In [9]:
"5" + 5  # raises TypeError: a str and an int cannot be concatenated

In [10]:
a = 4.5
b = 2
# String formatting, to be visited later
print(f"a is {type(a)}, b is {type(b)}")
a / b

In [11]:
a = 5
isinstance(a, int)

In [12]:
a = 5; b = 4.5
isinstance(a, (int, float))
isinstance(b, (int, float))

In [13]:
a = "foo"

In [14]:
getattr(a, "split")

In [15]:
def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError: # not iterable
        return False

In [16]:
isiterable("a string")
isiterable([1, 2, 3])
isiterable(5)

In [17]:
5 - 7
12 + 21.5
5 <= 2

In [18]:
a = [1, 2, 3]
b = a
c = list(a)
a is b
a is not c

In [19]:
a == c

In [20]:
a = None
a is None

In [21]:
a_list = ["foo", 2, [4, 5]]
a_list[2] = (3, 4)
a_list

In [22]:
a_tuple = (3, 5, (4, 5))
a_tuple[1] = "four"  # raises TypeError: tuples are immutable

In [23]:
ival = 17239871
ival ** 6

In [24]:
fval = 7.243
fval2 = 6.78e-5

In [25]:
3 / 2

In [26]:
3 // 2

In [27]:
c = """
This is a longer string that
spans multiple lines
"""

In [28]:
c.count("\n")

In [29]:
a = "this is a string"
a[10] = "f"  # raises TypeError: strings are immutable

In [30]:
b = a.replace("string", "longer string")
b

In [31]:
a

In [32]:
a = 5.6
s = str(a)
print(s)

In [33]:
s = "python"
list(s)
s[:3]

In [34]:
s = "12\\34"
print(s)

In [35]:
s = r"this\has\no\special\characters"
s

In [36]:
a = "this is the first half "
b = "and this is the second half"
a + b

In [37]:
template = "{0:.2f} {1:s} are worth US${2:d}"

In [38]:
template.format(88.46, "Argentine Pesos", 1)

In [39]:
amount = 10
rate = 88.46
currency = "Pesos"
result = f"{amount} {currency} is worth US${amount / rate}"

In [40]:
f"{amount} {currency} is worth US${amount / rate:.2f}"

In [41]:
val = "español"
val

In [42]:
val_utf8 = val.encode("utf-8")
val_utf8
type(val_utf8)

In [43]:
val_utf8.decode("utf-8")

In [44]:
val.encode("latin1")
val.encode("utf-16")
val.encode("utf-16le")

In [45]:
True and True
False or True

In [46]:
int(False)
int(True)

In [47]:
a = True
b = False
not a
not b

In [48]:
s = "3.14159"
fval = float(s)
type(fval)
int(fval)
bool(fval)
bool(0)

In [49]:
a = None
a is None
b = 5
b is not None

In [50]:
from datetime import datetime, date, time
dt = datetime(2011, 10, 29, 20, 30, 21)
dt.day
dt.minute

In [51]:
dt.date()
dt.time()

In [52]:
dt.strftime("%Y-%m-%d %H:%M")

In [53]:
datetime.strptime("20091031", "%Y%m%d")

In [54]:
dt_hour = dt.replace(minute=0, second=0)
dt_hour

In [55]:
dt

In [56]:
dt2 = datetime(2011, 11, 15, 22, 30)
delta = dt2 - dt
delta
type(delta)

In [57]:
dt
dt + delta

In [58]:
a = 5; b = 7
c = 8; d = 4
if a < b or c > d:
    print("Made it")

In [59]:
4 > 3 > 2 > 1

In [60]:
for i in range(4):
    for j in range(4):
        if j > i:
            break
        print((i, j))


In [61]:
range(10)
list(range(10))

In [62]:
list(range(0, 20, 2))
list(range(5, 0, -1))

In [63]:
seq = [1, 2, 3, 4]
for i in range(len(seq)):
    print(f"element {i}: {seq[i]}")

In [64]:
total = 0
for i in range(100_000):
    # % is the modulo operator
    if i % 3 == 0 or i % 5 == 0:
        total += i
print(total)