
# NumPy Study Handbook, Version 1.0


In [1]:
import numpy as np

# 1. Creation

## How to create a numpy array?

The most common way is to create one from a list, or a list-like object, by passing it to the **np.array** function.

In [2]:
list1 = list(range(4))
arr1d = np.array(list1)
arr1d

array([0, 1, 2, 3])

In [3]:
arr1d + 2

array([2, 3, 4, 5])

In [4]:
list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr2d = np.array(list2)
arr2d

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [5]:
arr2d_f = np.array(list2, dtype='float')
arr2d_f

array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])

In [6]:
arr2d_f.astype('int')

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [7]:
arr2d_b = np.array([1, 0, 1], dtype='bool')
arr2d_b

array([ True, False,  True])

In [8]:
arr1d_obj = np.array([1, 'a'], dtype='object')
arr1d_obj

array([1, 'a'], dtype=object)

In [9]:
arr1d_obj.tolist()

[1, 'a']

## How to create sequences, repetitions and random numbers using numpy?

The **np.arange** function comes in handy for creating customised number sequences as an ndarray.

In [10]:
# Lower limit is 0 by default
np.arange(5)

array([0, 1, 2, 3, 4])

In [11]:
# 0 to 9
print(np.arange(0, 10))  

[0 1 2 3 4 5 6 7 8 9]


In [12]:
# 0 to 9 with step of 2
print(np.arange(0, 10, 2)) 

[0 2 4 6 8]


In [13]:
# 10 to 1, decreasing order
print(np.arange(10, 0, -1))

[10  9  8  7  6  5  4  3  2  1]


You can set the start and end positions using np.arange. But if you care about **the number of items** in the array, you have to calculate the appropriate step value manually.

Say you want to create an array of exactly 10 numbers between 1 and 50. Can you compute what the step value would be?

Well, I am going to use **np.linspace** instead.

In [14]:
# Start at 1 and end at 50
np.linspace(start=1, stop=50, num=10, dtype=int)

array([ 1,  6, 11, 17, 22, 28, 33, 39, 44, 50])
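If you also want to see the step that np.linspace picked, or to exclude the stop value, the retstep and endpoint parameters cover both. A minimal sketch (the variable names are illustrative):

```python
import numpy as np

# retstep=True also returns the spacing that linspace computed
values, step = np.linspace(start=1, stop=50, num=10, retstep=True)
print(step)  # (50 - 1) / (10 - 1) = 49 / 9

# endpoint=False excludes 'stop', so the step becomes (stop - start) / num
half_open = np.linspace(0, 1, num=5, endpoint=False)
print(half_open)  # values: 0.0, 0.2, 0.4, 0.6, 0.8
```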

In [15]:
# Limit the number of digits after the decimal to 2
np.set_printoptions(precision=2)  

# Start at 10^1 and end at 10^50
np.logspace(start=1, stop=50, num=10, base=10) 

array([1.00e+01, 2.78e+06, 7.74e+11, 2.15e+17, 5.99e+22, 1.67e+28,
       4.64e+33, 1.29e+39, 3.59e+44, 1.00e+50])

In [16]:
np.zeros([2,3])

array([[0., 0., 0.],
       [0., 0., 0.]])

In [17]:
np.ones([3,4])

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [18]:
np.full((3, 3), True, dtype=bool)

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

Summary of numpy arrays:  

+ Arrays support vectorised operations, while lists don’t.
+ Once an array is created, you cannot change its size. You will have to create a new array or overwrite the existing one.
+ Every array has one and only one dtype. All items in it should be of that dtype.
+ An equivalent numpy array occupies much less space than a python list of lists.
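A quick sketch of the first three points above (the variable names are illustrative only):

```python
import numpy as np

py_list = [0, 1, 2, 3]
arr = np.array(py_list)

# '*' on a list repeats the list; on an array it multiplies element-wise
list_times_two = py_list * 2   # [0, 1, 2, 3, 0, 1, 2, 3]
arr_times_two = arr * 2        # array([0, 2, 4, 6])

# np.append does not grow 'arr' in place; it returns a new, larger array
bigger = np.append(arr, 4)
print(arr.size, bigger.size)   # 4 5

# One dtype per array: mixing ints and floats upcasts everything to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)             # float64
```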

## How to create a 2D array containing random floats between 5 and 10?

In [19]:
np.random.uniform(5,10, size=(3,5))

array([[5.71, 8.8 , 6.41, 6.61, 6.81],
       [9.54, 7.31, 5.75, 6.58, 6.96],
       [7.25, 7.37, 8.44, 5.23, 6.92]])

## How to generate random numbers?

The **np.random** module provides nice functions to generate random numbers (and also statistical distributions) of any given shape.

In [20]:
# Random numbers between [0,1) of shape 2,3
print(np.random.rand(2,3))

[[0.36 0.46 0.7 ]
 [0.24 0.9  0.74]]


In [21]:
# Normal distribution with mean=0 and variance=1 of shape 2,3
print(np.random.randn(2,3))

[[-0.84 -1.92 -0.04]
 [ 0.48  0.82  1.14]]


In [22]:
# Random integers between [0, 10) of shape 2,3
print(np.random.randint(0, 10, size=[2,3]))

[[7 9 0]
 [5 7 6]]


In [23]:
# One random number between [0,1)
print(np.random.random())

0.8461802742171424


In [24]:
# Random numbers between [0,1) of shape 2,3
print(np.random.random(size=[2,3]))

[[0.53 0.99 0.28]
 [0.4  0.74 0.09]]


In [25]:
# Pick 10 items from a given list, with equal probability
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10))  

['a' 'i' 'o' 'a' 'a' 'a' 'e' 'u' 'u' 'e']


In [26]:
# Pick 10 items from a given list with a predefined probability 'p'
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10, p=[0.3, .1, 0.1, 0.4, 0.1])) 

['o' 'o' 'o' 'u' 'a' 'o' 'o' 'a' 'a' 'o']


Now, every time you run any of the above functions, you get a different set of random numbers.

If you want to **repeat the same set of random numbers every time**, you need to set the seed or the random state. The seed can be any integer between 0 and 2**32 - 1. The only requirement is that you must set the seed to the same value every time you want to generate the same set of random numbers.

Once a **np.random.RandomState** object is created, the random-number functions of the np.random module (rand, randn, randint, choice, ...) are available as methods of that object.

In [27]:
# Create the random state
rn = np.random.RandomState(100)

# Create random numbers between [0,1) of shape 2,2
print(rn.rand(2,2))

[[0.54 0.28]
 [0.42 0.84]]


In [28]:
# Set the random seed
np.random.seed(100)

# Create random numbers between [0,1) of shape 2,2
print(np.random.rand(2,2))

[[0.54 0.28]
 [0.42 0.84]]
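In NumPy 1.17 and later (newer than the 1.14.3 shown at the end of this notebook), the recommended way to get reproducible random numbers is a Generator created with np.random.default_rng. A minimal sketch, assuming such a NumPy version; note that Generator uses a different algorithm than RandomState, so the same seed will not reproduce the numbers above:

```python
import numpy as np

# Two generators built from the same seed produce identical streams
rng_a = np.random.default_rng(100)
rng_b = np.random.default_rng(100)

sample_a = rng_a.random((2, 2))   # floats in [0, 1) of shape 2,2
sample_b = rng_b.random((2, 2))
print(np.array_equal(sample_a, sample_b))  # True

# Generator uses integers() instead of randint(); 'high' is exclusive by default
ints = rng_a.integers(0, 10, size=5)
```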


## How to create repeating sequences?

**np.tile** repeats a whole list or array n times, whereas **np.repeat** repeats each item n times.

In [29]:
a = [1, 2, 3]

In [30]:
# Repeat whole of 'a' two times
print('Tile:   ', np.tile(a, 2))

Tile:    [1 2 3 1 2 3]


In [31]:
# Repeat each element of 'a' two times
print('Repeat: ', np.repeat(a, 2))

Repeat:  [1 1 2 2 3 3]
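On 2D input both functions accept more control: np.tile takes a tuple of repetitions per axis, and np.repeat takes an axis argument. A small sketch with an illustrative array m:

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])

# np.tile with a tuple of repetitions: 2 copies down, 1 copy across
tiled = np.tile(m, (2, 1))
# [[1 2]
#  [3 4]
#  [1 2]
#  [3 4]]

# np.repeat with axis=0 repeats each row in place
rep_rows = np.repeat(m, 2, axis=0)
# [[1 2]
#  [1 2]
#  [3 4]
#  [3 4]]

# Without an axis, np.repeat flattens the input first
rep_flat = np.repeat(m, 2)   # [1 1 2 2 3 3 4 4]
```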


## How to create a new array from an existing array?

If you just assign a slice of an array to another variable, the new array is a view: it actually refers to the parent array's memory.

That means any change you make to the new array is reflected in the parent array as well.

So to avoid disturbing the parent array, you need to make a copy of it using **copy()**. All numpy arrays come with the copy() method.

In [32]:
arr2 = np.arange(15).reshape((3,5))
arr2a = arr2[:2, :2]
arr2a[:1, :1] = 100
arr2

array([[100,   1,   2,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  11,  12,  13,  14]])

In [33]:
arr2b = arr2[:2, :2].copy()
arr2b[:1, :1] = 101
arr2

array([[100,   1,   2,   3,   4],
       [  5,   6,   7,   8,   9],
       [ 10,  11,  12,  13,  14]])
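To check whether two arrays share memory, instead of inferring it from side effects, np.shares_memory can be used. A sketch reusing the same setup (names are illustrative); it also shows that fancy (integer-array) indexing returns a copy, unlike basic slicing:

```python
import numpy as np

parent = np.arange(15).reshape((3, 5))
view = parent[:2, :2]           # basic slicing returns a view
copy = parent[:2, :2].copy()    # copy() allocates new memory
fancy = parent[[0, 1], :]       # integer-array indexing always copies

print(np.shares_memory(parent, view))   # True
print(np.shares_memory(parent, copy))   # False
print(np.shares_memory(parent, fancy))  # False
print(copy.base is None)                # True: copy owns its data
```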

## How to create a sequence of dates?

In [34]:
# Create date sequence
dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-10'))
print(dates)

# Check whether each date is a business day
np.is_busday(dates)

['2018-02-01' '2018-02-02' '2018-02-03' '2018-02-04' '2018-02-05'
 '2018-02-06' '2018-02-07' '2018-02-08' '2018-02-09']


array([ True,  True, False, False,  True,  True,  True,  True,  True])
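NumPy's business-day helpers go further than np.is_busday. A sketch using np.busday_count, whose end date is exclusive, plus a np.timedelta64 step to build weekly dates (the holiday date is a made-up example):

```python
import numpy as np

# Business days in the half-open interval [2018-02-01, 2018-02-10)
n_busdays = np.busday_count('2018-02-01', '2018-02-10')
print(n_busdays)  # 7

# Holidays are excluded as well (hypothetical holiday on Monday 2018-02-05)
n_with_holiday = np.busday_count('2018-02-01', '2018-02-10',
                                 holidays=['2018-02-05'])
print(n_with_holiday)  # 6

# A timedelta64 step of 7 days gives weekly dates
weekly = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-03-01'),
                   np.timedelta64(7, 'D'))
print(weekly)  # ['2018-02-01' '2018-02-08' '2018-02-15' '2018-02-22']
```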

# 2. Querying

## How to inspect the size and shape of a numpy array?

In [35]:
list2 = [[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]]
arr2 = np.array(list2, dtype='float')
arr2

array([[1., 2., 3., 4.],
       [3., 4., 5., 6.],
       [5., 6., 7., 8.]])

In [36]:
print('Shape:\t',arr2.shape)
print('Datatype:\t',arr2.dtype)
print('Size:\t',arr2.size)
print('Num Dimensions',arr2.ndim)

Shape:	 (3, 4)
Datatype:	 float64
Size:	 12
Num Dimensions 2


## How to compute mean, min, max on the ndarray?

In [37]:
arr2

array([[1., 2., 3., 4.],
       [3., 4., 5., 6.],
       [5., 6., 7., 8.]])

In [38]:
# mean, max and min
print("Mean value is: ", arr2.mean())
print("Max value is: ", arr2.max())
print("Min value is: ", arr2.min())

Mean value is:  4.5
Max value is:  8.0
Min value is:  1.0


In [39]:
# Row wise and column wise min
print("Column wise minimum: ", np.amin(arr2, axis=0))
print("Row wise minimum: ", np.amin(arr2, axis=1))

Column wise minimum:  [1. 2. 3. 4.]
Row wise minimum:  [1. 3. 5.]
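The same axis convention applies to the other reductions: axis=0 collapses the rows (one result per column) and axis=1 collapses the columns (one result per row). A sketch on the same arr2 values:

```python
import numpy as np

arr2 = np.array([[1., 2., 3., 4.],
                 [3., 4., 5., 6.],
                 [5., 6., 7., 8.]])

# axis=0 collapses the rows: one value per column
col_means = arr2.mean(axis=0)   # [3. 4. 5. 6.]
# axis=1 collapses the columns: one value per row
row_means = arr2.mean(axis=1)   # [2.5 4.5 6.5]
# The method form arr.min(axis=...) is equivalent to np.amin(arr, axis=...)
col_mins = arr2.min(axis=0)     # [1. 2. 3. 4.]
```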


## How to extract specific items from an array?

In [40]:
arr2

array([[1., 2., 3., 4.],
       [3., 4., 5., 6.],
       [5., 6., 7., 8.]])

In [41]:
arr2[:2, :2]

array([[1., 2.],
       [3., 4.]])

NumPy arrays support **boolean indexing**.  
A boolean index array has the same shape as the array to be filtered and contains only True and False values. The values at the True positions are retained in the output, which is returned as a 1D array.

In [42]:
b = arr2 > 4
b

array([[False, False, False, False],
       [False, False,  True,  True],
       [ True,  True,  True,  True]])

In [43]:
arr2[b]

array([5., 6., 5., 6., 7., 8.])

## How to extract all numbers between a given range from a numpy array?

In [44]:
a = np.arange(15)

# Method 1: np.where returns a tuple of index arrays
index = np.where((a >= 5) & (a <= 10))
a[index]

# Method 2: np.logical_and instead of the & operator
index = np.where(np.logical_and(a >= 5, a <= 10))
a[index]

# Method 3: boolean indexing directly (thanks loganzk!)
a[(a >= 5) & (a <= 10)]

array([ 5,  6,  7,  8,  9, 10])

## How to reverse the rows and the whole array?

In [45]:
arr2

array([[1., 2., 3., 4.],
       [3., 4., 5., 6.],
       [5., 6., 7., 8.]])

In [46]:
arr2[::-1,]

array([[5., 6., 7., 8.],
       [3., 4., 5., 6.],
       [1., 2., 3., 4.]])

In [47]:
arr2[::-1, ::-1]

array([[8., 7., 6., 5.],
       [6., 5., 4., 3.],
       [4., 3., 2., 1.]])

## How to represent missing values and infinity?

In [48]:
arr2[1, 1] = np.nan # not a number
arr2[1, 2] = np.inf # infinite
arr2

array([[ 1.,  2.,  3.,  4.],
       [ 3., nan, inf,  6.],
       [ 5.,  6.,  7.,  8.]])

In [49]:
missing_bool = np.isnan(arr2) | np.isinf(arr2)
missing_bool

array([[False, False, False, False],
       [False,  True,  True, False],
       [False, False, False, False]])

In [50]:
arr2[missing_bool] = -1
arr2

array([[ 1.,  2.,  3.,  4.],
       [ 3., -1., -1.,  6.],
       [ 5.,  6.,  7.,  8.]])
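Two details worth knowing, sketched below with an illustrative array x: NaN is not equal to anything, including itself, so it has to be found with np.isnan, and ~np.isfinite flags NaN and both infinities in a single call. NaN-aware reductions such as np.nanmean then ignore NaN values.

```python
import numpy as np

x = np.array([1.0, np.nan, np.inf, -np.inf, 5.0])

# NaN never compares equal, even to itself, so '== np.nan' cannot find it
print(np.nan == np.nan)       # False
print((x == np.nan).any())    # False

# ~np.isfinite catches NaN and +/-inf in one call
bad = ~np.isfinite(x)         # [False  True  True  True False]

# NaN-aware reductions skip NaN (but not inf)
y = np.array([1.0, np.nan, 3.0])
print(np.nanmean(y))          # 2.0
```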

## Reshaping and Flattening Multidimensional arrays

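**flatten()** or **ravel()**  
Reshaping changes the arrangement of items so that the shape of the array changes while the total number of items stays the same. Flattening, with **flatten()** or **ravel()**, turns a multidimensional array into a 1D array: flatten() always returns a copy, while ravel() returns a view of the parent whenever possible.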

In [51]:
# Flatten it to a 1d array
arr2.flatten()

array([ 1.,  2.,  3.,  4.,  3., -1., -1.,  6.,  5.,  6.,  7.,  8.])

In [52]:
arr2.flatten().shape

(12,)

In [53]:
# Changing the flattened array does not change parent
b1 = arr2.flatten()
b1[0] = 1000  # changing b1 does not affect arr2
arr2

array([[ 1.,  2.,  3.,  4.],
       [ 3., -1., -1.,  6.],
       [ 5.,  6.,  7.,  8.]])

In [54]:
# ravel() returns a view when possible, so changing it changes the parent too
b2 = arr2.ravel()  
b2[0] = 101  # changing b2 changes arr2 also
arr2

array([[101.,   2.,   3.,   4.],
       [  3.,  -1.,  -1.,   6.],
       [  5.,   6.,   7.,   8.]])
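The section title also mentions reshaping, which the cells above do not show. A sketch with illustrative arrays: reshape keeps the total number of items and accepts -1 for one inferred dimension, and ravel() falls back to a copy when the data is not contiguous in the requested order (for example, a transposed array):

```python
import numpy as np

a = np.arange(12)

# reshape keeps all 12 items; -1 lets NumPy infer that dimension
b = a.reshape(3, -1)      # shape (3, 4)
c = a.reshape(2, 2, 3)    # shape (2, 2, 3)

# ravel() returns a view only when it can; for a transposed
# (non C-contiguous) array it has to copy
m = np.arange(6).reshape(2, 3)
print(np.shares_memory(m, m.ravel()))     # True
print(np.shares_memory(m, m.T.ravel()))   # False
```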

## How to get the unique items and the counts?

The **np.unique** method can be used to get the unique items. If you want the repetition counts of each item, set the return_counts parameter to True.

In [55]:
# Create random integers of size 10 between [0,10)
np.random.seed(100)
arr_rand = np.random.randint(0, 10, size=10)
print(arr_rand)

[8 8 3 7 7 0 4 2 5 2]


In [56]:
# Get the unique items and their counts
uniqs, counts = np.unique(arr_rand, return_counts=True)
print("Unique items : ", uniqs)
print("Counts       : ", counts)

Unique items :  [0 2 3 4 5 7 8]
Counts       :  [1 2 1 1 1 2 2]
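np.unique can also return the inverse mapping: for every original item, its position in the unique array. A sketch on the same values as arr_rand above:

```python
import numpy as np

arr_rand = np.array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])

# return_inverse gives, for each original item, its index in 'uniqs'
uniqs, inverse = np.unique(arr_rand, return_inverse=True)
print(inverse)  # [6 6 2 5 5 0 3 1 4 1]

# Indexing the unique values with the inverse rebuilds the original array
print(np.array_equal(uniqs[inverse], arr_rand))  # True
```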


## How to get index locations that satisfy a given condition using np.where?

Previously you saw how to extract items from an array that satisfy a given condition. Boolean indexing, remember?

But sometimes you want to know the index positions of the items that satisfy a condition, so that you can do whatever you want with them.

**np.where** locates the positions in the array where a given condition holds true.

In [57]:
arr_rand = np.array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])
print("Array: ", arr_rand)

# Positions where value > 5
index_gt5 = np.where(arr_rand > 5)
print("Positions where value > 5: ", index_gt5)

Array:  [8 8 3 7 7 0 4 2 5 2]
Positions where value > 5:  (array([0, 1, 3, 4], dtype=int64),)


In [58]:
# Take items at the given positions (np.where returns a tuple, so use its first element)
arr_rand.take(index_gt5[0])

array([8, 8, 7, 7])

In [59]:
# If value > 5, then yield 'gt5' else 'le5'
np.where(arr_rand > 5, 'gt5', 'le5')

array(['gt5', 'gt5', 'le5', 'gt5', 'gt5', 'le5', 'le5', 'le5', 'le5',
       'le5'], dtype='<U3')

In [60]:
# If value > 5, then yield arr_rand else 0
np.where(arr_rand > 5, arr_rand, 0)

array([8, 8, 0, 7, 7, 0, 0, 0, 0, 0])

In [62]:
# Location of the max
print('Position of max value: ', np.argmax(arr_rand))  

# Location of the min
print('Position of min value: ', np.argmin(arr_rand))  

Position of max value:  0
Position of min value:  5
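On a 2D array, np.argmax without an axis returns a position in the flattened array; np.unravel_index converts it back to a (row, column) pair. A sketch with an illustrative array grid:

```python
import numpy as np

grid = np.array([[1, 5, 2],
                 [7, 3, 7]])

# Without an axis, argmax works on the flattened array
flat_pos = np.argmax(grid)                          # 3
# unravel_index turns a flat position into (row, col)
row, col = np.unravel_index(flat_pos, grid.shape)   # (1, 0)

# With an axis you get one position per row; ties resolve to the first occurrence
per_row = np.argmax(grid, axis=1)                   # [1 0]
```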


# 3. Operations

## Concatenation
---------------------

## How to concatenate two numpy arrays column-wise and row-wise?

There are 3 different ways of concatenating two or more numpy arrays.

+ Method 1: **np.concatenate** by changing the axis parameter to 0 and 1
+ Method 2: **np.vstack** and **np.hstack**
+ Method 3: **np.r_** and **np.c_**  

For the 2D arrays used here, all three methods provide the same output.

One key difference to notice: unlike the other two methods, np.r_ and np.c_ use **square brackets** to stack arrays. But first, let me create the arrays to be concatenated.

In [63]:
a = np.zeros([2, 3])
b = np.ones([2, 3])
print('a:\n',a)
print('b:\n',b)

a:
 [[0. 0. 0.]
 [0. 0. 0.]]
b:
 [[1. 1. 1.]
 [1. 1. 1.]]


In [64]:
# Vertical Stack Equivalents (Row wise)
np.concatenate([a, b], axis=0)  
np.vstack([a,b])  
np.r_[a,b]  

array([[0., 0., 0.],
       [0., 0., 0.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [65]:
# Horizontal Stack Equivalents (Column wise)
np.concatenate([a, b], axis=1) 
np.hstack([a,b])  
np.c_[a,b]

array([[0., 0., 0., 1., 1., 1.],
       [0., 0., 0., 1., 1., 1.]])
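The equivalence above holds for 2D inputs. With 1D inputs the functions behave differently, because np.vstack first turns each 1D array into a row and np.c_ turns each into a column, while np.concatenate and np.hstack simply join them end to end. np.stack, by contrast, always adds a new axis. A sketch (u and v are illustrative):

```python
import numpy as np

a = np.zeros([2, 3])
b = np.ones([2, 3])

# For 2D inputs the row-wise forms really are interchangeable
assert np.array_equal(np.concatenate([a, b], axis=0), np.vstack([a, b]))
assert np.array_equal(np.vstack([a, b]), np.r_[a, b])

# For 1D inputs they differ
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
print(np.vstack([u, v]).shape)       # (2, 3): each 1D array becomes a row
print(np.concatenate([u, v]).shape)  # (6,)
print(np.hstack([u, v]).shape)       # (6,)
print(np.c_[u, v].shape)             # (3, 2): each 1D array becomes a column

# np.stack always adds a new leading axis
print(np.stack([a, b]).shape)        # (2, 2, 3)
```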

In [66]:
np.r_[[1,2,3], 0, 0, [4,5,6]]

array([1, 2, 3, 0, 0, 4, 5, 6])

## Sorting
-----

## How to sort each column of a numpy array independently?

In [67]:
arr = np.random.randint(1,6, size=[3, 5])
arr

array([[3, 3, 2, 1, 1],
       [5, 4, 5, 3, 1],
       [4, 2, 3, 4, 5]])

In [68]:
# Sort each column of arr independently (rows are not kept intact)
np.sort(arr, axis=0)

array([[3, 2, 2, 1, 1],
       [4, 3, 3, 3, 1],
       [5, 4, 5, 4, 5]])

## How to sort a numpy array based on 1 column using argsort?

**np.argsort** returns the index positions that would sort a given 1D array.

In [69]:
# Get the index positions that would sort the array
x = np.array([1, 10, 5, 2, 8, 9])
sort_index = np.argsort(x)
print(sort_index)

[0 3 2 4 5 1]


In [70]:
x[sort_index]

array([ 1,  2,  5,  8,  9, 10])

In [71]:
# Argsort the first column
sorted_index_1stcol = arr[:, 0].argsort()

# Sort 'arr' by first column without disturbing the integrity of rows
arr[sorted_index_1stcol]

array([[3, 3, 2, 1, 1],
       [4, 2, 3, 4, 5],
       [5, 4, 5, 3, 1]])

In [72]:
# Descending sort
arr[sorted_index_1stcol[::-1]]

array([[5, 4, 5, 3, 1],
       [4, 2, 3, 4, 5],
       [3, 3, 2, 1, 1]])

## How to sort a numpy array based on 2 or more columns?

+ You can do this using **np.lexsort** by passing a tuple of columns (sort keys) based on which the array should be sorted.

+ Just remember that np.lexsort treats the **last (rightmost) key in the tuple as the primary key**, so place the column to be sorted by first at the rightmost position.

In [73]:
# Sort by column 0, then by column 1
lexsorted_index = np.lexsort((arr[:, 1], arr[:, 0])) 
arr[lexsorted_index]

array([[3, 3, 2, 1, 1],
       [4, 2, 3, 4, 5],
       [5, 4, 5, 3, 1]])
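In the example above, column 0 has no ties, so the secondary key never comes into play. A sketch with an illustrative array where it does:

```python
import numpy as np

data = np.array([[2, 9],
                 [1, 5],
                 [2, 3],
                 [1, 7]])

# Primary key (column 0) goes LAST in the tuple; the tie-breaker (column 1) before it
order = np.lexsort((data[:, 1], data[:, 0]))
print(order)        # [1 3 2 0]
print(data[order])
# [[1 5]
#  [1 7]
#  [2 3]
#  [2 9]]
```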

## Reordering rows and columns
---

## How to swap two columns in a 2d numpy array?

In [74]:
# Input
arr = np.arange(9).reshape(3,3)
print('arr:\n',arr)
# Solution
print('after change:\n',arr[:, [1,0,2]])

arr:
 [[0 1 2]
 [3 4 5]
 [6 7 8]]
after change:
 [[1 0 2]
 [4 3 5]
 [7 6 8]]


## How to swap two rows in a 2d numpy array?

In [75]:
# Input
arr = np.arange(9).reshape(3,3)
print('arr:\n',arr)
# Solution
print('after change:\n',arr[[1,0,2], :])

arr:
 [[0 1 2]
 [3 4 5]
 [6 7 8]]
after change:
 [[3 4 5]
 [0 1 2]
 [6 7 8]]


## How to reverse the rows of a 2D array?

In [76]:
# Input
arr = np.arange(9).reshape(3,3)
print('arr:\n',arr)
# Solution
print('after change:\n',arr[::-1])

arr:
 [[0 1 2]
 [3 4 5]
 [6 7 8]]
after change:
 [[6 7 8]
 [3 4 5]
 [0 1 2]]


## How to reverse the columns of a 2D array?

In [77]:
# Input
arr = np.arange(9).reshape(3,3)
print('arr:\n',arr)
# Solution
print('after change:\n',arr[:, ::-1])

arr:
 [[0 1 2]
 [3 4 5]
 [6 7 8]]
after change:
 [[2 1 0]
 [5 4 3]
 [8 7 6]]


## Adding dimensions
----

In [78]:
# Create a 1D array
x = np.arange(5)
print('Original array: ', x)

# Introduce a new column axis
x_col = x[:, np.newaxis]
print('x_col shape: ', x_col.shape)
print(x_col)

# Introduce a new row axis
x_row = x[np.newaxis, :]
print('x_row shape: ', x_row.shape)
print(x_row)

Original array:  [0 1 2 3 4]
x_col shape:  (5, 1)
[[0]
 [1]
 [2]
 [3]
 [4]]
x_row shape:  (1, 5)
[[0 1 2 3 4]]
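np.newaxis is just an alias for None, and np.expand_dims is the function form of the same operation. A column and a row created this way broadcast against each other to a full 2D table. A short sketch:

```python
import numpy as np

x = np.arange(5)

# np.newaxis is simply an alias for None
print(np.newaxis is None)               # True

# np.expand_dims is the function form of the same idea
print(np.expand_dims(x, axis=1).shape)  # (5, 1)
print(np.expand_dims(x, axis=0).shape)  # (1, 5)

# A (5, 1) column plus a (1, 5) row broadcasts to a (5, 5) table
table = x[:, np.newaxis] + x[np.newaxis, :]
print(table.shape)                      # (5, 5)
```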


## Set operations
---

In [79]:
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])

In [80]:
np.intersect1d(a,b)

array([2, 4])

In [81]:
np.setdiff1d(a,b)

array([1, 3, 5, 6])
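A few related set routines, sketched on the same a and b: np.union1d, np.setxor1d (values found in exactly one array), and np.isin, which, unlike the others, keeps the shape and duplicates of its first argument instead of returning sorted unique values.

```python
import numpy as np

a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])

# Sorted unique values found in either array
print(np.union1d(a, b))    # [ 1  2  3  4  5  6  7  8  9 10]
# Sorted unique values found in exactly one of the arrays
print(np.setxor1d(a, b))   # [ 1  3  5  6  7  8  9 10]
# Element-wise membership test: one boolean per item of a
mask = np.isin(a, b)
print(a[mask])             # [2 2 4 4]
```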

## Custom functions
-----

## vectorize – Make a scalar function work on vectors

With the help of **np.vectorize()**, you can make a function that is meant to work on individual numbers also work on arrays.

In [82]:
# Define a scalar function
def foo(x):
    if x % 2 == 1:
        return x**2
    else:
        return x/2

In [83]:
# Vectorize foo(). Make it work on vectors.
foo_v = np.vectorize(foo, otypes=[float])

In [84]:
print('x = [10, 11, 12] returns ', foo_v([10, 11, 12]))

x = [10, 11, 12] returns  [  5. 121.   6.]
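According to the NumPy documentation, np.vectorize is provided mainly for convenience and is essentially a Python for loop, so it does not speed anything up. For a simple branch like foo, np.where gives a genuinely vectorised version. A sketch (foo_fast is a hypothetical name):

```python
import numpy as np

def foo_fast(x):
    x = np.asarray(x)
    # Both branches are computed for every element; np.where picks one per element
    return np.where(x % 2 == 1, x ** 2, x / 2)

print(foo_fast([10, 11, 12]))  # [  5. 121.   6.]
```

The trade-off is that both branches are evaluated for all elements, which is fine here but matters if one branch is expensive or can fail for some inputs.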


## apply_along_axis – Apply a function column wise or row wise

In [85]:
# Create a 4x10 random array
np.random.seed(100)
arr_x = np.random.randint(1,10,size=[4,10])
arr_x

array([[9, 9, 4, 8, 8, 1, 5, 3, 6, 3],
       [3, 3, 2, 1, 9, 5, 1, 7, 3, 5],
       [2, 6, 4, 5, 5, 4, 8, 2, 2, 8],
       [8, 1, 3, 4, 3, 6, 9, 2, 1, 8]])

In [86]:
# Define func1d
def max_minus_min(x):
    return np.max(x) - np.min(x)

# Apply along the rows
print('Row wise: ', np.apply_along_axis(max_minus_min, 1, arr=arr_x))

# Apply along the columns
print('Column wise: ', np.apply_along_axis(max_minus_min, 0, arr=arr_x))

Row wise:  [8 8 6 8]
Column wise:  [7 8 2 7 6 5 8 5 5 5]
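Because max_minus_min is built from NumPy reductions, the same result can be obtained without np.apply_along_axis by passing the axis directly, or with np.ptp ("peak to peak"). A sketch on a small illustrative array:

```python
import numpy as np

arr = np.array([[9, 9, 4, 8],
                [3, 3, 2, 1]])

def max_minus_min(x):
    return np.max(x) - np.min(x)

# np.apply_along_axis calls the Python function once per 1D slice
looped = np.apply_along_axis(max_minus_min, 1, arr)   # [5 2]

# Axis-aware reductions compute the same thing without a Python-level loop
direct = arr.max(axis=1) - arr.min(axis=1)            # [5 2]
ptp = np.ptp(arr, axis=1)                             # [5 2]
```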


In [87]:
arr_x_3D = np.random.randint(1,10,size=[3,4,5])
arr_x_3D

array([[[7, 3, 1, 9, 3],
        [6, 2, 9, 2, 6],
        [5, 3, 9, 4, 6],
        [1, 4, 7, 4, 5]],

       [[8, 7, 4, 1, 5],
        [5, 6, 8, 7, 7],
        [3, 5, 3, 8, 2],
        [7, 7, 1, 8, 3]],

       [[4, 6, 5, 3, 5],
        [4, 8, 1, 1, 6],
        [7, 7, 6, 7, 5],
        [8, 4, 3, 4, 9]]])

In [88]:
# Apply along axis 1: one result per column of each 2D slice
B = np.apply_along_axis(max_minus_min, 1, arr=arr_x_3D)
print('Column wise (axis=1): ', B)
print('output shape:', B.shape)

Column wise (axis=1):  [[6 2 8 7 3]
 [5 2 7 7 5]
 [4 4 5 6 4]]
output shape: (3, 5)


In [89]:
# Apply along axis 2: one result per row of each 2D slice
C = np.apply_along_axis(max_minus_min, 2, arr=arr_x_3D)
print('Row wise: ', C)
print('output shape:', C.shape)

Row wise:  [[8 7 6 6]
 [7 3 6 7]
 [3 7 2 6]]
output shape: (3, 4)


In [90]:
C_add_axis = C[:,:,np.newaxis]

In [91]:
C_add_axis

array([[[8],
        [7],
        [6],
        [6]],

       [[7],
        [3],
        [6],
        [7]],

       [[3],
        [7],
        [2],
        [6]]])

In [92]:
arr_x_3D - C_add_axis

array([[[-1, -5, -7,  1, -5],
        [-1, -5,  2, -5, -1],
        [-1, -3,  3, -2,  0],
        [-5, -2,  1, -2, -1]],

       [[ 1,  0, -3, -6, -2],
        [ 2,  3,  5,  4,  4],
        [-3, -1, -3,  2, -4],
        [ 0,  0, -6,  1, -4]],

       [[ 1,  3,  2,  0,  2],
        [-3,  1, -6, -6, -1],
        [ 5,  5,  4,  5,  3],
        [ 2, -2, -3, -2,  3]]])

## Broadcasting
---------

In [93]:
arr_x_3D = np.random.randint(1,10,size=[3,4,5])
arr_x_3D

array([[[8, 2, 6, 4, 1],
        [7, 3, 4, 5, 9],
        [9, 6, 3, 8, 6],
        [1, 9, 7, 3, 1]],

       [[6, 4, 3, 4, 7],
        [5, 2, 4, 2, 5],
        [9, 9, 3, 3, 8],
        [3, 2, 3, 8, 2]],

       [[1, 6, 4, 6, 3],
        [7, 2, 2, 6, 3],
        [6, 7, 5, 7, 8],
        [8, 4, 1, 3, 6]]])

In [94]:
arr_x_3D_c_mean = np.mean(arr_x_3D, axis=1)

In [95]:
arr_x_3D_c_mean = arr_x_3D_c_mean[:,np.newaxis,:]

In [96]:
arr_x_3D_1 = arr_x_3D - arr_x_3D_c_mean

In [97]:
arr_x_3D_c_std = np.std(arr_x_3D, axis=1)

In [98]:
arr_x_3D_c_std = arr_x_3D_c_std[:,np.newaxis,:]

In [99]:
arr_x_3D_2 = arr_x_3D_1 / arr_x_3D_c_std

In [100]:
np.mean(arr_x_3D_2, axis=1)

array([[-5.55e-17,  0.00e+00,  0.00e+00,  0.00e+00,  2.78e-17],
       [-5.55e-17,  0.00e+00, -5.55e-17, -5.55e-17,  0.00e+00],
       [-2.78e-17,  1.39e-17,  0.00e+00, -5.55e-17, -1.39e-17]])

In [101]:
np.std(arr_x_3D_2, axis=1)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
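The manual np.newaxis step above can be avoided with keepdims=True, which keeps the reduced axis with length 1 so the result broadcasts against the original array. A sketch on deterministic stand-in data (np.arange instead of random integers, so no column has zero standard deviation):

```python
import numpy as np

# Deterministic stand-in for the random (3, 4, 5) array above
arr_x_3D = np.arange(60, dtype=float).reshape(3, 4, 5)

# Manual approach from the cells above: reduce, then re-insert axis 1
mean_manual = np.mean(arr_x_3D, axis=1)[:, np.newaxis, :]

# keepdims=True keeps axis 1 with length 1, so the result broadcasts directly
mean_kd = arr_x_3D.mean(axis=1, keepdims=True)
std_kd = arr_x_3D.std(axis=1, keepdims=True)
print(mean_kd.shape)  # (3, 1, 5)

# (3, 4, 5) op (3, 1, 5): the length-1 axis is stretched to length 4
standardised = (arr_x_3D - mean_kd) / std_kd
```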

## Some useful functions
---

Binning: use **np.digitize** to get, for each element, the index of the bin it belongs to.

In [102]:
# Create the array and bins
x = np.arange(10)
bins = np.array([0, 3, 6, 9])

# Get bin allotments
np.digitize(x, bins)

array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4], dtype=int64)
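By default a value equal to a bin edge goes into the bin on its right. The right=True parameter flips that, as this sketch on the same x and bins shows:

```python
import numpy as np

x = np.arange(10)
bins = np.array([0, 3, 6, 9])

# Default (right=False): bins[i-1] <= x < bins[i]
print(np.digitize(x, bins))              # [1 1 1 2 2 2 3 3 3 4]
# right=True: bins[i-1] < x <= bins[i]
print(np.digitize(x, bins, right=True))  # [0 1 1 1 2 2 2 3 3 3]
```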

Clipping  
Use **np.clip** to cap the numbers within a given range. All numbers below the lower limit are replaced by the lower limit, and all numbers above the upper limit are replaced by the upper limit.

In [103]:
# Cap all elements of x to lie between 3 and 8
np.clip(x, 3, 8)

array([3, 3, 3, 3, 4, 5, 6, 7, 8, 8])

Counting: **np.bincount(x)** counts how many times each non-negative integer from 0 to max(x) occurs in x.

In [104]:
# Bincount example
x = np.array([1,1,2,2,2,4,4,5,6,6,6]) # doesn't need to be sorted

In [105]:
np.bincount(x)  # 0 occurs 0 times, 1 occurs 2 times, 2 occurs thrice, 3 occurs 0 times,...

array([0, 2, 3, 0, 2, 1, 3], dtype=int64)
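np.bincount only accepts non-negative integers. Two parameters are often useful: minlength pads the output, and weights sums a second array per value instead of counting. A sketch (labels and amounts are illustrative):

```python
import numpy as np

x = np.array([1, 1, 2, 2, 2, 4, 4, 5, 6, 6, 6])

# minlength pads the result so values up to minlength-1 all get a slot
print(np.bincount(x, minlength=10))   # [0 2 3 0 2 1 3 0 0 0]

# weights: sum 'amounts' per label instead of counting occurrences
labels = np.array([0, 1, 1, 2])
amounts = np.array([0.5, 1.0, 2.0, 4.0])
totals = np.bincount(labels, weights=amounts)  # values: 0.5, 3.0, 4.0
```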

# 4. Saving and loading

## How to import and export data as a csv file?

A standard way to import datasets is the **np.genfromtxt** function. It can import datasets from web URLs, handle missing values and multiple delimiters, handle an irregular number of columns, etc.

A less versatile alternative is **np.loadtxt**, which assumes the dataset has no missing values.

As an example, let’s read a .csv file from the URL below. Since all elements in a numpy array must be of the same data type, the last column, which contains text, will be imported as nan by default.

By setting the filling_values argument you can replace the missing values with something else.

In [106]:
# Turn off scientific notation
np.set_printoptions(suppress=True)  

# Import data from csv file url
path = 'https://raw.githubusercontent.com/selva86/datasets/master/Auto.csv'
data = np.genfromtxt(path, delimiter=',', skip_header=1, filling_values=-999, dtype='float')
data[:5]  # see first 5 rows

array([[  18. ,    8. ,  307. ,  130. , 3504. ,   12. ,   70. ,    1. ,
        -999. ],
       [  15. ,    8. ,  350. ,  165. , 3693. ,   11.5,   70. ,    1. ,
        -999. ],
       [  18. ,    8. ,  318. ,  150. , 3436. ,   11. ,   70. ,    1. ,
        -999. ],
       [  16. ,    8. ,  304. ,  150. , 3433. ,   12. ,   70. ,    1. ,
        -999. ],
       [  17. ,    8. ,  302. ,  140. , 3449. ,   10.5,   70. ,    1. ,
        -999. ]])
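The URL above needs network access. To see filling_values work offline, here is a sketch that reads a tiny, made-up CSV from an in-memory io.StringIO buffer:

```python
import io
import numpy as np

# Hypothetical CSV content with two empty fields
csv_text = "1,2,\n4,,6\n"

# Empty fields are treated as missing and replaced by filling_values
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", filling_values=-999)
print(data)  # rows: [1, 2, -999] and [4, -999, 6]
```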

## How to handle datasets that have both numbers and text columns?

In [107]:
# data2 = np.genfromtxt(path, delimiter=',', skip_header=1, dtype='object')
data2 = np.genfromtxt(path, delimiter=',', skip_header=1, dtype=None)
data2[:3]  # see first 3 rows

  


array([(18., 8, 307., 130, 3504, 12. , 70, 1, b'"chevrolet chevelle malibu"'),
       (15., 8, 350., 165, 3693, 11.5, 70, 1, b'"buick skylark 320"'),
       (18., 8, 318., 150, 3436, 11. , 70, 1, b'"plymouth satellite"')],
      dtype=[('f0', '<f8'), ('f1', '<i4'), ('f2', '<f8'), ('f3', '<i4'), ('f4', '<i4'), ('f5', '<f8'), ('f6', '<i4'), ('f7', '<i4'), ('f8', 'S38')])

In [108]:
# Save the array as a csv file
np.savetxt("out.csv", data, delimiter=",")
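np.savetxt writes full-precision floats by default. A sketch of a round trip with a fixed number format and a header line, using an in-memory buffer instead of a real file (the data and the column names are made up):

```python
import io
import numpy as np

arr = np.array([[1.5, 2.0],
                [3.0, 4.25]])

buf = io.StringIO()
# fmt controls the number format; comments="" stops '# ' being prefixed to the header
np.savetxt(buf, arr, delimiter=",", fmt="%.2f", header="a,b", comments="")
print(buf.getvalue())
# a,b
# 1.50,2.00
# 3.00,4.25

# Read it back, skipping the header line
buf.seek(0)
back = np.loadtxt(buf, delimiter=",", skiprows=1)
```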

## How to save and load numpy objects?

At some point, you will want to save large transformed numpy arrays to disk and load them back into the console directly, without having to re-run the data transformation code.

NumPy provides the .npy and .npz file types for this purpose.

If you want to store a single ndarray object, store it as a .npy file using **np.save**. It can be loaded back using **np.load**.

If you want to store more than one ndarray object in a single file, save it as a .npz file using **np.savez**.

In [109]:
# Save single numpy array object as .npy file
np.save('myarray.npy', arr2d)  

# Save multiple numpy arrays as a .npz file
np.savez('array.npz', arr2d_f, arr2d_b)

In [110]:
# Load a .npy file
a = np.load('myarray.npy')
print(a)

[[0 1 2]
 [3 4 5]
 [6 7 8]]


In [111]:
# Load a .npz file
b = np.load('array.npz')
print(b.files)
b['arr_0']

['arr_0', 'arr_1']


array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])
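When arrays are saved positionally, as above, they get the names arr_0, arr_1, and so on. Keyword arguments give them meaningful names, and np.savez_compressed is the compressed variant of np.savez. A sketch with an in-memory buffer and illustrative arrays:

```python
import io
import numpy as np

weights = np.arange(6).reshape(2, 3)
bias = np.array([0.5, -0.5])

buf = io.BytesIO()
# Keyword arguments become the names inside the archive
np.savez_compressed(buf, weights=weights, bias=bias)

buf.seek(0)
# NpzFile supports 'with', which closes the underlying archive afterwards
with np.load(buf) as archive:
    print(sorted(archive.files))     # ['bias', 'weights']
    loaded_weights = archive['weights']
    loaded_bias = archive['bias']
```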

# 5. Miscellaneous

## Import numpy as np and see the version

In [112]:
print(np.__version__)

1.14.3


## How to extract items that satisfy a given condition from 1D array?

In [113]:
# Input
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Solution
arr[arr % 2 == 1]

array([1, 3, 5, 7, 9])

## How to replace items that satisfy a condition with another value in numpy array?

In [114]:
arr[arr % 2 == 1] = -1
arr

array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])

## How to replace items that satisfy a condition without affecting the original array?

In [115]:
arr = np.arange(10)
out = np.where(arr % 2 == 1, -1, arr)
print(arr)
out

[0 1 2 3 4 5 6 7 8 9]


array([ 0, -1,  2, -1,  4, -1,  6, -1,  8, -1])