# numpy
- Fast, memory-efficient manipulation of array data  
- Unlike a Python list, a NumPy array holds only a single data type  
- Supports matrix operations

## Environment Setup

In [191]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:75% !important; margin-left:350px; }</style>"))
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import math
pd.set_option( 'display.notebook_repr_html', False)  # render Series and DataFrame as text, not HTML
pd.set_option( 'display.max_columns', 10)   # number of columns
pd.set_option( 'display.max_rows', 10)     # number of rows
pd.set_option( 'display.width', 90)        # number of characters per row

## Module Import

In [192]:
import numpy as np
np.__version__

## other modules
from datetime import datetime
from datetime import date
from datetime import time

## Data Types

### NumPy Data Types

NumPy supports a much greater variety of numerical types than Python does, including fixed-width integers and floats of several sizes. This gives **precise control over range and memory use**.  
https://www.numpy.org/devdocs/user/basics.types.html

Integer: np.int8, np.int16, np.int32, np.uint8, np.uint16, np.uint32  
Float: np.float32, np.float64

### int32/64
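```np.int``` was only an alias for the **Python standard int**. The alias was deprecated in NumPy 1.20 and removed in 1.24, so the cell below runs only on older versions; use plain ```int``` instead.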

In [193]:
x = np.int(13)
y = int(13)
print( type(x) )
print( type(y) )

<class 'int'>
<class 'int'>


```np.int32``` and ```np.int64``` are NumPy-specific types

In [194]:
x = np.int32(13)
y = np.int64(13)
print( type(x) )
print( type(y) )

<class 'numpy.int32'>
<class 'numpy.int64'>
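
The practical difference between the fixed-width NumPy integers and the Python int is range and storage size. A minimal sketch using `np.iinfo` and `np.dtype`; the variable names are illustrative only:

```python
import numpy as np

# Value range of each fixed-width integer type
int8_info = np.iinfo(np.int8)
int32_info = np.iinfo(np.int32)
print(int8_info.min, int8_info.max)     # -128 127
print(int32_info.max)                   # 2147483647

# Storage size in bytes per element
size32 = np.dtype(np.int32).itemsize
size64 = np.dtype(np.int64).itemsize
print(size32, size64)                   # 4 8

# The Python int has arbitrary precision and is not limited this way
big = 2 ** 100
print(big > np.iinfo(np.int64).max)     # True
```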


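### float32/64
```np.float``` was likewise only an alias for the **Python standard float** (removed in NumPy 1.24); ```np.float32``` and ```np.float64``` are NumPy-specific types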

In [195]:
x = np.float(13)
y = float(13)
print( type(x) )
print( type(y) )

<class 'float'>
<class 'float'>


In [196]:
x = np.float32(13)
y = np.float64(13)
print( type(x) )
print( type(y) )

<class 'numpy.float32'>
<class 'numpy.float64'>


### bool
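```np.bool``` was only an alias for the **Python standard bool** in older NumPy (removed in 1.24). From NumPy 2.0 the name refers to NumPy's own boolean type, so the output below differs on recent versions.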

In [197]:
x = np.bool(True)
print( type(x) )
print( type(True) )

<class 'bool'>
<class 'bool'>


### str
```np.str``` was only an alias for the **Python standard str** (removed in NumPy 1.24); ```np.str_``` is the NumPy string type

In [198]:
x = np.str("ali")
print( type(x) )

<class 'str'>


In [199]:
x = np.str_("ali")
print( type(x) )

<class 'numpy.str_'>


### datetime64
Unlike the Python standard datetime library, there is **no separation** of date, datetime and time.  
There is **no time equivalent object**.  
NumPy has only one object: **datetime64**.

#### Constructor
**From String**  
Note that the input string **must not carry timezone information**: datetime64 is timezone-naive, so a trailing designator such as Z or +08:00 (which ISO 8601 allows) results in an **error** or a deprecation warning, depending on the NumPy version.

In [286]:
np.datetime64('2005-02')

numpy.datetime64('2005-02')

In [201]:
np.datetime64('2005-02-25')

numpy.datetime64('2005-02-25')

In [202]:
np.datetime64('2005-02-25T03:30')

numpy.datetime64('2005-02-25T03:30')

**From datetime**

In [203]:
np.datetime64( date.today() )

numpy.datetime64('2019-02-05')

In [204]:
np.datetime64( datetime.now() )

numpy.datetime64('2019-02-05T13:51:43.945110')

#### Instance Method
Convert to **datetime** using **```astype()```**

In [279]:
dt64 = np.datetime64("2019-01-31" )
dt64.astype(datetime)

datetime.date(2019, 1, 31)
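
The Python type returned by `astype(datetime)` depends on the unit of the datetime64 value. A small sketch of that behaviour; the variable names are made up for illustration:

```python
import numpy as np
from datetime import date, datetime

day = np.datetime64("2019-01-31")                   # unit: day
sec = np.datetime64("2019-01-31T10:00:00")          # unit: second
nano = np.datetime64("2019-01-31T10:00:00", "ns")   # unit: nanosecond

as_date = day.astype(datetime)    # day unit    -> datetime.date
as_dt = sec.astype(datetime)      # second unit -> datetime.datetime
as_int = nano.astype(datetime)    # nanoseconds do not fit datetime -> integer

print(type(as_date), type(as_dt), type(as_int))
```

Nanosecond values come back as an integer count since the epoch, so cast to a coarser unit first (for example `nano.astype("datetime64[us]")`) when a datetime object is needed.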

## Numpy Array

### Concept
Structure
- NumPy provides an N-dimensional array type, the **ndarray**
- **ndarray** is **homogeneous**: every item takes up the same size block of memory, and all blocks are interpreted in exactly the same way
- For each ndarray, there is a separate **dtype object**, which describes the ndarray data type  
- An item extracted from an array, e.g., by indexing, is represented by a Python object whose type is one of the array scalar types built in NumPy. The array scalars allow easy manipulation of also more complicated arrangements of data.
![numpy_concept](./img/numpy.png)

### Constructor
By default, ```numpy.array``` auto-detects its data type: it picks the smallest common type that can hold every element

#### int, float

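Notice the example below is **auto detected** as an integer data type. The default integer size is platform dependent: int32 here, int64 on many other systems.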

In [205]:
x = np.array( (1,2,3,4,5) )
x

array([1, 2, 3, 4, 5])

In [206]:
x.dtype

dtype('int32')

Notice example below **auto detected** as float64 data type

In [207]:
x = np.array( (1,2,3,4.5,5) )
x

array([1. , 2. , 3. , 4.5, 5. ])

In [208]:
x.dtype

dtype('float64')

You can pass ```dtype``` to request a specific data type.   
NumPy will **auto convert** the data into the specified type. Observe below that the floats are converted to integers, so 4.5 is truncated to 4

In [209]:
x = np.array( (1,2,3,4.5,5), dtype='int' )
x.dtype
print( x )

[1 2 3 4 5]
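
Converting floats to an integer dtype drops the fractional part rather than rounding. A short sketch contrasting truncation with explicit rounding through `np.rint`; the sample values are arbitrary:

```python
import numpy as np

values = np.array([1.9, -1.9, 2.5, 4.5])

truncated = values.astype(int)          # fractional part dropped (toward zero)
rounded = np.rint(values).astype(int)   # round to nearest first (ties go to even)

print(truncated)   # [ 1 -1  2  4]
print(rounded)     # [ 2 -2  2  4]
```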


#### datetime64
Providing an **array of strings in YYYY-MM-DD** format together with ```dtype='datetime64'``` generates an array of datetime64

In [310]:
ar = np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')
ar

array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64[D]')

In [315]:
ar.dtype  # array dtype

dtype('<M8[D]')

In [319]:
type(ar[1]) # underlying data type

numpy.datetime64

In [320]:
ar[1]

numpy.datetime64('2006-01-13')

#### 2D Array

In [210]:
x = np.array([range(10),np.arange(10)])
x

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

### Dimensions

#### Differentiating Dimensions
A 1-D array is built from a single list  
A 2-D array is built from a list containing lists (each row is a list)  
A 2-D single-row array is built from a list containing just one list

#### 1-D Array
Observe that the **shape of the array** is (5,). It may look like an array with 5 rows and **no columns**.  
What it really means is 5 items in a **single dimension**.

In [241]:
arr = np.array(range(5))
print (arr)
print (arr.shape)
print (arr.ndim)

[0 1 2 3 4]
(5,)
1


#### 2-D Array

In [242]:
arr = np.array([range(5),range(5,10),range(10,15)])
print (arr)
print (arr.shape)
print (arr.ndim)

[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
(3, 5)
2


#### 2-D Array - Single Row

In [243]:
arr = np.array([range(5)])
print (arr)
print (arr.shape)
print (arr.ndim)

[[0 1 2 3 4]]
(1, 5)
2


#### 2-D Array : Single Column
Slicing with **newaxis** in the **COLUMN** position turns a 1D array into a 2D array with a **single column**

In [244]:
arr = np.arange(5)[:, np.newaxis]
print (arr)
print (arr.shape)
print (arr.ndim)

[[0]
 [1]
 [2]
 [3]
 [4]]
(5, 1)
2


Slicing with **newaxis** in the **ROW** position turns a 1D array into a 2D array with a **single row**

In [245]:
arr = np.arange(5)[np.newaxis,:]
print (arr)
print (arr.shape)
print (arr.ndim)

[[0 1 2 3 4]]
(1, 5)
2


### Class Method

#### ```arange()```
Generate array with a sequence of numbers

In [211]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

#### ```ones()```

In [212]:
np.ones(10)  # One dimension, default is float

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [213]:
np.ones((2,5),'int')  #Two dimensions

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

#### ```zeros()```

In [214]:
np.zeros( 10 )    # One dimension, default is float

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [215]:
np.zeros((2,5),'int')   # 2 rows, 5 columns of ZERO

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

#### ```where()```
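On a **1D array**, ```numpy.where()``` returns the **indices** (not the values) of the items matching the criteria, wrapped in a tuple. In the example below the indices happen to equal the values.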

In [216]:
ar1 = np.array(range(10))
print( ar1 )
print( np.where(ar1>5) )

[0 1 2 3 4 5 6 7 8 9]
(array([6, 7, 8, 9], dtype=int64),)
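
To make the difference between indices and values visible, here is a sketch on an array whose values differ from their positions. The `scores` data and the labels are invented for illustration:

```python
import numpy as np

scores = np.array([10, 3, 8, 1, 7])

idx = np.where(scores > 5)[0]   # positions of the matching elements
print(idx)                      # [0 2 4]
print(scores[idx])              # [10  8  7]

# Three-argument form: pick element-wise between two alternatives
labels = np.where(scores > 5, "high", "low")
print(labels)                   # ['high' 'low' 'high' 'low' 'high']
```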


On a **2D array**, ```where()``` returns a tuple of two arrays: the **row indices and column indices** of the matching elements

In [217]:
ar = np.array([(1,2,3,4,5),(11,12,13,14,15),(21,22,23,24,25)])
print ('Data : \n', ar)
np.where(ar>13)

Data : 
 [[ 1  2  3  4  5]
 [11 12 13 14 15]
 [21 22 23 24 25]]


(array([1, 1, 2, 2, 2, 2, 2], dtype=int64),
 array([3, 4, 0, 1, 2, 3, 4], dtype=int64))
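
The two index arrays can be paired into coordinates or fed straight back into the array to fetch the matching values. A minimal sketch using the same data as above:

```python
import numpy as np

ar = np.array([(1, 2, 3, 4, 5), (11, 12, 13, 14, 15), (21, 22, 23, 24, 25)])

rows, cols = np.where(ar > 13)
pairs = list(zip(rows.tolist(), cols.tolist()))   # (row, col) coordinates
print(pairs[:3])         # [(1, 3), (1, 4), (2, 0)]

matched = ar[rows, cols]                          # values at those coordinates
print(matched)           # [14 15 21 22 23 24 25]
```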

#### Logical Methods
**```numpy.logical_or```**  
Performs an **or** operation on two boolean arrays and generates a new **boolean array**

In [218]:
ar = np.arange(10)
print( ar==3 )  # boolean array 1
print( ar==6 )  # boolean array 2
print( np.logical_or(ar==3,ar==6 ) ) # resulting boolean

[False False False  True False False False False False False]
[False False False False False False  True False False False]
[False False False  True False False  True False False False]


**```numpy.logical_and```**  
Performs an **and** operation on two boolean arrays and generates a new **boolean array**

In [285]:
ar = np.arange(10)
print( ar==3 ) # boolean array 1
print( ar==6 ) # boolean array 2
print( np.logical_and(ar==3,ar==6 ) )  # resulting boolean

[False False False  True False False False False False False]
[False False False False False False  True False False False]
[False False False False False False False False False False]
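
The same logic can be written with the `|`, `&` and `~` operators, and the resulting boolean array can be used as a mask to select elements. A short sketch; the parentheses around each comparison are required because these operators bind tighter than `==` or `>`:

```python
import numpy as np

ar = np.arange(10)

either = (ar == 3) | (ar == 6)     # same result as np.logical_or(ar == 3, ar == 6)
between = (ar > 2) & (ar < 7)      # same result as np.logical_and(ar > 2, ar < 7)

print(ar[either])     # [3 6]
print(ar[between])    # [3 4 5 6]
print(ar[~between])   # [0 1 2 7 8 9]
```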


### Instance Method

#### ``` astype()``` conversion
**Convert from datetime64 to datetime**

In [337]:
ar1 = np.array(['2007-07-13', '2006-01-13', '2010-08-13'], dtype='datetime64')
print( type(ar1) )  ## a numpy array
print( ar1.dtype )  ## dtype is a numpy data type

<class 'numpy.ndarray'>
datetime64[D]


After converting to datetime (a non-NumPy object), the dtype becomes **generic 'object'**.

In [340]:
ar2 = ar1.astype(datetime)
print( type(ar2) )  ## still a numpy array
print( ar2.dtype )  ## dtype becomes generic 'object'

<class 'numpy.ndarray'>
object


#### ```reshape()```
```
reshape ( row numbers, col numbers )
```

**Sample Data**

In [220]:
a = np.array([range(5), range(10,15), range(20,25), range(30,35)])
a

array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14],
       [20, 21, 22, 23, 24],
       [30, 31, 32, 33, 34]])

**Reshape 1-Dim to 2-Dim**

In [221]:
np.arange(12) # 1-D Array

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [222]:
np.arange(12).reshape(3,4)  # 2-D Array

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

**Reshape 2-Dim to 2-Dim**

In [223]:
np.array([range(5), range(10,15)])  # 2-D Array

array([[ 0,  1,  2,  3,  4],
       [10, 11, 12, 13, 14]])

In [224]:
np.array([range(5), range(10,15)]).reshape(5,2) # 2-D Array

array([[ 0,  1],
       [ 2,  3],
       [ 4, 10],
       [11, 12],
       [13, 14]])

**Reshape 2-Dimension to 2-Dim (of single row)**
- Change 2x5 to 1x10  
- Observe the [[ ]]: the number of dimensions is still 2, don't be fooled

In [225]:
np.array( [range(0,5), range(5,10)])  # 2-D Array

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [226]:
np.array( [range(0,5), range(5,10)]).reshape(1,10) # 2-D Array

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

**Reshape 1-Dim Array to 2-Dim Array (single column)**

In [227]:
np.arange(8)

array([0, 1, 2, 3, 4, 5, 6, 7])

In [228]:
np.arange(8).reshape(8,1)

array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7]])

A better method is to use **newaxis**: it is easier because there is no need to pass the row count as a parameter

In [229]:
np.arange(8)[:,np.newaxis]

array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7]])
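
`reshape()` can also infer one dimension when it is given `-1`, which avoids hard-coding the row count in the same way. A small sketch comparing the two approaches:

```python
import numpy as np

a = np.arange(8)

col_reshape = a.reshape(-1, 1)     # -1 lets NumPy infer the row count
col_newaxis = a[:, np.newaxis]
row_reshape = a.reshape(1, -1)     # -1 lets NumPy infer the column count

print(col_reshape.shape)   # (8, 1)
print(row_reshape.shape)   # (1, 8)
print(np.array_equal(col_reshape, col_newaxis))   # True
```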

**Reshape 1-Dim Array to 2-Dim Array (single row)**

In [230]:
np.arange(8)

array([0, 1, 2, 3, 4, 5, 6, 7])

In [231]:
np.arange(8)[np.newaxis,:]

array([[0, 1, 2, 3, 4, 5, 6, 7]])

### Element Selection
#### Sample Data

In [232]:
x1 = np.array( (0,1,2,3,4,5,6,7,8))
x2 = np.array(( (1,2,3,4,5), 
      (11,12,13,14,15),
      (21,22,23,24,25)))
print(x1)
print(x2)

[0 1 2 3 4 5 6 7 8]
[[ 1  2  3  4  5]
 [11 12 13 14 15]
 [21 22 23 24 25]]


#### 1-Dimension
All indexing starts from 0 (not 1)

Choosing a **single element** returns a scalar, not an array

In [233]:
print( x1[0]   )  ## first element
print( x1[-1]  )  ## last element

print( x1[3]   )  ## element at index 3 (fourth from start)
print( x1[-3]  )  ## third element from end

0
8
3
6


Selecting **multiple elements** returns an **ndarray**

In [234]:
print( x1[:3]  )  ## first 3 elements
print( x1[-3:])   ## last 3 elements

print( x1[3:]  )  ## all except first 3 elements
print( x1[:-3] )  ## all except last 3 elements

print( x1[1:4] )  ## elements 1 to 4 (not including 4)

[0 1 2]
[6 7 8]
[3 4 5 6 7 8]
[0 1 2 3 4 5]
[1 2 3]


#### 2-Dimension
Indexing with **[ row_positions, column_positions ]**, index starts with 0

In [235]:
x2[1:3, 1:4] # row 1 to 2 column 1 to 3

array([[12, 13, 14],
       [22, 23, 24]])
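
A few more selection patterns on the 2-D sample data, as a sketch. The array is redefined here so the block runs on its own:

```python
import numpy as np

x2 = np.array(((1, 2, 3, 4, 5),
               (11, 12, 13, 14, 15),
               (21, 22, 23, 24, 25)))

element = x2[1, 2]        # single element: row 1, column 2
first_row = x2[0]         # whole first row
first_col = x2[:, 0]      # whole first column
corner = x2[:2, -2:]      # first two rows, last two columns

print(element)      # 13
print(first_row)    # [1 2 3 4 5]
print(first_col)    # [ 1 11 21]
print(corner)
```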

### Attributes
#### ```dtype```
An ndarray has a property called **dtype**, which tells us the type of the underlying items

In [236]:
a = np.array( (1,2,3,4,5), dtype='float' )
a.dtype

dtype('float64')

In [237]:
print(a.dtype)
print( type(a[1]))

float64
<class 'numpy.float64'>


#### ```ndim```
**```ndim```** returns the number of dimensions of the NumPy array. The example below shows a 2-D array

In [238]:
x = np.array(( (1,2,3,4,5), 
      (11,12,13,14,15),
      (21,22,23,24,25)))
x.ndim  

2

#### ```shape```
**```shape```** returns a tuple of **(rows, cols)**

In [239]:
x = np.array(( (1,2,3,4,5), 
      (11,12,13,14,15),
      (21,22,23,24,25)))
x.shape  

(3, 5)

In [240]:
np.identity(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

### Operations

#### Arithmetic
**Sample Data**

In [341]:
ar = np.arange(10)
print( ar )

[0 1 2 3 4 5 6 7 8 9]


**```*```**

In [246]:
ar = np.arange(10)
print (ar)
print (ar*2)

[0 1 2 3 4 5 6 7 8 9]
[ 0  2  4  6  8 10 12 14 16 18]


**```+ and -```**

In [247]:
ar = np.arange(10)
print (ar+2)
print (ar-2)

[ 2  3  4  5  6  7  8  9 10 11]
[-2 -1  0  1  2  3  4  5  6  7]
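
These operators work element by element, which is different from how a Python list behaves. A brief sketch of the contrast, including the `@` operator for a matrix (here dot) product:

```python
import numpy as np

lst = [1, 2, 3]
arr = np.array(lst)

print(lst * 2)      # [1, 2, 3, 1, 2, 3]  list repetition
print(arr * 2)      # [2 4 6]             element-wise multiply
print(arr + arr)    # [2 4 6]             element-wise add
print(arr * arr)    # [1 4 9]             element-wise, not a matrix product
print(arr @ arr)    # 14                  dot product: 1*1 + 2*2 + 3*3
```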


#### Comparison
**Sample Data**

In [341]:
ar = np.arange(10)
print( ar )

[0 1 2 3 4 5 6 7 8 9]


**```==```**

In [251]:
print( ar==3 )

[False False False  True False False False False False False]


**```>, >=, <, <=```**

In [252]:
print( ar>3 )
print( ar<=3 )

[False False False False  True  True  True  True  True  True]
[ True  True  True  True False False False False False False]


## Random Numbers

### Uniform Distribution

#### Random Integer (with Replacement)
**randint()** returns random integers from **low (inclusive) to high (exclusive)**
```
np.random.randint( low )                  # generate an integer i where    0 <= i < low
np.random.randint( low, high )            # generate an integer i where  low <= i < high
np.random.randint( low, high, size=n)     # generate an ndarray of n integers, single dimension
np.random.randint( low, high, size=(r,c)) # generate an ndarray of integers, two dimensions 
```

In [253]:
np.random.randint( 10 )

1

In [254]:
np.random.randint( 10, 20 )

16

In [255]:
np.random.randint( 10, high=20, size=5)   # single dimension

array([12, 10, 16, 14, 12])

In [256]:
np.random.randint( 10, 20, (3,5) )        # two dimensions

array([[16, 19, 12, 19, 18],
       [15, 18, 10, 10, 17],
       [14, 12, 11, 11, 15]])
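
NumPy 1.17 and later also offer a `Generator` interface through `np.random.default_rng`, where `integers()` plays the role of `randint()`. A minimal sketch, assuming such a version is installed; the seed value 42 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(seed=42)        # seeded so the run is repeatable

single = rng.integers(10)                   # 0 <= i < 10
grid = rng.integers(10, 20, size=(3, 5))    # 10 <= i < 20, shape (3, 5)

# A generator re-created with the same seed reproduces the same draws
rng_again = np.random.default_rng(seed=42)
same_single = rng_again.integers(10)

print(single == same_single)   # True
print(grid.shape)              # (3, 5)
```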

#### Random Integer (with or without replacement)
```
numpy.random.choice( a, size, replace=True)
 # sampling from a, 
 #   if a is integer, then it is assumed sampling from arange(a)
 #   if a is an 1-D array, then sampling from this array
```

In [257]:
np.random.choice(10,5, replace=False) # take 5 samples from 0..9, without replacement

array([2, 4, 0, 9, 8])

In [258]:
np.random.choice( np.arange(10,20), 5, replace=False)

array([12, 13, 18, 16, 10])

#### Random Float
**ranf()**  Generates floats in the half-open interval **[0.0, 1.0)**
```
np.random.ranf(size=None)
```

In [259]:
np.random.ranf(4)

array([0.94250048, 0.94960753, 0.29055676, 0.25391307])

**uniform()** returns random floats from **low (inclusive) to high (exclusive)**; the defaults are low=0.0 and high=1.0
```
np.random.uniform( low )                  # only low given: high stays at its default 1.0
np.random.uniform( low, high )            # generate a float f where  low <= f < high
np.random.uniform( low, high, size=n)     # generate an array of n floats, single dimension
np.random.uniform( low, high, size=(r,c)) # generate an array of floats, two dimensions 
```

In [260]:
np.random.uniform( 2 )   # low=2, high defaults to 1.0, so the value falls between 1.0 and 2.0

1.8839947654697506

In [261]:
np.random.uniform( 2,5, size=(4,4) )

array([[3.53074707, 4.8318309 , 3.72637778, 3.91985394],
       [2.1729616 , 2.41428125, 3.09702137, 2.81555087],
       [4.95442552, 3.08758534, 4.04481284, 4.36283031],
       [2.98175957, 2.97723315, 3.46863149, 2.1131226 ]])
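
Because `high` defaults to 1.0, passing both bounds explicitly is the safer habit. A small sketch that checks the drawn values stay inside the requested interval; the seed and sample size are arbitrary:

```python
import numpy as np

np.random.seed(0)   # arbitrary seed so the run is repeatable

below_two = np.random.uniform(0, 2, size=1000)     # 0.0 <= f < 2.0
two_to_five = np.random.uniform(2, 5, size=1000)   # 2.0 <= f < 5.0

print(below_two.min() >= 0, below_two.max() < 2)       # True True
print(two_to_five.min() >= 2, two_to_five.max() < 5)   # True True
```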

### Normal Distribution

```
numpy.random.randn(n_items)                       # 1-D standard normal (mean=0, stdev=1)
numpy.random.randn(nrows, ncols)                  # 2-D standard normal (mean=0, stdev=1)
numpy.random.standard_normal(size=None)           # mean = 0, stdev = 1, non-configurable
numpy.random.normal(loc=0, scale=1, size=None)    # loc = mean, scale = stdev, size = dimension
```

#### Standard Normal Distribution
Generate random numbers from the Gaussian distribution (mean=0, stdev=1)

**One Dimension**

In [262]:
np.random.standard_normal(3)

array([-0.74560822, -2.31015733, -0.54510148])

In [263]:
np.random.randn(3)

array([ 0.65124691,  0.14847405, -0.18527882])

**Two Dimensions**

In [264]:
np.random.randn(2,4)

array([[ 0.43584434,  0.4222265 ,  0.532138  , -0.29037816],
       [-1.82357373,  0.97528663,  0.96177261,  1.38631561]])

In [265]:
np.random.standard_normal((2,4))

array([[-0.1877177 , -1.62689842, -0.87913195, -0.37687932],
       [-0.25675386,  0.66879526, -0.19631951, -0.61547975]])

**Observe:** randn(), standard_normal() and normal() can all generate standard normal numbers; with the same seed they return identical values

In [266]:
np.random.seed(15)
print (np.random.randn(5))
np.random.seed(15)
print (np.random.normal ( size = 5 )) # stdev and mean not specified, default to standard normal
np.random.seed(15)
print (np.random.standard_normal (size=5))

[-0.31232848  0.33928471 -0.15590853 -0.50178967  0.23556889]
[-0.31232848  0.33928471 -0.15590853 -0.50178967  0.23556889]
[-0.31232848  0.33928471 -0.15590853 -0.50178967  0.23556889]


#### Normal Distribution (Non-Standard)

In [267]:
np.random.seed(125)
np.random.normal( loc = 12, scale=1.25, size=(3,3))

array([[11.12645382, 12.01327885, 10.81651695],
       [12.41091248, 12.39383072, 11.49647195],
       [ 8.70837035, 12.25246312, 11.49084235]])
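
With only nine values it is hard to see that `loc` and `scale` took effect. A sketch that draws a larger sample and checks its mean and standard deviation, and also builds the same distribution by shifting and scaling standard normal numbers; the sample size is an arbitrary choice:

```python
import numpy as np

np.random.seed(125)
n = 100_000

direct = np.random.normal(loc=12, scale=1.25, size=n)
shifted = 12 + 1.25 * np.random.standard_normal(n)   # mean + stdev * z

# Both samples should land close to mean 12 and stdev 1.25
print(round(direct.mean(), 2), round(direct.std(), 2))
print(round(shifted.mean(), 2), round(shifted.std(), 2))
```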

#### Linear Spacing

```
numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
# endpoint: If True, stop is the last sample, otherwise it is not included
```

**Include Endpoint**  
Step = gap divided by (number of elements minus 1): 2/(10-1)

In [268]:
np.linspace(1,3,10) #default endpoint=True

array([1.        , 1.22222222, 1.44444444, 1.66666667, 1.88888889,
       2.11111111, 2.33333333, 2.55555556, 2.77777778, 3.        ])

**Does Not Include Endpoint**  
Step = gap divided by the number of elements: 2/10

In [269]:
np.linspace(1,3,10,endpoint=False)

array([1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8])
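
The step size can be read back directly by passing `retstep=True`, which makes `linspace` return the array together with the spacing. A short sketch for both endpoint settings:

```python
import numpy as np

with_end, step_with = np.linspace(1, 3, 10, retstep=True)
without_end, step_without = np.linspace(1, 3, 10, endpoint=False, retstep=True)

print(step_with)        # 2 / (10 - 1), roughly 0.2222
print(step_without)     # 2 / 10, i.e. 0.2
print(with_end[-1], without_end[-1])   # last sample with and without the endpoint
```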

## Sampling (Integer)
```
random.choice( a, size=None, replace=True, p=None)  # a=integer, return <size> integers < a
random.choice( a, size=None, replace=True, p=None)  # a=array-like, return <size> integers picked from list a
```

In [270]:
np.random.choice (100, size=10)

array([58,  0, 84, 50, 89, 32, 87, 30, 66, 92])

In [271]:
np.random.choice( [1,3,5,7,9,11,13,15,17,19,21,23], size=10, replace=False)

array([ 5,  1, 23, 17,  3, 13, 15,  9, 21,  7])
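
The `p` parameter in the signature assigns a probability to each candidate, so sampling does not have to be uniform. A minimal sketch with made-up weights (they must sum to 1); the seed is arbitrary:

```python
import numpy as np

np.random.seed(1)   # arbitrary seed for repeatability

# Draw from 0, 1, 2 where 0 is picked about 80% of the time
weighted = np.random.choice(3, size=1000, p=[0.8, 0.1, 0.1])
share_of_zero = (weighted == 0).mean()

print(share_of_zero)   # expected to be near 0.8
```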

## NaN : Missing Numerical Data

- Be aware that NaN is a bit like a data virus: it infects any other object it touches  


In [272]:
t = np.array([1, np.nan, 3, 4]) 
t.dtype

dtype('float64')

Regardless of the operation, the result of arithmetic with NaN will be another NaN

In [273]:
1 + np.nan

nan

In [274]:
t.sum(), t.mean(), t.max()

(nan, nan, nan)

In [275]:
np.nansum(t), np.nanmean(t), np.nanmax(t)

(8.0, 2.6666666666666665, 4.0)
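
NaN values cannot be found with `==`, because NaN never compares equal to anything, itself included. A sketch of locating and filtering them with `np.isnan` instead:

```python
import numpy as np

t = np.array([1, np.nan, 3, 4])

print(np.nan == np.nan)    # False

mask = np.isnan(t)         # True where the value is NaN
clean = t[~mask]           # keep only the real numbers

print(mask)                # [False  True False False]
print(clean)               # [1. 3. 4.]
print(clean.sum())         # 8.0
```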