# Chapter 4
## NumPy Basics: Arrays and Vectorized Computation
- NumPy is an efficient array package
- many other packages depend on NumPy

### Our focus:
- Fast array-based operations for data munging and cleaning, subsetting and filtering, transformation, and any other kind of computation
- Common array algorithms like sorting, unique, and set operations
- Efficient descriptive statistics and aggregating/summarizing data
- Data alignment and relational data manipulations for merging and joining heterogeneous datasets
- Expressing conditional logic as array expressions instead of loops with if-elif-else branches
- Group-wise data manipulations (aggregation, transformation, and function application)

In [9]:
import numpy as np
array = np.arange(1_000_000) # we can add _ for readability!
l = list(range(1_000_000))

In [17]:
%timeit?
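# https://stackoverflow.com/questions/48258008/n-and-r-arguments-to-ipythons-timeit-magic/59543135#59543135
# explains why there are both runs and loops: each run executes the statement many times (loops) so that
# very short timings are not dominated by clock resolution, and repeating the runs shows how much the measurement varies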
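
To make the runs/loops idea concrete, here is a rough sketch using the standard-library `timeit` module instead of the `%timeit` magic. The choice of 7 runs and 20 loops is only an assumption for illustration; `%timeit` normally picks the loop count itself.

```python
import timeit
import numpy as np

array = np.arange(1_000_000)

n_runs, n_loops = 7, 20  # illustrative values, not what %timeit would choose

# timeit.repeat returns one total time per run; each run executes the statement n_loops times
totals = timeit.repeat(lambda: array * 2, repeat=n_runs, number=n_loops)

# divide by the loop count to get a per-loop time for each run
per_loop = [t / n_loops for t in totals]
print(f"best: {min(per_loop) * 1e3:.3f} ms per loop")
print(f"mean: {sum(per_loop) / len(per_loop) * 1e3:.3f} ms per loop")
```

The spread between the runs is what `%timeit` reports as the standard deviation.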

In [14]:
# now let's time the same operation on both to see the difference in efficiency
%timeit array2 = array*2
%timeit l2 = [x*2 for x in l]

1.44 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
68.9 ms ± 1.03 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [19]:
68.9/1.44 # almost 50 times faster with numpy than built-in list!

47.84722222222223

### Conclusion: NumPy arrays are much faster than built-in lists for element-wise operations

In [20]:
ar = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])
ar

array([[ 1.5, -0.1,  3. ],
       [ 0. , -3. ,  6.5]])

An ndarray is a container for homogeneous data: all elements must have the same type

In [34]:
ar.dtype

dtype('float64')

we can pass mixed types to np.array, but NumPy then converts every element to one common dtype (here a Unicode string dtype, '<U11'), which arithmetic operations will not accept

In [35]:
numpy_arr = np.array([1,2,"Hello",3,"World"])
numpy_arr.dtype
numpy_arr*2

UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U11'), dtype('int32')) -> None
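
A small sketch of what happens to mixed inputs. The variable names are made up for illustration, and `dtype=object` is shown only as one possible way to keep the original Python objects; it gives up most of NumPy's speed.

```python
import numpy as np

# ints mixed with floats are upcast to float64
ints_and_floats = np.array([1, 2.5, 3])
print(ints_and_floats.dtype)        # float64

# ints mixed with strings all become Unicode strings
ints_and_strings = np.array([1, 2, "Hello"])
print(ints_and_strings.dtype.kind)  # U
print(ints_and_strings[0] == "1")   # True: the integer 1 is stored as the string "1"

# dtype=object stores the original Python objects, so * is applied element by element
as_objects = np.array([1, 2, "Hello"], dtype=object)
print(as_objects * 2)               # [2 4 'HelloHello']
```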

### Creating an ndarray

In [44]:
# we can write
ar = np.array([6, 7.5, 8, 0, 1])

# or
l = [6, 7.5, 8, 0, 1]
np.array(l)

# 2-dim
ar2d = np.array([[6, 7.5, 8, 0, 1],[6, 7.5, 8, 0, 1]])

# or
l = [6, 7.5, 8, 0, 1], [6, 7.5, 8, 0, 1]
np.array(l)

array([[6. , 7.5, 8. , 0. , 1. ],
       [6. , 7.5, 8. , 0. , 1. ]])

- ndarrays have a data type, which can be inspected through the .dtype attribute
- unless we state it explicitly with np.array(..., dtype=wanted_type), the dtype is inferred from the inputs
- the dtype can be changed with the .astype() method, which returns a new array

In [76]:
arr = np.array([1, 2, 3, 4, 5])
print(arr.dtype)
arr_new = arr.astype(np.float64)
print(arr_new.dtype)

int32
float64
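
A short sketch of the two points above: setting the dtype explicitly, and what `astype` does when going from float to integer (the fractional part is dropped, not rounded). The variable names are only for illustration.

```python
import numpy as np

# explicit dtype instead of the inferred integer type
explicit = np.array([1, 2, 3], dtype=np.float64)
print(explicit.dtype)  # float64

# float -> int truncates toward zero
floats = np.array([3.7, -1.2, 0.5])
ints = floats.astype(np.int64)
print(ints)            # [ 3 -1  0]

# astype returned a new array, the original is unchanged
print(floats)          # [ 3.7 -1.2  0.5]
```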


One might have to convert an array of strings representing numbers as follows:

In [79]:
numeric_strings = np.array(["1.25", "-9.6", "42"], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0; np.bytes_ is the same type
numeric_strings.astype(float)

array([ 1.25, -9.6 , 42.  ])

### Shapes and pre-filled ndarrays:

In [47]:
# getting the shape of an ndarray
ar2d.shape

(2, 5)

In [81]:
np.arange(10) # like list(range(10)), but returns an ndarray

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [71]:
print( np.zeros(10) )
print( np.ones(10), '\n' )
print( np.ones((5,5)), '\n')
print( np.eye(3,3) )

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]] 

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
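
A few related constructors, as a sketch: `np.full` for an arbitrary fill value, the `dtype` argument for the pre-filled functions (they default to float64, as the dots in the output above show), and `np.ones_like` to reuse the shape and dtype of an existing array. The names `filled`, `int_zeros`, and `template` are only for illustration.

```python
import numpy as np

# 2x3 array where every entry is 7.0
filled = np.full((2, 3), 7.0)
print(filled)

# zeros are float64 by default; ask for integers explicitly
int_zeros = np.zeros(4, dtype=np.int64)
print(int_zeros)  # [0 0 0 0]

# ones with the same shape and dtype as an existing array
template = np.array([[1, 2], [3, 4]])
same_shape = np.ones_like(template)
print(same_shape)
```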


### Arithmetic
no for loops needed: arithmetic on ndarrays is applied element-wise to the whole array at once, which NumPy calls vectorization

In [83]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr

array([[1., 2., 3.],
       [4., 5., 6.]])

In [89]:
print(arr + arr, '\n')
print(arr**2, '\n')
print(1/arr, '\n')

arr*1/2 > 1/arr


[[ 2.  4.  6.]
 [ 8. 10. 12.]] 

[[ 1.  4.  9.]
 [16. 25. 36.]] 

[[1.         0.5        0.33333333]
 [0.25       0.2        0.16666667]] 



array([[False,  True,  True],
       [ True,  True,  True]])
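
To show what vectorization replaces, here is a sketch that computes the same result once with explicit loops and once with a single array expression. The expression `arr ** 2 + 1` is just an arbitrary example.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])

# explicit double loop over rows and columns
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] ** 2 + 1

# the same computation as one vectorized expression
vectorized = arr ** 2 + 1

print(np.array_equal(looped, vectorized))  # True
```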

### Indexing and slicing

In [95]:
arr = np.arange(10)
print(arr[0], '\n')
print(arr[2:7], '\n')

0 

[2 3 4 5 6] 



- one difference from built-in lists is that slicing does not copy the data: a slice is a view on the corresponding part of the ndarray, so changing the slice changes the original array (and vice versa), see the following

In [100]:
print(arr[5:8])

arr_slice = arr[5:8]
print(arr_slice)

arr[5:8] = 12
print(arr_slice)

arr_slice[1]=1000
print(arr)

[5 6 7]
[5 6 7]
[12 12 12]
[   0    1    2    3    4   12 1000   12    8    9]


- if we instead want a copy we use the .copy() method

In [103]:
arr = np.arange(10)
arr_copy = arr.copy()
arr[0] = 100
arr_copy # remains unchanged!

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
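
One way to check whether something is a view or a copy is `np.shares_memory`. The sketch below also contrasts this with a slice of a built-in list, which is always a copy. The variable names are only for illustration.

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]            # slice: refers to arr's data
copied = arr[5:8].copy()   # independent copy

print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, copied))  # False

# a slice of a built-in list is a copy, so the original list is untouched
py_list = list(range(10))
list_slice = py_list[5:8]
list_slice[0] = 12
print(py_list[5])  # 5
```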

### Two-dimensional arrays

In [104]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [105]:
arr2d[0]

array([1, 2, 3])

In [106]:
arr2d[0,0]

1

In [108]:
arr2d[0,0]==arr2d[0][0]

True
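
The comma syntax also accepts slices, one per axis (rows first, then columns). A small sketch on the same `arr2d`:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_two_rows = arr2d[:2]     # rows 0 and 1, all columns
second_column = arr2d[:, 1]    # all rows, column 1
upper_right = arr2d[:2, 1:]    # rows 0-1, columns 1-2
part_of_row = arr2d[1, :2]     # row 1, columns 0-1

print(second_column)  # [2 5 8]
print(upper_right)
print(part_of_row)    # [4 5]
```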