
![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

<img src="https://user-images.githubusercontent.com/7065401/39118381-910eb0c2-46e9-11e8-81f1-a5b897401c23.jpeg"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# NumPy: Numeric computing library

NumPy (Numerical Python) is one of the core packages for numerical computing in Python. Pandas, Matplotlib, Statsmodels and many other scientific libraries rely on NumPy.

NumPy's major contributions are:

* Efficient numeric computation with C primitives
* Efficient collections with vectorized operations
* An integrated and natural Linear Algebra API
* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.

Let's expand on efficiency. In Python, **everything is an object**, which means that even simple ints are objects, with all the machinery required to make objects work. We call them "Boxed Ints". In contrast, NumPy uses primitive numeric types (floats, ints), which makes both storage and computation efficient.

<img src="https://docs.google.com/drawings/d/e/2PACX-1vTkDtKYMUVdpfVb3TTpr_8rrVtpal2dOknUUEOu85wJ1RitzHHf5nsJqz1O0SnTt8BwgJjxXMYXyIqs/pub?w=726&h=396" />


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on!

In [2]:
import sys
import numpy as np

## Basic NumPy Arrays

In [3]:
np.array([1, 2, 3, 4])

array([1, 2, 3, 4])

In [4]:
a = np.array([1, 2, 3, 4])

In [5]:
a

array([1, 2, 3, 4])

In [6]:
b = np.array([0, .5, 1, 1.5, 2])

In [7]:
b[2]

1.0

In [8]:
a[0], a[1]

(1, 2)

In [9]:
a[0:]

array([1, 2, 3, 4])

In [10]:
a[1:3] # slices exclude the stop index

array([2, 3])

In [11]:
a[1:-1]

array([2, 3])

In [12]:
a[::2]

array([1, 3])

In [13]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [14]:
b[0], b[2], b[-1]

(0.0, 1.0, 2.0)

In [15]:
b[[0, 2, -1]]

array([0., 1., 2.])

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Array Types

In [16]:
a

array([1, 2, 3, 4])

In [17]:
a.dtype

dtype('int64')

In [18]:
b

array([0. , 0.5, 1. , 1.5, 2. ])

In [19]:
b.dtype

dtype('float64')

In [20]:
np.array([1, 2, 3, 4], dtype=float) # the np.float alias was deprecated in NumPy 1.20 and later removed; use float


array([1., 2., 3., 4.])

In [21]:
np.array([1, 2, 3, 4], dtype=np.int8)

array([1, 2, 3, 4], dtype=int8)

In [22]:
c = np.array(['a', 'b', 'c'])

In [23]:
c.dtype

dtype('<U1')

In [24]:
d = np.array([{'a': 1}, sys])

In [25]:
d.dtype

dtype('O')
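The cells above show that NumPy either infers a dtype from the values or takes one explicitly. As a small sketch of how to move between dtypes afterwards, the example below uses `astype`, which returns a converted copy. The variable names are illustrative only and are not part of the original notebook.

```python
import numpy as np

ints = np.array([1, 2, 3, 4])        # integer dtype inferred from the values
mixed = np.array([1, 2.5, 3])        # a single float upcasts the whole array

as_float = ints.astype(np.float64)   # astype returns a converted copy
small = ints.astype(np.int8)         # 1 byte per element

print(mixed.dtype, as_float.dtype, small.dtype, small.nbytes)
```

Choosing a smaller dtype such as `int8` saves memory, but only values that fit in its range can be stored safely.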

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Dimensions and shapes

In [26]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

In [27]:
A.shape

(2, 3)

In [28]:
A.ndim

2

In [29]:
A.size

6

In [30]:
B = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4],
        [3, 2, 1]
    ]
])

In [31]:
B

array([[[12, 11, 10],
        [ 9,  8,  7]],

       [[ 6,  5,  4],
        [ 3,  2,  1]]])

In [32]:
B.shape

(2, 2, 3)

In [33]:
B.ndim

3

In [34]:
B.size

12

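If the shape isn't consistent, NumPy cannot build a regular n-dimensional array and stores the sub-lists as plain Python objects instead. NumPy 1.24 and later raise a `ValueError` for such input unless you ask for this explicitly with `dtype=object`:

In [35]:
C = np.array([
    [
        [12, 11, 10],
        [9, 8, 7],
    ],
    [
        [6, 5, 4]
    ]
], dtype=object)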


In [36]:
C.dtype

dtype('O')

In [37]:
C.shape

(2,)

In [38]:
C.size

2

In [None]:
C.ndim

1

In [None]:
type(C[0])

list
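One way to build such an object array without relying on NumPy's shape inference is to allocate it first and fill it slot by slot. This is only a sketch; the names `ragged` and `shapes` are made up for illustration.

```python
import numpy as np

ragged = [[[12, 11, 10], [9, 8, 7]], [[6, 5, 4]]]

# Allocate a 1-D object array, then store each nested list in its own slot,
# so NumPy never has to guess a shape from the uneven nesting.
C = np.empty(len(ragged), dtype=object)
for i, item in enumerate(ragged):
    C[i] = item

# Each slot can still be turned into a regular array on its own.
shapes = [np.asarray(item).shape for item in C]
print(C.shape, C.dtype, shapes)
```

Object arrays lose most of NumPy's speed and memory advantages, so padding or splitting the data into regular arrays is usually preferable when it is possible.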

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Indexing and Slicing of Matrices

In [40]:
# Square matrix

A = np.array([
#.   0. 1. 2
    [1, 2, 3], # 0
    [4, 5, 6], # 1
    [7, 8, 9]  # 2
])

# Numpy Matrices are strictly 2d
# Numpy arrays are n dimensional

In [42]:
A.ndim # Note that the matrix is 2 dimensional

2

In [44]:
A.shape
# shape returns the rows and columns

(3, 3)

In [45]:
A.size

# size returns the number of elements in the matrix

9

### Matrices vs. Arrays

**Matrices** (`np.matrix`) are a subclass of **arrays** (`np.ndarray`). However, matrices are ONLY 2-dimensional, whereas arrays can be n-dimensional. Because of the subclass relationship, matrices inherit all of the attributes and methods of **ndarrays**.

The advantage of matrices is that they simplify the notation for matrix multiplication: if `a` and `b` are both matrices, then `a * b` is their matrix product. For plain arrays, `*` multiplies element by element and the matrix product is written `a @ b`.
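A short comparison of the two multiplication behaviours, using small made-up 2x2 inputs. Note that the NumPy documentation no longer recommends `np.matrix` for new code, so treat this as an illustration of the difference rather than a suggested style.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b        # ndarray: * multiplies element by element
matmul = a @ b             # ndarray: @ is the matrix product

ma, mb = np.matrix(a), np.matrix(b)
matrix_star = ma * mb      # np.matrix: * is the matrix product

print(elementwise)
print(matmul)
print(matrix_star)
```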

In [43]:
A[1]

array([4, 5, 6])

In [46]:
A[1][0]

4

In [47]:
# A[d1, d2, d3, d4]

In [48]:
A[1, 0]

4

In [49]:
A[0:2]

array([[1, 2, 3],
       [4, 5, 6]])

In [50]:
A[:, :2]

array([[1, 2],
       [4, 5],
       [7, 8]])

In [51]:
A[:2, :2]

array([[1, 2],
       [4, 5]])

In [52]:
A[:2, 2:]

array([[3],
       [6]])

In [53]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [56]:
A[1] = np.array([10, 10, 10])

In [57]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [ 7,  8,  9]])

In [58]:
A[2] = 99

In [60]:
A

array([[ 1,  2,  3],
       [10, 10, 10],
       [99, 99, 99]])
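Assigning through an index, as in the cells above, writes into the array's own memory. The same applies to slices: basic slicing returns a view, while indexing with a list of positions returns a copy. The sketch below illustrates the difference; the variable names are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

view = A[:2, :2]                 # basic slicing returns a view of A's data
view[0, 0] = -1                  # so this write is visible through A

picked = A[[0, 2]]               # indexing with a list of rows returns a copy
picked[1, 0] = 100               # so A is left untouched

independent = A[:2, :2].copy()   # explicit copy when isolation is needed

print(A)
```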

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Summary statistics

In [61]:
a = np.array([1, 2, 3, 4])

In [62]:
a.sum()

10

In [63]:
a.mean()

2.5

In [64]:
a.std()

1.118033988749895

In [65]:
a.var()

1.25

In [66]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [67]:
A.sum()

45

In [68]:
A.mean()

5.0

In [69]:
A.std()

2.581988897471611

In [73]:
A.sum(axis=0) # sum of each column (collapses the rows)

array([12, 15, 18])

In [74]:
A.sum(axis=1) # sum of each row (collapses the columns)

array([ 6, 15, 24])

In [78]:
A.mean(axis=0) # mean of each column

array([4., 5., 6.])

In [77]:
A.mean(axis=1)

array([2., 5., 8.])

In [79]:
A.std(axis=0)

array([2.44948974, 2.44948974, 2.44948974])

In [80]:
A.std(axis=1)

array([0.81649658, 0.81649658, 0.81649658])

And [many more](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html#array-methods)...
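The `axis` argument names the dimension that gets collapsed. As a small sketch, the example below also uses the `keepdims` parameter so that the reduced result keeps a length-1 axis and lines up with the original array in later arithmetic. The name `row_share` is made up for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

col_sums = A.sum(axis=0)                     # collapses rows: one value per column
row_sums = A.sum(axis=1)                     # collapses columns: one value per row
row_sums_2d = A.sum(axis=1, keepdims=True)   # same values, shape (3, 1)

row_share = A / row_sums_2d                  # each element as a fraction of its row total
print(col_sums, row_sums, row_sums_2d.shape)
```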

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Broadcasting and Vectorized operations

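The `arange()` function returns evenly spaced values within a given interval.

`arange()` requires only one argument, the stop value:

> numpy.arange(stop)

**Parameters**

> **start** *optional* : integer/real. Start of the interval (included); defaults to 0.

> **stop** : integer/real. End of the interval (excluded).

> **step** *optional* : integer/real. Spacing between values; defaults to 1.

For the other parameters, please refer to https://numpy.org/doc/stable/reference/generated/numpy.arange.html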

In [101]:
a = np.arange(4)


In [102]:
a

array([0, 1, 2, 3])

In [103]:
a + 10

array([10, 11, 12, 13])

In [104]:
a * 10

array([ 0, 10, 20, 30])

In [106]:
a # note that a itself has not changed yet

array([0, 1, 2, 3])

In [107]:
a += 100

In [108]:
a # now we have changed a

array([100, 101, 102, 103])

In [109]:
l = [0, 1, 2, 3]

In [110]:
[i * 10 for i in l]

[0, 10, 20, 30]

In [111]:
a = np.arange(4)

In [112]:
a

array([0, 1, 2, 3])

In [113]:
b = np.array([10, 10, 10, 10])

In [114]:
b

array([10, 10, 10, 10])

In [115]:
a + b

array([10, 11, 12, 13])

In [116]:
a * b

array([ 0, 10, 20, 30])
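The examples above combine an array with a scalar or with an array of the same shape. Broadcasting also covers arrays of different but compatible shapes: dimensions are compared from the right, and a dimension of length 1 is stretched to match. A minimal sketch, with illustrative names:

```python
import numpy as np

col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                 # shape (4,)

table = col * 10 + row             # (3, 1) and (4,) broadcast to (3, 4)

try:
    np.arange(3) + np.arange(4)    # (3,) and (4,) are not compatible
    compatible = True
except ValueError:
    compatible = False

print(table)
```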

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Boolean arrays
_(Also called masks)_

In [117]:
a = np.arange(4)

In [118]:
a

array([0, 1, 2, 3])

In [119]:
a[0], a[-1]

(0, 3)

In [120]:
a[[0, -1]]

array([0, 3])

In [124]:
a[[True, False, False, True]] # returns only the elements of a where the mask is True

array([0, 3])

In [123]:
a

array([0, 1, 2, 3])

In [125]:
a >= 2

array([False, False,  True,  True])

In [126]:
a[a >= 2]

array([2, 3])

In [127]:
a.mean()

1.5

In [128]:
a[a > a.mean()]

array([2, 3])

In [132]:
a[~(a > a.mean())] # returns values in a that are not greater than the mean of a

array([0, 1])

In [130]:
a[(a == 0) | (a == 1)] # returns values in a that are equal to 0 or 1

array([0, 1])

In [140]:
a[(a <= 2) & (a % 2 == 0)] # returns values in a that are less than or equal to 2 and even

array([0, 2])

For the `np.random.randint` documentation, refer to https://numpy.org/doc/stable/reference/random/generated/numpy.random.randint.html

In [141]:
A = np.random.randint(100, size=(3, 3)) # generates a 3x3 array of random integers from 0 to 99

In [135]:
A

array([[76, 19,  3],
       [73, 43, 88],
       [50, 39, 19]])

In [136]:
A[np.array([
    [True, False, True],
    [False, True, False],
    [True, False, True]
])]

array([76,  3, 43, 50, 19])

In [137]:
A > 30

array([[ True, False, False],
       [ True,  True,  True],
       [ True,  True, False]])

In [138]:
A[A > 30]

array([76, 73, 43, 88, 50, 39])
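Masks are useful for more than selecting values. As a sketch, the example below counts matches, assigns through a mask, and labels elements with `np.where`. It hard-codes the random values printed above so the result is reproducible; the names `capped` and `labels` are illustrative.

```python
import numpy as np

A = np.array([[76, 19, 3], [73, 43, 88], [50, 39, 19]])
mask = A > 30

count = mask.sum()                       # True counts as 1, False as 0
capped = A.copy()
capped[mask] = 30                        # assign to every position where the mask is True
labels = np.where(mask, "high", "low")   # pick a value per element based on the mask

print(count)
print(capped)
print(labels)
```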

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Linear Algebra

In [139]:
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

In [142]:
B = np.array([
    [6, 5],
    [4, 3],
    [2, 1]
])

In [143]:
A.dot(B)

array([[20, 14],
       [56, 41],
       [92, 68]])

In [144]:
A @ B

array([[20, 14],
       [56, 41],
       [92, 68]])

In [145]:
B.T

array([[6, 4, 2],
       [5, 3, 1]])

In [146]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [147]:
B.T @ A

array([[36, 48, 60],
       [24, 33, 42]])
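The matrix product only works when the inner dimensions agree: an (n, k) array times a (k, m) array gives an (n, m) result. The sketch below checks this with the `A` and `B` defined at the start of this section, and also verifies the transpose identity (AB)ᵀ = BᵀAᵀ. The names `product`, `same_transposed` and `shapes_ok` are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[6, 5], [4, 3], [2, 1]])

product = A @ B                                          # (3, 3) @ (3, 2) -> (3, 2)
same_transposed = np.array_equal(product.T, B.T @ A.T)   # transpose of a product

try:
    B @ A                       # (3, 2) @ (3, 3): inner sizes 2 and 3 differ
    shapes_ok = True
except ValueError:
    shapes_ok = False

print(product)
print(same_transposed, shapes_ok)
```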

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Size of objects in Memory

### Int, floats

In [148]:
# An integer in Python takes more than 24 bytes
sys.getsizeof(1)

28

In [149]:
# Very large integers take even more space
sys.getsizeof(10**100)

72

In [150]:
# Numpy size is much smaller
np.dtype(int).itemsize

8

In [151]:
# A NumPy int8 takes a single byte
np.dtype(np.int8).itemsize

1

In [152]:
np.dtype(float).itemsize

8

### Lists are even larger

In [153]:
# A one-element list
sys.getsizeof([1])

64

In [154]:
# An array of one element in numpy
np.array([1]).nbytes

8

### And performance is also important

In [155]:
l = list(range(100000))

In [156]:
a = np.arange(100000)

In [157]:
%time np.sum(a ** 2) # Note how computations are significantly faster using numpy vs python

CPU times: user 1.95 ms, sys: 0 ns, total: 1.95 ms
Wall time: 1.98 ms


333328333350000

In [158]:
%time sum([x ** 2 for x in l])

CPU times: user 35.4 ms, sys: 1.78 ms, total: 37.2 ms
Wall time: 36.8 ms


333328333350000
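A single `%time` run can be noisy. As a rough sketch, the comparison can be repeated with the standard-library `timeit` module, which also works outside IPython. The helper names `python_sum` and `numpy_sum` are made up here, the `dtype=np.int64` is an added assumption to keep the sum from overflowing on platforms with a 32-bit default integer, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

n = 100_000
l = list(range(n))
a = np.arange(n, dtype=np.int64)   # explicit 64-bit ints so the sum cannot overflow


def python_sum():
    """Sum of squares with a plain Python list comprehension."""
    return sum([x ** 2 for x in l])


def numpy_sum():
    """Sum of squares with vectorized NumPy operations."""
    return int(np.sum(a ** 2))


# Best of 3 repeats, 5 calls each, reported per call.
t_python = min(timeit.repeat(python_sum, number=5, repeat=3)) / 5
t_numpy = min(timeit.repeat(numpy_sum, number=5, repeat=3)) / 5
print(f"list: {t_python * 1e3:.2f} ms   numpy: {t_numpy * 1e3:.2f} ms")
```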

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Useful NumPy functions

* random
* arange
* reshape
* linspace
* zeros, ones, empty
* identity and eye

### `random`

**np.random.random()** can be called without arguments and returns a random float from the half-open interval [0.0, 1.0). Pass `size` to get an array of such floats instead.

Refer to https://numpy.org/doc/stable/reference/random/generated/numpy.random.random.html


In [163]:
np.random.random(size=2)

array([0.05349167, 0.12746379])

In [164]:
np.random.random()

0.9042321176172735

In [165]:
np.random.normal(size=2)

array([-1.45229945,  1.19225589])

In [168]:
np.random.normal(10, 2, 100) # returns 100 samples from a normal distribution with mean 10 and standard deviation 2

array([11.06020905,  8.8444715 ,  8.85740127, 10.44416704, 13.75853291,
        8.59636677,  9.19710946,  6.9739708 ,  5.95606526,  8.22798729,
       11.40995711, 11.78458411, 10.04247595,  7.74365272, 10.41591159,
        8.51466594,  9.84362967, 10.14582171, 13.34804501, 11.01710055,
        8.67165248, 10.87075065,  6.78531379, 10.84206843, 10.62361483,
        9.94733232, 10.77257888, 10.24536762,  9.31248515,  6.28344822,
       12.49822105, 11.50028686, 14.46511849, 11.61823382, 10.8090977 ,
       12.5135724 ,  9.2576005 , 11.07157409,  7.45203644, 10.35596666,
        8.64457903, 11.41594845, 11.16606727, 11.52061616, 10.64763345,
       12.81853293, 12.40533547,  7.66676141,  7.08107298, 12.99762771,
        8.16124569, 10.05823072,  6.56591337, 12.00647817,  9.53294428,
       11.73851311,  8.65454652,  6.20848562, 12.0225809 , 11.25911047,
        9.15454291, 11.73822227, 12.66931339, 11.04519308,  9.26774835,
       10.3549073 , 15.01789508,  8.95963165, 10.13076284, 11.92

In [169]:
np.random.rand(2, 4) # creates an array of given shape & populates it with random values from a uniform distribution [0,1)

array([[0.8831863 , 0.64452469, 0.93924181, 0.32942107],
       [0.07431756, 0.01870701, 0.47517385, 0.73263633]])
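The functions above draw from NumPy's global random state, so every run gives different numbers. When reproducible results matter, one option is the `Generator` API created with `np.random.default_rng`, which is not used in the original notebook. A minimal sketch, where the seed value 42 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(seed=42)        # seeded Generator: reproducible draws

uniform = rng.random(size=3)                # floats in [0.0, 1.0)
normal = rng.normal(loc=10, scale=2, size=100)
ints = rng.integers(0, 100, size=(3, 3))    # integers 0..99, upper bound excluded

# A second generator with the same seed produces the same first draw.
repeat = np.random.default_rng(seed=42).random(size=3)
print(np.array_equal(uniform, repeat))
```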

---
### `arange`

In [170]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [171]:
np.arange(5, 10)

array([5, 6, 7, 8, 9])

In [172]:
np.arange(0, 1, .1)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

---
### `reshape`

In [173]:
np.arange(10).reshape(2, 5) # returns new shape of an array without changing its data

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [174]:
np.arange(10).reshape(5, 2)

array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
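Two details of `reshape` are worth a quick sketch: one dimension can be given as `-1` so NumPy infers it from the element count, and the total number of elements must stay the same. In this example the reshaped array is a view of the original data, so writing to it changes `base` as well. The names are illustrative.

```python
import numpy as np

base = np.arange(10)
grid = base.reshape(2, -1)     # -1 lets NumPy infer the missing dimension (5)
grid[0, 0] = 99                # reshape returned a view here, so base changes too

try:
    base.reshape(3, 4)         # 10 elements cannot fill a 3x4 array
    fits = True
except ValueError:
    fits = False

print(grid.shape, base[0], fits)
```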

---
### `linspace`

In [175]:
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [176]:
np.linspace(0, 1, 20)

array([0.        , 0.05263158, 0.10526316, 0.15789474, 0.21052632,
       0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421,
       0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211,
       0.78947368, 0.84210526, 0.89473684, 0.94736842, 1.        ])

In [177]:
np.linspace(0, 1, 20, False)

array([0.  , 0.05, 0.1 , 0.15, 0.2 , 0.25, 0.3 , 0.35, 0.4 , 0.45, 0.5 ,
       0.55, 0.6 , 0.65, 0.7 , 0.75, 0.8 , 0.85, 0.9 , 0.95])
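The fourth positional argument in the last call is `endpoint`. Spelling it out as a keyword makes the intent clearer, and `retstep=True` additionally returns the spacing that was used. A small sketch with five samples:

```python
import numpy as np

# Endpoint included (the default): 5 samples span 4 intervals.
closed, step_closed = np.linspace(0, 1, 5, retstep=True)

# Endpoint excluded: 5 samples span 5 intervals, and 1.0 is left out.
half_open, step_open = np.linspace(0, 1, 5, endpoint=False, retstep=True)

print(closed, step_closed)
print(half_open, step_open)
```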

---
### `zeros`, `ones`, `empty`

In [178]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [179]:
np.zeros((3, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [180]:
np.zeros((3, 3), dtype=int) # the np.int alias was deprecated in NumPy 1.20 and later removed; use int


array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

In [181]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [182]:
np.ones((3, 3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [183]:
np.empty(5)

array([1., 1., 1., 1., 1.])

In [185]:
np.empty((2, 2))

array([[0.25, 0.5 ],
       [0.75, 1.  ]])

In [186]:
np.empty(2)

array([1.45229945, 1.19225589])
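The `np.empty` outputs above look like leftovers from earlier cells because `empty` allocates memory without initializing it, so the contents are arbitrary. A hedged sketch of the usual pattern: fill every element before reading, or use `np.full` when a constant is wanted.

```python
import numpy as np

buf = np.empty((2, 2))          # allocated but not initialized: contents are arbitrary
buf[:] = 0.0                    # write every element before reading any of them

sevens = np.full((2, 2), 7.0)   # allocate and set every element in one call

print(buf)
print(sevens)
```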

---
### `identity` and `eye`

In [187]:
np.identity(3) #identity matrix

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [188]:
np.eye(3, 3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [189]:
np.eye(8, 4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [190]:
np.eye(8, 4, k=1)

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [191]:
np.eye(8, 4, k=-3)

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [0., 0., 0., 0.]])
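As a quick check of how the two functions relate, the sketch below confirms that `np.eye(3)` with default arguments matches `np.identity(3)`, that `k` shifts the diagonal, and that multiplying by the identity leaves a matrix unchanged. The variable names are illustrative.

```python
import numpy as np

I = np.identity(3)
same = np.array_equal(I, np.eye(3))     # eye(N) with defaults is the identity

upper = np.eye(3, k=1)                  # ones on the first diagonal above the main one

A = np.arange(9.0).reshape(3, 3)
unchanged = np.array_equal(A @ I, A)    # multiplying by the identity leaves A as is

print(same, unchanged)
print(upper)
```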

In [192]:
"Hello World"[6]

'W'

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)