<a href="https://colab.research.google.com/github/xalejandrow/machine-learning-prework/blob/main/Copia_de_02_1_Intro_to_Numpy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://github.com/4GeeksAcademy/machine-learning-prework/blob/main/02-numpy/assets/numpy_logo.png?raw=1" alt="logo" width="400"/>

## Introduction to NumPy

NumPy stands for 'Numerical Python'. It is an open-source Python library for mathematical and scientific computing. It provides multi-dimensional arrays and matrices, along with many high-level mathematical functions that operate on them. Among other things, it contains:

→ a powerful N-dimensional array object.

→ sophisticated (broadcasting) functions.

→ tools for integrating C/C++ and Fortran code.

→ useful linear algebra, Fourier transform, and random number capabilities.

## Installing NumPy
To work with NumPy locally, install it with:\
`pip install numpy`\
or\
`conda install numpy`

In our case, 4Geeks has already prepared the environment so that you can work comfortably.

## Why should we use NumPy?

NumPy is a library for numerical computing in Python. We will use it mainly because it makes it easy to create and modify matrices and to operate on them.

Like Pandas, Matplotlib or Scikit-Learn, NumPy is one of the packages you cannot skip when studying Machine Learning, mainly because it provides an array data structure with several benefits over regular Python lists: it is more compact, faster to read from and write to, and more convenient and efficient to work with.

For example, we will see later in the bootcamp that working with images means dealing with three-dimensional arrays. A 3840 x 2160 image with 3 color channels has 3×3840×2160 = 24883200 entries!!! 😱😱😱.

Working with arrays of that size using lists and dictionaries is impractical if you want efficient, fast code.

#### Exercise: Import the numpy package under the name `np` (★☆☆)

`numpy` is commonly imported as `np`, so we highly recommend using this alias.

In [36]:
import numpy as np

## What is an array and why is it important for Machine Learning?

An array is a data structure consisting of a collection of elements (values or variables), each identified by at least one index or key.


![alt text](assets/1D.png "1D")

The array is the central data structure of the NumPy library, and it can have several dimensions. For example, neural networks sometimes deal with 4D arrays.

Later on, we are also going to use another kind of array called a tensor.

![alt text](./assets/3D.png "3D")
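
As a small illustration of what "dimensions" means, the sketch below builds a 1D, a 2D and a 3D array and inspects their `ndim` and `shape` attributes. The values and the variable names (`vector`, `matrix`, `cube`) are arbitrary choices made for this example.

```python
import numpy as np

# 1D array: each element is identified by a single index
vector = np.array([10, 20, 30])

# 2D array: each element needs a row index and a column index
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# 3D array: two stacked 2x3 "pages" filled with zeros
cube = np.zeros((2, 2, 3))

for name, arr in [("vector", vector), ("matrix", matrix), ("cube", cube)]:
    print(name, "ndim:", arr.ndim, "shape:", arr.shape)

# Indexing uses one index per dimension
print(vector[1])     # second element of the vector
print(matrix[1, 2])  # row 1, column 2
```

The number of indexes needed to reach a single element always equals `ndim`.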

#### Exercise: Print the version and configuration of numpy (★☆☆)

You can print the version of any package of Python using `name_of_package.__version__`

In [37]:
import numpy as np
print(np.__version__)

1.21.6


#### Exercise: Create a null vector of size 10 (★☆☆)

A `null vector` is an array of zeros (`0`), often used to initialize values.

>Check the function `np.zeros` (https://numpy.org/doc/stable/reference/generated/numpy.zeros.html)

In [6]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

#### Exercise: Create a vector of ones with size 10 (★☆☆)

>Check the function `np.ones` (https://numpy.org/doc/stable/reference/generated/numpy.ones.html)

In [7]:
np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

#### Exercise: Create a 1D array with a specific start value, end value, and number of values (★☆☆)

>Check the function `np.linspace` (https://numpy.org/doc/stable/reference/generated/numpy.linspace.html)

In [8]:
np.linspace(1,2,10)

array([1.        , 1.11111111, 1.22222222, 1.33333333, 1.44444444,
       1.55555556, 1.66666667, 1.77777778, 1.88888889, 2.        ])

#### Run: Create a vector (1D array) with random integers from 10 to 49 and dimension 1x35 (★☆☆)

When a `dimension` is written as `1x35`, it means a one-dimensional array with 35 items (length=35).

>Check the module `np.random`, which allows you to create random arrays (https://numpy.org/doc/1.16/reference/routines.random.html)

In [9]:
import numpy as np

## 10 random numbers in the interval [0, 1)
print(np.random.random(10))

[0.70366381 0.84645487 0.0749608  0.01098744 0.25912224 0.1263394
 0.70461799 0.24803037 0.30556218 0.15387526]


In [10]:
## A uniform distribution versus a normal distribution
print(np.random.rand(10)) # 10 random values with uniform distribution on [0, 1)
print(np.random.normal(loc = 0, scale = 1, size = 10)) # 10 random values with distribution N(0,1)

[0.21406123 0.90782751 0.0517274  0.84779707 0.85492579 0.60919865
 0.8312665  0.67573196 0.95807102 0.08007586]
[-1.1510162  -0.08334909 -1.13305293  0.96126494  0.22758861  0.95589715
 -0.24365657  1.02843587 -1.0768434   1.45243999]


In [11]:
## Did you notice the difference between both functions?
print(np.random.normal(loc = -5, scale = 33, size = 10)) # 10 random values with normal distribution: mean -5, standard deviation 33

[ 18.82781862   4.60421365 -54.49314388 -13.15317385  77.6304475
  19.11384345  15.70169537  34.16492555  17.30644626 -57.9541401 ]


In [12]:
## 10 random values with uniform distribution, meaning every value in the range is equally likely
print(np.random.uniform(-30,100,10)) # All values are in the interval [-30, 100).

[ 62.10401835  70.36944842  33.75143824  53.66667606 -29.64579697
  74.26965764  40.40790918   4.23482696  39.58029292  13.84264878]


In [13]:
# 10 integer values from 0 to 99 (the upper bound 100 is excluded).
print(np.random.randint(0, 100, 10))

[71 39 84 10 40 11 95 80 67 11]


In [14]:
# 10 random values with chi-square distribution with 5 degrees of freedom
print(np.random.chisquare(5,10))

[ 0.93615051  6.89481628  7.72542958  6.25459026  3.5903614  11.72614871
  5.14359619  1.78170814  7.00676104  2.62191512]


The examples above cover the most common distributions and random values you will use throughout the bootcamp. Now, let's work with those arrays.
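
None of the cells above draws the vector named in the exercise title (integers from 10 to 49, length 35). A minimal sketch of one way to do it follows, assuming the newer `np.random.default_rng` generator API; the seed value is arbitrary and only makes the result reproducible.

```python
import numpy as np

# Seeded generator so the result is reproducible (the seed value 42 is arbitrary)
rng = np.random.default_rng(42)

# integers() excludes the upper bound by default, so 50 gives values from 10 to 49
vec = rng.integers(10, 50, size=35)

print(vec)
print("length:", vec.size, "min:", vec.min(), "max:", vec.max())
```

The legacy call `np.random.randint(10, 50, 35)` follows the same convention of excluding the upper bound.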

#### Exercise: Reverse one of the vectors we created before (first element becomes last) (★☆☆)
Try with `[::-1]`

In [15]:
vect = np.random.randint(0,100,10)
print(vect)
print(vect[::-1])

[82 17  4 41 71 92 67 60 28 40]
[40 28 60 67 92 71 41  4 17 82]


#### Exercise: Create a 5x5 identity matrix (★☆☆)

>Check the function `np.eye` (https://numpy.org/devdocs/reference/generated/numpy.eye.html)

In [16]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

#### Exercise: Find indexes of non-zero elements from [1,2,0,0,4,0] (★☆☆)

>Check the function `np.where` (https://numpy.org/devdocs/reference/generated/numpy.where.html)

In [17]:
v = np.array([1,2,0,0,4,0])
np.where(v != 0)[0]

array([0, 1, 4])

#### Exercise: Create a 10x10 array with random values and find the minimum and maximum values (★☆☆)

>Check the functions `min` (https://numpy.org/devdocs/reference/generated/numpy.min.html) and `max` (https://numpy.org/devdocs/reference/generated/numpy.max.html)

In [18]:
matrand =  np.random.random((10, 10))
print(matrand)
print(np.amin(matrand))
print(np.amax(matrand))

[[0.4893697  0.46264989 0.99242977 0.19910849 0.40641801 0.58428378
  0.54931596 0.21165351 0.93265564 0.50379961]
 [0.57534985 0.81861849 0.46142539 0.53698752 0.90305705 0.35530724
  0.5424467  0.28357746 0.73202085 0.77797489]
 [0.64304713 0.43341682 0.67208372 0.72296411 0.95836428 0.06551834
  0.97936387 0.747455   0.30739881 0.80891194]
 [0.81366464 0.66862397 0.2259914  0.14424448 0.72477914 0.3455285
  0.55400281 0.53968182 0.25528184 0.61081227]
 [0.70892725 0.18712823 0.31439953 0.20492687 0.04065306 0.21386165
  0.69068622 0.57654475 0.66178501 0.86852351]
 [0.37992815 0.05528542 0.91797258 0.72529189 0.84456182 0.4230657
  0.50160984 0.08629873 0.20658612 0.09976872]
 [0.11966263 0.79433367 0.81703057 0.4237083  0.00932723 0.1252923
  0.79961266 0.60086388 0.71797102 0.99864697]
 [0.28020173 0.87118143 0.05110377 0.32467301 0.99264248 0.69381891
  0.29958817 0.78639366 0.61377939 0.75388362]
 [0.33780943 0.21718558 0.78338897 0.96735325 0.29651347 0.74235554
  0.96721561 0.

#### Exercise: Create a random vector of size 30 and find the mean value (★☆☆)



In [19]:
#matrand2 =  np.round(np.random.random(30)*100)
matrand2 =  np.random.random(30)*100
print(matrand2)
print(np.mean(matrand2))

[25.93127636 48.49566904 99.77030952 31.04303595 24.06496034 72.25718728
 84.08721774 54.24592104 88.36145399 45.46904071 80.42351146 53.43839439
 26.16530734 82.6575236  57.90359913 66.39946181 45.20569993 86.65534151
 12.1072661  44.51977653  6.87319438 30.75510608 18.22526151 82.28391424
 41.13044594 53.80929293 99.88884955 57.81026559 22.32659871  9.72850428]
51.73444623214676


#### Exercise: Define a function that takes your date of birth (yyyy/mm/dd) as input and returns a random array with the following dimensions: (★★☆)

$$(yyyy-1900) \times |mm - dd|$$

In [20]:
def dateMatrix(mydate):
  # Split "yyyy/mm/dd" into its year, month and day parts
  data = mydate.split("/")
  f = int(data[0]) - 1900                 # rows: yyyy - 1900
  c = abs(int(data[1]) - int(data[2]))    # columns: |mm - dd|
  result = np.random.random((f,c))
  return result
arr = dateMatrix("2000/01/05")
print(arr)



[[0.33052919 0.15162577 0.05409636 0.8162672 ]
 [0.32335527 0.27781748 0.66228958 0.86526845]
 [0.66317858 0.48120752 0.91161567 0.07645536]
 [0.53200176 0.54540783 0.55557615 0.29598861]
 [0.2343418  0.29902253 0.66660014 0.09247575]
 [0.96175129 0.41398754 0.94529913 0.34823975]
 [0.31581661 0.13120595 0.98988473 0.79580526]
 [0.75082298 0.06242225 0.14834515 0.3983464 ]
 [0.52915742 0.10944834 0.17816516 0.362661  ]
 [0.75947105 0.30866378 0.78253263 0.01221727]
 [0.77828216 0.64897272 0.19872083 0.97580216]
 [0.12299243 0.9022331  0.52518187 0.5804199 ]
 [0.30701703 0.27724832 0.04898921 0.74903802]
 [0.11009545 0.76014304 0.88324298 0.01555231]
 [0.1442119  0.12219269 0.38454055 0.49086361]
 [0.46739395 0.15818834 0.26945721 0.5944593 ]
 [0.58719713 0.17039607 0.92924281 0.02615314]
 [0.90560916 0.39761818 0.88956835 0.38611815]
 [0.91707929 0.36981884 0.14888947 0.50127221]
 [0.15373638 0.26257556 0.71947624 0.28359334]
 [0.67090615 0.14163106 0.11245984 0.63546703]
 [0.45387475 

## What is the difference between a Python list and a NumPy array?

- A Python list can contain elements of different data types, while the elements of a NumPy array are always homogeneous (same data type).

- NumPy arrays are faster and more compact than Python lists.

## Why are NumPy arrays faster than lists?

- NumPy arrays store fixed-size elements and use less memory than Python lists.

- NumPy arrays are allocated contiguously in memory.
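
The memory claim can be checked with a rough sketch like the one below. It compares a list of 1000 integers with an `int64` array of the same values; the size `n` and the variable names are assumptions made for this example, and the list figure is only approximate because `sys.getsizeof` counts each object separately.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers to separate Python int objects,
# so its footprint is the list itself plus every element object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An array stores raw 8-byte values back to back in a single buffer.
array_bytes = np_array.nbytes

print("list bytes (approx.):", list_bytes)
print("array data bytes:", array_bytes)
print("contiguous:", np_array.flags["C_CONTIGUOUS"])
```

Because every element has the same size and sits next to its neighbors, NumPy can loop over the buffer in compiled code instead of following one pointer per element.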

#### Exercise: Convert the list `my_list = [1, 2, 3]` to numpy array (★☆☆)

In [21]:
my_list = [ 1, 2, 3 ]
print(my_list)
print(type(my_list))
my_array = np.asarray(my_list)
print(type(my_array))
print(my_array)

[1, 2, 3]
<class 'list'>
<class 'numpy.ndarray'>
[1 2 3]


#### Exercise: Convert the tuple `my_list = (1, 2, 3)` to numpy array (★☆☆)

In [22]:
my_list = (1, 2, 3)
print(my_list)
print(type(my_list))
my_array = np.asarray(my_list)
print(type(my_array))
print(my_array)

(1, 2, 3)
<class 'tuple'>
<class 'numpy.ndarray'>
[1 2 3]


#### Exercise: Convert the list of tuples `my_list = [(1,2,3), (4,5)]` to numpy array (★☆☆)

In [23]:
my_list = [(1,2,3), (4,5)]
print(my_list)
print(type(my_list))
my_array = np.asanyarray(my_list,dtype='object')
print(my_array)
print(type(my_array))

[(1, 2, 3), (4, 5)]
<class 'list'>
[(1, 2, 3) (4, 5)]
<class 'numpy.ndarray'>


#### Exercise: Reshape a random array of dimensions 5x12 into 12x5 (★☆☆)

>Check `reshape` from `numpy` (https://numpy.org/doc/stable/reference/generated/numpy.reshape.html)

In [24]:
my_array = np.random.random((5,12))
print(my_array)
my_shape_array = np.reshape(my_array,(12,5))
print(my_shape_array)

[[0.59670497 0.38493279 0.91163556 0.34659848 0.02426525 0.64929023
  0.9659815  0.09685199 0.76927771 0.97316063 0.4263841  0.98251428]
 [0.97431717 0.33638249 0.00981882 0.20556468 0.19687164 0.62935312
  0.15661322 0.13668955 0.38669875 0.55841917 0.79039822 0.27510253]
 [0.81207702 0.96127246 0.21965847 0.92352731 0.83041857 0.09150516
  0.9278379  0.99338756 0.91702045 0.63925037 0.15970406 0.705582  ]
 [0.44071674 0.59082502 0.08281836 0.32576767 0.39453585 0.94314525
  0.12247703 0.34254715 0.08625545 0.5442902  0.00812444 0.49551835]
 [0.71231506 0.39730854 0.01572727 0.18685141 0.62112341 0.12974545
  0.90058904 0.74922757 0.25683366 0.58205327 0.24963062 0.88215367]]
[[0.59670497 0.38493279 0.91163556 0.34659848 0.02426525]
 [0.64929023 0.9659815  0.09685199 0.76927771 0.97316063]
 [0.4263841  0.98251428 0.97431717 0.33638249 0.00981882]
 [0.20556468 0.19687164 0.62935312 0.15661322 0.13668955]
 [0.38669875 0.55841917 0.79039822 0.27510253 0.81207702]
 [0.96127246 0.21965847 

#### Exercise: Create a function that normalize a 5x5 random matrix (★☆☆)

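>Remember from statistics (https://en.wikipedia.org/wiki/Normalization_(statistics)) that the standard score is:
$$ x_{norm} = \frac{x - \bar{x}}{\sigma}$$
>The solution below uses min-max scaling instead, which maps the values to the range [0, 1]:
$$ x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}$$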


In [25]:
import numpy as np
def normalize(x):
  # Min-max scaling: the minimum maps to 0 and the maximum maps to 1
  xmax, xmin = x.max(), x.min()
  x = (x - xmin)/(xmax - xmin)
  return x
x= np.random.random((5,5))
print("Original Array:")
print(x)
y = normalize(x)
print("After normalization:")
print(y)

Original Array:
[[0.25320898 0.63390458 0.21636345 0.30770933 0.57806675]
 [0.63326049 0.48223311 0.34872369 0.65247005 0.2139973 ]
 [0.75839871 0.0575221  0.52444093 0.85729965 0.80944647]
 [0.96065408 0.3620193  0.74535843 0.75258297 0.82135162]
 [0.80902603 0.84341926 0.11795545 0.78476463 0.29190497]]
After normalization:
[[0.21667584 0.63820405 0.17587834 0.27702179 0.57637716]
 [0.63749087 0.47026462 0.32243526 0.65876081 0.1732584 ]
 [0.77605115 0.         0.51699955 0.88555999 0.83257419]
 [1.         0.33715693 0.76161219 0.76961163 0.84575625]
 [0.83210865 0.87019082 0.06691531 0.80524502 0.25952228]]
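
The z-score formula quoted in the exercise statement can be sketched as follows. The function name `standardize` and the seed are choices made for this example; the standard deviation used is NumPy's default (population) one.

```python
import numpy as np

def standardize(x):
    # Z-score: subtract the mean and divide by the standard deviation
    return (x - x.mean()) / x.std()

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
x = rng.random((5, 5))
z = standardize(x)

print("mean after standardizing:", z.mean())  # close to 0
print("std after standardizing:", z.std())    # close to 1
```

Unlike min-max scaling, the result is not confined to [0, 1]; it is centered on 0 with unit spread.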


## Stacking numpy arrays

Stacking joins a sequence of arrays with the same shape along a new axis.

`numpy.stack(arrays, axis)` returns the stacked array, which has one more dimension than the input arrays.

### You have two ways to do it:


![alt text](./assets/stack.jpeg "stack")


### or


![alt text](./assets/stack2.jpeg "stack")




#### Exercise: Generate two random arrays with integers and apply the stacking using `stack` (★★☆)

In [26]:
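# Note: np.random.random draws floats in [0, 1); np.random.randint would give integers as the exercise asks
arr1 = np.random.random((3, 3))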
print('arr1: \n',arr1)
arr2 = np.random.random((3,3))
print('\n arr2: \n',arr2)
arr_stack = np.stack((arr1, arr2))
print('\n arr_stack: \n',arr_stack)

arr1: 
 [[0.52940638 0.5552514  0.71730772]
 [0.46782776 0.63097616 0.77038899]
 [0.44280093 0.53148859 0.33000882]]

 arr2: 
 [[0.70500979 0.35816694 0.84507004]
 [0.96840054 0.2585141  0.18655587]
 [0.17503876 0.59114399 0.35866812]]

 arr_stack: 
 [[[0.52940638 0.5552514  0.71730772]
  [0.46782776 0.63097616 0.77038899]
  [0.44280093 0.53148859 0.33000882]]

 [[0.70500979 0.35816694 0.84507004]
  [0.96840054 0.2585141  0.18655587]
  [0.17503876 0.59114399 0.35866812]]]


#### Exercise: Generate two random arrays with integers and apply the stacking using `hstack` and `vstack` (★★☆)

In [27]:
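# Note: np.random.random draws floats in [0, 1); np.random.randint would give integers as the exercise asks
arr1 = np.random.random((3,3))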
print('arr1: \n',arr1)
arr2 = np.random.random((3,3))
print('\n arr2: \n',arr2)
arr_hstack = np.hstack((arr1, arr2))
print('\n arr_hstack: \n',arr_hstack)
arr_vstack = np.vstack((arr1, arr2))
print('\n arr_vstack: \n',arr_vstack)

arr1: 
 [[0.61740111 0.82602318 0.73210484]
 [0.77305462 0.46480616 0.11959146]
 [0.43022698 0.69785894 0.0682356 ]]

 arr2: 
 [[0.14342003 0.18917409 0.45833436]
 [0.16377405 0.90016411 0.99973938]
 [0.61438531 0.64485674 0.95687378]]

 arr_hstack: 
 [[0.61740111 0.82602318 0.73210484 0.14342003 0.18917409 0.45833436]
 [0.77305462 0.46480616 0.11959146 0.16377405 0.90016411 0.99973938]
 [0.43022698 0.69785894 0.0682356  0.61438531 0.64485674 0.95687378]]

 arr_vstack: 
 [[0.61740111 0.82602318 0.73210484]
 [0.77305462 0.46480616 0.11959146]
 [0.43022698 0.69785894 0.0682356 ]
 [0.14342003 0.18917409 0.45833436]
 [0.16377405 0.90016411 0.99973938]
 [0.61438531 0.64485674 0.95687378]]
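
To make the difference between the three functions easier to see, the sketch below compares only the resulting shapes, this time with integer arrays as the exercise titles ask. The seed and the value range are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
a = rng.integers(0, 10, size=(3, 3))
b = rng.integers(0, 10, size=(3, 3))

# stack adds a new axis; hstack and vstack join along an existing one
print(np.stack((a, b)).shape)           # (2, 3, 3)
print(np.stack((a, b), axis=-1).shape)  # (3, 3, 2)
print(np.hstack((a, b)).shape)          # (3, 6)
print(np.vstack((a, b)).shape)          # (6, 3)
```

`stack` is the only one of the three that increases the number of dimensions.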


## Basic maths in numpy

You can perform typical math operations such as:

- Addition, subtraction, multiplication and division between two arrays.
- Aggregations over arrays using the `sum()` and `cumsum()` functions.
- Minimum and maximum values of an array.
- Exponent/power, square root and cube root functions.

or even apply common trigonometric functions:

- `numpy.sin()`: Sine(x) function
- `numpy.cos()`: Cosine(x) function
- `numpy.tan()`: Tangent(x) function
- `numpy.sinh()`: Hyperbolic sine(x) function
- `numpy.cosh()`: Hyperbolic cosine(x) function
- `numpy.tanh()`: Hyperbolic tangent(x) function
- `numpy.arcsin()`: Inverse sine(x) function
- `numpy.arccos()`: Inverse cosine(x) function
- `numpy.arctan()`: Inverse tangent(x) function
- `numpy.pi`: Pi value π
- `numpy.hypot(w,h)`: Hypotenuse $c = \sqrt{w^2 + h^2}$
- `numpy.rad2deg()`: Radians to degrees
- `numpy.deg2rad()`: Degrees to radians
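
A few of the operations listed above, applied to two small hand-written arrays. The values are arbitrary and chosen so the results are easy to verify by hand.

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])
b = np.array([2.0, 2.0, 3.0])

print(a + b, a - b, a * b, a / b)   # element-wise arithmetic
print(a.sum(), a.cumsum())          # total and running total
print(a.min(), a.max())
print(np.sqrt(a), np.power(a, 2), np.cbrt(a))

# Trigonometric functions work in radians
angle = np.deg2rad(90)
print(np.sin(angle))      # sine of 90 degrees
print(np.hypot(3, 4))     # hypotenuse of a 3-4 right triangle
```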

#### Exercise: Generate two random 8-dimensional vectors and apply the most common operations between vectors: addition, subtraction, multiplication, division (★☆☆)

>Check the math functions here: https://numpy.org/doc/stable/reference/routines.math.html

In [51]:
#arr1 = np.random.random((1,8))
#arr2 = np.random.random((1,8))
arr1 = np.round(np.random.random((1,8))*100)
arr2 = np.round(np.random.random((1,8))*100)
print(arr1)
print(arr2)
print("sum: \n",np.add(arr1, arr2))
print("subtract: \n",np.subtract(arr1, arr2))
print("multiply: \n",np.multiply(arr1, arr2))
print("divide: \n",np.divide(arr1, arr2))


[[13. 22. 17. 56.  9. 51. 45. 44.]]
[[98. 75. 26. 38. 58. 93. 89. 85.]]
sum: 
 [[111.  97.  43.  94.  67. 144. 134. 129.]]
subtract: 
 [[-85. -53.  -9.  18. -49. -42. -44. -41.]]
multiply: 
 [[1274. 1650.  442. 2128.  522. 4743. 4005. 3740.]]
divide: 
 [[0.13265306 0.29333333 0.65384615 1.47368421 0.15517241 0.5483871
  0.50561798 0.51764706]]


#### Exercise: Generate two random matrices with dimensions between 5 and 10. For example, try 5x7 vs 8x9. Were you able to do the matrix multiplication? Why? (★★☆)

In [29]:
arr1 = np.round(np.random.random((5,7))*100)
arr2 = np.round(np.random.random((8,9))*100)
print(arr1)
print(arr2)
#print("multiply: \n",np.multiply(arr1, arr2)) # Fails: the shapes (5, 7) and (8, 9) differ, and a matrix product would need columns of arr1 (7) == rows of arr2 (8)


[[ 69.  25.  82.  57.  10.  63.  48.]
 [ 59.  83.  57.  16.  39.  13.   4.]
 [ 60.  18.  49.  37.  55.  71.  75.]
 [ 12.  97.  23.  11.  82.  54.  50.]
 [ 74.  52.  36. 100.  87.  78.  80.]]
[[29. 28. 33. 86. 16.  3. 36. 86. 44.]
 [ 3. 37. 95. 26. 92. 38. 36. 13.  9.]
 [80. 63. 21.  5. 39. 44. 13.  4. 50.]
 [ 0. 40. 86. 30. 53. 74. 40. 58. 71.]
 [41. 22. 37.  5. 16. 49. 41. 56. 25.]
 [58. 88. 73. 29. 61. 89. 67. 36. 22.]
 [56. 82. 54.  5. 45. 81. 50. 48.  8.]
 [25. 48. 63. 98. 36. 20. 40. 48. 11.]]
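
The shape rule behind this exercise can be checked with a short sketch: the matrix product is defined only when the number of columns of the left matrix equals the number of rows of the right one. The names `A`, `B`, `C` and the seed are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
A = rng.integers(0, 100, size=(5, 7))
B = rng.integers(0, 100, size=(8, 9))

try:
    np.matmul(A, B)
    product_ok = True
except ValueError as err:
    # 7 columns in A do not match 8 rows in B
    product_ok = False
    print("matmul failed:", err)

# A compatible right-hand matrix must have 7 rows
C = rng.integers(0, 100, size=(7, 9))
print(np.matmul(A, C).shape)  # (5, 9)
```

The result takes its row count from the left matrix and its column count from the right one.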


#### Exercise: Given 2 NumPy arrays as matrices, output the result of multiplying the 2 matrices (as a NumPy array). Were you able to do the matrix multiplication? (★★☆)

$$ a = \left(\begin{matrix}
0 & 1 & 2\\ 
3 & 4 & 5\\ 
6 & 7 & 8
\end{matrix}\right)$$

$$ b = \left(\begin{matrix}
2 & 3 & 4\\ 
5 & 6 & 7\\ 
8 & 9 & 10
\end{matrix}\right)$$




In [30]:
'Yes: the matrix product is defined because the number of columns of a (3) equals the number of rows of b (3)'

'Yes: the matrix product is defined because the number of columns of a (3) equals the number of rows of b (3)'
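
The product of the matrices `a` and `b` defined above can be computed as in this sketch. Building them with `np.arange(...).reshape(3, 3)` is just one convenient choice; typing the values out with `np.array` gives the same result.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)      # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
b = np.arange(2, 11).reshape(3, 3)  # [[2, 3, 4], [5, 6, 7], [8, 9, 10]]

# Both matrices are 3x3, so columns of a == rows of b and the product is defined
product = np.matmul(a, b)
print(product)
# [[ 21  24  27]
#  [ 66  78  90]
#  [111 132 153]]
```

For example, the top-left entry is 0·2 + 1·5 + 2·8 = 21.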

#### Exercise: Multiply a 5x3 matrix by a 3x2 matrix (real matrix product) (★★☆)

In [52]:
#arr1 = np.round(np.random.random((5,3))*10)
#arr2 = np.round(np.random.random((3,2))*10)
#rng = np.random.default_rng(0)
#arr1 = rng.integers(10, size=(5, 3))
#arr2 = rng.integers(10, size=(3, 2))
arr1 = np.random.randint(10, size=(5, 3))
arr2 = np.random.randint(10, size=(3, 2))
arr3 = np.random.randint(10, size=(3, 3))
arr4 = np.random.randint(10, size=(3, 3))
print("arr1: \n",arr1)
print("arr2: \n",arr2)
print(arr1.shape)
print(arr2.shape)
print(arr3.shape)
print(arr4.shape)

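# Requires columns of A == rows of B (regular matrix product): (5, 3) x (3, 3) -> (5, 3)
print("matmul: \n",np.matmul(arr1, arr3))
# Requires identical shapes (Hadamard, element-wise product)
print("multiply: \n",np.multiply(arr3, arr4))
# For 2D arrays np.dot is also the matrix product: (5, 3) x (3, 2) -> (5, 2)
print("dot: \n",np.dot(arr1, arr2))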

arr1: 
 [[3 8 9]
 [8 2 3]
 [0 2 7]
 [0 4 9]
 [8 5 0]]
arr2: 
 [[6 1]
 [8 6]
 [7 5]]
(5, 3)
(3, 2)
(3, 3)
(3, 3)
matmul: 
 [[120 103  75]
 [ 73  78  26]
 [ 69  30  18]
 [ 93  50  36]
 [ 55  96  53]]
multiply: 
 [[35 35  7]
 [ 3 48 18]
 [72  8  0]]
dot: 
 [[145  96]
 [ 85  35]
 [ 65  47]
 [ 95  69]
 [ 88  38]]


## Data types

Do you think the following proposition is true?

`8==8`

You will surely say yes, which is true mathematically, but computationally the same number is not always represented in the same way, at least in terms of memory. For example, run the following cell:

In [48]:
import sys

# int64
x = np.array(123)
print("int64: " + str(sys.getsizeof(x)))

# int8
x = np.array(123,dtype=np.int8)
print("int8: " + str(sys.getsizeof(x)))

# float32
x = np.array(123,dtype=np.float32)
print("float32: " + str(sys.getsizeof(x)))

int64: 96
int8: 89
float32: 92


#### It turns out that there are many computational representations of the same number, and you can create arrays with different data types (dtypes) depending on what you need:

- Boolean: `np.bool_`
- Char: `np.byte`
- Short: `np.short`
- Integer: `np.intc`
- Long: `np.int_`
- Float: `np.single` & `np.float32`
- Double: `np.double` & `np.float64`
- `np.int8`: integer (-128 to 127)
- `np.int16`: integer (-32768 to 32767)
- `np.int32`: integer (-2147483648 to 2147483647)
- `np.int64`: integer (-9223372036854775808 to 9223372036854775807)


Sometimes, you will need to load, create or export arrays with different data types.
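
The sketch below shows how to inspect and convert data types: `itemsize` gives the bytes per element, `np.iinfo` reports the range of an integer type, and `astype` returns a converted copy. The sample values are arbitrary.

```python
import numpy as np

# itemsize is the number of bytes used by each element
for dtype in (np.int8, np.int16, np.int32, np.int64, np.float32, np.float64):
    print(np.dtype(dtype).name, "->", np.dtype(dtype).itemsize, "byte(s) per element")

# iinfo reports the range an integer type can represent
info = np.iinfo(np.int8)
print("int8 range:", info.min, "to", info.max)

# astype returns a converted copy of an array
values = np.array([1, 2, 3], dtype=np.int64)
small = values.astype(np.int8)
print(values.nbytes, "bytes as int64,", small.nbytes, "bytes as int8")
```

Unlike `sys.getsizeof`, `nbytes` counts only the element data and leaves out the fixed overhead of the array object.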


## Harder exercises

The next exercises are related to real situations you could face while working in data science and machine learning. Also, we will frequently be talking about matrices and two-dimensional arrays.

#### Exercise: Subtract the mean of each row of a matrix (★★☆)

In [None]:
print("Original matrix:\n")
X = np.random.rand(5, 10)
print(X)
print("\nMatrix after subtracting the mean of each row:\n")
# keepdims=True keeps the means as a (5, 1) column so they broadcast across each row
Y = X - X.mean(axis=1, keepdims=True)
print(Y)
# Same operation on a reproducible integer matrix (seeded generator)
rng = np.random.default_rng(0)
x = rng.integers(100, size=(5, 3))
print(x)
print("\nMatrix after subtracting the mean of each row:\n")
y = x - x.mean(axis=1, keepdims=True)
print(y)

#### Exercise: How to get the dates of yesterday, today and tomorrow? (★★☆)

>Check `np.datetime64`, `np.timedelta64` in numpy (https://numpy.org/doc/stable/reference/arrays.datetime.html)

In [None]:
yesterday = np.datetime64('today', 'D') - np.timedelta64(1, 'D')
print("Yesterday: ",yesterday)
today     = np.datetime64('today', 'D')
print("Today: ",today)
tomorrow  = np.datetime64('today', 'D') + np.timedelta64(1, 'D')
print("Tomorrow: ",tomorrow)

#### Exercise: How to get all the dates corresponding to the month of December 2022? (★★☆)
Combine `arange` with `datetime64`


In [None]:
days = np.arange('2022-12', '2023-01', dtype='datetime64[D]')
print(days)

#### Exercise: Extract the integer part of a random array of positive numbers using 2 different methods (★★☆)

In [None]:
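import numpy as np
arr = np.random.random((5,3))*100  # positive floats in [0, 100)
print(arr)
# Method 1: for positive numbers, floor returns the integer part
print(np.floor(arr))
# Method 2: casting to int truncates toward zero
print(arr.astype(int))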

#### Exercise: Create a 5x5 matrix with row values ranging from 0 to 4 (★★☆)

In [None]:
# Repeat the row [0, 1, 2, 3, 4] five times
mat = np.tile(np.arange(5), (5, 1))
print(mat)

#### Exercise: Consider a generator function that generates 10 integers and use it to build an array (★★☆)

In [None]:
def generate():
  # Generator function that yields the 10 integers 0..9
  for value in range(10):
    yield value
num = np.fromiter(generate(), dtype=int, count=-1)
print(num)

#### Exercise: Create a vector of size 10 with values ranging from 0 to 1, both excluded (★★☆)

In [None]:
x = np.linspace(0,1,12,endpoint=True)[1:-1]
print(x)

#### Exercise: Create a random vector of size 10 and sort it (★★☆)

In [None]:
x = np.random.random(10)
print("Original array:")
print(x)
x.sort()
print("Sorted array:")
print(x)

#### Exercise: Consider two random arrays A and B, check if they are equal (★★☆)

In [None]:
x = np.random.randint(0,2,6)
print("First array:")
print(x)
y = np.random.randint(0,2,6)
print("Second array:")
print(y)
print("Are the two arrays equal?")
# np.array_equal checks shape and exact values; np.allclose would allow a small tolerance
array_equal = np.array_equal(x, y)
print(array_equal)

#### Exercise: Consider a random 10x2 matrix representing cartesian coordinates, convert them to polar coordinates (★★★)
>Suggestion: check how to calculate the "square of a matrix"

In [None]:
z= np.random.random((10,2))
x,y = z[:,0], z[:,1]
r = np.sqrt(x**2+y**2)
t = np.arctan2(y,x)
print(r)
print(t)

#### Exercise: Create random vector of size 10 and replace the maximum value by 0 (★★☆)

In [None]:
import numpy as np
x = np.random.random(10)
print("Original array:")
print(x)
x[x.argmax()] = 0
print("Maximum value replaced by 0:")
print(x)


#### Exercise: How to print all the values of an array? (★★☆)

In [None]:
import numpy as np
my_array = np.arange(1001)
print(my_array)
np.set_printoptions(threshold=np.inf)
print(my_array)

#### Exercise: How to convert a float (32 bits) array into an integer (32 bits) in place?
>Check: https://stackoverflow.com/a/4396247/5989906

In [None]:
x = np.arange(10, dtype='int32')
print(x)
y = x.view('float32')
y[:] = x
print(y)

x = np.arange(10, dtype='float32')
print(x)
y = x.view('int32')
y[:] = x
print(y)




#### Exercise: How to sort an array by the nth column? (★★☆)

In [None]:
print("Original array:\n")
nums = np.random.randint(0,10,(3,3))
print(nums)
print("\nSort the said array by the nth column: ")
print(nums[nums[:,2].argsort()])

#### Exercise: Find the position of the minimum of a 2D matrix (★★☆)

In [None]:
print("Original array:\n")
x = np.random.random((2,5))*100
print(x)
print("Min in array:\n")
m = np.amin(x)
print(m)
print("Position of Min in array (row, column):\n")
# argmin returns an index into the flattened array; unravel_index converts it to (row, column)
pos = np.unravel_index(np.argmin(x), x.shape)
print(pos)


#### Exercise: Read an image using openCV, check its dimensions, normalize the numbers and show the image (★★★)

>Check: https://www.geeksforgeeks.org/python-opencv-cv2-imread-method/

In [None]:
from google.colab import drive
drive.mount('/content/drive')
import os
os.chdir('/content/drive/My Drive/Colab Notebooks/')  #change dir
!pwd
import cv2
from google.colab.patches import cv2_imshow

  
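# Path to the image, relative to the working directory set above
path = r'logo_2.png'

# cv2.imread returns the image as a NumPy array (or None if the file cannot be read)
img = cv2.imread(path)
# Dimensions: (height, width, channels)
print(img.shape)
# Normalize the 8-bit pixel values from [0, 255] to [0, 1]
img_norm = img / 255.0
print(img_norm.min(), img_norm.max())
# Display the image; cv2_imshow is the Colab replacement for cv2.imshow
cv2_imshow(img)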

#### Exercise: Considering a four-dimensional array, how to get the sum over the last two axes at once? (★★★)

In [None]:
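A = np.random.randint(0,10,(3,4,3,4))
print(A)
# Flatten the last two axes into one, then sum over it (named "total" to avoid shadowing the built-in sum)
total = A.reshape(A.shape[:-2] + (-1,)).sum(axis=-1)
print(total)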
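
As an alternative sketch, `sum` also accepts a tuple of axes, which avoids the reshape. The comparison below checks that both routes agree; the seed and the variable names are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
A = rng.integers(0, 10, size=(3, 4, 3, 4))

# Passing a tuple of axes reduces over both of them in one call
total_tuple = A.sum(axis=(-2, -1))

# Reshape-based version: flatten the last two axes, then sum the merged axis
total_reshape = A.reshape(A.shape[:-2] + (-1,)).sum(axis=-1)

print(total_tuple.shape)  # (3, 4)
print(np.array_equal(total_tuple, total_reshape))
```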

#### Exercise: How to get the diagonal of a dot product? (★★★)

In [None]:
#a = np.random.rand(1000,200)
#b = np.random.rand(200,200)
a = np.random.rand(3,4)
b = np.random.rand(4,3)
print("dot: \n",np.dot(a, b))
print("diagonal: \n",np.diag(np.dot(a, b)))
print("a: \n",a)
print("b: \n",b)
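
Building the full product only to keep its diagonal does more work than needed. A sketch of two cheaper alternatives follows, using the identity that entry i of the diagonal is the dot product of row i of `a` with column i of `b`. The seed and the small shapes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
a = rng.random((3, 4))
b = rng.random((4, 3))

# Reference: build the full product, then take its diagonal
diag_full = np.diag(np.dot(a, b))

# Entry i of the diagonal is row i of a times column i of b
diag_sum = np.sum(a * b.T, axis=1)

# The same computation written with einsum
diag_einsum = np.einsum("ij,ji->i", a, b)

print(np.allclose(diag_full, diag_sum), np.allclose(diag_full, diag_einsum))
```

For large matrices, such as the commented-out 1000x200 case, the last two versions never allocate the full square product.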

#### Exercise: Consider an array of dimension (5,5,3), how to multiply it by an array with dimensions (5,5)? (★★★)

In [None]:
A = np.ones((5,5,3))
B = 2*np.ones((5,5))
print("A: \n",A)
print("B: \n",B)
print("A*B: \n",A * B[:,:,None])

#### Exercise: How to swap two rows of an array? (★★★)

In [None]:
A = np.arange(25).reshape(5,5)
print("A: \n", A)
A[[0,1]] = A[[1,0]]
print("A swap: \n", A)
#A[[3,4]] = A[[4,3]]
#print("A swap: \n", A)

#### Exercise: Read an image using OpenCV and transpose it. What did you get exactly? Was the image rotated? Moved? Reflected with respect to an axis? (★★★)

In [None]:
from google.colab import drive
drive.mount('/content/drive')
import os
os.chdir('/content/drive/My Drive/Colab Notebooks/')  #change dir
!pwd
# importing cv2
import cv2
from google.colab.patches import cv2_imshow
# Path to the image, relative to the working directory set above
path = r'logo_2.png'

# Read the image once and display the original
img = cv2.imread(path)
cv2_imshow(img)

# cv2.transpose swaps rows and columns (the height and width axes)
image = cv2.transpose(img)
print(img.shape, image.shape)

# Display the transposed image
cv2_imshow(image)

#### Exercise: Consider an array Z = [1,2,3,4,5,6,7,8,9,10,11,12,13,14], how to generate an array R = [[1,2,3,4], [2,3,4,5], [3,4,5,6], ..., [11,12,13,14]]? (★★★)

In [None]:
from numpy.lib import stride_tricks
Z = np.arange(1,15,dtype="uint32")
R = stride_tricks.as_strided(Z,(11,4),(4,4))
print("Z: \n",Z)
print("R: \n",R)
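
`as_strided` needs the strides in bytes, which is why the cell above pins the dtype to `uint32` (4 bytes per element). A hedged alternative, assuming NumPy 1.20 or newer, is `sliding_window_view`, which derives the strides from the array itself:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

Z = np.arange(1, 15)

# Each row is a window of 4 consecutive elements of Z
R = sliding_window_view(Z, window_shape=4)

print(R.shape)   # (11, 4)
print(R[0], R[-1])
```

The result is a read-only view by default, so no data is copied.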

#### Exercise: How to find the most frequent value in an array? (★★★)

In [None]:
x = np.random.randint(0, 10, 40)
print("Original array:")
print(x)
print("Most frequent value in the above array:")
print(np.bincount(x).argmax())

#### Exercise: How to get the n largest values of an array (★★★)

In [None]:
import numpy as np
x = np.arange(10)
print("Original array:")
print(x)
np.random.shuffle(x)
n = 1
print (x[np.argsort(x)[-n:]])

#### Exercise: Consider a large vector Z, compute Z to the power of 3 using 3 different methods (★★★)

In [None]:
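x = np.random.rand(57)
# The timings in the comments below appear to come from a run on a much larger vector; rerun the cell to get your own
%timeit np.power(x,3)
#1 loops, best of 3: 574 ms per loop
%timeit x*x*x
#1 loops, best of 3: 429 ms per loop
%timeit np.einsum('i,i,i->i',x,x,x)
#1 loops, best of 3: 244 ms per loop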

#### Exercise: Given a two dimensional array, how to extract unique rows? (★★★)

In [None]:
Z = np.random.randint(0,2,(6,3))
T = np.ascontiguousarray(Z).view(np.dtype((np.void, Z.dtype.itemsize * Z.shape[1])))
_, idx = np.unique(T, return_index=True)
uZ = Z[idx]
print("Z: \n",Z)
#print("T: \n",T)
print("uZ: \n",uZ)
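
A shorter route to the same goal is the `axis` argument of `np.unique`. The sketch below uses a small fixed matrix instead of random values so that the result is easy to check by eye.

```python
import numpy as np

Z = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 1],
              [1, 0, 0],
              [1, 1, 1]])

# axis=0 treats every row as one item; the result is sorted row by row
unique_rows = np.unique(Z, axis=0)
print(unique_rows)
```

Note that the rows come back sorted, not in their order of first appearance.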

#### Exercise: Can you have an array of strings? Can you mix different data types in the same array? Can you operate (add, sub, mult) arrays with different data types? (★★★)

In [None]:
arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])
print(type(arr))

array_example = np.array([[[0, 1, 2, 3],[4, 5, 6, 7]],
                          [[0, 1, 2, 3],[4, 5, 6, 7]],
                          [[0, 1, 2, 3],[4, 5, 6, 7]],
                          [[0 ,1 ,2, 3],[4, 5, 6, 7]]])
print("array_example: \n", array_example)
#To find the number of dimensions of the array, run:
print("dimensions: \n",array_example.ndim)
#To find the total number of elements in the array, run:
print("total elements: \n",array_example.size)
#And to find the shape of your array, run:
print("shape: \n",array_example.shape)
array_example2 = np.array([[[0, 1, 2, 3],[4, 5, 6, 7]],
                          [[0, 1, 2, 3],[4, 5, 6, 7]],
                          [[0, 1, 2, 3],[4, 5, 6, 7]],
                          [[0 ,1 ,2, 3],[4, 5, 6, 7]]])

print(np.add(array_example,array_example2))
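#NOTE: The dtypes do not have to be equal: NumPy upcasts numeric types to a common one (int + float -> float), but numeric and string arrays cannot be added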
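
The three questions in the exercise title can be answered with a short sketch like the following. The sample values and names (`names`, `mixed`, `ints`, `floats`) are made up for illustration.

```python
import numpy as np

# Arrays of strings are allowed; they get a Unicode dtype
names = np.array(["ana", "luis", "eva"])
print(names.dtype.kind)   # 'U' means Unicode string

# Mixing numbers and strings converts every element to a string
mixed = np.array([1, "two", 3.0])
print(mixed.dtype.kind)

# Numeric arrays with different dtypes are upcast to a common type
ints = np.array([1, 2, 3], dtype=np.int32)
floats = np.array([0.5, 0.5, 0.5], dtype=np.float64)
print((ints + floats).dtype)   # float64

# Arithmetic between a numeric array and a string array is not defined
try:
    ints + names
    mixed_add_ok = True
except TypeError:
    mixed_add_ok = False
print("numbers + strings works:", mixed_add_ok)
```

So an array always has a single dtype: mixed inputs are converted to one common type when the array is created.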