**Topics**:
* Numpy:
 - Basic functions (np.mean, np.std, np.linspace, np.arange, np.random.choice, np.random.randint, np.random.rand, np.where)
 - Array concatenation
 - Operations between Arrays
* Pandas:
 - Series vs Numpy arrays
 - Series Initialization
 - Basic Operations with Series
 - Series concatenation

In [7]:
import numpy as np
import pandas as pd

# Numpy (Part II)

## Basic Functions

Numpy provides functions and submodules that cover common day-to-day tasks in scientific computing.

In [None]:
# Mean
np.mean([1,2,3,4])

2.5

In [None]:
# Standard deviation
np.std([1,2,3,4])

1.118033988749895
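
`np.std` computes the population standard deviation by default (`ddof=0`). The sketch below shows how the result changes when the sample estimate is requested instead; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

data = [1, 2, 3, 4]

# Population standard deviation: divides by N (NumPy's default, ddof=0)
population_std = np.std(data)

# Sample standard deviation: divides by N - 1
sample_std = np.std(data, ddof=1)

print(population_std, sample_std)
```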

np.arange creates an array of evenly spaced values from `start` up to, but not including, `stop`, advancing by `step`.

In [None]:
# np.arange
np.arange(start=1, stop=20, step=2)

array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19])

np.linspace generates an array of `num` evenly spaced values between `start` and `stop`, with both endpoints included.

In [None]:
# np.linspace
np.linspace(start=1, stop=10, num=50)

array([ 1.        ,  1.18367347,  1.36734694,  1.55102041,  1.73469388,
        1.91836735,  2.10204082,  2.28571429,  2.46938776,  2.65306122,
        2.83673469,  3.02040816,  3.20408163,  3.3877551 ,  3.57142857,
        3.75510204,  3.93877551,  4.12244898,  4.30612245,  4.48979592,
        4.67346939,  4.85714286,  5.04081633,  5.2244898 ,  5.40816327,
        5.59183673,  5.7755102 ,  5.95918367,  6.14285714,  6.32653061,
        6.51020408,  6.69387755,  6.87755102,  7.06122449,  7.24489796,
        7.42857143,  7.6122449 ,  7.79591837,  7.97959184,  8.16326531,
        8.34693878,  8.53061224,  8.71428571,  8.89795918,  9.08163265,
        9.26530612,  9.44897959,  9.63265306,  9.81632653, 10.        ])
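
As a rough rule, `np.arange` fits when the step is known and `np.linspace` fits when the number of points is known. A small sketch of the difference, using an interval chosen only for illustration:

```python
import numpy as np

# arange: fixed step, stop is excluded
by_step = np.arange(0, 1, 0.25)        # 0.0, 0.25, 0.5, 0.75

# linspace: fixed number of points, stop is included by default
by_count = np.linspace(0, 1, num=5)    # 0.0, 0.25, 0.5, 0.75, 1.0

# endpoint=False makes linspace exclude the stop value as well
half_open = np.linspace(0, 1, num=4, endpoint=False)

print(by_step, by_count, half_open)
```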

In [None]:
# np.where 
x = np.arange(9).reshape(3, 3)
x


array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [None]:
# With only a condition, np.where returns one index array per axis (rows, then columns)
np.where( x > 2 )

(array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2]))

In [None]:
x[np.where(x>2)]

array([3, 4, 5, 6, 7, 8])
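
Two related forms may be worth knowing. This is a minimal sketch on the same 3x3 array: a boolean mask selects the same elements without calling `np.where`, and the three-argument form of `np.where` replaces values instead of selecting them. The replacement value `-1` is an arbitrary choice for the example.

```python
import numpy as np

x = np.arange(9).reshape(3, 3)

# Boolean mask: same result as x[np.where(x > 2)]
selected = x[x > 2]

# Three-argument form: keep values where the condition holds, else use -1
replaced = np.where(x > 2, x, -1)

print(selected)
print(replaced)
```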

Numpy's random submodule can generate random numbers, sample from a sequence, and shuffle it.

In [None]:
# Random choice 
np.random.choice([1,2,3,4,5], 10)

array([5, 3, 2, 4, 2, 2, 3, 3, 5, 4])

In [None]:
# Shuffle (modifies x in place and returns None)
x = [1,2,3]
np.random.shuffle(x)

In [None]:
x

[3, 1, 2]
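
The topic list also mentions `np.random.randint` and `np.random.rand`. The sketch below shows both, plus a seeded `Generator` from `np.random.default_rng` for reproducible draws. The seed value and the sizes are arbitrary, and no output is shown because the first two results vary from run to run.

```python
import numpy as np

# Integers in the half-open interval [low, high)
ints = np.random.randint(low=0, high=10, size=5)

# Floats drawn uniformly from [0, 1), here as a 2x3 array
floats = np.random.rand(2, 3)

# A seeded Generator makes the draws reproducible across runs
rng = np.random.default_rng(seed=42)   # the seed value is arbitrary
picks = rng.choice([1, 2, 3, 4, 5], size=10)
shuffled = rng.permutation([1, 2, 3])  # returns a shuffled copy

print(ints, floats, picks, shuffled)
```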

## Array Concatenation

We can join two or more arrays vertically (along rows, `axis=0`, the default) or horizontally (along columns, `axis=1`).

In [None]:
array = np.arange(9)
array

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [None]:
array2D_1 = array.reshape((3,3))
array2D_1

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [None]:
array2D_2 = np.arange(10,19).reshape(3,3)

In [None]:
# vertical concat
np.concatenate((array2D_1, array2D_2))

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

In [None]:
# horizontal concat
np.concatenate((array2D_1,array2D_2),axis=1)

array([[ 0,  1,  2, 10, 11, 12],
       [ 3,  4,  5, 13, 14, 15],
       [ 6,  7,  8, 16, 17, 18]])

In [None]:
# concat 2 or more arrays
np.concatenate((array2D_1, array2D_2, array2D_1, array2D_1))

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [10, 11, 12],
       [13, 14, 15],
       [16, 17, 18],
       [ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8]])

In [None]:
array2D_1

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [None]:
# The arrays have 3 and 2 columns, so vertical concatenation fails
np.concatenate((array2D_1, np.arange(4).reshape(2,2)))

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 2
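
The error above arises because every axis other than the concatenation axis must have the same size. A minimal sketch of the rule follows; the arrays `a`, `b`, and `c` are made up for the example.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = np.arange(4).reshape(2, 2)

try:
    np.concatenate((a, b))          # 3 columns vs 2 columns: not allowed
except ValueError as err:
    print("concatenate failed:", err)

# Vertical concatenation only needs the column counts to match
c = np.arange(6).reshape(2, 3)
stacked = np.concatenate((a, c))    # shape (5, 3)

# vstack / hstack are shorthands for axis=0 / axis=1 on 2-D arrays
assert np.array_equal(np.vstack((a, c)), stacked)
side_by_side = np.hstack((a, a))    # shape (3, 6)
print(stacked.shape, side_by_side.shape)
```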

## Math operations with Arrays

In [None]:
array1 = np.arange(9)
array2 = np.arange(9)

In [None]:
# Sum
array1 + array2

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16])

In [None]:
# Subtraction
array1 - array2

array([0, 0, 0, 0, 0, 0, 0, 0, 0])

In [None]:
# Division (element 0 is 0/0, which yields nan and a RuntimeWarning)
array1/array2

array([nan,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
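
If the `nan` and the warning are unwanted, one option is to divide only where the denominator is non-zero. This is a sketch using the `out` and `where` arguments of `np.divide`; filling the skipped slot with `0.0` is an assumption made for the example, not something the original notebook prescribes.

```python
import numpy as np

array1 = np.arange(9)
array2 = np.arange(9)

# Divide only where the denominator is non-zero; other slots keep the
# value already stored in `out` (0.0 here, an arbitrary choice)
safe = np.divide(
    array1,
    array2,
    out=np.zeros(array1.shape, dtype=float),
    where=array2 != 0,
)

print(safe)
```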

In [None]:
# Multiplication
array1*array2

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64])


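What happens if we try the same operations with plain Python lists? Lists support `+`, but it concatenates; the other operators are not defined between two lists.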

In [None]:
[1,2,3] + [4, 5, 6]

[1, 2, 3, 4, 5, 6]

In [None]:
[1,2,3]*[4, 5, 6]

TypeError: can't multiply sequence by non-int of type 'list'

In [None]:
[1,2,3]/[4, 5, 6]

TypeError: unsupported operand type(s) for /: 'list' and 'list'

In [None]:
[1,2,3]-[4, 5, 6]

TypeError: unsupported operand type(s) for -: 'list' and 'list'
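
To get element-wise results from lists, the work has to be written out explicitly or the lists converted to arrays. A short sketch with illustrative variable names:

```python
import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]

# Element-wise product of two lists requires an explicit loop
products = [x * y for x, y in zip(a, b)]

# Converting to arrays gives the same result with plain operators
array_products = np.array(a) * np.array(b)

# A list times an int repeats the list rather than scaling it
repeated = a * 2

print(products, array_products, repeated)
```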

# Pandas (Part I)

The Series is Pandas' base object, just as the array is numpy's.

* Three most important structures:
    + `Series`
    + `DataFrame`
    + `Index`

## Series vs Numpy

* Preview
* Index

In [None]:
# Visualization
# Numpy
np.array([1,2,3])

array([1, 2, 3])

In [None]:
pd.Series([1,2,3])

0    1
1    2
2    3
dtype: int64


A Numpy array is displayed as a flat row of values, whereas a Series is displayed as a column with its index labels on the left.

* **Index**


Numpy arrays are indexed only by integer position (0, 1, 2, 3). In Pandas, the index labels can be numbers or strings.

In [None]:
series_ex = pd.Series([1,2,3,4], index=['num_1', 'num_2', 'num_3', 'num_4'])

In [None]:
series_ex

num_1    1
num_2    2
num_3    3
num_4    4
dtype: int64

We can access the data through the index, by position or by label, as we did with lists and arrays.


In [None]:
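# Access by position (plain series_ex[0] is deprecated in recent pandas when the index holds labels)
series_ex.iloc[0]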

1

In [None]:
series_ex['num_1']

1

In [None]:
series_ex['num_1':'num_3']

num_1    1
num_2    2
num_3    3
dtype: int64

In [None]:
series_ex[0:3]

num_1    1
num_2    2
num_3    3
dtype: int64
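
Both slices above return three elements, but for different reasons: a label slice includes its end label, while a positional slice excludes its end position. The explicit `.loc` and `.iloc` accessors make the intent clear. A minimal sketch, with a Series `s` rebuilt here only so the block runs on its own:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=['num_1', 'num_2', 'num_3', 'num_4'])

first_by_position = s.iloc[0]          # position 0
first_by_label = s.loc['num_1']        # label 'num_1'

# Label slices include the end label; positional slices exclude the end
by_label = s.loc['num_1':'num_3']      # 3 elements
by_position = s.iloc[0:3]              # 3 elements (positions 0, 1, 2)

print(by_label.equals(by_position))
```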

We can access its values



In [None]:
series_ex.values

array([1, 2, 3, 4])

We can access its index

In [None]:
series_ex.index

Index(['num_1', 'num_2', 'num_3', 'num_4'], dtype='object')

## Series Initialization

* We can build `Series` from scratch. The general way of doing this is as follows:

```python
pd.Series(data, index = index)
```
* `index` is an optional argument, and `data` can be a scalar, a list, a Numpy array, or a dictionary

In [None]:
# Scalar: the value is repeated for every index label
pd.Series(5, index = [1, 2, 6]) 

1    5
2    5
6    5
dtype: int64

In [None]:
# Dictionary: the keys become the index
pd.Series({2 : 'a', 1 : 'b', 3 : 'c'}) 

2    a
1    b
3    c
dtype: object
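
Two more initialization cases, sketched with made-up values: building a Series from a Numpy array, and combining a dictionary with an explicit `index`, which selects and orders the entries by label.

```python
import numpy as np
import pandas as pd

# From a NumPy array, with explicit labels
from_array = pd.Series(np.array([10, 20, 30]), index=['a', 'b', 'c'])

# From a dictionary: keys become the index
from_dict = pd.Series({'a': 10, 'b': 20, 'c': 30})

# Passing `index` with a dictionary selects and orders by those labels;
# labels missing from the dictionary get NaN
subset = pd.Series({'a': 10, 'b': 20, 'c': 30}, index=['c', 'a', 'z'])

print(subset)
```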

## Basic Operations with Series

In [None]:
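# Index labels are in Portuguese: carne = meat, carneiro = ram, ovelha = sheep, ovo = egg
series1 = pd.Series([1,2,3], index=['carne', 'carneiro', 'ovelha'])
series2 = pd.Series([1,2,3], index=['ovo', 'carneiro', 'ovelha'])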

In [None]:
# Sum
series1 + series2

carne       NaN
carneiro    4.0
ovelha      6.0
ovo         NaN
dtype: float64

In [None]:
# Division 
series1/series2

carne       NaN
carneiro    1.0
ovelha      1.0
ovo         NaN
dtype: float64

In [None]:
# Subtraction
series1 - series2

carne       NaN
carneiro    0.0
ovelha      0.0
ovo         NaN
dtype: float64

In [None]:
# Multiplication
series1*series2

carne       NaN
carneiro    4.0
ovelha      9.0
ovo         NaN
dtype: float64

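What is NaN?

All operations are aligned on the index: values are matched by label, and a label that appears in only one of the two series produces NaN (Not a Number, the marker Pandas uses for a missing value).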
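
When a missing label should count as a known default rather than produce NaN, the method form of the operator accepts `fill_value`. A sketch with two small series invented for the example:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Operator form: labels found in only one series give NaN
plain = s1 + s2

# Method form: treat a missing label as 0 before adding
filled = s1.add(s2, fill_value=0)

print(plain)
print(filled)
```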

In [None]:
# Filters (series1 here already contains the 'digital' and 'cabra' entries displayed below)
series1 > 3

carne       False
carneiro    False
ovelha      False
digital      True
cabra        True
dtype: bool

A filter compares the series element by element against the condition and returns a boolean series.

In [None]:
# To get the values that satisfy the condition
series1[series1 > 3]

digital    23
cabra      10
dtype: int64

In [None]:
series1

carne        1
carneiro     2
ovelha       3
digital     23
cabra       10
dtype: int64

In [None]:
# To negate the condition
series1[~(series1>2)]

carne       1
carneiro    2
dtype: int64
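
Conditions can also be combined. The sketch below assumes a Series with the same values as `series1` but generic labels; note that Python's `and`/`or` do not work here, and each condition needs its own parentheses.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 23, 10], index=['a', 'b', 'c', 'd', 'e'])

# Use & (and), | (or), ~ (not); each condition needs its own parentheses
between = s[(s > 2) & (s < 20)]        # values 3 and 10
extremes = s[(s < 2) | (s > 20)]       # values 1 and 23

print(between.tolist(), extremes.tolist())
```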

What do we gain compared with dictionaries?

In [None]:
series1.sum()

29

In [None]:
series1.std()

10.53169818531972

In [None]:
series1.max()

23

In [None]:
series1.min()

1
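
For comparison, here is a rough sketch of the same kind of aggregate computed on a plain dictionary and on a Series. The dictionary `stock` and its keys are hypothetical; the values mirror the ones summed above.

```python
import pandas as pd

# Values chosen to mirror the example; the names are illustrative
stock = {'a': 1, 'b': 2, 'c': 3, 'd': 23}
series = pd.Series(stock)

# With a dictionary, aggregates are computed by hand
dict_total = sum(stock.values())
dict_argmax = max(stock, key=stock.get)

# A Series exposes them as methods and supports vectorized arithmetic
series_total = series.sum()
series_argmax = series.idxmax()
doubled = series * 2                   # not possible with a dict directly

print(dict_total, series_total, dict_argmax, series_argmax)
```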

What if we want to add a new element under a new index label?

In [None]:
series1['cabra'] = 10

In [None]:
series1

carne        1
carneiro     2
ovelha       3
digital     23
cabra       10
dtype: int64

In [None]:
series2

ovo         1
carneiro    2
ovelha      3
dtype: int64

In [None]:
# value counts
series1.value_counts()

23    1
10    1
3     1
2     1
1     1
dtype: int64

We can check whether each value of a series exists in another series or list.

In [None]:
# isin: compares values, not index labels
series1.isin(series2)

carne        True
carneiro     True
ovelha       True
digital     False
cabra       False
dtype: bool
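
`isin` also accepts a plain list, and the resulting boolean series can be used directly as a filter. A small sketch with illustrative labels and lookup values:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 23, 10], index=['a', 'b', 'c', 'd', 'e'])

# isin compares values, not index labels, and accepts any list-like
mask = s.isin([2, 10, 99])

matches = s[mask]                      # rows whose value is in the list
print(matches)
```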

In [None]:
# operator
series1

carne        1
carneiro     2
ovelha       3
digital     23
cabra       10
dtype: int64

In [None]:
series2

ovo         1
carneiro    2
ovelha      3
dtype: int64

## Series Concatenation

In [None]:
series1 

carne        1
carneiro     2
ovelha       3
digital     23
cabra       10
dtype: int64

In [None]:
series2

ovo         1
carneiro    2
ovelha      3
dtype: int64

In [None]:
pd.concat([series1,series2], axis=1)

             0    1
carne      1.0  NaN
carneiro   2.0  2.0
ovelha     3.0  3.0
digital   23.0  NaN
cabra     10.0  NaN
ovo        NaN  1.0


In [None]:
pd.concat([series1,series2], axis=0)

carne        1
carneiro     2
ovelha       3
digital     23
cabra       10
ovo          1
carneiro     2
ovelha       3
dtype: int64
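
With `axis=0` the original labels are kept, so the result above contains `carneiro` and `ovelha` twice. The sketch below, on two small made-up series, shows how to detect duplicated labels and how `ignore_index=True` replaces them with a fresh integer index.

```python
import pandas as pd

s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([3, 4], index=['b', 'c'])

# axis=0 keeps the original labels, so 'b' appears twice
stacked = pd.concat([s1, s2])
has_duplicates = stacked.index.duplicated().any()

# ignore_index=True replaces the labels with 0..n-1
renumbered = pd.concat([s1, s2], ignore_index=True)

print(stacked['b'].tolist(), renumbered.index.tolist())
```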

# Exercise:

In [None]:
import pandas as pd

In [None]:
ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

Return, as two series, the values that are in ser1 but not in ser2, and vice versa.

In [None]:
ser1[~(ser1.isin(ser2))]

0    1
1    2
2    3
dtype: int64
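
The cell above covers only the first half of the exercise. One possible way to obtain both series, applying the same `isin` pattern in each direction (the result names are illustrative):

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

only_in_ser1 = ser1[~ser1.isin(ser2)]  # values 1, 2, 3
only_in_ser2 = ser2[~ser2.isin(ser1)]  # values 6, 7, 8

print(only_in_ser1.tolist(), only_in_ser2.tolist())
```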