<a href="https://colab.research.google.com/github/tarsojabbes/basics-data-analysis/blob/main/IntroductionToPandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Introduction to Pandas**

In [1]:
import pandas as pd

## **Pandas Series**

A Series holds a one-dimensional array of values, and its index determines how we access those values.

In [23]:
example_1 = pd.Series([2,3,4,5], index=['a', 'b', 'c', 'd'])

We can access the values and the index of a Series with .values and .index

In [24]:
example_1.values

array([2, 3, 4, 5])

In [25]:
example_1.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [26]:
# Basic usage to access the value by the index
example_1['a']

2

Slicing also works here. When we slice by label, the end label is included, so 'a':'c' returns 'c' as well.

In [27]:
example_1['a':'c']

a    2
b    3
c    4
dtype: int64

We can also create a Series by passing a Python dictionary to pd.Series(). The dictionary keys become the index.

In [28]:
grades_dict = {'A': 4, 'A-': 3.5, 'B': 3, 'B-': 2.5, 'C': 2}
grades = pd.Series(grades_dict)

In [29]:
marks_dict = {'A': 85, 'A-': 75, 'B': 65, 'B-': 55, 'C': 45}
marks = pd.Series(marks_dict)

Although the index holds explicit labels such as A, A- and B, we can also slice by integer position.

In [30]:
marks[0:2] # Positional slice: same result as marks['A':'A-'] (a label slice includes its end label)

A     85
A-    75
dtype: int64
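The difference between the two slicing styles is easy to miss, so here is a minimal sketch that rebuilds the marks Series and compares them. Using .iloc and .loc is an addition here rather than part of the original cells; it simply makes the positional versus label-based intent explicit.

```python
import pandas as pd

marks = pd.Series({'A': 85, 'A-': 75, 'B': 65, 'B-': 55, 'C': 45})

# Positional slice: the end position (2) is excluded, like a Python list.
by_position = marks.iloc[0:2]

# Label slice: the end label ('B') is included.
by_label = marks.loc['A':'B']

print(list(by_position.index))  # ['A', 'A-']
print(list(by_label.index))     # ['A', 'A-', 'B']
```

Spelling out .iloc or .loc is usually the safer habit, because the reader does not have to guess which kind of index a bare slice refers to.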

## **Pandas DataFrame**

In [33]:
example_dataframe = pd.DataFrame({'Marks':marks, 'Grades': grades})

Because marks and grades share the same index, the DataFrame lines them up as two columns of one table, with one row per index label.

In [36]:
example_dataframe

    Marks  Grades
A      85     4.0
A-     75     3.5
B      65     3.0
B-     55     2.5
C      45     2.0
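To see why a shared index matters, the sketch below builds a DataFrame from two Series whose indexes only partly overlap. The data is hypothetical (the 'D' grade and the shortened marks are made up for illustration); the point is that pandas aligns rows by label and fills the gaps with NaN.

```python
import pandas as pd

# Hypothetical data: 'C' has no grade and 'D' has no mark.
marks = pd.Series({'A': 85, 'B': 65, 'C': 45})
grades = pd.Series({'A': 4.0, 'B': 3.0, 'D': 1.0})

# Rows are matched by index label, not by position.
aligned = pd.DataFrame({'Marks': marks, 'Grades': grades})
print(aligned)

# Labels present in only one Series get NaN in the other column.
print(aligned.isna())
```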


In [43]:
example_dataframe.T # This will transpose our table

           A    A-     B    B-     C
Marks   85.0  75.0  65.0  55.0  45.0
Grades   4.0   3.5   3.0   2.5   2.0


So a DataFrame is essentially a 2D array with labelled rows and columns, and we can access its values and index with the same attributes we used for a Series. We can access the columns too.

In [44]:
example_dataframe.columns

Index(['Marks', 'Grades'], dtype='object')
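As a small illustration of the 2D view, the sketch below rebuilds a two-row version of the table (a shortened stand-in, not the original example_dataframe) and inspects its values, index and shape.

```python
import pandas as pd

# Shortened stand-in for the DataFrame used above.
df = pd.DataFrame({'Marks': [85, 75], 'Grades': [4.0, 3.5]}, index=['A', 'A-'])

print(df.values)   # the underlying data as a 2D NumPy array
print(df.index)    # row labels
print(df.shape)    # (rows, columns)

# Selecting a single column gives back a Series with the same index.
marks_column = df['Marks']
print(type(marks_column))
```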

Adding a new column to a DataFrame

In [46]:
example_dataframe['Scaled Marks'] = 100*(example_dataframe['Marks']/90)

In [47]:
example_dataframe

    Marks  Grades  Scaled Marks
A      85     4.0     94.444444
A-     75     3.5     83.333333
B      65     3.0     72.222222
B-     55     2.5     61.111111
C      45     2.0     50.000000
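Assigning to a new column name modifies the DataFrame in place. If you would rather keep the original untouched, one option is DataFrame.assign(), which returns a new DataFrame. The sketch below uses a shortened stand-in table; the variable names are illustrative only.

```python
import pandas as pd

# Shortened stand-in for the DataFrame used above.
df = pd.DataFrame({'Marks': [85, 75, 65]}, index=['A', 'A-', 'B'])

# Arithmetic on a column is applied element by element.
scaled = 100 * df['Marks'] / 90

# assign() returns a new DataFrame; df itself is left unchanged.
# The ** form is needed because the column name contains a space.
with_scaled = df.assign(**{'Scaled Marks': scaled})

print(with_scaled)
print(list(df.columns))  # ['Marks']
```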


Conditional (boolean) filtering in a DataFrame

In [48]:
copy = example_dataframe[example_dataframe['Marks'] > 60]

In [49]:
copy

    Marks  Grades  Scaled Marks
A      85     4.0     94.444444
A-     75     3.5     83.333333
B      65     3.0     72.222222
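The filter above works because the comparison produces a boolean Series, which is then used as a row mask. The sketch below rebuilds the table and shows the mask on its own, then combines two conditions. The second condition is a made-up example, not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame(
    {'Marks': [85, 75, 65, 55, 45], 'Grades': [4.0, 3.5, 3.0, 2.5, 2.0]},
    index=['A', 'A-', 'B', 'B-', 'C'],
)

# The comparison returns one True/False per row.
mask = df['Marks'] > 60
print(mask)

# Combine conditions with & (and) or | (or); each one needs its own parentheses.
middle = df[(df['Marks'] > 50) & (df['Grades'] < 3.5)]
print(middle)
```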


## **Pandas Missing Values**

In [53]:
missing = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

There is no 'a' in the second dict and no 'c' in the first, so those cells are filled with NaN (missing values).

In [54]:
missing

     a  b    c
0  1.0  2  NaN
1  NaN  3  4.0


Using .fillna(value) returns a new DataFrame in which every NaN is replaced by the specified value; missing itself is unchanged.

In [55]:
missing.fillna(0)

     a  b    c
0  1.0  2  0.0
1  0.0  3  4.0
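A few related calls are often useful alongside fillna. The sketch below counts the missing values per column, fills each column with a different value, and drops the columns that contain any NaN. The fill values are arbitrary choices for illustration.

```python
import pandas as pd

missing = pd.DataFrame([{'a': 1, 'b': 2}, {'b': 3, 'c': 4}])

# Count the missing values in each column.
print(missing.isna().sum())

# A dict fills each column with its own value (0 and -1 are arbitrary here).
filled = missing.fillna({'a': 0, 'c': -1})
print(filled)

# Drop every column that contains at least one NaN.
dropped = missing.dropna(axis='columns')
print(dropped)

# None of these calls changed the original DataFrame.
print(missing.isna().sum().sum())  # 2
```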


## **Implicit and Explicit Indexes with loc and iloc**

In [56]:
data = pd.Series(['a', 'b','c'], index=[1,3,5])


In [57]:
data.loc[1] # loc uses the explicit index: returns the value whose label is 1

'a'

In [58]:
data.iloc[1] # iloc uses the implicit (positional) index, like a normal array: position 1

'b'

In [59]:
data.loc[1:3] # explicit index slicing: labels 1 through 3, end label included

1    a
3    b
dtype: object

In [61]:
data.iloc[1:3] # implicit index slicing: positions 1 and 2, end position excluded

3    b
5    c
dtype: object
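The same two accessors work on a DataFrame, where they take a row selector followed by a column selector. The sketch below uses a shortened stand-in for the marks table to show both forms side by side.

```python
import pandas as pd

# Shortened stand-in for the marks table used earlier.
df = pd.DataFrame(
    {'Marks': [85, 75, 65], 'Grades': [4.0, 3.5, 3.0]},
    index=['A', 'A-', 'B'],
)

# loc: row label, then column label.
by_label = df.loc['A-', 'Marks']

# iloc: row position, then column position.
by_position = df.iloc[1, 0]

print(by_label, by_position)  # 75 75

# Label slices include the end label; positional slices exclude the end position.
rows_by_label = df.loc['A':'A-']   # rows 'A' and 'A-'
rows_by_position = df.iloc[0:1]    # row 'A' only
print(len(rows_by_label), len(rows_by_position))  # 2 1
```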