## **Pandas Fundamentals**

## DataFrame & Series
1. **DataFrame**
    - Two-dimensional, tabular data structure.
    - Made of rows and columns; each column can hold a different data type.
    - Container that stores data under labeled row and column indexes.
2. **Series**
    - One-dimensional labeled array consisting of data values with an associated index.
    - Holds the values of a single column, i.e. all the rows of that column.
3. **Integer Location [iloc]**
    - Integer-based indexing, used for selection by the position.
    - Allows the selection of data using integer indices of rows and columns.
    - Slicing excludes the end of the range.
4. **Label Location [loc]**
    - Label-based indexing, used for selection by the labels.
    - Allows the selection of data using labels of rows and columns.
    - Slicing includes the end of the range.
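
The difference in end-point handling between `iloc` and `loc` can be sketched as follows. The small `df` frame and its column names are made up for illustration and are not part of the notebook's data.

```python
import pandas as pd

# Hypothetical frame, for illustration only
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [10, 20, 30, 40]})

# A single column is a Series; the whole table is a DataFrame
col = df["score"]

# iloc slices by position and excludes the end: rows at positions 0 and 1
by_position = df.iloc[0:2]

# loc slices by label and includes the end: rows labeled 0, 1 and 2
by_label = df.loc[0:2]

print(type(df).__name__, type(col).__name__)
print(len(by_position), len(by_label))
```

With the default integer index the labels and positions look identical, which is why the two slices are easy to confuse; only the end-point handling differs here.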

### 1. Importing the packages

In [13]:
import pandas as pd 

### 2. Creating the dictionary and DataFrame

In [14]:
people = {
    'first_name': ['Magic', 'Old', 'Jack', 'Johnnie'],
    'last_name': ['Moments', 'Monk', 'Daniels', 'Walker'],
    'email_ID': ['mmoments@gmail.com', 'oldmonk@yahoo.com', 'jackdaniel@jack.com', 'Miki@tyson.com'],
    'contact_no': [9823983242, 4327847238, 6739297423, 7889732131]
}

people_df = pd.DataFrame(people)
people_df

| | first_name | last_name | email_ID | contact_no |
|---|---|---|---|---|
| 0 | Magic | Moments | mmoments@gmail.com | 9823983242 |
| 1 | Old | Monk | oldmonk@yahoo.com | 4327847238 |
| 2 | Jack | Daniels | jackdaniel@jack.com | 6739297423 |
| 3 | Johnnie | Walker | Miki@tyson.com | 7889732131 |


### 3. Accessing the dictionary
1. Definition
- A data type made of key-value pairs; the values can also be heterogeneous.
- Values are accessed only by key.
- Values can't be accessed using dot notation.
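
A minimal sketch of these three points, using a hypothetical `record` dictionary that is not part of the notebook's data:

```python
# Hypothetical dictionary whose values have different types
record = {"name": "Jack", "age": 30, "tags": ["a", "b"]}

# Key access works
name = record["name"]

# Dot notation does not: a dict exposes no attributes named after its keys
try:
    record.name
    dot_access_worked = True
except AttributeError:
    dot_access_worked = False

print(name, dot_access_worked)
```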

In [15]:
#---><Dictionary><---
people['first_name']
people['last_name']
people['email_ID']
people['contact_no']

[9823983242, 4327847238, 6739297423, 7889732131]

### 4. Accessing the dataframe
- A DataFrame is a collection of Series.
- Columns can be accessed by both key (bracket) and dot notation.


In [16]:
#Accessing the dataframe by key (bracket notation)--- Best way of accessing the df----
people_df['email_ID']
people_df['first_name']
people_df['last_name']
people_df['contact_no']


0    9823983242
1    4327847238
2    6739297423
3    7889732131
Name: contact_no, dtype: int64

In [17]:
#Accessing the dataframe using dot notation (not recommended)
#A column name can clash with an existing DataFrame method or attribute, in which case dot notation returns that instead of the column
people_df.email_ID
people_df.first_name
people_df.last_name
people_df.contact_no

0    9823983242
1    4327847238
2    6739297423
3    7889732131
Name: contact_no, dtype: int64
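
The name clash mentioned in the comment above can be sketched like this. The frame and its column names (`count`, `first name`) are hypothetical and chosen only to trigger the problem.

```python
import pandas as pd

# Hypothetical frame: "count" clashes with DataFrame.count,
# and "first name" contains a space
df = pd.DataFrame({"count": [1, 2, 3], "first name": ["x", "y", "z"]})

by_key = df["count"]   # the column, as a Series
by_dot = df.count      # the DataFrame.count method, not the column

print(type(by_key).__name__)
print(callable(by_dot))

# A column name with a space can only be reached with bracket notation
spaced = df["first name"]
```

Bracket notation behaves the same for every column name, which is the main reason to prefer it.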

### 5. Accessing Multiple Columns
- Pass a list of column names inside the bracket notation on the dataframe.
- A Series can't be selected by multiple columns.
    - **Reason:** It is a one-dimensional array: the values of all the rows of a single column.
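
As a rough illustration, selecting one column gives a Series while selecting a list of columns gives a DataFrame. The frame below is a two-row subset of `people_df` with shortened, made-up contact numbers.

```python
import pandas as pd

# Two-row subset for illustration; contact numbers are placeholders
df = pd.DataFrame({
    "first_name": ["Magic", "Old"],
    "last_name": ["Moments", "Monk"],
    "contact_no": [1, 2],
})

one = df["first_name"]                    # single name -> Series
many = df[["first_name", "last_name"]]    # list of names -> DataFrame

# Without the inner brackets pandas looks for one column whose
# name is the tuple ('first_name', 'last_name') and fails
try:
    df["first_name", "last_name"]
    raised = False
except KeyError:
    raised = True

print(type(one).__name__, type(many).__name__, raised)
```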

In [18]:
#Listing the columns
people_df.columns

people_df[['first_name', 'last_name', 'contact_no']]

#Raises a KeyError if you execute the line below (column names without the inner brackets)
#people_df['first_name', 'last_name']

| | first_name | last_name | contact_no |
|---|---|---|---|
| 0 | Magic | Moments | 9823983242 |
| 1 | Old | Monk | 4327847238 |
| 2 | Jack | Daniels | 6739297423 |
| 3 | Johnnie | Walker | 7889732131 |


### 6. Integer Location [iloc] operations

In [19]:
#accessing the rows using index ---><list all the elements in the 0th index><---
people_df.iloc[0]

#Accessing a specific value in a row ---><the value at the 2nd index row and 2nd index column [email_ID]><---
people_df.iloc[2,2]

#Accessing a row using iloc as a dataframe ---><returns the 0th index row as a dataframe><---
people_df.iloc[[0]]

#Accessing multiple rows using iloc ---><results the 0th, 1st and 2nd index row values in a dataframe><---
people_df.iloc[[0,1,2]]

#Access multiple rows with specific columns using iloc ---><returns the 0th and 2nd row values with the 0th and 3rd index columns><---
people_df.iloc[[0,2], [0,3]]

#Specific rows (1st index) with selective columns(0th, 2nd and 3rd index)
people_df.iloc[[1],[0,2,3]]


| | first_name | email_ID | contact_no |
|---|---|---|---|
| 1 | Old | oldmonk@yahoo.com | 4327847238 |
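
The cell above only displays its last result, so the different return types are easy to miss. The sketch below makes them explicit on a small hypothetical frame (the email addresses are placeholders).

```python
import pandas as pd

# Hypothetical three-row frame with placeholder emails
df = pd.DataFrame({
    "first_name": ["Magic", "Old", "Jack"],
    "email_ID": ["m@x.com", "o@x.com", "j@x.com"],
})

row_as_series = df.iloc[0]      # integer -> one row as a Series
row_as_frame = df.iloc[[0]]     # list of integers -> DataFrame
single_value = df.iloc[2, 1]    # row and column integers -> one scalar

print(type(row_as_series).__name__)
print(type(row_as_frame).__name__, row_as_frame.shape)
print(single_value)
```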


### 6A. Slicing Operation using iloc

In [20]:
#To print the entire dataframe
people_df.iloc[:]

#All rows of first two columns of the dataframe
people_df.iloc[:,:2]

#All columns with first two rows of the dataframe
people_df.iloc[:2,:]

#Specific rows (0 and 2nd index) with their first two columns
people_df.iloc[[0,2],:2]

#Specific columns (1st and 3rd index) with all the rows
people_df.iloc[:,[1,3]]



| | last_name | contact_no |
|---|---|---|
| 0 | Moments | 9823983242 |
| 1 | Monk | 4327847238 |
| 2 | Daniels | 6739297423 |
| 3 | Walker | 7889732131 |


### 7. Label Location [loc] operations

In [21]:
#Specific column with all the rows
people_df.loc[:, ['email_ID']]

#Specific rows with all columns
people_df.loc[[3], :]

#Multiple rows with all columns
people_df.loc[[0,2,3], :]

#Multiple columns with all rows
people_df.loc[:, ['first_name', 'email_ID', 'last_name']]

| | first_name | email_ID | last_name |
|---|---|---|---|
| 0 | Magic | mmoments@gmail.com | Moments |
| 1 | Old | oldmonk@yahoo.com | Monk |
| 2 | Jack | jackdaniel@jack.com | Daniels |
| 3 | Johnnie | Miki@tyson.com | Walker |


### 7A. Slicing Operations using loc

In [22]:
#Entire dataframe using slicing
people_df.loc[:, :]

#All columns with specific range of rows--->< Here end value is included (0:2) - (0,1,2) ><---
people_df.loc[0:2, :]

#All rows with specific range of columns
people_df.loc[:, 'first_name':'contact_no']

#Specific columns and specific rows
people_df.loc[[1,0,3], ['first_name','last_name', 'contact_no']]

#Range of rows and range of columns
people_df.loc[1:3, 'last_name':'contact_no']

| | last_name | email_ID | contact_no |
|---|---|---|---|
| 1 | Monk | oldmonk@yahoo.com | 4327847238 |
| 2 | Daniels | jackdaniel@jack.com | 6739297423 |
| 3 | Walker | Miki@tyson.com | 7889732131 |
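
Because `people_df` uses the default integer index, its row labels happen to match row positions. A sketch with string row labels shows more clearly that `loc` works on labels; using the first names as the index is an assumption made only for this example.

```python
import pandas as pd

# Same last names as people_df, but indexed by first name (illustrative choice)
df = pd.DataFrame(
    {"last_name": ["Moments", "Monk", "Daniels", "Walker"]},
    index=["Magic", "Old", "Jack", "Johnnie"],
)

# Label slice: both end labels are included
by_label = df.loc["Old":"Jack"]

# Position slice: the end position is excluded
by_position = df.iloc[1:2]

# An integer is not a valid label for this index, so loc raises KeyError
try:
    df.loc[0]
    integer_label_found = True
except KeyError:
    integer_label_found = False

print(list(by_label.index), list(by_position.index), integer_label_found)
```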


### 8. Reading a csv file

In [23]:
#Reading a csv of athletes
file_path = './data/Athletes.csv'

dataframe = pd.read_csv(file_path, encoding='latin1')
dataframe.head()

| | PersonName | Country | Discipline |
|---|---|---|---|
| 0 | AALERUD Katrine | Norway | Cycling Road |
| 1 | ABAD Nestor | Spain | Artistic Gymnastics |
| 2 | ABAGNALE Giovanni | Italy | Rowing |
| 3 | ABALDE Alberto | Spain | Basketball |
| 4 | ABALDE Tamara | Spain | Basketball |
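
The `encoding='latin1'` argument matters when the file contains accented characters stored as Latin-1 bytes, since `read_csv` otherwise assumes UTF-8. The sketch below uses an in-memory buffer instead of a file on disk; the CSV content, including the name, is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content, encoded as Latin-1 bytes
raw = "PersonName,Country\nPÉREZ Ana,Spain\n".encode("latin1")

# Passing the matching encoding decodes the accented character correctly
decoded = pd.read_csv(io.BytesIO(raw), encoding="latin1")

print(decoded["PersonName"][0])
```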


### 9. Pandas Functions

In [24]:
#Reading first 5 rows
dataframe.head()

#Reading specific number of rows-><(10)><-from the start
dataframe.head(10)

#Reading last 5 rows
dataframe.tail()

#Reading specific number of rows-><(8)><-from the last
dataframe.tail(8)

#Figure out the size of dataframe---><(rows, columns)><---
dataframe.shape

#Basic information about the dataframe
dataframe.info()

#Summary statistics of the dataframe
dataframe.describe()

#All column names in the dataframe
dataframe.columns

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11085 entries, 0 to 11084
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   PersonName  11085 non-null  object
 1   Country     11085 non-null  object
 2   Discipline  11085 non-null  object
dtypes: object(3)
memory usage: 259.9+ KB


Index(['PersonName', 'Country', 'Discipline'], dtype='object')
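
Since the cell above only displays its final results, here is a small sketch that captures each call separately. It rebuilds the five rows shown by `dataframe.head()` earlier rather than reading the CSV file; the variable name `athletes` is an assumption.

```python
import pandas as pd

# The five rows shown by dataframe.head() above, rebuilt by hand
athletes = pd.DataFrame({
    "PersonName": ["AALERUD Katrine", "ABAD Nestor", "ABAGNALE Giovanni",
                   "ABALDE Alberto", "ABALDE Tamara"],
    "Country": ["Norway", "Spain", "Italy", "Spain", "Spain"],
    "Discipline": ["Cycling Road", "Artistic Gymnastics", "Rowing",
                   "Basketball", "Basketball"],
})

n_rows, n_cols = athletes.shape   # shape is an attribute, not a method
first_two = athletes.head(2)      # first 2 rows
last_one = athletes.tail(1)       # last row

# For text columns, describe() reports count, unique, top and freq
summary = athletes.describe()

print(n_rows, n_cols)
print(summary.loc["top", "Country"], summary.loc["freq", "Country"])
```

Note that `shape` and `columns` are attributes and take no parentheses, while `head`, `tail`, `info` and `describe` are methods.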