# Introduction to Pandas
---

**Objective:**

In this exercise, you will cover the basics of the Pandas library, which provides high-performance, easy-to-use data structures for data analysis in Python.
The name is commonly expanded as “Python Data Analysis Library”. Pandas is one of the most widely used tools for data munging and wrangling in Python, if not the most used one. It is open source and free to use under a BSD license.

The source material can be found [here](https://towardsdatascience.com/a-quick-introduction-to-the-pandas-python-library-f1b678f34673) and in the official [10 minutes to pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html) guide.

In [None]:
#importing Pandas library
#pandas works with numpy so we usually import both libraries
import pandas as pd
import numpy as np

## Object Creation

A Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.).

Creating a Series by passing a list of values, letting pandas create a default integer index:

In [None]:
s = pd.Series([1, 3, 5, np.nan, 6, 8]) # np.nan (Not a Number) marks a missing value; it is not the same as infinity
s

A DataFrame is a two-dimensional labeled data structure with columns of potentially different types. You can think of it as a spreadsheet, a SQL table, or a dict of Series objects.
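
The "dict of Series" view can be made concrete with a small sketch. The names `ages`, `cities`, and `people`, and the values in them, are made up for illustration only:

```python
import pandas as pd

# Two Series that share the same index labels (hypothetical example data)
ages = pd.Series([25, 32, 41], index=["ann", "bob", "cy"])
cities = pd.Series(["Oslo", "Lima", "Pune"], index=["ann", "bob", "cy"])

# Each dict key becomes a column name; rows are aligned on the shared index
people = pd.DataFrame({"age": ages, "city": cities})

print(people)
print(people.dtypes)  # one dtype per column
```

Each column keeps its own dtype, which is why a single DataFrame can mix numbers and text.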

Creating a datetime index with date_range, which can then serve as the index of a DataFrame:


In [None]:
dates = pd.date_range('20130101', periods=6)
dates
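
One way to turn such a range of dates into the index of a DataFrame is sketched below. The random values, the column labels, and the name `df_dates` are assumptions for illustration; `df_dates` is used so that it does not clash with the `df` loaded later in this exercise.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)  # six consecutive days

# Seeded generator so the random values are reproducible
rng = np.random.default_rng(0)
values = rng.standard_normal((6, 4))

# One row per date, four columns labeled A to D
df_dates = pd.DataFrame(values, index=dates, columns=list("ABCD"))
print(df_dates)
```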

## Import our Data

We are looking at a dataset with the following columns:
  - GRE Scores ( out of 340 )
  - TOEFL Scores ( out of 120 )
  - University Rating ( out of 5 )
  - Statement of Purpose ( out of 5 )
  - Letter of Recommendation Strength ( out of 5 )
  - Undergraduate GPA ( out of 10 )
  - Research Experience ( either 'yes' or 'no' )

The data is stored in a CSV (Comma Separated Values) file. To load it into our code, we use the read_csv function from the pandas module.
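
As a rough sketch of how read_csv behaves, the snippet below parses a tiny CSV held in memory instead of a file on disk. The rows are invented and only mimic the shape of the dataset; `csv_text` and `sample` are hypothetical names.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """GRE Score,TOEFL Score,Chance of Admit
320,110,0.80
305,101,0.55
330,115,0.90
"""

# read_csv accepts a path or a file-like object; the first row becomes the header
sample = pd.read_csv(io.StringIO(csv_text))

print(sample)
print(sample.dtypes)  # dtypes are inferred per column
```

With a real file, the first argument would be the path to the file instead of the `StringIO` object.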

In [None]:
Path_to_data = '/data/'

In [None]:
# read dataset and display it
df = ____

df

Notice the columns of the resulting DataFrame have different dtypes.

In [None]:
df.dtypes

## Viewing Data

Here is how to view the top and bottom rows of the frame:

In [None]:
#view first 5 rows of dataframe
______

In [None]:
#view first 3 rows of dataframe
______

In [None]:
#view last 2 rows
________

Display the index:

In [None]:
df.index

In [None]:
#Display the data columns (features)

df.columns

describe() shows a quick statistical summary of the numeric columns in your data:

In [None]:
df.describe()

Sorting by values:

In [None]:
df.sort_values(by='Chance of Admit', ascending=False) # Notice: returns a new sorted DataFrame; df itself is unchanged

To keep the sorted order, assign the result back to the DataFrame:

In [None]:
print(df)
df = df.sort_values(by='Chance of Admit') # assign the result back so the sorted order is kept
print(df)
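
The difference between calling sort_values and assigning its result can be checked on a toy frame. This is a minimal sketch with made-up values; `toy`, `sorted_toy`, and `renumbered` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy frame
toy = pd.DataFrame({"name": ["a", "b", "c"], "score": [0.7, 0.9, 0.5]})

toy.sort_values(by="score")       # result is discarded; toy is unchanged
print(toy["score"].tolist())      # still in the original order

sorted_toy = toy.sort_values(by="score", ascending=False)
print(sorted_toy.index.tolist())  # the original row labels travel with the rows

# reset_index(drop=True) renumbers the rows 0..n-1 in the new order
renumbered = sorted_toy.reset_index(drop=True)
print(renumbered)
```

Because sorting keeps the original index labels, label-based selection on a sorted frame may not return the row that is displayed first.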

## Selection

In [None]:
# selecting column by name (Chance of Admit)
______

In [None]:
# selecting the first three rows by slicing
df[0:3]

***loc*** gets rows (or columns) with particular labels from the index. 

***iloc*** gets rows (or columns) at particular positions in the index (so it only takes integers).
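
The distinction matters most when the index labels are not simply 0, 1, 2, ..., for example after sorting. The following sketch uses a hypothetical frame `toy` with shuffled labels to show the two accessors disagreeing:

```python
import pandas as pd

# Hypothetical frame whose index labels are out of order, as after a sort
toy = pd.DataFrame({"score": [0.5, 0.7, 0.9]}, index=[2, 0, 1])

by_label = toy.loc[0, "score"]   # the row whose label is 0
by_position = toy.iloc[0, 0]     # the first row, first column

print(by_label, by_position)

# Slices differ too: loc includes the end label, iloc excludes the end position
print(toy.loc[2:0])    # labels 2 through 0, both included
print(toy.iloc[0:1])   # only the first row
```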

In [None]:
df.loc[[0]] # selecting by label

In [None]:
df.loc[0, ['GRE Score', 'TOEFL Score']]

In [None]:
df.iloc[3:5, 0:2] # selection by integer position (rows 3 and 4, first two columns)

In [None]:
df[df['Chance of Admit'] > 0.5]   #selection by condition

In [None]:
#Using the isin() method for filtering:
df[df['University Rating'].isin([1, 2])]
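
Conditions like the two above produce boolean Series, which can be combined with `&` (and), `|` (or), and `~` (not). Below is a small sketch on a made-up frame; the column names mirror the dataset, but the values and variable names are assumptions.

```python
import pandas as pd

# Hypothetical toy frame with two of the dataset's columns
toy = pd.DataFrame({
    "University Rating": [1, 3, 2, 5],
    "Chance of Admit": [0.45, 0.80, 0.62, 0.95],
})

mask_chance = toy["Chance of Admit"] > 0.5
mask_rating = toy["University Rating"].isin([1, 2])

both = toy[mask_chance & mask_rating]    # rows satisfying both conditions
either = toy[mask_chance | mask_rating]  # rows satisfying at least one
not_low = toy[~mask_rating]              # rows whose rating is not 1 or 2

print(both)
print(either)
print(not_low)
```

When the comparisons are written inline, each one needs its own parentheses, because `&` and `|` bind more tightly than `>` in Python.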

## Operations 

### Statistics

In [None]:
df.mean(numeric_only=True) # mean of each column; numeric_only=True skips non-numeric columns such as a 'yes'/'no' Research column

In [None]:
# mean of each row, taken across its numeric columns (axis=1)
df.mean(axis=1, numeric_only=True)
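
The effect of the axis argument is easier to see on a frame small enough to check by hand. This is a sketch with invented values; `toy`, `col_means`, and `row_means` are hypothetical names.

```python
import pandas as pd

# Hypothetical toy frame with two numeric columns and one text column
toy = pd.DataFrame({
    "GRE Score": [320, 310],
    "TOEFL Score": [110, 100],
    "Research": ["yes", "no"],
})

# axis=0: one mean per column, computed down the rows
col_means = toy.mean(axis=0, numeric_only=True)

# axis=1: one mean per row, computed across the numeric columns
row_means = toy.mean(axis=1, numeric_only=True)

print(col_means)
print(row_means)
```

Row means across columns with different scales (such as GRE and TOEFL scores) are rarely meaningful on their own; the call is shown here only to illustrate the axis argument.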