# **Introduction to Pandas**

**What is Pandas?**
* Pandas is a Python library used for working with data sets.
* It has functions for analyzing, cleaning, exploring, and manipulating data.
* The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.

**Why Use Pandas?**
* Pandas allows us to analyze large data sets and draw conclusions based on statistical theory.
* Pandas can clean messy data sets, and make them readable and relevant.
* Relevant data is very important in data science.

**What Can Pandas Do?**
* Pandas gives you answers about the data, such as:
1. Is there a correlation between two or more columns?
2. What is the average value?
3. What is the maximum value?
4. What is the minimum value?
* Data cleaning: Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
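
As a rough illustration of the questions listed above, the sketch below uses a small made-up table. The column names `hours` and `marks` and their values are assumptions for this example, not data from this notebook. It drops the row with a missing value, then computes the correlation, average, maximum and minimum.

```python
import pandas as pd

# Hypothetical data: one missing value in "marks" to illustrate cleaning.
df = pd.DataFrame({
    "hours": [1.0, 2.0, 3.0, 4.0],
    "marks": [10.0, 20.0, None, 40.0],
})

# Cleaning: drop every row that contains an empty (NaN) value.
clean = df.dropna()

# Correlation between two columns (Pearson by default).
corr = clean["hours"].corr(clean["marks"])

print(corr)                    # correlation between hours and marks
print(clean["marks"].mean())   # average value
print(clean["marks"].max())    # maximum value
print(clean["marks"].min())    # minimum value
```

Note that `dropna()` returns a new table and leaves `df` itself unchanged.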

**Installation of Pandas**

In [12]:
pip install pandas



**Checking Pandas Version**

* The version string is stored in the `__version__` attribute.

In [14]:
import pandas as pd  # pd must be defined before the version can be read

print(pd.__version__)

1.1.5


**Import Pandas**
* Once Pandas is installed, import it in your applications with the import keyword. By convention, it is imported under the alias pd:

In [None]:
import pandas as pd

In [13]:
x = {
      'course': ["ML", "DL", "DS"],
      'passings': [3, 2, 6]
    }

y = pd.DataFrame(x)

print(y)

  course  passings
0     ML         3
1     DL         2
2     DS         6


**What is a Series?**
* A Pandas Series is like a column in a table.
* It is a one-dimensional array holding data of any type.

In [15]:
# Create a simple Pandas Series from a list:
A = [1, 7, 2]
B = pd.Series(A)
print(B)

0    1
1    7
2    2
dtype: int64


**Labels**
* If nothing else is specified, the values are labeled with their index number.
* The first value has index 0, the second value has index 1, and so on.
* This label can be used to access a specified value.

In [16]:
# Return the first value of the Series:
print(B[0])

1


**Create Labels**
* With the index argument, you can name your own labels.

In [17]:
# Create your own labels:
A = [1, 7, 2, 8, 9]
B = pd.Series(A, index = ["n1", "n2", "n3", "n4", "n5"])
print(B)

n1    1
n2    7
n3    2
n4    8
n5    9
dtype: int64


When you have created labels, you can access an item by referring to the label.

In [18]:
# Return the value of "n3":
print(B["n3"])

2
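
Once a Series has custom labels, an item can be reached either by its label or by its position. The sketch below rebuilds the labeled Series from the example above (under the hypothetical name `s`) and uses the `.loc` and `.iloc` accessors to make the two kinds of lookup explicit.

```python
import pandas as pd

s = pd.Series([1, 7, 2, 8, 9], index=["n1", "n2", "n3", "n4", "n5"])

by_label = s.loc["n3"]     # look up by label
by_position = s.iloc[2]    # look up by integer position (0-based)

print(by_label, by_position)
```

Both lookups return the same item here, because "n3" is the third label.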


**Key/Value Objects as Series**
* You can also use a key/value object, like a dictionary, when creating a Series.

In [19]:
# Create a simple Pandas Series from a dictionary:
sub = {"maths": 83, "Phy": 80, "Chem": 90}
B = pd.Series(sub)
print(B)

maths    83
Phy      80
Chem     90
dtype: int64


**Note:** The keys of the dictionary become the labels.

To select only some of the items in the dictionary, use the index argument and specify only the items you want to include in the Series.

In [20]:
# Create a Series using only data from "maths" and "Phy":
sub = {"maths": 83, "Phy": 80, "Chem": 90}
B = pd.Series(sub, index = ["maths","Phy"])
print(B)

maths    83
Phy      80
dtype: int64
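
A small sketch of what happens when the index argument names a label that is not a key in the dictionary. The label "Bio" is a hypothetical addition for this example. Pandas keeps the label and fills its value with NaN (missing).

```python
import pandas as pd

sub = {"maths": 83, "Phy": 80, "Chem": 90}

# "Bio" is a hypothetical label that is not a key in the dictionary.
s = pd.Series(sub, index=["maths", "Bio"])

print(s)
```

Because NaN is a floating-point value, the resulting Series has dtype float64 rather than int64.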


**DataFrames**
* Data sets in Pandas are usually two-dimensional tables, called DataFrames.
* If a Series is like a column, a DataFrame is the whole table.

In [21]:
# Create a DataFrame from a dictionary of two lists:
A = { 
    "sub": [420, 380, 390],
    "marks": [50, 40, 45]
    }

B = pd.DataFrame(A)

print(B)

   sub  marks
0  420     50
1  380     40
2  390     45
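
To make the column/table relationship concrete, the sketch below builds the same table from two actual Series objects and then selects one column back out. The variable names `sub`, `marks`, `df` and `col` are chosen for this illustration only.

```python
import pandas as pd

sub = pd.Series([420, 380, 390])
marks = pd.Series([50, 40, 45])

# Each Series becomes one column; the dictionary keys become the column names.
df = pd.DataFrame({"sub": sub, "marks": marks})

col = df["marks"]            # selecting one column gives back a Series

print(type(col).__name__)
print(df.shape)              # (number of rows, number of columns)
```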
