# Pandas

## Pandas Series

Pandas Series is a **one-dimensional labeled array** capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index. Conceptually, a Series is like a single column in an Excel sheet.

In practice, a Pandas Series is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file. A Series can also be created from a list, a dictionary, a NumPy array, or a scalar value. Here are some of the ways to create a Series:

In [17]:
# Creating a series from a NumPy array

import pandas as pd
# NumPy is needed for np.array()
import numpy as np

data = np.array(['p', 'a', 'n', 'd', 'a', 's'])

ser1 = pd.Series(data)
print(ser1)

# Creating a series from a list
# (named `letters` rather than `list`, to avoid shadowing the built-in list type)
letters = ['p', 'a', 'n', 'd', 'a', 's']

ser2 = pd.Series(letters)
print(ser2)

# Accessing the first three elements by position (slicing)
print(ser2[:3])

0    p
1    a
2    n
3    d
4    a
5    s
dtype: object
0    p
1    a
2    n
3    d
4    a
5    s
dtype: object
0    p
1    a
2    n
dtype: object
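
The paragraph at the start of this section also mentions creating a Series from a scalar value, which the cell above does not show. A minimal sketch follows; the labels `"a"`, `"b"`, `"c"` are arbitrary choices for illustration. The scalar is repeated once for each label, so an explicit `index` is what determines the length of the Series.

```python
import pandas as pd

# The scalar 5 is broadcast to every label in the index
ser_scalar = pd.Series(5, index=["a", "b", "c"])
print(ser_scalar)
```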


### Labels
If nothing else is specified, the values are labeled with their index number: the first value has index 0, the second value has index 1, and so on. For example:

In [18]:
data = ["Indian", "Institute", "Of", "Technology", "Bombay"]

ser3 = pd.Series(data)
print(ser3)


0        Indian
1     Institute
2            Of
3    Technology
4        Bombay
dtype: object
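
To see where these default labels come from, you can inspect the `index` attribute. A small sketch, reusing the same data: without an explicit index, pandas assigns a `RangeIndex` running from 0 to n-1, so with this default index the label `0` also happens to be the first position.

```python
import pandas as pd

data = ["Indian", "Institute", "Of", "Technology", "Bombay"]
ser3 = pd.Series(data)

# The default index is a RangeIndex covering 0..4
print(ser3.index)   # RangeIndex(start=0, stop=5, step=1)

# Access by the default label 0
print(ser3[0])      # Indian
```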


**Create Labels**
With the index argument, you can name your own labels.


In [19]:
ser4 = pd.Series(data, index=["a", "b", "c", "d", "e"])
print(ser4)

# A label can be used to access a specific value

print(ser4["d"])

a        Indian
b     Institute
c            Of
d    Technology
e        Bombay
dtype: object
Technology
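
Once custom labels exist, it can help to be explicit about whether you are selecting by label or by position. A minimal sketch using the same Series: `.loc` looks values up by label, while `.iloc` uses 0-based integer positions, so both lines below return the fourth element.

```python
import pandas as pd

data = ["Indian", "Institute", "Of", "Technology", "Bombay"]
ser4 = pd.Series(data, index=["a", "b", "c", "d", "e"])

# Label-based lookup
print(ser4.loc["d"])        # Technology
# Position-based lookup (fourth element)
print(ser4.iloc[3])         # Technology

# A list of labels returns a smaller Series
print(ser4.loc[["a", "e"]])
```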


**Key/Value Objects as Series**
You can also use a key/value object, like a dictionary, when creating a Series.

In [20]:
calories = {"day1": 420, "day2": 380, "day3": 390}

myvar = pd.Series(calories)

print(myvar)

day1    420
day2    380
day3    390
dtype: int64


Note: The keys of the dictionary become the labels.
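
When a dictionary is combined with the `index` argument, the index decides which keys are used and in what order. A short sketch with the same `calories` data; the label `"day4"` is a hypothetical key chosen to show that a label missing from the dictionary becomes `NaN`.

```python
import pandas as pd

calories = {"day1": 420, "day2": 380, "day3": 390}

# Only the listed keys are kept, in the order given
subset = pd.Series(calories, index=["day1", "day2"])
print(subset)

# A label that is not a key in the dictionary gets a missing value (NaN)
with_missing = pd.Series(calories, index=["day3", "day4"])
print(with_missing)
```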

### Check out [this](https://www.geeksforgeeks.org/python-pandas-series/?ref=lbp) for more details on Pandas Series.

## Pandas DataFrames

A Pandas DataFrame is a two-dimensional data structure, like a two-dimensional array or a table with rows and columns.

In [21]:
import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
df = pd.DataFrame(data)

print(df) 

   calories  duration
0       420        50
1       380        40
2       390        45
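
A useful way to connect DataFrames back to the previous section: each column of a DataFrame is itself a Series. The sketch below rebuilds the same `df` and checks the column type, the shape, and the column names.

```python
import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
df = pd.DataFrame(data)

# Selecting one column returns a Series that shares the DataFrame's index
col = df["calories"]
print(type(col))             # <class 'pandas.core.series.Series'>

# shape is (number of rows, number of columns)
print(df.shape)              # (3, 2)
print(df.columns.tolist())   # ['calories', 'duration']
```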


### For more details refer to [Creating a Pandas DataFrame](https://www.geeksforgeeks.org/different-ways-to-create-pandas-dataframe/)

### Locate Row
Pandas uses the **loc** attribute to return one or more specified rows by their index labels. For example:

In [22]:
import pandas as pd
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
df = pd.DataFrame(data)

# Return the row with label 0 (as a Series)
print(df.loc[0])

# Return the rows with labels 0 and 1 (as a DataFrame)
print(df.loc[[0, 1]])

calories    420
duration     50
Name: 0, dtype: int64
   calories  duration
0       420        50
1       380        40


**Note: When you pass a list of labels, as in `df.loc[[0, 1]]`, the result is a Pandas DataFrame; a single label such as `df.loc[0]` returns a Series.**
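
The difference described in the note can be checked directly with `type()`. A minimal sketch: even a list holding a single label produces a one-row DataFrame rather than a Series.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

row = df.loc[0]      # single label -> Series
rows = df.loc[[0]]   # list with one label -> DataFrame with one row

print(type(row).__name__)    # Series
print(type(rows).__name__)   # DataFrame
```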

With the **index** argument, you can name your own indexes and use this named index in the **loc** attribute to return the specified row(s).

In [23]:
df = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(df.loc["day1"])


calories    420
duration     50
Name: day1, dtype: int64
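
Named indexes also work with lists and slices in **loc**. A short sketch on the same data; one detail worth knowing is that a label slice with `.loc` includes both endpoints, unlike ordinary Python slicing.

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# A list of labels selects those rows in the order given
print(df.loc[["day3", "day1"]])

# Label slices with .loc include BOTH endpoints
print(df.loc["day1":"day2"])

# Row and column labels can be combined: df.loc[row, column]
print(df.loc["day2", "calories"])   # 380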


#### Rows can also be extracted by integer position using .iloc[]. See [this article](https://www.geeksforgeeks.org/python-extracting-rows-using-pandas-iloc/?ref=lbp) for details.
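
For comparison with **loc**, here is a brief sketch of **iloc** on the same labeled DataFrame. `.iloc` ignores the labels and uses 0-based positions, and its slices exclude the end position, just like Python list slices.

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

# First row by position, returned as a Series
print(df.iloc[0])

# Positions 0 and 1 (the end position 2 is excluded)
print(df.iloc[0:2])

# Last row, first column
print(df.iloc[-1, 0])   # 390
```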

## Pandas Read CSV

A simple way to store big data sets is to use CSV (comma-separated values) files.

CSV files contain plain text and are a well-known format that can be read by almost any tool, including Pandas.
Check out this [link](https://www.geeksforgeeks.org/python-read-csv-using-pandas-read_csv/?ref=lbp) to understand how to read CSV files with `pd.read_csv()`.
Also note that when a DataFrame has more rows than the `display.max_rows` option allows (60 by default), printing it shows only the first 5 and the last 5 rows. But what if you want to print the entire DataFrame?
Use **to_string()** to print the entire DataFrame!
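
A minimal sketch of reading CSV data and printing it in full. To keep it self-contained, it reads inline CSV text through `io.StringIO` instead of a file; the column names and values are invented for illustration, and with a real file you would pass its path instead, e.g. `pd.read_csv("data.csv")` (hypothetical file name). The second part builds a 100-row DataFrame, which `print()` would truncate, and renders every row with `to_string()`.

```python
import io
import pandas as pd

# Inline CSV text standing in for a file on disk (made-up values)
csv_text = """Duration,Pulse,Calories
60,110,409.1
60,117,479.0
45,109,282.4
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.to_string())

# The truncation threshold used by print()
print(pd.options.display.max_rows)   # 60 by default

# 100 rows exceed the default limit, so print(big) would be truncated;
# to_string() renders all of them
big = pd.DataFrame({"x": range(100)})
full_text = big.to_string()
print(full_text)
```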


## Pandas Read JSON

Big data sets are often stored or extracted as JSON.

JSON is plain text in the format of objects (key/value pairs) and arrays. It is well known in the world of programming and is supported by Pandas.
Head over to this [link](https://towardsdatascience.com/how-to-parse-json-data-with-python-pandas-f84fbd0b1025), which covers the necessary information about JSON files and how to read them using Pandas.
As with CSV files, use **to_string()** to print the entire DataFrame.
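
A minimal sketch of `pd.read_json()`, assuming the JSON is a list of records (one object per row). As in the CSV sketch, the data is an inline string wrapped in `io.StringIO` with invented values; a file path or URL could be passed instead.

```python
import io
import pandas as pd

# A JSON array of objects: each object becomes one row
json_text = """[
  {"Duration": 60, "Calories": 409.1},
  {"Duration": 60, "Calories": 479.0},
  {"Duration": 45, "Calories": 282.4}
]"""

df = pd.read_json(io.StringIO(json_text))
print(df.to_string())
```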

## Pandas Mean | Standard Deviation | Variance 
Similar to NumPy, Pandas has built-in methods to compute the mean, standard deviation, and variance of the data in a Pandas Series or DataFrame.


In [24]:
# Finding the mean, standard deviation and variance of a Pandas Series.

import pandas as pd
  
# creating a series
s = pd.Series(data = [5, 9, 8, 5, 7, 8, 1, 2, 3,
                      4, 5, 6, 7, 8])

# displaying the series
print(s)
# finding the mean
print(s.mean())
# finding the Standard deviation
print(s.std())
# finding the variance
print(s.var())



0     5
1     9
2     8
3     5
4     7
5     8
6     1
7     2
8     3
9     4
10    5
11    6
12    7
13    8
dtype: int64
5.571428571428571
2.4405007592795287
5.956043956043955
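
One difference from NumPy is worth knowing: pandas' `std()` and `var()` use the sample formulas by default (`ddof=1`, dividing by n - 1), while `np.std()` and `np.var()` default to the population formulas (`ddof=0`, dividing by n). The variance printed above is therefore the sample variance. The sketch below compares the two conventions on the same data.

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 9, 8, 5, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8])

# pandas default: sample variance (divide by n - 1)
print(s.var())

# NumPy default: population variance (divide by n)
print(np.var(s.to_numpy()))

# Passing ddof explicitly makes pandas match NumPy's default
print(s.var(ddof=0))
```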


In [14]:
# Finding the mean, standard deviation and variance of a Pandas DataFrame.

import pandas as pd

# creating a dataframe 
df = pd.DataFrame({'Cost': [114, 345, 15778, 5626],
                   'Quantity': [10, 20, 50, 30]})
  
# displaying the DataFrame
print(df)
# finding the mean
print(df.mean())
# finding the Standard deviation
print(df.std())
# finding the variance
print(df.var())

    Cost  Quantity
0    114        10
1    345        20
2  15778        50
3   5626        30
Cost        5465.75
Quantity      27.50
dtype: float64
Cost        7331.018318
Quantity      17.078251
dtype: float64
Cost        5.374383e+07
Quantity    2.916667e+02
dtype: float64
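
Instead of calling each method separately, several statistics can be computed in one call. A short sketch on the same DataFrame: `agg()` with a list of method names returns one row per statistic and one column per DataFrame column, and `describe()` gives a broader summary (count, mean, std, min, quartiles, max).

```python
import pandas as pd

df = pd.DataFrame({'Cost': [114, 345, 15778, 5626],
                   'Quantity': [10, 20, 50, 30]})

# One row per statistic, one column per DataFrame column
summary = df.agg(["mean", "std", "var"])
print(summary)

# Count, mean, std, min, quartiles and max in one table
print(df.describe())
```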


## Cleaning Data
Data cleaning is the process of fixing or removing incorrect, duplicate, corrupted, incomplete, or otherwise unusable data in a dataset. The goal of data cleaning is to build uniform, standardized data sets so that analytics and business-intelligence tools can access them easily and work with accurate data.
Many data cleaning tools exist, but the **Pandas** library provides a fast and efficient way to manage and explore data.

Head over to this [link](https://www.w3schools.com/python/pandas/pandas_cleaning.asp) to learn about data cleaning using Pandas.
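
As a small taste of what the linked tutorial covers, the sketch below applies three common cleaning steps to a made-up table containing one missing value and one duplicated row. Each method returns a new DataFrame and leaves `df` unchanged. Filling with the column mean is just one possible choice of replacement value.

```python
import numpy as np
import pandas as pd

# Made-up data: row 2 has a missing value, row 3 duplicates row 1
df = pd.DataFrame({
    "calories": [420, 380, np.nan, 380],
    "duration": [50, 40, 45, 40],
})

# Drop rows that contain any missing value
print(df.dropna())

# Or fill missing values, here with the column mean
filled = df.fillna({"calories": df["calories"].mean()})
print(filled)

# Remove exact duplicate rows, keeping the first occurrence
print(df.drop_duplicates())
```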

### In this notebook, we have touched upon the basics of Pandas. To learn more, you can use the following links:

### [pandas tutorial](https://www.geeksforgeeks.org/pandas-tutorial/?ref=lbp)
### [Documentation](https://pandas.pydata.org/docs/)
