![U.png](attachment:U.png)

**Introduction to Pandas :**

1. Open-source Python library
2. Simple yet powerful and expressive tool
3. It is mainly used for <b>Data Manipulation & Analysis</b>
4. Pandas was created in 2008 by Wes McKinney

**Where did the name Pandas come from?**

- The name Pandas is derived from the term <b>Panel Data</b>
- Panel data is multi-dimensional data involving measurements over time, e.g. tables with rows and columns

**Features of Pandas**

1. Series object and DataFrames
2. Handling of missing data
3. Data alignment
4. Group by functionalities
5. Slicing, Indexing, Subsetting
6. Merging and Joining
7. Reshaping
8. Hierarchical labeling of axes
9. Robust input/output (I/O) tools
10. Time series-specific functionality
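
A minimal sketch of three of the features listed above (data alignment, handling of missing data, and group by). The values and the names `s1`, `s2`, and `sales` are made up for illustration only.

```python
import pandas as pd

# Data alignment: values are matched by index label, not by position.
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])
total = s1 + s2   # 'a' and 'd' exist in only one Series, so they become NaN

# Handling of missing data: replace NaN with a default value.
filled = total.fillna(0)

# Group by: aggregate the rows that share a key.
sales = pd.DataFrame({'city': ['Delhi', 'Kanpur', 'Delhi'],
                      'amount': [100, 200, 50]})
per_city = sales.groupby('city')['amount'].sum()
```

Here `total` has the union of both indexes (`a` to `d`), and `per_city` holds one summed amount per city.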

![NumpyPandas.PNG](attachment:NumpyPandas.PNG)

**What kind of data suits Pandas best?**
1. Tabular data [e.g. Excel data, SQL Server]
2. Time Series Data
3. Arbitrary Matrix [e.g. with rows and columns]


![DataSetPandas.PNG](attachment:DataSetPandas.PNG)

**Series Object**

- A Series is a <b>one-dimensional</b> labeled array capable of <b>holding data of any type</b> (integer, string, float, Python objects, etc.).
- The axis labels are collectively called the index.

In [4]:
import pandas as pd

data = [1, 2, 3, 4]
series1 = pd.Series(data)

series1

0    1
1    2
2    3
3    4
dtype: int64

In [5]:
# check the type of Series data

type(series1)

pandas.core.series.Series

In [15]:
# Setting a custom index on a Series object

data = [1, 2, 3, 4]
series2 = pd.Series(data, index=['a', 'b', 'c', 'd'])

series2

a    1
b    2
c    3
d    4
dtype: int64

In [7]:
# Create an Empty Series
# Pass dtype explicitly: the default dtype of an empty Series differs between pandas versions

empty = pd.Series(dtype='float64')
empty

Series([], dtype: float64)

In [9]:
# Create Series from Dictionary : Example-1

# If no index is specified, the dictionary keys become the index, in the dictionary's insertion order.
# (Older versions, Python < 3.6 or pandas < 0.23, sorted the keys instead.)
# If index is passed, the values in data corresponding to the labels in the index will be pulled out

data = {'a' : 0., 'd' : 1., 'c' : 2.}
s = pd.Series(data)
s

a    0.0
d    1.0
c    2.0
dtype: float64

In [13]:
# Create Series from Dictionary : Example-2

# If index is passed, the values in data corresponding to the labels in the index will be pulled out
# Observe: the index order is preserved and the missing element is filled with NaN (Not a Number).

data = {'a' : 0., 'd' : 1., 'c' : 2.}
s = pd.Series(data, index = ['b','c','d','a'])

s

b    NaN
c    2.0
d    1.0
a    0.0
dtype: float64

In [16]:
# Create Series from Scalar

# If data is a scalar value, an index must be provided. The value will be repeated to match the length of index

s = pd.Series(8, index=[0, 1, 2, 3])
s

0    8
1    8
2    8
3    8
dtype: int64

In [27]:
# Retrieve Data

s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])

# Retrieve the first three elements
s[:3]

# Retrieve the last three elements
s[-3:]

# Retrieve the third and fourth elements
s[2:4]

# Retrieve Data Using Label (Index)
s['d']
s[['c','d']]

c    3
d    4
dtype: int64
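
The same lookups can also be written with the explicit accessors `.iloc` (position-based) and `.loc` (label-based), which avoids ambiguity when the index itself contains integers. A minimal sketch, reusing the Series from the cell above; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

first_three = s.iloc[:3]    # positional slice: the end position is excluded
by_label = s.loc['b':'d']   # label slice: both endpoints are included
single = s.loc['d']         # a single label returns a scalar value
```

Note the difference in slicing rules: `.iloc[:3]` stops before position 3, while `.loc['b':'d']` includes the row labeled `'d'`.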

**DataFrame**

- A DataFrame is a two-dimensional labeled data structure whose columns can hold data of different types

**Features of DataFrame**
1. Contains columns of different types
2. Mutable size
3. Labeled data axes (rows and columns)
4. Performs arithmetic operations on rows and columns
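
The features above can be sketched as follows. This is a small illustration with made-up numbers; the column names `price`, `qty`, and `total` are assumptions, not part of any dataset used here.

```python
import pandas as pd

# Labeled axes: row labels x, y, z and column labels price, qty.
df = pd.DataFrame({'price': [10.0, 20.0, 30.0], 'qty': [1, 2, 3]},
                  index=['x', 'y', 'z'])

# Columns of different types: one float column and one integer column.
dtypes = df.dtypes

# Mutable size: add a new column computed by column arithmetic.
df['total'] = df['price'] * df['qty']

# Arithmetic over rows: sum each column across all rows.
col_sums = df.sum()
```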

![twodim.PNG](attachment:twodim.PNG)

In [28]:
# Create Empty DataFrame

empty = pd.DataFrame()
empty

In [29]:
# Creating a DataFrame from a list: a DataFrame can be created from a single list or a list of lists.

lst = ['Apple', 'Orange', 'Banana', 'Grapes']

df = pd.DataFrame(lst)
df

        0
0   Apple
1  Orange
2  Banana
3  Grapes
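
The cell above uses a single list. A minimal sketch of the list-of-lists case mentioned in the comment, where each inner list becomes one row; the prices and the column names `Fruit` and `Price` are made up for illustration.

```python
import pandas as pd

# Each inner list is one row; `columns` names the fields.
rows = [['Apple', 120], ['Orange', 80], ['Banana', 40]]
df = pd.DataFrame(rows, columns=['Fruit', 'Price'])
```

Without the `columns` argument, the columns would be labeled `0` and `1`, just as the single-list example gets a column labeled `0`.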


In [33]:
# Creating a DataFrame from a dict of ndarrays/lists:
# To create a DataFrame from a dict of ndarrays/lists, all the arrays must be of the same length.
# If an index is passed, its length must equal the length of the arrays.
# If no index is passed, the index defaults to range(n), where n is the array length.

data = {'Name':['Tom', 'nick', 'krish', 'jack'], 'Age':[20, 21, 19, 18]}

df = pd.DataFrame(data)

df


    Name  Age
0    Tom   20
1   nick   21
2  krish   19
3   jack   18
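
A short sketch of the index rule stated in the comments above, using the same data. The labels `rank1` to `rank4` are hypothetical.

```python
import pandas as pd

data = {'Name': ['Tom', 'nick', 'krish', 'jack'], 'Age': [20, 21, 19, 18]}

# The index needs exactly one label per row (4 here).
df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])

# An index of the wrong length is rejected with a ValueError.
try:
    pd.DataFrame(data, index=['rank1', 'rank2'])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```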


In [35]:
# Dealing with Rows and Columns Selection

data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}

df = pd.DataFrame(data)

# Select two columns
df[["Age", "Qualification"]]


   Age Qualification
0   27           Msc
1   24            MA
2   22           MCA
3   32           Phd
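
The cell above selects columns. Rows can be selected in a similar way; the following is a minimal sketch on a reduced copy of the same data (only `Name` and `Age` are kept to stay short, and the threshold of 25 is arbitrary).

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age': [27, 24, 22, 32]}
df = pd.DataFrame(data)

first_row = df.iloc[0]                   # row at position 0, returned as a Series
older = df[df['Age'] > 25]               # boolean mask keeps the rows where the condition holds
names = df.loc[df['Age'] > 25, 'Name']   # rows by condition, one column by label
```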


In [38]:
# Creating a DataFrame using a Series

series = pd.Series([6, 12], index=['a', 'b'])
df = pd.DataFrame(series)

df

    0
a   6
b  12


In [42]:
# Creating a DataFrame using a NumPy Array

import numpy as np

numarray = np.array([[50000, 60000], ['John', 'James']])
df = pd.DataFrame({'Name' : numarray[1], 'Salary' : numarray[0]}) # Row 1 of the array holds the names, row 0 the salaries

df

    Name Salary
0   John  50000
1  James  60000
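
One thing to watch in the cell above: a NumPy array has a single dtype, so mixing integers and strings stores every element as a string, and the `Salary` column holds text rather than numbers. A minimal sketch of converting it back with `astype`; the names `is_string_array` and `total_salary` are illustrative.

```python
import numpy as np
import pandas as pd

numarray = np.array([[50000, 60000], ['John', 'James']])

# Mixed integers and strings are all stored as strings (dtype kind 'U').
is_string_array = numarray.dtype.kind == 'U'

df = pd.DataFrame({'Name': numarray[1], 'Salary': numarray[0]})

# Convert the salaries back to integers so arithmetic works as expected.
df['Salary'] = df['Salary'].astype(int)
total_salary = df['Salary'].sum()
```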


In [50]:
# DataFrame.loc[] retrieves rows from a DataFrame by index label.
# Rows can also be selected by integer position with DataFrame.iloc[].

# index_col=["bedrooms"] makes the 'bedrooms' column the row index
df = pd.read_csv(r"C:\Users\Sony\Desktop\DataScienceAI\GIT\1. Python Concepts\Data\housing.csv", index_col=["bedrooms"])

df

bedrooms,id,date,price,bathrooms,sqft_living,sqft_lot,floors,waterfront,view,condition,grade,sqft_above,sqft_basement,yr_built,yr_renovated,zipcode,lat,long,sqft_living15,sqft_lot15
3,7129300520,20141013T000000,221900,1.00,1180,5650,1.0,0,0,3,7,1180,0,1955,0,98178,47.5112,-122.257,1340,5650
3,6414100192,20141209T000000,538000,2.25,2570,7242,2.0,0,0,3,7,2170,400,1951,1991,98125,47.7210,-122.319,1690,7639
2,5631500400,20150225T000000,180000,1.00,770,10000,1.0,0,0,3,6,770,0,1933,0,98028,47.7379,-122.233,2720,8062
4,2487200875,20141209T000000,604000,3.00,1960,5000,1.0,0,0,5,7,1050,910,1965,0,98136,47.5208,-122.393,1360,5000
3,1954400510,20150218T000000,510000,2.00,1680,8080,1.0,0,0,3,8,1680,0,1987,0,98074,47.6168,-122.045,1800,7503
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3,263000018,20140521T000000,360000,2.50,1530,1131,3.0,0,0,3,8,1530,0,2009,0,98103,47.6993,-122.346,1530,1509
4,6600060120,20150223T000000,400000,2.50,2310,5813,2.0,0,0,3,8,2310,0,2014,0,98146,47.5107,-122.362,1830,7200
2,1523300141,20140623T000000,402101,0.75,1020,1350,2.0,0,0,3,7,1020,0,2009,0,98144,47.5944,-122.299,1020,2007
3,291310100,20150116T000000,400000,2.50,1600,2388,2.0,0,0,3,8,1600,0,2004,0,98027,47.5345,-122.069,1410,1287
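
A minimal sketch of `loc[]` and `iloc[]` on a frame indexed by `bedrooms`. To stay runnable without the CSV file, it rebuilds the first four rows of the output above with only the `price` and `floors` columns; the variable names are illustrative.

```python
import pandas as pd

# First four rows of the housing output, reduced to two columns.
df = pd.DataFrame({'bedrooms': [3, 3, 2, 4],
                   'price': [221900, 538000, 180000, 604000],
                   'floors': [1.0, 2.0, 1.0, 1.0]}).set_index('bedrooms')

three_bed = df.loc[3]    # label lookup: every row whose index label is 3
first_row = df.iloc[0]   # position lookup: the first row, whatever its label
cell = df.iloc[2, 0]     # row position 2, column position 0 (price)
```

Because the `bedrooms` index is not unique, `df.loc[3]` returns a DataFrame with all matching rows rather than a single row.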
