# Introduction to Pandas

## What is Pandas?

Pandas is an open-source Python library used for data analysis and manipulation. It provides powerful data structures like DataFrame (2D table) and Series (1D array), which make it easy to work with structured data.

## Why do we use Pandas?

We use Pandas because it allows us to:
- Load and handle data from multiple sources (CSV, Excel, SQL, JSON, etc.)
- Clean and preprocess data (handling nulls, duplicates, type conversions)
- Perform data manipulation (filtering, grouping, aggregating, merging, reshaping)
- Analyze and explore datasets efficiently with simple syntax
- Integrate seamlessly with other Python libraries (NumPy, Matplotlib, Scikit-learn)


In [0]:
%pip install pandas

- Pandas has two core data structures: `Series` and `DataFrame`.
- A `Series` is a one-dimensional labeled array; a `DataFrame` is a two-dimensional labeled table.

In [0]:
import pandas as pd

In [0]:
s = pd.Series([1,2,3,4,5])

In [0]:
print(s)
# s.display() and s.show() do not work: a pandas Series has neither method

In [0]:
# Series with a custom index (labels instead of 0, 1, 2, ...)

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

In [0]:
print(s)
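
With a custom index, values can be looked up by label as well as by position. A minimal sketch using the same Series as above (the variable names `by_label`, `by_position`, and `subset` are only illustrative):

```python
import pandas as pd

# Same Series as above, with string labels as the index
s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

by_label = s['c']         # label-based lookup
by_position = s.iloc[2]   # position-based lookup (0-indexed)
subset = s[['a', 'e']]    # several labels at once, returns a Series

print(by_label, by_position)
print(subset)
```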

### Creating DataFrames

In [0]:
df = pd.DataFrame({"names" : ['John','Jane','Jack','Jill','Joe'], "age" : [25,30,35,40,45]})

In [0]:
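# display(df)   # rich table output in a notebook
# print(df)     # plain-text output
# Both print() and the notebook's display() function work on a DataFrame
# (pandas itself does not define a .display() method)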
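
A quick way to check what was built from the dictionary is to look at the shape, column names, and dtypes. This is a small sketch; `df_people` is a name picked here so it does not clash with the `df` used in the rest of the notebook:

```python
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values
df_people = pd.DataFrame({
    "names": ['John', 'Jane', 'Jack', 'Jill', 'Joe'],
    "age": [25, 30, 35, 40, 45],
})

n_rows, n_cols = df_people.shape          # (rows, columns)
column_names = list(df_people.columns)

print(n_rows, n_cols)
print(column_names)
print(df_people.dtypes)                   # data type of each column
```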

In [0]:
df = pd.read_csv("/Volumes/workspace/project/project_files/iris.csv")
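
The path above only exists in this workspace. To try `read_csv` without the file, the same call can read from an in-memory string. The rows below are a tiny hand-typed stand-in, not the real file, and only use three of the column names that appear later in this notebook:

```python
import io
import pandas as pd

# Hand-typed sample standing in for iris.csv (assumption: same header style)
csv_text = """sepal_length,sepal_width,petal_length
5.1,3.5,1.4
4.9,3.0,1.4
4.7,3.2,1.3
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)
print(sample.dtypes)
```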

In [0]:
display(df)  # notebook display function; pandas DataFrames have no .display() method

In [0]:
df.head(10)

In [0]:
df.tail(10)

In [0]:
df.describe()

In [0]:
df.info()

## Data Selection

In [0]:
df["sepal_length"]

In [0]:
# type of a single column in the dataframe 

type(df["sepal_length"])


In [0]:
# selecting multiple columns 
df[["sepal_length", "sepal_width"]]
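
Besides picking columns, rows can be selected too. A minimal sketch on a made-up four-row frame (the values and the names `long_sepals`, `first_two`, `cell` are illustrative only):

```python
import pandas as pd

# Small made-up frame using two of the iris column names
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.3, 2.7],
})

long_sepals = sample[sample["sepal_length"] > 5.5]   # filter rows with a boolean mask
first_two = sample.iloc[:2]                          # rows by position
cell = sample.loc[2, "sepal_width"]                  # single value by row label and column name

print(long_sepals)
print(first_two)
print(cell)
```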

In [0]:
# Drop rows that contain null values
# (returns a new DataFrame; df itself is unchanged unless you reassign it)

df.dropna()

In [0]:
# Fill null values with 0 (also returns a new DataFrame)
df.fillna(0)
# Passing inplace=True modifies the original DataFrame instead of returning a new one
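
To see that `dropna` and `fillna` leave the original untouched by default, here is a small sketch on made-up data with two missing values:

```python
import numpy as np
import pandas as pd

# Made-up frame with one NaN in each column
sample = pd.DataFrame({
    "sepal_length": [5.1, np.nan, 4.7],
    "sepal_width": [3.5, 3.0, np.nan],
})

dropped = sample.dropna()    # keeps only rows with no nulls
filled = sample.fillna(0)    # replaces every null with 0

# The original still contains its nulls because neither call used inplace=True
nulls_left = int(sample.isna().sum().sum())

print(len(dropped), nulls_left)
print(filled)
```

Reassigning the result (`sample = sample.fillna(0)`) is the usual way to keep the change.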

In [0]:
# Rename a column (returns a new DataFrame; df keeps its original column names)
df.rename(columns={"sepal_length": "SL"})
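
Because `rename` returns a new DataFrame, the result has to be stored if you want to use the new name. A short sketch with a one-column made-up frame:

```python
import pandas as pd

sample = pd.DataFrame({"sepal_length": [5.1, 4.9]})

# rename() does not touch `sample`; the renamed frame is a separate object
renamed = sample.rename(columns={"sepal_length": "SL"})

print(list(sample.columns))
print(list(renamed.columns))
```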

In [0]:
df.info()

In [0]:
# Change a column's data type with astype()
# Note: casting floats to int truncates the decimal part (5.9 becomes 5)
df['sepal_length'] = df['sepal_length'].astype(int)
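
Casting floats to `int` cuts off the decimals instead of rounding. If rounding is what you want, one option is to call `round()` first. A sketch with made-up values:

```python
import pandas as pd

lengths = pd.Series([5.1, 4.9, 6.7])

truncated = lengths.astype(int)          # decimals are dropped
rounded = lengths.round().astype(int)    # round to nearest first, then cast

print(truncated.tolist())
print(rounded.tolist())
```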

In [0]:
display(df.head(10))  # show the first 10 rows

In [0]:
# to get the number of rows present in the dataframe 
len(df)

In [0]:
# Add a column; a single value is broadcast to every row
df['zeros'] = 0
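
A new column can be a constant or can be computed from existing columns. A small sketch on made-up data (`sepal_area` is a hypothetical column name used only for this example):

```python
import pandas as pd

sample = pd.DataFrame({
    "sepal_length": [5.0, 6.0],
    "sepal_width": [3.0, 2.0],
})

sample["zeros"] = 0                                                     # same value in every row
sample["sepal_area"] = sample["sepal_length"] * sample["sepal_width"]   # row-by-row arithmetic

print(sample)
```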

In [0]:
display(df)

In [0]:
# Define a function that squares its input
def fx(a):
    return a * a

In [0]:
# Apply the function to every value in the petal_length column
df["Petal_square"] = df["petal_length"].apply(fx)
display(df)
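
For simple arithmetic like squaring, the same result can be computed directly on the whole column, which is usually faster than `apply` on large data. A sketch comparing the two on made-up values:

```python
import pandas as pd

def fx(a):
    return a * a

petal_length = pd.Series([1.5, 2.0, 4.0])

via_apply = petal_length.apply(fx)              # calls fx once per value
via_vectorized = petal_length * petal_length    # one operation on the whole column

print(via_apply.tolist())
print(via_apply.equals(via_vectorized))
```

`apply` is still the right tool when the logic cannot be written as column arithmetic.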

In [0]:
# Save the DataFrame to a CSV file; index=False leaves out the row index column
df.to_csv("/Volumes/workspace/project/project_files/iris_modified.csv",index=False)
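
To see what `index=False` changes, `to_csv` can be called without a path, in which case it returns the CSV text as a string. A sketch with a made-up one-column frame:

```python
import io
import pandas as pd

sample = pd.DataFrame({"petal_length": [1.5, 4.5]})

with_index = sample.to_csv()                # first column holds the row index
without_index = sample.to_csv(index=False)  # only the data columns

# Reading the index-free text back gives the same table
round_trip = pd.read_csv(io.StringIO(without_index))

print(with_index)
print(without_index)
```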

In [0]:
# Use pd.concat to stack two DataFrames on top of each other
# pd.concat([df1, df2])
# Use pd.merge to join two DataFrames on a shared key column
# pd.merge(df1, df2, on="PK")
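
A runnable sketch of both calls. `df1`, `df2`, and the `PK` key column follow the commented lines above; the data itself is made up:

```python
import pandas as pd

df1 = pd.DataFrame({"PK": [1, 2], "name": ["John", "Jane"]})
df2 = pd.DataFrame({"PK": [2, 3], "age": [30, 35]})

# concat stacks rows; columns missing from one frame are filled with NaN
stacked = pd.concat([df1, df2], ignore_index=True)

# merge joins on the key; the default is an inner join, so only PK == 2 matches
joined = pd.merge(df1, df2, on="PK")

print(stacked)
print(joined)
```

Passing `how="left"`, `how="right"`, or `how="outer"` to `pd.merge` keeps unmatched rows as well.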