# Pandas
Pandas is an open-source data analysis and manipulation library for Python. It provides data structures and functions for loading, cleaning, transforming, and summarizing structured (tabular) data.

**Key Features of Pandas:**
- *DataFrame and Series:* The two main data structures. A Series is a one-dimensional labeled array; a DataFrame is a two-dimensional labeled table.
- *Handling Missing Data:* Functions to detect and fill missing values.
- *Group By:* Tools for splitting data into groups and applying operations.
- *Merging and Joining:* Combining multiple datasets.
- *Reshaping:* Pivoting and melting data.
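
The two structures named in the list above are closely related: each column of a DataFrame is a Series. The following is a minimal sketch of that relationship; the names `ages` and `people` and the values are made up for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labeled array (illustrative data)
ages = pd.Series([25, 30, 35], index=['Alice', 'Bob', 'Charlie'], name='Age')

# A DataFrame is a two-dimensional table built from one or more columns
people = pd.DataFrame({'Age': ages})

print(type(people['Age']))  # selecting one column gives back a Series
print(ages['Bob'])          # label-based lookup on the Series
```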

## 1. Installation
To install Pandas, run the following command in your terminal (in a Jupyter notebook cell, `%pip install pandas` does the same):

In [2]:
pip install pandas




## 2. Importing Pandas
By convention, Pandas is imported under the alias `pd`:

In [3]:
import pandas as pd

## 3. Creating DataFrames
### 3.1. Creating DataFrame from a Dictionary
You can create a DataFrame from a dictionary where keys are column names and values are lists of column data.

In [8]:
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
df

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35


### 3.2. Creating DataFrame from a List of Dictionaries
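You can also create a DataFrame from a list of dictionaries, where each dictionary is one row and its keys are the column names. The example stores the result in a separate variable, `df_records`, so that `df` keeps the three rows used in the following sections.

In [6]:
data = [{'Name': 'Alice', 'Age': 25}, {'Name': 'Bob', 'Age': 30}]
df_records = pd.DataFrame(data)
df_records

    Name  Age
0  Alice   25
1    Bob   30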


## 4. DataFrame Attributes and Methods
### 4.1. Attributes
- **shape:** Returns a tuple `(rows, columns)` giving the size of the DataFrame.

In [9]:
print(df.shape)

(3, 2)


- **columns:** Returns the column labels of the DataFrame.

In [10]:
print(df.columns)  

Index(['Name', 'Age'], dtype='object')


- **index:** Returns the row labels of the DataFrame.

In [11]:
print(df.index)  

RangeIndex(start=0, stop=3, step=1)


- **dtypes:** Returns the data types of the DataFrame columns.

In [13]:
print(df.dtypes)

Name    object
Age      int64
dtype: object


### 4.2. Methods
- **head():** Returns the first n rows of the DataFrame (5 by default).

In [14]:
print(df.head(2))  # First 2 rows

    Name  Age
0  Alice   25
1    Bob   30


- **tail():** Returns the last n rows of the DataFrame (5 by default).

In [15]:
print(df.tail(2))  # Last 2 rows

      Name  Age
1      Bob   30
2  Charlie   35


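- **describe():** Generates descriptive statistics (count, mean, standard deviation, minimum, quartiles, maximum). By default only numeric columns are summarized.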

In [16]:
print(df.describe())

        Age
count   3.0
mean   30.0
std     5.0
min    25.0
25%    27.5
50%    30.0
75%    32.5
max    35.0
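
The `std` value of 5.0 above is the sample standard deviation, which divides by n - 1 rather than n. The following sketch recomputes it by hand for the same three ages; the variable names are illustrative only.

```python
import pandas as pd

ages = pd.Series([25, 30, 35])

mean = ages.mean()
# Sample standard deviation: sum of squared deviations divided by (n - 1)
manual_std = (((ages - mean) ** 2).sum() / (len(ages) - 1)) ** 0.5

print(manual_std)        # computed by hand
print(ages.std())        # pandas default uses ddof=1 (sample)
print(ages.std(ddof=0))  # population standard deviation, divides by n
```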


## 5. Data Selection
### 5.1. Selection using loc
The `loc` indexer selects rows and columns by label. Unlike ordinary Python slices, a label slice such as `0:2` includes its end label.

In [17]:
# Selecting a single row by label
row = df.loc[1]

# Selecting multiple rows by labels (the end label 2 is included)
rows = df.loc[0:2]

# Selecting specific columns
columns = df.loc[:, ['Name', 'Age']]


In [18]:
row

Name    Bob
Age      30
Name: 1, dtype: object

In [19]:
rows

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35


In [20]:
columns

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35


### 5.2. Selection using iloc
The `iloc` indexer selects rows and columns by integer position. Position slices follow normal Python rules, so `0:2` excludes position 2.

In [21]:
# Selecting a single row by position
row = df.iloc[1]

# Selecting multiple rows by positions (the end position 2 is excluded)
rows = df.iloc[0:2]

# Selecting specific columns
columns = df.iloc[:, [0, 1]]


In [22]:
row

Name    Bob
Age      30
Name: 1, dtype: object

In [23]:
rows

    Name  Age
0  Alice   25
1    Bob   30


In [24]:
columns

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
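
With the default integer index, labels and positions happen to coincide, which hides the difference between the two indexers. The sketch below uses made-up string labels (`'a'`, `'b'`, `'c'`) so the difference is visible; the frame `people` is illustrative.

```python
import pandas as pd

people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=['a', 'b', 'c'],  # illustrative string labels
)

# Label slice: the end label 'b' is included
by_label = people.loc['a':'b']

# Position slice: stops before position 2
by_position = people.iloc[0:2]

# A single cell, addressed by label and by position
age_by_label = people.loc['c', 'Age']
age_by_position = people.iloc[2, 1]

print(by_label)
print(by_position)
```

Both slices select the first two rows here, even though one names its last row and the other names the position just past it.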


## 6. Data Manipulation
### 6.1. Adding a Column
Add a new column to the DataFrame.

In [25]:
df['Salary'] = [50000, 60000, 70000]

In [26]:
df

      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   60000
2  Charlie   35   70000


### 6.2. Removing a Column
Remove a column from the DataFrame.

In [27]:
df.drop('Salary', axis=1, inplace=True)  # axis=1 drops a column; inplace=True modifies df directly
df

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
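
An alternative to `inplace=True` is to keep the original DataFrame and work with the copy that `drop` returns. A small sketch, using an illustrative frame named `people`:

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'Salary': [50000, 60000],
})

# Without inplace=True, drop returns a new DataFrame and leaves the original intact
trimmed = people.drop(columns=['Salary'])

print(list(people.columns))   # original still has Salary
print(list(trimmed.columns))  # copy does not
```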


### 6.3. Filtering Data
Filter rows by placing a boolean condition inside square brackets; only the rows where the condition is `True` are kept.

In [28]:
filtered_df = df[df['Age'] > 30]
filtered_df

      Name  Age
2  Charlie   35
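
Conditions can also be combined. The sketch below, on an illustrative copy of the data named `people`, uses `&` (and) and `|` (or); each condition needs its own parentheses because these operators bind more tightly than comparisons. The `query` call shows the same filter written as a string expression.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# Both conditions must hold
in_range = people[(people['Age'] >= 30) & (people['Name'] != 'Charlie')]

# Either condition may hold
not_thirty = people[(people['Age'] < 30) | (people['Age'] > 30)]

# The first filter again, written as a query string
queried = people.query('Age >= 30 and Name != "Charlie"')

print(in_range)
print(not_thirty)
```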


## 7. Handling Missing Data
### 7.1. Detecting Missing Data
Detect missing values in the DataFrame.

In [29]:
df.isnull()

    Name    Age
0  False  False
1  False  False
2  False  False


In [34]:
# Add a column that contains a missing value (None is stored as NaN)
df['Salary'] = [5000, None, 7000]
df

      Name  Age  Salary
0    Alice   25  5000.0
1      Bob   30     NaN
2  Charlie   35  7000.0


In [35]:
df.isnull()

    Name    Age  Salary
0  False  False   False
1  False  False    True
2  False  False   False


### 7.2. Filling Missing Data
Fill missing values with a specified value.

In [36]:
df.fillna(0, inplace=True)

In [37]:
df

      Name  Age  Salary
0    Alice   25  5000.0
1      Bob   30     0.0
2  Charlie   35  7000.0
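
A constant such as 0 is not always a sensible replacement. As one possible alternative, the sketch below fills the gap with the column mean and returns a new DataFrame instead of modifying the original; the frame `pay` is illustrative.

```python
import pandas as pd

pay = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Salary': [5000, None, 7000],
})

# mean() skips missing values, so this is the mean of 5000 and 7000
salary_mean = pay['Salary'].mean()

# A dict maps each column to its own fill value; pay itself is left unchanged
filled = pay.fillna({'Salary': salary_mean})

print(filled)
```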


### 7.3. Dropping Missing Data
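Drop rows with missing values. Because the missing salary was already replaced with 0 in the previous step, no row contains a missing value here and the DataFrame is unchanged.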

In [38]:
df.dropna(inplace=True)

In [39]:
df

      Name  Age  Salary
0    Alice   25  5000.0
1      Bob   30     0.0
2  Charlie   35  7000.0
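
To see `dropna` actually remove something, it has to run on data that still contains a missing value. A minimal sketch on an illustrative frame named `pay`:

```python
import pandas as pd

pay = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [5000, None, 7000],
})

# Drop every row that has at least one missing value
complete_rows = pay.dropna()

# Drop every column that has at least one missing value
complete_cols = pay.dropna(axis=1)

# Only consider the 'Age' column when deciding which rows to drop
age_known = pay.dropna(subset=['Age'])

print(complete_rows)
print(complete_cols)
```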


## 8. Group By
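Group rows by the values in a column and apply an aggregate function to each group. Passing `numeric_only=True` restricts the mean to numeric columns; without it, recent pandas versions raise a `TypeError` on the text column `Name`.

In [43]:
grouped = df.groupby('Age').mean(numeric_only=True)
grouped

     Salary
Age
25   5000.0
30      0.0
35   7000.0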
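
Grouping is more informative when a key repeats, so that each group holds several rows. The sketch below uses a made-up `Department` column and computes several aggregates at once with `agg`; the data and names are illustrative.

```python
import pandas as pd

staff = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'IT', 'IT'],
    'Salary': [50000, 60000, 70000, 90000],
})

# Select the column to aggregate, then apply several aggregates per group
summary = staff.groupby('Department')['Salary'].agg(['mean', 'max', 'count'])

print(summary)
```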


## 9. Merging and Joining DataFrames
### 9.1. Merging
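Merge two DataFrames based on a key. With `how='inner'`, only keys present in both frames are kept. Because both frames have a `value` column, pandas appends the suffixes `_x` and `_y` to tell them apart.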

In [51]:
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'value': [4, 5, 6]})
merged_df = pd.merge(df1, df2, on='key', how='inner')
merged_df

  key  value_x  value_y
0   A        1        4
1   B        2        5
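
The `how` argument controls which keys survive the merge. The sketch below repeats the same two frames with `how='left'` and `how='outer'`; the custom suffixes and the `indicator=True` flag are optional extras chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'value': [4, 5, 6]})

# Left merge: keep every key from df1; unmatched rows get NaN on the right side
left_df = pd.merge(df1, df2, on='key', how='left')

# Outer merge: keep keys from both frames; _merge records where each row came from
outer_df = pd.merge(
    df1, df2, on='key', how='outer',
    suffixes=('_df1', '_df2'), indicator=True,
)

print(left_df)
print(outer_df)
```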


### 9.2. Joining
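Join two DataFrames using their indexes. Both frames have a `value` column, so `lsuffix` and `rsuffix` are needed to keep the joined column names distinct; without them, `join` raises a `ValueError`.

In [56]:
# Use 'key' as the index of each frame, then join on that index
left = df1.set_index('key')
right = df2.set_index('key')
joined_df = left.join(right, how='inner', lsuffix='_left', rsuffix='_right')
print(joined_df)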

## 10. Reshaping Data
### 10.1. Pivoting
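Pivoting reshapes long data into a wide table: the unique values of one column become the new column labels, and combinations with no data are filled with `NaN`.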

In [58]:
pivot_df = df.pivot(index='Age', columns='Name', values='Salary')
pivot_df

Name   Alice  Bob  Charlie
Age
25    5000.0  NaN      NaN
30       NaN  0.0      NaN
35       NaN  NaN   7000.0


### 10.2. Melting
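Melting is the reverse of pivoting: it unpivots a wide DataFrame into long format. The columns listed in `value_vars` become `variable`/`value` pairs, while the columns in `id_vars` are kept as identifiers.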

In [59]:
melted_df = pd.melt(df, id_vars=['Name'], value_vars=['Age'])
melted_df

      Name variable  value
0    Alice      Age     25
1      Bob      Age     30
2  Charlie      Age     35
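
Melting and pivoting undo each other. As a rough sketch, the code below melts two columns at once and then pivots the long result back into wide form; the frame names and the `Field`/`Value` column names are chosen for illustration.

```python
import pandas as pd

wide_df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [5000.0, 0.0, 7000.0],
})

# Wide to long: one row per (Name, Field) pair
long_df = pd.melt(
    wide_df, id_vars=['Name'], value_vars=['Age', 'Salary'],
    var_name='Field', value_name='Value',
)

# Long back to wide: each distinct Field becomes a column again
restored = long_df.pivot(index='Name', columns='Field', values='Value')

print(long_df)
print(restored)
```

Note that `Age` and `Salary` share a single `Value` column after melting, so the integers are converted to floats along the way.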
