# Let's Explore `Pandas` - An Awesome Python Library


**Pandas** is a powerful Python library primarily used for data manipulation and analysis. It's built on top of **NumPy** and provides two main data structures: **Series** and **DataFrame**, which are designed to handle structured data intuitively.

Let's dive into the details of Pandas:

### 1. **Installing Pandas**
First, you need to install Pandas (if you haven't already):
```bash
pip install pandas
```

### 2. **Importing Pandas**
You typically import Pandas using the alias `pd`:
```python
import pandas as pd
```

### 3. **Pandas Data Structures**

#### **Series**
- A **Series** is a one-dimensional labeled array capable of holding any data type (integer, float, string, etc.). It’s similar to a column in a spreadsheet.
- Each value in a Series is associated with an index, which makes it easy to access values.

##### Example:
```python
import pandas as pd

# Creating a Series from a list
s = pd.Series([1, 3, 5, 7])
print(s)

# Creating a Series with custom index
s_custom = pd.Series([1, 3, 5, 7], index=['a', 'b', 'c', 'd'])
print(s_custom)
```
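The index is what makes lookups convenient. As a minimal sketch, reusing the `s_custom` Series from the example above, values can be read by label with `.loc` or by position with `.iloc`:

```python
import pandas as pd

s_custom = pd.Series([1, 3, 5, 7], index=['a', 'b', 'c', 'd'])

# Label-based access
second_by_label = s_custom.loc['b']

# Position-based access (positions start at 0)
second_by_position = s_custom.iloc[1]

# Label slices include the end label, unlike ordinary Python slices
middle = s_custom.loc['b':'c']

print(second_by_label, second_by_position)
print(middle)
```

Both lookups return the same element here because label `'b'` sits at position 1.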

#### **DataFrame**
- A **DataFrame** is a two-dimensional labeled data structure with columns of potentially different types. It’s similar to a table or a spreadsheet.
- A DataFrame can be created from various data sources, including lists, dictionaries, and files (like CSV or Excel).

##### Example:
```python
# Creating a DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
```

### 4. **Reading Data from Files**
Pandas makes it easy to read data from various file formats like CSV, Excel, etc.

- **CSV files**:
```python
df = pd.read_csv('filename.csv')
```

- **Excel files**:
```python
df = pd.read_excel('filename.xlsx')  # Reading .xlsx files requires the optional openpyxl package
```

- **JSON files**:
```python
df = pd.read_json('filename.json')
```
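The readers accept file-like objects as well as paths, which is handy for trying them out without a file on disk. The sketch below feeds `read_csv` an in-memory string; the column names and values are made up for illustration, and `parse_dates` is one of several optional parameters you may or may not need:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = """Name,Age,Joined
Alice,25,2021-01-15
Bob,30,2020-06-01
"""

# parse_dates converts the listed columns to datetime values
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=['Joined'])

print(df_csv)
print(df_csv.dtypes)
```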

### 5. **Inspecting Data**
Once you have a DataFrame, you’ll often want to inspect the data to understand its structure.

- **Viewing the first few rows**:
```python
df.head()  # Default: shows first 5 rows
```

- **Viewing the last few rows**:
```python
df.tail()  # Default: shows last 5 rows
```

- **Getting basic information**:
```python
df.info()  # Prints a summary: index range, column names, non-null counts, and data types
```

- **Descriptive statistics**:
```python
df.describe()  # Gives summary statistics (mean, std, min, etc.) for numeric columns
```
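A few attributes complement these methods when you only need the dimensions or the column types. A small sketch, assuming the same Name/Age/City data used earlier:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# shape is a (rows, columns) tuple
n_rows, n_cols = df.shape

# columns holds the column labels
column_names = list(df.columns)

print(n_rows, n_cols)
print(column_names)
print(df.dtypes)  # one data type per column
```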

### 6. **Indexing and Selecting Data**

#### **Selecting Columns**
You can select a single column or multiple columns from a DataFrame.

- Single column (returns a Series):
```python
df['Name']
```

- Multiple columns (returns a DataFrame):
```python
df[['Name', 'City']]
```

#### **Selecting Rows**
There are multiple ways to select rows in a DataFrame:

- Using the index:
```python
df.loc[0]  # Selects the row labeled 0 (the first row when the index is the default 0, 1, 2, ...)
```

- Using integer-based indexing:
```python
df.iloc[0]  # Selects the first row by integer position
```

- Slicing rows:
```python
df.iloc[0:3]  # Selects the first three rows
```
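The difference between `.loc` and `.iloc` only becomes visible when the index is not the default 0, 1, 2, .... The following sketch uses a hypothetical index of 10, 20, 30 to show it:

```python
import pandas as pd

df_labeled = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=[10, 20, 30],  # hypothetical non-default labels
)

by_label = df_labeled.loc[10]     # row whose label is 10
by_position = df_labeled.iloc[0]  # first row, whatever its label

# .loc slices include both end labels; .iloc slices exclude the stop position
label_slice = df_labeled.loc[10:30]
position_slice = df_labeled.iloc[0:2]

print(by_label['Name'], by_position['Name'])
print(len(label_slice), len(position_slice))
```

With this index, `df_labeled.loc[0]` raises a `KeyError` because no row carries the label 0.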

### 7. **Filtering Data**
You can filter the rows of a DataFrame based on a condition.

##### Example:
```python
# Filter rows where Age is greater than 30
df_filtered = df[df['Age'] > 30]
print(df_filtered)
```

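You can also combine conditions with `&` (and), `|` (or), and `~` (not). Wrap each condition in parentheses, because these operators bind more tightly than comparisons such as `>` and `==`.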

##### Example:
```python
# Filter rows where Age > 30 and City is 'Chicago'
df_filtered = df[(df['Age'] > 30) & (df['City'] == 'Chicago')]
print(df_filtered)
```
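The other operators follow the same pattern. A short sketch on the same Name/Age/City data, also showing `isin()`, which is often tidier than chaining several `==` comparisons with `|`:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# OR: either condition may hold
young_or_chicago = df[(df['Age'] < 30) | (df['City'] == 'Chicago')]

# NOT: invert a condition with ~
not_chicago = df[~(df['City'] == 'Chicago')]

# Membership test against a list of allowed values
coastal = df[df['City'].isin(['New York', 'Los Angeles'])]

print(young_or_chicago)
print(not_chicago)
print(coastal)
```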

### 8. **Modifying Data**

#### **Adding New Columns**
You can add new columns to a DataFrame by directly assigning values.

##### Example:
```python
# Adding a new column 'Salary'
df['Salary'] = [50000, 60000, 70000]
print(df)
```

#### **Updating Values**
You can update values in a DataFrame based on conditions or directly by index.

##### Example:
```python
# Update the 'City' of the first row
df.loc[0, 'City'] = 'San Francisco'
print(df)
```
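Condition-based updates use the same `.loc` accessor, with a boolean mask in place of the row label. A minimal sketch; the replacement value `'Remote'` is just a placeholder:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Set 'City' for every row where Age is greater than 28
df.loc[df['Age'] > 28, 'City'] = 'Remote'
print(df)
```

Passing the mask and the column to a single `.loc` call modifies `df` itself, whereas chaining two selections such as `df[mask]['City'] = ...` may assign to a temporary copy.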

#### **Dropping Columns or Rows**
You can remove columns or rows using `drop()`.

##### Example:
- Dropping a column:
```python
df = df.drop(columns=['Salary'])
print(df)
```

- Dropping a row:
```python
df = df.drop(0)  # Drops the row labeled 0 (the first row with the default index)
print(df)
```

### 9. **Handling Missing Data**
Pandas provides tools to handle missing data (NaN values).

- **Checking for missing values**:
```python
df.isnull().sum()  # Shows the number of missing values per column
```

- **Filling missing values**:
```python
df.fillna(value=0)  # Returns a copy with NaN replaced by 0; assign the result to keep it
```

- **Dropping missing values**:
```python
df.dropna()  # Returns a copy without rows that contain any NaN; assign the result to keep it
```
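To see the three operations side by side, here is a small self-contained sketch. The data is invented, and filling `Age` with the column mean is only one possible strategy:

```python
import numpy as np
import pandas as pd

df_missing = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25.0, np.nan, 35.0],
    'City': ['New York', None, 'Chicago'],
})

# Count missing values per column
missing_per_column = df_missing.isnull().sum()

# Fill each column with its own replacement value
filled = df_missing.fillna({'Age': df_missing['Age'].mean(), 'City': 'Unknown'})

# Drop every row that has at least one missing value
dropped = df_missing.dropna()

print(missing_per_column)
print(filled)
print(dropped)
```

Neither `fillna()` nor `dropna()` changes `df_missing`; each returns a new DataFrame.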

### 10. **GroupBy and Aggregation**
The `groupby()` function is used to group data based on a column and then apply an aggregation function (like sum, mean, count).

##### Example:
```python
# Group by 'City' and calculate the mean age
df_grouped = df.groupby('City')['Age'].mean()
print(df_grouped)
```
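Several aggregations can be computed in one pass with `agg()`. A sketch with made-up data in which each city appears twice, so the groups contain more than one row:

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'Chicago', 'New York', 'Chicago'],
    'Age': [25, 30, 35, 40],
})

# One row per city, one column per aggregation
summary = df.groupby('City')['Age'].agg(['mean', 'count', 'max'])
print(summary)
```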

### 11. **Sorting Data**
You can sort the data in a DataFrame by columns using `sort_values()`.

##### Example:
```python
# Sort by Age in descending order
df_sorted = df.sort_values(by='Age', ascending=False)
print(df_sorted)
```
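`sort_values()` also accepts lists, which helps when the first column has ties. In this sketch (invented data), rows are sorted by `Age` descending and ties are broken by `Name` ascending:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Charlie', 'Alice', 'Bob'],
    'Age': [30, 25, 30],
})

# One ascending flag per sort column
df_sorted = df.sort_values(by=['Age', 'Name'], ascending=[False, True])

# Optionally renumber the rows after sorting
df_sorted = df_sorted.reset_index(drop=True)
print(df_sorted)
```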

### 12. **Merging and Joining Data**
You can merge or join two DataFrames, similar to SQL joins.

##### Example:
```python
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 3], 'Salary': [50000, 60000, 70000]})

# Merge on 'ID'
df_merged = pd.merge(df1, df2, on='ID')
print(df_merged)
```
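The SQL analogy extends to the join type, which `merge()` takes through its `how` parameter. The sketch below changes the IDs in `df2` (an assumption for illustration) so that the two frames only partly overlap:

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [2, 3, 4], 'Salary': [60000, 70000, 80000]})

# Inner join (the default): only IDs present in both frames
inner = pd.merge(df1, df2, on='ID', how='inner')

# Left join: every row of df1; Salary is NaN where df2 has no match
left = pd.merge(df1, df2, on='ID', how='left')

# Outer join: every ID from either frame
outer = pd.merge(df1, df2, on='ID', how='outer')

print(inner)
print(left)
print(outer)
```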

### 13. **Pivot Tables**
You can create pivot tables to summarize data.

##### Example:
```python
df = pd.DataFrame({
    'City': ['New York', 'Chicago', 'New York', 'Chicago'],
    'Sales': [200, 150, 300, 250]
})

# Create a pivot table showing the sum of Sales by City
pivot_table = df.pivot_table(values='Sales', index='City', aggfunc='sum')
print(pivot_table)
```
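A pivot table becomes more useful with a second dimension. This sketch adds a hypothetical `Quarter` column to the data above and spreads it across the columns of the result:

```python
import pandas as pd

df_sales = pd.DataFrame({
    'City': ['New York', 'Chicago', 'New York', 'Chicago'],
    'Quarter': ['Q1', 'Q1', 'Q2', 'Q2'],  # hypothetical extra column
    'Sales': [200, 150, 300, 250],
})

# Rows are cities, columns are quarters, cells hold summed sales
pivot = df_sales.pivot_table(
    values='Sales', index='City', columns='Quarter', aggfunc='sum'
)
print(pivot)
```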

### 14. **Exporting Data**
You can export a DataFrame to various formats, such as CSV, Excel, or JSON.

- **To CSV**:
```python
df.to_csv('output.csv', index=False)
```

- **To Excel**:
```python
df.to_excel('output.xlsx', index=False)
```
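JSON export works along the same lines through `to_json()`. As a sketch, the example below keeps everything in memory: calling `to_json()` without a path returns a string, and `orient='records'` (one of several layouts) produces a list of row objects:

```python
import io
import pandas as pd

df_people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# No path given, so the JSON text is returned instead of written to disk
json_text = df_people.to_json(orient='records')
print(json_text)

# Read it back to confirm the round trip
round_trip = pd.read_json(io.StringIO(json_text))
print(round_trip)
```

To write a file instead, pass a path as the first argument, for example `df_people.to_json('output.json', orient='records')`.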

### 15. **Advanced Operations**

#### **Apply Functions**
You can apply custom functions to columns or rows using the `apply()` function.

##### Example:
```python
# Apply a custom function to the 'Age' column (assumes a DataFrame with an 'Age' column, such as the one from section 3)
df['Age_in_5_years'] = df['Age'].apply(lambda x: x + 5)
print(df)
```
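Applying a function to rows rather than to a single column requires `axis=1`, in which case the function receives one row at a time. A minimal sketch with invented data:

```python
import pandas as pd

df_people = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'City': ['New York', 'Chicago'],
})

# axis=1 passes each row to the function as a Series
df_people['Label'] = df_people.apply(
    lambda row: f"{row['Name']} ({row['City']})", axis=1
)

# For simple arithmetic, a vectorized expression gives the same result as apply()
df_people['Age_in_5_years'] = df_people['Age'] + 5

print(df_people)
```

Vectorized expressions are generally faster than `apply()` on large frames, so `apply()` is best kept for logic that cannot be written that way.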

#### **Window Functions**
Pandas supports rolling window operations, which are useful for time series analysis.

##### Example:
```python
# Rolling mean over 2 periods
df['Rolling_mean'] = df['Sales'].rolling(window=2).mean()
print(df)
```
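For a time series, the same call works on a date-indexed Series. The dates below are placeholders; the sketch also shows `min_periods`, which controls how many observations a window needs before it yields a value:

```python
import pandas as pd

sales = pd.Series(
    [200, 150, 300, 250],
    index=pd.date_range('2024-01-01', periods=4, freq='D'),  # hypothetical dates
)

# The first window holds a single value, so its mean is NaN by default
rolling_mean = sales.rolling(window=2).mean()

# With min_periods=1, partial windows are averaged as well
rolling_partial = sales.rolling(window=2, min_periods=1).mean()

print(rolling_mean)
print(rolling_partial)
```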

---


```python
import pandas as pd
```

```python
company_info = {
    'Name': ["Wasif", "Galib", "Hasib"],
    'Age': [23, 21, 26],
    'Salary': [120000, 45000, 34000]
}
df = pd.DataFrame(company_info)
print(df)
```

```
    Name  Age  Salary
0  Wasif   23  120000
1  Galib   21   45000
2  Hasib   26   34000
```


```python
df = pd.read_excel("usainfo.xlsx")
df.head(10)
```

| | Row ID | Order Priority | Discount | Unit Price | Shipping Cost | Customer ID | Customer Name | Ship Mode | Customer Segment | Product Category | ... | Region | State or Province | City | Postal Code | Order Date | Ship Date | Profit | Quantity ordered new | Sales | Order ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20847 | High | 0.01 | 2.84 | 0.93 | 3 | Bonnie Potter | Express Air | Corporate | Office Supplies | ... | West | Washington | Anacortes | 98221 | 2015-01-07 | 2015-01-08 | 4.56 | 4 | 13.01 | 88522 |
| 1 | 20228 | Not Specified | 0.02 | 500.98 | 26.0 | 5 | Ronnie Proctor | Delivery Truck | Home Office | Furniture | ... | West | California | San Gabriel | 91776 | 2015-06-13 | 2015-06-15 | 4390.3665 | 12 | 6362.85 | 90193 |
| 2 | 21776 | Critical | 0.06 | 9.48 | 7.29 | 11 | Marcus Dunlap | Regular Air | Home Office | Furniture | ... | East | New Jersey | Roselle | 7203 | 2015-02-15 | 2015-02-17 | -53.8096 | 22 | 211.15 | 90192 |
| 3 | 24844 | Medium | 0.09 | 78.69 | 19.99 | 14 | Gwendolyn F Tyson | Regular Air | Small Business | Furniture | ... | Central | Minnesota | Prior Lake | 55372 | 2015-05-12 | 2015-05-14 | 803.4705 | 16 | 1164.45 | 86838 |
| 4 | 24846 | Medium | 0.08 | 3.28 | 2.31 | 14 | Gwendolyn F Tyson | Regular Air | Small Business | Office Supplies | ... | Central | Minnesota | Prior Lake | 55372 | 2015-05-12 | 2015-05-13 | -24.03 | 7 | 22.23 | 86838 |
| 5 | 24847 | Medium | 0.05 | 3.28 | 4.2 | 14 | Gwendolyn F Tyson | Regular Air | Small Business | Office Supplies | ... | Central | Minnesota | Prior Lake | 55372 | 2015-05-12 | 2015-05-13 | -37.03 | 4 | 13.99 | 86838 |
| 6 | 24848 | Medium | 0.05 | 3.58 | 1.63 | 14 | Gwendolyn F Tyson | Regular Air | Small Business | Office Supplies | ... | Central | Minnesota | Prior Lake | 55372 | 2015-05-12 | 2015-05-13 | -0.71 | 4 | 14.26 | 86838 |
| 7 | 18181 | Critical | 0.0 | 4.42 | 4.99 | 15 | Timothy Reese | Regular Air | Small Business | Office Supplies | ... | East | New York | Smithtown | 11787 | 2015-04-08 | 2015-04-09 | -59.82 | 7 | 33.47 | 86837 |
| 8 | 20925 | Medium | 0.01 | 35.94 | 6.66 | 15 | Timothy Reese | Regular Air | Small Business | Office Supplies | ... | East | New York | Smithtown | 11787 | 2015-05-28 | 2015-05-28 | 261.8757 | 10 | 379.53 | 86839 |
| 9 | 26267 | High | 0.04 | 2.98 | 1.58 | 16 | Sarah Ramsey | Regular Air | Small Business | Office Supplies | ... | East | New York | Syracuse | 13210 | 2015-02-12 | 2015-02-15 | 2.63 | 6 | 18.8 | 86836 |


```python
# Explore the data: size, column types, and non-null counts
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 498 entries, 0 to 497
Data columns (total 25 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   Row ID                498 non-null    int64
 1   Order Priority        498 non-null    object
 2   Discount              498 non-null    float64
 3   Unit Price            498 non-null    float64
 4   Shipping Cost         498 non-null    float64
 5   Customer ID           498 non-null    int64
 6   Customer Name         498 non-null    object
 7   Ship Mode             498 non-null    object
 8   Customer Segment      498 non-null    object
 9   Product Category      498 non-null    object
 10  Product Sub-Category  498 non-null    object
 11  Product Container     498 non-null    object
 12  Product Name          498 non-null    object
 13  Product Base Margin
```

```python
# Summary statistics for the numeric and datetime columns
df.describe()
```

| | Row ID | Discount | Unit Price | Shipping Cost | Customer ID | Product Base Margin | Postal Code | Order Date | Ship Date | Profit | Quantity ordered new | Sales | Order ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 498.0 | 498.0 | 498.0 | 498.0 | 498.0 | 495.0 | 498.0 | 498 | 498 | 498.0 | 498.0 | 498.0 | 498.0 |
| mean | 19441.965863 | 0.047731 | 101.273273 | 12.887751 | 450.807229 | 0.523636 | 58747.965863 | 2015-03-29 18:24:34.698795264 | 2015-03-31 17:18:04.337349632 | 152.297517 | 13.544177 | 1028.716446 | 80958.01004 |
| min | 64.0 | 0.0 | 1.14 | 0.5 | 3.0 | 0.35 | 1007.0 | 2015-01-02 00:00:00 | 2015-01-02 00:00:00 | -6923.5992 | 1.0 | 2.25 | 359.0 |
| 25% | 18817.25 | 0.02 | 6.12 | 3.035 | 220.5 | 0.385 | 32409.0 | 2015-02-10 06:00:00 | 2015-02-12 06:00:00 | -89.06275 | 5.0 | 54.225 | 86838.25 |
| 50% | 20874.5 | 0.05 | 17.98 | 6.07 | 477.5 | 0.55 | 63110.5 | 2015-03-30 00:00:00 | 2015-04-01 00:00:00 | -3.3565 | 9.0 | 188.585 | 88571.0 |
| 75% | 23399.5 | 0.07 | 108.595 | 13.99 | 678.75 | 0.6 | 89115.0 | 2015-05-17 00:00:00 | 2015-05-18 00:00:00 | 100.6475 | 17.0 | 804.165 | 90027.0 |
| max | 26321.0 | 0.1 | 3502.14 | 99.0 | 896.0 | 0.85 | 99362.0 | 2015-06-30 00:00:00 | 2015-07-03 00:00:00 | 7402.32 | 146.0 | 43046.2 | 91576.0 |
| std | 6458.212374 | 0.030769 | 274.255901 | 16.9498 | 257.806579 | 0.14098 | 32495.291935 | | | 1022.71145 | 15.12387 | 2954.426556 | 21394.540035 |


```python
df.isnull()
```

| | Row ID | Order Priority | Discount | Unit Price | Shipping Cost | Customer ID | Customer Name | Ship Mode | Customer Segment | Product Category | ... | Region | State or Province | City | Postal Code | Order Date | Ship Date | Profit | Quantity ordered new | Sales | Order ID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 1 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 493 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 494 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 495 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 496 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |


```python
# Count the null values in each column
df.isnull().sum()
```

```
Row ID                  0
Order Priority          0
Discount                0
Unit Price              0
Shipping Cost           0
Customer ID             0
Customer Name           0
Ship Mode               0
Customer Segment        0
Product Category        0
Product Sub-Category    0
Product Container       0
Product Name            0
Product Base Margin     3
Country                 0
Region                  0
State or Province       0
City                    0
Postal Code             0
Order Date              0
Ship Date               0
Profit                  0
Quantity ordered new    0
Sales                   0
Order ID                0
dtype: int64
```