# Introduction

## Overview of Pandas

Pandas is an open-source data analysis and manipulation library for Python. It provides data structures and functions for working with structured (tabular, labeled) data. Built on top of NumPy, Pandas integrates with other popular Python libraries and tools used for data analysis and visualization.

Key Features of Pandas:
- Fast and efficient DataFrame object for data manipulation with integrated indexing.
- Tools for reading and writing data between in-memory data structures and different formats (e.g., CSV, text, Excel, SQL databases).
- Data alignment and integrated handling of missing data.
- Reshaping and pivoting of data sets.
- Label-based slicing, indexing, and subsetting of large data sets.
- Data structure merging and joining.
- Time series functionality.

## Installation and Setup

To install Pandas, you can use pip, the Python package installer. Open your command line interface and run:

```sh
pip install pandas
```

If you are using Anaconda, you can install Pandas using the conda package manager:

```sh
conda install pandas
```

To ensure Pandas is installed correctly, you can check its version:

```python
import pandas as pd
print(pd.__version__)
```

## Importing Pandas

Before you can use Pandas in your Python script or Jupyter Notebook, you need to import it. The common convention is to import Pandas as `pd`:

```python
import pandas as pd
```

This aliasing makes it easier to call Pandas functions and methods.

## Basic Workflow with Pandas

1. **Load Data**: Read data from various file formats such as CSV, Excel, or SQL databases into a DataFrame.
2. **Inspect Data**: Examine the structure, summary statistics, and first few rows of the DataFrame to understand its contents.
3. **Clean Data**: Handle missing values, remove duplicates, and correct data types.
4. **Analyze Data**: Perform operations like filtering, grouping, and aggregating to gain insights from the data.
5. **Visualize Data**: Generate plots and charts to visualize the results of your analysis.
6. **Export Data**: Save the processed data back to a file or database.

## Example Workflow

```python
# Importing pandas
import pandas as pd

# Load data
df = pd.read_csv('data.csv')

# Inspect data
print(df.head())
df.info()  # info() prints its summary directly and returns None

# Clean data
df.dropna(inplace=True)
df['column'] = df['column'].astype(int)

# Analyze data
grouped = df.groupby('category').sum()

# Visualize data
grouped.plot(kind='bar')

# Export data
grouped.to_csv('cleaned_data.csv')
```

# Input and Output

## Reading Data

- **From CSV**
  - `pd.read_csv(filepath_or_buffer, **kwargs)`
    - Examples
      ```python
      import pandas as pd
      
      # Reading a CSV file
      df = pd.read_csv('data.csv')
      print(df.head())
      ```
- **From Excel**
  - `pd.read_excel(io, **kwargs)`
    - Examples
      ```python
      # Reading an Excel file
      df = pd.read_excel('data.xlsx')
      print(df.head())
      ```
- **From SQL**
  - `pd.read_sql(sql, con, **kwargs)`
    - Examples
      ```python
      import sqlite3
      
      # Establishing a connection
      conn = sqlite3.connect('database.db')
      
      # Reading from SQL
      df = pd.read_sql('SELECT * FROM table_name', conn)
      print(df.head())
      ```
- **From JSON**
  - `pd.read_json(path_or_buf, **kwargs)`
    - Examples
      ```python
      # Reading a JSON file
      df = pd.read_json('data.json')
      print(df.head())
      ```
- **Other Formats**
  - **HDF5**: `pd.read_hdf(path_or_buf, key=None, **kwargs)`
  - **Parquet**: `pd.read_parquet(path, engine='auto', **kwargs)`
  - **Feather**: `pd.read_feather(path, **kwargs)`
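
The readers above accept keyword arguments that control parsing. The following is a minimal sketch, assuming a small made-up CSV held in memory (`csv_text`, `users`, and the column names are hypothetical); `usecols`, `parse_dates`, and `dtype` are standard `read_csv` parameters.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would usually be a file path.
csv_text = """id,signup_date,plan,amount
1,2024-01-05,basic,9.99
2,2024-02-10,pro,29.99
3,2024-03-15,basic,9.99
"""

users = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "signup_date", "amount"],  # load only these columns
    parse_dates=["signup_date"],              # parse this column as datetimes
    dtype={"id": "int64"},                    # force an explicit dtype
)

print(users.dtypes)
```

Restricting columns and fixing dtypes at load time tends to reduce memory use and avoids a separate conversion step later.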

## Writing Data

- **To CSV**
  - `DataFrame.to_csv(path_or_buf, **kwargs)`
    - Examples
      ```python
      # Writing to a CSV file
      df.to_csv('output.csv', index=False)
      ```
- **To Excel**
  - `DataFrame.to_excel(excel_writer, **kwargs)`
    - Examples
      ```python
      # Writing to an Excel file
      df.to_excel('output.xlsx', index=False)
      ```
- **To SQL**
  - `DataFrame.to_sql(name, con, **kwargs)`
    - Examples
      ```python
      import sqlite3
      
      # Establishing a connection
      conn = sqlite3.connect('database.db')
      
      # Writing to SQL
      df.to_sql('table_name', conn, if_exists='replace', index=False)
      ```
- **To JSON**
  - `DataFrame.to_json(path_or_buf=None, **kwargs)`
    - Examples
      ```python
      # Writing to a JSON file
      df.to_json('output.json')
      ```
- **Other Formats**
  - **HDF5**: `DataFrame.to_hdf(path_or_buf, key, **kwargs)`
  - **Parquet**: `DataFrame.to_parquet(path, engine='auto', **kwargs)`
  - **Feather**: `DataFrame.to_feather(path, **kwargs)`
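
A quick way to check a writer/reader pair is a round trip. The sketch below is illustrative only (the `scores` frame is made up); it relies on the fact that `to_csv` returns the CSV text when no path is given.

```python
import io

import pandas as pd

scores = pd.DataFrame({"name": ["Ada", "Linus"], "score": [91.5, 88.0]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = scores.to_csv(index=False)

# Reading the text back should reproduce the original frame.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.equals(scores))
```

Passing `index=False` matters here: without it the row index is written as an extra unnamed column and the round trip no longer matches.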

# Data Structures

## Series

The `Series` is one of the fundamental data structures in Pandas. It is essentially a one-dimensional array with labels (indices) for each element, which can hold any data type such as integers, floats, strings, etc.

### Creating a Series

You can create a `Series` from various data structures like lists, dictionaries, and arrays.

1. **From a List or Array:**

   ```python
   import pandas as pd

   # Creating a Series from a list
   s = pd.Series([1, 2, 3, 4])
   print(s)
   ```

   **Output:**
   ```
   0    1
   1    2
   2    3
   3    4
   dtype: int64
   ```

2. **From a Dictionary:**

   ```python
   # Creating a Series from a dictionary
   s = pd.Series({'a': 1, 'b': 2, 'c': 3})
   print(s)
   ```

   **Output:**
   ```
   a    1
   b    2
   c    3
   dtype: int64
   ```

3. **With Custom Index:**

   ```python
   # Creating a Series with a custom index
   s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])
   print(s)
   ```

   **Output:**
   ```
   x    10
   y    20
   z    30
   dtype: int64
   ```

### Accessing Data in Series

You can access elements in a `Series` by index label with `[]` or `.loc`, and by position with `.iloc`. The examples below use the custom-index `Series` created above (labels `x`, `y`, `z`).

1. **By Index Label:**

   ```python
   print(s['x'])  # Accessing by index label
   ```

   **Output:**
   ```
   10
   ```

2. **By Position:**

   ```python
   print(s.iloc[0])  # Accessing by position
   ```

   **Output:**
   ```
   10
   ```

3. **Slicing:**

   ```python
   print(s.iloc[1:3])  # Positional slice: positions 1 and 2
   ```

   **Output:**
   ```
   y    20
   z    30
   dtype: int64
   ```
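
Labels and positions can disagree, especially when the index holds integers. A minimal sketch, using a hypothetical Series `s_int` that is not part of the examples above, shows how `.loc` (labels) and `.iloc` (positions) keep the two apart:

```python
import pandas as pd

# Integer labels that deliberately differ from the positions 0, 1, 2
s_int = pd.Series([100, 200, 300], index=[10, 20, 30])

by_label = s_int.loc[10]        # label 10 -> 100
by_position = s_int.iloc[0]     # position 0 -> 100
label_slice = s_int.loc[10:20]  # label slices include the end label
pos_slice = s_int.iloc[0:1]     # positional slices exclude the end position

print(by_label, by_position)
print(label_slice.tolist())  # [100, 200]
print(pos_slice.tolist())    # [100]
```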

### Series Operations

Perform operations on a `Series` such as arithmetic and statistical computations.

1. **Arithmetic Operations:**

   ```python
   s1 = pd.Series([1, 2, 3])
   s2 = pd.Series([4, 5, 6])
   print(s1 + s2)  # Addition
   ```

   **Output:**
   ```
   0    5
   1    7
   2    9
   dtype: int64
   ```

2. **Statistical Operations:**

   ```python
   print(s.mean())  # Mean
   print(s.sum())   # Sum
   ```

   **Output:**
   ```
   20.0
   60
   ```
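
Arithmetic between two `Series` aligns on index labels, not on position. The sketch below uses two made-up Series (`a` and `b`) with partly overlapping labels to illustrate this, and shows `Series.add` with `fill_value` as one way to avoid `NaN` for unmatched labels.

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "z"])

# 'x' exists only in a, so the aligned sum is NaN at that label.
aligned = a + b

# fill_value substitutes 0 for the missing side before adding.
filled = a.add(b, fill_value=0)

print(aligned)
print(filled)
```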

## DataFrame

The `DataFrame` is a two-dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and columns). It is the most commonly used Pandas object for data manipulation.

### Creating a DataFrame

You can create a `DataFrame` from dictionaries, lists, and other data structures.

1. **From a Dictionary of Lists:**

   ```python
   # Creating a DataFrame from a dictionary of lists
   df = pd.DataFrame({
       'A': [1, 2, 3],
       'B': [4, 5, 6]
   })
   print(df)
   ```

   **Output:**
   ```
      A  B
   0  1  4
   1  2  5
   2  3  6
   ```

2. **From a List of Dictionaries:**

   ```python
   # Creating a DataFrame from a list of dictionaries
   df = pd.DataFrame([
       {'A': 1, 'B': 4},
       {'A': 2, 'B': 5},
       {'A': 3, 'B': 6}
   ])
   print(df)
   ```

   **Output:**
   ```
      A  B
   0  1  4
   1  2  5
   2  3  6
   ```

3. **With a Custom Index:**

   ```python
   # Creating a DataFrame with a custom row index
   df = pd.DataFrame({
       'A': [1, 2, 3],
       'B': [4, 5, 6]
   }, index=['row1', 'row2', 'row3'])
   print(df)
   ```

   **Output:**
   ```
         A  B
   row1  1  4
   row2  2  5
   row3  3  6
   ```

### Accessing Data in DataFrame

Access elements, rows, and columns using labels or positions.

1. **By Column Label:**

   ```python
   print(df['A'])  # Accessing a single column
   ```

   **Output:**
   ```
   row1    1
   row2    2
   row3    3
   Name: A, dtype: int64
   ```

2. **By Row Index:**

   ```python
   print(df.loc['row1'])  # Accessing by row index label
   ```

   **Output:**
   ```
   A    1
   B    4
   Name: row1, dtype: int64
   ```

3. **By Position:**

   ```python
   print(df.iloc[0])  # Accessing by position
   ```

   **Output:**
   ```
   A    1
   B    4
   Name: row1, dtype: int64
   ```

4. **Slicing:**

   ```python
   print(df.iloc[0:2])  # Slicing rows
   ```

   **Output:**
   ```
         A  B
   row1  1  4
   row2  2  5
   ```

### DataFrame Operations

Perform various operations such as arithmetic and statistical computations on a `DataFrame`.

1. **Arithmetic Operations:**

   ```python
   df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
   df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
   print(df1 + df2)  # Addition
   ```

   **Output:**
   ```
      A   B
   0  6  10
   1  8  12
   ```

2. **Statistical Operations:**

   ```python
   print(df.mean())  # Mean of each column
   print(df.sum())   # Sum of each column
   ```

   **Output:**
   ```
   A    2.0
   B    5.0
   dtype: float64
   A     6
   B    15
   dtype: int64
   ```

3. **Apply Functions:**

   ```python
   # Doubling every value; apply passes each column (a Series) to the function
   print(df.apply(lambda x: x * 2))
   ```

   **Output:**
   ```
         A   B
   row1  2   8
   row2  4  10
   row3  6  12
   ```

4. **Handling Missing Data:**

   ```python
   df_with_nan = pd.DataFrame({'A': [1, 2, None], 'B': [None, 5, 6]})

   # Filling missing values with a specific value
   print(df_with_nan.fillna(0))

   # Dropping rows with any missing values
   print(df_with_nan.dropna())
   ```

   **Output:**
   ```
        A    B
   0  1.0  0.0
   1  2.0  5.0
   2  0.0  6.0
        A    B
   1  2.0  5.0
   ```

# Data Manipulation

## Indexing and Selection

Indexing and selection are critical for extracting specific parts of data in a `DataFrame` or `Series`. This section covers various methods for accessing and filtering data.

### Selection by Label

1. **Selecting Columns:**

   To select columns from a `DataFrame`, you can use column labels.

   ```python
   import pandas as pd

   # Creating a DataFrame
   df = pd.DataFrame({
       'A': [1, 2, 3],
       'B': [4, 5, 6]
   })

   # Selecting a single column
   print(df['A'])
   ```

   **Output:**
   ```
   0    1
   1    2
   2    3
   Name: A, dtype: int64
   ```

   ```python
   # Selecting multiple columns
   print(df[['A', 'B']])
   ```

   **Output:**
   ```
      A  B
   0  1  4
   1  2  5
   2  3  6
   ```

2. **Selecting Rows by Index Label:**

   Use `loc` to select rows by their index labels.

   ```python
   # Creating a DataFrame with custom index
   df = pd.DataFrame({
       'A': [1, 2, 3],
       'B': [4, 5, 6]
   }, index=['row1', 'row2', 'row3'])

   # Selecting a single row by index label
   print(df.loc['row1'])
   ```

   **Output:**
   ```
   A    1
   B    4
   Name: row1, dtype: int64
   ```

   ```python
   # Selecting multiple rows by index labels
   print(df.loc[['row1', 'row2']])
   ```

   **Output:**
   ```
         A  B
   row1  1  4
   row2  2  5
   ```

### Selection by Position

1. **Selecting Rows by Position:**

   Use `iloc` to select rows based on their integer position.

   ```python
   # Selecting rows by integer position
   print(df.iloc[0])  # First row
   ```

   **Output:**
   ```
   A    1
   B    4
   Name: row1, dtype: int64
   ```

   ```python
   # Selecting a range of rows
   print(df.iloc[0:2])  # First two rows
   ```

   **Output:**
   ```
         A  B
   row1  1  4
   row2  2  5
   ```

2. **Selecting Specific Rows and Columns by Position:**

   ```python
   # Selecting specific rows and columns by position
   print(df.iloc[0, 1])  # Element at first row and second column
   ```

   **Output:**
   ```
   4
   ```

   ```python
   # Selecting a subset of the DataFrame
   print(df.iloc[0:2, 0:2])
   ```

   **Output:**
   ```
         A  B
   row1  1  4
   row2  2  5
   ```

### Boolean Indexing

Boolean indexing allows filtering of data based on conditions.

1. **Filtering Rows Based on a Condition:**

   ```python
   # Filtering rows where column 'A' is greater than 1
   print(df[df['A'] > 1])
   ```

   **Output:**
   ```
         A  B
   row2  2  5
   row3  3  6
   ```

2. **Filtering with Multiple Conditions:**

   ```python
   # Filtering rows where column 'A' is greater than 1 and column 'B' is less than 6
   print(df[(df['A'] > 1) & (df['B'] < 6)])
   ```

   **Output:**
   ```
         A  B
   row2  2  5
   ```
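
Conditions can also be combined with `|` (or), negated with `~`, tested against a set of values with `isin`, or written as a string with `query`. The following is a small sketch on a made-up frame (`df_demo` is hypothetical and separate from the examples above):

```python
import pandas as pd

df_demo = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["x", "y", "x", "z"]})

# Rows whose 'B' value is one of several allowed values
in_set = df_demo[df_demo["B"].isin(["x", "z"])]

# Negation: rows where 'B' is not 'x'
not_x = df_demo[~(df_demo["B"] == "x")]

# Either condition may hold; each side needs its own parentheses
either = df_demo[(df_demo["A"] < 2) | (df_demo["B"] == "z")]

# The same kind of filter expressed as a query string
via_query = df_demo.query("A > 1 and B == 'x'")

print(in_set.index.tolist())     # [0, 2, 3]
print(not_x.index.tolist())      # [1, 3]
print(either.index.tolist())     # [0, 3]
print(via_query.index.tolist())  # [2]
```

The parentheses are required because `&`, `|`, and `~` bind more tightly than comparison operators in Python.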

## Data Cleaning

Data cleaning involves preparing your data for analysis by handling missing data, removing duplicates, and correcting data types.

### Handling Missing Data

1. **Detecting Missing Data:**

   ```python
   df = pd.DataFrame({
       'A': [1, 2, None],
       'B': [None, 5, 6]
   })

   # Detecting missing values
   print(df.isna())
   ```

   **Output:**
   ```
        A      B
   0  False   True
   1  False  False
   2   True  False
   ```

2. **Filling Missing Data:**

   ```python
   # Filling missing values with a specific value
   print(df.fillna(0))
   ```

   **Output:**
   ```
      A    B
   0  1.0  0.0
   1  2.0  5.0
   2  0.0  6.0
   ```

   ```python
   # Filling missing values using forward fill
   # (df.ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas)
   print(df.ffill())
   ```

   **Output:**
   ```
      A    B
   0  1.0  NaN
   1  2.0  5.0
   2  2.0  6.0
   ```

3. **Dropping Missing Data:**

   ```python
   # Dropping rows with any missing values
   print(df.dropna())
   ```

   **Output:**
   ```
        A    B
   1  2.0  5.0
   ```

   ```python
   # Dropping columns with any missing values
   # (both columns contain a NaN here, so no columns remain)
   print(df.dropna(axis=1))
   ```

   **Output:**
   ```
   Empty DataFrame
   Columns: []
   Index: [0, 1, 2]
   ```
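
Dropping every row or column that has any gap is often too aggressive. As a hedged illustration on a made-up frame (`df_na` is hypothetical), the `how`, `subset`, and `thresh` parameters of `dropna` give finer control:

```python
import numpy as np
import pandas as pd

df_na = pd.DataFrame({
    "A": [1.0, np.nan, np.nan],
    "B": [4.0, 5.0, np.nan],
    "C": [7.0, 8.0, np.nan],
})

# Drop a row only when every value in it is missing
all_missing_dropped = df_na.dropna(how="all")

# Only consider column 'A' when deciding which rows to drop
a_required = df_na.dropna(subset=["A"])

# Keep rows that have at least two non-missing values
at_least_two = df_na.dropna(thresh=2)

print(all_missing_dropped.index.tolist())  # [0, 1]
print(a_required.index.tolist())           # [0]
print(at_least_two.index.tolist())         # [0, 1]
```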

### Removing Duplicates

1. **Removing Duplicate Rows:**

   ```python
   df = pd.DataFrame({
       'A': [1, 2, 2, 3],
       'B': [4, 5, 5, 6]
   })

   # Removing duplicate rows
   print(df.drop_duplicates())
   ```

   **Output:**
   ```
      A  B
   0  1  4
   1  2  5
   3  3  6
   ```

2. **Removing Duplicate Rows Based on Specific Columns:**

   ```python
   # Removing duplicate rows based on column 'A'
   print(df.drop_duplicates(subset='A'))
   ```

   **Output:**
   ```
      A  B
   0  1  4
   1  2  5
   3  3  6
   ```
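
By default the first occurrence of each duplicate is kept. The sketch below, on a made-up frame (`df_dup` is hypothetical), shows `duplicated` for inspecting which rows would be removed and the `keep` parameter for changing which occurrence survives.

```python
import pandas as pd

df_dup = pd.DataFrame({"A": [1, 2, 2, 3], "B": [4, 5, 50, 6]})

# Boolean mask marking rows that repeat an earlier value of 'A'
mask = df_dup.duplicated(subset="A")

# Keep the last occurrence of each 'A' value instead of the first
keep_last = df_dup.drop_duplicates(subset="A", keep="last")

# keep=False drops every row whose 'A' value appears more than once
drop_all = df_dup.drop_duplicates(subset="A", keep=False)

print(mask.tolist())             # [False, False, True, False]
print(keep_last.index.tolist())  # [0, 2, 3]
print(drop_all.index.tolist())   # [0, 3]
```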

### Replacing Values

1. **Replacing Specific Values:**

   ```python
   df = pd.DataFrame({
       'A': [1, 2, 3],
       'B': [4, 5, 6]
   })

   # Replacing specific values
   print(df.replace({1: 10, 4: 40}))
   ```

   **Output:**
   ```
      A   B
   0  10  40
   1   2   5
   2   3   6
   ```

2. **Replacing Values with Conditions:**

   ```python
   # Replacing values in column 'A' where condition is met
   # (replace() above returned a new DataFrame, so df still holds the original values)
   df.loc[df['A'] == 2, 'A'] = 20
   print(df)
   ```

   **Output:**
   ```
       A  B
   0   1  4
   1  20  5
   2   3  6
   ```
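
Conditional replacement can also be written without assignment through `.loc`. As a minimal sketch on a made-up frame (`df_r` is hypothetical), `Series.mask` replaces values where a condition holds and `Series.where` keeps values where it holds; both return new objects and leave the original untouched.

```python
import pandas as pd

df_r = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# mask: replace values where the condition is True
capped = df_r["A"].mask(df_r["A"] > 2, 2)

# where: keep values where the condition is True, replace the rest
kept = df_r["A"].where(df_r["A"] > 1, 0)

print(capped.tolist())     # [1, 2, 2]
print(kept.tolist())       # [0, 2, 3]
print(df_r["A"].tolist())  # [1, 2, 3]
```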

## Data Transformation

Data transformation involves changing the format or structure of your data to better suit your analysis.

### Applying Functions

1. **Applying a Function to Each Element:**

   ```python
   df = pd.DataFrame({'A': [1, 2, 3]})

   # Doubling every value; apply passes each column (a Series) to the function
   print(df.apply(lambda x: x * 2))
   ```

   **Output:**
   ```
      A
   0  2
   1  4
   2  6
   ```

2. **Applying a Function to Each Column or Row:**

   ```python
   # Applying a function to each column
   print(df.apply(lambda x: x.sum(), axis=0))
   ```

   **Output:**
   ```
   A    6
   dtype: int64
   ```

   ```python
   # Applying a function to each row
   print(df.apply(lambda x: x.sum(), axis=1))
   ```

   **Output:**
   ```
   0    1
   1    2
   2    3
   dtype: int64
   ```

### Mapping and Renaming

1. **Mapping Values Using a Dictionary:**

   ```python
   df = pd.DataFrame({'A': ['cat', 'dog', 'bird']})

   # Mapping values using a dictionary; values without a mapping ('bird') become NaN
   print(df['A'].map({'cat': 'feline', 'dog': 'canine'}))
   ```

   **Output:**
   ```
   0    feline
   1    canine
   2      NaN
   Name: A, dtype: object
   ```

2. **Renaming Columns and Index:**

   ```python
   # Renaming columns
   df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
   df.columns = ['X', 'Y']
   print(df)
   ```

   **Output:**
   ```
      X  Y
   0  1  3
   1  2  4
   ```

   ```python
   # Renaming index
   df.index = ['row1', 'row2']
   print(df)
   ```

   **Output:**
   ```
         X  Y
   row1  1  3
   row2  2  4
   ```
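
Assigning to `df.columns` or `df.index` replaces every label at once. When only some labels should change, `DataFrame.rename` with a mapping is an alternative. A small sketch on a made-up frame (`df_orig` is hypothetical):

```python
import pandas as pd

df_orig = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Rename one column and one row label; unlisted labels are left as they are.
renamed = df_orig.rename(columns={"A": "X"}, index={0: "row1"})

print(renamed.columns.tolist())  # ['X', 'B']
print(renamed.index.tolist())    # ['row1', 1]
print(df_orig.columns.tolist())  # ['A', 'B']
```

`rename` returns a new DataFrame, so the original keeps its labels unless the result is assigned back.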

### Reshaping Data

1. **Reshaping with `melt`:**

   ```python
   df = pd.DataFrame({
       'A': [1, 2],
       'B': [3, 4]
   })

   # Melting the DataFrame
   melted_df = pd.melt(df, id_vars=['A'], value_vars=['B'])
   print(melted_df)
   ```

   **Output:**
   ```
      A variable  value
   0  1        B      3
   1  2        B      4
   ```

2. **Reshaping with `pivot`:**

   ```python
   df = pd.DataFrame({
       'Date': ['2024-01-01', '2024-01-02'],
       'Variable': ['A', 'B'],
       'Value': [10, 20]
   })

   # Pivoting the DataFrame
   pivoted_df = df.pivot(index='Date', columns='Variable', values='Value')
   print(pivoted_df)
   ```

   **Output:**
   ```
   Variable       A     B
   Date                  
   2024-01-01  10.0   NaN
   2024-01-02   NaN  20.0
   ```

3. **Reshaping with `stack` and `unstack`:**

   ```python
   df = pd.DataFrame({
       'A': [1, 2],
       'B': [3, 4]
   })

   # Stacking the DataFrame
   stacked_df = df.stack()
   print(stacked_df)
   ```

   **Output:**
   ```
   0  A    1
      B    3
   1  A    2
      B    4
   dtype: int64
   ```

   ```python
   # Unstacking the DataFrame
   unstacked_df = stacked_df.unstack()
   print(unstacked_df)
   ```

   **Output:**
   ```
      A  B
   0  1  3
   1  2  4
   ```
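
`melt` and `pivot` are roughly inverse operations: one turns wide data into long data, the other turns it back. The sketch below illustrates the round trip on a made-up frame (`wide_df` and its column names are hypothetical).

```python
import pandas as pd

wide_df = pd.DataFrame({"id": [1, 2], "height": [170, 180], "weight": [65, 80]})

# Wide -> long: one row per (id, measure) pair
long_df = wide_df.melt(id_vars="id", var_name="measure", value_name="value")

# Long -> wide: one column per distinct measure
back = long_df.pivot(index="id", columns="measure", values="value").reset_index()
back.columns.name = None  # drop the leftover 'measure' label on the columns axis

print(long_df)
print(back)
```

Note that `pivot` orders the new columns by label, so the column order after a round trip matches the original only when the original columns were already sorted, as they are here.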

# Data Aggregation and Grouping

## GroupBy Operation

The `groupby` operation in Pandas is used to split data into groups based on some criteria, apply a function to each group independently, and then combine the results. This is very useful for aggregating and summarizing data.

### Basic GroupBy

1. **Grouping and Aggregating Data:**

   ```python
   import pandas as pd

   # Creating a DataFrame
   df = pd.DataFrame({
       'Category': ['A', 'B', 'A', 'B'],
       'Values': [10, 20, 30, 40]
   })

   # Grouping by 'Category' and calculating the sum of 'Values'
   grouped_df = df.groupby('Category').sum()
   print(grouped_df)
   ```

   **Output:**
   ```
            Values
   Category        
   A             40
   B             60
   ```

   **Explanation:**
   - `df.groupby('Category')` groups the data by the 'Category' column.
   - `.sum()` calculates the sum of 'Values' for each group.

2. **Grouping by Multiple Columns:**

   ```python
   # Creating a DataFrame
   df = pd.DataFrame({
       'Category': ['A', 'A', 'B', 'B'],
       'Type': ['X', 'Y', 'X', 'Y'],
       'Values': [10, 20, 30, 40]
   })

   # Grouping by 'Category' and 'Type' and calculating the sum of 'Values'
   grouped_df = df.groupby(['Category', 'Type']).sum()
   print(grouped_df)
   ```

   **Output:**
   ```
                     Values
   Category Type        
   A       X           10
           Y           20
   B       X           30
           Y           40
   ```

   **Explanation:**
   - `df.groupby(['Category', 'Type'])` groups the data by both 'Category' and 'Type' columns.
   - `.sum()` calculates the sum of 'Values' for each group.

### Aggregation Functions

1. **Applying Multiple Aggregations:**

   ```python
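   # Re-create the two-column DataFrame from the Basic GroupBy example
   df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'], 'Values': [10, 20, 30, 40]})

   # Applying multiple aggregation functions
   aggregated_df = df.groupby('Category').agg({
       'Values': ['sum', 'mean', 'max']
   })
   print(aggregated_df)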
   ```

   **Output:**
   ```
           Values           
             sum  mean max
   Category                  
   A            40  20.0  30
   B            60  30.0  40
   ```

   **Explanation:**
   - `.agg({'Values': ['sum', 'mean', 'max']})` calculates the sum, mean, and maximum of 'Values' for each category.
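
The dictionary form produces two-level column labels. Named aggregation is an alternative that yields flat, self-describing column names. The following is a sketch on a made-up frame (`sales` and the output names `total`, `average`, `largest` are hypothetical):

```python
import pandas as pd

sales = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Values": [10, 20, 30, 40],
})

# Each keyword becomes an output column: name=(input column, aggregation)
summary = sales.groupby("Category").agg(
    total=("Values", "sum"),
    average=("Values", "mean"),
    largest=("Values", "max"),
)

print(summary)
```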

### Filtering Groups

1. **Filtering Groups Based on a Condition:**

   ```python
   # Filter groups where the sum of 'Values' is greater than 50
   filtered_df = df.groupby('Category').filter(lambda x: x['Values'].sum() > 50)
   print(filtered_df)
   ```

   **Output:**
   ```
     Category  Values
   1        B      20
   3        B      40
   ```

   **Explanation:**
   - `.filter(lambda x: x['Values'].sum() > 50)` keeps only those groups where the sum of 'Values' is greater than 50.

### Transformations

1. **Transforming Data:**

   ```python
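   # Two-column DataFrame: Category A, B, A, B with Values 10, 20, 30, 40
   df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'], 'Values': [10, 20, 30, 40]})

   # Normalizing 'Values' within each group (z-score using the sample standard deviation)
   normalized_df = df.groupby('Category')['Values'].transform(lambda x: (x - x.mean()) / x.std())
   print(normalized_df)
   ```

   **Output:**
   ```
   0   -0.707107
   1   -0.707107
   2    0.707107
   3    0.707107
   Name: Values, dtype: float64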
   ```

   **Explanation:**
   - `.transform(lambda x: (x - x.mean()) / x.std())` normalizes 'Values' within each group to have mean 0 and standard deviation 1.
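
Because `transform` returns a result with the same index as the input, it can be assigned straight back as a new column. As a hedged sketch on a made-up frame (`sales`, `group_total`, and `share` are hypothetical names), this computes each row's share of its group total:

```python
import pandas as pd

sales = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Values": [10, 20, 30, 40],
})

# Broadcast each group's sum back onto the rows of that group
sales["group_total"] = sales.groupby("Category")["Values"].transform("sum")

# Share of the group total contributed by each row
sales["share"] = sales["Values"] / sales["group_total"]

print(sales)
```

Passing the string `"sum"` instead of a lambda lets pandas use its built-in implementation, which is generally faster on large groups.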

## Pivot Tables

Pivot tables are used to summarize data and perform aggregation in a more flexible way.

### Creating a Pivot Table

1. **Basic Pivot Table:**

   ```python
   # Creating a pivot table
   pivot_table = pd.pivot_table(df, values='Values', index='Category', aggfunc='sum')
   print(pivot_table)
   ```

   **Output:**
   ```
            Values
   Category        
   A             40
   B             60
   ```

   **Explanation:**
   - `pd.pivot_table(df, values='Values', index='Category', aggfunc='sum')` creates a pivot table that summarizes 'Values' by 'Category' using the sum function.

### Advanced Pivot Table

1. **Pivot Table with Multiple Aggregations:**

   ```python
   # Creating a pivot table with multiple aggregations
   pivot_table = pd.pivot_table(df, values='Values', index='Category', aggfunc=['sum', 'mean', 'max'])
   print(pivot_table)
   ```

   **Output:**
   ```
                sum    mean     max
             Values  Values  Values
   Category                        
   A             40    20.0      30
   B             60    30.0      40
   ```

   **Explanation:**
   - `aggfunc=['sum', 'mean', 'max']` applies multiple aggregation functions (sum, mean, and max) to 'Values' in the pivot table.

### Crosstabulation

1. **Creating a Crosstab:**

   ```python
   # Creating a crosstab
   crosstab = pd.crosstab(df['Category'], df['Values'])
   print(crosstab)
   ```

   **Output:**
   ```
   Values  10  20  30  40
   Category               
   A          1   0   1   0
   B          0   1   0   1
   ```

   **Explanation:**
   - `pd.crosstab(df['Category'], df['Values'])` creates a cross-tabulation of 'Category' and 'Values', showing the frequency of each combination.
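
Pivot tables and crosstabs become more informative with a second grouping key spread across the columns. The sketch below uses a made-up frame (`orders` is hypothetical) to contrast a summed pivot table with a frequency crosstab over the same two keys.

```python
import pandas as pd

orders = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "B"],
    "Type": ["X", "Y", "X", "X", "Y"],
    "Values": [10, 20, 30, 40, 50],
})

# Sum of 'Values' for each Category (rows) and Type (columns);
# fill_value supplies 0 for combinations that never occur.
table = pd.pivot_table(
    orders, values="Values", index="Category", columns="Type",
    aggfunc="sum", fill_value=0,
)

# Number of rows for each Category/Type combination
counts = pd.crosstab(orders["Category"], orders["Type"])

print(table)
print(counts)
```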

# Handling Missing Data

Missing data is a common issue in data analysis. Pandas provides several methods to handle missing data, including identifying, replacing, and removing missing values.

## Identifying Missing Data

1. **Checking for Missing Values:**

   ```python
   import pandas as pd

   # Creating a DataFrame with missing values
   df = pd.DataFrame({
       'A': [1, 2, None],
       'B': [None, 2, 3]
   })

   # Checking for missing values
   print(df.isnull())
   ```

   **Output:**
   ```
          A      B
   0  False   True
   1  False  False
   2   True  False
   ```

   **Explanation:**
   - `df.isnull()` returns a DataFrame of the same shape as `df`, where each element is `True` if the corresponding element in `df` is `NaN` and `False` otherwise.

2. **Summarizing Missing Values:**

   ```python
   # Counting missing values in each column
   print(df.isnull().sum())
   ```

   **Output:**
   ```
   A    1
   B    1
   dtype: int64
   ```

   **Explanation:**
   - `df.isnull().sum()` provides a summary count of missing values for each column.

## Replacing Missing Data

1. **Filling Missing Values:**

   ```python
   # Filling missing values with a specified value
   filled_df = df.fillna(0)
   print(filled_df)
   ```

   **Output:**
   ```
        A    B
   0  1.0  0.0
   1  2.0  2.0
   2  0.0  3.0
   ```

   **Explanation:**
   - `df.fillna(0)` replaces all `NaN` values in the DataFrame with `0`.

2. **Forward Fill:**

   ```python
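   # Forward fill: propagate the last valid observation forward
   forward_filled_df = df.ffill()
   print(forward_filled_df)
   ```

   **Output:**
   ```
        A    B
   0  1.0  NaN
   1  2.0  2.0
   2  2.0  3.0
   ```

   **Explanation:**
   - `df.ffill()` propagates the last valid observation forward into the gaps that follow it. The leading `NaN` in column `B` stays because no earlier value exists. (`fillna(method='ffill')` is the older spelling and is deprecated in recent pandas releases.)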

3. **Backward Fill:**

   ```python
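   # Backward fill: propagate the next valid observation backward
   backward_filled_df = df.bfill()
   print(backward_filled_df)
   ```

   **Output:**
   ```
        A    B
   0  1.0  2.0
   1  2.0  2.0
   2  NaN  3.0
   ```

   **Explanation:**
   - `df.bfill()` propagates the next valid observation backward into the gaps before it. The trailing `NaN` in column `A` stays because no later value exists. (`fillna(method='bfill')` is the older spelling and is deprecated in recent pandas releases.)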

## Dropping Missing Data

1. **Dropping Rows with Missing Values:**

   ```python
   # Dropping rows with any missing values
   dropped_rows_df = df.dropna()
   print(dropped_rows_df)
   ```

   **Output:**
   ```
        A    B
   1  2.0  2.0
   ```

   **Explanation:**
   - `df.dropna()` removes all rows containing `NaN` values.

2. **Dropping Columns with Missing Values:**

   ```python
   # Dropping columns with any missing values
   dropped_columns_df = df.dropna(axis=1)
   print(dropped_columns_df)
   ```

   **Output:**
   ```
   Empty DataFrame
   Columns: []
   Index: [0, 1, 2]
   ```

   **Explanation:**
   - `df.dropna(axis=1)` removes all columns containing `NaN` values. Both `A` and `B` contain one here, so no columns remain.

## Interpolating Missing Data

1. **Linear Interpolation:**

   ```python
   # Linear interpolation to fill missing values
   interpolated_df = df.interpolate()
   print(interpolated_df)
   ```

   **Output:**
   ```
        A    B
   0  1.0  NaN
   1  2.0  2.0
   2  2.0  3.0
   ```

   **Explanation:**
   - `df.interpolate()` performs linear interpolation to estimate and fill in missing values.
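
By default, interpolation works forward only, so a gap at the very start of the data is left as `NaN`. A minimal sketch on a made-up Series (`s_gap` is hypothetical) shows the default next to `limit_direction="both"`, which also fills leading gaps:

```python
import numpy as np
import pandas as pd

s_gap = pd.Series([np.nan, 2.0, np.nan, 4.0, np.nan])

# Default: interior gaps are interpolated, the trailing gap repeats the
# last valid value, and the leading gap stays NaN.
default = s_gap.interpolate()

# Fill in both directions so the leading gap is filled as well.
both = s_gap.interpolate(limit_direction="both")

print(default.tolist())  # [nan, 2.0, 3.0, 4.0, 4.0]
print(both.tolist())     # [2.0, 2.0, 3.0, 4.0, 4.0]
```

Values outside the range of valid observations are not extrapolated along a line; they take the nearest valid value.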

# Merging and Joining DataFrames

Merging and joining DataFrames in Pandas allow you to combine data from different sources into a single DataFrame. This is useful for integrating data, performing analysis, and creating comprehensive datasets.

## Types of Joins

1. **Inner Join:**
   - Returns only the rows with matching keys in both DataFrames.

2. **Outer Join:**
   - Returns all rows from both DataFrames, with NaNs where there are no matches.

3. **Left Join:**
   - Returns all rows from the left DataFrame and matched rows from the right DataFrame. NaNs are placed where there are no matches in the right DataFrame.

4. **Right Join:**
   - Returns all rows from the right DataFrame and matched rows from the left DataFrame. NaNs are placed where there are no matches in the left DataFrame.

## Merging DataFrames

The `merge` function is used to merge DataFrames on a specified key or keys.

1. **Inner Join:**

   ```python
   import pandas as pd

   # Creating two DataFrames
   df1 = pd.DataFrame({
       'key': ['A', 'B', 'C'],
       'value1': [1, 2, 3]
   })

   df2 = pd.DataFrame({
       'key': ['B', 'C', 'D'],
       'value2': [4, 5, 6]
   })

   # Performing an inner join
   merged_df = pd.merge(df1, df2, on='key', how='inner')
   print(merged_df)
   ```

   **Output:**
   ```
     key  value1  value2
   0   B       2       4
   1   C       3       5
   ```

   **Explanation:**
   - `pd.merge(df1, df2, on='key', how='inner')` merges `df1` and `df2` on the 'key' column using an inner join, returning only rows with matching keys.



2. **Outer Join:**

   ```python
   # Performing an outer join
   merged_df = pd.merge(df1, df2, on='key', how='outer')
   print(merged_df)
   ```

   **Output:**
   ```
     key  value1  value2
   0   A     1.0     NaN
   1   B     2.0     4.0
   2   C     3.0     5.0
   3   D     NaN     6.0
   ```

   **Explanation:**
   - `pd.merge(df1, df2, on='key', how='outer')` merges `df1` and `df2` on the 'key' column using an outer join, returning all rows from both DataFrames with NaNs where there are no matches.



3. **Left Join:**

   ```python
   # Performing a left join
   merged_df = pd.merge(df1, df2, on='key', how='left')
   print(merged_df)
   ```

   **Output:**
   ```
     key  value1  value2
   0   A       1     NaN
   1   B       2     4.0
   2   C       3     5.0
   ```

   **Explanation:**
   - `pd.merge(df1, df2, on='key', how='left')` merges `df1` and `df2` on the 'key' column using a left join, returning all rows from `df1` and matched rows from `df2`.



4. **Right Join:**

   ```python
   # Performing a right join
   merged_df = pd.merge(df1, df2, on='key', how='right')
   print(merged_df)
   ```

   **Output:**
   ```
     key  value1  value2
   0   B     2.0       4
   1   C     3.0       5
   2   D     NaN       6
   ```

   **Explanation:**
   - `pd.merge(df1, df2, on='key', how='right')` merges `df1` and `df2` on the 'key' column using a right join, returning all rows from `df2` and matched rows from `df1`.
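
When auditing a merge it helps to know where each row came from and whether the keys are unique. The sketch below is illustrative (the frames `left` and `right` mirror the example data but are otherwise hypothetical names); `indicator` and `validate` are standard `merge` parameters.

```python
import pandas as pd

left = pd.DataFrame({"key": ["A", "B", "C"], "value1": [1, 2, 3]})
right = pd.DataFrame({"key": ["B", "C", "D"], "value2": [4, 5, 6]})

audit = pd.merge(
    left, right, on="key", how="outer",
    indicator=True,         # adds a '_merge' column: left_only / right_only / both
    validate="one_to_one",  # raises MergeError if keys repeat on either side
)

print(audit)
```

Filtering on `_merge` afterwards (for example, keeping only `left_only` rows) is a common way to find records that failed to match.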

## Joining DataFrames

The `join` method is used to join DataFrames on their index.

1. **Joining on Index:**

   ```python
   # Creating two DataFrames
   df1 = pd.DataFrame({
       'value1': [1, 2, 3]
   }, index=['A', 'B', 'C'])

   df2 = pd.DataFrame({
       'value2': [4, 5, 6]
   }, index=['B', 'C', 'D'])

   # Joining DataFrames
   joined_df = df1.join(df2, how='inner')
   print(joined_df)
   ```

   **Output:**
   ```
      value1  value2
   B       2       4
   C       3       5
   ```

   **Explanation:**
   - `df1.join(df2, how='inner')` joins `df1` and `df2` on their index using an inner join.

2. **Left Join on Index:**

   ```python
   # Joining DataFrames with a left join
   joined_df = df1.join(df2, how='left')
   print(joined_df)
   ```

   **Output:**
   ```
      value1  value2
   A       1     NaN
   B       2     4.0
   C       3     5.0
   ```

   **Explanation:**
   - `df1.join(df2, how='left')` joins `df1` and `df2` on their index using a left join, returning all rows from `df1` and matched rows from `df2`.

3. **Right Join on Index:**

   ```python
   # Joining DataFrames with a right join
   joined_df = df1.join(df2, how='right')
   print(joined_df)
   ```

   **Output:**
   ```
      value1  value2
   B     2.0       4
   C     3.0       5
   D     NaN       6
   ```

   **Explanation:**
   - `df1.join(df2, how='right')` joins `df1` and `df2` on their index using a right join, returning all rows from `df2` and matched rows from `df1`.

## Concatenating DataFrames

The `concat` function is used to concatenate DataFrames along a particular axis.

1. **Concatenating Along Rows:**

   ```python
   # Creating two DataFrames
   df1 = pd.DataFrame({
       'A': [1, 2],
       'B': [3, 4]
   })

   df2 = pd.DataFrame({
       'A': [5, 6],
       'B': [7, 8]
   })

   # Concatenating DataFrames along rows
   concatenated_df = pd.concat([df1, df2], axis=0)
   print(concatenated_df)
   ```

   **Output:**
   ```
      A  B
   0  1  3
   1  2  4
   0  5  7
   1  6  8
   ```

   **Explanation:**
   - `pd.concat([df1, df2], axis=0)` concatenates `df1` and `df2` along rows.

2. **Concatenating Along Columns:**

   ```python
   # Concatenating DataFrames along columns
   concatenated_df = pd.concat([df1, df2], axis=1)
   print(concatenated_df)
   ```

   **Output:**
   ```
      A  B  A  B
   0  1  3  5  7
   1  2  4  6  8
   ```

   **Explanation:**
   - `pd.concat([df1, df2], axis=1)` concatenates `df1` and `df2` along columns.
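
Row-wise concatenation keeps each frame's original index, which is why the labels `0` and `1` repeat in the first output above. As a hedged sketch on made-up frames (`df_a` and `df_b` are hypothetical), `ignore_index` renumbers the rows and `keys` records which frame each row came from:

```python
import pandas as pd

df_a = pd.DataFrame({"A": [1, 2]})
df_b = pd.DataFrame({"A": [3, 4]})

# Discard the original labels and number the rows 0..n-1
stacked = pd.concat([df_a, df_b], ignore_index=True)

# Add an outer index level naming the source of each block
labelled = pd.concat([df_a, df_b], keys=["first", "second"])

print(stacked.index.tolist())                # [0, 1, 2, 3]
print(labelled.loc["second"]["A"].tolist())  # [3, 4]
```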

# Working with Dates and Times

Handling dates and times is a common task in data analysis. Pandas provides powerful tools to work with dates and times, including parsing, manipulating, and formatting datetime objects.

## Creating Date and Time Objects

1. **Creating a DateTimeIndex:**

   ```python
   import pandas as pd

   # Creating a DateTimeIndex
   dates = pd.date_range(start='2023-01-01', periods=6, freq='D')
   print(dates)
   ```

   **Output:**
   ```
   DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04',
                  '2023-01-05', '2023-01-06'],
                 dtype='datetime64[ns]', freq='D')
   ```

   **Explanation:**
   - `pd.date_range(start='2023-01-01', periods=6, freq='D')` creates a sequence of 6 dates starting from January 1, 2023, with daily frequency.

2. **Converting Strings to DateTime:**

   ```python
   # Converting strings to datetime
   date_str = ['2023-01-01', '2023-01-02', '2023-01-03']
   dates = pd.to_datetime(date_str)
   print(dates)
   ```

   **Output:**
   ```
   DatetimeIndex(['2023-01-01', '2023-01-02', '2023-01-03'], dtype='datetime64[ns]', freq=None)
   ```

   **Explanation:**
   - `pd.to_datetime(date_str)` converts a list of date strings to a DatetimeIndex.
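
Real-world date strings are often not in ISO format and may contain invalid entries. As a minimal sketch, the example below passes `format=` to state the expected layout and `errors='coerce'` to turn unparseable values into `NaT` instead of raising. The input strings are made up for illustration.

```python
import pandas as pd

# Day-first strings; the last entry is deliberately invalid
raw = ['31/01/2023', '15/02/2023', 'not a date']

# format= states the expected layout; errors='coerce' turns failures into NaT
parsed = pd.to_datetime(raw, format='%d/%m/%Y', errors='coerce')
print(parsed)
```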

## Extracting Date and Time Components

1. **Extracting Year, Month, Day, etc.:**

   ```python
   # Creating a DataFrame with datetime objects
   df = pd.DataFrame({
       'date': pd.date_range(start='2023-01-01', periods=3, freq='D')
   })

   # Extracting year, month, and day
   df['year'] = df['date'].dt.year
   df['month'] = df['date'].dt.month
   df['day'] = df['date'].dt.day
   print(df)
   ```

   **Output:**
   ```
          date  year  month  day
   0 2023-01-01  2023      1    1
   1 2023-01-02  2023      1    2
   2 2023-01-03  2023      1    3
   ```

   **Explanation:**
   - `df['date'].dt.year`, `df['date'].dt.month`, and `df['date'].dt.day` extract the year, month, and day components from the 'date' column.
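
The `.dt` accessor offers more than year, month, and day. The sketch below rebuilds the same three-day frame and adds the weekday name, the weekday number, and a formatted string. The new column names are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'date': pd.date_range(start='2023-01-01', periods=3, freq='D')
})

df['weekday'] = df['date'].dt.day_name()         # 'Sunday', 'Monday', 'Tuesday'
df['day_of_week'] = df['date'].dt.dayofweek      # Monday=0 ... Sunday=6
df['label'] = df['date'].dt.strftime('%Y/%m/%d')  # datetime -> formatted string
print(df)
```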

## Date and Time Arithmetic

1. **Adding and Subtracting Dates:**

   ```python
   # Adding days to a date
   df['date_plus_5'] = df['date'] + pd.Timedelta(days=5)
   print(df)
   ```

   **Output:**
   ```
          date  year  month  day date_plus_5
   0 2023-01-01  2023      1    1  2023-01-06
   1 2023-01-02  2023      1    2  2023-01-07
   2 2023-01-03  2023      1    3  2023-01-08
   ```

   **Explanation:**
   - `df['date'] + pd.Timedelta(days=5)` adds 5 days to each date in the 'date' column.

2. **Calculating Date Differences:**

   ```python
   # Calculating the difference between dates
   df['date_diff'] = df['date'].diff()
   print(df)
   ```

   **Output:**
   ```
          date  year  month  day date_plus_5 date_diff
   0 2023-01-01  2023      1    1  2023-01-06       NaT
   1 2023-01-02  2023      1    2  2023-01-07    1 days
   2 2023-01-03  2023      1    3  2023-01-08    1 days
   ```

   **Explanation:**
   - `df['date'].diff()` calculates the difference between consecutive dates.
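
Subtracting one datetime column from another also produces timedeltas, which is a common way to compute durations. The example below is a sketch with hypothetical `ordered` and `shipped` columns; `.dt.days` converts each timedelta to a whole number of days.

```python
import pandas as pd

# Hypothetical order data for illustration
orders = pd.DataFrame({
    'ordered': pd.to_datetime(['2023-01-01', '2023-01-05']),
    'shipped': pd.to_datetime(['2023-01-03', '2023-01-12']),
})

orders['lead_time'] = orders['shipped'] - orders['ordered']  # timedelta column
orders['lead_days'] = orders['lead_time'].dt.days            # whole days as integers
print(orders)
```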

## Handling Time Zones

1. **Setting Time Zones:**

   ```python
   # Setting the time zone
   df['date_utc'] = df['date'].dt.tz_localize('UTC')
   print(df)
   ```

   **Output:**
   ```
          date  year  month  day date_plus_5 date_diff                   date_utc
   0 2023-01-01  2023      1    1  2023-01-06       NaT 2023-01-01 00:00:00+00:00
   1 2023-01-02  2023      1    2  2023-01-07    1 days 2023-01-02 00:00:00+00:00
   2 2023-01-03  2023      1    3  2023-01-08    1 days 2023-01-03 00:00:00+00:00
   ```

   **Explanation:**
   - `df['date'].dt.tz_localize('UTC')` sets the time zone to UTC for the 'date' column.

2. **Converting Time Zones:**

   ```python
   # Converting to a different time zone
   df['date_est'] = df['date_utc'].dt.tz_convert('US/Eastern')
   print(df)
   ```

   **Output:**
   ```
          date  year  month  day date_plus_5 date_diff                   date_utc                   date_est
   0 2023-01-01  2023      1    1  2023-01-06       NaT 2023-01-01 00:00:00+00:00 2022-12-31 19:00:00-05:00
   1 2023-01-02  2023      1    2  2023-01-07    1 days 2023-01-02 00:00:00+00:00 2023-01-01 19:00:00-05:00
   2 2023-01-03  2023      1    3  2023-01-08    1 days 2023-01-03 00:00:00+00:00 2023-01-02 19:00:00-05:00
   ```

   **Explanation:**
   - `df['date_utc'].dt.tz_convert('US/Eastern')` converts the 'date_utc' column to Eastern Time.
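
The two steps play different roles: `tz_localize` attaches a zone to naive timestamps without changing the wall-clock time, while `tz_convert` re-expresses the same instants in another zone. The sketch below uses made-up timestamps in winter and summer to show that the UTC offset follows daylight saving time.

```python
import pandas as pd

# Naive wall-clock times, one in winter and one in summer
local = pd.Series(pd.to_datetime(['2023-01-01 09:00', '2023-07-01 09:00']))

eastern = local.dt.tz_localize('US/Eastern')  # attach the zone; clock time unchanged
utc = eastern.dt.tz_convert('UTC')            # same instants, expressed in UTC
print(utc)
# 09:00 Eastern is 14:00 UTC in January (UTC-5) and 13:00 UTC in July (UTC-4)
```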

## Resampling and Frequency Conversion

1. **Resampling Data:**

   ```python
   # Creating a time series DataFrame
   ts = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range(start='2023-01-01', periods=6, freq='D'))

   # Resampling to monthly frequency
   resampled_ts = ts.resample('M').sum()
   print(resampled_ts)
   ```

   **Output:**
   ```
   2023-01-31    21
   Freq: M, dtype: int64
   ```

   **Explanation:**
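   - `ts.resample('M').sum()` resamples the time series `ts` to a month-end frequency and calculates the sum for each month. In pandas 2.2 and later the month-end alias is spelled `'ME'`, and `'M'` triggers a deprecation warning.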

2. **Frequency Conversion:**

   ```python
   # Converting daily data to business day frequency
   bday_ts = ts.asfreq('B')
   print(bday_ts)
   ```

   **Output:**
   ```
   2023-01-02    2
   2023-01-03    3
   2023-01-04    4
   2023-01-05    5
   2023-01-06    6
   Freq: B, dtype: int64
   ```

   **Explanation:**
   - `ts.asfreq('B')` converts the daily time series `ts` to a business day frequency.
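
Resampling works in both directions. As a sketch on the same six-day series, the first call below downsamples into two-day bins, and the second upsamples to a 12-hour grid and fills the new slots with the previous value. The bin sizes `'2D'` and `'12h'` are arbitrary choices for illustration.

```python
import pandas as pd

ts = pd.Series([1, 2, 3, 4, 5, 6],
               index=pd.date_range(start='2023-01-01', periods=6, freq='D'))

# Downsample: group into 2-day bins and average each bin
two_day_mean = ts.resample('2D').mean()
print(two_day_mean)

# Upsample: 12-hour grid; each new slot takes the most recent earlier value
half_day = ts.resample('12h').ffill()
print(half_day.head())
```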

# Visualization

## Plotting with Pandas

Pandas integrates with Matplotlib to provide a simple interface for generating plots directly from DataFrames and Series.

### Basic Plotting

Pandas offers a convenient way to create basic plots using the `plot` method available in DataFrames and Series.

#### Line Plot

```python
import pandas as pd
import matplotlib.pyplot as plt

# Creating a DataFrame
df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [10, 20, 15, 25]
})

# Line plot
df.plot(x='x', y='y', kind='line')
plt.show()
```

**Output:**
A line plot displaying the data points from the DataFrame.

#### Bar Plot

```python
# Bar plot
df.plot(x='x', y='y', kind='bar')
plt.show()
```

**Output:**
A bar chart with bars representing the values of 'y'.

#### Histogram

```python
# Creating a Series
s = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

# Histogram
s.plot(kind='hist', bins=4)
plt.show()
```

**Output:**
A histogram showing the distribution of values in the Series.

#### Scatter Plot

```python
# Scatter plot
df.plot(x='x', y='y', kind='scatter')
plt.show()
```

**Output:**
A scatter plot displaying the relationship between 'x' and 'y'.

### Customizing Plots

Customizations in Pandas plots can be applied using parameters in the `plot` method or by modifying the Matplotlib `Axes` object.

#### Adding Titles and Labels

```python
# Customizing plot
ax = df.plot(x='x', y='y', kind='line')
ax.set_title('Line Plot')
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')
plt.show()
```

**Output:**
A line plot with a title and labeled axes.

#### Adjusting Plot Style

```python
# Using different styles
df.plot(x='x', y='y', kind='bar', color='skyblue', edgecolor='black')
plt.show()
```

**Output:**
A bar plot with customized colors and edge styles.

#### Adding Legends

```python
# Adding legend
df.plot(x='x', y='y', kind='line', label='Line Data')
plt.legend()
plt.show()
```

**Output:**
A line plot with a legend indicating the data series.

### Plotting with Matplotlib

For more advanced customizations and features, you can use Matplotlib directly. Pandas plotting functions return Matplotlib `Axes` objects, which can be further customized.

#### Customizing with Matplotlib

```python
import matplotlib.pyplot as plt

# Creating a DataFrame
df = pd.DataFrame({
    'x': [1, 2, 3, 4],
    'y': [10, 20, 15, 25]
})

# Plot with Matplotlib
fig, ax = plt.subplots()
df.plot(x='x', y='y', kind='line', ax=ax)
ax.set_title('Advanced Line Plot')
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')
ax.grid(True)
plt.show()
```

**Output:**
A line plot with advanced customizations such as grid lines, title, and axis labels.
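
Because `plot` accepts an `ax` argument, several pandas plots can share one Matplotlib figure. The sketch below draws two panels and renders the figure to an in-memory PNG instead of calling `plt.show()`. The `Agg` backend and the `BytesIO` buffer are choices made here so that the example runs without a display; a script would typically pass a file name to `savefig`.

```python
import io

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render without opening a window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [10, 20, 15, 25]})

# One figure, two Axes; each pandas plot is directed to its own Axes
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
df.plot(x='x', y='y', kind='line', ax=axes[0], title='Line')
df.plot(x='x', y='y', kind='bar', ax=axes[1], title='Bar')
fig.tight_layout()

# Render to PNG bytes in memory
buffer = io.BytesIO()
fig.savefig(buffer, format='png')
png_bytes = buffer.getvalue()
```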

# Advanced Pandas

## MultiIndex

### Creating MultiIndex

MultiIndex allows for hierarchical indexing, which is useful for higher-dimensional data in a DataFrame or Series.

#### Creating MultiIndex from Arrays

```python
import pandas as pd

# Creating MultiIndex
arrays = [
    ['A', 'A', 'B', 'B'],
    ['X', 'Y', 'X', 'Y']
]
index = pd.MultiIndex.from_arrays(arrays, names=('level_1', 'level_2'))

# Creating a DataFrame with MultiIndex
df = pd.DataFrame({
    'value': [1, 2, 3, 4]
}, index=index)
print(df)
```

**Output:**
```
              value
level_1 level_2      
A       X         1
        Y         2
B       X         3
        Y         4
```
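
When the index is the full combination of two label sets, `pd.MultiIndex.from_product` avoids writing out the repeated arrays by hand. The sketch below builds the same index as the example above.

```python
import pandas as pd

# Cartesian product of the two label lists, in order
index = pd.MultiIndex.from_product(
    [['A', 'B'], ['X', 'Y']], names=('level_1', 'level_2')
)
df = pd.DataFrame({'value': [1, 2, 3, 4]}, index=index)
print(index.tolist())  # [('A', 'X'), ('A', 'Y'), ('B', 'X'), ('B', 'Y')]
```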

### Indexing and Slicing

With a MultiIndex you can select every row under an outer label, or slice across a range of outer labels, using `loc`.

#### Indexing with MultiIndex

```python
# Accessing data using MultiIndex
print(df.loc['A'])
```

**Output:**
```
         value
level_2      
X           1
Y           2
```

#### Slicing with MultiIndex

```python
# Slicing data with MultiIndex
print(df.loc['A':'B'])
```

**Output:**
```
              value
level_1 level_2      
A       X         1
        Y         2
B       X         3
        Y         4
```
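
Selection is not limited to the outer level. As a sketch on the same data, a full tuple passed to `loc` addresses a single row, and `xs` selects on an inner level without naming the outer one.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], ['X', 'Y', 'X', 'Y']],
    names=('level_1', 'level_2'),
)
df = pd.DataFrame({'value': [1, 2, 3, 4]}, index=index)

# A full tuple addresses one row; adding a column label returns a scalar
single = df.loc[('B', 'X'), 'value']
print(single)  # 3

# xs() selects on the inner level and drops it from the result
all_x = df.xs('X', level='level_2')
print(all_x)
```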

### Reshaping with MultiIndex

A MultiIndex can be flattened when ordinary columns are easier to work with: `reset_index()` moves the index levels back into regular columns.

#### Reshaping Data

```python
# Reshaping DataFrame with MultiIndex
df_reset = df.reset_index()
print(df_reset)
```

**Output:**
```
  level_1 level_2  value
0       A       X      1
1       A       Y      2
2       B       X      3
3       B       Y      4
```

## Reshaping and Pivoting

### Stack and Unstack

Stack and Unstack operations are used to reshape DataFrames with hierarchical indices.

#### Stack

```python
# Stacking the columns to rows
stacked_df = df.stack()
print(stacked_df)
```

**Output:**
```
level_1  level_2       
A        X        value    1
         Y        value    2
B        X        value    3
         Y        value    4
dtype: int64
```

#### Unstack

```python
# Unstacking the innermost row level (level_2) of the 'value' column into columns
unstacked_df = df['value'].unstack()
print(unstacked_df)
```

**Output:**
```
level_2  X  Y
level_1      
A        1  2
B        3  4
```
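
The round trip is easier to see on a frame with more than one column. In the sketch below, `stack` moves the column labels into an inner index level, and `unstack` moves that level back into columns. The student and subject names are made up for illustration.

```python
import pandas as pd

wide = pd.DataFrame(
    {'art': [70, 60], 'math': [90, 80]},
    index=pd.Index(['alice', 'bob'], name='student'),
)

long = wide.stack()        # Series indexed by (student, subject)
restored = long.unstack()  # innermost level becomes columns again

print(long)
print(restored)
```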

### Melt

The `melt` function is used to unpivot DataFrames from wide to long format.

#### Melting DataFrames

```python
# Creating a DataFrame
df = pd.DataFrame({
    'A': [1, 2],
    'B': [3, 4],
    'C': [5, 6]
}, index=['row1', 'row2'])

# Melting the DataFrame
melted_df = df.reset_index().melt(id_vars='index')
melted_df.columns = ['Row', 'Variable', 'Value']
print(melted_df)
```

**Output:**
```
    Row Variable  Value
0  row1        A      1
1  row2        A      2
2  row1        B      3
3  row2        B      4
4  row1        C      5
5  row2        C      6
```
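
The inverse of `melt` is `pivot`, which spreads a long table back into wide format. As a sketch, the example below starts from the melted result shown above and recovers one column per variable.

```python
import pandas as pd

long_df = pd.DataFrame({
    'Row': ['row1', 'row2', 'row1', 'row2', 'row1', 'row2'],
    'Variable': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value': [1, 2, 3, 4, 5, 6],
})

# Each distinct 'Variable' becomes a column; each distinct 'Row' becomes an index label
wide_df = long_df.pivot(index='Row', columns='Variable', values='Value')
print(wide_df)
```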

## Working with Text Data

### String Methods

Pandas exposes vectorized string methods through the `.str` accessor, available on Series that hold text (object or string dtype).

#### String Methods Examples

```python
# Creating a Series of strings
s = pd.Series(['apple', 'banana', 'cherry'])

# Basic string methods
print(s.str.upper())        # Convert to uppercase
print(s.str.contains('a'))  # Check for substring
print(s.str.replace('a', 'X'))  # Replace substring
```

**Output:**
```
0     APPLE
1    BANANA
2    CHERRY
dtype: object

0     True
1     True
2    False
dtype: bool

0     Xpple
1    bXnXnX
2    cherry
dtype: object
```

### Regular Expressions

Several `.str` methods accept regular expressions for more advanced searching and manipulation; `str.contains`, for example, treats its pattern as a regular expression by default.

#### Using Regular Expressions

```python
# Series with text data
s = pd.Series(['apple pie', 'banana split', 'cherry tart'])

# Regular expression search
print(s.str.contains('pie|tart'))  # Check if 'pie' or 'tart' is in the text
```

**Output:**
```
0     True
1    False
2     True
dtype: bool
```
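
Regular expressions can also pull text out or rewrite it. The sketch below uses `str.extract` with one capture group and `str.replace` with `regex=True` on the same Series. The patterns are illustrative.

```python
import pandas as pd

s = pd.Series(['apple pie', 'banana split', 'cherry tart'])

# One capture group; expand=False returns a Series instead of a DataFrame
dessert = s.str.extract(r'(pie|tart)$', expand=False)
print(dessert)  # 'pie', NaN, 'tart'

# regex=True makes str.replace treat the pattern as a regular expression
no_vowels = s.str.replace(r'[aeiou]', '', regex=True)
print(no_vowels)
```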

# Case Studies and Applications

## Real-world Examples

### Financial Data Analysis

#### Example: Analyzing Stock Prices

- **Loading Data**:
  ```python
  import pandas as pd

  # Load historical stock price data
  df = pd.read_csv('stock_prices.csv', parse_dates=['Date'], index_col='Date')
  ```

- **Calculating Returns**:
  ```python
  # Calculate daily returns
  df['Return'] = df['Close'].pct_change()
  ```

- **Visualizing Data**:
  ```python
  import matplotlib.pyplot as plt

  # Plot stock prices and returns
  df[['Close', 'Return']].plot(subplots=True, figsize=(10, 6))
  plt.show()
  ```
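
Since `stock_prices.csv` is not included here, the return calculation can be tried on a small in-memory frame. The sketch below uses made-up closing prices and additionally derives a cumulative return, which is an extra step not present in the workflow above.

```python
import pandas as pd

# Made-up closing prices standing in for stock_prices.csv
df = pd.DataFrame(
    {'Close': [100.0, 102.0, 99.96, 104.958]},
    index=pd.date_range('2023-01-02', periods=4, freq='B', name='Date'),
)

# Daily return: (today - yesterday) / yesterday; the first row has no predecessor
df['Return'] = df['Close'].pct_change()

# Cumulative return: compound the daily returns
df['Cumulative'] = (1 + df['Return']).cumprod() - 1
print(df)
```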

### Sales Data Analysis

#### Example: Analyzing Monthly Sales Trends

- **Loading Data**:
  ```python
  df = pd.read_csv('sales_data.csv', parse_dates=['Date'])
  df.set_index('Date', inplace=True)
  ```

- **Aggregating Monthly Sales**:
  ```python
  # Resample data to monthly frequency and sum sales
  monthly_sales = df.resample('M').sum()
  ```

- **Plotting Sales Trends**:
  ```python
  monthly_sales['Sales'].plot(title='Monthly Sales Trend')
  plt.show()
  ```
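
The aggregation step can be checked on synthetic data. The sketch below generates constant daily sales for January and February 2023 and resamples them by month. It uses the `'MS'` (month start) alias, an assumption made here so the example behaves the same across pandas versions; it labels each bin by the first day of the month rather than the last.

```python
import pandas as pd

# Synthetic daily sales standing in for sales_data.csv: 10 units per day
df = pd.DataFrame({
    'Date': pd.date_range('2023-01-01', periods=59, freq='D'),
    'Sales': 10,
}).set_index('Date')

# 'MS' labels each monthly bin by the first day of that month
monthly_sales = df.resample('MS').sum()
print(monthly_sales)  # January has 31 days, February has 28

# Month-over-month relative change
monthly_change = monthly_sales['Sales'].pct_change()
```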

### Customer Segmentation

#### Example: Segmenting Customers Based on Purchase Behavior

- **Loading Data**:
  ```python
  df = pd.read_csv('customer_data.csv')
  ```

- **Performing Clustering**:
  ```python
  from sklearn.cluster import KMeans

  # Prepare data for clustering
  features = df[['Annual_Spend', 'Purchase_Frequency']]
  # Fix n_init and random_state so the segment labels are reproducible
  kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
  df['Segment'] = kmeans.fit_predict(features)
  ```

- **Visualizing Segments**:
  ```python
  import seaborn as sns

  # Plot customer segments
  sns.scatterplot(data=df, x='Annual_Spend', y='Purchase_Frequency', hue='Segment')
  plt.show()
  ```

## End-to-End Projects

### Project: Analyzing and Visualizing eCommerce Data

#### Data Collection

- **Data Source**:
  ```python
  df = pd.read_csv('ecommerce_data.csv')
  ```

#### Data Cleaning

- **Handling Missing Values**:
  ```python
  # Forward-fill missing values (fillna(method='ffill') is deprecated in pandas 2.1+)
  df.ffill(inplace=True)
  ```

- **Removing Duplicates**:
  ```python
  df.drop_duplicates(inplace=True)
  ```

#### Data Analysis

- **Descriptive Statistics**:
  ```python
  print(df.describe())
  ```

- **Sales Analysis**:
  ```python
  sales_by_category = df.groupby('Category')['Sales'].sum()
  ```

#### Data Visualization

- **Sales by Category**:
  ```python
  sales_by_category.plot(kind='bar', title='Sales by Category')
  plt.show()
  ```

- **Sales Over Time**:
  ```python
  df['Date'] = pd.to_datetime(df['Date'])
  df.set_index('Date', inplace=True)
  df.resample('M')['Sales'].sum().plot(title='Monthly Sales')
  plt.show()
  ```
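
The cleaning and analysis steps above can be run end to end on a small in-memory table. The sketch below uses hypothetical rows in place of `ecommerce_data.csv`, and the `'MS'` alias is chosen here for the monthly bins so the example behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for ecommerce_data.csv
df = pd.DataFrame({
    'Date': ['2023-01-05', '2023-01-20', '2023-01-20', '2023-02-03'],
    'Category': ['Books', 'Toys', 'Toys', 'Books'],
    'Sales': [20.0, np.nan, np.nan, 30.0],
})

df = df.ffill()             # missing Sales take the previous row's value
df = df.drop_duplicates()   # the two identical Toys rows collapse into one

sales_by_category = df.groupby('Category')['Sales'].sum()
print(sales_by_category)

df['Date'] = pd.to_datetime(df['Date'])
monthly_sales = df.set_index('Date').resample('MS')['Sales'].sum()
print(monthly_sales)
```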

#### Reporting and Sharing Insights

- **Generating Reports**:
  ```python
  summary = df.groupby('Category').agg({'Sales': 'sum', 'Quantity': 'sum'})
  summary.to_csv('sales_summary.csv')
  ```

- **Creating Dashboards**:
  ```python
  # Dash 2.x exposes the component libraries from the main package
  from dash import Dash, dcc, html
  import plotly.express as px

  app = Dash(__name__)

  # Turn the grouped Series into a DataFrame with 'Category' and 'Sales' columns
  fig = px.bar(sales_by_category.reset_index(), x='Category', y='Sales')

  app.layout = html.Div([
      html.H1('eCommerce Sales Dashboard'),
      dcc.Graph(figure=fig)
  ])

  if __name__ == '__main__':
      app.run(debug=True)
  ```

# Methods and Functions Summary

## Data Inspection

### Overview

- **`info()`**: Summary of DataFrame
- **`describe()`**: Statistical summary of DataFrame
- **`head()`**: First n rows of DataFrame
- **`tail()`**: Last n rows of DataFrame
- **`shape`**: Dimensions of DataFrame
- **`columns`**: Column names
- **`index`**: Index of DataFrame
- **`dtypes`**: Data types of columns

### Examples

```python
import pandas as pd

# Creating a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# Inspecting the DataFrame
df.info()  # prints its summary directly and returns None
print(df.describe())
print(df.head())
print(df.tail())
print(df.shape)
print(df.columns)
print(df.index)
print(df.dtypes)
```

## Data Selection and Indexing

### Overview

- **`loc[]`**: Label-based selection
- **`iloc[]`**: Position-based selection
- **`at[]`**: Fast label-based scalar access
- **`iat[]`**: Fast position-based scalar access
- **`query()`**: Query DataFrame with a boolean expression
- **`xs()`**: Cross-section of DataFrame

### Examples

```python
# Selection by label
print(df.loc[0, 'A'])

# Selection by position
print(df.iloc[0, 0])

# Querying data
print(df.query('A > 1'))

# Cross-section
print(df.xs(0))
```
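
The scalar accessors `at[]` and `iat[]` from the list above are not shown in the examples. A minimal sketch, using a frame with hypothetical string row labels so that label-based and position-based access look different:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['r1', 'r2', 'r3'])

by_label = df.at['r2', 'B']    # label-based scalar access
by_position = df.iat[1, 1]     # position-based scalar access (row 1, column 1)
print(by_label, by_position)   # 5 5

df.at['r2', 'B'] = 50          # both accessors also support assignment
```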

## Data Manipulation

### Overview

- **`assign()`**: Add new columns
- **`drop()`**: Remove rows or columns
- **`rename()`**: Rename columns or index
- **`fillna()`**: Fill missing values
- **`dropna()`**: Drop missing values
- **`replace()`**: Replace values
- **`map()`**: Map values from a dictionary
- **`apply()`**: Apply function along an axis

### Examples

```python
# Adding a new column
df = df.assign(C=[7, 8, 9])

# Dropping a column
df = df.drop(columns='C')

# Renaming a column
df = df.rename(columns={'A': 'Alpha'})

# Filling missing values
df = df.fillna(0)

# Dropping missing values
df = df.dropna()

# Replacing values
df = df.replace({4: 40})

# Mapping values (keys missing from the dictionary would become NaN, so cover every value)
df['B'] = df['B'].map({40: 'Forty', 5: 'Five', 6: 'Six'})

# Applying a function
df['B'] = df['B'].apply(lambda x: x.upper())
```

## Data Aggregation and Grouping

### Overview

- **`groupby()`**: Group data by one or more columns
- **`agg()`**: Aggregate data with custom functions
- **`transform()`**: Transform data within groups
- **`pivot_table()`**: Create a pivot table
- **`crosstab()`**: Compute a cross-tabulation of two or more factors

### Examples

```python
# Grouping and aggregating
grouped = df.groupby('B').agg({'Alpha': 'sum'})

# Transforming within groups
df['Normalized'] = df.groupby('B')['Alpha'].transform(lambda x: (x - x.mean()) / x.std())

# Creating a pivot table
pivot = df.pivot_table(values='Alpha', index='B', aggfunc='sum')

# Computing a cross-tabulation
cross = pd.crosstab(df['B'], df['Alpha'])
```
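
Grouping is easier to follow when groups contain more than one row. The sketch below uses a made-up scores table to show named aggregation with `agg`, a per-group share computed with `transform`, and a `pivot_table`. All column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['red', 'red', 'blue', 'blue'],
    'round': [1, 2, 1, 2],
    'score': [10, 20, 30, 50],
})

# agg: one output row per group, with named result columns
totals = df.groupby('team').agg(total=('score', 'sum'), best=('score', 'max'))
print(totals)

# transform: result is aligned to the original rows, so it can become a new column
df['share'] = df['score'] / df.groupby('team')['score'].transform('sum')

# pivot_table: teams as rows, rounds as columns
table = df.pivot_table(values='score', index='team', columns='round', aggfunc='sum')
print(table)
```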

## Data Merging and Joining

### Overview

- **`merge()`**: Merge DataFrames
- **`concat()`**: Concatenate DataFrames
- **`join()`**: Join DataFrames on indices

### Examples

```python
# Merging DataFrames
df2 = pd.DataFrame({'Alpha': [1, 2], 'Beta': [3, 4]})
merged = df.merge(df2, on='Alpha')

# Concatenating DataFrames
concatenated = pd.concat([df, df2])

# Joining DataFrames: join() matches df['Alpha'] against the index of the other frame
joined = df.join(df2.set_index('Alpha'), on='Alpha')
```
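
The `how` parameter of `merge` controls which keys survive, and `indicator=True` records where each row came from. A minimal sketch with two hypothetical frames sharing a `key` column:

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b', 'c'], 'left_val': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'right_val': [20, 30, 40]})

# Default inner merge: only keys present in both frames
inner = left.merge(right, on='key')
print(inner)

# Outer merge keeps every key; the _merge column shows each row's origin
outer = left.merge(right, on='key', how='outer', indicator=True)
print(outer)
```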

## Handling Missing Data

### Overview

- **`isna()`**: Detect missing values
- **`notna()`**: Detect non-missing values
- **`fillna()`**: Fill missing values
- **`dropna()`**: Drop missing values
- **`interpolate()`**: Interpolate missing values

### Examples

```python
# Detecting missing values
print(df.isna())

# Filling missing values
df = df.ffill()  # forward fill; fillna(method='ffill') is deprecated in pandas 2.1+

# Dropping missing values
df = df.dropna()

# Interpolating missing values
df = df.interpolate()
```
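
The difference between the filling strategies shows up on a Series that actually contains gaps. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

missing_count = s.isna().sum()  # number of missing entries
forward = s.ffill()             # repeat the last valid value
linear = s.interpolate()        # linear interpolation between valid neighbours

print(missing_count)     # 2
print(forward.tolist())  # [1.0, 1.0, 1.0, 4.0]
print(linear.tolist())   # [1.0, 2.0, 3.0, 4.0]
```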

## Working with Dates and Times

### Overview

- **`to_datetime()`**: Convert to datetime
- **`date_range()`**: Generate a date range
- **`resample()`**: Resample time series data
- **`dt` accessor**: Access datetime properties

### Examples

```python
# Converting to datetime
df['Date'] = pd.to_datetime(df['Date'])

# Generating a date range
dates = pd.date_range(start='2022-01-01', periods=10)

# Resampling time series data (resample needs datetimes; on='Date' takes them from that column)
df_resampled = df.resample('M', on='Date').sum()

# Accessing datetime properties
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
```

## Visualization

### Overview

- **`plot()`**: Basic plotting
- **`hist()`**: Histogram
- **`boxplot()`**: Box plot
- **`plot.scatter()`**: Scatter plot

### Examples

```python
# Basic plot
df['Alpha'].plot()

# Histogram
df['Alpha'].hist()

# Box plot
df.boxplot(column='Alpha')

# Scatter plot (uses `merged`, which contains both 'Alpha' and 'Beta')
merged.plot.scatter(x='Alpha', y='Beta')
```

## Advanced Pandas

### Overview

- **`MultiIndex`**: Multi-level indexing
- **`stack()`**: Stack DataFrame
- **`unstack()`**: Unstack DataFrame
- **`melt()`**: Unpivot DataFrame
- **`str` accessor**: Vectorized string operations
- **`str` methods with `regex=True`**: Regular expression operations

### Examples

```python
# Creating MultiIndex
multi_index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1)], names=['Letter', 'Number'])

# Stacking and unstacking
stacked = df.stack()
unstacked = stacked.unstack()

# Melting DataFrame
melted = pd.melt(df, id_vars=['B'], value_vars=['Alpha'])

# String methods
df['Text'] = df['Text'].str.lower()

# Regular expressions
df['Text'] = df['Text'].str.replace(r'\d+', '', regex=True)  # regex=True is required in pandas 2.x
```