# Topic 26: Pandas Basics - Data Manipulation

## Overview
Pandas is one of the most widely used Python libraries for data manipulation and analysis[3][6]. It provides two core data structures for structured data: the one-dimensional Series and the two-dimensional DataFrame[13].

### What You'll Learn:
- Series and DataFrame structures
- Data loading and saving
- Data selection and filtering
- Data cleaning and transformation
- Grouping and aggregation
- Merging and joining data

---

## 1. Pandas Data Structures

Understanding Series and DataFrame:

In [1]:
# Pandas data structures
import pandas as pd
import numpy as np

print("Pandas Data Structures:")
print("=" * 23)

# Series - 1D labeled array
print("1. Pandas Series:")

# Create Series from list
series_from_list = pd.Series([1, 2, 3, 4, 5])
print(f"   Series from list:")
print(f"{series_from_list}")
print(f"   Data type: {type(series_from_list)}")

# Series with custom index
series_custom_index = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(f"\n   Series with custom index:")
print(f"{series_custom_index}")

# Series from dictionary
series_from_dict = pd.Series({'apple': 5, 'banana': 3, 'orange': 8, 'grape': 2})
print(f"\n   Series from dictionary:")
print(f"{series_from_dict}")

# Series attributes
print(f"\n2. Series attributes:")
print(f"   Values: {series_from_dict.values}")
print(f"   Index: {series_from_dict.index.tolist()}")
print(f"   Shape: {series_from_dict.shape}")
print(f"   Size: {series_from_dict.size}")
print(f"   Data type: {series_from_dict.dtype}")

# Series operations
print(f"\n3. Series operations:")
print(f"   Original: {series_from_dict.tolist()}")
print(f"   + 2: {(series_from_dict + 2).tolist()}")
print(f"   * 3: {(series_from_dict * 3).tolist()}")
print(f"   > 5: {(series_from_dict > 5).tolist()}")

# DataFrame - 2D labeled data structure
print(f"\n4. Pandas DataFrame:")

# Create DataFrame from dictionary
data_dict = {
    'Name': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Age': [25, 30, 35, 28, 32],
    'City': ['New York', 'London', 'Tokyo', 'Paris', 'Sydney'],
    'Salary': [70000, 80000, 90000, 75000, 85000]
}

df = pd.DataFrame(data_dict)
print(f"   DataFrame from dictionary:")
print(f"{df}")

# DataFrame from list of lists
data_lists = [
    ['Alice', 25, 'New York', 70000],
    ['Bob', 30, 'London', 80000],
    ['Charlie', 35, 'Tokyo', 90000]
]

df_from_lists = pd.DataFrame(data_lists, 
                            columns=['Name', 'Age', 'City', 'Salary'])
print(f"\n   DataFrame from list of lists:")
print(f"{df_from_lists}")

# DataFrame attributes
print(f"\n5. DataFrame attributes:")
print(f"   Shape: {df.shape}")
print(f"   Size: {df.size}")
print(f"   Columns: {df.columns.tolist()}")
print(f"   Index: {df.index.tolist()}")
print(f"   Data types:")
print(f"{df.dtypes}")

# DataFrame info
print(f"\n6. DataFrame info:")
print(f"   Info summary:")
df.info()

print(f"\n   Statistical description:")
print(f"{df.describe()}")

# Accessing DataFrame data
print(f"\n7. Accessing DataFrame data:")

# Select single column (returns Series)
print(f"   Single column 'Name' (Series):")
print(f"{df['Name']}")
print(f"   Type: {type(df['Name'])}")

# Select multiple columns (returns DataFrame)
print(f"\n   Multiple columns ['Name', 'Age']:")
print(f"{df[['Name', 'Age']]}")
print(f"   Type: {type(df[['Name', 'Age']])}")

# Select rows by index
print(f"\n   First row:")
print(f"{df.iloc[0]}")

# Select specific rows and columns
print(f"\n   First 3 rows, columns 'Name' and 'Salary':")
print(f"{df.loc[0:2, ['Name', 'Salary']]}")

# Creating DataFrame with custom index
print(f"\n8. DataFrame with custom index:")
df_custom_index = pd.DataFrame(data_dict, 
                              index=['emp001', 'emp002', 'emp003', 'emp004', 'emp005'])
print(f"   DataFrame with custom index:")
print(f"{df_custom_index}")

# Working with missing data
print(f"\n9. DataFrame with missing data:")
data_with_nan = {
    'A': [1, 2, np.nan, 4, 5],
    'B': [np.nan, 2, 3, 4, np.nan],
    'C': [1, 2, 3, np.nan, 5]
}

df_nan = pd.DataFrame(data_with_nan)
print(f"   DataFrame with NaN values:")
print(f"{df_nan}")
print(f"\n   Check for null values:")
print(f"{df_nan.isnull()}")
print(f"\n   Count of null values per column:")
print(f"{df_nan.isnull().sum()}")

Pandas Data Structures:
1. Pandas Series:
   Series from list:
0    1
1    2
2    3
3    4
4    5
dtype: int64
   Data type: <class 'pandas.core.series.Series'>

   Series with custom index:
a    10
b    20
c    30
d    40
dtype: int64

   Series from dictionary:
apple     5
banana    3
orange    8
grape     2
dtype: int64

2. Series attributes:
   Values: [5 3 8 2]
   Index: ['apple', 'banana', 'orange', 'grape']
   Shape: (4,)
   Size: 4
   Data type: int64

3. Series operations:
   Original: [5, 3, 8, 2]
   + 2: [7, 5, 10, 4]
   * 3: [15, 9, 24, 6]
   > 5: [False, False, True, False]

4. Pandas DataFrame:
   DataFrame from dictionary:
      Name  Age      City  Salary
0    Alice   25  New York   70000
1      Bob   30    London   80000
2  Charlie   35     Tokyo   90000
3    Diana   28     Paris   75000
4      Eve   32    Sydney   85000

   DataFrame from list of lists:
      Name  Age      City  Salary
0    Alice   25  New York   70000
1      Bob   30    London   80000
2  Charlie   35     Tokyo   90000
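
The selection steps in the cell above mix `iloc` (position-based) and `loc` (label-based). One difference is easy to miss: a `loc` slice includes its end label, while an `iloc` slice excludes its end position. A minimal sketch, assuming a small hypothetical frame `demo` with string row labels so the two cannot be confused:

```python
import pandas as pd

# Hypothetical three-row frame with string row labels (not the notebook's data)
demo = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Salary": [70000, 80000, 90000]},
    index=["r0", "r1", "r2"],
)

# loc slices by label and INCLUDES the end label
by_label = demo.loc["r0":"r1", ["Name", "Salary"]]

# iloc slices by integer position and EXCLUDES the end position
by_position = demo.iloc[0:1, [0, 1]]

print(len(by_label))     # 2 rows: r0 and r1
print(len(by_position))  # 1 row: r0 only
```

With the default integer index used in the notebook, `df.loc[0:2]` therefore returns three rows, whereas `df.iloc[0:2]` would return two.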

## 2. Data Loading and Basic Operations

Reading data from various sources and basic operations:

In [2]:
# Data loading and basic operations
import pandas as pd
import numpy as np
from io import StringIO

print("Data Loading and Basic Operations:")
print("=" * 34)

# Create sample CSV data
csv_data = """
Name,Age,City,Salary,Department
Alice,25,New York,70000,Engineering
Bob,30,London,80000,Marketing
Charlie,35,Tokyo,90000,Engineering
Diana,28,Paris,75000,HR
Eve,32,Sydney,85000,Marketing
Frank,29,Berlin,78000,Engineering
Grace,31,Madrid,82000,Finance
Henry,27,Rome,72000,HR
Ivy,33,Amsterdam,88000,Finance
Jack,26,Dublin,71000,Marketing
"""

# Simulate reading from CSV
print("1. Loading data from CSV:")
df = pd.read_csv(StringIO(csv_data.strip()))
print(f"   Loaded DataFrame shape: {df.shape}")
print(f"{df.head()}")

# Basic DataFrame inspection
print(f"\n2. Basic DataFrame inspection:")
print(f"   First 3 rows:")
print(f"{df.head(3)}")
print(f"\n   Last 3 rows:")
print(f"{df.tail(3)}")
print(f"\n   Random sample of 3 rows:")
print(f"{df.sample(3)}")

# Column operations
print(f"\n3. Column operations:")
print(f"   Column names: {df.columns.tolist()}")
print(f"   Data types:")
for col, dtype in df.dtypes.items():
    print(f"     {col}: {dtype}")

# Add new column
df['Experience_Years'] = np.random.randint(1, 10, len(df))
df['Bonus'] = df['Salary'] * 0.1
print(f"\n   Added columns 'Experience_Years' and 'Bonus'")
print(f"{df[['Name', 'Salary', 'Experience_Years', 'Bonus']].head()}")

# Drop columns
df_dropped = df.drop(['Bonus'], axis=1)
print(f"\n   After dropping 'Bonus' column:")
print(f"   Columns: {df_dropped.columns.tolist()}")

# Rename columns
df_renamed = df.rename(columns={'Name': 'Employee_Name', 'Age': 'Employee_Age'})
print(f"\n   After renaming columns:")
print(f"   New columns: {df_renamed.columns.tolist()}")

# Row operations
print(f"\n4. Row operations:")

# Add new row
new_row = pd.DataFrame({
    'Name': ['Kate'],
    'Age': [29],
    'City': ['Vienna'],
    'Salary': [79000],
    'Department': ['Engineering'],
    'Experience_Years': [4],
    'Bonus': [7900]
})

df_with_new = pd.concat([df, new_row], ignore_index=True)
print(f"   After adding new row, shape: {df_with_new.shape}")
print(f"{df_with_new.tail(2)}")

# Drop rows
df_dropped_rows = df.drop([0, 1])  # Drop first two rows
print(f"\n   After dropping first 2 rows, shape: {df_dropped_rows.shape}")

# Sorting data
print(f"\n5. Sorting data:")

# Sort by single column
df_sorted_age = df.sort_values('Age')
print(f"   Sorted by Age (ascending):")
print(f"{df_sorted_age[['Name', 'Age', 'Salary']].head()}")

# Sort by multiple columns
df_sorted_multi = df.sort_values(['Department', 'Salary'], ascending=[True, False])
print(f"\n   Sorted by Department (asc), then Salary (desc):")
print(f"{df_sorted_multi[['Name', 'Department', 'Salary']].head()}")

# Filtering data
print(f"\n6. Filtering data:")

# Single condition
engineers = df[df['Department'] == 'Engineering']
print(f"   Engineers only:")
print(f"{engineers[['Name', 'Department', 'Salary']]}")

# Multiple conditions
high_earners = df[(df['Salary'] > 75000) & (df['Age'] < 30)]
print(f"\n   High earners under 30:")
print(f"{high_earners[['Name', 'Age', 'Salary']]}")

# Filter with isin()
selected_cities = df[df['City'].isin(['New York', 'London', 'Tokyo'])]
print(f"\n   Employees in major cities:")
print(f"{selected_cities[['Name', 'City']]}")

# String operations
print(f"\n7. String operations:")

# String methods
print(f"   Cities starting with 'L':")
cities_with_l = df[df['City'].str.startswith('L')]
print(f"{cities_with_l[['Name', 'City']]}")

# String contains
print(f"\n   Names containing 'a':")
names_with_a = df[df['Name'].str.contains('a', case=False)]
print(f"{names_with_a[['Name']]}")

# String length
df['Name_Length'] = df['Name'].str.len()
print(f"\n   Name lengths:")
print(f"{df[['Name', 'Name_Length']]}")

# Grouping preview
print(f"\n8. Basic grouping:")

# Group by single column
dept_stats = df.groupby('Department')['Salary'].agg(['mean', 'min', 'max', 'count'])
print(f"   Salary statistics by Department:")
print(f"{dept_stats.round(2)}")

# Value counts
print(f"\n9. Value counts:")
print(f"   Department distribution:")
print(f"{df['Department'].value_counts()}")

print(f"\n   City distribution:")
print(f"{df['City'].value_counts()}")

# Basic statistics
print(f"\n10. Basic statistics:")
print(f"   Numerical columns summary:")
print(f"{df.describe()}")

# Correlation
print(f"\n   Correlation matrix:")
corr_matrix = df[['Age', 'Salary', 'Experience_Years']].corr()
print(f"{corr_matrix.round(3)}")

print(f"\n11. Data loading formats:")
print(f"   ✓ CSV: pd.read_csv('file.csv')")
print(f"   ✓ Excel: pd.read_excel('file.xlsx')")
print(f"   ✓ JSON: pd.read_json('file.json')")
print(f"   ✓ SQL: pd.read_sql('query', connection)")
print(f"   ✓ HTML: pd.read_html('url')  # returns a list of DataFrames")

print(f"\n12. Data saving formats:")
print(f"   ✓ CSV: df.to_csv('file.csv', index=False)")
print(f"   ✓ Excel: df.to_excel('file.xlsx', index=False)")
print(f"   ✓ JSON: df.to_json('file.json')")
print(f"   ✓ SQL: df.to_sql('table', connection)")

Data Loading and Basic Operations:
1. Loading data from CSV:
   Loaded DataFrame shape: (10, 5)
      Name  Age      City  Salary   Department
0    Alice   25  New York   70000  Engineering
1      Bob   30    London   80000    Marketing
2  Charlie   35     Tokyo   90000  Engineering
3    Diana   28     Paris   75000           HR
4      Eve   32    Sydney   85000    Marketing

2. Basic DataFrame inspection:
   First 3 rows:
      Name  Age      City  Salary   Department
0    Alice   25  New York   70000  Engineering
1      Bob   30    London   80000    Marketing
2  Charlie   35     Tokyo   90000  Engineering

   Last 3 rows:
    Name  Age       City  Salary Department
7  Henry   27       Rome   72000         HR
8    Ivy   33  Amsterdam   88000    Finance
9   Jack   26     Dublin   71000  Marketing

   Random sample of 3 rows:
    Name  Age      City  Salary   Department
7  Henry   27      Rome   72000           HR
1    Bob   30    London   80000    Marketing
0  Alice   25  New York   70000  Engineering
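
The cell above only prints the `to_csv` call as a string. A minimal round-trip sketch, assuming an in-memory `StringIO` buffer in place of a real file path and a small hypothetical frame `people`, shows why `index=False` is usually wanted when saving:

```python
from io import StringIO

import pandas as pd

# Small hypothetical frame standing in for the employee data
people = pd.DataFrame({"Name": ["Alice", "Bob"], "Salary": [70000, 80000]})

# Write without the index, then read back: the columns match the original
buffer = StringIO()
people.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

# Write WITH the index (the default): it comes back as an extra unnamed column
buffer_with_index = StringIO()
people.to_csv(buffer_with_index)
buffer_with_index.seek(0)
restored_with_index = pd.read_csv(buffer_with_index)

print(restored.columns.tolist())             # ['Name', 'Salary']
print(restored_with_index.columns.tolist())  # ['Unnamed: 0', 'Name', 'Salary']
```

Passing `index_col=0` to `pd.read_csv` is the other common way to handle a saved index, if the index carries meaning.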

## 3. Data Cleaning and Transformation

Handling missing data and transforming datasets:

In [3]:
# Data cleaning and transformation
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

print("Data Cleaning and Transformation:")
print("=" * 33)

# Create messy dataset for cleaning
np.random.seed(42)
messy_data = {
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Name': ['Alice Johnson', ' bob smith ', 'CHARLIE BROWN', 'diana davis', 'Eve Wilson', None, 'frank miller', 'Grace LEE', 'henry CLARK', ''],
    'Age': [25, 30, np.nan, 28, 32, 45, 29, np.nan, 27, 35],
    'Email': ['alice@email.com', 'BOB@EMAIL.COM', 'charlie@email.com', 'invalid-email', 'eve@email.com', 'frank@email.com', None, 'grace@email.com', 'henry@email.com', 'duplicate@email.com'],
    'Salary': [70000, 80000, 90000, np.nan, 85000, 95000, 78000, 82000, np.nan, 88000],
    'Join_Date': ['2021-01-15', '2020-03-22', '2019-07-10', '2022-01-05', 'invalid-date', '2020-11-18', '2021-06-30', '2019-12-01', '2022-03-15', '2020-08-25'],
    'Department': ['Engineering', 'marketing', 'ENGINEERING', 'hr', 'Marketing', 'engineering', 'HR', 'Marketing', 'Engineering', 'HR']
}

df_messy = pd.DataFrame(messy_data)
print("1. Original messy dataset:")
print(f"{df_messy}")

# Data quality assessment
print(f"\n2. Data quality assessment:")
print(f"   Dataset shape: {df_messy.shape}")
print(f"   Missing values per column:")
print(f"{df_messy.isnull().sum()}")
print(f"\n   Data types:")
print(f"{df_messy.dtypes}")

# Handling missing values
print(f"\n3. Handling missing values:")

# Check for different types of missing data
print(f"   Empty strings count: {(df_messy['Name'] == '').sum()}")
print(f"   None values count: {df_messy['Name'].isnull().sum()}")

# Replace empty strings with NaN
df_cleaned = df_messy.copy()
df_cleaned = df_cleaned.replace('', np.nan)
print(f"\n   After replacing empty strings with NaN:")
print(f"{df_cleaned.isnull().sum()}")

# Drop rows with too many missing values
print(f"\n4. Removing rows with excessive missing data:")
# Keep rows with at least 6 non-null values (out of 7 columns),
# i.e. drop rows that have more than 1 missing value
df_cleaned = df_cleaned.dropna(thresh=6)
print(f"   Shape after removing rows with more than 1 missing value: {df_cleaned.shape}")

# Fill missing values
print(f"\n5. Filling missing values:")

# Fill Age with median
median_age = df_cleaned['Age'].median()
df_cleaned['Age'] = df_cleaned['Age'].fillna(median_age)
print(f"   Filled missing Age with median: {median_age}")

# Fill Salary with mean by Department
# Department names are not standardized yet ('hr' vs 'HR'), so group on a
# lowercased key; otherwise single-row groups such as 'hr' have no mean to use
print(f"   Filling Salary with department mean:")
dept_key = df_cleaned['Department'].str.lower()
dept_mean_salary = df_cleaned.groupby(dept_key)['Salary'].transform('mean')
df_cleaned['Salary'] = df_cleaned['Salary'].fillna(dept_mean_salary)

# Forward fill for remaining missing values
# (fillna(method='ffill') is deprecated in pandas 2.x; use .ffill() instead)
df_cleaned['Salary'] = df_cleaned['Salary'].ffill()

print(f"   Missing values after filling:")
print(f"{df_cleaned.isnull().sum()}")

# String cleaning
print(f"\n6. String cleaning:")

# Clean Name column
df_cleaned['Name'] = df_cleaned['Name'].str.strip()  # Remove whitespace
df_cleaned['Name'] = df_cleaned['Name'].str.title()  # Title case
print(f"   Cleaned Names:")
print(f"{df_cleaned['Name'].tolist()}")

# Clean Email column
df_cleaned['Email'] = df_cleaned['Email'].str.lower()  # Lowercase emails
print(f"   Cleaned Emails:")
print(f"{df_cleaned['Email'].tolist()}")

# Standardize Department names
dept_mapping = {
    'engineering': 'Engineering',
    'marketing': 'Marketing', 
    'hr': 'HR'
}
df_cleaned['Department'] = df_cleaned['Department'].str.lower().map(dept_mapping)
print(f"   Standardized Departments:")
print(f"{df_cleaned['Department'].value_counts()}")

# Data type conversion
print(f"\n7. Data type conversion:")

# Convert Join_Date to datetime
print(f"   Converting Join_Date to datetime:")
df_cleaned['Join_Date'] = pd.to_datetime(df_cleaned['Join_Date'], errors='coerce')
print(f"   Join_Date type: {df_cleaned['Join_Date'].dtype}")
print(f"   Invalid dates (NaT): {df_cleaned['Join_Date'].isnull().sum()}")

# Fill invalid dates with a reasonable default
default_date = pd.to_datetime('2021-01-01')
df_cleaned['Join_Date'] = df_cleaned['Join_Date'].fillna(default_date)

# Convert numeric columns
df_cleaned['Age'] = df_cleaned['Age'].astype(int)
df_cleaned['Salary'] = df_cleaned['Salary'].astype(int)

print(f"   Final data types:")
print(f"{df_cleaned.dtypes}")

# Remove duplicates
print(f"\n8. Handling duplicates:")
print(f"   Original shape: {df_cleaned.shape}")

# Check for duplicate emails
duplicate_emails = df_cleaned['Email'].duplicated().sum()
print(f"   Duplicate emails: {duplicate_emails}")

# Remove rows with duplicate emails (keep first)
df_cleaned = df_cleaned.drop_duplicates(subset=['Email'], keep='first')
print(f"   Shape after removing duplicate emails: {df_cleaned.shape}")

# Data validation
print(f"\n9. Data validation:")

# Validate email format
email_pattern = r'^[\w\.-]+@[\w\.-]+\.[\w]+$'
valid_emails = df_cleaned['Email'].str.match(email_pattern, na=False)
print(f"   Valid email addresses: {valid_emails.sum()}/{len(df_cleaned)}")
print(f"   Invalid emails:")
invalid_email_mask = ~valid_emails & df_cleaned['Email'].notna()
if invalid_email_mask.any():
    print(f"{df_cleaned.loc[invalid_email_mask, ['Name', 'Email']]}")

# Validate age range
valid_ages = (df_cleaned['Age'] >= 18) & (df_cleaned['Age'] <= 65)
print(f"   Valid ages (18-65): {valid_ages.sum()}/{len(df_cleaned)}")

# Validate salary range
valid_salaries = (df_cleaned['Salary'] >= 30000) & (df_cleaned['Salary'] <= 200000)
print(f"   Valid salaries (30k-200k): {valid_salaries.sum()}/{len(df_cleaned)}")

# Feature engineering
print(f"\n10. Feature engineering:")

# Calculate years since joining
current_date = datetime.now()
df_cleaned['Years_Since_Join'] = (current_date - df_cleaned['Join_Date']).dt.days / 365.25
df_cleaned['Years_Since_Join'] = df_cleaned['Years_Since_Join'].round(1)

# Create age categories
def categorize_age(age):
    if age < 25:
        return 'Young'
    elif age < 35:
        return 'Mid-Career'
    else:
        return 'Senior'

df_cleaned['Age_Category'] = df_cleaned['Age'].apply(categorize_age)

# Create salary bands
df_cleaned['Salary_Band'] = pd.cut(df_cleaned['Salary'], 
                                  bins=[0, 75000, 85000, float('inf')], 
                                  labels=['Low', 'Medium', 'High'])

print(f"   New features created:")
print(f"{df_cleaned[['Name', 'Age', 'Age_Category', 'Salary', 'Salary_Band', 'Years_Since_Join']]}")

# Outlier detection
print(f"\n11. Outlier detection:")

# Using IQR method for salary
Q1 = df_cleaned['Salary'].quantile(0.25)
Q3 = df_cleaned['Salary'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

outliers = (df_cleaned['Salary'] < lower_bound) | (df_cleaned['Salary'] > upper_bound)
print(f"   Salary outliers (IQR method): {outliers.sum()}")
print(f"   Outlier bounds: ${lower_bound:.0f} - ${upper_bound:.0f}")

if outliers.any():
    print(f"   Outlier records:")
    print(f"{df_cleaned.loc[outliers, ['Name', 'Salary']]}")

# Final cleaned dataset
print(f"\n12. Final cleaned dataset:")
print(f"   Shape: {df_cleaned.shape}")
print(f"   Missing values: {df_cleaned.isnull().sum().sum()}")
print(f"\n   Sample of cleaned data:")
print(f"{df_cleaned.head()}")

# Data cleaning summary
print(f"\n13. Data cleaning checklist:")
print(f"   ✓ Handled missing values (fill, drop, impute)")
print(f"   ✓ Standardized string formats (case, whitespace)")
print(f"   ✓ Converted data types appropriately")
print(f"   ✓ Removed duplicates")
print(f"   ✓ Validated data ranges and formats")
print(f"   ✓ Created derived features")
print(f"   ✓ Detected outliers")
print(f"   ✓ Documented cleaning process")

# Export cleaned data
print(f"\n14. Exporting cleaned data:")
print(f"   # Save cleaned dataset")
print(f"   df_cleaned.to_csv('cleaned_data.csv', index=False)")
print(f"   # Create data dictionary")
print(f"   data_dict = df_cleaned.dtypes.to_dict()")
print(f"   print('Data Dictionary:', data_dict)")

Data Cleaning and Transformation:
1. Original messy dataset:
   ID           Name   Age                Email   Salary     Join_Date  \
0   1  Alice Johnson  25.0      alice@email.com  70000.0    2021-01-15   
1   2     bob smith   30.0        BOB@EMAIL.COM  80000.0    2020-03-22   
2   3  CHARLIE BROWN   NaN    charlie@email.com  90000.0    2019-07-10   
3   4    diana davis  28.0        invalid-email      NaN    2022-01-05   
4   5     Eve Wilson  32.0        eve@email.com  85000.0  invalid-date   
5   6           None  45.0      frank@email.com  95000.0    2020-11-18   
6   7   frank miller  29.0                 None  78000.0    2021-06-30   
7   8      Grace LEE   NaN      grace@email.com  82000.0    2019-12-01   
8   9    henry CLARK  27.0      henry@email.com      NaN    2022-03-15   
9  10                 35.0  duplicate@email.com  88000.0    2020-08-25   

    Department  
0  Engineering  
1    marketing  
2  ENGINEERING  
3           hr  
4    Marketing  
5  engineering  
6           HR  
7    Marketing  
8  Engineering  
9           HR  

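
The feature engineering step above builds `Age_Category` with a Python function and `apply`, and `Salary_Band` with `pd.cut`. The same age rule can also be expressed with `pd.cut`, as long as the bin edges are closed on the left. A minimal sketch, assuming a hypothetical `ages` Series chosen to hit the bin edges:

```python
import pandas as pd

# Hypothetical ages, chosen so that 25 and 35 land exactly on bin edges
ages = pd.Series([22, 25, 34, 35, 45])

def categorize_age(age):
    # Same rule as in the notebook cell above
    if age < 25:
        return "Young"
    elif age < 35:
        return "Mid-Career"
    else:
        return "Senior"

via_apply = ages.apply(categorize_age)

# right=False makes each bin closed on the left: [0, 25), [25, 35), [35, inf)
via_cut = pd.cut(
    ages,
    bins=[0, 25, 35, float("inf")],
    labels=["Young", "Mid-Career", "Senior"],
    right=False,
)

print(via_cut.tolist())  # ['Young', 'Mid-Career', 'Mid-Career', 'Senior', 'Senior']
```

Note that `pd.cut` defaults to `right=True`, which is what the `Salary_Band` step uses, so a salary of exactly 75000 falls in the 'Low' band there. Unlike the function, `pd.cut` returns a categorical column and yields NaN for values outside the bins (here, negative ages).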


## Summary

In this notebook, you learned about:

✅ **Pandas Data Structures**: Series and DataFrame for structured data[3][13]  
✅ **Data Loading**: Reading from CSV, Excel, JSON and other formats[3]  
✅ **Data Selection**: Indexing, filtering, and querying data effectively  
✅ **Data Cleaning**: Handling missing values, duplicates, and data validation  
✅ **Data Transformation**: Type conversion, feature engineering, and standardization  
✅ **Basic Operations**: Sorting, grouping, and statistical summaries[6]  

### Key Takeaways:
1. Pandas is built on NumPy and provides high-level data structures[7][13]
2. DataFrame is like a spreadsheet or SQL table with labeled rows and columns
3. Always inspect data quality before analysis (missing values, duplicates, outliers)
4. String methods (.str) provide powerful text processing capabilities
5. Proper data cleaning is crucial for accurate analysis results[6]
6. Pandas works directly with NumPy (Series.to_numpy()) and Matplotlib (DataFrame.plot())[7]
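
Takeaway 2 compares a DataFrame to a SQL table, and the overview lists merging and joining among the topics. A minimal sketch of a SQL-style join with `DataFrame.merge`, assuming two small hypothetical tables (`employees` and `departments`, invented here for illustration):

```python
import pandas as pd

# Hypothetical tables, invented for illustration
employees = pd.DataFrame({
    "Name": ["Alice", "Bob", "Diana"],
    "Department": ["Engineering", "Marketing", "HR"],
})
departments = pd.DataFrame({
    "Department": ["Engineering", "Marketing", "Finance"],
    "Floor": [3, 2, 5],
})

# Inner join: keep only departments that appear in both tables
inner = employees.merge(departments, on="Department", how="inner")

# Left join: keep every employee; unmatched rows get NaN in Floor
left = employees.merge(departments, on="Department", how="left")

print(inner)
print(left)
```

The `how` argument plays the role of the SQL join type. Here the inner join drops Diana (no HR row in `departments`), while the left join keeps her with a missing `Floor`.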

### Next Topic: 27_matplotlib_basics.ipynb
Learn about data visualization with Matplotlib for creating plots and charts.