# Notebook 3: Lists and Basic Data Structures

Welcome to your third Python notebook! Now you'll learn about data structures: ways to store and organize multiple pieces of data. Almost every dataset you handle starts out as a collection like this, so these tools come up constantly in data science work.

**Learning Objectives:**
- Create and manipulate lists
- Use indexing and slicing to access data
- Understand tuples and their differences from lists
- Work with strings as sequences
- Apply list methods for data manipulation

## Introduction to Lists

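A list is an ordered collection of items, written in square brackets and separated by commas. The items can be of different types, and a list can grow or shrink as you work with it. Lists are one of the most important data structures in Python and data science.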

In [None]:
# Creating lists
numbers = [1, 2, 3, 4, 5]
names = ["Alice", "Bob", "Charlie"]
mixed = [1, "hello", 3.14, True]
empty_list = []

print("Numbers:", numbers)
print("Names:", names)
print("Mixed types:", mixed)
print("Empty list:", empty_list)
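Two more basics that the self-assessment checklist at the end expects: the in operator checks whether a value appears in a list, and list() builds a list from another sequence such as a range. A minimal sketch:

```python
# Build a list from a range: range(1, 6) yields 1, 2, 3, 4, 5
numbers = list(range(1, 6))
print("Numbers:", numbers)

# Membership tests return True or False
names = ["Alice", "Bob", "Charlie"]
print("Is 'Bob' in names?", "Bob" in names)        # True
print("Is 'Dana' in names?", "Dana" in names)      # False
print("Is 'Dana' missing?", "Dana" not in names)   # True
```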

## List Indexing

You can access individual items in a list using their index (position). **Important:** Python uses 0-based indexing!

In [None]:
fruits = ["apple", "banana", "cherry", "date", "elderberry"]

print("List:", fruits)
print("Length:", len(fruits))
print()

# Positive indexing (from the beginning)
print("First item (index 0):", fruits[0])
print("Second item (index 1):", fruits[1])
print("Third item (index 2):", fruits[2])
print()

# Negative indexing (from the end)
print("Last item (index -1):", fruits[-1])
print("Second to last (index -2):", fruits[-2])
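A negative index is shorthand for counting back from the length: fruits[-k] refers to the same item as fruits[len(fruits) - k]. The sketch below checks this for every valid k:

```python
fruits = ["apple", "banana", "cherry", "date", "elderberry"]

# For k = 1..len(fruits), index -k and index len(fruits) - k hit the same item
for k in range(1, len(fruits) + 1):
    positive = len(fruits) - k
    print(f"fruits[-{k}] == fruits[{positive}] -> {fruits[-k]}")
    assert fruits[-k] == fruits[positive]
```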

## List Slicing - VERY IMPORTANT for Data Science!

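Slicing lets you get a portion of a list. A slice data[start:end] returns a new list with the items from index start up to, but not including, index end. You'll see this pattern constantly in data science notebooks!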

In [None]:
# Sample data (like you'll see in data science)
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print("Original data:", data)
print()

# Basic slicing: [start:end] (end is not included)
print("First 3 items [0:3]:", data[0:3])  # This pattern is used a lot!
print("Items 2 to 5 [2:5]:", data[2:5])
print("Last 3 items [-3:]:", data[-3:])
print()

# Slicing shortcuts
print("First 5 items [:5]:", data[:5])
print("From index 3 to end [3:]:", data[3:])
print("All items [:]:", data[:])
print()

# Step slicing: [start:end:step]
print("Every 2nd item [::2]:", data[::2])
print("Every 3rd item from index 1 [1::3]:", data[1::3])
print("Reverse the list [::-1]:", data[::-1])
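Two properties of slices that often surprise beginners: bounds that run past the end of the list are simply clipped (no IndexError), and a slice is always a new list, so changing the slice leaves the original untouched. A small sketch:

```python
data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Out-of-range slice bounds are clipped, not errors
print("data[8:50]:", data[8:50])   # [90, 100]
print("data[20:]:", data[20:])     # []

# A slice is a new list: modifying it leaves data unchanged
first_three = data[:3]
first_three[0] = -1
print("Modified slice:", first_three)  # [-1, 20, 30]
print("Original data:", data[:3])      # [10, 20, 30]
```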

## Modifying Lists

Lists are mutable, meaning you can change their contents:

In [None]:
# Starting with a list of scores
scores = [85, 92, 78, 96, 88]
print("Original scores:", scores)

# Change a single item
scores[2] = 82  # Change the third score
print("After changing index 2:", scores)

# Change multiple items using slicing
scores[1:3] = [90, 85]  # Replace items at index 1 and 2
print("After changing slice [1:3]:", scores)
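Because lists are mutable, it matters whether two variables refer to the same list. Writing b = a does not copy anything: both names point at one list. To get an independent copy, use a.copy() (or a[:]). Slice assignment can also change a list's length when the replacement has a different number of items. A minimal sketch:

```python
scores = [85, 92, 78, 96, 88]

alias = scores               # same list, two names
independent = scores.copy()  # a separate (shallow) copy

alias[0] = 0
print("scores:", scores)            # [0, 92, 78, 96, 88] (changed through alias)
print("independent:", independent)  # [85, 92, 78, 96, 88] (unaffected)

# Slice assignment can grow or shrink the list
scores[1:3] = [1, 2, 3]   # two items replaced by three
print("After growing slice:", scores)  # [0, 1, 2, 3, 96, 88]
```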

## List Methods

Lists have many built-in methods for manipulation:

In [None]:
# Starting with an empty list
data_points = []
print("Starting list:", data_points)

# Adding items
data_points.append(10)  # Add one item
data_points.append(20)
data_points.append(30)
print("After appending:", data_points)

# Add multiple items
data_points.extend([40, 50, 60])
print("After extending:", data_points)

# Insert at specific position
data_points.insert(0, 5)  # Insert 5 at the beginning
print("After inserting 5 at index 0:", data_points)
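A frequent mix-up is passing a list to append() when extend() was meant: append() adds its argument as one single item (here, a nested list), while extend() adds each element separately.

```python
with_append = [1, 2]
with_append.append([3, 4])   # adds ONE item: the list [3, 4]

with_extend = [1, 2]
with_extend.extend([3, 4])   # adds two items: 3 and 4

print("append:", with_append, "length", len(with_append))  # [1, 2, [3, 4]] length 3
print("extend:", with_extend, "length", len(with_extend))  # [1, 2, 3, 4] length 4
```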

In [None]:
# Removing items
numbers = [1, 2, 3, 2, 4, 2, 5]
print("Original:", numbers)

# Remove by value (removes first occurrence)
numbers.remove(2)
print("After removing first 2:", numbers)

# Remove by index
removed_item = numbers.pop(1)  # Remove item at index 1
print(f"Removed item at index 1: {removed_item}")
print("List after pop:", numbers)

# Remove last item
last_item = numbers.pop()
print(f"Removed last item: {last_item}")
print("Final list:", numbers)
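Both removal methods fail loudly when the target is missing: remove() raises a ValueError if the value isn't in the list, and pop() raises an IndexError on an empty list or an out-of-range index. A membership test avoids the ValueError. The del statement is another way to delete by index or by slice. A short sketch:

```python
numbers = [1, 3, 4, 2, 5]

# Guard remove() with a membership test to avoid ValueError
value = 7
if value in numbers:
    numbers.remove(value)
else:
    print(f"{value} is not in the list; nothing removed")

# del deletes by index or by slice and returns nothing
del numbers[0]        # removes 1
print("After del numbers[0]:", numbers)    # [3, 4, 2, 5]
del numbers[1:3]      # removes 4 and 2
print("After del numbers[1:3]:", numbers)  # [3, 5]
```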

## List Operations and Functions

Common operations you'll use in data science:

In [None]:
# Sample dataset
temperatures = [23.5, 25.1, 22.8, 26.3, 24.7, 21.9, 27.2]
print("Temperatures:", temperatures)
print()

# Basic statistics
print(f"Number of readings: {len(temperatures)}")
print(f"Minimum temperature: {min(temperatures)}")
print(f"Maximum temperature: {max(temperatures)}")
print(f"Sum of temperatures: {sum(temperatures)}")
print(f"Average temperature: {sum(temperatures) / len(temperatures):.2f}")
print()

# Sorting (creates a new sorted list)
sorted_temps = sorted(temperatures)
print("Sorted temperatures:", sorted_temps)
print("Original (unchanged):", temperatures)
print()

# Sort in place (modifies the original list)
temperatures.sort()
print("After sorting in place:", temperatures)

# Reverse
temperatures.reverse()
print("After reversing:", temperatures)
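sorted() and .sort() also accept two keyword arguments: reverse=True sorts in descending order, and key= names a function applied to each item to decide the order. A sketch, using a small hypothetical word list to show key=len:

```python
temperatures = [23.5, 25.1, 22.8, 26.3, 24.7, 21.9, 27.2]

# Descending order, without touching the original list
hottest_first = sorted(temperatures, reverse=True)
print("Hottest first:", hottest_first)
print("Top 3:", hottest_first[:3])  # [27.2, 26.3, 25.1]

# key=len sorts strings by their length instead of alphabetically
words = ["banana", "fig", "cherry", "kiwi"]  # hypothetical example data
print("By length:", sorted(words, key=len))  # ['fig', 'kiwi', 'banana', 'cherry']
```

Python's sort is stable, so "banana" and "cherry" (both six letters) keep their original relative order.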

## Working with Lists in Loops

This is essential for data processing:

In [None]:
# Sample data
raw_scores = [85, 92, 78, 96, 88, 73, 91]

# Method 1: Iterate through values
print("All scores:")
for score in raw_scores:
    print(f"Score: {score}")

print()

# Method 2: Iterate with index using range(len(...)) (works, but enumerate below is usually cleaner)
print("Scores with position:")
for i in range(len(raw_scores)):
    print(f"Student {i+1}: {raw_scores[i]}")

print()

# Method 3: Enumerate (gets both index and value)
print("Using enumerate:")
for index, score in enumerate(raw_scores):
    print(f"Index {index}: {score}")
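When two lists line up item by item (for example names and scores), zip() pairs them so you can loop over both at once without indexing. zip() silently stops at the shorter list, so it is worth checking the lengths first. The names below are hypothetical:

```python
names = ["Ana", "Ben", "Cleo"]      # hypothetical student names
raw_scores = [85, 92, 78]

assert len(names) == len(raw_scores), "lists must be the same length"

for name, score in zip(names, raw_scores):
    print(f"{name}: {score}")

# zip() also works inside comprehensions
passed = [name for name, score in zip(names, raw_scores) if score >= 80]
print("Scored 80 or above:", passed)  # ['Ana', 'Ben']
```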

## List Comprehensions - Powerful Data Processing!

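List comprehensions are a concise way to build a new list by transforming and/or filtering the items of another sequence in a single line: [expression for item in sequence if condition]. They are very useful in data science!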

In [None]:
# Traditional way to create a list of squares
squares_traditional = []
for i in range(1, 6):
    squares_traditional.append(i ** 2)
print("Traditional way:", squares_traditional)

# List comprehension way (more concise)
squares_comprehension = [i ** 2 for i in range(1, 6)]
print("List comprehension:", squares_comprehension)

# More examples
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Get even numbers
evens = [x for x in numbers if x % 2 == 0]
print("Even numbers:", evens)

# Convert temperatures from Celsius to Fahrenheit
celsius = [0, 20, 30, 40]
fahrenheit = [c * 9/5 + 32 for c in celsius]
print("Celsius:", celsius)
print("Fahrenheit:", fahrenheit)
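A trailing if in a comprehension filters items out. To keep every item but transform it differently depending on a condition, put a conditional expression (value_if_true if condition else value_if_false) before the for instead. A minimal sketch with made-up readings:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Filter: keep only odd numbers
odds = [x for x in numbers if x % 2 == 1]
print("Odds:", odds)  # [1, 3, 5]

# Transform: label every number, keeping all of them
labels = ["even" if x % 2 == 0 else "odd" for x in numbers]
print("Labels:", labels)  # ['odd', 'even', 'odd', 'even', 'odd', 'even']

# Replace negative readings with 0 (hypothetical sensor data)
readings = [2.5, -1.0, 3.75]
cleaned = [r if r > 0 else 0 for r in readings]
print("Cleaned:", cleaned)  # [2.5, 0, 3.75]
```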

## Tuples - Immutable Sequences

Tuples are like lists, but they are immutable: once a tuple is created, its items cannot be changed, added, or removed:

In [None]:
# Creating tuples
coordinates = (3.5, 7.2)  # x, y coordinates
rgb_color = (255, 128, 0)  # Red, Green, Blue values
student_info = ("Alice", 20, "Computer Science")

print("Coordinates:", coordinates)
print("RGB Color:", rgb_color)
print("Student Info:", student_info)
print()

# Accessing tuple elements (same as lists)
print(f"X coordinate: {coordinates[0]}")
print(f"Y coordinate: {coordinates[1]}")
print(f"Student name: {student_info[0]}")
print()

# Tuple unpacking (very useful!)
x, y = coordinates
name, age, major = student_info
print(f"Unpacked coordinates: x={x}, y={y}")
print(f"Unpacked student: {name}, {age} years old, studying {major}")
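Immutability is enforced: assigning to a tuple item raises a TypeError. One syntax detail: a one-item tuple needs a trailing comma, because parentheses alone just group an expression. Unpacking also turns swapping two variables into a one-liner. A short sketch:

```python
coordinates = (3.5, 7.2)

try:
    coordinates[0] = 4.0
except TypeError as err:
    print("TypeError:", err)  # tuples do not support item assignment

single = (5,)      # a tuple containing one item
not_tuple = (5)    # just the integer 5 in parentheses
print(type(single), type(not_tuple))  # <class 'tuple'> <class 'int'>

# Swap two values with tuple packing and unpacking
a, b = 1, 2
a, b = b, a
print(a, b)  # 2 1
```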

## Strings as Sequences

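Strings are sequences of characters. They support len(), indexing, and slicing just like lists, but, like tuples, they are immutable: string methods return new strings instead of changing the original.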

In [None]:
text = "Data Science"
print(f"Text: '{text}'")
print(f"Length: {len(text)}")
print()

# Indexing
print(f"First character: '{text[0]}'")
print(f"Last character: '{text[-1]}'")
print()

# Slicing
print(f"First 4 characters: '{text[0:4]}'")
print(f"Last 7 characters: '{text[-7:]}'")
print(f"Every 2nd character: '{text[::2]}'")
print()

# String methods
print(f"Uppercase: '{text.upper()}'")
print(f"Replace 'Data' with 'Machine': '{text.replace('Data', 'Machine')}'")
print(f"Split into words: {text.split()}")
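Because strings are immutable, text[0] = "d" raises a TypeError, and upper() and replace() hand back new strings while text stays the same. The inverse of split() is join(), which glues a list of strings together with a separator. A minimal sketch:

```python
text = "Data Science"

try:
    text[0] = "d"
except TypeError:
    print("Strings cannot be changed in place")

upper_text = text.upper()
print(text, "->", upper_text)  # the original is unchanged

words = text.split()            # ['Data', 'Science']
joined = "_".join(words)
print(joined)                   # Data_Science

# join() only accepts strings, so convert numbers first
csv_line = ",".join(str(n) for n in [10, 20, 30])
print(csv_line)                 # 10,20,30
```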

## Nested Lists - Multi-dimensional Data

Lists can contain other lists, creating multi-dimensional structures:

In [None]:
# A simple 2D dataset (like a spreadsheet)
student_grades = [
    ["Alice", 85, 92, 78],
    ["Bob", 79, 88, 91],
    ["Charlie", 92, 85, 87]
]

print("Student grades:")
for row in student_grades:
    print(row)
print()

# Accessing nested data
print(f"First student: {student_grades[0]}")
print(f"First student's name: {student_grades[0][0]}")
print(f"First student's first grade: {student_grades[0][1]}")
print()

# Processing nested data
print("Student averages:")
for student in student_grades:
    name = student[0]
    grades = student[1:]  # All grades (excluding name)
    average = sum(grades) / len(grades)
    print(f"{name}: {average:.1f}")
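One pitfall when building a nested list from scratch: [[0] * 3] * 2 does not create two independent rows. The outer * repeats a reference to the same inner list, so changing one row changes both. A list comprehension builds a fresh inner list for each row:

```python
# Wrong: both rows are the same list object
shared = [[0] * 3] * 2
shared[0][0] = 99
print("Shared rows:", shared)            # [[99, 0, 0], [99, 0, 0]]

# Right: the comprehension creates a new inner list each time
independent = [[0] * 3 for _ in range(2)]
independent[0][0] = 99
print("Independent rows:", independent)  # [[99, 0, 0], [0, 0, 0]]
```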

## Practice Exercises

These exercises will prepare you for data science work:

### Exercise 1: Data Filtering
Given a list of temperatures, create a new list with only temperatures above 25°C.

In [None]:
temperatures = [23.1, 26.5, 22.8, 27.3, 24.9, 28.1, 21.7, 25.4, 29.2]

# Method 1: Using a loop
hot_temps_loop = []
for temp in temperatures:
    # Your code here
    pass

# Method 2: Using list comprehension
hot_temps_comprehension = []  # Your code here

print(f"Original temperatures: {temperatures}")
print(f"Hot temperatures (loop): {hot_temps_loop}")
print(f"Hot temperatures (comprehension): {hot_temps_comprehension}")

### Exercise 2: Data Slicing Practice
Practice the slicing operations you'll see in data science notebooks.

In [None]:
# Sample dataset (like X in machine learning)
X = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

print(f"Original data X: {X}")

# Practice these common slicing patterns:
print(f"First 3 items X[0:3]: {X[0:3]}")  # Very common in ML notebooks!
print(f"Items 2 to 5 X[2:5]: {X[2:5]}")
print(f"Last 3 items X[-3:]: {X[-3:]}")
print(f"Every other item X[::2]: {X[::2]}")

# Your turn - try these:
# Get the items at index 1 through index 4 (including index 4)
# Get the middle 4 items
# Get every 3rd item starting from index 1

### Exercise 3: Basic Data Analysis
Analyze a dataset of sales figures.

In [None]:
# Monthly sales data
sales = [12500, 13200, 11800, 14500, 13900, 15200, 14800, 13600, 12900, 14100, 15500, 16200]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", 
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

print("Monthly Sales Analysis")
print("=" * 25)

# Basic statistics
total_sales = sum(sales)
average_sales = total_sales / len(sales)
min_sales = min(sales)
max_sales = max(sales)

print(f"Total sales: ${total_sales:,}")
print(f"Average monthly sales: ${average_sales:,.0f}")
print(f"Minimum sales: ${min_sales:,}")
print(f"Maximum sales: ${max_sales:,}")
print()

# Find which months had min and max sales
min_month_index = sales.index(min_sales)
max_month_index = sales.index(max_sales)

print(f"Lowest sales month: {months[min_month_index]} (${min_sales:,})")
print(f"Highest sales month: {months[max_month_index]} (${max_sales:,})")
print()

# Find months with above-average sales
above_average_months = []
for i, sale in enumerate(sales):
    if sale > average_sales:
        above_average_months.append(months[i])

print(f"Months with above-average sales: {above_average_months}")

### Exercise 4: Data Transformation
Convert raw data into a more useful format.

In [None]:
# Raw survey data: [age, satisfaction_score, years_experience]
survey_data = [
    [25, 8, 2],
    [32, 7, 8],
    [28, 9, 4],
    [45, 6, 15],
    [23, 8, 1],
    [38, 7, 12]
]

print("Raw survey data:")
for i, person in enumerate(survey_data):
    print(f"Person {i+1}: Age={person[0]}, Satisfaction={person[1]}, Experience={person[2]}")
print()

# Extract individual columns (like you'll do in data science)
ages = [person[0] for person in survey_data]
satisfaction_scores = [person[1] for person in survey_data]
experience_years = [person[2] for person in survey_data]

print(f"Ages: {ages}")
print(f"Satisfaction scores: {satisfaction_scores}")
print(f"Years of experience: {experience_years}")
print()

# Calculate statistics for each column
print("Statistics:")
print(f"Average age: {sum(ages) / len(ages):.1f}")
print(f"Average satisfaction: {sum(satisfaction_scores) / len(satisfaction_scores):.1f}")
print(f"Average experience: {sum(experience_years) / len(experience_years):.1f}")
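As an alternative to writing one comprehension per column, zip(*survey_data) transposes the rows into columns in one step: the * unpacks the outer list so each row becomes a separate argument to zip(). Each column comes back as a tuple rather than a list. A sketch:

```python
survey_data = [
    [25, 8, 2],
    [32, 7, 8],
    [28, 9, 4],
    [45, 6, 15],
    [23, 8, 1],
    [38, 7, 12],
]

ages, satisfaction_scores, experience_years = zip(*survey_data)
print("Ages:", ages)  # (25, 32, 28, 45, 23, 38)

# Tuples work with sum() and len() just like lists
for label, column in [("age", ages), ("satisfaction", satisfaction_scores),
                      ("experience", experience_years)]:
    print(f"Average {label}: {sum(column) / len(column):.1f}")
```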

## Key Takeaways

1. **Lists** are ordered, mutable collections - fundamental to data science
2. **Indexing** starts at 0, negative indices count from the end
3. **Slicing** [start:end:step] is crucial (the end index is excluded); patterns like X[0:3] are very common in ML code
4. **List methods** like append(), extend(), remove() help manipulate data
5. **List comprehensions** provide concise ways to transform data
6. **Tuples** are immutable and good for fixed data like coordinates
7. **Nested lists** represent multi-dimensional data like spreadsheets
8. **String slicing** works the same as list slicing

These concepts are essential for data science. You'll use list slicing, indexing, and comprehensions constantly when working with datasets. In the next notebook, we'll learn about dictionaries and more advanced data manipulation techniques.

---

## 🔧 Common Errors and Troubleshooting

Learning to debug is a crucial skill in data science! Here are common errors you might encounter:

### IndexError: list index out of range
**What it means:** You're trying to access a position that doesn't exist. For a list of length n, the valid indices are 0 to n-1 (or -n to -1 from the end).

In [None]:
# Common Error Examples - Understanding and Fixing Them

# 1. IndexError - Trying to access non-existent index
data = [10, 20, 30]
print(f"List length: {len(data)}")
print(f"Last valid index: {len(data) - 1}")

# This would cause an error: print(data[3])
# Fix: Check list length first
if len(data) > 3:
    print(data[3])
else:
    print("Index 3 doesn't exist in this list")

# 2. ValueError - right type (a string), but a value that int() can't convert
numbers = ["1", "2", "three", "4"]
converted = []
for item in numbers:
    try:
        converted.append(int(item))
    except ValueError:
        print(f"'{item}' cannot be converted to integer")
        converted.append(None)  # mark as missing; a 0 would silently skew averages

print(f"Converted numbers: {converted}")

# 3. Safe list operations
def safe_get_item(lst, index, default=None):
    """Return lst[index], or default if index is out of range (negative indices count as out of range)."""
    if 0 <= index < len(lst):
        return lst[index]
    return default

# Example usage
temperatures = [23.5, 25.1, 22.8]
print(f"Temperature at index 1: {safe_get_item(temperatures, 1)}")
print(f"Temperature at index 10: {safe_get_item(temperatures, 10, 'Not available')}")

---

## 📝 Mini-Challenge: Real Data Processing

### Challenge 1: Student Grade Analyzer
You have a list of student scores. Create a program that:
1. Calculates basic statistics (min, max, average)
2. Finds students above/below average
3. Creates a grade distribution
4. Identifies outliers (scores more than 20 points from the average)

In [None]:
# Challenge 1: Student Grade Analyzer
student_scores = [88, 92, 76, 85, 98, 73, 89, 91, 45, 87, 94, 82, 78, 96, 88, 85, 90, 83, 79, 86]

# 1. Basic statistics
min_score = min(student_scores)
max_score = max(student_scores)
average_score = sum(student_scores) / len(student_scores)

print("=== GRADE ANALYSIS ===")
print(f"Total students: {len(student_scores)}")
print(f"Min score: {min_score}")
print(f"Max score: {max_score}")
print(f"Average score: {average_score:.1f}")

# 2. Above/below average
above_average = [score for score in student_scores if score > average_score]
below_average = [score for score in student_scores if score < average_score]

print(f"\nAbove average ({len(above_average)} students): {above_average}")
print(f"Below average ({len(below_average)} students): {below_average}")

# 3. Grade distribution
# A dictionary maps each letter grade to a count (dictionaries are covered in the next notebook)
grades = {"A": 0, "B": 0, "C": 0, "D": 0, "F": 0}
for score in student_scores:
    if score >= 90:
        grades["A"] += 1
    elif score >= 80:
        grades["B"] += 1
    elif score >= 70:
        grades["C"] += 1
    elif score >= 60:
        grades["D"] += 1
    else:
        grades["F"] += 1

print(f"\nGrade Distribution: {grades}")

# 4. Outliers (scores more than 20 points from average)
outliers = [score for score in student_scores if abs(score - average_score) > 20]
print(f"\nOutliers (>20 points from average): {outliers}")
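Python's standard library statistics module provides mean(), median() and stdev(), which can cross-check the hand-written calculations above. The outlier rule in this sketch (more than two standard deviations from the mean) is a common convention but an assumption here; the challenge itself uses a fixed 20-point threshold.

```python
import statistics

student_scores = [88, 92, 76, 85, 98, 73, 89, 91, 45, 87, 94, 82, 78, 96, 88, 85, 90, 83, 79, 86]

mean_score = statistics.mean(student_scores)
median_score = statistics.median(student_scores)
spread = statistics.stdev(student_scores)   # sample standard deviation

print(f"Mean: {mean_score:.2f}, median: {median_score}, stdev: {spread:.2f}")

# Assumed rule: flag scores more than 2 standard deviations from the mean
outliers_sd = [s for s in student_scores if abs(s - mean_score) > 2 * spread]
print("Outliers (> 2 stdev from mean):", outliers_sd)
```

Here the median (86.5) sits above the mean (84.25) because the single low score of 45 pulls the mean down; the median is less sensitive to outliers.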

---

## ✅ Self-Assessment Checklist

Before moving to the next notebook, make sure you can:

**Lists:**
- [ ] Create lists and access elements by index
- [ ] Use negative indexing to access elements from the end
- [ ] Slice lists to get subsequences
- [ ] Add and remove elements (append, insert, remove, pop)
- [ ] Find the length of a list and check if items exist

**Tuples:**
- [ ] Create tuples and understand their immutability
- [ ] Use tuples for structured data (coordinates, records)
- [ ] Unpack tuples into variables

**Advanced Techniques:**
- [ ] Write list comprehensions for data transformation
- [ ] Handle nested lists and complex data structures
- [ ] Use built-in functions (min, max, sum, sorted)
- [ ] Debug common list errors (IndexError, ValueError)

**Data Science Connection:** These skills are essential for:
- Processing datasets row by row
- Filtering and transforming data
- Extracting specific information from complex data structures
- Preparing data for analysis

---

## 🚀 What's Next?

In the next notebook, you'll learn about:
- **Dictionaries**: Key-value pairs for structured data
- **Data organization**: Grouping and categorizing information
- **JSON data**: Working with real-world data formats
- **Advanced data manipulation**: Combining lists and dictionaries

These skills will prepare you to work with real datasets!