# Session 8: Practical (Loops and Conditionals on Datasets)

## Practical: Using Loops and Conditionals to Analyze Earth’s Temperature Data

### Introduction

In this practical, you will learn how to use loops and conditionals to process and analyze a dataset containing Earth’s global average surface temperature anomalies ([dataset link](https://berkeley-earth-temperature.s3.us-west-1.amazonaws.com/Global/Land_and_Ocean_complete.txt) from [Berkeley Earth](https://berkeleyearth.org/data/)). You will handle data efficiently and perform operations such as filtering, summing, and averaging data using loops and arrays.

### Objectives

- Understand and create arrays from datasets
- Use for and while loops to iterate over arrays
- Use conditionals within loops to process data
- Perform data processing tasks on large datasets
- Solve practice problems to reinforce your understanding

### Prerequisites

Basic knowledge of Python variables, functions, conditionals, arrays, and NumPy.

### Estimated Time: 1.5 hours

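# 1. Working with NumPy Arrays

## 1.1 Creating Arrays from a Dataset

We will use a small sample of the dataset containing Earth’s temperature anomalies. The full dataset includes monthly, annual, five-year, ten-year, and twenty-year anomalies along with their uncertainties. The sample below keeps only the first four columns of each row: year, month, monthly anomaly, and the uncertainty of the monthly anomaly.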


In [2]:
import numpy as np

# Sample dataset
data = """
1850 1 -0.790 0.404
1850 2 -0.243 0.524
1850 3 -0.394 0.449
1850 4 -0.625 0.304
1850 5 -0.652 0.249
1850 6 -0.374 0.295
1850 7 -0.192 0.223
1850 8 -0.179 0.257
1850 9 -0.443 0.225
1850 10 -0.611 0.347
"""

# Parsing the dataset into a NumPy array
data_lines = data.strip().split('\n')
temperature_data = np.array([line.split() for line in data_lines], dtype=object)

# Printing the parsed data
print(temperature_data)


[['1850' '1' '-0.790' '0.404']
 ['1850' '2' '-0.243' '0.524']
 ['1850' '3' '-0.394' '0.449']
 ['1850' '4' '-0.625' '0.304']
 ['1850' '5' '-0.652' '0.249']
 ['1850' '6' '-0.374' '0.295']
 ['1850' '7' '-0.192' '0.223']
 ['1850' '8' '-0.179' '0.257']
 ['1850' '9' '-0.443' '0.225']
 ['1850' '10' '-0.611' '0.347']]


### Explanation:

- **Step 1:** Strip any leading or trailing whitespace from the dataset.
- **Step 2:** Split the dataset into individual lines.
- **Step 3:** Split each line into individual elements and store them in a NumPy array.

*Hint:* Use the `strip()` and `split()` methods to process the data.
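
The three steps above can be followed one at a time on a shorter string. This is a minimal sketch; the names `raw`, `stripped`, `lines`, and `fields` are chosen for illustration and do not appear elsewhere in the practical.

```python
# A two-row string in the same format as the sample dataset
raw = "\n1850 1 -0.790 0.404\n1850 2 -0.243 0.524\n"

# Step 1: remove the leading and trailing newlines
stripped = raw.strip()

# Step 2: split the text into one string per line
lines = stripped.split("\n")

# Step 3: split each line on whitespace into its individual fields
fields = [line.split() for line in lines]

print(lines)
print(fields)
```

Each field is still a string at this point; nothing has been converted to a number yet.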

### Practice Problem 1: Data Parsing (10 minutes)

Write a function `parse_data` that takes a multiline string dataset and returns a NumPy array where each row represents a line of data.


**Solution:**

In [4]:
def parse_data(data):
    data_lines = data.strip().split('\n')
    parsed_data = np.array([line.split() for line in data_lines], dtype=object)
    return parsed_data

# Test the function
parsed_data = parse_data(data)
print(parsed_data)


[['1850' '1' '-0.790' '0.404']
 ['1850' '2' '-0.243' '0.524']
 ['1850' '3' '-0.394' '0.449']
 ['1850' '4' '-0.625' '0.304']
 ['1850' '5' '-0.652' '0.249']
 ['1850' '6' '-0.374' '0.295']
 ['1850' '7' '-0.192' '0.223']
 ['1850' '8' '-0.179' '0.257']
 ['1850' '9' '-0.443' '0.225']
 ['1850' '10' '-0.611' '0.347']]
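
The quotes in the output show that every parsed value is still a string. One possible way to obtain numeric columns, sketched here under the assumption that each row has exactly four fields, is to slice a column and convert it with `astype`. The variable names `rows`, `years`, `months`, `anomalies`, and `uncertainties` are illustrative.

```python
import numpy as np

# Two rows in the same string format that parse_data returns
rows = np.array([["1850", "1", "-0.790", "0.404"],
                 ["1850", "2", "-0.243", "0.524"]], dtype=object)

# rows[:, k] selects column k from every row; astype converts each string
years = rows[:, 0].astype(int)
months = rows[:, 1].astype(int)
anomalies = rows[:, 2].astype(float)
uncertainties = rows[:, 3].astype(float)

print(years, months)
print(anomalies, uncertainties)
```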


# 2. Using Loops with Arrays

## 2.1 for Loop with Arrays

A for loop visits each element of an array in turn. For a two-dimensional array such as `parsed_data`, each element is one row of the dataset.

**Example:**


In [5]:
# Iterating over parsed data
for entry in parsed_data:
    year, month, monthly_anomaly = entry[0], entry[1], entry[2]
    print(f"Year: {year}, Month: {month}, Monthly Anomaly: {monthly_anomaly}")


Year: 1850, Month: 1, Monthly Anomaly: -0.790
Year: 1850, Month: 2, Monthly Anomaly: -0.243
Year: 1850, Month: 3, Monthly Anomaly: -0.394
Year: 1850, Month: 4, Monthly Anomaly: -0.625
Year: 1850, Month: 5, Monthly Anomaly: -0.652
Year: 1850, Month: 6, Monthly Anomaly: -0.374
Year: 1850, Month: 7, Monthly Anomaly: -0.192
Year: 1850, Month: 8, Monthly Anomaly: -0.179
Year: 1850, Month: 9, Monthly Anomaly: -0.443
Year: 1850, Month: 10, Monthly Anomaly: -0.611


**Hint:** Use the for loop to go through each element in the `parsed_data` array. Extract and print specific elements from each entry.
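
A conditional can be placed inside the loop body to act only on some entries. The sketch below collects the months whose anomaly falls below a cutoff; the value of `threshold` and the name `cold_months` are assumptions made for this illustration.

```python
import numpy as np

# The first four rows of the sample dataset, in parsed (string) form
rows = np.array([["1850", "1", "-0.790", "0.404"],
                 ["1850", "2", "-0.243", "0.524"],
                 ["1850", "3", "-0.394", "0.449"],
                 ["1850", "4", "-0.625", "0.304"]], dtype=object)

threshold = -0.5   # hypothetical cutoff, in the same units as the anomalies
cold_months = []

for entry in rows:
    anomaly = float(entry[2])      # convert the string before comparing
    if anomaly < threshold:
        cold_months.append(int(entry[1]))
        print(f"Month {entry[1]} is below the threshold: {anomaly}")

print("Months below threshold:", cold_months)
```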

### Practice Problem 2: Extracting Monthly Anomalies

Write a function `extract_monthly_anomalies` that takes the parsed data and returns a NumPy array of monthly anomalies.

**Hints:**
- Initialize an empty list to store the anomalies.
- Use a for loop to iterate over the parsed data.
- Extract the monthly anomaly (convert it to a float) and append it to the list.
- Convert the list to a NumPy array at the end.

**Solution:**



In [7]:
def extract_monthly_anomalies(data):
    monthly_anomalies = []
    for entry in data:
        monthly_anomaly = float(entry[2])
        monthly_anomalies.append(monthly_anomaly)
    return np.array(monthly_anomalies)

# Test the function
monthly_anomalies = extract_monthly_anomalies(parsed_data)
print("Monthly Anomalies:", monthly_anomalies)


Monthly Anomalies: [-0.79  -0.243 -0.394 -0.625 -0.652 -0.374 -0.192 -0.179 -0.443 -0.611]


## 2.2 while Loop with Arrays

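A while loop repeats as long as its condition is true, which makes it useful when the number of iterations is not known beforehand. In the example below the length of the array is known, so the while loop simply reproduces the for loop with an explicit index.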

**Example:**


In [8]:
# Iterating over parsed data using while loop
index = 0
while index < len(parsed_data):
    entry = parsed_data[index]
    year, month, monthly_anomaly = entry[0], entry[1], entry[2]
    print(f"Year: {year}, Month: {month}, Monthly Anomaly: {monthly_anomaly}")
    index += 1


Year: 1850, Month: 1, Monthly Anomaly: -0.790
Year: 1850, Month: 2, Monthly Anomaly: -0.243
Year: 1850, Month: 3, Monthly Anomaly: -0.394
Year: 1850, Month: 4, Monthly Anomaly: -0.625
Year: 1850, Month: 5, Monthly Anomaly: -0.652
Year: 1850, Month: 6, Monthly Anomaly: -0.374
Year: 1850, Month: 7, Monthly Anomaly: -0.192
Year: 1850, Month: 8, Monthly Anomaly: -0.179
Year: 1850, Month: 9, Monthly Anomaly: -0.443
Year: 1850, Month: 10, Monthly Anomaly: -0.611


**Hint:** Use a variable to keep track of the current index and increment it in each iteration.
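
A while loop becomes more useful when the stopping point depends on the data. As a sketch, the loop below advances until it finds the first anomaly above a cutoff and then stops; the cutoff `threshold = -0.2` is an arbitrary value picked for this illustration.

```python
import numpy as np

# Monthly anomalies from the sample dataset (January to October 1850)
anomalies = np.array([-0.790, -0.243, -0.394, -0.625, -0.652,
                      -0.374, -0.192, -0.179, -0.443, -0.611])

threshold = -0.2   # hypothetical cutoff
index = 0

# Keep going while there is data left and the cutoff has not been exceeded
while index < len(anomalies) and anomalies[index] <= threshold:
    index += 1

if index < len(anomalies):
    print(f"First value above {threshold} is at index {index}: {anomalies[index]}")
else:
    print(f"No value above {threshold} was found")
```

The bounds check comes first in the condition so that `anomalies[index]` is never evaluated with an index past the end of the array.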

### Practice Problem 3: Summing Monthly Anomalies

Write a function `sum_monthly_anomalies` that takes a NumPy array of monthly anomalies and returns their sum using a while loop.

**Hints:**
- Initialize a variable to hold the total sum.
- Use a while loop to iterate over the array.
- Add each element to the total sum.

**Solution:**



In [9]:
def sum_monthly_anomalies(anomalies):
    total = 0
    index = 0
    while index < len(anomalies):
        total += anomalies[index]
        index += 1
    return total

# Test the function
total_anomalies = sum_monthly_anomalies(monthly_anomalies)
print("Total Monthly Anomalies:", total_anomalies)


Total Monthly Anomalies: -4.503


# 3. Performing Data Analysis with Loops and Arrays

## 3.1 Summing Elements in an Array

Write a function `sum_array` that takes a NumPy array of numbers and returns the sum of its elements.

**Hints:**
- Initialize a variable to hold the total sum.
- Use a for loop to iterate over the array and add each element to the total sum.

**Solution:**

In [10]:
def sum_array(numbers):
    total = 0
    for number in numbers:
        total += number
    return total

# Test the function
print("Sum of anomalies:", sum_array(monthly_anomalies))  # Should print the sum of the monthly anomalies


Sum of anomalies: -4.503
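
A loop-based result can be checked against NumPy's built-in `np.sum`. The sketch below repeats `sum_array` so that it runs on its own, and compares the two totals with `np.isclose` rather than `==`, because floating-point sums computed in a different order may differ in the last digits.

```python
import numpy as np

def sum_array(numbers):
    total = 0
    for number in numbers:
        total += number
    return total

anomalies = np.array([-0.790, -0.243, -0.394, -0.625, -0.652,
                      -0.374, -0.192, -0.179, -0.443, -0.611])

loop_total = sum_array(anomalies)
numpy_total = np.sum(anomalies)

# Compare with a tolerance instead of exact equality
print("Totals agree:", np.isclose(loop_total, numpy_total))
```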


## 3.2 Finding the Average of an Array

Write a function `average_array` that takes a NumPy array of numbers and returns the average.

**Hints:**
- Use the `sum_array` function to get the total sum of the array.
- Divide the total sum by the number of elements in the array.

**Solution:**

In [14]:
def average_array(numbers):
    total = sum_array(numbers)
    return total / len(numbers)

# Test the function
print("Average of anomalies:", average_array(monthly_anomalies))  # Should print the average of the monthly anomalies


Average of anomalies: -0.45030000000000003
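
The trailing digits in the output come from binary floating-point arithmetic, not from the data. Two common ways to handle this, sketched below, are to round or format the value for display and to compare values with a tolerance. The variable names are illustrative.

```python
import math

mean = -0.45030000000000003   # the value printed above

# For display: round, or format to a fixed number of decimal places
rounded = round(mean, 4)
formatted = f"{mean:.4f}"
print(rounded, formatted)

# For comparison: allow a small tolerance instead of using ==
print(math.isclose(mean, -0.4503))
```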


## 3.3 Filtering Data

Write a function `filter_negative_anomalies` that takes a NumPy array of anomalies and returns a NumPy array of only the negative anomalies.

**Hints:**
- Initialize an empty list to store the negative anomalies.
- Use a for loop to iterate over the anomalies.
- Use an if condition to check if an anomaly is negative, and if so, append it to the list.
- Convert the list to a NumPy array at the end.

**Solution:**

In [13]:
def filter_negative_anomalies(anomalies):
    negative_anomalies = []
    for anomaly in anomalies:
        if anomaly < 0:
            negative_anomalies.append(anomaly)
    return np.array(negative_anomalies)

# Test the function
negative_anomalies = filter_negative_anomalies(monthly_anomalies)
print("Negative Anomalies:", negative_anomalies)  # Every value in the sample is negative, so all ten are returned


Negative Anomalies: [-0.79  -0.243 -0.394 -0.625 -0.652 -0.374 -0.192 -0.179 -0.443 -0.611]
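
Because every value in the sample is negative, the filter returns the whole array. The sketch below uses made-up values that include positives to show the filter removing something, and compares the loop with NumPy's boolean indexing, which gives the same result in one line. The array `mixed` is hypothetical data, not taken from the dataset.

```python
import numpy as np

def filter_negative_anomalies(anomalies):
    negative_anomalies = []
    for anomaly in anomalies:
        if anomaly < 0:
            negative_anomalies.append(anomaly)
    return np.array(negative_anomalies)

# Hypothetical values, chosen to include both signs
mixed = np.array([-0.79, 0.12, -0.243, 0.05])

loop_result = filter_negative_anomalies(mixed)

# mixed < 0 is an array of True/False; indexing with it keeps the True positions
mask_result = mixed[mixed < 0]

print(loop_result)
print(mask_result)
```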


## 3.4 Counting Occurrences of Anomalies

Write a function `count_anomalies` that takes a NumPy array of anomalies and returns the number of occurrences of each unique anomaly, using only basic structures such as lists and loops.

**Hints:**
- Initialize an empty list for unique anomalies.
- Initialize an empty list for counts.
- Use a for loop to iterate over the anomalies.
- Use another for loop to check if the anomaly is already in the list of unique anomalies.
- If it is, increment the corresponding count.
- If it isn’t, append it to the list of unique anomalies and add a count of 1.

**Solution:**

In [15]:
def count_anomalies(anomalies):
    unique_anomalies = []
    counts = []
    for anomaly in anomalies:
        found = False
        for i in range(len(unique_anomalies)):
            if unique_anomalies[i] == anomaly:
                counts[i] += 1
                found = True
                break
        if not found:
            unique_anomalies.append(anomaly)
            counts.append(1)
    return np.array(unique_anomalies), np.array(counts)

# Test the function
unique_anomalies, counts = count_anomalies(monthly_anomalies)
print("Unique Anomalies:", unique_anomalies)
print("Counts:", counts)


Unique Anomalies: [-0.79  -0.243 -0.394 -0.625 -0.652 -0.374 -0.192 -0.179 -0.443 -0.611]
Counts: [1 1 1 1 1 1 1 1 1 1]
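
Since the ten sample values are all different, every count is 1. For comparison, NumPy's `np.unique` with `return_counts=True` does the same job; the sketch below runs it on made-up values that contain repeats. Note one difference: `np.unique` returns the unique values in sorted order, whereas the loop above keeps them in order of first appearance.

```python
import numpy as np

# Hypothetical values with repeats, not taken from the dataset
values = np.array([-0.5, -0.2, -0.5, -0.1, -0.2, -0.5])

unique_values, value_counts = np.unique(values, return_counts=True)

for value, count in zip(unique_values, value_counts):
    print(f"{value} appears {count} time(s)")
```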


## Summary

In this practical, you learned how to:
- Parse a multiline string dataset into a NumPy array.
- Use for and while loops to iterate over arrays.
- Perform various data processing tasks such as summing, averaging, and filtering data using loops and arrays.

By working through these examples and practice problems, you have gained a deeper understanding of how to handle and analyze datasets using basic Python constructs.

## Practice Problems for Further Learning

### Finding Maximum and Minimum Anomalies
Write a function `find_max_min_anomalies` that returns the maximum and minimum monthly anomalies from the dataset.

**Solution:**

In [16]:
def find_max_min_anomalies(anomalies):
    max_anomaly = np.max(anomalies)
    min_anomaly = np.min(anomalies)
    return max_anomaly, min_anomaly
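
The solution above relies on `np.max` and `np.min`. In keeping with the theme of this practical, the same result can be sketched with a loop and conditionals. The function name `find_max_min_loop` is introduced here for illustration, and the sketch assumes the array is not empty.

```python
import numpy as np

def find_max_min_loop(anomalies):
    # Start from the first element, then compare against the rest
    max_anomaly = anomalies[0]
    min_anomaly = anomalies[0]
    for anomaly in anomalies[1:]:
        if anomaly > max_anomaly:
            max_anomaly = anomaly
        elif anomaly < min_anomaly:
            min_anomaly = anomaly
    return max_anomaly, min_anomaly

anomalies = np.array([-0.790, -0.243, -0.394, -0.625, -0.652,
                      -0.374, -0.192, -0.179, -0.443, -0.611])

max_anomaly, min_anomaly = find_max_min_loop(anomalies)
print("Max:", max_anomaly, "Min:", min_anomaly)
```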

### Calculating Standard Deviation
Write a function `calculate_std_deviation` that calculates the standard deviation of the monthly anomalies.

**Solution:**

In [17]:
def calculate_std_deviation(anomalies):
    mean_anomaly = np.mean(anomalies)
    variance = np.mean((anomalies - mean_anomaly) ** 2)
    std_deviation = np.sqrt(variance)
    return std_deviation
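
The function above divides by the number of values, which is the population standard deviation. That matches `np.std` with its default `ddof=0`; passing `ddof=1` gives the sample standard deviation instead. The sketch below repeats the function so it runs on its own and checks both.

```python
import numpy as np

def calculate_std_deviation(anomalies):
    mean_anomaly = np.mean(anomalies)
    variance = np.mean((anomalies - mean_anomaly) ** 2)
    return np.sqrt(variance)

anomalies = np.array([-0.790, -0.243, -0.394, -0.625, -0.652,
                      -0.374, -0.192, -0.179, -0.443, -0.611])

population_std = calculate_std_deviation(anomalies)
sample_std = np.std(anomalies, ddof=1)   # divides by n - 1 instead of n

print("Matches np.std:", np.isclose(population_std, np.std(anomalies)))
print("Population:", population_std, "Sample:", sample_std)
```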

### Normalizing Anomalies
Write a function `normalize_anomalies` that normalizes the anomalies to a range of 0 to 1.

**Solution:**

In [18]:
def normalize_anomalies(anomalies):
    min_anomaly = np.min(anomalies)
    max_anomaly = np.max(anomalies)
    normalized_anomalies = (anomalies - min_anomaly) / (max_anomaly - min_anomaly)
    return normalized_anomalies
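
If every value in the array is the same, `max_anomaly - min_anomaly` is zero and the division above fails to produce meaningful numbers. One possible guard is sketched below; returning all zeros in that case is a design choice made for this illustration, and the name `normalize_anomalies_safe` is hypothetical.

```python
import numpy as np

def normalize_anomalies_safe(anomalies):
    min_anomaly = np.min(anomalies)
    max_anomaly = np.max(anomalies)
    value_range = max_anomaly - min_anomaly
    if value_range == 0:
        # All values are identical; return zeros rather than dividing by zero
        return np.zeros_like(anomalies, dtype=float)
    return (anomalies - min_anomaly) / value_range

anomalies = np.array([-0.790, -0.243, -0.394, -0.625, -0.652,
                      -0.374, -0.192, -0.179, -0.443, -0.611])

normalized = normalize_anomalies_safe(anomalies)
constant = normalize_anomalies_safe(np.array([-0.5, -0.5, -0.5]))

print(normalized)
print(constant)
```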