# Python Programming Revision

This notebook includes the Python Programming Revision lecture and examples of useful Python programming concepts. These examples aim to familiarize students with Python fundamentals and introduce scientific libraries, such as NumPy, Matplotlib, and Seaborn, which will be used in subsequent lectures.

The link to the GitHub repository: https://github.uio.no/milenpa/IN-STK1050

## Learning aims

- Set up and manage virtual environments
- Work effectively within Jupyter Notebooks
- Review useful Python features for data science
- Perform numerical operations using the NumPy library
- Create visualisations using Matplotlib and Seaborn


## Remember

The course content might sometimes deviate from best Software Engineering practices. The primary goal of the code examples is to illustrate statistical concepts and build intuition through simulations, rather than adhering strictly to engineering standards.


## What is a Jupyter Notebook?

- Interactive Python environment for running code.
- Combines code, output, and documentation in one place.
- Supports rich outputs: text, images, plots, and interactive visualizations.
- Ideal for live coding, testing, and quick feedback.
- Allows exporting and sharing notebooks as HTML, PDF, or .ipynb files.


## How to run Jupyter Notebooks?

- UiO JupyterHub: https://www.uio.no/tjenester/it/utdanning/jupyter/
- Hosted Jupyter Services:
    - Binder: https://mybinder.org/
    - Google Colab: https://colab.research.google.com/
    - Kaggle: https://www.kaggle.com/code
- Local setup: recommended inside an isolated environment managed with a tool such as uv, Conda, or venv.

## What is uv?

- A package and project manager.
- Helps manage dependencies and libraries in isolated environments.
- Keeps projects organized by avoiding version conflicts.
- Allows different projects to use different versions of Python or libraries.


To set up the environment on your local machine, follow these steps:

1. Clone the GitHub repository for the course:
```
git clone https://github.uio.no/milenpa/IN-STK1050.git
```

Alternatively, use the link above to clone the repository directly from your IDE.

2. Install uv, then create the environment with all required packages from inside the cloned repository:
```
curl -LsSf https://astral.sh/uv/install.sh | sh
cd IN-STK1050
uv sync
```

To start a Python console in the environment and check the installed NumPy version:
```
(in-stk1050) user@computer ~ % python
Python 3.12.11 (main, Sep 18 2025, 19:41:45) [Clang 20.1.4 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.__version__
'2.4.1'
```

Other environment managers:
- venv (https://docs.python.org/3/library/venv.html)
- conda (https://www.anaconda.com/docs/getting-started/main)
- more on uv: https://docs.astral.sh/uv/
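
To confirm from Python that the interpreter belongs to an isolated environment rather than the system installation, one option is to compare `sys.prefix` with `sys.base_prefix`. This is a minimal sketch: the variable name `in_virtual_env` is our own, and the check applies to venv-style environments (such as the `.venv` that uv creates), not to Conda environments.

```python
import sys

# In a venv-style environment, sys.prefix points to the environment,
# while sys.base_prefix points to the Python installation it was created from.
in_virtual_env = sys.prefix != sys.base_prefix

print(f"Python executable: {sys.executable}")
print(f"Running inside a virtual environment: {in_virtual_env}")
```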


## Useful Python features for data science

In this section, we will cover:
- f-strings,
- lists and dictionaries (a recap),
- list and dict comprehensions,
- functions,
- type hints.

## F-strings

F-strings provide an easy and efficient way to format strings in Python. They allow embedding expressions inside string literals, using curly braces {}.

In [None]:
# Classic Hello World example:
print("Hello World!")

In [None]:
# Using f-strings:
name = "Alice"
score = 90
print(f"{name} scored {score} in the test.")  # Output: Alice scored 90 in the test.

In [None]:
# Complex expression inside f-string:
print(f"Half of {score} is {score / 2}.")
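
Expressions inside the curly braces can also be followed by a format specifier after a colon, which controls how the value is displayed. The short sketch below shows a few common specifiers; the variable `ratio` is introduced only for illustration.

```python
name = "Alice"
score = 90
ratio = 2 / 3

print(f"{ratio:.2f}")   # Output: 0.67 (two decimal places)
print(f"{ratio:.1%}")   # Output: 66.7% (as a percentage)
print(f"{name:>10}|")   # Output:      Alice| (right-aligned in a field of width 10)
print(f"{score=}")      # Output: score=90 (debugging shorthand, Python 3.8+)
```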

## Working with Lists

Lists are ordered, mutable collections used to store a sequence of elements. They allow for operations like appending, accessing, and slicing.


In [None]:
# Creating and adding values to the list:
my_list = [10, 20, 30, 40, 50]
print(my_list)  # Output: [10, 20, 30, 40, 50]
my_list.append(60)
print(my_list)  # Output: [10, 20, 30, 40, 50, 60]

# Deleting the fifth element (index 4):
my_list.pop(4)
print(my_list)  # Output: [10, 20, 30, 40, 60]
# Deleting a specific element from the list:
my_list.remove(60)
print(my_list)  # Output: [10, 20, 30, 40]

In [None]:
# Accessing Elements and Slicing:
print(my_list)
print(my_list[1])  # Output: 20 (second element)
print(my_list[-1])  # Output: 40 (last element)
print(my_list[:3])  # Output: [10, 20, 30] (first three elements)
print(my_list[-2:])  # Output: [30, 40] (last two elements)
print(my_list[1:3])  # Output: [20, 30] (second and third elements)
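
Because lists are mutable, it matters whether two names refer to the same list or to separate copies. The sketch below, using illustrative names (`numbers`, `alias`, `snapshot`), shows that plain assignment creates a second name for the same list, while a full slice creates a new list.

```python
numbers = [10, 20, 30, 40]

alias = numbers        # second name for the same list object
snapshot = numbers[:]  # a full slice creates a new (shallow) copy

numbers[0] = 99        # lists are mutable: change an element in place

print(alias)         # Output: [99, 20, 30, 40] (the alias sees the change)
print(snapshot)      # Output: [10, 20, 30, 40] (the copy does not)
print(numbers[::2])  # Output: [99, 30] (every second element)
```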

## Working with Dictionaries

Dictionaries store key-value pairs. They are mutable and, since Python 3.7, preserve insertion order, which makes them ideal for mapping relationships (like names to scores).


In [None]:
# Creating and accessing a dictionary:
student_scores = {"Alice": 85, "Bob": 92, "Charlie": 78}

print(student_scores)  # Output: {'Alice': 85, 'Bob': 92, 'Charlie': 78}
print(student_scores["Bob"])  # Output: 92 (Bob's score)
print(student_scores.get("Charlie"))  # Output: 78 (Charlie's score)
print(student_scores.values()) # Output: dict_values([85, 92, 78])
print(student_scores.keys()) # Output: dict_keys(['Alice', 'Bob', 'Charlie'])
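
The difference between indexing and `.get()` shows up when a key is missing. The following sketch uses the same dictionary as above; the name "Zoe" is a made-up key that is not in the dictionary.

```python
student_scores = {"Alice": 85, "Bob": 92, "Charlie": 78}

# Indexing a missing key raises KeyError; .get() returns a default instead.
print(student_scores.get("Zoe"))     # Output: None
print(student_scores.get("Zoe", 0))  # Output: 0

# Membership tests check the keys.
print("Bob" in student_scores)       # Output: True

# Iteration follows insertion order (guaranteed since Python 3.7).
print(list(student_scores))          # Output: ['Alice', 'Bob', 'Charlie']
```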

### Updating and iterating over dictionaries

In [None]:
# Adding a new key-value pair:
student_scores["David"] = 90
print(student_scores)  # Output: {'Alice': 85, 'Bob': 92, 'Charlie': 78, 'David': 90}

# Updating an existing value:
student_scores["Charlie"] = 80
print(student_scores)  # Output: {'Alice': 85, 'Bob': 92, 'Charlie': 80, 'David': 90}

# Deleting a key-value pair:
del student_scores["Alice"]
print(student_scores)  # Output: {'Bob': 92, 'Charlie': 80, 'David': 90}

In [None]:
# Iterating over a dictionary with for loop:
for key, value in student_scores.items():
    print(f"{key} scored {value}")

## List comprehension

List comprehensions offer a concise way to create lists in Python. They provide an elegant and efficient alternative to using loops for generating lists.
They are often used to apply an expression to each item in an iterable (such as a list or range) or to filter elements based on a condition.


In [None]:
my_list = list(range(1, 6))
print(my_list)

# find squares of all elements in the list:
squares = []
for x in my_list:
    squares.append(x**2)

print(squares)

In [None]:
# the same with list comprehension:
squares = [x**2 for x in my_list]
print(squares)

### Filtering a list

In [None]:
# filtering a list in a for loop:
even_numbers = []
for x in my_list:
    if x % 2 == 0:
        even_numbers.append(x)

print(even_numbers)

# list comprehension for filtering:
even_numbers = [x for x in my_list if x % 2 == 0]
print(even_numbers)

### Conditional processing of elements of the list

In [None]:
# Conditional processing of elements of the list:
even_squares = []
for x in my_list:
    if x % 2 == 0:
        even_squares.append(x**2)
print(even_squares)

# List comprehension with condition:
even_squares = [x**2 for x in range(1, 6) if x % 2 == 0]
print(even_squares)  # Output: [4, 16]
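
A comprehension can contain a condition in two different places, and the two forms do different things. As a small sketch (the names `odd_numbers` and `labels` are illustrative): an `if` after the loop filters elements, while an `if ... else ...` expression before the loop chooses the value for every element.

```python
numbers = list(range(1, 6))

# Filter: `if` after the loop decides which elements are kept.
odd_numbers = [x for x in numbers if x % 2 == 1]
print(odd_numbers)  # Output: [1, 3, 5]

# Transform: `... if ... else ...` before the loop decides each element's value.
labels = ["even" if x % 2 == 0 else "odd" for x in numbers]
print(labels)  # Output: ['odd', 'even', 'odd', 'even', 'odd']
```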

### Exercise: student grades with lists

Given a list of student scores, first keep only the scores above 60, and then convert the remaining scores to grades as follows: "A" for scores above 90, "B" for scores above 75, and "C" for all other remaining scores.

In [None]:
scores = [55, 81, 31, 78, 93, 61]

# TODO: fill this
scores_passed = None
grades = None

print(scores_passed)
print(grades)

## Dict comprehension

- Dict comprehensions provide a concise way to create dictionaries in Python, just like list comprehensions do for lists.
- They allow you to generate dictionaries from an iterable by specifying both the keys and the values in a single, readable line of code.
- You can also apply conditions to filter which key-value pairs are added to the dictionary.


In [None]:
# Make a dict where keys are numbers 1 to 5, and values are their squares:
squares_dict = {}
for x in range(1, 6):
    squares_dict[x] = x**2

print(squares_dict)

# Dict comprehension:
squares_dict = {x: x**2 for x in range(1, 6)}
print(squares_dict)

### Dict comprehension with condition

In [None]:
# The same task with only even numbers in the given range:
even_squares_dict = {}
for x in range(1, 6):
    if x % 2 == 0:
        even_squares_dict[x] = x**2

print(even_squares_dict)

# Dict comprehension with condition:
even_squares_dict = {x: x**2 for x in range(1, 6) if x % 2 == 0}
print(even_squares_dict)


### Swapping keys and values in a dict

In [None]:
# Swap keys and values in an existing dictionary
original_dict = {'a': 1, 'b': 2, 'c': 3}
swapped_dict = {v: k for k, v in original_dict.items()}
print(f"Original dict: {original_dict}")
print(f"Swapped dict: {swapped_dict}")  # Output: {1: 'a', 2: 'b', 3: 'c'}
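
Swapping keys and values only preserves all entries when the values are unique. The sketch below uses hypothetical data (`letter_grades`) to show what happens with repeated values, and one possible way to keep every name by grouping them in lists.

```python
letter_grades = {"Alice": "A", "Bob": "B", "Eve": "A"}  # hypothetical data

# Keys must be unique, so the later "A" entry overwrites the earlier one.
swapped = {grade: name for name, grade in letter_grades.items()}
print(swapped)  # Output: {'A': 'Eve', 'B': 'Bob'}

# Grouping the names per grade keeps all of them.
names_by_grade = {}
for name, grade in letter_grades.items():
    names_by_grade.setdefault(grade, []).append(name)
print(names_by_grade)  # Output: {'A': ['Alice', 'Eve'], 'B': ['Bob']}
```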

### Exercise: student grades with dicts

Given a dictionary with students' names and scores, make a new dictionary only with students that have passed the exam (score > 60) and map each student score to a grade (score > 90 -> "A", score > 75 -> "B", score > 60 -> "C").

In [None]:
scores = {"Alice": 55, "Bob": 81, "Charlie": 31, "Diana": 78, "Eve": 93, "Helen": 61}

# TODO: implement this
passed_scores = None
grades = None

print(passed_scores)
print(grades)

### Exercise: student grades with nested dicts

Starting from the scores, make a dictionary that will have information for each student if they passed/failed and the score they got.

In [None]:
# TODO: implement this
grades = None

print(grades)
print(grades['Eve'])

## Functions

- We are used to thinking of functions as reusable blocks of code that can accept input (arguments), process it, and return output. They help in organizing and simplifying your code.
- Python functions support default arguments, multiple arguments, multiple return values, and lambda (anonymous) functions.
- Functions in mathematics (and statistics) are a way to map one value to another (think of f(x)) - this is similar to dicts in Python, except that mathematical functions can map an infinite number of values.
- Python functions (and functions in programming) also map one value (set of arguments) to another value (return value). They can even map an infinite number of possible input values (unlike a dict).


In [None]:
# Programming function (not a typical mathematical function):
def greet(name):
    return f"Hello, {name}!"

print(greet("Alice"))  # Output: Hello, Alice!

# a dict is a mapping and thus a kind of f(x)
squares = {
    1: 1,
    2: 4,
    3: 9
}

# the same without the limit of possible inputs
def square(x):
    return x**2

print(square(2))
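
The difference between the two kinds of mapping becomes visible for an input that was not listed in advance. A minimal sketch, reusing the `squares` dict and the `square` function from the cell above:

```python
squares = {1: 1, 2: 4, 3: 9}

def square(x):
    return x**2

print(squares[3], square(3))  # Output: 9 9

# The dict only knows the inputs it was given ...
try:
    squares[4]
except KeyError as error:
    print(f"KeyError: {error}")  # Output: KeyError: 4

# ... while the function computes a value for any number.
print(square(4))     # Output: 16
print(square(-2.5))  # Output: 6.25
```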

## Type hints

- Python is a dynamically typed language: types are checked at runtime
- **Type hints** allow you to annotate expected types, even though the annotated types are not enforced at runtime
- Type hints improve readability and documentation of the code, especially in larger codebases
- We will use them here (and encourage you to use them) to be precise about the nature of various statistical concepts

In [None]:
# argument name is expected to be a string and the function returns a string
def greet(name: str) -> str: 
    return f"Hello, {name}!"

return_value = greet("Alice")
print(return_value)
print(type(return_value))
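
To see that type hints are not enforced at runtime, the function can be called with an argument of a different type. This is a small illustration only; a static type checker such as mypy would report the call below, but Python itself runs it without complaint.

```python
def greet(name: str) -> str:
    return f"Hello, {name}!"

# The annotation says `str`, but Python does not check it when the call runs.
print(greet(42))  # Output: Hello, 42!

# The hints are stored on the function and can be inspected.
print(greet.__annotations__)  # Output: {'name': <class 'str'>, 'return': <class 'str'>}
```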

In [None]:
# Functions with default arguments:
def greet(name: str = "Alice"):
    return f"Hello, {name}!"

print(greet())  # Output: Hello, Alice!
print(greet("Bob"))  # Output: Hello, Bob!

In [None]:
# Functions with multiple arguments:
def greet(name: str, message: str):
    return f"{message}, {name}!"

print(greet("Alice", "Good Morning"))  # Output: Good Morning, Alice!

### Exercise: square function with type hints

Write the square function with type hints to map from float to float.

## Packing and unpacking with functions

In [None]:
def get_student_info():
    name = "Alice"
    score = 85
    return name, score

student_name, student_score = get_student_info()
print(f"{student_name} scored {student_score}")  # Output: Alice scored 85
student_information = get_student_info()
print(f"{student_information[0]} scored {student_information[1]}")  # Output: Alice scored 85
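
Returning several values really returns a single tuple (packing), which the caller can split into separate names (unpacking). The sketch below uses a hypothetical helper `get_min_max` and also shows a starred name, which collects the remaining elements into a list.

```python
def get_min_max(values):
    return min(values), max(values)  # packed into one tuple

result = get_min_max([3, 1, 4, 1, 5])
print(result)  # Output: (1, 5)

lowest, highest = result  # unpacked into two names
print(lowest, highest)  # Output: 1 5

# A starred name collects whatever is left over into a list.
first, *rest = [10, 20, 30, 40]
print(first, rest)  # Output: 10 [20, 30, 40]
```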

### Exercise: switch two values

Write **one line** to switch the two values.

In [None]:
x = 1
y = 2

# TODO: switch the values

print(x, y)

## Lambda functions

Lambda functions are small anonymous functions, often used for short operations such as a sort key.

In [None]:
# Lambda functions:
multiply = lambda x, y: x * y
print(multiply(5, 4))  # Output: 20

# Using lambda function with sorting
points = [(1, 2), (3, 1), (5, 4), (2, 3)]
points_sorted = sorted(points, key=lambda point: point[0])  # Sort by x-coordinate
print(points_sorted)
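
The `key` argument accepts any function of one element, so the same pattern works with other criteria and with other built-ins such as `max`. A short sketch on the same list of points (the helper name `y_coordinate` is ours):

```python
points = [(1, 2), (3, 1), (5, 4), (2, 3)]

# Sort by y-coordinate instead of x-coordinate.
print(sorted(points, key=lambda point: point[1]))
# Output: [(3, 1), (1, 2), (2, 3), (5, 4)]

# Point with the largest coordinate sum.
print(max(points, key=lambda point: point[0] + point[1]))  # Output: (5, 4)

# The same key written as a named function gives the same ordering.
def y_coordinate(point):
    return point[1]

print(sorted(points, key=y_coordinate))
# Output: [(3, 1), (1, 2), (2, 3), (5, 4)]
```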

# Libraries for Data Science
In this section, you will learn about Python libraries used in Data Science, such as NumPy, Matplotlib, and Seaborn.

## Importing libraries

- Importing libraries allows you to use additional functionality from external modules and packages.
- Python has a rich ecosystem of libraries for various tasks such as data manipulation, visualization, and scientific computing.
- You can import entire libraries, specific functions, or give aliases for easier usage.


In [None]:
# Importing libraries:
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Random numbers and operations in Python (random library)

- The random module in Python provides functions to generate random numbers and select random elements. It is widely used in simulations, randomized testing, and creating random datasets for data analysis.
- You can generate random numbers, select random elements from lists, and control reproducibility with a seed.


In [None]:
# You can set a seed to generate the same random numbers:
random.seed(0)

# Generate a random float:
print(random.random())  # Output: Random float between 0 and 1
print(random.uniform(1, 10))  # Output: Random float between 1 and 10
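
Reproducibility with a seed can be checked directly: resetting the same seed restarts the same sequence of numbers. A minimal sketch, where the seed value 42 and the names `first_run` and `second_run` are arbitrary choices:

```python
import random

random.seed(42)
first_run = [random.random() for _ in range(3)]

random.seed(42)  # resetting the seed restarts the same sequence
second_run = [random.random() for _ in range(3)]

print(first_run == second_run)  # Output: True
```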

In [None]:
# Generate a random integer:
print(random.randint(1, 10))  # Output: Random integer between 1 and 10

### Sampling from a list

In [None]:
# Generate a random sample from a list:
my_list = [10, 20, 30, 40, 50]

# Sampling without replacement: already sampled element is not available to sample again in the same function call
print(random.sample(my_list, 2))  # Output: Random sample of 2 elements [no replacement]

print(random.choice(my_list))  # Output: Random choice from the list

# Sampling with replacement: each of the k elements is chosen from the whole list
print(random.choices(my_list, k=2))  # Output: Random choice of 2 elements [with replacement]
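
One practical consequence of the two sampling modes is the maximum sample size. The sketch below assumes the same five-element list: sampling without replacement can return at most all five elements (a random permutation), while sampling with replacement can return any number of elements.

```python
import random

my_list = [10, 20, 30, 40, 50]

without_replacement = random.sample(my_list, 5)  # a random permutation
print(sorted(without_replacement) == my_list)    # Output: True

# Asking for more elements than the list holds fails without replacement ...
try:
    random.sample(my_list, 6)
except ValueError as error:
    print(f"ValueError: {error}")

# ... but works with replacement, because elements may repeat.
with_replacement = random.choices(my_list, k=6)
print(len(with_replacement))                     # Output: 6
```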

### Exercise: daily temperatures

Write a function to simulate daily temperature given the baseline temperature. Assume that all temperatures +/- 5 degrees around the baseline temperature are equally likely.

## Introduction to NumPy

NumPy (Numerical Python) is a powerful library for numerical computations. It provides support for multi-dimensional arrays and functions for numerical operations on these arrays.


### Creating NumPy arrays

NumPy arrays store elements of a single type in one block of memory, which makes them more efficient than Python lists for numerical operations.

In [None]:
# Creating Numpy arrays:
# 1D array (vector)
arr_1d = np.array([1, 2, 3, 4, 5])
print("1D array:\n", arr_1d)

# 2D array (matrix)
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("\n2D array:\n", arr_2d)

# 3D array (tensor)
arr_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print("\n3D array:\n", arr_3d)

### Properties of arrays

In [None]:
# Checking the type of the arrays
print(type(arr_1d))  # Output: <class 'numpy.ndarray'>
print(type(arr_2d))  # Output: <class 'numpy.ndarray'>

In [None]:
# Shape of the arrays
print("Shape of 1D array:", arr_1d.shape)  # Output: (5,)
print("Shape of 2D array:", arr_2d.shape)  # Output: (3, 3)
print("Shape of 3D array:", arr_3d.shape)  # Output: (2, 2, 2)

In [None]:
# Number of dimensions (ndim) of the arrays
print("Dimensions of 1D array:", arr_1d.ndim)  # Output: 1
print("Dimensions of 2D array:", arr_2d.ndim)  # Output: 2
print("Dimensions of 3D array:", arr_3d.ndim)  # Output: 3

### Accessing elements of arrays

In [None]:
# Accessing elements in a 1D array
print(f"Element at index 2 in 1D array: {arr_1d[2]}")  # Output: 3

# Accessing elements in a 2D array
print(f"\nElement at row 1, col 2 in 2D array: {arr_2d[1, 2]}")  # Output: 6

# Slicing 1D array
print(f"\nSlicing 1D array [1:4]: {arr_1d[1:4]}")  # Output: [2 3 4]

# Slicing 2D array
print(f"\nSlicing 2D array [0:2, 1:3]:\n{arr_2d[0:2, 1:3]}")
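
Besides integer indices and slices, NumPy arrays can be indexed with a boolean array of the same shape (a mask), which selects the elements where the mask is True. This sketch uses an illustrative array; the same idea is useful for counting or selecting values that satisfy a condition.

```python
import numpy as np

arr = np.array([3, -1, 4, -1, 5])

mask = arr > 0       # element-wise comparison gives a boolean array
print(mask)          # Output: [ True False  True False  True]
print(arr[mask])     # Output: [3 4 5]
print(np.sum(mask))  # Output: 3 (True counts as 1)
```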

## Vectorization with NumPy

Vectorization refers to performing operations on entire arrays or matrices (vectors) without explicit loops. It allows you to perform element-wise operations in bulk, making the code more efficient and faster compared to using traditional loops.


In [None]:
# Vectorization
# Without vectorization (using a Python loop)
arr = np.array([1, 2, 3, 4, 5])
squared = np.zeros_like(arr)

for i in range(len(arr)):
    squared[i] = arr[i] ** 2

print(squared)  # Output: [ 1  4  9 16 25]

# With vectorization (using NumPy's array operations)
squared_vectorized = arr ** 2
print(squared_vectorized)  # Output: [ 1  4  9 16 25]
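
The speed difference can be measured with the standard `timeit` module. The following is a rough sketch, not a careful benchmark: the array size, the number of repetitions, and the function names are our own choices, and the measured times depend on the machine, so no specific numbers are shown.

```python
import timeit
import numpy as np

values = np.arange(10_000)

def square_with_loop(arr):
    result = np.zeros_like(arr)
    for i in range(len(arr)):
        result[i] = arr[i] ** 2
    return result

def square_vectorized(arr):
    return arr ** 2

# Both versions give the same result ...
print(np.array_equal(square_with_loop(values), square_vectorized(values)))  # Output: True

# ... but the running times differ (exact numbers depend on the machine).
loop_time = timeit.timeit(lambda: square_with_loop(values), number=5)
vectorized_time = timeit.timeit(lambda: square_vectorized(values), number=5)
print(f"loop: {loop_time:.4f} s, vectorized: {vectorized_time:.4f} s")
```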

### Basic operations on NumPy arrays

In [None]:
# Basic operations on Numpy arrays:

# Element-wise addition
arr_sum = arr_1d + 2  # Adds 2 to each element
print(f"1D array after addition:\n{arr_sum}")

# Note the difference with Python lists:
list1 = [1, 2, 3]
list2 = [5]
list_sum = list1 + list2
print(f"\n\"Sum\" of two lists: {list_sum}")

In [None]:
# Element-wise multiplication
arr_product = arr_2d * 2  # Multiplies each element by 2
print(f"2D array after multiplication:\n{arr_product}")
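
Element-wise operations also work between two arrays. As a short sketch with illustrative arrays: arrays of the same shape are combined element by element, and a 1D array whose length matches the number of columns is applied to every row of a 2D array (NumPy calls this broadcasting).

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

print(a + b)  # Output: [11 22 33]
print(a * b)  # Output: [10 40 90]

matrix = np.array([[1, 2, 3], [4, 5, 6]])
# The 1D array is applied to every row of the 2D array (broadcasting).
print(matrix + b)
# Output:
# [[11 22 33]
#  [14 25 36]]
```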

## Useful NumPy functions

In [None]:
# Array of evenly spaced values (like a range)
arr_range = np.arange(0, 10, 2)
print(f"Array with range 0 to 10 with step 2: {arr_range}")

# Array of random values
np.random.seed(0)
arr_random = np.random.random((2, 3))
print(f"Array of random values:\n{arr_random}")
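
Two related tools are worth knowing alongside these. `np.linspace` creates evenly spaced values from a number of points instead of a step size, and `np.random.default_rng` creates a seeded random generator object, which the NumPy documentation recommends for new code. The sketch below uses arbitrary example values.

```python
import numpy as np

# np.arange takes a step size; np.linspace takes the number of points
# and includes the end point by default.
print(np.arange(0, 1, 0.25))  # Output: [0.   0.25 0.5  0.75]
print(np.linspace(0, 1, 5))   # Output: [0.   0.25 0.5  0.75 1.  ]

# A seeded Generator object keeps its own random state.
rng = np.random.default_rng(0)
sample = rng.uniform(-5, 5, size=3)  # three floats in [-5, 5)
print(sample.shape)           # Output: (3,)
```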

In [None]:
# Numpy functions for numerical analysis
arr = np.array([1, 2, 3, 4, 5])

print(np.mean(arr))  # Output: 3.0 (mean)
print(np.std(arr))   # Output: 1.414... (standard deviation)
print(np.sum(arr))   # Output: 15 (sum of all elements)
print(np.max(arr))   # Output: 5 (maximum value)
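
Two details of these functions matter in statistics. For multi-dimensional arrays, the `axis` argument selects the direction along which the summary is computed. For the standard deviation, `np.std` divides by n by default, and `ddof=1` gives the sample standard deviation, which divides by n - 1. A small sketch with illustrative data:

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

print(np.mean(table))          # Output: 3.5 (all elements)
print(np.mean(table, axis=0))  # Output: [2.5 3.5 4.5] (one value per column)
print(np.mean(table, axis=1))  # Output: [2. 5.] (one value per row)

arr = np.array([1, 2, 3, 4, 5])
print(round(float(np.std(arr)), 3))          # Output: 1.414 (divides by n)
print(round(float(np.std(arr, ddof=1)), 3))  # Output: 1.581 (divides by n - 1)
```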

### Exercise: summarize temperatures in a week

Daily temperatures for a week are provided. Write a function to compute the following summary: the average temperature, the lowest and highest temperature, and the number of days with temperatures above and below the historical daily average.

In [None]:
temperatures = np.array([-1, 1, -4, -1, -2, 2, 1])
historical_daily_average = -1

# TODO: implement this
def summarize_week(temperatures: np.ndarray, historical_daily_average: int) -> dict:
    return {}

summarize_week(temperatures, historical_daily_average)

## Introduction to Data Visualization with Matplotlib and Seaborn

Matplotlib and Seaborn are libraries used to create data visualizations in Python.
They are widely used in data science for creating plots such as scatter plots, bar charts, and line charts.
Matplotlib provides low-level control over plot customization, while Seaborn simplifies the creation of common plots by building on Matplotlib.
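
As an illustration of the low-level control that Matplotlib offers, the sketch below builds a plot through explicit figure and axes objects, setting each element separately. The data (`weeks`, `weekly_scores`) and the styling choices are made up for the example.

```python
import matplotlib.pyplot as plt

weeks = [1, 2, 3, 4, 5]  # hypothetical data
weekly_scores = [75, 80, 82, 85, 90]

# Explicit figure and axes objects give control over each plot element.
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(weeks, weekly_scores, marker="o", linestyle="--", color="tab:blue")
ax.set_title("Score Trend Over Time")
ax.set_xlabel("Week")
ax.set_ylabel("Score")
ax.set_xticks(weeks)
plt.show()
```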

In [None]:
import seaborn as sns

In [None]:
data = {'Name': ['Alice', 'Tom', 'Bob', 'John', 'Kate', 'Selena', 'Adam', 'Nancy', 'Mary', 'Andrew', 'Jennifer', 'Robert', 'Matthew', 'Mark', 'Anthony', 'Steven', 'Audrey', 'Julia', 'Lisa', 'Richard'], 
        'Age': [22, 21, 20, 33, 30, 27, 28, 21, 24, 35, 32, 30, 21, 31, 22, 29, 31, 26, 27, 30], 
        'Score': [85, 90, 88, 70, 95, 90, 98, 60, 95, 60, 55, 60, 92, 88, 80, 85, 90, 92, 65, 78], 
        'Hours_Studied': [4, 5, 4, 3, 6, 5, 7, 3, 7, 2, 3, 3, 7, 5, 4, 4, 6, 6, 3, 4]}

### Scatterplot

In [None]:
sns.scatterplot(x=data["Name"], y=data["Score"])
plt.title("Student Scores by Name (scatter plot)")
plt.xticks(rotation=45)
plt.show()

### Adding color and size based on another variable

In [None]:
sns.scatterplot(
    x=data["Name"],
    y=data["Score"],
    size=data["Hours_Studied"],
    hue=data["Hours_Studied"]
)

plt.title("Student Scores and Hours Studied")
plt.xticks(rotation=45)
plt.legend().remove()
plt.show()

### Bar plot

In [None]:
sns.barplot(x=data["Name"], y=data["Score"])
plt.title("Student Scores by Name (bar plot)")
plt.xticks(rotation=45)
plt.show()

### Customizing bar plot

In [None]:
sns.barplot(
    x=data["Name"],
    y=data["Score"],
    hue=data["Hours_Studied"]
)

plt.title("Student Scores by Name (colored by hours studied)")
plt.xticks(rotation=45)
plt.legend(
    bbox_to_anchor=(1.05, 1),
    loc='upper left',
)
plt.show()

### Histogram

In [None]:
sns.histplot(x=data["Score"], bins=5)
plt.title("Student Scores Distribution (histogram)")
plt.show()

### Line plot

In [None]:
time_data = {'Week': [1, 2, 3, 4, 5],
             'Score': [75, 80, 82, 85, 90]}

sns.lineplot(x=time_data["Week"], y=time_data["Score"], marker="o")
plt.title("Score Trend Over Time")
plt.show()

### Customizing the line plot

In [None]:
sns.lineplot(x=time_data["Week"], y=time_data["Score"], marker="o", color="green")
plt.title("Student Scores Trend Over Time")
plt.xlabel("Week Number")
plt.ylabel("Score Value")
plt.grid(True)
# Save the plot to a jpg file
plt.savefig("student_scores.jpg")
plt.show()

### Heatmap

In [None]:
scores = [
    [80, 85, 78],   # Alice
    [90, 88, 92],   # Bob
    [70, 75, 82]    # Charlie
]

sns.heatmap(
    scores,
    annot=True,
    cmap="viridis",
    xticklabels=["Math", "Science", "English"],
    yticklabels=["Alice", "Bob", "Charlie"]
)

plt.title("Student Scores Heatmap")
plt.show()

### Exercise: explore temperature and rain data

Given the temperature and rain data per day of the week:
- create a line, scatter, and bar plot to visualize temperature changes over time - which visualization is the most informative for this case?
- add a "rained" key with True/False values to the dictionary and plot two temperature histograms: one for rainy days and one for non-rainy days.

In [None]:
weather_dict = {
    "day": [1, 2, 3, 4, 5, 6, 7],
    "temperature": [18, 19, 21, 23, 22, 20, 19],
    "rain_mm": [2, 0, 1, 0, 5, 3, 0]
}

# Full example
This task combines lists, dictionaries, functions, NumPy, and Seaborn/Matplotlib.

Problem Statement:
- Generate 10 random student scores.
- Create a function to calculate the average score.
- Filter students who scored above the average.
- Store the names and scores of those students in a dictionary of NumPy arrays.
- Visualize the data using Seaborn/Matplotlib.

In [None]:
import numpy as np
from random import randint
import matplotlib.pyplot as plt
import seaborn as sns

# 1. Generate random student scores
students = ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank', 'Grace', 'Hannah', 'Ivy', 'Jack']
scores = [randint(50, 100) for _ in range(10)]
print(students)
print(scores)

# 2. Function to calculate average score
def calculate_average(scores):
    return np.mean(scores)

average_score = calculate_average(scores)
print(f"Average score: {average_score}")

# 3. Filter students with scores above average
above_average_indices = np.array(scores) > average_score
above_average = {'Name': np.array(students)[above_average_indices], 'Score': np.array(scores)[above_average_indices]}
print(above_average)

# 4. Visualize results with Seaborn/Matplotlib
sns.barplot(x=above_average['Name'], y=above_average['Score'])
plt.title('Students Scoring Above Average')
plt.xlabel('Name')
plt.ylabel('Score')
plt.xticks(rotation=45)
plt.show()

# Exercises
In this section, you will find exercises to practice the concepts covered in this notebook. You can try to solve these exercises on your own or discuss them with your peers. The solutions are provided below.

# Stock Market Simulation
Simulate random price fluctuations of a stock over time.

Problem statement:

- Use random.uniform(-0.05, 0.05) to generate random daily percentage changes (between -5% and +5%) for 100 days.
- Assume a starting stock price of $100.
- Calculate the stock price for each day from the random percentage changes.
- Visualize the stock prices over time using Seaborn/Matplotlib.
- You can also simulate multiple stocks and compare their random price changes in a single visualization.

# Random Daily Temperature Simulation
Simulate daily temperature changes for a year using randomness.

Problem statement:
- Use random.uniform(-5, 5) to simulate daily temperature deviations around an average value (e.g., 25°C).
- Add seasonal effects by varying the average temperature based on the month (e.g., cooler in winter, warmer in summer).
- Store the data in a dictionary with keys: "Date" and "Temperature".
- Calculate the average monthly temperature.
- Use Seaborn/Matplotlib to visualize the daily temperature as a time series and a bar chart for monthly averages.


# Solutions for the exercises

### Exercise: student grades with lists

In [None]:
scores = [55, 81, 31, 78, 93, 61]

scores_passed = [el for el in scores if el > 60]
grades = ["A" if el > 90 else "B" if el > 75 else "C" for el in scores_passed]

print(scores_passed)
print(grades)

### Exercise: student grades with dicts

In [None]:
scores = {"Alice": 55, "Bob": 81, "Charlie": 31, "Diana": 78, "Eve": 93, "Helen": 61}

passed_scores = {name: score for name, score in scores.items() if score > 60}
grades = {name: "A" if score > 90 else "B" if score > 75 else "C"
          for name, score in passed_scores.items()}

print(passed_scores)
print(grades)

### Exercise: student grades with nested dicts

In [None]:
scores = {"Alice": 55, "Bob": 81, "Charlie": 31, "Diana": 78, "Eve": 93, "Helen": 61}

grades = {name: {"status": "passed" if score > 60 else "failed", "score": score}
          for name, score in scores.items()}

print(grades)
print(grades['Eve'])

### Exercise: square function with type hints

In [None]:
def square(x: float) -> float:
    return x**2

print(square(4.5))

### Exercise: switch two values

In [None]:
x = 1
y = 2

x, y = y, x

print(x, y)

### Exercise: daily temperature

In [None]:
def simulate_daily_temperature(baseline_temp: int = 20) -> int:
    return baseline_temp + random.randint(-5, 5)

simulate_daily_temperature()

### Exercise: summarize temperatures in a week

In [None]:
temperatures = np.array([-1, 1, -4, -1, -2, 2, 1])
historical_daily_average = -1

def summarize_week(temperatures: np.ndarray, historical_daily_average: int) -> dict:
    return {
        "highest_temperature": int(np.max(temperatures)),
        "lowest_temperature": int(np.min(temperatures)),
        'average_temperature': float(round(np.mean(temperatures), 2)),
        'days_above_avg': int(np.sum(temperatures > historical_daily_average)),
        'days_below_avg': int(np.sum(temperatures < historical_daily_average))
    }

summarize_week(temperatures, historical_daily_average)

### Exercise: process student exam data

In [None]:
import numpy as np

student_exam_data = {'Name': ['Alice', 'Tom', 'Bob', 'John', 'Kate', 'Selena', 'Adam', 'Nancy', 'Mary', 'Andrew', 'Jennifer', 'Robert', 'Matthew', 'Mark', 'Anthony', 'Steven', 'Audrey', 'Julia', 'Lisa', 'Richard'], 
                     'Age': [22, 21, 20, 33, 30, 27, 28, 21, 24, 35, 32, 30, 21, 31, 22, 29, 31, 26, 27, 30], 
                     'Score': [85, 90, 88, 70, 95, 90, 98, 60, 95, 60, 55, 60, 92, 88, 80, 85, 90, 92, 65, 78], 
                     'Hours_Studied': [4, 5, 4, 3, 6, 5, 7, 3, 7, 2, 3, 3, 7, 5, 4, 4, 6, 6, 3, 4]}

student_exam_data['Passed'] = [el > 60 for el in student_exam_data['Score']]
print(student_exam_data)

average_hours_passed = np.mean(np.array(student_exam_data['Hours_Studied'])[student_exam_data['Passed']])

print(f"\nPassed exam: {sum(student_exam_data['Passed'])}/{len(student_exam_data['Name'])}")
print(f"Average hours studied for students who passed the exam: {average_hours_passed}")

### Exercise: explore temperature and rain data

In [None]:
import seaborn as sns
from matplotlib import pyplot as plt

# Same data as in the exercise statement
weather_dict = {
    "day": [1, 2, 3, 4, 5, 6, 7],
    "temperature": [18, 19, 21, 23, 22, 20, 19],
    "rain_mm": [2, 0, 1, 0, 5, 3, 0]
}

sns.lineplot(x=weather_dict["day"], y=weather_dict["temperature"], marker="o")
plt.title("Temperature during the week")
plt.xlabel("Day")
plt.ylabel("Temperature")
plt.show()

sns.scatterplot(x=weather_dict["day"], y=weather_dict["temperature"])
plt.title("Temperature during the week")
plt.xlabel("Day")
plt.ylabel("Temperature")
plt.show()

sns.barplot(x=weather_dict["day"], y=weather_dict["temperature"])
plt.title("Temperature during the week")
plt.xlabel("Day")
plt.ylabel("Temperature")
plt.show()



In [None]:
weather_dict["rained"] = [mm > 0 for mm in weather_dict["rain_mm"]]

temps_rained = [
    t for t, r in zip(weather_dict["temperature"], weather_dict["rained"]) if r
]
sns.histplot(temps_rained, bins=5)
plt.title("Temperature Distribution (Rained)")
plt.xlabel("Temperature")
plt.show()

temps_no_rain = [
    t for t, r in zip(weather_dict["temperature"], weather_dict["rained"]) if not r
]
sns.histplot(temps_no_rain, bins=5)
plt.title("Temperature Distribution (No Rain)")
plt.xlabel("Temperature")
plt.show()

### Exercise: Stock Market Simulation

In [None]:
def simulate_stock_prices(stock_name, days=100, start_price=100):
    """
    Simulates random stock price changes for a given number of days.
    :param stock_name: The name of the stock.
    :param days: The number of days to simulate.
    :param start_price: The starting price of the stock.
    :return: A dictionary with the simulated stock prices and days.
    """
    random_percentage_changes = [random.uniform(-0.05, 0.05) for _ in range(days)]
    prices = [start_price]

    for change in random_percentage_changes:
        prices.append(prices[-1] * (1 + change))

    days_list = list(range(days + 1))

    return {
        "Stock": stock_name,
        "Day": days_list,
        "Price": prices
    }

stocks = ["Stock_A", "Stock_B", "Stock_C"]
stock_data = [simulate_stock_prices(stock, days=100) for stock in stocks]

for stock in stock_data:
    sns.lineplot(
        x=stock["Day"],
        y=stock["Price"],
        label=stock["Stock"]
    )

plt.title("Simulated Stock Prices Over 100 Days for Multiple Stocks")
plt.xlabel("Day")
plt.ylabel("Price")
plt.legend(title="Stock")
plt.grid(True)
plt.show()

### Exercise: Random Daily Temperature Simulation

In [None]:
def simulate_daily_temperatures(days=365, base_temperature=20) -> dict:
    """
    Simulates daily temperature changes for a period of time (supposedly a year).
    :param days: number of days to simulate
    :param base_temperature: average temperature around which daily deviations occur
    :return: a dict with daily temperatures
    """
    temperatures = []
    for day in range(days):
        # Determine the month (roughly dividing 365 days into 12 months)
        month = int(day // 30.5) + 1
        # Apply seasonal effects (cooler in winter, warmer in summer)
        if month in [12, 1, 2]:  # Winter months
            avg_temp = base_temperature - 10
        elif month in [6, 7, 8]:  # Summer months
            avg_temp = base_temperature + 10
        else:  # Spring/Autumn months
            avg_temp = base_temperature

        # Simulate daily temperature deviation
        daily_temp = avg_temp + random.uniform(-5, 5)
        temperatures.append(daily_temp)

    days_range = list(range(1, days + 1))

    data = {
        'Day': np.array(days_range),
        'Temperature': np.array(temperatures)
    }

    return data

data = simulate_daily_temperatures()

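# Map days to months with the same rule as inside the function (days there start at 0)
data['Month'] = (data['Day'] - 1) // 30.5 + 1
# np.unique returns the months in sorted order, so the bar chart runs from month 1 to 12
monthly_avg_temp = {int(month): float(np.mean(data['Temperature'][data['Month'] == month])) for month in np.unique(data['Month'])}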
print(monthly_avg_temp)

sns.lineplot(x=data["Day"], y=data["Temperature"])
plt.title("Daily Temperature Over One Year")
plt.xlabel("Day")
plt.ylabel("Temperature (°C)")
plt.grid(True)
plt.show()

sns.barplot(x=list(monthly_avg_temp.keys()), y=list(monthly_avg_temp.values()))
plt.title("Average Monthly Temperature")
plt.xlabel("Month")
plt.ylabel("Average Temperature (°C)")
plt.show()

## Optional: Introduction to Pandas

Pandas is a powerful Python library for data manipulation and analysis. It provides the **DataFrame**, a 2D labeled data structure that makes it easy to organize, filter, and manipulate structured data, similar to an Excel spreadsheet or SQL table.


In [None]:
import pandas as pd

# Creating a Pandas DataFrame
data = {
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "Score": [85, 90, 88, 92]
}
df = pd.DataFrame(data)
print(df)

In [None]:
# Adding a new column
df["Hours_Studied"] = [5, 10, 7, 8]
print(df)

In [None]:
# Access a column and get values
hours_studied = df['Hours_Studied']
print(f"Column values:\n{hours_studied}\ntype: {type(hours_studied)}\n")

# To get it as a NumPy array (to_numpy() is the method recommended in the pandas documentation):
print(f"Getting it as a numpy array:\n{df['Hours_Studied'].to_numpy()}\ntype: {type(df['Hours_Studied'].to_numpy())}")

In [None]:
# Reading a CSV file
df = pd.read_csv("week_1_data.csv")
# Displaying the first few rows of the DataFrame
print(df.head())

In [None]:
# Displaying the columns of the DataFrame
print(df.columns)

In [None]:
# Displaying the shape of the DataFrame
print(df.shape)

In [None]:
# DataFrame operations
# Accessing columns
print(df[["Name", "Age"]])

In [None]:
# Filtering rows based on a condition
print(df[df["Age"] > 30])

## Saving and exporting data frames

CSV is a very common format for saving and sharing tabular data.

Depending on your needs, you can also export your DataFrame into Excel or JSON formats, giving you flexibility when interacting with different tools or sharing data.

In [None]:
# Saving the DataFrame to a CSV file
df.to_csv("week_1_data_tmp.csv", index=False)
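
As a sketch of the JSON export mentioned above: calling `to_json` without a file path returns the JSON text as a string. The DataFrame `df_example` is hypothetical data used only for this illustration. Export to Excel works similarly through `to_excel`, but it needs an additional package such as openpyxl, so it is not shown here.

```python
import json
import pandas as pd

df_example = pd.DataFrame({"Name": ["Alice", "Bob"], "Score": [85, 90]})  # hypothetical data

# Without a path, to_json returns the JSON text as a string.
json_text = df_example.to_json(orient="records")
print(json_text)  # Output: [{"Name":"Alice","Score":85},{"Name":"Bob","Score":90}]

# Round trip: the records are ordinary Python dicts again.
records = json.loads(json_text)
print(records[0]["Name"])  # Output: Alice
```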