## Python Warm-Up for Machine Learning 


---
## Part 1. Complex data types in Python
### Exercise 1: Basic List Operations
- Create a list named temperatures containing any five different integer values (representing temperatures in degrees Celsius).
- Print the second temperature in the list using indexing.
- Append a new temperature to the list and then print the updated list.
- Using slicing, print the first three temperatures from the list.
- Write a loop that iterates through the list and prints each temperature converted to Fahrenheit. The formula to convert Celsius to Fahrenheit is (Celsius * 9/5) + 32.

In [None]:
# Creating a list
temperatures = []
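
One possible way to work through the steps above is sketched below. The five starting values and the appended value are arbitrary placeholders, not part of the exercise.

```python
# Hypothetical example values in degrees Celsius (any five integers work)
temperatures = [12, 18, 25, 7, 30]

print(temperatures[1])     # second element (indexing starts at 0)
temperatures.append(21)    # add a new value at the end of the list
print(temperatures)
print(temperatures[:3])    # slice: the first three elements

# Convert each value with (Celsius * 9/5) + 32
fahrenheit = []
for celsius in temperatures:
    converted = (celsius * 9 / 5) + 32
    fahrenheit.append(converted)
    print(f"{celsius} °C = {converted:.1f} °F")
```

Collecting the converted values in a second list is optional; it only makes the results easy to reuse later.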

### Exercise 2: Basic Dictionary Operations
- Create a dictionary named student_scores with at least five key-value pairs, where the keys are student names (strings) and the values are their scores (integers).
- Add a new student-score pair to the dictionary.
- Update the score of an existing student.
- Write a function to calculate the average score of all students and print the result.


In [1]:
# Creating a dictionary with student scores
student_scores = {"Alice": 85, "Bob": 90}

# Adding a new student-score pair


# Updating the score of an existing student


# Function to calculate average score


# Printing the average score


Average Score: 87.0
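
A minimal sketch of the remaining steps follows. The added student `"Carla"` and the new scores are made-up placeholders, so the result will not match the output shown above; `average_score` is a hypothetical helper name.

```python
student_scores = {"Alice": 85, "Bob": 90}

# Assigning to a new key adds a pair (placeholder name and score)
student_scores["Carla"] = 78

# Assigning to an existing key overwrites its value
student_scores["Alice"] = 88


def average_score(scores):
    """Return the mean of all values in a name -> score dictionary."""
    return sum(scores.values()) / len(scores)


average = average_score(student_scores)
print("Average Score:", round(average, 2))
```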


### Exercise 3: Working with Nested Dictionaries

Given is a nested dictionary where each key is a classroom (e.g., 'Class A'), and its value is another dictionary with student names as keys and their scores as values.

- Write a Python script to print each classroom's average score.

Example input: {"Class A": {"Alice": 88, "Bob": 76, "Charlie": 90}, "Class B": {"Zara": 92, "Daniel": 64}}


In [None]:
classes = {"Class A": {"Alice": 88, "Bob": 76, "Charlie": 90}, "Class B": {"Zara": 92, "Daniel": 64}}
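
One way to approach this, sketched with the example input from above: iterate over the outer dictionary with `.items()` and average the values of each inner dictionary. The name `class_averages` is an illustrative choice.

```python
classes = {
    "Class A": {"Alice": 88, "Bob": 76, "Charlie": 90},
    "Class B": {"Zara": 92, "Daniel": 64},
}

class_averages = {}
for class_name, scores in classes.items():
    # scores is the inner dictionary: student name -> score
    class_averages[class_name] = sum(scores.values()) / len(scores)
    print(f"{class_name}: {class_averages[class_name]:.2f}")
```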

### Exercise 4: Working with NumPy Arrays

- Import the NumPy library with the shortened alias `np` (convention and convenience).
- Create a NumPy array named numbers containing increasing integers from 1 to 10.
- Print the shape of the array.
- Calculate and print the mean, median, min and max values of the array's elements.
- Using slicing, create a new array that contains only the first 5 elements of numbers and print it.
- Multiply every element of the array by 2 (element-wise multiplication) and print the updated array.

In [5]:
import numpy as np

# Creating the NumPy array
numbers = np.arange(1, 11)
print("Array:", numbers)

# Printing the shape of the array
print("Shape:", numbers.shape)

# Calculating and printing the mean, median, min and max values
mean_value = np.mean(numbers)
print("Mean value:", mean_value)

# Slicing the first 5 elements


# Multiplying every element by 2


Array: [ 1  2  3  4  5  6  7  8  9 10]
Shape: (10,)
Mean value: 5.5
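
The steps left open in the cell above could be completed roughly as follows. This is a sketch, and the variable names `first_five` and `doubled` are illustrative choices.

```python
import numpy as np

# The stop value of np.arange is exclusive, so this yields 1..10
numbers = np.arange(1, 11)

# Remaining summary statistics
median_value = np.median(numbers)
min_value = np.min(numbers)
max_value = np.max(numbers)
print("Median:", median_value, "Min:", min_value, "Max:", max_value)

# Slicing works as with lists: the first 5 elements
first_five = numbers[:5]
print("First five:", first_five)

# Arithmetic on an array is applied element-wise
doubled = numbers * 2
print("Doubled:", doubled)
```

Unlike a Python list, where `* 2` repeats the list, multiplying a NumPy array by a scalar multiplies each element.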


---

## Part 2. Pandas Dataframes

### Exercise 0: Dataframe Basics
- Import Pandas as `pd`.
- Create a Series with `pd.Series` by passing it a list of values.
- Create a DataFrame with `pd.DataFrame` by passing it a dictionary with lists of values.
- Explore how different data types are stored in the dataframe with `.dtypes` (it's an attribute, not a function).

In [41]:
values = [14, 42, 23, 17, 99]

In [None]:
series = ...
series

0    14
1    42
2    23
3    17
4    99
dtype: int64

In [35]:
data = {
    "Variable A": [3, 15, 45],
    "Variable B": [0.1, 0.4, 0.2],
    "Variable C": [True, True, False],
    "Variable D": ["blue", "red", "green"],
}

In [None]:
df = ...
df

|   | Variable A | Variable B | Variable C | Variable D |
|---|---|---|---|---|
| 0 | 3 | 0.1 | True | blue |
| 1 | 15 | 0.4 | True | red |
| 2 | 45 | 0.2 | False | green |
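
The two placeholders (`...`) above could be filled in as in the following sketch, which reuses the `values` list and `data` dictionary from the cells above.

```python
import pandas as pd

values = [14, 42, 23, 17, 99]
series = pd.Series(values)   # 1D labelled array with a default integer index

data = {
    "Variable A": [3, 15, 45],
    "Variable B": [0.1, 0.4, 0.2],
    "Variable C": [True, True, False],
    "Variable D": ["blue", "red", "green"],
}
df = pd.DataFrame(data)      # each dictionary key becomes a column

# dtypes is an attribute: one data type per column
print(df.dtypes)
```

Each column has its own data type, which is why integers, floats, booleans and strings can sit side by side in one table.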


### Exercise 1: Dataset Loading and Overview
- Load the Titanic training dataset (train.csv) into a Pandas dataframe (`pd.read_csv`).
- Look at the top five rows of the dataframe (`df.head()`) and understand what the columns represent.
- Print descriptive statistics about the dataframe (`df.describe()`).

In [1]:
import pandas as pd

df = pd.read_csv("train.csv")

In [2]:
df.head()

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 |  | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 |  | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 |  | S |


### Exercise 2: Dataframe Indexing

There are multiple ways to access data in Pandas. Let's explore a few of them.

- Access a column of your choice by its name with `df["column_name"]`.
  - This gives us a Pandas Series (1D array compared to the 2D table of a Dataframe).
  - Print the value counts with `.value_counts()`.
  - Access multiple columns by providing a list of names.
- Access a single row, column and value with `df.loc`.
- Access a single row, column and value with `df.iloc`.
- Use a boolean index to find passengers that are above the age of 30.
  - We can use comparisons on different columns to create a boolean index like `df["column_name"] > X`.
  - We can use `df[<boolean_index>]` with a list or array of boolean values to access the rows where the index is True.
  - What is their average fare?

In [None]:
df.head()

|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 |  | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 |  | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 |  | S |


In [None]:
series = df[...]
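
The access patterns listed above can be sketched on a small stand-in table. To keep the example runnable without `train.csv`, the frame `sample` below copies three columns from the five rows of the `df.head()` output. The same calls should work on the full `df`, where the numbers will differ.

```python
import pandas as pd

# Stand-in for df: first five rows of the head() output, three columns only
sample = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

ages = sample["Age"]                        # one name -> Series
subset = sample[["Age", "Fare"]]            # list of names -> DataFrame
sex_counts = sample["Sex"].value_counts()   # occurrences of each value

# .loc selects by label, .iloc by integer position
row_by_label = sample.loc[0]                # row with index label 0
column_by_label = sample.loc[:, "Fare"]     # all rows of one column
value_by_label = sample.loc[0, "Fare"]
value_by_position = sample.iloc[0, 2]       # first row, third column

# A comparison on a column yields a boolean Series usable as a row filter
over_30 = sample["Age"] > 30
mean_fare_over_30 = sample.loc[over_30, "Fare"].mean()
print("Passengers over 30:", over_30.sum())
print("Their average fare:", mean_fare_over_30)
```

With the default integer index, `.loc[0]` and `.iloc[0]` return the same row. They diverge as soon as the index is sorted, filtered, or replaced by labels.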

---

## Part 3. Data Visualisation

### Exercise 1: Plot Histogram

- Import matplotlib.pyplot as plt.
- Select a column for which you want to see the histogram.
  - It is always good to ask yourself: what do you expect to see?
- Use the `plt.hist` function on the Series of the selected column.
- Explore different values for the `bins` and `range` parameters.
- Use `plt.xlabel` and `plt.ylabel` to add axis labels to the plot.

In [None]:
series = df["..."]
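
A minimal sketch of the histogram steps, assuming the `Age` column was chosen. To stay runnable without `train.csv`, the Series below holds only the five ages from the `df.head()` output. The `bins` and `range` values are arbitrary starting points to experiment with.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for df["Age"]: the five ages visible in the head() output
series = pd.Series([22.0, 38.0, 26.0, 35.0, 35.0], name="Age")

# bins sets the number of bars, range the lower and upper limit of the x-axis
counts, bin_edges, _ = plt.hist(series, bins=4, range=(20, 40))

plt.xlabel("Age (years)")
plt.ylabel("Number of passengers")
```

`plt.hist` also returns the bar heights and bin edges, which can help when checking what a given choice of `bins` and `range` did to the data.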

### Exercise 2: Scatter Plot

- Select two numeric columns.
- Plot their series against each other using the `plt.scatter` function.
  - What do you expect to see?

In [None]:
s1 = df[...]
s2 = df[...]

### Exercise 3: Box Plot

- Select a numeric column.
- Use the `plt.boxplot` function to create a Box plot.
- What does it show?

In [None]:
series = df[...]

### Exercise 4: Exploring Pairwise Relationships with Pairplot

- Import Seaborn as `sns`.
- Use Seaborn's `pairplot` function to visualize pairwise relationships between our features.

In [31]:
# Create a pairplot to explore pairwise relationships


### Exercise 5: Correlation Heatmap

- Use Pandas' `corr` method and Seaborn's `heatmap` function to visualize pairwise correlations.
  - `corr` has a helpful `numeric_only` argument.
  - Can you convert the non-numeric columns in some way so that we can include them in the correlation plot?
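
One possible answer to the last question is to map each category to a number before calling `corr`. The sketch below does this for `Sex` on a stand-in frame built from the five rows of the `df.head()` output; the 0/1 coding is an arbitrary assumption, and correlations from five rows are not meaningful in themselves.

```python
import pandas as pd

# Stand-in for df: four columns from the five rows of the head() output
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# Encode the two categories as integers (this coding is an assumption)
encoded = sample.copy()
encoded["Sex"] = encoded["Sex"].map({"male": 0, "female": 1})

# numeric_only=True skips any columns that are still non-numeric
corr_matrix = encoded.corr(numeric_only=True)
print(corr_matrix)
```

The resulting matrix can be passed to `sns.heatmap(corr_matrix, annot=True)` once Seaborn is imported as `sns`. For columns with more than two categories, `pd.get_dummies` is a common alternative to a manual mapping.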

---

## Part 4. Free Form Exploration

- Try to think of a few questions and answer them from the data.
  - For example, was it better or worse to be in 1st class?