# Essentials for Data Science 2022/2023: Exam

(2023-06-05)

## Introduction

The primary goal is to **fill in this notebook with Python code** producing answers to the questions.

During the exam:

- Using personal notes, books, the online course materials or other online documents is permitted.
- It is forbidden to communicate with other individuals.
- ChatGPT or similar tools are permitted, but you need to declare that you use them. The use of such tools will NOT be taken into account in the grading process.

### Steps

1. Download (from Brightspace):
    - `exam.ipynb`: this Python notebook
    - `fib_true.txt`, `fib_false.txt`: two text files with numbers; you may use these files to test your code in the Fibonacci sequence question (1.D.).
    - `msleep.csv`: a CSV-formatted table needed for the data manipulation and visualization questions (2.).
2. Edit `exam.ipynb` to answer each question:
    - Below each question there is one cell which you need to fill in with the Python code solving the question.
    - The complete answer needs to fit in the single cell.
    - The first line of the cell contains a comment with a ChatGPT declaration. There are two options, `NO` and `YES`: leave `YES` if you used ChatGPT (or a similar tool) to solve the question; otherwise leave `NO`.
3. Submit (in Brightspace):
    - Submit the `exam.ipynb` file in Brightspace. You may submit more than once. The last submission will be graded.


## 1. Basic Python questions

### 1.A. [12p] A small restaurant

1. [2p] **Menu:** A small restaurant serves several products. Each product has a price.  
Manually create a dictionary mapping product names to their prices (include at least 7 products; `coffee` costs 2.50, `tiramisu` costs 5.00, ...).  

2. [3p] **Generate Order:** A customer orders one or more products.  
Write a function which generates a random list of 1 to 7 product names.
The same product can be ordered multiple times (e.g. `coffee`, `tiramisu`, `coffee`, `coffee`).

3. [3p] **Calculate Price:**  
Write a function which takes a list of products as its argument and calculates the total price.  
Use the price dictionary created above.

4. [2p] **Orders of Customers:**  
Manually create a list with the names of at least 5 customers.  
Use a comprehension to build a dictionary where the keys are the names of the customers and the values are their randomly generated orders.  
Use the function developed above.

5. [1p] **Prices of Orders:**  
Use a comprehension to build a dictionary where the keys are the names of the customers and the values are the calculated prices of their orders.  
Use the function developed above.

6. [1p] **Discount:**  
Modify your function for calculating the total price so that it offers a discount of 10% if the customer orders at least 3 items.
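A minimal sketch of one possible answer to steps 1-6, written with the step 6 discount already built into the price function. Only the `coffee` and `tiramisu` prices come from the question; the other products, their prices and the customer names are made-up placeholders. `random.choices` samples with replacement, so the same product can appear several times in one order.

```python
import random

# Step 1: coffee and tiramisu prices are from the question; the rest are placeholders.
MENU = {
    "coffee": 2.50,
    "tea": 2.00,
    "tiramisu": 5.00,
    "croissant": 3.00,
    "sandwich": 6.50,
    "soup": 4.50,
    "juice": 3.50,
}


def generate_order(menu=MENU, max_items=7):
    """Step 2: return a random list of 1..max_items product names (repeats allowed)."""
    n_items = random.randint(1, max_items)  # randint includes both end points
    return random.choices(list(menu), k=n_items)  # sampling with replacement


def total_price(order, menu=MENU, discount=0.10, min_items=3):
    """Steps 3 and 6: sum the prices, with a discount for orders of at least min_items."""
    total = sum(menu[product] for product in order)
    if len(order) >= min_items:
        total *= 1 - discount
    return total


# Step 4: hypothetical customer names mapped to random orders.
customers = ["Alice", "Bob", "Carla", "Dmitri", "Eva"]
orders = {name: generate_order() for name in customers}

# Step 5: customer names mapped to the price of their order.
prices = {name: total_price(order) for name, order in orders.items()}
print(prices)
```

Keeping the discount rate and threshold as keyword arguments makes the step 6 change a small modification of the step 3 function rather than a rewrite.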


In [None]:
### QUESTION_1A_ChatGPT:NO. Leave YES or NO to indicate whether you used ChatGPT here. Write your solution below in this cell. ###
# 1.


### 1.B. [6p] Counting letters

1. [1p] **Text:** Manually create a variable holding a sentence of your choice (in English, at least 8 words). Use only lowercase letters and spaces.  

2. [2p] **Counting:** Build a dictionary where the keys are the characters of the sentence and the values are the numbers of occurrences of each character in the sentence.

3. [3p] **Sorting/printing:** Print the dictionary as shown below: each line contains one character, followed by a space and the number of its occurrences. The lines should be sorted by character in alphabetical order (the space character comes first).

For example, for the sentence `'the course offers a practical introduction to a few programming languages and tools currently used in data science'` the program should produce the following output (the first line corresponds to 17 space characters, the second line to 10 `a` characters, the third line to 7 `c` characters, etc.):
```
  17
a 10
c 7
d 4
e 9
f 3
g 4
h 1
i 6
l 4
m 2
n 8
o 8
p 2
r 8
s 6
t 8
u 5
w 1
y 1
```
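A minimal sketch of one way to do the counting and the sorted printing, using the example sentence from the question so the result can be compared with the expected output above. `dict.get` with a default of 0 initialises each new character; `sorted` on a string's characters places the space before the letters, which matches the expected first line.

```python
# Example sentence taken from the question.
sentence = ("the course offers a practical introduction to a few programming "
            "languages and tools currently used in data science")

# Count occurrences of every character, including spaces.
counts = {}
for char in sentence:
    counts[char] = counts.get(char, 0) + 1

# print() separates its arguments with one space, so the space character
# is printed as "  17" (the character, a separator space, then the count).
for char in sorted(counts):
    print(char, counts[char])
```

`collections.Counter(sentence)` would build the same counts in one call; the explicit loop just shows the mechanics.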

In [None]:
### QUESTION_1B_ChatGPT:NO/YES. Leave YES or NO to indicate whether you used ChatGPT here. Write your solution below in this cell. ###

### 1.C. [8p] ATM machine

An ATM can dispense 100, 50, 20, 10 and 5 euro banknotes:
- [6p] Write a function which takes the amount of money to be withdrawn as its argument and returns the list of banknotes to be dispensed by the ATM. The function should return the smallest possible number of banknotes.
- [1p] If the amount cannot be expressed as a combination of these banknotes, it cannot be withdrawn and the function should raise a `ValueError` exception with a meaningful message.
- [1p] If the amount requires more than 20 banknotes, the function should raise a `ValueError` exception with a meaningful message.

Example/hint: for the amount of 375 the returned list should be `[100, 100, 100, 50, 20, 5]` (in any order, but it is useful to process the banknotes in decreasing order of values).

The 6p will be split as follows: [2p] for the construction of the result, [3p] for the structure of the loops and [1p] for the overall good structure of the function.
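A minimal sketch of a greedy solution, processing denominations from largest to smallest as the hint suggests. For the set 100/50/20/10/5 the greedy choice yields the minimum number of notes; that property does not hold for arbitrary denominations. Treating zero or negative amounts as invalid is an assumption of this sketch, and `withdraw` is a hypothetical name.

```python
BANKNOTES = (100, 50, 20, 10, 5)  # denominations in decreasing order


def withdraw(amount, max_notes=20):
    """Return the banknotes for `amount`, using as few notes as possible."""
    # Every note is a multiple of 5, so only positive multiples of 5 are payable.
    if amount <= 0 or amount % 5 != 0:
        raise ValueError(
            f"Amount {amount} cannot be paid with 100, 50, 20, 10 and 5 euro notes")

    notes = []
    remaining = amount
    for note in BANKNOTES:            # largest denomination first
        while remaining >= note:      # take this note as long as it fits
            notes.append(note)
            remaining -= note

    if len(notes) > max_notes:
        raise ValueError(
            f"Amount {amount} needs {len(notes)} banknotes; the limit is {max_notes}")
    return notes


print(withdraw(375))
```

For 375 this prints `[100, 100, 100, 50, 20, 5]`, matching the example in the question; `withdraw(2100)` would need 21 notes and raises `ValueError`.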

In [None]:
### QUESTION_1C_ChatGPT:NO/YES. Leave YES or NO to indicate whether you used ChatGPT here. Write your solution below in this cell. ###

### 1.D. [8p] Fibonacci sequence

Imagine that a colleague was asked to calculate the first N numbers of the Fibonacci sequence, shuffle their order and then write them to a file.  
Below you can see an example of the contents of such a file for N=8.
Each line contains one integer number and the number of lines is equal to N.  
The Fibonacci sequence starts with 0, 1, 1, 2, 3, 5, 8...

```
3
5
1
0
2
8
1
13
```

Now you are given a file with N numbers. You need to check whether these numbers form a shuffled Fibonacci sequence.  
Write a function which reads all numbers from the file, sorts them and compares them with the first N numbers of the Fibonacci sequence.  
The function should return `True` if the numbers were generated as expected and `False` otherwise. The argument of the function is the name of the file with numbers.  
You may test your function with the files `fib_true.txt` (expected result: `True`) and `fib_false.txt` (expected result: `False`) provided in Brightspace.

The scores will be split as follows: [3p] for correct calculation of the Fibonacci sequence, [2p] for correct reading of the numbers from the file, [1p] for correct sorting of the numbers, [1p] for performing the correct comparison and [1p] for overall good structure of the function(s).
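A minimal sketch that keeps generation, file reading and comparison in separate functions, so the check can be tried without a file. The function names are hypothetical; `check_file` is the one taking the file name, as the question requires. Blank lines in the file are skipped, which is an assumption about the input format.

```python
def first_fibonacci(n):
    """Return the first n Fibonacci numbers: 0, 1, 1, 2, 3, ..."""
    numbers = []
    a, b = 0, 1
    for _ in range(n):
        numbers.append(a)
        a, b = b, a + b
    return numbers


def read_numbers(filename):
    """Read one integer per line, ignoring blank lines."""
    with open(filename) as f:
        return [int(line) for line in f if line.strip()]


def is_shuffled_fibonacci(numbers):
    """True if the sorted numbers equal the first len(numbers) Fibonacci numbers."""
    return sorted(numbers) == first_fibonacci(len(numbers))


def check_file(filename):
    return is_shuffled_fibonacci(read_numbers(filename))


# The shuffled N=8 example from the question:
print(is_shuffled_fibonacci([3, 5, 1, 0, 2, 8, 1, 13]))
```

According to the question, `check_file("fib_true.txt")` should return `True` and `check_file("fib_false.txt")` should return `False`.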

In [None]:
### QUESTION_1D_ChatGPT:NO/YES. Leave YES or NO to indicate whether you used ChatGPT here. Write your solution below in this cell. ###

## 2. Data manipulation and visualisation questions

For this part you will need the `pandas` and `seaborn` modules.


In [None]:
import pandas as pd
import seaborn as sb


You will analyse the `msleep` dataset, which contains sleep properties of several mammals.  
Each row of the `msleep` table describes one mammal. The table has more columns, but for the exam you only need the following variables:

- `name`: Common name (e.g. `Cow`).
- `vore`: Eating category -- what food type the animal eats (e.g. `carni` eats only meat, `herbi` eats only plants, etc.).
- `order`: Taxonomic rank, e.g. Primates, Carnivora, etc.
- `sleep_total`: Total amount of sleep per day, in hours.
- `sleep_rem`: Total amount of sleep in the REM phase, in hours.
- `brainwt`: Brain weight, in kilograms.
- `bodywt`: Body weight, in kilograms.



### 2.A. [2p] Table Summary


- Read the `msleep` dataset into a `pandas` DataFrame.
- [1p] How many rows does it contain?
- [1p] List the types of the variables (columns).
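A minimal sketch of the two calls involved. The three CSV rows below are made-up placeholders so the snippet runs without `msleep.csv`; with the real file you would call `pd.read_csv("msleep.csv")` instead (assuming the file is in the working directory). Empty CSV fields are read as missing values (`NaN`).

```python
import io

import pandas as pd

# Placeholder rows with the exam's column names; the values are invented.
csv_text = """name,vore,order,sleep_total,sleep_rem,brainwt,bodywt
Animal A,herbi,Rodentia,12.5,2.0,0.002,0.05
Animal B,carni,Carnivora,10.0,,0.07,14.0
Animal C,,Primates,9.0,1.5,,5.0
"""
df = pd.read_csv(io.StringIO(csv_text))

n_rows = len(df)        # equivalently df.shape[0]
print("rows:", n_rows)
print(df.dtypes)        # one dtype per column
```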

In [None]:
### QUESTION_2A_ChatGPT:NO/YES. Leave YES or NO to indicate whether you used ChatGPT here. Write your solution below in this cell. ###

### 2.B. [3p] Missing values

Calculate the percentage of non-missing values in the `brainwt` column.
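A minimal sketch on a made-up `brainwt` column: `notna()` gives a boolean Series, and the mean of booleans is the fraction of `True` values, so multiplying by 100 gives the percentage of non-missing values.

```python
import numpy as np
import pandas as pd

# Placeholder values: 3 of the 5 entries are present.
df = pd.DataFrame({"brainwt": [0.002, np.nan, 0.07, np.nan, 0.4]})

pct_non_missing = df["brainwt"].notna().mean() * 100
print(f"{pct_non_missing:.1f}% of brainwt values are present")
```

`df["brainwt"].count() / len(df) * 100` gives the same number, since `count()` ignores missing values.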

In [None]:
### QUESTION_2B_ChatGPT:NO/YES. Leave YES or NO to indicate whether you used ChatGPT here. Write your solution below in this cell. ###

### 2.C. [3p] DataFrame creation

[1p] Create a new DataFrame with a single row and with the following variables:
  - [1p] `max_sleep_total`: the maximum total amount of daily sleep (based on `sleep_total` column)
  - [1p] `mean_sleep_rem`: the average length of the REM sleep phase (based on `sleep_rem` column)
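A minimal sketch on made-up data: wrapping each aggregate in a one-element list makes `pd.DataFrame` build exactly one row. `Series.mean()` skips missing values by default, so animals without a `sleep_rem` value do not affect the average.

```python
import numpy as np
import pandas as pd

# Placeholder values; one sleep_rem entry is missing.
df = pd.DataFrame({
    "sleep_total": [12.5, 10.0, 9.0],
    "sleep_rem": [2.0, np.nan, 1.5],
})

summary = pd.DataFrame({
    "max_sleep_total": [df["sleep_total"].max()],
    "mean_sleep_rem": [df["sleep_rem"].mean()],   # NaN is skipped
})
print(summary)
```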

In [None]:
### QUESTION_2C_ChatGPT:NO/YES. Leave YES or NO to indicate whether you used ChatGPT here. Write your solution below in this cell. ###

### 2.D. [1p] Filtering

[1p] What is the `name` of the heaviest animal?
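A minimal sketch on made-up rows: `idxmax()` returns the index label of the largest `bodywt` (skipping missing values), and `.loc` looks up the `name` in that row. If several animals share the maximum weight, only the first one is returned.

```python
import numpy as np
import pandas as pd

# Placeholder rows with invented weights.
df = pd.DataFrame({
    "name": ["Animal A", "Animal B", "Animal C"],
    "bodywt": [0.05, 14.0, np.nan],
})

heaviest = df.loc[df["bodywt"].idxmax(), "name"]
print(heaviest)
```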

In [None]:
### QUESTION_2D_ChatGPT:NO/YES. Leave YES or NO to indicate whether you used ChatGPT here. Write your solution below in this cell. ###

### 2.E. [1p] Counting

[1p] Count the number of observations per `vore` category. Ignore missing values.
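A minimal sketch on a made-up `vore` column: `value_counts()` drops missing values by default (`dropna=True`), which covers the "ignore missing values" requirement.

```python
import pandas as pd

# Placeholder categories; one value is missing.
df = pd.DataFrame({"vore": ["herbi", "carni", None, "herbi", "omni"]})

counts = df["vore"].value_counts()   # missing values excluded by default
print(counts)
```

`df.groupby("vore").size()` is an equivalent alternative; it also leaves out missing keys by default.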

In [None]:
### QUESTION_2E_ChatGPT:NO/YES. Leave YES or NO to indicate whether you used ChatGPT here. Write your solution below in this cell. ###

### 2.F. [2p] Boxplot

[2p] Create the following boxplot:

- The horizontal axis should represent the `vore` column.
- The vertical axis should represent the `sleep_rem` column.

In [None]:
### QUESTION_2F_ChatGPT:NO/YES. Leave YES or NO to indicate whether you used ChatGPT here. Write your solution below in this cell. ###

### 2.G. [2p] Scatterplot

[2p] Create a scatterplot of `brainwt` against `bodywt` for animals of the order `Rodentia` only.

In [None]:
### QUESTION_2G_ChatGPT:NO/YES. Leave YES or NO to indicate whether you used ChatGPT here. Write your solution below in this cell. ###

### 2.H. [2p] KDE plot

- [1p] Create a KDE plot of total sleep (`sleep_total`) against body weight (`bodywt`), coloured by `vore` category.
- [1p] Use only observations with `bodywt` < 100.

In [None]:
### QUESTION_2H_ChatGPT:NO/YES. Leave YES or NO to indicate whether you used ChatGPT here. Write your solution below in this cell. ###

### 2.I. [3p] New column


Add a column `weight_group` to the dataset with the following categories:

- `light` -- when `bodywt` is below 0.1,
- `heavy` -- when `bodywt` is 10 or more,
- `middle` -- otherwise.
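A minimal sketch using `pd.cut` on made-up weights. With `right=False` each bin includes its left edge and excludes its right edge, so the bins are `[-inf, 0.1)` for `light`, `[0.1, 10)` for `middle` and `[10, inf)` for `heavy`, matching the rules above. The new column is categorical; rows with a missing `bodywt` would get a missing group.

```python
import numpy as np
import pandas as pd

# Placeholder weights chosen to hit the boundaries 0.1 and 10.
df = pd.DataFrame({"bodywt": [0.05, 0.1, 9.99, 10.0, 600.0]})

df["weight_group"] = pd.cut(
    df["bodywt"],
    bins=[-np.inf, 0.1, 10, np.inf],
    labels=["light", "middle", "heavy"],
    right=False,   # intervals closed on the left: [a, b)
)
print(df)
```

`np.select` with explicit conditions (`bodywt < 0.1`, `bodywt >= 10`) and a default of `"middle"` is an alternative that yields plain strings instead of a categorical column.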


In [None]:
### QUESTION_2I_ChatGPT:NO/YES. Leave YES or NO to indicate whether you used ChatGPT here. Write your solution below in this cell. ###