# Mini-project 1: WARMUP - A dataset with CSV and JSON

In this mini-project, we will generate a fake dataset to warm up with dictionaries, functional programming, and the following libraries: `csv`, `json`, `itertools`, `numpy` and `matplotlib`.

## 1. Generate composed names
### 1.1. Generate a regular list with a regular function 

Write a custom function `generate_composed_names` that returns a list of fake composed names: each one is a permutation of two first names joined by a dash, e.g. `Alice-Maria`.

Be careful:
* The function takes 1 input parameter, a list of first names, and returns the list of permutations with a "-" in between
* The output list must include both orders, e.g. `Alice-Maria` and also `Maria-Alice`
* The output list must not contain repetitions, e.g. `Bob-Bob` (this is a permutation, not a product)

Although Python has tools to do this in some modules, it is a good exercise to start from an empty list and fill it progressively with functions we know.

In [None]:
# My code here

Here is a list of 11 first names:
```
names = ["Bob", "Alice", "Maria", "Albert", "Paul", "Alex", "Luc", "Robert", "Dylan", "Léa", "Richard"]
```
The function call with this list must return exactly 110 composed names (i.e. the exact number of permutations without repetition of 2 elements among 11), stored in a variable named `composed_names`.
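
As a quick sanity check on the expected count: the number of ordered pairs of distinct elements taken from n items is n × (n − 1), so 11 × 10 = 110 here. A minimal sketch of that arithmetic, assuming Python 3.8 or later for `math.perm`; the variable names are illustrative only.

```python
import math

n_names = 11      # size of the list of first names
pair_size = 2     # a composed name joins two distinct first names

# Ordered selections of 2 distinct items among 11: 11! / (11 - 2)!
expected_count = math.perm(n_names, pair_size)
print(expected_count)   # 110
```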

In [None]:
# My code here

### 1.2. Use an existing iterator from `itertools`

**First**, experiment with `itertools.permutations` on input lists of integers. [Read the documentation](https://docs.python.org/3/library/itertools.html).
- What is the type of the return value?
- Convert it with `list(...)` to get a regular list
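
A possible first experiment might look like the sketch below. It uses a small list of integers chosen for illustration and asks for permutations of length 2; the variable names are assumptions, not part of the exercise.

```python
import itertools

numbers = [1, 2, 3]

# permutations() returns a lazy iterator object, not a list
perms = itertools.permutations(numbers, 2)
print(type(perms))    # <class 'itertools.permutations'>

# list() consumes the iterator and materializes every tuple
perms_list = list(perms)
print(perms_list)
# [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
```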

In [None]:
# My code here

Now implement a regular function `generate_composed_names_from_permutations(...)` that makes use of `itertools.permutations` in its body. Make sure your final output is a list of 110 strings.

In [None]:
# My code here

### 1.3. Performance comparison

Use the following to benchmark your functions:
- **Time complexity**: put the magic `%%timeit` at the beginning of a cell to run it repeatedly and get its average execution time
- **Space complexity**: call `sys.getsizeof(x)` to get the memory used by an object `x` (in bytes); for a container, this does not count the objects it refers to

Compare the time and space complexities of the call to the functions above. 
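
The `%%timeit` magic only exists in IPython and Jupyter. Outside a notebook, the standard `timeit` module gives a comparable measurement. The sketch below applies both tools to a toy function, `squares_list`, which is hypothetical and unrelated to the exercise.

```python
import sys
import timeit

def squares_list(n):
    """Toy function (hypothetical): build a full list of n squares."""
    return [i * i for i in range(n)]

# Time: total duration, in seconds, of 1000 calls
duration = timeit.timeit(lambda: squares_list(1000), number=1000)
print(f"{duration / 1000:.2e} s per call")

# Space: size in bytes of the list object itself (its elements are not counted)
small = squares_list(10)
large = squares_list(1000)
print(sys.getsizeof(small), sys.getsizeof(large))
```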

In [None]:
# My code here

Did you gain any performance by using the iterator this way? Why or why not?

*Explain here*

# 2. Generate characters as full names
We are going to generate full names (first + last names) using different methods again: a generator, an iterator, and a regular function.

## 2.1. Define the generation function as a Generator

Write a custom **generator** named `generate_characters_gen(...)` that yields the combinations of composed names and last names separated by a space, e.g. `Paul-Robert Loiseau`.

Recall that a generator is a function that uses the `yield` keyword.
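
As a reminder of the mechanics, here is a minimal sketch of a generator on an unrelated toy problem. The function `countdown` is hypothetical and only illustrates how `yield` hands values out one at a time.

```python
def countdown(start):
    """Toy generator (hypothetical): yields start, start - 1, ..., 1."""
    while start > 0:
        yield start     # execution pauses here until the next value is requested
        start -= 1

gen = countdown(3)
print(type(gen))    # <class 'generator'>
print(next(gen))    # 3
print(list(gen))    # [2, 1], the values that were still pending
```

Calling the function does not run its body: it returns a generator object, and the body only advances when a value is requested.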

Be careful:
* The function takes 2 input parameters, a list of composed first names and a list of last names, and yields each combination
* It means that for each last name, the generator yields as many characters as there are names in the list of first names
* Each full name must be in this order: **first name and then last name**, thus the output must not contain `Tournesol Paul-Alex` for instance.

In [None]:
# My code here

## 2.2. Call and iterate over the generator

Here is a list of 10 last names (from the stories of Tintin):
```
surnames = ["Dupont", "Dupond", "Haddock", "Tournesol", "Castafiore", "Lampion", "Lopez", "Loiseau", "Müller", "Sanzot"]
```

Call your generator and iterate over it with a regular `for` loop displaying `f"Character {i} is named {character}"` on each line.

In [None]:
# My code here

Observe the return type of the function call and discuss how it differs from an iterator.

*Explain here*

In a loop, iterate over the generator to print the string `f"Character {i} is named {character}"` for every full name.

In [None]:
# My code here

## 2.3. Define the Iterator

Let's write another piece of code to generate the full names again, this time with an **iterator** named `CharactersIterator`.
Recall that an iterator is an object whose class implements the magic methods `__iter__` and `__next__`.

**Tip:** rely on the existing iterator `itertools.product` to build yours. Your iterator can consume that `product` iterator.
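
To illustrate the iterator protocol and the idea of consuming `itertools.product`, here is a sketch on a different toy problem. The class `PairLabels` and its attribute names are hypothetical; this is one possible shape, not a reference solution.

```python
import itertools

class PairLabels:
    """Toy iterator (hypothetical): labels such as 'a1' from letters and digits."""

    def __init__(self, letters, digits):
        # Delegate the pairing work to the existing product iterator
        self._pairs = itertools.product(letters, digits)

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        # next() raises StopIteration once product is exhausted,
        # which also ends our own iteration
        letter, digit = next(self._pairs)
        return f"{letter}{digit}"

labels = list(PairLabels(["a", "b"], [1, 2]))
print(labels)   # ['a1', 'a2', 'b1', 'b2']
```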

In [None]:
# My code here

## 2.4. Declare and iterate over the iterator

In a loop, iterate over the iterator to print the string `f"Character {i} is named {character}"` for every full name.

In [None]:
# My code here

## 2.5. Define a regular function

Finally, write a regular function `generate_characters_func` that returns a regular list of the full names without using iterators or generators.

In [None]:
# My code here

## 2.6. Performance comparison

You have already consumed both the iterator and generator with your prints. Since you cannot rewind them, we have to call them again to get new ones.
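
The sketch below shows this one-shot behavior on a toy generator (the function `letters` is hypothetical): a second pass over the same object yields nothing, while a fresh call starts over.

```python
def letters():
    """Toy generator (hypothetical)."""
    yield "a"
    yield "b"

gen = letters()
first_pass = list(gen)         # consumes the generator
second_pass = list(gen)        # nothing left to yield
fresh_pass = list(letters())   # a new call gives a new generator

print(first_pass, second_pass, fresh_pass)
# ['a', 'b'] [] ['a', 'b']
```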

Assign the following outputs to variables:
- Assign the generator call to `characters_gen` (of type `generator`)
- Assign the iterator instantiation to `characters_iter` (an instance of `CharactersIterator`)
- Assign the output of the function call to `characters_func` (of type `list`)

In [None]:
# My code here

Now benchmark the **time complexity** of these 3 assignments

In [None]:
# My code here

Also benchmark the **space complexity** of these 3 objects:

In [None]:
# My code here

Interpret the results in terms of time and space complexity

*Explain here*

In [None]:
characters = characters_func   # Assign the final list of full names for the next part of the exercise

# 3. Import data from a CSV file

We will associate these characters with exam marks that another program generated in a CSV file.
Use the documentation of the [`csv`](https://docs.python.org/3/library/csv.html) module for the next questions.

## 3.1. Load the file

Manually download the file [`exams.csv`](https://raw.githubusercontent.com/ymollard/python-advanced-slides/main/exercises/data/exams.csv?token=AAZEO6XULYU2ZIIZJLGSD4DBVANQC). With Python, open it, load its content, and transform it to get the marks by discipline, for instance `math_marks = [15, 13...]`
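
The layout of `exams.csv` is not described here, so the sketch below reads a made-up in-memory file instead. The column names `math` and `french`, the comma delimiter and the presence of a header row are all assumptions; adapt them to what the real file contains.

```python
import csv
import io

# Hypothetical content: the real exams.csv may use other columns or delimiters
fake_file = io.StringIO("math,french\n15,10\n8.5,17\n")

reader = csv.DictReader(fake_file)   # the first row is read as the header
math_marks = []
for row in reader:
    # csv yields strings, so convert each mark to a number
    math_marks.append(float(row["math"]))

print(math_marks)   # [15.0, 8.5]
```

With a real file, the `csv` documentation recommends opening it with `open(path, newline="")` before passing it to the reader.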

In [None]:
# My code here

Install the numerical module `numpy` with pip in your venv (in the PyCharm system terminal).

Use the functions `numpy.mean()` and `numpy.std()` to get the mean and the standard deviation of marks by discipline.

In [None]:
# My code here

## 3.2. Plot the density of marks

A density plot shows, for each of the 41 possible marks on the horizontal axis (from 0 to 20 with a 0.5 step), how often this mark occurs, on the vertical axis. This is a way to check how the data are distributed. We could build this plot by hand, but popular Python libraries do it for us.
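
To make the "by hand" idea concrete, here is a sketch that counts occurrences with `collections.Counter`. The marks are made up for illustration, and the result is a raw count per mark, not the smoothed curve a plotting library would draw.

```python
from collections import Counter

# Made-up marks, for illustration only
marks = [10, 12.5, 10, 15, 12.5, 10]

occurrences = Counter(marks)

# Every possible mark from 0 to 20 in steps of 0.5
possible_marks = [i / 2 for i in range(41)]
counts = [occurrences[m] for m in possible_marks]

print(len(possible_marks))   # 41
print(occurrences[10])       # 3
```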

Install the statistical module `pandas`, the plot module `matplotlib` and the scientific module `scipy` with pip in your venv (in the PyCharm system terminal).

Use [pandas.DataFrame.plot.density](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.density.html) to plot the density of marks.

In [None]:
# My code here

# 4. Produce a new dataset
## 4.1. Build the new data structure 
Transform the data you read into a data structure made of nested dictionaries and/or lists.

The structure must represent the names of the students as well as their marks in the 3 exams.

For instance:
```
{
  "Alice-Maria Lampion" : {"math": 15, "french": 10, "philosophy": 11.5},
  "Paul-Alex Loiseau" : {"math": 8.5, "french": 17, "philosophy": 15},
  ...
}
```
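
One way to build such a structure is a dictionary comprehension over parallel lists. The sketch below assumes that the i-th mark of each discipline belongs to the i-th character, which the exercise does not state explicitly; the names and marks are taken from the example above and the variable names are illustrative.

```python
# Sample names and marks, for illustration only
students = ["Alice-Maria Lampion", "Paul-Alex Loiseau"]
math_marks = [15, 8.5]
french_marks = [10, 17]

# zip walks the parallel lists together; each student gets an inner dict
dataset = {
    name: {"math": math_mark, "french": french_mark}
    for name, math_mark, french_mark in zip(students, math_marks, french_marks)
}

print(dataset["Paul-Alex Loiseau"]["french"])   # 17
```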


In [None]:
# My code here

## 4.2. Save your data structure in JSON

Import the module `json` and use `json.dump()` to save your database in a file: `dataset.json`

Protip: add the parameter `indent=4` to make your JSON file readable by a human with a simple text editor. Open the file without Python to check the result.
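
A minimal sketch of the call, using made-up data and writing to a temporary directory so that no real file is touched. The `ensure_ascii=False` argument is an optional addition, not required by the exercise: it keeps accented characters readable in the file.

```python
import json
import os
import tempfile

# Made-up data, for illustration only
dataset = {"Alice-Maria Lampion": {"math": 15, "french": 10}}

# Written to a temporary directory here to avoid touching real files
path = os.path.join(tempfile.mkdtemp(), "dataset.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(dataset, f, indent=4, ensure_ascii=False)

# Show the indented text that was written
with open(path, encoding="utf-8") as f:
    print(f.read())
```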

In [None]:
# My code here

## 4.3. Read and check

We are now going to check that we can properly load the JSON file with `json.load()`.
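
As a sketch of what `json.load()` returns, the example below reads from an in-memory stand-in for an opened file. The content is made up; with a real file, pass the object returned by `open(...)` instead.

```python
import io
import json

# Stand-in for an opened JSON file (made-up content)
fake_file = io.StringIO('{"Alice-Maria Lampion": {"math": 15, "french": 10}}')

loaded = json.load(fake_file)   # JSON objects become plain Python dicts
print(loaded["Alice-Maria Lampion"]["math"])   # 15
```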

We will first deliberately crash this Jupyter Notebook in order to start from scratch. Your code will remain in your browser but all variables will be lost.

In [None]:
# We deliberately end the interpreter here to make sure all previous variables are cleared.
import os
os._exit(0)

Now reload the JSON file into a Python variable and look up the math mark of Paul-Robert Müller:

In [None]:
# My code here

# Resources

* itertools: https://docs.python.org/3/library/itertools.html
* Functional programming: https://docs.python.org/3/howto/functional.html
* csv: https://docs.python.org/3/library/csv.html
* json: https://docs.python.org/fr/3/library/json.html
* pandas: https://pandas.pydata.org/pandas-docs/stable/

