# Assignments - module 0

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vigji/python-cimec/blob/main/assignments/Assignments_0.ipynb)

This notebook contains the assignments to complete for credits for the first module. Once you're happy with your solutions, send it to me in any form (email the file, share it through Colab/Google Drive, send me a link to your GitHub repo...).

Deadline: still pending, together with the rest of the assignments - likely beginning of September.

Try to keep in mind not only the goal of the exercise, but also all the coding best practices we have been considering in the lectures. Do your best, and feel free to ask for help! 

## 0. The Library of Babel

In [The Library of Babel](https://sites.evergreen.edu/politicalshakespeares/wp-content/uploads/sites/226/2015/12/Borges-The-Library-of-Babel.pdf), the Argentinian writer Jorge Luis Borges imagines a universe where people are born and live inside a giant library that contains an astronomical number of books. Each book:

    ...contains four hundred ten pages; each page, forty lines; each line, approximately eighty black letters.   [...] all books, however different from one another they might be, consist of identical elements: the space, the period, the comma, and the twenty-two letters of the alphabet.


Books in this library consist of all possible combinations of such symbols; as a consequence, the library contains:

    ... All-the detailed history of the future, the autobiographies of the archangels, the faithful catalog of the Library, thousands and thousands of false catalogs, the proof of the falsity of those false catalogs, a proof of the falsity of the true catalog, the gnostic gospel of Basilides, the commentary upon that gospel, the commentary on the commentary on that gospel, the true story of your death, the translation of every book into every language, the interpolations of every book into all books, the treatise Bede could have written (but did not) on the mythology of the Saxon people, the lost books of Tacitus.



### Exercise 0.0
In the library rests an enigmatic quantity: the number of books it contains.

Let's calculate the number of books in the library! The formula is simple, but think beforehand: 
- Can we compute this number? Can you estimate a lower bound for the space that the number would need in memory?
- Can we represent the result with a float?
- Can we represent it with an int? 

Try to calculate the number and assign it to a variable. Read its size with the `sys.getsizeof()` function. Was your estimate reasonable?

(idle bonus question: do you think that your birth date, expressed as the sequence DDMMYY (day, month, shortened year), can be found somewhere in the sequence of digits of that number? If you're curious, try to find it!)

In [None]:
import sys
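
As a warm-up for the questions above, here is a minimal sketch on a toy case, not the full library. It assumes 25 symbols (as in the quote) and a hypothetical text length of 1000 symbols, and it compares a back-of-the-envelope size estimate with what `sys.getsizeof()` reports.

```python
import math
import sys

n_symbols = 25        # 22 letters + space, period, comma (from the quote)
toy_length = 1_000    # hypothetical: far shorter than a real book

# Python ints have arbitrary precision, so this is computed exactly.
toy_count = n_symbols ** toy_length

# Rough lower bound on storage: bytes needed to write the number in binary.
estimated_bytes = math.ceil(toy_length * math.log2(n_symbols) / 8)
actual_bytes = sys.getsizeof(toy_count)

# Floats top out around 1.8e308, so converting a number this large fails.
try:
    float(toy_count)
    fits_in_float = True
except OverflowError:
    fits_in_float = False

print(estimated_bytes, actual_bytes, fits_in_float)
```

The reported size is a bit larger than the estimate because the int object also carries some bookkeeping overhead.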

### Exercise 0.1

Make a generator to create different books from the library every time it runs - ideally, in a random way. Then, make it a function!

In [None]:
import random
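
If the generator pattern is new to you, the sketch below shows it on single lines rather than whole books. The alphabet is an assumption (the story does not say which 22 letters are used, so the first 22 lowercase ASCII letters stand in), and `random_texts` is a hypothetical name.

```python
import random
import string

# Assumption: the first 22 lowercase ASCII letters stand in for Borges' letters.
ALPHABET = string.ascii_lowercase[:22] + " .,"


def random_texts(n_chars, seed=None):
    """Yield an endless stream of random strings of n_chars symbols."""
    rng = random.Random(seed)  # own random state, reproducible when seeded
    while True:
        yield "".join(rng.choices(ALPHABET, k=n_chars))


toy_lines = random_texts(n_chars=80, seed=0)
first_line = next(toy_lines)
second_line = next(toy_lines)
print(first_line)
print(second_line)
```

Each call to `next()` resumes the function body right after the last `yield`, so nothing is generated until it is asked for.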

### Exercise 0.2

Find in a given book all words / sequences of words that have a meaning in English. How many do you expect to find based on their length?

In [None]:
# To get a list of english words, you are given the following function:

import requests

def get_english_words_list():
    """Download a reasonably complete English dictionary.
    """
    resp = requests.get("https://www.mit.edu/~ecprice/wordlist.10000")
    # splitlines() avoids the empty string that split("\n") leaves after a trailing newline
    return resp.text.splitlines()

# Call it and assign the resulting list to a variable!

# Then, write a function to look up words in a book, returning a list of the words that were found.
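
One possible starting point, sketched on a toy text and a hand-made word list rather than the downloaded one. Both function names are hypothetical, and the expected-count formula assumes every position in the text is an independent, uniform draw from 25 symbols.

```python
def find_words(text, words):
    """Return the words that occur as substrings of text."""
    # Skip empty strings: "" is a substring of every text.
    return [word for word in words if word and word in text]


def expected_occurrences(word_length, text_length, n_symbols=25):
    """Expected number of times one given word appears in a random text."""
    n_positions = max(text_length - word_length + 1, 0)
    return n_positions / n_symbols ** word_length


toy_text = "xqz the cat, brw.dog"
toy_words = ["the", "cat", "dog", "bird", ""]
found = find_words(toy_text, toy_words)
print(found)
print(expected_occurrences(word_length=3, text_length=len(toy_text)))
```

The expected count drops by a factor of 25 for every extra letter, which is why long words are so rare in random text.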


### Exercise 0.4

Use the `%%timeit` special command to measure how long it takes to check for words in a book. How much time would you need to check the whole library?
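
Outside a notebook cell, the standard-library `timeit` module does roughly what `%%timeit` does. The sketch below times a toy check (`toy_check` is a hypothetical stand-in for your own function) and extrapolates with logarithms, assuming a toy text length of 1000 symbols.

```python
import math
import timeit


def toy_check(text, words):
    """Hypothetical stand-in for a word-lookup function."""
    return [word for word in words if word in text]


toy_text = "abcdefghi," * 100          # 1000 symbols
toy_words = ["abc", "xyz", "ghi"]

n_runs = 100
total_seconds = timeit.timeit(lambda: toy_check(toy_text, toy_words), number=n_runs)
seconds_per_text = total_seconds / n_runs

# Number of distinct 1000-symbol texts over 25 symbols, as a base-10 logarithm.
log10_n_texts = len(toy_text) * math.log10(25)
log10_total_seconds = math.log10(seconds_per_text) + log10_n_texts
print(f"about 10^{log10_total_seconds:.0f} seconds to check every toy text")
```

Working with logarithms sidesteps a practical problem: multiplying a float timing by an int that is beyond the float range raises `OverflowError`.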

## 1. Spotted UniTn

In this exercise, we'll be doing some stats on a dataset of all the people employed at UniTn scraped from the UniTn website.

**Note**: We have not learned yet how to use arrays, matrices, and dataframes. Some of the analyses in this exercise will inevitably look a bit cumbersome, because they are - with the tools we have now. They'll become a piece of cake with `pandas`!

In [None]:
import json
import requests

def get_unitn_hr_dataset():
    """Download all data about UniTn employees from their website.
    
    !!!Note: all information we are using here is made openly available from 
    the university. However, please do appreciate the power of similar data
    scraping through any of the online platforms we're giving our data to,
    were there some security holes! 
    This is no endorsement toward trying anything like that yourself, hacking
    is bad. No seriously, it is. Also, copyright is good.
    
    Returns:
    
        list : A list of uni employees.
    
    """
    
    # This string contains the address at which we'll find the dataset:
    UNITN_PEOPLE_URL = "https://dati.unitn.it/du/Person/en"

    # Get page response:
    response = requests.get(UNITN_PEOPLE_URL)

    # Parse a json from the page:
    json_data = json.loads(response.text)

    # Get actual data and return:
    return json_data["value"]["data"] 


#### Exercise 1.0

Call the function and try to have a look at the result. How many people are employed at the university? How many at each department? Which is the department with the most professors?

Make a nice `print` of all those results! (You'll see a lot of different departments. You can filter results for the ones with at least 10 people)

- If people have multiple affiliations, count them in each one of them. E.g., if someone is listed under both `"Center for Mind/Brain Sciences - CIMeC"` and `"CeRiN - Center for Neurocognitive Rehabilitation"`, put the person in the count for both departments.
- If a person is listed with two different roles at the same department (e.g., as both `"Graduate student"` and `"Research intern"`) count that person only once for that department.
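
The two counting rules above can be sketched with one set of person identifiers per department. The records below are invented: the field names (`id`, `positions`, `dept`, `role`) are assumptions and will not match the real dataset, so inspect the downloaded data first.

```python
from collections import defaultdict

# Hypothetical records: field names and values are made up for illustration.
toy_people = [
    {"id": 1, "positions": [{"dept": "A", "role": "Graduate student"},
                            {"dept": "A", "role": "Research intern"}]},
    {"id": 2, "positions": [{"dept": "A", "role": "Professor"},
                            {"dept": "B", "role": "Professor"}]},
]

# A set per department: adding the same person twice has no effect.
people_by_dept = defaultdict(set)
for person in toy_people:
    for position in person["positions"]:
        people_by_dept[position["dept"]].add(person["id"])

counts = {dept: len(ids) for dept, ids in people_by_dept.items()}
for dept, n_people in sorted(counts.items()):
    print(f"{dept}: {n_people}")
```

Person 1 has two roles in department A but is counted once there, while person 2 is counted in both A and B.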

#### Exercise 1.1

Imagine you want to call-bomb the `"Department of Economics and Management"` for a prank. You'll first need a list of all the phone numbers you can find in that department. Create that list!

#### Exercise 1.2



Use the function below to get a dictionary of Italian names divided by gender. 

Then, print out the gender ratio (how many women, how many men) for all the position roles that you can find in the dataset (filter out positions with fewer than 10 people). Then, jump to conclusions!

- If a person has multiple roles, count them for each of the roles they have.
- Yes, it can be erroneous to infer gender just from the name; here we assume potential errors will average out in the large numbers.
- Yes, this will consider only Italian employees. You can print out how many names were left out (and which ones), and if you want try and improve the function by including international names in the list as well!

In [None]:
import requests

def get_names():
    """Download a list of Italian names, divided by gender.
    
    Returns:
    
        dict : A dictionary of masculine and feminine names.
        
    """
    
    # This string contains the address at which we'll find the names:
    FIRST_NAMES_URL = "https://gist.githubusercontent.com/metalelf0/a2ab283d0d5fd9b4b8a10d6427630627/raw/b848ffee70464fd39714a1a621f3a2eba6c3812e/italian_names.md"

    # Get page response:
    response = requests.get(FIRST_NAMES_URL)
    
    # read the response as string:
    raw_content = response.text 
    
    # split lines and exclude the first header (# Male names):
    full_names_list = raw_content.split("\n")[1:]
    
    # Look for the header "# Female names":
    female_header_idx = full_names_list.index('# Female names')

    # Names before header are male, after are female:
    return dict(male=full_names_list[:female_header_idx],
                female=full_names_list[female_header_idx + 1:])
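
Once you have a dictionary shaped like the one returned above, one way to use it is to turn it into a case-insensitive lookup table. The sketch below runs on a tiny hand-written dictionary, and `guess_gender` is a hypothetical helper.

```python
from collections import Counter

# Toy stand-in with the same shape as the downloaded dictionary.
toy_names = dict(male=["Luca", "Marco"], female=["Anna", "Giulia"])

# Map each lower-cased name to its list; lookups then take constant time.
lookup = {}
for gender, names in toy_names.items():
    for name in names:
        lookup[name.strip().lower()] = gender


def guess_gender(first_name):
    """Return 'male', 'female', or 'unknown' for names not in the lists."""
    return lookup.get(first_name.strip().lower(), "unknown")


toy_staff = ["Anna", "LUCA", "Giulia", "Yuki"]
tally = Counter(guess_gender(name) for name in toy_staff)
print(tally)
```

A name that appears in both lists would silently end up with whichever gender was stored last, so it is worth checking the two lists for overlaps.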
     

## 2. A Class `class`

Here you will implement classes to represent students and a classroom. I give lists of possible attributes and methods that you can define on them, but feel free to interpret the instructions, or ignore them and add other methods and attributes of your making if you wish! 

To generate fake people, feel free to use either combinations of the above function and `random`, or [the `faker` library](https://faker.readthedocs.io/en/master/) (a fun library to know about! To install it, write `!pip install Faker` in a cell of the notebook and run it).

#### Exercise 2.0

Define the class `Student` to represent a student. It may contain something along the following:

Attributes:
- a `name`
- a `phone_number`
- a `knowledge_score` in some range

Methods:
- a `learn` method that takes as input a number of hours and increases the `knowledge_score`.
- an `undertake_exam` method that generates a test score in some range proportionally to the `knowledge_score` plus a random effect that you can incorporate with the `random` library.

Bonus:
- implement the `__eq__()` special method to return `True` if two students have the same name and phone number
- implement the `__repr__()` special method to show nicely info on the student when the variable is shown

Implement the class with its docstrings, and write some code cells to show that it behaves properly.


In [None]:
class Student:
    ...
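
For the bonus points, here is how `__eq__()` and `__repr__()` look on an unrelated toy class. `Book` is hypothetical and not part of the assignment; the same pattern should carry over to `Student`.

```python
class Book:
    """Toy class used only to illustrate special methods."""

    def __init__(self, title, n_pages):
        self.title = title
        self.n_pages = n_pages

    def __eq__(self, other):
        # Let Python handle comparisons with unrelated types.
        if not isinstance(other, Book):
            return NotImplemented
        return (self.title, self.n_pages) == (other.title, other.n_pages)

    def __repr__(self):
        return f"Book(title={self.title!r}, n_pages={self.n_pages})"


first = Book("Ficciones", 410)
second = Book("Ficciones", 410)
print(first == second)
print(first)
```

Note that a class defining `__eq__()` without `__hash__()` becomes unhashable, so its instances cannot be put in a set or used as dictionary keys.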

#### Exercise 2.1

Define the class `Class`, that represents a classroom of students, document it and show how it can be used. It may contain something along the following:

Attributes:

 - a list of students (use the `Student` object!)
 - a lectures counter
 - an exam logbook with all the exams taken; use the format that you find most suitable, but think about a reasonable and accessible way to keep track of all grades from all students

Methods:
 - a method to add a new student to the class
 - a lecture method that increases the knowledge score of students (using the `Student` class methods)
 - a do test method that gets scores for a test for the whole classroom and stores them in the log book (using the `Student` class methods)
 
Bonus:
- implement the `__eq__()` special method to return `True` if all students have the same names
- implement the `__getitem__()` special method to get individual students from the classroom with the square brackets indexing
- implement the `__repr__()` special method to show nicely info on the class
 

Implement the class with its docstrings, and write some code cells to call the various methods and to show that it behaves properly.

In [None]:
class Class:
    ...
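
Square-bracket indexing can be illustrated on a toy container. `Shelf` is hypothetical and only wraps a list; a classroom could delegate to its list of students in a similar way.

```python
class Shelf:
    """Toy container used only to illustrate __getitem__."""

    def __init__(self, titles):
        self.titles = list(titles)

    def __getitem__(self, index):
        # Delegating to the list gives negative indices and slices for free.
        return self.titles[index]

    def __len__(self):
        return len(self.titles)

    def __repr__(self):
        return f"Shelf({len(self.titles)} titles)"


shelf = Shelf(["Ficciones", "El Aleph", "El Hacedor"])
print(shelf[0], shelf[-1])
print(shelf)
```

Because `__getitem__()` accepts integer indices starting from 0, a plain `for title in shelf:` loop also works without any extra code.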