# DS 3000 Lab 1

Due: Thursday, Sept. 16 @ 11:59 pm

### Submission Instructions
Submit this `ipynb` file to Gradescope (this can also be done via the assignment on Canvas). To ensure that your submitted `ipynb` file represents your latest code, run a fresh `Kernel > Restart & Run All` just before uploading the `ipynb` file to Gradescope.

### Tips for success
- Collaborate: bounce ideas off of each other. If you are having trouble, you can ask your classmates or Dr. Singhal for help with specific issues. However...
- Under no circumstances may one student view or share their ungraded homework or quiz with another student [(see also)](http://www.northeastern.edu/osccr/academic-integrity), i.e. you are welcome to **talk about** (*not* show each other your answers to) the problems.

In [1]:
# you may want to use the below modules on this lab
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import requests
from datetime import datetime
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
from bs4 import BeautifulSoup

# Part 1: Sketch and Begin Implementing a Pipeline

We wish to create a data frame that includes all the spells for each class (a "class" is something like a "wizard", or a "bard") in Dungeons and Dragons 5th Edition, which you can find [here](http://dnd5e.wikidot.com/). Your final data frame should look something like:

| Class     | Level     | Spell Name    | School      | Casting Time | Range                | Duration      | Components |
|----------:|----------:|--------------:|------------:|-------------:|---------------------:|--------------:|-----------:|
| Artificer | Cantrip   | Acid Splash   | Conjuration | 1 Action     | 60 Feet              | Instantaneous | V, S       |
| Artificer | Cantrip   | Booming Blade | Evocation   | 1 Action     | Self (5-foot radius) | 1 Round       | S, M       |
| ...       | ...       | ...           | ...         | ...          | ...                  | ...           | ...        |
| Wizard    | 9th Level | Wish          | Conjuration | 1 Action     | Self                 | Instantaneous | V          |
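
As a rough illustration of the target shape, the sketch below stacks two small hand-made tables into one data frame and tags each with its class and level. The toy rows are copied from the table above (with only two of the columns). The helper name `tag_and_stack` and the `(class, level)` tuple keys are hypothetical choices made for this example, not part of the lab's required design.

```python
import pandas as pd


def tag_and_stack(tables):
    """Combine {(class, level): DataFrame} into one long data frame.

    Hypothetical helper for illustration only.
    """
    tagged = []
    for (dnd_class, level), table in tables.items():
        # work on a copy so the caller's tables are left untouched
        table = table.copy()
        # insert the identifying columns at the front: Class, then Level
        table.insert(0, "Level", level)
        table.insert(0, "Class", dnd_class)
        tagged.append(table)
    # stack everything and renumber the rows from 0
    return pd.concat(tagged, ignore_index=True)


# toy data: two tiny tables standing in for scraped spell tables
toy_tables = {
    ("Artificer", "Cantrip"): pd.DataFrame(
        {"Spell Name": ["Acid Splash", "Booming Blade"],
         "School": ["Conjuration", "Evocation"]}),
    ("Wizard", "9th Level"): pd.DataFrame(
        {"Spell Name": ["Wish"], "School": ["Conjuration"]}),
}

spells = tag_and_stack(toy_tables)
print(spells)
```

Inserting the `Class` and `Level` columns before concatenating keeps each row traceable to the table it came from.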

## Part 1.1: Poking Around (10 points)

Go to the D&D 5th Edition wiki linked above. Scroll down to the "All Spells" list and click on "Artificer Spells". Spend a moment looking around the page, then do the same for "Bard Spells", and make note of the URL of each. What do you notice about the pages and URLs that should be convenient for scraping the data we are interested in for every class's spells? Discuss anything else you notice about the pages that may be either tricky or convenient to deal with. Note that in our desired data frame, we include the "Class" and "Level" for each spell.

## Part 1.2: Sketch the Pipeline (20 points)

First, in a markdown cell, write a bullet point list of tasks we need to get the data frame we want. I'll give you what the first bullet point should be, and you fill in the rest (there may be only one more, depending on how efficient you are in describing the tasks...):

- Write a function that takes a class (string) as an argument and returns the tables from the class's D&D wiki spell page in a dictionary, with one table per spell level
- ... 

Then, in a code cell, define **empty** functions that correspond to the tasks you identified as needing to be done. For example, the function for the first bullet point above might start with:

```python
def get_class_spell_dict(dnd_class):
    """ takes a D&D class (string) and gets the spell tables and saves them in a dictionary
    
    Args:
        dnd_class (str): the D&D class
        
    Returns:
        table_dict (dict): a dictionary of tables, one for each spell level
    """
    pass
```

## Part 1.3: Write the first function (30 points)

Go ahead and write the first function, and then test it by getting the dictionary of Druid spell tables. Show that it works by printing out the head of the 4th level Druid spells. Your final calls should be something like:

```python
druid_spell_tables = get_class_spell_dict("druid")
druid_spell_tables['Level 4'].head()
```

**Note**: depending on how you create the dictionary in your `get_class_spell_dict` function, you may not have `'Level 4'` as the key; that's fine. The top of the table should look like:

| Spell Name       | School        | Casting Time | Range                | Duration      | Components |
|-----------------:|--------------:|-------------:|---------------------:|--------------:|-----------:|
| Blight           | Necromancy    | 1 Action     | 30 Feet              | Instantaneous | V, S       |
| Charm Monster    | Enchantment   | 1 Action     | 30 Feet              | 1 hour        | V, S       |
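
The note about dictionary keys can be made concrete with a small sketch. It assumes you already have a list of per-level tables in page order, and that the first table holds cantrips; both are assumptions to check against the actual page, and the second may not hold for every class. The helper name `label_tables` is hypothetical, and the tables here are toy data rather than scraped results.

```python
import pandas as pd


def label_tables(tables):
    """Map an ordered list of tables to keys 'Cantrip', 'Level 1', 'Level 2', ...

    Hypothetical helper: assumes tables[0] is the cantrip table.
    """
    table_dict = {}
    for level, table in enumerate(tables):
        key = "Cantrip" if level == 0 else f"Level {level}"
        table_dict[key] = table
    return table_dict


# toy stand-ins for three scraped tables (cantrips, 1st level, 2nd level)
toy_tables = [
    pd.DataFrame({"Spell Name": ["Toy Cantrip"]}),
    pd.DataFrame({"Spell Name": ["Toy First"]}),
    pd.DataFrame({"Spell Name": ["Toy Second"]}),
]

toy_dict = label_tables(toy_tables)
print(list(toy_dict.keys()))
print(toy_dict["Level 2"].head())
```

Any consistent key scheme works, as long as you can look up a given level's table by its key.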

# Part 2: Web Scraping EuroMillions Results

For this problem, we will begin to create a small data set scraped from [Euro-Millions](https://www.euro-millions.com/) which is a lottery that is played across nine European countries. Draws take place on Tuesday and Friday evenings with a minimum guaranteed jackpot of €17 million.

## Part 2.1: The Scraper Function (20 points)

Complete the function `get_lottery_html()` below (including docstring) which visits the lottery results for a specific date and grabs the html. Visit [the website](https://www.euro-millions.com/results/) to select a date or two and notice the pattern in the url so that you can pass any date to the function as a string. 

**Make sure to remove the `pass` statement when you are finished**. I have written the code you should run once the function is completed, getting the lottery results for the date passed in the call (`'06-05-2025'`).
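
If you start from a `datetime` object rather than a hand-typed string, `strftime` can produce the date string for you. The sketch below assumes the site uses a day-month-year (`dd-mm-yyyy`) pattern, which is what the example string `'06-05-2025'` suggests: 6 May 2025 was a Tuesday (a draw day), while 5 June 2025 was a Thursday. Confirm the pattern yourself on the results page. The helper name `format_draw_date` is hypothetical.

```python
from datetime import datetime


def format_draw_date(draw_date):
    """Format a datetime as 'dd-mm-yyyy' (assumed URL date pattern)."""
    return draw_date.strftime("%d-%m-%Y")


draw = datetime(2025, 5, 6)
date_string = format_draw_date(draw)
print(date_string)      # 06-05-2025

# weekday() counts Monday as 0, so 1 means Tuesday
print(draw.weekday())   # 1
```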

In [2]:
def get_lottery_html(code):
    
    pass


In [None]:
# when you are done the following code should be run
url_text = get_lottery_html('06-05-2025')
# url_text # uncomment to see if it worked

## Part 2.2: The Soup Function (20 points)

Complete the function `get_country_soup()` below (including docstring) which takes the HTML from the previous function and outputs one of nine `BeautifulSoup` objects, depending on the country you are interested in as defined by the `'id'` attribute:

- `id='PrizeAT'` (Austria)
- `id='PrizeBE'` (Belgium)
- `id='PrizeFR'` (France)
- `id='PrizeIE'` (Ireland)
- `id='PrizeLU'` (Luxembourg)
- `id='PrizePT'` (Portugal)
- `id='PrizeES'` (Spain)
- `id='PrizeCH'` (Switzerland)
- `id='PrizeGB'` (UK)
    
The function should take two arguments: the html object from `get_lottery_html()` and a string that specifies the `id` you are interested in (by default, Belgium or `BE`).
    
**Make sure to remove the `pass` statement when you are finished.** Then, also make sure to run the code to ensure your function works.
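
Selecting an element by its `id` can be tried out on a toy HTML string before touching the real page. This is a minimal sketch: the `toy_html` markup is invented for illustration (the real page's structure will differ), and only the `Prize` + country-code naming is taken from the list above.

```python
from bs4 import BeautifulSoup

# invented markup: two elements whose ids follow the 'Prize' + code pattern
toy_html = """
<div id="PrizeBE"><table><tr><td>Belgium row</td></tr></table></div>
<div id="PrizeFR"><table><tr><td>France row</td></tr></table></div>
"""

# 'html.parser' is the parser bundled with Python, so no extra install is needed
toy_soup = BeautifulSoup(toy_html, "html.parser")

country = "FR"
# find() returns the first element with a matching id, or None if there is none
section = toy_soup.find(id=f"Prize{country}")
print(section.td.text)

missing = toy_soup.find(id="PrizeXX")
print(missing)
```

Because `find()` returns `None` when nothing matches, an unexpected country code shows up as `None` rather than as an error at this step.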

In [4]:
def get_country_soup(html, country):
    
    pass


In [5]:
# when you are done the following code should be run (feel free to change the country if you wish)
country_choice = 'FR'
my_country_soup = get_country_soup(url_text, country_choice)
# my_country_soup #uncomment to see if it worked