# Module 2 Assessment: Solutions for 071519

### by Rob + Yish 😬

Welcome to your Mod 2 Assessment. You will be tested on your understanding of concepts and your ability to solve problems that have been covered in class and in the curriculum.

Use any libraries you want to solve the problems in the assessment.

You will have up to two hours to complete this assessment.

The sections of the assessment are:

- Accessing Data Through APIs
- Object Oriented Programming
- SQL and Relational Databases
- HTML, CSS and Web Scraping
- Other Database Structures (MongoDB)

In this assessment you will be exploring two datasets: Pokemon and Quotes.

In [None]:
# import the necessary libraries

import requests
import json
import pandas as pd
import sqlite3
from bs4 import BeautifulSoup
import pymongo  # used in Part 5 (MongoDB)

## Part 1: Accessing Data Through APIs

In this section we'll be using PokeAPI to get data on Pokemon. Let's first define functions to get information from the API. Provided below is a URL that will get you started with the first 151 Pokemon! Run the cell below to see what we get.

In [None]:
url = 'https://pokeapi.co/api/v2/pokemon/?limit=151'
results = requests.get(url).json()['results']
results

[Read the documentation here](https://pokeapi.co/) for information on navigating this API and use the API to obtain data to answer the following questions.

### Accessing Data

1. For any **one** Pokemon, retrieve the following information in a dictionary format with the following keys:
    - ID
    - Name
    - Base experience
    - Weight
    - Height
    - Types
    - Abilities

For `Types` and `Abilities`, you might want to write helper functions to have each attribute just be a list of types and a list of abilities. Your output should look like this:

```
{'id': 1, 
'name': 'bulbasaur', 
'base_experience': 64, 
'weight': 69, 
'height': 7, 
'types': ['poison', 'grass'], 
'abilities': ['chlorophyll', 'overgrow']}

```
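
As a rough illustration of what such a helper has to unpack, the sketch below uses a hand-written sample shaped like the `types` field of a PokeAPI response. The sample is trimmed and typed by hand, not fetched, and the names `sample_types` and `type_names` are made up for this example.

```python
# Hand-written sample shaped like the `types` field of a PokeAPI response (trimmed).
sample_types = [
    {'slot': 2, 'type': {'name': 'poison'}},
    {'slot': 1, 'type': {'name': 'grass'}},
]

# Each entry nests the name under entry['type']['name'].
type_names = [entry['type']['name'] for entry in sample_types]
print(type_names)  # ['poison', 'grass']
```

The `abilities` field follows the same pattern, with the name nested under `['ability']['name']`.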
    


In [None]:
""" SOLUTION: data for one Pokemon """

# helper functions for types and abilities

def typelist(types):
    result = []
    
    # iterating through the nested dict and appending to the empty list:
    
    for i in range(len(types)):
        result.append(types[i]['type']['name'])
    return result

def abilitylist(abilities):
    result = []
    
    # iterating through the nested dict and appending to the empty list:
    
    for i in range(len(abilities)):
        result.append(abilities[i]['ability']['name'])
    return result

In [None]:
def get_pokedata(url):
    
    # getting full results for one pokemon
    info = requests.get(url).json() 
    
    # list of keys with values that don't need editing
    keys = ['id', 'name', 'base_experience', 'weight', 'height'] 
    data = {k: info[k] for k in keys} # dictionary comprehension
    
    # using the two helper functions to add types and abilities
    data['types'] = typelist(info['types'])
    data['abilities'] = abilitylist(info['abilities'])
    
    return data

### Pagination

2. Get the same information for the first **151** Pokemon as a list of dictionaries ordered by Pokemon ID. Print the first and last elements of the list. (Hint: Use pagination) Your output should save the list to a variable and look like this:

```
[{'id': 1, 
'name': 'bulbasaur', 
'base_experience': 64, 
'weight': 69, 
'height': 7, 
'types': ['poison', 'grass'], 
'abilities': ['chlorophyll', 'overgrow']}, 
{'id': 2, 
'name': 'ivysaur', 
'base_experience': 142, 
'weight': 130, 
'height': 10, 
'types': ['poison', 'grass'], 
'abilities': ['chlorophyll', 'overgrow']}, ... ]

```



In [None]:
""" SOLUTION: data for 151 Pokemon """

# list comprehension to get a list of just URLs
urls = [r['url'] for r in results]

# list comprehension with the previous function to get full data
pokedata = [get_pokedata(url) for url in urls]


In [None]:
# printing first and last elements

print(pokedata[0], pokedata[-1])
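
The solution above gets all 151 entries in one request through the `limit` parameter. A possible alternative, sketched here under assumptions, is to follow the `next` field that PokeAPI list responses carry alongside `results`. The function `collect_results`, the `fetch` parameter and the `fake_pages` stand-in data are hypothetical names for this sketch; the fake pages let it run without a network connection.

```python
def collect_results(start_url, max_items, fetch):
    """Follow `next` links until max_items entries are collected or pages run out."""
    collected = []
    url = start_url
    while url and len(collected) < max_items:
        page = fetch(url)                  # expected keys: 'results' and 'next'
        collected.extend(page['results'])
        url = page['next']                 # None on the last page
    return collected[:max_items]

# Hand-written stand-in pages (hypothetical URLs), so the sketch runs offline.
fake_pages = {
    'page-1': {'results': [{'name': 'bulbasaur'}, {'name': 'ivysaur'}], 'next': 'page-2'},
    'page-2': {'results': [{'name': 'venusaur'}], 'next': None},
}
names = [r['name'] for r in collect_results('page-1', 3, fake_pages.get)]
print(names)  # ['bulbasaur', 'ivysaur', 'venusaur']
```

In the notebook, `fetch` could be something like `lambda u: requests.get(u).json()`, with `start_url` set to the list endpoint.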

## Part 2: Object Oriented Programming

We're going to use the data gathered in the previous section on APIs for this section on Object Oriented Programming to instantiate Pokemon objects and write instance methods.

### Creating a Class

1. Create a class called `Pokemon` with an `__init__` method to instantiate the following attributes:
    - ID
    - Name
    - Base experience
    - Weight
    - Height
    - Types
    - Abilities
    
    
### Instantiating Objects

2. Using the data you obtained from the API, instantiate the first, fourth and seventh Pokemon. Assign them to the variables `bulbasaur`, `charmander` and `squirtle`.


In [None]:
# if you were unable to get the data from the API in the right format,
# uncomment the code below to access a JSON file with the list of dictionaries

# with open('data/pokemon.json') as f:  
#     pokedata = json.load(f)

In [None]:
class Pokemon:
    
    def __init__(self, ID, name, exp, weight, height, types, abilities):
        self.ID = ID
        self.name = name
        self.exp = exp
        self.weight = weight
        self.height = height
        self.types = types
        self.abilities = abilities
        
    def bmi(self):
        return (self.weight*0.1)/(self.height*0.1)**2
        


In [None]:
""" SOLUTION: instantiating three Pokemon """

# function to instantiate a Pokemon
def instantiate_pokemon(info):
    pokemon = Pokemon(info['id'], 
                      info['name'], 
                      info['base_experience'], 
                      info['weight'], 
                      info['height'], 
                      info['types'],
                      info['abilities'])
    return pokemon

"""
can also be done manually:

bulbasaur = Pokemon(1, 'bulbasaur', 64, 69, 7, ['poison', 'grass'], ['chlorophyll', 'overgrow'])

etc.

"""

bulbasaur = instantiate_pokemon(pokedata[0])
charmander = instantiate_pokemon(pokedata[3])
squirtle = instantiate_pokemon(pokedata[6])

In [None]:
# run this cell to test and check your code
# you may need to edit the attribute variable names if you named them differently!

def print_pokeinfo(pokemon_object):
    o = pokemon_object
    print('ID: ' + str(o.ID) + '\n' +
          'Name: ' + o.name.title() + '\n' +
          'Base experience: ' + str(o.exp) + '\n' +
          'Weight: ' + str(o.weight) + '\n' +
          'Height: ' + str(o.height) + '\n' +
          'Types: ' + str(o.types) + '\n' +
          'Abilities: ' + str(o.abilities) + '\n'
         )
    
print_pokeinfo(bulbasaur)
print_pokeinfo(charmander)
print_pokeinfo(squirtle)

### Instance Methods

3. Write an instance method within the class `Pokemon` to find the BMI of a Pokemon. BMI is defined by $\frac{weight}{height^{2}}$ with weight in **kilograms** and height in **meters**. The height and weight data of Pokemon from the API is in **decimeters** and **hectograms** respectively.


    1 decimeter = 0.1 meters
    1 hectogram = 0.1 kilograms
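
As a worked example of the unit conversion, the sketch below computes the BMI for Bulbasaur's values from Part 1 (weight 69 hectograms, height 7 decimeters). The standalone function name `bmi_from_api_units` is made up for this illustration; the assessment itself asks for an instance method.

```python
def bmi_from_api_units(weight_hg, height_dm):
    """BMI in kg/m^2 from a weight in hectograms and a height in decimeters."""
    weight_kg = weight_hg * 0.1   # 1 hectogram = 0.1 kilograms
    height_m = height_dm * 0.1    # 1 decimeter = 0.1 meters
    return weight_kg / height_m ** 2

# Bulbasaur: 6.9 kg / (0.7 m)^2
print(round(bmi_from_api_units(69, 7), 2))  # 14.08
```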

In [None]:
# run this cell to test and check your code
# you probably have to rerun the code to instantiate your objects

print(bulbasaur.bmi()) # 14.08
print(charmander.bmi()) # 23.61
print(squirtle.bmi()) # 36

## Part 3: SQL and Relational Databases

For this section, we've put the Pokemon data into SQL tables. You won't need to use your list of dictionaries or the JSON file for this section. The schema of `pokemon.db` is as follows:

<img src="data/pokemondb.png" alt="db schema" style="width:500px;"/>

Assign your SQL queries as strings to the variables `q1`, `q2`, etc. and run the cells at the end of this section to print your results as Pandas DataFrames.

- q1: query all columns from `Pokemon` for the Pokemon that have a base_experience above 200
- q2: query the id, name, type1 and type2 of Pokemon that have **water** types as either their first or second type
- q3: query the average weight of Pokemon by their first type in descending order
- q4: query the Pokemon name, Pokemon type2, and what **type2** has "2xdamage" to
- q5: query the top 5 most common type1s, the minimum height, maximum height, minimum weight and maximum weight of pokemon with those type1s, and what associated type they do "0.5xdamage" to


**Important note on syntax**: use `double quotes ""` when quoting strings **within** your query and wrap the entire query in `single quotes ''`. For the column titles that begin with numbers, you need to wrap the column names in double quotes.
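
To see the column-quoting rule in isolation, here is a minimal sketch against a throwaway in-memory table. The table contents and the name `demo_cnx` are made up for this example and are not taken from `pokemon.db`. It also passes the string value through a `?` placeholder, which is an alternative to quoting the string inside the query.

```python
import sqlite3

# Hypothetical in-memory stand-in for a `types` table; not the real pokemon.db.
demo_cnx = sqlite3.connect(':memory:')
demo_cnx.execute('CREATE TABLE types (name TEXT, "2xdamage" TEXT)')
demo_cnx.executemany(
    'INSERT INTO types VALUES (?, ?)',
    [('water', 'fire'), ('fire', 'grass')],
)

# Double quotes mark "2xdamage" as a column name; the ? placeholder supplies the string.
rows = demo_cnx.execute(
    'SELECT name, "2xdamage" FROM types WHERE name = ?', ('water',)
).fetchall()
print(rows)  # [('water', 'fire')]
```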

In [None]:
cnx = sqlite3.connect('data/pokemon.db')

In [None]:
q1 = 'select * from pokemon where base_experience > 200'
pd.read_sql(q1, cnx)

In [None]:
q2 = 'select id, name, type1, type2 from pokemon where type1 == "water" or type2 == "water"'
pd.read_sql(q2, cnx)

In [None]:
q3 = 'select type1, avg(weight) from pokemon group by type1 order by avg(weight) desc'
pd.read_sql(q3, cnx)

In [None]:
q4 = 'select pokemon.name, type2, "2xdamage" from pokemon join types on pokemon.type2 = types.name'
pd.read_sql(q4, cnx)

In [None]:
q5 = 'select type1, count(*), min(height), max(height), min(weight), max(weight), "0.5xdamage" from pokemon \
        join types on pokemon.type1 = types.name \
        group by type1 \
        order by count(*) desc limit 5'
pd.read_sql(q5, cnx)

## Part 4: Web Scraping

### Accessing Data Using BeautifulSoup

Use BeautifulSoup to get quotes, authors, and tags from [Quotes to Scrape](http://quotes.toscrape.com/).

First, go to the site and inspect the page. Look at what links there are and how the entire site is structured.

1. Get the first author and the path for the author's page as a tuple from the [homepage](http://quotes.toscrape.com/).

In [None]:
# Make a get request to retrieve the page
html_page = requests.get('http://quotes.toscrape.com/') 
# Pass the page contents to beautiful soup for parsing
soup = BeautifulSoup(html_page.content, 'html.parser')

# Your code here


In [None]:
""" SOLUTION: data for one author """

author = soup.find('small')
# the link to the author's page is the first sibling tag after the <small> element
(author.text, author.find_next_siblings()[0].get('href'))

2. Write a function to get **all** the authors and href links for the authors from the [homepage](http://quotes.toscrape.com/).


In [None]:
def authors(url):
    '''
    input: url
    
    return: a dictionary of authors and their urls
            {'author_1':'url_of_author_1', 'author_2':'url_of_author_2' ...}
    '''
    pass

In [None]:
""" SOLUTION: data for all the authors on a page """

def authors(url):
    # Make a get request to retrieve the page
    html_page = requests.get(url) 
    # Pass the page contents to beautiful soup for parsing
    soup = BeautifulSoup(html_page.content, 'html.parser')
    authors = soup.find_all('small')
    author_dictionary = {}
    for author in authors:
        author_dictionary[author.text] = author.find_next_siblings()[0].get('href')
    return author_dictionary

In [None]:
# run this cell to test the function
print(authors('http://quotes.toscrape.com/'))
print('\n')
print(authors('http://quotes.toscrape.com/page/3'))

### Pagination

3. Get the first author on each of the first 5 pages of quotes. You can get to the next page with the next button at the bottom of the homepage.
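
One way to follow the next button, instead of building the page URLs by hand, is sketched below. It assumes the button is rendered as `<li class="next"><a href="...">`, which should be checked by inspecting the page; the function name `next_page_url` and the `sample_html` snippet are made up for this example.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def next_page_url(html, current_url):
    """Return the absolute URL behind the 'next' button, or None on the last page."""
    soup = BeautifulSoup(html, 'html.parser')
    # Assumed markup: <li class="next"><a href="/page/2/">Next</a></li>
    link = soup.select_one('li.next a')
    if link is None:
        return None
    return urljoin(current_url, link.get('href'))

# Hand-written snippet standing in for the pager at the bottom of a page.
sample_html = '<ul class="pager"><li class="next"><a href="/page/2/">Next</a></li></ul>'
print(next_page_url(sample_html, 'http://quotes.toscrape.com/'))
# http://quotes.toscrape.com/page/2/
```

Calling this in a loop on each downloaded page, and stopping when it returns `None`, walks the site without hard-coding page numbers.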


In [None]:
# Your code here


In [None]:
""" SOLUTION: first author on each of the first 5 pages """

for i in range(1,6):
    html_page = requests.get(f'http://quotes.toscrape.com/page/{i}/')
    soup = BeautifulSoup(html_page.content, 'html.parser')
    author = soup.find('small')
    print(author.text)

4. Write a function to get all of the quotes from a page.

In [None]:
def get_some_quotes(url):
    '''
    input: url of the page to scrape
    
    return: a list of dictionaries of quotes with their attributes
            [{'quote':'quote_1_text', 'author':'name_of_author_1'}, 
            {'quote':'quote_2_text', 'author':'name_of_author_2', 'quote_tags':[list_of_quote_2_tags]}, ...]
    '''
    pass

In [None]:
""" SOLUTION: get_some_quotes """

def get_some_quotes(url):
    # Make a get request to retrieve the page
    html_page = requests.get(url) 
    # Pass the page contents to beautiful soup for parsing
    soup = BeautifulSoup(html_page.content, 'html.parser')
        
    list_quotes = []
    for quote_div in soup.find_all(class_="quote"):
        # build one dictionary per quote block, then add it to the list
        quote = {
            'quote': quote_div.find(class_="text").text,
            'author': quote_div.find(class_="author").text,
        }
        list_quotes.append(quote)
    return list_quotes

In [None]:
# set the function to a variable to use later
quotes_for_mongo = get_some_quotes('http://quotes.toscrape.com/')
quotes_for_mongo

## Part 5: MongoDB

To complete this section, open a connection to a Mongo database from the terminal using `mongod`. You will **create**, **update**, and **read** from a Mongo database.

Create and connect to a mongo database.

In [None]:
myclient = pymongo.MongoClient("mongodb://127.0.0.1:27017/")
mydb = myclient['quote_database']

In [None]:
mycollection = mydb['quote_collection']

1. Add the quotes from `get_some_quotes` for the [homepage](http://quotes.toscrape.com/) or use the JSON file `quotes.json` for this section. To verify the insert, get the resulting `_id` values back from the `results` variable.

In [None]:
# if not using  the get_some_quotes function read in the JSON file and set it to variable data

with open(r"data/quotes.json", "r") as r:
    data = json.load(r)

In [None]:
# results is the variable that holds the result of the insert
results = None

In [None]:
""" SOLUTION:  for adding data in the database"""

### add the data from the JSON file
results = mycollection.insert_many(data)

### add the data from the get_some_quotes function
# results = mycollection.insert_many(quotes_for_mongo)

# check they are in the database
results.inserted_ids

2. Query the database for all the quotes by `'Albert Einstein'`.

In [None]:
q1 = None

In [None]:
""" SOLUTION: data for Albert Einstein quotes """

q1 = mycollection.find({'author':'Albert Einstein'})
for x in q1:
    print(x)

3. Update Steve Martin's quote with the tags for the quote stored in the variable `steve_martin_tags`.

In [None]:
steve_martin_tags = {'quote_tags': ['change', 'deep-thoughts', 'thinking', 'world']}
update_steve = None
first_quote_tags = None


In [None]:
""" SOLUTION: data for Steve Martin tags """

update_steve = {'author': 'Steve Martin'}
steve_quote_tags = {'$set': steve_martin_tags}

mycollection.update_one(update_steve, steve_quote_tags)

4. Query the database to confirm that  `'Steve Martin'` is updated with `steve_martin_tags`.

In [None]:
q2 = None

In [None]:
""" SOLUTION: data for Steve Martin tags query """

q2 = mycollection.find({'author': 'Steve Martin'})
for item in q2:
    print(item)