# Quiz 2
DS 3000
Spring 2021

After completing the quiz, follow the steps below to submit:
1. "Kernel" -> "Restart & Run All"
1. save your quiz file so that it reflects this latest run
1. upload the `.ipynb` to gradescope **before** clicking submit
1. ensure that you can see your jupyter notebook in the gradescope interface after clicking "submit"

We include the last step because gradescope will allow you to "submit" without uploading a file.  It is your responsibility to ensure that you've actually submitted a file.

### Instructions
Build a pipeline of functions which gets data from the "Historic values for a single state" API https://covidtracking.com/data/api and returns a dataframe of all the data received.
- each row of the resulting dataframe should correspond to a unique day of data
- clean the `date` column so that each of its values is of type `date`
    - remember you can get a `date` object by calling the `.date()` method of a `datetime` object
- each of the other columns should be left unmodified
- your software should allow a user to easily swap between states by passing a different state abbreviation 
    - (e.g. `ma` or `nj`)
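
As a minimal sketch of the `date` hint above: assuming the API reports each day as an integer of the form `YYYYMMDD` (e.g. `20210223`), the value can be parsed with `datetime.strptime` and then narrowed with `.date()`. The helper name `to_date` is hypothetical and only used for illustration.

```python
from datetime import date, datetime


def to_date(raw_date):
    """Convert an integer such as 20210223 into a datetime.date (hypothetical helper)."""
    # strptime needs a string, so convert the integer first
    parsed = datetime.strptime(str(raw_date), "%Y%m%d")
    # .date() drops the time-of-day part of the datetime
    return parsed.date()


example = to_date(20210223)
print(example)        # 2021-02-23
print(type(example))  # <class 'datetime.date'>
```

Applying a helper like this to every record is one way to satisfy the `date` requirement while leaving the other columns untouched.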
    
In the final cell of your submission, call your functions and output the portion of the dataframe shown in the hint below:

```python
state = 'ma'
some_variable1 = some_fnc0(state)
some_variable2 = some_fnc1(some_variable1)
df_covid = some_fnc2(some_variable2)

df_covid.iloc[:5, :10]
```

Note: the code above is for illustrative purposes only.  You may have a different number of functions or input / output variables in your submission.

### Don't forget: 
In addition to accomplishing the task above, your software will be evaluated for:
- documentation
    - comments
    - variable names
    - "chunking" the code into consistent pieces
    - docstrings
- program design
    - which steps of the ETL / (get-clean-do) pipeline are we performing?
    - should one function perform multiple steps of the ETL / (get-clean-do) pipeline?
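
One possible way to approach the design questions above is to give each pipeline step its own function, so that each can be documented and tested alone. The sketch below is only an illustration: the names `build_url`, `clean_records`, `run_pipeline`, and `fake_fetch` are hypothetical, the `fetch` argument stands in for the real network request, and the sample record is made up for the example rather than taken from the API.

```python
from datetime import date, datetime


def build_url(state):
    """Get step (part 1): build the daily-history URL for a state abbreviation."""
    return f"https://api.covidtracking.com/v1/states/{state}/daily.json"


def clean_records(records):
    """Clean step: return new records whose 'date' value is a datetime.date."""
    cleaned = []
    for record in records:
        new_record = dict(record)  # copy so the caller's data is left untouched
        new_record["date"] = datetime.strptime(str(record["date"]), "%Y%m%d").date()
        cleaned.append(new_record)
    return cleaned


def run_pipeline(state, fetch):
    """Chain the steps; `fetch` is any function mapping a URL to a list of dicts."""
    return clean_records(fetch(build_url(state)))


def fake_fetch(url):
    """Stand-in for the network call, returning made-up sample data."""
    return [{"date": 20210223, "state": "MA"}]


result = run_pipeline("ma", fake_fetch)
print(result[0]["date"])  # 2021-02-23
```

Because the get step is passed in as an argument, the clean step can be exercised without network access; in a real run, `fetch` could be a small function wrapping `requests.get`.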
    
### Hint:
Grabbing the first 5 rows and 10 columns of the output dataframe should yield something similar* to:

```python
df_covid.iloc[:5, :10]
```

|   | checkTimeEt | commercialScore | dataQualityGrade |       date |          dateChecked |         dateModified | death | deathConfirmed | deathIncrease | deathProbable |
|--:|------------:|----------------:|-----------------:|-----------:|---------------------:|---------------------:|------:|---------------:|--------------:|---------------|
| 0 | 02/22 18:59 |             0.0 |             None | 2021-02-23 | 2021-02-22T23:59:00Z | 2021-02-22T23:59:00Z | 15883 |          15564 |          30.0 | 319           |
| 1 | 02/21 18:59 |             0.0 |             None | 2021-02-22 | 2021-02-21T23:59:00Z | 2021-02-21T23:59:00Z | 15853 |          15534 |          27.0 | 319           |
| 2 | 02/20 18:59 |             0.0 |             None | 2021-02-21 | 2021-02-20T23:59:00Z | 2021-02-20T23:59:00Z | 15826 |          15508 |          47.0 | 318           |
| 3 | 02/19 18:59 |             0.0 |             None | 2021-02-20 | 2021-02-19T23:59:00Z | 2021-02-19T23:59:00Z | 15779 |          15462 |          53.0 | 317           |
| 4 | 02/18 18:59 |             0.0 |             None | 2021-02-19 | 2021-02-18T23:59:00Z | 2021-02-18T23:59:00Z | 15726 |          15409 |          40.0 | 317           |

*of course, it won't be identical as your software will be run at a later date with more up-to-date information.

In [1]:
from datetime import datetime
import json

import pandas as pd
import requests


In [2]:
# request the daily info from the api for a given state abbreviation
def requestInfo(state):
    """
    Request the daily historic data from the api for the corresponding state.

    Args:
    state (str): abbreviation of the state name (e.g. 'ma' or 'nj')

    Returns:
    cov_dic (list of dict): the covid info requested from the api, one dictionary per day

    """

    # url with the corresponding state abbreviation inserted
    url = f"https://api.covidtracking.com/v1/states/{state}/daily.json"

    # get the response body as a string
    url_text = requests.get(url).text

    # convert the json string to a list of dicts
    cov_dic = json.loads(url_text)

    return cov_dic


# helper function:
def getdate(dic):
    """
    Return the date of a single day's dictionary as a date object.

    Args:
    dic (dict): a single item of the list requested from the api

    Returns:
    date (date): the integer date (e.g. 20210223) converted to a date object

    """

    # convert the int to a string, then parse it with the year, month, day format
    datestring = str(dic["date"])
    date = datetime.strptime(datestring, '%Y%m%d').date()
    return date
    
    
# clean the data, only replace the "date" item
def cleanDic(some_variable0):
    """
    Replace the date item of each dictionary with the date object built by the helper function.

    Args:
    some_variable0 (list of dict): the list of daily dictionaries requested from the api

    Returns:
    some_variable0 (list of dict): the same list, mutated in place so that each 'date' item is a date object

    """

    # replace the date item of every daily dictionary
    length = len(some_variable0)

    for i in range(length):
        some_variable0[i]['date'] = getdate(some_variable0[i])

    return some_variable0


# create the cleaned dataframe
def getTable(some_variable1):
    """
    Put the cleaned data into a pandas dataframe.

    Args:
    some_variable1 (list of dict): the cleaned data, where each date was replaced with a date object

    Returns:
    df (dataframe): a dataframe built from the cleaned data, with columns sorted alphabetically
    """

    # initialize the dataframe and sort its columns alphabetically
    df = pd.DataFrame(some_variable1).sort_index(axis = 1)

    return df

In [3]:
# call the functions
state = 'ma'
some_variable0 = requestInfo(state)
some_variable2 = cleanDic(some_variable0)
df_covid = getTable(some_variable2)

df_covid.iloc[:5, :10]

|   | checkTimeEt | commercialScore | dataQualityGrade |       date |          dateChecked |         dateModified |   death | deathConfirmed | deathIncrease | deathProbable |
|--:|------------:|----------------:|-----------------:|-----------:|---------------------:|---------------------:|--------:|---------------:|--------------:|--------------:|
| 0 | 02/28 18:59 |               0 |                  | 2021-03-01 | 2021-02-28T23:59:00Z | 2021-02-28T23:59:00Z | 16144.0 |        15822.0 |            26 |         322.0 |
| 1 | 02/27 18:59 |               0 |                  | 2021-02-28 | 2021-02-27T23:59:00Z | 2021-02-27T23:59:00Z | 16118.0 |        15796.0 |            51 |         322.0 |
| 2 | 02/26 18:59 |               0 |                  | 2021-02-27 | 2021-02-26T23:59:00Z | 2021-02-26T23:59:00Z | 16067.0 |        15744.0 |            43 |         323.0 |
| 3 | 02/25 18:59 |               0 |                  | 2021-02-26 | 2021-02-25T23:59:00Z | 2021-02-25T23:59:00Z | 16024.0 |        15703.0 |            46 |         321.0 |
| 4 | 02/24 18:59 |               0 |                  | 2021-02-25 | 2021-02-24T23:59:00Z | 2021-02-24T23:59:00Z | 15978.0 |        15657.0 |            33 |         321.0 |
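
The requirement that each row correspond to a unique day can be checked after the fact. This is a minimal sketch, assuming the cleaned data is available as a list of dictionaries with a `date` key; `has_unique_days` is a hypothetical helper and the sample values are made up for the example. For the dataframe itself, `df_covid['date'].is_unique` should give the equivalent answer.

```python
from collections import Counter
from datetime import date


def has_unique_days(records):
    """Return True if no date appears in more than one record."""
    counts = Counter(record["date"] for record in records)
    return all(count == 1 for count in counts.values())


sample = [{"date": date(2021, 3, 1)}, {"date": date(2021, 2, 28)}]
print(has_unique_days(sample))               # True
print(has_unique_days(sample + sample[:1]))  # False
```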
