# Final Project Phase 3 Summary
This Jupyter Notebook (.ipynb) will serve as the skeleton file for your submission for Phase 3 of the Final Project. Complete all sections below as specified in the instructions for the project, covering all necessary details. We will use this to grade your individual code (Do this whether you are in a group or not). Good luck! <br><br>

Note: To edit a Markdown cell, double-click on its text.

## Jupyter Notebook Quick Tips
Here are some quick formatting tips to get you started with Jupyter Notebooks. This is by no means exhaustive, and there are plenty of articles to highlight other things that can be done. We recommend using HTML syntax for Markdown but there is also Markdown syntax that is more streamlined and might be preferable. 
<a href = "https://towardsdatascience.com/markdown-cells-jupyter-notebook-d3bea8416671">Here's an article</a> that goes into more detail. (Double-click on cell to see syntax)

# Heading 1
## Heading 2
### Heading 3
#### Heading 4
<br>
<b>BoldText</b> or <i>ItalicText</i>
<br> <br>
Math Formulas: $x^2 + y^2 = 1$
<br> <br>
Line Breaks are done using br enclosed in < >.
<br><br>
Hyperlinks are done with: <a> https://www.google.com </a> or 
<a href="http://www.google.com">Google</a><br>

# Data Collection and Cleaning


Transfer/update the data collection and cleaning you created for Phase II below. You may include additional cleaning functions if you have extra datasets. If no changes are necessary, simply copy and paste your phase II parsing/cleaning functions.


## Downloaded Dataset Requirement



In [3]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from pprint import pprint
import requests
import json

In [4]:
def global_power():
  # IMPORT FILES
  generation_path = "data\global-power-plants\global_power_plant_database.csv"
  emission_path = "data\global-power-plants\global_power_emissions_database.xlsx"
  
  # power generation csv
  with open(generation_path, encoding='utf8') as fin:
    ppg = pd.read_csv(fin, low_memory=False)
  
  # power plant emissions xlsx
  ppe = pd.read_excel(emission_path, sheet_name='GPED_v1.0_Plant Level', skiprows=0, header=1)

  # removing unwanted data
  unwanted_columns = ['latitude',
                      'longitude',
                      'other_fuel1',
                      'other_fuel2',
                      'other_fuel3',
                      'commissioning_year',
                      'gppd_idnr',
                      'owner',
                      'source',
                      'url',
                      'geolocation_source',
                      'wepp_id',
                      'year_of_capacity_data',
                      'generation_data_source',
                      'generation_gwh_2018',
                      'generation_gwh_2019',
                      'estimated_generation_note_2013',
                      'estimated_generation_note_2014',
                      'estimated_generation_note_2015',
                      'estimated_generation_note_2016',
                      'estimated_generation_note_2017']
  ppg.drop(unwanted_columns, axis=1, inplace=True)

  unwanted_columns = ['No.', 'Number of Units', 'Total Plant Installed Capacity (MW)']
  ppe.drop(unwanted_columns, axis=1, inplace=True)

  # AGGREGATING DATA
  avgs = ['generation_gwh_2013',
          'generation_gwh_2014',
          'generation_gwh_2015',
          'generation_gwh_2016',
          'generation_gwh_2017']
  ppg['AVG_GENERATION'] = ppg[avgs].mean(axis=1)

  avgs = ['estimated_generation_gwh_2013',
          'estimated_generation_gwh_2014',
          'estimated_generation_gwh_2015',
          'estimated_generation_gwh_2016',
          'estimated_generation_gwh_2017']
  ppg['AVG_EST_GENERATION'] = ppg[avgs].mean(axis=1)

  # merge the two average columns into a single column
  ppg['GENERATION_MW'] = ppg.apply(lambda x : np.fmax(x['AVG_GENERATION'], x['AVG_EST_GENERATION']), axis=1)

  # remove unaggregated columns
  unwanted_columns = ['AVG_GENERATION',
                      'AVG_EST_GENERATION',
                      'generation_gwh_2013',
                      'generation_gwh_2014',
                      'generation_gwh_2015',
                      'generation_gwh_2016',
                      'generation_gwh_2017',
                      'estimated_generation_gwh_2013',
                      'estimated_generation_gwh_2014',
                      'estimated_generation_gwh_2015',
                      'estimated_generation_gwh_2016',
                      'estimated_generation_gwh_2017']
  ppg.drop(unwanted_columns, axis=1, inplace=True)

  # merge rows by fuel type
  generation_dist = ppg.groupby('primary_fuel')['GENERATION_MW'].sum().sort_values(ascending=False)
  emission_dist = ppe.groupby('Fuel Types').aggregate({'CO2 Emissions (Mg)':'sum',
                                                       'SO2 Emissions (Mg)':'sum',
                                                       'NOx Emissions (Mg)':'sum',
                                                       'PM2.5 Emissions (Mg)':'sum'})

  # COMBINING TABLES
  generation_dist.index = generation_dist.index.str.upper()
  generation_dist.drop('WAVE AND TIDAL', axis=0, inplace=True)
  gen_other = ['PETCOKE','WASTE','COGENERATION','STORAGE','NUCLEAR']
  generation_dist['OTHER'] = generation_dist[gen_other].sum(axis=0)
  generation_dist.drop(gen_other, axis=0, inplace=True)
  emission_dist = emission_dist.rename(index={'NG':'GAS'})

  power = pd.concat([generation_dist, emission_dist], axis=1)
  power = power.fillna(0)

  return power

############ Function Call ############
global_power()






FileNotFoundError: [Errno 2] No such file or directory: 'data\\global-power-plants\\global_power_plant_database.csv'

## Web Collection Requirement \#1


In [5]:
def web_scrape(): 

    #creates a dict that connects to a list of all the HashRates of GPUs for easy iteration when creating visuals
    hashlist = []
    hashdict = {}
    url = requests.get('https://whattomine.com/gpus')
    soup = BeautifulSoup(url.text, 'html.parser')
    for hasher in soup.find_all('div',{'class' :'position-relative'}):
        for h in hasher.stripped_strings:
            hashlist.append(h)
            hashlist2=  hashlist[::2]
            hashdict['Hashrate(Millions of Hash Per Sec)'] = [z for z in hashlist2] 
            
    wattlist = []
    wattdict = {}
    for wat in soup.find_all('small',{'class':'text-muted position-absolute'}):
        for w in wat.stripped_strings:
            wattlist.append(w[-4:-1])
    #return wattlist
            wattdict["Watt"] = [z for z in wattlist]
    
    #creates a dict that connects to a list of all the Revenues of GPUs for easy iteration when creating visuals
    revlist = []
    revdict = {}
    for rev in soup.find_all('td',{'class':'text-right table-success font-weight-bold'}):
        for r in rev.stripped_strings:
            revlist.append(r)
            revdict["24Hour Revenue"] = [z for z in revlist]
    
    #creates a dict that connects to a list of all the Names of GPUs
    namelist = []
    namedict = {}
    for name in soup.find_all('td'):
        for n in name.stripped_strings:
            namelist.append(n)
            if '(*)' in namelist:
                namelist.remove('(*)')
            namelist2 = namelist[1:650:16]
            namedict["GPU Model"] = [z for z in namelist2]

    # this dictionary matches GPU model to the hash rate and 24 hour revenue
    fulldict = {}
    for i in range(len(namelist2)):
        fulldict[namelist2[i]] = {"Hashrate(Millions of Hash Per Sec)":hashlist2[i],"24Hour Revenue": revlist[i]}
    
    #creates a data frame with the columns as GPU name and the index as the description 
    data = []
    data.append(hashlist2)
    data.append(revlist)
    data.append(wattlist)
    df = pd.DataFrame(data, index= ['Hashrate (Millions of Hashes Per Sec)','24 Hour Revenue','Watts'], columns = namelist2).T
    return df 

############ Function Call ############
web_scrape()


Unnamed: 0,Hashrate (Millions of Hashes Per Sec),24 Hour Revenue,Watts
GeForce RTX 3090,114.00 Mh/s,$6.87,320
Radeon VII,93.00 Mh/s,$5.75,200
GeForce RTX 3080,91.50 Mh/s,$5.58,230
GeForce RTX 3080 Ti,230.00 Mh/s,$4.31,280
Radeon RX 6800,64.00 Mh/s,$3.93,150
Radeon RX 6900 XT,64.00 Mh/s,$3.93,150
Radeon RX 6800 XT,64.00 Mh/s,$3.93,150
GeForce RTX 3060 Ti,58.10 Mh/s,$3.58,130
GeForce RTX 3070,58.10 Mh/s,$3.58,130
GeForce RTX 3070 Ti,35.00 Mh/s,$3.48,230


## Web Collection Requirement #2

In [6]:
#data with details on specific crypto - to utilize when visualizing 
def coin_info():
    url = requests.get('https://whattomine.com/coins.json')
    j = url.json()
    df = pd.DataFrame(j['coins'])
    coins = pd.Series(j['coins'].keys)
    df.rename(index=coins, inplace=True)
    return df.transpose()

############ Function Call ############
coin_info()

TypeError: unhashable type: 'dict'

#Inconsistency Revisions
 **If you were requested to revise your inconsistency section from Phase II, enter your responses here. Otherwise, ignore this section.**

For each inconsistency (NaN, null, duplicate values, empty strings, etc.) you discover in your datasets, write at least 2 sentences stating the significance, how you identified it, and how you handled it.

1. 

2. 

3. 

4. (if applicable)

5. (if applicable)


## Data Sources

Include sources (as links) to your datasets. If any of these are different from your sources used in Phase II, please <b>clearly</b> specify.

*   Downloaded Dataset Source:data\global-power-plants\global_power_plant_database.csv
*   Web Collection #1 Source:https://whattomine.com/gpus
*   Web Collection #2 Source:https://whattomine.com/coins.json



# Data Analysis
For the Data Analysis section, you are required to utilize your data to complete the following:

*   Create at least 5 insights
*   Generate at least 3 data visualizations
*   Export aggregated data to at least 1 summary file 

Create a function for each of the following sections mentioned above. Do not forget to fill out the explanation section for each function. 

Make sure your data analysis is not too simple. Performing complex aggregation and using modules not taught in class shows effort, which will increase the chance of receiving full credit. 

# Graphical User Interface (GUI) Implementation
If you decide to create a GUI for Phase II, please create a separate Python file (.py) to build your GUI. You must submit both the completed PhaseII.ipynb and your Python GUI file.

## Insights

In [7]:
def insight1():
  return web_scrape()





############ Function Call ############
insight1()

Unnamed: 0,Hashrate (Millions of Hashes Per Sec),24 Hour Revenue,Watts
GeForce RTX 3090,114.00 Mh/s,$6.87,320
Radeon VII,93.00 Mh/s,$5.75,200
GeForce RTX 3080,91.50 Mh/s,$5.58,230
GeForce RTX 3080 Ti,230.00 Mh/s,$4.31,280
Radeon RX 6800,64.00 Mh/s,$3.93,150
Radeon RX 6900 XT,64.00 Mh/s,$3.93,150
Radeon RX 6800 XT,64.00 Mh/s,$3.93,150
GeForce RTX 3060 Ti,58.10 Mh/s,$3.58,130
GeForce RTX 3070,58.10 Mh/s,$3.58,130
GeForce RTX 3070 Ti,35.00 Mh/s,$3.48,230


### Insight 1 Explanation

Insert explanation here

In [None]:
def insight2():
  pass





############ Function Call ############
insight2()

### Insight 2 Explanation

Insert explanation here

In [None]:
def insight3():
  pass





############ Function Call ############
insight3()

### Insight 3 Explanation

Insert explanation here

In [None]:
def insight4():
  pass





############ Function Call ############
insight4()

### Insight 4 Explanation

Insert explanation here

In [None]:
def insight5():
  pass





############ Function Call ############
insight5()

### Insight 5 Explanation

Insert explanation here

## Data Visualizations

In [None]:
def visual1():
  pass





############ Function Call ############
visual1()

### Visualization 1 Explanation

Insert explanation here

In [None]:
def visual2():
  pass





############ Function Call ############
visual2()

### Visualization 2 Explanation

Insert explanation here

In [None]:
def visual3():
  pass





############ Function Call ############
visual3()

### Visualization 3 Explanation

Insert explanation here

## Summary Files

In [None]:
def summary1():
  pass





############ Function Call ############
summary1()

# Cited Sources

If you used any additional sources to complete your Data Analysis section, list them here:


*   Example Module Documentation
*   Example Stack Overflow Assistance



# Video Presentation

If you uploaded your Video Presentation to Bluejeans, YouTube, or any other streaming services, please provide the link here:


*   Video Presentation Link


Make sure the video sharing permissions are accessible for anyone with the provided link.

# Submission

Prior to submitting your notebook to Gradescope, be sure to <b>run all functions within this file</b>. We will not run your functions ourselves, so we must see your outputs within this file in order to receive full credit.
