# Homework III - ME 364 (Spring 2022)

For this homework assignment, you will use <u>part</u> of the United States Wind Turbine Database (USWTDB), which contains the locations of land-based and offshore wind turbines in the United States, the corresponding wind project information, and turbine technical specifications (for more information, see https://eerscmap.usgs.gov/uswtdb). The dataset is in the same zip file as this notebook. The variables in the dataset are:

- **case_id**: Unique uswtdb id
- **faa_ors**: Federal Aviation Administration digital obstacle file (dof) for obstacle repository system (ors)
- **faa_asn**: Federal Aviation Administration obstruction evaluation - airport airspace analysis (oe-aaa) aeronautical study number (asn)
- **usgs_pr_id**: United States Geological Survey id from prior turbine dataset
- **eia_id**: Energy Information Administration plant id from eia form 860
- **t_state**: State where turbine is located
- **t_county**: County where turbine is located
- **t_fips**: State and county fips where turbine is located
- **p_name**: Project name
- **p_year**: Year project became operational
- **p_tnum**: Number of turbines in project
- **p_cap**: Project capacity (MW)
- **t_manu**: Turbine original equipment manufacturer
- **t_model**: Turbine model
- **t_cap**: Turbine capacity (kW)
- **t_hh**: Turbine hub height (meters)
- **t_rd**: Turbine rotor diameter (meters)
- **t_rsa**: Turbine rotor swept area (meters^2)
- **t_ttlh**: Turbine total height - calculated (meters)
- **t_conf_atr**: Turbine characteristic confidence (0-3)
- **t_conf_loc**: Location confidence (0-3)
- **t_img_date**: Date of image used to visually verify turbine location
- **t_img_srce**: Source of image used to visually verify turbine location
- **xlong**: Longitude (decimal degrees - NAD 83 datum)
- **ylat**: Latitude (decimal degrees - NAD 83 datum)
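
Several of the geometric fields above are related to one another. The sketch below is a quick sanity check, assuming the usual circular swept-area formula and assuming that total height is hub height plus rotor radius (the USWTDB documentation is the authority on the exact definitions). The helper names `rotor_swept_area` and `total_height` are illustrative only; the input values are taken from the first row of the dataset output shown in this notebook.

```python
import math


def rotor_swept_area(rotor_diameter_m: float) -> float:
    """Area of the circle swept by the blades, in square meters."""
    return math.pi * (rotor_diameter_m / 2) ** 2


def total_height(hub_height_m: float, rotor_diameter_m: float) -> float:
    """Hub height plus one blade length (rotor radius), in meters."""
    return hub_height_m + rotor_diameter_m / 2


# Values from the first data row: t_hh = 80.0 m, t_rd = 82.5 m
t_hh, t_rd = 80.0, 82.5
area = rotor_swept_area(t_rd)
height = total_height(t_hh, t_rd)

print(round(area, 2))  # 5345.62, matching t_rsa in that row
print(height)          # 121.25, which the dataset lists as 121.3
```

If the two assumptions hold, the same check can flag rows where `t_rsa` or `t_ttlh` disagree with `t_rd` and `t_hh`.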

<font color='red'>__IMPORTANT NOTE :__</font> _For all plots, make sure that the axes and all variables shown are properly named (not the default abbreviations used in the dataset) and include units wherever the variable has one._

<font color='blue'>__(1)__</font>  Import the data to the notebook. How many entries (i.e., rows) do we have in this dataset? Show the first five rows of the dataset.

In [17]:
# Your code goes here
import pandas as pd

# Import dataframe
url = 'https://raw.githubusercontent.com/yairg98/Data-Driven-Problem-Solving/main/Homework%203/USWTDB_v3.csv'
df = pd.read_csv(url)

# Get number of entries
n = len(df.index)
print("Number of entries: {}".format(n))

print(df.head(5))

Number of entries: 21826
   case_id    faa_ors           faa_asn  usgs_pr_id   eia_id t_state  \
0  3046335  25-025116  2013-WTE-5773-OE     26722.0  58661.0      MA   
1  3046262  25-025115  2013-WTE-5497-OE     26723.0  58661.0      MA   
2  3039278  25-022038  2011-WTE-7517-OE     26677.0  57253.0      MA   
3  3039277  25-022039  2011-WTE-7516-OE     26676.0  57253.0      MA   
4  3014014  39-003863  2003-AGL-6902-OE     35938.0  56226.0      OH   

            t_county  t_fips                                    p_name  \
2  Barnstable County   25001                        AFCEE MMR Turbines   
3  Barnstable County   25001                        AFCEE MMR Turbines   
4        Wood County   39173  AMP-Ohio/Green Mountain Energy Wind Farm   

   p_year  ...  t_hh  t_rd    t_rsa t_ttlh  t_conf_atr  t_conf_loc  \
0  2013.0  ...  80.0  82.5  5345.62  121.3           3           3   
1  2013.0  ...  80.0  82.5  5345.62  121.3           3           3   
2  2011.0  ...  80.0  77.0  4656.63
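
Besides `len(df.index)`, the row count is also available from `df.shape`, and it can be useful to check how many values are missing before plotting. The sketch below uses a small hypothetical frame (`sample`, with made-up project names and capacities) rather than the real dataset, so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the turbine data (not real USWTDB rows)
sample = pd.DataFrame({
    "p_name": ["Project A", "Project A", "Project B"],
    "p_cap": [3.0, 3.0, None],  # None becomes NaN in a float column
})

# shape returns (number of rows, number of columns)
n_rows, n_cols = sample.shape

# Count missing values per column
missing = sample.isna().sum()

print(n_rows, n_cols)
print(missing)
```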

<font color='blue'>__(2)__</font> Provide a bubble chart showing turbine capacity versus turbine rotor diameter, with the size of the markers representing the project capacity. Show the turbine characteristic confidence on your plot as well. Do not forget to use the `labels` option to properly name all the variables.

In [18]:
# Your code goes here
import plotly.express as px

fig = px.scatter(
    df, x="t_rd", y="t_cap", size="p_cap", color="t_conf_atr",
    hover_name="p_name", size_max=35, height=600, width=1500,
    labels={
        "t_rd": "Turbine Rotor Diameter (m)",
        "t_cap": "Turbine Capacity (kW)",
        "t_conf_atr": "Turbine Characteristic Confidence",
        "p_cap": "Project Capacity (MW)"
    },
    title="Turbine Capacity vs. Rotor Diameter"
)

fig.show()
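
Marker sizes in a bubble chart need to be numeric, so rows with a missing project capacity may cause problems when passed as `size`. One hedged way to guard against that is to drop incomplete rows before plotting. The sketch below uses a hypothetical frame (`sample`) with made-up values; whether the real dataset needs this step depends on how many values are actually missing.

```python
import pandas as pd

# Hypothetical rows, some with missing values (not real USWTDB data)
sample = pd.DataFrame({
    "t_rd": [82.5, 77.0, None, 100.0],
    "t_cap": [1500.0, 1500.0, 2000.0, 2500.0],
    "p_cap": [3.0, None, 50.0, 100.0],
})

# Keep only rows where every plotted column has a value
plot_cols = ["t_rd", "t_cap", "p_cap"]
plot_df = sample.dropna(subset=plot_cols)

print(len(sample), "->", len(plot_df))
```

The filtered frame (`plot_df` here) would then be passed to `px.scatter` in place of the full dataframe.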

<font color='blue'>__(3)__</font> Create a map centered on the US (look up the US latitude and longitude) and represent the locations of the wind turbine projects on the map. Use green circles to represent each project and make sure that the project name is shown as a popup for each project. Save the map as an html file and submit it along with your notebook for this assignment.

In [19]:
# Your code goes here
fig = px.scatter_geo(
    df, lat="ylat", lon="xlong", color_discrete_sequence=['green'],
    center={
        'lat': 44.96,
        'lon': -103.77
    },
    hover_name='p_name'
)

# Fit the view to the plotted points because it looks nicer
# (the geographic center is still set above)
fig.update_geos(fitbounds="locations")

fig.show()
fig.write_html("us_turbine_locations.html")
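
The dataset has one row per turbine, so plotting every row draws one marker per turbine rather than one per project. A possible way to get one point per project is to average the turbine coordinates within each project. The sketch below uses a hypothetical frame (`turbines`) with made-up names and coordinates, and it assumes that `p_name` alone identifies a project, which may not hold if names repeat across states.

```python
import pandas as pd

# Hypothetical turbine rows (not real USWTDB data)
turbines = pd.DataFrame({
    "p_name": ["Project A", "Project A", "Project B"],
    "ylat": [41.0, 41.2, 35.0],
    "xlong": [-70.0, -70.4, -101.0],
})

# One row per project: mean latitude and longitude of its turbines
projects = turbines.groupby("p_name", as_index=False)[["ylat", "xlong"]].mean()

print(projects)
```

The resulting `projects` frame keeps the `p_name`, `ylat`, and `xlong` columns, so it can be passed to `px.scatter_geo` with the same arguments as above.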

<font color='blue'>__(4)__</font> For the state of Texas, provide a pie chart showing the project capacity for each year as a percentage of the total capacity. Make sure that you use the `labels` option to properly name the variables with their units, and put the years and percentages inside each slice. (**Note**: you first need to define a new dataframe that only includes the data for the state of Texas.)

In [21]:
# Your code goes here

# Create a new dataframe containing only Texas data
df_tx = df[df["t_state"]=="TX"]

fig = px.pie(
    df_tx, values='p_cap', names='p_year',
    labels={
        'p_cap': 'New Project Capacity (MW)',  # p_cap is given in MW
        'p_year': 'Year'
    }
)
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.show()
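
Because `p_cap` is a project-level value repeated on every turbine row, summing it directly counts a project once per turbine. A possible refinement is to keep one row per project before totalling by year. The sketch below uses a hypothetical frame (`tx`) with made-up projects, and it assumes that a project is identified by the pair `p_name` and `p_year`.

```python
import pandas as pd

# Hypothetical Texas turbine rows (not real USWTDB data);
# "Project A" has two turbines, so its capacity appears twice
tx = pd.DataFrame({
    "p_name": ["Project A", "Project A", "Project B", "Project C"],
    "p_year": [2010, 2010, 2010, 2015],
    "p_cap": [100.0, 100.0, 50.0, 150.0],
})

# Keep one row per project so each capacity is counted once
projects = tx.drop_duplicates(subset=["p_name", "p_year"])

# Total project capacity (MW) per year and its share of the overall total
cap_by_year = projects.groupby("p_year", as_index=False)["p_cap"].sum()
cap_by_year["share_pct"] = 100 * cap_by_year["p_cap"] / cap_by_year["p_cap"].sum()

print(cap_by_year)
```

In this toy example each year ends up with 150 MW (50% each), whereas summing the raw turbine rows would give 250 MW for 2010. The aggregated frame can be passed to `px.pie` with `values='p_cap'` and `names='p_year'`.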