# Data Collection and Processing

Description: I want to visualize the Texas power outage during Feb.10 to Feb.19 hour by hour in Tableau.

In this notebook, I will prepare outage data, weather history data and Texas county data.
- Outage data is from: [ERCOT, Generation Resource and Energy Storage Resource Outages and Derates for February 10-19, 2021 Excel Version](http://www.ercot.com/content/wcm/lists/226521/Unit_Outage_Data_20210312.xlsx).
- Texas county list is from: [Texas-Counties-Centroid-Map](https://data.texas.gov/dataset/Texas-Counties-Centroid-Map/ups3-9e8m/data). Another source is [Wikipedia](https://en.wikipedia.org/wiki/User:Michael_J/County_table).
- Weather history data is from: [Meteostat](https://dev.meteostat.net/python/).

In [4]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


## 1 Outage data

In [11]:
outage = pd.read_excel('outage_feb1019.xlsx', sheet_name='OutageData')
print(f'Outage size: {outage.shape}')
outage.head(3)

Outage size: (2599, 11)


Unnamed: 0,STATION,STATION LONG NAME,UNIT NAME,SEASONAL MAX MW (HSL),AVAILABLE MW AFTER OUTAGE/DERATE,MW REDUCTION FROM OUTAGE/DERATE,FUEL TYPE,START,END,RESOURCE ENTITY,COUNTY
0,NBOHR,NIELS BOHR,UNIT1,197,20,177,WIND,2021-02-10 00:00:00,2021-02-12 04:20:00,BEARKAT WIND ENERGY I LLC (RE),GLASSCOCK
1,KEECHI,KEECHI WIND,U1,110,0,110,WIND,2021-02-10 00:30:00,2021-02-15 08:51:00,KEECHI WIND LLC (RE),JACK
2,BLSUMIT3,BLUE SUMMIT 3,UNIT_17,13,1,12,WIND,2021-02-10 01:15:00,2021-02-10 18:46:00,BLUE SUMMIT III WIND LLC (RE),HARDEMAN


## 2 County data

In [12]:
cnty = pd.read_csv('Texas_Counties_Centroid_Map.csv')
print(f'County size: {cnty.shape}')
cnty.head(3)

County size: (254, 8)


Unnamed: 0,X (Lat),Y (Long),CNTY_NM,CNTY_NBR,FIPS,Shape_Leng,Shape_Area,County Location
0,-97.492799,29.456415,Gonzales,90,48177,2.124911,0.257805,"(-97.492799, 29.456415)"
1,-98.697292,27.043405,Jim Hogg,125,48247,2.271751,0.267624,"(-98.697292, 27.043405)"
2,-97.681378,26.924094,Kenedy,66,48261,5.067864,0.389397,"(-97.681378, 26.924094)"


## 3 Weather history data

In [25]:
from datetime import datetime
from meteostat import Point, Daily

In [29]:

# Set time period
start = datetime(2018, 1, 1)
end = datetime(2018, 12, 31)

# Create Point for Harris, TX
hs = Point(-95.3978, 29.8596)


# Get daily data for 2018
data = Daily(hs, start, end)
data = data.fetch()

# # Plot line chart including average, minimum and maximum temperature
# data.plot(y=['tavg', 'tmin', 'tmax'])
# plt.show()

In [30]:
data

Unnamed: 0,tavg,tmin,tmax,prcp,snow,wdir,wspd,wpgt,pres,tsun


In [31]:
from meteostat import Stations

stations = Stations()
stations = stations.nearby(-95.3978, 29.8596)
station = stations.fetch(1)

print(station)

                                    name country region    wmo  icao  \
id                                                                     
89009  Amundsen-Scott South Pole Station      AQ   <NA>  89009  NZSP   

       latitude  longitude  elevation               timezone hourly_start  \
id                                                                          
89009     -90.0        0.0     2830.0  Antarctica/South_Pole   1957-01-09   

      hourly_end daily_start  daily_end       distance  
id                                                      
89009 2021-03-30  1957-01-11 2021-03-16  620235.502334  


In [32]:
station

Unnamed: 0_level_0,name,country,region,wmo,icao,latitude,longitude,elevation,timezone,hourly_start,hourly_end,daily_start,daily_end,distance
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
89009,Amundsen-Scott South Pole Station,AQ,,89009,NZSP,-90.0,0.0,2830.0,Antarctica/South_Pole,1957-01-09,2021-03-30,1957-01-11,2021-03-16,620235.502334
