# Phase 3 - Joining the data
Trevor's [Phase 1](https://adb-731998097721284.4.azuredatabricks.net/?o=731998097721284#notebook/1038316885899254/command/1038316885899255) and [Phase 2](https://adb-731998097721284.4.azuredatabricks.net/?o=731998097721284#notebook/3816960191357033/command/3816960191357034) explorations served as the base for this notebook.
<br> Architecture diagram courtesy of Esther

## Join Architecture Diagram
Below, we show the planned sequence of joins that combines our 4 datasets (OpenFlights, Stations, Airports, Weather) into the full dataset we plan to use for modeling.

This first diagram shows how we dealt with the missing airports and generated a mapping table containing the single closest weather station for each airport.

![Joining_Process_1](files/shared_uploads/yuqiaochen@berkeley.edu/Joining_Process_1.png)

This second diagram shows all the joins we performed to produce the final, fully joined dataset.

![Joining_Process_2](files/shared_uploads/yuqiaochen@berkeley.edu/Joining_Process_2.png)

## Imports

In [0]:
from pyspark.sql import types, functions as F
import pandas as pd
# NOTE: importing max from pyspark shadows Python's built-in max() in this notebook
from pyspark.sql.functions import col, max, row_number
from pyspark.sql.window import Window

## Configure Databricks blob access

In [0]:
blob_container = "main-storage" # The name of your container created in https://portal.azure.com
storage_account = "team05w261" # The name of your Storage account created in https://portal.azure.com
secret_scope = "team05" # The name of the scope created in your local computer using the Databricks CLI
secret_key = "team05-key" # The name of the secret key created in your local computer using the Databricks CLI 
blob_url = f"wasbs://{blob_container}@{storage_account}.blob.core.windows.net"
mount_path = "/mnt/mids-w261"

# Configure blob storage account access key globally
spark.conf.set(
  f"fs.azure.account.key.{storage_account}.blob.core.windows.net",
  dbutils.secrets.get(scope = secret_scope, key = secret_key)
)

In [0]:
display(dbutils.fs.ls(f"{mount_path}/datasets_final_project"))

path,name,size
dbfs:/mnt/mids-w261/datasets_final_project/airlines/,airlines/,0
dbfs:/mnt/mids-w261/datasets_final_project/airlines_data/,airlines_data/,0
dbfs:/mnt/mids-w261/datasets_final_project/parquet_airlines_data/,parquet_airlines_data/,0
dbfs:/mnt/mids-w261/datasets_final_project/parquet_airlines_data_3m/,parquet_airlines_data_3m/,0
dbfs:/mnt/mids-w261/datasets_final_project/parquet_airlines_data_6m/,parquet_airlines_data_6m/,0
dbfs:/mnt/mids-w261/datasets_final_project/stations_data/,stations_data/,0
dbfs:/mnt/mids-w261/datasets_final_project/weather_data/,weather_data/,0
dbfs:/mnt/mids-w261/datasets_final_project/weather_data_6_hr/,weather_data_6_hr/,0
dbfs:/mnt/mids-w261/datasets_final_project/weather_data_single/,weather_data_single/,0


## Read in datasets

In [0]:
###########################################
##### For FULL dataset, run this cell #####
###########################################

df_airlines = spark.read.parquet("/mnt/mids-w261/datasets_final_project/parquet_airlines_data/*")
df_stations = spark.read.parquet('dbfs:/mnt/mids-w261/datasets_final_project/stations_data/*')
df_weather = spark.read.parquet('dbfs:/mnt/mids-w261/datasets_final_project/weather_data/*')
open_flights = pd.read_csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat", 
                           names = ['id','name', 'city', 'country', 'iata', 'icao', 'lat', 'lng', 'altitude', 
                                    'timezone', 'dst', 'tz_db_time_zone', 'type', 'source'])
open_flights = spark.createDataFrame(open_flights)

In [0]:
##############################################
##### For LIMITED dataset, run this cell #####
##############################################

df_airlines = spark.read.parquet("/mnt/mids-w261/datasets_final_project/parquet_airlines_data_3m/")
df_stations = spark.read.parquet('dbfs:/mnt/mids-w261/datasets_final_project/stations_data/*')
df_weather = spark.read.parquet('dbfs:/mnt/mids-w261/datasets_final_project/weather_data/*').filter(F.col('DATE') < "2015-04-01T00:00:00.000")
open_flights = pd.read_csv("https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat", 
                           names = ['id','name', 'city', 'country', 'iata', 'icao', 'lat', 'lng', 'altitude', 
                                    'timezone', 'dst', 'tz_db_time_zone', 'type', 'source'])
open_flights = spark.createDataFrame(open_flights)

In [0]:
# Full dataset row count

# df_airlines.count() # 63493682 rows
# df_stations.count() # 5004169 rows
# df_weather.count() # 630904436 rows
# open_flights.count() # 7698 rows

## Subset columns to those we are keeping and remove duplicates

In [0]:
# list the columns we're keeping
weather_col = ['STATION', 'SOURCE', 'DATE', 'LATITUDE', 'LONGITUDE', 'ELEVATION', 
               'NAME', 'REPORT_TYPE', 'CALL_SIGN', 'QUALITY_CONTROL', 'WND', 
               'CIG', 'VIS', 'TMP', 'DEW', 'SLP', 'GA1', 'GF1', 'MA1', 'REM', 
               'AA1', 'AA2', 'AJ1', 'AL1', 'AN1', 'AO1', 'AU1', 'AT1']
airline_col = ["YEAR", "QUARTER", "MONTH", "DAY_OF_MONTH", "DAY_OF_WEEK", 
               "FL_DATE", "DEP_TIME", "DEP_TIME_BLK", "CRS_DEP_TIME", "CRS_ARR_TIME",
               "CRS_ELAPSED_TIME", "ARR_TIME", "ARR_TIME_BLK", "ACTUAL_ELAPSED_TIME",
               "ORIGIN", "ORIGIN_CITY_NAME", "ORIGIN_STATE_ABR", "ORIGIN_STATE_FIPS",
               "ORIGIN_STATE_NM", "ORIGIN_WAC",
               "DEST", "DEST_CITY_NAME", "DEST_STATE_ABR", "DEST_STATE_FIPS", 
               "DEST_STATE_NM","DEST_WAC",
               "OP_UNIQUE_CARRIER", "FLIGHTS", "DISTANCE", "DISTANCE_GROUP", "DIVERTED",
               "CANCELLED", "CANCELLATION_CODE", "CARRIER_DELAY", "DEP_DELAY", "DEP_DELAY_NEW",
               "DEP_DELAY_GROUP", "DEP_DEL15", "ARR_DELAY", "ARR_DELAY_NEW", "ARR_DELAY_GROUP",
               "ARR_DEL15", "WEATHER_DELAY", "NAS_DELAY", "SECURITY_DELAY", "LATE_AIRCRAFT_DELAY", "TAIL_NUM"]
station_col = ['station_id', 'wban', 'lat', 'lon', 'neighbor_id', 'neighbor_name', 'neighbor_state',
               'neighbor_call', 'neighbor_lat', 'neighbor_lon', 'distance_to_neighbor']
openfli_col = ['name', 'city', 'country', 'iata', 'lat', 'lng', 'icao', 'timezone', 'tz_db_time_zone']

In [0]:
# perform subsetting and drop duplicates
df_airlines = df_airlines.select(airline_col).dropDuplicates()
df_weather = df_weather.select(weather_col) # We did EDA and know there's no duplicate
df_stations = df_stations.select(station_col).dropDuplicates()
open_flights = open_flights.select(openfli_col).dropDuplicates()

In [0]:
# Filter out cancelled and diverted flights
df_airlines = df_airlines.filter((df_airlines.CANCELLED == 0) & (df_airlines.DIVERTED == 0))

In [0]:
df_airlines = df_airlines.drop('CANCELLED', 'DIVERTED', 'CANCELLATION_CODE')

## Prepping for an airport_weather_mapping

In [0]:
# select subset of columns
open_flights = open_flights.select('iata', 'icao', 'lat', 'lng', 'tz_db_time_zone')
print(open_flights.count())

# add 4 airports that are missing from the OpenFlights data
open_flights2 = spark.createDataFrame(pd.DataFrame({
    'iata': ['XWA', 'EAR', 'TKI', 'IFP'], 
    'icao': ['KXWA', 'KEAR', 'KTKI', 'KIFP'], 
    'lat': [48.2578135, 40.7274925, 33.1775399, 35.16558], 
    'lng': [-103.7418471, -99.0122646, -96.5926444, -114.557093], 
    'tz_db_time_zone': ['America/Chicago', 'America/Chicago', 'America/Chicago', 'America/Phoenix']
}))

# now union them
open_flights = open_flights.union(open_flights2)

# only need to keep the flights that are in df_airlines
origin_airports = df_airlines.select("origin").distinct().toPandas()
dest_airports = df_airlines.select("dest").distinct().toPandas()
all_airports = set(list(origin_airports.origin) + list(dest_airports.dest))
# do the filter
open_flights = open_flights.filter(F.col('iata').isin(all_airports))

open_flights.cache()

In [0]:
open_flights.createOrReplaceTempView("open_flights")
# 45,638 distinct weather stations
df_weather.select('latitude', 'longitude', 'station').distinct().createOrReplaceTempView('df_weather')

open_flights_closest_weather = spark.sql('''

with 

--step 1: cross join every airport with every weather station
-- and compute the distance between them (Euclidean on lat/lng degrees for now)
tbl1 as (
    select
        a.iata
        ,a.icao
        ,a.lat lat_of
        ,a.lng lng_of
        ,b.latitude lat_w
        ,b.longitude lng_w
        ,b.station
        ,a.tz_db_time_zone
        ,power(power(a.lat - b.latitude,2) + power(a.lng - b.longitude,2),.5) as distance_euclidean
    from open_flights a 
    cross join df_weather b
)

--step 2: rank the stations by distance within each airport (memory-intensive sort)
,tbl2 as(
    select
        tbl1.*
        ,dense_rank() over (partition by tbl1.iata order by tbl1.distance_euclidean) as n
    from tbl1
)

--step 3: keep only the closest weather station for each airport (n = 1).
-- dense_rank keeps ties, so stations at exactly the same distance all get n = 1.
-- Alternatively, we could keep the 3 closest stations and take some sort of average.

select *
from tbl2
where n = 1
''')

# see a view of the whole thing:
display(open_flights_closest_weather)

# select a subset of columns to use as our "master mapping"
airport_weather_mapping = open_flights_closest_weather.select(['iata', 'icao', 'station', 'tz_db_time_zone'])
airport_weather_mapping.cache()

iata,icao,lat_of,lng_of,lat_w,lng_w,station,tz_db_time_zone,distance_euclidean,n
ATY,KATY,44.91400146,-97.15470123,44.9047,-97.1494,72654614946,America/Chicago,0.0107060822733886,1
BGM,KBGM,42.20869827,-75.97979736,42.2068,-75.98,72515004725,America/New_York,0.0019090552539127,1
BUR,KBUR,34.20069885253906,-118.35900115966795,34.20056,-118.3575,72288023152,America/Los_Angeles,0.001507567702059,1
DLG,PADL,59.04470062,-158.5050049,59.05,-158.5167,70321025513,America/Anchorage,0.0128397349035834,1
DRT,KDRT,29.3742008209,-100.927001953,29.3784,-100.927,72261022010,America/Chicago,0.0041991795541599,1
EUG,KEUG,44.12459945678711,-123.21199798583984,44.1278,-123.2206,72693024221,America/Los_Angeles,0.0091781329511612,1
GEG,KGEG,47.61989974975586,-117.53399658203124,47.6216,-117.528,72785024157,America/Los_Angeles,0.0062329645394445,1
GRB,KGRB,44.48509979248047,-88.12960052490234,44.4794,-88.1366,72645014898,America/Chicago,0.0090266431170816,1
GRR,KGRR,42.88079834,-85.52279663,42.8825,-85.52389,72635094860,America/New_York,0.0020226479457607,1
GTF,KGTF,47.48199844,-111.3710022,47.4733,-111.3822,72775024143,America/Denver,0.0141793364891778,1
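
The SQL above measures distance as Euclidean distance on raw latitude/longitude degrees. Since a degree of longitude shrinks toward the poles, that ranking can differ from a true distance ranking, most visibly for Alaskan airports. The sketch below is plain Python rather than Spark, and it is only an illustration of the alternative, not the method used in this notebook: `haversine_km`, `euclidean_deg`, `closest_station` and the `DECOY` station are hypothetical names and data introduced here. The DLG coordinates and station `70321025513` are taken from the output above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def euclidean_deg(lat1, lng1, lat2, lng2):
    """Straight-line distance in raw degrees, as computed in the SQL above."""
    return math.hypot(lat1 - lat2, lng1 - lng2)


def closest_station(airport, stations, distance):
    """Return the single closest station; ties are broken by station id."""
    return min(
        stations,
        key=lambda s: (distance(airport["lat"], airport["lng"], s["lat"], s["lng"]), s["station"]),
    )


# DLG airport and its matched station come from the output above.
dlg = {"iata": "DLG", "lat": 59.04470062, "lng": -158.5050049}
stations = [
    {"station": "70321025513", "lat": 59.05, "lng": -158.5167},
    {"station": "DECOY", "lat": 59.056, "lng": -158.5050049},  # made-up station due north
]

by_haversine = closest_station(dlg, stations, haversine_km)
by_euclidean = closest_station(dlg, stations, euclidean_deg)
print(by_haversine["station"], by_euclidean["station"])
```

With these inputs the two metrics disagree: the made-up station is closer in degrees but farther in kilometres. Breaking ties on the station id also guarantees exactly one station per airport, which `dense_rank` alone does not.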


## Pre-processing the flight and weather data sets

In [0]:
# Adding an ID to the flights data
df_airlines = df_airlines.withColumn("id", F.monotonically_increasing_id())

# rename airlines columns with suffix
old_col_nms = df_airlines.columns
new_col_nms = [var + '_AIRLNS' for var in old_col_nms]
for i in range(len(old_col_nms)):
  df_airlines = df_airlines.withColumnRenamed(old_col_nms[i], new_col_nms[i])
df_airlines.columns

In [0]:
# rename stations columns with suffix
old_col_nms = df_stations.columns
new_col_nms = [var + '_STNS' for var in old_col_nms]
for i in range(len(old_col_nms)):
  df_stations = df_stations.withColumnRenamed(old_col_nms[i], new_col_nms[i])
df_stations.columns


In [0]:
# rename weather columns with suffix
old_col_nms = df_weather.columns
new_col_nms = [var + '_WTHR' for var in old_col_nms]
for i in range(len(old_col_nms)):
  df_weather = df_weather.withColumnRenamed(old_col_nms[i], new_col_nms[i])
df_weather.columns



In [0]:
# rename openflights columns with suffix

old_col_nms = open_flights.columns
new_col_nms = [var + '_OPNFLGHT' for var in old_col_nms]
for i in range(len(old_col_nms)):
  open_flights = open_flights.withColumnRenamed(old_col_nms[i], new_col_nms[i])
open_flights.columns
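
The four renaming cells above repeat the same loop with a different suffix. A minimal sketch of a shared helper is shown here, assuming only that the input exposes `columns` and `toDF(*names)`, as PySpark DataFrames do. The name `with_suffix` is hypothetical and is not part of the original notebook.

```python
def with_suffix(df, suffix):
    """Return `df` with `suffix` appended to every column name.

    Uses only `df.columns` and `df.toDF(*names)`, so all columns are
    renamed in one call instead of one withColumnRenamed per column.
    """
    return df.toDF(*[name + suffix for name in df.columns])


# Intended usage in this notebook (not executed here):
# df_airlines = with_suffix(df_airlines, "_AIRLNS")
# df_stations = with_suffix(df_stations, "_STNS")
# df_weather = with_suffix(df_weather, "_WTHR")
# open_flights = with_suffix(open_flights, "_OPNFLGHT")
```

The same helper would also cover the `_origin` and `_dest` suffixes applied to the weather data later on.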

## Join part 1
  
- Join the airport-to-station mapping table (the single closest weather station for each airport) with the Weather table on station ID, producing an enriched Weather table that carries airport codes.

In [0]:
# Enrich the weather data with the airport mapping, essentially attach IATA code to each weather measurement
df_weather_origin = df_weather \
    .join(airport_weather_mapping, on=[df_weather.STATION_WTHR == airport_weather_mapping.station], how='inner')\
    .withColumn('DATE_WTHR_rounded', F.date_trunc("hour", F.col('DATE_WTHR')))

## Prepping 2 sets of weather data - origin and destination

In [0]:
# Copy the same data to construct the df_weather_dest dataframe and rename columns
df_weather_dest = df_weather_origin.toDF(*df_weather_origin.columns)
old_col_nms = df_weather_dest.columns
new_col_nms = [var + '_dest' for var in old_col_nms]
for i in range(len(old_col_nms)):
  df_weather_dest = df_weather_dest.withColumnRenamed(old_col_nms[i], new_col_nms[i])
  
df_weather_dest.columns

In [0]:
# Go back and rename the origin weather columns with the _origin suffix
old_col_nms = df_weather_origin.columns
new_col_nms = [var + '_origin' for var in old_col_nms]
for i in range(len(old_col_nms)):
  df_weather_origin = df_weather_origin.withColumnRenamed(old_col_nms[i], new_col_nms[i])
  
df_weather_origin.columns

In [0]:
# create a timestamp feature
df_airlines = df_airlines.withColumn('datetime_dep', 
    F.unix_timestamp(F.concat(
        F.col('FL_DATE_AIRLNS'), 
        F.lit(' '), 
        F.lpad(F.col('CRS_DEP_TIME_AIRLNS'), 4, '0')
        ), 'yyyy-MM-dd HHmm').cast(types.TimestampType())
    )

# bring in the origin airport's time zone and derive the UTC departure, join-key and arrival timestamps
df_airlines_pre_join = df_airlines\
  .join(open_flights.select(['iata_OPNFLGHT', 'tz_db_time_zone_OPNFLGHT']).distinct(), on=[df_airlines.ORIGIN_AIRLNS == open_flights.iata_OPNFLGHT], how='left')\
  .withColumn("utc_dep", F.to_utc_timestamp(F.col("datetime_dep"), F.col("tz_db_time_zone_OPNFLGHT")))\
  .withColumn("utc_dep_3hrs_prior", F.col('utc_dep') - F.expr('INTERVAL 3 HOURS')) \
  .withColumn("utc_dep_3hrs_prior_rounded", F.date_trunc("hour", F.col('utc_dep_3hrs_prior'))) \
  .withColumn("utc_arrive", col("utc_dep") + (col("CRS_ELAPSED_TIME_AIRLNS") * F.expr("Interval 1 Minutes")))
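
The timestamp arithmetic in the cell above can be checked by hand against one row of the final output. The sketch below is plain Python, not Spark, and mirrors the same steps: left-pad the scheduled departure time, interpret it in the origin's local time, convert to UTC, step back 3 hours, truncate to the hour, and add the scheduled elapsed minutes. The function name `departure_times` is hypothetical, and a fixed UTC offset stands in for the tz database name that `to_utc_timestamp` resolves in the notebook, so the offset for the date in question is an assumption the caller must supply.

```python
from datetime import datetime, timedelta, timezone


def departure_times(fl_date, crs_dep_time, utc_offset_hours, crs_elapsed_minutes):
    """Reproduce the derived timestamp columns for a single flight."""
    # CRS_DEP_TIME is stored as an integer such as 930, so left-pad it to "0930".
    hhmm = str(crs_dep_time).zfill(4)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local_dep = datetime.strptime(f"{fl_date} {hhmm}", "%Y-%m-%d %H%M").replace(tzinfo=local_zone)

    utc_dep = local_dep.astimezone(timezone.utc)
    three_hours_prior = utc_dep - timedelta(hours=3)
    return {
        "utc_dep": utc_dep,
        "utc_dep_3hrs_prior": three_hours_prior,
        # Truncate (not round) to the start of the hour, like date_trunc("hour", ...).
        "utc_dep_3hrs_prior_rounded": three_hours_prior.replace(minute=0, second=0, microsecond=0),
        "utc_arrive": utc_dep + timedelta(minutes=crs_elapsed_minutes),
    }


# EWR to SEA on 2016-04-21, scheduled 1921 local, 369 scheduled minutes.
# New York was on daylight saving time (UTC-4) on that date.
times = departure_times("2016-04-21", 1921, -4, 369)
for name, value in times.items():
    print(name, value.isoformat())
```

Note that the column named `..._rounded` is a truncation: a flight at 23:21 UTC gets the 20:00 bucket, not 20:00 or 21:00 depending on the minutes.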


In [0]:
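# NOTE: the result is not assigned, so this drop is a no-op;
# both columns remain in df_airlines_pre_join and in the checkpointed output below.
df_airlines_pre_join.drop('iata_OPNFLGHT', 'tz_db_time_zone_OPNFLGHT')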

## Final join

- Join the weather tables onto the flight table at both origin and destination
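- The weather observation comes from the station closest to the airport and falls in the same UTC clock hour as the moment 3 hours before scheduled departure, so it was recorded between 2 and 4 hours before departure. The destination weather is matched on that same pre-departure hour, not on the arrival time.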

In [0]:
# Join -- without the window function but with rounded timing
df_full_5 = df_airlines_pre_join \
  .join(df_weather_origin, on=[df_airlines_pre_join.utc_dep_3hrs_prior_rounded == df_weather_origin.DATE_WTHR_rounded_origin, df_airlines_pre_join.ORIGIN_AIRLNS == df_weather_origin.iata_origin], how='left') \
  .join(df_weather_dest, on=[df_airlines_pre_join.utc_dep_3hrs_prior_rounded == df_weather_dest.DATE_WTHR_rounded_dest, df_airlines_pre_join.DEST_AIRLNS == df_weather_dest.iata_dest], how='left')

# Several weather observations from the same hour can match one flight at the origin or the destination,
# so keep only 1 row per flight: the one whose observations are closest to the top of the hour
windowSpec = Window.partitionBy("id_AIRLNS").orderBy(F.expr("bigint(DATE_WTHR_origin) - bigint(DATE_WTHR_rounded_origin) + bigint(DATE_WTHR_dest) - bigint(DATE_WTHR_rounded_dest)"))

df_full_5_dedup = df_full_5 \
  .withColumn("row_number", row_number().over(windowSpec)) \
  .filter("row_number == 1")
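
The window step above keeps, for each flight id, the joined row with the smallest sum of two offsets: how many seconds after the top of the hour the origin observation was taken, plus the same for the destination observation. A minimal plain-Python sketch of that selection follows. The function name `dedup_by_flight`, the field names and the sample rows are all hypothetical, and the sketch assumes both offsets are present for every row.

```python
def dedup_by_flight(rows):
    """Keep one row per flight id: the row with the smallest summed offset."""
    best = {}
    for row in rows:
        offset = row["origin_offset_s"] + row["dest_offset_s"]
        kept = best.get(row["id"])
        # Strict "<" keeps the first row seen when two rows tie.
        if kept is None or offset < kept[0]:
            best[row["id"]] = (offset, row)
    return [row for _, row in best.values()]


# Flight 1 matched two origin observations in the same hour; flight 2 matched one.
joined = [
    {"id": 1, "origin_offset_s": 3180, "dest_offset_s": 3120},
    {"id": 1, "origin_offset_s": 300, "dest_offset_s": 3120},
    {"id": 2, "origin_offset_s": 0, "dest_offset_s": 0},
]
deduped = dedup_by_flight(joined)
print(deduped)
```

One case may be worth checking in the Spark version: because both joins are left joins, a flight with no weather match on one side has a null sort key on every row, and `row_number` then picks among those rows in no guaranteed order.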


## Write to blob - this is checkpoint 1

In [0]:
# df_full_5_dedup.write.parquet(f"{blob_url}/2015_q1_full_join_5")

#df_full_5_dedup.write.parquet(f"{blob_url}/all_time_full_join_1") # Old version

df_full_5_dedup.write.parquet(f"{blob_url}/all_time_full_join_2") # New version: Included TAIL_NUM from Flight Dataset, filtered out Cancelled and diverted flights, created utc_arrive

# Data processing continues

In [0]:
%run ./dataclean_functions

In [0]:
# Reading from blob after checkpointing
#df_full_joined = spark.read.parquet(f"{blob_url}/all_time_full_join_1")
df_full_joined = spark.read.parquet(f"{blob_url}/all_time_full_join_2")


In [0]:
print(df_full_joined.count())
display(df_full_joined)

YEAR_AIRLNS,QUARTER_AIRLNS,MONTH_AIRLNS,DAY_OF_MONTH_AIRLNS,DAY_OF_WEEK_AIRLNS,FL_DATE_AIRLNS,DEP_TIME_AIRLNS,DEP_TIME_BLK_AIRLNS,CRS_DEP_TIME_AIRLNS,CRS_ARR_TIME_AIRLNS,CRS_ELAPSED_TIME_AIRLNS,ARR_TIME_AIRLNS,ARR_TIME_BLK_AIRLNS,ACTUAL_ELAPSED_TIME_AIRLNS,ORIGIN_AIRLNS,ORIGIN_CITY_NAME_AIRLNS,ORIGIN_STATE_ABR_AIRLNS,ORIGIN_STATE_FIPS_AIRLNS,ORIGIN_STATE_NM_AIRLNS,ORIGIN_WAC_AIRLNS,DEST_AIRLNS,DEST_CITY_NAME_AIRLNS,DEST_STATE_ABR_AIRLNS,DEST_STATE_FIPS_AIRLNS,DEST_STATE_NM_AIRLNS,DEST_WAC_AIRLNS,OP_UNIQUE_CARRIER_AIRLNS,FLIGHTS_AIRLNS,DISTANCE_AIRLNS,DISTANCE_GROUP_AIRLNS,CARRIER_DELAY_AIRLNS,DEP_DELAY_AIRLNS,DEP_DELAY_NEW_AIRLNS,DEP_DELAY_GROUP_AIRLNS,DEP_DEL15_AIRLNS,ARR_DELAY_AIRLNS,ARR_DELAY_NEW_AIRLNS,ARR_DELAY_GROUP_AIRLNS,ARR_DEL15_AIRLNS,WEATHER_DELAY_AIRLNS,NAS_DELAY_AIRLNS,SECURITY_DELAY_AIRLNS,LATE_AIRCRAFT_DELAY_AIRLNS,TAIL_NUM_AIRLNS,id_AIRLNS,datetime_dep,iata_OPNFLGHT,tz_db_time_zone_OPNFLGHT,utc_dep,utc_dep_3hrs_prior,utc_dep_3hrs_prior_rounded,utc_arrive,STATION_WTHR_origin,SOURCE_WTHR_origin,DATE_WTHR_origin,LATITUDE_WTHR_origin,LONGITUDE_WTHR_origin,ELEVATION_WTHR_origin,NAME_WTHR_origin,REPORT_TYPE_WTHR_origin,CALL_SIGN_WTHR_origin,QUALITY_CONTROL_WTHR_origin,WND_WTHR_origin,CIG_WTHR_origin,VIS_WTHR_origin,TMP_WTHR_origin,DEW_WTHR_origin,SLP_WTHR_origin,GA1_WTHR_origin,GF1_WTHR_origin,MA1_WTHR_origin,REM_WTHR_origin,AA1_WTHR_origin,AA2_WTHR_origin,AJ1_WTHR_origin,AL1_WTHR_origin,AN1_WTHR_origin,AO1_WTHR_origin,AU1_WTHR_origin,AT1_WTHR_origin,iata_origin,icao_origin,station_origin,tz_db_time_zone_origin,DATE_WTHR_rounded_origin,STATION_WTHR_dest,SOURCE_WTHR_dest,DATE_WTHR_dest,LATITUDE_WTHR_dest,LONGITUDE_WTHR_dest,ELEVATION_WTHR_dest,NAME_WTHR_dest,REPORT_TYPE_WTHR_dest,CALL_SIGN_WTHR_dest,QUALITY_CONTROL_WTHR_dest,WND_WTHR_dest,CIG_WTHR_dest,VIS_WTHR_dest,TMP_WTHR_dest,DEW_WTHR_dest,SLP_WTHR_dest,GA1_WTHR_dest,GF1_WTHR_dest,MA1_WTHR_dest,REM_WTHR_dest,AA1_WTHR_dest,AA2_WTHR_dest,AJ1_WTHR_dest,AL1_WTHR_dest,AN1_WTHR_dest,AO1_WTHR_dest,AU1_WTHR_dest,AT1_WTHR_dest,iata_dest,icao_dest,station_dest,tz_db_time_zone_dest,DATE_WTHR_rounded_dest,row_number
2016,2,4,21,4,2016-04-21,1924,1900-1959,1921,2230,369.0,2213,2200-2259,349.0,EWR,"Newark, NJ",NJ,34,New Jersey,21,SEA,"Seattle, WA",WA,53,Washington,93,UA,1.0,2402.0,10,,3.0,3.0,0,0.0,-17.0,0.0,-2,0.0,,,,,N78448,948,2016-04-21T19:21:00.000+0000,EWR,America/New_York,2016-04-21T23:21:00.000+0000,2016-04-21T20:21:00.000+0000,2016-04-21T20:00:00.000+0000,2016-04-22T05:30:00.000+0000,72502014734.0,7.0,2016-04-21T20:51:00.000+0000,40.6825,-74.1694,2.1,"NEWARK LIBERTY INTERNATIONAL AIRPORT, NJ US",FM-15,KEWR,V030,"220,5,N,0072,5","07620,5,M,N","016093,5,N,5",+02335,+00675,101475.0,"02,5,+06096,5,99,9",99999999999060961999999,101495101385.0,MET11404/21/16 15:51:02 METAR KEWR 212051Z 22014G18KT 10SM FEW200 BKN250 23/07 A2997 RMK AO2 SLP147 T02330067 56031 (DH),01000095,,,,,,,,EWR,KEWR,72502014734.0,America/New_York,2016-04-21T20:00:00.000+0000,72793024233.0,7.0,2016-04-21T20:53:00.000+0000,47.4444,-122.3138,112.8,"SEATTLE TACOMA INTERNATIONAL AIRPORT, WA US",FM-15,KSEA,V030,"210,5,N,0031,5","22000,5,9,N","016093,5,N,5",2065.0,675.0,100845.0,"02,5,+01067,5,99,9",02995999999010671999999,100785099215.0,MET11804/21/16 12:53:02 METAR KSEA 212053Z 21006KT 10SM FEW035 FEW120 FEW250 21/07 A2976 RMK AO2 SLP084 T02060067 58016 (RS),01000095,,,,,,,,SEA,KSEA,72793024233.0,America/Los_Angeles,2016-04-21T20:00:00.000+0000,1
2016,2,4,1,5,2016-04-01,1742,1600-1659,1630,1900,150.0,2005,1900-1959,143.0,FLL,"Fort Lauderdale, FL",FL,12,Florida,33,DCA,"Washington, DC",VA,51,Virginia,38,WN,1.0,899.0,4,31.0,72.0,72.0,4,1.0,65.0,65.0,4,1.0,0.0,0.0,0.0,34.0,N713SW,1021,2016-04-01T16:30:00.000+0000,FLL,America/New_York,2016-04-01T20:30:00.000+0000,2016-04-01T17:30:00.000+0000,2016-04-01T17:00:00.000+0000,2016-04-01T23:00:00.000+0000,74783012849.0,7.0,2016-04-01T17:53:00.000+0000,26.07875,-80.16217,3.4,"FORT LAUDERDALE INTERNATIONAL AIRPORT, FL US",FM-15,KFLL,V030,"160,5,N,0062,5","22000,5,9,N","016093,5,N,5",+02945,+02285,101415.0,"04,5,+00914,5,99,9",04995999999009141999999,101425101395.0,MET11704/01/16 12:53:02 METAR KFLL 011753Z 16012KT 10SM SCT030 29/23 A2995 RMK AO2 SLP141 T02940228 10300 20256 58013 (SMF),01000095,,,,,,,,FLL,KFLL,74783012849.0,America/New_York,2016-04-01T17:00:00.000+0000,72405013743.0,7.0,2016-04-01T17:52:00.000+0000,38.8472,-77.03454,3.0,"WASHINGTON REAGAN NATIONAL AIRPORT, VA US",FM-15,KDCA,V030,"190,5,N,0077,5","22000,5,9,N","016093,5,N,5",2445.0,1725.0,100215.0,"02,5,+00975,5,99,9",04995999999009751999999,100245100005.0,MET13904/01/16 12:52:02 METAR KDCA 011752Z 19015G21KT 10SM FEW032 FEW140 SCT250 24/17 A2960 RMK AO2 SLP021 60005 T02440172 10244 20178 57022 (RS),01000095,6001391.0,,,,,,,DCA,KDCA,72405013743.0,America/New_York,2016-04-01T17:00:00.000+0000,1
2016,2,4,28,4,2016-04-28,1855,1800-1859,1805,1853,108.0,1919,1800-1859,84.0,LAN,"Lansing, MI",MI,26,Michigan,43,MSP,"Minneapolis, MN",MN,27,Minnesota,63,EV,1.0,455.0,2,0.0,50.0,50.0,3,1.0,26.0,26.0,1,1.0,0.0,26.0,0.0,0.0,N371CA,1452,2016-04-28T18:05:00.000+0000,LAN,America/New_York,2016-04-28T22:05:00.000+0000,2016-04-28T19:05:00.000+0000,2016-04-28T19:00:00.000+0000,2016-04-28T23:53:00.000+0000,72539014836.0,7.0,2016-04-28T19:53:00.000+0000,42.7761,-84.5997,261.8,"LANSING CAPITAL CITY AIRPORT, MI US",FM-15,KLAN,V030,"020,5,N,0041,5","01524,5,M,N","009656,5,N,5",+00615,+00395,101615.0,"02,5,+00457,5,99,9",99999999999004571999999,101565098395.0,MET13304/28/16 14:53:02 METAR KLAN 281953Z 02008KT 6SM -RA BR FEW015 BKN050 OVC070 06/04 A2999 RMK AO2 RAE13B45 SLP161 P0000 T00610039 (JR),01000025,,,,,,10020015,,LAN,KLAN,72539014836.0,America/New_York,2016-04-28T19:00:00.000+0000,72658014922.0,7.0,2016-04-28T19:05:00.000+0000,44.8831,-93.2289,265.8,"MINNEAPOLIS ST PAUL INTERNATIONAL AIRPORT, MN US",FM-16,KMSP,V030,"070,5,N,0077,5","00579,5,M,N","016093,5,N,5",565.0,115.0,999999.0,"07,5,+00579,5,99,9",99999999999005791999999,101635098595.0,MET11804/28/16 13:05:02 SPECI KMSP 281905Z 07015G22KT 10SM BKN019 OVC038 06/01 A3001 RMK AO2 PRESENT WX VCSH T00560011 (CLB),,,,,,,,,MSP,KMSP,72658014922.0,America/Chicago,2016-04-28T19:00:00.000+0000,1
2019,1,3,30,6,2019-03-30,1458,1500-1559,1502,1657,235.0,1647,1600-1659,229.0,ORD,"Chicago, IL",IL,17,Illinois,41,PHX,"Phoenix, AZ",AZ,4,Arizona,81,NK,1.0,1440.0,6,,-4.0,0.0,-1,0.0,-10.0,0.0,-1,0.0,,,,,N683NK,1731,2019-03-30T15:02:00.000+0000,ORD,America/Chicago,2019-03-30T20:02:00.000+0000,2019-03-30T17:02:00.000+0000,2019-03-30T17:00:00.000+0000,2019-03-30T23:57:00.000+0000,72530094846.0,7.0,2019-03-30T17:51:00.000+0000,41.96019,-87.93162,201.8,"CHICAGO OHARE INTERNATIONAL AIRPORT, IL US",FM-15,KORD,V030,"340,5,N,0072,5","02896,5,M,N","016093,5,N,5",+00615,-00395,101245.0,"04,5,+02134,5,99,9",99999999999021341999999,101195098755.0,MET12503/30/19 11:51:02 METAR KORD 301751Z 34014KT 10SM SCT070 OVC095 06/M04 A2988 RMK AO2 SLP124 T00611039 10061 20033 50010 (BLH),01000095,,,,,,,,ORD,KORD,72530094846.0,America/Chicago,2019-03-30T17:00:00.000+0000,72278023183.0,7.0,2019-03-30T17:51:00.000+0000,33.4277,-112.0038,337.4,"PHOENIX AIRPORT, AZ US",FM-15,KPHX,V030,"090,5,N,0051,5","22000,5,9,N","016093,5,N,5",2395.0,-835.0,101485.0,"02,5,+09144,5,99,9",02995999999091441999999,101635097635.0,MET11203/30/19 10:51:02 METAR KPHX 301751Z 09010KT 10SM FEW300 24/M08 A3001 RMK AO2 SLP148 T02391083 10239 20128 58004,01000095,,,,,,,,PHX,KPHX,72278023183.0,America/Phoenix,2019-03-30T17:00:00.000+0000,1
2019,1,3,2,6,2019-03-02,2117,2100-2159,2120,2216,56.0,2215,2200-2259,58.0,DFW,"Dallas/Fort Worth, TX",TX,48,Texas,74,OKC,"Oklahoma City, OK",OK,40,Oklahoma,73,AA,1.0,175.0,1,,-3.0,0.0,-1,0.0,-1.0,0.0,-1,0.0,,,,,N936AN,1805,2019-03-02T21:20:00.000+0000,DFW,America/Chicago,2019-03-03T03:20:00.000+0000,2019-03-03T00:20:00.000+0000,2019-03-03T00:00:00.000+0000,2019-03-03T04:16:00.000+0000,72259003927.0,4.0,2019-03-03T00:00:00.000+0000,32.8978,-97.0189,170.7,"DAL FTW WSCMO AIRPORT, TX US",FM-12,99999,V020,"020,1,N,0041,1","99999,9,9,N",002400199,+00501,+00441,101831.0,"99,9,+00150,1,99,9",08991999999001501999999,999999099741.0,SYN09872259 11224 80208 10050 20044 39974 40183 56003 69951 751// 92353 333 10056 20028 96010 555 90300=,06000531,,,,,,,,DFW,KDFW,72259003927.0,America/Chicago,2019-03-03T00:00:00.000+0000,72353013967.0,4.0,2019-03-03T00:00:00.000+0000,35.3889,-97.6006,391.7,"OKLAHOMA CITY WILL ROGERS WORLD AIRPORT, OK US",FM-12,99999,V020,"040,1,N,0057,1","99999,9,9,N",016000199,111.0,-331.0,102101.0,"99,9,+00800,1,99,9",08991999999008001999999,999999097371.0,SYN08072353 32566 80411 10011 21033 39737 40210 58007 92352 333 10017 21027 555 90300=,,,,,,,,,OKC,KOKC,72353013967.0,America/Chicago,2019-03-03T00:00:00.000+0000,1
2019,1,3,7,4,2019-03-07,1805,1800-1859,1810,1930,80.0,1914,1900-1959,69.0,IAH,"Houston, TX",TX,48,Texas,74,JAN,"Jackson/Vicksburg, MS",MS,28,Mississippi,53,EV,1.0,351.0,2,,-5.0,0.0,-1,0.0,-16.0,0.0,-2,0.0,,,,,N17560,2026,2019-03-07T18:10:00.000+0000,IAH,America/Chicago,2019-03-08T00:10:00.000+0000,2019-03-07T21:10:00.000+0000,2019-03-07T21:00:00.000+0000,2019-03-08T01:30:00.000+0000,72243012960.0,7.0,2019-03-07T21:53:00.000+0000,29.98,-95.36,29.0,"HOUSTON INTERCONTINENTAL AIRPORT, TX US",FM-15,KIAH,V030,"170,5,N,0057,5","00488,5,M,N","016093,5,N,5",+02065,+01895,101785.0,"02,5,+00305,5,99,9",99999999999003051999999,101795101415.0,MET11203/07/19 15:53:02 METAR KIAH 072153Z 17011KT 10SM FEW010 BKN016 OVC040 21/19 A3006 RMK AO2 SLP178 T02060189 (MM),01000095,,,,,,,,IAH,KIAH,72243012960.0,America/Chicago,2019-03-07T21:00:00.000+0000,72235003940.0,4.0,2019-03-07T21:00:00.000+0000,32.3205,-90.0777,100.6,"JACKSON INTERNATIONAL AIRPORT, MS US",FM-12,99999,V020,"140,1,N,0041,1","99999,9,9,N",016000199,1501.0,-61.0,102211.0,"99,9,+01750,1,99,9",08991999999017501999999,999999101121.0,SYN06472235 32766 81408 10150 21006 30112 40221 56033 92054 555 90721=,,,,,,,,,JAN,KJAN,72235003940.0,America/Chicago,2019-03-07T21:00:00.000+0000,1
2019,1,3,14,4,2019-03-14,1107,0900-0959,930,1040,70.0,1219,1000-1059,72.0,SHV,"Shreveport, LA",LA,22,Louisiana,72,IAH,"Houston, TX",TX,48,Texas,74,EV,1.0,192.0,1,0.0,97.0,97.0,6,1.0,99.0,99.0,6,1.0,0.0,2.0,0.0,97.0,N15572,2043,2019-03-14T09:30:00.000+0000,SHV,America/Chicago,2019-03-14T14:30:00.000+0000,2019-03-14T11:30:00.000+0000,2019-03-14T11:00:00.000+0000,2019-03-14T15:40:00.000+0000,72248013957.0,7.0,2019-03-14T11:56:00.000+0000,32.4472,-93.8244,77.4,"SHREVEPORT REGIONAL AIRPORT, LA US",FM-15,KSHV,V030,"220,5,N,0051,5","02134,5,M,N","016093,5,N,5",+01565,+01395,100935.0,"08,5,+02134,5,99,9",99999999999021341999999,100955100015.0,MET14103/14/19 05:56:02 METAR KSHV 141156Z 22010KT 10SM OVC070 16/14 A2981 RMK AO2 RAE02 SLP093 P0000 60039 70113 T01560139 10222 20150 51054 (EAD),01000025,6009991.0,,,,,,,SHV,KSHV,72248013957.0,America/Chicago,2019-03-14T11:00:00.000+0000,72243012960.0,7.0,2019-03-14T11:16:00.000+0000,29.98,-95.36,29.0,"HOUSTON INTERCONTINENTAL AIRPORT, TX US",FM-16,KIAH,V030,"310,5,N,0067,5","00457,5,M,N","004828,5,N,5",1945.0,1785.0,999999.0,"02,5,+00244,5,99,9",99999999999002441999999,101195100805.0,MET12803/14/19 05:16:02 SPECI KIAH 141116Z 31013G22KT 3SM -RA BR FEW008 BKN015 OVC027 19/18 A2988 RMK AO2 RAB1056 P0009 T01940178 (CB),01002231,,,,,,10020015,,IAH,KIAH,72243012960.0,America/Chicago,2019-03-14T11:00:00.000+0000,1
2019,1,3,2,6,2019-03-02,641,0600-0659,650,900,190.0,845,0900-0959,184.0,MIA,"Miami, FL",FL,12,Florida,33,AUS,"Austin, TX",TX,48,Texas,74,AA,1.0,1103.0,5,,-9.0,0.0,-1,0.0,-15.0,0.0,-1,0.0,,,,,N940NN,2550,2019-03-02T06:50:00.000+0000,MIA,America/New_York,2019-03-02T11:50:00.000+0000,2019-03-02T08:50:00.000+0000,2019-03-02T08:00:00.000+0000,2019-03-02T15:00:00.000+0000,72202012839.0,7.0,2019-03-02T08:53:00.000+0000,25.7881,-80.3169,8.8,"MIAMI INTERNATIONAL AIRPORT, FL US",FM-15,KMIA,V030,"999,9,C,0000,5","02743,5,M,N","016093,5,N,5",+02115,+02005,101915.0,"02,5,+01219,5,99,9",99999999999012191999999,101935101825.0,MET11803/02/19 03:53:02 METAR KMIA 020853Z 00000KT 10SM FEW040 BKN090 BKN250 21/20 A3010 RMK AO2 SLP191 T02110200 56007 (AM),01000095,,,,,,,,MIA,KMIA,72202012839.0,America/New_York,2019-03-02T08:00:00.000+0000,72254013904.0,7.0,2019-03-02T08:12:00.000+0000,30.1831,-97.6799,146.3,"AUSTIN BERGSTROM INTERNATIONAL AIRPORT, TX US",FM-16,KAUS,V030,"050,5,N,0041,5","00152,5,M,N","004023,5,N,5",1065.0,895.0,999999.0,"08,5,+00152,5,99,9",99999999999001521999999,101765099955.0,MET09703/02/19 02:12:02 SPECI KAUS 020812Z 05008KT 2 1/2SM BR OVC005 11/09 A3005 RMK AO2 T01060089 (DF),,,,,,,00001015,,AUS,KAUS,72254013904.0,America/Chicago,2019-03-02T08:00:00.000+0000,1
2019,1,3,19,2,2019-03-19,1314,1300-1359,1325,1414,49.0,1356,1400-1459,42.0,DEN,"Denver, CO",CO,8,Colorado,82,PUB,"Pueblo, CO",CO,8,Colorado,82,OO,1.0,109.0,1,,-11.0,0.0,-1,0.0,-18.0,0.0,-2,0.0,,,,,N961SW,2694,2019-03-19T13:25:00.000+0000,DEN,America/Denver,2019-03-19T19:25:00.000+0000,2019-03-19T16:25:00.000+0000,2019-03-19T16:00:00.000+0000,2019-03-19T20:14:00.000+0000,72565003017.0,7.0,2019-03-19T16:53:00.000+0000,39.8328,-104.6575,1650.2,"DENVER INTERNATIONAL AIRPORT, CO US",FM-15,KDEN,V030,"010,5,N,0051,5","06096,5,M,N","016093,5,N,5",+00445,-00395,102375.0,"02,5,+01981,5,99,9",99999999999019811999999,102405083875.0,MET11703/19/19 09:53:02 METAR KDEN 191653Z 01010G17KT 10SM FEW065 SCT075 BKN200 04/M04 A3024 RMK AO2 SLP237 T00441039 (MAD),01000095,,,,,,,,DEN,KDEN,72565003017.0,America/Denver,2019-03-19T16:00:00.000+0000,72464093058.0,7.0,2019-03-19T16:53:00.000+0000,38.2901,-104.4983,1438.7,"PUEBLO MEMORIAL AIRPORT, CO US",FM-15,KPUB,V030,"070,5,N,0046,5","01829,5,M,N","016093,5,N,5",615.0,-335.0,102345.0,"08,5,+01829,5,99,9",99999999999018291999999,102405086125.0,MET09903/19/19 09:53:02 METAR KPUB 191653Z 07009KT 10SM OVC060 06/M03 A3024 RMK AO2 SLP234 T00611033 (MR),01000095,,,,,,,,PUB,KPUB,72464093058.0,America/Denver,2019-03-19T16:00:00.000+0000,1
2019,1,1,8,2,2019-01-08,1428,1400-1459,1415,1610,235.0,1557,1600-1659,209.0,HOU,"Houston, TX",TX,48,Texas,74,LAX,"Los Angeles, CA",CA,6,California,91,WN,1.0,1390.0,6,,13.0,13.0,0,0.0,-13.0,0.0,-1,0.0,,,,,N437WN,2999,2019-01-08T14:15:00.000+0000,HOU,America/Chicago,2019-01-08T20:15:00.000+0000,2019-01-08T17:15:00.000+0000,2019-01-08T17:00:00.000+0000,2019-01-09T00:10:00.000+0000,72244012918.0,7.0,2019-01-08T17:53:00.000+0000,29.63806,-95.28194,13.4,"HOUSTON WILLIAM P HOBBY AIRPORT, TX US",FM-15,KHOU,V030,"220,5,N,0036,5","07620,5,M,N","016093,5,N,5",+02395,+01675,102285.0,"04,5,+00762,5,99,9",99999999999007621999999,102245102075.0,MET12401/08/19 11:53:02 METAR KHOU 081753Z 22007KT 10SM SCT025 BKN250 24/17 A3019 RMK AO2 SLP228 T02390167 10239 20139 58003 (KAB),01000095,,,,,,,,HOU,KHOU,72244012918.0,America/Chicago,2019-01-08T17:00:00.000+0000,72295023174.0,7.0,2019-01-08T17:53:00.000+0000,33.938,-118.3888,29.6,"LOS ANGELES INTERNATIONAL AIRPORT, CA US",FM-15,KLAX,V030,"040,5,N,0026,5","06096,5,M,N","016093,5,N,5",1785.0,835.0,102035.0,"02,5,+00762,5,99,9",99999999999007621999999,102035100845.0,MET13101/08/19 09:53:02 METAR KLAX 081753Z 04005KT 10SM FEW025 SCT160 BKN200 18/08 A3013 RMK AO2 SLP203 T01780083 10178 20106 50006 (DMQ),01000095,,,,,,,,LAX,KLAX,72295023174.0,America/Los_Angeles,2019-01-08T17:00:00.000+0000,1


In [0]:
df_clean = generate_columns(df_full_joined)
df_clean_1 = drop_qc_columns(df_clean)
df_clean_2 = impute_weather_num(df_clean_1)
df_clean_3 = make_binary_cat_cols(df_clean_2)
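
The cleaning functions above come from `./dataclean_functions`, which is not shown in this notebook. Judging by the resulting column names (for example `WND_WTHR_direction_angle_origin` and `WND_WTHR_speed_rate_origin`), they split the comma-packed weather fields into their components. The sketch below illustrates that idea for the `WND` field only. It is not the authors' implementation: `parse_wnd` is a hypothetical name, and the missing-value sentinels (`999`, `9999`) and the divide-by-10 scaling follow the NOAA ISD format documentation, which the authors may or may not apply in the same way.

```python
def parse_wnd(raw):
    """Split an ISD WND string such as "220,5,N,0072,5" into named parts."""
    angle, angle_qc, type_code, speed, speed_qc = raw.split(",")
    return {
        # 999 marks a missing direction in the ISD format.
        "direction_angle": None if angle == "999" else int(angle),
        "direction_quality_code": angle_qc,
        "type_code": type_code,
        # Speed is stored in tenths of a metre per second; 9999 marks missing.
        "speed_rate": None if speed == "9999" else int(speed) / 10,
        "speed_quality_code": speed_qc,
    }


# Values taken from the WND_WTHR_origin column of the sample rows above.
print(parse_wnd("220,5,N,0072,5"))
print(parse_wnd("999,9,C,0000,5"))
```

Converting the sentinels to `None` before imputation keeps values like 999 from being treated as real measurements.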

In [0]:
# Generate flight derived features 
# (Calling flight_derived_features_creation function from ./working_notebooks/derived_features_function_esther)

df_clean_4 = flight_derived_features_creation(df_clean_3)

In [0]:
display(df_clean_4)

YEAR_AIRLNS,QUARTER_AIRLNS,MONTH_AIRLNS,DAY_OF_MONTH_AIRLNS,DAY_OF_WEEK_AIRLNS,FL_DATE_AIRLNS,DEP_TIME_AIRLNS,DEP_TIME_BLK_AIRLNS,CRS_DEP_TIME_AIRLNS,CRS_ARR_TIME_AIRLNS,CRS_ELAPSED_TIME_AIRLNS,ARR_TIME_AIRLNS,ARR_TIME_BLK_AIRLNS,ACTUAL_ELAPSED_TIME_AIRLNS,ORIGIN_AIRLNS,ORIGIN_CITY_NAME_AIRLNS,ORIGIN_STATE_ABR_AIRLNS,ORIGIN_STATE_FIPS_AIRLNS,ORIGIN_STATE_NM_AIRLNS,ORIGIN_WAC_AIRLNS,DEST_AIRLNS,DEST_CITY_NAME_AIRLNS,DEST_STATE_ABR_AIRLNS,DEST_STATE_FIPS_AIRLNS,DEST_STATE_NM_AIRLNS,DEST_WAC_AIRLNS,OP_UNIQUE_CARRIER_AIRLNS,FLIGHTS_AIRLNS,DISTANCE_AIRLNS,DISTANCE_GROUP_AIRLNS,CARRIER_DELAY_AIRLNS,DEP_DELAY_AIRLNS,DEP_DELAY_NEW_AIRLNS,DEP_DELAY_GROUP_AIRLNS,DEP_DEL15_AIRLNS,ARR_DELAY_AIRLNS,ARR_DELAY_NEW_AIRLNS,ARR_DELAY_GROUP_AIRLNS,ARR_DEL15_AIRLNS,WEATHER_DELAY_AIRLNS,NAS_DELAY_AIRLNS,SECURITY_DELAY_AIRLNS,LATE_AIRCRAFT_DELAY_AIRLNS,TAIL_NUM_AIRLNS,id_AIRLNS,datetime_dep,iata_OPNFLGHT,tz_db_time_zone_OPNFLGHT,utc_dep,utc_dep_3hrs_prior,utc_dep_3hrs_prior_rounded,utc_arrive,STATION_WTHR_origin,SOURCE_WTHR_origin,DATE_WTHR_origin,LATITUDE_WTHR_origin,LONGITUDE_WTHR_origin,ELEVATION_WTHR_origin,NAME_WTHR_origin,REPORT_TYPE_WTHR_origin,CALL_SIGN_WTHR_origin,QUALITY_CONTROL_WTHR_origin,iata_origin,icao_origin,station_origin,tz_db_time_zone_origin,DATE_WTHR_rounded_origin,STATION_WTHR_dest,SOURCE_WTHR_dest,DATE_WTHR_dest,LATITUDE_WTHR_dest,LONGITUDE_WTHR_dest,ELEVATION_WTHR_dest,NAME_WTHR_dest,REPORT_TYPE_WTHR_dest,CALL_SIGN_WTHR_dest,QUALITY_CONTROL_WTHR_dest,iata_dest,icao_dest,station_dest,tz_db_time_zone_dest,DATE_WTHR_rounded_dest,row_number,WND_WTHR_direction_angle_origin,WND_WTHR_speed_rate_origin,CIG_WTHR_ceiling_height_dimension_origin,VIS_WTHR_distance_dimension_origin,TMP_WTHR_air_temperature_origin,DEW_WTHR_dew_point_temperature_origin,SLP_WTHR_sea_level_pressure_origin,GA1_WTHR_base_height_dimension_origin,GF1_WTHR_lowest_cloud_base_height_dimension_origin,MA1_WTHR_altimeter_setting_rate_origin,MA1_WTHR_station_pressure_rate_origin,AA1_WTHR_period_quantity_in_hours_origin,AA1_WTHR_depth_dimension_origin,AA2_WTHR_period_quantity_in_hours_origin,AA2_WTHR_depth_dimension_origin,AJ1_WTHR_dimension_origin,AJ1_WTHR_equivalent_water_depth_dimension_origin,AL1_WTHR_period_quantity_origin,AL1_WTHR_depth_dimension_origin,AN1_WTHR_period_quantity_origin,AN1_WTHR_depth_dimension_origin,AO1_WTHR_period_quantity_in_minutes_origin,AO1_WTHR_depth_dimension_origin,WND_WTHR_direction_angle_dest,WND_WTHR_speed_rate_dest,CIG_WTHR_ceiling_height_dimension_dest,VIS_WTHR_distance_dimension_dest,TMP_WTHR_air_temperature_dest,DEW_WTHR_dew_point_temperature_dest,SLP_WTHR_sea_level_pressure_dest,GA1_WTHR_base_height_dimension_dest,GF1_WTHR_lowest_cloud_base_height_dimension_dest,MA1_WTHR_altimeter_setting_rate_dest,MA1_WTHR_station_pressure_rate_dest,AA1_WTHR_period_quantity_in_hours_dest,AA1_WTHR_depth_dimension_dest,AA2_WTHR_period_quantity_in_hours_dest,AA2_WTHR_depth_dimension_dest,AJ1_WTHR_dimension_dest,AJ1_WTHR_equivalent_water_depth_dimension_dest,AL1_WTHR_period_quantity_dest,AL1_WTHR_depth_dimension_dest,AN1_WTHR_period_quantity_dest,AN1_WTHR_depth_dimension_dest,AO1_WTHR_period_quantity_in_minutes_dest,AO1_WTHR_depth_dimension_dest,WND_WTHR_type_code_origin-V,WND_WTHR_type_code_origin-C,WND_WTHR_type_code_origin-N,WND_WTHR_type_code_origin-R,WND_WTHR_type_code_origin-H,CIG_WTHR_ceiling_determination_code_origin-M,CIG_WTHR_ceiling_determination_code_origin-C,CIG_WTHR_ceiling_determination_code_origin-W,CIG_WTHR_CAVOK_code_origin-Y,CIG_WTHR_CAVOK_code_origin-N,GA1_WTHR_coverage_code_origin-00,GA1_WTHR_coverage_code_origin-01,GA1_WTHR_coverage_code_origin-02,GA1_WTHR_coverage_code_origin-03,GA1_WTHR_coverage_code_origin-04,GA1_WTHR_coverage_code_origin-05,GA1_WTHR_coverage_code_origin-06,GA1_WTHR_coverage_code_origin-07,GA1_WTHR_coverage_code_origin-08,GA1_WTHR_coverage_code_origin-09,GA1_WTHR_coverage_code_origin-10,GA1_WTHR_cloud_type_code_origin-00,GA1_WTHR_cloud_type_code_origin-01,GA1_WTHR_cloud_type_code_origin-02,GA1_WTHR_cloud_type_
code_origin-03,GA1_WTHR_cloud_type_code_origin-04,GA1_WTHR_cloud_type_code_origin-05,GA1_WTHR_cloud_type_code_origin-06,GA1_WTHR_cloud_type_code_origin-07,GA1_WTHR_cloud_type_code_origin-08,GA1_WTHR_cloud_type_code_origin-09,GA1_WTHR_cloud_type_code_origin-10,GA1_WTHR_cloud_type_code_origin-12,GA1_WTHR_cloud_type_code_origin-15,GF1_WTHR_total_coverage_code_origin-00,GF1_WTHR_total_coverage_code_origin-01,GF1_WTHR_total_coverage_code_origin-02,GF1_WTHR_total_coverage_code_origin-03,GF1_WTHR_total_coverage_code_origin-04,GF1_WTHR_total_coverage_code_origin-05,GF1_WTHR_total_coverage_code_origin-06,GF1_WTHR_total_coverage_code_origin-07,GF1_WTHR_total_coverage_code_origin-08,GF1_WTHR_total_coverage_code_origin-09,GF1_WTHR_total_lowest_cloud_cover_code_origin-00,GF1_WTHR_total_lowest_cloud_cover_code_origin-01,GF1_WTHR_total_lowest_cloud_cover_code_origin-02,GF1_WTHR_total_lowest_cloud_cover_code_origin-03,GF1_WTHR_total_lowest_cloud_cover_code_origin-04,GF1_WTHR_total_lowest_cloud_cover_code_origin-05,GF1_WTHR_total_lowest_cloud_cover_code_origin-06,GF1_WTHR_total_lowest_cloud_cover_code_origin-07,GF1_WTHR_total_lowest_cloud_cover_code_origin-08,GF1_WTHR_total_lowest_cloud_cover_code_origin-09,GF1_WTHR_total_lowest_cloud_cover_code_origin-10,AA1_WTHR_condition_code_origin-3,AA1_WTHR_condition_code_origin-1,AA1_WTHR_condition_code_origin-2,AA2_WTHR_condition_code_origin-M,AA2_WTHR_condition_code_origin-C,AA2_WTHR_condition_code_origin-W,AJ1_WTHR_condition_code_origin-3,AJ1_WTHR_condition_code_origin-1,AL1_WTHR_condition_code_origin-3,AL1_WTHR_condition_code_origin-1,AN1_WTHR_condition_code_origin-3,AU1_WTHR_intensity_and_proximity_code_origin-0,AU1_WTHR_intensity_and_proximity_code_origin-1,AU1_WTHR_intensity_and_proximity_code_origin-2,AU1_WTHR_intensity_and_proximity_code_origin-3,AU1_WTHR_intensity_and_proximity_code_origin-4,AU1_WTHR_descriptor_code_origin-0,AU1_WTHR_descriptor_code_origin-1,AU1_WTHR_descriptor_code_origin-2,AU1_WTHR_descriptor_code_origin-3,AU1_WTH
R_descriptor_code_origin-4,AU1_WTHR_descriptor_code_origin-5,AU1_WTHR_descriptor_code_origin-6,AU1_WTHR_descriptor_code_origin-7,AU1_WTHR_descriptor_code_origin-8,AU1_WTHR_precipitation_code_origin-00,AU1_WTHR_precipitation_code_origin-01,AU1_WTHR_precipitation_code_origin-02,AU1_WTHR_precipitation_code_origin-03,AU1_WTHR_precipitation_code_origin-04,AU1_WTHR_precipitation_code_origin-05,AU1_WTHR_precipitation_code_origin-06,AU1_WTHR_precipitation_code_origin-07,AU1_WTHR_precipitation_code_origin-08,AU1_WTHR_precipitation_code_origin-09,AU1_WTHR_obscuration_code_origin-0,AU1_WTHR_obscuration_code_origin-1,AU1_WTHR_obscuration_code_origin-2,AU1_WTHR_obscuration_code_origin-3,AU1_WTHR_obscuration_code_origin-4,AU1_WTHR_obscuration_code_origin-5,AU1_WTHR_obscuration_code_origin-6,AU1_WTHR_obscuration_code_origin-7,AU1_WTHR_other_weather_phenomena_code_origin-0,AU1_WTHR_other_weather_phenomena_code_origin-1,AU1_WTHR_other_weather_phenomena_code_origin-2,AU1_WTHR_other_weather_phenomena_code_origin-3,AU1_WTHR_other_weather_phenomena_code_origin-4,AU1_WTHR_other_weather_phenomena_code_origin-5,AU1_WTHR_combination_indicator_code_origin-1,AU1_WTHR_combination_indicator_code_origin-2,AU1_WTHR_combination_indicator_code_origin-3,AT1_WTHR_weather_type_origin-00,AT1_WTHR_weather_type_origin-01,AT1_WTHR_weather_type_origin-02,AT1_WTHR_weather_type_origin-03,AT1_WTHR_weather_type_origin-04,AT1_WTHR_weather_type_origin-05,AT1_WTHR_weather_type_origin-06,AT1_WTHR_weather_type_origin-07,AT1_WTHR_weather_type_origin-08,AT1_WTHR_weather_type_origin-09,AT1_WTHR_weather_type_origin-10,AT1_WTHR_weather_type_origin-13,AT1_WTHR_weather_type_origin-14,AT1_WTHR_weather_type_origin-15,AT1_WTHR_weather_type_origin-16,AT1_WTHR_weather_type_origin-17,AT1_WTHR_weather_type_origin-18,AT1_WTHR_weather_type_origin-19,AT1_WTHR_weather_type_origin-21,AT1_WTHR_weather_type_origin-22,AT1_WTHR_weather_type_abbreviation_origin-FG,AT1_WTHR_weather_type_abbreviation_origin-BR,AT1_WTHR_weather_type_abbrevia
tion_origin-MIFG,AT1_WTHR_weather_type_abbreviation_origin-FC,AT1_WTHR_weather_type_abbreviation_origin-SN,AT1_WTHR_weather_type_abbreviation_origin-DZ,AT1_WTHR_weather_type_abbreviation_origin-RA,AT1_WTHR_weather_type_abbreviation_origin-FZFG,AT1_WTHR_weather_type_abbreviation_origin-TS,AT1_WTHR_weather_type_abbreviation_origin-UP,AT1_WTHR_weather_type_abbreviation_origin-FZDZ,AT1_WTHR_weather_type_abbreviation_origin-BLSN,AT1_WTHR_weather_type_abbreviation_origin-HZ,AT1_WTHR_weather_type_abbreviation_origin-FZRA,AT1_WTHR_weather_type_abbreviation_origin-GR,AT1_WTHR_weather_type_abbreviation_origin-PL,AT1_WTHR_weather_type_abbreviation_origin-DU,AT1_WTHR_weather_type_abbreviation_origin-FG+,WND_WTHR_type_code_dest-V,WND_WTHR_type_code_dest-C,WND_WTHR_type_code_dest-N,WND_WTHR_type_code_dest-R,WND_WTHR_type_code_dest-H,CIG_WTHR_ceiling_determination_code_dest-M,CIG_WTHR_ceiling_determination_code_dest-C,CIG_WTHR_ceiling_determination_code_dest-W,CIG_WTHR_CAVOK_code_dest-Y,CIG_WTHR_CAVOK_code_dest-N,GA1_WTHR_coverage_code_dest-00,GA1_WTHR_coverage_code_dest-01,GA1_WTHR_coverage_code_dest-02,GA1_WTHR_coverage_code_dest-03,GA1_WTHR_coverage_code_dest-04,GA1_WTHR_coverage_code_dest-05,GA1_WTHR_coverage_code_dest-06,GA1_WTHR_coverage_code_dest-07,GA1_WTHR_coverage_code_dest-08,GA1_WTHR_coverage_code_dest-09,GA1_WTHR_coverage_code_dest-10,GA1_WTHR_cloud_type_code_dest-00,GA1_WTHR_cloud_type_code_dest-01,GA1_WTHR_cloud_type_code_dest-02,GA1_WTHR_cloud_type_code_dest-03,GA1_WTHR_cloud_type_code_dest-04,GA1_WTHR_cloud_type_code_dest-05,GA1_WTHR_cloud_type_code_dest-06,GA1_WTHR_cloud_type_code_dest-07,GA1_WTHR_cloud_type_code_dest-08,GA1_WTHR_cloud_type_code_dest-09,GA1_WTHR_cloud_type_code_dest-10,GA1_WTHR_cloud_type_code_dest-12,GA1_WTHR_cloud_type_code_dest-15,GF1_WTHR_total_coverage_code_dest-00,GF1_WTHR_total_coverage_code_dest-01,GF1_WTHR_total_coverage_code_dest-02,GF1_WTHR_total_coverage_code_dest-03,GF1_WTHR_total_coverage_code_dest-04,GF1_WTHR_total_coverage_code_de
st-05,GF1_WTHR_total_coverage_code_dest-06,GF1_WTHR_total_coverage_code_dest-07,GF1_WTHR_total_coverage_code_dest-08,GF1_WTHR_total_coverage_code_dest-09,GF1_WTHR_total_lowest_cloud_cover_code_dest-00,GF1_WTHR_total_lowest_cloud_cover_code_dest-01,GF1_WTHR_total_lowest_cloud_cover_code_dest-02,GF1_WTHR_total_lowest_cloud_cover_code_dest-03,GF1_WTHR_total_lowest_cloud_cover_code_dest-04,GF1_WTHR_total_lowest_cloud_cover_code_dest-05,GF1_WTHR_total_lowest_cloud_cover_code_dest-06,GF1_WTHR_total_lowest_cloud_cover_code_dest-07,GF1_WTHR_total_lowest_cloud_cover_code_dest-08,GF1_WTHR_total_lowest_cloud_cover_code_dest-09,GF1_WTHR_total_lowest_cloud_cover_code_dest-10,AA1_WTHR_condition_code_dest-3,AA1_WTHR_condition_code_dest-1,AA1_WTHR_condition_code_dest-2,AA2_WTHR_condition_code_dest-M,AA2_WTHR_condition_code_dest-C,AA2_WTHR_condition_code_dest-W,AJ1_WTHR_condition_code_dest-3,AJ1_WTHR_condition_code_dest-1,AL1_WTHR_condition_code_dest-3,AL1_WTHR_condition_code_dest-1,AN1_WTHR_condition_code_dest-3,AU1_WTHR_intensity_and_proximity_code_dest-0,AU1_WTHR_intensity_and_proximity_code_dest-1,AU1_WTHR_intensity_and_proximity_code_dest-2,AU1_WTHR_intensity_and_proximity_code_dest-3,AU1_WTHR_intensity_and_proximity_code_dest-4,AU1_WTHR_descriptor_code_dest-0,AU1_WTHR_descriptor_code_dest-1,AU1_WTHR_descriptor_code_dest-2,AU1_WTHR_descriptor_code_dest-3,AU1_WTHR_descriptor_code_dest-4,AU1_WTHR_descriptor_code_dest-5,AU1_WTHR_descriptor_code_dest-6,AU1_WTHR_descriptor_code_dest-7,AU1_WTHR_descriptor_code_dest-8,AU1_WTHR_precipitation_code_dest-00,AU1_WTHR_precipitation_code_dest-01,AU1_WTHR_precipitation_code_dest-02,AU1_WTHR_precipitation_code_dest-03,AU1_WTHR_precipitation_code_dest-04,AU1_WTHR_precipitation_code_dest-05,AU1_WTHR_precipitation_code_dest-06,AU1_WTHR_precipitation_code_dest-07,AU1_WTHR_precipitation_code_dest-08,AU1_WTHR_precipitation_code_dest-09,AU1_WTHR_obscuration_code_dest-0,AU1_WTHR_obscuration_code_dest-1,AU1_WTHR_obscuration_code_dest-2,AU1_WTHR_obscura
tion_code_dest-3,AU1_WTHR_obscuration_code_dest-4,AU1_WTHR_obscuration_code_dest-5,AU1_WTHR_obscuration_code_dest-6,AU1_WTHR_obscuration_code_dest-7,AU1_WTHR_other_weather_phenomena_code_dest-0,AU1_WTHR_other_weather_phenomena_code_dest-1,AU1_WTHR_other_weather_phenomena_code_dest-2,AU1_WTHR_other_weather_phenomena_code_dest-3,AU1_WTHR_other_weather_phenomena_code_dest-4,AU1_WTHR_other_weather_phenomena_code_dest-5,AU1_WTHR_combination_indicator_code_dest-1,AU1_WTHR_combination_indicator_code_dest-2,AU1_WTHR_combination_indicator_code_dest-3,AT1_WTHR_weather_type_dest-00,AT1_WTHR_weather_type_dest-01,AT1_WTHR_weather_type_dest-02,AT1_WTHR_weather_type_dest-03,AT1_WTHR_weather_type_dest-04,AT1_WTHR_weather_type_dest-05,AT1_WTHR_weather_type_dest-06,AT1_WTHR_weather_type_dest-07,AT1_WTHR_weather_type_dest-08,AT1_WTHR_weather_type_dest-09,AT1_WTHR_weather_type_dest-10,AT1_WTHR_weather_type_dest-13,AT1_WTHR_weather_type_dest-14,AT1_WTHR_weather_type_dest-15,AT1_WTHR_weather_type_dest-16,AT1_WTHR_weather_type_dest-17,AT1_WTHR_weather_type_dest-18,AT1_WTHR_weather_type_dest-19,AT1_WTHR_weather_type_dest-21,AT1_WTHR_weather_type_dest-22,AT1_WTHR_weather_type_abbreviation_dest-FG,AT1_WTHR_weather_type_abbreviation_dest-BR,AT1_WTHR_weather_type_abbreviation_dest-MIFG,AT1_WTHR_weather_type_abbreviation_dest-FC,AT1_WTHR_weather_type_abbreviation_dest-SN,AT1_WTHR_weather_type_abbreviation_dest-DZ,AT1_WTHR_weather_type_abbreviation_dest-RA,AT1_WTHR_weather_type_abbreviation_dest-FZFG,AT1_WTHR_weather_type_abbreviation_dest-TS,AT1_WTHR_weather_type_abbreviation_dest-UP,AT1_WTHR_weather_type_abbreviation_dest-FZDZ,AT1_WTHR_weather_type_abbreviation_dest-BLSN,AT1_WTHR_weather_type_abbreviation_dest-HZ,AT1_WTHR_weather_type_abbreviation_dest-FZRA,AT1_WTHR_weather_type_abbreviation_dest-GR,AT1_WTHR_weather_type_abbreviation_dest-PL,AT1_WTHR_weather_type_abbreviation_dest-DU,AT1_WTHR_weather_type_abbreviation_dest-FG+,LOCAL_DEP_HOUR,HOLIDAY,Prev_Flight_Delay_15,Enough_Time_Btwn_Estima
te_Arrival_and_Planned_Dep,Poor_Schedule
2015,1,1,1,4,2015-01-01,951,0900-0959,959,1206,127.0,1146,1200-1259,115.0,LGA,"New York, NY",NY,36,New York,22,CLT,"Charlotte, NC",NC,37,North Carolina,36,US,1.0,544.0,3,,-8.0,0.0,-1,0.0,-20.0,0.0,-2,0.0,,,,,N102UW,240518239435,2015-01-01T09:59:00.000+0000,LGA,America/New_York,2015-01-01T14:59:00.000+0000,2015-01-01T11:59:00.000+0000,2015-01-01T11:00:00.000+0000,2015-01-01T17:06:00.000+0000,72503014732,7,2015-01-01T11:51:00.000+0000,40.7792,-73.88,3.4,"LA GUARDIA AIRPORT, NY US",FM-15,KLGA,V030,LGA,KLGA,72503014732,America/New_York,2015-01-01T11:00:00.000+0000,72314013881,7,2015-01-01T11:52:00.000+0000,35.2236,-80.9552,221.9,"CHARLOTTE DOUGLAS AIRPORT, NC US",FM-15,KCLT,V030,CLT,KCLT,72314013881,America/New_York,2015-01-01T11:00:00.000+0000,1,260.0,5.7,22000.0,16093.0,-2.2,-14.4,1021.2,1341.0,1341.0,1021.3,1020.2,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,22000.0,14484.0,-1.7,-3.9,1027.2,7620.0,7620.0,1026.8,998.6,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,1,0,1,0
2015,1,1,1,4,2015-01-01,1242,1100-1159,1120,1304,164.0,1433,1300-1359,171.0,CLT,"Charlotte, NC",NC,37,North Carolina,36,MSP,"Minneapolis, MN",MN,27,Minnesota,63,US,1.0,930.0,4,82.0,82.0,82.0,5,1.0,89.0,89.0,5,1.0,0.0,7.0,0.0,0.0,N102UW,257698071808,2015-01-01T11:20:00.000+0000,CLT,America/New_York,2015-01-01T16:20:00.000+0000,2015-01-01T13:20:00.000+0000,2015-01-01T13:00:00.000+0000,2015-01-01T19:04:00.000+0000,72314013881,7,2015-01-01T13:52:00.000+0000,35.2236,-80.9552,221.9,"CHARLOTTE DOUGLAS AIRPORT, NC US",FM-15,KCLT,V030,CLT,KCLT,72314013881,America/New_York,2015-01-01T13:00:00.000+0000,72658014922,7,2015-01-01T13:53:00.000+0000,44.8831,-93.2289,265.8,"MINNEAPOLIS ST PAUL INTERNATIONAL AIRPORT, MN US",FM-15,KMSP,V030,MSP,KMSP,72658014922,America/Chicago,2015-01-01T13:00:00.000+0000,1,1.0,0.0,22000.0,12875.0,1.7,-0.6,1027.7,4572.0,4572.0,1027.4,999.3,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,230.0,4.1,1829.0,12875.0,-8.3,-11.7,1012.2,1463.0,1463.0,1010.5,980.3,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,11,1,2,2,1
2015,1,1,1,4,2015-01-01,1511,1300-1359,1350,1714,144.0,1824,1700-1759,133.0,MSP,"Minneapolis, MN",MN,27,Minnesota,63,CLT,"Charlotte, NC",NC,37,North Carolina,36,US,1.0,930.0,4,0.0,81.0,81.0,5,1.0,70.0,70.0,4,1.0,0.0,0.0,0.0,70.0,N102UW,1159641203940,2015-01-01T13:50:00.000+0000,MSP,America/Chicago,2015-01-01T19:50:00.000+0000,2015-01-01T16:50:00.000+0000,2015-01-01T16:00:00.000+0000,2015-01-01T22:14:00.000+0000,72658014922,7,2015-01-01T16:53:00.000+0000,44.8831,-93.2289,265.8,"MINNEAPOLIS ST PAUL INTERNATIONAL AIRPORT, MN US",FM-15,KMSP,V030,MSP,KMSP,72658014922,America/Chicago,2015-01-01T16:00:00.000+0000,72314013881,7,2015-01-01T16:52:00.000+0000,35.2236,-80.9552,221.9,"CHARLOTTE DOUGLAS AIRPORT, NC US",FM-15,KCLT,V030,CLT,KCLT,72314013881,America/New_York,2015-01-01T16:00:00.000+0000,1,250.0,4.6,1829.0,16093.0,-4.4,-9.4,1013.2,1219.0,1219.0,1011.9,981.6,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,250.0,2.6,22000.0,16093.0,7.8,0.0,1026.5,7620.0,7620.0,1026.4,998.3,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,1,1,0,0
2015,1,1,1,4,2015-01-01,2030,2000-2059,2010,2213,123.0,2214,2200-2259,104.0,CLT,"Charlotte, NC",NC,37,North Carolina,36,MIA,"Miami, FL",FL,12,Florida,33,US,1.0,650.0,3,,20.0,20.0,1,1.0,1.0,1.0,0,0.0,,,,,N102UW,455266603567,2015-01-01T20:10:00.000+0000,CLT,America/New_York,2015-01-02T01:10:00.000+0000,2015-01-01T22:10:00.000+0000,2015-01-01T22:00:00.000+0000,2015-01-02T03:13:00.000+0000,72314013881,7,2015-01-01T22:52:00.000+0000,35.2236,-80.9552,221.9,"CHARLOTTE DOUGLAS AIRPORT, NC US",FM-15,KCLT,V030,CLT,KCLT,72314013881,America/New_York,2015-01-01T22:00:00.000+0000,72202012839,7,2015-01-01T22:53:00.000+0000,25.7881,-80.3169,8.8,"MIAMI INTERNATIONAL AIRPORT, FL US",FM-15,KMIA,V030,MIA,KMIA,72202012839,America/New_York,2015-01-01T22:00:00.000+0000,1,220.0,2.6,22000.0,16093.0,8.3,-1.7,1024.3,4877.0,4877.0,1024.0,996.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,110.0,4.1,7620.0,16093.0,25.0,20.0,1020.6,762.0,762.0,1020.7,1019.6,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,20,1,1,1,0
2015,1,1,2,5,2015-01-02,828,0800-0859,830,1036,126.0,1023,1000-1059,115.0,MIA,"Miami, FL",FL,12,Florida,33,CLT,"Charlotte, NC",NC,37,North Carolina,36,US,1.0,650.0,3,,-2.0,0.0,-1,0.0,-13.0,0.0,-1,0.0,,,,,N102UW,798864024151,2015-01-02T08:30:00.000+0000,MIA,America/New_York,2015-01-02T13:30:00.000+0000,2015-01-02T10:30:00.000+0000,2015-01-02T10:00:00.000+0000,2015-01-02T15:36:00.000+0000,72202012839,7,2015-01-02T10:53:00.000+0000,25.7881,-80.3169,8.8,"MIAMI INTERNATIONAL AIRPORT, FL US",FM-15,KMIA,V030,MIA,KMIA,72202012839,America/New_York,2015-01-02T10:00:00.000+0000,72314013881,7,2015-01-02T10:52:00.000+0000,35.2236,-80.9552,221.9,"CHARLOTTE DOUGLAS AIRPORT, NC US",FM-15,KCLT,V030,CLT,KCLT,72314013881,America/New_York,2015-01-02T10:00:00.000+0000,1,1.0,0.0,1067.0,16093.0,22.8,21.1,1021.9,610.0,610.0,1022.0,1020.9,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,200.0,1.5,2896.0,16093.0,6.7,0.6,1024.0,2896.0,2896.0,1024.0,996.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,2,1,1,0
2015,1,1,2,5,2015-01-02,1141,1100-1159,1145,1310,85.0,1303,1300-1359,82.0,CLT,"Charlotte, NC",NC,37,North Carolina,36,JAX,"Jacksonville, FL",FL,12,Florida,33,US,1.0,328.0,2,,-4.0,0.0,-1,0.0,-7.0,0.0,-1,0.0,,,,,N102UW,1675037324243,2015-01-02T11:45:00.000+0000,CLT,America/New_York,2015-01-02T16:45:00.000+0000,2015-01-02T13:45:00.000+0000,2015-01-02T13:00:00.000+0000,2015-01-02T18:10:00.000+0000,72314013881,7,2015-01-02T13:52:00.000+0000,35.2236,-80.9552,221.9,"CHARLOTTE DOUGLAS AIRPORT, NC US",FM-15,KCLT,V030,CLT,KCLT,72314013881,America/New_York,2015-01-02T13:00:00.000+0000,72206013889,7,2015-01-02T13:01:00.000+0000,30.495,-81.6936,7.9,"JACKSONVILLE INTERNATIONAL AIRPORT, FL US",FM-16,KJAX,V030,JAX,KJAX,72206013889,America/New_York,2015-01-02T13:00:00.000+0000,1,1.0,0.0,3353.0,16093.0,7.8,2.2,1025.5,3353.0,3353.0,1025.4,997.3,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,340.0,2.6,122.0,2414.0,12.8,12.8,860.0,122.0,122.0,1025.4,1024.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,11,2,0,1,0
2015,1,1,2,5,2015-01-02,1355,1300-1359,1355,1516,81.0,1513,1500-1559,78.0,JAX,"Jacksonville, FL",FL,12,Florida,33,CLT,"Charlotte, NC",NC,37,North Carolina,36,US,1.0,328.0,2,,0.0,0.0,0,0.0,-3.0,0.0,-1,0.0,,,,,N102UW,377957129355,2015-01-02T13:55:00.000+0000,JAX,America/New_York,2015-01-02T18:55:00.000+0000,2015-01-02T15:55:00.000+0000,2015-01-02T15:00:00.000+0000,2015-01-02T20:16:00.000+0000,72206013889,4,2015-01-02T15:00:00.000+0000,30.495,-81.6936,7.9,"JACKSONVILLE INTERNATIONAL AIRPORT, FL US",FM-12,99999,V020,JAX,KJAX,72206013889,America/New_York,2015-01-02T15:00:00.000+0000,72314013881,4,2015-01-02T15:00:00.000+0000,35.2236,-80.9552,221.9,"CHARLOTTE DOUGLAS AIRPORT, NC US",FM-12,99999,V020,CLT,KCLT,72314013881,America/New_York,2015-01-02T15:00:00.000+0000,1,50.0,3.1,0.0,16000.0,15.6,14.4,1026.1,250.0,250.0,10132.5,1025.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,16000.0,8.9,1.7,1026.0,0.0,6000.0,10132.5,999.6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,13,2,0,1,0
2015,1,1,2,5,2015-01-02,1629,1600-1659,1616,1706,50.0,1720,1700-1759,51.0,CLT,"Charlotte, NC",NC,37,North Carolina,36,RDU,"Raleigh/Durham, NC",NC,37,North Carolina,36,US,1.0,130.0,1,,13.0,13.0,0,0.0,14.0,14.0,0,0.0,,,,,N102UW,1288490339311,2015-01-02T16:16:00.000+0000,CLT,America/New_York,2015-01-02T21:16:00.000+0000,2015-01-02T18:16:00.000+0000,2015-01-02T18:00:00.000+0000,2015-01-02T22:06:00.000+0000,72314013881,4,2015-01-02T18:00:00.000+0000,35.2236,-80.9552,221.9,"CHARLOTTE DOUGLAS AIRPORT, NC US",FM-12,99999,V020,CLT,KCLT,72314013881,America/New_York,2015-01-02T18:00:00.000+0000,72306013722,4,2015-01-02T18:00:00.000+0000,35.8923,-78.7819,126.8,"RALEIGH AIRPORT, NC US",FM-12,99999,V020,RDU,KRDU,72306013722,America/New_York,2015-01-02T18:00:00.000+0000,1,160.0,2.1,0.0,16000.0,10.6,2.8,1025.2,1750.0,1750.0,10132.5,998.9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,16000.0,12.2,2.2,1026.2,0.0,6000.0,10132.5,1010.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,16,2,0,1,0
2015,1,1,2,5,2015-01-02,1752,1700-1759,1759,1902,63.0,1846,1900-1959,54.0,RDU,"Raleigh/Durham, NC",NC,37,North Carolina,36,CLT,"Charlotte, NC",NC,37,North Carolina,36,US,1.0,130.0,1,,-7.0,0.0,-1,0.0,-16.0,0.0,-2,0.0,,,,,N102UW,1254130568159,2015-01-02T17:59:00.000+0000,RDU,America/New_York,2015-01-02T22:59:00.000+0000,2015-01-02T19:59:00.000+0000,2015-01-02T19:00:00.000+0000,2015-01-03T00:02:00.000+0000,72306013722,7,2015-01-02T19:51:00.000+0000,35.8923,-78.7819,126.8,"RALEIGH AIRPORT, NC US",FM-15,KRDU,V030,RDU,KRDU,72306013722,America/New_York,2015-01-02T19:00:00.000+0000,72314013881,7,2015-01-02T19:52:00.000+0000,35.2236,-80.9552,221.9,"CHARLOTTE DOUGLAS AIRPORT, NC US",FM-15,KCLT,V030,CLT,KCLT,72314013881,America/New_York,2015-01-02T19:00:00.000+0000,1,220.0,1.5,2743.0,16093.0,12.2,2.8,1026.4,2743.0,2743.0,1026.4,1010.4,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,180.0,1.5,2743.0,16093.0,10.6,3.3,1025.5,2743.0,2743.0,1025.7,997.6,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,17,2,2,2,0
2015,1,1,2,5,2015-01-02,2011,2000-2059,2010,2213,123.0,2205,2200-2259,114.0,CLT,"Charlotte, NC",NC,37,North Carolina,36,MIA,"Miami, FL",FL,12,Florida,33,US,1.0,650.0,3,,1.0,1.0,0,0.0,-8.0,0.0,-1,0.0,,,,,N102UW,257698107799,2015-01-02T20:10:00.000+0000,CLT,America/New_York,2015-01-03T01:10:00.000+0000,2015-01-02T22:10:00.000+0000,2015-01-02T22:00:00.000+0000,2015-01-03T03:13:00.000+0000,72314013881,7,2015-01-02T22:52:00.000+0000,35.2236,-80.9552,221.9,"CHARLOTTE DOUGLAS AIRPORT, NC US",FM-15,KCLT,V030,CLT,KCLT,72314013881,America/New_York,2015-01-02T22:00:00.000+0000,72202012839,7,2015-01-02T22:53:00.000+0000,25.7881,-80.3169,8.8,"MIAMI INTERNATIONAL AIRPORT, FL US",FM-15,KMIA,V030,MIA,KMIA,72202012839,America/New_York,2015-01-02T22:00:00.000+0000,1,1.0,0.0,2591.0,16093.0,8.9,6.1,1027.3,1829.0,1829.0,1027.4,999.3,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,130.0,3.6,9144.0,16093.0,25.0,21.1,1022.0,671.0,671.0,1022.0,1020.9,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0,1,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,20,2,0,1,0


In [0]:
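# Columns to remove before modeling: actual departure/arrival times and delay details,
# descriptive location fields, timestamps and join keys, and weather-station metadata.
columns_to_drop = [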
  'DEP_TIME_AIRLNS',
  'DEP_TIME_BLK_AIRLNS',
  'ARR_TIME_AIRLNS',
  'ARR_TIME_BLK_AIRLNS',
  'ACTUAL_ELAPSED_TIME_AIRLNS',
  'ORIGIN_CITY_NAME_AIRLNS',
  'ORIGIN_STATE_ABR_AIRLNS',
  'ORIGIN_STATE_FIPS_AIRLNS',
  'ORIGIN_STATE_NM_AIRLNS',
  'ORIGIN_WAC_AIRLNS',
  'DEST_CITY_NAME_AIRLNS',
  'DEST_STATE_ABR_AIRLNS',
  'DEST_STATE_FIPS_AIRLNS',
  'DEST_STATE_NM_AIRLNS',
  'DEST_WAC_AIRLNS',
  'DISTANCE_GROUP_AIRLNS',
  'CARRIER_DELAY_AIRLNS',
  'DEP_DELAY_AIRLNS',
  'DEP_DELAY_NEW_AIRLNS',
  'DEP_DELAY_GROUP_AIRLNS',
  'ARR_DELAY_AIRLNS',
  'ARR_DELAY_NEW_AIRLNS',
  'ARR_DELAY_GROUP_AIRLNS',
  'ARR_DEL15_AIRLNS',
  'WEATHER_DELAY_AIRLNS',
  'NAS_DELAY_AIRLNS',
  'SECURITY_DELAY_AIRLNS',
  'LATE_AIRCRAFT_DELAY_AIRLNS',
  'TAIL_NUM_AIRLNS',
  'id_AIRLNS',
  'datetime_dep',
  'iata_OPNFLGHT',
  'tz_db_time_zone_OPNFLGHT',
  'utc_dep',
  'utc_dep_3hrs_prior',
  'utc_dep_3hrs_prior_rounded',
  'utc_arrive',
  'STATION_WTHR_origin',
  'SOURCE_WTHR_origin',
  'DATE_WTHR_origin',
  'NAME_WTHR_origin',
  'REPORT_TYPE_WTHR_origin',
  'CALL_SIGN_WTHR_origin',
  'QUALITY_CONTROL_WTHR_origin',
  'iata_origin',
  'icao_origin',
  'station_origin',
  'tz_db_time_zone_origin',
  'DATE_WTHR_rounded_origin',
  'STATION_WTHR_dest',
  'SOURCE_WTHR_dest',
  'DATE_WTHR_dest',
  'NAME_WTHR_dest',
  'REPORT_TYPE_WTHR_dest',
  'CALL_SIGN_WTHR_dest',
  'QUALITY_CONTROL_WTHR_dest',
  'iata_dest',
  'icao_dest',
  'station_dest',
  'tz_db_time_zone_dest',
  'DATE_WTHR_rounded_dest',
  'row_number'
]

In [0]:
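# Drop every column named in columns_to_drop (DataFrame.drop takes the names as varargs)
df_clean_5 = df_clean_4.drop(*columns_to_drop)
# List the remaining column names to confirm what is left for modeling
df_clean_5.columns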
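
One caveat with the drop above: PySpark's `DataFrame.drop` silently ignores string names that are not in the schema, so a typo in `columns_to_drop` would leave the column in place without any error. A minimal sketch of a sanity check follows; the helper name `check_drop_list` and the misspelled example name are hypothetical and not part of the original notebook. It works on plain lists of column names, so in the notebook it could be called as `check_drop_list(df_clean_4.columns, columns_to_drop)`.

```python
def check_drop_list(existing_columns, columns_to_drop):
    """Return (missing, kept) for a planned column drop.

    missing: names in columns_to_drop that are absent from existing_columns
    kept:    names from existing_columns that would survive the drop
    """
    existing = set(existing_columns)
    missing = [c for c in columns_to_drop if c not in existing]
    drop = set(columns_to_drop)
    kept = [c for c in existing_columns if c not in drop]
    return missing, kept


# Illustrative input only: a few names taken from the header above,
# plus one deliberately misspelled name (hypothetical) to show the check.
example_columns = ["YEAR_AIRLNS", "DEP_TIME_AIRLNS", "DEP_DEL15_AIRLNS", "row_number"]
example_drop = ["DEP_TIME_AIRLNS", "row_number", "TAIL_NUM_AIRLN"]

missing, kept = check_drop_list(example_columns, example_drop)
print(missing)  # ['TAIL_NUM_AIRLN']
print(kept)     # ['YEAR_AIRLNS', 'DEP_DEL15_AIRLNS']
```

An empty `missing` list suggests that every name in the drop list matched a real column.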

### Checkpoint 2 - Data ready for modeling

In [0]:
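# Persist the modeling-ready DataFrame as Parquet in blob storage.
# Note: the default save mode raises an error if this path already exists.
df_clean_5.write.parquet(f"{blob_url}/all_time_full_join_6")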
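
Because the default save mode fails when the target path already exists, re-running the checkpoint cell raises an error. A small sketch of making the save mode explicit is shown below; the helper `write_checkpoint` is a hypothetical name, not something defined in this notebook, and it relies only on the standard `DataFrameWriter.mode(...).parquet(...)` chain.

```python
def write_checkpoint(df, path, mode="errorifexists"):
    """Write df to Parquet at path with an explicit save mode and return the path.

    mode="errorifexists" matches Spark's default behavior;
    mode="overwrite" replaces an existing checkpoint at the same path.
    """
    df.write.mode(mode).parquet(path)
    return path


# Possible usage inside the notebook (assumes df_clean_5, blob_url, and spark exist there):
# path = write_checkpoint(df_clean_5, f"{blob_url}/all_time_full_join_6", mode="overwrite")
# df_model = spark.read.parquet(path)  # reload the checkpoint in a later session
```

Whether to overwrite is a judgment call: keeping the default protects an existing checkpoint from accidental replacement, while `"overwrite"` makes the cell safe to re-run.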