# Project 3 - Predicting a quantitative variable

The goal of this Data Science course project is to build a model that predicts the delay time of US domestic flights, using a dataset with information on all domestic flights that took place in 2023. The model is intended to provide insights and accurate predictions of flight delays, supporting decision-making and air travel planning.

----

## Libraries and data loading

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from math import *

In [2]:
dados = pd.read_csv('US_flights_2023.csv')
clima = pd.read_csv('weather_meteo_by_airport.csv')

----
## Data cleaning and preprocessing

In this section, the dataset is cleaned, keeping only the information deemed relevant to the project's goal.

In [4]:
dados.columns

Index(['FlightDate', 'Day_Of_Week', 'Airline', 'Tail_Number', 'Dep_Airport',
       'Dep_CityName', 'DepTime_label', 'Dep_Delay', 'Dep_Delay_Tag',
       'Dep_Delay_Type', 'Arr_Airport', 'Arr_CityName', 'Arr_Delay',
       'Arr_Delay_Type', 'Flight_Duration', 'Distance_type', 'Delay_Carrier',
       'Delay_Weather', 'Delay_NAS', 'Delay_Security', 'Delay_LastAircraft',
       'Manufacturer', 'Model', 'Aicraft_age'],
      dtype='object')

In [5]:
columns_to_drop = ['Tail_Number', 'Dep_CityName', 'Arr_CityName', 'Flight_Duration', 'Manufacturer', 'Model', 'Aicraft_age']

# Drop all unneeded columns in a single call instead of one at a time
dados = dados.drop(columns=columns_to_drop)

In [6]:
display(dados.head(5))
display(dados.columns)
display(dados.shape)

Unnamed: 0,FlightDate,Day_Of_Week,Airline,Dep_Airport,DepTime_label,Dep_Delay,Dep_Delay_Tag,Dep_Delay_Type,Arr_Airport,Arr_Delay,Arr_Delay_Type,Distance_type,Delay_Carrier,Delay_Weather,Delay_NAS,Delay_Security,Delay_LastAircraft
0,2023-01-02,1,Endeavor Air,BDL,Morning,-3,0,Low <5min,LGA,-12,Low <5min,Short Haul >1500Mi,0,0,0,0,0
1,2023-01-03,2,Endeavor Air,BDL,Morning,-5,0,Low <5min,LGA,-8,Low <5min,Short Haul >1500Mi,0,0,0,0,0
2,2023-01-04,3,Endeavor Air,BDL,Morning,-5,0,Low <5min,LGA,-21,Low <5min,Short Haul >1500Mi,0,0,0,0,0
3,2023-01-05,4,Endeavor Air,BDL,Morning,-6,0,Low <5min,LGA,-17,Low <5min,Short Haul >1500Mi,0,0,0,0,0
4,2023-01-06,5,Endeavor Air,BDL,Morning,-1,0,Low <5min,LGA,-16,Low <5min,Short Haul >1500Mi,0,0,0,0,0


Index(['FlightDate', 'Day_Of_Week', 'Airline', 'Dep_Airport', 'DepTime_label',
       'Dep_Delay', 'Dep_Delay_Tag', 'Dep_Delay_Type', 'Arr_Airport',
       'Arr_Delay', 'Arr_Delay_Type', 'Distance_type', 'Delay_Carrier',
       'Delay_Weather', 'Delay_NAS', 'Delay_Security', 'Delay_LastAircraft'],
      dtype='object')

(6743404, 17)

In [7]:
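# Attach the weather columns (prcp, snow, wspd) for the departure airport on the flight date
dados = pd.merge(dados, clima[['time', 'airport_id', 'prcp', 'snow', 'wspd']], left_on=['FlightDate', 'Dep_Airport'], right_on=['time', 'airport_id'], how='left')
# The join keys from clima duplicate FlightDate and Dep_Airport, so drop them
dados = dados.drop(columns=['time', 'airport_id'])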
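
A left merge like the one above can silently add rows if the weather table has duplicate `(time, airport_id)` pairs, and it leaves `NaN` where a flight has no matching weather record. The sketch below shows one way to check both conditions. It runs on small toy frames: the names `flights` and `weather` and all values are illustrative assumptions, not taken from the dataset.

```python
import pandas as pd

# Toy stand-ins for `dados` and `clima` (illustrative values only)
flights = pd.DataFrame({
    'FlightDate': ['2023-01-09', '2023-01-10', '2023-01-10'],
    'Dep_Airport': ['ATL', 'ATL', 'BDL'],
    'Dep_Delay': [-3, -5, 12],
})
weather = pd.DataFrame({
    'time': ['2023-01-09', '2023-01-10'],
    'airport_id': ['ATL', 'ATL'],
    'prcp': [0.0, 1.2],
})

merged = pd.merge(
    flights, weather,
    left_on=['FlightDate', 'Dep_Airport'],
    right_on=['time', 'airport_id'],
    how='left',
    validate='many_to_one',  # raises MergeError if weather keys are not unique
    indicator=True,          # adds a `_merge` column: 'both' or 'left_only'
).drop(columns=['time', 'airport_id'])

# A left merge against unique keys must keep the number of flights unchanged
rows_preserved = len(merged) == len(flights)

# Flights with no weather record end up with NaN in the weather columns
unmatched = int((merged['_merge'] == 'left_only').sum())
```

In this toy case the BDL flight has no weather row, so `unmatched` counts it and its `prcp` is `NaN`. On the real data, the same count would indicate how many flights need imputation or removal before modelling.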

In [8]:
dados.loc[dados.Dep_Airport == 'ATL', :]

Unnamed: 0,FlightDate,Day_Of_Week,Airline,Dep_Airport,DepTime_label,Dep_Delay,Dep_Delay_Tag,Dep_Delay_Type,Arr_Airport,Arr_Delay,Arr_Delay_Type,Distance_type,Delay_Carrier,Delay_Weather,Delay_NAS,Delay_Security,Delay_LastAircraft,prcp,snow,wspd
31,2023-01-09,1,Endeavor Air,ATL,Afternoon,-3,0,Low <5min,FAY,-12,Low <5min,Short Haul >1500Mi,0,0,0,0,0,0.0,0.0,9.3
32,2023-01-10,2,Endeavor Air,ATL,Afternoon,-5,0,Low <5min,FAY,-12,Low <5min,Short Haul >1500Mi,0,0,0,0,0,0.0,0.0,2.7
33,2023-01-11,3,Endeavor Air,ATL,Afternoon,-2,0,Low <5min,FAY,-7,Low <5min,Short Haul >1500Mi,0,0,0,0,0,0.0,0.0,3.2
34,2023-01-12,4,Endeavor Air,ATL,Afternoon,16,1,Medium >15min,FAY,11,Low <5min,Short Haul >1500Mi,0,0,0,0,0,20.1,0.0,11.7
35,2023-01-13,5,Endeavor Air,ATL,Afternoon,-3,0,Low <5min,FAY,-8,Low <5min,Short Haul >1500Mi,0,0,0,0,0,0.3,0.0,19.8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6743020,2023-12-31,7,JetBlue Airways,ATL,Morning,-6,0,Low <5min,FLL,-6,Low <5min,Short Haul >1500Mi,0,0,0,0,0,0.0,0.0,5.0
6743035,2023-12-31,7,JetBlue Airways,ATL,Afternoon,-8,0,Low <5min,JFK,-14,Low <5min,Short Haul >1500Mi,0,0,0,0,0,0.0,0.0,5.0
6743142,2023-12-31,7,JetBlue Airways,ATL,Morning,-3,0,Low <5min,BOS,22,Medium >15min,Short Haul >1500Mi,0,0,22,0,0,0.0,0.0,5.0
6743246,2023-12-31,7,JetBlue Airways,ATL,Afternoon,13,1,Low <5min,JFK,-2,Low <5min,Short Haul >1500Mi,0,0,0,0,0,0.0,0.0,5.0


----
## References

- Dataset source: https://www.kaggle.com/datasets/bordanova/2023-us-civil-flights-delay-meteo-and-aircraft (data provided by the Bureau of Transportation Statistics)