# **Data Visualization with Python (Part 4)**

In this notebook, I will use **plotly.graph_objects** and **plotly.express** for creating plots.

The dataset describes approximately 200 million domestic US flights reported to the United States Bureau of Transportation Statistics. For each flight it records basic information (such as date, time, departure airport, and arrival airport) and, where applicable, how long the flight was delayed and the reason for the delay. It can be used to predict the likelihood of a flight arriving on time. The data is hosted on the Data Asset eXchange in .csv format. The CSV file I load below has 27,000 rows and 110 columns.





I've uploaded the dataset to Google Drive.

Let's get started!

In [53]:
!pip install -q -U watermark

In [54]:
%reload_ext watermark
%watermark -v -p pandas,plotly

Python implementation: CPython
Python version       : 3.7.14
IPython version      : 7.9.0

pandas: 1.3.5
plotly: 5.5.0



In [55]:
!pip install plotly-express

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [56]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go


In [57]:
from google.colab import drive
drive.mount('/content/gdrive')

Drive already mounted at /content/gdrive; to attempt to forcibly remount, call drive.mount("/content/gdrive", force_remount=True).


In [58]:
!ls '/content/gdrive'

MyDrive


In [59]:
AirlineData = pd.read_csv('/content/gdrive/MyDrive/AirlineData.csv',encoding = "ISO-8859-1")
AirlineData.head(2)

Unnamed: 0.1,Unnamed: 0,Year,Quarter,Month,DayofMonth,DayOfWeek,FlightDate,Reporting_Airline,DOT_ID_Reporting_Airline,IATA_CODE_Reporting_Airline,...,Div4WheelsOff,Div4TailNum,Div5Airport,Div5AirportID,Div5AirportSeqID,Div5WheelsOn,Div5TotalGTime,Div5LongestGTime,Div5WheelsOff,Div5TailNum
0,1295781,1998,2,4,2,4,1998-04-02,AS,19930,AS,...,,,,,,,,,,
1,1125375,2013,2,5,13,1,2013-05-13,EV,20366,EV,...,,,,,,,,,,


In [60]:
AirlineData.shape

(27000, 110)

In [61]:
# Randomly sample 600 data points. I set the random state to 42 so that I get the same sample every time.
data = AirlineData.sample(n=600, random_state=42)

data.shape

(600, 110)
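To illustrate why fixing `random_state` matters, here is a minimal sketch on a made-up frame (the `toy` DataFrame and its values are hypothetical stand-ins, not the airline data). Two draws with the same seed should return the same rows in the same order.

```python
import pandas as pd

# Hypothetical toy frame standing in for AirlineData (values are made up)
toy = pd.DataFrame({"Distance": range(100), "DepTime": range(100, 200)})

# Two draws with the same seed
first = toy.sample(n=5, random_state=42)
second = toy.sample(n=5, random_state=42)

# A draw with a different seed, for comparison
other = toy.sample(n=5, random_state=7)

# Compare the two same-seed draws, then show which rows each seed selected
print(first.equals(second))
print(first.index.tolist(), other.index.tolist())
```

Leaving `random_state` unset would give a different sample on each run, so the plots below would change between runs.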

- I'm going to apply ***'plotly.graph_objects'*** to see how departure time changes with the distance between airports. Let's draw a ***Scatter Plot*** in this section.



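*Just a Note:* In a scatter plot, each data point is represented as a marker point whose location is given by the x and y columns. This is what px.scatter draws; here I build the same kind of plot with go.Scatter and mode='markers'.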

In [62]:
# First, create a figure with go.Figure and add a trace to it with go.Scatter.
fig = go.Figure(data=go.Scatter(x=data['Distance'], y=data['DepTime'], mode='markers', marker=dict(color='green')))
# Update the layout with `update_layout`: add a plot title and label the x and y axes.
fig.update_layout(title='Distance vs Departure Time', xaxis_title='Distance', yaxis_title='DepTime')

fig.show()

Now I also want to apply ***'plotly.graph_objects'*** to compute the average arrival delay for each month and see how it changes over the year. A ***Line Plot*** will be used.


*Just a Note:* In a line plot, each data point is represented as a vertex (whose location is given by the x and y columns) of a polyline mark in 2D space. This is what px.line draws; here I build the line with go.Scatter.

In [63]:
# Group the data by Month and compute the average arrival delay.
line_data = data.groupby('Month')['ArrDelay'].mean().reset_index()

# Display the data
line_data

| | Month | ArrDelay |
|---|---|---|
| 0 | 1 | 2.638298 |
| 1 | 2 | 3.605263 |
| 2 | 3 | 11.318182 |
| 3 | 4 | 4.983871 |
| 4 | 5 | 2.471698 |
| 5 | 6 | 16.0 |
| 6 | 7 | 5.836364 |
| 7 | 8 | 2.285714 |
| 8 | 9 | 9.782609 |
| 9 | 10 | 2.488372 |


In [64]:
# Create a line plot with the month on the x-axis and the computed average arrival delay on the y-axis.



#fig = go.Figure(data=go.Scatter(x=line_data['Month'], y=line_data['ArrDelay'], mode='lines', marker=dict(color='navy')))
fig = go.Figure(data=go.Scatter(x=line_data['Month'], y=line_data['ArrDelay']))
fig.update_layout(title='Month vs Average Flight Delay Time', xaxis_title='Month', yaxis_title='ArrDelay')
fig.show()

I will use ***'plotly.express'*** to count the flights arriving in each destination state. A ***Bar Chart*** will be plotted.

In [65]:
# Group the data by destination state and compute the total number of flights to each state.
bar_data = data.groupby(['DestState'])['Flights'].sum().reset_index()

bar_data

| | DestState | Flights |
|---|---|---|
| 0 | AK | 5.0 |
| 1 | AL | 4.0 |
| 2 | AR | 1.0 |
| 3 | AZ | 9.0 |
| 4 | CA | 81.0 |
| 5 | CO | 22.0 |
| 6 | CT | 5.0 |
| 7 | FL | 37.0 |
| 8 | GA | 32.0 |
| 9 | HI | 6.0 |


In [66]:
# Use the plotly express bar chart function px.bar. Provide the input data, the x and y variables, and the chart title.
# This gives the total number of flights to each destination state.
fig = px.bar(bar_data, x="DestState", y="Flights", title='Total number of flights to each destination state')

fig.show()

I want to use **'plotly.express'** to get the number of flights per reporting airline. I will plot a **Bubble Chart**.

In [67]:
# Group the data by reporting airline and get number of flights
bub_data = data.groupby('Reporting_Airline')['Flights'].sum().reset_index()

bub_data

| | Reporting_Airline | Flights |
|---|---|---|
| 0 | 9E | 5.0 |
| 1 | AA | 68.0 |
| 2 | AS | 21.0 |
| 3 | B6 | 12.0 |
| 4 | CO | 21.0 |
| 5 | DH | 1.0 |
| 6 | DL | 84.0 |
| 7 | EA | 4.0 |
| 8 | EV | 13.0 |
| 9 | F9 | 4.0 |


In [68]:
fig = px.scatter(bub_data, x="Reporting_Airline", y="Flights", size="Flights",
                 hover_name="Reporting_Airline", title='Reporting Airline vs Number of Flights', size_max=50)
fig.show()


I will use **'plotly.express'** to get the distribution of arrival delay, and plot a **Histogram**.




In [69]:
# Set missing values to 0
data['ArrDelay'] = data['ArrDelay'].fillna(0)

# Create histogram here
#fig = px.histogram(data, x="ArrDelay")
#fig.show()

fig = px.histogram(data, x="ArrDelay",
                   title='Histogram of arrival delay',
                   labels={'ArrDelay':'ArrDelay'}, # can specify one label per df column
                   opacity=0.8,
                   log_y=True, # represent bars with log scale
                   color_discrete_sequence=['navy'] # color of histogram bars
                   )
fig.show()
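Setting missing delays to 0 adds those flights to the bin around zero. The sketch below, on a made-up series, compares that choice with simply dropping the missing values; the `delays` series and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical arrival delays in minutes, two of them missing
delays = pd.Series([5.0, np.nan, -3.0, np.nan, 0.0], name="ArrDelay")

# Option used in this notebook: treat missing delays as 0
filled = delays.fillna(0)

# Alternative: leave missing delays out of the distribution
dropped = delays.dropna()

# How many values equal zero after filling, and how many values remain after dropping
zeros_after_fill = int((filled == 0).sum())
print(zeros_after_fill, len(dropped))
```

Which option is appropriate depends on what a missing `ArrDelay` means in the data; I keep `fillna(0)` here.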

I want to use **'plotly.express'** to show each distance group's share of the summed month values (months are given as numbers), and plot a **Pie Chart**.


In [70]:
# The `values` parameter sets the size of each sector. The 'Month' column is passed to it,
# so the month numbers are summed within each distance group.
# The sector labels are passed to the `names` parameter.

fig = px.pie(data, values='Month', names='DistanceGroup', color='DistanceGroup',
             color_discrete_map={'1':'lightcyan',
                                 '2':'cyan',
                                 '3':'royalblue',
                                 '4':'darkblue',
                                 '5': 'green',
                                 '6': 'yellow',
                                 '7': 'pink',
                                 '8': 'red',
                                 '9': 'navy',
                                 '10': 'orange',
                                 '11': 'brown'
                                 })

fig.show()

I am going to apply **'plotly.express'** to show a hierarchical view, with month as the inner ring and destination state as the outer ring, where each sector is sized by the number of flights. I will plot a **Sunburst Chart**.

In [71]:
fig = px.sunburst(data, path=['Month', 'DestStateName'], values='Flights')
fig.show()