# Basic Plotly Charts

## Objectives

In this lab, you will learn how to create Plotly charts using `plotly.graph_objects` and `plotly.express`.

Learn more about:

*   [Plotly python](https://plotly.com/python/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDV0101ENSkillsNetwork20297740-2021-01-01)
*   [Plotly Graph Objects](https://plotly.com/python/graph-objects/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDV0101ENSkillsNetwork20297740-2021-01-01)
*   [Plotly Express](https://plotly.com/python/plotly-express/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDV0101ENSkillsNetwork20297740-2021-01-01)
*   Handling data using [Pandas](https://pandas.pydata.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkDV0101ENSkillsNetwork20297740-2021-01-01)

#### Airline Reporting Carrier On-Time Performance Dataset

The Reporting Carrier On-Time Performance Dataset contains information on approximately 200 million domestic US flights reported to the United States Bureau of Transportation Statistics. The dataset contains basic information about each flight (such as date, time, departure airport, arrival airport) and, if applicable, the amount of time the flight was delayed and information about the reason for the delay. This dataset can be used to predict the likelihood of a flight arriving on time.

Preview data, dataset metadata, and data glossary [here.](https://dax-cdn.cdn.appdomain.cloud/dax-airline/1.0.1/data-preview/index.html)

In [1]:
# Import required libraries
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

### Read Data

In [2]:
url = '/Users/QXJ/Desktop/IBM/Data visualization/airline_data.csv'
pd.set_option('display.max_columns', None)
df = pd.read_csv(url,index_col = False)
df.head()

Unnamed: 0.1,Unnamed: 0,Year,Quarter,Month,DayofMonth,DayOfWeek,FlightDate,Reporting_Airline,DOT_ID_Reporting_Airline,IATA_CODE_Reporting_Airline,Tail_Number,Flight_Number_Reporting_Airline,OriginAirportID,OriginAirportSeqID,OriginCityMarketID,Origin,OriginCityName,OriginState,OriginStateFips,OriginStateName,OriginWac,DestAirportID,DestAirportSeqID,DestCityMarketID,Dest,DestCityName,DestState,DestStateFips,DestStateName,DestWac,CRSDepTime,DepTime,DepDelay,DepDelayMinutes,DepDel15,DepartureDelayGroups,DepTimeBlk,TaxiOut,WheelsOff,WheelsOn,TaxiIn,CRSArrTime,ArrTime,ArrDelay,ArrDelayMinutes,ArrDel15,ArrivalDelayGroups,ArrTimeBlk,Cancelled,CancellationCode,Diverted,CRSElapsedTime,ActualElapsedTime,AirTime,Flights,Distance,DistanceGroup,CarrierDelay,WeatherDelay,NASDelay,SecurityDelay,LateAircraftDelay,FirstDepTime,TotalAddGTime,LongestAddGTime,DivAirportLandings,DivReachedDest,DivActualElapsedTime,DivArrDelay,DivDistance,Div1Airport,Div1AirportID,Div1AirportSeqID,Div1WheelsOn,Div1TotalGTime,Div1LongestGTime,Div1WheelsOff,Div1TailNum,Div2Airport,Div2AirportID,Div2AirportSeqID,Div2WheelsOn,Div2TotalGTime,Div2LongestGTime,Div2WheelsOff,Div2TailNum,Div3Airport,Div3AirportID,Div3AirportSeqID,Div3WheelsOn,Div3TotalGTime,Div3LongestGTime,Div3WheelsOff,Div3TailNum,Div4Airport,Div4AirportID,Div4AirportSeqID,Div4WheelsOn,Div4TotalGTime,Div4LongestGTime,Div4WheelsOff,Div4TailNum,Div5Airport,Div5AirportID,Div5AirportSeqID,Div5WheelsOn,Div5TotalGTime,Div5LongestGTime,Div5WheelsOff,Div5TailNum
0,1295781,1998,2,4,2,4,1998-04-02,AS,19930,AS,N785AS,584,11884,1188401,31884,GEG,"Spokane, WA",WA,53.0,Washington,93,14747,1474702,30559,SEA,"Seattle, WA",WA,53.0,Washington,93,1330,1330.0,0.0,0.0,0.0,0.0,1300-1359,8.0,1338.0,1415.0,5.0,1426,1420.0,-6.0,0.0,0.0,-1.0,1400-1459,0.0,,0.0,56.0,50.0,37.0,1.0,224.0,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,1125375,2013,2,5,13,1,2013-05-13,EV,20366,EV,N24103,4132,11618,1161802,31703,EWR,"Newark, NJ",NJ,34.0,New Jersey,21,14524,1452401,34524,RIC,"Richmond, VA",VA,51.0,Virginia,38,1301,1255.0,-6.0,0.0,0.0,-1.0,1300-1359,9.0,1304.0,1358.0,13.0,1423,1411.0,-12.0,0.0,0.0,-1.0,1400-1459,0.0,,0.0,82.0,76.0,54.0,1.0,277.0,2,,,,,,,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,118824,1993,3,9,25,6,1993-09-25,UA,19977,UA,,2206,14108,1410801,34108,PIA,"Peoria, IL",IL,17.0,Illinois,41,13930,1393001,30977,ORD,"Chicago, IL",IL,17.0,Illinois,41,1650,1723.0,33.0,33.0,1.0,2.0,1600-1659,,,,,1730,1815.0,45.0,45.0,1.0,3.0,1700-1759,0.0,,0.0,40.0,52.0,,1.0,130.0,1,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,634825,1994,4,11,12,6,1994-11-12,HP,19991,HP,,1207,12892,1289201,32575,LAX,"Los Angeles, CA",CA,6.0,California,91,14107,1410701,30466,PHX,"Phoenix, AZ",AZ,4.0,Arizona,81,1245,1309.0,24.0,24.0,1.0,1.0,1200-1259,,,,,1457,1538.0,41.0,41.0,1.0,2.0,1400-1459,0.0,,0.0,72.0,89.0,,1.0,370.0,2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,1888125,2017,3,8,17,4,2017-08-17,UA,19977,UA,N827UA,576,11003,1100303,31003,CID,"Cedar Rapids/Iowa City, IA",IA,19.0,Iowa,61,11292,1129202,30325,DEN,"Denver, CO",CO,8.0,Colorado,82,755,746.0,-9.0,0.0,0.0,-1.0,0700-0759,8.0,754.0,836.0,8.0,902,844.0,-18.0,0.0,0.0,-2.0,0900-0959,0.0,,0.0,127.0,118.0,102.0,1.0,692.0,3,,,,,,,,,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [3]:
df.shape

(27000, 110)

In [4]:
df = df[['Year', 'Quarter', 'Month','FlightDate','Reporting_Airline','OriginState','DestState','DepTime','ArrTime','DepDelay','ArrDelay','Distance','DistanceGroup','Flights']]

In [5]:
df.head()

Unnamed: 0,Year,Quarter,Month,FlightDate,Reporting_Airline,OriginState,DestState,DepTime,ArrTime,DepDelay,ArrDelay,Distance,DistanceGroup,Flights
0,1998,2,4,1998-04-02,AS,WA,WA,1330.0,1420.0,0.0,-6.0,224.0,1,1.0
1,2013,2,5,2013-05-13,EV,NJ,VA,1255.0,1411.0,-6.0,-12.0,277.0,2,1.0
2,1993,3,9,1993-09-25,UA,IL,IL,1723.0,1815.0,33.0,45.0,130.0,1,1.0
3,1994,4,11,1994-11-12,HP,CA,AZ,1309.0,1538.0,24.0,41.0,370.0,2,1.0
4,2017,3,8,2017-08-17,UA,IA,CO,746.0,844.0,-9.0,-18.0,692.0,3,1.0


In [6]:
df.columns

Index(['Year', 'Quarter', 'Month', 'FlightDate', 'Reporting_Airline',
       'OriginState', 'DestState', 'DepTime', 'ArrTime', 'DepDelay',
       'ArrDelay', 'Distance', 'DistanceGroup', 'Flights'],
      dtype='object')
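
The cells above load all 110 columns and then keep 14 of them. As an alternative, `read_csv` can parse only the columns that are needed through its `usecols` parameter. A minimal sketch, assuming a tiny in-memory CSV (the `csv_text` rows and the shortened column list are stand-ins, not the real file):

```python
import io
import pandas as pd

# Hypothetical stand-in for airline_data.csv with only a few columns.
csv_text = io.StringIO(
    "Year,Month,Reporting_Airline,Tail_Number,Flights\n"
    "1998,4,AS,N785AS,1.0\n"
    "2013,5,EV,N24103,1.0\n"
)

wanted = ["Year", "Month", "Reporting_Airline", "Flights"]

# usecols tells read_csv to parse only the listed columns.
small = pd.read_csv(csv_text, usecols=wanted)
print(small.columns.tolist())
```

This skips the separate subsetting step and avoids holding unused columns in memory.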

In [7]:
# Randomly sample 500 rows. Fixing random_state at 42 makes the sample reproducible.
data = df.sample(n=500, random_state=42)

In [8]:
data.shape

(500, 14)
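
To see what `random_state` does, here is a small sketch on a made-up DataFrame (`toy` is hypothetical and stands in for `df`): two calls with the same seed return the same rows.

```python
import pandas as pd

toy = pd.DataFrame({"value": range(100)})  # hypothetical stand-in for df

first = toy.sample(n=5, random_state=42)
second = toy.sample(n=5, random_state=42)

# Same seed, same rows in the same order.
print(first.index.equals(second.index))
```

Without `random_state`, each run would generally pick a different set of rows, so the charts below would change from run to run.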

### Structure

#### plotly.graph_objects

1.  Scatter plot 

    Theme: How departure time varies with flight distance

2.  Line plot

    Theme: Extract average monthly delay time and see how it changes over the year

#### plotly.express

1.  Bar chart 

    Theme: Total number of flights to each destination state

2.  Bubble chart

    Theme: Get number of flights as per reporting airline

3.  Histogram

    Theme: Get distribution of arrival delay

4.  Pie chart

    Theme: Proportion of distance group by month (month indicated by numbers)

5.  Sunburst chart

    Theme: Hierarchical view in the order of month and destination state, sized by number of flights


## plotly.graph_objects

### 1. Scatter plot

In [9]:
# Create a figure with go.Figure and pass it a go.Scatter trace drawn as markers.
fig = go.Figure(data=go.Scatter(x=data['Distance'], y=data['DepTime'], mode='markers', marker=dict(color='red')))
# Update the layout with `update_layout`: add a plot title and titles for the x and y axes.
fig.update_layout(title='Distance vs Departure Time', xaxis_title='Distance', yaxis_title='DepTime')
# Display the figure
fig.show()

![image.png](attachment:image.png)
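
The `DepTime` values in the preview (for example 1330.0 and 746.0) are clock times written as hhmm, so the y axis above is not linear in time: 1259 to 1300 is one minute but 41 units. A minimal sketch of converting them to decimal hours before plotting (`hhmm_to_hours` is a hypothetical helper, not part of the lab):

```python
import pandas as pd

def hhmm_to_hours(hhmm: pd.Series) -> pd.Series:
    """Convert clock times stored as hhmm numbers (e.g. 1330.0) to decimal hours."""
    hours = hhmm // 100
    minutes = hhmm % 100
    return hours + minutes / 60

# Three values from the preview plus one missing value.
dep_time = pd.Series([1330.0, 1255.0, 746.0, None])
dep_hours = hhmm_to_hours(dep_time)
print(dep_hours.round(2).tolist())
```

Missing departure times stay missing, so the result could be passed as `y` to `go.Scatter` in place of `data['DepTime']`.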

### 2. Line chart

In [10]:
# Extract average monthly arrival delay time and see how it changes over the year
df_month = data.groupby('Month')['ArrDelay'].mean().reset_index()
df_month

Unnamed: 0,Month,ArrDelay
0,1,2.232558
1,2,2.6875
2,3,10.868421
3,4,6.229167
4,5,-0.27907
5,6,17.310345
6,7,5.088889
7,8,3.121951
8,9,9.081081
9,10,1.2
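
The `groupby(...).mean().reset_index()` pattern can be checked on a few hand-made rows. A small sketch with hypothetical values (`toy` is not taken from the dataset); note that `mean()` skips missing values by default:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Month": [1, 1, 2, 2, 2],
    "ArrDelay": [10.0, -4.0, 8.0, np.nan, 0.0],
})

# mean() ignores NaN, so month 2 averages only 8.0 and 0.0.
by_month = toy.groupby("Month")["ArrDelay"].mean().reset_index()
print(by_month)
```

`reset_index()` turns the `Month` group labels back into an ordinary column, which is the shape the plotting call expects.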


In [37]:
# Create a figure with go.Figure and pass it a go.Scatter trace drawn as a line.
# The colour is set through `line`, since mode='lines' draws no markers.
fig = go.Figure(data=go.Scatter(x=df_month['Month'], y=df_month['ArrDelay'], mode='lines', line=dict(color='green')))
# Update the layout with `update_layout`: add a plot title and titles for the x and y axes.
fig.update_layout(title='Average arrival delay by month', xaxis_title='Month', yaxis_title='Arrival delay')
# Display the figure
fig.show()

![image.png](attachment:image.png)

## plotly.express

### 1. Bar Chart

In [18]:
# Compute the total number of flights to each destination state
df_dest = data.groupby('DestState')['Flights'].sum().reset_index()
df_dest.head()

Unnamed: 0,DestState,Flights
0,AK,4.0
1,AL,3.0
2,AZ,8.0
3,CA,68.0
4,CO,20.0


In [20]:
fig = px.bar(df_dest, x="DestState", y="Flights", title='Total number of flights by destination state') 
fig.show()

![image.png](attachment:image.png)

### 2. Bubble chart

In [22]:
# Group the data by reporting airline and get number of flights
df_reporting = data.groupby('Reporting_Airline')['Flights'].sum().reset_index()
df_reporting.head()

Unnamed: 0,Reporting_Airline,Flights
0,9E,5.0
1,AA,57.0
2,AS,14.0
3,B6,10.0
4,CO,12.0


In [29]:
fig = px.scatter(df_reporting, x="Reporting_Airline", y="Flights", size='Flights',hover_name='Reporting_Airline', title='Reporting airline vs. Total number of flights') 
fig.show()

![image.png](attachment:image.png)

### 3. Histogram

In [30]:
# Set missing values to 0
data['ArrDelay'] = data['ArrDelay'].fillna(0)
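
Filling missing delays with 0 treats those flights as arriving exactly on time, which adds counts to the bin around zero. A quick sketch comparing `fillna(0)` with `dropna()` on a hypothetical series (`arr_delay` is made up):

```python
import numpy as np
import pandas as pd

arr_delay = pd.Series([-6.0, 45.0, np.nan, 41.0, np.nan])  # hypothetical sample

filled = arr_delay.fillna(0)   # missing delays are counted as 0 minutes
dropped = arr_delay.dropna()   # missing delays are left out

print((filled == 0).sum(), len(dropped))
print(filled.mean(), dropped.mean())
```

Which option is appropriate depends on what a missing `ArrDelay` means in the data; the lab uses `fillna(0)`.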

In [33]:
fig = px.histogram(data,x='ArrDelay')
fig.show()

![image.png](attachment:image.png)


### 4. Pie chart

In [34]:
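# px.pie sums `values` within each `names` category, so each slice is the sum of the month numbers in that distance group.
fig = px.pie(data, values='Month', names='DistanceGroup', title='Distance group proportion by month')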
fig.show()

![image.png](attachment:image.png)
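
Because `px.pie` adds up the `values` column for rows that share a `names` label, it can help to compute those sums directly and check what the slices represent. A sketch on hypothetical rows (`toy` is made up), comparing the sum of month numbers with the sum of flights:

```python
import pandas as pd

toy = pd.DataFrame({
    "DistanceGroup": [1, 1, 2, 2],
    "Month": [1, 2, 11, 12],
    "Flights": [1.0, 1.0, 1.0, 1.0],
})

# What values='Month' would feed into the slices: sums of month numbers.
month_sum = toy.groupby("DistanceGroup")["Month"].sum()
# What values='Flights' would feed into the slices: flight counts.
flight_count = toy.groupby("DistanceGroup")["Flights"].sum()

print(month_sum.tolist(), flight_count.tolist())
```

Here both groups have the same number of flights, yet the month sums differ widely because later months carry larger numbers.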

### 5. Sunburst chart

*Idea: Hierarchical view in the order of month and destination state, sized by number of flights*

In [36]:
fig = px.sunburst(data, path=['Month', 'DestState'], values='Flights')
fig.show()


The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.



![image.png](attachment:image.png)
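
The sizes in a sunburst with `path=['Month', 'DestState']` and `values='Flights'` correspond to flight totals at each level. A rough sketch of the same totals computed with pandas on hypothetical rows (`toy` is made up):

```python
import pandas as pd

toy = pd.DataFrame({
    "Month": [1, 1, 1, 2],
    "DestState": ["CA", "CA", "TX", "CA"],
    "Flights": [1.0, 1.0, 1.0, 1.0],
})

# Outer ring: flights per (month, destination state) pair.
leaves = toy.groupby(["Month", "DestState"])["Flights"].sum()
# Inner ring: flights per month, i.e. the sum of that month's leaves.
parents = leaves.groupby(level="Month").sum()

print(leaves)
print(parents)
```

Checking these tables against the hover values is a simple way to confirm the chart shows what is intended.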