# Airline Analysis

In this project, you'll imagine that you work for a travel agency and need to know the ins and outs of airline prices for your clients. You want to make sure that you can find the best deal for your client and help them to understand how airline prices change based on different factors.

You decide to look into your favorite airline. The data include:
- `miles`: miles traveled during the flight
- `passengers`: number of passengers on the flight
- `delay`: take-off delay in minutes
- `inflight_meal`: is there a meal included in the flight?
- `inflight_entertainment`: are there free entertainment systems for each seat?
- `inflight_wifi`: is there complimentary wifi on the flight?
- `day_of_week`: day of the week of the flight
- `weekend`: did this flight take place on a weekend?
- `coach_price`: the average price paid for a coach ticket
- `firstclass_price`: the average price paid for first-class seats
- `hours`: how many hours the flight took
- `redeye`: was this flight a redeye (overnight)?

In this project, you'll explore a dataset for the first time and get to know each of these features. Keep in mind that there's no one right way to address each of these questions. The goal is simply to explore and get to know the data using whatever methods come to mind.

You will be working in this file. The file **Airline Analysis_Solution.ipynb** contains the solution code for this project. We highly recommend that you complete the project on your own without checking the solution, but feel free to take a look if you get stuck or if you want to compare answers when you're done.

In order to get the plots to appear correctly in the notebook, you'll need to show and then clear each plot before creating the next one using the following code:

```py
plt.show() # Show the plot
plt.clf() # Clear the plot
```

Clearing the plot will not erase the plot from view; it just creates a new space for the following graphic.

## Univariate Analysis

1. What do coach ticket prices look like? What are the high and low values? What would be considered the average? Does $500 seem like a good price for a coach ticket?

In [None]:
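import pandas as pd
import numpy as np
import seaborn as sns
import statsmodels  # not called directly, but seaborn's lowess=True option needs it
import matplotlib.pyplot as plt

## Read in data
flight = pd.read_csv("flight.csv")
print(flight.head())

## Task 1
plt.hist(flight.coach_price, bins=20)
plt.title("Coach price distribution")
plt.xlabel("Coach price")
plt.ylabel("Number of flights")
plt.show()
plt.clf()
# Low, high, and typical values asked for in the question
print(f"Min coach price: {flight.coach_price.min()}")
print(f"Max coach price: {flight.coach_price.max()}")
print(f"Mean coach price: {flight.coach_price.mean():.2f}")
print(f"Median coach price: {flight.coach_price.median()}")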
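The histogram shows the shape of the distribution, but the question also asks for concrete low, high, and average values and whether $500 is a good price. One way to judge a single price is its percentile rank: the share of tickets priced at or below it. A minimal sketch, assuming the prices arrive as a pandas Series such as `flight.coach_price`; `price_summary` is a hypothetical helper, and the five toy prices are made up rather than taken from flight.csv.

```py
import pandas as pd

def price_summary(prices: pd.Series, target: float = 500.0) -> dict:
    """Low, high, and typical values of a price column, plus where `target` falls."""
    return {
        "min": prices.min(),
        "max": prices.max(),
        "mean": prices.mean(),
        "median": prices.median(),
        # Share of tickets priced at or below the target (its percentile rank)
        "share_at_or_below_target": (prices <= target).mean(),
    }

# Made-up prices standing in for flight.coach_price
toy_prices = pd.Series([250.0, 300.0, 400.0, 500.0, 650.0])
summary = price_summary(toy_prices)
print(summary)
```

On the real data this would be `price_summary(flight.coach_price)`. A share close to 1 would place $500 near the expensive end of the distribution; a share close to 0 would place it near the cheap end.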

2. Now visualize the coach ticket prices for flights that are 8 hours long. What are the high, low, and average prices for 8-hour-long flights? Does a $500 ticket seem more reasonable than before?

In [None]:
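## Task 2
eight_hour_prices = flight.coach_price[flight.hours == 8]
plt.hist(eight_hour_prices, bins=20)
plt.title("Coach price distribution for 8-hour flights")
plt.xlabel("Coach price")
plt.show()
plt.clf()
print(f"Min coach price (8-hour flights): {eight_hour_prices.min()}")
print(f"Max coach price (8-hour flights): {eight_hour_prices.max()}")
print(f"Mean coach price (8-hour flights): {eight_hour_prices.mean():.2f}")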
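Filtering to `hours == 8` answers the question for one flight length. Grouping by `hours` instead puts the low, average, and high price for every length side by side, which makes it easier to see how the 8-hour numbers compare with shorter and longer flights. A sketch under the assumption that `hours` holds whole numbers, as the `flight.hours == 8` filter implies; `price_by_hours` is a hypothetical name and the rows are toy values.

```py
import pandas as pd

def price_by_hours(df: pd.DataFrame) -> pd.DataFrame:
    """Low, average, and high coach price for each flight length in hours."""
    return df.groupby("hours")["coach_price"].agg(["min", "mean", "max"])

# Made-up rows standing in for the real flight data
toy = pd.DataFrame({
    "hours": [2, 2, 8, 8, 8],
    "coach_price": [150.0, 250.0, 400.0, 450.0, 560.0],
})
summary_by_hours = price_by_hours(toy)
print(summary_by_hours)
```

The row for 8 hours matches what the filtered histogram summarizes, while the other rows give the context for judging whether $500 is reasonable for that length.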


3. How are flight delay times distributed? Let's say there is a short amount of time between two connecting flights, and a flight delay would put the client at risk of missing their connecting flight. You want to better understand how often there are large delays so you can correctly set up connecting flights. What kinds of delays are typical?

In [None]:
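## Task 3
# Plot only delays under 50 minutes to zoom in on the shorter delays
short_delays = flight.delay[flight.delay < 50]
plt.hist(short_delays, bins=20)
plt.title("Delay distribution (delays under 50 minutes)")
plt.xlabel("Delay (minutes)")
plt.show()
plt.clf()
print(f"Mean delay (delays under 50 minutes): {short_delays.mean():.2f}")
# Summary over all flights, so delays of 50 minutes or more are included
print(flight.delay.describe())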
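The histogram is restricted to delays under 50 minutes, so it cannot show how often the long delays that threaten a connection actually happen. For planning connections, a more direct measure is the fraction of flights whose delay exceeds the layover. A minimal sketch, assuming delays are in minutes as described for the `delay` column; `delay_risk`, the 30-minute layover, and the toy delays are illustrative assumptions.

```py
import pandas as pd

def delay_risk(delays: pd.Series, layover_minutes: float) -> dict:
    """Typical delay, a high percentile, and how often a delay exceeds the layover."""
    return {
        "median": delays.median(),
        "p90": delays.quantile(0.9),  # 90th percentile, linear interpolation
        # Fraction of flights delayed longer than the connection window
        "share_exceeding_layover": (delays > layover_minutes).mean(),
    }

# Made-up delays in minutes, including one extreme value
toy_delays = pd.Series([0, 5, 8, 10, 12, 15, 20, 30, 45, 1500])
risk = delay_risk(toy_delays, layover_minutes=30)
print(risk)
```

The median describes a typical delay, while the share exceeding the layover is the number that matters when deciding how much buffer to leave between connecting flights.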


## Bivariate Analysis

4. Create a visualization that shows the relationship between coach and first-class prices. What is the relationship between these two prices? Do flights with higher coach prices always have higher first-class prices as well?

In [None]:
## Task 4
# Quick alternative without a trend line:
# plt.scatter(flight.coach_price, flight.firstclass_price, alpha=0.5)
# Scatter plot plus a LOWESS trend line (lowess=True needs statsmodels installed)
sns.lmplot(x="coach_price", y="firstclass_price", data=flight,
           scatter_kws={"alpha": 0.5}, line_kws={"color": "black"}, lowess=True)
plt.title("Coach and first-class prices")
plt.xlabel("Coach price")
plt.ylabel("First-class price")
plt.show()
plt.clf()
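The LOWESS curve shows the trend; a correlation coefficient summarizes it in one number. Pearson's coefficient measures how close the relationship is to a straight line, while Spearman's (Pearson's applied to ranks) measures how consistently a higher coach price comes with a higher first-class price. Without ties, a Spearman value of exactly 1 would mean "always higher", so anything below 1 signals exceptions. A sketch with made-up prices; computing Spearman as Pearson on ranks keeps it to pandas alone.

```py
import pandas as pd

# Made-up prices: first-class always rises with coach, but not along a straight line
toy = pd.DataFrame({
    "coach_price": [100.0, 200.0, 300.0, 400.0],
    "firstclass_price": [900.0, 1000.0, 1400.0, 1500.0],
})

# Pearson: strength of the linear relationship
pearson = toy["coach_price"].corr(toy["firstclass_price"])
# Spearman: Pearson correlation of the ranks, i.e. monotonic agreement
spearman = toy["coach_price"].rank().corr(toy["firstclass_price"].rank())
print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```

In this toy data Spearman comes out at 1 while Pearson is below 1, because the prices always move together but not linearly. On the real data the same calls apply to `flight.coach_price` and `flight.firstclass_price`.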


5. What is the relationship between coach prices and the inflight features (inflight meal, inflight entertainment, and inflight WiFi)? Which features are associated with the highest increase in price?

In [None]:
## Task 5
sns.histplot(data=flight, x='coach_price', bins=20, hue='inflight_meal', multiple='stack')
plt.title("Price distribution with inflight meal")
plt.show()
plt.clf()
sns.histplot(data=flight, x='coach_price', bins=20, hue='inflight_entertainment', multiple='stack')
plt.title("Price distribution with inflight entertainment")
plt.show()
plt.clf()
sns.histplot(data=flight, x='coach_price', bins=20, hue='inflight_wifi', multiple='stack')
plt.title("Price distribution with inflight WiFi")
plt.show()
plt.clf()
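The stacked histograms show the distributions, but ranking the features needs a number per feature. A simple choice is the "lift": the mean coach price with the feature minus the mean without it. A minimal sketch, assuming the three inflight columns are coded as the strings "Yes" and "No" (the Task 8 code filters `redeye` this way; the same coding for the inflight columns is an assumption). `feature_price_lift` and the toy rows are hypothetical.

```py
import pandas as pd

def feature_price_lift(df, features, price="coach_price"):
    """Mean price with each feature ("Yes") minus mean price without it ("No")."""
    lifts = {}
    for feature in features:
        means = df.groupby(feature)[price].mean()
        lifts[feature] = means["Yes"] - means["No"]
    # Largest positive association first
    return pd.Series(lifts).sort_values(ascending=False)

# Made-up rows; the Yes/No coding is assumed to match the real columns
toy = pd.DataFrame({
    "inflight_meal": ["Yes", "No", "Yes", "No"],
    "inflight_entertainment": ["Yes", "Yes", "No", "No"],
    "inflight_wifi": ["Yes", "No", "No", "Yes"],
    "coach_price": [500.0, 300.0, 420.0, 380.0],
})
lifts = feature_price_lift(toy, ["inflight_meal", "inflight_entertainment", "inflight_wifi"])
print(lifts)
```

A larger lift only shows association: a feature that tends to appear on longer flights could look valuable simply because longer flights cost more.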



6. How does the number of passengers change in relation to the length of flights?

In [None]:
## Task 6
# Quick alternative without trend lines:
# plt.scatter(flight.miles, flight.passengers, alpha=0.5)
# Split at 180 passengers; flights with exactly 180 go into the large group
small_flights = flight[flight.passengers < 180]
large_flights = flight[flight.passengers >= 180]
for subset, label in [(small_flights, "small"), (large_flights, "large")]:
    sns.lmplot(x="miles", y="passengers", data=subset, line_kws={"color": "black"}, lowess=True)
    plt.title(f"Miles and passengers on {label} flights")
    plt.xlabel("Miles")
    plt.ylabel("Passengers")
    plt.show()
    plt.clf()
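The LOWESS curves show the trend visually; binning `miles` into distance bands and averaging `passengers` within each band gives the same kind of trend as a small table. A sketch using `pd.cut`; the band edges, the helper name `passengers_by_distance`, and the toy rows are assumptions chosen for illustration.

```py
import pandas as pd

def passengers_by_distance(df, bins):
    """Average passenger count within each mileage band."""
    bands = pd.cut(df["miles"], bins=bins)  # right-closed intervals such as (0, 1000]
    return df.groupby(bands, observed=False)["passengers"].mean()

# Made-up rows for illustration
toy = pd.DataFrame({
    "miles": [100, 400, 900, 1500, 2500, 3000],
    "passengers": [150, 160, 170, 175, 190, 200],
})
by_band = passengers_by_distance(toy, bins=[0, 1000, 2000, 4000])
print(by_band)
```

On the real data, band edges would be chosen from the observed range of `flight.miles`; wider bands smooth more, narrower bands show more detail but average fewer flights each.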

## Multivariate Analysis

7. Visualize the relationship between coach and first-class prices on weekends compared to weekdays.

In [None]:
## Task 7
sns.scatterplot(x="coach_price", y="firstclass_price", hue="weekend", data=flight, alpha=0.5)
plt.xlabel("Coach price")
plt.ylabel("First-class price")
plt.title("Prices on weekends vs. weekdays")
plt.show()
plt.clf()
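The scatter plot shows whether weekend and weekday points separate. A table of average prices per group, together with the first-class-to-coach price ratio, makes the comparison concrete. A sketch assuming `weekend` is a categorical flag (coded "Yes"/"No" in the toy rows, which is an assumption about the real file); `weekend_price_table` is a hypothetical helper.

```py
import pandas as pd

def weekend_price_table(df):
    """Average coach and first-class prices per weekend flag, plus their ratio."""
    table = df.groupby("weekend")[["coach_price", "firstclass_price"]].mean()
    table["firstclass_to_coach"] = table["firstclass_price"] / table["coach_price"]
    return table

# Made-up rows for illustration
toy = pd.DataFrame({
    "weekend": ["No", "No", "Yes", "Yes"],
    "coach_price": [200.0, 300.0, 300.0, 400.0],
    "firstclass_price": [800.0, 1000.0, 1300.0, 1500.0],
})
table = weekend_price_table(toy)
print(table)
```

Comparing the ratio column shows whether first-class seats carry a bigger premium over coach on one kind of day, which the scatter plot alone makes hard to read.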




8. How do coach prices differ for redeyes and non-redeyes on each day of the week?

In [None]:
## Task 8
sns.histplot(data=flight.loc[flight.redeye == 'Yes'], x='coach_price', bins=20, hue='day_of_week', multiple='layer', alpha=0.5)
plt.title("Coach price distribution per day for redeye flights")
plt.show()
plt.clf()

sns.histplot(data=flight.loc[flight.redeye == 'No'], x='coach_price', bins=20, hue='day_of_week', multiple='layer', alpha=0.5)
plt.title("Coach price distribution per day for non-redeye flights")
plt.show()
plt.clf()
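Overlapping histograms for seven days get hard to read. A pivot table with one row per day and one column per `redeye` value gives the average coach price in each combination, plus the gap between non-redeye and redeye prices. A sketch assuming `redeye` is coded "Yes"/"No" (as the filters above use) and that `day_of_week` holds full English day names (an assumption); `redeye_price_by_day` and the toy rows are hypothetical.

```py
import pandas as pd

# Assumed coding of day_of_week: full English day names
DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def redeye_price_by_day(df):
    """Mean coach price per day and redeye value, plus the non-redeye premium."""
    table = pd.pivot_table(df, values="coach_price", index="day_of_week",
                           columns="redeye", aggfunc="mean")
    table["non_redeye_minus_redeye"] = table["No"] - table["Yes"]
    # pivot_table sorts days alphabetically; reorder by calendar, keeping only days present
    return table.reindex([day for day in DAY_ORDER if day in table.index])

# Made-up rows for illustration
toy = pd.DataFrame({
    "day_of_week": ["Monday", "Monday", "Sunday", "Sunday", "Friday", "Friday"],
    "redeye": ["Yes", "No", "Yes", "No", "Yes", "No"],
    "coach_price": [300.0, 400.0, 350.0, 480.0, 320.0, 450.0],
})
table = redeye_price_by_day(toy)
print(table)
```

Reading down the last column shows on which days choosing a redeye is associated with the largest saving.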


