Project: Flight Delays and Cancellations

### 1. Introduction to the Dataset

The U.S. Department of Transportation's (DOT) [Bureau of Transportation Statistics](https://www.bts.gov/) tracks the performance of domestic flights operated by large airline carriers. 

In this project we will work with a [dataset](https://www.kaggle.com/usdot/flight-delays) compiled by Kaggle. It provides summary information on the number of on-time, delayed, canceled, and diverted flights, as published in DOT's monthly [Air Travel Consumer Report](https://www.transportation.gov/airconsumer/air-travel-consumer-reports) for the year 2015. 

#### 1.1 Acknowledgements
[Kaggle](https://www.kaggle.com) and DOT's Bureau of Transportation Statistics. 

### 2. Dataset and [Data Dictionary](https://www.kaggle.com/usdot/flight-delays/data)

The raw dataset is included as three files in the data folder: airlines.csv, airports.csv, and flights_20_perc.csv. Use the [data dictionary](https://www.kaggle.com/usdot/flight-delays/data) to understand the column metadata for each of these files. 

Also, note that flights_20_perc.csv contains only a 20% random sample of the original data available on Kaggle. It still has information on about 1 million flights. This reduction was done to keep the dataset size manageable for new programmers. 


### 3. Project Details
Flight delays and cancellations are typical problems all of us face when traveling. Airports and airlines try to minimize the impact on customers and improve their experience. On the other hand, airlines also optimize how each aircraft flies through their network of cities. For example, an aircraft that flies from Seattle to Chicago may then be scheduled to fly from Chicago to New York City. Because aircraft are used this efficiently, a popular belief among travelers is that **flight delays or cancellations happen more often as the day progresses**. Though it seems intuitive, we as data scientists should always look at the data to check our intuition. 

#### 3.1 Required Analysis
Use the data to prove or disprove the following claims. Your analysis should be accompanied by visualizations where appropriate. Not every answer needs a visualization; it is up to you to decide where a visualization is needed or appropriate.

1. Do flight delays happen more often later in the day compared to earlier in the day? 
2. Does the response to Claim 1 depend on the month of the year? 
3. Does the response to Claim 1 depend on the airline? Which airlines have this phenomenon more pronounced and less pronounced? 
4. Do flight cancellations happen more often later in the day than earlier in the day? 
5. Does the response to Claim 4 depend on the month of the year and on the airline? 
6. State-based analysis: What are the three states with the lowest average flight delay? What are the three states with the highest average flight delay? Come up with a qualitative (and/or quantitative) reason for why you think these states have the lowest and highest flight delays. 
7. Taxi time is the most frustrating part of a flight for me. Which airports should I avoid, and which ones should I prefer? Answer this based on average taxi time (both taxi in and taxi out). 
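
The state-based question needs each flight's state, which is stored in airports.csv rather than in the flights file. A minimal sketch of that join is shown here on tiny made-up frames. The column names (`ORIGIN_AIRPORT`, `IATA_CODE`, `STATE`, `DEPARTURE_DELAY`) are assumptions based on the Kaggle data dictionary, so check them against your own files before relying on them.

```python
import pandas as pd

# Tiny made-up stand-ins for flights_20_perc.csv and airports.csv.
# Column names are assumed from the Kaggle data dictionary.
flights = pd.DataFrame({
    "ORIGIN_AIRPORT": ["SEA", "SEA", "ORD", "JFK"],
    "DEPARTURE_DELAY": [5.0, 15.0, 30.0, -2.0],
})
airports = pd.DataFrame({
    "IATA_CODE": ["SEA", "ORD", "JFK"],
    "STATE": ["WA", "IL", "NY"],
})

# Left join keeps every flight, even if its airport code has no match.
flights_with_state = flights.merge(
    airports[["IATA_CODE", "STATE"]],
    how="left",
    left_on="ORIGIN_AIRPORT",
    right_on="IATA_CODE",
)

# Average departure delay per origin state, lowest first.
state_delay = (
    flights_with_state.groupby("STATE")["DEPARTURE_DELAY"]
    .mean()
    .sort_values()
)
print(state_delay)
```

After a left join, rows whose airport code has no match get a missing `STATE`, so it is worth counting those before interpreting the averages. The same join-then-group pattern could be applied to the taxi-time columns for the airport question.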

#### Hint:

Some of the questions above require an understanding of pandas GroupBy. For example, you can group by hour of the day, month, or airline, and then aggregate the information that supports your arguments.  
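
As a rough illustration of the hint, the sketch below groups a made-up sample by scheduled hour and summarizes delays. It assumes that `SCHEDULED_DEPARTURE` is stored as an HHMM integer and `DEPARTURE_DELAY` is in minutes, as the Kaggle data dictionary suggests. The 15-minute cutoff for calling a flight "delayed" is only an illustrative choice; you should pick and justify your own definition.

```python
import pandas as pd

# Made-up sample; in the real file, SCHEDULED_DEPARTURE is assumed to be
# an HHMM integer (e.g. 1430 for 2:30 PM) and DEPARTURE_DELAY is in minutes.
flights = pd.DataFrame({
    "AIRLINE": ["AA", "AA", "DL", "DL", "AA", "DL"],
    "SCHEDULED_DEPARTURE": [600, 630, 1745, 1710, 1730, 615],
    "DEPARTURE_DELAY": [0.0, 20.0, 45.0, 5.0, 30.0, -3.0],
})

# Derive the scheduled hour of day (0-23) from the HHMM value.
flights["HOUR"] = flights["SCHEDULED_DEPARTURE"] // 100

# Illustrative threshold: treat a departure 15+ minutes late as delayed.
flights["IS_DELAYED"] = flights["DEPARTURE_DELAY"] >= 15

# Flight count, share of delayed flights, and mean delay for each hour.
by_hour = flights.groupby("HOUR").agg(
    n_flights=("DEPARTURE_DELAY", "size"),
    delayed_share=("IS_DELAYED", "mean"),
    mean_delay=("DEPARTURE_DELAY", "mean"),
)
print(by_hour)

# Adding a second key splits the same summary by airline.
by_hour_airline = flights.groupby(["AIRLINE", "HOUR"])["IS_DELAYED"].mean()
print(by_hour_airline)
```

Reporting the number of flights next to each rate helps you judge whether an hour, month, or airline has enough data to support a conclusion.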

#### 3.2 Important Notes
1. I do not know whether any of the above phenomena are true in the data I provided. So do not only attempt to prove them; you could disprove them as well. There is no one right answer. Your responses are graded on the strength of your arguments and the validity of the analysis you use to support them with the data.
2. To prove or disprove each of the above claims, do the analysis with the data. Then write a short report for each question that refers to your analysis. 
3. The data (flights_20_perc.csv) is still a **large data file**. You will have to learn which operations create a copy of the DataFrame and which modify the existing DataFrame, so that you can work with this much data without requiring too much memory. 
    * Most importantly, learn how to use the keyword argument `inplace=True`, which modifies the existing `DataFrame` rather than returning a new one. Some pandas operations may still copy data internally, so check actual memory use rather than assuming. 
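
One way to keep memory use down is sketched below, under some assumptions: it reads only the needed columns, picks compact dtypes, and then drops rows with `inplace=True`. The CSV text is made up so the example runs anywhere, and the column names and dtype choices are illustrative, not taken from the project files.

```python
import io
import pandas as pd

# Made-up CSV text standing in for flights_20_perc.csv so the sketch runs
# anywhere; with the real file, pass its path to read_csv instead.
csv_text = io.StringIO(
    "MONTH,AIRLINE,TAIL_NUMBER,DEPARTURE_DELAY,CANCELLED\n"
    "1,AA,N123AA,5,0\n"
    "1,DL,N456DL,,1\n"
    "2,AA,N789AA,20,0\n"
)

# Read only the columns the analysis needs and choose compact dtypes.
flights = pd.read_csv(
    csv_text,
    usecols=["MONTH", "AIRLINE", "DEPARTURE_DELAY", "CANCELLED"],
    dtype={"MONTH": "int8", "AIRLINE": "category", "CANCELLED": "int8"},
)

# Report the footprint, including string/category storage.
print(flights.memory_usage(deep=True).sum(), "bytes")

# Drop rows without a recorded delay, modifying `flights` itself
# instead of binding the result to a second variable.
flights.dropna(subset=["DEPARTURE_DELAY"], inplace=True)
print(len(flights), "rows remain")
```

Calling `memory_usage(deep=True)` before and after a step is a simple way to confirm whether that step actually reduced the footprint.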
    
#### 3.3 Bonus Points Opportunity
Above, we investigated whether flight delays happen more often as the day progresses. Explore the data and find more interesting patterns (not necessarily based on delays or cancellations). This bonus question is meant to exercise your analytical and creative mind. Although this data (airlines.csv, flights_20_perc.csv, airports.csv) is your primary data, you may choose to combine it with supplementary datasets for your analysis.

My advice is to work on this bonus part only after finishing the required analysis above, and not to spend too much time on it.

Only submissions with the most interesting findings will get these additional bonus points. Judgement will be made on correctness, creativity, strength of the argument, and rigor of the data analysis. The bonus points will be added to your project score.
 

### 4. Submission Expectations and Requirements

1. **Project report**: You are required to write your report using Markdown cells in Jupyter Notebook on the Vocareum cloud platform. In that way, you can combine your data analysis (Python code) and your report. Use headings in Markdown to clearly distinguish between your report and Python code. A blank notebook named 'project_report.ipynb' has been created for you in Vocareum in which you can write your code and report.


#### 4.1 Submission and Due Date
You will submit one Jupyter notebook that includes both your analysis (Python code) and report. Submission is via the Submit button on Vocareum, exactly as in the assignments you did throughout the course. 

The submission due date for the project is <b>Thursday, Dec 19 at 11:59PM Eastern.</b> There will be no exceptions: late submissions will not be accepted.


