# Welcome to project Icaras
---

## Scenario

Your company was asked to analyse commercial air flight data for a sustainability study. Your group is the best team of data scientists on the company's roster and has been given the challenge. By undertaking this task, your company expects to contribute to the green transition by having a more savvy task force. You decide to create some Python tools for the challenge.

## Goal

For this project, we will be using data from the [International Air Transport Association](https://www.iata.org/). The datasets can be found [here](https://gitlab.com/adpro1/adpro2024/-/raw/main/Files/flight_data.zip?inline=false).

Go over the datasets. You were not given a data dictionary, but the fields can be easily discovered with an online search, as this is heavily used data.

<div class="alert alert-danger">
    <b> THE MOST IMPORTANT TOOLS FOR A DATA SCIENTIST ARE PATIENCE AND COMMUNICATION</b>
    <br>
    <b> Discuss the contents of the dataset with your colleagues. Understanding the data is a priority. </b>
</div>

Use whatever Python tools you find appropriate.

## Structure of the project

You are going to build a **Showcase Notebook** that doubles as a presentation for your analysis.  
Keep all the .py files in separate directories. The only files in the main directory of the project should be the **Showcase Notebook** and the configuration files (.yml, .gitignore, and others). Everything else should have its own directory.

### Day 1, Phase 1

- [ ] One of you will create a gitlab/github repository (it does not matter who). __THE NAME OF THE REPOSITORY MUST BE "Group_XX" where XX is the number of your group! If you are group 3, then XX must be 03. Always use two digits and an underscore!__
- [ ] Initialize the repo with a README.md file, a proper license, and a .gitignore for the programming language you will use. The README.md file __MUST__ list your emails so that they can be copied and pasted directly into an email.
- [ ] The one who created the repository will then give __Maintainer__ permissions to the rest of the group. Check under "Project Information" > "Members".
- [ ] Every member of the group clones the repository to their own laptop.

### Day 1, Phase 2

- [ ] The class you decide to create for the project has finally been named after a brief internal fight and is __PEP8 compliant, like the entire project__.

The class will have several methods, which you will __not__ develop in the master branch.  
Document everything!  
Make your calls compliant with __pydantic__ and __static type checking__ when applicable.

- [ ] During the _init_ method, your class must download the data file into a __downloads/__ directory in the root directory of the project (main project directory). If the data file already exists, the method will not download it again.
- [ ] The _init_ method must also read each dataset into a corresponding pandas DataFrame, which becomes an attribute of your class. Remove superfluous columns.
- [ ] Develop a function, in its own .py file, that calculates the real __distances__ between airports in kilometers using the information in the datasets. Approximate the Earth as a sphere (it is safe to disregard __altitude__). Develop a unit test for this function with three cases, one of which must be between two airports on different continents. Implement a way to make the distances between airports part of the information contained in your future class instance.
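
The spherical distance asked for in the last item is commonly computed with the haversine formula. The block below is a minimal sketch, not a required implementation: the function name `haversine_km`, the constant `EARTH_RADIUS_KM`, and the choice of 6371 km as the mean Earth radius are assumptions made for illustration, and the inputs are assumed to be latitude and longitude in decimal degrees.

```python
import math

# Assumption: mean Earth radius in km for the spherical approximation.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Geometric sanity checks (not airport data): a quarter of the equator
# should equal a quarter of the sphere's circumference.
quarter = haversine_km(0.0, 0.0, 0.0, 90.0)
assert abs(quarter - math.pi / 2 * EARTH_RADIUS_KM) < 1e-6
```

The three unit-test cases required by the assignment would use airport coordinates taken from the datasets rather than the geometric check shown here.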

### Day 1, Phase 3

- [ ] Develop a first method that takes a country as an input and plots a map with the locations of its airports (as well as a map for that country). If the country does not exist, return a useful error message. __DO NOT USE INTERACTIVE PROMPTS; IT SHOULD REALLY JUST BE AN ARGUMENT!__
- [ ] Develop a second method called __distance_analysis__. This should plot the distribution of flight distances for all flights.
- [ ] Develop a third method that receives an airport as an input and an optional argument called __internal__ with a value of __False__ by default. If __internal__ is __True__, then this method should plot only the flights leaving this airport with a destination in the same country. Otherwise, it plots all flights.
- [ ] Develop a fourth method that may receive a string with a country or a list of country strings but has __None__ by default. This method should plot the __N__ most used airplane models by number of routes. If the input argument is __None__, it should plot for the entire dataset. If it receives a country or a list of countries, it should plot just for that subset.
- [ ] Develop a fifth method that receives a country name as an input and an optional argument called __internal__ with a value of __False__ by default. If __internal__ is __True__, then this method should plot only the flights leaving the country with a destination in the same country. Otherwise, it plots all flights. This is analogous to the third method, but for a country.
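
Several of the methods above accept a country, a list of countries, or __None__, and must fail with a useful message rather than prompt the user. One way to handle this is a small helper that normalises the argument before any plotting happens. This is only a sketch: `normalise_countries` and its `known_countries` parameter are hypothetical names, and in the real class the known names would come from the airports data.

```python
from typing import Iterable, List, Optional, Union


def normalise_countries(
    countries: Union[str, Iterable[str], None],
    known_countries: Iterable[str],
) -> Optional[List[str]]:
    """Return None (meaning 'use all data') or a validated list of names."""
    if countries is None:
        return None
    # A single string becomes a one-element list.
    requested = [countries] if isinstance(countries, str) else list(countries)
    known = set(known_countries)
    unknown = [name for name in requested if name not in known]
    if unknown:
        raise ValueError(
            f"Unknown country name(s): {unknown}. "
            "Check the spelling against the country names in the airports data."
        )
    return requested
```

With this in place, each plotting method can take plain arguments and treat a `None` result as "plot the whole dataset".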

### Day 1, Phase 4

- [ ] Make a "showcase notebook" where you import your __Class__ and showcase all the methods you developed. Tell a story about your analysis and findings in the showcase notebook. Use all methods with several complementary options. If you feel lost about what story to tell, don't hesitate to contact the professor.

<div class="alert alert-info">
    <b> REMEMBER: The first delivery is due by March 4 at 23:59:59 and it is not graded. It is used as course correction. The delivery is the git repo link. </b>
</div>

<div class="alert alert-danger">
    <b> The notebook must RUN from start to finish. If one runs all cells again, the output must be the same.</b>
</div>

<div class="alert alert-info">
    <b> REMEMBER: IT IS OK TO PROTOTYPE CODE IN NOTEBOOKS, BUT THE FINAL CLASS MUST BE IN A SINGLE .py FILE! </b>
    <br>
    <b> The final delivery of the project is the "showcase" notebook from Phase 4. Don't place this notebook together with prototyping notebooks.</b>
    <br>
    <b> Prototyping notebooks must have their own separate directory.</b>
    <br>
    <b> We will only consider contents in your "master" repository.</b>
</div>

<div class="alert alert-warning">
    <b>When in doubt, ask.</b>
</div>




# Welcome to project Icaras - Part 2
---
## Rules
1. Be sure that the group submits [the link to the repo on moodle](https://moodle.novasbe.pt/mod/assign/view.php?id=318193).
2. We will pull the existing versions by 0:00, Saturday 16 March 2024. Remember: the pushes have a timestamp!

---
<div class="alert alert-danger">
    <b> NEVER USE USER PROMPTS, IT IS INFINITELY ANNOYING!! </b>
    <br>
    <b> Always use arguments for your methods.</b>
</div>


---
## Scenario (continuation)

Your company was asked to analyse commercial air flight data for a sustainability study. Your group is the best team of data scientists on the company's roster and has been given the challenge. By undertaking this task, your company expects to contribute to the green transition by having a more savvy task force. You decide to create some Python tools for the challenge.

You spent the first day doing a lot of the code heavy lifting. It is now time to do some polishing. Since you know your project might be picked up for an analysis presentation, you add an introduction about your group to the _README.md_ file. Be sure to add your **names**, **your student numbers** and **your e-mails**. It is time to add more features to the class so you can present the analysis in the showcase notebook.

## Goal
For this project, we will be using data from the [International Air Transport Association](https://www.iata.org/). The datasets can be found [here](https://gitlab.com/adpro1/adpro2024/-/raw/main/Files/flight_data.zip?inline=false).

Go over the datasets. You were not given a data dictionary, but the fields can be easily discovered with an online search, as this is heavily used data.

<div class="alert alert-danger">
    <b> THE MOST IMPORTANT TOOLS FOR A DATA SCIENTIST ARE PATIENCE AND COMMUNICATION</b>
    <br>
    <b> Discuss the contents of the dataset with your colleagues. Understanding the data is a priority. </b>
</div>

Use whatever Python tools you find appropriate.

Day two is beginning.

### Day 2, Phase 1: Add Info with an LLM

- [ ] Define a new method called **aircrafts** that receives no arguments and prints only the list of aircraft models (names).
- [ ] Define a new method called **aircraft_info** that receives a string called _aircraft_name_. If the string is **NOT** in the list of aircraft in the data, it should raise an exception and guide the user toward choosing a correct aircraft name.
- [ ] The latter method should use an LLM to print out a table of specifications about the aircraft model in Markdown.
- [ ] Define a new method called **airport_info** that does the same but for airports (don't make checks in this method, you are already demonstrating you understood it in the case for aircraft).
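
The name check for **aircraft_info** can be done before any LLM call is made. The sketch below uses the standard library's `difflib.get_close_matches` to suggest near matches. The helper name `check_aircraft_name`, the `known_names` parameter, and the sample model names are illustrative assumptions, and how the valid names are extracted from the dataset is left open.

```python
import difflib
from typing import Iterable


def check_aircraft_name(aircraft_name: str, known_names: Iterable[str]) -> str:
    """Return the name if valid, otherwise raise with suggestions."""
    names = list(known_names)
    if aircraft_name in names:
        return aircraft_name
    # Up to three close matches, using difflib's default similarity cutoff.
    suggestions = difflib.get_close_matches(aircraft_name, names, n=3, cutoff=0.6)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(
        f"'{aircraft_name}' is not an aircraft model in the data.{hint} "
        "Call the aircrafts method to list all valid names."
    )


# Toy names for illustration only, not taken from the dataset.
toy_names = ["Airbus A320", "Boeing 737", "Embraer 190"]
assert check_aircraft_name("Boeing 737", toy_names) == "Boeing 737"
```

Raising early keeps invalid names from ever reaching the LLM, and the message tells the user where to find valid options.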

<div class="alert alert-danger">
    <b> Do not include the API KEY in the project. Declare the API KEY as a system variable.</b>
    <br>
    <b> If the API KEY is not working, let me know ASAP. </b>
</div>
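
Keeping the key out of the repository usually means reading it from an environment variable at run time. A minimal sketch follows. The variable name `LLM_API_KEY` and the function `get_api_key` are hypothetical, so use whatever name your LLM provider's client expects.

```python
import os


def get_api_key(env_var: str = "LLM_API_KEY") -> str:
    """Read the API key from the environment; never hard-code it."""
    # Assumption: the key is stored under the hypothetical name LLM_API_KEY.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {env_var} is not set. "
            "Define it in your shell or system settings before running the notebook."
        )
    return key
```

Because the key lives only in the environment, nothing secret is committed to git.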

### Day 2, Phase 2: Decarbonisation

For this project, and for the sake of simplicity, flights under 1000 km can be considered short-haul flights, although [there are several definitions](https://en.wikipedia.org/wiki/Flight_length).  
Let's do a mini-case study: Choose a country with more than 20 internal routes. This already accounts for "A to B" and "B to A".

- [ ] Refine the fifth method from Day 1: it should now also receive a float, which will be the cutoff distance for the short-haul flight definition. The plot should now reflect the difference between long-haul and short-haul flights (use color, and be considerate of color-blind people), using the cutoff distance selected in the argument.
- [ ] How many flight routes could be considered short-haul for your country of choice? What is the total distance between airports considered short-haul flights? Print this info as a plot annotation. (Please note we want total distances, so don't double count routes "A to B" and "B to A".)
- [ ] Research question: a plan to cut emissions is to replace short-haul flights with rail services. Find a reference for the ratio between emissions from flights and your alternative (just a ballpark number from a credible source, include a link to your source in the showcase notebook). Taking into account all flights from your country, both internal and external, by how much would you lower flight emissions? Refine the method to also add this as an annotation onto the plot. 
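
The counting rules in this phase can be sketched with plain Python. In the block below, routes are assumed to be `(source, destination, distance_km)` tuples, and "A to B" and "B to A" are collapsed by keying on an unordered pair. The second function rests on a strong simplifying assumption that is not stated in the assignment, namely that emissions scale linearly with distance flown. Both function names and the toy routes are illustrative only.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# (source airport, destination airport, distance in km)
Route = Tuple[str, str, float]


def short_haul_summary(routes: Iterable[Route], cutoff_km: float) -> Tuple[int, float]:
    """Count unique short-haul airport pairs and sum each distance once."""
    unique: Dict[FrozenSet[str], float] = {}
    for source, destination, distance_km in routes:
        # frozenset ignores direction, so A->B and B->A share one key.
        unique[frozenset((source, destination))] = distance_km
    short = [d for d in unique.values() if d < cutoff_km]
    return len(short), sum(short)


def emission_reduction_share(
    short_km: float, all_km: float, rail_to_flight_ratio: float
) -> float:
    """Net share of emissions avoided if short-haul distance moves to rail.

    Assumption: emissions are proportional to distance, and rail emits
    rail_to_flight_ratio times what a flight emits per km.
    """
    if all_km <= 0:
        raise ValueError("all_km must be positive")
    return (short_km / all_km) * (1.0 - rail_to_flight_ratio)


# Toy routes for illustration only, not taken from the dataset.
toy_routes = [("A", "B", 300.0), ("B", "A", 300.0), ("A", "C", 1500.0), ("B", "C", 800.0)]
count, total_km = short_haul_summary(toy_routes, cutoff_km=1000.0)
assert (count, total_km) == (2, 1100.0)
```

The ratio passed to `emission_reduction_share` should come from the credible source you cite in the showcase notebook, and the returned share is only as good as the linear-emissions assumption behind it.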

### Day 2, Phase 3: Cleaning up

- [ ] Add a yaml file to git with all the packages you used, a conda environment file. This file will be used to generate an environment where your code will be run. Remember to make it OS independent.
- [ ] Use sphinx to generate a __docs__ directory that will showcase the documentation of your code. Remember to comment .gitignore appropriately so everything is included. Update README.md to tell the user how to start using the project.

---
## Grading

Across Parts 1 and 2, there are 20 gradable items. Each item is worth 1 point out of 20, except for Day 1, Phase 1, whose 4 items together are worth 1 point (correctly setting up the git remote).

<div class="alert alert-danger">
    <b> REMEMBER: IT IS OK TO PROTOTYPE CODE IN NOTEBOOKS, BUT THE FINAL CLASS MUST BE IN A SINGLE .py FILE! </b>
    <br>
    <b> The final delivery of the project is the "showcase" notebook. Don't place this notebook together with prototyping notebooks.</b>
    <br>
    <b> Prototyping notebooks must have their own separate directory.</b>
    <br>
    <b> We will only consider contents in your "master" repository before the end of the deadline.</b>
</div>

# NOTES:
* Implement a way to make the distances between airports part of the information contained in your future class instance.
* Make your calls compliant with __pydantic__ and __static type checking__ when applicable.
* Error handling - plot airports
*  Add a yaml file to git with all the packages you used, a conda environment file. This file will be used to generate an environment where your code will be run. Remember to make it OS independent.
*  Use sphinx to generate a **docs** directory that will showcase the documentation of your code. Remember to comment .gitignore appropriately so everything is included. Update README.md to tell the user how to start using the project.
*  BLACK AND PYLINT


# Methods:
1) Develop a first method that takes a country as an input and plots a map with the locations of its airports (as well as a map for that country). If the country does not exist, return a useful error message.
2) Develop a second method called __distance_analysis__. This should plot the distribution of flight distances for all flights.
3) Develop a third method that receives an airport as an input and an optional argument called __internal__ with a value of __False__ by default. If __internal__ is __True__, then this method should plot only the flights leaving this airport with a destination in the same country. Otherwise, it plots all flights.
4) Develop a fourth method that may receive a string with a country or a list of country strings but has __None__ by default. This method should plot the __N__ most used airplane models by number of routes. If the input argument is __None__, it should plot for the entire dataset. If it receives a country or a list of countries, it should plot just for that subset.
5) Develop a fifth method that receives a country name as an input and an optional argument called __internal__ with a value of __False__ by default. If __internal__ is __True__, then this method should plot only the flights leaving the country with a destination in the same country. Otherwise, it plots all flights. This is analogous to the third method, but for a country.

6) Define a new method called **aircrafts** that receives no arguments and prints only the list of aircraft models (names).
7) Define a new method called **aircraft_info** that receives a string called _aircraft_name_. If the string is **NOT** in the list of aircraft in the data, it should raise an exception and guide the user toward choosing a correct aircraft name. The method should use an LLM to print out a table of specifications about the aircraft model in Markdown.
8) Define a new method called **airport_info** that does the same but for airports (don't make checks in this method, you are already demonstrating you understood it in the case for aircraft).
   