# Task 1: Instructions

Import the library you need to work with DataFrames, and load the two datasets (all datasets are located in the datasets folder). Then, take a look at the head of both DataFrames.

- Import `pandas` using the standard alias.
- Load `office_addresses.csv` and assign the resulting DataFrame to `df_office_addresses`.
- Load `employee_information.xls` and assign the resulting DataFrame to `df_employee_addresses`.
- Take a look at the first rows of each DataFrame to familiarize yourself with the data.
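Here is a minimal sketch of the loading step. It assumes hypothetical office data because the project files are not available outside the project, so the CSV is read from an in-memory string with `io.StringIO`. The real file calls appear as comments. The column names and values are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for datasets/office_addresses.csv; the real columns may differ
office_csv = io.StringIO(
    "office,office_country,office_city\n"
    "Office A,BE,Leuven\n"
    "Office B,GB,London\n"
)
df_office_addresses = pd.read_csv(office_csv)

# head() returns the first 5 rows by default
print(df_office_addresses.head())

# With the project files, the calls would look like this
# (legacy .xls workbooks are typically read through the optional xlrd package):
# df_office_addresses = pd.read_csv("datasets/office_addresses.csv")
# df_employee_addresses = pd.read_excel("datasets/employee_information.xls")
# df_employee_addresses.head()
```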

## Good to know

This project lets you apply the skills from [Streamlined Data Ingestion with pandas](https://www.datacamp.com/courses/streamlined-data-ingestion-with-pandas). We recommend familiarizing yourself with the content of that course before starting this project.

Helpful links:

- `read_csv()` function [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html).
- `read_excel()` function [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html).

# Task 2: Instructions

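It seems someone removed the column headers from the second sheet. Make sure to load the sheet using its index rather than its name, in case someone decides to rename it! Because the header row is missing, also tell `pandas` not to treat the first row as column names. Then, add the appropriate column titles to the DataFrame.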

- Load the data from the second sheet of `employee_information.xls` (sheet index `1`, since sheet indexes start at 0) and assign the resulting DataFrame to `df_emergency_contacts`.
- Assign the list of column names to `emergency_contacts_header`.
- Rename the `df_emergency_contacts` DataFrame's columns using the list of column names you just declared.
- Take a look at the first rows of the DataFrame to familiarize yourself with the data.
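The sketch below shows why `header=None` matters for a headerless sheet. It uses `read_csv()` on an in-memory string as a stand-in, since `read_excel()` accepts the same `header` parameter. The sample rows and the names in `emergency_contacts_header` are hypothetical, not the project's actual columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for the headerless second sheet: every line is a data row
sheet_text = io.StringIO(
    "A2R5H9,Doe,Jane,John Doe,555-0100,Spouse\n"
    "H8K0L6,Roe,Rick,Ann Roe,555-0101,Sibling\n"
)

# header=None keeps the first line as data; the default (header=0) would use it as column names
df_emergency_contacts = pd.read_csv(sheet_text, header=None)

# With the real workbook, select the second sheet by its zero-based position:
# df_emergency_contacts = pd.read_excel(
#     "datasets/employee_information.xls", sheet_name=1, header=None
# )

# Hypothetical column names; use the ones the project asks for
emergency_contacts_header = [
    "employee_id", "last_name", "first_name",
    "emergency_name", "emergency_number", "relationship",
]
df_emergency_contacts.columns = emergency_contacts_header
print(df_emergency_contacts.head())
```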

# Task 3: Instructions

`employee_roles.json` is structured like a Python dictionary: the keys are employee IDs, and each ID maps to a nested dictionary holding role, salary, and team information.

- Load the JSON file into a DataFrame named `df_employee_roles`, choosing the appropriate orientation.
- Take a look at the first rows of the DataFrame to familiarize yourself with the data.

When reading a JSON file, you need to tell `pandas` how the file is [oriented](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html). If you don't choose the appropriate orientation, rows and columns can end up swapped: the employee IDs may become column labels and the role fields may become the index. Check out [this exercise](https://campus.datacamp.com/courses/streamlined-data-ingestion-with-pandas/importing-json-data-and-working-with-apis?ex=3) if you don't remember how to read JSON files.

Don't rely on the column order that `read_json()` produces. Notice the provided line of code: it sorts the columns of the DataFrame you just created alphabetically, so you don't run into issues later.
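To see the effect of orientation, here is a sketch on a hypothetical two-employee excerpt shaped as described above. The IDs, titles, and salaries are invented. With `orient="index"`, the outer keys become the row index. The default orientation for DataFrames, `"columns"`, turns them into column labels instead. The alphabetical sort shown is one way to do it and may not match the provided line exactly.

```python
import io

import pandas as pd

# Hypothetical excerpt shaped like employee_roles.json: employee ID -> nested record
json_text = """{
  "A2R5H9": {"title": "CEO", "monthly_salary": "$4500", "team": "Leadership"},
  "H8K0L6": {"title": "CFO", "monthly_salary": "$4500", "team": "Leadership"}
}"""

# orient="index": the outer keys become the row index, the inner keys become columns
df_employee_roles = pd.read_json(io.StringIO(json_text), orient="index")

# For contrast, the default orientation ("columns") turns the IDs into column labels
df_swapped = pd.read_json(io.StringIO(json_text))

# One way to sort the columns alphabetically (the project's provided line may differ)
df_employee_roles = df_employee_roles[sorted(df_employee_roles.columns)]
print(df_employee_roles.head())
```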

# Task 4: Instructions

Let's merge all DataFrames into one, as initially requested by People Ops! You will be using a left join, which keeps all the records in the left table. The join type is already set for you, so you don't lose any data during the merges.

- Merge `df_emergency_contacts` with `df_employee_addresses` using the employee ID. Assign the resulting DataFrame to `df_employees`.
- Merge `df_employee_roles` with `df_employees` using the employee ID.
- Merge `df_office_addresses` with `df_employees` using the country.
- Take a look at the first rows and at the columns (you should not have any duplicate columns).

Remember that:

- `df_office_addresses` holds the office addresses.
- `df_employee_addresses` holds the employee addresses.
- `df_emergency_contacts` holds the emergency contact information.
- `df_employee_roles` holds more information about employees' roles, salaries, and teams.
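Here is a sketch of the three left joins on hypothetical miniature DataFrames. All column names are assumptions, and so is the choice of which table sits on the left. Keeping the employee table on the left means an employee whose country has no office still appears, with missing office columns. Task 7 relies on exactly that to flag remote workers. The roles DataFrame is assumed to carry the employee IDs in its index, as loading with `orient="index"` would produce, so that merge uses `right_index=True`.

```python
import pandas as pd

# Hypothetical miniature versions of the four DataFrames; the column names are assumptions
df_employee_addresses = pd.DataFrame({
    "employee_id": ["A2R5H9", "H8K0L6"],
    "employee_last_name": ["Doe", "Roe"],
    "employee_first_name": ["Jane", "Rick"],
    "employee_country": ["BE", "GB"],
})
df_emergency_contacts = pd.DataFrame({
    "employee_id": ["A2R5H9", "H8K0L6"],
    "last_name": ["Doe", "Roe"],
    "first_name": ["Jane", "Rick"],
    "emergency_name": ["John Doe", "Ann Roe"],
})
# Assumed to be loaded with orient="index": employee IDs live in the index
df_employee_roles = pd.DataFrame(
    {"title": ["CEO", "CFO"], "team": ["Leadership", "Leadership"]},
    index=["A2R5H9", "H8K0L6"],
)
df_office_addresses = pd.DataFrame({
    "office": ["Office A"],
    "office_country": ["BE"],
    "office_city": ["Leuven"],
})

# Each merge keeps the employee table on the left so no employee is dropped
df_employees = df_employee_addresses.merge(
    df_emergency_contacts, how="left", on="employee_id"
)
df_employees = df_employees.merge(
    df_employee_roles, how="left", left_on="employee_id", right_index=True
)
df_employees = df_employees.merge(
    df_office_addresses, how="left", left_on="employee_country", right_on="office_country"
)

print(df_employees.head())
print(df_employees.columns.tolist())
```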

# Task 5: Instructions

Let's polish this new `df_employees` DataFrame!

- The columns `employee_first_name` and `employee_last_name` are duplicates of `first_name` and `last_name`. Drop `employee_first_name` and `employee_last_name` and assign the resulting DataFrame to `df_employees_renamed`.
- Assign the list of new column names to `new_header`.
- Rename the columns of `df_employees_renamed` using the `new_header` list.
- Take a look at the first rows of the DataFrame.

You can rename a DataFrame's columns by assigning a list of strings to the DataFrame `columns` attribute.
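Here is a minimal sketch of the drop-then-rename step on a hypothetical one-row slice. The column names in `new_header` are illustrative. Assigning to `columns` requires exactly one name per remaining column; otherwise pandas raises a `ValueError`.

```python
import pandas as pd

# Hypothetical slice of df_employees with the duplicated name columns
df_employees = pd.DataFrame({
    "employee_id": ["A2R5H9"],
    "employee_last_name": ["Doe"],
    "employee_first_name": ["Jane"],
    "last_name": ["Doe"],
    "first_name": ["Jane"],
    "title": ["CEO"],
})

# drop() returns a new DataFrame; df_employees itself is unchanged
df_employees_renamed = df_employees.drop(
    columns=["employee_first_name", "employee_last_name"]
)

# The list must have exactly one name per remaining column, in the same order
new_header = ["id", "last_name", "first_name", "title"]
df_employees_renamed.columns = new_header
print(df_employees_renamed.head())
```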

# Task 6: Instructions

People Ops requested columns to be presented in the following order: `id`, `last_name`, `first_name`, `title`, `team`, `monthly_salary`, `country`, `city`, `street`, `street_number`, `emergency_contact`, `emergency_number`, `emergency_relationship`, `office`, `office_country`, `office_city`, `office_street`, and finally `office_street_number`.

- Declare a list storing the column names in the order specified by People Ops.
- Reorder the DataFrame's columns.
- Take a look at the result.

You can reorder a DataFrame's columns by selecting them with a list of column names in the order you want.
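The reordering can be sketched with the column list from the task description. The one-row DataFrame here is hypothetical and is built with its columns reversed so the effect is visible. The name `ordered_columns` is also an assumption.

```python
import pandas as pd

# Column order requested by People Ops (from the task description)
ordered_columns = [
    "id", "last_name", "first_name", "title", "team", "monthly_salary",
    "country", "city", "street", "street_number",
    "emergency_contact", "emergency_number", "emergency_relationship",
    "office", "office_country", "office_city", "office_street", "office_street_number",
]

# Hypothetical one-row DataFrame whose columns arrive in the reverse order
df_employees_renamed = pd.DataFrame(
    [list(range(len(ordered_columns)))], columns=ordered_columns[::-1]
)

# Indexing with a list of labels returns a new DataFrame with columns in that order
employees_ordered = df_employees_renamed[ordered_columns]
print(employees_ordered.columns.tolist())
```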

# Task 7: Instructions

Let's apply these last-minute changes to the DataFrame.

- Set the index of `employees_ordered` to be the employee ID, and then drop the corresponding column.
- Loop through the rows of your new DataFrame, appending the value "Remote" to `status_list` if the `"office"` column value is null and "On-site" otherwise.
- Insert the `status_list` values as a column named "status" right after the `"monthly_salary"` column.
- Take a look at your results.


- You can [loop through a DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html).
- When inserting a column, you need to specify its position, its name, and its values (here, the `status_list` you built).
- You can [check if a column value is null or not](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.isnull.html).
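Here is a sketch of the three steps on a hypothetical two-row DataFrame, where `None` in `office` stands for an employee without an office. The result is named `df_employees_final` to match Task 8; that naming is an assumption.

```python
import pandas as pd

# Hypothetical slice of employees_ordered; None marks an employee without an office
employees_ordered = pd.DataFrame({
    "id": ["A2R5H9", "H8K0L6"],
    "title": ["CEO", "CFO"],
    "monthly_salary": ["$4500", "$4500"],
    "office": ["Office A", None],
})

# set_index() drops the "id" column by default (drop=True), so no separate drop is needed
df_employees_final = employees_ordered.set_index("id")

status_list = []
for _, row in df_employees_final.iterrows():
    # pd.isnull() is True for None and NaN alike
    status_list.append("Remote" if pd.isnull(row["office"]) else "On-site")

# insert(loc, column, value) modifies the DataFrame in place
position = df_employees_final.columns.get_loc("monthly_salary") + 1
df_employees_final.insert(position, "status", status_list)
print(df_employees_final)
```

The loop mirrors the task's instructions. Outside this exercise, a vectorized expression built on `df_employees_final["office"].isnull()` would avoid iterating row by row.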

# Task 8: Instructions

Let's save your work!

- Write `df_employees_final` to a CSV file named "employee_data.csv" directly in the folder where your notebook is stored.

There's a [function for everything](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html) in pandas.
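Here is a minimal sketch of the export, using a hypothetical two-row `df_employees_final` indexed by employee ID. `to_csv()` writes the index by default, so the IDs become the first column of the file. Reading the file back with `index_col="id"` checks the round trip.

```python
import pandas as pd

# Hypothetical final DataFrame, indexed by employee ID as in Task 7
df_employees_final = pd.DataFrame(
    {"title": ["CEO", "CFO"], "status": ["On-site", "Remote"]},
    index=pd.Index(["A2R5H9", "H8K0L6"], name="id"),
)

# A relative path resolves against the current working directory,
# which for a Jupyter notebook is normally the notebook's own folder.
# index=True (the default) keeps the "id" index as the first CSV column.
df_employees_final.to_csv("employee_data.csv")

# Read the file back to check the round trip
df_check = pd.read_csv("employee_data.csv", index_col="id")
print(df_check)
```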