![Ironhack logo](https://i.imgur.com/1QgrNNw.png)

# Lab | Importing and Exporting Files


## Introduction

Without data, we couldn't really be data scientists. This lab therefore covers importing data into pandas and exporting it again, using several file formats.

## Getting Started

Follow the instructions and add your code and explanations as necessary. By the end of this lab, you will have learned how to import and export JSON, CSV, and Excel files.

## Resources

[Pandas - the `read_csv` function](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html)

[Pandas - the `read_json` function](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html)

[Pandas - the `read_excel` function](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html)

# Challenge 1 - Working with JSON files

Import the pandas library.

In [None]:
# your code here

In the next cell, load the data from `nasa.json` in the `data` folder into a pandas dataframe. Name the dataframe `nasa`.

In [None]:
# your code here
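
As a minimal sketch of how `read_json` behaves, the example below parses a small in-memory JSON string instead of a file. The two records and the variable names `sample_json` and `sample` are made up for illustration; the real `nasa.json` has more fields and rows.

```python
import io

import pandas as pd

# Hypothetical two-record sample, not the actual contents of nasa.json.
sample_json = '[{"name": "Aachen", "mass": 21}, {"name": "Aarhus", "mass": 720}]'

# read_json accepts a path, a URL, or a file-like object such as StringIO.
sample = pd.read_json(io.StringIO(sample_json))

print(sample.shape)
print(list(sample.columns))
```

Passing a file path in place of the `StringIO` object reads the same structure from disk.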

Now that we have loaded the data, let's examine it using the `head()` function.

Expected output:

|    |   :@computed_region_cbhk_fwbd |   :@computed_region_nnqa_25f4 | fall   | geolocation                                            |   id |   mass | name     | nametype   | recclass    |   reclat |    reclong | year                    |
|---:|------------------------------:|------------------------------:|:-------|:-------------------------------------------------------|-----:|-------:|:---------|:-----------|:------------|---------:|-----------:|:------------------------|
|  0 |                           nan |                           nan | Fell   | {'type': 'Point', 'coordinates': [6.08333, 50.775]}    |    1 |     21 | Aachen   | Valid      | L5          |  50.775  |    6.08333 | 1880-01-01T00:00:00.000 |
|  1 |                           nan |                           nan | Fell   | {'type': 'Point', 'coordinates': [10.23333, 56.18333]} |    2 |    720 | Aarhus   | Valid      | H6          |  56.1833 |   10.2333  | 1951-01-01T00:00:00.000 |
|  2 |                           nan |                           nan | Fell   | {'type': 'Point', 'coordinates': [-113, 54.21667]}     |    6 | 107000 | Abee     | Valid      | EH4         |  54.2167 | -113       | 1952-01-01T00:00:00.000 |
|  3 |                           nan |                           nan | Fell   | {'type': 'Point', 'coordinates': [-99.9, 16.88333]}    |   10 |   1914 | Acapulco | Valid      | Acapulcoite |  16.8833 |  -99.9     | 1976-01-01T00:00:00.000 |
|  4 |                           nan |                           nan | Fell   | {'type': 'Point', 'coordinates': [-64.95, -33.16667]}  |  370 |    780 | Achiras  | Valid      | L6          | -33.1667 |  -64.95    | 1902-01-01T00:00:00.000 |

In [None]:
# your code here

#### The `value_counts()` function is commonly used in pandas to find the frequency of every value in a column.

In the cell below, use the `value_counts()` function to determine the frequency of each type of meteorite landing by applying the function to the `fall` column.

Expected output:

````python
            Fell     996
            Found      4
            Name: fall, dtype: int64
````

In [None]:
# your code here
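
To illustrate what `value_counts()` returns, here is a small sketch on a toy column. The dataframe `toy` and its four values are invented for this example and are not the lab data.

```python
import pandas as pd

# Hypothetical toy column, not the lab data.
toy = pd.DataFrame({"fall": ["Fell", "Found", "Fell", "Fell"]})

# value_counts returns a Series indexed by the distinct values,
# sorted from most to least frequent.
counts = toy["fall"].value_counts()
print(counts)
```

Because the result is sorted by frequency, the first index entry is the most common value.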

Finally, let's save the dataframe as a JSON file again. Save the dataframe using the `orient='records'` argument and name the file `nasa-output.json`. Remember to save the file inside the `data` folder.

In [None]:
# your code here
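
The effect of the `orient` argument is easiest to see on a tiny dataframe. The sketch below uses invented data and returns the JSON as a string rather than writing a file; the names `toy`, `records`, and `by_column` are assumptions made for this example.

```python
import json

import pandas as pd

# Hypothetical toy data, not the lab data.
toy = pd.DataFrame({"name": ["Aachen", "Aarhus"], "mass": [21, 720]})

# With no path argument, to_json returns the JSON text instead of writing a file.
records = json.loads(toy.to_json(orient="records"))  # one object per row
by_column = json.loads(toy.to_json())                # default: one object per column

print(records)
print(by_column)
```

With `orient="records"` each row becomes its own JSON object, whereas the default layout groups values by column and keys them by the index.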

# Challenge 2 - Working with CSV and Other Separated Files

CSV files are among the most common sources of data for dataframes. In this challenge, you will load a file from a URL using the `read_csv()` function in pandas. Starting with version 0.19 of pandas, you can load a CSV file into a dataframe directly from a URL without first downloading the file and then transforming it. The dataset we will be using contains information about NASA shuttles.

In the cell below, we define the column names and the URL of the data. In the cell after that, read the tst file into a variable called `shuttle`. Since the file does not contain the column names, you must add them yourself by passing the names declared in `cols` to the `names` argument. Additionally, this tst file is space separated, so make sure you pass `sep=' '` to the function.

In [None]:
cols = ['time', 'rad_flow', 'fpv_close', 'fpv_open', 'high', 'bypass', 'bpv_close', 'bpv_open', 'class']
tst_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/shuttle/shuttle.tst'

In [None]:
# your code here
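
As a rough illustration of how `names` and `sep` work together, the sketch below reads a header-less, space-separated string from memory. The sample text and the names `raw`, `toy_cols`, and `toy` are invented for this example.

```python
import io

import pandas as pd

# Hypothetical header-less, space-separated sample with three columns.
raw = "50 21 77\n55 0 92\n"
toy_cols = ["time", "rad_flow", "label"]

# Supplying names tells pandas the file has no header row of its own.
toy = pd.read_csv(io.StringIO(raw), sep=" ", names=toy_cols)
print(toy)
```

Here the number of names matches the number of columns in the data, so each name labels exactly one column.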

Let's verify that this worked by calling the `head()` function.

Expected output:

|    |   time |   rad_flow |   fpv_close |   fpv_open |   high |   bypass |   bpv_close |   bpv_open |   class |
|---:|-------:|-----------:|------------:|-----------:|-------:|---------:|------------:|-----------:|--------:|
| 55 |      0 |         81 |           0 |         -6 |     11 |       25 |          88 |         64 |       4 |
| 56 |      0 |         96 |           0 |         52 |     -4 |       40 |          44 |          4 |       4 |
| 50 |     -1 |         89 |          -7 |         50 |      0 |       39 |          40 |          2 |       1 |
| 53 |      9 |         79 |           0 |         42 |     -2 |       25 |          37 |         12 |       4 |
| 55 |      2 |         82 |           0 |         54 |     -6 |       26 |          28 |          2 |       1 |

In [None]:
# your code here

To make life easier for us, let's turn this dataframe into a comma separated file by saving it using the `to_csv()` function. Save `shuttle` into the file `shuttle.csv`. Ensure that the file is comma separated, that the index column is not saved, and that the file is saved inside the `data` folder.

In [None]:
# your code here
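
To see what dropping the index changes, the sketch below compares the header line produced with and without `index=False`. It uses invented data and returns the CSV text as a string instead of writing to disk.

```python
import pandas as pd

# Hypothetical toy data, not the lab data.
toy = pd.DataFrame({"time": [50, 55], "label": [1, 4]})

# With no path argument, to_csv returns the CSV text instead of writing a file.
with_index = toy.to_csv()
without_index = toy.to_csv(index=False)

print(with_index.splitlines()[0])     # header begins with an empty index label
print(without_index.splitlines()[0])  # header contains only the column names
```

Leaving the index in adds an unnamed leading column, which then shows up as an extra column when the file is read back.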

# Challenge 3 - Working with Excel Files

We can also use pandas to convert Excel spreadsheets to dataframes. Let's use the `read_excel()` function. In this case, `astronauts.xls` is in the `data` folder. Read this file into a variable called `astronaut`.

Note: Make sure to install the `xlrd` library if it is not yet installed.

````shell
pip install xlrd
````

In [None]:
# your code here

Use the `head()` function to inspect the dataframe.

Expected output:

|    | Name             |   Year |   Group | Status   | Birth Date          | Birth Place   | Gender   | Alma Mater                                                    | Undergraduate Major    | Graduate Major        | Military Rank   | Military Branch        |   Space Flights |   Space Flight (hr) |   Space Walks |   Space Walks (hr) | Missions                                                                    | Death Date   |   Death Mission |
|---:|:-----------------|-------:|--------:|:---------|:--------------------|:--------------|:---------|:--------------------------------------------------------------|:-----------------------|:----------------------|:----------------|:-----------------------|----------------:|--------------------:|--------------:|-------------------:|:----------------------------------------------------------------------------|:-------------|----------------:|
|  0 | Joseph M. Acaba  |   2004 |      19 | Active   | 1967-05-17 00:00:00 | Inglewood, CA | Male     | University of California-Santa Barbara; University of Arizona | Geology                | Geology               | nan             | nan                    |               2 |                3307 |             2 |                 13 | STS-119 (Discovery), ISS-31/32 (Soyuz)                                      | NaT          |             nan |
|  1 | Loren W. Acton   |    nan |     nan | Retired  | 1936-03-07 00:00:00 | Lewiston, MT  | Male     | Montana State University; University of Colorado              | Engineering Physics    | Solar Physics         | nan             | nan                    |               1 |                 190 |             0 |                  0 | STS 51-F (Challenger)                                                       | NaT          |             nan |
|  2 | James C. Adamson |   1984 |      10 | Retired  | 1946-03-03 00:00:00 | Warsaw, NY    | Male     | US Military Academy; Princeton University                     | Engineering            | Aerospace Engineering | Colonel         | US Army (Retired)      |               2 |                 334 |             0 |                  0 | STS-28 (Columbia), STS-43 (Atlantis)                                        | NaT          |             nan |
|  3 | Thomas D. Akers  |   1987 |      12 | Retired  | 1951-05-20 00:00:00 | St. Louis, MO | Male     | University of Missouri-Rolla                                  | Applied Mathematics    | Applied Mathematics   | Colonel         | US Air Force (Retired) |               4 |                 814 |             4 |                 29 | STS-41 (Discovery), STS-49 (Endeavor), STS-61 (Endeavor), STS-79 (Atlantis) | NaT          |             nan |
|  4 | Buzz Aldrin      |   1963 |       3 | Retired  | 1930-01-20 00:00:00 | Montclair, NJ | Male     | US Military Academy; MIT                                      | Mechanical Engineering | Astronautics          | Colonel         | US Air Force (Retired) |               2 |                 289 |             2 |                  8 | Gemini 12, Apollo 11                                                        | NaT          |             nan |

In [None]:
# your code here

Use the `value_counts()` function to find the most popular undergraduate major among all astronauts.

Expected output:

````
            Physics                                                                35
            Aerospace Engineering                                                  33
            Mechanical Engineering                                                 30
            Aeronautical Engineering                                               28
            Electrical Engineering                                                 23
                                                                                   ..
            Physics & Astronautical Engineering                                     1
            Philosophy                                                              1
            Psychology                                                              1
            Aeronautics & Astronautics; Earth, Atmospheric & Planetary Sciences     1
            Electrical Science                                                      1
            Name: Undergraduate Major, Length: 83, dtype: int64

````

In [None]:
# your code here

Due to all the commas present in the cells of this file, let's save it as a tab separated csv file. In the cell below, save `astronaut` as a **tab separated file** using the `to_csv` function. Call the file `astronaut.csv`. Remember to remove the index column and save the file in the `data` folder.

In [None]:
# your code here
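
The sketch below shows why a tab separator helps when cells contain commas. The single row, including the name "A. Example", is made up for illustration, and the text is kept in memory rather than written to a file.

```python
import io

import pandas as pd

# Hypothetical one-row sample with a comma inside a cell.
toy = pd.DataFrame({"Name": ["A. Example"], "Birth Place": ["Inglewood, CA"]})

# With a tab separator, the comma inside the cell needs no quoting.
tsv_text = toy.to_csv(sep="\t", index=False)
print(tsv_text)

# Reading it back requires the same separator.
round_trip = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(round_trip.loc[0, "Birth Place"])
```

The same `sep` value must be used when reading the file back, otherwise each line is parsed with the wrong delimiter.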

# Bonus Challenge - Fertility Dataset

Visit the following [URL](https://archive.ics.uci.edu/ml/datasets/Fertility) and retrieve the dataset as well as the column headers. Determine the correct separator and read the file into a variable called `fertility`. Examine the dataframe using the `head()` function. 

Expected output:

|    |   season |   age |   childish-disease |   trauma |   surgical-intervention |   fevers |   alcoholic |   smoking |   sitting | output   |
|---:|---------:|------:|-------------------:|---------:|------------------------:|---------:|------------:|----------:|----------:|:---------|
|  0 |    -0.33 |  0.69 |                  0 |        1 |                       1 |        0 |         0.8 |         0 |      0.88 | N        |
|  1 |    -0.33 |  0.94 |                  1 |        0 |                       1 |        0 |         0.8 |         1 |      0.31 | O        |
|  2 |    -0.33 |  0.5  |                  1 |        0 |                       0 |        0 |         1   |        -1 |      0.5  | N        |
|  3 |    -0.33 |  0.75 |                  0 |        1 |                       1 |        0 |         1   |        -1 |      0.38 | N        |
|  4 |    -0.33 |  0.67 |                  1 |        1 |                       0 |        0 |         0.8 |        -1 |      0.5  | O        |


In [None]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00244/fertility_Diagnosis.txt"
# Search online for a way to retrieve this data!