# Import and Export Files 

# Challenge 1 - Working with CSV and Other Separated Files

Import the pandas library

CSV files are one of the most common sources for dataframes. In the cell below, load the file from the URL provided using the `read_csv()` function in pandas. Starting with version 0.19, pandas can load a CSV file into a dataframe directly from a URL, without having to download the file first as we did with the JSON URL. The dataset we will be using contains information about NASA shuttles.

In the cell below, we define the column names and the URL of the data. Following this cell, read the tst file into a variable called `shuttle`. Since the file does not contain the column names, you must add them yourself by passing the names declared in `cols` to the `names` argument. Additionally, a tst file is space separated, so make sure you pass `sep=' '` to the function.
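
As a minimal sketch of the two arguments described above, the snippet below parses a small made-up space-separated string (not the shuttle file) with `sep=' '` and `names`. The inline data and the column names `a`, `b`, `c` are illustrative assumptions.

```python
import io
import pandas as pd

# Made-up, headerless, space-separated text standing in for a .tst file
raw = "1 2 3\n4 5 6\n"

# names supplies the missing header; sep=" " splits each line on single spaces
demo = pd.read_csv(io.StringIO(raw), sep=" ", names=["a", "b", "c"])
print(demo)
```

The same call should work with a URL or a file path in place of the `StringIO` object.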

In [1]:
#Your pandas import here:
import pandas as pd


In [2]:
# Run this code:

cols = ['time', 'rad_flow', 'fpv_close', 'fpv_open', 'high', 'bypass', 'bpv_close', 'bpv_open', 'class']
# Despite its name, tst_url holds the loaded dataframe, not the URL string
tst_url = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/shuttle/shuttle.tst', sep=" ", names=cols)
tst_url

Unnamed: 0,time,rad_flow,fpv_close,fpv_open,high,bypass,bpv_close,bpv_open,class
55,0,81,0,-6,11,25,88,64,4
56,0,96,0,52,-4,40,44,4,4
50,-1,89,-7,50,0,39,40,2,1
53,9,79,0,42,-2,25,37,12,4
55,2,82,0,54,-6,26,28,2,1
...,...,...,...,...,...,...,...,...,...
80,0,84,0,-36,-29,4,120,116,5
55,0,81,0,-20,25,26,102,76,4
55,0,77,0,12,-22,22,65,42,4
37,0,103,0,18,-16,66,85,20,1


In [3]:
# Your code here:

shuttle = tst_url
shuttle

Unnamed: 0,time,rad_flow,fpv_close,fpv_open,high,bypass,bpv_close,bpv_open,class
55,0,81,0,-6,11,25,88,64,4
56,0,96,0,52,-4,40,44,4,4
50,-1,89,-7,50,0,39,40,2,1
53,9,79,0,42,-2,25,37,12,4
55,2,82,0,54,-6,26,28,2,1
...,...,...,...,...,...,...,...,...,...
80,0,84,0,-36,-29,4,120,116,5
55,0,81,0,-20,25,26,102,76,4
55,0,77,0,12,-22,22,65,42,4
37,0,103,0,18,-16,66,85,20,1


Let's verify that this worked by inspecting the first rows with the `head()` function.

In [4]:
# Your code here:

shuttle.head()

Unnamed: 0,time,rad_flow,fpv_close,fpv_open,high,bypass,bpv_close,bpv_open,class
55,0,81,0,-6,11,25,88,64,4
56,0,96,0,52,-4,40,44,4,4
50,-1,89,-7,50,0,39,40,2,1
53,9,79,0,42,-2,25,37,12,4
55,2,82,0,54,-6,26,28,2,1
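
The first column of the output above (55, 56, 50, ...) sits in the index position rather than under one of the names in `cols`. A likely explanation, offered here as an assumption rather than something verified against the shuttle file, is that each row holds one more field than `cols` has names; in that case pandas uses the leading field as the index. The sketch below reproduces the behavior on made-up inline data. The column name `extra` is hypothetical.

```python
import io
import pandas as pd

raw = "1 2 3\n4 5 6\n"  # made-up rows with three fields each

# Two names for three fields: the leading field becomes the index
short = pd.read_csv(io.StringIO(raw), sep=" ", names=["a", "b"])

# One name per field: pandas creates a default integer index instead
full = pd.read_csv(io.StringIO(raw), sep=" ", names=["extra", "a", "b"])

print(short.index.tolist())
print(full.index.tolist())
```

Comparing `len(cols)` with the number of fields in one raw line of the file is a quick way to check whether this applies.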


To make life easier for us, let's turn this dataframe into a comma separated file by saving it using the `to_csv()` function. Save `shuttle` into the file `shuttle.csv` and ensure the file is comma separated and that we are not saving the index column.

In [33]:
# Your code here:
# index=False leaves the index column out; to_csv() returns None when given a path
shuttle.to_csv("/home/rute/Git/lab-import-export/your-code/shuttle.csv", sep=",", index=False)
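
To see what the `index` argument changes, here is a small sketch on a made-up dataframe (not the shuttle data). It relies on the fact that `to_csv()` returns the CSV text as a string when no path is given, which makes the difference easy to inspect without writing a file.

```python
import pandas as pd

demo = pd.DataFrame({"a": [1, 2], "b": [3, 4]})  # made-up data

# With no path argument, to_csv returns the CSV text instead of writing a file
with_index = demo.to_csv(sep=",")
without_index = demo.to_csv(sep=",", index=False)

print(with_index.splitlines()[0])     # header line including the index position
print(without_index.splitlines()[0])  # header line with column names only
```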

# Challenge 2 - Working with Excel Files

We can also use pandas to convert Excel spreadsheets to dataframes. Let's use the `read_excel()` function. In this case, `astronauts.xls` is in the same folder that contains this notebook. Read this file into a variable called `astronaut`.

Note: Make sure to install the `xlrd` library if it is not yet installed.

In [13]:
# Your code here:

astronaut = pd.read_excel('astronauts.xls')


Use the `head()` function to inspect the dataframe.

In [14]:
# Your code here:

astronaut.head()

Unnamed: 0,Name,Year,Group,Status,Birth Date,Birth Place,Gender,Alma Mater,Undergraduate Major,Graduate Major,Military Rank,Military Branch,Space Flights,Space Flight (hr),Space Walks,Space Walks (hr),Missions,Death Date,Death Mission
0,Joseph M. Acaba,2004.0,19.0,Active,1967-05-17,"Inglewood, CA",Male,University of California-Santa Barbara; Univer...,Geology,Geology,,,2,3307,2,13.0,"STS-119 (Discovery), ISS-31/32 (Soyuz)",NaT,
1,Loren W. Acton,,,Retired,1936-03-07,"Lewiston, MT",Male,Montana State University; University of Colorado,Engineering Physics,Solar Physics,,,1,190,0,0.0,STS 51-F (Challenger),NaT,
2,James C. Adamson,1984.0,10.0,Retired,1946-03-03,"Warsaw, NY",Male,US Military Academy; Princeton University,Engineering,Aerospace Engineering,Colonel,US Army (Retired),2,334,0,0.0,"STS-28 (Columbia), STS-43 (Atlantis)",NaT,
3,Thomas D. Akers,1987.0,12.0,Retired,1951-05-20,"St. Louis, MO",Male,University of Missouri-Rolla,Applied Mathematics,Applied Mathematics,Colonel,US Air Force (Retired),4,814,4,29.0,"STS-41 (Discovery), STS-49 (Endeavor), STS-61 ...",NaT,
4,Buzz Aldrin,1963.0,3.0,Retired,1930-01-20,"Montclair, NJ",Male,US Military Academy; MIT,Mechanical Engineering,Astronautics,Colonel,US Air Force (Retired),2,289,2,8.0,"Gemini 12, Apollo 11",NaT,


Use the `value_counts()` function to find the most popular undergraduate major among all astronauts.

In [22]:
# Your code here:

pop_undergrad = astronaut["Undergraduate Major"].value_counts()
pop_undergrad

Physics                              35
Aerospace Engineering                33
Mechanical Engineering               30
Aeronautical Engineering             28
Electrical Engineering               23
                                     ..
Business Finance                      1
Aerospace Engineering & Mechanics     1
Animal Nutrition                      1
Electrical Science                    1
Solid Earth Sciences                  1
Name: Undergraduate Major, Length: 83, dtype: int64
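
`value_counts()` sorts its result by count in descending order by default, so the most frequent value is the first entry. A small sketch on a made-up Series (not the astronaut data) showing how to pull out just that label and its count:

```python
import pandas as pd

majors = pd.Series(["Physics", "Geology", "Physics", "Chemistry"])  # made-up data

counts = majors.value_counts()
top_major = counts.idxmax()  # label with the highest count
top_count = counts.iloc[0]   # first entry, since counts are sorted descending

print(top_major, top_count)
```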

Because many cells in this file contain commas, let's save it as a tab-separated file. In the cell below, save `astronaut` as a tab-separated file using the `to_csv()` function. Call the file `astronaut.csv` and remember to leave out the index column.
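
A minimal sketch of why a tab separator suits this kind of data, using a made-up one-row dataframe (the name `A. Example` is hypothetical; the birth place format mirrors the values shown above). With `sep="\t"`, a comma inside a cell is just ordinary text, so no quoting is needed and the value survives a round trip.

```python
import io
import pandas as pd

demo = pd.DataFrame({"Name": ["A. Example"], "Birth Place": ["Inglewood, CA"]})

# Write tab-separated text without the index column
tsv_text = demo.to_csv(sep="\t", index=False)

# Read it back with the same separator
restored = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(restored)
```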

In [34]:
# Your code here:

# sep="\t" writes a tab-separated file; to_csv() returns None when given a path
astronaut.to_csv("/home/rute/Git/lab-import-export/your-code/astronaut.csv", sep="\t", index=False)

# Bonus Challenge - Fertility Dataset

Visit the following [URL](https://archive.ics.uci.edu/ml/datasets/Fertility) and retrieve the dataset as well as the column headers. Determine the correct separator and read the file into a variable called `fertility`. Examine the dataframe using the `head()` function.

In [40]:
# Your code here:

fertility = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00244/fertility_Diagnosis.txt")

colnames = [
    "Season in which the analysis was performed. 1) winter, 2) spring, 3) Summer, 4) fall. (-1, -0.33, 0.33, 1)",
    "Age at the time of analysis. 18-36 (0, 1)",
    "Childish diseases (ie , chicken pox, measles, mumps, polio) 1) yes, 2) no. (0, 1)",
    "Accident or serious trauma 1) yes, 2) no. (0, 1)",
    "Surgical intervention 1) yes, 2) no. (0, 1)",
    "High fevers in the last year 1) less than three months ago, 2) more than three months ago, 3) no. (-1, 0, 1)",
    "Frequency of alcohol consumption 1) several times a day, 2) every day, 3) several times a week, 4) once a week, 5) hardly ever or never (0, 1)",
    "Smoking habit 1) never, 2) occasional 3) daily. (-1, 0, 1)",
    "Number of hours spent sitting per day ene-16 (0, 1)",
    "Output: Diagnosis normal (N), altered (O)",
]

fertility.columns = colnames
fertility.head()

Unnamed: 0,"Season in which the analysis was performed. 1) winter, 2) spring, 3) Summer, 4) fall. (-1, -0.33, 0.33, 1)","Age at the time of analysis. 18-36 (0, 1)","Childish diseases (ie , chicken pox, measles, mumps, polio) 1) yes, 2) no. (0, 1)","Accident or serious trauma 1) yes, 2) no. (0, 1)","Surgical intervention 1) yes, 2) no. (0, 1)","High fevers in the last year 1) less than three months ago, 2) more than three months ago, 3) no. (-1, 0, 1)","Frequency of alcohol consumption 1) several times a day, 2) every day, 3) several times a week, 4) once a week, 5) hardly ever or never (0, 1)","Smoking habit 1) never, 2) occasional 3) daily. (-1, 0, 1)","Number of hours spent sitting per day ene-16 (0, 1)","Output: Diagnosis normal (N), altered (O)"
0,-0.33,0.94,1,0,1,0,0.8,1,0.31,O
1,-0.33,0.5,1,0,0,0,1.0,-1,0.5,N
2,-0.33,0.75,0,1,1,0,1.0,-1,0.38,N
3,-0.33,0.67,1,1,0,0,0.8,-1,0.5,O
4,-0.33,0.67,1,0,1,0,0.8,0,0.5,N
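
One detail worth checking in a cell like the one above: `read_csv()` treats the first line of a file as the header unless told otherwise, so assigning `columns` afterwards overwrites that line rather than restoring it as data. Assuming the file has no header row (which should be confirmed against the dataset itself), passing `header=None` together with `names` keeps every line. The sketch below uses made-up inline data and hypothetical column names, not the fertility file.

```python
import io
import pandas as pd

raw = "0.1,0.2,N\n0.3,0.4,O\n"  # made-up, headerless data with two rows
names = ["x", "y", "label"]     # hypothetical column names

# Default behavior: the first data row is consumed as the header
lost_row = pd.read_csv(io.StringIO(raw))
lost_row.columns = names

# header=None with names: every line is kept as data
kept_all = pd.read_csv(io.StringIO(raw), header=None, names=names)

print(len(lost_row), len(kept_all))
```

Comparing `len(fertility)` with the number of instances listed on the dataset page is a simple way to tell which case applies.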
