# Importing flat files 
Flat files are an important part of the data analysis and data science workflow, and a common way to store data and transfer it between people and systems. The format I come across most often is `.csv`, which the `pandas` library handles easily.

In [1]:
import pandas as pd # Import pandas

path_to_file = '../data/craft-cans/beers.csv'
beers = pd.read_csv(path_to_file)

At its most basic level, that is all there is to it. We now have access to the information stored in the `beers.csv` file in the data section of this repository.

In [2]:
beers.head()

,Unnamed: 0,abv,ibu,id,name,style,brewery_id,ounces
0,0,0.05,,1436,Pub Beer,American Pale Lager,408,12.0
1,1,0.066,,2265,Devil's Cup,American Pale Ale (APA),177,12.0
2,2,0.071,,2264,Rise of the Phoenix,American IPA,177,12.0
3,3,0.09,,2263,Sinister,American Double / Imperial IPA,177,12.0
4,4,0.075,,2262,Sex and Candy,American IPA,177,12.0


In [3]:
beers.describe()

,Unnamed: 0,abv,ibu,id,brewery_id,ounces
count,2410.0,2348.0,1405.0,2410.0,2410.0,2410.0
mean,1204.5,0.059773,42.713167,1431.113278,231.749793,13.592241
std,695.851397,0.013542,25.954066,752.459975,157.685604,2.352204
min,0.0,0.001,4.0,1.0,0.0,8.4
25%,602.25,0.05,21.0,808.25,93.0,12.0
50%,1204.5,0.056,35.0,1453.5,205.0,12.0
75%,1806.75,0.067,64.0,2075.75,366.0,16.0
max,2409.0,0.128,138.0,2692.0,557.0,32.0


Having had a closer look, we can see that the first column (`Unnamed: 0`) most likely just represents a primary index for the beers table. We could either remove it, or tell `read_csv` to use it as the index when we load the data in the first place.
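The first option, removing the column after loading, could look like the following minimal sketch. It uses a small in-memory sample (a hypothetical stand-in for `beers.csv`, with the same empty first header field) so that it runs without the data file; the names `sample_csv`, `sample` and `trimmed` are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical three-row sample mimicking the layout of beers.csv:
# the first header field is empty, so pandas names that column "Unnamed: 0".
sample_csv = io.StringIO(
    ",abv,ibu,id,name\n"
    "0,0.05,,1436,Pub Beer\n"
    "1,0.066,,2265,Devil's Cup\n"
    "2,0.071,,2264,Rise of the Phoenix\n"
)

sample = pd.read_csv(sample_csv)

# drop() returns a new DataFrame; `sample` itself is left unchanged.
trimmed = sample.drop(columns="Unnamed: 0")

print(sample.columns.tolist())
print(trimmed.columns.tolist())
```

Because `drop` does not modify `sample` in place, the result has to be assigned to a name if we want to keep it.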

In [4]:
beers2 = pd.read_csv(path_to_file, index_col=0)
beers2.head()

,abv,ibu,id,name,style,brewery_id,ounces
0,0.05,,1436,Pub Beer,American Pale Lager,408,12.0
1,0.066,,2265,Devil's Cup,American Pale Ale (APA),177,12.0
2,0.071,,2264,Rise of the Phoenix,American IPA,177,12.0
3,0.09,,2263,Sinister,American Double / Imperial IPA,177,12.0
4,0.075,,2262,Sex and Candy,American IPA,177,12.0


In the above instance, I specified that the first column should become the index of the DataFrame by passing its position: `0` = first column.
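A column named `Unnamed: 0` often appears when a DataFrame was saved with `to_csv`, which writes the index as an unlabeled first column by default. Whether `beers.csv` was produced this way is an assumption, but the round trip below sketches the effect and shows how `index_col=0` undoes it. The tiny `original` frame is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical two-row frame standing in for the beers data.
original = pd.DataFrame(
    {"abv": [0.05, 0.066], "name": ["Pub Beer", "Devil's Cup"]}
)

# With no path argument, to_csv returns the CSV text as a string.
# The index is written as a first column with an empty header field.
csv_text = original.to_csv()

# Read back without index_col: the old index becomes a regular column.
plain = pd.read_csv(io.StringIO(csv_text))

# Read back with index_col=0: the first column is used as the index again.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(plain.columns.tolist())
print(restored.columns.tolist())
```

If you control how the file is written, passing `index=False` to `to_csv` avoids the extra column in the first place.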

Let's say, for instance, that we wanted the `name` column to be the index when we loaded the data. We could specify it by the column's name rather than its position, as shown below.


In [5]:
beers3 = pd.read_csv(path_to_file, index_col='name')
beers3.head()

name,Unnamed: 0,abv,ibu,id,style,brewery_id,ounces
Pub Beer,0,0.05,,1436,American Pale Lager,408,12.0
Devil's Cup,1,0.066,,2265,American Pale Ale (APA),177,12.0
Rise of the Phoenix,2,0.071,,2264,American IPA,177,12.0
Sinister,3,0.09,,2263,American Double / Imperial IPA,177,12.0
Sex and Candy,4,0.075,,2262,American IPA,177,12.0


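As a side note, we could have called `set_index` on `beers2` to achieve effectively the same result without reloading the data. Note that `set_index` returns a new DataFrame and leaves `beers2` unchanged. The only difference from `beers3` is that the `Unnamed: 0` column is absent, because `beers2` already uses it as the index.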

In [8]:
beers2.set_index('name').head()

name,abv,ibu,id,style,brewery_id,ounces
Pub Beer,0.05,,1436,American Pale Lager,408,12.0
Devil's Cup,0.066,,2265,American Pale Ale (APA),177,12.0
Rise of the Phoenix,0.071,,2264,American IPA,177,12.0
Sinister,0.09,,2263,American Double / Imperial IPA,177,12.0
Sex and Candy,0.075,,2262,American IPA,177,12.0
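To make the comparison between the two approaches concrete, here is a small sketch on in-memory data (a hypothetical two-row stand-in for `beers.csv`; the variable names are illustrative only). It loads the same text both ways and then applies `set_index` to the position-indexed frame.

```python
import io
import pandas as pd

# Hypothetical sample with the same layout as beers.csv (empty first header).
csv_text = (
    ",abv,name\n"
    "0,0.05,Pub Beer\n"
    "1,0.066,Devil's Cup\n"
)

# Same file loaded two ways: index by position, and index by column name.
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)
by_name = pd.read_csv(io.StringIO(csv_text), index_col="name")

# set_index returns a new DataFrame; by_position keeps its integer index.
from_set_index = by_position.set_index("name")

print(by_position.index.tolist())
print(from_set_index.columns.tolist())
print(by_name.columns.tolist())
```

The frame produced by `set_index` has only the `abv` column, while the one loaded with `index_col="name"` still carries `Unnamed: 0` as an ordinary column.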
