# Importing Data in Python

In this chapter, we will learn how to import data from different file types.

### Execute Shell Command in Python

In IPython and Jupyter, starting a line with `!` runs the rest of the line as a system shell command.


In [None]:
! ls .
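The `!` prefix only works inside IPython or Jupyter. As a minimal sketch of how a command could be run from a plain Python script, the standard library's `subprocess.run` can be used. The command below is an assumption chosen so the sketch runs on any platform; it is not part of the original notebook.

```python
import subprocess
import sys

# Run a child process and capture its standard output as text.
# sys.executable is used here only so the sketch runs on any platform;
# in practice the list would hold a shell command such as ['ls', '.'].
result = subprocess.run(
    [sys.executable, '-c', 'print("hello from a subprocess")'],
    capture_output=True,
    text=True,
    check=True,  # raise CalledProcessError if the command fails
)
output = result.stdout.strip()
print(output)
```

Passing the command as a list avoids shell quoting issues, and `check=True` turns a non-zero exit status into an exception instead of a silent failure.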

### Read From Flat Files

Use `open` to work with files, and don't forget to `close` the file when you are finished with it.

In [None]:
file = open('./res/yearly_registers.csv', mode='r')

print(file.read())

file.close()

##### Open Flat Files

Use `readlines` to load the file into a list of lines, then process the lines one by one.

In [None]:
file = open('./res/yearly_registers.csv', mode='r')

lines = file.readlines()

for line in lines:
    print(line, end='')  # each line already ends with '\n'

file.close()

The `with` statement uses a context manager, which automatically closes the file when the block exits, even if an exception is raised.

In [None]:
with open('./res/yearly_registers.csv') as file:
    print(file.read())
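As a minimal, self-contained sketch of the same idea, the example below writes a small throwaway CSV (the file name and contents are made up for illustration), reads it back line by line inside a `with` block, and then checks that the file was closed automatically.

```python
import os
import tempfile

# Create a small throwaway CSV so the sketch does not depend on ./res/.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'sample.csv')
with open(path, mode='w') as out:
    out.write('year,amount\n2019,10\n2020,12\n')

# Iterating over the file object yields one line at a time,
# so the whole file never has to be held in memory.
rows = []
with open(path, mode='r') as file:
    for line in file:
        rows.append(line.rstrip('\n').split(','))

print(rows)
print(file.closed)  # the context manager has already closed the file
```

Iterating over the file object directly is usually preferable to `readlines` for large files, because only one line is loaded at a time.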

##### Use Numpy to Import Flat Files

You can use `np.loadtxt()` with arguments such as `skiprows`, `usecols`, and `dtype` to control how a file is read.

In [None]:
import numpy as np

data = np.loadtxt('./res/yearly_registers.csv', delimiter=',', skiprows=1)
print(data)
data = np.loadtxt('./res/yearly_registers.csv', delimiter=',', skiprows=1, usecols=[0], dtype=int)
print(data)
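To see what these arguments do without needing the CSV file, here is a small sketch that feeds `np.loadtxt()` an in-memory string through `io.StringIO`. The column names and values are invented for illustration.

```python
from io import StringIO

import numpy as np

# Hypothetical data standing in for a purely numeric CSV file.
csv_text = "year,amount\n2019,10.5\n2020,12.0\n2021,14.25\n"

# skiprows=1 drops the header line; every remaining value must be numeric.
table = np.loadtxt(StringIO(csv_text), delimiter=',', skiprows=1)
print(table.shape)

# usecols=[0] keeps only the first column, and dtype=int converts it.
years = np.loadtxt(StringIO(csv_text), delimiter=',', skiprows=1,
                   usecols=[0], dtype=int)
print(years)
```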

If the file contains columns with different data types, `loadtxt` with its default float `dtype` raises an error as soon as it meets a non-numeric value.

You can use `np.genfromtxt()` to handle such structures.

With `names=True`, it returns a structured array in which each column is a named field with its own type.

If you pass `dtype=None`, it detects the column types automatically.

You can use `data[i]` to query a specific row, and `data['column_name']` to query a specific column.

In [None]:
data = np.genfromtxt('./res/yearly_registers.csv', delimiter=',', names=True, dtype=None)
print(data['year'])
print(type(data['year'][0]))
print(data['amount'])
print(type(data['amount'][0]))
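The sketch below shows the same behaviour on a small in-memory table with an integer, a string, and a float column. The data and the `city` column are hypothetical. Passing `encoding='utf-8'` is an addition here, so that text columns come back as `str` rather than `bytes`.

```python
from io import StringIO

import numpy as np

# Hypothetical mixed-type data: integer, string, and float columns.
csv_text = "year,city,amount\n2019,Oslo,10.5\n2020,Rome,12.0\n"

data = np.genfromtxt(StringIO(csv_text), delimiter=',', names=True,
                     dtype=None, encoding='utf-8')

print(data.dtype.names)   # field names taken from the header line
print(data[0])            # first row, as a record with three fields
print(data['city'])       # one column, selected by field name
```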

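`np.recfromcsv()` behaves similarly to `np.genfromtxt()`, except that its defaults are `delimiter=','`, `names=True`, and `dtype=None`. Note that NumPy 2.0 and later no longer expose it in the main namespace; use `np.genfromtxt()` with `delimiter=','` instead.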

In [None]:
data = np.recfromcsv('./res/yearly_registers.csv', names=True)
print(data['year'])

##### Use Pandas to Import Flat Files

Although `numpy` can import columns of different data types, a pandas `DataFrame` is a more appropriate structure for such data. You can create one with `pd.read_csv()`.

In [None]:
import pandas as pd
df = pd.read_csv('./res/yearly_registers.csv')
print(df.head())

In [None]:
import pandas as pd

df = pd.read_csv('./res/yearly_registers.csv', nrows=3)
print(df.head())
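The two cells above differ only in `nrows`, which limits how many data rows are read (the header line is not counted). A minimal sketch on invented in-memory data:

```python
from io import StringIO

import pandas as pd

# Hypothetical data standing in for the CSV file.
csv_text = "year,amount\n2017,8\n2018,9\n2019,10\n2020,12\n2021,14\n"

df_all = pd.read_csv(StringIO(csv_text))            # reads every row
df_head = pd.read_csv(StringIO(csv_text), nrows=3)  # reads only 3 data rows

print(len(df_all), len(df_head))
print(df_head)
```

Reading only a few rows this way is a cheap way to inspect the columns of a large file before loading all of it.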

You can pass additional arguments to read other delimited formats, such as tab-separated (`tsv`) files.

You can also set the comment character with `comment` and the strings to treat as missing values with `na_values`.

In [None]:
import pandas as pd
df = pd.read_csv('./res/pd_import_test.txt', sep='\t', comment='#', na_values='None')
print(df.head())
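Since the contents of `pd_import_test.txt` are not shown, here is a sketch on a made-up tab-separated string that illustrates what `comment` and `na_values` do. The `?` marker is a hypothetical extra missing-value string added for illustration.

```python
from io import StringIO

import pandas as pd

# Hypothetical tab-separated data with a comment line and missing values.
tsv_text = (
    "# exported for testing\n"
    "year\tamount\n"
    "2019\t10\n"
    "2020\tNone\n"
    "2021\t?\n"
)

df = pd.read_csv(
    StringIO(tsv_text),
    sep='\t',                 # columns are separated by tabs
    comment='#',              # anything after '#' on a line is ignored
    na_values=['None', '?'],  # these strings are read as NaN
)
print(df)
```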

### Read From Other File Types

##### Use Pandas to Import Excel Files

Reading `.xlsx` files requires the `openpyxl` package; `xlrd` 2.0 and later only reads the older `.xls` format.

Run `pip install openpyxl` in the terminal or command prompt.

In [None]:
import pandas as pd

file = './res/yearly_registers.xlsx'
xl = pd.ExcelFile(file)
print(xl.sheet_names)

df = xl.parse('data')
print(df.head())

You can pass additional arguments to select specific rows and columns, and even to rename columns.

In [None]:
df = xl.parse('data', skiprows=[0], usecols=[1], names=['Registers'])
print(df.head())