**Typically we will just be reading CSV files directly or using pandas-datareader or Quandl. Consider this tutorial a quick overview of what is possible with pandas (we won't be working with SQL or Excel files in this part).**

# Data Input and Output

This notebook is the reference code for data input and output. pandas can read a variety of file types using its `pd.read_*` functions. Let's take a look at the most common data types:

```
- CSV
- Excel
- HTML
- SQL
```

In [1]:
import pandas as pd
import numpy as np

# Reading from various file types
## Writing to various file types

```
To read: type pd.read_, press Tab, then select a method.
To write: same idea, type df.to_, press Tab, then select a method; you may need to specify the index, etc.
```
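If tab completion isn't available, the same lists can be produced programmatically. The snippet below is only a sketch: it filters names with `dir()`, and the exact set of readers and writers it prints depends on the installed pandas version.

```python
import pandas as pd

# Top-level reader functions, e.g. read_csv, read_excel, read_html, read_sql.
readers = sorted(name for name in dir(pd) if name.startswith("read_"))
print(readers)

# DataFrame writer methods, e.g. to_csv, to_excel, to_html, to_sql.
writers = sorted(name for name in dir(pd.DataFrame) if name.startswith("to_"))
print(writers)
```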

In [4]:
pd.read_csv('example.csv')

Unnamed: 0,a,b,c,d
0,0,1,2,3
1,4,5,6,7
2,8,9,10,11
3,12,13,14,15


In [7]:
df = pd.read_csv('example.csv')

In [8]:
df.to_csv("My_output")

In [9]:
ls

08-Data-Input-and-Output.ipynb*        Pandas - Groupby.ipynb
DataFrames Part 2.ipynb                Pandas - Missing Data.ipynb
DataFrames.ipynb                       Pandas - Operations.ipynb
Excel_Sample.xlsx*                     Pandas-DataFrames Part 3.ipynb
General_Pandas.ipynb                   Pandas-Exercises/
Merging-Joing-and-Concatenating.ipynb  example.csv*
My_output                              groupby.png
Pandas - Data Input and Output.ipynb


In [10]:
pd.read_csv('My_output')

Unnamed: 0.1,Unnamed: 0,a,b,c,d
0,0,0,1,2,3
1,1,4,5,6,7
2,2,8,9,10,11
3,3,12,13,14,15


In [14]:
df.to_csv('My_output', index = False)

In [16]:
pd.read_csv('My_output')

Unnamed: 0,a,b,c,d
0,0,1,2,3
1,4,5,6,7
2,8,9,10,11
3,12,13,14,15
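
The cells above show that `to_csv` writes the index as an extra unnamed column unless `index=False` is passed. As a rough, self-contained sketch of the same round trip, the code below uses in-memory `io.StringIO` buffers instead of files on disk. The 4x4 frame is an assumed stand-in for `example.csv`, and `index_col=0` is shown as an alternative fix on the reading side.

```python
import io

import numpy as np
import pandas as pd

# Assumed stand-in for example.csv: a 4x4 frame with columns a, b, c, d.
df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list("abcd"))

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reread = pd.read_csv(with_index)
print(list(reread.columns))  # ['Unnamed: 0', 'a', 'b', 'c', 'd']

# Option 1: leave the index out when writing.
no_index = io.StringIO()
df.to_csv(no_index, index=False)
no_index.seek(0)
clean = pd.read_csv(no_index)

# Option 2: keep the index in the file and restore it when reading.
with_index.seek(0)
restored = pd.read_csv(with_index, index_col=0)

print(list(clean.columns), list(restored.columns))
```

Which option fits depends on whether the index carries information: a default 0..n-1 index can usually be dropped, while a meaningful index (dates, IDs) is better kept and restored.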


In [18]:
pd.read_excel('Excel_Sample.xlsx', sheet_name = 'Sheet1')

Unnamed: 0,a,b,c,d
0,0,1,2,3
1,4,5,6,7
2,8,9,10,11
3,12,13,14,15


In [19]:
df

Unnamed: 0,a,b,c,d
0,0,1,2,3
1,4,5,6,7
2,8,9,10,11
3,12,13,14,15


In [20]:
df.to_excel('Excel_Sample2.xlsx', sheet_name = 'NewSheet')

In [21]:
pd.read_excel('Excel_Sample2.xlsx')

Unnamed: 0,a,b,c,d
0,0,1,2,3
1,4,5,6,7
2,8,9,10,11
3,12,13,14,15


In [22]:
type(pd.read_excel('Excel_Sample2.xlsx'))

pandas.core.frame.DataFrame

# read_html
Theory: read_html finds the `<table>` elements in the HTML and returns them as a list of DataFrames.

## HTML

You may need to install html5lib, lxml, and BeautifulSoup4. In your terminal/command prompt run:

    conda install lxml
    conda install html5lib
    conda install BeautifulSoup4

Then restart Jupyter Notebook.
(Use pip install instead if you aren't using the Anaconda distribution.)

pandas can read tables off of an HTML page. For example:

In [23]:
data = pd.read_html("https://www.fdic.gov/bank/individual/failed/banklist.html")

In [24]:
data

[                                             Bank Name                City  \
 0      The Farmers and Merchants State Bank of Argonia             Argonia   
 1                                  Fayette County Bank          Saint Elmo   
 2    Guaranty Bank, (d/b/a BestBank in Georgia & Mi...           Milwaukee   
 3                                       First NBC Bank         New Orleans   
 4                                        Proficio Bank  Cottonwood Heights   
 5                        Seaway Bank and Trust Company             Chicago   
 6                               Harvest Community Bank          Pennsville   
 7                                          Allied Bank            Mulberry   
 8                         The Woodbury Banking Company            Woodbury   
 9                               First CornerStone Bank     King of Prussia   
 10                                  Trust Company Bank             Memphis   
 11                          North Milwaukee State B

In [25]:
type(data)

list

In [26]:
len(data)

1

In [27]:
data[0]

Unnamed: 0,Bank Name,City,ST,CERT,Acquiring Institution,Closing Date,Updated Date
0,The Farmers and Merchants State Bank of Argonia,Argonia,KS,17719,Conway Bank,"October 13, 2017","October 20, 2017"
1,Fayette County Bank,Saint Elmo,IL,1802,"United Fidelity Bank, fsb","May 26, 2017","July 26, 2017"
2,"Guaranty Bank, (d/b/a BestBank in Georgia & Mi...",Milwaukee,WI,30003,First-Citizens Bank & Trust Company,"May 5, 2017","July 26, 2017"
3,First NBC Bank,New Orleans,LA,58302,Whitney Bank,"April 28, 2017","July 26, 2017"
4,Proficio Bank,Cottonwood Heights,UT,35495,Cache Valley Bank,"March 3, 2017","May 18, 2017"
5,Seaway Bank and Trust Company,Chicago,IL,19328,State Bank of Texas,"January 27, 2017","May 18, 2017"
6,Harvest Community Bank,Pennsville,NJ,34951,First-Citizens Bank & Trust Company,"January 13, 2017","May 18, 2017"
7,Allied Bank,Mulberry,AR,91,Today's Bank,"September 23, 2016","September 25, 2017"
8,The Woodbury Banking Company,Woodbury,GA,11297,United Bank,"August 19, 2016","June 1, 2017"
9,First CornerStone Bank,King of Prussia,PA,35312,First-Citizens Bank & Trust Company,"May 6, 2016","September 6, 2016"


In [28]:
type(data[0])

pandas.core.frame.DataFrame
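
Because the live page above may change or be unavailable, the same behaviour can be sketched offline. The HTML string and bank names below are made up for illustration, and the snippet assumes one of the parsers listed earlier (lxml, or BeautifulSoup4 with html5lib) is installed. The string is wrapped in `io.StringIO` because recent pandas versions deprecate passing literal HTML directly.

```python
import io

import pandas as pd

# Hypothetical page with a single two-row table, made up for illustration.
html = """
<html><body>
  <p>Text outside the table is ignored.</p>
  <table>
    <thead>
      <tr><th>Bank Name</th><th>ST</th></tr>
    </thead>
    <tbody>
      <tr><td>Example Bank A</td><td>KS</td></tr>
      <tr><td>Example Bank B</td><td>IL</td></tr>
    </tbody>
  </table>
</body></html>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(io.StringIO(html))
print(type(tables), len(tables))

first = tables[0]
print(first)
```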

# Read from SQL

In [29]:
from sqlalchemy import create_engine

In [30]:
# Create a very small SQLite engine in memory
engine = create_engine('sqlite:///:memory:')

In [31]:
df

Unnamed: 0,a,b,c,d
0,0,1,2,3
1,4,5,6,7
2,8,9,10,11
3,12,13,14,15


In [32]:
df.to_sql('my_table', engine)

In [33]:
sqldf = pd.read_sql('my_table', con=engine)

In [34]:
sqldf

Unnamed: 0,index,a,b,c,d
0,0,0,1,2,3
1,1,4,5,6,7
2,2,8,9,10,11
3,3,12,13,14,15
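
The extra `index` column in `sqldf` appears for the same reason as the extra column in the CSV round trip: `to_sql` writes the DataFrame index by default. The sketch below passes `index=False` to avoid that. Note that it swaps in the standard-library `sqlite3` module for the SQLAlchemy engine used above, so it reads with a `SELECT` query rather than a bare table name, and it rebuilds an assumed 4x4 stand-in for `df`.

```python
import sqlite3

import numpy as np
import pandas as pd

# Assumed stand-in for the df used above.
df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list("abcd"))

# In-memory SQLite database via the standard library (not SQLAlchemy).
conn = sqlite3.connect(":memory:")

# index=False keeps the DataFrame index out of the table.
df.to_sql("my_table", conn, index=False)

# With a plain sqlite3 connection, read with an SQL query.
sqldf = pd.read_sql("SELECT * FROM my_table", conn)
print(list(sqldf.columns))  # ['a', 'b', 'c', 'd']

conn.close()
```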
