## Create a database

In [1]:
from sqlalchemy import create_engine
from sqlalchemy_utils import database_exists, create_database
import psycopg2
import pandas as pd

In [12]:
# Define a database name (we're storing plant forecast predictions, so we'll call it prediction_db)
# Set your postgres username
dbname = 'prediction_db'
username = 'xingliu' # change this to your username

In [13]:
## 'engine' is a connection to a database
## Here, we're using postgres, but sqlalchemy can connect to other things too.
## Note: SQLAlchemy 1.4 and later reject the 'postgres://' scheme; use 'postgresql://' there.
engine = create_engine('postgres://%s@localhost/%s'%(username,dbname))
print(engine.url)

postgres://xingliu@localhost/prediction_db


In [14]:
## create a database (if it doesn't exist)
if not database_exists(engine.url):
    create_database(engine.url)
print(database_exists(engine.url))

True


In [15]:
pred50187 = pd.read_csv('forecastplantid50187.csv', parse_dates = ['ds'])
pred50187['plant_id'] = 50187

In [17]:
pred3845 = pd.read_csv('forecastplantid3845.csv', parse_dates = ['ds'])
pred3845['plant_id'] = 3845

In [18]:
pred54268 = pd.read_csv('forecastplantid54268.csv', parse_dates = ['ds'])
pred54268['plant_id'] = 54268

In [19]:
plant_pred = pd.concat([pred50187, pred3845, pred54268], axis = 0)
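
The three read-and-tag cells above follow the same pattern, so they can be folded into a loop. The following is a minimal sketch, not the notebook's original code: `load_forecasts` is a hypothetical helper, and the in-memory CSV strings are made-up stand-ins for the real forecast files so that the example runs on its own.

```python
import io
import pandas as pd

def load_forecasts(sources):
    """Read one CSV per plant, tag each row with its plant_id, and stack the frames.

    `sources` maps plant_id -> file path or file-like object (hypothetical helper).
    """
    frames = []
    for plant_id, src in sources.items():
        df = pd.read_csv(src, parse_dates=['ds'])
        df['plant_id'] = plant_id
        frames.append(df)
    return pd.concat(frames, axis=0, ignore_index=True)

# Made-up stand-ins for the real CSV files.
demo_sources = {
    50187: io.StringIO("ds,yhat\n2007-01-31,1.0\n2007-02-28,2.0\n"),
    3845: io.StringIO("ds,yhat\n2007-01-31,3.0\n"),
}
combined = load_forecasts(demo_sources)
```

With the real data, the mapping would hold file names instead, for example `{50187: 'forecastplantid50187.csv', 3845: 'forecastplantid3845.csv', 54268: 'forecastplantid54268.csv'}`.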

In [20]:
## insert data into database from Python (proof of concept - this won't be useful for big data, of course)
plant_pred.to_sql('prediction_table', engine, index = False, if_exists='replace')

In [21]:
plant_pred.head()

Unnamed: 0,ds,trend,trend_lower,trend_upper,yhat_lower,yhat_upper,seasonal,seasonal_lower,seasonal_upper,seasonalities,seasonalities_lower,seasonalities_upper,yearly,yearly_lower,yearly_upper,yhat,plant_id
0,2007-01-31,29120.494432,29120.494432,29120.494432,27038.333973,37108.156532,2927.596415,2927.596415,2927.596415,2927.596415,2927.596415,2927.596415,2927.596415,2927.596415,2927.596415,32048.090847,50187
1,2007-02-28,29074.632325,29074.632325,29074.632325,24488.716189,34518.105238,715.545569,715.545569,715.545569,715.545569,715.545569,715.545569,715.545569,715.545569,715.545569,29790.177894,50187
2,2007-03-31,29023.856421,29023.856421,29023.856421,22894.289561,32666.31137,-1079.577546,-1079.577546,-1079.577546,-1079.577546,-1079.577546,-1079.577546,-1079.577546,-1079.577546,-1079.577546,27944.278874,50187
3,2007-04-30,28974.718449,28974.718449,28974.718449,22857.900878,32865.895293,-900.944713,-900.944713,-900.944713,-900.944713,-900.944713,-900.944713,-900.944713,-900.944713,-900.944713,28073.773736,50187
4,2007-05-31,28923.942545,28923.942545,28923.942545,20687.201412,30757.482432,-3226.703857,-3226.703857,-3226.703857,-3226.703857,-3226.703857,-3226.703857,-3226.703857,-3226.703857,-3226.703857,25697.238688,50187


The `to_sql` call above is doing a lot of heavy lifting: it reads the dataframe, creates the table (replacing any existing one, because of `if_exists='replace'`), and inserts the rows. So **SQLAlchemy is quite useful!**
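
To see the create-and-insert behaviour of `to_sql` in isolation, here is a small sketch. It makes two loud assumptions: an in-memory SQLite database stands in for PostgreSQL so that no server is needed, and the rows are made up. The `if_exists` values `'replace'` and `'append'` are the same ones pandas accepts for any backend.

```python
import sqlite3
import pandas as pd

# Assumption: SQLite stands in for PostgreSQL so this runs without a server.
con = sqlite3.connect(':memory:')

# Made-up rows using two of the forecast column names.
df = pd.DataFrame({'yhat': [1.5, 2.5], 'plant_id': [50187, 3845]})

count_sql = 'SELECT COUNT(*) AS n FROM prediction_table'

# The table does not exist yet, so to_sql creates it and inserts the rows.
df.to_sql('prediction_table', con, index=False, if_exists='replace')
n_after_replace = pd.read_sql_query(count_sql, con)['n'][0]

# 'append' keeps the existing table and adds the same rows again.
df.to_sql('prediction_table', con, index=False, if_exists='append')
n_after_append = pd.read_sql_query(count_sql, con)['n'][0]

con.close()
```

Using `'replace'` makes the notebook cell safe to re-run, since each run rebuilds the table instead of duplicating rows.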

## Working with PostgreSQL without Python

**Open up the PostgreSQL app and click on the "Open psql" button in the bottom right corner,**

or alternatively type

    psql -h localhost

into the command line.

**Connect to the "prediction_db" database we created**

    \c prediction_db

**You should see something like the following**

`You are now connected to database "prediction_db" as user "xingliu".`


**Then try the following query:**

    SELECT * FROM prediction_table;

Note that the semicolon marks the end of a statement.

### You can see the table we created!  But it's kinda ugly and hard to read.

Try a few other sample queries.  Before you type in each one, ask yourself what you think the output will look like:

`SELECT * FROM prediction_table WHERE plant_id=3845;`

`SELECT COUNT(plant_id) FROM prediction_table WHERE plant_id=3845;`

`SELECT plant_id, COUNT(ds) FROM prediction_table GROUP BY plant_id;`

`SELECT plant_id, AVG(yhat) FROM prediction_table WHERE yhat > 0 GROUP BY plant_id;`

All the above queries run, but they are difficult to visually inspect in the Postgres terminal.
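
Grouped queries like the ones above return one row per group. As a rough preview that needs no running PostgreSQL server, the sketch below runs a grouped count against an in-memory SQLite table. SQLite is a stand-in here, and the table contents are made up; only the table and column names (`prediction_table`, `plant_id`, `yhat`) come from the notebook.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the rows are made up.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE prediction_table (plant_id INTEGER, yhat REAL)')
con.executemany(
    'INSERT INTO prediction_table VALUES (?, ?)',
    [(50187, 10.0), (50187, 20.0), (3845, 5.0)],
)

# One output row per distinct plant_id, with the number of forecasts for it.
rows = con.execute(
    'SELECT plant_id, COUNT(yhat) FROM prediction_table '
    'GROUP BY plant_id ORDER BY plant_id'
).fetchall()
con.close()
```

Here `rows` is a list of `(plant_id, count)` tuples, which is already easier to inspect than the raw psql output.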

## Working with PostgreSQL in Python

In [23]:
# Connect to make queries using psycopg2
con = None
con = psycopg2.connect(database = dbname, user = username)

# query:
sql_query = """
SELECT * FROM prediction_table WHERE plant_id=3845;
"""
netgen_from_sql = pd.read_sql_query(sql_query,con)
netgen_from_sql.head()

Unnamed: 0,ds,trend,trend_lower,trend_upper,yhat_lower,yhat_upper,seasonal,seasonal_lower,seasonal_upper,seasonalities,seasonalities_lower,seasonalities_upper,yearly,yearly_lower,yearly_upper,yhat,plant_id
0,2007-01-31,723552.305973,723552.305973,723552.305973,579763.183656,1059725.0,100794.288359,100794.288359,100794.288359,100794.288359,100794.288359,100794.288359,100794.288359,100794.288359,100794.288359,824346.594332,3845
1,2007-02-28,720898.443786,720898.443786,720898.443786,334814.832317,858452.2,-122945.73251,-122945.73251,-122945.73251,-122945.73251,-122945.73251,-122945.73251,-122945.73251,-122945.73251,-122945.73251,597952.711277,3845
2,2007-03-31,717960.239222,717960.239222,717960.239222,309813.118689,822411.3,-150057.28563,-150057.28563,-150057.28563,-150057.28563,-150057.28563,-150057.28563,-150057.28563,-150057.28563,-150057.28563,567902.953592,3845
3,2007-04-30,715116.81545,715116.81545,715116.81545,119627.665185,637163.2,-335516.867712,-335516.867712,-335516.867712,-335516.867712,-335516.867712,-335516.867712,-335516.867712,-335516.867712,-335516.867712,379599.947738,3845
4,2007-05-31,712178.610886,712178.610886,712178.610886,19117.55078,537973.6,-426145.673792,-426145.673792,-426145.673792,-426145.673792,-426145.673792,-426145.673792,-426145.673792,-426145.673792,-426145.673792,286032.937094,3845


Once the data has been pulled into Python, we can leverage pandas methods to work with it.
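
As an illustration of what that can look like, the sketch below applies two common pandas operations to a small frame. The frame is made up and only mimics the shape of a query result (columns `ds`, `yhat`, `plant_id`); the variable names are hypothetical.

```python
import pandas as pd

# Made-up rows standing in for a query result.
netgen = pd.DataFrame({
    'ds': pd.to_datetime(['2007-01-31', '2007-02-28', '2007-01-31']),
    'yhat': [10.0, 20.0, 5.0],
    'plant_id': [50187, 50187, 3845],
})

# Mean forecast per plant: the pandas counterpart of GROUP BY plant_id.
mean_yhat = netgen.groupby('plant_id')['yhat'].mean()

# Rows for a single plant, indexed by date for time-series work.
plant_50187 = netgen[netgen['plant_id'] == 50187].set_index('ds')
```

Filtering in SQL and aggregating in pandas (or the reverse) both work; pushing the filter into the query keeps the amount of data transferred small.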