## Lesson 1.4 - Independent Practice

Here's the situation: you're working with a PostgreSQL database at a large wine distributor that needs you to maintain it. You'll use some of your advanced SQL skills to take care of customer cases. Let's begin!

First, let's load the `ipython-sql` extension so that we can run SQL inside the IPython notebook.

In [None]:
# !pip uninstall psycopg2
# !conda install psycopg2
# !pip install ipython-sql

In [35]:
import pandas as pd
import numpy as np
from sqlalchemy import create_engine
from sqlalchemy.engine.url import URL
import psycopg2


In [31]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


Connect to the database. Note: enter your own connection string. For help on how to load the raw CSV file into a PostgreSQL database, please refer to the documentation in the lesson plan on previous SQL lessons.

Create your database in PostgreSQL:
```bash
psql
```
```
user=# create database wine;
user=# \quit
```


In [161]:
PATH = '../../assets/datasets'
df = pd.read_csv(PATH + '/wine.csv')
df.columns = [c.lower().replace(' ','') for c in df.columns] #postgres doesn't like capitals or spaces



In [162]:
df.columns

Index([u'fixedacidity', u'volatileacidity', u'citricacid', u'residualsugar',
       u'chlorides', u'freesulfurdioxide', u'totalsulfurdioxide', u'density',
       u'ph', u'sulphates', u'alcohol', u'quality'],
      dtype='object')
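
The renaming step matters because PostgreSQL folds unquoted identifiers to lower case, so a column stored with capitals or spaces would need double quotes in every query. As a minimal sketch of what the list comprehension does, here is the same transformation applied to a few plain strings. The raw header names are hypothetical, since the original CSV headers are not shown in this lesson, and `normalize_column` is a helper name introduced only for illustration.

```python
def normalize_column(name):
    """Lower-case a column name and remove spaces, like the list comprehension above."""
    return name.lower().replace(' ', '')


# Hypothetical raw headers; the real CSV headers may differ.
raw_headers = ['Fixed Acidity', 'pH', 'Total Sulfur Dioxide']

clean_headers = [normalize_column(h) for h in raw_headers]
print(clean_headers)
```

The cleaned names can then be used in SQL without quoting, which is what the later queries in this lesson rely on.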

In [163]:
engine = create_engine('postgresql://localhost:5432/wine')

In [164]:
df.to_sql('wine', engine)

In [165]:
%%sql postgresql://localhost:5432/wine
        
SELECT * FROM wine LIMIT 5

5 rows affected.


index,fixedacidity,volatileacidity,citricacid,residualsugar,chlorides,freesulfurdioxide,totalsulfurdioxide,density,ph,sulphates,alcohol,quality
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5


Select every wine and add a computed column `high_alc` that is 1 when the alcohol content is above 10. Otherwise the value should be `NULL`.

In [167]:
%%sql 

SELECT *, CASE WHEN alcohol > 10 THEN 1 ELSE NULL END AS high_alc FROM wine

1599 rows affected.


index,fixedacidity,volatileacidity,citricacid,residualsugar,chlorides,freesulfurdioxide,totalsulfurdioxide,density,ph,sulphates,alcohol,quality,high_alc
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5,
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5,
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6,
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,
5,7.4,0.66,0.0,1.8,0.075,13.0,40.0,0.9978,3.51,0.56,9.4,5,
6,7.9,0.6,0.06,1.6,0.069,15.0,59.0,0.9964,3.3,0.46,9.4,5,
7,7.3,0.65,0.0,1.2,0.065,15.0,21.0,0.9946,3.39,0.47,10.0,7,
8,7.8,0.58,0.02,2.0,0.073,9.0,18.0,0.9968,3.36,0.57,9.5,7,
9,7.5,0.5,0.36,6.1,0.071,17.0,102.0,0.9978,3.35,0.8,10.5,5,1.0
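
To see the `CASE` logic in isolation, here is a minimal sketch that runs the same kind of expression against an in-memory SQLite database from Python's standard library. SQLite is swapped in for PostgreSQL only so the example runs without a server, and the table is cut down to two columns: `wine_id` is a hypothetical stand-in for the `index` column, and the three alcohol values are taken from rows 0, 7 and 9 above.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the PostgreSQL server.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE wine (wine_id INTEGER, alcohol REAL)')
conn.executemany('INSERT INTO wine VALUES (?, ?)',
                 [(0, 9.4), (7, 10.0), (9, 10.5)])

# A CASE expression without an ELSE branch yields NULL when no WHEN matches,
# so "ELSE NULL" is optional.
flagged = conn.execute(
    'SELECT wine_id, alcohol, '
    'CASE WHEN alcohol > 10 THEN 1 END AS high_alc '
    'FROM wine ORDER BY wine_id'
).fetchall()
conn.close()

print(flagged)
```

Note that the wine with an alcohol content of exactly 10.0 is not flagged, because the comparison is strictly greater than 10. SQL `NULL` comes back to Python as `None`.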


Someone decided that they wanted to purchase *all* of these high-alcohol wines for their restaurants, so make sure to mark them as *sold*. Your predecessor forgot to add a column for the *sale date*, so you will have to add it to the database table as well.

In [168]:
%%sql

ALTER TABLE wine
ADD sale_date date

Done.


[]

Set their sale date to today.

In [172]:
%%sql

UPDATE wine SET sale_date = CURRENT_DATE WHERE alcohol >10

852 rows affected.


[]
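
The two steps above (add a nullable column, then fill it for the matching rows) can be sketched end to end against an in-memory SQLite database. This is an illustration under stated assumptions, not the lesson's PostgreSQL setup: SQLite has no dedicated date storage class, so the column is declared as `TEXT` here, and `wine_id` is a hypothetical stand-in for the `index` column. The alcohol values come from rows 0, 9 and 45 of the data.

```python
import sqlite3

# Assumption: in-memory SQLite replaces the PostgreSQL server for this sketch.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE wine (wine_id INTEGER, alcohol REAL)')
conn.executemany('INSERT INTO wine VALUES (?, ?)',
                 [(0, 9.4), (9, 10.5), (45, 13.1)])

# Step 1: add the missing column. Existing rows get NULL in it.
conn.execute('ALTER TABLE wine ADD COLUMN sale_date TEXT')

# Step 2: stamp today's date on the high-alcohol wines only.
cursor = conn.execute(
    'UPDATE wine SET sale_date = CURRENT_DATE WHERE alcohol > 10')
updated_rows = cursor.rowcount  # number of rows the UPDATE touched

# Wines that were not sold still have NULL in sale_date.
unsold = conn.execute(
    'SELECT wine_id FROM wine WHERE sale_date IS NULL').fetchall()
sold = conn.execute(
    'SELECT wine_id FROM wine WHERE sale_date IS NOT NULL ORDER BY wine_id'
).fetchall()
conn.close()

print(updated_rows, unsold, sold)
```

Leaving `sale_date` as `NULL` for unsold wines is what makes the later "not already sold" filter possible.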

In [175]:
%%sql 

SELECT * FROM wine LIMIT 10;

10 rows affected.


index,fixedacidity,volatileacidity,citricacid,residualsugar,chlorides,freesulfurdioxide,totalsulfurdioxide,density,ph,sulphates,alcohol,quality,sale_date
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5,
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5,
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6,
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,
5,7.4,0.66,0.0,1.8,0.075,13.0,40.0,0.9978,3.51,0.56,9.4,5,
6,7.9,0.6,0.06,1.6,0.069,15.0,59.0,0.9964,3.3,0.46,9.4,5,
7,7.3,0.65,0.0,1.2,0.065,15.0,21.0,0.9946,3.39,0.47,10.0,7,
8,7.8,0.58,0.02,2.0,0.073,9.0,18.0,0.9968,3.36,0.57,9.5,7,
45,4.6,0.52,0.15,2.1,0.054,8.0,65.0,0.9934,3.9,0.56,13.1,4,2016-07-19


Select all of the wines with a high alcohol content.

In [176]:
%%sql
SELECT * FROM wine WHERE alcohol > 10

852 rows affected.


index,fixedacidity,volatileacidity,citricacid,residualsugar,chlorides,freesulfurdioxide,totalsulfurdioxide,density,ph,sulphates,alcohol,quality,sale_date
45,4.6,0.52,0.15,2.1,0.054,8.0,65.0,0.9934,3.9,0.56,13.1,4,2016-07-19
54,7.6,0.51,0.15,2.8,0.11,33.0,73.0,0.9955,3.17,0.63,10.2,6,2016-07-19
64,7.2,0.725,0.05,4.65,0.086,4.0,11.0,0.9962,3.41,0.39,10.9,5,2016-07-19
65,7.2,0.725,0.05,4.65,0.086,4.0,11.0,0.9962,3.41,0.39,10.9,5,2016-07-19
67,6.6,0.705,0.07,1.6,0.076,6.0,15.0,0.9962,3.44,0.58,10.7,5,2016-07-19
68,9.3,0.32,0.57,2.0,0.074,27.0,65.0,0.9969,3.28,0.79,10.7,5,2016-07-19
69,8.0,0.705,0.05,1.9,0.074,8.0,19.0,0.9962,3.34,0.95,10.5,6,2016-07-19
75,8.8,0.41,0.64,2.2,0.093,9.0,42.0,0.9986,3.54,0.66,10.5,5,2016-07-19
76,8.8,0.41,0.64,2.2,0.093,9.0,42.0,0.9986,3.54,0.66,10.5,5,2016-07-19
77,6.8,0.785,0.0,2.4,0.104,14.0,30.0,0.9966,3.52,0.55,10.7,6,2016-07-19


Now, for our analysis, we want to take a look at all the high-quality wines. Select the wines with ratings above 7 and save the result as a pandas DataFrame.

In [142]:
%%sql

SELECT * FROM wine WHERE quality > 7 LIMIT 3;

3 rows affected.


index,fixedacidity,volatileacidity,citricacid,residualsugar,chlorides,freesulfurdioxide,totalsulfurdioxide,density,ph,sulphates,alcohol,quality,sale_date
267,7.9,0.35,0.46,3.6,0.078,15.0,37.0,0.9973,3.35,0.86,12.8,8,2016-07-19
278,10.3,0.32,0.45,6.4,0.073,5.0,13.0,0.9976,3.23,0.82,12.6,8,2016-07-19
390,5.6,0.85,0.05,1.4,0.045,12.0,88.0,0.9924,3.56,0.82,12.9,8,2016-07-19


In [143]:
hQuality = pd.read_sql_query('SELECT * FROM wine WHERE quality > 7;', engine)

But wait! You just received a call: we not only want to view high-quality wines, we want to see high-quality wines with low acidity and medium alcohol content. Remember, we cannot include the wines already sold in this query.

In [178]:
%%sql 

SELECT * FROM wine WHERE quality > 7 AND fixedacidity < 7.5 AND sale_date is NULL; 

1 rows affected.


index,fixedacidity,volatileacidity,citricacid,residualsugar,chlorides,freesulfurdioxide,totalsulfurdioxide,density,ph,sulphates,alcohol,quality,sale_date
1403,7.2,0.33,0.33,1.7,0.061,3.0,13.0,0.996,3.23,1.1,10.0,8,


In [145]:
q = 'SELECT * FROM wine WHERE quality > 7 AND fixedacidity < 7.5 AND sale_date is NULL;'
hQuality2 = pd.read_sql_query(q, engine)
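
The "not already sold" condition depends on testing for `NULL` with `IS NULL`. Comparing with `= NULL` never matches, because any comparison with `NULL` evaluates to unknown rather than true. Since every wine with an alcohol content above 10 was given a sale date earlier, this filter also limits the result to wines at or below 10. The sketch below shows the difference on an in-memory SQLite database, used here as a stand-in for PostgreSQL. `wine_id` is a hypothetical replacement for the `index` column, and the two rows are taken from the query outputs above.

```python
import sqlite3

# Assumption: in-memory SQLite replaces the PostgreSQL server for this sketch.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE wine (wine_id INTEGER, quality INTEGER, sale_date TEXT)')
conn.executemany('INSERT INTO wine VALUES (?, ?, ?)',
                 [(267, 8, '2016-07-19'),  # sold
                  (1403, 8, None)])        # unsold, sale_date is NULL

# Correct test for missing values.
with_is_null = conn.execute(
    'SELECT wine_id FROM wine WHERE quality > 7 AND sale_date IS NULL'
).fetchall()

# Common mistake: "= NULL" is never true, so no rows come back.
with_equals = conn.execute(
    'SELECT wine_id FROM wine WHERE quality > 7 AND sale_date = NULL'
).fetchall()
conn.close()

print(with_is_null, with_equals)
```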

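Lastly, we want to round the density column to two decimals within the database. Because pandas stored `density` as `double precision`, cast it to `numeric` first: PostgreSQL's two-argument `ROUND` is only defined for `numeric`.

In [None]:
%%sql 

SELECT *, ROUND(density::numeric, 2) AS rounded_density FROM wine;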
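
A `SELECT` with `ROUND` only changes the query result; the stored values stay as they were. If the rounded values should be persisted, an `UPDATE` is one option. The sketch below contrasts the two on an in-memory SQLite database, which is an assumption made so the example runs without a PostgreSQL server (SQLite's `ROUND` accepts floating-point input directly, so no cast appears here). `wine_id` is a hypothetical stand-in for the `index` column, and the densities come from rows 45 and 54 of the data.

```python
import sqlite3

# Assumption: in-memory SQLite replaces the PostgreSQL server for this sketch.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE wine (wine_id INTEGER, density REAL)')
conn.executemany('INSERT INTO wine VALUES (?, ?)',
                 [(45, 0.9934), (54, 0.9955)])

# Rounding in a SELECT affects only the returned rows.
selected = conn.execute(
    'SELECT ROUND(density, 2) FROM wine ORDER BY wine_id').fetchall()
stored_before = conn.execute(
    'SELECT density FROM wine ORDER BY wine_id').fetchall()

# Rounding in an UPDATE overwrites the stored values (this loses precision).
conn.execute('UPDATE wine SET density = ROUND(density, 2)')
stored_after = conn.execute(
    'SELECT density FROM wine ORDER BY wine_id').fetchall()
conn.close()

print(selected, stored_before, stored_after)
```

Overwriting a column discards the original precision for good, so keeping the rounded value in a separate column or a view may be the safer design choice.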

**BONUS**: Continue to transform the data within the SQL database