
# SQL Python (Postgres specifically)

## Lecture Objectives

- Connect to a database from within a Python program and create databases and tables using psycopg2
- Understand the cursor, fetches, commits, and rollbacks
- Generate dynamic queries and avoid injection


### Psycopg2
A library that allows Python to connect to an existing PostgreSQL database to utilize SQL functionality.


### Documentation
* http://initd.org/psycopg/docs/install.html
* In addition to what's listed in the documentation, if you have the Anaconda distribution of Python, the following should work:
```shell
conda install psycopg2
```
* There are similar packages for other flavors of SQL that work much the same way

## General Workflow Reminder

1. Establish connection to Postgres database using psycopg2
2. Create a cursor
3. Use the cursor to execute SQL queries and retrieve data
4. Commit SQL actions
5. Close the cursor and connection
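
The five steps above map onto a handful of calls. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in, an assumption made only so that it runs without a Postgres server. psycopg2 follows the same DB-API pattern, with `pg2.connect(dbname=..., host=...)` in place of `sqlite3.connect(":memory:")`. The sample row is illustrative.

```python
import sqlite3  # stand-in for psycopg2 (assumption: no Postgres server available)

# 1. Establish a connection
conn = sqlite3.connect(":memory:")

# 2. Create a cursor
cur = conn.cursor()

# 3. Use the cursor to execute SQL and retrieve data
cur.execute("CREATE TABLE logins (userid integer, tmstmp text, type text)")
cur.execute("INSERT INTO logins VALUES (579, '2013-11-20 03:20:06', 'mobile')")
cur.execute("SELECT userid, type FROM logins")
rows = cur.fetchall()

# 4. Commit SQL actions
conn.commit()

# 5. Close the cursor and connection
cur.close()
conn.close()

print(rows)
```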

# Walkthrough 1: Creating a database from Python

We already know SQL, so let's learn how to use those skills in Python.

# Creating a connection with Postgres

In [1]:
import psycopg2 as pg2

### Connect to the database
- Connections must be established using an existing database, username, database IP/URL, and maybe passwords
- If you need to create a database, you can first connect to Postgres using the dbname 'postgres' to initialize

In [2]:
conn = pg2.connect(dbname='postgres', host='localhost') ## "user" input option available if necessary
conn.autocommit = True   ## This is required to remove or create databases

## Generally DON'T have autocommit true! If you make mistakes in changing your future tables,
## you can only rollback if you haven't committed those changes!

### Instantiate the Cursor

- A cursor is what we use to interact with the data in the Database 
- The cursor is a control structure that enables traversal over the records in a database
- Executes and fetches data
- When the cursor points at the resulting output of a query, it can only read each observation once.  If you choose to see a previously read observation, you must rerun the query. 
- Can be closed without closing the connection
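
A small sketch of the "read once" behaviour described above. It assumes Python's built-in `sqlite3` as a stand-in for psycopg2 (both follow the DB-API, so `fetchone`, `fetchall`, and `close` carry the same names), and the three user ids are made-up sample data.

```python
import sqlite3  # stand-in for psycopg2 (assumption: no Postgres server available)

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE logins (userid integer)")
cur.executemany("INSERT INTO logins VALUES (?)", [(579,), (823,), (953,)])

cur.execute("SELECT userid FROM logins ORDER BY userid")
first = cur.fetchone()      # reads one row and advances the cursor
rest = cur.fetchall()       # only the rows that have not been read yet
leftover = cur.fetchall()   # nothing left: the result set is used up

# Rerunning the query is what makes the first row readable again
cur.execute("SELECT userid FROM logins ORDER BY userid")
again = cur.fetchone()

# Closing the cursor leaves the connection usable
cur.close()
second_cur = conn.cursor()
second_cur.execute("SELECT count(*) FROM logins")
total = second_cur.fetchone()[0]
conn.close()
```

Here `first` and `again` hold the same row, while `leftover` is an empty list because every row had already been consumed.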

In [3]:
cur = conn.cursor()

###  Create a database

In [4]:
cur.execute('DROP DATABASE IF EXISTS class_example;')  # Removes the class_example database if one already exists
cur.execute('CREATE DATABASE class_example;')  # Creates a new Postgres database

### Disconnect from the cursor and database
- Cursors and connections must be closed using .close(), or else Postgres will lock certain operations on the database/tables until the connection is severed.

In [5]:
cur.close() # This is optional
conn.close() # Closing the connection also closes all cursors (DO NOT FORGET TO DO!!!!!)

# Using the new database

In [6]:
# We are connecting to the class_example database we just created 
conn = pg2.connect(dbname='class_example', host='localhost')
cur = conn.cursor()

### Creating a new table

In [7]:
query = '''
        CREATE TABLE logins (
            userid integer, 
            tmstmp timestamp, 
            type varchar(10)
        );
        '''




print(query)


        CREATE TABLE logins (
            userid integer, 
            tmstmp timestamp, 
            type varchar(10)
        );
        


In [8]:
cur.execute(query)   # Let's look at our database in the terminal and use \dt to list the tables

# Where is the new table?

- When modifying a database the transaction is not completed until we commit the command
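
The sketch below illustrates why a second session (such as psql in the terminal) does not see uncommitted work. It is an approximation under stated assumptions: it uses Python's built-in `sqlite3` with a temporary file instead of Postgres, and it demonstrates the effect with an `INSERT` rather than a `CREATE TABLE`, because in current Python versions the `sqlite3` module does not open a transaction for table creation by default. The names `writer` and `reader` are hypothetical.

```python
import os
import sqlite3  # stand-in for psycopg2 (assumption: no Postgres server available)
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "class_example.db")

    # "writer" plays the role of the notebook session
    writer = sqlite3.connect(db_path)
    w_cur = writer.cursor()
    w_cur.execute("CREATE TABLE logins (userid integer)")
    writer.commit()

    # "reader" plays the role of a second session, e.g. psql in the terminal
    reader = sqlite3.connect(db_path)
    r_cur = reader.cursor()

    w_cur.execute("INSERT INTO logins VALUES (579)")  # not committed yet
    r_cur.execute("SELECT count(*) FROM logins")
    seen_before_commit = r_cur.fetchall()[0][0]

    writer.commit()  # the transaction is now complete
    r_cur.execute("SELECT count(*) FROM logins")
    seen_after_commit = r_cur.fetchall()[0][0]

    reader.close()
    writer.close()

print(seen_before_commit, seen_after_commit)
```

The second session only sees the new row once the first session commits.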

In [9]:
conn.commit()

# Now let's import a .csv of data into our table

- We will import os so we can get the current working directory, which contains our .csv

In [10]:
import os
current_directory_path = os.getcwd()
current_directory_path

'/Users/mark.llorente/Desktop/Galvanize/DSI_Lectures/sql-python/mark_llorente'

In [11]:
# Load the first CSV file into the logins table
query = '''
        COPY logins 
        FROM '{0}/lecture-example/logins01.csv' 
        DELIMITER ',' 
        CSV;
        '''.format(current_directory_path)



cur.execute(query)
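
`COPY ... FROM 'file'` is read by the Postgres server, so the file name is interpreted from the server's point of view and an absolute path is the safest choice. A minimal sketch of building that path with `os.path` instead of joining strings by hand; `csv_path` is a hypothetical helper, and the folder name is taken from the cell above.

```python
import os


def csv_path(base_dir, file_name, folder="lecture-example"):
    """Return an absolute path to a CSV inside the lecture folder (hypothetical helper)."""
    return os.path.abspath(os.path.join(base_dir, folder, file_name))


path = csv_path(os.getcwd(), "logins01.csv")
print(path)
```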

### Let's take a look at the data


In [12]:
# pick someone to give the command

In [13]:
# query to get 20 records from the logins table
query = '''
            SELECT * FROM logins LIMIT 20;
        '''


In [14]:
cur.execute(query)

### Let's look at our data
- After executing the query, the cursor behaves like an iterator over the result set, so we can request the results in multiple ways
    - One row at a time
    - Multiple rows at a time
    - All rows at once

In [15]:
# talk about iters here (look at the record numbers)

In [16]:
# One row at a time

cur.fetchone()

(579, datetime.datetime(2013, 11, 20, 3, 20, 6), 'mobile')

In [17]:
# Multiple rows

cur.fetchmany(5)

[(823, datetime.datetime(2013, 11, 20, 3, 20, 49), 'web'),
 (953, datetime.datetime(2013, 11, 20, 3, 28, 49), 'web'),
 (612, datetime.datetime(2013, 11, 20, 3, 36, 55), 'web'),
 (269, datetime.datetime(2013, 11, 20, 3, 43, 13), 'web'),
 (799, datetime.datetime(2013, 11, 20, 3, 56, 55), 'web')]

In [18]:
# All rows 

cur.fetchall()

[(890, datetime.datetime(2013, 11, 20, 4, 2, 33), 'mobile'),
 (330, datetime.datetime(2013, 11, 20, 4, 54, 59), 'mobile'),
 (628, datetime.datetime(2013, 11, 20, 4, 57, 22), 'mobile'),
 (398, datetime.datetime(2013, 11, 20, 5, 3, 19), 'mobile'),
 (482, datetime.datetime(2013, 11, 20, 5, 4, 43), 'mobile'),
 (581, datetime.datetime(2013, 11, 20, 5, 12, 3), 'mobile'),
 (370, datetime.datetime(2013, 11, 20, 5, 26, 46), 'mobile'),
 (230, datetime.datetime(2013, 11, 20, 5, 28, 29), 'web'),
 (596, datetime.datetime(2013, 11, 20, 5, 28, 36), 'web'),
 (274, datetime.datetime(2013, 11, 20, 5, 43, 8), 'mobile'),
 (581, datetime.datetime(2013, 11, 20, 5, 47, 10), 'web'),
 (417, datetime.datetime(2013, 11, 20, 5, 54, 37), 'mobile'),
 (185, datetime.datetime(2013, 11, 20, 5, 56, 22), 'mobile'),
 (371, datetime.datetime(2013, 11, 20, 5, 58, 35), 'mobile')]

In [19]:
cur.execute('SELECT Count(*) FROM logins')

In [20]:
record_count = cur.fetchall()
print(record_count[0])

(10000,)


In [21]:
conn.commit()

### Since the cursor is iterable after a query, we can use a for loop to access the rows one at a time

In [22]:
cur.execute(query)
for record in cur:
    print("{}: user {} logged in via {}".format(record[1], record[0], record[2]))

2013-11-20 03:20:06: user 579 logged in via mobile
2013-11-20 03:20:49: user 823 logged in via web
2013-11-20 03:28:49: user 953 logged in via web
2013-11-20 03:36:55: user 612 logged in via web
2013-11-20 03:43:13: user 269 logged in via web
2013-11-20 03:56:55: user 799 logged in via web
2013-11-20 04:02:33: user 890 logged in via mobile
2013-11-20 04:54:59: user 330 logged in via mobile
2013-11-20 04:57:22: user 628 logged in via mobile
2013-11-20 05:03:19: user 398 logged in via mobile
2013-11-20 05:04:43: user 482 logged in via mobile
2013-11-20 05:12:03: user 581 logged in via mobile
2013-11-20 05:26:46: user 370 logged in via mobile
2013-11-20 05:28:29: user 230 logged in via web
2013-11-20 05:28:36: user 596 logged in via web
2013-11-20 05:43:08: user 274 logged in via mobile
2013-11-20 05:47:10: user 581 logged in via web
2013-11-20 05:54:37: user 417 logged in via mobile
2013-11-20 05:56:22: user 185 logged in via mobile
2013-11-20 05:58:35: user 371 logged in via mobile


# Dynamic Queries

- A dynamic query is a query that is generated based on context.


### Example

We have 8 login csv files that we need to insert into the logins table.  Instead of doing a COPY FROM query 8 times, we should utilize Python (or any future languages) to make this more efficient.  This is possible due to tokenized strings.

# WARNING: BEWARE OF SQL INJECTION

## NEVER use + or % to reformat strings to be used with .execute

Instead, pass values to `.execute` as query parameters to generate a query for each approved file.

**[WARNING: BEWARE OF SQL INJECTION](http://initd.org/psycopg/docs/usage.html)**
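
One way to decide which files count as "approved" is an allow-list check on the file name before it goes anywhere near a query. The sketch below is an illustration, not part of psycopg2: the pattern is an assumption based on the `loginsNN.csv` names used in this lecture, and `approved_files` is a hypothetical helper.

```python
import re

# Assumed naming scheme, based on the file names used in this lecture
APPROVED_NAME = re.compile(r"logins\d{2}\.csv")


def approved_files(file_names):
    """Keep only names that match the expected pattern exactly, in sorted order."""
    return sorted(name for name in file_names if APPROVED_NAME.fullmatch(name))


candidates = [
    "logins03.csv",
    "logins02.csv",
    "notes.txt",
    "../secrets.csv",
    "logins04.csv; DROP TABLE logins",
]
print(approved_files(candidates))
```

Only `logins02.csv` and `logins03.csv` survive the filter; anything with extra characters before or after the expected name is dropped.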

In [23]:
num = 579
terribly_unsafe = "SELECT * FROM logins WHERE userid = " + str(num) + ";"
print(terribly_unsafe)


date_cut = "2014-08-01"
horribly_risky = "SELECT * FROM logins WHERE tmstmp > %s;" % date_cut
print(horribly_risky)
## Python is happy, but if num or date_cut included something malicious
## your data could be at risk

SELECT * FROM logins WHERE userid = 579;
SELECT * FROM logins WHERE tmstmp > 2014-08-01;


### What is an SQL Injection Attack?

In [24]:
date_cut = "2014-08-01; DROP TABLE logins" # The user enters a date in a field on a web form
horribly_risky = "SELECT * FROM logins WHERE tmstmp > %s;" % date_cut
print(horribly_risky)

SELECT * FROM logins WHERE tmstmp > 2014-08-01; DROP TABLE logins;



<table align="center">
<tr><td>
<img src="stuff/exploits_of_a_momSQLClasss.png" width="600px" align="center"> 
</td></tr>
</table>

### Practice safe SQL with Psycopg2

**~Quick detour to presentation slides~**
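
The safe pattern is to keep the SQL text fixed and hand the values to `.execute` separately, so the driver treats them as data and never as SQL. The sketch below assumes Python's built-in `sqlite3` as a stand-in so that it runs without a Postgres server. Its placeholder is `?`, whereas psycopg2 uses `%s` or `%(name)s` with the values passed as the second argument to `.execute`. The two sample rows are made up.

```python
import sqlite3  # stand-in for psycopg2 (assumption: no Postgres server available)

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE logins (userid integer, tmstmp text, type text)")
cur.executemany(
    "INSERT INTO logins VALUES (?, ?, ?)",
    [(579, "2013-11-20 03:20:06", "mobile"), (823, "2014-09-01 10:00:00", "web")],
)

# The same malicious input as in the injection example
date_cut = "2014-08-01; DROP TABLE logins"

# The value is passed separately, so the whole string is compared as one value
cur.execute("SELECT userid FROM logins WHERE tmstmp > ?", (date_cut,))
matches = cur.fetchall()

# The table is still there and still holds both rows
cur.execute("SELECT count(*) FROM logins")
remaining = cur.fetchone()[0]
conn.close()

print(matches, remaining)
```

With a real `timestamp` column, Postgres would most likely reject this value as an invalid timestamp instead of comparing it as text; either way, the `DROP TABLE` is never executed.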

In [26]:
query = '''
        COPY logins 
        FROM %(file_path)s
        DELIMITER ','
        CSV;
        '''

In [27]:
path = os.getcwd()

folder_path = path + '/lecture-example/'
for file_name in os.listdir(folder_path):
    if file_name.endswith('.csv') and file_name != 'logins01.csv':
        path=folder_path+file_name
        cur.execute(query, {'file_path':path})
        print('{0} inserted into table.'.format(file_name))

logins08.csv inserted into table.
logins06.csv inserted into table.
logins07.csv inserted into table.
logins05.csv inserted into table.
logins04.csv inserted into table.
logins03.csv inserted into table.
logins02.csv inserted into table.


### Let's check the total number of records we have right now.

In [28]:
print("Old record count: {}".format(record_count))

cur.execute('SELECT count(*) FROM logins;')
record_count = cur.fetchone()

print("New record count: {}".format(record_count))

Old record count: [(10000,)]
New record count: (78588,)


### Transactions can be rolled back until they're committed

In [29]:
conn.rollback()

cur.execute('SELECT count(*) FROM logins;')
record_count = cur.fetchone()[0]

print("After rollback: {}".format(record_count))

After rollback: 10000
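
The same commit-then-rollback sequence in miniature. This sketch assumes Python's built-in `sqlite3` as a stand-in for psycopg2 (`commit` and `rollback` are DB-API methods on the connection in both), with a handful of made-up rows in place of the CSV files.

```python
import sqlite3  # stand-in for psycopg2 (assumption: no Postgres server available)

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE logins (userid integer)")

# First batch: committed, so it survives a later rollback
cur.executemany("INSERT INTO logins VALUES (?)", [(579,), (823,)])
conn.commit()

# Second batch: visible inside this session, but not committed
cur.executemany("INSERT INTO logins VALUES (?)", [(953,), (612,), (269,)])
cur.execute("SELECT count(*) FROM logins")
before_rollback = cur.fetchone()[0]

# Rollback discards everything since the last commit
conn.rollback()
cur.execute("SELECT count(*) FROM logins")
after_rollback = cur.fetchone()[0]
conn.close()

print(before_rollback, after_rollback)
```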


### Don't forget to commit your changes


In [30]:
conn.commit()

## And then close your connection

In [31]:
cur.close()
conn.close()


### Using With Statements


In [32]:
query = "SELECT count(*) FROM logins;"
with pg2.connect(dbname='class_example', host='localhost') as conn:
    with conn.cursor() as curs:
        print("Cursor inside with block: {}".format(curs))
        curs.execute(query)
    print("Cursor outside with block: {}".format(curs))

Cursor inside with block: <cursor object at 0x109c56900; closed: 0>
Cursor outside with block: <cursor object at 0x109c56900; closed: -1>


### Note that the connection is *not* closed automatically:

In [33]:
conn.close()
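
If you would rather not rely on remembering the final `conn.close()`, one option is `contextlib.closing` from the standard library, which calls `.close()` when the block exits. The sketch below assumes Python's built-in `sqlite3` as a stand-in; its connection behaves like psycopg2's in a `with` block, ending the transaction without closing the connection. With psycopg2 you could check the connection's `closed` attribute instead of the `try`/`except` used here.

```python
import sqlite3  # stand-in for psycopg2 (assumption: no Postgres server available)
from contextlib import closing

with closing(sqlite3.connect(":memory:")) as conn:  # closing() calls conn.close() on exit
    with conn:  # commits on success, rolls back on an exception
        cur = conn.cursor()
        cur.execute("CREATE TABLE logins (userid integer)")
        cur.execute("INSERT INTO logins VALUES (579)")
        cur.execute("SELECT count(*) FROM logins")
        row_count = cur.fetchone()[0]

# The connection is closed, so asking for a new cursor fails
try:
    conn.cursor()
    connection_closed = False
except sqlite3.ProgrammingError:
    connection_closed = True

print(row_count, connection_closed)
```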

## Lecture Objectives


- Connect to a database
    - How do we connect to a database?
    - What database name do we connect to to create a new database?
    - What do we need to do to create a database?
    
- Understanding the cursor
    - What does a cursor allow you to do?
    - When is the database updated/changed?
    - What does a rollback do?

- What are injection attacks and how do we avoid them?
    

# Key Things to Remember

* Connections must be established using an existing database, username, database IP/URL, and maybe passwords
* If you have not created any databases yet, you can connect to Postgres using the dbname 'postgres' to run database-level commands
* Data changes are not actually stored until you choose to commit. This can be done either through `conn.commit()` or by setting `conn.autocommit = True`.  Until committed, all changes in a transaction are only temporarily stored.
* `autocommit = True` is necessary for database-level commands like CREATE DATABASE.  This is because Postgres does not allow these commands to run inside a transaction block.
* If you ever need to build similar pipelines for other kinds of databases, there are libraries such as pyodbc which operate very similarly.
* SQL database connections use cursors for data traversal and retrieval.  A cursor is similar to an iterator in Python.
* Cursor operations typically go as follows:
    - execute a query
    - fetch rows from the query result if it is a SELECT query
    - because it is iterative, previously fetched rows can only be fetched again by rerunning the query
    - close the cursor through .close()
* Cursors and connections must be closed using .close(), or else Postgres will lock certain operations on the database/tables until the connection is severed.

# Morning Sprint

You'll be asked to make a pipeline using psycopg2 to streamline the process for running queries! This will combine your practice with OOP & psql in one project. It doesn't have many problems but it will be great practice for creating scripts and modules for database management in your future.

## Go get 'em! 
