
# SQL Python (Postgres specifically)

## Lecture Objectives

- Connect to a database from within a Python program and create databases and tables using psycopg2
- Understand the cursor, fetches, commits, and rollbacks
- Generate dynamic queries and avoid injection


### Psycopg2
A library that allows Python to connect to an existing PostgreSQL database to utilize SQL functionality.


### Documentation
* http://initd.org/psycopg/docs/install.html
* In addition to what's listed in the documentation, if you have the Anaconda distribution of Python, the following should work:
```bash
conda install psycopg2
```
* There are similar packages for other flavors of SQL that work much the same way

In [None]:
# install homebrew: http://brew.sh

# brew cask install postgres -> double click -> applications
### not needed ### brew cask install pgadmin4 -> double click -> applications, click plug
# brew tap homebrew/services
# brew services start postgresql

# https://github.com/zipfian/welcome/blob/master/notes/postgres_setup.md

# pip install psycopg2
# conda install psycopg2

## General Workflow

1. Establish connection to Postgres database using psycopg2
2. Create a cursor
3. Use the cursor to execute SQL queries and retrieve data
4. Commit SQL actions
5. Close the cursor and connection
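
The five steps can be sketched as one small helper. This is only a sketch under stated assumptions, not part of the lecture code: `fetch_all` and its `connect` argument are hypothetical names, and the demo call uses the standard library's `sqlite3` as a stand-in so that it runs without a Postgres server. Both `sqlite3` and psycopg2 follow Python's DB-API 2.0, so their connections and cursors expose the same basic methods.

```python
import sqlite3


def fetch_all(connect, query, params=None):
    """Run one query and return all rows, following the five workflow steps.

    `connect` is a hypothetical zero-argument callable that returns a
    DB-API 2.0 connection (for example a psycopg2 or sqlite3 connection).
    """
    conn = connect()                  # 1. establish the connection
    try:
        cur = conn.cursor()           # 2. create a cursor
        if params is None:            # 3. execute the query and fetch rows
            cur.execute(query)
        else:
            cur.execute(query, params)
        rows = cur.fetchall()
        conn.commit()                 # 4. commit
        cur.close()                   # 5. close the cursor ...
        return rows
    finally:
        conn.close()                  #    ... and always close the connection


# Stand-in demo: an in-memory SQLite database instead of Postgres.
rows = fetch_all(lambda: sqlite3.connect(":memory:"), "SELECT 1 + 1")
print(rows)
```

With psycopg2, the first argument would instead be something like `lambda: pg2.connect(dbname='class_example', host='localhost')`, using the names from the walkthrough that follows.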

# Walkthrough 1: Creating a database from Python

We already know SQL, so let's learn how to use those skills in Python.

# Creating a connection with Postgres

In [None]:
import psycopg2 as pg2

### Connect to the database
- A connection requires an existing database name, a username, the database host (IP or URL), and possibly a password
- If you need to create a database, first connect to Postgres using the dbname 'postgres' and create it from there

In [None]:
conn = pg2.connect(dbname='postgres', host='localhost')
conn.autocommit = True   ## This is required to remove or create databases

### Instantiate the Cursor

- A cursor is what we use to interact with the data in the database
- The cursor is a control structure that enables traversal over the records in a database
- It executes queries and fetches data
- When the cursor points at the output of a query, it can read each row only once.  To see a previously read row again, you must rerun the query.
- It can be closed without closing the connection
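
The read-once behavior can be seen in a short sketch. It assumes a stand-in database: the standard library's `sqlite3` with an in-memory table and made-up rows, so it runs without Postgres. A psycopg2 cursor offers the same `fetchone`, `fetchmany`, and `fetchall` methods.

```python
import sqlite3

# Stand-in: in-memory SQLite database; the table and rows are made up.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE logins (userid integer, type varchar(10))")
cur.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [(1, "web"), (2, "mobile"), (3, "web"), (4, "mobile")],
)

cur.execute("SELECT userid FROM logins ORDER BY userid")
first = cur.fetchone()        # the first row
next_two = cur.fetchmany(2)   # the two rows after it
rest = cur.fetchall()         # whatever is left
exhausted = cur.fetchall()    # nothing left to read

# Rerunning the query starts again from the first row.
cur.execute("SELECT userid FROM logins ORDER BY userid")
again = cur.fetchone()

print(first, next_two, rest, exhausted, again)
conn.close()
```

Each fetch call picks up where the previous one stopped, and only a fresh `execute` rewinds the cursor.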

In [None]:
cur = conn.cursor()

###  Create a database

In [None]:
cur.execute('DROP DATABASE IF EXISTS class_example;')  # Removes the class_example database if one already exists
cur.execute('CREATE DATABASE class_example;')  #to create a PSQL database

### Disconnect from the cursor and database
- Cursors and connections must be closed using .close(), or else Postgres will lock certain operations on the database/tables until the connection is severed.

In [None]:
cur.close() # This is optional
conn.close() # Closing the connection also closes all cursors (DO NOT FORGET TO DO!!!!!)

# Using the new database

In [None]:
# We are connecting to the class_example database we just created 
conn = pg2.connect(dbname='class_example', host='localhost')
cur = conn.cursor()

### Creating a new table

In [None]:
query = '''
        CREATE TABLE logins (
            userid integer, 
            tmstmp timestamp, 
            type varchar(10)
        );
        '''




print(query)

In [None]:
cur.execute(query)   # let's look at our database in the terminal and use \dt to list the tables

In [None]:
# Add question here about is the table done

# Where is the new table?

- When modifying a database, the transaction is not completed until we commit it

In [None]:
conn.commit()

# Now let's import a .csv of data into our table

- We will import os so we can get the current directory we are in which contains our .csv

In [None]:
import os
current_directory_path = os.getcwd()
current_directory_path

In [None]:
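# COPY ... FROM reads the file on the database server, so the path must be visible to Postgres (true here because it runs locally)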
query = '''
        COPY logins 
        FROM '{0}/logins_data/logins01.csv' 
        DELIMITER ',' 
        CSV;
        '''.format(current_directory_path)



cur.execute(query)

### Let's take a look at the data


In [None]:
# pick someone to give the command

In [None]:
# query to get 10 records from the logins table
query = '''
            SELECT * FROM logins LIMIT 10;
        '''


In [None]:
cur.execute(query)

### Let's look at our data
- Executing the query makes the cursor act like an iterator over the results, so we can request them in multiple ways
    - One row at a time
    - Multiple rows at a time
    - All rows at once

In [None]:
# talk about iters here (look at the record numbers)

In [None]:
# One row at a time

cur.fetchone()

In [None]:
# Multiple rows

cur.fetchmany(5)

In [None]:
# All rows 

cur.fetchall()

In [None]:
cur.execute('SELECT Count(*) FROM logins')

In [None]:
record_count = cur.fetchone()[0]  # count(*) returns a single row with a single value
print(record_count)

In [None]:
conn.commit()

### Since the cursor acts as an iterator over the query results, we can use a for loop to access them one at a time

In [None]:
cur.execute(query)
for record in cur:
    print("{}: user {} logged in via {}".format(record[1], record[0], record[2]))

# Dynamic Queries

- A dynamic query is a query whose text or parameters are generated at run time based on context.


### Example

We have 8 login CSV files that we need to insert into the logins table.  Instead of writing a COPY FROM query 8 times, we can use Python (or any other language) to generate the queries for us.  This is possible because a query string can contain placeholders (tokens) that are filled in at run time.

# WARNING: BEWARE OF SQL INJECTION

## NEVER use + or % to reformat strings to be used with .execute

Instead, put placeholders in the query and pass the values as the second argument to `.execute`, generating one query for each approved file.

**[WARNING: BEWARE OF SQL INJECTION](http://initd.org/psycopg/docs/usage.html)**

In [None]:
num = 579
terribly_unsafe = "SELECT * FROM logins WHERE userid = " + str(num) + ";"
print(terribly_unsafe)


date_cut = "2014-08-01"
horribly_risky = "SELECT * FROM logins WHERE tmstmp > %s;" % date_cut
print(horribly_risky)
## Python is happy, but if num or date_cut included something malicious
## your data could be at risk

### What is an SQL Injection Attack?

In [None]:
date_cut = "2014-08-01; DROP TABLE logins" # The user enters a date in a field on a web form
horribly_risky = "SELECT * FROM logins WHERE tmstmp > %s;" % date_cut
print(horribly_risky)
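
To see the effect against a real engine, here is a minimal sketch that assumes a stand-in database: the standard library's `sqlite3` with an in-memory table and made-up rows. Two differences from psycopg2 matter here. `sqlite3` refuses to run more than one statement per `execute`, so the sketch uses an `OR 1=1` payload rather than `DROP TABLE`, and it spells placeholders as `?` where psycopg2 uses `%s`.

```python
import sqlite3

# Stand-in: in-memory SQLite database with made-up rows.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE logins (userid integer, type varchar(10))")
cur.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [(579, "web"), (580, "mobile"), (581, "web")],
)

user_input = "579 OR 1=1"  # attacker-controlled text from a web form

# Unsafe: the input is pasted into the SQL text and changes the WHERE clause.
unsafe_query = "SELECT * FROM logins WHERE userid = " + user_input
leaked = cur.execute(unsafe_query).fetchall()

# Safe: the input is passed separately and is only ever compared as a value.
safe_query = "SELECT * FROM logins WHERE userid = ?"
matched = cur.execute(safe_query, (user_input,)).fetchall()

print(len(leaked), len(matched))
conn.close()
```

The concatenated query returns every row in the table, while the parameterized one matches nothing because no `userid` equals the literal text. Postgres, being stricter about types, would likely reject the value with an error instead of returning zero rows; either way the text never runs as SQL.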

<table align="center">
<tr><td>
<img src="stuff/exploits_of_a_momSQLClasss.png" width="600px" align="center"> 
</td></tr>
</table>

### Practice safe SQL with Psycopg2

In [None]:
## Show how it works with bad query
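
One way to handle the risky date query from earlier is sketched below. `logins_after` is a hypothetical helper, not part of psycopg2, and it assumes `cur` is an open psycopg2 cursor on the `class_example` database. The technique itself, passing values as the second argument to `.execute`, is the one psycopg2 documents.

```python
def logins_after(cur, date_cut):
    """Return logins newer than `date_cut` using a parameterized query.

    `cur` is assumed to be an open psycopg2 cursor; `logins_after` is a
    hypothetical helper written for this example.
    """
    # The SQL text is a constant. The value travels as the second argument,
    # so psycopg2 quotes it instead of splicing it into the statement.
    query = "SELECT * FROM logins WHERE tmstmp > %s;"
    cur.execute(query, (date_cut,))  # note the one-element tuple
    return cur.fetchall()
```

Calling `logins_after(cur, "2014-08-01")` runs the intended query. With the malicious input `"2014-08-01; DROP TABLE logins"`, the whole string is sent as one quoted value, so Postgres should reject it as an invalid timestamp rather than execute the `DROP TABLE`.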

In [None]:
query = '''
        COPY logins 
        FROM %(file_path)s
        DELIMITER ','
        CSV;
        '''

In [None]:
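path = os.getcwd()

folder_path = os.path.join(path, 'logins_data')
for file_name in os.listdir(folder_path):
    # logins01.csv was already loaded above, so skip it
    if file_name.endswith('.csv') and file_name != 'logins01.csv':
        file_path = os.path.join(folder_path, file_name)
        # psycopg2 quotes the value bound to %(file_path)s
        cur.execute(query, {'file_path': file_path})
        print('{0} inserted into table.'.format(file_name))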

### Let's check the total number of records we have right now.

In [None]:
print("Old record count: {}".format(record_count))

cur.execute('SELECT count(*) FROM logins;')
record_count = cur.fetchone()[0]

print("New record count: {}".format(record_count))

### Transactions can be rolled back until they're committed

In [None]:
conn.rollback()

cur.execute('SELECT count(*) FROM logins;')
record_count = cur.fetchone()[0]

print("After rollback: {}".format(record_count))
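
The same commit and rollback behavior can be reproduced without a Postgres server. The sketch below assumes a stand-in database, the standard library's `sqlite3` with an in-memory table and made-up rows; `commit()` and `rollback()` are connection methods in psycopg2 as well.

```python
import sqlite3

# Stand-in: in-memory SQLite database with made-up rows.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE logins (userid integer, type varchar(10))")

cur.execute("INSERT INTO logins VALUES (1, 'web')")
conn.commit()                        # row 1 is now permanent

cur.execute("INSERT INTO logins VALUES (2, 'mobile')")
cur.execute("SELECT count(*) FROM logins")
before_rollback = cur.fetchone()[0]  # the open transaction sees both rows

conn.rollback()                      # discard everything since the last commit

cur.execute("SELECT count(*) FROM logins")
after_rollback = cur.fetchone()[0]   # only the committed row remains

print(before_rollback, after_rollback)
conn.close()
```

Only the insert made after the last commit is undone, which mirrors what happens to the extra CSV files loaded in the cells above.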

### Don't forget to commit your changes


In [None]:
conn.commit()

## And then close your connection

In [None]:
cur.close()
conn.close()


### Using With Statements


In [None]:
query = "SELECT count(*) FROM logins;"
with pg2.connect(dbname='class_example', host='localhost') as conn:
    with conn.cursor() as curs:
        print("Cursor inside with block: {}".format(curs))
        curs.execute(query)
    print("Cursor outside with block: {}".format(curs))

### Note that the connection is *not* closed automatically:

In [None]:
conn.close()
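
If you would rather have the connection closed for you, one option is `contextlib.closing` from the standard library. The sketch below assumes a stand-in database (`sqlite3`, in memory) so that it runs anywhere. Its connections behave like psycopg2's in this respect: `with conn:` ends the transaction but leaves the connection open.

```python
import sqlite3
from contextlib import closing

# Stand-in: an in-memory SQLite database instead of Postgres.
with closing(sqlite3.connect(":memory:")) as conn:
    with conn:  # commit on success, rollback on error
        conn.execute("CREATE TABLE logins (userid integer)")
        conn.execute("INSERT INTO logins VALUES (579)")
    # Still open here: the connection can be used after the inner block.
    count = conn.execute("SELECT count(*) FROM logins").fetchone()[0]

# closing() has now called conn.close(); further use raises an error.
try:
    conn.execute("SELECT 1")
    closed = False
except sqlite3.ProgrammingError:
    closed = True

print(count, closed)
```

With psycopg2 the outer line would become `with closing(pg2.connect(dbname='class_example', host='localhost')) as conn:`.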

## Lecture Objectives


- Connect to a database
    - How do we connect to a database?
    - What database name do we connect to to create a new database?
    - What do we need to do to create a database?
    
- Understanding the cursor
    - What does a cursor allow you to do?
    - When is the database updated/changed?
    - What does a rollback do?

- What are injection attacks and how do we avoid them?
    

# Key Things to Remember

* A connection requires an existing database name, a username, the database host (IP or URL), and possibly a password.
* If you have not created any databases yet, you can connect to Postgres using the dbname 'postgres' to run database-level commands.
* Data changes are not actually stored until you commit them, either by calling `conn.commit()` or by setting `conn.autocommit = True`.  Until they are committed, changes are only stored temporarily and can be rolled back.
* `conn.autocommit = True` is necessary for database-level commands like CREATE DATABASE, because Postgres cannot run CREATE DATABASE or DROP DATABASE inside a transaction block.
* If you ever need to build similar pipelines for other kinds of databases, there are libraries such as pyodbc that operate very similarly.
* SQL database connections use cursors for data traversal and retrieval.  A cursor behaves much like an iterator in Python.
* Cursor operations typically go as follows:
    - execute a query
    - fetch rows from the query result if it is a SELECT query
    - because the cursor is iterative, previously fetched rows can only be fetched again by rerunning the query
    - close the cursor with .close()
* Cursors and connections must be closed using .close(), or else Postgres will lock certain operations on the database/tables until the connection is severed.

In [None]:
## RESTATE OBJ use quiz of keys ect to do 

In [None]:
###