# Python Charmers 

## Mini Workshop 2: SQL and Python

### Lesson Overview
- **Objective:** We'll connect to Snowflake to read & write data, and execute queries.
- **Source materials:** [The Information Lab NL](https://theinformationlab.nl/2022/09/23/connect-to-snowflake-from-python-script-sso/)
- **Prerequisites:** [Lesson 3 Getting Started with Pandas](./fundamentals-03-getting-started-with-pandas.ipynb)
- **Duration:** 30 mins

One of the big limitations of SQL is that it lacks the functionality that Python's open-source libraries and packages provide. This tutorial gives us the best of both worlds: the data stays in a SQL database, but we can work with it using Python.

## Python Libraries

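For these tasks we'll need the following libraries:
- `snowflake.connector` - for connecting to Snowflake (installed with the `snowflake-connector-python` package)
- `pandas` - for creating DataFrames
- `snowflake.connector.pandas_tools` - for moving data between Snowflake and pandas (needs the `"snowflake-connector-python[pandas]"` extra)
- `json` - to read data stored in a config file (part of Python's standard library, so there is nothing to install)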

Please make sure you can run the cell below. If you can't, install the libraries as described in [Lesson 2 Packages](./fundamentals-02-packages.ipynb); it is recommended that you then close this file and start a new notebook.

```python
import snowflake.connector as sf
import pandas as pd
from snowflake.connector.pandas_tools import write_pandas
import json
```
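If the cell above raises an `ImportError`, the sketch below reports every workshop module Python can't find, rather than stopping at the first failure. It relies only on the standard library's `importlib.util.find_spec`; `missing_modules` is a hypothetical helper name, not part of any library.

```python
import importlib.util

# Modules imported in the workshop's first cell
REQUIRED_MODULES = ["snowflake.connector", "pandas", "json"]


def missing_modules(names):
    """Return the names from `names` that Python cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # For dotted names, find_spec imports the parent package first
            # and raises if that parent is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


print(missing_modules(REQUIRED_MODULES))
```

An empty list means everything was found; anything listed needs installing as described in Lesson 2.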

## Connecting to Snowflake

We'll log in via Single Sign-On (SSO) in the browser.

For this you will need:
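- **Account**: your account identifier, i.e. the **account_identifier** part of your Snowflake login URL, **account_identifier**.snowflakecomputing.com
- **User**: your full email address, e.g. first.last@workplace.co.uk
- **Authenticator**: set this to `'externalbrowser'`, which opens your browser to complete the SSO login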

### 1. Set up Config File

```python
# TO DO
# Under python_charmers/data there is a JSON file, "config_sql_workshop.json".
# Download this file and fill in your username and Snowflake account identifier.
# Re-upload the file and run this cell; you should see your username printed.

import json

# Load the config file into a dictionary
with open('../data/config_sql_workshop.json', 'r') as file:
    config = json.load(file)

user = config['user']
account = config['account']

print(user)
```
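A typo in the config file usually only shows up later as a failed login, so it can help to check the file as soon as it is loaded. Below is a minimal sketch, assuming the config holds the `user` and `account` keys read above; `load_snowflake_config` is a hypothetical helper, and the demo writes placeholder values to a temporary file rather than touching your real config.

```python
import json
import os
import tempfile

REQUIRED_KEYS = ("user", "account")  # the keys the workshop cell reads


def load_snowflake_config(path):
    """Load a JSON config file and raise a clear error if a required value is missing or blank."""
    with open(path, "r") as file:
        config = json.load(file)
    missing = [key for key in REQUIRED_KEYS if not str(config.get(key) or "").strip()]
    if missing:
        raise ValueError(f"{path} has no value for: {', '.join(missing)}")
    return config


# Demo with a throwaway file; these are placeholder values, not real credentials
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "config_demo.json")
    with open(demo_path, "w") as file:
        json.dump({"user": "first.last@workplace.co.uk", "account": "my_account_identifier"}, file)
    demo_config = load_snowflake_config(demo_path)

print(demo_config["user"])
```

In the workshop you would call `config = load_snowflake_config('../data/config_sql_workshop.json')` instead of loading the file directly.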
### 2. Build a Connection to Snowflake

```python
# Load libraries
import snowflake.connector as sf
import pandas as pd

# Open a connection; 'externalbrowser' opens a browser window for the SSO login
ctx = sf.connect(
    account=account,
    user=user,
    authenticator='externalbrowser'
)

# Create a cursor object
cur = ctx.cursor()

# Execute a statement that returns the Snowflake version, then print it
sql = "SELECT current_version()"
cur.execute(sql)
one_row = cur.fetchone()
print(one_row[0])
```
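The cells in this workshop keep `ctx` and `cur` open and never close them. Because the Snowflake connector follows Python's DB-API 2.0 (PEP 249) interface, one option is a small helper that opens a cursor, runs a query and always closes the cursor again. The sketch below assumes only that interface; `run_query` is a hypothetical helper, and Python's built-in `sqlite3` stands in for Snowflake so it runs without an account.

```python
import sqlite3
from contextlib import closing


def run_query(conn, sql, params=None):
    """Run one statement on a DB-API connection and return all rows.

    closing() guarantees cur.close() is called, even if execute() raises.
    """
    with closing(conn.cursor()) as cur:
        if params is None:
            cur.execute(sql)
        else:
            cur.execute(sql, params)
        return cur.fetchall()


# sqlite3 is only a local stand-in for a Snowflake connection here
with closing(sqlite3.connect(":memory:")) as conn:
    rows = run_query(conn, "SELECT 1 + 1")

print(rows)  # [(2,)]
```

With Snowflake you would pass the workshop's connection instead, e.g. `run_query(ctx, "SELECT current_version()")`, and call `ctx.close()` once you are finished with the session.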

## Read Data From Snowflake

```python
# Execute a statement that will generate a result set
sql = "SELECT * FROM TIL_PLAYGROUND.PREPPIN_DATA_INPUTS.PD2023_WK01;"
cur.execute(sql)

# Fetch the result set from the cursor and return it as a pandas DataFrame
df = cur.fetch_pandas_all()
print(df)
```

```text
    TRANSACTION_CODE  VALUE  CUSTOMER_CODE  ONLINE_OR_IN_PERSON  \
0    DTB-716-679-576   1448         100001                    2   
1     DS-976-542-770   7364         100009                    1   
2     DS-551-937-380    475         100002                    1   
3     DS-726-686-279   9455         100006                    2   
4     DS-849-981-514   8500         100000                    2   
..               ...    ...            ...                  ...   
360  DSB-448-546-348   4525         100009                    1   
361  DSB-474-374-857   5375         100000                    2   
362   DS-367-545-264   7957         100007                    2   
363  DSB-807-592-406   5520         100005                    1   
364   DS-795-814-303   7839         100001                    2   

        TRANSACTION_DATE  
0    20/03/2023 00:00:00  
1    09/10/2023 00:00:00  
2    11/10/2023 00:00:00  
3    10/08/2023 00:00:00  
4    29/10/2023 00:00:00  
..                   ...  
```
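`fetch_pandas_all()` is a Snowflake-specific cursor method that relies on the connector's `[pandas]` extra. If that extra isn't available, a DataFrame can also be built from any DB-API cursor using `cursor.description` (the first field of each entry is the column name) and `fetchall()`, which is fine for small result sets. A minimal sketch, with `sqlite3` standing in for Snowflake and three rows copied from the output above; `fetch_dataframe` is a hypothetical helper.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def fetch_dataframe(cur):
    """Turn the cursor's current result set into a DataFrame using only DB-API calls."""
    columns = [col[0] for col in cur.description]  # first field of each entry is the column name
    return pd.DataFrame(cur.fetchall(), columns=columns)


# sqlite3 stands in for Snowflake; demo_cur avoids overwriting the workshop's `cur`
with closing(sqlite3.connect(":memory:")) as conn:
    demo_cur = conn.cursor()
    demo_cur.execute("CREATE TABLE pd2023_wk01 (transaction_code TEXT, value INTEGER)")
    demo_cur.executemany(
        "INSERT INTO pd2023_wk01 VALUES (?, ?)",
        [("DTB-716-679-576", 1448), ("DS-976-542-770", 7364), ("DS-551-937-380", 475)],
    )
    demo_cur.execute("SELECT * FROM pd2023_wk01")
    df_sample = fetch_dataframe(demo_cur)
    demo_cur.close()

print(df_sample)
```

On Snowflake, calling `fetch_dataframe(cur)` after `cur.execute(sql)` should give the same columns as `fetch_pandas_all()`, although the column data types may differ.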

```python
# You aren't limited to SELECT *: you can run any SQL code.
# For longer queries, wrap the SQL in triple quotes (''') so the string
# can span multiple lines.
# Alternatively, you may want to read in a SQL script - more on that later.

# Set the database and schema for this session
sql = "USE TIL_PLAYGROUND.PREPPIN_DATA_INPUTS;"
cur.execute(sql)

# Query a table in that schema
sql = '''
SELECT 
SPLIT_PART(transaction_code,'-',1) as bank, 
SUM(value) as total_value 
FROM pd2023_wk01 
GROUP BY SPLIT_PART(transaction_code,'-',1);
'''
print(sql)

# Execute the query
cur.execute(sql)

# Fetch the result set from the cursor and return it as a pandas DataFrame
df = cur.fetch_pandas_all()
print(df)
```


```text
SELECT 
SPLIT_PART(transaction_code,'-',1) as bank, 
SUM(value) as total_value 
FROM pd2023_wk01 
GROUP BY SPLIT_PART(transaction_code,'-',1);

  BANK  TOTAL_VALUE
0  DTB       618238
1   DS       653940
2  DSB       530489
```
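The same aggregation can also be done on the Python side once raw rows are in a DataFrame, which is the "best of both worlds" idea from the introduction. The sketch below uses the first five rows shown in the earlier `PD2023_WK01` output: `str.split("-").str[0]` plays the role of `SPLIT_PART(transaction_code, '-', 1)`, and `groupby(...).sum()` the role of `SUM(value) ... GROUP BY`. The name `df_rows` is illustrative and chosen so it doesn't overwrite the workshop's `df`.

```python
import pandas as pd

# The first five rows shown in the PD2023_WK01 output
df_rows = pd.DataFrame({
    "TRANSACTION_CODE": ["DTB-716-679-576", "DS-976-542-770", "DS-551-937-380",
                         "DS-726-686-279", "DS-849-981-514"],
    "VALUE": [1448, 7364, 475, 9455, 8500],
})

# Equivalent of SPLIT_PART(transaction_code, '-', 1): keep the text before the first '-'
df_rows["BANK"] = df_rows["TRANSACTION_CODE"].str.split("-").str[0]

# Equivalent of SUM(value) ... GROUP BY bank
totals = (
    df_rows.groupby("BANK", as_index=False)["VALUE"].sum()
    .rename(columns={"VALUE": "TOTAL_VALUE"})
)
print(totals)
```

Running the aggregation in SQL lets Snowflake do the work so only three rows travel back to Python; doing it in pandas is handier when the next steps need Python-only functionality anyway.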


## Write Data To Snowflake

```python
# Write the DataFrame to Snowflake (auto_create_table creates the table if it doesn't exist)
from snowflake.connector.pandas_tools import write_pandas

write_pandas(conn=ctx, df=df, table_name='PY_TEST_PD2021_WK01',
             database='TIL_PLAYGROUND', schema='TEMP', auto_create_table=True)
```

```text
(True,
 1,
 3,
 [('gsrlzfmsav/file0.txt', 'LOADED', 3, 3, 1, 0, None, None, None, None)])
```
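`write_pandas` returns a tuple of (whether every chunk loaded, number of chunks, number of rows loaded, output of the `COPY INTO` command), which is what the output above shows: one chunk, three rows. A small check like the sketch below makes a failed or partial load obvious instead of silently carrying on; `check_write_result` is a hypothetical helper, and the demo reuses the tuple printed above so it runs without a connection.

```python
def check_write_result(result, expected_rows):
    """Unpack a write_pandas() result and raise if the load failed or is incomplete."""
    success, num_chunks, num_rows, _copy_output = result
    if not success or num_rows != expected_rows:
        raise RuntimeError(
            f"write_pandas loaded {num_rows} of {expected_rows} rows (success={success})"
        )
    return f"Loaded {num_rows} rows in {num_chunks} chunk(s)"


# The tuple returned by write_pandas in the cell above
example_result = (True, 1, 3, [('gsrlzfmsav/file0.txt', 'LOADED', 3, 3, 1, 0, None, None, None, None)])
print(check_write_result(example_result, expected_rows=3))  # Loaded 3 rows in 1 chunk(s)
```

Against Snowflake you would wrap the real call, passing the same arguments as the cell above and `expected_rows=len(df)`.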

## Read a SQL Script and Execute it

```python
# Read the whole SQL file into a single string; `with` closes the file afterwards
with open('../data/example_sql_script.sql', 'r') as fd:
    sqlFile = fd.read()

print(sqlFile)

cur.execute(sqlFile)

# Fetch the result set from the cursor and return it as a pandas DataFrame
df = cur.fetch_pandas_all()
print(df)
```

```text
SELECT * FROM TIL_PLAYGROUND.PREPPIN_DATA_INPUTS.PD2021_WK01;
    ORDER_ID CUSTOMER_AGE  BIKE_VALUE EXISTING_CUSTOMER        DATE  \
0          1           22         481                No  2021-04-25   
1          2           28        1825                No  2021-01-23   
2          3           51        1903                No  2021-07-03   
3          4           59        1059                No  2021-01-24   
4          5           44        1764               Yes  2021-08-12   
..       ...          ...         ...               ...         ...   
995      996           42        3460                No  2021-09-03   
996      997           48        3409                No  2021-01-17   
997      998           81        2534                No  2021-03-26   
998      999           37        4312               Yes  2021-05-25   
999     1000           35        1200               Yes  2021-02-18   

            STORE_BIKE  
0          York - Road  
1          York - Road  
```
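By default, `cur.execute` runs a single statement, so a script holding several statements separated by `;` needs extra handling. One simple option, sketched below under the assumption that no semicolons appear inside string literals or comments, is to split the file and execute each statement in turn; `read_sql_statements` is a hypothetical helper. The Snowflake connector also provides `ctx.execute_string()` for running multi-statement strings.

```python
import tempfile
from pathlib import Path


def read_sql_statements(path):
    """Read a .sql file and return its statements as a list of strings.

    Naive split on ';': this breaks if a semicolon appears inside a string
    literal or a comment.
    """
    text = Path(path).read_text()
    return [stmt.strip() for stmt in text.split(";") if stmt.strip()]


# Demo with a temporary two-statement script
with tempfile.TemporaryDirectory() as tmp:
    script_path = Path(tmp) / "demo_script.sql"
    script_path.write_text(
        "SELECT * FROM TIL_PLAYGROUND.PREPPIN_DATA_INPUTS.PD2021_WK01;\n"
        "SELECT * FROM TIL_PLAYGROUND.PREPPIN_DATA_INPUTS.PD2022_WK01;\n"
    )
    statements = read_sql_statements(script_path)

for stmt in statements:
    print(stmt)
```

In the workshop you would loop with `for stmt in read_sql_statements('../data/example_sql_script.sql'): cur.execute(stmt)`, keeping in mind that afterwards `cur` only holds the results of the last statement.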

## Update and Create SQL Scripts

```python
# Read the example SQL script
with open('../data/example_sql_script.sql', 'r') as fd:
    sqlFile = fd.read()

# Current SQL script
print(sqlFile)

# Swap the table name and view the result
sqlFile2 = sqlFile.replace('PD2021', 'PD2022')
print(sqlFile2)

# Write the modified script to a new file
with open('../data/example_sql_script2.sql', 'w') as f:
    f.write(sqlFile2)
```

```text
SELECT * FROM TIL_PLAYGROUND.PREPPIN_DATA_INPUTS.PD2021_WK01;
SELECT * FROM TIL_PLAYGROUND.PREPPIN_DATA_INPUTS.PD2022_WK01;
```
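The `replace` approach in the cell above extends naturally to producing several scripts from one template, e.g. one per year. A minimal sketch, assuming the template contains the literal text `PD2021` as the example script does; the function name, the list of replacement values and the file-naming scheme are all hypothetical, and the demo writes to a temporary folder rather than `../data`.

```python
import tempfile
from pathlib import Path


def write_script_variants(template, placeholder, replacements, out_dir):
    """Write one .sql file per replacement value and return the paths written."""
    paths = []
    for value in replacements:
        path = Path(out_dir) / f"example_sql_script_{value}.sql"
        path.write_text(template.replace(placeholder, value))
        paths.append(path)
    return paths


template = "SELECT * FROM TIL_PLAYGROUND.PREPPIN_DATA_INPUTS.PD2021_WK01;\n"

with tempfile.TemporaryDirectory() as tmp:
    written = write_script_variants(template, "PD2021", ["PD2022", "PD2023"], tmp)
    names = [p.name for p in written]
    contents = [p.read_text() for p in written]

print(names)
print(contents)
```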


## Additional Resources
- 📰 **Snowflake Docs** - Python Connector Pandas - https://docs.snowflake.com/en/developer-guide/python-connector/python-connector-pandas
- 📺 **Dot PI** - Connect Python to Snowflake - https://www.youtube.com/watch?v=P560h6vZ4_Y
- 📰 **pyodbc library** - Connecting to SQL via ODBC connections - https://github.com/mkleehammer/pyodbc/wiki
- 📺 **Sigma Coding** - How to Use PYODBC With SQL Servers in Python - https://www.youtube.com/watch?v=eDXX5evRgQw

## Summary

In this workshop we connected to Snowflake, read data from and wrote data to a database, and worked with reading and modifying SQL script files.