## Outline
* Sources:
    * https://www.kaggle.com/kingburrito666/cannabis-strains/downloads/cannabis-strains.zip/9
    * https://www.kaggle.com/tictactouka/cannabis
    
*Steps*
* Extract data from the SQLite file and read its tables into DataFrames with the extracted columns
    * Intermediary step: transform those DFs by selecting columns, renaming, and possibly joining based on the foreign key relationships
* Extract data from the CSV file and read it into a DataFrame with the extracted columns
* Transform the DFs with an inner join on strain name
* Load the resulting DataFrame into a PostgreSQL database

*Questions*
* Is it OK that the final result has only one table, or do we need multiple tables?
* Does this produce a database of sufficient size?

## Final Table columns:
* Id (serial)
* Strain (sql inner join with csv)
* Type (csv Type)
* Breeder
* Rating_1 (csv - community rating)
* Rating_2 (sql - medical effect rating)
* Effects (csv - 
* Medical Effects
* Flavor
* Description

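As a rough sketch of the table outlined above, the column list could map to a schema like the one below. The table name `strains`, the snake_case column names, and every column type are assumptions, since the outline only names the columns. SQLite (in memory) stands in for PostgreSQL so the snippet runs on its own; in PostgreSQL the `id` column would be declared `SERIAL`.

```python
import sqlite3

# Column types are assumptions; the outline only names the columns.
ddl = """
CREATE TABLE strains (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- SERIAL in PostgreSQL
    strain TEXT NOT NULL UNIQUE,           -- join key between both sources
    type TEXT,
    breeder TEXT,
    rating_1 REAL,                         -- community rating (csv)
    rating_2 REAL,                         -- medical effect rating (sql)
    effects TEXT,
    medical_effects TEXT,
    flavor TEXT,
    description TEXT
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(ddl)

# List the column names SQLite registered for the table
columns = [row[1] for row in conn.execute("PRAGMA table_info(strains)")]
print(columns)
conn.close()
```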
In [3]:
import pandas as pd
from sqlalchemy import create_engine

# Process SQLite File
* For reference: ~/classwork/10-Advanced-Data-Storage-and-Retrieval/2/Activities/03-Ins_Basic_Updating/Solved/Ins_Basic_Updating.ipynb

### Import SQLite file

In [11]:
# Create the connection engine
engine = create_engine("sqlite:///Resources/strains.sqlite")
conn = engine.connect()

### Convert Strains table into DataFrame
* Extract columns:
    * id (fk for MedicalEffects), name, breeder (fk for Breeders id)
* Rename into this:
    * strain_id, strain_name, breeder_id

In [14]:
strains = pd.read_sql("SELECT id AS strain_id, name AS strain_name, breeder AS breeder_id FROM Strains", conn)

In [15]:
strains.head()

Unnamed: 0,strain_id,strain_name,breeder_id
0,1,Af-Pak,1
1,2,00 Cheese,2
2,3,Alien BubbleGum,3
3,4,Cherry OG Kush,4
4,5,Ak 420,5


### Convert MedicalEffects Table into DataFrame
* Extract these columns:
    * strain_id, name, info, rating
* And rename them into this format:
    * strain_id, medical_effect, medical_info, medical_rating

In [17]:
medical_effects = pd.read_sql("SELECT strain_id, name as medical_effect, info as medical_info, rating as medical_rating FROM MedicalEffects", conn)

In [18]:
medical_effects.head()

Unnamed: 0,strain_id,medical_effect,medical_info,medical_rating
0,6,Anorexia and Cachexia,Affects / helps even in small doses very well ...,4.0
1,6,Autoimmune Diseases and Inflammation,Affects / helps even in small doses very well ...,4.0
2,7,Psychiatric Symptoms,Affects / helps even in small doses extremly w...,5.0
3,12,Autoimmune Diseases and Inflammation,Affects / helps even in small doses extremly w...,5.0
4,19,Pain,Affects / helps even in small doses very well ...,4.0


### Convert Breeders table into DataFrame
* Extract these columns:
    * id, name
* Rename into this format:
    * breeder_id, breeder_name

In [19]:
breeders = pd.read_sql("SELECT id as breeder_id, name as breeder_name FROM Breeders", conn)

In [20]:
breeders.head()

Unnamed: 0,breeder_id,breeder_name
0,1,210Beans
1,2,00 Seeds Bank
2,3,207 Seeds
3,4,420 Seeds
4,5,420 Genetics

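The three DataFrames above relate through `breeder_id` and `strain_id`, so the intermediary join from the outline might look like the sketch below. It uses small stand-in frames so it runs on its own; the medical rows are illustrative values rather than real data, and collapsing the effects to one row per strain (joined names, mean rating) is an assumption about how the one-to-many relationship should be flattened.

```python
import pandas as pd

# Tiny stand-in frames; column names mirror the renamed columns above.
strains = pd.DataFrame({
    "strain_id": [1, 2, 3],
    "strain_name": ["Af-Pak", "00 Cheese", "Alien BubbleGum"],
    "breeder_id": [1, 2, 3],
})
breeders = pd.DataFrame({
    "breeder_id": [1, 2, 3],
    "breeder_name": ["210Beans", "00 Seeds Bank", "207 Seeds"],
})
# Illustrative rows only, not taken from the real MedicalEffects table.
medical_effects = pd.DataFrame({
    "strain_id": [1, 1, 3],
    "medical_effect": ["Pain", "Psychiatric Symptoms", "Pain"],
    "medical_rating": [4.0, 5.0, 4.0],
})

# Collapse the one-to-many medical effects to one row per strain.
medical_summary = (
    medical_effects.groupby("strain_id")
    .agg(
        medical_effects=("medical_effect", ", ".join),
        medical_rating=("medical_rating", "mean"),
    )
    .reset_index()
)

# Left joins keep strains that have no breeder match or no medical effects.
sqlite_df = (
    strains.merge(breeders, on="breeder_id", how="left")
    .merge(medical_summary, on="strain_id", how="left")
    .drop(columns=["breeder_id"])
)
print(sqlite_df)
```

Left joins are used here so that strains without any medical effect rows are not dropped before the later join against the CSV data.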

# Process cannabis.csv 

### Convert csv to DataFrame

In [23]:
file = 'Resources/cannabis.csv'
cannabis_df = pd.read_csv(file)

### Create processed DataFrame
* Extract columns: Strain, Type, Rating, Effects, Flavor, Description
* Rename columns: Rating to Community Rating, Effects to Social Effects

In [24]:
cannabis_cols = ["Strain", "Type", "Rating", "Effects", "Flavor", "Description"]
cannabis_transformed = cannabis_df[cannabis_cols].copy()

In [25]:
cannabis_transformed = cannabis_transformed.rename(
    columns={"Rating": "Community Rating", "Effects": "Social Effects"}
)

In [27]:
cannabis_transformed.drop_duplicates("Strain", inplace=True)

In [32]:
# drop=True discards the old index instead of adding it as an "index" column
cannabis_transformed = cannabis_transformed.reset_index(drop=True)

In [33]:
cannabis_transformed.head()

Unnamed: 0,Strain,Type,Community Rating,Social Effects,Flavor,Description
0,100-Og,hybrid,4.0,"Creative,Energetic,Tingly,Euphoric,Relaxed","Earthy,Sweet,Citrus",$100 OG is a 50/50 hybrid strain that packs a ...
1,98-White-Widow,hybrid,4.7,"Relaxed,Aroused,Creative,Happy,Energetic","Flowery,Violet,Diesel",The ‘98 Aloha White Widow is an especially pot...
2,1024,sativa,4.4,"Uplifted,Happy,Relaxed,Energetic,Creative","Spicy/Herbal,Sage,Woody",1024 is a sativa-dominant hybrid bred in Spain...
3,13-Dawgs,hybrid,4.2,"Tingly,Creative,Hungry,Relaxed,Uplifted","Apricot,Citrus,Grapefruit",13 Dawgs is a hybrid of G13 and Chemdawg genet...
4,24K-Gold,hybrid,4.6,"Happy,Relaxed,Euphoric,Uplifted,Talkative","Citrus,Earthy,Orange","Also known as Kosher Tangie, 24k Gold is a 60%..."


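The two sources format strain names differently: the SQLite names use spaces and mixed case (`Alien BubbleGum`), while the CSV names use hyphens (`98-White-Widow`). An inner join on the raw names would therefore miss most matches. The sketch below shows one possible approach, joining on a normalized key. The helper `normalize_name`, the `join_key` column, and all rows are hypothetical and chosen only to show the two naming styles.

```python
import pandas as pd


def normalize_name(name: str) -> str:
    """Lowercase a strain name and keep only letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


# Hypothetical rows illustrating the two naming styles.
sqlite_side = pd.DataFrame({
    "strain_name": ["Cherry OG Kush", "Alien BubbleGum", "Ak 420"],
    "breeder_name": ["Breeder A", "Breeder B", "Breeder C"],
})
csv_side = pd.DataFrame({
    "Strain": ["Cherry-Og-Kush", "Alien-Bubblegum", "24K-Gold"],
    "Type": ["hybrid", "hybrid", "hybrid"],
})

# Build the same normalized key on both sides.
sqlite_side["join_key"] = sqlite_side["strain_name"].map(normalize_name)
csv_side["join_key"] = csv_side["Strain"].map(normalize_name)

# validate raises a MergeError if either side has duplicate keys.
merged = csv_side.merge(sqlite_side, on="join_key", how="inner", validate="one_to_one")
print(merged[["Strain", "strain_name", "Type", "breeder_name"]])
```

Comparing `len(merged)` against the size of each input is a quick way to see how many strains survive the inner join, which bears on the question of whether the resulting database is large enough.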
# Create database and populate

### Create database connection

In [None]:
connection_string = "postgres:postgres@localhost:5432/customer_db"
engine = create_engine(f'postgresql://{connection_string}')

In [None]:
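# Confirm tables (Engine.table_names() is deprecated since SQLAlchemy 1.4; use an inspector instead)
from sqlalchemy import inspect
inspect(engine).get_table_names()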

### Load DataFrames into DB

In [None]:
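# Starter code, must edit: premise_transformed is not defined in this notebook;
# replace it and the 'premise' table name with the final merged DataFrame and its table
premise_transformed.to_sql(name='premise', con=engine, if_exists='append', index=True)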
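A minimal sketch of the load step, under stated assumptions: the frame `final_df`, the table name `strains`, and the snake_case column names are hypothetical, and an in-memory SQLite connection stands in for the PostgreSQL engine so the snippet runs without a database server. With PostgreSQL, the SQLAlchemy engine created above would be passed as `con` instead.

```python
import sqlite3

import pandas as pd

# Hypothetical final frame; snake_case names avoid quoting in SQL.
final_df = pd.DataFrame({
    "strain": ["100-Og", "1024"],
    "type": ["hybrid", "sativa"],
    "community_rating": [4.0, 4.4],
})

# In-memory SQLite stands in for the PostgreSQL engine here.
conn = sqlite3.connect(":memory:")

# index=False keeps the DataFrame index out of the table;
# if_exists="replace" makes reruns of the cell idempotent.
final_df.to_sql(name="strains", con=conn, if_exists="replace", index=False)

# Read the rows back to confirm the load.
check = pd.read_sql("SELECT strain, type, community_rating FROM strains", conn)
print(check)
conn.close()
```

Using `if_exists="replace"` rather than `"append"` is a design choice for a notebook that may be rerun; `"append"` would add duplicate rows on every execution unless the table is cleared first.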