<a href="https://colab.research.google.com/github/christophermalone/DSCI325/blob/main/Module3_Part4_SQL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Module 3 | Part 4 | SQL : Aggregate() Actions in SQL

<table width='100%' ><tr><td bgcolor='green'></td></tr></table>

### Example 3.4.P
For this notebook, we will consider data from a pizza place. The data was collected over 2 years, with one dataset for each year.

The following fields will be considered here:

*   LocationID: Unique ID for each pizza store location
*   DeliveryType: Direct (Store completed the delivery) or SubContract (Delivery was subcontracted out)
*   SameZip: Was the delivery address in the same zipcode as the location?
*   Type: Descriptor for how order was obtained (In-Person, Phone / App, Corporate, OtherLocation)
*   Minutes: Minutes to process order and deliver pizza 
*   Comments:  When Minutes is over 60 (1 hour), comments are required.


<table width='100%' ><tr><td bgcolor='green'></td></tr></table>

## Making a Connection

Here, Python's built-in sqlite3 package will be used to connect to the desired database.

In [1]:
import pandas as pd
import sqlite3

# Getting Table Names and Structure

Run an initial query to identify the tables within this database.

In [27]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query(
                        "SELECT name FROM sqlite_master WHERE type='table'"
                          , connect_db)
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head()

Unnamed: 0,name
0,sqlite_sequence
1,PizzaDelivery2019
2,PizzaDelivery2020
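
Every cell in this notebook repeats the same connect, query, close pattern. As a minimal sketch, that pattern could be wrapped in a small helper. The name `run_query` is hypothetical (it is not part of pandas or sqlite3), and the demonstration uses a throwaway in-memory database rather than PizzaDelivery.db.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def run_query(sql, db_path):
    """Open a connection, run one SELECT-style query, and always close the connection."""
    with closing(sqlite3.connect(db_path)) as conn:
        return pd.read_sql_query(sql, conn)


# Demonstration on an empty in-memory database (":memory:"),
# so nothing here touches PizzaDelivery.db.
tables = run_query("SELECT name FROM sqlite_master WHERE type='table'", ":memory:")
print(tables)
```

With the real database, the call would take the path used throughout this notebook in place of `":memory:"`.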


Next, getting the structure of the <strong>PizzaDelivery2019</strong> table.

In [28]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query(
                        "PRAGMA table_info(PizzaDelivery2019)"
                          , connect_db)
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head(10)

Unnamed: 0,cid,name,type,notnull,dflt_value,pk
0,0,TableID,INTEGER,0,,1
1,1,LocationID,VARCHAR,0,,0
2,2,DeliveryType,VARCHAR,0,,0
3,3,SameZip,VARCHAR,0,,0
4,4,Type,VARCHAR,0,,0
5,5,Minutes,DOUBLE,0,,0
6,6,Comments,VARCHAR,0,,0


Next, getting the structure of the <strong>PizzaDelivery2020</strong> table.

In [29]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query(
                        "PRAGMA table_info(PizzaDelivery2020)"
                          , connect_db)
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head(10)

Unnamed: 0,cid,name,type,notnull,dflt_value,pk
0,0,TableID,INTEGER,0,,1
1,1,LocationID,VARCHAR,0,,0
2,2,DeliveryType,VARCHAR,0,,0
3,3,SameZip,VARCHAR,0,,0
4,4,Type,VARCHAR,0,,0
5,5,Quantity,INTEGER,0,,0
6,6,Minutes,DOUBLE,0,,0
7,7,Comments,VARCHAR,0,,0


# Basic Summaries

To begin, suppose the goal is to obtain the average delivery time across all deliveries.

In [7]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query("""
                          SELECT AVG(Minutes) AS 'Avg Delivery Time' 
                          FROM PizzaDelivery2019
                       """
                          , connect_db)
                       
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head()

Unnamed: 0,Avg Delivery Time
0,9.624701


Next, get the average delivery time for each delivery type.

In [9]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query("""
                          SELECT AVG(Minutes) AS 'Avg Delivery Time' 
                          FROM PizzaDelivery2019
                          GROUP BY DeliveryType
                       """
                          , connect_db)
                       
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head()

Unnamed: 0,Avg Delivery Time
0,9.765571
1,8.130779


Notice that the code above does not identify which row is for Direct and which is for SubContract.  This happened because DeliveryType was not included in the SELECT statement.

<strong>Note</strong>: The ROUND() function can be used to reduce the number of decimal places used in the output.

In [11]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query("""
                          SELECT DeliveryType, ROUND(AVG(Minutes),1) AS 'Avg Delivery Time'
                          FROM PizzaDelivery2019
                          GROUP BY DeliveryType
                       """
                          , connect_db)
                       
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head()

Unnamed: 0,DeliveryType,Avg Delivery Time
0,Direct,9.8
1,SubContract,8.1


The following will obtain the average delivery time across DeliveryType and SameZip.

<strong>Note</strong>:  The column order in the output is determined by the SELECT statement, not by the order in which the columns appear in the GROUP BY statement. (This differs from R and Python.)

In [13]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query("""
                          SELECT Samezip, DeliveryType, ROUND(AVG(Minutes),1) AS 'Avg Delivery Time'
                          FROM PizzaDelivery2019
                          GROUP BY Samezip, DeliveryType
                       """
                          , connect_db)
                       
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head()

Unnamed: 0,SameZip,DeliveryType,Avg Delivery Time
0,No,Direct,9.4
1,No,SubContract,9.5
2,Yes,Direct,10.0
3,Yes,SubContract,6.3


The following extends the summaries across Type, SameZip, and DeliveryType.

In [14]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query("""
                          SELECT Type, Samezip, DeliveryType, ROUND(AVG(Minutes),1) AS 'Avg Delivery Time'
                          FROM PizzaDelivery2019
                          GROUP BY Type, Samezip, DeliveryType
                       """
                          , connect_db)
                       
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head(16)

Unnamed: 0,Type,SameZip,DeliveryType,Avg Delivery Time
0,Corporate,No,Direct,11.6
1,Corporate,No,SubContract,11.9
2,Corporate,Yes,Direct,13.0
3,Corporate,Yes,SubContract,6.8
4,In-Person,No,Direct,5.4
5,In-Person,No,SubContract,5.3
6,In-Person,Yes,Direct,5.9
7,In-Person,Yes,SubContract,4.1
8,OtherLocation,No,Direct,17.6
9,OtherLocation,No,SubContract,16.5


The <strong>ORDER BY</strong> statement can be used to sort the rows.

In [15]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query("""
                          SELECT Type, Samezip, DeliveryType, ROUND(AVG(Minutes),1) AS 'Avg Delivery Time'
                          FROM PizzaDelivery2019
                          GROUP BY Type, Samezip, DeliveryType
                          ORDER BY DeliveryType, Samezip, Type
                       """
                          , connect_db)
                       
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head(16)

Unnamed: 0,Type,SameZip,DeliveryType,Avg Delivery Time
0,Corporate,No,Direct,11.6
1,In-Person,No,Direct,5.4
2,OtherLocation,No,Direct,17.6
3,Phone / App,No,Direct,9.3
4,Corporate,Yes,Direct,13.0
5,In-Person,Yes,Direct,5.9
6,OtherLocation,Yes,Direct,16.4
7,Phone / App,Yes,Direct,9.5
8,Corporate,No,SubContract,11.9
9,In-Person,No,SubContract,5.3


## Breaking apart a Column

Some SQL dialects provide a <strong>PIVOT</strong> operation that is akin to the <strong>spread</strong> operation in Python / R.  However, PIVOT is not available in SQLite.

The PIVOT operation can be emulated in SQLite in three steps. First, get the summary table (the table above). Second, create a second table that contains the desired structure for the output table. Finally, use a JOIN operation to combine the information from these two tables into the desired output table.
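
As an alternative to the JOIN-based approach described above, a pivot can also be emulated with conditional aggregation (a CASE expression inside AVG). The following is only a sketch on a small made-up table: the table name `Deliveries` and its rows are hypothetical and are not taken from PizzaDelivery.db.

```python
import sqlite3

import pandas as pd

# Toy data in an in-memory database; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Deliveries (SameZip TEXT, DeliveryType TEXT, Minutes REAL)")
conn.executemany(
    "INSERT INTO Deliveries VALUES (?, ?, ?)",
    [
        ("No", "Direct", 10.0),
        ("No", "Direct", 12.0),
        ("No", "SubContract", 8.0),
        ("Yes", "Direct", 9.0),
        ("Yes", "SubContract", 6.0),
        ("Yes", "SubContract", 4.0),
    ],
)

# Each CASE keeps Minutes only for one DeliveryType (NULL otherwise);
# AVG ignores NULLs, so each output column is the average for that type.
pivot = pd.read_sql_query(
    """
    SELECT SameZip,
           AVG(CASE WHEN DeliveryType = 'Direct' THEN Minutes END) AS Direct,
           AVG(CASE WHEN DeliveryType = 'SubContract' THEN Minutes END) AS SubContract
    FROM Deliveries
    GROUP BY SameZip
    ORDER BY SameZip
    """,
    conn,
)
conn.close()
print(pivot)
```

This approach requires the output columns (here Direct and SubContract) to be known in advance and written into the query.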

## Getting a list of the comments

The following can be used to obtain all comments for deliveries that took longer than 60 minutes.

In [30]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query("""
                          SELECT Comments 
                          FROM PizzaDelivery2019
                          WHERE Minutes > 60
                       """
                          , connect_db)
                       
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head(200)

Unnamed: 0,Comments
0,Customer Delayed;
1,Customer Delayed;
2,Changed Order;
3,Customer Delayed; Changed Order;
4,Changed Order;
5,Customer Delayed; Changed Order;
6,Changed Order;
7,Chained Delivery; Number = 3; Changed Order;
8,Customer Delayed; Changed Order;
9,Chained Delivery; Number = 6;


<font size="+2"><strong>Issue</strong></font>

A couple of locations have <i>chained</i> a few of their deliveries.  This means that a sequence of deliveries was made in a single trip.  Locations are not supposed to do this, as it is against company policy.   The Number in the Comments field indicates how many deliveries were made in this sequence of deliveries.

## Fixing the Issue

The <strong>ADD COLUMN</strong> command can be used to add a column to an existing table.  The necessary structure for the new column should be provided.



*   ADD COLUMN with no structure (BAD): ADD COLUMN Quantity
*   ADD COLUMN with some structure specified (BETTER): ADD COLUMN Quantity INT DEFAULT 1
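
The difference between the two forms can be seen on a small throwaway table. This is a sketch only: the table `Demo` and the column names `QtyBare` and `QtyTyped` are hypothetical and exist just for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Demo (TableID INTEGER PRIMARY KEY, Comments TEXT)")
conn.execute("INSERT INTO Demo (Comments) VALUES ('Changed Order;')")

# No type or default: existing rows get NULL in the new column.
conn.execute("ALTER TABLE Demo ADD COLUMN QtyBare")
# Type and default given: existing rows read back with the default value.
conn.execute("ALTER TABLE Demo ADD COLUMN QtyTyped INT DEFAULT 1")

row = conn.execute("SELECT QtyBare, QtyTyped FROM Demo").fetchone()
conn.close()
print(row)
```

The NULL in the untyped column is what pandas later displays as a missing value.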






In [197]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
#ALTER TABLE returns no rows, so it is executed on the connection rather than through pd.read_sql_query
connect_db.execute("""
                          ALTER TABLE PizzaDelivery2019
                          ADD COLUMN Quantity;
                       """)

#Closing the connection
connect_db.close()


Taking a look at the newly created column for rows whose Comments contain an "=" sign.

In [199]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query("""
                          SELECT Quantity, Comments 
                          FROM PizzaDelivery2019
                          WHERE INSTR(Comments," = ") > 0
                       """
                          , connect_db)
                       
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head(200)

Unnamed: 0,Quantity,Comments
0,,Chained Delivery; Number = 3; Changed Order;
1,,Chained Delivery; Number = 6;
2,,Chained Delivery; Number = 6;
3,,Chained Delivery; Number = 4; Changed Order;
4,,Chained Delivery; Number = 4;
5,,Chained Delivery; Number = 3; Changed Order; W...
6,,Chained Delivery; Number = 3; Changed Order;
7,,Chained Delivery; Number = 5; Wrong Address;
8,,Chained Delivery; Number = 2; Changed Order; W...
9,,Chained Delivery; Number = 4; Wrong Address;


## No DROP in SQLite

<table width='100%'><tr><td bgcolor='orange'>&nbsp;</td></tr></table>

In most database systems, an <strong>ALTER TABLE ... DROP COLUMN</strong> statement could be used to remove an unwanted field.  However, DROP COLUMN is not available in SQLite versions before 3.35.0.  As a result, the following will be done here.

1.  Make a copy of initial table -- without the column to be dropped.
2.  Drop the initial table.
3.  Rename the new table to have the same name as the initial table.
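
Whether the copy-and-rename workaround is needed depends on the SQLite library that Python is linked against. The sketch below checks `sqlite3.sqlite_version_info` and uses DROP COLUMN when it is available (SQLite 3.35.0 or later), falling back to the three steps otherwise. The table `Demo` is a hypothetical stand-in, not the pizza data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Demo (TableID INTEGER PRIMARY KEY, Minutes REAL, Quantity INT)")
conn.execute("INSERT INTO Demo (Minutes, Quantity) VALUES (12.5, 1)")

if sqlite3.sqlite_version_info >= (3, 35, 0):
    # Newer SQLite: the column can be dropped directly.
    conn.execute("ALTER TABLE Demo DROP COLUMN Quantity")
else:
    # Older SQLite: copy without the column, drop the original, rename the copy.
    conn.execute("CREATE TABLE Demo_v2 AS SELECT TableID, Minutes FROM Demo")
    conn.execute("DROP TABLE Demo")
    conn.execute("ALTER TABLE Demo_v2 RENAME TO Demo")

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
columns = [info[1] for info in conn.execute("PRAGMA table_info(Demo)")]
conn.close()
print(columns)
```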

Step #1: Make a copy of the initial table -- <strong>without</strong> the column to be removed, i.e., Quantity.

In [214]:
#Making a connection
dbconnection = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
dbconnection.cursor().execute("""
                                CREATE TABLE PizzaDelivery2019_v2 AS 
                                SELECT TableID, LocationID, DeliveryType, SameZip, Type, Minutes, Comments
                                FROM PizzaDelivery2019
                              """
                      )
                       
#Closing the connection
dbconnection.close()

Before proceeding to Step 2, make sure the copy of the initial table was 
successfully created.

In [215]:
#Making a connection
dbconnection = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query(
                        "SELECT name FROM sqlite_master WHERE type='table'"
                          , dbconnection)
#Closing the connection
dbconnection.close()

#Using pandas to show output
df.head()

Unnamed: 0,name
0,sqlite_sequence
1,PizzaDelivery2019
2,PizzaDelivery2020
3,PizzaDelivery2019_v2


Step #2: Drop the initial table

In [216]:
#Making a connection
dbconnection = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
dbconnection.cursor().execute("""
                               DROP TABLE PizzaDelivery2019
                              """
                      )
                       
#Closing the connection
dbconnection.close()

Again, before proceeding to Step 3, make sure the initial table was dropped successfully.

In [217]:
#Making a connection
dbconnection = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query(
                        "SELECT name FROM sqlite_master WHERE type='table'"
                          , dbconnection)
#Closing the connection
dbconnection.close()

#Using pandas to show output
df.head()

Unnamed: 0,name
0,sqlite_sequence
1,PizzaDelivery2020
2,PizzaDelivery2019_v2


Step #3: Rename the copied table to have the same name as the original table.

In [218]:
#Making a connection
dbconnection = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
dbconnection.cursor().execute("""
                                 ALTER TABLE PizzaDelivery2019_v2 RENAME TO PizzaDelivery2019
                              """
                      )
                       
#Closing the connection
dbconnection.close()

Verify that the rename was successful. The initial table is now back under its original name.

In [219]:
#Making a connection
dbconnection = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query(
                        "SELECT name FROM sqlite_master WHERE type='table'"
                          , dbconnection)
#Closing the connection
dbconnection.close()

#Using pandas to show output
df.head()

Unnamed: 0,name
0,sqlite_sequence
1,PizzaDelivery2020
2,PizzaDelivery2019


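Check that only the variables of the original table exist in this table. Note that CREATE TABLE ... AS SELECT does not carry over the primary key or the declared column types (for example, TableID is no longer flagged as the primary key).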

In [220]:
#Making a connection
dbconnection = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query(
                        "PRAGMA table_info(PizzaDelivery2019)"
                          , dbconnection)
#Closing the connection
dbconnection.close()

#Using pandas to show output
df.head(10)

Unnamed: 0,cid,name,type,notnull,dflt_value,pk
0,0,TableID,INT,0,,0
1,1,LocationID,TEXT,0,,0
2,2,DeliveryType,TEXT,0,,0
3,3,SameZip,TEXT,0,,0
4,4,Type,TEXT,0,,0
5,5,Minutes,REAL,0,,0
6,6,Comments,TEXT,0,,0


<table width='100%'><tr><td bgcolor='orange'>&nbsp;</td></tr></table>

## Specify Some Structure for ADD COLUMN Quantity

As stated above, it is a good idea to specify the necessary structure when adding a new column to a table within a database.  Here, the following specification is provided when adding the Quantity column.


*   INT: specifies that the new field should have data type integer
*   DEFAULT 1: specifies that the default value be set to 1



In [None]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
#ALTER TABLE returns no rows, so it is executed on the connection rather than through pd.read_sql_query
connect_db.execute("""
                          ALTER TABLE PizzaDelivery2019
                          ADD COLUMN Quantity INT DEFAULT 1;
                       """)
                       
#Closing the connection
connect_db.close()

The PRAGMA statement can be used on the table to verify the setting for the Quantity field.

In [225]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query(
                        "PRAGMA table_info(PizzaDelivery2019)"
                          , connect_db)
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head(10)

Unnamed: 0,cid,name,type,notnull,dflt_value,pk
0,0,TableID,INT,0,,0
1,1,LocationID,TEXT,0,,0
2,2,DeliveryType,TEXT,0,,0
3,3,SameZip,TEXT,0,,0
4,4,Type,TEXT,0,,0
5,5,Minutes,REAL,0,,0
6,6,Comments,TEXT,0,,0
7,7,Quantity,INT,0,1.0,0


Taking a look at the Quantity value for the rows that need to be updated.

In [226]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query("""
                          SELECT TableID, Quantity, Comments 
                          FROM PizzaDelivery2019
                          WHERE INSTR(Comments," = ") > 0
                       """
                          , connect_db)
                       
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head(200)

Unnamed: 0,TableID,Quantity,Comments
0,6821,1,Chained Delivery; Number = 3; Changed Order;
1,6826,1,Chained Delivery; Number = 6;
2,6828,1,Chained Delivery; Number = 6;
3,6832,1,Chained Delivery; Number = 4; Changed Order;
4,6855,1,Chained Delivery; Number = 4;
5,7746,1,Chained Delivery; Number = 3; Changed Order; W...
6,7747,1,Chained Delivery; Number = 3; Changed Order;
7,7748,1,Chained Delivery; Number = 5; Wrong Address;
8,7749,1,Chained Delivery; Number = 2; Changed Order; W...
9,7752,1,Chained Delivery; Number = 4; Wrong Address;


## Pulling out the Number value from the Comments

The following set of commands successfully pulls out the Number from the Comments field when appropriate.  

In [132]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query("""
                          SELECT TableID, Quantity, Comments,
                                 CASE
                                    WHEN INSTR(Comments," = ") > 0 THEN SUBSTR(Comments, INSTR(Comments," = ")+2,2)
                                    ELSE 1
                                  END AS Number 
                          FROM PizzaDelivery2019 
                          WHERE INSTR(Comments," = ") > 0
                       """
                          , connect_db)
                      
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head(200)

Unnamed: 0,TableID,Quantity,Comments,Number
0,6821,1,Chained Delivery; Number = 3; Changed Order;,3
1,6826,1,Chained Delivery; Number = 6;,6
2,6828,1,Chained Delivery; Number = 6;,6
3,6832,1,Chained Delivery; Number = 4; Changed Order;,4
4,6855,1,Chained Delivery; Number = 4;,4
5,7746,1,Chained Delivery; Number = 3; Changed Order; W...,3
6,7747,1,Chained Delivery; Number = 3; Changed Order;,3
7,7748,1,Chained Delivery; Number = 5; Wrong Address;,5
8,7749,1,Chained Delivery; Number = 2; Changed Order; W...,2
9,7752,1,Chained Delivery; Number = 4; Wrong Address;,4
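
The SUBSTR call above takes a fixed two characters starting at the space after the "=" sign, so the result is text and would appear to capture only the first digit of a two-digit Number. A possible variation, sketched here on hypothetical rows in a table called `Demo`, takes everything after " = " and lets CAST keep the leading integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Demo (TableID INTEGER PRIMARY KEY, Comments TEXT)")
conn.executemany(
    "INSERT INTO Demo (TableID, Comments) VALUES (?, ?)",
    [
        (1, "Chained Delivery; Number = 3; Changed Order;"),
        (2, "Chained Delivery; Number = 12;"),  # invented two-digit case
        (3, "Customer Delayed;"),
    ],
)

# INSTR(...) + 3 skips past the three characters " = ";
# CAST(... AS INTEGER) keeps the leading digits and ignores the rest of the text.
rows = conn.execute(
    """
    SELECT TableID,
           CASE
               WHEN INSTR(Comments, ' = ') > 0
               THEN CAST(SUBSTR(Comments, INSTR(Comments, ' = ') + 3) AS INTEGER)
               ELSE 1
           END AS Number
    FROM Demo
    ORDER BY TableID
    """
).fetchall()
conn.close()
print(rows)
```

Single quotes are used for the string literal `' = '` here, which is the standard SQL form.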


A pandas DataFrame can be added as a table in our database using the <strong>to_sql</strong> function.

<p align='center'><font size="+1">df.to_sql(<i>'table name'</i>, <i>db_connection</i>, if_exists='replace')</font></p>

Line #16 in the following code saves the pandas DataFrame into a table called PDFix in PizzaDelivery.db.

In [229]:
#Making a connection
dbconnection = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query("""
                         SELECT TableID, Quantity, Comments,
                                CASE
                                   WHEN INSTR(Comments," = ") > 0 THEN SUBSTR(Comments, INSTR(Comments," = ")+2,2)
                                   ELSE 1
                                END AS Number 
                         FROM PizzaDelivery2019 
                         WHERE INSTR(Comments," = ") > 0
                        """
                        , dbconnection)
                       
df.to_sql('PDFix',dbconnection, if_exists='replace' )

#Closing the connection
dbconnection.close()
                       



Check to see if table has been successfully added to the database.

In [230]:
#Making a connection
dbconnection = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query(
                        "SELECT name FROM sqlite_master WHERE type='table'"
                          , dbconnection)
#Closing the connection
dbconnection.close()

#Using pandas to show output
df.head()

Unnamed: 0,name
0,sqlite_sequence
1,PizzaDelivery2020
2,PizzaDelivery2019
3,PDFix


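**This UPDATE does not persist.** The statement executes, but the connection is closed without calling `dbconnection.commit()`, so the pending transaction is rolled back. In addition, because the UPDATE has no WHERE clause, any row without a match in PDFix would have its Quantity set to NULL.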

In [234]:
#Making a connection
dbconnection = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
dbconnection.cursor().execute("""
                               UPDATE PizzaDelivery2019 SET Quantity=
                                                                      (
                                                                        SELECT Number
                                                                        FROM PDFix
                                                                        WHERE PDFix.TableID = PizzaDelivery2019.TableID
                                                                      )  
                              """
                              )
                       
#Closing the connection
dbconnection.close()

**A simple UPDATE does not persist either: the connection is closed without calling `commit()`, so the change is rolled back.**

In [237]:
#Making a connection
dbconnection = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
dbconnection.cursor().execute("""
                               UPDATE PizzaDelivery2019 SET Quantity= 3 WHERE TableID = 6821
                              """
                             )
                       
#Closing the connection
dbconnection.close()

Check to see if the UPDATE took effect. It did not: Quantity is still 1 for TableID 6821.

In [238]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query("""
                          SELECT TableID, Quantity, Comments      
                          FROM PizzaDelivery2019 
                          WHERE INSTR(Comments," = ") > 0
                       """
                          , connect_db)
                      
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head(200)

Unnamed: 0,TableID,Quantity,Comments
0,6821,1,Chained Delivery; Number = 3; Changed Order;
1,6826,1,Chained Delivery; Number = 6;
2,6828,1,Chained Delivery; Number = 6;
3,6832,1,Chained Delivery; Number = 4; Changed Order;
4,6855,1,Chained Delivery; Number = 4;
5,7746,1,Chained Delivery; Number = 3; Changed Order; W...
6,7747,1,Chained Delivery; Number = 3; Changed Order;
7,7748,1,Chained Delivery; Number = 5; Wrong Address;
8,7749,1,Chained Delivery; Number = 2; Changed Order; W...
9,7752,1,Chained Delivery; Number = 4; Wrong Address;
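
One likely explanation for the unchanged values above is a missing commit: with Python's sqlite3 module, an UPDATE opens a transaction, and closing the connection without `commit()` discards it. The sketch below demonstrates this on a temporary database file. The table `Demo`, the file name `demo.db`, and the helper `quantity_after_update` are hypothetical.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Set up a small table and commit it so it survives closing the connection.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE Demo (TableID INTEGER PRIMARY KEY, Quantity INT DEFAULT 1)")
conn.execute("INSERT INTO Demo (TableID) VALUES (6821)")
conn.commit()
conn.close()


def quantity_after_update(commit):
    """Run the UPDATE, optionally commit, close, then reopen and read the value."""
    conn = sqlite3.connect(db_path)
    conn.execute("UPDATE Demo SET Quantity = Quantity + 2 WHERE TableID = 6821")
    if commit:
        conn.commit()
    conn.close()

    conn = sqlite3.connect(db_path)
    value = conn.execute("SELECT Quantity FROM Demo WHERE TableID = 6821").fetchone()[0]
    conn.close()
    return value


without_commit = quantity_after_update(commit=False)  # change is rolled back
with_commit = quantity_after_update(commit=True)      # change is saved
print(without_commit, with_commit)
```

For the correlated UPDATE from PDFix, adding a condition such as `WHERE TableID IN (SELECT TableID FROM PDFix)` would limit the change to the chained deliveries and leave the default Quantity of 1 on all other rows.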




---










## Getting Summaries Directly -- without updating database

The necessary summaries can be created <i>without</i> creating a new column in the table in the database.  The following code creates a new variable called Number which is used in the basic summaries, but not actually put into the database table.

In [87]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query("""

                       SELECT Type, SameZip, DeliveryType, ROUND(SUM(Minutes)/SUM(Number),1) AS 'Avg Delivery Time'
                       FROM
                          (
                            SELECT Type, SameZip, DeliveryType, Minutes, Comments,
                                   CASE
                                      WHEN INSTR(Comments," = ") > 0 THEN SUBSTR(Comments, INSTR(Comments," = ")+2,2)
                                    ELSE 1
                                  END AS Number
                            FROM PizzaDelivery2019
                            
                          )
                          GROUP BY Type, SameZip, DeliveryType
                          ORDER BY DeliveryType, Samezip, Type
                       """
                          , connect_db)
                       
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head(200)

Unnamed: 0,Type,SameZip,DeliveryType,Avg Delivery Time
0,Corporate,No,Direct,11.6
1,In-Person,No,Direct,5.4
2,OtherLocation,No,Direct,17.6
3,Phone / App,No,Direct,9.3
4,Corporate,Yes,Direct,12.9
5,In-Person,Yes,Direct,5.9
6,OtherLocation,Yes,Direct,16.1
7,Phone / App,Yes,Direct,9.4
8,Corporate,No,SubContract,11.9
9,In-Person,No,SubContract,5.3
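
The reason for dividing SUM(Minutes) by SUM(Number), rather than using AVG(Minutes), is that a chained trip covers several deliveries. A small arithmetic sketch with made-up numbers (not taken from the pizza data) shows the difference.

```python
# Hypothetical trips: (minutes for the whole trip, deliveries made on that trip)
trips = [(10.0, 1), (12.0, 1), (66.0, 3)]

total_minutes = sum(minutes for minutes, _ in trips)
total_deliveries = sum(count for _, count in trips)

# AVG(Minutes) divides by the number of rows (trips).
per_trip_avg = total_minutes / len(trips)
# SUM(Minutes)/SUM(Number) divides by the number of deliveries.
per_delivery_avg = total_minutes / total_deliveries

print(round(per_trip_avg, 1), round(per_delivery_avg, 1))
```

Here the single chained trip inflates the per-row average, while the per-delivery average spreads its 66 minutes over 3 deliveries.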


## Analysis for 2020

The analysis in SQL for 2020 is much easier because the Quantity column was captured on the data collection form and is already in this data table.

In [240]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query("""
                          SELECT Type, Samezip, DeliveryType, ROUND(SUM(Minutes)/SUM(Quantity),1) AS 'Avg Delivery Time'
                          FROM PizzaDelivery2020
                          GROUP BY Type, Samezip, DeliveryType
                          ORDER BY DeliveryType, Samezip, Type
                       """
                          , connect_db)
                       
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head(16)

Unnamed: 0,Type,SameZip,DeliveryType,Avg Delivery Time
0,Corporate,No,Direct,17.1
1,In-Person,No,Direct,8.0
2,OtherLocation,No,Direct,25.0
3,Phone / App,No,Direct,15.1
4,Corporate,Yes,Direct,18.3
5,In-Person,Yes,Direct,9.4
6,OtherLocation,Yes,Direct,24.5
7,Phone / App,Yes,Direct,16.8
8,Corporate,No,Subcontract,7.0
9,In-Person,No,Subcontract,3.0


Simply indicate that this data is for 2020 by adding a Year variable to this table.

In [247]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df = pd.read_sql_query("""
                          SELECT '2020' AS Year, Type, Samezip, DeliveryType, ROUND(SUM(Minutes)/SUM(Quantity),1) AS 'Avg Delivery Time'
                          FROM PizzaDelivery2020
                          GROUP BY Type, Samezip, DeliveryType
                          ORDER BY DeliveryType, Samezip, Type
                       """
                          , connect_db)
                       
#Closing the connection
connect_db.close()

#Using pandas to show output
df.head(16)

Unnamed: 0,Year,Type,SameZip,DeliveryType,Avg Delivery Time
0,2020,Corporate,No,Direct,17.1
1,2020,In-Person,No,Direct,8.0
2,2020,OtherLocation,No,Direct,25.0
3,2020,Phone / App,No,Direct,15.1
4,2020,Corporate,Yes,Direct,18.3
5,2020,In-Person,Yes,Direct,9.4
6,2020,OtherLocation,Yes,Direct,24.5
7,2020,Phone / App,Yes,Direct,16.8
8,2020,Corporate,No,Subcontract,7.0
9,2020,In-Person,No,Subcontract,3.0


## Appending Two DataFrames Together

In [262]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df_2019 = pd.read_sql_query("""

                       SELECT '2019' as Year, Type, SameZip, DeliveryType, ROUND(SUM(Minutes)/SUM(Number),1) AS 'Avg Delivery Time'
                       FROM
                          (
                            SELECT Type, SameZip, DeliveryType, Minutes, Comments,
                                   CASE
                                      WHEN INSTR(Comments," = ") > 0 THEN SUBSTR(Comments, INSTR(Comments," = ")+2,2)
                                    ELSE 1
                                  END AS Number
                            FROM PizzaDelivery2019
                            
                          )
                          GROUP BY Type, SameZip, DeliveryType
                          ORDER BY DeliveryType, Samezip, Type
                       """
                          , connect_db)
                       
#Closing the connection
connect_db.close()

#Using pandas to show output
df_2019.head(200)

Unnamed: 0,Year,Type,SameZip,DeliveryType,Avg Delivery Time
0,2019,Corporate,No,Direct,11.6
1,2019,In-Person,No,Direct,5.4
2,2019,OtherLocation,No,Direct,17.6
3,2019,Phone / App,No,Direct,9.3
4,2019,Corporate,Yes,Direct,12.9
5,2019,In-Person,Yes,Direct,5.9
6,2019,OtherLocation,Yes,Direct,16.1
7,2019,Phone / App,Yes,Direct,9.4
8,2019,Corporate,No,SubContract,11.9
9,2019,In-Person,No,SubContract,5.3


In [263]:
#Making a connection
connect_db = sqlite3.connect("/content/sample_data/PizzaDelivery.db")

#SQL Statement
df_2020 = pd.read_sql_query("""
                          SELECT '2020' AS Year, Type, Samezip, DeliveryType, ROUND(SUM(Minutes)/SUM(Quantity),1) AS 'Avg Delivery Time'
                          FROM PizzaDelivery2020
                          GROUP BY Type, Samezip, DeliveryType
                          ORDER BY DeliveryType, Samezip, Type
                       """
                          , connect_db)
                       
#Closing the connection
connect_db.close()

#Using pandas to show output
df_2020.head(16)

Unnamed: 0,Year,Type,SameZip,DeliveryType,Avg Delivery Time
0,2020,Corporate,No,Direct,17.1
1,2020,In-Person,No,Direct,8.0
2,2020,OtherLocation,No,Direct,25.0
3,2020,Phone / App,No,Direct,15.1
4,2020,Corporate,Yes,Direct,18.3
5,2020,In-Person,Yes,Direct,9.4
6,2020,OtherLocation,Yes,Direct,24.5
7,2020,Phone / App,Yes,Direct,16.8
8,2020,Corporate,No,Subcontract,7.0
9,2020,In-Person,No,Subcontract,3.0
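
Two yearly summaries with the same columns can also be stacked using pandas alone, via `pd.concat`, without an extra package. The sketch below uses two tiny hypothetical frames (`df_a`, `df_b`) in place of df_2019 and df_2020. Their values are copied from the first two rows of each output above.

```python
import pandas as pd

# Small stand-ins for the two yearly summaries.
df_a = pd.DataFrame(
    {
        "Year": ["2019", "2019"],
        "Type": ["Corporate", "In-Person"],
        "Avg Delivery Time": [11.6, 5.4],
    }
)
df_b = pd.DataFrame(
    {
        "Year": ["2020", "2020"],
        "Type": ["Corporate", "In-Person"],
        "Avg Delivery Time": [17.1, 8.0],
    }
)

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1, 0, 1.
stacked = pd.concat([df_a, df_b], ignore_index=True)
print(stacked.to_string(index=False))
```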


In [267]:
pip install dfply




In [268]:
from dfply import *

In [269]:
Outcomes = (
             df_2019
              >> bind_rows(df_2020)
           )

print(Outcomes.to_string(index=False))

 Year           Type SameZip DeliveryType  Avg Delivery Time
 2019      Corporate      No       Direct               11.6
 2019      In-Person      No       Direct                5.4
 2019  OtherLocation      No       Direct               17.6
 2019    Phone / App      No       Direct                9.3
 2019      Corporate     Yes       Direct               12.9
 2019      In-Person     Yes       Direct                5.9
 2019  OtherLocation     Yes       Direct               16.1
 2019    Phone / App     Yes       Direct                9.4
 2019      Corporate      No  SubContract               11.9
 2019      In-Person      No  SubContract                5.3
 2019  OtherLocation      No  SubContract               16.5
 2019    Phone / App      No  SubContract               10.1
 2019      Corporate     Yes  SubContract                6.8
 2019      In-Person     Yes  SubContract                4.1
 2019  OtherLocation     Yes  SubContract               11.9
 2019    Phone / App    