# Mini Project - SQL: Specialty Foods Inc.

## Introduction

**Business Context.** Specialty Foods Inc. is a food retailer focusing on the higher end of the market. You are a new marketing team member who was hired for your data analytics skills. The company wants to improve business results through more data-driven analysis and decision-making. Traditionally, the marketing department has launched campaigns to increase sales using qualitative research based on previous experience and an understanding of the market.

Given your data analytics skills, your manager has asked you to help the marketing team by gathering insights into the company's customer types and the products they buy. You are also invited to review past campaigns and suggest improvements for future ones. In addition to building a better understanding of the business, your analysis should result in specific recommendations on how the company can improve business results.


**Objective:** This SQL mini project gives you practice querying and aggregating data with SQL.

**Analysis / Data Analytics:** Your team should begin by answering the questions below for your analysis. However, these questions are aimed only at getting you started and practicing your skills. Next, you should consider further study and determine what additional questions will help you understand the business and make recommendations to improve the business's results.

## Overview of the data

The data for this case is contained in three separate tables, which are extracts from the customer, sales, and marketing databases of the company and contain the information below:

**[Customer table](data/customer.csv) includes the following information:**
* **ID**: customer unique ID
* **Income**: customer’s yearly household income
* **Kids**: number of small children in the household
* **Teens**: number of teenagers in the household
* **Age**: age of customer
* **Divorced**: 1 if the person is divorced, 0 otherwise
* **Married**: 1 if the person is married, 0 otherwise
* **Single**: 1 if the person is single, 0 otherwise
* **Together**: 1 if the person is living with a partner, 0 otherwise
* **Widow**: 1 if the person is widowed, 0 otherwise
* **Basic**: 1 if education is secondary level (high school), 0 otherwise
* **Graduation**: 1 if education is university level, 0 otherwise
* **Master**: 1 if education is masters level, 0 otherwise
* **PhD**: 1 if education is doctorate level, 0 otherwise
* **State**: US state of residency, stored with a `State-` prefix (for example, `State-California`)

**[Sales table](data/sales.csv) includes the following information:**
* **ID**: customer unique ID
* **Recency**: days since last purchase
* **Wines**: amount spent on wine
* **Fruits**: amount spent on fruit
* **Meats**: amount spent on meat
* **Seafood**: amount spent on seafood
* **Sweets**: amount spent on sweets
* **Premium**: amount spent on premium products
* **Regular**: amount spent on standard products
* **Deals**: number of purchases made with a discount
* **Website**: number of website purchases
* **Catalog**: number of catalog purchases
* **Store**: number of in-store purchases
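* **Days**: number of days since last purchase (same meaning as **Recency**; this column does not appear in the `sales` query output in Exercise 2)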
* **Visits**: number of website visits in past 3 months

**[Marketing table](data/marketing.csv) includes the following information:**
* **ID**: customer unique ID
* **MC3**: 1 if customer made a purchase based on Campaign 3, otherwise 0
* **MC4**: 1 if customer made a purchase based on Campaign 4, otherwise 0
* **MC5**: 1 if customer made a purchase based on Campaign 5, otherwise 0
* **MC1**: 1 if customer made a purchase based on Campaign 1, otherwise 0
* **MC2**: 1 if customer made a purchase based on Campaign 2, otherwise 0
* **Complaint**: 1 if customer made a complaint in past year, otherwise 0
* **Pilot**: 1 if customer made a purchase based on a recent pilot marketing campaign for a new product, otherwise 0
* **Enrollment**: date the customer enrolled with the company


To load the SQLite file (which you can find at [`specialtyfoods.db`](/files/miniprojects/case.miniproject.sql_fellow/data/specialtyfoods.db)), we need to run the two cells below. Don’t worry about learning that code; it’s not SQL.

In [1]:
%FETCH /files/miniprojects/case.miniproject.sql_fellow/data/specialtyfoods.db specialtyfoods

Start downloading from URL /files/miniprojects/case.miniproject.sql_fellow/data/specialtyfoods.db
Downloading /files/miniprojects/case.miniproject.sql_fellow/data/specialtyfoods.db 0.41% complete
Downloading /files/miniprojects/case.miniproject.sql_fellow/data/specialtyfoods.db 1e+02% complete
Finished downloading 249856 bytes from URL /files/miniprojects/case.miniproject.sql_fellow/data/specialtyfoods.db
Writing downloaded data to file specialtyfoods
Finished writing file


In [2]:
%LOAD specialtyfoods RW

### Exercise 1

Copy the code below into the answer cell and run the SQL query to test whether your database engine is working. You will see the current date and time, which SQLite reports in UTC.

```SQL
SELECT DATETIME();
```


**Answer.**

In [3]:
SELECT DATETIME();

DATETIME()
2023-05-19 14:57:25
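
If the timestamp looks a few hours off, that is most likely because SQLite's `DATETIME()` defaults to `'now'` in UTC. The sketch below is not part of the exercise: it assumes you have Python available and uses the standard-library `sqlite3` module with an in-memory database to compare UTC with local time via the `'localtime'` modifier.

```python
import sqlite3

# In-memory database: no file is needed just to call SQLite's date functions.
conn = sqlite3.connect(":memory:")

# DATETIME() with no arguments behaves like DATETIME('now'), which is UTC.
# The 'localtime' modifier converts the value to the machine's time zone.
utc_now, local_now = conn.execute(
    "SELECT DATETIME(), DATETIME('now', 'localtime');"
).fetchone()

print("UTC:  ", utc_now)
print("Local:", local_now)
conn.close()
```

Both values come back as `YYYY-MM-DD HH:MM:SS` strings; they differ only by your machine's UTC offset.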


### Exercise 2

Using our new database, query the tables to understand what type of data is included in each table.



Let's query the [`customer`](data/customer.csv) table in our database like this:

In [4]:
SELECT * FROM customer LIMIT 5;

ID,Income,Kids,Teens,Age,Divorced,Married,Single,Together,Widow,Basic,Graduation,Master,PhD,State
2873,213734,0.0,0.0,75,0,0,1,0,0,0,0,0,1,State-California
1629,205471,0.0,0.0,50,0,0,0,1,0,0,1,0,0,State-Texas
1239,202692,0.0,0.0,46,1,0,0,0,0,0,1,0,0,State-Texas
1191,202160,0.0,0.0,43,0,0,0,1,0,0,0,0,1,State-Texas
1116,201970,0.0,0.0,37,0,0,1,0,0,0,1,0,0,State-Florida


Write a query to understand your data in the [`sales`](data/sales.csv) table. 


**Answer.**

In [5]:
SELECT * FROM sales LIMIT 5;

ID,Recency,Wines,Fruits,Meats,Seafood,Sweets,Premium,Regular,Deals,Website,Catalog,Store,Visits
1428,99,0,36,18,42,36,72,60,1,1.0,0.0,3,8
2152,99,0,36,18,42,36,72,60,1,1.0,0.0,3,8
2014,99,21,42,55,13,34,68,97,1,1.0,0.0,3,8
2660,99,81,0,27,0,3,13,98,1,1.0,0.0,3,5
1196,99,175,0,23,0,0,13,184,1,2.0,0.0,3,6


Write a query to understand your data in the [`marketing`](data/marketing.csv) table.

**Answer.**

In [6]:
SELECT * FROM marketing LIMIT 5;

ID,MC3,MC4,MC5,MC1,MC2,Complaint,Pilot,Enrollment
1188,0,1,0,0,0,0,0,12/3/2020
1970,0,0,0,0,0,0,0,12/3/2020
1043,0,0,0,0,0,0,0,12/2/2020
1777,0,0,0,0,0,0,0,12/2/2020
2787,0,0,0,1,0,0,0,12/2/2020
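
Besides `SELECT * ... LIMIT 5`, SQLite can list tables and columns directly, which is a quick way to confirm exact column names before writing queries. The sketch below is an illustration only: it builds a toy `customer` table in memory (a hypothetical stand-in with just a few of the real columns) and assumes Python's standard `sqlite3` module. The same two SQL statements should work against the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy stand-in for the real `customer` table (only a few of its columns).
conn.execute(
    "CREATE TABLE customer "
    "(ID INTEGER, Income INTEGER, Widow INTEGER, Graduation INTEGER, State TEXT)"
)

# sqlite_master lists the objects in the database; filter to tables.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk). Index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]

print(tables)
print(columns)
conn.close()
```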


### Exercise 3

Query the database for the products that are purchased by customers based on marital status.

Specifically, join the `customer` and `sales` tables and write a query that returns the total amount spent on wine (TotalWines), grouped by whether customers are divorced.

Expected output:



|Divorced|TotalWines|
|--------|----------|
|0|1538225|
|1|196111|


**Answer.**

In [7]:
SELECT customer.Divorced, SUM(sales.Wines) AS TotalWines
FROM customer
JOIN sales ON customer.id = sales.id
GROUP BY customer.Divorced;

Divorced,TotalWines
0,1538225
1,196111
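
To see how the join and the `GROUP BY` interact, it can help to run the same query shape on data small enough to check by hand. The sketch below assumes Python's standard `sqlite3` module and uses made-up toy rows, not the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy tables with made-up values, keeping only the columns the query needs.
conn.executescript(
    """
    CREATE TABLE customer (ID INTEGER, Divorced INTEGER);
    CREATE TABLE sales (ID INTEGER, Wines INTEGER);
    INSERT INTO customer VALUES (1, 0), (2, 1), (3, 0);
    INSERT INTO sales VALUES (1, 10), (2, 5), (3, 20);
    """
)

# Same shape as the Exercise 3 answer; ORDER BY makes the row order explicit.
rows = conn.execute(
    """
    SELECT customer.Divorced, SUM(sales.Wines) AS TotalWines
    FROM customer
    JOIN sales ON customer.ID = sales.ID
    GROUP BY customer.Divorced
    ORDER BY customer.Divorced;
    """
).fetchall()

print(rows)  # customers 1 and 3 are not divorced: 10 + 20; customer 2 is: 5
conn.close()
```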


### Exercise 4

Query the database to determine what types of customers purchase which products. Can you describe the customer types, for example the customer persona or segment?

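Specifically, create a query to find the total amount spent on sweets (TotalSweets) by customers who are not single, grouped by the `Master` and `PhD` flags so that customers with education above the university level (rows where `Master` or `PhD` is 1) can be compared with everyone else.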

Expected output:

|Master|PhD|TotalSweets|
|------|---|-----------|
|0|0|90682|
|0|1|17683|
|1|0|15213|

**Answer.**

In [8]:
SELECT customer.Master, customer.PhD, SUM(sales.Sweets) AS TotalSweets
FROM customer
JOIN sales ON customer.id = sales.id
WHERE customer.Single = 0
GROUP BY customer.Master, customer.PhD;

Master,PhD,TotalSweets
0,0,90682
0,1,17683
1,0,15213
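
The query above groups by `Master` and `PhD` without filtering on them, so the result also includes a row for non-single customers with neither degree. If only customers above the university level are wanted, one option is to add the education condition to the `WHERE` clause. The sketch below shows that variant on made-up toy rows, assuming Python's standard `sqlite3` module; it is an alternative reading of the exercise, not the expected answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy tables with made-up values.
conn.executescript(
    """
    CREATE TABLE customer (ID INTEGER, Single INTEGER, Master INTEGER, PhD INTEGER);
    CREATE TABLE sales (ID INTEGER, Sweets INTEGER);
    INSERT INTO customer VALUES
        (1, 0, 0, 0),  -- not single, no advanced degree: filtered out
        (2, 0, 1, 0),
        (3, 0, 0, 1),
        (4, 1, 1, 0),  -- single: filtered out
        (5, 0, 1, 0);
    INSERT INTO sales VALUES (1, 7), (2, 3), (3, 4), (4, 100), (5, 6);
    """
)

# The parentheses matter: AND binds tighter than OR in SQL.
rows = conn.execute(
    """
    SELECT customer.Master, customer.PhD, SUM(sales.Sweets) AS TotalSweets
    FROM customer
    JOIN sales ON customer.ID = sales.ID
    WHERE customer.Single = 0
      AND (customer.Master = 1 OR customer.PhD = 1)
    GROUP BY customer.Master, customer.PhD
    ORDER BY customer.Master, customer.PhD;
    """
).fetchall()

print(rows)
conn.close()
```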


### Exercise 5

Query your database to discover which products bring in the most revenue for different customer segments.

Specifically, create a query to find the average age (AveAge) and average income (AveIncome) of customers from California along with their total sales for meats (TotalMeats) and seafood (TotalSeafood).

Expected output:

|AveAge|AveIncome|TotalMeats|TotalSeafood|
|------|---------|----------|------------|
|50.3081081081081|152234.313513514|147089|40222|


**Answer.**

In [9]:
SELECT AVG(customer.Age) AS AveAge, AVG(customer.Income) AS AveIncome,
SUM(sales.Meats) AS TotalMeats, SUM(sales.Seafood) AS TotalSeafood
FROM customer
JOIN sales ON customer.id = sales.id
WHERE customer.State = 'State-California';

AveAge,AveIncome,TotalMeats,TotalSeafood
50.3081081081081,152234.313513514,147089,40222
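
A natural follow-up is to compare California with other states in a single result. The sketch below, on made-up toy rows and assuming Python's standard `sqlite3` module, replaces the `WHERE` filter with `GROUP BY customer.State` and uses `ROUND` to keep the averages readable. The choice of one decimal place is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy tables with made-up values; state strings follow the 'State-' prefix
# seen in the real customer table.
conn.executescript(
    """
    CREATE TABLE customer (ID INTEGER, Age INTEGER, Income INTEGER, State TEXT);
    CREATE TABLE sales (ID INTEGER, Meats INTEGER, Seafood INTEGER);
    INSERT INTO customer VALUES
        (1, 40, 100, 'State-California'),
        (2, 60, 200, 'State-California'),
        (3, 50, 300, 'State-Texas');
    INSERT INTO sales VALUES (1, 10, 1), (2, 20, 2), (3, 30, 3);
    """
)

# One output row per state instead of filtering to a single state.
rows = conn.execute(
    """
    SELECT customer.State,
           ROUND(AVG(customer.Age), 1) AS AveAge,
           ROUND(AVG(customer.Income), 1) AS AveIncome,
           SUM(sales.Meats) AS TotalMeats,
           SUM(sales.Seafood) AS TotalSeafood
    FROM customer
    JOIN sales ON customer.ID = sales.ID
    GROUP BY customer.State
    ORDER BY customer.State;
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```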


## SQL Bonus Question (Optional)

### Exercise 6

Create one query that outputs the total sales of premium products and the average income for customers over the age of 50 who made a purchase based on marketing campaign 5.

Hint: use two inner joins

Expected output:

|TotalPremium|AveIncome|
|------------|---------|
|12077|181205.797297297|

**Answer.**

In [10]:
SELECT SUM(sales.Premium) AS TotalPremium, AVG(customer.Income) AS AveIncome
FROM customer
INNER JOIN sales ON customer.id = sales.id 
INNER JOIN marketing ON customer.id = marketing.id
WHERE customer.Age > 50 AND marketing.MC5 = 1;

TotalPremium,AveIncome
12077,181205.797297297
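
Sums and averages over joined tables are only trustworthy if each `ID` appears once per table; a repeated `ID` would multiply matching rows and inflate the totals. The data description calls `ID` a unique customer ID, but it may still be worth checking. The sketch below shows one way to look for duplicates, using a made-up toy `sales` table that deliberately contains a repeated `ID` and assuming Python's standard `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy table with made-up values; ID 2 is deliberately repeated.
conn.executescript(
    """
    CREATE TABLE sales (ID INTEGER, Premium INTEGER);
    INSERT INTO sales VALUES (1, 10), (2, 20), (2, 25), (3, 30);
    """
)

# HAVING filters groups after aggregation, keeping only IDs seen more than once.
duplicates = conn.execute(
    """
    SELECT ID, COUNT(*) AS n
    FROM sales
    GROUP BY ID
    HAVING COUNT(*) > 1
    ORDER BY ID;
    """
).fetchall()

print(duplicates)  # an empty list would mean every ID is unique
conn.close()
```

Running the same `GROUP BY ... HAVING` query against each of the three real tables would show whether the joins in Exercises 3 to 6 can fan out.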
