# Finding our best-performing salespeople and products

**Total points**: 24 points

## Introduction

**Business Context.** You work for AdventureWorks, a company that sells outdoor sporting equipment. The company has many different locations and has been recording the sales of different locations on various products. You, their new data scientist, have been tasked with the question: *What are our best products and salespeople and how can we use this information to improve our overall performance?*

You have been given access to the relevant data files with documentation from the IT department. Your job is to extract meaningful insights from these data files to help increase sales. First, you will look at the best products and try to see how different products perform in different categories. Second, you will analyze the best salespeople to see if the commission percentage motivates them to sell more.

**Business Problem.** Your task is to *write queries in SQL to carry out the requested analysis*.

**Analytical Context.** You are given the data as a SQLite database. The company has been pretty vague about how they expect you to extract insights, but you have come up with the following plan of attack:

1. Load the database and ensure you can run basic queries against it
2. Look at how product ratings and total sales are related
3. See how products sell in different subcategories (bikes, helmets, socks, etc.)
4. Calculate which salespeople have performed the best in 2014
5. See if total sales are correlated with commission percentage

Of course, this is only your initial plan. As you explore the database, your strategy will likely change.

## Overview of the data

The data for this case is contained in the [`AdventureWorks.db`](/extended.sql_fellow/files/AdventureWorks.db) SQLite database. We will be focusing on the tables that belong to the Sales and Product categories. Complete documentation, with schemas, for the original data (of which you have only a subset) can be found [here](https://dataedo.com/download/AdventureWorks.pdf).

**Product Tables (Pg. 34 in documentation):**
* **Product**: one row per product that the company sells
* **ProductReview**: one row per rating and review left by customers
* **ProductModelProductDescriptionCulture**: a link between products and their longer descriptions also indicating a "culture" - which language and region the product is for
* **ProductDescription**: a longer description of each product, for a specific region
* **ProductCategory**: the broad categories that products fit into
* **ProductSubCategory**: the narrower subcategories that products fit into

**Sales Tables (Pg. 71 in documentation):**
* **SalesPerson**: one row per salesperson, including information on their commission and performance
* **SalesOrderHeader**: one row per sale summarizing the sale
* **SalesOrderDetail**: many rows per sale, detailing each product that forms part of the sale
* **SalesTerritory**: the different territories where products are sold, including performance
* **CountryRegionCurrency**: the currency used by each region
* **CurrencyRate**: the average and closing exchange rates for each currency compared to the USD

**Tip**: Review the documentation carefully to learn more about the tables (like relevant columns in each) and the relationships between them. Note that not all columns may be available in the subset provided in this case as they are not necessary for the following exercises. 

Importing the libraries:

In [3]:
import pandas as pd
import sqlite3

Let's now load in the database:

In [4]:
cxn = sqlite3.connect('AdventureWorks.db')

To run SQL queries from within this notebook, you should define a variable that includes the SQL statement at the start of your code cell, like this:

In [5]:
# Name your variable five_products
five_products = """
    SELECT * FROM product LIMIT 5;
"""

pd.read_sql(five_products, cxn)

Unnamed: 0,productid,NAME,productnumber,makeflag,finishedgoodsflag,color,safetystocklevel,reorderpoint,standardcost,listprice,...,productline,class,style,productsubcategoryid,productmodelid,sellstartdate,sellenddate,discontinueddate,rowguid,modifieddate
0,1,Adjustable Race,AR-5381,f,f,,1000,750,0.0,0.0,...,,,,,,2008-04-30 00:00:00,,,694215b7-08f7-4c0d-acb1-d734ba44c0c8,2014-02-08 10:01:36.827
1,2,Bearing Ball,BA-8327,f,f,,1000,750,0.0,0.0,...,,,,,,2008-04-30 00:00:00,,,58ae3c20-4f3a-4749-a7d4-d568806cc537,2014-02-08 10:01:36.827
2,3,BB Ball Bearing,BE-2349,t,f,,800,600,0.0,0.0,...,,,,,,2008-04-30 00:00:00,,,9c21aed2-5bfa-4f18-bcb8-f11638dc2e4e,2014-02-08 10:01:36.827
3,4,Headset Ball Bearings,BE-2908,f,f,,800,600,0.0,0.0,...,,,,,,2008-04-30 00:00:00,,,ecfed6cb-51ff-49b5-b06c-7d8ac834db8b,2014-02-08 10:01:36.827
4,316,Blade,BL-2036,t,f,,800,600,0.0,0.0,...,,,,,,2008-04-30 00:00:00,,,e73e9750-603b-4131-89f5-3dd15ed5ff80,2014-02-08 10:01:36.827
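
Before writing longer queries, it can help to check which tables a connection actually exposes. The following is a minimal sketch using only the standard library: it builds a throwaway in-memory database with two hypothetical tables (`demo_product` and `demo_review`, which are not part of the case data) and lists them through SQLite's built-in `sqlite_master` catalog. Against the case database, the same `SELECT` could presumably be passed to `pd.read_sql(..., cxn)`.

```python
import sqlite3

# Throwaway in-memory database; the table names are hypothetical stand-ins.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_product (productid INTEGER, name TEXT)")
demo.execute("CREATE TABLE demo_review (productid INTEGER, rating INTEGER)")

# sqlite_master is SQLite's catalog of tables, indexes and views.
list_tables = """
    SELECT name
    FROM sqlite_master
    WHERE type = 'table'
    ORDER BY name
"""

table_names = [row[0] for row in demo.execute(list_tables)]
print(table_names)
demo.close()
```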


#### How to input answers

To solve the exercises of this extended case, you will be asked to assign SQL code to Python string variables. Remember that you can use triple quotes to assign multiple-line text blocks to a variable. For example,

~~~python
rating_ranking = """
    SELECT
        product.productid
    FROM
        product
    LIMIT 5
"""
~~~

For each exercise, the exact string variable you must use in your code is provided as a comment like this: 

In [None]:
# Name your variable rating_ranking

#### How to verify answers

To verify that your code executes correctly, pass the variable that holds your SQL statement to the `pd.read_sql()` function, like this:

~~~python
# Name your variable rating_ranking
rating_ranking = """
    SELECT
        product.productid
    FROM
        product
    LIMIT 5
"""

pd.read_sql(rating_ranking, cxn)
~~~

Your answers will be graded using the variable you created, so be sure that the code that you successfully tested is indeed the code that you assigned to the variable before you submit the extended case.

Do not round your results (i.e., leave them with as many decimal digits as they have). Also, be sure to name your columns *exactly* as they are in the sample tables in each exercise. Otherwise, you won't get any points when we run our tests after you've submitted your homework.

#### Optional and graded exercises

Graded exercises contribute to your final grade, and each one states how many points it is worth. Optional exercises offer extra practice without affecting your grade, allowing you to explore, learn, and grow without fear of mistakes. Embrace all exercises to enhance your knowledge, build confidence, and reach your full potential.

## Finding our most popular products

The company would like to know which of their products is the most popular among customers. You suspect that the average rating given in reviews is correlated with the number of sales of a particular product (that is, products with higher ratings sell more).

### Exercise 1 (1 point)

Using the `product` and `productreview` tables, `INNER JOIN` them and rank the products according to their average review rating. Save the SQL code in a string variable called `rating_ranking`. Please make *absolutely sure* to name your variable exactly that or otherwise your answer will not be recorded.

Your output should look like this:

| productid 	| NAME 	| avgrating 	| num_ratings 	|
|-:	|-:	|-:	|-	|
| 709 	| Mountain Bike Socks, M 	| 5.0 	| 1 	|
| ... 	| ... 	| ... 	| ... 	|


In [16]:
# Name your variable rating_ranking
# YOUR CODE HERE
rating_ranking = """
    SELECT product.productid, product.name, AVG(productreview.rating) AS avgrating, COUNT(productreview.rating) AS num_ratings
    FROM product
    INNER JOIN productreview ON product.productid = productreview.productid
    GROUP BY product.productid
    ORDER BY avgrating DESC, product.productid;
"""

pd.read_sql(rating_ranking, cxn)

Unnamed: 0,productid,NAME,avgrating,num_ratings
0,709,"Mountain Bike Socks, M",5.0,1
1,798,"Road-550-W Yellow, 40",5.0,1
2,937,HL Mountain Pedal,3.0,2


### Exercise 2 

Much to your disappointment, there are only three products with ratings and only four reviews in total! This is nowhere near enough to perform an analysis of the correlation between reviews and total sales. Since we cannot infer the most popular products from the reviews, we will go with an alternative strategy.

#### 2.1 (1 point)

Get the product model ID and description for each product. Include only descriptions for which `productmodelproductdescriptionculture.cultureid = 'en'`.

Your output should look like this:

<table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th>productmodelid</th>      <th>description</th>    </tr>  </thead>  <tbody>    <tr>      <td>1</td>      <td>Light-weight, wind-resistant, packs to fit into a pocket.</td>    </tr>    <tr>      <td>2</td>      <td>Traditional style with a flip-up brim; one-size fits all.</td>    </tr>    <tr>      <td>3</td>      <td>Synthetic palm, flexible knuckles, breathable mesh upper. Worn by the AWC team riders.</td>    </tr>    <tr>      <td>...</td>      <td>...</td>    </tr>  </tbody></table>

In [116]:
# Name your variable productmodelid_description
# YOUR CODE HERE
productmodelid_description = """
    SELECT productmodelproductdescriptionculture.productmodelid, productdescription.description 
    FROM productmodelproductdescriptionculture
    JOIN productdescription ON productmodelproductdescriptionculture.productdescriptionid = productdescription.productdescriptionid
    WHERE productmodelproductdescriptionculture.cultureid = 'en'
"""

pd.read_sql(productmodelid_description, cxn)

Unnamed: 0,productmodelid,description
0,1,"Light-weight, wind-resistant, packs to fit int..."
1,2,Traditional style with a flip-up brim; one-siz...
2,3,"Synthetic palm, flexible knuckles, breathable ..."
3,4,"Full padding, improved finger flex, durable pa..."
4,5,Each frame is hand-crafted in our Bothell faci...
...,...,...
122,123,Replacement mountain wheel for entry-level rider.
123,124,Replacement mountain wheel for the casual to s...
124,125,High-performance mountain replacement wheel.
125,126,Replacement road rear wheel for entry-level cy...


#### 2.2 (Optional)

Get the model ID, name, description, and total number of sales for each product and display the top-10 selling products. You can infer how often products have been sold by looking at the `salesorderdetail` table (each row might indicate more than one sale, so take note of `OrderQty`).

Your output should look like this:

<table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th>productmodelid</th>      <th>description</th>      <th>NAME</th>      <th>total_orders</th>    </tr>  </thead>  <tbody>    <tr>      <td>2</td>      <td>Traditional style with a flip-up brim; one-size fits all.</td>      <td>AWC Logo Cap</td>      <td>8311</td>    </tr>    <tr>      <td>111</td>      <td>AWC logo water bottle - holds 30 oz; leak-proof.</td>      <td>Water Bottle - 30 oz.</td>      <td>6815</td>    </tr>    <tr>      <td>33</td>      <td>Universal fit, well-vented, lightweight , snap-on visor.</td>      <td>Sport-100 Helmet, Blue</td>      <td>6743</td>    </tr>    <tr>      <td>...</td>      <td>...</td>      <td>...</td>      <td>...</td>    </tr>  </tbody></table>

**Hint:** Turn the query you wrote in exercise 2.1 into a temporary named subquery (a common table expression) with the `WITH ... AS` syntax. It will give you the English descriptions of the products as a starting point. Then `INNER JOIN` it with all the other relevant tables.
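
The remark about `OrderQty` matters because counting rows and summing quantities answer different questions. Here is a small sketch on a hypothetical `demo_detail` table (invented for illustration, not the case data) that puts the two side by side:

```python
import sqlite3

# Hypothetical order lines: each row is one line of an order, not one unit sold.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_detail (productid INTEGER, orderqty INTEGER)")
demo.executemany(
    "INSERT INTO demo_detail VALUES (?, ?)",
    [(1, 3), (1, 2), (2, 1)],
)

count_vs_sum = """
    SELECT productid,
           COUNT(*)      AS order_lines,  -- number of rows
           SUM(orderqty) AS units_sold    -- number of items
    FROM demo_detail
    GROUP BY productid
    ORDER BY productid
"""

rows = demo.execute(count_vs_sum).fetchall()
print(rows)
demo.close()
```

Product 1 appears on two order lines but accounts for five units, so only the `SUM` reflects how many items were sold.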

In [None]:
# Name your variable description_totalorders
# YOUR CODE HERE

raise NotImplementedError() # Remove this line when you enter your solution

### Exercise 3

To get a better sense of the sales, let's look at the correlation between quantity sold and price for each subcategory.

#### 3.1 (1 point)

Write a query that shows how many items were ordered in total for every product in the database. Do not filter by culture.

Your output should look like this:

| productid 	| quantity 	|
|-:	|-:	|
| 707 	| 6266 	|
| 708 	| 6532 	|
| 709 	| 1107 	|
| 710 	| 90 	|
| 711 	| 6743 	|
| 712 	| 8311 	|
| 713 	| 429 	|
| 714 	| 3636 	|
| ... 	| ... 	|

**Hint:** Use the `salesorderdetail` table.

In [117]:
# Name your variable quantities_ordered
# YOUR CODE HERE
quantities_ordered = """
SELECT salesorderdetail.productid, SUM(salesorderdetail.orderqty) AS quantity
FROM salesorderdetail
GROUP BY salesorderdetail.productid
"""

pd.read_sql(quantities_ordered, cxn)

Unnamed: 0,productid,quantity
0,707,6266
1,708,6532
2,709,1107
3,710,90
4,711,6743
...,...,...
261,994,378
262,996,543
263,997,656
264,998,1556


#### 3.2 (1 point)

Write a query that shows the list price for each product, alongside its category and subcategory. Your output should look like this:

| productid 	| category 	| subcategory 	| listprice 	|
|-:	|-:	|-:	|-:	|
| 680 	| Components 	| Road Frames 	| 1431.5 	|
| 706 	| Components 	| Road Frames 	| 1431.5 	|
| 707 	| Accessories 	| Helmets 	| 34.99 	|
| 708 	| Accessories 	| Helmets 	| 34.99 	|
| 709 	| Clothing 	| Socks 	| 9.5 	|
| 710 	| Clothing 	| Socks 	| 9.5 	|
| 711 	| Accessories 	| Helmets 	| 34.99 	|
| 712 	| Clothing 	| Caps 	| 8.99 	|
| 713 	| Clothing 	| Jerseys 	| 49.99 	|
| 714 	| Clothing 	| Jerseys 	| 49.99 	|
| 715 	| Clothing 	| Jerseys 	| 49.99 	|
| 716 	| Clothing 	| Jerseys 	| 49.99 	|
| 717 	| Components 	| Road Frames 	| 1431.5 	|
| 718 	| Components 	| Road Frames 	| 1431.5 	|
| 719 	| Components 	| Road Frames 	| 1431.5 	|
| ... 	| ... 	| ... 	| ... 	|

**Hint:** You will find the product categories in the `productcategory` table, and the subcategories in the `productsubcategory` table.

In [77]:
# Name your variable products_prices
# YOUR CODE HERE
products_prices = """
SELECT product.productid, productcategory.name AS category, productsubcategory.name AS subcategory, product.listprice
FROM product
INNER JOIN productsubcategory ON product.productsubcategoryid = productsubcategory.productsubcategoryid
INNER JOIN productcategory ON productsubcategory.productcategoryid = productcategory.productcategoryid;

"""

pd.read_sql(products_prices, cxn)

Unnamed: 0,productid,category,subcategory,listprice
0,680,Components,Road Frames,1431.50
1,706,Components,Road Frames,1431.50
2,707,Accessories,Helmets,34.99
3,708,Accessories,Helmets,34.99
4,709,Clothing,Socks,9.50
...,...,...,...,...
290,995,Components,Bottom Brackets,101.24
291,996,Components,Bottom Brackets,121.49
292,997,Bikes,Road Bikes,539.99
293,998,Bikes,Road Bikes,539.99


#### 3.3 (Optional)

Merge the queries from exercises 3.1 and 3.2 to obtain a table that shows, for each subcategory, the average list price and the total quantity of products sold. Your output should look like this:

| category 	| subcategory 	| average_price_in_subcategory 	| total_items_sold_in_subcategory 	|
|-:	|-:	|-:	|-:	|
| Accessories 	| Bike Racks 	| 120.0 	| 3166 	|
| Accessories 	| Bike Stands 	| 159.0 	| 249 	|
| Accessories 	| Bottles and Cages 	| 7.989999999999999 	| 10552 	|
| Accessories 	| Cleaners 	| 7.95 	| 3319 	|
| Accessories 	| Fenders 	| 21.98 	| 2121 	|
| Accessories 	| Helmets 	| 34.99 	| 19541 	|
| Accessories 	| Hydration Packs 	| 54.99 	| 2761 	|
| Accessories 	| Locks 	| 25.0 	| 1087 	|
| Accessories 	| Pumps 	| 19.99 	| 1130 	|
| Accessories 	| Tires and Tubes 	| 19.482727272727274 	| 18006 	|
| Bikes 	| Mountain Bikes 	| 1683.3649999999982 	| 28321 	|
| Bikes 	| Road Bikes 	| 1597.45 	| 47196 	|
| Bikes 	| Touring Bikes 	| 1425.2481818181814 	| 14751 	|
| Clothing 	| Bib-Shorts 	| 89.99 	| 3125 	|
| ... 	| ... 	| ... 	| ... 	|

**Hint:** To have two `WITH ... AS` statements in the same query, you separate the subqueries with a comma and don't write `WITH` again. Like this:

~~~sql
WITH first_query_alias AS
(
    SELECT ...
),
second_query_alias AS -- Notice we didn't include a second WITH here
(
    SELECT...
)
SELECT ...
~~~
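
To see the two-CTE pattern run end to end, here is a minimal sketch on hypothetical toy tables (`demo_item` and `demo_sale` are made up for this illustration and are unrelated to the case schema). It defines two named subqueries and joins them in the final `SELECT`:

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_item (itemid INTEGER, grp TEXT, price REAL)")
demo.execute("CREATE TABLE demo_sale (itemid INTEGER, qty INTEGER)")
demo.executemany(
    "INSERT INTO demo_item VALUES (?, ?, ?)",
    [(1, "A", 10.0), (2, "A", 30.0), (3, "B", 5.0)],
)
demo.executemany(
    "INSERT INTO demo_sale VALUES (?, ?)",
    [(1, 4), (1, 1), (2, 2), (3, 7)],
)

two_ctes = """
    WITH qty_per_item AS
    (
        SELECT itemid, SUM(qty) AS qty
        FROM demo_sale
        GROUP BY itemid
    ),
    item_info AS -- second CTE: comma above, no second WITH
    (
        SELECT itemid, grp, price
        FROM demo_item
    )
    SELECT item_info.grp,
           AVG(item_info.price)  AS avg_price,
           SUM(qty_per_item.qty) AS total_qty
    FROM item_info
    INNER JOIN qty_per_item ON item_info.itemid = qty_per_item.itemid
    GROUP BY item_info.grp
    ORDER BY item_info.grp
"""

cte_rows = demo.execute(two_ctes).fetchall()
print(cte_rows)
demo.close()
```

Aggregating the quantities per item in the first CTE before joining keeps each item to a single row, so the average price is not weighted by the number of sale rows.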

In [68]:
# Name your variable prices_quantities
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

There is a positive correlation between average price and items sold ($\rho=0.68$). This is somewhat unexpected, since common sense tells us that the more expensive an item is, the lower the demand for it. It is possible that we are witnessing an instance of Simpson's Paradox here. To verify whether that is indeed the case, we could instead compute the correlation coefficient within each subcategory, which might reveal a negative correlation in some subcategories. We will not do that right now, however, since it would take us too far from our business problem.
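
To make the Simpson's Paradox remark concrete, here is a small sketch with invented numbers (the price and quantity pairs below are hypothetical and not taken from the database). The helper `pearson` is written out by hand for illustration. Within each group the correlation is negative, yet pooling the groups yields a strongly positive coefficient:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical (price, quantity) pairs for two groups of products.
cheap_group = [(1, 5), (2, 4), (3, 3)]
pricey_group = [(11, 15), (12, 14), (13, 13)]

rho_cheap = pearson(*zip(*cheap_group))
rho_pricey = pearson(*zip(*pricey_group))
rho_pooled = pearson(*zip(*(cheap_group + pricey_group)))

print(rho_cheap, rho_pricey, rho_pooled)
```

The pooled coefficient is driven by the gap between the two groups rather than by the trend inside either one, which is the pattern a per-subcategory check would be looking for.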

## Finding our top salespeople

As mentioned earlier, we want to find our best salespeople and see whether or not we can incentivize them in an appropriate manner. Namely, we want to determine if the commission percentage we give them motivates them to make more and bigger sales.

### Exercise 4 (1 point)

Find the top five performing salespeople by using the `salesytd` (Sales, year-to-date) column.

Your output should look like this:

<table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th>businessentityid</th>      <th>salesytd</th>    </tr>  </thead>  <tbody>    <tr>      <td>276</td>      <td>4251368.5497</td>    </tr>    <tr>      <td>289</td>      <td>4116871.2277</td>    </tr>    <tr>      <td>275</td>      <td>3763178.1787</td>    </tr>    <tr>      <td>...</td>      <td>...</td>    </tr>  </tbody></table>

**Hint:** We only need to know the `businessentityid` for each salesperson as this uniquely identifies each salesperson. Your query should therefore only have two columns: `businessentityid` and `salesytd`.

In [119]:
# Name your variable salesperson_sales
# YOUR CODE HERE
salesperson_sales = """
SELECT salesperson.businessentityid, salesperson.salesytd
FROM salesperson
ORDER BY salesperson.salesytd DESC
LIMIT 5

"""

pd.read_sql(salesperson_sales, cxn)

Unnamed: 0,businessentityid,salesytd
0,276,4251369.0
1,289,4116871.0
2,275,3763178.0
3,277,3189418.0
4,290,3121616.0


### Exercise 5 (2 points)

The sales numbers from the previous query are hard-coded into the `salesperson` table, instead of dynamically calculated from each sales record. Currently, we don't know how this number is updated or much about it at all, so it's good to remain skeptical.

Using the `salesorderheader` table, find the top 5 salespeople who made the most sales *in the most recent year available* (2014). (There is a column called `subtotal` - use that.) Sales that do not have an associated salesperson should be excluded from your calculations and final output. All orders that were made within the 2014 calendar year should be included.

Your output should look like this:

<table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th>salespersonid</th>      <th>totalsales</th>    </tr>  </thead>  <tbody>    <tr>      <td>289</td>      <td>1382996.5839000002</td>    </tr>    <tr>      <td>276</td>      <td>1271088.5216</td>    </tr>    <tr>      <td>...</td>      <td>...</td>    </tr>  </tbody></table>

**Hint:** You can use the syntax `WHERE column >= '1970-01-01'` to generate an arbitrary date in SQLite and compare this to specific dates in the tables (in this example, dates equal to or later than Jan 1, 1970). Additionally, when you want to make sure that columns with empty or null values are excluded from a query in SQLite, you have to add a line like this one to your `WHERE` statement: `my_column IS NOT NULL AND my_column <> ""`. The `<>` operator is the opposite of `=`, that is, it checks that two values are different from each other.
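
The two filters in this hint can be tried out on a toy table first. The sketch below uses a hypothetical `demo_order` table and assumes dates are stored as ISO-formatted text (as the `2008-04-30 00:00:00` values shown earlier suggest), so that plain string comparison orders them chronologically. It also adds an upper bound to keep the window to a single calendar year, and writes the empty string with single quotes, which is the standard SQL string literal:

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE demo_order "
    "(orderid INTEGER, sellerid TEXT, orderdate TEXT, amount REAL)"
)
demo.executemany(
    "INSERT INTO demo_order VALUES (?, ?, ?, ?)",
    [
        (1, "10", "2013-12-31 00:00:00", 100.0),  # before the window
        (2, "10", "2014-01-01 00:00:00", 50.0),
        (3, None, "2014-03-15 00:00:00", 70.0),   # NULL seller
        (4, "", "2014-05-01 00:00:00", 20.0),     # empty-string seller
        (5, "11", "2014-12-31 00:00:00", 30.0),
        (6, "11", "2015-01-01 00:00:00", 999.0),  # after the window
    ],
)

filtered_totals = """
    SELECT sellerid, SUM(amount) AS total
    FROM demo_order
    WHERE orderdate >= '2014-01-01'
      AND orderdate <  '2015-01-01'
      AND sellerid IS NOT NULL
      AND sellerid <> ''
    GROUP BY sellerid
    ORDER BY total DESC
"""

seller_rows = demo.execute(filtered_totals).fetchall()
print(seller_rows)
demo.close()
```

Only orders 2 and 5 survive all four conditions, so each seller ends up with a single 2014 order in the total.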

In [121]:
# Name your variable salesperson_totalsales
# YOUR CODE HERE
salesperson_totalsales = """
SELECT salesorderheader.salespersonid, SUM(salesorderheader.subtotal) AS totalsales
FROM salesorderheader
WHERE salesorderheader.orderdate >= '2014-01-01' AND salesorderheader.orderdate < '2015-01-01' AND salesorderheader.salespersonid IS NOT NULL AND salesorderheader.salespersonid <> ""
GROUP BY salespersonid
ORDER BY totalsales DESC
LIMIT 5

"""

pd.read_sql(salesperson_totalsales, cxn)

Unnamed: 0,salespersonid,totalsales
0,289,1382997.0
1,276,1271089.0
2,275,1057247.0
3,282,1044811.0
4,277,1040093.0


You should see right away that there are discrepancies between the two sales totals. This makes sense: the query above only counts orders placed in 2014, whereas `salesytd` is a stored figure whose update rules we do not know. Nonetheless, for the remainder of this case, use this dynamically calculated total as the authoritative answer.

### Exercise 6

Looking at the documentation, you will see that `subtotal` in the `salesorderheader` table is calculated from other tables in the database. To validate this figure (instead of trusting it blindly), it could be a good idea to calculate `subtotal` manually. Using the `salesorderdetail` and `salesorderheader` tables, let's calculate the sales for each salesperson for **the year 2014** and display results for the top 5 salespeople.

#### 6.1 (1 point)

Write a query that shows for each `salesorderid` (find this column in the `salesorderdetail` table) the total amount of money paid. Remember to subtract `unitpricediscount` from each item's price (`unitpricediscount` is a percentage).

Your output should look like this:

| salesorderid 	| ordertotal 	|
|-:	|-:	|
| 43659 	| 20565.6206 	|
| 43660 	| 1294.2529 	|
| 43661 	| 32726.4786 	|
| 43662 	| 28832.5289 	|
| 43663 	| 419.4589 	|
| 43664 	| 24432.608799999995 	|
| 43665 	| 14352.7713 	|
| 43666 	| 5056.4896 	|
| 43667 	| 6107.081999999999 	|
| 43668 	| 35944.156200000005 	|
| 43669 	| 714.7043 	|
| ... 	| ... 	|
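
The discount arithmetic can be checked on a few hand-computable rows. The sketch below uses a hypothetical `demo_line` table and assumes the discount is stored as a fraction of the unit price (`0.10` meaning 10%). If a table stored whole-number percentages instead, the value would need to be divided by 100 first.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE demo_line "
    "(orderid INTEGER, unitprice REAL, orderqty INTEGER, discount REAL)"
)
demo.executemany(
    "INSERT INTO demo_line VALUES (?, ?, ?, ?)",
    [
        (1, 100.0, 2, 0.10),  # 100 * 2 * 0.90 = 180
        (1, 20.0, 1, 0.0),    # 20 * 1 * 1.00 = 20
        (2, 50.0, 3, 0.0),    # 50 * 3 * 1.00 = 150
    ],
)

order_totals = """
    SELECT orderid,
           SUM(unitprice * orderqty * (1 - discount)) AS ordertotal
    FROM demo_line
    GROUP BY orderid
    ORDER BY orderid
"""

totals = dict(demo.execute(order_totals).fetchall())
print(totals)
demo.close()
```

Order 1 should come to roughly 200 and order 2 to 150, up to floating-point rounding, which is also why the expected tables in this case show values such as `24432.608799999995`.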

In [113]:
# Name your variable order_ordertotal
# YOUR CODE HERE
order_ordertotal = """
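-- Assumption: unitpricediscount holds a fraction (0.02 means 2%), as in the AdventureWorks documentation, so it is not divided by 100
SELECT salesorderdetail.salesorderid, SUM(salesorderdetail.unitprice * salesorderdetail.orderqty * (1 - salesorderdetail.unitpricediscount)) AS ordertotal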
FROM salesorderdetail
GROUP BY salesorderdetail.salesorderid;
"""
pd.read_sql(order_ordertotal,cxn)

Unnamed: 0,salesorderid,ordertotal
0,43659,20565.6206
1,43660,1294.2529
2,43661,32726.4786
3,43662,28832.5289
4,43663,419.4589
...,...,...
31460,75119,42.2800
31461,75120,84.9600
31462,75121,74.9800
31463,75122,30.9700


#### 6.2 (optional)

Using the previous query as a subquery, find the sales for each salesperson for the year 2014 and display results for the top 5 salespeople. Remember to exclude sales that are not associated with a salesperson.

**Hint:** You can get the `salesorderid` and `salespersonid` pairs from the `salesorderheader` table.

In [None]:
# Name your variable salesperson_ordertotal
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

The results are the same as in Exercise 5. We still prefer this query, though, because it is generated from granular data instead of relying on hard-coded figures.

### Exercise 7 (optional)

Let's now see whether there is a positive relationship between the total sales of the salespeople and their commission percentages. Join the previous query (remove the `LIMIT` clause) with the `salesperson` table to get a table like this one:

| salespersonid 	| ordertotalsum 	| commissionpct 	|
|-:	|-:	|-:	|
| 274 	| 178584.36250800002 	| 0.0 	|
| 275 	| 1057247.378572 	| 0.012 	|
| 276 	| 1271088.5214610002 	| 0.015 	|
| 277 	| 1040093.406901 	| 0.015 	|
| ... 	| ... 	| ... 	|

**Hint:** Remember that the `businessentityid` column from the `salesperson` table is compatible with the `salespersonid` column in the query of exercise 6 (they both represent the salesperson ID).

In [None]:
# Name your variable salesperson_ordertotal_commission
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

The correlation coefficient between `ordertotalsum` and `commissionpct` is $\rho=0.73$. This suggests that the salespeople who earn a high commission are also those who close the bigger deals.

### Exercise 8

Remember how we mentioned that products were sold in many regions? This is why you had to work with the `culture` value before to get the English language descriptions. To make matters worse, you are told the sales are recorded in *local* currency, so your previous analyses are flawed. Technically, you must convert all amounts to USD if you wish to compare the different salespeople fairly! Instead, let's group the salespeople's orders by the currency used for each order (you will have to consider `tocurrencycode` in the `CurrencyRate` table for this task).

Let's explore the currencies in different sales. But first, here are some things to understand about the currency columns:
* The `FromCurrencyCode` is always USD, so focus on `tocurrencycode`
* If the sale was paid in USD, the `currencyrateid` was left blank (since there was no need to make a conversion)

#### 8.1 (optional)

Create a table with the `salespersonid`, `salesorderid`, `currencyrateid` and `tocurrencycode` to see the connection. Remember to exclude sales that are not associated with a salesperson and only consider sales in 2014. Order by the salesperson ID and show only 10 rows. Your table should look like this:

| salespersonid 	| salesorderid	| currencyrateid	|tocurrencycode   |
|-:	|-:	|-:	|-: |
| 274 	| 65294 	| None 	| None |
| 274 	| 65298 	| None 	| None |
| 274 	| 67277 	| None 	| None |
| 274 	| 67286	 	| 11427	 	| CAD |
| 274 	| 69528		 	| None	| None |
| ... 	| ... 	| ... 	| ...    |

**Hint**: Since `USD` would not show up in the **CurrencyRate** table, you will have to do a `LEFT JOIN` to avoid losing information. 
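
The difference between the two join types is easiest to see on a tiny example. This sketch uses hypothetical `demo_order` and `demo_rate` tables (not the case schema), in which two of the three orders have no rate ID:

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_order (orderid INTEGER, rateid INTEGER)")
demo.execute("CREATE TABLE demo_rate (rateid INTEGER, code TEXT)")
demo.executemany(
    "INSERT INTO demo_order VALUES (?, ?)",
    [(1, None), (2, 7), (3, None)],
)
demo.execute("INSERT INTO demo_rate VALUES (7, 'CAD')")

join_template = """
    SELECT o.orderid, r.code
    FROM demo_order AS o
    {join_type} JOIN demo_rate AS r ON o.rateid = r.rateid
    ORDER BY o.orderid
"""

# INNER JOIN keeps only orders with a matching rate row.
inner_rows = demo.execute(join_template.format(join_type="INNER")).fetchall()
# LEFT JOIN keeps every order and fills the missing side with NULL.
left_rows = demo.execute(join_template.format(join_type="LEFT")).fetchall()

print(inner_rows)
print(left_rows)
demo.close()
```

With `INNER JOIN`, the orders without a rate ID vanish from the result. With `LEFT JOIN`, they stay and show up with `None` in Python.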

In [None]:
# Name your variable salesperson_currency_id
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

As expected, we can see that different salespeople have sales in different currencies. 

**Note**: The `None` in the above example takes the place of `NULL` values, which contextually means that the sale was in USD. 

#### 8.2 (optional)

The `None` in the above query can be confusing to someone who doesn't understand the database. In this case, it's best to replace them with useful information. Redo the previous exercise with the following changes:
* Leave out the `currencyrateid` column 
* Replace `None` with 'USD' in the `tocurrencycode` column

**Hint**: One way of completing this task is to use the `CASE` expression, which can be incorporated like this:

~~~sql
SELECT column1, column2, 
CASE
    WHEN condition1 THEN result1
    ELSE result2
END AS column3
FROM Table
~~~

The above would result in a table with the following columns

|column1   |column2   |column3   |
|-:  |-:   |-:   |
|...  |... |... |

In the `tocurrencycode` column, the `CASE` would 
* replace `NULL` values with 'USD'
* leave other values as they are  
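
As a quick illustration of that replacement, here is a sketch on a hypothetical `demo_payment` table (invented for this example). It shows the `CASE` form next to `COALESCE`, a built-in SQLite function that returns its first non-NULL argument and is a possible shorthand when the only condition is a NULL check:

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_payment (paymentid INTEGER, code TEXT)")
demo.executemany(
    "INSERT INTO demo_payment VALUES (?, ?)",
    [(1, None), (2, "CAD"), (3, None)],
)

fill_nulls = """
    SELECT paymentid,
           CASE
               WHEN code IS NULL THEN 'USD'
               ELSE code
           END AS code_case,
           COALESCE(code, 'USD') AS code_coalesce
    FROM demo_payment
    ORDER BY paymentid
"""

filled_rows = demo.execute(fill_nulls).fetchall()
print(filled_rows)
demo.close()
```

Note that the condition is written as `code IS NULL` rather than `code = NULL`, since a comparison with `NULL` using `=` never evaluates to true in SQL.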

In [None]:
# Name your variable salesperson_currency_code
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

### Exercise 9 (optional)

Now that we have the currency codes associated with each salesperson ID, redo Exercise 7 adding in the `tocurrencycode`. Order the results by currency (ascending) and total sales (descending) to make it easier to see who the best salespeople are for each currency.

**Hint:** Start with Exercise 7 and integrate the currency piece using the `CASE` expression and `LEFT JOIN` from Exercise 8.2 (removing the `LIMIT` clause).

In [None]:
# Name your variable salesperson_ranking_currency
# YOUR CODE HERE
raise NotImplementedError() # Remove this line when you enter your solution

## Testing cells

In [None]:
# Ex. 1
assert "rating_ranking" in globals(), "Ex. 1 - Remember that your variable's name should be `rating_ranking`!"
rating_ranking_result = pd.read_sql(rating_ranking, cxn)
assert len(rating_ranking_result) > 0, "Ex. 1 - Your code is not producing any output! (i.e., a table with length zero)"
assert set(rating_ranking_result.columns) == {'NAME', 'avgrating', 'num_ratings', 'productid'}, "Ex. 1 - Your query result doesn't have exactly these columns: 'NAME', 'avgrating', 'num_ratings', 'productid'"
print("Exercise 1 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 2.1
assert "productmodelid_description" in globals(), "Ex. 2.1 - Remember that your variable's name should be `productmodelid_description`!"
productmodelid_description_result = pd.read_sql(productmodelid_description, cxn)
assert len(productmodelid_description_result) == 127, "Ex. 2.1 - There are 127 product models in the database, but your query produces a different number. Make sure that you don't have any LIMIT clauses in this exercise!"
assert set(productmodelid_description_result.columns) == {'description', 'productmodelid'}, "Ex. 2.1 - Your query result doesn't have exactly these columns: 'description', 'productmodelid'"
print("Exercise 2.1 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 2.2
assert "description_totalorders" in globals(), "Ex. 2.2 - Remember that your variable's name should be `description_totalorders`!"
description_totalorders_result = pd.read_sql(description_totalorders, cxn)
assert len(description_totalorders_result) == 10, "Ex. 2.2 - Remember to use LIMIT 10 in your query! This is a top 10!"
assert set(description_totalorders_result.columns) == {'NAME', 'description', 'productmodelid', 'total_orders'}, "Ex. 2.2 - Your query result doesn't have exactly these columns: 'NAME', 'description', 'productmodelid', 'total_orders'"
print("Exercise 2.2 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 3.1
assert "quantities_ordered" in globals(), "Ex. 3.1 - Remember that your variable's name should be `quantities_ordered`!"
quantities_ordered_result = pd.read_sql(quantities_ordered, con=cxn)
assert len(quantities_ordered_result) == 266, "Ex. 3.1 - There are 266 products in the database that have associated quantities, but your query produces a different number. Make sure that you don't have any LIMIT clauses in this exercise and don't filter by culture!"
assert set(quantities_ordered_result.columns) == {'productid', 'quantity'}, "Ex. 3.1 - Your query result doesn't have exactly these columns: 'productid', 'quantity'"
print("Exercise 3.1 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 3.2
assert "products_prices" in globals(), "Ex. 3.2 - Remember that your variable's name should be `products_prices`!"
products_prices_result = pd.read_sql(products_prices, cxn)
assert len(products_prices_result) == 295, "Ex. 3.2 - There are 295 products in the database that have prices, but your query produces a different number. Make sure that you don't have any LIMIT clauses in this exercise!"
assert set(products_prices_result.columns) == {'category', 'listprice', 'productid', 'subcategory'}, "Ex. 3.2 - Your query result doesn't have exactly these columns: 'category', 'listprice', 'productid', 'subcategory'"
print("Exercise 3.2 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 3.3
assert "prices_quantities" in globals(), "Ex. 3.3 - Remember that your variable's name should be `prices_quantities`!"
prices_quantities_result = pd.read_sql(prices_quantities, cxn)
assert len(prices_quantities_result) == 35, "Ex. 3.3 - There are 35 subcategories in the database, but your query produces a different number. Make sure that you don't have any LIMIT clauses in this exercise!"
assert set(prices_quantities_result.columns) == {'average_price_in_subcategory', 'category', 'subcategory', 'total_items_sold_in_subcategory'}, "Ex. 3.3 - Your query result doesn't have exactly these columns: 'average_price_in_subcategory', 'category', 'subcategory', 'total_items_sold_in_subcategory'"
print("Exercise 3.3 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 4
assert "salesperson_sales" in globals(), "Ex. 4 - Remember that your variable's name should be `salesperson_sales`!"
salesperson_sales_result = pd.read_sql(salesperson_sales, cxn)
assert len(salesperson_sales_result) == 5, "Ex. 4 - This is a top 5. Remember to use LIMIT!"
assert set(salesperson_sales_result.columns) == {'businessentityid', 'salesytd'}, "Ex. 4 - Your query result doesn't have exactly these columns: 'businessentityid', 'salesytd'"
print("Exercise 4 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 5
assert "salesperson_totalsales" in globals(), "Ex. 5 - Remember that your variable's name should be `salesperson_totalsales`!"
salesperson_totalsales_result = pd.read_sql(salesperson_totalsales, cxn)
assert len(salesperson_totalsales_result) == 5, "Ex. 5 - This is a top 5. Remember to use LIMIT!"
assert set(salesperson_totalsales_result.columns) == {'salespersonid', 'totalsales'}, "Ex. 5 - Your query result doesn't have exactly these columns: 'salespersonid', 'totalsales'"
print("Exercise 5 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 6.1
assert "order_ordertotal" in globals(), "Ex. 6.1 - Remember that your variable's name should be `order_ordertotal`!"
order_ordertotal_result = pd.read_sql(order_ordertotal, cxn)
assert len(order_ordertotal_result) == 31465, "Ex. 6.1 - There are more than 31,000 orders in the database. Remember to NOT use LIMIT here!"
assert set(order_ordertotal_result.columns) == {'ordertotal', 'salesorderid'}, "Ex. 6.1 - Your query result doesn't have exactly these columns: 'ordertotal', 'salesorderid'"
print("Exercise 6.1 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 6.2
assert "salesperson_ordertotal" in globals(), "Ex. 6.2 - Remember that your variable's name should be `salesperson_ordertotal`!"
salesperson_ordertotal_result = pd.read_sql(salesperson_ordertotal, cxn)
assert len(salesperson_ordertotal_result) == 5, "Ex. 6.2 - There are too many or too few rows in your result. Remember to use LIMIT here!"
assert set(salesperson_ordertotal_result.columns) == {'ordertotalsum', 'salespersonid'}, "Ex. 6.2 - Your query result doesn't have exactly these columns: 'ordertotalsum', 'salespersonid'"
print("Exercise 6.2 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 7
assert "salesperson_ordertotal_commission" in globals(), "Ex. 7 - Remember that your variable's name should be `salesperson_ordertotal_commission`!"
salesperson_ordertotal_commission_result = pd.read_sql(salesperson_ordertotal_commission, cxn)
assert len(salesperson_ordertotal_commission_result) == 17, "Ex. 7 - There are too many or too few rows in your result. Remember to NOT use LIMIT here!"
assert set(salesperson_ordertotal_commission_result.columns) == {'commissionpct', 'ordertotalsum', 'salespersonid'}, "Ex. 7 - Your query result doesn't have exactly these columns: 'commissionpct', 'ordertotalsum', 'salespersonid'"
print("Exercise 7 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 8.1
assert "salesperson_currency_id" in globals(), "Ex. 8.1 - Remember that your variable's name should be `salesperson_currency_id`!"
salesperson_currency_result = pd.read_sql(salesperson_currency_id, cxn)
assert len(salesperson_currency_result) == 10, "Ex. 8.1 - There are too many or too few rows in your result. Remember to use LIMIT here!"
assert set(salesperson_currency_result.columns) == {'salespersonid', 'salesorderid', 'currencyrateid', 'tocurrencycode'}, "Ex. 8.1 - Your query result doesn't have exactly these columns: 'salespersonid', 'salesorderid', 'currencyrateid', 'tocurrencycode'"
print("Exercise 8.1 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 8.2
assert "salesperson_currency_code" in globals(), "Ex. 8.2 - Remember that your variable's name should be `salesperson_currency_code`!"
salesperson_currency_result2 = pd.read_sql(salesperson_currency_code, cxn)
assert len(salesperson_currency_result2) == 10, "Ex. 8.2 - There are too many or too few rows in your result. Remember to use LIMIT here!"
assert set(salesperson_currency_result2.columns) == {'salespersonid', 'salesorderid', 'tocurrencycode'}, "Ex. 8.2 - Your query result doesn't have exactly these columns: 'salespersonid', 'salesorderid', 'tocurrencycode'"
print("Exercise 8.2 looks fine for now. You will get your final grade after we've reviewed your submission.")

In [None]:
# Ex. 9
assert "salesperson_ranking_currency" in globals(), "Ex. 9 - Remember that your variable's name should be `salesperson_ranking_currency`!"
salesperson_ranking_currency_result = pd.read_sql(salesperson_ranking_currency, cxn)
assert len(salesperson_ranking_currency_result) == 21, "Ex. 9 - There are too many or too few rows in your result. Remember to NOT use LIMIT here!"
assert set(salesperson_ranking_currency_result.columns) == {'salespersonid', 'tocurrencycode', 'ordertotalsum', 'commissionpct'}, "Ex. 9 - Your query result doesn't have exactly these columns: 'salespersonid', 'tocurrencycode', 'ordertotalsum', 'commissionpct'"
print("Exercise 9 looks fine for now. You will get your final grade after we've reviewed your submission.")
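
Every test cell above repeats the same two checks: the number of rows returned and the exact set of column names. If you want to run the same kind of check on your own queries before submitting, the pattern could be wrapped in a small helper. The sketch below is only an illustration and is not part of the grading code: `check_query`, `demo_cxn`, and the toy `salesperson` table with its made-up rows are all hypothetical. It uses the standard-library `sqlite3` module directly rather than `pd.read_sql` so that it runs without any extra dependencies.

```python
import sqlite3


def check_query(query, cxn, expected_rows, expected_columns, label):
    """Run `query` on `cxn` and assert on row count and column names.

    Hypothetical helper, not part of the official tests.
    """
    cursor = cxn.execute(query)
    # cursor.description holds one tuple per result column; item 0 is its name
    columns = {description[0] for description in cursor.description}
    rows = cursor.fetchall()
    assert len(rows) == expected_rows, (
        f"{label} - expected {expected_rows} rows, got {len(rows)}"
    )
    assert columns == set(expected_columns), (
        f"{label} - expected columns {sorted(expected_columns)}, got {sorted(columns)}"
    )
    return rows


# Toy in-memory table standing in for the real database (made-up data)
demo_cxn = sqlite3.connect(":memory:")
demo_cxn.execute("CREATE TABLE salesperson (businessentityid INTEGER, salesytd REAL)")
demo_cxn.executemany(
    "INSERT INTO salesperson VALUES (?, ?)",
    [(i, 1000.0 * i) for i in range(1, 8)],
)

demo_query = """
    SELECT businessentityid, salesytd
    FROM salesperson
    ORDER BY salesytd DESC
    LIMIT 5
"""
top_five = check_query(
    demo_query, demo_cxn, 5, {"businessentityid", "salesytd"}, "Demo"
)
```

Against the real database you would pass your own query string and the notebook's `cxn` connection, together with the row count and column names stated in the corresponding test cell.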

## Attribution

"AdventureWorks database", Nov 7, 2017, Microsoft Corporation, [MIT License](https://docs.microsoft.com/en-us/sql/samples/sql-samples-where-are?view=sql-server-ver15), https://github.com/microsoft/sql-server-samples/tree/master/samples/databases/adventure-works