## Window functions with aggregations (I)

To get familiar with window functions, you will work with the `Orders` table in this chapter. Recall that an empty `OVER()` creates a single window over the entire table. To create a separate window (partition) for each value of a specific column, add `PARTITION BY` inside `OVER()`.
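As a quick illustration of the two forms, here is a minimal sketch run against a hypothetical temporary table `#OrdersDemo` with made-up rows, not the course's `Orders` table. It assumes SQL Server 2016 or later for `DROP TABLE IF EXISTS`. The empty `OVER()` sums every row, while `PARTITION BY` restarts the sum for each territory. Either way, every order keeps its own output row.

```sql
-- Hypothetical demo table with made-up rows (DROP TABLE IF EXISTS needs SQL Server 2016+)
DROP TABLE IF EXISTS #OrdersDemo;
CREATE TABLE #OrdersDemo (OrderID INT, TerritoryName VARCHAR(20), OrderPrice FLOAT);
INSERT INTO #OrdersDemo VALUES
    (1, 'Australia', 48),
    (2, 'Australia', 35),
    (3, 'Canada',    17);

SELECT OrderID,
       TerritoryName,
       OrderPrice,
       -- One window over the whole table: 100 on every row
       SUM(OrderPrice) OVER() AS GrandTotal,
       -- One window per territory: 83 for Australia, 17 for Canada
       SUM(OrderPrice) OVER(PARTITION BY TerritoryName) AS TerritoryTotal
FROM #OrdersDemo;
```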

Instructions

1. Write a T-SQL query that returns the sum of `OrderPrice` by creating partitions for each `TerritoryName`.

SELECT OrderID, 
       TerritoryName, 
       -- Total price for each partition
       SUM(OrderPrice) 
       -- Create the window and partitions
       OVER(PARTITION BY TerritoryName) AS TotalPrice
FROM Orders;

# OrderID   TerritoryName   TotalPrice
# 43706     Australia       1469
# 43722     Australia       1469
# 43729     Australia       1469
# 47622     Australia       1469
# 47722     Australia       1469
# 48577     Australia       1469
# 48611     Australia       1469
# 50342     Australia       1469
# 50365     Australia       1469
# 51331     Australia       1469
# 51398     Australia       1469
# 53543     Australia       1469
# 53578     Australia       1469
# 53576     Canada          2573
# ...

## Window functions with aggregations (II)

In the last exercise, you calculated the sum of all orders for each territory. In this exercise, you will calculate the number of orders in each territory.

Instructions

1. Count the number of rows in each partition.
2. Partition the table by `TerritoryName`.

SELECT OrderID, 
       TerritoryName, 
       -- Number of rows per partition
       COUNT(*) 
       -- Create the window and partitions
       OVER(PARTITION BY TerritoryName) AS TotalOrders
FROM Orders;

# OrderID   TerritoryName   TotalOrders
# 43706     Australia       13
# 43722     Australia       13
# 43729     Australia       13
# 47622     Australia       13
# 47722     Australia       13
# 48577     Australia       13
# 48611     Australia       13
# 50342     Australia       13
# 50365     Australia       13
# 51331     Australia       13
# 51398     Australia       13
# 53543     Australia       13
# 53578     Australia       13
# 53576     Canada          37
# ...
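You could count orders per territory with `GROUP BY` as well, but that collapses each territory into a single row. The minimal sketch below compares the two approaches on a hypothetical temporary table `#OrdersDemo` with made-up rows. The windowed `COUNT(*)` keeps every order and repeats its territory's count on each row, which is what lets `OrderID` appear next to the count.

```sql
-- Hypothetical demo table with made-up rows (DROP TABLE IF EXISTS needs SQL Server 2016+)
DROP TABLE IF EXISTS #OrdersDemo;
CREATE TABLE #OrdersDemo (OrderID INT, TerritoryName VARCHAR(20));
INSERT INTO #OrdersDemo VALUES
    (1, 'Australia'), (2, 'Australia'), (3, 'Canada');

-- GROUP BY: one row per territory (Australia 2, Canada 1);
-- selecting OrderID here would raise an error because it is not in the GROUP BY list
SELECT TerritoryName, COUNT(*) AS TotalOrders
FROM #OrdersDemo
GROUP BY TerritoryName;

-- Window: three rows, each carrying its territory's count (2, 2, 1)
SELECT OrderID,
       TerritoryName,
       COUNT(*) OVER(PARTITION BY TerritoryName) AS TotalOrders
FROM #OrdersDemo;
```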

## Do you know window functions?

Which of the following statements is _incorrect_ regarding window queries?

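Answer: the incorrect statement is "The standard aggregations like `SUM()`, `AVG()`, and `COUNT()` require `ORDER BY` in the `OVER()` clause." These aggregations work with an empty `OVER()` or with `PARTITION BY` alone; adding `ORDER BY` turns them into running aggregations.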
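To see why requiring `ORDER BY` is wrong for aggregations, here is a minimal sketch on a hypothetical temporary table `#OrdersDemo` with made-up rows. Both window definitions are valid. Without `ORDER BY`, the aggregate covers the whole partition; with it, the aggregate becomes cumulative. Ranking and offset functions such as `ROW_NUMBER()`, `LAG()`, and `LEAD()` are different: SQL Server rejects them when `OVER()` has no `ORDER BY`.

```sql
-- Hypothetical demo table with made-up rows (DROP TABLE IF EXISTS needs SQL Server 2016+)
DROP TABLE IF EXISTS #OrdersDemo;
CREATE TABLE #OrdersDemo (TerritoryName VARCHAR(20), OrderDate DATETIME2(0), OrderPrice FLOAT);
INSERT INTO #OrdersDemo VALUES
    ('Australia', '2015-02-23 09:00', 48),
    ('Australia', '2015-02-23 11:00', 35),
    ('Australia', '2015-02-23 12:00', 230);

SELECT OrderDate,
       -- No ORDER BY: every row gets the partition total, 313
       SUM(OrderPrice) OVER(PARTITION BY TerritoryName) AS PartitionTotal,
       -- With ORDER BY: cumulative total, 48, 83, 313
       SUM(OrderPrice) OVER(PARTITION BY TerritoryName ORDER BY OrderDate) AS RunningTotal
FROM #OrdersDemo
ORDER BY OrderDate;
```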

## First value in a window

Suppose you want to figure out the first `OrderDate` in each territory or the last one. How would you do that? You can use the window functions `FIRST_VALUE()` and `LAST_VALUE()`, respectively! Here are the steps:

- First, create partitions for each territory.
- Then, order by `OrderDate`.
- Finally, apply `FIRST_VALUE()` and/or `LAST_VALUE()` as needed.
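One caveat applies to `LAST_VALUE()`. When `OVER()` has an `ORDER BY` but no explicit frame, SQL Server uses the frame `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. As a result, the "last" value is the current row's value (or that of its last tie), not the partition's. The minimal sketch below, on a hypothetical temporary table `#OrdersDemo` with made-up rows, compares the default frame with an explicit frame that extends to the end of the partition.

```sql
-- Hypothetical demo table with made-up rows (DROP TABLE IF EXISTS needs SQL Server 2016+)
DROP TABLE IF EXISTS #OrdersDemo;
CREATE TABLE #OrdersDemo (TerritoryName VARCHAR(20), OrderDate DATETIME2(0));
INSERT INTO #OrdersDemo VALUES
    ('Australia', '2015-02-23 09:00'),
    ('Australia', '2015-02-23 11:00'),
    ('Australia', '2015-02-23 12:00');

SELECT OrderDate,
       -- Default frame ends at the current row, so this just echoes OrderDate
       LAST_VALUE(OrderDate)
       OVER(PARTITION BY TerritoryName ORDER BY OrderDate) AS LastSoFar,
       -- Explicit frame covering the whole partition: 2015-02-23 12:00 on every row
       LAST_VALUE(OrderDate)
       OVER(PARTITION BY TerritoryName ORDER BY OrderDate
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastOrder
FROM #OrdersDemo
ORDER BY OrderDate;
```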

Instructions

1. Write a T-SQL query that returns the first `OrderDate` by creating partitions for each `TerritoryName`.

SELECT TerritoryName, 
       OrderDate, 
       -- Select the first value in each partition
       FIRST_VALUE(OrderDate) 
       -- Create the partitions and arrange the rows
       OVER(PARTITION BY TerritoryName ORDER BY OrderDate) AS FirstOrder
FROM Orders;

# TerritoryName   OrderDate             FirstOrder
# Australia       2015-02-23 09:00:00   2015-02-23 09:00:00
# Australia       2015-02-23 11:00:00   2015-02-23 09:00:00
# Australia       2015-02-23 12:00:00   2015-02-23 09:00:00
# Australia       2015-04-23 02:00:00   2015-02-23 09:00:00
# Australia       2015-04-24 02:00:00   2015-02-23 09:00:00
# Australia       2015-05-06 03:00:00   2015-02-23 09:00:00
# Australia       2015-05-07 05:00:00   2015-02-23 09:00:00
# Australia       2015-06-03 03:00:00   2015-02-23 09:00:00
# Australia       2015-06-03 05:00:00   2015-02-23 09:00:00
# Australia       2015-06-17 07:00:00   2015-02-23 09:00:00
# Australia       2015-06-18 04:00:00   2015-02-23 09:00:00
# Australia       2015-07-21 03:00:00   2015-02-23 09:00:00
# Australia       2015-07-21 12:00:00   2015-02-23 09:00:00
# Canada          2015-01-01 13:00:00   2015-01-01 13:00:00
# ...

## Previous and next values

What if you want to shift the values in a column one row up or down? You can follow the same steps as in the previous exercise, but with two new functions: `LEAD()` for the next value and `LAG()` for the previous value. The steps are:

- First, create partitions.
- Then, order by a certain column.
- Finally, use the `LEAD()` and/or `LAG()` functions as needed.

Instructions

1. Write a T-SQL query that, for each territory:
    1. Shifts the values in `OrderDate` one row down. Call this column `PreviousOrder`.
    2. Shifts the values in `OrderDate` one row up. Call this column `NextOrder`. _You will need to PARTITION BY the territory._

SELECT TerritoryName,
       OrderDate, 
       -- Specify the previous OrderDate in the window
       LAG(OrderDate) 
       -- Over the window, partition by territory & order by order date
       OVER(PARTITION BY TerritoryName ORDER BY OrderDate) AS PreviousOrder,
       -- Specify the next OrderDate in the window
       LEAD(OrderDate) 
       -- Create the partitions and arrange the rows
       OVER(PARTITION BY TerritoryName ORDER BY OrderDate) AS NextOrder
FROM Orders;

# TerritoryName   OrderDate             PreviousOrder         NextOrder
# Australia       2015-02-23 09:00:00   null                  2015-02-23 11:00:00
# Australia       2015-02-23 11:00:00   2015-02-23 09:00:00   2015-02-23 12:00:00
# Australia       2015-02-23 12:00:00   2015-02-23 11:00:00   2015-04-23 02:00:00
# ...
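`LAG()` and `LEAD()` also take an optional offset and default value: `LAG(expression, offset, default)`. The minimal sketch below runs on a hypothetical temporary table `#OrdersDemo` with made-up rows. It passes the current `OrderDate` as the default so the first gap is 0 instead of NULL, computes the gap in hours with `DATEDIFF()`, and uses `LEAD()` with an offset of 2 to look two orders ahead.

```sql
-- Hypothetical demo table with made-up rows (DROP TABLE IF EXISTS needs SQL Server 2016+)
DROP TABLE IF EXISTS #OrdersDemo;
CREATE TABLE #OrdersDemo (TerritoryName VARCHAR(20), OrderDate DATETIME2(0));
INSERT INTO #OrdersDemo VALUES
    ('Australia', '2015-02-23 09:00'),
    ('Australia', '2015-02-23 11:00'),
    ('Australia', '2015-02-23 12:00');

SELECT OrderDate,
       -- Hours since the previous order; the default (current OrderDate) makes the first gap 0
       DATEDIFF(hour,
                LAG(OrderDate, 1, OrderDate)
                OVER(PARTITION BY TerritoryName ORDER BY OrderDate),
                OrderDate) AS HoursSincePrevious,   -- 0, 2, 1
       -- Two orders ahead; NULL when no such row exists in the partition
       LEAD(OrderDate, 2)
       OVER(PARTITION BY TerritoryName ORDER BY OrderDate) AS TwoOrdersAhead
FROM #OrdersDemo
ORDER BY OrderDate;
```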

## Creating running totals

Aggregations usually don't need `ORDER BY` inside `OVER()`, but a running total only makes sense once the rows are ordered, so here you _must_ arrange your rows! In this exercise, you will create a running total of `OrderPrice`.

Instructions

1. Create the window, partition by `TerritoryName` and order by `OrderDate` to calculate a running total of `OrderPrice`.

SELECT TerritoryName,
       OrderDate, 
       -- Create a running total
       SUM(OrderPrice) 
       -- Create the partitions and arrange the rows
       OVER(PARTITION BY TerritoryName ORDER BY OrderDate) AS TerritoryTotal
FROM Orders;

# TerritoryName   OrderDate             TerritoryTotal
# Australia       2015-02-23 09:00:00   48
# Australia       2015-02-23 11:00:00   83
# Australia       2015-02-23 12:00:00   313
# ...
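With `ORDER BY` alone, SQL Server frames the window as `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`. That frame treats rows with the same `OrderDate` as peers and adds them in together. If two orders can share a timestamp, an explicit `ROWS` frame with a tie-breaking column gives a strict row-by-row total. The minimal sketch below runs on a hypothetical temporary table `#OrdersDemo` whose made-up rows include one tie.

```sql
-- Hypothetical demo table with made-up rows (DROP TABLE IF EXISTS needs SQL Server 2016+)
DROP TABLE IF EXISTS #OrdersDemo;
CREATE TABLE #OrdersDemo (OrderID INT, TerritoryName VARCHAR(20), OrderDate DATETIME2(0), OrderPrice FLOAT);
INSERT INTO #OrdersDemo VALUES
    (1, 'Australia', '2015-02-23 09:00', 48),
    (2, 'Australia', '2015-02-23 11:00', 35),
    (3, 'Australia', '2015-02-23 11:00', 10);  -- same timestamp as order 2

SELECT OrderID,
       OrderDate,
       -- Default RANGE frame: orders 2 and 3 are peers, so both show 93 (48, 93, 93)
       SUM(OrderPrice) OVER(PARTITION BY TerritoryName ORDER BY OrderDate) AS RangeTotal,
       -- ROWS frame with OrderID as tie-breaker: 48, 83, 93
       SUM(OrderPrice) OVER(PARTITION BY TerritoryName ORDER BY OrderDate, OrderID
                            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RowsTotal
FROM #OrdersDemo
ORDER BY OrderDate, OrderID;
```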

## Assigning row numbers

Rows in a T-SQL table have no inherent order. In certain situations, however, you may want to assign row numbers for reference. In this exercise, you will do just that.

Instructions

1. Write a T-SQL query that assigns row numbers to all records partitioned by `TerritoryName` and ordered by `OrderDate`.

SELECT TerritoryName,
       OrderDate, 
       -- Assign a row number
       ROW_NUMBER() 
       -- Create the partitions and arrange the rows
       OVER(PARTITION BY TerritoryName ORDER BY OrderDate) AS OrderCount
FROM Orders;

# TerritoryName   OrderDate             OrderCount
# Australia       2015-02-23 09:00:00   1
# Australia       2015-02-23 11:00:00   2
# Australia       2015-02-23 12:00:00   3
# ...
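`ROW_NUMBER()` always assigns distinct numbers, even when two rows tie on the `ORDER BY` column. T-SQL's related ranking functions `RANK()` and `DENSE_RANK()` handle ties differently. The minimal sketch below shows the difference on a hypothetical temporary table `#OrdersDemo` with made-up rows, two of which share an `OrderDate`.

```sql
-- Hypothetical demo table with made-up rows (DROP TABLE IF EXISTS needs SQL Server 2016+)
DROP TABLE IF EXISTS #OrdersDemo;
CREATE TABLE #OrdersDemo (OrderID INT, TerritoryName VARCHAR(20), OrderDate DATETIME2(0));
INSERT INTO #OrdersDemo VALUES
    (1, 'Australia', '2015-02-23 09:00'),
    (2, 'Australia', '2015-02-23 11:00'),
    (3, 'Australia', '2015-02-23 11:00'),  -- tie with order 2
    (4, 'Australia', '2015-02-23 12:00');

SELECT OrderID,
       OrderDate,
       -- 1, 2, 3, 4: OrderID breaks the tie so the numbering is deterministic
       ROW_NUMBER() OVER(PARTITION BY TerritoryName ORDER BY OrderDate, OrderID) AS RowNum,
       -- 1, 2, 2, 4: tied rows share a rank and the next rank is skipped
       RANK() OVER(PARTITION BY TerritoryName ORDER BY OrderDate) AS RankNum,
       -- 1, 2, 2, 3: tied rows share a rank with no gap afterwards
       DENSE_RANK() OVER(PARTITION BY TerritoryName ORDER BY OrderDate) AS DenseRankNum
FROM #OrdersDemo
ORDER BY OrderDate, OrderID;
```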

## Calculating standard deviation

Calculating the standard deviation is quite common when dealing with numeric columns. In this exercise, you will calculate the _running standard deviation_, similar to the running total you calculated in an earlier exercise.

Instructions

1. Create the window, partition by `TerritoryName` and order by `OrderDate` to calculate a running standard deviation of `OrderPrice`.

SELECT OrderDate,
       TerritoryName, 
       -- Calculate the standard deviation
       STDEV(OrderPrice)
       -- Create the partitions and arrange the rows
       OVER(PARTITION BY TerritoryName ORDER BY OrderDate) AS StdDevPrice
FROM Orders;

# OrderDate             TerritoryName   StdDevPrice
# 2015-02-23 09:00:00   Australia       null
# 2015-02-23 11:00:00   Australia       9.192388155425117
# 2015-02-23 12:00:00   Australia       109.02446208687908
# ...
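`STDEV()` computes the sample standard deviation, which is undefined for a single value. That is why the first row of each territory comes back NULL. T-SQL also provides `STDEVP()` for the population standard deviation. The minimal sketch below compares the two on a hypothetical temporary table `#OrdersDemo`. Its prices of 48, 35, and 230 happen to reproduce the first three `StdDevPrice` values above.

```sql
-- Hypothetical demo table (DROP TABLE IF EXISTS needs SQL Server 2016+)
DROP TABLE IF EXISTS #OrdersDemo;
CREATE TABLE #OrdersDemo (TerritoryName VARCHAR(20), OrderDate DATETIME2(0), OrderPrice FLOAT);
INSERT INTO #OrdersDemo VALUES
    ('Australia', '2015-02-23 09:00', 48),
    ('Australia', '2015-02-23 11:00', 35),
    ('Australia', '2015-02-23 12:00', 230);

SELECT OrderDate,
       -- Sample standard deviation (divides by n - 1): NULL, 9.19..., 109.02...
       STDEV(OrderPrice)
       OVER(PARTITION BY TerritoryName ORDER BY OrderDate) AS SampleStdDev,
       -- Population standard deviation (divides by n): 0, 6.5, ...
       STDEVP(OrderPrice)
       OVER(PARTITION BY TerritoryName ORDER BY OrderDate) AS PopStdDev
FROM #OrdersDemo
ORDER BY OrderDate;
```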

## Calculating mode (I)

T-SQL has no built-in function to calculate the _mode_, the most frequently occurring value in a column. To calculate the mode:

- First, create a CTE that uses `ROW_NUMBER()` to number the rows within each group of identical values, so each group's highest row number equals how often that value occurs.
- Then, write a query using the CTE to pick the value with the highest row number.

In this exercise, you will write the CTE needed to calculate the mode of `OrderPrice`.

Instructions

1. Create a CTE `ModePrice` that returns two columns (`OrderPrice` and `UnitPriceFrequency`).
2. Write a query that returns all rows in this CTE.

-- Create a CTE called ModePrice which contains two columns
WITH ModePrice (OrderPrice, UnitPriceFrequency) AS (
    SELECT OrderPrice,
           -- Number the rows within each group of identical prices
           ROW_NUMBER()
           OVER(PARTITION BY OrderPrice ORDER BY OrderPrice) AS UnitPriceFrequency
    FROM Orders
)
-- Select everything from the CTE
SELECT * 
FROM ModePrice;

# OrderPrice          UnitPriceFrequency
# 3.5                 1
# 3.5                 2
# 3.700000047683716   1
# ...

## Calculating mode (II)

In the last exercise, you created a CTE that numbered the rows within each group of identical `OrderPrice` values, so the highest row number in a group equals how often that price occurs. All you need to do now is find the `OrderPrice` with the highest row number.

Instructions

1. Use the CTE `ModePrice` to return the value of `OrderPrice` with the highest row number.

-- CTE from the previous exercise
WITH ModePrice (OrderPrice, UnitPriceFrequency) AS (
    SELECT OrderPrice,
           ROW_NUMBER()
           OVER(PARTITION BY OrderPrice ORDER BY OrderPrice) AS UnitPriceFrequency
    FROM Orders
)
-- Select the order price from the CTE
SELECT OrderPrice AS ModeOrderPrice
FROM ModePrice
-- Keep only the row(s) with the maximum UnitPriceFrequency
WHERE UnitPriceFrequency IN (SELECT MAX(UnitPriceFrequency)
                             FROM ModePrice);

# ModeOrderPrice
# 32
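For comparison, the same mode can be computed more directly with `GROUP BY` and `COUNT(*)`. This is a swapped-in alternative, not the exercise's `ROW_NUMBER()` method. `TOP (1) WITH TIES` keeps every price that shares the highest count, just as the `IN (SELECT MAX(...))` filter does. The minimal sketch below runs on a hypothetical temporary table `#OrdersDemo` with made-up prices.

```sql
-- Hypothetical demo table with made-up prices (DROP TABLE IF EXISTS needs SQL Server 2016+)
DROP TABLE IF EXISTS #OrdersDemo;
CREATE TABLE #OrdersDemo (OrderPrice FLOAT);
INSERT INTO #OrdersDemo VALUES (3.5), (3.5), (12), (12), (12), (10);

-- Count each distinct price and keep the most frequent one(s): 12, seen 3 times
SELECT TOP (1) WITH TIES
       OrderPrice AS ModeOrderPrice,
       COUNT(*) AS Frequency
FROM #OrdersDemo
GROUP BY OrderPrice
ORDER BY COUNT(*) DESC;
```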