# Reviewing Window Functions

### Introduction

With the previous question, we were re-introduced to window functions.  In this lesson, we'll take another look at the fundamentals of window functions.

### Window Functions Review

In [4]:
import pandas as pd
url = "./favorita_transactions.csv"
df = pd.read_csv(url, low_memory=False)
df[:2]

Unnamed: 0,id,date,store_nbr,transactions
0,0,2013-01-01,25,770
1,1,2013-01-02,1,2111


In [5]:
import sqlite3

conn = sqlite3.connect('crm.db')

In [6]:
df.to_sql('stores', conn, if_exists = 'replace', index = False)

83488

In [7]:
query = '''SELECT id, store_nbr, transactions
from stores limit 5'''
pd.read_sql(query, conn)

Unnamed: 0,id,store_nbr,transactions
0,0,25,770
1,1,1,2111
2,2,2,2358
3,3,3,3487
4,4,4,1922


With window functions, we use the `over` clause to calculate a value for each existing row, rather than collapsing the rows into one result the way an ordinary aggregate does.  For example, we can use `dense_rank` to rank transactions across all stores and dates.
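
To see what "for each existing row" means, here is a minimal sketch that compares a plain aggregate with the same aggregate written with `over ()`. It uses a small in-memory table with made-up values (the `sales` table and the `demo_conn` name are assumptions for illustration, not part of the Favorita data), and it assumes SQLite 3.25 or later, where window functions are available.

```python
import sqlite3

# Hypothetical toy table, not the Favorita data. Window functions need SQLite 3.25+.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE sales (store_nbr INTEGER, transactions INTEGER)")
demo_conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [(1, 100), (2, 300), (3, 200)]
)

# A plain aggregate collapses the table into a single row.
collapsed = demo_conn.execute("SELECT max(transactions) FROM sales").fetchall()

# The same aggregate with OVER () keeps every row and repeats the value on each one.
windowed = demo_conn.execute(
    "SELECT store_nbr, transactions, max(transactions) OVER () AS max_transactions "
    "FROM sales ORDER BY store_nbr"
).fetchall()

print(collapsed)  # [(300,)]
print(windowed)   # [(1, 100, 300), (2, 300, 300), (3, 200, 300)]
```

The plain aggregate returns one row, while the windowed version returns all three rows with the maximum attached to each.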

In [13]:
query = '''SELECT id, store_nbr, transactions, 
dense_rank() over (order by transactions desc) as rank
from stores limit 2'''
pd.read_sql(query, conn)

Unnamed: 0,id,store_nbr,transactions,rank
0,52011,44,8359,1
1,71010,44,8307,2


Let's unpack the above query.  We are calculating the dense rank.  
> With `dense_rank()`, tied values get the same rank, and the next distinct value gets the next consecutive number, so there are no gaps.  If the first two values are tied, `dense_rank()` outputs `1, 1, 2`, whereas `rank()` would output `1, 1, 3`.
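
How ties are numbered is easiest to check on a tiny example. The sketch below runs `rank()` and `dense_rank()` side by side over a made-up table in which the top two values are tied (the `sales` table, its values, and the `ties_conn` name are assumptions for illustration). It assumes SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical toy table with a tie at the top. Window functions need SQLite 3.25+.
ties_conn = sqlite3.connect(":memory:")
ties_conn.execute("CREATE TABLE sales (transactions INTEGER)")
ties_conn.executemany("INSERT INTO sales VALUES (?)", [(500,), (500,), (300,)])

# rank() leaves a gap after a tie; dense_rank() does not.
ranked = ties_conn.execute(
    "SELECT transactions, "
    "rank() OVER (ORDER BY transactions DESC) AS rnk, "
    "dense_rank() OVER (ORDER BY transactions DESC) AS dense_rnk "
    "FROM sales ORDER BY transactions DESC"
).fetchall()

print(ranked)  # [(500, 1, 1), (500, 1, 1), (300, 3, 2)]
```

The third row gets `3` from `rank()` but `2` from `dense_rank()`.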

Then looking at the query again...

> `dense_rank() over (order by transactions desc) as rank`

We see that the next part of the query is `over (order by transactions desc)`.  Here is where *the window* is defined.  In this case, the window is all of the rows in the table, ordered by transactions from highest to lowest.

In [14]:
query = '''SELECT id, store_nbr, transactions, 
dense_rank() over (order by transactions desc) as rank
from stores limit 2'''
pd.read_sql(query, conn)

Unnamed: 0,id,store_nbr,transactions,rank
0,52011,44,8359,1
1,71010,44,8307,2


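Now, if we leave the `over ()` clause empty, the window is still every row in the table, but no ordering is specified.  `row_number()` then numbers the rows in whatever order the database happens to return them, which here matches the order in which they were inserted.  SQL does not guarantee that order.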

In [16]:
query = '''SELECT id, store_nbr, transactions, 
row_number() over () as rank
from stores limit 2'''
pd.read_sql(query, conn)

Unnamed: 0,id,store_nbr,transactions,rank
0,0,25,770,1
1,1,1,2111,2
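
The contrast between an empty window and an ordered one can be sketched on a small in-memory table. The `sales` table and the `order_conn` name below are assumptions for illustration, with values borrowed from the first rows shown earlier. Because the numbering under an empty `over ()` is not guaranteed, the sketch only prints expected output for the ordered query. It assumes SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical toy table. Window functions need SQLite 3.25+.
order_conn = sqlite3.connect(":memory:")
order_conn.execute("CREATE TABLE sales (store_nbr INTEGER, transactions INTEGER)")
order_conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [(25, 770), (1, 2111), (2, 2358)]
)

# Empty window: every row gets a number, but which row gets which is not guaranteed.
unordered = order_conn.execute(
    "SELECT store_nbr, transactions, row_number() OVER () AS row_num FROM sales"
).fetchall()

# Ordered window: numbering follows transactions from highest to lowest.
ordered = order_conn.execute(
    "SELECT store_nbr, transactions, "
    "row_number() OVER (ORDER BY transactions DESC) AS row_num "
    "FROM sales ORDER BY row_num"
).fetchall()

print(unordered)
print(ordered)  # [(2, 2358, 1), (1, 2111, 2), (25, 770, 3)]
```

When the numbering needs to mean something, putting an `order by` inside `over (...)` makes it explicit.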
