# Finding Gaps - Window Functions

### Introduction

Let's return to the problem of finding gaps in a sequence, but this time we'll work through it using window functions to produce our result.

### Loading the data

In [10]:
import sqlite3

import pandas as pd

# Open (or create) the SQLite database file, then read the sequence into a DataFrame
conn = sqlite3.connect('users.db')
url = "./sequence.csv"
df = pd.read_csv(url)

In [11]:
# Write the DataFrame to a `numbers` table, storing the DataFrame index as an `id` column
df.to_sql('numbers', conn, index=True,
          index_label='id', if_exists='replace')

13

### Performing the query

If you look at the data, you can see multiple places where a number is not followed by the next consecutive number.

> This occurs at the values 2, 10, 15, and 20, so our gaps start at the values 3, 11, 16, and 21 (well, 21 comes after the last value in the sequence, but you can include it for now).
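
The reasoning in the note above can be checked with a few lines of plain Python. This is only a sketch: the `SAMPLE` list is an assumption, reconstructed from the outputs shown in this lab rather than read from `sequence.csv`, and `values_without_successor` is a hypothetical helper name.

```python
# Assumed sample: the 13 values implied by the outputs shown in this lab.
SAMPLE = [1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20]


def values_without_successor(values):
    """Return each value v for which v + 1 is absent from the sequence."""
    present = set(values)
    return [v for v in sorted(present) if v + 1 not in present]


last_before_gap = values_without_successor(SAMPLE)
# Each gap starts one past a value that has no immediate successor.
gap_starts = [v + 1 for v in last_before_gap]

print(last_before_gap)  # [2, 10, 15, 20]
print(gap_starts)       # [3, 11, 16, 21]
```

Note that 20 shows up only because it is the last value, which is why 21 is a "gap" only in a loose sense.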

In [13]:
pd.read_sql("select number from numbers", conn)

|   | number |
|---|--------|
| 0 | 1 |
| 1 | 2 |
| 2 | 5 |
| 3 | 6 |
| 4 | 7 |
| 5 | 8 |
| 6 | 9 |
| 7 | 10 |
| 8 | 12 |
| 9 | 13 |


And this is the end result that we would like to get to.

<img src="./end-result.png" width="30%">

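Start by using `lead` to place each number alongside the number that follows it in the sequence. This will let us spot the rows where the next consecutive number is missing.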

In [18]:
query = """
SELECT t.number
    ,lead(t.number, 1) OVER (
        ORDER BY t.number
        ) AS next_number
FROM numbers t limit 5
"""
pd.read_sql(query, conn)

|   | number | next_number |
|---|--------|-------------|
| 0 | 1 | 2 |
| 1 | 2 | 5 |
| 2 | 5 | 6 |
| 3 | 6 | 7 |
| 4 | 7 | 8 |
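
To see what `lead` does at the end of the sequence, which the `limit 5` hides, here is a minimal self-contained sketch. It uses an in-memory database and an assumed `SAMPLE` list (reconstructed from the outputs in this lab) instead of `users.db`, and `next_numbers` is a hypothetical helper name. Window functions require SQLite 3.25 or later.

```python
import sqlite3

# Assumed sample: the 13 values implied by the outputs shown in this lab.
SAMPLE = [1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20]


def next_numbers(values):
    """Pair each value with the value that follows it, using lead()."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE numbers (id INTEGER PRIMARY KEY, number INTEGER)")
    conn.executemany("INSERT INTO numbers (number) VALUES (?)",
                     [(v,) for v in values])
    rows = conn.execute("""
        SELECT number,
               lead(number, 1) OVER (ORDER BY number) AS next_number
        FROM numbers
        ORDER BY number
    """).fetchall()
    conn.close()
    return rows


pairs = next_numbers(SAMPLE)
print(pairs[:5])   # [(1, 2), (2, 5), (5, 6), (6, 7), (7, 8)]
print(pairs[-1])   # (20, None): the last row has no following row
```

The last row gets `NULL` (shown as `None` in Python) because there is no row after it for `lead` to look at.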


Then compare the two columns in a `where` clause to identify the gaps. For example, there is no gap in the first row (1 is followed by 2), but there is a gap in the second row (2 is followed by 5).

In [23]:
query = """
with next_nums as (
    SELECT t.number
        ,lead(t.number, 1) OVER (
            ORDER BY t.number
            ) AS next_number
    FROM numbers t 
)
select * from next_nums where next_nums.next_number - next_nums.number > 1 
"""

pd.read_sql(query, conn)

|   | number | next_number |
|---|--------|-------------|
| 0 | 2 | 5 |
| 1 | 10 | 12 |
| 2 | 15 | 20 |
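
Notice that the result has three rows, even though four values (2, 10, 15, and 20) lack an immediate successor. The last value, 20, drops out because its `next_number` is `NULL`, and a comparison involving `NULL` is never true. The sketch below illustrates this with literal values in an in-memory database; it is an illustration, not part of the lab's own queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL - 20 is NULL, and NULL > 1 is also NULL (neither true nor false).
null_comparison = conn.execute("SELECT (NULL - 20) > 1").fetchone()[0]

# A WHERE clause keeps only rows where the condition is true,
# so a row shaped like (20, NULL) is filtered out.
rows_kept = conn.execute("""
    SELECT count(*)
    FROM (SELECT 20 AS number, NULL AS next_number)
    WHERE next_number - number > 1
""").fetchone()[0]
conn.close()

print(null_comparison)  # None
print(rows_kept)        # 0
```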


Ok, now the result we actually want is the following, with the start and end of each gap:

<img src="./end-result.png" width="40%">

To get there, we can modify our query as follows.

In [24]:
query = """
with next_nums as (
    SELECT t.number
        ,lead(t.number, 1) OVER (
            ORDER BY t.number
            ) AS next_number
    FROM numbers t 
)
select number + 1 as gap_start,
    next_number - 1 as gap_end
from next_nums
where next_nums.next_number - next_nums.number > 1
"""

pd.read_sql(query, conn)

|   | gap_start | gap_end |
|---|-----------|---------|
| 0 | 3 | 4 |
| 1 | 11 | 11 |
| 2 | 16 | 19 |
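
As a sanity check, the final query can be run end to end against an in-memory database and compared with a plain-Python version of the same logic. This is a sketch under assumptions: `SAMPLE` is reconstructed from the outputs in this lab, and `find_gaps_sql` and `find_gaps_python` are hypothetical helper names.

```python
import sqlite3

# Assumed sample: the 13 values implied by the outputs shown in this lab.
SAMPLE = [1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20]

GAP_QUERY = """
WITH next_nums AS (
    SELECT number,
           lead(number, 1) OVER (ORDER BY number) AS next_number
    FROM numbers
)
SELECT number + 1 AS gap_start,
       next_number - 1 AS gap_end
FROM next_nums
WHERE next_number - number > 1
ORDER BY gap_start
"""


def find_gaps_sql(values):
    """Load the values into a temporary table and run the gap query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE numbers (number INTEGER)")
    conn.executemany("INSERT INTO numbers VALUES (?)", [(v,) for v in values])
    gaps = conn.execute(GAP_QUERY).fetchall()
    conn.close()
    return gaps


def find_gaps_python(values):
    """Same logic without SQL: compare each value with its successor."""
    ordered = sorted(values)
    return [(a + 1, b - 1) for a, b in zip(ordered, ordered[1:]) if b - a > 1]


sql_gaps = find_gaps_sql(SAMPLE)
print(sql_gaps)  # [(3, 4), (11, 11), (16, 19)]
assert sql_gaps == find_gaps_python(SAMPLE)
```

Pairing each element with its successor via `zip(ordered, ordered[1:])` plays the same role as `lead(number, 1)` in the query.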


### Summary

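Ok, so in this lab, we saw one mechanism for finding gaps in a sequence: using the `lead` window function to pair each number with the next one, and then filtering for the rows where the two differ by more than 1.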