# Finding Gaps - Window Functions

### Introduction

Let's return to the problem of finding gaps, this time using window functions to produce the result.

### Loading the data

In [5]:
import pandas as pd
import sqlite3
conn = sqlite3.connect('users.db')
url = "https://raw.githubusercontent.com/tech-interviews-jigsaw/sql-advanced-joins/main/6-common-strategies/sequence.csv"
# url = './sequence.csv'
df = pd.read_csv(url)

In [6]:
df.to_sql('numbers', conn, index = True,
          index_label = 'id', if_exists = 'replace')

13

### Performing the query

If you look at the data, you can see that there are multiple places where a number has no immediate successor.

> This occurs at the values 2, 10, 15, and 20.  So our gaps begin at the values 3, 11, 16, and 21 (20 is the largest value in the table, so 21 is not really a gap, but you can include it for now).

In [7]:
pd.read_sql("select number from numbers", conn)

Unnamed: 0,number
0,1
1,2
2,5
3,6
4,7
5,8
6,9
7,10
8,12
9,13


And this is the end result that we would like to get to.

<img src="./end-result.png" width="30%">

We can start with the `lead` window function.  This adds a `next_number` column that contains, for each row, the next number in sorted order.

In [9]:
query = """
SELECT numbers.number,
lead(numbers.number, 1) OVER (ORDER BY numbers.number) AS next_number
FROM numbers limit 5
"""
pd.read_sql(query, conn)

Unnamed: 0,number,next_number
0,1,2
1,2,5
2,5,6
3,6,7
4,7,8


The first row has a `number` of 1 and a `next_number` of 2, so there is no gap.  The second row has a `number` of 2 and a `next_number` of 5, so there is a gap between them.
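To see how `lead` behaves on every row, including the last one, here is a minimal sketch that runs against an in-memory database. The sample values and the `conn_demo` name are made up for illustration and are not the lab's `sequence.csv`; it also assumes your SQLite build supports window functions (version 3.25 or later).

```python
import sqlite3

# Hypothetical sample values, not the lab's sequence.csv.
sample = [1, 2, 5, 6, 10]

# Separate in-memory database so the lab's users.db is left untouched.
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE numbers (number INTEGER)")
conn_demo.executemany(
    "INSERT INTO numbers (number) VALUES (?)", [(n,) for n in sample]
)

rows = conn_demo.execute("""
    SELECT number,
           lead(number, 1) OVER (ORDER BY number) AS next_number
    FROM numbers
    ORDER BY number
""").fetchall()

for number, next_number in rows:
    print(number, next_number)
```

The last row has no following row, so its `next_number` comes back as `NULL` (shown as `None` in Python).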

Use the `lead` query as a starting point for building the list of gaps.

> Remember that the end result we would like to get to is the following.

<img src="./end-result.png" width="40%">

### Solution

The first step is to update the query so that it selects only those rows where the difference between `next_number` and `number` is greater than one.

In [17]:
query = """
WITH numbers_and_nexts AS (
    SELECT numbers.number AS number,
           lead(numbers.number, 1) OVER (ORDER BY numbers.number) AS next_number
    FROM numbers
)

SELECT *
FROM numbers_and_nexts
WHERE (numbers_and_nexts.next_number - numbers_and_nexts.number) > 1

"""
pd.read_sql(query, conn)

Unnamed: 0,number,next_number
0,2,5
1,10,12
2,15,20
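Notice that the row for 20 does not appear, even though 20 has no next number. A likely explanation, sketched below, is SQL's handling of `NULL`: the last row's `next_number` is `NULL`, so the subtraction and the comparison both evaluate to `NULL`, and the `WHERE` clause only keeps rows where the condition is true. The `conn_demo` connection is a throwaway in-memory database used only for this check.

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")

# For the last row, next_number is NULL, so the filter becomes (NULL - 20) > 1.
(result,) = conn_demo.execute("SELECT (NULL - 20) > 1").fetchone()

# NULL comes back as None: the condition is neither true nor false,
# so the WHERE clause drops the row.
print(result)
```

This is why the window function version does not report 21 as the start of a gap.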


Then we update the select statement so that it returns the first and last missing number of each gap.

In [18]:
query = """
WITH numbers_and_nexts AS (
    SELECT numbers.number AS number,
           lead(numbers.number, 1) OVER (ORDER BY numbers.number) AS next_number
    FROM numbers
)

SELECT number + 1 AS gap_start,
       next_number - 1 AS gap_end
FROM numbers_and_nexts
WHERE (numbers_and_nexts.next_number - numbers_and_nexts.number) > 1

"""
pd.read_sql(query, conn)

Unnamed: 0,gap_start,gap_end
0,3,4
1,11,11
2,16,19
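The two steps can also be wrapped in a small helper for experimenting with other sequences. This is only a sketch: `find_gaps`, `GAPS_QUERY`, and the `gap_end` alias are names chosen here for illustration, the input values are made up, and the code assumes a SQLite build with window function support (3.25 or later).

```python
import sqlite3

# The same lead-then-filter approach, with the second column labelled gap_end.
GAPS_QUERY = """
WITH numbers_and_nexts AS (
    SELECT number,
           lead(number, 1) OVER (ORDER BY number) AS next_number
    FROM numbers
)
SELECT number + 1 AS gap_start,
       next_number - 1 AS gap_end
FROM numbers_and_nexts
WHERE next_number - number > 1
ORDER BY gap_start
"""


def find_gaps(values):
    """Return (gap_start, gap_end) tuples for an iterable of integers."""
    conn_demo = sqlite3.connect(":memory:")
    try:
        conn_demo.execute("CREATE TABLE numbers (number INTEGER)")
        conn_demo.executemany(
            "INSERT INTO numbers (number) VALUES (?)", [(v,) for v in values]
        )
        return conn_demo.execute(GAPS_QUERY).fetchall()
    finally:
        conn_demo.close()


gaps = find_gaps([1, 2, 5, 6, 10])
print(gaps)  # [(3, 4), (7, 9)]
```

A gap of a single missing number, such as the one between 10 and 12, has the same start and end value.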


> Try to reimplement the query without looking.

In [20]:
query = """
"""
# pd.read_sql(query, conn)
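If you want something to check your attempt against, one option is a plain Python reference that pairs each value with its successor, which mirrors what `lead(number, 1)` does. The function name `gaps_reference` and the sample input are hypothetical.

```python
def gaps_reference(values):
    """Return (gap_start, gap_end) pairs for a collection of integers."""
    ordered = sorted(set(values))
    # zip(ordered, ordered[1:]) pairs each value with the next one,
    # and stops before the last value, which has no successor.
    return [
        (current + 1, following - 1)
        for current, following in zip(ordered, ordered[1:])
        if following - current > 1
    ]


print(gaps_reference([1, 2, 5, 6, 10]))  # [(3, 4), (7, 9)]
```

Passing the values from the `numbers` table to this function should give the same pairs as the two columns your query returns.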

### Summary

In this lab, we saw another way to find the beginning and end of each gap.  We started with the `lead` window function, which adds a column identifying the next number.  Then we selected the rows where the difference between the next number and the current number is greater than one, and adjusted each side by one to get the first and last missing number.