# Sequence Challenge

### Introduction

In this lesson, we'll tackle a classic SQL problem: identifying gaps in a sequence of numbers. Try to solve it using self joins first, and then see if you can solve it with window functions.

### Loading the data

```python
import pandas as pd
import sqlite3

# Open (or create) a local SQLite database and load the sequence from CSV
conn = sqlite3.connect('users.db')
url = "./sequence.csv"
df = pd.read_csv(url)
```

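```python
# Write the DataFrame to a `numbers` table; the DataFrame index becomes an `id` column
df.to_sql('numbers', conn, index=True,
          index_label='id', if_exists='replace')
```

`to_sql` returns `13`, the number of rows written.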
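
If you don't have `sequence.csv`, the sketch below builds an equivalent `numbers` table in an in-memory SQLite database using only the standard library. The values are an assumption. They are reconstructed from the outputs later in this lesson: 13 rows, the first five being 1, 2, 5, 6, 7, and gaps at 3 to 4, 11, and 16 to 19. Your actual file may differ.

```python
import sqlite3

# ASSUMPTION: values reconstructed from this lesson's query outputs;
# the real sequence.csv may contain different numbers.
NUMBERS = [1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20]

# In-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (id INTEGER, number INTEGER)")
# Mirror df.to_sql(..., index_label='id'): id is the 0-based row position
conn.executemany(
    "INSERT INTO numbers (id, number) VALUES (?, ?)",
    list(enumerate(NUMBERS)),
)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM numbers").fetchone()[0]
first_five = conn.execute(
    "SELECT id, number FROM numbers ORDER BY id LIMIT 5"
).fetchall()
print(row_count)   # 13
print(first_five)  # [(0, 1), (1, 2), (2, 5), (3, 6), (4, 7)]
```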

### Performing the query

If you look at the data, you can see that there are multiple places where there is no next number.

> This happens at the values 2, 10, 15, and 20, so our gaps start at 3, 11, 16, and 21. (21 isn't really a gap, since it comes after the last number, but include it for now.)
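
Before writing any SQL, it can help to see the same rule in plain Python. The rule is this: a number whose successor is missing marks a break, and the value right after it is where a gap begins. This sketch assumes the values reconstructed from this lesson's outputs. The self join you're about to write expresses the same "is `n + 1` present?" check.

```python
# ASSUMPTION: values reconstructed from this lesson's query outputs
numbers = [1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20]
present = set(numbers)

# A number with no successor marks a spot where the sequence breaks
no_next = [n for n in numbers if n + 1 not in present]
# The first missing value after each break is where a gap begins
gap_starts = [n + 1 for n in no_next]

print(no_next)     # [2, 10, 15, 20]
print(gap_starts)  # [3, 11, 16, 21]
```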

Start by returning all of the rows where there is no next number.

```python
# Self join: match each number l with the number one greater than it (r).
# When no such r exists, the LEFT JOIN fills r.number with NULL,
# so l.number + 1 is the first missing value.
query = """
SELECT l.number + 1 AS left_num
    ,r.number
FROM numbers l
LEFT JOIN numbers r ON l.number = r.number - 1
WHERE r.number IS NULL
"""

pd.read_sql(query, conn)
```

| | left_num | number |
|---|---|---|
| 0 | 3 | None |
| 1 | 11 | None |
| 2 | 16 | None |
| 3 | 21 | None |


Ok, so currently our query also returns 21, which comes after the last number (20). That's not technically a gap, so now update the query to exclude the last number.

```python
# Same query, but skip the largest number: nothing follows it, so it can't start a gap
query = """
SELECT l.number + 1 AS left_num
    ,r.number
FROM numbers l
LEFT JOIN numbers r ON l.number = r.number - 1
WHERE r.number IS NULL
  AND l.number != (SELECT MAX(number) FROM numbers)
"""

pd.read_sql(query, conn)
```


| | left_num | number |
|---|---|---|
| 0 | 3 | None |
| 1 | 11 | None |
| 2 | 16 | None |
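
The `LEFT JOIN ... WHERE r.number IS NULL` pattern is an anti-join: it keeps the rows of `l` that have no match in `r`. SQL can also express this directly with `NOT EXISTS`. The sketch below runs that alternative form against an in-memory table. The table holds values assumed from this lesson's outputs. It isn't a replacement for the self join the lesson asks for, just another way to phrase the same condition.

```python
import sqlite3

# ASSUMPTION: values reconstructed from this lesson's query outputs
NUMBERS = [1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (number INTEGER)")
conn.executemany("INSERT INTO numbers (number) VALUES (?)", [(n,) for n in NUMBERS])

# Keep each number whose successor does not exist, excluding the maximum
query = """
SELECT l.number + 1 AS gap_start
FROM numbers l
WHERE NOT EXISTS (
    SELECT 1 FROM numbers r WHERE r.number = l.number + 1
)
  AND l.number != (SELECT MAX(number) FROM numbers)
ORDER BY gap_start
"""
gap_starts = [row[0] for row in conn.execute(query)]
print(gap_starts)  # [3, 11, 16]
```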


### Going further

Ok, now if we look at our output above, the numbers we identified only indicate the *start* of each gap. For example, the first gap starts at 3 (right after 2), but it doesn't end until the numbers resume at 5.

```python
query = """
SELECT * FROM numbers LIMIT 5
"""
pd.read_sql(query, conn)
```

| | id | number |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 2 |
| 2 | 2 | 5 |
| 3 | 3 | 6 |
| 4 | 4 | 7 |


But all our query above did was identify the *beginning* of each gap. This first gap ends at 4, one before the numbers resume.

**So now** write a *slightly different* query to find the end of a gap.  

> **Hint**: This query is pretty similar to the one you wrote above.

> **Note**: Exclude the minimum number, 1. It has no predecessor, but that doesn't make 0 the end of a gap; 1 is simply the first number.

```python
# Mirror image of the gap-start query: look for the number one *less* than l.
# When it's missing, l.number - 1 is the last value of a gap.
query = """
SELECT l.number - 1 AS left_num
    ,r.number AS right_num
FROM numbers l
LEFT JOIN numbers r ON l.number = r.number + 1
WHERE r.number IS NULL
  AND l.number != (SELECT MIN(number) FROM numbers)
"""
pd.read_sql(query, conn)
```

| | left_num | right_num |
|---|---|---|
| 0 | 4 | None |
| 1 | 11 | None |
| 2 | 19 | None |


Ok, so now we have the beginning and the end of each gap. Next, combine our previous queries to return two columns, where each row holds the beginning and the end of one gap.

We'll put this all together for you, because it's pretty tricky. The first part of the code below is not so bad: we just use CTEs to carry over our previous queries as the temporary tables `gap_starts` and `gap_ends`.

Then, in the final `SELECT`, we return every row of `gap_starts`. What's left is to find each start's corresponding end. To do so, we use a correlated subquery to generate the `gap_end` column. For each gap start, it finds the smallest gap end that is greater than or equal to that start.

For example, our gap starts are 3, 11, and 16, and our gap ends are 4, 11, and 19. Taking the first gap end at or after each gap start pairs 3 with 4, 11 with 11, and 16 with 19.
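
The pairing logic is easier to see in plain Python. This sketch uses the gap starts and ends from the outputs above. For each start, it does what the correlated subquery does for each row of `gap_starts`.

```python
# Gap starts and gap ends as returned by the two queries above
gap_starts = [3, 11, 16]
gap_ends = [4, 11, 19]

# For each start, take the smallest end that is >= the start.
# This mirrors: WHERE e.gap_end >= s.gap_start ORDER BY e.gap_end LIMIT 1
pairs = [(s, min(e for e in gap_ends if e >= s)) for s in gap_starts]
print(pairs)  # [(3, 4), (11, 11), (16, 19)]
```

Since each gap has exactly one start and one end, sorting both lists and zipping them would give the same pairs here. The `>=` search avoids relying on row order, which SQL doesn't guarantee without an `ORDER BY`.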

```python
# gap_starts and gap_ends reuse the two self-join queries from above as CTEs.
# The final SELECT pairs each gap start with the smallest gap end >= it,
# using a correlated subquery.
query = """
WITH gap_starts AS (
    SELECT l.number + 1 AS gap_start
        ,r.number
    FROM numbers l
    LEFT JOIN numbers r ON l.number = r.number - 1
    WHERE r.number IS NULL
      AND l.number != (SELECT MAX(number) FROM numbers)
),

gap_ends AS (
    SELECT l.number - 1 AS gap_end
        ,r.number AS right_num
    FROM numbers l
    LEFT JOIN numbers r ON l.number = r.number + 1
    WHERE r.number IS NULL
      AND l.number != (SELECT MIN(number) FROM numbers)
)

SELECT s.gap_start
    ,(
        SELECT e.gap_end
        FROM gap_ends e
        WHERE e.gap_end >= s.gap_start
        ORDER BY e.gap_end
        LIMIT 1
    ) AS gap_end
FROM gap_starts s
"""

pd.read_sql(query, conn)
```

| | gap_start | gap_end |
|---|---|---|
| 0 | 3 | 4 |
| 1 | 11 | 11 |
| 2 | 16 | 19 |
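
For the window-function half of the challenge, one option is `LEAD()`, which reads the next row's value in a given order. The sketch below is one possible solution, not the lesson's official answer. It compares each number with the next one and reports a gap wherever they differ by more than 1. It assumes SQLite 3.25 or newer (the first version with window functions) and the same reconstructed values as before.

```python
import sqlite3

# Window functions such as LEAD() require SQLite 3.25+
assert sqlite3.sqlite_version_info >= (3, 25, 0), "SQLite too old for window functions"

# ASSUMPTION: values reconstructed from this lesson's query outputs
NUMBERS = [1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (number INTEGER)")
conn.executemany("INSERT INTO numbers (number) VALUES (?)", [(n,) for n in NUMBERS])

# Pair each number with the next one; a jump of more than 1 is a gap
query = """
WITH ordered AS (
    SELECT number,
           LEAD(number) OVER (ORDER BY number) AS next_number
    FROM numbers
)
SELECT number + 1 AS gap_start,
       next_number - 1 AS gap_end
FROM ordered
WHERE next_number - number > 1
ORDER BY gap_start
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(3, 4), (11, 11), (16, 19)]
```

The last number needs no special handling here. Its `LEAD()` is `NULL`, so `next_number - number > 1` is not true and the row is filtered out.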


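Ok, now try that last part again yourself. The goal is to execute a subquery for each row that returns the first gap end greater than or equal to that row's `gap_start`. As written, the subquery below doesn't reference `s`, so it returns the same value for every row, as the output shows. Update the subquery at the end of the query to get this working.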

```python
# Starter code: the subquery at the end is not yet correlated with s,
# so it returns the same gap_end for every row.
query = """
WITH gap_starts AS (
    SELECT l.number + 1 AS gap_start
        ,r.number
    FROM numbers l
    LEFT JOIN numbers r ON l.number = r.number - 1
    WHERE r.number IS NULL
      AND l.number != (SELECT MAX(number) FROM numbers)
),

gap_ends AS (
    SELECT l.number - 1 AS gap_end
        ,r.number AS right_num
    FROM numbers l
    LEFT JOIN numbers r ON l.number = r.number + 1
    WHERE r.number IS NULL
      AND l.number != (SELECT MIN(number) FROM numbers)
)

SELECT s.gap_start
    ,(SELECT e.gap_end FROM gap_ends e LIMIT 1) AS gap_end
FROM gap_starts s
"""

pd.read_sql(query, conn)
```

| | gap_start | gap_end |
|---|---|---|
| 0 | 3 | 4 |
| 1 | 11 | 4 |
| 2 | 16 | 4 |


Ok, and this time, write the subquery out yourself.

```python
# Replace the placeholder `1 AS gap_end` with your own subquery
query = """
WITH gap_starts AS (
    SELECT l.number + 1 AS gap_start
        ,r.number
    FROM numbers l
    LEFT JOIN numbers r ON l.number = r.number - 1
    WHERE r.number IS NULL
      AND l.number != (SELECT MAX(number) FROM numbers)
),

gap_ends AS (
    SELECT l.number - 1 AS gap_end
        ,r.number AS right_num
    FROM numbers l
    LEFT JOIN numbers r ON l.number = r.number + 1
    WHERE r.number IS NULL
      AND l.number != (SELECT MIN(number) FROM numbers)
)

SELECT s.gap_start, 1 AS gap_end
FROM gap_starts s
"""

pd.read_sql(query, conn)
```

| | gap_start | gap_end |
|---|---|---|
| 0 | 3 | 1 |
| 1 | 11 | 1 |
| 2 | 16 | 1 |
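
Once you've written the subquery, here's another window-function pattern to try, often called "gaps and islands." It works from the other direction. Within a run of consecutive numbers, `number - ROW_NUMBER()` stays constant, so grouping by that difference recovers each run (island). The gaps are then the spaces between consecutive islands. This is a sketch assuming SQLite 3.25+, no duplicate numbers, and the same reconstructed values as before.

```python
import sqlite3

# ROW_NUMBER() requires SQLite 3.25+
assert sqlite3.sqlite_version_info >= (3, 25, 0), "SQLite too old for window functions"

# ASSUMPTION: values reconstructed from this lesson's query outputs (no duplicates)
NUMBERS = [1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (number INTEGER)")
conn.executemany("INSERT INTO numbers (number) VALUES (?)", [(n,) for n in NUMBERS])

# Within a run of consecutive numbers, number - ROW_NUMBER() is constant,
# so it works as a group key for each run (island)
query = """
WITH grouped AS (
    SELECT number,
           number - ROW_NUMBER() OVER (ORDER BY number) AS grp
    FROM numbers
)
SELECT MIN(number) AS island_start,
       MAX(number) AS island_end
FROM grouped
GROUP BY grp
ORDER BY island_start
"""
islands = conn.execute(query).fetchall()

# Each gap lies between the end of one island and the start of the next
gaps = [
    (prev_end + 1, next_start - 1)
    for (_, prev_end), (next_start, _) in zip(islands, islands[1:])
]
print(islands)  # [(1, 2), (5, 10), (12, 15), (20, 20)]
print(gaps)     # [(3, 4), (11, 11), (16, 19)]
```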


### Summary

Ok, so in this lab, we saw one mechanism for 