# Finding Gaps Part 2 Challenge

### Introduction

In this lesson, we'll tackle the classic SQL problem of identifying gaps in a sequence of numbers. Unlike before, a gap can now span more than one number, so we'll need to find both the start and the end of each gap.

Try to solve it using self joins.  
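It helps to pin down what "the start and end of each gap" means before writing any SQL. Below is a minimal plain-Python sketch that computes the expected answer so you can check your queries against it. It is not the self-join solution this lesson asks for. The function name `find_gaps` and the sample values are hypothetical, and the sample was picked so that it produces three gaps of different sizes.

```python
def find_gaps(numbers):
    """Return (gap_start, gap_end) pairs for every run of missing integers."""
    values = sorted(set(numbers))
    gaps = []
    # Compare each value with the next one; a jump of more than 1 means
    # the integers strictly between them are missing.
    for current, following in zip(values, values[1:]):
        if following - current > 1:
            gaps.append((current + 1, following - 1))
    return gaps


# Hypothetical sample: one two-number gap, one single-number gap, one four-number gap.
sample = [1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20]
print(find_gaps(sample))  # [(3, 4), (11, 11), (16, 19)]
```

A gap of length one shows up with equal start and end, such as `(11, 11)`. Your SQL result should have the same shape.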

### Loading the data

```python
import pandas as pd
import sqlite3

conn = sqlite3.connect('users.db')
url = "https://raw.githubusercontent.com/tech-interviews-jigsaw/sql-advanced-joins/main/6-common-strategies/sequence.csv"
df = pd.read_csv(url)
```

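```python
# Write the DataFrame to a `numbers` table; the DataFrame index becomes the `id` column
df.to_sql('numbers', conn, index=True,
          index_label='id', if_exists='replace')
```

`to_sql` returns `13`, the number of rows written to the `numbers` table.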
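If the CSV URL is unreachable, a small in-memory table with the same `id`/`number` layout is enough to follow along. This sketch uses only the standard-library `sqlite3` module. The sample values are hypothetical: they are consistent with the 13 rows and three gaps described in this lesson, but the real `sequence.csv` may contain different numbers.

```python
import sqlite3

# Hypothetical stand-in for sequence.csv (the real file may differ).
sample = [1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20]

# ":memory:" creates a throwaway database that disappears when the connection closes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (id INTEGER PRIMARY KEY, number INTEGER)")

# Number the rows from 0, like the DataFrame index that to_sql writes as `id`.
conn.executemany(
    "INSERT INTO numbers (id, number) VALUES (?, ?)",
    enumerate(sample),
)
conn.commit()

rows = conn.execute("SELECT id, number FROM numbers ORDER BY id").fetchall()
print(len(rows))  # 13
```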

### Viewing our data

Our data has three gaps, which start and end at the following values.

<img src="./end-result.png" width="30%">

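You can also see these gaps by looking at the numbers themselves. Note that the listing below shows only the first ten of the 13 rows, so the last gap falls outside it.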

```python
pd.read_sql("select * from numbers", conn)
```

|   | id | number |
|---|----|--------|
| 0 | 0  | 1      |
| 1 | 1  | 2      |
| 2 | 2  | 5      |
| 3 | 3  | 6      |
| 4 | 4  | 7      |
| 5 | 5  | 8      |
| 6 | 6  | 9      |
| 7 | 7  | 10     |
| 8 | 8  | 12     |
| 9 | 9  | 13     |


Again, this is the result we want.

<img src="./end-result.png" width="40%">

### Walking through all of the hints

1. Start by returning a list of all of the gap starts.

```python
query = """
SELECT l.number + 1 as left_num
    ,r.number
FROM numbers l
LEFT JOIN numbers r ON l.number = r.number - 1
WHERE r.number is null and l.number !=
(select max(number) from numbers)
"""

pd.read_sql(query, conn)

#    left_num  number
# 0         3  None
# 1        11  None
# 2        16  None
```

|   | left_num | number |
|---|----------|--------|
| 0 | 3        | None   |
| 1 | 11       | None   |
| 2 | 16       | None   |
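The idea behind hint 1 is an anti-join. Each number is joined to its successor, and wherever no successor exists (`r.number IS NULL`), a gap begins at `number + 1`. The largest number is excluded because nothing after it counts as a gap. The sketch below runs the same query against a hypothetical in-memory sample instead of `users.db`, with the join condition rewritten as the equivalent `r.number = l.number + 1`.

```python
import sqlite3

# Hypothetical sample data; the lesson's sequence.csv may differ.
sample = [1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (id INTEGER PRIMARY KEY, number INTEGER)")
conn.executemany("INSERT INTO numbers (id, number) VALUES (?, ?)", enumerate(sample))

gap_starts_sql = """
SELECT l.number + 1 AS gap_start
FROM numbers l
LEFT JOIN numbers r ON r.number = l.number + 1       -- look for the successor
WHERE r.number IS NULL                               -- keep numbers with no successor
  AND l.number != (SELECT MAX(number) FROM numbers)  -- the maximum ends the sequence
ORDER BY gap_start
"""
gap_starts = [row[0] for row in conn.execute(gap_starts_sql)]
print(gap_starts)  # [3, 11, 16]
```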


Ideally, we're already a good part of the way there. Remember, this is what we're building toward.

<img src="./end-result.png" width="40%">

2. Now write a similar query to identify the end of each gap.

```python
query = """
SELECT l.number -1 gap_end
    ,r.number right_num
FROM numbers l
LEFT JOIN numbers r ON l.number = r.number + 1
where r.number is null and l.number != (select min(number) from numbers)
"""
pd.read_sql(query, conn)

#    gap_end  right_num
# 0        4  None
# 1       11  None
# 2       19  None
```

|   | gap_end | right_num |
|---|---------|-----------|
| 0 | 4       | None      |
| 1 | 11      | None      |
| 2 | 19      | None      |
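Hint 2 mirrors hint 1. Each number is joined to its predecessor, and wherever none exists, a gap ends at `number - 1`. The smallest number is excluded because nothing before it counts as a gap. Every gap has exactly one start and one end, so a useful sanity check is that both queries return the same number of rows. This sketch runs both queries on a hypothetical in-memory sample using only `sqlite3` and performs that check.

```python
import sqlite3

# Hypothetical sample data; the lesson's sequence.csv may differ.
sample = [1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (id INTEGER PRIMARY KEY, number INTEGER)")
conn.executemany("INSERT INTO numbers (id, number) VALUES (?, ?)", enumerate(sample))

gap_starts_sql = """
SELECT l.number + 1 FROM numbers l
LEFT JOIN numbers r ON r.number = l.number + 1       -- successor present?
WHERE r.number IS NULL
  AND l.number != (SELECT MAX(number) FROM numbers)
ORDER BY 1
"""
gap_ends_sql = """
SELECT l.number - 1 FROM numbers l
LEFT JOIN numbers r ON r.number = l.number - 1       -- predecessor present?
WHERE r.number IS NULL
  AND l.number != (SELECT MIN(number) FROM numbers)  -- the minimum starts the sequence
ORDER BY 1
"""
gap_starts = [row[0] for row in conn.execute(gap_starts_sql)]
gap_ends = [row[0] for row in conn.execute(gap_ends_sql)]
print(gap_ends)  # [4, 11, 19]

# One start and one end per gap, so the row counts must agree.
assert len(gap_starts) == len(gap_ends)
```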


Now we have the beginning and the end of each gap. Next, use the previous queries to return two columns, where each row holds the start of a gap and its matching end.

3. Combine the tables.

Use two CTEs so that you can reference the two queries above.

Then, for the end result, use a correlated subquery. The first CTE, `gap_starts`, should drive the outer query.


```python
query = """
with gap_starts as (
    SELECT l.number + 1 as gap_start
        ,r.number
    FROM numbers l
    LEFT JOIN numbers r ON l.number = r.number - 1
    WHERE r.number is null and l.number !=
    (select max(number) from numbers)
),

gap_ends as (
    SELECT l.number -1 gap_end
        ,r.number right_num
    FROM numbers l
    LEFT JOIN numbers r ON l.number = r.number + 1
    where r.number is null and l.number != (select min(number) from numbers)
)

SELECT s.gap_start
    ,(
        SELECT e.gap_end
        FROM gap_ends e
        WHERE e.gap_end >= s.gap_start
        ORDER BY e.gap_end limit 1
    ) AS gap_end --- 2. correlated subquery
FROM gap_starts s --- 1. outer query
"""

pd.read_sql(query, conn)
# gap_start  gap_end
# 3          4
# 11         11
# 16         19
```

|   | gap_start | gap_end |
|---|-----------|---------|
| 0 | 3         | 4       |
| 1 | 11        | 11      |
| 2 | 16        | 19      |
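The `ORDER BY e.gap_end limit 1` inside the correlated subquery selects the smallest gap end at or after each gap start. An equivalent formulation uses `MIN(e.gap_end)` with the same `WHERE` clause. This is offered as an alternative, not as the lesson's required answer. The sketch runs it against a hypothetical in-memory sample using only `sqlite3`.

```python
import sqlite3

# Hypothetical sample data; the lesson's sequence.csv may differ.
sample = [1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (id INTEGER PRIMARY KEY, number INTEGER)")
conn.executemany("INSERT INTO numbers (id, number) VALUES (?, ?)", enumerate(sample))

query = """
WITH gap_starts AS (
    SELECT l.number + 1 AS gap_start
    FROM numbers l
    LEFT JOIN numbers r ON r.number = l.number + 1
    WHERE r.number IS NULL
      AND l.number != (SELECT MAX(number) FROM numbers)
),
gap_ends AS (
    SELECT l.number - 1 AS gap_end
    FROM numbers l
    LEFT JOIN numbers r ON r.number = l.number - 1
    WHERE r.number IS NULL
      AND l.number != (SELECT MIN(number) FROM numbers)
)
SELECT s.gap_start,
       (SELECT MIN(e.gap_end)            -- smallest end...
        FROM gap_ends e
        WHERE e.gap_end >= s.gap_start   -- ...at or after this row's start
       ) AS gap_end
FROM gap_starts s
ORDER BY s.gap_start
"""
pairs = conn.execute(query).fetchall()
print(pairs)  # [(3, 4), (11, 11), (16, 19)]
```

Both versions rely on the fact that gaps never overlap, so the first end at or after a start always belongs to that start's gap.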


Now try that last part again on your own. The goal is a subquery that runs once for each row and returns the smallest gap end that is greater than or equal to that row's `gap_start`. Update the second-to-last line of the query to get this working.

```python
query = """
with gap_starts as (
    SELECT l.number + 1 as gap_start
        ,r.number
    FROM numbers l
    LEFT JOIN numbers r ON l.number = r.number - 1
    WHERE r.number is null and l.number !=
    (select max(number) from numbers)
),

gap_ends as (
    SELECT l.number -1 gap_end
        ,r.number right_num
    FROM numbers l
    LEFT JOIN numbers r ON l.number = r.number + 1
    where r.number is null and l.number != (select min(number) from numbers)
)

SELECT s.gap_start, (
        SELECT e.gap_end from gap_ends e limit 1) as gap_end
 FROM gap_starts s
"""

pd.read_sql(query, conn)
```

|   | gap_start | gap_end |
|---|-----------|---------|
| 0 | 3         | 4       |
| 1 | 11        | 4       |
| 2 | 16        | 4       |
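The starter query returns 4 on every row because its subquery is uncorrelated. It never refers to `s.gap_start`, so every outer row gets the same single value, and without an `ORDER BY` even the choice of that value is not guaranteed. The contrast can be mimicked in plain Python using the gap starts and ends from the outputs above. The list comprehensions are only an analogy for how the subqueries behave, not how SQLite executes them.

```python
# Values copied from the gap-start and gap-end query outputs earlier in this lesson.
gap_starts = [3, 11, 16]
gap_ends = [4, 11, 19]

# Uncorrelated: the inner lookup ignores the outer row, so every row receives
# the same value (here simply the first end in the list).
uncorrelated = [(start, gap_ends[0]) for start in gap_starts]

# Correlated: the inner lookup filters on the current outer row's start and
# keeps the smallest end at or after it.
correlated = [(start, min(end for end in gap_ends if end >= start)) for start in gap_starts]

print(uncorrelated)  # [(3, 4), (11, 4), (16, 4)]
print(correlated)    # [(3, 4), (11, 11), (16, 19)]
```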


This time, try it again, writing essentially the entire correlated subquery yourself.

```python
query = """
with gap_starts as (
    SELECT l.number + 1 as gap_start
        ,r.number
    FROM numbers l
    LEFT JOIN numbers r ON l.number = r.number - 1
    WHERE r.number is null and l.number !=
    (select max(number) from numbers)
),

gap_ends as (
    SELECT l.number -1 gap_end
        ,r.number right_num
    FROM numbers l
    LEFT JOIN numbers r ON l.number = r.number + 1
    where r.number is null and l.number != (select min(number) from numbers)
)

SELECT s.gap_start, 1 as gap_end
 FROM gap_starts s
"""

pd.read_sql(query, conn)
```

|   | gap_start | gap_end |
|---|-----------|---------|
| 0 | 3         | 1       |
| 1 | 11        | 1       |
| 2 | 16        | 1       |


### Summary

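In this lab, we saw one mechanism for finding gaps of any size: self joins locate where each gap starts and where it ends, and a correlated subquery pairs each start with its matching end.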