# Finding Gaps Part 2 Challenge

### Introduction

In this lesson, we'll take on a classic SQL problem: identifying gaps in a sequence of numbers. The difference from before is that this time a gap can be more than one number wide, so we'll need to find the start and end of each gap.

Try to solve it using self joins.  

### Loading the data

In [6]:
import pandas as pd
import sqlite3
conn = sqlite3.connect('users.db')
url = "https://raw.githubusercontent.com/tech-interviews-jigsaw/sql-advanced-joins/main/6-common-strategies/sequence.csv"
df = pd.read_csv(url)

In [7]:
df.to_sql('numbers', conn, index = True,
          index_label = 'id', if_exists = 'replace')

13

### Viewing our data

Our data has three gaps. The start and end of each gap are shown below.

<img src="./end-result.png" width="30%">

You can see this if we take a look at the numbers.

In [8]:
pd.read_sql("select * from numbers", conn)

|  | id | number |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 2 |
| 2 | 2 | 5 |
| 3 | 3 | 6 |
| 4 | 4 | 7 |
| 5 | 5 | 8 |
| 6 | 6 | 9 |
| 7 | 7 | 10 |
| 8 | 8 | 12 |
| 9 | 9 | 13 |


And again this is the result that we want.

<img src="./end-result.png" width="40%">

### No hints

In [13]:
query = """
select * from numbers limit 3
"""
pd.read_sql(query, conn)

|  | id | number |
|---|---|---|
| 0 | 0 | 1 |
| 1 | 1 | 2 |
| 2 | 2 | 5 |


Go for it.  Return the three gaps with the gap start and gap end, as shown below.

<img src="./end-result.png" width="40%">

If you feel stuck, feel free to try the problem with some hints, which we'll move through below.

### With (some) hints

Ok, to achieve this, first write a query that returns a temporary table of gap starts, then write a second query that returns a temporary table of gap ends.

Finally, use a correlated subquery to line up each `gap_start` with its `gap_end`.

Take it in steps, one by one.

In [None]:
query = """
select * from numbers limit 3
"""
pd.read_sql(query, conn)

> Remember, we want to wind up with the following.

<img src="./end-result.png" width="40%">

### Even more hints

Or, if you would like to walk through this with some more structure, move through the section below.

Remember that our approach is to identify the gap starts, then identify the gap ends, and then line them up with a correlated subquery.

So take these steps one by one.

1. Start by returning a list of all of the gap starts.

In [11]:
query = """

"""

pd.read_sql(query, conn)


# 	left_num	number
# 0	3	None
# 1	11	None
# 2	16	None

|  | left_num | number |
|---|---|---|
| 0 | 3 | None |
| 1 | 11 | None |
| 2 | 16 | None |
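If the shape of this query is hard to picture, here is a minimal sketch of the technique on a toy table rather than on `numbers`. The `toy` table, its values, and the `toy_conn` connection are made up for illustration. The idea, under those assumptions, is to left join each number to its successor and keep the rows where no successor exists.

```python
import sqlite3

# Hypothetical toy sequence, not the lab's data: 4-5 and 8 are missing.
toy_conn = sqlite3.connect(":memory:")
toy_conn.execute("create table toy (number integer)")
toy_conn.executemany("insert into toy values (?)",
                     [(n,) for n in [1, 2, 3, 6, 7, 9]])

gap_starts_query = """
select l.number + 1 as left_num, r.number
from toy l
left join toy r on r.number = l.number + 1      -- look for the next consecutive number
where r.number is null                          -- no match: a gap begins right after l.number
  and l.number < (select max(number) from toy)  -- the largest number does not start a gap
order by left_num
"""
gap_starts = toy_conn.execute(gap_starts_query).fetchall()
print(gap_starts)  # [(4, None), (8, None)]
```

In the notebook, the same query string could be passed to `pd.read_sql` with the lab's connection once `toy` is swapped for `numbers`.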


Ok, so with that, we are already a good part of the way there.  Remember this is what we're building to.

<img src="./end-result.png" width="40%">

2. Now write a similar query to identify the end of each gap.

In [15]:
query = """

"""
pd.read_sql(query, conn)

# 	gap_end	right_num
# 0	4	None
# 1	11	None
# 2	19	None

|  | gap_end | right_num |
|---|---|---|
| 0 | 4 | None |
| 1 | 11 | None |
| 2 | 19 | None |
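One way to think about gap ends is as the mirror image of gap starts: join each number to its predecessor and keep the rows where the predecessor is missing. The sketch below shows this on a hypothetical `toy` table (the table, its values, and `toy_conn` are assumptions for illustration, not the lab's data).

```python
import sqlite3

# Hypothetical toy sequence, not the lab's data: 4-5 and 8 are missing.
toy_conn = sqlite3.connect(":memory:")
toy_conn.execute("create table toy (number integer)")
toy_conn.executemany("insert into toy values (?)",
                     [(n,) for n in [1, 2, 3, 6, 7, 9]])

gap_ends_query = """
select r.number - 1 as gap_end, l.number as right_num
from toy r
left join toy l on l.number = r.number - 1      -- look for the previous consecutive number
where l.number is null                          -- no match: a gap ends right before r.number
  and r.number > (select min(number) from toy)  -- the smallest number does not end a gap
order by gap_end
"""
gap_ends = toy_conn.execute(gap_ends_query).fetchall()
print(gap_ends)  # [(5, None), (8, None)]
```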


Ok, so now we have the beginning and the end of each gap. The next step is to combine our two queries to return two columns, where each row displays a `gap_start` and its `gap_end`.

3. Combine the tables.

Ok, so now use two CTEs so that you can reference the two queries above.

Then, to align the two tables, we'll use a correlated subquery. Our outer query should simply select from the table of `gap_starts`.

Once you have that, see if you can pull in the matching row from the `gap_ends` table with a correlated subquery. For each gap start, the matching gap end is the smallest gap end that is greater than or equal to it.

In [44]:
query = """

"""

pd.read_sql(query, conn)
# gap_start	gap_end
# 3	4
# 11	11
# 16	19 

|  | gap_start | gap_end |
|---|---|---|
| 0 | 3 | 4 |
| 1 | 11 | 11 |
| 2 | 16 | 19 |
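To see how the pieces can fit together, here is a sketch of the full pattern on a hypothetical `toy` table (again, the table, its values, and `toy_conn` are assumptions, not the lab's data). Each CTE holds one of the two earlier queries, and the correlated subquery picks, for every gap start, the smallest gap end that is not before it.

```python
import sqlite3

# Hypothetical toy sequence, not the lab's data: 4-5 and 8 are missing.
toy_conn = sqlite3.connect(":memory:")
toy_conn.execute("create table toy (number integer)")
toy_conn.executemany("insert into toy values (?)",
                     [(n,) for n in [1, 2, 3, 6, 7, 9]])

gaps_query = """
with gap_starts as (
  select l.number + 1 as gap_start
  from toy l
  left join toy r on r.number = l.number + 1
  where r.number is null
    and l.number < (select max(number) from toy)
),
gap_ends as (
  select r.number - 1 as gap_end
  from toy r
  left join toy l on l.number = r.number - 1
  where l.number is null
    and r.number > (select min(number) from toy)
)
select s.gap_start,
       (select min(e.gap_end)               -- correlated: re-evaluated for each row of s
        from gap_ends e
        where e.gap_end >= s.gap_start) as gap_end
from gap_starts s
order by s.gap_start
"""
gaps = toy_conn.execute(gaps_query).fetchall()
print(gaps)  # [(4, 5), (8, 8)]
```

The `>=` comparison matters for one-number gaps such as the toy table's missing 8, where the start and the end are the same value.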


### Summary

Ok, so in this lab, we saw how to identify our three gaps by finding the beginning and end of each. We did so by getting our gap starts and gap ends in separate queries, and then aligning those results with a correlated subquery.