# Islands Problem

### Introduction

In this lesson, we'll look at another common SQL problem: finding *islands* in data, that is, unbroken runs of consecutive values separated by gaps.

### Result

Let's start by loading our data.

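```python
import pandas as pd
import sqlite3

# Open (or create) a local SQLite database file
conn = sqlite3.connect('users.db')

# Read the sample sequence from CSV into a DataFrame
url = "./island_sequence.csv"
df = pd.read_csv(url)
```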

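```python
# Write the DataFrame to a "numbers" table; the DataFrame index becomes the "id" column
df.to_sql('numbers', conn, index=True,
          index_label='id', if_exists='replace')
```

`to_sql` returns `13`, the number of rows written to the table.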
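If you want to follow along without `island_sequence.csv` or pandas, the sketch below builds an equivalent `numbers` table with Python's standard `sqlite3` module. It mirrors what `to_sql(..., index=True, index_label='id')` does here: the DataFrame's 0-based index is stored as an `id` column next to `number`. The 13 values are an assumption, inferred from the row count above and the islands in the lesson's final output; `build_numbers_table` is a hypothetical helper.

```python
import sqlite3

# ASSUMPTION: the CSV contents are not shown in full, so these 13 values are
# inferred from the row count reported by to_sql and the final island output.
VALUES = [1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20]


def build_numbers_table(values):
    """Create an in-memory 'numbers' table with the same (id, number) layout
    that df.to_sql(..., index=True, index_label='id') produces."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE numbers (id INTEGER, number INTEGER)")
    # enumerate() supplies 0-based ids, like the DataFrame's default index
    conn.executemany(
        "INSERT INTO numbers (id, number) VALUES (?, ?)",
        enumerate(values),
    )
    conn.commit()
    return conn


conn = build_numbers_table(VALUES)
print(conn.execute("SELECT count(*) FROM numbers").fetchone()[0])  # 13
```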

```python
pd.read_sql("select * from numbers", conn)
```

|   | id | number |
|---|----|--------|
| 0 | 0 | 1 |
| 1 | 1 | 2 |
| 2 | 2 | 5 |
| 3 | 3 | 6 |
| 4 | 4 | 7 |
| 5 | 5 | 8 |
| 6 | 6 | 9 |
| 7 | 7 | 10 |
| 8 | 8 | 12 |
| 9 | 9 | 13 |


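The `numbers` table holds 13 values, of which only the first ten rows are displayed above: 1, 2, 5 through 10, 12 through 15, and 20. They form 4 islands, and we need to write a SQL query that identifies the start and end of each one.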

This is what the end result looks like.

<img src="./island-answer.png" width="40%">
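To pin down what counts as an island, here is a plain-Python sketch (not part of the original lesson) that scans the sorted values once and opens a new island at every gap. The `find_islands` name is hypothetical, and the input list is the set of values implied by the final output.

```python
def find_islands(values):
    """Return (start, end) pairs for each run of consecutive integers.

    A new island starts whenever a value is more than one greater than the
    end of the current island (values are visited in sorted order).
    """
    islands = []
    for n in sorted(values):
        if islands and n <= islands[-1][1] + 1:
            # n repeats or extends the current island
            islands[-1][1] = n
        else:
            # first value, or a gap: open a new island
            islands.append([n, n])
    return [tuple(pair) for pair in islands]


print(find_islands([1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20]))
# [(1, 2), (5, 10), (12, 15), (20, 20)]
```

This one-pass loop is the procedural version of the goal; the rest of the lesson reaches the same answer declaratively in SQL.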

Ok, let's get started.

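### Our first steps

First, number the rows in ascending order of `number` with the `ROW_NUMBER()` window function (SQLite supports window functions from version 3.25 onward).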

```python
query = """SELECT number
    ,row_number() OVER (
        ORDER BY number
        ) AS row_num
FROM numbers t"""

pd.read_sql(query, conn)
```

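|   | number | row_num |
|---|--------|---------|
| 0 | 1 | 1 |
| 1 | 2 | 2 |
| 2 | 5 | 3 |
| 3 | 6 | 4 |
| 4 | 7 | 5 |
| 5 | 8 | 6 |
| 6 | 9 | 7 |
| 7 | 10 | 8 |
| 8 | 12 | 9 |
| 9 | 13 | 10 |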


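* Difference group: subtract the row number from the value itself. Inside a run of consecutive numbers, both `number` and `row_num` go up by 1 per row, so `number - row_num` stays constant; at a gap, `number` jumps by more than 1 while `row_num` still rises by only 1, so the difference changes. Each island therefore gets its own `diff_group` value.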

```python
query = """SELECT number
    ,row_number() OVER (
        ORDER BY number
        ) AS row_num
    ,number - row_number() OVER (
        ORDER BY number
        ) AS diff_group
FROM numbers t"""

pd.read_sql(query, conn)
```

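|   | number | row_num | diff_group |
|---|--------|---------|------------|
| 0 | 1 | 1 | 0 |
| 1 | 2 | 2 | 0 |
| 2 | 5 | 3 | 2 |
| 3 | 6 | 4 | 2 |
| 4 | 7 | 5 | 2 |
| 5 | 8 | 6 | 2 |
| 6 | 9 | 7 | 2 |
| 7 | 10 | 8 | 2 |
| 8 | 12 | 9 | 3 |
| 9 | 13 | 10 | 3 |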
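Python has a well-known idiom for the same trick: `itertools.groupby` over the enumerated, sorted values, keyed on value minus position. The sketch below (hypothetical helper `diff_groups`, run on the 13 values implied by the lesson's final output) reproduces the `diff_group` column, including the rows for 14, 15 and 20 that the listing above does not show.

```python
from itertools import groupby


def diff_groups(values):
    """Group sorted values by number - row_number, mirroring diff_group."""
    numbered = enumerate(sorted(values), start=1)  # (row_num, number) pairs
    # Within a run of consecutive numbers, number and row_num both rise by 1,
    # so their difference stays constant; groupby collects adjacent equal keys.
    return {
        key: [n for _, n in grp]
        for key, grp in groupby(numbered, key=lambda pair: pair[1] - pair[0])
    }


print(diff_groups([1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20]))
# {0: [1, 2], 2: [5, 6, 7, 8, 9, 10], 3: [12, 13, 14, 15], 7: [20]}
```

Taking the first and last element of each group gives the island boundaries, which is exactly what `min` and `max` do in SQL.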


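Finally, we wrap this query in a CTE and group by `diff_group`: within each group, the smallest and largest `number` are the start and end of an island. `ORDER BY island_start` is needed because `GROUP BY` alone does not guarantee the order of the output rows.

```python
query = """WITH island_analysis
AS (
    SELECT number
        ,row_number() OVER (
            ORDER BY number
            ) AS row_num
        ,number - row_number() OVER (
            ORDER BY number
            ) AS diff_group
    FROM numbers t
    )
SELECT min(number) AS island_start
    ,max(number) AS island_end
FROM island_analysis i
GROUP BY diff_group
ORDER BY island_start;
"""

pd.read_sql(query, conn)
```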

|   | island_start | island_end |
|---|--------------|------------|
| 0 | 1 | 2 |
| 1 | 5 | 10 |
| 2 | 12 | 15 |
| 3 | 20 | 20 |
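For readers without the CSV or pandas, here is a self-contained sketch of the whole pipeline using only Python's standard `sqlite3` module. It assumes the input values shown (inferred from the output above), drops the unused `row_num` column, and sorts with `ORDER BY island_start`; `islands_from_sqlite` is a hypothetical helper, and the bundled SQLite must be version 3.25 or later for `row_number()` to work.

```python
import sqlite3

# Island query: number - row_number() is constant within each island.
ISLANDS_SQL = """
WITH island_analysis AS (
    SELECT number
        ,number - row_number() OVER (ORDER BY number) AS diff_group
    FROM numbers
)
SELECT min(number) AS island_start
    ,max(number) AS island_end
FROM island_analysis
GROUP BY diff_group
ORDER BY island_start
"""


def islands_from_sqlite(values):
    """Load values into an in-memory 'numbers' table and run ISLANDS_SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE numbers (number INTEGER)")
        conn.executemany("INSERT INTO numbers (number) VALUES (?)",
                         [(v,) for v in values])
        return conn.execute(ISLANDS_SQL).fetchall()
    finally:
        conn.close()


print(islands_from_sqlite([1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 20]))
# [(1, 2), (5, 10), (12, 15), (20, 20)]
```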
