# Reboot - SQL Advanced

Tonight, we will use a Blog SQLite database:

In [7]:
!tree

.
├── data
│   ├── blog.sqlite
│   ├── ecommerce.sqlite
│   ├── exploitable_db.sqlite
│   └── students.sqlite
├── exploit.py
├── recap-correction.ipynb
└── recap.ipynb

1 directory, 7 files


## 1. Schema ERD

❓ Open `data/blog.sqlite` in DBeaver, explore the schema, and draw it on [kitt.lewagon.com/db](https://kitt.lewagon.com/db).

_TODO: Double click this cell and **paste** a screenshot of the schema for future reference_.

---
## 2. Most liked posts

Complete the code to get **the 3 most liked posts**:

In [11]:
import sqlite3

conn = sqlite3.connect("data/blog.sqlite")
c = conn.cursor()

# TODO: write the query
query = """
    SELECT posts.title, posts.content, COUNT(posts.id)
    FROM posts
    JOIN likes ON likes.post_id = posts.id
    GROUP BY posts.id
    ORDER BY COUNT(posts.id) DESC
    LIMIT 3
"""

# TODO: Execute the query
c.execute(query)
rows = c.fetchall()

# TODO: Fetch and print the results
rows

[('Half imagine another.',
  'Nice career practice image. Modern son per share painting successful on.',
  84),
 ('Side foot leader popular.',
  'Relate parent run public choice allow. Establish single far Congress impact course offer.',
  82),
 ('Area paper whatever mean.',
  'Space whose often computer. Yard account stuff section write store somebody. Coach none blue skin finish any.',
  81)]

---

### Pretty Print using _pandas_

The readability of our `print()` output is not great.

Next week, we will introduce [pandas](https://pandas.pydata.org/), which will greatly improve the experience of exploring data in notebooks.

Execute the following cell:

In [12]:
import pandas as pd



Then run the previous `query` again, delegating the job of fetching and displaying the results to `pandas`:

In [13]:
pd.read_sql_query(query, conn)

| | title | content | COUNT(posts.id) |
|---|---|---|---|
| 0 | Half imagine another. | Nice career practice image. Modern son per sha... | 84 |
| 1 | Side foot leader popular. | Relate parent run public choice allow. Establi... | 82 |
| 2 | Area paper whatever mean. | Space whose often computer. Yard account stuff... | 81 |


---
## 3. Find the three users who 'liked' the most

In [19]:
query = """
    SELECT users.first_name, users.last_name, COUNT(likes.id)
    FROM users
    JOIN likes ON users.id = likes.user_id
    GROUP BY users.id
    ORDER BY COUNT(likes.id) DESC
    LIMIT 3
"""

pd.read_sql_query(query, conn)

| | first_name | last_name | COUNT(likes.id) |
|---|---|---|---|
| 0 | Michael | Allen | 236 |
| 1 | Donna | Ramirez | 233 |
| 2 | Barbara | Hurst | 227 |


---
## 4. Find the most liked author

In [29]:
pd.read_sql_query("""
    SELECT users.first_name, users.last_name, posts.title, COUNT(likes.id) likes_count
    FROM users
    JOIN posts ON users.id = posts.user_id
    JOIN likes ON posts.id = likes.post_id
    GROUP BY users.id
    ORDER BY likes_count DESC
    LIMIT 1
""", conn)

| | first_name | last_name | title | likes_count |
|---|---|---|---|---|
| 0 | Teresa | Moore | Still relationship rock surface son wait. | 647 |
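
Because the query groups by author, `posts.title` in the result is just one of that author's titles, picked by SQLite, and not a "most liked post". The sketch below drops that column and runs the same aggregation on a tiny in-memory database. The table layout and the sample rows are assumptions made for illustration (only the columns used in the queries above are recreated); they are not taken from `blog.sqlite`.

```python
import sqlite3

# Hypothetical in-memory tables with only the columns the queries above rely on
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, user_id INTEGER);
    CREATE TABLE likes (id INTEGER PRIMARY KEY, user_id INTEGER, post_id INTEGER);
    INSERT INTO users VALUES (1, 'Ada', 'Lovelace'), (2, 'Alan', 'Turing');
    INSERT INTO posts VALUES (10, 'First', 1), (11, 'Second', 1), (12, 'Third', 2);
    INSERT INTO likes VALUES (100, 2, 10), (101, 2, 11), (102, 1, 12);
""")

# One row per author: likes are summed over all of the author's posts
most_liked_author = demo.execute("""
    SELECT users.first_name, users.last_name, COUNT(likes.id) AS likes_count
    FROM users
    JOIN posts ON users.id = posts.user_id
    JOIN likes ON posts.id = likes.post_id
    GROUP BY users.id
    ORDER BY likes_count DESC
    LIMIT 1
""").fetchall()

print(most_liked_author)
```

In this sample, Ada wrote two posts that received one like each, so she comes out ahead of Alan with two likes in total.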


---
## 5. Who are the authors of the 3 most liked posts?

In [32]:
pd.read_sql_query("""
    SELECT users.first_name, users.last_name, posts.title, COUNT(likes.id) likes_count
    FROM users
    JOIN posts ON users.id = posts.user_id
    JOIN likes ON posts.id = likes.post_id
    GROUP BY posts.id
    ORDER BY likes_count DESC
    LIMIT 3
""", conn)

| | first_name | last_name | title | likes_count |
|---|---|---|---|---|
| 0 | Melissa | Henry | Half imagine another. | 84 |
| 1 | Cynthia | Raymond | Side foot leader popular. | 82 |
| 2 | Alexander | Cook | Area paper whatever mean. | 81 |


---
## 6. How many people liked at least one post?

In [41]:
pd.read_sql_query("""
    WITH likes_info AS (
    SELECT *, COUNT(likes.id) likes_count
    FROM users
    JOIN likes ON users.id = likes.user_id
    GROUP BY users.id
    HAVING COUNT(likes.id) >= 1
    )
    
    SELECT COUNT(likes_count)
    FROM likes_info
""", conn)

| | COUNT(likes_count) |
|---|---|
| 0 | 49 |
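
Since every row in `likes` already belongs to a user who liked something, the same question can probably be answered without a join or a `HAVING` clause, by counting distinct `user_id` values. The sketch below shows the idea on a hypothetical in-memory `likes` table; the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical likes table: users 1 and 3 liked two posts each, user 2 liked one
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE likes (id INTEGER PRIMARY KEY, user_id INTEGER, post_id INTEGER);
    INSERT INTO likes (user_id, post_id)
    VALUES (1, 10), (1, 11), (2, 10), (3, 12), (3, 10);
""")

# Each user is counted once, however many posts they liked
(active_likers,) = demo.execute(
    "SELECT COUNT(DISTINCT user_id) FROM likes"
).fetchone()

print(active_likers)
```

On the real database, this approach assumes that every `likes.user_id` refers to an existing user.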


---
## 7. Compute the cumulative number of likes per day

In [45]:
pd.read_sql_query("""
    WITH likes_per_days AS (
        -- Alias the count so the outer query can reference it
        SELECT COUNT(likes.id) AS like_per_day, likes.created_at
        FROM likes
        GROUP BY likes.created_at
    )
    SELECT
        likes_per_days.like_per_day,
        likes_per_days.created_at,
        SUM(likes_per_days.like_per_day) OVER (
            ORDER BY likes_per_days.created_at
        ) AS cumulative_like_per_day
    FROM likes_per_days
""", conn)
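
A cumulative total of this kind can be sketched on a tiny in-memory table: count the likes per day in a `WITH` clause, then apply `SUM(...) OVER (ORDER BY ...)` to get the running total. The table and its rows are hypothetical, `created_at` is assumed to hold a plain date without a time component, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical likes table; created_at is assumed to store dates only
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE likes (id INTEGER PRIMARY KEY, created_at TEXT);
    INSERT INTO likes (created_at) VALUES
        ('2024-01-01'), ('2024-01-01'), ('2024-01-02'),
        ('2024-01-04'), ('2024-01-04'), ('2024-01-04');
""")

cumulative = demo.execute("""
    WITH likes_per_day AS (
        SELECT created_at, COUNT(id) AS like_count
        FROM likes
        GROUP BY created_at
    )
    SELECT
        created_at,
        like_count,
        -- Running total: sums every day up to and including the current one
        SUM(like_count) OVER (ORDER BY created_at) AS cumulative_likes
    FROM likes_per_day
    ORDER BY created_at
""").fetchall()

for day, like_count, running_total in cumulative:
    print(day, like_count, running_total)
```

The sample yields daily counts of 2, 1, and 3, and running totals of 2, 3, and 6.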

---
## 8. (Optional) Who's the biggest fan of each author?

The biggest fan of an author is defined as the user who liked the most of that author's posts.
<br><br>
<details>
    <summary>💡 Click for Hint</summary>
    You might need to use <code>WITH</code>
</details>


In [None]:
pd.read_sql_query("""
TODO: Write the SQL query
""", conn)
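
One possible approach, sketched here on a hypothetical in-memory database and not run against `blog.sqlite`: count likes per (author, fan) pair in a first `WITH` step, rank the fans of each author with `ROW_NUMBER()`, and keep the top-ranked row. The schema and sample rows are assumptions, ties are broken arbitrarily by the lowest fan id, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical data: Ada wrote posts 10 and 11, Alan wrote posts 12 and 13
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE likes (id INTEGER PRIMARY KEY, user_id INTEGER, post_id INTEGER);
    INSERT INTO users VALUES (1, 'Ada'), (2, 'Alan'), (3, 'Grace');
    INSERT INTO posts VALUES (10, 1), (11, 1), (12, 2), (13, 2);
    INSERT INTO likes (user_id, post_id) VALUES
        (2, 10), (3, 10), (3, 11),
        (1, 12), (1, 13), (3, 12);
""")

biggest_fans = demo.execute("""
    WITH fan_counts AS (
        -- Number of likes each fan gave to each author's posts
        SELECT posts.user_id AS author_id,
               likes.user_id AS fan_id,
               COUNT(likes.id) AS likes_count
        FROM posts
        JOIN likes ON likes.post_id = posts.id
        GROUP BY posts.user_id, likes.user_id
    ),
    ranked AS (
        -- Rank fans within each author; ties go to the lowest fan id
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY author_id
                   ORDER BY likes_count DESC, fan_id
               ) AS fan_rank
        FROM fan_counts
    )
    SELECT authors.first_name, fans.first_name, ranked.likes_count
    FROM ranked
    JOIN users AS authors ON authors.id = ranked.author_id
    JOIN users AS fans ON fans.id = ranked.fan_id
    WHERE ranked.fan_rank = 1
    ORDER BY authors.id
""").fetchall()

print(biggest_fans)
```

In the sample, Grace liked both of Ada's posts and Ada liked both of Alan's, so each author's biggest fan has two likes. Grace wrote no posts and therefore does not appear as an author.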