### FIRST STEPS WITH DATABRICKS: Lakehouse SQL Exercises
#### This is a hands-on exercise. It is simple and straightforward: fill in the gaps (_____), then run your solution to check whether you are right.


**Author** : TheDataLead AI Databricks Workshop

**Description** : Medallion architecture exercises using SQL (Bronze → Silver → Gold)

#### Run this to list the files in the S3 bucket

```sql
%sql

LIST 's3://thedatalead-data-engineering-projects-ingestion/workshop-demo/'
```

#### Run this to select the catalog and schema

```sql
%sql

USE CATALOG demo_catalog;
USE SCHEMA library_schema;
```

### Bronze Layer: Books
- Create the books_bronze table to ingest the raw books data.
- Set path to the books file in the S3 bucket listed above.


```sql
%sql

DROP TABLE IF EXISTS books_bronze;
CREATE ______
USING CSV
OPTIONS (
  path = '________',
  header = 'true',
  inferSchema = 'true'
);
```
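The two CSV options above decide how the raw file becomes a table: header = 'true' takes column names from the first row, and inferSchema = 'true' scans the values to choose column types. The sketch below illustrates both ideas in plain Python. It is a toy, not Spark's reader: the sample rows are invented (only the column names come from the silver query), and the hypothetical infer_type helper recognises just INTEGER and STRING, while Spark's inference supports more types such as double, boolean and timestamp.

```python
import csv
import io

# Hypothetical rows shaped like the books file; the column names come from the
# silver query in this notebook, the values are invented for illustration.
SAMPLE_CSV = """isbn,title,author,genre,publish_date,pages
9780000000001,Book A,Author A,Fiction,2001-05-14,320
9780000000002,Book B,,Science,,210
"""


def infer_type(values):
    """Toy inference: INTEGER if every non-empty value parses as an int,
    otherwise STRING. Spark's real inferSchema knows many more types."""
    non_empty = [v for v in values if v != ""]  # empty fields act like NULLs
    if not non_empty:
        return "STRING"
    for value in non_empty:
        try:
            int(value)
        except ValueError:
            return "STRING"
    return "INTEGER"


def read_csv_with_header(text):
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # header = 'true': first row supplies column names
    rows = list(reader)    # remaining rows are data
    columns = zip(*rows)   # transpose so each column can be inspected
    schema = {name: infer_type(col) for name, col in zip(header, columns)}
    return rows, schema


rows, schema = read_csv_with_header(SAMPLE_CSV)
print(schema)
```

One side effect worth checking in the real table: if the ISBN column contains only digits, inference treats it as a number (Spark picks a numeric type rather than a string), which drops any leading zeros. If that matters, supplying an explicit schema is safer than inferSchema.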

### Bronze Layer: Borrowers
- Drop the table if it already exists.
- Make sure to provide the correct file format.


```sql
%sql

DROP TABLE IF EXISTS __________;
CREATE TABLE borrowers_bronze
USING ____
OPTIONS (
  path 's3://thedatalead-data-engineering-projects-ingestion/workshop-demo/borrowers.csv',
  header 'true',
  inferSchema 'true'
);
```
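Dropping the table first is what makes the cell safe to re-run: a plain CREATE TABLE fails if the table is already there. The sketch below shows that behaviour locally with Python's built-in sqlite3 module as a stand-in for Databricks. The two-column schema and the rebuild_borrowers_bronze helper are hypothetical; only the DROP-then-CREATE pattern carries over.

```python
import sqlite3

# Local stand-in: an in-memory SQLite database instead of Databricks.
conn = sqlite3.connect(":memory:")


def rebuild_borrowers_bronze(conn):
    # Same pattern as the notebook cell: drop first, then create.
    conn.execute("DROP TABLE IF EXISTS borrowers_bronze")
    conn.execute("CREATE TABLE borrowers_bronze (user_id TEXT, book_isbn TEXT)")


rebuild_borrowers_bronze(conn)
rebuild_borrowers_bronze(conn)  # re-running succeeds because the old table is dropped first

# Without the DROP, a second CREATE fails because the table already exists.
create_without_drop_failed = False
try:
    conn.execute("CREATE TABLE borrowers_bronze (user_id TEXT, book_isbn TEXT)")
except sqlite3.OperationalError:
    create_without_drop_failed = True

print(create_without_drop_failed)  # True
```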

### Silver Layer -> Clean and Standardize Data


#### Your task is to clean the books_bronze table by handling NULL values and casting publish_date to DATE. Add the missing COALESCE defaults where needed.
- Format: COALESCE(________, 'unknown') AS ______,
- Format: CAST(________ AS DATE) AS ______,


```sql
%sql

CREATE OR REPLACE TABLE books_silver AS
SELECT
  isbn,
  title,
  author,
  genre,
  CAST(publish_date AS DATE) AS publish_date,
  CAST(pages AS INT) AS pages
FROM books_bronze;
```
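To see what COALESCE and a numeric CAST do to NULLs and text values, here is a local sketch that runs the same kind of SELECT through Python's built-in sqlite3 module as a stand-in for Databricks SQL. The two sample rows are invented. SQLite has no DATE type, so the CAST(... AS DATE) step is left out here; in Databricks that cast works as written in the notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books_bronze (isbn TEXT, title TEXT, author TEXT, genre TEXT, pages TEXT)"
)
conn.executemany(
    "INSERT INTO books_bronze VALUES (?, ?, ?, ?, ?)",
    [
        ("111", "Book A", "Author A", "Fiction", "320"),
        ("222", "Book B", None, None, "210"),  # NULL author and genre
    ],
)

rows = conn.execute(
    """
    SELECT
      isbn,
      title,
      COALESCE(author, 'unknown') AS author,  -- NULL becomes 'unknown'
      COALESCE(genre, 'unknown') AS genre,
      CAST(pages AS INTEGER) AS pages         -- text '320' becomes integer 320
    FROM books_bronze
    ORDER BY isbn
    """
).fetchall()
print(rows)
# [('111', 'Book A', 'Author A', 'Fiction', 320), ('222', 'Book B', 'unknown', 'unknown', 210)]
```

COALESCE returns its first non-NULL argument, so non-NULL values pass through unchanged and only the missing ones receive the default.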

#### Clean the borrowers table and compute return_delay_days
##### Your task is to clean the borrowers_bronze table by handling NULL values, casting borrow_date and return_date to DATE, and computing the number of days between them.
- COALESCE(________, 'unknown') AS ______,
- CAST(________ AS DATE) AS ______,
- DATEDIFF(end_date, start_date) returns the number of days from start_date to end_date


```sql
%sql

CREATE OR REPLACE TABLE borrowers_silver AS
SELECT
  ______(user_id, 'unknown') AS user_id,
  COALESCE(name, 'anonymous') AS _____,
  COALESCE(book_isbn, 'unknown') AS book_isbn,
  CAST(_____(borrow_date, '2000-01-01') AS DATE) AS borrow_date,
  CAST(______(return_date, current_date()) AS DATE) AS return_date,
  ______(
    CAST(COALESCE(return_date, current_date()) AS DATE), CAST(COALESCE(borrow_date, '2000-01-01') AS DATE)
  ) AS return_delay_days
FROM borrowers_bronze;
```
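The return_delay_days expression combines three steps: fill a missing borrow_date with 2000-01-01, fill a missing return_date with today's date, then count the days from the borrow date to the return date. The plain-Python sketch below mirrors that arithmetic with the datetime module so you can check expected values by hand. The return_delay_days helper and the fixed today value are hypothetical; today stands in for current_date() so the results are reproducible.

```python
from datetime import date

BORROW_DEFAULT = date(2000, 1, 1)  # same default as the notebook's COALESCE


def return_delay_days(borrow_date, return_date, today):
    """Mirror of DATEDIFF(COALESCE(return_date, current_date()),
    COALESCE(borrow_date, '2000-01-01')): days from start to end."""
    start = borrow_date if borrow_date is not None else BORROW_DEFAULT
    end = return_date if return_date is not None else today
    return (end - start).days


today = date(2024, 3, 1)  # hypothetical fixed stand-in for current_date()

print(return_delay_days(date(2024, 1, 10), date(2024, 1, 24), today))  # 14
print(return_delay_days(date(2024, 2, 20), None, today))               # 10 (2024 is a leap year)
```

Two consequences of these defaults are worth keeping in mind. A book that has not been returned gets a delay that grows every day the table is rebuilt, because current_date() changes. A NULL borrow_date falls back to 2000-01-01, so such rows produce very large delays that can skew the genre averages computed in the gold layer.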

### Gold Layer -> Business Analytics

#### Most Borrowed Books
- Your task is to create most_borrowed_books_gold by joining the borrowers_silver and books_silver tables.

```sql
%sql

CREATE OR REPLACE TABLE most_borrowed_books_gold AS
SELECT
  b.title,
  COUNT(*) AS borrow_count
FROM borrowers_silver br
____ books_silver b ON br.book_isbn = b.isbn
GROUP BY b.title
ORDER BY borrow_count DESC;
```
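The join-then-count pattern can be tried locally on a handful of rows. This sketch uses Python's built-in sqlite3 module as a stand-in for Databricks SQL and an inner JOIN; the sample books and borrow records are invented. It also shows one effect of the silver-layer defaults: a borrow whose book_isbn was filled with 'unknown' matches no book, so the inner join drops it from the counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books_silver (isbn TEXT, title TEXT, genre TEXT);
    CREATE TABLE borrowers_silver (user_id TEXT, book_isbn TEXT, return_delay_days INTEGER);
    INSERT INTO books_silver VALUES ('111', 'Book A', 'Fiction'), ('222', 'Book B', 'Science');
    INSERT INTO borrowers_silver VALUES
      ('u1', '111', 14),
      ('u2', '111', 20),
      ('u3', '222', 7),
      -- 'unknown' matches no book, so the inner join drops this row
      ('u4', 'unknown', 30);
    """
)

most_borrowed = conn.execute(
    """
    SELECT b.title, COUNT(*) AS borrow_count
    FROM borrowers_silver br
    JOIN books_silver b ON br.book_isbn = b.isbn
    GROUP BY b.title
    ORDER BY borrow_count DESC
    """
).fetchall()
print(most_borrowed)  # [('Book A', 2), ('Book B', 1)]
```

If you wanted unmatched borrows to stay visible, a LEFT JOIN would keep them with a NULL title instead of dropping them.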

#### Average delay by genre
- Your task is to create delay_by_genre_gold by joining the borrowers_silver and books_silver tables.

```sql
%sql

CREATE OR REPLACE TABLE delay_by_genre_gold AS
SELECT
  b.genre,
  ROUND(AVG(br.return_delay_days), 2) AS avg_return_delay_days
FROM borrowers_silver br
____ books_silver b ON br.book_isbn = b.isbn
GROUP BY b.genre
ORDER BY avg_return_delay_days DESC;
```
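Here the aggregation averages the delays within each genre and then rounds to two decimal places. A local sketch with Python's built-in sqlite3 module (standing in for Databricks SQL, with invented sample rows) makes the arithmetic easy to check by hand: Fiction averages (14 + 20 + 21) / 3 = 18.333..., which rounds to 18.33, and Science averages (7 + 10) / 2 = 8.5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books_silver (isbn TEXT, genre TEXT);
    CREATE TABLE borrowers_silver (book_isbn TEXT, return_delay_days INTEGER);
    INSERT INTO books_silver VALUES ('111', 'Fiction'), ('222', 'Science');
    INSERT INTO borrowers_silver VALUES ('111', 14), ('111', 20), ('111', 21), ('222', 7), ('222', 10);
    """
)

delay_by_genre = conn.execute(
    """
    SELECT b.genre, ROUND(AVG(br.return_delay_days), 2) AS avg_return_delay_days
    FROM borrowers_silver br
    JOIN books_silver b ON br.book_isbn = b.isbn
    GROUP BY b.genre
    ORDER BY avg_return_delay_days DESC
    """
).fetchall()

# Fiction: (14 + 20 + 21) / 3 rounds to 18.33; Science: (7 + 10) / 2 = 8.5
for genre, avg_delay in delay_by_genre:
    print(genre, avg_delay)
```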