Gold Layer - Business Analytics

In this layer we produce business analytics by joining and aggregating the cleaned Silver data:
  - Most borrowed books
  - Average return delay per genre
  - Staff count by role
  
**Think of this as:** The library's monthly reports that help make decisions about which books to buy more of.

### We are importing the PySpark functions needed for this task

In [0]:
from pyspark.sql.functions import count, avg, desc


### We are loading the Silver tables into DataFrames.

In [0]:

books_silver = spark.table("books_silver")
borrowers_silver = spark.table("borrowers_silver")
staff_silver = spark.table("staff_silver")
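Before building the Gold tables, a quick check that each Silver table exposes the columns used later in this notebook can catch schema drift early. This is a minimal sketch, not part of the original pipeline: `REQUIRED_COLUMNS` and `missing_columns` are hypothetical names, the expected column lists are taken from the queries in this notebook, and the sample column lists are made up so the sketch runs without Spark. In the notebook you would pass a DataFrame's `.columns` attribute, which PySpark returns as a list of column-name strings.

```python
from typing import Dict, Iterable, List

# Hypothetical: columns each Silver table must expose, taken from the
# column names the Gold queries in this notebook reference.
REQUIRED_COLUMNS: Dict[str, List[str]] = {
    "books_silver": ["isbn", "title", "author", "genre"],
    "borrowers_silver": ["book_isbn", "return_delay_days"],
    "staff_silver": ["role"],
}


def missing_columns(table: str, actual_columns: Iterable[str]) -> List[str]:
    """Return the required columns for `table` that are absent from `actual_columns`."""
    present = set(actual_columns)
    return [c for c in REQUIRED_COLUMNS[table] if c not in present]


# In the notebook: missing_columns("books_silver", books_silver.columns)
# Here, hypothetical column lists stand in for the real schemas.
print(missing_columns("books_silver", ["isbn", "title", "author", "genre", "year"]))  # → []
print(missing_columns("borrowers_silver", ["book_isbn", "borrower_id"]))  # → ['return_delay_days']
```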


# Most borrowed books
#### This statement counts how many times each book was borrowed (from borrowers_silver) and joins the counts with books_silver to add each book's title, author, and genre. The next cell saves the result as most_borrowed_books_gold.

In [0]:
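# Count borrow records per ISBN
borrow_counts = (borrowers_silver.groupBy("book_isbn")
    .agg(count("*").alias("borrow_count"))
)

# Join the counts with the book details and sort, most borrowed first
most_borrowed_books = (borrow_counts
    .join(books_silver, borrow_counts.book_isbn == books_silver.isbn)
    .select("title", "author", "genre", "borrow_count")
    .orderBy(desc("borrow_count"))
)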
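The query above counts rows per `book_isbn`, inner-joins the counts to `books_silver` on `isbn`, and sorts by count. The plain-Python sketch below mirrors those steps on hypothetical sample rows (the ISBNs, titles, authors, and genres are made up) so the logic can be checked without a cluster. One deliberate difference: it breaks ties on `title` to get a deterministic order, whereas `orderBy(desc("borrow_count"))` alone leaves the relative order of tied books unspecified. It also shows that an inner join silently drops borrow records whose ISBN has no matching book.

```python
from collections import Counter

# Hypothetical sample rows standing in for borrowers_silver and books_silver.
borrowers = [
    {"book_isbn": "111"}, {"book_isbn": "222"}, {"book_isbn": "111"},
    {"book_isbn": "333"}, {"book_isbn": "222"}, {"book_isbn": "111"},
    {"book_isbn": "333"}, {"book_isbn": "999"},  # "999" has no matching book
]
books = [
    {"isbn": "111", "title": "Title A", "author": "Author X", "genre": "Fiction"},
    {"isbn": "222", "title": "Title C", "author": "Author Y", "genre": "Science"},
    {"isbn": "333", "title": "Title B", "author": "Author Z", "genre": "History"},
]


def most_borrowed(borrowers, books):
    # groupBy("book_isbn").agg(count("*")): number of borrow rows per ISBN
    counts = Counter(row["book_isbn"] for row in borrowers)
    books_by_isbn = {b["isbn"]: b for b in books}
    # Inner join on book_isbn == isbn: ISBNs with no matching book are dropped
    joined = [
        (books_by_isbn[isbn]["title"], books_by_isbn[isbn]["author"],
         books_by_isbn[isbn]["genre"], n)
        for isbn, n in counts.items()
        if isbn in books_by_isbn
    ]
    # orderBy(desc("borrow_count")), with title as an explicit tie-breaker
    return sorted(joined, key=lambda r: (-r[3], r[0]))


for row in most_borrowed(borrowers, books):
    print(row)
```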


In [0]:
most_borrowed_books.write.mode("overwrite").format("delta").saveAsTable("most_borrowed_books_gold")

#### We are viewing the table to have a look at our data, limiting the output to 10 rows.

In [0]:
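# Read back the saved Gold table; re-apply the sort, since rows read from a table are not guaranteed to come back in the order they were written
most_borrowed_books_gold_df = spark.table("most_borrowed_books_gold").orderBy(desc("borrow_count")).limit(10)
most_borrowed_books_gold_df.show()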

# Average return delay per genre
#### This statement joins borrowers_silver with books_silver and computes the average return delay (return_delay_days) for each genre. The next cell saves the result as delay_by_genre_gold.

In [0]:

delay_by_genre = (borrowers_silver
    .join(books_silver, borrowers_silver.book_isbn == books_silver.isbn)
    .groupBy("genre")
    .agg(avg("return_delay_days").alias("avg_return_delay_days"))
    .orderBy(desc("avg_return_delay_days"))
)
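A plain-Python sketch of the same join and per-genre average, on hypothetical rows, makes the null handling explicit. It follows Spark's documented semantics: `avg` ignores null values (`None` here) and returns null for a group in which every value is null, and a descending sort places nulls last by default. What a null `return_delay_days` means (for example, a book not yet returned) depends on how the Silver layer filled that column, which this section does not show.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for books_silver and borrowers_silver.
books = [
    {"isbn": "111", "genre": "Fiction"},
    {"isbn": "222", "genre": "Science"},
    {"isbn": "333", "genre": "History"},
]
borrowers = [
    {"book_isbn": "111", "return_delay_days": 4},
    {"book_isbn": "111", "return_delay_days": 2},
    {"book_isbn": "111", "return_delay_days": None},  # null: skipped by avg
    {"book_isbn": "222", "return_delay_days": 5},
    {"book_isbn": "333", "return_delay_days": None},  # only value is null
    {"book_isbn": "999", "return_delay_days": 10},    # no matching book
]


def avg_delay_by_genre(borrowers, books):
    genre_by_isbn = {b["isbn"]: b["genre"] for b in books}
    delays = defaultdict(list)
    for row in borrowers:
        genre = genre_by_isbn.get(row["book_isbn"])
        if genre is None:
            continue  # inner join: borrows with no matching book are dropped
        delays[genre].append(row["return_delay_days"])

    result = {}
    for genre, values in delays.items():
        non_null = [v for v in values if v is not None]  # avg() ignores nulls
        result[genre] = sum(non_null) / len(non_null) if non_null else None

    # Descending by average, nulls last, genre name as a tie-breaker
    return sorted(result.items(), key=lambda kv: (kv[1] is None, -(kv[1] or 0), kv[0]))


print(avg_delay_by_genre(borrowers, books))
```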


In [0]:
delay_by_genre.write.mode("overwrite").format("delta").saveAsTable("delay_by_genre_gold")

#### We are viewing the table to have a look at our data, limiting the output to 10 rows.

In [0]:
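# Re-apply the sort, since reading a table back does not guarantee row order
delay_by_genre_gold_df = spark.table("delay_by_genre_gold").orderBy(desc("avg_return_delay_days")).limit(10)
delay_by_genre_gold_df.show()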

# Count of staff by role
#### This statement is going to create staff_count_by_role_gold from the staff_silver table.

In [0]:

staff_count_by_role = (
    staff_silver
    .groupBy("role")
    .agg(count("*").alias("staff_count"))
    .orderBy(desc("staff_count"))
)
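The staff query is a plain group-and-count, so `collections.Counter` reproduces it closely. A minimal sketch on hypothetical rows: `count("*")` counts rows, so a null `role` still forms its own group, which `Counter` mirrors with a `None` key. Unlike Spark, `Counter.most_common()` keeps tied roles in first-seen order, so the sketch's order is deterministic where the Spark result may not be.

```python
from collections import Counter

# Hypothetical sample rows standing in for staff_silver.
staff = [
    {"role": "Librarian"}, {"role": "Assistant"}, {"role": "Librarian"},
    {"role": None}, {"role": "Librarian"}, {"role": "Assistant"},
]


def staff_count_by_role(rows):
    # groupBy("role").agg(count("*")): one count per distinct role, None included
    counts = Counter(row["role"] for row in rows)
    # orderBy(desc("staff_count")); most_common() sorts by count, descending
    return counts.most_common()


print(staff_count_by_role(staff))
```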



In [0]:
staff_count_by_role.write.mode("overwrite").format("delta").saveAsTable("staff_count_by_role_gold")


#### We are viewing the table to have a look at our data, limiting the output to 10 rows.

In [0]:
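# Re-apply the sort, since reading a table back does not guarantee row order
staff_count_by_role_gold_df = spark.table("staff_count_by_role_gold").orderBy(desc("staff_count")).limit(10)
staff_count_by_role_gold_df.show()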