## Overview

This notebook shows how to create and query a table or DataFrame from files you have uploaded to DBFS. [DBFS](https://docs.databricks.com/user-guide/dbfs-databricks-file-system.html) (the Databricks File System) stores data that you can query from inside Databricks. The notebook assumes the files you want to read are already in DBFS.

This notebook is written in **Python**, so the default cell type is Python. You can switch languages per cell with the `%LANGUAGE` syntax (for example, `%sql`). Python, Scala, SQL, and R are all supported.

In [None]:
# File location and type
file_location_bookings = "/FileStore/tables/Bookings.csv"
file_location_facilities = "/FileStore/tables/Facilities.csv"
file_location_members = "/FileStore/tables/Members.csv"

file_type = "csv"

# CSV options
infer_schema = "true"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
bookings_df = (spark.read.format(file_type) 
                    .option("inferSchema", infer_schema) 
                    .option("header", first_row_is_header) 
                    .option("sep", delimiter) 
                    .load(file_location_bookings))

facilities_df = (spark.read.format(file_type) 
                      .option("inferSchema", infer_schema) 
                      .option("header", first_row_is_header) 
                      .option("sep", delimiter) 
                      .load(file_location_facilities))

members_df = (spark.read.format(file_type) 
                      .option("inferSchema", infer_schema) 
                      .option("header", first_row_is_header) 
                      .option("sep", delimiter) 
                      .load(file_location_members))
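
The three reads above differ only in the file path, so the shared options can be factored out. The following is a minimal sketch, not part of the original notebook: `CSV_OPTIONS`, `table_paths`, and `load_csv` are hypothetical names, and `load_csv` assumes the `spark` session that Databricks predefines in a notebook.

```python
# Hypothetical helpers; the names are illustrative, not from the original notebook.

# Reader options shared by all three CSV files (same values as in the cell above).
CSV_OPTIONS = {"inferSchema": "true", "header": "true", "sep": ","}


def table_paths(names, base="/FileStore/tables"):
    """Map each table name to its CSV path under the base directory."""
    return {name: f"{base}/{name}.csv" for name in names}


def load_csv(spark, path, options=None):
    """Read one CSV file into a DataFrame with the shared reader options."""
    opts = CSV_OPTIONS if options is None else options
    return spark.read.format("csv").options(**opts).load(path)


paths = table_paths(["Bookings", "Facilities", "Members"])
# In a Databricks notebook, where `spark` is predefined:
# bookings_df = load_csv(spark, paths["Bookings"])
```

Keeping the options in one dictionary means a change such as a different delimiter only has to be made once.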

In [None]:
# Inspect the inferred schema of each DataFrame

print('Bookings Schema')
bookings_df.printSchema()
print('Facilities Schema')
facilities_df.printSchema()
print('Members Schema')
members_df.printSchema()

# Optional: register a DataFrame as a temporary view so it can be queried with SQL
#temp_table_name = "bookings_csv"
#bookings_df.createOrReplaceTempView(temp_table_name)

Bookings Schema
root
 |-- bookid: integer (nullable = true)
 |-- facid: integer (nullable = true)
 |-- memid: integer (nullable = true)
 |-- starttime: timestamp (nullable = true)
 |-- slots: integer (nullable = true)

Facilities Schema
root
 |-- facid: integer (nullable = true)
 |-- name: string (nullable = true)
 |-- membercost: double (nullable = true)
 |-- guestcost: double (nullable = true)
 |-- initialoutlay: integer (nullable = true)
 |-- monthlymaintenance: integer (nullable = true)

Members Schema
root
 |-- memid: integer (nullable = true)
 |-- surname: string (nullable = true)
 |-- firstname: string (nullable = true)
 |-- address: string (nullable = true)
 |-- zipcode: integer (nullable = true)
 |-- telephone: string (nullable = true)
 |-- recommendedby: integer (nullable = true)
 |-- joindate: timestamp (nullable = true)



In [None]:
%sql 
drop database if exists country_club cascade;
create database country_club;
show databases;

databaseName
country_club
default


In [None]:
permanent_table_name_bookings = "country_club.Bookings"
bookings_df.write.format("parquet").saveAsTable(permanent_table_name_bookings)

permanent_table_name_facilities = "country_club.Facilities"
facilities_df.write.format("parquet").saveAsTable(permanent_table_name_facilities)

permanent_table_name_members = "country_club.Members"
members_df.write.format("parquet").saveAsTable(permanent_table_name_members)

In [None]:
%sql
use country_club;
REFRESH table bookings;
REFRESH table facilities;
REFRESH table members;
show tables;

database,tableName,isTemporary
country_club,bookings,False
country_club,facilities,False
country_club,members,False


In [None]:
%sql
SELECT * FROM bookings LIMIT 3

bookid,facid,memid,starttime,slots
0,3,1,2012-07-03T11:00:00.000+0000,2
1,4,1,2012-07-03T08:00:00.000+0000,2
2,6,0,2012-07-03T18:00:00.000+0000,2


In [None]:
#### Q1: Some of the facilities charge a fee to members, but some do not. Please list the names of the facilities that do.

In [None]:
%sql
SELECT name FROM facilities WHERE membercost > 0

name
Tennis Court 1
Tennis Court 2
Massage Room 1
Massage Room 2
Squash Court


In [None]:
#### Q2: How many facilities do not charge a fee to members?

In [None]:
%sql
SELECT count(*) FROM facilities WHERE membercost = 0

count(1)
4


In [None]:
#### Q3: How can you produce a list of facilities that charge a fee to members, where the fee is less than 20% of the facility's monthly maintenance cost?
Return the facid, facility name, member cost, and monthly maintenance of the facilities in question.

In [None]:
%sql
SELECT facid, name, membercost, monthlymaintenance
FROM facilities WHERE membercost > 0  AND membercost / monthlymaintenance < 0.2

facid,name,membercost,monthlymaintenance
0,Tennis Court 1,5.0,200
1,Tennis Court 2,5.0,200
4,Massage Room 1,9.9,3000
5,Massage Room 2,9.9,3000
6,Squash Court,3.5,80
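
All five fee-charging facilities pass the 20% test, which a quick calculation on the returned rows confirms. The sketch below is plain Python rather than Spark; the rows are copied from the output above, and the variable names are illustrative.

```python
# Rows copied from the query output above:
# (facid, name, membercost, monthlymaintenance)
rows = [
    (0, "Tennis Court 1", 5.0, 200),
    (1, "Tennis Court 2", 5.0, 200),
    (4, "Massage Room 1", 9.9, 3000),
    (5, "Massage Room 2", 9.9, 3000),
    (6, "Squash Court", 3.5, 80),
]

# Fee as a fraction of monthly maintenance, per facility.
ratios = {name: cost / maint for _, name, cost, maint in rows}

# Same predicate as the WHERE clause: a positive fee below 20% of maintenance.
passing = [name for _, name, cost, maint in rows if cost > 0 and cost / maint < 0.2]

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2%} of monthly maintenance")
```

The largest ratio belongs to the Squash Court, at roughly 4.4%, so the second condition does not exclude any facility in this dataset.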


In [None]:
#### Q4: How can you retrieve the details of facilities with ID 1 and 5? Write the query without using the OR operator.

In [None]:
%sql
SELECT * FROM facilities WHERE facid IN (1, 5)

facid,name,membercost,guestcost,initialoutlay,monthlymaintenance
1,Tennis Court 2,5.0,25.0,8000,200
5,Massage Room 2,9.9,80.0,4000,3000


In [None]:
#### Q5: How can you produce a list of facilities, with each labelled as 'cheap' or 'expensive', depending on whether their monthly maintenance cost is more than $100?
Return the name and monthly maintenance of the facilities in question.

In [None]:
%sql
SELECT name, monthlymaintenance, 
CASE WHEN monthlymaintenance > 100 THEN 'expensive'
     ELSE 'cheap' END AS label
FROM facilities

name,monthlymaintenance,label
Tennis Court 1,200,expensive
Tennis Court 2,200,expensive
Badminton Court,50,cheap
Table Tennis,10,cheap
Massage Room 1,3000,expensive
Massage Room 2,3000,expensive
Squash Court,80,cheap
Snooker Table,15,cheap
Pool Table,15,cheap
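
The `CASE` expression uses a strict comparison, so a facility costing exactly 100 would be labelled 'cheap'. A minimal Python sketch of the same rule follows; `maintenance_label` is a hypothetical helper, and the value 100 is a boundary case that does not appear in the data above.

```python
def maintenance_label(monthlymaintenance):
    """Mirror the CASE expression: strictly more than 100 is 'expensive'."""
    return "expensive" if monthlymaintenance > 100 else "cheap"


# 200 and 80 appear in the output above; 100 is an added boundary case.
for cost in (200, 80, 100):
    print(cost, maintenance_label(cost))
```

Unlike the SQL version, this helper does not handle missing values: in SQL a NULL cost falls through to the `ELSE` branch, whereas comparing `None` with a number raises an error in Python.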


In [None]:
#### Q6: You'd like to get the first and last name of the last member(s) who signed up. Do not use the LIMIT clause for your solution.

In [None]:
%sql
SELECT firstname, surname FROM members
WHERE joindate = ( SELECT MAX(joindate)  FROM members)

firstname,surname
Darren,Smith
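
Comparing against `MAX(joindate)` returns every member who shares the latest timestamp, which `LIMIT 1` would not. The sketch below checks that behaviour locally. It swaps in Python's built-in `sqlite3` for Spark SQL, and the three members are made-up rows, not data from the country club tables. An `ORDER BY` is added only to make the printed order deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for the Spark table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (firstname TEXT, surname TEXT, joindate TEXT)")

# Made-up rows: two members share the latest join timestamp.
conn.executemany(
    "INSERT INTO members VALUES (?, ?, ?)",
    [
        ("Ada", "Example", "2012-09-01 08:00:00"),
        ("Bob", "Sample", "2012-09-26 18:08:45"),
        ("Cy", "Sample", "2012-09-26 18:08:45"),
    ],
)

# Same shape as the notebook query: filter on the scalar subquery result.
latest = conn.execute(
    """
    SELECT firstname, surname FROM members
    WHERE joindate = (SELECT MAX(joindate) FROM members)
    ORDER BY firstname
    """
).fetchall()
print(latest)
conn.close()
```

Both members with the latest timestamp come back, so ties are preserved.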


In [None]:
#### Q7: How can you produce a list of all members who have used a tennis court?

In [None]:
%sql
SELECT sub.court, CONCAT( sub.firstname,  ' ', sub.surname ) AS name
    FROM (SELECT facilities.name AS court, members.firstname AS firstname, members.surname AS surname
        FROM bookings INNER JOIN facilities ON bookings.facid = facilities.facid
        AND facilities.name LIKE  'Tennis Court%'
        INNER JOIN members ON bookings.memid = members.memid ) sub
        GROUP BY sub.court, sub.firstname, sub.surname
ORDER BY name

court,name
Tennis Court 2,Anne Baker
Tennis Court 1,Anne Baker
Tennis Court 1,Burton Tracy
Tennis Court 2,Burton Tracy
Tennis Court 1,Charles Owen
Tennis Court 2,Charles Owen
Tennis Court 2,Darren Smith
Tennis Court 2,David Farrell
Tennis Court 1,David Farrell
Tennis Court 1,David Jones


In [None]:
#### Q8: How can you produce a list of bookings on the day of 2012-09-14 which will cost the member (or guest) more than $30?

Remember that guests have different costs to members (the listed costs are per half-hour 'slot').
The guest user's ID is always 0.

Include in your output the name of the facility, the name of the member formatted as a single column, and the cost.

Order by descending cost, and do not use any subqueries.



In [None]:
%sql
SELECT facilities.name AS facility, CONCAT( members.firstname,  ' ', members.surname ) AS name, CASE 
        WHEN bookings.memid =0 THEN facilities.guestcost * bookings.slots
        ELSE facilities.membercost * bookings.slots
    END AS cost
    FROM bookings INNER JOIN facilities 
        ON bookings.facid = facilities.facid
        AND bookings.starttime LIKE  '2012-09-14%'
        AND (((bookings.memid =0) AND (facilities.guestcost * bookings.slots >30))
        OR ((bookings.memid !=0) AND (facilities.membercost * bookings.slots >30)))
        INNER JOIN members ON bookings.memid = members.memid
ORDER BY cost DESC

facility,name,cost
Massage Room 2,GUEST GUEST,320.0
Massage Room 1,GUEST GUEST,160.0
Massage Room 1,GUEST GUEST,160.0
Massage Room 1,GUEST GUEST,160.0
Tennis Court 2,GUEST GUEST,150.0
Tennis Court 1,GUEST GUEST,75.0
Tennis Court 1,GUEST GUEST,75.0
Tennis Court 2,GUEST GUEST,75.0
Squash Court,GUEST GUEST,70.0
Massage Room 1,Jemima Farrell,39.6
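
The cost rule in the `CASE` expression can be written as a small function, which makes it easy to check individual bookings by hand. This is a sketch: `booking_cost` and `GUEST_ID` are hypothetical names, the rates are those shown for Massage Room 2 in the Q4 output, and the slot count and member ID are assumptions chosen for illustration.

```python
GUEST_ID = 0  # the exercise states that the guest user's ID is always 0


def booking_cost(memid, slots, membercost, guestcost):
    """Cost of one booking: guests pay guestcost per slot, members pay membercost."""
    rate = guestcost if memid == GUEST_ID else membercost
    return rate * slots


# Rates for Massage Room 2 from the Q4 output (membercost 9.9, guestcost 80.0).
# The slot count of 4 and the member ID 7 are assumptions, not source values.
guest_total = booking_cost(memid=0, slots=4, membercost=9.9, guestcost=80.0)
member_total = booking_cost(memid=7, slots=4, membercost=9.9, guestcost=80.0)
print(guest_total, member_total)
```

With these inputs a guest booking and a member booking of the same length both exceed the $30 threshold, but by very different margins.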


In [None]:
#### Q9: This time, produce the same result as in Q8, but using a subquery.


In [None]:
%sql
SELECT * FROM (SELECT facilities.name AS facility, CONCAT( members.firstname,  ' ', members.surname ) AS name, CASE
        WHEN bookings.memid =0 THEN facilities.guestcost * bookings.slots
        ELSE facilities.membercost * bookings.slots
    END AS cost
    FROM bookings
    INNER JOIN facilities 
        ON bookings.facid = facilities.facid
        AND bookings.starttime LIKE  '2012-09-14%'
    INNER JOIN members 
        ON bookings.memid = members.memid )sub
    WHERE sub.cost >30
ORDER BY sub.cost DESC

facility,name,cost
Massage Room 2,GUEST GUEST,320.0
Massage Room 1,GUEST GUEST,160.0
Massage Room 1,GUEST GUEST,160.0
Massage Room 1,GUEST GUEST,160.0
Tennis Court 2,GUEST GUEST,150.0
Tennis Court 1,GUEST GUEST,75.0
Tennis Court 1,GUEST GUEST,75.0
Tennis Court 2,GUEST GUEST,75.0
Squash Court,GUEST GUEST,70.0
Massage Room 1,Jemima Farrell,39.6


In [None]:
#### Q10: Produce a list of facilities with a total revenue less than 1000.

The output should have facility name and total revenue, sorted by revenue.
Remember that there's a different cost for guests and members!


In [None]:
%sql
SELECT facs.name, sum(CASE 
        WHEN memid = 0 THEN slots * facs.guestcost
        ELSE slots * membercost
    END) AS revenue
    FROM bookings bks
    INNER JOIN facilities facs
        ON bks.facid = facs.facid
    GROUP BY facs.name
    HAVING sum(CASE 
    WHEN memid = 0 THEN slots * facs.guestcost
        ELSE slots * membercost
    END) < 1000
ORDER BY revenue;

name,revenue
Table Tennis,180.0
Snooker Table,240.0
Pool Table,270.0
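
The aggregation logic can be checked locally on a handful of rows. The sketch below uses Python's built-in `sqlite3` in place of Spark SQL, with made-up facilities and bookings (`Court A`, `Table B`, `Table C`) that are not part of the country club data. The query is the same as the one above apart from whitespace.

```python
import sqlite3

# In-memory SQLite database standing in for the Spark tables (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE facilities (facid INTEGER, name TEXT, membercost REAL, guestcost REAL);
    CREATE TABLE bookings (facid INTEGER, memid INTEGER, slots INTEGER);
    -- Made-up rows for illustration only
    INSERT INTO facilities VALUES
        (0, 'Court A', 5.0, 25.0), (1, 'Table B', 0.0, 5.0), (2, 'Table C', 0.0, 5.0);
    INSERT INTO bookings VALUES
        (0, 0, 40), (0, 3, 10), (1, 0, 2), (1, 3, 4), (2, 0, 1);
    """
)

# Guests (memid = 0) pay guestcost per slot; members pay membercost.
revenue = conn.execute(
    """
    SELECT facs.name,
           SUM(CASE WHEN memid = 0 THEN slots * facs.guestcost
                    ELSE slots * membercost END) AS revenue
    FROM bookings bks
    INNER JOIN facilities facs ON bks.facid = facs.facid
    GROUP BY facs.name
    HAVING SUM(CASE WHEN memid = 0 THEN slots * facs.guestcost
                    ELSE slots * membercost END) < 1000
    ORDER BY revenue
    """
).fetchall()
print(revenue)
conn.close()
```

`Court A` earns 40 × 25.0 + 10 × 5.0 = 1050 and is filtered out by `HAVING`, while the two tables remain, sorted by ascending revenue.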
