# Springboard SQL Mini-project

This mini-project demonstrates data-wrangling with SQL. I set up a local SQLite database and will interact with it from within Python. Whenever a query returns a table, I read the data into pandas since this displays them in a more readable way.


Let's start by connecting to the database:

In [1]:
import sqlite3
import pandas as pd
conn = sqlite3.connect('country_club.sqlite')
c = conn.cursor()

Now let's take a look what tables the database contains:

In [2]:
c.execute('''SELECT name from sqlite_master WHERE type='table';''')
c.fetchall()

[('Bookings',), ('Facilities',), ('Members',)]

**Q1: Some of the facilities charge a fee to members, but some do not.
Please list the names of the facilities that do.**

In [3]:
for row in c.execute('''SELECT name
                          FROM Facilities
                         WHERE membercost>0'''):
    print(row)

('Tennis Court 1',)
('Tennis Court 2',)
('Massage Room 1',)
('Massage Room 2',)
('Squash Court',)


**Q2: How many facilities do not charge a fee to members?**

In [4]:
c.execute('''SELECT COUNT(name) AS free_facilities
               FROM Facilities
              WHERE membercost=0''')

c.fetchone()

(4,)

**Q3: How can you produce a list of facilities that charge a fee to members,
where the fee is less than 20% of the facility's monthly maintenance cost?
Return the facid, facility name, member cost, and monthly maintenance of the
facilities in question.**

In [5]:
pd.read_sql_query('''SELECT facid, 
                            name, membercost, monthlymaintenance
                       FROM Facilities
                      WHERE membercost > 0 AND membercost/monthlymaintenance < .2''', conn)

Unnamed: 0,facid,name,membercost,monthlymaintenance
0,0,Tennis Court 1,5.0,200
1,1,Tennis Court 2,5.0,200
2,4,Massage Room 1,9.9,3000
3,5,Massage Room 2,9.9,3000
4,6,Squash Court,3.5,80


** Q4: How can you retrieve the details of facilities with ID 1 and 5?
Write the query without using the OR operator. **

In [6]:
pd.read_sql_query('''SELECT *
                       FROM Facilities
                      WHERE facid IN (1, 5)''', conn)                           

Unnamed: 0,facid,name,membercost,guestcost,initialoutlay,monthlymaintenance
0,1,Tennis Court 2,5.0,25,8000,200
1,5,Massage Room 2,9.9,80,4000,3000


** Q5: How can you produce a list of facilities, with each labelled as
'cheap' or 'expensive', depending on if their monthly maintenance cost is
more than $100? Return the name and monthly maintenance of the facilities
in question. **

In [7]:
pd.read_sql_query('''SELECT name,
                            monthlymaintenance, 
                            CASE WHEN monthlymaintenance > 100 THEN 'expensive'
                                 WHEN monthlymaintenance < 100 THEN 'cheap' END AS price_point
                       FROM Facilities ''', conn)

Unnamed: 0,name,monthlymaintenance,price_point
0,Tennis Court 1,200,expensive
1,Tennis Court 2,200,expensive
2,Badminton Court,50,cheap
3,Table Tennis,10,cheap
4,Massage Room 1,3000,expensive
5,Massage Room 2,3000,expensive
6,Squash Court,80,cheap
7,Snooker Table,15,cheap
8,Pool Table,15,cheap


** Q6: You'd like to get the first and last name of the last member(s)
who signed up. Do not use the LIMIT clause for your solution. **

The following doesn't seem to work in MySQL, but it works in SQLite:

In [8]:
pd.read_sql_query('''SELECT max(joindate) AS latest_joindate,
                            surname, firstname
                            FROM Members''', conn)

Unnamed: 0,latest_joindate,surname,firstname
0,2012-09-26 18:08:45,Smith,Darren


Alternativel way with subquery that should also work in MySQL:

In [9]:
pd.read_sql_query('''SELECT joindate,
                            surname, firstname
                       FROM Members
                      WHERE joindate = (SELECT MAX(joindate)
                                          FROM Members)''', conn)

Unnamed: 0,joindate,surname,firstname
0,2012-09-26 18:08:45,Smith,Darren


** Q7: How can you produce a list of all members who have used a tennis court?
Include in your output the name of the court, and the name of the member
formatted as a single column. Ensure no duplicate data, and order by
the member name. **

In [10]:
pd.read_sql_query('''SELECT DISTINCT(Members.surname || ', ' || Members.firstname) AS full_name,
                            Facilities.name
                       FROM Bookings  
                  LEFT JOIN Members 
                         ON Members.memid = Bookings.memid
                  LEFT JOIN Facilities
                         ON Bookings.facid = Facilities.facid
                      WHERE Facilities.name  LIKE 'Tennis Court%'
                   ORDER BY full_name, Facilities.name''', conn)

Unnamed: 0,full_name,name
0,"Bader, Florence",Tennis Court 1
1,"Bader, Florence",Tennis Court 2
2,"Baker, Anne",Tennis Court 1
3,"Baker, Anne",Tennis Court 2
4,"Baker, Timothy",Tennis Court 1
5,"Baker, Timothy",Tennis Court 2
6,"Boothe, Tim",Tennis Court 1
7,"Boothe, Tim",Tennis Court 2
8,"Butters, Gerald",Tennis Court 1
9,"Butters, Gerald",Tennis Court 2


**Q8: How can you produce a list of bookings on the day of 2012-09-14 which
will cost the member (or guest) more than $30? Remember that guests have
different costs to members (the listed costs are per half-hour 'slot'), and
the guest user's ID is always 0. Include in your output the name of the
facility, the name of the member formatted as a single column, and the cost.
Order by descending cost, and do not use any subqueries. **

In [11]:
pd.read_sql_query(''' SELECT Members.surname || ', ' || Members.firstname AS member_name,
                            Facilities.name AS facility_name,
                            CASE WHEN Bookings.memid = 0 THEN Facilities.guestcost*Bookings.slots
                                 ELSE Facilities.membercost*Bookings.slots END AS cost
                       FROM Bookings
                  LEFT JOIN Facilities
                         ON Bookings.facid = Facilities.facid
                  LEFT JOIN Members
                         ON Members.memid = Bookings.memid
                      WHERE date(starttime) = '2012-09-14'
                        AND cost > 30
                   ORDER BY cost DESC''', conn)

Unnamed: 0,member_name,facility_name,cost
0,"GUEST, GUEST",Massage Room 2,320.0
1,"GUEST, GUEST",Massage Room 1,160.0
2,"GUEST, GUEST",Massage Room 1,160.0
3,"GUEST, GUEST",Massage Room 1,160.0
4,"GUEST, GUEST",Tennis Court 2,150.0
5,"GUEST, GUEST",Tennis Court 1,75.0
6,"GUEST, GUEST",Tennis Court 1,75.0
7,"GUEST, GUEST",Tennis Court 2,75.0
8,"GUEST, GUEST",Squash Court,70.0
9,"Farrell, Jemima",Massage Room 1,39.6


**Q9: This time, produce the same result as in Q8, but using a subquery. **

In [12]:
pd.read_sql_query('''SELECT Members.surname || ', ' || Members.firstname AS member_name,
                            Costs.facility_name, Costs.cost
                            FROM (
                                 SELECT Facilities.name AS facility_name,
                                   CASE WHEN Bookings.memid = 0 THEN Facilities.guestcost*Bookings.slots
                                        ELSE Facilities.membercost*Bookings.slots END AS cost,
                                        starttime, memid
                                   FROM Bookings
                              LEFT JOIN Facilities
                                     ON Bookings.facid = Facilities.facid
                                  ) Costs
                  LEFT JOIN Members
                         ON Members.memid = Costs.memid
                      WHERE date(starttime) = '2012-09-14'
                        AND cost > 30
                   ORDER BY cost DESC''', conn)

Unnamed: 0,member_name,facility_name,cost
0,"GUEST, GUEST",Massage Room 2,320.0
1,"GUEST, GUEST",Massage Room 1,160.0
2,"GUEST, GUEST",Massage Room 1,160.0
3,"GUEST, GUEST",Massage Room 1,160.0
4,"GUEST, GUEST",Tennis Court 2,150.0
5,"GUEST, GUEST",Tennis Court 1,75.0
6,"GUEST, GUEST",Tennis Court 1,75.0
7,"GUEST, GUEST",Tennis Court 2,75.0
8,"GUEST, GUEST",Squash Court,70.0
9,"Farrell, Jemima",Massage Room 1,39.6


**Q10: Produce a list of facilities with a total revenue less than 1000.
The output of facility name and total revenue, sorted by revenue. Remember
that there's a different cost for guests and members! **

In [13]:
pd.read_sql_query('''SELECT Facilities.name AS facility_name,
                            sum(CASE WHEN Bookings.memid = 0 THEN Facilities.guestcost*Bookings.slots
                                     ELSE Facilities.membercost*Bookings.slots END) AS revenue
                       FROM Bookings
                  LEFT JOIN Facilities
                         ON Bookings.facid = Facilities.facid
                      GROUP BY facility_name
                     HAVING revenue < 1000
                   ORDER BY revenue''', conn)

Unnamed: 0,facility_name,revenue
0,Table Tennis,180
1,Snooker Table,240
2,Pool Table,270


In [14]:
# Close connection
conn.close()