## Note: Look at the end for solutions.

### What is a relational database?




### How does a relational database differ from other databases?





### What is a database management system and why do we use them?





### The NorthWind fictional database:




### What tables are present?




### What is inside the tables?





### What are some of the products named? 




### How are the tables related to each other?




### What products did companies buy?




### How can we order it by the company name?




### How many products did each company buy? 




### Are there any companies who have not purchased anything?




### What is the average number of products that each company bought?




### Suppose that a company wanted to run a promotion campaign for its big buyers. Big buyers are companies that buy more than the average. Can we create a table that has the CustomerID and a BigBuyer field with a 1 for big buyers and a 0 for small buyers?



### How can we create an index on the CustomerID?




### What does the Index do? 


### How do I drop/delete a table?

### What is the order of operations for a SQL query?

<ul>
<li>FROM</li>
<li>JOIN</li>
<li>WHERE</li>
<li>GROUP BY</li>
<li>SELECT</li>
<li>DISTINCT</li>
<li>ORDER BY</li>
</ul>

### What is a relational database?

A: A relational database stores data in tables and links those tables to each other through shared fields (keys).

### How does a relational database differ from other databases?

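A: Other ways of storing data do not organize it into tables. For example, you could keep your data in a text file or in a nested dictionary. Neither of these is a table, and neither records how one piece of data relates to another.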
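To make the contrast concrete, here is a minimal sketch that stores the same toy data as a nested dictionary and as two related tables. It uses Python's built-in sqlite3 module as a stand-in for Postgres, and the company names and order numbers are made up for illustration.

```python
import sqlite3

# 1) Nested dictionary: each (hypothetical) customer carries its own orders.
nested = {
    "ACME": {"CompanyName": "Acme Foods", "orders": [1, 2]},
    "BLUE": {"CompanyName": "Blue Bistro", "orders": [3]},
}

# 2) Relational: two flat tables linked by the shared CustomerID column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers ("CustomerID" TEXT PRIMARY KEY, "CompanyName" TEXT);
    CREATE TABLE orders ("OrderID" INTEGER PRIMARY KEY, "CustomerID" TEXT);
    INSERT INTO customers VALUES ('ACME', 'Acme Foods'), ('BLUE', 'Blue Bistro');
    INSERT INTO orders VALUES (1, 'ACME'), (2, 'ACME'), (3, 'BLUE');
""")

# "Which company placed order 3?" needs a hand-written search in the dictionary...
company_from_dict = next(
    c["CompanyName"] for c in nested.values() if 3 in c["orders"]
)

# ...while the relational version expresses it as a join on the shared key.
company_from_sql = conn.execute(
    'SELECT c."CompanyName" FROM orders o '
    'JOIN customers c ON c."CustomerID" = o."CustomerID" '
    'WHERE o."OrderID" = 3'
).fetchone()[0]

print(company_from_dict, company_from_sql)
```

Both lookups return the same company. The difference is that the dictionary fixes one access path (customer first, then orders), while the tables can be joined in either direction.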

### What is a database management system and why do we use them?

A: A DBMS is a program that acts as an interface between you and the tables in the database.
Imagine that we instead stored our data directly in a block of memory. What problems
would we have?
<ul>

<li>1. Concurrency: If one person is updating every row of a table one by one while another is reading
every row, the reader may get a half-updated view of the database. Handling simultaneous access correctly is called concurrency control.</li>
<li>2. Fault tolerance: If the computer loses power and shuts off halfway through an update or insert, the
data may be left incomplete or incorrectly formatted.</li>
<li>3. Data independence: Suppose you were writing an application that used this data. What if you change how you manage the data?
For instance, instead of making updates directly to your data, you could first make a copy of it, update the copy, and then make the copy your live table by changing a pointer. If your application pointed directly to where the data was stored in memory, changing where the table is stored would break your program. By storing your data in a database and accessing it through a database management system, you separate data management from your application.</li>
<li>4. Speed: We'll be using a relational database management system called PostgreSQL, and we will access it via the SQL language. If Postgres releases a new and faster version, we can install it and speed up all of our queries without changing them.</li>
<li>5. Distribution: Finally, even if you are the only one accessing your database, you might have data spread out over several machines. You could use Spark to access this data with SQL queries. You don't have to think about how to fetch the data from the cluster; Spark will figure it out for you.
</li>
</ul>

Why DBMSs are used: we would like concurrent access to a database in a fault-tolerant way that keeps how data is managed and stored
separate from how it is used in an application.

Points 1 and 2 are achieved with transactions (often abbreviated Xacts). A transaction takes the database from one consistent
state to another, and it either completes entirely or has no effect. A transaction is made up of individual read and write operations, and the DBMS may interleave the operations of concurrent transactions in order to
execute them quickly.

Points 3 and 4 are achieved by using SQL to access data in the database. This way there can be an update to the DBMS
which makes queries faster without changing how an application communicates with the DBMS.
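The fault-tolerance point can be illustrated with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for Postgres, a made-up products table, and a simulated power loss in the middle of a row-by-row update. The helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE products ("ProductID" INTEGER PRIMARY KEY, "UnitPrice" INTEGER)')
conn.execute("INSERT INTO products VALUES (1, 10), (2, 20), (3, 30)")
conn.commit()

def double_prices(conn, crash_after=None):
    """Double every price, one row at a time, inside a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for n, product_id in enumerate([1, 2, 3], start=1):
                conn.execute(
                    'UPDATE products SET "UnitPrice" = "UnitPrice" * 2 WHERE "ProductID" = ?',
                    (product_id,),
                )
                if crash_after == n:
                    raise RuntimeError("simulated power loss")
    except RuntimeError:
        pass  # the half-finished update has already been rolled back

def prices(conn):
    return [row[0] for row in conn.execute('SELECT "UnitPrice" FROM products ORDER BY "ProductID"')]

double_prices(conn, crash_after=2)  # fails after two of the three rows
after_crash = prices(conn)
double_prices(conn)                 # runs to completion
after_success = prices(conn)
print(after_crash, after_success)
```

After the simulated crash the table is unchanged rather than half-updated, which is the all-or-nothing behaviour a transaction is meant to provide.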
 
### The database

We will be using the NorthWind database. It is a fictional database for a fictional company that distributes foreign foods. You can find it in the northwind.postgres.sql file.



### What tables are present?

In psql, the \d command lists the tables in the current database.

### What is inside the tables?

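In psql, \d tablename lists the columns of a table and their types. For example, \d products describes the products table.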

### What are some of the products named? 

SELECT "ProductName" FROM products LIMIT 20;

### How are the tables related to each other?

Both the customers and the orders table have the field CustomerID.

Both the orders and order_details table have the field OrderID.

Both the order_details and products table have the field ProductID.

### What products did companies buy?

We will have to join all four tables together. 

SELECT p."ProductName", c."CompanyName" FROM orders o 
JOIN order_details od 
ON od."OrderID"=o."OrderID" 
JOIN products p 
ON p."ProductID"=od."ProductID" 
JOIN customers c
ON c."CustomerID"=o."CustomerID"
LIMIT 10;
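As a self-contained way to try this join without loading NorthWind, the sketch below builds a tiny four-table schema with the same key columns. It uses Python's built-in sqlite3 module as a stand-in for Postgres, and all rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers ("CustomerID" TEXT PRIMARY KEY, "CompanyName" TEXT);
    CREATE TABLE orders ("OrderID" INTEGER PRIMARY KEY, "CustomerID" TEXT);
    CREATE TABLE order_details ("OrderID" INTEGER, "ProductID" INTEGER);
    CREATE TABLE products ("ProductID" INTEGER PRIMARY KEY, "ProductName" TEXT);

    -- Hypothetical toy rows, not taken from NorthWind.
    INSERT INTO customers VALUES ('ACME', 'Acme Foods'), ('BLUE', 'Blue Bistro');
    INSERT INTO products VALUES (1, 'Tea'), (2, 'Jam');
    INSERT INTO orders VALUES (10, 'ACME'), (11, 'BLUE');
    INSERT INTO order_details VALUES (10, 1), (10, 2), (11, 2);
""")

# Same shape as the query in the text: each JOIN follows one shared key.
rows = conn.execute("""
    SELECT p."ProductName", c."CompanyName"
    FROM orders o
    JOIN order_details od ON od."OrderID" = o."OrderID"
    JOIN products p ON p."ProductID" = od."ProductID"
    JOIN customers c ON c."CustomerID" = o."CustomerID"
""").fetchall()

for product, company in rows:
    print(f"{company} bought {product}")
```

Without an ORDER BY clause the row order is not guaranteed, so the rows should be treated as an unordered set.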

### How can we order it by the company name?

SELECT p."ProductName", c."CompanyName" FROM orders o 
JOIN order_details od 
ON od."OrderID"=o."OrderID" 
JOIN products p 
ON p."ProductID"=od."ProductID" 
JOIN customers c
ON c."CustomerID"=o."CustomerID"
ORDER BY
c."CompanyName"
LIMIT 10;


### How many products did each company buy? 

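When we use aggregate functions like COUNT(), we need to GROUP BY the field that defines the groups we want to count within, here the company name. The RIGHT JOIN keeps every customer, including those with no orders. We count od."ProductID" rather than c."CompanyName" because the company name is never NULL, so counting it would report 1 instead of 0 for a company with no orders. Note that this counts order lines, so a product bought in two different orders is counted twice.

SELECT c."CompanyName", COUNT(od."ProductID") AS count 
FROM orders o 
JOIN order_details od 
ON od."OrderID"=o."OrderID" 
JOIN products p 
ON p."ProductID"=od."ProductID" 
RIGHT JOIN customers c
ON c."CustomerID"=o."CustomerID"
GROUP BY c."CompanyName";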
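The column placed inside COUNT() matters once an outer join is involved. The sketch below, which assumes Python's built-in sqlite3 module and made-up rows, runs the same grouped query twice with a different counted column. It starts from customers and uses LEFT JOIN, which keeps every customer just as a RIGHT JOIN onto customers does, because some SQLite versions do not support RIGHT JOIN. The helper name counts is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers ("CustomerID" TEXT PRIMARY KEY, "CompanyName" TEXT);
    CREATE TABLE orders ("OrderID" INTEGER PRIMARY KEY, "CustomerID" TEXT);
    CREATE TABLE order_details ("OrderID" INTEGER, "ProductID" INTEGER);

    -- Hypothetical toy rows; 'Idle Imports' has no orders at all.
    INSERT INTO customers VALUES
        ('ACME', 'Acme Foods'), ('BLUE', 'Blue Bistro'), ('IDLE', 'Idle Imports');
    INSERT INTO orders VALUES (10, 'ACME'), (11, 'BLUE');
    INSERT INTO order_details VALUES (10, 1), (10, 2), (11, 2);
""")

def counts(count_expr):
    """Group by company and count the given column expression."""
    sql = f"""
        SELECT c."CompanyName", COUNT({count_expr}) AS count
        FROM customers c
        LEFT JOIN orders o ON o."CustomerID" = c."CustomerID"
        LEFT JOIN order_details od ON od."OrderID" = o."OrderID"
        GROUP BY c."CompanyName"
    """
    return dict(conn.execute(sql))

by_product = counts('od."ProductID"')   # NULL for customers without orders
by_company = counts('c."CompanyName"')  # never NULL

print(by_product)
print(by_company)
```

COUNT(column) skips NULLs, so counting a column from the order tables yields 0 for a customer with no orders, while counting the company name yields 1 for the single padded row.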


### Are there any companies who have not purchased anything?

We cannot put a condition on the count in the WHERE clause of the same query, because WHERE is evaluated before the rows are grouped and counted. One way around this is to select from the query that did the counting. The count must be taken over a column from the order tables, such as od."ProductID", so that a company with no orders gets a count of 0.

SELECT B."CompanyName", B.count FROM 

(SELECT c."CompanyName", COUNT(od."ProductID") AS count 
FROM orders o 
JOIN order_details od 
ON od."OrderID"=o."OrderID" 
JOIN products p 
ON p."ProductID"=od."ProductID" 
RIGHT JOIN customers c
ON c."CustomerID"=o."CustomerID"
GROUP BY c."CompanyName") B 

WHERE B.count=0;
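Filtering on a count can also be written with a HAVING clause, which is evaluated after grouping. The sketch below compares the subquery form with the HAVING form on made-up rows, using Python's built-in sqlite3 module as a stand-in for Postgres and LEFT JOIN from customers in place of RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers ("CustomerID" TEXT PRIMARY KEY, "CompanyName" TEXT);
    CREATE TABLE orders ("OrderID" INTEGER PRIMARY KEY, "CustomerID" TEXT);
    CREATE TABLE order_details ("OrderID" INTEGER, "ProductID" INTEGER);

    -- Hypothetical toy rows; 'Idle Imports' has no orders.
    INSERT INTO customers VALUES ('ACME', 'Acme Foods'), ('IDLE', 'Idle Imports');
    INSERT INTO orders VALUES (10, 'ACME');
    INSERT INTO order_details VALUES (10, 1), (10, 2);
""")

grouped = """
    SELECT c."CompanyName", COUNT(od."ProductID") AS count
    FROM customers c
    LEFT JOIN orders o ON o."CustomerID" = c."CustomerID"
    LEFT JOIN order_details od ON od."OrderID" = o."OrderID"
    GROUP BY c."CompanyName"
"""

# Form 1: wrap the counting query and filter its result with WHERE.
via_subquery = [
    row[0]
    for row in conn.execute(f'SELECT B."CompanyName" FROM ({grouped}) B WHERE B.count = 0')
]

# Form 2: filter the groups directly with HAVING.
via_having = [
    row[0]
    for row in conn.execute(grouped + ' HAVING COUNT(od."ProductID") = 0')
]

print(via_subquery, via_having)
```

Both forms return the same companies. The subquery form generalizes to cases where the counted result feeds another aggregate, while HAVING is shorter for a simple filter.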


### What is the average number of products that each company bought?

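Postgres will not allow us to nest aggregate functions. For instance, you cannot select AVG(COUNT(....)). In order to find the average of a count, we need to use a subquery. Because the query below uses inner joins only, companies with no orders are left out of the average.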

SELECT AVG(counts.count) FROM (SELECT c."CompanyName", COUNT(c."CompanyName") count
FROM orders o 
JOIN order_details od 
ON od."OrderID"=o."OrderID" 
JOIN products p 
ON p."ProductID"=od."ProductID" 
JOIN customers c
ON c."CustomerID"=o."CustomerID"
GROUP BY c."CompanyName") counts;
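The sketch below shows both halves of this point on made-up rows: a nested aggregate is rejected, and the same average computed over a subquery succeeds. It assumes Python's built-in sqlite3 module as a stand-in; SQLite also refuses nested aggregates, although its error message differs from the one Postgres prints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders ("OrderID" INTEGER PRIMARY KEY, "CustomerID" TEXT);
    CREATE TABLE order_details ("OrderID" INTEGER, "ProductID" INTEGER);

    -- Hypothetical toy rows: ACME has 2 order lines, BLUE has 1.
    INSERT INTO orders VALUES (10, 'ACME'), (11, 'BLUE');
    INSERT INTO order_details VALUES (10, 1), (10, 2), (11, 2);
""")

per_customer = """
    SELECT o."CustomerID", COUNT(*) AS count
    FROM orders o
    JOIN order_details od ON od."OrderID" = o."OrderID"
    GROUP BY o."CustomerID"
"""

# Attempt 1: nest the aggregates directly. This is expected to fail.
try:
    conn.execute('SELECT AVG(COUNT(*)) FROM orders GROUP BY "CustomerID"')
    nested_allowed = True
except sqlite3.Error as exc:
    nested_allowed = False
    print("nested aggregate rejected:", exc)

# Attempt 2: count in a subquery, then average the counts.
average = conn.execute(
    f"SELECT AVG(counts.count) FROM ({per_customer}) counts"
).fetchone()[0]
print(average)
```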

### Suppose that a company wanted to run a promotion campaign for its big buyers. Big buyers are companies that buy more than the average. Can we create a table that has the CustomerID and a BigBuyer field with a 1 for big buyers and a 0 for small buyers?

CREATE TABLE BigBuyer AS
SELECT B."CustomerID", 

CASE WHEN B.count>(SELECT AVG(counts.count) FROM (SELECT COUNT(c."CompanyName") count
FROM orders o 
JOIN order_details od 
ON od."OrderID"=o."OrderID" 
JOIN products p 
ON p."ProductID"=od."ProductID" 
JOIN customers c
ON c."CustomerID"=o."CustomerID"
GROUP BY c."CustomerID") counts) THEN 1
ELSE 0 END as BigBuyer

FROM 
(SELECT o."CustomerID", COUNT(o."CustomerID") as count 
FROM orders o 
JOIN order_details od 
ON od."OrderID"=o."OrderID" 
JOIN products p 
ON p."ProductID"=od."ProductID" 
GROUP BY o."CustomerID") B;
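The query above repeats the counting subquery twice. As a possible alternative, the sketch below names the counts once in a WITH clause (a common table expression, which Postgres also supports) and reuses them. It runs on made-up rows with Python's built-in sqlite3 module as a stand-in, and the column name n is a hypothetical choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders ("OrderID" INTEGER PRIMARY KEY, "CustomerID" TEXT);
    CREATE TABLE order_details ("OrderID" INTEGER, "ProductID" INTEGER);

    -- Hypothetical toy rows: ACME has 3 order lines, BLUE has 1 (average 2).
    INSERT INTO orders VALUES (10, 'ACME'), (11, 'BLUE');
    INSERT INTO order_details VALUES (10, 1), (10, 2), (10, 3), (11, 2);

    CREATE TABLE bigbuyer AS
    WITH counts AS (
        SELECT o."CustomerID", COUNT(*) AS n
        FROM orders o
        JOIN order_details od ON od."OrderID" = o."OrderID"
        GROUP BY o."CustomerID"
    )
    SELECT "CustomerID",
           CASE WHEN n > (SELECT AVG(n) FROM counts) THEN 1 ELSE 0 END AS bigbuyer
    FROM counts;
""")

flags = dict(conn.execute('SELECT "CustomerID", bigbuyer FROM bigbuyer'))
print(flags)
```

Only customers that buy strictly more than the average are flagged with 1, matching the CASE WHEN condition in the text.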



### How can we create an index on the CustomerID?

CREATE INDEX BigBuyer_index ON bigbuyer("CustomerID");

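### What does the Index do? 

An index is a separate lookup structure (a B-tree by default in Postgres) that the DBMS maintains for the indexed column. With an index on CustomerID, Postgres can find the rows for a given customer without scanning the whole table. The trade-off is extra storage and slightly slower inserts and updates, because the index has to be kept up to date.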
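One way to see the effect of an index is to ask the database for its query plan before and after creating it. The sketch below does this with SQLite's EXPLAIN QUERY PLAN through Python's built-in sqlite3 module; in Postgres the corresponding command is EXPLAIN, and its output looks different. The table contents and the helper name plan are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE bigbuyer ("CustomerID" TEXT, bigbuyer INTEGER)')
conn.executemany(
    "INSERT INTO bigbuyer VALUES (?, ?)",
    [(f"C{i:04d}", i % 2) for i in range(1000)],  # hypothetical customer ids
)

def plan(conn):
    """Return the planner's description of a lookup by CustomerID."""
    rows = conn.execute(
        'EXPLAIN QUERY PLAN SELECT * FROM bigbuyer WHERE "CustomerID" = ?',
        ("C0500",),
    ).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the detail text

plan_before = plan(conn)
conn.execute('CREATE INDEX bigbuyer_index ON bigbuyer("CustomerID")')
plan_after = plan(conn)

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists the plan describes a scan of the whole table. Afterwards it mentions bigbuyer_index, which means the lookup can go through the index instead.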


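### How do I drop/delete a table?

DROP TABLE bigbuyer;

### What is the order of operations for a SQL query?

Logically, Postgres first looks at which table you are querying from, then joins tables, then evaluates the WHERE clause, then groups rows, then looks at which fields you selected, then removes duplicate rows if you asked for DISTINCT, and finally orders the results. This is why a WHERE clause cannot refer to an aggregate such as COUNT(). The query planner may execute the steps in a different physical order as long as the result is the same. This is summarized in the following list: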

<ul>
<li>FROM</li> 
<li>JOIN</li>
<li>WHERE</li> 
<li>GROUP BY</li> 
<li>SELECT</li> 
<li>DISTINCT</li>
<li>ORDER BY</li>
</ul>
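The logical order can be mimicked step by step in plain Python and compared with what a database returns. This is only a sketch on made-up rows: it skips the JOIN step because there is a single table, and it uses Python's built-in sqlite3 module as a stand-in for Postgres.

```python
import sqlite3

purchases = [  # hypothetical (company, product) rows
    ("Acme Foods", "Tea"), ("Acme Foods", "Jam"),
    ("Blue Bistro", "Jam"),
    ("Cafe Nine", "Tea"), ("Cafe Nine", "Jam"),
    ("Idle Imports", "Tea"),
]

# Hand-rolled pipeline, one step per clause, in logical order.
from_rows = list(purchases)                                     # FROM
where_rows = [r for r in from_rows if r[0] != "Idle Imports"]   # WHERE
groups = {}
for company, product in where_rows:                             # GROUP BY company
    groups.setdefault(company, []).append(product)
selected = [len(products) for products in groups.values()]      # SELECT COUNT(*) AS n
distinct = set(selected)                                        # DISTINCT
ordered = sorted(distinct)                                      # ORDER BY n

# The same query, evaluated by a real SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (company TEXT, product TEXT)")
conn.executemany("INSERT INTO purchases VALUES (?, ?)", purchases)
sql_result = [row[0] for row in conn.execute("""
    SELECT DISTINCT COUNT(*) AS n
    FROM purchases
    WHERE company <> 'Idle Imports'
    GROUP BY company
    ORDER BY n
""")]

print(ordered, sql_result)
```

The WHERE filter runs on individual rows before any counting happens, and DISTINCT and ORDER BY operate on the already computed counts.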