# Section 3
## Read in database credentials
The credentials for connecting to the database are read below from a local file, one value per line, into variables. The local file is not included in the repo.

In [57]:
db_name = ""
db_user = ""
db_pass = ""
db_host = ""
with open("database_credentials.txt") as f:
    db_name = f.readline().strip()
    db_user = f.readline().strip()
    db_pass = f.readline().strip()
    db_host = f.readline().strip()
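
The cell above relies on the file holding four lines in a fixed order: database name, user, password, host. As a minimal sketch, the same read can be wrapped in a function that fails loudly when a line is missing. The name `read_credentials` and the placeholder values are hypothetical and not part of the notebook.

```python
import os
import tempfile


def read_credentials(path):
    """Read name, user, password and host from the first four lines of path."""
    with open(path) as f:
        lines = [line.strip() for line in f]
    if len(lines) < 4:
        raise ValueError("expected 4 lines: name, user, password, host")
    name, user, password, host = lines[:4]
    return {"name": name, "user": user, "password": password, "host": host}


# Demo with a throwaway file holding placeholder values (not real credentials).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "database_credentials.txt")
    with open(demo_path, "w") as f:
        f.write("example_db\nexample_user\nexample_pass\nlocalhost\n")
    creds = read_credentials(demo_path)

print(creds["name"], creds["host"])
```

Returning a dictionary instead of four module-level variables is only a design choice for the sketch; the rest of the notebook keeps using `db_name`, `db_user`, `db_pass`, and `db_host`.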

## Shorthand connect to database
This function returns a new database connection using the default credentials read in above.

In [58]:
import pymysql as pms

In [59]:
def get_connect():
    """
    Returns a database connection object using the default parameters
    specified in the database_credentials file read in at the start of
    this notebook.
    """
    return pms.connect(host=db_host, user=db_user, passwd=db_pass, db=db_name)

## Test connection
The next code segment tests to ensure that the database connection is working properly.

In [60]:
con = None  # Defined up front so the finally block is safe if connecting fails
try:
    con = get_connect()
    print("Successfully connected")
finally:
    if con is not None:
        print("Closing connection")
        con.close()

Successfully connected
Closing connection


## Shorthand query execution and output
The function below takes a single SQL query string, executes it, and prints the results. The connection is closed before the function returns.

In [62]:
def execute_sql_output_result(query_string):
    """
    Given the query_string parameter, this function connects to the database, executes
    the query, outputs the result, and closes the connection.
    """
    con = None  # Defined up front so the finally block is safe if connecting fails
    try:
        con = get_connect()
        with con.cursor() as cur:
            #If the query_string is a single string, execute the string
            if isinstance(query_string, str):
                cur.execute(query_string)
                result = cur.fetchall()
                print("=== {} RESULTS ===".format(len(result)))
                #Column names
                print(" ".join([i[0] for i in cur.description]))
                #Results, one numbered row per line
                for i, row in enumerate(result):
                    print("{}: {}".format(i, row))
                print()
    finally:
        if con is not None:
            con.close()

## Subqueries
`SELECT` statements can be nested inside other `SELECT` statements. The inner query is called a subquery, and its result is used by the outer query.

In [63]:
execute_sql_output_result("""
    SELECT * FROM dept WHERE
        deptno = (SELECT deptno FROM dept WHERE deptno = 30);
""")
execute_sql_output_result("""
    SELECT * FROM dept WHERE
        deptno < (SELECT deptno FROM dept WHERE deptno = 30)
        AND dname = 'ACCOUNTING';
""")

=== 1 RESULTS ===
deptno dname loc
0: (30, 'SALES', 'CHICAGO')

=== 1 RESULTS ===
deptno dname loc
0: (10, 'ACCOUNTING', 'NEW YORK')



## JOIN
It is often necessary to link data from two tables together to get a filtered result.

In [64]:
execute_sql_output_result("""
    SELECT * FROM dept WHERE loc = 'CHICAGO';
""")

execute_sql_output_result("""
    SELECT * FROM emp WHERE
        deptno = (SELECT deptno FROM dept WHERE loc = 'CHICAGO');
""")

=== 1 RESULTS ===
deptno dname loc
0: (30, 'SALES', 'CHICAGO')

=== 6 RESULTS ===
empno ename job mgr hiredate sal comm deptno
0: (7499, 'ALLEN', 'SALESMAN', 7698, datetime.date(1981, 2, 20), 1600.0, 300.0, 30)
1: (7521, 'WARD', 'SALESMAN', 7698, datetime.date(1981, 2, 22), 1250.0, 500.0, 30)
2: (7654, 'MARTIN', 'SALESMAN', 7698, datetime.date(1981, 9, 28), 1250.0, 1400.0, 30)
3: (7698, 'BLAKE', 'MANAGER', 7839, datetime.date(1981, 5, 1), 2850.0, None, 30)
4: (7844, 'TURNER', 'SALESMAN', 7698, datetime.date(1981, 9, 8), 1500.0, 0.0, 30)
5: (7900, 'JAMES', 'CLERK', 7698, datetime.date(1981, 12, 3), 950.0, None, 30)
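
The `=` comparison above works because exactly one department is located in Chicago; MySQL rejects `=` when the subquery returns more than one row. When several rows may come back, `IN` is the usual alternative. The sketch below is an illustration only: it assumes an in-memory SQLite database (Python's built-in `sqlite3`) with a handful of sample rows instead of the notebook's MySQL connection.

```python
import sqlite3

# In-memory SQLite stand-in with a few sample rows (not the notebook's MySQL database).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT);
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, job TEXT, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'ACCOUNTING', 'NEW YORK'), (20, 'RESEARCH', 'DALLAS'),
                            (30, 'SALES', 'CHICAGO'), (40, 'OPERATIONS', 'BOSTON');
    INSERT INTO emp VALUES (7369, 'SMITH', 'CLERK', 20), (7499, 'ALLEN', 'SALESMAN', 30),
                           (7566, 'JONES', 'MANAGER', 20), (7782, 'CLARK', 'MANAGER', 10);
""")

# IN accepts a subquery that returns any number of rows (here: two departments).
in_rows = con.execute("""
    SELECT ename, deptno FROM emp
    WHERE deptno IN (SELECT deptno FROM dept WHERE loc IN ('CHICAGO', 'DALLAS'))
    ORDER BY empno;
""").fetchall()
con.close()

for row in in_rows:
    print(row)
```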



`JOIN` can be used to do this more easily than with subqueries. Notice how both of the below tables have a *deptno* attribute.

In [65]:
execute_sql_output_result("""
    SELECT * FROM emp LIMIT 1;
""")

execute_sql_output_result("""
    SELECT * FROM dept;
""")  

=== 1 RESULTS ===
empno ename job mgr hiredate sal comm deptno
0: (7369, 'SMITH', 'CLERK', 7902, datetime.date(1980, 12, 17), 800.0, None, 20)

=== 4 RESULTS ===
deptno dname loc
0: (10, 'ACCOUNTING', 'NEW YORK')
1: (20, 'RESEARCH', 'DALLAS')
2: (30, 'SALES', 'CHICAGO')
3: (40, 'OPERATIONS', 'BOSTON')



Dot notation (`table.column`) can be used to join two tables on a specific column value.

In [66]:
execute_sql_output_result("""
    SELECT * FROM emp, dept WHERE
        emp.deptno = dept.deptno;
""")

=== 14 RESULTS ===
empno ename job mgr hiredate sal comm deptno deptno dname loc
0: (7782, 'CLARK', 'MANAGER', 7839, datetime.date(1981, 6, 9), 2450.0, None, 10, 10, 'ACCOUNTING', 'NEW YORK')
1: (7839, 'KING', 'PRESIDENT', None, datetime.date(1981, 11, 17), 5000.0, None, 10, 10, 'ACCOUNTING', 'NEW YORK')
2: (7934, 'MILLER', 'CLERK', 7782, datetime.date(1982, 1, 23), 1300.0, None, 10, 10, 'ACCOUNTING', 'NEW YORK')
3: (7369, 'SMITH', 'CLERK', 7902, datetime.date(1980, 12, 17), 800.0, None, 20, 20, 'RESEARCH', 'DALLAS')
4: (7566, 'JONES', 'MANAGER', 7839, datetime.date(1981, 4, 2), 2975.0, None, 20, 20, 'RESEARCH', 'DALLAS')
5: (7788, 'SCOTT', 'ANALYST', 7566, datetime.date(1981, 12, 9), 3000.0, None, 20, 20, 'RESEARCH', 'DALLAS')
6: (7876, 'ADAMS', 'CLERK', 7788, datetime.date(1983, 1, 12), 1100.0, None, 20, 20, 'RESEARCH', 'DALLAS')
7: (7902, 'FORD', 'ANALYST', 7566, datetime.date(1981, 12, 3), 3000.0, None, 20, 20, 'RESEARCH', 'DALLAS')
8: (7499, 'ALLEN', 'SALESMAN', 7698, datetime.dat

**NOTE**: In the query below, the `table.` prefix can be omitted when a column name is unique to one of the tables being used. Both forms are shown: `emp.ename` with the prefix and `job` without it (both are valid; the prefix is more specific). However, a column like *deptno* that exists in both tables MUST be specified with a prefix.

In [67]:
execute_sql_output_result("""
    SELECT emp.ename AS Name, job Job, sal Salary FROM emp, dept WHERE
        emp.deptno = dept.deptno AND 
        loc = 'DALLAS';
""")

=== 5 RESULTS ===
Name Job Salary
0: ('SMITH', 'CLERK', 800.0)
1: ('JONES', 'MANAGER', 2975.0)
2: ('SCOTT', 'ANALYST', 3000.0)
3: ('ADAMS', 'CLERK', 1100.0)
4: ('FORD', 'ANALYST', 3000.0)
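
To illustrate the note above, the sketch below selects the shared *deptno* column once without and once with a prefix. It assumes an in-memory SQLite database with a few sample rows rather than the notebook's MySQL connection; SQLite rejects the unqualified name, and MySQL reports a comparable "ambiguous" error, although the exact wording differs.

```python
import sqlite3

# In-memory SQLite stand-in with a few sample rows (not the notebook's MySQL database).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT);
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, job TEXT, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'ACCOUNTING', 'NEW YORK'), (20, 'RESEARCH', 'DALLAS'),
                            (30, 'SALES', 'CHICAGO'), (40, 'OPERATIONS', 'BOSTON');
    INSERT INTO emp VALUES (7369, 'SMITH', 'CLERK', 20), (7499, 'ALLEN', 'SALESMAN', 30),
                           (7566, 'JONES', 'MANAGER', 20), (7782, 'CLARK', 'MANAGER', 10);
""")

error_message = None
try:
    # deptno exists in both tables, so the bare name cannot be resolved.
    con.execute("SELECT deptno FROM emp, dept WHERE emp.deptno = dept.deptno;")
except sqlite3.OperationalError as err:
    error_message = str(err)
print("Unqualified column failed with:", error_message)

# Qualifying the column with its table name resolves the ambiguity.
qualified_rows = con.execute("""
    SELECT emp.deptno, ename FROM emp, dept
    WHERE emp.deptno = dept.deptno
    ORDER BY empno;
""").fetchall()
con.close()
print(qualified_rows)
```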



### Table aliases
The query below is identical to the one above; the only difference is that the tables are now given aliases. Note that once a table is given an alias, ALL `table.` prefixes must be updated to `alias.`.

In [68]:
execute_sql_output_result("""
    SELECT ename AS Name, job Job, sal Salary FROM emp e, dept d WHERE
        e.deptno = d.deptno AND 
        loc = 'DALLAS'
""")

=== 5 RESULTS ===
Name Job Salary
0: ('SMITH', 'CLERK', 800.0)
1: ('JONES', 'MANAGER', 2975.0)
2: ('SCOTT', 'ANALYST', 3000.0)
3: ('ADAMS', 'CLERK', 1100.0)
4: ('FORD', 'ANALYST', 3000.0)



### Replacing tables with subqueries
A subquery can stand in for a table in the `FROM` clause. The queries above and below return identical results.

In [69]:
execute_sql_output_result("""
    SELECT ename AS Name, job Job, sal Salary FROM (SELECT * FROM emp) e, dept d WHERE
        e.deptno = d.deptno AND 
        loc = 'DALLAS';
""")

=== 5 RESULTS ===
Name Job Salary
0: ('SMITH', 'CLERK', 800.0)
1: ('JONES', 'MANAGER', 2975.0)
2: ('SCOTT', 'ANALYST', 3000.0)
3: ('ADAMS', 'CLERK', 1100.0)
4: ('FORD', 'ANALYST', 3000.0)



To find only the employees who are managers or clerks working in Dallas, the query below can be used. The full *emp* and *dept* tables are printed afterwards for comparison.

In [70]:
execute_sql_output_result("""
    SELECT ename, job, sal FROM
        (SELECT * FROM emp WHERE job in ('MANAGER', 'CLERK')) eFilter,
        (SELECT * FROM dept WHERE loc = 'DALLAS') dFilter
        WHERE eFilter.deptno = dFilter.deptno;
""")

execute_sql_output_result("""
    SELECT * FROM emp;
""")

execute_sql_output_result("""
    SELECT * FROM dept;
""")

=== 3 RESULTS ===
ename job sal
0: ('SMITH', 'CLERK', 800.0)
1: ('JONES', 'MANAGER', 2975.0)
2: ('ADAMS', 'CLERK', 1100.0)

=== 14 RESULTS ===
empno ename job mgr hiredate sal comm deptno
0: (7369, 'SMITH', 'CLERK', 7902, datetime.date(1980, 12, 17), 800.0, None, 20)
1: (7499, 'ALLEN', 'SALESMAN', 7698, datetime.date(1981, 2, 20), 1600.0, 300.0, 30)
2: (7521, 'WARD', 'SALESMAN', 7698, datetime.date(1981, 2, 22), 1250.0, 500.0, 30)
3: (7566, 'JONES', 'MANAGER', 7839, datetime.date(1981, 4, 2), 2975.0, None, 20)
4: (7654, 'MARTIN', 'SALESMAN', 7698, datetime.date(1981, 9, 28), 1250.0, 1400.0, 30)
5: (7698, 'BLAKE', 'MANAGER', 7839, datetime.date(1981, 5, 1), 2850.0, None, 30)
6: (7782, 'CLARK', 'MANAGER', 7839, datetime.date(1981, 6, 9), 2450.0, None, 10)
7: (7788, 'SCOTT', 'ANALYST', 7566, datetime.date(1981, 12, 9), 3000.0, None, 20)
8: (7839, 'KING', 'PRESIDENT', None, datetime.date(1981, 11, 17), 5000.0, None, 10)
9: (7844, 'TURNER', 'SALESMAN', 7698, datetime.date(1981, 9, 8), 1500.

## Using different notation for JOINs (preferred)
### INNER JOIN
An `INNER JOIN` returns the same result as the comma-separated joins used previously: only rows with matching values in both tables are returned.

In [71]:
execute_sql_output_result("""
    SELECT * FROM emp INNER JOIN dept ON
        emp.deptno = dept.deptno;
""")

=== 14 RESULTS ===
empno ename job mgr hiredate sal comm deptno deptno dname loc
0: (7782, 'CLARK', 'MANAGER', 7839, datetime.date(1981, 6, 9), 2450.0, None, 10, 10, 'ACCOUNTING', 'NEW YORK')
1: (7839, 'KING', 'PRESIDENT', None, datetime.date(1981, 11, 17), 5000.0, None, 10, 10, 'ACCOUNTING', 'NEW YORK')
2: (7934, 'MILLER', 'CLERK', 7782, datetime.date(1982, 1, 23), 1300.0, None, 10, 10, 'ACCOUNTING', 'NEW YORK')
3: (7369, 'SMITH', 'CLERK', 7902, datetime.date(1980, 12, 17), 800.0, None, 20, 20, 'RESEARCH', 'DALLAS')
4: (7566, 'JONES', 'MANAGER', 7839, datetime.date(1981, 4, 2), 2975.0, None, 20, 20, 'RESEARCH', 'DALLAS')
5: (7788, 'SCOTT', 'ANALYST', 7566, datetime.date(1981, 12, 9), 3000.0, None, 20, 20, 'RESEARCH', 'DALLAS')
6: (7876, 'ADAMS', 'CLERK', 7788, datetime.date(1983, 1, 12), 1100.0, None, 20, 20, 'RESEARCH', 'DALLAS')
7: (7902, 'FORD', 'ANALYST', 7566, datetime.date(1981, 12, 3), 3000.0, None, 20, 20, 'RESEARCH', 'DALLAS')
8: (7499, 'ALLEN', 'SALESMAN', 7698, datetime.dat
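
As a sketch of how the two notations relate, the Dallas query from earlier is written below both ways: once with the comma-separated tables and once with `INNER JOIN ... ON`, where the join condition moves into `ON` and the filter stays in `WHERE`. It assumes an in-memory SQLite database with a few sample rows rather than the notebook's MySQL connection.

```python
import sqlite3

# In-memory SQLite stand-in with a few sample rows (not the notebook's MySQL database).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT);
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, job TEXT, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'ACCOUNTING', 'NEW YORK'), (20, 'RESEARCH', 'DALLAS'),
                            (30, 'SALES', 'CHICAGO'), (40, 'OPERATIONS', 'BOSTON');
    INSERT INTO emp VALUES (7369, 'SMITH', 'CLERK', 20), (7499, 'ALLEN', 'SALESMAN', 30),
                           (7566, 'JONES', 'MANAGER', 20), (7782, 'CLARK', 'MANAGER', 10);
""")

# Comma notation: join condition and filter both live in WHERE.
implicit_rows = con.execute("""
    SELECT ename, job FROM emp e, dept d
    WHERE e.deptno = d.deptno AND d.loc = 'DALLAS'
    ORDER BY empno;
""").fetchall()

# INNER JOIN notation: join condition in ON, filter in WHERE.
explicit_rows = con.execute("""
    SELECT ename, job FROM emp e INNER JOIN dept d ON e.deptno = d.deptno
    WHERE d.loc = 'DALLAS'
    ORDER BY empno;
""").fetchall()
con.close()

print(implicit_rows == explicit_rows, explicit_rows)
```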

### RIGHT/LEFT JOIN
`RIGHT JOIN` returns every row from the table on the right, and `LEFT JOIN` returns every row from the table on the left, whether or not a match exists in the other table. Where there is no match, the columns of the other table are filled with NULL.

**NOTE**: `RIGHT OUTER JOIN` and `LEFT OUTER JOIN` are synonymous with `RIGHT JOIN` and `LEFT JOIN`.

In [72]:
#Right join
print("RIGHT JOIN")
execute_sql_output_result("""
    SELECT * FROM emp RIGHT JOIN dept ON
        emp.deptno = dept.deptno;
""")

#Left join
print("LEFT JOIN")
execute_sql_output_result("""
    SELECT * FROM emp LEFT JOIN dept ON
        emp.deptno = dept.deptno;
""")

RIGHT JOIN
=== 15 RESULTS ===
empno ename job mgr hiredate sal comm deptno deptno dname loc
0: (7782, 'CLARK', 'MANAGER', 7839, datetime.date(1981, 6, 9), 2450.0, None, 10, 10, 'ACCOUNTING', 'NEW YORK')
1: (7839, 'KING', 'PRESIDENT', None, datetime.date(1981, 11, 17), 5000.0, None, 10, 10, 'ACCOUNTING', 'NEW YORK')
2: (7934, 'MILLER', 'CLERK', 7782, datetime.date(1982, 1, 23), 1300.0, None, 10, 10, 'ACCOUNTING', 'NEW YORK')
3: (7369, 'SMITH', 'CLERK', 7902, datetime.date(1980, 12, 17), 800.0, None, 20, 20, 'RESEARCH', 'DALLAS')
4: (7566, 'JONES', 'MANAGER', 7839, datetime.date(1981, 4, 2), 2975.0, None, 20, 20, 'RESEARCH', 'DALLAS')
5: (7788, 'SCOTT', 'ANALYST', 7566, datetime.date(1981, 12, 9), 3000.0, None, 20, 20, 'RESEARCH', 'DALLAS')
6: (7876, 'ADAMS', 'CLERK', 7788, datetime.date(1983, 1, 12), 1100.0, None, 20, 20, 'RESEARCH', 'DALLAS')
7: (7902, 'FORD', 'ANALYST', 7566, datetime.date(1981, 12, 3), 3000.0, None, 20, 20, 'RESEARCH', 'DALLAS')
8: (7499, 'ALLEN', 'SALESMAN', 7698, d
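
The extra row in the `RIGHT JOIN` result comes from a department without employees. A minimal sketch of that effect follows, assuming an in-memory SQLite database with a few sample rows instead of the notebook's MySQL connection. Because older SQLite releases lack `RIGHT JOIN`, the sketch uses `dept LEFT JOIN emp`, which keeps the same rows as `emp RIGHT JOIN dept`.

```python
import sqlite3

# In-memory SQLite stand-in with a few sample rows (not the notebook's MySQL database).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT);
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, job TEXT, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'ACCOUNTING', 'NEW YORK'), (20, 'RESEARCH', 'DALLAS'),
                            (30, 'SALES', 'CHICAGO'), (40, 'OPERATIONS', 'BOSTON');
    INSERT INTO emp VALUES (7369, 'SMITH', 'CLERK', 20), (7499, 'ALLEN', 'SALESMAN', 30),
                           (7566, 'JONES', 'MANAGER', 20), (7782, 'CLARK', 'MANAGER', 10);
""")

# Every department is kept; departments without employees get NULL (None) for ename.
left_rows = con.execute("""
    SELECT d.deptno, d.dname, e.ename
    FROM dept d LEFT JOIN emp e ON e.deptno = d.deptno
    ORDER BY d.deptno, e.empno;
""").fetchall()
con.close()

for row in left_rows:
    print(row)
```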

### FULL OUTER JOIN
A full outer join finds records that may not have a match in the other table ([see the example with Tim](https://stackoverflow.com/questions/4796872/how-to-do-a-full-outer-join-in-mysql)).
* "A full outer join would give us all records from both tables, whether or not they have a match in the other table, with NULLs on both sides where there is no match."

`FULL JOIN` is not implemented in MySQL, but it can be emulated using the code from the first answer in the source above. The call in the next cell is therefore left commented out.

In [73]:
sql_syntax = """
    SELECT * FROM emp FULL OUTER JOIN dept ON
        emp.deptno = dept.deptno;
"""
#execute_sql_output_result(sql_syntax)
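
One common way to emulate a full outer join is to combine a left join with the unmatched rows of the opposite join. The sketch below is not necessarily the exact query from the linked answer. It assumes an in-memory SQLite database rather than the notebook's MySQL connection, and it adds a hypothetical employee `NOBODY` without a department so that both tables have an unmatched row.

```python
import sqlite3

# In-memory SQLite stand-in with a few sample rows (not the notebook's MySQL database).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT);
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, job TEXT, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'ACCOUNTING', 'NEW YORK'), (20, 'RESEARCH', 'DALLAS'),
                            (30, 'SALES', 'CHICAGO'), (40, 'OPERATIONS', 'BOSTON');
    -- NOBODY is a hypothetical employee without a department, added for illustration.
    INSERT INTO emp VALUES (7369, 'SMITH', 'CLERK', 20), (7499, 'ALLEN', 'SALESMAN', 30),
                           (7782, 'CLARK', 'MANAGER', 10), (9999, 'NOBODY', 'CLERK', NULL);
""")

# First half: all employees, matched or not.
# Second half: only the departments that have no employee at all.
full_rows = con.execute("""
    SELECT e.ename, d.dname FROM emp e LEFT JOIN dept d ON e.deptno = d.deptno
    UNION ALL
    SELECT e.ename, d.dname FROM dept d LEFT JOIN emp e ON e.deptno = d.deptno
    WHERE e.empno IS NULL;
""").fetchall()
con.close()

for row in full_rows:
    print(row)
```

In MySQL the second half could equally be written as `emp RIGHT JOIN dept` with the same `IS NULL` filter; the filter keeps matched rows from appearing twice.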

In [79]:
execute_sql_output_result("""
    SELECT * FROM
        (SELECT * FROM emp WHERE job = 'SALESMAN') e RIGHT JOIN dept
        ON
            e.deptno = dept.deptno;
""")

=== 7 RESULTS ===
empno ename job mgr hiredate sal comm deptno deptno dname loc
0: (None, None, None, None, None, None, None, None, 10, 'ACCOUNTING', 'NEW YORK')
1: (None, None, None, None, None, None, None, None, 20, 'RESEARCH', 'DALLAS')
2: (7499, 'ALLEN', 'SALESMAN', 7698, datetime.date(1981, 2, 20), 1600.0, 300.0, 30, 30, 'SALES', 'CHICAGO')
3: (7521, 'WARD', 'SALESMAN', 7698, datetime.date(1981, 2, 22), 1250.0, 500.0, 30, 30, 'SALES', 'CHICAGO')
4: (7654, 'MARTIN', 'SALESMAN', 7698, datetime.date(1981, 9, 28), 1250.0, 1400.0, 30, 30, 'SALES', 'CHICAGO')
5: (7844, 'TURNER', 'SALESMAN', 7698, datetime.date(1981, 9, 8), 1500.0, 0.0, 30, 30, 'SALES', 'CHICAGO')
6: (None, None, None, None, None, None, None, None, 40, 'OPERATIONS', 'BOSTON')



## EXISTS
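`EXISTS` tests whether a subquery returns any rows and can be combined with `NOT` to negate the result. In the uncorrelated form below it is of limited use: the subquery does not depend on the outer row, so the condition is the same for every row and the query returns either all rows or none.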

In [84]:
execute_sql_output_result("""
    SELECT * FROM emp WHERE
        EXISTS (SELECT * FROM dual)
""")

execute_sql_output_result("""
    SELECT * FROM emp WHERE
        NOT EXISTS (SELECT * FROM dual)
""")

=== 14 RESULTS ===
empno ename job mgr hiredate sal comm deptno
0: (7369, 'SMITH', 'CLERK', 7902, datetime.date(1980, 12, 17), 800.0, None, 20)
1: (7499, 'ALLEN', 'SALESMAN', 7698, datetime.date(1981, 2, 20), 1600.0, 300.0, 30)
2: (7521, 'WARD', 'SALESMAN', 7698, datetime.date(1981, 2, 22), 1250.0, 500.0, 30)
3: (7566, 'JONES', 'MANAGER', 7839, datetime.date(1981, 4, 2), 2975.0, None, 20)
4: (7654, 'MARTIN', 'SALESMAN', 7698, datetime.date(1981, 9, 28), 1250.0, 1400.0, 30)
5: (7698, 'BLAKE', 'MANAGER', 7839, datetime.date(1981, 5, 1), 2850.0, None, 30)
6: (7782, 'CLARK', 'MANAGER', 7839, datetime.date(1981, 6, 9), 2450.0, None, 10)
7: (7788, 'SCOTT', 'ANALYST', 7566, datetime.date(1981, 12, 9), 3000.0, None, 20)
8: (7839, 'KING', 'PRESIDENT', None, datetime.date(1981, 11, 17), 5000.0, None, 10)
9: (7844, 'TURNER', 'SALESMAN', 7698, datetime.date(1981, 9, 8), 1500.0, 0.0, 30)
10: (7876, 'ADAMS', 'CLERK', 7788, datetime.date(1983, 1, 12), 1100.0, None, 20)
11: (7900, 'JAMES', 'CLERK', 76

### Better use cases (Correlated Subqueries)
A correlated subquery references its outer query. In the queries below, the subquery uses the outer alias `d` in its own `WHERE` clause.

In [86]:
execute_sql_output_result("""
    SELECT d.* FROM dept d WHERE EXISTS
        (SELECT * FROM emp WHERE d.deptno = emp.deptno)
""")

execute_sql_output_result("""
    SELECT d.* FROM dept d WHERE NOT EXISTS
        (SELECT * FROM emp WHERE d.deptno = emp.deptno)
""")

=== 3 RESULTS ===
deptno dname loc
0: (10, 'ACCOUNTING', 'NEW YORK')
1: (20, 'RESEARCH', 'DALLAS')
2: (30, 'SALES', 'CHICAGO')

=== 1 RESULTS ===
deptno dname loc
0: (40, 'OPERATIONS', 'BOSTON')
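
The `NOT EXISTS` query above can also be written as a `LEFT JOIN` that keeps only the unmatched rows. The sketch below compares the two forms, assuming an in-memory SQLite database with a few sample rows rather than the notebook's MySQL connection.

```python
import sqlite3

# In-memory SQLite stand-in with a few sample rows (not the notebook's MySQL database).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT);
    CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT, job TEXT, deptno INTEGER);
    INSERT INTO dept VALUES (10, 'ACCOUNTING', 'NEW YORK'), (20, 'RESEARCH', 'DALLAS'),
                            (30, 'SALES', 'CHICAGO'), (40, 'OPERATIONS', 'BOSTON');
    INSERT INTO emp VALUES (7369, 'SMITH', 'CLERK', 20), (7499, 'ALLEN', 'SALESMAN', 30),
                           (7566, 'JONES', 'MANAGER', 20), (7782, 'CLARK', 'MANAGER', 10);
""")

# Correlated subquery: departments for which no employee row exists.
not_exists_rows = con.execute("""
    SELECT d.* FROM dept d WHERE NOT EXISTS
        (SELECT * FROM emp WHERE d.deptno = emp.deptno)
    ORDER BY d.deptno;
""").fetchall()

# Same question as a join: keep departments whose joined employee columns are NULL.
anti_join_rows = con.execute("""
    SELECT d.* FROM dept d LEFT JOIN emp e ON e.deptno = d.deptno
    WHERE e.empno IS NULL
    ORDER BY d.deptno;
""").fetchall()
con.close()

print(not_exists_rows, anti_join_rows)
```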

