# Relational Algebra (joins)

There are a few different kinds of joins in relational algebra:
* Inner
* (Left / Right / Full) Outer
* Cross Join
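
As a rough illustration of how these join types differ, the sketch below merges two tiny made-up frames and counts the rows each join produces. The names `left_df`, `right_df`, and `row_counts`, and the data itself, are hypothetical and not part of the course files.

```python
import pandas as pd

# Hypothetical toy data: key 'b' is in both frames, 'a' and 'c' are in only one.
left_df = pd.DataFrame({'key': ['a', 'b'], 'left_val': [1, 2]})
right_df = pd.DataFrame({'key': ['b', 'c'], 'right_val': [3, 4]})

# Row counts for each keyed join on the shared 'key' column.
row_counts = {
    how: len(left_df.merge(right_df, how=how, on='key'))
    for how in ['inner', 'left', 'right', 'outer']
}

# A cross join takes no key: every left row is paired with every right row.
# how='cross' needs pandas 1.2 or later.
row_counts['cross'] = len(left_df.merge(right_df, how='cross'))

print(row_counts)
```

The inner join keeps only the shared key, the outer join keeps all three keys, and the cross join produces 2 x 2 rows regardless of the key values.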

In [None]:
import pandas as pd

In [None]:
employees = pd.read_csv('https://hds5210-data.s3.amazonaws.com/employees.csv')

In [None]:
employees

In [None]:
departments = pd.read_csv('https://hds5210-data.s3.amazonaws.com/departments.csv')

In [None]:
departments

## Join Departments and Employees

An outer join keeps every `Department` value that shows up in either set of data. Where one side has no matching row, its columns are filled with `NaN`.

In [None]:
total = departments.merge(employees, how='outer', left_on='Department', right_on='Department')

In [None]:
total
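
One way to see which side each row of an outer join came from is the `indicator=True` option of `merge()`. This is a minimal sketch on made-up data: the frames `depts` and `emps` are hypothetical stand-ins for the two CSV files, not their actual contents.

```python
import pandas as pd

# Hypothetical stand-ins for the departments and employees files.
depts = pd.DataFrame({'Department': ['IT', 'Facilities'], 'Budget': [100, 50]})
emps = pd.DataFrame({'ID': [1, 2, 3], 'Department': ['IT', 'IT', 'None']})

# indicator=True adds a '_merge' column: 'both', 'left_only', or 'right_only'.
checked = depts.merge(emps, how='outer', on='Department', indicator=True)

# Departments with no employees, and employee departments missing from depts.
unstaffed = checked.loc[checked['_merge'] == 'left_only', 'Department'].tolist()
unlisted = checked.loc[checked['_merge'] == 'right_only', 'Department'].tolist()

print(unstaffed, unlisted)
```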

In [None]:
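# Select several columns with a list (double brackets); the older tuple form
# ['Budget','ID'] raises an error in current pandas releases.
by_dept = total.groupby('Department')[['Budget','ID']].agg({'ID':'count','Budget':'last'})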

In [None]:
by_dept

In [None]:
by_dept['PerPerson'] = by_dept['Budget'] / by_dept['ID']

In [None]:
by_dept
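
A department with no employees gets a count of 0 after the outer join, so dividing the budget by that count gives `inf` rather than a meaningful number. The sketch below shows one possible way to handle that, assuming a summary table shaped like `by_dept`. The frame `summary` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical summary table shaped like by_dept: one row per department.
summary = pd.DataFrame(
    {'ID': [2, 0], 'Budget': [100.0, 50.0]},
    index=pd.Index(['IT', 'Facilities'], name='Department'),
)

# Replace zero headcounts with NaN so the division yields NaN instead of inf.
headcount = summary['ID'].where(summary['ID'] > 0)
summary['PerPerson'] = summary['Budget'] / headcount

print(summary)
```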

## Different than Left Outer Join

With `left` and `right` joins, the words refer to where each data frame sits relative to `merge()`: the frame that calls `merge()` is the left one, and the frame passed as the argument is the right one.  In the examples below, `departments` is on the left and `employees` is on the right.

Note that there is no `None` department in the departments file!  A left join keeps only the departments found in the left frame, so `None` doesn't show up in this version of the join.

In [None]:
departments.merge(employees, how='left')

Note that there is no one in the `Facilities` department, so it won't show up in a right join.

In [None]:
departments.merge(employees, how='right')

In [None]:
departments.merge(employees, how='inner')
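
To make the left/right wording concrete, the sketch below checks that swapping the two frames and swapping `left` for `right` returns the same rows, and that an inner join keeps only departments present on both sides. The frames `depts` and `emps` and the helper `normalized` are hypothetical, made up for this illustration.

```python
import pandas as pd

# Hypothetical toy frames, not the course data.
depts = pd.DataFrame({'Department': ['IT', 'Facilities'], 'Budget': [100, 50]})
emps = pd.DataFrame({'ID': [1, 2], 'Department': ['IT', 'None']})

left_join = depts.merge(emps, how='left')    # keeps every row of depts
right_join = emps.merge(depts, how='right')  # depts is now the right-hand frame

def normalized(df, columns):
    # Put columns and rows in a common order so the two results can be compared.
    return df[columns].sort_values('Department').reset_index(drop=True)

cols = list(left_join.columns)
same_rows = normalized(left_join, cols).equals(normalized(right_join, cols))

# The inner join keeps only departments that appear in both frames.
inner_departments = depts.merge(emps, how='inner')['Department'].tolist()

print(same_rows, inner_departments)
```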

# Recursion demonstrated

In [None]:
def reverse(s):
    print("I was called with '{}'".format(s))
    if len(s) <= 1:
        print(" Returning just {}".format(s))
        return s
    else:
        print(" Concatenate '{}' with reverse('{}')".format(s[-1],s[0:-1]))
        return s[-1] + reverse(s[0:-1])

In [None]:
reverse('hello')

In [None]:
reverse('h')
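
For comparison, here is a quieter sketch of the same recursion without the tracing `print` calls, checked against Python's slice reversal `s[::-1]`. The name `reverse_quiet` is hypothetical.

```python
def reverse_quiet(s):
    # Base case: an empty or one-character string is its own reverse.
    if len(s) <= 1:
        return s
    # Recursive case: the last character, then the reverse of everything before it.
    return s[-1] + reverse_quiet(s[:-1])

# The recursive result should match slice-based reversal.
for word in ['', 'h', 'hello']:
    assert reverse_quiet(word) == word[::-1]

print(reverse_quiet('hello'))
```

Each call adds a stack frame, so very long strings can hit Python's recursion limit. The slice form does not have that constraint.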

# Getting the Supervisor

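We can also join a data frame back to itself (a self-join). Here, each employee's `SupervisorID` is matched to the row of the employee who has that `ID`, which brings in the supervisor's name and title.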

In [None]:
employees[['ID','Name','Title']].rename(index=str,
      columns={'ID': 'SupervisorID', 'Name': 'SupervisorName', 'Title':'SupervisorTitle'})

In [None]:
supervisors=employees[['ID','Name','Title']].rename(
    index=str,
    columns={'ID': 'SupervisorID', 'Name': 'SupervisorName', 'Title':'SupervisorTitle'})

reports = employees.merge(
    supervisors,
    how='left',
    left_on='SupervisorID',
    right_on='SupervisorID')

reports

In [None]:
reports.groupby('SupervisorName')['ID'].count()
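
An alternative to renaming the columns first is to join on differently named keys and let `suffixes` label the supervisor's columns. This is a minimal sketch on a made-up org chart: the frame `staff`, its names, and the `_sup` suffix are assumptions for illustration.

```python
import pandas as pd

# Hypothetical org chart using the same column names as the employees file.
staff = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Ana', 'Ben', 'Cy'],
    'SupervisorID': [float('nan'), 1, 1],  # Ana has no supervisor
})

# Match each employee's SupervisorID (left) to a supervisor's ID (right).
# suffixes=('', '_sup') leaves employee columns as-is and tags supervisor columns.
with_boss = staff.merge(
    staff[['ID', 'Name']],
    how='left',
    left_on='SupervisorID',
    right_on='ID',
    suffixes=('', '_sup'),
)

print(with_boss[['Name', 'Name_sup']])
```

Because this is a left join, the employee with no supervisor stays in the result with a missing `Name_sup`.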

## Recursion

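The self-join above only pairs each employee with their direct supervisor. To collect everyone below a given supervisor, including reports of reports, we can apply the same lookup recursively.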

In [None]:
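def get_all_reports(df, supervisor_id, level=1):
    # Direct reports: rows whose SupervisorID matches the requested supervisor
    direct = df[df['SupervisorID'] == supervisor_id]
    # Record how many levels below the starting supervisor these rows are
    direct = direct.assign(Level=level)

    if len(direct) == 0:
        # Base case: nobody reports to this person
        return direct
    else:
        # Recursive case: append each direct report's own reports, one level deeper
        subs = direct['ID']
        for s in subs:
            direct=pd.concat([direct,get_all_reports(df, s, level+1)])
        return direct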


In [None]:
get_all_reports(employees, 18374)

In [None]:
get_all_reports(employees, 8232)
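
The same traversal can also be written as a loop that walks the hierarchy one level at a time. The sketch below is a hypothetical variant (`get_all_reports_iterative` is not part of the original notebook) that additionally tracks visited IDs, so a cycle in the data cannot cause endless looping. It assumes the same `ID` and `SupervisorID` columns and runs on a made-up org chart.

```python
import pandas as pd

def get_all_reports_iterative(df, supervisor_id):
    """Hypothetical loop-based variant: collect reports level by level."""
    levels = []
    seen = {supervisor_id}        # IDs already visited; guards against cycles
    frontier = [supervisor_id]    # supervisors whose reports we look up next
    level = 1
    while frontier:
        direct = df[df['SupervisorID'].isin(frontier) & ~df['ID'].isin(seen)]
        if direct.empty:
            break
        levels.append(direct.assign(Level=level))
        frontier = direct['ID'].tolist()
        seen.update(frontier)
        level += 1
    if not levels:
        # Nobody reports to this person: return an empty frame with a Level column.
        return df.iloc[0:0].assign(Level=level)
    return pd.concat(levels)

# Made-up org chart: 1 supervises 2 and 3; 2 supervises 4.
staff = pd.DataFrame({'ID': [1, 2, 3, 4], 'SupervisorID': [0, 1, 1, 2]})
result = get_all_reports_iterative(staff, 1)
print(result)
```

Rows come back grouped by level here, whereas the recursive version lists each person's subtree right after the direct reports, so the row order of the two results can differ.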

# Cross Join or Cartesian Product

The idea here is to create all possible combinations of rows from the two data frames.  There is no **key** to join on per se.

In [None]:
genders = ['M','F','O','U']
age_ranges = ['0-18', '19-64', '65-84', '85+']

index = pd.MultiIndex.from_product([genders, age_ranges], names = ["gender", "age_range"])

combinations = pd.DataFrame(index = index).reset_index()

In [None]:
combinations
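
When the two lists already live in data frames, `merge()` can build the same Cartesian product directly with `how='cross'`, assuming pandas 1.2 or later. The frame names `gender_df`, `age_df`, and `combos` below are hypothetical.

```python
import pandas as pd

# Same values as above, but held in two single-column data frames.
gender_df = pd.DataFrame({'gender': ['M', 'F', 'O', 'U']})
age_df = pd.DataFrame({'age_range': ['0-18', '19-64', '65-84', '85+']})

# how='cross' pairs every row of gender_df with every row of age_df; no key is given.
combos = gender_df.merge(age_df, how='cross')

print(combos.shape)
```

With 4 genders and 4 age ranges, the result has 4 x 4 = 16 rows and one column from each frame.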