# Implement Hash Join

### Introduction

In this lesson, we'll ask you to implement the hash join procedure.

Beyond helping us better absorb the procedure, it's also a pretty good LeetCode-style problem.  Let's get started.

### Working with our data

Let's say that we have the following data representing the orders table and the customers table.

In [6]:
orders = [{'customer_id': 1, 'product': 'phone'}, 
          {'customer_id': 1, 'product': 'tshirt'},
          {'customer_id': 4, 'product': 'camera'},
          {'customer_id': 6, 'product': 'watch'}]

In [5]:
customers = [{'id': 1, 'name': 'sam'},
             {'id': 2, 'name': 'bob'},
             {'id': 4, 'name': 'tina'},
             {'id': 6, 'name': 'clayton'}]

And let's say that someone now performs the following query.

`select * from orders join customers on orders.customer_id = customers.id;`

Implement the hash join.  Remember that this involves two operations: (1) hashing the smaller table, and (2) a sequential scan of the remaining table.
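Before writing the full function, it may help to see the two operations on a tiny example. The sketch below is only an illustration: the `pets` and `owners` tables and the `buckets` name are hypothetical and not part of the lesson's data, and `collections.defaultdict` is just one convenient way to group rows by key.

```python
from collections import defaultdict

# Hypothetical toy tables, used only for this illustration.
pets = [{'owner_id': 10, 'pet': 'cat'},
        {'owner_id': 20, 'pet': 'dog'},
        {'owner_id': 10, 'pet': 'fish'}]
owners = [{'id': 10, 'owner': 'ana'},
          {'id': 20, 'owner': 'raj'}]

# Step 1 (build): hash the smaller table on its join column.
# Each key maps to a list, because several rows can share a key.
buckets = defaultdict(list)
for owner in owners:
    buckets[owner['id']].append(owner)

# Step 2 (probe): scan the other table once, looking up each row's key.
joined = []
for pet in pets:
    for owner in buckets.get(pet['owner_id'], []):
        joined.append({**pet, **owner})

print(dict(buckets))
print(joined)
```

Each table is read only once: the dictionary lookup in step 2 replaces a search through the other table.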

In [52]:
def sort_tables(table_one, col_one, table_two, col_two):
    # Pair each table with its join column so the column stays attached to
    # the right table after sorting. sorted() is stable, so on a tie
    # table_one is treated as the smaller table.
    pairs = [(table_one, col_one), (table_two, col_two)]
    return sorted(pairs, key=lambda pair: len(pair[0]))

def build_hash(smaller_table, col):
    # Build phase: map each join-key value to the list of rows that have it.
    hashed_smaller = {}
    for row in smaller_table:
        key = row.get(col)
        if key is not None:  # a missing key never matches, like NULL in SQL
            hashed_smaller.setdefault(key, []).append(row)
    return hashed_smaller

def merge_rows(hashed_table, larger_table, col):
    # Probe phase: scan the larger table once and look up each row's key.
    returned_rows = []
    for row in larger_table:
        matching_rows = hashed_table.get(row.get(col), [])
        # If both rows share a column name, the larger table's value wins.
        returned_rows.extend({**matching_row, **row} for matching_row in matching_rows)
    return returned_rows

def hash_join(table_one, col_one, table_two, col_two):
    (smaller_table, smaller_col), (larger_table, larger_col) = sort_tables(
        table_one, col_one, table_two, col_two)
    hashed_table = build_hash(smaller_table, smaller_col)
    return merge_rows(hashed_table, larger_table, larger_col)

In [53]:
hash_join(orders, 'customer_id', customers, 'id')

# [{'customer_id': 1, 'product': 'phone', 'id': 1, 'name': 'sam'},
#  {'customer_id': 1, 'product': 'tshirt', 'id': 1, 'name': 'sam'},
#  {'customer_id': 4, 'product': 'camera', 'id': 4, 'name': 'tina'},
#  {'customer_id': 6, 'product': 'watch', 'id': 6, 'name': 'clayton'}]



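One way to sanity-check a hash join is to compare its result with a brute-force nested-loop join, which tests every pair of rows. The sketch below is an assumption-laden illustration rather than part of the exercise: `nested_loop_join` and `as_comparable` are hypothetical helper names, and `as_comparable` assumes that values in the same column have mutually comparable types.

```python
def nested_loop_join(left, left_col, right, right_col):
    """Brute-force join: compare every left row with every right row."""
    joined = []
    for right_row in right:
        for left_row in left:
            if left_row[left_col] == right_row[right_col]:
                joined.append({**left_row, **right_row})
    return joined

def as_comparable(rows):
    # Turn each row into a sorted tuple of (column, value) pairs and sort
    # the rows, so the comparison ignores row order and key order.
    return sorted(tuple(sorted(row.items())) for row in rows)

orders = [{'customer_id': 1, 'product': 'phone'},
          {'customer_id': 1, 'product': 'tshirt'},
          {'customer_id': 4, 'product': 'camera'},
          {'customer_id': 6, 'product': 'watch'}]

customers = [{'id': 1, 'name': 'sam'},
             {'id': 2, 'name': 'bob'},
             {'id': 4, 'name': 'tina'},
             {'id': 6, 'name': 'clayton'}]

expected = nested_loop_join(orders, 'customer_id', customers, 'id')
print(expected)
```

With `hash_join` defined as in the lesson, `as_comparable(hash_join(orders, 'customer_id', customers, 'id')) == as_comparable(expected)` should evaluate to `True`. The nested-loop version does `len(orders) * len(customers)` comparisons, while the hash join reads each table only once.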