# Foreign keys and ticdat

In my opinion the most important thing about foreign keys is checking them for referential integrity and handling the results in a sophisticated way. I think creating data structures that map between tables in Python code is relatively easy to do even without an ORM. So please bear with me while I discuss both topics together.

## one-many - the more common case

Here is a one-many foreign key (which is surely the most common type of foreign key).

In [1]:
from ticdat import TicDatFactory, Slicer
tdf = TicDatFactory(parent_one = [["Name"], ["Data One", "Data Two"]],
                    child = [["Parent One", "Parent Two"], ["Data"]])
tdf.add_foreign_key( "child", "parent_one", ["Parent One", "Name"])

I'm omitting the "Parent Two" table for brevity. The "Child" table obviously has two parents, but the one-many relationship shown here is between "Parent One" and "Child".

The next call turns on a non-default setting that slows things down a little bit on data load, in exchange for easier table cross references.

In [2]:
tdf.enable_foreign_key_links()

I could just as easily load data from any number of data sources (CSV, Excel, PostgreSQL, SQLite, Access, JSON), but I'll just build some right here for demonstration.

In [3]:
dat = tdf.TicDat(parent_one = [[f"p_{i}", i+10, i%3 * (i+1)] for i in range(5)], 
                 child = [[f"p_{i}", f"q_{j}", i*j] for i in range(5) for j in range(2,6)])

In [4]:
dat.parent_one['p_1']

_td:{'Data One': 11, 'Data Two': 2}

This is the link.

In [5]:
dat.parent_one['p_1'].child

{'q_2': _td:{'Data': 2},
 'q_3': _td:{'Data': 3},
 'q_4': _td:{'Data': 4},
 'q_5': _td:{'Data': 5}}

The other technique I like to use is the `Slicer` object.  This first `slice` call will iterate over the entire "Child" table.

In [6]:
child_sliced = Slicer(dat.child)
[(p_q, dat.child[p_q]) for p_q in child_sliced.slice('p_1', '*')]

[(('p_1', 'q_2'), _td:{'Data': 2}),
 (('p_1', 'q_3'), _td:{'Data': 3}),
 (('p_1', 'q_4'), _td:{'Data': 4}),
 (('p_1', 'q_5'), _td:{'Data': 5})]

This second `slice` call is very fast because it only iterates over the subsection of the "Child" table that matches.

In [7]:
[(p_q, dat.child[p_q]) for p_q in child_sliced.slice('p_2', '*')]

[(('p_2', 'q_2'), _td:{'Data': 4}),
 (('p_2', 'q_3'), _td:{'Data': 6}),
 (('p_2', 'q_4'), _td:{'Data': 8}),
 (('p_2', 'q_5'), _td:{'Data': 10})]

`Slicer` is fairly consistent with how the modeling languages work, and is based on the `gurobipy.tuplelist` object, which was Gurobi's original recommendation for how to handle such things.
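To illustrate why the first `slice` call scans the whole table while the second does not, here is a minimal plain-Python sketch of a lazily indexed slicer. `TinySlicer` and its `full_scans` counter are hypothetical names made up for this illustration. This is an assumption about the general technique, not `ticdat`'s actual implementation.

```python
from collections import defaultdict

class TinySlicer:
    """Hypothetical stand-in for a Slicer-like object (not ticdat's code)."""

    def __init__(self, keys):
        self._keys = list(keys)
        self._indexes = {}   # wildcard layout -> {fixed values -> matching keys}
        self.full_scans = 0  # counts passes over every key, for demonstration

    def slice(self, *pattern):
        # Positions that hold a concrete value rather than the '*' wildcard.
        fixed = tuple(i for i, p in enumerate(pattern) if p != '*')
        if fixed not in self._indexes:
            # First call with this wildcard layout: one pass over every key.
            self.full_scans += 1
            index = defaultdict(list)
            for key in self._keys:
                index[tuple(key[i] for i in fixed)].append(key)
            self._indexes[fixed] = dict(index)
        # Later calls with the same layout are a single dictionary lookup.
        wanted = tuple(pattern[i] for i in fixed)
        return self._indexes[fixed].get(wanted, [])

keys = [(f"p_{i}", f"q_{j}") for i in range(5) for j in range(2, 6)]
slicer = TinySlicer(keys)
first = slicer.slice('p_1', '*')   # builds the index for this layout
second = slicer.slice('p_2', '*')  # reuses the index built above
```

After both calls `slicer.full_scans` is still 1, which is the whole point: the cost of the scan is paid once per wildcard layout.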

Of course, integrity checking is the more important thing. You can't just pretend that every child record has a match in the parent table.

In [8]:
tdf.find_foreign_key_failures(dat)

{}

So I remove the 'p_1' record....

In [9]:
dat.parent_one.pop('p_1')

_td:{'Data One': 11, 'Data Two': 2}

And this is how `ticdat` tells you it's missing.

In [10]:
tdf.find_foreign_key_failures(dat, verbosity="Low")

{('child', 'parent_one', ('Parent One', 'Name')): (('p_1',),
  (('p_1', 'q_3'), ('p_1', 'q_4'), ('p_1', 'q_5'), ('p_1', 'q_2')))}
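For readers who want to see what this kind of check amounts to, here is a rough plain-Python sketch that groups orphaned child keys by the parent value they fail to find. The helper name `find_missing_parents` is hypothetical, and the sketch only mirrors the idea. It makes no claim about how `ticdat` computes its result.

```python
from collections import defaultdict

def find_missing_parents(child_keys, parent_keys, position=0):
    """Map each missing parent value to the child keys that reference it."""
    parents = set(parent_keys)
    failures = defaultdict(list)
    for key in child_keys:
        # key[position] is the field of the child key that points at the parent.
        if key[position] not in parents:
            failures[key[position]].append(key)
    return dict(failures)

# Same shape as the data above, with the 'p_1' parent record removed.
parent_keys = [f"p_{i}" for i in range(5) if i != 1]
child_keys = [(f"p_{i}", f"q_{j}") for i in range(5) for j in range(2, 6)]
failures = find_missing_parents(child_keys, parent_keys)
```

Here `failures` has a single entry for `'p_1'`, listing the four child keys that lost their parent.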

## many-many - the less common case

The many-many relationship is of course worth considering too. `ticdat` handles the integrity checking part the same way, but it doesn't yet support `enable_foreign_key_links` for this kind of foreign key.

In [11]:
tdf = TicDatFactory(parent_one = [["Name"], ["Label", "Data"]],
                    child = [["Name"], ["Label", "Data"]])
tdf.add_foreign_key( "child", "parent_one", ["Label", "Label"])

Not going to `enable_foreign_key_links` because it would lead to a "complex foreign key" exception. I can address this if needed.

In [12]:
dat = tdf.TicDat(parent_one = [[f"p_{i}", i%3, i+10] for i in range(5)], 
           child = [[ f"q_{j}", j%3, j+2] for j in range(2,7)])

To cross reference the two tables, I would probably write something like this.

In [13]:
from collections import defaultdict
child_sliced = defaultdict(set)
for key, row in dat.child.items():
    child_sliced[row["Label"]].add(key)
child_sliced = dict(child_sliced)
[(_, dat.child[_]) for _ in child_sliced[2]]

[('q_2', _td:{'Label': 2, 'Data': 4}), ('q_5', _td:{'Label': 2, 'Data': 7})]

You can do something similar for looking up the "Parent One" table based on the value of "Label".

It wouldn't be hard to extend `ticdat` so that many-many foreign keys also get foreign key links. In this case you would create links in both directions, from "Parent One" to "Child" and vice versa. If this were enabled, code like

`dat.parent_one['p_2'].child`

would evaluate to a sub-table of `dat.child` that had "Label"=2 (which is the "Label" for "p_2"). Every other "Parent One" row with "Label"=2 would link to this same sub-table. With the five-row sample data here, "p_2" happens to be the only such row, since the labels are `i%3` for `i` in `range(5)`.
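As a sketch of what such a link might evaluate to, the snippet below rebuilds the two sample tables as plain dictionaries and looks up the child rows that share a parent's "Label". The function `linked_children` is a hypothetical helper for illustration only. Nothing like it is claimed to exist in `ticdat`.

```python
# Plain-dict copies of the sample tables built above (no ticdat needed).
parent_one = {f"p_{i}": {"Label": i % 3, "Data": i + 10} for i in range(5)}
child = {f"q_{j}": {"Label": j % 3, "Data": j + 2} for j in range(2, 7)}

def linked_children(parent_name):
    """Return the sub-table of child rows sharing the parent's "Label"."""
    label = parent_one[parent_name]["Label"]
    return {key: row for key, row in child.items() if row["Label"] == label}

p2_children = linked_children("p_2")
```

With this data `p2_children` holds the 'q_2' and 'q_5' rows, the same records that the `defaultdict` cross reference found for "Label"=2.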

Now, the more important but less sexy integrity checking that everyone likes to pretend is unimportant. For many-many, as for one-many, there is an integrity failure if a child row can't find at least one parent row, but the converse isn't a problem.

In [14]:
tdf.find_foreign_key_failures(dat)

{}

In [15]:
dat.parent_one['p_1']["Label"] = 10101
tdf.find_foreign_key_failures(dat)

{}

In [16]:
dat.child['q_2']["Label"] = 80085
tdf.find_foreign_key_failures(dat, verbosity="Low")

{('child', 'parent_one', ('Label', 'Label')): ((80085,), ('q_2',))}
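The asymmetry in the last three cells can be reproduced with plain dictionaries. This is only a sketch of the rule, using a hypothetical helper `orphaned_children`. It assumes nothing about `ticdat` internals.

```python
def orphaned_children(child, parent, field="Label"):
    """Return {child key: field value} for child rows with no matching parent."""
    parent_values = {row[field] for row in parent.values()}
    return {key: row[field] for key, row in child.items()
            if row[field] not in parent_values}

# Plain-dict copies of the sample tables (no ticdat needed).
parent_one = {f"p_{i}": {"Label": i % 3, "Data": i + 10} for i in range(5)}
child = {f"q_{j}": {"Label": j % 3, "Data": j + 2} for j in range(2, 7)}

clean = orphaned_children(child, parent_one)

# A parent without children is not a failure.
parent_one["p_1"]["Label"] = 10101
after_parent_edit = orphaned_children(child, parent_one)

# A child without any parent is.
child["q_2"]["Label"] = 80085
after_child_edit = orphaned_children(child, parent_one)
```

Only the last edit produces a non-empty result, with 'q_2' reported against the unmatched value 80085.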