# The Key operator: `⌸`

> Overemphasis of efficiency leads to an unfortunate circularity in design: for reasons of efficiency early programming languages reflected the characteristics of the early computers, and each generation of computers reflects the needs of the programming languages of the preceding generation. --_Kenneth E. Iverson_

In [40]:
⎕IO ← 0
]box on -s=min -t=tree -f=on
]rows on

The monadic operator Key, `⌸`, groups things. It can, for example, be used to generate histograms, or if you are so inclined, you can think of it as similar to SQL's [GROUP BY](https://en.wikipedia.org/wiki/Group_by_(SQL)) clause. 

Other resources on Key:

* Dyalog [docs](http://help.dyalog.com/18.0/index.htm#Language/Primitive%20Operators/Key.htm)
* APL Orchard [cultivation](https://chat.stackexchange.com/rooms/52405/conversation/lesson-5-even-more-apl-operators--) (also covering _Stencil_)

The function derived by Key is ambivalent (can be called either monadically or dyadically). The operand function can be any dyadic function returning a value. Let's start with the monadic case.

In [54]:
{⍺⍵}⌸'bill' 'bob' 'bill' 'eric' 'bill' 'bob' 'eric' 'sue'

In this case, the operand function is called once for each unique element of the argument array: the element is passed as the left argument, and the vector of indices at which it occurs as the right argument. If we want this to be a histogram, all we need to do is count those indices:

In [42]:
{⍺ (≢⍵)}⌸'bill' 'bob' 'bill' 'eric' 'bill' 'bob' 'eric' 'sue'

In its dyadic form, Key takes each unique element of the left argument as a key, and passes the operand the elements of the right argument found at the corresponding positions. In other words, the following two formulations do the same thing:

In [68]:
names ← 'bill' 'bob' 'bill' 'eric' 'bill' 'bob' 'eric' 'sue'
{⍺⍵}⌸names
names{⍺⍵}⌸⍳≢names
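
A quick way to confirm the equivalence is to compare the two results with Match (`≡`). This is a small sketch reusing the `names` vector defined above; a result of `1` means the two arrays are identical.

```apl
⍝ Monadic Key on names, compared with dyadic Key on names and its indices
({⍺⍵}⌸names) ≡ names{⍺⍵}⌸⍳≢names
```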

Let's look at a slightly more involved example. Here are the Rugby Union Gallagher English Premiership results for January 2021, home fixtures only. The 0-0 games were COVID cancellations.

In [69]:
scores←'London Irish,31-22' 'Wasps,17-49' 'Gloucester,26-31' 'Worcester Warriors,17-21' 'Bristol,48-3' 'Leicester Tigers,15-25' 'Harlequins,27-27' 'Newcastle Falcons,22-10' 'Exeter Chiefs,7-20' 'Northampton Saints,0-0' 'Bath,44-52' 'Sale,20-13' 'London Irish,0-0' 'Leicester Tigers,36-31' 'Wasps,34-5' 'Gloucester,19-22' 'Bristol,29-17' 'Worcester Warriors,0-0'

We'll do some quick and dirty slicing and dicing to separate team names from results.

In [70]:
table←⎕CSV ('-'⎕R','⊢scores)''4
teams←⊣/table
scores←table[;1 2]

To illustrate the similarity with SQL's GROUP BY, we can use dyadic Key to group the scores under each team:

In [72]:
teams{⍺⍵}⌸scores
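
The operand does not have to return each group unchanged; like an SQL aggregate, it can summarise it. As a sketch, and assuming `⎕CSV` has converted both score columns to numbers (the column type `4` used above asks for numeric conversion where possible), the following totals the two score columns for each team:

```apl
⍝ For each team, ⍵ is a matrix holding that team's score rows;
⍝ +⌿ sums down the rows, giving one total per column
teams{⍺(+⌿⍵)}⌸scores
```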

One more example. Given a vector, return the most frequent element, or if there are several, return all of those:

In [38]:
{(⊣/m)/⍨f=f⊃⍨⊃⍒f←⊢/m←,∘≢⌸⍵} 'Mississippi' ⍝ Most frequent

Ouch. 

A lot of stuff there we haven't seen yet, and rather 'golfy'. Let's unpick it, and see how far we get. From the right we first have our _Key_:

In [39]:
,∘≢⌸ 'Mississippi'

That's just a [tacit](tacit.ipynb) formulation equivalent to the histogram we've already seen above. We can write it as the explicit dfn:

In [28]:
{⍺,≢⍵}⌸ 'Mississippi'
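
To see why the two forms agree, recall that a function derived with `∘` and called dyadically applies its right operand to the right argument first: `⍺ ,∘≢ ⍵` is `⍺ , ≢⍵`. A minimal illustration outside of Key, using made-up arguments:

```apl
⍝ Tally the right argument (3 items), then catenate the result to the left argument
'a' ,∘≢ 10 20 30
⍝ The same thing written as a dfn
'a' {⍺,≢⍵} 10 20 30
```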

This is assigned to the variable `m`. We then _right-tack-reduce_ `m`, which picks out its last column (the counts) as a vector, and assign that to `f`:

In [29]:
⊢/,∘≢⌸ 'Mississippi'

With the counts from the last column of `m` held in `f`, we grade them down and pick the first element of the result:

In [30]:
⊃⍒⊢/,∘≢⌸ 'Mississippi'

So now we have the index of the largest element in the vector `f`. Next up is a commuted pick (`⊃⍨`), which takes this index as its right argument and selects the corresponding element of `f`:

In [40]:
f⊃⍨⊃⍒f←⊢/,∘≢⌸ 'Mississippi'
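
Here `⍨` swaps the arguments of pick, so `f⊃⍨i` means `i⊃f`. A small sketch with a made-up vector `v` (remember that `⎕IO` is 0 in this notebook):

```apl
v ← 10 20 30
v⊃⍨1     ⍝ pick with swapped arguments
1⊃v      ⍝ the equivalent without ⍨: the item of v at index 1
```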

This gives us the highest frequency. Now we have to figure out if there are several instances of this frequency:

In [41]:
f=f⊃⍨⊃⍒f←⊢/,∘≢⌸ 'Mississippi'

Yes: the boolean mask shows _two_ elements with the frequency 4. Now we use this mask to compress the _first_ column of our histogram array (extracted by a _left-tack-reduce_), picking the letters that correspond to the 1s:

In [42]:
(⊣/m)/⍨f=f⊃⍨⊃⍒f←⊢/m←,∘≢⌸ 'Mississippi'

If we de-golf that a bit we end up with:

In [36]:
]dinput
MostFrequent ← {
    hist ← {⍺,≢⍵}⌸⍵
    vals ← ⊢/hist
    high ← (⊃⍒vals)⊃vals
    (high=vals)/⊣/hist
}

In [37]:
MostFrequent 'Mississippi'
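
Since we only need the highest count, not its position, the grade-and-pick step could also be replaced by a max-reduction (`⌈/`). The following is one possible variant, not part of the original solution; the name `MostFrequent2` is made up for this sketch.

```apl
⍝ hist: one row per unique element (element, count)
⍝ vals: the last column of hist, i.e. the counts
⍝ result: the elements whose count equals the maximum count
MostFrequent2 ← {hist←{⍺,≢⍵}⌸⍵ ⋄ vals←⊢/hist ⋄ (vals=⌈/vals)/⊣/hist}
MostFrequent2 'Mississippi'
```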

Here are some drills on Key. Try to work out in your head what the answers are before you reveal:

In [2]:
x←'Supercalifragilisticexpialidocious'

{⍺⍵}⌸x
{⍺ (≢⍵)}⌸x
{≢⍵}⌸x
{⍺}⌸x

In [5]:
x←'abcdefghijk'
y←10+11 2⍴⍳22

x{⍺⍵}⌸y
x{⍺ (≢⍵)}⌸y
x{≢⍵}⌸y
x{⍺}⌸y
x{⍺(+/,⍵)}⌸y
x{⍺(⌈/,⍵)}⌸y
x{⍺(⌊/,⍵)}⌸y