Before we start, import a few Python modules:

In [1]:
from kongming import api, hv
from kongming.hv import helpers

d0 = hv.d0()

GOMAXPROCS set to 128 for this invocation.


We also make some preparations: `primitives` is a dict that maps each letter (`a` to `z`) to a random code.

In [2]:
primitives = {
    chr(letter): d0.new_sparse_segmented_from_seed_word(api.MODEL_1M_10BIT, chr(letter))
    for letter in range(ord('a'), ord('z') + 1)
}

str(primitives['a']), str(primitives['k'])

("ss(m=MODEL_1M_10BIT, hash=0x40fa56a18e54f291, 'a', exp=1)",
 "ss(m=MODEL_1M_10BIT, hash=0xabcc3adf6707dbf2, 'k', exp=1)")

# `bundle` and `bind` Operations

`d0` (or, in general, any `hv.Domain` object) provides the convenience functions `bind` and `bundle`.

The first argument to `bundle` is the seed for the bundle operation: different seeds produce different, but all conforming, results.

In [3]:
a, b = primitives['a'], primitives['b']
bundled = d0.bundle(0, a, b)

hv.overlap(a, bundled), hv.overlap(b, bundled)

(507, 518)

As expected, when bundling 2 hypervectors, each overlap is approximately half of the total cardinality (count of ON bits): the original vectors `a` and `b` (with the model `MODEL_1M_10BIT`) each have precisely 1024 ON bits.
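
To build some intuition for the "approximately half" figure, here is a minimal pure-Python sketch. It is not the kongming implementation: it assumes a code is 1024 segments with one ON bit each (stored as a 10-bit offset), and that bundling keeps, for each segment, the ON bit of one randomly chosen operand. The names `random_code`, `overlap`, and `bundle` are hypothetical stand-ins.

```python
import random

SEGMENTS = 1024      # assumption: one ON bit per segment, 1024 ON bits in total
SEGMENT_SIZE = 1024  # assumption: 10-bit offset inside each segment


def random_code(seed_word):
    """Hypothetical stand-in for a seeded sparse segmented code."""
    rng = random.Random(seed_word)
    return [rng.randrange(SEGMENT_SIZE) for _ in range(SEGMENTS)]


def overlap(x, y):
    """Count the segments in which both codes have the same ON bit."""
    return sum(p == q for p, q in zip(x, y))


def bundle(seed, *codes):
    """For each segment, keep the ON bit of one randomly chosen operand."""
    rng = random.Random(seed)
    return [rng.choice(codes)[i] for i in range(SEGMENTS)]


a, b = random_code("a"), random_code("b")
bundled_seed0 = bundle(0, a, b)
bundled_seed1 = bundle(1, a, b)

# Both overlaps should land near 1024 / 2 = 512.
print(overlap(a, bundled_seed0), overlap(b, bundled_seed0))
# A different seed gives a different code with the same statistics.
print(bundled_seed0 != bundled_seed1, overlap(a, bundled_seed1))
```

Under these assumptions the seed only decides which operand wins each segment, which is why every seed yields a valid bundle of the same operands.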

Furthermore, we can try to bundle 3 hypervectors `a`, `b`, and `c`, like this:

In [4]:
a, b, c = primitives['a'], primitives['b'], primitives['c']
bundled3 = d0.bundle(1, a, b, c)

hv.overlap(bundled3, a), hv.overlap(bundled3, b), hv.overlap(bundled3, c)

(356, 321, 349)

Oh, did we mention that the `bundle` operation can have a weight associated with each operand?

In this case, we call `bundle_weighted` and use the helper function `helpers.weights` to pass in a Python list of weights.

In [5]:
a, b, c = primitives['a'], primitives['b'], primitives['c']
bundled_weighted = d0.bundle_weighted(1, helpers.weights([0.2, 0.4, 0.4]), a, b, c)

hv.overlap(bundled_weighted, a), hv.overlap(bundled_weighted, b), hv.overlap(bundled_weighted, c)

(225, 395, 406)

In this case, you can verify that each overlap is roughly proportional to its weight: about $0.2 \times 1024 \approx 205$ for `a`, and about $0.4 \times 1024 \approx 410$ for `b` and `c`.

Another critical operation for hyper-dimensional vectors is `bind`. The bound vector has almost no overlap with the original vectors.
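
The weighted case can be simulated the same way. This is a sketch under assumptions, not the library's algorithm: a code is modeled as 1024 offsets, and the hypothetical `bundle_weighted` below picks the winning operand of each segment with probability equal to its weight.

```python
import random

SEGMENTS = 1024      # assumption: one ON bit per segment
SEGMENT_SIZE = 1024  # assumption: 10-bit offset inside each segment


def random_code(seed_word):
    """Hypothetical stand-in for a seeded sparse segmented code."""
    rng = random.Random(seed_word)
    return [rng.randrange(SEGMENT_SIZE) for _ in range(SEGMENTS)]


def overlap(x, y):
    """Count the segments in which both codes have the same ON bit."""
    return sum(p == q for p, q in zip(x, y))


def bundle_weighted(seed, weights, *codes):
    """Pick the source operand of every segment according to the weights."""
    rng = random.Random(seed)
    picks = rng.choices(range(len(codes)), weights=weights, k=SEGMENTS)
    return [codes[j][i] for i, j in enumerate(picks)]


a, b, c = (random_code(word) for word in "abc")
weights = [0.2, 0.4, 0.4]
weighted = bundle_weighted(1, weights, a, b, c)

overlaps = [overlap(weighted, x) for x in (a, b, c)]
expected = [round(w * SEGMENTS) for w in weights]  # [205, 410, 410]
print(overlaps, expected)
```

The measured overlaps scatter around the expected values by a few tens of bits, which is the same order of deviation seen in the notebook output.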

In [6]:
bound = d0.bind(a, b)

hv.overlap(bound, a), hv.overlap(bound, b)

(1, 0)
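
Why is the overlap so close to zero? One common way to bind sparse block codes is to add the offsets of each segment modulo the segment size. The sketch below assumes that scheme (the source does not say how kongming implements `bind`), and the helper names are hypothetical. It also shows the inverse operation, which subtracts the offsets again.

```python
import random

SEGMENTS = 1024      # assumption: one ON bit per segment
SEGMENT_SIZE = 1024  # assumption: 10-bit offset inside each segment


def random_code(seed_word):
    """Hypothetical stand-in for a seeded sparse segmented code."""
    rng = random.Random(seed_word)
    return [rng.randrange(SEGMENT_SIZE) for _ in range(SEGMENTS)]


def overlap(x, y):
    """Count the segments in which both codes have the same ON bit."""
    return sum(p == q for p, q in zip(x, y))


def bind(*codes):
    """Assumed binding: add the offsets of each segment, modulo its size."""
    return [sum(offsets) % SEGMENT_SIZE for offsets in zip(*codes)]


def release(code, key):
    """Inverse of the assumed binding: subtract the key's offsets."""
    return [(p - q) % SEGMENT_SIZE for p, q in zip(code, key)]


a, b = random_code("a"), random_code("b")
bound = bind(a, b)

# A segment of `bound` matches `a` only where b's offset is 0,
# which happens with probability 1/1024 per segment.
print(overlap(bound, a), overlap(bound, b))
# Releasing one operand recovers the other exactly.
print(release(bound, b) == a)
```

With this scheme about one segment in 1024 matches by chance, so an overlap of 0 or 1 out of 1024 is what we would expect.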

# Composite structures

This section will introduce composite structures, such as sequences and sets.

## Sets

In [7]:
a, b, c = primitives['a'], primitives['b'], primitives['c']

set0 = d0.new_set(primitives['a'], primitives['b'], primitives['c'])

helpers.to_native_hbp(set0.proto_load())

hint: SET
model: MODEL_1M_10BIT
set {
  contains {
    members {
      sparse_segmented {
        seed_word: "a"
      }
    }
    members {
      sparse_segmented {
        seed_word: "b"
      }
    }
    members {
      sparse_segmented {
        seed_word: "c"
      }
    }
  }
}

Note that at this point the composition has NOT happened: `set0` only records the fact that a set of `a`, `b`, and `c` has been formed. As a result, `Set.core()` returns an identity vector.

In [8]:
set0.composition_seed(), str(set0.core())

(0, 'ss(m=MODEL_1M_10BIT, IDENTITY)')

Use `Set.compose`, with a seed, to obtain a new composed object.

In [9]:
set0_composed = set0.compose(100)

helpers.to_native_hbp(set0_composed.proto_load())

hint: SET
model: MODEL_1M_10BIT
stable_hash: 7258081846772163338
set {
  seed: 100
  contains {
    members {
      sparse_segmented {
        seed_word: "a"
      }
    }
    members {
      sparse_segmented {
        seed_word: "b"
      }
    }
    members {
      sparse_segmented {
        seed_word: "c"
      }
    }
  }
}

This is a bit subtle, but you can see that `set0_composed` has the `seed` field set (while `set0` doesn't), which indicates that this instance has been composed.

Furthermore, the underlying hypervector can be examined.

In [10]:
helpers.to_native_hbp(set0_composed.core().proto_load())

model: MODEL_1M_10BIT
stable_hash: 7258081846772163338
sparse_segmented {
  offsets: "7\355\200<\003\031\330\276\336\013\263{X\037\252{\252\240@\242@\020\001!2MeW.\332\314x3b\327\270m\025\346\273\305\251\023\343\236\037\354\216\240\ne\3306\003\223\264\331\364\257\032\236q\206nL\233;Z6Kng>\035&h\333\177\033\017\266\265\037:\235\\\035\317\341\236\332]\016\216\322A\210P\316\347\003\207\224:\212\325\214~\346\355 0d> ;\374+\243>\266\030\334\372\216M\277\3645\344\343\024\373\317j.\244*\245\031\023\221\340Q\340N\216\315\310\275\033_\200LO\337\217\213\330\302T\264\260b\3715\337\201\3327\253\260B;\024\244C?Rd\001@I\'\245\220g\327\346\270\225t/\342GK\027&%\323\363\276\230\231\014^\317@3\336\220\215\337\273\371\306\322\354\340\230\232\357IT\346,\017 \013\275\227\327x\212\314\373;\327_x\220\377\224\224{:\315f \357\330\236r\235\035\203[\214z:$\002\351z\226@m^\303\304\256X\346\262\330\361\025\334\310\346\254\023d\201\356R\3275,\037\356\202\222;\304\354`aQ\334\334\024+GGI\325tb\323E\023\027|\240\232\

Once composed, each constituent should have significant overlap with the set itself.

There is one subtlety, however. The set code is $S = C_{set} \otimes (\sum_{\oplus, i} C_i)$, where $\otimes$ denotes `bind` and $\sum_{\oplus}$ denotes `bundle`, so we need to release the set marker $C_{set}$ from the set code $S$ before measuring the overlap.

In [11]:
a, b, c = primitives['a'], primitives['b'], primitives['c']

marker = d0.new_sparse_segmented_prewired(api.MODEL_1M_10BIT, api.SET_MARKER)
combined = d0.release(set0_composed, marker)
hv.overlap(combined, a), hv.overlap(combined, b), hv.overlap(combined, c)

(354, 335, 337)
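
The formula $S = C_{set} \otimes (\sum_{\oplus, i} C_i)$ can be played through in a small pure-Python sketch. Everything here is an assumption made for illustration: codes are 1024 offsets, `bind` adds offsets modulo the segment size, `bundle` picks one operand per segment, and the marker is just another random code rather than the library's prewired `SET_MARKER`.

```python
import random

SEGMENTS = 1024      # assumption: one ON bit per segment
SEGMENT_SIZE = 1024  # assumption: 10-bit offset inside each segment


def random_code(seed_word):
    """Hypothetical stand-in for a seeded sparse segmented code."""
    rng = random.Random(seed_word)
    return [rng.randrange(SEGMENT_SIZE) for _ in range(SEGMENTS)]


def overlap(x, y):
    return sum(p == q for p, q in zip(x, y))


def bundle(seed, *codes):
    rng = random.Random(seed)
    return [rng.choice(codes)[i] for i in range(SEGMENTS)]


def bind(*codes):
    return [sum(offsets) % SEGMENT_SIZE for offsets in zip(*codes)]


def release(code, key):
    return [(p - q) % SEGMENT_SIZE for p, q in zip(code, key)]


marker = random_code("hypothetical-set-marker")
members = [random_code(word) for word in "abc"]

# S = C_set (bind) bundle(C_a, C_b, C_c)
set_code = bind(marker, bundle(100, *members))

# Without releasing the marker, the members are invisible.
print([overlap(set_code, m) for m in members])
# After releasing it, each member shows up with about 1024 / 3 overlap.
combined = release(set_code, marker)
print([overlap(combined, m) for m in members])
```

The marker acts like a key: until it is released, the set code looks unrelated to its own members.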

## Sequences

A sequence is a collection of codes with enforced ordering.

Let's proceed with some examples.

In [12]:
first = d0.new_sequence(
    primitives['f'], primitives['i'], primitives['r'], primitives['s'], primitives['t'])

helpers.to_native_hbp(first.proto_load())

hint: SEQUENCE
model: MODEL_1M_10BIT
sequence {
  contains {
    members {
      sparse_segmented {
        seed_word: "f"
      }
    }
    members {
      sparse_segmented {
        seed_word: "i"
      }
    }
    members {
      sparse_segmented {
        seed_word: "r"
      }
    }
    members {
      sparse_segmented {
        seed_word: "s"
      }
    }
    members {
      sparse_segmented {
        seed_word: "t"
      }
    }
  }
}

On close examination, you can see this is a `SEQUENCE` object with no seed: it's a non-composed sequence that only records the compositional structure and constituents.

A composed sequence can be constructed by an additional call to `Sequence.compose`, with a seed. Compositions with different seeds are all conforming and valid.

In [13]:
first_composed = first.compose(5)

helpers.to_native_hbp(first_composed.proto_load())

hint: SEQUENCE
model: MODEL_1M_10BIT
stable_hash: 16008683035129884237
sequence {
  seed: 5
  contains {
    members {
      sparse_segmented {
        seed_word: "f"
      }
    }
    members {
      sparse_segmented {
        seed_word: "i"
      }
    }
    members {
      sparse_segmented {
        seed_word: "r"
      }
    }
    members {
      sparse_segmented {
        seed_word: "s"
      }
    }
    members {
      sparse_segmented {
        seed_word: "t"
      }
    }
  }
}

The underlying hypervector can also be examined. Note that the underlying hypervector loses the compositional structure: all constituents are "merged" and no longer individually visible.

In [14]:
helpers.to_native_hbp(first_composed.core().proto_load())

model: MODEL_1M_10BIT
stable_hash: 16008683035129884237
sparse_segmented {
  offsets: "r2k\214\234\n^\333\030\0206\300$t4x\247\227`\2459h\\\027\312\341\330\317\271\205*\247\330\335\332\371\317\300\266(\010\264`\211:\344vRg\301y\304\337m\031x(\357V\033\"\210F\271+u\204\245_\'\356\340\014\215\304\025\363\377^B\242\224\357B\255F\2513\206jeZ\344j\350\332\211^V\246\371W\372\362\233-f\031\344\243\374\256k\200+2m\356t8\350\251cb \346\214Ek\244\306\036\230\376\2073\330\016\212\t0\315\035\231\205\275G\006\316d~BU\300\304\267T\346\352Q\n\241r\002\325B\351!\302l\345.DT\271j\325FB\261\243\005#g\227\004l\356^&\260\272\260WV\336\n\305\377\245i6\264\274\026\323Y\r\355\254\343\305,\364\353.\256\217\027\022\006\006L\\\221\335\rR\253k\3363\023\201P\335\tj\346\366\351\360\354G\240\001\310m\007\203Rq\276\342v\033C\033{\345#\315%i\362\225\206.\261\236\212\377\216\246zg\320\256\342E\341\221\254z\353\360\202u\315WJ\021\337\030\264\221\005_c\343\032\000&\202\2277\217u\272\356v\367\246\014`\316\3313vF\343!\270

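Each member should have significant overlap with the resulting sequence code.

Again, the sequence code is $S = C_{seq} \otimes (\sum_{\oplus, i} C_i \otimes C^{i}_{step})$, with $i$ starting at 0 for the first member. We therefore need to release the overall sequence marker $C_{seq}$ from the sequence code $S$ first, and account for the positional marker $C^{i}_{step}$ of each member. In the code below we do the latter by binding each member with the matching power of the step code before comparing.

The overlap should be approximately $1/5$ of the 1024 ON bits (about 205), since we have 5 members in the sequence.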

In [15]:
marker = d0.new_sparse_segmented_prewired(api.MODEL_1M_10BIT, api.SEQUENCE_MARKER)
step = d0.new_sparse_segmented_prewired(api.MODEL_1M_10BIT, api.STEP)
stripped = d0.release(first_composed, marker)

(hv.overlap(stripped, primitives['f']),
 hv.overlap(stripped, d0.bind(primitives['i'], step)),
 hv.overlap(stripped, d0.bind(primitives['r'], step.power(2))),
 hv.overlap(stripped, d0.bind(primitives['s'], step.power(3))))

(230, 190, 190, 186)
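
The role of the positional markers can be illustrated with the same kind of toy model. This sketch is built on assumptions, not on the library's internals: `bind` adds offsets modulo the segment size, so the $i$-th power of the step code is the step's offsets multiplied by $i$, and the marker and step codes are ordinary random codes standing in for the prewired `SEQUENCE_MARKER` and `STEP`.

```python
import random

SEGMENTS = 1024      # assumption: one ON bit per segment
SEGMENT_SIZE = 1024  # assumption: 10-bit offset inside each segment


def random_code(seed_word):
    """Hypothetical stand-in for a seeded sparse segmented code."""
    rng = random.Random(seed_word)
    return [rng.randrange(SEGMENT_SIZE) for _ in range(SEGMENTS)]


def overlap(x, y):
    return sum(p == q for p, q in zip(x, y))


def bundle(seed, *codes):
    rng = random.Random(seed)
    return [rng.choice(codes)[i] for i in range(SEGMENTS)]


def bind(*codes):
    return [sum(offsets) % SEGMENT_SIZE for offsets in zip(*codes)]


def release(code, key):
    return [(p - q) % SEGMENT_SIZE for p, q in zip(code, key)]


def power(code, n):
    """Bind a code with itself n times; power 0 is the identity (all zeros)."""
    return [(p * n) % SEGMENT_SIZE for p in code]


marker = random_code("hypothetical-sequence-marker")
step = random_code("hypothetical-step")
members = [random_code(letter) for letter in "first"]

# Tag member i with step^i, bundle the tagged members, then bind the marker.
positioned = [bind(m, power(step, i)) for i, m in enumerate(members)]
sequence_code = bind(marker, bundle(5, *positioned))

stripped = release(sequence_code, marker)
overlaps = [overlap(stripped, p) for p in positioned]
print(overlaps)  # each value should be near 1024 / 5, about 205
```

Comparing against the tagged member, as above, is equivalent to releasing `power(step, i)` from `stripped` and comparing against the bare member.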

## Necklaces and knots

We can even record the compositional structure of the `bundle` and `bind` operations themselves.

In [16]:
knot = d0.new_knot(primitives['a'], primitives['b'], primitives['c'])

helpers.to_native_hbp(knot.proto_load())

hint: KNOT
model: MODEL_1M_10BIT
stable_hash: 12572121051728929346
knot {
  parts {
    sparse_segmented {
      seed_word: "a"
    }
  }
  parts {
    sparse_segmented {
      seed_word: "b"
    }
  }
  parts {
    sparse_segmented {
      seed_word: "c"
    }
  }
}

`hv.Knot` faithfully records the `bind` operation (and its operands).

The underlying value of `knot` is identical to the result we would get by binding the individual codes together.
There is no notion of composed versus non-composed here, since no seed is needed.

In [17]:
bound = d0.bind(primitives['a'], primitives['b'], primitives['c'])

hv.equal(bound, knot), hv.equal(bound, knot.core())

(True, True)

`hv.Necklace` faithfully records the `bundle` operation and its operands. 

In [18]:
necklace = d0.new_necklace(10, primitives['a'], primitives['b'], primitives['c'], primitives['d'])

helpers.to_native_hbp(necklace.proto_load())

hint: NECKLACE
model: MODEL_1M_10BIT
stable_hash: 14808825802167636548
necklace {
  seed: 10
  pearls {
    members {
      sparse_segmented {
        seed_word: "a"
      }
    }
    members {
      sparse_segmented {
        seed_word: "b"
      }
    }
    members {
      sparse_segmented {
        seed_word: "c"
      }
    }
    members {
      sparse_segmented {
        seed_word: "d"
      }
    }
  }
}

In [19]:
bundled = d0.bundle(10, primitives['a'], primitives['b'], primitives['c'], primitives['d'])

hv.equal(bundled, necklace), hv.equal(bundled, necklace.core())

(True, True)

Unlike compositional structures such as sets and sequences, `knot` and `necklace` are always composed, since the seed (if any) is always supplied during construction.

For adventurous readers, `hv.Domain` also has `new_weighted_necklace()`, which takes a `seed`, normalized weights, and a list of operands. The resulting `hv.Necklace` instance will faithfully record a weighted bundling operation.

# Online learners

`hv.Learner` is the online learner, as described in the arXiv paper.

In [20]:
l = d0.new_learner(api.MODEL_1M_10BIT, 50)

str(l.core())

'ss(m=MODEL_1M_10BIT, IDENTITY)'

The underlying code (which can be retrieved via `.core()`) for a brand-new learner is the identity vector. It is replaced completely by the first incoming code.

In [21]:
l.on_observation(primitives['a'])

str(l.core()), hv.equal(l.core(), primitives['a'])

('ss(m=MODEL_1M_10BIT, hash=0x40fa56a18e54f291)', True)

In [22]:
a, b, c = primitives['a'], primitives['b'], primitives['c']
l.on_observation(b)
l.on_observation(c)

averaged = l.core()
hv.overlap(averaged, a), hv.overlap(averaged, b), hv.overlap(averaged, c)

(342, 353, 331)

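After observing `a`, `b`, and `c` once each, this running-average learner holds a code that overlaps roughly 1/3 with each of them.

If the incoming data stream contains another occurrence of `a`, then overall `a` has appeared 2 out of 4 times, while `b` and `c` have each appeared once, so we expect overlaps of roughly 1/2, 1/4, and 1/4 of the 1024 ON bits.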

In [23]:
a = primitives['a']
l.on_observation(a)

averaged = l.core()
hv.overlap(averaged, a), hv.overlap(averaged, b), hv.overlap(averaged, c)

(504, 264, 258)
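
The 1/3 and 1/2, 1/4, 1/4 proportions follow from a simple running-average rule, sketched below in pure Python. This is an assumed mechanism for illustration, not necessarily what `hv.Learner` does: on the $n$-th observation, each segment of the stored code is overwritten by the incoming code with probability $1/n$. The class name and its seed argument are hypothetical.

```python
import random

SEGMENTS = 1024      # assumption: one ON bit per segment
SEGMENT_SIZE = 1024  # assumption: 10-bit offset inside each segment


def random_code(seed_word):
    """Hypothetical stand-in for a seeded sparse segmented code."""
    rng = random.Random(seed_word)
    return [rng.randrange(SEGMENT_SIZE) for _ in range(SEGMENTS)]


def overlap(x, y):
    return sum(p == q for p, q in zip(x, y))


class RunningAverageLearner:
    """Toy learner: the n-th observation overwrites each segment with prob. 1/n."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.count = 0
        self.code = None  # stands in for the identity vector

    def on_observation(self, code):
        self.count += 1
        if self.code is None:
            self.code = list(code)  # first observation replaces everything
            return
        p = 1.0 / self.count
        self.code = [new if self.rng.random() < p else old
                     for old, new in zip(self.code, code)]


a, b, c = (random_code(word) for word in "abc")
learner = RunningAverageLearner(50)
for observed in (a, b, c):
    learner.on_observation(observed)
after_three = [overlap(learner.code, x) for x in (a, b, c)]

learner.on_observation(a)
after_four = [overlap(learner.code, x) for x in (a, b, c)]
print(after_three, after_four)
```

With this rule every past observation ends up with equal weight, so after four observations `a` holds 2/4 of the segments while `b` and `c` hold 1/4 each.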