# Introduction to hyper-dimensional computing with the `kongming` package

This notebook, along with a few follow-up ones, illustrates my work on hyper-dimensional computing (HDC).

First we need to import the relevant Python packages and modules.

In [25]:
from kongming import api, hv

from kongming.hv import helpers

## The `kongming.api` package

`api` is a collection of code auto-generated from cross-language protocol buffer definitions.
For now, you just need to know a few constants:

* `api.MODEL_64K_8BIT`=1: used for hyper-vectors, where N=65536, and sparsity=1/256 (8-bit depth);
* `api.MODEL_1M_10BIT`=2: used for hyper-vectors, where N=1M, and sparsity=1/1024 (10-bit depth);
* `api.MODEL_16M_12BIT`=3: used for hyper-vectors, where N=16M, and sparsity=1/4096 (12-bit depth).

Later on you can either use these constants, or just the numeric value.
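The arithmetic behind the three models can be checked with a few lines of plain Python. This is a minimal sketch that does not use `kongming`: the `MODELS` table and the `on_bits` helper are hypothetical names introduced here, and it assumes that "1M" and "16M" mean 2^20 and 2^24 respectively.

```python
# Hypothetical table mirroring the api constants; not part of kongming.
# Each entry is (dimension N, bit depth), where sparsity = 1 / 2**depth.
MODELS = {
    "MODEL_64K_8BIT": (1 << 16, 8),
    "MODEL_1M_10BIT": (1 << 20, 10),   # assumes 1M means 2**20
    "MODEL_16M_12BIT": (1 << 24, 12),  # assumes 16M means 2**24
}


def on_bits(n, depth_bits):
    """Number of ON bits for dimension n at sparsity 1 / 2**depth_bits."""
    return n >> depth_bits


for name, (n, depth) in MODELS.items():
    print(f"{name}: N={n}, sparsity=1/{1 << depth}, ON bits={on_bits(n, depth)}")
```

Under these assumptions, the number of ON bits per vector grows from 256 to 1024 to 4096 across the three models.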

## Hyper-vector: the basics

Before getting into real hyper-vectors, we need to meet the `hv.Domain` class, which models a collection of semantically related hypervectors.

For now, we will use the default domain `d0`. Later we will explain hypervectors from different domains.

In [26]:
d0 = hv.d0()

The `hv.Domain` instance `d0` offers constructors for a number of useful classes and operations.

For example, `hv.SparseOperation`, as its name suggests, models the sparse operation, including the sparsity configuration and a random number generator. Calling `help(hv.SparseOperation)` will reveal more information about this class.

As a first step, we create a `hv.SparseOperation` instance from the default domain `d0`: the second argument is an initial seed for the internal random number generator, and any number will do.

In [27]:
so = d0.new_sparse_operation(api.MODEL_1M_10BIT, 99)

so.model()

1

The associated model can be retrieved from this object; the value returned is the numeric value of `api.MODEL_1M_10BIT`.

With the default domain `d0` and a sparse operation object `so`, a random hyper-vector can be generated by:

In [28]:
a = d0.new_random_sparse_segmented(so)

Each returned hyper-vector is of type `hv.SparseSegmented`, which is a subclass of `hv.HyperBinary`. 

The associated methods can be inspected by Python's `help(hv.SparseSegmented)` or `help(hv.HyperBinary)`.

In [29]:
help(hv.SparseSegmented)

Help on class SparseSegmented in module kongming.ext.hv:

class SparseSegmented(kongming.ext.go.GoClass)
 |  SparseSegmented(*args, **kwargs)
 |  
 |  SparseSegmented is a special subset of sparse binary hyper-vectors.
 |  
 |  Compared with generic sparse binary hyper-vectors (SparseBinary), where ON bits can be positioned anywhere,
 |  this is a more constrained form of sparse hyper-vectors, with the following benefits:
 |  
 |  1. it allows an even more compact memory/storage representation than SparseBinary;
 |  2. if offers greatly simplified operations;
 |  
 |  Method resolution order:
 |      SparseSegmented
 |      kongming.ext.go.GoClass
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __del__(self)
 |  
 |  __init__(self, *args, **kwargs)
 |      handle=A Go-side object is always initialized with an explicit handle=arg
 |      otherwise parameters can be unnamed in order of field names or named fields
 |      in which case a new Go object is constructed first


One of the important methods of `hv.SparseSegmented` is `stable_hash`, which returns the signature hash value for the hyper-vector. Different vectors, no matter how small the difference, will produce dramatically different hash values. In addition, the hash is designed to be representation-agnostic: the hash value for the same vector remains unchanged across different forms of representation. In summary, it's always safe to compare hash values to determine equality.

In [30]:
hex(a.stable_hash())

'0x53be7fd0cd84d422'
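To make the representation-agnostic idea concrete, here is a toy sketch in plain Python. It is not the algorithm behind `stable_hash`, which this notebook does not describe: the `toy_hash` function, the use of BLAKE2b, and the assumed segment width of 1024 bits are all illustrative choices. The only point is that hashing a canonical form (the sorted ON-bit positions) gives the same value whichever representation the vector started in.

```python
import hashlib

SEGMENT_SIZE = 1024  # assumed segment width, for illustration only


def positions_from_offsets(offsets):
    """Convert per-segment offsets into absolute ON-bit positions."""
    return [i * SEGMENT_SIZE + off for i, off in enumerate(offsets)]


def toy_hash(positions):
    """Hash the sorted ON-bit positions into a 64-bit integer (hypothetical)."""
    h = hashlib.blake2b(digest_size=8)
    for p in sorted(positions):
        h.update(p.to_bytes(8, "little"))
    return int.from_bytes(h.digest(), "little")


offsets = [5, 17, 1023, 0]                        # one offset per segment
as_list = positions_from_offsets(offsets)         # representation 1: ordered list
as_set = set(as_list)                             # representation 2: unordered set

h_list = toy_hash(as_list)
h_set = toy_hash(as_set)
h_changed = toy_hash(positions_from_offsets([5, 17, 1023, 1]))  # one bit moved

print(hex(h_list), hex(h_set), hex(h_changed))
```

The first two values agree because both representations reduce to the same canonical form, while moving a single ON bit yields an unrelated value.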

Another useful way to examine a hyper-vector is `string`, which turns a vector into its human-readable string form.

In [31]:
a.string()

'ss(m=MODEL_1M_10BIT, hash=0x53be7fd0cd84d422, seed=0x9b03a1202af08426, exp=1)'

The hash above is always in `uint64` hex form, and is identical to the value returned by `stable_hash()`.

The string form can also be obtained with Python's built-in `str(a)`, which internally calls the `string()` method.

In [32]:
str(a)

'ss(m=MODEL_1M_10BIT, hash=0x53be7fd0cd84d422, seed=0x9b03a1202af08426, exp=1)'

Note that the original hypervector computing library is written in Go, and we use `gopy` to expose the underlying Go methods to Python.
For this reason, `hv.SparseSegmented` is actually a Python-wrapped Go class, whose underlying data is mostly opaque, except for the exported methods.

A good way to inspect hypervectors in more detail is through their protobuf message representations.

We use protobuf messages as the cross-platform and cross-language medium for hypervectors: a message originating from a native object in one language can be transferred to another platform and converted back into a native object in another language without information loss. In addition, protobuf messages can be serialized over the wire and de-serialized into native objects for each supported language (Go or Python). This is a powerful tool for our purposes.

`HyperBinary.proto_load` converts a `HyperBinary` object (`SparseSegmented` is one subclass) into the equivalent protobuf message.

In [33]:
wrapped_msg = a.proto_load()

However, `wrapped_msg` is still a Python wrapper for an underlying Go object of type `*api.HyperBinaryProto`, which doesn't provide much transparency: we need to convert it to a native Python message. This is where a few helper functions come in.

In [34]:
helpers.to_native_msg("HyperBinaryProto", wrapped_msg)

model: MODEL_1M_10BIT
stable_hash: 6034401085501002786
sparse_segmented {
  seed: 11169948660340392998
}

In addition to the combined `helpers.to_native_msg("HyperBinaryProto", a.proto_load())`, we have a shortcut, `to_native_hbp`, that works for any hypervector: this is a good trick for anyone's toolbox.

In [35]:
msg = helpers.to_native_hbp(a.proto_load())

msg, type(msg) 

(model: MODEL_1M_10BIT
 stable_hash: 6034401085501002786
 sparse_segmented {
   seed: 11169948660340392998
 },
 kongming.api.hv_pb2.HyperBinaryProto)

The resulting message is a native Python object, where we can freely inspect individual fields. For example:

In [36]:
msg.stable_hash == a.stable_hash()

True
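The decimal values in the protobuf message and the hex values in the string form are the same numbers in different bases. The quick check below needs no `kongming` objects: it simply copies the two decimal values from the message printed earlier and converts them with Python's built-in `hex`.

```python
# Values copied from the protobuf message printed earlier in this notebook.
stable_hash_decimal = 6034401085501002786
seed_decimal = 11169948660340392998

# Both fit in an unsigned 64-bit integer.
assert stable_hash_decimal < 2 ** 64 and seed_decimal < 2 ** 64

print(hex(stable_hash_decimal))  # 0x53be7fd0cd84d422, the hash in str(a)
print(hex(seed_decimal))         # 0x9b03a1202af08426, the seed in str(a)
```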

That's pretty much all you can do for a single hypervector. 

Let's make another small step by creating another hypervector `b`, and see how 1 plus 1 can be much more than 2.

In [37]:
b = d0.new_random_sparse_segmented(so)

Note that the `SparseOperation` object changes its internal RNG state on each use, so the next call to `new_random_sparse_segmented` will produce a completely new hyper-vector.

A pair of random hypervectors is almost orthogonal, as shown by their overlap.

In [38]:
hv.overlap(a,b)

1

Meanwhile, their Hamming distance is large: the two vectors are far apart.

Note that these hypervectors have dimension `N=1048576` and `M=1024` ON bits, so two vectors sharing a single ON bit differ in `2*(1024-1)=2046` positions.

In [39]:
hv.hamming(a,b)

2046
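The link between overlap and Hamming distance can be reproduced with a toy model in plain Python. The sketch below assumes that a sparse segmented vector for the 1M/10-bit model is made of 1024 segments of 1024 bits each, with exactly one ON bit per segment. That layout, and the helper names `random_offsets`, `on_positions`, `overlap`, and `hamming`, are illustrative assumptions rather than `kongming` internals.

```python
import random

SEGMENTS = 1024      # assumed number of segments (one ON bit each)
SEGMENT_SIZE = 1024  # assumed bits per segment, so N = 1024 * 1024


def random_offsets(rng):
    """Draw one ON-bit offset per segment."""
    return [rng.randrange(SEGMENT_SIZE) for _ in range(SEGMENTS)]


def on_positions(offsets):
    """Absolute positions of the ON bits in the full N-bit vector."""
    return {i * SEGMENT_SIZE + off for i, off in enumerate(offsets)}


def overlap(x, y):
    """Number of ON bits shared by both vectors."""
    return len(on_positions(x) & on_positions(y))


def hamming(x, y):
    """Number of bit positions where the two vectors differ."""
    return len(on_positions(x) ^ on_positions(y))


rng = random.Random(99)  # a shared RNG advances between draws
x = random_offsets(rng)
y = random_offsets(rng)

print(overlap(x, y), hamming(x, y))
```

Because each vector has exactly `SEGMENTS` ON bits, the Hamming distance is always `2 * (SEGMENTS - overlap)`. Two random vectors agree in any given segment with probability 1/1024, so the expected overlap in this toy model is about 1.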

There are several alternative ways to create hypervectors. 

For example, here is a way to produce a hyper-vector determined by a numeric seed: a random number generator initialized with that seed is used to produce the per-segment offsets for the hyper-vector.

Interested readers can list the other constructors with `dir(d0)`.

In [40]:
c = d0.new_sparse_segmented_from_seed(api.MODEL_1M_10BIT, 1234)

The second argument is the seed for this hyper-vector: different seeds will produce different vectors.

It's also trivial to verify that this random vector is almost orthogonal to the previous `a` and `b`:

In [41]:
hv.overlap(c, a), hv.overlap(c, b)

(3, 0)
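Seed-determined construction can be illustrated without `kongming`. The following sketch uses Python's `random.Random` as a stand-in for the library's generator, and assumes 1024 segments of 1024 bits; the function `offsets_from_seed` is a hypothetical name. It will not reproduce the actual vectors that `kongming` derives from the same seed.

```python
import random

SEGMENTS, SEGMENT_SIZE = 1024, 1024  # assumed layout for the 1M/10-bit model


def offsets_from_seed(seed):
    """Derive per-segment offsets from a seed (stand-in RNG, not kongming's)."""
    rng = random.Random(seed)
    return [rng.randrange(SEGMENT_SIZE) for _ in range(SEGMENTS)]


first = offsets_from_seed(1234)
again = offsets_from_seed(1234)
other = offsets_from_seed(1235)

# Same seed, same vector; a neighbouring seed gives an unrelated vector.
print(first == again)
print(sum(p == q for p, q in zip(first, other)))  # segments that happen to agree
```

The design benefit is that a vector can be stored or transmitted as just its model and seed, then regenerated on demand.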

The `hv` package also provides a way to generate hypervectors determined by a seed word. Under the hood, the hash of the supplied word is used as the seed to kickstart the random number generator.

In [42]:
d = d0.new_sparse_segmented_from_seed_word(api.MODEL_1M_10BIT, "random")

hv.overlap(d, a), hv.overlap(d, b)

(1, 1)

Note that the seed word is case-sensitive, as is the underlying hash function.

In [43]:
e = d0.new_sparse_segmented_from_seed_word(api.MODEL_1M_10BIT, "RANDOM")

hv.equal(d, e)

False
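Case sensitivity follows directly from hashing the raw bytes of the word. The sketch below shows the idea with SHA-256 truncated to 64 bits. The choice of hash and the name `seed_from_word` are assumptions made for illustration; this notebook does not say which hash function `kongming` uses.

```python
import hashlib


def seed_from_word(word):
    """Map a seed word to a 64-bit integer seed (hypothetical scheme)."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


lower = seed_from_word("random")
upper = seed_from_word("RANDOM")

print(hex(lower))
print(hex(upper))
print(lower == upper)
```

Since "random" and "RANDOM" encode to different bytes, they hash to different seeds and therefore lead to different hypervectors.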

Hypervectors created via different constructors can have slightly different string representations, each reflecting how the hypervector was produced.

In [44]:
str(e)

"ss(m=MODEL_1M_10BIT, hash=0x3c60123f0a90a7cc, 'RANDOM', exp=1)"

Equivalently, we can examine the protobuf message for `e`.

In [45]:
helpers.to_native_hbp(e.proto_load())

model: MODEL_1M_10BIT
stable_hash: 4350497302009391052
sparse_segmented {
  seed_word: "RANDOM"
}

So far, we've covered hypervectors and a few methods to examine their contents.

The next notebook will cover compositional structures such as sets and sequences.