## The `kongming.api` package

This notebook (along with a few follow-up ones) illustrates my work on high-dimensional computing (HDC).
First, we need to import the relevant Python modules.

In [1]:
from kongming import api

`api` is a collection of types and constants auto-generated from cross-language protocol buffer definitions.
For now, you just need to know a few constants:

* `api.MODEL_64K_8BIT`=1: used for hyper-vectors, where N=65536, and sparsity=1/256 (8-bit depth);
* `api.MODEL_1M_10BIT`=2: used for hyper-vectors, where N=1M, and sparsity=1/1024 (10-bit depth);
* `api.MODEL_16M_12BIT`=3: used for hyper-vectors, where N=16M, and sparsity=1/4096 (12-bit depth).

Later on you can either use these constants, or just the numeric value.
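
The three models follow one pattern: the bit depth fixes the sparsity, and the number of ON bits is the dimension times the sparsity. Here is a small sketch of that arithmetic. It assumes that "1M" and "16M" mean 2^20 and 2^24, and the `MODELS` table and `describe` helper are hypothetical names for illustration, not part of `kongming.api`.

```python
# Hypothetical table: model name -> bit depth, as listed above.
MODELS = {"MODEL_64K_8BIT": 8, "MODEL_1M_10BIT": 10, "MODEL_16M_12BIT": 12}

def describe(depth):
    """Return (dimension N, ON-bit count) for a given bit depth."""
    sparsity_denominator = 1 << depth   # 256, 1024, 4096
    n = 1 << (2 * depth)                # assumption: N = 2**(2 * depth)
    return n, n // sparsity_denominator

for name, depth in MODELS.items():
    n, cardinality = describe(depth)
    print(f"{name}: N={n}, ON bits={cardinality}")
```

For `MODEL_1M_10BIT` this gives 1024 ON bits, the cardinality used throughout the rest of this notebook.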

## `hv` module from `kongming` package

In [2]:
from kongming import hv

GOMAXPROCS set to 128 for this invocation.


Another important concept is `domain`, which is a collection of semantically related hypervectors. For now, we use the default domain `d0`.

In [4]:
d0 = hv.d0()

The `kongming.hv` module contains a number of useful classes and operations.
For example, the `SparseOperation` class, as its name suggests, models operations on sparse hypervectors.

In [3]:
help(hv.SparseOperation)

Help on class SparseOperation in module kongming.ext.hv:

class SparseOperation(kongming.ext.go.GoClass)
 |  SparseOperation(*args, **kwargs)
 |  
 |  SparseOperation models operations for stochastic sparse hypervectors.
 |  
 |  Method resolution order:
 |      SparseOperation
 |      kongming.ext.go.GoClass
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __del__(self)
 |  
 |  __init__(self, *args, **kwargs)
 |      handle=A Go-side object is always initialized with an explicit handle=arg
 |      otherwise parameters can be unnamed in order of field names or named fields
 |      in which case a new Go object is constructed first
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  __str__(self)
 |      Return str(self).
 |  
 |  cardinality(self)
 |      Cardinality() long
 |      
 |      Cardinality returns the associated cardinality (ON bit counts).
 |  
 |  domain(self)
 |      Domain() object
 |      
 |      Domain returns the associated domain.
 |  
 |  

We can create a `SparseOperation` instance from the default domain `d0`. The first argument selects the model; the second is an initial seed for the internal random number generator, and any number will do.

In [5]:
so = d0.new_sparse_operation(api.MODEL_1M_10BIT, 99)

The associated model can be retrieved from this object; the result is the numeric value of `MODEL_1M_10BIT`:

In [6]:
so.model()

2

A random hyper-vector can be generated by:

In [8]:
a = d0.new_random_sparse_segmented(so)

Each returned hyper-vector is of type `hv.SparseSegmented`, and its methods can be inspected with:

In [9]:
help(hv.SparseSegmented)

Help on class SparseSegmented in module kongming.ext.hv:

class SparseSegmented(kongming.ext.go.GoClass)
 |  SparseSegmented(*args, **kwargs)
 |  
 |  SparseSegmented is a special subset of sparse binary hyper-vectors.
 |  
 |  Compared with generic sparse binary hyper-vectors (SparseBinary), where ON bits can be positioned anywhere,
 |  this is a more constrained form of sparse hyper-vectors, with the following benefits:
 |  
 |  1. it allows an even more compact memory/storage representation than SparseBinary;
 |  2. if offers greatly simplified operations;
 |  
 |  Method resolution order:
 |      SparseSegmented
 |      kongming.ext.go.GoClass
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __del__(self)
 |  
 |  __init__(self, *args, **kwargs)
 |      handle=A Go-side object is always initialized with an explicit handle=arg
 |      otherwise parameters can be unnamed in order of field names or named fields
 |      in which case a new Go object is constructed first


One of the important methods of `hv.SparseSegmented` is `stable_hash`, which returns the signature hash value of the hyper-vector. Different vectors, no matter how small the difference, produce dramatically different hash values. In addition, the hash is designed to be representation-agnostic: the hash value of the same vector remains unchanged across different forms of representation. In short, it is always safe to compare hash values to determine equality.

In [10]:
hex(a.stable_hash())

'0x53be7fd0cd84d422'
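
To make "representation-agnostic" concrete, here is a minimal sketch that hashes a canonical form (the sorted ON-bit indices), so that a compact per-segment form and a generic index-set form of the same vector hash identically. The segment layout, the `toy_stable_hash` helper, and the choice of BLAKE2b are all assumptions for illustration; this is not how `stable_hash` is actually implemented.

```python
import hashlib

SEGMENT_SIZE = 1024  # assumption: 1024 segments of 1024 bits, one ON bit each

def indices_from_offsets(offsets):
    """Convert per-segment offsets into absolute ON-bit indices."""
    return [seg * SEGMENT_SIZE + off for seg, off in enumerate(offsets)]

def toy_stable_hash(on_indices):
    """Hash the sorted ON-bit indices into a 64-bit integer."""
    h = hashlib.blake2b(digest_size=8)
    for idx in sorted(on_indices):
        h.update(idx.to_bytes(4, "little"))
    return int.from_bytes(h.digest(), "little")

offsets = [(7 * i + 3) % SEGMENT_SIZE for i in range(1024)]  # compact form
as_set = set(indices_from_offsets(offsets))                  # generic form

# Same vector, two representations: the hashes agree.
same = toy_stable_hash(indices_from_offsets(offsets)) == toy_stable_hash(as_set)

# Move a single ON bit: the hash changes.
changed = offsets.copy()
changed[0] += 1
differs = toy_stable_hash(indices_from_offsets(changed)) != toy_stable_hash(as_set)
print(same, differs)
```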

Another useful way to examine a hyper-vector is `string`, which turns a vector into its human-readable string form.

In [11]:
a.string()

'ss(m=MODEL_1M_10BIT, hash=0x53be7fd0cd84d422, seed=0x9b03a1202af08426, exp=1)'

The hash above is always in `uint64` hex form, and is identical to the value returned by `stable_hash()`.

The string form can also be obtained with the Python call `str(a)`, which internally calls the `string()` method.

In [12]:
str(a)

'ss(m=MODEL_1M_10BIT, hash=0x53be7fd0cd84d422, seed=0x9b03a1202af08426, exp=1)'

Let's make another small step by creating another hypervector `b`.

In [14]:
b = d0.new_random_sparse_segmented(so)

Note that the `SparseOperation` object advances its internal RNG state on each call, so the next call of `new_random_sparse_segmented` produces a completely new hyper-vector.

A pair of random hypervectors is almost orthogonal, as shown by their overlap.

In [16]:
hv.overlap(a,b)

1

Meanwhile, their Hamming distance is large: the two vectors are far apart.

In [12]:
hv.hamming(a,b)

2046
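
The two numbers are linked. With 1024 ON bits in each vector, the Hamming distance is 1024 + 1024 - 2 × overlap, so an overlap of 1 gives 2046. The sketch below reproduces this relation on a toy model. It assumes a segmented vector can be described as one ON-bit offset per segment (1024 segments of 1024 positions each); `random_segmented`, `overlap`, and `hamming` here are stand-ins written for this illustration, not the `kongming.hv` functions.

```python
import random

SEGMENTS = 1024       # assumption: one ON bit in each of 1024 segments
SEGMENT_SIZE = 1024   # assumption: each segment spans 1024 positions

def random_segmented(rng):
    """Toy random segmented vector: one ON-bit offset per segment."""
    return [rng.randrange(SEGMENT_SIZE) for _ in range(SEGMENTS)]

def overlap(x, y):
    """Count segments where both vectors have their ON bit at the same offset."""
    return sum(1 for p, q in zip(x, y) if p == q)

def hamming(x, y):
    """Count positions where exactly one of the two vectors is ON."""
    return 2 * (SEGMENTS - overlap(x, y))

rng = random.Random(99)
a, b = random_segmented(rng), random_segmented(rng)
print(overlap(a, b), hamming(a, b))
```

Under this toy model, two independent vectors agree in any given segment with probability 1/1024, so the expected overlap is about 1, in line with the values observed in this notebook.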

There are several ways to create hypervectors; interested readers can list them with `dir(d0)`.

For example, here is a way to produce a random hyper-vector determined by a numeric seed:

In [17]:
c = d0.new_sparse_segmented_from_seed(api.MODEL_1M_10BIT, 1234, False)

The second argument is the seed for this hyper-vector: different seeds produce different vectors.
The third argument, `invert`, inverts the vector if set to `True`, which we don't want for now.

It is also easy to verify that this vector is almost orthogonal to the earlier `a`:

In [14]:
hv.overlap(a,c)

2

A domain can also generate a hypervector determined by a seed word.

In [18]:
d = d0.new_sparse_segmented_from_seed_word(api.MODEL_1M_10BIT, "random", False)

hv.overlap(a,d)

1

The word `random` above is just an arbitrary string. The same string always returns the same vector, while a different string produces a different vector, even when it differs only by letter case, as `"RANDOM"` does here:

In [20]:
e = d0.new_sparse_segmented_from_seed_word(api.MODEL_1M_10BIT, "RANDOM", False)

hv.equal(d,e)

False
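
One way to picture seed-word construction is to hash the word into a numeric seed and then generate the vector from that seed. The sketch below does exactly that on a toy model (one ON-bit offset per segment). `from_seed_word` is a hypothetical helper and SHA-256 is an arbitrary choice; nothing here is confirmed to match what `new_sparse_segmented_from_seed_word` does internally.

```python
import hashlib
import random

SEGMENTS = SEGMENT_SIZE = 1024  # assumption: toy layout for MODEL_1M_10BIT

def from_seed_word(word):
    """Hypothetical helper: derive per-segment offsets from a seed word."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "little"))
    return [rng.randrange(SEGMENT_SIZE) for _ in range(SEGMENTS)]

d = from_seed_word("random")
print(from_seed_word("random") == d)   # same word: same vector
print(from_seed_word("RANDOM") == d)   # case differs: different vector
```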

For hypervectors created via different constructors, the string representation can be slightly different.

In [21]:
str(e)

"ss(m=MODEL_1M_10BIT, hash=0x3c60123f0a90a7cc, 'RANDOM', exp=1)"

## The bind and bundle operations

Now we can try the bundle operation.

`d0` (or in general, `hv.Domain` objects) provides convenience functions of `bind` and `bundle`.

The first argument of `bundle` is the seed for the bundle operation: different seeds produce different, but equally valid, results.

In [25]:
bundled = d0.bundle(0, a, b)

hv.overlap(a, bundled), hv.overlap(b, bundled)

(507, 518)

As expected for a bundle of 2 hyper-vectors, each overlap is approximately half of the total cardinality (count of ON bits): the original vectors `a` and `b`, under `MODEL_1M_10BIT`, each have precisely 1024 ON bits.

Furthermore, we can try to bundle 3 hypervectors `a`, `b`, and `c`, like this:

In [29]:
bundled3 = d0.bundle(1, a, b, c)

hv.overlap(a, bundled3), hv.overlap(b, bundled3), hv.overlap(c, bundled3)

(358, 320, 348)
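
With 3 inputs, each overlap is close to a third of 1024. One scheme that would produce numbers like these is a per-segment selection: for every segment, the bundle copies the ON bit of one randomly chosen input. The sketch below tries this on a toy model (one ON-bit offset per segment). `toy_bundle` is a hypothetical helper, and whether `d0.bundle` works this way is an assumption, not something this notebook confirms.

```python
import random

SEGMENTS = SEGMENT_SIZE = 1024  # assumption: toy layout for MODEL_1M_10BIT

def random_segmented(rng):
    """Toy random segmented vector: one ON-bit offset per segment."""
    return [rng.randrange(SEGMENT_SIZE) for _ in range(SEGMENTS)]

def overlap(x, y):
    """Count segments where both vectors have their ON bit at the same offset."""
    return sum(1 for p, q in zip(x, y) if p == q)

def toy_bundle(seed, *vectors):
    """Hypothetical bundle: each segment copies its ON bit from one input."""
    rng = random.Random(seed)
    return [rng.choice(vectors)[i] for i in range(SEGMENTS)]

rng = random.Random(99)
a, b, c = (random_segmented(rng) for _ in range(3))
bundled = toy_bundle(0, a, b)
bundled3 = toy_bundle(1, a, b, c)
print(overlap(a, bundled), overlap(b, bundled))
print([overlap(v, bundled3) for v in (a, b, c)])
```

In this toy scheme the two overlaps of a 2-way bundle always sum to 1024 plus the overlap between the inputs, since every segment of the result matches at least one input.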

Another critical operation for hypervectors is `bind`.

In [30]:
bound = d0.bind(a, b)

The bound vector has almost no overlap with its operand `a`, no more than it has with an unrelated vector such as `c`:

In [31]:
hv.overlap(bound, a), hv.overlap(bound, c)

(1, 0)
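
For intuition, a common way to bind segmented sparse vectors is to add the ON-bit offsets segment by segment, modulo the segment size. The sketch below applies that scheme to a toy model (one ON-bit offset per segment). `toy_bind` and `toy_unbind` are hypothetical helpers, and this notebook does not confirm that `d0.bind` uses this scheme.

```python
import random

SEGMENTS = SEGMENT_SIZE = 1024  # assumption: toy layout for MODEL_1M_10BIT

def random_segmented(rng):
    """Toy random segmented vector: one ON-bit offset per segment."""
    return [rng.randrange(SEGMENT_SIZE) for _ in range(SEGMENTS)]

def overlap(x, y):
    """Count segments where both vectors have their ON bit at the same offset."""
    return sum(1 for p, q in zip(x, y) if p == q)

def toy_bind(x, y):
    """Hypothetical bind: add offsets per segment, modulo the segment size."""
    return [(p + q) % SEGMENT_SIZE for p, q in zip(x, y)]

def toy_unbind(bound, x):
    """Inverse of toy_bind: subtract the offsets of one operand."""
    return [(p - q) % SEGMENT_SIZE for p, q in zip(bound, x)]

rng = random.Random(99)
a, b = random_segmented(rng), random_segmented(rng)
bound = toy_bind(a, b)
print(overlap(bound, a), overlap(bound, b))   # both close to zero
print(toy_unbind(bound, a) == b)              # the other operand is recoverable
```

In this toy scheme, the result overlaps `a` only in segments where the offset of `b` happens to be 0, which is why the overlap stays near zero.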

In the next notebook, we will go through serialization of vectors between the local Python module and remote services. Stay tuned.