## The `kongming.api` package

This notebook serves as an illustration of my work on high-dimensional computing (HDC).
First, we need to import the relevant Python modules.

In [1]:
from kongming import api

`api` is auto-generated from cross-language protocol buffer definitions.
For now, you just need to know a few constants:

* `api.MODEL_64K_8BIT`=1: used for hyper-vectors, where N=65536, and sparsity=1/256 (8-bit depth);
* `api.MODEL_1M_10BIT`=2: used for hyper-vectors, where N=1M, and sparsity=1/1024 (10-bit depth);
* `api.MODEL_16M_12BIT`=3: used for hyper-vectors, where N=16M, and sparsity=1/4096 (12-bit depth).

Later on, you can use either these constants or their numeric values.
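The relationship between dimension, sparsity, and the count of ON bits can be checked with a few lines of plain Python. This is only a sketch: it does not import `kongming`, the `MODELS` table and `expected_cardinality` helper are hypothetical names, and it assumes that "1M" and "16M" mean 2^20 and 2^24.

```python
# Hypothetical lookup table (not part of kongming.api):
# model value -> (name, log2 of N, bit depth).
# Assumption: "1M" and "16M" denote 2**20 and 2**24 respectively.
MODELS = {
    1: ("MODEL_64K_8BIT", 16, 8),
    2: ("MODEL_1M_10BIT", 20, 10),
    3: ("MODEL_16M_12BIT", 24, 12),
}


def expected_cardinality(model: int) -> int:
    """Return N * sparsity, i.e. the expected count of ON bits."""
    _, log2_n, depth = MODELS[model]
    return (1 << log2_n) >> depth


for value, (name, log2_n, depth) in MODELS.items():
    print(f"{name}={value}: N={1 << log2_n}, ON bits={expected_cardinality(value)}")
```

Under these assumptions, `MODEL_1M_10BIT` works out to 1024 ON bits per hyper-vector.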

## `hv` module from `kongming.ext` package

In [2]:
from kongming import hv

Module `kongming.hv` contains a number of useful classes and operations.
For example, the `SparseOperation` class, as its name suggests, models sparse operations.

In [3]:
help(hv.SparseOperation)

Help on class SparseOperation in module kongming.ext.hv:

class SparseOperation(kongming.ext.go.GoClass)
 |  SparseOperation(*args, **kwargs)
 |  
 |  SparseOperation models operations for stochastic sparse hypervectors.
 |  
 |  Method resolution order:
 |      SparseOperation
 |      kongming.ext.go.GoClass
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __del__(self)
 |  
 |  __init__(self, *args, **kwargs)
 |      handle=A Go-side object is always initialized with an explicit handle=arg
 |      otherwise parameters can be unnamed in order of field names or named fields
 |      in which case a new Go object is constructed first
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  __str__(self)
 |      Return str(self).
 |  
 |  cardinality(self)
 |      Cardinality() long
 |      
 |      Cardinality returns the associated cardinality (ON bit counts).
 |  
 |  model(self)
 |      Model() int
 |      
 |      Model returns the associated sparsity model.
 |  
 |

With the above information, we can create a `SparseOperation` instance. The first argument is the model; the second argument is an initial seed for the internal random number generator, and any number will do.

In [4]:
so=hv.new_sparse_operation(api.MODEL_1M_10BIT, 99)

The associated model can be retrieved from this object; it is the numeric value of `MODEL_1M_10BIT`:

In [5]:
so.model()

2

A random hyper-vector can be generated by:

In [6]:
a=hv.new_random_sparse_constrained(so)

Each returned hyper-vector is of type `hv.SparseConstrained`, and the associated methods can be inspected by:

In [7]:
help(hv.SparseConstrained)

Help on class SparseConstrained in module kongming.ext.hv:

class SparseConstrained(kongming.ext.go.GoClass)
 |  SparseConstrained(*args, **kwargs)
 |  
 |  SparseConstrained is a subset of sparse binary hyper-vectors.
 |  
 |  Compared with SparseBinary, where ON bits can be positioned anywhere,
 |  this is a more constrained form of sparse hyper-vectors, with the following benefits:
 |  
 |  1. it allows an even more compact memory/storage representation than SparseBinary;
 |  2. if offers greatly simplified operations;
 |  
 |  Method resolution order:
 |      SparseConstrained
 |      kongming.ext.go.GoClass
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __del__(self)
 |  
 |  __init__(self, *args, **kwargs)
 |      handle=A Go-side object is always initialized with an explicit handle=arg
 |      otherwise parameters can be unnamed in order of field names or named fields
 |      in which case a new Go object is constructed first
 |  
 |  __repr__(self)
 |      Retu

One of the important methods of `hv.SparseConstrained` is `stable_hash`, which returns a signature hash value for the hyper-vector. Different vectors, no matter how small the difference, will produce dramatically different hash values. In addition, the hash is designed to be representation-agnostic: the hash value of the same vector remains unchanged across different forms of representation. In summary, it is always safe to compare hash values to determine equality.

In [8]:
hex(a.stable_hash())

'0x79f54e49e0db63d7'
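To make "representation-agnostic" concrete, here is a minimal sketch in plain Python. It is not the algorithm behind `stable_hash`, which this notebook does not describe; `agnostic_hash` and `dense_to_indices` are hypothetical helpers, and SHA-256 truncated to 64 bits is an arbitrary choice. The idea is simply to reduce every representation to the same canonical form (sorted ON positions) before hashing.

```python
import hashlib


def agnostic_hash(on_bits) -> int:
    """Hash the sorted ON-bit positions, truncated to 64 bits (illustrative only)."""
    digest = hashlib.sha256()
    for index in sorted(set(on_bits)):
        digest.update(index.to_bytes(8, "little"))
    return int.from_bytes(digest.digest()[:8], "little")


def dense_to_indices(bits):
    """Convert a dense 0/1 sequence into the list of ON positions."""
    return [i for i, bit in enumerate(bits) if bit]


sparse_form = [7, 2, 11]                             # ON positions, arbitrary order
dense_form = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1]    # the same vector as a bitmap
shifted = [2, 7, 12]                                 # differs in a single position

same = agnostic_hash(sparse_form) == agnostic_hash(dense_to_indices(dense_form))
different = agnostic_hash(sparse_form) != agnostic_hash(shifted)
print(same, different)
```

Both representations of the same vector hash identically, while moving a single ON bit changes the hash.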

Another useful way to examine a hyper-vector is `string`, which turns a vector into its human-readable string form.

In [9]:
a.string()

'SparseConstrained(m=MODEL_1M_10BIT, hash=0x79f54e49e0db63d7, seed=0x9c1359de7aa1244f, exp=1)'

The hash above is always shown in `uint64` hex form and is identical to the value returned by `stable_hash()`.

In [10]:
b=hv.new_random_sparse_constrained(so)

Note that the `SparseOperation` object advances its internal RNG state on each call, so the next call to `new_random_sparse_constrained` produces a completely new hyper-vector.

The new hyper-vector `b` is almost orthogonal to the previous `a`, as shown by their overlap:

In [11]:
hv.overlap(a,b)

1

Meanwhile, their Hamming distance is large: the two vectors are far apart.

In [12]:
hv.hamming(a,b)

2046
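The two numbers above are linked: for binary vectors, the Hamming distance equals the sum of the two cardinalities minus twice the overlap. The sketch below checks this in plain Python. It models a hyper-vector as an unconstrained set of ON positions, which is a simplification and not the layout of `hv.SparseConstrained`; `random_sparse`, `overlap`, and `hamming` are hypothetical stand-ins for the library calls.

```python
import random

N = 1 << 20   # assumed dimension for MODEL_1M_10BIT
K = 1 << 10   # ON bits per vector (N * 1/1024)


def random_sparse(rng: random.Random) -> set:
    """Draw K distinct ON positions out of N (a plain sparse binary vector)."""
    return set(rng.sample(range(N), K))


def overlap(x: set, y: set) -> int:
    """Count positions that are ON in both vectors."""
    return len(x & y)


def hamming(x: set, y: set) -> int:
    """Count positions where the two vectors differ."""
    return len(x ^ y)


rng = random.Random(99)
x, y = random_sparse(rng), random_sparse(rng)
# Identity for binary vectors: hamming = |x| + |y| - 2 * overlap.
assert hamming(x, y) == len(x) + len(y) - 2 * overlap(x, y)
print(overlap(x, y), hamming(x, y))
```

With an overlap of 1 and 1024 ON bits in each vector, the identity gives 1024 + 1024 - 2 = 2046, matching the value above. For two independent random vectors of this kind, the expected overlap is K * K / N = 1.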

Another way to produce a random hyper-vector is:

In [13]:
c=hv.new_sparse_constrained_from_seed(api.MODEL_1M_10BIT, 1234, False)

The second argument is the seed for this hyper-vector: different seeds will produce different vectors.
The third argument, if `True`, inverts the vector; we don't want to do that for now.

It's also trivial to verify that this random vector is almost orthogonal to the previous `a`:

In [14]:
hv.overlap(a,c)

2

The `hv` module also provides a few functions to generate hyper-vectors deterministically.

In [15]:
d=hv.new_sparse_constrained_from_seed_word(api.MODEL_1M_10BIT, "random", False)
hv.overlap(a,d)

2

The string `random` above is just an arbitrary seed word: different strings produce different vectors, while the identical string always returns the identical vector. Seed words are case-sensitive, so `"RANDOM"` yields a different vector from `"random"`:

In [16]:
e=hv.new_sparse_constrained_from_seed_word(api.MODEL_1M_10BIT, "RANDOM", False)
hv.equal(d,e)

False
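The behaviour above (same word, same vector; different word, different vector) can be sketched in plain Python. The scheme below is an assumption made for illustration, not how `kongming` derives vectors from seed words: `seed_from_word` and `vector_from_word` are hypothetical helpers that hash the word into a 64-bit seed and feed it to a standard RNG.

```python
import hashlib
import random

N = 1 << 20   # assumed dimension for MODEL_1M_10BIT
K = 1 << 10   # ON bits per vector


def seed_from_word(word: str) -> int:
    """Derive a 64-bit seed from the UTF-8 bytes of a word (hypothetical scheme)."""
    return int.from_bytes(hashlib.sha256(word.encode("utf-8")).digest()[:8], "big")


def vector_from_word(word: str) -> frozenset:
    """Generate K ON positions from a word; the same word gives the same vector."""
    rng = random.Random(seed_from_word(word))
    return frozenset(rng.sample(range(N), K))


print(vector_from_word("random") == vector_from_word("random"))
print(vector_from_word("random") == vector_from_word("RANDOM"))
```

Because the seed depends on the exact bytes of the word, changing only the letter case is enough to produce an unrelated vector.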

In [17]:
e.short()

"sc('RANDOM', 1)"

For debugging, `short()` is another useful tool for examining the content of a hyper-vector in a concise fashion. For a vector that was generated from a seed word, it simply prints the seed word.

## The bind and bundle operations

Now we can try the bundle operation.

`kongming.hv` provides the convenience functions `bind` and `bundle`, which take a regular Python list of hyper-vectors as arguments.

The first argument of `bundle` is the seed for the bundle operation: different seeds will produce different, but equally valid, results.

In [18]:
bundled=hv.bundle(0, [a, c])
hv.overlap(a,bundled)

514

As expected for a bundle of 2 hyper-vectors, the overlap is approximately half of the total cardinality (count of ON bits): each of the original vectors `a` and `c`, under the model `MODEL_1M_10BIT`, has precisely 1024 ON bits.
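One way an overlap near half the cardinality can arise is sketched below in plain Python. The sketch assumes a segmented layout (exactly one ON bit in each of 1024 equal segments) and a bundle that copies each segment from a randomly chosen input. Whether `kongming` works this way is not stated in this notebook; `random_vector`, `overlap`, and `bundle` here are hypothetical and merely mirror the shape of the calls above.

```python
import random

SEGMENTS = 1 << 10       # one ON bit per segment, so 1024 ON bits in total
SEGMENT_SIZE = 1 << 10   # assumed: N = SEGMENTS * SEGMENT_SIZE = 2**20


def random_vector(rng):
    """One offset per segment; offset j in segment i means bit i * SEGMENT_SIZE + j is ON."""
    return [rng.randrange(SEGMENT_SIZE) for _ in range(SEGMENTS)]


def overlap(x, y):
    """Count segments whose ON bit sits at the same offset in both vectors."""
    return sum(1 for p, q in zip(x, y) if p == q)


def bundle(seed, vectors):
    """For each segment, copy the offset from one input chosen at random."""
    rng = random.Random(seed)
    return [rng.choice(column) for column in zip(*vectors)]


rng = random.Random(99)
x, y = random_vector(rng), random_vector(rng)
z = bundle(0, [x, y])
print(overlap(x, z), overlap(y, z))
```

In this model every segment of the bundle comes from one of the two inputs, so each input shares roughly half of its ON bits with the result, and a different seed picks a different but equally valid split.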

Another critical operation for hyper-dimensional vectors is `hv.bind`.

In [19]:
bound=hv.bind([a, c])

The bound vector will have almost no overlap with original vectors:

In [20]:
hv.overlap(bound, a), hv.overlap(bound, c)

(1, 2)
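The near-zero overlap can be reproduced with a small plain-Python model. It assumes a segmented layout (one ON bit per segment) and binds by adding offsets modulo the segment size, in the style of sparse block codes. This is an illustrative assumption, not a description of `hv.bind`; all names below are hypothetical.

```python
import random

SEGMENTS = 1 << 10       # one ON bit per segment, so 1024 ON bits in total
SEGMENT_SIZE = 1 << 10   # assumed: N = SEGMENTS * SEGMENT_SIZE = 2**20


def random_vector(rng):
    """One offset per segment, drawn uniformly at random."""
    return [rng.randrange(SEGMENT_SIZE) for _ in range(SEGMENTS)]


def overlap(x, y):
    """Count segments whose ON bit sits at the same offset in both vectors."""
    return sum(1 for p, q in zip(x, y) if p == q)


def bind(vectors):
    """Add the offsets segment by segment, modulo the segment size."""
    return [sum(column) % SEGMENT_SIZE for column in zip(*vectors)]


rng = random.Random(99)
x, y = random_vector(rng), random_vector(rng)
bound = bind([x, y])
print(overlap(bound, x), overlap(bound, y))
```

In this model a segment of the bound vector coincides with `x` only where the matching offset of `y` is zero, which happens about once in 1024 segments, so the expected overlap with either input is about 1.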

In the next notebook, we will go through serialization of vectors between local Python module and remote services. Stay tuned.