## Serialization for hyper-vectors

This notebook shows how to serialize and deserialize hyper-vectors, which is essential for exchanging vectors with remote services.

Let's first import the relevant modules and create a random sparse hyper-vector `a`.

In [1]:
from kongming import api
from kongming import hv

so = hv.new_sparse_operation(3, 2342)
a = hv.new_random_sparse_constrained(so)  # random sparse vector
a.string()  # human-readable summary: model, hash, seed

'SparseConstrained(m=MODEL_16M_12BIT, hash=0xb0104c8a363b61c0, seed=0xb65c386d6e85d5ec, exp=1)'

`hv.to_message` turns a hyper-vector into its equivalent message form, which is primarily used for serialization. The message prints in a readable text form:

In [2]:
msg = hv.to_message(a)
msg

hint: SPARSE_CONSTRAINED
model: MODEL_16M_12BIT
stable_hash: 12686724306801746368
sparse_constrained {
  seed: 13140439855417120236
}

`msg` is the serialized form of vector `a`. We can verify that their stable hashes are identical.

In [3]:
a.stable_hash() == msg.stable_hash

True
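`a.string()` prints the hash in hexadecimal, while the message prints it in decimal. A quick plain-Python check (no kongming calls, values copied from the outputs above) confirms that both are the same unsigned 64-bit integer:

```python
# Values copied from the outputs of In [1] and In [2] above.
hex_hash = "0xb0104c8a363b61c0"          # as printed by a.string()
decimal_hash = 12686724306801746368     # as printed in msg.stable_hash

# With base 16, int() accepts the "0x" prefix.
assert int(hex_hash, 16) == decimal_hash

# Going the other way: format as zero-padded 16-digit hex.
print(f"0x{decimal_hash:016x}")  # prints 0xb0104c8a363b61c0

# The hash uses the full unsigned 64-bit range (its top bit is set).
print(decimal_hash.bit_length())  # prints 64
```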

It's also possible to convert a vector directly into its JSON representation with `hv.as_json`, which returns a string.

In [None]:
json_string = hv.as_json(a)  # returns the JSON text as a str
json_string

'{"hint":"SPARSE_CONSTRAINED", "model":"MODEL_16M_12BIT", "stableHash":"12686724306801746368", "sparseConstrained":{"seed":"13140439855417120236"}}'
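Note that the 64-bit fields (`stableHash`, `seed`) appear as JSON strings and that field names are camelCase. This looks like the standard protobuf JSON mapping, which encodes 64-bit integers as strings because many JSON consumers (JavaScript in particular) store numbers as IEEE-754 doubles, exact only up to 2^53. That kongming follows the protobuf mapping is an inference from the output, not something stated here. A sketch of reading the JSON back with Python's standard `json` module:

```python
import json

# JSON string copied from the output above.
json_string = ('{"hint":"SPARSE_CONSTRAINED", "model":"MODEL_16M_12BIT", '
               '"stableHash":"12686724306801746368", '
               '"sparseConstrained":{"seed":"13140439855417120236"}}')

data = json.loads(json_string)                 # -> plain Python dict
stable_hash = int(data["stableHash"])          # string -> exact Python int
seed = int(data["sparseConstrained"]["seed"])  # same for the seed

print(data["hint"], hex(stable_hash))  # SPARSE_CONSTRAINED 0xb0104c8a363b61c0

# Why strings? Both values exceed 2**53, so a double cannot hold them exactly.
assert stable_hash > 2**53 and seed.bit_length() <= 64
assert int(float(stable_hash)) != stable_hash  # precision lost via float
```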

`msg` can be deserialized back into a Python object with `hv.from_message`:

In [6]:
back=hv.from_message(msg)
back.string()

'SparseConstrained(m=MODEL_16M_12BIT, hash=0xb0104c8a363b61c0, seed=0xb65c386d6e85d5ec, exp=1)'

The string form, including hash and seed, matches that of `a`, confirming that the round trip from message back to vector succeeded.
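Why can the message be so small? A `SPARSE_CONSTRAINED` vector appears to be fully determined by its model and seed: the message carries little else, yet `hv.from_message` rebuilds it. The toy below illustrates the idea with Python's `random.Random`; the helper names, dimension and sparsity are hypothetical, and kongming's real generator is certainly different.

```python
import random
from dataclasses import dataclass

DIM = 1 << 16   # hypothetical dimension (not MODEL_16M_12BIT's)
ACTIVE = 64     # hypothetical number of active positions

def toy_vector_from_seed(seed: int) -> frozenset:
    """Deterministically derive a sparse vector (set of active indices) from a seed."""
    rng = random.Random(seed)
    return frozenset(rng.sample(range(DIM), ACTIVE))

@dataclass(frozen=True)
class ToyMessage:
    seed: int  # the only payload needed to rebuild the vector

def toy_to_message(seed: int) -> ToyMessage:
    return ToyMessage(seed=seed)

def toy_from_message(msg: ToyMessage) -> frozenset:
    # Only the seed is read: the vector is regenerated on demand.
    return toy_vector_from_seed(msg.seed)

seed = 13140439855417120236
original = toy_vector_from_seed(seed)
restored = toy_from_message(toy_to_message(seed))
assert restored == original
```

The trade-off is CPU for bytes: deserialization re-runs the generator instead of reading thousands of indices.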

For vectors created via `hv.new_sparse_constrained_from_seed_word`, debugging is even easier, because the seed word itself is kept in place of a numeric seed.

In [7]:
b=hv.new_sparse_constrained_from_seed_word(api.MODEL_16M_12BIT, "seed", False)
b.string()

"SparseConstrained(m=MODEL_16M_12BIT, hash=0x4ad4def1a8f34152, 'seed', exp=1)"

Its JSON form records the seed word (`seedWord`) rather than a numeric seed:

In [8]:
hv.as_json(b)

'{"hint":"SPARSE_CONSTRAINED", "model":"MODEL_16M_12BIT", "stableHash":"5392179783372325202", "sparseConstrained":{"seedWord":"seed"}}'
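Recording the seed word keeps messages human-readable. One common way to turn a word into a reproducible 64-bit seed is to hash it; the sketch below takes the first 8 bytes of SHA-256. This is purely illustrative: the notebook does not say how kongming derives a seed from a seed word, and `seed_from_word` is a hypothetical helper.

```python
import hashlib
import random

def seed_from_word(word: str) -> int:
    """Hypothetical: map a word to a stable unsigned 64-bit seed via SHA-256."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

# The same word always yields the same seed, across processes and machines
# (unlike Python's built-in hash(), which is salted per process for str).
s1 = seed_from_word("seed")
s2 = seed_from_word("seed")
assert s1 == s2 and 0 <= s1 < 2**64

# A different word gives an unrelated seed.
assert seed_from_word("Seed") != s1

# The seed can then drive any deterministic generator.
indices = random.Random(s1).sample(range(1 << 16), 8)
print(sorted(indices))
```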

## Serialization for sets of hyper-vectors

Similarly, we can serialize and deserialize a set of vectors as a whole. The helper `hv.new_weighted_set` forms a weighted set from a list of vectors and a matching list of weights.

In [9]:
b = hv.new_random_sparse_constrained(so)  # replaces the seed-word vector b from In [7]
c = hv.new_random_sparse_constrained(so)

s = hv.new_weighted_set([a, b, c], [0.2, 0.2, 0.6])
s.string()

'WeightedSet{0xb0104c8a363b61c0 (0.20) ,0xd50ea407d58f12c7 (0.20) ,0x83fd0fc2f3ee5df0 (0.60)}'

Again, `hv.to_set_message` turns such a set into its equivalent message:

In [10]:
set_msg=hv.to_set_message(s)
set_msg

hint: WEIGHTED_SET
members {
  hint: SPARSE_CONSTRAINED
  model: MODEL_16M_12BIT
  stable_hash: 12686724306801746368
  sparse_constrained {
    seed: 13140439855417120236
  }
}
members {
  hint: SPARSE_CONSTRAINED
  model: MODEL_16M_12BIT
  stable_hash: 15352388533307249351
  sparse_constrained {
    seed: 7185127869584986079
  }
}
members {
  hint: SPARSE_CONSTRAINED
  model: MODEL_16M_12BIT
  stable_hash: 9510775318066912752
  sparse_constrained {
    seed: 8801786502350939704
  }
}
weights: 0.2
weights: 0.2
weights: 0.6
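The set message stores members and weights as two parallel repeated fields: the i-th weight belongs to the i-th member, so anything rebuilding the set should check that the lengths agree. A minimal plain-Python sketch of that layout, using a hypothetical `ToyWeightedSet` that holds member stable hashes (copied from the output above) instead of real vectors:

```python
from dataclasses import dataclass, field

@dataclass
class ToyWeightedSet:
    members: list = field(default_factory=list)  # e.g. stable hashes
    weights: list = field(default_factory=list)  # parallel to members

    def __post_init__(self):
        if len(self.members) != len(self.weights):
            raise ValueError("members and weights must have the same length")

    def to_dict(self) -> dict:
        # Mirrors the message layout: two parallel repeated fields.
        return {"hint": "WEIGHTED_SET",
                "members": list(self.members),
                "weights": list(self.weights)}

    @classmethod
    def from_dict(cls, d: dict) -> "ToyWeightedSet":
        return cls(members=list(d["members"]), weights=list(d["weights"]))

s = ToyWeightedSet(
    members=[12686724306801746368, 15352388533307249351, 9510775318066912752],
    weights=[0.2, 0.2, 0.6],
)
back = ToyWeightedSet.from_dict(s.to_dict())
assert back == s
for h, w in zip(back.members, back.weights):
    print(f"0x{h:016x} ({w:.2f})")
```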

`hv.from_set_message` converts the message back into a weighted set of hyper-vectors, and the result is identical to the original `s`.

In [11]:
back_set=hv.from_set_message(set_msg)
back_set.string()

'WeightedSet{0xb0104c8a363b61c0 (0.20) ,0xd50ea407d58f12c7 (0.20) ,0x83fd0fc2f3ee5df0 (0.60)}'

Finally, the `util` module also contains convenience functions such as `new_sequence` and `new_uniform_set`.

## Serialization for `bind` and `bundle` operations

Let's first perform a `bind` operation.

In [12]:
bound = hv.bind([a, b, c])

In [13]:
hv.to_message(bound)

hint: KNOT
model: MODEL_16M_12BIT
stable_hash: 10105111120497942738
knot {
  parts {
    hint: SPARSE_CONSTRAINED
    sparse_constrained {
      seed: 13140439855417120236
    }
  }
  parts {
    hint: SPARSE_CONSTRAINED
    sparse_constrained {
      seed: 7185127869584986079
    }
  }
  parts {
    hint: SPARSE_CONSTRAINED
    sparse_constrained {
      seed: 8801786502350939704
    }
  }
}

`knot` is a new type that records the result of a `bind` operation. The bound vector itself is not serialized; instead, the message records the individual inputs, and the result is recomputed on the fly during deserialization. This works because `bind` is deterministic and repeatable.
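The knot's store-the-inputs approach can be sketched in plain Python. Here XOR binding of dense binary vectors (the binding used in binary spatter codes) is swapped in as a stand-in for kongming's sparse `bind`, whose internals the notebook does not show; `toy_vector`, `toy_bind` and the dict-based message are hypothetical.

```python
import random
from functools import reduce

DIM_BITS = 1024  # hypothetical dimension for the toy

def toy_vector(seed: int) -> int:
    """A dense binary vector packed into an int, derived from a seed."""
    return random.Random(seed).getrandbits(DIM_BITS)

def toy_bind(vectors: list) -> int:
    """Stand-in bind: element-wise XOR (deterministic and, for XOR, order-independent)."""
    return reduce(lambda x, y: x ^ y, vectors)

def knot_message(seeds: list) -> dict:
    # Only the inputs are recorded, never the bound result.
    return {"hint": "KNOT", "parts": [{"seed": s} for s in seeds]}

def from_knot_message(msg: dict) -> int:
    # Regenerate each part from its seed, then recompute the binding.
    parts = [toy_vector(p["seed"]) for p in msg["parts"]]
    return toy_bind(parts)

seeds = [13140439855417120236, 7185127869584986079, 8801786502350939704]
bound = toy_bind([toy_vector(s) for s in seeds])
assert from_knot_message(knot_message(seeds)) == bound
```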

The `bundle` operation is a bit more complicated because it involves a seed: each seed identifies a unique, valid `bundle` operation within a family of operators.

In [14]:
bundled = hv.bundle(10, [a, b, c])

In [15]:
hv.to_message(bundled)

hint: NECKLACE
model: MODEL_16M_12BIT
stable_hash: 14435243644180154857
necklace {
  seed: 10
  pearls {
    hint: UNIFORM_SET
    members {
      hint: SPARSE_CONSTRAINED
      sparse_constrained {
        seed: 13140439855417120236
      }
    }
    members {
      hint: SPARSE_CONSTRAINED
      sparse_constrained {
        seed: 7185127869584986079
      }
    }
    members {
      hint: SPARSE_CONSTRAINED
      sparse_constrained {
        seed: 8801786502350939704
      }
    }
  }
}

The resulting `necklace` type faithfully records the individual members (the `pearls`, here a `UNIFORM_SET`) as well as the seed. Again, the bundled vector is computed on the fly during deserialization, so it never needs to be stored at rest.
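To see how a seed can pick one operation out of a family, consider a toy majority-vote bundle of dense binary vectors: with an even number of inputs some positions tie, and a seeded RNG decides each tie. Every seed then defines a different but reproducible bundle, so recording the seed plus the members is enough to recompute the result. This is an illustration only; kongming's sparse bundling is not shown in the notebook, and all names here are hypothetical.

```python
import random

DIM = 256  # hypothetical dimension

def toy_vector(seed: int) -> list:
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(DIM)]

def toy_bundle(op_seed: int, vectors: list) -> list:
    """Majority vote per position; ties are broken by an RNG seeded with op_seed."""
    tie_rng = random.Random(op_seed)
    out = []
    for column in zip(*vectors):
        ones = sum(column)
        zeros = len(column) - ones
        if ones != zeros:
            out.append(1 if ones > zeros else 0)
        else:
            out.append(tie_rng.randint(0, 1))  # the operation seed decides ties
    return out

def necklace_message(op_seed: int, member_seeds: list) -> dict:
    return {"hint": "NECKLACE", "seed": op_seed,
            "pearls": {"hint": "UNIFORM_SET",
                       "members": [{"seed": s} for s in member_seeds]}}

def from_necklace_message(msg: dict) -> list:
    members = [toy_vector(m["seed"]) for m in msg["pearls"]["members"]]
    return toy_bundle(msg["seed"], members)

member_seeds = [1, 2, 3, 4]  # an even count, so some positions tie
original = toy_bundle(10, [toy_vector(s) for s in member_seeds])
assert from_necklace_message(necklace_message(10, member_seeds)) == original

# A different operation seed selects a different member of the family.
other = toy_bundle(11, [toy_vector(s) for s in member_seeds])
print(sum(x != y for x, y in zip(original, other)), "positions differ")
```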

Feeling more adventurous? Let's bundle vectors with varying weights using `hv.bundle_with_weights`.

In [16]:
bundled_weights = hv.bundle_with_weights(10, [a, b, c], [0.4, 0.2, 0.4])
bundled_weights.string()

'Necklace{SparseConstrained(m=MODEL_16M_12BIT, hash=0x2b4d289048e51706), seed=0x000000000000000a, pearls:(sc(0xb65c386d6e85d5ec, 1), sc(0x63b6b3ba9c5f4fdf, 1), sc(0x7a263a30a574a638, 1))}'

In [17]:
hv.to_message(bundled_weights)

hint: NECKLACE
model: MODEL_16M_12BIT
stable_hash: 3120194717000996614
necklace {
  seed: 10
  pearls {
    hint: WEIGHTED_SET
    members {
      hint: SPARSE_CONSTRAINED
      sparse_constrained {
        seed: 13140439855417120236
      }
    }
    members {
      hint: SPARSE_CONSTRAINED
      sparse_constrained {
        seed: 7185127869584986079
      }
    }
    members {
      hint: SPARSE_CONSTRAINED
      sparse_constrained {
        seed: 8801786502350939704
      }
    }
    weights: 0.4
    weights: 0.2
    weights: 0.4
  }
}

This time the pearls form a `WEIGHTED_SET`, and the individual weights are faithfully recorded in the message.

In [18]:
hv.overlap(bundled_weights, a), hv.overlap(bundled_weights, b), hv.overlap(bundled_weights, c)

(1612, 871, 1619)

The overlaps are roughly proportional to the individual weights: `a` and `c` (weight 0.4 each) overlap with the bundle almost twice as much as `b` (weight 0.2).
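The proportionality can be reproduced with a toy model in which each member is a sparse set of active positions and the weighted bundle keeps a share of each member's positions proportional to its weight. This is a hypothetical construction chosen to show the effect, not kongming's `bundle_with_weights`; the dimension, sparsity and helpers are made up.

```python
import random

DIM = 1_000_000  # hypothetical dimension
ACTIVE = 1000    # hypothetical active positions per vector

def toy_sparse(seed: int) -> set:
    return set(random.Random(seed).sample(range(DIM), ACTIVE))

def toy_bundle_with_weights(op_seed: int, members: list, weights: list) -> set:
    """Keep round(w / sum(w) * ACTIVE) positions of each member, picked by a seeded RNG."""
    rng = random.Random(op_seed)
    total = sum(weights)
    out = set()
    for member, w in zip(members, weights):
        k = round(ACTIVE * w / total)
        out.update(rng.sample(sorted(member), k))  # sample() needs a sequence
    return out

def overlap(x: set, y: set) -> int:
    return len(x & y)

a, b, c = toy_sparse(1), toy_sparse(2), toy_sparse(3)
bundled = toy_bundle_with_weights(10, [a, b, c], [0.4, 0.2, 0.4])
# Roughly 400, 200, 400: proportional to the weights 0.4, 0.2, 0.4.
print(overlap(bundled, a), overlap(bundled, b), overlap(bundled, c))
```

Random sparse members rarely share positions, so each overlap is dominated by the positions deliberately kept from that member, which is why the counts track the weights.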