# Design Notes

## Main goal of pipeline maker

* We want the user to specify a **collection of objects**.
* More concretely:
  * Connect a **datasource** to an existing dacc, in a normalized form (say, a dacc using WfStore and AnnotStore).
  * The datasource can be anything, but the (raw) values we want are numpy arrays (int16).
  * The keys can be deduced from the items after a sequence of transformations (see mk_item2kv_for and other functions in dol, for example).
  * Another tool to use: dol/filt_iter on the store.
  * We can use the dol/Files interface to get the raw material (for example, FileBytesReader).

* At the **GUI level**: the user makes a collection by adding funcs
  * 'select one': from a pre-existing dropdown list
  * 'make one', then select it (this will use crude: we select through the crude name)

* At the **backend level**:
  * Two stores are provided: a store of funcs (populated with, say, 2-3 funcs) and a store of factories.
  * The user makes a collection of functions through the GUI (in the form of Slabs or not).
  * The user needs to populate the items. Behind the scenes, we provide functions that create the list of objects to select from. Those functions can be provided as keys of a mall. If the collection of keys is very large, it is accessed via a smartlist. The function that creates the items could be as simple as the store's dunder iter.
  * Making one element: we provide a store of factories. Their signatures determine how the front end renders each one. For example, if a factory needs some params from the user, the front end creates a page showing the factory's arguments. The user enters the arguments, then clicks to save the result in the (dill) store of functions.
  * In summary: there are two stores, a store of funcs and a store of factories.

* **Remark**: crude does not appear at the level of the React components.

* **Example**: make a pipeline to create a Slabs example (i.e. a store exists and is transformed by adding more stuff to it).
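
The backend flow described above (a store of factories whose signatures drive the front-end form, and a store of funcs where the results are saved) could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: plain dicts stand in for the (dill) stores, plain lists stand in for numpy arrays, and `mk_clipper`, `form_spec` and `make_and_save` are hypothetical names introduced only for this sketch.

```python
import inspect
from functools import partial


def clipper(values, a_min=0, a_max=10):
    """Clip every value to the closed interval [a_min, a_max]."""
    return [min(max(v, a_min), a_max) for v in values]


def mk_clipper(a_max: int = 10, a_min: int = 0):
    """Hypothetical factory: returns a clipper with fixed bounds."""
    return partial(clipper, a_min=a_min, a_max=a_max)


# The two stores; plain dicts stand in for the (dill) stores of the notes.
funcs_store = {}
factories_store = {'mk_clipper': mk_clipper}


def form_spec(factory):
    """Describe the inputs a front end would need to render for `factory`."""
    spec = []
    for name, param in inspect.signature(factory).parameters.items():
        has_default = param.default is not inspect.Parameter.empty
        spec.append({
            'name': name,
            'default': param.default if has_default else None,
            'required': not has_default,
        })
    return spec


def make_and_save(factory_name, save_as, **user_inputs):
    """Call the chosen factory with the user's inputs and store the result."""
    funcs_store[save_as] = factories_store[factory_name](**user_inputs)
    return save_as


# 'make one': the form fields come from the factory's signature ...
fields = form_spec(factories_store['mk_clipper'])
# ... and the user's entries are used to build and save a new func.
make_and_save('mk_clipper', save_as='clipper_4', a_max=4)
# 'select one': pick the saved func by name and use it.
clipped = funcs_store['clipper_4']([0, 8, 3, 4, 5])
```

The point of the sketch is that the front end never needs to know a factory in advance: everything it renders is derived from the signature, so adding a factory to the store is enough to make it available in the GUI.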
  

## Comments by Thor
Looked through in a noisy env, so not the level of reading I would want, but from my current reading, yes, that’s it.
Now you probably want to write a bit more code, like
* sets of object and factory stores around our use cases (making stores (with dol wrappers), making train pipelines, making test pipelines, making things (slabiter instances) that will run on live streams)
* what patterns emerge?
* this is sort of a mini-framework to make GUIs (or for now backend to GUIs): What sort of tools would make this framework easier to use (for example, what kind of stores or store wrappers).
* Misc questions.

Here’s an example of a misc question: Should object and factory stores be separate?
It makes it nice and clean, yes, until…
* You realize that the composite objects (e.g. Pipe instances or SlabIter objects) are… objects themselves that you’d like to reuse perhaps.
* You realize that factories can be partialized (still factories though, but where do you save them?)
So perhaps instead of two stores, we should have one (on the surface at least), but with a “kind” and ability to filter on it.
For example… (
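
A minimal sketch of the single-store idea, under assumed names: one store whose records carry a "kind", plus a filter on that kind. `KindStore`, `add` and `of_kind` are hypothetical and only illustrate the shape of the idea. This is not the example Thor started above, and plain lists stand in for numpy arrays.

```python
from functools import partial


def clipper(values, a_min=0, a_max=10):
    """Clip every value to the closed interval [a_min, a_max]."""
    return [min(max(v, a_min), a_max) for v in values]


def mk_clipper(a_max, a_min=0):
    """Hypothetical factory returning a clipper with fixed bounds."""
    return partial(clipper, a_min=a_min, a_max=a_max)


class KindStore(dict):
    """A dict whose values are {'kind': ..., 'obj': ...} records."""

    def add(self, key, obj, kind):
        self[key] = {'kind': kind, 'obj': obj}

    def of_kind(self, kind):
        """Return a {key: obj} mapping restricted to one kind."""
        return {k: v['obj'] for k, v in self.items() if v['kind'] == kind}


store = KindStore()
store.add('clipper', clipper, kind='object')
store.add('mk_clipper', mk_clipper, kind='factory')
# A partialized factory is still a factory, and now has an obvious home.
store.add('mk_clipper_from_zero', partial(mk_clipper, a_min=0), kind='factory')
# An object made by a factory is saved as an object, ready for reuse.
clipper_4 = store['mk_clipper_from_zero']['obj'](a_max=4)
store.add('clipper_4', clipper_4, kind='object')

factory_names = sorted(store.of_kind('factory'))
object_names = sorted(store.of_kind('object'))
```

On the surface there is one store; the two-store view of the design notes is recovered by filtering on the kind.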

# A Slabs example

* random generator: provided directly
* clipper: clip above fixed value
* threshold maker: a factory
* (jump finder: a factory (define jump))


In [39]:
import numpy as np
from functools import partial
from know.base import Slabs


DFLT_SIZE = 5
DFLT_WIDTH = 10

def random_gen(width=DFLT_WIDTH, size=DFLT_SIZE):
    """Yield arrays of `size` random ints drawn from [0, width), forever."""
    while True:
        yield np.random.randint(width, size=size)

def clipper(arr, a_min=0, a_max=10):
    return np.clip(arr, a_min, a_max)

def threshold_factory():
    """Ask the user for a max value and return a clipper bound to it."""
    max_val = int(input())
    return partial(clipper, a_min=0, a_max=max_val)


data = random_gen().__next__
clipper_4 = partial(clipper, a_min=0, a_max=4)

funcs_store = {'clipper': clipper_4}
factories_store = {'threshold': threshold_factory}

transform = funcs_store['clipper']


In [40]:
s = Slabs(
    data = data, 
    transform = lambda data: data>3
)

In [41]:
next(s)

{'data': array([0, 8, 3, 4, 5]),
 'transform': array([False,  True, False,  True,  True])}
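
The output above suggests that each component is called with the earlier outputs whose names match its parameters: `data` takes no argument, and `transform` receives the value stored under `data`. A rough, stdlib-only mimic of that wiring is sketched below. `mini_slabs` is a hypothetical name, this is not how `know.base.Slabs` is implemented, and plain lists stand in for numpy arrays.

```python
import inspect
from itertools import cycle


def mini_slabs(**components):
    """Yield dicts built by calling each component in order.

    A component's parameter names select which earlier outputs it receives.
    """
    while True:
        slab = {}
        for name, func in components.items():
            arg_names = inspect.signature(func).parameters
            slab[name] = func(**{arg: slab[arg] for arg in arg_names})
        yield slab


# A fixed, repeating source replaces the random generator for reproducibility.
source = cycle([[0, 8, 3, 4, 5], [1, 2, 9, 9, 0]])

slabs = mini_slabs(
    data=lambda: next(source),
    transform=lambda data: [x > 3 for x in data],
)
first = next(slabs)
```

With this wiring, selecting a func from the funcs store and naming its parameter after an existing component is all that is needed to append a step to the pipeline.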

# Examples centered around our use cases

Sets of object and factory stores around our use cases: making stores (with dol wrappers), making train pipelines, making test pipelines, and making things (slabiter instances) that will run on live streams.

## local files

In [134]:
import os
import soundfile as sf
import pandas as pd
import zipfile
from io import BytesIO
from pathlib import Path

from dol.appendable import add_append_functionality_to_store_cls
from dol import Store
from dol import wrap_kvs, filt_iter, FuncReader

from py2store import FilesOfZip
from hear import WavLocalFileStore



def my_obj_of_data(b):
    return sf.read(BytesIO(b), dtype="float32")[0]

@wrap_kvs(obj_of_data=my_obj_of_data)
@filt_iter(filt=lambda x: not x.startswith("__MACOSX") and x.endswith(".wav"))
class WfZipStore(FilesOfZip):
    """Waveform access. Keys are .wav filenames and values are numpy arrays of float32 waveform samples."""

    pass


def key_to_ext(k):
    _, ext = os.path.splitext(k)
    if ext.startswith("."):
        ext = ext[1:]
    return ext


def processor_from_ext(ext):
    if ext.startswith("."):
        ext = ext[1:]
    if ext in {"zip"}:
        pass
    elif ext in {"wav"}:
        pass

def is_zip_file(filepath):
    return zipfile.is_zipfile(filepath)

def is_dir(filepath):
    return os.path.isdir(filepath)

def key_maker(name, prefix):
    return f'{prefix}_{name}'

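def wf_store_factory(filepath):
    key = key_maker(name=filepath, prefix='wf_store')
    tag = 'wf_store'

    if is_dir(filepath):
        data = WavLocalFileStore(filepath)
    elif is_zip_file(filepath):
        data = WfZipStore(filepath)
    else:
        # Without this branch, `data` would be unbound below.
        raise ValueError(f'Expected a directory or a zip file, got: {filepath}')

    return mk_store_item(key, tag, data)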

def annot_store_factory(filepath):
    key = key_maker(name = filepath, prefix='annot_store')
    tag = 'annot_store'

    data = pd.read_csv(filepath)

    return mk_store_item(key, tag, data)

def mk_store_item(key, tag, data):
    return dict(key = key, tag = tag, data=data)

def append_to_store(store, item):
    store.append(item)

def dacc_factory():
    pass

# NOTE: this dict is immediately replaced by the FuncReader below.
factory_store = {'wf_store': wf_store_factory, 'dacc':None}

factory_store = FuncReader([wf_store_factory, dacc_factory])

store_cls = Store
item2kv = lambda item: (item['key'], item)

appendable_store_cls = add_append_functionality_to_store_cls(store_cls, item2kv=item2kv)
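
What the appendable store is used for here is that `append(item)` derives the key from the item through `item2kv`, so the caller never supplies a key. A stdlib-only sketch of that behaviour follows. `AppendableDict` is a hypothetical stand-in and not dol's implementation of `add_append_functionality_to_store_cls`.

```python
class AppendableDict(dict):
    """A dict with an `append` method that derives the key from the item."""

    def __init__(self, item2kv):
        super().__init__()
        self._item2kv = item2kv

    def append(self, item):
        key, value = self._item2kv(item)
        self[key] = value


def mk_store_item(key, tag, data):
    return dict(key=key, tag=tag, data=data)


# Same item2kv as in the cell above: the item's own 'key' field is the key.
store = AppendableDict(item2kv=lambda item: (item['key'], item))
store.append(mk_store_item('wf_store_a', 'wf_store', data=[1, 2, 3]))
store.append(mk_store_item('annot_store_b', 'annot_store', data={'tag': ['x']}))

# The tag makes it possible to list only the items of one sort.
wf_keys = [k for k, v in store.items() if v['tag'] == 'wf_store']
```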

In [135]:
# example
rootdir = '/Users/sylvain/Dropbox/Otosense/VacuumEdgeImpulse/'
annot_filepath = '/Users/sylvain/Dropbox/sipyb/Testing/data/annots_vacuum.csv'
is_dir(rootdir)


True

# pipeline maker

In [136]:
# user story
global_store = appendable_store_cls()

wf_store_item = wf_store_factory(filepath=rootdir)
global_store.append(wf_store_item)
annot_store_item = annot_store_factory(filepath = annot_filepath)
global_store.append(annot_store_item)





In [138]:
list(global_store.keys())

['wf_store_/Users/sylvain/Dropbox/Otosense/VacuumEdgeImpulse/',
 'annot_store_/Users/sylvain/Dropbox/sipyb/Testing/data/annots_vacuum.csv']

In [None]:
# AnnotStore


In [130]:
from functools import partial

import numpy as np
from sklearn.preprocessing import normalize
from hear import WavLocalFileStore
from dol import wrap_kvs
import soundfile as sf
from io import BytesIO
from odat.utils.chunkers import fixed_step_chunker
from slang.featurizers import tile_fft
import pandas as pd
from odat.mdat.vacuum import annot_columns, DFLT_ANNOTS_COLS

DFLT_CHUNKER = partial(fixed_step_chunker, chk_size=2048)
DFLT_FEATURIZER = tile_fft


class Dacc:
    def __init__(self, wf_store):
        self.wfs = wf_store

    def mk_annots(self):
        srefs = self.wfs.keys()
        annots = annot_columns(srefs)
        return annots

    @property
    def annots_df(self):
        annots = self.mk_annots()
        columns = DFLT_ANNOTS_COLS
        df = pd.DataFrame(annots, columns=columns)
        return df

    def wf_tag_train_gen(self):
        for key in self.wfs:
            signal = self.wfs[key]
            # Keys are assumed to look like "<train|test>/<tag>.<ext>"
            train = key.split("/")[0]
            tag = key.split("/")[1].split(".")[0]
            normal_wf = normalize(np.float32(signal).reshape(1, -1))[0]

            yield normal_wf, tag, train

    def chk_tag_train_gen(self, chunker=DFLT_CHUNKER):
        for wf, tag, train in self.wf_tag_train_gen():
            for chk in chunker(wf):
                yield chk, tag, train

    def fvs_tag_train_gen(self, featurizer=DFLT_FEATURIZER):
        for chk, tag, train in self.chk_tag_train_gen():
            yield featurizer(chk), tag, train

    def mk_Xy(self):  # TODO use a groupby here
        X_train, y_train, X_test, y_test = [], [], [], []
        for fv, tag, train in self.fvs_tag_train_gen():
            if train == "train":
                X_train.append(fv)
                y_train.append(tag)
            elif train == "test":
                X_test.append(fv)
                y_test.append(tag)
            else:
                continue
        return np.array(X_train), y_train, np.array(X_test), y_test
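
The TODO on `mk_Xy` asks for a groupby. One possible shape for it, as a stdlib-only sketch: group the `(fv, tag, train)` triples by their split label in a single pass. `split_Xy` is a hypothetical helper, and short lists stand in for the feature vectors. The real method would still wrap each `X` in `np.array`.

```python
from collections import defaultdict


def split_Xy(fv_tag_train_triples, splits=('train', 'test')):
    """Group (fv, tag, train) triples by their split label.

    Returns {split: (X, y)}. Triples with any other label are skipped,
    as in the `else: continue` branch of `mk_Xy`.
    """
    grouped = defaultdict(lambda: ([], []))
    for fv, tag, split in fv_tag_train_triples:
        if split in splits:
            X, y = grouped[split]
            X.append(fv)
            y.append(tag)
    return {split: grouped[split] for split in splits}


triples = [
    ([0.1, 0.2], 'normal', 'train'),
    ([0.3, 0.4], 'faulty', 'test'),
    ([0.5, 0.6], 'faulty', 'train'),
    ([0.7, 0.8], 'normal', 'validation'),  # skipped: not a requested split
]
Xy = split_Xy(triples)
X_train, y_train = Xy['train']
X_test, y_test = Xy['test']
```

Keeping the split names as a parameter means a further split (say, validation) would not require another pair of hand-written lists.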

In [131]:
annot_columns??

Signature: annot_columns(srefs)
Docstring: <no docstring>
Source:   
def annot_columns(srefs):
    return list(map(extract_annot_info, srefs))
File:      ~/Desktop/dev/otosense/odat/odat/mdat/vacuum.py
Type:      function


# Scrap

In [126]:
# factory
from functools import singledispatch
from dataclasses import dataclass


@dataclass
class DirZips:
    pass


@dataclass
class DirWavs:
    pass


@singledispatch
def process(obj=None):
    raise NotImplementedError("Can't create a SourceStore from that directory")


@process.register(DirZips)
def _process_dir_zips(obj):
    return "DirZips processed successfully!"


@process.register(DirWavs)
def _process_dir_wavs(obj):
    return "DirWavs processed successfully!"
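
A usage sketch for the dispatch idea above, under stated assumptions: the marker classes are given a hypothetical `rootdir` field (the originals have no fields), and a hypothetical `classify_dir` helper picks the marker class from a list of filenames rather than from the filesystem, so the example runs anywhere.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class DirZips:
    rootdir: str  # hypothetical field, not in the original class


@dataclass
class DirWavs:
    rootdir: str  # hypothetical field, not in the original class


def classify_dir(rootdir, filenames):
    """Wrap `rootdir` in the marker class matching its (uniform) contents."""
    extensions = {
        name.rsplit('.', 1)[-1].lower() for name in filenames if '.' in name
    }
    if extensions == {'zip'}:
        return DirZips(rootdir)
    if extensions == {'wav'}:
        return DirWavs(rootdir)
    return rootdir  # unrecognized: falls through to the default `process`


@singledispatch
def process(obj):
    raise NotImplementedError(f"Can't create a SourceStore from {obj!r}")


@process.register(DirZips)
def _process_dir_zips(obj):
    return f'zip store for {obj.rootdir}'


@process.register(DirWavs)
def _process_dir_wavs(obj):
    return f'wav store for {obj.rootdir}'


wav_result = process(classify_dir('recordings', ['a.wav', 'b.WAV']))
zip_result = process(classify_dir('archives', ['a.zip']))
```

Compared with the if/elif chain in `wf_store_factory`, this keeps the detection of the source type separate from the handling of it, so supporting a new source type means adding one marker class and one registered function.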