# Getting the embeddings

> This notebook computes the embeddings (latent-space representation) that an encoder
> (e.g., an autoencoder) produces for a multivariate time series.

In [32]:
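import os  # needed for os.environ in the config cell
from dvats.all import *
from tsai.data.preparation import SlidingWindow
from fastcore.all import *
from yaml import load, FullLoader
import wandb
wandb_api = wandb.Api()  # Public API client, used to fetch artifacts when tracking is disabled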

## Config parameters

Put here every parameter this notebook needs.

In [33]:
config = AttrDict(
    use_wandb = True, # Whether to use wandb for experiment tracking
    wandb_group = 'embeddings', # Name of the wandb group this run belongs to
    wandb_entity = os.environ['WANDB_ENTITY'], # The wandb entity (user or team)
    wandb_project = os.environ['WANDB_PROJECT'], # The wandb project
    enc_artifact = 'vrodriguezf90/deepvats/dcae:v1', # Name:version of the encoder artifact
    input_ar = None, # Dataset artifact to embed. If None, the dataset artifact used to train enc_artifact is used
    cpu = False # Whether to run the encoder on the CPU instead of the GPU
)
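The config above reads `WANDB_ENTITY` and `WANDB_PROJECT` with `os.environ[...]`, which raises a bare `KeyError` when either variable is unset, even if tracking is disabled. A minimal sketch of a more forgiving lookup, assuming placeholder values are acceptable when `use_wandb` is `False`; the helper name `wandb_settings` is hypothetical, and the `'work-nbs'` fallback simply mirrors the project name the notebook already uses for untracked runs.

```python
import os

def wandb_settings(use_wandb: bool) -> dict:
    """Resolve the wandb entity/project (hypothetical helper).

    With tracking enabled both variables are required and a clear error is raised;
    with tracking disabled, missing variables fall back to placeholder values.
    """
    if use_wandb:
        missing = [v for v in ('WANDB_ENTITY', 'WANDB_PROJECT') if v not in os.environ]
        if missing:
            raise RuntimeError(f'Set {", ".join(missing)} or pass use_wandb=False')
        return {'wandb_entity': os.environ['WANDB_ENTITY'],
                'wandb_project': os.environ['WANDB_PROJECT']}
    # Placeholders for disabled runs: no entity, and the notebook's offline project name
    return {'wandb_entity': os.environ.get('WANDB_ENTITY'),
            'wandb_project': os.environ.get('WANDB_PROJECT', 'work-nbs')}
```

The result could then be unpacked into the config, e.g. `AttrDict(use_wandb=True, **wandb_settings(True), ...)`.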

## Run

In [34]:
run = wandb.init(entity=config.wandb_entity,
                 project=config.wandb_project if config.use_wandb else 'work-nbs',
                 group=config.wandb_group,
                 job_type='embeddings',
                 mode='online' if config.use_wandb else 'disabled',
                 anonymous='never' if config.use_wandb else 'must',
                 config=config,
                 #id = 'embeddingsProvider',
                 resume='allow')

[34m[1mwandb[0m: Currently logged in as: [33mvrodriguezf90[0m. Use [1m`wandb login --relogin`[0m to force relogin



In [35]:
# Workaround to use artifacts offline: with tracking disabled there is no live run, so fetch them through the public API
artifacts_gettr = run.use_artifact if config.use_wandb else wandb_api.artifact

Restore the encoder model and its associated configuration

In [36]:
enc_artifact = artifacts_gettr(config.enc_artifact, type='learner')

In [37]:
# TODO: the first call to to_obj() sometimes fails and a second call succeeds; the cause is still unknown
try:
    enc_learner = enc_artifact.to_obj()
except Exception:
    enc_learner = enc_artifact.to_obj()
enc_learner

[34m[1mwandb[0m:   1 of 1 files downloaded.  


<fastai.learner.Learner at 0x7f221158a8f0>
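The cell above calls `to_obj()` a second time when the first call fails. A minimal sketch of the same workaround as a reusable helper, assuming the failure is transient and one retry is enough; `call_with_retry` is a hypothetical name. Unlike a bare `except`, it re-raises the last error if every attempt fails, so a persistent problem is not hidden.

```python
from typing import Callable, TypeVar

T = TypeVar('T')

def call_with_retry(fn: Callable[[], T], attempts: int = 2) -> T:
    """Call `fn` up to `attempts` times and return the first successful result."""
    if attempts < 1:
        raise ValueError('attempts must be >= 1')
    last_exc = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as exc:  # remember the error so it can be re-raised at the end
            last_exc = exc
    raise last_exc
```

In the notebook this would read `enc_learner = call_with_retry(enc_artifact.to_obj)`.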

Restore the dataset artifact used to train the encoder. Even if we do not compute the dimensionality reduction over this dataset, we need the metadata of the encoder's training set to check that it matches the dataset we want to reduce.
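The notebook does not show this check explicitly. A minimal sketch of one possible check, assuming the relevant metadata is the ordered list of variable (column) names, since the encoder expects the same input channels it was trained on; `check_compatible_vars` is hypothetical and not part of dvats.

```python
from typing import Sequence

def check_compatible_vars(train_vars: Sequence[str], input_vars: Sequence[str]) -> None:
    """Raise ValueError if the dataset to embed does not have the encoder's training variables."""
    if len(train_vars) != len(input_vars):
        raise ValueError(f'Encoder was trained on {len(train_vars)} variables, '
                         f'input has {len(input_vars)}')
    # Same count but different names or order would silently feed the wrong channels
    if list(train_vars) != list(input_vars):
        raise ValueError(f'Variable names/order differ: {list(train_vars)} vs {list(input_vars)}')
```

With pandas DataFrames for both datasets, the arguments would simply be their `.columns`.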

In [38]:
enc_run = enc_artifact.logged_by()
enc_artifact_train = artifacts_gettr(enc_run.config['train_artifact'], type='dataset')
enc_artifact_train.name

'toit:v0'

Now we specify the dataset artifact we want to get the embeddings from. If no
artifact is defined, the artifact to reduce will be the one used to train and validate the encoder.
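The default is resolved with fastcore's `ifnone(a, b)`, which returns `b` when `a` is `None` and `a` otherwise. A minimal sketch of the same resolution as a plain function; `resolve_input_artifact` is a hypothetical name, and the entity/project values in any call are placeholders.

```python
from typing import Optional

def resolve_input_artifact(input_ar: Optional[str], entity: str, project: str, name: str) -> str:
    """Return `input_ar` if given, else the full path 'entity/project/name' of the training artifact."""
    # Equivalent to ifnone(input_ar, f'{entity}/{project}/{name}')
    return input_ar if input_ar is not None else f'{entity}/{project}/{name}'
```

Building the full `entity/project/name:version` path matters because the artifact may live in a different project than the current run.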

In [39]:
input_ar_name = ifnone(config.input_ar, 
                       f'{enc_artifact_train.entity}/{enc_artifact_train.project}/{enc_artifact_train.name}')
wandb.config.update({'input_ar': input_ar_name}, allow_val_change=True)
input_ar = artifacts_gettr(input_ar_name)
input_ar.name

'toit:v0'

In [40]:
df = input_ar.to_df()
df

[34m[1mwandb[0m:   1 of 1 files downloaded.  


| Unnamed: 0 | Piazza_Vanvitelli |
|---|---|
| 2019-12-04 13:00:00 | 0.364095 |
| 2019-12-04 14:00:00 | 0.532166 |
| 2019-12-04 15:00:00 | 0.661551 |
| 2019-12-04 16:00:00 | 0.552637 |
| 2019-12-04 17:00:00 | 0.684569 |
| ... | ... |
| 2020-02-29 19:00:00 | 0.803531 |
| 2020-02-29 20:00:00 | 0.817342 |
| 2020-02-29 21:00:00 | 0.890509 |
| 2020-02-29 22:00:00 | 0.919360 |


In [41]:
df.shape

(2099, 1)

In [42]:
enc_input, _ = SlidingWindow(window_len=enc_run.config['w'], 
                             stride=enc_run.config['stride'], 
                             get_y=[])(df)
enc_input.shape

(2076, 1, 24)
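With $n$ time steps, window length $w$ and stride $s$ (and no padding of the last incomplete window), the number of windows is $\lfloor (n - w)/s \rfloor + 1$. The output shape is consistent with $w = 24$ (visible in the shape) and $s = 1$ (inferred, not shown): $(2099 - 24)/1 + 1 = 2076$. A minimal NumPy sketch of this windowing for illustration; it is not tsai's `SlidingWindow` implementation, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def n_windows(n_steps: int, w: int, stride: int = 1) -> int:
    """Number of full windows of length w taken every `stride` steps (no padding)."""
    return (n_steps - w) // stride + 1

def sliding_windows(x: np.ndarray, w: int, stride: int = 1) -> np.ndarray:
    """Split a (n_steps, n_vars) array into windows shaped (n_windows, n_vars, w)."""
    # Windowing along the time axis appends the window dimension last,
    # giving shape (n_steps - w + 1, n_vars, w)
    windows = sliding_window_view(x, window_shape=w, axis=0)
    # Keep every `stride`-th window
    return windows[::stride]
```

For example, `sliding_windows(np.zeros((2099, 1)), 24).shape` is `(2076, 1, 24)`, the same layout as `enc_input`.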

In [43]:
embs = get_enc_embs(enc_input, enc_learner, cpu=config.cpu, to_numpy=True)
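The internals of `get_enc_embs` are not shown here. As a rough illustration of the general idea only, a minimal NumPy sketch of batched encoder inference, assuming the encoder is any callable that maps a batch of windows shaped `(batch, n_vars, w)` to a batch of embedding vectors. `batched_embeddings` and the random linear `toy_encoder` are hypothetical stand-ins, not the dvats implementation or the trained DCAE.

```python
from typing import Callable
import numpy as np

def batched_embeddings(windows: np.ndarray,
                       encoder: Callable[[np.ndarray], np.ndarray],
                       batch_size: int = 256) -> np.ndarray:
    """Apply `encoder` to `windows` in mini-batches and stack the outputs row-wise."""
    chunks = [encoder(windows[i:i + batch_size])
              for i in range(0, len(windows), batch_size)]
    return np.concatenate(chunks, axis=0)

# Hypothetical stand-in encoder: flattens each (1, 24) window and projects it to 8 dimensions
rng = np.random.default_rng(0)
W = rng.normal(size=(24, 8))

def toy_encoder(batch: np.ndarray) -> np.ndarray:
    return batch.reshape(len(batch), -1) @ W
```

Batching bounds memory use on the GPU; the result has one embedding row per window, in window order.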

In [44]:
if config.use_wandb:
    run.log_artifact(ReferenceArtifact(embs, 'embeddings', metadata=dict(run.config)),
                     aliases=[f'run-{run.project}-{run.id}'])

In [45]:
run.finish()
