# Getting the embeddings

This notebook gets the embeddings (or latent space) of a multivariate time series
as produced by an encoder (e.g., an autoencoder).

> It splits the dataset with a sliding window view instead of a plain sliding window.
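
As a rough illustration of what a sliding window view is, the sketch below uses NumPy's `sliding_window_view`, which returns a strided view of the data instead of copying every window. The toy array and the window length `w = 3` are assumptions made for the example; this is not necessarily the code path that `prepare_forecasting_data` follows internally.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Toy multivariate series: 6 time steps, 2 variables (illustrative data only).
series = np.arange(12, dtype=float).reshape(6, 2)
w = 3  # window length (assumed value for this example)

# Slide along the time axis. The window dimension is appended as the last axis,
# so the result has shape (n_windows, n_vars, w).
windows = sliding_window_view(series, window_shape=w, axis=0)

print(windows.shape)                       # (4, 2, 3)
print(np.shares_memory(windows, series))   # True: the windows are views, not copies
```

Because the windows share memory with the original array, building them is cheap even for long series; the cost is only paid when a window is actually read or copied.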

## Set-up
Initial notebook setup, plus selection of debugging options and pre-configured cases.
### VSCode update patch
Initial notebook setup when using VSCode.

In [1]:
# This is only needed if the notebook is run in VSCode
import sys
import dvats.utils as ut
if '--vscode' in sys.argv:
    print("Executing inside vscode")
    ut.DisplayHandle.update = ut.update_patch

### Debugging variables

- `verbose`. If `> 0`, it adds debugging messages to the functions that support it (e.g. `get_enc_embeddings`).
- `reset_kernel`. If `True`, it resets the kernel at the end of the execution. Use it only when memory management is needed.
- `check_memory_usage`. If `True`, it adds some lines that check the GPU memory usage along the execution.
- `time_flag`. If `True`, it measures the execution time along the notebook as well as inside the functions that support it (e.g. `get_enc_embeddings`).

In [2]:
verbose = 0
reset_kernel = False
check_memory_usage = True
time_flag = True

## Main code
### Import libraries

In [3]:
from dvats.all import *
from tsai.data.preparation import prepare_forecasting_data
from tsai.data.validation import get_forecasting_splits
from fastcore.all import *
import wandb
from yaml import load, FullLoader
if check_memory_usage:
    import torch 

### Initialize and Configure Artifact

In [4]:
wandb_api = wandb.Api()

In [5]:
if check_memory_usage:
    gpu_device = torch.cuda.current_device()
    gpu_memory_status(gpu_device)

GPU | Used mem: 12
GPU | Used mem: 24
GPU | Memory Usage: [██████████----------] 50%
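
A status line like the one above can be approximated in a few lines of plain Python. The sketch below is a hypothetical `memory_bar` helper, not the `gpu_memory_status` implementation from `dvats`; it assumes that the two reported figures are used and total memory and that the bar has 20 cells.

```python
def memory_bar(used: float, total: float, width: int = 20) -> str:
    """Render a text bar for used/total memory (hypothetical helper, not the dvats API)."""
    fraction = used / total
    filled = int(fraction * width)  # number of filled cells, rounded down
    bar = "█" * filled + "-" * (width - filled)
    return f"GPU | Memory Usage: [{bar}] {round(fraction * 100)}%"

print(memory_bar(12, 24))
```

Keeping the formatting separate from the query of the device makes the helper easy to check without a GPU.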


#### Get configuration parameters from yml
> Configuration parameters are read from `config/03-embeddings.yaml`.

In [6]:
config, job_type = get_artifact_config_embeddings_swv(verbose = verbose)
if verbose: show_attrdict(config)

### Setup W&B artifact

In [7]:
import os
path = os.path.expanduser("~/work/nbs_pipeline/")
name="03b_embeddings-sliding_window_view"
os.environ["WANDB_NOTEBOOK_NAME"] = path+name+".ipynb"
runname=name
print("runname: "+runname)

runname: 03b_embeddings-sliding_window_view


In [8]:
run = wandb.init(
    entity      = config.wandb_entity,
    project     = config.wandb_project if config.use_wandb else 'work-nbs', 
    group       = config.wandb_group,
    job_type    = job_type,
    mode        = 'online' if config.use_wandb else 'disabled',
    anonymous   = 'never' if config.use_wandb else 'must',
    config      = config,
    resume      = 'allow',
    name        = runname
)

wandb: Currently logged in as: mi-santamaria. Use `wandb login --relogin` to force relogin


'stream.Stream' object attribute 'write' is read-only


### Get trained model artifact

##### Build artifact selector
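Workaround to use artifacts offline: when W&B is disabled, artifacts are fetched through the public API instead of through the run.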

In [9]:
artifacts_gettr = run.use_artifact if config.use_wandb else wandb_api.artifact

##### Get the model from W&B
Restore the encoder model and its associated configuration

In [10]:
enc_artifact = artifacts_gettr(config.enc_artifact, type='learner')

In [11]:
if verbose > 0: print(enc_artifact.metadata)

In [12]:
print("enc_artifact: "+enc_artifact.name)

enc_artifact: mvp-SWV:v22


In [13]:
# TODO: this only works on the second attempt; the cause is still unknown, so retry once.
try:
    enc_learner = enc_artifact.to_obj()
except Exception:
    enc_learner = enc_artifact.to_obj()

wandb:   1 of 1 files downloaded.  
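
The "try it again" pattern used in the cell above can be made explicit with a small retry helper. This is only a sketch: `call_with_retry` and `flaky_load` are hypothetical names, and `flaky_load` merely stands in for `enc_artifact.to_obj()`, whose real failure mode is not documented here.

```python
import time

def call_with_retry(fn, attempts=2, delay=0.0):
    """Call fn(); on failure, wait `delay` seconds and try again, up to `attempts` calls."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as error:  # broad on purpose: the failure mode is unknown
            last_error = error
            time.sleep(delay)
    raise last_error

# Stand-in for enc_artifact.to_obj(): fails on the first call, succeeds on the second.
calls = {"n": 0}

def flaky_load():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("first attempt failed")
    return "learner"

result = call_with_retry(flaky_load)
print(result, calls["n"])  # learner 2
```

Unlike a bare `try`/`except`, the helper re-raises the last error once the attempts are exhausted, so a persistent failure is not hidden.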


## Get dataset artifact from W&B
### Restore the dataset artifact used for training the encoder
> Even if we do not compute the dimensionality reduction over this dataset, we need to know the metadata of the encoder training set, to check that it matches with the dataset that we want to reduce.

In [14]:
enc_run = enc_artifact.logged_by()
enc_artifact_train = artifacts_gettr(enc_run.config['train_artifact'], type='dataset')
enc_artifact_train.name

'Monash-Australian_electricity_demand:v8'

### Specify the dataset artifact that we want to get the embeddings from
> If no artifact is defined, the artifact to reduce will be the one used to train the encoder.
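
The fallback described above relies on `ifnone`, which returns its second argument only when the first one is `None`. The sketch below re-implements that behaviour in plain Python so it can be tried in isolation; the artifact names are made-up placeholders, not real W&B paths.

```python
def ifnone(a, b):
    """Return `b` when `a` is None, otherwise `a` (same behaviour as fastcore's `ifnone`)."""
    return b if a is None else a

# Hypothetical values standing in for config.input_ar and the default artifact path.
default_name = "entity/project/dataset:v0"

print(ifnone(None, default_name))                       # entity/project/dataset:v0
print(ifnone("entity/project/other:v1", default_name))  # entity/project/other:v1
```

Note that only `None` triggers the fallback: an empty string in the configuration would be kept as-is.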

In [15]:
input_ar_name = ifnone(
    config.input_ar, 
    f'{enc_artifact_train.entity}/{enc_artifact_train.project}/{enc_artifact_train.name}'
)
wandb.config.update({'input_ar': input_ar_name}, allow_val_change=True)
input_ar = artifacts_gettr(input_ar_name)
input_ar.name

'Monash-Australian_electricity_demand:v8'

In [16]:
df = input_ar.to_df()

if verbose > 0:
    display(df.head())
    print("df ~ ", df.shape)

wandb:   1 of 1 files downloaded.  


### Split data with Sliding Window

In [17]:
import time

In [18]:
w = enc_run.config['w']
if verbose > 0:print(w)

In [19]:
if check_memory_usage: gpu_memory_status(gpu_device)

GPU | Used mem: 11
GPU | Used mem: 24
GPU | Memory Usage: [█████████-----------] 46%


In [20]:
type(df)

pandas.core.frame.DataFrame

In [21]:
if time_flag: t_start = time.time()

enc_input, _ = prepare_forecasting_data(df, fcst_history = w)

if time_flag: 
    t_end = time.time()
    t = t_end - t_start
    print("SW start | " , t_start, " | end ", t_end, "total (secs): ", t)

if verbose > 0:print(enc_input.shape)

SW start |  1725543925.7887301  | end  1725543925.7895868 total (secs):  0.0008566379547119141
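
The `if time_flag:` timing pattern recurs in several cells. One possible way to factor it out is a small context manager; `timed` below is a hypothetical helper written for this illustration, not part of `dvats`, and the summation only stands in for the windowing call.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, enabled=True):
    """Measure the wall-clock time of the enclosed block; print it when `enabled` is True."""
    result = {}
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["seconds"] = time.perf_counter() - start
        if enabled:
            print(f"{label} | total (secs): {result['seconds']:.6f}")

with timed("SW", enabled=True) as timing:
    total = sum(range(100_000))  # stand-in for the call being timed
```

`time.perf_counter()` is used instead of `time.time()` because it is monotonic and better suited to short intervals such as the sub-millisecond one reported above.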


In [22]:
if check_memory_usage: gpu_memory_status(gpu_device)

GPU | Used mem: 11
GPU | Used mem: 24
GPU | Memory Usage: [█████████-----------] 46%


### Get embeddings

In [23]:
stride = enc_run.config['stride']
batch_size = enc_run.config['batch_size']
#enc_learner.dls.bs = enc_run.config['batch_size']

In [24]:
if verbose > 0:
    print(stride)
    print(batch_size)
    print(enc_input.shape)
    print(enc_artifact.name)

In [25]:
print(enc_input.shape)

(232242, 1, 30)


In [26]:
if check_memory_usage: gpu_memory_status(gpu_device)

GPU | Used mem: 11
GPU | Used mem: 24
GPU | Memory Usage: [█████████-----------] 46%


In [27]:
print(enc_run.config['stride'])
print(enc_run.config['batch_size'])

15
512


In [28]:
enc_input.shape

(232242, 1, 30)

In [29]:
embs = get_enc_embs(
    X = enc_input, 
    enc_learn = enc_learner, 
    cpu=config.cpu, 
    to_numpy=True, 
    verbose = verbose
)

In [30]:
if time_flag: t_start = time.time()

embs = get_enc_embs_set_stride_set_batch_size(
    X = enc_input, 
    enc_learn = enc_learner, 
    stride = stride, 
    batch_size = batch_size, 
    cpu=config.cpu, 
    to_numpy=True, 
    verbose = verbose,
    time_flag = time_flag,
    chunk_size = 1000,
    check_memory_usage = check_memory_usage
)
if time_flag:
    t_end = time.time()
    t = t_end - t_start
    print("GE start | " , t_start, " | end ", t_end, "total (secs): ", t)

GPU | Used mem: 19
GPU | Used mem: 24
GPU | Memory Usage: [███████████████-----] 79%
get_enc_embs_set_stride_set_batch_size 1.9706542491912842 seconds
GPU | Used mem: 12
GPU | Used mem: 24
GPU | Memory Usage: [██████████----------] 50%
GE start |  1725544015.0022922  | end  1725544017.0640519 total (secs):  2.0617597103118896
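
To give an idea of what the `stride` and `chunk_size` arguments imply, the sketch below keeps one window out of every `stride` and encodes the selection a chunk at a time, so that only one chunk has to be held by the encoder at once. It is a simplified assumption about the general technique, not the actual body of `get_enc_embs_set_stride_set_batch_size`; `embed_in_chunks` and `toy_encoder` are hypothetical names, and the interaction with `batch_size` is left out.

```python
import numpy as np

def embed_in_chunks(X, encode, stride=1, chunk_size=1000):
    """Encode every `stride`-th window of X, `chunk_size` windows at a time."""
    selected = X[::stride]  # keep one window out of every `stride`
    outputs = []
    for start in range(0, len(selected), chunk_size):
        chunk = selected[start:start + chunk_size]
        outputs.append(encode(chunk))  # only one chunk is processed at a time
    return np.concatenate(outputs, axis=0)

def toy_encoder(batch):
    """Stand-in for the real encoder: average each window over time -> (n, n_vars)."""
    return batch.mean(axis=-1)

X_toy = np.random.default_rng(0).normal(size=(100, 1, 30))
toy_embs = embed_in_chunks(X_toy, toy_encoder, stride=15, chunk_size=4)
print(toy_embs.shape)  # (7, 1)
```

With 100 windows and a stride of 15, seven windows are selected and processed as one chunk of four and one chunk of three.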


In [31]:
%debug

ERROR:root:No traceback has been produced, nothing to debug.


In [32]:
if check_memory_usage: gpu_memory_status(gpu_device)

GPU | Used mem: 12
GPU | Used mem: 24
GPU | Memory Usage: [██████████----------] 50%


In [33]:
if config.use_wandb: 
    run.log_artifact(ReferenceArtifact(embs, 'embeddings-SWV', metadata=dict(run.config)), 
                     aliases=f'run-{run.project}-{run.id}')

In [34]:
run.finish()

VBox(children=(Label(value='0.006 MB of 0.006 MB uploaded (0.000 MB deduped)\r'), FloatProgress(value=1.0, max…

## Final checks

### Check num_inputs = num_embs
`num_inputs = len(enc_input[::stride])`, which equals `ceil(enc_input.shape[0] / stride)`: `enc_input.shape[0] // stride` full strides, plus one extra block when the division leaves a remainder.

`num_embs = embs.shape[0]`

In [35]:
#Dimensions check
num_inputs = np.ceil(enc_input.shape[0]/stride)
num_embs = embs.shape[0]
test_eq(num_inputs, num_embs )
print(num_inputs, num_embs)

15483.0 15483
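
The equality checked above can be reproduced without the data, using only the shape and stride reported earlier in this notebook (232242 windows, stride 15). The sketch compares three ways of counting the strided windows; the variable names are illustrative.

```python
import math

n_windows, stride = 232242, 15  # values reported earlier in this notebook

num_strided = len(range(0, n_windows, stride))  # same length as enc_input[::stride]
num_ceil = math.ceil(n_windows / stride)        # float division, then round up
num_int = -(-n_windows // stride)               # integer-only ceiling division

print(num_strided, num_ceil, num_int)  # 15483 15483 15483
```

The integer-only form avoids the float result that `np.ceil` returns, which is why the check above prints `15483.0` next to `15483`.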


## Free GPU

In [36]:
if check_memory_usage: gpu_memory_status(gpu_device)

GPU | Used mem: 12
GPU | Used mem: 24
GPU | Memory Usage: [██████████----------] 50%
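
The cell above only reports the memory status. If memory actually has to be released without restarting the kernel, one common approach is to drop references, run the garbage collector and empty PyTorch's CUDA cache. The helper below is a hedged sketch of that approach (`free_gpu_memory` is a hypothetical name); it degrades gracefully when PyTorch or CUDA is not available.

```python
import gc

def free_gpu_memory():
    """Run the garbage collector and, if PyTorch with CUDA is available, empty its cache."""
    collected = gc.collect()  # drop unreachable Python objects first
    try:
        import torch
    except ImportError:
        return collected  # PyTorch not installed: nothing else to release
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached, unused blocks to the driver
    return collected

freed = free_gpu_memory()
```

Tensors that are still referenced (for example through `enc_learner` or `embs`) are not freed by this call; they would have to be deleted with `del` first.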


In [37]:
#| export
print("Execution ended")
beep(1)
if reset_kernel:
    import os
    os._exit(00)

Execution ended
