### Overview

This notebook demonstrates the inference pipeline. It assumes that a pretrained model is available, along with either simulated or true RRBS data in the patch format expected by the model. Patches must be the same size as those used during model training (here, 128 CpGs). In general, a pretrained model can be applied at any level of sparsity; however, the model hyperparameters were tuned specifically for extreme sparsity (>90% missing, or RRBS-like missing-not-at-random patterns).
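
The patch-size requirement and the sparsity regime described above can be checked before running inference. The snippet below is a minimal sketch, not part of the ARUNA package: `check_patch` is a hypothetical helper, and it assumes that a patch is a flat sequence of methylation values in which missing CpGs are encoded as NaN.

```python
import math

PATCH_SIZE = 128  # CpGs per patch, matching the training patch size stated above


def check_patch(patch, patch_size=PATCH_SIZE):
    """Return the fraction of missing CpGs; raise if the patch width is wrong.

    Hypothetical helper: assumes missing CpGs are encoded as NaN.
    """
    if len(patch) != patch_size:
        raise ValueError(f"expected {patch_size} CpGs, got {len(patch)}")
    n_missing = sum(1 for value in patch if math.isnan(value))
    return n_missing / patch_size


# Toy patch: 120 missing CpGs followed by 8 observed methylation values.
toy_patch = [math.nan] * 120 + [0.0, 1.0, 0.5, 0.25, 0.75, 1.0, 0.0, 1.0]
missing_fraction = check_patch(toy_patch)
print(f"missing fraction: {missing_fraction:.4f}")  # 120 / 128 = 0.9375
```

A patch like this one (about 94% missing) falls in the extreme-sparsity range the hyperparameters were tuned for.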

#### Requirements
- Pretrained model. Available under `/ARUNA/checkpoints` as a PyTorch `.pth` file.
- Patchified sparse data. Available under `/ARUNA/data/<dataset>` after following the steps in `data_prep.ipynb`.

#### Outputs
- Predicted methylomes in `/ARUNA/results/`.
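
The requirements listed above can be verified with a short preflight check before any model code runs. This is a sketch under assumptions: `preflight` is a hypothetical helper, not part of ARUNA, and it assumes the repository root contains `checkpoints/` and `data/<dataset>/` as described.

```python
from pathlib import Path


def preflight(repo_root, dataset):
    """Collect checkpoint files and report missing inputs (hypothetical helper)."""
    root = Path(repo_root)
    # Pretrained weights are expected as .pth files under checkpoints/.
    checkpoints = sorted((root / "checkpoints").glob("*.pth"))
    # Patchified data is expected under data/<dataset>.
    data_dir = root / "data" / dataset

    problems = []
    if not checkpoints:
        problems.append(f"no .pth file found in {root / 'checkpoints'}")
    if not data_dir.is_dir():
        problems.append(f"missing data directory {data_dir}")
    return checkpoints, problems


# Example: run from the repository root with the dataset name used for testing.
checkpoints, problems = preflight(".", "gtex")
for problem in problems:
    print("preflight:", problem)
```

Returning the problems as a list instead of raising on the first one lets the notebook report every missing input at once.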



In [1]:
%cd ..

/home/js228/ARUNA


In [2]:
import os
import yaml
import numpy as np
import pandas as pd
from pathlib import Path
from collections import defaultdict

from aruna.process_dataset import get_cc_gt, get_cc_noisy
from aruna.process_dataset import get_pc_gt, get_pc_noisy

import torch
import torch.nn as nn
from aruna.utils import get_splits
from aruna.models import DCAE_MSLICE
from aruna.model_utils import get_peObj
from aruna.evaluations import collate_mslices, collate_mslices_test
from aruna.evaluations import process_seq, process_seq_test
from torch.utils.data import DataLoader
from aruna.data_utils import get_mslice_dataset
from aruna.model_engine import valid_step_mslice, test_step_mslice
from scripts.inference import get_cpgmask, get_spp_collapsed_maps, save_aruna_preds

%load_ext autoreload
%autoreload 2

In [4]:
CWD = os.getcwd()
saved_model_dir = os.path.join(CWD, "checkpoints")

In [None]:
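# Dataset, chromosome, and sparsity regime(s) to run inference on.
test_data = "gtex"
chrom = "chr21"
test_regimes = ["rrbs_sim"]

# Output directory for predictions; created if it does not exist.
save_path = Path("/grain/js228/aruna_paper/predictions/mslice-imputer")
save_path.mkdir(parents=True, exist_ok=True)

cpgMask_map = get_cpgmask([chrom], test_data)

# Keep the index of cc_gt_df as the canonical index.
cc_gt_df, _ = get_cc_gt(test_data, chrom)
canonical_index = cc_gt_df.index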