# Train scNet on 5 Pancreas datasets from scratch

In this notebook, we train scNet on 5 different pancreas datasets.

Please note that no architecture surgery is performed here. This tutorial shows how to train scNet on a new task and share the trained network via Zenodo.

In [1]:
import os
os.chdir("../../")

In [2]:
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

In [3]:
import scnet as sn
import scanpy as sc

Using TensorFlow backend.


In [4]:
sc.settings.set_figure_params(dpi=200)

In [5]:
condition_key = "study"
cell_type_key = "cell_type"

# Loading 5 pancreas datasets

In [6]:
adata = sc.read("/home/mohsen/data/pancreas/pancreas_normalized.h5ad")
adata

AnnData object with n_obs × n_vars = 15681 × 1000 
    obs: 'batch', 'study', 'cell_type', 'size_factors'

# Train/Test split 

In [7]:
train_adata, valid_adata = sn.tl.train_test_split(adata, 0.80)
train_adata.shape, valid_adata.shape

((12544, 1000), (3137, 1000))
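
The shapes above are consistent with a plain random split: 80% of 15681 cells rounds down to 12544, leaving 3137 for validation. A minimal sketch of such a split on row indices follows. It assumes a shuffle-then-cut strategy, which may differ from what `sn.tl.train_test_split` does internally, and `split_indices` is a hypothetical helper rather than part of scNet.

```python
import random


def split_indices(n_obs, train_frac, seed=0):
    """Shuffle row indices and cut them into train/validation parts.

    Hypothetical helper for illustration; not part of scNet.
    """
    rng = random.Random(seed)
    indices = list(range(n_obs))
    rng.shuffle(indices)
    n_train = int(train_frac * n_obs)  # fractional cells are rounded down
    return indices[:n_train], indices[n_train:]


train_idx, valid_idx = split_indices(15681, 0.80)
print(len(train_idx), len(valid_idx))  # 12544 3137
```

Fixing the seed keeps the split reproducible between runs, which helps when comparing models trained with different hyperparameters.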

# Calculating number of conditions (n = 5)

In [8]:
n_conditions = len(train_adata.obs[condition_key].unique().tolist())
n_conditions

5

# Create scNet network from scratch 

A few parameters are worth mentioning here:

1. __task_name__: name of the task you are going to train scNet on.
2. __x_dimension__: number of dimensions in expression space.
3. __z_dimension__: number of dimensions in the latent space of scNet.
4. __n_conditions__: number of conditions (batches, datasets, or domains).
5. __gene_names__: list of gene names used as scNet's input.
6. __model_path__: path where the trained scNet model and its configuration files are saved.
7. __alpha__: KL divergence coefficient in scNet's loss function.
8. __beta__: MMD coefficient in scNet's loss function. Please __NOTE__ that if `beta` is set to zero, scNet's loss function is equivalent to a CVAE loss function.
9. __loss_fn__: loss function to be used in scNet. Can be one of `mse`, `sse`, `nb`, or `zinb`. Please __NOTE__ that if you use the `nb` or `zinb` loss function, we suggest setting the `beta` hyperparameter to zero.
10. __clip_value__: clip value the optimizer applies to gradients.
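
How `alpha` and `beta` enter the objective can be sketched as a weighted sum. This is only an illustration, under the assumption that the total loss is the reconstruction term plus the weighted KL and MMD terms. The exact scaling of each term inside scNet may differ, and the role of `eta` is not described here. `total_loss` is a hypothetical helper.

```python
def total_loss(reconstruction, kl, mmd, alpha=0.0001, beta=50.0):
    """Combine the three loss terms as a weighted sum (assumed form)."""
    return reconstruction + alpha * kl + beta * mmd


# With beta = 0 the MMD term vanishes, leaving only the CVAE-style
# terms (reconstruction + weighted KL divergence).
cvae_like = total_loss(reconstruction=100.0, kl=20.0, mmd=3.0, beta=0.0)
with_mmd = total_loss(reconstruction=100.0, kl=20.0, mmd=3.0, beta=50.0)
print(cvae_like, with_mmd)
```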

In [9]:
network = sn.archs.scNet(task_name='pancreas',
                         x_dimension=train_adata.shape[1], 
                         z_dimension=10,
                         architecture=[128, 20],
                         n_conditions=n_conditions,
                         gene_names=adata.var_names.tolist(),
                         lr=0.001,
                         alpha=0.0001,
                         beta=50,
                         eta=1000,
                         clip_value=10,
                         use_batchnorm=False,
                         loss_fn='mse',
                         model_path="./models/scNet/Zenodo/pancreas/before/",
                         dropout_rate=0.05,
                         )

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
scNet's network has been successfully constructed!
scNet's network has been successfully compiled!


# Extracting unique conditions in `adata`

In [11]:
conditions = adata.obs[condition_key].unique().tolist()

# Setting scNet's `condition_encoder` with the extracted `conditions` 

The `condition_encoder` variable maps each condition (batch or domain) in the dataset to its corresponding encoded label. These encoded labels are used to construct one-hot vectors that are fed to the encoder and decoder sub-networks of scNet.

In [12]:
network.set_condition_encoder(conditions=conditions)

In [13]:
network.condition_encoder

{'Pancreas inDrop': 0,
 'Pancreas CelSeq2': 1,
 'Pancreas CelSeq': 2,
 'Pancreas Fluidigm C1': 3,
 'Pancreas SS2': 4}
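
To make the link between encoded labels and one-hot vectors concrete, here is a small sketch. It assumes labels are assigned in order of first appearance, which matches the mapping printed above but is not confirmed to be how `set_condition_encoder` works. Both functions are hypothetical helpers.

```python
def build_condition_encoder(conditions):
    """Map each condition to an integer label in order of first appearance."""
    encoder = {}
    for condition in conditions:
        if condition not in encoder:
            encoder[condition] = len(encoder)
    return encoder


def one_hot(condition, encoder):
    """Return the one-hot vector for a single condition."""
    vector = [0] * len(encoder)
    vector[encoder[condition]] = 1
    return vector


conditions = ['Pancreas inDrop', 'Pancreas CelSeq2', 'Pancreas CelSeq',
              'Pancreas Fluidigm C1', 'Pancreas SS2']
encoder = build_condition_encoder(conditions)
print(one_hot('Pancreas CelSeq', encoder))  # [0, 0, 1, 0, 0]
```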

# Train scNet 

You can train scNet with the `scNet.train` function, which takes the following parameters:

1. __train_adata__: Annotated dataset used for training scNet.
2. __valid_adata__: Annotated dataset used for validating scNet.
3. __condition_key__: name of the column in the `obs` matrix of `train_adata` and `valid_adata` that contains the condition of each sample.
4. __n_epochs__: number of epochs used to train scNet.
5. __batch_size__: number of samples in each mini-batch used to optimize scNet.
6. __early_stop_limit__: number of epochs used for EarlyStopping's patience.
7. __lr_reducer__: number of epochs used for `ReduceLROnPlateau`'s patience.
8. __save__: whether to save scNet's model and configs after the training phase.
9. __retrain__: if `False` and a pretrained scNet model exists in `model_path`, scNet's weights are restored from it. Otherwise, scNet is trained on `train_adata` and validated on `valid_adata`.
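
The patience idea behind `early_stop_limit` can be illustrated with a few lines of plain Python. The sketch assumes the usual rule that training stops once the validation loss has failed to improve for `patience` consecutive epochs. The callbacks scNet actually uses may add details such as a minimum improvement threshold, and `epochs_until_stop` is a hypothetical helper.

```python
def epochs_until_stop(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            waited = 0  # any improvement resets the counter
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return None


losses = [10.0, 9.0, 8.5, 8.6, 8.7, 8.55, 8.9]
stop_epoch = epochs_until_stop(losses, patience=3)
print(stop_epoch)  # 6
```

With `n_epochs=1000` and `early_stop_limit=10`, a rule of this kind is presumably why a run can finish well before the last epoch.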

In [14]:
network.train(train_adata,
              valid_adata, 
              condition_key=condition_key,
              n_epochs=1000,
              batch_size=512, 
              early_stop_limit=10,
              lr_reducer=8, 
              save=True, 
              retrain=True)

Instructions for updating:
Use tf.cast instead.
 |█████---------------| 25.3%  - loss: 981.6236 - reconstruction_loss: 962.6992 - mmd_loss: 18.9243 - val_loss: 958.8756 - val_reconstruction_loss: 933.9247 - val_mmd_loss: 24.9509
scNet has been successfully saved in ./models/scNet/Zenodo/pancreas/before/.
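
As a quick sanity check on the log above, the reported training `loss` appears to be the sum of `reconstruction_loss` and `mmd_loss`. The snippet below only verifies that arithmetic on the logged numbers. Whether the logged terms are already weighted by `eta` and `beta` is not something the log reveals.

```python
reconstruction_loss = 962.6992
mmd_loss = 18.9243
reported_loss = 981.6236

# The two components should add up to the reported loss,
# up to the rounding of the four printed decimals.
difference = abs((reconstruction_loss + mmd_loss) - reported_loss)
print(difference < 1e-3)  # True
```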


# Share Your Trained scNet with Other Researchers

You can get a TOKEN by signing up on the [**Zenodo**](https://zenodo.org/) website and creating one in the application settings. Follow these steps to create a new TOKEN:

1. Sign in to or register at [__Zenodo__](https://zenodo.org/).
2. Go to the __Applications__ page.
3. Click on __New token__ in the __Personal access tokens__ panel.
4. Grant it the `deposit:actions` and `deposit:write` scopes.

__NOTE__: Zenodo shows the created TOKEN only once, so store it somewhere safe. If you lose your TOKEN, you have to create a new one.

In [15]:
ACCESS_TOKEN = "YOUR_TOKEN"
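
Because the token grants write access to your Zenodo account, it is safer not to hard-code it in a notebook that may be shared. One possible pattern is to read it from an environment variable. The variable name `ZENODO_ACCESS_TOKEN` and the helper `load_zenodo_token` are assumptions made for this sketch, not something scNet or Zenodo prescribes.

```python
import os


def load_zenodo_token(env_var="ZENODO_ACCESS_TOKEN"):
    """Read the token from the environment (variable name is an assumption)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token


# Usage, once the variable is exported in your shell:
# ACCESS_TOKEN = load_zenodo_token()
```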

## 1. Create a Deposition in your Zenodo account

You can use the wrapper functions in the `zenodo` module of the scNet package to interact with your depositions and uploaded files in Zenodo. In Zenodo, a deposition is a cloud space for a publication, poster, etc., which can contain multiple files.

To create a deposition in Zenodo, call our `create_deposition` function with the following parameters:

-  __access_token__: Your access token.
-  __upload_type__: Type of the deposition; has to be one of the types defined [here](https://developers.zenodo.org/#representation).
-  __title__: Title of the deposition.
-  __description__: Description of the deposition.
-  __creators__: List of creators of this deposition. Each item in the list has to be in the following form:

```
{
    "name": "LASTNAME, FIRSTNAME", (Has to be in this format)
    "affiliation": "AFFILIATION", (Optional)
    "orcid": "ORCID" (Optional, has to be a valid ORCID)
}
```
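
A small helper can build entries in this form and catch malformed ORCID iDs before anything is sent to Zenodo. The sketch below is illustrative only: `make_creator` and `orcid_checksum_ok` are hypothetical helpers, and the check uses the ISO 7064 MOD 11-2 check digit that ORCID iDs carry.

```python
import re


def orcid_checksum_ok(orcid):
    """Check the format and ISO 7064 MOD 11-2 check digit of an ORCID iD."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for char in digits[:-1]:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def make_creator(last_name, first_name, affiliation=None, orcid=None):
    """Build one creator entry in the "LASTNAME, FIRSTNAME" form."""
    creator = {"name": f"{last_name}, {first_name}"}
    if affiliation:
        creator["affiliation"] = affiliation
    if orcid:
        if not orcid_checksum_ok(orcid):
            raise ValueError(f"Invalid ORCID: {orcid}")
        creator["orcid"] = orcid
    return creator


creators = [make_creator("Naghipourfar", "Mohsen", affiliation="SUT")]
print(creators)
```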





In [19]:
deposition_id = sn.zenodo.create_deposition(ACCESS_TOKEN, 
                                            upload_type="other", 
                                            title='scNet-pancreas',
                                            description='pre-trained scNet on inDrop, CelSeq, CelSeq2, SmartSeq2, and Fluidigm C1',                                            
                                            creators=[
                                                {"name": "Naghipourfar, Mohsen", "affiliation": "SUT"},
                                            ],
                                            )

New Deposition has been successfully created!
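
For readers curious about what such a call looks like at the HTTP level, the sketch below builds (but does not send) a request in the shape described by Zenodo's public REST API documentation. It makes no claim about what `sn.zenodo.create_deposition` does internally, and `build_deposition_request` is a hypothetical helper.

```python
import json
import urllib.request

ZENODO_DEPOSITIONS_URL = "https://zenodo.org/api/deposit/depositions"


def build_deposition_request(access_token, upload_type, title,
                             description, creators):
    """Prepare a POST request that would create a new deposition."""
    payload = {
        "metadata": {
            "upload_type": upload_type,
            "title": title,
            "description": description,
            "creators": creators,
        }
    }
    return urllib.request.Request(
        ZENODO_DEPOSITIONS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
        method="POST",
    )


# The request is only constructed here; nothing is sent over the network.
request = build_deposition_request(
    "YOUR_TOKEN", "other", "scNet-pancreas", "pre-trained scNet",
    [{"name": "Naghipourfar, Mohsen", "affiliation": "SUT"}],
)
print(request.get_method(), request.full_url)
```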


## 2. Upload scNet to your deposition

After creating a deposition, you can upload your pre-trained scNet model using the `upload_model` function in the `zenodo` module. This function accepts the following parameters:

- __model__: Instance of the scNet class that has been trained on your task.
- __deposition_id__: ID of the deposition you want to upload the model to.
- __access_token__: Your TOKEN.

The function returns the generated `download_link`, which you can use yourself and provide to others.

In [20]:
download_link = sn.zenodo.upload_model(network, 
                                       deposition_id=deposition_id, 
                                       access_token=ACCESS_TOKEN)

Model has been successfully uploaded


## 3. Publish the created deposition

In [23]:
sn.zenodo.publish_deposition(deposition_id, ACCESS_TOKEN)

Deposition with id = 3834593 has been successfully published!
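
Once the deposition is published, the `download_link` returned by `upload_model` should be fetchable with any HTTP client. A minimal standard-library sketch follows. The destination path and the helper name `download_model_archive` are assumptions, and the format of the downloaded file is not specified in this tutorial.

```python
import urllib.request
from pathlib import Path


def download_model_archive(download_link, destination):
    """Fetch the file behind `download_link` and store it at `destination`."""
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(download_link, str(destination))
    return destination


# Usage (hypothetical destination path):
# download_model_archive(download_link, "./models/scNet/downloaded/pancreas_model")
```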


## Congrats! Your model is ready to be downloaded by other researchers!

Now you can download the model directly via the `download_link` variable.
You can also share your `download_link`, together with a title and a description of your task, through the scNet repository by sending a pull request.

To do so, follow these steps:

```bash
git clone https://github.com/theislab/scNet
cd scNet

sud
```

| Task | Link | Description |
| ----------- | ----------- | ----------- |
| Pancreas | https://google.com/  | Trained scNet on 5 pancreas batches |