# Train scNet on 2 Pancreas datasets from pre-trained scNet

In this notebook, we fine-tune a pre-trained scNet on 2 pancreas datasets (`Pancreas CelSeq2` and `Pancreas SS2`), selected from a collection of 5 pancreas datasets.

Please note that architecture surgery is performed inside the `create_scNet_from_pre_trained_task` function. This tutorial shows how to train scNet on a new task and share the trained network via Zenodo.

In [2]:
import os
os.chdir("../../")

In [3]:
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

In [4]:
import scnet as sn
import scanpy as sc

In [5]:
sc.settings.set_figure_params(dpi=200)

In [6]:
condition_key = "study"
cell_type_key = "cell_type"
target_conditions = ['Pancreas CelSeq2', 'Pancreas SS2']

## Loading 5 pancreas datasets

In [7]:
adata = sn.data.read("/home/mohsen/data/pancreas/pancreas_normalized.h5ad")
adata

AnnData object with n_obs × n_vars = 15681 × 1000 
    obs: 'batch', 'study', 'cell_type', 'size_factors'

## Keep 2 target datasets

In [8]:
adata = adata[adata.obs[condition_key].isin(target_conditions)]
adata

View of AnnData object with n_obs × n_vars = 5387 × 1000 
    obs: 'batch', 'study', 'cell_type', 'size_factors'

## Train/Test split 

In [10]:
train_adata, valid_adata = sn.utils.train_test_split(adata, 0.80)
train_adata.shape, valid_adata.shape

((4309, 1000), (1078, 1000))
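
As a rough illustration of what an 80/20 split does to the 5387 cells kept above, the sketch below shuffles row indices and cuts them at 80%. It is not the implementation of `sn.utils.train_test_split`; the helper name `split_indices` and the fixed seed are assumptions made for this example.

```python
import random

def split_indices(n_samples, train_frac=0.80, seed=0):
    """Hypothetical helper: shuffle row indices and cut them at train_frac."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # local RNG, global state untouched
    n_train = int(train_frac * n_samples)  # floor of 0.80 * 5387 is 4309
    return indices[:n_train], indices[n_train:]

train_idx, valid_idx = split_indices(5387, 0.80)
print(len(train_idx), len(valid_idx))  # 4309 1078
```

The two sizes match the shapes reported above, and the two index lists never overlap.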

## Calculating number of conditions (n = 2)

In [11]:
n_conditions = len(train_adata.obs[condition_key].unique().tolist())
n_conditions

2

## Create scNet network from pre-trained task 

Some parameters are worth mentioning here:

- __path_or_link__: Path to downloaded zip file of scNet's model or a direct downloadable link.
- __filename__: If `path_or_link` is a link, `filename` is used for the name of downloaded file.
- __model_path__: Path where the downloaded file and the newly trained scNet model are saved.
- __new_task__: Name of the new task to be solved.
- __target_conditions__: List of target conditions (batches or domains) to append to scNet's `condition_encoder`.
- __version__: Version of scNet to be used. Can be one of `scNet`, `scNet v1`, and `scNet v2`.

In [10]:
link = "https://zenodo.org/record/3834803/files/scNet-pancreas-inDropCelSeqFluidigmC1.zip?download=1"

In [11]:
network = sn.create_scNet_from_pre_trained_task(
    path_or_link=link,
    filename='pancreas-inDropCelSeqFC1.zip',
    model_path="./models/scNet/pancreas/",
    new_task="pancreas-CelSeq2,SS2",
    target_conditions=target_conditions,
    version='scNet',
)

File already exists!
Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
scNet's network has been successfully constructed!
scNet's network has been successfully compiled!
scNet's network has been successfully constructed!
scNet's network has been successfully compiled!
scNet's network has been successfully compiled!
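
Conceptually, appending `target_conditions` to the condition encoder means that the conditions the pre-trained model already knows keep their indices, while each new condition receives the next free index. The sketch below illustrates only this bookkeeping. It is not scNet's internal code, and both the helper `extend_condition_encoder` and the three pre-trained condition names are assumptions (loosely inferred from the file name in `link`).

```python
def extend_condition_encoder(condition_encoder, new_conditions):
    """Hypothetical helper: return a copy with new conditions appended."""
    extended = dict(condition_encoder)
    for condition in new_conditions:
        if condition not in extended:
            # Indices are contiguous from 0, so the next free one is len().
            extended[condition] = len(extended)
    return extended

# Assumed labels of the pre-trained task; the real names may differ.
pretrained = {"Pancreas inDrop": 0, "Pancreas CelSeq": 1, "Pancreas Fluidigm C1": 2}
target_conditions = ["Pancreas CelSeq2", "Pancreas SS2"]

encoder = extend_condition_encoder(pretrained, target_conditions)
print(encoder["Pancreas CelSeq2"], encoder["Pancreas SS2"])  # 3 4
```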


## Fine-tune pre-trained scNet 

You can train scNet with the `scNet.train` function, which accepts the following parameters:

1. __train_adata__: Annotated dataset used for training scNet.
2. __valid_adata__: Annotated dataset used for validating scNet.
3. __condition_key__: name of the column in `obs` matrix in `train_adata` and `valid_adata` which contains the conditions for each sample.
4. __n_epochs__: number of epochs used to train scNet.
5. __batch_size__: number of samples in each mini-batch used to optimize scNet.
6. __early_stop_limit__: number of epochs used for EarlyStopping's patience.
7. __lr_reducer__: number of epochs used for LRReduceOnPlateau's patience.
8. __save__: whether to save scNet's model and configs after training phase or not. 
9. __retrain__: if `False` and a pre-trained scNet model exists in `model_path`, scNet's weights are restored from it. Otherwise, scNet is trained on `train_adata` and validated on `valid_adata`.

In [12]:
network.train(train_adata,
              valid_adata, 
              condition_key=condition_key,
              n_epochs=1000,
              batch_size=512, 
              early_stop_limit=10,
              lr_reducer=8, 
              save=True, 
              retrain=True)

Instructions for updating:
Use tf.cast instead.
 |███████████---------| 56.7%  - loss: 3000.4598 - reconstruction_loss: 2994.0160 - mmd_loss: 6.4439 - val_loss: 3033.6113 - val_reconstruction_loss: 3024.9799 - val_mmd_loss: 8.63146
scNet has been successfully saved in ./models/scNet/pancreas/after/.
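
To make the two patience parameters more concrete, the sketch below replays a list of validation losses through two counters: one reduces the learning rate after `lr_reducer` epochs without improvement, the other stops training after `early_stop_limit` such epochs. This is a simplified illustration, not scNet's or Keras's actual callback code; the helper `simulate_callbacks`, the starting learning rate, and the reduction factor are assumptions.

```python
def simulate_callbacks(val_losses, early_stop_limit=10, lr_reducer=8,
                       lr=0.001, factor=0.1):
    """Hypothetical helper: replay validation losses through two patience counters.

    Returns (epochs_run, final_lr).
    """
    best = float("inf")
    stale_stop = 0  # epochs since the last improvement (early stopping)
    stale_lr = 0    # epochs since the last improvement or LR reduction
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale_stop, stale_lr = loss, 0, 0
            continue
        stale_stop += 1
        stale_lr += 1
        if stale_lr >= lr_reducer:
            lr *= factor  # shrink the learning rate, then wait again
            stale_lr = 0
        if stale_stop >= early_stop_limit:
            return epoch, lr  # stop before all epochs are used
    return len(val_losses), lr

# Three improving epochs followed by a plateau.
losses = [5.0, 4.0, 3.0] + [3.5] * 20
epochs_run, final_lr = simulate_callbacks(losses)
print(epochs_run)  # 13
```

With the settings used in this notebook (`early_stop_limit=10`, `lr_reducer=8`), the learning rate would be reduced once on the plateau before training stops, so a run configured with `n_epochs=1000` can finish well before the last epoch.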


## Share your trained scNet with other researchers using [Zenodo](https://zenodo.org/)

You can get a TOKEN by signing up on the [**Zenodo**](https://zenodo.org/) website and creating a token in the settings. Follow these steps to create a new TOKEN:

1. Sign in to or register on [__Zenodo__](https://zenodo.org/).
2. Go to the __Applications__ page.
3. Click on __New token__ in the __Personal access tokens__ panel.
4. Grant it the `deposit:actions` and `deposit:write` scopes.

__NOTE__: Zenodo shows the created TOKEN only once, so store it carefully. If you lose your TOKEN, you have to create a new one.

In [13]:
ACCESS_TOKEN = "PX2vU2sfJVwgnCC1Qq6o9Ca6o5ygw64Kn7P5PEUFFg9yciEZbWIZR3wpc7BK"
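
The token above is written directly into the notebook, which exposes it to anyone the notebook is shared with. One possible alternative is to read it from an environment variable. The sketch below assumes a variable named `ZENODO_ACCESS_TOKEN` and a helper called `load_zenodo_token`; neither name is required by scNet or Zenodo.

```python
import os

def load_zenodo_token(var_name="ZENODO_ACCESS_TOKEN"):
    """Hypothetical helper: read the token from an environment variable."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return token

# Usage (after exporting the variable in your shell):
# ACCESS_TOKEN = load_zenodo_token()
```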

### 1. Create a Deposition in your zenodo account

You can use the wrapper functions in the `zenodo` module of the scNet package to interact with your depositions and uploaded files in Zenodo. In Zenodo, a deposition is a cloud space for a publication, poster, etc., which can contain multiple files.

To create a deposition in Zenodo, you can call our `create_deposition` function with the following parameters:

-  __access_token__: Your access token.
-  __upload_type__: Type of the deposition; has to be one of the types defined [here](https://developers.zenodo.org/#representation).
-  __title__: Title of the deposition.
-  __description__: Description of the deposition.
-  __creators__: List of creators of this deposition. Each item in the list has to be in the following form:

```
{
    "name": "LASTNAME, FIRSTNAME", (Has to be in this format)
    "affiliation": "AFFILIATION", (Optional)
    "orcid": "ORCID" (Optional, has to be a valid ORCID)
}
```
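
A small helper can catch formatting mistakes in a creator entry before it is sent to Zenodo. The sketch below is only an illustration: `make_creator` is a hypothetical function, not part of scNet, and the ORCID check only tests the `0000-0000-0000-0000` shape, not the checksum.

```python
import re

NAME_PATTERN = re.compile(r"^[^,]+, [^,]+$")  # "LASTNAME, FIRSTNAME"
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")

def make_creator(name, affiliation=None, orcid=None):
    """Hypothetical helper: build one creator entry, checking the formats."""
    if not NAME_PATTERN.match(name):
        raise ValueError(f"name must look like 'LASTNAME, FIRSTNAME': {name!r}")
    creator = {"name": name}
    if affiliation is not None:
        creator["affiliation"] = affiliation
    if orcid is not None:
        if not ORCID_PATTERN.match(orcid):
            raise ValueError(f"orcid must look like 0000-0000-0000-0000: {orcid!r}")
        creator["orcid"] = orcid
    return creator

creators = [make_creator("Naghipourfar, Mohsen", affiliation="SUT")]
print(creators)  # [{'name': 'Naghipourfar, Mohsen', 'affiliation': 'SUT'}]
```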





In [14]:
deposition_id = sn.zenodo.create_deposition(ACCESS_TOKEN, 
                                            upload_type="other", 
                                            title='scNet-pancreasCelSeq2,SS2',
                                            description='pre-trained scNet on CelSeq2, SmartSeq2',                                            
                                            creators=[
                                                {"name": "Naghipourfar, Mohsen", "affiliation": "SUT"},
                                            ],
                                            )

New Deposition has been successfully created!


### 2. Upload scNet to your deposition

After creating a deposition, you can upload your trained scNet model using the `upload_model` function in the `zenodo` module. This function accepts the following parameters:

- __model__: Instance of scNet's class which is trained on your task
- __deposition_id__: ID of the deposition you want to upload the model in.
- __access_token__: Your TOKEN.

The function returns the generated `download_link`, which you can use yourself and provide to other researchers.

In [15]:
download_link = sn.zenodo.upload_model(network, 
                                       deposition_id=deposition_id, 
                                       access_token=ACCESS_TOKEN)

Model has been successfully uploaded


In [16]:
download_link

'https://zenodo.org/record/3834843/files/scNet-pancreas-CelSeq2SS2.zip?download=1'
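
When a link like this one is later passed as `path_or_link`, a `filename` for the downloaded archive is needed as well. One way to derive it from the link itself is sketched below using only the standard library; `filename_from_link` is a hypothetical helper, not part of scNet.

```python
import posixpath
from urllib.parse import urlparse

def filename_from_link(link):
    """Hypothetical helper: take the last path component of a download link."""
    # urlparse separates the path from the "?download=1" query string.
    return posixpath.basename(urlparse(link).path)

link = "https://zenodo.org/record/3834843/files/scNet-pancreas-CelSeq2SS2.zip?download=1"
print(filename_from_link(link))  # scNet-pancreas-CelSeq2SS2.zip
```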

### 3. Publish the created deposition

In [17]:
sn.zenodo.publish_deposition(deposition_id, ACCESS_TOKEN)

Deposition with id = 3834843 has been successfully published!


## Congrats! Your model is ready to be downloaded by other researchers!

You can now download the model directly using the `download_link` variable.
You can also share your `download_link`, together with a title and a description of your task, with the scNet repository by sending a pull request.

To do so, follow these steps:

- Fork the [scNet](https://github.com/theislab/scNet) repository.
- Clone the forked repository using the following commands:

```bash
git clone https://github.com/<YourUserName>/scNet
cd scNet
```

- Create a new branch with: 

```bash
git checkout -b NEW_BRANCH
```

- Create a new remote for the upstream repo with the command:

```bash
git remote add upstream https://github.com/theislab/scNet
```

- Modify the `pretrained_models.md` file by adding a row to its table.
- Commit and push your changes with the following commands:

```bash
git commit pretrained_models.md -m "added YOUR_MODEL"
git push -u origin NEW_BRANCH
```

- Finally, once you have pushed the changes to your fork, the __Compare & pull request__ button will appear on your GitHub repository page.