# *dandelion* Notebook-4

![dandelion_logo](img/dandelion_logo.png)

As mentioned, ***dandelion*** is written in `python==3.7.6` but it can be run in `R` through `reticulate`. This notebook tries to replicate the examples in notebooks 1-3 entirely in R. There are some issues with the conversion of dataframes between python and R, so I would ***not*** recommend saving the final `AnnData` object as your final output file; use this workflow only to generate the intermediate files from the BCR processing and the plots.

First, install `reticulate` if you don't already have it:
```R
install.packages('reticulate')
```

Because we are managing the packages through a conda virtual environment, we will need to point reticulate to the right python paths.

In [1]:
library(reticulate)
use_condaenv('dandelion')
# or use Sys.setenv(RETICULATE_PYTHON = conda_python(envname='dandelion'))

You can check that the python config is set up properly with `py_config()`.

In [2]:
py_config()

python:         /Users/kt16/miniconda3/envs/dandelion/bin/python
libpython:      /Users/kt16/miniconda3/envs/dandelion/lib/libpython3.7m.dylib
pythonhome:     /Users/kt16/miniconda3/envs/dandelion:/Users/kt16/miniconda3/envs/dandelion
version:        3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 22:45:16)  [Clang 9.0.1 ]
numpy:          /Users/kt16/miniconda3/envs/dandelion/lib/python3.7/site-packages/numpy
numpy_version:  1.18.4

NOTE: Python version was forced by RETICULATE_PYTHON

To proceed with the analyses, we first change the working directory and also import the dandelion module.

In [3]:
setwd('/Users/kt16/Documents/Clatworthy_scRNAseq/Ondrej/PIP/Pan_Immune_BCR/')
ddl = import('dandelion')

As per reticulate convention, python `.` operators are to be swapped with `$` in R.

In [4]:
# prepare sample swap dictionary
sampledict = ddl$utl$dict_from_table('/Users/kt16/Documents/Clatworthy_scRNAseq/Ondrej/dandelion_files/meta/PIP_sampleInfo_kt16.txt', columns = c('SANGER SAMPLE ID', 'GEX_SAMPLE_ID')) # optional
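
As a toy illustration of the `.` to `$` mapping, the sketch below rewrites a few dotted python call paths into the form you would type in R. `dots_to_dollars` is a hypothetical helper written for this example only; it is not part of reticulate or dandelion, and it is a plain text substitution that would also mangle decimal numbers or file names.

```R
# Hypothetical helper (not part of reticulate or dandelion): rewrite a
# dotted python call path into the `$` form used from R.
dots_to_dollars <- function(py_path) {
  # fixed = TRUE treats '.' as a literal dot rather than a regex wildcard
  gsub('.', '$', py_path, fixed = TRUE)
}

py_calls <- c('ddl.utl.dict_from_table', 'ddl.pp.format_fasta', 'sc.read_10x_mtx')
r_calls <- dots_to_dollars(py_calls)
print(r_calls)
```

Only the accessor changes; argument names and values stay the same as in the python notebooks, apart from R-specific literals such as `TRUE`/`FALSE` and `c(...)` for lists.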

The functions should take the python dictionaries without much messing around. If you prefer to interact with it like an R dataframe, note that a python dictionary is imported as a named list in R, so converting it to a dataframe for viewing is straightforward.

In [5]:
sampledictionary = data.frame('GEX_ID' = t(as.data.frame(sampledict)))
head(sampledictionary)

| | GEX_ID |
|---|---|
| | `<fct>` |
| Pan_T7918901 | Pan_T7917815 |
| Pan_T7918902 | Pan_T7917816 |
| Pan_T7918903 | Pan_T7917817 |
| Pan_T7918904 | Pan_T7917818 |
| Pan_T7918905 | Pan_T7917819 |
| Pan_T7918906 | Pan_T7917820 |
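
A minimal sketch of an alternative conversion, assuming the imported dictionary behaves like an ordinary named list of single strings. `toy_dict` is a stand-in built from the first three rows of the table above, so the snippet runs without python; `unlist()` is used here instead of the transpose approach and keeps the column as character rather than factor.

```R
# Toy named list standing in for the imported python dictionary
# (keys and values copied from the table above)
toy_dict <- list(Pan_T7918901 = 'Pan_T7917815',
                 Pan_T7918902 = 'Pan_T7917816',
                 Pan_T7918903 = 'Pan_T7917817')

# unlist() flattens the list to a named character vector;
# data.frame() then uses those names as the row names
toy_df <- data.frame(GEX_ID = unlist(toy_dict), stringsAsFactors = FALSE)
print(toy_df)
```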


## Pre-processing

### Step 1:
#### Formatting the headers of the cellranger fasta file
For simplicity, I will just run it on the first sample.

In [6]:
# the first option is a list of fasta files to format and the second option is the prefix to add to each file.
samples = c('Pan_T7918901', 'Pan_T7918902', 'Pan_T7918903', 'Pan_T7918904', 'Pan_T7918905', 'Pan_T7918906', 'Pan_T7918907', 'Pan_T7918908', 'Pan_T7918909', 'Pan_T7918910', 'Pan_T7918912', 'Pan_T7918913', 'Pan_T7918914')
ddl$pp$format_fasta(paste0(samples[1], '/all_contig.fasta'), as.character(sampledict[samples[1]]))
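
To go beyond the first sample, the inputs for every sample can be prepared in one go. The sketch below only builds the file paths and prefixes with base R; `samples_toy` and `sampledict_toy` are shortened stand-ins for `samples` and `sampledict`, and the loop over `ddl$pp$format_fasta` is left as a comment because it needs the python environment.

```R
# Shortened stand-ins for `samples` and `sampledict`
samples_toy <- c('Pan_T7918901', 'Pan_T7918902', 'Pan_T7918903')
sampledict_toy <- list(Pan_T7918901 = 'Pan_T7917815',
                       Pan_T7918902 = 'Pan_T7917816',
                       Pan_T7918903 = 'Pan_T7917817')

# One fasta path per sample folder
fasta_files <- file.path(samples_toy, 'all_contig.fasta')

# Matching prefix for each sample, in the same order as samples_toy
prefixes <- unlist(sampledict_toy[samples_toy], use.names = FALSE)

print(data.frame(fasta = fasta_files, prefix = prefixes))

# With the python environment loaded, the single-sample call could be repeated:
# for (i in seq_along(fasta_files)) {
#   ddl$pp$format_fasta(fasta_files[i], prefixes[i])
# }
```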

In case you are wondering, the logs and progress bars appear in the backend terminal.

### Step 2:
#### Reannotate the V/D/J genes with *igblastn* with `ddl$pp$reannotate_genes`.

In [7]:
ddl$pp$reannotate_genes(samples[1])

### Step 3:
#### Assigning constant region calls with `ddl$pp$assign_isotype`

In [8]:
ddl$pp$assign_isotype(paste0(samples[1], '/dandelion/data/all_contig.fasta'))

### Step 4 *(optional)*:
#### Reassigning heavy chain V gene alleles with `ddl$pp$reassign_alleles`.
I'm 'cheating' for this step and will use all the files that were generated in the first notebook to run this. As mentioned, the function will take the `sampledict` directly as well, so no need to faff around.

In [9]:
ddl$pp$reassign_alleles(samples, out_folder = 'A31', sample_dict = sampledict)

## Filtering
Now we change directory and import the gene expression data via scanpy. Technically, you could also set up the gene expression data in R, e.g. with `Seurat` or `scran/scater`, and then convert it to `AnnData` format for the next section.

In [10]:
setwd('/Users/kt16/Documents/Clatworthy_scRNAseq/Ondrej/PIP/')
library(reticulate)
use_condaenv('dandelion')
ddl = import('dandelion')
sc = import("scanpy")

#### Setting up dictionaries from the meta data to let me add the info to the obs slot
I will use the `ddl$utl$dict_from_table` utility function to prepare the meta data dictionaries for scanpy. 

In [11]:
sample = 'Pan_T7918901'
bcr_folder = 'Pan_Immune_BCR/'
gex_folder = 'Pan_Immune_GEX/'
meta_file = '/Users/kt16/Documents/Clatworthy_scRNAseq/Ondrej/dandelion_files/meta/PIP_sampleInfo_kt16.txt'
sampledict = ddl$utl$dict_from_table(meta_file, columns = c('SANGER SAMPLE ID', 'GEX_SAMPLE_ID'))
sampleid = ddl$utl$dict_from_table(meta_file, columns = c('SANGER SAMPLE ID', 'SANGER SAMPLE ID'))
gender = ddl$utl$dict_from_table(meta_file, columns = c('SANGER SAMPLE ID', 'GENDER'))
donor = ddl$utl$dict_from_table(meta_file, columns = c('SANGER SAMPLE ID', 'SANGERID'))
tissue = ddl$utl$dict_from_table(meta_file, columns = c('SANGER SAMPLE ID', 'TISSUE'))
experiment = ddl$utl$dict_from_table(meta_file, columns = c('SANGER SAMPLE ID', 'COMMENTS'))
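
Each of these dictionaries arrives in R as a named list, so the difference between `[` and `[[` matters when looking values up. The sketch below uses toy lists to show why the later cells wrap the first lookup in `as.character()`; the `tissue_A` value is a made-up placeholder, not taken from the meta data file.

```R
# Toy stand-ins for two of the imported dictionaries
sampledict_toy <- list(Pan_T7918901 = 'Pan_T7917815')
tissue_toy <- list(Pan_T7917815 = 'tissue_A')  # placeholder value

sample_id <- 'Pan_T7918901'

# Single brackets return a list of length one (the name is kept)
single <- sampledict_toy[sample_id]

# as.character() collapses that list to the plain string it holds
key <- as.character(single)

# Double brackets extract the element itself from the second list
value <- tissue_toy[[key]]

print(key)
print(value)
```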

#### Import the transcriptome data and populate the obs slot with meta data
We can use the normal conventions of interacting with R dataframes with the `obs` slot.

In [12]:
inputfolder=paste0(gex_folder, sampledict[sample], '/filtered_feature_bc_matrix/')
adata = sc$read_10x_mtx(inputfolder)
adata$obs['sampleid'] = sampleid[[as.character(sampledict[sample])]]
adata$obs['gender'] = gender[[as.character(sampledict[sample])]]
adata$obs['donor'] = donor[[as.character(sampledict[sample])]]
adata$obs['tissue'] = tissue[[as.character(sampledict[sample])]]
adata$obs['experiment'] = experiment[[as.character(sampledict[sample])]]
# rename cells to sample id + barcode, cleaving the trailing -1
row.names(adata$obs) = paste0(sampledict[sample], '_', gsub('-.*', '', row.names(adata$obs)))
adata

AnnData object with n_obs × n_vars = 4002 × 33694 
    obs: 'sampleid', 'gender', 'donor', 'tissue'
    var: 'gene_ids', 'feature_types'
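
The renaming step can be checked in isolation with base R. The sketch below applies the same `gsub` pattern to two barcodes that appear in the BCR table later in this notebook; the stricter `sub` pattern is an alternative I am suggesting, not something the original code uses.

```R
prefix <- 'Pan_T7917815'
barcodes <- c('ACCTTTATCGCTTGTC-1', 'ACTGCTCCAGGTCGTC-1')

# Same pattern as above: drop everything from the first '-' onwards
new_names <- paste0(prefix, '_', gsub('-.*', '', barcodes))

# Stricter alternative: only strip a trailing '-<digits>' suffix
new_names_strict <- paste0(prefix, '_', sub('-[0-9]+$', '', barcodes))

print(new_names)
```

The resulting names have the same form as the `cell` column of the BCR table, which is what allows the two data sets to be matched up.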

#### Run basic scanpy pipeline with `ddl$pp$ext$run_scanpy_qc`
From this point onwards, the conversion issues will appear; basically if you try to access the `obs` slot in R, it will tell you that `py_to_r` is not working.

Having said that, we can still continue.

In [13]:
adata = ddl$pp$ext$run_scanpy_qc(adata)

AnnData object with n_obs × n_vars = 4002 × 33694 
    obs: 'sampleid', 'gender', 'donor', 'tissue', 'scrublet_score', 'n_genes', 'percent_mito', 'n_counts', 'bh_pval', 'is_doublet', 'filter_rna'
    var: 'gene_ids', 'feature_types'

### Filter cells that are potential doublets or of poor quality in both the BCR data and transcriptome data
We use `ddl$tl$filter_bcr` and it returns the results as a list.

In [14]:
bcr_file = paste0(bcr_folder, sample, '/dandelion/data/all_contig_igblast_gap_genotyped.tsv')
results = ddl$tl$filter_bcr(bcr_file, adata)

#### Check the output vdj table
The vdj data is in the first slot:

In [15]:
head(results[[1]])

Unnamed: 0_level_0,sequence_id,sequence,rev_comp,productive,v_call,d_call,j_call,sequence_alignment,germline_alignment,junction,⋯,cdr3_end,np1,np1_length,np2,np2_length,junction_aa_length,c_call,cell,sample,v_call_genotyped
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<chr>,<list>,<chr>,<chr>,<chr>,<chr>,⋯,<chr>,<list>,<chr>,<list>,<list>,<chr>,<list>,<chr>,<chr>,<chr>
Pan_T7917815_ACCTTTATCGCTTGTC-1_contig_1,Pan_T7917815_ACCTTTATCGCTTGTC-1_contig_1,GAGGAGTCAGACCCAGTCAGGACACAGCATGGACATGAGGGTCCCCGCTCAGCTCCTGGGGCTCCTGCTGCTCTGGTTCCCAGGTTCCAGATGCGACATCCAGATGACCCAGTCTCCATCTTCTGTGTCTGCATCTTTAGGAGACAGAGTCACCATCACTTGCCGGGCGAGTCAGGGTATTAGGAGGTGGTTAGCCTGGTATCAGCAAAAACCAGGGACAGCCCCTAAACTCCTGATCCATTCTGTATCCAGTTTGCAAAGTGGGGTCCCATCAAGGTTCAGTGGCAGTGGATCTGGGACAGATTTCACTCTCATTATCAGCAGCCTGCAACCTGAAGACTTTGCAACTTACTTTTGTCTACAGGGTGAGAGTTACCCTCTCACCTTCGGCCAGGGGACACGACTGGACATTAAACGAACTGTGGCTGCACCATCTGTCTTCATCTTCCCGCCATCTGATGAGCAGTTGAAATCTGGAACTGCCTCTGTTGTGTGCCTGCTGAATAACTTCTATCCCAGAGAGGCCAAAGTACAGTGGAAGGTGGATAACGC,False,True,IGKV1D-12*01,,IGKJ5*01,GACATCCAGATGACCCAGTCTCCATCTTCTGTGTCTGCATCTTTAGGAGACAGAGTCACCATCACTTGCCGGGCGAGTCAGGGTATT..................AGGAGGTGGTTAGCCTGGTATCAGCAAAAACCAGGGACAGCCCCTAAACTCCTGATCCATTCTGTA.....................TCCAGTTTGCAAAGTGGGGTCCCA...TCAAGGTTCAGTGGCAGTGGA......TCTGGGACAGATTTCACTCTCATTATCAGCAGCCTGCAACCTGAAGACTTTGCAACTTACTTTTGTCTACAGGGTGAGAGTTACCCTCTCACCTTCGGCCAGGGGACACGACTGGACATTAAAC,GACATCCAGATGACCCAGTCTCCATCTTCTGTGTCTGCATCTGTAGGAGACAGAGTCACCATCACTTGTCGGGCGAGTCAGGGTATT..................AGCAGCTGGTTAGCCTGGTATCAGCAGAAACCAGGGAAAGCCCCTAAGCTCCTGATCTATGCTGCA.....................TCCAGTTTGCAAAGTGGGGTCCCA...TCAAGGTTCAGCGGCAGTGGA......TCTGGGACAGATTTCACTCTCACTATCAGCAGCCTGCAGCTCACCTTCGGCCAAGGGACACGACTGGAGATTAAAC,TGTCTACAGGGTGAGAGTTACCCTCTCACCTTC,⋯,385,,0,,,11,IGKC,Pan_T7917815_ACCTTTATCGCTTGTC,Pan_T7917815,IGKV1D-12*01
Pan_T7917815_ACCTTTATCGCTTGTC-1_contig_2,Pan_T7917815_ACCTTTATCGCTTGTC-1_contig_2,AGCCTGCGAGGCGAAGATACGGCTATCTATTACTGTGCGAGTGATCCCCCTACTGCGGGAGACTACGGTGGCGGAGCCGATTTTGACTACTGGGGCCAGGGAACCCAGGTCATCGTCTCCTCAGCCTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCGCCCTGCTCCAGGAGCACCTCCGAGAGCACAGCGGCCCTGGGCTGCCTGGTCAAGGACTACTTCCCCGAACCGGTGACGGTGTCGTGGAACTCAGGCGCTCTGACCAGCGGCGTGCACACCTTCCCAGCTGTCCTACAGTCCTCAGG,False,True,"IGHV3-21*06,IGHV3-30*07,IGHV3-30*12",IGHD4-23*01,IGHJ4*02,....................................................................................................................................................................................................................................................................................AGCCTGCGAGGCGAAGATACGGCTATCTATTACTGTGCGAGTGATCCCCCTACTGCGGGAGACTACGGTGGCGGAGCCGATTTTGACTACTGGGGCCAGGGAACCCAGGTCATCGTCTCCTCAG,AATTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACNNNNNNNNNNNNNNNNGACTACGGTGGNNNNNNNNNNTTTGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG,TGTGCGAGTGATCCCCCTACTGCGGGAGACTACGGTGGCGGAGCCGATTTTGACTACTGG,⋯,90,TCCCCCTACTGCGGGA,16,CGGAGCCGAT,10.0,20,IGHG2,Pan_T7917815_ACCTTTATCGCTTGTC,Pan_T7917815,"IGHV3-21*06,IGHV3-30*07,IGHV3-30*12"
Pan_T7917815_ACTGCTCCAGGTCGTC-1_contig_1,Pan_T7917815_ACTGCTCCAGGTCGTC-1_contig_1,TGGGGGAGAAGAGCTGCTCAGTTAGGACCCAGAGGGAACCATGGAAACCCCAGCGCAGCTTCTCTTCCTCCTGCTACTCTGGCTCCCAGATACCACCGGAGAAATTGTGTTGACGCAGTCTCCAGGCACCCTGTCTTTGTCTCCAGGGGAAAGAGCCACCCTCTCCTGCAGGGCCAGTCAGACTATTAGTAGCAGTTACTTAGCCTGGTACCAGCAGAGACCTGGCCAGGCTCCCAGGCTCCTCATCCATGGTGCGTCCACCAGGGCCACGGGCATCCCAGACAGGTTCAGTGGCAGTGGGTCTGGGACAGACTTCACTCTCACCATCAGCAGACTGGAGCCTGAAGATTTTGCAGTGTATTATTGTCAGCACTTTGGTAGCTCATCCTGGACGTTCGGCCAAGGGACCAAGGTGGAAATCAAACGAACTGTGGCTGCACCATCTGTCTTCATCTTCCCGCCATCTGATGAGCAGTTGAAATCTGGAACTGCCTCTGTTGTGTGCCTGCTGAATAACTTCTATCCCAGAGAGGCCAAAGTACAGTGGAAGGTGGATAACGCCCTCCAATCGGGTAACTCCCAGGAGAGTGTCACAGAGCAGGACAGCAAGGACAGCACCTACAGCCTCAGCAGCACCCTGACGCTGAGCAAAGCAGACTACGAGAA,False,True,IGKV3-20*01,,IGKJ1*01,GAAATTGTGTTGACGCAGTCTCCAGGCACCCTGTCTTTGTCTCCAGGGGAAAGAGCCACCCTCTCCTGCAGGGCCAGTCAGACTATTAGT...............AGCAGTTACTTAGCCTGGTACCAGCAGAGACCTGGCCAGGCTCCCAGGCTCCTCATCCATGGTGCG.....................TCCACCAGGGCCACGGGCATCCCA...GACAGGTTCAGTGGCAGTGGG......TCTGGGACAGACTTCACTCTCACCATCAGCAGACTGGAGCCTGAAGATTTTGCAGTGTATTATTGTCAGCACTTTGGTAGCTCATCCTGGACGTTCGGCCAAGGGACCAAGGTGGAAATCAAAC,GAAATTGTGTTGACGCAGTCTCCAGGCACCCTGTCTTTGTCTCCAGGGGAAAGAGCCACCCTCTCCTGCAGGGCCAGTCAGAGTGTTAGC...............AGCAGCTACTTAGCCTGGTACCAGCAGAAACCTGGCCAGGCTCCCAGGCTCCTCATCTATGGTGCA.....................TCCAGCAGGGCCACTGGCATCCCA...GACAGGTTCAGTGGCAGTGGG......TCTGGGACAGACTTCACTCTCACCATCAGCAGACTGGAGNNNTGGACGTTCGGCCAAGGGACCAAGGTGGAAATCAAAC,TGTCAGCACTTTGGTAGCTCATCCTGGACGTTC,⋯,394,TCC,3,,,11,IGKC,Pan_T7917815_ACTGCTCCAGGTCGTC,Pan_T7917815,IGKV3-20*01
Pan_T7917815_ACTGCTCCAGGTCGTC-1_contig_2,Pan_T7917815_ACTGCTCCAGGTCGTC-1_contig_2,GAGCTCTGAGAGAGGAGCCTTAGCCCTGGATTCCAAGGCCTATCCACTTGGTGATCAGCACTGAGCACCGAGGATTCACCATGGAACTGGGGCTCCGCTGGGTTTTCCTTGTTGCTATTTTAGAAGGTGTCCAGTGTGAGGTGCAGTTGGTGGAGTCTGGGGGAGGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGCCTGGAGTGGGTCTCCTCCATTAGTGGTGATAGTGATTACAAATACTACGCAGACTCAGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAATAGGCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGGAGAGCGGGCGGCGGTAACCACCTACTGGGGCCAGGGAACCCTGGTCACCGTCTCGTCAGGGAGTGCATCCGCCCCAACCCTTTTCCCCCTCGTCTCCTGTGAGAATTCCCCGTCGGATACGAGCAGCGTG,False,True,IGHV3-21*01,IGHD4-23*01,IGHJ4*02,GAGGTGCAGTTGGTGGAGTCTGGGGGA...GGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTC............AGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGCCTGGAGTGGGTCTCCTCCATTAGTGGTGAT......AGTGATTACAAATACTACGCAGACTCAGTGAAG...GGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAATAGGCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGGAGAGCGGGCGGCGGTAACCACCTACTGGGGCCAGGGAACCCTGGTCACCGTCTCGTCAG,GAGGTGCAGCTGGTGGAGTCTGGGGGA...GGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTC............AGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCATCCATTAGTAGTAGT......AGTAGTTACATATACTACGCAGACTCAGTGAAG...GGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACANNNNNNNNNCGGTGGTAACNNNCTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG,TGTGCGAGAGGAGAGCGGGCGGCGGTAACCACCTACTGG,⋯,458,GAGAGCGGG,9,CAC,3.0,13,IGHM,Pan_T7917815_ACTGCTCCAGGTCGTC,Pan_T7917815,IGHV3-21*01
Pan_T7917815_CCTACCAGTGCCTTGG-1_contig_1,Pan_T7917815_CCTACCAGTGCCTTGG-1_contig_1,AGAGAGCCCTGGGGAGGAACTGCTCAGTTAGGACCCAGAGGGAACCATGGAGGCCCCAGCTCAGCTTCTCTTCCTCCTGCTACTCTGGCTCCCAGATACCACCGGAGAGATTGTGTTGACTCAGTCTCCAGTCACCCTGTCTTTGTCTCCAGGGGAAAGAGCCACCCTCTCCTGCAGGGCCAGTCAGAGTGTCAGCAGCTCCTTAGCCTGGTACCAACAGAAACCTGGTCAGGCTCCCAGGCTCCTCATCTATGATGCATCCAACAGGGCCACTGGCATCCCAGCCAGGTTCAGTGGCAGTGGGTCTGGGACAGCCTTCACTCTCACCATCAGCAGCCTAGAGCCTGAAGATTTTGCAGTGTATTACTGTCAGCAGCGTAGCAACTGGCCTCCGCTCACTTTCGGCGGCGGGACCAAGGTGGAGATCAAACGAACTGTGGCTGCACCATCTGTCTTCATCTTCCCGCCATCTGATGAGCAGTTGAAATCTGGAACTGCCTCTGTTGTGTGCCTGCTGAATAACTTCTATCCCAGAGAGGCCAAAGTACAGTGGAAGGTGGATAACGC,False,True,IGKV3-11*01,,IGKJ4*01,GAGATTGTGTTGACTCAGTCTCCAGTCACCCTGTCTTTGTCTCCAGGGGAAAGAGCCACCCTCTCCTGCAGGGCCAGTCAGAGTGTC..................AGCAGCTCCTTAGCCTGGTACCAACAGAAACCTGGTCAGGCTCCCAGGCTCCTCATCTATGATGCA.....................TCCAACAGGGCCACTGGCATCCCA...GCCAGGTTCAGTGGCAGTGGG......TCTGGGACAGCCTTCACTCTCACCATCAGCAGCCTAGAGCCTGAAGATTTTGCAGTGTATTACTGTCAGCAGCGTAGCAACTGGCCTCCGCTCACTTTCGGCGGCGGGACCAAGGTGGAGATCAAAC,GAAATTGTGTTGACACAGTCTCCAGCCACCCTGTCTTTGTCTCCAGGGGAAAGAGCCACCCTCTCCTGCAGGGCCAGTCAGAGTGTT..................AGCAGCTACTTAGCCTGGTACCAACAGAAACCTGGCCAGGCTCCCAGGCTCCTCATCTATGATGCA.....................TCCAACAGGGCCACTGGCATCCCA...GCCAGGTTCAGTGGCAGTGGG......TCTGGGACAGACTTCACTCTCACCATCAGCAGCCTAGAGCCGCTCACTTTCGGCGGAGGGACCAAGGTGGAGATCAAAC,TGTCAGCAGCGTAGCAACTGGCCTCCGCTCACTTTC,⋯,400,,0,,,12,IGKC,Pan_T7917815_CCTACCAGTGCCTTGG,Pan_T7917815,IGKV3-11*01
Pan_T7917815_CCTACCAGTGCCTTGG-1_contig_2,Pan_T7917815_CCTACCAGTGCCTTGG-1_contig_2,GGGAGTGCTTTCTGAGAGTCATGGACCTCCTGCACAAGAACATGAAACACCTGTGGTTCTTCCTCCTCCTGGTGGCAGCTCCCAGATGGGTCCTGTCCCAGGTGCGGCTACAACAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCTCTCACCTGCGCTGTCGGTAGTGGGTCCTTCAGTGGTTACTACTGGACCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGAGAAGTCAATCATAGTGGGAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATAACAGTGGACGCGTCCAAGAAGCAATTCTCCCTGAAGTTGAAATCTGTGACCGCCGCGGACACGGCTGTCTACTACTGTGCGGTTCCCAAGGATGGCAGCGGCTGGGGGACCTTTCACCACTGGGGCCGGGGAACCCCGGTCACCGTCTCCCCCGCTTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCGCCCTGCTCCAGGAGCACCTCTGGGGGCACAGCGGCCCTGGGCTGCCTGGTCAAGGACTACTTCCCCGAACCGGTGACGGTGTCGTGGAACTCAGGCGCCCTGACCAGCGGCGTGCACACCTTCCCGGCTGTCCTACAGTCCTCAGGAGTCTGGAGACTGGGTCATCACGATGTCCCCGTAGGCACCAGAGATCCAGAGCAACAGAGAAATGAAGACCTGGGTCTGCAACACCATCTTGCTGCCCCTGCCTGC,False,True,IGHV4-34*02,IGHD6-25*01,IGHJ4*02,CAGGTGCGGCTACAACAGTGGGGCGCA...GGACTGTTGAAGCCTTCGGAGACCCTGTCTCTCACCTGCGCTGTCGGTAGTGGGTCCTTC............AGTGGTTACTACTGGACCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGAGAAGTCAATCATAGT.........GGGAGCACCAACTACAACCCGTCCCTCAAG...AGTCGAGTCACCATAACAGTGGACGCGTCCAAGAAGCAATTCTCCCTGAAGTTGAAATCTGTGACCGCCGCGGACACGGCTGTCTACTACTGTGCGGTTCCCAAGGATGGCAGCGGCTGGGGGACCTTTCACCACTGGGGCCGGGGAACCCCGGTCACCGTCTCC,CAGGTGCAGCTACAACAGTGGGGCGCA...GGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGTCCTTC............AGTGGTTACTACTGGAGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCAATCATAGT.........GGAAGCACCAACTACAACCCGTCCCTCAAG...AGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCNNNNNNNNNNNNNGCAGCGGCTNNNNNNNCTTTGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCC,TGTGCGGTTCCCAAGGATGGCAGCGGCTGGGGGACCTTTCACCACTGG,⋯,425,GTTCCCAAGGATG,13,GGGGGAC,7.0,16,IGHG3,Pan_T7917815_CCTACCAGTGCCTTGG,Pan_T7917815,IGHV4-34*02
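
A quick sanity check on a table like this is to count heavy and light chain contigs per cell. The sketch below rebuilds the `cell` and `c_call` columns of the six rows shown above as a plain data frame. With the real `results[[1]]`, note that `c_call` is displayed as a `<list>` column, so it may need flattening (for example with `unlist`) first; that step is an assumption I have not tested.

```R
# cell and c_call values copied from the six rows shown above
vdj_toy <- data.frame(
  cell = rep(c('Pan_T7917815_ACCTTTATCGCTTGTC',
               'Pan_T7917815_ACTGCTCCAGGTCGTC',
               'Pan_T7917815_CCTACCAGTGCCTTGG'), each = 2),
  c_call = c('IGKC', 'IGHG2', 'IGKC', 'IGHM', 'IGKC', 'IGHG3'),
  stringsAsFactors = FALSE
)

# Constant region calls starting with IGH are heavy chains; the rest are light
vdj_toy$chain <- ifelse(grepl('^IGH', vdj_toy$c_call), 'heavy', 'light')

# Cross-tabulate contigs per cell and chain type
chain_counts <- table(vdj_toy$cell, vdj_toy$chain)
print(chain_counts)
```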


#### Check the AnnData object as well
And the scanpy `AnnData` object is in the second slot:

In [16]:
results[[2]]

View of AnnData object with n_obs × n_vars = 816 × 33694 
    obs: 'sampleid', 'gender', 'donor', 'tissue', 'scrublet_score', 'n_genes', 'percent_mito', 'n_counts', 'bh_pval', 'is_doublet', 'filter_rna', 'has_bcr', 'filter_bcr_quality', 'filter_bcr_heavy', 'filter_bcr_light'
    var: 'gene_ids', 'feature_types'

For simplicity, I will write over the original `adata` object with `results[[2]]`.

In [17]:
adata = results[[2]]

#### Now actually filter the AnnData object and run through a standard workflow starting by filtering genes and normalizing the data
Technically, you could convert the `AnnData` object into an R-friendly object such as `Seurat` or `SingleCellExperiment`, process it that way, and then convert it back to `AnnData` format for the network generation section. Or, just use your pre-processed R objects and convert to `AnnData` with something like [sceasy](https://github.com/cellgeni/sceasy). If you do, you can skip this bit and head straight to the [find clones](##Finding) or [plotting](##Visualization) sections.

In [18]:
# filter genes
sc$pp$filter_genes(adata, min_cells=3)
# Normalize the counts
sc$pp$normalize_total(adata, target_sum=1e4)
# Logarithmize the data
sc$pp$log1p(adata)
# Stash the normalised counts
adata$raw = adata
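
To make the normalisation step concrete, here is a base R sketch of the arithmetic on a tiny made-up count matrix (cells in rows, genes in columns, as in `AnnData`). It illustrates the idea behind `normalize_total` with `target_sum=1e4` followed by `log1p`; it is not scanpy's implementation.

```R
# Made-up counts: 2 cells x 3 genes
counts <- matrix(c(1, 3, 0,
                   2, 2, 4),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c('cell1', 'cell2'),
                                 c('geneA', 'geneB', 'geneC')))

target_sum <- 1e4

# Divide each cell (row) by its total, then rescale to target_sum
normalised <- counts / rowSums(counts) * target_sum

# log(1 + x), so zero counts stay at zero
logged <- log1p(normalised)

print(normalised)
print(round(logged, 3))
```

After this step every cell sums to the same total, so differences in sequencing depth no longer dominate the comparison between cells.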

#### Identify highly-variable genes and filter the genes to only those marked as highly-variable
I'm subsetting straight away with `subset = TRUE`.

In [19]:
sc$pp$highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5, subset=TRUE)
adata

AnnData object with n_obs × n_vars = 816 × 3233 
    obs: 'sampleid', 'gender', 'donor', 'tissue', 'scrublet_score', 'n_genes', 'percent_mito', 'n_counts', 'bh_pval', 'is_doublet', 'filter_rna', 'has_bcr', 'filter_bcr_quality', 'filter_bcr_heavy', 'filter_bcr_light'
    var: 'gene_ids', 'feature_types', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
    uns: 'log1p'

I haven't worked out how to display the plots inline yet, but if you run the next command a plot should appear.
```R
sc$pl$highly_variable_genes(adata)
```

#### Regress out effects of total counts per cell and the percentage of mitochondrial genes expressed. Scale the data to unit variance.

In [20]:
sc$pp$regress_out(adata, c('n_counts', 'percent_mito'))
sc$pp$scale(adata, max_value=10)
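
The sketch below shows the idea behind these two calls for a single simulated gene in base R: fit a linear model on the two covariates, keep the residuals, standardise them, and cap large values. All data are simulated, and the numerics (for example how the variance is estimated) are not guaranteed to match scanpy exactly.

```R
set.seed(1)
n_cells <- 50

# Simulated covariates and one gene whose expression depends on n_counts
n_counts <- runif(n_cells, 1000, 5000)
percent_mito <- runif(n_cells, 0, 0.1)
expr <- 0.001 * n_counts + rnorm(n_cells)

# Regress out: keep what the covariates cannot explain
fit <- lm(expr ~ n_counts + percent_mito)
resid_expr <- residuals(fit)

# Scale to zero mean and unit variance, then cap values above max_value
max_value <- 10
scaled <- as.numeric(scale(resid_expr))
clipped <- pmin(scaled, max_value)

print(round(cor(expr, n_counts), 2))        # noticeable before regression
print(round(cor(resid_expr, n_counts), 2))  # essentially zero afterwards
```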

#### Run PCA

In [21]:
sc$tl$pca(adata, svd_solver='arpack')

#### Computing the neighborhood graph, umap and clusters

In [22]:
# Computing the neighborhood graph
sc$pp$neighbors(adata)

In [23]:
# Embedding the neighborhood graph
sc$tl$umap(adata, min_dist = 0.3)

In [24]:
# Clustering the neighborhood graph
sc$tl$leiden(adata)

#### Visualizing the clusters and whether or not there's a corresponding BCR
```R
sc$pl$umap(adata, color=c('leiden', 'has_bcr'))
```

#### Visualizing some B cell genes
```R
sc$pl$umap(adata, color=c('IGHM', 'JCHAIN'))
```

## Finding clones

#### Running `ddl$tl$find_clones`

In [25]:
filtered_file = paste0(bcr_folder, sample, '/dandelion/data/all_contig_igblast_gap_genotyped_filtered.tsv')
vdj_clone = ddl$tl$find_clones(filtered_file)
vdj_clone

Unnamed: 0_level_0,sequence_id,sequence,rev_comp,productive,v_call,d_call,j_call,sequence_alignment,germline_alignment,junction,⋯,np1,np1_length,np2,np2_length,junction_aa_length,c_call,cell,sample,v_call_genotyped,clone
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<chr>,<chr>,<list>,<chr>,<chr>,<chr>,<chr>,⋯,<list>,<chr>,<list>,<list>,<chr>,<list>,<chr>,<chr>,<chr>,<chr>
Pan_T7917815_ACCTTTATCGCTTGTC-1_contig_1,Pan_T7917815_ACCTTTATCGCTTGTC-1_contig_1,GAGGAGTCAGACCCAGTCAGGACACAGCATGGACATGAGGGTCCCCGCTCAGCTCCTGGGGCTCCTGCTGCTCTGGTTCCCAGGTTCCAGATGCGACATCCAGATGACCCAGTCTCCATCTTCTGTGTCTGCATCTTTAGGAGACAGAGTCACCATCACTTGCCGGGCGAGTCAGGGTATTAGGAGGTGGTTAGCCTGGTATCAGCAAAAACCAGGGACAGCCCCTAAACTCCTGATCCATTCTGTATCCAGTTTGCAAAGTGGGGTCCCATCAAGGTTCAGTGGCAGTGGATCTGGGACAGATTTCACTCTCATTATCAGCAGCCTGCAACCTGAAGACTTTGCAACTTACTTTTGTCTACAGGGTGAGAGTTACCCTCTCACCTTCGGCCAGGGGACACGACTGGACATTAAACGAACTGTGGCTGCACCATCTGTCTTCATCTTCCCGCCATCTGATGAGCAGTTGAAATCTGGAACTGCCTCTGTTGTGTGCCTGCTGAATAACTTCTATCCCAGAGAGGCCAAAGTACAGTGGAAGGTGGATAACGC,FALSE,TRUE,IGKV1D-12*01,,IGKJ5*01,GACATCCAGATGACCCAGTCTCCATCTTCTGTGTCTGCATCTTTAGGAGACAGAGTCACCATCACTTGCCGGGCGAGTCAGGGTATT..................AGGAGGTGGTTAGCCTGGTATCAGCAAAAACCAGGGACAGCCCCTAAACTCCTGATCCATTCTGTA.....................TCCAGTTTGCAAAGTGGGGTCCCA...TCAAGGTTCAGTGGCAGTGGA......TCTGGGACAGATTTCACTCTCATTATCAGCAGCCTGCAACCTGAAGACTTTGCAACTTACTTTTGTCTACAGGGTGAGAGTTACCCTCTCACCTTCGGCCAGGGGACACGACTGGACATTAAAC,GACATCCAGATGACCCAGTCTCCATCTTCTGTGTCTGCATCTGTAGGAGACAGAGTCACCATCACTTGTCGGGCGAGTCAGGGTATT..................AGCAGCTGGTTAGCCTGGTATCAGCAGAAACCAGGGAAAGCCCCTAAGCTCCTGATCTATGCTGCA.....................TCCAGTTTGCAAAGTGGGGTCCCA...TCAAGGTTCAGCGGCAGTGGA......TCTGGGACAGATTTCACTCTCACTATCAGCAGCCTGCAGCTCACCTTCGGCCAAGGGACACGACTGGAGATTAAAC,TGTCTACAGGGTGAGAGTTACCCTCTCACCTTC,⋯,,0,,,11,IGKC,Pan_T7917815_ACCTTTATCGCTTGTC,Pan_T7917815,IGKV1D-12*01,11_1_1
Pan_T7917815_ACCTTTATCGCTTGTC-1_contig_2,Pan_T7917815_ACCTTTATCGCTTGTC-1_contig_2,AGCCTGCGAGGCGAAGATACGGCTATCTATTACTGTGCGAGTGATCCCCCTACTGCGGGAGACTACGGTGGCGGAGCCGATTTTGACTACTGGGGCCAGGGAACCCAGGTCATCGTCTCCTCAGCCTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCGCCCTGCTCCAGGAGCACCTCCGAGAGCACAGCGGCCCTGGGCTGCCTGGTCAAGGACTACTTCCCCGAACCGGTGACGGTGTCGTGGAACTCAGGCGCTCTGACCAGCGGCGTGCACACCTTCCCAGCTGTCCTACAGTCCTCAGG,FALSE,TRUE,"IGHV3-21*06,IGHV3-30*07,IGHV3-30*12",IGHD4-23*01,IGHJ4*02,....................................................................................................................................................................................................................................................................................AGCCTGCGAGGCGAAGATACGGCTATCTATTACTGTGCGAGTGATCCCCCTACTGCGGGAGACTACGGTGGCGGAGCCGATTTTGACTACTGGGGCCAGGGAACCCAGGTCATCGTCTCCTCAG,AATTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACACNNNNNNNNNNNNNNNNGACTACGGTGGNNNNNNNNNNTTTGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG,TGTGCGAGTGATCCCCCTACTGCGGGAGACTACGGTGGCGGAGCCGATTTTGACTACTGG,⋯,TCCCCCTACTGCGGGA,16,CGGAGCCGAT,10,20,IGHG2,Pan_T7917815_ACCTTTATCGCTTGTC,Pan_T7917815,"IGHV3-21*06,IGHV3-30*07,IGHV3-30*12",11_1_1
Pan_T7917815_ACTGCTCCAGGTCGTC-1_contig_1,Pan_T7917815_ACTGCTCCAGGTCGTC-1_contig_1,TGGGGGAGAAGAGCTGCTCAGTTAGGACCCAGAGGGAACCATGGAAACCCCAGCGCAGCTTCTCTTCCTCCTGCTACTCTGGCTCCCAGATACCACCGGAGAAATTGTGTTGACGCAGTCTCCAGGCACCCTGTCTTTGTCTCCAGGGGAAAGAGCCACCCTCTCCTGCAGGGCCAGTCAGACTATTAGTAGCAGTTACTTAGCCTGGTACCAGCAGAGACCTGGCCAGGCTCCCAGGCTCCTCATCCATGGTGCGTCCACCAGGGCCACGGGCATCCCAGACAGGTTCAGTGGCAGTGGGTCTGGGACAGACTTCACTCTCACCATCAGCAGACTGGAGCCTGAAGATTTTGCAGTGTATTATTGTCAGCACTTTGGTAGCTCATCCTGGACGTTCGGCCAAGGGACCAAGGTGGAAATCAAACGAACTGTGGCTGCACCATCTGTCTTCATCTTCCCGCCATCTGATGAGCAGTTGAAATCTGGAACTGCCTCTGTTGTGTGCCTGCTGAATAACTTCTATCCCAGAGAGGCCAAAGTACAGTGGAAGGTGGATAACGCCCTCCAATCGGGTAACTCCCAGGAGAGTGTCACAGAGCAGGACAGCAAGGACAGCACCTACAGCCTCAGCAGCACCCTGACGCTGAGCAAAGCAGACTACGAGAA,FALSE,TRUE,IGKV3-20*01,,IGKJ1*01,GAAATTGTGTTGACGCAGTCTCCAGGCACCCTGTCTTTGTCTCCAGGGGAAAGAGCCACCCTCTCCTGCAGGGCCAGTCAGACTATTAGT...............AGCAGTTACTTAGCCTGGTACCAGCAGAGACCTGGCCAGGCTCCCAGGCTCCTCATCCATGGTGCG.....................TCCACCAGGGCCACGGGCATCCCA...GACAGGTTCAGTGGCAGTGGG......TCTGGGACAGACTTCACTCTCACCATCAGCAGACTGGAGCCTGAAGATTTTGCAGTGTATTATTGTCAGCACTTTGGTAGCTCATCCTGGACGTTCGGCCAAGGGACCAAGGTGGAAATCAAAC,GAAATTGTGTTGACGCAGTCTCCAGGCACCCTGTCTTTGTCTCCAGGGGAAAGAGCCACCCTCTCCTGCAGGGCCAGTCAGAGTGTTAGC...............AGCAGCTACTTAGCCTGGTACCAGCAGAAACCTGGCCAGGCTCCCAGGCTCCTCATCTATGGTGCA.....................TCCAGCAGGGCCACTGGCATCCCA...GACAGGTTCAGTGGCAGTGGG......TCTGGGACAGACTTCACTCTCACCATCAGCAGACTGGAGNNNTGGACGTTCGGCCAAGGGACCAAGGTGGAAATCAAAC,TGTCAGCACTTTGGTAGCTCATCCTGGACGTTC,⋯,TCC,3,,,11,IGKC,Pan_T7917815_ACTGCTCCAGGTCGTC,Pan_T7917815,IGKV3-20*01,29_2_1
Pan_T7917815_ACTGCTCCAGGTCGTC-1_contig_2,Pan_T7917815_ACTGCTCCAGGTCGTC-1_contig_2,GAGCTCTGAGAGAGGAGCCTTAGCCCTGGATTCCAAGGCCTATCCACTTGGTGATCAGCACTGAGCACCGAGGATTCACCATGGAACTGGGGCTCCGCTGGGTTTTCCTTGTTGCTATTTTAGAAGGTGTCCAGTGTGAGGTGCAGTTGGTGGAGTCTGGGGGAGGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGCCTGGAGTGGGTCTCCTCCATTAGTGGTGATAGTGATTACAAATACTACGCAGACTCAGTGAAGGGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAATAGGCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGGAGAGCGGGCGGCGGTAACCACCTACTGGGGCCAGGGAACCCTGGTCACCGTCTCGTCAGGGAGTGCATCCGCCCCAACCCTTTTCCCCCTCGTCTCCTGTGAGAATTCCCCGTCGGATACGAGCAGCGTG,FALSE,TRUE,IGHV3-21*01,IGHD4-23*01,IGHJ4*02,GAGGTGCAGTTGGTGGAGTCTGGGGGA...GGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTC............AGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGCCTGGAGTGGGTCTCCTCCATTAGTGGTGAT......AGTGATTACAAATACTACGCAGACTCAGTGAAG...GGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAATAGGCTGAGAGCCGAGGACACGGCTGTGTATTACTGTGCGAGAGGAGAGCGGGCGGCGGTAACCACCTACTGGGGCCAGGGAACCCTGGTCACCGTCTCGTCAG,GAGGTGCAGCTGGTGGAGTCTGGGGGA...GGCCTGGTCAAGCCTGGGGGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTC............AGTAGCTATAGCATGAACTGGGTCCGCCAGGCTCCAGGGAAGGGGCTGGAGTGGGTCTCATCCATTAGTAGTAGT......AGTAGTTACATATACTACGCAGACTCAGTGAAG...GGCCGATTCACCATCTCCAGAGACAACGCCAAGAACTCACTGTATCTGCAAATGAACAGCCTGAGAGCCGAGGACANNNNNNNNNCGGTGGTAACNNNCTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG,TGTGCGAGAGGAGAGCGGGCGGCGGTAACCACCTACTGG,⋯,GAGAGCGGG,9,CAC,3,13,IGHM,Pan_T7917815_ACTGCTCCAGGTCGTC,Pan_T7917815,IGHV3-21*01,29_2_1
Pan_T7917815_CCTACCAGTGCCTTGG-1_contig_1,Pan_T7917815_CCTACCAGTGCCTTGG-1_contig_1,AGAGAGCCCTGGGGAGGAACTGCTCAGTTAGGACCCAGAGGGAACCATGGAGGCCCCAGCTCAGCTTCTCTTCCTCCTGCTACTCTGGCTCCCAGATACCACCGGAGAGATTGTGTTGACTCAGTCTCCAGTCACCCTGTCTTTGTCTCCAGGGGAAAGAGCCACCCTCTCCTGCAGGGCCAGTCAGAGTGTCAGCAGCTCCTTAGCCTGGTACCAACAGAAACCTGGTCAGGCTCCCAGGCTCCTCATCTATGATGCATCCAACAGGGCCACTGGCATCCCAGCCAGGTTCAGTGGCAGTGGGTCTGGGACAGCCTTCACTCTCACCATCAGCAGCCTAGAGCCTGAAGATTTTGCAGTGTATTACTGTCAGCAGCGTAGCAACTGGCCTCCGCTCACTTTCGGCGGCGGGACCAAGGTGGAGATCAAACGAACTGTGGCTGCACCATCTGTCTTCATCTTCCCGCCATCTGATGAGCAGTTGAAATCTGGAACTGCCTCTGTTGTGTGCCTGCTGAATAACTTCTATCCCAGAGAGGCCAAAGTACAGTGGAAGGTGGATAACGC,FALSE,TRUE,IGKV3-11*01,,IGKJ4*01,GAGATTGTGTTGACTCAGTCTCCAGTCACCCTGTCTTTGTCTCCAGGGGAAAGAGCCACCCTCTCCTGCAGGGCCAGTCAGAGTGTC..................AGCAGCTCCTTAGCCTGGTACCAACAGAAACCTGGTCAGGCTCCCAGGCTCCTCATCTATGATGCA.....................TCCAACAGGGCCACTGGCATCCCA...GCCAGGTTCAGTGGCAGTGGG......TCTGGGACAGCCTTCACTCTCACCATCAGCAGCCTAGAGCCTGAAGATTTTGCAGTGTATTACTGTCAGCAGCGTAGCAACTGGCCTCCGCTCACTTTCGGCGGCGGGACCAAGGTGGAGATCAAAC,GAAATTGTGTTGACACAGTCTCCAGCCACCCTGTCTTTGTCTCCAGGGGAAAGAGCCACCCTCTCCTGCAGGGCCAGTCAGAGTGTT..................AGCAGCTACTTAGCCTGGTACCAACAGAAACCTGGCCAGGCTCCCAGGCTCCTCATCTATGATGCA.....................TCCAACAGGGCCACTGGCATCCCA...GCCAGGTTCAGTGGCAGTGGG......TCTGGGACAGACTTCACTCTCACCATCAGCAGCCTAGAGCCGCTCACTTTCGGCGGAGGGACCAAGGTGGAGATCAAAC,TGTCAGCAGCGTAGCAACTGGCCTCCGCTCACTTTC,⋯,,0,,,12,IGKC,Pan_T7917815_CCTACCAGTGCCTTGG,Pan_T7917815,IGKV3-11*01,31_3_1
Pan_T7917815_CCTACCAGTGCCTTGG-1_contig_2,Pan_T7917815_CCTACCAGTGCCTTGG-1_contig_2,GGGAGTGCTTTCTGAGAGTCATGGACCTCCTGCACAAGAACATGAAACACCTGTGGTTCTTCCTCCTCCTGGTGGCAGCTCCCAGATGGGTCCTGTCCCAGGTGCGGCTACAACAGTGGGGCGCAGGACTGTTGAAGCCTTCGGAGACCCTGTCTCTCACCTGCGCTGTCGGTAGTGGGTCCTTCAGTGGTTACTACTGGACCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGAGAAGTCAATCATAGTGGGAGCACCAACTACAACCCGTCCCTCAAGAGTCGAGTCACCATAACAGTGGACGCGTCCAAGAAGCAATTCTCCCTGAAGTTGAAATCTGTGACCGCCGCGGACACGGCTGTCTACTACTGTGCGGTTCCCAAGGATGGCAGCGGCTGGGGGACCTTTCACCACTGGGGCCGGGGAACCCCGGTCACCGTCTCCCCCGCTTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCGCCCTGCTCCAGGAGCACCTCTGGGGGCACAGCGGCCCTGGGCTGCCTGGTCAAGGACTACTTCCCCGAACCGGTGACGGTGTCGTGGAACTCAGGCGCCCTGACCAGCGGCGTGCACACCTTCCCGGCTGTCCTACAGTCCTCAGGAGTCTGGAGACTGGGTCATCACGATGTCCCCGTAGGCACCAGAGATCCAGAGCAACAGAGAAATGAAGACCTGGGTCTGCAACACCATCTTGCTGCCCCTGCCTGC,FALSE,TRUE,IGHV4-34*02,IGHD6-25*01,IGHJ4*02,CAGGTGCGGCTACAACAGTGGGGCGCA...GGACTGTTGAAGCCTTCGGAGACCCTGTCTCTCACCTGCGCTGTCGGTAGTGGGTCCTTC............AGTGGTTACTACTGGACCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGAGAAGTCAATCATAGT.........GGGAGCACCAACTACAACCCGTCCCTCAAG...AGTCGAGTCACCATAACAGTGGACGCGTCCAAGAAGCAATTCTCCCTGAAGTTGAAATCTGTGACCGCCGCGGACACGGCTGTCTACTACTGTGCGGTTCCCAAGGATGGCAGCGGCTGGGGGACCTTTCACCACTGGGGCCGGGGAACCCCGGTCACCGTCTCC,CAGGTGCAGCTACAACAGTGGGGCGCA...GGACTGTTGAAGCCTTCGGAGACCCTGTCCCTCACCTGCGCTGTCTATGGTGGGTCCTTC............AGTGGTTACTACTGGAGCTGGATCCGCCAGCCCCCAGGGAAGGGGCTGGAGTGGATTGGGGAAATCAATCATAGT.........GGAAGCACCAACTACAACCCGTCCCTCAAG...AGTCGAGTCACCATATCAGTAGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCCNNNNNNNNNNNNNGCAGCGGCTNNNNNNNCTTTGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCC,TGTGCGGTTCCCAAGGATGGCAGCGGCTGGGGGACCTTTCACCACTGG,⋯,GTTCCCAAGGATG,13,GGGGGAC,7,16,IGHG3,Pan_T7917815_CCTACCAGTGCCTTGG,Pan_T7917815,IGHV4-34*02,31_3_1
Pan_T7917815_CGATCGGAGGGCTCTC-1_contig_1,Pan_T7917815_CGATCGGAGGGCTCTC-1_contig_1,GGGGAGAAGAGCTGCTCAGTTAGGACCCAGAGGGAACCATGGAAACCCCAGCGCAGCTTCTCTTCCTCCTGCTACTCTGGCTCCCAGATACCACCGGAGAAGTTGTGTTGACGCAGTCTCCAGGCACCCTGTCTTTGTCTCCAGGGCAAAGAGTCACCCTCTCCTGCAGGGCCAGTCAGAGTATTGGCAGGTCCTTAATCTGGTACCAGCAAAAACCTGGCCAGGCTCCCAGACTCCTCATCTATACTGCATCCACCAGGGCCACTGGCATCCCAGACAGGTTCAGTGGCAGTGGGTCTGGGACAGACTTCTCTCTCACCATCAGCAGACTAGAGGCTGAAGATTTTGCAGTGTATTACTGTCAACAGGTTTTTAGCTCACCTCAGACTTTCGGCGGAGGGACCAGGGTGGAGATCAAACGAACTGTGGCTGCACCATCTGTCTTCATCTTCCCGCCATCTGATGAGCAGTTGAAATCTGGAACTGCCTCTGTTGTGTGCCTGCTGAATAACTTCTATCCCAGAGAGGCCAAAGTACAGTGGAAGGTGGATAACGCCCTCCAATCGGGT,FALSE,TRUE,IGKV3-20*01,,IGKJ4*01,GAAGTTGTGTTGACGCAGTCTCCAGGCACCCTGTCTTTGTCTCCAGGGCAAAGAGTCACCCTCTCCTGCAGGGCCAGTCAGAGTATTGGC...............AGG---TCCTTAATCTGGTACCAGCAAAAACCTGGCCAGGCTCCCAGACTCCTCATCTATACTGCA.....................TCCACCAGGGCCACTGGCATCCCA...GACAGGTTCAGTGGCAGTGGG......TCTGGGACAGACTTCTCTCTCACCATCAGCAGACTAGAGGCTGAAGATTTTGCAGTGTATTACTGTCAACAGGTTTTTAGCTCACCTCAGACTTTCGGCGGAGGGACCAGGGTGGAGATCAAAC,GAAATTGTGTTGACGCAGTCTCCAGGCACCCTGTCTTTGTCTCCAGGGGAAAGAGCCACCCTCTCCTGCAGGGCCAGTCAGAGTGTTAGC...............AGCAGCTACTTAGCCTGGTACCAGCAGAAACCTGGCCAGGCTCCCAGGCTCCTCATCTATGGTGCA.....................TCCAGCAGGGCCACTGGCATCCCA...GACAGGTTCAGTGGCAGTGGG......TCTGGGACAGACTTCACTCTCACCATCAGCAGACTGGAGCCTGNNACTTTCGGCGGAGGGACCAAGGTGGAGATCAAAC,TGTCAACAGGTTTTTAGCTCACCTCAGACTTTC,⋯,AG,2,,,11,IGKC,Pan_T7917815_CGATCGGAGGGCTCTC,Pan_T7917815,IGKV3-20*01,36_1_2
Pan_T7917815_CGATCGGAGGGCTCTC-1_contig_2,Pan_T7917815_CGATCGGAGGGCTCTC-1_contig_2,GGCATCACATAACAACCACATTCCTCCTCTCAAGAAGCCCCTGGGAGCACAGCTCTTCACCATGGACTGGACCTGGAGGTTCCTCTTTGTGGTGGCAGCATCTACAGGTGTCCAGTCCCAGGTGCACCTGGTGCAGCCGGGGGCTGAGGTGAAGAAACCTGGGTCCTCGGTGAAAGTCTCCTGCGAGGCTTCTGGAGCCACATTCAGCAAATATTCTTTCGGCTGGCTGAAACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGGGGGATCATCCCCTTCTCTGGTAGACCAAAGTATGGCCCGAAGTTCCAGGGCAGACTCACGATAAGCGCGGACGAATCTACGAGAACAGTCTACATGGAGCTGACCAGCCTGACATCTGAGGACACGGCCATCTATTACTGTGCGAAATGGGCCGCAATTTGTGAGAGTGGCGATTGCTATGAGGTCTCTTTTGACTATTGGGGCCAGGGAACCCTGCTAACCGTCTCCTCAGCCTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCGCCCTGCTCCAGGAGCACCTCCGAGAGCACAGCGGCCCTGGGCTGCCTGGTCAAGGACTACTTCCCCGAACCGGTGACGGTGTCGTGGAACTCAGGCGCTCTGACCAGCGGCGTGCACACCTTCCCAGCTGTCCTACAGTCCTCAGGA,FALSE,TRUE,"IGHV1-69*01,IGHV1-69D*01",IGHD2-21*01,IGHJ4*02,CAGGTGCACCTGGTGCAGCCGGGGGCT...GAGGTGAAGAAACCTGGGTCCTCGGTGAAAGTCTCCTGCGAGGCTTCTGGAGCCACATTC............AGCAAATATTCTTTCGGCTGGCTGAAACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGGGGGATCATCCCCTTC......TCTGGTAGACCAAAGTATGGCCCGAAGTTCCAG...GGCAGACTCACGATAAGCGCGGACGAATCTACGAGAACAGTCTACATGGAGCTGACCAGCCTGACATCTGAGGACACGGCCATCTATTACTGTGCGAAATGGGCCGCAATTTGTGAGAGTGGCGATTGCTATGAGGTCTCTTTTGACTATTGGGGCCAGGGAACCCTGCTAACCGTCTCCTCAG,CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTGGGTCCTCGGTGAAGGTCTCCTGCAAGGCTTCTGGAGGCACCTTC............AGCAGCTATGCTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGAGGGATCATCCCTATC......TTTGGTACAGCAAACTACGCACAGAAGTTCCAG...GGCAGAGTCACGATTACCGCGGACGAATCCACGAGCACAGCCTACATGGAGCTGAGCAGCCTGAGATCTGAGGNNNNNNNNNNNNNNNNNNNNNGTGGTGATTGCTATNNNNNNNNNTTTGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG,TGTGCGAAATGGGCCGCAATTTGTGAGAGTGGCGATTGCTATGAGGTCTCTTTTGACTATTGG,⋯,AATGGGCCGCAATTTGTGAGA,21,GAGGTCTCT,9,21,IGHG2,Pan_T7917815_CGATCGGAGGGCTCTC,Pan_T7917815,"IGHV1-69*01,IGHV1-69D*01",36_1_2
Pan_T7917815_CGGACGTTCTTTAGGG-1_contig_1,Pan_T7917815_CGGACGTTCTTTAGGG-1_contig_1,GAGGAGTCAGTCTCAGTCAGGACACAGCATGGACATGAGGGTCCCCGCTCAGCTCCTGGGGCTCCTGCTACTCTGGCTCCGAGGTGCCAGATGTGACATCCAGATGACCCAGTCTCCATCTTTCCTGTCTGCATCTGTAGGAGATAGAGTCACCATCACTTGCCGGGCTAGTCAGACCATTAGCAACTATTTAAATTGGTATCAGCAAAAACCGACGAAAGCCCCTAGCCTCCTGATCTATGCTGCATCCACTTTGGAAAGTGGGGTCCCATCAAGGTTCGGTGCCAGTGGGTCTGGGACAGATTTCACTCTCACCATCAGCAGTCTGCAACCTGAAGATTTTGCAACTTACTACTGTCAACAGAGTTTCACTACCCCTCACACTTTTGGCCAGGGGACCAACCTGGAGATCAAAGGAACTGTGGCTGCACCATCTGTCTTCATCTTCCCGCCATCTGATGAGCAGTTGAAATCTGGAACTGCCTCTGTTGTGTGCCTGCTGAATAACTTCTATCCCAGAGAGGCCAAAGTACAGTGGAAGGTGGATAACGC,FALSE,TRUE,"IGKV1-39*01,IGKV1D-39*01",,IGKJ2*01,GACATCCAGATGACCCAGTCTCCATCTTTCCTGTCTGCATCTGTAGGAGATAGAGTCACCATCACTTGCCGGGCTAGTCAGACCATT..................AGCAACTATTTAAATTGGTATCAGCAAAAACCGACGAAAGCCCCTAGCCTCCTGATCTATGCTGCA.....................TCCACTTTGGAAAGTGGGGTCCCA...TCAAGGTTCGGTGCCAGTGGG......TCTGGGACAGATTTCACTCTCACCATCAGCAGTCTGCAACCTGAAGATTTTGCAACTTACTACTGTCAACAGAGTTTCACTACCCCTCACACTTTTGGCCAGGGGACCAACCTGGAGATCAAA,GACATCCAGATGACCCAGTCTCCATCCTCCCTGTCTGCATCTGTAGGAGACAGAGTCACCATCACTTGCCGGGCAAGTCAGAGCATT..................AGCAGCTATTTAAATTGGTATCAGCAGAAACCAGGGAAAGCCCCTAAGCTCCTGATCTATGCTGCA.....................TCCAGTTTGCAAAGTGGGGTCCCA...TCAAGGTTCAGTGGCAGTGGA......TCTGGGACAGATTTCACTCTCACCATCAGCAGTCTGCAACACACTTTTGGCCAGGGGACCAAGCTGGAGATCAAA,TGTCAACAGAGTTTCACTACCCCTCACACTTTT,⋯,,0,,,11,IGKC,Pan_T7917815_CGGACGTTCTTTAGGG,Pan_T7917815,"IGKV1-39*01,IGKV1D-39*01",66_2_1
Pan_T7917815_CGGACGTTCTTTAGGG-1_contig_2,Pan_T7917815_CGGACGTTCTTTAGGG-1_contig_2,GAAACTGAGCTCCATGACAGCTGCGGACACGGCCATTTATTACTGTGCGAGGAGATATTATGATAGTGGTGGTCCCTTCGACCCCTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAGCCTCCACCAAGGGCCCATCGGTCTTCCCCCTGGCGCCCTGCTCCAGGAGCACCTCCGAGAGCACAGCGGCCCTGGGCTGCCTGGTCAAGGACTACTTCCCCGAACCGGTGACGGTGTCGTGGAACTCAGGCGCTCTGACCAGCGGCGTGCACACCTTCCCAGCTGTCCTACAGTCCTCAGGA,FALSE,TRUE,"IGHV4-59*01,IGHV4-59*02,IGHV4-59*07",IGHD3-22*01,IGHJ5*02,..........................................................................................................................................................................................................................................................................GAAACTGAGCTCCATGACAGCTGCGGACACGGCCATTTATTACTGTGCGAGGAGATATTATGATAGTGGTGGTCCCTTCGACCCCTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG,AGACACGTCCAAGAACCAGTTCTCCCTGAAGCTGAGCTCTGTGACCGCTGCNNNNNNNTATGATAGTAGTGGTNNNTTCGACCCCTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG,TGTGCGAGGAGATATTATGATAGTGGTGGTCCCTTCGACCCCTGG,⋯,GAGATAT,7,CCC,3,15,IGHG2,Pan_T7917815_CGGACGTTCTTTAGGG,Pan_T7917815,"IGHV4-59*01,IGHV4-59*02,IGHV4-59*07",66_2_1


## Visualization of BCR network

#### Generate network with `ddl$tl$generate_network`

In [26]:
network = ddl$tl$generate_network(vdj_clone)

#### Visualizing BCR network with `ddl$pl$igraph_network`
```R
ddl$pl$igraph_network(network, colorby = 'clone', visual_style = list('vertex_size'='5'))
```
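When `visual_style` is passed from R, reticulate converts the named list to a Python dict, and each value keeps its R type. The sketch below is a standalone illustration using only reticulate (no dandelion call; `value_class` is a hypothetical helper). It shows that `'5'` arrives as a Python string, `5` as a float and `5L` as an integer. Whether `igraph_network` accepts a string for `vertex_size` is not confirmed here, so if the call complains about the type, a numeric value is the first thing to try.

```R
library(reticulate)

# Hypothetical helper (not part of dandelion): return the R-side class vector
# of the Python object stored under 'vertex_size' once the named R list has
# been converted to a Python dict.
value_class <- function(style) {
  py_style <- r_to_py(style)                    # named R list -> Python dict
  class(py_get_item(py_style, 'vertex_size'))   # e.g. "python.builtin.str", ...
}

chr_class <- value_class(list(vertex_size = '5'))  # R character -> Python str
dbl_class <- value_class(list(vertex_size = 5))    # R double    -> Python float
int_class <- value_class(list(vertex_size = 5L))   # R integer   -> Python int

print(list(chr_class, dbl_class, int_class))
```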

#### Transfer network to `AnnData` object with `ddl$tl$transfer_network`
As mentioned, you can also start from an R object pre-processed with `Seurat`, `scran`, etc. and convert it to `AnnData` format before this step.

In [27]:
ddl$tl$transfer_network(adata, network)
adata

AnnData object with n_obs × n_vars = 816 × 3233 
    obs: 'sampleid', 'gender', 'donor', 'tissue', 'scrublet_score', 'n_genes', 'percent_mito', 'n_counts', 'bh_pval', 'is_doublet', 'filter_rna', 'has_bcr', 'filter_bcr_quality', 'filter_bcr_heavy', 'filter_bcr_light', 'leiden', 'clone', 'clone_group', 'isotype', 'lightchain', 'productive', 'heavychain_v', 'lightchain_v', 'heavychain_j', 'lightchain_j'
    var: 'gene_ids', 'feature_types', 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
    uns: 'log1p', 'pca', 'neighbors', 'umap', 'leiden'
    obsm: 'X_pca', 'X_umap', 'X_bcr'
    varm: 'PCs'
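
Note that `transfer_network` is called without assigning its result: `adata` in R is a reference to a Python object, so changes made on the Python side (here, the new `X_bcr` entry in `obsm`) are visible through the same variable. A minimal standalone sketch of that behaviour, using a plain Python list as a stand-in for `adata` rather than any dandelion function:

```R
library(reticulate)

# convert = FALSE keeps results as Python references instead of copying them to R
builtins <- import_builtins(convert = FALSE)

# Stand-in for a Python container living inside `adata`
py_container <- builtins$list()

# Calling a Python method mutates the object in place; nothing is assigned in R
invisible(py_container$append('X_bcr'))

# The change is visible through the same R variable
n_items <- py_to_r(builtins$len(py_container))
print(n_items)
```

The same reasoning is why modifying a Python object inside an R function also affects the caller's copy, unlike ordinary R values.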

#### Visualizing BCR network with `ddl$pl$plot_network`
```R
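# assumes `sc` is the scanpy module imported earlier in the session
sc$set_figure_params(figsize = c(8,8))
# colour by `clone_group` and draw the group labels directly on the plot
ddl$pl$plot_network(adata, color = c('clone_group'), legend_loc = 'on data', legend_fontoutline=3, edges_width = 1)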
```

Done!