# Interoperability with `scirpy`
![dandelion_logo](img/dandelion_logo_illustration.png)

It is now possible to convert data between the `dandelion>=0.1.0` and `scirpy>=0.6.2.dev104` formats, which makes it easier to use the two analysis toolkits together.

First, download the BCR *airr_rearrangement.tsv* file for the 10x Genomics `sc5p_v2_hs_PBMC_10k` dataset:
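```bash
# bash: run from the working directory used below; -P saves the file into the
# sc5p_v2_hs_PBMC_10k/ subfolder that the Python code reads from
wget -P sc5p_v2_hs_PBMC_10k https://cf.10xgenomics.com/samples/cell-vdj/4.0.0/sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_airr_rearrangement.tsv
```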


**Import the *dandelion* module**

```python
import os
import dandelion as ddl
# change to the directory containing the downloaded data (adjust this path for your machine)
os.chdir(os.path.expanduser('/Users/kt16/Documents/scripts/data/dandelion_tutorial/'))
ddl.logging.print_versions()
```

dandelion==0.1.0 pandas==1.2.3 numpy==1.20.1 matplotlib==3.3.4 networkx==2.5 scipy==1.6.1 skbio==0.5.6


```python
import scirpy as ir
ir.__version__
```

'0.6.2.dev105'

# `dandelion`

```python
# read in the airr_rearrangement.tsv file
file_location = 'sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_airr_rearrangement.tsv'
vdj = ddl.read_10x_airr(file_location)
vdj
```

Dandelion class object with n_obs = 978 and n_contigs = 2093
    data: 'cell_id', 'clone_id', 'sequence_id', 'sequence', 'sequence_aa', 'productive', 'rev_comp', 'v_call', 'v_cigar', 'd_call', 'd_cigar', 'j_call', 'j_cigar', 'c_call', 'c_cigar', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'junction_length', 'junction_aa_length', 'v_sequence_start', 'v_sequence_end', 'd_sequence_start', 'd_sequence_end', 'j_sequence_start', 'j_sequence_end', 'c_sequence_start', 'c_sequence_end', 'consensus_count', 'duplicate_count', 'is_cell', 'locus'
    metadata: 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'duplicate_count_heavy_0', 'duplicate_count_heavy_1', 'duplicate_count_light_0', 'duplicate_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_stat

The 10x file contains an empty `clone_id` column, so we first run `ddl.tl.find_clones` to populate it.
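
If you want to confirm that `clone_id` really is empty before running `find_clones`, a quick check with the standard library is enough. This is a minimal sketch, assuming the file is tab-separated with a header row that contains a `clone_id` column (as in the AIRR rearrangement format); the helper `count_blank_clone_ids` is hypothetical and not part of dandelion.

```python
import csv
import io
from typing import Iterable, Tuple


def count_blank_clone_ids(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (n_blank, n_total) for the clone_id column of an AIRR TSV.

    Hypothetical helper: assumes a tab-separated header row containing 'clone_id'.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    n_blank = n_total = 0
    for row in reader:
        n_total += 1
        # Treat missing or whitespace-only values as blank
        if not (row.get("clone_id") or "").strip():
            n_blank += 1
    return n_blank, n_total


# With the downloaded file:
# with open(file_location, newline="") as fh:
#     print(count_blank_clone_ids(fh))

# Small in-memory example
example = io.StringIO(
    "cell_id\tsequence_id\tclone_id\n"
    "AAACCTGTCCGTTGTC-1\tcontig_1\t\n"
    "AAACCTGTCCGTTGTC-1\tcontig_2\t\n"
)
print(count_blank_clone_ids(example))  # (2, 2)
```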

```python
ddl.tl.find_clones(vdj)
```

Finding clones based on heavy chains : 100%|██████████| 157/157 [00:00<00:00, 960.80it/s]
Refining clone assignment based on light chain pairing : 100%|██████████| 978/978 [00:00<00:00, 295983.07it/s]


## `dandelion` : Converting `dandelion` to `scirpy`

```python
irdata = ddl.to_scirpy(vdj)
irdata
```

... storing 'extra_chains' as categorical
... storing 'IR_VJ_1_c_cigar' as categorical
... storing 'IR_VJ_2_c_cigar' as categorical
... storing 'IR_VDJ_1_c_cigar' as categorical
... storing 'IR_VDJ_2_c_cigar' as categorical
... storing 'IR_VJ_1_clone_id' as categorical
... storing 'IR_VJ_2_clone_id' as categorical
... storing 'IR_VDJ_1_clone_id' as categorical
... storing 'IR_VDJ_2_clone_id' as categorical
... storing 'IR_VJ_1_d_cigar' as categorical
... storing 'IR_VJ_2_d_cigar' as categorical
... storing 'IR_VDJ_1_d_cigar' as categorical
... storing 'IR_VDJ_2_d_cigar' as categorical
... storing 'IR_VJ_1_d_sequence_end' as categorical
... storing 'IR_VJ_2_d_sequence_end' as categorical
... storing 'IR_VJ_1_d_sequence_start' as categorical
... storing 'IR_VJ_2_d_sequence_start' as categorical
... storing 'IR_VJ_1_germline_alignment' as categorical
... storing 'IR_VJ_2_germline_alignment' as categorical
... storing 'IR_VDJ_1_germline_alignment' as categorical
... storing 'IR_VDJ_2_germl

AnnData object with n_obs × n_vars = 994 × 0
    obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_

The `clone_id` is mapped to the `IR_VJ_1_clone_id` column.
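
The `IR_` column names reflect scirpy's receptor model: each cell holds up to two VJ chains (for BCRs, the IGK/IGL light chains) and up to two VDJ chains (the IGH heavy chain), and each per-contig field becomes one column per slot, e.g. `IR_VJ_1_clone_id` or `IR_VDJ_2_clone_id`. The sketch below is a simplified, hypothetical illustration of that flattening, not scirpy's implementation; ranking contigs by `umi_count` and the locus sets used are assumptions made for the example.

```python
from typing import Dict, List, Optional

# Assumed locus-to-arm mapping used for this illustration
VJ_LOCI = {"IGK", "IGL", "TRA", "TRG"}   # light / alpha / gamma chains
VDJ_LOCI = {"IGH", "TRB", "TRD"}         # heavy / beta / delta chains


def flatten_cell(contigs: List[Dict], fields: List[str]) -> Dict[str, Optional[object]]:
    """Flatten one cell's contigs into IR_<arm>_<rank>_<field> columns.

    Hypothetical sketch: within each arm, contigs are ranked by 'umi_count'
    (highest first) and at most two are kept; empty slots become None.
    """
    out: Dict[str, Optional[object]] = {}
    for arm, loci in (("VJ", VJ_LOCI), ("VDJ", VDJ_LOCI)):
        chains = [c for c in contigs if c["locus"] in loci]
        chains.sort(key=lambda c: c.get("umi_count", 0), reverse=True)
        for rank in (1, 2):
            chain = chains[rank - 1] if rank <= len(chains) else {}
            for field in fields:
                out[f"IR_{arm}_{rank}_{field}"] = chain.get(field)
    return out


# Toy cell with one heavy and one kappa contig (values are made up)
cell = [
    {"locus": "IGH", "clone_id": "clone_A", "umi_count": 12},
    {"locus": "IGK", "clone_id": "clone_A", "umi_count": 30},
]
row = flatten_cell(cell, ["locus", "clone_id"])
print(row["IR_VDJ_1_locus"], row["IR_VJ_1_locus"], row["IR_VJ_2_locus"])
# IGH IGK None
```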

Setting `transfer = True` will also run dandelion's `tl.transfer`, which copies the `Dandelion` metadata into the resulting `AnnData` object.
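
Conceptually, transferring per-cell metadata onto an `AnnData` object amounts to a left join on the cell barcode: each metadata column is copied into the matching row of `obs`, and barcodes without metadata get missing values. A plain-Python sketch under that assumption; `transfer_columns` is a hypothetical helper and ignores anything else `tl.transfer` may do.

```python
from typing import Dict, List, Optional


def transfer_columns(
    obs_index: List[str],
    metadata: Dict[str, Dict[str, object]],
    columns: List[str],
) -> Dict[str, List[Optional[object]]]:
    """Left-join per-cell metadata onto an ordered list of cell barcodes.

    Hypothetical helper: cells absent from `metadata` get None,
    mimicking missing values in AnnData.obs.
    """
    return {
        col: [metadata.get(barcode, {}).get(col) for barcode in obs_index]
        for col in columns
    }


obs_index = ["AAACCTGTCCGTTGTC-1", "CELL_WITHOUT_VDJ-1"]  # second barcode is made up
metadata = {"AAACCTGTCCGTTGTC-1": {"isotype": "IgM", "clone_id": "148_3_1_266"}}
print(transfer_columns(obs_index, metadata, ["isotype", "clone_id"]))
# {'isotype': ['IgM', None], 'clone_id': ['148_3_1_266', None]}
```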

```python
irdatax = ddl.to_scirpy(vdj, transfer = True)
irdatax
```

... storing 'extra_chains' as categorical
... storing 'IR_VJ_1_c_cigar' as categorical
... storing 'IR_VJ_2_c_cigar' as categorical
... storing 'IR_VDJ_1_c_cigar' as categorical
... storing 'IR_VDJ_2_c_cigar' as categorical
... storing 'IR_VJ_1_clone_id' as categorical
... storing 'IR_VJ_2_clone_id' as categorical
... storing 'IR_VDJ_1_clone_id' as categorical
... storing 'IR_VDJ_2_clone_id' as categorical
... storing 'IR_VJ_1_d_cigar' as categorical
... storing 'IR_VJ_2_d_cigar' as categorical
... storing 'IR_VDJ_1_d_cigar' as categorical
... storing 'IR_VDJ_2_d_cigar' as categorical
... storing 'IR_VJ_1_d_sequence_end' as categorical
... storing 'IR_VJ_2_d_sequence_end' as categorical
... storing 'IR_VJ_1_d_sequence_start' as categorical
... storing 'IR_VJ_2_d_sequence_start' as categorical
... storing 'IR_VJ_1_germline_alignment' as categorical
... storing 'IR_VJ_2_germline_alignment' as categorical
... storing 'IR_VDJ_1_germline_alignment' as categorical
... storing 'IR_VDJ_2_germl

AnnData object with n_obs × n_vars = 994 × 0
    obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_

## `dandelion` : Converting `scirpy` to `dandelion`

```python
vdjx = ddl.from_scirpy(irdata)
vdjx
```

Dandelion class object with n_obs = 978 and n_contigs = 2093
    data: 'c_call', 'c_cigar', 'c_sequence_end', 'c_sequence_start', 'clone_id', 'consensus_count', 'd_call', 'd_cigar', 'd_sequence_end', 'd_sequence_start', 'germline_alignment', 'is_cell', 'j_call', 'j_cigar', 'j_sequence_end', 'j_sequence_start', 'junction', 'junction_aa', 'junction_aa_length', 'junction_length', 'locus', 'productive', 'rev_comp', 'sequence', 'sequence_aa', 'sequence_alignment', 'sequence_id', 'v_call', 'v_cigar', 'v_sequence_end', 'v_sequence_start', 'cell_id', 'umi_count'
    metadata: 'clone_id', 'clone_id_by_size', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_heavy_1', 'umi_count_light_0', 'umi_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_st
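
The number of cells (978) and contigs (2093) is unchanged by the round trip, but the `data` columns are worth comparing: `duplicate_count` from the original `vdj` is absent in `vdjx`, and a `umi_count` column appears instead. That it carries the same values is an assumption not checked here. A minimal sketch using set differences on column names copied by hand from the two printed outputs; with the real objects you could pass `vdj.data.columns` and `vdjx.data.columns` instead, since `Dandelion.data` is a pandas `DataFrame`.

```python
# Column names copied from the printed `vdj` and `vdjx` outputs
before = """
cell_id clone_id sequence_id sequence sequence_aa productive rev_comp v_call v_cigar
d_call d_cigar j_call j_cigar c_call c_cigar sequence_alignment germline_alignment
junction junction_aa junction_length junction_aa_length v_sequence_start v_sequence_end
d_sequence_start d_sequence_end j_sequence_start j_sequence_end c_sequence_start
c_sequence_end consensus_count duplicate_count is_cell locus
""".split()

after = """
c_call c_cigar c_sequence_end c_sequence_start clone_id consensus_count d_call d_cigar
d_sequence_end d_sequence_start germline_alignment is_cell j_call j_cigar j_sequence_end
j_sequence_start junction junction_aa junction_aa_length junction_length locus productive
rev_comp sequence sequence_aa sequence_alignment sequence_id v_call v_cigar v_sequence_end
v_sequence_start cell_id umi_count
""".split()

print("only before:", sorted(set(before) - set(after)))  # only before: ['duplicate_count']
print("only after:", sorted(set(after) - set(before)))   # only after: ['umi_count']
```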

```python
vdjx.metadata
```

| | clone_id | clone_id_by_size | locus_heavy | locus_light | productive_heavy | productive_light | v_call_heavy | v_call_light | j_call_heavy | j_call_light | ... | junction_aa_light | status | status_summary | productive | productive_summary | isotype | isotype_summary | vdj_status | vdj_status_summary | heavychain_status_summary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AAACCTGTCCGTTGTC-1 | 148_3_1_266 | 1066 | IGH | IGK | T | T | IGHV1-69D | IGKV1-8 | IGHJ3 | IGKJ1 | ... | CQQYYSYPRTF | IGH + IGK | IGH + IGK | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
| AAACCTGTCGAGAACG-1 | 92_4_1_47 | 1065 | IGH | IGL | T | T | IGHV1-2 | IGLV5-45 | IGHJ3 | IGLJ3 | ... | CMIWHSSAWVV | IGH + IGL | IGH + IGL | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
| AAACCTGTCTTGAGAC-1 | 149_1_2_419 | 166 | IGH | IGK | T | T | IGHV5-51 | IGKV1D-8 | IGHJ3 | IGKJ2 | ... | CQQYYSFPYTF | IGH + IGK | IGH + IGK | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
| AAACGGGAGCGACGTA-1 | 82_1_2_2 | 600 | IGH | IGL | T | T | IGHV4-59 | IGLV3-19 | IGHJ3 | IGLJ2 | ... | CNSRDSSGNHVVF | IGH + IGL | IGH + IGL | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
| AAACGGGCACTGTTAG-1 | 70_1_1_92 | 1075 | IGH | IGL | T | T | IGHV4-39 | IGLV3-21 | IGHJ3 | IGLJ2 | ... | CQVWDSSSDHVVF | IGH + IGL | IGH + IGL | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| TTTGCGCTCTAACGGT-1 | 138_3_1_217 | 156 | IGH | IGL | T | T | IGHV3-43 | IGLV2-8 | IGHJ6 | IGLJ3 | ... | CGSFAGSNIWVF | IGH + IGL | IGH + IGL | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
| TTTGGTTGTAGCCTAT-1 | 85_1_1_240 | 506 | IGH | IGK | T | T | IGHV4-39 | IGKV6-21 | IGHJ2 | IGKJ4 | ... | CHQSSSLPLTF | IGH + IGK | IGH + IGK | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
| TTTGGTTTCAGAGCTT-1 | 117_5_2_232 | 563 | IGH | IGK | T | T | IGHV7-4-1 | IGKV3-11 | IGHJ4 | IGKJ5 | ... | CQQRSNWLTF | IGH + IGK | IGH + IGK | T + T | T + T | IgM | IgM | Single + Single | Single | Single |
| TTTGGTTTCAGTGTTG-1 | 12_1_1_329 | 820 | IGH | IGL | T | T | IGHV2-5 | IGLV2-23 | IGHJ4 | IGLJ2 | ... | CCSYAGSSTFEVF | IGH + IGL | IGH + IGL | T + T | T + T | IgM | IgM | Single + Single | Single | Single |


# `scirpy`

## `scirpy` : Converting `dandelion` to `scirpy`

```python
irdata2 = ir.io.from_dandelion(vdj)
irdata2
```

... storing 'extra_chains' as categorical
... storing 'IR_VJ_1_c_cigar' as categorical
... storing 'IR_VJ_2_c_cigar' as categorical
... storing 'IR_VDJ_1_c_cigar' as categorical
... storing 'IR_VDJ_2_c_cigar' as categorical
... storing 'IR_VJ_1_clone_id' as categorical
... storing 'IR_VJ_2_clone_id' as categorical
... storing 'IR_VDJ_1_clone_id' as categorical
... storing 'IR_VDJ_2_clone_id' as categorical
... storing 'IR_VJ_1_d_cigar' as categorical
... storing 'IR_VJ_2_d_cigar' as categorical
... storing 'IR_VDJ_1_d_cigar' as categorical
... storing 'IR_VDJ_2_d_cigar' as categorical
... storing 'IR_VJ_1_d_sequence_end' as categorical
... storing 'IR_VJ_2_d_sequence_end' as categorical
... storing 'IR_VJ_1_d_sequence_start' as categorical
... storing 'IR_VJ_2_d_sequence_start' as categorical
... storing 'IR_VJ_1_germline_alignment' as categorical
... storing 'IR_VJ_2_germline_alignment' as categorical
... storing 'IR_VDJ_1_germline_alignment' as categorical
... storing 'IR_VDJ_2_germl

AnnData object with n_obs × n_vars = 994 × 0
    obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_

Likewise, setting `transfer = True` will also run dandelion's `tl.transfer`.

```python
irdata2x = ir.io.from_dandelion(vdj, transfer = True)
irdata2x
```

... storing 'extra_chains' as categorical
... storing 'IR_VJ_1_c_cigar' as categorical
... storing 'IR_VJ_2_c_cigar' as categorical
... storing 'IR_VDJ_1_c_cigar' as categorical
... storing 'IR_VDJ_2_c_cigar' as categorical
... storing 'IR_VJ_1_clone_id' as categorical
... storing 'IR_VJ_2_clone_id' as categorical
... storing 'IR_VDJ_1_clone_id' as categorical
... storing 'IR_VDJ_2_clone_id' as categorical
... storing 'IR_VJ_1_d_cigar' as categorical
... storing 'IR_VJ_2_d_cigar' as categorical
... storing 'IR_VDJ_1_d_cigar' as categorical
... storing 'IR_VDJ_2_d_cigar' as categorical
... storing 'IR_VJ_1_d_sequence_end' as categorical
... storing 'IR_VJ_2_d_sequence_end' as categorical
... storing 'IR_VJ_1_d_sequence_start' as categorical
... storing 'IR_VJ_2_d_sequence_start' as categorical
... storing 'IR_VJ_1_germline_alignment' as categorical
... storing 'IR_VJ_2_germline_alignment' as categorical
... storing 'IR_VDJ_1_germline_alignment' as categorical
... storing 'IR_VDJ_2_germl

AnnData object with n_obs × n_vars = 994 × 0
    obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_

## `scirpy` : Converting `scirpy` to `dandelion`

```python
vdj3 = ir.io.to_dandelion(irdata2)
vdj3
```

Dandelion class object with n_obs = 978 and n_contigs = 2093
    data: 'c_call', 'c_cigar', 'c_sequence_end', 'c_sequence_start', 'clone_id', 'consensus_count', 'd_call', 'd_cigar', 'd_sequence_end', 'd_sequence_start', 'germline_alignment', 'is_cell', 'j_call', 'j_cigar', 'j_sequence_end', 'j_sequence_start', 'junction', 'junction_aa', 'junction_aa_length', 'junction_length', 'locus', 'productive', 'rev_comp', 'sequence', 'sequence_aa', 'sequence_alignment', 'sequence_id', 'v_call', 'v_cigar', 'v_sequence_end', 'v_sequence_start', 'cell_id', 'umi_count'
    metadata: 'clone_id', 'clone_id_by_size', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_heavy_1', 'umi_count_light_0', 'umi_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_st

## `scirpy` : Read with `scirpy`, convert to `dandelion`

```python
# read in the airr_rearrangement.tsv file
file_location = 'sc5p_v2_hs_PBMC_10k/sc5p_v2_hs_PBMC_10k_b_airr_rearrangement.tsv'
irdata_s = ir.io.read_airr(file_location)
irdata_s
```

... storing 'extra_chains' as categorical
... storing 'IR_VJ_1_c_cigar' as categorical
... storing 'IR_VJ_2_c_cigar' as categorical
... storing 'IR_VDJ_1_c_cigar' as categorical
... storing 'IR_VDJ_2_c_cigar' as categorical
... storing 'IR_VJ_1_c_sequence_end' as categorical
... storing 'IR_VJ_2_c_sequence_end' as categorical
... storing 'IR_VDJ_1_c_sequence_end' as categorical
... storing 'IR_VDJ_2_c_sequence_end' as categorical
... storing 'IR_VJ_1_c_sequence_start' as categorical
... storing 'IR_VJ_2_c_sequence_start' as categorical
... storing 'IR_VDJ_1_c_sequence_start' as categorical
... storing 'IR_VDJ_2_c_sequence_start' as categorical
... storing 'IR_VJ_1_clone_id' as categorical
... storing 'IR_VJ_2_clone_id' as categorical
... storing 'IR_VDJ_1_clone_id' as categorical
... storing 'IR_VDJ_2_clone_id' as categorical
... storing 'IR_VJ_1_d_cigar' as categorical
... storing 'IR_VJ_2_d_cigar' as categorical
... storing 'IR_VDJ_1_d_cigar' as categorical
... storing 'IR_VDJ_2_d_ci

AnnData object with n_obs × n_vars = 994 × 0
    obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_

This time, we define clonotypes with `scirpy`'s own methods.

```python
ir.tl.chain_qc(irdata_s)
ir.pp.ir_dist(irdata_s, metric='hamming', sequence='aa')
ir.tl.define_clonotypes(irdata_s)
irdata_s
```

AnnData object with n_obs × n_vars = 994 × 0
    obs: 'multi_chain', 'extra_chains', 'IR_VJ_1_c_call', 'IR_VJ_2_c_call', 'IR_VDJ_1_c_call', 'IR_VDJ_2_c_call', 'IR_VJ_1_c_cigar', 'IR_VJ_2_c_cigar', 'IR_VDJ_1_c_cigar', 'IR_VDJ_2_c_cigar', 'IR_VJ_1_c_sequence_end', 'IR_VJ_2_c_sequence_end', 'IR_VDJ_1_c_sequence_end', 'IR_VDJ_2_c_sequence_end', 'IR_VJ_1_c_sequence_start', 'IR_VJ_2_c_sequence_start', 'IR_VDJ_1_c_sequence_start', 'IR_VDJ_2_c_sequence_start', 'IR_VJ_1_clone_id', 'IR_VJ_2_clone_id', 'IR_VDJ_1_clone_id', 'IR_VDJ_2_clone_id', 'IR_VJ_1_consensus_count', 'IR_VJ_2_consensus_count', 'IR_VDJ_1_consensus_count', 'IR_VDJ_2_consensus_count', 'IR_VJ_1_d_call', 'IR_VJ_2_d_call', 'IR_VDJ_1_d_call', 'IR_VDJ_2_d_call', 'IR_VJ_1_d_cigar', 'IR_VJ_2_d_cigar', 'IR_VDJ_1_d_cigar', 'IR_VDJ_2_d_cigar', 'IR_VJ_1_d_sequence_end', 'IR_VJ_2_d_sequence_end', 'IR_VDJ_1_d_sequence_end', 'IR_VDJ_2_d_sequence_end', 'IR_VJ_1_d_sequence_start', 'IR_VJ_2_d_sequence_start', 'IR_VDJ_1_d_sequence_start', 'IR_VDJ_
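
The `ir.pp.ir_dist` call above computes pairwise distances between receptor amino-acid sequences, here with `metric='hamming'`. The Hamming distance counts mismatched positions and is only defined for sequences of equal length. A minimal sketch of the metric alone, not scirpy's implementation, using light-chain `junction_aa` sequences from the `vdjx.metadata` table shown earlier:

```python
from typing import Optional


def hamming(a: str, b: str) -> Optional[int]:
    """Number of mismatching positions; None if the lengths differ."""
    if len(a) != len(b):
        return None  # Hamming distance is undefined for unequal lengths
    return sum(x != y for x, y in zip(a, b))


print(hamming("CQQYYSYPRTF", "CQQYYSFPYTF"))  # 2
print(hamming("CQQYYSYPRTF", "CQQRSNWLTF"))   # None (different lengths)
```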

```python
vdj4 = ir.io.to_dandelion(irdata_s)
vdj4
```

Dandelion class object with n_obs = 978 and n_contigs = 2093
    data: 'c_call', 'c_cigar', 'c_sequence_end', 'c_sequence_start', 'clone_id', 'consensus_count', 'd_call', 'd_cigar', 'd_sequence_end', 'd_sequence_start', 'germline_alignment', 'is_cell', 'j_call', 'j_cigar', 'j_sequence_end', 'j_sequence_start', 'junction', 'junction_aa', 'junction_aa_length', 'junction_length', 'locus', 'productive', 'rev_comp', 'sequence', 'sequence_aa', 'sequence_alignment', 'sequence_id', 'v_call', 'v_cigar', 'v_sequence_end', 'v_sequence_start', 'cell_id', 'umi_count'
    metadata: 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_heavy_1', 'umi_count_light_0', 'umi_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_status', 'vdj_status_summary', 'he

Note that the `clone_id` column is missing from the metadata.

As [@grst](https://github.com/grst) has [noted](https://github.com/icbi-lab/scirpy/pull/241#discussion_r604135056), *'Currently only the chain-specific attributes get exported (i.e. all scirpy columns that start with IR_). In principle, it probably makes sense to write out the clonotype column to clone_id. But then again, scirpy allows different versions of clonal assignments...'*. In other words, unless the `IR_*_clone_id` columns are populated, the clonotype assignment will not be carried over by `ir.io.to_dandelion`.

You can either copy it over manually, or use `ddl.from_scirpy`, which uses the `clonotype` column (or the `clone_id` column, if already present) in the scirpy-initialized `AnnData.obs` as the default `clone_id`. The `clone_key` and `key_added` options can be used to adjust this behavior.
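
For the manual route, the idea is to look up each contig's cell barcode in scirpy's clonotype column. A minimal sketch with plain dictionaries; `assign_clone_ids` is a hypothetical helper, and its inputs only mimic `irdata_s.obs['clonotype']` and the rows of `vdj4.data`. With the real pandas objects, something like `vdj4.data['clone_id'] = vdj4.data['cell_id'].map(irdata_s.obs['clonotype'])` should have the same effect on `.data`, but check whether `.metadata` needs to be regenerated afterwards.

```python
from typing import Dict, List, Optional


def assign_clone_ids(
    contigs: List[Dict[str, Optional[str]]],
    clonotype_by_cell: Dict[str, str],
) -> List[Dict[str, Optional[str]]]:
    """Return copies of contig records with clone_id filled from the cell's clonotype.

    Contigs whose cell has no clonotype get clone_id = None; inputs are not modified.
    """
    updated = []
    for contig in contigs:
        new = dict(contig)
        new["clone_id"] = clonotype_by_cell.get(contig["cell_id"])
        updated.append(new)
    return updated


clonotypes = {"AAACCTGTCCGTTGTC-1": "0"}  # toy clonotype labels
contigs = [
    {"sequence_id": "contig_1", "cell_id": "AAACCTGTCCGTTGTC-1", "clone_id": None},
    {"sequence_id": "contig_2", "cell_id": "UNKNOWN-1", "clone_id": None},
]
for c in assign_clone_ids(contigs, clonotypes):
    print(c["sequence_id"], c["clone_id"])
# contig_1 0
# contig_2 None
```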

```python
vdj5 = ddl.from_scirpy(irdata_s)
vdj5
```

Dandelion class object with n_obs = 978 and n_contigs = 2093
    data: 'c_call', 'c_cigar', 'c_sequence_end', 'c_sequence_start', 'clone_id', 'consensus_count', 'd_call', 'd_cigar', 'd_sequence_end', 'd_sequence_start', 'germline_alignment', 'is_cell', 'j_call', 'j_cigar', 'j_sequence_end', 'j_sequence_start', 'junction', 'junction_aa', 'junction_aa_length', 'junction_length', 'locus', 'productive', 'rev_comp', 'sequence', 'sequence_aa', 'sequence_alignment', 'sequence_id', 'v_call', 'v_cigar', 'v_sequence_end', 'v_sequence_start', 'cell_id', 'umi_count'
    metadata: 'clone_id', 'clone_id_by_size', 'locus_heavy', 'locus_light', 'productive_heavy', 'productive_light', 'v_call_heavy', 'v_call_light', 'j_call_heavy', 'j_call_light', 'c_call_heavy', 'c_call_light', 'umi_count_heavy_0', 'umi_count_heavy_1', 'umi_count_light_0', 'umi_count_light_1', 'junction_aa_heavy', 'junction_aa_light', 'status', 'status_summary', 'productive', 'productive_summary', 'isotype', 'isotype_summary', 'vdj_st