# Dandelion class

Much of the functionality of the `dandelion` package revolves around the `Dandelion` class. The class acts as an intermediary object for storage and for flexible interaction with other tools. This section is a quick primer on the `Dandelion` class.

<b>Import modules</b>

In [1]:
import os

os.chdir(os.path.expanduser("~/Downloads/dandelion_tutorial/"))
import dandelion as ddl

ddl.logging.print_versions()



dandelion==0.5.5.dev16 pandas==2.2.3 numpy==2.1.3 matplotlib==3.10.1 networkx==3.4.2 scipy==1.15.2





In [2]:
vdj = ddl.read_h5ddl("dandelion_results.h5ddl")
# let's run find_clones again as this was not stored.
ddl.tl.find_clones(vdj)
vdj

Finding clones based on B cell VDJ chains : 100%|██████████| 222/222 [00:00<00:00, 3567.62it/s]
Finding clones based on B cell VJ chains : 100%|██████████| 209/209 [00:00<00:00, 5862.71it/s]
Refining clone assignment based on VJ chain pairing : 100%|██████████| 2238/2238 [00:00<00:00, 647413.78it/s]


Dandelion class object with n_obs = 2238 and n_contigs = 7355
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 

Essentially, the `.data` slot holds the AIRR contig table (one row per contig), while the `.metadata` slot holds a collapsed, per-cell version that can be combined with `AnnData`'s `.obs` slot. You can retrieve these slots like the attributes of any other object; for example, to get the metadata:

In [3]:
vdj.metadata

Unnamed: 0,clone_id,clone_id_by_size,sample_id,locus_VDJ,locus_VJ,productive_VDJ,productive_VJ,v_call_genotyped_VDJ,d_call_VDJ,j_call_VDJ,...,d_call_B_VDJ_main,j_call_B_VDJ_main,v_call_B_VJ_main,j_call_B_VJ_main,isotype,isotype_status,locus_status,chain_status,rearrangement_status_VDJ,rearrangement_status_VJ
sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG,B_VJ_76_2_7,169,sc5p_v2_hs_PBMC_10k,,IGK,,T,,,,...,,,"IGKV1D-33,IGKV1-33",IGKJ4,,,Orphan IGK,Orphan VJ,,standard
sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC,B_VDJ_191_3_2_VJ_185_2_3,1988,sc5p_v2_hs_PBMC_10k,IGH,IGK,T,T,"IGHV1-69,IGHV1-69D",IGHD3-22,IGHJ3,...,IGHD3-22,IGHJ3,IGKV1-8,IGKJ1,IgM,IgM,IGH + IGK,Single pair,standard,standard
sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG,B_VDJ_9_1_2_VJ_153_1_1,1602,sc5p_v2_hs_PBMC_10k,IGH,IGL,T,T,IGHV1-2,,IGHJ3,...,,IGHJ3,IGLV5-45,IGLJ3,IgM,IgM,IGH + IGL,Single pair,standard,standard
sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC,B_VDJ_92_4_2_VJ_47_1_1,1603,sc5p_v2_hs_PBMC_10k,IGH,IGK,T,T,IGHV5-51,,IGHJ3,...,,IGHJ3,IGKV1D-8,IGKJ2,IgM,IgM,IGH + IGK,Single pair,standard,standard
sc5p_v2_hs_PBMC_10k_AAACGGGAGCGACGTA,B_VDJ_15_2_1_VJ_83_2_6,1604,sc5p_v2_hs_PBMC_10k,IGH,IGL,T,T,IGHV4-4,IGHD6-13,IGHJ3,...,IGHD6-13,IGHJ3,IGLV3-19,"IGLJ2,IGLJ3",IgM,IgM,IGH + IGL,Single pair,standard,standard
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
vdj_v1_hs_pbmc3_TTTCCTCAGCAATATG,B_VDJ_61_2_1_VJ_129_2_7,812,vdj_v1_hs_pbmc3,IGH,IGK,T,T,IGHV2-5,"IGHD5/OR15-5a,IGHD5/OR15-5b","IGHJ4,IGHJ5",...,"IGHD5/OR15-5a,IGHD5/OR15-5b","IGHJ4,IGHJ5",IGKV4-1,IGKJ4,IgM,IgM,IGH + IGK,Single pair,standard,standard
vdj_v1_hs_pbmc3_TTTCCTCAGCGCTTAT,B_VDJ_37_5_2_VJ_49_1_3,813,vdj_v1_hs_pbmc3,IGH,IGK,T,T,IGHV3-30,IGHD4-17,IGHJ6,...,IGHD4-17,IGHJ6,IGKV2-30,IGKJ2,IgM,IgM,IGH + IGK,Single pair,standard,standard
vdj_v1_hs_pbmc3_TTTCCTCAGGGAAACA,B_VDJ_145_1_1_VJ_35_4_14,814,vdj_v1_hs_pbmc3,IGH,IGK,T,T,IGHV4-61,IGHD6-13,IGHJ2,...,IGHD6-13,IGHJ2,"IGKV1-39,IGKV1D-39",IGKJ1,IgM,IgM,IGH + IGK,Single pair,standard,standard
vdj_v1_hs_pbmc3_TTTGCGCCATACCATG,B_VDJ_48_4_2_VJ_50_3_5,815,vdj_v1_hs_pbmc3,IGH,IGL,T,T,"IGHV1-69,IGHV1-69D",IGHD2-15,IGHJ6,...,IGHD2-15,IGHJ6,IGLV1-47,IGLJ3,IgM,IgM,IGH + IGL,Single pair,standard,standard
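To make the relationship between the contig-level `.data` and the cell-level `.metadata` more concrete, here is a minimal pure-Python sketch of the collapsing idea. It is not `dandelion`'s implementation: the helper `collapse_to_cells`, the reduced set of columns, and the use of `|` to join several contigs of the same chain type are assumptions made for illustration. The toy rows reuse values from the tables in this tutorial.

```python
from collections import defaultdict

# Toy contig-level rows standing in for `.data` (one row per contig).
contigs = [
    {"sequence_id": "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2",
     "cell_id": "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC", "locus": "IGH", "j_call": "IGHJ3"},
    {"sequence_id": "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_1",
     "cell_id": "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC", "locus": "IGK", "j_call": "IGKJ1"},
    {"sequence_id": "sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1",
     "cell_id": "sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG", "locus": "IGK", "j_call": "IGKJ4"},
]

# Loci that undergo V(D)J recombination; everything else is treated as a VJ chain.
VDJ_LOCI = {"IGH", "TRB", "TRD"}


def collapse_to_cells(rows):
    """Collapse one-row-per-contig records into one-row-per-cell records (hypothetical helper)."""
    cells = defaultdict(
        lambda: {"locus_VDJ": [], "locus_VJ": [], "j_call_VDJ": [], "j_call_VJ": []}
    )
    for row in rows:
        chain = "VDJ" if row["locus"] in VDJ_LOCI else "VJ"
        cells[row["cell_id"]][f"locus_{chain}"].append(row["locus"])
        cells[row["cell_id"]][f"j_call_{chain}"].append(row["j_call"])
    # Join multiple contigs of the same chain type with "|"; missing chains become "".
    return {
        cell: {col: "|".join(values) for col, values in cols.items()}
        for cell, cols in cells.items()
    }


metadata = collapse_to_cells(contigs)
for cell_id, cols in metadata.items():
    print(cell_id, cols)
```

The cell with only a light-chain contig ends up with empty VDJ columns, which corresponds to the "Orphan VJ" row in the metadata table above.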


### Slicing

You can slice the `Dandelion` object by indexing on either the `.data` or the `.metadata` slot, much as you would a pandas `DataFrame` or an `AnnData` object.

<b>slicing</b> `.data`

In [4]:
# get the largest clone
largest_clone = vdj.data["clone_id"].value_counts().idxmax()

vdj[vdj.data["clone_id"] == largest_clone]

Dandelion class object with n_obs = 566 and n_contigs = 2802
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', '

In [5]:
vdj[
    vdj.data_names.isin(
        [
            "sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1",
            "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2",
            "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_1",
            "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_1",
            "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_2",
        ]
    )
]

Dandelion class object with n_obs = 3 and n_contigs = 5
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_num

<b>slicing</b> `.metadata`

In [6]:
vdj[vdj.metadata["productive_VDJ"].isin(["T", "T|T"])]

Dandelion class object with n_obs = 2112 and n_contigs = 5052
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 

In [7]:
vdj[vdj.metadata_names == "vdj_v1_hs_pbmc3_TTTCCTCAGCGCTTAT"]

Dandelion class object with n_obs = 1 and n_contigs = 2
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 'j_num
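The outputs above show that slicing one slot also trims the other: selecting five contigs leaves three cells, and selecting one cell leaves its two contigs. The sketch below illustrates that bookkeeping with plain dictionaries. The helpers `slice_by_contigs` and `slice_by_cells` are hypothetical and only mimic the observed behaviour; how `Dandelion` implements it internally is not shown in this tutorial.

```python
# Contig id -> cell id, using barcodes that appear in this tutorial.
contig_to_cell = {
    "sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1": "sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG",
    "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2": "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC",
    "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_1": "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC",
    "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_1": "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG",
    "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_2": "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG",
    "vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_2": "vdj_v1_hs_pbmc3_TTTGCGCCATACCATG",
    "vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_1": "vdj_v1_hs_pbmc3_TTTGCGCCATACCATG",
}


def slice_by_contigs(mapping, keep_contigs):
    """Keep the requested contigs, then keep only the cells that still own a contig."""
    kept = {contig: cell for contig, cell in mapping.items() if contig in keep_contigs}
    return kept, sorted(set(kept.values()))


def slice_by_cells(mapping, keep_cells):
    """Keep the requested cells, then keep every contig that belongs to them."""
    kept = {contig: cell for contig, cell in mapping.items() if cell in keep_cells}
    return kept, sorted(set(kept.values()))


# Contig-level selection: the five sc5p contigs used in the `.data` example.
five = {c for c in contig_to_cell if c.startswith("sc5p_v2_hs_PBMC_10k_")}
contigs_a, cells_a = slice_by_contigs(contig_to_cell, five)
print(f"n_obs = {len(cells_a)} and n_contigs = {len(contigs_a)}")

# Cell-level selection: a single barcode.
contigs_b, cells_b = slice_by_cells(contig_to_cell, {"vdj_v1_hs_pbmc3_TTTGCGCCATACCATG"})
print(f"n_obs = {len(cells_b)} and n_contigs = {len(contigs_b)}")
```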

### Copy

You can deep copy the `Dandelion` object to another variable; the copy inherits all slots:

In [8]:
vdj2 = vdj.copy()
vdj2.metadata

Unnamed: 0,clone_id,clone_id_by_size,sample_id,locus_VDJ,locus_VJ,productive_VDJ,productive_VJ,v_call_genotyped_VDJ,d_call_VDJ,j_call_VDJ,...,d_call_B_VDJ_main,j_call_B_VDJ_main,v_call_B_VJ_main,j_call_B_VJ_main,isotype,isotype_status,locus_status,chain_status,rearrangement_status_VDJ,rearrangement_status_VJ
sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG,B_VJ_76_2_7,169,sc5p_v2_hs_PBMC_10k,,IGK,,T,,,,...,,,"IGKV1D-33,IGKV1-33",IGKJ4,,,Orphan IGK,Orphan VJ,,standard
sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC,B_VDJ_191_3_2_VJ_185_2_3,1988,sc5p_v2_hs_PBMC_10k,IGH,IGK,T,T,"IGHV1-69,IGHV1-69D",IGHD3-22,IGHJ3,...,IGHD3-22,IGHJ3,IGKV1-8,IGKJ1,IgM,IgM,IGH + IGK,Single pair,standard,standard
sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG,B_VDJ_9_1_2_VJ_153_1_1,1602,sc5p_v2_hs_PBMC_10k,IGH,IGL,T,T,IGHV1-2,,IGHJ3,...,,IGHJ3,IGLV5-45,IGLJ3,IgM,IgM,IGH + IGL,Single pair,standard,standard
sc5p_v2_hs_PBMC_10k_AAACCTGTCTTGAGAC,B_VDJ_92_4_2_VJ_47_1_1,1603,sc5p_v2_hs_PBMC_10k,IGH,IGK,T,T,IGHV5-51,,IGHJ3,...,,IGHJ3,IGKV1D-8,IGKJ2,IgM,IgM,IGH + IGK,Single pair,standard,standard
sc5p_v2_hs_PBMC_10k_AAACGGGAGCGACGTA,B_VDJ_15_2_1_VJ_83_2_6,1604,sc5p_v2_hs_PBMC_10k,IGH,IGL,T,T,IGHV4-4,IGHD6-13,IGHJ3,...,IGHD6-13,IGHJ3,IGLV3-19,"IGLJ2,IGLJ3",IgM,IgM,IGH + IGL,Single pair,standard,standard
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
vdj_v1_hs_pbmc3_TTTCCTCAGCAATATG,B_VDJ_61_2_1_VJ_129_2_7,812,vdj_v1_hs_pbmc3,IGH,IGK,T,T,IGHV2-5,"IGHD5/OR15-5a,IGHD5/OR15-5b","IGHJ4,IGHJ5",...,"IGHD5/OR15-5a,IGHD5/OR15-5b","IGHJ4,IGHJ5",IGKV4-1,IGKJ4,IgM,IgM,IGH + IGK,Single pair,standard,standard
vdj_v1_hs_pbmc3_TTTCCTCAGCGCTTAT,B_VDJ_37_5_2_VJ_49_1_3,813,vdj_v1_hs_pbmc3,IGH,IGK,T,T,IGHV3-30,IGHD4-17,IGHJ6,...,IGHD4-17,IGHJ6,IGKV2-30,IGKJ2,IgM,IgM,IGH + IGK,Single pair,standard,standard
vdj_v1_hs_pbmc3_TTTCCTCAGGGAAACA,B_VDJ_145_1_1_VJ_35_4_14,814,vdj_v1_hs_pbmc3,IGH,IGK,T,T,IGHV4-61,IGHD6-13,IGHJ2,...,IGHD6-13,IGHJ2,"IGKV1-39,IGKV1D-39",IGKJ1,IgM,IgM,IGH + IGK,Single pair,standard,standard
vdj_v1_hs_pbmc3_TTTGCGCCATACCATG,B_VDJ_48_4_2_VJ_50_3_5,815,vdj_v1_hs_pbmc3,IGH,IGL,T,T,"IGHV1-69,IGHV1-69D",IGHD2-15,IGHJ6,...,IGHD2-15,IGHJ6,IGLV1-47,IGLJ3,IgM,IgM,IGH + IGL,Single pair,standard,standard
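Copying matters because several methods in this tutorial modify the object in place. The sketch below contrasts a plain assignment (an alias) with a deep copy, using a hypothetical `Container` class as a stand-in for an object with `.data` and `.metadata` slots. It assumes, as the text states, that `.copy()` behaves like a deep copy.

```python
import copy


class Container:
    """Hypothetical stand-in for an object with `.data` and `.metadata` slots."""

    def __init__(self, data, metadata):
        self.data = data
        self.metadata = metadata

    def copy(self):
        # A deep copy duplicates nested structures instead of sharing them.
        return copy.deepcopy(self)


original = Container(
    data={"contig_1": {"v_call": "IGKV1-8*01"}},
    metadata={"cell_1": {"isotype": "IgM"}},
)

alias = original             # same object under a second name
duplicate = original.copy()  # independent object

# An in-place change through the alias is visible on the original ...
alias.data["contig_1"]["v_call"] = "IGKV1-8"
# ... but the deep copy still holds the value from before the change.
print(original.data["contig_1"]["v_call"], duplicate.data["contig_1"]["v_call"])
```

This is why `vdj2`, which is copied here before any in-place simplification, still carries the full allele-level calls when it is written out later in this section.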


### Retrieving entries with `update_metadata`

The `.metadata` slot of the `Dandelion` class is initialized automatically whenever the `.data` slot is filled. However, it only contains a pre-specified set of columns. To retrieve other columns from the `.data` slot, update the metadata with `.update_metadata` and specify the `retrieve` and `retrieve_mode` options.

The following modes determine how the retrieval is completed:

`split and unique only` - splits the retrieval into VDJ and VJ chains. A `|` will separate _**unique**_ elements.

`split and merge` - splits the retrieval into VDJ and VJ chains. A `|` will separate _**every**_ element.

`merge and unique only` - keeps only _**unique**_ elements, as in `split and unique only`, but merges them into a single column.

`split` - splits the retrieval into _**individual**_ columns for each contig.

`merge` - merges the retrieval into a _**single**_ column where a `|` will separate _**every**_ element.

For numerical columns, there are additional options:

`split and sum` - splits the retrieval into VDJ and VJ chains and sums each separately.

`split and average` - similar to `split and sum`, but averages instead of summing.

`sum` - sums the retrievals into a single column.

`average` - averages the retrievals into a single column.

If `retrieve_mode` is not specified, it will default to `split and merge`.
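The differences between the modes are easiest to see on a single toy cell. The function below is a minimal sketch written only from the descriptions above. It is not `dandelion` code, and the output keys (`VDJ`, `VJ`, `merged`, `VDJ_0`, ...) are hypothetical labels rather than the column names the package produces.

```python
def retrieve(vdj, vj, mode="split and merge"):
    """Toy version of the retrieval modes for one cell (hypothetical helper).

    `vdj` and `vj` are lists of per-contig values for the VDJ and VJ chains.
    """

    def join(values, unique=False):
        if unique:
            values = list(dict.fromkeys(values))  # de-duplicate, keep first-seen order
        return "|".join(str(v) for v in values)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    if mode == "split and unique only":
        return {"VDJ": join(vdj, unique=True), "VJ": join(vj, unique=True)}
    if mode == "split and merge":
        return {"VDJ": join(vdj), "VJ": join(vj)}
    if mode == "merge and unique only":
        return {"merged": join(vdj + vj, unique=True)}
    if mode == "split":
        columns = {f"VDJ_{i}": value for i, value in enumerate(vdj)}
        columns.update({f"VJ_{i}": value for i, value in enumerate(vj)})
        return columns
    if mode == "merge":
        return {"merged": join(vdj + vj)}
    # Numerical modes.
    if mode == "split and sum":
        return {"VDJ": sum(vdj), "VJ": sum(vj)}
    if mode == "split and average":
        return {"VDJ": mean(vdj), "VJ": mean(vj)}
    if mode == "sum":
        return {"merged": sum(vdj + vj)}
    if mode == "average":
        return {"merged": mean(vdj + vj)}
    raise ValueError(f"unknown retrieve_mode: {mode!r}")


# A cell with two heavy-chain contigs sharing a J call and one light-chain contig.
heavy, light = ["IGHJ4", "IGHJ4"], ["IGKJ1"]
for mode in ("split and unique only", "split and merge", "merge and unique only", "split", "merge"):
    print(f"{mode!r}: {retrieve(heavy, light, mode)}")

# The same cell with a numerical column (e.g. a length per contig).
for mode in ("split and sum", "split and average", "sum", "average"):
    print(f"{mode!r}: {retrieve([3, 5], [4], mode)}")
```

The string modes differ only in whether duplicates are kept and whether the two chain types share a column; the numerical modes replace the `|` join with a sum or a mean.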

***Example: retrieving fwr1 sequences***

In [9]:
vdj.update_metadata(retrieve="fwr1")
vdj

Dandelion class object with n_obs = 2238 and n_contigs = 7355
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 

Note the additional `fwr1` VDJ and VJ columns in the metadata slot.

By default, `dandelion` will not try to merge numerical columns as it can create mixed dtype columns.

There is also a new method, `.update_plus`, that tries to retrieve frequently used columns such as `np1_length` and `np2_length`:

In [10]:
vdj.update_plus()
vdj



Dandelion class object with n_obs = 2238 and n_contigs = 7355
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 

### Renaming barcodes

You can rename the barcodes (both sequence ids and cell ids at the same time) with a simple function, which is useful when you want more meaningful names. Renaming always operates on the indices that were originally used to create the `Dandelion` object: if you have already run the function once, running it again does not stack another prefix/suffix onto the renamed indices; it re-derives the names from the original indices.
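The "always start from the original indices" behaviour can be sketched with a small class that remembers the ids it was created with. `BarcodeIndex` and its methods are hypothetical and only illustrate the idea; they are not part of `dandelion`.

```python
class BarcodeIndex:
    """Toy container that remembers its original ids (hypothetical)."""

    def __init__(self, ids):
        self._original = list(ids)  # never modified after construction
        self.ids = list(ids)        # what the user currently sees

    def add_prefix(self, prefix, sep="_"):
        # Build from the original ids so that repeated calls replace
        # the previous prefix instead of stacking on top of it.
        self.ids = [f"{prefix}{sep}{i}" for i in self._original]

    def add_suffix(self, suffix, sep="_"):
        self.ids = [f"{i}{sep}{suffix}" for i in self._original]

    def reset_ids(self):
        self.ids = list(self._original)


idx = BarcodeIndex(["sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1"])

idx.add_prefix("test", sep="-")
after_first = idx.ids[0]

idx.add_prefix("test2", sep="_")  # replaces "test-", does not become "test2_test-..."
after_second = idx.ids[0]

idx.reset_ids()
after_reset = idx.ids[0]

print(after_first)
print(after_second)
print(after_reset)
```

Keeping an untouched copy of the original ids is what makes both the non-cumulative renaming and the reset possible.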

In [11]:
# original
print(vdj.data[["sequence_id", "cell_id"]]), print(vdj.metadata_names)

                                                                                 sequence_id  \
sequence_id                                                                                    
sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1  sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1   
sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2  sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2   
sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_1  sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_1   
sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_1  sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_1   
sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_2  sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_2   
...                                                                                      ...   
vdj_v1_hs_pbmc3_TTTCCTCTCGACAGCC_contig_1          vdj_v1_hs_pbmc3_TTTCCTCTCGACAGCC_contig_1   
vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_2          vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_2   
vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_

(None, None)

In [12]:
# let's add a 'test-' as a prefix. There's also the suffix option
vdj.add_sequence_prefix("test", sep="-")
print(vdj.data[["sequence_id", "cell_id"]]), print(vdj.metadata_names)

                                                                                          sequence_id  \
sequence_id                                                                                             
test-sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1  test-sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_cont...   
test-sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2  test-sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_cont...   
test-sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_1  test-sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_cont...   
test-sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_1  test-sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_cont...   
test-sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_2  test-sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_cont...   
...                                                                                               ...   
test-vdj_v1_hs_pbmc3_TTTCCTCTCGACAGCC_contig_1         test-vdj_v1_hs_pbmc3_TTTCCTCTCGACAGCC_contig_1   
test-vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_2         

(None, None)

In [13]:
# same functionality as above
vdj.add_cell_prefix("test2", sep="_")
print(vdj.data[["sequence_id", "cell_id"]]), print(vdj.metadata_names)

                                                                                          sequence_id  \
sequence_id                                                                                             
test2_sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_cont...  test2_sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_con...   
test2_sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_cont...  test2_sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_con...   
test2_sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_cont...  test2_sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_con...   
test2_sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_cont...  test2_sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_con...   
test2_sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_cont...  test2_sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_con...   
...                                                                                               ...   
test2_vdj_v1_hs_pbmc3_TTTCCTCTCGACAGCC_contig_1       test2_vdj_v1_hs_pbmc3_TTTCCTCTCGACAGCC_contig_1   
test2_vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_2       t

(None, None)

In [14]:
# you can also reset the ids
vdj.reset_ids()
print(vdj.data[["sequence_id", "cell_id"]]), print(vdj.metadata_names)

                                                                                 sequence_id  \
sequence_id                                                                                    
sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1  sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1   
sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2  sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2   
sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_1  sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_1   
sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_1  sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_1   
sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_2  sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_2   
...                                                                                      ...   
vdj_v1_hs_pbmc3_TTTCCTCTCGACAGCC_contig_1          vdj_v1_hs_pbmc3_TTTCCTCTCGACAGCC_contig_1   
vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_2          vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_2   
vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_

(None, None)

### Simplifying the V/D/J/C call annotations

Sometimes the V/D/J/C call annotations can be quite verbose. You can simplify them with the `.simplify()` method, which keeps only the first element of each comma-separated call and strips the allele designation (e.g. `IGKV1-33*01,IGKV1D-33*01` becomes `IGKV1-33`). This is useful when you want simpler V/D/J/C calls for plotting.
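The rule described above (keep the first call, drop the allele) can be sketched in a few lines of plain Python. The helper `simplify_call` is hypothetical and works on a single string; it is meant to show the string transformation, not how `.simplify()` is implemented.

```python
def simplify_call(call):
    """Keep the first gene of a comma-separated call and drop its allele suffix."""
    if not call:
        return call  # leave empty or missing calls untouched
    first = call.split(",")[0]   # "IGKV1-33*01,IGKV1D-33*01" -> "IGKV1-33*01"
    return first.split("*")[0]   # "IGKV1-33*01" -> "IGKV1-33"


examples = ["IGKV1-33*01,IGKV1D-33*01", "IGHV1-69*01,IGHV1-69D*01", "IGLV5-45*02"]
simplified = [simplify_call(call) for call in examples]
for before, after in zip(examples, simplified):
    print(before, "->", after)
```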

In [15]:
# before
(
    vdj.data[["v_call_genotyped", "j_call"]],
    vdj.metadata[["v_call_genotyped_VDJ", "j_call_VDJ"]],
)

(                                                       v_call_genotyped  \
 sequence_id                                                               
 sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1  IGKV1-33*01,IGKV1D-33*01   
 sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2  IGHV1-69*01,IGHV1-69D*01   
 sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_1                IGKV1-8*01   
 sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_1               IGLV5-45*02   
 sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_2                IGHV1-2*02   
 ...                                                                 ...   
 vdj_v1_hs_pbmc3_TTTCCTCTCGACAGCC_contig_1                   IGHV1-46*01   
 vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_2      IGHV1-69*01,IGHV1-69D*01   
 vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_1                   IGLV1-47*01   
 vdj_v1_hs_pbmc3_TTTGGTTGTAGGCATG_contig_2                   IGLV2-11*01   
 vdj_v1_hs_pbmc3_TTTGGTTGTAGGCATG_contig_1      IGHV3-23*01,IGHV3-23D*01   
 
          

In [16]:
# simplify the calls in place
vdj.simplify()
# after
(
    vdj.data[["v_call_genotyped", "j_call"]],
    vdj.metadata[["v_call_genotyped_VDJ", "j_call_VDJ"]],
)

(                                              v_call_genotyped j_call
 sequence_id                                                          
 sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1         IGKV1-33  IGKJ4
 sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2         IGHV1-69  IGHJ3
 sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_1          IGKV1-8  IGKJ1
 sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_1         IGLV5-45  IGLJ3
 sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_2          IGHV1-2  IGHJ3
 ...                                                        ...    ...
 vdj_v1_hs_pbmc3_TTTCCTCTCGACAGCC_contig_1             IGHV1-46  IGHJ5
 vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_2             IGHV1-69  IGHJ6
 vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_1             IGLV1-47  IGLJ3
 vdj_v1_hs_pbmc3_TTTGGTTGTAGGCATG_contig_2             IGLV2-11  IGLJ2
 vdj_v1_hs_pbmc3_TTTGGTTGTAGGCATG_contig_1             IGHV3-23  IGHJ4
 
 [7355 rows x 2 columns],
                                      v_call_geno

### Concatenating multiple objects

`ddl.concat` is a simple function to concatenate (append) two or more `Dandelion` objects or `pandas` DataFrames. Note that it operates on the `.data` slot and not the `.metadata` slot.

In [17]:
vdj

Dandelion class object with n_obs = 2238 and n_contigs = 7355
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 

In [18]:
# simple concatenation x 3; compare the cell and contig numbers of this object with those of vdj above
vdj_concat = ddl.concat([vdj, vdj, vdj])
vdj_concat

Dandelion class object with n_obs = 6714 and n_contigs = 22065
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn',

In [19]:
vdj_concat.data[["sequence_id", "cell_id"]].head()

sequence_id (index),sequence_id,cell_id
sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1_0,sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1_0,sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_0
sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1_1,sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1_1,sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_1
sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1_2,sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1_2,sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_2
sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2_0,sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2_0,sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_0
sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_1_0,sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_1_0,sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_0


`ddl.concat` also lets you supply custom prefixes/suffixes to add to the sequence ids. If none are provided, it adds `_0`, `_1`, etc. as a suffix when it detects that the sequence ids are not unique, as seen above.
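The suffixing behaviour can be sketched as follows. The helper `concat_ids` is hypothetical: it only reproduces the `_0`, `_1`, `_2` pattern visible in the output above and makes no claim about how `ddl.concat` detects or resolves duplicates internally.

```python
def concat_ids(id_lists, sep="_"):
    """Concatenate several lists of ids, suffixing them only if they would clash."""
    flat = [i for ids in id_lists for i in ids]
    if len(set(flat)) == len(flat):
        return flat  # already unique, nothing to do
    # Tag every id with the position of the list it came from.
    return [f"{i}{sep}{n}" for n, ids in enumerate(id_lists) for i in ids]


same = ["sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1"]
clashing = concat_ids([same, same, same])
print(clashing)

distinct = concat_ids([["cell_a_contig_1"], ["cell_b_contig_1"]])
print(distinct)
```

Suffixing every id, rather than only the duplicates, keeps it obvious which input object each contig came from.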

### Read/write

A `Dandelion` object can be saved with the `.write_h5ddl` and `.write_pkl` methods, with optional compression (e.g. `gzip`). `write_h5ddl` primarily uses the `h5py` library, whereas `write_pkl` just uses `pickle`. The `read_h5ddl` and `read_pkl` functions read the respective file formats.

In [20]:
%time vdj.write_h5ddl('dandelion_results_test.h5ddl', compression="gzip")



CPU times: user 6.39 s, sys: 137 ms, total: 6.53 s
Wall time: 6.79 s


If you see any warnings above, they are due to mixed dtypes somewhere in the object, so check whether they will interfere with downstream usage.

In [21]:
%time vdj_1 = ddl.read_h5ddl('dandelion_results_test.h5ddl')
vdj_1

CPU times: user 1.19 s, sys: 60.9 ms, total: 1.25 s
Wall time: 1.37 s


Dandelion class object with n_obs = 2238 and n_contigs = 7355
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 

Read/write times with `pickle` can be faster or slower, and file sizes smaller or larger, depending on the situation and on which compression is used.
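For readers unfamiliar with compressed pickles, the sketch below shows a generic gzip-compressed pickle round trip using only the standard library. It illustrates the file format idea behind a `.pkl.gz` file; whether `write_pkl` and `read_pkl` do exactly this internally is an assumption, and the `payload` dictionary is a toy stand-in for a real object.

```python
import gzip
import os
import pickle
import tempfile

# Toy object standing in for something with data and metadata.
payload = {
    "data": [{"sequence_id": "contig_1", "productive": "T"}],
    "metadata": {"cell_1": {"isotype": "IgM"}},
}

path = os.path.join(tempfile.mkdtemp(), "example.pkl.gz")

# Write: pickle the object straight into a gzip-compressed file handle.
with gzip.open(path, "wb") as handle:
    pickle.dump(payload, handle, protocol=pickle.HIGHEST_PROTOCOL)

# Read: decompress and unpickle in one pass.
with gzip.open(path, "rb") as handle:
    restored = pickle.load(handle)

print(restored == payload, os.path.getsize(path), "bytes on disk")
```

Unpickling executes code embedded in the file, so only load pickle files from sources you trust.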

In [22]:
%time vdj.write_pkl('dandelion_results_test.pkl.gz')

CPU times: user 7.4 s, sys: 89.6 ms, total: 7.49 s
Wall time: 8.15 s


In [23]:
%time vdj_2 = ddl.read_pkl('dandelion_results_test.pkl.gz')
vdj_2

CPU times: user 127 ms, sys: 13.7 ms, total: 141 ms
Wall time: 146 ms


Dandelion class object with n_obs = 2238 and n_contigs = 7355
    data: 'sequence_id', 'sequence', 'rev_comp', 'productive', 'v_call', 'd_call', 'j_call', 'sequence_alignment', 'germline_alignment', 'junction', 'junction_aa', 'v_cigar', 'd_cigar', 'j_cigar', 'stop_codon', 'vj_in_frame', 'locus', 'c_call', 'junction_length', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end', 'v_score', 'v_identity', 'v_support', 'd_score', 'd_identity', 'd_support', 'j_score', 'j_identity', 'j_support', 'fwr1', 'fwr2', 'fwr3', 'fwr4', 'cdr1', 'cdr2', 'cdr3', 'cell_id', 'consensus_count', 'umi_count', 'v_call_10x', 'd_call_10x', 'j_call_10x', 'junction_10x', 'junction_10x_aa', 'j_support_igblastn', 'j_score_igblastn', 'j_call_igblastn', 'j_call_blastn', 'j_identity_blastn', 'j_alignment_length_blastn', 

There are also other writing functions, such as `.write_airr` and `.write_10x`, which write the object to a `.tsv` or `.csv` file compatible with the `airr` and `10x` formats, respectively.

In [24]:
import pandas as pd

vdj2.write_airr("test.airr.tsv")
df = pd.read_csv("test.airr.tsv", sep="\t")
df

Unnamed: 0,sequence_id,sequence,rev_comp,productive,v_call,d_call,j_call,sequence_alignment,germline_alignment,junction,...,j_call_multimappers,j_call_multiplicity,j_call_sequence_start_multimappers,j_call_sequence_end_multimappers,j_call_support_multimappers,mu_count,ambiguous,extra,rearrangement_status,clone_id
0,sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1,TGGGGAGGAGTCAGTCCCAACCAGGACACGGCCTGGACATGAGGGT...,F,T,"IGKV1-33*01,IGKV1D-33*01",,IGKJ4*01,GACATCCAGATGACCCAGTCTCCATCCTCCCTGTCTGCATCTGTGG...,GACATCCAGATGACCCAGTCTCCATCCTCCCTGTCTGCATCTGTAG...,TGTCAACAATATGACGAACTTCCCGTCACTTTC,...,IGKJ4*01,1.0,385.0,412.0,3.56e-09,27,F,F,standard,B_VJ_76_2_7
1,sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2,ATCACATAACAACCACATTCCTCCTCTAAAGAAGCCCCTGGGAGCA...,F,T,"IGHV1-69*01,IGHV1-69D*01",IGHD3-22*01,IGHJ3*02,CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG...,CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG...,TGTGCGACTACGTATTACTATGATAGTAGTGGTTATTACCAGAATG...,...,IGHJ3*02,1.0,445.0,494.0,4.5799999999999995e-23,0,F,F,standard,B_VDJ_191_3_2_VJ_185_2_3
2,sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_1,AGGAGTCAGACCCTGTCAGGACACAGCATAGACATGAGGGTCCCCG...,F,T,IGKV1-8*01,,IGKJ1*01,GCCATCCGGATGACCCAGTCTCCATCCTCATTCTCTGCATCTACAG...,GCCATCCGGATGACCCAGTCTCCATCCTCATTCTCTGCATCTACAG...,TGTCAACAGTATTATAGTTACCCTCGGACGTTC,...,IGKJ1*01,1.0,380.0,415.0,2.7e-15,0,F,F,standard,B_VDJ_191_3_2_VJ_185_2_3
3,sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_1,ACTGTGGGGGTAAGAGGTTGTGTCCACCATGGCCTGGACTCCTCTC...,F,T,IGLV5-45*02,,IGLJ3*02,CAGGCTGTGCTGACTCAGCCGTCTTCC...CTCTCTGCATCTCCTG...,CAGGCTGTGCTGACTCAGCCGTCTTCC...CTCTCTGCATCTCCTG...,TGTATGATTTGGCACAGCAGCGCTTGGGTGGTC,...,IGLJ3*01,1.0,402.0,431.0,6.84e-12,8,F,F,standard,B_VDJ_9_1_2_VJ_153_1_1
4,sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_2,GGGAGCATCACCCAGCAACCACATCTGTCCTCTAGAGAATCCCCTG...,F,T,IGHV1-2*02,,IGHJ3*02,CAGGTGCAACTGGTGCAGTCTGGGGGT...GAGGTAAAGAAGCCTG...,CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG...,TGTGCGAGAGAGATAGAGGGGGACGGTGTTTTTGAAATCTGG,...,IGHJ3*02,1.0,433.0,479.0,4.48e-18,22,F,F,standard,B_VDJ_9_1_2_VJ_153_1_1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7350,vdj_v1_hs_pbmc3_TTTCCTCTCGACAGCC_contig_1,ATCATCCAACAACCACATCCCTTCTCTACAGAAGCCTCTGAGAGGA...,F,T,IGHV1-46*01,IGHD2-15*01,IGHJ5*02,CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG...,CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG...,TGTGCGAGAGAGGGATATTGTAGTGGTGGTAGCTGCTACTCCCCCG...,...,IGHJ5*02,1.0,461,506,7.83e-21,0,T,T,standard,
7351,vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_2,ATCACATAACAACCACATTCCTCCTCTAAAGAAGCCCCTGGGAGCA...,F,T,"IGHV1-69*01,IGHV1-69D*01",IGHD2-15*01,IGHJ6*02,CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG...,CAGGTGCAGCTGGTGCAGTCTGGGGCT...GAGGTGAAGAAGCCTG...,TGTGCGAGATCTCTGGATATTGTAGTGGTGGTAGCACTCTACTACT...,...,IGHJ6*02,1.0,439,497,4.57e-28,0,F,F,standard,B_VDJ_48_4_2_VJ_50_3_5
7352,vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_1,AGCTTCAGCTGTGGTAGAGAAGACAGGATTCAGGACAATCTCCAGC...,F,T,IGLV1-47*01,,IGLJ3*02,CAGTCTGTGCTGACTCAGCCACCCTCA...GCGTCTGGGACCCCCG...,CAGTCTGTGCTGACTCAGCCACCCTCA...GCGTCTGGGACCCCCG...,TGTGCAGCATGGGATGACAGCCTGAGTGGTTGGGTGTTC,...,IGLJ3*02,1.0,397,434,2.46e-16,0,F,F,standard,B_VDJ_48_4_2_VJ_50_3_5
7353,vdj_v1_hs_pbmc3_TTTGGTTGTAGGCATG_contig_2,GGCTGGGGTCTCAGGAGGCAGCACTCTCGGGACGTCTCCACCATGG...,F,T,IGLV2-11*01,,"IGLJ2*01,IGLJ3*01,IGLJ3*02",CAGTCTGCCCTGACTCAGCCTCGCTCA...GTGTCCGGGTCTCCTG...,CAGTCTGCCCTGACTCAGCCTCGCTCA...GTGTCCGGGTCTCCTG...,TGCTGCTCATATGCAGGCAGCTACACTGTGTTTTTC,...,IGLJ3*01,1.0,393,430,2.46e-11,4,F,F,standard,B_VDJ_117_5_3_VJ_102_3_4
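
The table above shows boolean fields such as `rev_comp`, `productive` and `ambiguous` stored as the strings `T` and `F`. If real booleans are more convenient downstream, a minimal sketch of the conversion with plain `pandas` follows. The two rows are copied from the output above and reduced to four columns for brevity, and the choice of columns to convert is an assumption for illustration, not something `dandelion` does for you.

```python
import io

import pandas as pd

# Two rows copied from the output above, trimmed to four columns.
# This stands in for "test.airr.tsv" so the sketch runs on its own.
tsv_text = (
    "sequence_id\trev_comp\tproductive\tambiguous\n"
    "sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1\tF\tT\tF\n"
    "vdj_v1_hs_pbmc3_TTTCCTCTCGACAGCC_contig_1\tF\tT\tT\n"
)
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Assumed list of T/F columns; adjust to the columns present in your file.
bool_columns = ["rev_comp", "productive", "ambiguous"]
for column in bool_columns:
    # Map the single-letter flags to Python booleans.
    df[column] = df[column].map({"T": True, "F": False})

# Keep only contigs that are productive and not flagged as ambiguous.
clean = df[df["productive"] & ~df["ambiguous"]]
print(clean["sequence_id"].tolist())
```

With the columns converted, boolean masks such as `df["productive"] & ~df["ambiguous"]` can be used directly for filtering.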


In [25]:
vdj2.write_10x(
    folder="10x_test",
    filename_prefix="all",
)  # this writes both the contig_annotations.csv and contig.fasta files
df = pd.read_csv("10x_test/all_contig_annotations.csv")
df

Unnamed: 0,barcode,contig_id,length,chain,v_gene,d_gene,j_gene,c_gene,full_length,productive,cdr3,cdr3_nt,reads,umis,raw_clonotype_id,raw_consensus_id
0,sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG,sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG_contig_1,556,IGK,"IGKV1-33*01,IGKV1D-33*01",,IGKJ4*01,IGKC,,True,CQQYDELPVTF,TGTCAACAATATGACGAACTTCCCGTCACTTTC,9139,68,B_VJ_76_2_7,B_VJ_76_2_7
1,sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC,sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_2,565,IGH,"IGHV1-69*01,IGHV1-69D*01",IGHD3-22*01,IGHJ3*02,IGHM,,True,CATTYYYDSSGYYQNDAFDIW,TGTGCGACTACGTATTACTATGATAGTAGTGGTTATTACCAGAATG...,4161,51,B_VDJ_191_3_2_VJ_185_2_3,B_VDJ_191_3_2_VJ_185_2_3
2,sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC,sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC_contig_1,551,IGK,IGKV1-8*01,,IGKJ1*01,IGKC,,True,CQQYYSYPRTF,TGTCAACAGTATTATAGTTACCCTCGGACGTTC,5679,43,B_VDJ_191_3_2_VJ_185_2_3,B_VDJ_191_3_2_VJ_185_2_3
3,sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG,sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_1,642,IGL,IGLV5-45*02,,IGLJ3*02,IGLC3,,True,CMIWHSSAWVV,TGTATGATTTGGCACAGCAGCGCTTGGGTGGTC,13160,90,B_VDJ_9_1_2_VJ_153_1_1,B_VDJ_9_1_2_VJ_153_1_1
4,sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG,sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG_contig_2,550,IGH,IGHV1-2*02,,IGHJ3*02,IGHM,,True,CAREIEGDGVFEIW,TGTGCGAGAGAGATAGAGGGGGACGGTGTTTTTGAAATCTGG,5080,47,B_VDJ_9_1_2_VJ_153_1_1,B_VDJ_9_1_2_VJ_153_1_1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7350,vdj_v1_hs_pbmc3_TTTCCTCTCGACAGCC,vdj_v1_hs_pbmc3_TTTCCTCTCGACAGCC_contig_1,577,IGH,IGHV1-46*01,IGHD2-15*01,IGHJ5*02,IGHM,,True,CAREGYCSGGSCYSPDPNNGWFDPW,TGTGCGAGAGAGGGATATTGTAGTGGTGGTAGCTGCTACTCCCCCG...,2960,28,,
7351,vdj_v1_hs_pbmc3_TTTGCGCCATACCATG,vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_2,568,IGH,"IGHV1-69*01,IGHV1-69D*01",IGHD2-15*01,IGHJ6*02,IGHM,,True,CARSLDIVVVVALYYYYGMDVW,TGTGCGAGATCTCTGGATATTGTAGTGGTGGTAGCACTCTACTACT...,2464,32,B_VDJ_48_4_2_VJ_50_3_5,B_VDJ_48_4_2_VJ_50_3_5
7352,vdj_v1_hs_pbmc3_TTTGCGCCATACCATG,vdj_v1_hs_pbmc3_TTTGCGCCATACCATG_contig_1,645,IGL,IGLV1-47*01,,IGLJ3*02,IGLC3,,True,CAAWDDSLSGWVF,TGTGCAGCATGGGATGACAGCCTGAGTGGTTGGGTGTTC,2457,28,B_VDJ_48_4_2_VJ_50_3_5,B_VDJ_48_4_2_VJ_50_3_5
7353,vdj_v1_hs_pbmc3_TTTGGTTGTAGGCATG,vdj_v1_hs_pbmc3_TTTGGTTGTAGGCATG_contig_2,641,IGL,IGLV2-11*01,,"IGLJ2*01,IGLJ3*01,IGLJ3*02",IGLC,,True,CCSYAGSYTVFF,TGCTGCTCATATGCAGGCAGCTACACTGTGTTTTTC,2744,36,B_VDJ_117_5_3_VJ_102_3_4,B_VDJ_117_5_3_VJ_102_3_4
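
Because the 10x-style annotation file has one row per contig, a per-cell view can be obtained by grouping on `barcode`. The sketch below is one possible way to do this with plain `pandas`. It rebuilds the first five rows of the table above (only the `barcode`, `chain` and `umis` columns) so that it runs on its own, and the summary column names `n_contigs`, `chains` and `total_umis` are made up for illustration.

```python
import pandas as pd

# First five rows of the table above, restricted to three columns.
contigs = pd.DataFrame(
    {
        "barcode": [
            "sc5p_v2_hs_PBMC_10k_AAACCTGTCATATCGG",
            "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC",
            "sc5p_v2_hs_PBMC_10k_AAACCTGTCCGTTGTC",
            "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG",
            "sc5p_v2_hs_PBMC_10k_AAACCTGTCGAGAACG",
        ],
        "chain": ["IGK", "IGH", "IGK", "IGL", "IGH"],
        "umis": [68, 51, 43, 90, 47],
    }
)

# Summarise each cell: number of contigs, which chains, and total UMIs.
per_cell = contigs.groupby("barcode").agg(
    n_contigs=("chain", "size"),
    chains=("chain", lambda s: ",".join(sorted(s))),
    total_umis=("umis", "sum"),
)
print(per_cell)
```

On the full file, the same grouping can be applied to the `df` read from `10x_test/all_contig_annotations.csv`.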
