[BUG] scirpy conversion - rename productive #153

zktuong · 2022-06-09T08:51:05Z

zktuong · 2022-06-10T13:51:25Z

I have one more question about updating the germline sequences with update_germline. I have many samples to update. Should the fasta file be "tigger_heavy_igblast_db-pass_genotype.fasta" (I also got the error below in this case), or should I specify it manually for each sample?

OSError: Environmental variable GERMLINE must be set. Otherwise, please provide path to folder containing germline IGHV, IGHD, and IGHJ fasta files.

Hi @sbenjamaporn,

Just a few things to ask:

- Did you run the preprocessing with the singularity container?
- If you already have that tigger file, chances are you already have a germline_alignment_d_mask column in your data. In that case you can skip both update_germline and create_germlines and go straight to quantify_mutations, unless tigger failed?
- Having said that, if you ran the preprocessing through the singularity container, you can also run vdj.update_plus(), which will retrieve the mutation count and frequency columns into the metadata.

Some clarification:

- update_germline only stores the germline sequences in the germline slot of the Dandelion object, so that create_germlines can retrieve them easily. You still need to run create_germlines afterwards.
- If you manually specify tigger_heavy_igblast_db-pass_genotype.fasta, each Dandelion object should only hold the sequences that tigger was run on. If samples A, B and C belong to individuals 1, 2 and 1, there should be two Dandelion objects: A + C together, and B separately.
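The grouping rule above (one Dandelion object per individual) can be sketched in plain Python. This is only an illustration: the sample-to-individual mapping is hypothetical, and the helper name `group_samples_by_individual` is not part of dandelion.

```python
from collections import defaultdict

# Hypothetical mapping, mirroring the A/B/C example above.
sample_to_individual = {"A": 1, "B": 2, "C": 1}


def group_samples_by_individual(mapping):
    """Return {individual: [samples]}; each group would become one Dandelion object."""
    groups = defaultdict(list)
    for sample, individual in sorted(mapping.items()):
        groups[individual].append(sample)
    return dict(groups)


groups = group_samples_by_individual(sample_to_individual)
print(groups)  # {1: ['A', 'C'], 2: ['B']}
```

Each group then gets its own tigger genotype fasta, so the corrected germlines are never mixed across individuals.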

If you follow the documentation, there's an instruction like:

vdj.update_germline(corrected = 'path/to/tigger_heavy_igblast_db-pass_genotype.fasta', germline = None, org = 'human')

where germline is set to None because the path to the germline folder is read from the GERMLINE environment variable.

I just noticed another bug with the if-else statement that would prevent manual input of the germline option which I'm looking into fixing. So the current workarounds are either:

import os
os.environ['GERMLINE'] = '/path/to/database/germlines/' # download and unpack the database file from https://github.com/zktuong/databases_for_vdj
vdj.update_germline(corrected = 'path/to/tigger_heavy_igblast_db-pass_genotype.fasta', germline = None, org = 'human')

or directly update vdj.germline with a dictionary like:

from changeo.IO import readGermlines
gml = [
    'path/to/database/germlines/imgt/human/vdj/imgt_human_IGHV.fasta',
    'path/to/database/germlines/imgt/human/vdj/imgt_human_IGHD.fasta',
    'path/to/database/germlines/imgt/human/vdj/imgt_human_IGHJ.fasta',
    'path/to/tigger_heavy_igblast_db-pass_genotype.fasta' # place this last
]
vdj.germline.update(readGermlines(gml))

This can then be followed up with ddl.pp.create_germlines and ddl.pp.quantify_mutations.
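Before trying either workaround, it may help to confirm that GERMLINE points at a folder that actually contains the three heavy-chain fasta files. The sketch below assumes the folder layout implied by the paths quoted above (imgt/human/vdj/imgt_human_IGH?.fasta); the helper `missing_germline_files` is hypothetical and not part of dandelion.

```python
import os
from pathlib import Path


def missing_germline_files(germline_dir, org="human"):
    """List the IGHV/IGHD/IGHJ fasta files that are absent under germline_dir."""
    # Layout assumed from the paths quoted above: imgt/<org>/vdj/imgt_<org>_IGH?.fasta
    base = Path(germline_dir) / "imgt" / org / "vdj"
    expected = [base / f"imgt_{org}_IGH{segment}.fasta" for segment in "VDJ"]
    return [str(path) for path in expected if not path.is_file()]


germline_dir = os.environ.get("GERMLINE")
if germline_dir is None:
    print("GERMLINE is not set")
else:
    print(missing_germline_files(germline_dir) or "all germline fasta files found")
```

An empty list means the files are in place under the assumed layout; anything else names the paths to fix before calling update_germline.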

Let me know if there are any issues.

sbenjamaporn · 2022-06-14T04:57:31Z

Dear @zktuong,

Thanks for your help and recommendations. I ran the preprocessing via the singularity container and then used scirpy to define my clonotypes. The output from scirpy did not include the "germline_alignment_d_mask" column, so I think that even if I convert scirpy's AnnData to dandelion, dandelion cannot apply "quantify_mutations" to it properly.
My error: RRuntimeError: Error in (function (db, sequenceColumn = "sequence_alignment", germlineColumn = "germline_alignment_d_mask", :
The column germline_alignment_d_mask was not found

I then used update_germline as you suggested (the import os workaround), followed by create_germlines. The result also shows KeyError: "['germline_alignment_d_mask'] not in index".

To sum up, my main problem is that germline_alignment_d_mask is not found in my data.

My next approach is to merge the AIRR table from dandelion (the result from before I converted to scirpy, which contains germline_alignment_d_mask) with the AIRR table from scirpy (which has no germline_alignment_d_mask information) to restore the germline of each sequence.
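A minimal sketch of that merge idea, assuming both tables share the AIRR sequence_id key. The rows are toy data invented for illustration, and `restore_column` is a hypothetical helper, not a dandelion or scirpy function.

```python
def restore_column(target_rows, source_rows, column, key="sequence_id"):
    """Copy `column` from source_rows into target_rows, matching rows on `key`."""
    lookup = {row[key]: row[column] for row in source_rows if column in row}
    restored = []
    for row in target_rows:
        merged = dict(row)  # copy so the input rows are left untouched
        if column not in merged and merged[key] in lookup:
            merged[column] = lookup[merged[key]]
        restored.append(merged)
    return restored


# Toy rows, invented for illustration only.
dandelion_rows = [
    {"sequence_id": "cell1_contig_1", "germline_alignment_d_mask": "CAGGTG"},
    {"sequence_id": "cell2_contig_1", "germline_alignment_d_mask": "GAGGTG"},
]
scirpy_rows = [
    {"sequence_id": "cell1_contig_1", "productive": "T"},
    {"sequence_id": "cell3_contig_1", "productive": "T"},
]

restored = restore_column(scirpy_rows, dandelion_rows, "germline_alignment_d_mask")
```

Rows without a match in the source table are passed through unchanged, so they would still lack the column and need separate handling.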

Thanks again! And if you have any more suggestions, feel free to let me know

zktuong · 2022-06-14T06:57:00Z

ah ok! i see.

scirpy's conversion only transfers some default AIRR fields. to transfer everything found in a dandelion object, you should do this:

# there's a bug in dandelion's ddl.to_scirpy that doesn't accept the additional kwargs but will be fixed in the next version
adata = ir.io.from_dandelion(vdj, include_fields = vdj.data.columns)

you will then find if you transfer back, the columns will be present:

vdj2 = ddl.from_scirpy(adata)
'germline_alignment_d_mask' in vdj2.data
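The same membership check can be extended to every column the mutation step needs. The sketch below is a hypothetical helper rather than part of dandelion; the two default column names are taken from the error message quoted earlier in this thread, and the column lists are toy examples.

```python
def missing_columns(columns, required=("sequence_alignment", "germline_alignment_d_mask")):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


# Toy column lists standing in for vdj2.data.columns before and after the fix.
default_transfer = ["sequence_id", "sequence_alignment", "productive"]
full_transfer = default_transfer + ["germline_alignment_d_mask"]

print(missing_columns(default_transfer))  # ['germline_alignment_d_mask']
print(missing_columns(full_transfer))     # []
```

With a real object, this would presumably be called as missing_columns(vdj2.data.columns) before running quantify_mutations.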

sbenjamaporn · 2022-06-14T12:37:12Z

Dear @zktuong,

It works! Thank you so much for developing this tool and kindly responding to me.

Best regards,
Benjamaporn

zktuong added the bug label Jun 9, 2022

zktuong linked a pull request Jun 9, 2022 that will close this issue

Speed upgrade - Refactor generate network #152

Merged

zktuong mentioned this issue Jun 10, 2022

Cannot convert output from Scirpy to dandelion scverse/scirpy#343

Closed

zktuong mentioned this issue Jun 13, 2022

Speed upgrade - Refactor generate network #152

Merged

zktuong closed this as completed in #152 Jun 22, 2022
