VEP Pipeline Transformer - Chrom header error #741

Open
@gabi-ryan

Description

Hello,
I've been trying to set up and run VEP on my Databricks cluster using Glow. I built a Docker image based on the example Dockerfile in this repository and used it for my cluster; with that, I was able to run VEP in the single-node setup.

However, I have not been able to run VEP through the pipeline transformer that Glow provides:

df_intersect = glow.transform('pipe',
                              df,
                              cmd=cmd,
                              input_formatter='vcf',
                              in_vcf_header='infer',
                              output_formatter='vcf')

When I define the output_formatter as 'vcf', I get the error:

"Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file ..."

However, if I set the output_formatter to 'text', I can see in the output dataframe that the '#CHROM' line exists. So the problem seems to lie in how Glow converts the VEP output back into a VCF, and I'm currently stuck on how to fix it. I've tried changing a few of the parameters, such as in_vcf_header and out_ignore_header, but with no luck.
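
To narrow this down, here is a minimal sketch of a check that could be run on the lines returned with output_formatter='text'. It only tests whether a '#CHROM' line is present and whether anything other than '##' meta lines comes before it (for example a warning printed by the command). The helper names find_chrom_header and check_vcf_header are hypothetical, not part of Glow or VEP, and this is a diagnostic aid rather than an explanation of the error.

```python
from typing import Iterable, Optional


def find_chrom_header(lines: Iterable[str]) -> Optional[int]:
    """Return the 0-based index of the first '#CHROM' line, or None if absent."""
    for i, line in enumerate(lines):
        if line.startswith("#CHROM"):
            return i
    return None


def check_vcf_header(lines: Iterable[str]) -> str:
    """Classify where the '#CHROM' line sits relative to the rest of the output.

    Hypothetical helper: 'lines' is assumed to be the text output of the
    piped command, one string per line, in the order it was written.
    """
    lines = list(lines)
    idx = find_chrom_header(lines)
    if idx is None:
        return "missing"
    # Every line before '#CHROM' should be a '##' meta line.
    if any(not line.startswith("##") for line in lines[:idx]):
        return "preceded_by_non_header"
    return "ok"
```

The input could be collected from the text-formatted dataframe, for instance as a list of strings taken from its single string column (the exact column name is an assumption here and should be checked against the dataframe schema). If the piped command runs separately for each partition, as I understand it does, it may be more informative to run the check on each partition's lines rather than on the combined output.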

If anyone has any help/advice for this issue, it would be greatly appreciated!

Thanks,
Gabi
