Description
Hello,
I've been trying to set up and run VEP on my Databricks cluster using Glow. I created a Docker image based on the example Dockerfile in this repository, and with that image on my cluster I was able to run VEP on a single node.
However, I have been trying to run it through the pipe transformer that Glow provides:
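```python
import glow

# df holds the input variants; cmd is the VEP command line as a list of strings (not shown here)
df_intersect = glow.transform('pipe',
                              df,
                              cmd=cmd,
                              input_formatter='vcf',
                              in_vcf_header='infer',
                              output_formatter='vcf')
```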
When I set output_formatter to 'vcf', I get this error:
"Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file ..."
However, if I set output_formatter to 'text', I can see that the output DataFrame does contain the '#CHROM' line. So the problem seems to be in how Glow converts VEP's output back into VCF, and I'm a bit stuck on how to fix it. I've tried changing a few parameters, such as in_vcf_header and out_ignore_header, with no luck.
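One way to narrow this down might be to check the 'text' output for anything that would stop a VCF parser before it reaches '#CHROM'. Examples include a warning line, a blank line, or a non-VCF header such as VEP's tab-delimited '#Uploaded_variation'. The sketch below is a plain-Python check on one command's output lines. The function name find_vcf_header_problem is hypothetical, not part of Glow, and the rules it applies are an approximation of what a VCF reader expects: '##' lines first, then '#CHROM', then data.

```python
from typing import Iterable, Optional


def find_vcf_header_problem(lines: Iterable[str]) -> Optional[str]:
    """Check one command's output for the header layout a VCF parser expects.

    Hypothetical helper: returns None if the lines look fine, otherwise a
    short description of the first line that would break VCF parsing.
    """
    seen_any = False
    for i, line in enumerate(lines):
        seen_any = True
        if line.startswith("##"):
            # Meta-information line: allowed anywhere before #CHROM.
            continue
        if line.startswith("#CHROM"):
            # Column header found; everything after it is treated as data.
            return None
        if line.startswith("#"):
            # e.g. a tab-delimited VEP header such as '#Uploaded_variation'.
            return f"line {i}: unexpected header line {line[:60]!r}"
        # Anything else (warnings, blank lines, data) before #CHROM breaks parsing.
        return f"line {i}: non-header line before #CHROM: {line[:60]!r}"
    # Lines were seen but none was #CHROM; empty output has nothing to check.
    return "no #CHROM line found" if seen_any else None
```

Glow runs the command once per partition, so each partition's output carries its own header, and a single bad partition would be enough to trigger the error. Assuming the 'text' output formatter puts each line in a column named text, and that the output partitions line up with the command runs, the check could be applied per partition with something like `text_df.rdd.mapPartitions(lambda rows: [find_vcf_header_problem(r.text for r in rows)]).collect()`, where text_df is the result of the same transform with output_formatter='text'.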
If anyone has any help/advice for this issue, it would be greatly appreciated!
Thanks,
Gabi