bedtools bam2bed race condition #29

Open
skchronicles opened this issue Apr 18, 2024 · 0 comments

Hey @jinyongyoo @zanglab,

I hope you are having a great day! I just wanted to start this off by saying thank you for creating and maintaining this awesome tool.

Background
I recently ran into an issue while running SICER2. While processing a large batch of samples on our cluster, I noticed that sets of samples sharing the same input control were failing.

Here is the error that was received while running SICER2:

Error: Input BED files must have the first six fields. Check input.bed to see if it has the following fields: chrom, chromStart, chromEnd, name, score, and strand.

After checking the bed file for issues, I could not find any problems with it. It had all six fields. What is interesting is that one job with the same input completed successfully. One thing that I noticed is that both jobs started running around the same time. Our cluster uses a network-attached filesystem so multiple concurrent jobs will have access to the same files.

At first, I was confused as to what was happening because we weren't providing a BAM file as input, but then I took a look at what was happening under the hood. There is some logic that converts a given BAM file to BED format:
[image: screenshot of the conversion logic]

The issue is that the converted BED file is not written to the output directory. Instead, it is written to the same directory as the input BAM file. This causes a problem if two treatment samples share the same input control and their jobs run at the same time: both processes try to write the same file, which is a race condition. One job can end up reading a BED file that the other job is still writing or has just overwritten, and a partially written file could explain the "first six fields" error above even though the finished file looks fine.

Possible solution
Would it be possible to write the converted BED file to the output directory? That way a user can create a unique output directory for a ChIP sample to avoid this issue. I can submit a PR to fix this if you want.
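Here is a rough sketch of what I have in mind. I have not checked it against the actual SICER2 code, so the function names `converted_bed_path` and `bam_to_bed` are hypothetical, and I am assuming the conversion is done with `bedtools bamtobed -i`. The idea is to build the BED path from the output directory instead of the BAM's directory, and to write to a uniquely named temporary file first so that a reader never sees a half-written BED file.

```python
import os
import subprocess
import tempfile


def converted_bed_path(bam_path, output_dir):
    """Return a BED path inside output_dir rather than next to the BAM.

    Hypothetical helper, not part of SICER2.
    """
    base = os.path.splitext(os.path.basename(bam_path))[0]
    return os.path.join(output_dir, base + ".bed")


def bam_to_bed(bam_path, output_dir):
    """Convert bam_path to BED inside output_dir and return the BED path.

    Hypothetical helper; assumes `bedtools` is available on PATH.
    """
    os.makedirs(output_dir, exist_ok=True)
    bed_path = converted_bed_path(bam_path, output_dir)

    # Write to a uniquely named temp file in the same directory, then rename
    # it into place, so concurrent jobs never share a partially written file.
    fd, tmp_path = tempfile.mkstemp(dir=output_dir, suffix=".bed.tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            subprocess.run(
                ["bedtools", "bamtobed", "-i", bam_path],
                stdout=tmp,
                check=True,
            )
        os.replace(tmp_path, bed_path)
    except BaseException:
        # Clean up the temp file if the conversion or rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return bed_path
```

With something like this, two ChIP samples that share `input.bam` but use different output directories would each get their own `input.bed`. The temp-file-plus-rename step is optional, but I think it would also help users who point two jobs at the same output directory.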

Please let me know what you think.

Best regards,
Skyler Kuhn
