# Introduction
This notebook documents, in plain text, the pipeline for running freemuxlet on the ATAC data. It will be updated at a later date to confirm that these commands run with the tools listed and to record their version numbers. I will then assemble the steps into a bash script.

# Setup

In [20]:
import os
import gzip

In [10]:
mountpoint = '/data/clue/'
prefix = mountpoint + 'amo/atac/'

# Creating the VCF

1. Download the 1000 Genomes sites VCF and its index.

```
wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20181203_biallelic_SNV/ALL.wgs.shapeit2_integrated_v1a.GRCh38.20181129.sites.vcf.gz
wget http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20181203_biallelic_SNV/ALL.wgs.shapeit2_integrated_v1a.GRCh38.20181129.sites.vcf.gz.tbi
```

2. Use a text editor to change the version from 4.3 to 4.2 in the `##fileformat` header line so that `bcftools` can run on it. This should not affect functionality.

3. Run `bcftools` to filter to a minor allele frequency of 0.05.

4. Run `bedtools intersect` to filter the SNPs to only those found in the peak sets of every well.

5. Rename the contigs from the `1` style to the `chr1` style and keep only the autosomes.

```
<contig renaming code here>
samtools view -b input.bam chr{1..22} > output.bam
```
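For the VCF side of step 5, the renaming and autosome filter could be sketched in Python as below. This is not the code behind the placeholder above; it is a minimal illustration that assumes the input uses bare contig names (`1` to `22`, `X`, and so on), and the function name `rename_and_keep_autosomes` is hypothetical.

```python
import re

# Assumption: autosomes are named "1".."22" in the input VCF.
AUTOSOMES = {str(i) for i in range(1, 23)}
CONTIG_HEADER = re.compile(r"^##contig=<ID=([^,>]+)")


def rename_and_keep_autosomes(lines):
    """Yield VCF lines with a 'chr' prefix on contigs, keeping autosomes only."""
    for line in lines:
        header_match = CONTIG_HEADER.match(line)
        if header_match:
            # ##contig header: rename autosomes, drop every other contig.
            contig = header_match.group(1)
            if contig in AUTOSOMES:
                yield line.replace("<ID=" + contig, "<ID=chr" + contig, 1)
        elif line.startswith("#"):
            # All other header lines pass through unchanged.
            yield line
        else:
            # Data line: the first tab-separated field is the contig name.
            contig, rest = line.split("\t", 1)
            if contig in AUTOSOMES:
                yield "chr" + contig + "\t" + rest
```

The generator can be fed an open text handle, for example from `gzip.open(path, 'rt')`, and its output passed to `writelines` on a second handle. Note that Python's `gzip` module writes plain gzip rather than bgzip, so the result would need to be recompressed with `bgzip` before it can be indexed with `tabix`.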

This pipeline should produce a VCF containing `184243` sites, located at:

In [11]:
prefix + 'vcfs/filtered.2.with_chr.autosomes.vcf.gz'

'/data/clue/amo/atac/vcfs/filtered.2.with_chr.autosomes.vcf.gz'
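One way to check the site count is to count the non-header lines of the VCF. The helper below is a small sketch; the name `count_vcf_records` is hypothetical, and it relies on bgzip-compressed files being readable with Python's `gzip` module.

```python
import gzip


def count_vcf_records(path):
    """Count data lines (one per site) in a plain or gzip/bgzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Header lines start with '#'; blank lines are ignored.
        return sum(1 for line in handle if line.strip() and not line.startswith("#"))
```

Calling `count_vcf_records(prefix + 'vcfs/filtered.2.with_chr.autosomes.vcf.gz')` would then return the number of sites to compare against the expected value.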

# Running Freemuxlet


The freemuxlet pipeline consists of two steps:

1. Running a pileup using `popscle dsc-pileup`.
2. Running the clustering and demultiplexing using `popscle freemuxlet`.
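As a rough sketch of how the two steps chain together, the helper below builds the two command lines without running them. Every path, the output prefixes, and the number of samples are placeholders rather than the values used for this dataset, and the function name is hypothetical. The flags shown (`--sam`, `--vcf`, `--group-list`, `--out`, `--plp`, `--nsample`) are the basic `popscle` options; ATAC data may need further options (for example for barcode tags) that are not covered here.

```python
import shlex


def build_freemuxlet_commands(bam, vcf, barcodes, out_dir, n_samples):
    """Return the dsc-pileup and freemuxlet commands as argument lists."""
    pileup_prefix = out_dir + "/pileup"    # placeholder output prefix
    freemux_prefix = out_dir + "/freemux"  # placeholder output prefix
    pileup_cmd = [
        "popscle", "dsc-pileup",
        "--sam", bam,               # aligned reads
        "--vcf", vcf,               # filtered SNP sites
        "--group-list", barcodes,   # filtered droplet barcodes
        "--out", pileup_prefix,
    ]
    freemux_cmd = [
        "popscle", "freemuxlet",
        "--plp", pileup_prefix,     # reads the pileup written by step 1
        "--out", freemux_prefix,
        "--nsample", str(n_samples),
    ]
    return pileup_cmd, freemux_cmd


if __name__ == "__main__":
    # All arguments below are illustrative placeholders.
    for cmd in build_freemuxlet_commands(
        "merged.bam", "sites.vcf.gz", "barcodes.txt", "freemux_outs", 8
    ):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists makes them easy to pass to `subprocess.run` later, once the tool versions have been confirmed.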

The `dsc-pileup` command takes in the aligned reads (BAM), the VCF created above, and the filtered droplet barcodes and creates several intermediate files:

1. 

# ...

At the end, you should have a freemuxlet output file at:

In [14]:
freemux_path = prefix + 'demuxing/freemux_outs/freemux.clust1.samples.gz'
freemux_path

'/data/clue/amo/atac/demuxing/freemux_outs/freemux.clust1.samples.gz'

# Split by Well

Even though we merged the wells for demultiplexing, we'll load the freemuxlet outputs separately by well, so we split the merged output into one file per well here.

In [15]:
freemux_path

'/data/clue/amo/atac/demuxing/freemux_outs/freemux.clust1.samples.gz'

In [51]:
# Create the per-well output directory; do nothing if it already exists
os.makedirs(prefix + 'demuxing/freemux_outs/by_well/', exist_ok=True)

In [58]:
freemux_file = gzip.open(freemux_path, 'rt')

In [59]:
header = freemux_file.readline()

In [60]:
# One uncompressed output file per well (wells 1 to 5), each starting with the header
wells = dict()
for well in range(1, 6):
    wells[well] = dict()
    wells[well]['path'] = prefix + 'demuxing/freemux_outs/by_well/freemux_well%d.clust1.samples' % well
    wells[well]['file'] = open(wells[well]['path'], 'w')
    wells[well]['file'].write(header)

In [61]:
# The well number is the suffix after the last '-' of the barcode (second column)
for line in freemux_file:
    well = int(line.split('\t')[1].split('-')[-1])
    wells[well]['file'].write(line)

In [62]:
for well in range(1, 6):
    wells[well]['file'].close()

In [63]:
freemux_file.close()
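The cells above could also be condensed into a single function that closes every file even if an error occurs partway through. The sketch below assumes the same layout as above: a header line, the barcode in the second tab-separated column, and a numeric well suffix after the last `-`. The function name and its parameters are hypothetical.

```python
import gzip
import os
from contextlib import ExitStack


def split_samples_by_well(samples_gz, out_dir, wells=range(1, 6)):
    """Split a merged freemuxlet samples file into one plain-text file per well."""
    os.makedirs(out_dir, exist_ok=True)
    paths = {
        well: os.path.join(out_dir, "freemux_well%d.clust1.samples" % well)
        for well in wells
    }
    # ExitStack closes the input and every output file on exit, even on error.
    with ExitStack() as stack:
        source = stack.enter_context(gzip.open(samples_gz, "rt"))
        header = source.readline()
        outputs = {
            well: stack.enter_context(open(path, "w"))
            for well, path in paths.items()
        }
        for out in outputs.values():
            out.write(header)
        for line in source:
            barcode = line.split("\t")[1]
            well = int(barcode.rsplit("-", 1)[-1])
            outputs[well].write(line)
    return paths
```

With the variables defined earlier, this would be called as `split_samples_by_well(freemux_path, prefix + 'demuxing/freemux_outs/by_well/')`. A barcode whose suffix falls outside `wells` raises a `KeyError`, which matches the behaviour of the cell-by-cell version.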