# Introduction

Here I run latent semantic indexing (LSI) on the ATAC data using ArchR. Before running the notebook cells below, I run the following script at the command line to create the Arrow file:

```
library("ArchR")
set.seed(1)
rhdf5::h5disableFileLocking()
addArchRThreads(threads = 14)
inputFile <- '/data/clue/amo/atac/cr/aggr/outs/fragments.tsv.gz'
addArchRGenome("hg38")
valid_barcodes <- readLines('/data/clue/amo/atac/vals/concat_1_bcs.txt')
ArrowFiles <- createArrowFiles(
    inputFiles = inputFile,
    sampleNames = "clue.aggr.concat_1_bcs",
    validBarcodes = valid_barcodes,
#     filterTSS = 4, # Don't set this too high, because you can always increase it later
#     filterFrags = 1000,
    QCDir = '/data/clue/atac/archR/QualityControl',
    logFile = createLogFile(name="createArrows", logDir='/data/clue/atac/archR/ArchRLogs'),
    addTileMat = TRUE,
    addGeneScoreMat = TRUE
)
```

Then, run the following cells using an R kernel.

# Setup

In [2]:
suppressMessages(library("ArchR"))

In [3]:
set.seed(1)

In [4]:
suppressMessages(addArchRGenome("hg38"))

In [6]:
ArrowFile <- '/data/clue/amo/atac/archR/clue.aggr.concat_2_bcs.arrow'

In [6]:
proj <- ArchRProject(
    ArrowFiles = ArrowFile,
    outputDirectory = '/data/clue/amo/atac/archR/',
    copyArrows = FALSE # Use the Arrow file in place. ArchR recommends TRUE to keep an unaltered copy for later use.
)

Using GeneAnnotation set by addArchRGenome(Hg38)!

Validating Arrows...

Getting SampleNames...

1 


Getting Cell Metadata...

1 


Merging Cell Metadata...

Initializing ArchRProject...



ArchR rearranged the order of the cells when it created the Arrow file, so I need to re-export the cell names in the project's order.

In [58]:
write.table(proj$cellNames, '/data/clue/amo/atac/archR/cell_labels.txt', row.names = FALSE, sep = '\t', quote = FALSE)
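
Because the cell order changed, any per-cell table keyed on the original barcodes has to be realigned before it is combined with ArchR output. The following is a minimal sketch in base R, assuming ArchR's `sampleName#barcode` cell-name format. The barcodes, the sample name `sample1`, and the `meta` table with its `cluster` column are all made up for illustration.

```r
# Toy stand-ins: original barcode order vs. the order reported by the project.
orig_barcodes <- c("AAAC-1", "CCCG-1", "GGGT-1", "TTTA-1")
archr_cells   <- c("sample1#GGGT-1", "sample1#AAAC-1",
                   "sample1#TTTA-1", "sample1#CCCG-1")

# Hypothetical per-cell metadata, stored in the original barcode order.
meta <- data.frame(barcode = orig_barcodes,
                   cluster = c("a", "b", "c", "d"),
                   stringsAsFactors = FALSE)

# Drop the "sampleName#" prefix, then find each ArchR cell in the metadata.
archr_barcodes <- sub("^[^#]*#", "", archr_cells)
idx <- match(archr_barcodes, meta$barcode)
stopifnot(!anyNA(idx))  # every ArchR cell must be present in the metadata

# Metadata rows, now in the same order as the ArchR cells.
meta_reordered <- meta[idx, ]
```

Using `match()` rather than sorting both sides keeps the ArchR order authoritative, and the `stopifnot()` guard fails loudly if any barcode is missing.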

# Create LSI

In [8]:
proj <- addIterativeLSI(ArchRProj = proj, useMatrix = "TileMatrix", name = "IterativeLSI")

In [8]:
write.csv(getReducedDims(proj), '/data/clue/amo/atac/archR/vals/lsi.csv')
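
For intuition about what ends up in `lsi.csv`, here is a rough sketch of the core LSI idea on a toy matrix: TF-IDF normalization followed by a truncated SVD. This is not ArchR's implementation; `addIterativeLSI` uses its own normalization and repeats the procedure over iteratively selected features. The matrix size, the IDF formula, and `k = 5` are arbitrary choices for illustration.

```r
set.seed(1)

# Toy binary accessibility matrix: tiles (rows) x cells (columns).
counts <- matrix(rbinom(50 * 20, size = 1, prob = 0.3), nrow = 50,
                 dimnames = list(paste0("tile", 1:50), paste0("cell", 1:20)))
# Drop empty tiles/cells so the divisions below are well defined.
counts <- counts[rowSums(counts) > 0, colSums(counts) > 0, drop = FALSE]

# Term frequency: scale each cell (column) by its total count.
tf <- sweep(counts, 2, colSums(counts), "/")
# Inverse document frequency: down-weight tiles that are open in many cells.
idf <- log(1 + ncol(counts) / rowSums(counts))
# Recycling runs down the columns, so row i is multiplied by idf[i].
tfidf <- tf * idf

# Truncated SVD: keep the first k right singular vectors, scaled by the
# singular values, as the per-cell embedding.
k <- 5
s <- svd(tfidf, nu = 0, nv = k)
lsi <- s$v %*% diag(s$d[seq_len(k)])
rownames(lsi) <- colnames(counts)
colnames(lsi) <- paste0("LSI", seq_len(k))
```

The result has one row per cell and one column per component, which is the same general shape as a cells-by-components table written to CSV.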

# Create Genescore

In [8]:
genescorematrix <- getMatrixFromProject(proj, useMatrix = "GeneScoreMatrix")

As a sanity check, the number of entries in the assay should equal the number of cells times the number of genes, so the next two cells should return the same value.

In [8]:
length(assay(genescorematrix))

In [8]:
length(rownames(colData(genescorematrix)))*length(rowData(genescorematrix)$name)

In [8]:
dir.create('/data/clue/amo/atac/archR/aggr_genescore/')

write(rownames(colData(genescorematrix)), '/data/clue/amo/atac/archR/aggr_genescore/barcodes.tsv')
write(rowData(genescorematrix)$name, '/data/clue/amo/atac/archR/aggr_genescore/genes.tsv')
writeMM(assay(genescorematrix), '/data/clue/amo/atac/archR/aggr_genescore/matrix.mtx')
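
To confirm that the three files can be reassembled, a round trip like the one below may help. It is a sketch on a toy matrix written to a temporary directory, not on the real output. The gene names, cell names, and the `genescore_demo` directory are hypothetical; `writeMM()` and `readMM()` come from the Matrix package.

```r
library(Matrix)

# Toy genes x cells sparse matrix standing in for assay(genescorematrix).
m <- sparseMatrix(i = c(1, 3, 2, 3), j = c(1, 1, 2, 4),
                  x = c(0.5, 2, 1.5, 3), dims = c(3, 4))
genes <- c("GeneA", "GeneB", "GeneC")
cells <- c("sample1#AAAC-1", "sample1#CCCG-1",
           "sample1#GGGT-1", "sample1#TTTA-1")

# Write the same three files as above, but into a temporary directory.
out_dir <- file.path(tempdir(), "genescore_demo")
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
write(cells, file.path(out_dir, "barcodes.tsv"))
write(genes, file.path(out_dir, "genes.tsv"))
invisible(writeMM(m, file.path(out_dir, "matrix.mtx")))

# Read everything back and reattach the row and column names.
back <- readMM(file.path(out_dir, "matrix.mtx"))
dimnames(back) <- list(readLines(file.path(out_dir, "genes.tsv")),
                       readLines(file.path(out_dir, "barcodes.tsv")))

# Same checks as in the notebook: dimensions and total entry count agree.
stopifnot(identical(dim(back), dim(m)),
          length(back) == length(genes) * length(cells))
```

The Matrix Market file stores only the values and their positions, so the row and column names must travel in the two accompanying text files and stay in the same order as the matrix.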