# Tutorial: scATAC-seq data

We show an example for scATAC-seq data produced by 10X Chromium.
We use the scATAC-seq dataset `500 Peripheral blood mononuclear cells (PBMCs) from a healthy donor (Next GEM v1.1)` (484 cells and 65,908 peaks) from [10X Genomics Datasets](https://www.10xgenomics.com/resources/datasets). The test data is directly available as `Peak by cell matrix HDF5 (filtered)` [here](https://www.10xgenomics.com/resources/datasets/500-peripheral-blood-mononuclear-cells-pbm-cs-from-a-healthy-donor-next-gem-v-1-1-1-1-standard-2-0-0) (registration required).

We use [scanpy](https://scanpy.readthedocs.io/en/stable/) to read/write 10X data. Import numpy and scanpy in addition to screcode.

In [1]:
import screcode
import numpy as np
import scanpy as sc

Read the count matrix into an [AnnData](https://anndata.readthedocs.io/en/latest/) object.

In [2]:
input_filename = 'data/atac_pbmc_500_nextgem_filtered_peak_bc_matrix.h5'
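# Note: _read_v3_10x_h5 is a private scanpy helper and may change between versions;
# the public sc.read_10x_h5(input_filename, gex_only=False) is a possible alternative.
adata = sc.readwrite._read_v3_10x_h5(input_filename)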

In [3]:
adata.X.toarray().shape

(484, 65908)

In [4]:
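# Densify the sparse count matrix (cells x peaks).
X_mat = adata.X.toarray()
# Keep only non-silent peaks, i.e. columns with at least one count across all cells.
idx_nonsilent = np.sum(X_mat, axis=0) > 0
X_temp = X_mat[:, idx_nonsilent]
X_temp.shape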

(484, 65904)
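
The filtering step above can be tried on a toy matrix. The following is a minimal sketch with made-up counts (the array `toy` and its values are illustrative and not taken from the dataset); it drops the columns (peaks) whose total count across cells is zero.

```python
import numpy as np

# Toy cell-by-peak count matrix (3 cells x 4 peaks); values are made up.
toy = np.array([[0, 2, 0, 1],
                [0, 0, 0, 3],
                [0, 1, 0, 0]], dtype=np.float32)

# A peak is "silent" when no cell has a count for it.
idx_nonsilent = np.sum(toy, axis=0) > 0

# Boolean indexing on the column axis keeps only the non-silent peaks.
toy_nonsilent = toy[:, idx_nonsilent]

print(idx_nonsilent)
print(toy_nonsilent.shape)
```

In the real data this removes 4 of the 65,908 peaks, as the shape above shows.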

In [5]:
(X_temp[0]+1)//2

array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)

In [6]:
# Map counts 1,2 -> 1, 3,4 -> 2, ... (integer halving, rounding up) and store as int.
X_temp = ((X_temp + 1) // 2).astype(int)
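
To make the effect of this transformation concrete, here is a small sketch on a made-up vector of counts (the array `counts` is illustrative only). It shows that adding one and halving with integer truncation maps each odd count and the following even count to the same value; the biological motivation for this pairing is not stated in this tutorial, so none is assumed here.

```python
import numpy as np

# Made-up raw counts, stored as float32 like the matrix read from the h5 file.
counts = np.array([0, 1, 2, 3, 4, 5], dtype=np.float32)

# Floor-division form.
halved = ((counts + 1) // 2).astype(int)

# True division followed by an integer cast truncates to the same values
# for non-negative counts.
halved_cast = np.array((counts + 1) / 2, dtype=int)

print(halved)  # [0 1 1 2 2 3]
```

Both forms agree for non-negative counts, so the choice between them is a matter of readability.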

In [7]:
recode = screcode.RECODE(seq_target='ATAC')
recode.fit(adata)


In [8]:
recode.check_applicability()

## Apply RECODE
Apply RECODE to the count matrix. Both **anndata** and **ndarray** inputs are supported.

In [None]:
recode = screcode.RECODE(seq_target='ATAC')
adata = recode.fit_transform(adata)

In [None]:
data = recode.fit_transform(adata.X.toarray())

With the anndata format, the outputs of RECODE are stored in the anndata object:
- denoised matrix -> adata.obsm['RECODE']
- noise variance -> adata.var['noise_variance_RECODE']
- normalized variance (NVSN variance) -> adata.var['normalized_variance_RECODE']
- classification of features, here peaks (significant/non-significant/silent) -> adata.var['significance_RECODE']

In [None]:
adata

## Performance check
Check applicability:

In [None]:
recode.check_applicability()

Show scatter plots of the mean and variance of log-scaled data before and after RECODE.

In [None]:
recode.plot_mean_variance()
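
The per-feature statistics behind a plot of this kind can also be computed by hand. The sketch below is a rough illustration on synthetic Poisson data and assumes that "log-scaled" means `log(x + 1)`; the exact scaling and normalization used inside `plot_mean_variance` are not stated here, so this should not be read as a reproduction of that method.

```python
import numpy as np

# Synthetic cells-by-features count matrix (100 cells x 5 features).
rng = np.random.default_rng(0)
X = rng.poisson(lam=2.0, size=(100, 5)).astype(float)

# Assumed log-scaling: log(x + 1), which keeps zeros at zero.
X_log = np.log1p(X)

# One mean and one variance per feature (column).
feature_mean = X_log.mean(axis=0)
feature_var = X_log.var(axis=0)

print(feature_mean.shape, feature_var.shape)
```

Plotting `feature_var` against `feature_mean` for the raw and the denoised matrix gives a before/after comparison in the same spirit.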

Show scatter plots of mean and CV (coefficient of variation) before and after RECODE.

In [None]:
recode.plot_mean_cv()
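
For reference, the coefficient of variation of a feature is its standard deviation divided by its mean. The following is a minimal sketch of that definition on a made-up matrix; the helper `mean_and_cv` is hypothetical and not part of screcode, and it uses the population standard deviation (`ddof=0`), which may differ from what `plot_mean_cv` uses internally.

```python
import numpy as np

def mean_and_cv(X):
    """Return per-feature mean and CV (std / mean); CV is NaN where the mean is zero."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # population standard deviation (ddof=0)
    cv = np.divide(std, mean, out=np.full_like(mean, np.nan), where=mean > 0)
    return mean, cv

# Made-up matrix: 3 cells x 3 features (variable, silent, constant).
X = np.array([[1.0, 0.0, 2.0],
              [3.0, 0.0, 2.0],
              [5.0, 0.0, 2.0]])

mean, cv = mean_and_cv(X)
print(mean)
print(cv)
```

The silent feature yields NaN instead of a division-by-zero warning, and the constant feature has a CV of zero.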

Check the log.

In [None]:
recode.log_

Write the RECODE-denoised data to a file in HDF5 format:

In [None]:
output_filename = 'data/atac_pbmc_500_nextgem_filtered_peak_bc_matrix_RECODE.h5'
adata.var_names_make_unique()
adata.write(output_filename)