# Goals

 - Annotate Chris' CELLxGENE SRX accession records via `SRAgent metadata`

# Init

In [1]:
import os
from dotenv import load_dotenv
import pandas as pd
from pypika import Query, Table, Criterion, functions as fn
from SRAgent.db.connect import db_connect

In [2]:
load_dotenv(override=True)
os.environ["DYNACONF"] = "prod"

# Load data

In [5]:
# Read the srx_metadata rows that Chris processed into a pandas DataFrame
tbl = Table("srx_metadata")
stmt = Query \
    .from_(tbl) \
    .select("*") \
    .distinct() \
    .where(tbl.notes == "Processed by Chris Carpenter")

with db_connect() as conn:
    srx_metadata = pd.read_sql(str(stmt), conn)
srx_metadata

Unnamed: 0,database,entrez_id,srx_accession,is_illumina,is_single_cell,is_paired_end,lib_prep,tech_10x,cell_prep,organism,tissue,disease,purturbation,cell_line,czi_collection_id,czi_collection_name,notes,created_at,updated_at
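The pypika builder chain above should render to roughly `SELECT DISTINCT * FROM "srx_metadata" WHERE "notes"='Processed by Chris Carpenter'`. The sketch below runs that statement against a throwaway in-memory SQLite table to show what `DISTINCT` and the `notes` filter do. The SQLite backend, the three-column table, and the `SRX0000001` row are assumptions made for illustration; the real table is reached through `db_connect()` and has the full column set listed above.

```python
import sqlite3

# Hypothetical stand-in table: SQLite instead of the database behind
# db_connect(), and only three of the real columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE srx_metadata (entrez_id INTEGER, srx_accession TEXT, notes TEXT)"
)
conn.executemany(
    "INSERT INTO srx_metadata VALUES (?, ?, ?)",
    [
        (9828424, "ERX3620259", "Processed by Chris Carpenter"),
        (9828424, "ERX3620259", "Processed by Chris Carpenter"),  # exact duplicate
        (11668718, "ERX4319180", "Processed by Chris Carpenter"),
        (1, "SRX0000001", "some other note"),  # made-up row, should be filtered out
    ],
)

# Approximately the SQL that str(stmt) is expected to produce.
sql = (
    'SELECT DISTINCT * FROM "srx_metadata" '
    "WHERE \"notes\"='Processed by Chris Carpenter'"
)
rows = conn.execute(sql).fetchall()
conn.close()

# The duplicate collapses to one row and the unrelated note is excluded.
print(sorted(rows))
```

Printing `str(stmt)` in the notebook before passing it to `pd.read_sql` is a cheap way to confirm the statement that is actually sent.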


# `SRAgent metadata`

In [6]:
df = srx_metadata[:3000][["entrez_id","srx_accession"]]
df.to_csv("czi_annotate/records_batch7.csv", index=False)
df

Unnamed: 0,entrez_id,srx_accession


Run via screen:

> (SRAgent) nickyoungblut@sc-recounter-nick-n8

```bash
screen -L SRAgent metadata --no-summaries --use-database --no-srr czi_annotate/records_batch7.csv
```
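The batches in this notebook are cut by hand with slices such as `[:3000]` or `[1000:2000]`. A small helper can compute the row ranges instead, which avoids off-by-one gaps or overlaps between batches. This is only a sketch: `batch_slices` is a hypothetical helper, not part of SRAgent, and the file naming in the commented usage is an assumption.

```python
def batch_slices(n_rows, batch_size):
    """Return (start, stop) row ranges that cover n_rows in chunks of batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        (start, min(start + batch_size, n_rows))
        for start in range(0, n_rows, batch_size)
    ]


slices = batch_slices(2500, 1000)
print(slices)  # [(0, 1000), (1000, 2000), (2000, 2500)]

# Possible use with the DataFrame loaded above (file names are hypothetical):
# for i, (start, stop) in enumerate(batch_slices(len(srx_metadata), 1000), start=1):
#     batch = srx_metadata.iloc[start:stop][["entrez_id", "srx_accession"]]
#     batch.to_csv(f"czi_annotate/records_batch{i}.csv", index=False)
```

The ranges are half-open, matching pandas positional slicing, so consecutive batches share no rows.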

# _OLD_

In [11]:
df = srx_metadata[:2][["entrez_id","srx_accession"]]
df.to_csv("czi_annotate/records_1-2b.csv", index=False)
df

Unnamed: 0,entrez_id,srx_accession
0,9828424,ERX3620259
1,11668718,ERX4319180


In [12]:
!SRAgent metadata --no-summaries --use-database --no-srr czi_annotate/records_1-2b.csv

[11668718] Step 1: sragent_agent_node
[9828424] Step 1: sragent_agent_node
[9828424] Step 2: get_metadata_node
[11668718] Step 2: get_metadata_node
[9828424] Step 3: router_node
[9828424] Step 4: bump_metadata_level_node
[11668718] Step 3: router_node
[11668718] Step 4: bump_metadata_level_node
[9828424] Step 5: sragent_agent_node
[9828424] Step 6: get_metadata_node
[9828424] Step 7: router_node
[9828424] Step 8: SRX2SRR_node
[9828424] Step 9: add2db_node
[9828424] Step 10: final_state_node
#-- Final results for Entrez ID 9828424 --#
# SRX accession: ERX3620259
 - SRR accessions: ERR3625055,ERR3625056,ERR3625053,ERR3625054
 - Is the dataset Illumina sequence data?: yes
 - Is the dataset single cell RNA-seq data?: yes
 - Is the dataset paired-end sequencing data?: no
 - Which scRNA-seq library preparation technology?: 10x_Genomics
 - If 10X Genomics, which particular 10X technologies?: 3_prime_gex
 - Single nucleus or single cell RNA sequencing?: single_cell
 - Which organism was sequen
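When the same command runs under `screen -L`, this output goes to screen's log file (`screenlog.0` by default), so progress has to be read from that file. The sketch below parses the `[entrez_id] Step N: node` lines shown above and reports the latest step per record. It assumes the line format stays as in this output and that `final_state_node` marks a finished record, which is inferred from the log rather than from SRAgent documentation; `last_step_per_record` is a hypothetical helper.

```python
import re

# Matches lines such as "[9828424] Step 10: final_state_node".
STEP_RE = re.compile(r"^\[(\d+)\] Step (\d+): (\w+)$")


def last_step_per_record(lines):
    """Map each Entrez ID to the (step number, node name) of its latest step."""
    latest = {}
    for line in lines:
        match = STEP_RE.match(line.strip())
        if match is None:
            continue  # skip result summaries and any other non-step lines
        entrez_id = match.group(1)
        step = int(match.group(2))
        node = match.group(3)
        if entrez_id not in latest or step > latest[entrez_id][0]:
            latest[entrez_id] = (step, node)
    return latest


# A subset of the log lines copied from the output above.
sample = [
    "[11668718] Step 1: sragent_agent_node",
    "[9828424] Step 1: sragent_agent_node",
    "[11668718] Step 3: router_node",
    "[11668718] Step 4: bump_metadata_level_node",
    "[9828424] Step 9: add2db_node",
    "[9828424] Step 10: final_state_node",
    "#-- Final results for Entrez ID 9828424 --#",
]

progress = last_step_per_record(sample)
finished = sorted(
    entrez_id
    for entrez_id, (_, node) in progress.items()
    if node == "final_state_node"
)
print(progress)
print(finished)
```

For a real run, the `sample` list would be replaced by the lines of the log file, for example `open("screenlog.0")`.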

In [9]:
df = srx_metadata[2:1000][["entrez_id","srx_accession"]]
df.to_csv("czi_annotate/records_3-1000b.csv", index=False)

Run via screen:

```bash
screen -L SRAgent metadata --no-summaries --use-database --no-srr czi_annotate/records_3-1000b.csv
```

In [2]:
df = srx_metadata[1000:2000][["entrez_id","srx_accession"]]
df.to_csv("czi_annotate/records_1-2k.csv", index=False)

NameError: name 'srx_metadata' is not defined

Run via screen:

```bash
screen -L SRAgent metadata --no-summaries --use-database --no-srr czi_annotate/records_1-2k.csv
```