Workflow of transformations

Szymon Chojnacki edited this page Feb 16, 2016 · 10 revisions

This page describes the data transformation workflow in piLINCS. The workflow is executed when spring.jpa.hibernate.ddl-auto=create-drop.

Step 1

We retrieve all Peptide Annotations from Panorama for each assay.

1.1 Assays are specified in AssayType.java.

1.2 To get the JSON, we use Panorama's API with the following endpoints:

https://panoramaweb.org/labkey/query/LINCS/P100/selectRows.api?schemaName=targetedms&query.queryName=peptideannotation

https://panoramaweb.org/labkey/query/LINCS/GCP/selectRows.api?schemaName=targetedms&query.queryName=peptideannotation

The URL template of the endpoint is set in the config file as panorama.peptideAnnotationsUrl.
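As a rough illustration of how such a URL template could be expanded for each assay, the sketch below assumes the template holds a single `%s` placeholder for the assay name. The placeholder syntax actually used in the piLINCS config file is not shown on this page, and the class and method names are hypothetical.

```java
import java.util.List;

class PeptideAnnotationUrlSketch {

    // Assumed template format: "%s" marks where the assay name goes.
    // The real value of panorama.peptideAnnotationsUrl may use a different placeholder.
    static final String TEMPLATE =
        "https://panoramaweb.org/labkey/query/LINCS/%s/selectRows.api"
        + "?schemaName=targetedms&query.queryName=peptideannotation";

    // Fills the assay name into the template.
    static String urlFor(String assay) {
        return String.format(TEMPLATE, assay);
    }

    public static void main(String[] args) {
        // P100 and GCP are the two assays that appear in the endpoints above.
        for (String assay : List.of("P100", "GCP")) {
            System.out.println(urlFor(assay));
        }
    }
}
```

Keeping the assay name as the only variable part means a new assay only needs a new entry in the assay list, not a new URL.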

1.3 Parse the JSON to get one 3-tuple (PeptideId, Name, Value) for each PeptideId. For example, only one tuple is created from the two JavaScript objects below:

rows: [
{
PeptideId: 561546,
Value: "BI10003",
Id: 438782,
Name: "pr_id"
},
...
{
PeptideId: 562723,
Value: "BI10003",
Id: 448661,
Name: "pr_id"
},

1.4 Some annotations are skipped. They are listed in PeptideService.java. For example:

"EntrezGeneId"
"GeneName"
"P100_BasePeptide"
"P100_Cluster"
"P100_GeneClusterCode"
"P100_ModifiedPeptideCode"
"P100_OriginalGeneSiteCode"
"pr_P100_original_gene_site_code"
"P100_OriginalProbeID"
"PhosphoSite"
"UniprotAC"
"pr_probe_normalization_group"
"pr_probe_suitability_manual"
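Skipping annotations by name can be sketched as a simple set-membership filter. This is only an illustration and not the code in PeptideService.java: the `Annotation` record and the `dropSkipped` method are hypothetical, and the set below holds just three of the names listed above.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class SkippedAnnotationsSketch {

    // Hypothetical tuple type mirroring the (PeptideId, Name, Value) tuple from step 1.3.
    record Annotation(long peptideId, String name, String value) {}

    // A few of the skipped names listed above; the full list lives in PeptideService.java.
    static final Set<String> SKIPPED = Set.of("EntrezGeneId", "GeneName", "UniprotAC");

    // Keeps only the annotations whose name is not in the skip set.
    static List<Annotation> dropSkipped(List<Annotation> annotations) {
        return annotations.stream()
            .filter(a -> !SKIPPED.contains(a.name()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Annotation> input = List.of(
            new Annotation(561546L, "pr_id", "BI10003"),
            new Annotation(561546L, "GeneName", "EXAMPLE")); // made-up value for illustration
        System.out.println(dropSkipped(input));
    }
}
```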

1.5 Some annotations are merged. They are listed in PeptideService.java. For example:

"pr_p100_modified_peptide_code"
"pr_gcp_modified_peptide_code"
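This page does not say how the merge is done. One plausible reading is that the assay-specific names are mapped to a single shared name, which the sketch below illustrates. The target name `pr_modified_peptide_code`, the class, and the method are all assumptions made for the example, not names taken from PeptideService.java.

```java
import java.util.Map;

class MergedAnnotationsSketch {

    // Assumption: "merged" means both assay-specific names map to one shared name.
    // The shared name "pr_modified_peptide_code" is hypothetical.
    static final Map<String, String> MERGED = Map.of(
        "pr_p100_modified_peptide_code", "pr_modified_peptide_code",
        "pr_gcp_modified_peptide_code", "pr_modified_peptide_code");

    // Returns the shared name for merged annotations and the unchanged name otherwise.
    static String canonicalName(String name) {
        return MERGED.getOrDefault(name, name);
    }

    public static void main(String[] args) {
        System.out.println(canonicalName("pr_p100_modified_peptide_code"));
        System.out.println(canonicalName("pr_gcp_modified_peptide_code"));
        System.out.println(canonicalName("pr_id"));
    }
}
```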

1.6 Peptide annotations are grouped by PeptideId and saved to a DB table.
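The grouping part of this step can be sketched as collecting the tuples into one name-to-value map per PeptideId. The `Annotation` record and `groupByPeptide` method are hypothetical, the rule that a later value overwrites an earlier one with the same name is an assumption, and persisting the result to the DB table is left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupByPeptideSketch {

    // Hypothetical tuple type mirroring the (PeptideId, Name, Value) tuple from step 1.3.
    record Annotation(long peptideId, String name, String value) {}

    // Builds one name -> value map per PeptideId, preserving encounter order.
    static Map<Long, Map<String, String>> groupByPeptide(List<Annotation> annotations) {
        Map<Long, Map<String, String>> grouped = new LinkedHashMap<>();
        for (Annotation a : annotations) {
            // Assumption: if a name repeats for the same peptide, the later value wins.
            grouped.computeIfAbsent(a.peptideId(), id -> new LinkedHashMap<>())
                   .put(a.name(), a.value());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Annotation> input = List.of(
            new Annotation(561546L, "pr_id", "BI10003"),
            new Annotation(562723L, "pr_id", "BI10003"));
        System.out.println(groupByPeptide(input));
    }
}
```

Each entry of the resulting map corresponds to one peptide with all of its annotations, which is the shape a per-peptide table row would need.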

Step 2

We connect to Panorama to get all Replicate Annotations.

2.1 To get the JSON, we use Panorama's API with the following endpoints:

https://panoramaweb.org/labkey/query/LINCS/P100/selectRows.api?schemaName=targetedms&query.queryName=replicateannotation

https://panoramaweb.org/labkey/query/LINCS/GCP/selectRows.api?schemaName=targetedms&query.queryName=replicateannotation

The URL template of the endpoint is set in the config file as panorama.replicateAnnotationsUrl.

2.2 Parse the JSON to get one 3-tuple (ReplicateId, Name, Value) for each ReplicateId. For example, only one tuple is created from the two JavaScript objects below:

rows: [
{
ReplicateId: 52512,
Value: "PM7-46D43-001A01",
Id: 108091,
Name: "id"
},
...
{
ReplicateId: 147575,
Value: "PM7-46D43-001A01",
Id: 252235,
Name: "id"
},

2.3 Some annotations are skipped. They are listed in ReplicateService.java. For example:

"canonical_smiles"
"cell_reprogrammed"
"det_normalization_group_vector"
"det_filename"
"pert_batch_internal_compound_enumerator"
"pert_batch_internal_replicate"
"provenance_code"
"pert_desc"

2.4 Unexpected situations are written to logs. For example:

2.5 Replicates/Samples are grouped by ReplicateId and saved to a DB table.

Step 3

We connect to Panorama to get data points.