Skip to content

3. Code templates

Johan Källberg Zvrskovec edited this page May 19, 2023 · 17 revisions

SQL

View available assessments

Metadata on all assessments in the database can be found in the met.assessment table:

Screenshot 2023-03-13 at 10 54 01

View it by navigating to it in your GUI tool or select from the table by

SELECT * FROM met.assessment;

Use the assessment code and version code to extract information from this assessment from the cohort inventory below.

View the cohort inventory; an overview of the available variables

The view met.cohort_inventory is a view that brings together information about what metadata there is in the database related to stored cohort data in the tables under the coh schema. It shows all available variables with corresponding metadata.

Only selecting the inventory for the demographics questionnaire, and showing the codes for the items, original variable codes and database variable codes.

SELECT assessment_item_code, variable_original_descriptor, assessment_item_variable_code
FROM met.cohort_inventory ci
WHERE ci.assessment_code = 'covidcnsdem';

Extracting cohort data

The function coh.create_current_assessment_item_variable_tview stores a selection of cohort data in the temporary view t_export_data. Example: simple selection of specified items from the demographics questionnaire:

SELECT * FROM coh.create_current_assessment_item_variable_tview(
	cohort_code => 'covidcns',
	instance_code => '2023',
	assessment_code => 'covidcnsdem',
	assessment_version_code => '1',
	assessment_item_code => ARRAY['dateofbirth','dobage','ethnicorigin']
	);
SELECT * FROM t_export_data;

--Or if you have access to the secure schema 'sec' you can join in the study ID like this
SELECT ici.identifier_cohort, t_export_data.* FROM t_export_data INNER JOIN sec.individual_cohortinstance_identifier ici 
ON t_export_data._individual_identifier = ici.identifier;

Example: select specified items from the demographics questionnaire into a temporary table named mydem (think of this as an object or variable in another programming language):

SELECT * FROM coh.create_current_assessment_item_variable_tview(
	cohort_code => 'covidcns',
	instance_code => '2023',
	assessment_code => 'covidcnsdem',
	assessment_version_code => '1',
	assessment_item_code => ARRAY['dateofbirth','dobage','ethnicorigin']
	);
CREATE TEMP TABLE mydem AS SELECT * FROM t_export_data;

Example: extract more data from the PHQ-9 questionnaire and store in the myphq9 temporary table:

SELECT * FROM coh.create_current_assessment_item_variable_tview(
	cohort_code => 'covidcns',
	instance_code => '2023',
	assessment_code => 'phq9',
	assessment_version_code => '1'
	);
CREATE TEMP TABLE myphq9 AS SELECT * FROM t_export_data;

Example: join both temporary tables to get a combined view (individuals part of both datasets):

SELECT * FROM mydem, myphq9
WHERE mydem._stage=myphq9._stage AND mydem._individual_identifier=myphq9._individual_identifier;

Example: join both temporary tables to get a combined view (individuals part of both datasets):

SELECT * FROM mydem, myphq9
WHERE mydem._stage=myphq9._stage AND mydem._individual_identifier=myphq9._individual_identifier;

R

Using the phenodbr R package

The phenodbr package works as a wrapper for most of the functions in the PhenoDB database. Get the package from: https://github.com/tnggroup/phenodbr

A typical installation from R would include installing devtools and then using the install_github function to install the phenodbr package from GitHub.

install.packages("devtools")
devtools::install_github("tnggroup/phenodbr")

Making and cohort data extraction with the phenodbr package is done by setting up your connection by creating a phenodbr object (with the pgDatabaseUtilityClass function) and then using the selectExportData member function. Example below using my tng2101 user and an SSH tunnel connection activated from the command line on port 65432.

library(phenodbr)

cinfo<-c()
cinfo$pw <- rstudioapi::askForPassword(prompt = "Enter database password for specified user.") #Enter your password each time you run the script, not to have to store it in the code. Only works on rstudio with a graphical interface.
cinfo$host<-"localhost"
cinfo$dbname<-"phenodb"
cinfo$user<-"tng2101"
cinfo$port<-65432 

dbutil <- pgDatabaseUtilityClass(host=cinfo$host, dbname=cinfo$dbname, user=cinfo$user, port=cinfo$port, password= cinfo$pw)

dbutil$selectExportData(
  cohortCode = "covidcns",
  instanceCode = "2023",
  assessmentCode = "covidcnsdem",
  assessmentVersionCode = "1",
  assessmentItemCodeList = c("dateofbirth","dobage","ethnocorigin")
  )
View(dbutil$exportDataDf) #The extraction will be in dbutil$exportDataDf

Reading and writing files

#Writing extracted data to a tab separated value file, include variable names, quote text content
write.table(dbutil$exportDataDf, file = file.path("folder","myData.tsv"), quote = TRUE, sep = "\t", row.names = FALSE, col.names = TRUE)

#Reading previously extracted data from a tab separated value file, include variable names, quote text content, configure which strings should be interpreted as NA-values
myReadData<-read.table(file.path("folder","myData.tsv"), header=T, quote="\"", fill=T, na.string=c(".",NA,"NA",""), sep="\t")

#Writing your object to an R data-set file (.Rds)
saveRDS(object = dbutil$exportDataDf, file = file.path("folder","myData.Rds"))

#Reading your object from an R data-set file (.Rds)
myReadData<-readRDS(file=file.path("folder","myData.Rds"))

Clone this wiki locally