Gene_Centric_Coding function fails due to character-formatted annotation data #36

kwdoyle · 2023-10-23T16:13:23Z

Hello,

I've been working my way through each step in the pipeline and have encountered an error when running STAARpipeline_Gene_Centric_Coding.r

The error occurs in the internal coding function within the Gene_Centric_Coding function. When the current annotation name is "aPC.LocalDiversity", the script attempts to perform a transformation of the data:

if (Annotation_name[k] == "aPC.LocalDiversity") {
                    Annotation.PHRED.2 <- -10 * log10(1 - 10^(-Annotation.PHRED/10))
                    Annotation.PHRED <- cbind(Annotation.PHRED,
                      Annotation.PHRED.2)
                    Anno.Int.PHRED.sub.name <- c(Anno.Int.PHRED.sub.name,
                      paste0(Annotation_name[k], "(-)"))
                  }

However, the data in Annotation.PHRED is of the character class, causing this error:

Error in -Annotation.PHRED : invalid argument to unary operator
Calls: Gene_Centric_Coding -> coding

aPC.LocalDiversity is not the only character annotation. Of those picked from the name catalog, the only other character values are 3 other annotation PCs:

/cadd_phred  # numeric
/linsight  # numeric
/fathmm_xf  # numeric
/apc_epigenetics_active  # character
/apc_epigenetics_repressed  # character
/apc_epigenetics_transcription  # numeric
/apc_conservation  # numeric
/apc_local_nucleotide_diversity  # character (causing this error)
/apc_mappability  # character
/apc_transcription_factor  # numeric
/apc_protein_function  # numeric

Would anyone know why these values would be written to (or read from) the GDS file as characters?

The text was updated successfully, but these errors were encountered:

xihaoli · 2023-10-23T18:53:29Z

Hi @kwdoyle,

Thank you for letting me know. Could I ask what FAVOR database files did you use to annotate your genotype data?

Best,
Xihao

kwdoyle · 2023-10-23T19:58:56Z

I'm using the full database of 160 annotations hosted on Harvard Dataverse. More specifically, just the database file for chromosome 5 for now.

Interestingly enough, the Annotate.R script seems to work correctly. If I read in the saved, merged annotation file "Anno_chr5_STAARpipeline.csv" with data.table::fread, all of the data indicated in my previous post as being character-formatted are read in as numeric.

So presumably the issue occurs somewhere in the gds2agds.R script?

It also might be of interest that I chose to annotate the gds using all 160 annotations (rather, I set my 'anno_colnum' variable as c(1, 8:190). I chose the same initial annotations as the default (1 and 8) and then selected all variables from 8 to the end. I'm not sure if this is relevant, however, as the file from Annotate.R seems fine

kwdoyle · 2023-10-23T20:05:22Z

Oh, while looking through the scripts, I think this part in gds2agds.R might be the problem:

FunctionalAnnotation <- read_csv(paste0(dir_anno,"chr",chr,"/",anno_file_name_1,chr,anno_file_name_2),
col_types=list(col_character(),col_double(),col_double(),col_double(),col_double(),
col_double(),col_double(),col_double(),col_double(),col_double(),
col_character(),col_character(),col_character(),col_double(),col_character(),
col_character(),col_character(),col_character(),col_character(),col_double(),
col_double(),col_character()))

The character values I'm seeing might be in the location of these hard-coded character columns..

xihaoli · 2023-10-23T20:12:40Z

Hi @kwdoyle,

Thank you so much for taking a closer look at the issue. Yes indeed, the current FAVORannotator script in the STAARpipeline-Tutorial works well for the FAVOR Essential Database hosted on Harvard Dataverse. For the full database of 160 annotations, it is recommended to adapt the col_types in gds2agds.R to reflect the correct column type for each annotation.

Best,
Xihao

kwdoyle · 2023-10-24T16:21:56Z

Yes, this was indeed the issue. If I removed the col_types specification and let read_csv auto-assign the column classes, everything seems to work fine.

xihaoli · 2023-10-24T16:45:21Z

Thank you @kwdoyle! Feel free to let me know if you have any other questions, or you may close this issue.

xihaoli mentioned this issue Oct 26, 2023

Gene_Centic_Coding Unable to Analyze Gene xihaoli/STAARpipeline#9

Closed

xihaoli closed this as completed Oct 27, 2023

xihaoli mentioned this issue Dec 7, 2023

gnomAD AF & family-based vcf analysis #34

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Gene_Centric_Coding function fails due to character-formatted annotation data #36

Gene_Centric_Coding function fails due to character-formatted annotation data #36

kwdoyle commented Oct 23, 2023

xihaoli commented Oct 23, 2023

kwdoyle commented Oct 23, 2023

kwdoyle commented Oct 23, 2023

xihaoli commented Oct 23, 2023

kwdoyle commented Oct 24, 2023

xihaoli commented Oct 24, 2023

Gene_Centric_Coding function fails due to character-formatted annotation data #36

Gene_Centric_Coding function fails due to character-formatted annotation data #36

Comments

kwdoyle commented Oct 23, 2023

xihaoli commented Oct 23, 2023

kwdoyle commented Oct 23, 2023

kwdoyle commented Oct 23, 2023

xihaoli commented Oct 23, 2023

kwdoyle commented Oct 24, 2023

xihaoli commented Oct 24, 2023