Error with R script: DESeq_stats. #70

Open
pauvic opened this issue Feb 25, 2022 · 11 comments


@pauvic

pauvic commented Feb 25, 2022

Hello,
I got this error when I tried to run DESeq_stats.

/run_DESeq_stats.R: line 4: syntax error near unexpected token {' ./run_DESeq_stats.R: line 4: suppressPackageStartupMessages({'

I could not figure out what went wrong. Could you guide me through this?

Thanks!
Paula

@transcript
Owner

Hey Paula, that's pretty much the first line in the R script. Are you hitting this as part of the master_script.sh shell script, or are you trying to run the run_DESeq_stats.R script on its own?

Can you check your R version on your machine?

If you have RStudio, you could also open this in that program and run it line by line interactively.

Best,
Sam

@pauvic
Author

pauvic commented Feb 28, 2022

Hello, thanks for your answer.
I am trying to run the run_DESeq_stats.R script on its own.
The R version on my machine is 4.1.2 (2021-11-01). Do you think there is a problem with my R version?
Best,
Paula

@transcript
Owner

transcript commented Feb 28, 2022

Thanks Paula! I just checked with R version 4.1.2 (latest), and didn't run into any issues.

Next question: can you verify that the optparse and DESeq2 packages are installed on your machine? You can check this fairly easily in command-line R by running:

R
library("optparse")
library("DESeq2")

If either of these gives you a warning that it's not installed, you can install them easily in R:

Optparse:

install.packages("optparse")

DESeq2:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("DESeq2")

Can you confirm both these packages are installed?
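
The same check can be run without opening an interactive R session. This is only a minimal sketch: `check_r_packages` is a hypothetical helper name (not part of this repository), and it assumes `Rscript` is on the `PATH`. It relies on base R's `requireNamespace()`, which returns `FALSE` instead of stopping when a package is missing.

```shell
# Hypothetical helper: report whether each named R package can be loaded.
check_r_packages() {
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH" >&2
    return 127
  fi
  # Package names are passed as trailing arguments and read via commandArgs().
  Rscript -e 'for (p in commandArgs(trailingOnly = TRUE)) cat(p, if (requireNamespace(p, quietly = TRUE)) "installed" else "MISSING", "\n")' "$@"
}

# Usage:
#   check_r_packages optparse DESeq2
```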

@pauvic
Author

pauvic commented Feb 28, 2022

Sam, thanks for your help.
I already checked that both packages were installed but unfortunately I still have the same error.
What else could I do?
Thanks again for your time.

@transcript
Owner

Hi Paula, interesting, this is a tricky problem! A few more things to test, and to help me:

  1. What operating system are you using? Mac, Windows, Linux?

  2. In command line R (which you can start by just typing "R" on the terminal), could you try pasting in the following lines:

suppressPackageStartupMessages({
  library(optparse)
})

option_list = list(
  make_option(c("-I", "--input"), type="character", default="./",
              help="Input directory", metavar="character"),
  make_option(c("-O", "--out"), type="character", default="DESeq_results.tab", 
              help="output file name [default= %default]", metavar="character"),
  make_option(c("-R", "--raw_counts"), type="character", default=NULL,
              help="raw (total) read counts for this starting file", metavar="character")
)

All of this is just the first lines of the script, and I can run it (R version 4.1.2, on a Mac) copy-pasting it into command-line R without any issues or warnings.

Let me know if this works successfully or still gives you the same error.

@transcript
Owner

Additionally, one more thing to check: can you copy/paste the error inside a code block, i.e. between backticks (`)?

It looks like there might be an extra mark in the message you pasted in, which could be responsible for this, but I can't tell if it's just GitHub trying to format the pasted content.

@pauvic
Author

pauvic commented Mar 2, 2022

Sam, thanks for your answer and sorry for the late reply!!
1- I am using Windows.
2- If I run the first line of the script in R, I don't have any issues or warnings. Moreover, I also get the same error if I run the other R scripts.
3- line 4: syntax error near unexpected token {' ./run_DESeq_stats.R: line 4: suppressPackageStartupMessages({'

@pauvic
Author

pauvic commented Mar 2, 2022

Sam, sorry for bothering you again. I ran the script line by line interactively in RStudio, as you first asked me to, but I got these errors. I am sorry, but I still can't make it work.
Thanks for your time!

> # DESeq statistical calculations
> completeCondition <- data.frame(condition=factor(c(
+   rep(paste("control", 1:length(control_files), sep=".")),
+   rep(paste("experimental", 1:length(exp_files), sep=".")))))
> completeCondition1 <- t(completeCondition)
> colnames(complete_table) <- completeCondition1
Error in names(x) <- value :
  'names' attribute [8] must be the same length as the vector [4]
> completeCondition2 <- data.frame(condition=factor(c(
+   rep("control", length(control_files)),
+   rep("experimental", length(exp_files)))))
> dds <- DESeqDataSetFromMatrix(complete_table, completeCondition2, ~condition)
Error in DESeqDataSetFromMatrix(complete_table, completeCondition2, ~condition) :
  ncol(countData) == nrow(colData) is not TRUE
> dds <- DESeq(dds)
Error in is(object, "DESeqDataSet") : object 'dds' not found

> # This step creates the summary results output
> res <- results(dds)
Error in is(object, "DESeqDataSet") : object 'dds' not found
> org_results <- data.frame(res)
Error in data.frame(res) : object 'res' not found

> # these next steps won't work if there's only 1 control sample (no replicates)
> if (y > 1) {
+   baseMeanPerLvl <- sapply( levels(dds$condition), function(lvl) rowMeans( counts(dds,normalized=TRUE)[,dds$condition == lvl] ) )
+   org_results <- merge(org_results, baseMeanPerLvl, by="row.names")
+   org_results <- org_results[,c(1,2,8,9,3,4,5,6,7)]
+   colnames(org_results)[c(3,4)] <- c("controlMean", "experimentalMean")
+ }
Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'X' in selecting a method for function 'sapply': error in evaluating the argument 'x' in selecting a method for function 'levels': object 'dds' not found

> sorted_org_results <- org_results[order(-org_results$baseMean),]
Error: object 'org_results' not found
> colnames(sorted_org_results)[1] <- "Organism Name"
Error in colnames(sorted_org_results)[1] <- "Organism Name" :
  object 'sorted_org_results' not found

> # saving and finishing up
> cat ("\nSuccess!\nSaving results file as ", save_filename, "\n")

Success!
Saving results file as DESeq_results.tab
> write.table(sorted_org_results, file = save_filename, append = FALSE, quote = FALSE, sep = "\t", row.names = FALSE, col.names = TRUE)
Error in is.data.frame(x) : object 'sorted_org_results' not found

@transcript
Owner

Hi Paula, okay, it's frustrating that you're on Windows (I have Mac and Linux systems), but I can still try and help troubleshoot.

First, for the issues with trying to run the R script from the command line: a bit of searching suggests that the script is being interpreted by Bash instead of by R, so Bash trips over the R syntax. You can see a bit more about this here: https://unix.stackexchange.com/questions/408355/running-r-script-via-shell-script-syntax-error-near-unexpected-token .

To test this, you could try running this script from the command line explicitly with the R interpreter by calling:

Rscript ./run_DESeq_stats.R -i <input_directory>
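
When a file is executed directly (`./run_DESeq_stats.R`) and its first line is not a usable `#!` line, Bash falls back to running it as shell code, which would produce exactly this kind of "syntax error near unexpected token" message. A minimal sketch for checking which interpreter a script asks for follows. `shebang_interpreter` is a hypothetical helper name, and whether the repository's script carries a shebang is not stated in this thread.

```shell
# Hypothetical helper: print the interpreter named on a script's first line,
# or "none" if the file does not start with "#!".
shebang_interpreter() (
  # Read only the first line; tolerate a missing trailing newline.
  IFS= read -r first_line < "$1" || true
  case "$first_line" in
    '#!'*) printf '%s\n' "${first_line#'#!'}" ;;
    *)     printf 'none\n' ;;
  esac
)

# Usage:
#   shebang_interpreter ./run_DESeq_stats.R
```

Calling the script through `Rscript`, as in the command above, sidesteps the shebang lookup entirely.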

But now, when you get to running it in interactive mode, it looks like you're hitting a different problem.

It looks like the error first occurs when you're trying to assign the names to the columns of the complete_table; the error seems to be stating that the complete_table didn't get built properly from the merge of the experimental_table and the control_table, so there are more sample names than actual samples.

Could you give me some idea of what samples you're running this on? How many input files are you providing, and what are their names?
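
Since the error reports 8 names for 4 columns, the number of input files per group is the first thing to pin down. A minimal sketch for counting them, assuming the inputs carry `control_` and `experimental_` prefixes (as the file names elsewhere in this thread do). `count_inputs` is a hypothetical helper, not part of the repository, and how the script itself discovers its inputs is not shown here.

```shell
# Hypothetical helper: count control_* and experimental_* files in a directory.
count_inputs() (
  dir=$1
  for group in control experimental; do
    n=0
    for f in "$dir"/"$group"_*; do
      # An unmatched glob stays literal, so only count paths that exist.
      if [ -e "$f" ]; then
        n=$((n + 1))
      fi
    done
    printf '%s: %d\n' "$group" "$n"
  done
)

# Usage:
#   count_inputs <input_directory>
```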

If that's not enough to help me troubleshoot, I might see if you could send the inputs (or truncated versions of them) to me by email so I could test.

@pauvic
Author

pauvic commented Mar 7, 2022 via email

@transcript
Owner

Hi Paula,

Okay, progress! First off, I made an error in my previous comment; it needs to be -I flag (uppercase I, not lowercase) to work. So I'd try:

Rscript ./run_DESeq_stats.R -I <input_directory> -O output_file.tsv

Capital I, capital O.

That should allow optparse to find the input files directory.

I grabbed the control_T2_5.merged.RefSeq_annot_organism.tsv and experimental_T2_14.merged.RefSeq_annot_organism.tsv files, and it worked with them, after ONE change that you should be aware of: the script tries to parse on underscores ("_"), and so when you had these named as "T2_5" and "T2_14", the script read them both in as T2 and complained that they were duplicates.
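
To see how such a collision can arise, here is a small illustration. It assumes, purely for the sake of example, that the sample label is the second underscore-separated field of the file name. The actual parsing in run_DESeq_stats.R is not shown in this thread, and `sample_label` is a hypothetical name.

```shell
# Illustration only: take the second "_"-separated field of the base name.
sample_label() {
  printf '%s\n' "${1##*/}" | cut -d_ -f2
}

sample_label control_T2_5.merged.RefSeq_annot_organism.tsv        # prints T2
sample_label experimental_T2_14.merged.RefSeq_annot_organism.tsv  # prints T2
```

Under that assumption, both names reduce to the same label, which matches the "duplicates" complaint described above.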

I will note this as an item for me to fix, but the easy solution for right now is just to replace those second underscores with dashes:

control_T2-5.merged.RefSeq_annot_organism.tsv
experimental_T2-14.merged.RefSeq_annot_organism.tsv
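
If there are many files to rename, the substitution could be scripted. A minimal sketch, assuming the file names follow the pattern shown above (group prefix, sample name containing one extra underscore, then `.merged...`). `dash_second_underscore` is a hypothetical helper, and the loop only prints the `mv` commands so they can be reviewed before anything is renamed.

```shell
# Hypothetical helper: replace the second underscore with a dash, but only
# when it occurs before the first dot, so already-fixed names are unchanged.
dash_second_underscore() {
  printf '%s\n' "$1" | sed -E 's/^([^_.]*_[^_.]*)_([^_.]*\.)/\1-\2/'
}

# Dry run in the current directory: print the renames instead of doing them.
for f in control_*_*.tsv experimental_*_*.tsv; do
  [ -e "$f" ] || continue
  new=$(dash_second_underscore "$f")
  [ "$new" = "$f" ] || echo mv -- "$f" "$new"
done
```

Removing the `echo` would perform the renames.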

The *.receipt files, by the way, are not used by R for any of the analysis, so you don't need to share those. I know they're big, but they're just reporting on line-by-line progress and can be ignored for downstream analysis. They're mainly useful if you want to go back and check on a specific read.

Can you try running the Rscript command again, this time with the uppercase letter flags?


Second, regarding the Python "bad interpreter" error, my suspicion is that, with Conda, Python is in a different location. You could probably just delete the shebang from line 1 of the script (remove #!/usr/lib/python2.7). I've also updated these to be callable with Python 3, so that could be an easier approach.

Sam
