Input data format #209

Closed
johanneswerner opened this issue Nov 6, 2019 · 2 comments

Comments

johanneswerner commented Nov 6, 2019

I have two questions about the input data:

  1. I am not sure what the transcript column means. I used featureCounts (from the Rsubread package) to create my count table: the row names are the gene names predicted from the metagenome, and all other columns contain the unnormalized counts. What does the transcript column represent?

  2. I found this paper for an example on batch effects: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3880143/

Batch effects are sub-groups of measurements that have qualitatively different behaviour across conditions and are unrelated to the biological or scientific variables in a study. For example, batch effects may occur if a subset of experiments was run on Monday and another set on Tuesday, if two technicians were responsible for different subsets of the experiments or if two different lots of reagents, chips or instruments were used. These effects are not exclusive to high-throughput biology and genomics research [1], and batch effects also affect low-dimensional molecular measurements, such as northern blots and quantitative PCR. Although batch effects are difficult or impossible to detect in low-dimensional assays, high-throughput technologies provide enough data to detect and even remove them. However, if not properly dealt with, these effects can have a particularly strong and pervasive impact. Specific examples have been documented in published studies [2,3] in which the biological variables were extremely correlated with technical variables, which subsequently led to serious concerns about the validity of the biological conclusions [4,5].

Does this mean that the batch column would have the value "1" for all samples if they were all taken from the same person, on the same day, using the same procedure?

Thank you very much!

johanneswerner changed the title from "Input data" to "Input data format" on Nov 6, 2019

nephantes commented Nov 7, 2019

  1. You don't need the transcript column. If the first column contains unique gene IDs or names and the remaining columns contain raw counts, it should work.
  2. If you don't want to do batch effect correction, you don't need to define the batch column or upload the second (metadata) file. The metadata file serves two purposes: batch effect correction, and faster selection of conditions in DE.
    Thanks for pointing this out; I have fixed the documentation for these two issues.
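
To illustrate the layout described above, here is a minimal Python sketch; it is not part of the tool itself. It checks that a tab-separated count table has unique gene IDs in the first column and non-negative integer counts in the remaining columns, and it builds metadata rows in which the batch field is optional. The function names (`check_count_table`, `build_metadata`), the metadata column names (`sample`, `condition`, `batch`), and the tab delimiter are assumptions made for this example, so check the project documentation for the exact file format.

```python
import csv
import io


def check_count_table(text, delimiter="\t"):
    """Return a list of problems found in a gene-by-sample count table.

    Assumed layout: header row, then one row per gene with the gene ID
    in the first column and raw (unnormalized) counts in the others.
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    header, body = rows[0], rows[1:]
    problems = []
    seen = set()
    for line_no, row in enumerate(body, start=2):
        gene_id, counts = row[0], row[1:]
        # Gene IDs must be unique so each row can be addressed by name.
        if gene_id in seen:
            problems.append(f"line {line_no}: duplicate gene ID {gene_id!r}")
        seen.add(gene_id)
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
        # Raw counts are non-negative integers; decimals suggest normalized data.
        for sample, value in zip(header[1:], counts):
            if not (value.isascii() and value.isdigit()):
                problems.append(
                    f"line {line_no}: non-integer count {value!r} in {sample!r}"
                )
    return problems


def build_metadata(samples, conditions, batches=None):
    """Build metadata rows; the batch field is only added when batches is given.

    The column names here are hypothetical placeholders.
    """
    if len(samples) != len(conditions):
        raise ValueError("samples and conditions must have the same length")
    if batches is not None and len(batches) != len(samples):
        raise ValueError("batches must have one entry per sample")
    rows = []
    for i, sample in enumerate(samples):
        row = {"sample": sample, "condition": conditions[i]}
        if batches is not None:
            row["batch"] = batches[i]
        rows.append(row)
    return rows


if __name__ == "__main__":
    table = "gene\tctrl_1\ttreat_1\ngeneA\t10\t30\ngeneB\t0\t5\n"
    print(check_count_table(table))
    print(build_metadata(["ctrl_1", "treat_1"], ["control", "treated"]))
```

Leaving `batches` unset mirrors the case in the reply where no batch effect correction is wanted, so no batch column is defined at all.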

@johanneswerner

Thank you very much.
