Input data format #209

johanneswerner · 2019-11-06T23:48:40Z

I have two questions about the input data:

I am not clear what the transcript column means. I used featureCounts (from the Rsubread package) to create my count table, where the row names are the gene names predicted from the metagenome and all other columns contain the unnormalized counts. What does the transcript column represent?
I found this paper with an example of batch effects: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3880143/

> Batch effects are sub-groups of measurements that have qualitatively different behaviour across conditions and are unrelated to the biological or scientific variables in a study. For example, batch effects may occur if a subset of experiments was run on Monday and another set on Tuesday, if two technicians were responsible for different subsets of the experiments or if two different lots of reagents, chips or instruments were used. These effects are not exclusive to high-throughput biology and genomics research [1], and batch effects also affect low-dimensional molecular measurements, such as northern blots and quantitative PCR. Although batch effects are difficult or impossible to detect in low-dimensional assays, high-throughput technologies provide enough data to detect and even remove them. However, if not properly dealt with, these effects can have a particularly strong and pervasive impact. Specific examples have been documented in published studies [2,3] in which the biological variables were extremely correlated with technical variables, which subsequently led to serious concerns about the validity of the biological conclusions [4,5].

Would this mean that the batch column would have the value "1" for all samples if they were all taken from the same person, on the same day, with the same procedure?

Thank you very much!

nephantes · 2019-11-07T14:36:20Z

You don't need the transcript column. If you have unique gene IDs or names in the first column and raw counts in the other columns, it should work.
If you don't want to do batch effect correction, you don't need to define the batch column or upload the second metadata file. The metadata file serves two purposes: batch effect correction, and faster selection of conditions in DE analysis.
Thanks for pointing this out. I have fixed the documentation for these two issues.
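
For anyone who wants to sanity-check a table before uploading, here is a minimal Python sketch of the layout described above: unique gene IDs in the first column and raw integer counts in the remaining columns. It is not part of the tool itself. The function names, the tab delimiter, and the sample names are assumptions made for illustration. The second helper reflects the batch question: if every sample carries the same batch label, there is nothing for batch correction to separate.

```python
import csv
import io


def check_count_table(text, delimiter="\t"):
    """Return a list of problems found in a count table given as text.

    Assumed layout (hypothetical helper, not the tool's own validation):
    a header row, then one row per gene with the gene ID first and
    raw, non-negative integer counts in every other column.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if len(rows) < 2:
        return ["table needs a header row and at least one gene row"]

    header, body = rows[0], rows[1:]
    problems = []
    if len(header) < 2:
        problems.append("expected an ID column plus at least one sample column")

    seen_ids = set()
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
            continue
        gene_id, counts = row[0], row[1:]
        if gene_id in seen_ids:
            problems.append(f"line {line_no}: duplicate gene ID {gene_id!r}")
        seen_ids.add(gene_id)
        for sample, value in zip(header[1:], counts):
            # Raw counts should be plain non-negative integers.
            if not (value.isascii() and value.isdigit()):
                problems.append(
                    f"line {line_no}: {sample} has non-integer count {value!r}"
                )
    return problems


def batch_correction_possible(batches):
    """True only if the samples carry at least two distinct batch labels."""
    return len(set(batches)) > 1


# Hypothetical example table with two samples and two genes.
example = "gene\tsample1\tsample2\ngeneA\t10\t0\ngeneB\t3\t7\n"
print(check_count_table(example))
print(batch_correction_possible(["1", "1", "1"]))
```

An empty list from `check_count_table` means the table matches the assumed layout. A `False` from `batch_correction_possible` corresponds to the case in the question, where the batch column can simply be left out.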

johanneswerner · 2019-11-07T19:36:20Z

Thank you very much!

johanneswerner changed the title ~~Input data~~ Input data format Nov 6, 2019

johanneswerner closed this as completed Nov 7, 2019
