Dataset #6
@FarshidShekari I'm not sure which dataset you are referring to. For the original GEO/GTEx dataset, please follow the data-access instructions on the README page.
I sent you an email; I would be grateful if you replied.
@FarshidShekari I've replied to your email.
Thanks, but I need the data files as separate CSV files.
@FarshidShekari Sorry, but I only have the original data in gctx format, and I'm using l1ktools v1.1 to read and process it, as documented in the README.
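For readers who, like the request above, need CSV output: once the matrix has been loaded (for example with l1ktools, as the README describes), writing it out is ordinary file I/O. The sketch below assumes the expression values are already in memory as a genes × samples list of rows, with matching row (gene) and column (sample) IDs. The helper name `write_expression_csv` and the `gene_id` header label are hypothetical; nothing here is part of l1ktools or the D-GEX repo.

```python
import csv
from typing import Sequence


def write_expression_csv(path: str,
                         row_ids: Sequence[str],
                         col_ids: Sequence[str],
                         matrix: Sequence[Sequence[float]]) -> None:
    """Write a genes x samples matrix to CSV, one gene per row.

    Hypothetical helper: assumes `matrix` was already read from the .gctx
    file (e.g. via l1ktools) and that `row_ids`/`col_ids` label its rows
    and columns in the same order.
    """
    if len(matrix) != len(row_ids):
        raise ValueError("number of rows does not match number of row IDs")
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        # Header: a label for the ID column, then one column per sample.
        writer.writerow(["gene_id", *col_ids])
        for gene_id, values in zip(row_ids, matrix):
            if len(values) != len(col_ids):
                raise ValueError(f"row {gene_id!r} has the wrong number of columns")
            writer.writerow([gene_id, *values])
```

Writing one file per matrix (for example, one per data split) keeps each CSV aligned with a single set of row and column IDs.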
Thank you.
Can you explain more about your datasets?
@FarshidShekari All the details are documented in the published paper and its supplementary material:
Thanks, @yil8.
What's the
@FarshidShekari preprocess.sh contains all the details; you can go through it.
Yes, that's true, but it only shows the steps for preparing the data for the model. I read the code in each file for each step, and that is why I asked the question above.
@FarshidShekari The 1000G data is used as the validation dataset for RNA-Seq; see the code here: https://github.com/uci-cbcl/D-GEX/blob/master/H1_0-4760.py#L152. In the paper, it says
@FarshidShekari I highly recommend that you not rewrite the code with packages other than those provided in this repo, since I'm not familiar with them and haven't tested them thoroughly. I'm sorry, but I'm afraid I won't have enough time for a line-by-line explanation or debugging of the code. Would you please try exploring the original l1ktools as documented in the README?
I'm trying to use l1ktools, but I get the error below when running
@yil8 Do you run your code on Linux or Windows?
@FarshidShekari Linux, probably Ubuntu 14.04.
Thank you.
Hi @yil8, thank you for helping me.
I want to summarize your answer above in the sentences below:
@FarshidShekari Yes, you are correct.
I want to know why you loaded multiple files generated from the GEO dataset in the code below:
According to your previous answer, what is the indicated line used for?
@FarshidShekari I recommend you get familiar with basic Python and NumPy operations; then you will probably get through your issues quickly.
I have no problem with NumPy; I just want to know why these four files are used.
@FarshidShekari There are two validation set, 1000G for RNA-Seq, and GEO-va for GEO(microarray) data. The original paper says
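The dataset roles described in this thread can be written down as a small lookup table: one training set (GEO-tr), and a validation set per platform (GEO-va for microarray, 1000G-va for RNA-Seq). The sketch below is illustrative only. The test-set names `GEO-te` and `GTEx-te` are assumptions (the thread only mentions evaluating on GEO and GTEx), and `validation_set_for` is a hypothetical helper that picks the validation set matching a test set's platform.

```python
from typing import Dict

# Dataset roles as described in this thread. "GEO-te" and "GTEx-te" are
# assumed names for the two test sets; the thread does not spell them out.
DATASETS: Dict[str, Dict[str, str]] = {
    "GEO-tr":   {"role": "train",      "platform": "microarray"},
    "GEO-va":   {"role": "validation", "platform": "microarray"},
    "GEO-te":   {"role": "test",       "platform": "microarray"},
    "1000G-va": {"role": "validation", "platform": "RNA-Seq"},
    "GTEx-te":  {"role": "test",       "platform": "RNA-Seq"},
}


def validation_set_for(test_set: str) -> str:
    """Return the validation set on the same platform as `test_set` (hypothetical helper)."""
    platform = DATASETS[test_set]["platform"]
    matches = [name for name, info in DATASETS.items()
               if info["role"] == "validation" and info["platform"] == platform]
    if len(matches) != 1:
        raise ValueError(f"expected one validation set for {platform}, got {matches}")
    return matches[0]
```

In this layout the model is always trained on the microarray data, and only the validation and test sets change with the target platform.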
According to the paper, the second model is in the order below:
My question about the sentences above is: why did you use GEO-tr for training and then test on test data from another platform?
The validation data 1000G-va was used to do model selection and parameter tuning for all the methods.
@FarshidShekari I'm not sure which part of this sentence confuses you; it means exactly what it says.
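"Model selection and parameter tuning" on a validation set means that every candidate configuration is scored on the validation data (here 1000G-va), and the one with the lowest validation error is kept. The test set is not used at this stage. The sketch below is a generic version of that procedure, not the repo's code: `select_by_validation` is a hypothetical helper, the candidates are hypothetical configuration labels, and `val_error` stands in for whatever trains a candidate and returns its validation error.

```python
from typing import Callable, Dict, Iterable, Tuple


def select_by_validation(candidates: Iterable[str],
                         val_error: Callable[[str], float]) -> Tuple[str, Dict[str, float]]:
    """Pick the candidate configuration with the lowest validation error.

    `candidates` are hypothetical configuration labels (e.g. hidden-layer
    settings); `val_error` evaluates one candidate on the validation set
    only (e.g. 1000G-va), never on the test set.
    """
    errors = {name: val_error(name) for name in candidates}
    if not errors:
        raise ValueError("no candidates given")
    # Rank candidates by validation error; on ties, the first one seen wins.
    best = min(errors, key=errors.__getitem__)
    return best, errors
```

Only the selected configuration is then evaluated once on the held-out test data, so the reported test error is not biased by the tuning.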
Hi @yil8, how did you compare the D-GEX results with linear regression for all target genes?
@FarshidShekari I think it's all written in the paper.
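One straightforward way to compare two methods across all target genes is to compute one error per gene for each method and count the genes on which one method has the lower error. This is a generic sketch, not necessarily the exact procedure used in the paper. It assumes both error lists are in the same gene order. `fraction_better` is a hypothetical helper name.

```python
from typing import Sequence


def fraction_better(errors_a: Sequence[float], errors_b: Sequence[float]) -> float:
    """Fraction of target genes on which method A has strictly lower error than method B.

    Both sequences hold one error value per target gene, in the same gene order.
    """
    if len(errors_a) != len(errors_b) or not errors_a:
        raise ValueError("need two non-empty error lists of equal length")
    wins = sum(1 for a, b in zip(errors_a, errors_b) if a < b)
    return wins / len(errors_a)
```

For example, `fraction_better(dgex_errors, lr_errors)` would give the share of target genes where D-GEX beats linear regression, assuming `dgex_errors` and `lr_errors` are per-gene errors computed the same way on the same data.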
I saw that in the paper. My question is how you calculated
@FarshidShekari Like I said, it's written in the paper:
@FarshidShekari I guess there must be something wrong with your code; the error of 7.15 is significantly higher than the errors reported in the paper (all of which are less than 1).
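Below is a minimal sketch of a per-gene mean absolute error. For each target gene, it averages the absolute differences between predicted and true expression over all samples, and then averages those per-gene values into a single number. It assumes `y_true` and `y_pred` are samples × target-genes lists on the same expression scale. Because MAE is in the units of the expression values, it is only comparable to the paper's numbers if the targets are on the same scale. The function names are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def per_gene_mae(y_true: Sequence[Sequence[float]],
                 y_pred: Sequence[Sequence[float]]) -> List[float]:
    """MAE for each target gene; both inputs are samples x genes."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need the same non-zero number of samples in both inputs")
    n_genes = len(y_true[0])
    totals = [0.0] * n_genes
    for row_true, row_pred in zip(y_true, y_pred):
        if len(row_true) != n_genes or len(row_pred) != n_genes:
            raise ValueError("inconsistent number of genes")
        # Accumulate absolute errors gene by gene (column-wise).
        for g, (t, p) in enumerate(zip(row_true, row_pred)):
            totals[g] += abs(p - t)
    return [s / len(y_true) for s in totals]


def overall_mae(y_true: Sequence[Sequence[float]],
                y_pred: Sequence[Sequence[float]]) -> float:
    """Average of the per-gene MAEs over all target genes."""
    return mean(per_gene_mae(y_true, y_pred))
```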
This is my code:
@FarshidShekari I think your code looks fine. When you
My code's output for two target genes:
@FarshidShekari Can you try printing the errors of all 9520 genes? Some of your errors look reasonable, e.g. 0.766 and 0.635, while others are pretty big.
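When you print thousands of per-gene errors, a short summary is usually easier to read: the median error, how many genes exceed some cutoff, and which genes are worst. The sketch below assumes you already have one error per gene plus matching gene IDs. `summarize_gene_errors` is a hypothetical helper, and the default `threshold=1.0` is an arbitrary cutoff chosen only for illustration.

```python
from statistics import median
from typing import Dict, List, Sequence, Tuple


def summarize_gene_errors(gene_ids: Sequence[str],
                          errors: Sequence[float],
                          top_k: int = 5,
                          threshold: float = 1.0) -> Dict[str, object]:
    """Summarize per-gene errors: count, median, number above a cutoff, worst genes."""
    if len(gene_ids) != len(errors) or not errors:
        raise ValueError("need one error per gene ID")
    # Sort (gene, error) pairs from largest to smallest error.
    ranked: List[Tuple[str, float]] = sorted(zip(gene_ids, errors),
                                             key=lambda pair: pair[1],
                                             reverse=True)
    return {
        "n_genes": len(errors),
        "median": median(errors),
        "n_above_threshold": sum(1 for e in errors if e > threshold),
        "worst": ranked[:top_k],
    }
```

A large mean driven by a few outlier genes shows up here as a modest median with a handful of entries in `worst`.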
Sorry, not yet (the system I use in the lab is now working on other projects). I'll run it next week, and when I do, I'll comment here.
@FarshidShekari 1. That's for the validation dataset, not the test dataset. 2. Yes, I averaged the outputs of those two.
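The question being answered is cut off, so what "those two" refers to is an assumption here. If they are the mean errors from two scripts that each cover half of the target genes (the repo's `H1_0-4760.py` suggests genes 0 to 4760 in one script, so a second script for 4760 to 9520 is assumed), then a plain average of the two means equals the mean over all 9520 genes, because both halves contain the same number of genes. For chunks of unequal size, such as a 604-gene subset, the means must be weighted by chunk size. `combine_chunk_means` is a hypothetical helper.

```python
from typing import Sequence


def combine_chunk_means(chunk_means: Sequence[float], chunk_sizes: Sequence[int]) -> float:
    """Overall mean error across all genes from per-chunk mean errors.

    Each chunk's mean is weighted by its number of genes; for equal-sized
    chunks this reduces to the plain average of the chunk means.
    """
    if len(chunk_means) != len(chunk_sizes) or not chunk_means:
        raise ValueError("need one size per chunk mean")
    total = sum(chunk_sizes)
    if total <= 0:
        raise ValueError("chunk sizes must sum to a positive number")
    return sum(m * n for m, n in zip(chunk_means, chunk_sizes)) / total
```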
I ran it for 604 target genes, and the model score was
@FarshidShekari That looks within the reasonable range of the numbers I reported. Can you do this for all ~9000 target genes?
Hi dear
@FarshidShekari The two MAEs on GEO and GTEx are reported in Table 1 and Table 2 of the original paper, respectively, and they are neither 78% nor 99.68%.
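The thread does not say how the 78% and 99.68% figures were computed. The sketch below only illustrates the point in the reply above: MAE is an error in expression units, not a percentage. A unitless "score" such as the coefficient of determination R² is a different quantity. Both can be computed from the same predictions and give unrelated numbers. The function names are hypothetical.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the same units as the expression values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need non-empty inputs of equal length")
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (unitless)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need non-empty inputs of equal length")
    mu = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mu) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot
```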
Thanks, dear @yil8.
Hi @yil8, thanks for the reply.
@FarshidShekari Sorry, but I may not be able to answer all your questions in detail, given that it was published about three years ago and I'm currently focusing mainly on other work, but I'll try my best once I have time.
You're welcome, @yil8. If you could answer, I would be grateful.
Hi,
Can you save your dataset as a CSV file and send me a download link?
Thanks.