Project progress report for team_Methylation-Badassays #5
Comments
Hi @STAT540-UBC/team-badassays Thank you for submitting the progress report. A few comments:
@rbalshaw your thoughtful comments are very welcome. Good luck team! :)
It looks like you have made good progress getting your data into R, reviewing it for quality, and normalizing it. Your plan, laid out in S.1.2, looks pretty solid. A regularized logistic regression model seems a sensible thing to try. Cross-validation as you describe it (which packages like caret should make fairly straightforward) will help you understand how well the model identifies the Caucasian vs. Asian samples and will help reduce overfitting.

Your next step (step 3) is to do unsupervised analyses of these data (PCA), hoping to see that some of the PCs are associated with self-reported ethnicity in the training data. This is a sensible idea, but I tend to think of this type of analysis as a precursor to the logistic regression (a supervised technique). Not a big deal, though. Plotting these PC values for the test data, where you cannot confirm the ethnicity, will be very interesting.

I would suggest that you could also do a PCA using only the features selected by the regularized logistic regression. This plot will almost certainly show some differences between the ethnicities in the training data (you should think about this and make sure it's clear why this is so). If you are lucky, and your hypothesis is valid, you may see similar structures when you plot these PCs for the test data.

You have a bit of a hurdle to clear with getting your processed data back into R, but that seems like something we might be able to look at over the phone and with screen sharing (Webex or Skype?). Please let me know if anyone on the team would like to chat. It would be best to contact me by email: robert.balshaw@bccdc.ca
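To make the point about PCA on selected features concrete, here is a minimal sketch in plain Python (not the team's R pipeline). Everything in it is an assumption made for illustration: the data are pure Gaussian noise with arbitrary labels, a simple class-mean-difference filter stands in for the regularized logistic regression, and names such as `select_features`, `first_pc`, and `class_gap` are hypothetical helpers. Because the features are chosen using the training labels, PC1 of the selected features separates the training classes even though there is no real signal, while fresh samples show essentially no separation.

```python
import random

def class_gap(scores, labels):
    """Absolute difference in class means, in units of pooled within-class SD."""
    a = [s for s, l in zip(scores, labels) if l == 1]
    b = [s for s, l in zip(scores, labels) if l == 0]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ss = sum((s - ma) ** 2 for s in a) + sum((s - mb) ** 2 for s in b)
    return abs(ma - mb) / (ss / (len(a) + len(b) - 2)) ** 0.5

def select_features(X, y, k):
    """Keep the k features with the largest class-mean difference.
    Assumption: a crude stand-in for lasso-style feature selection."""
    def diff(j):
        a = [r[j] for r, l in zip(X, y) if l == 1]
        b = [r[j] for r, l in zip(X, y) if l == 0]
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return sorted(range(len(X[0])), key=diff, reverse=True)[:k]

def first_pc(X, rng, iters=500):
    """Return (column means, leading principal axis) via power iteration."""
    n, p = len(X), len(X[0])
    mu = [sum(r[j] for r in X) / n for j in range(p)]
    C = [[r[j] - mu[j] for j in range(p)] for r in X]  # centered data
    v = [rng.gauss(0, 1) for _ in range(p)]            # random start vector
    for _ in range(iters):
        proj = [sum(c[j] * v[j] for j in range(p)) for c in C]             # C v
        w = [sum(C[i][j] * proj[i] for i in range(n)) for j in range(p)]   # C^T C v
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return mu, v

rng = random.Random(0)
n, p, k = 40, 2000, 20  # samples per set, total features, features kept

def noise(rows):
    return [[rng.gauss(0, 1) for _ in range(p)] for _ in range(rows)]

X_train, X_test = noise(n), noise(n)   # no true group signal anywhere
y = [i % 2 for i in range(n)]          # arbitrary labels, reused for both sets

keep = select_features(X_train, y, k)  # selection sees the training labels

def subset(X):
    return [[r[j] for j in keep] for r in X]

mu, v = first_pc(subset(X_train), rng)

def pc1_scores(X):
    return [sum((r[j] - mu[j]) * v[j] for j in range(k)) for r in subset(X)]

train_gap = class_gap(pc1_scores(X_train), y)
test_gap = class_gap(pc1_scores(X_test), y)
print(f"PC1 class gap, training data: {train_gap:.2f}")
print(f"PC1 class gap, held-out noise: {test_gap:.2f}")
```

In the real analysis the test samples have no confirmed labels, so the comparison would be visual: the training plot is expected to separate by construction, and only similar structure in the test plot counts as evidence for the hypothesis.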
Hey @STAT540-UBC/team-badassays, please make sure most of your team members come to seminar tomorrow :) Rob will be there as well! We can discuss your project together.
@rbalshaw @farnushfarhadi
(edit 1, typo removed: 7bd8ad2 )
(edit 2, added precursory exploratory analysis: 0c00794 )