Project progress report for team_Methylation-Badassays #5

Open
nivretta opened this issue Mar 16, 2017 · 3 comments

Comments

@nivretta

nivretta commented Mar 16, 2017

@rbalshaw @farnushfarhadi

@farnushfarhadi

Hi @STAT540-UBC/team-badassays

Thank you for submitting the progress report.

A few comments:

  • I can see that you have separate folders in your repo for the different parts of the analysis. Make sure you keep the repo clean and organized as you add more scripts and files.

  • Overall, your main progress was in the preprocessing part. I understand this took longer than you expected, but you want to make sure that your project is doable within the timeline of the class. The good side is that you are now sure about the quality control part of your project.

  • Since logistic regression is not fully covered in the class, you might need to spend some time learning about it. Implementing CV might be tricky as well, so I suggest that you put as much time as you can into building your classifiers as soon as possible. I can see great analytical power in your group! Make sure you put in enough time daily! Go ahead!

  • I guess it was not clear to you how to write the progress report. The questions in the rubric were meant to give you an idea of what you mainly need to include in the progress report. It is okay that you wrote it this way, but we were mainly asking for separate sections on preprocessing, progress on methodology, and some results. We did not mean for you to give specific answers to each of the questions in the rubric.

  • Thanks for the references.

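  • For logistic regression, you can use the glm function with family = binomial. With this, you model the odds of a sample being Asian, that is, the ratio of the probability of being Asian to the probability of not being Asian (which is Caucasian in your case), with ethnicity as your binary response.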

  • Please make sure you keep yourselves on track so that you keep progressing. Ask your questions, and feel free to ask for a meeting. Rob is at the BC Centre for Disease Control, so you can visit him there.
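
To make the glm point above a bit more concrete, here is a minimal sketch of what a logistic regression fit does. It is written in plain Python rather than R, it is not the glm implementation, and the data are made up for illustration (one hypothetical methylation value per sample, label 1 = Asian, 0 = Caucasian). The fitted linear predictor is the log-odds of being Asian versus not Asian.

```python
import math

def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, steps=20000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed label minus fitted probability
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical toy data, NOT from the project: one methylation value per sample.
beta_values = [0.10, 0.15, 0.20, 0.30, 0.45, 0.55, 0.60, 0.70, 0.80, 0.85]
is_asian = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(beta_values, is_asian)

for x in (0.10, 0.50, 0.85):
    log_odds = b0 + b1 * x       # log( P(Asian) / P(not Asian) )
    p_asian = sigmoid(log_odds)  # back on the probability scale
    print(f"x={x:.2f}  log-odds={log_odds:.2f}  P(Asian)={p_asian:.3f}")
```

The response you pass in is the 0/1 label itself; the odds are what the model estimates from it.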

@rbalshaw your thoughtful comments are very welcome.

Good luck team! :)

@rbalshaw

It looks like you have made good progress getting your data into R, reviewing it for quality, and conducting normalization, etc.

Your plan, laid out in S.1.2, looks pretty solid. A regularized logistic regression model seems a sensible thing to try. Cross-validation as you describe (and as packages like caret should make fairly straightforward) will help you to understand the performance of the model for identifying the Caucasian vs. Asian samples and help reduce overfitting.
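
As a rough sketch of how cross-validation and regularization fit together, the Python below scores a few penalty strengths by k-fold cross-validated accuracy. Everything in it is an assumption for illustration: the data are simulated (one informative feature, four noise features), the penalty is L2 (ridge), which shrinks coefficients but does not select features, and the function names are hypothetical. It is not how caret does it internally; an L1 or elastic-net penalty would be the usual choice if you want feature selection.

```python
import math
import random

def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_ridge_logistic(X, y, lam, lr=0.5, steps=500):
    """Gradient descent on mean log-loss + (lam/2)*||w||^2; the intercept is not penalized."""
    p, n = len(X[0]), len(X)
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(p):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, gw)]
    return w, b

def predict(X, w, b):
    """Predict class 1 when the fitted log-odds are positive."""
    return [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]

def cv_accuracy(X, y, lam, k=5, seed=0):
    """Fraction of samples classified correctly while held out of the fit."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w, b = fit_ridge_logistic([X[i] for i in train], [y[i] for i in train], lam)
        preds = predict([X[i] for i in held_out], w, b)
        correct += sum(pr == y[i] for pr, i in zip(preds, held_out))
    return correct / len(X)

# Simulated data, NOT the project's: 40 samples, feature 0 informative, features 1-4 noise.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(1.5 if yi else -1.5, 0.5)] + [rng.gauss(0.0, 1.0) for _ in range(4)]
     for yi in y]

results = {lam: cv_accuracy(X, y, lam) for lam in (0.001, 0.1, 1.0)}
best_lam = max(results, key=results.get)
print(results, "best penalty:", best_lam)
```

The key design point is that each sample is predicted by a model that never saw it, so the accuracy is an estimate of out-of-sample performance rather than of fit to the training data.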

Your next plan (step 3) is to do unsupervised analyses of these data (PCA), hoping to see that some of the PCs are associated with self-reported ethnicity in the training data. This is a sensible idea, but I tend to think of this type of analysis as a precursor to the logistic regression (a supervised technique). Not a big deal, though. Plotting these PC values for the test data -- where you cannot confirm the ethnicity -- will be very interesting.

I would suggest that you could also do a PCA using only the features selected by the regularized logistic regression. This plot will almost certainly show some differences between the ethnicities in the training data (you should think about this and make sure it's clear why this is so) -- and if you are lucky, and your hypothesis is valid, you may see similar structures when you plot these PCs for the test data.
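
One way to convince yourself of the point about the training data is a small simulation on pure noise, sketched below in Python. All of it is assumed for illustration: the data are random numbers with no real group difference, "top 10 features by class mean difference" is a hypothetical stand-in for the features a regularized logistic regression would keep, and PC1 is found with a simple power iteration rather than a library PCA routine.

```python
import math
import random

rng = random.Random(0)

def noise_matrix(n_samples, n_features):
    """Samples x features matrix of independent N(0, 1) noise."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]

def first_pc(X, iters=200):
    """Leading principal axis of column-centred X, by power iteration on X^T X."""
    p = len(X[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(iters):
        scores = [sum(a * c for a, c in zip(row, v)) for row in X]            # X v
        v = [sum(s * row[j] for s, row in zip(scores, X)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    return v

def class_gap(scores, labels):
    """Absolute difference between the two class means of the scores."""
    g1 = [s for s, l in zip(scores, labels) if l == 1]
    g0 = [s for s, l in zip(scores, labels) if l == 0]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

n_features, n_keep = 500, 10
train_X, train_y = noise_matrix(20, n_features), [i % 2 for i in range(20)]
new_X, new_y = noise_matrix(200, n_features), [i % 2 for i in range(200)]

def mean_diff(j):
    """Class mean difference of feature j in the training data (uses the labels)."""
    a = [row[j] for row, l in zip(train_X, train_y) if l == 1]
    b = [row[j] for row, l in zip(train_X, train_y) if l == 0]
    return sum(a) / len(a) - sum(b) / len(b)

# Hypothetical stand-in for model-based feature selection: keep the most label-associated features.
selected = sorted(range(n_features), key=lambda j: -abs(mean_diff(j)))[:n_keep]

# Centre with training means, then project both data sets onto the training PC1.
mu = {j: sum(row[j] for row in train_X) / len(train_X) for j in selected}
train_sel = [[row[j] - mu[j] for j in selected] for row in train_X]
new_sel = [[row[j] - mu[j] for j in selected] for row in new_X]
pc1 = first_pc(train_sel)

train_gap = class_gap([sum(a * c for a, c in zip(r, pc1)) for r in train_sel], train_y)
new_gap = class_gap([sum(a * c for a, c in zip(r, pc1)) for r in new_sel], new_y)
print(f"PC1 class gap, training data: {train_gap:.2f}; fresh data: {new_gap:.2f}")
```

Because the features were chosen using the training labels, the training PCs separate the groups even when nothing real is there. Seeing similar structure in data that played no part in the selection is the informative part.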

You have a bit of a hurdle to clear with getting your processed data back into R, but that seems like something that we might be able to look at over the phone and with screen sharing (Webex or Skype?).

Please let me know if anyone on the team would like to chat. Best would be to contact me by email: robert.balshaw@bccdc.ca

@farnushfarhadi

Hey @STAT540-UBC/team-badassays

Please make sure most of your team members come to the seminar tomorrow :) Rob will be there as well! We can discuss your project together.
