[Question] Can we discover subtypes in a training set, and use the discovered subtypes to subtype subjects of a test set? #44

Closed
little-nem opened this issue Mar 13, 2023 · 1 comment

Comments

@little-nem

Hi SuStaIn team!

I am trying to use SuStaIn with a train/test-like approach, in which I have two datasets:

  • the training one, which I want to use to infer the summary subtypes (that is to say: subtype 1 is this sequence of abnormalities, subtype 2 is that sequence, etc.). I am not really concerned with the actual subtyping of the individual subjects in this dataset. This is where I would run the run_sustain_algorithm method, if I'm correct.
  • the test one, which I would like to use as follows: given the subtypes discovered on the training set, I want to subtype (and maybe stage, even though I'm less interested in this) these new subjects with respect to those subtypes. Intuitively, going back to the notation of the Young et al. (2018) paper: if (f_c, S_c)_c are the subtypes (and their prevalences) discovered at the training step, then given some new X (the test data), I would want to evaluate P(X | S_c) for each c, and infer the best subtype for each of my new subjects from this mixture of subtypes.
    [image: notation from Young et al. (2018)]

So it seems to me that this makes sense from a methodological point of view (but I could be mistaken 😅).
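
To make the step described above concrete, here is one way to write it out. This is a sketch under stated assumptions, not the library's definition: x_j denotes the data of a single test subject j (one row of X), k indexes the stages of a subtype's sequence, and P(k) is a stage prior, which I assume to be uniform. None of these three symbols appear in the text above.

$$
P(x_j \mid S_c) = \sum_{k} P(k)\, P(x_j \mid S_c, k),
\qquad
P(c \mid x_j) = \frac{f_c \, P(x_j \mid S_c)}{\sum_{c'} f_{c'} \, P(x_j \mid S_{c'})},
\qquad
\hat{c}_j = \arg\max_{c} P(c \mid x_j)
$$

Here f_c and S_c stay fixed at the values inferred on the training set; only x_j comes from the test set.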

Now I can't work out exactly how to perform this last step, given the output of the first one. I went back to the notebook from the workshop (which I followed some time ago), and it looks to me like the cross_validate_sustain_model method presented there mainly focuses on cross-validation metrics, rather than on outputting the subtypes assigned to the "test" subjects.

I am sorry if this is covered somewhere that I have missed. If the question is unclear, don't hesitate to say so; I'm happy to rephrase or go into more detail 🙂

Cheers,
Nemo

@noxtoby
Member

noxtoby commented Mar 14, 2023

Hey Nemo. You should have a look at subtype_and_stage_individuals_newData(). Presumably you're interested in the version for ZscoreSuStaIn.
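
For readers who want to see the principle behind assigning new subjects to previously discovered subtypes, here is a minimal pure-Python sketch. It is not pySuStaIn code and does not reproduce the signature of subtype_and_stage_individuals_newData(), which should be checked in the library source. The function names `subtype_posterior` and `assign_subtype`, the uniform stage prior, and the toy numbers are all assumptions made for illustration. The sketch also uses a single point estimate of the subtypes and ignores any averaging over MCMC samples.

```python
def subtype_posterior(fractions, stage_likelihoods):
    """Posterior probability of each subtype for ONE new subject.

    fractions[c]            -- f_c, subtype prevalence learned on the training set
    stage_likelihoods[c][k] -- P(x | S_c, k), likelihood of the subject's data
                               under subtype c at stage k (hypothetical inputs)
    """
    weighted = []
    for f_c, lik_c in zip(fractions, stage_likelihoods):
        # Marginalise over stages, assuming a uniform prior P(k) = 1 / n_stages.
        marginal = sum(lik_c) / len(lik_c)
        weighted.append(f_c * marginal)

    total = sum(weighted)
    if total == 0:
        raise ValueError("subject has zero likelihood under every subtype")
    # Normalise so the subtype probabilities sum to one.
    return [w / total for w in weighted]


def assign_subtype(fractions, stage_likelihoods):
    """Return (index of the most probable subtype, full posterior)."""
    posterior = subtype_posterior(fractions, stage_likelihoods)
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return best, posterior


# Toy example: two subtypes, three stages each (made-up numbers).
train_fractions = [0.6, 0.4]
test_subject_likelihoods = [
    [0.01, 0.02, 0.03],  # P(x | S_0, k) for k = 0, 1, 2
    [0.10, 0.20, 0.30],  # P(x | S_1, k) for k = 0, 1, 2
]
best, posterior = assign_subtype(train_fractions, test_subject_likelihoods)
print(best, posterior)
```

The key point is that the prevalences and sequences come only from the training set, so the test subjects never influence which subtypes are discovered.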

@noxtoby noxtoby closed this as completed Mar 14, 2023