[Feature Request] train_test_split dataset based on replicate groups #112

Open
chirranjeevigopal-TRI opened this issue Oct 8, 2020 · 0 comments
Labels
enhancement New feature or request

Comments

@chirranjeevigopal-TRI
Contributor

The BeepDataset class has a method for generating train-test splits based on cell_IDs or unique sequence numbers. However, when there are replicate measurements for the same protocol, I would like to ensure that all replicates fall into the same group (train or test).

As a solution, include a method that first groups cells by unique parameter sets, based on the protocol parameter file, and then randomly assigns these groups to train/test at a prescribed level. Ensure the train/test fraction is maintained as closely as possible, even if some parameter groups have significantly more replicates than others.
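
A minimal sketch of what this could look like, in plain Python and independent of the actual BeepDataset API. The function name `group_train_test_split`, the `cell_params` mapping (cell ID to a tuple of protocol parameters) and the greedy assignment rule are all assumptions for illustration, not existing code. Whole groups are shuffled and then added to the test set only when doing so moves the test size closer to the requested fraction, so large replicate groups cannot skew the split too far.

```python
import random
from collections import defaultdict


def group_train_test_split(cell_params, test_fraction=0.2, seed=None):
    """Split cell IDs so that replicates of one protocol stay together.

    cell_params: dict mapping cell_id -> tuple of protocol parameters
                 (hypothetical input; replicates share the same tuple).
    """
    # Step 1: group cell IDs by their unique parameter tuple.
    groups = defaultdict(list)
    for cell_id, params in cell_params.items():
        groups[params].append(cell_id)

    # Step 2: shuffle the groups (sorted first so a seed is reproducible).
    keys = sorted(groups)
    rng = random.Random(seed)
    rng.shuffle(keys)

    # Step 3: assign whole groups, counting cells rather than groups.
    target = test_fraction * len(cell_params)
    train, test = [], []
    for key in keys:
        members = groups[key]
        # Put the group in test only if that brings the test size
        # closer to the target number of cells.
        if abs(len(test) + len(members) - target) < abs(len(test) - target):
            test.extend(members)
        else:
            train.extend(members)
    return sorted(train), sorted(test)


# Example: one protocol with 6 replicates, nine protocols with 2 each.
example = {f"cell_0_{r}": (0,) for r in range(6)}
for p in range(1, 10):
    for r in range(2):
        example[f"cell_{p}_{r}"] = (p,)

train_ids, test_ids = group_train_test_split(example, test_fraction=0.25, seed=0)
```

Because groups are kept intact, the achieved fraction is only approximate; the greedy rule bounds the deviation by roughly the size of one group.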
