[Feature Request] train_test_split dataset based on replicate groups #112

chirranjeevigopal-TRI · 2020-10-08T22:23:38Z

The BeepDataset class has a method for generating train-test splits based on cell_IDs or unique sequence numbers. However, if there are replicate measurements for the same protocol, I would like to ensure that all replicates fall in the same group (train or test).

As a solution, include a method that first groups cells by unique parameter sets based on the protocol parameter file, and then randomly assigns these groups to train or test at a prescribed ratio. Ensure the train/test fraction is maintained at the level of individual cells, even if some parameter groups have significantly more replicates than others.
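To make the request concrete, here is a minimal sketch of the grouping-then-splitting idea in plain Python. It is not part of BeepDataset: the function name `replicate_group_split`, the `param_keys` argument, and the list-of-dicts record layout are all hypothetical and only illustrate the behavior. It shuffles the parameter groups and greedily adds a group to the test set only when doing so brings the test sample count closer to the requested fraction, so large replicate groups do not skew the ratio.

```python
import random
from collections import defaultdict


def replicate_group_split(records, param_keys, test_fraction=0.2, seed=None):
    """Split record indices so replicates with identical parameters stay together.

    Hypothetical helper: `records` is a list of dicts and `param_keys` names
    the protocol parameters that define a replicate group.
    """
    # Group record indices by their unique parameter combination.
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        key = tuple(rec[k] for k in param_keys)
        groups[key].append(idx)

    # Randomize the order in which groups are considered.
    group_list = list(groups.values())
    rng = random.Random(seed)
    rng.shuffle(group_list)

    # Target number of test samples (not groups), so the fraction is
    # measured per cell even when group sizes differ.
    target = test_fraction * len(records)
    train, test = [], []
    for members in group_list:
        # Put the group in test only if that moves the test size closer to target.
        if abs(len(test) + len(members) - target) < abs(len(test) - target):
            test.extend(members)
        else:
            train.extend(members)
    return sorted(train), sorted(test)


# Usage with made-up protocol parameters: one group has four replicates,
# two groups have two, and two groups have one.
records = (
    [{"charge_rate": 1.0, "cutoff_v": 4.2}] * 4
    + [{"charge_rate": 2.0, "cutoff_v": 4.2}] * 2
    + [{"charge_rate": 3.0, "cutoff_v": 4.1}] * 2
    + [{"charge_rate": 4.0, "cutoff_v": 4.1}]
    + [{"charge_rate": 5.0, "cutoff_v": 4.0}]
)
train_idx, test_idx = replicate_group_split(
    records, ["charge_rate", "cutoff_v"], test_fraction=0.2, seed=0
)
```

A single greedy pass like this only approximates the requested fraction; when groups are few or very uneven, the achieved ratio can differ from the target, so an actual implementation may want to report the realized split.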

chirranjeevigopal-TRI added the enhancement New feature or request label Oct 8, 2020

chirranjeevigopal-TRI assigned chirranjeevigopal-TRI and unassigned chirranjeevigopal-TRI Oct 8, 2020
