Support for distributed feature extraction / training #15

Open
3 tasks
pierotofy opened this issue Apr 7, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

pierotofy (Member) commented Apr 7, 2023

  • Modify pctrain by adding a --extract-features <path>.opcfeat.bin parameter. When it is set, execution should stop at https://github.com/uav4geo/OpenPointClass/blob/main/randomforest.cpp#L30 and https://github.com/uav4geo/OpenPointClass/blob/main/gbm.cpp#L45
  • Serialize the required vectors. For RF these are gt and ft; GBT populates its structures similarly, although not identically. It might also be possible to use a single serialization format for both RF and GBT by adding a new function that only does the serialization (like train, but stopping after the features are created). Consider encoding the scale, radius, treeDepth, etc. parameters in the serialized output, so that they do not have to be repeated and so that serialized outputs can be validated against each other: the parameters of all serialized outputs from different processes must match.
  • Modify pctrain to check for the .opcfeat.bin extension on input files. If all input files are .opcfeat.bin, read the features directly instead of computing them, by adapting the rf::train and gbt::train functions. If the scale, radius, etc. parameters have been serialized, they can be read from the serialized files instead of being passed manually.
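
The three tasks above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the FeatureParams and FeatureSet structs, the file layout, and the function names (hasFeatExtension, saveFeatures, loadFeatures, loadAll) are all hypothetical, and gt/ft are assumed to be flat vectors with a fixed number of floats per point. The layout is written in host byte order, so files would only be portable between machines of the same endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parameter header; field names are illustrative only.
struct FeatureParams {
    std::int32_t scales = 0;
    double radius = 0.0;
    std::int32_t treeDepth = 0;
    std::uint32_t numFeatures = 0; // floats stored per point
    bool operator==(const FeatureParams &o) const {
        return scales == o.scales && radius == o.radius &&
               treeDepth == o.treeDepth && numFeatures == o.numFeatures;
    }
};

struct FeatureSet {
    FeatureParams params;
    std::vector<std::int32_t> gt; // one label per point (assumed layout)
    std::vector<float> ft;        // numFeatures values per point (assumed layout)
};

// True when the path ends in ".opcfeat.bin".
bool hasFeatExtension(const std::string &path) {
    const std::string ext = ".opcfeat.bin";
    return path.size() >= ext.size() &&
           path.compare(path.size() - ext.size(), ext.size(), ext) == 0;
}

template <typename T> void writeRaw(std::ostream &os, const T *p, std::size_t n) {
    os.write(reinterpret_cast<const char *>(p), static_cast<std::streamsize>(n * sizeof(T)));
}

template <typename T> void readRaw(std::istream &is, T *p, std::size_t n) {
    is.read(reinterpret_cast<char *>(p), static_cast<std::streamsize>(n * sizeof(T)));
    if (!is) throw std::runtime_error("truncated feature file");
}

// Write the parameters first, then the point count, labels and features.
void saveFeatures(const std::string &path, const FeatureSet &fs) {
    assert(fs.ft.size() == fs.gt.size() * fs.params.numFeatures);
    std::ofstream os(path, std::ios::binary);
    if (!os) throw std::runtime_error("cannot write " + path);
    const std::uint64_t count = fs.gt.size();
    writeRaw(os, &fs.params.scales, 1);
    writeRaw(os, &fs.params.radius, 1);
    writeRaw(os, &fs.params.treeDepth, 1);
    writeRaw(os, &fs.params.numFeatures, 1);
    writeRaw(os, &count, 1);
    writeRaw(os, fs.gt.data(), fs.gt.size());
    writeRaw(os, fs.ft.data(), fs.ft.size());
}

// Read a file back in the same field order used by saveFeatures.
FeatureSet loadFeatures(const std::string &path) {
    std::ifstream is(path, std::ios::binary);
    if (!is) throw std::runtime_error("cannot read " + path);
    FeatureSet fs;
    std::uint64_t count = 0;
    readRaw(is, &fs.params.scales, 1);
    readRaw(is, &fs.params.radius, 1);
    readRaw(is, &fs.params.treeDepth, 1);
    readRaw(is, &fs.params.numFeatures, 1);
    readRaw(is, &count, 1);
    fs.gt.resize(count);
    fs.ft.resize(count * fs.params.numFeatures);
    readRaw(is, fs.gt.data(), fs.gt.size());
    readRaw(is, fs.ft.data(), fs.ft.size());
    return fs;
}

// Concatenate several files, rejecting any whose parameters differ from the first.
FeatureSet loadAll(const std::vector<std::string> &paths) {
    FeatureSet all;
    for (std::size_t i = 0; i < paths.size(); ++i) {
        if (!hasFeatExtension(paths[i])) throw std::runtime_error("not a feature file: " + paths[i]);
        FeatureSet fs = loadFeatures(paths[i]);
        if (i == 0) all.params = fs.params;
        else if (!(fs.params == all.params)) throw std::runtime_error("parameter mismatch: " + paths[i]);
        all.gt.insert(all.gt.end(), fs.gt.begin(), fs.gt.end());
        all.ft.insert(all.ft.end(), fs.ft.begin(), fs.ft.end());
    }
    return all;
}
```

In this sketch each worker process would call saveFeatures on its own point cloud, and the training process would call loadAll on the collected files and then hand the merged gt and ft to the existing training step. Storing the parameters in every file is what lets loadAll detect mismatched runs without the user passing them again.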
pierotofy added the enhancement label Apr 7, 2023

hobu commented Apr 18, 2023

Implemented over here but only for rf, and it isn't so convenient to use quite yet. It seems to work ok though.


Ty4Code commented Jan 27, 2024

> Implemented over here but only for rf, and it isn't so convenient to use quite yet. It seems to work ok though.

This is awesome stuff, thanks for sharing!

I'm considering adapting it to save the files in something more Python/Pandas friendly, like CSV. If anyone might find that useful, let me know; otherwise I'll just use it for my own training.
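
A CSV export along those lines might look like the sketch below. It is not taken from the linked implementation: the function names writeFeaturesCsv and featuresToCsv, the column names (label, f0, f1, ...), and the flat gt/ft layout with numFeatures floats per point are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Write one header row, then one row per point: label followed by its features.
// Assumes ft holds numFeatures consecutive floats for each entry in gt.
void writeFeaturesCsv(std::ostream &os, const std::vector<int> &gt,
                      const std::vector<float> &ft, std::size_t numFeatures) {
    assert(ft.size() == gt.size() * numFeatures);
    // Use enough digits for floats to survive a text round trip.
    const std::streamsize oldPrecision =
        os.precision(std::numeric_limits<float>::max_digits10);
    os << "label";
    for (std::size_t j = 0; j < numFeatures; ++j) os << ",f" << j;
    os << '\n';
    for (std::size_t i = 0; i < gt.size(); ++i) {
        os << gt[i];
        for (std::size_t j = 0; j < numFeatures; ++j) os << ',' << ft[i * numFeatures + j];
        os << '\n';
    }
    os.precision(oldPrecision); // leave the caller's stream settings unchanged
}

// Convenience wrapper that returns the CSV text as a string.
std::string featuresToCsv(const std::vector<int> &gt, const std::vector<float> &ft,
                          std::size_t numFeatures) {
    std::ostringstream out;
    writeFeaturesCsv(out, gt, ft, numFeatures);
    return out.str();
}
```

A file written this way should load with pandas.read_csv without extra options. CSV will be considerably larger and slower to parse than the binary format, so it is probably best suited to inspection and small experiments.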
