-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binary self organizing map #53
Comments
Are you asking for an example on how to use BM? Thank you! |
yes, exactly, on the use case of a Self Organizing Map of binary vectors. Usually, SOM are used on integers or float vectors as you can see on these to pages : In my case I need to do this kind of classification on sparse binary vectors. BitMagic seems to be a very very good candidate for that. The naïve SOM algorithm is very simple :
The map will progressively evolved and finally each vector to classify will have a best location identified and all similar vectors in the ~ 10 millions will also be associated to locations nearby on the map. Now imaging you have ~ 10 millions of extremely sparse binary vectors (same size ~ 1Mbits) and you want to place them on a 2D map of 128 x 128 locations. As you can see, a lot of vectors will be loaded in RAM, a lot of BV comparisons will be performed, a lot of hamming distance will be calculated, a lot of bit modification will be done, so the question is how to correctly use BitMagic for best / maximum speed ? (32/64/128Go RAM available) https://www.semanticscholar.org/paper/An-alternative-approach-for-binary-and-categorical-Santana-Morais/f1f079640218b72822911ef3dc7394765703d62a/figure/0 |
Yes, i understand the problem. It creates binary fingerprints and implements primitive (but relatively fast) clusterization using similarity measure (COUNT_AND). Not quite well documented yet, but the code should be instructive as a use case for how to compute binary distances. I am planning to create a separate example on how to do a more efficient distance computations for Hamming, Jacquard distances using cache locking and SIMD. Current distance compute is fast but can be even faster actually to facilitate binary SOMs. |
Awesome, many thanks for the hint. |
xsample07a is very interesting, although it's difficult to understand exactly why you've choosen some technics. never mind. this project is a side project for me. once I have written something (in few weeks) would you mind my showing to you ? in order to have your advices concrete bitmagic usage. |
Surest thing. I would certainly appreciate your example, please share it. |
Hi,
First I want to thank you for this awesome library.
I'm experimenting on binary NLP technics and I was wondering if you could publish a bitmagic version of SOM (self organizing map) using about ~ 10 millions of extremely sparse (< 5 / 100 000) binary vectors of same size ~ 1Mbits. The map could be for instance 128 x 128 locations.
It could be a huge step froward for me to see a correct use of bitmagic on this particular subject.
Thanks for your help.
The text was updated successfully, but these errors were encountered: