Applying the bloom filter for HASH160 addresses #55

oguzdelioglu · 2021-03-20T09:20:48Z

Example Address List.

0af80f98e511577e5c87d56069896c70fe4a6263
0af81021340c23bb20429e47c3bdd62137c322d3
0af811994346cdca9d6dbe8e7da4f3393130fc71
0af81250856d377cf7f6914ebe196a9570ada313
0af814b46c4ef5c57dbc53ae03c812eea37d4aac
0af816bb57c0adac8bca0401d295724cd80b0956
64dfb83dcca1f516d2bac93c8c9e4eb6513fd364
64dfb8c23802f3aa67c9078c0e7970fa17584892
64dfb8c5d21743ba3dd8cef5b576e1ddf6d4c239
64dfb931e9d68aa68842efa3c5d0bc3c1101b5ec
64dfba34a39823c273a33fadf49112c063ec3724

It gives quite a lot of positive and false results and for this reason I cannot use it.
I want to use bloom filter to filter about 25 million hash160 addresses.
How can I use the settings more efficiently?

I want to use it for this project.
https://github.com/oguzdelioglu/odelbrain

lemire · 2021-05-13T15:20:02Z

Can you please run the following code...

    filter := bloom.NewWithEstimates(25000000, 0.01) 
    filter.Add([]byte("0af80f98e511577e5c87d56069896c70fe4a6263"))
    filter.Add([]byte("0af81021340c23bb20429e47c3bdd62137c322d3"))
    filter.Add([]byte("0af811994346cdca9d6dbe8e7da4f3393130fc71"))
....

You should get very close to a 1% false-positive rate. If you do not, please report back on what you do get.

lemire · 2021-05-17T13:14:50Z

I am going to close this. Please run the suggested code above or, at least, provide a full code sample so we can verify the false-positive issue.

Note that when creating the Bloom filter, you need to know both the capacity and the desired false-positive rate. This is not a limitation of this library, it is how the Bloom filter works.

If you keep adding entries to the same Bloom filter beyond the initial capacity, the false-positive rate will go up.

oguzdelioglu · 2021-09-08T01:33:15Z

boom.NewBloomFilter(uint(addressCount), 0.01)

addressCount is total wallets (Count:24605923)

When I do 0.01 I get an exorbitantly high number of false positives.

I generated random wallets = 5229707
Found false positives = 53393

lemire · 2021-09-08T03:02:38Z

@oguzdelioglu If you find that bloom.NewWithEstimates does not work, please provide fully reproducing code sample. Create a new issue.

oguzdelioglu · 2021-09-08T07:45:45Z

@oguzdelioglu If you find that bloom.NewWithEstimates does not work, please provide fully reproducing code sample. Create a new issue.

https://github.com/oguzdelioglu/odelbrain/blob/master/main.go

Total Wallet: 24605923
Elapsed Time = 1m0.3868661s
Generated Wallet = 6867766
Generate Speed Avg(s) = 114462
Found = 69115

It creates 70,000 false positives in approximately 7 million inquiries.

oguzdelioglu · 2021-09-08T11:07:30Z

I using now false positive 0.0000001 and not bad now.But its maybe more beautiful

lemire · 2021-09-08T15:16:42Z

@oguzdelioglu Your code is a bit complicated. Can you trim it to only as few lines as possible. There should be no need for goroutines, for example.

oguzdelioglu · 2021-09-08T19:55:15Z

@oguzdelioglu Your code is a bit complicated. Can you trim it to only as few lines as possible. There should be no need for goroutines, for example.

I use goroutine to do faster brute-forcing with more cores.

Doesn't it work well with Bloomfilter goroutine?

I first define it as follows, then I add the addresses, then I check.

//Settings
sbf = boom.NewWithEstimates(uint(addressCount), 0.0000001)

//Insert Address to bloom
for _, address := range addressList {
		sbf.Add([]byte(address))
}

//Check Wallet is exist in bloom
if sbf.Test([]byte(randomWallet.base58BitcoinAddress)) {
	SaveWallet(randomWallet)
	totalFound++
}

lemire · 2021-09-09T00:50:54Z

Please run the following and tell me what float64(count)/1000.0 is at end of the routine.

	f := NewWithEstimates(1000, 0.001)
	for i := uint32(0); i < 1000; i++ {
		n := make([]byte, 4)
		binary.BigEndian.PutUint32(n, i)
		f.Add(n)
	}
	count := 0

	for  i := uint32(0); i < 1000; i++ {
		n := make([]byte, 4)
		binary.BigEndian.PutUint32(n, i + 1000)
		if f.Test(n) {
			count += 1
		}
	}

This adds the keys from 0 to 1000 and the it checks that the keys from 1000 to 2000 are in the set. It is a measure of the fpp.

If you believe that there is a bug, then you need to present such a simple example.

I cannot go through your own code.

The first expectation you should have is that your code is at fault. This library is used by hundreds of people for years. So if you think that there is a bug in the library (and not in your code), you have to come up with a carefully crafted reproducible test case. Do not submit a large blog of complicated code. If you do, then my assumption is that the bug is in your code.

oguzdelioglu · 2021-09-11T02:02:53Z

Please run the following and tell me what float64(count)/1000.0 is at end of the routine.
	f := NewWithEstimates(1000, 0.001)
	for i := uint32(0); i < 1000; i++ {
		n := make([]byte, 4)
		binary.BigEndian.PutUint32(n, i)
		f.Add(n)
	}
	count := 0

	for  i := uint32(0); i < 1000; i++ {
		n := make([]byte, 4)
		binary.BigEndian.PutUint32(n, i + 1000)
		if f.Test(n) {
			count += 1
		}
	}
This adds the keys from 0 to 1000 and the it checks that the keys from 1000 to 2000 are in the set. It is a measure of the fpp.

If you believe that there is a bug, then you need to present such a simple example.

I cannot go through your own code.

The first expectation you should have is that your code is at fault. This library is used by hundreds of people for years. So if you think that there is a bug in the library (and not in your code), you have to come up with a carefully crafted reproducible test case. Do not submit a large blog of complicated code. If you do, then my assumption is that the bug is in your code.

10 billion test 2000 false positive now.
I set fp rate = 0.0000001

I was able to minimize false positives with this setting.

I don't think we can reduce it even more :)

lemire · 2021-09-12T00:45:58Z

I expect that you might be misusing the library. The fp rate that you specify should match very closely the fp that you measure empirically. If not, please build a simple example as I suggested above to demonstrate the issue.

The fp rate that you specify is unlikely to be practical. It is very low.

lemire closed this as completed May 17, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Applying the bloom filter for HASH160 addresses #55

Applying the bloom filter for HASH160 addresses #55

oguzdelioglu commented Mar 20, 2021 •

edited

Loading

lemire commented May 13, 2021

lemire commented May 17, 2021

oguzdelioglu commented Sep 8, 2021

lemire commented Sep 8, 2021

oguzdelioglu commented Sep 8, 2021

oguzdelioglu commented Sep 8, 2021

lemire commented Sep 8, 2021

oguzdelioglu commented Sep 8, 2021

lemire commented Sep 9, 2021

oguzdelioglu commented Sep 11, 2021

lemire commented Sep 12, 2021

Applying the bloom filter for HASH160 addresses #55

Applying the bloom filter for HASH160 addresses #55

Comments

oguzdelioglu commented Mar 20, 2021 • edited Loading

lemire commented May 13, 2021

lemire commented May 17, 2021

oguzdelioglu commented Sep 8, 2021

lemire commented Sep 8, 2021

oguzdelioglu commented Sep 8, 2021

oguzdelioglu commented Sep 8, 2021

lemire commented Sep 8, 2021

oguzdelioglu commented Sep 8, 2021

lemire commented Sep 9, 2021

oguzdelioglu commented Sep 11, 2021

lemire commented Sep 12, 2021

oguzdelioglu commented Mar 20, 2021 •

edited

Loading