Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Applying the bloom filter for HASH160 addresses #55

Closed
oguzdelioglu opened this issue Mar 20, 2021 · 11 comments
Closed

Applying the bloom filter for HASH160 addresses #55

oguzdelioglu opened this issue Mar 20, 2021 · 11 comments

Comments

@oguzdelioglu
Copy link

oguzdelioglu commented Mar 20, 2021

Example Address List.

0af80f98e511577e5c87d56069896c70fe4a6263
0af81021340c23bb20429e47c3bdd62137c322d3
0af811994346cdca9d6dbe8e7da4f3393130fc71
0af81250856d377cf7f6914ebe196a9570ada313
0af814b46c4ef5c57dbc53ae03c812eea37d4aac
0af816bb57c0adac8bca0401d295724cd80b0956
64dfb83dcca1f516d2bac93c8c9e4eb6513fd364
64dfb8c23802f3aa67c9078c0e7970fa17584892
64dfb8c5d21743ba3dd8cef5b576e1ddf6d4c239
64dfb931e9d68aa68842efa3c5d0bc3c1101b5ec
64dfba34a39823c273a33fadf49112c063ec3724

It gives quite a lot of positive and false results and for this reason I cannot use it.
I want to use bloom filter to filter about 25 million hash160 addresses.
How can I use the settings more efficiently?

I want to use it for this project.
https://github.com/oguzdelioglu/odelbrain

@lemire
Copy link
Member

lemire commented May 13, 2021

Can you please run the following code...

    filter := bloom.NewWithEstimates(25000000, 0.01) 
    filter.Add([]byte("0af80f98e511577e5c87d56069896c70fe4a6263"))
    filter.Add([]byte("0af81021340c23bb20429e47c3bdd62137c322d3"))
    filter.Add([]byte("0af811994346cdca9d6dbe8e7da4f3393130fc71"))
....

You should get very close to a 1% false-positive rate. If you do not, please report back on what you do get.

@lemire
Copy link
Member

lemire commented May 17, 2021

I am going to close this. Please run the suggested code above or, at least, provide a full code sample so we can verify the false-positive issue.

Note that when creating the Bloom filter, you need to know both the capacity and the desired false-positive rate. This is not a limitation of this library, it is how the Bloom filter works.

If you keep adding entries to the same Bloom filter beyond the initial capacity, the false-positive rate will go up.

@lemire lemire closed this as completed May 17, 2021
@oguzdelioglu
Copy link
Author

boom.NewBloomFilter(uint(addressCount), 0.01)

addressCount is total wallets (Count:24605923)

When I do 0.01 I get an exorbitantly high number of false positives.

I generated random wallets = 5229707
Found false positives = 53393

@lemire
Copy link
Member

lemire commented Sep 8, 2021

@oguzdelioglu If you find that bloom.NewWithEstimates does not work, please provide fully reproducing code sample. Create a new issue.

@oguzdelioglu
Copy link
Author

@oguzdelioglu If you find that bloom.NewWithEstimates does not work, please provide fully reproducing code sample. Create a new issue.

https://github.com/oguzdelioglu/odelbrain/blob/master/main.go

Total Wallet: 24605923
Elapsed Time = 1m0.3868661s
Generated Wallet = 6867766
Generate Speed Avg(s) = 114462
Found = 69115

It creates 70,000 false positives in approximately 7 million inquiries.

@oguzdelioglu
Copy link
Author

I using now false positive 0.0000001 and not bad now.But its maybe more beautiful

@lemire
Copy link
Member

lemire commented Sep 8, 2021

@oguzdelioglu Your code is a bit complicated. Can you trim it to only as few lines as possible. There should be no need for goroutines, for example.

@oguzdelioglu
Copy link
Author

@oguzdelioglu Your code is a bit complicated. Can you trim it to only as few lines as possible. There should be no need for goroutines, for example.

I use goroutine to do faster brute-forcing with more cores.

Doesn't it work well with Bloomfilter goroutine?

I first define it as follows, then I add the addresses, then I check.

//Settings
sbf = boom.NewWithEstimates(uint(addressCount), 0.0000001)

//Insert Address to bloom
for _, address := range addressList {
		sbf.Add([]byte(address))
}

//Check Wallet is exist in bloom
if sbf.Test([]byte(randomWallet.base58BitcoinAddress)) {
	SaveWallet(randomWallet)
	totalFound++
}

@lemire
Copy link
Member

lemire commented Sep 9, 2021

Please run the following and tell me what float64(count)/1000.0 is at end of the routine.

	f := NewWithEstimates(1000, 0.001)
	for i := uint32(0); i < 1000; i++ {
		n := make([]byte, 4)
		binary.BigEndian.PutUint32(n, i)
		f.Add(n)
	}
	count := 0

	for  i := uint32(0); i < 1000; i++ {
		n := make([]byte, 4)
		binary.BigEndian.PutUint32(n, i + 1000)
		if f.Test(n) {
			count += 1
		}
	}

This adds the keys from 0 to 1000 and the it checks that the keys from 1000 to 2000 are in the set. It is a measure of the fpp.

If you believe that there is a bug, then you need to present such a simple example.

I cannot go through your own code.

The first expectation you should have is that your code is at fault. This library is used by hundreds of people for years. So if you think that there is a bug in the library (and not in your code), you have to come up with a carefully crafted reproducible test case. Do not submit a large blog of complicated code. If you do, then my assumption is that the bug is in your code.

@oguzdelioglu
Copy link
Author

Please run the following and tell me what float64(count)/1000.0 is at end of the routine.

	f := NewWithEstimates(1000, 0.001)
	for i := uint32(0); i < 1000; i++ {
		n := make([]byte, 4)
		binary.BigEndian.PutUint32(n, i)
		f.Add(n)
	}
	count := 0

	for  i := uint32(0); i < 1000; i++ {
		n := make([]byte, 4)
		binary.BigEndian.PutUint32(n, i + 1000)
		if f.Test(n) {
			count += 1
		}
	}

This adds the keys from 0 to 1000 and the it checks that the keys from 1000 to 2000 are in the set. It is a measure of the fpp.

If you believe that there is a bug, then you need to present such a simple example.

I cannot go through your own code.

The first expectation you should have is that your code is at fault. This library is used by hundreds of people for years. So if you think that there is a bug in the library (and not in your code), you have to come up with a carefully crafted reproducible test case. Do not submit a large blog of complicated code. If you do, then my assumption is that the bug is in your code.

10 billion test 2000 false positive now.
I set fp rate = 0.0000001

I was able to minimize false positives with this setting.

I don't think we can reduce it even more :)

@lemire
Copy link
Member

lemire commented Sep 12, 2021

I expect that you might be misusing the library. The fp rate that you specify should match very closely the fp that you measure empirically. If not, please build a simple example as I suggested above to demonstrate the issue.

The fp rate that you specify is unlikely to be practical. It is very low.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants