Coin selection improvement #10096
base: master
Conversation
8709686 to 54f3cd4
Thanks for this! This is an important PR. Can I get someone to review things before I do? Maybe @turbolay? You're pretty talented at understanding the most complex problems, and this is one of the most important parts of the code to get familiar with.
Ty for the details on the context you provided in this PR, they were really useful. Review/tl;dr: cACK. This PR is an adjustment of an existing concept that tries to mitigate the problem of big If we change the
Reformulation
I will try to reformulate the problem and the purpose of this PR to be sure I and everyone else get it: If an output is: Formula choice happens here (I inverted the if in my short explanation above so that it's easier to understand):
WalletWasabi/WalletWasabi/Blockchain/Analysis/BlockchainAnalyzer.cs, lines 141 to 144 in a695f7d
This PR tries to be more precise on whether or not we should recursively remove an input from the selection if one selected coin can potentially have a big negative impact on the
The formula implemented in this PR, based on the generalized weighted mean, gives a score to the selection regarding the potential negative impact on
The parameters are for adjusting the balance between the weight of the amount (
Go further proposal
Intuitively, a different approach than your PR (not mutually exclusive) would be for the
For example, this TX: https://mempool.space/tx/3e3fbad73bbca60615b5fe1724c94a675192b10f468aa58d728303e5d19b6eef#flow=&vin=0 Instead of one This is decided here: WalletWasabi/WalletWasabi/WabiSabi/Client/AmountDecomposer.cs Lines 296 to 298 in 5660501
My proposition is: change
var coinsToSelectFrom = Enumerable
	.Empty<SmartCoin>()
	.Append(bigCoinWithSmallAnonymity1)
	.Append(bigCoinWithSmallAnonymity2)
	.Append(smallCoinWithBigAnonymity)
	.ToList();
An alternative would be:
List<SmartCoin> coinsToSelectFrom = new() { bigCoinWithSmallAnonymity1, bigCoinWithSmallAnonymity2, smallCoinWithBigAnonymity };
or
List<SmartCoin> coinsToSelectFrom = new()
{
	bigCoinWithSmallAnonymity1,
	bigCoinWithSmallAnonymity2,
	smallCoinWithBigAnonymity
};
Assert.False(coins.Contains(bigCoinWithSmallAnonymity1) && coins.Contains(smallCoinWithBigAnonymity));
Assert.False(coins.Contains(bigCoinWithSmallAnonymity2) && coins.Contains(smallCoinWithBigAnonymity));
These two checks make me think that the asserts are more general because we use a random number generator, but the RNG actually returns a fixed number. Would it make sense to run this test, e.g., 100 times with a random RNG seed (i.e. make it random instead of deterministic) so that the test can actually fail if something is not as expected?
Yes, it would make sense. And I suppose the same holds true for the other tests in this file. For this reason, I suggest addressing it in a separate issue or pull request.
public void DoNotSelectCoinsWithBigAnonymityLoss()
{
	// This test ensures that we do not select coins whose anonymity could be lowered a lot
	const int AnonymitySet = 10;
Would it make sense to switch from [Fact] to [Theory] with multiple AnonymitySet values to make the test more robust?
Yes, it would make sense. Nevertheless, it is not easy to generate such test vectors, because the coin selection algorithm is very complex and doesn't behave deterministically.
WalletWasabi.Tests/UnitTests/WabiSabi/Client/CoinJoinCoinSelectionTests.cs (outdated)
// * GeneralizedWeightedAverage(source, value, weight, p) goes to Max(source, value) as p goes to infinity
// * GeneralizedWeightedAverage(source, value, weight, p) goes to Min(source, value) as p goes to minus infinity
// * GeneralizedWeightedAverage(source, value, weight, p) <= GeneralizedWeightedAverage(source, value, weight, q) provided p < q
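The three properties above are easy to check numerically. The sketch below uses a hypothetical two-value weighted power mean (the actual C# signature and weighting of `GeneralizedWeightedAverage` may differ; this is an assumption for illustration only), written in Python for convenience:

```python
def generalized_weighted_average(source: float, value: float, weight: float, p: float) -> float:
    # Hypothetical weighted power mean of two non-negative numbers:
    # `value` enters with the given weight, `source` with weight 1.
    # This is an assumed rendering, not the real WalletWasabi API.
    num = source ** p + weight * value ** p
    den = 1.0 + weight
    return (num / den) ** (1.0 / p)

# Large positive p pulls the mean toward Max(source, value) = 8 ...
high = generalized_weighted_average(2.0, 8.0, 0.5, 100.0)
# ... large negative p pulls it toward Min(source, value) = 2 ...
low = generalized_weighted_average(2.0, 8.0, 0.5, -100.0)
# ... and the mean is non-decreasing in p.
mid1 = generalized_weighted_average(2.0, 8.0, 0.5, 1.0)
mid2 = generalized_weighted_average(2.0, 8.0, 0.5, 2.0)
print(high, low, mid1 <= mid2)
```

Note that for `p == 0` the expression `1.0 / p` is undefined, which is exactly the edge case the review comment below asks to guard against.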
I would check the parameters more:
if (p == 0)
{
	throw new ArgumentException("Non-zero value is expected.", nameof(p));
}
If this check is not added, it will crash later because 1 / 0 is not defined (AFAIK).
Done in 3cd7ea2.
The changes look great. I would like to re-review this PR in the afternoon.
LGTM. Very nice description.
Another pair of eyes for reviewing would certainly be useful, as this anon stuff is in general non-trivial.
Unless other comments arrive, I am merging this a week from now.
Co-authored-by: nopara73 <adam.ficsor73@gmail.com>
Co-authored-by: Kimi <58662979+kiminuo@users.noreply.github.com>
@turbolay asked me to hold on with the merge, so I won't be merging this for now.
The reason is that it seems this PR, even if the concept makes total sense, tends to create more coins for the user than master, which results in no gain on mining fees/time, even if the PR reduces the
@turbolay any news here?
I didn't test more after my previous statement because of prioritization.
-> I fear an unforeseeable consequence, something bad we do not see coming. Yet, I agree and understand that it can also tremendously improve the current consolidation and privacy loss issues.
No need for concern, my intention in commenting wasn't to press you. I simply wanted to ensure that the matter isn't overlooked.
This has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
We basically ended up implementing what I explained in my comment:
Now that the work on the AmountDecomposer is finished, it would be interesting to reconsider this PR. At first glance, I believe that its impact will now be minimal, but I think the PR still makes sense.
This has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
TLDR
This pull request improves the coin selection algorithm. The goal was to reduce the probability of anonymity loss that can occur when the algorithm selects coins whose anonymity scores are too far from each other.
Problem statement
The anonymity score of a transaction output is computed as the anonymity inherited from the inputs of the transaction belonging to the same person as the output, plus the anonymity gained from the transaction itself. The inherited anonymity is either
Usually the first formula is used. Nevertheless, if there is a suspicion that the coinjoin sudoku problem could be solvable for the output, then the second formula is used. The problem with this second case is that the (weighted) average anonymity score of the outputs can be smaller than the (weighted) average anonymity score of the inputs.
Let's say, for example, a user registers the following inputs:
The average anonymity score of the inputs weighted by their values is (4 * 0.4 + 5 * 0.5 + 9 * 0.1) / (0.4 + 0.5 + 0.1) = 5. Let's say the user registered one output with the value of 1 BTC. Suppose there is a suspicion the coinjoin sudoku problem could be solvable for the output, therefore the second formula is used. Suppose there is no other output with the value of 1 BTC, which means the anonymity gain from the transaction is zero. The anonymity score of the output would be 4, which is a decrease in the average anonymity score. From the anonymity point of view, it would be better for the user not to participate in the transaction at all. Moreover, the user sees that they had a utxo whose anonymity score was reduced from 9 to 4.
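The arithmetic above is easy to verify with a short sketch (Python here, purely for illustration; the wallet code itself is C#):

```python
# Inputs from the example: (anonymity score, value in BTC)
inputs = [(4, 0.4), (5, 0.5), (9, 0.1)]

# Value-weighted average anonymity score of the inputs
weighted_avg = sum(a * v for a, v in inputs) / sum(v for _, v in inputs)
print(round(weighted_avg, 6))  # 5.0

# If the coinjoin sudoku suspicion triggers the second formula, the output
# inherits the minimum input anonymity; with zero gain from the transaction,
# the 1 BTC output ends up with a score of 4.
output_score = min(a for a, _ in inputs)
print(output_score)  # 4
```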
I agree this is a kind of extreme example, but it can nevertheless occur when you register far more coins (in terms of the total amount) than your co-participants.
In e1773b5, we extended the coin selection algorithm with a piece of code that removes coins from the final set of coins until the worst expected anonymity loss is less than a certain value. The worst expected anonymity loss is computed as the average difference between a coin's anonymity and the minimum anonymity in the selection, weighted by the value of the coin.
In our experience, it doesn't seem to be enough. The worst expected anonymity loss in the example above would be ((4 - 4) * 0.4 + (5 - 4) * 0.5 + (9 - 4) * 0.1) / (0.4 + 0.5 + 0.1) = 1.
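That computation can be sketched as follows (a hypothetical Python rendering of the heuristic described above, not the actual C# code):

```python
# Worst expected anonymity loss: value-weighted average distance between
# each coin's anonymity and the minimum anonymity in the selection.
coins = [(4, 0.4), (5, 0.5), (9, 0.1)]  # (anonymity score, value in BTC)
a_min = min(a for a, _ in coins)
loss = sum((a - a_min) * v for a, v in coins) / sum(v for _, v in coins)
print(round(loss, 6))  # 1.0
```

A loss of 1 looks small even though one coin stands 5 above the minimum, which illustrates why the plain weighted average is not a strict enough criterion.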
Proposed solution
In my opinion, the formula for the anonymity loss should be modified in the following way:
I developed a formula that combines both ideas. The formula is based on the generalized weighted mean. Let's say we have coins with anonymity scores a1, ..., an and values v1, ..., vn. Let a = min(a1, ..., an). The anonymity loss is computed as follows:
(((a1 - a)^p * v1^q + ... + (an - a)^p * vn^q) / (v1^q + ... + vn^q))^(1/p)
The numbers p and q are suitably chosen. The formula has a few interesting special cases:
I computed the anonymity loss for several coin sets and tried to choose p and q so that the result corresponds best to my subjective perception of anonymity loss. I ended up with p = 10 and q = 0.8, which is consistent with the two ideas that I had before.
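To make the formula concrete, here is a hypothetical Python rendering (the production code is C#; the function name and defaults here are illustrative only):

```python
def anonymity_loss(coins, p=10.0, q=0.8):
    """Generalized weighted mean of distances to the minimum anonymity.

    coins: list of (anonymity score, value) pairs; p and q are the tuning
    parameters from the description above (p = 10, q = 0.8).
    """
    a_min = min(a for a, _ in coins)
    num = sum((a - a_min) ** p * v ** q for a, v in coins)
    den = sum(v ** q for _, v in coins)
    return (num / den) ** (1.0 / p)

# Coins from the problem statement: the 9-anonymity coin sits 5 above the
# minimum, so with p = 10 the loss lands near the worst coin's distance,
# much higher than the plain weighted average (1) computed earlier.
coins = [(4, 0.4), (5, 0.5), (9, 0.1)]
print(round(anonymity_loss(coins), 3))
```

Note that with p = 1 and q = 1 the formula reduces to the plain value-weighted average distance, i.e. the worst-expected-loss heuristic already in the code.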