Coin selection improvement #10096

Open
wants to merge 11 commits into master from onvej-sl/coin-selection-improvement

Conversation

onvej-sl
Contributor

TLDR

This pull request improves the coin selection algorithm. The goal was to reduce the probability of anonymity loss that can occur when the algorithm selects coins whose anonymity scores are too far from each other.

Problem statement

The anonymity score of a transaction output is computed as the anonymity inherited from the inputs of the transaction belonging to the same person as the output, plus the anonymity gained from the transaction itself. The inherited anonymity is either the value-weighted average of those inputs' anonymity scores (the first formula) or the minimum of those inputs' anonymity scores (the second formula).

Usually the first formula is used. Nevertheless, if there is a suspicion that the coinjoin sudoku problem could be solvable for the output, then the second formula is used. The problem with this second case is that the (weighted) average anonymity score of the outputs can be smaller than the (weighted) average anonymity score of the inputs.

Let's say, for example, a user registers the following inputs:

  • 0.4 BTC with anonymity score 4
  • 0.5 BTC with anonymity score 5
  • 0.1 BTC with anonymity score 9

The average anonymity score of the inputs weighted by their values is (4 * 0.4 + 5 * 0.5 + 9 * 0.1) / (0.4 + 0.5 + 0.1) = 5. Let's say the user registered one output with the value of 1 BTC. Suppose there is a suspicion that the coinjoin sudoku problem could be solvable for the output, so the second formula is used. Suppose there is no other output with the value of 1 BTC, which means the anonymity gain from the transaction is zero. The anonymity score of the output would then be 4, which is a decrease in the average anonymity score. From the anonymity point of view, it would be better for the user not to participate in the transaction at all. Moreover, the user sees that they had a UTXO whose anonymity score was reduced from 9 to 4.
I agree this is a somewhat extreme example, but it can nevertheless occur when you register far more coins (in terms of total amount) than your co-participants.

In e1773b5 and e1773b5, we extended the coin selection algorithm with a piece of code that removes coins from the final set of coins until the worst expected anonymity loss is less than a certain value. The worst expected anonymity loss is computed as the value-weighted average difference between each coin's anonymity score and the minimum anonymity score among the selected coins.

In our experience, it doesn't seem to be enough. The worst expected anonymity loss in the example above would be ((4 - 4) * 0.4 + (5 - 4) * 0.5 + (9 - 4) * 0.1) / (0.4 + 0.5 + 0.1) = 1.
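
To make the arithmetic concrete, here is a minimal sketch of that weighted-loss computation applied to the example above (illustrative only, with hypothetical names, not the actual WalletWasabi code):

using System;
using System.Linq;

// Illustrative sketch only; not the actual WalletWasabi implementation.
var coins = new (double AnonScore, double Value)[] { (4, 0.4), (5, 0.5), (9, 0.1) };

double minScore = coins.Min(c => c.AnonScore);

// Value-weighted average distance from the minimum anonymity score.
double weightedAnonLoss =
    coins.Sum(c => (c.AnonScore - minScore) * c.Value) / coins.Sum(c => c.Value);

Console.WriteLine(weightedAnonLoss); // 1 for this example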

Proposed solution

In my opinion, the formula for the anonymity loss should be modified in the following way:

  • Instead of the weighted average it should use something that is more like the maximum. An extreme case would be max(4 - 4, 5 - 4, 9 - 4) = 5.
  • Coin values should have less weight in the weighted average. An extreme case would be ((4 - 4) + (5 - 4) + (9 - 4)) / 3 = 2.

I developed a formula that combines both ideas. It is based on the generalized weighted mean. Let's say we have coins with anonymity scores a1, ..., an and values v1, ..., vn, and let a = min(a1, ..., an). The anonymity loss is computed as follows:

(((a1 - a)^p * v1^q + ... + (an - a)^p * vn^q) / (v1^q + ... + vn^q))^(1/p)

The numbers p and q are suitably chosen constants.

The formula has a few interesting special cases:

  • If p = 1 and q = 1 then the formula becomes ((a1 - a) * v1 + ... + (an - a) * vn) / (v1 + ... + vn), which is the formula used previously.
  • If q = 0 and p goes to infinity, then the formula becomes max((a1 - a), ..., (an - a)), which corresponds to the extreme case of the first idea.
  • If p = 1 and q = 0 then the formula becomes ((a1 - a) + ... + (an - a)) / n, which corresponds to the extreme case of the second idea.

I computed the anonymity loss for several coin sets and tried to choose p and q so that it corresponds best to my subjective perception of anonymity loss. I ended up with p = 10 and q = 0.8, which is consistent with the two ideas described above.
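
For illustration, a hedged sketch of how this could look in code (the helper name and signature are mine, not necessarily what the PR uses; only the math follows the formula above):

using System;
using System.Linq;

// Hypothetical helper illustrating the proposed loss formula.
static double AnonymityLoss((double AnonScore, double Value)[] coins, double p = 10, double q = 0.8)
{
    double a = coins.Min(c => c.AnonScore);

    // Generalized weighted mean of (a_i - a) with weights v_i^q and exponent p.
    double numerator = coins.Sum(c => Math.Pow(c.AnonScore - a, p) * Math.Pow(c.Value, q));
    double denominator = coins.Sum(c => Math.Pow(c.Value, q));

    return Math.Pow(numerator / denominator, 1 / p);
}

var example = new (double AnonScore, double Value)[] { (4, 0.4), (5, 0.5), (9, 0.1) };
Console.WriteLine(AnonymityLoss(example)); // Roughly 4.1, much closer to the maximum gap of 5 than the previous weighted loss of 1.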

@onvej-sl onvej-sl force-pushed the onvej-sl/coin-selection-improvement branch from 8709686 to 54f3cd4 Compare February 10, 2023 15:03
@nopara73
Contributor

Thanks for this! This is an important PR. Can I get someone to review things before I do? Maybe @turbolay? You're pretty talented in understanding the most complex problems, and this is one of the most important parts of the code to get familiar with.

@turbolay
Collaborator

turbolay commented Feb 13, 2023

Thanks for the details on the context you provided in this PR; they were really useful.

Review/tl;dr

cACK

This PR is an adjustment of an existing concept that tries to mitigate the problem of big AnonScore loss caused by big inputs with small AnonScore. It makes a wise decision about potentially removing coins suspected of negatively impacting the AnonScore of all outputs.
It relies on three magic numbers (MaxWeightedAnonLoss, p, q), so we need to test with #9375; I haven't done that yet.

If we change the AmountDecomposer, we will need to find the best magic numbers again, and I made another proposal on the AmountDecomposer that continues the same idea, so we can:

  1. Consider my proposal and decide whether or not to implement something (I will open a dedicated Issue/PR)
  2. Test this PR to be sure about the magic numbers MaxWeightedAnonLoss, p, q, and merge.

Reformulation

I will try to reformulate the problem and the purpose of this PR to make sure I and everyone else get it:

If an output is: unique && (non-standard || one of the two biggest outputs), it will inherit the lowest AnonScore of the client's inputs, what OP called the second formula. Otherwise (standard, majority of cases), a weighted average of inputs is used, the first formula.

Formula choice happens here (I inverted the if in my short explanation above so that it's easier to understand):

if (anonymityGain < 1)
{
    // When WW2 denom output isn't too large, then it's not change.
    if (tx.IsWasabi2Cj is true && StdDenoms.Contains(virtualOutput.Amount.Satoshi) && virtualOutput.Amount < secondLargestOutputAmount)

This PR tries to be more precise about whether or not we should recursively remove an input from the selection if one selected coin can potentially have a big negative impact on the AnonymityScore that outputs will inherit, regardless of whether they use the first or second formula.

The formula implemented in this PR, based on the generalized weighted mean, gives the selection a score for the potential negative impact on AnonScore. It is used to decide whether we should remove a coin (maximum acceptable threshold) and to rank subsets against each other (the lower the score, the better) to see which coin is the best to remove.

The parameters are for adjusting the balance between the weight of the amount (q) and the weight of AnonScore (p).
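
As I understand it, the removal logic then amounts to something like the following loop. This is a simplified sketch with made-up names (maxAcceptableLoss stands in for the MaxWeightedAnonLoss threshold mentioned above, and it reuses an AnonymityLoss helper like the one sketched in the PR description); the real CoinJoinClient code is more involved:

using System.Collections.Generic;
using System.Linq;

// Simplified sketch of the remove-and-rank idea; not the actual CoinJoinClient code.
static List<(double AnonScore, double Value)> ReduceAnonLoss(
    List<(double AnonScore, double Value)> selection, double maxAcceptableLoss)
{
    while (selection.Count > 1 && AnonymityLoss(selection.ToArray()) > maxAcceptableLoss)
    {
        // Drop the coin whose removal leaves the lowest loss score for the remaining subset.
        int worstIndex = Enumerable.Range(0, selection.Count)
            .OrderBy(i => AnonymityLoss(selection.Where((_, j) => j != i).ToArray()))
            .First();
        selection.RemoveAt(worstIndex);
    }

    return selection;
}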

Go further proposal

Intuitively, a different approach from your PR (not mutually exclusive) would be for the AmountDecomposer to try to make "change" outputs (big, non-standard) as small in amount as possible, because they are the ones degrading outputs' AnonScore the most in the common case: the first formula.

AnonScore gain dynamics will be improved the same way as they are with this PR: if the "change" (low-AnonScore, big coin) from the last CJ is smaller in amount, then in the next CJ, where it takes part as an input, it will impact the weighted average less negatively than a bigger change does currently. The potential for improvement here seems really substantial.

For example, this TX: https://mempool.space/tx/3e3fbad73bbca60615b5fe1724c94a675192b10f468aa58d728303e5d19b6eef#flow=&vin=0
Output#1 42..BTC used the second formula (biggest and unique).

Instead of one 42 BTC output, the AmountDecomposer could probably have made three more standard outputs of 11.62..BTC, still leaving an output that uses the second formula, but of only 7.14..BTC.

This is decided here:

var orderedCandidates = preCandidates
    .OrderBy(x => x.Cost) // Less cost is better.
    .ThenBy(x => x.Decomp.All(x => denomHashSet.Contains(x)) ? 0 : 1) // Prefer no change.

My proposition is: change "less cost is better" to "lower change is better if the cost is acceptable".
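
For illustration, that reordering could look roughly like this (MaxAcceptableCost and ChangeAmount are made-up placeholders, not existing AmountDecomposer members):

// Hypothetical reordering sketch; MaxAcceptableCost and ChangeAmount are placeholders.
var orderedCandidates = preCandidates
    .OrderBy(x => x.Cost <= MaxAcceptableCost ? 0 : 1) // Acceptable cost first.
    .ThenBy(x => ChangeAmount(x.Decomp))                // Then lower change is better.
    .ThenBy(x => x.Cost);                               // Finally, less cost is better.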

Comment on lines +163 to +168
var coinsToSelectFrom = Enumerable
    .Empty<SmartCoin>()
    .Append(bigCoinWithSmallAnonymity1)
    .Append(bigCoinWithSmallAnonymity2)
    .Append(smallCoinWithBigAnonymity)
    .ToList();
Collaborator

An alternative would be:

Suggested change
var coinsToSelectFrom = Enumerable
    .Empty<SmartCoin>()
    .Append(bigCoinWithSmallAnonymity1)
    .Append(bigCoinWithSmallAnonymity2)
    .Append(smallCoinWithBigAnonymity)
    .ToList();
List<SmartCoin> coinsToSelectFrom = new() { bigCoinWithSmallAnonymity1, bigCoinWithSmallAnonymity2, smallCoinWithBigAnonymity };

or

Suggested change
var coinsToSelectFrom = Enumerable
    .Empty<SmartCoin>()
    .Append(bigCoinWithSmallAnonymity1)
    .Append(bigCoinWithSmallAnonymity2)
    .Append(smallCoinWithBigAnonymity)
    .ToList();
List<SmartCoin> coinsToSelectFrom = new()
{
    bigCoinWithSmallAnonymity1,
    bigCoinWithSmallAnonymity2,
    smallCoinWithBigAnonymity
};

Comment on lines +179 to +180
Assert.False(coins.Contains(bigCoinWithSmallAnonymity1) && coins.Contains(smallCoinWithBigAnonymity));
Assert.False(coins.Contains(bigCoinWithSmallAnonymity2) && coins.Contains(smallCoinWithBigAnonymity));
Collaborator

These two checks make me think that the asserts are meant to be more general because we use a random number generator, but the RNG returns a fixed number. Would it make sense to run this test, e.g., 100 times with a random RNG seed (i.e. make it random instead of deterministic) so that the test can actually fail if something is not as expected?

Contributor Author

Yes, it would make sense. I suppose the same holds true for other tests in this file, so I suggest addressing it in a separate issue or pull request.

public void DoNotSelectCoinsWithBigAnonymityLoss()
{
    // This test ensures that we do not select coins whose anonymity could be lowered a lot.
    const int AnonymitySet = 10;
Collaborator

Would it make sense to switch from [Fact] to [Theory] with multiple AnonymitySet values to make the test more robust?
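
For illustration, the switch might look roughly like this (the class name and the example values are placeholders):

using Xunit;

public class CoinJoinCoinSelectorTests
{
    // Illustrative sketch of the [Fact] -> [Theory] suggestion; the values are placeholders.
    [Theory]
    [InlineData(5)]
    [InlineData(10)]
    [InlineData(50)]
    public void DoNotSelectCoinsWithBigAnonymityLoss(int anonymitySet)
    {
        // Same body as the existing test, using anonymitySet instead of the AnonymitySet constant.
    }
}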

Contributor Author

Yes, it would make sense. Nevertheless, it is not easy to generate such test vectors, because the coin selection algorithm is very complex and doesn't behave deterministically.

// * GeneralizedWeightedAverage(source, value, weight, p) goes to Max(source, value) as p goes to infinity
// * GeneralizedWeightedAverage(source, value, weight, p) goes to Min(source, value) as p goes to minus infinity
// * GeneralizedWeightedAverage(source, value, weight, p) <= GeneralizedWeightedAverage(source, value, weight, q) provided p < q

Collaborator

I would check the parameters more:

Suggested change
if (p == 0)
{
    throw new ArgumentException("Non-zero value is expected.", nameof(p));
}

If this is not added, it will crash later because 1 / 0 is not defined (AFAIK).

Contributor Author

Done in 3cd7ea2.

@pull-request-size pull-request-size bot added size/L and removed size/M labels Mar 14, 2023
@onvej-sl onvej-sl requested review from kiminuo and removed request for nopara73 March 14, 2023 14:10
@kiminuo
Collaborator

kiminuo commented Mar 16, 2023

The changes look great.

I would like to re-review this PR in the afternoon.

kiminuo
kiminuo previously approved these changes Mar 16, 2023
Collaborator

@kiminuo kiminuo left a comment

LGTM. Very nice description.

Another pair of eyes for reviewing would certainly be useful, as this anon stuff in general is non-trivial.

@nopara73
Contributor

Unless other comments arrive I am merging this a week from now.

Co-authored-by: nopara73 <adam.ficsor73@gmail.com>
onvej-sl and others added 3 commits March 24, 2023 13:34
Co-authored-by: Kimi <58662979+kiminuo@users.noreply.github.com>
Co-authored-by: Kimi <58662979+kiminuo@users.noreply.github.com>
@nopara73
Contributor

@turbolay asked me to hold on with the merge, so I won't be merging this for now.

@turbolay
Collaborator

The reason is that it seems this PR, even if the concept makes total sense, tends to create more coins for the user than master, which results in no gain on mining fees/time, even though the PR reduces the AnonScore loss.
Also, in the standard case, the positive impact seems to exist but is relatively low.
Because testing and understanding the complete impact of code related to coin selection is complex, I don't want to rush to a conclusion, whether positive or negative. The safe thing to do is to wait and try to understand unforeseeable consequences.

@nopara73
Contributor

@turbolay any news here?

@turbolay
Collaborator

I didn't test more after my previous statement because of prioritization.
What I'm wondering about with this change is its potential negative impact on crumbling, something we are already suffering from.
I can be easily convinced of a merge because the PR makes total sense and is a direct improvement.
But I still think that merging without further testing for this specific case (does it make crumbling worse or not?) would be, I quote:

[A] leap into the abyss of the unknown, surrendering the security of our familiar existence to the whims of chance and entropy while creating the tower of Babel.

-> I fear an unforeseeable consequence, something bad we do not see coming. Yet, I agree and understand that it can also tremendously improve the current consolidation and privacy loss issues.

@nopara73
Contributor

No need for concern, my intention in commenting wasn't to press you. I simply wanted to ensure that the matter isn't overlooked.

@stale

stale bot commented Jun 23, 2023

This has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 23, 2023
@turbolay
Collaborator

turbolay commented Jun 23, 2023

We basically ended up implementing what I explained in my comment:

Intuitively, a different approach than your PR (not mutually exclusive) would be for the AmountDecomposer to try to make "change" output (big non-standard) as small as possible

Now that the work on the AmountDecomposer is finished, it would be interesting to consider this PR again.

At first glance, I believe that its impact will now be minimal, but I think the PR still makes sense.

@stale stale bot removed the stale label Jun 23, 2023
@stale

stale bot commented Sep 16, 2023

This has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 16, 2023