TransactionStore using SQLite #12137
Conversation
Concept ACK
Tested this for a while and all worked properly: coinjoin, receive, and send.
I investigated a bit. Wasabi tries to analyze all transactions that we know about. This includes all the wallets, and in some cases there are interactions between wallets, like label or anon score transitions: when you send a coin from one wallet to another, it should preserve these properties. There are also cases where we punish user behavior, resulting in a lower anon score. We consider these cases important; we need to know our coins even if they are in another wallet. I am not sure how complicated it is to change this while keeping these concepts working.
Tested. My balance got corrupted - something is off.
- I copied my data folder and opened 3 wallets. LGTM
- Restarted Wasabi, opened 3 wallets - quite fast but the balance was off and I saw only a few tx in the list.
Am I testing it correctly?
Partial code review - you can ignore these comments for now.
if (migrateData)
{
	string oldPath = Path.Combine(workFolderPath, "Transactions.dat");
	Import(oldPath, network, deleteAfterImport: false);
In the final version, will the old file be deleted after import?
I would probably follow the approach we took with block filters:
- Finish the implementation of TransactionStore using SQLite (#12137), where the migration does not delete transaction.dat files.
- After some time, implement a PR that removes the old transaction.dat file.
- A release version would then delete the transaction.dat file.
However, given that transaction.dat is in many cases much smaller than the block filter file, it's conceivable that leaving the transaction.dat file on disk would not be that bad.
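The phased migration described above can be sketched roughly as follows. This is a hypothetical Python/SQLite illustration (the PR itself is C#): the file name and the delete-after-import flag mirror the discussion, but the helper name and table schema are invented for the sketch.

```python
import os
import sqlite3

def migrate_transactions(dat_path: str, db_path: str, delete_after_import: bool = False) -> int:
    """Import Transactions.dat-style lines into an SQLite table.

    Mirrors the staged plan: the old file stays on disk unless
    delete_after_import is True, so a later release can flip the flag.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS tx (txid TEXT PRIMARY KEY, raw TEXT)")
    if os.path.exists(dat_path):
        with open(dat_path) as f:
            for line in f:
                line = line.strip()
                if line:
                    # Each line starts with the tx id.
                    conn.execute("INSERT OR IGNORE INTO tx VALUES (?, ?)",
                                 (line.split(":", 1)[0], line))
        conn.commit()
        if delete_after_import:
            os.remove(dat_path)  # removed only after a fully committed import
    total = conn.execute("SELECT COUNT(*) FROM tx").fetchone()[0]
    conn.close()
    return total
```

Because the old file is removed only after the commit, rerunning the helper with the flag off is harmless, which is what makes the staged rollout workable.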
I agree with creating a PR that deletes the file after this PR is merged - better for the devs: they can switch back and forth to other PRs where the database does not exist without losing Tx.dat. Drop the idea of keeping tx.dat on the disk; we do not need it.
The concept is obviously really good, and the implementation is (as always with your PRs) really good as well. There are some questions/remarks raised by it:
- I had an old HDD and Wasabi was almost unusable on it because of the amount of IO caused by updating Transactions.dat. This PR will make incredible improvements for this case.
- It is really good that you implemented LinqExtensions.OrderByBlockchain on the Get side. Maybe it would save some performance to implement it on the Insert and Update side instead, but that feels more tricky as it can change.
- Some users had problems with the migration for the filters. With this PR, Wasabi will for example become unusable for users that have less space available on the disk than what is taken by the database. Maybe it would make sense to keep both systems available for some time? IDK
- I see that your implementation prefers to crash rather than resync. I believe that this is the correct implementation, as long as "[PoC] Always test filters against the MinGapLimit of clean keys, no more" (#10981) is not implemented. Currently a resync with all keys in the wallet.json is a black swan scenario for big wallets.

I will update or send a new comment if I think of more.
I'm not aware of the reasons for this but 1 wallet : 1 transaction database sounds better to me.
Currently, this creates performance issues in the case that an installation contains a big wallet but the user only opens a small wallet. If you are in such a case (and this is why I recommended you to use a different datadir to test a big wallet), the small wallet will become laggy. This is a problem, and that is why improvements like #11848 are so impactful in those cases.
An example of this problem currently in the wallet is the "Most used labels" feature. We loop through all the transactions, whether they are relevant to the wallet or not, making a small wallet extremely CPU-intensive and laggy.
As @molnard mentioned, sometimes we need all transactions and sometimes only wallet relevant ones. The case mentioned where a user spends from Wallet A to B would not be a problem because we are talking about transactions, not coins. The only thing to care about is that the same transaction can be linked to several wallets.
All in all I believe it would indeed make sense to have separate databases for different wallets (or a new field list of wallet ids relevant to the transaction). But only if we want to continue in the direction of slowly removing the transaction from the memory, which I think is a continuous effort that we should do.
It would not really be a new state, as it's immutable: once a transaction is made relevant for a wallet by the TransactionProcessor, it will always be relevant for it. Similarly, if a transaction is marked as irrelevant for a wallet, it will never be relevant for it. It would be useful because we would still have access to all transactions when we need them, but we could do a quick query to filter out the transactions that we don't need if we need only relevant ones. This would make implementations such as Wallet.GetTransactions much better than currently.
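The "list of wallet ids relevant to the transaction" idea could be modeled as a link table. Below is a minimal Python/SQLite sketch with invented table and column names (the actual Wasabi schema differs), just to illustrate the query shape:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tx (txid TEXT PRIMARY KEY, raw BLOB);
-- hypothetical link table: which wallets each transaction is relevant to
CREATE TABLE tx_wallet (txid TEXT NOT NULL, wallet_id TEXT NOT NULL,
                        PRIMARY KEY (txid, wallet_id));
""")
conn.execute("INSERT INTO tx VALUES ('aa', x'00')")
conn.execute("INSERT INTO tx VALUES ('bb', x'01')")
# 'aa' spends from wallet A to wallet B, so it is relevant to both wallets.
conn.executemany("INSERT INTO tx_wallet VALUES (?, ?)",
                 [("aa", "A"), ("aa", "B"), ("bb", "B")])

def relevant_txids(wallet_id: str) -> list[str]:
    # Quick filter instead of scanning all transactions in memory.
    rows = conn.execute(
        "SELECT txid FROM tx_wallet WHERE wallet_id = ? ORDER BY txid",
        (wallet_id,))
    return [r[0] for r in rows]
```

Because relevance is immutable once established (as noted above), the link table only ever gains rows, so it can be appended to safely without invalidating earlier query results.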
BTW, huge cACK. 🥇
It appears to be a good testing scenario. For debugging purposes, it would be good to know if all transactions from your
Yes, I believe that WW should work much better in that case.
So it's just a matter of a database index. So "insert"s and "update"s automatically update indices. Am I missing something?
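For what it's worth, SQLite does maintain indexes automatically on every insert and update, and EXPLAIN QUERY PLAN shows whether a read is served by them. A toy check against a hypothetical minimal schema (not the PR's actual table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE t (
    txid TEXT PRIMARY KEY,
    block_height INTEGER,
    block_index INTEGER,
    first_seen INTEGER)""")
conn.execute("CREATE INDEX t_idx ON t (block_height, block_index, first_seen)")

# Inserts keep the index up to date automatically; no manual maintenance.
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)",
                 [("aa", 2, 0, 10), ("bb", 1, 0, 5)])

# An ordered read matching the index columns can be satisfied from the index,
# avoiding a separate sort step:
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT txid FROM t "
    "ORDER BY block_height, block_index, first_seen").fetchone()[3]
rows = [r[0] for r in conn.execute(
    "SELECT txid FROM t ORDER BY block_height, block_index, first_seen")]
```

So yes: the cost of ordering on read can largely be moved into the index, which is maintained incrementally on write.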
So I believe that
So I think it's preferable because it can allow us to fix bugs (if there are bugs). If some
I'm not sure why "Most used labels" iterates over all transactions. For me as a user, it's really unexpected, as it feels like "a data spill from one wallet to another one".
I added your test scenario to the OP.
is_cancellation INTEGER NOT NULL, /* 0 ~ no, 1 ~ yes */
tx BLOB NOT NULL /* transaction as a binary array */
);
CREATE INDEX IF NOT EXISTS transaction_blockchain_idx ON "transaction" (block_height, block_index, first_seen);
Shouldn't this be UNIQUE INDEX instead of INDEX?
Looking at it from the real-world perspective: if two miners were to mine two different blocks with different transactions, then it seems like the block_height and block_index values can be the same. first_seen is a value assigned by us. So in theory, it's not unique. In practice, what we should do is process one block at a time and handle the reorg, so for our purposes it's likely unique.

Taking a step back can be good here: why do we actually need it? AFAIK it's to display transactions in the UI.

-> If other people think it's a good idea to add the unique keyword, I'll add it. I'm not against it, I'm just not confident enough here.
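The uniqueness question can be probed directly against SQLite: a plain index accepts duplicate (block_height, block_index, first_seen) triples, while creating a UNIQUE index over existing duplicates fails. A small Python sketch with the schema trimmed down to the indexed columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE "transaction" (
    txid TEXT PRIMARY KEY,
    block_height INTEGER NOT NULL,
    block_index INTEGER NOT NULL,
    first_seen INTEGER NOT NULL)""")
conn.execute("""CREATE INDEX transaction_blockchain_idx
    ON "transaction" (block_height, block_index, first_seen)""")

# Two distinct transactions sharing the same (height, index, first_seen)
# triple are accepted by the plain (non-unique) index:
conn.execute("""INSERT INTO "transaction" VALUES ('aa', 100, 1, 5000)""")
conn.execute("""INSERT INTO "transaction" VALUES ('bb', 100, 1, 5000)""")
count = conn.execute('SELECT COUNT(*) FROM "transaction"').fetchone()[0]

# A UNIQUE index over the same columns is rejected once duplicates exist:
try:
    conn.execute("""CREATE UNIQUE INDEX transaction_blockchain_uidx
        ON "transaction" (block_height, block_index, first_seen)""")
    unique_index_possible = True
except sqlite3.IntegrityError:
    unique_index_possible = False
```

In other words, adding UNIQUE would turn the theoretical collision above into a hard insert failure, which matches the "crash rather than silently mis-store" preference discussed elsewhere in this PR, but only if such collisions genuinely cannot happen.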
This is interesting. So what is the idea behind it, actually? My naive expectation (without knowing this) would be that moving coins from one of my wallets to another one of my wallets would behave as if I sent coins to somebody else. So I guess there is some real-world use case behind it, or some argument why it was introduced. Does anyone know? (cc @MaxHillebrand @lontivero)
The idea behind it is to use all the information we have available to help users, regardless of which wallet it comes from.
They cannot separate them because of the blockchain, and Wasabi should highlight the connection - it's its job to do so. There can be nuances, but the general concept is good and desired. But this is out of scope here anyway.
There are many, but the idea is purely conceptual. Here are two examples:
Which means coin properties like anonscore and label should be transwallet.
It would be great to refocus discussions on this PR, which is already big enough, as cross-wallet functionalities are irrelevant here: changing from file to SQL doesn't change this behaviour in any way. We could have a follow-up discussion with the best way to go afterwards, as it's still an interesting topic.
I didn't test this PR yet because I think the best way to test it is to create an xUnit test that compares the SQL file and the .dat file and verifies that there are no differences; this test could then be run by different people.
I carefully reviewed the code and it looks good to me, only a few comments.
Cannot reproduce anymore. It is clear the Transactions.dat in the temporary folder contains only 435 tx, but originally it was 1035. The database was generated from that, so the issue was not in the new code. Many things could have happened; my best assumption is that Tx.dat got corrupted, as we all encounter that rarely. Maybe an append operation got interrupted somehow. Anyway, this does not relate to this PR (except that after this PR we might get rid of the bug of Tx.dat getting corrupted).
Started Wasabi like a new installation, without a data folder. I saw these in the logs:
2023-12-30 10:57:42.699 [1] INFO TransactionStore..ctor (38) Migration of transaction file 'C:\Users\molna\AppData\Roaming\WalletWasabi\Client\BitcoinStore\Main\Mempool\Transactions.dat' to SQLite format is about to begin. Please wait a moment.
2023-12-30 10:57:42.732 [1] INFO TransactionStore..ctor (38) Migration of transaction file 'C:\Users\molna\AppData\Roaming\WalletWasabi\Client\BitcoinStore\Main\ConfirmedTransactions\2\Transactions.dat' to SQLite format is about to begin. Please wait a moment.
public static class SqliteStorageHelper
{
	/// <seealso href="https://learn.microsoft.com/en-us/dotnet/standard/data/sqlite/in-memory-databases"/>
	public const string InMemoryDatabase = ":memory:";
Add a comment that this is only used in the tests!
The constant is present on master as well.

Added a note in 7771cb2.
// In Transactions.dat every line starts with the tx id, so the first character is the best for digest creation.
TransactionsFileManager = new IoManager(filePath: Path.Combine(WorkFolderPath, "Transactions.dat"));
if (workFolderPath == SqliteStorageHelper.InMemoryDatabase)
The code is cleaner if you add that this is only used in the tests. Otherwise, people like me think it has something to do with prod and start investigating.
ok, 7d363d1
#region Modifiers
long ms = sw.ElapsedMilliseconds;
Logger.LogInfo($"XXX: Both stores initialized in {ms} ms");
Is this a nasty log message? What is XXX?
Is this a nasty log message? What is XXX?

XXX is easy to type, easy to search for, and a quite unique string of characters. So it's an idea marker for a temporary logging message.

Removed in 4a3ca0a.
tACK. I was not able to break it. Code LGTM - a few nits above.
tACK
tACK, Code LGTM. I tried what I could to break it but couldn't. IMO the migration is much cleaner than for the IndexStore.
Ty for the integrity test.
Two comments:
- Is it intentional that the migration occurs every time while deleteAfterImport is false? I didn't see comments talking about this. I understand it will be OK when the other PR deleting the file is merged, but in the meanwhile it will be annoying for some contributors. Anyway, I think it's OK.
- There is no support for cancellation during migration. I don't think it's a problem, because the code is resistant to interruption: the old file is only deleted after complete migration, therefore the migration is rerun afterwards. Really good.
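The interruption-resistance argument hinges on the import being idempotent: rerunning it after a crash must not duplicate rows. With SQLite that can fall out of an INSERT OR IGNORE against a primary key. A hypothetical Python sketch (not the PR's actual C# code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (txid TEXT PRIMARY KEY, raw TEXT)")

def import_lines(lines):
    # Safe to rerun after a partial, interrupted import:
    # txids that are already present are simply skipped.
    for line in lines:
        txid = line.split(":", 1)[0]
        conn.execute("INSERT OR IGNORE INTO tx VALUES (?, ?)", (txid, line))
    conn.commit()

lines = ["aa:1", "bb:2"]
import_lines(lines[:1])  # simulate a run interrupted after the first line
import_lines(lines)      # the rerun completes the import without duplicates
count = conn.execute("SELECT COUNT(*) FROM tx").fetchone()[0]
```

Combined with deleting the old file only after the commit, this makes "just run the migration again" a valid recovery strategy, which is why missing cancellation support is tolerable here.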
The strategy is this one: #12137 (comment). Basically, we leave the
True. Though in this case, the migration should be fast in the majority of cases. And again, this is very similar to what we did with block filters. But we can improve it if we find it to be a problem.
@kiminuo are we migrating from the old file every time?
Yes, it's not ideal. Either we need to remove Or we can rename the
This sounds like the safest option for the start to me.
We are not going through all the blocks again with the filters, just the previously found transactions - with every wallet load. So if we find a transaction and save it to the SQL database, but later empty that and reload the old tx list - while the heights in the wallet file have meanwhile increased - we end up with a broken wallet state and need to rescan to fix it. Does this make sense?
After migration, the wallet detects that and falls back to the last height? In that case, it will always rescan, so the broken state fixes itself?
Yes, I think it's a correct description of the current state of things. I think that now we have broken wallets because of this and we need to set "last height" in affected wallets to a lower number to re-scan and to see all transactions. Could you please confirm @turbolay? (Obviously, it would get broken again so we need #12200 or a different PR to fix it)
I don't think it does.

Edit: I'm not sure if you propose adding some "self-recovery" mechanism or whether you are after "how to fix the now corrupted wallet". I suppose the latter, and that's what this comment is about.
Nice catch @molnard, sorry I didn't realize it while testing.
I was thinking about something different. Everyone who suffered from this has the last height where the state was not broken, and that is stored in the old tx dat. So my suggestion is to migrate like we do now, but after migration read the latest height and set it for all the wallets. After that, delete tx dat. This would be good for everyone. Let's try this first. I won't be able to work on it today, so anyone can grab it.
This is not true. The latest height of the .dat file is the latest height of a received transaction, not the latest height synchronized. The solution suggested earlier is better, because the worst case is less bad. We could mix both solutions and do some kind of

Also, I disagree that there is no rush, because the longer this version stays in master, the more we increase the risk that someone outside the contributors runs the toxic master and breaks their wallet state, and this person might not update while the mitigation is live.
Perfect, that is what we need, isn't it? It is a safe point to start the sync.

They will wait one more day. If someone is using master, the fix will come.
I checked the other solution. I wanted to avoid having temporary code in master - but that is impossible¹. So #12202 is good.
A sort of MVP for TransactionStores using SQLite.

Introduction

Currently, transactions are stored in a plaintext data file (i.e. Transactions.dat) where each line describes a single transaction:

<serialized tx>:<height>:<block_hash>:<labels>:<first_seen>:<is_replacement>:<is_speedup>:<is_cancellation>
Now the disadvantage of plaintext files is that modifications generally require full rewrites, so adding a new transaction, modifying an existing one, etc., are operations that require rewriting Transactions.dat. The robustness of such a solution against failures is really just baseline (power loss at an improper time leads to file corruption). To increase performance, changes are currently done in batches to avoid rewriting Transactions.dat too often.

This PR proposes these transaction storage changes:
- Client/BitcoinStore/Main/ConfirmedTransactions/2/Transactions.dat → Client/BitcoinStore/Main/ConfirmedTransactions/2/Transactions.sqlite
- Client/BitcoinStore/Main/Mempool/Transactions.dat → Client/BitcoinStore/Main/Mempool/Transactions.sqlite
Testing
There should be no change in behavior. To be on the safe side, I suggest making a copy of your WW profile somewhere else and running the wallet like this:
Testing scenario 1
Please test this scenario as it might regress:
Measurements
I observe on a big wallet that AllTransactionStore.InitializeAsync took:

The serialized transaction file is also smaller:
- ConfirmedTransactions/2/Transactions.dat
- ConfirmedTransactions/2/Transactions.sqlite
Scope of the PR
This PR intentionally does not introduce more changes than just switching the underlying storage from a plaintext format to an SQLite format. There are multiple reasons for that:
- There is a very complex relationship between SmartTransaction and SmartCoin, because we rely on SmartTransactions being in memory and on the fact that references do not change.
- SmartTransaction is used as a storage of data (a better solution would be to store such data in a service like CoinsRegistry).
- BitcoinStore/Main/ConfirmedTransactions/2/Transactions.dat contains transactions for all wallets. It's a good question whether it should be like that or not. I'm not aware of the reasons for this, but 1 wallet : 1 transaction database sounds better to me.