optimizing memory usage #190
Conversation
I thought you were going to pack all mem optimizations in the same PR. Are you planning on a follow-up for that?
Nah, will follow up in this PR. Polling the least amount of data already solved the iter/non-iter issue since it only appears when a hashmap has a dynamically sized field (the …
Not sure if there are any trivial/non-structural opts left after that. The next set of optimizations for the watcher, I think, are:
I will need to review the watcher and see what roles these two hashmaps play, so as to understand how reducing/eliminating them would affect the tower CPU-wise in a normal case (not too many breaches). I will also need to check similar memory optimization options for the responder and gatekeeper.
Force-pushed from f57736a to 92ad4e5.
After a very, very long fight with lifetimes, boxes, dyns and everything, I was able to assemble this f12cb90. It's working as intended as a way of streaming data whilst hiding the DBM-specific things, but the design isn't good at all:
The issue lies in DBM methods constructing a … That said, I might be overcomplicating the issue and it could be solved in another, simpler way, so I'm posting here to have a review on this. Meanwhile I am moving on to other possible memory optimization options.
I am only reviewing f12cb90 for now.
I don't think this approach is that bad. It would be best if we could simply implement Iterator for QueryIterator, but for more than I've scratched my head, I haven't found a way of doing so.
With respect to changes being implemented in rusqlite, the one you've mentioned seems to depend on an issue from 2016, so I think it's pretty unlikely :(
Regarding the code itself: assuming our goal is to use this only for bootstrapping, we don't actually need the params to be part of QueryIterator, given there are none for either the collection of appointment summaries or the trackers. Therefore we could simplify it to something like the following:
diff --git a/teos/src/db_iterator.rs b/teos/src/db_iterator.rs
index 9bf61a1..feda098 100644
--- a/teos/src/db_iterator.rs
+++ b/teos/src/db_iterator.rs
@@ -1,22 +1,19 @@
-use rusqlite::{MappedRows, Params, Result, Row, Statement};
+use rusqlite::{MappedRows, Result, Row, Statement};
 use std::iter::Map;

 /// A struct that owns a [Statement] and has an `iter` method to iterate over the
 /// results of that DB query statement.
-pub struct QueryIterator<'db, P, T> {
+pub struct QueryIterator<'db, T> {
     stmt: Statement<'db>,
-    params_and_mapper: Option<(P, Box<dyn Fn(&Row) -> T>)>,
+    mapper: Option<Box<dyn Fn(&Row) -> T>>,
 }

-impl<'db, P, T> QueryIterator<'db, P, T>
-where
-    P: Params,
-{
+impl<'db, T> QueryIterator<'db, T> {
     /// Construct a new [QueryIterator].
-    pub fn new(stmt: Statement<'db>, params: P, f: impl Fn(&Row) -> T + 'static) -> Self {
+    pub fn new(stmt: Statement<'db>, f: impl Fn(&Row) -> T + 'static) -> Self {
         Self {
             stmt,
-            params_and_mapper: Some((params, Box::new(f))),
+            mapper: Some(Box::new(f)),
         }
     }

@@ -28,9 +25,9 @@ where
         &mut self,
     ) -> Option<Map<MappedRows<'_, impl FnMut(&Row) -> Result<T>>, impl FnMut(Result<T>) -> T>>
     {
-        self.params_and_mapper.take().map(move |(params, mapper)| {
+        self.mapper.take().map(move |mapper| {
             self.stmt
-                .query_map(params, move |row| Ok((mapper)(row)))
+                .query_map([], move |row| Ok((mapper)(row)))
                 .unwrap()
                 .map(|row| row.unwrap())
         })
diff --git a/teos/src/dbm.rs b/teos/src/dbm.rs
index 318d8c9..6823313 100644
--- a/teos/src/dbm.rs
+++ b/teos/src/dbm.rs
@@ -7,7 +7,7 @@ use std::path::PathBuf;
 use std::str::FromStr;

 use rusqlite::limits::Limit;
-use rusqlite::{params, params_from_iter, Connection, Error as SqliteError, Row, ParamsFromIter};
+use rusqlite::{params, params_from_iter, Connection, Error as SqliteError};

 use bitcoin::consensus;
 use bitcoin::hashes::Hash;
@@ -307,9 +307,7 @@ impl DBM {
     }

     /// Loads all [AppointmentSummary]s from that database.
-    pub(crate) fn load_appointment_summaries(
-        &self,
-    ) -> QueryIterator<ParamsFromIter<[u8; 0]>, (UUID, AppointmentSummary)> {
+    pub(crate) fn load_appointment_summaries(&self) -> QueryIterator<(UUID, AppointmentSummary)> {
         let stmt = self
             .connection
             .prepare(
@@ -318,7 +316,7 @@ impl DBM {
             )
             .unwrap();

-        let func = |row: &Row| {
+        QueryIterator::new(stmt, |row| {
             let raw_uuid: Vec<u8> = row.get(0).unwrap();
             let raw_locator: Vec<u8> = row.get(1).unwrap();
             let raw_userid: Vec<u8> = row.get(2).unwrap();
@@ -329,9 +327,7 @@ impl DBM {
                     UserId::from_slice(&raw_userid).unwrap(),
                 ),
             )
-        };
-
-        QueryIterator::new(stmt, params_from_iter([]), func)
+        })
     }

     /// Loads appointments from the database. If a locator is given, this method loads only the appointments
Also, notice how annotating the closure is not necessary as long as you pass it directly to QueryIterator::new, given the compiler can infer the types in that case (if you assign it to a variable it could be used in different contexts, so it looks like the compiler doesn't really like that).
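To illustrate that inference point, here is a minimal, self-contained sketch with stand-in types (not the actual rusqlite/rust-teos ones):

```rust
// `Row` and `consume` are stand-ins for rusqlite's Row and QueryIterator::new.
struct Row(u32);

fn consume(mapper: impl Fn(&Row) -> u32 + 'static) -> u32 {
    mapper(&Row(42))
}

fn main() {
    // Inline: `row` is inferred to be `&Row` from `consume`'s signature.
    let inline = consume(|row| row.0 + 1);

    // Bound to a variable first: the explicit `&Row` annotation is needed here;
    // without it the compiler may reject the call with a "one type is more
    // general than the other" / lifetime error, as described above.
    let mapper = |row: &Row| row.0 + 1;
    let named = consume(mapper);

    assert_eq!(inline, named);
}
```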
All in all, I don't think this is too terrible. The alternative would be replacing rusqlite with something that has a design more compatible with our needs (which I'm not opposed to, tbh), or being a rustlang master and finding a tricky solution 🙃
Happy you find it not that terrible 😂
It was you calling it not good at all, I'm just smoothing it out a bit lol
Force-pushed from f12cb90 to 72b384d.
The DB iterator stuff is backed up in https://github.com/mariocynicys/rust-teos/tree/dbm-iterator
After this change, we reached 631MiB after trimming, using the 1G DB. Some tests needed adapting because they were breaking the recipe for … Also, some methods might need their signatures changed a little to be less confusing and less error-prone.

Another thing I think we should change is the excessive use of hashsets and hashmaps in functions. Hash maps and sets are more expensive to insert into and use more memory than a vector. In many cases we pass a hashmap to a function just to iterate over it (never using the mapping functionality it offers). Hashmaps could be replaced with a …

I pushed that commit 72b384d to get a concept ACK, but I think it will need some refactoring & renaming.
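As a rough sketch of that map-vs-iterator point (hypothetical types and function names, not the actual rust-teos API), a function that only iterates can accept any iterable instead of forcing callers to build a HashMap:

```rust
use std::collections::HashMap;

type Uuid = u128;

#[derive(Clone)]
struct AppointmentSummary {
    locator: [u8; 16],
}

// Before: callers must build a HashMap even though the mapping is never used.
fn count_locator_bytes_map(appointments: &HashMap<Uuid, AppointmentSummary>) -> usize {
    appointments.values().map(|a| a.locator.len()).sum()
}

// After: any iterable of (Uuid, AppointmentSummary) pairs works, e.g. a Vec.
fn count_locator_bytes_iter<'a>(
    appointments: impl IntoIterator<Item = &'a (Uuid, AppointmentSummary)>,
) -> usize {
    appointments.into_iter().map(|(_, a)| a.locator.len()).sum()
}

fn main() {
    let as_vec = vec![(1u128, AppointmentSummary { locator: [0; 16] })];
    let as_map: HashMap<_, _> = as_vec.iter().cloned().collect();
    assert_eq!(count_locator_bytes_map(&as_map), count_locator_bytes_iter(&as_vec));
}
```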
Let me also dump some notes I had written xD. Trying to get rid of …

1- For … This should be replaced with: …

2- For … We extract the locator from a uuid when a client calls … We extract the user_id from a uuid when a new block is connected (in …).

That latest commit works on point 2; I will try to work on point 1 without the …
Alright, I think you are on the right path. I'm commenting on your notes, but haven't checked 72b384d (lmk if you'd like me to at this stage).
Yeah, that should only affect tests. In the test suite the uuid recipe was not being followed because data was being generated sort of randomly, but that should be relatively easy to patch.
I partially agree with this. With respect to the …

The two former need …
I agree as long as collecting the map doesn't end up giving us worse performance. The same applies to sets. The reasoning behind sets instead of vectors is that the former allow item deletion by identifier, while the latter don't. In a nutshell, if we're only iterating it may be worth it, but if we need addition/deletion it may be trickier. Actually, if we only want to iterate, wouldn't an iterator be an option? Also, are we passing copies of the maps/sets or references?
The other way around, or am I missing something? The user requests a given locator and we compute the …
I barely remember this now, but I think the reason why …
Same reasoning, I think, so we can delete the corresponding data from …
That's true, we can keep it taking a UUID but be cautious later in the tests not to provide a random, unrelated UUID. Or we can embed the UUID inside the extended appointment.
Yup, an iterator would be the best we can do if we nail converting the calls reacting to …
When a client asks for their subscription info, they wanna know the locators they have given out and not the UUIDs (UUIDs are tower-implementation specific after all). The thing is we store a user's appointments in terms of UUIDs inside …
common_msgs::GetSubscriptionInfoResponse {
    available_slots: subscription_info.available_slots,
    subscription_expiry: subscription_info.subscription_expiry,
    locators: locators.iter().map(|x| x.to_vec()).collect(),
}

I think you confused it with …
Nah, ACKing these comments was enough.
I was indeed thinking about single appointment requests 😅
This is not a thorough review, some things may still be missing, but given I wanted to start using these optimizations at teos.talaia.watch, I went ahead and gave this a look.
I think this goes in the right direction, but it needs some polishing.
Force-pushed from 72b384d to 8b357e0.
The last 4 commits should have their tests adapted and be squashed into one commit. Leaving them for now for an easy linear review. Do they make sense in terms of how much they query the DB (bottlenecks)?
Overlooking that tests are not building and that some methods have unnecessary arguments for now, I'm just focusing on the big changes.
I reviewed up to the Gatekeeper (so Watcher and Responder are missing), but I don't trust GH not to delete all my pending comments, so I'll add the rest in a follow-up review.
Looks like GH posted the comment twice, so I'm reserving this spot for later 😆
Some comments on the Watcher and Responder.
I would need to do another general pass once the code is cleaned up; it's otherwise hard to follow.
Take a look at the comment regarding the disconnections, I think we should preserve that logic instead of trying to react to things on block_disconnected.
Also need to mention the IO implications of this refactor. Most of the IO is reads triggered each block by …

Other affected reads: …

The 111 calls to …
After applying …
Wow, what a reduction, that's nice :)
The current rework of the responder is actually equivalent to how things were previously, before the memory opt stuff (but without …). I believe it still needs some reworking to keep tracking reorged disputes for a longer time (+ actually track reorged disputes for non-reorged penalties), but dunno whether that should be in a follow-up since this one has grown a lot. Thoughts @sr-gi?
I think this makes sense, but it is starting to get to the point where a cleanup would be helpful, especially to be able to properly review each commit individually.
It would need a more thorough review, but see this as an Approach ACK.
Ummm, I thought we could only do so many optimizations at first and relied on some in-memory structs being there. Incrementally, I realized we could remove these as well, and updated the old commits, so it's a bit confusing not having done all of them in a single commit. I would suggest squash-reviewing all the commits (minus the test-fixing one & the merge one) at once, as separating each of them into a standalone commit would be a very nasty rebase (and probably end up squashing most/all of them together).
Fair, as long as there is a cleanup of old comments and suggestions are addressed.
Force-pushed from db73cd1 to 9150458.
Final nits.
Can you rebase so I can review this from the new base? I think only tests are missing, but just to avoid having to go through that again.
You can leave squashing for later, after the final review.
Force-pushed from 43cb570 to c9d4c3f.
Force-pushed from c9d4c3f to 550305f.
This is a full review, including tests. It is looking pretty good.
We should be tracking things that are pending so we do not forget them in the follow-ups. I've added comments for some, here are more (some may be duplicated):
- Store data as reorged in the DB so we don't have to work around it in check_confirmations
- Return a list of locators instead of UUIDs when querying for user appointments
- handle_reorged_txs after landing "Improves reorg logic by checking whether we are on sync with the backend or not" #235, to avoid hacking things around ConfirmationStatus::IrrevocablyResolved
- rebroadcast_stale_txs after landing "Improves reorg logic by checking whether we are on sync with the backend or not" #235, to deal with the case where the tower was off for some time and then force updated
assert_eq!(
    dbm.get_appointment_length(uuid).unwrap(),
    appointment.inner.encrypted_blob.len()
);
This is a bit tricky because all appointments created using generate_random_appointment do use the exact same penalty, which is hardcoded.
I don't really think it is a big issue, given the assertion doesn't need to work with random sizes, but in prod appointments will certainly have different lengths.
Ideally, a "random" transaction could be created by just modifying some of the transaction bits, such as value, prev_txid, ...
It may be worth adding something along these lines to the testing suite at some point.
prev_txid is already random (used for encryption), but I don't think this affects the length of the encrypted bytes.
pub struct Transaction {
    /// The protocol version, is currently expected to be 1 or 2 (BIP 68).
    pub version: i32,
    /// Block number before which this transaction is valid, or 0 for valid immediately.
    pub lock_time: u32,
    /// List of transaction inputs.
    pub input: Vec<TxIn>,
    /// List of transaction outputs.
    pub output: Vec<TxOut>,
}
We can add more inputs/outputs to the transaction. A variable-size OP_RETURN should do the trick.
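For instance, a hedged sketch assuming the older rust-bitcoin API shown in the struct above (plain u64 `value`, `Builder`-built scripts); `pad_with_op_return` is a hypothetical helper, not something that exists in the test suite:

```rust
use bitcoin::blockdata::opcodes::all::OP_RETURN;
use bitcoin::blockdata::script::Builder;
use bitcoin::{Transaction, TxOut};

/// Appends a zero-value, unspendable OP_RETURN output whose size depends on
/// `padding`, so two otherwise identical transactions serialize (and encrypt)
/// to different lengths.
fn pad_with_op_return(mut tx: Transaction, padding: &[u8]) -> Transaction {
    let script_pubkey = Builder::new()
        .push_opcode(OP_RETURN)
        .push_slice(padding)
        .into_script();
    tx.output.push(TxOut {
        value: 0,
        script_pubkey,
    });
    tx
}
```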
@@ -329,6 +270,9 @@ impl Watcher {
            .lock()
            .unwrap()
            .store_appointment(uuid, appointment)
            // TODO: Don't unwrap, or better, make this insertion atomic with the
Add this to the pending fixes issue
I have typed up the issues (one for add_appointment atomicity issues & one for DB persistence of tracker status) but will wait to add code references to them from this PR once it's on master.
Force-pushed from a012db4 to 19ea462.
Most comments have been addressed; I left additional comments on the ones that have not. Once those are addressed, this should be good to go. It will need both rebasing and squashing.
Force-pushed from 83614ec to b4e6ba6.
Force-pushed from 5d32e26 to f965c40.
By loading the minimal necessary data during bootstrap, we get lower memory usage and faster bootstrapping.

Co-authored-by: Sergi Delgado Segura <sergi.delgado.s@gmail.com>
`last_known_blocks` was taking up ~300MiB of memory (for 100 blocks) because it was not dropped in `main`.

Co-authored-by: Sergi Delgado Segura <sergi.delgado.s@gmail.com>
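A minimal illustration of this kind of fix (the variable is a stand-in, not the actual teosd bootstrap code): bindings in `main` live until the process exits, so bootstrap-only data has to be dropped explicitly once it has been consumed.

```rust
fn main() {
    // Stand-in for the ~100 cached blocks that are only needed during bootstrap.
    let last_known_blocks: Vec<Vec<u8>> = vec![vec![0u8; 10_000]; 100];

    // ... bootstrap using the cached blocks ...
    let bootstrapped_with = last_known_blocks.len();

    // Without this, the buffers stay allocated for the whole lifetime of the
    // daemon, since `main` never returns while the tower is running.
    drop(last_known_blocks);

    println!("bootstrapped with {bootstrapped_with} cached blocks");
}
```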
Force-pushed from f965c40 to fbcfebb.
Regarding the `Watcher`, the fields (appointments, locator_uuid_map) have been replaced by DB calls when needed. For the `Responder`, the field `trackers` has been replaced by DB calls when needed, and `tx_tracker_map` wasn't actually needed for the tower to operate, so it was just dropped. For the `GateKeeper`, `registered_users::appointments`, which used to hold the uuids of every appointment the user submitted, was removed so that `registered_users` only holds meta information about users.

The gatekeeper is now also the entity responsible for deleting appointments from the database. Instead of the watcher/responder asking the gatekeeper for the users to update and carrying out the deletion and update itself, the watcher/responder now hands the gatekeeper the uuids to delete, and the gatekeeper figures out which users it needs to update (refund the freed slots to). Also, like in `Watcher::store_triggered_appointment`, if the appointment is invalid or was rejected by the network during block connections, the freed slots will not be refunded to the user.

The block connection order now starts with the gatekeeper first; this allows the gatekeeper to delete the outdated users so that the watcher and the responder don't take them into account.
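A rough sketch of that handoff, with hypothetical names and a lookup closure standing in for the DB query (this is not the actual rust-teos API):

```rust
use std::collections::HashMap;

type Uuid = u128;
type UserId = u64;

/// The caller (watcher/responder) only hands over the uuids to delete; the
/// gatekeeper resolves who owned each appointment (e.g. via a DB query) and
/// decides whether the freed slots are given back.
fn delete_appointments(
    uuids: &[Uuid],
    owner_and_slots: impl Fn(Uuid) -> Option<(UserId, u32)>, // stand-in for a DB lookup
    refund: bool,
    available_slots: &mut HashMap<UserId, u32>,
) {
    for &uuid in uuids {
        if let Some((user, slots)) = owner_and_slots(uuid) {
            if refund {
                // Invalid/rejected appointments (refund == false) do not give
                // their slots back, mirroring store_triggered_appointment.
                *available_slots.entry(user).or_default() += slots;
            }
        }
    }
}

fn main() {
    let mut available_slots = HashMap::from([(7u64, 1u32)]);
    delete_appointments(&[1, 2], |uuid| Some((7, uuid as u32)), true, &mut available_slots);
    assert_eq!(available_slots[&7], 4);
}
```

The point of the design is that the slot-accounting policy lives in one place (the gatekeeper), while callers only report which appointments went away.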
Force-pushed from fbcfebb to 1790fe3.
LGTM
Just by loading the minimal necessary data plus dropping the last_known_blocks, we get a huge reduction (2GB -> 1.2GB for 2.2M appointments*). We can even do better and further reduce the data stored in memory (dropping user_ids and potentially locators).

* This is on macOS. On Linux we get to as low as 1.3G, but only 1.0G is actually being used; we know that because if we call malloc_trim we get to 1.0G.

Fixes #224
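For reference, a hedged sketch of how that trimming can be triggered from Rust on a glibc Linux target (assumes the `libc` crate; `malloc_trim` is glibc-specific, so other platforms simply no-op):

```rust
/// Ask glibc to return freed-but-cached heap pages to the OS, which is what
/// makes the resident set drop from ~1.3G to ~1.0G in the measurement above.
#[cfg(all(target_os = "linux", target_env = "gnu"))]
fn trim_heap() {
    // Safety: malloc_trim only touches the allocator's own bookkeeping.
    unsafe {
        libc::malloc_trim(0);
    }
}

#[cfg(not(all(target_os = "linux", target_env = "gnu")))]
fn trim_heap() {}

fn main() {
    // ... after the bootstrap-only data has been dropped ...
    trim_heap();
}
```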