feat(storage): check and clean for pallet smart contract #659

Merged
merged 27 commits into from
May 25, 2023

Conversation

renauter
Collaborator

@renauter renauter commented Apr 7, 2023

@renauter renauter changed the title feat: smart contract pallet storage check feat: storage check fot pallet smart contract Apr 7, 2023
@renauter renauter marked this pull request as ready for review April 14, 2023 17:52
@renauter
Collaborator Author

renauter commented Apr 14, 2023

To run the "migration" live against Devnet, use:

cargo run --release --features=try-runtime try-runtime --runtime target/release/wbuild/tfchain-runtime/tfchain_runtime.compact.wasm --chain chainspecs/main/chainSpecRaw.json on-runtime-upgrade live --uri ws://10.10.0.151:9944

Note that the process consists of:
(1) check storage and display the warnings
(2) clean storage
(3) re-check storage with the exact same code as (1); no warnings should remain
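
As an illustration of the check / clean / re-check flow listed above, here is a minimal sketch on an in-memory map. It is not the pallet code: the `check` and `clean` helpers, the `contracts` map (contract id to node id) and the `nodes` set are hypothetical stand-ins for on-chain storage, and "dangling node reference" is just one assumed kind of warning.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns the ids of contracts that reference a node missing from `nodes`.
/// (Hypothetical check; the real pallet checks many more storage items.)
fn check(contracts: &BTreeMap<u64, u32>, nodes: &BTreeSet<u32>) -> Vec<u64> {
    contracts
        .iter()
        .filter(|(_, node_id)| !nodes.contains(*node_id))
        .map(|(contract_id, _)| *contract_id)
        .collect()
}

/// Removes every contract flagged by `check` and returns how many were removed.
fn clean(contracts: &mut BTreeMap<u64, u32>, nodes: &BTreeSet<u32>) -> usize {
    let dangling = check(contracts, nodes);
    for contract_id in &dangling {
        contracts.remove(contract_id);
    }
    dangling.len()
}

fn main() {
    let nodes = BTreeSet::from([1u32, 2]);
    let mut contracts = BTreeMap::from([(10u64, 1u32), (11, 7), (12, 2)]);

    // (1) check storage and display the warnings
    println!("warnings before cleaning: {:?}", check(&contracts, &nodes));
    // (2) clean storage
    println!("entries removed: {}", clean(&mut contracts, &nodes));
    // (3) re-check with the exact same code; nothing should remain
    assert!(check(&contracts, &nodes).is_empty());
}
```

Reusing the same `check` function before and after cleaning is what makes step (3) a meaningful confirmation.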

@renauter renauter changed the title feat: storage check fot pallet smart contract feat(storage): check and clean for pallet smart contract Apr 17, 2023
Contributor

@DylanVerstraete DylanVerstraete left a comment

Where is the new storage version actually set?


impl<T: Config> OnRuntimeUpgrade for CheckStorageStateV9<T> {
    fn on_runtime_upgrade() -> Weight {
        if PalletVersion::<T>::get() == types::StorageVersion::V9 {
Contributor

This won't work on all networks, since the version on qa/test/main is v8

@renauter
Collaborator Author

renauter commented Apr 17, 2023

Where is the new storage version actually set?

It is not set because there is no modification of the structure, and this check/clean is valid for v9

This won't work on all networks, since the version on qa/test/main is v8

Indeed, but this check/clean targets a specific version.
When we upgrade qa/test/main they will move to v9 and the "migration" can be applied

Where is the pre and post check?

The pre and post checks are used by testing tools and are not triggered when running the command

cargo run --release --features=try-runtime try-runtime --runtime target/release/wbuild/tfchain-runtime/tfchain_runtime.compact.wasm --chain chainspecs/main/chainSpecRaw.json on-runtime-upgrade live --uri ws://10.10.0.151:9944

I tried to integrate them using try_on_runtime_upgrade(), without success.
So for now on_runtime_upgrade() does check + clean + check.
I am open to a better solution.

@@ -25,7 +25,7 @@ pub enum StorageVersion {

 impl Default for StorageVersion {
     fn default() -> StorageVersion {
-        StorageVersion::V9
+        StorageVersion::V8
Contributor

Since we are going to V9, default is now V9

@renauter
Collaborator Author

renauter commented Apr 18, 2023

I added the weights and made it compatible with v8 and v9.
Currently v8 is on qa/test/mainnet and v9 is on devnet.
But v9 (which is a billing loop cleaning) can be discarded, since the checking/cleaning in this PR already includes what v9 does.

Run on:

  • Devnet
  • QAnet
  • Testnet
  • Mainnet

No warnings remain after cleaning.


Comment on lines 689 to 696
if !ContractLock::<T>::contains_key(contract_id) {
    let now = <timestamp::Pallet<T>>::get().saturated_into::<u64>() / 1000;
    r += 1;
    let mut contract_lock = types::ContractLock::default();
    contract_lock.lock_updated = now;
    ContractLock::<T>::insert(contract_id, contract_lock);
    w += 1;
}
Collaborator Author

I am not sure this is the right thing to do when no contract lock is associated with a contract
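
The `r` and `w` variables in the snippet above count storage reads and writes so the migration can report a weight. A minimal sketch of that bookkeeping follows, under stated assumptions: `READ_COST_PS` and `WRITE_COST_PS` are illustrative per-operation costs chosen for this example (not taken from this PR), and `reads_writes_weight` is a hypothetical helper. In a FRAME runtime this conversion would normally go through the runtime's configured database weight rather than hard-coded constants.

```rust
/// Assumed per-operation database costs in picoseconds (illustrative values only).
const READ_COST_PS: u64 = 25_000_000;
const WRITE_COST_PS: u64 = 100_000_000;

/// Converts read/write counters into a reference-time weight, saturating on overflow.
fn reads_writes_weight(reads: u64, writes: u64) -> u64 {
    reads
        .saturating_mul(READ_COST_PS)
        .saturating_add(writes.saturating_mul(WRITE_COST_PS))
}

fn main() {
    let (mut r, mut w) = (0u64, 0u64);
    // Pretend three contracts were inspected and one of them had no lock.
    for has_lock in [true, false, true] {
        r += 1; // one read to test whether the lock exists
        if !has_lock {
            w += 1; // one write to insert a default lock
        }
    }
    println!("reads = {}, writes = {}, weight = {} ps", r, w, reads_writes_weight(r, w));
}
```

Saturating arithmetic keeps the counter conversion from overflowing when a migration touches a very large number of entries.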

@DylanVerstraete
Contributor

DylanVerstraete commented Apr 25, 2023

Current weight is above the block threshold:

TryRuntime_on_runtime_upgrade executed without errors. Consumed weight = (2462275000000 ps, 0 byte), total weight = (2000000000000 ps, 18446744073709551615 byte) (123.11 %, 0.00 %).

-> 123%
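
For reference, the percentage in the log above is simply the consumed reference time divided by the block limit. A small sketch of that arithmetic, where `weight_percent` is a hypothetical helper name and the two inputs are the picosecond figures copied from the log line:

```rust
/// Returns the consumed weight as a percentage of the block limit.
fn weight_percent(consumed_ps: u64, limit_ps: u64) -> f64 {
    consumed_ps as f64 / limit_ps as f64 * 100.0
}

fn main() {
    // Figures taken from the try-runtime log line above.
    let consumed_ps = 2_462_275_000_000u64;
    let limit_ps = 2_000_000_000_000u64;
    println!("{:.2} %", weight_percent(consumed_ps, limit_ps));
}
```

Anything above 100 % means the migration cannot fit in a single block as written.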

@DylanVerstraete
Contributor

@renauter I refactored the logging, and set all logs to debug. You can now see all the logs from your migration like:

RUST_LOG=debug cargo run --release --features=try-runtime try .......

I did this because logging in production also takes up compute time.

I also put back the Post/Pre checks. I know why they were not firing, see:

let weight = Executive::try_runtime_upgrade(true).unwrap();
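
To make the role of that boolean concrete, here is a self-contained sketch of an upgrade driver that runs the pre/post checks only when asked to. It assumes the `true` argument in the line above is what enables the checks, which is how the comment reads. The `Migration` trait, `DropZeroIds` and `run_upgrade` are hypothetical names for illustration and are not the Substrate API.

```rust
/// Hypothetical migration interface, loosely modelled on pre/post upgrade hooks.
trait Migration {
    /// Counts problems before the upgrade (testing tools only).
    fn pre_check(&self, state: &[u64]) -> usize;
    /// The part that would run on-chain; returns a stand-in "weight".
    fn apply(&self, state: &mut Vec<u64>) -> u64;
    /// Verifies the result after the upgrade (testing tools only).
    fn post_check(&self, state: &[u64]) -> Result<(), String>;
}

/// Example migration: contract id 0 is treated as invalid and removed.
struct DropZeroIds;

impl Migration for DropZeroIds {
    fn pre_check(&self, state: &[u64]) -> usize {
        state.iter().filter(|id| **id == 0).count()
    }

    fn apply(&self, state: &mut Vec<u64>) -> u64 {
        let before = state.len();
        state.retain(|id| *id != 0);
        (before - state.len()) as u64 // stand-in weight: number of removed entries
    }

    fn post_check(&self, state: &[u64]) -> Result<(), String> {
        match self.pre_check(state) {
            0 => Ok(()),
            n => Err(format!("{} invalid ids remain", n)),
        }
    }
}

/// Runs the migration; the checks only fire when `checks` is true.
fn run_upgrade<M: Migration>(m: &M, state: &mut Vec<u64>, checks: bool) -> Result<u64, String> {
    if checks {
        println!("pre-check: {} invalid ids", m.pre_check(state.as_slice()));
    }
    let weight = m.apply(state);
    if checks {
        m.post_check(state.as_slice())?;
    }
    Ok(weight)
}

fn main() {
    let mut state = vec![0u64, 3, 0, 7];
    let weight = run_upgrade(&DropZeroIds, &mut state, true).unwrap();
    println!("weight = {}, state = {:?}", weight, state);
}
```

Keeping the checks out of `apply` means their cost is never counted in the weight of the on-chain part.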

@DylanVerstraete
Contributor

Now that we know the storage migration takes 123% of a block's total weight, we need to think about an execution strategy. Either we optimize the migration code, or we do a rolling migration that is triggered by an extrinsic?

@renauter
Collaborator Author

I also put back the Post/Pre checks. I know why they were not firing, see:

Nice! So I guess I can remove the weights from the checking process

@renauter
Collaborator Author

Now that we know the storage migration takes 123% of a block's total weight, we need to think about an execution strategy. Either we optimize the migration code, or we do a rolling migration that is triggered by an extrinsic?

I just re-checked. Now that the checking process runs in the pre/post checks, as it should (it was never meant to be executed on-chain, only by testing tools), the 123% drops to 30%:

TryRuntime_on_runtime_upgrade executed without errors. Consumed weight = (604750000000 ps, 0 byte), total weight = (2000000000000 ps, 18446744073709551615 byte) (30.24 %, 0.00 %).

Which corresponds to the cleaning process alone that would be performed on-chain.
So no need to formulate a strategy, correct?

@DylanVerstraete
Contributor

@renauter on which network did you execute to get a 30% blockweight execution time? It's crucial this is measured against mainnet (which is the biggest chain in terms of data)

@renauter
Collaborator Author

@renauter on which network did you execute to get a 30% blockweight execution time? It's crucial this is measured against mainnet (which is the biggest chain in terms of data)

Indeed, that was on devnet, my bad.
With the current code we have:

  • Devnet => 30%
  • QAnet => 13%
  • Testnet => 43%
  • Mainnet => 123%

So for the strategies:

  1. Optimize migration code
    I am going to check, but at first sight it seems complicated to reduce

  2. Rolling migration that is triggered by an extrinsic
    I am not sure I get this one, because the extrinsic will remain, right?
    Maybe not the best in terms of impact

Is there no other way to run the migration in 2 steps?
e.g. we split the migration code in 2 (v9_1 and v9_2) and run them at different moments (blocks)
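
One way to picture a migration spread over several blocks, generalizing the two-step idea above, is a cursor that remembers where the previous block stopped and a per-block budget of entries. The sketch below is only an illustration on an in-memory map: `RollingMigration`, `next_key`, `budget` and the `bool` "valid" flag are hypothetical, and it makes no claim about how the migration in this PR was finally implemented.

```rust
use std::collections::BTreeMap;

/// Hypothetical cursor-based migration: visits at most `budget` entries per block.
struct RollingMigration {
    next_key: u64, // first key that has not been visited yet
    done: bool,
}

impl RollingMigration {
    /// Processes one block's worth of entries and returns how many were visited.
    fn step(&mut self, storage: &mut BTreeMap<u64, bool>, budget: usize) -> usize {
        if self.done || budget == 0 {
            return 0;
        }
        // Collect the keys first so the map can be mutated afterwards.
        let keys: Vec<u64> = storage
            .range(self.next_key..)
            .map(|(k, _)| *k)
            .take(budget)
            .collect();
        for key in &keys {
            // "Clean" the entry: drop it when it is flagged invalid (false).
            if storage.get(key) == Some(&false) {
                storage.remove(key);
            }
        }
        match keys.last() {
            // A full batch: more entries may remain, so advance the cursor.
            Some(last) if keys.len() == budget => match last.checked_add(1) {
                Some(next) => self.next_key = next,
                None => self.done = true,
            },
            // A short or empty batch: the end of the map was reached.
            _ => self.done = true,
        }
        keys.len()
    }
}

fn main() {
    let mut storage: BTreeMap<u64, bool> =
        BTreeMap::from([(1, true), (2, false), (3, true), (4, false), (5, true)]);
    let mut migration = RollingMigration { next_key: 0, done: false };
    let mut block = 0;
    while !migration.done {
        block += 1;
        let visited = migration.step(&mut storage, 2);
        println!("block {}: visited {} entries", block, visited);
    }
    println!("remaining keys: {:?}", storage.keys().collect::<Vec<_>>());
}
```

The budget is the knob that would be tuned so each step stays well under the block weight limit on the largest network.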

@DylanVerstraete
Contributor

@renauter
Collaborator Author

@DylanVerstraete I am out of reach on Telegram,
which has been blocked in Brazil since yesterday
https://apnews.com/article/brazil-telegram-suspension-social-media-school-violence-d72acaacd3c1b4d07c2c4fcb094f4ce6
I sent an alternative via gtalk

@DylanVerstraete
Contributor

Oh damn, that's some hard censorship right there

@renauter
Collaborator Author

Oh damn, that's some hard censorship right there

Indeed, it's the second time in 1 year ...

@renauter
Collaborator Author

I adapted the strategy of the migration code to be able to simulate the rolling migration and check that storage was actually cleaned, by executing the following steps:

(1) run a local node with the current (forked) chain_specs of dev/test/mainnet

(2) run a try-runtime operation on local node with the checking code in pre_upgrade() to check storage BEFORE cleaning

RUST_LOG=debug cargo run --release --features=try-runtime try-runtime --runtime target/release/wbuild/tfchain-runtime/tfchain_runtime.compact.wasm --chain chainspecs/main/chainSpecRaw.json on-runtime-upgrade live --uri ws://localhost:9944

(3) do a runtime upgrade using set_code() extrinsic (via polkadot.js UI) to start the rolling migration of the storage cleaning

(4) when the migration has ended, repeat step (2) to check storage AFTER cleaning

Contributor

@brandonpille brandonpille left a comment

Great work. Some comments:

contract_id, contract.contract_id
);
}
if !contract_id_range.contains(&contract_id) {
Contributor

I'm not sure I get why you are checking whether the id is in the range. Isn't that always the case? And if it were not the case, why not do:

let current_contract_id = ContractID::<T>::get();

to then replace this if by:

if contract_id >= current_contract_id

Collaborator Author

@renauter renauter May 23, 2023

I'm not sure I get why you are checking whether the id is in the range. Isn't that always the case?

Indeed, it should always be the case; that's why I am checking it, to identify bad cases that could result from old code.
You never know, I have seen some unexpected cases while investigating the storage.

And if it were not the case, why not do: ...

Because I also want to check contract_id > 0, so by using the range I check both at the same time, with the same complexity I guess

Contributor

But ContractID is u64 so it can never be < 0... It will always be >= 0

Collaborator Author

Indeed, but we still want to identify the contract_id == 0 case

Contributor

Yes but that simplifies the check to

if contract_id == 0 {
  // log
}

Collaborator Author

Yes, I could have done

if contract_id == 0 || contract_id > ContractID::<T>::get() {
  // log
}

but I found it more elegant to do

let contract_id_range = 1..=ContractID::<T>::get();
if !contract_id_range.contains(&contract_id) {
  // log
}

I think the complexity is the same, correct?
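
On the complexity question: the two forms above can be compared side by side in plain Rust. In the sketch below, `is_valid_range` and `is_valid_explicit` are hypothetical helper names, and `max_id` stands in for the value read from `ContractID::<T>::get()`. `RangeInclusive::contains` on integers is a pair of comparisons, so both forms should be constant time.

```rust
/// Range-based check: valid ids are 1..=max_id.
fn is_valid_range(contract_id: u64, max_id: u64) -> bool {
    (1..=max_id).contains(&contract_id)
}

/// Explicit comparison form of the same check.
fn is_valid_explicit(contract_id: u64, max_id: u64) -> bool {
    contract_id != 0 && contract_id <= max_id
}

fn main() {
    let max_id = 5;
    for id in [0u64, 1, 5, 6] {
        // Both forms agree on every id, including the boundaries.
        assert_eq!(is_valid_range(id, max_id), is_valid_explicit(id, max_id));
        println!("id {} valid: {}", id, is_valid_range(id, max_id));
    }
}
```

The range form also behaves sensibly when `max_id` is 0: the range is empty, so every id is reported as invalid.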

Comment on lines 9 to 18
// ✅ ContractsToBillAt
// ✅ Contracts
// ✅ ActiveNodeContracts
// ✅ ActiveRentContractForNode
// ✅ ContractIDByNodeIDAndHash
// ✅ ContractIDByNameRegistration
// ✅ ContractLock
// ✅ SolutionProviders
// ✅ ContractBillingInformationByID
// ✅ NodeContractResources
Contributor

Remove commented code

"🔎 CleanBillingLoop pre migration: Number of existing billing loop indexes {:?}",
contracts_to_bill_count
fn check_node_contract<T: Config>(node_id: u32, contract_id: u64, deployment_hash: HexHash) {
if let Some(_) = pallet_tfgrid::Nodes::<T>::get(node_id) {
Contributor

The contained value of the Some variant is not used, so this check only cares if the key exists at all, which according to the docs can be checked as:

Suggested change
- if let Some(_) = pallet_tfgrid::Nodes::<T>::get(node_id) {
+ if pallet_tfgrid::Nodes::<T>::contains_key(node_id) {

Collaborator Author

Indeed, which is also equivalent to:

if pallet_tfgrid::Nodes::<T>::get(node_id).is_some() {

correct?
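
The equivalence asked about here can be shown with a standard `HashMap` standing in for the storage map (an assumption made so the example is self-contained; `exists_via_get` and `exists_via_contains_key` are hypothetical helper names). Both forms return the same boolean. For an on-chain storage map, to the best of my understanding, `get` also decodes the stored value while `contains_key` only tests for existence, which is the usual reason to prefer the latter when the value is unused.

```rust
use std::collections::HashMap;

/// Existence test that fetches the value and then discards it.
fn exists_via_get(nodes: &HashMap<u32, String>, node_id: u32) -> bool {
    nodes.get(&node_id).is_some()
}

/// Existence test that never touches the value.
fn exists_via_contains_key(nodes: &HashMap<u32, String>, node_id: u32) -> bool {
    nodes.contains_key(&node_id)
}

fn main() {
    let mut nodes = HashMap::new();
    nodes.insert(1u32, String::from("node-1"));

    for node_id in [1u32, 2] {
        // Same answer either way; only the work done to get it differs.
        assert_eq!(
            exists_via_get(&nodes, node_id),
            exists_via_contains_key(&nodes, node_id)
        );
        println!("node {} exists: {}", node_id, exists_via_contains_key(&nodes, node_id));
    }
}
```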

pre_contracts_to_bill_count,
"Number of billing loop indexes migrated does not match"
fn check_rent_contract<T: Config>(node_id: u32, contract: &types::Contract<T>) {
if let Some(_) = pallet_tfgrid::Nodes::<T>::get(node_id) {
Contributor

Suggested change
- if let Some(_) = pallet_tfgrid::Nodes::<T>::get(node_id) {
+ if pallet_tfgrid::Nodes::<T>::contains_key(node_id) {

@DylanVerstraete DylanVerstraete merged commit 704fc01 into development May 25, 2023
2 checks passed
@DylanVerstraete DylanVerstraete deleted the development_runtime_storage_checking branch May 25, 2023 06:35
4 participants