
⚠️ Testnet is down (unknown catchain-related disastrous bug) #272

Closed
Skydev0h opened this issue Mar 23, 2020 · 6 comments
Comments

@Skydev0h

The testnet crashed at approximately 19:02:13 +0200.
I do not know the exact reason, but after starting up, the following FATAL occurs:

[catchain-receiver.cpp:585][!catchainreceiver] Check `last_sent_block_->delivered()` failed

If I purge the catchain receiver data, the following error happens:

[catchain-receiver.cpp:316][!catchainreceiver]   Check `!S->blamed()` failed

Also, there are lots of warnings in the logs about an incorrect signature of a left-fork blame, about "CAIN: blaming source", and a CATCHAIN_WARNING that it "got ill" (did the validator catch coronavirus?!).

There was also a report that one of the validators ran out of disk space at around the time this happened.

The network still seems to be down (at least, my validator tries to download something, a state or a catchain, and then exits with a FATAL).
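
As a rough diagnostic aid, here is a minimal sketch that counts how often each failing check appears in a validator log; the message format is only inferred from the lines quoted above, and the log path is whatever you pass on the command line:

```python
import re
import sys
from collections import Counter

# Message format inferred from the FATAL lines quoted above; adjust the pattern
# if your validator log formats these check failures differently.
PATTERN = re.compile(
    r"\[catchain-receiver\.cpp:(\d+)\]\[!catchainreceiver\]\s+Check `(.+?)` failed"
)

def count_check_failures(log_path: str) -> Counter:
    """Count (source line, failed check) pairs seen in the given log file."""
    hits = Counter()
    with open(log_path, errors="replace") as log:
        for line in log:
            match = PATTERN.search(line)
            if match:
                hits[(match.group(1), match.group(2))] += 1
    return hits

if __name__ == "__main__":
    for (line_no, check), count in count_check_failures(sys.argv[1]).most_common():
        print(f"catchain-receiver.cpp:{line_no}: `{check}` failed x{count}")
```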

@Skydev0h (Author)

[screenshot]
It is scary; it seems that the pandemic has reached the TON testnet.

@Skydev0h (Author)

[screenshot]
The moment just before the FATAL.

@Skydev0h (Author) commented Mar 23, 2020

After the latest emergency fix, validation suddenly stops after a few seconds of seemingly successful validation:
[screenshots]
And the last block was 24k seconds ago.

And there are lots of "bad overlay id +PYCoUrtNuCT5CRx8gUgZ3TN8zTc59FKKTbh97Rqgc0=" messages on the other 3 validators.
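
The overlay id above is base64-encoded; a small sketch to decode it into hex as well, in case other logs or tools print overlay ids in hex form (that form is an assumption):

```python
import base64

# The "bad overlay id" quoted above, decoded from base64 into raw bytes and hex
# for cross-referencing; hex output elsewhere is an assumption, not confirmed here.
overlay_id_b64 = "+PYCoUrtNuCT5CRx8gUgZ3TN8zTc59FKKTbh97Rqgc0="
overlay_id = base64.b64decode(overlay_id_b64)
print(len(overlay_id), "bytes:", overlay_id.hex().upper())
```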

@Skydev0h (Author)

[screenshot]
It seems that validation is working as of now, but the state of the config and elector contracts is very strange.

@ton-blockchain (Collaborator)

Well, recovering from a situation where somebody controlling 40% of the testnet validators upgraded them to a buggy version that corrupted the local database, including the current catchain state, is not simple. However, we see that this turned out to be possible, even though it required an emergency update of more than 2/3 of all validators. So let's think of this as an unplanned test of the behavior of the TON Blockchain in such harsh conditions.

Such things are unlikely to happen in the mainnet, because nobody is expected to control such a large portion of the validators there, so they would not all be upgraded to a buggy version simultaneously.

On the flip side, we see that no forks and no invalid blocks were created during the several hours when the testnet was not working properly, so the Catchain consensus turned out to be robust enough to cope with this situation (as it should), even if it generated a lot of error messages in the process. In CAP-theorem terms, we still value consistency (C) higher than availability (A).
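
A back-of-the-envelope illustration of why the chain stalled without forking, assuming the usual BFT-style rule that finalizing a block requires more than 2/3 of the total validator weight (the exact catchain threshold is not stated in this thread):

```python
# Assumes a BFT-style rule: finalizing a block needs agreement from more than
# 2/3 of the total validator weight (the exact catchain rule may differ).
def can_finalize(weight_fraction: float) -> bool:
    return weight_fraction > 2 / 3

corrupted = 0.40            # roughly 40% of weight on the buggy, corrupted version
healthy = 1.0 - corrupted   # the remaining ~60% of weight

print(can_finalize(healthy))    # False -> the healthy majority cannot produce blocks
print(can_finalize(corrupted))  # False -> the corrupted minority cannot fork either
```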

@Skydev0h (Author) commented Mar 24, 2020

Wow, that was a bit unexpected. I tracked the owners by their wallet addresses and did not see anyone near 40% in any voting round.
[screenshot]
This was the last log entry before the last voting round ended, just before the crash.

It seems that your validators have a weight of 624,278 out of 1,277,679 (48.86%); if I subtract my 18.78% (my validators used the latest git commit and were not patched in any way), then 32.36% remains.
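
A quick check of that arithmetic, with the numbers taken from the comment above:

```python
# Re-doing the weight arithmetic above; the numbers are those quoted in the comment.
total_weight = 1_277_679
their_weight = 624_278
my_share_pct = 18.78        # share of validators running the unpatched latest git commit

their_pct = 100 * their_weight / total_weight
remaining_pct = 100 - their_pct - my_share_pct
print(f"their share:     {their_pct:.2f}%")      # ~48.86%
print(f"remaining share: {remaining_pct:.2f}%")  # ~32.36%
```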

I rechecked those percentages in the "current validators" section after the switch:
[screenshot]

Does that mean that someone controls all the other validators? Or was it your fork testing? 🙄

Skydev0h closed this as completed Apr 2, 2020