
Synchronization runs very slowly #100

Closed
ashkov opened this issue Oct 7, 2019 · 12 comments

Comments


ashkov commented Oct 7, 2019

With 8 cores and 32 GB of memory, it seems that my Full Node can't synchronize and catch up with the current masterchain block.

[ 3][t 1][1570444307.520390034][lite-client.cpp:322][!query] last masterchain block is (-1,8000000000000000,231810):C2903C8557679E0613260231B5C3FA46C24FE7AF7FF802033F07654214A97946:5D5110A1DF5EC304D35A8244478B0F31E1B7E30FA9A919362FF118266DDDDEC0
[ 3][t 1][1570444307.520483494][lite-client.cpp:283][!testnode] server time is 1570444307 (delta 0)
[ 2][t 1][1570444307.520502329][lite-client.cpp:368][!testnode] server appears to be out of sync: its newest masterchain block is (-1,8000000000000000,231810):C2903C8557679E0613260231B5C3FA46C24FE7AF7FF802033F07654214A97946:5D5110A1DF5EC304D35A8244478B0F31E1B7E30FA9A919362FF118266DDDDEC0 created at 1569919568 (524739 seconds ago according to the server's clock)
latest masterchain block known to server is (-1,8000000000000000,231810):C2903C8557679E0613260231B5C3FA46C24FE7AF7FF802033F07654214A97946:5D5110A1DF5EC304D35A8244478B0F31E1B7E30FA9A919362FF118266DDDDEC0 created at 1569919568 (524739 seconds ago)
BLK#3 = (-1,8000000000000000,231810):C2903C8557679E0613260231B5C3FA46C24FE7AF7FF802033F07654214A97946:5D5110A1DF5EC304D35A8244478B0F31E1B7E30FA9A919362FF118266DDDDEC0

It seems that the latest block is updated more often than the "newest masterchain block" known to the Full Node.
What configuration on Amazon fits a Validator's needs?
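For reference, the "(524739 seconds ago)" figure in the log above is just the server clock minus the block's creation timestamp, both plain Unix times:

```shell
# Lag = server time - block creation time, using the two timestamps from the log
echo $(( 1570444307 - 1569919568 ))   # → 524739 seconds behind
```

Anything much larger than a few minutes means the node is still catching up.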

ton-blockchain (Collaborator) commented

You can find the recommended configuration at https://test.ton.org/Validator-HOWTO.txt


begetan commented Oct 7, 2019

You should wait several hours until your node is fully synchronized. What is the Internet link speed of the server? For the Testnet it looks like the network is the bottleneck, not CPU power; I mean the capacity of the whole network, too. In my tests the CPU utilization rarely hits 200% and never crosses 300%.

During synchronization, network utilization is 20-30 Mbps in both upload and download. You can easily check it with the iftop tool.
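If iftop is not installed, the same numbers can be read straight from the Linux interface counters in /proc/net/dev. A rough one-second sampler (the interface auto-detection is a convenience; adjust to your NIC name):

```shell
# Sample RX bytes twice, one second apart, and print the rate in Mbps.
# Linux-only (/proc/net/dev); falls back to "lo" if no other interface is found.
IFACE=$(awk -F: 'NR>2 {name=$1; gsub(/ /,"",name); if (name != "lo") {print name; exit}}' /proc/net/dev)
IFACE=${IFACE:-lo}
rx_bytes() {
  awk -F: -v i="$IFACE" '{n=$1; gsub(/ /,"",n)} n==i {split($2,a," "); print a[1]}' /proc/net/dev
}
r1=$(rx_bytes); sleep 1; r2=$(rx_bytes)
echo "download on $IFACE: $(( (r2 - r1) * 8 / 1000000 )) Mbps"
```

For a syncing node you would expect this to sit in the 20-30 Mbps range mentioned above.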

ashkov (Author) commented Oct 7, 2019

Thank you. I see CPU load of up to 700%, and network utilization even higher than your numbers. After 17 hours the last block is still 400 000 seconds behind the current time. Is that normal, or has something gone wrong?


begetan commented Oct 8, 2019

@ashkov My test node has been running for about 20 hours now and has still reached only 60% of the block height.

They did a big optimization of disk space usage, but it seems to have led to slower network synchronization. Unfortunately it is unclear how to speed up the synchronization process now.

ton-blockchain (Collaborator) commented

Just a couple of guesses.
Do you use OpenSSL 1.1.1? It is crucial for performance.
Do you use a release build of the validator?
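A quick way to check both points. The rebuild commands are the standard out-of-tree CMake pattern and are shown as comments, since they only make sense inside the node's source checkout:

```shell
# Verify which OpenSSL the system provides (the 1.1.1 series is what matters here):
openssl version

# Rebuild in release mode (illustrative; run inside the source tree):
#   mkdir -p build && cd build
#   cmake -DCMAKE_BUILD_TYPE=Release ..
#   make -j"$(nproc)"
```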

akme (Contributor) commented Oct 8, 2019

OK, I've rebuilt my node as a "Release" build and checked the OpenSSL version, which is 1.1.1. CPU usage looks lower, but sync is still slow.
My guess is that it's because of the SATA HDD, which has low write IOPS.
Is there any documentation on using SSD drives in front of an HDD, so the HDD is used only for archival data?

akme (Contributor) commented Oct 8, 2019

setverbosity 0 helped a bit. Do you have a description of the verbosity levels?

akme (Contributor) commented Oct 8, 2019

OK, I found out that you suggest keeping everything on SSD and only mounting slow storage for archival data.


begetan commented Oct 8, 2019

@akme /var/ton-work/db/archive may be pointed to a low-IOPS external HDD volume. Anyway, I currently see only 3 GB of data there out of 14 GB total.

The HDD may be an issue, because I also run an HDD instance, but it was fine before the last build, from around the 5th of October.
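The relocation pattern discussed here (hot database on SSD, only the archive on the slow volume) can be sketched like this. The sketch uses throwaway temp directories so it is safe to run anywhere; on a real node, substitute /var/ton-work/db/archive and your HDD mount point, and stop the node before moving anything:

```shell
# Demonstrated with temporary directories standing in for the real paths.
DB=$(mktemp -d)        # stands in for /var/ton-work/db (SSD)
HDD=$(mktemp -d)       # stands in for the slow HDD volume
mkdir -p "$DB/archive"
echo pack001 > "$DB/archive/sample.pack"

mv "$DB/archive" "$HDD/archive"     # move the archive onto the slow volume
ln -s "$HDD/archive" "$DB/archive"  # leave a symlink where the node expects the path

cat "$DB/archive/sample.pack"       # the node-facing path still resolves → pack001
```

Note, though, that a later comment in this thread reports sync problems after moving or pruning the archive folder, so treat this as experimental rather than a supported configuration.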

ashkov (Author) commented Oct 9, 2019

@ton-blockchain Upgrading the hardware to the system requirements in https://test.ton.org/Validator-HOWTO.txt solved the problem. Can I rebuild the Validator with the Release flag and keep the existing database? I don't want to re-synchronize the validator.


begetan commented Oct 9, 2019

I see that the full node is again creating millions of files in the /var/ton-work/db/files/ and /var/ton-work/db/archive/files directories, as it did at the first Testnet launch before several resets: #15

Is there any trigger in the network or node software that forces the creation of such a large number of files? I am not sure that millions of files is a good storage layout for any kind of blockchain data.

# df -h | grep sda2
/dev/sda2       1.4T   33G  1.3T   3% /
# df -ih | grep sda2
/dev/sda2         88M  2.7M   85M    4% /
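To see where those inodes actually go, the per-directory file counts can be checked directly (paths as used in this thread; the loop simply skips any directory that doesn't exist on your host):

```shell
# Count regular files under each db subdirectory named in the thread.
for d in /var/ton-work/db/files /var/ton-work/db/archive/files; do
  if [ -d "$d" ]; then
    printf '%s: %s files\n' "$d" "$(find "$d" -type f | wc -l)"
  fi
done
```

The inode column of `df -ih` above (2.7M used) is the constraint this surfaces: a filesystem can run out of inodes long before it runs out of bytes.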


konak commented Feb 21, 2022

It seems the sync issue begins after deleting archive files from the archive folder or moving the archive folder somewhere else.

After installing the validation node from scratch, everything worked perfectly for more than two months. Then I moved the archive folder elsewhere and made a symbolic link to it. The validator database size dropped from 80 GB to 17 GB, and the out-of-sync time started growing; in two days it reached 123940 s. The same thing happened about 3 months ago, when I was just testing the server: after some days of stable operation I simply deleted the files in the archive folder (nothing else) ...

Storage is SSD, not HDD ...
Internet connection: 1 Gbit
16 cores, 32 GB RAM
