Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need stricter checks for minting #27

Closed
sabrinasadik opened this issue Jul 18, 2023 · 9 comments
Closed

Need stricter checks for minting #27

sabrinasadik opened this issue Jul 18, 2023 · 9 comments

Comments

@sabrinasadik
Copy link

@LeeSmet pls add comments here

@LeeSmet
Copy link
Contributor

LeeSmet commented Jul 18, 2023

  1. Reduce allowed uptime out of bounds from 5 minutes to 1 minute (see diff in 5c99834 for reasoning)
  2. Reduce allowed downtime from power managed nodes from 25 hours to 24 hours (25 was an accidental leftover commit after testing purposes)
  3. Add violation for nodes if uptime increased less than time increased (accounting for 1 minute of skew as per point 1). This was part of the original implementation but was later allowed in the early days of V3 since we were still figuring out how infrastructure was handled, and manual validation at that time showed no issues. As it often goes with these things, it has been forgotten and stayed part of the code base ever since.
  4. Enforce max delay between uptime reports of 41 minutes (40 minutes zos interval + 1 minute of skew as per point 1). See Enforce maximum delay of consecutive uptime reports for online nodes #22 for reasoning.
  5. Add violation for nodes if the twin does not have a relay set (happened before, node is essentially not usable for the grid)
  6. Add violation for nodes if the twin has a public key in invalid format (as that won't work for sure) (should never happen)
  7. Fix twin decoding since it's currently broken. Several twins will have an invalid twin id and other garbage data when loaded from chain with the current code
  8. Add violation for the case where a node twin does not exist (should never happen, not sure if the chain prevents this in case a node is attached to the twin).
  9. Add violation for long term clock skew (cfr. Some nodes have a lot of clock skew zos#1914)
  10. Miscellaneous code improvements which aren't directly related to payout (e.g. remove dead code, small performance improvement, better wording of violations for easy debugging, filling in sheets to check based on the data from receipt instead of calculating twice leading to possible discrepancies between data in overview and in actual payout file, ...)

i.e. revert the revert of all these commits.

The following point is in relation to v0.2.0 of the farmerbot and might be better of in its own feature, not sure.

  1. Add max delay of half an hour between node power target rising edge and node uptime event of reboot. Farmerbot v0.2.0 introduced random wakeups to combat fraud. Since periodic wakeups continue as normal, a node could just ignore this (as currently nodes are only required to post uptime every 24/25 hours, see point 2, while farmerbot is running). The idea here is that since a random wakeup is unpredictable, this check essentially makes sure the hardware is still there. This check will also apply for regular wakeups. If the treshold is passed, the node gets stuck with a violation (and it doesn't mint for the month). 30 minutes is debatable but should be sufficient for even heavy duty servers to fully boot. Notice that this (time between rising edge power target and node boot) is also the delay a user has to wait for a node to come online if he deploys on a power managed node.

@LeeSmet
Copy link
Contributor

LeeSmet commented Jul 18, 2023

In addition to the last point, there also needs to be consideration how we can enforce the presence of random wakeups over time in case the farmerbot is running.

@xmonader
Copy link

  1. Add violation for nodes if the twin does not have a relay set (happened before, node is essentially not usable for the grid)
  2. Add violation for nodes if the twin has a public key in invalid format (as that won't work for sure) (should never happen)
  3. Fix twin decoding since it's currently broken. Several twins will have an invalid twin id and other garbage data when loaded from chain with the current code
  4. Add violation for the case where a node twin does not exist (should never happen, not sure if the chain prevents this in case a node is attached to the twin).

I believe violations 5,6,7, 8 aren't on the farmer, but ZOS should refuse to boot new nodes if these ever happens

Is it possible for the farmerbot to expand with after wakeup calls it tries to reserve and deploy on the reported capacity?

@LeeSmet
Copy link
Contributor

LeeSmet commented Jul 20, 2023

while 5 6 and 8 are indeed not really user errors, the goal of the minting is to mint tokens for useable capacity. If any of these conditions are true, then the node is not useable, and therefore it should not get tokens rewarded. While that is probably not the users fault, the minting doesn't care for that. If such an event should happen, the user can still be reimbursed, just not through the minting.

7 is referring to the fact that the current way in which twins are decoded from the chain data is inherently broken. This is just a problem local to the minting.

Is it possible for the farmerbot to expand with after wakeup calls it tries to reserve and deploy on the reported capacity?

I suppose it should be, though if that is implemented there needs to be some mechanism to inform the minting that this failed (which means pushing some data on the chain).

@xmonader
Copy link

I see, were there any actual reports from farmers regarding the zos concerning points? Because if so, some updates need to happen in zos, but if not, we can go a head indeed with these updates

@robvanmieghem we are waiting for your input to proceed with this

@LeeSmet
Copy link
Contributor

LeeSmet commented Jul 31, 2023

Not to my knowledge, and there doesn't seem to be any open bugs about this on the zos repo

@xmonader
Copy link

Alright, should we have a forum post explaining the upcoming changes ?

@LeeSmet
Copy link
Contributor

LeeSmet commented Aug 1, 2023

I suppose there needs to be a GEP about it, something for @sabrinasadik to handle when she's back next week

@scottyeager
Copy link

  1. Reduce allowed downtime from power managed nodes from 25 hours to 24 hours (25 was an accidental leftover commit after testing purposes)

The extra hour here is currently saving us from fallout due to this bug in farmerbot that was introduced with the random wakeups. Since development of farmerbot has been stalled for some time, it's not clear when we'll be able to deploy a fix.

Lets please wait to make this one change in minting until the plan for next version of farmerbot can be carried out.

@LeeSmet LeeSmet closed this as completed Dec 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants