Auto-senescence #2274

Closed
nathan-at-least opened this Issue Apr 17, 2017 · 21 comments

6 participants
@nathan-at-least
Contributor

nathan-at-least commented Apr 17, 2017

Add a feature to our software that by default causes the node to exit with an error once it is too old, as defined by an explicit public deprecation policy. I prefer allowing users to disable this feature with a configuration setting, because the intent isn't to force them to do anything, but just to ensure that all users know they should upgrade or opt-out.

Contributor

radix42 commented Apr 18, 2017

As long as it spits out a useful error message about needing to upgrade to BOTH stdout and debug.log, that sounds good. I've seen some users habitually only check one or the other when things go wrong.

Contributor

str4d commented Apr 18, 2017

The configuration setting for opting-out should be something like -disabledeprecation=1.0.9, so that if a user upgrades they need to explicitly re-opt-out. This is fine IMHO because:

  • Upgrades could introduce or change functionality that affects the user's reason for disabling deprecation.
  • Upgrades (in the cases this is relevant for) require user intervention already; if the user's node is auto-upgrading, it won't ever hit deprecation.
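A minimal sketch of the version-pinned opt-out described above, written in Python for brevity rather than in zcashd's C++. The function name and the way the setting is passed in are assumptions made for illustration; only the `-disabledeprecation=<version>` idea comes from the comment.

```python
def opt_out_applies(opt_out_version, running_version):
    """Return True only if the opt-out names exactly the running version.

    opt_out_version: value given to -disabledeprecation, or None if unset.
    running_version: version string of the binary being started (hypothetical input).
    """
    return opt_out_version is not None and opt_out_version == running_version


# An opt-out written for 1.0.9 stops applying once the binary is upgraded,
# so the operator has to opt out again explicitly.
print(opt_out_applies("1.0.9", "1.0.9"))   # opt-out still in force
print(opt_out_applies("1.0.9", "1.0.10"))  # stale opt-out, deprecation re-enabled
```

Because the comparison is an exact match, an upgrade silently re-enables deprecation until the setting is updated.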

@daira daira added this to Discussion in Network Upgrade 0 Apr 19, 2017

Contributor

nathan-at-least commented Apr 19, 2017

BTW, we're considering defining the auto-senescence deadline in terms of absolute block height. This has (at least) two advantages: it does not depend on the local clock (and so is not exposed to, e.g., NTP attacks), and it lets us coordinate protocol upgrade schedules based on block heights.
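As a rough illustration of a height-based deadline, the check reduces to a single comparison against the chain tip. This is a Python sketch, not zcashd code; the constant's value and the function name are hypothetical.

```python
# Hypothetical height at which this release considers itself too old.
DEPRECATION_HEIGHT = 200_000


def is_deprecated(chain_height, deprecation_height=DEPRECATION_HEIGHT):
    """True once the best chain has reached the deprecation height.

    The decision uses only the chain height, never the local clock,
    so a skewed or manipulated system time cannot change the outcome.
    """
    return chain_height >= deprecation_height
```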

@daira daira added this to the 1.0.9 milestone Apr 19, 2017

@daira daira moved this from Discussion to Work Queue in Network Upgrade 0 Apr 19, 2017

Contributor

bitcartel commented Apr 23, 2017

Rather than exit, perhaps after a certain age the node should display a suitable message in the console and on the metrics screen, and also fall into RPC safe mode. That is less scary for end users than an abrupt exit.

Contributor

daira commented Apr 23, 2017

And disable mining.

Contributor

radix42 commented Apr 23, 2017

And somehow indicate in the JSON results that the node is in that state and why, so that wallets connected to that node can detect it and tell the user.

Contributor

str4d commented Apr 25, 2017

@bitcartel I don't think the proposal was for an abrupt exit, but to effectively be a call to Shutdown() (so files get flushed and closed, etc.) and then a block during startup that checks for deprecation.

Whether I lean towards auto-shutdown or not depends on the deprecation timescale. I'm currently more in favour of auto-shutdown than against, but if we didn't implement auto-shutdown then I agree it should be strictly more of an effect than if we had sent an alert that put the RPC into safe mode - otherwise, we could just do all this with alerts. (In fact, we still could, by defining a higher alert level, but IIRC we dismissed that idea in a meeting.)
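The two pieces mentioned here (an orderly shutdown once the deadline is reached, plus a check that blocks startup afterwards) could look roughly like the following. This is a Python sketch under assumptions: the `Node` class, its method names, and the `flushed` flag are all hypothetical stand-ins, and `shutdown()` only models what a real `Shutdown()` call would do.

```python
class DeprecatedError(Exception):
    """Raised when a deprecated node is asked to start."""


class Node:
    def __init__(self, deprecation_height):
        self.deprecation_height = deprecation_height
        self.running = False
        self.flushed = False

    def start(self, chain_height):
        # Startup check: refuse to run at all once the deadline has passed.
        if chain_height >= self.deprecation_height:
            raise DeprecatedError("This version is deprecated; please upgrade.")
        self.running = True

    def on_new_block(self, chain_height):
        # Runtime check: shut down cleanly when the deadline block arrives.
        if self.running and chain_height >= self.deprecation_height:
            self.shutdown()

    def shutdown(self):
        # Stand-in for flushing and closing files before exiting.
        self.flushed = True
        self.running = False
```

The startup check is what keeps the node from simply being restarted into the same state after the orderly shutdown.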

Contributor

str4d commented Apr 25, 2017

I'm going to go ahead with implementing the apoptosis variant of senescence, which should be mostly precursor work to safe-mode-style deprecation.

str4d added a commit to str4d/zcash that referenced this issue Apr 25, 2017

@str4d str4d self-assigned this Apr 25, 2017

@daira daira moved this from Work Queue to In Progress in Network Upgrade 0 Apr 26, 2017

Contributor

bitcartel commented Apr 28, 2017

Some nodes are running on headless servers in the cloud. They may be left running without any further human interaction, in order to support a function, e.g. a block explorer. The user may not be checking for warning or deprecation messages in getinfo. As @radix42 mentions, we should consider a way to communicate the change in state from normal to deprecated (assuming RPC safe mode and not shutdown).

Contributor

daira commented Apr 29, 2017

See also #2268 (How to report serious errors in a way that reliably attracts node operators' attention).

Contributor

bitcartel commented Apr 29, 2017

In the world of shrink-wrap software, you can still run old deprecated versions of TurboTax. Users are reassured that they can still access their old data. It might be scary for users if they are not able to launch the software to export their private keys and obtain transactional data stored locally. They may decide that they do not want to run the latest version of zcashd for some reason (technical, political, etc.), so upgrading in order to access their data is not an option. We should look at RPC safe mode to see what functionality a user can still perform and consider if there should be another mode.

Contributor

daira commented Apr 29, 2017

@bitcartel wrote:

> They may decide that they do not want to run the latest version of zcashd for some reason (technical, political, etc), so upgrading in order to access their data is not an option.

As far as I understood the proposed design, they would always be able to stick with an older version by setting a config field (once each time they upgrade).

Contributor

bitcartel commented Apr 30, 2017

Having to update a config field each time adds friction to the upgrade process. One more thing to remember. Having disabledeprecation be a boolean option might be easier for end users.

I quite like the idea of having an option like deprecationpolicy=... which accepts values:

  • none = do nothing (just like now)
  • warning (default) = show deprecated/unsupported warnings in results of RPC commands, metrics, logging, etc.
  • safemode = warnings + put node into safe mode when deprecated
  • shutdown = warnings + shut down node when deprecated

This way, the user is in control and can decide what happens, rather than having the software behave in a way they did not expect because they did not read the documentation, update a config file, etc.
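One way to picture the proposed `deprecationpolicy` option is as an enumeration with `warning` as the default. The sketch below is Python and purely illustrative: the four values come from the list above, while the function names and the action labels are hypothetical.

```python
from enum import Enum


class DeprecationPolicy(Enum):
    NONE = "none"
    WARNING = "warning"
    SAFEMODE = "safemode"
    SHUTDOWN = "shutdown"


def parse_policy(value=None):
    """Map a config string to a policy; an unset option means 'warning'."""
    if value is None:
        return DeprecationPolicy.WARNING
    # Enum lookup by value raises ValueError for unknown strings.
    return DeprecationPolicy(value.lower())


def actions_when_deprecated(policy):
    """List the (hypothetically named) actions a deprecated node would take."""
    if policy is DeprecationPolicy.NONE:
        return []
    actions = ["warn"]
    if policy is DeprecationPolicy.SAFEMODE:
        actions.append("safe_mode")
    elif policy is DeprecationPolicy.SHUTDOWN:
        actions.append("shutdown")
    return actions
```

Rejecting unknown values at parse time means a typo in the config file is reported at startup instead of silently falling back to the default.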

Contributor

radix42 commented Apr 30, 2017

I very much like this idea, @bitcartel

Contributor

daira commented May 1, 2017

Perhaps the node should exit with an error message on startup if the -disabledeprecation setting is present but does not match the current version. Then the node operator is required to make an explicit choice of whether to enable/disable deprecation each time they upgrade.
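A sketch of this stricter variant, in Python for illustration only: a stale opt-out is treated as a startup error instead of being ignored. The exception type, function name, and message wording are assumptions.

```python
class ConfigError(Exception):
    """Raised when the configuration forces the operator to make a choice."""


def check_opt_out(opt_out_version, running_version):
    """Return True if deprecation is disabled for this run.

    An unset option leaves deprecation enabled. A value that names a
    different version aborts startup, so the operator must either update
    the setting or remove it after every upgrade.
    """
    if opt_out_version is None:
        return False
    if opt_out_version != running_version:
        raise ConfigError(
            "-disabledeprecation=%s does not match running version %s; "
            "update or remove the setting" % (opt_out_version, running_version)
        )
    return True
```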

samsmith commented May 3, 2017

Reading the blog post content, I’m unclear on the wisdom of having an automated calendar-based deprecation cycle, when there is a manual release process in which reality can intervene.

Rather than hard-coding it to 4 months (i.e. after 3 intermediate releases), auto-senescence could use information from the chain itself to determine when the client is now too old (and then either exit, warn, or carry on regardless) rather than a hardcoded/predefined date.

Whether you deprecate via a signed message on a chain, detecting newer version numbers of daemons, or block numbers, is something that I leave as an implementation detail.

It would allow you to change the behaviour of deployed code, without having to be locked into past decisions, in the way a calendar deprecation schedule locks you in. Because you guys deserve time off over xmas, and that then screws up your schedule at least once a year.

In the case of a remotely executable vulnerability, or consensus failure, you can simply age out all past clients or drop them into a read-only mode; and this mechanism should also work for third-party Zcash clients, and delayed-but-automated upgrade processes.

Contributor

nathan-at-least commented May 3, 2017

> Rather than hard-coding it to 4 months (i.e. after 3 intermediate releases), auto-senescence could use information from the chain itself to determine when the client is now too old (and then either exit, warn, or carry on regardless) rather than a hardcoded/predefined date.

We plan to use block height, not actual dates as per #2274 (comment). However, note that for this feature the timing doesn't have to match the policy. For this feature I propose we lock down the node based on a block height that has a very high probability of occurring sometime after the 'deprecation date'.

> Whether you deprecate via a signed message on a chain, detecting newer version numbers of daemons, or block numbers, is something that I leave as an implementation detail.

These options have important differences. We're going with block numbers. A signed on-chain message leaves us in control of deprecation as a centralizing force, which we are largely motivated to move away from. (Note: we still retain control over the in-band alert keys, which are still such a centralized control, so I'd like to deal with those separately someday.) Relying on version numbers opens the protocol to various malicious attacks, such as connecting a bunch of nodes with a 'fake' new version to trigger existing nodes to drop out.
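To make "a block height that has a very high probability of occurring sometime after the deprecation date" concrete, here is a back-of-the-envelope calculation in Python. It assumes a 2.5-minute target block spacing and adds a safety margin on top of the policy period; the margin, the function name, and the example numbers are hypothetical, and real block times vary around the target.

```python
TARGET_SPACING_SECONDS = 150  # assumption: 2.5-minute target block spacing


def deprecation_height(release_height, policy_weeks, margin_weeks=2):
    """Pick a shutdown height expected to fall after the policy deadline.

    release_height: chain height around the time of release.
    policy_weeks:   length of the published deprecation period.
    margin_weeks:   hypothetical extra slack so the height lands late, not early.
    """
    blocks_per_week = 7 * 24 * 60 * 60 // TARGET_SPACING_SECONDS  # 4032
    return release_height + (policy_weeks + margin_weeks) * blocks_per_week


# Example: release at height 100000, 16-week policy, 2-week margin.
print(deprecation_height(100_000, 16))  # 172576
```

Erring on the late side means the node never locks down before the published date, at the cost of running slightly past it.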

Contributor

nathan-at-least commented May 3, 2017

@bitcartel: I don't think shutting down is scary. It happens all the time with all kinds of server software for all kinds of reasons. A segfault is scary, but an orderly shutdown with an error message is not.

In fact, several of my zcash nodes just did an orderly shutdown because the disk was full. If instead they kept running but failing to update indexes and returning some RPC results but not others, I wonder how long it would have taken me to realize what the problem was?

Anyway, this is just my own opinionated stance on UX, and I don't mean to impose it as a design requirement. I'm like -0.5 on having more options for behavior on the general rule that more config options means more complex behavior in the wild, so when we're helping users with bugs or problems there's just that many more cases to analyze.

Contributor

nathan-at-least commented May 3, 2017

Actually I think this was disingenuous:

> In fact, several of my zcash nodes just did an orderly shutdown because the disk was full. If instead they kept running but failing to update indexes and returning some RPC results but not others, I wonder how long it would have taken me to realize what the problem was?

In truth, I ran a zcash-cli command, and it reported that there was no server response, so I looked at the log, saw orderly shutdown messages, and then scanned back to see:

2017-05-02 23:48:35 *** Disk space is low!

In fact, if zcash-cli had just immediately said 'error: disk space is low' it would have saved me time, so my argument doesn't pan out in this case. ;-)

Contributor

nathan-at-least commented May 3, 2017

Pondering a bit more: if there were a 'safe mode' that would block all RPC calls and always return an error message with a clear description, that's what I would prefer.

If safe mode only blocks some calls, I consider it dangerous. What if a block explorer is using unfrozen methods, but that node is now on a minority fork without realizing it? People relying on that block explorer might make faulty decisions, etc… The impact is harder to predict.
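The "block everything" behaviour preferred here can be sketched as a dispatcher that refuses every method once the node is deprecated, so no caller can keep reading possibly stale chain data. This is a Python illustration, not zcashd's RPC code; the error code, message text, and function signature are hypothetical.

```python
import json

# Hypothetical error code and message; not taken from zcashd.
DEPRECATED_ERROR_CODE = -1
DEPRECATED_MESSAGE = "This node version is deprecated; upgrade or opt out to restore RPC."


def handle_rpc(method, handlers, deprecated):
    """Dispatch an RPC call, or refuse every call when the node is deprecated."""
    if deprecated:
        # No method is exempt: every call gets the same explicit error.
        return json.dumps({
            "result": None,
            "error": {"code": DEPRECATED_ERROR_CODE, "message": DEPRECATED_MESSAGE},
        })
    return json.dumps({"result": handlers[method](), "error": None})


handlers = {"getblockcount": lambda: 123456}
print(handle_rpc("getblockcount", handlers, deprecated=True))
```

Returning the reason in every response also gives connected wallets something machine-readable to surface to their users.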

str4d added a commit to str4d/zcash that referenced this issue May 12, 2017

Contributor

bitcartel commented May 13, 2017

I think this feature should be disabled by default as it goes against the grain of what users expect. There are few (if any) examples of software doing what we're proposing. It's also a short-term fix in lieu of providing users with an option to auto-upgrade, which users are familiar with.

Also, shouldn't this feature be considered "experimental"?

zkbot added a commit that referenced this issue May 14, 2017

Auto merge of #2297 - str4d:2274-apoptosis, r=nathan-at-least
Implement automatic shutdown of deprecated Zcash versions

Closes #2274.

zkbot added a commit that referenced this issue May 15, 2017

Auto merge of #2297 - str4d:2274-apoptosis, r=nathan-at-least
Implement automatic shutdown of deprecated Zcash versions

Closes #2274.

@zkbot zkbot closed this in #2297 May 15, 2017

@nathan-at-least nathan-at-least moved this from In Progress to Complete in Network Upgrade 0 Jun 19, 2017

@daira daira removed the NU0 wishlist label Nov 9, 2017
