Feat/better exit messages #657

Merged
merged 32 commits into from
Jun 26, 2023

Conversation

rvermootenct
Contributor


name: Pull Request
about: Create a pull request to make a contribution
labels:


IMPORTANT NOTICE:
I read and understood the guidelines for contributions to the TRD. The contribution may qualify for being compensated by the TRD grant if approved by the maintainers.

This PR resolves the issue #653 . The following steps were performed:

  • Analysis: It would be nice for TRD to give exit codes that aren't just 0. It would also be nice to have a utility that we could extend with different exit situations.

  • Solution: Create that utility

  • Implementation: Created a lil exit utility and attached it to the log file duplicate check.

  • Performed tests: Added tests and ran it with a lockfile present, confirmed exit code was not 0.

  • Documentation: I think the enum idea self documents? I'm on the fence about having different exit codes for different exit reasons we can come up with or just have 0 and 1, thoughts @nicolasochem?

Work effort:

  • GIF tax:
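
A minimal sketch of the kind of exit utility described above, assuming an enum of exit codes. The member names `USER_ABORT` and `GENERAL_ERROR` come up later in this thread; the numeric values and the exact signature are illustrative assumptions, not the code in this PR.

```python
import sys
from enum import IntEnum


class ExitCode(IntEnum):
    # Numeric values are assumptions chosen for illustration.
    SUCCESS = 0
    GENERAL_ERROR = 1
    USER_ABORT = 2


def exit_program(exit_code=ExitCode.SUCCESS, exit_message=""):
    """Print an optional message to stderr and exit with the given code."""
    if exit_message:
        print(exit_message, file=sys.stderr)
    # sys.exit raises SystemExit; if nothing catches it, the interpreter
    # uses the value as the process exit status.
    sys.exit(int(exit_code))
```

Using an `IntEnum` keeps each exit reason named in one place, which is roughly what "self documents" would mean here.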

jdsika added the bug label on Jan 31, 2023
jdsika added this to the v11.5 (Lima) milestone on Jan 31, 2023
@jdsika
Contributor

jdsika commented Feb 1, 2023

@nicolasochem does that solve your issue?

@jdsika
Contributor

jdsika commented Feb 2, 2023

@rvermootenct work effort missing. I propose to merge today and make further improvements in a separate PR

@nicolasochem
Contributor

@rvermootenct @jdsika thanks!

It does sound useful to have a utility, and I think having 3 possible exit codes is fine.

After a cursory look, it looks like you are only catching USER_ABORT here? And never GENERAL_ERROR?

Can you also add the cases where the program ends in error:

  • because of a crash
  • unreachable signer
  • not enough money in the payout
  • unreachable RPC

These are the scenarios for which I would like my infra to alert me.
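
One way the scenarios listed above could map to exit codes, as a sketch only. `SIGNER_ERROR` and `GENERAL_ERROR` are names that appear in this thread; `RPC_ERROR`, `INSUFFICIENT_FUNDS`, the exception classes, the numeric values, and `exit_code_for` are hypothetical and exist only for this illustration.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    # All numeric values are illustrative assumptions.
    SUCCESS = 0
    GENERAL_ERROR = 1
    SIGNER_ERROR = 2
    RPC_ERROR = 3
    INSUFFICIENT_FUNDS = 4


# Hypothetical exception types standing in for the failure scenarios.
class SignerUnreachableError(Exception):
    pass


class RpcUnreachableError(Exception):
    pass


class InsufficientFundsError(Exception):
    pass


ERROR_CODES = {
    SignerUnreachableError: ExitCode.SIGNER_ERROR,
    RpcUnreachableError: ExitCode.RPC_ERROR,
    InsufficientFundsError: ExitCode.INSUFFICIENT_FUNDS,
}


def exit_code_for(task):
    """Run task() and translate its outcome into an exit code."""
    try:
        task()
    except Exception as err:
        for exc_type, code in ERROR_CODES.items():
            if isinstance(err, exc_type):
                return code
        # Anything unrecognised, including a plain crash, is a general error.
        return ExitCode.GENERAL_ERROR
    return ExitCode.SUCCESS
```

The catch-all branch is the part that matters for alerting: any failure that is not explicitly mapped still yields a non-zero status.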

@jdsika
Contributor

jdsika commented Feb 17, 2023

@rvermootenct I would like to include this in a release that I want to start to prepare. I think we need to put a date on this one

@jdsika
Contributor

jdsika commented Mar 12, 2023

When will you finish this PR? We will soon have a new protocol version

@nicolasochem
Contributor

It looks like the last commit adds multiple error types, but it still does not catch the generic error as I was calling for, is this correct @rvermootenct ?

@jdsika
Contributor

jdsika commented Mar 24, 2023

make sure to integrate #611 here "clear lockfile when properly shut down"

"Properly" means in this case that there are no threads running in the background anymore which could potentially trigger a payment.

rvermootenct changed the title from "Feat/better exit" to "Feat/better exit messages" on Mar 27, 2023
@rvermootenct
Contributor Author

rvermootenct commented Mar 27, 2023

make sure to integrate #611 here "clear lockfile when properly shut down"

"Properly" means in this case that there are no threads running in the background anymore which could potentially trigger a payment.

This PR only gives better exit messages and exit codes. The state machine in this thing is very confusing to me. I've tried to make sense of it, but I think someone else will be better equipped to figure out how to ensure the lockfiles are removed at the correct times. As I'm moving away from this project/ecosystem, I don't think it's worth anyone's while for me to spend time struggling through this. I'm willing to spend some time today trying to figure this out, but if I can't crack it, I'd rather not add it to this PR.

I'd like this work to be thoroughly checked too, because I don't want this PR to exit the program incorrectly.

@nicolasochem I hope I have now caught the generic errors. If not, can we please have a chat sometime this week so you can explain the situation to me?
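
On the lockfile point from #611, "properly shut down" was defined above as no background threads still running. A rough sketch of that rule, under the assumption that the caller holds a list of the worker threads; `clear_lockfile_if_idle` and its parameters are hypothetical names, not TRD code.

```python
import os
import threading


def clear_lockfile_if_idle(lock_path, workers):
    """Remove the lock file only if every worker thread has finished.

    Returns False (and leaves the file alone) while any worker is alive;
    returns True once no worker is alive and the file is gone.
    """
    if any(worker.is_alive() for worker in workers):
        return False
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        # Already absent, which is the state we want anyway.
        pass
    return True
```

Leaving the lock file in place while a worker is alive errs on the side of blocking a restart rather than risking a duplicate payment.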

service_add.py (outdated)
raise ClientException(
"Unknown Error at signing. Please consult the verbose logs!"
exit_program(
ExitCode.SIGNER_ERROR_NOT_RUNNING, ExitMessage.SIGNER_ERROR_NOT_RUNNING
Contributor

This can be any other error including "the signer not running", correct?

src/configure.py (outdated)
if disk_is_full():
running = False
break
exit_program(ExitCode.NO_SPACE)
Contributor

Why remove the break here and exit directly? Doesn't this change the behavior?

Contributor Author

Well, at the moment it would just break and leave the run loop, but if the disk is full, why not just exit?
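
The behavioral difference in question can be illustrated with a small sketch. The function names and the `log` list are hypothetical, `disk_is_full` is passed in as a callable, and the exit code 3 is arbitrary (the real value of `ExitCode.NO_SPACE` is not shown in this thread).

```python
import sys


def run_with_break(disk_is_full, log):
    """Leave the loop with break; code after the loop still runs."""
    while True:
        if disk_is_full():
            break
    log.append("cleanup after loop")


def run_with_exit(disk_is_full, log):
    """Exit from inside the loop; code after the loop is skipped."""
    while True:
        if disk_is_full():
            # Raises SystemExit, which unwinds past the rest of the function.
            sys.exit(3)
    log.append("cleanup after loop")  # never reached
```

So anything that used to run after the loop (for example shutdown bookkeeping) is skipped unless it lives in a `finally` block or an exit hook.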

tests/utils.py (outdated)
@nicolasochem
Contributor

I tried the code while my signer was not running and got nested exceptions, ending in TypeError: exit_program() missing 1 required positional argument: 'exit_message'. It looks like exit_program is called without its second argument in some cases.

2023-03-27 23:04:22,725 - MainThread - INFO - --------------------------------------------------------
2023-03-27 23:04:22,725 - MainThread - INFO - Sensitive operations are in progress!
2023-03-27 23:04:22,725 - MainThread - INFO - Please wait while the application is being shut down!
2023-03-27 23:04:22,725 - MainThread - INFO - --------------------------------------------------------
Traceback (most recent call last):
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
                       ^^^^^^^^^^^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/connection.py", line 244, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/lib64/python3.11/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.11/http/client.py", line 975, in send
    self.connect()
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/connection.py", line 205, in connect
    conn = self._new_conn()
           ^^^^^^^^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fcf28824590>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=6732): Max retries exceeded with url: /keys/tz1ejA7UWkdVk9wYkLGnReq2qrmyi5Po86FK (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fcf28824590>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nochem/workspace/tezos-reward-distributor/src/cli/client_manager.py", line 210, in _do_request
    response = requests.request(
               ^^^^^^^^^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/requests/adapters.py", line 565, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=6732): Max retries exceeded with url: /keys/tz1ejA7UWkdVk9wYkLGnReq2qrmyi5Po86FK (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fcf28824590>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nochem/workspace/tezos-reward-distributor/src/cli/client_manager.py", line 135, in check_pkh_known_by_signer
    response = self._do_request(method="GET", url=url, timeout=timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/src/cli/client_manager.py", line 224, in _do_request
    exit_program(ExitCode.SIGNER_ERROR, e)
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/exit_program.py", line 25, in exit_program
    if exit_message(exit_code):
       ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'ConnectionError' object is not callable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/process_life_cycle.py", line 216, in start
    self.fsm.trigger_event(TrdEvent.LOAD_CONFIG)
  File "/home/nochem/workspace/tezos-reward-distributor/src/fsm/TransitionsFsmModel.py", line 26, in trigger_event
    self.trigger(event, *args, **kwargs)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 922, in _get_trigger
    return event.trigger(model, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 402, in trigger
    return self.machine._process(func)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 1211, in _process
    return trigger()
           ^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 416, in _trigger
    self._process(event_data)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 439, in _process
    if trans.execute(event_data):
       ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 277, in execute
    self._change_state(event_data)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 287, in _change_state
    event_data.machine.get_state(self.dest).enter(event_data)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 129, in enter
    event_data.machine.callbacks(self.on_enter, event_data)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 1146, in callbacks
    self.callback(func, event_data)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 1165, in callback
    func(event_data)
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/process_life_cycle.py", line 294, in do_load_config
    cfg_life_cycle.start()
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/config_life_cycle.py", line 86, in start
    self.fsm.trigger_event(ConfigEvent.VALIDATE)
  File "/home/nochem/workspace/tezos-reward-distributor/src/fsm/TransitionsFsmModel.py", line 26, in trigger_event
    self.trigger(event, *args, **kwargs)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 922, in _get_trigger
    return event.trigger(model, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 402, in trigger
    return self.machine._process(func)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 1211, in _process
    return trigger()
           ^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 416, in _trigger
    self._process(event_data)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 439, in _process
    if trans.execute(event_data):
       ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 277, in execute
    self._change_state(event_data)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 287, in _change_state
    event_data.machine.get_state(self.dest).enter(event_data)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 129, in enter
    event_data.machine.callbacks(self.on_enter, event_data)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 1146, in callbacks
    self.callback(func, event_data)
  File "/home/nochem/workspace/tezos-reward-distributor/venv/lib/python3.11/site-packages/transitions/core.py", line 1165, in callback
    func(event_data)
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/config_life_cycle.py", line 126, in do_validate_cfg
    self.__parser.validate()
  File "/home/nochem/workspace/tezos-reward-distributor/src/config/yaml_baking_conf_parser.py", line 74, in validate
    self.validate_payment_address(conf_obj)
  File "/home/nochem/workspace/tezos-reward-distributor/src/config/yaml_baking_conf_parser.py", line 238, in validate_payment_address
    self.clnt_mngr.check_pkh_known_by_signer(pymnt_addr)
  File "/home/nochem/workspace/tezos-reward-distributor/src/cli/client_manager.py", line 137, in check_pkh_known_by_signer
    exit_program(ExitCode.SIGNER_ERROR, f"{e}\n{signer_exception}")
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/exit_program.py", line 25, in exit_program
    if exit_message(exit_code):
       ^^^^^^^^^^^^^^^^^^^^^^^
TypeError: 'str' object is not callable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nochem/workspace/tezos-reward-distributor/src/main.py", line 146, in <module>
    start_application()
  File "/home/nochem/workspace/tezos-reward-distributor/src/main.py", line 127, in start_application
    life_cycle.start()
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/process_life_cycle.py", line 245, in start
    self.shut_down_on_error()
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/process_life_cycle.py", line 386, in shut_down_on_error
    exit_program(ExitCode.GENERAL_ERROR)
TypeError: exit_program() missing 1 required positional argument: 'exit_message'

@nicolasochem
Contributor

It's still not working. When signer is off, I'm still getting

TypeError: 'str' object is not callable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/nochem/workspace/tezos-reward-distributor/src/main.py", line 146, in <module>
    start_application()
  File "/home/nochem/workspace/tezos-reward-distributor/src/main.py", line 127, in start_application
    life_cycle.start()
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/process_life_cycle.py", line 245, in start
    self.shut_down_on_error()
  File "/home/nochem/workspace/tezos-reward-distributor/src/util/process_life_cycle.py", line 386, in shut_down_on_error
    exit_program(ExitCode.GENERAL_ERROR)
TypeError: exit_program() missing 1 required positional argument: 'exit_message'

An error message is needed here: https://github.com/tezos-reward-distributor-organization/tezos-reward-distributor/pull/657/files#diff-acda51ab96991dc20d4760e24db3206e89b406b0ee525a93ef6ff7ed290dbd44R386
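
Both TypeErrors in the tracebacks above can be reproduced in isolation. In this sketch only the line `if exit_message(exit_code):` is taken from the traceback; the rest of `exit_program_buggy` is a guess, and `exit_program_tolerant` is an assumed alternative, not the fix that was eventually merged.

```python
import sys


def exit_program_buggy(exit_code, exit_message):
    # Mirrors the failing line in the traceback: the message is *called*,
    # so passing a str or an exception object raises TypeError.
    if exit_message(exit_code):
        print(exit_message)
    sys.exit(exit_code)


def exit_program_tolerant(exit_code, exit_message=""):
    # Assumed alternative: the message is optional and is only formatted,
    # so strings and exception objects are both acceptable.
    if exit_message:
        print(str(exit_message), file=sys.stderr)
    sys.exit(exit_code)
```

Calling `exit_program_buggy(1, "signer down")` raises "'str' object is not callable", and `exit_program_buggy(1)` raises the "missing 1 required positional argument" error, matching the two messages reported here.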

@nicolasochem
Contributor

Another error I got:

2023-03-29 04:46:05,602 - consumer0 - DEBUG - Consumer returning...
2023-03-29 04:46:05,602 - producer  - DEBUG - Unknown error in payment producer loop: 'str' object is not callable
Traceback (most recent call last):
  File "/app/src/pay/payment_producer.py", line 337, in run
    self.exit(ExitCode.SUCCESS)
  File "/app/src/pay/payment_producer.py", line 152, in exit
    exit_program(
  File "/app/src/util/exit_program.py", line 25, in exit_program
    if exit_message(exit_code):
TypeError: 'str' object is not callable
2023-03-29 04:46:05,603 - producer  - ERROR - Unknown error in payment producer loop: 'str' object is not callable, will try again.
2023-03-29 04:46:05,603 - producer  - DEBUG - Producer returning...

@jdsika
Contributor

jdsika commented Mar 29, 2023

@vkresch I see also a topic with the logger. I think the global logger should trigger the exit of the function when an error log is thrown right?

@vkresch
Contributor

vkresch commented Mar 29, 2023

@jdsika please postpone this feature, as I would like to have more time to look into it. Currently the implementation does not seem to fix the initial issue.

@jdsika
Contributor

jdsika commented Mar 29, 2023

I would call it "temporarily disabled until time for a fix", but if you want to call it the "removal of the feature" :D

@jdsika
Contributor

jdsika commented Mar 29, 2023

@vkresch I see also a topic with the logger. I think the global logger should trigger the exit of the function when an error log is thrown right?

IMO an error in the log must trigger a graceful shutdown - yes!
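
A sketch of how an error log could trigger a graceful shutdown, assuming the application already polls a shared `threading.Event` to decide when to stop. `ShutdownOnErrorHandler` is a hypothetical name; only the standard `logging.Handler` API is used.

```python
import logging
import threading


class ShutdownOnErrorHandler(logging.Handler):
    """Set a shared event when a record at ERROR level or above is logged."""

    def __init__(self, shutdown_event):
        # The handler level filters out anything below ERROR.
        super().__init__(level=logging.ERROR)
        self.shutdown_event = shutdown_event

    def emit(self, record):
        # Only signal; the main loop decides how to wind down and exit.
        self.shutdown_event.set()
```

Attaching it with `logger.addHandler(ShutdownOnErrorHandler(event))` keeps the logging call sites unchanged. The handler only signals, so the shutdown itself can still wait for sensitive operations to finish.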

@vkresch
Contributor

vkresch commented Apr 11, 2023

Work effort: 4h

@vkresch
Contributor

vkresch commented Apr 11, 2023

@nicolasochem @jdsika could you test your use case again?

@nicolasochem
Contributor

@vkresch it works now, I tried 3 things:

  • launch TRD but no RPC accessible => exit code != 0
  • launch TRD but no signer accessible => exit code != 0
  • launch TRD but lock file present => exit code != 0

So it looks like you fixed the issue 👍
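
Checks like the three above can be automated by inspecting a child process's exit status. This is a generic sketch: `exit_status` is a hypothetical helper that runs an inline Python snippet, not the TRD entry point or its command-line flags.

```python
import subprocess
import sys


def exit_status(python_source):
    """Run a Python snippet in a child process and return its exit status."""
    completed = subprocess.run(
        [sys.executable, "-c", python_source],
        stderr=subprocess.DEVNULL,  # keep tracebacks out of the test output
    )
    return completed.returncode
```

In a real test the command list would be replaced by the actual TRD invocation for each failure mode, and the assertion would be that the returned status is non-zero.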

Contributor

@nicolasochem nicolasochem left a comment

It works after quick testing. Failure modes result in non-zero exit status.

@vkresch
Contributor

vkresch commented Jun 25, 2023

@nicolasochem gonna try to fix the tests today here and then we can merge

@vkresch
Contributor

vkresch commented Jun 26, 2023

@jdsika @nicolasochem mergeable if needed

jdsika merged commit 83cdc68 into master on Jun 26, 2023
8 checks passed
jdsika deleted the feat/better_exit branch on June 26, 2023 08:50