Added a fix for event handling of different errors #8891

Merged: 1 commit merged into noobaa:master from event-fix on Mar 26, 2025

achouhan09 (Member) commented on Mar 21, 2025

Describe the Problem

Previously, when an InternalError occurred during an operation, it was not recorded as an event.

Explain the Changes

  1. Implemented a fix to ensure that InternalError and other error types are properly recorded as events.
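
A minimal sketch of the idea behind this change, not the actual noobaa-core code: look up an event definition for the error code and emit it, so that an InternalError no longer goes unrecorded. The names ERROR_EVENTS, record_error_event, and the in-memory emitted array are hypothetical; the event fields are copied from the noobaa_internal_error entry in the log output under Testing Instructions.

```javascript
'use strict';

// Hypothetical lookup from CLI error code to event definition.
// The field values mirror the noobaa_internal_error event in the log output;
// the table itself and its name are assumptions made for illustration.
const ERROR_EVENTS = new Map([
    ['InternalError', {
        code: 'noobaa_internal_error',
        message: 'Noobaa action failed with internal error',
        description: 'Noobaa action failed with internal error',
        entity_type: 'NODE',
        event_type: 'ERROR',
        scope: 'NODE',
        severity: 'ERROR',
        state: 'HEALTHY',
    }],
]);

// Stand-in for the real event sink (for example, syslog); assumed for this sketch.
const emitted = [];

// Records an event for the given error if one is defined for its code.
// Returns the emitted entry, or undefined when the code has no event mapping.
function record_error_event(err) {
    const definition = ERROR_EVENTS.get(err.code);
    if (!definition) return undefined;
    const entry = {
        timestamp: new Date().toISOString(),
        event: { ...definition, pid: process.pid },
    };
    emitted.push(entry);
    return entry;
}

// Usage: an operation fails with an InternalError and the handler records it.
try {
    const err = new Error('simulated failure');
    err.code = 'InternalError';
    throw err;
} catch (err) {
    record_error_event(err);
}
```

Keeping the mapping in one table means that a new error type only needs a new entry to be recorded as an event; errors without an entry are simply skipped in this sketch.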

Issues: Fixed #xxx / Gap #xxx

  1. Fix: NC | NSFS | CLI | Events Improvements #8199

Testing Instructions:

  1. Manual testing instructions:
  • Run make rpm inside noobaa-core to build the rpm; it is written to build/rpm/.
  • Start a Red Hat UBI9 container and open a shell in it using the commands below:
    $ docker pull redhat/ubi9:latest
    $ docker run --privileged -it --user root -d --platform=linux/amd64 redhat/ubi9 /usr/sbin/init
    $ docker exec -it <container-id> bash
  • Copy the rpm created in step 1 from noobaa-core to the container (update the rpm name):
    $ docker cp build/rpm/noobaa-core-5.19.0-20250326.el9.x86_64.rpm <container-id>:tmp/
  • Install the dependencies in the container using the commands below:
    $ yum install -y rsyslog wget make initscripts
    $ systemctl start rsyslog
    $ wget https://rpmfind.net/linux/centos-stream/9-stream/AppStream/x86_64/os/Packages/boost-system-1.75.0-8.el9.x86_64.rpm
    $ wget https://rpmfind.net/linux/centos-stream/9-stream/AppStream/x86_64/os/Packages/boost-thread-1.75.0-8.el9.x86_64.rpm
    $ rpm -i boost-system-1.75.0-8.el9.x86_64.rpm
    $ rpm -i boost-thread-1.75.0-8.el9.x86_64.rpm
  • Install the noobaa rpm inside the container using the command below (update the rpm name):
    $ rpm -i tmp/noobaa-core-5.19.0-20250326.el9.x86_64.rpm
    $ rpm -qa | grep noobaa   # check whether noobaa is installed
  • Modify the noobaa code so that it throws an InternalError. For example, add throw ManageCLIError.InternalError; as the first line of the delete_account() function in manage_nsfs.js. Then create and delete an account using noobaa-cli and check noobaa-events as described in the next step: a new event for InternalError should be created.
  • Check noobaa-events using the command below:
    $ cat var/log/noobaa_events.log

[root@8ccc29c3ce83 /]# cat var/log/noobaa_events.log
Mar 20 16:09:57 8ccc29c3ce83 node[218]: {"timestamp":"2025-03-20T16:09:57.526Z","host":"8ccc29c3ce83","event":{"code":"noobaa_gpfslib_missing","message":"Noobaa GPFS library file is missing","description":"Noobaa GPFS library file is missing","entity_type":"NODE","event_type":"STATE_CHANGE","scope":"NODE","severity":"ERROR","state":"DEGRADED","arguments":{"gpfs_dl_path":"/usr/lpp/mmfs/lib/libgpfs.so"},"pid":218}}
Mar 20 16:09:58 8ccc29c3ce83 [218]: {"timestamp":"2025-03-20T16:09:58.009Z","host":"8ccc29c3ce83","event":{"code":"noobaa_started","message":"Noobaa started","description":"Noobaa started running","entity_type":"NODE","event_type":"STATE_CHANGE","scope":"NODE","severity":"INFO","state":"HEALTHY","pid":218}}
Mar 20 16:12:30 8ccc29c3ce83 node[274]: {"timestamp":"2025-03-20T16:12:30.399Z","host":"8ccc29c3ce83","event":{"code":"noobaa_account_created","message":"Account created","description":"Noobaa Account created","entity_type":"NODE","event_type":"INFO","scope":"NODE","severity":"INFO","state":"HEALTHY","arguments":{"account":"aayush"},"pid":274}}
Mar 21 14:38:54 8ccc29c3ce83 node[324]: {"timestamp":"2025-03-21T14:38:54.682Z","host":"8ccc29c3ce83","event":{"code":"noobaa_internal_error","message":"Noobaa action failed with internal error","description":"Noobaa action failed with internal error","entity_type":"NODE","event_type":"ERROR","scope":"NODE","severity":"ERROR","state":"HEALTHY","pid":324}}
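
The manual check above can also be scripted. The following is a rough sketch, assuming each log line is a syslog prefix followed by a single JSON object, as in the output shown; the function names parse_event_line and find_events are hypothetical and are not part of noobaa-core. The sample lines are copied from the output above.

```javascript
'use strict';

// Parses one line of noobaa_events.log: a syslog prefix followed by a JSON payload.
// Returns the parsed object, or null if the line has no valid JSON payload.
function parse_event_line(line) {
    const start = line.indexOf('{');
    if (start === -1) return null;
    try {
        return JSON.parse(line.slice(start));
    } catch (err) {
        return null;
    }
}

// Returns the entries in the log text whose event.code equals the given code.
function find_events(log_text, code) {
    return log_text
        .split('\n')
        .map(parse_event_line)
        .filter(entry => entry && entry.event && entry.event.code === code);
}

// Two lines taken verbatim from the log output above.
const sample = [
    'Mar 20 16:09:58 8ccc29c3ce83 [218]: {"timestamp":"2025-03-20T16:09:58.009Z","host":"8ccc29c3ce83","event":{"code":"noobaa_started","message":"Noobaa started","description":"Noobaa started running","entity_type":"NODE","event_type":"STATE_CHANGE","scope":"NODE","severity":"INFO","state":"HEALTHY","pid":218}}',
    'Mar 21 14:38:54 8ccc29c3ce83 node[324]: {"timestamp":"2025-03-21T14:38:54.682Z","host":"8ccc29c3ce83","event":{"code":"noobaa_internal_error","message":"Noobaa action failed with internal error","description":"Noobaa action failed with internal error","entity_type":"NODE","event_type":"ERROR","scope":"NODE","severity":"ERROR","state":"HEALTHY","pid":324}}',
].join('\n');

const internal_errors = find_events(sample, 'noobaa_internal_error');
console.log(internal_errors.length); // prints 1
```

To run this against a real container, the sample string could be replaced with the contents of the log file, for example read with fs.readFileSync.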

  • Tests added

event_type: 'ERROR',
scope: 'NODE',
severity: 'ERROR',
state: 'DEGRADED'

Contributor

Why is the state DEGRADED?

Member Author

As I understand it, the DEGRADED state means that the operation can still be performed after some adjustments to the command (for example, fixing a permission issue). Please correct me if I am wrong.

Contributor

I think it affects the state of the cluster. @naveenpaul1, can you explain when an event's state should be DEGRADED and when it should be HEALTHY?

Contributor

Yes @romayalon, we have added the DEGRADED state to those events that could be the cause of potential I/O failures or of core Noobaa config dir-related issues.

Contributor

@naveenpaul1 But does it do anything to the cluster, like IP movement? Or would that happen only if health reported an error?

Contributor

@romayalon I also think this event should not have the DEGRADED state. For I/O events, we added the DEGRADED state on the assumption that they could be caused by an internal issue related to the cluster or the storage itself.

Member Author

Thanks, I will update it to HEALTHY then.

Contributor

@naveenpaul1 @achouhan09 Maybe you can add this definition of the DEGRADED state to our docs so it is clear to others as well.
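
The rule discussed in this thread could be sketched as follows. This only illustrates the reviewers' description and is not a definition taken from the noobaa docs or code; the function name event_state and its two boolean flags are hypothetical.

```javascript
'use strict';

// Hypothetical helper that encodes the rule described in this thread:
// DEGRADED is reserved for events that may cause I/O failures or that
// point to a problem with the core config dir; everything else is HEALTHY.
function event_state({ may_cause_io_failure = false, config_dir_issue = false } = {}) {
    return (may_cause_io_failure || config_dir_issue) ? 'DEGRADED' : 'HEALTHY';
}

// An internal error from a CLI action sets neither flag, so it stays HEALTHY,
// which matches the outcome agreed on in this review.
const internal_error_state = event_state();

// An event that may lead to I/O failures would be marked DEGRADED.
const io_event_state = event_state({ may_cause_io_failure: true });

console.log(internal_error_state, io_event_state); // prints "HEALTHY DEGRADED"
```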

Signed-off-by: Aayush Chouhan <achouhan@redhat.com>
achouhan09 merged commit 185ab19 into noobaa:master on Mar 26, 2025
11 of 12 checks passed
achouhan09 deleted the event-fix branch on March 26, 2025 13:56
Development

Successfully merging this pull request may close these issues.

NC | NSFS | CLI | Events Improvements
4 participants