Kibana server is not ready yet | Community help request for troubleshooting #5218

Closed
Cybercop-Training opened this issue Feb 20, 2023 · 8 comments
Labels: reporter/community Issue reported by the community

@Cybercop-Training

Hello Community

I don't want to report a bug, but I got stuck with a Kibana error in my Wazuh server installation.

One year ago I set up a Wazuh installation using the official installation assistant script from the Wazuh documentation.
Everything worked fine, I successfully rolled out some Wazuh agents in my network, and I was amazed by the data and visualizations I got from them.

Some months ago I noticed that the agents were still online and showed connectivity in the dashboard, but I didn't get data from them anymore. Because of that I decided to reboot the Ubuntu server, and since then it's not possible to access the Wazuh dashboard anymore. I'm stuck with this Kibana server is not ready yet message and I hope there is still a chance to get my Wazuh installation up and running again.

If I run the command systemctl status kibana I get this:

lxoperator@agslxs01:~$ systemctl status kibana
● kibana.service - Kibana
     Loaded: loaded (/etc/systemd/system/kibana.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2023-02-09 13:07:50 UTC; 1 weeks 4 days ago
   Main PID: 107730 (node)
      Tasks: 11 (limit: 14247)
     Memory: 138.8M
     CGroup: /system.slice/kibana.service
             └─107730 /usr/share/kibana/bin/../node/bin/node /usr/share/kibana/bin/../src/cli/dist -c /etc/kibana/kibana.yml

Feb 20 16:34:02 agslxs01 kibana[107730]: {"type":"log","@timestamp":"2023-02-20T16:34:02Z","tags":["error","elasticsearch","data"],"pid":107730,"message":"[ConnectionError]: connect ECONN>
Feb 20 16:34:04 agslxs01 kibana[107730]: {"type":"log","@timestamp":"2023-02-20T16:34:04Z","tags":["error","elasticsearch","data"],"pid":107730,"message":"[ConnectionError]: connect ECONN>
Feb 20 16:34:07 agslxs01 kibana[107730]: {"type":"log","@timestamp":"2023-02-20T16:34:07Z","tags":["error","elasticsearch","data"],"pid":107730,"message":"[ConnectionError]: connect ECONN>

It seems that there is a connection problem to Elasticsearch, but I have no clue where it comes from.

If I run journalctl -u kibana I get multiple lines like this:

Nov 21 03:15:02 agslxs01 kibana[1101440]: {"type":"log","@timestamp":"2022-11-21T03:15:02Z","tags":["error","plugins","wazuh","cron-scheduler"],"pid":1101440,"message":"ResponseError: val>
Nov 21 03:15:02 agslxs01 kibana[1101440]: {"type":"log","@timestamp":"2022-11-21T03:15:02Z","tags":["error","elasticsearch","data"],"pid":1101440,"message":"[validation_exception]: Valida>
Nov 21 03:15:02 agslxs01 kibana[1101440]: {"type":"log","@timestamp":"2022-11-21T03:15:02Z","tags":["error","plugins","wazuh","monitoring"],"pid":1101440,"message":"Could not create wazuh>

Do I have a plugin problem?

If I check the Elasticsearch log with the command cat /var/log/elasticsearch/wazuh-cluster.log | grep -i -E "error|warn", I can see multiple error patterns like this:

[2023-01-26T09:15:30,546][WARN ][r.suppressed             ] [node-1] path: /wazuh-statistics-2023.4w, params: {index=wazuh-statistics-2023.4w}
[2023-01-26T09:15:30,547][WARN ][r.suppressed             ] [node-1] path: /wazuh-statistics-2023.4w, params: {index=wazuh-statistics-2023.4w}
[2023-01-26T09:15:30,585][WARN ][r.suppressed             ] [node-1] path: /wazuh-monitoring-2023.4w, params: {index=wazuh-monitoring-2023.4w}
[2023-01-26T09:15:51,052][ERROR][c.a.o.s.a.s.InternalESSink] [node-1] Unable to index audit log {"audit_cluster_name":"wazuh-cluster","audit_transport_headers":{"_system_index_access_allowed":"false"},"audit_node_name":"node-1","audit_trace_task_id":"Sku9diz9Qf2OB3kCLNlUBg:910868036","audit_transport_request_type":"CreateIndexRequest","audit_category":"INDEX_EVENT","audit_request_origin":"REST","audit_request_body":"{}","audit_node_id":"Sku9diz9Qf2OB3kCLNlUBg","audit_request_layer":"TRANSPORT","@timestamp":"2023-01-26T09:14:51.051+00:00","audit_format_version":4,"audit_request_remote_address":"127.0.0.1","audit_request_privilege":"indices:admin/auto_create","audit_node_host_address":"10.10.98.110","audit_request_effective_user":"wazuh","audit_trace_indices":["<wazuh-alerts-4.x-{2022.11.26||/d{yyyy.MM.dd|UTC}}>"],"audit_node_host_name":"10.10.98.110"} due to ProcessClusterEventTimeoutException[failed to process cluster event (auto create [security-auditlog-2023.01.26]) within 1m]
[2023-01-26T09:16:00,547][WARN ][r.suppressed             ] [node-1] path: /wazuh-statistics-2023.4w, params: {index=wazuh-statistics-2023.4w}
[2023-01-26T09:16:00,547][WARN ][r.suppressed             ] [node-1] path: /wazuh-statistics-2023.4w, params: {index=wazuh-statistics-2023.4w}
[2023-01-26T09:16:00,586][WARN ][r.suppressed             ] [node-1] path: /wazuh-monitoring-2023.4w, params: {index=wazuh-monitoring-2023.4w}
[2023-01-26T09:16:30,547][WARN ][r.suppressed             ] [node-1] path: /wazuh-statistics-2023.4w, params: {index=wazuh-statistics-2023.4w}
[2023-01-26T09:16:30,548][WARN ][r.suppressed             ] [node-1] path: /wazuh-statistics-2023.4w, params: {index=wazuh-statistics-2023.4w}
[2023-01-26T09:16:30,586][WARN ][r.suppressed             ] [node-1] path: /wazuh-monitoring-2023.4w, params: {index=wazuh-monitoring-2023.4w}
[2023-01-26T09:16:52,681][ERROR][c.a.o.s.a.s.InternalESSink] [node-1] Unable to index audit log {"audit_cluster_name":"wazuh-cluster","audit_transport_headers":{"_system_index_access_allowed":"false"},"audit_node_name":"node-1","audit_trace_task_id":"Sku9diz9Qf2OB3kCLNlUBg:910873484","audit_transport_request_type":"CreateIndexRequest","audit_category":"INDEX_EVENT","audit_request_origin":"REST","audit_request_body":"{}","audit_node_id":"Sku9diz9Qf2OB3kCLNlUBg","audit_request_layer":"TRANSPORT","@timestamp":"2023-01-26T09:15:52.680+00:00","audit_format_version":4,"audit_request_remote_address":"127.0.0.1","audit_request_privilege":"indices:admin/auto_create","audit_node_host_address":"10.10.98.110","audit_request_effective_user":"wazuh","audit_trace_indices":["<wazuh-alerts-4.x-{2022.11.26||/d{yyyy.MM.dd|UTC}}>"],"audit_node_host_name":"10.10.98.110"} due to ProcessClusterEventTimeoutException[failed to process cluster event (auto create [security-auditlog-2023.01.26]) within 1m]
[2023-01-26T09:43:48,315][WARN ][o.e.c.s.MasterService    ] [node-1] took [60.5d] and then failed to publish updated cluster state (version: 256698, uuid: 84pVcm3xSouAO8UFfc1cSA) for [elected-as-master ([1] nodes joined)[{node-1}{Sku9diz9Qf2OB3kCLNlUBg}{fZfnuBAoSFuSLis4mz0ebg}{10.10.98.110}{10.10.98.110:9300}{dimr} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_]]:
[2023-01-26T09:43:58,426][ERROR][c.a.o.i.i.IndexStateManagementHistory] [node-1] Error creating ISM history index.
[2023-01-26T09:43:58,429][WARN ][o.e.c.s.ClusterApplierService] [node-1] cluster state applier task

And at the bottom I can see some hints about insecure file permissions:

[2023-01-26T09:50:45,810][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [node-1] Directory /etc/elasticsearch/certs has insecure file permissions (should be 0700)
[2023-01-26T09:50:45,810][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [node-1] File /etc/elasticsearch/certs/elasticsearch.pem has insecure file permissions (should be 0600)
[2023-01-26T09:50:45,811][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [node-1] File /etc/elasticsearch/certs/elasticsearch-key.pem has insecure file permissions (should be 0600)
[2023-01-26T09:50:45,811][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [node-1] File /etc/elasticsearch/certs/admin.pem has insecure file permissions (should be 0600)
[2023-01-26T09:50:45,812][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [node-1] File /etc/elasticsearch/certs/root-ca.pem has insecure file permissions (should be 0600)
[2023-01-26T09:50:45,812][WARN ][c.a.o.s.OpenDistroSecurityPlugin] [node-1] File /etc/elasticsearch/.elasticsearch.keystore.initial_md5sum has insecure file permissions (should be 0600)
[2023-01-26T09:51:15,244][WARN ][c.a.o.r.s.PluginSettings ] [node-1] reports:Failed to load /etc/elasticsearch/opendistro-reports-scheduler/reports-scheduler.yml

If you need further information or logs, I hope I can provide them. I'm thankful for any help and assistance I can get with this case.

@Desvelao Desvelao self-assigned this Feb 21, 2023
@Desvelao
Member

Hi @Cybercop-Training,

Could you share the following information about your environment?

  • Kibana version
  • Elasticsearch version
  • Wazuh plugin for Kibana version
  • Wazuh manager version

From the shared logs, there could be a problem with the Kibana-Elasticsearch connection.

You could review the following:

  1. Ensure the Elasticsearch service is running without problems.
systemctl status elasticsearch
  2. Check that the Elasticsearch settings (host address, credentials, etc.) are correct in the Kibana configuration. See kibana.yml (usually located at /etc/kibana/kibana.yml); a sketch of the relevant keys is shown after this list.
  3. Start the Kibana service.

systemctl start kibana
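
As a reference for step 2, these are the Elasticsearch-related keys usually present in a Wazuh/Kibana 7.x kibana.yml, plus a quick way to test the same host and credentials directly against Elasticsearch. This is only a sketch: the values are placeholders for whatever your installation uses.

grep -E "^elasticsearch\.(hosts|username|password|ssl)" /etc/kibana/kibana.yml
# typical keys (placeholder values):
#   elasticsearch.hosts: https://localhost:9200
#   elasticsearch.username: kibanaserver
#   elasticsearch.password: <PASSWORD>
#   elasticsearch.ssl.verificationMode: certificate
# test the same host and credentials directly against Elasticsearch:
curl -k -u kibanaserver:<PASSWORD> https://localhost:9200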

On the other hand, these lines from the Kibana logs (older logs, from November 21):

Nov 21 03:15:02 agslxs01 kibana[1101440]: {"type":"log","@timestamp":"2022-11-21T03:15:02Z","tags":["error","plugins","wazuh","cron-scheduler"],"pid":1101440,"message":"ResponseError: val>
Nov 21 03:15:02 agslxs01 kibana[1101440]: {"type":"log","@timestamp":"2022-11-21T03:15:02Z","tags":["error","elasticsearch","data"],"pid":1101440,"message":"[validation_exception]: Valida>
Nov 21 03:15:02 agslxs01 kibana[1101440]: {"type":"log","@timestamp":"2022-11-21T03:15:02Z","tags":["error","plugins","wazuh","monitoring"],"pid":1101440,"message":"Could not create wazuh>

seem to be related to the Wazuh plugin for Kibana. Some tasks of the Wazuh plugin need the internal Kibana user configured in kibana.yml to have the required permissions to perform certain actions. If you want these features to work, ensure that this internal user has the required permissions. These tasks index data from the Wazuh API into the wazuh-monitoring-* and wazuh-statistics-* indices, which the Wazuh plugin for Kibana displays in some visualizations. This problem should not be the reason for the Kibana server is not ready yet issue you are reporting.
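
A possible way to check what that internal user can do (a sketch, assuming the Open Distro security plugin that appears in your Elasticsearch logs; kibanaserver and <PASSWORD> are placeholders for the credentials set in kibana.yml) is the authinfo endpoint, which lists the roles mapped to the user making the request:

curl -k -u kibanaserver:<PASSWORD> "https://localhost:9200/_opendistro/_security/authinfo?pretty"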

@Cybercop-Training
Author

@Desvelao
Thank you so much for your guidance and help in this case :)

Additional info about my environment:
OS: Ubuntu 20.04.5 LTS
Wazuh Manager Version: v.4.3.9 / Revision: 40.3.22
Kibana Plugin Version:
cat /usr/share/kibana/package.json | grep version
7.10.2

Elasticsearch version:
7.10.2

First I checked the credentials in kibana.yml and they are still the same ones I used in my setup.

It seems that the elasticsearch service ran into a timeout and got terminated:

lxoperator@agslxs01:~$ systemctl status elasticsearch
● elasticsearch.service - Elasticsearch
     Loaded: loaded (/lib/systemd/system/elasticsearch.service; enabled; vendor preset: enabled)
     Active: failed (Result: timeout) since Thu 2023-01-26 09:51:21 UTC; 3 weeks 4 days ago
       Docs: https://www.elastic.co
   Main PID: 854 (code=exited, status=143)

Jan 26 09:50:06 agslxs01 systemd[1]: Starting Elasticsearch...
Jan 26 09:51:21 agslxs01 systemd[1]: elasticsearch.service: start operation timed out. Terminating.
Jan 26 09:51:21 agslxs01 systemd[1]: elasticsearch.service: Failed with result 'timeout'.
Jan 26 09:51:21 agslxs01 systemd[1]: Failed to start Elasticsearch.

After that I tried to restart the Wazuh core services:
systemctl restart kibana filebeat elasticsearch wazuh-manager

All services came up and running again and I was able to log in to my Wazuh dashboard!
If I restart the server, the elasticsearch service will still run into a timeout, but for now I'm happy that I can handle it with a restart of the Wazuh core services.

I still have the problem that my Wazuh agents are online, but I don't see any alert data in the dashboard.
I guess it could be a problem with the filebeat service.
Can we go ahead and try to find a solution for this specific issue as well, or should I rather contact the Slack channel or Google mailing list for this?

The status of my agents looks like this:

root@agslxs01:/var/ossec/bin# ./agent_control -l

Wazuh agent_control. List of available agents:
   ID: 000, Name: agslxs01 (server), IP: 127.0.0.1, Active/Local
   ID: 006, Name: AGSSRV30, IP: any, Active
   ID: 003, Name: AGSSRV28, IP: any, Active
   ID: 004, Name: AGSSRV42, IP: any, Active
   ID: 005, Name: AGSSRV29, IP: any, Active
   ID: 007, Name: AGSSRV10, IP: any, Active

@Desvelao
Member

Desvelao commented Feb 22, 2023

Hi @Cybercop-Training, I am glad you could solve the problem. But you should probably review why Elasticsearch fails to start with a timeout when you restart the server machine.
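
For example, you could check the journal to see whether Elasticsearch is simply slow to start (e.g. long index recovery after the reboot) and, if so, raise the systemd start timeout with a drop-in override. A minimal sketch; the 300-second value is only an example:

journalctl -u elasticsearch -b --no-pager | tail -n 50
# raise the start timeout with a drop-in override, then reload and retry:
sudo systemctl edit elasticsearch
#   [Service]
#   TimeoutStartSec=300
sudo systemctl daemon-reload
sudo systemctl restart elasticsearch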

On the other hand, since not being able to see any alert data seems to be a different topic from the initial one, it is recommended to open a new thread in one of the community channels (Slack, Google mailing list, Discord, etc.). Consider searching before opening a new thread; you could find a related thread that helps to debug or solve the problem.

If you don't see any alert data in the Wazuh plugin, this could be caused by:

  • the data is not indexed
    • data is not generated
    • error indexing the data
  • the filters used in the dashboard don't match the indexed data

You could verify the following:

  1. Ensure the Wazuh manager(s) is running.
systemctl status wazuh-manager
  2. Ensure the Wazuh manager(s) is generating new alerts. Review the alerts.json file where the alerts are stored.
tail -n1 /var/ossec/logs/alerts/alerts.json

The previous command should display the last line of the alerts.json file. Review whether the timestamp property displays a recent date.

  3. Ensure the Filebeat service is running.
systemctl status filebeat
  4. Verify the Filebeat-Elasticsearch connection.
filebeat test output
  5. Review the Filebeat logs (you could filter by errors/warnings):
grep -iE "err|warn" /var/log/filebeat/filebeat
  6. Optionally, you could review the Elasticsearch logs too, but the problem should be identifiable in the checks above.
grep -iE "err|warn" /var/log/elasticsearch/<CLUSTER_NAME>.log

where:

  • <CLUSTER_NAME> is the name of your Elasticsearch cluster.

@Cybercop-Training
Author

Hi @Desvelao
Again thank you so much for your support!

  1. systemctl status wazuh-manager
● wazuh-manager.service - Wazuh manager
     Loaded: loaded (/lib/systemd/system/wazuh-manager.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-02-21 16:21:29 UTC; 18h ago
    Process: 887 ExecStart=/usr/bin/env /var/ossec/bin/wazuh-control start (code=exited, status=0/SUCCESS)
      Tasks: 176 (limit: 14247)
     Memory: 2.6G
     CGroup: /system.slice/wazuh-manager.service
             ├─1668 /var/ossec/framework/python/bin/python3 /var/ossec/api/scripts/wazuh-apid.py
             ├─1707 /var/ossec/bin/wazuh-authd
             ├─1716 /var/ossec/bin/wazuh-db
             ├─1743 /var/ossec/bin/wazuh-execd
             ├─1757 /var/ossec/bin/wazuh-analysisd
             ├─1760 /var/ossec/framework/python/bin/python3 /var/ossec/api/scripts/wazuh-apid.py
             ├─1763 /var/ossec/framework/python/bin/python3 /var/ossec/api/scripts/wazuh-apid.py
             ├─1775 /var/ossec/bin/wazuh-syscheckd
             ├─1793 /var/ossec/bin/wazuh-remoted
             ├─1908 /var/ossec/bin/wazuh-logcollector
             ├─1921 /var/ossec/bin/wazuh-monitord
             └─1941 /var/ossec/bin/wazuh-modulesd

Feb 21 16:21:21 agslxs01 env[887]: Started wazuh-execd...
Feb 21 16:21:22 agslxs01 env[887]: Started wazuh-analysisd...
Feb 21 16:21:23 agslxs01 env[887]: Started wazuh-syscheckd...
Feb 21 16:21:24 agslxs01 env[887]: Started wazuh-remoted...
Feb 21 16:21:25 agslxs01 env[887]: Started wazuh-logcollector...
Feb 21 16:21:26 agslxs01 env[887]: Started wazuh-monitord...
Feb 21 16:21:26 agslxs01 env[1939]: 2023/02/21 16:21:26 wazuh-modulesd: WARNING: The <ignore_time> tag at module 'vulnerability-detector' is deprecated for version newer than 4.3.
Feb 21 16:21:27 agslxs01 env[887]: Started wazuh-modulesd...
Feb 21 16:21:29 agslxs01 env[887]: Completed.
Feb 21 16:21:29 agslxs01 systemd[1]: Started Wazuh manager.
  2. Check for alert generation in the alerts.json file
{"timestamp":"2023-02-22T10:42:00.367+0000","rule":{"level":3,"description":"Windows logon success.","id":"60106","mitre":{"id":["T1078"],"tactic":["Defense Evasion","Persistence","Privilege Escalation","Initial Access"],"technique":["Valid Accounts"]},"firedtimes":16,"mail":false,"groups":["windows","windows_security","authentication_success"],"gdpr":["IV_32.2"],"gpg13":["7.1","7.2"],"hipaa":["164.312.b"],"nist_800_53":["AC.7","AU.14"],"pci_dss":["10.2.5"],"tsc":["CC6.8","CC7.2","CC7.3"]},"agent":{"id":"004","name":"AGSSRV42","ip":"10.10.98.92"},"manager":{"name":"agslxs01"},"id":"1677062520.5239657","decoder":{"name":"windows_eventchannel"},"data":{"win":{"system":{"providerName":"Microsoft-Windows-Security-Auditing","providerGuid":"{54849625-5478-4994-a5ba-3e3b0328c30d}","eventID":"4624","version":"2","level":"0","task":"12544","opcode":"0","keywords":"0x8020000000000000","systemTime":"2023-02-22T10:41:59.303532900Z","eventRecordID":"966432","processID":"804","threadID":"528","channel":"Security","computer":"AGSSRV42.GIB.BS","severityValue":"AUDIT_SUCCESS","message":"\"An account was successfully logged on.\r\n\r\nSubject:\r\n\tSecurity ID:\t\tS-1-5-18\r\n\tAccount Name:\t\tAGSSRV42$\r\n\tAccount Domain:\t\tGIB\r\n\tLogon ID:\t\t0x3E7\r\n\r\nLogon Information:\r\n\tLogon Type:\t\t5\r\n\tRestricted Admin Mode:\t-\r\n\tVirtual Account:\t\tNo\r\n\tElevated Token:\t\tYes\r\n\r\nImpersonation Level:\t\tImpersonation\r\n\r\nNew Logon:\r\n\tSecurity ID:\t\tS-1-5-18\r\n\tAccount Name:\t\tSYSTEM\r\n\tAccount Domain:\t\tNT AUTHORITY\r\n\tLogon ID:\t\t0x3E7\r\n\tLinked Logon ID:\t\t0x0\r\n\tNetwork Account Name:\t-\r\n\tNetwork Account Domain:\t-\r\n\tLogon GUID:\t\t{00000000-0000-0000-0000-000000000000}\r\n\r\nProcess Information:\r\n\tProcess ID:\t\t0x2f8\r\n\tProcess Name:\t\tC:\\Windows\\System32\\services.exe\r\n\r\nNetwork Information:\r\n\tWorkstation Name:\t-\r\n\tSource Network Address:\t-\r\n\tSource Port:\t\t-\r\n\r\nDetailed Authentication Information:\r\n\tLogon Process:\t\tAdvapi  \r\n\tAuthentication Package:\tNegotiate\r\n\tTransited Services:\t-\r\n\tPackage Name (NTLM only):\t-\r\n\tKey Length:\t\t0\r\n\r\nThis event is generated when a logon session is created. It is generated on the computer that was accessed.\r\n\r\nThe subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe.\r\n\r\nThe logon type field indicates the kind of logon that occurred. The most common types are 2 (interactive) and 3 (network).\r\n\r\nThe New Logon fields indicate the account for whom the new logon was created, i.e. the account that was logged on.\r\n\r\nThe network fields indicate where a remote logon request originated. Workstation name is not always available and may be left blank in some cases.\r\n\r\nThe impersonation level field indicates the extent to which a process in the logon session can impersonate.\r\n\r\nThe authentication information fields provide detailed information about this specific logon request.\r\n\t- Logon GUID is a unique identifier that can be used to correlate this event with a KDC event.\r\n\t- Transited services indicate which intermediate services have participated in this logon request.\r\n\t- Package name indicates which sub-protocol was used among the NTLM protocols.\r\n\t- Key length indicates the length of the generated session key. 
This will be 0 if no session key was requested.\""},"eventdata":{"subjectUserSid":"S-1-5-18","subjectUserName":"AGSSRV42$","subjectDomainName":"GIB","subjectLogonId":"0x3e7","targetUserSid":"S-1-5-18","targetUserName":"SYSTEM","targetDomainName":"NT AUTHORITY","targetLogonId":"0x3e7","logonType":"5","logonProcessName":"Advapi","authenticationPackageName":"Negotiate","logonGuid":"{00000000-0000-0000-0000-000000000000}","keyLength":"0","processId":"0x2f8","processName":"C:\\\\Windows\\\\System32\\\\services.exe","impersonationLevel":"%%1833","virtualAccount":"%%1843","targetLinkedLogonId":"0x0","elevatedToken":"%%1842"}}},"location":"EventChannel"}
  3. Check the Filebeat service
● filebeat.service - Filebeat sends log files to Logstash or directly to Elasticsearch.
     Loaded: loaded (/lib/systemd/system/filebeat.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-02-21 16:26:23 UTC; 18h ago
       Docs: https://www.elastic.co/products/beats/filebeat
   Main PID: 3986 (filebeat)
      Tasks: 14 (limit: 14247)
     Memory: 21.4M
     CGroup: /system.slice/filebeat.service
             └─3986 /usr/share/filebeat/bin/filebeat --environment systemd -c /etc/filebeat/filebeat.yml --path.home /usr/share/filebeat --path.config /etc/filebeat --path.data /var/lib/f>

Feb 22 10:56:02 agslxs01 filebeat[3986]: 2023-02-22T10:56:02.697Z        WARN        [elasticsearch]        elasticsearch/client.go:408        **Cannot index event publisher.Event{Content:b>**
Feb 22 10:56:02 agslxs01 filebeat[3986]: 2023-02-22T10:56:02.697Z        WARN        [elasticsearch]        elasticsearch/client.go:408        **Cannot index event publisher.Event{Content:b>**
Feb 22 10:56:23 agslxs01 filebeat[3986]: 2023-02-22T10:56:23.995Z        INFO        [monitoring]        log/log.go:145        Non-zero metrics in the last 30s        {"monitoring": {"met>
Feb 22 10:56:53 agslxs01 filebeat[3986]: 2023-02-22T10:56:53.995Z        INFO        [monitoring]        log/log.go:145        Non-zero metrics in the last 30s        {"monitoring": {"met>
Feb 22 10:57:23 agslxs01 filebeat[3986]: 2023-02-22T10:57:23.995Z        INFO        [monitoring]        log/log.go:145        Non-zero metrics in the last 30s        {"monitoring": {"met>
Feb 22 10:57:37 agslxs01 filebeat[3986]: 2023-02-22T10:57:37.732Z        WARN        [elasticsearch]        elasticsearch/client.go:408        **Cannot index event publisher.Event{Content:b>**
Feb 22 10:57:53 agslxs01 filebeat[3986]: 2023-02-22T10:57:53.995Z        INFO        [monitoring]        log/log.go:145        Non-zero metrics in the last 30s        {"monitoring": {"met>
Feb 22 10:58:23 agslxs01 filebeat[3986]: 2023-02-22T10:58:23.995Z        INFO        [monitoring]        log/log.go:145        Non-zero metrics in the last 30s        {"monitoring": {"met>
Feb 22 10:58:53 agslxs01 filebeat[3986]: 2023-02-22T10:58:53.995Z        INFO        [monitoring]        log/log.go:145        Non-zero metrics in the last 30s        {"monitoring": {"met>
Feb 22 10:59:23 agslxs01 filebeat[3986]: 2023-02-22T10:59:23.995Z        INFO        [monitoring]        log/log.go:145        Non-zero metrics in the last 30s        {"monitoring": {"met>

The Filebeat service is running, but the warning Cannot index event publisher.Event{Content:b> sounds suspicious!

  4. Connection Filebeat-Elasticsearch
elasticsearch: https://127.0.0.1:9200...
  parse url... OK
  connection...
    parse host... OK
    dns lookup... OK
    addresses: 127.0.0.1
    dial up... OK
  TLS...
    security: server's certificate chain verification is enabled
    handshake... OK
    TLS version: TLSv1.3
    dial up... OK
  talk to server... OK
  version: 7.10.2
  5. Filebeat log (shows only info and no errors/warnings)
2022-04-28T11:36:55.666Z        INFO    instance/beat.go:645    Home path: [/usr/share/filebeat] Config path: [/etc/filebeat] Data path: [/var/lib/filebeat] Logs path: [/var/log/filebeat]
2022-04-28T11:36:55.666Z        INFO    instance/beat.go:653    Beat ID: 9e448ea1-e624-4d62-8437-aac291049097
2022-04-28T11:36:55.666Z        INFO    [index-management]      idxmgmt/std.go:184      Set output.elasticsearch.index to 'filebeat-7.10.2' as ILM is enabled.
2022-04-28T11:36:55.667Z        INFO    eslegclient/connection.go:99    elasticsearch url: https://127.0.0.1:9200
2022-04-28T11:36:55.725Z        INFO    [esclientleg]   eslegclient/connection.go:314   Attempting to connect to Elasticsearch version 7.10.2
  6. Elasticsearch log
[2023-02-22T12:42:17,912][ERROR][c.a.o.s.a.s.InternalESSink] [node-1] Unable to index audit log {"audit_cluster_name":"wazuh-cluster","audit_transport_headers":{"_system_index_access_allowed":"false"},"audit_node_name":"node-1","audit_trace_task_id":"Sku9diz9Qf2OB3kCLNlUBg:4581694","audit_transport_request_type":"CreateIndexRequest","audit_category":"INDEX_EVENT","audit_request_origin":"REST","audit_request_body":"{}","audit_node_id":"Sku9diz9Qf2OB3kCLNlUBg","audit_request_layer":"TRANSPORT","@timestamp":"2023-02-22T12:42:17.911+00:00","audit_format_version":4,"audit_request_remote_address":"127.0.0.1","audit_request_privilege":"indices:admin/auto_create","audit_node_host_address":"10.10.98.110","audit_request_effective_user":"wazuh","audit_trace_indices":["<wazuh-alerts-4.x-{2023.02.22||/d{yyyy.MM.dd|UTC}}>"],"audit_node_host_name":"10.10.98.110"} due to org.elasticsearch.common.ValidationException: Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;
[2023-02-22T12:44:02,940][ERROR][c.a.o.s.a.s.InternalESSink] [node-1] Unable to index audit log {"audit_cluster_name":"wazuh-cluster","audit_transport_headers":{"_system_index_access_allowed":"false"},"audit_node_name":"node-1","audit_trace_task_id":"Sku9diz9Qf2OB3kCLNlUBg:4589964","audit_transport_request_type":"CreateIndexRequest","audit_category":"INDEX_EVENT","audit_request_origin":"REST","audit_request_body":"{}","audit_node_id":"Sku9diz9Qf2OB3kCLNlUBg","audit_request_layer":"TRANSPORT","@timestamp":"2023-02-22T12:44:02.939+00:00","audit_format_version":4,"audit_request_remote_address":"127.0.0.1","audit_request_privilege":"indices:admin/auto_create","audit_node_host_address":"10.10.98.110","audit_request_effective_user":"wazuh","audit_trace_indices":["<wazuh-alerts-4.x-{2023.02.22||/d{yyyy.MM.dd|UTC}}>"],"audit_node_host_name":"10.10.98.110"} due to org.elasticsearch.common.ValidationException: Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;
[2023-02-22T12:46:37,941][ERROR][c.a.o.s.a.s.InternalESSink] [node-1] Unable to index audit log {"audit_cluster_name":"wazuh-cluster","audit_transport_headers":{"_system_index_access_allowed":"false"},"audit_node_name":"node-1","audit_trace_task_id":"Sku9diz9Qf2OB3kCLNlUBg:4598237","audit_transport_request_type":"CreateIndexRequest","audit_category":"INDEX_EVENT","audit_request_origin":"REST","audit_request_body":"{}","audit_node_id":"Sku9diz9Qf2OB3kCLNlUBg","audit_request_layer":"TRANSPORT","@timestamp":"2023-02-22T12:46:37.940+00:00","audit_format_version":4,"audit_request_remote_address":"127.0.0.1","audit_request_privilege":"indices:admin/auto_create","audit_node_host_address":"10.10.98.110","audit_request_effective_user":"wazuh","audit_trace_indices":["<wazuh-alerts-4.x-{2023.02.22||/d{yyyy.MM.dd|UTC}}>"],"audit_node_host_name":"10.10.98.110"} due to org.elasticsearch.common.ValidationException: Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;

What does this error line mean?
due to org.elasticsearch.common.ValidationException: Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;
It says that it has 1000 shards open and no more are left. I don't understand what shards are.
Could that be the reason that I receive no more event data from my agents?

@Desvelao
Member

Hi @Cybercop-Training,

Thank you so much for sharing the outputs.

That is the reason why you can't see any alerts: the data can't be indexed.

due to org.elasticsearch.common.ValidationException: Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open

This means the shard limit was reached (1000 per node by default). To fix this issue, there are multiple options:

  • Delete indices. This frees shards. You could do it with old indices you don't want/need, or even automate it with ILM/ISM policies that delete old indices after a period of time, as explained in this post: https://wazuh.com/blog/wazuh-index-management. A sketch with example cURL requests is included at the end of this comment.
    Note:
    • ILM: Index Lifecycle Management (used by X-Pack)
    • ISM: Index State Management (used by Open Distro for Elasticsearch and OpenSearch)

Automating index deletion through ILM/ISM policies is recommended because it reduces manual maintenance.

  • Add more nodes to your Elasticsearch/Wazuh indexer cluster.

  • Increase the max shards per node (not recommended). If you choose this option, make sure you do not increase it too much, as it could cause instability and performance issues in your Elasticsearch/Wazuh indexer cluster. To do this:

    curl -k -u USERNAME:PASSWORD -XPUT ELASTICSEARCH_HOST_ADDRESS/_cluster/settings -H "Content-Type: application/json" \
    -d '{ "persistent": { "cluster.max_shards_per_node": "MAX_SHARDS_PER_NODE" } }'

    replace the placeholders, where:

  • USERNAME : username to do the request

  • PASSWORD : password for the user

  • ELASTICSEARCH_HOST_ADDRESS: Elasticsearch/Wazuh indexer host address. Include the protocol https if needed.

  • MAX_SHARDS_PER_NODE: maximum shards per node. Maybe you could try 1200 or something like that, depending on your case.

You should know that each index in Elasticsearch is assigned a number of shards; for example, the wazuh-alerts-* indices could use 3 shards (due to the template that defines some settings of these indices). So another option is reducing the number of shards they use. You could be interested in taking a look at https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster.
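
As a quick way to see how many primary shards the alerts template assigns (a sketch: the template name wazuh is the one the Wazuh Filebeat module normally installs, and the credentials/host are placeholders):

curl -k -u USERNAME:PASSWORD "https://localhost:9200/_template/wazuh?filter_path=*.settings&pretty"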

If you don't need the data in old indices, you could set an ILM/ISM policy to automate the deletion of these indices after a minimum period of life.

More info: https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
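
To make the first option (deleting old indices) more concrete, a sketch with cURL; the index names, credentials and host are placeholders, and deleting an index is irreversible, so double-check the names first:

# list the wazuh-alerts indices with their primary/replica shard counts and sizes
curl -k -u USERNAME:PASSWORD "https://localhost:9200/_cat/indices/wazuh-alerts-*?v&h=index,pri,rep,docs.count,store.size&s=index"
# delete old indices that are no longer needed (example: all alerts from January 2022)
curl -k -u USERNAME:PASSWORD -XDELETE "https://localhost:9200/wazuh-alerts-4.x-2022.01.*"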

@AlexRuiz7 AlexRuiz7 added the reporter/community Issue reported by the community label Feb 24, 2023
@Cybercop-Training
Author

@Desvelao
Thank you so much for your explanation and your support!
The whole Elasticsearch index processing looks a bit confusing and complicated to me, but hopefully I'll understand it better once I read more about it...

First of all I checked whether an ISM policy was already running on my system, and there was one, but maybe it was configured incorrectly.
In Index Management under Indices I set a filter for alert* and could see many alert indices that were not assigned to my ISM policy.
I unassigned the old policy first and created a new one as described in the article you linked:

{
    "policy": {
        "description": "Wazuh index state management for OpenDistro to move indices into a cold state after 30 days and delete them after a year.",
        "default_state": "hot",
        "states": [
            {
                "name": "hot",
                "actions": [
                    {
                        "replica_count": {
                            "number_of_replicas": 1
                        }
                    }
                ],
                "transitions": [
                    {
                        "state_name": "cold",
                        "conditions": {
                            "min_index_age": "30d"
                        }
                    }
                ]
            },
            {
                "name": "cold",
                "actions": [
                    {
                        "read_only": {}
                    }
                ],
                "transitions": [
                    {
                        "state_name": "delete",
                        "conditions": {
                            "min_index_age": "365d"
                        }
                    }
                ]
            },
            {
                "name": "delete",
                "actions": [
                    {
                        "delete": {}
                    }
                ],
                "transitions": []
            }
        ],
       "ism_template": {
           "index_patterns": ["wazuh-alerts*"],
           "priority": 100
       }
    }
}

Now this policy is linked to all my alert* indices!
Will this policy be applied automatically to new alert* indices, or do I have to do this manually again from time to time?

After that I incremented the max shards per node and set a value of 2000:

PUT _cluster/settings
{
  "persistent" : {
    "cluster.max_shards_per_node": 2000
  }
} 

That command helped and I could immediately see that new alerts are coming in.
With the ISM policy applied I noticed that a lot of alert indices moved over to the cold state. After a year the Wazuh server should automatically delete them.

The questions that come to me now are:
Do alert indices in a cold state also reserve shards for themselves?
Is there a difference in reserved shards between alert indices in a hot or cold state?

Is there a command I can run on the Wazuh server to check how many shards are left?

I also discovered this post which described the same issue I had: wazuh/wazuh-puppet#222

Would it also be useful in my case to reallocate unassigned shards?

curl -XPUT 'localhost:9200/wazuh-alerts-*/_settings' -H 'Content-Type: application/json' -d '{ "index": { "number_of_replicas": "0" } }'

Last but not least, how can I update my Wazuh stack to the newest release without breaking it with an incompatible Elasticsearch version? I've noticed that I have to be very careful with just running aptitude update / aptitude upgrade.

Thanks in advance!

@Desvelao
Member

Desvelao commented Feb 27, 2023

Hi @Cybercop-Training,

The ISM policy should be applied to new indices if their names match the index patterns you defined in the ISM policy. I see you added:

"ism_template": {
           "index_patterns": ["wazuh-alerts*"],
           "priority": 100
       }

so this means that the ISM policy should be applied to the indices whose names match the index pattern wazuh-alerts*. Consider reviewing that the ISM policy is applied to the expected new indices to be sure that this is working automatically.
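
One way to verify this (a sketch, assuming the Open Distro ISM plugin bundled with this Elasticsearch/Kibana 7.10.2 stack; the index pattern and credentials are placeholders) is the ISM explain API, which shows the policy attached to each matching index. From the Kibana Dev Tools console:

GET _opendistro/_ism/explain/wazuh-alerts-*

or the cURL equivalent:

curl -k -u USERNAME:PASSWORD "https://localhost:9200/_opendistro/_ism/explain/wazuh-alerts-*?pretty"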

Note that increasing the maximum shard count per node is not recommended, because it could cause instability and performance issues in your Elasticsearch/Wazuh indexer cluster. If your use case can be managed with ISM policies, then you may not need to increase the maximum shard count per node.

Do alert indices in a cold state also reserve shards for themselves?

Yes, they do.

Is there a difference in reserved shards between alert indices in a hot or cold state?

According to your policy, the difference between the hot and cold states is that the indices become read-only in the cold state. This allows searching the indices, but write operations are disabled (data can't be indexed).

Is there a command I can run on the Wazuh server to check how many shards are left?

I don't know if there is a way to get the remaining shards directly, but you could get the total with a request (from the Kibana Dev Tools plugin, or transformed to run with cURL):

GET _cluster/stats?filter_path=indices.shards.total

You could check the Elasticsearch API documentation for more information.
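
For reference, a cURL sketch of the same idea plus the configured limit (credentials and host are placeholders); on a single-node cluster, the difference between the two values is roughly how many shards are still available:

curl -k -u USERNAME:PASSWORD "https://localhost:9200/_cluster/stats?filter_path=indices.shards.total&pretty"
curl -k -u USERNAME:PASSWORD "https://localhost:9200/_cluster/settings?include_defaults=true&filter_path=*.cluster.max_shards_per_node&pretty"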

Would it also be useful in my case to reallocate unassigned shards?

I guess so, but this depends on the index configuration and the number of Elasticsearch nodes. I will research whether there is a problem that should be solved.

Last but not least, how can I update my Wazuh stack to the newest release without breaking it with an incompatible Elasticsearch version? I've noticed that I have to be very careful with just running aptitude update / aptitude upgrade.

You should prevent the Wazuh components from being upgraded automatically by tools such as aptitude, because this could leave the installed packages incompatible with the Wazuh setup. See how to hold a specific package back or disable the repositories when using aptitude or other tools: https://askubuntu.com/questions/18654/how-to-prevent-updating-of-a-specific-package.
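
A sketch of how that can be done with package holds; the package names below are only examples, so first check which Wazuh/Elastic packages are actually installed on your server:

# see which packages are installed
dpkg -l | grep -iE "wazuh|elastic|opendistro|kibana|filebeat"
# put them on hold so a routine upgrade skips them (example names)
sudo apt-mark hold wazuh-manager filebeat opendistroforelasticsearch kibana
# aptitude keeps its own hold list, so if you upgrade via aptitude also run:
sudo aptitude hold wazuh-manager filebeat opendistroforelasticsearch kibana
# use apt-mark unhold / aptitude unhold before an intentional, guided upgrade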

Before upgrading, you should check the Wazuh plugin-Kibana compatibility matrix: https://github.com/wazuh/wazuh-kibana-app/wiki/Compatibility. The table displays the compatibility of specific Wazuh plugin versions with Kibana versions.

To upgrade the Wazuh components, you should follow one of the upgrade guides from the Wazuh documentation for the stack you are using.

Please, before upgrading, read the guides carefully and verify that the provided documentation applies to your case.

Consider opening new threads in one of the Wazuh community channels if your questions are not directly related to the initial one. They could be useful for other users.

@Cybercop-Training
Author

@Desvelao
Thank you so much for your help and support in this case! :)
I'll close this ticket now, and I'm glad to come back or check out the other channels if there are any further questions or issues!
