
wazuh-logcollector: ERROR: socketerr (not available) problem. #10

Closed

rustybofh opened this issue Jun 14, 2024 · 1 comment

rustybofh commented Jun 14, 2024

Hi, I’m running Wazuh agent 4.7.4 on pfSense 2.7 and I keep getting these errors even though it’s working and sending information to the manager. I’ve tried changing agent parameters like queue size and events per second, but the issue persists. The manager version is 4.8, and for more context, I have Suricata running on pfSense, but even stopping Suricata on the interfaces doesn’t resolve the problem.

I've got this:

2024/06/13 17:34:42 wazuh-logcollector: INFO: Successfully reconnected to 'queue/sockets/queue'
2024/06/13 17:34:42 wazuh-logcollector: ERROR: socketerr (not available).
2024/06/13 17:34:42 wazuh-logcollector: ERROR: Unable to send message to 'queue/sockets/queue' (wazuh-agentd might be down). Attempting to reconnect.
2024/06/13 17:34:42 wazuh-logcollector: INFO: Successfully reconnected to 'queue/sockets/queue'
2024/06/13 17:34:42 wazuh-logcollector: ERROR: socketerr (not available).
2024/06/13 17:34:42 wazuh-logcollector: ERROR: Unable to send message to 'queue/sockets/queue' after a successfull reconnection...
2024/06/13 17:34:42 wazuh-logcollector: ERROR: socketerr (not available).
2024/06/13 17:34:42 wazuh-logcollector: ERROR: Unable to send message to 'queue/sockets/queue' (wazuh-agentd might be down). Attempting to reconnect.
2024/06/13 17:34:42 wazuh-logcollector: INFO: Successfully reconnected to 'queue/sockets/queue

I have tried changing parameters in the local.conf, but nothing has worked. The agent appears fine in the manager and there are no enrollment issues. There is no extra hop from the manager to the pfSense, and the other agents in various locations do not have this issue. I also do not see any firewall blocks.

Any hints or suggestions would be appreciated.

Thanks!

Update: This issue only occurs when monitoring the WAN interface; it does not happen when monitoring only the LAN interface.

@vikman90
Member

Hi @rustybofh

The Wazuh agent is divided into multiple processes that communicate through a local socket: /var/ossec/queue/sockets/queue. The wazuh-agentd process exposes this socket so that collectors (e.g., Logcollector, FIM, etc.) can send messages to the manager.

Trying to reproduce

The most common reason for a disconnection is that wazuh-agentd crashes. We see this is not the case here, as it can reconnect immediately. I suspect the issue lies with the internal buffer of that socket (provided by the operating system), so I conducted a proof of concept which worked without issues on Linux:

queue.py
#!/usr/bin/env python3
# Send messages to Wazuh's queue socket (analysisd/agentd)
#
# Syntax: queue.py [-L] [PATH]
# Reads a line from stdin
# Standard message form: <id>:<location>:<log>
#
# Example:
# echo '1:test:Hello World' | sudo ./queue.py -L

import argparse
from socket import socket, AF_UNIX, SOCK_DGRAM, SO_SNDBUF, SOL_SOCKET

ADDR = '/var/ossec/queue/sockets/queue'
BLEN = 212992

def connect(addr, blen):
    # Connect to the Unix datagram socket and try to grow its send buffer
    sock = socket(AF_UNIX, SOCK_DGRAM)
    sock.connect(addr)
    oldbuf = sock.getsockopt(SOL_SOCKET, SO_SNDBUF)

    if oldbuf < blen:
        sock.setsockopt(SOL_SOCKET, SO_SNDBUF, blen)
        newbuf = sock.getsockopt(SOL_SOCKET, SO_SNDBUF)
        print("INFO: Buffer expanded from {0} to {1}".format(oldbuf, newbuf))

    return sock


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Send messages to Wazuh's queue")
    parser.add_argument('-L', '--loop', action='store_true', dest='loop', help='enable loop mode')
    parser.add_argument('PATH', nargs='?', default=ADDR, help='override default queue path')

    args = parser.parse_args()
    string = input().encode()
    sock = connect(args.PATH, BLEN)

    if args.loop:
        # Flood the socket with the same message until the kernel reports an error
        i = 0

        try:
            while True:
                sock.send(string)
                i += 1
        except BaseException as e:
            print(e)
            print("Messages: {0}\nBytes: {1}".format(i, i * len(string)))

    else:
        # Send the message once
        sock.send(string)

    sock.close()

As I understand it, pfSense is based on FreeBSD. I don't have pfSense, but I tested this on FreeBSD and encountered this error:

[Errno 55] No buffer space available

Rationale

This demonstrates a difference between the two platforms: if the socket memory fills up (because Logcollector generates more messages than the agent can handle), Linux performs an implicit wait (Logcollector blocks until space is available), while BSD returns an error code.

In fact, this is how I tested it on FreeBSD: after enabling logcollector.debug=2 and inserting numerous logs into a monitored file, Logcollector produced this debug message:

2024/06/18 08:39:23 wazuh-logcollector[16057] mq_op.c:127 at SendMSGAction(): DEBUG: Socket busy, discarding message.
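
To make the difference concrete, here is a minimal Python sketch (for illustration only, not Wazuh's actual code) of how a sender could emulate Linux's implicit wait on BSD: catch ENOBUFS (errno 55) and retry after a short sleep. The socket path matches the PoC above; the retry count and delay are arbitrary assumptions.

#!/usr/bin/env python3
# Illustrative sketch only: emulate Linux's implicit wait on BSD by retrying
# the send when the kernel reports ENOBUFS ("No buffer space available").

import errno
import time
from socket import socket, AF_UNIX, SOCK_DGRAM

ADDR = '/var/ossec/queue/sockets/queue'

def send_with_backoff(sock, payload, retries=100, delay=0.01):
    # Retry while the socket's send buffer is full
    for _ in range(retries):
        try:
            sock.send(payload)
            return True
        except OSError as e:
            if e.errno != errno.ENOBUFS:
                raise
            time.sleep(delay)  # give the reader time to drain the buffer
    return False

if __name__ == '__main__':
    sock = socket(AF_UNIX, SOCK_DGRAM)
    sock.connect(ADDR)
    ok = send_with_backoff(sock, b'1:test:Hello World')
    print('sent' if ok else 'dropped after retries')
    sock.close()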

So, my hypothesis is:

  • When you enable WAN monitoring, Suricata produces a massive number of logs, which Logcollector captures but exceeds the agent's capacity to handle.
  • For some reason, pfSense in particular generates a different error code than FreeBSD. This causes Logcollector to produce that generic error and attempt to reconnect to the agent.

If this is correct, and given that Logcollector reconnects successfully, the practical effect is nearly the same (aside from the error printed in the log).

Additionally, pfSense and FreeBSD are not officially supported, so I don't believe we can prioritize development to eliminate the error message.

Workaround

If my hypothesis is valid, and this is due to a capacity issue, I believe we can implement a workaround with the configuration:

  • Ensure the agent's leaky bucket (<client_buffer>) is enabled. This improves message handling: if messages can't be sent directly, they are queued (a sample configuration is sketched after this list).
  • Reduce Logcollector's read rate. For example, limit reads to 500 lines per cycle: edit etc/local_internal_options.conf and add:
    logcollector.max_lines=500
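
For reference, a minimal sketch of the <client_buffer> block in the agent's ossec.conf; the queue size and events-per-second values below are illustrative only and should be tuned to your environment:

<client_buffer>
  <!-- Enable the leaky bucket so events are queued when they can't be sent directly -->
  <disabled>no</disabled>
  <queue_size>5000</queue_size>            <!-- illustrative value -->
  <events_per_second>500</events_per_second> <!-- illustrative value -->
</client_buffer>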
    

I hope this helps.

Best regards.

@vikman90 vikman90 self-assigned this Jun 18, 2024
@vikman90 vikman90 closed this as not planned Jun 20, 2024