Very Slow Client Buffer Clearance Despite Default EPS (500) #23213

Open
bbkrr opened this issue May 2, 2024 · 0 comments
bbkrr commented May 2, 2024

| Wazuh version | Component | Install type | Install method | Platform |
|---|---|---|---|---|
| 4.3.5 | Server | Manager | Sources | Ubuntu 20.04 |
| 4.3.5 | Client | Agent | Packages | Windows Server 2016 |

Note: I have enabled ZeroMQ as an output and disabled logging to files.


After activating the anti-flooding mechanism on the client side, I'm encountering a 'Buffer Flooded' error.

[Screenshot: 2024-05-02 at 15:30:39]

In my setup, I've kept the configuration minimal, with a single monitored file. I've also written a batch file that generates random log lines in the monitored file. The configuration sets a queue size of 5000 and an events-per-second limit of 500.
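For reference, here is a minimal sketch of the kind of generator I'm using. The actual reproduction script is a batch file, so this Python version is only illustrative; the path comes from the localfile entry below, and the rate is an arbitrary value above the 500 EPS limit.

log_generator.py (illustrative sketch)

# Appends syslog-like lines to the monitored file at a fixed rate.
# Illustrative only; the real reproduction uses a batch file on the Windows host.
import random
import string
import time
from datetime import datetime

LOGFILE = r"C:\Users\Administrator\Documents\logfile.txt"  # path from <localfile>
LINES_PER_SECOND = 1000  # assumption: deliberately above the 500 EPS limit to fill the buffer

def random_line() -> str:
    payload = "".join(random.choices(string.ascii_letters + string.digits, k=60))
    # Loose syslog-style line: timestamp, host, tag, message
    return f"{datetime.now():%b %d %H:%M:%S} testhost testapp: {payload}"

if __name__ == "__main__":
    with open(LOGFILE, "a", encoding="utf-8") as f:
        while True:
            for _ in range(LINES_PER_SECOND):
                f.write(random_line() + "\n")
            f.flush()
            time.sleep(1)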

agent.conf

<agent_config>
  <labels>
    <label key="os_name">windows</label>
  </labels>
  <client_buffer>
    <disabled>no</disabled>
    <events_per_second>500</events_per_second>
    <queue_size>5000</queue_size>
  </client_buffer>
  <localfile>
    <location>C:\Users\Administrator\Documents\logfile.txt</location>
    <log_format>syslog</log_format>
    <only-future-events>no</only-future-events>
  </localfile>
</agent_config>

ossec.conf

<ossec_config>
  <client>
    <server>
      <address>192.168.0.1</address>
      <port>1514</port>
      <protocol>tcp</protocol>
    </server>
    <crypto_method>aes</crypto_method>
    <notify_time>10</notify_time>
    <time-reconnect>60</time-reconnect>
    <auto_restart>yes</auto_restart>
    <enrollment>
      <use_source_ip>no</use_source_ip>
    </enrollment>
  </client>
</ossec_config>

local_internal_options.conf

# Logcollector file loop timeout (check every 2 seconds for file changes)
logcollector.loop_timeout=2

# Logcollector - Maximum number of lines to read from the same file [100..1000000]
# 0. Disable line burst limitation
logcollector.max_lines=1000

# Time since the agent buffer is full to consider events flooding
agent.tolerance=15

# Level of occupied capacity in Agent buffer to trigger a warning message
agent.warn_level=90

# Level of occupied capacity in Agent buffer to come back to normal state
agent.normal_level=70

# Minimum events per second, configurable at XML settings [1..1000]
agent.min_eps=500

# Wazuh modules - maximum number of events per second sent by each module [1..1000]
wazuh_modules.max_eps=500

I have increased agent.min_eps to 500 as well, so it matches the max_eps value.

Upon initiating the generation and writing of random logs to the monitored file, the buffer begins to fill. Once the buffer reaches 4500, a warning "Agent buffer at 90%" appears, followed by "Agent buffer is full: Events may be lost."

After 15 seconds, as anticipated, "Agent buffer is flooded: Producing too many events" is displayed. However, despite this behavior, only 50-60 events per second are observed being sent to the Manager (remoted), instead of the expected 500 EPS.

On the client side, I'm monitoring the client buffer (msg_buffer) by checking the wazuh-agent.state file, and on the manager side, I'm inspecting the wazuh-remoted.state file located at /var/ossec/var/run/wazuh-remoted.state.
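For what it's worth, this is roughly how I sample the counters (a minimal Python sketch; it assumes the .state files use the key='value' line format and that evt_count in wazuh-remoted.state is a cumulative counter, so the rate is the difference between two samples):

eps_check.py (illustrative sketch)

# Estimates events per second from wazuh-remoted.state.
# Assumes key='value' lines and that evt_count is a cumulative counter.
import re
import time

STATE_FILE = "/var/ossec/var/run/wazuh-remoted.state"
FIELD = "evt_count"
INTERVAL = 5  # seconds between samples

def read_counter(path: str, field: str) -> int:
    with open(path, encoding="utf-8") as f:
        for line in f:
            m = re.match(rf"{field}='?(\d+)'?", line.strip())
            if m:
                return int(m.group(1))
    raise KeyError(f"{field} not found in {path}")

if __name__ == "__main__":
    prev = read_counter(STATE_FILE, FIELD)
    while True:
        time.sleep(INTERVAL)
        cur = read_counter(STATE_FILE, FIELD)
        print(f"{(cur - prev) / INTERVAL:.1f} events/second")
        prev = cur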

The text file below provides details about the events received by remoted, sourced from the wazuh-remoted.state file.

event_check.txt

When examining the received events and tracking those sent from the client, I typically observe an average of 50-60 events per second.

Considering the configuration I've established, my expectation was to achieve a throughput of 500 events per second, with the buffer clearing within approximately 10 seconds if it reaches its maximum capacity of 5000 events and 500 EPS is consistently dispatched.
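As simple back-of-the-envelope numbers (using the figures above):

# Expected vs. observed drain time for a full buffer.
queue_size = 5000        # <queue_size>
configured_eps = 500     # <events_per_second>
observed_eps = 55        # midpoint of the 50-60 EPS actually seen by remoted

print(queue_size / configured_eps)  # expected: ~10 seconds to drain a full buffer
print(queue_size / observed_eps)    # observed: ~90 seconds to drain a full buffer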

I have also come across other options such as "vcheck_files" and "reload_interval".
In my understanding, these parameters shouldn't be relevant to preventing the buffer flooding.

"vcheck_files" is the interval at which the logcollector module checks whether the monitored files have been rotated or truncated. By default it is set to 64 seconds, which should be fine.

"reload_interval" is the interval at which the logcollector module closes and reopens the file handlers for the monitored files. It only takes effect when the "force_reload" parameter is enabled.


If I disable the buffer, I get about 300 EPS on average. I have attached the logfile and the received-events file from the run with the buffer disabled.

logfile.txt

event_check_disabled_buffer.txt


Are there any other configurations that I should adjust in order to achieve a throughput of 500 EPS and expedite the buffer clearance process?
