Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Occasional API timeouts #8544

Open
jojelupipa opened this issue May 5, 2021 · 3 comments
Open

Occasional API timeouts #8544

jojelupipa opened this issue May 5, 2021 · 3 comments

Comments

@jojelupipa
Copy link

jojelupipa commented May 5, 2021

Wazuh version Component
4.1.4 Wazuh API

Description

A community user is reporting API timeouts when accessing Kibana.

We first discarded any heavy CPU load issues as the server running Wazuh didn't exceed a huge load either when receiving a timeout nor the rest of the time. The manager is hosted at an AWS EC2 instance with 2 CPUs and 8 GB RAM. It is supporting ~100 agents.

Restarting the manager only recovers the API for a short time.

The size of client.keys, the number of files in /var/ossec/queue/db and /var/ossec/queue/agent-groups are acceptable (either match or are close to the actual number of agents).

We tried to execute some curl requests directly to the API, but we received a timeout even we explicitly added the wait_for_complete parameter to the call.

curl -k -X GET "https://<HOST_IP>:55000/agents?wait_for_complete" -H "Authorization: Bearer <YOUR_JWT_TOKEN>"

It returned:

{"title": "Wazuh Internal Error", "detail": "Timeout executing API request", "dapi_errors": {"node01": {"error": "Timeout executing API request", "logfile": "WAZUH_HOME/logs/api.log"}}, "error": 3021}

Looking at the logs under /var/ossec/log/api.log we could see the following error showing up repeatedly:

2021/05/05 00:00:12 ERROR: Timeout executing API request
Traceback (most recent call last):
  File "/var/ossec/framework/python/lib/python3.8/site-packages/wazuh-4.1.1-py3.8.egg/wazuh/core/cluster/dapi/dapi.py", line 250, in execute_local_request
    data = await asyncio.wait_for(task, timeout=timeout)
  File "/var/ossec/framework/python/lib/python3.8/asyncio/tasks.py", line 498, in wait_for
    raise exceptions.TimeoutError()
asyncio.exceptions.TimeoutError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/var/ossec/framework/python/lib/python3.8/site-packages/wazuh-4.1.1-py3.8.egg/wazuh/core/cluster/dapi/dapi.py", line 252, in execute_local_request
    raise exception.WazuhInternalError(3021)
wazuh.core.exception.WazuhInternalError: Error 3021 - Timeout executing API request

Please, feel free to request any further information regarding this issue.

Notes

  • The user mentioned that this issue didn't exist while using Wazuh 3.x (with the same amount of agents)
@Selutario
Copy link
Member

Issue update

Hey team!

While working on issue #8541 (comment) to improve some error messages, I have seen that the API timeout errors already contain which endpoint has been executed and its parameters, like it can be seen in the last line of this log:

2021/05/06 07:36:08 ERROR: Timeout executing API request
Traceback (most recent call last):
  File "/var/ossec/framework/python/lib/python3.9/asyncio/tasks.py", line 492, in wait_for
    fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/ossec/framework/python/lib/python3.9/site-packages/wazuh-4.2.0-py3.9.egg/wazuh/core/cluster/dapi/dapi.py", line 253, in execute_local_request
    data = await asyncio.wait_for(task, timeout=timeout)
  File "/var/ossec/framework/python/lib/python3.9/asyncio/tasks.py", line 494, in wait_for
    raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/ossec/framework/python/lib/python3.9/site-packages/wazuh-4.2.0-py3.9.egg/wazuh/core/cluster/dapi/dapi.py", line 255, in execute_local_request
    raise exception.WazuhInternalError(3021)
wazuh.core.exception.WazuhInternalError: Error 3021 - Timeout executing API request
2021/05/06 07:36:08 INFO: wazuh 192.168.128.1 "GET /security/user/authenticate" with parameters {} and body {} done in 10.102s: 500
2021/05/06 07:36:18 ERROR: Timeout executing API request
Traceback (most recent call last):
  File "/var/ossec/framework/python/lib/python3.9/asyncio/tasks.py", line 492, in wait_for
    fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/ossec/framework/python/lib/python3.9/site-packages/wazuh-4.2.0-py3.9.egg/wazuh/core/cluster/dapi/dapi.py", line 253, in execute_local_request
    data = await asyncio.wait_for(task, timeout=timeout)
  File "/var/ossec/framework/python/lib/python3.9/asyncio/tasks.py", line 494, in wait_for
    raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/ossec/framework/python/lib/python3.9/site-packages/wazuh-4.2.0-py3.9.egg/wazuh/core/cluster/dapi/dapi.py", line 255, in execute_local_request
    raise exception.WazuhInternalError(3021)
wazuh.core.exception.WazuhInternalError: Error 3021 - Timeout executing API request
2021/05/06 07:36:18 INFO: wazuh 192.168.128.1 "GET /security/user/authenticate" with parameters {} and body {} done in 10.111s: 500

However, although the issue specification does not show the complete error trace (it was reported by a user), it turns out that the endpoint name/parameters are never displayed in its case. It is as if the API had gone into an infinite loop.

Also, it seems to return a timeout despite using the wait_for_complete option, which doesn't make any sense and might have something to do with this loop: User's API logs

Regards,
Selu.

@dwellkamp
Copy link

dwellkamp commented May 11, 2021

I face a similar issue.
On worker nodes the API seems to get stuck. It still takes requests, but does not seem to process them.

Traceback (most recent call last):
  File "/var/ossec/framework/python/lib/python3.9/site-packages/wazuh-4.1.5-py3.9.egg/wazuh/core/cluster/dapi/dapi.py", line 154, in distribute_function
    response = await self.execute_remote_request()
  File "/var/ossec/framework/python/lib/python3.9/site-packages/wazuh-4.1.5-py3.9.egg/wazuh/core/cluster/dapi/dapi.py", line 371, in execute_remote_request
    return json.loads(node_response,
  File "/var/ossec/framework/python/lib/python3.9/json/__init__.py", line 359, in loads
    return cls(**kw).decode(s)
  File "/var/ossec/framework/python/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/var/ossec/framework/python/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Hanging requests stay until wazuh-manager.service is restarted:

# ss -tn | grep :55000
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:50142           
CLOSE-WAIT  42      0            127.0.0.1:55000      127.0.0.1:55458           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:49172           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:49896           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:49088           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:49416           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:50018           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:49794           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:49298           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:50440           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:50286           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:49538           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:49176           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:49898           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:50016           
CLOSE-WAIT  1       0          10.231.1.59:55000   10.140.45.40:48996           
CLOSE-WAIT  1       0          10.231.1.59:55000   10.140.45.40:49006           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:50292           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:50438           
CLOSE-WAIT  42      0            127.0.0.1:55000      127.0.0.1:55460           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:50140           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:49836           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:49418           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:49536           
CLOSE-WAIT  42      0            127.0.0.1:55000      127.0.0.1:55462           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:49296           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:50264           
ESTAB       270     0          10.231.1.59:55000   10.140.45.40:49696

The wazuh-kibana-webapp logs out users after the API-Timeout (20000ms) with an anoying SESSION_EXPIRED, even if the user has just logged in.

Error from the api.log on the wazuh master:

2021/05/12 09:48:21 ERROR: Timeout sending request
Traceback (most recent call last):
  File "/var/ossec/framework/python/lib/python3.9/asyncio/locks.py", line 226, in wait
    await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/ossec/framework/python/lib/python3.9/asyncio/tasks.py", line 492, in wait_for
    fut.result()
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/ossec/framework/python/lib/python3.9/site-packages/wazuh-4.1.5-py3.9.egg/wazuh/core/cluster/local_client.py", line 183, in send_api_request
    await asyncio.wait_for(self.protocol.response_available.wait(), timeout=timeout)
  File "/var/ossec/framework/python/lib/python3.9/asyncio/tasks.py", line 494, in wait_for
    raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/ossec/framework/python/lib/python3.9/site-packages/wazuh-4.1.5-py3.9.egg/wazuh/core/cluster/dapi/dapi.py", line 149, in distribute_function
    response = await self.forward_request()
  File "/var/ossec/framework/python/lib/python3.9/site-packages/wazuh-4.1.5-py3.9.egg/wazuh/core/cluster/dapi/dapi.py", line 464, in forward_request
    response = await asyncio.shield(asyncio.gather(*[forward(node) for node in valid_nodes]))
  File "/var/ossec/framework/python/lib/python3.9/site-packages/wazuh-4.1.5-py3.9.egg/wazuh/core/cluster/dapi/dapi.py", line 409, in forward
    result = json.loads(await client.execute(b'dapi_forward',
  File "/var/ossec/framework/python/lib/python3.9/site-packages/wazuh-4.1.5-py3.9.egg/wazuh/core/cluster/local_client.py", line 213, in execute
    result = await self.send_api_request(command, data, wait_for_complete)
  File "/var/ossec/framework/python/lib/python3.9/site-packages/wazuh-4.1.5-py3.9.egg/wazuh/core/cluster/local_client.py", line 186, in send_api_request
    raise exception.WazuhInternalError(3020)
wazuh.core.exception.WazuhInternalError: Error 3020 - Timeout sending request

@alexmateescu
Copy link

any update on this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants