Fix errors in ossec-remoted when it is stressed #3602

Merged
merged 8 commits on Jul 17, 2019

Contributor

@albertomn86 albertomn86 commented Jul 2, 2019

Related issue
#3574

Description

This PR fixes the problems described in issue #3574.

Error 1218: Connection reset by peer

The ECONNRESET error is now handled: f596b65
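
For illustration, this kind of handling can be sketched as a small classifier over the result of send(). This is a minimal sketch, not the code from f596b65: the enum, its values, and classify_send() are hypothetical names, and grouping EPIPE with ECONNRESET is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical outcome codes; not taken from the Wazuh sources. */
typedef enum {
    SEND_OK,          /* the whole message was written */
    SEND_PARTIAL,     /* fewer bytes were written than requested */
    SEND_PEER_RESET,  /* the peer closed or reset the connection */
    SEND_WOULD_BLOCK, /* the socket buffer is full right now */
    SEND_ERROR        /* any other errno value */
} send_outcome_t;

/* Map the return value of send() and the errno it left to an outcome.
 * `expected` is the number of bytes the caller asked to send. */
send_outcome_t classify_send(ssize_t ret, size_t expected, int err) {
    if (ret >= 0) {
        return ((size_t)ret == expected) ? SEND_OK : SEND_PARTIAL;
    }
    switch (err) {
    case ECONNRESET:
    case EPIPE: /* assumption: treated the same as a reset */
        return SEND_PEER_RESET;
    case EAGAIN:
#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
        return SEND_WOULD_BLOCK;
    default:
        return SEND_ERROR;
    }
}
```

A caller could then log a specific warning for SEND_PEER_RESET instead of passing the raw errno text through.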

Error 1218: No such file or directory

The errno variable is now set to 0 before the send() call, so a stale value left by an earlier call is no longer reported: 7d7913c
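
The idea can be sketched as a thin wrapper around send(). This is a hedged sketch, not the change in 7d7913c: send_fresh_errno() and its err output parameter are hypothetical, and it assumes the misleading message came from reading errno after a send() that had not failed.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical wrapper, not the Wazuh function. It clears errno before
 * sending, so code that later formats errno after a short write does not
 * pick up a code left by an unrelated earlier call (for example ENOENT). */
ssize_t send_fresh_errno(int fd, const void *buf, size_t len, int *err) {
    errno = 0;                            /* drop any stale value */
    ssize_t sent = send(fd, buf, len, 0);
    *err = (sent < 0) ? errno : 0;        /* report errno only on failure */
    return sent;
}
```

The caller compares the return value with len to detect an incomplete delivery, and consults err only when the return value is negative.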

Errors caused by invalid file descriptors

The message counter described in the issue was implemented. In addition, inet_ntoa() was replaced by inet_ntop(), because inet_ntoa() is not thread-safe:

inet_ntop() extends the inet_ntoa(3) function to support multiple address families, inet_ntoa(3) is now considered to be deprecated in favor of inet_ntop().

http://man7.org/linux/man-pages/man3/inet_ntop.3.html
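
As an illustration of the difference: inet_ntoa() returns a pointer to a statically allocated buffer that later calls overwrite, while inet_ntop() writes into a buffer supplied by the caller. The helper below is a minimal sketch with a hypothetical name (peer_ip_to_str()); it is not taken from the PR.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: format an IPv4 peer address into a buffer owned by
 * the caller, so each thread can use its own storage.
 * Returns `out` on success, or NULL if the buffer is too small. */
const char *peer_ip_to_str(const struct sockaddr_in *peer,
                           char *out, socklen_t out_len) {
    return inet_ntop(AF_INET, &peer->sin_addr, out, out_len);
}
```

A buffer of INET_ADDRSTRLEN bytes is large enough for any IPv4 address in dotted-decimal form.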

Tests

  • Compilation without warnings in every supported platform
    • Linux
    • Windows (Agent)
    • MAC OS X (Agent)
    • CentOS 6
  • Source installation
  • Source upgrade
  • Memory tests
    • Valgrind report for affected components
    • CPU impact
    • RAM usage impact
  • Retrocompatibility with older Wazuh versions
  • Working on cluster environments
  • Review logs syntax and correct language
  • Test under high load.

@albertomn86
Contributor Author

Documentation: wazuh/wazuh-documentation#1316

@albertomn86
Contributor Author

Valgrind report

==63823== Memcheck, a memory error detector
==63823== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==63823== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==63823== Command: /var/ossec/bin/ossec-remoted -fdd
==63823==

==63824==
==63824== HEAP SUMMARY:
==63824==     in use at exit: 1,146,383 bytes in 129 blocks
==63824==   total heap usage: 47,670 allocs, 47,541 frees, 80,185,988 bytes allocated
==63824==
==63824== 272 bytes in 1 blocks are possibly lost in loss record 44 of 70
==63824==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==63824==    by 0x4013C86: allocate_dtv (dl-tls.c:290)
==63824==    by 0x4013C86: _dl_allocate_tls (dl-tls.c:538)
==63824==    by 0x5870421: allocate_stack (allocatestack.c:597)
==63824==    by 0x5870421: pthread_create@@GLIBC_2.2.5 (pthread_create.c:669)
==63824==    by 0x1993F1: CreateThreadJoinable (pthreads_op.c:47)
==63824==    by 0x199498: CreateThread (pthreads_op.c:62)
==63824==    by 0x11B6A7: HandleSecure (secure.c:73)
==63824==    by 0x11D380: HandleRemote (remoted.c:117)
==63824==    by 0x11E714: main (main.c:216)
==63824==
==63824== 272 bytes in 1 blocks are possibly lost in loss record 45 of 70
==63824==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==63824==    by 0x4013C86: allocate_dtv (dl-tls.c:290)
==63824==    by 0x4013C86: _dl_allocate_tls (dl-tls.c:538)
==63824==    by 0x5870421: allocate_stack (allocatestack.c:597)
==63824==    by 0x5870421: pthread_create@@GLIBC_2.2.5 (pthread_create.c:669)
==63824==    by 0x1993F1: CreateThreadJoinable (pthreads_op.c:47)
==63824==    by 0x199498: CreateThread (pthreads_op.c:62)
==63824==    by 0x11B6C9: HandleSecure (secure.c:76)
==63824==    by 0x11D380: HandleRemote (remoted.c:117)
==63824==    by 0x11E714: main (main.c:216)
==63824==
==63824== 272 bytes in 1 blocks are possibly lost in loss record 46 of 70
==63824==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==63824==    by 0x4013C86: allocate_dtv (dl-tls.c:290)
==63824==    by 0x4013C86: _dl_allocate_tls (dl-tls.c:538)
==63824==    by 0x5870421: allocate_stack (allocatestack.c:597)
==63824==    by 0x5870421: pthread_create@@GLIBC_2.2.5 (pthread_create.c:669)
==63824==    by 0x1993F1: CreateThreadJoinable (pthreads_op.c:47)
==63824==    by 0x199498: CreateThread (pthreads_op.c:62)
==63824==    by 0x11B6EB: HandleSecure (secure.c:79)
==63824==    by 0x11D380: HandleRemote (remoted.c:117)
==63824==    by 0x11E714: main (main.c:216)
==63824==
==63824== 272 bytes in 1 blocks are possibly lost in loss record 47 of 70
==63824==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==63824==    by 0x4013C86: allocate_dtv (dl-tls.c:290)
==63824==    by 0x4013C86: _dl_allocate_tls (dl-tls.c:538)
==63824==    by 0x5870421: allocate_stack (allocatestack.c:597)
==63824==    by 0x5870421: pthread_create@@GLIBC_2.2.5 (pthread_create.c:669)
==63824==    by 0x1993F1: CreateThreadJoinable (pthreads_op.c:47)
==63824==    by 0x199498: CreateThread (pthreads_op.c:62)
==63824==    by 0x11B70D: HandleSecure (secure.c:82)
==63824==    by 0x11D380: HandleRemote (remoted.c:117)
==63824==    by 0x11E714: main (main.c:216)
==63824==
==63824== 272 bytes in 1 blocks are possibly lost in loss record 48 of 70
==63824==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==63824==    by 0x4013C86: allocate_dtv (dl-tls.c:290)
==63824==    by 0x4013C86: _dl_allocate_tls (dl-tls.c:538)
==63824==    by 0x5870421: allocate_stack (allocatestack.c:597)
==63824==    by 0x5870421: pthread_create@@GLIBC_2.2.5 (pthread_create.c:669)
==63824==    by 0x1993F1: CreateThreadJoinable (pthreads_op.c:47)
==63824==    by 0x199498: CreateThread (pthreads_op.c:62)
==63824==    by 0x11B72F: HandleSecure (secure.c:85)
==63824==    by 0x11D380: HandleRemote (remoted.c:117)
==63824==    by 0x11E714: main (main.c:216)
==63824==
==63824== 272 bytes in 1 blocks are possibly lost in loss record 49 of 70
==63824==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==63824==    by 0x4013C86: allocate_dtv (dl-tls.c:290)
==63824==    by 0x4013C86: _dl_allocate_tls (dl-tls.c:538)
==63824==    by 0x5870421: allocate_stack (allocatestack.c:597)
==63824==    by 0x5870421: pthread_create@@GLIBC_2.2.5 (pthread_create.c:669)
==63824==    by 0x1993F1: CreateThreadJoinable (pthreads_op.c:47)
==63824==    by 0x199498: CreateThread (pthreads_op.c:62)
==63824==    by 0x11B75F: HandleSecure (secure.c:90)
==63824==    by 0x11D380: HandleRemote (remoted.c:117)
==63824==    by 0x11E714: main (main.c:216)
==63824==
==63824== 272 bytes in 1 blocks are possibly lost in loss record 50 of 70
==63824==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==63824==    by 0x4013C86: allocate_dtv (dl-tls.c:290)
==63824==    by 0x4013C86: _dl_allocate_tls (dl-tls.c:538)
==63824==    by 0x5870421: allocate_stack (allocatestack.c:597)
==63824==    by 0x5870421: pthread_create@@GLIBC_2.2.5 (pthread_create.c:669)
==63824==    by 0x1993F1: CreateThreadJoinable (pthreads_op.c:47)
==63824==    by 0x199498: CreateThread (pthreads_op.c:62)
==63824==    by 0x11B93B: HandleSecure (secure.c:132)
==63824==    by 0x11D380: HandleRemote (remoted.c:117)
==63824==    by 0x11E714: main (main.c:216)
==63824==
==63824== 1,088 bytes in 4 blocks are possibly lost in loss record 56 of 70
==63824==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==63824==    by 0x4013C86: allocate_dtv (dl-tls.c:290)
==63824==    by 0x4013C86: _dl_allocate_tls (dl-tls.c:538)
==63824==    by 0x5870421: allocate_stack (allocatestack.c:597)
==63824==    by 0x5870421: pthread_create@@GLIBC_2.2.5 (pthread_create.c:669)
==63824==    by 0x1993F1: CreateThreadJoinable (pthreads_op.c:47)
==63824==    by 0x199498: CreateThread (pthreads_op.c:62)
==63824==    by 0x11B84B: HandleSecure (secure.c:112)
==63824==    by 0x11D380: HandleRemote (remoted.c:117)
==63824==    by 0x11E714: main (main.c:216)
==63824==
==63824== 2,176 bytes in 8 blocks are possibly lost in loss record 58 of 70
==63824==    at 0x4C31B25: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==63824==    by 0x4013C86: allocate_dtv (dl-tls.c:290)
==63824==    by 0x4013C86: _dl_allocate_tls (dl-tls.c:538)
==63824==    by 0x5870421: allocate_stack (allocatestack.c:597)
==63824==    by 0x5870421: pthread_create@@GLIBC_2.2.5 (pthread_create.c:669)
==63824==    by 0x1993F1: CreateThreadJoinable (pthreads_op.c:47)
==63824==    by 0x199498: CreateThread (pthreads_op.c:62)
==63824==    by 0x11B7DD: HandleSecure (secure.c:101)
==63824==    by 0x11D380: HandleRemote (remoted.c:117)
==63824==    by 0x11E714: main (main.c:216)
==63824==
==63824== LEAK SUMMARY:
==63824==    definitely lost: 0 bytes in 0 blocks
==63824==    indirectly lost: 0 bytes in 0 blocks
==63824==      possibly lost: 5,168 bytes in 19 blocks
==63824==    still reachable: 1,141,215 bytes in 110 blocks
==63824==         suppressed: 0 bytes in 0 blocks
==63824== Reachable blocks (those to which a pointer was found) are not shown.
==63824== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==63824==
==63824== For counts of detected and suppressed errors, rerun with: -v
==63824== ERROR SUMMARY: 9 errors from 9 contexts (suppressed: 0 from 0)
==63824== could not unlink /tmp/vgdb-pipe-from-vgdb-to-63824-by-root-on-???
==63824== could not unlink /tmp/vgdb-pipe-to-vgdb-from-63824-by-root-on-???
==63824== could not unlink /tmp/vgdb-pipe-shared-mem-vgdb-63824-by-root-on-???

@vikman90 vikman90 self-assigned this Jul 3, 2019
@vikman90 vikman90 requested a review from snaow July 16, 2019 10:12
src/remoted/netcounter.c
size_t counter = connections.list[fd];
w_mutex_unlock(&lock);
return counter;
}
Member

Missing ending newline.

w_mutex_lock(&state_mutex);
remoted_state.dequeued_after_close++;
w_mutex_unlock(&state_mutex);
}
Member

Missing ending newline.
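
The two snippets under review suggest a counter per file descriptor guarded by a mutex, plus a statistic for messages dequeued once their connection is gone. The following is a minimal sketch of one way such a scheme could work. It is an assumption, not the contents of netcounter.c: every name here (MAX_FD, fd_generation, fd_new_connection(), msg_is_current(), stale_count()) is hypothetical, and plain pthread mutexes stand in for the w_mutex_lock() wrappers.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_FD 1024 /* hypothetical fixed table size */

static size_t fd_generation[MAX_FD]; /* one counter per descriptor */
static size_t dequeued_after_close;  /* messages dropped as stale */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Call when a descriptor is accepted or closed. The returned value is
 * stored with every message queued for the current connection. */
size_t fd_new_connection(int fd) {
    assert(fd >= 0 && fd < MAX_FD);
    pthread_mutex_lock(&lock);
    size_t generation = ++fd_generation[fd];
    pthread_mutex_unlock(&lock);
    return generation;
}

/* Call when a message is dequeued. Returns 1 if the descriptor still
 * belongs to the connection the message was queued for. Otherwise the
 * message is counted as stale and 0 is returned. */
int msg_is_current(int fd, size_t stamp) {
    assert(fd >= 0 && fd < MAX_FD);
    pthread_mutex_lock(&lock);
    int current = (fd_generation[fd] == stamp);
    if (!current) {
        dequeued_after_close++;
    }
    pthread_mutex_unlock(&lock);
    return current;
}

/* Read the stale-message statistic under the lock. */
size_t stale_count(void) {
    pthread_mutex_lock(&lock);
    size_t n = dequeued_after_close;
    pthread_mutex_unlock(&lock);
    return n;
}
```

With this design a worker never writes to a descriptor number that the kernel has since reused for a different agent, because the stamp stored with the message no longer matches.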

Member

@vikman90 vikman90 left a comment

LGTM!

@albertomn86
Contributor Author

Testing

Tested on an AWS c4.large instance for the manager, with 600 connected agents.

Events per second: 17259.6

ERRORS:
No errors found in the ossec.log file.

WARNINGS:

2019/07/16 16:20:29 ossec-remoted: WARNING: (1218): Unable to send message to '437': A message could not be delivered completely. [228]
2019/07/16 16:20:29 ossec-remoted: WARNING: (1246): Unable to send file 'merged.mg' to agent ID '437'.
2019/07/16 16:20:29 ossec-remoted: WARNING: (1218): Unable to send message to '109': Agent is not responding. [1125]
2019/07/16 16:20:30 ossec-remoted: WARNING: (1218): Unable to send message to '109': Agent is not responding. [1125]
2019/07/16 16:20:30 ossec-remoted: WARNING: (1246): Unable to send file 'merged.mg' to agent ID '109'.
2019/07/16 16:20:31 ossec-remoted: WARNING: (1246): Unable to send file 'merged.mg' to agent ID '164'.
2019/07/16 16:20:33 ossec-remoted: WARNING: (1246): Unable to send file 'merged.mg' to agent ID '220'.
2019/07/16 16:20:34 ossec-remoted: WARNING: (1218): Unable to send message to '377': A message could not be delivered completely. [603]
2019/07/16 16:20:34 ossec-remoted: WARNING: (1246): Unable to send file 'merged.mg' to agent ID '377'.
2019/07/16 16:20:35 ossec-remoted: WARNING: (1218): Unable to send message to '320': A message could not be delivered completely. [712]
2019/07/16 16:20:35 ossec-remoted: WARNING: (1246): Unable to send file 'merged.mg' to agent ID '320'.
2019/07/16 16:20:36 ossec-remoted: WARNING: (1218): Unable to send message to '078': A message could not be delivered completely. [275]
2019/07/16 16:20:36 ossec-remoted: WARNING: (1246): Unable to send file 'merged.mg' to agent ID '078'.
2019/07/16 16:20:37 ossec-remoted: WARNING: (1218): Unable to send message to '012': A message could not be delivered completely. [1042

State file

# State file for ossec-remoted
# Updated every 5 seconds.

# Queue size
queue_size='39657'

# Total queue size
total_queue_size='131072'

# TCP sessions
tcp_sessions='599'

# Events sent to Analysisd
evt_count='14385664'

# Control messages received
ctrl_msg_count='67242'

# Discarded messages
discarded_count='0'

# Messages sent
msg_sent='1596971'

# Total number of bytes received
recv_bytes='4856619248'

## Messages dequeued after the agent closes the connection
dequeued_after_close='20755'

State file + 5 seconds

# State file for ossec-remoted
# Updated every 5 seconds.

# Queue size
queue_size='25609'

# Total queue size
total_queue_size='131072'

# TCP sessions
tcp_sessions='597'

# Events sent to Analysisd
evt_count='14471962'

# Control messages received
ctrl_msg_count='67523'

# Discarded messages
discarded_count='0'

# Messages sent
msg_sent='1608167'

# Total number of bytes received
recv_bytes='4879009212'

# Messages dequeued after the agent closes the connection
dequeued_after_close='21268'
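
The two snapshots can be turned into rates with a short calculation. The helpers below are a hypothetical sketch (counter_rate() and queue_usage_pct() are not part of ossec-remoted), and they assume the snapshots are exactly 5 seconds apart, as the file header states.

```c
#include <stdint.h>

/* Average rate per second of a monotonically increasing counter that was
 * sampled twice, `interval_s` seconds apart. */
double counter_rate(uint64_t earlier, uint64_t later, double interval_s) {
    return (double)(later - earlier) / interval_s;
}

/* Share of the queue in use, as a percentage. */
double queue_usage_pct(uint64_t queue_size, uint64_t total_queue_size) {
    return 100.0 * (double)queue_size / (double)total_queue_size;
}
```

For evt_count, counter_rate(14385664, 14471962, 5.0) gives 86298 / 5 = 17259.6, which matches the reported events per second. The same calculation on dequeued_after_close gives 513 / 5 = 102.6 messages per second, and the queue usage drops from about 30.3% to about 19.5% between the two snapshots.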

@vikman90 vikman90 merged commit 202b57b into 3.9 Jul 17, 2019
@vikman90 vikman90 deleted the 3.9-fix-remoted-3574 branch July 17, 2019 14:38
@vikman90 vikman90 added this to To do in Review 3.9.4 via automation Jul 24, 2019
@vikman90 vikman90 requested a review from bah07 July 24, 2019 14:49
@chemamartinez chemamartinez moved this from To do to In progress in Review 3.9.4 Jul 26, 2019
@chemamartinez chemamartinez moved this from In progress to Done in Review 3.9.4 Jul 26, 2019
@vikman90 vikman90 added type/bug Something isn't working module/remote labels Jul 29, 2019