
Overload protection #9

Open
seriyps opened this issue Feb 3, 2022 · 5 comments
Labels
⏰ Discussion, 🚀 Feature Request, 💡 Improvement

Comments


seriyps commented Feb 3, 2022

logstasher version

1.0.0

OS version

Any

Description

  • Motivation
    When a running system is under heavy load and logs a lot, there is a risk that the singleton logstasher process won't be able to keep up with all the incoming log requests. In this scenario the gen_server:call in

-spec send(Data :: binary()) -> ok | {error, atom()}.
send(Data) when is_binary(Data) ->
    case whereis(?MODULE) of
        undefined -> {error, not_started};
        Pid -> gen_server:call(Pid, {send, Data})
    end.

would time out, which would cause the OTP logger to disable this handler.

Also, for some latency-critical systems it might be unacceptable to hang for up to 5 s on synchronous log calls.

This might be less of a problem for a UDP connection, but since logstasher supports TCP, a send may block for quite some time.

  • Proposal
    Handlers in the standard OTP logger provide flexible overload-protection mechanisms.
    Unfortunately, these are not yet available as a library, so, AFAIK, other logger handler libraries tend to implement their own overload protection:

https://github.com/seriyps/logger_journald/blob/34b5d9cc0c35b06abe05fe2b5b7d826bf69fd8bc/src/logger_journald_h.erl#L120

https://github.com/hauleth/erlang-systemd/blob/8637dadf493d1a520d56cdd74db9813c5b86d669/src/systemd_journal_h.erl#L383

So I think it might be a good idea to add some sort of overload protection to logstasher too.

Current behavior

With a TCP connection and a high volume of logs, calls to logger:log may take up to 5 s, and the logger handler may crash with a timeout and be automatically disabled.

Expected behavior

Some sort of configurable overload-protection mechanism, ideally with configuration compatible with the standard OTP logger overload-protection settings: https://www.erlang.org/doc/apps/kernel/logger_chapter.html#overload_protection
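For illustration, a compatible configuration might look like the sketch below. sync_mode_qlen, drop_mode_qlen and flush_qlen are the key names (and default values) used by the built-in logger_std_h; the handler module name logstasher_h is assumed here, and logstasher does not support these keys today:

%% Hypothetical configuration; logstasher does not implement these keys yet.
logger:add_handler(logstash, logstasher_h, #{
    config => #{
        sync_mode_qlen => 10,   % above this queue length, log events are sent synchronously
        drop_mode_qlen => 200,  % above this, new log events are dropped
        flush_qlen     => 1000  % above this, the whole message queue is flushed
    }
}).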

Config

none


mworrell commented Feb 3, 2022

I was thinking that, as a simple overload protection, we could use the ringbuffer that we also use in Zotonic to protect the access loggers.

The ringbuffer has a limited size and on read returns the number of skipped entries.

By adding this buffer, with a configurable size, we immediately gain the basic protection we need and can log if we are dropping log messages.
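A minimal sketch of what the write side could look like, assuming the zotonic/ringbuffer API (ringbuffer:new/2 to start a named buffer, ringbuffer:write/2 to append); the buffer name and size are illustrative:

-define(BUFFER, logstasher_buffer).

%% Started once, e.g. from the supervisor: a buffer of at most 1000 entries.
start_buffer() ->
    {ok, _Pid} = ringbuffer:new(?BUFFER, 1000),
    ok.

%% The caller never blocks; when the buffer is full, the oldest entries
%% are skipped and counted, to be reported on the read side.
send(Data) when is_binary(Data) ->
    ringbuffer:write(?BUFFER, Data).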

vkatsuba added the ⏰ Discussion, 🚀 Feature Request and 💡 Improvement labels Feb 3, 2022

mmzeeman commented Feb 4, 2022

That ring buffer in Zotonic prevents the caller from blocking too: it just has to add the message to an ETS table. The logger can then periodically empty the messages from the table inside its own process. For the access log, writing multiple access-log entries in a batch was also more efficient.
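A sketch of that read side, assuming ringbuffer:read/1 returns {ok, {Skipped, Payload}} or {error, empty} as described above; the 1 s flush interval and the send_batch/2 helper are illustrative:

%% Runs in the handler's own process; drains the buffer periodically.
handle_info(flush, State) ->
    {Dropped, Entries} = drain(?BUFFER, 0, []),
    ok = send_batch(lists:reverse(Entries), State),  % one batched TCP write
    Dropped > 0 andalso
        io:format("logstasher dropped ~p log events~n", [Dropped]),
    erlang:send_after(1000, self(), flush),
    {noreply, State}.

drain(Name, Dropped, Acc) ->
    case ringbuffer:read(Name) of
        {ok, {Skipped, Payload}} -> drain(Name, Dropped + Skipped, [Payload | Acc]);
        {error, empty} -> {Dropped, Acc}
    end.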

Another (somewhat uglier) solution is to have the caller check the size of the gen_server's mailbox before making the call. If it is over some limit, it can return {error, overload}.
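A sketch of that check on top of the existing send/1, using erlang:process_info/2; the limit of 1000 is illustrative:

send(Data) when is_binary(Data) ->
    case whereis(?MODULE) of
        undefined ->
            {error, not_started};
        Pid ->
            case erlang:process_info(Pid, message_queue_len) of
                {message_queue_len, Len} when Len > 1000 ->
                    {error, overload};
                {message_queue_len, _} ->
                    gen_server:call(Pid, {send, Data});
                undefined ->
                    %% The process died between whereis/1 and here.
                    {error, not_started}
            end
    end.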


seriyps commented Feb 4, 2022

Checking the mailbox size is quite a common way of doing overload protection, but AFAIK it also has some concurrency-related limitations (since process_info is implemented as a series of asynchronous signals requiring process synchronization):
https://www.erlang.org/doc/reference_manual/processes.html#signals
https://www.erlang.org/blog/parallel-signal-sending-optimization/


mworrell commented Feb 4, 2022

I think we can add an option to the ringbuffer for synchronous reading (wait until content arrives). Then we don't need to poll the buffer, and we can easily log how many messages were dropped.


mmzeeman commented Feb 5, 2022

FYI, the ringbuffer is here: https://github.com/zotonic/ringbuffer. Just made a PR (zotonic/zotonic#2867) to remove the old, more simplistic buffered-worker implementation.
