Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broker::Store tables with expiration triggering initialization errors (4.2.2 and 5.0.0) #2331

Open
awelzel opened this issue Aug 10, 2022 · 9 comments
Assignees

Comments

@awelzel
Copy link
Contributor

awelzel commented Aug 10, 2022

Hey,

while fuzzing around with Broker::Store, ran into the following issue observed also by @mdhawan on systems he has was working on (suspecting unrelated to any freezing, however).

When setting &write_expire broker backed table (sqlite) and setting Broker::schedule_policy to stealing the following errors are produced with Zeek 5.0 when (presumably) there are expired entries in the broker store:

$ zeek broker/store/brokerstore-attr-expire-restart.zeek && sleep 1 && zeek broker/store/brokerstore-attr-expire-restart.zeek
received termination signal
error: Failed to load value for key (59.154.216.67, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.222.150, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.223.174, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.225.102, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.225.185, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.228.94, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.231.200, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.233.13, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.237.136, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.240.68, description) while importing Broker store persistenceStore to table
fatal error: errors occurred while initializing

With Zeek 5.0.0, using the Broker::schedule_policy = "sharing" doesn't trigger the problem. With Zeek 4.2.2, the problem triggers with Broker::schedule_policy set either way. It didn't trigger with the dev build I have lying around, but didn't try very hard, but maybe something was fixed here?

This is the reproducer script:

# Run this twice with a short sleep in between to trigger an issue.
redef exit_only_after_terminate=T;

# Needed for Zeek 5.0.0
redef Broker::scheduler_policy = "stealing";

module Test;

export {
        type Idx: record {
                host: addr;
                description: string;
        };

        type Val: record {
                uid: string;
                timestamp: time;
        };

        option expire_persistence_ival = 2sec;
}

global store : opaque of Broker::Store;
global store_table: table[Idx] of Val &write_expire=expire_persistence_ival &broker_allow_complex_type &broker_store="persistenceStore";

global a_count = 1000000000;

event loop()
        {
        local host = count_to_v4_addr(++a_count + rand(10000));
        local uid = unique_id("X");
        local idx = Idx($host = host, $description="description");
        local val = Val($uid=uid, $timestamp=current_time());

        store_table[idx] = val;

        schedule 0.1 sec { loop() };
        }

event stop()
        {
        terminate();
        }

event go()
        {
        schedule expire_persistence_ival + 0.5sec  { stop() };
        event loop();
        }

event zeek_init()
        {
        store = Broker::create_master("persistenceStore", Broker::SQLITE, [$sqlite=[$path=cat("mystore.sqlite")]]);
        schedule 0.1sec { go() };
        }
@awelzel
Copy link
Contributor Author

awelzel commented Aug 10, 2022

It didn't trigger with the dev build I have lying around, but didn't try very hard, but maybe something was fixed here?

It does trigger with the latest master, too. Needs redef Broker::scheduler_policy = "stealing"; just as Zeek 5.0.0, however.

$ zeek --version && zeek broker/store/brokerstore-attr-expire-restart.zeek && sleep 1 && zeek broker/store/brokerstore-attr-expire-restart.zeek
zeek version 5.1.0-dev.351
received termination signal
error: Failed to load value for key (59.154.216.47, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.216.176, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.219.219, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.220.84, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.221.30, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.222.72, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.224.39, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.224.244, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.230.249, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.154.237.11, description) while importing Broker store persistenceStore to table
fatal error: errors occurred while initializing

@Neverlord
Copy link
Member

error: Failed to load value for key (59.154.216.47, description) while importing Broker store persistenceStore to table

I've stepped through the sources. This error gets printed if broker::store::get fails. And it fails after 10 seconds per default (default is declared in broker/timeout.hh).

10 seconds to load an SQLite database should be plenty. Even with a thousand entries. So it seems like some kind of performance bug or maybe just another symptom of #2332.

@timwoj timwoj added this to the 5.1.0 milestone Aug 30, 2022
@timwoj
Copy link
Member

timwoj commented Nov 2, 2022

I'm not seeing this anymore on master with script above so it's possible that the fixes for #2332 fixed it:

tim@coregeek2 build% zeek ~/Desktop/2331.zeek && sleep 1 && zeek ~/Desktop/2331.zeek
received termination signal
received termination signal

@awelzel can you see if you can reproduce it still?

@awelzel
Copy link
Contributor Author

awelzel commented Nov 3, 2022

@awelzel can you see if you can reproduce it still?

Doesn't reproduce here either. We can close it, though I don't quite get how #2332 fixed it :-)

@awelzel awelzel closed this as completed Nov 3, 2022
@awelzel
Copy link
Contributor Author

awelzel commented Nov 3, 2022

Ah, nope, still reproduces. I think the timings involved just got much better with the #2332, but the actual problem is still there:

Changing this in event loop():

-        schedule 0.1 sec { loop() };
+        schedule 0.0001 sec { loop() };

And then the following:

$ rm -f mystore.sqlite; zeek 2331.zeek; echo "done1"; sleep 1; zeek 2331.zeek; echo "done2";

The second zeek process spills lots of errors and exits with an initialization error:

$ rm -f mystore.sqlite; zeek 2331.zeek; echo "done1"; sleep 1; zeek 2331.zeek; echo "done2";
received termination signal
done1
error: Failed to load value for key (59.154.235.54, description) while importing Broker store persistenceStore to table
[...many...]
error: Failed to load value for key (59.155.47.0, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.155.47.6, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.155.47.82, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.155.47.218, description) while importing Broker store persistenceStore to table
error: Failed to load value for key (59.155.47.226, description) while importing Broker store persistenceStore to table
fatal error: errors occurred while initializing
done2

@awelzel awelzel reopened this Nov 3, 2022
@Neverlord
Copy link
Member

@awelzel thanks for the confirmation. Isn't 0.0001 sec causing Zeek to basically run in a hot loop? Do you see memory piling up during the run? I would expect that writing to a Broker store every 100 microseconds pushes more into Broker than it can handle.

@awelzel
Copy link
Contributor Author

awelzel commented Nov 3, 2022

Isn't 0.0001 sec causing Zeek to basically run in a hot loop? Do you see memory piling up during the run? I would expect that writing to a Broker store every 100 microseconds pushes more into Broker than it can handle.

Yes, it's a busy loop. I'd guess we could use a lower frequency, but run processes for much longer (200 seconds instead of 2, for example). Think the important part is to have enough elements in the table and have some expire during the loading process (this is a bit guesses and hand wavy).

Memory goes up, yes. But not something that's concerning IMO. I'd also view the two invocations separately really:

  • A process is populating the sqlite database (that is just faked with the busy loop, it could also be a long-running Zeek process writing entries once in a while)
  • A second process (e.g. Zeek being restarted) loading the pre-populated database failing during initialization (the second zeek invocation of the reproducer never enters into the busy loop)

@Neverlord
Copy link
Member

I've reproduced this locally with this small patch to Zeek to see what Broker is complaining about:

diff --git a/src/broker/Manager.cc b/src/broker/Manager.cc
index 37e7f9d4c..4b1376470 100644
--- a/src/broker/Manager.cc
+++ b/src/broker/Manager.cc
@@ -1936,9 +1936,10 @@ void Manager::BrokerStoreToZeekTable(const std::string& name, const detail::Stor
                auto value = handle->store.get(key);
                if ( ! value )
                        {
+                       auto err_str = broker::to_string(value.error());
                        reporter->Error(
-                               "Failed to load value for key %s while importing Broker store %s to table",
-                               to_string(key).c_str(), name.c_str());
+                               "Failed to load value for key %s while importing Broker store %s to table: %s",
+                               to_string(key).c_str(), name.c_str(), err_str.c_str());
                        table->EnableChangeNotifications();
                        continue;
                        }

With that, I get a bunch of error: Failed to load value for key ... while importing Broker store persistenceStore to table: no_such_key.

So it seems like Broker is loading the database, then tells Zeek what keys are initially in there. Then it applies the expiration times (2s) and basically all values time out immediately. This makes sense to me, since the items were created more than 2s in the past (using the absolute time when the first process created them). Then Zeek comes around asking for the values of all the keys and Broker no longer can find them, because they've expired.

When ignoring no_such_key via this patch:

diff --git a/src/broker/Manager.cc b/src/broker/Manager.cc
index 37e7f9d4c..3fb58cf48 100644
--- a/src/broker/Manager.cc
+++ b/src/broker/Manager.cc
@@ -1936,10 +1936,16 @@ void Manager::BrokerStoreToZeekTable(const std::string& name, const detail::Stor
                auto value = handle->store.get(key);
                if ( ! value )
                        {
-                       reporter->Error(
-                               "Failed to load value for key %s while importing Broker store %s to table",
-                               to_string(key).c_str(), name.c_str());
-                       table->EnableChangeNotifications();
+                       auto& err = value.error();
+                       if ( err != broker::ec::no_such_key )
+                               {
+                               auto err_str = broker::to_string(err);
+                               reporter->Error(
+                                       "Failed to load value for key %s while importing Broker store %s to table: %s",
+                                       to_string(key).c_str(), name.c_str(), err_str.c_str());
+                               table->EnableChangeNotifications();
+                               }
+                       // else: ignore; value has expired since retrieving the key
                        continue;
                        }

I don't get any errors from the second run.

However, I've also played a bit with expire_persistence_ival. This parameter ultimately tells Broker the expiration time for each time, right? When increasing this to 10s, then the first Zeek process needs around a minute (rather than 10s) and the second process actually runs into a request_timeout.

Before I start debugging under some wrong assumptions: what would be the expected system behavior for Zeek here? I would expect that the expiration time should have no effect on the execution time of the first process. Is something in Zeek is actively waiting for the expiration times? My assumption would be Zeek just shuts down and Broker then uses the SQLite database to initialize the table again with expiration time restored to the original creation time + the relative duration.

@timwoj timwoj modified the milestones: 5.1.0, 5.2.0 Nov 8, 2022
@awelzel
Copy link
Contributor Author

awelzel commented Nov 14, 2022

However, I've also played a bit with expire_persistence_ival. This parameter ultimately tells Broker the expiration time for each time, right? When increasing this to 10s, then the first Zeek process needs around a minute (rather than 10s) and the second process actually runs into a request_timeout.

In the reproducer, increasing the expiration time also increases the runtime of the process and the number of items inserted. Sorry, I realize that might be unexpected, but I was just fuzzing around seeing if I could trigger any errors. The following line is responsible for your observation:

        schedule expire_persistence_ival + 0.5sec  { stop() };

With the same number of elements inserted, yes, the expiration time shouldn't have a big effect on the runtime. But how the reproducer is structured, it'll insert ~5x as many elements. Coupled with the slow sqlite implementation that probably makes it very visible.

I'm not sure a request_timeout for the second process should happen - if Zeek managed to write out a broker store, it should be able to read it back, too, even if it takes very long. I'm sure there are optimizations around batching broker store / sqlite operations to fix the "very long" part.

Is something in Zeek is actively waiting for the expiration times? My assumption would be Zeek just shuts down and Broker then uses the SQLite database to initialize the table again with expiration time restored to the original creation time + the relative duration.

That assumption makes sense to me.

@timwoj timwoj removed this from the 5.2.0 milestone Apr 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants