Skip to content

Commit d6c8ed0

Browse files
committed
Bug#36423593 Purging ndb_binlog_index affects ndb binlog [part 2]
Problem: This patch is for the second part of the problem where PURGE BINARY LOGS holds the LOCK_index mutex for long duration and thus affects SHOW BINARY LOGS as well as blocking purge of both relay and binlog. Analysis: The ha_binlog_index_purge_file() callback is called with the LOCK_index mutex held. When the callback implementation need to do work that takes some time it will thus block above mentioned commands. While the prior patch managed to remove lots of work from the callback, the problem still exists for PURGE BINARY LOGS which should not return until work is complete. Solution: To allow PURGE BINARY LOGS to wait after having released LOCK_index a new ha_binlog_index_purge_wait() callback is added. The new callback is only called for PURGE BINARY LOGS and thus making it possible to preserve the existing synchronous behaviour of the command. Ideally the ha_binlog_index_purge_file() callback could have been moved to be called outside of LOCK_index, but the context regarding which file(s) to purge would then not be available in all code paths. Adding a new callback for waiting is a much smaller change and affects only the two code paths for PURGE BINARY LOGS TO and PURGE BINARY LOGS BEFORE. The new callback has no effect for storage engines not installing the handlerton::binlog_func callback. In addition the mystery code and comment stating that the ha_binlog_index_purge_file() callback can only be called when LOCK_index is not already held is removed. The callback should always be called and then it's up to the storage engine implementor to decide what to do. Change-Id: I4d19bd9b0a4bfa9820d11ff6d23a4bc4b0ea7e06 (cherry picked from commit 11f4459a1668fc2c0f2b59f29f4c4701785c610b)
1 parent 9b9311d commit d6c8ed0

File tree

6 files changed

+101
-39
lines changed

6 files changed

+101
-39
lines changed

sql/binlog.cc

+5-11
Original file line numberDiff line numberDiff line change
@@ -3356,6 +3356,7 @@ bool purge_source_logs_to_file(THD *thd, const char *to_log) {
33563356
auto purge_error = mysql_bin_log.purge_logs(
33573357
search_file_name, include_to_log, need_index_lock, need_update_threads,
33583358
nullptr, auto_purge);
3359+
ha_binlog_index_purge_wait(thd);
33593360
return purge_error_message(thd, purge_error);
33603361
}
33613362

@@ -3391,6 +3392,7 @@ bool purge_source_logs_before_date(THD *thd, time_t purge_time) {
33913392
constexpr auto auto_purge{false};
33923393
auto purge_error =
33933394
mysql_bin_log.purge_logs_before_date(purge_time, auto_purge);
3395+
ha_binlog_index_purge_wait(thd);
33943396
return purge_error_message(thd, purge_error);
33953397
}
33963398

@@ -6370,17 +6372,9 @@ int MYSQL_BIN_LOG::purge_index_entry(THD *thd, ulonglong *decrease_log_space,
63706372
}
63716373

63726374
error = 0;
6373-
if (!need_lock_index) {
6374-
/*
6375-
This is to avoid triggering an error in NDB.
6376-
6377-
@todo: This is weird, what does NDB errors have to do with
6378-
need_lock_index? Explain better or refactor /Sven
6379-
*/
6380-
if (!is_relay_log) {
6381-
// Only called when purging binlog
6382-
ha_binlog_index_purge_file(current_thd, log_info.log_file_name);
6383-
}
6375+
if (!is_relay_log) {
6376+
// Only called when purging binlog
6377+
ha_binlog_index_purge_file(thd, log_info.log_file_name);
63846378
}
63856379

63866380
DBUG_PRINT("info", ("purging %s", log_info.log_file_name));

sql/handler.cc

+6
Original file line numberDiff line numberDiff line change
@@ -5910,6 +5910,12 @@ int ha_binlog_index_purge_file(THD *thd, const char *file) {
59105910
return 0;
59115911
}
59125912

5913+
void ha_binlog_index_purge_wait(THD *thd) {
5914+
assert(thd);
5915+
binlog_func_st bfn = {BFN_BINLOG_PURGE_WAIT, nullptr};
5916+
binlog_func_foreach(thd, &bfn);
5917+
}
5918+
59135919
struct binlog_log_query_st {
59145920
enum_binlog_command binlog_command;
59155921
const char *query;

sql/handler.h

+37-1
Original file line numberDiff line numberDiff line change
@@ -698,7 +698,8 @@ enum enum_binlog_func {
698698
BFN_RESET_SLAVE = 2,
699699
BFN_BINLOG_WAIT = 3,
700700
BFN_BINLOG_END = 4,
701-
BFN_BINLOG_PURGE_FILE = 5
701+
BFN_BINLOG_PURGE_FILE = 5,
702+
BFN_BINLOG_PURGE_WAIT = 6
702703
};
703704

704705
enum enum_binlog_command {
@@ -7582,7 +7583,42 @@ void trans_register_ha(THD *thd, bool all, handlerton *ht,
75827583
const ulonglong *trxid);
75837584

75847585
int ha_reset_logs(THD *thd);
7586+
7587+
/**
7588+
Inform storage engine(s) that a binary log file will be purged and any
7589+
references to it should be removed.
7590+
7591+
The function is called for all purged files, regardless if it is an explicit
7592+
PURGE BINARY LOGS statement, or an automatic purge performed by the server.
7593+
7594+
@note Since function is called with the LOCK_index mutex held the work
7595+
performed in this callback should be kept at minimum. One way to defer work is
7596+
to schedule work and use the `ha_binlog_index_purge_wait` callback to wait for
7597+
completion.
7598+
7599+
@param thd Thread handle of session purging file. The nullptr value indicates
7600+
that purge is done at server startup.
7601+
@param file Name of file being purged.
7602+
@return Always 0, return value are ignored by caller.
7603+
*/
75857604
int ha_binlog_index_purge_file(THD *thd, const char *file);
7605+
7606+
/**
7607+
Request the storage engine to complete any operations that were initiated
7608+
by `ha_binlog_index_purge_file` and which need to complete
7609+
before PURGE BINARY LOGS completes.
7610+
7611+
The function is called only from PURGE BINARY LOGS. Each PURGE BINARY LOGS
7612+
statement will result in 0, 1 or more calls to `ha_binlog_index_purge_file`,
7613+
followed by exactly 1 call to `ha_binlog_index_purge_wait`.
7614+
7615+
@note This function is called without LOCK_index mutex held and thus any
7616+
waiting performed will only affect the current session.
7617+
7618+
@param thd Thread handle of session.
7619+
*/
7620+
void ha_binlog_index_purge_wait(THD *thd);
7621+
75867622
void ha_reset_slave(THD *thd);
75877623
void ha_binlog_log_query(THD *thd, handlerton *db_type,
75887624
enum_binlog_command binlog_command, const char *query,

storage/ndb/plugin/ha_ndbcluster_binlog.cc

+26-10
Original file line numberDiff line numberDiff line change
@@ -320,6 +320,14 @@ static Ndb_binlog_purger ndb_binlog_purger(&opt_bin_log,
320320
binlog on the local server, if the auto purge conditions are exceeded at
321321
that time the applier will trigger a purge of the binlog.
322322

323+
@note This callback is normally called with the `LOCK_index` mutex held, this
324+
means that time consuming work should not be performed as the mutex is
325+
needed for other functionality and are assumed to not be held for long.
326+
Instead, the work to remove references to the purged file are started
327+
asynchronously in the background and waited for completion from
328+
ndbcluster_binlog_index_purge_wait() which is then called after releasing
329+
the mutex.
330+
323331
@param thd Thread handle
324332
@param filename Name of the binlog file which has been removed
325333
*/
@@ -332,7 +340,7 @@ static void ndbcluster_binlog_index_purge_file(THD *thd, const char *filename) {
332340
// No thd indicates auto purge at server startup, just submit the purged
333341
// file to purger who will remove the rows when server has started properly.
334342
ndb_log_info("Purging binlog file at server startup: '%s'", filename);
335-
ndb_binlog_purger.submit_purge_binlog_file(filename);
343+
ndb_binlog_purger.submit_purge_binlog_file(nullptr, filename);
336344
return;
337345
}
338346

@@ -342,17 +350,22 @@ static void ndbcluster_binlog_index_purge_file(THD *thd, const char *filename) {
342350
}
343351

344352
ndb_log_info("Purging binlog file: '%s'", filename);
345-
ndb_binlog_purger.submit_purge_binlog_file(filename);
353+
ndb_binlog_purger.submit_purge_binlog_file(thd, filename);
354+
}
355+
356+
static void ndbcluster_binlog_index_purge_wait(THD *thd) {
357+
DBUG_TRACE;
358+
assert(thd);
359+
// Only PURGE BINARY LOGS TO and PURGE BINARY LOGS BEFORE calls wait
360+
assert(thd_sql_command(thd) == SQLCOM_PURGE ||
361+
thd_sql_command(thd) == SQLCOM_PURGE_BEFORE);
346362

347-
if (thd_sql_command(thd) == SQLCOM_PURGE ||
348-
thd_sql_command(thd) == SQLCOM_PURGE_BEFORE) {
349-
// When the ndb binlog thread triggers a purge it should never wait
350-
assert(!ndb_thd_is_binlog_thread(thd));
363+
// When the ndb binlog thread triggers a purge it should never wait
364+
assert(!ndb_thd_is_binlog_thread(thd));
351365

352-
// Wait until purger has removed all rows for the file
353-
ndb_binlog_purger.wait_purge_binlog_file(filename);
354-
}
355-
return;
366+
// Wait until purger has removed all files requested by this session
367+
ndb_log_info("Waiting for purge to complete");
368+
ndb_binlog_purger.wait_purge_completed_for_session(thd);
356369
}
357370

358371
/*
@@ -613,6 +626,9 @@ static int ndbcluster_binlog_func(handlerton *, THD *thd, enum_binlog_func fn,
613626
case BFN_BINLOG_PURGE_FILE:
614627
ndbcluster_binlog_index_purge_file(thd, (const char *)arg);
615628
break;
629+
case BFN_BINLOG_PURGE_WAIT:
630+
ndbcluster_binlog_index_purge_wait(thd);
631+
break;
616632
}
617633
return 0;
618634
}

storage/ndb/plugin/ndb_binlog_purger.cc

+14-12
Original file line numberDiff line numberDiff line change
@@ -66,7 +66,8 @@ Ndb_binlog_purger::Ndb_binlog_purger(bool *binlogging_on, ulong *log_purge_rate)
6666

6767
Ndb_binlog_purger::~Ndb_binlog_purger() {}
6868

69-
void Ndb_binlog_purger::submit_purge_binlog_file(const std::string &filename) {
69+
void Ndb_binlog_purger::submit_purge_binlog_file(void *session,
70+
const std::string &filename) {
7071
if (is_stop_requested()) {
7172
// Does not happen, but better not accept new work
7273
log_error("Binlog file '%s' submitted while stopping", filename.c_str());
@@ -76,26 +77,27 @@ void Ndb_binlog_purger::submit_purge_binlog_file(const std::string &filename) {
7677
std::unique_lock files_lock(m_purge_files_lock);
7778

7879
// Don't allow adding existing filename
79-
if (std::ranges::any_of(m_purge_files,
80-
[filename](const auto f) { return f == filename; })) {
80+
if (std::ranges::any_of(m_purge_files, [filename](const auto &request) {
81+
return request.filename == filename;
82+
})) {
8183
log_info("Binlog file '%s' was already submitted for purge, skipping",
8284
filename.c_str());
8385
return;
8486
}
8587

86-
m_purge_files.push_back(std::move(filename));
88+
m_purge_files.push_back({std::move(filename), session});
8789
purge_queue_size++;
8890
assert(purge_queue_size == static_cast<long long>(m_purge_files.size()));
8991
m_purge_file_added_cond.notify_one();
9092
}
9193

92-
void Ndb_binlog_purger::wait_purge_binlog_file(const std::string &filename) {
94+
void Ndb_binlog_purger::wait_purge_completed_for_session(void *session) {
9395
std::unique_lock files_lock(m_purge_files_lock);
9496

95-
m_purge_files_finished_cond.wait(files_lock, [this, filename]() {
97+
m_purge_files_finished_cond.wait(files_lock, [this, session]() {
9698
return is_stop_requested() ||
97-
std::ranges::none_of(m_purge_files, [filename](const auto f) {
98-
return f == filename;
99+
std::ranges::none_of(m_purge_files, [session](const auto &request) {
100+
return request.session == session;
99101
});
100102
});
101103
}
@@ -137,7 +139,7 @@ void Ndb_binlog_purger::find_and_delete_orphan_purged_rows() {
137139
for (const auto &file : ndb_binlog_index_files_not_existing) {
138140
log_info("Found row(s) for '%s' which has been purged, removing it",
139141
file.c_str());
140-
submit_purge_binlog_file(file);
142+
submit_purge_binlog_file(nullptr, file);
141143
}
142144
}
143145

@@ -229,7 +231,7 @@ void Ndb_binlog_purger::process_purge_first_file_completed(
229231
}
230232
#endif
231233
std::unique_lock files_lock(m_purge_files_lock);
232-
assert(m_purge_files[0] == filename);
234+
assert(m_purge_files[0].filename == filename);
233235
m_purge_files.erase(m_purge_files.begin());
234236
purged_files_count++;
235237
purge_queue_size--;
@@ -242,7 +244,7 @@ bool Ndb_binlog_purger::process_purge_first_file(Ndb_local_connection &con) {
242244
if (m_purge_files.empty()) return false;
243245

244246
// Start processing file to remove.
245-
const auto filename = m_purge_files[0];
247+
const auto filename = m_purge_files[0].filename;
246248
log_info("Start purging binlog file: '%s'", filename.c_str());
247249

248250
files_lock.unlock();
@@ -315,7 +317,7 @@ void Ndb_binlog_purger::process_purge_files_list() {
315317
log_error(
316318
"Got too many errors when removing rows for '%s' from "
317319
"ndb_binlog_index, skipping...",
318-
m_purge_files[0].c_str());
320+
m_purge_files[0].filename.c_str());
319321
m_purge_files.erase(m_purge_files.begin());
320322
purge_queue_size--;
321323
assert(purge_queue_size == static_cast<long long>(m_purge_files.size()));

storage/ndb/plugin/ndb_binlog_purger.h

+13-5
Original file line numberDiff line numberDiff line change
@@ -46,8 +46,14 @@ class Ndb_binlog_purger : public Ndb_component {
4646
THD *m_thd{nullptr};
4747
void create_thd(void *stackptr);
4848

49+
// Name of one file to purge and the session which requested it
50+
struct PurgeRequest {
51+
std::string filename;
52+
void *session;
53+
};
54+
4955
// List of purged files whose rows need to be removed
50-
std::vector<std::string> m_purge_files;
56+
std::vector<PurgeRequest> m_purge_files;
5157
// Mutex protecting the list of files to purge
5258
std::mutex m_purge_files_lock;
5359
// Condition used by purger to wait until there are new files to
@@ -91,16 +97,18 @@ class Ndb_binlog_purger : public Ndb_component {
9197
@brief Submit the name of a purged binlog file for aynchrounous removal
9298
of corresponding rows from the ndb_binlog_index table.
9399
100+
@param session Identifies the session requesting purge. Used for being able
101+
to wait for purge to complete, use nullptr when there is no need to wait.
94102
@param filename Name of the binlog file which has been purged
95103
*/
96-
void submit_purge_binlog_file(const std::string &filename);
104+
void submit_purge_binlog_file(void *session, const std::string &filename);
97105

98106
/*
99-
@brief Wait until removal of rows for the given filename has completed.
107+
@brief Wait until removal of files for the given session has completed.
100108
101-
@param filename Name of the binlog file to wait for removal of.
109+
@param session Identifies the session requesting purge..
102110
*/
103-
void wait_purge_binlog_file(const std::string &filename);
111+
void wait_purge_completed_for_session(void *session);
104112

105113
private:
106114
virtual int do_init() override;

0 commit comments

Comments
 (0)