WL#8069: Ensure that the first LCP executes well after a long idle period
BUG#30276755: sync_lsn can wait forever
Reviewed-by: Mauritz Sundell <mauritz.sundell@oracle.com>
1) Add 3 ndbinfo tables:
pgman_time_track_stats
This table tracks the latency of get_page calls and the read and write
latencies of PGMAN disk accesses.
diskstat
This table reports disk statistics for accesses to PGMAN pages. It reports
statistics for the last second.
diskstats_1sec
Same as diskstat, but reports statistics for the last 20 seconds, one row per second.
2) Add new configuration variable MaxDiskDataLatency.
Track the mean latency of file operations and make it possible to set the
maximum acceptable disk data latency through the new config variable
MaxDiskDataLatency.
If the mean latency exceeds this value we abort 1 out of every 5 disk
access requests (20%); if it exceeds 2 times this value we abort 2 out of
5, and so forth: at X times this value we abort X out of every 5 requests.
If the mean latency exceeds 5 times this value, all disk accesses are
aborted. By default this configuration is 0, which means that we do not
check for any maximum disk data latency. A sketch of this throttling rule
follows after this item.
Add new configuration variable DiskDataUsingSameDisk.
By default this is set to true, which means that we try to balance the load
from writing disk data checkpoints against the load created by writing
in-memory checkpoints. If this configuration is set to 0 we calculate the
disk data checkpoint speed independently of the writes for in-memory
checkpoints.
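A minimal sketch of how such a latency-based throttle could work; the struct
and member names are illustrative only, not the actual PGMAN code, and it
assumes the X-out-of-5 reading described above:

  #include <cstdint>

  // Illustrative throttle: abort a growing fraction of disk data requests
  // as the measured mean latency exceeds multiples of MaxDiskDataLatency.
  struct DiskLatencyThrottle {
    uint64_t max_latency_us;   // MaxDiskDataLatency (0 = disabled)
    uint64_t mean_latency_us;  // tracked mean latency of file operations
    uint64_t request_counter;  // used to spread the aborts evenly

    // Returns true if this disk access request should be aborted with the
    // new overload error code 1518.
    bool should_abort() {
      if (max_latency_us == 0) return false;   // feature disabled
      uint64_t factor = mean_latency_us / max_latency_us;
      if (factor == 0) return false;           // below the configured limit
      if (factor >= 5) return true;            // 5x the limit: abort all
      // At X times the limit, abort X out of every 5 requests.
      return (request_counter++ % 5) < factor;
    }
  };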
3) Fixed a bug in sync_lsn that caused log_waits to wait
forever in some situations (BUG#30276755).
4) Report m_max_sync_req_lsn in DUMP command
5) Report the duration of NDBFS requests in microseconds rather than
milliseconds in the DUMP command
6) Improved the latency of get_page requests in a number of different ways,
particularly when the WAL rule had to be applied
7) Started preparations to handle adaptive checkpoint speed also
in PGMAN (a sketch of the tracked counters follows after this item):
- Keep track of the number of dirty pages to assess the currently visible
need for checkpointing
- Keep track of the number of pageouts in the last LCP to give an estimate
of the normal checkpoint speed
- Provide information on how much time disk data uses compared to
in-memory pages
- Provide information on how long the last LCP took
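A sketch of the kind of per-PGMAN-instance bookkeeping this implies; the
struct and member names are illustrative placeholders, not the actual block
variables:

  #include <cstdint>

  // Illustrative per-instance statistics used to drive adaptive
  // checkpoint speed decisions in PGMAN.
  struct CheckpointStats {
    uint64_t dirty_pages;           // current number of dirty disk data pages
    uint64_t pageouts_last_lcp;     // pageouts performed during the last LCP
    uint64_t last_lcp_duration_ms;  // how long the last LCP took
    uint64_t disk_data_time_us;     // time spent on disk data pages
    uint64_t in_memory_time_us;     // time spent on in-memory pages

    // Rough estimate of the normal checkpoint speed, derived from the
    // previous LCP (pageouts per second).
    double estimated_pageout_rate() const {
      if (last_lcp_duration_ms == 0) return 0.0;
      return double(pageouts_last_lcp) * 1000.0 / double(last_lcp_duration_ms);
    }
  };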
8) First step towards making it possible to perform LCP writes even before
being requested to do so. The idea is the following.
The next fragment to perform an LCP is the next in table id and fragment id
order. Thus we can find the next fragment that will execute an LCP. By starting
to write the pages of this fragment early, we are likely to make it much faster
to run the LCP once the request to perform the LCP arrives in PGMAN.
We will write at most four fragments ahead of the current fragment being
checkpointed. This strikes a balance between staying far enough ahead to get a
smooth checkpoint speed and avoiding writing too much that is later written
again when the current checkpoint fragment is written. See the sketch after
this item.
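A minimal sketch of the look-ahead rule, assuming a hypothetical list of
fragments kept in (table id, fragment id) order; the type and function names
are illustrative only:

  #include <cstdint>
  #include <vector>

  // Illustrative fragment identity, ordered by (table id, fragment id),
  // which is the order in which fragments are checkpointed.
  struct FragmentRecord {
    uint32_t table_id;
    uint32_t fragment_id;
  };

  constexpr size_t MAX_PREPARE_AHEAD = 4;  // write at most 4 fragments ahead

  // Given the index of the fragment currently being checkpointed, return
  // the fragments whose dirty pages may be written ahead of time.
  std::vector<FragmentRecord>
  fragments_to_prepare(const std::vector<FragmentRecord> &lcp_order,
                       size_t current_index) {
    std::vector<FragmentRecord> ahead;
    for (size_t i = current_index + 1;
         i < lcp_order.size() && ahead.size() < MAX_PREPARE_AHEAD; i++) {
      ahead.push_back(lcp_order[i]);  // candidate for early LCP writes
    }
    return ahead;
  }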
9) Needed to track number of outstanding prepare LCP writes
10) Made it possible to run checkpointing also in normal operation
during times when no disk data checkpoint has been requested.
Both the IO parallelism and the IO rate are calculated once every 100
milliseconds, in order to even out the IO load during checkpoints.
Modern NVMe drives can perform millions of IO operations per second, so it
is very easy to overwhelm the disk drives with checkpoint flushes, which
makes it hard to perform normal disk IO for user operations. So we keep
track of how fast we need to write disk data checkpoints and in-memory
checkpoints to ensure an even IO load. It is also important to find a good
balance between disk data checkpointing and in-memory checkpointing.
A sketch of the periodic rate calculation follows after this item.
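A simplified sketch of recalculating the IO budget every 100 milliseconds;
the inputs (bytes left, time left, measured disk capability) are assumptions
for illustration, not the actual PGMAN members:

  #include <algorithm>
  #include <cstdint>

  // Illustrative 100 ms IO budget calculation for disk data checkpoint writes.
  struct IoRateController {
    uint64_t bytes_left_in_lcp;    // disk data bytes still to checkpoint
    uint64_t ms_left_in_lcp;       // time budget left for the current LCP
    uint64_t max_bytes_per_100ms;  // cap derived from measured disk capability

    // Called once every 100 milliseconds; returns how many bytes of
    // checkpoint writes to issue in the coming interval.
    uint64_t bytes_to_write_next_100ms() const {
      if (ms_left_in_lcp == 0) return max_bytes_per_100ms;  // behind schedule
      // Spread the remaining work evenly over the remaining intervals.
      const uint64_t intervals_left =
          std::max<uint64_t>(1, ms_left_in_lcp / 100);
      const uint64_t even_share = bytes_left_in_lcp / intervals_left;
      return std::min(even_share, max_bytes_per_100ms);
    }
  };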
11) We incurred a lot of extra latency by putting newly dirtied pages first
in the dirty list. This meant that writes of the most recently dirtied
pages came first, which led to a lot of unnecessary invocations
of the WAL rule.
Fixed this by inserting pages last in the list. In addition, every time
a page is made dirty (even when it is already dirty), we move the
page to the end of its current dirty list to minimise
the risk of having to apply the WAL rule during LCP execution.
In addition we ensured that during the prepare LCP phase we never attempt
to write any pages that need the WAL rule applied or that are ready
to send a callback for a get_page request.
The prepare phase mostly has enough work to do anyway.
Finally we attempt to avoid writing pages in the SL_CALLBACK list
during LCP execution. We have to apply rules, however, to ensure
that we make progress: we do not apply this rule to the last
page in the dirty list, and we don't apply the rule more than 32
times, to avoid breaking any rules for real-time execution.
We also don't allow skipping two pages in a row that
are in the SL_CALLBACK list, since that would put us at risk of
looping on those two pages. We only move the page one step forward
when skipping it, so that skipping it is not given too high a priority.
A sketch of these skip rules follows after this item.
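A rough sketch of these skip rules for SL_CALLBACK pages; the list type and
field names are illustrative placeholders, not the actual PGMAN structures:

  #include <cstddef>
  #include <deque>

  // Illustrative page entry in a PGMAN dirty list.
  struct Page {
    bool in_sl_callback;  // page currently waits in the SL_CALLBACK list
  };

  constexpr unsigned MAX_SKIPS = 32;  // bound skipping for real-time behaviour

  // Write up to max_writes dirty pages in one pass. SL_CALLBACK pages may be
  // skipped, but never the last page in the list, never two in a row and
  // never more than MAX_SKIPS times per pass, so progress is guaranteed.
  void write_dirty_pages(std::deque<Page *> &dirty_list, unsigned max_writes,
                         void (*write_page)(Page *)) {
    unsigned skips = 0;
    bool previous_was_skipped = false;
    for (size_t i = 0; i < dirty_list.size() && max_writes > 0; i++) {
      Page *page = dirty_list[i];
      const bool is_last = (i + 1 == dirty_list.size());
      if (page->in_sl_callback && !is_last && !previous_was_skipped &&
          skips < MAX_SKIPS) {
        // The skipped page stays close to its position; in the real code it
        // is only moved one step in the list rather than pushed to the end.
        skips++;
        previous_was_skipped = true;
        continue;
      }
      write_page(page);
      previous_was_skipped = false;
      max_writes--;
    }
  }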
12) TSMAN had a major bottleneck in that it was protected by one single
mutex. We held this mutex for quite some time during inserts, and we also
held it for some time when preparing a pageout, after a completed pageout,
and in a few more places. This is a serious bottleneck for disk data.
The solution is to break up the protection in a few steps.
The first step is that the Tablespace_client only takes a lock on one
instance, its own instance. This means that Tablespace_clients will
not have problems with each other. When TSMAN needs to manipulate any
data structures used by any Tablespace_client, it can lock all
instances. This is a rare operation, only occurring when manipulating
a tablespace to add more files, adding a tablespace, or dropping a
tablespace.
Allocating and freeing extents requires a bit more protection, given
that we're working on a free list. Thus we use an allocate-extent lock
to protect these operations in addition to the instance lock.
Finally, to ensure that the instances don't bounce into each other, we
need to protect the extent pages. We keep a fixed number of mutexes per
tablespace data file to protect these extent page accesses from
each other. We use a very simple hash function to decide which mutex
to use for a specific extent page. A sketch of this mutex partitioning
follows after this item.
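A minimal sketch of hashing an extent page number onto a small fixed pool of
mutexes per data file; the pool size and names are assumptions for
illustration, not the values used by TSMAN:

  #include <cstdint>
  #include <mutex>

  // Illustrative per-data-file lock pool protecting extent page accesses.
  class ExtentPageLocks {
   public:
    static constexpr uint32_t NUM_MUTEXES = 32;  // fixed pool size (assumed)

    // Very simple hash: extent pages that map to the same slot share a
    // mutex, pages in different slots can be accessed concurrently.
    std::mutex &mutex_for(uint32_t extent_page_no) {
      return m_mutexes[extent_page_no % NUM_MUTEXES];
    }

   private:
    std::mutex m_mutexes[NUM_MUTEXES];
  };

  // Usage: lock only the mutex covering the extent page being updated.
  void update_extent_page(ExtentPageLocks &locks, uint32_t extent_page_no) {
    std::lock_guard<std::mutex> guard(locks.mutex_for(extent_page_no));
    // ... read or modify the extent page under its hash-selected mutex ...
  }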
13) The extra PGMAN worker discovers the start and end of LCPs differently.
We need code in PGMAN to handle this as well.
14) Ensure that SYNC_EXTENT_PAGES_REQ is sent with FIRST_LCP also when no
disk data tables are present.
15) Handle the case of a completely empty LCP
16) Added more jam calls around the BUSY state in PGMAN.
Added rules that try to keep the LCP speed down such that an LCP takes
at least 10 seconds to run (see the sketch after this item).
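A minimal sketch of capping the checkpoint write rate so that an LCP does not
finish in less than 10 seconds; the inputs are illustrative assumptions, not
the actual PGMAN variables:

  #include <algorithm>
  #include <cstdint>

  constexpr uint64_t MIN_LCP_DURATION_MS = 10000;  // target: LCP >= 10 seconds

  // Cap the requested checkpoint write rate (bytes per second) so that
  // writing all LCP data takes at least MIN_LCP_DURATION_MS.
  uint64_t cap_lcp_write_rate(uint64_t total_lcp_bytes,
                              uint64_t requested_rate_bytes_per_sec) {
    const uint64_t max_rate = total_lcp_bytes * 1000 / MIN_LCP_DURATION_MS;
    return std::min(requested_rate_bytes_per_sec,
                    std::max<uint64_t>(max_rate, 1));
  }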
17) Fixed a bug where the variable m_lcp_ongoing wasn't properly handled when
flushing the page cache for restarts.
Reorganised the debug printouts a bit and standardised on printing the instance
number in parentheses at the start of each printout.
18) Need to avoid setting BUSY for extent pages, since that would require the
lists to change
19) Minor adaptations to decrease the aggressiveness of writing LCPs for disk data
20) More fine-tuning of adaptive control parameters
21) Occasionally when running testRedo -n CheckLCPStartsAfterSR we
advance the REDO log so fast that we haven't finished opening the
next file when we start writing into it. In this
test case the REDO log size was set to 16 * 16 MB. Changed to
using 4 * 64 MB instead to minimise this risk.
22) Avoid trying to take client locks that are already held
before calling execFSCLOSECONF.
23) New error code 1518 for overload when going beyond MaxDiskDataLatency
24) Slowed down the speed a bit more to increase the length of LCPs in normal
operation.
The aim of this worklog is to ensure that we balance the disk write rate against
the actually required disk write rate. This makes it possible to sustain very heavy
write rates also on disk data columns in NDB, particularly when using modern NVMe drives.
It has been tested heavily for large rows with YCSB on modern NVMe drives supporting
millions of IOPS, and also tested on older SSD drives supporting at most 20,000 IOPS.
The worklog also adds ndbinfo tables to track the usage in more detail, and debugging info
to track any problems in the new code. The worklog integrates the Adaptive Redo Control
algorithm for in-memory data with the checkpoint algorithm for disk data columns.
scripts/mysql_system_tables.sql: 66 additions & 0 deletions
@@ -690,6 +690,16 @@ PREPARE stmt FROM @str;
 EXECUTE stmt;
 DROP PREPARE stmt;
 
+SET @str=IF(@have_ndbinfo,'DROP VIEW IF EXISTS `ndbinfo`.`diskstat`','SET @dummy = 0');
+PREPARE stmt FROM @str;
+EXECUTE stmt;
+DROP PREPARE stmt;
+
+SET @str=IF(@have_ndbinfo,'DROP VIEW IF EXISTS `ndbinfo`.`diskstats_1sec`','SET @dummy = 0');
+PREPARE stmt FROM @str;
+EXECUTE stmt;
+DROP PREPARE stmt;
+
 SET @str=IF(@have_ndbinfo,'DROP VIEW IF EXISTS `ndbinfo`.`error_messages`','SET @dummy = 0');
 PREPARE stmt FROM @str;
 EXECUTE stmt;
@@ -735,6 +745,11 @@ PREPARE stmt FROM @str;
 EXECUTE stmt;
 DROP PREPARE stmt;
 
+SET @str=IF(@have_ndbinfo,'DROP VIEW IF EXISTS `ndbinfo`.`pgman_time_track_stats`','SET @dummy = 0');
+PREPARE stmt FROM @str;
+EXECUTE stmt;
+DROP PREPARE stmt;
+
 SET @str=IF(@have_ndbinfo,'DROP VIEW IF EXISTS `ndbinfo`.`processes`','SET @dummy = 0');
 PREPARE stmt FROM @str;
 EXECUTE stmt;
@@ -985,6 +1000,28 @@ PREPARE stmt FROM @str;
 EXECUTE stmt;
 DROP PREPARE stmt;
 
+# ndbinfo.ndb$diskstat
+SET @str=IF(@have_ndbinfo,'DROP TABLE IF EXISTS `ndbinfo`.`ndb$diskstat`','SET @dummy = 0');
+PREPARE stmt FROM @str;
+EXECUTE stmt;
+DROP PREPARE stmt;
+
+SET @str=IF(@have_ndbinfo,'CREATE TABLE `ndbinfo`.`ndb$diskstat` (`node_id` INT UNSIGNED COMMENT "node_id",`block_instance` INT UNSIGNED COMMENT "Block instance",`pages_made_dirty` INT UNSIGNED COMMENT "Pages made dirty last second",`reads_issued` INT UNSIGNED COMMENT "Reads issued last second",`reads_completed` INT UNSIGNED COMMENT "Reads completed last second",`writes_issued` INT UNSIGNED COMMENT "Writes issued last second",`writes_completed` INT UNSIGNED COMMENT "Writes completed last second",`log_writes_issued` INT UNSIGNED COMMENT "Log writes issued last second",`log_writes_completed` INT UNSIGNED COMMENT "Log writes completed last second",`get_page_calls_issued` INT UNSIGNED COMMENT "get_page calls issued last second",`get_page_reqs_issued` INT UNSIGNED COMMENT "get_page calls that triggered disk IO issued last second",`get_page_reqs_completed` INT UNSIGNED COMMENT "get_page calls that triggered disk IO completed last second") COMMENT="Disk data statistics for last second" ENGINE=NDBINFO CHARACTER SET latin1','SET @dummy = 0');
+PREPARE stmt FROM @str;
+EXECUTE stmt;
+DROP PREPARE stmt;
+
+# ndbinfo.ndb$diskstats_1sec
+SET @str=IF(@have_ndbinfo,'DROP TABLE IF EXISTS `ndbinfo`.`ndb$diskstats_1sec`','SET @dummy = 0');
+PREPARE stmt FROM @str;
+EXECUTE stmt;
+DROP PREPARE stmt;
+
+SET @str=IF(@have_ndbinfo,'CREATE TABLE `ndbinfo`.`ndb$diskstats_1sec` (`node_id` INT UNSIGNED COMMENT "node_id",`block_instance` INT UNSIGNED COMMENT "Block instance",`pages_made_dirty` INT UNSIGNED COMMENT "Pages made dirty per second",`reads_issued` INT UNSIGNED COMMENT "Reads issued per second",`reads_completed` INT UNSIGNED COMMENT "Reads completed per second",`writes_issued` INT UNSIGNED COMMENT "Writes issued per second",`writes_completed` INT UNSIGNED COMMENT "Writes completed per second",`log_writes_issued` INT UNSIGNED COMMENT "Log writes issued per second",`log_writes_completed` INT UNSIGNED COMMENT "Log writes completed per second",`get_page_calls_issued` INT UNSIGNED COMMENT "get_page calls issued per second",`get_page_reqs_issued` INT UNSIGNED COMMENT "get_page calls that triggered disk IO issued per second",`get_page_reqs_completed` INT UNSIGNED COMMENT "get_page calls that triggered disk IO completed per second",`seconds_ago` INT UNSIGNED COMMENT "Seconds ago that this measurement was made") COMMENT="Disk data statistics history for last few seconds" ENGINE=NDBINFO CHARACTER SET latin1','SET @dummy = 0');
+PREPARE stmt FROM @str;
+EXECUTE stmt;
+DROP PREPARE stmt;
+
 # ndbinfo.ndb$frag_locks
 SET @str=IF(@have_ndbinfo,'DROP TABLE IF EXISTS `ndbinfo`.`ndb$frag_locks`','SET @dummy = 0');
 PREPARE stmt FROM @str;
@@ -1073,6 +1110,17 @@ PREPARE stmt FROM @str;
 EXECUTE stmt;
 DROP PREPARE stmt;
 
+# ndbinfo.ndb$pgman_time_track_stats
+SET @str=IF(@have_ndbinfo,'DROP TABLE IF EXISTS `ndbinfo`.`ndb$pgman_time_track_stats`','SET @dummy = 0');
+PREPARE stmt FROM @str;
+EXECUTE stmt;
+DROP PREPARE stmt;
+
+SET @str=IF(@have_ndbinfo,'CREATE TABLE `ndbinfo`.`ndb$pgman_time_track_stats` (`node_id` INT UNSIGNED COMMENT "node_id",`block_number` INT UNSIGNED COMMENT "Block number",`block_instance` INT UNSIGNED COMMENT "Block instance",`upper_bound` INT UNSIGNED COMMENT "Upper bound in microseconds",`page_reads` BIGINT UNSIGNED COMMENT "Number of disk reads in this range",`page_writes` BIGINT UNSIGNED COMMENT "Number of disk writes in this range",`log_waits` BIGINT UNSIGNED COMMENT "Number of waits due to WAL rule in this range (log waits)",`get_page` BIGINT UNSIGNED COMMENT "Number of waits for get_page in this range") COMMENT="Time tracking of reads and writes of disk data pages" ENGINE=NDBINFO CHARACTER SET latin1','SET @dummy = 0');
+PREPARE stmt FROM @str;
+EXECUTE stmt;
+DROP PREPARE stmt;
+
 # ndbinfo.ndb$pools
 SET @str=IF(@have_ndbinfo,'DROP TABLE IF EXISTS `ndbinfo`.`ndb$pools`','SET @dummy = 0');
 PREPARE stmt FROM @str;
@@ -1470,6 +1518,18 @@ PREPARE stmt FROM @str;
 EXECUTE stmt;
 DROP PREPARE stmt;
 
+# ndbinfo.diskstat
+SET @str=IF(@have_ndbinfo,'CREATE OR REPLACE DEFINER=`root`@`localhost` SQL SECURITY INVOKER VIEW `ndbinfo`.`diskstat` AS SELECT * FROM `ndbinfo`.`ndb$diskstat`','SET @dummy = 0');
+PREPARE stmt FROM @str;
+EXECUTE stmt;
+DROP PREPARE stmt;
+
+# ndbinfo.diskstats_1sec
+SET @str=IF(@have_ndbinfo,'CREATE OR REPLACE DEFINER=`root`@`localhost` SQL SECURITY INVOKER VIEW `ndbinfo`.`diskstats_1sec` AS SELECT * FROM `ndbinfo`.`ndb$diskstats_1sec`','SET @dummy = 0');
+PREPARE stmt FROM @str;
+EXECUTE stmt;
+DROP PREPARE stmt;
+
 # ndbinfo.error_messages
 SET @str=IF(@have_ndbinfo,'CREATE OR REPLACE DEFINER=`root`@`localhost` SQL SECURITY INVOKER VIEW `ndbinfo`.`error_messages` AS SELECT error_code, error_description, error_status, error_classification FROM `ndbinfo`.`ndb$error_messages`','SET @dummy = 0');
 PREPARE stmt FROM @str;
@@ -1524,6 +1584,12 @@ PREPARE stmt FROM @str;
 EXECUTE stmt;
 DROP PREPARE stmt;
 
+# ndbinfo.pgman_time_track_stats
+SET @str=IF(@have_ndbinfo,'CREATE OR REPLACE DEFINER=`root`@`localhost` SQL SECURITY INVOKER VIEW `ndbinfo`.`pgman_time_track_stats` AS SELECT * FROM `ndbinfo`.`ndb$pgman_time_track_stats`','SET @dummy = 0');
+PREPARE stmt FROM @str;
+EXECUTE stmt;
+DROP PREPARE stmt;
+
 # ndbinfo.processes
 SET @str=IF(@have_ndbinfo,'CREATE OR REPLACE DEFINER=`root`@`localhost` SQL SECURITY INVOKER VIEW `ndbinfo`.`processes` AS SELECT DISTINCT node_id, CASE node_type WHEN 0 THEN "NDB" WHEN 1 THEN "API" WHEN 2 THEN "MGM" ELSE NULL END AS node_type, node_version, NULLIF(process_id, 0) AS process_id, NULLIF(angel_process_id, 0) AS angel_process_id, process_name, service_URI FROM `ndbinfo`.`ndb$processes` ORDER BY node_id','SET @dummy = 0');