zil_itx_needcopy_bytes kstat counter is corrupted #6988

Closed
dechamps opened this issue Dec 20, 2017 · 14 comments

6 participants
@dechamps
Contributor

commented Dec 20, 2017

System information

Type                  Version/Name
Distribution Name     Debian
Distribution Version  Unstable
Linux Kernel          4.13.0-1-amd64
Architecture          amd64
ZFS Version           0.7.3-3
SPL Version           0.7.3-1

Describe the problem you're observing

$ cat /proc/spl/kstat/zfs/zil
15 1 0x01 13 624 31503034653 382758011634377
name                            type data
zil_commit_count                4    197902
zil_commit_writer_count         4    197884
zil_itx_count                   4    611431070
zil_itx_indirect_count          4    0
zil_itx_indirect_bytes          4    0
zil_itx_copied_count            4    0
zil_itx_copied_bytes            4    0
zil_itx_needcopy_count          4    611266365
zil_itx_needcopy_bytes          4    18446744072731425348
zil_itx_metaslab_normal_count   4    0
zil_itx_metaslab_normal_bytes   4    0
zil_itx_metaslab_slog_count     4    1169526
zil_itx_metaslab_slog_bytes     4    140983216376

The zil_itx_needcopy_bytes counter is blatantly wrong: I'm pretty sure I did not write 16 EiB of data to that pool :) Its value is very close to UINT64_MAX (2^64 - 1), which suggests some kind of overflow or memory corruption.

Describe how to reproduce the problem

Not sure. However, I can tell that it started after a system upgrade, which included the following version changes:

  • Kernel: 4.12 → 4.13
  • SPL: 0.6.5 → 0.7.3
  • ZFS: 0.6.5 → 0.7.3

For this reason I suspect this might be a regression introduced between SPL/ZFS 0.6.5 and SPL/ZFS 0.7.3.

This issue might seem benign, but in my case it really isn't, because it prevents the Prometheus node exporter from exporting ZFS metrics correctly. Here's the log message from the node exporter, to make this issue easier to search for:

time="2017-12-20T22:32:39Z" level=error msg="ERROR: zfs collector failed after 0.000693s: could not parse expected integer value for \"kstat.zfs.misc.zil.zil_itx_needcopy_bytes\"" source="node_exporter.go:95"
@dechamps

Contributor Author

commented Dec 20, 2017

The only place where that counter gets written to in the code is:

ZIL_STAT_INCR(zil_itx_needcopy_bytes,

It looks like the only way that can happen is if lrw->lr_length itself is corrupted (overflow?), which actually sounds quite scary.

@behlendorf behlendorf added this to the 0.8.0 milestone Dec 21, 2017

@behlendorf

Member

commented Dec 21, 2017

That is scary! It's possible this was fixed in 0.7.4 by commit 4a98780, which is a backport from master. The patch resolves a potential issue where an itx, and thus an lrw->lr_length, might be corrupted under exactly the right circumstances. I've never actually been able to reproduce this specific issue, but it's a plausible way to explain the zilstats. And it would only need to have happened once to trash that counter.

If you're able to reproduce the issue, I'd suggest updating to 0.7.4 or newer, which includes the fix. Locally I haven't been able to reproduce this issue.

@dechamps

Contributor Author

commented Dec 21, 2017

Interesting. I might just try that. I tried to reproduce it by adding a tactical VERIFY3U() before that line and then running ztest, but no luck.

On my live production pool, I know from my Prometheus timeseries that it happens very quickly (< minutes) after the pool comes up, but I have no idea what triggers it specifically. Once I'm back from the holidays I'll try updating to 0.7.4 (or just cherry-picking your patch) and see what happens.

@cwedgwood

Contributor

commented Dec 22, 2017

@dechamps underflow?

18446744072731425348 has the most significant bits set.

Read as a signed 64-bit integer, 18446744072731425348 is -978126268, which isn't outrageous.

@dechamps

Contributor Author

commented Jan 7, 2018

I have upgraded to 0.7.4, and that appears to have fixed the issue.

@dechamps dechamps closed this Jan 7, 2018

@dechamps

Contributor Author

commented Jan 14, 2018

Scratch my last comment; it looks like I spoke too soon. The issue is still there in 0.7.4-1: the counter started showing corruption again after about 5 days of uptime.

@dechamps dechamps reopened this Jan 14, 2018

@nightah


commented Jan 23, 2018

It appears that I'm having the same issue.

System information

Type                  Version/Name
Distribution Name     Arch Linux
Distribution Version  n/a
Linux Kernel          4.14.14-1
Architecture          amd64
ZFS Version           0.7.5-1
SPL Version           0.7.5-1

Describe the problem you're observing

# cat /proc/spl/kstat/zfs/zil
15 1 0x01 13 624 1924297609 504195774290
name                            type data
zil_commit_count                4    7137
zil_commit_writer_count         4    6754
zil_itx_count                   4    126774
zil_itx_indirect_count          4    145
zil_itx_indirect_bytes          4    14166783
zil_itx_copied_count            4    0
zil_itx_copied_bytes            4    0
zil_itx_needcopy_count          4    120733
zil_itx_needcopy_bytes          4    18446744073709013048
zil_itx_metaslab_normal_count   4    5581
zil_itx_metaslab_normal_bytes   4    84571112
zil_itx_metaslab_slog_count     4    0
zil_itx_metaslab_slog_bytes     4    0

Similarly, I noticed this when my node exporter instance failed with the same error as in the previous report.

@cwedgwood

Contributor

commented Jan 23, 2018

@dechamps what about putting some unused (but exported) guard values before and after it? That way, when it's corrupted, we could dump those (which by default would be 0) and perhaps get some more insight into this.

@cwedgwood

Contributor

commented Jan 23, 2018

@dechamps furthermore, if the guard values are being corrupted, we could check for this in common code paths and WARN.
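The guard-word idea might look roughly like this userspace sketch. All names here (`guarded_counter_t`, `gc_incr`, `gc_guards_clobbered`, `gc_check`) are hypothetical, and `fprintf` stands in for the kernel's WARN. The diagnostic value is in what the guards say: if they stay 0 while the counter still wraps, a stray write into neighboring memory becomes unlikely and the arithmetic feeding the counter becomes the main suspect.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout: the suspect counter sandwiched between guard words. */
typedef struct guarded_counter {
	uint64_t guard_before;	/* never written on purpose; should stay 0 */
	uint64_t value;		/* e.g. what zil_itx_needcopy_bytes would hold */
	uint64_t guard_after;	/* never written on purpose; should stay 0 */
} guarded_counter_t;

/* The only legitimate writer of the counter. */
static void
gc_incr(guarded_counter_t *gc, uint64_t delta)
{
	assert(gc != NULL);
	gc->value += delta;
}

/* Nonzero if anything has scribbled over either guard word. */
static int
gc_guards_clobbered(const guarded_counter_t *gc)
{
	return (gc->guard_before != 0 || gc->guard_after != 0);
}

/* Cheap enough for a common code path; warns instead of panicking. */
static void
gc_check(const guarded_counter_t *gc, const char *where)
{
	if (gc_guards_clobbered(gc))
		fprintf(stderr, "WARN %s: guards clobbered (%llx, %llx)\n",
		    where, (unsigned long long)gc->guard_before,
		    (unsigned long long)gc->guard_after);
}
```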

@chrisrd

Contributor

commented Feb 13, 2018

Me too... zfs-0.7.6, linux-4.9.76 (also noticed via node-exporter failing)

# grep zil_itx_needcopy_bytes /proc/spl/kstat/zfs/zil
zil_itx_needcopy_bytes          4    18446744073709537686
@cwedgwood

Contributor

commented Feb 13, 2018

@chrisrd @behlendorf note that 18446744073709537686 is 0xffffffffffffc996: again, the high bits are set as if from an underflow (-13930 as a signed 64-bit value).
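Decoding these values by hand is error-prone, so here is a minimal userspace sketch (not ZFS code; `as_signed` is a hypothetical helper) that reads a uint64_t kstat value as a two's-complement int64_t. It negates in unsigned arithmetic because converting an out-of-range unsigned value to a signed type is implementation-defined in C. For the three corrupted values posted in this issue (18446744072731425348, 18446744073709013048 and 18446744073709537686) it yields -978126268, -538568 and -13930, all small negative numbers, which fits the underflow theory.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: interpret a uint64_t kstat value as a two's-complement
 * int64_t without relying on implementation-defined signed conversion.
 */
static int64_t
as_signed(uint64_t v)
{
	if (v <= (uint64_t)INT64_MAX)
		return ((int64_t)v);

	uint64_t magnitude = ~v + 1;	/* 2^64 - v: how far below zero */
	if (magnitude == (uint64_t)INT64_MAX + 1)
		return (INT64_MIN);	/* 2^63 has no positive counterpart */

	assert(magnitude <= (uint64_t)INT64_MAX);
	return (-(int64_t)magnitude);
}
```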

@richardelling

Contributor

commented Feb 13, 2018

The logic for the length of the data written is incorrect in zil.c line 1165, allowing a negative increment soon thereafter.

				if (lrwb->lr_length > dnow)
					lrwb->lr_length = dnow;
				lrw->lr_offset += dnow;
				lrw->lr_length -= dnow;
				ZIL_STAT_BUMP(zil_itx_needcopy_count);
				ZIL_STAT_INCR(zil_itx_needcopy_bytes,
				    lrw->lr_length);

I'll take a closer look at this soon.
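To see how the excerpt above can wrap the counter with no memory corruption at all, here is a userspace model of its accounting. It is a sketch, not the zil.c code: `buggy_needcopy_bytes` is a hypothetical name, every log block is assumed to have a fixed `space` bytes free, and the amount of data copied (`dlen`) is assumed to be lr_length rounded up to a multiple of 8 bytes; check that padding rule against zil.c before relying on it. Under those assumptions, an unaligned write that fits in one block makes `dnow` exceed `lr_length`, the unsigned subtraction wraps, and the following increment effectively subtracts a few bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this model: copied data is padded to 8 bytes. */
#define	PAD8(x)	(((x) + 7) & ~(uint64_t)7)

static uint64_t
min_u64(uint64_t a, uint64_t b)
{
	return (a < b ? a : b);
}

/*
 * Model of the pre-fix accounting for one WR_NEED_COPY write of `len` bytes
 * when every log block has `space` bytes free. Returns what the excerpt
 * would add to zil_itx_needcopy_bytes (modulo 2^64).
 */
static uint64_t
buggy_needcopy_bytes(uint64_t len, uint64_t space)
{
	uint64_t lr_length = len;	/* plays the role of lrw->lr_length */
	uint64_t dlen = PAD8(len);	/* data still to copy (assumed padded) */
	uint64_t counter = 0;

	assert(space > 0);
	while (dlen > 0) {
		uint64_t dnow = min_u64(dlen, space);

		lr_length -= dnow;	/* wraps when dnow > lr_length */
		counter += lr_length;	/* adds what is LEFT, not what was copied */
		dlen -= dnow;
	}
	return (counter);
}
```

In this model an aligned write that fits in one block adds 0, an unaligned one subtracts up to 7 bytes, and a write split across blocks over-counts (200 bytes in 64-byte blocks adds 216). Summed over hundreds of millions of itxs the net can drift below zero, which would be consistent with the small negative values cwedgwood pointed out.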

Regarding node_exporter, the latest version (after 16 Jan 2018) should handle the uint64 to float64 conversion OK. node_exporter and Prometheus only deal in floats, so we will inevitably lose resolution. If you can reach out to me at Richard.Elling@RichardElling.com with the error message you saw, I think we can make node_exporter smarter about these things.
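The resolution loss is easy to demonstrate: a double has a 53-bit significand, so not every uint64_t survives a round trip through float64. A minimal sketch (`survives_float64` is a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

/* 2^64 as a double; this constant is exactly representable. */
#define	TWO_POW_64	18446744073709551616.0

/* Returns 1 if v survives a uint64 -> double -> uint64 round trip. */
static int
survives_float64(uint64_t v)
{
	double d = (double)v;

	assert(d >= 0.0);	/* an unsigned value never becomes negative */

	/* Converting a double >= 2^64 back to uint64_t is undefined. */
	if (d >= TWO_POW_64)
		return (0);
	return ((uint64_t)d == v);
}
```

Between 2^63 and 2^64 adjacent doubles are 2048 apart, so a wrapped counter like the ones in this issue can only be exported approximately.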

@richardelling

Contributor

commented Feb 15, 2018

I can see no benefit to incrementing zil_itx_needcopy_bytes after lrw->lr_length is decremented. IMHO, the benefit of tracking the bytes is to see the relative weight of WR_COPIED, WR_NEED_COPY, and WR_INDIRECT. With the relative weighting, decisions can be made regarding zfs_immediate_write_sz and logbias.

So I propose moving the lrw->lr_length -= dnow; after the ZIL_STAT_* adjustments. Thoughts?
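A rough userspace model of this proposal, under stated assumptions: each log block has a fixed `space` bytes free, the copied length is assumed to be lr_length rounded up to 8 bytes, and `reordered_needcopy_bytes` is a hypothetical name. Counting before the decrement means a wrap on the final pass is never added to the stat, but a write split across several blocks adds the whole remaining length on every pass, so it over-counts.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this model: copied data is padded to 8 bytes. */
#define	PAD8(x)	(((x) + 7) & ~(uint64_t)7)

static uint64_t
min_u64(uint64_t a, uint64_t b)
{
	return (a < b ? a : b);
}

/* Proposed ordering: bump the stat first, then consume dnow bytes. */
static uint64_t
reordered_needcopy_bytes(uint64_t len, uint64_t space)
{
	uint64_t lr_length = len;	/* plays the role of lrw->lr_length */
	uint64_t dlen = PAD8(len);	/* data still to copy (assumed padded) */
	uint64_t counter = 0;

	assert(space > 0);
	while (dlen > 0) {
		uint64_t dnow = min_u64(dlen, space);

		counter += lr_length;	/* full remaining length, every pass */
		lr_length -= dnow;	/* a wrap on the final pass is never counted */
		dlen -= dnow;
	}
	return (counter);
}
```

In this model a 200-byte write split into 64-byte blocks counts 200 + 136 + 72 + 8 = 416 bytes.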

@chrisrd

Contributor

commented Feb 15, 2018

It's all pretty opaque to me, but here goes nothing...

It looks like dnow is how much data we're adding "now" to lwb (via lrcb) and writing out in zil_lwb_add_txg(lwb, txg). And this stuff:

        lrw->lr_offset += dnow; 
        lrw->lr_length -= dnow;

...is advancing the offset (lr_offset) and recording how much data (lr_length) we still need to output the next time we come through the goto cont loop.

So this:

        ZIL_STAT_INCR(zil_itx_needcopy_bytes, lrw->lr_length);

...looks like we're incrementing zil_itx_needcopy_bytes by how much data is remaining, each time through the loop, and lrw->lr_length goes negative (it is unsigned, so it wraps around) when we can fit the whole thing with space left over in the current record (which means we won't come through the loop again).

I suspect the correct thing to do is to increment zil_itx_needcopy_bytes by how much we're writing this time through the loop, i.e.:

        ZIL_STAT_INCR(zil_itx_needcopy_bytes, dnow);

But I've no idea how to test this!
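One way to sanity-check the dnow version without a pool is a userspace model. This sketch assumes each log block has a fixed `space` bytes free and that the copied length is lr_length rounded up to 8 bytes (both assumptions, as is the hypothetical name `fixed_needcopy_bytes`). Because the counter only ever adds the bytes copied on each pass, it can only grow, and a split write is counted once.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this model: copied data is padded to 8 bytes. */
#define	PAD8(x)	(((x) + 7) & ~(uint64_t)7)

static uint64_t
min_u64(uint64_t a, uint64_t b)
{
	return (a < b ? a : b);
}

/*
 * Model of the proposed accounting: add dnow on each pass. The lr_offset and
 * lr_length bookkeeping is omitted because it no longer feeds the stat.
 */
static uint64_t
fixed_needcopy_bytes(uint64_t len, uint64_t space)
{
	uint64_t dlen = PAD8(len);	/* data still to copy (assumed padded) */
	uint64_t counter = 0;

	assert(space > 0);
	while (dlen > 0) {
		uint64_t dnow = min_u64(dlen, space);

		counter += dnow;	/* count what this pass actually copies */
		dlen -= dnow;
	}
	return (counter);
}
```

Under the padding assumption the total includes up to 7 bytes of padding per write (a 100-byte write counts 104), so it measures bytes copied into the log rather than bytes the application wrote.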

chrisrd added a commit to chrisrd/zfs that referenced this issue Feb 15, 2018

Increment zil_itx_needcopy_bytes properly
In zil_lwb_commit() with TX_WRITE, we copy the log write record (lrw)
into the log write block (lwb) and send it off using zil_lwb_add_txg().
If we also have WR_NEED_COPY, we additionally copy the lrw's data into
the lwb to be sent off.  If the lrw + data doesn't fit into the lwb, we
send the lrw and as much data as will fit (dnow bytes), then go back
and do the same with the remaining data.

Each time through this loop we're sending dnow data bytes. I.e.
zil_itx_needcopy_bytes should be incremented by dnow.

Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes: zfsonlinux#6988

@chrisrd chrisrd referenced this issue Feb 15, 2018

Merged

Increment zil_itx_needcopy_bytes properly #7176

4 of 13 tasks complete

chrisrd added a commit to chrisrd/zfs that referenced this issue Mar 1, 2018

Increment zil_itx_needcopy_bytes properly

@behlendorf behlendorf closed this in 5666a99 Mar 2, 2018

tonyhutter added a commit to tonyhutter/zfs that referenced this issue Mar 7, 2018

Increment zil_itx_needcopy_bytes properly
In zil_lwb_commit() with TX_WRITE, we copy the log write record (lrw)
into the log write block (lwb) and send it off using zil_lwb_add_txg().
If we also have WR_NEED_COPY, we additionally copy the lrw's data into
the lwb to be sent off.  If the lrw + data doesn't fit into the lwb, we
send the lrw and as much data as will fit (dnow bytes), then go back
and do the same with the remaining data.

Each time through this loop we're sending dnow data bytes. I.e.
zil_itx_needcopy_bytes should be incremented by dnow.

Reviewed-by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Closes zfsonlinux#6988 
Closes zfsonlinux#7176

tonyhutter added a commit to tonyhutter/zfs that referenced this issue Mar 7, 2018

zfs-0.7.7 squashed patchset
This is a squashed patchset for zfs-0.7.7.  The individual commits are
in the tonyhutter:zfs-0.7.7-hutter branch.  I squashed the commits so
that buildbot wouldn't have to run against each one, and because
github/buildbot seem to have a maximum limit of 30 commits they can
test from a PR.

- Linux 4.16 compat: get_disk_and_module() zfsonlinux#7264
- Change checksum & IO delay ratelimit values zfsonlinux#7252
- Increment zil_itx_needcopy_bytes properly zfsonlinux#6988  zfsonlinux#7176
- Fix some typos zfsonlinux#7237
- Fix zpool(8) list example to match actual format zfsonlinux#7244
- Add SMART self-test results to zpool status -c zfsonlinux#7178
- Add scrub after resilver zed script zfsonlinux#4662  zfsonlinux#7086
- Fix free memory calculation on v3.14+ zfsonlinux#7170
- Report duration and error in mmp_history entries zfsonlinux#7190
- Do not initiate MMP writes while pool is suspended zfsonlinux#7182
- Linux 4.16 compat: use correct *_dec_and_test() zfsonlinux#7179  zfsonlinux#7211
- Allow modprobe to fail when called within systemd zfsonlinux#7174
- Add SMART attributes for SSD and NVMe zfsonlinux#7183  zfsonlinux#7193
- Correct count_uberblocks in mmp.kshlib zfsonlinux#7191
- Fix config issues: frame size and headers zfsonlinux#7169
- Clarify zinject(8) explanation of -e zfsonlinux#7172
- OpenZFS 8857 - zio_remove_child() panic due to already destroyed
  parent zio zfsonlinux#7168
- 'zfs receive' fails with "dataset is busy" zfsonlinux#7129  zfsonlinux#7154
- contrib/initramfs: add missing conf.d/zfs zfsonlinux#7158
- mmp should use a fixed tag for spa_config locks zfsonlinux#6530  zfsonlinux#7155
- Handle zap_add() failures in mixed case mode zfsonlinux#7011 zfsonlinux#7054
- Fix zdb -ed on objset for exported pool zfsonlinux#7099 zfsonlinux#6464
- Fix zdb -E segfault zfsonlinux#7099
- Fix zdb -R decompression zfsonlinux#7099  zfsonlinux#4984
- Fix racy assignment of zcb.zcb_haderrors zfsonlinux#7099
- Fix zle_decompress out of bound access zfsonlinux#7099
- Fix zdb -c traverse stop on damaged objset root zfsonlinux#7099
- Linux 4.11 compat: avoid refcount_t name conflict zfsonlinux#7148
- Linux 4.16 compat: inode_set_iversion() zfsonlinux#7148
- OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common
  contains a use after end of the lifetime of a local variable zfsonlinux#7141
- Remove deprecated zfs_arc_p_aggressive_disable zfsonlinux#7135
- Fix default libdir for Debian/Ubuntu zfsonlinux#7083  zfsonlinux#7101
- Bug fix in qat_compress.c for vmalloc addr check zfsonlinux#7125
- Fix systemd_ RPM macros usage on Debian-based distributions zfsonlinux#7074
  zfsonlinux#7100
- Emit an error message before MMP suspends pool zfsonlinux#7048
- ZTS: Fix create-o_ashift test case zfsonlinux#6924  zfsonlinux#6977
- Fix --with-systemd on Debian-based distributions (zfsonlinux#6963) zfsonlinux#6591  zfsonlinux#6963
- Remove vn_rename and vn_remove dependency zfsonlinux/spl#648 zfsonlinux#6753
- Add support for "--enable-code-coverage" option zfsonlinux#6670
- Make "-fno-inline" compile option more accessible zfsonlinux#6605
- Add configure option to enable gcov analysis zfsonlinux#6642
- Implement --enable-debuginfo to force debuginfo zfsonlinux#2734
- Make --enable-debug fail when given bogus args zfsonlinux#2734

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Requires-spl: refs/pull/690/head

@tonyhutter tonyhutter referenced this issue Mar 7, 2018

Closed

zfs-0.7.7 patchset (squashed) #7278

0 of 13 tasks complete

tonyhutter added a commit to tonyhutter/zfs that referenced this issue Mar 12, 2018

Increment zil_itx_needcopy_bytes properly

tonyhutter added a commit to tonyhutter/zfs that referenced this issue Mar 13, 2018

zfs-0.7.7 squashed patchset
This is a squashed patchset for zfs-0.7.7.  The individual commits are
in the tonyhutter:zfs-0.7.7-hutter branch.  I squashed the commits so
that buildbot wouldn't have to run against each one, and because
github/buildbot seem to have a maximum limit of 30 commits they can
test from a PR.

- Fix MMP write frequency for large pools zfsonlinux#7205 zfsonlinux#7289
- Handle zio_resume and mmp => off zfsonlinux#7286
- Fix zfs-kmod builds when using rpm >= 4.14 zfsonlinux#7284
- zdb and inuse tests don't pass with real disks zfsonlinux#6939 zfsonlinux#7261
- Take user namespaces into account in policy checks zfsonlinux#6800 zfsonlinux#7270
- Detect long config lock acquisition in mmp zfsonlinux#7212
- Linux 4.16 compat: get_disk_and_module() zfsonlinux#7264
- Change checksum & IO delay ratelimit values zfsonlinux#7252
- Increment zil_itx_needcopy_bytes properly zfsonlinux#6988 zfsonlinux#7176
- Fix some typos zfsonlinux#7237
- Fix zpool(8) list example to match actual format zfsonlinux#7244
- Add SMART self-test results to zpool status -c zfsonlinux#7178
- Add scrub after resilver zed script zfsonlinux#4662 zfsonlinux#7086
- Fix free memory calculation on v3.14+ zfsonlinux#7170
- Report duration and error in mmp_history entries zfsonlinux#7190
- Do not initiate MMP writes while pool is suspended zfsonlinux#7182
- Linux 4.16 compat: use correct *_dec_and_test()
- Allow modprobe to fail when called within systemd zfsonlinux#7174
- Add SMART attributes for SSD and NVMe zfsonlinux#7183 zfsonlinux#7193
- Correct count_uberblocks in mmp.kshlib zfsonlinux#7191
- Fix config issues: frame size and headers zfsonlinux#7169
- Clarify zinject(8) explanation of -e zfsonlinux#7172
- OpenZFS 8857 - zio_remove_child() panic due to already destroyed parent zio zfsonlinux#7168
- 'zfs receive' fails with "dataset is busy" zfsonlinux#7129 zfsonlinux#7154
- contrib/initramfs: add missing conf.d/zfs zfsonlinux#7158
- mmp should use a fixed tag for spa_config locks zfsonlinux#6530 zfsonlinux#7155
- Handle zap_add() failures in mixed case mode zfsonlinux#7011 zfsonlinux#7054
- Fix zdb -ed on objset for exported pool zfsonlinux#7099 zfsonlinux#6464
- Fix zdb -E segfault zfsonlinux#7099
- Fix zdb -R decompression zfsonlinux#7099 zfsonlinux#4984
- Fix racy assignment of zcb.zcb_haderrors zfsonlinux#7099
- Fix zle_decompress out of bound access zfsonlinux#7099
- Fix zdb -c traverse stop on damaged objset root zfsonlinux#7099
- Linux 4.11 compat: avoid refcount_t name conflict zfsonlinux#7148
- Linux 4.16 compat: inode_set_iversion() zfsonlinux#7148
- OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common contains a use after end of the lifetime of a local variable zfsonlinux#7141
- Remove deprecated zfs_arc_p_aggressive_disable zfsonlinux#7135
- Fix default libdir for Debian/Ubuntu zfsonlinux#7083 zfsonlinux#7101
- Bug fix in qat_compress.c for vmalloc addr check zfsonlinux#7125
- Fix systemd_ RPM macros usage on Debian-based distributions zfsonlinux#7074 zfsonlinux#7100
- Emit an error message before MMP suspends pool zfsonlinux#7048
- ZTS: Fix create-o_ashift test case zfsonlinux#6924 zfsonlinux#6977
- Fix --with-systemd on Debian-based distributions (zfsonlinux#6963) zfsonlinux#6591 zfsonlinux#6963
- Remove vn_rename and vn_remove dependency zfsonlinux/spl#648 zfsonlinux#6753
- Add support for "--enable-code-coverage" option zfsonlinux#6670
- Make "-fno-inline" compile option more accessible zfsonlinux#6605
- Add configure option to enable gcov analysis zfsonlinux#6642
- Implement --enable-debuginfo to force debuginfo zfsonlinux#2734
- Make --enable-debug fail when given bogus args zfsonlinux#2734

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Requires-spl: refs/pull/690/head

tonyhutter added a commit to tonyhutter/zfs that referenced this issue Mar 13, 2018

zfs-0.7.7 squashed patchset

tonyhutter added a commit to tonyhutter/zfs that referenced this issue Mar 13, 2018

Increment zil_itx_needcopy_bytes properly

tonyhutter added a commit to tonyhutter/zfs that referenced this issue Mar 13, 2018

zfs-0.7.7 squashed patchset
This is a squashed patchset for zfs-0.7.7.  The individual commits are
in the tonyhutter:zfs-0.7.7-hutter branch.  I squashed the commits so
that buildbot wouldn't have to run against each one, and because
github/buildbot seem to have a maximum limit of 30 commits they can
test from a PR.

- Fix MMP write frequency for large pools zfsonlinux#7205 zfsonlinux#7289
- Handle zio_resume and mmp => off zfsonlinux#7286
- Fix zfs-kmod builds when using rpm >= 4.14 zfsonlinux#7284
- zdb and inuse tests don't pass with real disks zfsonlinux#6939 zfsonlinux#7261
- Take user namespaces into account in policy checks zfsonlinux#6800 zfsonlinux#7270
- Detect long config lock acquisition in mmp zfsonlinux#7212
- Linux 4.16 compat: get_disk_and_module() zfsonlinux#7264
- Change checksum & IO delay ratelimit values zfsonlinux#7252
- Increment zil_itx_needcopy_bytes properly zfsonlinux#6988 zfsonlinux#7176
- Fix some typos zfsonlinux#7237
- Fix zpool(8) list example to match actual format zfsonlinux#7244
- Add SMART self-test results to zpool status -c zfsonlinux#7178
- Add scrub after resilver zed script zfsonlinux#4662 zfsonlinux#7086
- Fix free memory calculation on v3.14+ zfsonlinux#7170
- Report duration and error in mmp_history entries zfsonlinux#7190
- Do not initiate MMP writes while pool is suspended zfsonlinux#7182
- Linux 4.16 compat: use correct *_dec_and_test()
- Allow modprobe to fail when called within systemd zfsonlinux#7174
- Add SMART attributes for SSD and NVMe zfsonlinux#7183 zfsonlinux#7193
- Correct count_uberblocks in mmp.kshlib zfsonlinux#7191
- Fix config issues: frame size and headers zfsonlinux#7169
- Clarify zinject(8) explanation of -e zfsonlinux#7172
- OpenZFS 8857 - zio_remove_child() panic due to already destroyed
  parent zio zfsonlinux#7168
- 'zfs receive' fails with "dataset is busy" zfsonlinux#7129 zfsonlinux#7154
- contrib/initramfs: add missing conf.d/zfs zfsonlinux#7158
- mmp should use a fixed tag for spa_config locks zfsonlinux#6530 zfsonlinux#7155
- Handle zap_add() failures in mixed case mode zfsonlinux#7011 zfsonlinux#7054
- Fix zdb -ed on objset for exported pool zfsonlinux#7099 zfsonlinux#6464
- Fix zdb -E segfault zfsonlinux#7099
- Fix zdb -R decompression zfsonlinux#7099 zfsonlinux#4984
- Fix racy assignment of zcb.zcb_haderrors zfsonlinux#7099
- Fix zle_decompress out of bound access zfsonlinux#7099
- Fix zdb -c traverse stop on damaged objset root zfsonlinux#7099
- Linux 4.11 compat: avoid refcount_t name conflict zfsonlinux#7148
- Linux 4.16 compat: inode_set_iversion() zfsonlinux#7148
- OpenZFS 8966 - Source file zfs_acl.c, function zfs_aclset_common
  contains a use after end of the lifetime of a local variable zfsonlinux#7141
- Remove deprecated zfs_arc_p_aggressive_disable zfsonlinux#7135
- Fix default libdir for Debian/Ubuntu zfsonlinux#7083 zfsonlinux#7101
- Bug fix in qat_compress.c for vmalloc addr check zfsonlinux#7125
- Fix systemd_ RPM macros usage on Debian-based distributions zfsonlinux#7074
  zfsonlinux#7100
- Emit an error message before MMP suspends pool zfsonlinux#7048
- ZTS: Fix create-o_ashift test case zfsonlinux#6924 zfsonlinux#6977
- Fix --with-systemd on Debian-based distributions (zfsonlinux#6963) zfsonlinux#6591 zfsonlinux#6963
- Remove vn_rename and vn_remove dependency zfsonlinux/spl#648 zfsonlinux#6753
- Fix "--enable-code-coverage" debug build zfsonlinux#6674
- Update codecov.yml zfsonlinux#6669
- Add support for "--enable-code-coverage" option zfsonlinux#6670
- Make "-fno-inline" compile option more accessible zfsonlinux#6605
- Add configure option to enable gcov analysis zfsonlinux#6642
- Implement --enable-debuginfo to force debuginfo zfsonlinux#2734
- Make --enable-debug fail when given bogus args zfsonlinux#2734

Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Requires-spl: refs/pull/690/head

tonyhutter added a commit that referenced this issue Mar 19, 2018

Increment zil_itx_needcopy_bytes properly