Skip to content

Commit c352ec2

Browse files
pcd1193182behlendorf
authored andcommitted
Illumos 6370 - ZFS send fails to transmit some holes
6370 ZFS send fails to transmit some holes Reviewed by: Matthew Ahrens <mahrens@delphix.com> Reviewed by: Chris Williamson <chris.williamson@delphix.com> Reviewed by: Stefan Ring <stefanrin@gmail.com> Reviewed by: Steven Burgess <sburgess@datto.com> Reviewed by: Arne Jansen <sensille@gmx.net> Approved by: Robert Mustacchi <rm@joyent.com> References: https://www.illumos.org/issues/6370 illumos/illumos-gate@286ef71 In certain circumstances, "zfs send -i" (incremental send) can produce a stream which will result in incorrect sparse file contents on the target. The problem manifests as regions of the received file that should be sparse (and read a zero-filled) actually contain data from a file that was deleted (and which happened to share this file's object ID). Note: this can happen only with filesystems (not zvols, because they do not free (and thus can not reuse) object IDs). Note: This can happen only if, since the incremental source (FromSnap), a file was deleted and then another file was created, and the new file is sparse (i.e. has areas that were never written to and should be implicitly zero-filled). We suspect that this was introduced by 4370 (applies only if hole_birth feature is enabled), and made worse by 5243 (applies if hole_birth feature is disabled, and we never send any holes). The bug is caused by the hole birth feature. When an object is deleted and replaced, all the holes in the object have birth time zero. However, zfs send cannot tell that the holes are new since the file was replaced, so it doesn't send them in an incremental. As a result, you can end up with invalid data when you receive incremental send streams. As a short-term fix, we can always send holes with birth time 0 (unless it's a zvol or a dataset where we can guarantee that no objects have been reused). Ported-by: Steven Burgess <sburgess@datto.com> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes #4369 Closes #4050
1 parent a9977b3 commit c352ec2

File tree

2 files changed

+43
-14
lines changed

2 files changed

+43
-14
lines changed

module/zfs/dmu_object.c

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@
2020
*/
2121
/*
2222
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
23-
* Copyright (c) 2013, 2014 by Delphix. All rights reserved.
23+
* Copyright (c) 2013, 2015 by Delphix. All rights reserved.
2424
* Copyright 2014 HybridCluster. All rights reserved.
2525
*/
2626

@@ -50,6 +50,12 @@ dmu_object_alloc(objset_t *os, dmu_object_type_t ot, int blocksize,
5050
* reasonably sparse (at most 1/4 full). Look from the
5151
* beginning once, but after that keep looking from here.
5252
* If we can't find one, just keep going from here.
53+
*
54+
* Note that dmu_traverse depends on the behavior that we use
55+
* multiple blocks of the dnode object before going back to
56+
* reuse objects. Any change to this algorithm should preserve
57+
* that property or find another solution to the issues
58+
* described in traverse_visitbp.
5359
*/
5460
if (P2PHASE(object, L2_dnode_count) == 0) {
5561
uint64_t offset = restarted ? object << DNODE_SHIFT : 0;

module/zfs/dmu_traverse.c

Lines changed: 36 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@
2020
*/
2121
/*
2222
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
23-
* Copyright (c) 2012, 2014 by Delphix. All rights reserved.
23+
* Copyright (c) 2012, 2016 by Delphix. All rights reserved.
2424
*/
2525

2626
#include <sys/zfs_context.h>
@@ -61,6 +61,7 @@ typedef struct traverse_data {
6161
uint64_t td_hole_birth_enabled_txg;
6262
blkptr_cb_t *td_func;
6363
void *td_arg;
64+
boolean_t td_realloc_possible;
6465
} traverse_data_t;
6566

6667
static int traverse_dnode(traverse_data_t *td, const dnode_phys_t *dnp,
@@ -228,18 +229,30 @@ traverse_visitbp(traverse_data_t *td, const dnode_phys_t *dnp,
228229

229230
if (bp->blk_birth == 0) {
230231
/*
231-
* Since this block has a birth time of 0 it must be a
232-
* hole created before the SPA_FEATURE_HOLE_BIRTH
233-
* feature was enabled. If SPA_FEATURE_HOLE_BIRTH
234-
* was enabled before the min_txg for this traveral we
235-
* know the hole must have been created before the
236-
* min_txg for this traveral, so we can skip it. If
237-
* SPA_FEATURE_HOLE_BIRTH was enabled after the min_txg
238-
* for this traveral we cannot tell if the hole was
239-
* created before or after the min_txg for this
240-
* traversal, so we cannot skip it.
232+
* Since this block has a birth time of 0 it must be one of
233+
* two things: a hole created before the
234+
* SPA_FEATURE_HOLE_BIRTH feature was enabled, or a hole
235+
* which has always been a hole in an object.
236+
*
237+
* If a file is written sparsely, then the unwritten parts of
238+
* the file were "always holes" -- that is, they have been
239+
* holes since this object was allocated. However, we (and
240+
* our callers) can not necessarily tell when an object was
241+
* allocated. Therefore, if it's possible that this object
242+
* was freed and then its object number reused, we need to
243+
* visit all the holes with birth==0.
244+
*
245+
* If it isn't possible that the object number was reused,
246+
* then if SPA_FEATURE_HOLE_BIRTH was enabled before we wrote
247+
* all the blocks we will visit as part of this traversal,
248+
* then this hole must have always existed, so we can skip
249+
* it. We visit blocks born after (exclusive) td_min_txg.
250+
*
251+
* Note that the meta-dnode cannot be reallocated.
241252
*/
242-
if (td->td_hole_birth_enabled_txg < td->td_min_txg)
253+
if ((!td->td_realloc_possible ||
254+
zb->zb_object == DMU_META_DNODE_OBJECT) &&
255+
td->td_hole_birth_enabled_txg <= td->td_min_txg)
243256
return (0);
244257
} else if (bp->blk_birth <= td->td_min_txg) {
245258
return (0);
@@ -347,6 +360,15 @@ traverse_visitbp(traverse_data_t *td, const dnode_phys_t *dnp,
347360

348361
prefetch_dnode_metadata(td, mdnp, zb->zb_objset,
349362
DMU_META_DNODE_OBJECT);
363+
/*
364+
* See the block comment above for the goal of this variable.
365+
* If the maxblkid of the meta-dnode is 0, then we know that
366+
* we've never had more than DNODES_PER_BLOCK objects in the
367+
* dataset, which means we can't have reused any object ids.
368+
*/
369+
if (osp->os_meta_dnode.dn_maxblkid == 0)
370+
td->td_realloc_possible = B_FALSE;
371+
350372
if (arc_buf_size(buf) >= sizeof (objset_phys_t)) {
351373
prefetch_dnode_metadata(td, gdnp, zb->zb_objset,
352374
DMU_GROUPUSED_OBJECT);
@@ -554,12 +576,13 @@ traverse_impl(spa_t *spa, dsl_dataset_t *ds, uint64_t objset, blkptr_t *rootbp,
554576
td->td_pfd = pd;
555577
td->td_flags = flags;
556578
td->td_paused = B_FALSE;
579+
td->td_realloc_possible = (txg_start == 0 ? B_FALSE : B_TRUE);
557580

558581
if (spa_feature_is_active(spa, SPA_FEATURE_HOLE_BIRTH)) {
559582
VERIFY(spa_feature_enabled_txg(spa,
560583
SPA_FEATURE_HOLE_BIRTH, &td->td_hole_birth_enabled_txg));
561584
} else {
562-
td->td_hole_birth_enabled_txg = 0;
585+
td->td_hole_birth_enabled_txg = UINT64_MAX;
563586
}
564587

565588
pd->pd_flags = flags;

0 commit comments

Comments
 (0)