
Introduce auxiliary metaslab histograms

This patch introduces 3 new histograms per metaslab. These
histograms track segments that have made it to the metaslab's
space map histogram (and are part of the space map) but have
not yet reached the ms_allocatable tree on loaded metaslabs,
because those metaslabs are currently syncing and haven't
gone through metaslab_sync_done() yet.

The histograms help when we decide whether to load an unloaded
metaslab in order to allocate from it. Traditionally, when
calculating the weight of an unloaded metaslab, we look at the
highest bucket of its space map's histogram. The problem is
that we are not guaranteed to be able to allocate that
segment when we load the metaslab, because it may still be in
the freeing, freed, or defer trees. The new histograms are
used when calculating an unloaded metaslab's weight to deal
with this issue, by removing segments that would not be in
the allocatable tree at runtime. Note that this method is
not completely accurate, as adjacent segments are not always
consolidated in the space map histogram of a metaslab.

In addition, and to make things deterministic, we always reset
the weight of unloaded metaslabs based on their space map
weight (instead of doing so on an as-needed basis). Thus, every
time a metaslab is loaded and its weight is reset again (from
the weight based on its space map to the one based on its
allocatable range tree) we expect (and assert) that this
change in weight can only get better if it doesn't stay the
same.
Finally, this commit adds a side-fix correcting a couple of
memory leaks in the error cases of vdev_metaslab_init().

Signed-off-by: Serapheim Dimitropoulos <>
sdimitro committed Jan 28, 2019
1 parent 4417096 commit 374901be80d50af56864070513fd5f055b967ff1
Showing with 332 additions and 20 deletions.
  1. +1 −0 include/sys/metaslab.h
  2. +15 −0 include/sys/metaslab_impl.h
  3. +1 −0 include/sys/space_map.h
  4. +293 −13 module/zfs/metaslab.c
  5. +22 −7 module/zfs/vdev.c
@@ -117,6 +117,7 @@ void metaslab_group_histogram_remove(metaslab_group_t *, metaslab_t *);
void metaslab_group_alloc_decrement(spa_t *, uint64_t, void *, int, int,
void metaslab_group_alloc_verify(spa_t *, const blkptr_t *, void *, int);
void metaslab_recalculate_weight_and_sort(metaslab_t *);

#ifdef __cplusplus
@@ -375,6 +375,21 @@ struct metaslab {
boolean_t ms_loaded;
boolean_t ms_loading;

/*
 * The point of the following histograms is to maintain copies of
 * segments that have made it to the metaslab's space map histogram
 * (and are part of the spacemap) but have not reached the
 * ms_allocatable yet (or won't be in the ms_allocatable if it's
 * loaded). We use them to make more accurate calculations of the
 * ms_weight when the metaslab is unloaded. That said, there may
 * still be some inaccuracies due to adjacent segments not being
 * consolidated in the metaslab's space map histogram. That can
 * cause us to have a worse view of an unloaded metaslab's weight
 * than the metaslab has when it is loaded.
 */
uint64_t ms_synchist[SPACE_MAP_HISTOGRAM_SIZE];

int64_t ms_deferspace; /* sum of ms_defermap[] space */
uint64_t ms_weight; /* weight vs. others in group */
uint64_t ms_activation_weight; /* activation weight */
@@ -193,6 +193,7 @@ int space_map_iterate(space_map_t *sm, sm_cb_t callback, void *arg);
int space_map_incremental_destroy(space_map_t *sm, sm_cb_t callback, void *arg,
dmu_tx_t *tx);

boolean_t space_map_histogram_verify(space_map_t *sm, range_tree_t *rt);
void space_map_histogram_clear(space_map_t *sm);
void space_map_histogram_add(space_map_t *sm, range_tree_t *rt,
dmu_tx_t *tx);
