
Commit cb682a1

ahrens authored and behlendorf committed
Illumos #3618 ::zio dcmd does not show timestamp data
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: George Wilson <gwilson@zfsmail.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Dan McDonald <danmcd@nexenta.com>

References:
  http://www.illumos.org/issues/3618
  illumos/illumos-gate@c55e05c

Notes on porting to ZFS on Linux:

The original changeset mostly deals with the mdb ::zio dcmd. However, in order
to provide the requested functionality it modifies the vdev and zio structures
to keep the timing data in nanoseconds instead of ticks. It is these changes
that are ported over in the commit at hand.

One visible change of this commit is that the default value of the
'zfs_vdev_time_shift' tunable is changed from

  zfs_vdev_time_shift = 6

to

  zfs_vdev_time_shift = 29

The original value of 6 was inherited from OpenSolaris and was suboptimal:
because it shifted the raw tick value, it did not compensate for the different
tick frequencies on Linux and OpenSolaris. The former has HZ=1000, while the
latter has HZ=100.

(This itself led to other interesting performance anomalies under non-trivial
load. The deadline scheduler delays an IO according to its priority - the
lower the priority, the further out the deadline is set. The delay is measured
in units of "shifted ticks". Since the HZ value was 10 times higher on Linux,
the delay units were 10 times shorter. Thus really low-priority IO such as
resilver (delay of 10 units) and scrub (delay of 20 units) was scheduled much
sooner than intended. The overall effect is that resilver and scrub IO
consumed more bandwidth at the expense of other IO.)

Now that the bookkeeping is done in nanoseconds, the shift behaves correctly
for any tick frequency (HZ).

Ported-by: Cyril Plisko <cyril.plisko@mountall.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #1643
1 parent 570d6ed commit cb682a1
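
To make the HZ dependence described in the commit message concrete, here is a
small stand-alone sketch (illustrative userspace C, not part of this commit)
that works through the old tick-based bucket arithmetic:

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	/* Old scheme: deadline = pri + (lbolt >> 6), so one bucket is 64 ticks. */
	uint64_t bucket_ticks = 1ULL << 6;

	/* The real-time width of a 64-tick bucket depends on the tick rate. */
	printf("HZ=100  (OpenSolaris): bucket = %llu ms\n",
	    (unsigned long long)(bucket_ticks * 1000 / 100));   /* 640 ms */
	printf("HZ=1000 (Linux):       bucket = %llu ms\n",
	    (unsigned long long)(bucket_ticks * 1000 / 1000));  /*  64 ms */

	/*
	 * A resilver zio is pushed back ~10 buckets and a scrub zio ~20, so
	 * the same priority delay that meant ~6.4 s / ~12.8 s on OpenSolaris
	 * shrank to ~0.64 s / ~1.28 s on Linux, letting resilver and scrub
	 * take more bandwidth than intended.
	 */
	return (0);
}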

4 files changed (+16, -13 lines)


include/sys/vdev_impl.h

Lines changed: 2 additions & 2 deletions
@@ -105,8 +105,8 @@ struct vdev_queue {
 	avl_tree_t	vq_read_tree;
 	avl_tree_t	vq_write_tree;
 	avl_tree_t	vq_pending_tree;
-	uint64_t	vq_io_complete_ts;
-	uint64_t	vq_io_delta_ts;
+	hrtime_t	vq_io_complete_ts;
+	hrtime_t	vq_io_delta_ts;
 	list_t		vq_io_list;
 	kmutex_t	vq_lock;
 };

include/sys/zio.h

Lines changed: 2 additions & 2 deletions
@@ -410,8 +410,8 @@ struct zio {
 
 	uint64_t	io_offset;
 	uint64_t	io_deadline;	/* expires at timestamp + deadline */
-	uint64_t	io_timestamp;	/* submitted at (ticks) */
-	uint64_t	io_delta;	/* vdev queue service delta (ticks) */
+	hrtime_t	io_timestamp;	/* submitted at */
+	hrtime_t	io_delta;	/* vdev queue service delta */
 	uint64_t	io_delay;	/* vdev disk service delta (ticks) */
 	avl_node_t	io_offset_node;
 	avl_node_t	io_deadline_node;

module/zfs/vdev.c

Lines changed: 4 additions & 4 deletions
@@ -3219,10 +3219,10 @@ vdev_deadman(vdev_t *vd)
 		 * the spa_deadman_synctime we log a zevent.
 		 */
 		fio = avl_first(&vq->vq_pending_tree);
-		delta = ddi_get_lbolt64() - fio->io_timestamp;
-		if (delta > NSEC_TO_TICK(spa_deadman_synctime(spa))) {
-			zfs_dbgmsg("SLOW IO: zio timestamp %llu, "
-			    "delta %llu, last io %llu",
+		delta = gethrtime() - fio->io_timestamp;
+		if (delta > spa_deadman_synctime(spa)) {
+			zfs_dbgmsg("SLOW IO: zio timestamp %lluns, "
+			    "delta %lluns, last io %lluns",
 			    fio->io_timestamp, delta,
 			    vq->vq_io_complete_ts);
 			zfs_ereport_post(FM_EREPORT_ZFS_DELAY,

module/zfs/vdev_queue.c

Lines changed: 8 additions & 5 deletions
@@ -44,8 +44,11 @@
 int zfs_vdev_max_pending = 10;
 int zfs_vdev_min_pending = 4;
 
-/* deadline = pri + ddi_get_lbolt64() >> time_shift) */
-int zfs_vdev_time_shift = 6;
+/*
+ * The deadlines are grouped into buckets based on zfs_vdev_time_shift:
+ * deadline = pri + gethrtime() >> time_shift)
+ */
+int zfs_vdev_time_shift = 29; /* each bucket is 0.537 seconds */
 
 /* exponential I/O issue ramp-up rate */
 int zfs_vdev_ramp_rate = 2;
@@ -396,7 +399,7 @@ vdev_queue_io(zio_t *zio)
 
 	mutex_enter(&vq->vq_lock);
 
-	zio->io_timestamp = ddi_get_lbolt64();
+	zio->io_timestamp = gethrtime();
 	zio->io_deadline = (zio->io_timestamp >> zfs_vdev_time_shift) +
 	    zio->io_priority;
 
@@ -430,8 +433,8 @@ vdev_queue_io_done(zio_t *zio)
 
 	avl_remove(&vq->vq_pending_tree, zio);
 
-	zio->io_delta = ddi_get_lbolt64() - zio->io_timestamp;
-	vq->vq_io_complete_ts = ddi_get_lbolt64();
+	zio->io_delta = gethrtime() - zio->io_timestamp;
+	vq->vq_io_complete_ts = gethrtime();
 	vq->vq_io_delta_ts = vq->vq_io_complete_ts - zio->io_timestamp;
 
 	for (i = 0; i < zfs_vdev_ramp_rate; i++) {
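
The new zfs_vdev_time_shift comment above gives the bucket width as 0.537
seconds. As a quick check (an illustrative sketch, not code from this commit;
hrtime_t is stood in by a plain integer type), the deadline computation and
the bucket arithmetic look like this:

#include <stdint.h>

typedef long long hrtime_t;	/* stand-in for the kernel type, illustration only */

/*
 * Mirrors the new deadline computation in vdev_queue_io():
 * one bucket is 1 << 29 ns = 536,870,912 ns, i.e. ~0.537 s, regardless of HZ.
 */
static uint64_t
deadline_bucket(hrtime_t now_ns, int priority)
{
	return (((uint64_t)now_ns >> 29) + priority);
}

A scrub zio (delay of 20 units) therefore lands about 20 x 0.537 s ~= 10.7 s
further out in the deadline ordering than a priority-0 zio submitted at the
same instant, and that spacing is now the same on Linux (HZ=1000) and
OpenSolaris (HZ=100).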
