CA-136792: Read VBD IO metrics from '/dev/shm'. #1818

Merged
merged 1 commit into from Jul 25, 2014

Projects

None yet

4 participants

@siddharthv
Contributor

With blktap3, blkback is not on the datapath except for rare occasions.
The statistics originally produced by blkback are now done from tapdisk.
This patch first attempts to read the statistics from under '/dev/shm'
to produce RRDs and defaults to the original sysfs statistics tree when
there are no valid files under 'dev/shm'.

Signed-off-by: Siddharth Vinothkumar siddharth.vinothkumar@citrix.com

Collaborator
johnelse commented Jul 8, 2014

I think the rdwr parameter of get_latency_metrics would be better as a variant type than a string - that way you could use an exhaustive pattern match e.g.

match rdwr with
| `read -> ... sscanf read requests ...
| `write -> ... sscanf write requests ...

We should avoid strings with magic values wherever possible :)

Collaborator
johnelse commented Jul 8, 2014

Also in get_latency_metrics you don't need the ref - you can just do something like

try
    ... attempt to sscanf the line ...
with _ -> (0L, 0L, 0L)

Similarly you can remove the ref in read_shm_stats_line.

Collaborator
johnelse commented Jul 8, 2014

It looks like the overall logic here is

if (there exist directories named "/dev/shm/vbd3-*")
then (read stats from /dev/shm)
else (read stats from sysfs)

However blktap2/blktap3 is a per-VBD rather than per-host switch, so shouldn't we be trying to read stats from both places in all cases?

Collaborator
johnelse commented Jul 8, 2014

I'm wary about the unguarded List.hd and List.nths - can we be certain that we will always read 3 lines from stat_file, even if we try to read it during an update?

Collaborator
johnelse commented Jul 8, 2014

It looks like this block of code is duplicated across the two parts of the function, apart from the comment and the definition of vbd_name:

            let open Device_number in
            let device_name = devid |> of_xenstore_key |> to_linux_device in
            let vbd_name = Printf.sprintf "vbd_%s" device_name in
            (* If blktap fails to cleanup then we might find a backend domid which doesn't
                 correspond to an active domain uuid. Skip these for now. *)
            let newacc =
                try
                    let uuid = uuid_of_domid doms domid in
                    (VM uuid, ds_make ~name:(vbd_name^"_write")
                        ~description:("Writes to device '"^device_name^"' in bytes per second")
                        ~value:(Rrd.VT_Int64 wr_bytes) ~ty:Rrd.Derive ~min:0.0 ~default:true
                        ~units:"B/s" ())::
                    (VM uuid, ds_make ~name:(vbd_name^"_read")
                        ~description:("Reads from device '"^device_name^"' in bytes per second")
                        ~value:(Rrd.VT_Int64 rd_bytes) ~ty:Rrd.Derive ~min:0.0 ~default:true
                        ~units:"B/s" ())::
                    (VM uuid, ds_make ~name:(vbd_name^"_read_latency")
                        ~description:("Reads from device '" ^ device_name ^ "' in microseconds")
                        ~units:"μs" ~value:(Rrd.VT_Int64 rd_avg_usecs)
                        ~ty:Rrd.Gauge ~min:0.0 ~default:false ())::
                    (VM uuid, ds_make ~name:(vbd_name^"_write_latency")
                        ~description:("Reads from device '" ^ device_name ^ "' in microseconds")
                        ~value:(Rrd.VT_Int64 wr_avg_usecs) ~ty:Rrd.Gauge ~min:0.0
                        ~default:false ~units:"μs" ())::
                    acc
                with _ -> acc

It would be good to pull some of this out into another function within the scope of update_vbds, to avoid the duplication.

Collaborator
johnelse commented Jul 9, 2014

Related to the last comment, do we really want to prefix the blktap3 datasources "vbd3_" instead of "vbd_" ? XenCenter expects certain datasource names and turns them into "nice" names, which can be internationalised. I expect that changing the prefix for the blktap3 datasources will break this.

Regardless of the XenCenter issue, do we really want to name the datasources differently at the API level depending on the implementation backing the VBD? It doesn't seem to me like a distinction an API user would want to make.

Owner

@siddharthv after discussions with @tmakatos we thought it would be a good idea to combine the info in the sysfs path and the shm path (if they both exist) to create one datasource rather than two for each of the read/write/latency metrics. This would involve adding the read/write bytes, and taking the maximum latency in the two files. So the logic becomes more like:

  • List files in sysfs path and shm path and union them
  • For each item
    • parse sysfs path, default to rd_sect=0, rd_usecs=0 etc
    • parse shm path, default to rd_sect=0 etc.
    • make combined figures (sum _sects, averate the averages and take the maximum 'max_usecs' figure)
    • make 1 datasource describing these numbers.

Does this make sense?

Contributor

The above logic makes complete sense. I'll also ensure that the blktap3 datasources are named "vbd_" rather than "vbd3_". Thanks a lot for the review @johnelse and @jonludlam.

Contributor

@tmakatos stated that the shm counters are reset every time they are read.

Contributor

The sysfs counters are reset every time they're read, not the shm ones. The shm ones are reset periodically.

Contributor

Oops! Wrong interpretation of a different comment. Sorry @tmakatos .
But I don't think that should affect the above logic suggested by @jonludlam

Collaborator

Quite a bit of the code is double-indented - see lines

  • 406-409
  • 415-416
  • 418-419
  • 442-end

Also I wonder whether it would be better to drop the datasources for which we fail to parse the files, rather than record all zeros. What do you think, @jonludlam ?

Owner

@johnelse yes, I agree. The semantics are fairly well defined, so rrdd should behave correctly if we drop the datasources, and it would behave incorrectly if we passed zeros.

Contributor

@johnelse , @jonludlam : I've added a commit to drop unwanted datasources. Could you please check again ?

Collaborator

I don't think we can use the fact that the datasource value is zero as an indicator that the read failed - zero could actually be a valid value for wr_bytes or rd_bytes if no IO happened in a particular period. Instead we should drop the datasources if the pattern match in parse_shm_stats fails.

If the pattern match succeeds, then we can combine the values read with the values from the sysfs files (if any).

If the pattern match fails, then we should use the sysfs files alone or, if they don't exist, then we should return no datasources for this vbd.

I admit this will take a bit of refactoring, but I think this will be more correct. It might be easiest to return an option type from parse_shm_stats, defaulting to None if the pattern match fails - that way you can distinguish between "the read failed" and "the read succeeded, but zeros were read".

@johnelse johnelse commented on an outdated diff Jul 23, 2014
ocaml/rrdd/rrdd_main.ml
+ let (shm_rd_reqs, shm_wr_reqs, shm_rd_avg_usecs, shm_wr_avg_usecs) = (a, b, c, d) in
+ let (rd_reqs, wr_reqs, rd_avg_usecs, wr_avg_usecs) = (p, q, r, s) in
+ (* Take max for usecs *)
+ let rd_avg_usecs = max rd_avg_usecs shm_rd_avg_usecs in
+ let wr_avg_usecs = max wr_avg_usecs shm_wr_avg_usecs in
+ (* Average out the read/write requests *)
+ let rd_reqs = avg64 rd_reqs shm_rd_reqs in
+ let wr_reqs = avg64 wr_reqs shm_wr_reqs in
+ generate_rrds acc rd_reqs wr_reqs rd_avg_usecs wr_avg_usecs;
+ | Some(a, b, c, d), None ->
+ let (shm_rd_reqs, shm_wr_reqs, shm_rd_avg_usecs, shm_wr_avg_usecs) = (a, b, c, d) in
+ generate_rrds acc shm_rd_reqs shm_wr_reqs shm_rd_avg_usecs shm_wr_avg_usecs
+ | None, Some(p, q, r, s) ->
+ let (rd_reqs, wr_reqs, rd_avg_usecs, wr_avg_usecs) = (p, q, r, s) in
+ generate_rrds acc rd_reqs wr_reqs rd_avg_usecs wr_avg_usecs
+ | None, None -> []
johnelse
johnelse Jul 23, 2014 Collaborator

Should this be acc rather than [] ?

@johnelse johnelse commented on an outdated diff Jul 23, 2014
ocaml/rrdd/rrdd_main.ml
List.filter
- (fun s -> String.startswith "vbd-" s || String.startswith "tap-" s) dirs in
+ (fun s -> String.startswith "vbd-" s || String.startswith "tap-" s) sysfs_dirs in
+ let vbds = shm_vbds @ sysfs_vbds in
+
johnelse
johnelse Jul 23, 2014 Collaborator

No trailing whitespace please :)

Collaborator

I've made a couple of comments in line, but I think this is almost there now.

Could you please also

Thanks!

Contributor

@johnelse thanks a lot for reviewing this. I've addressed your comments and squashed the commits.
Please have a look at this again!

Collaborator

@xen-git test this please

@johnelse johnelse commented on an outdated diff Jul 24, 2014
ocaml/rrdd/rrdd_main.ml
+ let vbd_name = Printf.sprintf "vbd_%s" device_name in
+ (* If blktap fails to cleanup then we might find a backend domid which doesn't
+ correspond to an active domain uuid. Skip these for now. *)
+ let newacc =
+ try
+ let uuid = uuid_of_domid doms domid in
+ (VM uuid, ds_make ~name:(vbd_name^"_write")
+ ~description:("Writes to device '"^device_name^"' in bytes per second")
+ ~value:(Rrd.VT_Int64 wr_bytes) ~ty:Rrd.Derive ~min:0.0 ~default:true
+ ~units:"B/s" ())::
+ (VM uuid, ds_make ~name:(vbd_name^"_read")
+ ~description:("Reads from device '"^device_name^"' in bytes per second")
+ ~value:(Rrd.VT_Int64 rd_bytes) ~ty:Rrd.Derive ~min:0.0 ~default:true
+ ~units:"B/s" ())::
+ (VM uuid, ds_make ~name:(vbd_name^"_read_latency")
+ ~description:("Reads from device '" ^ device_name ^ "' in microseconds")
johnelse
johnelse Jul 24, 2014 Collaborator

Should probably be "Read latency for device" rather than "Reads from device".

@johnelse johnelse commented on an outdated diff Jul 24, 2014
ocaml/rrdd/rrdd_main.ml
+ try
+ let uuid = uuid_of_domid doms domid in
+ (VM uuid, ds_make ~name:(vbd_name^"_write")
+ ~description:("Writes to device '"^device_name^"' in bytes per second")
+ ~value:(Rrd.VT_Int64 wr_bytes) ~ty:Rrd.Derive ~min:0.0 ~default:true
+ ~units:"B/s" ())::
+ (VM uuid, ds_make ~name:(vbd_name^"_read")
+ ~description:("Reads from device '"^device_name^"' in bytes per second")
+ ~value:(Rrd.VT_Int64 rd_bytes) ~ty:Rrd.Derive ~min:0.0 ~default:true
+ ~units:"B/s" ())::
+ (VM uuid, ds_make ~name:(vbd_name^"_read_latency")
+ ~description:("Reads from device '" ^ device_name ^ "' in microseconds")
+ ~units:"μs" ~value:(Rrd.VT_Int64 rd_avg_usecs)
+ ~ty:Rrd.Gauge ~min:0.0 ~default:false ())::
+ (VM uuid, ds_make ~name:(vbd_name^"_write_latency")
+ ~description:("Reads from device '" ^ device_name ^ "' in microseconds")
johnelse
johnelse Jul 24, 2014 Collaborator

"Write latency for device" instead of "Reads from device"

Collaborator

Found a couple of incorrect datasource descriptions - other than that, I think this is good to merge :)

@siddharthv siddharthv CA-136792: Create IO metrics from '/dev/shm' and sysfs.
With blktap3, blkback is not on the datapath except for rare occasions.
The statistics originally produced by blkback are now done from tapdisk.
This patch attempts to acquire statistics from under '/dev/shm'
and sysfs to generate only valid datasources.

In case we obtain stats from both sysfs and '/dev/shm':
- We take the maximum value for usecs from amongst sysfs and shm
- We average the values for the read and write metrics

Signed-off-by: Siddharth Vinothkumar <siddharth.vinothkumar@citrix.com>
795b08e
Contributor

Good catch with the datasource descriptions :)
These descriptions are incorrect even on xcp-rrdd !

@johnelse johnelse merged commit 068a6b4 into xapi-project:xs64bit-ring3 Jul 25, 2014

1 check passed

default Merged build finished.
Details
@siddharthv siddharthv referenced this pull request in xapi-project/blktap Jul 25, 2014
Merged

CA-136792: produce blkback-style stats #97

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment