@@ -8,61 +8,13 @@ both at leaf nodes as well as at intermediate nodes in a storage hierarchy.
Plan is to use the same cgroup based management interface for blkio controller
and based on user options switch IO policies in the background.

- Currently two IO control policies are implemented. First one is proportional
- weight time based division of disk policy. It is implemented in CFQ. Hence
- this policy takes effect only on leaf nodes when CFQ is being used. The second
- one is throttling policy which can be used to specify upper IO rate limits
- on devices. This policy is implemented in generic block layer and can be
- used on leaf nodes as well as higher level logical devices like device mapper.
+ One IO control policy is the throttling policy, which can be used to
+ specify upper IO rate limits on devices. This policy is implemented in
+ the generic block layer and can be used on leaf nodes as well as on
+ higher level logical devices like device mapper.
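As a quick illustration of the throttle interface described above (a sketch
only; the 8:16 device numbers, the 1MB/s value and the mount point are
assumed example values), an upper read limit for one device is written into
the cgroup's blkio.throttle.* files:

    echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device

This caps reads submitted by tasks in that cgroup against device 8:16 at
1048576 bytes per second.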

HOWTO
=====
- Proportional Weight division of bandwidth
- -----------------------------------------
- You can do a very simple testing of running two dd threads in two different
- cgroups. Here is what you can do.
-
- Enable Block IO controller
- CONFIG_BLK_CGROUP=y
-
- Enable group scheduling in CFQ
- CONFIG_CFQ_GROUP_IOSCHED=y
-
- Compile and boot into kernel and mount IO controller (blkio); see
- cgroups.txt, Why are cgroups needed?.
-
- mount -t tmpfs cgroup_root /sys/fs/cgroup
- mkdir /sys/fs/cgroup/blkio
- mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
-
- Create two cgroups
- mkdir -p /sys/fs/cgroup/blkio/test1/ /sys/fs/cgroup/blkio/test2
-
- Set weights of group test1 and test2
- echo 1000 > /sys/fs/cgroup/blkio/test1/blkio.weight
- echo 500 > /sys/fs/cgroup/blkio/test2/blkio.weight
-
- Create two same size files (say 512MB each) on same disk (file1, file2) and
- launch two dd threads in different cgroup to read those files.
-
- sync
- echo 3 > /proc/sys/vm/drop_caches
-
- dd if=/mnt/sdb/zerofile1 of=/dev/null &
- echo $! > /sys/fs/cgroup/blkio/test1/tasks
- cat /sys/fs/cgroup/blkio/test1/tasks
-
- dd if=/mnt/sdb/zerofile2 of=/dev/null &
- echo $! > /sys/fs/cgroup/blkio/test2/tasks
- cat /sys/fs/cgroup/blkio/test2/tasks
-
- At macro level, first dd should finish first. To get more precise data, keep
- on looking at (with the help of script), at blkio.disk_time and
- blkio.disk_sectors files of both test1 and test2 groups. This will tell how
- much disk time (in milliseconds), each group got and how many sectors each
- group dispatched to the disk. We provide fairness in terms of disk time, so
- ideally io.disk_time of cgroups should be in proportion to the weight.
-
Throttling/Upper Limit policy
-----------------------------
- Enable Block IO controller
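A minimal end-to-end throttling test could look like the sketch below (the
mount point, the 8:16 device, the file path and the 1MB/s rate are all
assumed example values):

    mount -t tmpfs cgroup_root /sys/fs/cgroup
    mkdir /sys/fs/cgroup/blkio
    mount -t cgroup -o blkio none /sys/fs/cgroup/blkio

    mkdir /sys/fs/cgroup/blkio/test1
    echo "8:16 1048576" > /sys/fs/cgroup/blkio/test1/blkio.throttle.write_bps_device

    dd if=/dev/zero of=/mnt/sdb/zerofile bs=4K count=10240 oflag=direct &
    echo $! > /sys/fs/cgroup/blkio/test1/tasks

The dd writer placed in test1 should then be limited to roughly 1MB/s, which
can be confirmed by watching test1's blkio.throttle.io_service_bytes and
blkio.throttle.io_serviced files.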
@@ -94,7 +46,7 @@ Throttling/Upper Limit policy
Hierarchical Cgroups
====================

- Both CFQ and throttling implement hierarchy support; however,
+ Throttling implements hierarchy support; however,
throttling's hierarchy support is enabled iff "sane_behavior" is
enabled from cgroup side, which currently is a development option and
not publicly available.
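On kernels that do expose the development behavior, it is selected at mount
time on the cgroup side; a hypothetical invocation (the exact option name
used here, __DEVEL__sane_behavior, is an assumption and may differ between
kernel versions) would be:

    mount -t cgroup -o blkio,__DEVEL__sane_behavior none /sys/fs/cgroup/blkio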
@@ -107,9 +59,8 @@ If somebody created a hierarchy like as follows.
                        |
                        test3

- CFQ by default and throttling with "sane_behavior" will handle the
- hierarchy correctly. For details on CFQ hierarchy support, refer to
- Documentation/block/cfq-iosched.txt. For throttling, all limits apply
+ Throttling with "sane_behavior" will handle the
+ hierarchy correctly. For throttling, all limits apply
to the whole subtree while all statistics are local to the IOs
directly generated by tasks in that cgroup.
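As a concrete sketch of those semantics (the paths, the device numbers and
the placement of test3 under test2 are assumptions for the example):

    mkdir -p /sys/fs/cgroup/blkio/test2/test3
    echo "8:16 2097152" > /sys/fs/cgroup/blkio/test2/blkio.throttle.read_bps_device
    echo $$ > /sys/fs/cgroup/blkio/test2/test3/tasks

With "sane_behavior" in effect, reads issued by the shell now sitting in
test3 are counted against test2's 2MB/s limit as well, while test3's own
stat files only reflect the IO generated inside test3.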
@@ -130,10 +81,6 @@ CONFIG_DEBUG_BLK_CGROUP
- Debug help. Right now some additional stats file show up in cgroup
  if this option is enabled.

- CONFIG_CFQ_GROUP_IOSCHED
- - Enables group scheduling in CFQ. Currently only 1 level of group
- creation is allowed.
-
CONFIG_BLK_DEV_THROTTLING
- Enable block device throttling support in block layer.
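A quick way to check whether a running kernel was built with these options
(the config file location is distribution dependent):

    grep -E 'BLK_CGROUP|BLK_DEV_THROTTLING' /boot/config-$(uname -r)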
@@ -344,32 +291,3 @@ Common files among various policies
- blkio.reset_stats
- Writing an int to this file will result in resetting all the stats
  for that cgroup.
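For example, writing any integer resets the counters of a group (the test1
path is an assumption):

    echo 1 > /sys/fs/cgroup/blkio/test1/blkio.reset_stats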
-
- CFQ sysfs tunable
- =================
- /sys/block/<disk>/queue/iosched/slice_idle
- ------------------------------------------
- On a faster hardware CFQ can be slow, especially with sequential workload.
- This happens because CFQ idles on a single queue and single queue might not
- drive deeper request queue depths to keep the storage busy. In such scenarios
- one can try setting slice_idle=0 and that would switch CFQ to IOPS
- (IO operations per second) mode on NCQ supporting hardware.
-
- That means CFQ will not idle between cfq queues of a cfq group and hence be
- able to driver higher queue depth and achieve better throughput. That also
- means that cfq provides fairness among groups in terms of IOPS and not in
- terms of disk time.
-
- /sys/block/<disk>/queue/iosched/group_idle
- ------------------------------------------
- If one disables idling on individual cfq queues and cfq service trees by
- setting slice_idle=0, group_idle kicks in. That means CFQ will still idle
- on the group in an attempt to provide fairness among groups.
-
- By default group_idle is same as slice_idle and does not do anything if
- slice_idle is enabled.
-
- One can experience an overall throughput drop if you have created multiple
- groups and put applications in that group which are not driving enough
- IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle
- on individual groups and throughput should improve.