Add Dom0 mem_usage alert #1836

Merged
merged 5 commits into from Jul 22, 2014

2 participants
Member

zli commented Jul 22, 2014

No description provided.

zli Update the default priority setting in the comment section to reflect the changes of PR-1455.

Signed-off-by: Zheng Li <dev@zheng.li>
53e5c72

thomassa and 1 other commented on an outdated diff Jul 22, 2014

scripts/perfmon
@@ -359,6 +359,27 @@ def get_percent_fs_usage(ignored):
     # strip of % character and convert to float
     return float(percentage[0:-1])/100.0
+def get_percent_mem_usage(ignored):
+    "Get the percent usage of Dom0 memory/swap. Input list is ignored and should be empty"
+    try:
+        memfd = open('/proc/meminfo', 'r')
+        memlist = memfd.readlines()
+        memfd.close()
+        memdict = [ m.split(':', 1) for m in memlist ]
+        memdict = dict([(k.strip(), float(re.search('\d+', v.strip()).group(0))) for (k,v) in memdict])
+        # We consider the sum of res memory and swap in use as the hard demand
+        # of mem usage, it is bad if this number is beyond the physical mem, as
+        # in such case swapping is obligatory rather than voluntary, hence
+        # degragating the performance. We define the percentage metrics as

thomassa Jul 22, 2014

Member

Spelling: "degrading" (not "degragating")


zli Jul 22, 2014

Member

Updated.
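The perfmon hunk above parses /proc/meminfo into a dict and treats resident memory plus swap in use as the "hard demand" on Dom0 memory. The diff excerpt stops before the percentage metric is defined, so the following is only a Python 3 sketch under a stated assumption: resident memory is approximated as MemTotal - MemFree - Buffers - Cached, and the ratio is that demand divided by MemTotal. It takes the file contents as a string so it can be tried without a real Dom0, and it uses a raw-string regex; on a live host you would pass the result of reading /proc/meminfo inside a `with` block.

```python
import re


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}.

    Keeps the first integer on each line (kB for most fields).
    """
    info = {}
    for line in text.splitlines():
        if ':' not in line:
            continue
        key, rest = line.split(':', 1)
        match = re.search(r'\d+', rest)
        if match:
            info[key.strip()] = float(match.group(0))
    return info


def hard_mem_demand_ratio(info):
    # ASSUMPTION: the PR's exact formula is truncated in the review excerpt;
    # "resident" memory is approximated here as MemTotal minus free, buffer
    # and page-cache memory.
    resident = (info['MemTotal'] - info['MemFree']
                - info.get('Buffers', 0.0) - info.get('Cached', 0.0))
    swap_used = info.get('SwapTotal', 0.0) - info.get('SwapFree', 0.0)
    # Above 1.0 means demand exceeds physical memory, so swapping is forced.
    return (resident + swap_used) / info['MemTotal']


# Hypothetical sample values, not taken from a real XenServer host.
sample = """MemTotal:         786432 kB
MemFree:           65536 kB
Buffers:           32768 kB
Cached:           131072 kB
SwapTotal:        524288 kB
SwapFree:         458752 kB
"""

info = parse_meminfo(sample)
print(round(hard_mem_demand_ratio(info), 4))  # 0.7917
```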

thomassa and 1 other commented on an outdated diff Jul 22, 2014

scripts/mail-alarm
+        if not alarm_trigger_level: alarm_trigger_level = 1.0
+        if cls != 'VM':
+            raise Exception, "programmer error - this alarm should only be available for control domain VM"
+        self.params = get_VM_params(obj_uuid)
+        self.cls = cls
+        self.value = value
+        self.alarm_trigger_level = alarm_trigger_level
+
+    def generate_subject(self):
+        pool_name = get_pool_name()
+        return '[%s] XenServer Alarm: Dom0 memory demand is high on "%s"' % (pool_name, self.params['name_label'])
+
+    def generate_body(self):
+        return \
+            'The memory demand on "%s" is about %.1f%% of the physical memory of the domain. ' \
+            'Occasional performance degradation can be expected when memory swapping is enforced to happen.\n' \

thomassa Jul 22, 2014

Member

Should be "forced" not "enforced".
Does this understate the severity of the problem? I suspect "occasional performance degradation" could easily snowball into something much worse (and we have only 512 MiB of swap space).


zli Jul 22, 2014

Member

Typo fixed.

I don't think there is a best config for this, though I believe it makes some sense for the current setup of our system (with a comparatively small swap). I've also updated the threshold to 95%, which means the alarm will be triggered when the hard memory demand approaches the physical memory of the system, in case swap becomes negligible in future systems.
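A minimal Python 3 sketch of the mail-alarm text generator discussed in this thread, using the corrected "forced" wording and a 95% trigger level. `Dom0MemUsageAlarm` is a hypothetical stand-in: it takes the pool name and the domain's name_label as arguments instead of calling `get_pool_name()` and `get_VM_params()`, and it assumes `value` is a fraction (0.97 meaning 97%), which the diff excerpt does not show. The `should_fire` check (value at or above the trigger level) is likewise an assumption about how the threshold is compared, not perfmon's actual logic.

```python
class Dom0MemUsageAlarm:
    """Hypothetical stand-in for the Dom0 memory alarm text generator."""

    def __init__(self, pool_name, name_label, value, alarm_trigger_level=None):
        # Mirrors the diff: a missing trigger level falls back to 1.0.
        if not alarm_trigger_level:
            alarm_trigger_level = 1.0
        self.pool_name = pool_name
        self.name_label = name_label
        self.value = value  # ASSUMPTION: a fraction, e.g. 0.97 for 97%
        self.alarm_trigger_level = alarm_trigger_level

    def should_fire(self):
        # ASSUMPTION: fire once the hard demand reaches the trigger level.
        return self.value >= self.alarm_trigger_level

    def generate_subject(self):
        return '[%s] XenServer Alarm: Dom0 memory demand is high on "%s"' % (
            self.pool_name, self.name_label)

    def generate_body(self):
        # Adjacent literals are joined before % is applied to the whole string.
        return ('The memory demand on "%s" is about %.1f%% of the physical '
                'memory of the domain. Occasional performance degradation can '
                'be expected when memory swapping is forced to happen.\n'
                % (self.name_label, self.value * 100))


alarm = Dom0MemUsageAlarm('pool-a', 'dom0-host1', 0.97, 0.95)
if alarm.should_fire():
    print(alarm.generate_subject())
    print(alarm.generate_body())
```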

Member

thomassa commented Jul 22, 2014

Looks fine except for tiny cosmetic things.
Probably worth fixing the text that goes into the alarm email.

zli added some commits Jul 21, 2014

zli CP-9091: Add Dom0 mem_usage alert for perfmon
Signed-off-by: Zheng Li <dev@zheng.li>
43d6c0f
zli CP-9093: decrease the inhibit period of fs_usage alert
It seems that the inhibit period of the fs_usage alert is set too long (one week).
As a consequence, the alerts are sent out much less frequently
(not conveying as much urgency as we want). It will also mute the alarm for a whole
week even if a different occurrence shows up later in that period
(after the first occurrence has already been addressed).

Signed-off-by: Zheng Li <dev@zheng.li>
2773824
zli CP-9091: Add mem_usage logic in mail-alarm
Signed-off-by: Zheng Li <dev@zheng.li>
d8c640a
zli CP-9093: fix a debug message error
Signed-off-by: Zheng Li <dev@zheng.li>
b10dbb8
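The inhibit-period trade-off described in CP-9093 can be illustrated with a small helper. `AlarmInhibitor` is hypothetical and is not part of perfmon or mail-alarm; it simply mutes repeat alerts for the same key within a configurable window. The commit does not state the new period, so the one-hour value below is only an example.

```python
import time


class AlarmInhibitor:
    """Mute repeat alerts for the same key within inhibit_period seconds."""

    def __init__(self, inhibit_period, clock=time.monotonic):
        self.inhibit_period = inhibit_period
        self.clock = clock
        self.last_sent = {}  # alarm key -> time the last alert was sent

    def should_send(self, key):
        now = self.clock()
        last = self.last_sent.get(key)
        if last is not None and now - last < self.inhibit_period:
            return False  # still inside the inhibit window: stay muted
        self.last_sent[key] = now
        return True


# A fake clock makes the comparison deterministic.
t = [0.0]
week = AlarmInhibitor(7 * 24 * 3600, clock=lambda: t[0])
hour = AlarmInhibitor(3600, clock=lambda: t[0])  # hypothetical shorter period
print(week.should_send('fs_usage'), hour.should_send('fs_usage'))  # True True
t[0] = 2 * 24 * 3600  # a new occurrence two days later
print(week.should_send('fs_usage'), hour.should_send('fs_usage'))  # False True
```

With the one-week window, the new occurrence two days later is muted; with the shorter window it is reported.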

thomassa added a commit that referenced this pull request Jul 22, 2014

thomassa Merge pull request #1836 from zli/xenserver/master/CP-9091
Add Dom0 mem_usage alert
6cba889

thomassa merged commit 6cba889 into xapi-project:master Jul 22, 2014

1 check passed

default Merged build finished.