guest_metrics_last_updated not updated correctly #4237

Open
olivierlambert opened this issue Oct 23, 2020 · 4 comments

Comments

@olivierlambert

olivierlambert commented Oct 23, 2020

Hello everyone,

Here is a report on something that might be interesting.

Context 💡

I was investigating the possibility of offering VMware-like app/VM watchdog capabilities. With Xen Orchestra, we could use some XAPI values to take action when needed. Because we are able to replicate VMs, we could decide to start some of them if the original VM is frozen or dead (remember XO is "on top of XAPI", meaning we can make decisions globally across multiple pools, which is great for leveraging your whole XAPI-enabled host infrastructure).

Thanks to the XAPI doc (I can never emphasize this enough: it is our bible here! Thanks for it! 👍), I found interesting fields in the VM_guest_metrics class:

  • live bool [RO/runtime] > True if the guest is sending heartbeat messages via the guest agent
  • last_updated datetime [RO/runtime] > Time at which this information was last updated

Perfect tool for the job! 😄

The problem 🐛

So I started to investigate when live would become false, by freezing the VM (e.g. with a simple xl pause $DOMID). Weirdly enough, nothing happened: it always stayed at true.

But that wasn't all: last_updated returned an old value (a few minutes, hours, or even days old for some VMs), which is not what it is advertised to do. So I decided to compare directly by reading xenstore myself.

So for example:

# xenstore-ls /local/domain/<dom ID> | grep updated
 updated = "Thu Oct 22 19:44:54 2020"

I ran the command multiple times, and the value was clearly updated about every minute (which is the expected behavior) 👍

But in XAPI, nothing changed:

# xe vm-param-get uuid=<VM UUID> param-name=guest-metrics-last-updated 
20201022T15:43:24Z

I double- and triple-checked: the value was indeed refreshed in xenstore, but not in XAPI. So there are two problems:

  • the live value does not seem to be computed correctly (or does not do what the doc says it should)
  • last_updated isn't updated at the same pace as xenstore, again contrary to what the doc says
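
The comparison above can be scripted. The following is a minimal sketch, not part of the original report: it parses the two timestamp formats shown (the xenstore `updated` string and the `xe` output) and returns how far XAPI lags behind xenstore. The function names are hypothetical, and because the xenstore string carries no timezone, the UTC offset is an assumption the caller must supply.

```python
from datetime import datetime, timedelta, timezone

XENSTORE_FMT = "%a %b %d %H:%M:%S %Y"  # e.g. "Thu Oct 22 19:44:54 2020"
XAPI_FMT = "%Y%m%dT%H:%M:%SZ"          # e.g. "20201022T15:43:24Z"


def parse_xenstore_updated(value, utc_offset_hours=0.0):
    """Parse the xenstore `updated` string.

    The string has no timezone, so utc_offset_hours is an assumption
    supplied by the caller (0.0 means "treat it as UTC").
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.strptime(value, XENSTORE_FMT).replace(tzinfo=tz)


def parse_xapi_last_updated(value):
    """Parse the value printed by `xe vm-param-get` (UTC, 'Z' suffix)."""
    return datetime.strptime(value, XAPI_FMT).replace(tzinfo=timezone.utc)


def xapi_lag(xenstore_value, xapi_value, utc_offset_hours=0.0):
    """Return how far XAPI's last_updated is behind xenstore's `updated`."""
    xs = parse_xenstore_updated(xenstore_value, utc_offset_hours)
    return xs - parse_xapi_last_updated(xapi_value)


if __name__ == "__main__":
    print(xapi_lag("Thu Oct 22 19:44:54 2020", "20201022T15:43:24Z"))
```

A lag that keeps growing across repeated samples, rather than resetting about every minute, would match the behavior described here.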

Tests 🧪

I did tests on 8.1 and 8.2, same outcome. I have the feeling it's affecting master too.

@robhoes
Member

robhoes commented Oct 23, 2020

Looking at 4d1b51c, it appears that the VM_guest_metrics.live field has effectively become obsolete (always true). We should update the API docs to reflect that.

I would expect last_updated to be set to the time when at least one of the VM_guest_metrics fields has been updated (e.g. PV_drivers_version, os_version, networks, ...). So that depends on the fields that xapi reflects. It is not the same as updated in xenstore. Try writing to one of the other keys.

So there isn't really a "heartbeat" that you can get through the guest metrics. It's not really a good mechanism for that sort of thing; due to the overhead, it doesn't scale well. RRDs may be more useful for what you are trying to do.

@olivierlambert
Author

olivierlambert commented Oct 23, 2020

@robhoes thanks for the answer. However:

  • using RRDs is OK, but it's not enough. VMware does this with both agent liveness and VM metrics for a reason (see below)
  • if you use both, you get a more fine-grained situation report: e.g. you have some RRD info but no liveness signal for some reason (very high usage, not enough CPU time in the guest, etc.)

Also, having a freshness indicator from the guest would allow application monitoring (your app writes to xenstore every minute, alongside the guest agent). This way, you have complete control over what's going on from the API point of view (for example, a report that the app isn't sending heartbeats but the VM is might not trigger the same action as a report that the VM isn't sending anything).

That's why exposing live xenstore data through XAPI is interesting for "higher level" applications.
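
To illustrate the distinction drawn above, here is a minimal sketch of how a higher-level tool might combine the two signals. Everything in it is hypothetical: the state names, the `classify` function, and the 180-second threshold are illustrative choices, not anything XAPI or Xen Orchestra provides.

```python
from enum import Enum
from typing import Optional


class Health(Enum):
    HEALTHY = "healthy"              # agent and app both report in time
    APP_STALLED = "app_stalled"      # guest agent alive, app heartbeat stale
    AGENT_STALLED = "agent_stalled"  # app alive, guest agent heartbeat stale
    GUEST_SILENT = "guest_silent"    # nothing recent from the guest at all


def classify(agent_age_s: Optional[float],
             app_age_s: Optional[float],
             max_age_s: float = 180.0) -> Health:
    """Classify a VM from the age (in seconds) of its last heartbeats.

    None means no heartbeat was ever seen. max_age_s is an assumed
    threshold: three missed one-minute updates.
    """
    def fresh(age: Optional[float]) -> bool:
        return age is not None and age <= max_age_s

    agent_ok, app_ok = fresh(agent_age_s), fresh(app_age_s)
    if agent_ok and app_ok:
        return Health.HEALTHY
    if agent_ok:
        return Health.APP_STALLED
    if app_ok:
        return Health.AGENT_STALLED
    return Health.GUEST_SILENT
```

Each state could then map to a different action, for instance restarting a service for `APP_STALLED` versus starting a replicated VM for `GUEST_SILENT`.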

@robhoes
Member

robhoes commented Oct 23, 2020

RRD data sources can come from xenstore. This is, for example, how the memory RRDs work. So if a guest agent is writing heartbeats to xenstore, then an RRD could be created to watch this. Doing this through the xapi database does not scale, and this is why RRDs were invented in the first place.
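
As a rough sketch of the guest side of that idea only: an application inside the VM could record a heartbeat by calling the `xenstore-write` CLI. The key name `data/app-heartbeat` is hypothetical, and the sketch assumes the guest has the xenstore client tools installed and is allowed to write under its `data/` subtree. The host-side RRD data source that would consume the key is not shown, since the thread does not describe how to build one.

```python
import subprocess
import time
from typing import List

# Hypothetical key; not a name defined by XAPI or the guest agent.
HEARTBEAT_KEY = "data/app-heartbeat"


def heartbeat_command(now_epoch: int, key: str = HEARTBEAT_KEY) -> List[str]:
    """Build the xenstore-write invocation that records one heartbeat."""
    return ["xenstore-write", key, str(now_epoch)]


def send_heartbeat(key: str = HEARTBEAT_KEY) -> None:
    """Write the current epoch time to xenstore from inside the guest.

    Raises CalledProcessError if xenstore-write exits with a failure.
    """
    subprocess.run(heartbeat_command(int(time.time()), key), check=True)
```

An application would call `send_heartbeat()` on a timer (for example once a minute), so that whatever reads the key on the host can tell a stalled application from a healthy one.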

@olivierlambert
Author

That would be a fair solution indeed 👍 Do you have any doc/tutorial on how to build such a thing?

Speaking of RRDs, this would be a good opportunity to move away from XML to something cheaper to parse (in terms of CPU cost). But that's another topic 😄
