Some findings. #1

Closed
netzbasis opened this issue Mar 1, 2019 · 16 comments

@netzbasis

As I wrote on Twitter, this does not compile for the 4.9 kernel; 4.19 works when I "#undef CONFIG_PM_SLEEP" in virtio_vmmci.h.

VIRTIO_F_SR_IOV in virtio_pci_common.h was introduced with 4.18, see https://elixir.bootlin.com/linux/v4.18-rc1/ident/VIRTIO_F_SR_IOV
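
For reference, a guard around a symbol such as VIRTIO_F_SR_IOV usually compares LINUX_VERSION_CODE against KERNEL_VERSION(4, 18, 0). Below is a minimal Python sketch of that arithmetic, assuming the 4.x-era encoding (a << 16) + (b << 8) + c. The helper names are hypothetical and not part of virtio_vmmci, and a distribution kernel may backport features, so a version check is only an approximation.

```python
import re


def kernel_version(major, minor, patch):
    """Mirror the 4.x-era KERNEL_VERSION(a, b, c) macro: (a << 16) + (b << 8) + c."""
    return (major << 16) + (minor << 8) + patch


def parse_release(release):
    """Extract (major, minor, patch) from a `uname -r` string such as '4.19.0-2-amd64'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def has_sr_iov_feature(release):
    """True if the release is at least 4.18, where VIRTIO_F_SR_IOV first appears."""
    return kernel_version(*parse_release(release)) >= kernel_version(4, 18, 0)


# Release strings taken from the kernels mentioned in this thread.
for release in ("4.9.0-6-amd64", "4.19.0-2-amd64", "4.20.13"):
    print(release, has_sr_iov_feature(release))
```

In the C header, the equivalent would be a preprocessor comparison of the same two integers.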

@voutilad
Owner

voutilad commented Mar 1, 2019

Which distro and version are you using? I've been testing with both a 4.15 and a 4.20 kernel under Ubuntu 18.04 and haven't had issues with either yet.

When I get a chance, I'll do some testing with the oldest longterm 4.x version, which as of today is 4.4.176.

@netzbasis
Author

I'm now running Debian testing (buster), which uses kernel 4.19.0. Just running make gives:
root@debian:~/virtio_vmmci# make
make -C /lib/modules/4.19.0-2-amd64/build M=/root/virtio_vmmci modules
make[1]: Entering directory '/usr/src/linux-headers-4.19.0-2-amd64'
CC [M] /root/virtio_vmmci/virtio_vmmci.o
CC [M] /root/virtio_vmmci/virtio_pci_openbsd.o
CC [M] /root/virtio_vmmci/virtio_pci_common.o
/root/virtio_vmmci/virtio_pci_common.c:256:16: error: ‘virtio_pci_pm_ops’ undeclared here (not in a function); did you mean ‘virtio_pci_probe’?
.driver.pm = &virtio_pci_pm_ops,
^~~~~~~~~~~~~~~~~
virtio_pci_probe
make[4]: *** [/usr/src/linux-headers-4.19.0-2-common/scripts/Makefile.build:309: /root/virtio_vmmci/virtio_pci_common.o] Error 1
make[3]: *** [/usr/src/linux-headers-4.19.0-2-common/Makefile:1540: module/root/virtio_vmmci] Error 2
make[2]: *** [Makefile:146: sub-make] Error 2
make[1]: *** [Makefile:8: all] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-4.19.0-2-amd64'
make: *** [Makefile:7: all] Error 2

I have not done a kernel build, only installed the src. So maybe some autogenerated file is missing on my system.

@joe9

joe9 commented Mar 2, 2019

Same error on a 4.20.13 Debian kernel.

Commenting out this line helps: .driver.pm = &virtio_pci_pm_ops,

It appears that virtio_pci_pm_ops is not defined.

~/virtio_vmmci$ make
make -C /lib/modules/4.20.13/build M=/home/j/virtio_vmmci modules
make[1]: Entering directory '/usr/src/linux-headers-4.20.13'
CC [M] /home/j/virtio_vmmci/virtio_vmmci.o
CC [M] /home/j/virtio_vmmci/virtio_pci_openbsd.o
CC [M] /home/j/virtio_vmmci/virtio_pci_common.o
/home/j/virtio_vmmci/virtio_pci_common.c:256:16: error: ‘virtio_pci_pm_ops’ undeclared here (not in a function)
.driver.pm = &virtio_pci_pm_ops,
^~~~~~~~~~~~~~~~~
scripts/Makefile.build:291: recipe for target '/home/j/virtio_vmmci/virtio_pci_common.o' failed
make[2]: *** [/home/j/virtio_vmmci/virtio_pci_common.o] Error 1
Makefile:1562: recipe for target 'module/home/j/virtio_vmmci' failed
make[1]: *** [module/home/j/virtio_vmmci] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-4.20.13'
Makefile:7: recipe for target 'all' failed
make: *** [all] Error 2

@voutilad
Owner

voutilad commented Mar 2, 2019

I've added some more macro #if/#ifdef guards in a commit on my next branch. I still need to test it on my other VMs, but it seems to work on my buster VM:

Linux buster 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux

It should also fix the CONFIG_PM_SLEEP issue; I had forgotten that I ripped out the handler functions.

@netzbasis
Author

The next branch works for me, except for the typo at virtio_pci_common.h:139:

LINUX_KERNEL_VERSION --> LINUX_VERSION_CODE

@joe9

joe9 commented Mar 2, 2019

@netzbasis next works for me as-is. I do not see the typo mentioned by you.

This is with the 4.9.0-6-amd64 kernel on Devuan:

p:~/virtio_vmmci-next$ make
make -C /lib/modules/4.9.0-6-amd64/build M=/home/j/virtio_vmmci-next modules
make[1]: Entering directory '/usr/src/linux-headers-4.9.0-6-amd64'
CC [M] /home/j/virtio_vmmci-next/virtio_vmmci.o
CC [M] /home/j/virtio_vmmci-next/virtio_pci_openbsd.o
CC [M] /home/j/virtio_vmmci-next/virtio_pci_common.o
LD [M] /home/j/virtio_vmmci-next/virtio_pci_obsd.o
Building modules, stage 2.
MODPOST 2 modules
CC /home/j/virtio_vmmci-next/virtio_pci_obsd.mod.o
LD [M] /home/j/virtio_vmmci-next/virtio_pci_obsd.ko
CC /home/j/virtio_vmmci-next/virtio_vmmci.mod.o
LD [M] /home/j/virtio_vmmci-next/virtio_vmmci.ko
make[1]: Leaving directory '/usr/src/linux-headers-4.9.0-6-amd64'

@joe9

joe9 commented Mar 2, 2019

@voutilad what do you think of removing this message?

[ 11.588721] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock
[ 16.609035] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock

Also, what do you think of doing this every second?

Thanks for the quick fix.

@voutilad
Owner

voutilad commented Mar 3, 2019

@netzbasis Typo is fixed on next branch. If it compiles cleanly now I'll merge it to master.

@joe9 I'm going to clean up all the logging and add a sysctl-exposed flag for changing log levels.

As for the drift setting, now that virtio_vmmci is handling interrupts from the host, I'm going to work to achieve parity with OpenBSD's vmmci(4) driver.

The current virtio_vmmci behavior differs from OpenBSD's vmmci(4). The OpenBSD driver measures drift at a regular interval (currently 15s), but it doesn't take action on that drift; it only exposes the data as a sensor readable via sysctl. When vmmci(4) receives a command to sync its RTC, it then calls inittodr(9) to sync system time to the emulated hardware time.

You might notice that the RTC values output via timedatectl (if on a systemd-based system) tend to always be accurately tied to the host...but it's the system time that drifts. I only discovered this after my initial work.

If I'm going to achieve parity with vmmci(4), I need to instead be tracking the drift at a regular interval and only updating system time when told by the host to do so. If a guest has regular clock drift without it being related to suspend/hibernate of the host, my current method of synchronization every N seconds is only a bandage over a bigger problem.
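
To make the intended split concrete, here is a toy Python model of "measure on a timer, correct only on a host command". It is a sketch under stated assumptions, not the driver's code: the class and method names are hypothetical, and times are plain floats in seconds.

```python
class DriftMonitor:
    """Toy model of the vmmci(4)-style behavior described above (hypothetical names)."""

    def __init__(self, system_time=0.0):
        self.system_time = system_time  # guest system clock, in seconds
        self.last_drift = 0.0           # latest measurement, i.e. the "sensor" value
        self.sync_count = 0             # how many host-initiated syncs were applied

    def measure(self, rtc_time):
        """Periodic tick: record drift against the emulated RTC without correcting it."""
        self.last_drift = rtc_time - self.system_time
        return self.last_drift

    def on_sync_command(self, rtc_time):
        """Host-initiated sync: step the system time to the emulated RTC."""
        self.system_time = rtc_time
        self.last_drift = 0.0
        self.sync_count += 1


monitor = DriftMonitor(system_time=100.0)
monitor.measure(rtc_time=107.5)          # drift is observed and exposed only
print(monitor.last_drift, monitor.system_time)
monitor.on_sync_command(rtc_time=107.5)  # e.g. sent after the host resumes
print(monitor.last_drift, monitor.system_time)
```

The current every-N-seconds approach would correspond to calling on_sync_command from inside measure whenever the drift exceeds a threshold.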

@joe9

joe9 commented Mar 3, 2019

@voutilad Thanks for the explanation.

> If a guest has regular clock drift without it being related to suspend/hibernate of the host, my current method of synchronization every N seconds is only a bandage over a bigger problem.

Yes, this is what is happening on my Devuan VMs (no systemd, using sysvinit), and unfortunately this bandage is the only solution that works. ntpd cannot handle this except at startup.

@voutilad
Owner

voutilad commented Mar 3, 2019

@joe9 are you setting clocksource=tsc in your kernel boot parameters? You shouldn't see much clock drift, if any. If you're seeing drift, even under CPU load, chances are the Linux guest kernel is using jiffies or refined-jiffies as the clocksource and not tsc. (Check /sys/devices/system/clocksource/clocksource0/current_clocksource and make sure it's reporting tsc. If it doesn't, also check /sys/devices/system/clocksource/clocksource0/available_clocksource to make sure tsc is even listed.)
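
Those two checks can be scripted. This is a minimal Python sketch that reads the two sysfs files named above; the function names and the wording of the messages are my own, and the directory is a parameter only so the logic can be tried against copies of the files.

```python
from pathlib import Path

SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def clocksource_status(sysfs_dir=SYSFS_DIR):
    """Return (current, available) clocksources as reported by sysfs."""
    base = Path(sysfs_dir)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def diagnose(current, available):
    """Summarize the two checks: is tsc active, and if not, is it even listed?"""
    if current == "tsc":
        return "ok: tsc is the active clocksource"
    if "tsc" in available:
        return f"tsc is available but {current} is active"
    return f"tsc is not listed; {current} is active"


if __name__ == "__main__":
    try:
        print(diagnose(*clocksource_status()))
    except OSError as err:
        # Not a Linux guest, or sysfs is not mounted where expected.
        print(f"could not read clocksource info: {err}")
```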

@joe9

joe9 commented Mar 3, 2019

@voutilad I cannot set clocksource=tsc. I tried setting 'clocksource=tsc' as a kernel boot parameter, but on boot the system still switches to refined-jiffies. Below is what dmesg says about it.


sudo dmesg | grep clocksource
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.036434] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.037731] clocksource: Switched to clocksource refined-jiffies
[    2.060123] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x13f89e3a839f, max_idle_ns: 440795697359 ns
[    2.060126] clocksource: Switched to clocksource tsc
[    3.134974] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[    3.134975] clocksource:                       'refined-jiffies' wd_now: fffedd50 wd_last: fffedcd0 mask: ffffffff
[    3.134975] clocksource:                       'tsc' cs_now: 4eb19288c2 cs_last: 4ca93e3f38 mask: ffffffffffffffff
[    3.136303] clocksource: Switched to clocksource refined-jiffies


$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
refined-jiffies jiffies tsc 

# echo tsc >/sys/devices/system/clocksource/clocksource0/current_clocksource

# sudo dmesg | tail
[  523.394816] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock
[  527.332747] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock
[  531.238387] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock
[  535.133299] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock
[  539.034377] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock
[  542.967622] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock
[  546.897944] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock
[  549.866306] clocksource: Switched to clocksource tsc
[  550.755845] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock
[  554.647641] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock
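
The clocksource transitions in output like the above can be pulled out mechanically. Below is a minimal Python sketch, assuming the dmesg line format shown here (a bracketed uptime followed by the message); the regular expressions and the function name are illustrative and only cover the two message types seen in this thread.

```python
import re

LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
SWITCH_RE = re.compile(r"clocksource: Switched to clocksource (\S+)")
UNSTABLE_RE = re.compile(r"Marking clocksource '([^']+)' as unstable")


def clocksource_events(dmesg_text):
    """Return a list of (uptime_seconds, event, clocksource_name) tuples."""
    events = []
    for line in dmesg_text.splitlines():
        parsed = LINE_RE.match(line.strip())
        if parsed is None:
            continue  # not a timestamped kernel log line
        uptime, message = float(parsed.group(1)), parsed.group(2)
        switched = SWITCH_RE.search(message)
        unstable = UNSTABLE_RE.search(message)
        if switched:
            events.append((uptime, "switched", switched.group(1)))
        elif unstable:
            events.append((uptime, "unstable", unstable.group(1)))
    return events


# Four lines copied from the dmesg output above.
SAMPLE = """
[    0.037731] clocksource: Switched to clocksource refined-jiffies
[    2.060126] clocksource: Switched to clocksource tsc
[    3.134974] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[    3.136303] clocksource: Switched to clocksource refined-jiffies
"""

for event in clocksource_events(SAMPLE):
    print(event)
```

On the sample, this reports the switch to tsc at about 2.06s, the watchdog marking it unstable at about 3.13s, and the fallback to refined-jiffies right after.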

@netzbasis
Author

OK for merging next branch. It works great.

@voutilad
Owner

voutilad commented Mar 3, 2019

@netzbasis thanks, I'll merge shortly. Thanks for help testing! I never thought virtio would have so many changes in the v4.x series.

@joe9 I've seen that on kernels after v4.15, hence in my v4.20 fork I reverted the tsc code (mostly in arch/x86/kernel/) to versions from ~v4.15. I'm assuming, from what you wrote in the other issue, that you're using a stock 4.20 kernel that contains the refinement/calibration logic I reverted in my fork.

I haven't had the time to investigate what is going on with tsc after v4.15, but so far I've seen no issues using my 4.20.12 kernel. Something in how the kernel tries to calibrate tsc fails; it appears the kernel then deems tsc unreliable and falls back to jiffies, which causes the drift.

You can either try my kernel fork if you need a more recent (i.e. 4.20) kernel, or try the tsc=reliable kernel boot parameter. (I didn't have much success with tsc=reliable, hence my kernel modifications.)
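
To confirm that boot parameters such as clocksource=tsc or tsc=reliable actually reached the kernel, the command line can be inspected. Here is a minimal Python sketch, assuming a simple whitespace-separated command line (quoted values containing spaces are not handled); the function name is hypothetical and the sample string is only an illustration.

```python
def parse_cmdline(cmdline):
    """Parse a kernel command line into a dict.

    'key=value' entries map key to value; bare flags such as 'quiet' map to True.
    If a key is repeated, the last occurrence wins.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as handle:
            params = parse_cmdline(handle.read())
    except OSError:
        # Fall back to an illustrative command line when /proc is unavailable.
        params = parse_cmdline("ro quiet clocksource=tsc tsc=reliable")
    print("clocksource =", params.get("clocksource"))
    print("tsc =", params.get("tsc"))
```

This only shows what was requested at boot; the active clocksource still has to be checked separately, since the kernel can switch away from tsc later.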

@joe9

joe9 commented Mar 3, 2019

@voutilad Now, I am using the next branch on 4.9.0-6-amd64

Earlier, when virtio_vmmci did not work on older kernels, I was trying it on the 4.20.13 kernel.

Thanks for the tip on tsc=reliable. It does not seem to help though. Any other ideas?

joe@twsp:~$ sudo dmesg | grep clock
[sudo] password for joe: 
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.9.0-6-amd64 root=UUID=93ce8d9f-aabd-4400-bf29-36817db0aae9 ro quiet console=ttyS0,115200 clocksource=tsc tsc=reliable
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.9.0-6-amd64 root=UUID=93ce8d9f-aabd-4400-bf29-36817db0aae9 ro quiet console=ttyS0,115200 clocksource=tsc tsc=reliable
[    0.036445] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.037831] clocksource: Switched to clocksource refined-jiffies
[    0.409258] rtc_cmos rtc_cmos: setting system clock to 2019-03-03 22:38:18 UTC (1551652698)
[    1.258777] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1266f82e1c73, max_idle_ns: 440795386830 ns
[    1.258788] clocksource: Switched to clocksource tsc
[   10.604166] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock
joe@twsp:~$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc 
joe@twsp:~$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
joe@twsp:~$ sudo dmesg | tail
[sudo] password for joe: 
[   10.460520] Adding 262140k swap on /swapfile.  Priority:-1 extents:2 across:270332k FS
[   10.596955] virtio_pci_obsd: loading out-of-tree module taints kernel.
[   10.597057] virtio_pci_obsd_match: matching 0x0777
[   10.597057] virtio_pci_obsd_match: found OpenBSD device
[   10.598595] virtio_vmmci: started VMM Control Interface driver
[   10.604166] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock
[   15.791025] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock
[   20.907779] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock
[   26.029536] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock
[   31.151382] virtio_vmmci: detected drift greater than 5 seconds, synchronizing clock
joe@twsp:~$ uname -r
4.9.0-6-amd64
joe@twsp:~$ uname -a
Linux twsp 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux

@voutilad
Owner

voutilad commented Mar 3, 2019

@joe9 the only thing I can think of is that it could depend on whether you're running OpenBSD 6.4 or a recent snapshot. There could also be differences between Intel (VT-x) and AMD (SVM) on your host machine.

@netzbasis netzbasis reopened this Mar 18, 2019
@netzbasis
Author

Closing because it works great for me.
