ZOS disastrous performance on PCIE 4 NVME SSD #1467
Comments
New tests made on zos v3.0.1-rc3: better, but still way below what I should get. Tests are done on the rootfs of an Ubuntu zMachine:
Note the performance regression on sequential write... Tests on disks added to the zMachine and mounted on /data:
It doesn't make sense! An added disk should get native NVMe SSD performance, so there is clearly a problem somewhere! Could someone please explain how the storage framework on zos v3 works?
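For reference, the runs above are of this general shape; the block sizes, file size and queue depth below are assumptions for illustration, not the exact parameters used to produce the numbers above.

```sh
# Sketch of typical fio runs against the extra disk mounted on /data
# (parameters are assumed, not the reporter's exact commands).

# Sequential write, large blocks:
fio --name=seqwrite --filename=/data/fio.test --rw=write --bs=1M --size=4G \
    --ioengine=libaio --iodepth=32 --direct=1 --group_reporting

# Random 4k read on the same file:
fio --name=randread --filename=/data/fio.test --rw=randread --bs=4k --size=4G \
    --ioengine=libaio --iodepth=32 --direct=1 --group_reporting
```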
@maxux please take a look at it
Just to be clear about this part, what you mean is that you mounted a
For V3 all container workloads are virtualized, which means all IO actually goes through the virtio driver. This explains the drop in performance. What happens behind the scenes for V3:
So IO operations go through this,
Of course there is a lot of room for improvement, for example using logical volumes on the host so write operations are sent directly to the physical disk and not to another btrfs layer.
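To make that suggestion concrete, here is a minimal sketch of the difference between a guest disk backed by a file on the host btrfs pool and one backed by a raw logical volume. The paths, volume group name and the qemu invocation are illustrative assumptions; zos uses its own hypervisor tooling.

```sh
# Today (simplified): the guest's virtio disk is a file on the host btrfs pool,
# so every guest write also passes through the host btrfs layer.
# (Rest of the VM configuration omitted for brevity.)
qemu-system-x86_64 -enable-kvm -m 2048 \
    -drive file=/mnt/pool/vm-data.raw,format=raw,if=virtio

# Suggested improvement: hand the guest a raw logical volume carved on the host,
# so writes reach the physical disk without going through another btrfs layer.
lvcreate -L 100G -n vm-data vg0
qemu-system-x86_64 -enable-kvm -m 2048 \
    -drive file=/dev/vg0/vm-data,format=raw,if=virtio
```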
Yes, you got it
Thanks for the explanation. Indeed, the architectural choice you made is not the best for IO performance! It would be great to allow logical volume creation and mounting inside the VMs (at least for power users who'd like to get all the performance from their hardware). I would be glad to be a tester for this use case!
If I get it correctly, every ZOS deployment will be a VM in v3 (like k3s), and containers should be deployed inside the virtualized k3s?
Yes, ZOS has a unified workload type called zmachine. When you start a k8s node on zos, it's basically a well-crafted "flist" with k8s already configured and ready to start. For ZOS it's just another VM that it runs the same way as a container (this makes the code much simpler).
Which image do you run exactly? Default zos runs kernel 5.4; a 5.10 kernel is also available.
My first post was done with kernel 5.4 on grid v2. Node id is 68, IP is 2a02:842a:84c8:c601:d250:99ff:fedf:924d (ICMP is blocked, but the IPv6 firewall allows everything else).
I confirm, your node is running the 5.10.55 kernel, which is the latest we support officially.
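(If anyone wants to verify this themselves, the running kernel can be checked from a shell on the node or inside the zmachine.)

```sh
# Print the running kernel release; the node above reports 5.10.55.
uname -r
```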
FYI I automated my fio tests and launched them simultaneously on X Ubuntu VMs. With up to 4 VMs, each VM gets exactly the same results as a run with only 1 VM. I see a degradation of performance per VM when I launch the test on 8 VMs. My guess is that it is a virtio limitation; it could be good to know if you make some performance tweaks someday. Still, sequential write is disastrous with virtio, and I don't have a clue why...
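As an illustration, an automated run of that kind could look roughly like the sketch below; the hostnames, job parameters and output path are assumptions, not the actual script used.

```sh
# Hypothetical sketch: launch the same fio job on several Ubuntu VMs at once,
# then compare the per-VM results. Hostnames and parameters are assumed.
job="fio --name=seqwrite --filename=/data/fio.test --rw=write --bs=1M --size=4G --ioengine=libaio --iodepth=32 --direct=1 --output=/tmp/fio-seqwrite.log"

for vm in vm1 vm2 vm3 vm4; do
    ssh "$vm" "$job" &
done
wait  # afterwards, fetch /tmp/fio-seqwrite.log from each VM and compare
```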
This will have to wait, we have other things to do first.
Hello Team, can we have an update on this, please? |
Hello, following this forum post with no news for a month, I thought it would be a better idea to create this issue here.
Quick summary: ZOS is having a terrible PCIe 4 SSD performance issue. Here are some fio test results on the current ZOS:
I made some tests on the same machine with Ubuntu 20 and kernel 5.4: same results.
Fortunately, performance is very good on Ubuntu when switching to 5.10.x kernels 👍
This answer was given to me:
I got it: 0-fs is not meant to be fast, but being this slow would still be a big problem for a machine that only has one container running... I tried to deploy a flist with a data container mounted at /data and ran fio again; the results were strictly identical. I'm pretty sure the issue is kernel related.
Could you have a look please? I cannot start hosting production workloads with such terrible IO performance...