
Systemd zfs-mount.service requires network.target #8734

Closed
tedwardd opened this issue May 10, 2019 · 14 comments
Labels
Type: Question Issue for discussion

Comments

@tedwardd

System information

Type                  Version/Name
Distribution Name     CentOS
Distribution Version  7
Linux Kernel          3.10.0-693.21.1.el7.x86_64
Architecture          x86_64
ZFS Version           0.7.9-1
SPL Version           0.7.9-1

Describe the problem you're observing

ZFS filesystems do not auto-mount at reboot on CentOS 7 with systemd unless After=network.target is added to the zfs-mount.service unit file.

Describe how to reproduce the problem

# zfs create <pool>/<filesystem>
# zfs mount <pool>/<filesystem>

# reboot

# mount

After reboot, the zfs filesystems do not mount automatically.

Add After=network.target to /usr/lib/systemd/system/zfs-mount.service and repeat the above steps. After reboot, zfs filesystems will automatically mount.
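
A quick way to confirm the state after the reboot (a sketch; substitute your actual pool and filesystem names):

# zfs get mounted,canmount,mountpoint <pool>/<filesystem>
# systemctl status zfs-mount.service zfs-import-cache.service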

Include any warning/errors/backtraces from the system logs

I've skimmed through journalctl and can't find any indication of error. As far as I can tell, zfs-mount.service is a oneshot which means it will run, do what it can, and then exit cleanly assuming no errors were produced from zfs mount -a. If there are logs generated elsewhere that are worth looking at I'm happy to scan them and post any relevant information found.
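
For reference, the unit definition and its effective settings can be inspected directly (generic systemctl commands, not specific to this report):

# systemctl cat zfs-mount.service
# systemctl show zfs-mount.service -p Type,ExecStart,After,Requires,Wants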

@rlaager
Member

rlaager commented May 12, 2019

The network.target thing seems like a red herring. It's probably working with that because you have delayed it until later in the boot process. It's far more likely that some other dependency is the real issue, as mounting ZFS filesystems does not require the network. Are you able to determine which additional units ran between when zfs-mount.service used to start and when it starts with After=network.target? See, for example: https://serverfault.com/questions/617398/is-there-a-way-to-see-the-execution-tree-of-systemd
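
For example (not commands from the original report), the ordering can be inspected with:

# systemd-analyze critical-chain zfs-mount.service
# systemd-analyze blame | head -25
# systemd-analyze plot > boot.svg   # full boot chart, useful for spotting what ran in between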

@tedwardd
Author

tedwardd commented May 13, 2019

Yes, I've run systemd-analyze as part of my debugging; sorry, I should have included that in the debug info. The output of systemd-analyze critical-chain zfs-mount.service, with and without the dependency on network.target, is below. My initial thought was that the real dependency might be paths.target or basic.target, so I tried pointing After= at each of the intermediate services and targets, down as far as sysinit.target, but mounting at boot only worked once it was set back to network.target.

With After=network.target

zfs-mount.service +1.739s
└─network.target @18.401s
  └─network.service @6.058s +12.342s
    └─basic.target @6.029s
      └─paths.target @6.029s
        └─brandbot.path @6.029s
          └─sysinit.target @6.028s
            └─systemd-udev-settle.service @2.971s +3.036s
              └─systemd-udev-trigger.service @194ms +2.386s
                └─systemd-udevd-control.socket @162ms
                  └─-.slice

Without After=network.target

zfs-mount.service +5ms
└─systemd-udev-settle.service @3.196s +2.878s
  └─systemd-udev-trigger.service @187ms +2.767s
    └─systemd-udevd-control.socket @165ms
      └─-.slice

I'm surprised that there is no dependency on networking considering that ZFS is able to manage NFS exports.

@mskarbek
Contributor

mskarbek commented May 13, 2019

I'm surprised that there is no dependency on networking considering that ZFS is able to manage NFS exports.

In terms of NFS, we should depend on remote-fs.target not network.target.

@rlaager
Member

rlaager commented May 13, 2019

Depending on remote-fs.target is for when you want to ensure that your mounts of remote filesystems have completed. That's not applicable to the ZFS case here: ZFS is the NFS server, not the NFS client.

Even for the NFS server, I doubt that network.target is appropriate. Why can't ZFS mount filesystems without the network, even if that involves configuring NFS to share them? Put differently, why does modifying the NFS configuration require the network to be up?
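
For context, ZFS-managed NFS sharing is configured per dataset through the sharenfs property (hypothetical names below); the separate zfs-share.service is what normally handles (re)exporting those datasets at boot:

# zfs set sharenfs=on <pool>/<filesystem>
# zfs get sharenfs <pool>/<filesystem>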

@rlaager
Member

rlaager commented May 13, 2019

Without network.target, what does systemctl list-dependencies zfs-mount.service say? Is it depending on something (scan or cache) to import the pool, and has that worked?

@tedwardd
Author

@rlaager

# systemctl list-dependencies zfs-mount.service
zfs-mount.service
● ├─system.slice
● └─zfs-import-cache.service

It looks like it does depend on zfs-import-cache.service but I don't know that I understand your question "and has that worked?". Has what worked?

@rlaager
Member

rlaager commented May 14, 2019

Did zfs-import-cache succeed and import the pool?
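
For example, something along these lines would confirm it (substitute your pool name):

# systemctl status zfs-import-cache.service
# zpool list
# zpool status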

@tedwardd
Author

This is the only information in the logs around ZFS at time of boot. Is there another location I should look or flags I can enable to try to log more verbosely? Looking through the docs I didn't see anything that stuck out.

May  2 10:33:26 xxxxx01 kernel: ZFS: Loaded module v0.7.9-1, ZFS pool version 5000, ZFS filesystem version 5
May  2 10:33:26 xxxxx01 systemd: Started udev Wait for Complete Device Initialization.
May  2 10:33:26 xxxxx01 kernel: type=1130 audit(1556793206.434:68): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-udev-settle comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May  2 10:33:26 xxxxx01 systemd: Starting Import ZFS pools by cache file...
May  2 10:33:26 xxxxx01 systemd: Starting Mount ZFS filesystems...
May  2 10:33:26 xxxxx01 systemd: Started Mount ZFS filesystems.

@johnnyjacq16

Look at the status of the service with systemctl status zfs-mount.service, as this normally shows why a service is not starting properly.
Also check the logs for everything related to zfs-mount.service and zfs-import-cache.service from the current boot with journalctl -u zfs-mount.service -b 0 and journalctl -u zfs-import-cache.service -b 0.
To get live log updates you can open another terminal (or use tmux) and run journalctl -u zfs-mount.service -b 0 -f, then in the other terminal run systemctl restart zfs-mount.service or zfs mount -a. If there are a lot of messages, you can run journalctl -u zfs-mount.service -p err -b 0 -f to filter for errors.

@tedwardd
Author

tedwardd commented May 15, 2019

@johnnyjacq16 Here's the output from journalctl as you described. No errors:

[root@xxx.xxx.xxx ~] # journalctl -u zfs-mount.service -b 0
-- Logs begin at Tue 2018-12-25 23:01:49 GMT, end at Wed 2019-05-15 18:04:03 GMT. --
May 02 10:33:26 xxx.xxx.xxx systemd[1]: Starting Mount ZFS filesystems...
May 02 10:33:26 xxx.xxx.xxx systemd[1]: Started Mount ZFS filesystems.
[root@xxx.xxx.xxx ~] # journalctl -u zfs-import-cache.service -b 0
-- Logs begin at Tue 2018-12-25 23:01:49 GMT, end at Wed 2019-05-15 18:04:09 GMT. --
May 02 10:33:26 xxx.xxx.xxx systemd[1]: Starting Import ZFS pools by cache file...
May 02 10:33:35 xxx.xxx.xxx systemd[1]: Started Import ZFS pools by cache file.

To get live log updates you could open another terminal or use tmux and run the command journalctl -u zfs-mount.service -b 0 -f and in the other terminal run systemctl restart zfs-mount.service or zfs mount -a. If there are a lot of error messages you can run journalctl -u zfs-mount.service -p err -b 0 -f to filter them.

Unfortunately, this issue does not reproduce unless the mount is attempted during the boot process. If there's an option I can pass to the daemon or the unit file to produce more verbose logging, perhaps that would yield more information?

@johnnyjacq16

johnnyjacq16 commented May 15, 2019

@k4k
Instead of editing /usr/lib/systemd/system/zfs-mount.service directly to add After=network.target, you could let systemd manage the change for you with systemctl edit --full zfs-mount.service and systemctl edit --full zfs-import-cache.service, adding After=network.target there. Since this issue affects the boot process, it may also be a good idea to update the initramfs.
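
For reference, systemctl edit without --full creates a drop-in override instead of replacing the packaged unit file, and on CentOS 7 the initramfs for the running kernel can be rebuilt with dracut. A rough sketch (the override path shown is the systemd default):

# systemctl edit zfs-mount.service   # creates /etc/systemd/system/zfs-mount.service.d/override.conf
  (add to the override that opens:)
  [Unit]
  After=network.target
# dracut -f                          # regenerate the initramfs for the running kernel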

To get debug information at early boot, power the machine down and start it up again. When the GRUB menu appears and your kernel is highlighted, press e and add systemd.debug-shell=1 systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M printk.devkmsg=on to the kernel parameters, then press F10 to boot with the changes. Once booted you should be able to get a shell and run the journalctl commands again for more debug information. Important: reboot the machine after viewing the debug information so the system reverts to its previous configuration, as leaving the debug shell enabled is a security risk.
Refer to this page for more information about systemd debugging: https://freedesktop.org/wiki/Software/systemd/Debugging/

@behlendorf behlendorf added the Type: Question Issue for discussion label May 16, 2019
@tedwardd
Author

I've continued to debug this issue on and off over the last few weeks, and here is what we've uncovered. The suspicion that network.target was required by zfs-mount.service was, in fact, a red herring. Adding it as a dependency pushed the service's startup later in the boot process, which allowed the ZFS filesystems to mount successfully, but this was only coincidental and had nothing to do with the networking services themselves.

It looks like there were some substantial changes to the unit files in zfs-0.8.0 that I would like to test. We're working on incorporating the CentOS 7.6 repositories into our environment at this time. Once I've had a chance to test those packages I'll report back with any further problems or, hopefully, a report that the update resolved this issue.

@Lanzaa

Lanzaa commented Sep 4, 2019

Looking at your output from the following:
$ systemd-analyze critical-chain zfs-mount.service

zfs-mount.service +5ms
└─systemd-udev-settle.service @3.196s +2.878s
  └─systemd-udev-trigger.service @187ms +2.767s
    └─systemd-udevd-control.socket @165ms
      └─-.slice

@k4k Is zfs-mount.service not waiting on zfs-import.target and zfs-import-cache.service? I really expect zfs-import to be in that critical-chain. It is almost like your system has zfs-mount Wants=zfs-import-cache.service rather than After=.... The fix may be as simple as enabling zfs-import.target. zfs-mount.service needs to be After the import.
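
If that target is simply not enabled, a minimal sketch of the fix (assuming the stock zfs-import.target and zfs-import-cache.service units are installed):

# systemctl enable zfs-import.target
# systemctl list-dependencies --after zfs-mount.service   # zfs-import.target should now appear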

For reference on my system:

$ systemd-analyze critical-chain zfs-mount.service
...
zfs-mount.service +140ms
└─zfs-import.target @3.884s
  └─zfs-import-cache.service @2.494s +1.389s
    └─systemd-udev-settle.service @222ms +2.268s
      └─systemd-udev-trigger.service @160ms +60ms
        └─systemd-udevd-control.socket @158ms
          └─-.mount @128ms

and

$ systemctl list-dependencies --after zfs-mount.service
zfs-mount.service
● ├─system.slice
● ├─systemd-journald.socket
● └─zfs-import.target
●   └─zfs-import-cache.service

@rlaager
Member

rlaager commented Dec 14, 2019

I'm closing. We can reopen if further testing reconfirms this.

@rlaager rlaager closed this as completed Dec 14, 2019