
Ceph package is broken #781

Closed
grantcurell opened this issue May 20, 2018 · 9 comments

@grantcurell

I submitted a pull request (#780) with a working copy of ceph-deploy. It fixes the current copy's attempts to reach non-existent URLs to pull old copies of Ceph, and it now points to VMware's repos. However, Ceph itself is still broken in a number of ways.

After running ceph-deploy, ceph-mgr first fails to come up. This is because files inside /etc/ceph and /var/lib/ceph are owned by root instead of the ceph user. Changing the ownership brings ceph-mgr up in some cases, but there are still a number of import errors for Python modules. You need:

  • prettytable
  • pecan
  • pyopenssl
  • status

After I installed all of those, one of my Ceph managers worked, but the other still fails with:

/usr/bin/ceph-mgr -f --cluster ceph --id bunkbotserv2.lan --setuser ceph --setgroup ceph
2018-05-20 11:07:13.759221 7fe30f7fe700 -1 mgr load Module not found: 'restful'
2018-05-20 11:07:13.759238 7fe30f7fe700 -1 mgr load Traceback (most recent call last):
  File "/usr/lib/ceph/mgr/restful/__init__.py", line 1, in <module>
    from module import *  # NOQA
  File "/usr/lib/ceph/mgr/restful/module.py", line 21, in <module>
    from werkzeug.serving import make_server, make_ssl_devcert
ImportError: No module named werkzeug.serving

2018-05-20 11:07:13.759492 7fe30f7fe700 -1 mgr init Error loading module 'restful': (2) No such file or directory
2018-05-20 11:07:13.763042 7fe30f7fe700 -1 mgr load Class not found in module 'status'
2018-05-20 11:07:13.763053 7fe30f7fe700 -1 mgr load AttributeError: 'module' object has no attribute 'Module'

2018-05-20 11:07:13.763064 7fe30f7fe700 -1 mgr init Error loading module 'status': (22) Invalid argument
2018-05-20 11:07:13.763081 7fe30f7fe700 -1 log_channel(cluster) log [ERR] : Failed to load ceph-mgr modules: restful, status
^C2018-05-20 11:09:32.990214 7fe3037fe700 -1 Fail to open '/proc/0/cmdline' error = (2) No such file or directory
2018-05-20 11:09:32.990237 7fe3037fe700 -1 received  signal: Interrupt from  PID: 0 task name: <unknown> UID: 0
2018-05-20 11:09:32.990239 7fe3037fe700 -1 mgr handle_signal *** Got signal Interrupt ***

Despite the error messages, the status module is present (I figure this must be some sort of environment issue). I'm not sure what the restful module is.
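The missing packages listed earlier and the werkzeug ImportError in the log can be checked in one pass before starting the daemon. This is a minimal sketch using importlib.util.find_spec to report which modules the running interpreter cannot locate. Two assumptions to keep in mind: the check is only meaningful under the same interpreter ceph-mgr uses (the traceback's wording looks like Python 2, so this Python 3 code mirrors the idea rather than reproducing ceph-mgr's environment), and import names differ from package names (pyopenssl installs the OpenSSL package).

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that the current interpreter cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # For a dotted name, find_spec imports the parent package first,
            # which raises if the parent itself is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names (not pip package names) for the dependencies in this thread.
    wanted = ["prettytable", "pecan", "OpenSSL", "werkzeug.serving"]
    print("missing:", missing_modules(wanted) or "none")
```

Checking werkzeug.serving rather than just werkzeug matches the exact import that failed in restful/module.py.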

I'm not sure whether the other ceph-mgr is the cause (I'm still teaching myself Ceph), but this leaves my cluster in HEALTH_WARN:

ceph -w
  cluster:
    id:     ba90e70a-f86c-48c2-a788-d6832f7d7563
    health: HEALTH_WARN
            Reduced data availability: 96 pgs inactive

  services:
    mon: 3 daemons, quorum bunkbotserv2,bunkbotsensor1,bunkbotserv1
    mgr: bunkbotserv1.lan(active)
    osd: 2 osds: 0 up, 0 in

  data:
    pools:   1 pools, 96 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:     100.000% pgs unknown
             96 unknown


2018-05-20 11:07:13.763082 mgr.bunkbotserv2.lan [ERR] Failed to load ceph-mgr modules: restful, status
@grantcurell
Author

grantcurell commented May 20, 2018

Update:

It was a personal problem. I have Ansible code that sets up Ceph. Once I fixed the permission issues on the manager before continuing, everything worked and Ceph came up healthy (although the module errors still exist).

Update 2: I lied because Ceph lied to me. It reported OK at first, then it wasn't OK. I should have known, because there are no OSDs available.
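Fixing ownership before starting the manager is what made the difference here. A minimal sketch for finding leftover root-owned files, assuming a local ceph user exists (the --setuser ceph flag in the ceph-mgr command line implies one): the helper below is hypothetical, not part of Ceph or ceph-deploy. It walks a directory tree and lists every path whose owner UID differs from the expected one; anything it reports under /etc/ceph or /var/lib/ceph can then be handed back with chown -R ceph:ceph.

```python
import os
import pwd
from typing import List


def wrong_owner_paths(root: str, expected_uid: int) -> List[str]:
    """List root and every path beneath it whose owner UID is not expected_uid."""
    bad = []
    # lstat checks symlinks themselves rather than following them
    if os.lstat(root).st_uid != expected_uid:
        bad.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_uid != expected_uid:
                bad.append(path)
    return sorted(bad)


def main() -> None:
    try:
        # Assumption: the packages created a local "ceph" user.
        ceph_uid = pwd.getpwnam("ceph").pw_uid
    except KeyError:
        print("no 'ceph' user on this host")
        return
    for top in ("/etc/ceph", "/var/lib/ceph"):
        if os.path.isdir(top):
            for path in wrong_owner_paths(top, ceph_uid):
                print("not owned by ceph:", path)


if __name__ == "__main__":
    main()
```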


@grantcurell
Author

grantcurell commented May 20, 2018

The problem is the id name in the instantiated service file.


For some reason, it is using \x2a. This should be 0. I haven't yet figured out:

  1. How instantiated systemd services work
  2. Where to fix this in the package

The workaround: in /usr/lib/systemd/system/ceph-osd@XXXXXXX the instance name is \x2a (I'm not sure where it gets that value), but it should correspond to the Ceph OSD ID. I found the ID by checking the file names with ls /var/log/ceph. Change the service name to the correct ID and the OSD comes up. Obviously, this is nothing more than a band-aid.
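For context, \x2a is how systemd escapes an asterisk in a unit instance name, so one plausible (unconfirmed) explanation is that a glob such as ceph-osd@* ended up used as a literal instance name instead of being expanded to an OSD ID. The sketch below is hypothetical and not part of the ceph package: it extracts OSD IDs from log file names like ceph-osd.0.log (assuming Ceph's default <cluster>-osd.<id>.log naming), builds the matching ceph-osd@<id>.service names, and includes a simplified, ASCII-only reimplementation of the systemd-escape tool to show where \x2a comes from.

```python
import os
import re
from typing import Iterable, List

# Assumed default log naming: <cluster>-osd.<id>.log, e.g. ceph-osd.0.log
OSD_LOG = re.compile(r"^(?P<cluster>[^/]+)-osd\.(?P<id>\d+)\.log$")

SAFE = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789:_.")


def systemd_escape(s: str) -> str:
    """Simplified, ASCII-only imitation of `systemd-escape` for instance names."""
    out = []
    for i, ch in enumerate(s):
        if ch == "/":
            out.append("-")
        elif ch in SAFE and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            out.append("\\x%02x" % ord(ch))  # e.g. "*" -> \x2a, "-" -> \x2d
    return "".join(out)


def osd_ids_from_logs(filenames: Iterable[str], cluster: str = "ceph") -> List[int]:
    """Extract OSD IDs from file names found in /var/log/ceph."""
    ids = set()
    for name in filenames:
        m = OSD_LOG.match(name)
        if m and m.group("cluster") == cluster:
            ids.add(int(m.group("id")))
    return sorted(ids)


def osd_unit_names(ids: Iterable[int]) -> List[str]:
    """Build instantiated unit names such as ceph-osd@0.service."""
    return ["ceph-osd@%s.service" % systemd_escape(str(i)) for i in ids]


if __name__ == "__main__":
    log_dir = "/var/log/ceph"
    logs = os.listdir(log_dir) if os.path.isdir(log_dir) else []
    print(osd_unit_names(osd_ids_from_logs(logs)))
    print(systemd_escape("*"))  # a literal asterisk becomes \x2a
```

Digits pass through escaping unchanged, so a correctly generated unit for OSD 0 is simply ceph-osd@0.service.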

@grantcurell
Author

grantcurell commented May 20, 2018

Now, with that fixed, it seems Kubernetes still doesn't work with Ceph:

Events:
  Type     Reason                  Age              From                       Message
  ----     ------                  ----             ----                       -------
  Normal   Scheduled               3m               default-scheduler          Successfully assigned es-master-statefulset-0 to bunkbotserv2.lan
  Normal   SuccessfulAttachVolume  3m               attachdetach-controller    AttachVolume.Attach succeeded for volume "pvc-ae577376-5c22-11e8-a120-000c29dbf5c3"
  Normal   SuccessfulMountVolume   3m               kubelet, bunkbotserv2.lan  MountVolume.SetUp succeeded for volume "elastic"
  Normal   SuccessfulMountVolume   3m               kubelet, bunkbotserv2.lan  MountVolume.SetUp succeeded for volume "default-token-kh5cb"
  Warning  FailedMount             1m               kubelet, bunkbotserv2.lan  Unable to mount volumes for pod "es-master-statefulset-0_default(ace32419-5c26-11e8-a120-000c29dbf5c3)": timeout expired waiting for volumes to attach or mount for pod "default"/"es-master-statefulset-0". list of unmounted volumes=[data]. list of unattached volumes=[data elastic default-token-kh5cb]
  Warning  FailedMount             1m (x9 over 3m)  kubelet, bunkbotserv2.lan  MountVolume.WaitForAttach failed for volume "pvc-ae577376-5c22-11e8-a120-000c29dbf5c3" : rbd: map failed exit status 2, rbd output: modinfo: ERROR: Module rbd not found.
modprobe: FATAL: Module rbd not found in directory /lib/modules/4.9.99-1.ph2-esx
rbd: failed to load rbd kernel module (1)
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (2) No such file or directory

It would seem the rbd kernel module has been stripped out.
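modprobe resolves module names through /lib/modules/<kernel release>/modules.dep, and modules compiled into the kernel are listed in modules.builtin, so checking both files for rbd confirms what the error above says for 4.9.99-1.ph2-esx. A minimal sketch, assuming those standard file locations; the helper names are hypothetical.

```python
import os
import platform
from typing import Iterable, Optional

KO_SUFFIXES = (".ko", ".ko.gz", ".ko.xz", ".ko.zst")


def _module_name(path: str) -> str:
    """Turn 'kernel/drivers/block/rbd.ko.xz' into 'rbd'; '' if not a module path."""
    base = os.path.basename(path.strip())
    for suffix in KO_SUFFIXES:
        if base.endswith(suffix):
            # modprobe treats "-" and "_" in module names as equivalent
            return base[: -len(suffix)].replace("-", "_")
    return ""


def module_listed(lines: Iterable[str], name: str) -> bool:
    """True if a modules.dep or modules.builtin listing contains module `name`."""
    want = name.replace("-", "_")
    for line in lines:
        # modules.dep: "path/mod.ko: dep.ko ..."; modules.builtin: "path/mod.ko"
        target = line.split(":", 1)[0]
        if target.strip() and _module_name(target) == want:
            return True
    return False


def rbd_status(release: Optional[str] = None) -> str:
    """Report whether rbd is built in, loadable, or absent for a kernel release."""
    release = release or platform.release()  # same value as `uname -r`
    base = os.path.join("/lib/modules", release)
    for fname, verdict in (("modules.builtin", "built in"), ("modules.dep", "loadable")):
        path = os.path.join(base, fname)
        if os.path.exists(path):
            with open(path) as fh:
                if module_listed(fh, "rbd"):
                    return verdict
    return "not available for " + release


if __name__ == "__main__":
    print(rbd_status())
```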

@grantcurell
Author

I'm teaching myself a bit about kernel modules. I know RBD is being stripped out of this kernel relative to the mainline kernel; where is that happening? At a high level, how would I go about adding it back in? The Ceph package may as well not be in the repos without the RBD module.

@kganugapati

http://docs.ceph.com/docs/argonaut/rbd/rbd-ko/

Should this be a loadable kernel module?

@grantcurell
Author

Yes, I think it should be available as a loadable option. It doesn't need to be loaded by default, but it does at least need to be available.

@suezzelur
Contributor

@grantcurell thank you for the pull request

@YustasSwamp
The rbd module is not available as a loadable module for the ESX kernel because the config option is not enabled; it is only present in the generic kernel (package name: linux).

@grantcurell
Author

@suezzelur, forgive my ignorance; I'm going to go read up. What does it mean to say "the config is not enabled"? Is it possible to compile RBD against the ESX kernel and load it in? When you say package name, what do you mean? Is this like an RPM package you can install?

I'm going to go read about how this stuff works hahahaha. Sorry for all the questions.

@suezzelur
Contributor

Yes, we have multiple kernel flavors. Running tdnf install linux and rebooting will let you boot into the generic kernel, and you can run modprobe rbd after that. You can look at our config files for the different kernels here:
https://github.com/vmware/photon/blob/2.0/SPECS/linux/config-esx#L1282
https://github.com/vmware/photon/blob/2.0/SPECS/linux/config#L1504

linux-esx is optimized to run on VMware hypervisors and may include only a subset of the modules in the generic kernel package.
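"The config is not enabled" refers to the kernel build configuration in files like the ones linked above. For a tristate option such as CONFIG_BLK_DEV_RBD (the option for the rbd driver), =y means built into the kernel, =m means built as a loadable module, and a "# CONFIG_BLK_DEV_RBD is not set" line (or no line at all) means it is not built. A minimal parser for that format is sketched below; which file you feed it (for example a /boot/config-<release> file, on systems that install one) is an assumption about your setup.

```python
import re
import sys
from typing import Dict, Iterable

SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_kconfig(lines: Iterable[str]) -> Dict[str, str]:
    """Map each CONFIG_* option to its value; explicitly unset options map to "n"."""
    opts = {}
    for raw in lines:
        line = raw.strip()
        m = SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = UNSET.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def describe(opts: Dict[str, str], option: str) -> str:
    """Describe a tristate option; options absent from the file count as off."""
    value = opts.get(option, "n")
    return {"y": "built into the kernel", "m": "loadable module"}.get(value, "not built")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_rbd_config.py /boot/config-$(uname -r)
    with open(sys.argv[1]) as fh:
        print(describe(parse_kconfig(fh), "CONFIG_BLK_DEV_RBD"))
```

Compiling rbd for the ESX kernel would amount to flipping this option to m and rebuilding, which is why switching to the generic linux package is the simpler route.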


4 participants