Ceph

I migrated from Gluster to Ceph a while back, since Gluster appears to be headed towards oblivion. It’s a shame, since Ceph is massive overkill for a small server farm like mine, but that’s life.

On the whole, Ceph is well-organized and well-documented. It does have some warts however:

  • The processes for installing and maintaining Ceph are still in flux as of Ceph Octopus, which is the most recent my CentOS 7 servers can easily support. Ny my count, we’ve gone through the following options:
    • Brute force RPM install (at least on Red Hat™-style systems)
    • Ansible install(deprecated)
    • ceph-deploy command. Apparently not included in my installation???
    • cephadm command. The presumed wave of the future and definitely an easy way to get bootstrapped
  • Certain error messages don’t convey enough information to understand or correct serious problems. For example:
  • monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]. What does this even mean? How do I make it go away???
  • A number of systemd dæmons can fail with nothing more informative than “Start request repeated too quickly.” Often the best way to get usable (or sometimes useless) information on why the dæmon failed is to run the associated service manually and in the foreground. Especially since neither the system journal nor default log levels give any hints whatsoever.
  • Ceph handles the idea that a host can have multiple long and short names very poorly.
  • I have managed to severely damage the system multiple times, and I suspect a lot of it came from picking the wrong (obsolete) set of maintenance instructions even when I looked under a specific Ceph documentation release. Fortunately the core system is fairly bulletproof.

As a result, I’ve been posting information pages related to my journey. Hopefully they’ll be of help.