Podman and chmod: frustrated?

In theory, Podman is “just like” Docker. In practice, of course, there are a couple of big differences. Some have to do with networking, and those are relatively easy to solve. A bigger one has to do with Podman’s ability to run rootless.

Rootless operation means that you don’t need root privileges to run a container. It also gives you an extra level of security, since running under a non-root account limits what intruders can do if they break in.

Where it gets really frustrating is when you try to run a container that manipulates file ownership and permissions on a mounted volume.

It’s not uncommon, especially with a container built for Docker, for the container to want to create and/or chown directories as part of its initial setup. That doesn’t work too well when running rootless. It can, and probably will, run afoul of the Linux OS file protection systems in one of two ways.

SELinux violation. Oddly, I’ve had containers fail due to SELinux violations even though the host OS (AlmaLinux 9) had SELinux running in Permissive mode. I’ve found no explanation, but that’s how it is. You can add custom SELinux rules to the host environment to permit the operation, but that will likely just drop you into the other failure mode:

Operation not permitted. Even though the active user inside the container is root, it cannot chown files/directories in mounted volumes.

Not permitted? But I’m root!

Well, yes, but only within your tiny little kingdom.

Now think of what happens when you mount an external volume. Maybe it’s an NFS fileshare with files for many different users on it, each with their own user directories. Maybe you can read other users’ files, maybe not, depending on the rights you’ve been granted.

That’s how it looks to the host OS user account when you’re logged in as the host user.

But now let’s start a container which runs as its own root. If the usual root rules applied, that container could run wild over the external filesystem tree mounted to it. That would completely negate the protections of the host’s user account!

So, instead, the container’s root is prohibited from doing things that it couldn’t do as a user outside the container.

But what about subuids?

At first glance, it seems like you might be able to work around this problem using subuids. But nope. The subuid facility aliases user/group IDs inside the container to alternative user/group IDs outside the container based on the subuid maps. That’s because a container acts like a mini-VM in this respect: it can have its own private set of userids independent of the container host’s userids.

The full details of the subuid mapping can be found in the Podman documentation, but in its basic form, userid 0 (root) inside the container maps to the rootless user’s own userid, and every other internal userid is shifted into a dedicated range defined by the subuid map. For example, if the range starts at 100000, then internal userid 1 maps to external userid 100000, and internal userid 999 maps to external userid 100998 (remember, 0 is root and is mapped separately).
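
As a hypothetical illustration (the user name and uid range are invented), here is what such an /etc/subuid allocation and the mapping it produces might look like:

# Hypothetical /etc/subuid entry for rootless user "fred" (host uid 1000):
#   fred:100000:65536
#
# Resulting mapping inside fred's containers:
#   container uid 0   -> host uid 1000    (fred himself)
#   container uid 1   -> host uid 100000  (first subuid in the range)
#   container uid 999 -> host uid 100998
#
# The active map can be verified from inside the user namespace:
podman unshare cat /proc/self/uid_map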

Thus, through magic not documented in the current man pages (or, as far as I know, anywhere in the Podman documentation), the “chown” command can chown to internal userids, but not to container-host userids. The same goes for other attribute-changing operations.

Note that since the subuids are themselves uids (though remapped) on the container host, they also adhere to standard outside-the-container restrictions on chown and its friends. In fact, you can’t even do a directory listing on a subuid’ed directory unless you’ve been assigned rights to do so.

But assigning rights is normally controlled by the root user, and it would be unfair to restrict access to what are essentially your own files and directories just because they have subuids! So that gives us:

Unshare

The podman unshare command effectively undoes the uid remapping. It can be used to execute a single command, or invoked with no arguments to start a podman unshare shell.

Inside unshare, “whoami” changes from your userid to root and shows you your internal userids without the remapping. Thus, you can do all the necessary file operations without actually becoming root. Aside from hiding the remapping, unshare is also a more limited root than sudo. For example, you cannot list the contents of the OS /etc/shadow file, nor should you be able to look into or alter the files and directories of other users.
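
For example, here’s a minimal sketch (the paths and uid are hypothetical) of preparing a bind-mount directory for a container whose internal service runs as uid 999:

# Recursively chown the directory to the container's idea of uid/gid 999
podman unshare chown -R 999:999 ~/containers/pgdata

# Or open a shell inside the user namespace and poke around interactively
podman unshare
whoami                      # reports "root", but it's just the remapped rootless user
ls -ln ~/containers/pgdata  # shows the un-remapped internal uids
exit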

Volumes

Last, but not least, there’s another way to get directories that you can chown. The Podman (Docker) volume facility allows creation of non-volatile storage that’s integrated with the container system, meaning that the userids of assets within the volume should not be affected, as the volume is sealed off from the rest of the filesystem.

Volumes were always convenient, but especially so when using quadlets to manage containers. When I started containers manually, I often kept data stores within the container itself, and thus the information was “permanent” for as long as I kept reusing that container. But quadlets destroy and re-create containers, so that approach no longer works. Instead, put the non-volatile data in a volume (which can itself be quadlet-managed) and attach the volume to your container. That removes the potential for data loss and makes it easier to make containers elastic.
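
As a rough sketch (the names are hypothetical, and this assumes a Podman recent enough to support quadlets), the volume and the container that uses it can each be described by a small quadlet file:

# ~/.config/containers/systemd/mydata.volume
[Volume]

# ~/.config/containers/systemd/myapp.container
[Container]
Image=docker.io/library/myapp:latest
Volume=mydata.volume:/var/lib/myapp

When the container is destroyed and re-created, the volume (and the data in it) survives.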

“Instant OSD” — a fast way to bring up a VM as a Ceph OSD node

Running Ceph OSDs in virtual machines may not be considered “best practice”, but it’s nevertheless popular. VMs are easier to manage on the whole and, prior to Ceph becoming container-based, were a lot easier to configure and control without getting tangled up with other host subsystems.

This isn’t quite a “one button” solution, but it’s close. There’s some manual network configuration as there would be for any Ceph host, but the bulk of the work is done via a simple shell script and Ansible.

So, for your edification and enjoyment: https://gogs.mousetech.com/mtsinc7/instant_osd

IMPORTANT NOTICE

A lightning strike took mousetech.com off the Internet for several days and has caused problems for the Gogs server. We hope to have it back online soon. (2024-07-27)

How to run a container that crashes on startup

One of the most frustrating things about running with containers is when a container fails immediately on startup.

This can happen for a number of reasons, and not all of them record errors in logfiles to help diagnose them.

However, there’s a fairly easy way to get around this.

Start the container with the “interactive” option and override the “entrypoint” option to execute “/bin/sh”. This will do two things:

  1. Instead of running the normal container startup, it will start the container executing the command shell.
  2. The “interactive” option holds the container open. Without it, the command shell sees an immediate end-of-file and shuts down the container.
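
For example, with Podman (the image and container names here are hypothetical; Docker accepts the same flags):

podman run -d -i --name my_container --entrypoint /bin/sh my_image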

At this point, you can then use the docker/podman “exec” command to dial in to the container like so:

docker exec -it my_container /bin/sh

At that point, you can inspect files, run utilities, and do whatever is necessary to diagnose/repair the image.

Additional help is available once you have a tame container running. You can use the docker/podman “cp” command to copy files into and out of the container. Many containers have minimal OS images with neither an installed text editor nor a package installer to install one. So you can pull a faulty file out of a container, fix it, and push it back. The changes will persist as long as you restart that same container and don’t start a new instance from the original image.
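
A hypothetical round trip might look like this (the container name and file path are invented for the example):

podman cp my_container:/etc/myapp/app.conf ./app.conf
vi ./app.conf      # fix the problem with a real editor on the host
podman cp ./app.conf my_container:/etc/myapp/app.conf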

When things go really bad

Most of the time when Linux has serious problems, it can run in a reduced capacity. But sometimes that capacity is reduced to a really painful point. Specifically, when the root filesystem gets mounted as “read only”.

This usually happens if the root filesystem is detected as corrupt, although recently I had it happen when a core mount failed — and because the root filesystem was read-only, I couldn’t edit the failing mount!

There’s an old technique I used to use for things like this, but changes in the boot process (grub) and system initialization (systemd) mean it’s no longer usable.

Fortunately, there’s a modern-day alternative.

When your system’s grub menu comes up, edit the OS kernel options line. That’s the line that references the “vmlinuz” kernel file.

Add the option rd.break to that line and proceed with booting. The “break” option will cause the initial ramdisk to load and set up a minimal OS environment capable of loading and initializing the full kernel, but it will halt at the point where the primary filesystem root has been temporarily mounted under /sysroot. You can unmount it and run filesystem repairs if needed, or chroot to it and reset the root password or fix show-stopping errors (like my bad fstab file!), and then reboot the repaired system.
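
A minimal sketch of such a repair session from the rd.break emergency shell (adjust to your actual problem; the relabel step only matters on SELinux systems where you’ve changed files like /etc/shadow):

mount -o remount,rw /sysroot   # the real root is mounted read-only at first
chroot /sysroot                # now /etc/fstab, passwords, etc. are editable
passwd root                    # or fix whatever was actually broken
touch /.autorelabel            # SELinux: schedule a relabel on the next boot
exit                           # leave the chroot
exit                           # resume the boot (or simply reboot)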

Of course, this is only one recovery option. For best reliability it’s a good idea to keep a stand-alone boot “disk” (USB or whatever) and/or a recovery PXE boot.

For much more detailed information, look here: https://docs.fedoraproject.org/en-US/fedora/latest/system-administrators-guide/kernel-module-driver-configuration/Working_with_the_GRUB_2_Boot_Loader/

Note that on RHEL 9, this process apparently broke for a while. An alternative was to boot directly into /bin/bash by editing the grub boot command line (typically by adding init=/bin/bash).

Puppetserver in a can — Docker versus Podman

Things have been too exciting here lately. Attempting to set up Ceph, having a physical machine’s OS get broken, having a virtual machine get totally trashed and rebuilt from the ground up.

It was this last that gave me an unexpected challenge. When I built the new container VM, I moved it from CentOS 7 to AlmaLinux 8. Should be no problem, right?

I sicced Ansible on it to provision it — creating support files, network share connections… and containers.

One thing that’s different between CentOS 7 and CentOS (IBM Linux) 8 systems is that the actual container manager changed from Docker to Podman.

For the most part, that change is transparent to the point that if you give a Docker command, it is passed more or less verbatim to Podman.

But one container proved difficult: the Puppetserver. It steadfastly refused to come up. It claimed that it couldn’t find the cert it was supposed to have constructed. Or that the cert didn’t match itself, if you prefer.

Tried it with Podman on my desktop according to instructions. No problem. Changed my Ansible provisioning from “docker_container” to “podman_container”. No luck. Did an extra evil trick that allowed me to freeze startup so I could dial into the container. The cert directories were incomplete or empty!

I hacked deeper and deeper into the initialization process, but no enlightenment came. So finally I tried manually running the container with minimal options from the VM’s command line.

It still failed. Just for giggles, I did one more thing. Since Docker requires a root environment and I was trying to keep changes to a minimum, I had been running Podman as root. I tried launching the Puppetserver container from an ordinary user account.

And it worked!

I’m not sure why, although I suspect a couple of things. I believe that Puppet moved the locations of some of its credential files and was possibly counting on references to the old locations to redirect. Maybe they didn’t, because some of this stuff apparently uses internal network operations, and networking works differently in userspace.

At any rate, simply running the container non-rooted was all it took. Happiness abounds and I’ve got my recalcitrant server back online!

Ceph — Moving up

OK. I moved from Ceph Octopus to Pacific.

The reason I originally settled on Octopus was that I thought it was the newest release with direct CentOS 7 support for my older servers. I was wrong. CentOS 7 actually maxed out at Nautilus.

Octopus has been a problematic release. A bug kept scheduled cephadm tasks from executing if even the most trivial warnings existed. And trivial warnings were endemic, since there were major issues in trying to get rgw running. Caused, in part, I think, by a mish-mash of setups from older-style and cephadm component deployments. Certain directories are not in the same place between the two.

I also couldn’t remove failed rgw pool setups. They kept coming back after I deleted them, no matter how brutally I did so.

And finally, a major lock-up crept into the NFS server subsystem. It turns out, though, that the Ceph mount utility for Nautilus works just fine for me with Octopus servers, so I was able to redefine a lot of NFS stuff to use Ceph directly.

Before I proceed, I should note that despite all my whinging and whining, the Ceph filesystem share has been rock solid. Much more so than Gluster was. Despite all the mayhem and random blunderings, the worst I ever did to the filesystem was trash performance when I under-deployed OSDs. Never did I have to wipe it all out and restore from backups.

OK. So I finally bit the bullet. I’d fortuitously run across the instructions to level up all the Ceph deployments to be cephadm-based just the day before. Now I followed the simple upgrade instructions.

I had concerns because I had a number of failed rgw daemons I couldn’t get rid of, but actually the worst thing that happened was that it spent all night flailing because an OSD had gone offline. Restarted that and things began to happen. Lots of progress messages. Occasional stalls (despite what one source told me, it’s NOT desirable to have my Ganesha pool have no replicas!).

But amazingly it was done! Now deleted rgw dæmons stay deleted! I might even get them hooked into my VM deployments! And since one of Pacific’s improvements had to do with NFS, I hope Ganesha will be happy again.

Ceph – another mystery error

Cannot place on myhost: Unknown hosts

Ran into this when attempting to use ceph orch to manage mgrs.

It’s actually quite simple and has nothing (directly) to do with /etc/hosts or DNS, only with the list of hosts that Ceph knows about internally.

And the problem — yet again — is that Ceph doesn’t manage multiple hostnames!!!

The hostname was registered as “myhost.mousetech.com”, but I had made the request like this:

ceph orch apply mgr --placement="ceph01 myhost"

“ceph01” was OK, since that’s how Ceph had been set up. But when I added “myhost”, I did so with the fully-qualified domain name. The bloody names must match what Ceph has on file exactly. Grrr!

So the proper command was:

ceph orch apply mgr --placement="ceph01 myhost.mousetech.com"
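
To take the guesswork out of it, you can list exactly which host names Ceph has on file before composing the placement:

ceph orch host ls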

Ceph

I migrated from Gluster to Ceph a while back, since Gluster appears to be headed towards oblivion. It’s a shame, since Ceph is massive overkill for a small server farm like mine, but that’s life.

On the whole, Ceph is well-organized and well-documented. It does have some warts, however:

  • The processes for installing and maintaining Ceph are still in flux as of Ceph Octopus, which is the most recent release my CentOS 7 servers can easily support. By my count, we’ve gone through the following options:
    • Brute force RPM install (at least on Red Hat™-style systems)
    • Ansible install (deprecated)
    • ceph-deploy command. Apparently not included in my installation???
    • cephadm command. The presumed wave of the future and definitely an easy way to get bootstrapped
  • Certain error messages don’t convey enough information to understand or correct serious problems. For example:
    • monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2,1]. What does this even mean? How do I make it go away???
  • A number of systemd dæmons can fail with nothing more informative than “Start request repeated too quickly.” Often the best way to get usable (or sometimes useless) information on why the dæmon failed is to run the associated service manually and in the foreground. Especially since neither the system journal nor default log levels give any hints whatsoever.
  • Ceph handles the idea that a host can have multiple long and short names very poorly.
  • I have managed to severely damage the system multiple times, and I suspect a lot of it came from picking the wrong (obsolete) set of maintenance instructions even when I looked under a specific Ceph documentation release. Fortunately the core system is fairly bulletproof.

As a result, I’ve been posting information pages related to my journey. Hopefully they’ll be of help.

Waiting for a device: /dev/ttyUSB0

Quite a few people over the years have run into this problem: what do you do when a service requires a device, but the service tries to go online before the device does?

One of the most common cases is when you have something like a display device driven by USB serial. That is, “lcddisplay.service” needs /dev/ttyUSB0.

Search the web for that and you’ll generally see uncomfortable hacks involving udev, but actually it can be much easier than that.

The systemd resource control subsystem for Linux can be configured to wait for filesystem resources using the resource pathname. So, for example, if I have a systemd mount for /export/usernet/ and I want to do things with files in that directory, I can set up the dependent service to be After=export-usernet.mount. Note that the path in the unit name is always absolute and that the slashes in the path are replaced with dashes, because otherwise the systemd unit name would appear to be a directory subtree. Incidentally, dashes that appear within a name get escaped to avoid confusion with a slash-turned-dash.
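
The systemd-escape utility will show you the exact unit-name form of any path, which takes the guesswork out of it:

systemd-escape -p /export/usernet    # prints: export-usernet
systemd-escape -p /srv/my-data       # prints: srv-my\x2ddata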

However, things like /dev/ttyUSB0 are a little different, as they are managed not by the filesystem per se, but by udev – the hotswap manager. So does this trick still work?

Yes, it does! Simply make your dependency on dev-ttyUSB0.device and you’re home free.
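
A minimal sketch of such a service unit (the unit name, description, and program path are hypothetical):

# /etc/systemd/system/lcddisplay.service
[Unit]
Description=LCD display driver
# Don't start until udev has created /dev/ttyUSB0, and stop if it goes away
After=dev-ttyUSB0.device
BindsTo=dev-ttyUSB0.device

[Service]
ExecStart=/usr/local/bin/lcddisplay /dev/ttyUSB0

[Install]
WantedBy=multi-user.target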

That doesn’t solve all your problems, since USB doesn’t always allow you to definitively determine which physical device will get which device ID. You can set up udev rules, but in the case of some devices like Arduinos, there is no unique serial number to map to, and in fact, sometimes unplugging and replugging can turn a USB0 into a USB1. But that’s a flaw in the hardware, and the OS can only do so much. Still, we can make things a little easier with this trick.

How to (really) set up the Bacula director on CentOS 7

I’ve seen quite a few examples of “how to setup bacula on CentOS7” and they all stink.

Mostly, they’re just re-warmed procedures from CentOS 6 or other distros, and they either don’t work or don’t work well.

I’m interested in setting up Bacula for disaster recovery. I cannot afford wrong directions or directions that require elaborate procedures. So this is the quickest way to get the Bacula director set up and running using stock CentOS 7 resources.

The test platform used here was CentOS 7.6. Any earlier release of CentOS 7 should do as well. Later releases will probably work, but I make no guarantees for CentOS 8. After all, the CentOS 6 methods don’t work anymore!

In CentOS 6, there were two pre-configured bacula-director RPMs: one for using MySQL as the database backend, one for using PostgreSQL. This changed in CentOS 7. In CentOS 7 there is only one director package, and it defaults to PostgreSQL.

So here’s the procedure:

Install Bacula Director

  1. Use RPM or Yum to install the bacula-dir package. You’ll probably want to install bacula-storage as well, since you cannot back up or restore without a storage daemon!
  2. Configure the /etc/bacula/bacula-dir.conf file to set the passwords (hint: search for “@@”). Change the bacula-sd.conf and bconsole.conf files so that their passwords match up with the director passwords you chose (see the sketch after this list).
  3. Also install and configure the bacula-client package on whatever target machine(s) you are going to run Bacula against.
  4. That should do for the moment. You’ll probably want to define Bacula client profiles and jobs eventually.
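
For reference, a rough sketch of how the passwords pair up (the resource names and password strings are invented for the example): the Director resource in bacula-dir.conf and bconsole.conf must share a password, and the Storage resource password in bacula-dir.conf must match the Director resource password in bacula-sd.conf.

# bacula-dir.conf (fragment)
Director {
  Name = bacula-dir
  Password = "console-secret"     # must match bconsole.conf
  # ... other directives omitted
}
Storage {
  Name = File
  Password = "storage-secret"     # must match the Director resource in bacula-sd.conf
  # ... other directives omitted
}

# bconsole.conf (fragment)
Director {
  Name = bacula-dir
  Password = "console-secret"
}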

Install PostgreSQL Server

  1. Use RPM or Yum to install package postgresql-server.
  2. Use this command to initialize the database:
postgresql-setup initdb

  3. Start the postgresql server:

systemctl enable postgresql
systemctl start postgresql

Initialize the Bacula database

This is the trickiest part. For one thing, postgresql doesn’t like you running a postgresql client as root. So the solution is to sudo in the opposite direction:

sudo -u postgres -i
# cd to avoid having server whine about root directory
cd
# Now run the builder scripts IN THIS ORDER
/usr/libexec/bacula/create_postgresql_database
/usr/libexec/bacula/make_postgresql_tables
/usr/libexec/bacula/grant_postgresql_privileges

^D (to exit sudo)

That’s it! (re)start bacula-dir and you should be able to connect with bconsole.

Special note on bacula scripts:

If you look at the /usr/libexec/bacula/ directory, you’ll see a set of generic scripts (create_database, make_tables, grant_privileges). About all they actually do is determine which database type you’re set up for and invoke the database-specific setup scripts. However, there is some extra lint in there, and on my system it caused the generic scripts to complain. So I recommend invoking the postgresql scripts directly.