Podman and chmod frustrated?

In theory, Podman is “just like” Docker. In practice, of course, there are a couple of big differences. Some have to do with networking, and those are relatively easy to solve. A bigger one has to do with Podman’s ability to run rootless.

Rootless operation means that you don’t need root privileges to run a container. It also gives you an extra layer of security, since running under a non-root account limits what an intruder can get into.

Where it gets really frustrating is when you try to run a container that manipulates file ownership and permissions on a mounted volume.

It’s not uncommon, especially with containers built for Docker, for the container to want to create and/or chown directories as part of its initial setup. That doesn’t work too well when running rootless. It can, and probably will, run afoul of Linux file protection in one of two ways:

SELinux violation. Oddly, I’ve had containers fail due to SELinux violations even though the host OS (AlmaLinux 9) had SELinux running in Permissive mode. I haven’t found an explanation, but that’s how it is. You can add custom SELinux rules to the host to permit the operation, but that will likely just drop you into the other failure mode:

Operation not permitted. Even though the active user inside the container is root, it cannot chown files or directories in mounted volumes.

Not allowed? But I’m root!

Well, yes, but only within your tiny little kingdom.

Now think about what happens when you mount an external volume. Maybe it’s an NFS share holding files for many different users, each with their own user directory. Maybe you can read other users’ files, maybe not, depending on the rights you’ve been granted.

That’s how things look to the host OS user account, logged in as the host user.

But now let’s start a container which runs as its own root. If the usual root rules applied, that container could run wild over the external filesystem tree mounted to it. That would completely negate the protections of the host’s user account!

So, instead, the container’s root is prohibited from doing things that it couldn’t do as a user outside the container.

But what about subuids?

At first glance, it seems like you might be able to work around this problem using subuids. But nope. The subuid facility aliases user/group IDs inside the container to different user/group IDs outside the container, based on the subuid maps. That’s possible because a container is a kind of mini-VM and can therefore have its own private set of user IDs, independent of the container host’s user IDs.

The full details of subuid mapping can be found in the Podman documentation, but in its basic form, user ID 0 (root) inside the container maps to the rootless user’s own user ID, and all other internal user IDs are mapped into the subuid range assigned to that user. For example, if the range starts at 100000, internal user ID 1 maps to external user ID 100000 and internal user ID 999 maps to 100998 (remember, 0 is already taken by root!).
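To make that arithmetic concrete, here’s a small Python sketch of the default rootless layout. It assumes a hypothetical user “alice” with host UID 1000 and an /etc/subuid entry of alice:100000:65536 (a common default, but check your own file). The function names are mine, not Podman’s.

```python
from typing import Tuple


def parse_subuid_line(line: str) -> Tuple[str, int, int]:
    """Split one /etc/subuid entry, 'name:start:count', into its fields."""
    name, start, count = line.strip().split(":")
    return name, int(start), int(count)


def container_to_host_uid(container_uid: int, host_uid: int,
                          subuid_start: int, subuid_count: int) -> int:
    """Translate a UID seen inside a rootless container to the host UID.

    Assumes Podman's default rootless layout: container UID 0 is the
    invoking user's own host UID, and container UIDs 1..subuid_count
    are taken, in order, from the user's /etc/subuid range.
    """
    if container_uid == 0:
        # Container root is simply you.
        return host_uid
    if 1 <= container_uid <= subuid_count:
        # UID 0 is already used up by you, so the subuid range begins at
        # container UID 1; hence the "- 1".
        return subuid_start + container_uid - 1
    raise ValueError(f"container UID {container_uid} is not mapped")


if __name__ == "__main__":
    # Hypothetical user "alice": host UID 1000, subuid range 100000:65536.
    _, start, count = parse_subuid_line("alice:100000:65536")
    for uid in (0, 1, 999):
        print(uid, "->", container_to_host_uid(uid, 1000, start, count))
```

Running it maps 0 to 1000, 1 to 100000 and 999 to 100998, which is why a file created by a service account with UID 999 inside the container shows up on the host as owned by 100998.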

Thus, through some magic not documented in the current man pages (or, as far as I know, anywhere in the Podman docs), the “chown” command can chown to internal user IDs, but not to container host user IDs. The same goes for other attribute-changing operations.
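One way to see why host IDs are out of reach: the kernel publishes the active mapping in /proc/self/uid_map as “inside outside count” triples, which you can view with podman unshare cat /proc/self/uid_map. The Python sketch below parses that format and runs the mapping backwards. The sample map is hypothetical and assumes the same UID 1000 and 100000:65536 subuid range as above. A host UID that falls in no range (real root, or another user such as 1001) has no name inside the container at all, so there is nothing the container’s chown could hand a file to.

```python
from typing import List, Optional, Tuple

# Hypothetical output of `podman unshare cat /proc/self/uid_map` for a user
# with host UID 1000 and the subuid range 100000:65536.
SAMPLE_UID_MAP = """\
         0       1000          1
         1     100000      65536
"""


def parse_uid_map(text: str) -> List[Tuple[int, int, int]]:
    """Parse uid_map text into (inside_start, outside_start, count) ranges."""
    ranges = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, count = (int(field) for field in line.split())
            ranges.append((inside, outside, count))
    return ranges


def host_to_container_uid(host_uid: int,
                          ranges: List[Tuple[int, int, int]]) -> Optional[int]:
    """Return the in-container UID for a host UID, or None if it is unmapped."""
    for inside, outside, count in ranges:
        if outside <= host_uid < outside + count:
            return inside + (host_uid - outside)
    # No range covers this host UID. The container has no name for it;
    # files it owns show up as the overflow UID (usually 65534, "nobody").
    return None


if __name__ == "__main__":
    ranges = parse_uid_map(SAMPLE_UID_MAP)
    for host_uid in (1000, 100998, 0, 1001):
        print(host_uid, "->", host_to_container_uid(host_uid, ranges))
```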

Note that since the subuids are themselves UIDs (albeit remapped) on the container host, they are also subject to the standard outside-the-container restrictions on chown and its friends. In fact, you can’t even list a subuid’ed directory unless you’ve been granted rights to do so.

But granting rights is normally controlled by the root user, and it would be unfair to be locked out of what are essentially your own files and directories just because they’re owned by subuids! That brings us to:

Unshare

The podman unshare command effectively undoes the UID remapping. It can be used to execute a single command, or invoked with no command to start a shell inside the unshared environment.

Inside unshare, “whoami” reports root instead of your user ID, and file listings show the internal user IDs without the remapping. Thus, you can do all the necessary file operations without actually becoming root. Besides hiding the remapping, unshare’s root is also far more limited than sudo. For example, you cannot list the contents of the OS /etc/shadow file, nor should you be able to look into or alter other users’ files and directories.
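In practice, the most common job for unshare is fixing ownership on a bind-mounted host directory before the container starts. The sketch below simply wraps podman unshare chown in Python. The path and the 999:999 owner are hypothetical, and the IDs are container-side values (whatever the service inside the image runs as), not host UIDs. With a subuid range starting at 100000, the host will afterwards show that directory as owned by 100998.

```python
import shlex
import subprocess
from typing import List


def unshare_chown_cmd(path: str, uid: int, gid: int,
                      recursive: bool = True) -> List[str]:
    """Build a `podman unshare chown` command line.

    uid and gid are container-side IDs (e.g. the account the service in
    the image runs as), not host IDs; unshare applies the remapping.
    """
    cmd = ["podman", "unshare", "chown"]
    if recursive:
        cmd.append("-R")
    cmd += [f"{uid}:{gid}", path]
    return cmd


def unshare_chown(path: str, uid: int, gid: int, recursive: bool = True) -> None:
    """Run the command. Raises CalledProcessError if chown fails, or
    FileNotFoundError if podman is not installed."""
    subprocess.run(unshare_chown_cmd(path, uid, gid, recursive), check=True)


if __name__ == "__main__":
    # Hypothetical host directory that a container running as 999:999 mounts.
    print(shlex.join(unshare_chown_cmd("/home/alice/appdata", 999, 999)))
```

The printed command, podman unshare chown -R 999:999 /home/alice/appdata, is exactly what you would type at a shell prompt; the Python wrapper only matters if you script container setup.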

Volumes

Last, but not least, there’s another way to get chownable directories. A Podman (or Docker) volume provides non-volatile storage that’s integrated with the container system. Because a volume is sealed off from the rest of the filesystem, the user IDs of the assets within it shouldn’t be affected.

Volumes have always been convenient, but especially so when using quadlets to manage containers. When I started containers manually, I often kept data stores inside the container itself, so the information was “permanent” as long as I kept using that container. But quadlets destroy and re-create containers, so that’s no longer possible. Instead, put the non-volatile data in a volume (which can itself be quadlet-managed) and attach the volume to your container. That eliminates the risk of data loss and makes it easier to make containers elastic.
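For the quadlet case, here’s a minimal sketch, written in Python so it can be tested, that produces a pair of Quadlet unit files: a .volume file that declares the data volume and a .container file that mounts it. The application name, image and mount point are hypothetical placeholders. Naming the .volume file in Volume= is what lets Quadlet manage the volume instead of leaving it tied to a throwaway container; check the podman-systemd.unit man page for your Podman version before relying on the exact keys.

```python
from pathlib import Path
from typing import Dict


def quadlet_files(name: str, image: str, mount_point: str) -> Dict[str, str]:
    """Return {filename: contents} for a container and its data volume."""
    volume_unit = f"{name}-data.volume"
    return {
        # A .volume file with just its [Volume] section declares the volume.
        volume_unit: "[Volume]\n",
        f"{name}.container": (
            "[Container]\n"
            f"Image={image}\n"
            # Naming the .volume file here (rather than a host path) makes
            # Quadlet manage the volume and add a dependency on it.
            f"Volume={volume_unit}:{mount_point}\n"
            "\n"
            "[Install]\n"
            "WantedBy=default.target\n"
        ),
    }


def write_quadlets(files: Dict[str, str], directory: Path) -> None:
    """Write the unit files, e.g. to ~/.config/containers/systemd/."""
    directory.mkdir(parents=True, exist_ok=True)
    for filename, text in files.items():
        (directory / filename).write_text(text)


if __name__ == "__main__":
    # Hypothetical app, image and mount point; nothing is written here.
    files = quadlet_files("myapp", "registry.example.com/myapp:latest",
                          "/var/lib/myapp")
    for filename, text in files.items():
        print(f"--- {filename}\n{text}")
```

For a rootless user, the files go in ~/.config/containers/systemd/. After systemctl --user daemon-reload, the container runs as myapp.service, and destroying or re-creating the container leaves the volume’s data alone.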

“Instant OSD” — a fast way to bring up a VM as a Ceph OSD node

Running Ceph OSDs in virtual machines may not be considered “best practice”, but it’s popular nevertheless. VMs are easier to manage on the whole, and before Ceph became container-based, they were a lot easier to configure and control without getting tangled up with other host subsystems.

This isn’t quite a “one button” solution, but it’s close. There’s some manual network configuration as there would be for any Ceph host, but the bulk of the work is done via a simple shell script and Ansible.

So, for your edification and enjoyment: https://gogs.mousetech.com/mtsinc7/instant_osd

IMPORTANT NOTICE

A lightning strike took mousetech.com off the Internet for several days and has caused problems for the Gogs server. We hope to have it back online soon. (2024-07-27)