Train Wreck. How nemo-desktop trashed both my local machine and the LAN

For some time now, I’ve been having problems where the power-save (suspend) feature of my desktop system has failed to put the machine to sleep. In some cases, in fact, the entire machine became black-screen unresponsive.

Examination of the system logs indicated that the suspend function was failing to freeze a number of tasks, thereby aborting the suspend. Indications were that it related to the glusterfs fuse client, and some tweaking of the gluster client and servers to upgrade protocol versions did help, it appeared, but only temporarily.

The other thing that I didn’t like was that the nemo-desktop task was eating one of my cores alive. I actually removed the entire cinnamon system and re-installed it, but that didn’t help. I considered moving back to gnome, but I need those monitoring widgets that gnome 3 in its arrogance dropped, and I keep a lot of icons on the desktop for quick access to hot project resources.

As it happened, I botched a cron definition on a long-running backup job, the server started launching multiple instances of it, and the gluster system took over the LAN. I fixed that, but noticed that gluster was still doing a ton of traffic to my desktop system.

And, incidentally, nemo-desktop response was painfully slow even just to pop up menus. But not regular file-explorer nemo. Only the desktop!

Digging into the toolkit (and google), I found to my horror that for some reason, nemo-desktop was opening, reading, and closing files over and over and over again. And among the files it was chowing down on were a handful of shortcuts (softlinks) to resource out on the glusterfs filesystem.

I deleted all of those links and suddenly all was calm. My network activity dropped to a whisper, and, strangely, nemo-desktop seemed to stop iterating through desktop files. So I’ve lost some convenience, but now the system performs much better (although nemo-desktop still reacts somewhat sluggishly). And power-save features now work reliably.

The curse of the mad Puppet

I have been working with various things designed to allow me to control the mousetech.com domain assets in a more centralized way. One of them was to try and use Puppet to provision machines. Puppet is a fairly nice tool, but there are some unexpected pitfalls.

There are several ways to get puppet on a CentOS 5 server. If you’d a glutton for punishment, you can always pull down the source and build from scratch, but I don’t recommend that when just getting started. You can also pull it via YUM from the EPEL repository. Or, you can import the Puppet Labs repo in and pull from there.

I already have EPEL in my stock set of YUM repositories, so that’s what I went for first. In the beginning all was fun and games. Then I got more ambitions and started defining modules. This didn’t work. Worse, the sample documentation used commands that didn’t work. It was getting very frustrating.

It became obvious fairly early that some significant changes have been made to puppet and that what I had wasn’t the Latest and Greatest. That would have been OK, except that attempts to read the online documentation for the older stuff kept leaking back into docs for the newer stuff (a not uncommon problem, best handled I think by archiving the old docs as self-contained PDFs). On top of that, the version of puppet that I was running was sufficiently antique that much of the documentation had fallen off the website (see previous).

To add to the confusion, I wasn’t really sure WHICH version of puppet I was running, since their enterprise product doesn’t keep quite the same set of version numbers as their community version plus I suspected the version (2.6.18) in the EPEL RPM wasn’t indicative, either. I finally came to the conclusion that 2.6.18 – which is the version that Centos10 pulled actually correlates to the community 2.5 version, which is something like 2.3 in Enterprise versioning.

At this point, I went to the source: Puppet Labs and found out about their repo. Unfortunately a network-based RPM install failed for obscure reasons (I’m not sure whether I have lingering LAN issues or it’s them). Fortunately, I was able to wget and install locally. After which I was able to install a Version 3 Puppet, the documentation now matches the commands and module processing works the way they said it did.

One last fly in the ointment, though. It seems that nodes and classes share the same namespace. I was using the same name for my guinea-pug machine node and one of the packages it was trying out and while both node and component parsed, the actual execution was only done against the node – the package was silently ignored. I fixed this by changing the node name to its fully-qualified domain name.