Creating a Functional Custom CentOS Install DVD

There are several existing How-To’s out on the Internet on the subject of creating a custom CentOS installation DVD/USB storage medium. But unfortunately, actually trying to employ them can be frustrating. So here goes with Yet Another How-To that I hope will fill some of the holes.

Why Custom media?

Why even bother with a custom install? Why indeed. The mousetech.com server farm is a fairly typical in-house setup. It has extensive provisioning capabilities, daily backups, filesystem mirroring and failovers for High Availability. If a server dies, it’s relatively easy to reconstruct it.

But what if a meteor hits the server complex or war breaks out and I have to flee to Argentina? How do I minimize the time and effort required to reconstruct the essential frameworks?

One way is to define a master bootstrap server that can be used to rebuild the main provisioning systems. The master bootstrap doesn’t run in normal operations. It’s independent of them. The normal servers distribute their functions among many machines, VM’s and containers, but the master bootstrap compacts their core functions down onto one temporary machine.

I could do the master bootstrap functions via a stock Centos install DVD set and a kickstart file, but by creating a custom install with the essential packages and kickstart (and customization scripts/data), I can make this a completely unattended operation. And when things are in total disaster mode, the less I need to remember to do, the better.

One note. An unfortunate consequence of the curent CentOS and related OS distros is that the old reliable convention of expecting there to be an eth0 device to network through is pretty much shot. I don’t assume any particular physical machine to be the target of this install, and therefore cannot predict what names the installation will assign to the network ports. So rather than find false comfort, I leave actual NIC setup to manually configuring the /etc/sysconfig/network-scripts after the installation has taken place. Similarly, since I install with no known network or gateway, everything is self-contained – no external servers.

And with that, I begin.

Step 1 – Build a workspace

Making a DVD requires a lot of disk space. So find a place on your build system with lots of room and create a workspace directory. We’ll mostly work out of there. For convenience, I’m going to call this directory “buildiso” and so it’s going to have a pathname of something like /home/timh/buildiso.

cd to this directory. All relative paths given in this howto are relative to this directory.

Step 2 – Start with an existing image

Creating an installation CD from scratch is a monumental task and not worth the effort. So rather than do that, let’s do what everyone else does and modify an existing image. Because this is panic recovery and I want it all on a single medium, I’m going to use the CentOS minimal image and build on it.

So to begin, we make a file-by-file copy of the DVD image. If you have a physical DVD mounted you should be able to copy that. If you have an iso image, mount it like so:

cd /home/timh/buildiso
mkdir mnt     # this is where we'll mount the source ISO file (if we use one)

mkdir bootisoks # this is where we build our new ISO image
mount -t iso9660 -o loop /path/to/centos-disc.iso mnt

Copy the source files from your mounted DVD or loop mount into the bootiso directory. You can use the Linux cp command, rysnc, or whatever you like as long as it copies all files and directories, including the hidden ones.

Once you’ve copied the files, you can unmount the ISO (or DVD). You don’t need it anymore unless you have to go back to the beginning.

Now that we have our model files, change them to be writeable so we can play with them:

chmod -R u+w bootisoks

Create a kickstart file and copy it to the iso image isolinux directory:

cp my_ks.cfg bootisoks/isolinux/ks.cfg

This will end up in the root of the actual DVD we’re creating.

Add any additional RPMs we want to the workspace iso package directory. These can be additional stock RPMs, third-party RPMs or your own custom-built RPMs. They all go directly into the bootisoks/Packages directory.

The CD install process for CentoOS 7 uses yum and the yum repository it uses is stored on the disc itself. Part of the repository infrastructure is the repodata directory and as a precaution, you should make a backup copy of it.

Most of the files in repodata have long twisty names designed to help mirrors keep in sync when distributed. We actually only care about one of them, so we’ll steal that one for later use:

cp bootisoks/repodata/*-comps.xml comps.xml

Gotcha #1

This file defines the installation groups, including the most critical one of all, which is base. So you’re going to need to inject that into the process of building the updated repodata. If you do not, the Linux installer will whine and fail.

Here’s how to properly reconstruct the repodata:

cd bootisoks

rm -rf repodata

createrepo -g ../comps.xml -dp .

cd .. # return to our workspace directory

Gotcha #2

It is critical that the newly-created repodata be pointing to the correct location for the RPMs that will be installed, which is to say the Packages directory. To verify that this happened, you can use this command:

less bootisoks/repodata/*-primary.xml.gz

This is a compressed file, but “less” is helpful and will display it uncompressed.

What you need to see in the ‘package type=”rpm”‘ elements are location sub-elements that look like this:

 <location href="Packages/NetworkManager-glib-1.12.0-6.el7.x86_64.rpm"/>

If you don’t see “Packages/” in the location, then yum won’t look in the DVD’s Packages directory, and it won’t find the RPMs. It will whine profusely and the install will fail.

This is why, contrary to some examples, you should run the createrepo program from the bootisoks directory and not the Packages directory. It will presumably scan (and add) any other rpms it finds in the bootisoks tree, but since there shouldn’t be any, that’s OK. If there’s an option to get the proper location without scanning everything, the createrepo documentation is too vague and my experiments haven’t been productive. Although here’s something I don’t think I tried (Note: I tried. It failed.):

createrepo -g ../comps.xml -dp Packages

Check the primary.xml.gz file and if the location is correct, use that.

The other “gotcha” on package installation can come if you omitted a pre-requisite package needed by one of the packages you are installing. The Linux installer packaging log will name any missing packages, in which case add them to the Packages directory, delete and rebuild the repodata and try again.

Step 3 – Activate the Kickstart file

To use your custom kickstart file, you need to define it to the bootloader directives in the isolinux directory. This is basically a grub menu file named isolinux/isolinux.cfg and you can use sed to update the different boot options in a single swoop like this:

sed -i 's/append\ initrd\=initrd.img/append initrd=initrd.img\
  ks\=cdrom:\/ks.cfg/' bootisoks/isolinux/isolinux.cfg

In other works, add “ks=cdrom:/ks.cfg” to the “append” statements in that file.

Special Gotcha: I ran into serious problems when I created my first image because I gave my DVD a custom volume label (using the -V on the mkisofs command). This proved fatal, because the “append” statements refer to the install media by its label and when the installer could not find a volume with that label. Which resulted in the following cryptically useless installation output:

Starting dracut initqueue hook….

At which point the whole thing would hang forever. Changing the volume ID in isolinux.cfg fixed that.

Step 4 Build the image

At this point, we’ve installed and activated our custom kickstart and setup the packages and repodata. If you have any custom scripts or data files to add the the image, do it now. And then build the ISO file, like so:

cd bootisoks
mkisofs -o ../boot.iso -b isolinux.bin -c boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -V "CentOS 7 x86_64" -R -J -v -T isolinux/. .
cd ..

You can make the ISO bootable from a thumb drive by doing this:

isohybrid boot.iso

And finally, add the checksum so that media testing will work properly.

implantisomd5 boot.iso

At this point, you should be able to burn the iso to DVD or “dd” it to a USB media device. Happy booting!

If you have problems:

This posting has attempted to correct and amplify what I have learned elsewhere. But of course, it’s likely to have introduced a few errors of its own. Because I don’t want to have to deal with spammers and abusers, this website doesn’t allow comments, but if you have questions or comments, I can be contacted through the Linux forum at http://coderanch.com

References:

https://serverfault.com/questions/517908/how-to-create-a-custom-iso-image-in-centos

https://www.frankreimer.de/?p=522

Memory testing on Asus M5A97

Hurricane Michael touched only lightly in Northeast Florida, which was a relief after what Irma did the previous year. However, we weren’t completely unscathed. and the lights did blip and the UPS kicked in, but not soon enough. Thereafter, my desktop machine’s energy-saving modes started acting up.

This is a machine that had been swapped with a production server when it acted up there (turns out the CPU was overheating and needed some thermal paste), but it had already been diagnosed with bad RAM once, so when lesser remedies failed, I kicked the box into memtest86+.

The results were appalling. There were 2 4GB sticks and one 8GB stick in the machine and I tested them alone and in combination. The 8G stick failed outright, and the 4 GB sticks would test fine, but only one at a time. Add 2, and the machine would fail. Anything more than 4GB total in the box would blow it.

This wasn’t simply a test fail – the entire box would reboot shortly after the test started, before the CPU even hit full operating temperature. I feared the worst and bought another motherboard and more RAM.

When that came in, I tested it. And got the exact same results. The new motherboard rebooted. And the new memory sticks were both 8GB, and they both caused reboots. Anything more than 4 GB would fail on either motherboard.

Such consistency leaves only one other failure point and that’s the venerable memtest86+ test program itself. Not something that you’d usually expect to fail so catastrophically, especially since the latest Fedora release was installed, but I’d tweaked the daylights out of the the BIOS (including manually setting RAM timings), and no luck. And, incidentally, memtest86+ was displaying the wrong RAM timings.

So I did some searching for alternatives and found 2. One will run on a live Linux OS, although of course, its ability to test RAM is limited by having to work around the RAM being used by the OS and apps. The other is memtest86, which is what memtest86+ was forked from and is now available in both free and paid models.

I tried memtester, which runs under Linux, and it did flunk part of my original 8G stick. Then I tried memtest86. Unlike memtest86+, it did not spontaneously reboot. In fact, all the RAM passed!

Since memtester did claim certain bits were bad, I need to do more research, but apparently despite having been updated are recently as this past July, memtest86+ apparently lacks decent support for UEFI, DDR4 and who knows what else, making it essentially useless even for as dated a board as the M5A97. And yes, I know it’s showing its age, but it meets all the necessary criteria for my needs.

Using the Millright CNC machine to make custom printed circuits

Synopsis

This is the anchor for what will probably be a whole series of notes on using the Millright CNC machine. This first posting give some overview info about CNCs.

Basic info about the Millright CNC

The Millright CNC is a Computer Numerical Control machine. It functions much like a 3D printer, except that instead of adding material, it uses a process of cutting away (milling) material. Even much of the architecture and control circuitry is the same as for a 3D printer.

The Millright M3 CNC machine is available as a (relatively) inexpensive kit. I took about 4 days to assemble it, since I didn’t want to rush anything and I didn’t (yet) know what did what, how, or where.

The CNC itself is merely the platform. To do useful work you also need the following:

  1. A cutting device motor in a suitable carrier that bolts to the vertical (Z) assembly. One of the recommended options for this is the DeWalt DWP-11 finish router, which is what I’m using
  2. A “bit” (mill end) that chucks into the cutting motor. These come in many types depending on what you want to cut and what shape you want the cut to be.
  3. A shop vacuum, if you don’t want to end up quickly up to your knees in shavings.
  4. A vortex separator. If you don’t want the shop vacuum’s filters to clog up in 5 minuter or less. I use one that clamps to the top of a generic hardware store paint bucket (5 gallon size).
  5. An easy way to cut power to the CNC and motor. A power distribution strip with an on/off switch will do. A big red panic button is optional.

Cautions

A CNC machine is a lot scarier than a 3D printer. Think of a table saw that not only wants to randomly shred both you and anything you’re working on, but a killer robot that can rapidly strike out in unexpected directions. Think carefully before you set it in motion and make sure you can stop it quickly!

Next up: designing a printed circuit board for CNC milling (coming soon!)

 

DigiSpark ATTiny85 Revisited

Finally got the thing to program. I went and bought some of the semi-bare USB units. All told, I think you can get this device in 4 different forms (or more).

  1. Bare chip. Should be easy to program as long as you have the right voltages and drivers.
  2. Bare chip-on-a-board. Same as bare chip but it’s on its own breakout board with a little support circuitry
  3. USB with bare connection. Like some cheap thumb drives, the card edge plugs straight into a USB port.
  4. USB with mini/micro connection. Basically #3 except you connect via a standard USB cable.

It was case #4 I was having so much trouble with, so I bought some #3s figuring that they might match available docs better.

They did, and it gave me a clue as to why the #4 boards didn’t program.

The ATTiny85 is programmed via a slightly non-standard Arduino service called Micronucleus. Micronucleus goes straight to the USB hardware. And by straight, I mean that it doesn’t even expect the device in question to be a named OS device. In Linux, that means nothing appears under /dev when you plug the ATTiny USB device in.

I installed the udev rules given for Ubuntu into my Fedora system. I haven’t dissected them, but I’m pretty sure that that’s what they’re for – capturing the hotplug of the ATTiny device and keeping it from mapping to /dev. The access rules given were 0666 and by reading carefully over available documentation I learned that often running the Micronucleus utility as root would clear up the error I was getting: “Assertion `res >= 4′ failed.”

666 doesn’t allow “execute” rights, so maybe on Fedora that’s a problem. The other alternative would be an selinux problem, but my audit logs don’t seem to indicate that.

So, by running the entire IDE as root (pending further discovery), I can now upload to the DigiSpark. Once I had the case #3 units, the case #4 units worked as well. Apparently they’re essentially identical except for their connection hardware.

I’m now poised to enjoy this inexpensive but useful little device. All is not forgiven, which is why I leave my original complaints posted. But at least I no longer have a box of useless parts.

 

 

Why I’m not using DigiSpark’s ATTiny85 in Almost Everything

The DigiStump ATTiny85 board  is a really attractive bit of hardware. It’s cheap, it can be accessed via the on-board USB connector, and while it hasn’t got the advanced hardware features of its bigger kin, there are a lot of things you can do with its small number of ports, memory and features. I’ve got a list, in fact.

And I’m not getting anything on that list done, because I cannot program the device!

A good comparison to the ATTiny85 is the ESP-01 mini-board featuring the ESP8266 processor. That’s an 8-pin board, also with relatively few connectors. In fact, it doesn’t even have a USB connector. And in theory, it should be harder to work with, since it runs an internal WiFi tcp/ip stack!

But I haven’t had any problems with the ESP-01, while my ATTiny85 units sit in a box, unused and useless.

Why? Apparently this was a hit-and-run project. The documentation, once written, has no indications that it’s being kept up-to-date. There’s a wiki, but so far it’s not been of any help.

There are several things I fault in the documentation.

  1. As mentioned, I don’t think it’s up-to-date as regards current Arduino IDEs or the OS’s they run under. Vague hints are given that certain Version 1.6 IDE’s are not suited (why?????), but the current Arduino IDE is version 1.8. What does that mean?
  2. A big part of the documentation is an animation. I don’t like animations in the middle of instructions.
    1. The motion distracts from reading the bulk of the text.
    2. You cannot print out the documentation and read/annotate it offline. Unless you’re Harry Potter rendering the Daily Prophet, animations don’t print well.
    3. If your animation is not only auto-playing, but starts playing sound when the page is opened, that’s it. I’m gone. Don’t try to sell me anything again. Ever.
  3. Pre-requisites. A side link points Linux users to some udev rules. I have problems with this, because first, the rules don’t actually explain what they are doing (If I read them correctly, they prevent the ATTiny85 from automatically creating /dev/ttyUSB and /dev/ACM devices, but don’t tell udev not to try and treat them as mountable USB drives). And secondly, there’s no explanation as to what to expect when things work right. Or, worse, what you’ll see/not see, if they’re wrong. Popular udev rules often end up as part of stock distros, so it’s important to know if you’re likely to do something that’s redundant or even counter-productive.
  4. Devices. Unlike most Arduino interfaces, I don’t think that the DigiSpark programmer actually uses any of the listed available devices on the Tools menus, but instead talks straight to hardware (presumably by scanning USB and looking for one (or more???) 16d0:0753 MCS Digistump DigiSpark units. But it would be nice if that had been explicitly mentioned, precisely because it’s not the usual mode of operation.
  5. Operation. There’s no indication of what you should see when a sketch uploads. The IDE isn’t uploading via normal channels, so its own messages are actually misleading. And it was only after a lot of looking around that I even saw a listing of something more like what’s to be expected.
  6. Diagnosis. As far as I can tell, there’s absolutely no error messages that ever print out if the IDE doesn’t connect, much less what couldn’t be connected to or possibly why. And so, I’m left frustrated, with no clue as to which of several subsystems are at fault. Much less how to diagnose or correct them.

Bottom line

Yes, it’s a nice device. Too bad I cannot use it. It takes up space in my parts box. And until Digispark spends some time and effort on making it useful, I won’t be buying any more of them. Because no matter how cheap they are, cheap and useless is too expensive. I won’t be buying any Digispark products until I hear that they’re committed to making said products usable. Even if this isn’t one of their more profitable units, it indicates how little they are willing to commit and thus a baseline on how much confidence I can expect from more advanced offerings.

I’ve got decades of experience on all sorts of equipment. It’s generally been my job to figure out how to work with new and unusual hardware and software. But this is simply more trouble than it’s worth.

Quick-printing recipes with a Bluetooth POS Thermal Printer

The problem

When I need a break from technology, I garden, growing herbs and the odd vegetable. That seques into cooking with what I’ve grown.

I’ve got lots of recipes for lots of cuisines, Mexican, Italian, Indian, English, German, Chinese, the Good Old USA, Panama, the world over. They’re held in many forms, 3×5 index cards, 4×6 index cards, loose-leaf binders, hardbound books, and saved web pages. But one of my principal “go-to” places for recipes is my desktop computer.

The Gourmet Recipe Manager is a very useful open-source program. It’s easy to use, easy to search, and while I have had some times when I had issues with the databases, generally easy to maintain. It can even scrape recipe webpages. At one time, in fact, it came with a set of templates that understood many common recipe websites and could automatically parse them, although that feature hasn’t been available for a while.

But having a database is one thing. Using it is another. I keep the recipes database file on one of my servers, where it’s not only accessible from the desktop app, but also from a custom webapp I wrote that allows me to search and display recipes on a tablet device.

You’d think that was enough, but I don’t really care to toss a tablet around in the kitchen and the screensaver turns things off at annoying times or else I have to burn power to keep the screen on. While I’ve been tempted to take my original epaper Nook and make it a permanent kitchen fixture, it hasn’t really been that attractive an idea.

So there are 2 other options I had. One, write down the essentials on paper or 2, print them. If I write them manually, I can’t read my own writing, but it seemed such a waste to fire up the printer and spit out a full-sized sheet of paper that I was only actually going to use a few square inches of.

So I’ve come up with another alternative.

The thermal printers used with Point-of-Sale (POS) cash register systems are fairly inexpensive. They use a minimal amount of paper and because they use inexpensive thermal paper, there’s no overpriced ink or toner to buy. There’s a standard interface (5v serial RS-232) and protocol (ESC/POS) that allows one to interface simply. Because I wanted to be able to put the printer anywhere it was convenient without worrying about wires, I got a model that supports Bluetooth.

It’s a cute little critter, smaller and lighter than I’d expected – I’d been anticipating something about the size of the ones at the grocery store, but this unit just about fits in the palm of my tiny hand and has a battery that’s good for several days on a charge, so not even a power cable is required most of the time.

The tricky part was in getting it to talk to my computer.

My first attempts were from an Android tablet – there are several apps in the Google Play store that can talk to devices like this, although none that fully support what I want. The important thing, however, was that I was able to confirm that I could pair to the printer and do basic printing.

Getting a Desktop PC Bluetooth link

Now that I knew the printer worked, I plugged a Bluetooth dongle into my server. That’s when the trouble began.

The standard Linux interface to Bluetooth these days is BlueZ. It gets a lot of criticism:

  1. User Documentation is virtually non-existent
  2. The source code has virtually no comments in it (this demotes you to “amateur”grammer in my view).
  3. There are several different versions of it, which are radically different from each other. Doing an Internet search is especially perilous, since you often get returned results telling you to use obsolete or not-yet-available tools.

Things don’t get any easier, since Bluetooth itself has gone through several versions, too. Bluetooth 4, alias Bluetooth Low Energy (BLE) is popular, but a lot of devices fall back to simpler, earlier protocols. Some of the services that BLE is supposed to support may not be available (or are still incomplete) in BlueZ, and the devices themselves.

It’s a mess, and begs for someone to come in and do a professional job of cataloging and documenting. And putting some EXPLETIVE DELETED comments in the source code.

But the long and short of it is that what I determined I needed to get my recipes printed was RFCOMM, which provides serial port services for Bluetooth. Bluetooth can do many other things, up to and including TCP/IP and OBEX (the Object Exchange protocol used, for example, to beam pictures to/from cellphones and other devices), but RFCOMM is what I needed.

That means that I had to ensure that the core bluez modules were installed on my desktop Linux system and that the bluetooth service was started. It also meant that I had to create an /etc/bluetooth/rfcomm.conf file that defined the target printer and in particular, its MAC address. Then I had to create a /var/lib/bluetooth/aa:bb:cc:dd:ee:ff/pincodes file that mapped that MAC address to the PIN code for the printer (1234, in my case).

The “aa:bb:cc:dd:ee:ff: directory appeared by magic, apparently when I first plugged in the dongle. Its actual name is the MAC address of the dongle itself. Looking at web-search results, I think the single most frustrating thing for most people was understanding when you had to refer to the dongle’s MAC address and when you had to refer to the target device’s MAC address. Failure to keep the 2 straight tends to make the RFCOMM utility whine about “no route to host”.

Incidentally, the command:

hcitool dev

should list the MAC address of the dongle (or built-in adapter, if you are using a machine with factory Bluetooth). This would typically be listed as the “hci0” device.

The command

hcitool scan

Should allow you to enumerate the Bluetooth devices that are broadcasting and ready to pair. Unlike a lot of other Bluetooth devices, my POS printer apparently is always ready to pair and doesn’t have a magic button to prep it.

Once all of the above are ready, you can bind the rfcomm device.

In theory, that would be as simple as this:

rfcomm bind 0

But in practice, you’ll get a “missing dev parameter” error message, Instead, you have to include the printer’s MAC address on the rfcomm command line – it can’t simply look up a logical name or unit ID from the rfcomm.conf and use the MAC address in there for some strange reason.

OK, with (a great deal!) of luck, you now have a “/dev/rfcomm0” device that you can talk to. Printing is as simple as redirecting an “echo” command to that device. While you can use all sorts of neat ESC/POS control codes to select fonts, justifications, bar codes and stuff, simple text lines ending with a linefeed character (no carriage return) are quite adequate.

Printing a Recipe

So much for the hard part. What I did next was simply yank a recipe out of the Gourmet database and enumerate its ingredients list. I didn’t bother with the cooking instructions because it’s a simple recipe and besides the line width of the POS printer is only about 32 characters and I didn’t want to deal with that just yet (text will wrap, but I’m not sure how much text the printer’s buffer will tolerate).

So here’s a python script to print recipe #37, which is the recipe ID in the database I found when I searched recipe titles for “Apple Oat Crisp”.

#!/bin/python
# Print a recipe to the Bluetooth (/dev/rfcomm0) POS thermal printer
#
import sqlite3
conn = sqlite3.connect('recipes.db')

c = conn.cursor()

t = (37,) # the recipe ID in the database
c.execute('SELECT title from recipe where id=?', t)
title = c.fetchone()[0]
print '*** ',title,' ***'
for r in c.execute('SELECT amount, unit, item FROM ingredients WHERE recipe_id = ?', t):
    print r[0],' ',r[1],' ',r[2]

This small script actually outputs to stdout, but by redirecting stdout to /dev/rfcomm0, I got my recipe print!

thermalprint

Future enhancements

It would, of course, be nice if I could make this function be a plugin to the Gourmet Recipe Manager app itself, but that’s no big deal.

Of more immediate concern is normalizing the output and adapting it to the shorter print lines. The Gourmet Recipe Manager stores units pretty much verbatim, so you’ll find units like “Tablespoons” instead of “tbsp”, and units are “2.0 Cups” of flour. The worst of all is “1.6666666667” tsp salt instead of 1 1/3 tsp. So a little tweaking of the data before printing would be nice.

cloud-init “gotcha”

I was putting together a project using The Foreman to spin up and manage Amazon EC2 instances and ran into a problem. I could take an AMI and launch it, but I couldn’t ssh into it.

One of the major reasons why this was so was that cloud-init was silently failing and as a result, my ssh key wasn’t being installed.

The AMIs in question were built on top of Ubuntu 14.04LTS. The Foreman creates its own private access key to launch and control EC2 instances and it’s not accessible for general use. You have to supply your own private key if you want to login via ssh.

The recommended way to do that is to inject it via cloud-init. However, cloud-init wasn’t working right.

After a lot of experimentation, I discovered that the issue was in the attempt to also use cloud-init to set the simple hostname and fqdn of the newly-created host. Turns out that including the “host:” line like all the samples out on the Internet do causes the ENTIRE cloud-init file to be ignored. So in order to get my ssh key (AND hostname) injected, I just needed the “fqdn:” line.

Apache, Tomcat and SSL – with Pictures!

Or at least examples!

Apache SSL to non-SSL Tomcat:

<VirtualHost mytchost:80>
  ProxyPass / http://backend.tomcat.host:8080
  ProxyPassReverse / http://backend.tomcat.host:8080
<VirtualHost mytchost:80>

<VirtualHost mytchost:443>
  ProxyPass / http://backend.tomcat.host:8080
  ProxyPassReverse / http://backend.tomcat.host:8080
<VirtualHost mytchost:80>

Apache SSL to SSL Tomcat. This is what you’d normally use if the Tomcat webapp had secure transport specified in its web.xml:

<VirtualHost mytchost:80>
  ProxyPass / http://backend.tomcat.host:8080
  ProxyPassReverse / http://backend.tomcat.host:8080
<VirtualHost mytchost:80>

<VirtualHost mytchost:443>
  ProxyPass / https://backend.tomcat.host:8080
  ProxyPassReverse / https://backend.tomcat.host:8080
<VirtualHost mytchost:80>

Apache, Tomcat and SSL

Its a popular thing to use Apache (or nginx, etc.) as a reverse-proxy server fronting Tomcat. However, documentation on such practices tends to gloss over certain important things. Specifically:

1. Who owns the SSL cert that manages such a configuration. Apache or Tomcat?

2. Is the Apache-to-Tomcat tunnel encrypted? If so, how?

I finally decided to determine by experimentation. Here’s the scoop:

1. Encryption between Apache and Tomcat is not supported by the AJP protocol. If you need back-end encryption, use Apache’s mod_proxy, not mod_ajp.

2. If you make an https connection to a website hosted by Apache or proxied by Apache to Tomcat, the cert that’s applied will be the (x509) cert for that Apache host. Not a Tomcat jks cert.

3. You can configure Apache to proxy incoming SSL traffic to Tomcat even though Tomcat  itself isn’t configured for SSL.  Simply forward from your Apache ProxyPass/ProxyPassReverse to the Tomcat http port (8080 by default).

Note If you forward SSL to Tomcat via its http port, then none of the traffic between Apache and Tomcat will be encrypted. That’s OK if you are doing a forward within the local machine (using loopback) or if you are OK with security on your LAN.

4. If clear-text between Apache and Tomcat is not acceptable, you can do SSL from Apache to Tomcat. In that case, Tomcat needs its own keystore and certs, independent of the Apache certs. Apache will decrypt incoming Internet traffic so that it can do whatever it needs to do with headers and rewrites, then re-encrypt the proxy data using Tomcat’s cert.

To do SSL between Apache and Tomcat, the ProxyPass/ProxyPassReverse directives should address Tomcat’s HTTPS port (8443). Presumably you can even take plain HTTP coming into Apache and SSL it to Tomcat, but I didn’t bother to check.

Note that between Apache and Tomcat, a self-signed cert is probably good enough. In fact, since the cert won’t be officially registered, it’s one less internal secret for people to learn on the Internet. Apache’s handling of the finer aspects of backend certs are tunable, but the defaults are sufficient for most purposes.

Land Mines – Spring Neo4j

One of the primary purposes of this blog is to record what I’ve learned by tedious trial and error and/or spending time down in source code I shouldn’t have had to look at.

This particular topic has more than its share of discoveries.

Spring Neo4j claims that it’s intended to imitate, where possible, existing persistence systems approaches. Unfortunately, it has a long way to go on that.

First, let me mention that after descending through Maven’s equivalent of “DLL Hell”, I have been working with the following version sets:

  • Spring Framework version 4.0.6
  • Spring Neo4j 3.3.0.Release
  • Neo4j version 2.2.0
  • Logging courtesy of slf4j version 1.7.6 and log4j2 version 2.2, which has renamed the log4j config files since log4j v1 and added a few new config options (just for info).

As usual, just finding what was compatible with what was an adventure. I have a nasty, if unprovable suspicion that some of the pain could have been reduced if certain functions and classes had been marked “deprecated” instead of being removed or relocated.

Some of what I’ve learned will break shortly. Spring Neo4j 4 will actually lose a number of functions and annotations that exist in Spring Neo4j 3 (they were probably broken anyway). And some of the fixes I’ve come up with may actually be exploiting bugs rather than clean fixes. It’s the best I could do.

My use case seemed simple. I have 2 entity types: A and B, and a relationship, we’ll call “wants” which has a property named “priority”. I want to set up a network of many-to-many A-wants-B relationships with associated priorities and I want to be able to select a given B and get back an ordered list of A-wants in priority order with the priorities visible.

So using Spring Data Neo4j (SDN, for short), that gives me 2 NodeEntity classes (A and B) and a RelationshipEntity (“Wants”). I’m using GraphRepository extensions to manage persistence for A and B, which gives rise to the first question:

Q: What’s the best way to handle persistence of RelationshipEntities?

There’s a RelationshipGraphRepository, but unlike GraphRepository, it’s a class, not an interface and there is absolutely zero documentation I can find on how to use it or even if I should be explicitly using it. I therefore used ordinary GraphNode for the relationshipEntity. It seems to work. More or less.

And the first observation:

When they say “@Fetch”, they mean “@Fetch”

In JPA, a recommended way of persisting an object is in the form “a = repository.save(a);” This is because JPA may construct new new “a” using the original “a” as a basis, but carrying (visibly or not) extra data and/or meta-data provided by the “save” operation.

This is also the (explicitly) recommended practice for SDN. But there’s a big “gotcha”.

When you persist with JPA, the returned object will contain AT LEAST as much data as the original copy did (it may even be the original copy). When you persist with SDN, the returned object may not. What SDN does is save the data, then do a basic lazy fetch resulting in a new object instance. Which means that unless you annotate your complex properties with “@Fetch”, you’ll end up with a nasty surprise. Unlike JPA, which returns either the original value or an unresolved proxy object, SDN returns null for unresolved lazy fetches. Which is probably a recipe for lost data and is definitely a recipe for confusion (remember, this is things I learned the hard way!)

Ouch!

You can code @Query on both Repository and EntityNode classes

The manual doesn’t mention this. It’s a useful thing to know, although since in EntityNode classes, you often want to base your query relative to the current EntityNode instance, I also spent a lot of time coding things like “{self}” and “{this}” without success. I finally found out that using the clause START node=({self}) does the trick, although since “START” is supposed to be deprecated, there’s likely something that can do the same thing with a simple MATCH. That’s something to tackle another day.

What you get back isn’t what you think it is

Cypher isn’t as intuitively obvious to me as some people seem to think it should be. It’s a form of complex mathematical notation and while the docs on it are fairly illustrative, they are perhaps less complete or explanatory as they might be. So I spent a lot of time trying weird expressions. Here’s what worked as a member query on my “B” node:

@RelatedToVia
List wantsList = new ArrayList(5);

@Query("START b=node({self}) MATCH (b)-[r:WANTS]-(a:A) return r ORDER BY r.priority")
public List getWants() {
    return wantsList;
}

A lot of the earlier attempts returned false “Wants” objects whose nodeIds were actually the IDs of the “A” objects on the other end of the relationship.

Incidentally, Neo4j normally wants relationships to be unordered and will complain if you attempt to use an ordered collection (such as List) to hold them. However it is smart enough to realize that when you do an “ordered by” query it’s OK to define the collection using an ordering collection object.

But that’s not all!

The “Wants” list that this particular query returned is not directly usable. The nodeIds are valid, but all the other property values were blank. Not merely the lazy-fetch values, but even the primitive property values. So to be actually usable I have to pull the nodeId from the returns pseudo-Wants and use the repository to lookup the actual Wants node.

So I’ve solved what should have been a relatively simple problem – even though it’s an ugly solution – and only lost about 2 days doing so.