Land Mines – Spring Neo4j

One of the primary purposes of this blog is to record what I’ve learned by tedious trial and error and/or spending time down in source code I shouldn’t have had to look at.

This particular topic has more than its share of discoveries.

Spring Neo4j claims that it’s intended to imitate, where possible, the approaches of existing persistence systems. Unfortunately, it has a long way to go on that.

First, let me mention that after descending through Maven’s equivalent of “DLL Hell”, I have been working with the following version sets:

  • Spring Framework version 4.0.6
  • Spring Neo4j 3.3.0.Release
  • Neo4j version 2.2.0
  • Logging courtesy of slf4j version 1.7.6 and log4j2 version 2.2, which has renamed the log4j config files since log4j v1 and added a few new config options (just for info).

As usual, just finding what was compatible with what was an adventure. I have a nasty, if unprovable suspicion that some of the pain could have been reduced if certain functions and classes had been marked “deprecated” instead of being removed or relocated.

Some of what I’ve learned will break shortly. Spring Neo4j 4 will actually lose a number of functions and annotations that exist in Spring Neo4j 3 (they were probably broken anyway). And some of the fixes I’ve come up with may actually be exploiting bugs rather than clean fixes. It’s the best I could do.

My use case seemed simple. I have 2 entity types: A and B, and a relationship, we’ll call “wants” which has a property named “priority”. I want to set up a network of many-to-many A-wants-B relationships with associated priorities and I want to be able to select a given B and get back an ordered list of A-wants in priority order with the priorities visible.

So using Spring Data Neo4j (SDN, for short), that gives me 2 NodeEntity classes (A and B) and a RelationshipEntity (“Wants”). I’m using GraphRepository extensions to manage persistence for A and B, which gives rise to the first question:

Q: What’s the best way to handle persistence of RelationshipEntities?

There’s a RelationshipGraphRepository, but unlike GraphRepository, it’s a class, not an interface, and there is absolutely zero documentation I can find on how to use it or even whether I should be explicitly using it. I therefore used an ordinary GraphRepository for the RelationshipEntity. It seems to work. More or less.

And the first observation:

When they say “@Fetch”, they mean “@Fetch”

In JPA, a recommended way of persisting an object is in the form “a =;”, assigning the result of the “save” operation back to “a”. This is because JPA may construct a new “a” using the original “a” as a basis, but carrying (visibly or not) extra data and/or meta-data provided by the “save” operation.

This is also the (explicitly) recommended practice for SDN. But there’s a big “gotcha”.

When you persist with JPA, the returned object will contain AT LEAST as much data as the original copy did (it may even be the original copy). When you persist with SDN, the returned object may not. What SDN does is save the data, then do a basic lazy fetch resulting in a new object instance. Which means that unless you annotate your complex properties with “@Fetch”, you’ll end up with a nasty surprise. Unlike JPA, which returns either the original value or an unresolved proxy object, SDN returns null for unresolved lazy fetches. Which is probably a recipe for lost data and is definitely a recipe for confusion (remember, these are things I learned the hard way!).


You can code @Query on both Repository and NodeEntity classes

The manual doesn’t mention this. It’s a useful thing to know, although since in NodeEntity classes you often want to base your query relative to the current NodeEntity instance, I also spent a lot of time coding things like “{self}” and “{this}” without success. I finally found out that using the clause START b=node({self}) does the trick, although since “START” is supposed to be deprecated, there’s likely something that can do the same thing with a simple MATCH. That’s something to tackle another day.

What you get back isn’t what you think it is

Cypher isn’t as intuitively obvious to me as some people seem to think it should be. It’s a form of complex mathematical notation and while the docs on it are fairly illustrative, they are perhaps less complete or explanatory than they might be. So I spent a lot of time trying weird expressions. Here’s what worked as a member query on my “B” node:

List<Wants> wantsList = new ArrayList<Wants>(5);

@Query("START b=node({self}) MATCH (b)-[r:WANTS]-(a:A) return r ORDER BY r.priority")
public List<Wants> getWants() {
    return wantsList;
}

A lot of the earlier attempts returned false “Wants” objects whose nodeIds were actually the IDs of the “A” objects on the other end of the relationship.

Incidentally, Neo4j normally wants relationships to be unordered and will complain if you attempt to use an ordered collection (such as List) to hold them. However it is smart enough to realize that when you do an “ordered by” query it’s OK to define the collection using an ordering collection object.

But that’s not all!

The “Wants” list that this particular query returned is not directly usable. The nodeIds are valid, but all the other property values were blank. Not merely the lazy-fetch values, but even the primitive property values. So to be actually usable I have to pull the nodeId from each returned pseudo-Wants and use the repository to look up the actual Wants.
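That lookup step might be sketched like this in plain Java. Everything in it is hypothetical: “Wants” here is a bare stand-in for the real RelationshipEntity, “WantsLookup” stands in for whatever repository call fetches a relationship by ID, and an in-memory map replaces the database so the sketch can run on its own.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the "Wants" RelationshipEntity (no SDN annotations).
class Wants {
    final Long nodeId;      // the only field the query result fills in reliably
    final Integer priority; // null in the hollow copies the query returns

    Wants(Long nodeId, Integer priority) {
        this.nodeId = nodeId;
        this.priority = priority;
    }
}

// Hypothetical stand-in for a repository lookup by ID.
interface WantsLookup {
    Wants findOne(Long id);
}

class WantsRehydrator {
    // Swap each hollow Wants for the fully loaded one, keeping the query's order.
    static List<Wants> rehydrate(List<Wants> hollow, WantsLookup lookup) {
        List<Wants> loaded = new ArrayList<Wants>(hollow.size());
        for (Wants shell : hollow) {
            loaded.add(lookup.findOne(shell.nodeId));
        }
        return loaded;
    }

    public static void main(String[] args) {
        // In-memory "database" of complete Wants objects, keyed by ID.
        final Map<Long, Wants> store = new HashMap<Long, Wants>();
        store.put(7L, new Wants(7L, 1));
        store.put(3L, new Wants(3L, 2));

        // What the query hands back: valid IDs, everything else blank.
        List<Wants> hollow = new ArrayList<Wants>();
        hollow.add(new Wants(7L, null));
        hollow.add(new Wants(3L, null));

        WantsLookup lookup = new WantsLookup() {
            public Wants findOne(Long id) {
                return store.get(id);
            }
        };
        for (Wants w : rehydrate(hollow, lookup)) {
            System.out.println(w.nodeId + " priority " + w.priority);
        }
    }
}
```

The cost is one extra lookup per relationship, which is tolerable for short lists but worth keeping in mind for large ones.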

So I’ve solved what should have been a relatively simple problem – even though it’s an ugly solution – and only lost about 2 days doing so.

Baby Steps with OpenStack

The OpenStack cloud platform is hot these days. Anyone can set up and run their own private cloud without too much difficulty.

Relatively speaking. You do need a huge chunk of RAM and a respectable amount of disk space, even for a minimal cloud. Also a 64-bit CPU with hardware virtualization support. But considering what you get, it’s not a bad payoff.

An OpenStack cloud has 3 types of nodes: control, compute, and storage. You need at least one of each, but a single OS instance can host any combination of them, so the simplest cloud would be an all-in-one server.

First Step

There are quite a number of components that make up these nodes – including some that are plug-replaceable, so the easiest way to get started is to use a springboard. One popular route uses a Vagrant VM to launch the DevStack ready-to-run server. This is a good way to get familiar with OpenStack, since everything’s already pretty much set up and running and you can launch it via VirtualBox on your desktop. Assuming you have at least 8GB RAM to spare, since the DevStack VM is going to eat up about half of that.

Second Step

Running a cloud on your desktop is pretty cool, but if you have aspirations of running a real cloud, you need real servers. Since I didn’t have that many spare servers with sufficient capabilities, my next step was to launch OpenStack in a VM again, but this time using KVM under CentOS 5.11. Why not 6 or 7? Primarily because I have legacy Xen VMs on its siblings and I’m not yet ready to migrate them to an OS that can’t host Dom0.

If you do not allocate sufficient RAM or disk space for the OpenStack VM, it may not install properly and almost certainly won’t work properly, and for the most part you’re not going to get much in the way of helpful messages. OpenStack is comprised of a whole raft of component products and there’s not much in the way of centralized detection and reporting of broken components.

Here’s what I used to create the basic OpenStack VM:


virt-install --name $VM \
	--hvm --ram 4096 --vcpus 4 \
	--disk path=$IMAGES/$VM.img,size=6 \
	--network bridge:br0 \
	--os-type=linux --os-variant=rhel6 \
	--accelerate --vnc -v \
	--location= \
	-x "ks=$VM.ks"

Note that this VM runs the IceHouse release of RDO under CentOS 6.5. I tried Juno and CentOS 7, but it kept whining about running out of memory, at least up to about 3GB or so. The network bridge is my VM host’s bridge to the VMs. The number of CPUs isn’t important, but I had a few to spare. The kickstart file is nothing special, but it does install and enable ntp and format the disks into a /boot (about 300M) and an LVM partition (everything else), containing a single Logical Volume for the OS.


A production all-in-one node needs a LOT more disk. You’ll want storage for the client disk images and working permanent storage.

Using PackStack

The PackStack package makes the job of setting up an OpenStack node a lot easier. It fetches the various component packages and uses Puppet to install and configure them. It also creates an “answers” file so you can replay the installation, if needed.

Under CentOS, the easiest way to get things going is to run packstack. My Kickstart had a post-install command to install the Icehouse Yum repository:

rpm -ivh

So the sequence once the VM came up post-install went like this:

  1. Install YUM plugin to enforce precedence on repo search/fetch
    yum -y install yum-plugin-priorities
  2. Upgrade the OS
    yum -y upgrade
  3. Reboot to get the latest kernel
  4. Install PackStack
    yum -y install openstack-packstack
  5. Run PackStack
    packstack --allinone --provision-demo=n

Once all that’s done, with luck, you can open up a web browser on the OpenStack console.

Things that can go horribly wrong

The single biggest headache I’ve found with OpenStack is networking. Networking a collection of VMs is a major pain even without clouds. OpenStack raises the ante considerably, since you have 2 options for network stacks (legacy nova network or neutron), and all sorts of real and virtual device/network options. If you’re not already well-read on the subject, you’ll have no clue which ones you should be using or how to set them up.

More on this later.

Beyond that, the most critical functions for OpenStack are the security/identity manager (keystone) and the messaging agent (defaults to rabbitmq, but replaceable). Without the identity manager, nothing can be accessed; without the messaging system, components cannot notify each other about important events. Fortunately, these two are less likely to screw up and appear to be easier to diagnose/repair.

Gnome Evolution is an Abomination and gnome-keyring should die in a fire!


Between Evolution’s penchant for creating non-deletable – and defective – account associations and gnome-keyring’s useless pop-up dialogs, the whole thing almost makes Microsoft Windows seem attractive.

Then again, gnome is, by and large, a slavish attempt to imitate many of Windows’ more obnoxious features. Like the Windows Registry.

Honestly. People have been complaining about this stuff for years and it never gets fixed.

The popup for gnome-keyring is especially odious, since it blocks all other user interaction (including access to pwsafe) and it LIES. It says that the Google password is incorrect when it isn’t.

There are no documented fixes to speak of, short of wiping the entire OS. No one on the respective gnome development teams does anything, and users get angry.

Including me. So I’m going to go take a stress pill.

Maven: No plugin found for prefix ‘X’ in the current project and in the plugin groups

I just spent 2 days constructing Apache Stratos and it was an appalling experience. Both Stratos and one of its key components, jclouds, are massively complex projects and they brought out the worst in Maven 3.

In theory, Maven is a “write-once, build-anywhere” system. Practice wasn’t quite as kind to me this last week. In addition to the grief I had building Stratos, it turns out that Maven 3 does strange things to the dbunit tests on one of my own major projects for no apparent reason or gain.

Stratos, on the other hand, showed the weaknesses in repository retrieval, as many dependencies simply refused to download from the master repo over and over again and had to be manually yanked from the repository and installed by hand in my local repository. In some cases, the jar didn’t seem to be present on the master repo at all, in others, that particular part of the repository simply timed out when attempting to look at it with the browser and I had to find a secondary source. I thought Maven was supposed to have the intelligence to consult mirror servers!

Having finally downloaded and/or built all the components, the last wall I hit was the error in this post’s title:

No plugin found for prefix ‘X’ in the current project and in the plugin groups

I got this because Maven ran out of memory. PermGen space, no less! So I tried adding the usual Java memory parameters to the Maven command line, and that’s when I got this mysterious message.

Turns out that the memory options are not passed verbatim to Maven’s JVM. They are passed as though they were Maven command-line arguments, so an “-Xmx” option comes out as something like Maven goal “X”. Obviously not what I needed.

To get the MAVEN_OPTS set in Linux for a 1-shot build command, I used the following command line, instead:

MAVEN_OPTS=" -XX:MaxPermSize=256m" mvn -Dmaven.test.skip=true install

This gave Maven the necessary cues to finish the build. I now have an allegedly useful Stratos system to play with.

HOWTO: get Docker Containers under Centos 5 with Xen

CentOS 5 is getting long in the tooth, but then again, many of my servers are antiques that would find native CentOS 6 to be problematic.

A recent adventure in disaster recovery led me to upgrade several of my Xen DomU’s from CentOS 5 to CentOS 6, but I was distressed to discover that about the minimum you can get by with on RAM for CentOS6 is nearly 400MB. I wanted to host several CentOS6 VMs, but the thought of getting dinged to the tune of half-a-GByte of RAM plus several gigs of disk image didn’t sit well for lightweight systems.

The “in” thing for this kind of stuff is Containers, which neatly fit in the space between a full VM and something less capable such as a chroot jail. The question was, could I get CentOS 6 containers to work in a CentOS5 Dom0?

As a matter of fact, yes, and it was considerably less painful than expected!

I cheated and did the real dirty work using my desktop machine, which is running Fedora 20, hence is better supported for all the bleeding-edge tools. Actually, Ubuntu would probably be even better, but I’m at home with what I’ve got and besides, the idea is to make it as little work as possible given my particular working environment.

Step 1: Vagrant.

Vagrant is one of those products that everyone says is wonderful (including themselves), but it was hard to tell what it’s good for. As it turns out, what it’s good for is disposable VM’s.

Specifically, Vagrant allows the creation of VM “boxes” and the management of repositories of boxes. A “box” is a VM image plus the meta-data needed for Vagrant to deploy and run the VM.

So I yum-installed vagrant on my Fedora X86_64 system.

My selected victim was a basic CentOS 6 box for the VirtualBox VM environment.

vagrant box add centos65-x86_64-20131205

Step 2. Docker

It would have been more convenient to get a ready-made Centos6 Docker box, but most Docker-ready boxes in the public repo are for Ubuntu. So I did a “vagrant up” to cause the box image to download and launch, connected to the Centos6 guest, and Docker-ized it using this handy resource:

An alternative set of instructions:

The process is rather simple as long as you’re using the latest CentOS 6.5 release. Older kernels don’t have the necessary extensions, requiring a kernel upgrade first.

Step 3. Porting to Xen

Once docker was working, the challenge of getting the VM from VirtualBox to Xen presented itself. I was expecting a nightmare of fiddling with the disk image and generating a new initrd, but there was a pleasant surprise. All I had to do was convert the VM image from the “vmdk” format to a raw disk image, transfer the raw image to the Xen box, hack up a xen config and it booted right up!

The details:

On the Fedora desktop:

$ qemu-img convert -f vmdk virtualbox-centos6-containers.vmdk -O raw centos6-containers.img
$ rsync -avz --progress centos6-containers.img root@vmserver:/srv/xen/images/centos6-container/

File and directory names vary depending on your needs and preferences.

Now it’s time to connect to “vmserver”, which is my CentOS5 physical host.

I stole an existing Xen DomU pygrub-booting definition from another VM and changed the network and virtual disk definitions to provide a suitable environment. The disk definition itself looks like this:

disk = [ "tap:aio:/srv/xen/images/centos6-container/centos6-containers.img,xvda,w"]

xvda, incidentally, is a standard CentOS VM disk image, with a swap and LVM partition inside.

I launched the VM and behold! A CentOS 6 Docker container DomU on a CentOS 5 Dom0 base.

Everything should be this easy.

The curse of the mad Puppet

I have been working with various things designed to allow me to control the domain assets in a more centralized way. One of them was to try and use Puppet to provision machines. Puppet is a fairly nice tool, but there are some unexpected pitfalls.

There are several ways to get puppet on a CentOS 5 server. If you’re a glutton for punishment, you can always pull down the source and build from scratch, but I don’t recommend that when just getting started. You can also pull it via YUM from the EPEL repository. Or, you can import the Puppet Labs repo and pull from there.

I already have EPEL in my stock set of YUM repositories, so that’s what I went for first. In the beginning all was fun and games. Then I got more ambitious and started defining modules. This didn’t work. Worse, the sample documentation used commands that didn’t work. It was getting very frustrating.

It became obvious fairly early that some significant changes have been made to puppet and that what I had wasn’t the Latest and Greatest. That would have been OK, except that attempts to read the online documentation for the older stuff kept leaking back into docs for the newer stuff (a not uncommon problem, best handled I think by archiving the old docs as self-contained PDFs). On top of that, the version of puppet that I was running was sufficiently antique that much of the documentation had fallen off the website (see previous).

To add to the confusion, I wasn’t really sure WHICH version of puppet I was running, since their enterprise product doesn’t keep quite the same set of version numbers as their community version, plus I suspected the version (2.6.18) in the EPEL RPM wasn’t indicative, either. I finally came to the conclusion that 2.6.18 – which is the version that CentOS pulled – actually correlates to the community 2.5 version, which is something like 2.3 in Enterprise versioning.

At this point, I went to the source: Puppet Labs and found out about their repo. Unfortunately a network-based RPM install failed for obscure reasons (I’m not sure whether I have lingering LAN issues or it’s them). Fortunately, I was able to wget and install locally. After which I was able to install a Version 3 Puppet, the documentation now matches the commands and module processing works the way they said it did.

One last fly in the ointment, though. It seems that nodes and classes share the same namespace. I was using the same name for my guinea-pig machine node and one of the packages it was trying out, and while both node and component parsed, the actual execution was only done against the node – the package was silently ignored. I fixed this by changing the node name to its fully-qualified domain name.


[SOLVED] mail loops back to me (MX problem?) for virtual machine

Sometimes they just gang up on you.

I was migrating my sendmail server from a NAT address to a bridge address when it all started.

Xen has this really nasty habit of zapping your hardware MAC address if you don’t get the NAT routing configured just right. There’s obviously some way to get it to revert, because occasionally, for no obvious reason, the real MAC address will come back, but don’t try searching the web for an answer – all you’ll get is fruitless inquiries and flame responses (you shouldn’t be changing your MAC address, idiot!). Please. There are very good reasons why it’s useful to be able to set a custom MAC address. One place I worked coded their hardware asset IDs in the MAC to assist their DHCP server, for instance.

On the mousetech domain, I’d be happier if it didn’t happen. As it is, the MAC addresses of the primary and secondary NICs got swapped and I didn’t find out until I’d gotten most of the way through fixing things. So the former eth0 became eth1 and vice versa.

Shortly thereafter, outbound mail started bouncing with the infamous “mail loops back to me” message. Since I’d just done major relocation on the mail VM, I wasted a LOT of time messing around with sendmail options to no avail. Normally this message can be cured by putting in a valid MX record in DNS and/or adding all the mailserver alias names to the sendmail local-host-table (cw table). Not this time.

I was fairly sure that the problem had something to do with the fact that the physical host had been set up to forward all port 25 (smtp) requests to the mail VM and that somehow the wrong IP address was getting mixed in, but I couldn’t see the actual routing since it was all internal and specific to port 25 to boot.

Turns out I’d been sloppy when I fixed up the iptables forwarding. The correct version (after the NIC mixup) looks like this:

-A PREROUTING -i eth0 -p tcp --dport 25 -j DNAT --to-destination

Where I went wrong was in being lazy and omitting the NIC ID (eth0) when I repaired the damage that Xen did. As a result, BOTH NICs were being re-routed – the actual internet-facing NIC (which should be routed) and the internal DMZ bridge-facing NIC (which should not). As a result, traffic on port 25 for eth1 was being routed back on itself and sendmail complained.

The Underappreciated Raspberry

The Raspberry Pi B version is one of the most popular hacker toys of the day and with good reason. Although it’s not the first sub-miniature single-board computer, it’s the first one whose price, performance, features and power make it an acceptable substitute for a “real” desktop computer.

But there’s another Raspberry Pi as well. Ironically, the “A” model came out after the “B” model. Unlike the “B”, it lacks a few things. Only one USB port and no on-board ethernet, for example.

But before you dismiss it as not worth the $10 lower price, here are some things to consider:

1. The “A” model pulls less power, since it doesn’t have to support those missing components. This is especially important if you want to run the unit off batteries or solar cells.

2. The ethernet controller on the B is actually just a third USB port with an on-board USB-ethernet adapter. If I need to add ethernet to an “A” board, I have a USB ethernet dongle sitting on the shelf right next to me. I originally bought it to network-enable an old laptop.

3. Some people prefer NOT to have critical machines on the Internet. Why pay for hardware you don’t want to use? Plus these days, a lot of people skip the wiring and use WiFi adapters. There are a number of WiFi dongles that will plug into the USB port. Why pay extra for a jack you don’t need? Or for power to support it?

4. Enough about networking, though, consider USB. For some people 2 USB ports are more than they need. So the single port on the “A” system may be sufficient or even surplus. Conversely, for some people, 2 ports are not enough. And/or they need powered ports with more power than the Raspberry Pi itself supplies. USB expansion hubs, both powered and un-powered have been around a long time and are easy to come by.

So we have 2 choices. A good general-purpose platform and a cheaper basic platform. Both expandable, both economical in their own ways.

DBUnit and CSV reference data

The CSV capabilities of dbUnit are under-documented. Here are the results of some pain, suffering and debug tracing:

The CSV files are expected to be one-per-table. The tablename is part of the filename, thus: “TABLE1.csv”. Format is the usual, with the first row containing the column names and subsequent rows containing data. It is possible to customize the delimiters and separators, but the defaults work with bog-standard CSV. One possible reason to override is to allow use with pipe-separated format files.

Here’s some code to snapshot an SQL query out to CSV for use as a later test reference (or whatever).

Connection con =
IDatabaseConnection dbUnitCon =
new DatabaseConnection(con, "MYDB");

ITable actualTable =
+ " AND INK_ID in('ABEND001', 'AAA', '07CSI')");

// Take reference snapshot:
IDataSet ds1 = new DefaultDataSet(actualTable);
CsvDataSetWriter.write(ds1, new File("/home/timh/csvdir"));

dbUnit will create csvdir, if needed, and output 2 files: the SNAPTABLE.csv file and a file named “table-ordering.txt”.

To load SNAPTABLE for use in validating the results of a test:

// Load expected data from CSV dataset
CsvDataFileLoader ldr = new CsvDataFileLoader();
// NOTE: terminal "/" on URL is MANDATORY!!!
IDataSet expectedDataSet =
ITable expectedTable =

// Assert actual database table match expected table
Assertion.assertEquals(expectedTable, actualTable);

Note that while you can specify case-insensitivity on table names (it’s the default), the case of the SNAPTABLE.csv file and the SNAPTABLE entry in table-ordering.txt must match – at least on case-sensitive OS’s. And it’s good practice regardless. Also note that table-ordering.txt can contain multiple table names, one tablename per line.
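As a small illustration of that layout, here is a plain-Java sketch that renders the contents of table-ordering.txt and the matching CSV file names. It uses no dbUnit classes; the class and method names are my own inventions, and “OTHERTABLE” is a made-up second table.

```java
import java.util.Arrays;
import java.util.List;

class TableOrdering {
    // Contents of table-ordering.txt: one table name per line.
    static String render(List<String> tableNames) {
        StringBuilder sb = new StringBuilder();
        for (String name : tableNames) {
            sb.append(name).append('\n');
        }
        return sb.toString();
    }

    // CSV file name for a table. The case is kept exactly as given,
    // so it matches the entry written by render() character for character.
    static String csvFileName(String tableName) {
        return tableName + ".csv";
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("SNAPTABLE", "OTHERTABLE");
        System.out.print(render(tables));
        for (String table : tables) {
            System.out.println(csvFileName(table));
        }
    }
}
```

Deriving both the file name and the ordering entry from the same string is one way to keep the case from drifting apart.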

Finally, note that the error exception for the comparison assert counts line numbers off by 2. It doesn’t take the column-name row into account, and it starts counting from 0, instead of the more usual case (for databases) of counting starting at 1.
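The arithmetic is trivial, but easy to get backwards when staring at a failing test. A tiny helper, based only on the off-by-2 behavior described above (the names are hypothetical, not part of dbUnit):

```java
class CsvLineNumbers {
    // Convert the row number reported by the failed assertion into the
    // 1-based line number in the CSV file:
    //   +1 because the report counts rows from 0
    //   +1 because the column-name row occupies line 1 of the file
    static int toFileLine(int reportedRow) {
        return reportedRow + 2;
    }

    public static void main(String[] args) {
        // A mismatch reported on row 0 is the first data row, file line 2.
        System.out.println(toFileLine(0));
    }
}
```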

Important! The CSV reader is not very flexible when it comes to reading NULL values. Null fields MUST be given a value of “null”, lower case, WITHOUT surrounding quotes. Like so:

"Fred Smith","127.0",null,null,"Jabber"
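If you generate reference rows yourself rather than via CsvDataSetWriter, a formatter along these lines keeps the null convention straight. This is a sketch of my own, not dbUnit code: it quotes every non-null value and emits a bare lower-case null otherwise, and it assumes values contain no embedded double quotes (no escaping is attempted).

```java
import java.util.Arrays;
import java.util.List;

class DbUnitCsvRow {
    // Format one data row: quoted values, bare lower-case null for nulls.
    static String format(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            String value = values.get(i);
            if (value == null) {
                sb.append("null"); // no surrounding quotes
            } else {
                sb.append('"').append(value).append('"');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(Arrays.asList("Fred Smith", "127.0", null, null, "Jabber")));
    }
}
```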

Gourmet Recipe Manager problems under recent Fedora releases

The Gourmet Recipe Manager program is a very useful way to store and find recipes, but it has been essentially useless since about Fedora 11 (give or take).

The problem is that this app uses an SQLite database to keep its recipes and accesses the database using SQLAlchemy. SQLAlchemy version 0.7 is seriously broken relative to Gourmet.

The cure, once you know it, is relatively easy – at least as long as you don’t have any other apps depending on SQLAlchemy (and I didn’t). First, check to see if you do:

rpm -q --whatrequires python-sqlalchemy

If gourmet is your only dependent (or you don’t care/feel brave), remove the 0.7 sqlalchemy package by brute force:

rpm --erase --nodeps python-sqlalchemy

Download one of the 0.6 versions of sqlalchemy. If you have a 64-bit system, be sure to get the 64-bit RPM, which includes the 32-bit version.

Install the downloaded sqlalchemy RPM:

rpm -ivh python-sqlalchemy-0.6.xxxxxx.rpm

Fire up gourmet. One of the easiest ways to see if it works is just to type a search. If it returns search results, you’re OK!