JPA and Fixed-length text fields in databases

Lucky me. I took over a system whose database is awash with compound keys. The more I work with this stuff, the more justification I find for always having a simple sequence key as the primary key, despite the apparent extra overhead.

I’ve had a long-running problem where parent-child (one-to-many) relationships weren’t able to fetch their children. At best, the collection came up empty. At worst, accessing the children collection property of the parent would throw a NullPointerException because the “bag” property of the PersistentBagCollection wasn’t initialized.

I spent a LOT of time with Hibernate’s trace-level logging turned on and the only thing I could determine was that the children were, in fact, being correctly loaded and formed into a collection, but the collection wasn’t the collection that was bound to the parent – instead the child collection was being silently lost.

One thing that did look suspicious, however was that the trace showed two collections found. I went round and round with this, added various bits of code that made objects more identifiable in the debugger, and finally – at long last ended up with a set of trace messages showing the owner primary keys of these collections.

Lo, and behold! One of them showed trailing spaces in the key field values, the other did not.

Trailing spaces in ORM fields have much the same effect and being sloppy with upper/lower case in Java filenames under Windows. Sometimes you get away with it, but not always. Because of the trailing spaces, the two primary keys didn’t compare equal, the “real” collection couldn’t be attached to its parent, and data was lost.

I suppose that this problem could be avoided by proper implementation of the equals() and hashCode() methods of the offending primary key objects, but I found it easier to simply force the key objects to be space-padded and to be more careful about what I passed to search functions, since the ultimate problem lies in the fact that sending a space-trimmed value to a Hibernate JPA query will match and return a space-padded object.

The ultimate cure for this would probably be to use fixed-size character arrays instead of String objects, but there are several problems here as well:

  1. Java’s static type checking doesn’t cover mismatches on array size.
  2. Individual characters in a character array can be null
  3. The ORM class-generating tools I use render fixed character fields as Strings, not arrays
  4. I’m not actually sure that either JPA or Hibernate support character array properties. I have enough grief trying to portably represent boolean and enum values.

OpenJPA/Spring/Tomcat6

Oh, what a tangled web we weave…

In theory, using JPA and Spring is supposed to make magical things happen that will make me more productive and allow me to accomplish wonderful things.

Someday. At the moment, I gain tons of productivity only to waste it when deployment time comes and I have to fight the variations in servers.

JPA allows coding apps using POJOs for data objects. You can then designate their persistence via external XML files or using Java Annotations. The Spring Framework handles a lot of the “grunt” work in terms of abstract connection to the data source, error handling and so forth.

But that, alas, is just the beginning.

First and foremost, I had to build and run using Java 1.5. OpenJPA 1.2 doesn’t support Java 6.

Tomcat is not a full J2EE stack. To serve up JPA in Tomcat requires a JPA service – I used Hibernate-entitymanager.

JPA requires a little help. Specifically, I used the InstrumentationLoadTimeWeaver to provide the services needed to process the annotations.

The weaver itself requires help. And to enable the weaver in Tomcat, I needed the spring-agent.

To the Tomcat6 lib directory I added:

  • spring-tomcat-weaver jar
  • spring-agent jar

But that’s not enough! The agent won’t turn itself on automatically. So I need to add a “-javaagent” to Tomcat’s startup. The easiest way to do that was to create a CATALINA_BASE/bin/setenv.sh file:

#!/bin/sh
CATALINA_BASE=/usr/local/apache-tomcat-6.0.18
JAVA_OPTS=”-javaagent:$CATALINA_BASE/lib/spring-agent-2.5.4.jar”

Tomcat 5

I think this all works more or less the same in Tomcat5, except that there are 3 library directories instead of the one library that Tomcat6 uses so the location for the spring suport jars is different. common/lib seems to work, although I’m not sure it’s the best choice.

That’s half the battle. Next up: JSF/RichFaces – and Maven

ORM and Imprecise Data Types

JPA is great in a lot of ways. But there’s one problem a lot of people have with JPA and ORM in general – imprecise keys.

Technically, this is as much a JDBC problem as an ORM problem. It just seems to cause more problems in ORM. Perhaps because ORM lets you concentrate less on the raw data details.

Anyone who has taken a basic programming course has (hopefully) had it pointed out to them that it’s dangerous to compare for exact equality on floating-point numbers. Most binary representations of floating point are imprecise on fractions, and even a simple 0.1 has no simple binary value.

Less obvious, however, is that times and dates are also imprecise formats. Technically, not dates, but in the real world, dates and times are often intermingled even when you don’t expect them to be.

There are 3 primary time/date representations commonly found in Java:

java.sql.Date

java.sql.Date

DBMS-specific dates.

I won’t address high-precision time data types. They’re less likely to surprise people.

java.sql.Date is, in theory granular to one day. In practice, it’s a subclass of java.util.date, so that’s not literally true.

java.util.Date is granular to one millisecond. Since java.sql.Date doesn’t enforce granularity, you can get in trouble with java.sql.Dates which have time-of-date day in them. Especially when dealing with Calendar timezone conversions.

Things get even more interesting when these objects are used in ORM and persisted out to a database. Oracle Dates have a granularity of 1 second – there’s no standard Java time class that reflects this. So if you persist out a java Data object, it may – generally will – get silently truncated. Thus your in-memory date and your database dates will not compare equal!

As bad as this is, it’s worse if you try and use that date as a primary or foreign key. You’ll get invalid results, since there’s no such actual value in the database. More insidiously, in an ORM environment, you can make queries without realizing it. That is, if you retrieve an object that’s liked to another object and the actual linkage was a date, the linkage may fail for non-obvious reasons.

There’s no easy fix for this. You can write your own Date class that enforces the granularity of your choice (truncates/rounds to seconds in the case of Oracle), but then you have to configure the data type mapping in your ORM configuration. You can write accessor functions to do the same thing, but if you forget to use one, the program will fail for non-obvious reasons. It’s best to avoid using dates as keys, but this isn’t always an option, and even non-key dates have their perils.

Oracle isn’t the only – or perhaps even the worst – offender. It turns out that PostgreSQL has an even more insidious date problem. By default, the Oracle time data types are floating-point. Which means that they’re imprecise. And if dates are bad as keys, floating-point numbers are a thousand times worse!

There are 2 possible ways to handle that. One is to build a custom copy of the PostgreSQL server using the option of internally storing fractional seconds in interger form. Probably not going to happen, since not only does this mean you have to have permission to run a non-standard server, but also the internal table data won’t be freely interchangable with standard-build tables. This would be an even worse problem than it is, except that PostgreSQL is notorious for changing internal structure even between minor releases.

The other alternative is to define the time value with fractional seconds truncated – for example, as TIMESTAMP(0). It is, unfortunately, not possible to accurately represent millisecond values in a timedate value on a stock PostgreSQL server, so the next best thing is to simply hack them off if you intend to retrieve by time or date.

OpenJPA, Tomcat6, Spring and Hibernate

Getting all of the above to play together can be tricky. I knocked myself offline for nearly 2 days after the latest upgrade.

Broken Tomcat/Eclipse

The sysdeo Tomcat plugin no longer works when my testapp is installed – the log says the WebAppClassLoader (todo: check name) cannot be found. Something’s messed up the core Tomcat classpath. Fortunately the stand-alone remote debug option still works.

Annotations and Weaving

The test app is using Java 5 annotations on the persistent classes. Persistent classes are “magic”. Before they can actually be used, they have to be re-written to add the additional properties that aid the persistency manager. In the Old Days, that would have meant using some sort of utility that added additional source text to the persistent classes before they were compiled. In our new, more transparent era, this means that after the java source is compiled into classes, the extra embellishments are added to the binary class definition.

There’s 2 ways to do that:

  1. Permanently modify the class binary file
  2. Piggyback the embellishments onto the class as it’s being loaded into the JVM.

When I used JDO – and in early OpenJPA, option 1 was used courtesy of a utility known as the enhancer.

Since about October 2007 (give or take), OpenJPA has had the ability to do on-the-fly enhancement, which is option 2 or some facsimile thereof. But there’s a catch.

In order to