Land Mines – Spring Neo4j

One of the primary purposes of this blog is to record what I’ve learned by tedious trial and error and/or spending time down in source code I shouldn’t have had to look at.

This particular topic has more than its share of discoveries.

Spring Neo4j claims that it's intended to imitate, where possible, the approaches of existing persistence frameworks. Unfortunately, it has a long way to go on that.

First, let me mention that after descending through Maven’s equivalent of “DLL Hell”, I have been working with the following version sets:

  • Spring Framework version 4.0.6
  • Spring Neo4j 3.3.0.RELEASE
  • Neo4j version 2.2.0
  • Logging courtesy of slf4j version 1.7.6 and log4j2 version 2.2 (which, for what it's worth, renamed the log4j config files relative to log4j 1 and added a few new config options).

As usual, just finding what was compatible with what was an adventure. I have a nasty, if unprovable, suspicion that some of the pain could have been reduced if certain functions and classes had been marked "deprecated" instead of being removed or relocated.

Some of what I’ve learned will break shortly. Spring Neo4j 4 will actually lose a number of functions and annotations that exist in Spring Neo4j 3 (they were probably broken anyway). And some of the fixes I’ve come up with may actually be exploiting bugs rather than clean fixes. It’s the best I could do.

My use case seemed simple. I have two entity types, A and B, and a relationship we'll call "wants", which has a property named "priority". I want to set up a network of many-to-many A-wants-B relationships with associated priorities, and I want to be able to select a given B and get back an ordered list of A-wants in priority order with the priorities visible.

So using Spring Data Neo4j (SDN, for short), that gives me 2 NodeEntity classes (A and B) and a RelationshipEntity (“Wants”). I’m using GraphRepository extensions to manage persistence for A and B, which gives rise to the first question:

Q: What’s the best way to handle persistence of RelationshipEntities?

There’s a RelationshipGraphRepository, but unlike GraphRepository, it’s a class, not an interface, and there is absolutely zero documentation I can find on how to use it, or even whether I should be using it explicitly. I therefore used an ordinary GraphRepository for the RelationshipEntity. It seems to work. More or less.
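For reference, here's roughly what the domain model and repositories described above look like in SDN 3.x annotations. This is my own sketch – the class and field names (and the WantsRepository) are illustrative, not lifted from working code:

@NodeEntity
public class A {
    @GraphId Long nodeId;
    String name;
}

@NodeEntity
public class B {
    @GraphId Long nodeId;
    String name;
    // plus the wantsList/getWants() members shown later
}

@RelationshipEntity(type = "WANTS")
public class Wants {
    @GraphId Long nodeId;
    @StartNode A wanter;
    @EndNode B wanted;
    int priority;
}

// Plain GraphRepository extensions handle persistence:
public interface ARepository extends GraphRepository<A> {}
public interface BRepository extends GraphRepository<B> {}
public interface WantsRepository extends GraphRepository<Wants> {}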

And the first observation:

When they say “@Fetch”, they mean “@Fetch”

In JPA, a recommended way of persisting an object is in the form “a = repository.save(a);” This is because JPA may construct a new “a” using the original “a” as a basis, but carrying (visibly or not) extra data and/or metadata provided by the “save” operation.

This is also the (explicitly) recommended practice for SDN. But there’s a big “gotcha”.

When you persist with JPA, the returned object will contain AT LEAST as much data as the original copy did (it may even be the original copy). When you persist with SDN, the returned object may not. What SDN does is save the data, then do a basic lazy fetch resulting in a new object instance. Which means that unless you annotate your complex properties with “@Fetch”, you’ll end up with a nasty surprise. Unlike JPA, which returns either the original value or an unresolved proxy object, SDN returns null for unresolved lazy fetches. Which is probably a recipe for lost data and is definitely a recipe for confusion (remember, these are things I learned the hard way!).

Ouch!
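To make the gotcha concrete, here's a minimal sketch against my hypothetical A/Wants model above (field names are mine, not from working code):

@NodeEntity
public class A {
    @GraphId Long nodeId;

    @Fetch   // without this, the copy returned by save() has wants == null
    @RelatedToVia(type = "WANTS")
    Set<Wants> wants;
}

// Recommended practice -- keep the instance that save() returns:
a = aRepository.save(a);
// a.wants survives only because of @Fetch; without it, SDN hands back
// null rather than a lazy proxy, and it's easy to save that null later.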

You can code @Query on both Repository and EntityNode classes

The manual doesn’t mention this. It’s a useful thing to know, although since queries on an EntityNode class often need to be relative to the current instance, I also spent a lot of time coding things like “{self}” and “{this}” without success. I finally found out that a clause of the form START n=node({self}) does the trick, although since START is supposed to be deprecated, there’s likely something that can do the same thing with a simple MATCH. That’s something to tackle another day.
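For completeness, the repository-side equivalent looks something like this (again my own sketch; SDN 3.x accepts positional {0} parameters in repository @Query methods):

public interface BRepository extends GraphRepository<B> {

    // Find the Wants relationships pointing at a given B, ordered by priority
    @Query("MATCH (b:B)<-[r:WANTS]-(a:A) WHERE id(b) = {0} "
         + "RETURN r ORDER BY r.priority")
    List<Wants> findWantsForB(Long bNodeId);
}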

What you get back isn’t what you think it is

Cypher isn’t as intuitively obvious to me as some people seem to think it should be. It’s a form of complex mathematical notation, and while the docs on it are fairly illustrative, they are perhaps less complete and explanatory than they might be. So I spent a lot of time trying weird expressions. Here’s what worked as a member query on my “B” node:

@RelatedToVia
List<Wants> wantsList = new ArrayList<Wants>(5);

@Query("START b=node({self}) MATCH (b)-[r:WANTS]-(a:A) RETURN r ORDER BY r.priority")
public List<Wants> getWants() {
    return wantsList;
}

A lot of the earlier attempts returned false “Wants” objects whose nodeIds were actually the IDs of the “A” objects on the other end of the relationship.

Incidentally, Neo4j normally wants relationships to be unordered and will complain if you attempt to use an ordered collection (such as List) to hold them. However, it is smart enough to realize that when you do an ORDER BY query, it’s OK to declare the target collection as an ordered collection type.

But that’s not all!

The “Wants” list that this particular query returned is not directly usable. The nodeIds are valid, but all the other property values are blank – not merely the lazy-fetch values, but even the primitive property values. So to be actually usable, I have to pull the nodeId from the returned pseudo-Wants and use the repository to look up the actual Wants node.
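Here's a sketch of that workaround, assuming the hypothetical WantsRepository from earlier:

// The query results carry valid nodeIds but empty properties,
// so re-load each one through the repository to get a real Wants.
List<Wants> hydrated = new ArrayList<Wants>();
for (Wants stub : b.getWants()) {
    hydrated.add(wantsRepository.findOne(stub.getNodeId()));
}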

So I’ve solved what should have been a relatively simple problem – even though it’s an ugly solution – and only lost about 2 days doing so.

DBUnit and CSV reference data

The CSV capabilities of dbUnit are under-documented. Here are the results of some pain, suffering and debug tracing:

The CSV files are expected to be one-per-table. The tablename is part of the filename, thus: “TABLE1.csv”. Format is the usual, with the first row containing the column names and subsequent rows containing data. It is possible to customize the delimiters and separators, but the defaults work with bog-standard CSV. One possible reason to override is to allow use with pipe-separated format files.
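For concreteness, a dataset directory laid out the way the CSV support expects (the names here are illustrative):

csvdir/
    TABLE1.csv            one file per table; first row holds the column names
    TABLE2.csv
    table-ordering.txt    one table name per line: TABLE1, then TABLE2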

Here’s some code to snapshot an SQL query out to CSV for use as a later test reference (or whatever).

// Requires (among others): java.sql.Connection, org.dbunit.database.DatabaseConnection,
// org.dbunit.database.IDatabaseConnection, org.dbunit.dataset.*,
// org.dbunit.dataset.csv.CsvDataSetWriter, org.springframework.jdbc.datasource.DataSourceUtils

Connection con = DataSourceUtils.getConnection(dataSource);
IDatabaseConnection dbUnitCon = new DatabaseConnection(con, "MYDB");

// Select the rows to snapshot into an in-memory table named SNAPTABLE
ITable actualTable = dbUnitCon.createQueryTable(
        "SNAPTABLE",
        "SELECT * FROM QTABLE"
        + " WHERE INK1='GLORP'"
        + " AND INK_ID in('ABEND001', 'AAA', '07CSI')");

// Take reference snapshot:
IDataSet ds1 = new DefaultDataSet(actualTable);
CsvDataSetWriter.write(ds1, new File("/home/timh/csvdir"));

dbUnit will create csvdir if needed and output two files: SNAPTABLE.csv and a file named “table-ordering.txt”.


To load SNAPTABLE for use in validating the results of a test:

// Load expected data from CSV dataset
CsvDataFileLoader ldr = new CsvDataFileLoader();
// NOTE: terminal "/" on URL is MANDATORY!!!
IDataSet expectedDataSet = ldr.load("/testdata/snapshots/");
ITable expectedTable = expectedDataSet.getTable("SNAPTABLE");

// Assert that the actual database table matches the expected table
Assertion.assertEquals(expectedTable, actualTable);

Note that while you can specify case-insensitivity on table names (it’s the default), the case of the SNAPTABLE.csv file and the SNAPTABLE entry in table-ordering.txt must match – at least on case-sensitive OSes. And it’s good practice regardless. Also note that table-ordering.txt can contain multiple table names, one table name per line.

Finally, note that the error exception for the comparison assert counts line numbers off by 2. It doesn’t take the column-name row into account, and it starts counting from 0, instead of the more usual case (for databases) of counting starting at 1.

Important! The CSV reader is not very flexible when it comes to reading NULL values. Null fields MUST be given a value of "null", lower case, WITHOUT surrounding quotes. Like so:

"Fred Smith","127.0",null,null,"Jabber"

JPA and overlapping complex keys

If at all possible, you should avoid setting up databases with complex keys (or compound keys, as they are sometimes known). It makes life a whole lot more complex, and in a world where everyone already says you’re taking too long because “All You Have To Do Is…”, complex isn’t a good thing.

Sometimes, however, there’s just no choice.

I have one app that has a table with 2 complex-key relationships. The table’s primary key is (idA, idB) and there’s a foreign key reference to another table item with a key of (idA, idC). The table’s key is an embedded object of type “table1Id” and the foreign reference is a lazy-load “table2” object.

The full details are here.
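A rough sketch of the kind of mapping involved, with purely hypothetical names. The trick is that the shared column appears in both the embedded ID and the association, so the association's join columns are mapped read-only:

import java.io.Serializable;
import javax.persistence.*;

@Embeddable
public class Table1Id implements Serializable {
    @Column(name = "ID_A") private long idA;
    @Column(name = "ID_B") private long idB;
    // equals()/hashCode() omitted for brevity
}

@Entity
public class Table1 {
    @EmbeddedId
    private Table1Id id;

    // The second half of the foreign key gets its own writable column
    @Column(name = "ID_C")
    private long idC;

    // Lazy reference to the (ID_A, ID_C) table; join columns are read-only
    // because ID_A already belongs to the primary key mapping above
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumns({
        @JoinColumn(name = "ID_A", referencedColumnName = "ID_A",
                    insertable = false, updatable = false),
        @JoinColumn(name = "ID_C", referencedColumnName = "ID_C",
                    insertable = false, updatable = false)
    })
    private Table2 table2;
}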

Myfaces Extensions Validator and RichFaces (Not yet)

The MyFaces Extensions Validator package looks like it will someday make life a lot more pleasant when designing JSF backing beans. Instead of cluttering up the markup with cumbersome extra XML, the MyFaces Extensions Validator allows you to annotate the backing bean properties. It even picks up the applicable annotations (such as nullable=false) from JPA model objects.

Unfortunately, when I added it to my RichFaces/PrettyFaces project, all the RichFaces AJAX functions stopped working. Apparently the annotation processor got in the way of the JavaScript download process: the AJAX support functions were not found, and the RichFaces calendar stopped responding to popup click requests – understandably, since the JSF Calendar object was no longer found.

So, I have to do my validations the hard way for the moment.

org.hibernate.PropertyAccessException: could not get a field value by reflection getter of com.mypackage.MyEntity.entityId

There are reports that messages of this sort were due to bugs in one or more Hibernate releases. But this is also a legitimate error.

I wasted a lot of time before it hit me. The original code was:


@SuppressWarnings("unchecked")
public List<State> getStatesForCountry(String countryID) {
    final String SQLCOMMAND =
        "SELECT c " + "FROM "
            + State.class.getSimpleName() + " c "
            + " WHERE c.parentCountry = :countryID"
            + " ORDER BY c.StateName ASC";

    Query query = entityManager.createQuery(SQLCOMMAND);
    query.setParameter("countryID", countryID);
    List<State> results = query.getResultList();
    return results;
}

It should have been:


@SuppressWarnings("unchecked")
public List<State> getStatesForCountry(String countryID) {
    final String SQLCOMMAND =
        "SELECT c " + "FROM "
            + State.class.getSimpleName() + " c "
            + " WHERE c.parentCountry = :country"
            + " ORDER BY c.StateName ASC";

    Query query = entityManager.createQuery(SQLCOMMAND);
    Country country = findCountry(countryID);
    query.setParameter("country", country);
    List<State> results = query.getResultList();
    return results;
}

Where findCountry is simply an entityManager.find(Country.class, countryID);

This is a sneaky one because when the database is laid out, you think in terms of foreign keys. But in ORM, you don’t see those keys directly – they translate to object references. So the original version was attempting to compare an object to the key of the object instead of an instance of the object.

I can only plead distraction. The object model in question had been designed by someone else. Instead of accessor methods, it was using direct field access (which I avoid for a number of reasons). Further aggravating the issue was that the fields were all given names starting with an upper-case letter as though they were independent classes instead of properties.

But even without distractions, it’s not too hard to make this mistake.

OpenJPA/Spring/Tomcat6

Oh, what a tangled web we weave…

In theory, using JPA and Spring is supposed to make magical things happen that will make me more productive and allow me to accomplish wonderful things.

Someday. At the moment, I gain tons of productivity only to waste it when deployment time comes and I have to fight the variations in servers.

JPA allows coding apps using POJOs for data objects. You can then designate their persistence via external XML files or using Java Annotations. The Spring Framework handles a lot of the “grunt” work in terms of abstract connection to the data source, error handling and so forth.

But that, alas, is just the beginning.

First and foremost, I had to build and run using Java 1.5. OpenJPA 1.2 doesn’t support Java 6.

Tomcat is not a full J2EE stack. To serve up JPA in Tomcat requires a JPA service – I used Hibernate-entitymanager.

JPA requires a little help. Specifically, I used the InstrumentationLoadTimeWeaver to provide the services needed to process the annotations.
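For reference, the weaver is wired into the entity manager factory in the Spring config along these lines (bean names and references here are illustrative, not copied from the actual project):

<bean id="entityManagerFactory"
      class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
    <property name="dataSource" ref="dataSource"/>
    <property name="loadTimeWeaver">
        <bean class="org.springframework.instrument.classloading.InstrumentationLoadTimeWeaver"/>
    </property>
</bean>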

The weaver itself requires help. And to enable the weaver in Tomcat, I needed the spring-agent.

To the Tomcat6 lib directory I added:

  • spring-tomcat-weaver jar
  • spring-agent jar

But that’s not enough! The agent won’t turn itself on automatically. So I need to add a “-javaagent” to Tomcat’s startup. The easiest way to do that was to create a CATALINA_BASE/bin/setenv.sh file:

#!/bin/sh
CATALINA_BASE=/usr/local/apache-tomcat-6.0.18
JAVA_OPTS="-javaagent:$CATALINA_BASE/lib/spring-agent-2.5.4.jar"

Tomcat 5

I think this all works more or less the same in Tomcat 5, except that there are three library directories instead of the single lib directory that Tomcat 6 uses, so the location for the Spring support jars is different. common/lib seems to work, although I’m not sure it’s the best choice.

That’s half the battle. Next up: JSF/RichFaces – and Maven

JBoss and OpenJPA

I developed my Technology Sandbox app using OpenJPA under Tomcat6. Since JBoss 4.2 has its own persistence mechanisms (EJB3 and Hibernate), it looked like some changes might be required to port it.

It turns out that very little needed to be done.

The most important thing is to get a copy of the Spring agent (in my case spring-agent-2.5.4.jar) and copy it to the JBoss lib directory so that when it launches Tomcat, it can be used.

To get the embedded Tomcat to start using the Spring agent, a

JAVA_OPTS="-javaagent:/absolute/path/to/lib/spring-agent-2.5.4.jar"

needs to be defined in the JBoss/bin/run.conf file. If one doesn’t exist, create it. If it does and there’s already a JAVA_OPTS defined (such as -Xmx options), just add the -javaagent option to the existing set of JAVA_OPTS.

The only other thing that was required was to define a JDBC data source in the JBoss deploy directory to make the connection between the database connection parameters and the JNDI name of the data source. I just copied and modified one of the example *-ds.xml files.
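The result looks something like this (JNDI name, URL and credentials are hypothetical; the structure follows the JBoss 4.x example files):

<?xml version="1.0" encoding="UTF-8"?>
<datasources>
    <local-tx-datasource>
        <jndi-name>SandboxDS</jndi-name>
        <connection-url>jdbc:postgresql://localhost:5432/sandbox</connection-url>
        <driver-class>org.postgresql.Driver</driver-class>
        <user-name>sandbox</user-name>
        <password>secret</password>
    </local-tx-datasource>
</datasources>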

JBoss provides the JSF core implementation itself and that collides with MyFaces. The Technology Sandbox app started out using MyFaces. Since it’s actually getting the JSF stuff for the WAR build via Maven, a simple change to the POM should fix that issue, but a quick fix is a web.xml option that turns off the container JSF implementation:

    <context-param>
        <param-name>org.jboss.jbossfaces.WAR_BUNDLES_JSF_IMPL</param-name>
        <param-value>true</param-value>
    </context-param>

ORM and Imprecise Data Types

JPA is great in a lot of ways. But there’s one problem a lot of people have with JPA and ORM in general – imprecise keys.

Technically, this is as much a JDBC problem as an ORM problem. It just seems to cause more problems in ORM. Perhaps because ORM lets you concentrate less on the raw data details.

Anyone who has taken a basic programming course has (hopefully) had it pointed out to them that it’s dangerous to compare for exact equality on floating-point numbers. Most binary representations of floating point are imprecise on fractions, and even a simple 0.1 has no exact binary representation.

Less obvious, however, is that times and dates are also imprecise formats. Technically, not dates, but in the real world, dates and times are often intermingled even when you don’t expect them to be.

There are three primary time/date representations commonly found in Java:

  • java.util.Date
  • java.sql.Date
  • DBMS-specific date types

I won’t address high-precision time data types. They’re less likely to surprise people.

java.sql.Date is, in theory, granular to one day. In practice, it’s a subclass of java.util.Date, so that’s not literally true.

java.util.Date is granular to one millisecond. Since java.sql.Date doesn’t enforce its granularity, you can get in trouble with java.sql.Dates which have time-of-day data in them. Especially when dealing with Calendar timezone conversions.

Things get even more interesting when these objects are used in ORM and persisted out to a database. Oracle DATEs have a granularity of one second – there’s no standard Java time class that reflects this. So if you persist out a Java Date object, it may – generally will – get silently truncated. Thus your in-memory date and your database date will not compare equal!

As bad as this is, it’s worse if you try and use that date as a primary or foreign key. You’ll get invalid results, since there’s no such actual value in the database. More insidiously, in an ORM environment, you can make such queries without realizing it. That is, if you retrieve an object that’s linked to another object and the actual linkage was a date, the linkage may fail for non-obvious reasons.

There’s no easy fix for this. You can write your own Date class that enforces the granularity of your choice (truncates/rounds to seconds in the case of Oracle), but then you have to configure the data type mapping in your ORM configuration. You can write accessor functions to do the same thing, but if you forget to use one, the program will fail for non-obvious reasons. It’s best to avoid using dates as keys, but this isn’t always an option, and even non-key dates have their perils.
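For instance, a setter along these lines (a sketch, not from any of the apps discussed here) keeps the in-memory value in step with what an Oracle DATE column will actually store:

// Truncate to whole seconds before storing, so the in-memory value
// matches the granularity of an Oracle DATE column.
public void setCreated(java.util.Date created) {
    this.created = (created == null)
            ? null
            : new java.util.Date((created.getTime() / 1000L) * 1000L);
}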

Oracle isn’t the only – or perhaps even the worst – offender. It turns out that PostgreSQL has an even more insidious date problem. By default, the PostgreSQL time data types are stored as floating-point. Which means that they’re imprecise. And if dates are bad as keys, floating-point numbers are a thousand times worse!

There are two possible ways to handle that. One is to build a custom copy of the PostgreSQL server using the option of internally storing fractional seconds in integer form. Probably not going to happen, since not only does this mean you have to have permission to run a non-standard server, but also the internal table data won’t be freely interchangeable with standard-build tables. This would be an even worse problem than it is, except that PostgreSQL is notorious for changing internal structure even between minor releases.

The other alternative is to define the time value with fractional seconds truncated – for example, as TIMESTAMP(0). It is, unfortunately, not possible to accurately represent millisecond values in a date/time value on a stock PostgreSQL server, so the next best thing is to simply hack them off if you intend to retrieve by time or date.
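In DDL terms, that looks something like this (a hypothetical table, just to show the type declaration):

CREATE TABLE event_log (
    event_id   INTEGER PRIMARY KEY,
    created_at TIMESTAMP(0) NOT NULL  -- no fractional seconds stored
);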

Detached objects and JSF

JSF and JPA have proven to be more problematic than expected. The upside of the JSF framework is that the datamodel objects can be presented more or less right up to the page view layer without recourse to Data Transfer Objects (DTOs). The downside is that what’s presented isn’t always clear, leading to the dreaded OptimisticLockingException.

Officially, you get an Optimistic Locking Exception when your in-memory model is out of sync with the database (the database is more up-to-date than the model). In actuality, all that really needs to happen is for the ORM manager to think the model is out of sync with the database.

This perception seems to be distressingly easy to cause. I’d blame it on a bug in OpenJPA, but I had similar issues with the detach/attach model of JDO. I’ve not seen any good writeups on how to prevent the problem, but I’ve come up with a means of handling it (right or wrong).

In Struts, the issue was more obvious. It may be the same mechanism in JSF, just more obscure.

Here’s what happens:

1. You fetch in a record, display it, get the user’s input back.

2. You merge the update with the database. OK so far.

By default, JSF will redisplay the same form. If you then do more changes and attempt to merge them, you get an OptimisticLockingException.

In theory, this shouldn’t be happening, but in the current and development releases of OpenJPA (up to 1.2.0), I can do only one merge.

As it turns out, the simple solution is to do the update, then fetch a brand-new copy of the object if further edits are required/anticipated. That makes it almost a return to the old-time concept of EJB where a handle could be used to re-activate a copy of the EJB. Only instead of a handle, use the object’s primary key. This should be low-overhead assuming the object’s still in cache, but it is irritating.
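In code, the pattern is roughly this (a sketch, with a hypothetical entity and key accessor):

// Merge the user's edits, then discard the merged instance and
// re-find a fresh copy by primary key before the next edit cycle.
MyEntity merged = entityManager.merge(edited);
entityManager.flush();
MyEntity fresh = entityManager.find(MyEntity.class, merged.getId());
// Hand "fresh" back to the JSF view for any further edits.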

As to what’s actually causing the problem, it seems that the “dirty” flags for the object don’t get cleared when the merge is done and the EntityManager merge() method returns the object. So on the next merge, the database is updated, but the original pre-merge criteria were applied.