ORM and Imprecise Data Types

JPA is great in a lot of ways. But there’s one problem a lot of people have with JPA and ORM in general – imprecise keys.

Technically, this is as much a JDBC problem as an ORM problem. It just seems to cause more problems in ORM. Perhaps because ORM lets you concentrate less on the raw data details.

Anyone who has taken a basic programming course has (hopefully) had it pointed out to them that it’s dangerous to compare for exact equality on floating-point numbers. Most binary representations of floating point are imprecise on fractions, and even a simple 0.1 has no simple binary value.

Less obvious, however, is that times and dates are also imprecise formats. Technically, not dates, but in the real world, dates and times are often intermingled even when you don’t expect them to be.

There are 3 primary time/date representations commonly found in Java:

java.sql.Date

java.sql.Date

DBMS-specific dates.

I won’t address high-precision time data types. They’re less likely to surprise people.

java.sql.Date is, in theory granular to one day. In practice, it’s a subclass of java.util.date, so that’s not literally true.

java.util.Date is granular to one millisecond. Since java.sql.Date doesn’t enforce granularity, you can get in trouble with java.sql.Dates which have time-of-date day in them. Especially when dealing with Calendar timezone conversions.

Things get even more interesting when these objects are used in ORM and persisted out to a database. Oracle Dates have a granularity of 1 second – there’s no standard Java time class that reflects this. So if you persist out a java Data object, it may – generally will – get silently truncated. Thus your in-memory date and your database dates will not compare equal!

As bad as this is, it’s worse if you try and use that date as a primary or foreign key. You’ll get invalid results, since there’s no such actual value in the database. More insidiously, in an ORM environment, you can make queries without realizing it. That is, if you retrieve an object that’s liked to another object and the actual linkage was a date, the linkage may fail for non-obvious reasons.

There’s no easy fix for this. You can write your own Date class that enforces the granularity of your choice (truncates/rounds to seconds in the case of Oracle), but then you have to configure the data type mapping in your ORM configuration. You can write accessor functions to do the same thing, but if you forget to use one, the program will fail for non-obvious reasons. It’s best to avoid using dates as keys, but this isn’t always an option, and even non-key dates have their perils.

Oracle isn’t the only – or perhaps even the worst – offender. It turns out that PostgreSQL has an even more insidious date problem. By default, the Oracle time data types are floating-point. Which means that they’re imprecise. And if dates are bad as keys, floating-point numbers are a thousand times worse!

There are 2 possible ways to handle that. One is to build a custom copy of the PostgreSQL server using the option of internally storing fractional seconds in interger form. Probably not going to happen, since not only does this mean you have to have permission to run a non-standard server, but also the internal table data won’t be freely interchangable with standard-build tables. This would be an even worse problem than it is, except that PostgreSQL is notorious for changing internal structure even between minor releases.

The other alternative is to define the time value with fractional seconds truncated – for example, as TIMESTAMP(0). It is, unfortunately, not possible to accurately represent millisecond values in a timedate value on a stock PostgreSQL server, so the next best thing is to simply hack them off if you intend to retrieve by time or date.