DBUnit and CSV reference data

The CSV capabilities of dbUnit are under-documented. Here’s the results of some pain, suffering and debug tracing:

The CSV files are expected to be one-per-table. The tablename is part of the filename, thus: “TABLE1.csv”. Format is the usual, with the first row containing the column names and subsequent rows containing data. It is possible to customize the delimiters and separators, but the defaults work with bog-standard CSV. One possible reason to override is to allow use with pipe-separated format files.

Here’s some code to snapshot an SQL query out to CSV for use as a later test reference (or whatever).

Connection con =
DataSourceUtils.getConnection(dataSource);
IDatabaseConnection dbUnitCon =
new DatabaseConnection(con, "MYDB");

ITable actualTable =
dbUnitCon.createQueryTable(
"SNAPTABLE",
"SELECT * FROM QTABLE "
+ " WHERE INK1='GLORP'"
+ " AND INK_ID in('ABEND001', 'AAA', '07CSI')");

// Take reference snapshot:
IDataSet ds1 = new DefaultDataSet(actualTable);
CsvDataSetWriter.write(ds1, new File("/home/timh/csvdir"));

dbUnit will create csvdir, if needed and output 2 files. The SNAPTABLE.csv file and a file named “table-ordering.txt”.


To load SNAPTABLE for use in validating the results of a test:

// Load expected data from CSV dataset
CsvDataFileLoader ldr = new CsvDataFileLoader();
// NOTE: terminal "/" on URL is MANDATORY!!!
IDataSet expectedDataSet =
ldr.load("/testdata/snapshots/");
ITable expectedTable =
expectedDataSet.getTable("SNAPTABLE");

// Assert actual database table match expected table
Assertion.assertEquals(expectedTable, actualTable);

Note that while you can specify case-insensitivity on table names (it’s the default), the case of the SNAPTABLE.csv file and the SNAPTABLE entry in table-ordering.txt must match – at least on case-sensitive OS’s. And it’s good practice regardless. Also note that table-ordering.txt can contain multiple table names, one tablename per line.

Finally, note that the error exception for the comparison assert counts line numbers off by 2. It doesn’t take the column-name row into account, and it starts counting from 0, instead of the more usual case (for databases) of counting starting at 1.

important! The CSV reader is not very flexible when it comes to reading NULL values. Null fields MUST be given a value of “null”, lower case, WITHOUT surrounding quotes. Like so:

“Fred Smith”,”127.0″,null,null,”Jabber”