Will Derby, H2, or SQLite give faster load time and/or smaller file size than HSQL

database-designderbyh2hsqldbsqlite

I have some flat files with the following columns; 3 integers, 3 reals, and 1 varchar(20). For querying I need an index that contains both 1 of the integer columns and the varchar column. Each file is around 1.8GB in size with around 38 million rows.

Currently I am using a HSQL(Standalone) database to load a file for processing; one database per file. It is very slow to load (120+ min) the file and results in a 4.7GB database file when the database is created with the following options.

"Properties" -> {
  "check_props" -> "true",
  "shutdown" -> "true",
  "hsqldb.default_table_type" -> "cached",
  "sql.syntax_mss" -> "true",
  "hsqldb.log_data" -> "false",
  "hsqldb.inc_backup" -> "false" 
  }

The file read in batches of 100k records. The read is very fast (almost instant) so I do not think it is the read that is slowing things down. It also takes a very long time to close the connection to the database.

I have the option to use Derby, H2, or SQLite. Will any of these result in faster load time and/or smaller database file size in this scenario? If so what are the connection string options that should be used to achieve this? Alternatively, are there different connection string options I can use with HSQL(Standalone) that will reduce the load time and/or database file size?

Driver information added.

JDBCDriver[
  "Name" -> "HSQL(Standalone)", 
  "Driver" -> "org.hsqldb.jdbcDriver", 
  "Protocol" -> "jdbc:hsqldb:file:", 
  "Version" -> 3.1, 
  "Description" -> "HSQL Database Engine (In-Process Mode) - Version 2.3.3 -  This ...", 
  "Location" -> "C:\... "]

Driver information for the other options available to me.

Derby

JDBCDriver[
  "Name" -> "Derby(Embedded)", 
  "Driver" -> "org.apache.derby.jdbc.EmbeddedDriver", 
  "Protocol" -> "jdbc:derby:", 
  "Version" -> 3.1, 
  "Description" -> "Derby Database Engine (Embedded Mode) - Version 10.12.1.1 - This...",
  "Location" -> "C:\... "]

H2

JDBCDriver[
  "Name" -> "H2(Embedded)", 
  "Driver" -> "org.h2.Driver", 
  "Protocol" -> "jdbc:h2:", 
  "Version" -> 3.1, 
  "Description" -> "H2 Database Engine (Embedded Mode) - Version 1.3.176 - This...",
  "Location" -> "C:\... "]

SQLite

JDBCDriver[
  "Name" -> "SQLite", 
  "Driver" -> "org.sqlite.JDBC", 
  "Protocol" -> "jdbc:sqlite:", 
  "Version" -> 3.1, 
  "Description" -> "SQLite using Zentus-derived JDBC Driver - Version 3.8.11.2", 
  "Location" -> "C:\..."]

Additional variants include the below. However, I need it all to run on the client's computer. I believe this excludes server and webserver modes.

{"Derby(Embedded)", "Derby(Server)", "H2(Embedded)", "H2(Memory)", 
 "H2(Server)", "HSQL(Memory)", "HSQL(Server)", "HSQL(Standalone)", 
 "SQLite", "SQLite(Memory)"}

Best Answer

You need a larger Java memory allocation and a larger HSQLDB memory cache size for faster loading of data into large tables.

Add hsqldb.cache_rows=4000000 and hsqldb.cache_size=1000000 to the startup configuration for a new database.

http://hsqldb.org/doc/2.0/guide/dbproperties-chapt.html#dpc_db_file_mem