I can speak for Postgres. Loading is very fast if you are a superuser, because PostgreSQL's server-side COPY command reads the file directly on the server (non-superusers can use psql's client-side \copy instead). Simply save the file as a CSV (which can also be opened in Excel) and run:
COPY my_table FROM 'fpath' WITH (FORMAT csv, HEADER true, DELIMITER ';');
You can also export the result of any query with COPY (SELECT ...) TO 'fpath'.
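For example, to export a query result (the table, columns, and path here are placeholders):

-- export the result of any query to CSV; the path is resolved on the database server
COPY (SELECT id, value1 FROM my_table WHERE value1 > 10)
TO '/tmp/export.csv' WITH (FORMAT csv, HEADER true, DELIMITER ';');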
If you must use Excel formats like .xlsx, consider something like Pentaho and build an application that can connect to any database; there is a Java API for it. Other ETL tools may do the trick too.
In these instances, a load typically finishes in under 10 minutes. My last load of 900,000 rows over a non-commercial network took 20 minutes in Pentaho; a straight COPY in PostgreSQL is much faster.
As for speed, querying is your main problem. If you are just copying and querying, Postgres is fine. However, if you are running intensive queries (counts, aggregations), use an ETL tool like Pentaho that you can drive from Java, together with a connection pool such as BoneCP for Postgres or equivalent pooling software for other databases.
Most databases support connection pooling but impose connection limits, so check your configuration settings.
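In PostgreSQL, for example, you can inspect and raise the limit like this (a sketch; ALTER SYSTEM needs PostgreSQL 9.4 or later, and max_connections only changes after a server restart):

-- show the current connection limit (the default is typically 100)
SHOW max_connections;

-- raise the limit; takes effect only after a restart
ALTER SYSTEM SET max_connections = 200;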
A Fast Method
A probably faster way to do this is the following:
- Provide the data in a database table
- Execute a MERGE statement that merges the data from this table into the target table
To get the data into a database table you can use SQL*Loader or external tables (see http://docs.oracle.com/database/121/SQLRF/statements_9016.htm#SQLRF01606).
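As a sketch, an external table over the CSV could look like this; it assumes an Oracle directory object (here called data_dir) that points at the folder holding the file:

-- hypothetical directory object; create it first, e.g.:
-- create directory data_dir as '/path/to/csv/files';
create table mod_ext (
  id     number,
  value1 number,
  value2 number,
  value3 number
)
organization external (
  type oracle_loader
  default directory data_dir
  access parameters (
    records delimited by newline
    fields terminated by ','
  )
  location ('mod_data.csv')
);

With an external table you can skip the separate load step and MERGE straight from the file.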
create table big_tab (
  id     number primary key,
  value1 number,
  value2 number,
  value3 number
);

create table mod_tab (
  id     number,
  value1 number,
  value2 number,
  value3 number
);
Now assume you have the following data in big_tab:
SQL> select * from big_tab;
        ID     VALUE1     VALUE2     VALUE3
---------- ---------- ---------- ----------
         1         11         21         31
         2         12         22         32
         3         13         23         33
         4         14         24         34
         5         15         25         35
You have a CSV file mod_data.csv with the following values:
1,111,121,131
3,113,123,133
You can load this into the table with sqlldr <user>/<password> control=load.ctl DIRECT=true,
with the SQL*Loader control file load.ctl containing
LOAD DATA
INFILE "mod_data.csv"
TRUNCATE
INTO TABLE mod_tab
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
id char,
value1 char,
value2 char,
value3 char
)
Using the option "DIRECT=true" will be faster than loading without it, because SQL*Loader then writes data blocks directly instead of going through conventional INSERTs.
Now you have
SQL> select * from mod_tab;
        ID     VALUE1     VALUE2     VALUE3
---------- ---------- ---------- ----------
         1        111        121        131
         3        113        123        133
You can merge this into big_tab with the following statement:
MERGE INTO big_tab
using mod_tab on (big_tab.id = mod_tab.id)
when matched then update set
  value1 = mod_tab.value1,
  value2 = mod_tab.value2,
  value3 = mod_tab.value3
;
and get
SQL> select * from big_tab;
        ID     VALUE1     VALUE2     VALUE3
---------- ---------- ---------- ----------
         1        111        121        131
         2         12         22         32
         3        113        123        133
         4         14         24         34
         5         15         25         35
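If mod_data.csv can also contain ids that do not yet exist in big_tab, the MERGE above can be extended with an insert branch (a sketch; the statement shown only handles updates):

MERGE INTO big_tab
using mod_tab on (big_tab.id = mod_tab.id)
when matched then update set
  value1 = mod_tab.value1,
  value2 = mod_tab.value2,
  value3 = mod_tab.value3
when not matched then insert (id, value1, value2, value3)
  values (mod_tab.id, mod_tab.value1, mod_tab.value2, mod_tab.value3)
;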
If the processing went OK, you can clean up:
truncate table mod_tab;
An Even Faster Method
If you have a lot of data to modify, there is a way that is even better (at least if your I/O system is fast):
- create an auxiliary table with the same structure as big_tab
- insert the modified rows into this auxiliary table
- insert the rows from big_tab that will not be modified into the auxiliary table
- drop the big_tab table
- rename the auxiliary table to big_tab
- add constraints and indexes to the new big_tab table
This can be done with the following script (note that the single outer-join insert below covers both insert steps at once):
create table aux_tab as select * from big_tab where 1=0;
alter table mod_tab add primary key (id);
insert /*+ APPEND */ into aux_tab
select big_tab.id id,
       decode(mod_tab.id, null, big_tab.value1, mod_tab.value1) value1,
       decode(mod_tab.id, null, big_tab.value2, mod_tab.value2) value2,
       decode(mod_tab.id, null, big_tab.value3, mod_tab.value3) value3
from big_tab, mod_tab
where big_tab.id = mod_tab.id(+);
drop table big_tab;
rename aux_tab to big_tab;
alter table big_tab add primary key (id);
alter table mod_tab drop primary key;
truncate table mod_tab;
And Even Faster
You should always do your loads and inserts in direct path mode. That is why I added the APPEND hint to the INSERT statement and the DIRECT=true option to the SQL*Loader command. But you can try to parallelize some of the steps to get even faster. Parallel statements work best on partitioned tables, so it may make sense to partition big_tab, and also mod_tab and aux_tab.
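As a sketch, parallel DML has to be enabled per session before the insert can actually run in parallel (the degree of 4 is just an example value):

-- parallel DML is disabled by default and must be enabled per session
alter session enable parallel dml;

insert /*+ APPEND PARALLEL(aux_tab, 4) */ into aux_tab
select /*+ PARALLEL(big_tab, 4) */
       big_tab.id id,
       decode(mod_tab.id, null, big_tab.value1, mod_tab.value1) value1,
       decode(mod_tab.id, null, big_tab.value2, mod_tab.value2) value2,
       decode(mod_tab.id, null, big_tab.value3, mod_tab.value3) value3
from big_tab, mod_tab
where big_tab.id = mod_tab.id(+);

-- a direct-path insert locks the table until the transaction ends
commit;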
Best Answer
It's very easy to do this in Access. Set up an ODBC connection on the user's computer, then add a linked table into the Access database.
I don't think it would take long for someone to create an application with a nice WinForms interface. We have one at my office: if you create a function/stored procedure in SQL whose name starts with certain characters, let's say dbo.app_*******, it appears as a report in the application, and any parameter of the stored procedure prompts the user to input a value in the app.
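As a sketch of that naming convention (the procedure name, parameters, and columns are made up; only the dbo.app_ prefix matters to the app):

-- hypothetical report procedure: the app lists it because of the dbo.app_ prefix
-- and prompts the user for @StartDate and @EndDate before running it
CREATE PROCEDURE dbo.app_SalesByRegion
    @StartDate date,
    @EndDate   date
AS
BEGIN
    SELECT region, SUM(amount) AS total_sales
    FROM dbo.sales
    WHERE sale_date BETWEEN @StartDate AND @EndDate
    GROUP BY region;
END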