PostgreSQL – Best way to async data inserts into TimescaleDB

postgresql, timescaledb

I have three files, each containing one day of data:

2019-09-11.txt
2019-09-12.txt
2019-09-13.txt

Each file contains lines like:

2019-09-11 15:59:37.459802,1
2019-09-11 15:59:38.959802,1
[...]

The files are sorted by timestamp, which is the main index of the hypertable:

CREATE TABLE IF NOT EXISTS l1(
    timestamp TIMESTAMP(6) NOT NULL,
    data INT
);
So it is a lot of data…
Is there a way to reach a higher insert rate by running the INSERT INTO commands asynchronously, with one insert process per input file/date?

Best Answer

If the files are comma separated or can be converted to CSV, use Timescale's tool for inserting data from a CSV file in parallel: timescaledb-parallel-copy
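A possible invocation could look like the following. The connection string, database name, and worker count are placeholders to adapt; the files from the question are already CSV-shaped (timestamp,value), so they can be concatenated and fed to the tool directly:

```shell
# Combine the daily files into one CSV (they are already comma separated)
cat 2019-09-11.txt 2019-09-12.txt 2019-09-13.txt > l1.csv

# Load with 4 parallel COPY workers; connection settings are placeholders
timescaledb-parallel-copy \
  --connection "host=localhost user=postgres sslmode=disable" \
  --db-name my_database \
  --table l1 \
  --workers 4 \
  --copy-options "CSV" \
  --file l1.csv
```

This requires a running PostgreSQL/TimescaleDB instance, so it is a sketch rather than something to run as-is.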

A manual approach is to open several PostgreSQL sessions, e.g., by executing psql my_database in several command prompts, and load data from different files into the same hypertable. It is important that the parallel loaders insert data into different chunks. For example, if the hypertable is created with the default chunk size, one chunk holds one week of data, so the parallel sessions should read files belonging to different weeks. Otherwise, parallel execution will not perform well due to conflicts. You also want the active chunks to fit into memory.
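The manual approach can be sketched in Python: group the daily files by ISO week (a rough proxy for the default 7-day chunk interval, since actual chunk boundaries are not ISO-aligned), then run one psql \copy session per week-group in parallel. The database and table names come from the question; psql must be on PATH and a live database is required for the final step:

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date
from itertools import groupby
import subprocess

def week_of(filename):
    # ISO (year, week) of a file named YYYY-MM-DD.txt
    return date.fromisoformat(filename[:10]).isocalendar()[:2]

def group_by_week(filenames):
    # Files from the same week go to the same loader session, so
    # concurrent sessions target different hypertable chunks.
    ordered = sorted(filenames, key=week_of)
    return [list(g) for _, g in groupby(ordered, key=week_of)]

def copy_command(db, table, path):
    # One psql session doing a bulk \copy of a single file
    return ["psql", db, "-c",
            "\\copy {} FROM '{}' WITH (FORMAT csv)".format(table, path)]

def load(db, table, filenames, workers=4):
    # One worker per week-group; each group loads its files sequentially
    def load_group(group):
        for path in group:
            subprocess.run(copy_command(db, table, path), check=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(load_group, group_by_week(filenames)))

# Example (needs a running PostgreSQL with the l1 hypertable):
# load("my_database", "l1",
#      ["2019-09-11.txt", "2019-09-12.txt", "2019-09-13.txt"])
```

Note that the three files in the question all fall into the same week, so with default chunking they would end up in a single sequential session; per-file parallelism only pays off once the files span different weeks.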

In both cases it helps that the data are already sorted by timestamp, as mentioned in the question.