Postgresql – intel p3600 SSD no increase in read speed – postgres rdbms vs hdd

hardwareperformancepostgresqlpostgresql-performanceread-only-databasessd

We run a custom application that produces analyzable data-sets for medical research.

the amount of data we have is about 600GB in form 3Billion rows distributed in 600+ tables it is hosted in postgres RDBMS and data is indexed and clustered.

Our application is read intensive(read only).

Usually it took about 7-8 hours on a normal computer running on a HDD to go through all this data and produce a dataset.

Recently we brought a Highend machine (intel i5 5960x, 32GB ram with an intel p3600 2TB SSD) and set up the database and our application.

I was amazed to find out it took almost the same amount of time that it used to take on the old machine to extract a dataset.

EDIT 1 : How the program works

The program is a Java Application

There are about 13Million patient's records in the database.
and each patient has 10000+ entries about their health/diagnosis/prescriptions etc. Hence the massive size.

What we do is look for patients with particular medical conditions(defined in a particular table) and pull out their medical history and other events(from different tables) from the database and do research on it.

The program is not CPU bound as we have very simple logic behind the program

I create a lot of java objects(at least 1 object per patient, which in turn is made up of lot of strings which hold values I pull out of database), hence we can say it is a bit memory intensive

Yes it is harddrive intensive as we read Billions of records, and that is why I though if i have a really fast harddrive I can get this running very fast. And hence brought a Intel P3600 2TB SSD

SAMPLE QUERY

These kind of queries are run on tables with 100,000+ of records for each patient 4-5 times per program execution.

SELECT Field1,Date  
FROM schema.table1  
WHERE Field1 IN ('aaaaaaa','bbbbbbb','ccccccc',...,'nnnnnnn' )   
patientid = 'abcdefg'  
AND Date <= 'xxxx-xx-xx'  
order by Field1,Date;  

Note: Field1,Date,patientID is indexed

EDIT 2:

**Explain Analyze Output:**

"GroupAggregate  (cost=51.75..51.78 rows=1 width=12) (actual time=1.112..1.112 rows=1 loops=1)"
"  Output: Field1, min(Date), Date"  
"  Group Key: medical.Date, medical.Field1"  
"  Buffers: shared hit=102 read=4 written=4"  
"  ->  Sort  (cost=51.75..51.76 rows=2 width=12) (actual time=1.107..1.108 rows=1 loops=1)"  
"        Output: Field1, Date"  
"        Sort Key: medical.Date, medical.Field1"  
"        Sort Method: quicksort  Memory: 25kB"  
"        Buffers: shared hit=102 read=4 written=4"
"        ->  Index Scan using patmedmed_index on a6658.medical  (cost=0.42..51.74 rows=2 width=12) (actual time=1.016..1.089 rows=1 loops=1)"  
"              Output: Field1, Date"  
"              Index Cond: ((medical.patid = '00Nb'::bpchar) AND (medical.Field1 = ANY ('{C100111,C100112,C100z00,C101.00,C101000,C101100,C101y00,C101z00,C102.00,C102000,C102100,C102z00,C103.00,C103000,C103100,C103y00,C103z00,C104.00,C104.11,C104000,C10410 (...)"  
"              Filter: (medical.Date <= '2015-01-01'::date)"  
"              Buffers: shared hit=102 read=4 written=4"  
"Planning time: 0.844 ms"  
"Execution time: 1.248 ms"  

I really need help

Best Answer

You should store your value for FIELD1 in a TABLE, instead of in a bunch of variables. Make a three level index, a COVERED index specifically.

your IN clause is preposterous. I mean.. Learn how to Normalize.

THIS:

WHERE Field1 IN ('aaaaaaa','bbbbbbb','ccccccc',...,'nnnnnnn' )   

is actually SLOWER than this:

SELECT Field1,Date  
FROM schema.table1  
WHERE Field1 = 'aaaaaaa'   
patientid = 'abcdefg'  
AND Date <= 'xxxx-xx-xx'  
UNION
SELECT Field1,Date  
FROM schema.table1  
WHERE Field1 = 'bbbbbbb'   
patientid = 'abcdefg'  
AND Date <= 'xxxx-xx-xx'  
UNION
SELECT Field1,Date  
FROM schema.table1  
WHERE Field1 = 'ccccccc'   
patientid = 'abcdefg'  
AND Date <= 'xxxx-xx-xx'  
UNION
SELECT Field1,Date  
FROM schema.table1  
WHERE Field1 = 'nnnnnnn'   
patientid = 'abcdefg'  
AND Date <= 'xxxx-xx-xx'  
order by Field1,Date