Sql-server – Ideal configuration and structure for single db + single table + 20 million static records

mariadbMySQLPHPphpmyadminsql server

I've spent the last few weeks trying to learn SQL the best I can and after doing research and reading many articles, specifically (This SO Thread), I have ran out of luck and was hoping to get some better advice to achieve the result I need.


My setup:

  1. Single Database (Running MariaDB, Though this could be
    changed)
  2. Single Table with 20+ Million records that has static data and is never changed. (This is also saved to a csv file)
  3. Windows server 2012 datacenter (64gb ram + Xeon E5 @ 2.4ghz with 12 cores & 24 logical processors) – (Could change OS if recommended)
  4. I have been using the latest Xampp/phpmyadmin at the moment while testing different methods (but plan to use a more enterprise solution like sqlserver).

What I need to achieve:

I need to be able to run a query (with the least delay as possible) against the data either it be to the database/table or to the csv file directly, and return a dataset containing 4 of the column values.


My data:

The data is historic property purchase information that dates back to 1999. A property can be bought and sold numerous times so the table has duplicated records for the (postcode, address and date), however the uuid of every row is unique.

Example Query:

To run query's I have been using laravel / php as I am not too advanced with mysql command line and would go something like:

$query = DB::table('postcode_records')->select('uuid', 'postcode', 'address', 'sale_date')->where('postcode', '=', $pcode)->get();

This will query the table and return all the records for the given postcode. This works fine however it takes 3-4 minutes to return the result and would be horrendous when in production. After reading many articles I'm hoping to achieve a much faster response time of ms or 1/2 seconds if possible.


My Structure:

(1) – This is the current structure with multiple composite keys..

+--------------------+--------------+------+-----+---------+-------+
| Field              | Type         | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| uuid               | varchar(250) | NO   | PRI | NULL    |       |
| sale_price         | int(10)      | NO   |     | NULL    |       |
| sale_date          | datetime     | NO   | PRI | NULL    |       |
| postcode           | varchar(15)  | NO   | PRI | NULL    |       |
| house_name_num     | varchar(50)  | NO   |     | NULL    |       |
| flat_or_apartment  | varchar(50)  | NO   |     | NULL    |       |
| street             | varchar(50)  | NO   |     | NULL    |       |
| town               | varchar(150) | NO   |     | NULL    |       |
| address            | varchar(150) | NO   |     | NULL    |       |
| city               | varchar(150) | NO   |     | NULL    |       |
| district           | varchar(152) | NO   |     | NULL    |       |
+--------------------+--------------+------+-----+---------+-------+

(2) – I have tried this structure with a single primary key however no difference in query time.

+--------------------+--------------+------+-----+---------+-------+
| Field              | Type         | Null | Key | Default | Extra |
+--------------------+--------------+------+-----+---------+-------+
| uuid               | varchar(250) | NO   | PRI | NULL    |       |
| sale_price         | int(10)      | NO   |     | NULL    |       |
| sale_date          | datetime     | NO   |     | NULL    |       |
| postcode           | varchar(15)  | NO   |     | NULL    |       |
| house_name_num     | varchar(50)  | NO   |     | NULL    |       |
| flat_or_apartment  | varchar(50)  | NO   |     | NULL    |       |
| street             | varchar(50)  | NO   |     | NULL    |       |
| town               | varchar(150) | NO   |     | NULL    |       |
| address            | varchar(150) | NO   |     | NULL    |       |
| city               | varchar(150) | NO   |     | NULL    |       |
| district           | varchar(152) | NO   |     | NULL    |       |
+--------------------+--------------+------+-----+---------+-------+

The data:

"000000D6-CFA4-476E-95A4-8680BE96B482","181995","2005-12-14 00:00:00","ST11 9TL","S","Y","F","12",,"HOFFMAN DRIVE","BLYTHE BRIDGE","STOKE-ON-TRENT","STAFFORD","STAFFORDSHIRE","A","A"
"000000FE-94CA-47DA-8D75-6FDFA5960D75","75000","2002-03-20 00:00:00","DA9 9PT","F","N","L","20",,"SWALLOW CLOSE","GREENHITHE","GREENHITHE","DARTFORD","KENT","A","A"
"0000012D-3A97-4FF7-BCA7-897FAA91E25B","52500","1997-06-27 00:00:00","BS20 6JQ","S","N","F","14",,"AVON WAY","PORTISHEAD","BRISTOL","NORTH SOMERSET","NORTH SOMERSET","A","A"

I hope I have not confused anyone. I have tried to keep the post as basic but at same time providing an understanding of the issue I face and what setup I have.

Any help will be a god send, I have tried for weeks to resolve this!

Best Answer

All you're missing is an index on postcode. In your query you're doing this: where('postcode', '=', $pcode

It almost certainly translates at the database layer to select a, b, c from table where postcode = x.

Without the index on postcode you're doing a slow scan of the entire table, with an index you can do a fast seek.