PostgreSQL – How to make the architecture scalable to handle a linear process

pgrouting, postgis, postgresql

I have a system to calculate Real Time Traffic (RTT), using Automatic Vehicle Location (AVL).

I know where a car was and where it is now, and by calculating distance and time I can estimate the traffic speed on those roads. I have two stored procedures:

  • near_link: using the AVL (x, y) position, finds the nearest road link (see the sketch just after this list).
  • create_route: using the pgRouting extension, the current link, and the previous link, calculates the route the vehicle took.
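
For reference, near_link is essentially a KNN lookup like the simplified sketch below (the real procedure also uses the azimuth; the link_id/geom column names and the SRID here are placeholders, not my actual definitions):

    -- Simplified nearest-link lookup: a KNN search ordered by the PostGIS
    -- <-> distance operator, which can use a GiST index on map_rto.geom.
    CREATE OR REPLACE FUNCTION near_link_sketch(p_x double precision, p_y double precision)
    RETURNS integer AS $$
        SELECT link_id
        FROM map_rto
        ORDER BY geom <-> ST_SetSRID(ST_MakePoint(p_x, p_y), 4326)
        LIMIT 1;
    $$ LANGUAGE sql STABLE;

    -- The GiST index is what keeps the ORDER BY ... <-> ... LIMIT 1 fast:
    CREATE INDEX map_rto_geom_gist ON map_rto USING gist (geom);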

Both processes are very linear: calculating near_link for 1 row takes 10 ms, so 100 rows take 1000 ms (1 sec). Calculating the route is a little more expensive: 50 ms for one route, or 5 sec for 100 routes.

The problem is that the AVL fleet is growing, so instead of 400 avl/min I now receive 2000 avl/min. Instead of 400 * 60 ms = 24 sec, I now need 2000 * 60 ms = 120 sec, so every minute I can only process half of the data.

The only solution I can think of right now is to have two separate servers, one handling even car_id and the other odd car_id, to split the load between them.

Currently I am using just a desktop machine in production: Windows, Core i3 3 GHz, 8 GB RAM, ordinary disks. I can request better hardware for the production server. For example, I know the queries are very disk-demanding because they need to check the map_rto table very often, yet the resource monitor shows very low CPU and memory use. So I could upgrade to an SSD and hope my processing time is cut in half.
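
One way I could confirm the disk suspicion from inside PostgreSQL (a generic catalog query, shown only as a sketch) is to check how many map_rto reads are served from shared_buffers versus read in from outside:

    -- A low hit_pct means most map_rto reads go to the OS cache or the disk.
    SELECT relname,
           heap_blks_read,
           heap_blks_hit,
           round(100.0 * heap_blks_hit
                 / nullif(heap_blks_hit + heap_blks_read, 0), 1) AS hit_pct
    FROM pg_statio_user_tables
    WHERE relname = 'map_rto';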

But what happens when the fleet increases to 4000 avl/min, or to 8000 avl/min? What are the strategies to scale these linear calculations?


Additional description of the current design:

I like the idea of using pgAgent and triggers to move the data from one stage to another, but maybe there is a better way to do it.


  • avl_sources (tables): I can have multiple companies providing data; each one has a separate table, and there are triggers on each of those tables that insert new rows into the avl_pool table.
    • Each row has: car_id, x, y, azimuth, datetime.
    • I check the external sources every minute, receive around 2000 records, and the whole process takes ~5 sec to finish.
  • map_rto (table): contains my country's road information. There are ~3 million links. It is used to calculate the near_link and the route.
  • near_link (sp): using the AVL x, y, and azimuth, it tries to find the closest link to that position. I call this sp from a pgAgent job every minute.
  • export (sp): also called from pgAgent. It moves the AVL rows that have a near_link into the traffic_avl table.
  • traffic_avl (table): this table has a trigger that uses the current position and the previous position to calculate the route (a simplified sketch of that trigger follows this list).
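
A simplified sketch of what that trigger does (the column names and the create_route call shown here are assumptions, not the real definitions):

    -- Sketch only: for each new traffic_avl row, look up the car's previous
    -- link and ask create_route for the path between the two.
    CREATE OR REPLACE FUNCTION traffic_avl_route_sketch()
    RETURNS trigger AS $$
    DECLARE
        prev_link integer;
    BEGIN
        -- Previous known link for the same car (column names assumed).
        SELECT t.near_link
          INTO prev_link
          FROM traffic_avl t
         WHERE t.car_id = NEW.car_id
           AND t.datetime < NEW.datetime
         ORDER BY t.datetime DESC
         LIMIT 1;

        IF prev_link IS NOT NULL THEN
            -- create_route's real signature is unknown; (previous, current)
            -- is used here purely for illustration.
            PERFORM create_route(prev_link, NEW.near_link);
        END IF;

        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER traffic_avl_route
    AFTER INSERT ON traffic_avl
    FOR EACH ROW EXECUTE PROCEDURE traffic_avl_route_sketch();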

Best Answer

Since you suspect the disk is the limiting factor, upgrading to an SSD sounds like your best option if you are focused on hardware. More RAM might also help, if it would allow most of the slow table and its indexes to be cached. (How big are map_rto and its most-used index?)
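
Those sizes can be read straight from the catalog; these are generic queries, with only the map_rto name taken from your description:

    -- Total size of map_rto including its indexes, and the heap alone:
    SELECT pg_size_pretty(pg_total_relation_size('map_rto')) AS table_plus_indexes,
           pg_size_pretty(pg_relation_size('map_rto'))       AS table_only;

    -- Per-index sizes and how often each index is actually used:
    SELECT indexrelname,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
           idx_scan
    FROM pg_stat_user_indexes
    WHERE relname = 'map_rto'
    ORDER BY idx_scan DESC;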

If you would rather try tuning the hardware you already have, or want to make sure that this really is the issue before buying hardware, then you should identify the slow queries inside your stored procedures and follow this advice for them.

It would be particularly useful to turn on track_io_timing and post the EXPLAIN (ANALYZE, BUFFERS) output for the slow queries.
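
For example (the SELECT below is only a placeholder; substitute the actual statement from inside near_link or create_route):

    -- track_io_timing needs superuser to change per session; otherwise set
    -- it in postgresql.conf and reload.
    SET track_io_timing = on;

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT link_id
    FROM map_rto
    ORDER BY geom <-> ST_SetSRID(ST_MakePoint(-70.65, -33.45), 4326)
    LIMIT 1;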

Finally, is the code that calls these two functions single-threaded? If so, you could try making it parallel, with a separate database connection for each thread.
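
A minimal sketch of what each worker connection could run, assuming avl_pool has a near_link column to fill in and that near_link takes (x, y, azimuth) as described in the question; :n_workers and :worker_id would be substituted by whatever launches the connections:

    -- Each connection runs the same statement with a different worker_id
    -- (0 .. n_workers - 1). Partitioning by car_id keeps all points of one
    -- car on the same worker, which matters because the route needs that
    -- car's previous position.
    UPDATE avl_pool a
    SET near_link = near_link(a.x, a.y, a.azimuth)
    WHERE a.near_link IS NULL
      AND a.car_id % :n_workers = :worker_id;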