Database Recommendation – Managing Tens of Billions of Elevation Data

database-recommendationspatial

I have a need of a service which gives me the elevation for a coordinate (latitude, longitude). There is a dataset called SRTM3 where you have this information in a list of files. Each file contains a representation of the elevation value in a 1201*1201 square and each elevation value is stored as a big endian integer. There are around 20 000 files in this dataset.

For example, for the latitude 50.4161 and longitude 14.12345, you would find the data inside one of these 1201*1201 values of the file N50E014.hgt.

My question is this : if I were to import all these data in a table, I would have around 20 000 * 1201 * 1201 = 28 848 020 000 records.

  • Is it the right way to go in storing this in a database? If yes, what system would you recommend? The only request I would have to make is something like:

    SELECT elevation 
    FROM elevation e 
    WHERE e.min_latitude < latitude 
    AND e.max_latitude > latitude 
    AND e.min_longitude < longitude 
    AND e.max_longitude > longitude`
    
  • Or should I just load the data from the raw file directly on request? But it means loading a file on each request (I could keep some of the most used in memory but it does not seem efficient).

EDIT: I was thinking about something more around the line of a task queue than opening the file on each request but let's forget this and keep our focus on database system. @Paparrazi said that a conventional database can handle this much data and advised with a cluster index. I thought of that previously but in this case, once the data are loaded, the only operation will be a read (never or almost never a write). It seems kind of useless to use these conventional databases with lot of feature like transactions, … when you don't need it. Is there another alternative ? In bigdata oriented database (cassandra, mongo, …) ?

EDIT2: I am looking into @MickyT answer with postgis. It seems to fit my needs. I will try to import the data into the index clustered table or the postgis raster one, benchmark the 2 solutions and post my result here.

Best Answer

PostgreSQL with the PostGIS extension would appear to have everything you need to manage this data.

PostGIS has a fairly rich set of functions that deal with rasters like the files that you have described. Further documentation on enabling and using rasters is here.

Once you have the rasters loaded a query like the following should perform reasonably well

SELECT p.Identifier, ST_Value(r.rast, p.geog) AS elevation
FROM query_point p
    INNER JOIN elevation_raster r ON ST_Intersects(r.rast, p.geog);