MongoDB has built-in support for geospatial indexing; you don't need to do the calculation yourself.
Basically, you create a field with the lat/long stored as an array or as a sub-document, something like one of these:
{ loc : [ 50 , 30 ] } //SUGGESTED OPTION
{ loc : { x : 50 , y : 30 } }
{ loc : { lon : -73.992964, lat : 40.739037 } }
Then index the new loc field appropriately:
db.places.ensureIndex( { loc : "2d" } )
Finally, use one of the geo operators to query for, say, the nearest 20 results:
db.places.find( { loc : { $near : [50,50] } } ).limit(20)
You could, of course, just use MongoDB to store the data, then pull the information out of the DB with a find() and do the calculation client-side but I imagine that is not what you want to do.
If the distance part of the equation is what you want:
http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-geoNearCommand
The geoNear command also returns the distance. An example:
> db.runCommand( { geoNear : "places" , near : [50,50], num : 10 } );
{
    "ns" : "test.places",
    "near" : "1100110000001111110000001111110000001111110000001111",
    "results" : [
        {
            "dis" : 69.29646421910687,
            "obj" : {
                "_id" : ObjectId("4b8bd6b93b83c574d8760280"),
                "y" : [ 1, 1 ],
                "category" : "Coffee"
            }
        },
        {
            "dis" : 69.29646421910687,
            "obj" : {
                "_id" : ObjectId("4b8bd6b03b83c574d876027f"),
                "y" : [ 1, 1 ]
            }
        }
    ],
    "stats" : {
        "time" : 0,
        "btreelocs" : 1,
        "btreelocs" : 1,
        "nscanned" : 2,
        "nscanned" : 2,
        "objectsLoaded" : 2,
        "objectsLoaded" : 2,
        "avgDistance" : 69.29646421910687
    },
    "ok" : 1
}
The "dis" : 69.29646421910687
elements are what you are looking for, there is also a spherical distance option.
For all of this, how to use the distances, and more on geo indexes and how to use them, see:
http://www.mongodb.org/display/DOCS/Geospatial+Indexing/
Update:
Since this was published, the problem has evolved: we now deal with 140 million+ rows. PostGIS has also evolved, and it is now possible to "fix" the feature table directly, with no more need for ArcMap's Dice tool.
I did it using an ST_VoronoiPolygons approach. I created a working gist with a function that breaks the features of the original table into sizes feasible for processing.
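The core idea can be sketched roughly as follows; the table name, geometry column, point count, and vertex threshold here are all assumptions for illustration (the real gist handles recursion and size limits):

```sql
-- Hypothetical sketch (PostGIS 2.3+): split an oversized polygon by
-- intersecting it with Voronoi cells built from points sampled inside it.
SELECT f.id,
       ST_Intersection(f.geom, v.cell) AS piece
FROM features f
CROSS JOIN LATERAL (
    SELECT (ST_Dump(ST_VoronoiPolygons(ST_GeneratePoints(f.geom, 32)))).geom AS cell
) v
WHERE ST_NPoints(f.geom) > 800;  -- only break up the big features
```

The Voronoi cells cover the whole bounding area, so ST_Intersection clips each cell back to the original feature, yielding smaller, self-contained pieces.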
Original Answer:
For the record, I was able to do this using other tools and by splitting the workload:
- For everything to work, I had to limit each feature to 800 vertices
max and 15 km² max. I tried some tools and even a recursive procedure
in PL/pgSQL, but without success. The only thing I tried that split
everything correctly was the Dice feature of ArcMap;
- I divided my 170k+ rows into chunks of 20 rows and ran six
instances of the query in parallel to compute the area, using a .NET console app;
- With the area stored, I was able to do further processing, also in
chunks and in parallel, to compute the value of each of the 9 million+ cells.
The whole process now takes "only" 3 hours to finish.
ST_Intersects itself is light. The problem was this calculation:
st_area(st_intersection(grade.the_geom, uso.geom))
Building the intersection of complex features and calculating the area was the demanding task.
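Putting the pieces above together, one chunked run had roughly this shape; the grade/uso names come from the snippet above, while the id column and the chunking predicate are assumptions:

```sql
-- Hypothetical shape of one chunk: intersect each grid cell with the
-- land-use polygons and sum the intersection areas per cell.
SELECT grade.id,
       SUM(ST_Area(ST_Intersection(grade.the_geom, uso.geom))) AS area
FROM grade
JOIN uso
  ON ST_Intersects(grade.the_geom, uso.geom)  -- cheap index-assisted filter
WHERE grade.id BETWEEN 1 AND 20               -- one 20-row chunk
GROUP BY grade.id;
```

Six such queries, each covering a different id range, ran in parallel.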
Best Answer
Don't use ST_Distance for this: it will never use an index. Instead use the KNN distance operator <->, and use ST_DWithin when possible. To find the nearest point, order by the <-> distance and take the first row.
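A minimal sketch, assuming two tables a and b with geometry columns named geom (all names here are placeholders):

```sql
-- KNN nearest neighbour: <-> in ORDER BY lets the planner walk the
-- spatial index instead of computing a distance for every row.
SELECT b.*
FROM b
ORDER BY b.geom <-> (SELECT geom FROM a WHERE id = 1)
LIMIT 1;
```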
To find the nearest point within one mile, combine the two: filter with ST_DWithin and order by <->.
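For example (1609 metres is roughly one mile, assuming a projected SRID measured in metres; table and column names are placeholders):

```sql
-- ST_DWithin prunes candidates via the index; <-> then ranks what's left.
SELECT b.*
FROM b
WHERE ST_DWithin(b.geom, (SELECT geom FROM a WHERE id = 1), 1609)
ORDER BY b.geom <-> (SELECT geom FROM a WHERE id = 1)
LIMIT 1;
```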
Also, unless your PostGIS is very old, don't ever put ST_Distance(...) < some_distance in a WHERE clause; use ST_DWithin instead.
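The anti-pattern and its index-friendly replacement look like this (placeholder table and column names):

```sql
-- Anti-pattern: ST_Distance in WHERE cannot use the index, so every
-- row pair gets a full distance computation.
SELECT b.id FROM a, b WHERE ST_Distance(a.geom, b.geom) < 1000;

-- Index-friendly: ST_DWithin adds a bounding-box comparison the
-- spatial index can answer before any exact distance is computed.
SELECT b.id FROM a, b WHERE ST_DWithin(a.geom, b.geom, 1000);
```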
From the docs, ST_DWithin automatically includes a bounding-box comparison that makes use of any spatial indexes available on the geometries.
Make sure, in all of the above, that you have a spatial (GiST) index on a.geom and b.geom, and consider clustering both tables on those indexes.
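Creating the indexes and clustering might look like this (index and table names assumed):

```sql
-- GiST indexes on both geometry columns, then physically reorder the
-- tables to match the indexes for better I/O locality.
CREATE INDEX a_geom_idx ON a USING GIST (geom);
CREATE INDEX b_geom_idx ON b USING GIST (geom);
CLUSTER a USING a_geom_idx;
CLUSTER b USING b_geom_idx;
```

CLUSTER is a one-off physical sort; rerun it (or re-evaluate) after large data loads, since it is not maintained automatically.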