Prior to SQL 2008, the most common solution was to use a UDF to calculate the great-circle distance between two points on a sphere; the Haversine formula was probably the method used most often.
Of course the Earth is not actually a perfect sphere, but this was considered "good enough" for most uses.
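For reference, here's a minimal sketch of what such a UDF might look like (the function name and the 6371008.8 m mean-radius constant are illustrative choices, not from any particular source):
CREATE FUNCTION dbo.ufn_HaversineMeters
(
    @lat1 FLOAT, @lon1 FLOAT,
    @lat2 FLOAT, @lon2 FLOAT
)
RETURNS FLOAT
AS
BEGIN
    DECLARE @r FLOAT, @dLat FLOAT, @dLon FLOAT, @a FLOAT
    SET @r = 6371008.8 --mean Earth radius in meters
    SET @dLat = RADIANS(@lat2 - @lat1)
    SET @dLon = RADIANS(@lon2 - @lon1)
    --Haversine: a = sin^2(dLat/2) + cos(lat1) * cos(lat2) * sin^2(dLon/2)
    SET @a = SIN(@dLat / 2) * SIN(@dLat / 2)
           + COS(RADIANS(@lat1)) * COS(RADIANS(@lat2))
           * SIN(@dLon / 2) * SIN(@dLon / 2)
    --central angle times radius = great-circle distance
    RETURN @r * 2 * ATN2(SQRT(@a), SQRT(1 - @a))
END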
In SQL 2008, as you anticipated, such calculations are simplified and made more accurate by the introduction of the Geography and Geometry data types. Here's a brief sample of how you can use them to simplify distance calculations.
DECLARE @locations TABLE(locname VARCHAR(100), coord geography)
DECLARE @loc1 geography
DECLARE @loc2 geography
INSERT INTO @locations
VALUES('HOME', geography::Point(41.478156, -81.810194, 4326)) --Note: Lat, Long, SRID
--The 4326 is the SRID (spatial reference id) used by SQL as
--a reference to the WGS 84 Standard. This is the same reference
--used by the GPS system
INSERT INTO @locations
VALUES('WORK', geography::Point(41.498227, -81.687771, 4326))
SELECT * FROM @locations
SELECT @loc1 = coord FROM @locations WHERE locname = 'HOME'
SELECT @loc2 = coord FROM @locations WHERE locname = 'WORK'
SELECT @loc1.STDistance(@loc2) * 3.2808399 --STDistance returns meters, so we multiply to convert to feet
SELECT @loc1.STIntersects(@loc2.STBuffer(300 / 3.2808399)) as isWithin300Ft --This formula returns True when the point @loc1 intersects with the 300ft buffer zone around @loc2
The SRID is the key to the improved accuracy. The WGS 84 specification to which it refers includes a standardized coordinate system and a reference ellipsoid. In other words, it accounts for the non-spherical nature of the Earth, giving better results than a pure spherical Great Circle calculation.
If GIS accuracy is important to your work, this is the simplest way to implement it in SQL 2008.
FILLFACTOR only applies when you build or rebuild an index, not during normal operation. Normal operations always try to fill the pages to 100%.
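For example (the index and table names here are hypothetical), these are the kinds of statements where FILLFACTOR actually takes effect:
CREATE INDEX IX_Orders_CustomerId ON dbo.Orders (CustomerId)
    WITH (FILLFACTOR = 80) --leave 20% free space on each leaf page

ALTER INDEX IX_Orders_CustomerId ON dbo.Orders
    REBUILD WITH (FILLFACTOR = 80)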
If you insert a variable-width row and later update it to be longer, the row will no longer fit on its page unless enough free space was left to store the after-image. When there isn't enough space, the result is a page split, which is the process that creates the necessary space.
What's a bit misleading about page splits is that there are "good splits" and "bad splits," even though the performance counter counts all of them.
A good split is when new rows are added to the end of the index, like what happens when you do your initial batch of INSERTs. The new row doesn't fit on the last page, so the storage engine must allocate a new page and logically hook it up to the last index page. The new page will probably exist physically after the old last index page.
A bad split is when a page must be inserted in the middle of an index: the new page is attached to the index structure logically on both sides, but will likely not be physically contiguous with those pages. Fragmentation is the discrepancy between the logical and physical order, and for the most part, only the bad type of page split causes fragmentation.
Because you're inflating the size of already-existing rows, this causes the bad type of page split, which is why you're seeing high fragmentation numbers.
It's unclear what your exact process is, and how many rows are ultimately going to end up in this table. If it's something like a one-time population, do the full population process, then turn around and rebuild the clustered index with 100% FILLFACTOR.
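Assuming a hypothetical clustered index named PK_MyTable, that final rebuild would look like:
ALTER INDEX PK_MyTable ON dbo.MyTable REBUILD WITH (FILLFACTOR = 100)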
If this process happens continuously, you could "pre-allocate" the space by doing the following: add a dummy variable-width column to the table, populate it to the max length on INSERT, and then on the first UPDATE, set the dummy value to NULL while updating the real value. This method will probably add overhead to the logging mechanism because of all the data values flying around.
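Here's a sketch of that pattern, with hypothetical table and column names:
--The Padding column exists only to reserve space on the page
CREATE TABLE dbo.MyTable
(
    Id INT PRIMARY KEY,
    RealValue VARCHAR(200) NULL,
    Padding VARCHAR(200) NULL
)
--On INSERT, fill the dummy column to the max expected width
INSERT INTO dbo.MyTable (Id, RealValue, Padding)
VALUES (1, NULL, REPLICATE('x', 200))
--On the first UPDATE, release the padding while the real value grows
UPDATE dbo.MyTable
SET RealValue = 'the actual, longer value', Padding = NULL
WHERE Id = 1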
In general, though, you only need to concern yourself with fragmentation when all of these factors are true:
- The index is large enough that randomly reading the pages (extents) is too slow (choose your own definition of what "too slow" means)
- The index will be scanned
- The index pages (extents) will be physically read from disk (i.e., they are not already in the buffer pool)
- The storage subsystem is poor at handling random reads (i.e., spindle-based, and the bytes won't come out of cache)
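One way to check where an index stands on those factors is the physical-stats DMV (dbo.MyTable is a placeholder):
SELECT index_id, index_level, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'LIMITED')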
I would recommend, however, putting in place a solid index maintenance solution to keep things more or less in check. You can search this site for recommendations with regard to that.
Best Answer
There is no one-size-fits-all answer for fill factor. It largely depends on how your data modifications take place, database-wide or in particular indexes. The default, out of the box, is 0 (equivalent to 100), which leaves virtually no free space in a page. What this means is that an insert into the middle of a full page will cause a page split.
Monitor your Page Splits/sec counter, and if you are seeing a high rate, then you may want to reconsider your fill factor. But don't just blindly set it low, as you will be using more disk space with a lower fill factor. And naturally, it will cause more I/O on reads, as there are more pages housing the data.
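The same counter is also exposed through a DMV; keep in mind the value is cumulative since instance startup, so take two samples and difference them to get a rate:
SELECT [object_name], counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page Splits/sec'
AND [object_name] LIKE '%Access Methods%'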
I recommend you look at my answer here regarding fill factor. It's a different type of question, but it should give you a better idea about fill factor and how it affects you.