A partitioned table is really more like a collection of individual tables stitched together. So in your example of clustering by IncidentKey and partitioning by IncidentDate, say that the partitioning function splits the table into two partitions so that 1/1/2010 falls in partition 1 and 7/1/2010 falls in partition 2. The data will be laid out on disk as:
Partition 1:
IncidentKey Date
ABC123 1/1/2010
ABC123 1/1/2011
XYZ999 1/1/2010
Partition 2:
IncidentKey Date
ABC123 7/1/2010
XYZ999 7/1/2010
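As a sketch, such a layout could be declared in T-SQL roughly like this (the function, scheme, and index names are made up for illustration, and the exact boundary value is an assumption since the example does not specify one):

```sql
-- One possible boundary: dates before 7/1/2010 go to partition 1,
-- dates on or after it go to partition 2 (RANGE RIGHT).
CREATE PARTITION FUNCTION pfIncidentDate (date)
    AS RANGE RIGHT FOR VALUES ('2010-07-01');
GO
CREATE PARTITION SCHEME psIncidentDate
    AS PARTITION pfIncidentDate ALL TO ([PRIMARY]);
GO
CREATE TABLE Incidents (
    IncidentKey  char(6) NOT NULL,
    IncidentDate date    NOT NULL
);
-- The clustered index is placed on the partition scheme, which is
-- what makes the table itself partitioned by IncidentDate.
CREATE CLUSTERED INDEX cdxIncidents
    ON Incidents (IncidentKey, IncidentDate)
    ON psIncidentDate (IncidentDate);
```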
At a low level there really are two distinct rowsets. It is the query processor that gives the illusion of a single table, by creating plans that seek, scan, and update all the rowsets together, as one.
Any row in any non-clustered index will contain the clustered index key to which it corresponds, say ABC123, 7/1/2010. Since the clustered index key always contains the partitioning key column, the engine always knows in which partition (rowset) of the clustered index to search for this value (in this case, in partition 2).
Now whenever you're dealing with partitioning you must consider whether your NC indexes will be aligned (the NC index is partitioned exactly the same way as the clustered index) or non-aligned (the NC index is non-partitioned, or partitioned differently from the clustered index). Non-aligned indexes are more flexible, but they have some drawbacks: most notably, they prevent the fast partition switch operations mentioned later.
Using aligned indexes solves these issues, but brings its own set of problems, because this physical storage design option ripples into the data model:
- aligned indexes mean unique constraints can no longer be created/enforced unless they include the partitioning column
- all foreign keys referencing the partitioned table must include the partitioning key in the relationship (since the partitioning key is, due to alignment, in every index), and this in turn requires that all tables referencing the partitioned table contain the partitioning key column value. Think Orders -> OrderDetails: if Orders has OrderID as its key but is partitioned by OrderDate, then OrderDetails must contain not only OrderID but also OrderDate, in order to properly declare the foreign key constraint.
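The Orders -> OrderDetails ripple effect can be sketched like this (the partition scheme name psOrderDate and the constraint names are assumptions for illustration):

```sql
-- Orders is partitioned by OrderDate, so OrderDate must be part of
-- any enforceable unique key, including the primary key.
CREATE TABLE Orders (
    OrderID   int      NOT NULL,
    OrderDate datetime NOT NULL,
    CONSTRAINT pkOrders PRIMARY KEY (OrderID, OrderDate)
        ON psOrderDate (OrderDate)  -- psOrderDate: assumed partition scheme
);
-- OrderDetails must carry a copy of OrderDate too, solely so the
-- foreign key can reference a unique key on Orders.
CREATE TABLE OrderDetails (
    OrderID    int      NOT NULL,
    OrderDate  datetime NOT NULL,  -- denormalized copy of the parent's date
    LineNumber int      NOT NULL,
    CONSTRAINT fkOrderDetailsOrders FOREIGN KEY (OrderID, OrderDate)
        REFERENCES Orders (OrderID, OrderDate)
);
```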
I have found that these effects are seldom called out at the beginning of a project that deploys partitioning, but they exist and have serious consequences.
If you think aligned indexes are a rare or extreme case, then consider this: in many cases the cornerstone of ETL and partitioning solutions is the fast switch-in of staging tables. Switch operations require aligned indexes.
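The switch-in pattern looks roughly like this (the staging table name and target partition number are illustrative):

```sql
-- The staging table must have the same structure, indexes and
-- constraints as the target partition. The switch itself is a
-- metadata-only operation, which is what makes it fast.
ALTER TABLE IncidentsStaging
    SWITCH TO Incidents PARTITION 2;
```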
Oh, one more thing: everything I said about foreign keys and the ripple effect of adding the partitioning column value to other tables applies equally to joins.
Add a student/teacher attribute to Person. Since this attribute is dependent on the key in Person (whatever that is), no Person can be both a Teacher and a Student. Now it is a matter of guaranteeing that a person whose type is 'Student' is not added to Teacher, and so on.
For DBMSs that support subqueries in check constraints you can do something like:
ALTER TABLE Student ADD CONSTRAINT ...
CHECK ( (SELECT p.type
         FROM Person p
         WHERE p.<key> = Student.<key>) = 'Student' );
If your DBMS does not support this type of construct, you can add a superkey to Person consisting of the primary key plus the type attribute. Add the type attribute to Teacher and Student, add a check constraint that guarantees the type in those "subtables", and a foreign key that includes the type attribute:
ALTER TABLE Person ADD COLUMN type_attribute varchar(..) NOT NULL;
ALTER TABLE Person ADD CONSTRAINT ... UNIQUE (<key>, type_attribute);
ALTER TABLE Student ADD COLUMN type_attribute varchar(..) NOT NULL;
ALTER TABLE Student ADD CONSTRAINT ... CHECK (type_attribute = 'Student');
ALTER TABLE Student ADD CONSTRAINT ...
    FOREIGN KEY (<key>, type_attribute)
    REFERENCES Person (<key>, type_attribute);
Now it is not possible to add a student as a teacher, or the other way around. That there really is a Student/Teacher row for a Person with that type attribute has to be guaranteed by the transaction that adds the information.
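With those constraints in place, a mismatched insert is rejected; for example (using a hypothetical key value of 42, and keeping the <key> placeholder from above):

```sql
-- Person 42 is registered as a Teacher:
INSERT INTO Person (<key>, type_attribute) VALUES (42, 'Teacher');

-- Rejected by the CHECK on Student (type_attribute must be 'Student'):
INSERT INTO Student (<key>, type_attribute) VALUES (42, 'Teacher');

-- Rejected by the FOREIGN KEY: (42, 'Student') does not exist in Person:
INSERT INTO Student (<key>, type_attribute) VALUES (42, 'Student');
```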
Best Answer
Ask yourself another question: if the entire database is in memory and I never have to touch the disk, do I want to store my data in an ordered B-tree, or in an unordered heap?
The answer to this question will depend on your access pattern. In most cases your access requires single-row look-ups (i.e., seeks) and range scans. These access patterns require a B-tree; otherwise they are inefficient. Some other access patterns, common in DW and OLAP, always aggregate over the entire table end-to-end and do not benefit from range scans. As you drill further, other requirements come to light; for instance, the speed of insert and allocation into a heap vs. a B-tree may play a role for huge ETL transfer jobs. But most of the time the answer really boils down to one question: do you seek or range-scan? The overwhelming majority of the time the answer is yes, and therefore the overwhelming majority of the time the design requires a clustered index.
In other words: just because it is cheap to read from disk in random order does not mean you can trash your TLBs and L2 cache lines in a 64 GB RAM scan bonanza...
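As a sketch, this is the kind of query that drives the decision (the table, column, and index names are illustrative):

```sql
-- A range scan like this is efficient only if the rows are ordered
-- on IncidentDate, i.e. stored in a B-tree:
CREATE CLUSTERED INDEX cdxIncidentDate ON Incidents (IncidentDate);

SELECT IncidentKey, IncidentDate
  FROM Incidents
 WHERE IncidentDate >= '2010-01-01'
   AND IncidentDate <  '2010-07-01';  -- seek to the start, scan the range
-- On a heap, the same query has to scan every page of the table.
```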