Sql-server – Make queries and BCP faster on a very large table

bcpindex-tuningperformanceperformance-tuningquery-performancesql server

This question is somewhat of a continuation to this:
Huge data and performance in SQL Server, but different enough to merit its own question. I'm going to copy and paste some of the text from that question here for legibility.

I've written an application with a SQL Server backend that collects and stores an extremely large amount of records. I've calculated that, at the peak, the average amount of records is somewhere in the avenue of 3-4 billion per day (20 hours of operation).

My application receives data, chunks it into batches of ~100k records and creates BCP-ready staging files, moves them to the server and SQL Server uses BCP to move the data into a production table. With that solution I've successfully moved 5.2 billion rows into my production table with little to no backlog of BCP files. I have BCP running continually as a SQL Agent job.

My production table has the following schema:

CREATE TABLE [dbo].[SignalValues](
    [Timestamp] [datetime] NOT NULL,
    [SignalId] [uniqueidentifier] NOT NULL,
    [SignalValue] [nvarchar](4000) NULL,
    [SignalValueId] [uniqueidentifier] NOT NULL
)

Because of the volume of records, the table is partitioned on [Timestamp] every half hour from 2016-01-20 21:00:00.000 to 2016-06-15 08:00:00.000. (~11k total partitions).

My problem is twofold:

  1. Querying the table in any way takes eons.
  2. While queries are running against the table, the BCP processing basically stops. It's like all processing power goes to the query and I get a files backlog.

How do I speed up query performance? Users will mostly query based on [Timestamp] and SignalId. I created a non-clustered index on [Timestamp] but it didn't seem to speed things up that much. Seems to be that the more indexing I apply, the slower the BCP inserts which creates a file backlog. Any index suggestions?

My assumption of partitions is that internally they act as separate tables so if I were to query [Timestamp] for 2016-01-25 22:01:00.000, it should only lock the one partition it needs for the query and leave the other partitions unlocked. Is my assumption incorrect? It would seem so with the behavior seen in 2) above. How do I force SQL Server not to lock during a SELECT query (WITH (READPAST) and WITH (NOLOCK) didn't help)?

I should note that rows in this table will never be updated or deleted. Also, it's possible that data could be inserted from a time in the past, depending on what the applications providing my application with data set as the timestamp for their rows. This unfortunately negates any possibility of partition switching unless I were to do some crazy magic.

The average number of rows per partition is ~70 million.

Based on the feedback from my previous question (above), I was able to beef up my server considerably. The following is my server's system information:

Info

Also, I've configured SQL Server as follows based on feedback:

  • All database-related files are connected to a SAN which is connected to the system via fibre channel
  • My database's data file is on its own drive alone — a RAID 10 array of 4 spinning disks
  • tempdb consists of 6 files (half the number of logical processors), each sized at 512mb, all on one drive — a RAID 0 array of 2 spinning disks
  • All transaction logs live on their own drive — a RAID 1 array of 2 spinning disks.

The table does not have a clustered index. I'd had a clustered unique index (the PK) before but the uniqueness constraint caused inserts to take 10 minutes for 100k records in a table with 800m rows. My understanding was that a clustered index creates a uniqueness constraint internally per the heading, "Can you create a clustered index on a column with duplicate values?" from this page:

14 SQL Server Indexing Questions You Were Too Shy To Ask

I could be completely wrong though.

Let me know if I'm missing anything.

Best Answer

This is a long shot but seen as you are running SQL 2014 Enterprise I would have a test with using a Clustered Columnstore Index on this table and see if this improves performance for you.

Especially considering that you are using the table for selects and bulk inserts only - no updates and no deletes.

You will have the added advantage of taking a bit of load off the spinning disks and moving it more towards the CPU / Memory which you seem to have a decent amount of.