Sql-server – Non clustered index for non-evenly distributed data in large SQL table

indexnonclustered-indexsql serverssis

I have the following table containing 2.000.000.000 rows:

CREATE TABLE [VeryLargeTable]
(
   [dataCol1] [NVARCHAR(MAX)]
   , [dataCol2] [NVARCHAR(MAX)]
   , [Manufacturer] [NVARCHAR(10)]
   , [MsaiSalesType] [NVARCHAR(2)]
   , [Return Volume] [NVARCHAR(2)]
)

The data is distributed this way:

I have to determine the first column(Sales Types), which is determined by the following 3(Manufacturer, MsaiSalesType, Return Volume).

The table will be created on the spot, then the data will be inserted via SSIS from a flat file. After that, data will not be inserted, updated, deleted.

I have taken into consideration filtered indexes and partitioned tables.

Would they be efficient if the data is so un-evenly distributed ?
What are my other options ?

If I get rid of the data representing 95 % of the table, I should have no further trouble filtering the rest of the data.

Best Answer

If I get rid of the data representing 95 % of the table, I should have no further trouble filtering the rest of the data.

In that case, separating that 95% of the data is way to go. If you want to stay with SSIS, the obvious solution is to redirect those records to a another flat file (can also be a separate db) during import. Specifying this flat file in SSIS is granular too. That is, you can send malformed data rows to a separate rejects flat file, for example. That way you can have clean SalesType1 file.

To specify which records to put in the flat file, use Conditional Split in SSIS to filter the rows (during import) using the same logic as your Where condition consisting of Manufacturer, MsaiSalesType, and Return Volume.

Best Answer

Related Solutions

Sql-server – the best practice for storing large data when nvarchar(4000) is sufficient 99% of the time

Sql-server – Best index strategies for read-only table

Related Question