SQL Server – Is it a good idea to include a value column (e.g. from measurements) in an index?

Tags: index, performance, query-performance, sql-server

Is it a good idea to include a value column in an index on a table with many millions of rows?

For example, a measurement table that looks something like this:

CREATE TABLE measurements
(
    id bigint IDENTITY,
    parameter_id int NOT NULL,
    measuretime datetime NOT NULL,
    value float NOT NULL,
    PRIMARY KEY CLUSTERED (id ASC)
);

ALTER TABLE measurements ADD FOREIGN KEY (parameter_id)
REFERENCES parameters(id)

When I run a simple AVG query, the execution plan suggests creating an index that also includes the value column.

SELECT AVG(value)
FROM measurements
WHERE measuretime BETWEEN '2015-01-01' AND '2015-02-01'
GROUP BY parameter_id

(execution plan screenshot)

CREATE NONCLUSTERED INDEX [idx_measurements_mt_pi_v]
ON [dbo].[measurements] ([measuretime])
INCLUDE ([parameter_id],[value])

I get that it's a good idea to have an index on measuretime and parameter_id, but isn't storing every value in an index a bit over the top?

Won't this result in a massive index, since the values are probably almost always unique because of their precision? Or is my concept of what INCLUDE does just wrong?
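
I suppose that once the index exists I could simply measure how big it actually is, with something along these lines (standard catalog views and DMVs; only the table name is specific to my schema):

-- Approximate size of each index on dbo.measurements, in MB
SELECT i.name AS index_name,
       SUM(ps.used_page_count) * 8 / 1024 AS used_mb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'dbo.measurements')
GROUP BY i.name;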

Besides, is this really necessary? The query runs in only about 2-3 seconds over 38,000,000 rows (but I guess this could have a much bigger impact on a long-running production database).

I'm also thankful for any other performance tips regarding this kind of data query!

EDIT: As suggested, I added the index and ran the query again with SET STATISTICS IO ON and SET STATISTICS TIME ON.
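
For reference, the test was roughly this (same query as above):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT AVG(value)
FROM measurements
WHERE measuretime BETWEEN '2015-01-01' AND '2015-02-01'
GROUP BY parameter_id;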

Output:

SQL Server parse and compile time: 
   CPU time = 15 ms, elapsed time = 17 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 9 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

(12 row(s) affected)
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'measurements'. Scan count 5, logical reads 24212, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 1391 ms,  elapsed time = 481 ms.

And here is the new execution plan:

(execution plan screenshot)

Best Answer

You have a time series (measurements) organized by id (the clustered index). I have yet to see a single case where using id as the clustering key for a time series makes sense. All the queries will ask for date ranges. Organize by time:

CREATE TABLE measurements
(
    id bigint IDENTITY,
    parameter_id int NOT NULL,
    measuretime datetime NOT NULL,
    value float NOT NULL,
    PRIMARY KEY NONCLUSTERED (id ASC)
);

CREATE CLUSTERED INDEX cdx_measurements ON measurements(measuretime);

Yes, a lookup by id will take two reads instead of one. Who cares; id is only looked up for singleton rows. It's a trade-off.
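
If the table already exists with the original clustered primary key, the switch could look roughly like this (the constraint name PK_measurements is an assumption; an existing table likely has an auto-generated name you would look up in sys.key_constraints first):

-- Drop the clustered PK, re-cluster on measuretime, then re-add the PK as nonclustered
ALTER TABLE measurements DROP CONSTRAINT PK_measurements;  -- assumed constraint name

CREATE CLUSTERED INDEX cdx_measurements ON measurements (measuretime);

ALTER TABLE measurements ADD CONSTRAINT PK_measurements
    PRIMARY KEY NONCLUSTERED (id);

On 38 million rows this rewrites the table, so it is the kind of change to schedule for a maintenance window.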