You are asking the query optimizer to produce a plan that can answer the query:
SELECT *
FROM tbl_pics
WHERE deleted = 0 AND map_id = 1 AND hidden = 0;
The rest is fluff (including the OR pic_id = @pic_id). This will be a table scan, guaranteed, because of the low selectivity of the predicates involved (I'm sure deleted and hidden are 0/1 flags, and I doubt map_id has any significant impact). The only predicate that could save the query is pic_id = @pic_id, but by placing it in an OR condition you killed its chance. No secondary index can help it, realistically. The addition of ROW_NUMBER will most likely add a sort, but the real damage is the scan.
This is a lost cause. Come up with realistic requirements.
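To see concretely why the OR kills the seek: the usual workaround (a sketch only, assuming an index exists on pic_id; table and column names taken from the question) is to split the two branches with UNION, so the pic_id branch at least can seek, even though the flag-column branch still scans:

```sql
-- Sketch: split the OR into two branches so the pic_id predicate
-- can use an index seek. Assumes an index on pic_id exists.
SELECT *
FROM tbl_pics
WHERE deleted = 0 AND map_id = 1 AND hidden = 0
UNION
SELECT *
FROM tbl_pics
WHERE pic_id = @pic_id;
```

UNION (not UNION ALL) removes the rows that match both branches; the first branch remains a scan for the reasons above, which is why this only mitigates, not fixes, the plan.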
Are multicolumn histograms possible?
Not true multi-dimensional histograms, no.
Is this artificial and inelegant solution the only available option to achieve accurate estimates when filtering by two or more non-independent columns?
SQL Server does support "multi-column" statistics, but they only capture average density (correlation) information in addition to a histogram on the first named column. They are only useful for equality comparisons.
The average density information does not capture any detail, so you will get the same selectivity for any pair of values on a two-column statistics object. In some cases, multi-column statistics can be good enough, and better than nothing. Multi-column statistics are automatically built on multi-column indexes.
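As a sketch (using the same hypothetical dbo.StatsTest names as the examples below), this is what a multi-column statistics object looks like, and where the density information can be inspected:

```sql
-- Multi-column statistics on (col1, col2): the histogram covers col1 only;
-- the (col1, col2) combination gets a single "All density" value.
CREATE STATISTICS stats_StatsTest_col1_col2
ON dbo.StatsTest (col1, col2);

-- Inspect the average density values for col1 and (col1, col2):
DBCC SHOW_STATISTICS ('dbo.StatsTest', stats_StatsTest_col1_col2)
WITH DENSITY_VECTOR;
```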
Depending on SQL Server version, you may also be able to make use of filtered indexes and filtered statistics:
-- Filtered statistics example
CREATE STATISTICS stats_StatsTest_col2_col1_eq_1
ON dbo.StatsTest (col2)
WHERE col1 = 1;
CREATE STATISTICS stats_StatsTest_col2_col1_eq_2
ON dbo.StatsTest (col2)
WHERE col1 = 2;
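For one of the filtered statistics objects above to be used, the query's predicate has to cover the statistics filter, for example:

```sql
-- The WHERE clause implies the filter (col1 = 1), so the optimizer can use
-- the col2 histogram that was built only over rows with col1 = 1.
SELECT col2
FROM dbo.StatsTest
WHERE col1 = 1 AND col2 = 123;
```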
Or you could build an indexed view (which can support indexes and statistics of its own). Indexed views are the mechanism behind the DATE_CORRELATION_OPTIMIZATION database setting, a little-used feature for inter-table correlations, but one that applies to the spirit of the question.
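A minimal sketch of the indexed-view approach (hypothetical names; assumes col2 is unique within the filtered rows, since an indexed view requires a unique clustered index):

```sql
-- Materialize the correlated subset so it gets its own index and statistics.
CREATE VIEW dbo.StatsTestView
WITH SCHEMABINDING
AS
SELECT col1, col2
FROM dbo.StatsTest
WHERE col1 = 1;
GO
-- Required unique clustered index; assumes col2 is unique for col1 = 1.
CREATE UNIQUE CLUSTERED INDEX cuq_StatsTestView
ON dbo.StatsTestView (col2);
```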
Is the computed column and the filter by the computed column strictly necessary for obtaining actual precision?
It is not the only method. In addition to the things already mentioned, you could also specify the exact textual definition of the computed column and the optimizer will generally match it to statistics on the computed column.
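A sketch of the computed-column approach (hypothetical names again): the expression encodes the correlated predicate, and because the optimizer matches the textual definition, the final query can benefit from the statistics without referencing the column by name:

```sql
-- Computed column capturing the correlated pair of predicates.
ALTER TABLE dbo.StatsTest
ADD both_cols AS CASE WHEN col1 = 1 AND col2 = 1 THEN 1 ELSE 0 END;

-- Statistics on the computed column.
CREATE STATISTICS stats_StatsTest_both_cols
ON dbo.StatsTest (both_cols);

-- The optimizer can match this expression to the computed column's
-- statistics, even though both_cols is never named:
SELECT *
FROM dbo.StatsTest
WHERE CASE WHEN col1 = 1 AND col2 = 1 THEN 1 ELSE 0 END = 1;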
There are also trace flags that change the assumptions made about multi-column correlations. Also, the default correlation assumption in SQL Server 2014 (with the new cardinality estimator enabled) is changed from Independence to Exponential Backoff (more details here and here). Ultimately, this is just a different assumption though. It will be better in many cases, and worse in others.
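To illustrate the difference between the two assumptions with made-up selectivities: Independence multiplies the individual selectivities outright, while Exponential Backoff orders the (up to four) most selective predicates and progressively dampens each one:

```
Independence:        0.1 * 0.1 * 0.1 * 0.1                     = 0.0001
Exponential backoff: 0.1 * 0.1^(1/2) * 0.1^(1/4) * 0.1^(1/8)
                     = 0.1^1.875                               ~ 0.0133
```

For correlated columns the backed-off estimate is usually closer to reality; for genuinely independent columns it overestimates, which is why it is just a different assumption, not a fix.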
Exact precision in cardinality estimation is not always needed to obtain a good execution plan. There is always a trade-off between generating a plan that can be reused for different parameter values, and a plan that is optimal for a specific execution, but not reused.
Best Answer
Nope! The query optimizer assumes that columns within the same table are independent. Joe Sack talks about it here, and @paulwhite talks about it here.