The solution I'd think of for this problem is to run a weekly job that runs sp_spaceused for every table in the database and saves the results in a table. If a table's size changes by more than, let's say, 10% from one week to the next, I would run DBCC CLEANTABLE on it.
My code to collect the table sizes looks like this:
if OBJECT_ID('tempdb.dbo.#temp') is not null
    drop table #temp;
if OBJECT_ID('dbo.BigTables') is not null
    drop table dbo.BigTables;
go

CREATE TABLE [dbo].[BigTables] (
    [table_name] [sysname] NOT NULL,
    [row_count]  [int] NULL,
    [col_count]  [int] NULL,
    [data_size]  [varchar](50) NULL,
    [Date]       [datetime] NOT NULL,
    [DBName]     [nvarchar](128) NULL
);
GO

CREATE TABLE #temp (
    table_name    sysname,
    row_count     int,
    reserved_size varchar(50),
    data_size     varchar(50),
    index_size    varchar(50),
    unused_size   varchar(50)
);
go

-- sp_spaceused returns one row per table; sp_MSforeachtable runs it for every table
INSERT #temp
EXEC sp_MSforeachtable 'sp_spaceused ''?''';

insert into dbo.BigTables
SELECT a.table_name,
       a.row_count,
       count(*) as col_count,
       a.data_size,
       getdate() as [Date],
       'MY DB' as DBName
FROM #temp a
INNER JOIN information_schema.columns b
    ON a.table_name collate database_default
     = b.table_name collate database_default
GROUP BY a.table_name, a.row_count, a.data_size
ORDER BY CAST(REPLACE(a.data_size, ' KB', '') as integer) desc;

DROP TABLE #temp;

Select * from dbo.BigTables;
Now you only need to build the logic that compares each table's size against the previous week's snapshot, and schedule the job.
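That comparison step could be sketched like this (assuming dbo.BigTables is allowed to accumulate history, i.e. you create it once rather than dropping it on every run; the 10% threshold and the DBCC CLEANTABLE call in the comment are illustrative):

```sql
-- Flag tables whose latest recorded size differs from the previous
-- snapshot by more than 10%
;WITH sized AS (
    SELECT table_name,
           CAST(REPLACE(data_size, ' KB', '') AS bigint) AS size_kb,
           [Date],
           ROW_NUMBER() OVER (PARTITION BY table_name
                              ORDER BY [Date] DESC) AS rn
    FROM dbo.BigTables
    WHERE DBName = 'MY DB'
)
SELECT cur.table_name,
       prev.size_kb AS prev_kb,
       cur.size_kb  AS curr_kb
FROM sized cur
JOIN sized prev
    ON prev.table_name = cur.table_name
   AND prev.rn = 2                 -- previous week's snapshot
WHERE cur.rn = 1                   -- most recent snapshot
  AND ABS(cur.size_kb - prev.size_kb) > prev.size_kb / 10;
-- For each table returned you could then run, for example:
-- DBCC CLEANTABLE('MY DB', 'dbo.SomeTable');
```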
I was able to reproduce your issue in SQL Server 2014 with the legacy CE and TF 4199 enabled. I used a rowstore fact table because I have little experience with columnstore.
For your query with the filter, the query optimizer rewrites the query to something like this:
select *
from fact_sales f
join dim_company d on f.company_SK = d.company_SK
where f.company_SK = 1 and d.company_SK = 1
For that query the optimizer can directly use the histogram step containing 1 for both tables, and you get 467,583,000 × 1 as the estimated number of rows.
The query without a filter will use linear interpolation:
For joins with a single equality or inequality predicate, the legacy
CE joins the histograms on the join columns by aligning the two
histograms step-by-step using linear interpolation.
Note that the statistics for your dimension table do not have a histogram step with a RANGE_HI_KEY value of 1, which is where almost all of the data is in your fact table. Something is going wrong with the linear interpolation step for that value. If I create your dimension table without a 0 value then the histogram gets a step with a RANGE_HI_KEY value of 1. That fixes the estimate. In short, you got unlucky with your histogram. I don't think there is a supported way to force the histogram on the dimension table to contain a step for 1, unless perhaps you are willing to add additional filters to your query.
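You can check the histogram on the dimension table yourself with DBCC SHOW_STATISTICS (the statistics name here is an assumption; use the actual index or auto-created statistics object on company_SK):

```sql
-- Inspect the histogram on the join column of the dimension table
DBCC SHOW_STATISTICS ('dbo.DIM_COMPANY', 'IX_DIM_COMPANY_company_SK')
    WITH HISTOGRAM;
-- Look for a row with RANGE_HI_KEY = 1; if it is absent, the join
-- estimate for company_SK = 1 comes from interpolation instead of
-- a direct histogram step hit.
```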
I don't know the exact rules of your data, but I can give you two workarounds. The first workaround is to enable trace flag 2301 for the query. That fixes the estimate for this part of the query but may have other negative effects for the rest of your query. This trace flag is documented by MS, but talk to your DBA before using it.
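If you only want the trace flag in effect for this one statement, the QUERYTRACEON hint scopes it to the query (note that it requires sysadmin rights, or a plan guide as a workaround; the query shape below mirrors the example above):

```sql
select *
from FACT_SALES f
join DIM_COMPANY d on f.company_SK = d.company_SK
option (querytraceon 2301);
```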
The second workaround is to encourage the query optimizer to use a different method of cardinality estimation for the join. The following query gave a good estimate for me and only does a single scan on your fact table:
select *
from FACT_SALES f
join (SELECT COMPANY_SK FROM DIM_COMPANY UNION SELECT 1) d
on f.company_SK = d.company_SK;
So you really have two questions:
There have been some reports of significant performance issues in the field when Query Store was enabled (although they are quite rare). So the first consideration is straightforward: don't enable Query Store on databases where those known issues are likely to cause significant performance problems.
Of course, the only way to know for sure is to research all the known issues, confirm that none of them apply to your system, and then enable it; even then, you won't really know whether it causes a problem until it's running. For example, see FIX: Slow performance of SQL Server 2016 when Query Store is enabled. Note that enabling Query Store, or changing its settings, can also flush the plan cache.
Secondly, if you aren't going to use the information and functionality it provides to improve the performance of the queries running on your server, then you shouldn't enable it. Some environments, particularly large, heavily regulated organizations, may have such strict change control that you can't realistically do anything with Query Store.
I would certainly not enable it on hundreds of databases all at once, because of the potential to cause a lot of problems and to generate far more data than you can analyze. If your environment is one where you can actually use Query Store to improve performance, I would recommend enabling it on a few databases at a time.
If you have hundreds of databases, you probably don't have the staff to analyze all of the information, so it might make sense to enable it periodically on a few servers, see if there are any improvements to be made, then disable it and use it on some other servers for a while. Or, when an application upgrade takes place, enable it for a while to see if the new code is triggering anything that Query Store can help with.
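The rotating approach above is just one ALTER DATABASE per database (the database name is a placeholder, and the sizing options are illustrative defaults you may want to tune):

```sql
-- Enable Query Store on one database with a bounded footprint
ALTER DATABASE [MyDb]
SET QUERY_STORE = ON
    (OPERATION_MODE = READ_WRITE,
     MAX_STORAGE_SIZE_MB = 1024,
     CLEANUP_POLICY = (STALE_QUERY_THRESHOLD_DAYS = 30));

-- Turn it off again when you move on to the next server
-- ALTER DATABASE [MyDb] SET QUERY_STORE = OFF;
```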