I'm trying to fetch addresses for certain users only ...
IN does not scale well with big lists. Compare performance to unnest() / JOIN:
SELECT a.*
FROM unnest(ARRAY[9681,23824,23760,20098,962,14730,12294,9552,534,
553,5837,6768,6583,956,24179]) AS t(user_id)
JOIN addresses a USING (user_id)
WHERE a.type = 'VerifiedAddress'
AND a.state <> 'manual_verification'
I hope state and type are at least enum types. Otherwise you should normalize your database design: have lookup tables for state and type and store only small IDs referencing entries there. That makes the addresses table smaller and everything faster.
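A minimal sketch of what such a normalization could look like; all names here (address_type, address_state, the *_id columns) are assumptions for illustration, not taken from the original schema:

CREATE TABLE address_type (
    type_id smallint PRIMARY KEY  -- 2-byte surrogate key
  , type    text NOT NULL UNIQUE
);

CREATE TABLE address_state (
    state_id smallint PRIMARY KEY
  , state    text NOT NULL UNIQUE
);

-- addresses then carries the small IDs instead of repeating
-- the text values in every row:
ALTER TABLE addresses
    ADD COLUMN type_id  smallint REFERENCES address_type
  , ADD COLUMN state_id smallint REFERENCES address_state;

Enum types get you the same space savings without the extra join, at the price of labels that are harder to change later.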
Assuming all columns NOT NULL for lack of information.
If you need it faster still, and if there are more than a few rows with type <> 'VerifiedAddress' or state = 'manual_verification', a partial index would help some more. The gain grows with the percentage of rows you can exclude from the index.
CREATE INDEX addresses_selection_idx ON addresses (user_id)
WHERE type = 'VerifiedAddress'
AND state <> 'manual_verification';
Be sure to use matching WHERE conditions in your query to allow Postgres to use this index. And weigh the benefit of tailored indexes against their cost (maintenance, extra disk space, small penalty for write operations).
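To check that the index is actually picked up, inspect the plan (a sketch; the planner's choice depends on table statistics and row counts, so a tiny test table may still get a sequential scan):

EXPLAIN (ANALYZE, BUFFERS)
SELECT a.*
FROM   unnest(ARRAY[9681,23824,23760]) AS t(user_id)
JOIN   addresses a USING (user_id)
WHERE  a.type = 'VerifiedAddress'
AND    a.state <> 'manual_verification';

Look for an Index Scan (or Bitmap Index Scan) using addresses_selection_idx in the output.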
You might be able to do a UNION instead of the OR condition. That could prevent a table scan.
select
b.ProgramName
,b.Region
,case when b.AM IS null and b.ProgramName IS not null
then 'Unassigned'
else b.AM
end as AM
,rtrim(ltrim(b.Store)) Store
,trd.Store_ID
,b.appliesToPeriod
,isnull(trd.countLeadActual,0) as Actual
,isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date then b.budgetValue else 0 end),0) as Budget
,isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date and (trd.considerMe = -1 or b.StoreID < 0) then b.budgetValue else 0 end),0) as CleanBudget
...
into #SalvesVsBudgets
from #StoresBudgets b
left join #temp_report_data trd on trd.store_ID = b.StoreID and trd.newSourceID = b.ProgramID
where (b.StoreDivision is not null)
group by
b.ProgramName
,b.Region
,case when b.AM IS null and b.ProgramName IS not null
then 'Unassigned'
else b.AM
end
,rtrim(ltrim(b.Store))
,trd.Store_ID
,b.appliesToPeriod
,isnull(trd.countLeadActual,0)
union -- the two branches' WHERE clauses are mutually exclusive, so UNION ALL would also be correct and skips the duplicate-removal step
select
b.ProgramName
,b.Region
,case when b.AM IS null and b.ProgramName IS not null
then 'Unassigned'
else b.AM
end as AM
,rtrim(ltrim(b.Store)) Store
,trd.Store_ID
,b.appliesToPeriod
,isnull(trd.countLeadActual,0) as Actual
,isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date then b.budgetValue else 0 end),0) as Budget
,isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date and (trd.considerMe = -1 or b.StoreID < 0) then b.budgetValue else 0 end),0) as CleanBudget
...
-- no INTO here: with SELECT ... INTO, the INTO clause may only appear in the first SELECT of a UNION
from #StoresBudgets b
left join #temp_report_data trd on trd.store_ID = b.StoreID and trd.newSourceID = b.ProgramID
where (b.StoreDivision is null and b.ProgramName = 'NewProgram')
group by
b.ProgramName
,b.Region
,case when b.AM IS null and b.ProgramName IS not null
then 'Unassigned'
else b.AM
end
,rtrim(ltrim(b.Store))
,trd.Store_ID
,b.appliesToPeriod
,isnull(trd.countLeadActual,0);
Additionally, you might be getting forced into a single-threaded execution plan by the trim functions. If you are able to do the LTRIM and RTRIM in the application that consumes the data instead of in the query, you might get an execution plan that goes parallel.
Best Answer
We've got a ton of learning resources on that Aggressive Indexes page (I just updated it with more, coincidentally) and I can't even begin to do justice to it overall. Having said that, here goes.
Imagine that you've only got a clustered index on a table. Inserts will be pretty doggone fast. However, when you try to update a row in that table, and you don't specify the clustering key in your where clause, then you'll end up with a lot of table locks (aggression). In that case, the fix is to add indexes for the fields you frequently query on.
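A hedged sketch of that situation; the table and column names here are invented for illustration:

-- dbo.Users has only a clustered primary key on Id.
-- This update has no index to seek on, so it scans the whole
-- table and its locks can escalate to a table lock:
UPDATE dbo.Users
SET    Reputation = Reputation + 10
WHERE  DisplayName = N'Brent';

-- An index on the searched column lets the update seek straight
-- to the matching rows instead:
CREATE NONCLUSTERED INDEX IX_Users_DisplayName
    ON dbo.Users (DisplayName);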
On the other hand, imagine that you've got dozens of indexes. Whenever you want to insert or delete rows, you're going to need locks all over the place to get your work done. The high number of indexes will slow you down because nobody can get their insert or delete done quickly - and again, you'll see aggressive index warnings. In that case, the fix is to prune down the number of indexes to a more manageable number.
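If you need candidates for pruning, SQL Server's index usage DMV can point at indexes that cost a lot of writes but serve few reads (a sketch; usage stats reset when the server restarts, so check after a representative stretch of activity):

SELECT o.name AS table_name
     , i.name AS index_name
     , s.user_updates AS writes
     , s.user_seeks + s.user_scans + s.user_lookups AS reads
FROM   sys.dm_db_index_usage_stats s
JOIN   sys.indexes i ON i.object_id = s.object_id
                    AND i.index_id  = s.index_id
JOIN   sys.objects o ON o.object_id = s.object_id
WHERE  s.database_id = DB_ID()
AND    o.is_ms_shipped = 0
ORDER  BY s.user_updates DESC;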
Generally, for transactional tables (as opposed to overnight data warehouse loads), I tell people to aim for 5 or fewer indexes per table, with 5 or fewer fields per index.
That rule of 5 and 5 can be violated - the less write activity you have, the faster your hardware is, and the better you tune your queries, the more indexes you may be able to get away with. On the flip side, if you have crappy hardware and crappy queries, you might need to drop those numbers lower.
The 5 and 5 rule stems from the fact that I've gotta start people somewhere, and I have 5 fingers on one hand, and 5 fingers on the other - so the rule is easy to communicate.