PostgreSQL – How to optimize a query where an IN clause is replaced with multiple UNION ALLs

Tags: orm, performance, postgresql, postgresql-9.5, query-performance, view

I have a query like this:

select *
from slow_view
where id = 1
union all
....
union all
select *
from slow_view
where id = 1000

-- 50 ids, exec time: 10.615 sec

This would run in a very acceptable time if it were replaced with:

select *
from slow_view
where id in (1, ...., 1000)

-- 50 ids, exec time: 1.3 sec

or with a CTE that selects from the view once:

with slow_view_cached as (
select * from slow_view)
select *
from slow_view_cached
where id = 1
union all
....
union all
select *
from slow_view_cached
where id = 1000

-- 50 ids, exec time: 1.83 sec

Unfortunately, the query is generated by an ORM and I have no way to modify it. It's quite a poor ORM, and I don't think I can do much to improve its behavior.

Is there something I can do to my query or to my view to make it "cached" for all the subqueries in this case? I know I could use a materialized view but it doesn't fit my case very well.

AFAIK in Postgres it's not possible to define indexes on a view, so id is not indexed or unique.

Is there anything else I can try?

I'm using PostgreSQL 9.5 (but a solution for PG 9.6 or even PG 10 would be acceptable, since I plan to upgrade very soon).

The code is a bit too complex to post here. In essence, this query is generated in the following case: slow_view is an association of a Model m, and what I do is basically m.find().populate(slow_view, criteria). The documentation is here.

criteria is excluded from the query for the sake of simplicity; id is the foreign key from slow_view to Model.

I could call a stored procedure from this ORM, but it would mean altering the application a bit too much.

Best Answer

If slow_view doesn't have too many records, you can consider fetching all the records into the application and applying the filter to the result set on the application side. This approach helped me reduce the database load in one specific case.
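As a rough illustration of that idea, here is a minimal Python sketch. It assumes the rows have already been fetched with a single select from slow_view and are available as dictionaries; the sample rows, the "value" column, and the helper name group_rows_by_id are hypothetical and not part of the question or of any ORM API.

```python
from collections import defaultdict


def group_rows_by_id(rows, key="id"):
    """Index rows fetched once from the view by their foreign-key value."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)
    return grouped


# Hypothetical stand-in for the result of one `select * from slow_view`.
all_rows = [
    {"id": 1, "value": "a"},
    {"id": 1, "value": "b"},
    {"id": 2, "value": "c"},
    {"id": 1000, "value": "d"},
]

rows_by_id = group_rows_by_id(all_rows)

# Application-side equivalent of `where id in (1, 1000)`.
wanted_ids = [1, 1000]
filtered = [row for wanted in wanted_ids for row in rows_by_id.get(wanted, [])]
```

The view is read once and every later lookup is a dictionary access, so the cost of evaluating the view is not repeated per id. Whether this pays off depends on how many rows the view returns and how much memory the application can spare.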

Related Solutions

Sql-server – SQL Server Query: Inefficient where clause

If you don't need the other data from the NETWORK_STATUS table, how about:

select *
from dbo.NETWORK AS n
inner join dbo.vwNETWORK_KEYMSTN AS km 
  on n.Network_ID = km.Network_ID
inner join dbo.vwAPPROVAL_LATEST AS a 
  on n.Network_ID = a.Network_ID
inner join dbo.APPROVAL_VINTAGE  AS av 
  on a.Approval_ID = av.Approval_ID 
  and km.Milestone_Type_ID = av.Milestone_Type_ID
inner join dbo.NETWORK_MILESTONE AS m 
  on a.Approval_ID = m.Approval_ID 
  and km.Milestone_Type_ID = m.Milestone_Type_ID
inner join dbo.REF_MILESTONE AS rm 
  on km.Milestone_Type_ID = rm.Milestone_Type_ID
WHERE EXISTS
(
  SELECT 1 FROM dbo.NETWORK_STATUS 
    WHERE Network_ID = n.Network_ID
    and Status_Type_ID = 2
    --and Status_Type_ID = rm.Status_Type_ID
);

SQL Server Linked Server – Query Performance Tips

Your problem begins and ends with statistics and estimates. I have reproduced your situation on my servers and found some interesting hints, and a workaround solution.

First things first, let's take a look at your execution plan:
When a view is used we can see that a filter is applied after the Remote Query is executed, while without the view there was no filter applied at all. The truth is that the filter was applied inside the Remote Query, at the remote server, before retrieving the data over the network.
Well, obviously applying the filter at the remote server and thus retrieving less data is a better option, and obviously that only happens when not using a view.

So... what is so interesting...?

Surprisingly, when I changed the filter from cognome = 'test' to cognome = N'test' (Unicode representation of the string) the view used the same execution plan as the first query did.
I guess the reason is that, when the view is used, SQL Server somehow estimates that the (remote) query will return a small number of rows and that filtering locally will be cheaper. But when SQL Server had to implicitly convert NVARCHAR to VARCHAR, the statistics could no longer be used and the decision to filter locally was not taken.
I looked for the statistics locally, but the view had none, so my guess is that the view uses the remote statistics in a way that the ad-hoc query does not, and then makes the wrong decision.

OK, so what solves the problem?

I stated earlier that there is a workaround (at least until someone comes up with a better solution), and no, I don't mean using Unicode for your strings.
I wanted to give an answer first and still have to find out why, but when an inline function is used, SQL Server behaves exactly as it does with the query without the view. Replacing the view with the function therefore gives the same result in a simple query, with good performance (at least in my environment).

My code suggestion for you is:

CREATE FUNCTION fn_anagrafiche2()
RETURNS table
AS
RETURN 
(
    SELECT * 
    FROM dolph2.agendasdn.dbo.vistaanagraficagrp
    UNION
    SELECT * 
    FROM dolph2.acampanet.dbo.vistaanagraficagrp
    UNION
    SELECT * 
    FROM municipio2.dbnet.dbo.vistaanagraficagrp
)
GO

The query will then be:

SELECT * 
FROM fn_anagrafiche2()
WHERE cognome = 'prova'

This works on my servers, but of course test it first.
Note: I do not recommend using SELECT * at all, as it is prone to future errors. I used it only because it was in your question, and there was no need for me to change it when I can add this remark instead :)
