SQL Server – Viability of a large database without using indexes and some other things

index, performance, sql-server, stored-procedures

I'll explain my situation as briefly as possible. My group does work for another company, and I mostly create stored procedures for reports.

The twist is that, last month, they decided to make some major changes, which greatly increased the run time of things that used to work fine. I don't know the details, but one of the big changes is that what used to be tables are now views that get their data from an external source. They are defined like this:

CREATE view [dbo].[eng_contra_env] as 
select ce_serial, ce_empresa, ce_ganado, ce_contra, ce_fecha_env, ce_fecha_rec, ce_observa
  from OPENROWSET('MSDASQL', '***', 'SELECT * from eng_contra_env');
GO

Also, since that change, I'm not allowed to do some basic things like indexing tables; my boss tells me not to because of the changes the client made.

So I'm being asked to improve the performance of a procedure (which, again, finished before the changes but now doesn't). One part of the procedure requires me to do the following:

https://www.brentozar.com/pastetheplan/?id=SkDJAvZQI

How am I supposed to optimize that? Also, given everything I've said, how viable do you think it is to work under these conditions?

Update: Apparently they had a meeting at the client and they decided to revert these changes. Thank god!

Best Answer

The use of views backed by linked server queries puts you in a pretty tough situation. You get bad cardinality estimates coming out of the "remote query" operators (10k rows estimated, vs. the actuals of 521k and 30k respectively). This results in SQL Server choosing to sort and use a merge join, when it would probably have chosen a hash join had the estimates been better.

For this specific query, you could try a hash join hint to avoid the sort + merge join plan:

From  eng_facven A
    inner hash Join
   eng_facved B On A.fv_serial = B.fd_serial 
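One caveat worth knowing: specifying a join hint in the FROM clause like this also forces the join order for the query, so double-check that the resulting plan still makes sense for your data.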

As a general solution, you might consider materializing the remote tables into temp tables first. This will give you better estimates and thus likely better plans.

Of course, the feasibility of this option depends a lot on the size of the data / number of rows. In the query referenced above, you'd only be loading around 16 MB total into temp tables, which is probably reasonable, especially for a report that might only be run once a day, or a few times per day.

The example query you provided would look like this:

-- Materialize the remote data into local temp tables first, so SQL Server
-- has accurate row counts to plan the join with
SELECT *
INTO #eng_facven
FROM eng_facven;

SELECT *
INTO #eng_facved
FROM eng_facved;

Select  A.fv_empresa, A.fv_ganado, A.fv_contra
   , Sum(A.fv_animales)        fv_animales
   , Sum(A.fv_primales)        fv_primales
   , Sum(B.fd_prima)/Nullif(Sum(A.fv_animales),0)  fv_prima_total
   , Sum(B.fd_kilos)         fv_kg_factu
   , Sum(B.fd_bruto)         fd_precio
   , Sum(A.fv_imp_dto_trans)       fv_dto_fac

   , (Sum(B.fd_bruto) - Sum(A.fv_imp_dto_trans))
    /
    Nullif(Sum(B.fd_kilos),0)       fv_fd_prec_kg_canal


From  #eng_facven A
    Join
   #eng_facved B On A.fv_serial = B.fd_serial 

Group by A.fv_empresa, A.fv_ganado, A.fv_contra;
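If you want to sanity-check the size assumption for your own data, you can look at how much space the temp tables actually take once they're loaded. A quick check (my addition, not part of the plan above) is to run sp_spaceused against them from the same session:

EXEC tempdb..sp_spaceused '#eng_facven';
EXEC tempdb..sp_spaceused '#eng_facved';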

For larger tables, you might need a scheduled (daily or hourly) job that copies the contents of the remote tables into local tables, and have your queries reference those. Whether that's acceptable depends on how up to date your reports are required to be.
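As a rough sketch of what such a refresh could look like (the local table name, procedure name, and column data types below are made up for illustration; only the column names come from the query above, and you'd schedule the procedure with SQL Server Agent or whatever scheduler you have):

-- Hypothetical local copy of the remote eng_facven data
CREATE TABLE dbo.local_eng_facven (
    fv_serial        int           NOT NULL,
    fv_empresa       int           NOT NULL,
    fv_ganado        int           NULL,
    fv_contra        int           NULL,
    fv_animales      int           NULL,
    fv_primales      int           NULL,
    fv_imp_dto_trans decimal(18,2) NULL
);
GO

CREATE PROCEDURE dbo.refresh_local_eng_facven
AS
BEGIN
    SET NOCOUNT ON;

    -- Reload the local copy from the remote view in one transaction,
    -- so reports never see a half-loaded table
    BEGIN TRANSACTION;

    TRUNCATE TABLE dbo.local_eng_facven;

    INSERT INTO dbo.local_eng_facven
        (fv_serial, fv_empresa, fv_ganado, fv_contra,
         fv_animales, fv_primales, fv_imp_dto_trans)
    SELECT fv_serial, fv_empresa, fv_ganado, fv_contra,
           fv_animales, fv_primales, fv_imp_dto_trans
    FROM dbo.eng_facven;   -- the view over OPENROWSET

    COMMIT TRANSACTION;
END;
GO

Your reporting procedures would then read from dbo.local_eng_facven instead of the view, and you could index the local table however you like.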