SSRS – Filtering Dataset vs. Using Query Parameter Efficiency

sql-server, ssrs, ssrs-2008

People at my organization use SQL queries within SSRS reports, for instance:

    SELECT name, age
    FROM egTable

This query is run against the database, and they then use a filter within SSRS to get rid of unwanted rows. For instance, there is a filter in SSRS called AGE, set to something like Age = 11.

My argument was that this is bad: it means we query the ENTIRE table and then discard the unwanted rows from a GIANT result set. Instead, we should introduce a parameter, @age, and write the query as follows:

    SELECT name, age 
    FROM egTable
    WHERE age = @age

Am I correct in saying that the first method pulls the entire table, whereas mine is far more efficient because it only returns a small result set?

How can I verify/prove this?

Best Answer

TL;DR

Yes, you are correct. You can prove it by querying the ExecutionLog views in the SSRS database.

Longer Answer

I created two otherwise identical reports based on the AdventureWorks database: one with a filter on the dataset for City, and one with a parameter in the query for City.

Report 1

The query for this report is:

SELECT  Person.Address.*
FROM    Person.Address

And the filter is added like this:

The result of this report is:

Report 2

The query for this report is:

SELECT  Person.Address.*
FROM    Person.Address
WHERE   City = @city

There are no filters on this dataset.

The result of this report is:

The same data is shown, but the behaviour is different: the user now has to enter a value for the parameter (marked in yellow).

If this is undesired, it can be overcome by adding a default value to the parameter and setting its visibility to Hidden, like this:

and this:

Proof of efficiency

The efficiency of the two reports can be compared by querying the ExecutionLog views in the SSRS database, like this:

SELECT [ItemPath], [Parameters], [TimeDataRetrieval], [TimeProcessing], [TimeRendering], [RowCount] 
FROM ExecutionLog3;

For these two reports, this returns (times are in milliseconds):

+--------------------------+--------------+-------------------+----------------+---------------+----------+
|         ItemPath         |  Parameters  | TimeDataRetrieval | TimeProcessing | TimeRendering | RowCount |
+--------------------------+--------------+-------------------+----------------+---------------+----------+
| /Report Project3/Report1 | NULL         |               669 |           1878 |           880 |    19614 |
| /Report Project3/Report2 | city=Bothell |                 8 |             42 |             4 |       26 |
+--------------------------+--------------+-------------------+----------------+---------------+----------+

So the second method not only fetched far fewer rows (26 versus 19,614) but also spent far less time on data retrieval (8 ms versus 669 ms), processing (42 ms versus 1,878 ms) and rendering (4 ms versus 880 ms).
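A single execution of each report is a small sample, so it may be worth running both a few times and comparing averages. The following is a minimal sketch of such a comparison. It assumes the catalog database has the default name ReportServer and that only live (non-cached) executions should count; the report paths are the ones from the table above. On SSRS 2008 before R2 the view is, as far as I know, ExecutionLog2 with a ReportPath column instead of ItemPath, so adjust accordingly.

```sql
-- Assumption: the SSRS catalog database uses the default name ReportServer.
USE ReportServer;

SELECT   ItemPath,
         COUNT(*)               AS Executions,
         AVG(TimeDataRetrieval) AS AvgDataRetrievalMs,
         AVG(TimeProcessing)    AS AvgProcessingMs,
         AVG(TimeRendering)     AS AvgRenderingMs,
         AVG([RowCount])        AS AvgRowCount
FROM     dbo.ExecutionLog3
WHERE    ItemPath IN (N'/Report Project3/Report1',
                      N'/Report Project3/Report2')
  AND    Source = 'Live'        -- assumption: ignore cached and snapshot executions
GROUP BY ItemPath
ORDER BY ItemPath;
```

If the filtered report consistently shows a RowCount equal to the size of the whole table, that is the evidence that the dataset filter is applied only after every row has been retrieved.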

Related Solutions

Sql-server – which one is more efficient query

Write a query that pulls the student who registered the last?

The second query does not necessarily do this, depending on the data type of the EnrolmentDate column and how granular the data is. If this column does not contain a time component, the query will return all students registered on the last day a student registered, which does not satisfy the question. If there is a time component, it's possible (but much less likely) that there will be multiple rows returned.

(Edit: Alex Kuznetsov correctly pointed out in the comments that the first query doesn't necessarily return the last enrolled student either when there are ties. It is, however, guaranteed to return a record in that event, instead of all records, which is normally satisfactory. I think my point was more that comparing the two queries is comparing apples to oranges, so to speak.)

In any event, if we assume all enrollment dates/times are unique, from what's given, the answer to the question isn't necessarily clear cut either. You would need to qualify for me what you mean by more efficient.

The first query will only scan once, but could potentially incur an expensive sort (you didn't say which indexes exist on the table, so I assume none). The latter query will do a scan to find the maximum, then do another scan to find all matching rows, which would possibly use less CPU, but more logical I/Os. It's entirely possible the second query would be less expensive overall (again, with no indexes available).
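The two queries being compared are not reproduced here. Going only by the description above, they presumably have roughly the following shapes. This is a guess for illustration, not the original code: the table name dbo.Students and the column StudentName are hypothetical, and only EnrolmentDate appears in the text.

```sql
-- Hypothetical reconstruction; dbo.Students and StudentName are assumed names.

-- Shape (a): one scan plus a sort. Returns exactly one row,
-- even when several students share the latest EnrolmentDate.
SELECT TOP (1) StudentName, EnrolmentDate
FROM   dbo.Students
ORDER  BY EnrolmentDate DESC;

-- Shape (b): one scan to find the maximum, a second to find matching rows.
-- Returns every student who shares the latest EnrolmentDate.
SELECT StudentName, EnrolmentDate
FROM   dbo.Students
WHERE  EnrolmentDate = (SELECT MAX(EnrolmentDate) FROM dbo.Students);
```

Comparing the actual execution plans of the two statements, with and without an index on EnrolmentDate, would show whether the sort or the second scan dominates on a given table.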

Having said all that, if I were to start performance tuning this business operation, I would most certainly start with query (a).

Sql-server – How to do a differential query (delta plus/minus) telling me what rows are in view A that are not in view B and vice versa

You can use a FULL OUTER JOIN for this:

-- KEY is a reserved word in T-SQL, so the column name must be bracketed.
WITH T1
     AS (SELECT trantype,
                product_code
         FROM   vwVIEW1
         WHERE  [KEY] = 'DEMO'),
     T2
     AS (SELECT trantype,
                product_code
         FROM   vwVIEW2
         WHERE  [KEY] = 'DEMO')
SELECT *
FROM   T1
       FULL OUTER JOIN T2
         ON T1.trantype = T2.trantype
            AND T1.product_code = T2.product_code
-- Keep only the rows that have no match on the other side.
WHERE  T1.trantype IS NULL
        OR T2.trantype IS NULL
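With SELECT *, each unmatched row comes back with one side filled and the other side NULL, which can be awkward to read. A possible variant, sketched below, collapses the two sides into one set of columns and adds a label saying where the row was found. It assumes trantype is never NULL in either view, since a NULL there would be indistinguishable from a missing match. The column name found_in is made up for this example.

```sql
WITH T1 AS (SELECT trantype, product_code
            FROM   vwVIEW1
            WHERE  [KEY] = 'DEMO'),   -- KEY is reserved in T-SQL, hence the brackets
     T2 AS (SELECT trantype, product_code
            FROM   vwVIEW2
            WHERE  [KEY] = 'DEMO')
SELECT COALESCE(T1.trantype, T2.trantype)         AS trantype,
       COALESCE(T1.product_code, T2.product_code) AS product_code,
       -- Assumption: trantype is NOT NULL in both views, so a NULL here
       -- can only mean "no matching row on that side".
       CASE WHEN T2.trantype IS NULL THEN 'only in vwVIEW1'
            ELSE 'only in vwVIEW2'
       END                                        AS found_in  -- hypothetical column name
FROM   T1
       FULL OUTER JOIN T2
         ON  T1.trantype     = T2.trantype
         AND T1.product_code = T2.product_code
WHERE  T1.trantype IS NULL
   OR  T2.trantype IS NULL;
```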