How big is this database? How many rows are in each table? Etc?
I would say that normalized data is the default state to aim for. Rows are shorter, indexes can be used more effectively, and those shorter rows add up to a smaller, leaner database.
One of the major accelerators of performance is memory. If you can get your 5 tables to stay cached in memory, your queries will avoid much of the disk I/O overhead.
You identify that you are joining with IDs (which are usually integers), so your indexes may be narrow and offer relatively inexpensive joins.
If you decide to denormalize, your tables will be bigger because they carry redundant data on every row. That means more memory is needed to keep the data in cache, and even more I/O when the cache cannot hold it all. (And your backups are bigger.)
In addition, you have taken on the task of denormalizing and of maintaining the denormalized data. That is extra programming work and an extra load on the server as well: memory, I/O, and CPU.
But sometimes denormalization is the best choice. Data Warehouses, for example, are largely denormalized data. Also, you may find that in your system the benefits of denormalization may exceed the cost.
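To make the trade-off concrete, here is a rough sketch with hypothetical tables (call_center, phone_call - not your actual schema):

-- Normalized: the call center's attributes live in one place
CREATE TABLE call_center (
    id      INT PRIMARY KEY,
    name    VARCHAR(100),
    city_id INT
);

CREATE TABLE phone_call (
    id             BIGINT PRIMARY KEY,
    call_center_id INT,          -- narrow integer FK: cheap to index and join
    placed_at      DATETIME
);

-- Denormalized alternative: every call row repeats the call center's attributes
CREATE TABLE phone_call_denorm (
    id               BIGINT PRIMARY KEY,
    call_center_id   INT,
    call_center_name VARCHAR(100),   -- redundant on every row
    call_center_city VARCHAR(100),   -- redundant on every row
    placed_at        DATETIME
);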
Still, you are asking a forum for an answer.
Your best answer would come from building a normalized test case and seeing how it performs. Even though you may not have a lot of 'real data', you should generate a fairly large data set, in the millions of rows, to test with.
You can try to find a tool that does it for you (RedGate has one for example, but it is not free) or generate the data yourself so that you control the complexity. There are online sources of states, cities, et cetera, and you can make up call centers, generate landline numbers and so forth.
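For instance, here is a minimal sketch of generating the data yourself, using the hypothetical phone_call table above and a small helper table of digits cross-joined against itself:

-- Helper table of digits 0-9
CREATE TABLE digits (d INT PRIMARY KEY);
INSERT INTO digits VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);

-- Cross-join six copies to produce 1,000,000 synthetic rows
INSERT INTO phone_call (id, call_center_id, placed_at)
SELECT a.d + b.d*10 + c.d*100 + e.d*1000 + f.d*10000 + g.d*100000 AS id,
       1 + (a.d + b.d*10) % 50                                    AS call_center_id,
       NOW() - INTERVAL (a.d + c.d*100 + f.d*10000) MINUTE        AS placed_at
FROM digits a, digits b, digits c, digits e, digits f, digits g;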
Then try it.
If you do not like the performance, then create a denormalized table to test. And put some effort into writing the code to maintain the denormalization, since that will become integral to your process.
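For example, one way to maintain it (sketched with the hypothetical tables above; a trigger is only one option, application code is another):

-- When a call center is renamed, push the new name into the redundant copies
DELIMITER //
CREATE TRIGGER call_center_after_update
AFTER UPDATE ON call_center
FOR EACH ROW
BEGIN
    UPDATE phone_call_denorm
    SET    call_center_name = NEW.name
    WHERE  call_center_id = NEW.id;
END//
DELIMITER ;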
The problem is the subquery, where you combine in the select list both an aggregated expression (max(cr.posted_dt)) and non-aggregated expressions/columns from the tables.
MySQL - with default settings - allows you to run this kind of inconsistent query, which basically means it is up to the developer to write consistent queries and not shoot themselves in the foot. You can change the sql mode to ONLY_FULL_GROUP_BY and see what happens when you try to run your query.
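For example, to append it to the modes already active in your session:

-- Add ONLY_FULL_GROUP_BY to the current session's sql_mode
-- (assumes the existing mode string is non-empty, which is the usual default)
SET SESSION sql_mode = CONCAT(@@SESSION.sql_mode, ',ONLY_FULL_GROUP_BY');
SELECT @@SESSION.sql_mode;   -- verify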
Now, to solve the issue, it seems you want a greatest-n-per-group type of query. There are several ways to do this, all quite complicated in MySQL (because it lacks window functions). Check the relevant tags on this site and on the main SO site.
Here's one way:
SELECT c.id,
       c.title,
       c.member_id,
       cr.mem_id AS last_replier,
       CONCAT(cr.posted_date, 'T', cr.posted_time) AS last_posted_dt
FROM complaints AS c
     LEFT JOIN complaints_reply AS cr
         ON cr.id =
            ( SELECT crl.id
              FROM complaints_reply AS crl
              WHERE crl.complaint_id = c.id
                AND crl.mem_id <> c.member_id
              ORDER BY crl.posted_date DESC, crl.posted_time DESC
              LIMIT 1
            ) ;
It will give you consistent results and will also be quite efficient, if you add an index on (complaint_id, posted_date, posted_time, mem_id).
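For example, one way to create it (the index name is up to you):

ALTER TABLE complaints_reply
    ADD INDEX idx_complaint_posted (complaint_id, posted_date, posted_time, mem_id);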
Tested at SQLfiddle.
No difference. Only the order of the columns in the SELECT list changes.
Changes like this will almost 100% produce the same execution plan.
Community Wiki answer based on an original comment on the question by @ypercube