Sql-server – MySQL and MSSQL Joined Query Performance Issue

execution-planMySQLsql serversql server 2014

In an attempt to make some reports while linking a MySQL and MSSQL server, the performance is poor under some circumstances. There are two unique test cases, one with a "YR_CDE" of 1617 and another with 1718.

Case 1:

SELECT NM.ID_NUM, NM.FIRST_NAME, NM.LAST_NAME, BM.ENTRANCE_CDE, SM.CURRENT_CLASS_CDE, STS.TUITION_CDE, DH.MAJOR_1
FROM NAME_MASTER NM
  INNER JOIN SHU_STUD_TERM_SUM_DIV STS ON NM.ID_NUM = STS.ID_NUM AND STS.YR_CDE = '1617' AND STS.TRM_CDE = 'FA' AND STS.TRANSACTION_STS IN ('C', 'H') AND STS.HRS_ENROLLED > 0
  INNER JOIN BIOGRAPH_MASTER BM ON NM.ID_NUM = BM.ID_NUM
  INNER JOIN STUDENT_MASTER SM ON NM.ID_NUM = SM.ID_NUM
  INNER JOIN DEGREE_HISTORY DH ON NM.ID_NUM = DH.ID_NUM AND DH.CUR_DEGREE = 'Y'
WHERE NM.ID_NUM not in (
  SELECT su.exid
  FROM [MYSQL1]...[myshu.shu_user] su
    INNER JOIN [MYSQL1]...[myshu.demographics_core] dc on su.id = dc.user_id
    INNER JOIN [MYSQL1]...[myshu.demographics_term] dt on dt.id = dc.term_id and dt.description = 'Spring 2017'
)
order by nm.LAST_NAME, nm.FIRST_NAME

Case 2:

SELECT NM.ID_NUM, NM.FIRST_NAME, NM.LAST_NAME, BM.ENTRANCE_CDE, SM.CURRENT_CLASS_CDE, STS.TUITION_CDE, DH.MAJOR_1
FROM NAME_MASTER NM
  INNER JOIN SHU_STUD_TERM_SUM_DIV STS ON NM.ID_NUM = STS.ID_NUM AND STS.YR_CDE = '1718' AND STS.TRM_CDE = 'FA' AND STS.TRANSACTION_STS IN ('C', 'H') AND STS.HRS_ENROLLED > 0
  INNER JOIN BIOGRAPH_MASTER BM ON NM.ID_NUM = BM.ID_NUM
  INNER JOIN STUDENT_MASTER SM ON NM.ID_NUM = SM.ID_NUM
  INNER JOIN DEGREE_HISTORY DH ON NM.ID_NUM = DH.ID_NUM AND DH.CUR_DEGREE = 'Y'
WHERE NM.ID_NUM not in (
  SELECT su.exid
  FROM [MYSQL1]...[myshu.shu_user] su
    INNER JOIN [MYSQL1]...[myshu.demographics_core] dc on su.id = dc.user_id
    INNER JOIN [MYSQL1]...[myshu.demographics_term] dt on dt.id = dc.term_id and dt.description = 'Spring 2017'
)
order by nm.LAST_NAME, nm.FIRST_NAME

Case 1 returns results in about 4 seconds while case 2 takes nearly 4 minutes. After digging through execution plans, we noticed one major difference, case 1 has a Table Spools.

Case 1:
Case 1 image of MySQL joins

Case 2:
Case 2 image of MySQL joins

The other interesting thing here is that the MySQL queries are executing for every single row in the MS SQL side.

Case 1 has 2418 rows from the base MS SQL query. Case 2 has 965 rows from the base MS SQL query. It appears that the MySQL queries are executing once for every single row in the base query.

It seems like case 1 is cache the data in a temp table while case 2 is not. Does anyone have suggestions on how to do this better or how we can force the second case to be cached as well?

Edit 1:

Thanks for the suggestions RMathis.

Performance is still an issue unfortunately. This query returned in 3:01. The execution plan looks similar to the original.

SELECT NM.ID_NUM, NM.FIRST_NAME, NM.LAST_NAME, BM.ENTRANCE_CDE, SM.CURRENT_CLASS_CDE, STS.TUITION_CDE, DH.MAJOR_1
FROM SHU_STUD_TERM_SUM_DIV STS
  INNER JOIN NAME_MASTER NM ON NM.ID_NUM = STS.id_num
  INNER JOIN BIOGRAPH_MASTER BM ON NM.ID_NUM = BM.ID_NUM
  INNER JOIN STUDENT_MASTER SM ON NM.ID_NUM = SM.ID_NUM
  INNER JOIN DEGREE_HISTORY DH ON NM.ID_NUM = DH.ID_NUM AND DH.CUR_DEGREE = 'Y'
WHERE STS.YR_CDE = '1718' AND STS.TRM_CDE = 'FA' AND STS.TRANSACTION_STS IN ('C', 'H') AND STS.HRS_ENROLLED > 0 AND NOT EXISTS (
    SELECT 1
    FROM [MYSQL1]...[myshu.shu_user] su
      INNER JOIN [MYSQL1]...[myshu.demographics_core] dc on su.id = dc.user_id
      INNER JOIN [MYSQL1]...[myshu.demographics_term] dt on dt.id = dc.term_id and dt.description = 'Spring 2017'
    WHERE su.exid = NM.ID_NUM
)
order by nm.LAST_NAME, nm.FIRST_NAME

The second suggestion took 3:35 to execute. The two parts separately return in less than 5 seconds combined.

    SELECT NM.ID_NUM, NM.FIRST_NAME, NM.LAST_NAME, BM.ENTRANCE_CDE, SM.CURRENT_CLASS_CDE, STS.TUITION_CDE, DH.MAJOR_1
FROM SHU_STUD_TERM_SUM_DIV STS
INNER JOIN NAME_MASTER NM ON NM.ID_NUM = STS.ID_NUM
INNER JOIN BIOGRAPH_MASTER BM ON NM.ID_NUM = BM.ID_NUM
INNER JOIN STUDENT_MASTER SM ON NM.ID_NUM = SM.ID_NUM
INNER JOIN DEGREE_HISTORY DH ON NM.ID_NUM = DH.ID_NUM AND DH.CUR_DEGREE = 'Y'
WHERE STS.YR_CDE = '1718' AND STS.TRM_CDE = 'FA' AND STS.TRANSACTION_STS IN ('C', 'H') AND STS.HRS_ENROLLED > 0 and sts.DIV_CDE = dh.DIV_CDE
EXCEPT
SELECT NM.ID_NUM, NM.FIRST_NAME, NM.LAST_NAME, BM.ENTRANCE_CDE, SM.CURRENT_CLASS_CDE, STS.TUITION_CDE, DH.MAJOR_1
FROM SHU_STUD_TERM_SUM_DIV STS
  INNER JOIN NAME_MASTER NM ON NM.ID_NUM = STS.ID_NUM
  INNER JOIN BIOGRAPH_MASTER BM ON NM.ID_NUM = BM.ID_NUM
  INNER JOIN STUDENT_MASTER SM ON NM.ID_NUM = SM.ID_NUM
  INNER JOIN DEGREE_HISTORY DH ON NM.ID_NUM = DH.ID_NUM AND DH.CUR_DEGREE = 'Y'
  INNER JOIN [MYSQL1]...[myshu.shu_user] su ON su.exid = NM.ID_NUM
  INNER JOIN [MYSQL1]...[myshu.demographics_core] dc on su.id = dc.user_id
  INNER JOIN [MYSQL1]...[myshu.demographics_term] dt on dt.id = dc.term_id and dt.description = 'Spring 2017'
WHERE STS.YR_CDE = '1718' AND STS.TRM_CDE = 'FA' AND STS.TRANSACTION_STS IN ('C', 'H') AND STS.HRS_ENROLLED > 0 and sts.DIV_CDE = dh.DIV_CDE
order by nm.LAST_NAME, nm.FIRST_NAME;

Best Answer

Try rewriting the query in different ways to see how the execution plan and times change. Changing the order in which tables are accessed, using set operators, indexing, etc... can make a huge difference.

The two approaches suggested below are using the MINUS set operator and using NOT EXISTS instead of NOT IN. In both cases, the queries are driven by the SHU_STUD_TERM_SUM_DIV table. This will reduce the number of rows joined to NAME_MASTER.

The MINUS set operator works by filtering the results of the first query with he results of the second. This allows the MySQL tables to be joined directly.

The NOT EXISTS will behave like the NOT IN (each row will query the MySQL tables) but does have the potential of comparing a smaller set of values (only those that directly match NM.ID_NUM).

Using MINUS:

SELECT NM.ID_NUM, NM.FIRST_NAME, NM.LAST_NAME, BM.ENTRANCE_CDE, SM.CURRENT_CLASS_CDE, STS.TUITION_CDE, DH.MAJOR_1
FROM SHU_STUD_TERMS_SUM_DIV STS
  INNER JOIN NAME_MASTER NM ON NM.ID_NUM = STS.ID_NUM
  INNER JOIN BIOGRAPH_MASTER BM ON NM.ID_NUM = BM.ID_NUM
  INNER JOIN STUDENT_MASTER SM ON NM.ID_NUM = SM.ID_NUM
  INNER JOIN DEGREE_HISTORY DH ON NM.ID_NUM = DH.ID_NUM AND DH.CUR_DEGREE = 'Y'
WHERE STS.YR_CDE = '1617' AND STS.TRM_CDE = 'FA' AND STS.TRANSACTION_STS IN ('C', 'H') AND STS.HRS_ENROLLED > 0
MINUS
SELECT NM.ID_NUM, NM.FIRST_NAME, NM.LAST_NAME, BM.ENTRANCE_CDE, SM.CURRENT_CLASS_CDE, STS.TUITION_CDE, DH.MAJOR_1
FROM SHU_STUD_TERMS_SUM_DIV STS
  INNER JOIN NAME_MASTER NM ON NM.ID_NUM = STS.ID_NUM
  INNER JOIN BIOGRAPH_MASTER BM ON NM.ID_NUM = BM.ID_NUM
  INNER JOIN STUDENT_MASTER SM ON NM.ID_NUM = SM.ID_NUM
  INNER JOIN DEGREE_HISTORY DH ON NM.ID_NUM = DH.ID_NUM AND DH.CUR_DEGREE = 'Y'
  INNER JOIN [MYSQL1]...[myshu.shu_user] su ON su.exid = NM.ID_NUM
  INNER JOIN [MYSQL1]...[myshu.demographics_core] dc on su.id = dc.user_id
  INNER JOIN [MYSQL1]...[myshu.demographics_term] dt on dt.id = dc.term_id and dt.description = 'Spring 2017'
WHERE STS.YR_CDE = '1617' AND STS.TRM_CDE = 'FA' AND STS.TRANSACTION_STS IN ('C', 'H') AND STS.HRS_ENROLLED > 0
order by nm.LAST_NAME, nm.FIRST_NAME;

Using EXISTS:

SELECT NM.ID_NUM, NM.FIRST_NAME, NM.LAST_NAME, BM.ENTRANCE_CDE, SM.CURRENT_CLASS_CDE, STS.TUITION_CDE, DH.MAJOR_1
FROM SHU_STUD_TERMS_SUM_DIV STS
  INNER JOIN NAME_MASTER NM ON NM.ID_NUM = STS.ID_NUM
  INNER JOIN BIOGRAPH_MASTER BM ON NM.ID_NUM = BM.ID_NUM
  INNER JOIN STUDENT_MASTER SM ON NM.ID_NUM = SM.ID_NUM
  INNER JOIN DEGREE_HISTORY DH ON NM.ID_NUM = DH.ID_NUM AND DH.CUR_DEGREE = 'Y'
WHERE STS.YR_CDE = '1617' AND STS.TRM_CDE = 'FA' AND STS.TRANSACTION_STS IN ('C', 'H') AND STS.HRS_ENROLLED > 0
AND NOT EXISTS (
  SELECT 1
  FROM [MYSQL1]...[myshu.shu_user] su
    INNER JOIN [MYSQL1]...[myshu.demographics_core] dc on su.id = dc.user_id
    INNER JOIN [MYSQL1]...[myshu.demographics_term] dt on dt.id = dc.term_id and dt.description = 'Spring 2017'
  WHERE su.exid = NM.ID_NUM
)
order by nm.LAST_NAME, nm.FIRST_NAME