SQL Server – Query Slow with JDBC Parameters, Fast with Concatenated SQL

jdbcperformancequery-performancesql server

I have an application that can be run with either mysql or SQL Server. There is a query that runs in a reasonable time on mysql, but is very slow on SQL Server. My application uses hibernate, but this particular query is coded in raw SQL so I know exactly what SQL is being executed. If I run the same query on SQL Server directly in a database client, it runs quickly. It's only slow when run in my application.

I discovered that if I take out the use of JDBC parameters and just concatenate the values into the SQL string, it runs much much faster. However this isn't a good solution; it doesn't guard against SQL injection or handle characters that might need escaping.

I have found recommendations to use sendStringParametersAsUnicode=false on the connection, but this isn't appropriate for my case because the columns are all nvarchar. In any case, I did try this and it didn't make anything faster and made some things slower. I also found info that suggested the jtds driver is faster, but it's not compatible with hibernate so I have to use the Microsoft driver (mssql-jdbc-7.0.0.jre8.jar).

My slow-running code looks like this:

    String sql = " SELECT DISTINCT FOO.BAR_ID_ "
            [big join here across multiple tables]
            "WHERE FOO.ID_= :fooId " +
            " AND BAZ.NAME_ LIKE :likeParameter ";
    Query query = getSession().createSQLQuery(sql)
            .setParameter("fooId", fooId)
            .setParameter("likeParameter", likeParameter);
    return query.list();

It runs quickly if I change it to this:

String sql = " SELECT DISTINCT FOO.BAR_ID_ "
            [big join here across multiple tables]
            " WHERE FOO.ID_= " + fooId  +
            " AND BAZ.NAME_ LIKE '" + likeParameter + "'";
    Query query = getSession().createSQLQuery(sql);
    return query.list();

The LIKE parameter has wildcards on both ends, which I would expect to prevent the use of an index on that column, but that would be true for the concatenation case as well, and it runs quickly. Clearly there is some other issue with SQL Server and JDBC parameters besides the Unicode issue. What can I do to get around this?

UPDATE: here are the query plans:

Fast query with in-line data: https://www.brentozar.com/pastetheplan/?id=SkkH0NiU4

Slow query with parameters: https://www.brentozar.com/pastetheplan/?id=SkBxhQoLE

Best Answer

Not a complete answer, but here's some diagnostics and suggestions from looking at the plans.

You're running SQL 2016, so there's a ton of useful info in the plan.

First is that you're running a GDR patch for SQL 2016 SP1 (13.0.4224.16). SQL 2016 SP2 was released on April 24, 2108, almost a year ago, and after 1 year applying the service pack is required for support. So you can test this on SQL 2016 SP2, or SQL 2017 to see if it behaves differently.

When optimizing a parameterized query, SQL Server has to pick a plan that will be used for all future values of fooId and likeParameter. Apparently it's guessing you will often have a selective LIKE predicate, and so using the Name index will, on average, be cheaper. With the hard-coded values, SQL Server doesn't need to build a plan that works for multiple different LIKE predicates, and so picks a plan that doesn't use a non-clustered index.

To evaluate the predicate NAME_ LIKE @p SQL Server is range scanning a non-clustustered index on NAME_. This works great if you happen to pass a LIKE predicate that starts with a few literal characters, but when it starts with a wildcard, will require a complete scan of the index. Here scanning, and evaluating the LIKE predicate for each of 14M rows.

In the fast plan SQL seeks 21,518 separate components using the clustered index and checks each one's name against the LIKE predicate.

From the plans:

The difference between the plans is a single operator, which accounts for all of the increased cost: a scan of 14,675,300 rows from the index [PDL_COMPONENT].[IDX_COMPONENT_NAME], taking 37sec of CPU time:

  <RunTimeInformation>
    <RunTimeCountersPerThread 
      Thread="0" 
      ActualRows="741" 
      Batches="0" 
      ActualEndOfScans="1" 
      ActualExecutions="1" 
      ActualExecutionMode="Row" 
      ActualElapsedms="37038" 
      ActualCPUms="37028" />
  </RunTimeInformation>

Not parameter sniffing, as the ParameterCompiledValue is the same as ParemeterRuntimeValue:

    <ParameterList>
      <ColumnReference Column="@P1" ParameterDataType="nvarchar(127)" ParameterCompiledValue="N'%geronimo%'" ParameterRuntimeValue="N'%geronimo%'" />
      <ColumnReference Column="@P0" ParameterDataType="int" ParameterCompiledValue="(9)" ParameterRuntimeValue="(9)" />
    </ParameterList>

Using the query with the hard-coded values isn't the worst idea here. Or you could try a RECOMPILE and turn off plan caching without giving up parameterizing your queries.

And I don't know what the indexes and foreign keys in the database are, but using DISTINCT has really bad code smell, as it often indicates that one or more JOIN is not correct, and it affects the cardinality estimation of the query.