SQL Server – Query Slow with JDBC Parameters, Fast with Concatenated SQL

jdbcperformancequery-performancesql server

I have an application that can be run with either mysql or SQL Server. There is a query that runs in a reasonable time on mysql, but is very slow on SQL Server. My application uses hibernate, but this particular query is coded in raw SQL so I know exactly what SQL is being executed. If I run the same query on SQL Server directly in a database client, it runs quickly. It's only slow when run in my application.

I discovered that if I take out the use of JDBC parameters and just concatenate the values into the SQL string, it runs much much faster. However this isn't a good solution; it doesn't guard against SQL injection or handle characters that might need escaping.

I have found recommendations to use sendStringParametersAsUnicode=false on the connection, but this isn't appropriate for my case because the columns are all nvarchar. In any case, I did try this and it didn't make anything faster and made some things slower. I also found info that suggested the jtds driver is faster, but it's not compatible with hibernate so I have to use the Microsoft driver (mssql-jdbc-7.0.0.jre8.jar).

My slow-running code looks like this:

    String sql = " SELECT DISTINCT FOO.BAR_ID_ "
            [big join here across multiple tables]
            "WHERE FOO.ID_= :fooId " +
            " AND BAZ.NAME_ LIKE :likeParameter ";
    Query query = getSession().createSQLQuery(sql)
            .setParameter("fooId", fooId)
            .setParameter("likeParameter", likeParameter);
    return query.list();

It runs quickly if I change it to this:

String sql = " SELECT DISTINCT FOO.BAR_ID_ "
            [big join here across multiple tables]
            " WHERE FOO.ID_= " + fooId  +
            " AND BAZ.NAME_ LIKE '" + likeParameter + "'";
    Query query = getSession().createSQLQuery(sql);
    return query.list();

The LIKE parameter has wildcards on both ends, which I would expect to prevent the use of an index on that column, but that would be true for the concatenation case as well, and it runs quickly. Clearly there is some other issue with SQL Server and JDBC parameters besides the Unicode issue. What can I do to get around this?

UPDATE: here are the query plans:

Fast query with in-line data: https://www.brentozar.com/pastetheplan/?id=SkkH0NiU4

Slow query with parameters: https://www.brentozar.com/pastetheplan/?id=SkBxhQoLE

Best Answer

Not a complete answer, but here's some diagnostics and suggestions from looking at the plans.

You're running SQL 2016, so there's a ton of useful info in the plan.

First is that you're running a GDR patch for SQL 2016 SP1 (13.0.4224.16). SQL 2016 SP2 was released on April 24, 2108, almost a year ago, and after 1 year applying the service pack is required for support. So you can test this on SQL 2016 SP2, or SQL 2017 to see if it behaves differently.

When optimizing a parameterized query, SQL Server has to pick a plan that will be used for all future values of fooId and likeParameter. Apparently it's guessing you will often have a selective LIKE predicate, and so using the Name index will, on average, be cheaper. With the hard-coded values, SQL Server doesn't need to build a plan that works for multiple different LIKE predicates, and so picks a plan that doesn't use a non-clustered index.

To evaluate the predicate NAME_ LIKE @p SQL Server is range scanning a non-clustustered index on NAME_. This works great if you happen to pass a LIKE predicate that starts with a few literal characters, but when it starts with a wildcard, will require a complete scan of the index. Here scanning, and evaluating the LIKE predicate for each of 14M rows.

In the fast plan SQL seeks 21,518 separate components using the clustered index and checks each one's name against the LIKE predicate.

From the plans:

The difference between the plans is a single operator, which accounts for all of the increased cost: a scan of 14,675,300 rows from the index [PDL_COMPONENT].[IDX_COMPONENT_NAME], taking 37sec of CPU time:

  <RunTimeInformation>
    <RunTimeCountersPerThread 
      Thread="0" 
      ActualRows="741" 
      Batches="0" 
      ActualEndOfScans="1" 
      ActualExecutions="1" 
      ActualExecutionMode="Row" 
      ActualElapsedms="37038" 
      ActualCPUms="37028" />
  </RunTimeInformation>

Not parameter sniffing, as the ParameterCompiledValue is the same as ParemeterRuntimeValue:

    <ParameterList>
      <ColumnReference Column="@P1" ParameterDataType="nvarchar(127)" ParameterCompiledValue="N'%geronimo%'" ParameterRuntimeValue="N'%geronimo%'" />
      <ColumnReference Column="@P0" ParameterDataType="int" ParameterCompiledValue="(9)" ParameterRuntimeValue="(9)" />
    </ParameterList>

Using the query with the hard-coded values isn't the worst idea here. Or you could try a RECOMPILE and turn off plan caching without giving up parameterizing your queries.

And I don't know what the indexes and foreign keys in the database are, but using DISTINCT has really bad code smell, as it often indicates that one or more JOIN is not correct, and it affects the cardinality estimation of the query.

Related Solutions

Understanding Parameters in SQL Server Query Plan

Yes, you are likely seeing implicit conversions due to the guesses SQL Server has to make. It is creating a plan that will work not just for the parameter value you're passing now, but other potential values, too. With strings it (or perhaps JDBC? not sure) chooses an arbitrary default of 4000 and Unicode. I believe there are ways in JDBC to not send Unicode (Unicode vs. non-Unicode implicit conversions are among the most tedious), but you're still going to deal with the length issue (which may or may not be an issue in your specific case; again, not sure).

To avoid the implicit conversions and guesses, and ultimately the scans and high CPU conditions, you need to make SQL Server understand the data type that is being used in the parameter through explicit / strong typing.

Here are my suggestions, in order of preference:

Switch to stored procedures.
Switch to stored procedures.

Declare your variables explicitly, e.g.:

DECLARE @p1 VARCHAR(36) = ?; UPDATE ... WHERE col = @p1;

Keep using CAST/CONVERT.

Postgresql – set PostgreSQL “stringtype=unspecified” behavior as default

Unfortunately the answer is no. stringtype is not server parameter. It's JDBC's own client parameter. Check the function here;

org.postgresql.jdbc.PgConnection.getStringVarcharFlag()

and here

org.postgresql.jdbc.PgPreparedStatement.getStringType()

You will see that JDBC sets parameter type to VARCHAR or UNSPECIFIED type according to the stringtype property.

Best Answer

Related Solutions

Understanding Parameters in SQL Server Query Plan

Postgresql – set PostgreSQL “stringtype=unspecified” behavior as default

Related Question