First things first, I notice that your 'what I do now' query:
SELECT TOP (1)
ca.SensorValue,
ca.Date
FROM sys.partitions AS p
CROSS APPLY
(
SELECT TOP (1)
v.Date,
v.SensorValue
FROM SensorValues AS v
WHERE
$PARTITION.SensorValues_Date_PF(v.Date) = p.[partition_number]
AND v.DeviceId = @fDeviceId
AND v.SensorId = @fSensorId
AND v.Date <= @fDate
ORDER BY
v.Date DESC
) AS ca
WHERE
p.[partition_number] <= $PARTITION.SensorValues_Date_PF(@fDate)
AND p.[object_id] = OBJECT_ID(N'dbo.SensorValues', N'U')
AND p.index_id = 1
ORDER BY
p.[partition_number] DESC,
ca.Date DESC;
...produces an execution plan like this:
This execution plan has an estimated total cost of 0.02 units. Over 50% of this estimated cost is the final Sort, running in Top-N mode. Now estimates are just that, but sorts can be expensive in general, so let's remove it without changing the semantics:
SELECT TOP (1)
ca.SensorId,
ca.SensorValue,
ca.Date
FROM
(
-- Partition numbers
SELECT DISTINCT
partition_number = prv.boundary_id
FROM
sys.partition_functions AS pf
JOIN sys.partition_range_values AS prv ON
prv.function_id = pf.function_id
WHERE
pf.name = N'SensorValues_Date_PF'
AND prv.boundary_id <= $PARTITION.SensorValues_Date_PF(@fDate)
) AS p
CROSS APPLY
(
SELECT TOP (1)
v.Date,
v.SensorValue,
v.SensorId
FROM dbo.SensorValues AS v
WHERE
$PARTITION.SensorValues_Date_PF(v.Date) = p.partition_number
AND v.DeviceId = @fDeviceId
AND v.SensorId = @fSensorId
AND v.Date <= @fDate
ORDER BY
v.Date DESC
) AS ca
ORDER BY
p.partition_number DESC,
ca.Date DESC;
Now the execution plan has no blocking operators, and no sorts in particular. The estimated cost of the new query plan below is 0.01 units and the total cost is distributed evenly over the data access methods:
With the improvement in place, all we need to produce a result for each Sensor ID is to make a list of Sensor IDs and APPLY
the previous code to each one:
SELECT
PerSensor.SensorId,
PerSensor.SensorValue,
PerSensor.Date
FROM
(
-- Sensor ID list
VALUES
(@fSensorId1),
(@fSensorId2),
(@fSensorId3)
) AS Sensor (Id)
CROSS APPLY
(
-- Optimized code applied to each sensor
SELECT TOP (1)
ca.SensorId,
ca.SensorValue,
ca.Date
FROM
(
-- Partition numbers
SELECT DISTINCT
partition_number = prv.boundary_id
FROM
sys.partition_functions AS pf
JOIN sys.partition_range_values AS prv ON
prv.function_id = pf.function_id
WHERE
pf.name = N'SensorValues_Date_PF'
AND prv.boundary_id <= $PARTITION.SensorValues_Date_PF(@fDate)
) AS p
CROSS APPLY
(
SELECT TOP (1)
v.Date,
v.SensorValue,
v.SensorId
FROM dbo.SensorValues AS v
WHERE
$PARTITION.SensorValues_Date_PF(v.Date) = p.partition_number
AND v.DeviceId = @fDeviceId
AND v.SensorId = Sensor.Id -- was @fSensorId1
AND v.Date <= @fDate
ORDER BY
v.Date DESC
) AS ca
ORDER BY
p.partition_number DESC,
ca.Date DESC
) AS PerSensor;
The query plan is:
Estimated query plan cost for three Sensor IDs is 0.011 units, roughly half that of the original single-sensor plan.
You can do this:
WITH T
AS (SELECT ISNULL((SELECT MAX(ID) FROM StagingTable), 0) +
ROW_NUMBER() OVER (ORDER BY NaturalID) AS New_ID,
ID
FROM StagingTable
WHERE ID IS NULL)
UPDATE T
SET ID = New_ID
So the windowed function is used in the SELECT list, but you can still use its result to UPDATE the column.
You should probably also have a unique filtered index on ID WHERE ID IS NOT NULL to prevent duplicates. Or run this at SERIALIZABLE isolation level to block concurrent inserts.
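A unique filtered index along those lines might look like the following sketch (the index name is illustrative; the table and column names are taken from the snippet above):

```sql
-- Enforce uniqueness only over the rows that already have an ID;
-- rows with a NULL ID are excluded by the filter.
CREATE UNIQUE NONCLUSTERED INDEX UQ_StagingTable_ID
    ON dbo.StagingTable (ID)
    WHERE ID IS NOT NULL;
```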
This question is far more complicated than it appears to be on the surface (hence the longer-than-most-would-expect answer). If the strings being searched were codes (postal codes, ISO country codes, ISO state codes, SKUs, etc.) or something where the characters used were a limited subset of all possible letters of all languages, then this would be fairly straightforward. But when dealing with people's names, no such luck.
All of the following example code and test cases can be found on Pastebin (Searching for Case-Sensitive patterns in SQL Server). The SQL posted on Pastebin includes additional test cases and additional examples related to various points mentioned below.
Very simplistically, you can do the following, which does work with the sample data. I added two test cases to help determine whether or not this (or any) method works, since they contain data that should not match.
Please note:

1. I am not using a collation that starts with SQL_ as those have been deprecated (or perhaps more accurately: obsoleted) since SQL Server 2000 was released.

2. I am using a collation in the 100 series as that is the most recent, and it should be available starting in SQL Server 2008. Not all collations have a 100 series, but it is best to use the most recent one available, which might be 90.

3. I am not using a binary collation (i.e. one ending in _BIN2, or even _BIN, though the _BIN collations have been deprecated since SQL Server 2005 was released, so only use _BIN2 if you need a binary collation). Binary collations are not "case sensitive"; they are "byte sensitive" (_BIN) / "code point sensitive" (_BIN2), and those are very different concepts. Binary collations only appear to be case sensitive when working with (or testing) US-English-only characters. Any characters with accents can potentially be handled incorrectly when using binary collations. For a more in-depth look into this topic, please see my blog post: No, Binary Collations are not Case-Sensitive.
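To illustrate that difference, here is a minimal sketch (not the original Pastebin example): a culture-aware case-sensitive collation treats a precomposed character and its base-plus-combining-mark form as equal, while a binary collation compares code points and does not.

```sql
DECLARE @Precomposed NVARCHAR(2) = NCHAR(0x00DC);        -- Ü as a single code point
DECLARE @Combined    NVARCHAR(2) = N'U' + NCHAR(0x0308); -- U + combining diaeresis

SELECT CASE WHEN @Precomposed = @Combined COLLATE Latin1_General_100_CS_AS
            THEN 'equal' ELSE 'not equal' END AS CultureAware,  -- equal
       CASE WHEN @Precomposed = @Combined COLLATE Latin1_General_100_BIN2
            THEN 'equal' ELSE 'not equal' END AS BinaryCompare; -- not equal
```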
4. Instead of using a single character range (i.e. the [A-Z] that works in LIKE and PATINDEX), I added each of the 26 letters individually. This is required in order for the preferred _CS (i.e. case-sensitive) collations to work properly. If you specify a range of characters via the [A-Z] syntax, it will appear not to respect the case-sensitive collation. That, however, is a false interpretation of the behavior. Case-sensitive comparisons can be tricky since, by design, they work differently between equality comparisons (e.g. WHERE N'zbz' LIKE N'%B%' or WHERE N'b' = N'B') and range comparisons (e.g. WHERE N'zbz' LIKE N'%[A-C]%'). Range comparisons are more like sorting, and case-sensitive sorting does not handle all of one case before the other. Think in terms of how dictionaries are ordered (in fact, the Microsoft documentation even uses the term "dictionary ordering" to describe non-binary collations): they don't place all words starting with A-Z before words starting with a-z. They group upper and lower case of each letter together: A,a,B,b,C,c and so on, not A,B,C,a,b,c. Hence, assuming that upper-case letters come first (and which one comes first depends on whether you are using a SQL_ or non-SQL_ collation), a range of [A-C] equates to [AaBbC], and if lower-case letters come first, that same range equates to [AbBcC]. In both cases, the lower-case "b" is validly in that range.

However, this approach mainly works if the data you are searching is guaranteed to only ever contain the 26 US-English characters, and never any of the other characters found in most other languages, especially those with accents / diacritical marks. Given that your data contains names, you cannot guarantee that, even if working in the US, since people live here who came from somewhere else (or an ancestor did), or sign up for services hosted here even though they live anywhere in the world.
For example, add the following two test cases to the example code above:
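The original rows live in the Pastebin script; hypothetical stand-ins (the table and column names are assumptions) would look something like:

```sql
-- Two illustrative rows: the first contains an upper-case Ü adjacent to
-- another capital, the second a lower-case ü (which should not match).
INSERT INTO dbo.SampleData (Name) -- table/column names are assumptions
VALUES (N'MÜLLER, FRANZ'),
       (N'Müller, Franz');
```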
Running the code again will not yield new results. The second line added uses a lower-case ü, so it shouldn't match anyway. But the first line uses an upper-case Ü, so it should match, yet can't unless we either make the collation accent-insensitive (i.e. Latin1_General_100_CS_AI) or add the Ü character to both character ranges of the LIKE clause (e.g. [ABCDEFGHIJKLMNOPQRSTUVWXYZÜ]).

Making the comparison accent-insensitive might be an option when looking for any two capital letters next to each other, but not when comparing actual words. And adding the extra letters (the ones with accents) only works when using VARCHAR (i.e. Extended ASCII) data. If that is the case, then this approach could be made to work IF you added all of the additional capital letters available in the code page specified by the collation. According to the Wikipedia page for Code Page 1252, there are 32 additional capitals, which are (I made them a heading so that the characters would be easier to see):

Š Ž ÀÁÂÃÄÅ Æ Ç ÈÉÊË ÌÍÎÏ Ð Ñ ÒÓÔÕÖ Ø ÙÚÛÜ Ý Þ

The result would be a LIKE clause single-character specification of:

[ABCDEFGHIJKLMNOPQRSTUVWXYZŠŽÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ]

But that particular set of characters only works with Code Page 1252. If your collation uses another code page, then that set of extra characters would most likely be different, even if there is some overlap.
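One way to enumerate those extra capitals for whatever code page your collation uses is a sketch like the following (an assumption-laden illustration, not the original Pastebin code; it assumes a VARCHAR / Extended ASCII setup where CHAR() maps through the database's code page):

```sql
-- Walk the extended-ASCII range and keep the characters that are
-- unchanged by UPPER() but changed by LOWER(): i.e. upper-case letters.
DECLARE @i INT = 128,
        @Extra VARCHAR(128) = '';

WHILE @i <= 255
BEGIN
    DECLARE @c VARCHAR(1) = CHAR(@i); -- interpreted via the DB's code page
    IF @c = UPPER(@c) COLLATE Latin1_General_100_CS_AS
       AND @c <> LOWER(@c) COLLATE Latin1_General_100_CS_AS
        SET @Extra += @c;
    SET @i += 1;
END;

SELECT @Extra AS ExtraCapitals;
```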
On the other hand, if you are using Unicode / NVARCHAR data, then even adding those extra characters will not work. Not only can Unicode represent all accented characters (unlike any individual code page), it also allows for constructing accented characters by starting with a regular, non-accented character, such as "U", and then adding one or more accents to it. These accents are called "Combining Characters" since they only work in combination with certain base characters (for a more detailed look into Combining Characters, please see my answer to How To Strip Hebrew Accent Marks). An example of this was shown in Note #3 above (e.g. PRINT N'U' + NCHAR(0x0308); -- Ü).

If we now add the following two test cases to the example code above:
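Hypothetical stand-ins for those two rows (the real ones are in the Pastebin script; the table and column names are assumptions):

```sql
-- First row: a capital U followed by a combining diaeresis (renders as Ü).
-- Second row: a capital letter stacked with several combining marks, a
-- combination that has no precomposed equivalent character.
INSERT INTO dbo.SampleData (Name) -- table/column names are assumptions
VALUES (N'MU' + NCHAR(0x0308) + N'LLER, FRANZ'),
       (N'AB' + NCHAR(0x0301) + NCHAR(0x0308) + NCHAR(0x0330) + N'C');
```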
and re-run the test, those new entries will not show up, at least not when using the original LIKE definition that specified just the 26 letters of the US-English alphabet. That first new entry would show up if we added the Ü character, since U + that combining character equates to the same thing. So, is it feasible to add the remaining characters like I showed above in the VARCHAR example? That was 32 extra characters. How many extra characters for NVARCHAR? About 1200 ;-). Um, no, not feasible.

And if you did add in all 1200 additional capital letters, would that work? Nope, because there can still be more capitals that simply don't have a "pre-composed" equivalent. For example, that second new entry won't show up since there is no equivalent character to put into the LIKE definition. The only way to make this work would be to not only add in all 1200 additional capitals, but also use an accent-insensitive collation. This still doesn't seem very feasible. And it would still only work in limited scenarios such as this one (looking for any capitals, not specific ones).

Unfortunately, T-SQL is just not equipped to deal with this.
But fortunately, there is a way of accomplishing this: Regular Expressions (commonly referred to as just RegEx). RegEx allows for specifying "classes" of characters (e.g. \s = "any white-space character"), and one class -- \p{property} -- will match characters based on certain properties of Unicode characters. The following two class specifications will allow us to get any capital letter, even one that doesn't exist today:

\p{Lu} = an upper-case letter that has a lower-case equivalent
\p{M} = a combining mark

We can use those to come up with the following RegEx pattern:

(?:\p{Lu}\p{M}*){2}
To translate this for those who aren't familiar with RegEx syntax:

- ( and ) group everything inside into a single pattern that can be repeated or referenced later.
- ?: tells the RegEx engine not to save any matches for that grouping. We won't be using whatever it matches, so it is more efficient not to store it in memory.
- * indicates that there should be 0 or more matches of the preceding item (i.e. the \p{M}). This will match any number of diacritical marks, or none.
- \p{Lu}\p{M}* altogether says to match any single upper-case letter, either by itself or followed by any number of diacritical marks. This will match, for example: U, Ü, U + NCHAR(0x0308), and so on.
- {2} indicates that a match will consist of exactly 2 occurrences of the preceding item (i.e. the (?:\p{Lu}\p{M}*)). Without this part it would only match a single upper-case letter, not 2. And this is why we needed the parentheses: to group the upper-case letter and the zero-or-more combining marks into a sub-pattern that could be repeated.

The only part left to take care of is the fact that SQL Server does not have any built-in support for Regular Expressions. For this we turn to SQLCLR, which allows us to use the RegEx functionality that exists in .NET. If you do not already have SQLCLR RegEx functions, there are many code examples around the Interwebs (just be careful since a few are pretty good, but many are pretty bad), or you can download the Free version of SQL#, a pre-compiled library of SQLCLR functions (that I created) which contains several RegEx functions, including the one used in the following example code:
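A usage sketch along those lines (the function name follows SQL#'s RegEx_IsMatch; the table and column names are assumptions, and the exact parameters of your own RegEx function may differ):

```sql
SELECT sd.Name
FROM   dbo.SampleData sd -- hypothetical table from the earlier sketches
WHERE  SQL#.RegEx_IsMatch(
           sd.Name,                -- input string to test
           N'(?:\p{Lu}\p{M}*){2}', -- two capitals, each with optional combining marks
           1,                      -- require at least one match
           NULL                    -- default RegEx options
       ) = 1;
```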
Running against the test data again will bring back all of the expected entries, including the final one with the 3 diacritical marks, which isn't even a character in any language (at least, not yet). Using RegEx with this pattern will just work, always, for all known languages (well, languages that have capitals, since some do not).