I've got a column with a filenames stored in a nvarchar(255). I'm trying to parse out the file extension using
reverse( left( reverse(filename), charindex('.', reverse(filename) ) -1 ) )
This works great for most of my records, however some are failing due to charindex returning 0 while a '.' is definitely part of the string.
I think I've isolated it to an issue with accented characters and a few others.
One of them I have is ité.pdf, when reversed, charindex is not finding the '.'
fdp.́eti
I've taken to converting it to it's hex representation to narrow down what's going on using
cast(reverse(filename) as varbinary(max))
0x6600640070002E000103650074006900
From what I can work out The '.' is 2E 00. 01 03 65 00 is the 'é' which is encoded differently than the others I have in the same column.
Since 2E 00 comes first I would still expect charindex
to get a match, but it's returning me 0, which in turn make the call to LEFT
fail.
Am I missing something obvious here? Converted to HEX it seems the '.' I am looking for is definitely there, but the character after is tripping charindex
somehow.
EDIT: Simplified test
When outputting the string there is clearly a ., but charindex is not finding it?
select charindex('.', cast(0x6600640070002E000103650074006900 as nvarchar(max)))
Fiddle:
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=cda6f0a77fa9e206bff9217ec58be075
Best Answer
Apparently it seems related to a collation or encoding issue.
I'd check database and column collations to ensure both are the same.
Server collation
Database collation
Column collation
Quoted from MS-Docs Collation precedende
Let me suggest to have a look at this or this answers of Solomon Rutzky about UTF-8 and UTF-16 encoding values for VARCHAR and NVARCHAR.
By now you can try by CASTING it to VARCHAR(255) before use CHARINDEX.
I've tried to reproduce it in one of my servers in this way:
Note it returns the accent as a single character.