SQLite – Matching Separate Unicode Characters as One

csqlite

I am writing a C# program, and I'm querying a SQLite database. I have multiple diacritical characters (e.g ã, õ, ó, ê) which I need to match in my search. I have the command

"SELECT * FROM DesanoDict1 WHERE Desano LIKE '%" + string + "%' ORDER BY Desano ASC"

However, I have the problem where those diacritical characters are entered in as separate ones (i.e the ã is an a + ~). I could make a custom keyboard with the single unicode characters (as one instead of two separate characters), but the users that will be using the program will probably be using either the same or a similar keyboard to the one I'm currently using, so it will only create problems. I am very new to database and querying, so any help here would be greatly appreciated.

Best Answer

Unicode is a complex beast and database engines tend to be at best superficially aware of it. They certainly don't tend to know about sequence equivalences.

I would suggest making sure that you store data in a consistent normalised form, and normalise search parameters in the same way before searching. I'm not really a c# person, but https://msdn.microsoft.com/en-us/library/ebza6ck1(v=vs.110).aspx looks like a decent place to start for how to go about that.

Obviously you will have existing data that is not normalised, so if you go with this route you'll need to write something to scan the data and replace string values with normalised versions as needed.