Does [:ascii:]
class work in Postgres at all? It is not listed in their help, however I see examples in the web which utilize it.
I have a UTF-8 database, where collation and c_type are en_US.UTF-8
, and Postgres version is 9.6.2.
When I search for non-ASCII rows like this:
select title from wallabag_entry where title ~ '[^[:ascii:]]';
I get both Unicode and non-Unicode symbols (full output is here):
Сталинская правозащитница: мать Меленкова бабушка Настя
Дневник НКВДиста Шабалина: Знает ли Москва положение на фронте?
Бег по городу и поездка на осле: как в средневековье наказывали прелюбодеев
Как комиссар Крекшин в 1740-е чуть не отменил историю России
Have you heard of Saint Death? Don’t pray to her.
Архаїчна українська мова: перевага чи недолік?
Гренада не их
China’s marriage rate is plummeting because women are choosing autonomy over
What is wrong with this query?
Best Answer
To answer your question:
[:ascii:]
works. You may have some characters in your text that you do not recognize as non-ASCII, yet they're there. They can be something like a non-breakable space, for instance, or any other Unicode space character.It is not strange to have non-breakable spaces (
) in texts that you copy-and-paste from a web page, yet you don't notice they're there.Here is an example to show:
That's what you get:
You can see from this, that your problem is the right-apostrophe character. ASCII only supports the apostrophe. Left-apostrophe and right-apostrophe are typographically correct Unicode extensions.
dbfiddle here
You can check it also with previous versions at http://rextester.com/UKIQ48014 (PostgreSQL 9.5) and http://sqlfiddle.com/#!15/4c563/1/0 (PostgreSQL 9.3)
The texts that I guess you think are pure ASCII, and are not:
dbfiddle here
These texts are using ’ instead of ' to mark apostrophes.
Check Punctuation: Why is the right single quote (U+2019), and not the semantically distinct apostrophe (U+0027), the preferred apostrophe character in Unicode? ... to see that you're not the first person encountering this problem.