I have a CSV file that is UTF-8 encoded. I'm using Python (psycopg2) to copy it into a Postgres database. My Postgres database is using UTF-8. However, I have problems displaying the Ä, Ö and Å characters.
Running the query SET client_encoding = 'ISO-8859-1'; fixes the issue.
My database looks like this (psql -l):

Name | Owner    | Encoding | Collate     | Ctype       | Access privileges
data | postgres | UTF8     | fi_FI.UTF-8 | fi_FI.UTF-8 |
My server locale is fi_FI.utf8
My CSV file is UTF-8. It's made with this Python code:
with open('users.csv', "w+", encoding="UTF-8", newline='') as f:
    writer = csv.writer(f)
    writer.writerows(list_table)
I'm writing it to a Postgres table with this code:

cur = conn.cursor()
cur.execute("CREATE SCHEMA IF NOT EXISTS data;")
cur.execute("DROP TABLE IF EXISTS data.users")
cur.execute("CREATE TABLE data.users(name VARCHAR(40), lastname VARCHAR(40), phone VARCHAR(13), email VARCHAR(100), iban VARCHAR(18), id VARCHAR(11))")
copy_sql = """
           COPY data.users FROM stdin WITH CSV HEADER
           DELIMITER as ',' ENCODING 'UTF-8'
           """
with open('users.csv', 'r') as f:
    cur.copy_expert(sql=copy_sql, file=f)
conn.commit()
cur.close()
First I used this code:

with open('users.csv', 'r') as f:
    next(f)
    cur.copy_from(f, 'data.users', sep=',')
cur.close()
I switched to copy_expert to get that ENCODING 'UTF-8' clause into the statement. However, it didn't change anything.
Now when I use pgAdmin 4 or the console (Linux Ubuntu 18.04 LTS terminal) I can't get the ÄÖÅ characters to show up. Mökkisuo comes out as "MÃ¶kkisuo" and Lammasjärvi as "LammasjÃ¤rvi". I'm using a basic query like select * from data.users to check the results.
However, if I change my client encoding to ISO-8859-1, everything shows up as it should (SET client_encoding = 'ISO-8859-1';).
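For reference, this symptom can be reproduced in plain Python, without any database involved (a sketch; the name is just one of my generated rows):

```python
# The correct string, as it exists in the UTF-8 CSV file.
original = "Mökkisuo"

# If the UTF-8 bytes are mistakenly decoded as Latin-1 somewhere along
# the way, the text becomes double-encoded ("mojibake").
mojibake = original.encode("utf-8").decode("latin-1")
print(mojibake)  # MÃ¶kkisuo

# Setting client_encoding = 'ISO-8859-1' effectively runs the reverse
# conversion on output, which is why the display "fixes" itself.
restored = mojibake.encode("latin-1").decode("utf-8")
print(restored)  # Mökkisuo
```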
EDIT:
Here are my locales:
~$ locale
LANG=fi_FI.utf8
LANGUAGE=
LC_CTYPE="fi_FI.utf8"
LC_NUMERIC="fi_FI.utf8"
LC_TIME="fi_FI.utf8"
LC_COLLATE="fi_FI.utf8"
LC_MONETARY="fi_FI.utf8"
LC_MESSAGES="fi_FI.utf8"
LC_PAPER="fi_FI.utf8"
LC_NAME="fi_FI.utf8"
LC_ADDRESS="fi_FI.utf8"
LC_TELEPHONE="fi_FI.utf8"
LC_MEASUREMENT="fi_FI.utf8"
LC_IDENTIFICATION="fi_FI.utf8"
LC_ALL=
~$ locale charmap
UTF-8
And here is how I get the bad result:
:~$ sudo su - postgres
postgres@Server:~$ psql
postgres=# \c data
data=# select * from data.users;
The result looks like this:
Piritta | MÃ¶kkisuo | 0 | piritta.mokkisuo@notreal.com
Oh, and the data is just generated; it's not real people's data.
What do I have to change to get this to show correctly in UTF-8? I don't understand where that conversion to ISO-8859-1 is coming from.
The file is UTF-8, the database is UTF-8, the locale is UTF-8, and so on. Why does the client encoding have to be changed to Latin-1 (ISO-8859-1) for the text to show up correctly? What is causing this? Everything should already be UTF-8, so why ISO-8859-1? Why doesn't UTF-8 show these characters?
Best Answer
I solved my own problem. Everything was right on the server side; the problem was a few little things. My first mistake was that the two text files I used in my code to generate the users were ANSI encoded. I had made them with Notepad on Windows and didn't think much of it.
Even though Visual Studio Code showed these ANSI-encoded text files correctly, and wrote their contents into the UTF-8 CSV file in a way that even Notepad++ displayed as proper UTF-8, the characters all broke when I pushed them to the database. I'm not sure why they showed up correctly in Notepad++; perhaps it applies some encoding detection behind the scenes.
I fixed this by converting those ANSI text files to UTF-8 with Notepad++.
However, my second mistake was that I didn't open those files as UTF-8! So even after I converted them to UTF-8, my Python code still read them as Latin-1 (ISO-8859-1). This time around they were broken before I pushed them to the database. It took me a while to notice, because the end result was the same as with the ANSI-encoded files.
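What happened can be reproduced with plain Python (the file name below is just an example):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "names.txt")

# Write a UTF-8 file, like my converted source files.
with open(path, "w", encoding="utf-8") as f:
    f.write("Mökkisuo")

# Mistake: reading it back as Latin-1 (which is effectively what my
# code did when I left the encoding out) mangles the text.
with open(path, "r", encoding="latin-1") as f:
    broken = f.read()
print(broken)  # MÃ¶kkisuo

# Correct: read with the same encoding the file was written in.
with open(path, "r", encoding="utf-8") as f:
    ok = f.read()
print(ok)  # Mökkisuo
```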
I corrected this by adding encoding="utf-8" to all my open statements. For example:

with open('users.csv', 'r', encoding="utf-8") as f:
I wanted to be sure that my script would open them as UTF-8 and not some weird format. This finally solved my issue! This was a really specific problem, so I'm not sure if it will help anyone else, but I wanted to post the solution here just in case. I hate those posts that just say the problem is now solved and never tell how.
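As a sanity check (the rows below are made up, like the rest of my generated data), a full write/read round trip with the encoding given explicitly on both sides keeps ÄÖÅ intact:

```python
import csv
import os
import tempfile

rows = [["Piritta", "Mökkisuo"], ["Matti", "Lammasjärvi"]]
path = os.path.join(tempfile.mkdtemp(), "users.csv")

# Pass encoding explicitly when writing...
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)

# ...and again when reading; never rely on the platform default.
with open(path, "r", encoding="utf-8", newline="") as f:
    assert list(csv.reader(f)) == rows  # Ä, Ö and Å survive
```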