Sql-server – Correct Code Page or Collation in SQL Server for Portuguese characters

collationsql-server-2008

I'm inserting data (Bulk insert and merge update) for a Portuguese data source (CSV files) into my database. Although the DB collation is Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data and data stored in NVARCHAR columns, the portuguese charaters are not correctly being saved.

i.e : Original Data

Regularização de Importação de mercadorias das ZEEs com Pagamento  diferido, com Isenção

Data in database after insert

Regularização de Importação de mercadorias das ZEEs com Pagamento  diferido, com Isenção

Here the import statement:

CREATE PROCEDURE [dbo].[SP_Import_Temp_CPC] 
-- Add the parameters for the stored procedure here
@FilePath [nvarchar](1000)
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;

TRUNCATE TABLE Temp_CPC;

-- Insert statements for procedure here
EXEC ('BULK INSERT Temp_CPC
FROM ''' + @FilePath + '''
WITH ( FIELDTERMINATOR = ''<,>'', ROWTERMINATOR =''\n'', FIRSTROW = 2, KEEPIDENTITY, CODEPAGE = ''ACP'',TABLOCK,ORDER(ITEMID ASC) );');
END

I've tried with "ACP,RAW and OEM" but get the same result.

Am I doing something wrong here?

Best Answer

You're not going to get the correct result until you tell BULK INSERT the code page of your .csv file. You tried ACP, RAW and OEM but you did not try the correct one. Which one is the correct one you ask? That would be something only you can answer, since you have the file and you (should) know what code page was originally created as! Perhaps MS-DOS Portuguese?

the other alternative is to create the .csv file itself as an Unicode file, but that goes back to your source of the .csv file and depends how that file is created (you bcp out, you have it from a friend, was always used by the original program author, etc etc).

the file is an Unicode one (UTF8 without BOM)

SQL Server does not support UTF-8 files for bulk insert. You must save your file as UCS-2LE or as ASCII with codepage 860. You can use libiconv to do the conversion if you cannot save the file again in the proper encoding.