Why Does SSIS Oracle Source by Attunity return DT_WSTR for Oracle CHAR(1) Column

oraclessisssis-2012

Using SSIS 2012 and Microsoft SSIS Connector for Oracle by Attunity v. 2.0.

When I add an Oracle Source for a particular table, it is setting the data type for an Oracle CHAR(1) field to DT_WSTR.

I have read documentation and searched the web, and I would expect an Oracle CHAR(1) to be DT_STR and not DT_WSTR.

Does anyone know why this is happening?

Further, when I try to set the data type in my Oracle Source to DT_STR for the column, the setting doesn't seem to stick.

Anyone encounter this, and do you have any suggestions as to how to work around it?

Best Answer

I believe I figured this out, but if anyone has better documentation or expertise, please do post a separate answer, and I will vote it up and accept it.

What I have discovered is that Oracle PL/SQL has a means for expressing the concepts of "byte semantics" and "character semantics" in PL/SQL when declaring character data types.

See the Length Semantics for Character Datatypes heading in this Oracle Documentation Web Page where it states:

Consider the size of characters when you specify the column length for character datatypes. You must consider this issue when estimating space for tables with columns that contain character data.

The length semantics of character datatypes can be measured in bytes or characters.

Byte semantics treat strings as a sequence of bytes. This is the default for character datatypes.

Character semantics treat strings as a sequence of characters. A character is technically a codepoint of the database character set.

While reviewing my Oracle source database, I am now noticing that the fields that have been imported as DT_WSTR are in fact declared using the PL/SQL syntax xxxCHAR(n CHAR) - where the the second "CHAR" is explicitly stating the character semantics - rather than the default byte semantics.

Thus, in my Oracle database, I believe I need to treat a column with database data type CHAR(1 CHAR) as a character datatype using character semantics, and I believe that means that the field can accommodate 1 character regardless of the byte length of the character set in use.

Given the above, SSIS (via the Microsoft Oracle Connector by Attunity) will promote non-NCHAR/NVARCHAR data types to DT_WSTR (as opposed to DT_STR).

And, following the above, I will need NCHAR and NVARCHAR SQL Server database data types when importing columns expressed using character semantics.

Once I do this, I will no longer have to convert DT_WSTR data types to DT_STR in my SSIS data flows (which was becoming quite tedious).

Note, I was already using NCHAR and NVARCHAR for fields declared as such on the Oracle side - expecting to provide the proper storage for these types on the SQL Server side.

UPDATE: Here is the docs.oracle.com web page that shows the syntax I describe as follows:

CHAR

You use the CHAR datatype to store fixed-length character data. How the data is represented internally depends on the database character set. The CHAR datatype takes an optional parameter that lets you specify a maximum size up to 32767 bytes. You can specify the size in terms of bytes or characters, where each character contains one or more bytes, depending on the character set encoding. The syntax follows:

CHAR[(maximum_size [CHAR | BYTE] )]