SQL Server – 9 Digit Zip Codes Reporting Length of 12

openrowsetsql serversql-server-2012

I am attempting to resolve some difficulties in getting zip codes to display properly.

The original spreadsheet has zip codes of mixed 5 and 9 digit formats. After the import process, these 9 digit zip codes are reporting a length of 12 digits. Now when I try to add a hyphen '-' to the 9 digit zip codes I'm getting abnormal results and errors because of the wrong length and various data type conversion issues.

The import was performed using an openrowset method for importing the data from a spreadsheet.

When I query the newly imported data, I see the zip codes showing the same as they were in the spreadsheet, but the length is wrong.

SELECT ZIP,
    LEN(LTRIM(RTRIM(Zip))) AS ZIPLENGTH
  FROM XLS_IMPORT
ZIP         ZIPLENGTH
45750       5
432013256   12
441153221   12
44120       5
441351362   12

If I select the left 9 characters of the data, everything gets converted to float and the zip codes are now unreadable.

SELECT LEFT(ZIP,9) FROM XLS_IMPORT
WHERE LEN(LTRIM(RTRIM(ZIP))) = 12
ZIP
4.32013e+
4.42034e+
4.56637e+
4.41153e+
4.36045e+
4.41133e+

How can I get these zip codes back to the correct 9 digits? Or how can I add a hyphen to those 9 digit zip codes which are reporting length of 12? My end goal is to simply get the 9 digit zip code to have a hyphen in the middle.

The datatype of the Zip column is float.

I just discovered that some of my spreadsheets like NJ and NY have an apostrophe before the leading 0 in the zip code. I will need to investigate how to handle the '0xxxx zip codes to get this to work on some of my spreadsheet imports.

Best Answer

Zip codes are strings, not numbers. Some of them have 1 or even 2 (but not more than 2) leading zeroes. The datatype in the import table should be VARCHAR(10) so that it can hold 5 digit and 9 digit + hyphen zip codes. Even if you never have to store postal codes of other countries, and even if the values will only ever have numeric digits (i.e. 0 - 9), this data is still string data, just like phone numbers.

Based on the import query shown in your other question (automate import and export of process EXCEL -> SQL SERVER - > EXCEL without using SSIS ):

SELECT * INTO XLS_IMPORT
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 12.0; Database=C:\RSG_ETL_Tool\Ohio\OH.xls; HDR=YES; IMEX=1',
'SELECT * FROM [OH$]');

I would suggest not relying upon the SELECT INTO construct to create the XLS_IMPORT table, but instead create the import table(s) manually, and then use the INSERT INTO ... SELECT FROM OPENROWSET() construct. Doing this would allow you to do the following things to improve this situation:

  1. Create the ZipCode field as VARCHAR(10)
  2. Convert the values on the way in using the STR function which, for an initial FLOAT value of 432013256, will return 432013256 instead of 4.32013e+008 (which is what you get when you convert to VARCHAR).
  3. Fix any missing leading zeroes using something like the following:

    CASE
       WHEN LEN(LTRIM(STR(@ZipColumn))) BETWEEN 3 AND 4
              THEN RIGHT('0000' + LTRIM(STR(@ZipColumn)), 5)
       WHEN LEN(LTRIM(STR(@ZipColumn))) BETWEEN 7 AND 8
              THEN RIGHT('0000' + LTRIM(STR(@ZipColumn)), 9)
       WHEN LEN(LTRIM(STR(@ZipColumn))) IN (5, 9) THEN LTRIM(STR(@ZipColumn))
       ELSE 'BadZipCode'
    END
    

Example:

DECLARE @ZipColumn FLOAT = 032013256.000000;

SELECT CASE
          WHEN LEN(LTRIM(STR(@ZipColumn))) BETWEEN 3 AND 4
                 THEN RIGHT('0000' + LTRIM(STR(@ZipColumn)), 5)
          WHEN LEN(LTRIM(STR(@ZipColumn))) BETWEEN 7 AND 8
                 THEN RIGHT('0000' + LTRIM(STR(@ZipColumn)), 9)
          WHEN LEN(LTRIM(STR(@ZipColumn))) IN (5, 9) THEN LTRIM(STR(@ZipColumn))
          ELSE 'BadZipCode'
       END;

Returns:

032013256

Ideally, you would fix the column definition in the spreadsheet to be a string. But even if you do that, it might still be a good idea to keep this code around.

My end goal is to simply get the 9 digit zip code to have a hyphen in the middle.

With that goal in mind, the following inline TVF can be used to both convert the FLOAT value to VARCHAR, and to add in the hyphen for ZIP + 4 values.

Code for iTVF:

CREATE FUNCTION dbo.FormatZIPCode(@NumericZIPCode FLOAT)
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN

WITH string AS
(
    SELECT  LTRIM(STR(@NumericZIPCode)) AS [Value],
            LEN(LTRIM(STR(@NumericZIPCode))) AS [Size]
), converted AS
(
SELECT  CASE
                WHEN st.[Value] IS NULL THEN NULL
                WHEN st.[Size] BETWEEN 3 AND 4
                    THEN RIGHT('0000' + st.[Value], 5)
                WHEN st.[Size] BETWEEN 7 AND 8
                    THEN RIGHT('0000' + st.[Value], 9)
                WHEN st.[Size] IN (5, 9)
                    THEN st.[Value]
                ELSE 'BadZipCode'
            END AS [ZIP],
            st.[Size] AS [OriginalSize]
    FROM        string st
)
SELECT  IIF(cnv.[OriginalSize] >= 7, STUFF(cnv.[ZIP], 6, 0, '-'), cnv.[ZIP])
               AS [FormattedZIPCode]
FROM        converted cnv;

Test:

SELECT  *
FROM    (VALUES (CONVERT(FLOAT, NULL)), (1), (12), (123), (1234), (12345),
                (123456), (1234567), (12345678), (123456789)) src(val)
CROSS APPLY dbo.FormatZIPCode(src.[val]) frmt;

Returns:

val         FormattedZIPCode
---------   ----------------
NULL        NULL
1           BadZipCode
12          BadZipCode
123         00123
1234        01234
12345       12345
123456      BadZipCode
1234567     00123-4567
12345678    01234-5678
123456789   12345-6789

In order to be clearer about what is being suggested, the following shows all of the suggestions noted above put together:

CREATE TABLE dbo.XLS_IMPORT
(
  Col1     DataTypeForCol1,
  Col2     DataTypeForCol2,
  ZIPCode  VARCHAR(10),
  ...
);

INSERT INTO dbo.XLS_IMPORT (Col1, Col2, ZIPCode, ...)
  SELECT xls.Col1, xls.Col2, zip.[FormattedZIPCode], ...
  FROM   OPENROWSET('Microsoft.ACE.OLEDB.12.0',
         'Excel 12.0; Database=C:\RSG_ETL_Tool\Ohio\OH.xls; HDR=YES; IMEX=1',
         'SELECT * FROM [OH$]')
  CROSS APPLY dbo.FormatZIPCode(xls.[ZIP]) zip;

Some of the spreadsheets are "aware" of there being leading zeroes, and so prefixed the field in Excel with a single apostrophe so that Excel would treat the value as a string instead of as numeric (e.g. '01234). In that case, you can use the REPLACE function to strip out that apostrophe.

-- Test incoming string data (potentially prefixed with a single apostrophe)
SELECT  src.[val], frmt.[FormattedZIPCode],CHARINDEX(N'''', src.[val])
FROM    (VALUES (NULL), (N'''01234'), (N'''123456789'), (N'123'), (N'12345678')) src(val)
CROSS APPLY dbo.FormatZIPCode(REPLACE(src.[val], N'''', N'')) frmt;

However, you can't use REPLACE on all spreadsheets because the spreadsheets that have the ZIP column as numeric will do a CONVERT_IMPLICIT as it passes the value to the REPLACE function and the converted value will be in scientific notation (e.g. 1.23457e+008). So if you can't be certain which datatype is going to be returned by OPENROWSET, then you can use IIF (or CASE if using a version of SQL Server prior to 2012) and CHARINDEX to test for the presence of an apostrophe. If the data is being returned as FLOAT, then the implicit conversion to VARCHAR that will occur when passing the values into CHARINDEX will not be an issue since there is no apostrophe in 1.23457e+008 and the converted value is only going to CHARINDEX and not to dbo.FormatZIPCode.

-- Test handling incoming data as both numeric and string
SELECT  src.[val], frmt.[FormattedZIPCode],CHARINDEX(N'''', src.[val])
FROM    (VALUES (CONVERT(FLOAT, NULL)), (1), (12), (123), (1234), (12345),
                (123456), (1234567), (12345678), (123456789)) src(val)
CROSS APPLY dbo.FormatZIPCode(
       IIF(CHARINDEX(N'''', src.[val]) > 0, REPLACE(src.[val], N'''', N''), src.[val])
                             ) frmt;

SELECT  src.[val], frmt.[FormattedZIPCode]
FROM    (VALUES (N'''01234'), (N'''123456789'), (N'123'), (N'12345678')) src(val)
CROSS APPLY dbo.FormatZIPCode(
       IIF(CHARINDEX(N'''', src.[val]) > 0, REPLACE(src.[val], N'''', N''), src.[val])
                             ) frmt;