SQL Server – 9 Digit Zip Codes Reporting Length of 12

openrowsetsql serversql-server-2012

I am attempting to resolve some difficulties in getting zip codes to display properly.

The original spreadsheet has zip codes of mixed 5 and 9 digit formats. After the import process, these 9 digit zip codes are reporting a length of 12 digits. Now when I try to add a hyphen '-' to the 9 digit zip codes I'm getting abnormal results and errors because of the wrong length and various data type conversion issues.

The import was performed using an openrowset method for importing the data from a spreadsheet.

When I query the newly imported data, I see the zip codes showing the same as they were in the spreadsheet, but the length is wrong.

SELECT ZIP,
    LEN(LTRIM(RTRIM(Zip))) AS ZIPLENGTH
  FROM XLS_IMPORT

ZIP         ZIPLENGTH
45750       5
432013256   12
441153221   12
44120       5
441351362   12

If I select the left 9 characters of the data, everything gets converted to float and the zip codes are now unreadable.

SELECT LEFT(ZIP,9) FROM XLS_IMPORT
WHERE LEN(LTRIM(RTRIM(ZIP))) = 12

ZIP
4.32013e+
4.42034e+
4.56637e+
4.41153e+
4.36045e+
4.41133e+

How can I get these zip codes back to the correct 9 digits? Or how can I add a hyphen to those 9 digit zip codes which are reporting length of 12? My end goal is to simply get the 9 digit zip code to have a hyphen in the middle.

The datatype of the Zip column is float.

I just discovered that some of my spreadsheets like NJ and NY have an apostrophe before the leading 0 in the zip code. I will need to investigate how to handle the '0xxxx zip codes to get this to work on some of my spreadsheet imports.

Best Answer

Zip codes are strings, not numbers. Some of them have 1 or even 2 (but not more than 2) leading zeroes. The datatype in the import table should be VARCHAR(10) so that it can hold 5 digit and 9 digit + hyphen zip codes. Even if you never have to store postal codes of other countries, and even if the values will only ever have numeric digits (i.e. 0 - 9), this data is still string data, just like phone numbers.

Based on the import query shown in your other question (automate import and export of process EXCEL -> SQL SERVER - > EXCEL without using SSIS ):

SELECT * INTO XLS_IMPORT
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 12.0; Database=C:\RSG_ETL_Tool\Ohio\OH.xls; HDR=YES; IMEX=1',
'SELECT * FROM [OH$]');

I would suggest not relying upon the SELECT INTO construct to create the XLS_IMPORT table, but instead create the import table(s) manually, and then use the INSERT INTO ... SELECT FROM OPENROWSET() construct. Doing this would allow you to do the following things to improve this situation:

Create the ZipCode field as VARCHAR(10)
Convert the values on the way in using the STR function which, for an initial FLOAT value of 432013256, will return 432013256 instead of 4.32013e+008 (which is what you get when you convert to VARCHAR).

Fix any missing leading zeroes using something like the following:

CASE
   WHEN LEN(LTRIM(STR(@ZipColumn))) BETWEEN 3 AND 4
          THEN RIGHT('0000' + LTRIM(STR(@ZipColumn)), 5)
   WHEN LEN(LTRIM(STR(@ZipColumn))) BETWEEN 7 AND 8
          THEN RIGHT('0000' + LTRIM(STR(@ZipColumn)), 9)
   WHEN LEN(LTRIM(STR(@ZipColumn))) IN (5, 9) THEN LTRIM(STR(@ZipColumn))
   ELSE 'BadZipCode'
END

Example:

DECLARE @ZipColumn FLOAT = 032013256.000000;

SELECT CASE
          WHEN LEN(LTRIM(STR(@ZipColumn))) BETWEEN 3 AND 4
                 THEN RIGHT('0000' + LTRIM(STR(@ZipColumn)), 5)
          WHEN LEN(LTRIM(STR(@ZipColumn))) BETWEEN 7 AND 8
                 THEN RIGHT('0000' + LTRIM(STR(@ZipColumn)), 9)
          WHEN LEN(LTRIM(STR(@ZipColumn))) IN (5, 9) THEN LTRIM(STR(@ZipColumn))
          ELSE 'BadZipCode'
       END;

Returns:

032013256

Ideally, you would fix the column definition in the spreadsheet to be a string. But even if you do that, it might still be a good idea to keep this code around.

My end goal is to simply get the 9 digit zip code to have a hyphen in the middle.

With that goal in mind, the following inline TVF can be used to both convert the FLOAT value to VARCHAR, and to add in the hyphen for ZIP + 4 values.

Code for iTVF:

CREATE FUNCTION dbo.FormatZIPCode(@NumericZIPCode FLOAT)
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN

WITH string AS
(
    SELECT  LTRIM(STR(@NumericZIPCode)) AS [Value],
            LEN(LTRIM(STR(@NumericZIPCode))) AS [Size]
), converted AS
(
SELECT  CASE
                WHEN st.[Value] IS NULL THEN NULL
                WHEN st.[Size] BETWEEN 3 AND 4
                    THEN RIGHT('0000' + st.[Value], 5)
                WHEN st.[Size] BETWEEN 7 AND 8
                    THEN RIGHT('0000' + st.[Value], 9)
                WHEN st.[Size] IN (5, 9)
                    THEN st.[Value]
                ELSE 'BadZipCode'
            END AS [ZIP],
            st.[Size] AS [OriginalSize]
    FROM        string st
)
SELECT  IIF(cnv.[OriginalSize] >= 7, STUFF(cnv.[ZIP], 6, 0, '-'), cnv.[ZIP])
               AS [FormattedZIPCode]
FROM        converted cnv;

Test:

SELECT  *
FROM    (VALUES (CONVERT(FLOAT, NULL)), (1), (12), (123), (1234), (12345),
                (123456), (1234567), (12345678), (123456789)) src(val)
CROSS APPLY dbo.FormatZIPCode(src.[val]) frmt;

Returns:

val         FormattedZIPCode
---------   ----------------
NULL        NULL
1           BadZipCode
12          BadZipCode
123         00123
1234        01234
12345       12345
123456      BadZipCode
1234567     00123-4567
12345678    01234-5678
123456789   12345-6789

In order to be clearer about what is being suggested, the following shows all of the suggestions noted above put together:

CREATE TABLE dbo.XLS_IMPORT
(
  Col1     DataTypeForCol1,
  Col2     DataTypeForCol2,
  ZIPCode  VARCHAR(10),
  ...
);

INSERT INTO dbo.XLS_IMPORT (Col1, Col2, ZIPCode, ...)
  SELECT xls.Col1, xls.Col2, zip.[FormattedZIPCode], ...
  FROM   OPENROWSET('Microsoft.ACE.OLEDB.12.0',
         'Excel 12.0; Database=C:\RSG_ETL_Tool\Ohio\OH.xls; HDR=YES; IMEX=1',
         'SELECT * FROM [OH$]')
  CROSS APPLY dbo.FormatZIPCode(xls.[ZIP]) zip;

Some of the spreadsheets are "aware" of there being leading zeroes, and so prefixed the field in Excel with a single apostrophe so that Excel would treat the value as a string instead of as numeric (e.g. '01234). In that case, you can use the REPLACE function to strip out that apostrophe.

-- Test incoming string data (potentially prefixed with a single apostrophe)
SELECT  src.[val], frmt.[FormattedZIPCode],CHARINDEX(N'''', src.[val])
FROM    (VALUES (NULL), (N'''01234'), (N'''123456789'), (N'123'), (N'12345678')) src(val)
CROSS APPLY dbo.FormatZIPCode(REPLACE(src.[val], N'''', N'')) frmt;

However, you can't use REPLACE on all spreadsheets because the spreadsheets that have the ZIP column as numeric will do a CONVERT_IMPLICIT as it passes the value to the REPLACE function and the converted value will be in scientific notation (e.g. 1.23457e+008). So if you can't be certain which datatype is going to be returned by OPENROWSET, then you can use IIF (or CASE if using a version of SQL Server prior to 2012) and CHARINDEX to test for the presence of an apostrophe. If the data is being returned as FLOAT, then the implicit conversion to VARCHAR that will occur when passing the values into CHARINDEX will not be an issue since there is no apostrophe in 1.23457e+008 and the converted value is only going to CHARINDEX and not to dbo.FormatZIPCode.

-- Test handling incoming data as both numeric and string
SELECT  src.[val], frmt.[FormattedZIPCode],CHARINDEX(N'''', src.[val])
FROM    (VALUES (CONVERT(FLOAT, NULL)), (1), (12), (123), (1234), (12345),
                (123456), (1234567), (12345678), (123456789)) src(val)
CROSS APPLY dbo.FormatZIPCode(
       IIF(CHARINDEX(N'''', src.[val]) > 0, REPLACE(src.[val], N'''', N''), src.[val])
                             ) frmt;

SELECT  src.[val], frmt.[FormattedZIPCode]
FROM    (VALUES (N'''01234'), (N'''123456789'), (N'123'), (N'12345678')) src(val)
CROSS APPLY dbo.FormatZIPCode(
       IIF(CHARINDEX(N'''', src.[val]) > 0, REPLACE(src.[val], N'''', N''), src.[val])
                             ) frmt;

Endpoint configuration

One of the most common causes are misconfigured endpoints for the Database Mirroring role. There are various options and configurations that can be used and sometimes, especially when created by hand, those configuration options do not match. Take the OPs example:

CREATE ENDPOINT dbm_endpoint
    STATE=STARTED 
    AS TCP (LISTENER_PORT=5022) 
    FOR DATABASE_MIRRORING (ROLE=ALL)

This creates an endpoint called dbm_endpoint on tcp port 5022 for database mirroring. However it leaves out quite a bit of configuration options to which we assume are defaulted but we really do not know. This means the default options such as Windows authentication using either Kerberos or NTLM, it also doesn't specify which IP address (v4, v6, interface) is used, nor does it specify encryption options to which the default is REQUIRED and depending on the version of SQL Serve either RC4 or AES.

That is quite the assumption to make, especially when it isn't working. If you're not sure what the endpoint is set to, join sys.tcp_endpoints with sys.database_mirroring_endpoints to turn the resulting single row to individual configuration options.

There are advanced configurations, for example, that use certificates since they can't use Windows authentication. We don't know if the OP has a proper domain and if these servers are joined. Again, the assumption based off the endpoints is that they are, however we don't know for sure. When using certificates the largest issue is that the same certificate is used on multiple replicas yet the private key is not exported when copying the certificate to import to the replicas. Since the endpoints use the private/public key pair, using the same certificate will require that endpoint at some time to use the private key. Make sure the certificate is exported and imported with both keys if you're using it on multiple replicas.

When specifying the IP Address, the default is ANY which could be any of the IPv4 or IPv6 addresses. Though most places don't register or use IPv6 addresses, I do see it as an issue every once and a while. Checking to see what addresses are registered to a host in your DNS can help narrow the issue. You can also setup a network capture and filter based on the port. Every once and a while an IPv6 address is used and it's improperly setup, disabled, or firewalled on other servers but is returned via DNS.

Availability Group Replica Specification

When creating the availability group each replica will need to be specified and as such will need an endpoint url which will define the communications port (database mirroring endpoint).

Taking the OPs example:

REPLICA ON
        'SQL1' WITH
            (
                ENDPOINT_URL = 'TCP://sql1.sql.sandbox.net:7022',
                PRIMARY_ROLE ( ALLOW_CONNECTIONS = READ_WRITE),
                SECONDARY_ROLE (ALLOW_CONNECTIONS=READ_ONLY),
                AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
                FAILOVER_MODE = MANUAL
            ),

This replica, called SQL1 has a TCP endpoint on sql1.sql.sandbox.net port 7022. the address seems to be a FQDN and should be checked that the proper DNS lookups, (if using) Kerberos SPNs, and firewall rules exist or are created. If there are multiple interfaces or addresses that are returned on a lookup of the address, then it may make sense to narrow it down to a specific address in the endpoint url for testing or to narrow it down in the mirroring endpoint specification.

SQL Server Permissions

One of the last surface areas for the disconnected issues to appear is the endpoint permissions on each replica. Since each endpoint the account will need the CONNECT permission (similar if using certificates) in order to use the endpoints. This is a fairly common issue, so common that this is marked in the errorlog and gives you the fix in the error message which is to grant the connect permission to the endpoint.

Windows Clustering

Since the replica manager replies upon Windows Server Failover Clustering to be working (not counting Linux, Docker, or Read-Scale) in order to start up, do make sure the cluster service is running on each replica and each node shows an up status in the cluster (Powershell, Failover Cluster Manager, etc.). If the cluster service is not running, then the manager will stop until it comes back up - this will be logged in the errorlog and should be very evident.

Troubleshooting Methodology

When nothing else works or shows up, what do you do?

Availability groups are a distributed system, this means there are many individual parts (replicas) that are distributed among many areas (databases) that function through a common set of protocols (availability groups). Given this, the troubleshooting methodology differs from typical single instance or clustered instance troubleshooting.

Each part needs to be checked against itself and against the overall metadata available. Additionally, each part needs to have the intermediary items checked against each other intermediary (such as firewalls, network cards, load balancers, etc.).

Start by checking the errorlog on each replica as only some errors (due to where the error occurs) are only applicable when the replica is primary or secondary. After an overview of each errorlog, this should point in a direction - if it does not, then most likely the issue is under SQL Server which could mean networking, firewall, OS, etc. System and application logs can be helpful here as a next step.

Look out for NDIS drivers installed on the system as they could be blocking traffic or incorrectly working and dropping traffic. Additionally, do check the interface diagnostics for the network cards as things such as dropped/discarded packets may be an issue if the disconnected state is intermittent.

When all else fails, check every piece of metadata against the other and make sure all match and are the same. In the OPs metadata specified above, one port was on 7022 and the other on 5022. While this in of itself is not incorrect and can be used, it does make issues in other places such as firewall rules where someone can easily misread the port or assume and not read that the ports are the same.

Depending upon skill level and knowledge, a network trace from the primary side and the secondary side would be ideal. This can show the packets leaving the primary/secondary and hopefully arriving at the opposite side. If it does not, an intermediary is not configured correctly - look for firewalls, load balancers, routers, etc., that are stopping the traffic.

Sql-server – sysjobhistory “sql_severity” codes

From here, the sql_severity column is the "severity of any SQL Server error."

When this column has the value 0, that means either no message was returned (and it's therefore a magic value), or a message with severity 0 was returned. I would argue this is a bad table design, as the column does not allow NULL.

It's almost the same thing with the sql_message_id column. This column will contain 0 when no message was returned -- in this case, though, it's purely a magic value, as there's no message id 0 (run SELECT * FROM sys.messages WHERE message_id = 0;).

You may want to clarify the intent of the report query you've inherited, because only comparing the sql_severity column may not do exactly what the author thought. (Severity 0 errors being raised are undoubtedly rare, but hey, you asked.)

If it's supposed to look for steps that returned a message, then it should check for sql_message_id > 0. As mentioned, it's completely possible that a job step can have a sql_severity > 0 and/or sql_message_id > 0 while still succeed. ^**
If the query is supposed to look for failed steps, it should be looking at the run_status column instead.

_{** You can try this yourself by creating and running a job with a T-SQL step that simply calls RAISERROR.}