SQL Server – How to Test if a String is a Palindrome Using T-SQL

Tags: functions, sql-server, sql-server-2012, t-sql

I am a beginner in T-SQL. I want to determine whether an input string is a palindrome, with output = 0 if it is not and output = 1 if it is. I am still figuring out the syntax, and I am not even getting an error message. I am looking for different solutions and some feedback, to gain a better understanding of how T-SQL works and to become better at it. I am still a student.

The key idea, as I see it, is to compare the leftmost and rightmost characters to each other to check for equality, then go on to compare the second character from the left with the second-to-last one, and so on. We do a loop: if the characters are equal to each other, we continue. If we get all the way through, we output 1; if not, we output 0.

Would you please critique:

CREATE function Palindrome(
    @String  Char
    , @StringLength  Int
    , @n Int
    , @Palindrome BIN
    , @StringLeftLength  Int
)
RETURNS Binary
AS
BEGIN
SET @ n=1
SET @StringLength= Len(String)

  WHILE @StringLength - @n >1

  IF
  Left(String,@n)=Right(String, @StringLength)

 SET @n =n+1
 SET @StringLength =StringLength -1

 RETURN @Binary =1

 ELSE RETURN @Palindrome =0

END

I think I am on the right track, but I am still a long way off. Any ideas?

Best Answer

Since there are a fair number of solutions, I'm going to go with the "critique" part of your question. A couple of notes: I've fixed some typos and noted where I did. If I'm wrong about them being typos, mention it in the comments and I'll explain what's going on. I'm also going to point out several things that you may already know, so please don't take offense. Some comments may seem picky, but I don't know where you are in your journey, so I have to assume you are just starting out.

CREATE function Palindrome (
    @String  Char
    , @StringLength  Int
    , @n Int
    , @Palindrome BIN
    , @StringLeftLength  Int

ALWAYS include the length with a char or varchar definition. Without one, the length defaults to 1 in a declaration, so @String Char can only ever hold a single character. Aaron Bertrand talks about it in depth here. He is talking about varchar, but the same goes for char. I'd use varchar(255) if you only want relatively short strings, or varchar(8000) or even varchar(max) for larger ones. Varchar is for variable-length strings; char is only for fixed-length ones. Since you don't know the length of the string being passed in, use varchar. Also, the type is binary, not bin.
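A quick way to see the default-length problem for yourself is a two-variable script like the one below. It is a sketch that assumes SQL Server 2008 or later (for DECLARE with an initializer); the variable names are made up for illustration.

```sql
-- No length given: char defaults to char(1), so the value is silently truncated
DECLARE @NoLength char = 'racecar';

-- Explicit length: the whole string fits
DECLARE @WithLength varchar(255) = 'racecar';

SELECT @NoLength AS NoLength,      -- 'r'
       @WithLength AS WithLength;  -- 'racecar'
```

No error or warning is raised when a variable assignment truncates, which is why this bug is so easy to miss.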

Next, you don't need to put all of those variables in the parameter list. Declare them within your code instead. Only put something in the parameter list if you plan on passing it in (a T-SQL function can't pass values back out through its parameters; that's what RETURN is for). (You'll see how this looks at the end.) Also, you have @StringLeftLength but never use it, so I'm not going to declare it.

The next thing I'm going to do is re-format a bit to make a few things obvious.

BEGIN
    SET @n=1
    SET @StringLength = Len(@String) -- Missed an @

    WHILE @StringLength - @n >1 
        IF Left(@String,@n)=Right(@String, @StringLength) -- More missing @s
            SET @n = @n + 1 -- Another missing @

    SET @StringLength = @StringLength - 1  -- Watch those @s :)

    RETURN @Palindrome = 1 -- Assuming another typo here 

    ELSE 
        RETURN @Palindrome =0

END

If you look at the way I did the indenting you'll notice that I have this:

    WHILE @StringLength - @n >1 
        IF Left(@String,@n)=Right(@String, @StringLength)
            SET @n = @n + 1

That's because commands like WHILE and IF only affect the single statement that follows them, regardless of indentation or line breaks. You have to use a BEGIN .. END block if you want them to cover multiple statements. So fixing that we get:
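Here is a small, self-contained script (the variable names are hypothetical) that shows the effect. The indentation in the first loop suggests both SET statements belong to it, but only the first one does; the second loop wraps them in BEGIN .. END:

```sql
DECLARE @i int = 0, @Other int = 0;

-- Indentation means nothing to T-SQL: WHILE owns only the next statement
WHILE @i < 3
    SET @i = @i + 1;          -- runs three times (inside the loop)
    SET @Other = @Other + 1;  -- runs once, after the loop has finished

SELECT @i AS i, @Other AS Other;  -- i = 3, Other = 1

-- Wrapping both statements in BEGIN .. END puts them both in the loop
SET @i = 0;
SET @Other = 0;
WHILE @i < 3
BEGIN
    SET @i = @i + 1;
    SET @Other = @Other + 1;
END;

SELECT @i AS i, @Other AS Other;  -- i = 3, Other = 3
```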

    WHILE @StringLength - @n > 1 
        IF Left(@String,@n)=Right(@String, @StringLength)
            BEGIN 
                SET @n = @n + 1
                SET @StringLength = @StringLength - 1
                RETURN @Palindrome = 1 
            END
        ELSE 
            RETURN @Palindrome = 0

You'll notice that I only added a BEGIN .. END block in the IF. That's because even though the IF statement is multiple lines long (and even contains multiple commands) it is still a single statement (covering everything performed in the IF and the ELSE portions of the statement).

Next, you'll get an error at both of your RETURNs. RETURN takes a variable or a literal (or another expression), but you can't set the variable and return it in the same statement. Set the variable in each branch and return it once at the end:

                SET @Palindrome = 1 
            END
        ELSE 
            SET @Palindrome = 0

    RETURN @Palindrome

Now we are into the logic. First, let me point out that the LEFT and RIGHT functions you are using are great, but they return as many characters as you ask for, counted from that end of the string. So let's say you passed in the word "test". On the first two passes you are going to get this (with the variables replaced by their values):

LEFT('test',1) = RIGHT('test',4)
    t          =      test

LEFT('test',2) = RIGHT('test',3)
    te         =      est

Obviously that isn't what you expected. What you really want is SUBSTRING, which lets you pass in not only the starting point but also the length. So you would get:

SUBSTRING('test',1,1) = SUBSTRING('test',4,1)
         t            =         t

SUBSTRING('test',2,1) = SUBSTRING('test',3,1)
         e            =         s
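You can check these results yourself with a short script; @s is just a scratch variable (not part of the function) holding the same word:

```sql
DECLARE @s varchar(255) = 'test';

-- LEFT/RIGHT return a run of characters counted from that end of the string
SELECT LEFT(@s, 1)  AS LeftOne,     -- 't'
       RIGHT(@s, 4) AS RightFour,   -- 'test'
       LEFT(@s, 2)  AS LeftTwo,     -- 'te'
       RIGHT(@s, 3) AS RightThree;  -- 'est'

-- SUBSTRING(expression, start, length) with length 1 returns a single character
SELECT SUBSTRING(@s, 1, 1) AS Pos1,  -- 't'
       SUBSTRING(@s, 4, 1) AS Pos4,  -- 't'
       SUBSTRING(@s, 2, 1) AS Pos2,  -- 'e'
       SUBSTRING(@s, 3, 1) AS Pos3;  -- 's'
```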

Next, you are only incrementing the loop variables in one branch of the IF statement. On a mismatch the ELSE branch runs, nothing changes, and the loop never ends. Pull the incrementing out of the IF entirely. That requires an additional BEGIN .. END block around the loop body, but I do get to remove the other one.

        WHILE @StringLength - @n > 1 
            BEGIN
                IF SUBSTRING(@String,@n,1) = SUBSTRING(@String, @StringLength,1)
                    SET @Palindrome = 1 
                ELSE 
                    SET @Palindrome = 0

                SET @n = @n + 1
                SET @StringLength = @StringLength - 1
            END

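You also need to change your WHILE condition so the last pair gets tested. With @StringLength - @n > 1 the loop stops one comparison early: for "test" it exits before comparing positions 2 and 3.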

        WHILE @StringLength > @n

And last but not least, the way it stands now we don't test the middle character when the string has an odd number of characters. For example, with 'ana' the 'n' is never compared. That's fine, since the middle character always matches itself, but it does mean we need to account for a single-letter word (if you want it to count as a palindrome), because the loop body never runs for it. We can do that by setting @Palindrome to 1 up front.

And now we finally have:

CREATE FUNCTION Palindrome (@String  varchar(255)) 
RETURNS Binary
AS

    BEGIN
        DECLARE @StringLength  Int
            , @n Int
            , @Palindrome binary

        SET @n = 1
        SET @StringLength = Len(@String)
        SET @Palindrome = 1

        WHILE @StringLength > @n 
            BEGIN
                IF SUBSTRING(@String,@n,1) = SUBSTRING(@String, @StringLength,1)
                    SET @Palindrome = 1 
                ELSE 
                    SET @Palindrome = 0

                SET @n = @n + 1
                SET @StringLength = @StringLength - 1
            END
        RETURN @Palindrome
    END

One last comment. I'm a big fan of formatting in general. It can really help you to see how your code works and help to point out possible mistakes.

Edit

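As Sphinxxx mentioned, we still have a flaw in our logic. Once we hit the ELSE and set @Palindrome to 0, a later matching pair sets it right back to 1, so a string like 'abbc' comes back as a palindrome. There is also no point in continuing once a mismatch is found; at that point we can just RETURN.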

                IF SUBSTRING(@String,@n,1) = SUBSTRING(@String, @StringLength,1)
                    SET @Palindrome = 1 
                ELSE 
                    RETURN 0

Given that we are now only using @Palindrome for "it's still possible this is a palindrome" there is really no point in having it. We can get rid of the variable and switch our logic to short circuit on failure (the RETURN 0) and RETURN 1 (a positive response) only if it makes it all the way through the loop. You'll notice this actually simplifies our logic somewhat.

CREATE FUNCTION Palindrome (@String  varchar(255)) 
RETURNS Binary
AS

    BEGIN
        DECLARE @StringLength  Int
            , @n Int

        SET @n = 1
        SET @StringLength = Len(@String)

        WHILE @StringLength > @n 
            BEGIN
                IF SUBSTRING(@String,@n,1) <> SUBSTRING(@String, @StringLength,1)
                    RETURN 0

                SET @n = @n + 1
                SET @StringLength = @StringLength - 1
            END
        RETURN 1
    END
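If you want the 0/1 output from your question rather than a binary value, one option is to return bit instead: RETURNS Binary means binary(1), so the function above hands back 0x01 and 0x00. The sketch below keeps the same loop but uses bit and a hypothetical name, dbo.IsPalindrome, and it shows that a scalar function has to be called with its schema name. It assumes a default case-insensitive collation (under which 'Racecar' also counts as a palindrome); keep in mind too that LEN ignores trailing spaces.

```sql
-- Hypothetical name dbo.IsPalindrome; same logic as above, but RETURNS bit
CREATE FUNCTION dbo.IsPalindrome (@String varchar(255))
RETURNS bit
AS
BEGIN
    DECLARE @Left int = 1,
            @Right int = LEN(@String);  -- LEN ignores trailing spaces

    WHILE @Right > @Left
    BEGIN
        -- Compare the characters at both ends; stop at the first mismatch
        IF SUBSTRING(@String, @Left, 1) <> SUBSTRING(@String, @Right, 1)
            RETURN 0;

        SET @Left = @Left + 1;
        SET @Right = @Right - 1;
    END;

    -- Every pair matched (or there was at most one character)
    RETURN 1;
END;
GO

-- Scalar UDFs must be called with at least a two-part (schema.name) name
SELECT dbo.IsPalindrome('racecar') AS Racecar,     -- 1
       dbo.IsPalindrome('ana')     AS Ana,         -- 1
       dbo.IsPalindrome('abbc')    AS Abbc,        -- 0
       dbo.IsPalindrome('a')       AS SingleChar;  -- 1
```

T-SQL also has a built-in REVERSE function, so a comparison such as @String = REVERSE(@String) expresses a very similar check in one line; the loop is still a good exercise for learning control flow.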

Related Solutions

SQL Server to MySQL Migration – Remove UCS-2 Surrogate Pairs

You need to take the data from UTF-8 and convert it into UCS-2LE using something like iconv. For example, using the character in your example:

echo "010000: dcb3" | xxd -r -s -0x10000 | iconv -f "UTF-8" -t "UCS-2LE" | xxd
0000000: 3307

Now I'm not sure what character UTF-8 \xdcb3 is, but apparently its correct translation to UCS-2LE is \U0733. If you have 0xDCB3 in SQL Server, it means the data was not translated into UCS-2LE before the import. You should not have surrogates in the NVARCHAR fields; UCS-2 is "surrogate agnostic". See UCS-2 vs. UTF-16 (not quite Kramer vs. Kramer).

I'm not an expert in the MySQL tool set so I can't say what step is missing that was supposed to do the iconv.

Update

To locate the records with surrogates you must turn to the binary representation, since any character function will treat the surrogates as 'special'. Luckily, the string manipulation functions work on binary too, with the expected semantics. For example, CHARINDEX (note the byte order: NCHAR(0xdc83) is stored little-endian, as the bytes 0x83 0xdc):

create table test (a nvarchar(10));  -- demo table; column definition assumed
insert into test(a) values  (N'a');
insert into test(a) values  (NCHAR(0xdc83));
insert into test(a) values  (N'b');
go

select * from test where charindex(0x83dc, cast(a as varbinary(8000))) > 0;

SQL Server – varchar Storage and Comparisons in SQL Server 2008

Add a persisted computed column that contains a CHECKSUM of the 5 fields, and use that to perform the comparisons.

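The CHECKSUM value is computed from that specific combination of fields and stored as an INT, which is a much easier target for comparisons in a WHERE clause. Be aware that CHECKSUM is not guaranteed to be unique: different combinations of values can produce the same checksum, so a match on CK alone means "probably a duplicate", not "definitely a duplicate".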

USE tempdb; /* create this in tempdb since it is just a demo */

CREATE TABLE dbo.t1
(
    Id       bigint constraint PK_t1 primary key clustered identity(1,1)
    , Sequence int
    , Parent   int not null constraint df_T1_Parent DEFAULT ((0))
    , Data1    varchar(20)
    , Data2    varchar(20)
    , Data3    varchar(20)
    , Data4    varchar(20)
    , Data5    varchar(20)
    , CK AS CHECKSUM(Data1, Data2, Data3, Data4, Data5) PERSISTED
);

GO

INSERT INTO dbo.t1 (Sequence, Parent, Data1, Data2, Data3, Data4, Data5)
VALUES (1,1,'test','test2','test3','test4','test5');

SELECT *
FROM dbo.t1;
GO


/* this row will NOT get inserted, since its checksum already exists in dbo.t1
   (any other row that happened to share this checksum would be rejected too) */
INSERT INTO dbo.t1 (Sequence, Parent, Data1, Data2, Data3, Data4, Data5)
SELECT 2, 3, 'test', 'test2', 'test3', 'test4', 'test5'
WHERE CHECKSUM('test','test2','test3','test4','test5') NOT IN (SELECT CK FROM dbo.t1);

/* still only shows the original row, since the checksum for the row already
exists in dbo.t1 */
SELECT *
FROM dbo.t1;

In order to support a large number of rows, you'd want to create a non-unique index on the CK column. It can't be unique, because different rows can share a checksum.
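Because of those collisions, one option is to use the indexed checksum only as a cheap first filter and then confirm the actual values. The sketch below uses a trimmed-down temp table (#t1, two data columns, assumed NOT NULL to keep the equality checks simple) rather than the real dbo.t1. Indexing a computed column needs the usual SET options (QUOTED_IDENTIFIER, ARITHABORT and friends ON), which SSMS uses by default.

```sql
-- Trimmed-down stand-in for dbo.t1 (hypothetical temp table)
CREATE TABLE #t1
(
    Id      int IDENTITY(1,1) PRIMARY KEY
    , Data1 varchar(20) NOT NULL
    , Data2 varchar(20) NOT NULL
    , CK AS CHECKSUM(Data1, Data2) PERSISTED
);
CREATE INDEX IX_t1_CK ON #t1 (CK);  -- non-unique: collisions are allowed

INSERT INTO #t1 (Data1, Data2) VALUES ('test', 'test2');

-- Filter on the indexed checksum first, then confirm the real values
INSERT INTO #t1 (Data1, Data2)
SELECT 'test', 'test2'
WHERE NOT EXISTS (
    SELECT 1
    FROM #t1
    WHERE CK = CHECKSUM('test', 'test2')
      AND Data1 = 'test'
      AND Data2 = 'test2'
);

SELECT * FROM #t1;  -- still only one row
```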

By the way, you neglected to mention the number of rows you are expecting in this table; that information would be instrumental in making great recommendations.

In-row data is limited to a maximum of 8060 bytes, which is the size of a single page of data, less the required overhead for each page. Any single row larger than that will result in some off-page storage of row data. I'm certain other contributors to http://dba.stackexchange.com can give you a much more concise definition of the engine internals regarding storage of large rows. How big is your largest row, presently?

If items in Data1, Data2, Data3... have the same values occurring in a different order, the checksum will be different, so you may want to take that into consideration.

Following a brief discussion with the fantastic Mark Storey-Smith on The Heap, I'd like to offer a similar, although potentially better, choice for calculating a hash on the fields in question. You could alternatively use the HASHBYTES() function in the computed column. HASHBYTES() has some gotchas, such as the need to concatenate your fields together, with some type of delimiter between the field values, in order to pass HASHBYTES() a single value. For more information about HASHBYTES(), Mark recommended this site. Clearly, MSDN also has some great info at http://msdn.microsoft.com/en-us/library/ms174415.aspx
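The delimiter matters because without one, different field values can concatenate to the same string. The sketch below demonstrates that and then defines a hashed computed column on a hypothetical temp table (#HashDemo). It assumes SHA1 (available in SQL Server 2008; SHA2_256 needs 2012 or later), '|' as a delimiter that never appears in the data, and ISNULL to stop a single NULL from making the whole concatenation NULL, at the cost of treating NULL and '' as equal. Note also that before SQL Server 2016, HASHBYTES input is limited to 8000 bytes.

```sql
-- Without a delimiter, 'ab' + 'c' and 'a' + 'bc' are the same string, so they hash the same
SELECT HASHBYTES('SHA1', 'ab' + 'c')       AS NoDelimiter1,
       HASHBYTES('SHA1', 'a' + 'bc')       AS NoDelimiter2,  -- equal to NoDelimiter1
       HASHBYTES('SHA1', 'ab' + '|' + 'c') AS Delimited1,
       HASHBYTES('SHA1', 'a' + '|' + 'bc') AS Delimited2;    -- differs from Delimited1

CREATE TABLE #HashDemo
(
    Id      int IDENTITY(1,1) PRIMARY KEY
    , Data1 varchar(20)
    , Data2 varchar(20)
    , Data3 varchar(20)
    -- SHA1 output is 20 bytes; casting to binary(20) keeps the column compact
    , HK AS CAST(HASHBYTES('SHA1',
                 ISNULL(Data1, '') + '|' + ISNULL(Data2, '') + '|' + ISNULL(Data3, ''))
                 AS binary(20)) PERSISTED
);

INSERT INTO #HashDemo (Data1, Data2, Data3) VALUES ('test', 'test2', 'test3');

SELECT Id, HK FROM #HashDemo;
```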
