SQL Server – Using String Function to Get Left 1000 Characters Up to Last Period or Carriage Return

sql serversql-server-2012

I have blocks of text, anywhere between 500 and 2000 characters long.

What I need to be able to do is take a maximum of 1000 characters from the description but only to the last full stop or carriage return (Char(13)), so that the text isn't cut off mid-sentence. Should there be both periods and Char(13)'s I would ideally like it to cut if off at the period.

I'm pasting some randomly generated text to try and show what I mean.

*Up unpacked friendly ecstatic so possible humoured do. Ample ended might folly quiet one set spoke her. We no am former valley assure.
Four need spot ye said we find mile. Are commanded him convinced
dashwoods did estimable forfeited. Shy celebrated met sentiments she
reasonably but. Proposal its disposed eat advanced marriage sociable.
Drawings led greatest add subjects endeavor gay remember. Principles
one yet assistance you met impossible. The who arrival end how fertile
enabled. Brother she add yet see minuter natural smiling article
painted. Themselves at dispatched interested insensible am be
prosperous reasonably it. She boisterous use friendship she dissimilar considered expression. Sex quick arose mrs lived. Mr things do plenty others an vanity myself
waited to. Always parish tastes at as mr father dining at. Led ask
possible mistress relation elegance eat likewise debating.

Or am nothing amongst chiefly address. The its enable direct men
depend highly. Ham windows sixteen who inquiry fortune demands. Is be
upon sang fond must shew. Really boy law county she unable her sister.
Feet you off its like like six. Among sex are leave law built now. In
built table in an rapid blush. Merits behind on afraid or warmly. In either so spring wished. Melancholy wayBy message*

I would want to be able to return all text up to "chiefly address." (990 characters).

Best Answer

For the original requirement -- 1000 max characters, up to the last period, but there might not be any period -- the following would work:

LEFT(LEFT(string, 1000),
     1001 - CHARINDEX('.', REVERSE(LEFT(string, 1000))))

The LEFT(string, 1000) is used as the expression to take the LEFT(expression, some number) from because it ensures that no more than 1000 characters will be taken which would otherwise happen in the case of the string not having any periods at all. While the LEFT(expression, 1000) could have instead been wrapped around the entire LEFT(string, 1001 - ...), doing it the way shown above should allow for the expression to be reused since it also shows up exactly the same inside of the REVERSE function, hence a little more efficient.

For the updated requirement of introducing a Carriage Return as a fall-back character to look for, you can maybe switch to using PATINDEX as it can look for a list of characters:

LEFT(LEFT(string, 1000),
     1001 - PATINDEX('%[.' + CHAR(13) + ']%', REVERSE(LEFT(string, 1000))))

However, there is one final requirement added:

Should there be both periods and Char(13)'s I would ideally like it to cut if off at the period.

Unfortunately, PATINDEX can't distinguish which one comes first. At this point it seems like some form of conditional logic will need to be added :-(.

LEFT(LEFT(TestData, 1000), 1001 - IIF(CHARINDEX('.', REVERSE(LEFT(TestData, 1000))) > 0,
                                      CHARINDEX('.', REVERSE(LEFT(TestData, 1000))),
                                      CHARINDEX(CHAR(13), REVERSE(LEFT(TestData, 1000))))
                                     )

Here I not only reused the LEFT(TestData, 1000), but the REVERSE(LEFT(TestData, 1000)) shows up 3 times, and for the IIF (which is really just shorthand for CASE WHEN x THEN y ELSE x END) I reused the entire CHARINDEX('.', REVERSE(LEFT(TestData, 1000))).


The example code is posted on Pastebin.com at:

Get 1000 chars max, up to final period, else to final return

The example shows the outcome of the 3 methods shown above across 7 different test cases.