SQL Server – How to Convert Data to Proper Case

functionssql servert-sql

SQL Server contains system functions for viewing / updating string data to both uppercase and lowercase but not proper case. There are multiple reasons for desiring this operation takes place in SQL Server rather than at the application layer. In my case, we were performing some data cleanup during a consolidation of our global HR data from multiple sources.

If you search the internet you will find multiple solutions to this task but many seem to have restrictive caveats or do not allow for exceptions to be defined in the function.

NOTE: As mentioned in the comments below SQL Server is NOT the ideal place to perform this conversion. Other methods were suggested as well – such as CLR, for example. This post has served it's purpose, in my opinion – it's great to have all of these thoughts in one place as opposed to the random tidbits available here and there. Thank you to all.

Best Answer

The challenge you'll run into with these approaches is that you've lost information. Explain it to the business users that they've taken a blurry, out of focus picture and despite what they see on t.v. there's no way to make it crisp and in focus. There's always going to be situations where these rules won't work and as long as everyone knows going into this that's the case, then have at it.

This is HR data so I'm going to assume we're talking about getting names in a consistent title case format because the mainframe stores it as AARON BERTRAND and we want the new system to not yell at them. Aaron is easy (but not cheap). You and Max have already identified the problem with the Mc/Mac so it correctly capitalizes Mc/Mac but there are instances where it's too aggressive with Mackey/Maclin/Mackenzie. Mackenzie is an interesting case though - look how the popularity of it has boomed as a baby name

Mackenzie

At some point, there will be a poor child named Mackenzie MacKenzie because people are awful beings.

You're also going to run into lovely things like D'Antoni where we should cap both letters around the tick mark. Except for d'Autremont where you only capitalize the letter after the apostrophe. Heaven help you though, if you send mail to d'Illoni as their family name is D'illoni.

For the sake of contributing actual code, the following is a CLR method we used in a 2005 instance for our purposes. It generally used ToTitleCase except for the list of exceptions we built out which is when we basically gave up trying to codify the aforementioned exceptions.

namespace Common.Util
{
    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.Text;
    using System.Text.RegularExpressions;
    using System.Threading;

    /// <summary>
    /// A class that attempts to proper case a word, taking into
    /// consideration some outliers.
    /// </summary>
    public class ProperCase
    {
        /// <summary>
        /// Convert a string into its propercased equivalent.  General case
        /// it will capitalize the first letter of each word.  Handled special 
        /// cases include names with apostrophes (O'Shea), and Scottish/Irish
        /// surnames MacInnes, McDonalds.  Will fail for Macbeth, Macaroni, etc
        /// </summary>
        /// <param name="inputText">The data to be recased into initial caps</param>
        /// <returns>The input text resampled as proper cased</returns>
        public static string Case(string inputText)
        {
            CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
            TextInfo textInfo = cultureInfo.TextInfo;
            string output = null;
            int staticHack = 0;

            Regex expression = null;
            string matchPattern = string.Empty;

            // Should think about maybe matching the first non blank character
            matchPattern = @"
                (?<Apostrophe>'.\B)| # Match things like O'Shea so apostrophe plus one.  Think about white space between ' and next letter.  TODO:  Correct it's from becoming It'S, can't -> CaN'T
                \bMac(?<Mac>.) | # MacInnes, MacGyver, etc.  Will fail for Macbeth
                \bMc(?<Mc>.) # McDonalds
                ";
            expression = new Regex(matchPattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

            // Handle our funky rules            
            // Using named matches is probably overkill as the
            // same rule applies to all but for future growth, I'm
            // defining it as such.
            // Quirky behaviour---for 2005, the compiler will 
            // make this into a static method which is verboten for 
            // safe assemblies.  
            MatchEvaluator upperCase = delegate(Match match)
            {
                // Based on advice from Chris Hedgate's blog
                // I need to reference a local variable to prevent
                // this from being turned into static
                staticHack = matchPattern.Length;

                if (!string.IsNullOrEmpty(match.Groups["Apostrophe"].Value))
                {
                    return match.Groups["Apostrophe"].Value.ToUpper();
                }

                if (!string.IsNullOrEmpty(match.Groups["Mac"].Value))
                {
                    return string.Format("Mac{0}", match.Groups["Mac"].Value.ToUpper());
                }

                if (!string.IsNullOrEmpty(match.Groups["Mc"].Value))
                {
                    return string.Format("Mc{0}", match.Groups["Mc"].Value.ToUpper());
                }

                return match.Value;
            };

            MatchEvaluator evaluator = new MatchEvaluator(upperCase);

            if (inputText != null)
            {
                // Generally, title casing converts the first character 
                // of a word to uppercase and the rest of the characters 
                // to lowercase. However, a word that is entirely uppercase, 
                // such as an acronym, is not converted.
                // http://msdn.microsoft.com/en-us/library/system.globalization.textinfo.totitlecase(VS.80).aspx
                string temporary = string.Empty;
                temporary = textInfo.ToTitleCase(inputText.ToString().ToLower());
                output = expression.Replace(temporary, evaluator);
            }
            else
            {
                output = string.Empty;
            }

            return output;
        }
    }
}

Now that all of that is clear, I'm going to finish this lovely book of poetry by e e cummings