Pros - It is the natural key, it makes sense and it will likely be searched on, I presume?
Cons - The default behavior (which is totally changeable) is for a primary key to be the clustered index. An alphanumeric doesn't make the best candidate because inserts can cause page splits because they aren't set on an ever increasing value like an identity column. The Int identity column will take less space (4bytes) compared to the character data (40+bytes for the unicode) . This makes your other indexes larger since the clustered key is part of them. If you ever change how you identify your customers and make customer codes, this all breaks - going with a surrogate insulates you from those type of changes.
In this situation, I tend to optimize for the insert performance and go with an identity column more often than not for the clustered key and primary key. I really like integer clustered indexes. (Now I know your question was not about clustered index, it was about primary key... You could still choose some other column to be the clustered index and make this your primary key, you could also put a unique constraint on this and treat it as a natural key but not make it your primary key).
I would at the very least index this with a unique constraint and treat it like a natural key. I just don't know if you really need to make it your primary key.
Kimberly Tripp is a trusted resource who has a lot to say about primary keys and (more so) clustered keys on her blog - https://www.sqlskills.com/blogs/kimberly/guids-as-primary-keys-andor-the-clustering-key/
This is all just my opinion - YMMV.
In the past I built a "fuzzy-search" in a .Net CLR function. This function gets called the same way a user-defined function gets called.
For example,
select id, name
from customers
where dbo.CompareStrings("newCustomerName", customers.name) > .8
would only return customers with a name that was 80% similar to the input name.
The % match is based on the number of changes needed to convert one value to another, not the number of characters that are different. We use it to compare addresses, and found this was more effective because of the numerous street abbreviations that are used.
Here's the code I used to compare strings. I did this so long ago that I can't remember how to deploy it, although a quick search will show you many articles on how to create SQL CLR functions
' Checks two strings against each other and returns a decimal between 0 (doesn't match at all) and 1 (100% match)
<Microsoft.SqlServer.Server.SqlFunction()> _
Public Shared Function CompareStrings(ByVal input1 As SqlChars, ByVal input2 As SqlChars) _
As <SqlFacet(Precision:=10, Scale:=4)> SqlDecimal
If IsNothing(input1) And IsNothing(input2) Then
Return New SqlDecimal(1.0)
ElseIf IsNothing(input1) Or IsNothing(input2) Then
Return New SqlDecimal(0.0)
End If
Dim s1 As String = New String(input1.Value)
Dim s2 As String = New String(input2.Value)
If s1.Length = 0 Or s2.Length = 0 Then
Return New SqlDecimal(1.0)
Else
Dim re As New Regex("[^A-Za-z0-9 ]", RegexOptions.IgnorePatternWhitespace Xor RegexOptions.Singleline)
s1 = re.Replace(s1, "( )\1*", "$1")
s2 = re.Replace(s2, "( )\1*", "$1")
s1 = UCase(re.Replace(s1, ""))
s2 = UCase(re.Replace(s2.ToString, ""))
Dim dif As Integer = GetStringSimilarity(s1, s2)
Dim max As Integer = s1.Length
If s2.Length > max Then max = s2.Length
Return New SqlDecimal(1.0 - (dif / max))
End If
End Function
' Compares two strings using the relationship in patterns of letters
Private Shared Function GetStringSimilarity(ByVal s1 As String, ByVal s2 As String) As Integer
Dim n As Integer = s1.Length
Dim m As Integer = s2.Length
Dim distance(n + 1, m + 1) As Integer
Dim cost As Integer = 0
If n = 0 Then Return m
If m = 0 Then Return n
For i As Integer = 0 To n
distance(i, 0) = i
Next
For j As Integer = 0 To m
distance(0, j) = j
Next
For i As Integer = 1 To n
For j As Integer = 1 To m
If Mid(s2, j, 1) = Mid(s1, i, 1) Then cost = 0 Else cost = 1
distance(i, j) = Min3(distance(i - 1, j) + 1, distance(i, j - 1) + 1, distance(i - 1, j - 1) + cost)
Next
Next
Return distance(n, m)
End Function
' Returns the min of 3 values
Private Shared Function Min3(ByVal x As Integer, ByVal y As Integer, ByVal z As Integer) As Integer
Dim min As Integer = x
If y < min Then min = y
If z < min Then min = z
Return min
End Function
Best Answer
If you want to force every child of every parent to have the same customer then there are two quick solutions that you could use to accomplish this.
The first, and my preference, would be to use a check constraint on the table like so:
The second option would be to add a trigger to the table to enforce the requirements like so:
You can adjust the error handling and message returned to the client a bit better in the trigger but I personally prefer using the
CHECK CONSTRAINT
version.These are just two possibilities that I can come up with, there may be more and you may be able to optimize the function or trigger to perform better. You could also change the design of your database to accommodate this better if you wanted. I just wrote the example above off the top of my head so be aware that they are also untested.
EDIT
I just noticed that the parent can be null (which makes sense) - but the example I gave above does not account for this. Make sure you account for this scenario in your end solution.