Sql-server – Create a function/tsql to get strings from the 1st table/data and compare and convert if from the value of second table

querysql server

I want to create a function/tsql to get strings from the 1st table/data and compare and convert to the value of a second table if the strings match what is on table 2.

Business example.
When a cashier mistakenly encoded ABC1231 as company name instead of ABC the report will have an issue showing ABC1231 instead of ABC. We need to correct this by matching it against a master list of company names in another table.

The query logic Im trying to apply here is to match how many similar strings are in table 1 data to master list data in table 2. Then if the process saw that the strings from table 1 has a match or has the highest similar string it will output what data in table 2. This should be also case sensitive and allows int and special character. See example below.

Example:
Table 1 contains ->

A121B312C33,AB12312C,AB-C,ABC223,AsBCD21,D23EF,D2E2F2,DEF

Table 2 contains ->

ABC,DEF

Output should be:

ABC,ABC,ABC,ABC,ABC,DEF,DEF,DEF

It is not only capital letters. It should match what is on the 2nd table whether lower case or upper case with special characters or with integer as well.

Best Answer

Byron,

Following our small conversation in the comments, I would recommend making a table structure change instead of trying to make a code change to accommodate this.

I believe this is the solution for you because it will:

  1. Limit the possibility for this kind of mismatch to happen again in the future
  2. Eases the level of effort to correct an issue in the future if it comes up again
  3. Removes the need for any complicated Macros or Queries to retrieve the data you are actually looking for

This will however require a couple of steps of initial effort from you:

  1. Make table changes
  2. Make application changes to support both the new table structures and probably a slight change to data entry processes
  3. Data Cleanup Project

1. Make Table Changes

One thing Relational Databases are really good at is creating was I have heard call either a Master List or an List Of Values (LOV) table. The benefit is this forces your data and your application to only store certain allowed value for data integrity purposes. It then also allows you to make a change in one place and have it update everything.

Let's assume your two tables look something like this:

Table_1
ID INT,
Company VarChar(50),
CreateDateTime DATETIME

Table_2
Company VarChar(50)

If we rebuild the tables to be something like this we get to create a Foreign Key relationship which will ensure that the data stays consistent over time. (Table_1.CompanyID must be a value found in Table_2.CompanyID)

Table_1
ID INT,
CompanyID INT,
CreateDateTime DATETIME

Table_2
CompanyID INT,
CompanyName VarChar(50)

2. Make application changes to support both the new table structures and probably a slight change to data entry processes

When you wanted to create the original output of Table_1 you would run a query like below. This kind of change would need to be made in anything that tries to retrieve this data (reports, applications, automated processes, etc...)

SELECT T1.ID,
T2.CompanyName,
T1.CreateDateTime
FROM Table_1 T1
    INNER JOIN Table_2 T2
        ON T1.CompanyID = T2.CompanyID

When it comes to Insert and Update processes, you will first need to know that whatever the CompanyName value is you are trying to store. If that value does not exist already, than you have to create that value in Table_2 before writing your record to Table_1. For the sake of simplicity lets assume that the value ABC already exists and Table_1.ID is an Identity/Auto-Incriminating column (IE: I don't need to create a value for it on the Insert). A simple Insert and Update script would look something like:

--Insert
INSERT INTO Table_1 (CompanyID, CreateDateTime)
SELECT CompanyID,
GETDATE()
FROM Table_2
WHERE CompanyName = @CompanyName

--Update
DECLARE @CompanyID INT
SET @CompanyID = (SELECT TOP(1) CompanyID
                    FROM Table_2
                    WHERE CompanyName = @CompanyName
                )

UPDATE Table_1
SET CompanyID = @CompanyID
FROM Table_1
WHERE ID = @Table_1ID

Your application will probably need to have some sort of separate screen that allows certain users to create new Table_2 records when appropriate. Otherwise you may have a similar problem down the road with just a different implementation. You can do a check and when needed insert into Table_2 when trying to run insert or update operation for Table_1. I can provide some examples for those if you would like but I would recommend against it.

3. Data Cleanup Project

Depending on the amount of data in your database, this may be the biggest effort. You will need to take the data you have right now and transpose it into this new table structure.

--Create Table_2 Records
INSERT INTO Table_2 (CompanyName)
SELECT DISTINCT Company
FROM Table_1_Original

--Create Table_1 records with correct references to values in Table_2
INSERT INTO Table_1 (CompanyID, CreateDateTime)
SELECT T2.CompanyID,
T1O.CreateDateTime
FROM Table_2 T2
    INNER JOIN Table_1_Original T1O
        ON T2.CompanyName = T1O.Company

Then correct the records in Table_2 so that only the correct values you are looking for exist. You could do this directly in the original Table_1 before running the above script or make changes on Table_2 directly after everything is said and done. There are a couple of different ways to make it happen and I can provide some guidance if needed.

When This is All Said and Done

You should be in a position where your application self-regulates much better and your reporting process is as simple as a single SELECT with a simple JOIN and no need for any kind of conversion in either SQL or Excel.

Hopefully this helps. Let me know if this solution won't work for some reason and we can see about creating a different solution.