Sql-server – Create a function/tsql to get strings from the 1st table/data and compare and convert if from the value of second table

query, sql-server

I want to create a function/tsql to get strings from the 1st table/data and compare and convert to the value of a second table if the strings match what is on table 2.

Business example.
When a cashier mistakenly enters ABC1231 as the company name instead of ABC, the report shows ABC1231 instead of ABC. We need to correct this by matching it against a master list of company names in another table.

The query logic I'm trying to apply is to measure how similar each value in table 1 is to the master list values in table 2. If a value from table 1 matches a master list value, or one master list value has the highest similarity, the query should output that value from table 2. The matching should also be case-sensitive and allow integers and special characters. See the example below.

Example:
Table 1 contains ->

A121B312C33,AB12312C,AB-C,ABC223,AsBCD21,D23EF,D2E2F2,DEF

Table 2 contains ->

ABC,DEF

Output should be:

ABC,ABC,ABC,ABC,ABC,DEF,DEF,DEF

The data is not limited to capital letters. The value should be matched to what is in the 2nd table whether it is lower case or upper case, and whether or not it contains special characters or integers.

Best Answer

Byron,

Following our small conversation in the comments, I would recommend making a table structure change instead of trying to make a code change to accommodate this.

I believe this is the solution for you because it will:

Limit the possibility of this kind of mismatch happening again in the future
Ease the level of effort needed to correct the issue if it comes up again
Remove the need for any complicated macros or queries to retrieve the data you are actually looking for

This will, however, require a few steps of initial effort from you:

Make table changes
Make application changes to support both the new table structures and probably a slight change to data entry processes
Data Cleanup Project

1. Make Table Changes

One thing relational databases are really good at is creating what I have heard called either a Master List or a List of Values (LOV) table. The benefit is that this forces your data and your application to store only certain allowed values, for data integrity purposes. It also allows you to make a change in one place and have it update everything.

Let's assume your two tables look something like this:

Table_1
ID INT,
Company VarChar(50),
CreateDateTime DATETIME

Table_2
Company VarChar(50)

If we rebuild the tables to be something like this we get to create a Foreign Key relationship which will ensure that the data stays consistent over time. (Table_1.CompanyID must be a value found in Table_2.CompanyID)

Table_1
ID INT,
CompanyID INT,
CreateDateTime DATETIME

Table_2
CompanyID INT,
CompanyName VarChar(50)
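
As a rough sketch, the rebuilt tables could be created along the lines below. The constraint names, the IDENTITY columns, the NOT NULL settings and the UNIQUE constraint on CompanyName are my own assumptions for illustration, not something taken from your existing schema, so adjust them to fit your data.

```sql
-- Master list: one row per allowed company.
-- The UNIQUE constraint is an assumption; it keeps the same name
-- from being stored twice under different CompanyID values.
CREATE TABLE Table_2
(
    CompanyID   INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Table_2 PRIMARY KEY,
    CompanyName VARCHAR(50) NOT NULL
        CONSTRAINT UQ_Table_2_CompanyName UNIQUE
);

-- Transaction table: stores only a reference to the master list.
CREATE TABLE Table_1
(
    ID             INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Table_1 PRIMARY KEY,
    CompanyID      INT NOT NULL
        CONSTRAINT FK_Table_1_Table_2
        FOREIGN KEY REFERENCES Table_2 (CompanyID),
    CreateDateTime DATETIME NOT NULL
);
```

With the foreign key in place, an insert into Table_1 that uses a CompanyID not present in Table_2 is rejected by the database instead of ending up in your report.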

2. Make application changes to support both the new table structures and probably a slight change to data entry processes

When you want to reproduce the original output of Table_1, you would run a query like the one below. This kind of change would need to be made in anything that tries to retrieve this data (reports, applications, automated processes, etc.).

SELECT T1.ID,
T2.CompanyName,
T1.CreateDateTime
FROM Table_1 T1
    INNER JOIN Table_2 T2
        ON T1.CompanyID = T2.CompanyID

When it comes to Insert and Update processes, you will first need to know the CompanyName value you are trying to store. If that value does not exist already, then you have to create it in Table_2 before writing your record to Table_1. For the sake of simplicity, let's assume that the value ABC already exists and Table_1.ID is an Identity/auto-incrementing column (i.e., I don't need to supply a value for it on the Insert). A simple Insert and Update script would look something like:

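--Insert
--@CompanyName is a parameter supplied by the caller.
--If no matching Table_2 row exists, no row is inserted.
INSERT INTO Table_1 (CompanyID, CreateDateTime)
SELECT CompanyID,
       GETDATE()
FROM Table_2
WHERE CompanyName = @CompanyName

--Update
--@Table_1ID is a parameter identifying the Table_1 row to change.
DECLARE @CompanyID INT
SET @CompanyID = (SELECT TOP(1) CompanyID
                    FROM Table_2
                    WHERE CompanyName = @CompanyName
                )

UPDATE Table_1
SET CompanyID = @CompanyID
WHERE ID = @Table_1ID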

Your application will probably need some sort of separate screen that allows certain users to create new Table_2 records when appropriate. Otherwise you may have a similar problem down the road, just with a different implementation. Alternatively, you can check for the value and, when needed, insert it into Table_2 as part of the insert or update operation for Table_1. I can provide some examples of that if you would like, but I would recommend against it.

3. Data Cleanup Project

Depending on the amount of data in your database, this may be the biggest effort. You will need to take the data you have right now and migrate it into this new table structure.

--Assumes the original Table_1 was renamed to Table_1_Original before the new tables were created
--Create Table_2 Records
INSERT INTO Table_2 (CompanyName)
SELECT DISTINCT Company
FROM Table_1_Original

--Create Table_1 records with correct references to values in Table_2
INSERT INTO Table_1 (CompanyID, CreateDateTime)
SELECT T2.CompanyID,
       T1O.CreateDateTime
FROM Table_2 T2
    INNER JOIN Table_1_Original T1O
        ON T2.CompanyName = T1O.Company

Then correct the records in Table_2 so that only the correct values you are looking for exist. You could do this directly in the original Table_1 before running the above script or make changes on Table_2 directly after everything is said and done. There are a couple of different ways to make it happen and I can provide some guidance if needed.
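
One possible way to do the second option (fixing Table_2 after the migration) is sketched below, using the ABC1231 and ABC values from your business example. The variable names are hypothetical, and the script assumes both names exist as rows in Table_2 and that Table_1 is the only table referencing Table_2.

```sql
-- Hypothetical cleanup: fold the mistyped company 'ABC1231' into 'ABC'.
DECLARE @GoodID INT = (SELECT CompanyID FROM Table_2 WHERE CompanyName = 'ABC');
DECLARE @BadID  INT = (SELECT CompanyID FROM Table_2 WHERE CompanyName = 'ABC1231');

-- Only proceed when both rows were found, so that no Table_1 row
-- is ever pointed at a NULL CompanyID.
IF @GoodID IS NOT NULL AND @BadID IS NOT NULL
BEGIN
    BEGIN TRANSACTION;

    -- Repoint every Table_1 row from the mistyped company to the correct one.
    UPDATE Table_1
    SET CompanyID = @GoodID
    WHERE CompanyID = @BadID;

    -- The mistyped row is no longer referenced, so it can be removed.
    DELETE FROM Table_2
    WHERE CompanyID = @BadID;

    COMMIT TRANSACTION;
END
```

You would repeat this for each incorrect name, or drive it from a mapping table of wrong and correct names if there are many of them.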

When This is All Said and Done

You should be in a position where your application self-regulates much better and your reporting process is as simple as a single SELECT with a simple JOIN and no need for any kind of conversion in either SQL or Excel.

Hopefully this helps. Let me know if this solution won't work for some reason and we can see about creating a different solution.

Related Solutions

Sql-server – Generate List of Missing Relationships

You have to define what you mean by "similar workgroups" and alter the relevant line, but this is what I understand from your wording.

It will show all combinations of users from different workgroups (and similar in an arbitrary way, matching the first 3 characters of group name) that do not appear (in either order) in the cluserrx table:

; WITH UserCombinations AS
    ( SELECT u.logonid AS logonid,
             u.workgroup AS User_workgroup,
             p.logonid AS rxuser,
             p.workgroup AS Provider_workgroup
      FROM cluser AS u
        JOIN cluser AS p
          ON  u.workgroup < p.workgroup                     -- not identical workgroups
          AND LEFT(u.workgroup, 3) = LEFT(p.workgroup, 3)   -- but similar
    )
SELECT 
    uc.logonid,
    uc.User_workgroup,
    uc.rxuser,
    uc.Provider_workgroup
FROM UserCombinations AS uc
WHERE NOT EXISTS
      ( SELECT *
        FROM cluserrx AS rx
        WHERE rx.logonid = uc.logonid  AND  rx.rxuser = uc.rxuser
           OR rx.logonid = uc.rxuser  AND  rx.rxuser = uc.logonid
      ) ;

Sql-server – Get streak count and streak type from win-loss-tie data

Since you are on SQL Server 2012 you can use a couple of the new windowing functions.

with C1 as
(
  select T.team_id,
         case
           when M.winning_team_id is null then 'T'
           when M.winning_team_id = T.team_id then 'W'
           else 'L'
         end as streak_type,
         M.match_id
  from FantasyMatches as M
    cross apply (values(M.home_fantasy_team_id),
                       (M.away_fantasy_team_id)) as T(team_id)
), C2 as
(
  select C1.team_id,
         C1.streak_type,
         C1.match_id,
         lag(C1.streak_type, 1, C1.streak_type) 
           over(partition by C1.team_id 
                order by C1.match_id desc) as lag_streak_type
  from C1
), C3 as
(
  select C2.team_id,
         C2.streak_type,
         sum(case when C2.lag_streak_type = C2.streak_type then 0 else 1 end) 
           over(partition by C2.team_id 
                order by C2.match_id desc rows unbounded preceding) as streak_sum
  from C2
)
select C3.team_id,
       C3.streak_type,
       count(*) as streak_count
from C3
where C3.streak_sum = 0
group by C3.team_id,
         C3.streak_type
order by C3.team_id;

C1 calculates the streak_type for each team and match.

C2 finds the previous streak_type ordered by match_id desc.

C3 generates a running sum streak_sum ordered by match_id desc, keeping it at 0 as long as the streak_type is the same as the last value.

Main query counts the rows where streak_sum is 0, which gives the length of the current streak for each team.
