Sql-server – Select one name per person where spelling variations exist

Tags: duplication, insert, select, sql-server-2008-r2

I have a source table that contains employee IDs and names. The same employee's name may be listed multiple times with variations in spelling (see the source table below; note that the middle initial is sometimes present).

I want to insert unique values, in alphabetical order, into a new table (see the target table below for the desired results). I don't care which version of a person's name is inserted into the target table; what I want is exactly one value per person, sorted alphabetically by last name.

Source table:

Emp_ID  Name
123     Jones, John
123     Jones, John P
123     Jones, John P.
456     Lewis, Jerry
456     Lewis, Jerry L
456     Lewis, Jerry L.
789     Hewitt, Jennifer
789     Hewitt, Jennifer L

Target table (desired results):

Emp_ID  Name
789     Hewitt, Jennifer
123     Jones, John
456     Lewis, Jerry L

Best Answer

You can GROUP BY Emp_ID and use an aggregate function such as MIN() or MAX() to pick one of the names for each employee:

INSERT INTO TargetTable 
  (Emp_ID, Name) 
SELECT Emp_ID, MIN(Name) 
FROM SourceTable 
GROUP BY Emp_ID ;
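A minimal sketch of the same GROUP BY / MIN() idea that you can run without a SQL Server instance: an in-memory SQLite database (via Python's standard sqlite3 module) stands in for SQL Server, which is an assumption for illustration. The table names and sample rows are taken from the question. Keep in mind that on SQL Server, which string MIN() picks depends on the column's collation.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SourceTable (Emp_ID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE TargetTable (Emp_ID INTEGER PRIMARY KEY, Name TEXT)")

# Sample rows from the question: several spellings per employee.
conn.executemany(
    "INSERT INTO SourceTable (Emp_ID, Name) VALUES (?, ?)",
    [
        (123, "Jones, John"), (123, "Jones, John P"), (123, "Jones, John P."),
        (456, "Lewis, Jerry"), (456, "Lewis, Jerry L"), (456, "Lewis, Jerry L."),
        (789, "Hewitt, Jennifer"), (789, "Hewitt, Jennifer L"),
    ],
)

# One row per Emp_ID; MIN() keeps the alphabetically smallest spelling.
conn.execute(
    """
    INSERT INTO TargetTable (Emp_ID, Name)
    SELECT Emp_ID, MIN(Name)
    FROM SourceTable
    GROUP BY Emp_ID
    """
)

target = dict(conn.execute("SELECT Emp_ID, Name FROM TargetTable"))
print(target)
```

Here MIN() happens to pick the shortest variant, because a string that is a prefix of another sorts first. Using MAX() instead would keep the variants with the trailing initial (for example "Lewis, Jerry L."). Either choice satisfies "one name per person".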

Note also that a table has no inherent order. You can define a clustered index, which affects how the rows are stored on disk, but that does not guarantee the order in which rows are retrieved.

Afterwards, retrieve the data from the target table with a query. If you want the rows in a particular order, you can (and should) specify it with ORDER BY, and you are free to choose a different order each time you query:

SELECT Emp_ID, Name 
FROM TargetTable 
ORDER BY Name ;
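A small sketch of choosing the order at query time, again with Python's sqlite3 and an in-memory database standing in for SQL Server (an assumption). The rows are the desired results from the question. Because the names are stored as "Last, First", ordering by Name sorts primarily by last name.

```python
import sqlite3

# Stand-in TargetTable holding one name per employee (values from the question).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TargetTable (Emp_ID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO TargetTable (Emp_ID, Name) VALUES (?, ?)",
    [(123, "Jones, John"), (456, "Lewis, Jerry L"), (789, "Hewitt, Jennifer")],
)

# The order is part of each query, not a property of the table.
by_name = conn.execute(
    "SELECT Emp_ID, Name FROM TargetTable ORDER BY Name"
).fetchall()

# A different query can ask for a different order without changing the table.
by_id_desc = conn.execute(
    "SELECT Emp_ID, Name FROM TargetTable ORDER BY Emp_ID DESC"
).fetchall()

print(by_name)
print(by_id_desc)
```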

Related Solutions

Mysql – How to select the latest record having one state where no later records exist with any other state

SELECT widget, MAX(`timestamp`) AS ts
FROM tableX AS t
WHERE state = 'down'
GROUP BY widget
HAVING NOT EXISTS
       ( SELECT *
         FROM tableX AS tt
         WHERE tt.widget = t.widget
           AND tt.state <> 'down'
           AND tt.`timestamp` > MAX(t.`timestamp`)
       ) ;
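A runnable sketch of the first query's logic, using Python's sqlite3 with an in-memory SQLite database standing in for MySQL. The sample rows and integer timestamps are hypothetical. This version computes each widget's latest 'down' timestamp in a derived table and moves the NOT EXISTS test into WHERE. It asks the same question as the HAVING form above: is there any non-'down' record later than that timestamp?

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (widget TEXT, state TEXT, `timestamp` INTEGER)")

# Hypothetical sample data (integer timestamps for readability).
conn.executemany(
    "INSERT INTO tableX (widget, state, `timestamp`) VALUES (?, ?, ?)",
    [
        ("w1", "up", 1), ("w1", "down", 2), ("w1", "down", 3),  # still down
        ("w2", "down", 1), ("w2", "up", 2),                     # came back up
        ("w3", "up", 1),                                        # never down
    ],
)

latest_down = conn.execute(
    """
    SELECT d.widget, d.ts
    FROM ( SELECT widget, MAX(`timestamp`) AS ts   -- latest 'down' per widget
           FROM tableX
           WHERE state = 'down'
           GROUP BY widget
         ) AS d
    WHERE NOT EXISTS                               -- nothing else came after it
          ( SELECT 1
            FROM tableX AS tt
            WHERE tt.widget = d.widget
              AND tt.state <> 'down'
              AND tt.`timestamp` > d.ts
          )
    ORDER BY d.widget
    """
).fetchall()
print(latest_down)
```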

For efficiency, you will probably need two indexes: one on (widget, state, timestamp) and one on (widget, timestamp, state).

This will work too, and it needs only one index, on (widget, timestamp, state):

SELECT t.widget, t.`timestamp`
FROM 
        tableX AS t
    JOIN
        ( SELECT widget, MAX(`timestamp`) AS ts
          FROM tableX
          GROUP BY widget
        ) AS tm
            ON  tm.widget = t.widget
            AND tm.ts = t.`timestamp`
WHERE t.state = 'down' ;
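The join-based version as a sketch, again with Python's sqlite3 and an in-memory SQLite database in place of MySQL, using the same hypothetical rows. The sketch also creates the single (widget, timestamp, state) index suggested above; the index name is made up. The query reports a widget only when its overall latest record is 'down', and with these rows it returns the same result as the NOT EXISTS version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (widget TEXT, state TEXT, `timestamp` INTEGER)")
# The one index the answer suggests (hypothetical name).
conn.execute("CREATE INDEX widget_ts_state_ix ON tableX (widget, `timestamp`, state)")

conn.executemany(
    "INSERT INTO tableX (widget, state, `timestamp`) VALUES (?, ?, ?)",
    [
        ("w1", "up", 1), ("w1", "down", 2), ("w1", "down", 3),
        ("w2", "down", 1), ("w2", "up", 2),
        ("w3", "up", 1),
    ],
)

rows = conn.execute(
    """
    SELECT t.widget, t.`timestamp`
    FROM tableX AS t
    JOIN ( SELECT widget, MAX(`timestamp`) AS ts   -- latest record per widget
           FROM tableX
           GROUP BY widget
         ) AS tm
      ON  tm.widget = t.widget
      AND tm.ts = t.`timestamp`
    WHERE t.state = 'down'                         -- keep it only if that record is 'down'
    ORDER BY t.widget
    """
).fetchall()
print(rows)
```

If a widget could have two records with the same timestamp, this version might return that widget more than once, so it is worth checking whether timestamps are unique per widget in your data.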

Tested both on SQL Fiddle.

MySQL Insert into table, where name = id, if not exist, then insert name and use that ID

Assuming you have tables for persons and animals:

CREATE TABLE Person
  ( PersonID INT UNSIGNED NOT NULL AUTO_INCREMENT
  , PersonName VARCHAR(255) NOT NULL
  , CONSTRAINT Person_PK
      PRIMARY KEY (PersonID)
  , CONSTRAINT PersonName_UQ 
      UNIQUE (PersonName)
  ) ;

CREATE TABLE Animal
  ( AnimalID INT UNSIGNED NOT NULL AUTO_INCREMENT
  , AnimalName VARCHAR(255) NOT NULL
  , CONSTRAINT Animal_PK
      PRIMARY KEY (AnimalID)
  , CONSTRAINT AnimalName_UQ 
      UNIQUE (AnimalName)
  ) ;

and one for results:

CREATE TABLE Result
  ( RaceID INT UNSIGNED NOT NULL
  , Position INT UNSIGNED NOT NULL
  , PersonID INT UNSIGNED NOT NULL 
  , AnimalID INT UNSIGNED NOT NULL
  , Errors INT UNSIGNED NOT NULL DEFAULT 0
  , CompletionTime Time NULL DEFAULT NULL

  , CONSTRAINT Result_PK
      PRIMARY KEY (RaceID, Position)

  , CONSTRAINT Race_Person_UQ           -- assuming a Person cannot enter
      UNIQUE (RaceID, PersonID)                 -- a race twice

  , CONSTRAINT Race_Animal_UQ           -- assuming an Animal cannot enter
      UNIQUE (RaceID, AnimalID)                 -- a race twice

  , INDEX PersonID_IX (PersonID)                -- indexes for the Foreign Key
  , INDEX AnimalID_IX (AnimalID)                -- constraints:

  , CONSTRAINT Person_Result_FK     
      FOREIGN KEY (PersonID)
      REFERENCES Person (PersonID)
  , CONSTRAINT Animal_Result_FK     
      FOREIGN KEY (AnimalID)
      REFERENCES Animal (AnimalID)
  ) ;
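To see what the constraints on Result enforce, here is a rough translation of the three tables to SQLite, run through Python's sqlite3. This is a sketch under several assumptions: SQLite instead of MySQL, INTEGER PRIMARY KEY in place of AUTO_INCREMENT, TEXT in place of TIME, hypothetical sample names, and no separate PersonID_IX / AnimalID_IX indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FOREIGN KEYs only when enabled

# Rough SQLite translation of Person, Animal and Result (assumption: SQLite, not MySQL).
conn.executescript(
    """
    CREATE TABLE Person (
        PersonID   INTEGER PRIMARY KEY,          -- stands in for AUTO_INCREMENT
        PersonName TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Animal (
        AnimalID   INTEGER PRIMARY KEY,
        AnimalName TEXT NOT NULL UNIQUE
    );
    CREATE TABLE Result (
        RaceID         INTEGER NOT NULL,
        Position       INTEGER NOT NULL,
        PersonID       INTEGER NOT NULL REFERENCES Person (PersonID),
        AnimalID       INTEGER NOT NULL REFERENCES Animal (AnimalID),
        Errors         INTEGER NOT NULL DEFAULT 0,
        CompletionTime TEXT DEFAULT NULL,        -- SQLite has no TIME type
        PRIMARY KEY (RaceID, Position),
        UNIQUE (RaceID, PersonID),               -- a person cannot enter a race twice
        UNIQUE (RaceID, AnimalID)                -- an animal cannot enter a race twice
    );
    INSERT INTO Person (PersonName) VALUES ('Alice');
    INSERT INTO Animal (AnimalName) VALUES ('Comet'), ('Dasher');
    INSERT INTO Result (RaceID, Position, PersonID, AnimalID) VALUES (1, 1, 1, 1);
    """
)

def rejected(sql):
    """Return True if SQLite refuses the statement with a constraint violation."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# Alice (PersonID 1) entering race 1 again on another animal: UNIQUE (RaceID, PersonID).
dup_person = rejected(
    "INSERT INTO Result (RaceID, Position, PersonID, AnimalID) VALUES (1, 2, 1, 2)"
)
# PersonID 99 does not exist in Person, so the foreign key rejects the row.
unknown_person = rejected(
    "INSERT INTO Result (RaceID, Position, PersonID, AnimalID) VALUES (2, 1, 99, 1)"
)
result_count = conn.execute("SELECT COUNT(*) FROM Result").fetchone()[0]
print(dup_person, unknown_person, result_count)
```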

I suggest you first bulk-load the data (for example with LOAD DATA from .txt or .csv files) into a staging table in MySQL, supplying race IDs. (If you can't supply race IDs but you have race names, adjust the tables accordingly.) You should have a Race table as well; this is just a sample procedure:

CREATE TABLE BulkData
  ( RaceID INT UNSIGNED NOT NULL
  , Position INT UNSIGNED NOT NULL
  , PersonName VARCHAR(255) NOT NULL
  , AnimalName VARCHAR(255) NOT NULL
  , Errors INT UNSIGNED NOT NULL DEFAULT 0          -- adjust datatypes according
  , CompletionTime Time NULL DEFAULT NULL           -- to your data
  ) ;

LOAD DATA INFILE '/results.txt' 
    INTO TABLE BulkData
    FIELDS TERMINATED BY ',' 
           ENCLOSED BY '"'
    LINES TERMINATED BY '\r\n' ;
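LOAD DATA INFILE is MySQL-specific. As a rough analogue for trying the procedure without a MySQL server, the sketch below uses Python's csv module to parse text in the same format (comma-separated, fields enclosed in double quotes, lines ending in \r\n) and stages the rows in a BulkData table in SQLite. The file contents are hypothetical, and a real run would open the file instead of using StringIO.

```python
import csv
import io
import sqlite3

# Hypothetical contents of results.txt.
RESULTS_TXT = '"1","1","Alice","Comet","0","00:01:05"\r\n"1","2","Bob","Dasher","2","00:01:09"\r\n'

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE BulkData (
        RaceID         INTEGER NOT NULL,
        Position       INTEGER NOT NULL,
        PersonName     TEXT NOT NULL,
        AnimalName     TEXT NOT NULL,
        Errors         INTEGER NOT NULL DEFAULT 0,
        CompletionTime TEXT DEFAULT NULL
    )
    """
)

# Same format as the LOAD DATA options: ',' separator, '"' enclosure.
# newline="" leaves the \r\n line endings for the csv module to handle.
reader = csv.reader(io.StringIO(RESULTS_TXT, newline=""), delimiter=",", quotechar='"')
conn.executemany("INSERT INTO BulkData VALUES (?, ?, ?, ?, ?, ?)", reader)
conn.commit()

# INTEGER columns convert the numeric strings to integers (SQLite type affinity).
staged = conn.execute(
    "SELECT RaceID, Position, PersonName, CompletionTime FROM BulkData ORDER BY Position"
).fetchall()
print(staged)
```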

Then you can process the staged rows and insert them into the Person, Animal and Result tables. For Person:

INSERT INTO Person
    (PersonName)
SELECT DISTINCT
    b.PersonName
FROM
    BulkData AS b
WHERE NOT EXISTS
    ( SELECT 1
      FROM Person AS p
      WHERE p.PersonName = b.PersonName
    ) ;
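A sketch of this insert-if-missing step in SQLite via Python's sqlite3 (assumption: SQLite in place of MySQL; the names are hypothetical). Alice is already in Person and Bob appears in two staged rows. DISTINCT collapses Bob's duplicates, and NOT EXISTS skips Alice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, PersonName TEXT NOT NULL UNIQUE);
    CREATE TABLE BulkData (RaceID INTEGER, Position INTEGER, PersonName TEXT, AnimalName TEXT);
    -- Hypothetical data: Alice is already known; Bob appears twice in the batch.
    INSERT INTO Person (PersonName) VALUES ('Alice');
    INSERT INTO BulkData VALUES
        (1, 1, 'Alice', 'Comet'),
        (1, 2, 'Bob', 'Dasher'),
        (2, 1, 'Bob', 'Comet');
    """
)

# Insert each new name once; names already in Person are skipped.
conn.execute(
    """
    INSERT INTO Person (PersonName)
    SELECT DISTINCT b.PersonName
    FROM BulkData AS b
    WHERE NOT EXISTS
        ( SELECT 1
          FROM Person AS p
          WHERE p.PersonName = b.PersonName
        )
    """
)
persons = conn.execute("SELECT PersonID, PersonName FROM Person ORDER BY PersonID").fetchall()
print(persons)
```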

Similarly for Animal:

INSERT INTO Animal
    (AnimalName)
SELECT DISTINCT
    b.AnimalName
FROM
    BulkData AS b
WHERE NOT EXISTS
    ( SELECT 1
      FROM Animal AS a
      WHERE a.AnimalName = b.AnimalName
    ) ;

And then into Result:

INSERT INTO Result
    (RaceID, Position, PersonID, AnimalID, Errors, CompletionTime)
SELECT 
    b.RaceID, b.Position, p.PersonID, a.AnimalID, b.Errors, b.CompletionTime
FROM
    BulkData AS b
  JOIN
    Person AS p  ON p.PersonName = b.PersonName
  JOIN 
    Animal AS a  ON a.AnimalName = b.AnimalName
WHERE NOT EXISTS
    ( SELECT 1
      FROM Result AS r
      WHERE r.RaceID = b.RaceID
        AND r.Position = b.Position
    ) ;
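A sketch of the Result step in SQLite via Python's sqlite3 (assumptions: SQLite in place of MySQL, Person and Animal already populated with hypothetical names). The joins translate each staged name into its ID. The duplicate check compares the Position columns, since neither BulkData nor Result defines a PositionID column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, PersonName TEXT NOT NULL UNIQUE);
    CREATE TABLE Animal (AnimalID INTEGER PRIMARY KEY, AnimalName TEXT NOT NULL UNIQUE);
    CREATE TABLE Result (
        RaceID INTEGER NOT NULL, Position INTEGER NOT NULL,
        PersonID INTEGER NOT NULL, AnimalID INTEGER NOT NULL,
        Errors INTEGER NOT NULL DEFAULT 0, CompletionTime TEXT,
        PRIMARY KEY (RaceID, Position)
    );
    CREATE TABLE BulkData (
        RaceID INTEGER, Position INTEGER, PersonName TEXT, AnimalName TEXT,
        Errors INTEGER, CompletionTime TEXT
    );
    INSERT INTO Person (PersonName) VALUES ('Alice'), ('Bob');     -- IDs 1, 2
    INSERT INTO Animal (AnimalName) VALUES ('Comet'), ('Dasher');  -- IDs 1, 2
    INSERT INTO BulkData VALUES
        (1, 1, 'Bob', 'Dasher', 0, '00:01:05'),
        (1, 2, 'Alice', 'Comet', 2, '00:01:09');
    """
)

# Names become IDs through the joins; rows already in Result are skipped.
conn.execute(
    """
    INSERT INTO Result (RaceID, Position, PersonID, AnimalID, Errors, CompletionTime)
    SELECT b.RaceID, b.Position, p.PersonID, a.AnimalID, b.Errors, b.CompletionTime
    FROM BulkData AS b
    JOIN Person AS p ON p.PersonName = b.PersonName
    JOIN Animal AS a ON a.AnimalName = b.AnimalName
    WHERE NOT EXISTS
        ( SELECT 1
          FROM Result AS r
          WHERE r.RaceID = b.RaceID
            AND r.Position = b.Position
        )
    """
)
results = conn.execute(
    "SELECT RaceID, Position, PersonID, AnimalID FROM Result ORDER BY Position"
).fetchall()
print(results)
```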

If the imported results look right, you can empty the BulkData table and repeat the procedure with more files. The NOT EXISTS conditions keep duplicates out, even if you load the same data twice.
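A sketch of the whole cycle (stage, copy into the three tables, empty the staging table) run twice on the same batch, to check that the second load adds nothing. It uses SQLite through Python's sqlite3 instead of MySQL, a hypothetical batch, and a made-up helper name, import_batch. DELETE FROM empties BulkData here; on MySQL, TRUNCATE TABLE would also do.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, PersonName TEXT NOT NULL UNIQUE);
CREATE TABLE Animal (AnimalID INTEGER PRIMARY KEY, AnimalName TEXT NOT NULL UNIQUE);
CREATE TABLE Result (
    RaceID INTEGER NOT NULL, Position INTEGER NOT NULL,
    PersonID INTEGER NOT NULL, AnimalID INTEGER NOT NULL,
    Errors INTEGER NOT NULL DEFAULT 0, CompletionTime TEXT,
    PRIMARY KEY (RaceID, Position)
);
CREATE TABLE BulkData (
    RaceID INTEGER, Position INTEGER, PersonName TEXT, AnimalName TEXT,
    Errors INTEGER, CompletionTime TEXT
);
"""

# The three INSERT ... WHERE NOT EXISTS statements, in dependency order.
STEPS = [
    """INSERT INTO Person (PersonName)
       SELECT DISTINCT b.PersonName FROM BulkData AS b
       WHERE NOT EXISTS (SELECT 1 FROM Person AS p WHERE p.PersonName = b.PersonName)""",
    """INSERT INTO Animal (AnimalName)
       SELECT DISTINCT b.AnimalName FROM BulkData AS b
       WHERE NOT EXISTS (SELECT 1 FROM Animal AS a WHERE a.AnimalName = b.AnimalName)""",
    """INSERT INTO Result (RaceID, Position, PersonID, AnimalID, Errors, CompletionTime)
       SELECT b.RaceID, b.Position, p.PersonID, a.AnimalID, b.Errors, b.CompletionTime
       FROM BulkData AS b
       JOIN Person AS p ON p.PersonName = b.PersonName
       JOIN Animal AS a ON a.AnimalName = b.AnimalName
       WHERE NOT EXISTS (SELECT 1 FROM Result AS r
                         WHERE r.RaceID = b.RaceID AND r.Position = b.Position)""",
]

def import_batch(conn, rows):
    """Hypothetical helper: stage one batch, copy it over, then empty the staging table."""
    conn.executemany("INSERT INTO BulkData VALUES (?, ?, ?, ?, ?, ?)", rows)
    for sql in STEPS:
        conn.execute(sql)
    conn.execute("DELETE FROM BulkData")
    conn.commit()

def counts(conn):
    """Row counts of Person, Animal and Result."""
    return tuple(
        conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        for table in ("Person", "Animal", "Result")
    )

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
batch = [(1, 1, "Bob", "Dasher", 0, "00:01:05"), (1, 2, "Alice", "Comet", 2, "00:01:09")]

import_batch(conn, batch)
after_first = counts(conn)
import_batch(conn, batch)  # loading the same file a second time
after_second = counts(conn)
staging_left = conn.execute("SELECT COUNT(*) FROM BulkData").fetchone()[0]
print(after_first, after_second, staging_left)
```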
