Sql-server – Query to insert data into columns and check for pre-existing values for those columns data

MySQLpostgresqlquerysql server

I have table t1 and t2 with same column names in both tables, column a and b. I want to enter data of column a and b from t2 into t1 at the same time check weather any data pre exist in t1. If data exist already in a and b column then don't enter that data.
The main challenge in my case is , I want to use combination of (a and b) column data to check for pre existing values.
Thanks for your precious time in advance.

Best Answer

SQL Server:

INSERT INTO t1 (a,b) 
SELECT a, b 
FROM t2
EXCEPT
SELECT a, b 
FROM t1

The EXCEPT operator compares the result sets of two queries and eliminates those rows in the first result set that match in the second result set. It uses an exact match to identify and eliminate rows.

There are other options of course, such as MERGE or using NOT EXISTS in your WHERE clause, however, for a simple query such as this EXCEPT will likely perform better.

Related Solutions

MySQL Insert into table, where name = id, if not exist, then insert name and use that ID

Assuming you have some tables for Persons, Animals:

CREATE TABLE Person
  ( PersonID INT UNSIGNED NOT NULL AUTO_INCREMENT
  , PersonName VARCHAR(255) NOT NULL
  , CONSTRAINT Person_PK
      PRIMARY KEY (PersonID)
  , CONSTRAINT PersonName_UQ 
      UNIQUE (PersonName)
  ) ;

CREATE TABLE Animal
  ( AnimalID INT UNSIGNED NOT NULL AUTO_INCREMENT
  , AnimalName VARCHAR(255) NOT NULL
  , CONSTRAINT Animal_PK
      PRIMARY KEY (AnimalID)
  , CONSTRAINT AnimalName_UQ 
      UNIQUE (AnimalName)
  ) ;

and results:

CREATE TABLE Result
  ( RaceID INT UNSIGNED NOT NULL
  , Position INT UNSIGNED NOT NULL
  , PersonID INT UNSIGNED NOT NULL 
  , AnimalID INT UNSIGNED NOT NULL
  , Errors INT UNSIGNED NOT NULL DEFAULT 0
  , CompletionTime Time NULL DEFAULT NULL

  , CONSTRAINT Result_PK
      PRIMARY KEY (RaceID, Position)

  , CONSTRAINT Race_Person_UQ           -- assuming a Person cannot enter
      UNIQUE (RaceID, PersonID)                 -- a race twice

  , CONSTRAINT Race_Animal_UQ           -- assuming an Animal cannot enter
      UNIQUE (RaceID, AnimalID)                 -- a race twice

  , INDEX PersonID_IX (PersonID)                -- indexes for the Foreign Key
  , INDEX AnimalID_IX (AnimalID)                -- constraints:

  , CONSTRAINT Person_Result_FK     
      FOREIGN KEY (PersonID)
      REFERENCES Person (PersonID)
  , CONSTRAINT Animal_Result_FK     
      FOREIGN KEY (AnimalID)
      REFERENCES Animal (AnimalID)
  ) ;

I suggest you first bulk load the data (possibly with LOAD DATA from .txt or .csv files) in a table in MySQL (supplying race IDS. If you can't supply raceIDs but you have race names, the tables should be adjusted accordingly). You should have a Race table as well, this is just a sample procedure:

CREATE TABLE BulkData
  ( RaceID INT UNSIGNED NOT NULL
  , Position INT UNSIGNED NOT NULL
  , PersonName VARCHAR(255) NOT NULL
  , AnimalName VARCHAR(255) NOT NULL
  , Errors INT UNSIGNED NOT NULL DEFAULT 0          -- adjust datatypes according
  , CompletionTime Time NULL DEFAULT NULL           -- to your data
  ) ;

LOAD DATA INFILE '/results.txt' 
    INTO TABLE BulkData
    FIELDS TERMINATED BY ',' 
           ENCLOSED BY '"'
    LINES TERMINATED BY '\r\n' ;

Then you can manipulate them and insert them into the 2-3 tables. For Person:

INSERT INTO Person
    (PersonName)
SELECT DISTINCT
    b.PersonName
FROM
    BulkData AS b
WHERE NOT EXISTS
    ( SELECT 1
      FROM Person AS p
      WHERE p.PersonName = b.PersonName
    ) ;

Similar for Animal:

INSERT INTO Animal
    (AnimalName)
SELECT DISTINCT
    b.AnimalName
FROM
    BulkData AS b
WHERE NOT EXISTS
    ( SELECT 1
      FROM Animal AS a
      WHERE a.AnimalName = b.AnimalName
    ) ;

And then in Result:

INSERT INTO Result
    (RaceID, Position, PersonID, AnimalID, Errors, CompletionTime)
SELECT 
    b.RaceID, b.Position, p.PersonID, a.AnimalID, b.Errors, b.CompletionTime
FROM
    BulkData AS b
  JOIN
    Person AS p  ON p.PersonName = b.PersonName
  JOIN 
    Animal AS a  ON a.AnimalName = b.AnimalName
WHERE NOT EXISTS
    ( SELECT 1
      FROM Result AS r
      WHERE r.RaceID = b.RaceID
        AND r.PositionID = b.PositionID
    ) ;

If the importing results are satisfying, then you can empty the BulkData table and repeat the procedure with more files. The NOT EXISTS conditions will take care and not allow duplicates even if you try to load same data twice.

Sql-server – What would be the most efficient way to check for null values in all columns for multiple tables

For part 1 of your question, I created these three tables in an empty database:

CREATE TABLE dbo.table1(a INT, b INT);
CREATE TABLE dbo.table2(a INT, b INT);
CREATE TABLE dbo.table3(a INT, b INT);

INSERT dbo.table1(a,b) VALUES(1,1),(1,2),(1,NULL);
INSERT dbo.table2(a,b) VALUES(1,1),(NULL,NULL),(1,NULL);
INSERT dbo.table3(a,b) VALUES(1,1),(1,2),(1,2);

Then I wrote this code to extract various bits of NULL information:

CREATE TABLE #x
(
  t           NVARCHAR(512),                     -- table name
  nullrows    INT NOT NULL DEFAULT 0,            -- number of rows with at least one NULL
  nullvalues  INT NOT NULL DEFAULT 0,            -- total NULL values in table
  nulldist    NVARCHAR(MAX) NOT NULL DEFAULT N'' -- how many NULLs in each column
);

INSERT #x(t) VALUES(N'dbo.table1'),(N'dbo.table2'),(N'dbo.table3');

Now some dynamic SQL fun derived from metadata:

DECLARE @sql NVARCHAR(MAX) = N'';


SELECT @sql = @sql + N'
  UPDATE #x SET nullrows = nullrows 
    + (SELECT COUNT(*) FROM ' + t.t 
    + N' WHERE (' + STUFF((SELECT N' OR ' 
    + QUOTENAME(c.name) + N' IS NULL'
  FROM sys.columns AS c
  WHERE OBJECT_ID(t.t) = c.[object_id]
  FOR XML PATH, TYPE).value(N'.[1]',N'nvarchar(max)'),
    1, 4, N'') + N')) WHERE t = N''' + t.t + N''';'
FROM #x AS t;

EXEC sys.sp_executesql @sql;
SET @sql = N'';

SELECT @sql = @sql + N'
  IF EXISTS (SELECT 1 FROM ' + t + N' WHERE ' 
    + QUOTENAME(c.name) + N' IS NULL)
  UPDATE #x SET nullvalues = nullvalues 
    + (SELECT COUNT(*) FROM ' + t + N' WHERE ' 
    + QUOTENAME(c.name) + N' IS NULL),
    nulldistribution = nulldistribution + '''
    + c.name + N':'' + RTRIM((SELECT COUNT(*) FROM ' 
    + t + N' WHERE ' + QUOTENAME(c.name) + N' IS NULL))
    + N'','' WHERE t = N''' + t + N''';'
  FROM #x AS t
  INNER JOIN sys.columns AS c
  ON OBJECT_ID(t.t) = c.[object_id];

EXEC sys.sp_executesql @sql;

SELECT * FROM #x;

Results:

t             nullrows   nullvalues   nulldistribution
dbo.table1    1          1            b:1,
dbo.table2    2          3            a:1,b:2,
dbo.table3    0          0

I'll leave it as an exercise to the reader how to send an e-mail based on those results - that does not exactly seem like the hard part of this.

For part 2 of your question, as I said in a comment:

There is no code you could possibly write that would tell you that DueDate and PaymentDate contain the same kind of information, other than checking that they share the same data type. This is what source control and documentation are for.

I would argue that this is a different and unanswerable question.

Best Answer

Related Solutions

MySQL Insert into table, where name = id, if not exist, then insert name and use that ID

Sql-server – What would be the most efficient way to check for null values in all columns for multiple tables

Related Question