How to Merge Two Large MySQL Tables Keeping the Latest Record

MySQL

I have two similar large tables with below table structure

CREATE TABLE tableA (
  col1 varchar(20) NOT NULL,
  col2 varchar(40) NOT NULL,
  date date NOT NULL,
  time time NOT NULL,
  PRIMARY KEY (col1)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE tableB (
  col1 varchar(20) NOT NULL,
  col2 varchar(40) NOT NULL,
  date date NOT NULL,
  time time NOT NULL,
  PRIMARY KEY (col1)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

I want to create one large table merging above two tables with the latest record

CREATE TABLE combine_table (
  col1 varchar(20) NOT NULL,
  col2 varchar(40) NOT NULL,
  date date NOT NULL,
  time time NOT NULL,
  PRIMARY KEY (col1)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Some data may duplicated in table A and B.
How can I achieve this.
Note that these tables are very large

Best Answer

Since you have very much data at hand, I suggest you merge your date and time columns first. Then you can use an index efficiently. If you don't, you will have to do something like

...WHERE CONCAT(date, ' ', time) = SELECT MAX(CONCAT(date, ' ', time)) ...

So, first do this for both tables.

ALTER TABLE tableA ADD COLUMN creation_date datetime; /*or whatever name, just make it meaningful and don't use keywords*/
UPDATE tableA SET creation_date = CONCAT(date, ' ', time);
ALTER TABLE tableA DROP COLUMN date, DROP COLUMN time;
CREATE INDEX idx_dt_tableA_creation ON tableA(creation_date);

Then you can insert both tables into your combine_table (Note, left this for completeness, the second option is much better).

INSERT INTO combined_table
SELECT col1, col2, creation_date
FROM (
      SELECT col1, col2, creation_date 
      FROM tableA 
      UNION ALL
      SELECT col1, col2, creation_date 
      FROM tableB 
) sq /*subquery_alias*/
WHERE creation_date = (SELECT MAX(creation_date) FROM (
                                    SELECT col1, col2, creation_date 
                                    FROM tableA 
                                    UNION ALL
                                    SELECT col1, col2, creation_date 
                                    FROM tableB 
                        ) another_sq 
                        WHERE sq.col1 = another_sq.col1
                       )
;

Nonetheless, this will be a heavy operation, if you really have that much data.

Now that I think of it, there's a better way of doing it.

First insert tableA

INSERT INTO combined_table
SELECT * FROM tableA;

Then do an insert on duplicate key update.

INSERT INTO combined_table c
SELECT * FROM tableB b
ON DUPLICATE KEY UPDATE 
/*you can skip col1, since it's the identifying primary key here*/
col2 = IF(b.creation_date > c.creation_date, b.col2, c.col2),
creation_date = IF(b.creation_date > c.creation_date, b.creation_date, c.creation_date)
;