MySQL – Fetch Only New Data from Auto-Updating Online SQL Database

MySQLpython

I have an online MySQL (presto) database which keeps on auto-updating (i.e., new rows keep getting added to the tables(views, to be exact) every day). I want to fetch the data from it every day but don't want to refetch the data that I have already fetched before.

I am using python3 for extracting the data from the online host database as:

from pyhive import presto
import numpy as np
cursor=presto.connect().cursor()
sql = 'select * from schema.table'
cursor.execute(sql)
rows = np.array(cursor.fetchall())
cursor.close()

Doing this takes time and fetches all the data (new data as well as already fetched data). I am storing the data in rows variable in a csv file afterward.

Is there any way to get only the new data (id is the primary key in the table)?

Thanks in advance.

Best Answer

Assuming id is an incrementing number or something else which grows every time you insert a new row (such as a timestamp), this should be quite straightforward:

First read the last id value in the csv file from the previous run, and then use this in your query line:

sql = 'SELECT * FROM schema.table WHERE id > %d ORDER BY id ASC' % id

UPDATE:

If id is not an incrementing key, e.g. it's a UUID, then this becomes much harder. Other DBMSes (e.g. Oracle) have rowid which could probably have been used here, though that is not a MySQL feature. MariaDB has rowid in its CONNECT storage engine, so you could set up a local MariaDB instance, install the CONNECT engine, create a CONNECT table connected to your remote MySQL table schema.table with the special rowid column using a statement such as:

CREATE TABLE connect_table (
 col1 float,
 col2 datetime,
 col3 varchar(20),
 row_id int special=rowid
)
ENGINE=CONNECT TABLE_TYPE=MYSQL DBNAME=mytest TABNAME=table
CONNECTION='mysql://root:password@localhost';

Related Solutions

MySQL – Most Elegant Way to Replicate Data Contents Periodically

If you don't need to retain the data from the previous day when you update the data, you could do this without downtime:

Day 1:

insert data from API into Table1

Day 2+:

create a new table with the same structure as Table1

create NewTable like Table1;

insert API data into NewTable (note: app is still using Table1 at this point)
switch out the tables in one statement

rename Table1 to OldTable, NewTable to Table1;

drop OldTable now if you want to, or keep some archives

Mysql – Updating records from the past with MySQL query

Define an unique key in your table such that each day can only be represented by one row (UNIQUE KEY(date) would be enough for the example you show). Then use

INSERT INTO $tablename(...) VALUES(...)
    ON DUPLICATE KEY UPDATE col1=VALUES(col1), col2=VALUES(col2);

Alternatively you might use REPLACE INTO, but that does the delete and insert, "burning" through autoincrement IDs which you seem to have.

You might even use the date column as a PRIMARY KEY with this setup.

Best Answer

Related Solutions

MySQL – Most Elegant Way to Replicate Data Contents Periodically

Mysql – Updating records from the past with MySQL query

Related Question