MySQL – Fetch Only New Data from Auto-Updating Online SQL Database

MySQLpython

I have an online MySQL (presto) database which keeps on auto-updating (i.e., new rows keep getting added to the tables(views, to be exact) every day). I want to fetch the data from it every day but don't want to refetch the data that I have already fetched before.

I am using python3 for extracting the data from the online host database as:

from pyhive import presto
import numpy as np
cursor=presto.connect().cursor()
sql = 'select * from schema.table'
cursor.execute(sql)
rows = np.array(cursor.fetchall())
cursor.close()

Doing this takes time and fetches all the data (new data as well as already fetched data). I am storing the data in rows variable in a csv file afterward.

Is there any way to get only the new data (id is the primary key in the table)?

Thanks in advance.

Best Answer

Assuming id is an incrementing number or something else which grows every time you insert a new row (such as a timestamp), this should be quite straightforward:

First read the last id value in the csv file from the previous run, and then use this in your query line:

sql = 'SELECT * FROM schema.table WHERE id > %d ORDER BY id ASC' % id

UPDATE:

If id is not an incrementing key, e.g. it's a UUID, then this becomes much harder. Other DBMSes (e.g. Oracle) have rowid which could probably have been used here, though that is not a MySQL feature. MariaDB has rowid in its CONNECT storage engine, so you could set up a local MariaDB instance, install the CONNECT engine, create a CONNECT table connected to your remote MySQL table schema.table with the special rowid column using a statement such as:

CREATE TABLE connect_table (
 col1 float,
 col2 datetime,
 col3 varchar(20),
 row_id int special=rowid
)
ENGINE=CONNECT TABLE_TYPE=MYSQL DBNAME=mytest TABNAME=table
CONNECTION='mysql://root:password@localhost';