How to sync a database with an external full text engine

full-text-search

How do you keep your external full-text engine (Lucene, Solr, etc) in sync with your database data? For example, keeping track of inserts, deletes, updates.

My current solution involves a big union all query on modified records, that is polled by Lucene every few minutes.

Any better solutions?

Thanks!

Best Answer

Depends on your database engine. For example, in Microsoft SQL Server, there are several database engine features that can track which rows have been changed, and then you can grab just those rows in your periodic query.

Another technique I've seen is to add an UpdatedDateTime field on the tables you want to search. Use a default value of the current date/time, and add an update trigger so that whenever the record is updated, the UpdatedDateTime is reset to the current date/time. Keep in mind that you'll probably want to index that field since you'll be querying it frequently. Then, in your app, just poll for all records where UpdatedDateTime > the last time you updated.

If you take the latter approach, you'll probably want to do full repopulations periodically to catch any goofups where the polling app failed for a while.