I realize that the question is vague and it depends on hardware and our needs.
We currently have 5 million rows, totalling about 5 GB, which we want to index using full text indexing. Our data grows quite rapidly, and it's not unreasonable to assume that in a few years it will be closer to a billion rows and a TB of data.
The index is searchable by web site users, and they expect responses within a second or two.
Is it reasonable to assume that this data set will be indexable using SQL Server 2012 full text indexing? Is it common to do full text indexing of this amount of data? And is there any good reading on the subject, for example from others' experience?
Best Answer
I've not personally implemented FTS anywhere near that scale, but there are documented examples:
1.4TB is mentioned as the source database here but that doesn't necessarily translate to TB+ of full text indexed data.
This example is quite smart, extracting content from files before shuffling them off to cheap cloud storage:
Newsgator is documented as running a 2.5 billion row, 4TB full text implementation, and FileControl as having a 2 billion row, 1TB system back in 2005. Given the other options available today, would they do the same now?
Given the licensing cost of SQL Server I'd be inclined to consider alternatives if they fit your use case. Lucene or Solr are the obvious open source choices.
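If you do go the SQL Server route, the setup itself is straightforward. A minimal sketch, assuming a `dbo.Documents` table with a primary key constraint named `PK_Documents` and an `nvarchar(max)` column called `Body` (all names here are illustrative, not from your schema):

```sql
-- Sketch only: table, key, and column names are assumptions.
CREATE FULLTEXT CATALOG DocumentsCatalog;

CREATE FULLTEXT INDEX ON dbo.Documents (Body LANGUAGE 1033)
    KEY INDEX PK_Documents          -- must be a unique, non-nullable key
    ON DocumentsCatalog
    WITH CHANGE_TRACKING AUTO;      -- keep the index updated as rows change

-- Once populated, queries use CONTAINS rather than LIKE:
SELECT TOP (20) *
FROM dbo.Documents
WHERE CONTAINS(Body, N'"full text" AND indexing');
```

At your projected scale, the thing to watch is the population strategy: `CHANGE_TRACKING AUTO` is fine for steady inserts, but a billion-row initial crawl can take a long time and is worth testing before you commit.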