You may want to try the retrieval (re trie val) tree, as known as the TRIE. Some refer to this as a Radix Tree.
The idea is to create tree node structures that contain branches for every character your data field can possibly contain.
Let's use a simple case, a numeric field. Obviously, the character range is 0-9. Each tree node would contain ten(10) branches. Let's take the worst case for a 4 byte unsigned integer, 2^32 - 1, which is 4294967295. What is its length? Just compute the length by take the integer of the log base 10 of 4294967295 and adding 1.
mysql> select floor(log10(power(2,32)) + 1);
+-------------------------------+
| floor(log10(power(2,32)) + 1) |
+-------------------------------+
| 10 |
+-------------------------------+
1 row in set (0.00 sec)
So, you would have a TRIE with a maximum height of 10. Starting at the root of the TRIE, if you have the number 4294967295, you traverse branches 4,2,9,4,9,6,7,2,9, and 5. At each branch, you would perform an array-styled binary search.
If the branch located at that TRIE node is an exact match, you can assign a percentage to that level, and recursively walk down that branch to check for the next digit and return percentages from deeper TRIE nodes to add to the percentage you have at the currently searched TRIE node.
If the branch located at that TRIE node is NOT an exact match, you stop your recursive search there and return either 0 or some other percentage you may want to designate.
Given the sum of return values from all searched TRIE nodes, you may want to sum up the
percentages and divide that answer by the length of the string. In other words,
Pct per Node = (1 / (Number of TRIE nodes that need to be searched)) or Zero(0).
Sum(Pct) = (Number of TRIE nodes exactly matched) / (Number of TRIE nodes that need to be searched [length of the string being searched]).
Given the length of the numberic field you store, you have O(log n) due to field length. For each TRIE node, you have O(log n) for searching for the proper branch. Overall, your search should have O(log (log n)) search time.
This performance stands out ever more if the field is alphanumeric. Assuming using only ASCII, each TRIE node would have 256 branches. The height of the TRIE would depend on the length of the character field. Representing this TRIE for variable-length strings would produce TRIE nodes that would be very sparse, but quickly searchable nonetheless.
Regardless what database you use, carefuly plan the data types you will be using to represent the TRIE node. You may also want to partition the table so that strings of length n terminate in partition n. Thus, you will have O(log n) search time at each partition.
http://en.wikipedia.org/wiki/Trie
http://www.eecs.harvard.edu/~ellard/Q-97/HTML/root/node24.html
http://www.webreference.com/js/tips/000318.html
http://en.wikipedia.org/wiki/Radix_tree
Best Answer
PostgreSQL is perfectly fine for this. You got a couple of options to make this work.
First of all: PostgreSQL has a special index type for that called GIN (http://www.cybertec.at/gin-just-an-index-type/). It is perfect for full-text-search in general.
The cool thing is: In the latest version of PostgreSQL there is support for a thing called jsonb. You can put your string into a JSON document and search on EACH field in the JSON in a nice way using GIN and a couple of cool operators (see http://www.postgresql.org/docs/9.4/static/datatype-json.html). JSONB is really really fast, very powerful and it definitely kills MongoDB on the performance side. In addition to that a couple million of rows are no real big deal for PostgreSQL anyway. There is one more cool thing: PostgreSQL can do fuzzy search on strings (through Nearest-Neighbour search).