MySQL full text search ranking

In this article I found the formula that MySQL uses to rank results in full-text search (FTS):

w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf)

Where

dtf is the number of times the term appears in the document

sumdtf is the sum of (log(dtf)+1) over all terms in the same document

U is the number of unique terms in the document

N is the total number of documents

nf is the number of documents that contain the term
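To make the pieces of the formula concrete, here is a minimal Python sketch that evaluates it for a single term. It assumes that log is the natural logarithm; the function name term_weight, the sample term counts, and the values of N and nf are made up for illustration and are not taken from the MySQL source.

```python
import math

PIVOT_VAL = 0.0115  # the constant asked about in this question


def term_weight(dtf, sumdtf, U, N, nf):
    """Evaluate w = (log(dtf)+1)/sumdtf * U/(1+0.0115*U) * log((N-nf)/nf).

    Assumes natural logarithms; argument names follow the definitions above.
    """
    local = (math.log(dtf) + 1) / sumdtf      # term frequency part
    norm = U / (1 + PIVOT_VAL * U)            # middle part containing 0.0115
    collection = math.log((N - nf) / nf)      # rarity of the term in the collection
    return local * norm * collection


# Hypothetical document: term -> number of occurrences
counts = {"mysql": 3, "search": 1, "ranking": 1}
sumdtf = sum(math.log(c) + 1 for c in counts.values())
U = len(counts)

# Hypothetical collection: 1000 documents, 10 of which contain "mysql"
w = term_weight(counts["mysql"], sumdtf, U, N=1000, nf=10)
print(w)
```

Note that the last factor is zero when the term occurs in exactly half of the documents and negative when it occurs in more than half, so under this formula such a term does not raise the score.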

But what does the constant 0.0115 stand for?

Best Answer

From your source:

The normalization factor is the middle part of the formula. The idea of normalization is: if a document is shorter than average length then weight goes up, if it's average length then weight stays the same, if it's longer than average length then weight goes down. We're using a pivoted unique normalization factor. For the theory and justification, see the paper "Pivoted Document Length Normalization" by Amit Singhal and Chris Buckley and Mandar Mitra ACM SIGIR'96, 21-29, 1996: http://ir.iit.edu/~dagr/cs529/files/handouts/singhal96pivoted.pdf. The word "unique" here means that our measure of document length is based on the unique terms in the document. We chose 0.0115 as the pivot value, it's PIVOT_VAL in the MySQL source code header file myisam/ftdefs.h

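So 0.0115 is simply the pivot value that the MySQL developers chose for the length-normalization part of the formula (PIVOT_VAL in myisam/ftdefs.h). It is a tuning constant, not a quantity derived from the other variables.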
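To see what the constant does, here is a small Python sketch of the middle factor U/(1+0.0115*U) on its own. The function names are hypothetical, and the second function rests on a simplifying assumption: every term occurs exactly once in the document, so that sumdtf equals U.

```python
PIVOT_VAL = 0.0115  # same value as PIVOT_VAL in the quoted source


def norm_factor(U):
    """Middle part of the formula: U / (1 + 0.0115 * U)."""
    return U / (1 + PIVOT_VAL * U)


def weight_before_idf(U):
    """(log(dtf)+1)/sumdtf * norm_factor(U) for a document in which every
    term occurs once (assumption): dtf = 1 and sumdtf = U."""
    return (1.0 / U) * norm_factor(U)


for U in (1, 10, 100, 1000, 10000):
    print(U, norm_factor(U), weight_before_idf(U))
```

The factor itself grows with U but never exceeds 1/0.0115 (about 87). Under the one-occurrence assumption, the per-term weight before the log((N-nf)/nf) part reduces to 1/(1+0.0115*U), which shrinks as the document gains unique terms. That matches the quoted description that weights go down for longer documents.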