Designing a user table, FName LName v.s Username

database-designfull-text-search

I did some research on different ways of getting user's name. Some database designers prefer to have a username that contains first name and last name with space separator, and some other designers have it in two separate fields such as fname and lname. When it comes to searching using mySql, which approach makes it easier to search for a user based on his full name, or only either his last name or first name ? Any advice on the design approach ? My application is probably going to have about 5,000,000 users registered, I'm a bit concerned about in what way to approach user full name.

This is how I'm doing it, assuming that the user enter "alex coner" in one input textbox and press enter, then I parse the string based on space separation and then run the query like this which doesn't know which one is fname and lname :

select * from User 
where UserFName in ("alex", "coner") 
   OR UserLName in ("alex", "coner") 

is there a way to run this and then get the one that matches the closest ? Assuming there is these users as below:

alex buche
coner johnson
alex coner
david rudd

and the result appears as :

alex coner
alex buche
coner johnson

Best Answer

The answer from RLF lays out most of the basics very clearly. There are a few places where the basic approach may fall flat:

  1. If you are searching for a substring within either the first or last name and that substring is somewhere in the middle of the name. This will prevent the index from being used and result in a full table scan. This could impact performance. For example, you might have the first name 'MARY MARGARET' somewhere in the database. You would want to return this result if a user searched on 'MARGARET' meaning that you need to generate something like: select * from user where fname like '%MARGARET%'. This query would not use the index on the fname column because of the wildcard needed at the beginning of the search term.
  2. One of the things that you allude to is the need for ranking by relevance within the results. Again, this is somewhere where a simple index on the columns fails. You will get the results, but you have no way of measuring relevance. You could establish some rules for ranking the results depending on whether they represent an exact or partial match, but identifying and tuning the criteria will take some work.
  3. Simple substring matching also will not support fuzzy matches or misspellings in the search. For example, you would probably like for a search on 'JON' to return both name containing 'JON' and 'JOHN' with higher relevance assigned to the former.

If you have a need for search functions in your application other than this simple name search, you are finding that performance is too slow using simple indexes, or you find that you really need fuzzy matching, it is definitely worth considering using the full text search capabilities in the MySQL itself or integrating an external search engine like Lucene. This approach would address both of the gaps described above. My team has had a fair amount of success using Lucene and the setup wasn't nearly as hard as we thought it would be up front.