MySQL – First Names Database Design for Performance

database-designMySQLperformance

I have a large amount of data on users (about 8 million users).

I want to mine this data to create a table with statistics about first names. It should also include as many variations of each name as possible. The variations of first names ("John" and "Johny") should be recognized and treated properly.

The purpose of this table is to quickly find variations for a name and select one, in order to use user's non-standard name when contacting him. The name to be searched for can be his standard/ official ("John") or any variation of it ("Jack, Jackie, Jacky").

So far, I came up with following table structure:

table name: first_name_variations
+---------+----------+-------------+-------+
|  name   | group_id | is_standard | count |
+---------+----------+-------------+-------+
| Jack    |        1 |           1 |    53 |
| Jackie  |        1 |           0 |     5 |
| Francis |        2 |           1 |    32 |
| Frank   |        2 |           0 |    52 |
+---------+----------+-------------+-------+

The "name" column should be primary index. Standard/ official names are to be marked with "is_standard". The variations of one name must be assigned the same group_id.

Should I include surrogate key (e.g. id)? Will it improve something?

Should I create a separate table for name groups, or maybe just marking names with group_id would be sufficient?

Since my database knowledge is rather shallow, can you please propose improvements to his approach.

The database is MySQL

UPD: I think I'll remove 'is_standard' column; it's an overkill.

Best Answer

Adding a surrogate id will not help in this case. You already have a good PRIMARY KEY(name), correct?

The 4 columns look good for the purposes mentioned. You probably need a composite INDEX(group_id, name) to facilitate finding all the related queries.

What do you do about nicknames that apply to two standard names?

What do you do about someone with a non-"standard" name as their 'official' name?