Mysql – Storing sports teams with multiple alternative names

database-designMySQLschema

I am designing a MySQL database to store some football match results and statistics in.

The data for populating the DB will be scraped from multiple sources and the names of the teams will be slightly different on each website. (Some have old names used before the team was rebranded and changed its name completely; some will have abbreviations such as United -> Utd; some will have FC in the name, some won't; other sources will have the names in a language different than English.)

I'm wondering what would be the best way to store that data.

My thinking is to use the TeamID in the main table and have an extra table as a dictionary for translating multiple variants of the team name into the ID. Of course it would be quite a hassle to populate such a "dictionary".

Example

TeamID | TeamName
-----------------------------
1      | Manchester United
1      | Man Utd
1      | MUFC
2      | PSG
2      | Paris Saint-Germain

This is actually a big problem with South American clubs, where one club can have five completely different names in various sources and it's not as easy as shortening United to Utd or abbreviating the name.

So I'll have a script scrape the name of a team, find it in the "dictionary" table, then write all the statistics and info into the main table using just the TeamID. That should also save some memory.

Please let me know if that's a good solution and if you have some different approaches for me.

Best Answer

Wiki answer from comments on the question:

Yes, this is what I do at the day job with textbook names where different schools will call the same textbook something completely different. Hopefully you won't run into an issue where different team names can overlap, as that makes this sort of problem 10x harder. - matigo

You may want to add a source column and an active to help disambiguate abbreviations since, presumably, a single site won't use the same name for two different teams at the same time even if two different sites use the same name for different teams.


Speaking about storing (subj) I'd add a column for additional attributes. Like "primary", "official full", "official short" and so on.. Speaking about scraping script I'd recommend to create normalizing vocabulary, which sets the relation between "any form" of a token and its canonical name (Utd -> United). This may simplify in "some name" converting to canonical name (or in searching for the most close canonical names list). - Akina