Database Design: Linking ‘vague’ entries to associations

database-design

Experienced developer, inexperienced DBA here.

I'm currently working on a project that lets users input the names of freight vehicles for particular packages. Eg:

Company: White Star Line
Vehicle: The Titanic
Vehicle Type: Ship

At the present time, all three of those fields are unrestricted text fields. So effectively the data is useless for anything except display. If someone enters the exact same details as above, the DB has no means of realising it's the same ship as a record with the exact same entries.

We cannot restrict any of these fields to a set of predefined values (too many potential entries, need to be open to anything), but I'd love to unify some of the more frequently used values. I'm wondering about the most elegant/efficient way of doing this.

My current line of thinking is to add an extra set of database tables with 'recognised company' & 'recognised vehicle', which work in parallel to the existing table. So if a user enters "White Star Line" as the company, and there's a pre-approved entry in the "recognised company" table, it forms an association with that company in an optional 'recognised_company_id' field.

Similarly, if the user enters White Star Line AND 'The Titanic' as ship, and there's an entry for that ship WITH that brand, it enters the association into recognised_vehicle_id.

A few things:

  • Autocomplete on the input fields could hopefully eliminate some spelling errors/syntax differences on identical brands/vehicles.

  • Reviewing a count of duplicates of the existing table could help determine which recognised companies/vehicles should be manually added.

This is far from a pretty solution, but it's the best I've got. I'm guessing this is something folks have to do quite regularly, and I'm just wondering – is there a more graceful approach to what I'm trying to achieve? Any rules of thumb for tackling this sort of issue?

Thanks in advance

Best Answer

One possibility would be to build a list of the values you currently have, and make users pick from that list, or enter it separately to get it on the list.

I do this all the time in various web interfaces - when I need to pay a bill through my online bank, for example, I first have to go create a payee. I curse this when I am only paying a one-time bill, but I love that I don't have to enter my account number and other details every time I make a car payment, mortgage payment or pay an electric bill - only the first time.

This will also reduce the noise from "almost" duplicates, for example if one person spells it Weber's and another spells it Webbers hopefully the second user will just use the first even if they don't think it's correct (and if they're both wrong you still only have to correct one row).

If you make users see the long-term benefit there should be no screaming about the up-front pain.