MySQL database design to sort and filter air traffic data

database-designMySQLmysql-5.6

I am struggling to design the correct database / structure for what I'm trying to do. Let me try and explain the situation.

Background

We are a group of enthusiasts who monitor air traffic via ADS-B reception. If you have seen www.fr24.com then you have a pretty good idea about it all. All planes broadcast a signal containing the unique HEXCODE identifier, together with speed, position, etc.

We who receive the signals store this information in a database of sorts, all based on the HEXCODE that was transmitted by the big bird in the sky. Depending on what hardware and software you use, that determines the db that the software creates.

But many times you see a "new" plane and the db has no reference (prior data) for it, say what model or airline it is, so it ends up as an "unknown". Each one tries to update the unknowns from a variety of different sources on the internet, so when next you see it, you have a name and model and airline etc. attached to it. This is just for own viewing on local computer.

That is the background of the project that I'm busy with.

Data

We are now trying to consolidate all the data each user has gathered into one database, update what is missing, help each other with missing information and try to go forward with an up-to-date db. This is where it becomes difficult now.

I have managed to get 6 different datasets (lets not call them databases), each with its own data in many different fields. I have stripped down the CSV/xls files to contain the important fields: (MODATE is the last seen/modified date)

+---------+---------+-------+-------+------+-------+--------+----------+---------+
| HEXCODE | LASTHEX | REGNO | MODEL | TYPE | MODEL | SERIAL | OPERATOR | MODDATE |
+---------+---------+-------+-------+------+-------+--------+----------+---------+

Datasets Bravo, Delta, Hotel, Sierra (let's name them) contain:

Bravo:

+---------+-------+-------+------+-------+--------+----------+---------+
| LASTHEX | REGNO | MODEL | TYPE | MODEL | SERIAL | OPERATOR | MODDATE |
+---------+-------+-------+------+-------+--------+----------+---------+

Charlie (the most correct source in terms of what it contains):

+-------+-------+------+--------+----------+
| REGNO | MODEL | TYPE | SERIAL | OPERATOR |
+-------+-------+------+--------+----------+

Foxtrot (the most accurate mapping of HEXCODE to REGNO available):

+---------+---------+-------+-------+------+-------+--------+----------+
| HEXCODE | LASTHEX | REGNO | MODEL | TYPE | MODEL | SERIAL | OPERATOR |
+---------+---------+-------+-------+------+-------+--------+----------+

So I created a DB, called it "aircraft1" with tables named:

  • Bravo
  • Charlie
  • Delta
  • Foxtrot
  • Hotel
  • Sierra

and each of those tables contain the same fields:

+---------+---------+-------+-------+------+--------+----------+------------+
| hexcode | hexlast | regno | model | type | serial | operator | moddate    |
+---------+---------+-------+-------+------+--------+----------+------------+

… though some of the fields are NULL (based on the source where it came from, and if that source has that data. I imported the relevant datasets into the relevant tables.

Queries

Now at this point I can ask the DB anything with my limited SQL-speak:

select * from {table} where {regno/hex} = {whatever}

…and it gives accurate information (compared with the different data sets).

The ultimate purpose of all of this, is to combine all the data and create a current and complete database/dataset of everything, where the selection would be something like:

  1. Select a HEXLAST from BRAVO/DELTA/HOTEL/SIERRA
    check in FOXTROT if it's a HEXLAST or HEXCODE to find the REGNO
    then goto CHARLIE and pull the other fields in.
  2. If unable to find the HEXLAST in FOXTROT, check in others (datasets B/D/H/S) if there is a REGNO, if found, goto CHARLIE and pull the other fields in.
  3. If not found in CHARLIE, check others (datasets B/D/H/S) for the most current MODDATE and select other fields from that dataset.
  4. If no match found at all, list it as UNKNOWN with the MODDATE (last seen)
  5. Create a new table with the updated information.

Questions

Speaking to peers, they advised me I'm silly to have created tables based on the data sources, I should rather have created tables named: hexcode, hexlast, regno, etc. I should also have created a Primary Key that is a number unique to every record, rather than a Primary key within the table itself (where depending on the data in the table, some PK's are the REGNO, others the HEXLAST, etc).

Basically doubting what I have done thus far, asn wondering if ever I would be able to get the desired output from all of this.

How would YOU (with your experience and expertise) would have gone about this?

What I have created thus far, I am more than willing to share with anyone willing to help with this project. The information in there is basically in the public domain, so its nothing uber secret/confidential at all).

My strength lies in PC's, networking and wireless networks, not in databases. Google is helpful to a great extent, but when it comes to custom/specific stuff like this, finding the right answer/solution/method … thats not really out there.

Any help and advice, pointers and slaps in the right direction, will be appreciated.

Best Answer

It is 'rarely' good form to have two tables with the same set of columns. It is 'usually' better to have one table with an extra column that specifies 'dataset' the row belongs to (Bravo/Delta/...).

Often it is helpful to think of the schema (table definitions, etc) from an "Entity/Relation" point of view. Start with the Entities such as (I guess) Signals, Aircraft, etc. Where necessary, use "many-to-many" relation tables. Think of students:classes. Entities are students, classes, teachers, etc. The Students_Classes relation table says which class(es) each student is in and which student(s) are in each class.