MySQL database design to sort and filter air traffic data

database-designMySQLmysql-5.6

I am struggling to design the correct database / structure for what I'm trying to do. Let me try and explain the situation.

Background

We are a group of enthusiasts who monitor air traffic via ADS-B reception. If you have seen www.fr24.com then you have a pretty good idea about it all. All planes broadcast a signal containing the unique HEXCODE identifier, together with speed, position, etc.

We who receive the signals store this information in a database of sorts, all based on the HEXCODE that was transmitted by the big bird in the sky. Depending on what hardware and software you use, that determines the db that the software creates.

But many times you see a "new" plane and the db has no reference (prior data) for it, say what model or airline it is, so it ends up as an "unknown". Each one tries to update the unknowns from a variety of different sources on the internet, so when next you see it, you have a name and model and airline etc. attached to it. This is just for own viewing on local computer.

That is the background of the project that I'm busy with.

Data

We are now trying to consolidate all the data each user has gathered into one database, update what is missing, help each other with missing information and try to go forward with an up-to-date db. This is where it becomes difficult now.

I have managed to get 6 different datasets (lets not call them databases), each with its own data in many different fields. I have stripped down the CSV/xls files to contain the important fields: (MODATE is the last seen/modified date)

+---------+---------+-------+-------+------+-------+--------+----------+---------+
| HEXCODE | LASTHEX | REGNO | MODEL | TYPE | MODEL | SERIAL | OPERATOR | MODDATE |
+---------+---------+-------+-------+------+-------+--------+----------+---------+

Datasets Bravo, Delta, Hotel, Sierra (let's name them) contain:

Bravo:

+---------+-------+-------+------+-------+--------+----------+---------+
| LASTHEX | REGNO | MODEL | TYPE | MODEL | SERIAL | OPERATOR | MODDATE |
+---------+-------+-------+------+-------+--------+----------+---------+

Charlie (the most correct source in terms of what it contains):

+-------+-------+------+--------+----------+
| REGNO | MODEL | TYPE | SERIAL | OPERATOR |
+-------+-------+------+--------+----------+

Foxtrot (the most accurate mapping of HEXCODE to REGNO available):

+---------+---------+-------+-------+------+-------+--------+----------+
| HEXCODE | LASTHEX | REGNO | MODEL | TYPE | MODEL | SERIAL | OPERATOR |
+---------+---------+-------+-------+------+-------+--------+----------+

So I created a DB, called it "aircraft1" with tables named:

Bravo
Charlie
Delta
Foxtrot
Hotel
Sierra

and each of those tables contain the same fields:

+---------+---------+-------+-------+------+--------+----------+------------+
| hexcode | hexlast | regno | model | type | serial | operator | moddate    |
+---------+---------+-------+-------+------+--------+----------+------------+

… though some of the fields are NULL (based on the source where it came from, and if that source has that data. I imported the relevant datasets into the relevant tables.

Queries

Now at this point I can ask the DB anything with my limited SQL-speak:

select * from {table} where {regno/hex} = {whatever}

…and it gives accurate information (compared with the different data sets).

The ultimate purpose of all of this, is to combine all the data and create a current and complete database/dataset of everything, where the selection would be something like:

Select a HEXLAST from BRAVO/DELTA/HOTEL/SIERRA
check in FOXTROT if it's a HEXLAST or HEXCODE to find the REGNO
then goto CHARLIE and pull the other fields in.
If unable to find the HEXLAST in FOXTROT, check in others (datasets B/D/H/S) if there is a REGNO, if found, goto CHARLIE and pull the other fields in.
If not found in CHARLIE, check others (datasets B/D/H/S) for the most current MODDATE and select other fields from that dataset.
If no match found at all, list it as UNKNOWN with the MODDATE (last seen)
Create a new table with the updated information.

Questions

Speaking to peers, they advised me I'm silly to have created tables based on the data sources, I should rather have created tables named: hexcode, hexlast, regno, etc. I should also have created a Primary Key that is a number unique to every record, rather than a Primary key within the table itself (where depending on the data in the table, some PK's are the REGNO, others the HEXLAST, etc).

Basically doubting what I have done thus far, asn wondering if ever I would be able to get the desired output from all of this.

How would YOU (with your experience and expertise) would have gone about this?

What I have created thus far, I am more than willing to share with anyone willing to help with this project. The information in there is basically in the public domain, so its nothing uber secret/confidential at all).

My strength lies in PC's, networking and wireless networks, not in databases. Google is helpful to a great extent, but when it comes to custom/specific stuff like this, finding the right answer/solution/method … thats not really out there.

Any help and advice, pointers and slaps in the right direction, will be appreciated.

Best Answer

It is 'rarely' good form to have two tables with the same set of columns. It is 'usually' better to have one table with an extra column that specifies 'dataset' the row belongs to (Bravo/Delta/...).

Often it is helpful to think of the schema (table definitions, etc) from an "Entity/Relation" point of view. Start with the Entities such as (I guess) Signals, Aircraft, etc. Where necessary, use "many-to-many" relation tables. Think of students:classes. Entities are students, classes, teachers, etc. The Students_Classes relation table says which class(es) each student is in and which student(s) are in each class.

Related Solutions

Design of maintenance database

IMO It is much better to have a design with multiple tables. You will be able to track changes more effectively that way.

Also, for the fields that change, I would not go with a Name-Value pair of storage. I would actually create tables with appropriately named columns.

Is data space really an issue for you? You should run a few calculations estimates to see if space is really going to be an issue. You'll be very surprised to see how much of a non-factor hard disk space is.

Database design: where do I put this data

What you are doing here is called specialization.

You can look at this answer to a similar question at stackoverflow which will point you in the right direction. I've implemented a slightly modified version with success.

*An other option (that I do not advise!!) is the use of EAV. *

Edit: I've created an example for your situation. (Don't mind the sample data ;) )

CREATE TABLE EventType (
EventTypeID INT NOT NULL PRIMARY KEY,   
EventType VARCHAR(200) NOT NULL   
);

INSERT INTO EventType (EventTypeID, EventType) VALUES (1, 'Play Rockband'),(2, 'Play Call of Duty');

CREATE TABLE Event (
    EventID INT NOT NULL PRIMARY KEY,
    EventTypeID INT NOT NULL,
    Event VARCHAR(200) NOT NULL,
    CONSTRAINT UQ_Event UNIQUE (EventID, EventTypeID),
CONSTRAINT FK_Event_EventType FOREIGN KEY (EventTypeID) REFERENCES EventType(EventTypeID)
);

INSERT INTO Event (EventID, EventTypeID, Event) VALUES (1, 1, 'Play the bass'),(2, 2, 'Sniper!');

CREATE TABLE SpecialEvent1 (
    EventID INT NOT NULL PRIMARY KEY,
    EventTypeID INT NOT NULL,
    Extrafield VARCHAR(200) NOT NULL,
/* Restrict to only insert SpecialEvent1 */
CONSTRAINT CHK_EventTypeID_1 CHECK (EventTypeID = 1),  
/* Check if event exists */  
CONSTRAINT FK_SpecialEvent1_Event FOREIGN KEY (EventID, EventTypeID) REFERENCES Event(EventID, EventTypeID)
);

INSERT INTO SpecialEvent1 (EventID, EventTypeID, Extrafield) VALUES (1, 1, 'Expert');

CREATE TABLE SpecialEvent2 (
    EventID INT NOT NULL PRIMARY KEY,
    EventTypeID INT NOT NULL,
    Extrafield2 VARCHAR(200) NOT NULL,
Extrafield3 VARCHAR(200) NOT NULL,
/* Restrict to only insert SpecialEvent2 */
CONSTRAINT CHK_EventTypeID_2 CHECK (EventTypeID = 2),  
/* Check if event exists */    
CONSTRAINT FK_SpecialEvent2_Event FOREIGN KEY (EventID, EventTypeID) REFERENCES Event(EventID, EventTypeID)
);

INSERT INTO SpecialEvent2 (EventID, EventTypeID, Extrafield2, Extrafield3) VALUES (2, 2, 'Character X', 'Team Z');

You can query the events like this:

SELECT EventType.Eventtype AS [Type of event], Event.Event, SpecialEvent1.Extrafield AS Difficulty
FROM Event
INNER JOIN EventType ON Event.EventTypeID = EventType.EventTypeID
INNER JOIN SpecialEvent1 ON Event.EventID =  SpecialEvent1.EventID AND Event.EventTypeID = SpecialEvent1.EventTypeID;

SELECT EventType.Eventtype AS [Type of event], Event.Event [Play As], SpecialEvent2.Extrafield2 AS Character, SpecialEvent2.Extrafield3 AS Team
FROM Event
INNER JOIN EventType ON Event.EventTypeID = EventType.EventTypeID
INNER JOIN SpecialEvent2 ON Event.EventID =  SpecialEvent2.EventID AND Event.EventTypeID = SpecialEvent2.EventTypeID;

SQL Fiddle: http://sqlfiddle.com/#!6/d1911/1/0