Postgresql – How to model complex systems like networks in Postgres

database-designpostgresqlrdbms

I need help with modelling of a network involving switches (24 port, 48 port, POE, and a combination of those) and edge devices (computers, IP cameras, telephones). From an object oriented perspective, I can code this in my sleep in C++. Switches, for example, would inherit from TwentyFourSwitch, and if they have POE, then from POESwitch, then in each class I would create 24 EthernetPort instances, and each would have a connect function making sure not to get an ethernet port to an SFP port, or a non POE port to a POE device and so on.

With RDBMS, we keep running into problems. First we organized everything to minimize data redundancy. Here is a sample of what we would do (to keep code short I used 2 and 4 port switches instead of 24 and 48).

CREATE TABLE MACAddress{
   id SERIAL INT PRIMARY KEY,
   mac1 INT,
   mac2 INT,
   mac3 INT,
   mac4 INT,
   mac5 INT,
   mac6 INT,
   UNIQUE(mac1,mac2,mac3,mac4,mac5,mac6)
);

CREATE TABLE IPAddress{
   id SERIAL INT PRIMARY KEY,
   ip1 INT,
   ip2 INT,
   ip3 INT,
   ip4 INT,
   UNIQUE (ip1,ip2,ip3,ip4)
);

CREATE TABLE NetworkDevice{
   id SERIAL INT PRIMARY KEY,
   ip INT REFERENCES (IPAddress),
   mac INT REFERENCES (MACAddress),
);

CREATE TABLE TwoPortSwitch{
   id SERIAL INT PRIMARY KEY,
   eth0 INT REFERENCES (NetworkDevice),
   eth1 INT REFERENCES (NetworkDevice),
   netId REFERENCES (NetworkDevice)
);

CREATE TABLE FourPortSwitch{
   id SERIAL INT PRIMARY KEY,
   eth0 INT REFERENCES (NetworkDevice),
   eth1 INT REFERENCES (NetworkDevice),
   eth0 INT REFERENCES (NetworkDevice),
   eth1 INT REFERENCES (NetworkDevice),
   netId REFERENCES (NetworkDevice)
);

CREATE TABLE Camera{
   id SERIAL INT PRIMARY KEY,
   eth0 INT REFERENCES (NetworkDevice),
   netId REFERENCES (NetworkDevice)
);

But the problem we ran into was that it took many joins and unions and it got very confusing very fast. For example, if you want to know what ip eth0 is connected to, you have to do joins/unions on TwoPortSwitch, FourPortSwitch, Camera, NetworkDevice and IPAddress. And mind you we have about 10 different cameras, 10 different switches, and 5 different computers.

We just got fed up and flattened everything, so now all switches are in one big switch table. It has 48 ports of 1 gig ethernet, 48 ports of 1 gig SFP and 48 ports of 10 gig SFP Plus. All in all, there are more than 1000 columns!

Is there a better way to approach this? I am seriously considering building a webserver and having the C++ server run at all times.

Best Answer

TL;DR: Normalize, normalize, normalize. Joins good, unions bad. Narrow tables with lots of rows good, wide tables with few rows bad. Fewer more general tables good, lots of very specific tables bad.

Denormalized structure

You're trying to model relations as objects. This can work, but it gets clumsy fast. It looks like a pretty direct mapping of your C++ classes, so it has lots of tables for specific types of things, and isn't normalized.

PostgreSQL has table inheritance features that make this approach more manageable, if you really want to pursue it. However the use of table inheritance is basically incompatible with the use of foreign keys due to the lack of unique index support across all child relations of an inheritance base relation. This makes inheritance much less useful than it could otherwise be.

Instead try to model it relationally. There are two schools of thought here, bottom-up and top-down modelling. In practice you use a mix of the two most of the time. Your goal is a well normalized database structure.

First, though...

`IPAddress` and `MACAddress` are unnecessary

You're also modelling IPAddress and MACAddress as entities, which they aren't. They're just scalar values. As far as I can tell these are probably direct translations of a C++ class or C struct that stores IP addresses as four uint8_t octets, MAC addresses as six uint8_t octets, etc. This is completely unnecessary in the database and should be done away with entirely. PostgreSQL provides native data types that serve these specific needs:

inet
macaddr

... but if you have to remain cross-database portable you would be better advised storing your IP addresses as numeric with a boolean indicating IPv4 or IPv6, rather than storing them as octets. You need numeric because a bigint (signed 64-bit integer) isn't big enough for an IPv6 address.

BTW, in the network a MAC address may have multiple IP addresses. While no IP address within a given connected subnet may have multiple MAC addresses, the same IP address may appear in discrete networks with different MAC addresses, so I wouldn't try to model this at the schema level. If I did, I'd create an NetworkIP relation with a primary key of (network_id integer REFERENCES Network, ipaddr) to allow the same IP to exist only on disconnected networks. But then you have to consider multilevel NAT and it gets horrifying. Don't go there. I'd leave them as scalars.

Relational model

Now that we've done away with MACAddress and IPAddress, lets look at structure.

You should flatten everything, but not like this:

It has 48 ports of 1 gig ethernet, 48 ports of 1 gig SFP and 48 ports of 10 gig SFP Plus. All in all, there are more than 1000 columns!

Rather than "wide", go "tall". Don't fear joins, joins are your friend.

You have network devices, of various specific kinds, where each kind has various additional characteristics. Each device may have one or more IP addresses and MAC addresses assigned to it (e.g. switch management addresses). Additionally, each device may have zero or more ports, and each port has one MAC address and zero or more IP addresses. To simplify things we define management interfaces as virtual ports.

So lets model that core structure.

CREATE TABLE device (
    id serial primary key,
    device_type text not null,
    device_name text,
    description text
);

-- A port is a logical port, which may correspond to a physical
-- plug, or a virtual port like an internal management interface,
-- IPMI controller, etc.
CREATE TABLE port (
    device_id integer not null references device(id),
    port_name text not null,
    port_mac macaddr not null,
    port_is_virtual boolean not null default 'f',
    PRIMARY KEY (device_id, port_name),
    UNIQUE(device_id, port_mac)
);

-- Now the IP address(es) for a port
CREATE TABLE port_ip (
    port_id integer not null references port(id),
    port_ipaddr ipaddr not null
);

-- A socket is a physical plug, which generally has a logical
-- port associated with it.
CREATE TABLE socket (
  device_id integer references device(id),
  socket_index integer not null,
  socket_name text,
  -- This might be a good use for an enum type, but for now
  -- just use a check constraint
  socket_type text not null check (socket_type IN ('rj45', 'sfp', 'sfpplus'),
  PRIMARY KEY(device_id, socket_index)
);

-- A port may have a socket or not, and a socket may have a port
-- or not, so model via a join table. If you want to prevent
-- a socket having multiple ports or a port having multiple sockets
-- you could use additional unique constraints.
CREATE TABLE port_sockets (
   device_id integer references device(id),
   port_name text references port(port_name),
   socket_index integer references socket(socket_index),
   PRIMARY KEY (device_id, port_name, socket_index)
);

See how it's "vertical" - we have a port table with entries for each port for each host, etc.

So now you want to model switches. These are network devices. They have a certain number of ports based on the model.

You can keep track of the number of ports directly in the entity, or you can just count the number of ports for that switch when you want to know. While keeping track of the port count in the switch entity is a denormalization that duplicates data, in this case it's probably worth it. So I'd do something like:

CREATE TABLE switch (
  switch_id serial primary key,
  device_id integer references device(id),
  nports integer not null,
  model text,
  management_port_name text,

  constraint switch_management_port
   foreign key (device_id, management_port_name)
   references port(device_id, port_name)

);

etc.

The switch is a network device, and ports are associated with it via the device table.

All this is a bit hand-wavey since I don't know exactly what you need to model, but should help you start thinking about it in a way that'll get you moving again.

Learn to love joins.

Modelling connections

To model connections between devices you might want to use an adjacency list to model it as an undirected cyclic graph. That sounds hairy, but it's really not, it's just a simple table saying "this one is connected to this one" without bothering to say which is "from" and which is "to", treating them as symmetrical.

There's lots of established literature on querying adjacency lists from SQL. You'll probably want to get friendly with recursive common table expressions (WITH RECURSIVE) if you want to do it efficiently, but for simple cases you can just do loops in C, or fetch relevant subetss of the adjacency list and search it with boost::graph.

e.g.

CREATE TABLE socket_connection (
    first_device_id integer references device(id),
    first_socket_index integer,
    second_device_id integer references device(id),
    second_socket_index integer,

    PRIMARY KEY (first_device_id, first_socket_index,
                 second_device_id, second_socket_index),

    UNIQUE(first_device_id, first_socket_index),

    UNIQUE(second_device_id, second_socket_index),

    FOREIGN KEY (first_device_id, first_socket_index)
      REFERENCES socket(device_id, socket_index),

    FOREIGN KEY (second_device_id, second_socket_index)
      REFERENCES socket(device_id, socket_index),

    -- A simple way to prevent both (1, 1, 2, 1) and (2, 1, 1, 1)
    -- being stored as separate connections:

    CONSTRAINT first_device_id_is_always_lowest
      CHECK (first_device_id < second_device_id)
);

... which looks hideous, but is actually just a mapping between two sockets plus a lot of sanity checking.

A note on composite keys

I've used composite keys like (device_id, socket_index) a lot above.

Some people like them, some don't. There are advantages and disadvantages to their use. If you want to use generated surrogate keys everywhere, that's fine too.

Reinventing the wheel

You might think "hey, has this been done before?".

I'm quite sure it has. Whether the existing models match your needs is less certain, but I suggest looking over some of them before you go and invent your own from scratch.

pgAdmin timing

When you execute a query from the query tool, the message pane shows something like:

Total query runtime: 62 ms.

And the status line shows the same time. I quote pgAdmin help about that:

The status line will show how long the last query took to complete. If a dataset was returned, not only the elapsed time for server execution is displayed, but also the time to retrieve the data from the server to the Data Output page.

If you want to see the time on the server you need to use SQL EXPLAIN ANALYZE or the built in Shift + F7keyboard shortcut or Query -> Explain analyze. Then, at the bottom of the explain output you get something like this:

Total runtime: 0.269 ms

Best Answer

Denormalized structure

IPAddress and MACAddress are unnecessary

Relational model

Modelling connections

A note on composite keys

Reinventing the wheel

Related Solutions

Postgresql – Streaming Replication on Two Postgres Systems

Postgresql – Postgres multiple joins slow query, how to store default child record

pgAdmin timing

Related Question

`IPAddress` and `MACAddress` are unnecessary