Database Design – Storing Historical nmap Data

database-design

I am designing a database to store nmap data. I want to store two basic types of data:

  1. Information about the nmap scans themselves (e.g. start and end times, nmap version used, command line arguments used)
  2. Information about the hosts on the network (e.g. host names, IP and MAC addresses, open ports, services running)

The primary purpose of my application is to generate a "map" of the network; essentially, a listing of which hosts are up, which hosts are down, which ports are open on which hosts, etc. For this part of the application, all I care about is the current state; in other words, the results of the latest nmap scan. However, my application also needs to do some historical reporting, like generate a graph of host OS over time.

I read this StackOverflow answer about storing historical data and it seems logical: only track the current state in your main tables and use audit tables for historical reporting. So I would have, for example, a hosts table with the following columns:

host_id
name
ip_address
mac_address
status

and a ports table with the following columns:

port_id
host_id
number
status

Every time I run nmap, I would truncate the hosts and ports tables and insert the new results.1 The corresponding audit tables would track all of the changes over time so I could do historical reporting. Easy enough.

The problem(s)

What I can't figure out is the best way to store the information about the nmap scans themselves. If I followed the above structure, I might have a scans table with these columns:

scan_id
start
end
version
arguments

and a corresponding audit table. But the scans table would only ever contain one row (the most recent scan). This seems utterly pointless. There would also be no need for a scan_id column.

Okay, drop the audit table just for scans. Every time I run nmap, create a new row in the scans table and increment the scan_id. Problem solved.

However, now I want to generate a graph of number of hosts on the network over time. This would require me to join the scans table with the hosts_audit table. But now I need to add scan_id as a column to the hosts and ports tables in order to do the join. If I do this, however, every row in the hosts and ports tables would have the same scan_id, since those tables are truncated every time I run nmap. Hmmmm.

This is my first time designing a database from the ground up, and something doesn't smell right. I want the hosts table to reflect the present state of the network, but truncating the table every time I run nmap seems…unorthodox. I want to be able to connect data from the scans table with data from the hosts table, but can't figure out a way that makes sense. What exactly is wrong with the design I've outlined above and how can I improve it?


  1. I don't think simply updating rows would work; for example, if a particular host is decommissioned, it would no longer appear in the nmap results but without truncating, would stay in the hosts table indefinitely. However, it feels a little odd to be truncating all my tables every few minutes…

Best Answer

I'm not sure you need audit tables. I think a simple schema like this would be sufficient:

scans
-----
  scan_id (PK)
  start (this must be a datetime)
  end (this must be a datetime)
  version
  arguments

hosts
-----
  host_id (PK)
  scan_id (FK to scans.scan_id)
  name
  ip_address
  mac_address
  status

ports
-----
  port_id (PK)
  host_id (FK to hosts.host_id)
  number
  status

Each scan has a start and end so you know when it took place. From this you can easily determine which is the most recent scan. Since hosts has a foreign key to scans, you can always tell which hosts were found with a specific scan. Similar for ports.

In this example, host_id is simply a numeric identifier generated by the database. If you want to store something like myhost.com, add a hostname column to hosts. Same thing for ports.