PostgreSQL – Schema design for user activity logging

Tags: database-design, postgresql

I'm designing a schema to support logging of user activity, where users must be able to search:

  • Across all events (events of any type), filtered by datetime range and by username;
  • Across events of a single type, with the same filters as above plus the parameters specific to that event type.

I created this schema:

[schema diagram]

And designed queries to search:

  • across all events:

    SELECT extract(epoch from log_time) * 1000,
      u.username,
      CASE WHEN l.event_type IN (0, 2) THEN e.template
        WHEN l.event_type = 1 THEN format(e.template, ecs.field)
        WHEN l.event_type = 3 THEN format(e.template, ess.field)
        WHEN l.event_type = 4 THEN format(e.template, ect.ip, ect.port)
      END
    FROM logs l
    JOIN users u ON u.id = l.user_id
    JOIN event_def e ON e.id = l.event_type
    LEFT JOIN event_client_search ecs ON ecs.log_id = l.id
    LEFT JOIN event_switch_search ess ON ess.log_id = l.id
    LEFT JOIN event_cable_test ect ON ect.log_id = l.id
    
  • across same-type events:

    SELECT extract(epoch from log_time) * 1000,
        u.username,
        format(e.template, ect.ip, ect.port)
    FROM logs l
    JOIN users u ON u.id = l.user_id
    JOIN event_def e ON e.id = l.event_type
    JOIN event_cable_test ect ON ect.log_id = l.id
    

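The requirements above also call for filtering by username and datetime range; a sketch of how those filters might be added to the "all events" query (the `$1`..`$3` placeholders are illustrative, and `log_time` is assumed to be a column of `logs`):

```sql
-- Sketch: the "all events" query restricted to one user and a datetime range.
SELECT extract(epoch from log_time) * 1000,
       u.username,
       CASE WHEN l.event_type IN (0, 2) THEN e.template
            WHEN l.event_type = 1 THEN format(e.template, ecs.field)
            WHEN l.event_type = 3 THEN format(e.template, ess.field)
            WHEN l.event_type = 4 THEN format(e.template, ect.ip, ect.port)
       END AS message
FROM logs l
JOIN users u     ON u.id = l.user_id
JOIN event_def e ON e.id = l.event_type
LEFT JOIN event_client_search ecs ON ecs.log_id = l.id
LEFT JOIN event_switch_search ess ON ess.log_id = l.id
LEFT JOIN event_cable_test    ect ON ect.log_id = l.id
WHERE u.username = $1
  AND l.log_time BETWEEN $2 AND $3;
```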
This is what the event_def table looks like:

 id |                template
----+----------------------------------------
  0 | Logged in
  2 | Logged out
  1 | Searched in clients %s
  3 | Searched in switches %s
  4 | Cable test switch %s port %s

What I don't like is that once I have about 100 event types (I have about 50 now, but haven't implemented them all yet), this is going to become a performance issue.

So I thought about merging events that share the same parameters, like this:

[revised schema diagram]

I don't think that will help much (it will cut the number of event tables by half at most). Maybe I'm thinking in the wrong direction?

Best Answer

One of the assumptions I would ask you to reconsider is whether you really want a separate table for each event type, or whether these event types should instead be rows in a single table.

If you had an EVENT_TYPE table containing your list of event types (about 100 rows), you could take the columns from your various event tables and turn them into rows in an EVENT_PARAMETER table.
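A sketch of what those two lookup tables might look like (table and column names are my assumptions, not prescriptive):

```sql
-- Event types become rows, not tables.
CREATE TABLE event_type (
    id       integer PRIMARY KEY,
    template text NOT NULL           -- e.g. 'Cable test switch %s port %s'
);

-- Each column of a former per-event table becomes a row here.
CREATE TABLE event_parameter (
    id            integer PRIMARY KEY,
    event_type_id integer NOT NULL REFERENCES event_type(id),
    name          text    NOT NULL,  -- e.g. 'ip', 'port', 'field'
    position      integer NOT NULL   -- order of the placeholder in the template
);
```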

This type of design would turn your schema changes into data changes when you add new event types. That would save you code maintenance issues and also avoid making your queries more complex and potentially slower.

The instances of events would similarly be kept in two tables, for example a LOG table and a LOG_DETAIL table.

LOG would be an intersection between user and event type.

LOG_DETAIL would be an intersection between log and event parameter.
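In DDL terms, the two instance tables described here might look like this (a sketch; names, types, and defaults are illustrative assumptions):

```sql
-- One row per event occurrence: which user, which event type, when.
CREATE TABLE log (
    id            bigserial   PRIMARY KEY,
    user_id       integer     NOT NULL REFERENCES users(id),
    event_type_id integer     NOT NULL,  -- FK to the event-type lookup table
    log_time      timestamptz NOT NULL DEFAULT now()
);

-- One row per parameter value of an event occurrence.
CREATE TABLE log_detail (
    log_id             bigint  NOT NULL REFERENCES log(id),
    event_parameter_id integer NOT NULL,  -- FK to the event-parameter table
    value              text    NOT NULL,  -- parameter value stored as a string
    PRIMARY KEY (log_id, event_parameter_id)
);
```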

Your data model might look something like this: [ERD diagram]

Note that one objection some people may have to this approach is that you end up storing the log parameter values as strings rather than in a native format. This is clearly a trade-off; you have to ask yourself whether it is a good one for your situation.
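To illustrate that trade-off: filtering on a parameter means casting its text value back to a native type in the query. A sketch, assuming tables named `log`, `log_detail`, and `event_parameter` as described above, with the parameter value in a text column named `value` (my assumption):

```sql
-- Find cable tests against switches in a given subnet.
-- The value column is text, so it must be cast to inet to use
-- PostgreSQL's native network operators; a plain b-tree index on
-- value would not help this predicate.
SELECT l.id, l.log_time, d.value AS switch_ip
FROM log l
JOIN log_detail d      ON d.log_id = l.id
JOIN event_parameter p ON p.id = d.event_parameter_id
WHERE p.name = 'ip'
  AND d.value::inet << '10.0.0.0/8';
```

An expression index such as `CREATE INDEX ON log_detail ((value::inet))` could recover some of the lost index support, at the cost of failing if any row's text is not a valid address.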