Postgresql – Database scheme for web log analytics

database-designpostgresqlschema

I analyse web logs (nginx) and want store (organize) result in DB.
After analyse and aggregate my result looks like:

Date and Time (Truncate it by hours)
Country
City
URL (Page on the website)
Referrer (source website)
UTM_Medium
UTM_Campaign
c_sessions
c_views
avg_session_time
c_js_event_1
c_js_event_2
c_js_event_3

And so on, about 30 columns.

For that data I prepare "big table"

CREATE TABLE public.cca_data_hours
(
  datetime timestamp without time zone NOT NULL,
  id_geo_country integer NOT NULL,
  id_geo_city integer NOT NULL,
  id_web_page bigint NOT NULL,
  id_web_referrer bigint NOT NULL,
  id_utm_campaign bigint NOT NULL,
  id_utm_medium bigint NOT NULL,
  c_sessions integer,
  c_views integer,
  avg_sd integer,
  c_jse_1 integer,
  c_jse_2 integer,
  c_jse_3 integer,
  ...
  CONSTRAINT cca_data_hours_pri PRIMARY KEY (datetime, id_web_page, id_web_referrer, id_utm_campaign, id_utm_medium, id_geo_country, id_geo_city)
)

I expect 100 000 000 records.

My selections

I want see all data by countries, cities
I want see data per page
I want see data per source (referrer)
I want see data per page, source and utm campaign
Some other combinations

My questions

Is it normal use so many columns as primary key (index)? Primary key now contains 7 columns, it may up to 10-15 columns in the future.
I think this is a simple solution how to organize this data in one object-relational DB table. Do you know better solution, practice?

Best Answer

So many columns in a primary key sounds strange, at least. That means to me that the data is not really adhering to a sensible uniqueness definition - this is more often than not like this in the case of logs. For this reason, I wouldn't even try to find a PK here, just define indexes based on the queries used against the table. One could call this opportunistic indexing ;)

Note that in general, in a relational models PKs are very important. Here it is more the case the data is not really relational, only stored in a RDBMS.

Best Answer

Related Solutions

Database Design – New Table vs New Columns

Postgresql table with one integer column, sorted index, with duplicate primary key

Related Question