Your question isn't really clear, but I think you want something like this:
select group_id,
       item_id,
       comment
from (
   select group_id,
          item_id,
          comment,
          row_number() over (partition by group_id order by item_id) as rn
   from the_unknown_table
) t
where rn = 1;
(you didn't state your DBMS, so this is ANSI SQL)
SQL Fiddle demo: http://sqlfiddle.com/#!12/a8471/1
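To try the row_number() approach without a database server, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. The sample rows are invented for illustration, and it assumes the bundled SQLite is version 3.25 or later (the first release with window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE the_unknown_table (group_id INTEGER, item_id INTEGER, comment TEXT);
    -- Invented sample data: two groups with two items each
    INSERT INTO the_unknown_table VALUES
        (1, 10, 'first in group 1'),
        (1, 11, 'second in group 1'),
        (2, 20, 'first in group 2'),
        (2, 21, 'second in group 2');
""")

# Number the rows inside each group by item_id, then keep only row 1 per group.
first_rows = conn.execute("""
    SELECT group_id, item_id, comment
    FROM (
        SELECT group_id, item_id, comment,
               row_number() OVER (PARTITION BY group_id ORDER BY item_id) AS rn
        FROM the_unknown_table
    ) t
    WHERE rn = 1
    ORDER BY group_id
""").fetchall()

print(first_rows)
```

Changing the ORDER BY inside the window definition (for example to item_id DESC) picks a different row per group.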
In Postgres (tested with v9.3) you can use the dedicated inet data type to store IPv4 addresses with only 7 bytes (or IPv6 with 19 bytes), with automatic integrity checks, dedicated functions, type casts etc.
Schema
The translated schema could look like this:
CREATE TABLE log (
  id       serial PRIMARY KEY
, dst_port int
, src_ip   inet
, dst_ip   inet
);
CREATE INDEX ON log (dst_port);
CREATE INDEX ON log (src_ip);
I moved dst_port int to the 2nd position to optimize alignment / padding.
Now we can use standard window functions (not available in MySQL before version 8.0).
Step 1: Fold groups of consecutive dst_ip for same (dst_port)
One special difficulty: the aggregate functions min() / max() are not yet implemented for inet in Postgres 9.4. Both are in the upcoming Postgres 9.5! So I substituted with DISTINCT ON in the first step:
SELECT DISTINCT ON (dst_port, ip_grp)
       dst_ip, count(*) OVER (PARTITION BY dst_port, ip_grp) AS ip_ct, dst_port
FROM  (
   SELECT dst_ip, dst_port, dst_ip - row_number() OVER (PARTITION BY dst_port
                                                        ORDER BY dst_ip) AS ip_grp
   FROM   log
   ORDER  BY dst_port, dst_ip
   ) sub
ORDER  BY dst_port, ip_grp, dst_ip;
Result as desired - with a count of rows (could be upper IP as well).
You can subtract/add an integer from/to the inet type. By subtracting the row_number(), all consecutive rows get the same ip_grp - the value of ip_grp is irrelevant, just the fact that it's the same for every row of a run within a partition (dst_port).
Then we can GROUP BY ... - or in this special case DISTINCT ON (dst_port, ip_grp). I use another window function to get the count ip_ct in the same step: count(*) OVER (PARTITION BY dst_port, ip_grp) AS ip_ct.
Note that consecutive IPs can cross byte boundaries (see my comment to question).
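The subtraction trick is easier to see outside SQL. The following is a rough Python sketch of the same idea, not the Postgres query itself: the function name fold_consecutive_ips and the sample addresses are made up, it assumes a single address family, and it removes duplicate addresses first (an extra step the query above does not take).

```python
import ipaddress
from itertools import groupby

def fold_consecutive_ips(ips):
    """Return (first_ip, count) for each run of consecutive addresses."""
    # Assumption: duplicates are dropped and all addresses share one family.
    ordered = sorted(ipaddress.ip_address(ip) for ip in set(ips))
    # Same idea as dst_ip - row_number(): inside a run of consecutive
    # addresses, (address - position) stays constant.
    keyed = [(int(ip) - pos, ip) for pos, ip in enumerate(ordered, start=1)]
    runs = []
    for _, members in groupby(keyed, key=lambda pair: pair[0]):
        members = list(members)
        runs.append((str(members[0][1]), len(members)))
    return runs

# 10.0.0.255 -> 10.0.1.0 crosses a byte boundary but is still consecutive.
runs = fold_consecutive_ips(["10.0.0.254", "10.0.0.255", "10.0.1.0", "10.0.1.5"])
print(runs)
```

Working on the integer value of the address is what makes the byte-boundary case come out right; comparing only the last octet would split that run.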
Step 2: Fold groups of consecutive dst_port for same (dst_ip, ip_ct)
SELECT dst_ip, ip_ct, min(dst_port) AS dst_port, count(*) AS port_ct
FROM  (
   SELECT *, dst_port - row_number() OVER (PARTITION BY dst_ip, ip_ct
                                           ORDER BY dst_port) AS port_grp
   FROM  (
      SELECT DISTINCT ON (dst_port, ip_grp)
             dst_ip, count(*) OVER (PARTITION BY dst_port, ip_grp) AS ip_ct, dst_port
      FROM  (
         SELECT dst_ip, dst_port, dst_ip - row_number() OVER (PARTITION BY dst_port
                                                              ORDER BY dst_ip) AS ip_grp
         FROM   log
         ORDER  BY dst_port, dst_ip
         ) sub1
      ORDER  BY dst_port, ip_grp, dst_ip
      ) sub2
   ) sub3
GROUP  BY 1, 2, port_grp
ORDER  BY 1, 3, 2;
Basically, repeat the same logic as in the first step, applied to the result of the first step. But now you have to group on ip_ct additionally. And this time, you can use the simpler min(dst_port), since the port number is a plain integer.
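As a rough illustration of this second pass, here is a small Python sketch that folds consecutive ports per (dst_ip, ip_ct). The function name fold_consecutive_ports and the sample rows are hypothetical; the input stands in for the output of step 1, and the result is not re-sorted the way the final ORDER BY does.

```python
from itertools import groupby

def fold_consecutive_ports(rows):
    """rows: (dst_ip, ip_ct, dst_port) tuples, standing in for the step 1 output."""
    result = []
    ordered = sorted(rows)  # sorts by dst_ip, then ip_ct, then dst_port
    for (dst_ip, ip_ct), part in groupby(ordered, key=lambda r: (r[0], r[1])):
        # dst_port - row_number() is constant within a run of consecutive ports.
        keyed = [(port - pos, port) for pos, (_, _, port) in enumerate(part, start=1)]
        for _, run in groupby(keyed, key=lambda pair: pair[0]):
            ports = [port for _, port in run]
            # min() works directly here because ports are plain integers.
            result.append((dst_ip, ip_ct, min(ports), len(ports)))
    return result

step1 = [
    ("10.0.0.1", 3, 80), ("10.0.0.1", 3, 81), ("10.0.0.1", 3, 82),
    ("10.0.0.1", 3, 443),
    ("10.0.0.9", 1, 81),
]
folded = fold_consecutive_ports(step1)
print(folded)
```

Grouping on ip_ct as well as dst_ip keeps ranges of different lengths apart even when they start at the same address.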
SQL Fiddle demonstrating all.
The traditional solution, the one you may find in books, is a self join: first find the "max date per group", then join the table to itself on the rows that carry that max date.
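A minimal sketch of that self join, run through Python's sqlite3 module so it can be tried as is. The table name items and the columns group_id and price are assumptions made for illustration; only date_added comes from the discussion here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: one price per (group_id, date_added)
    CREATE TABLE items (group_id INTEGER, date_added TEXT, price INTEGER);
    INSERT INTO items VALUES
        (1, '2012-01-01', 10),
        (1, '2012-03-01', 30),
        (2, '2012-02-01', 20),
        (2, '2012-01-15', 15);
""")

# Inner query: max date per group. Outer join: fetch the full row for that date.
latest = conn.execute("""
    SELECT i.group_id, i.date_added, i.price
    FROM items AS i
    JOIN (
        SELECT group_id, MAX(date_added) AS max_date
        FROM items
        GROUP BY group_id
    ) AS m
      ON m.group_id = i.group_id AND m.max_date = i.date_added
    ORDER BY i.group_id
""").fetchall()

print(latest)
```

If two rows in a group share the same max date, this form returns both of them.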
However, some hacks allow you to avoid that. Consider the following query:
GROUP_CONCAT is an aggregate function which implodes values into one string. It allows for ORDER BY, which we utilize via ORDER BY date_added DESC so as to implode our desired value first. We then slice off the first token in the string via SUBSTRING_INDEX.
The downside here (apart from giving the query quite a frightening appearance) is that your numerical values are transformed into text. Typically no big deal with SQL, but please be aware.
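To make the implode-then-slice idea concrete, here is a plain Python emulation of what GROUP_CONCAT with ORDER BY date_added DESC followed by SUBSTRING_INDEX does per group. It is not MySQL code; the rows and the price column are invented for illustration.

```python
from collections import defaultdict

rows = [  # hypothetical (group_id, date_added, price) rows
    (1, "2012-01-01", 10),
    (1, "2012-03-01", 30),
    (2, "2012-02-01", 20),
    (2, "2012-01-15", 15),
]

by_group = defaultdict(list)
for group_id, date_added, price in rows:
    by_group[group_id].append((date_added, price))

latest_price = {}
for group_id, members in by_group.items():
    # Like GROUP_CONCAT(price ORDER BY date_added DESC): newest value first.
    members.sort(key=lambda m: m[0], reverse=True)
    imploded = ",".join(str(price) for _, price in members)
    # Like SUBSTRING_INDEX(imploded, ',', 1): keep only the first token.
    latest_price[group_id] = imploded.split(",", 1)[0]

print(latest_price)
```

The values come back as strings ("30" rather than 30), which is the type change described above.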
See also my old post: Selecting a specific non aggregated column data in GROUP BY
There's another option where you do a semi-self-join, a much lighter one; you will have to give up the use of an index. It's quite long to describe; it still uses GROUP_CONCAT and SUBSTRING_INDEX, but only for the purpose of creating a derived table with only the relevant keys. This derived table is then joined with the original table. See an example in SQL: selecting top N records per group, another solution.