Here is something very quick and dirty
Get the minimum datetime for each personalid, status, ride_taken
SELECT B.* FROM
(SELECT personalid,status,ride_taken,MIN(swipe_time) swipe_time
FROM swipes GROUP BY personalid,status,ride_taken) A
INNER JOIN swipes B USING (personalid,status,ride_taken,swipe_time);
First let's load your sample data:
mysql> drop database if exists sdujan;
Query OK, 1 row affected (0.00 sec)
mysql> create database sdujan;
Query OK, 1 row affected (0.00 sec)
mysql> use sdujan;
Database changed
mysql> CREATE TABLE `swipes` (
-> `id` int(11) NOT NULL AUTO_INCREMENT,
-> `personalid` varchar(20) NOT NULL,
-> `swipe_time` datetime NOT NULL,
-> `status` tinyint(1) NOT NULL COMMENT '1=registered,0=not',
-> `ride_taken` tinyint(1) NOT NULL DEFAULT '0',
-> `source` tinyint(4) NOT NULL DEFAULT '0',
-> PRIMARY KEY (`id`),
-> UNIQUE KEY `my_key` (`personalid`,`swipe_time`,`status`,`ride_taken`)
-> ) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.09 sec)
mysql> insert into swipes values
-> (1661,'C08877547','2012-10-21 01:12:08',1,1,3),
-> (1662,'C09364782','2012-10-21 01:23:38',1,1,3),
-> (1663,'C09364782','2012-10-21 01:23:48',1,1,3),
-> (1664,'D09490557','2012-10-21 01:24:39',1,1,3);
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> select * from swipes;
+------+------------+---------------------+--------+------------+--------+
| id | personalid | swipe_time | status | ride_taken | source |
+------+------------+---------------------+--------+------------+--------+
| 1661 | C08877547 | 2012-10-21 01:12:08 | 1 | 1 | 3 |
| 1662 | C09364782 | 2012-10-21 01:23:38 | 1 | 1 | 3 |
| 1663 | C09364782 | 2012-10-21 01:23:48 | 1 | 1 | 3 |
| 1664 | D09490557 | 2012-10-21 01:24:39 | 1 | 1 | 3 |
+------+------------+---------------------+--------+------------+--------+
4 rows in set (0.00 sec)
mysql>
Here is the output of my proposed query
mysql> SELECT B.* FROM
-> (SELECT personalid,status,ride_taken,MIN(swipe_time) swipe_time
-> FROM swipes GROUP BY personalid,status,ride_taken) A
-> INNER JOIN swipes B USING (personalid,status,ride_taken,swipe_time);
+------+------------+---------------------+--------+------------+--------+
| id | personalid | swipe_time | status | ride_taken | source |
+------+------------+---------------------+--------+------------+--------+
| 1661 | C08877547 | 2012-10-21 01:12:08 | 1 | 1 | 3 |
| 1662 | C09364782 | 2012-10-21 01:23:38 | 1 | 1 | 3 |
| 1664 | D09490557 | 2012-10-21 01:24:39 | 1 | 1 | 3 |
+------+------------+---------------------+--------+------------+--------+
3 rows in set (0.00 sec)
mysql>
Give it a Try !!!
SUGGESTION #1
To accommodate the query better, add this index
ALTER TABLE swipes ADD UNIQUE KEY `my_key2`
(`personalid`,`status`,`ride_taken`,`swipe_time`);
SUGGESTION #2 (Optional)
To accommodate more than 255 swipe stations, perhaps source
should be smallint unsigned
SUGGESTION #3 (Optional)
Some Transit Systems allow multiple swipes (up to 4 for PATH TransHudson) in one station. You could give some additional thought on this should you have to allow multiple swipes.
UPDATE 2012-10-24 17:30 EDT
If you are trying to limit within a time range, I have another viewpoint for you
Here is the subquery within my answer:
mysql> SELECT
-> personalid,status,ride_taken,MIN(swipe_time) swipe_time
-> FROM
-> swipes GROUP BY personalid,status,ride_taken
-> ;
+------------+--------+------------+---------------------+
| personalid | status | ride_taken | swipe_time |
+------------+--------+------------+---------------------+
| C08877547 | 1 | 1 | 2012-10-21 01:12:08 |
| C09364782 | 1 | 1 | 2012-10-21 01:23:38 |
| D09490557 | 1 | 1 | 2012-10-21 01:24:39 |
+------------+--------+------------+---------------------+
3 rows in set (0.00 sec)
mysql>
Let's say you want to limit to the Latest Entry Per 10 seconds. You will first have to generate a ten-second interval boundary like this:
mysql> SELECT personalid,status,ride_taken,swipe_time,
-> (swipe_time - INTERVAL MOD(UNIX_TIMESTAMP(swipe_time),10) SECOND) ten_sec_in
-> FROM swipes;
+------------+--------+------------+---------------------+---------------------+
| personalid | status | ride_taken | swipe_time | ten_sec_int |
+------------+--------+------------+---------------------+---------------------+
| C08877547 | 1 | 1 | 2012-10-21 01:12:08 | 2012-10-21 01:12:00 |
| C09364782 | 1 | 1 | 2012-10-21 01:23:38 | 2012-10-21 01:23:30 |
| C09364782 | 1 | 1 | 2012-10-21 01:23:48 | 2012-10-21 01:23:40 |
| D09490557 | 1 | 1 | 2012-10-21 01:24:39 | 2012-10-21 01:24:30 |
+------------+--------+------------+---------------------+---------------------+
4 rows in set (0.00 sec)
mysql>
Perhaps the thing to do is introduce the ten_second boundary into the table
ALTER TABLE swipes ADD swipe_int DATETIME;
UPDATE swipes SET
swipe_int = swipe_time - INTERVAL MOD(UNIX_TIMESTAMP(swipe_time),10) SECOND;
ALTER TABLE swipes ADD UNIQUE (personalid,swipe_int);
I ran this
mysql> ALTER TABLE swipes ADD swipe_int DATETIME;
Query OK, 4 rows affected (0.05 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> UPDATE swipes SET
-> swipe_int = swipe_time - INTERVAL MOD(UNIX_TIMESTAMP(swipe_time),10) SECOND;
Query OK, 4 rows affected (0.00 sec)
Rows matched: 4 Changed: 4 Warnings: 0
mysql> ALTER TABLE swipes ADD UNIQUE personalid_swipe_int_ndx (personalid,swipe_int);
Query OK, 4 rows affected (0.05 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> select * from swipes;
+------+------------+---------------------+--------+------------+--------+---------------------+
| id | personalid | swipe_time | status | ride_taken | source | swipe_int |
+------+------------+---------------------+--------+------------+--------+---------------------+
| 1661 | C08877547 | 2012-10-21 01:12:08 | 1 | 1 | 3 | 2012-10-21 01:12:00 |
| 1662 | C09364782 | 2012-10-21 01:23:38 | 1 | 1 | 3 | 2012-10-21 01:23:30 |
| 1663 | C09364782 | 2012-10-21 01:23:48 | 1 | 1 | 3 | 2012-10-21 01:23:40 |
| 1664 | D09490557 | 2012-10-21 01:24:39 | 1 | 1 | 3 | 2012-10-21 01:24:30 |
+------+------------+---------------------+--------+------------+--------+---------------------+
4 rows in set (0.00 sec)
mysql> show create table swipes\G
*************************** 1. row ***************************
Table: swipes
Create Table: CREATE TABLE `swipes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`personalid` varchar(20) NOT NULL,
`swipe_time` datetime NOT NULL,
`status` tinyint(1) NOT NULL COMMENT '1=registered,0=not',
`ride_taken` tinyint(1) NOT NULL DEFAULT '0',
`source` tinyint(4) NOT NULL DEFAULT '0',
`swipe_int` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `my_key` (`personalid`,`swipe_time`,`status`,`ride_taken`),
UNIQUE KEY `personalid_swipe_int_ndx` (`personalid`,`swipe_int`)
) ENGINE=MyISAM AUTO_INCREMENT=1665 DEFAULT CHARSET=utf8
1 row in set (0.01 sec)
mysql>
Once you do that, going forward, any INSERT within the same 10-second interval gets immediately rejected. That way, you do not have to code it. It will be part of the table design.
Naturally, you will need a trigger to populate swipe_int
. Here is the trigger:
mysql> DELIMITER $$
mysql> CREATE TRIGGER swipes_bi BEFORE INSERT ON swipes
-> FOR EACH ROW
-> BEGIN
-> SET NEW.swipe_int = (NEW.swipe_time -
-> INTERVAL MOD(UNIX_TIMESTAMP(NEW.swipe_time),10) SECOND);
-> END $$
Query OK, 0 rows affected (0.11 sec)
mysql> DELIMITER ;
mysql>
Best Answer
The query for populating the
users_club_meta
table looks good to me except for two issues:I am not entirely convinced about
HAVING COUNT(*) > 1
. If you want to create entries for all combinations of(user_id, name, function, description)
, remove that condition.If you want to omit the groups consisting only of empty
school_grade
values, you can use this condition instead:That is assuming that by "empty" you mean NULL. If you mean to exclude empty strings (
''
) as well, you could modify the above condition like this:You are selecting
id
in that query but the column is not in GROUP BY. It is true that MySQL allows you to do that but the manual also discourages that you do that in cases where the non-aggregated non-GROUP BY column has multiple values per group.If you want predictable results, use an aggregate function, for instance MIN():
For the DELETE query, take the final draft of the INSERT statement's SELECT query keeping only
MIN(id)
in the SELECT clause and removing the HAVING clause:That gives you the first ID of every unique combination of
(user_id, name, function, description)
. That will be the rows you want to keep. All the others you want to remove. This is an anti-join, and there are various ways to implement it. In this case you can go with:NOT IN
:LEFT JOIN
+WHERE IS NULL
:The left join method is known to work fast in MySQL but you should test for yourself in your environment to choose what works better for you.