Have a MySql table that records magnetic stripe swipe times from identity cards:
CREATE TABLE `swipes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`personalid` varchar(20) NOT NULL,
`swipe_time` datetime NOT NULL,
`status` tinyint(1) NOT NULL COMMENT '1=registered,0=not',
`ride_taken` tinyint(1) NOT NULL DEFAULT '0',
`source` tinyint(4) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `my_key` (`personalid`,`swipe_time`,`status`,`ride_taken`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
The unique key is there to prevent exact duplicate inserts.
However there is another source of almost duplicates where the card gets swiped twice in quick succession or even for some odd reason within a minute or so when there may be other cards swiped in between. In some ways this is like bounce on electronic switch contacts.
Here's a snippet of rows to illustrate:
id personalid swipe_time status ride_taken source
1661 C08877547 2012-10-21 01:12:08 1 1 3
1662 C09364782 2012-10-21 01:23:38 1 1 3
1663 C09364782 2012-10-21 01:23:48 1 1 3
1664 D09490557 2012-10-21 01:24:39 1 1 3
What I want is a query that will remove entries like id 1663, that are within some specified time of rows with the same personalid, status=1, ride_taken=1 and source
Yes I have searched and inspected both here and on stackoverflow
such as https://stackoverflow.com/questions/18932/how-can-i-remove-duplicate-rows
for a similar situation but have not found an answer yet. If there is one already, please point me at it.
I'm at a loss how to apply a time range filter for the swipe_time to delete only similar rows within a specified time range.
Best Answer
Here is something very quick and dirty
Get the minimum datetime for each personalid, status, ride_taken
First let's load your sample data:
Here is the output of my proposed query
Give it a Try !!!
SUGGESTION #1
To accommodate the query better, add this index
SUGGESTION #2 (Optional)
To accommodate more than 255 swipe stations, perhaps
source
should besmallint unsigned
SUGGESTION #3 (Optional)
Some Transit Systems allow multiple swipes (up to 4 for PATH TransHudson) in one station. You could give some additional thought on this should you have to allow multiple swipes.
UPDATE 2012-10-24 17:30 EDT
If you are trying to limit within a time range, I have another viewpoint for you
Here is the subquery within my answer:
Let's say you want to limit to the Latest Entry Per 10 seconds. You will first have to generate a ten-second interval boundary like this:
Perhaps the thing to do is introduce the ten_second boundary into the table
I ran this
Once you do that, going forward, any INSERT within the same 10-second interval gets immediately rejected. That way, you do not have to code it. It will be part of the table design.
Naturally, you will need a trigger to populate
swipe_int
. Here is the trigger: