From the sake of this question, I will call the source table aaronlax
PROPOSED SOLUTION
SET @ndx = 0;
SET @cur_hash = '';
SELECT entry_id,CONCAT(url_title,IF(ndx>0,CONCAT('-',ndx),'')) url_title
FROM (SELECT
@new_hash := MD5(url_title),
(@ndx := IF(@new_hash=@cur_hash,@ndx+1,0)) ndx,
@cur_hash := MD5(url_title),
url_title,entry_id
FROM (SELECT url_title,entry_id FROM aaronlax ORDER BY url_title) AA) A;
SAMPLE DATA
use test
DROP TABLE IF EXISTS aaronlax;
CREATE TABLE aaronlax
(
entry_id int(10) unsigned NOT NULL AUTO_INCREMENT,
url_title varchar(75) NOT NULL,
PRIMARY KEY (entry_id),
KEY (url_title)
) ENGINE=MyISAM;
INSERt INTO aaronlax (url_title) VALUES
('a-stoney'),('a-stoney'),
('ab-dille'),('ab-dille'),
('ab-dille'),('ac-gifta'),
('ae-jones'),('ae-jones');
SAMPLE DATA LOADED
mysql> use test
Database changed
mysql> DROP TABLE IF EXISTS aaronlax;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE aaronlax
-> (
-> entry_id int(10) unsigned NOT NULL AUTO_INCREMENT,
-> url_title varchar(75) NOT NULL,
-> PRIMARY KEY (entry_id),
-> KEY (url_title)
-> ) ENGINE=MyISAM;
Query OK, 0 rows affected (0.02 sec)
mysql> INSERt INTO aaronlax (url_title) VALUES
-> ('a-stoney'),('a-stoney'),
-> ('ab-dille'),('ab-dille'),
-> ('ab-dille'),('ac-gifta'),
-> ('ae-jones'),('ae-jones');
Query OK, 8 rows affected (0.00 sec)
Records: 8 Duplicates: 0 Warnings: 0
mysql> select * from aaronlax;
+----------+-----------+
| entry_id | url_title |
+----------+-----------+
| 1 | a-stoney |
| 2 | a-stoney |
| 3 | ab-dille |
| 4 | ab-dille |
| 5 | ab-dille |
| 6 | ac-gifta |
| 7 | ae-jones |
| 8 | ae-jones |
+----------+-----------+
8 rows in set (0.00 sec)
mysql>
PROPOSED SOLUTION EXECUTED
mysql> SET @ndx = 0;
Query OK, 0 rows affected (0.00 sec)
mysql> SET @cur_hash = '';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT entry_id,CONCAT(url_title,IF(ndx>0,CONCAT('-',ndx),'')) url_title
-> FROM (SELECT
-> @new_hash := MD5(url_title),
-> (@ndx := IF(@new_hash=@cur_hash,@ndx+1,0)) ndx,
-> @cur_hash := MD5(url_title),
-> url_title,entry_id
-> FROM (SELECT url_title,entry_id FROM aaronlax ORDER BY url_title) AA) A;
+----------+------------+
| entry_id | url_title |
+----------+------------+
| 1 | a-stoney |
| 2 | a-stoney-1 |
| 3 | ab-dille |
| 4 | ab-dille-1 |
| 5 | ab-dille-2 |
| 6 | ac-gifta |
| 7 | ae-jones |
| 8 | ae-jones-1 |
+----------+------------+
8 rows in set (0.00 sec)
mysql>
GIVE IT A TRY !!!
CAVEAT
In case you are wondering why the proposed solution works, let's take a look at the subqueries
INNERMOST SUBQUERY
mysql> SELECT url_title,entry_id FROM aaronlax ORDER BY url_title;
+-----------+----------+
| url_title | entry_id |
+-----------+----------+
| a-stoney | 1 |
| a-stoney | 2 |
| ab-dille | 3 |
| ab-dille | 4 |
| ab-dille | 5 |
| ac-gifta | 6 |
| ae-jones | 7 |
| ae-jones | 8 |
+-----------+----------+
8 rows in set (0.00 sec)
mysql>
NEXT LEVEL SUBQUERY
mysql> SELECT
-> @new_hash := MD5(url_title),
-> (@ndx := IF(@new_hash=@cur_hash,@ndx+1,0)) ndx,
-> @cur_hash := MD5(url_title),
-> url_title,entry_id
-> FROM (SELECT url_title,entry_id FROM aaronlax ORDER BY url_title) AA;
+----------------------------------+------+----------------------------------+-----------+----------+
| @new_hash := MD5(url_title) | ndx | @cur_hash := MD5(url_title) | url_title | entry_id |
+----------------------------------+------+----------------------------------+-----------+----------+
| 911adc6db360f218da5069b20cfb1917 | 0 | 911adc6db360f218da5069b20cfb1917 | a-stoney | 1 |
| 911adc6db360f218da5069b20cfb1917 | 1 | 911adc6db360f218da5069b20cfb1917 | a-stoney | 2 |
| b468c0dd0ef78fde7cf5ae14765f9a5a | 0 | b468c0dd0ef78fde7cf5ae14765f9a5a | ab-dille | 3 |
| b468c0dd0ef78fde7cf5ae14765f9a5a | 1 | b468c0dd0ef78fde7cf5ae14765f9a5a | ab-dille | 4 |
| b468c0dd0ef78fde7cf5ae14765f9a5a | 2 | b468c0dd0ef78fde7cf5ae14765f9a5a | ab-dille | 5 |
| 3fff0367af8bf5d3b3b0089b8f187c04 | 0 | 3fff0367af8bf5d3b3b0089b8f187c04 | ac-gifta | 6 |
| 1909ba0f7a57ca081c82d2f7ba54c9e5 | 0 | 1909ba0f7a57ca081c82d2f7ba54c9e5 | ae-jones | 7 |
| 1909ba0f7a57ca081c82d2f7ba54c9e5 | 1 | 1909ba0f7a57ca081c82d2f7ba54c9e5 | ae-jones | 8 |
+----------------------------------+------+----------------------------------+-----------+----------+
8 rows in set (0.00 sec)
mysql>
From this level, I only need three columns: entry_id
, url_title
, ndx
in the outer most level
Then, I append to the url_title
a hyphen and ndx
if ndx
is nonzero
There are performance issues with very complex primary keys. And It may not be defending against duplication as well as a simpler primary key might.
However, there is one design pattern that frequently yields tables with a primary key made up of six or so components. It's star schema fact tables. If the fact table of a star schema has six dimensions, then the primary key will have six components. I've never seen a fact table with no declared primary key, and I think it's well worth the overhead, even though the ETL process still has to be quite carefully written.
Some reporting databases imitate the pattern of star schema even if it's not explicitly designed that way.
100 million + rows is not overly big for a fact table, especially with today's big data.
Best Answer
It seems like your query needs to be something like:
More information on creating aliases (the
AS
keyword).More information on
GROUP BY
More information on the
AVG
function with a handy example that would have served you well.Since you had a question about the
GROUP BY
clause in the comments below...As I understand it MySQL has some quirks with it's
GROUP BY
rules. Those quirks not withstanding, I think of a simpleGROUP BY
clause, like the one in the query above, as sorting the result set into buckets. So in our query above there would be one bucket for each unique value in thecolumn_with_duplicates
column. So each value in thecolumn_to_aggregate
would get dropped into the proper bucket. Once all the rows/values are sorted into their buckets then an average is taken of all the values in each bucket individually.Of course this is just a conceptual model, but it should give you a basic understanding of the role of a
GROUP BY
clause.