Mysql – Mathematical or Computing Alternatives for MySQL Combination Queries operating toward their Performance Limit

configurationMySQLparallelismperformancequery-performance

We regularly run a contest that is becoming quite popular. In the contest, we need to select the combinations of contestants from 8 groups whose cost is under a threshold, and who combined have the largest number of points. Running queries with 2, 3 or 4 groups, the MySQL processing times are respectable, but of course the processing time grows exponentially when either combining more groups or combining groups with more and more members.

When combining contestants from 8 groups, here are some query execution times:

  • Query took 0.0308 sec [3 contestants per group, 8 groups]
  • Query took 0.4668 sec [4 contestants per group, 8 groups]
  • Query took 1.5043 sec [5 contestants per group, 8 groups]
  • Query took 9.0799 sec [6 contestants per group, 8 groups]
  • Query took 46.3970 sec [7 contestants per group, 8 groups]
  • Query took 106.5729 sec [8 contestants per group, 8 groups]

We want to run the contest daily with at least 20 contestants per group, but the MySQL processing times would take longer than a day on the largest cloud instance available.

So first of all, are there any obvious tuning improvements we could make to MySQL to optimize it for this situation? What is the longest time I should expect MySQL to be able to perform a query, and how is that set?

Any thoughts on how we could split the problem to be performed on multiple MySQL servers in parallel?

If we put the dataset (which is not large) into an array, are there any particular programming languages that could perform the mathematical combinations in parallel on multiple processor cores? My understanding is that "MySQL can’t execute a single query in parallel on many CPUs" (High Performance MySQL p. 234), and that PostgreSQL is not a whole lot better.

Here is a sample database with 8 records per group in 8 groups:

CREATE TABLE IF NOT EXISTS `contest` (
  `id` int(7) NOT NULL AUTO_INCREMENT,
  `grp` int(7) NOT NULL,
  `name` varchar(16) COLLATE utf8_unicode_ci NOT NULL,
  `cost` decimal(10,2) NOT NULL,
  `pts` int(7) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `grp` (`grp`),
  KEY `cost` (`cost`),
  KEY `pts` (`pts`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=89 ;

--
-- Dumping data for table `contest`
--

INSERT INTO `contest` (`id`, `grp`, `name`, `cost`, `pts`) VALUES
(1, 1, 'Bob', 7.82, 79),
(2, 1, 'Mary', 7.34, 81),
(3, 1, 'Tim', 7.57, 80),
(4, 2, 'Fred', 10.01, 67),
(5, 2, 'Claire', 9.99, 68),
(6, 2, 'John', 9.98, 69),
(7, 3, 'Steve', 8.57, 93),
(8, 3, 'Pam', 8.82, 91),
(9, 3, 'Sam', 8.13, 91),
(10, 4, 'Tom', 10.43, 85),
(11, 4, 'Dick', 10.35, 97),
(12, 4, 'Harry', 10.39, 94),
(13, 5, 'Bert', 7.50, 75),
(14, 5, 'Sara', 8.50, 85),
(15, 5, 'Terry', 9.50, 95),
(16, 6, 'Frank', 7.88, 51),
(17, 6, 'Bill', 7.77, 53),
(18, 6, 'Greg', 7.93, 58),
(19, 7, 'Dave', 8.75, 66),
(20, 7, 'Jack', 7.88, 61),
(21, 7, 'Jill', 7.44, 64),
(22, 8, 'Jeff', 10.50, 45),
(23, 8, 'Will', 9.50, 55),
(24, 8, 'Kara', 8.50, 65),
(25, 1, 'Karen', 7.41, 74),
(26, 2, 'Alan', 7.52, 75),
(27, 3, 'Barry', 7.61, 76),
(28, 4, 'Corey', 7.72, 77),
(29, 5, 'Ken', 7.83, 78),
(30, 6, 'Gary', 7.88, 78),
(31, 7, 'Henry', 7.80, 79),
(32, 8, 'Jeb', 7.94, 80),
(33, 1, 'Kim', 10.80, 98),
(34, 2, 'Amy', 10.70, 97),
(35, 3, 'Ben', 10.60, 96),
(36, 4, 'Mel', 10.50, 95),
(37, 5, 'Nancy', 10.40, 94),
(38, 6, 'Nelly', 10.30, 93),
(39, 7, 'Polly', 10.20, 92),
(40, 8, 'Robin', 10.10, 91),
(41, 1, 'Bart', 6.10, 61),
(42, 2, 'Annie', 6.20, 62),
(43, 3, 'Allie', 6.30, 63),
(44, 4, 'Lester', 6.40, 64),
(45, 5, 'Violet', 6.50, 65),
(46, 6, 'Victor', 6.60, 66),
(47, 7, 'Kate', 6.70, 67),
(48, 8, 'Jake', 6.80, 68),
(49, 1, 'Noah', 8.90, 89),
(50, 2, 'Lucas', 8.80, 88),
(51, 3, 'James', 8.70, 87),
(52, 4, 'Jim', 8.60, 86),
(53, 5, 'Ryan', 8.50, 85),
(54, 6, 'Dylan', 8.40, 84),
(55, 7, 'Owen', 8.30, 83),
(56, 8, 'Wyatt', 8.20, 82),
(57, 1, 'Evan', 9.10, 91),
(58, 2, 'Tyler', 9.20, 92),
(59, 3, 'Norah', 9.30, 93),
(60, 4, 'Maria', 9.40, 94),
(61, 5, 'Julia', 9.50, 95),
(62, 6, 'Alice', 9.60, 96),
(63, 7, 'Tony', 9.70, 97),
(64, 8, 'Joe', 9.80, 98),
(65, 1, 'Asher', 6.20, 62),
(66, 2, 'Austin', 6.40, 64),
(67, 3, 'Blake', 6.60, 66),
(68, 4, 'Chris', 6.80, 68),
(69, 5, 'Chloe', 7.00, 70),
(70, 6, 'Colin', 7.20, 72),
(71, 7, 'Dirk', 7.40, 74),
(72, 8, 'Dwight', 7.60, 76),
(73, 1, 'Elijah', 7.70, 77),
(74, 2, 'Ella', 7.50, 75),
(75, 3, 'Elliot', 7.30, 73),
(76, 4, 'Isaac', 7.10, 71),
(77, 5, 'Kostas', 6.90, 69),
(78, 6, 'Layla', 6.70, 67),
(79, 7, 'Leo', 6.50, 65),
(80, 8, 'Lily', 6.30, 63),
(81, 1, 'Mark', 7.71, 77),
(82, 2, 'Zach', 7.81, 78),
(83, 3, 'Zoe', 7.91, 79),
(84, 4, 'Rudy', 8.01, 80),
(85, 5, 'Riley', 8.11, 81),
(86, 6, 'Perry', 8.21, 82),
(87, 7, 'Nolan', 8.31, 83),
(88, 8, 'Reggie', 8.41, 84);

Here is the query selecting from just 2 groups (Query took 0.00 sec):

SELECT c1.name AS A, c2.name AS B, c1.cost + c2.cost AS Price, c1.pts + c2.pts AS Points FROM `contest` c1, `contest` c2 WHERE c1.`grp`=1 AND c2.`grp`=2 AND c1.cost + c2.cost < 17 ORDER BY Points DESC LIMIT 100;

Here is the query selecting from 3 groups (Query took 0.02 sec):

SELECT c1.name AS A, c2.name AS B, c3.name AS C,  c1.cost + c2.cost + c3.cost AS Price, c1.pts + c2.pts + c3.pts AS Points FROM `contest` c1, `contest` c2, `contest` c3 WHERE c1.`grp`=1 AND c2.`grp`=2 AND c3.`grp`=3 AND c1.cost + c2.cost + c3.cost < 28 ORDER BY Points DESC LIMIT 100;

Here is the query selecting from 4 groups (Query took 0.07 sec):

SELECT c1.name AS A, c2.name AS B, c3.name AS C, c4.name AS D,  c1.cost + c2.cost + c3.cost + c4.cost AS Price, c1.pts + c2.pts + c3.pts + c4.pts AS Points FROM  `contest` c1,  `contest` c2, `contest` c3, `contest` c4 WHERE c1.`grp` =1 AND c2.`grp` = 2 AND c3.`grp` = 3 AND c4.`grp` = 4 AND c1.cost + c2.cost + c3.cost  + c4.cost < 40 ORDER BY Points DESC LIMIT 100;

Here is the query selecting from 5 groups (Query took 0.78 sec):

SELECT c1.name AS A, c2.name AS B, c3.name AS C, c4.name AS D,  c5.name AS E,  c1.cost + c2.cost + c3.cost + c4.cost  + c5.cost AS Price, c1.pts + c2.pts + c3.pts + c4.pts  + c5.pts AS Points FROM  `contest` c1,  `contest` c2, `contest` c3, `contest` c4, `contest` c5 WHERE c1.`grp` =1 AND c2.`grp` = 2 AND c3.`grp` = 3 AND c4.`grp` = 4 AND c5.`grp` = 5 AND c1.cost + c2.cost + c3.cost + c4.cost + c5.cost <50 ORDER BY Points DESC LIMIT 100;

Here is the query selecting from 5 groups (Query took 9.10 sec):

SELECT c1.name AS A, c2.name AS B, c3.name AS C, c4.name AS D,  c5.name AS E,   c6.name AS F,  c1.cost + c2.cost + c3.cost + c4.cost  + c5.cost  + c6.cost  AS Price, c1.pts + c2.pts + c3.pts + c4.pts  + c5.pts + c6.pts AS Points FROM  `contest` c1,  `contest` c2, `contest` c3, `contest` c4, `contest` c5, `contest` c6 WHERE c1.`grp` =1 AND c2.`grp` = 2 AND c3.`grp` = 3 AND c4.`grp` = 4 AND c5.`grp` = 5 AND c6.`grp` = 6 AND c1.cost + c2.cost + c3.cost + c4.cost + c5.cost + c6.cost <60 ORDER BY Points DESC LIMIT 100;

Here is the query selecting from 7 groups (Query took 1 min 48.64 sec):

SELECT c1.name AS A, c2.name AS B, c3.name AS C, c4.name AS D, c5.name AS E,  c6.name AS F, c7.name AS G, c1.cost + c2.cost + c3.cost + c4.cost  + c5.cost  + c6.cost + c7.cost  AS Price, c1.pts + c2.pts + c3.pts + c4.pts + c5.pts + c6.pts + c7.pts AS Points FROM  `contest` c1,  `contest` c2, `contest` c3, `contest` c4, `contest` c5, `contest` c6, `contest` c7 WHERE c1.`grp` =1 AND c2.`grp` = 2 AND c3.`grp` = 3 AND c4.`grp` = 4 AND c5.`grp` = 5 AND c6.`grp` = 6 AND c7.`grp` = 7 AND c1.cost + c2.cost + c3.cost + c4.cost + c5.cost + c6.cost + c7.cost < 70 ORDER BY Points DESC LIMIT 100;

Here is the query selecting from 8 groups (Query took 26 min 1.32 sec):

SELECT c1.name AS A, c2.name AS B, c3.name AS C, c4.name AS D,  c5.name AS E,  c6.name AS F, c7.name AS G, c8.name AS H, c1.cost + c2.cost + c3.cost + c4.cost  + c5.cost  + c6.cost + c7.cost + c8.cost AS Price, c1.pts + c2.pts + c3.pts + c4.pts + c5.pts + c6.pts + c7.pts + c8.pts AS Points FROM  `contest` c1,  `contest` c2, `contest` c3, `contest` c4, `contest` c5, `contest` c6, `contest` c7, `contest` c8 WHERE c1.`grp` =1 AND c2.`grp` = 2 AND c3.`grp` = 3 AND c4.`grp` = 4 AND c5.`grp` = 5 AND c6.`grp` = 6 AND c7.`grp` = 7 AND c8.`grp` = 8 AND c1.cost + c2.cost + c3.cost + c4.cost + c5.cost + c6.cost + c7.cost + c8.cost < 69 ORDER BY Points DESC LIMIT 100;

I am running an Ubuntu 14.04.1 LTS 30GB Standard Instance on Rackspace Cloud with 8 vCPUs. The MySQL version is 5.5.40-0ubuntu0.14.04.1. I am executing all queries from command line under VI Screen.

Best Answer

Look at https://stackoverflow.com/questions/11442191/parallelizing-a-numpy-vector-operation

Essentially, you could do this in numpy relatively easily, with numexpr to take advantage of multi core.

I haven't tried it yet, so YMMV.