Mysql – Creating a query to pull very specific information using four tables

MySQLPHP

I have four tables, with a structure like this:

testData

This table holds all test data for every user. Each row associates one response for a question, divided up by test. A user can have many tests, a test can have many unique questions, a question can only have one response.

testUUID                               questionUUID                           answerUUID
===================================================================================================================
0d09eea5-4015-4777-92db-44cf48c9b3d6   fd7485fd-5c99-4d26-bdbd-a559b9e47d14   00044311-b9fe-411f-b125-61a40c090068
...(~2 million rows)

answerData

This table stores the actual answers to the questions. There is a similarly structured questionData table, but I can work that in later.

uuid                                   answerText              answerCorrect
============================================================================
00044311-b9fe-411f-b125-61a40c090068   this is an answer       1
00044234-c820-432a-2003-239ffbbac982   this is another answer  0

testHistory

This stores data about each test a user has taken.

uuid                                   userUUID    (and lots of other columns including score)
==============================================================================================
0d09eea5-4015-4777-92db-44cf48c9b3d6   01b7f715-4142-4436-b527-26b5158578f7

userData

This stores data about the user.

uuid                                   userFirstName (etc.)
===========================================================
01b7f715-4142-4436-b527-26b5158578f7   tehbing

I need to figure out…. across all tests that a user has taken, find the questions that are missed the most so they can study those particular questions more often. Ideally, questionUUID should be in one column and a count for the amount of times a question was missed in another column. I can do this in PHP, but I feel that this is something the database can do much faster if structured correctly.

Indexes for the two biggest tables

testData table
Keyname         Type    Unique  Packed  Column          Cardinality Collation   Null
testQuestion    BTREE   Yes     No      testUUID        39354       A           No  
                                        questionUUID    1692249     A           No
testUUID        BTREE   No      No      testUUID        34535       A           No  
questionUUID    BTREE   No      No      questionUUID    11281       A           No  
answerUUID      BTREE   No      No      answerUUID      54588       A           No


testHistory table
Keyname         Type    Unique  Packed  Column          Cardinality Collation   Null
PRIMARY         BTREE   Yes     No  uuid                44965       A           No  
userUUID        BTREE   No      No  userUUID            5620        A           No  
                                    totalQuestions      5620        A           No
                                    questionsMissed     44965       A           No
                                    testScore           44965       A           No
                                    testTimeStarted     44965       A           No
                                    testTimeCompleted   44965       A           No
testArchived    BTREE   No      No  testArchived        4           A           Yes

Best Answer

Using the hint Rick James provided, I came up with the following. I'm not sure about efficiency, but it gives me the results I was looking for.

SELECT
    testData.questionUUID,
    COUNT(testData.answerUUID) AS wrongAnswerCount
FROM testData
    LEFT JOIN answerData ON answerData.uuid=testData.answerUUID
    LEFT JOIN testHistory ON testHistory.uuid=testData.testUUID
WHERE testHistory.userUUID = ?
    AND answerData.answerCorrect=0
GROUP BY testData.questionUUID
HAVING COUNT(testData.answerUUID) > 5
ORDER BY wrongAnswerCount DESC