Mysql – Php, MySql – Optimisation

MySQLoptimizationPHP

I asked this on the general site, but they suggested I come here for ultimate optimisation. It really boils down to which is faster; doing a load of calculations in php with database stuff that is accessed once, or doing two database calls. (There are about 1000-1500 table entries, and I have to use the lot.)

Here is my original question:

I am not a great php coder (I come from C++). I am using php only for database entry.

I have a database with the following:

UserId (an unique int)
AsyncPointsAverage (float)
AsyncPointsAverageRank (a position based on the value immediately above)
AsyncPointsRecentAverage (float an average for the last 5 tests only)
AsyncPointsRecentAverageRank (a position based on the value immediately above)

There are about 1000-1500 entries in that table. Every morning and afternoon 5 people take a test which effects their overall average and recent average. (This is updated elsewhere, but not shown here.) After that is calculated for those 5 people, then the rankings of all 1000-1500 will be effected, so I have written the code below. Is it optimal?

The thing I am most concerned with is I am doing a MySql UPDATE about a 1000 times. Is that great? Should I be doing it another way? (Also feel free to optimise any other code in the function. As I say, I am from a C++ background, so do not really know the nuances of php.)

// Sorts by array entry 1
function ReRankCompareAverage($a, $b)
{
    if($a[1] == $b[1]) return 0;
    else return ($a[1] > $b[1] ? 1 : -1);
}
// Sorts by array entry 2
function ReRankCompareAverageRecent($a, $b)
{
    if($a[2] == $b[2]) return 0;
    else return ($a[2] > $b[2] ? 1 : -1);
}

function ReRank($db)
{
    $i = 0, $j = 0;
    $usersARR = null;

    $stmt = $db->prepare("SELECT UserId, AsyncPointsAverage, AsyncPointsRecentAverage FROM studenttable");
    $stmt->execute();
    if($stmt && isset($stmt) && $stmt->rowCount() > 0)
    {
        $i = 0;
        while(($row = $stmt->fetch(PDO::FETCH_ASSOC)))
        {
            $usersARR[$i][0] = intval($row['UserId']);
            $usersARR[$i][1] = floatval($row['AsyncPointsAverage']);
            $usersARR[$i][2] = floatval($row['AsyncPointsRecentAverage']);
            $i++;
         }
    }
    $stmt->closeCursor(); // mysql_free_result equivalent

    // The first pass of $j == 3 does the ranking by Average, filling position $usersARR[][3] with that rank
    // The second pass of $j == 4 does the ranking by AverageRecent, filling position $usersARR[][4] with that rank
    for($j = 3, $j <= 4; $j++)
    {
        $iCompare = $j == 3 ? 1 : 2;

        usort($usersARR, $j == 3 ? "ReRankCompareAverage" : "ReRankCompareAverageLast");
        $count = count($usersARR);
        if($count > 0)
        {
            // Start it off, with the person with the highest average is rank 1
            $usersARR[$count - 1][$j] = 1; // Position $j is filled with the rank
            // Now loop starting from the second one down
            for($i = $count - 2, $rank = 1; $i >= 0; $i--)
            {
                // Only change the rank if the next one down is strictly lower than the one above, otherwise will share the same rank
                if($usersARR[$i][$iCompare] < $usersARR[$i+1][$iCompare]) $rank = $count - $i; // Otherwise keep the same rank, because they are equal
                $usersARR[$count - 1][$j] = $rank;
            }
        }
     }

     // Now $usersARR is filled with the correct rankings, and they are asscoiated with $UserId
    // Now we must put all of these rankings into the database
    $count = count($usersARR);
    for($i = 0; $i < $count; $i++)
    {
         $stmt = $db->prepare("UPDATE studenttable SET AsyncPointsAverageRank=:AsyncPointsAverageRank, AsyncPointsRecentAverageRank=:AsyncPointsRecentAverageRank "
                        . "WHERE UserId=:UserId");
         $stmt->execute(array(':AsyncPointsAverageRank' => $usersARR[$i][3],
                        ':AsyncPointsRecentAverageRank' => $usersARR[$i][4],
                        ':UserId' => $usersARR[$i][0]));
    }
}

After several answers, including not having any rankings in the database, but doing the rankings instead on-the-fly with:

SET @rank=0; SELECT @rank := @rank +1 AS rank, UserId, AsyncPointsAverage FROM studenttable ORDER BY AsyncPointsAverage DESC

This obviously has one draw-back, namely it does not cater for the follwoing scenario: if the 2nd and 3rd ranked people have 88%, they will both be ranked 2nd. The next person will then be ranked 4th (leaving rank 3rd out entirely.)

Another person has suggested I do something like:

SQL

SELECT
    UserId,
    AsyncPointsAverage,
    AsyncPointsAverageRank
FROM
    studenttable
ORDER BY
    AsyncPointsAverage DESC

PHP

$stmt = $db->prepare("SEE ABOVE...");
$stmt->execute();

if( $stmt && isset( $stmt ) && $stmt->rowCount() ) {
    $rank = 1;
    $last_grade = -1;

    while( ( $row = $stmt->fetch( PDO::FETCH_ASSOC ) ) ) {
        $usersARR[$i][0] = intval($row['UserId']);
        $usersARR[$i][1] = floatval($row['AsyncPointsAverage']);
        $usersARR[$i][2] = floatval($row['AsyncPointsRecentAverage']);

        if( $usersARR[$i][1] < $last_grade ) {
            $rank++;
        }

        $usersARR[$i][3] = $rank;

        $last_grade = $usersARR[$i][1];
    }
}

but obviously this has to be done again for the RECENT Average, ie you are doing two calls to the database.

So please be super-critical of my code. I am not a php or mysql expert, so I will not be offended in any way. The way I see it, I want to learn.

Best Answer

Debate

Once you get more fluent in SQL, you will realize that many queries are much shorter to write in SQL than in PHP. Since "programmer time" is worth a lot, picking the shorter one is often best.

Doing the 'work' in SQL may or may not be faster. For example, SELECT AVG(x) FROM big_table returns only one row, thereby having much less network traffic than if you copied the entire table into PHP, thereby probably being faster.

GROUP BY, ORDER BY, subqueries, etc are much more concise in SQL.

JOIN is probably always faster in SQL -- otherwise you have to go back and forth between PHP and MySQL to do the lookups in the 'second' table.

Millions of rows might choke PHP, whereas SQL can handle virtually unlimited number of rows.

But, on the other hand, SQL has to work a lot harder to fetch a thousand rows than PHP simply walking through an array.

What do I do? Whatever seems appropriate each case case. As for "1000 rows", that's pretty trivial for either MySQL or PHP. I usually embed timers in my code (microtime(true)); if a web page seems slow, I look for the 'worst' part of it to optimize.

What should you do? Do it in SQL. It's a learning experience. Back off to PHP only if it gets too tough.