I concur with your speculation that it's bad design practice, and I'd recommend against it. Any time you do this sort of thing, it almost inevitably comes back to bite you, and the performance difference compared to a properly designed schema should be small.
You say that as it stands now, your process "would involve searching through the first table for all question_ids answered by the user, then searching the topics_and_questions table" but that's not really a good analysis of what should happen behind the scenes.
Properly designed, this process would involve checking an index on the first table for all questions answered by the user, joining to the second table to find the topics of the questions (and possibly a third table to find the names of the topics), but all in one query and with no full table scans.
If your first table (no table name provided) has (user_id, question_id) as its primary key (as it probably should), then finding the questions answered by each user will be a fast operation, no matter how many records are in this table. Rows in InnoDB are stored in primary key order, so there's no "searching through" the entire table that needs to happen... the storage engine can go directly to where that user's records are without a full table scan. If (user_id, question_id) isn't the primary key, then adding an index on (user_id, question_id) will optimize this part of the query.
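For concreteness, a minimal sketch of what that could look like (the table name answers and the column types are assumptions, since no names were given):

CREATE TABLE answers (
  user_id     INT UNSIGNED NOT NULL,
  question_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (user_id, question_id)  -- InnoDB clusters rows on this key,
                                      -- so one user's rows are stored together
) ENGINE=InnoDB;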
In topics_and_questions, either (question_id) or (question_id, topic_id) should probably be the primary key, depending on whether a question can be in multiple topics. If it can, of course, then counting questions answered becomes more complicated.
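Put together, the single query described above could look something like this (a sketch only; the answers table, a topics table holding topic names, and the sample user_id are assumptions):

SELECT t.topic_name,
       COUNT(DISTINCT a.question_id) AS questions_answered
FROM answers a
JOIN topics_and_questions tq ON tq.question_id = a.question_id
JOIN topics t ON t.topic_id = tq.topic_id
WHERE a.user_id = 12345  -- the key on (user_id, question_id) makes this a range read, not a scan
GROUP BY t.topic_name;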
Since you appear to be allowing a user to answer the same question more than once, there may be yet another issue to consider: you actually have to duplicate part of what you're trying to avoid every time you update this new proposed table. You have to check whether the question just answered is one that's been answered before, and only increment this counter if it hasn't been, I would assume (a sketch of that check follows).
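A minimal sketch of that check in MySQL, assuming the hypothetical answers table above plus a hypothetical topic_counters table for the proposed counters: run the insert with a uniqueness guard, and let the application increment the counter only when the insert actually added a row.

-- IGNORE makes a repeat answer a no-op instead of a duplicate-key error
INSERT IGNORE INTO answers (user_id, question_id) VALUES (12345, 678);

-- run this only if the client reported 1 affected row above,
-- i.e. this was the first time the user answered this question
UPDATE topic_counters tc
JOIN topics_and_questions tq ON tq.topic_id = tc.topic_id
SET tc.questions_answered = tc.questions_answered + 1
WHERE tc.user_id = 12345
  AND tq.question_id = 678;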
Based on the information given, I will take a shot at an answer...
You stated
If I add identifiedtex as the last column in the index, will this speed up a lookup of the texture since the result is part of the index?
Let's say you have the following query:
SELECT identifiedtex FROM texels
WHERE texelx = 210938
AND texely = 378432
AND texelhue = 23;
What would be the effect of having an index (call it Index3) on (texelx, texely, texelhue) versus an index (call it Index4) on (texelx, texely, texelhue, identifiedtex)?
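In DDL terms, those two candidates would be (the index names are just the labels above; texels comes from the sample query):

CREATE INDEX Index3 ON texels (texelx, texely, texelhue);
CREATE INDEX Index4 ON texels (texelx, texely, texelhue, identifiedtex);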
When you retrieve identifiedtex using only Index3, it will require an additional lookup into the table to fetch the value of identifiedtex.
When you retrieve identifiedtex using only Index4, it will be an index-only lookup because all the columns mentioned in the query are in the index. This is known as a covering index.
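In MySQL you can see the difference with EXPLAIN; when the covering index is used, the Extra column reports "Using index":

EXPLAIN SELECT identifiedtex FROM texels
WHERE texelx = 210938
  AND texely = 378432
  AND texelhue = 23;
-- with Index4 in place, Extra should show "Using index",
-- meaning the query was satisfied from the index alone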
I have mentioned covering indexes in past posts.
So, to answer your original question: yes, adding identifiedtex to the index will make for a faster query.
Best Answer
If your business logic layer(s) know that there has been no change (i.e. they have already re-read the data and can compare it to the provided input), then you can simply not send anything to the DB.
If you don't know, before hitting the data layer, whether there has been any change, then you can do something like:
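(A sketch only; the table name some_table, the id key, and the fieldN columns are placeholders, not from the original.)

UPDATE some_table
SET field1 = @field1,
    field2 = @field2,
    fieldN = @fieldN
WHERE id = @id
  AND (field1 <> @field1
    OR field2 <> @field2
    OR fieldN <> @fieldN);  -- only writes when at least one value differs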
This avoids any update if nothing has changed, so it is more efficient (the lookups needed to reject the update would be needed to perform it anyway, and the extra CPU time for the comparison is vanishingly small unless you have huge data in any of the columns), but it is more work to maintain and more prone to errors, so most of the time I would recommend keeping it simple and updating without checking every column, i.e. the standard:
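(Again a sketch, with the same placeholder names as above.)

UPDATE some_table
SET field1 = @field1,
    field2 = @field2,
    fieldN = @fieldN
WHERE id = @id;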
The maintainability issue is even worse for NULLable columns, as

fieldN <> @fieldN

has to be changed to

(fieldN <> @fieldN OR (fieldN IS NULL AND @fieldN IS NOT NULL) OR (fieldN IS NOT NULL AND @fieldN IS NULL))

because comparisons involving NULL never evaluate to true: NULL is not equal to NULL, but it is also not not-equal to NULL. NULL by definition is unknown, so all comparisons (=, !=, <, >, ...) return unknown, which is treated as false, unless your DB supports non-ANSI behaviour like https://stackoverflow.com/questions/9766717/ (a NULL-safe shorthand is sketched at the end of this answer). If this is all baked into your data access layer then fine; some ORMs, for instance, provide such optimisation as an option, but you don't want to be manually writing and maintaining it for every update statement in your application.

The DB (every engine I'm aware of) will perform a full write of the relevant data pages even if no values have changed; it won't spend the few extra CPU cycles checking.

Away from the efficiency of the update itself, you need to consider knock-on effects and certain business requirements. If there are triggers that take action on updates to the table, they will fire even if nothing has changed, which could be a significant consideration if you have any complex triggers. From a business logic point of view, do you need to log that the user actively saved the record with no changes? If you are relying on automated change capture of some sort (such as your own triggers, or the temporal table features in postgres and the upcoming MSSQL2016) to track changes, then you (or your end users) need to decide whether the data model should care that a no-change update happened.

Of course, if the record includes a "last saved" column this all becomes a moot point: there is always something to update upon save, if only that one column.
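As an aside on the NULL comparisons above: standard SQL and postgres offer IS DISTINCT FROM as a NULL-safe shorthand for that expanded predicate (a sketch, keeping the same placeholder names and @-style parameter markers as above for consistency):

UPDATE some_table
SET field1 = @field1,
    fieldN = @fieldN
WHERE id = @id
  AND (field1 IS DISTINCT FROM @field1   -- true when the values differ,
    OR fieldN IS DISTINCT FROM @fieldN); -- treating NULL as a comparable value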