PostgreSQL – Determining which isolation level is appropriate

isolation-level, optimization, postgresql, rdbms

This is a homework question.

For the following transactions state the isolation level that will
maximize throughput without lowering the integrity of the database.
Explain the answer.

Change the course identified by coursed_id = 'CPSC1350' from one
department to a different one.

The Courses table contains information about courses: their id, their name, the
department that offers them, the id of their instructor, and the maximum
number of students who can take them (max_size).

Courses(coursed_id: string, cname: string, dept: string, instructor_id: string, max_size: integer)
– Primary Key: coursed_id
– Foreign Key: instructor_id references Instructors
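For concreteness, the schema described above could be declared roughly as follows. This is only a sketch: the column types and the shape of the Instructors table are assumptions, since the question does not specify them.

```sql
-- Hypothetical DDL matching the description; column types are assumed.
CREATE TABLE Instructors (
    instructor_id varchar PRIMARY KEY
    -- other columns are not specified in the question
);

CREATE TABLE Courses (
    coursed_id    varchar PRIMARY KEY,
    cname         varchar,
    dept          varchar,
    instructor_id varchar REFERENCES Instructors (instructor_id),
    max_size      integer
);
```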

Assume that PostgreSQL is used.

I believe the transaction can be done using read committed because

  • dirty reads are not OK, because the transaction performs an update and therefore must first read the row
  • non-repeatable reads are OK, because it is unlikely that someone else is changing the value at the same time
  • phantoms are OK, because there are no standalone SELECT statements

Am I on the right track?

Best Answer

The question seems like a puzzle: it appears very simple, but maybe it is not so simple, or maybe it just pretends to be complex. I'll try my best to answer it as I understand it; I apologize if I have misunderstood some obvious hints.

With PostgreSQL, there is no real Read Uncommitted – you get Read Committed instead. Quoting the documentation:

In PostgreSQL, you can request any of the four standard transaction isolation levels, but internally only three distinct isolation levels are implemented, i.e. PostgreSQL's Read Uncommitted mode behaves like Read Committed. This is because it is the only sensible way to map the standard isolation levels to PostgreSQL's multiversion concurrency control architecture.

The question isn't entirely clear to me, because isolation levels only matter when there are concurrent queries, and no other queries are mentioned – so don't blame me too much if I miss something.

If we put the remaining isolation levels in the order Read Committed → Repeatable Read → Serializable, the overhead grows accordingly. So we should check them in that order and, once we are satisfied, there is no need to check the remaining levels (they would also be correct, just with more overhead). As far as I understand, UPDATE Courses SET dept = 'New department' WHERE coursed_id = 'CPSC1350' does the required work. From the PostgreSQL documentation describing the Read Committed isolation level:

UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row.
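To make the quoted behaviour concrete, here is a hypothetical interleaving of two concurrent Read Committed sessions updating the same course row. The department names are made up for illustration; the session markers are comments, not SQL.

```sql
-- Session A:
BEGIN;
UPDATE Courses SET dept = 'Mathematics' WHERE coursed_id = 'CPSC1350';
-- The matching row is now locked by session A.

-- Session B (meanwhile):
BEGIN;
UPDATE Courses SET dept = 'Computer Science' WHERE coursed_id = 'CPSC1350';
-- Blocks here, waiting for session A to commit or roll back.

-- Session A:
COMMIT;

-- Session B wakes up, re-evaluates its WHERE clause against A's committed
-- version of the row; it still matches, so B's update is applied on top.
COMMIT;
-- Final state: dept = 'Computer Science' (last committed writer wins).
```

This is exactly the "wait, then re-check the WHERE clause" behaviour described above, and it is sufficient for a single-row department change.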

I'd say I'm satisfied with this behaviour in this case, so I'll go with READ COMMITTED, unless there are other restrictions not mentioned in the question.
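Since Read Committed is PostgreSQL's default isolation level, the whole transaction can be as simple as the following sketch (the new department name is a placeholder):

```sql
BEGIN;  -- READ COMMITTED is PostgreSQL's default isolation level
-- SET TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- optional explicit form
UPDATE Courses
   SET dept = 'New department'
 WHERE coursed_id = 'CPSC1350';
COMMIT;
```

The optional SET TRANSACTION line makes the chosen level explicit if the database or session default has been changed.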