Database timestamp

timestamp

What's the definition of timestamp?

The timestamp is just for a transaction? (no matter what actions are in the transaction. The timestamp is always the time when the transaction happened).

OR the timestamp is for each action of a transaction? (each action has a timestamp.)

T1 | R1(X)                         R1(X)
T2 |       R2(Y) R2(Y) W2(X)
T3 |                         W3(Y)

For instance, when the first R1(X) happened, it has a timestamp TS. But for the second R1(X), the timestamp is same as the TS? OR the timestamp has changed?

For another example, if I have R1(X)W2(X)W1(X), is it view serializable? and is it conflict serializable?

Best Answer

Assuming that you are referring to informal timestamp-based concurrency control, Wikipedia has a pretty good explanation, that should answer your question about the definition of timestamps and use for transactions and actions (or objects). Here is an excerpt (see the actual page for more details):

Assumptions

  • Every timestamp value is unique and accurately represents an instant in time.
  • No two timestamps can be the same.
  • A higher-valued timestamp occurs later in time than a lower-valued timestamp.

Generating a Timestamp

A number of different ways have been used to generate timestamp:

  • Use the value of the system's clock at the start of a transaction as the timestamp.
  • Use a thread-safe shared counter that is incremental at the start of a transaction as the timestamp.
  • A combination of the above two methods.

Whenever a transaction starts, it is given a timestamp. This is so we can tell which order that the transactions are supposed to be applied in. So given two transactions that affect the same object, the transaction that has the earlier timestamp is meant to be applied before the other one. However, if the wrong transaction is actually presented first, it is aborted and must be restarted.

Every object in the database has a read timestamp, which is updated whenever the object's data is read, and a write timestamp, which is updated whenever the object's data is changed.

If a transaction wants to read an object,

  • but the transaction started before the object's write timestamp it means that something changed the object's data after the transaction started. In this case, the transaction is cancelled and must be restarted.
  • and the transaction started after the object's write timestamp, it means that it is safe to read the object. In this case, if the transaction timestamp is after the object's read timestamp, the read timestamp is set to the transaction timestamp.

If a transaction wants to write to an object,

  • but the transaction started before the object's read timestamp it means that something has had a look at the object, and we assume it took a copy of the object's data. So we can't write to the object as that would make any copied data invalid, so the transaction is aborted and must be restarted.
  • and the transaction started before the object's write timestamp it means that something has changed the object since we started our transaction. In this case we use the Thomas Write Rule and simply skip our write operation and continue as normal; the transaction does not have to be aborted or restarted
  • otherwise, the transaction writes to the object, and the object's write timestamp is set to the transaction's timestamp.

Source: Wikipedia - Timestamp-based concurrency control

In addition to the above explanation, CSE 513: Distributed Systems (Concurrency Control) is a pretty well explained, and easy to follow, PDF. It has the same information, explained in a slightly different way, and covers Serialization. There is too much information to reproduce it here, unfortunately.

Hope that helps. Good Luck!