Transaction Management

Transaction

Logical unit of work
Sequence of operations that transform one consistent DB state into another
Operations either executed in their entirety or totally cancelled

Failures

local: affecting one transaction, e.g. overflow
global: affecting whole system, e.g. power outage

Transaction manager

COMMIT: success, consistent state, make all updates permanent
ROLLBACK: error, possibly inconsistent, undo all updates (by using a log or journal to restore previous values)

DB constraints may not be satisfied during the transaction.

ACID:

Atomicity: transaction are all or nothing
Consistency: from one consistent state to another
Isolation: a given transaction's updates are concealed from the rest running concurrently
Durability: once commited updates survive

Recovery

take checkpoints
after global failure, undo and redo transactions according to timeline

SQL

COMMIT and ROLLBACK
No BEGIN TRANSACTION in SQL92, but commands are transaction-initiating
(when there is no current transaction; all subsequent statements until COMMIT or ROLLBACK become part of the transaction)
in scripts and programs use initial 'COMMIT;' to clarify the situation
Postgres: BEGIN; ... COMMIT;
Interactively usually auto-commit after every statement

Concurrency

Three problems resulting from interleaving of transactions that are 'correct in itself' i.e. do not violate the Golden Rule:

lost update (based on current value)
1. A retrieve
2. B retrieve
3. A update
4. B update
uncommitted dependency (retrieve or update of value later rolled back)
1. B update
2. A retrieve (or update based on current value)
3. B rollback
inconsistent analysis (retrieve of inconsistent state before commit)
e.g. accounts 1,2,3 with balances 40,50,30
transaction A is summing, transaction B is transferring 10 from 3 to 1
1. A retrieve tup1
2. A retrieve tup2
3. B retrieve tup3
4. B update tup3
5. B retrieve tup1
6. B update tup1
7. B commit
8. A retrieve tup3
A sees final value of 3 but not of 1

Locking

Locking granularity: individual tuples vs whole table (concurrency vs overhead)

Transactions A and B holding exclusive (X) locks or shared (S) locks on tuple t

If A holds X then
- all request of B denied, B goes into wait state
If A holds S then
- request for X denied, B into wait state
- request for S granted

Wait state:

until lock released by other transaction (at end of transaction i.e. COMMIT or ROLLBACK)
wait forever (bug): livelock
make locks available first-come, first-served

Data access protocol (locking protocol):

retrieve t: acquire S lock
update t: acquire X lock (or promote existing S to X)

Concurrency problems with locks:

uncommitted dependency: solved, A always sees a committed value
lost update: becomes deadlock, A and B waiting for X lock
inconsistent analysis: deadlock
- B waiting for X lock on tup1 (S held by A)
- A waiting for S lock on tup3 (X held by B)

Two options to detect deadlock:

search cycle in wait-for graph
timeout means deadlock

Then:

select victim, rollback
Two further options:
- restart (good)
- send 'deadlock victim' error (bad)

Serializing

Schedule: execution of transactions

Criterion for correctness:

individual transactions are correct i.e. transform a correct state of the DB into another correct state
running transactions one at at time in any order is correct (transactions are assumed to be independent of one another)
interleaved execution of transactions
- only correct if serializable
- i.e. equivalent to (produces the same results as) some serial execution

Note that schedules S1 and S2 may be correct and produce different results!

Eswaran's two-phase locking theorem:

If all transactions obey the two-phase locking protocol,
then all possible interleaved schedules are serializable

Two-phase locking protocol:

(growing phase) Before operating on any t, the transaction must acquire a lock on t
(shrinking phase) After releasing a lock, a transaction must never acquire any more locks

Note: the two-phase locking protocol can still lead to deadlocks.

A interleaved with B
deadlock
rollback A and restart
equivalent to B then A

Isolation Levels

SQL isolation levels using SET TRANSACTION ..
1. READ UNCOMMITTED
2. READ COMMITTED
3. REPEATABLE READ
4. SERIALIZABLE
default highest level SERIALIZABLE (SQL92)
i.e. interleaved execution guaranteed to be serializable
for performance reasons lower levels may be desired
PostgreSQL default: READ COMMITTED
MySQL InnoDB default: REPEATABLE READ

Problems with lower isolation levels (violations of serializability):

dirty read - A transaction reads data written by a concurrent uncommitted transaction. (Postgres definition)
1. T1 update tup1
2. T2 retrieves tup1
3. T1 rollback
nonrepeatable read - A transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read). (Postgres definition)
1. T1 retrieves tup1
2. T2 updates tup1
3. T1 retrieves tup1 again
phantoms - A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction. (Postgres definition)
1. T1 retrieves set satisfying condition
2. T2 inserts tuple
3. T1 retrieves again: new tuple

Isolation levels and associated violations:

	dirty	nonrep	phantom
READ UNCOMMITTED	Y	Y	Y
READ COMMITTED	N	Y	Y
REPEATABLE READ	N	N	Y
SERIALIZABLE	N	N	N

References:

PostgreSQL 9.1.14 Documentation - 13.2. Transaction Isolation