Transaction Management
Transaction
-
Logical unit of work
-
Sequence of operations that transform one consistent DB state into
another
-
Operations either executed in their entirety or totally cancelled
Failures
-
local: affecting one transaction, e.g. overflow
-
global: affecting whole system, e.g. power outage
Transaction manager
-
COMMIT: success, consistent state, make all updates permanent
-
ROLLBACK: error, possibly inconsistent, undo all updates (by using a log
or journal to restore previous values)
DB constraints may not be satisfied during the transaction.
ACID:
-
Atomicity: transaction are all or nothing
-
Consistency: from one consistent state to another
-
Isolation: a given transaction's updates are concealed from the rest
running concurrently
-
Durability: once commited updates survive
Recovery
-
take checkpoints
-
after global failure, undo and redo transactions according to timeline
SQL
-
COMMIT and ROLLBACK
-
No BEGIN TRANSACTION in SQL92, but commands are transaction-initiating
(when there is no current transaction; all subsequent statements
until COMMIT or ROLLBACK become part of the transaction)
in
scripts and programs use initial 'COMMIT;' to clarify the situation
- Postgres: BEGIN; ... COMMIT;
-
Interactively usually auto-commit after every statement
Concurrency
Three problems resulting from interleaving of transactions that are
'correct in itself' i.e. do not violate the Golden Rule:
-
lost update (based on current value)
-
A retrieve
-
B retrieve
-
A update
-
B update
-
uncommitted dependency (retrieve or update of value later rolled
back)
-
B update
-
A retrieve (or update based on current value)
-
B rollback
-
inconsistent analysis (retrieve of inconsistent state before
commit)
e.g. accounts 1,2,3 with balances 40,50,30
transaction A is summing, transaction B is transferring 10 from 3 to 1
-
A retrieve tup1
-
A retrieve tup2
-
B retrieve tup3
-
B update tup3
-
B retrieve tup1
-
B update tup1
-
B commit
-
A retrieve tup3
A sees final value of 3 but not of 1
Locking
Locking granularity: individual tuples vs whole table (concurrency vs overhead)
Transactions A and B holding exclusive (X) locks or shared (S) locks on
tuple t
-
If A holds X then
-
all request of B denied, B goes into wait state
-
If A holds S then
-
request for X denied, B into wait state
-
request for S granted
Wait state:
-
until lock released by other transaction (at end of transaction i.e.
COMMIT or ROLLBACK)
-
wait forever (bug): livelock
-
make locks available first-come, first-served
Data access protocol (locking protocol):
-
retrieve t: acquire S lock
-
update t: acquire X lock (or promote existing S to X)
Concurrency problems with locks:
-
uncommitted dependency: solved, A always sees a committed value
-
lost update: becomes deadlock, A and B waiting for X lock
-
inconsistent analysis: deadlock
-
B waiting for X lock on tup1 (S held by A)
-
A waiting for S lock on tup3 (X held by B)
Two options to detect deadlock:
-
search cycle in wait-for graph
-
timeout means deadlock
Then:
-
select victim, rollback
-
Two further options:
-
restart (good)
-
send 'deadlock victim' error (bad)
Serializing
Schedule: execution of transactions
Criterion for correctness:
-
individual transactions are correct i.e. transform a correct state of
the DB into another correct state
-
running transactions one at at time in any order is correct (transactions
are assumed to be independent of one another)
-
interleaved execution of transactions
-
only correct if serializable
-
i.e. equivalent to (produces the same results as) some serial
execution
Note that schedules S1 and S2 may be correct and produce different
results!
Eswaran's two-phase locking theorem:
-
If all transactions obey the two-phase locking protocol,
-
then all possible interleaved schedules are serializable
Two-phase locking protocol:
-
(growing phase) Before operating on any t, the transaction must acquire
a lock on t
-
(shrinking phase) After releasing a lock, a transaction must never
acquire any more locks
Note: the two-phase locking protocol can still lead to deadlocks.
-
A interleaved with B
-
deadlock
-
rollback A and restart
-
equivalent to B then A
Isolation Levels
-
SQL isolation levels using SET TRANSACTION ..
-
READ UNCOMMITTED
-
READ COMMITTED
-
REPEATABLE READ
-
SERIALIZABLE
-
default highest level SERIALIZABLE (SQL92)
-
i.e. interleaved execution guaranteed to be serializable
-
for performance reasons lower levels may be desired
PostgreSQL
default: READ COMMITTED
MySQL InnoDB default: REPEATABLE READ
Problems with lower isolation levels (violations of serializability):
-
dirty read - A transaction reads data written by a concurrent uncommitted transaction. (Postgres definition)
-
T1 update tup1
-
T2 retrieves tup1
-
T1 rollback
-
nonrepeatable read - A transaction re-reads data it has previously read and finds that
data has been modified by another transaction (that committed since the initial read). (Postgres definition)
-
T1 retrieves tup1
-
T2 updates tup1
-
T1 retrieves tup1 again
-
phantoms - A transaction re-executes a query returning a set of rows that satisfy a search
condition and finds that the set of rows satisfying the condition has changed due to another recently-committed
transaction. (Postgres definition)
-
T1 retrieves set satisfying condition
-
T2 inserts tuple
-
T1 retrieves again: new tuple
Isolation levels and associated violations:
|
dirty
|
nonrep
|
phantom
|
READ UNCOMMITTED
|
Y
|
Y
|
Y
|
READ COMMITTED
|
N
|
Y
|
Y
|
REPEATABLE READ
|
N
|
N
|
Y
|
SERIALIZABLE
|
N
|
N
|
N
|
References:
PostgreSQL 9.1.14 Documentation -
13.2. Transaction Isolation