ACID and transaction
Transaction provides data integrity in highly concurrent environments. It’s also called Unit of Work.
- For database systems, transaction is a must-have feature. But also some exceptions, like MySQL MyISAM engine.
- For messaging systems like JMS, transactions are not mandatory which has non-transacted acknowledgement modes.
- File system operations are usually non-managed, but if your business requirements demand transaction file operations, you might make use a tool such as XADisk.
ACID
Inherently a transaction is characterized by four properties (commonly referred as ACID) :
- Atomicity - process
- Consistency - data state
- Isolation - across transactions
- Durability - storage
Atomicity takes individual operations and turns them into an all-or-nothing unit of work, succeeding if and only if all contained operations succeed.
Consistency is about the state change. A transaction must always leave the system in a consistent state.
Isolation brings us the benefit of hiding uncommitted state changes from the outside world (other transactions), as failing transactions shouldn’t ever corrupt the state of the system.Isolation is achieved through concurrency control using pessimistic or optimistic locking mechanisms.
Durability ensure permanently change the state of a system after successful committed transaction, or logging the state changes before commit. Of the four properties, the atomicity is a fixed property, but everything else may be traded-off for performance/scalability reasons.
three phenomena and isolation
The SQL standard defines four Isolation levels:
- READ_UNCOMMITTED
- READ_COMMITTED
- REPEATABLE_READ
- SERIALIZABLE
Dirty read
Transaction 2 reads the uncommitted(temporary or dirty) data of transaction 1. It’s called dirty because the change may be discarded if rollback.
Non-repeatable read
Transaction 2 commits after transaction 1, so the queried record in memory differs in _state_ (changed) from that in database when commit.
Phantom read
Transaction 2 commits after transaction 1, so the queried records in memory differ in _range _(more or less) from that in database when commit.
You can regard phantom read is an extensive version of unrepeatable read.
Even if the SQL standard mandates the use of the SERIALIZABLE isolation level, most database management system use a different default level.
a classic database problem - lost update
transactional scenario
offline concurrency scenario
In this case, we should differentiate the system transaction boundary and business transaction boundary. All the above discussed is about system transaction.
locking and MVCC
Two ways for concurrency control:
- MVCC
- locking But as we all know, locking increases the serializable portion of the executed code, affecting parallelization.
In low level, how the database uses locks or MVCC to control concurrency is very complicated and different from one to another. The explanation is worth explaining in a big book.
Here we consider locking and MVCC in a high level.
(We consider implicit locking is database-specific and ubiquitous when interacting with database using SQL, like read lock, write lock, share lock, mutually exclusive lock, table lock, row lock, range lock, lock escalation etc….the list goes, so many buzzwords!)
Basically, we mention locking just in pessimistic locking context, where locking is explicitly requested. This is from the SQL:
select …. for update/insert/delete And we refer to MVCC mainly for _application-level_ optimistic locking implementation – version column. Other optimistic locking implementations include:
- old/new value comparison in committing time
- checksum or hash computation on original data