Transactions and concurrency¶
Transactions are a core feature of ZODB. Much has been written about transactions, and we won’t go into much detail here. Transactions provide two core benefits:
- Atomicity
When a transaction executes, it succeeds or fails completely. If some data are updated and then an error occurs, causing the transaction to fail, the updates are rolled back automatically. The application using the transactional system doesn’t have to undo partial changes. This takes a significant burden from developers and increases the reliability of applications.
- Concurrency
Transactions provide a way of managing concurrent updates to data. Different programs operate on the data independently, without having to use low-level techniques to moderate their access. Coordination and synchronization happen via transactions.
All activity in ZODB happens in the context of database connections and transactions. Here’s a simple example:
    import ZODB, transaction

    db = ZODB.DB(None) # Use a mapping storage
    conn = db.open()

    conn.root.x = 1
    transaction.commit()
In the example above, we used
transaction.commit() to commit a
transaction, making the change to
conn.root permanent. This is
the most common way to use ZODB, at least historically.
If we decide we don’t want to commit a transaction, we can use abort:

    conn.root.x = 2
    transaction.abort() # conn.root.x goes back to 1
In this example, because we aborted the transaction, the value of
conn.root.x was rolled back to 1.
There are a number of things going on here that deserve some explanation. When using transactions, there are three kinds of objects involved:
- Transaction
Transactions represent units of work. Each transaction has a beginning and an end. Transactions provide the ITransaction interface.
- Transaction manager
Transaction managers create transactions and provide APIs to start and end transactions. The transactions managed are always sequential. There is always exactly one active transaction associated with a transaction manager at any point in time. Transaction managers provide the ITransactionManager interface.
- Data manager
Data managers manage data associated with transactions. ZODB connections are data managers. The details of how they interact with transactions aren’t important here.
Explicit transaction managers¶
ZODB connections have transaction managers associated with them when
they’re opened. When we call the database open() method
without an argument, a thread-local transaction manager is used. Each
thread has its own transaction manager. When we called
transaction.commit() above we were calling commit on the
thread-local transaction manager.
Because we used a thread-local transaction manager, all of the work in the transaction needs to happen in the same thread. Similarly, only one transaction can be active in a thread.
If we want to run multiple simultaneous transactions in a single
thread, or if we want to spread the work of a transaction over
multiple threads [5],
then we can create transaction managers ourselves and pass them to
open():

    my_transaction_manager = transaction.TransactionManager()
    conn = db.open(my_transaction_manager)
    conn.root.x = 2
    my_transaction_manager.commit()
In this example, to commit our work, we called
commit() on the
transaction manager we created and passed to open().
In the examples above, the transaction beginnings were
implicit. Transactions were effectively
[6] created when the transaction
managers were created and when previous transactions were committed.
We can create transactions explicitly using the transaction manager’s begin() method.
A more modern [7] way to manage transaction
boundaries is to use context managers and the Python with
statement. Transaction managers are context managers, so we can use
them with the
with statement directly:

    with my_transaction_manager as trans:
        trans.note(u"incrementing x")
        conn.root.x += 1
When used as a context manager, a transaction manager explicitly begins a new transaction, executes the code block and commits the transaction if there isn’t an error and aborts it if there is an error.
We used as trans above to get the transaction.
Databases provide the
transaction() method to execute a code
block as a transaction:
    with db.transaction() as conn2:
        conn2.root.x += 1
This opens a connection, assigns it its own transaction manager, and
executes the nested code in a transaction. We used
as conn2 to
get the connection. The transaction boundaries are defined by the with statement.
Getting a connection’s transaction manager¶
In the previous example, you may have wondered how one might get the
current transaction. Every connection has an associated transaction
manager, which is available as the transaction_manager attribute.
So, for example, if we wanted to set a transaction note:

    with db.transaction() as conn2:
        conn2.transaction_manager.get().note(u"incrementing x again")
        conn2.root.x += 1
Here, we used the
get() method to get
the current transaction.
In the last few examples, we used a connection opened using
transaction(). This was distinct from and used a
different transaction manager than the original connection. If we
looked at the original connection,
conn, we’d see that it has the
same value for
x that we set earlier:
    >>> conn.root.x
    3
This is because it’s still in the same transaction that was begun when a change was last committed against it. If we want to see changes, we have to begin a new transaction:
    >>> trans = my_transaction_manager.begin()
    >>> conn.root.x
    5
ZODB uses a timestamp-based commit protocol that provides snapshot isolation. Whenever we look at ZODB data, we see its state as of the time the transaction began.
As mentioned in the previous section, each connection sees and operates on a view of the database as of the transaction start time. If two connections modify the same object at the same time, one of the connections will get a conflict error when it tries to commit:
    with db.transaction() as conn2:
        conn2.root.x += 1

    conn.root.x = 9
    my_transaction_manager.commit() # will raise a conflict error
If we executed this code, we’d get a
ConflictError exception on the
last line. After a conflict error is raised, we’d need to abort the
transaction, or begin a new one, at which point we’d see the data as
written by the other connection:
    >>> my_transaction_manager.abort()
    >>> conn.root.x
    6
The timestamp-based approach used by ZODB is referred to as an optimistic approach, because it works best if there are no conflicts.
The best way to avoid conflicts is to design your application so that multiple connections don’t update the same object at the same time. This isn’t always easy.
Sometimes you may need to queue some operations that update shared data structures, like indexes, so the updates can be made by a dedicated thread or process, without making simultaneous updates.
The most common way to deal with conflict errors is to catch them and retry transactions. To do this manually involves code that looks something like this:
    import transaction

    max_attempts = 3
    attempts = 0
    while True:
        try:
            with transaction.manager:
                ...  # code that updates a database
        except transaction.interfaces.TransientError:
            attempts += 1
            if attempts == max_attempts:
                raise
        else:
            break
In the example above, we used
transaction.manager to refer to the
thread-local transaction manager, which we then used with the
with statement. When a conflict error occurs, the transaction
must be aborted before retrying the update. Using the transaction
manager as a context manager in the
with statement takes care of this for us.
The example above is rather tedious. There are a number of tools to automate transaction retry. The transaction package provides a context-manager-based mechanism for retrying transactions:
    for attempt in transaction.manager.attempts():
        with attempt:
            ...  # code that updates a database
This is shorter and simpler [1].
For Python web frameworks, there are WSGI [2] middleware components, such as repoze.tm2, that align transaction boundaries with HTTP requests and retry transactions when there are transient errors.
For applications like queue workers or cron jobs, conflicts can sometimes be allowed to fail, letting other queue workers or subsequent cron-job runs retry the work.
ZODB provides a conflict-resolution framework for merging conflicting changes. When conflicts occur, conflict resolution is used, when possible, to resolve the conflicts without raising a ConflictError to the application.
Commonly used objects that implement conflict resolution are the buckets and
Length objects provided by the BTrees package.
The main data structures provided by BTrees, BTrees and TreeSets, spread their data over multiple objects. The leaf-level objects, called buckets, allow distinct keys to be updated without causing conflicts [3].
Length objects are conflict-free counters that merge concurrent changes by
simply accumulating them.
Conflict resolution weakens consistency. Resist the temptation to try to implement conflict resolution yourself. In the future, ZODB will provide greater control over conflict resolution, including the option of disabling it.
It’s generally best to avoid conflicts in the first place, if possible.
ZODB and atomicity¶
ZODB provides atomic transactions. When using ZODB, it’s important to align work with transactions. Once a transaction is committed, it can’t be rolled back [4] automatically. For applications, this implies that work that should be atomic shouldn’t be split over multiple transactions. This may seem somewhat obvious, but the rule can be broken in non-obvious ways. For example, a web API that splits logical operations over multiple web requests, as is often done in REST APIs, violates this rule.
Partial transaction error recovery using savepoints¶
A transaction can be split into multiple steps that can be rolled back individually. This is done by creating savepoints. Changes in a savepoint can be rolled back without rolling back an entire transaction:
    import ZODB

    db = ZODB.DB(None) # using a mapping storage

    with db.transaction() as conn:
        conn.root.x = 1
        conn.root.y = 0
        savepoint = conn.transaction_manager.savepoint()
        conn.root.y = 2
        savepoint.rollback()

    with db.transaction() as conn:
        print([conn.root.x, conn.root.y]) # prints [1, 0]
If we executed this code, it would print 1 and 0, because while the initial changes were committed, the changes in the savepoint were rolled back.
A secondary benefit of savepoints is that they save any changes made before the savepoint to a file, so that memory of changed objects can be freed if they aren’t used later in the transaction.
Concurrency, threads and processes¶
ZODB supports concurrency through transactions. Multiple programs [8] can operate independently in separate transactions. They synchronize at transaction boundaries.
The most common way to run ZODB is with each program running in its own thread. Usually the thread-local transaction manager is used.
You can use multiple threads per transaction and you can run multiple transactions in a single thread. To do this, you need to instantiate and use your own transaction manager, as described in Explicit transaction managers. To run multiple transactions simultaneously in a thread, you need to use a separate transaction manager for each transaction.
To spread a transaction over multiple threads, you need to keep in mind that database connections, transaction managers and transactions are not thread-safe. You have to prevent simultaneous access from multiple threads. For this reason, using multiple threads with a single transaction is not recommended, but it is possible with care.
Using multiple processes¶
Using multiple Python processes is a good way to scale an application horizontally, especially given Python’s global interpreter lock.
Some things to keep in mind when utilizing multiple processes:
- If using the multiprocessing module, you can’t [9] share databases or connections between processes. When you launch a subprocess, you’ll need to re-instantiate your storage and database.
- You’ll need to use a storage such as ZEO, RelStorage, or NEO, that supports multiple processes. None of the included storages do.
[1] But also a bit obscure. The Python context-manager mechanism isn’t a great fit for the transaction-retry use case.
[3] Conflicts can still occur when buckets split due to added objects causing them to exceed their maximum size.
[4] Transactions can’t be rolled back, but they may be undone in some cases, especially if subsequent transactions haven’t modified the same objects.
[5] While it’s possible to spread transaction work over multiple threads, it’s not a good idea. See Concurrency, threads and processes.
[6] Transactions are implicitly created when needed, such as when data are first modified.
[7] ZODB and the transaction package predate context managers and the Python with statement.
[8] We’re using program here in a fairly general sense, meaning some logic that we want to run to perform some function, as opposed to an operating system program.
[9] At least not now.