From 1bbe4ae2f63a3941c4a0cbbfbe7802921b2dbf8a Mon Sep 17 00:00:00 2001
From: Andreas Jung
Date: Sat, 21 Feb 2009 08:25:58 +0000
Subject: [PATCH] half-way reSTified

---
 ZODB2.rst | 488 ++++++++++++++++-------------------------------------
 1 file changed, 147 insertions(+), 341 deletions(-)

diff --git a/ZODB2.rst b/ZODB2.rst
index 0592cf2c63..57baa63b25 100644
--- a/ZODB2.rst
+++ b/ZODB2.rst
@@ -1,85 +1,76 @@
 Advanced ZODB for Python Programmers
-####################################
+====================================
 
-In the first article in this series, `ZODB for Python
-Programmers `_ I covered some of the simpler aspects of Python
+In the first article in this series, "ZODB for Python
+Programmers":ZODB1 I covered some of the simpler aspects of Python
 object persistence. In this article, I'll go over some of the more
 advanced features of ZODB.
 
-
 In addition to simple persistence, ZODB offers some very useful
 extras for the advanced Python application. Specificly, we'll cover
-the following advanced features in this article:.. comment:: description list
-
-Persistent-Aware TypesZODB comes with some special,
-"persistent-aware" data types for storing data in a ZODB. The
-most useful of these is the "BTree", which is a fast, efficient
-storage object for lots of data.
-
-Voalitile DataNot all your data is meant to be stored in the
-database, ZODB let's you have volatile data on your objects that
-does not get saved.
+the following advanced features in this article:
 
-Pluggable StoragesZODB offers you the ability to use many
-different storage back-ends to store your object data, including
-files, relational databases and a special client-server storage
-that stores objects on a remote server.
+- Persistent-Aware Types -- ZODB comes with some special,
+  "persistent-aware" data types for storing data in a ZODB. The
+  most useful of these is the "BTree", which is a fast, efficient
+  storage object for lots of data.
 
-Conflict ResolutionWhen many threads try to write to the same
-object at the same time, you can get conflicts. ZODB offers a
-conflict resolution protocol that allows you to mitigate most
-conflicting writes to your data.
-
-TransactionsWhen you want your changes to be "all or nothing"
-transactions come to the rescue.
+- Voalitile Data -- Not all your data is meant to be stored in the
+  database, ZODB let's you have volatile data on your objects that
+  does not get saved.
+- Pluggable Storages -- ZODB offers you the ability to use many
+  different storage back-ends to store your object data, including
+  files, relational databases and a special client-server storage
+  that stores objects on a remote server.
+- Conflict Resolution -- When many threads try to write to the same
+  object at the same time, you can get conflicts. ZODB offers a
+  conflict resolution protocol that allows you to mitigate most
+  conflicting writes to your data.
+- Transactions -- When you want your changes to be "all or nothing"
+  transactions come to the rescue.
 
 
 Persistent-Aware Types
-======================
+----------------------
 
 You can also get around the mutable attribute problem discussed in
 the first article by using special types that are "persistent
 aware". ZODB comes with the following persistent aware mutable
-object types:.. comment:: description list
-
-PersistentListThis type works just like a list, except that
-changing it does not require setting _p_changed or explicitly
-re-assigning the attribute.
-
-PersistentMappingA persistent aware dictionary, much like
-PersistentList.
-
-BTreeA dictionary-like object that can hold large
-collections of objects in an ordered, fast, efficient way.
-
-
+object types:
 
+- PersistentList -- This type works just like a list, except that
+  changing it does not require setting _p_changed or explicitly
+  re-assigning the attribute.
 
-BTrees offer a very powerful facility to the
-Python programmer:.. comment:: bullet list
+- PersistentMapping -- A persistent aware dictionary, much like
+  PersistentList.
+
+- BTree -- A dictionary-like object that can hold large
+  collections of objects in an ordered, fast, efficient way.
 
-- BTrees can hold a large collection of information in an
-efficient way; more objects than your computer has enough
-memory to hold at one time.
-- BTrees are integrated into the persistence machinery to work
-effectively with ZODB's object cache. Recently, or heavily
-used objects are kept in a memory cache for speed.
-- BTrees can be searched very quickly, because they are stored
-in an fast, balanced tree data structure.
+BTrees offer a very powerful facility to the Python programmer:
 
+- BTrees can hold a large collection of information in an
+  efficient way; more objects than your computer has enough
+  memory to hold at one time.
 
+- BTrees are integrated into the persistence machinery to work
+  effectively with ZODB's object cache. Recently, or heavily
+  used objects are kept in a memory cache for speed.
 
-BTrees come in three flavors, OOBTrees, IOBTrees, OIBTrees, and
-IIBTrees. The last three are optimized for integer keys, values,
-and key-value pairs, respectively. This means that, for example,
-an IOBTree is meant to map an integer to an object, and is
-optimized for having integers keys.
+- BTrees can be searched very quickly, because they are stored
+  in an fast, balanced tree data structure.
 
+- BTrees come in three flavors, OOBTrees, IOBTrees, OIBTrees, and
+  IIBTrees. The last three are optimized for integer keys, values,
+  and key-value pairs, respectively. This means that, for example,
+  an IOBTree is meant to map an integer to an object, and is
+  optimized for having integers keys.
 
 
 
 Using BTrees
-============
+------------
 
 Suppose you track the movement of all your employees with
 heat-seeking cameras hidden in the ceiling tiles. Since your
Since your @@ -87,65 +78,45 @@ employees tend to frequently congregate against you, all of the tracking information could end up to be a lot of data, possibly thousands of coordinates per day per employee. Further, you want to key the coordinate on the time that it was taken, so that you -can only look at where your employees were during certain times::: +can only look at where your employees were during certain times:: -from BTrees import IOBTree -from time import time + from BTrees import IOBTree + from time import time -class Employee(Persistent): + class Employee(Persistent): -def __init__(self): -self.movements = IOBTree() + def __init__(self): + self.movements = IOBTree() + + def fix(self, coords): + "get a fix on the employee" + self.movements[int(time())] = coords -def fix(self, coords): -"get a fix on the employee" -self.movements[int(time())] = coords + def trackToday(self): + "return all the movements of the + employee in the last 24 hours" + current_time = int(time()) + return self.movements.items(current_time - 86400, + current_time) -def trackToday(self): -"return all the movements of the -employee in the last 24 hours" -current_time = int(time()) -return self.movements.items(current_time - 86400, -current_time) - - -In this example, the :: - -fix - -method is called every time one of your +In this example, the 'fix' method is called every time one of your cameras sees that employee. This information is then stored in a -BTree, with the current :: - -time() - -as the key and the :: - -coordinates - - +BTree, with the current 'time()' as the key and the 'coordinates' as the value. - Because BTrees store their information is a ordered structure, they can be quickly searched for a range of key values. The -:: - -trackToday - -method uses this feature to return a sequence of +'trackToday' method uses this feature to return a sequence of coordinates from 24 hours hence to the present. 
-
 This example shows how BTrees can be quickly searched for a range
 of values from a minimum to a maximum, and how you can use this
 technique to oppress your workforce. BTrees have a very rich API,
 including doing unions and intersections of result sets.
 
 
-
 Not All Objects are Persistent
-==============================
+------------------------------
 
 You don't have to make all of your objects persistent.
 Non-persistent objects are often useful to represent either
@@ -154,168 +125,94 @@ objects that are useful only as a "cache" that can be thrown away
 when your persistent object is deactivated (removed from memory
 when not used).
 
-
-ZODB provides you with the ability to have *volatile*> attributes.
+ZODB provides you with the ability to have *volatile* attributes.
 Volatile attributes are attributes of persistent objects that are
 never saved in the database, even if they are capable of being
-persistent. Volatile attributes begin with ::
-
-_v_
-
-are good for
+persistent. Volatile attributes begin with '_v_' are good for
 keeping cached information around for optimization. ZODB also
 provides you with access to special pickling hooks that allow you
 to set volatile information when an object is activated.
 
-
 Imagine you had a class that stored a complex image that you
 needed to calculate. This calculation is expensive. Instead of
 calculating the image every time you called a method, it would be
-better to calculate it *once*> and then cache the result in a
-volatile attribute:::
-
-def image(self):
-"a large and complex image of the terrain"
-if hasattr(self, '_v_image'):
-return self._v_image
-image=expensive_calculation()
-self._v_image=image
-return image
-
-
-
-Here, calling ::
-
-image
-
-the first time the object is activated will
+better to calculate it *once* and then cache the result in a
+volatile attribute::
+
+  def image(self):
+      "a large and complex image of the terrain"
+      if hasattr(self, '_v_image'):
+          return self._v_image
+      image=expensive_calculation()
+      self._v_image=image
+      return image
+
+Here, calling 'image' the first time the object is activated will
 cause the method to do the expensive calculation. After the first
 call, the image will be cached in a volatile attribute. If the
-object is removed from memory, the ::
-
-_v_image
-
-attribute is not
+object is removed from memory, the '_v_image' attribute is not
 saved, so the cached image is thrown away, only to be recalculated
-the next time you call ::
-
-image
-
-.
-
-
-ZODB and Concurrency
-====================
+the next time you call 'image'.
+
+ZODB and Concurrency
+--------------------
 
 Different, threads, processes, and computers on a network can open
 connections to a single ZODB object database. Each of these
 different processes keeps its own copy of the objects that it uses
 in memory.
 
-
 The problem with allowing concurrent access is that conflicts can
 occur. If different threads try to commit changes to the same
 objects at the same time, one of the threads will raise a
 ConflictError. If you want, you can write your application to
 either resolve or retry conflicts a reasonable number of times.
 
-
 Zope will retry a conflicting ZODB operation three times. This is
 usually pretty reasonable behavior. Because conflicts only happen
 when two threads write to the same object, retrying a conflict
 means that one thread will win the conflict and write itself, and
 the other thread will retry a few seconds later.
 
-
 Pluggable Storages
-==================
+------------------
 
 Different processes and computers can connection to the same
-database using a special kind of storage called a ::
-
-ClientStorage
-
-.
-A ::
-
-ClientStorage
-
-connects to a ::
-
-StorageServer
-
-over a network.
-
+database using a special kind of storage called a 'ClientStorage'.
+A 'ClientStorage' connects to a 'StorageServer' over a network.
 
 In the very beginning, you created a connection to the database by
-first creating a storage. This was of the type ::
-
-FileStorage
-
-.
+first creating a storage. This was of the type 'FileStorage'.
 Zope comes with several different back end storage objects, but
-one of the most interesting is the ::
-
-ClientStorage
-
-from the Zope
+one of the most interesting is the 'ClientStorage' from the Zope
 Enterprise Objects product (ZEO).
 
-
-The ::
-
-ClientStorage
-
-storage makes a TCP/IP connection to a
-::
-
-StorageServer
-
-(also provided with ZEO). This allows many
+The 'ClientStorage' storage makes a TCP/IP connection to a
+'StorageServer' (also provided with ZEO). This allows many
 different processes on one or machines to work with the same
 object database and, hence, the same objects. Each process gets a
 cached "copy" of a particular object for speed. All of the
-::
-
-ClientStorages
-
-connected to a ::
-
-StorageServer
-
-speak a special
+'ClientStorages' connected to a 'StorageServer' speak a special
 object transport and cache invalidation protocol to keep all of
 your computers synchronized.
 
-
-Opening a ::
-
-ClientStorage
-
-connection is simple. The following
+Opening a 'ClientStorage' connection is simple. The following
 code creates a database connection and gets the root object for a
-::
-
-StorageServer
-
-listening on "localhost:12345":::
-
-from ZODB import DB
-from ZEO import ClientStorage
-storage = ClientStorage.ClientStorage('localhost', 12345)
-db = DB( storage )
-connection = db.open()
-root = connection.root()
-
+'StorageServer' listening on "localhost:12345"::
 
+  from ZODB import DB
+  from ZEO import ClientStorage
+  storage = ClientStorage.ClientStorage('localhost', 12345)
+  db = DB( storage )
+  connection = db.open()
+  root = connection.root()
 
 In the rare event that two processes (or threads) modify the same
 object at the same time, ZODB provides you with the ability to
 retry or resolve these conflicts yourself.
 
-
 Resolving Conflicts
-===================
+-------------------
 
 If a conflict happens, you have two choices. The first choice is
 that you live with the error and you try again. Statistically,
@@ -325,149 +222,73 @@ if you can redesign your application so that the changes get
 spread around to many different objects then you can usually get
 rid of the hot spot.
 
-
-Your second choice is to try and *resolve*> the conflict. In many
+Your second choice is to try and *resolve* the conflict. In many
 situations, this can be done. For example, consider the following
-persistent object:::
-
-class Counter(Persistent):
+persistent object::
 
-self.count = 0
-
-def hit(self):
-self.count = self.count + 1
+  class Counter(Persistent):
 
+      self.count = 0
 
+      def hit(self):
+          self.count = self.count + 1
 
 This is a simple counter. If you hit this counter with a lot of
 requests though, it will cause conflict errors as different
 threads try to change the count attribute simultaneously.
 
-
 But resolving the conflict between conflicting threads in this
 case is easy. Both threads want to increment the self.count
 attribute by a value, so the resolution is to increment the
 attribute by the sum of the two values and make both commits
 happy.
 
-
 To resolve a conflict, a class should define an
-::
-
-_p_resolveConflict
-
-method. This method takes three arguments... comment:: description list
-
-::
-
-oldState
-
-The state of the object that the changes made by
-the current transaction were based on. The method is permitted
-to modify this value.
-
-::
-
-savedState
-
-The state of the object that is currently
-stored in the database. This state was written after
-::
-
-
-oldState
-
-
-
-
-and reflects changes made by a transaction that committed
-before the current transaction. The method is permitted to
-modify this value.
-
-::
-
-newState
-
-The state after changes made by the current
-transaction. The method is
-*
-not
-*>
-permitted to modify this
-value. This method should compute a new state by merging
-changes reflected in
-::
-
-
-savedState
-
-
-
-and
-::
-
-
-newState
-
-
-
-, relative to
-
-::
-
-
-oldState
-
-
-
-.
+'_p_resolveConflict' method. This method takes three arguments:
 
+- 'oldState' -- The state of the object that the changes made by
+  the current transaction were based on. The method is permitted
+  to modify this value.
 
+- 'savedState' -- The state of the object that is currently
+  stored in the database. This state was written after 'oldState'
+  and reflects changes made by a transaction that committed
+  before the current transaction. The method is permitted to
+  modify this value.
 
+- 'newState' -- The state after changes made by the current
+  transaction. The method is *not* permitted to modify this
+  value. This method should compute a new state by merging
+  changes reflected in 'savedState' and 'newState', relative to
+  'oldState'.
 
 The method should return the state of the object after resolving
 the differences.
 
+Here is an example of a '_p_resolveConflict' in the 'Counter'
+class::
 
-Here is an example of a ::
-
-_p_resolveConflict
-
-in the ::
-
-Counter
-
-
-class:::
-
-class Counter(Persistent):
-
-self.count = 0
-
-def hit(self):
-self.count = self.count + 1
-
-def _p_resolveConflict(self, oldState, savedState, newState):
-
-# Figure out how each state is different:
-savedDiff= savedState['count'] - oldState['count']
-newDiff= newState['count']- oldState['count']
+  class Counter(Persistent):
 
-# Apply both sets of changes to old state:
-return oldState['count'] + savedDiff + newDiff
+      self.count = 0
 
+      def hit(self):
+          self.count = self.count + 1
 
+      def _p_resolveConflict(self, oldState, savedState, newState):
 
-In the above example, ::
+          # Figure out how each state is different:
+          savedDiff= savedState['count'] - oldState['count']
+          newDiff= newState['count']- oldState['count']
 
-_p_resolveConflict
+          # Apply both sets of changes to old state:
+          return oldState['count'] + savedDiff + newDiff
 
-resolves the difference
+In the above example, '_p_resolveConflict' resolves the difference
 between the two conflicting transactions.
 
-
 Transactions and Subtransactions
-================================
+--------------------------------
 
 Transactions are a very powerful concept in databases.
 Transactions let you make many changes to your information as if
@@ -477,28 +298,19 @@ another. You would do this by deducting the amount of the
 transfer from one account, and adding that amount onto the other.
 
 
-
 If an error happened while you were adding the money to the
 receiving account (say, the bank's computers were unavailable),
 then you would want to abort the transaction so that the state of
 the accounts went back to the way they were before you changed
 anything.
 
+To abort a transaction, you need to call the 'abort' method of the
+transactions object::
 
-To abort a transaction, you need to call the ::
-
-abort
-
-method of the
-transactions object:::
-
-get_transaction().abort()
-
-
-
-This will throw away all the currently changed objects and start a
-new, empty transaction.
+  get_transaction().abort()
 
+  This will throw away all the currently changed objects and start a
+  new, empty transaction.
 
 Subtransactions, sometimes called "inner transactions", are
 transactions that happen inside another transaction.
@@ -506,13 +318,11 @@ Subtransactions can be commited and aborted like regular "outer"
 transactions. Subtransactions mostly provide you with an
 optimization technique.
 
-
 Subtransactions can be commited and aborted. Commiting or
 aborting a subtransaction does not commit or abort its outer
 transaction, just the subtransaction. This lets you use many,
 fine-grained transactions within one big transaction.
 
-
 Why is this important? Well, in order for a transaction to be
 "rolled back" the changes in the transaction must be stored in
 memory until commit time. By commiting a subtransaction, you are
@@ -521,31 +331,26 @@ permenant, you can store this subtransaction somewhere other than
 in memory". For very, very large transactions, this can be a big
 memory win for you.
 
-
 If you abort an outer transaction, then all of its inner
 subtransactions will also be aborted and not saved. If you abort
 an inner subtransaction, then only the changes made during that
-subtransaction are aborted, and the outer transaction is *not*>
+subtransaction are aborted, and the outer transaction is *not*
 aborted and more changes can be made and commited, including more
 subtransactions.
 
-
 You can commit or abort a subtransaction by calling either
-commit() or abort() with an argument of 1:::
-
-get_transaction().commit(1) # or
-get_transaction().abort(1)
-
+commit() or abort() with an argument of 1::
 
+  get_transaction().commit(1) # or
+  get_transaction().abort(1)
 
 Subtransactions offer you a nice way to "batch" all of your "all
 or none" actions into smaller "all or none" actions while still
 keeping the outer level "all or none" transaction intact. As a
 bonus, they also give you much better memory resource performance.
 
-
 Conclusion
-==========
+----------
 
 ZODB offers many advanced features to help you develop simple, but
 powerful python programs. In this article, you used some of the
@@ -556,3 +361,4 @@ information on ZODB, join the discussion list at zodb-dev@zope.org
 where you can find out more about this powerful component of
 Zope.
 
+