Hathaway Weblog / Ape and transactions, part 3
I think I've solved the problem. Ape should now reliably commit or abort immediately after talking with the database, even when nothing was written to the database. Writing a little about the effort in this weblog had the nice effect of keeping me focused.
Now I can spend time rambling about other designs I've been pondering. First let me talk a little about ZODB. The most important feature of ZODB is the object cache. Without the object cache, ZODB would be impractically slow, since ZODB applications would have to rebuild large parts of the object system for each transaction. But with an object cache, database connectivity overhead falls dramatically as cache utilization improves. Thus for many kinds of applications, ZODB is faster than any relational database.
The catch is this: caching is hard. Having studied ZODB for a few years now, I've come to believe that if ZODB and ZEO had no caches at any level, ZODB would be utterly simple. It wouldn't have to worry about stale objects. Uncommitted objects and transactional consistency would be simpler problems to solve. Ape wouldn't need to scan for changes or maintain its own special caches.
So, pie-in-the-sky, can we remove the caches in ZODB without killing speed, perhaps at the cost of some less important features? The main benefit of caching in ZODB is that it reduces the number of times we have to unpickle objects. Unpickling is fast, but not fast enough to do constantly.
So, time for a crazy idea: establish a one-to-one match between the in-memory representation of a database object and its serialized form. To achieve this, objects destined for the database must be laid out in memory less like Python objects and more like C structs. Database objects must not contain absolute pointers (relative pointers are OK). Instead of pickling, just memcpy() to and from the database, which may be a memory-mapped file. Reconstruct the object system as you please, because the kernel's VM subsystem is all the cache you need.
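To make the layout idea a bit more concrete, here is a minimal sketch in plain Python rather than C. It assumes a made-up fixed record layout (two 16-byte text fields plus a signed relative offset), and an anonymous mmap stands in for a memory-mapped database file. The names RECORD, write_record and read_record are hypothetical and exist only for illustration.

```python
import mmap
import struct

# Hypothetical fixed layout: two 16-byte text fields plus a signed 64-bit
# relative offset to another record (0 means "no reference").
RECORD = struct.Struct("<16s16sq")

def write_record(buf, pos, firstname, lastname, boss_pos=None):
    """Copy the field values straight into the buffer at byte offset pos."""
    # Store the distance to the boss record, not its absolute position.
    rel = 0 if boss_pos is None else boss_pos - pos
    # struct pads short values with NUL bytes and truncates long ones.
    RECORD.pack_into(buf, pos, firstname.encode("utf-8"),
                     lastname.encode("utf-8"), rel)

def read_record(buf, pos):
    """Return (firstname, lastname, boss_pos), where boss_pos may be None."""
    first, last, rel = RECORD.unpack_from(buf, pos)
    boss_pos = None if rel == 0 else pos + rel
    return (first.rstrip(b"\0").decode("utf-8"),
            last.rstrip(b"\0").decode("utf-8"),
            boss_pos)

# An anonymous mapping stands in for a memory-mapped database file.
store = mmap.mmap(-1, 4096)
write_record(store, 0, "Ada", "Lovelace")
write_record(store, RECORD.size, "Charles", "Babbage", boss_pos=0)

# Copy the raw bytes to a different offset in another buffer. The
# reference still resolves because it is relative to the record itself.
moved = bytearray(4096)
moved[1024:1024 + 2 * RECORD.size] = store[0:2 * RECORD.size]
```

Reading the second record from either buffer, with read_record(store, RECORD.size) or read_record(moved, 1024 + RECORD.size), finds the boss record at the right place without any fix-up step, which is the property that makes a plain byte copy sufficient.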
I think such a thing is actually quite doable using a metaclass implemented in C. I can't think of a good name for the base class, so let's just call it "Entity" for now. Here's some sample code:
    class Employee(Entity):

        def __init__(self, firstname, lastname, boss):
            self.firstname = firstname
            self.lastname = lastname
            self.boss = boss

        def name(self):
            return '%s, %s' % (self.lastname, self.firstname)
Looks perfectly normal, doesn't it? Under the covers, however, the metaclass of Entity creates a representation of a Python class instance without any absolute pointers. When the __init__ method assigns the firstname and lastname attributes, it copies the values into the employee object rather than creating a reference. However, when it assigns the boss attribute, where boss is an Entity, an entity reference is stored rather than a copy. If you try to store an attribute with a value that can't be converted to the special in-memory form, an exception is raised. When the name method reads the attributes, they are transparently converted back to the standard Python representation (most likely a str or a unicode).
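The attribute behavior described above can be approximated in pure Python, which may help in judging whether the semantics feel right before anyone writes C. This is only a rough sketch under several assumptions: it uses __setattr__ and __getattr__ instead of a C metaclass, it keeps fields in a dict of byte strings rather than a pointer-free memory block, and the _table list standing in for entity references is my own invention.

```python
import struct

class Entity:
    """Pure-Python stand-in for the proposed C base class (hypothetical)."""

    _table = []  # assumed entity table; an index into it acts as the reference

    def __new__(cls, *args, **kwargs):
        self = super().__new__(cls)
        # Bypass our own __setattr__ for the two bookkeeping slots.
        object.__setattr__(self, "_fields", {})
        object.__setattr__(self, "_index", len(Entity._table))
        Entity._table.append(self)
        return self

    def __setattr__(self, name, value):
        if isinstance(value, Entity):
            # Entities are stored as a reference, never as a copy.
            stored = ("ref", struct.pack("<q", value._index))
        elif value is None:
            # Assumption: -1 encodes "no entity".
            stored = ("ref", struct.pack("<q", -1))
        elif isinstance(value, str):
            # Strings are copied into the object as raw bytes.
            stored = ("str", value.encode("utf-8"))
        else:
            raise TypeError("cannot store %s in an Entity" % type(value).__name__)
        self._fields[name] = stored

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for stored fields.
        try:
            kind, raw = self.__dict__["_fields"][name]
        except KeyError:
            raise AttributeError(name) from None
        if kind == "ref":
            (index,) = struct.unpack("<q", raw)
            return None if index < 0 else Entity._table[index]
        # Convert back to the standard Python representation on read.
        return raw.decode("utf-8")

class Employee(Entity):

    def __init__(self, firstname, lastname, boss):
        self.firstname = firstname
        self.lastname = lastname
        self.boss = boss

    def name(self):
        return '%s, %s' % (self.lastname, self.firstname)

boss = Employee("Grace", "Hopper", None)
worker = Employee("Alan", "Turing", boss)
```

With this in place, worker.name() reads the copied strings back as ordinary str values, worker.boss resolves the stored reference to the same boss object, and something like worker.skills = ["math"] raises TypeError because a list has no in-memory form in this sketch.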
Now, here's a more earthy idea. Create this Entity stuff, but instead of overhauling ZODB, use the Entity class to share objects between processes, as POSH does. I believe this could address the stability problems that POSH has. Only Entity objects would be shareable.
Well, if you've read this far, you're pretty dedicated. What do you think? Has something like the Entity base class been created before? Its most important feature is that no (de)serialization is necessary.
