|
|
/ Hathaway Weblog / PGStorage |
I'm happy to announce PGStorage, a new ZODB backend that stores pickles in a PostgreSQL database. It supports undo and packing.
The intent of this storage is to be a good ZODB pickle store, and PostgreSQL has many features that simplify storage. Taking advantage of all the capabilities of a relational database are not the intent of this storage. The idea of a PostgreSQL storage has been attempted before, but at the time, PostgreSQL did not have the TOAST feature. TOAST simplifies and optimizes the task of storing binary objects in the database.
Multiple Zope instances can connect to the database, so ZEO is not needed when using this storage. ZODB caches are invalidated by polling the database at transaction boundaries. Is that an efficient choice? Maybe. If it's not, maybe the PostgreSQL "listen" and "notify" statements can help.
I think PGStorage has a chance of beating ZEO in scaleability, since PostgreSQL's networking layer is likely to be more efficient that ZEO. PGStorage has a fraction of the number of lines of code of FileStorage and ZEO, but it's certainly not as well tested yet.
I did end up fighting with ZODB somewhat. ZODB normally assumes that MVCC is implemented by the Connection object, but this storage prefers to depend on the MVCC capabilities already in PostgreSQL, so I had to override some methods in Connection. I didn't fight with ZODB in this package nearly as much as I did in Ape, though.
Try it out and see if it's any good. After the first round of testing, I plan to take advantage of the two-phase commit support in PostgreSQL 8.1.
Comments
PGStorage sounds very good to me,
PostgreSQL is quite extensible enough to provide alternative access modes, such as query based object fetches, and unpickling / repickling en route. Add a bit of PostgreSQL driven version control, and about 90% of the objections to ZODB are history. Talk about a no-brainer when you're working on a system with multi-version concurrency built-in. Finally, a really great way for vacuum to do double duty. Suck out the old stuff, zip it up, and dump it into an indexed archive. Call it all back on demand!
I'm a bit tired of reading about "best practises" that shun ZODB, and the ZMI. While Rich-GUI web applications are popularizing TTW (through the web) object access, the Zope world is stolidly heading in the opposite direction. All because of implementation problems that won't be resolved. "Don't it always seem to go that you don't know what you've got 'til it's gone..."
Thanks for all the way cool stuff! Jerry S.
will it really be possible to connect more than one zope instance to a pgstorage? -> no need to use zeo at all?
keep up the cool stuff, looking forward for a test of pgstorage soon...
regards, juergen
Hey Shane. Just skimmed the release for a few minutes. Wow, that code is short and sweet! :-) Nice to be able to rely on someone else's (popular) work.
Have you done any performance tests? I imagine the most interesting results would be in a grid, given the advantages you list: compare FileStorage/PGStorage(/DirectoryStorage) both as standalone (no ZEO/shared storage) and shared.
I'm guessing the TOAST capabilities would fit in fine with the ZODB blob stuff too--I assume you would see just storing blobs in Postgres as well?
Good ideas, Gary. I need to do some performance tests.
is it just for zope 2.x or does it work with zope 3.x?
I believe it will work equally well with Zope 3.x, although I've only tested it with Zope 2.
zope 3 conf is different to zope2:
quick try: zope.conf in zope 3...
%import ZODB.PGStorage <zodb> <pgstorage> params host=localhost dbname=zodb user=xxxx password=xxxxx </pgstorage> </zodb> .....
gives an error on startup: AttributeError: 'Application' object has no attribute 'getSiteManager'
any hints, how to configure PGStorage for zope3?
Thank you!
For filesystem storage, would you recommend ape or (modify) pgstorage?
-thanks
Is there a bug in the pgstorage.py:
should this be: from persistent.TimeStamp import TimeStamp
like this: from ZODB.TimeStamp import TimeStamp
I just wonder, because atleast at my workstation it will not find persistent.TimeStamp, but when modified as to ZODB.TimeStamp it works.. :O ..or am I making a fool of myself?
Juhapekka: Zope 2.7 puts the TimeStamp module in the ZODB package, but PGStorage requires Zope 2.8.
Larry: There still isn't a great answer for filesystem storage. This site (hathawaymix.org) is probably the only site in the world that uses Ape for filesystem storage. I like it a lot, but I can see why others wouldn't. Its complexity is nasty.
Guest: I still need to get around to testing PGStorage with Zope 3. It's on my mind.
This should be interesting for use with Nuxeo CPS; I attempted to use Ape for this purpose, but alas, Ape doesn't work with Zope 2.9.x... at least the Ape distribution didn't and I'm too lazy to download from SVN... :) I'll post an update if/when I get it working.
Nuxeo CPS can be found here: http://www.nuxeo.com http://www.cps-project.org
Cheers, -Josh
Ok, this works with Nuxeo (using Zope-2.9.2 and Python-2.4.3 if I remember correctly)... the next step is to do performance testing and tweaking! Great product (so far, we'll see if anything breaks :) ).
Anyone try this out on Plone 2.1.2+?
Taking it one step further: is it possible to de-pickle the object before insertion in the database? Bear with me as I'm new to Python and Zope, but let's use this example: suppose I have an Object that is 'contact info'. I might want to search on Last Name or Company Name. Normally I can control that by adding an index on that field in the DB. Is such a thing possible with PGStorage or another Zope<->Postgres adapter?
just installed your PGStorage on Mandriva 2006 + python 2.4.3 + Zope 2.9.2 + Plone 2.1.2
it seems quite nice, maybe it's faster than ZODB
i think ZODB is bad when we need very large data. but with PostgreSQL there is nothing impossible
good job shane
I wrote trivial application that stores in database a filesystem hierarchy (files and directories): in SQL (postgres) or in ZODB (FileStorage in single process, or ZEO + FileStorage in concurrent processes). Performance test is:
- Import real directory into DB.
- Open each file in random order and edit its contents.
- List all files.
- Pack DB to remove garbage, or VACUUM.
I found that in single process application ZODB is better performance, but with concurrent processes SQL is faster.
And I confused that in this test (single process) PGStorage is much slower then FileStorage: 2-3 times on Import and List operations, 7 times on Edit and 130 times on Pack!
Great Software! I just installed it on macos x and was able to upload a 600MB file. But when i tried to upload a directory hierarchy via webdav I encountered not catched deadlocks.
Traceback:
Traceback (innermost last): Module ZPublisher.Publish, line 121, in publish Module Zope2.App.startup, line 240, in commit Module transaction._manager, line 96, in commit Module transaction._transaction, line 380, in commit Module transaction._transaction, line 378, in commit Module transaction._transaction, line 436, in _commitResources Module ZODB.Connection, line 665, in tpc_vote Module ZODB.BaseStorage, line 252, in tpc_vote Module ZODB.PGStorage.pgstorage, line 373, in _vote Module ZODB.PGStorage.pgstorage, line 42, in run_script ProgrammingError: deadlock detected DETAIL: Process 3227 waits for ShareLock on transaction 9122; blocked by process 3229. Process 3229 waits for ShareRowExclusiveLock on relation 16390 of database 16387; blocked by process 3227.
Regards, Bernd
Zope3 Try
With zope 3.3 it does not work correctly. The problem mentioned in comment from guest (February 22, 2006 03:36) is because he did use a db which he used with zope2 before. With a fresh db one can startup zope3 but the undolog does not work and the zmi becomes unusable, seems that history information is not written corretly.
Traceback snippet:
File "/Users/bd/zsandbox/trunk/src/zope/app/undo/browser.py", line 33, in principalLastTransactionIsUndo
last=1)
File "/Users/bd/zsandbox/trunk/src/zope/app/undo/__init__.py", line 107, in getPrincipalTransactions
return self._getUndoInfo(context, principal, first, last)
File "/Users/bd/zsandbox/trunk/src/zope/app/undo/__init__.py", line 128, in _getUndoInfo
entries = self.__db.undoInfo(first, last, specification)
File "/Users/bd/zsandbox/trunk/src/ZODB/UndoLogCompatible.py", line 37, in undoInfo
return self.undoLog(first, last, filter)
File "/Users/bd/zsandbox/trunk/src/ZODB/PGStorage/pgstorage.py", line 479, in undoLog
d.update(cPickle.loads(ext))
EOFError
cursor_dealloc: deleted cursor object at 0x44a8ab0, refcnt = 0
chunk_dealloc: deallocating memory at 0x42c6870, size 127
connection_dealloc: deleted connection object at 0x4454650, refcnt = 0
Bernd: The Zope 3 undo issue is fixed in CVS. The locking problem appears to be the same as an issue that has also been fixed in CVS. I need to release.
const: PGStorage doesn't compete with raw FileStorage in speed, since raw FileStorage gets a lot of speed gains by being limited to a single process. However, PGStorage competes favorably with the combination of ZEO and FileStorage. There is a speed test (in CVS again...) that compares the two. However, the speed test does not test the speed of packing, which is probably not yet optimal.
Const / Shane,
When you say 'concurrent processes': Do you mean multiple Zope sites hitting the same ZODB, or do you mean a high-volume single Zope site hitting a ZODB (with the understanding that Zope might create additional processes as more users hit the site)? I'm planning a high-volume site based on Nuxeo and I'm trying to determine if I should use ZODB or PGStorage...
Cheers, -Josh
Ok, here's another theoretical scenario. Suppose I have a site (like Plone), the storage of which is based in PGStorage, hosted on a Linux/X86 box.
Now, suppose I compile Python and Zope under Solaris on a Niagara box. I don't know offhand if the endian-ness of Niagara is different than X86, but I suspect that like Sparc it is.
Can I just do a pgdump on the Linux box, import that data into Postgres on the Niagara box, start zope, and have the site work as expected? Or does the data stored by PGStorage depend on endian-ness?
Cheers, -Josh
Hi Shane, is the cvs public available somewhere? Would be nice to have this stuff on svn.zope.org
CVS is on Sourceforge, project name "pgstorage". I'm slowly building the project.
