/ Hathaway Weblog / PGStorage

Shane :: Zope :: February 11, 2006 # PGStorage

I'm happy to announce PGStorage, a new ZODB backend that stores pickles in a PostgreSQL database. It supports undo and packing.

The intent of this storage is to be a good ZODB pickle store, and PostgreSQL has many features that simplify storage. Taking advantage of all the capabilities of a relational database are not the intent of this storage. The idea of a PostgreSQL storage has been attempted before, but at the time, PostgreSQL did not have the TOAST feature. TOAST simplifies and optimizes the task of storing binary objects in the database.

Multiple Zope instances can connect to the database, so ZEO is not needed when using this storage. ZODB caches are invalidated by polling the database at transaction boundaries. Is that an efficient choice? Maybe. If it's not, maybe the PostgreSQL "listen" and "notify" statements can help.

I think PGStorage has a chance of beating ZEO in scaleability, since PostgreSQL's networking layer is likely to be more efficient that ZEO. PGStorage has a fraction of the number of lines of code of FileStorage and ZEO, but it's certainly not as well tested yet.

I did end up fighting with ZODB somewhat. ZODB normally assumes that MVCC is implemented by the Connection object, but this storage prefers to depend on the MVCC capabilities already in PostgreSQL, so I had to override some methods in Connection. I didn't fight with ZODB in this package nearly as much as I did in Ape, though.

Try it out and see if it's any good. After the first round of testing, I plan to take advantage of the two-phase commit support in PostgreSQL 8.1.

Comments

Jerry Spicklemire (February 13, 2006 16:09)

PGStorage sounds very good to me,

PostgreSQL is quite extensible enough to provide alternative access modes, such as query based object fetches, and unpickling / repickling en route. Add a bit of PostgreSQL driven version control, and about 90% of the objections to ZODB are history. Talk about a no-brainer when you're working on a system with multi-version concurrency built-in. Finally, a really great way for vacuum to do double duty. Suck out the old stuff, zip it up, and dump it into an indexed archive. Call it all back on demand!

I'm a bit tired of reading about "best practises" that shun ZODB, and the ZMI. While Rich-GUI web applications are popularizing TTW (through the web) object access, the Zope world is stolidly heading in the opposite direction. All because of implementation problems that won't be resolved. "Don't it always seem to go that you don't know what you've got 'til it's gone..."

Thanks for all the way cool stuff! Jerry S.

Juergen Herrmann (February 15, 2006 06:47)

will it really be possible to connect more than one zope instance to a pgstorage? -> no need to use zeo at all?

keep up the cool stuff, looking forward for a test of pgstorage soon...

regards, juergen

Gary Poster (February 15, 2006 09:47)

Hey Shane. Just skimmed the release for a few minutes. Wow, that code is short and sweet! :-) Nice to be able to rely on someone else's (popular) work.

Have you done any performance tests? I imagine the most interesting results would be in a grid, given the advantages you list: compare FileStorage/PGStorage(/DirectoryStorage) both as standalone (no ZEO/shared storage) and shared.

I'm guessing the TOAST capabilities would fit in fine with the ZODB blob stuff too--I assume you would see just storing blobs in Postgres as well?

Shane Hathaway (February 17, 2006 00:36)

Good ideas, Gary. I need to do some performance tests.

guest (February 21, 2006 14:05)

is it just for zope 2.x or does it work with zope 3.x?

Shane Hathaway (February 21, 2006 16:30)

I believe it will work equally well with Zope 3.x, although I've only tested it with Zope 2.

guest (February 22, 2006 03:36)

zope 3 conf is different to zope2:

quick try: zope.conf in zope 3...

%import ZODB.PGStorage <zodb> <pgstorage> params host=localhost dbname=zodb user=xxxx password=xxxxx </pgstorage> </zodb> .....

gives an error on startup: AttributeError: 'Application' object has no attribute 'getSiteManager'

any hints, how to configure PGStorage for zope3?

Thank you!

larry (March 04, 2006 11:40)

For filesystem storage, would you recommend ape or (modify) pgstorage?

-thanks

Juhapekka Piiroinen (April 03, 2006 05:40)

Is there a bug in the pgstorage.py:

should this be: from persistent.TimeStamp import TimeStamp

like this: from ZODB.TimeStamp import TimeStamp

I just wonder, because atleast at my workstation it will not find persistent.TimeStamp, but when modified as to ZODB.TimeStamp it works.. :O ..or am I making a fool of myself?

Shane Hathaway (April 04, 2006 01:22)

Juhapekka: Zope 2.7 puts the TimeStamp module in the ZODB package, but PGStorage requires Zope 2.8.

Larry: There still isn't a great answer for filesystem storage. This site (hathawaymix.org) is probably the only site in the world that uses Ape for filesystem storage. I like it a lot, but I can see why others wouldn't. Its complexity is nasty.

Guest: I still need to get around to testing PGStorage with Zope 3. It's on my mind.

josh (April 10, 2006 14:44)

This should be interesting for use with Nuxeo CPS; I attempted to use Ape for this purpose, but alas, Ape doesn't work with Zope 2.9.x... at least the Ape distribution didn't and I'm too lazy to download from SVN... :) I'll post an update if/when I get it working.

Nuxeo CPS can be found here: http://www.nuxeo.com http://www.cps-project.org

Cheers, -Josh

Josh (April 14, 2006 14:04)

Ok, this works with Nuxeo (using Zope-2.9.2 and Python-2.4.3 if I remember correctly)... the next step is to do performance testing and tweaking! Great product (so far, we'll see if anything breaks :) ).

Jesse (April 18, 2006 05:33)

Anyone try this out on Plone 2.1.2+?

josh (May 02, 2006 11:13)

Taking it one step further: is it possible to de-pickle the object before insertion in the database? Bear with me as I'm new to Python and Zope, but let's use this example: suppose I have an Object that is 'contact info'. I might want to search on Last Name or Company Name. Normally I can control that by adding an index on that field in the DB. Is such a thing possible with PGStorage or another Zope<->Postgres adapter?

hotstan (May 04, 2006 03:53)

just installed your PGStorage on Mandriva 2006 + python 2.4.3 + Zope 2.9.2 + Plone 2.1.2

it seems quite nice, maybe it's faster than ZODB

i think ZODB is bad when we need very large data. but with PostgreSQL there is nothing impossible

good job shane

const (July 13, 2006 00:58)

I wrote trivial application that stores in database a filesystem hierarchy (files and directories): in SQL (postgres) or in ZODB (FileStorage in single process, or ZEO + FileStorage in concurrent processes). Performance test is:

  • Import real directory into DB.
  • Open each file in random order and edit its contents.
  • List all files.
  • Pack DB to remove garbage, or VACUUM.

I found that in single process application ZODB is better performance, but with concurrent processes SQL is faster.

And I confused that in this test (single process) PGStorage is much slower then FileStorage: 2-3 times on Import and List operations, 7 times on Edit and 130 times on Pack!

Bernd Dorn (July 28, 2006 12:22)

Great Software! I just installed it on macos x and was able to upload a 600MB file. But when i tried to upload a directory hierarchy via webdav I encountered not catched deadlocks.

Traceback:

Traceback (innermost last):
  Module ZPublisher.Publish, line 121, in publish
  Module Zope2.App.startup, line 240, in commit
  Module transaction._manager, line 96, in commit
  Module transaction._transaction, line 380, in commit
  Module transaction._transaction, line 378, in commit
  Module transaction._transaction, line 436, in _commitResources
  Module ZODB.Connection, line 665, in tpc_vote
  Module ZODB.BaseStorage, line 252, in tpc_vote
  Module ZODB.PGStorage.pgstorage, line 373, in _vote
  Module ZODB.PGStorage.pgstorage, line 42, in run_script
ProgrammingError: deadlock detected
DETAIL:  Process 3227 waits for ShareLock on transaction 9122; blocked by process 3229.
Process 3229 waits for ShareRowExclusiveLock on relation 16390 of database 16387; blocked by   process 3227.

Regards, Bernd

Bernd Dorn (July 29, 2006 03:24)

Zope3 Try

With zope 3.3 it does not work correctly. The problem mentioned in comment from guest (February 22, 2006 03:36) is because he did use a db which he used with zope2 before. With a fresh db one can startup zope3 but the undolog does not work and the zmi becomes unusable, seems that history information is not written corretly.

Traceback snippet:

  File "/Users/bd/zsandbox/trunk/src/zope/app/undo/browser.py", line 33, in principalLastTransactionIsUndo
    last=1)
  File "/Users/bd/zsandbox/trunk/src/zope/app/undo/__init__.py", line 107, in getPrincipalTransactions
    return self._getUndoInfo(context, principal, first, last)
  File "/Users/bd/zsandbox/trunk/src/zope/app/undo/__init__.py", line 128, in _getUndoInfo
    entries = self.__db.undoInfo(first, last, specification)
  File "/Users/bd/zsandbox/trunk/src/ZODB/UndoLogCompatible.py", line 37, in undoInfo
    return self.undoLog(first, last, filter)
  File "/Users/bd/zsandbox/trunk/src/ZODB/PGStorage/pgstorage.py", line 479, in undoLog
    d.update(cPickle.loads(ext))
EOFError
cursor_dealloc: deleted cursor object at 0x44a8ab0, refcnt = 0
chunk_dealloc: deallocating memory at 0x42c6870, size 127
connection_dealloc: deleted connection object at 0x4454650, refcnt = 0

Shane Hathaway (August 01, 2006 10:00)

Bernd: The Zope 3 undo issue is fixed in CVS. The locking problem appears to be the same as an issue that has also been fixed in CVS. I need to release.

const: PGStorage doesn't compete with raw FileStorage in speed, since raw FileStorage gets a lot of speed gains by being limited to a single process. However, PGStorage competes favorably with the combination of ZEO and FileStorage. There is a speed test (in CVS again...) that compares the two. However, the speed test does not test the speed of packing, which is probably not yet optimal.

josh (August 04, 2006 10:55)

Const / Shane,

When you say 'concurrent processes': Do you mean multiple Zope sites hitting the same ZODB, or do you mean a high-volume single Zope site hitting a ZODB (with the understanding that Zope might create additional processes as more users hit the site)? I'm planning a high-volume site based on Nuxeo and I'm trying to determine if I should use ZODB or PGStorage...

Cheers, -Josh

josh (August 04, 2006 13:11)

Ok, here's another theoretical scenario. Suppose I have a site (like Plone), the storage of which is based in PGStorage, hosted on a Linux/X86 box.

Now, suppose I compile Python and Zope under Solaris on a Niagara box. I don't know offhand if the endian-ness of Niagara is different than X86, but I suspect that like Sparc it is.

Can I just do a pgdump on the Linux box, import that data into Postgres on the Niagara box, start zope, and have the site work as expected? Or does the data stored by PGStorage depend on endian-ness?

Cheers, -Josh

Bernd Dorn (August 05, 2006 06:01)

Hi Shane, is the cvs public available somewhere? Would be nice to have this stuff on svn.zope.org

Shane Hathaway (August 12, 2006 17:37)

CVS is on Sourceforge, project name "pgstorage". I'm slowly building the project.

No further comments may be added.

Click below to fill in the scripture reference.
Your browser is not able to display the scripture fill-in program. To see it, enable Javascript or use Mozilla 1.0 or better.
But the LORD said unto Samuel, Look not on his countenance, or on the height of his stature; because I have refused him: for [the LORD seeth] not as man seeth; for man looketh on the outward appearance, but the LORD looketh on the heart.

Church: lds scriptures provident games pearls kzion shiblon film chancellor gateway cumorah byutv happiness nephi
Zope: freezope org com zen labs newbies zettai warnes
Python: home pyzine daily icanprogram
Genealogy: cyndi
Weblogs: jeffrey paul jon joel another-shane guido barry jeremy windley chrism zac
News: quakes lwn dc weather deseret zeitgeist softwarelivre
Zaurus: software developer
Tech: tango spintronics thin
Semantic: aaron sean
Reference: css rdf html4 javascript geckodom iecss emacs phrases acronyms
Reverse: advogato slashdot
Misc: gimp-savvy directory soda jokes shouldexist pdphoto