
Shane :: Python :: October 17, 2005 # Compete Filesystem

Mamading Ceesay pointed me to the Compete filesystem, presented at PyCon 2005.

http://www.python.org/pycon/2005/papers/46/CompeteFileSystem.pdf

I thought it would be useful to compare the notes on Compete with MogileFS and Bit Mountain.

Platforms: Compete uses NFS, Python, and MySQL. MogileFS uses HTTP or NFS, Perl, and MySQL. Bit Mountain uses HTTP, Python, and PostgreSQL. I chose HTTP over NFS because NFS doesn't behave well when a node fails. Also, NFS uses a heavy abstraction layer in the kernel that I just don't need. I chose Python over Perl because Python is more likely to be maintainable, and it's what I know. I chose PostgreSQL over MySQL because PostgreSQL has a better reputation for reliability.

Safety: The docs I've seen on Compete don't mention replication, so I wonder what happens when a storage node disappears or loses sanity. MogileFS replicates automatically, and the number of replicas to maintain is configurable per file. Bit Mountain extends the MogileFS replication functionality in two ways: you can configure the required isolation level (isolated storage devices, hosts, or racks), and you can use forward error correction in place of (or in conjunction with) replication. Note that RAID 5 is a form of forward error correction, so applying FEC to data storage is a proven idea.
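To make the RAID 5 comparison concrete, here is a minimal sketch of single-parity forward error correction using XOR. It is an illustration only, not Bit Mountain's actual code; the function names and the sample blocks are hypothetical, and it assumes all blocks in a stripe have the same length and that at most one block is lost.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Compute one parity block for a stripe of equal-length data blocks."""
    return xor_blocks(data_blocks)


def recover_block(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Hypothetical stripe: three data blocks on three devices, parity on a fourth.
stripe = [b"node", b"rack", b"disk"]
parity = make_parity(stripe)

# Pretend the device holding stripe[1] disappeared.
rebuilt = recover_block([stripe[0], stripe[2]], parity)
```

Here `rebuilt` equals the lost block `b"rack"`. The stripe costs one extra block of storage instead of a full extra copy, which is the trade-off that makes FEC attractive next to plain replication. Tolerating more than one simultaneous loss needs a stronger code, such as Reed-Solomon.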

Access: Compete has a data access layer that's easy for scripts to use. MogileFS also has a script interface and client library. Bit Mountain has a client library, and while the script interface isn't complete, it has an HTTP proxy; the proxy simply implements GET/PUT/DELETE on a namespace that spans the entire storage.
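A proxy of the kind described above can be sketched with Python's standard `http.server` module. This is a toy under stated assumptions, not Bit Mountain's proxy: the `STORE` dictionary is a hypothetical in-memory stand-in for the whole storage cluster, and the handler and function names are invented for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

STORE = {}  # hypothetical stand-in for the cluster: path -> file contents


class NamespaceHandler(BaseHTTPRequestHandler):
    """Expose one flat namespace through GET, PUT, and DELETE."""

    def _reply(self, status, body=b""):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        body = STORE.get(self.path)
        if body is None:
            self._reply(404)
        else:
            self._reply(200, body)

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        STORE[self.path] = self.rfile.read(length)
        self._reply(201)

    def do_DELETE(self):
        if STORE.pop(self.path, None) is None:
            self._reply(404)
        else:
            self._reply(204)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), NamespaceHandler)
```

Calling `make_server(8080).serve_forever()` would start it on a port of your choosing. A real proxy would look up the file's location in the database and forward the request to a storage node instead of reading a dictionary.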

Speed: I don't know anything about the speed of Compete. On a test system with dual 2.4 GHz Xeons, MogileFS was able to read about 600 small files per second with one database server. (Bytes per second is irrelevant, since that figure is bound by the network, not the software.) When I translated MogileFS to Python + MySQL, I found some easy optimizations and hit 1200 files per second. Bit Mountain is currently sitting at around 500 reads per second on the same hardware, but I bet I can improve that. I don't remember how quickly MogileFS can write files, but Bit Mountain writes about 20 files per second with fsync enabled and 100 per second with fsync disabled.
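The gap between the two write rates comes from waiting for the disk. The sketch below shows what "fsync enabled" means in Python and gives a rough way to measure files per second. The helper names, the 1 KB payload, and the file naming are assumptions made for the example, and the numbers it produces depend entirely on the hardware it runs on.

```python
import os
import time


def store_file(path, data, durable=True):
    """Write data to path; with durable=True, wait until it reaches the disk."""
    with open(path, "wb") as f:
        f.write(data)
        if durable:
            f.flush()             # push Python's buffer to the operating system
            os.fsync(f.fileno())  # ask the OS to push its cache to the device
    return len(data)


def writes_per_second(directory, count=50, durable=True):
    """Time `count` small writes into an existing directory."""
    payload = b"x" * 1024  # hypothetical 1 KB "small file"
    start = time.perf_counter()
    for i in range(count):
        store_file(os.path.join(directory, "f%d" % i), payload, durable)
    return count / (time.perf_counter() - start)
```

Comparing `writes_per_second(d, durable=True)` with `writes_per_second(d, durable=False)` on the same directory shows how much of the write cost is the sync itself.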

License: I don't know whether Compete is available. MogileFS is freely available, but the license terms aren't set in stone (probably GPL, since it depends on MySQL). Bit Mountain isn't available, but I hope it will be in the future.

Compete seems to allow files to be modified even after they have been stored. This is probably tied to the choice to use NFS and to not implement replication. Replication is obviously possible using RAID, or even DRBD to distribute RAID across nodes, but that's heavier than the MogileFS / Bit Mountain approach, which automates replication at the application layer. I prefer to treat the files as atoms in a transactional system, where the whole file changes all at once.
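On a single node, the "whole file changes all at once" idea is commonly approximated by writing to a temporary file and renaming it over the old one. The sketch below shows that general technique; it is not taken from MogileFS or Bit Mountain, and the function name is hypothetical. It assumes the temporary file and the target live on the same filesystem, where `os.replace` is atomic on POSIX systems.

```python
import os
import tempfile


def replace_file_atomically(path, data):
    """Readers see either the old contents or the new, never a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make the new contents durable first
        os.replace(tmp_path, path)  # then swap the name in one step
    except BaseException:
        # Clean up the temporary file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

In a replicated system the same principle applies one level up: each replica is written in full under a new location, and the database row that names the current version is switched in a single transaction.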

Compete might be ahead on the database replication aspect. All database changes simply go to multiple databases. MogileFS and Bit Mountain currently expect database-level replication. I need to learn more about both before I can choose between them.
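The "all changes go to multiple databases" approach can be sketched as follows. How Compete actually does this is not described here, so everything in the example is an assumption: it uses SQLite from the standard library for the sake of a runnable demo, and the `write_to_all` helper is hypothetical.

```python
import sqlite3


def write_to_all(connections, sql, params=()):
    """Apply one change to every database; roll back everywhere on failure.

    This is not a two-phase commit: a crash between the commits below
    could still leave the databases out of step.
    """
    try:
        for conn in connections:
            conn.execute(sql, params)
    except sqlite3.Error:
        for conn in connections:
            conn.rollback()
        raise
    for conn in connections:
        conn.commit()
```

The appeal is simplicity, since the application needs no database-specific replication setup. The cost is that the application now owns the hard cases, such as a database that is down during a write and has to catch up later.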

One thing I haven't seen addressed is how to store 1 billion files. Bit Mountain currently adds about 2 kilobytes of database storage space, including indexes, per file. If you're storing 1 billion files, that's a 2 TB database. A SAN will store that database easily, but then will the database be the bottleneck? The obvious thing to do is to distribute to multiple databases, each database handling a different hash region of the global namespace. Both MogileFS and Bit Mountain should be compatible with that strategy. The risk there is that if you ever have to redistribute the hashes, it will be a very time-consuming process.
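The arithmetic and the hash-region idea above can be sketched in a few lines. The choice of SHA-1, the 32-bit hash position, and the function names are assumptions made for illustration; neither MogileFS nor Bit Mountain is known to work exactly this way.

```python
import hashlib

BYTES_PER_FILE = 2_000       # "about 2 kilobytes" of database space per file
FILE_COUNT = 1_000_000_000   # 1 billion files

total_bytes = BYTES_PER_FILE * FILE_COUNT  # 2 * 10**12 bytes, i.e. 2 TB


def region_for(key, region_count):
    """Map a file key to one of `region_count` equal slices of hash space."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    position = int.from_bytes(digest[:4], "big")  # 0 .. 2**32 - 1
    return (position * region_count) >> 32


def moved_fraction(keys, old_count, new_count):
    """Fraction of keys whose region changes when the region count changes."""
    moved = sum(region_for(k, old_count) != region_for(k, new_count) for k in keys)
    return moved / len(keys)
```

Each database would own one region number. The redistribution risk shows up in `moved_fraction`: with evenly spread hashes, going from 4 to 5 equal regions moves roughly half of all keys, which for a billion files means rewriting hundreds of millions of rows. Schemes such as consistent hashing exist to reduce that movement.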

BTW, I believe the storage industry's term for this type of software is "storage virtualization layer". That term is probably more accurate than "filesystem", since filesystem usually refers to the translation of blocks to files. Translating blocks to files is much harder than what MogileFS and Bit Mountain do. MogileFS and Bit Mountain just assemble a bunch of less-reliable files into a single, reliable namespace.

Another interesting piece of storage software is DIBS, the Distributed Internet Backup System. This one uses Reed-Solomon codes and encryption for backup. I think it's a fantastic idea, and if it gains a nice UI and runs faster, everyone will want it--it will become the default choice for backup.

Bit Mountain is less ambitious than DIBS, since Bit Mountain doesn't tackle any P2P aspects. OTOH, Bit Mountain uses a relational database and is meant to work without a user sitting over it.

I'd like to know if there is other similar software out there.

Comments

Aaron 'Teejay' Trevena (October 18, 2005 04:49)

Regarding your comments on MogileFS maintainability and speed -- I think you'll find that Perl would be every bit as easy to optimise (switching to mysql will provide a significant part of that) and maintain as Python.

The only advantage in using python is that you already know it.

As somebody who has written enterprise level perl for a living for over 5 years (and have a BSc in Computer Systems & Networks), I am sick to the teeth of python zealots and their wild claims of improved speed and maintainability over perl.

Really, if you were an experienced software programmer you would know better than to think that a richer syntax makes for unmaintainable code, when what stops programs from scaling in complexity, vertically and horizontally, has very little indeed to do with the choice of language and far more with the design, planning and management of the project.

That's why perl is used in investment banks, online trading, e-commerce and mail management -- it scales, it solves problems and works well for team development.

When I see a good reason to use Python, I will but as yet I haven't had a compelling requirement for anything it offers over perl and certainly couldn't sacrifice the availability of skilled developers, literature and the proven success of packages on CPAN.

Shane Hathaway (October 18, 2005 09:55)

Aaron, you're being defensive. I really made no suggestion that Python is better than Perl. I did say that I'm a better Python programmer than Perl programmer. I don't know whether that generalizes to other people. Also note that Bit Mountain imitates MogileFS, a Perl package, which I felt had the best design among the many open source storage projects available. Imitation is a high compliment.

Colin Grady (January 07, 2006 11:50)

Hey, would you be willing to make the Python library for MogileFS public (via GPL or something)? I'd love to see that.

