Hathaway Weblog: Compete Filesystem
Mamading Ceesay pointed me to the Compete filesystem, presented at PyCon 2005.
http://www.python.org/pycon/2005/papers/46/CompeteFileSystem.pdf
I thought it would be useful to compare the notes on Compete with MogileFS and Bit Mountain.
Platforms: Compete uses NFS, Python, and MySQL. MogileFS uses HTTP or NFS, Perl, and MySQL. Bit Mountain uses HTTP, Python, and PostgreSQL. I chose HTTP over NFS because NFS doesn't behave well when a node fails. Also, NFS uses a heavy abstraction layer in the kernel that I just don't need. I chose Python over Perl because Python is more likely to be maintainable, and it's what I know. I chose PostgreSQL over MySQL because PostgreSQL has a better reputation for reliability.
Safety: The docs I've seen on Compete don't mention replication, so I wonder what happens when a storage node disappears or loses sanity. MogileFS replicates automatically, and the number of replicas to maintain is configurable per file. Bit Mountain extends the MogileFS replication functionality in two ways: you can configure the required isolation level (isolated storage devices, hosts, or racks), and you can use forward error correction in place of (or in conjunction with) replication. Note that RAID 5 is a form of forward error correction, so applying FEC to data storage is a proven idea.
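Bit Mountain's FEC scheme isn't described here, so the following is only a minimal sketch of the simplest erasure code, the single XOR parity block that RAID 5 uses: split a file into k blocks, compute one parity block, and store all k + 1 on separate devices, hosts, or racks. Any one lost block can then be rebuilt from the survivors. The function names are hypothetical.

```python
from __future__ import annotations


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)


def split_with_parity(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal blocks (zero-padded) plus one XOR parity block.

    Each of the k + 1 blocks would be placed on a different device, host,
    or rack. The original length must be stored separately to strip padding.
    """
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def recover(blocks: list[bytes | None]) -> list[bytes]:
    """Rebuild at most one missing block (marked None) from the survivors."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity can rebuild only one lost block")
    rebuilt = list(blocks)
    if missing:
        # The XOR of all surviving blocks (data + parity) equals the lost block.
        rebuilt[missing[0]] = xor_blocks([b for b in blocks if b is not None])
    return rebuilt
```

For example, take `split_with_parity(b"hello, bit mountain", 4)`, replace any one of the five blocks with `None`, and `recover` returns the original blocks. Single parity survives only one loss; codes such as Reed-Solomon generalize this to several.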
Access: Compete has a data access layer that's easy for scripts to use. MogileFS also has a script interface and a client library. Bit Mountain has a client library; its script interface isn't complete yet, but it does have an HTTP proxy that simply implements GET, PUT, and DELETE on a namespace spanning the entire storage.
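To make the proxy idea concrete, here is a minimal sketch of a WSGI app that serves GET, PUT, and DELETE on one flat namespace, where the URL path is the file key. It is not Bit Mountain's proxy: `make_proxy_app` is a hypothetical name, and a plain dict stands in for the storage layer the real proxy would call through its client library.

```python
from __future__ import annotations

from typing import Callable, Iterable

# Hypothetical sketch: a dict stands in for the storage layer behind the proxy.


def make_proxy_app(store: dict[str, bytes]) -> Callable:
    """Return a WSGI app that serves GET/PUT/DELETE on one flat namespace."""

    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        key = environ.get("PATH_INFO", "/")  # the URL path is the file key
        method = environ["REQUEST_METHOD"]

        if method == "GET":
            if key not in store:
                start_response("404 Not Found", [("Content-Type", "text/plain")])
                return [b"not found\n"]
            body = store[key]
            start_response("200 OK", [
                ("Content-Type", "application/octet-stream"),
                ("Content-Length", str(len(body))),
            ])
            return [body]

        if method == "PUT":
            length = int(environ.get("CONTENT_LENGTH") or 0)
            existed = key in store
            # The whole request body replaces the old value at once.
            store[key] = environ["wsgi.input"].read(length)
            start_response("204 No Content" if existed else "201 Created", [])
            return [b""]

        if method == "DELETE":
            if store.pop(key, None) is None:
                start_response("404 Not Found", [("Content-Type", "text/plain")])
                return [b"not found\n"]
            start_response("204 No Content", [])
            return [b""]

        start_response("405 Method Not Allowed", [("Allow", "GET, PUT, DELETE")])
        return [b""]

    return app
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, make_proxy_app({})).serve_forever()` serves it on port 8000, after which `curl -T file http://localhost:8000/some/key` stores a file.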
Speed: I don't know anything about the speed of Compete. On a test system with dual 2.4 GHz Xeons, MogileFS was able to read about 600 small files per second with one database server. (Bytes per second is irrelevant, since that figure is limited by the network, not the software.) When I translated MogileFS to Python + MySQL, I found some easy optimizations and hit 1200 files per second. Bit Mountain is currently sitting at around 500 reads per second on the same hardware, but I bet I can improve that. I don't remember how quickly MogileFS can write files, but Bit Mountain writes about 20 files per second with fsync enabled and 100 per second with fsync disabled.
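Most of the gap between the fsync-on and fsync-off write rates is time spent waiting for the disk to confirm each write. A rough way to see that gap on your own hardware is a loop like the one below. It is a hypothetical micro-benchmark that measures only local file writes, not the database updates and network round trips in a real write path, so its numbers won't match the figures above.

```python
import os
import tempfile
import time


def files_per_second(n_files: int = 200, size: int = 4096, use_fsync: bool = True) -> float:
    """Write n_files small files into a temp directory and return files/second.

    Measures only local file writes; a real storage node also pays for
    database updates and network round trips.
    """
    payload = os.urandom(size)
    with tempfile.TemporaryDirectory() as tmp:
        start = time.perf_counter()
        for i in range(n_files):
            with open(os.path.join(tmp, f"f{i:06d}"), "wb") as f:
                f.write(payload)
                if use_fsync:
                    f.flush()             # push Python's buffer to the OS
                    os.fsync(f.fileno())  # wait until the OS reports it on disk
        elapsed = time.perf_counter() - start
    return n_files / max(elapsed, 1e-9)
```

Comparing `files_per_second(use_fsync=True)` with `files_per_second(use_fsync=False)` shows the cost; on a RAM-backed temp directory the two may be nearly equal.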
License: I don't know whether Compete is available. MogileFS is freely available, but the license terms aren't set in stone (probably GPL, since it depends on MySQL). Bit Mountain isn't available, but I hope it will be in the future.
Compete seems to allow files to be modified even after they have been stored. This is probably tied to the choice to use NFS and to not implement replication. Replication is obviously possible using RAID, or even DRBD to distribute RAID across nodes, but that's heavier than the MogileFS / Bit Mountain approach, which automates replication at the application layer. I prefer to treat the files as atoms in a transactional system, where the whole file changes all at once.
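On a single storage node, one common way to get that all-at-once behavior is write-to-temp-then-rename: readers see either the old file or the new one, never a half-written mix. This is a generic sketch, not Bit Mountain's code, and `atomic_write` is a hypothetical name. It assumes the temporary file and the target live on the same filesystem, which `os.replace` needs in order to rename atomically.

```python
import contextlib
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at path with data so readers never see a partial file.

    Assumes the temp file and path share a filesystem, which os.replace
    needs for an atomic rename.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # data is durable before it becomes visible
        os.replace(tmp_path, path)  # readers see old or new contents, never a mix
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
    if hasattr(os, "O_DIRECTORY"):
        # On POSIX, fsync the directory so the rename itself survives a crash.
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```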
Compete might be ahead on database replication: all database changes simply go to multiple databases. MogileFS and Bit Mountain currently expect replication at the database level. I need to learn more about both approaches before I can choose between them.
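The notes don't say how Compete sends its changes to multiple databases. As a minimal sketch of the general idea (application-level fan-out), the class below applies every write to each database and reads from the first one. `FanOutWriter` is hypothetical, and in-memory sqlite3 databases stand in for MySQL.

```python
import sqlite3


class FanOutWriter:
    """Hypothetical sketch: apply every change to each database in turn."""

    def __init__(self, connections: list[sqlite3.Connection]) -> None:
        self.connections = list(connections)

    def execute(self, sql: str, params: tuple = ()) -> None:
        # No two-phase commit here: if a later database fails after earlier
        # ones committed, the copies diverge and must be repaired separately.
        for conn in self.connections:
            with conn:  # commits on success, rolls back this connection on error
                conn.execute(sql, params)

    def query(self, sql: str, params: tuple = ()) -> list[tuple]:
        # Reads go to the first database; any copy should give the same answer.
        return self.connections[0].execute(sql, params).fetchall()
```

Partial failure is the weak spot. If the third database rejects a write that the first two accepted, the copies no longer agree, which is one reason to let the database's own replication handle this instead.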
One thing I haven't seen addressed is how to store 1 billion files. Bit Mountain currently adds about 2 kilobytes of database storage space, including indexes, per file. If you're storing 1 billion files, that's a 2 TB database. A SAN will store that database easily, but then will the database be the bottleneck? The obvious thing to do is to distribute to multiple databases, each database handling a different hash region of the global namespace. Both MogileFS and Bit Mountain should be compatible with that strategy. The risk there is that if you ever have to redistribute the hashes, it will be a very time-consuming process.
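Both the size estimate and the redistribution risk are easy to check. The sketch below assumes the simplest scheme: hash the file key and take the result modulo the number of databases. Bit Mountain's actual key layout isn't described here, and the function names are hypothetical. With modulo placement, growing from 4 to 5 databases moves roughly 80% of the keys, because a key stays put only when its hash leaves the same remainder mod 4 and mod 5. That happens for 4 of every 20 hash values.

```python
import hashlib

BYTES_PER_FILE = 2_000               # ~2 KB of rows plus indexes per file (from the text)
N_FILES = 1_000_000_000
DB_BYTES = BYTES_PER_FILE * N_FILES  # 2 * 10**12 bytes, i.e. 2 TB


def shard_for(key: str, n_shards: int) -> int:
    """Map a file key to one of n_shards databases by hashing it (hypothetical)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


def moved_fraction(keys: list[str], old_n: int, new_n: int) -> float:
    """Fraction of keys whose database changes when old_n becomes new_n."""
    moved = sum(shard_for(k, old_n) != shard_for(k, new_n) for k in keys)
    return moved / len(keys)
```

Every moved key means copying its metadata rows to another database. At a billion files, that copying is where the time goes.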
BTW, I believe the storage industry's term for this type of software is "storage virtualization layer". That term is probably more accurate than "filesystem", since filesystem usually refers to the translation of blocks to files. Translating blocks to files is much harder than what MogileFS and Bit Mountain do. MogileFS and Bit Mountain just assemble a bunch of less-reliable files into a single, reliable namespace.
Another interesting piece of storage software is DIBS, the Distributed Internet Backup System. This one uses Reed-Solomon codes and encryption for backup. I think it's a fantastic idea, and if it gains a nice UI and runs faster, everyone will want it; it will become the default choice for backup.
Bit Mountain is less ambitious than DIBS, since Bit Mountain doesn't tackle any P2P aspects. OTOH, Bit Mountain uses a relational database and is meant to work without a user sitting over it.
I'd like to know if there is other similar software out there.
Comments
Regarding your comments on MogileFS maintainability and speed -- I think you'll find that Perl would be every bit as easy to optimise (switching to mysql will provide a significant part of that) and maintain as Python.
The only advantage in using python is that you already know it.
As somebody who has written enterprise level perl for a living for over 5 years (and have a BSc in Computer Systems & Networks), I am sick to the teeth of python zealots and their wild claims of improved speed and maintainability over perl.
Really, if you were an experienced software programmer you would know better than to think that a richer syntax makes for unmaintainable code, when what stops programs from scaling in complexity, vertically and horizontally, has very little indeed to do with the choice of language and far more with the design, planning and management of the project.
That's why perl is used in investment banks, online trading, e-commerce and mail management -- it scales, it solves problems and works well for team development.
When I see a good reason to use Python, I will, but as yet I haven't had a compelling requirement for anything it offers over perl and certainly couldn't sacrifice the availability of skilled developers, literature and the proven success of packages on CPAN.
Aaron, you're being defensive. I really made no suggestion that Python is better than Perl. I did say that I'm a better Python programmer than Perl programmer. I don't know whether that generalizes to other people. Also note that Bit Mountain imitates MogileFS, a Perl package, which I felt had the best design among the many open source storage projects available. Imitation is a high compliment.
Hey, would you be willing to make the Python library for MogileFS public (via GPL or something)? I'd love to see that.
