Hathaway Weblog / Aching to Unveil Bit Mountain
Bit Mountain is a digital archiving system I've been building. I've mentioned it before. I'm working to make it scale from a few small files on a few hard drives to billions of files (multiple petabytes in total) on thousands of storage media. I believe I've come to understand the storage problems that occur at that scale and have coded many of the solutions in Bit Mountain.
I recently presented a paper about Bit Mountain. You can download the slides from the Family History Technology Workshop 2006 archive. I'll also post the paper I presented if anyone requests it.
I believe I just found the solution to a nagging problem in Bit Mountain. Bit Mountain currently creates 3-4 relational database rows for every file in the archive. This is no problem if there are millions of files, but it's an ugly problem if there are billions of files. Databases typically don't support billions of rows without big hardware and deep magic tuning.
The obvious solution is a distributed hash table, but that would require a complete rewrite of the software, and a DHT does not take advantage of natural file groupings. Many of the images represent pages of books, so it would make sense to keep book pages together in the archive. Keeping pages together means that when you flip pages in a digital book, loading a page is likely to transparently pre-fetch information about the other pages, and turning pages will be quick.
What I decided to try is uploading bundles of files rather than individual files. The database will record only 3-4 rows for each bundle. To read an individual file, clients will fetch it from the bundle that contains it. Bundles are encoded as multipart MIME messages and can span gigabytes. For each bundle, there is a metadata file that holds an index of the files it contains. If bundles contain an average of 1000 files, then the billion-file problem suddenly becomes a mere million-file problem and life is easy again. Yesterday and today, I worked out the code for uploading bundles. Now I need to work on the download portion. I'm optimistic that it will perform well.
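As a rough illustration of the bundle idea, here is a minimal Python sketch. It is not Bit Mountain's code: the names `write_bundle`, `read_file`, and `database_rows` are hypothetical, the index is stored as JSON purely for convenience, and the payloads are simply concatenated instead of being encoded as a multipart MIME message. It only shows how a per-bundle index lets a client pull one file out of a large bundle, and how the row count shrinks.

```python
import io
import json


def write_bundle(files):
    """Pack {name: bytes} into one bundle blob plus a JSON index.

    Assumption: payloads are concatenated back to back. The real bundles
    are multipart MIME messages; this layout is only a stand-in.
    """
    bundle = io.BytesIO()
    index = {}
    for name, data in files.items():
        # Record where each file starts and how long it is.
        index[name] = {"offset": bundle.tell(), "length": len(data)}
        bundle.write(data)
    return bundle.getvalue(), json.dumps(index)


def read_file(bundle_stream, index_json, name):
    """Fetch a single file by seeking to its recorded offset."""
    entry = json.loads(index_json)[name]
    bundle_stream.seek(entry["offset"])
    return bundle_stream.read(entry["length"])


def database_rows(total_files, files_per_bundle, rows_per_bundle=4):
    """Rows needed if the database tracks bundles instead of files."""
    bundles = -(-total_files // files_per_bundle)  # ceiling division
    return bundles * rows_per_bundle


# Two pages of the same book travel in the same bundle.
pages = {"page-001.tif": b"first page", "page-002.tif": b"second page"}
blob, index_json = write_bundle(pages)
page_two = read_file(io.BytesIO(blob), index_json, "page-002.tif")

rows_per_file = database_rows(1_000_000_000, 1)       # one "bundle" per file
rows_bundled = database_rows(1_000_000_000, 1000)     # 1000 files per bundle
```

Because a client that opens a book has already loaded the bundle's index, it knows where every other page in that bundle lives, which is the pre-fetch effect described earlier.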
Everyone I talk to at work is in favor of releasing Bit Mountain as open source software, but I can't do it without official approval. The idea of releasing it keeps stalling for some reason. I ache to release it. I want feedback, I want people to try it out, I want people to improve upon it. I want people to test my theory that Bit Mountain can provide higher reliability and better storage utilization than what is achievable through replication. I want as many people as possible to run it, harden it, and prove that it's reliable.
Comments
Hi Shane,
I'm very interested in your paper. I'd also love to see this as open source software. Can you describe the key points in how Bit Mountain differs from MogileFS?
Thanks,
-- Dan
I just posted the paper in the writings section. Some important ways that Bit Mountain differs from MogileFS:
- Bit Mountain allows you to use either forward error correction or replication, while MogileFS is limited to replication. Forward error correction is theoretically more reliable and incurs less storage overhead.
- Bit Mountain controls the distribution of files better. With MogileFS, the loss of any two or three drives could mean lost data. By controlling file distribution, Bit Mountain can survive much more hardware loss than MogileFS can.
- Bit Mountain is more automated; it initiates recovery automatically.
On the other hand, MogileFS has been tested in production.
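To make the overhead comparison between forward error correction and replication concrete, here is a small Python sketch. It uses a single XOR parity block, which is the simplest possible erasure code and only an assumption for illustration; the comment above does not say which scheme Bit Mountain uses, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored, missing_index):
    """Rebuild the block at missing_index from all the surviving blocks."""
    survivors = [b for i, b in enumerate(stored) if i != missing_index]
    return xor_blocks(survivors)


data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
stored = encode(data)          # five blocks, each on a different drive
rebuilt = recover(stored, 2)   # pretend the drive holding block 2 died

# Extra storage as a fraction of the original data.
parity_overhead = (len(stored) - len(data)) / len(data)  # one extra block per four
replication_overhead = 1.0                               # one full extra copy
```

Both approaches here survive the loss of any single drive, but the parity scheme stores a quarter as much extra data as a second full copy. Codes that tolerate several simultaneous losses exist as well; they are beyond this sketch.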
