
Aching to Unveil Bit Mountain

Shane :: Free Software :: May 17, 2006

Bit Mountain is a digital archiving system I've been building; I've mentioned it before. I'm working to make it scale from a few small files on a few hard drives to billions of files (totaling multiple petabytes) on thousands of storage media. I believe I've come to understand the storage problems that arise at that scale, and I have coded solutions to many of them in Bit Mountain.

I recently presented a paper about Bit Mountain. You can download the slides from the Family History Technology Workshop 2006 archive. I'll also post the paper I presented if anyone requests it.

I believe I just found the solution to a nagging problem in Bit Mountain. Bit Mountain currently creates 3-4 relational database rows for every file in the archive. That's no problem with millions of files, but with billions of files it means billions of rows, and databases typically can't handle billions of rows without big hardware and deep tuning magic.

The obvious solution is a distributed hash table (DHT), but switching to one would require a complete rewrite of the software, and a DHT does not take advantage of natural file groupings. Many of the archived images are pages of books, so it makes sense to keep a book's pages together in the archive. Then, when you flip through a digital book, loading one page is likely to transparently pre-fetch information about the other pages, and turning pages will be quick.
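To see why per-file hashing breaks books apart, here is a minimal Python sketch. It uses plain hash-mod-N placement as a stand-in for a DHT's key-to-node lookup (real DHTs typically use consistent hashing, but the scattering effect is the same); the node count and the `book-0042/page-NNNN.jpg` names are hypothetical.

```python
import hashlib

NODES = 16  # hypothetical number of storage nodes


def node_for(key: str, nodes: int = NODES) -> int:
    """Map a key to a node by hashing it (a stand-in for DHT key lookup)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % nodes


# A hypothetical 300-page book.
pages = [f"book-0042/page-{i:04d}.jpg" for i in range(1, 301)]

# Keyed per file: each page hashes independently, so the book scatters.
per_file_nodes = {node_for(page) for page in pages}

# Keyed per book (or bundle): every page shares one key, so one node holds all.
per_bundle_nodes = {node_for("book-0042") for _ in pages}

print("nodes holding the book, per-file keys:", len(per_file_nodes))
print("nodes holding the book, per-bundle keys:", len(per_bundle_nodes))  # 1
```

Keying placement on the group rather than on the individual file is what lets a single fetch bring neighboring pages along.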

What I decided to try is uploading bundles of files rather than individual files. The database will record only 3-4 rows for each bundle, and to read an individual file, a client will fetch it from the bundle that contains it. Bundles are encoded as multipart MIME messages and can span gigabytes. Each bundle has a metadata file that holds an index of the files it contains. If bundles contain an average of 1000 files, the billion-file problem suddenly becomes a mere million-bundle problem, and life is easy again. Yesterday and today, I worked out the code for uploading bundles. Now I need to work on the download side. I'm optimistic that it will perform well.
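Here is a minimal sketch of a bundle writer and reader in Python. It assumes the per-bundle index records each member's byte offset and length inside the MIME body, so a client can seek (or issue an HTTP range request) straight to one file. The JSON index layout, the function names `write_bundle` and `read_file`, and the ASCII-only file names are assumptions for illustration, not Bit Mountain's actual format.

```python
import io
import json
import uuid
from typing import BinaryIO


def write_bundle(files: dict[str, bytes]) -> tuple[bytes, dict]:
    """Encode files as one multipart/mixed MIME message and index each part.

    The returned index (a hypothetical layout) maps each file name to the
    byte offset and length of its raw payload inside the message.
    """
    # Pick a boundary that never occurs inside any payload.
    boundary = "bundle-" + uuid.uuid4().hex
    while any(boundary.encode("ascii") in data for data in files.values()):
        boundary = "bundle-" + uuid.uuid4().hex

    out = io.BytesIO()
    out.write(b"MIME-Version: 1.0\r\n")
    out.write(f'Content-Type: multipart/mixed; boundary="{boundary}"\r\n\r\n'.encode("ascii"))
    index = {}
    for name, data in files.items():
        out.write(f"--{boundary}\r\n".encode("ascii"))
        out.write(b"Content-Type: application/octet-stream\r\n")
        out.write(b"Content-Transfer-Encoding: binary\r\n")
        # Assumes plain ASCII file names; real code would apply RFC 2231 encoding.
        out.write(f'Content-Disposition: attachment; filename="{name}"\r\n\r\n'.encode("ascii"))
        index[name] = {"offset": out.tell(), "length": len(data)}
        out.write(data)
        out.write(b"\r\n")  # this CRLF belongs to the following boundary line
    out.write(f"--{boundary}--\r\n".encode("ascii"))
    return out.getvalue(), {"boundary": boundary, "files": index}


def read_file(bundle: BinaryIO, index: dict, name: str) -> bytes:
    """Fetch one file by seeking to its indexed byte range in the bundle."""
    entry = index["files"][name]
    bundle.seek(entry["offset"])
    return bundle.read(entry["length"])


pages = {"page-0001.jpg": b"\xff\xd8 first page", "page-0002.jpg": b"\xff\xd8 second page"}
body, index = write_bundle(pages)
metadata = json.dumps(index)  # stored alongside the bundle as its metadata file
print(read_file(io.BytesIO(body), json.loads(metadata), "page-0002.jpg"))

# With ~1000 files per bundle, a billion files need only a million bundles.
print(1_000_000_000 // 1000)  # 1000000
```

Because the index lives in a small separate file, a reader never has to scan a multi-gigabyte bundle to find one page, while the body itself stays ordinary multipart MIME that a generic parser should still be able to unpack.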

Everyone I talk to at work is in favor of releasing Bit Mountain as open source software, but I can't do it without official approval. The idea of releasing it keeps stalling for some reason. I ache to release it. I want feedback, I want people to try it out, I want people to improve upon it. I want people to test my theory that Bit Mountain can provide higher reliability and better storage utilization than what is achievable through replication. I want as many people as possible to run it, harden it, and prove that it's reliable.

Comments

Dan Hanks (May 17, 2006 13:22)

Hi Shane,

I'm very interested in your paper. I'd also love to see this as open source software. Can you describe the key points in how BitMountain differs from MogileFS?

Thanks,

-- Dan

Shane Hathaway (May 17, 2006 17:16)

I just posted the paper in the writings section. Some important ways that Bit Mountain differs from MogileFS:

  • Bit Mountain allows you to use either forward error correction or replication, while MogileFS is limited to replication. In theory, forward error correction is more reliable than replication and incurs less storage overhead.
  • Bit Mountain controls the distribution of files better. With MogileFS, the loss of any two or three drives could mean lost data. By controlling file distribution, Bit Mountain can survive much more hardware loss than MogileFS can.
  • Bit Mountain is more automated; it initiates recovery automatically.

On the other hand, MogileFS has been tested in production.
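The first two points in the reply can be made concrete with a rough model. This Python sketch assumes each drive independently fails with probability `p` before its data can be rebuilt. It compares 3-way replication with a hypothetical (14, 10) erasure code (any 10 of 14 fragments rebuild a file), then compares random mirror placement with fixed mirror pairs. The failure rate, code parameters, drive and file counts, and the fixed-pair scheme are illustrative assumptions, not Bit Mountain's actual design.

```python
import math


def replication_loss(p: float, copies: int) -> float:
    """Probability a file is lost when all `copies` drives holding it fail."""
    return p ** copies


def erasure_loss(p: float, n: int, k: int) -> float:
    """Probability a file is lost under an (n, k) erasure code: it survives
    while at least k of its n fragments (on n distinct drives) survive."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(n - k + 1, n + 1)
    )


def random_pair_fatal(drives: int, files: int) -> float:
    """Chance that losing one particular pair of drives destroys at least one
    file, when every file is mirrored on a randomly chosen pair of drives."""
    pairs = math.comb(drives, 2)
    # 1 - (1 - 1/pairs) ** files, computed stably for a huge `files`
    return -math.expm1(files * math.log1p(-1 / pairs))


def fixed_pair_fatal(drives: int) -> float:
    """Same question when drives form fixed mirror pairs (drive 2i mirrors
    drive 2i+1): only drives/2 of all possible pairs are fatal."""
    return (drives // 2) / math.comb(drives, 2)


p = 0.01  # assumed chance a drive fails before its data is rebuilt
print("3x replication, 3.0x storage, loss:", replication_loss(p, 3))
print("(14, 10) code,  1.4x storage, loss:", erasure_loss(p, 14, 10))
print("random pairs, fatal 2-drive loss:", random_pair_fatal(1000, 10**9))
print("fixed pairs,  fatal 2-drive loss:", fixed_pair_fatal(1000))
```

Under these assumptions the erasure code loses data less often than triple replication while storing 1.4x instead of 3x the data. With a billion randomly mirrored files on 1000 drives, practically every pair of drives shares some file, so losing any two drives almost certainly loses data; fixed pairs make only 1 in 999 two-drive failures fatal, at the cost of losing more files when a fatal pair does fail. The model ignores correlated failures and rebuild windows.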
