Bit Mountain

Shane, Hathaway Weblog :: Python, Free Software :: October 13, 2005

Lately I've been working on software designed to manage a hypothetical 20-petabyte digital archive. I'm calling it Bit Mountain, and I hope to release it as free software under the GPL.

I started researching how to store 20 PB early this year. A lot of interesting things happen when you try to build a data store so large that it requires thousands of storage media units. Bit Mountain is a research project that tries to solve the following problems:

  • Periodic verification and replacement are essential. Every high-density storage medium (tape, hard drive, DVD, etc.) eventually fails. A single tape on a shelf is generally fine, but a hundred tapes without some kind of robot is quite risky.
  • Hard drives are easier than tapes to verify periodically, and they are well-understood commodity items.
  • Power is a major concern. 20 PB of spinning hard drives would run up a power bill in the neighborhood of $100,000 per month, and over time that bill could even exceed the cost of the hardware. Unfortunately, RAID-based SANs depend on constant power to maintain data integrity. It's not hard to imagine powering up a SAN after a month of inactivity and discovering that one too many drives have been lost.
  • Simple mirroring doubles or triples the amount of media required. RAID schemes with one or two parity drives per set need less media, but they are also less reliable than mirroring once the power is shut off. Something better is needed.
  • On top of all that, it would be such a shame to have to hire 20+ system administrators to watch over a 20 PB archive. That's a low estimate, from what I've heard! So many administrators would be much more expensive than the hardware.
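To check the order of magnitude of the power figure in the list, here is a back-of-the-envelope calculation. Every input is my own assumption, not a number from this post: 250 GB drives, about 10 W per spinning drive, a factor of 2 for cooling and power-supply losses, and electricity at $0.10 per kWh. The function name `monthly_power_cost` is hypothetical.

```python
import math


def monthly_power_cost(capacity_tb, drive_tb, watts_per_drive, usd_per_kwh,
                       overhead=2.0, hours_per_month=730):
    """Rough monthly electricity cost of keeping every drive spinning.

    overhead multiplies the drives' own draw to account for cooling and
    power-supply losses (an assumed factor, not a measured one).
    Returns (number of drives, cost in dollars per month).
    """
    drives = math.ceil(capacity_tb / drive_tb)
    kilowatts = drives * watts_per_drive * overhead / 1000
    return drives, kilowatts * hours_per_month * usd_per_kwh


# Assumed inputs: 20 PB = 20,000 TB stored on 0.25 TB (250 GB) drives,
# about 10 W per spinning drive, $0.10 per kWh.
drives, cost = monthly_power_cost(20_000, 0.25, 10, 0.10)
print(f"{drives:,} drives, about ${cost:,.0f} per month")
```

This prints "80,000 drives, about $116,800 per month", the same neighborhood as the estimate above. Doubling the drive capacity to 500 GB halves the drive count and the bill.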

Bit Mountain is a lot like MogileFS. It works at the application level rather than the kernel level, uses a relational database, replicates data automatically, talks to hard drives over plain HTTP, works with any filesystem, and can be configured with no single point of failure. Unlike MogileFS, it also incorporates forward error correction (FEC) in the form of Reed-Solomon encoding. I believe that tunable forward error correction is the key to maintaining integrity on sleeping hard drives, tapes, or optical media. FEC also consumes less space than replication: a Reed-Solomon code that adds m parity blocks to every k data blocks survives the loss of any m of those blocks at a storage cost of (k+m)/k, while surviving m losses through replication takes m+1 full copies.
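For readers who haven't seen Reed-Solomon erasure coding, here is a minimal, unoptimized sketch of a systematic Reed-Solomon erasure code over GF(2^8). It is an illustration only, not Bit Mountain's code. Byte position by byte position, the k data blocks are treated as the values of a polynomial of degree less than k at the points 0..k-1, and the m parity blocks are that polynomial's values at the points k..k+m-1. Any k surviving blocks determine the polynomial, so the lost blocks can be recomputed with Lagrange interpolation. The names `encode` and `decode`, and the field polynomial 0x11d (a common choice), are my assumptions.

```python
# GF(2^8) arithmetic via log/antilog tables, using the primitive
# polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d) with generator 2.
EXP = [0] * 512
LOG = [0] * 256
value = 1
for i in range(255):
    EXP[i] = value
    LOG[value] = i
    value <<= 1
    if value & 0x100:
        value ^= 0x11D
for i in range(255, 512):  # duplicate so EXP[LOG[a] + LOG[b]] needs no modulo
    EXP[i] = EXP[i - 255]


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(256)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def interpolate(points, x):
    """Evaluate at x the unique polynomial through points [(xi, yi), ...].

    In GF(2^8), addition and subtraction are both XOR.
    """
    result = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = gf_mul(num, x ^ xj)
                den = gf_mul(den, xi ^ xj)
        result ^= gf_mul(yi, gf_div(num, den))
    return result


def encode(data_blocks, m):
    """Return the k data blocks followed by m parity blocks (k + m <= 256)."""
    k = len(data_blocks)
    size = len(data_blocks[0])
    parity = []
    for p in range(m):
        out = bytearray(size)
        for pos in range(size):
            points = [(i, data_blocks[i][pos]) for i in range(k)]
            out[pos] = interpolate(points, k + p)
        parity.append(bytes(out))
    return list(data_blocks) + parity


def decode(surviving, k):
    """Rebuild the k data blocks from a dict {block index: block bytes}."""
    if len(surviving) < k:
        raise ValueError("need at least k surviving blocks")
    chosen = sorted(surviving.items())[:k]
    size = len(chosen[0][1])
    data = []
    for i in range(k):
        if i in surviving:
            data.append(bytes(surviving[i]))
            continue
        out = bytearray(size)
        for pos in range(size):
            points = [(x, block[pos]) for x, block in chosen]
            out[pos] = interpolate(points, i)
        data.append(bytes(out))
    return data


# Usage: 3 data blocks plus 2 parity blocks; lose any two and recover.
blocks = [b"Bit ", b"Moun", b"tain"]
stored = encode(blocks, 2)
survivors = {i: b for i, b in enumerate(stored) if i not in (0, 2)}
assert decode(survivors, 3) == blocks
```

The "tunable" part is simply the choice of k and m. Real implementations use matrix formulations and much faster table-driven loops; this version performs O(k²) field operations per byte of every rebuilt block, which is fine for showing the idea but far too slow for petabytes.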

I've come up with a formula for estimating the probability of maintaining data integrity on sleeping hard drives, given the repository size, the media reliability, the data protection parameters, and the verification period. This post is long enough already, so I'll post the equation later.
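The equation itself isn't given here, so the following is not it. It is only an illustration, under my own simplifying assumptions, of how those four inputs can combine. Drives fail independently at a constant annual failure rate, each block of a stripe lives on a different drive, stripes fail independently of one another (a simplification, since real stripes share drives), and every lost block is rebuilt at the end of each verification period. A stripe of k data blocks and m parity blocks is lost when more than m of its k + m blocks fail within one period, and the repository survives a period only if no stripe is lost. The function names and example numbers are hypothetical.

```python
import math


def stripe_loss_probability(n, m, p):
    """Probability that more than m of a stripe's n blocks fail in one
    period, assuming each block's drive fails independently with
    probability p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(m + 1, n + 1))


def repository_survival(repo_bytes, block_bytes, k, m,
                        annual_failure_rate, period_days):
    """Hypothetical model: probability that no stripe becomes unrecoverable
    during one verification period, assuming all lost blocks are rebuilt
    at the end of the period and stripes fail independently."""
    # Per-drive probability of failing within one verification period.
    p = 1 - (1 - annual_failure_rate) ** (period_days / 365)
    stripes = math.ceil(repo_bytes / (k * block_bytes))
    loss = stripe_loss_probability(k + m, m, p)
    # exp(S * log(1 - loss)) avoids precision trouble when S is huge.
    return math.exp(stripes * math.log1p(-loss))


# Hypothetical example: 20 PB in 1 MB blocks, 10 data + 4 parity blocks
# per stripe, 5% annual drive failure rate, verification every 30 days.
print(repository_survival(20 * 10**15, 10**6, 10, 4, 0.05, 30))
```

Under this model, adding parity blocks or shortening the verification period raises the survival probability, which is the kind of trade-off that tunable FEC makes possible.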
