Storage has always been a big concern of mine. My mother always taught us to keep everything forever, so despite my repeated efforts to trim down the cruft, I’ve managed to accumulate quite a few things over the years. Once you have a lot of stuff, where to put it becomes the ever-important question. Granted, an even better question might be whether you really need to keep it or not, but some of us have trouble answering that one. So we stock up on folders and drawers, baskets, bins, boxes and buckets, label it all and pack it away in our shelves, closets, trunks, basements and sheds (or perhaps a $147-per-month storage unit if your landlord forbids access to the basement).
For those who put some thought into it, there’s a science to where everything goes. Simply put, it’s based on keeping the things you need most frequently or most urgently ready at hand (tier 1), and keeping the nice-to-haves accessible with an amount of effort that is reasonable for how often, or under which circumstances, you want to get at them (tiers 2 and 3). For example, you might keep your flashlight in the hallway closet and the rest of your camping equipment at the storage unit, knowing that you’ll probably have a lot more advance notice about needing your canteen, while the need for the flashlight might just creep up on you, unannounced.
This problem extends into the digital world more and more. Drives get bigger every year, but the volume of our data grows even faster and becomes ever more difficult to manage. For example, last year we got a fancy digital video camera that we’ve managed to use quite a bit, to the tune of nearly 700 movie clips comprising well over 100 gigabytes of raw footage. Because neither of us has enough space on our computers, we have to get external FireWire drives to store the stuff, and since hard drives are unreliable and we don’t want to lose everything in some horrible mishap, we have to back it all up. I have burned nearly 30 DVDs of all of our footage from the last year! Not only is that time-consuming, but now we have to find some place to keep all those discs. If we store them at home, then both the original and the copy could be lost in the same fire, for example. Storage unit? Ugh.
At the Broad Institute, where I lead the server/storage team, we have the same problem, only several orders of magnitude larger. Much of the work we do involves running genetic material through fancy instruments that apply some chemistry to the samples to generate high-resolution images, which are then processed and evaluated for the greater good of mankind (curing cancer and the like). Because we’re constantly refining our processes and because some of the samples are in limited supply and can’t be reproduced, we have to keep much of this raw data around forever in case we want to re-process it. Much like my mother, our scientists have a “let’s just keep it forever” perspective on their data.
When I started at the Broad more than four years ago, this storage problem was measured in hundreds of gigabytes and terabytes. As current sequencing technologies are scaled up and new technologies emerge, the volume of data we generate increases exponentially. Today we measure our storage problem in hundreds of terabytes and petabytes. We have a single application that generates upwards of 50 terabytes a week! It is becoming a common occurrence to receive a ticket with a request like “I’ve run out of space for my research data, can I please have another 10 terabytes today?” Keeping up with the ever-growing demand for storage is currently our biggest challenge, with cooling, power and space a close second.
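To put the 50-terabytes-a-week figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a constant weekly rate and decimal units (1 petabyte = 1,000 terabytes), neither of which the numbers above promise. Since the real growth is exponential, treat the results as optimistic. The function names are made up for illustration.

```python
# Rough capacity arithmetic for a single data source.
# Assumptions (not from the post): constant weekly growth, decimal units.

TB_PER_PB = 1000  # 1 PB = 1000 TB (decimal convention, assumed)


def weeks_to_fill(capacity_tb: float, weekly_growth_tb: float) -> float:
    """Weeks until capacity_tb is used up at a constant weekly growth rate."""
    if weekly_growth_tb <= 0:
        raise ValueError("weekly_growth_tb must be positive")
    return capacity_tb / weekly_growth_tb


def yearly_volume_pb(weekly_growth_tb: float, weeks_per_year: int = 52) -> float:
    """Petabytes generated in one year at a constant weekly growth rate."""
    return weekly_growth_tb * weeks_per_year / TB_PER_PB


if __name__ == "__main__":
    # One application at 50 TB/week, as quoted above.
    print(weeks_to_fill(1 * TB_PER_PB, 50))  # weeks to fill one petabyte
    print(yearly_volume_pb(50))              # petabytes per year
```

Under those assumptions, that one application alone fills a petabyte in about 20 weeks and produces roughly 2.6 petabytes a year.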
In order to stay on top of new storage technologies, we are constantly involved in extensive product evaluations. Over the last 18 months, I have tried or purchased many of the most advanced storage technologies on the market today. Here are some examples:
- Acopia ARX6000 – Adaptive Resource Switch (File Virtualization)
- ADIC Scalar i2000 – Intelligent Enterprise Tape Library
- IBM GPFS – Parallel File System on IBM servers & storage
- IBM TS3500 – Enterprise Tape Library
- IBRIX – Parallel File System on Dell servers & EMC CLARiiON storage
- EMC Celerra Multi-Path File System (MPFSI) – High-performance File System on EMC CLARiiON storage
- EMC Rainfinity – Global File Virtualization
- Isilon IQ 9000 – Clustered Storage System
- NetApp FAS6000 – Network Attached Storage (NAS: CIFS/NFS/iSCSI)
- Sun x4500 Thumper – Data Server
In addition, I’ve been in talks with 3par, Agámi and BlueArc, as well as one company that doesn’t even have a real website or any customers, but has a very interesting product. On Monday, I will be speaking with Amazon about their Simple Storage Service (S3). It’s a constant effort to know what’s out there and to get enough understanding and hands-on experience with the technology to really know what makes sense for our present or future needs.
This problem is not going away. Sometime this year we officially entered the “petabyte club,” with its rapidly growing membership, and it won’t be long before we are dealing in tens of petabytes. Data is our product and it’s very valuable to us, sometimes priceless. Storing and protecting that data is my job, and it is something I take very seriously.
Now if I could only find that flashlight.