Library Of Congress Has Every Tweet, But No Way To Search Them

You know that even if you delete that ill-advised tweet right away, once posted online it’s there forever, right? No, really – it is. And the Library of Congress has it; they have EVERY public tweet sent since 2006.
But don’t worry, no one will be able to take advantage of your social missteps just yet – the database is not searchable. So the likelihood that a researcher will find (and use) your tweets to illustrate humanity’s depravity has been postponed – for now.
As you may know, the Library of Congress has a tweet archive dating back to the first days of Twitter in 2006.
The Library gathered this tweet data with help from Gnip, a Twitter partner that provides access to the Twitter firehose. It had hoped to provide an index of tweets that would be searchable by researchers – and only researchers (the general public need not apply):

“…access to the Twitter archive will be restricted to ‘known researchers’ who will need to go through the Library of Congress approval process to gain access to the data.”

But that day isn’t coming any time soon. The Library of Congress may have every tweet, but the archive is about as useful as that iPad turned paperweight you bought for grandma to video chat with the kiddies. It’s an apt (though admittedly hypercritical) comparison.
In an update report released today, the Library notes it “has extensive expertise in managing acquisition of and access to large-volume digital collections,” yet “the technical infrastructure for the Library’s Twitter archive follows the same general practices for monitoring and managing other digital collection data at the Library.”

Tape archives are the Library’s standard for preservation and long-term storage. Files are copied to two tape archives in geographically different locations as a preservation and security measure.
The volume of tweets the Library receives each day has grown from 140 million beginning in February, 2011 to nearly half a billion tweets each day as of October, 2012. The Library is processing data from the original 2006-2010 archive and organizing the material into hourly files. This operation is necessary so the entire archive from 2006 moving forward is organized the same – by time and in hourly files. This process will be completed in January 2013.
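For the curious, here is a rough idea of what "organizing the material into hourly files" could look like in practice. This is only a sketch, not the Library's actual process: the record layout (a dict with a `created_at` timestamp), the file naming pattern and the function names are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_key(created_at: datetime) -> str:
    """Return a hypothetical file name for the hour containing the timestamp."""
    # Normalize to UTC so every tweet lands in exactly one hourly bucket.
    utc = created_at.astimezone(timezone.utc)
    return utc.strftime("tweets-%Y-%m-%d-%H.jsonl")


def bucket_by_hour(tweets):
    """Group tweet records by the hourly file they would belong to."""
    buckets = defaultdict(list)
    for tweet in tweets:
        buckets[hourly_key(tweet["created_at"])].append(tweet)
    return dict(buckets)


# Made-up sample records (not real tweets), using an assumed schema.
sample = [
    {"id": 1, "created_at": datetime(2012, 10, 1, 13, 5, tzinfo=timezone.utc), "text": "example a"},
    {"id": 2, "created_at": datetime(2012, 10, 1, 13, 59, tzinfo=timezone.utc), "text": "example b"},
    {"id": 3, "created_at": datetime(2012, 10, 1, 14, 0, tzinfo=timezone.utc), "text": "example c"},
]

for name, records in sorted(bucket_by_hour(sample).items()):
    print(name, [r["id"] for r in records])
```

Sorting by time like this makes the archive consistent, but it does not make it searchable: finding every tweet that mentions a given word would still mean reading through every hourly file.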

Hmm. It’s possible, of course – assuming they hire a firm to help them sort it in a meaningful way. When do you expect to see a searchable database launch? And how do you feel about only “known researchers” accessing this info?

@MaryCLong Mary C. Long is Chief Ghost at Digital Media Ghost. She writes about everything online and is published widely, with a focus on privacy concerns, specifically social sabotage.