Anyone who thought they could sneak around Twitter’s increasingly restricted API and get at historical and real-time tweets through the archive in the Library of Congress had better think again. While the Library is making a serious effort to index all tweets since 2006, it is opening this archive only to “known researchers” who have the Library’s approval to access the information.
Audrey Watters of O’Reilly Radar took a close look at the Library of Congress’ Twitter archive one year after the Library partnered with Twitter to begin collecting the data.
The Library has access to Twitter’s historical and real-time tweets through Twitter’s data partner Gnip, which also sells access to the Twitter firehose to interested developers and publishers.
Watters notes that the Library of Congress has been archiving digital content – such as politicians’ websites and digital newspapers – for over a decade, but that Twitter’s constant flow of content (as much as 140 million tweets per day) poses a unique challenge to their archiving abilities.
As it stands, the Library isn’t seeking to catalog all of the tweets on Twitter just yet, but rather to provide an index of these tweets that researchers can search when conducting a study. The Library will not be opening up this search to the general public, however:
“…access to the Twitter archive will be restricted to ‘known researchers’ who will need to go through the Library of Congress approval process to gain access to the data.”
So, while the Library expects to open its digital doors to its Twitter archive in about four or five months, the average citizen won’t be able to casually look up what their first tweet was, at least for the foreseeable future.