Tuesday, January 08, 2013

A Mountain of Tweets at the Library of Congress

The Library of Congress just put out a White Paper on the status of their Twitter Archive, which was started in 2010 with tweets from 2006-10 and continues with a streaming operation set up to collect tweets from 2010 to the present. We're talking 170 billion tweets so far, with a growth rate of 140 million tweets harvested per day.

While they have received over 400 serious research requests, they are not yet ready to provide research access to the archive. Explains the paper: "Currently, executing a single search of just the fixed 2006-2010 archive on the Library’s systems could take 24 hours. This is an inadequate situation in which to begin offering access to researchers, as it so severely limits the number of possible searches." It's no easy problem to solve, either, as it will take an extensive overhaul of their server infrastructure, which is cost-prohibitive for a public institution such as theirs. In the meantime, they are developing a "basic level of access that can be implemented while archival access technologies catch up." That doesn't tell us a whole lot, but it will certainly be interesting to follow.

