April 15, 2010 at 9:39 am #97772
from Library of Congress Blog:
Have you ever sent out a “tweet” on the popular Twitter social media service? Congratulations: Your 140 characters or less will now be housed in the Library of Congress.
That’s right. Every public tweet, ever, since Twitter’s inception in March 2006, will be archived digitally at the Library of Congress. That’s a LOT of tweets, by the way: Twitter processes more than 50 million tweets every day, with the total numbering in the billions.
We thought it fitting to give the initial heads-up to the Twitter community itself via our own feed @librarycongress. (By the way, out of sheer coincidence, the announcement comes on the same day our own number of feed-followers has surpassed 50,000. I love serendipity!)
We will also be putting out a press release later with even more details and quotes. Expect to see an emphasis on the scholarly and research implications of the acquisition. I’m no Ph.D., but it boggles my mind to think what we might be able to learn about ourselves and the world around us from this wealth of data. And I’m certain we’ll learn things that none of us now can even possibly conceive.
Just a few examples of important tweets in the past few years include the first-ever tweet from Twitter co-founder Jack Dorsey (http://twitter.com/jack/status/20), President Obama’s tweet about winning the 2008 election (http://twitter.com/barackobama/status/992176676), and a set of two tweets from a photojournalist who was arrested in Egypt and then freed because of a series of events set into motion by his use of Twitter (http://twitter.com/jamesbuck/status/786571964) and (http://twitter.com/jamesbuck/status/787167620).
Twitter plans to make its own announcement today on its blog from “Chirp,” the Official Twitter Developer Conference, in San Francisco.
So if you think the Library of Congress is “just books,” think of this: The Library has been collecting materials from the web since it began harvesting congressional and presidential campaign websites in 2000. Today we hold more than 167 terabytes of web-based information, including legal blogs, websites of candidates for national office, and websites of Members of Congress.
We also operate the National Digital Information Infrastructure and Preservation Program (http://www.digitalpreservation.gov), which is pursuing a national strategy to collect, preserve and make available significant digital content, especially information that is created in digital form only, for current and future generations.
In other words, if you’re looking for a place where important historical and other information in digital form should be preserved for the long haul, we’re it!
April 15, 2010 at 12:57 pm #97782
What bothers me about this topic is how narrow the argument seems to be. People either argue that it is valuable or it isn’t. Yes, this data is valuable. Yes, data should be preserved. Yes, LoC is relevant.
At its broadest, though, this argument is about trust in the public domain. That a government agency does not recognize how thorny the issue of digital privacy is tells me that we really don’t have a clue how to handle people’s personal data in the 21st century. Would you think the same thing if LoC announced they were archiving everything on Flickr?
I want to see government using social media, but I think this is a step backwards.
Twitter should have made an announcement to its users, and given them the choice to opt in or not. Someone in another post on this topic said Twitter is great for assessing public opinion: Agreed! Let them solicit feedback from their own users and give them a choice. This is becoming business-as-usual for popular social networks: make a significant change and then tell people about it after-the-fact. Facebook’s Beacon, anyone? Google Buzz? Maybe these companies can handle these risks to their reputation, but do you really want the government involved in that game? It seems hard enough to show the value of social media in government with all the unknowns attached — why make it harder by showing very little thought about people’s trust? When it comes to government, trust is already pretty shaky for many citizens, and many aren’t convinced they can trust the Internet.
And on top of that, we can’t archive the zettabytes of information that are supposed to be archived, yet LoC is preserving tweets? I just don’t get it. I’m not surprised that Twitter (or Google or Facebook) would do something like this, but I am disappointed that LoC jumped in the game.
Fred’s post is excellent — I’ve pasted a few highlights below, but I recommend reading the whole entry. And now, Debbie Downer here, signing off.
“If you talk to people about things shared online, you generally run into two assumptions. The first is that things shared publicly are meant for the general public. The second is that things shared publicly are meant for posterity. Both of these assumptions are dangerous. Some of my recent work has identified that people do share privately in public, and that individuals do engage in the grooming (i.e. removal) of content shared publicly. danah’s found this. So have lots of others. If there’s anything we should know by now about social media, it is that a deterministic, one-size-fits-all approach to privacy is a bad approach to privacy.”
“There’s probably a certain class of reader that looks at Bob and says, well, Bob’s out of luck. There’s Google cache and third party tools and a whole host of other ways tweets are preserved. The difference I’d argue is that these tools have certain properties – they react to API calls, they decay, etc. – that make them qualitatively different from a professionally managed archive. Through the creation of a permanent, public, third-party archive, Twitter changes the privacy-management strategies that are going to be available to users in the future. This is critical, because if Bob can’t trust his down-the-road privacy management strategy, Bob might share less today.”
“This is a great opportunity to plug the work of Helen Nissenbaum, whose most recent book Privacy in Context extends the argument for privacy as contextual integrity. Nissenbaum argues that disclosures have contextual expectations, and that shifting these expectations constitutes a meaningful violation of privacy and freedom. Even though the tweets are public, it is a fallacy to assume that digital content shared in public was created with an understanding that the content would end up in a third-party, government-managed archive. Facebook’s helped us demonstrate again and again that privacy is both qualitative and quantitative.”
April 15, 2010 at 1:39 pm #97780
@Sheryl: I may be wrong, but I think that they are only collecting public tweets, not those on locked accounts restricted to approved followers. I think people need to be realistic in their expectations of privacy on things they publicly and openly publish on the Internet.
In terms of Flickr, I’d have no objection to them collecting anything that was published for anyone to view. On the other hand, if I’ve restricted a photo for viewing only by family and/or friends, it shouldn’t show up in their collection.
You point out that people are still posting things publicly on the Internet and expecting privacy and seem to suggest that the government should support this expectation. I worry that this will lead to expectations that we can’t deliver on. People can try to scrub but they have no way of being sure that they’ve succeeded. If we try to educate them to expect privacy in public postings rather than the contrary, we’re setting them up for a world of hurt.
The most I think we can expect from the LoC is that they’ll remove material if requested by an authorized person.
That’s the privacy side. I’m not going to get into the copyright side.
April 15, 2010 at 4:57 pm #97778
More info is coming out about the archive.
Here are some of the questions Michael asks (he also posts more details about the archive as they were made available, some of which answer a few of these):
Here are some immediate questions that need to be addressed:
1. Will user profile information also be archived and made accessible? And historical changes to user profile information? If so, can users update the profile information that might be archived at LOC?
2. Will lists of followers and who is followed be included? If so, how will they be updated?
3. Will geo-locational data be included?
4. Will the LOC allow automated scraping of the database (by search engine crawlers or other bots)?
5. Will the LOC allow commercial use of the archive?
6. Will the LOC process the archive in such a way to create categories of users or tweets? Essentially, are we going to see a Library of Congress Classification scheme for tweets?
7. Currently users can delete tweets from Twitter, which (presumably in a reasonable time) are deleted from Twitter’s logs, and no longer discoverable. Will users have the ability to remove unwanted tweets from the LOC? (I presume not)
April 21, 2010 at 12:41 pm #97776
Great questions, especially #5. I’d hate to have to later pay for what was available for free – including my own postings. I’m sure that great research studies will come out of the tweets.
April 21, 2010 at 12:42 pm #97774
I’m sure that great research studies will come out of this. Some disciplines include:
library and information science