Government Use of Tweetstreams

I just had an interesting discussion with several people, including privacy officers and ethics staff, about a Federal agency using or repurposing tweets and the privacy issues that might come with doing so. Things are still somewhat unresolved and further discussion will be happening, but I'm hoping the GovLoop community can help address some of the issues.

Some of the questions being posed, along with some of the discussion results, are posted below.

What information from the publicly available Twitter tweetstream (pulled via the API/ATOM feed) can be republished without causing privacy concerns?

If tweets are being collected and republished, whether in a list, a graph, a map, a chart, etc., does republishing tweet data like the handle/username or location data constitute personally identifiable information? If someone tweets, they're aware that any information they put in that tweet, in their location, or in their bio is considered publicly available (for those tweets they don't mark as private). So if a government agency was going to use that publicly available information, mold it into some chart, map, etc., and then republish it in its entirety, are there any further privacy issues that arise simply because a government agency is the one republishing the already public domain data?

A username by itself might be identifiable if the person is using their real name. But they might also be using "dog" as their username, which would not be identifiable. Is the username considered identifiable?

With location information, users can put "Reston, VA" or "the big lake by my house", or even their GPS coordinates. Combining a username with either of the first two locations is probably not "identifiable." But would it be identifiable if someone chose to use their GPS location? Not all GPS-enabled devices give the same depth of location accuracy. Also, some locations are based not on GPS but on cell phone tower triangulation. Those are probably less identifiable, since they may carry a wider range of uncertainty about the "exact" location.

BUT, if a government agency pulled in a tweetstream that included all the publicly available data and then filtered the results (stripping the username or leaving it in, chopping the lat/long coordinates down to just a couple of decimal places so the "location" covers a broader area), would that be considered less of a privacy concern?
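To make that filtering idea concrete, here is a minimal sketch in Python. It is only an illustration: the record layout (a dict with `text`, `username`, `lat`, and `lon` keys) and the function names are assumptions made up for this example, not the actual Twitter API format or any agency's real process.

```python
import math


def truncate(value, places):
    """Chop a coordinate to `places` decimal places without rounding."""
    factor = 10 ** places
    return math.trunc(value * factor) / factor


def filter_tweet(tweet, keep_username=False, places=2):
    """Return a reduced copy of a tweet record.

    `tweet` is a hypothetical dict with "text", "username", "lat", "lon";
    real feed data would need to be mapped into this shape first.
    """
    filtered = {"text": tweet["text"]}
    if keep_username:
        filtered["username"] = tweet["username"]
    # Only coarsen coordinates when both are present.
    if tweet.get("lat") is not None and tweet.get("lon") is not None:
        filtered["lat"] = truncate(tweet["lat"], places)
        filtered["lon"] = truncate(tweet["lon"], places)
    return filtered


# Made-up sample record, not real data.
sample = {
    "username": "dog",
    "text": "Hello from the big lake by my house",
    "lat": 38.958631,
    "lon": -77.357003,
}

print(filter_tweet(sample))
print(filter_tweet(sample, keep_username=True, places=1))
```

Whether the username is kept and how many decimal places survive are left as parameters here, since those are exactly the policy choices still under discussion.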

Does using information from a publicly available tweetstream fall under the realm of an OMB "collection of information" and therefore require an OMB control number?

OMB's wording for "collection of information" refers specifically to "asking" questions or "requiring" the recording or recordkeeping of some information. Under that definition, an agency "collecting" information from a publicly available tweetstream is not placing a requirement on a person or asking a question of a person; the information is being willingly provided, correct? Hence, the assumption is that it does not constitute an OMB collection of information and therefore does not require a control number. Agreed? Disagreed?

If using location information (from the bio or GPS) to plot tweets on a map, are there any privacy concerns with doing so?

I think I already addressed this in the questions above, but given how varied the accuracy of shared locations already is, at what level of detail is location information considered personally identifiable? Would it be 20 yards? 100 yards? 1 mile?
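For a rough sense of how coordinate precision maps onto those distances, here is a small Python sketch. It assumes the common rule of thumb of about 111.32 km per degree of latitude and a simple spherical approximation for longitude; the function name and the example latitude are invented for illustration. It only does the arithmetic and says nothing about where the legal line for "identifiable" sits.

```python
import math

METERS_PER_DEGREE_LAT = 111_320  # approximate rule of thumb, not exact
METERS_PER_YARD = 0.9144


def cell_size_yards(places, latitude_deg):
    """Approximate north-south and east-west size, in yards, of the area
    covered by coordinates cut to `places` decimal places."""
    step_deg = 10 ** -places
    north_south_m = step_deg * METERS_PER_DEGREE_LAT
    # Longitude lines converge toward the poles, so scale by cos(latitude).
    east_west_m = north_south_m * math.cos(math.radians(latitude_deg))
    return north_south_m / METERS_PER_YARD, east_west_m / METERS_PER_YARD


# Example latitude is an assumption (roughly northern Virginia).
for places in range(1, 6):
    ns, ew = cell_size_yards(places, 38.96)
    print(f"{places} decimal places: about {ns:,.0f} x {ew:,.0f} yards")
```

Under these assumptions, two decimal places corresponds to an area on the order of 1,200 yards north to south, and four decimal places to roughly 12 yards.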

Is there a requirement to do a Privacy Impact Assessment and a Privacy Statement?

Even though tweets are in the public domain already, does the collection of those same tweets and associated data by a government agency (even if being redisplayed in full for scientific purposes) require a privacy statement and privacy impact assessment? Simply because it's a government agency doing so?

Would it be acceptable to have a privacy statement that clearly stated what was being collected (in detail), what it was being used for and why, how long it's being retained for and why, how it's being disposed of, etc?

Would having such a statement negate all the other issues listed above, because the intended use is being made known to the person (through a website, Twitter profile, etc.)?

I'm sure these are tough issues that other agencies are dealing with as well...or trying to deal with. Has anyone had experience answering these questions? How about the GSA legal folks? What is your take on all of this? Originally the Twitter TOS, according to GSA, was OK for agency use and didn't require a special TOS for government use. But since then the Twitter TOS has changed. Even so, were these types of questions addressed when the original TOS determination was made? And if they were, what were the answers?


Note: This post reflects my own opinion and not that of any Federal, State, or Local government organization.

2 Comments

Scott Horvath

@Gwynne: If you are downloading and storing the data, in order to deal with the data and output it to another format (chart, map, etc), then a PIA sounds as if it's needed...even if you're "spitting back out" that same information that was downloaded but in a different view, right?