
Gov 2.0 – making sense of complex data sets in order to use them

Hi, everyone,

Before coming to government, I served as a college financial aid director. After moving to DC, I spent eight years surrounded by large financial aid data sets, doing risk analysis and providing decision support based on the myriad records that colleges had submitted to the Department in order to administer aid to their students. Recently, I spent a few years doing analysis across those same “collections” in the Department of Education’s information management office.

Why am I telling you this?

Each position’s set of tools provided a different point of view on a program’s data, a different “way in.” See how the language changes, above? Since I already knew how each program was intended to operate – having managed student aid operations at colleges – each successive experience with a program’s data and metadata provided new information about context, relationships, what the data signified, and how the data could be used properly. Bottom line: it’s possible to learn enough about many data sets to use them properly, without having program experience.

At last week’s Gov 2.0 Summit, there was talk of contests such as Apps for America 2 and Code for America to build interactive applications using public, non-Personally Identifiable Information previously provided by people and organizations to government. At this stage in the game, simpler data sets (the low-hanging fruit) are in play. Soon, people may want to delve into the more complex sets that are available on data.gov or agency web sites. If they’re willing to invest the time, then they’ll be able to gain enough of an understanding of the data to have an intelligent conversation with the agency program experts who are listed as contacts.

I’m assuming that anyone who’s coding software already knows what to do with record layouts, schema, and data dictionaries. However, none of my acquaintances have understood the resource potential of the OMB Information Collection Clearances – and that’s the reason for this blog post. Below are six posts I made to Twitter this morning, just to get you started.

* * * *

Make sense of complex data sets by consulting “Information Collection” docs – start with Supporting Statement #g2s #Gov20

Information Collection Requests req. by Paperwork Reduction Act http://bit.ly/83URn & cleared by OMB http://bit.ly/1t4pC7 #g2s #Gov20

E.g., Dept of Education: http://edicsweb.ed.gov/ #g2s #Gov20

E.g., Dept of Health and Human Services: http://aspe.hhs.gov/datacncl/DataDir/index.shtml #g2s #Gov20

Other agencies? Option 1: http://www.reginfo.gov/public/do/PRAMain 4 clicks to Supporting Statement #g2s #Gov20

Option 2: Ask a Fed Web Mgr http://bit.ly/eIcIT for public link to his/her agency’s information collection clearances #g2s #Gov20

* * * *

By the way, for the non-Twitter users, the hashtags (#) are a way to group and find posts on specific topics. You can ignore them. If you want to see the posts on Twitter, go here.

Questions or comments?




Great post. Your experience at Ed, like that of many in government, shows that dealing with data is not new. We’ve been collecting and analyzing data for a long time internally. The key is helping release the data and helping unleash hundreds of thousands more developers on the data….Keep up the good fight…

Kitty Wooley

LOL – thanks, Steve! I didn’t expect to get any comments on this one because this stuff makes everyone’s eyes glaze over pretty quickly! But I’m just sayin’ – there’s gold in them thar hills; just gotta dig it out.