Social Media as a Sensor – Leveraging Crowdsourced Data for Early Warning and Response

Originally posted at www.thehomelandsecurityblog.com
2011 January 24

By Sara Estes Cohen
Co-authored by Bill Hyjek

A recent story published on Wired.com discussed the findings of a group of researchers at the Indiana University School of Informatics and Computing who developed a method for predicting changes in the Dow Jones Industrial Average through the analysis of Twitter updates. By using open-source mood-tracking tools like OpinionFinder to sort Tweets into positive and negative bins based on emotionally charged words, the research team was able to predict the ups and downs of the stock market at closing bell three days later with 86.7% accuracy.

Now consider leveraging data collected in this manner via Twitter and other social media tools for other types of predictions. The implications of this type of data collection for early warning and/or confirmation of information – social media as a sensor – are significant if applied to the field of public safety.

Earlier this year, Federal Computer Week highlighted a group of Namibian officials who, with assistance from an international team of experts including representatives from NASA and the National Oceanic and Atmospheric Administration (NOAA), developed a geospatial application tapping and combining satellite imagery and river-height sensors to get an early read on possible flooding in Namibia. Leveraging sensory data, officials are now able to predict, prepare for, and respond to events much sooner than previously possible. Furthermore, aggregating and geospatially depicting data provides contextual understanding of a large volume of information very quickly.

By combining social media data with geospatial analysis, officials may be able to prepare for and respond to a disaster faster than ever before. Sensory data like that collected via river-height gauges and seismic monitors, when combined with social media data and/or sentiment analysis, provides both the “what” (that an event has just occurred or is about to occur) and the “who,” the “why,” and the “how” (the context of an event, including the public’s level of understanding of, reaction to, and knowledge of factual information). This combination may even assist in predicting second- and third-level events that might arise as a result of the original disaster.

Emergency response officials already monitor seismic data provided by the United States Geological Survey (USGS) for early detection of earthquakes. Why not combine seismic data with keyword searches for “earthquake,” “shaking,” etc. within specific geographical locations? Going further, why not overlay both seismic data and geospatially mapped data from Twitter with historical event data, critical infrastructure data, hazard and mitigation data, etc.? The resulting mash-up could provide an unprecedented level of contextual understanding to response agencies experiencing resource cutbacks and struggling to keep up with the volume of information available on the internet.
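To make the keyword-plus-location idea concrete, here is a minimal sketch in Python. It assumes posts have already been collected with coordinates attached; the `Post` type, the keyword list, and the 100 km radius are illustrative assumptions, not part of any agency's system or any real feed's format.

```python
import math
from dataclasses import dataclass

# Hypothetical keyword list; a real deployment would tune this per hazard.
KEYWORDS = ("earthquake", "shaking", "tremor")


@dataclass
class Post:
    """A geotagged social media post (illustrative structure only)."""
    text: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def posts_near_event(posts, event_lat, event_lon, radius_km=100.0):
    """Keep posts that mention a keyword AND fall within radius_km of the event."""
    matches = []
    for post in posts:
        text = post.text.lower()
        if not any(word in text for word in KEYWORDS):
            continue
        if haversine_km(post.lat, post.lon, event_lat, event_lon) <= radius_km:
            matches.append(post)
    return matches
```

Given the location of a seismic reading, `posts_near_event(posts, event_lat, event_lon)` would return only the nearby posts that mention shaking. A post that uses the word "earthquake" from across the country is dropped by the distance check, which is the kind of noise reduction the overlay is meant to provide.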

Despite the benefits of collecting crowd-sourced data during an emergency, it has not yet been adopted by incident response agencies for a variety of reasons. Many in the incident response community are reluctant to accept social media data as a valid information source. In large part, this is due to the difficulty of vetting the potentially vast amounts of data during a major operation. The inability to process this information, in turn, raises other issues for decision-makers, including potential liability concerns. To a lesser degree, the incident response community is steeped in tradition, with a strong proclivity to favor only proven methods and tools in the conduct of its mission. Dramatically divergent concepts are likely to meet with some cultural resistance.

For agencies to begin using social media and other types of sensory data for early warning and response, several changes must occur. First, rather than constantly monitoring Tweetdeck or other similar tools and attempting to manually sift through the data that is rapidly coming in from social media, news wires, etc., imagine if a predetermined aggregation and filtering mechanism could automatically filter through the information and geographically map it so you could look at all the information in the context of an event as it unfolds. Incoming Tweets and sensory data could then be visualized as points on a map, and additional tools could enable you to pull in relevant information from other sources including government agencies, public information offices, and non-governmental organizations. This information too could be automatically sorted and mapped for further analysis. Additional tools could then enable more rapid and accurate analysis of the information, allowing for efficient and effective decision making. Virtual USA, the Department of Homeland Security’s flagship program, sponsored by the White House Open Government Initiative and DHS Secretary Napolitano, has already made these concepts a reality.
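As one illustration of "points on a map," the sketch below converts filtered reports into GeoJSON, a widely used format that most mapping tools can read. The report fields (`lat`, `lon`, `source`, `text`) are assumptions made for this example; this is not a description of how Virtual USA or any other platform ingests data.

```python
import json


def to_geojson(reports):
    """Convert report dicts into a GeoJSON FeatureCollection of points.

    Each report is assumed to carry 'lat', 'lon', 'source', and 'text' keys
    (a hypothetical structure for illustration).
    """
    features = []
    for report in reports:
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [report["lon"], report["lat"]]},
            "properties": {"source": report["source"], "text": report["text"]},
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    sample = [{"lat": 34.05, "lon": -118.25, "source": "twitter", "text": "shaking downtown"}]
    print(json.dumps(to_geojson(sample), indent=2))
```

Keeping the `source` with each point lets an analyst tell an unconfirmed public report from an official sensor reading once both are on the same map.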

Second, although social media tools provide access to a great deal of information from multiple sources before and during an incident, which can greatly enhance decision making and situational awareness, the wide-scale use of social media during an event also presents significant challenges: agencies must monitor and sort through large amounts of data in order to authenticate information for real-time decision making.

To harness crowd-sourced and sensory data most effectively, agencies need the ability to successfully aggregate, filter, integrate, map, prioritize, assign, and follow up on data collected via these methods. Accomplishing this requires that:

– Data aggregation and analysis tools be developed to assist organizations in decision making;
– Data be geospatially enabled for additional context; and
– Applicable governance frameworks, policies, and challenges (e.g., liability and privacy issues) be identified and addressed.
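The aggregate-and-prioritize steps above can be sketched very simply: bucket geotagged reports into grid cells and flag the cells whose report count crosses a threshold. The cell size and the threshold of three reports are arbitrary assumptions for illustration; each agency would set its own values for its own purposes.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.1):
    """Map a coordinate to a coarse grid cell (cell_deg degrees on a side)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def cells_over_threshold(points, threshold=3, cell_deg=0.1):
    """Count reports per grid cell and return cells with at least `threshold`
    reports, busiest first, as (cell, count) pairs."""
    counts = Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)
    flagged = [(cell, n) for cell, n in counts.items() if n >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

A flagged cell is not a confirmed event. It is a prompt for a person to look more closely, which keeps the human element in the loop while cutting down the volume that has to be read by hand.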

Leveraging crowd-sourced and sensory data may prove useful for alert and early warning of several types of events, including:

– Shootings and other violent acts as they occur;
– Disease outbreaks and symptom clusters;
– Bird and other animal deaths;
– Floods, tornados, wildfires, and other natural events; and
– Traffic.

I am interested in hearing what others have to say about the types of data that might serve to assist public safety organizations in responding to events within their jurisdictions. Often it is the real-world application and identification of an information gap that drives the development of new and innovative technologies and methodologies.

21 Comments

Leave a Reply

Allen Sheaprd

@Sara,

Great post. This could be used to do predictions as well as to garner public opinion. A few years ago HHS Secretary Mike Leavitt held open webcasts on H5N1, or bird flu. H5N1 is an unstoppable virus with no cure and a 68% death rate under medical care and intervention. The death rate is higher without 21st-century medicine.

Much to the dismay of others, the public did not “panic” and was up on the facts. Twitter was used to make comments to those not on the webcast. Reactions during the webcast were also made.

@Sara, do you also feel a “big brother” side to this? If software allows only a few to peer into the hearts and minds of others then those few have the upper hand.

Sara Estes Cohen

@allen – Thanks! I think there is a piece of big brother to this…but ultimately the data crowdsourced is public information provided by the public, so it remains (as it is now) up to the individual to decide how much privacy they are willing to forgo for safety…I don’t think this is an issue of “big brother” – but a method for aggregating and analyzing data that already exists…in addition to data that is already collected for scientific purposes, like in the case of river height sensors and seismographs.

Chris Poirier

As a member of the emergency management/response/etc. community I can fill in some comments on this topic. A large issue with including social media information in the stores of data already being collected is, as you state, that it is a large body of data that we cannot place any level of certainty against. Most operations centers are overwhelmed with information during a crisis, and public/social/media information all gets categorized as “unconfirmed” information. In a perfect world organizations could focus resources just to pore through all of this data and attempt to validate it against confirmed sources; however, most operations lack the funding and personnel to do this in a timely manner.

However, to be fair, I do usually keep an eye on the social reporting during any emergency, as nothing beats “eye witness” accounts/pictures/video/etc. from an event scene. I typically use Google Realtime and search an incident by name. Google will pull up all tweets, blogs, etc. associated with your search and provide time and volume information on that search. VERY USEFUL. Though it does increase the amount of “fog” that I have to sort through next to the official reporting I am already receiving.

I see the trick being how you combine social/crowd sourcing/real-time reporting with applications so that it can be validated and confirmed more quickly. Gov does a great job of pushing incident information to people (e.g., street closures, school incidents, evacuations, etc.). However, we’re not as good about using similar platforms to receive information back from the public. This citizen engagement would lend a hand to the operators poring through volumes of data, while also providing important context to an emergency that can be validated and turned back around to the public and responders alike.

To this end I like a project I saw DARPA working on: going into Haiti, a geolocation-based reporting tool was seen as highly beneficial. The Army had been working on a program where units could report important information on a map and tag photos/orders/etc. for future patrols. This concept was used in Haiti to locate relief supplies, buildings that had been searched already, tent cities, etc.

Therefore, I think it’s a combination of ideas: find a home for citizen input for emergencies, find a way to validate it quickly, and then provide it to responders and citizens alike. Anyone got some money and time? I am feeling venture capitalistic today 😉

Sara Estes Cohen

Agree Chris and thanks! –

I’ve proposed the development of a mechanism to aggregate and analyze based on thresholds developed by each agency for specific purposes. Also, sensory data (satellite, lidar, flood height, seismic, etc.) is already utilized by several agencies within their visualization platforms (whatever they may be). The trick now is to combine them for additional context. Social media data shouldn’t be the ONLY data for emergency response, but can be leveraged to enhance traditional means.

Chris Poirier

That’s the comment I was looking for: “shouldn’t be the ONLY.” A lot of folks get worked up and stick on this point. When it is advertised as an additional data feed to support ongoing open source collection against an event, and as information that can be parsed, that is when the value add is amplified for most end users. We already suffer from information overload; we now need solutions that can do business/process intelligence without removing the human element in open source collection.

Allen Sheaprd

@Sara,

You are right about people putting out the information publicly. The big brother aspect goes deeper than people know. Real life example: I may be able to tell you which bra an old boyfriend’s current girlfriend wears. Sound insane? Working on a contract for House Hold recovery we used several private databases for doing skip-trace and fraud. The databases warehouse receipts, magazine subscriptions, telephone and cable bills (though it’s pay information only). Given a name, the phone book gives an address. These databases can then look back to see where they have lived. Then see what guys have lived there. Use that information to see where they live. Use that result to check for current live-in girlfriends. Then check her buying habits to see what she has bought from Sears, Proffits, Kohls, Victoria’s Secret, etc.

Please, if anyone can correct this, do. Well, phone books are not as valuable as they once were.

@Sara – even though tweets, GovLoop, Facebook, etc. are public, only those with the program see the trends, gaining an upper hand. If you ask which databases were used I can only say they are located in Arizona and Chicago. Name and contact info are company confidential as HouseHold recovery and others use them.

Yeah, it freaked me out at first too. There are more stories.

Allen Sheaprd

@Chris,

You make great points that I have not been able to get anywhere with. Valid real-time data is key during an EOC (emergency operations center) event. I’ve pushed the “Tornado spotters” concept for EOC use. Just as NOAA trains civilians to be tornado spotters, cities should train people/CERT folks to do small-scale field reports: downed power lines, closed roads, need for oxygen/meds, electrical & phone outages, etc. Can it work? IMO – yes. Actually let me rephrase that: *YES*. One example came after a hurricane hit the city. Knowing the GIS director, I called in to let them know which roads had power lines across them. He was thankful. No cops or firemen had been out that far because of – you guessed it – downed power lines.

Another person in the EOC noticed her neighborhood looked “untouched.” She called her mom but got no answer. She called her sister, who drove over to check on her. Later she got a phone call. “Mom is fine. Their phones and power are out. BTW there are some neighbors whose oxygen tank. Sorry it took so long to get back with you. I had to walk the last few blocks because of downed power lines.” @Chris this really happened. The GIS folks are friends of mine.

Please if you can make any progress, let me know how.

Chris Poirier

@Allen, this is typical. However, I can understand the problem. It’s hard to open up to the community on things like this, as officials like to have confirmed reporting before posting things on public-facing sites, etc. It is a real problem. The issue is overcoming the “fog” and turning it into actionable information that has a level of confirmed certainty. There is no doubt that making this public and social would get more reporting during incidents, but that is part of the problem AND the solution.

Chris Poirier

@Sara this could be an outstanding use of FEMA disaster ops cadre, Red Cross volunteers, and/or volunteer surge professionals for states. (A lot of states have surge call rosters of people who will volunteer time to stand up operations centers, etc.) I would also say integrating this with the ham radio operators that come out during emergencies would be a great use of the RACES resources.

Allen Sheaprd

@Chris,

Ham operators for RACES are great – but the ARRL is a semi closed community. Few people have the equipment.

Grand Forks flood of ’08 (think it was ’08) was a great use of Twitter and Photobucket. Tweets helped give reports and bring in those who wanted to help but did not know where to go. It was also a great learning experience for spectators. Between Twitter and Photobucket I got a “feel” for being there. Twitter also got the “group effect” or “band wagon” effect going. The whole “you want to help – well come on; we are going too.”

My info came from a Red Cross specialist who provided info, updates and photos. It was interesting to feel how strangers come together. No idea where they will eat or sleep but wanting to help. Showers & laundry came up later.

@Chris, (sigh) if only more people would sign up *beforehand* to be trained, even if they moved out of state before the emergency. Being prepared for a hurricane helps prepare one for a snowstorm, earthquake or blackout.

Chris Poirier

@Allen You can bring in RACES, ARES, Red Cross, emergency management cadre, Fire Corps, CERT, etc. The combination of all these groups should bring up the numbers. It is just a matter of getting them to be smart about the social engagement type things they can do. Bottom line is that you want to ensure trained people are going into disaster areas; making something social but not educating the people would be a disaster. We learned this lesson post 9/11 and Katrina: you want trained volunteers in a disaster area; anything/anyone else most likely will become a victim themselves.

Allen Sheaprd

@Chris,

Amen

Trained people. Andrew, Katrina both proved that.

So far it’s mostly the young on Twitter. A social blog or web site with a similar format to WebEOC. I’m not sure a Wiki could keep up.

Anyone have thoughts or experiences with “myth” control? Correcting bad information on social sites?

Allen

Chris Poirier

It is by far a cultural issue as much as it is a training issue. In so far as being able to conduct “myth” control, it is more so ensuring that emergency messaging is consistent, confirmed, and controlled. Though this may sound counter to the open government model most are looking for, I feel like they are required when dealing with public safety. And it certainly is not to say that open reporting cannot be included in this process, in fact I think it helps validate information when combined with official reporting from government entities and volunteers alike.

I think a solid Jive/SharePoint/etc. type social web front could be used to collect public information and even provide it back to the public in a social context. With everything using XML type standards to move information, this information could be standardized and ported into a WebEOC/ETeam etc. type application with ease, as they all function on XML databases. (In other words, publicly collected information could be fed into incident management systems to be validated and pushed back to the people that need information on the fly in an emergency.) You could even go so far as to continue to push this validated information to any organization that is using XML-based information (e.g., power company, water company, National Guard, NGOs, etc.).

However, this discussion then comes back down to the normal discussion held here, which is data standardization. If everyone goes out to create their own XML-based social platform for sharing emergency information, all of the solid work that has been done on EDXL may fall by the wayside. Though, if done in a coordinated effort, a lot of good things could come out of this type of engagement. (http://www.comcare.org/edxl.html) I highly recommend that interested parties look into COMCARE and the efforts they work on.

Allen Sheaprd

@Chris,

I agree – because, IMO, soooo many are untrained there needs, *needs* to be one consistent message. Sadly many people will be learning “primitive” camping for the first time. I agree with you.

Here @Sara is right about crowdsourcing helping to take the pulse of the people. Past experience shows crowdsourcing is a great source of information and skill. Kinda like a *real time wikipedia*, everything from sanitation to cooking in a cardboard box with tinfoil. Yes it works. Pizza comes out like a good brick oven one because it’s cooked right over the coals. (That is another story.)

@Chris,

Good link (http://www.comcare.org/edxl.html), for NIMS teaches “plain English – no jargon or technical terms like 10-4.” The XML helps.

Twitter hash tags and social media sites are more accessible to the general public.

This still does not solve the “digital divide” but there are other programs for that.

BTW there are Emergency & Crisis Management (https://www.govloop.com/group/emergencyandcrisismanagement), Disaster Recovery and Pandemic (https://www.govloop.com/group/pandemic), and Volunteer (https://www.govloop.com/group/communityserviceandvolunteerism) groups here on GovLoop for a wider discussion of social media.

David Resseguie

@Sara (and others commenting), these are some great ideas and valid issues we’ve got to work through.

Regarding @Chris’s point on XML formats and data standardization: That’s an issue we’ve been thinking about a lot with Sensorpedia and related projects for our customers. In an ideal world, everyone would just agree on a format and sharing (at least from the technical perspective) would come easy. But unfortunately getting people to agree on formats is often orders of magnitude more difficult than the original problem. We took a more loosely coupled approach that doesn’t specify particular formats, but utilizes things like URLs and tags to still maintain many of the benefits of a tightly coupled system. So far we’ve had some good success with this approach, so we can move past that initial hurdle and start to actually build solutions. Then by taking the “may the best format/protocol win” approach, hopefully we can continue to evolve towards more formalized and agreed-upon standards.