I attended the breakout session Drinking from the Fire Hose, which looked at the impact of the explosion of data and at strategies and techniques for dealing with it. John Kelly (Morningside Analytics) moderated a panel that consisted of Jason Hines (Recorded Future), Eddie Smith (Topsy), Jonathan Gosier (Metalayer), and Adam Sharp (Twitter).
The presentations were a bit sales-pitchy and the Q&A quite wide-ranging, so I’m taking great liberty in providing some of the nuggets I found most fascinating, as opposed to a full recap of the panel.
Fire-Hose analogy: Sharp played out the fire hose analogy, calling Twitter the reservoir and the analytics organizations the nozzles. To set the stage he provided some mind-numbing stats: 5 years after its founding, Twitter has more than 100 million active users who generate over 250 million tweets per day. 70% of these tweets come from outside of the U.S. and fewer than 50% are written in English. 1 in 5 world leaders participate. Finally, more than 750,000 developers have registered more than 1,000,000 apps using Twitter data. Oh, and remember the East Coast earthquake last fall? Some New Yorkers read DC tweets about it before they felt it themselves.
Making sense of the future: Hines’s organization is trying to record everything we know about the future and structure it in a way that people can use. How do you make sense of phrases like “this weekend” or “next spring”? What context is each phrase associated with? Who is saying it? Pair that information and you can begin to analyze it meaningfully in new ways. If you get the temporal aspect right, you can move from knowing only what’s happening now to knowing what’s going to happen in the future. He believes this is the next big event in data.
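To make the temporal problem a little more concrete, here is a minimal sketch of my own; it is not Recorded Future's method, which the panel did not describe. It assumes that a phrase like “this weekend” only becomes a date range once you know when it was said. The function names, the two-phrase lookup table, and the choice to treat spring as March 1 through May 31 (northern hemisphere) are all hypothetical and purely illustrative.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

DateRange = Tuple[date, date]


def resolve_this_weekend(ref: date) -> DateRange:
    """Return (Saturday, Sunday) of the week containing ref."""
    # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    if ref.weekday() == 6:
        # Said on a Sunday: the weekend started the day before.
        return ref - timedelta(days=1), ref
    saturday = ref + timedelta(days=5 - ref.weekday())
    return saturday, saturday + timedelta(days=1)


def resolve_next_spring(ref: date) -> DateRange:
    """Return the first spring that starts after ref.

    Assumption: spring runs March 1 to May 31 (northern hemisphere).
    """
    year = ref.year if ref < date(ref.year, 3, 1) else ref.year + 1
    return date(year, 3, 1), date(year, 5, 31)


# Hypothetical lookup table; a real system would need far more than two phrases.
RESOLVERS = {
    "this weekend": resolve_this_weekend,
    "next spring": resolve_next_spring,
}


def resolve(phrase: str, ref: date) -> Optional[DateRange]:
    """Map a relative phrase plus the date it was said to a date range."""
    resolver = RESOLVERS.get(phrase.strip().lower())
    return resolver(ref) if resolver else None


if __name__ == "__main__":
    said_on = date(2011, 11, 9)  # illustrative reference date
    print(resolve("this weekend", said_on))
    print(resolve("next spring", said_on))
```

The same phrase resolves to a different range for every reference date, which is why who said something, and when, matters as much as the words themselves.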
- Smith: “real-time” is subjective. While for some people it’s a few minutes, for others it’s milliseconds.
- Smith: Major challenges include the increasing volume of data, dealing with non-English text, ranking information (as opposed to filtering), and figuring out what’s outside your blinders (i.e., the way you are defining things, search terms, etc.).
- Gosier: there is “solution overload” out there. Massive amounts of one-problem solutions pop up to help users make sense of massive amounts of data.
- Hines: There has been a shift from organizations looking inward to in-house data and applying analytics to a focus on data that lies outside the organization.
- Sharp: On the spectrum of real-time, the government takes the narrow view. Being able to respond to a disaster is a lot more urgent than monitoring a brand, where it is generally fine if you find out the next day.
- Gosier: People tend to overestimate how much of the big data they need. You can do more than you think with focused, relevant data.
- How do we anticipate and integrate the billions of people who aren’t yet online and using social media? The panel was generally in agreement that the best thing to do is let those people figure it out themselves. There are a lot of languages in developing countries that “don’t exist” on the web, with very few commonalities in spelling, etc. The best thing to do is let those people crowd-source and arrive at a solution that works for them. Don’t impose it from a different culture, or it won’t work.