Hadoop World 2011 is upon us. Once again the community, with the consistent/polite/leading/persistent coordination of Cloudera, has built a great agenda with the right speakers and great attendees for what is now the “must attend” Big Data event.
For those who are attending, I can’t wait to see you there. We are going to be treated to some GREAT keynotes from masterful, community-focused thinkers like Doug Cutting, Hugh Williams, James Markarian, Larry Feinsmith and Mike Olson.
But now we all face a real problem. We have to decide which breakout tracks to attend. This is where it gets hard.
As for me, I decided to attend the sessions that would have a direct impact on my ability to understand the mega trends in the community. I also thought hard about who the readers at CTOvision.com are, as well as who the researchers that hit the CTOlabs.com site are, and decided to attend the tracks that will help me generate good content for those sites. So, below are my picks for the key tracks at Hadoop World. I’ll be at these. Then I’ll try to learn what transpired in others by watching video and downloading graphics as they are made available.
That said, here are my personal recommendations for breakout tracks:
Tuesday 8 Nov Hadoop World Breakouts:
10:15 – Building Realtime Big Data Services at Facebook with Hadoop and HBase.
Jonathan Gray, Software Engineer, Facebook.
Facebook has one of the largest Apache Hadoop data warehouses in the world, primarily queried through Apache Hive for offline data processing and analytics. However, the need for realtime analytics and end-user access has led to the development of several new systems built using Apache HBase. This talk will cover specific use cases and the work done at Facebook around building large scale, low latency and high throughput realtime services with Hadoop and HBase. This includes several significant contributions to existing projects as well as the release of new open source projects.
11:15 – The Hadoop Stack – Then, Now and In The Future
Charles Zedlewski, Vice President, Product, Cloudera.
Many people refer to Apache Hadoop as their system of choice for big data management but few actually use just Apache Hadoop. Hadoop has become a proxy for a much larger system which has HDFS storage at its core. The Apache Hadoop based “big data stack” has changed dramatically over the past 24 months and will change even more over the next 24 months. This session will explore the trends in the evolution of the Hadoop stack, changes in architecture and changes in the kinds of use cases that are supported. It will also review the role of interoperability and cohesion in the Apache Hadoop stack and the role of Apache Bigtop in this regard.
Bob’s thoughts: Charles Zedlewski is one of the smartest engineers I have spoken with. And this breakout is hitting a topic very important for us all to understand. “Hadoop has become a proxy for a much larger system” is an important point. When you use that term, what do you mean? I want to hear Charles’ views on that.
1:15 – Hadoop Trends & Predictions
Vanessa Alverez, Analyst, Infrastructure and Operations, Forrester.
Hadoop is making its way into the enterprise, as organizations look to extract valuable information and intelligence from the mountains of data in their storage environments. The way in which this data is analyzed and stored is changing, and Hadoop has become a critical part of this transformation. In this session, Vanessa will cover the trends we are seeing in the enterprise with regard to Hadoop adoption and how it’s being used, as well as predictions on where we see Hadoop, and Big Data in general, going as we enter 2012.
Bob’s thoughts: Vanessa Alverez is smart and the trends she highlights will be worth noting and learning from.
2:15 – Life in Hadoop Ops – Tales From the Trenches.
Panel with: Eric Sammer, Solutions Architect and Training Instructor, Cloudera; Gregory Baker, Lead Software Engineer, AT&T Interactive; Karthik Ranganathan, Software Engineer, Facebook; and Nicholas Evans, System Engineer, AOL Advertising.
This session will be a panel discussion with experienced Hadoop Operations practitioners from several different organizations. We’ll discuss the role, the challenges and how both of these will change in the coming years.
Bob’s thoughts: I have found the only way you can pick a “Big Data” project team with confidence is to pick someone who has done it before. This is also the right way to learn lessons: learn from those who have really built solutions before.
3:30 – The Hadoop Award for Government Excellence.
Bob Gourley, CTO, Crucial Point LLC.
Federal, State and Local governments and the development community surrounding them are busy creating solutions leveraging the Apache Foundation Hadoop capabilities. This session will highlight the top five picked by an all-star panel of judges. Who will take home the coveted Government Big Data Solutions Award for 2011? This presentation will also highlight key Big Data mission needs in the federal space and provide other insights which can fuel solutions in the sector.
Bob’s thoughts: I have to be there, of course. I would appreciate it if you are there too, so you can let me know how it goes and so you can help build better systems for the federal space.
4:30 – I Want to Be BIG – Lessons Learned at Scale.
David “Sunny” Sundstrom, Director, Software Products, SGI.
SGI has been a leading commercial vendor of Hadoop clusters since 2008. Leveraging SGI’s experience with high performance clusters at scale, SGI has delivered individual Hadoop clusters of up to 4000 nodes. In this presentation, through the discussion of representative customer use cases, you’ll explore major design considerations for performance and power optimization, how integrated Hadoop solutions leveraging CDH, SGI Rackable clusters, and SGI Management Center best meet customer needs, and how SGI envisions the needs of enterprise customers evolving as Hadoop continues to move into mainstream adoption.
Bob’s thoughts: SGI does build systems that scale. And with CDH they are building for speed. This will be a great session.
Wednesday 9 Nov Hadoop World Breakouts:
10:00 – Preview of the New Cloudera Management Suite.
Henry Robinson, Software Engineer, Cloudera, Phil Zeyliger, Software Engineer, Cloudera, and Vinithra Varadharajan, Software Engineer, Cloudera.
This session will preview what is new in the latest release of the Cloudera Management Suite. We will cover the common problems we’ve seen in Hadoop management and will do a demonstration of several new features designed to address these problems.
Bob’s thoughts: Enterprise technologists need management capabilities. The only one that is really enterprise ready is the Cloudera Management Suite. This session will be a great way to learn more.
11:00 – Leveraging Hadoop to Transform Raw Data into Rich Features at LinkedIn.
Abhishek Gupta, Software Engineer, Recommendation Engine, LinkedIn.
This presentation focuses on the design and evolution of the LinkedIn recommendations platform. It currently computes more than 100 billion personalized recommendations every week, powering an ever growing assortment of products, including Jobs You May be Interested In, Groups You May Like, News Relevance, and Ad Targeting. We will describe how we leverage Hadoop to transform raw data to rich features using knowledge aggregated from LinkedIn’s 100 million member base, how we use Lucene to do real-time recommendations, and how we marshal Lucene on Hadoop to bridge offline analysis with user-facing services.
Bob’s thoughts: LinkedIn is one of the great use cases we should all understand better.
1:00 – How Hadoop is Revolutionizing Business Intelligence and Advanced Data Analytics.
Dr. Amr Awadallah, Co-founder and CTO, Cloudera.
The introduction of Apache Hadoop is changing the business intelligence data stack. In this presentation, Dr. Amr Awadallah, chief technology officer at Cloudera, will discuss how the architecture is evolving and the advanced capabilities it lends to solving key business challenges. Awadallah will illustrate how enterprises can leverage Hadoop to derive complete value from both unstructured and structured data, gaining the ability to ask and get answers to previously un-addressable big questions. He will also explain how Hadoop and relational databases complement each other, enabling organizations to access the latent information in all their data under a variety of operational and economic constraints.
Bob’s thoughts: Every time I have ever heard Amr talk I have learned something new and I can’t wait to hear what he has this time.
2:00 – Practical Knowledge for Your First Hadoop Project.
Boris Lublinsky, Principal Architect, NAVTEQ, Mark Slusar, Manager of Location Content, NAVTEQ and Mike Segel, NAVTEQ.
A collection of guidelines and advice to help a technologist successfully complete their first Hadoop project. This presentation is based on our experiences in initiating and executing several successful Hadoop projects. Part 1 focuses on tactics to “sell” Hadoop to stakeholders and senior management, including understanding what Hadoop is and what its “sweet spots” are, alignment of goals, picking the right project, and level setting expectations. Part 2 provides some recommendations on running a successful Hadoop development project. Topics covered include preparation & planning activities, training and preparing development teams, development & test activities, and deployment & operations activities. Also included are talking points to help with educating stakeholders.
Bob’s thoughts: This could be the most useful session of the whole event for enterprise CTOs seeking to think through all aspects of a project. Seems like you can learn lessons from others who have done great things by attending this session.
3:20 – SHERPASURFING – Open Source Cyber Security Solution.
Wayne Wheeles, Cyber Security Defensive Analytic Developer, Novii Design.
Every day billions of packets, most benign and some malicious, flow in and out of networks. Every day it is an essential task for the modern Defensive Cyber Security Organization to be able to reliably survive the sheer volume of data, bring the NETFLOW data to rest, enrich it, correlate it and perform analytics on it. SHERPASURFING is an open source platform built on the proven Cloudera’s Distribution for Apache Hadoop that enables organizations to perform the Cyber Security mission at scale and at an affordable price point. This session will include an overview of the solution and components, followed by a demonstration of analytics.
Bob’s thoughts: This award winning capability is already making a dramatic positive difference. I’ve seen Wayne brief and know he is a powerful thinker. I believe this talk will help us all understand how to more widely deploy CDH in support of cyber security.
4:20 – Indexing the Earth – Large Scale Satellite Image Processing Using Hadoop.
Oliver Guinan, Vice President, Ground Systems, Skybox Imaging.
Skybox Imaging is using Hadoop as the engine of its satellite image processing system. Using CDH to store and process vast quantities of raw satellite image data enables Skybox to create a system that scales as they launch larger numbers of ever more complex satellites. Skybox has developed a CDH-based framework that allows image processing specialists to develop complex processing algorithms using native code and then publish those algorithms into the highly scalable Hadoop Map/Reduce interface. This session will provide an overview of how we use HDFS, HBase and Map/Reduce to process raw camera data into high resolution satellite images.
Bob’s thoughts: My friends at the National Geospatial-Intelligence Agency and the Geospatial Intelligence Foundation will be very interested in this one. I’ll take notes and let them know how it goes.