6 Steps to Get Started With Big Data

Since the founding of our nation, government has been in the business of data collection. From the first census in 1790 to today’s hiring of chief data officers (CDOs), data has been essential to America becoming a leader in the global economy.

A recent report by the Department of Commerce highlighted how government data could potentially guide trillions of dollars in investments, and how data can improve the quality of services and decisions made by public sector leaders.

Today, big data technology represents an unparalleled opportunity to transform the public sector. By deploying tools to extract knowledge from information, agencies can gain insights on how to improve processes and government services, or even combat crime and disease. But to capitalize on big data, organizations must deploy advanced IT solutions to manage, store and extract value from the information.

That’s why I was intrigued by a recent report by CTOlabs.com, Enhancing Functionality and Security of Enterprise Data Holdings: Examining new mission-enabling design patterns made possible by the Cloudera-Intel Partnership. The report is a reminder of the importance of partnering with industry and investing in modernized infrastructure, which includes the enterprise data hub (EDH).

With an EDH, data can be stored safely and securely in its original fidelity, and many types of computing capabilities, such as batch processing, interactive SQL, enterprise search and advanced analytics, can be brought together and applied directly to this data. These data and computing functions can also be integrated with existing infrastructures and tools. This is important because this kind of data architecture lets you create a broad infrastructure that supports various kinds of workloads with shared resources and data.

Serving as the foundation for big data and the critical core to the success of the EDH is Apache Hadoop, a 100% open source solution for storing and processing data. Yet Hadoop alone might not meet the needs for true agency- and enterprise-wide adoption, including compliance-ready security and governance.

The report finds that with Cloudera, many of these challenges have been solved, and that Cloudera can help organizations deploy Hadoop in a way that takes full advantage of the possibilities and power afforded by an EDH.

But where can you begin? The report offers six considerations for getting started with an EDH and big data, highlighted below.

  1. Understand and focus on current use cases: “Our review of workloads above should help planners identify and clarify the most important/prioritized use cases as design goals for your project. Determining the prioritized data flows for the first use cases will help ensure success on a project is demonstrated early.”
  2. Ensure the design focuses on output: “Identify the analytical queries and algorithms required to generate desired outputs. This will enable the capturing of the advanced analytics requirements and interactive query needs the system must meet and ultimately dictate the rollout of a converged computing strategy.”
  3. Assess your business rules for operating over the data and interacting with the solution: “Agencies now have the capability to encrypt 100% of data and to assign and control access and audit it in new ways. And the applications and solutions that run over this data can also have access controlled by this end-to-end ”
  4. Plan for future expansion of use cases: “First successes will be measured based on how well they meet current agency needs. But the power of a well-engineered enterprise data hub is that it can support many new use cases and future workloads. The key action in planning for expansion is to listen to the challenges faced by mission owners, and be prepared to iteratively incorporate into the solution new workloads and new data flows provided by them.”
  5. Consider the full design: “Consider compute, networking, data storage, and the software framework together as the data platform.”
  6. Be sure to ask for design help: “Repeatable patterns from other enterprises are available for reference. Engineers from Cloudera, Intel, and their partners can help refine and turn functional reference architecture into a technical design that will rapidly bring new functionality to the agency mission.”

Big data is changing the way that government operates, and is helping organizations to dramatically re-imagine how services are delivered. As organizations continue to explore big data, it will become imperative that they have an EDH architecture to power their big data programs.

Read the report to learn more.



Cloudera is revolutionizing enterprise data management with the first unified Platform for Big Data: The Enterprise Data Hub. Cloudera offers enterprises one place to store, process and analyze all their data, empowering them to extend the value of existing investments while enabling fundamental new ways to derive value from their data. Learn more here.


