Government Big Data Award Special Mention: US Department of State, Bureau of Consular Affairs

The Government Big Data Solutions Award was established to highlight innovative solutions and to facilitate the exchange of best practices, lessons learned, and creative ideas for addressing Big Data challenges. The top five nominees and the overall winner were announced at Hadoop World in New York City on November 8, 2011.

The Government Big Data Solutions Award Program is coordinated by … The 2011 judging panel included: Doug Cutting, creator of Apache Hadoop and architect at Cloudera; Alan Wade, former CIA and IC CIO; Ryan LaSalle, Director of Accenture Cyber R&D; Ed Granstedt, Senior VP Director of the QinetiQ Strategic Solutions Center; and Chris Dorobek, founder, editor and publisher of …

The top five honorees of the Government Big Data Solutions Award are:

  • USA Search: Hosted search services for more than 500 government sites. Provides search and suggestion services plus analytics dashboards.
  • GCE Federal: Cloud-based financial management solutions.
  • PNNL Bioinformatics: Advancing understanding of health, biology, genetics and computing.
  • SherpaSurfing: A cybersecurity solution that analyzes trends, finds malware, and writes alerts.
  • US Department of State, Bureau of Consular Affairs: Large data set with critically important applications for citizen service and national security.

Because of its critical mission impact and its creative use of legacy architecture, the US Department of State, Bureau of Consular Affairs earned a special mention. The Department of State (DoS) Bureau of Consular Affairs (CA) uses a suite of software applications to collect information on applicants for immigrant visas, nonimmigrant visas, and United States passports at consular posts abroad, domestic processing centers, and government agencies. CA must use this massive amount of data to issue travel documents to U.S. and foreign citizens. To use the data effectively, CA stores it in the Consular Consolidated Database (CCD), a robust, economical, and analytically powerful data platform.

The data stored in the CCD is extensive and complex. Currently, the CCD holds 115 terabytes of information, growing by 6 to 8 terabytes a month. That data is gathered by more than 170 software applications that process travel documents and interface with partner agencies. Data from these applications arrives in different formats and is organized by case rather than by individual. It also includes unstructured data, such as comments and case notes, and images, such as photos and scanned documents. The CCD synthesizes this information and finds all possible identity matches, giving users what they need to make informed decisions, detect and prevent fraud, and identify potential national security threats.
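The post does not describe how the CCD regroups case-organized records by individual. As a hedged illustration of the idea only, the sketch below clusters hypothetical case records that share an exact identifier, using a union-find (disjoint-set) structure. The field names `passport_no` and `fingerprint_id` and the function `group_cases_by_person` are assumptions, not CCD schema; real identity resolution would also need fuzzy and biometric matching.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure used to merge records that share an identifier."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def group_cases_by_person(cases, keys=("passport_no", "fingerprint_id")):
    """Cluster case records that share any identifying key (hypothetical fields)."""
    uf = UnionFind()
    first_seen = {}  # (key, value) -> first case_id carrying that value
    for case in cases:
        uf.find(case["case_id"])  # register every case, even unlinked ones
        for key in keys:
            value = case.get(key)
            if not value:
                continue
            token = (key, value)
            if token in first_seen:
                uf.union(first_seen[token], case["case_id"])
            else:
                first_seen[token] = case["case_id"]
    clusters = defaultdict(list)
    for case in cases:
        clusters[uf.find(case["case_id"])].append(case["case_id"])
    # Sort for deterministic output.
    return sorted(sorted(c) for c in clusters.values())
```

Linking is transitive: if a passport case shares a passport number with an immigrant-visa case and a fingerprint with a nonimmigrant-visa case, all three land in one cluster even though the two visa cases share nothing directly.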

The CCD is also crucial for information sharing among national security agencies. It is the single most important and most frequently used source of data for the Department of Homeland Security (DHS), and it receives hundreds of thousands of queries a month from the DoD and FBI. The CCD has made sharing information such as fingerprints automated, simple, routine, and nearly real-time.

To achieve this performance, the CCD is built on a flexible, scalable architecture. It provides a web-enabled, directly accessible single database platform that is easily integrated with external systems. The CCD centralizes information from 270 consular posts using Oracle Multimaster Replication, making it the “largest connected/replicating database structure in the government,” according to Oracle. At the same time, the CCD saves energy and $1.4 million annually by reducing redundancy in servers, data centers, and storage networks.
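The post does not say how the CCD resolves conflicts when two posts change the same record before replication catches up. One common strategy in multimaster setups is "latest timestamp wins." The sketch below illustrates that rule in plain Python; it is not Oracle's implementation, and the `RowUpdate` shape and `merge_updates` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowUpdate:
    """One replicated change to a row, as seen at some site (hypothetical shape)."""
    row_id: str
    site: str        # consular post or central site that made the change
    timestamp: int   # assumed comparable across sites (e.g. synchronized clocks)
    values: tuple    # the new column values


def merge_updates(updates):
    """Latest-timestamp-wins merge of updates arriving from several master sites.

    Ties on timestamp are broken by site name so every site converges on
    the same result regardless of the order in which updates arrive.
    """
    winners = {}
    for u in updates:
        current = winners.get(u.row_id)
        if current is None or (u.timestamp, u.site) > (current.timestamp, current.site):
            winners[u.row_id] = u
    return {row_id: u.values for row_id, u in winners.items()}
```

The deterministic tie-breaker matters: without it, two sites receiving the same pair of equal-timestamp updates in different orders could settle on different values and never converge.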

Using powerful analytical tools and a set of custom-built services, the CCD performs rapid background and identity checks. By linking a variety of information, from biometrics such as facial recognition to points of contact on applications and strings of unstructured text in notes, it can counteract fraud, see through aliases, and vet individuals against other government databases. It brings all relevant information on an individual together and pre-screens applicants, which makes the system easy to use. In an environment where fragmentation and inefficiency had been the norm, the Consular Consolidated Database breaks the paradigm of data isolated in independent databases, facilitating information sharing, fighting fraud, and helping CA complete its mission effectively and efficiently.
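The post does not describe how the CCD sees through aliases. As an assumption-laden illustration, the sketch below scores whether two applicant records might belong to the same person by combining name similarity (after accent, case, and word-order normalization), date of birth, and shared points of contact. The field names, the functions `normalize_name` and `alias_score`, and the weights are all hypothetical; a real system would also fold in biometric matches such as facial recognition and would tune thresholds against labeled data.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name):
    """Casefold, strip accents, and sort tokens so 'GARCÍA, José' matches 'Jose Garcia'."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = ascii_only.replace(",", " ").casefold().split()
    return " ".join(sorted(tokens))


def alias_score(a, b):
    """Score in [0, 1] that two applicant records describe the same person.

    Weights are illustrative assumptions, not CCD values.
    """
    name_sim = SequenceMatcher(
        None, normalize_name(a["name"]), normalize_name(b["name"])
    ).ratio()
    same_dob = 1.0 if a.get("dob") and a.get("dob") == b.get("dob") else 0.0
    shared_contacts = bool(set(a.get("contacts", ())) & set(b.get("contacts", ())))
    return 0.5 * name_sim + 0.3 * same_dob + 0.2 * float(shared_contacts)
```

A record pair would then be flagged for review when its score exceeds some chosen threshold; the threshold itself is a policy decision the post does not address.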
