Georgia State Archives: Bringing Public Records into the Digital Age
November 22, 2010 at 5:00 pm
By Andy Pitman, Microsoft eGovernment Solutions Business Development Director
With all of the buzz around open government, it's easy to forget that governments have a long history of preserving public records and making them accessible. In many cases, these preservation functions were formalized into national and state archives during the early to mid-20th century. One early example is the Georgia State Archives, established in 1918. For most of its history, the Georgia Archives has focused on preserving paper-based records. Paper-based records remain critically important, but more and more records are born digital, and some never exist in physical form at all.
The Georgia Archives, like almost every other state, local, and national archive, has developed various methods for capturing and maintaining these electronic records. Their processes are often extremely complex because the underlying technologies keep changing. How best to handle electronic records is still an evolving discipline and is beyond the scope of this blog entry; the Library of Congress National Digital Information Infrastructure & Preservation Program (NDIIPP) provides additional information. But however effective an archive's digital records management may be, problems with accessibility persist.
At the same time, public and enterprise search engines have made many other types of electronic content relatively easy to access. Computer users have grown accustomed to simply searching for the content they want and quickly finding it. Search, however, has not yet solved every information access problem.
One major challenge is that few tools exist for searching for specific words within audio or video files. Where such tools do exist, they are typically limited to frequently sought content such as popular movies or song lyrics. Someone often has to do significant work listening to and transcribing each audio or video file, even when speech recognition automates part of the process. That effort might be viable where the content can be monetized (e.g., through advertising within search results or paid search). It is not realistic for most records held by government archives.
For example, recordings of legislative sessions often run many hours and seldom have accompanying transcripts. These recordings undoubtedly contain useful historical information. Yet it's not economically feasible for governments to pay to have them transcribed, nor is it reasonable to expect citizens to listen to entire sessions in hopes of finding relevant information.
The good news is that Microsoft Research has developed the Microsoft Research Audio Video Indexing System (MAVIS), which provides audio indexing and search for very large batches of source audio content. When representatives of the Georgia Archives learned about this capability, they saw its enormous potential for making audio and video records more accessible. In particular, it could make thousands of hours of past state legislative sessions more useful to citizens.
Microsoft Research and the Georgia Archives conducted an informal test to determine whether these legislative recordings could be usefully indexed. Expectations were set low for three reasons:

- The recordings were made in conditions that did little to reduce ambient noise.
- They capture a wide variety of accents.
- No input metadata was used in the indexing process, although this type of metadata can improve MAVIS's indexing accuracy.

But these conditions reflect the reality of Georgia's available legislative recordings and are probably typical of other governments' audio content. Besides, any meaningful index is an improvement over the current situation, in which the audio files are indexed by nothing other than recording date. The test results were not perfect, but they were better than expected. They clearly fulfilled the intended mission of making the legislative recordings more accessible.
You can get an idea of the results by searching these test records at this MAVIS Web page; try searching for terms such as "budget," "Atlanta," or "bill." What's noteworthy is not only that the data has been indexed automatically, but also how the search results are presented. Much like results from a public search engine, each result shows the text preceding and following the search term. The term itself is a hyperlink that takes you directly to the relevant portion of the audio or video file.
The Georgia Archives continues to evaluate audio indexing and search. It plans to extend the test by adding other data (for example, legislators' names) that will make the results even more useful. As these tests progress, we'll keep you updated here on the Bright Side of Government. We also hope that other government archives, as well as other state and local entities, will follow Georgia's lead and consider how audio indexing and search can improve records accessibility for their citizens.