Lee: What Gets Redacted in Pacer?

Timothy B. Lee of the Princeton University Department of Computer Science and Center for Information Technology Policy (CITP) has posted “What Gets Redacted in Pacer?” on the CITP’s blog, Freedom to Tinker.

In this post, Mr. Lee reports on research into documents from the U.S. federal courts’ PACER database. Applying customized software to a non-random sample of 1.8 million PACER documents, of which 11,000 appeared to contain redactions, Mr. Lee identifies the types of information most frequently redacted. In this sample, Social Security numbers were the most frequently redacted type of information. Mr. Lee summarizes:

[…][O]ut of 6208 redacted documents, there are 4315 Social Security [numbers] that can be redacted automatically by machine, 449 addresses whose redaction doesn’t seem to be required by the rules of procedure, and 419 “trade secrets” whose release will typically only harm the party who fails to redact it. That leaves around 1000 documents that would expose risky confidential information if not properly redacted, or about 0.05 percent of the 1.8 million documents I started with. A thousand documents is worth taking seriously (especially given that there are likely to be tens of thousands in the full PACER corpus). The courts should take additional steps to monitor compliance with the redaction rules and sanction parties who fail to comply with them, and they should explore techniques to automate the detection of redaction failures in these categories.
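To make the idea of machine redaction concrete, here is a minimal sketch of how Social Security numbers might be detected and masked in plain text. It is not Mr. Lee’s software, which the summary above does not describe in detail. The pattern, the function names and the sample sentence are all assumptions made for illustration, and a real system would also need to extract text from PDF files and handle other number formats.

```python
import re

# Assumed pattern: nine digits in the common 3-2-4 grouping,
# separated by hyphens or spaces. Other formats are not covered.
SSN_PATTERN = re.compile(r"\b\d{3}[- ]\d{2}[- ]\d{4}\b")


def find_ssns(text):
    """Return every substring of text that looks like an SSN."""
    return SSN_PATTERN.findall(text)


def mask_ssns(text):
    """Replace every SSN-like substring with a fixed placeholder."""
    return SSN_PATTERN.sub("XXX-XX-XXXX", text)


# Hypothetical sample sentence, not taken from any PACER document.
sample = "Debtor John Doe, SSN 123-45-6789, resides in Trenton."
print(find_ssns(sample))
print(mask_ssns(sample))
```

A pattern this simple will miss numbers written without separators and may flag other nine-digit identifiers, so it would serve best as a first pass that marks documents for human review.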

Mr. Lee’s post doesn’t appear to explain the discrepancy between the 11,000 documents found to contain redactions and the 6,208 documents described in his statistical analysis.
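The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in this summary; the variable names are illustrative, and the category counts are assumed not to overlap.

```python
# Figures quoted in the summary above.
sample_size = 1_800_000   # documents in the non-random sample
flagged = 11_000          # documents that appeared to contain redactions
redacted_total = 6208     # documents covered by the statistical breakdown
ssn = 4315                # Social Security numbers
addresses = 449           # addresses
trade_secrets = 419       # "trade secrets"

# Documents left after removing the three named categories
# (assumes the categories do not overlap).
remaining = redacted_total - ssn - addresses - trade_secrets

# Share of the full sample, as a percentage.
share = remaining / sample_size * 100

# Documents flagged but absent from the breakdown.
gap = flagged - redacted_total

print(f"Remaining documents: {remaining}")
print(f"Share of sample: {share:.3f} percent")
print(f"Unexplained gap: {gap}")
```

The remainder comes to 1,025 documents, consistent with the quoted “around 1000”, and the unexplained gap is 4,792 documents. The share works out to roughly 0.057 percent, slightly above the quoted “about 0.05 percent”.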

Mr. Lee concludes:

This tiny fraction of PACER documents with confidential information in them is a cause for concern, but it probably isn’t a good reason to limit public access to the roughly 99.9 percent of documents that contain no sensitive information and may be of significant benefit to the public.

For more information, please see the complete post.
