
LAJC Expungement Project

In late 2020, we partnered with the Legal Aid Justice Center (LAJC) to help them in their fight to expand criminal record expungement reform in Virginia.

What is expungement?

Expungement is the sealing of police and court records from public view for individuals who have been arrested and charged with a crime. These records are NOT destroyed: they are removed from public access and can be seen only with court permission (for example, by a law enforcement officer).

In Virginia, expungement has traditionally been available in very limited circumstances and required the individual charged to go through a formal process to request expungement. Consequently, only a very small percentage of criminal records have been sealed from public view.

LAJC | Looking Forward

In 2021, the LAJC helped the Virginia legislature pass legislation that expanded the definition of “expungeable” and enabled automatic expungement in limited circumstances. This was a great win for criminal record expungement reform and for people harmed by having these records publicly searchable.

Despite these legislative efforts, a very large number of criminal records are still not expungeable and remain for future consideration. The LAJC is continuing its fight to expand the definition of “expungeable”, whether automatic or by request, even further.

C4C | Our Efforts

The LAJC told us about a tool and dataset, built by a fellow Code for America volunteer, that aggregates this publicly accessible data (discussed below). Code for Cville is working with the LAJC to run analyses on this dataset to help the LAJC advocate for additional expungement reform.

Expungement Analysis

Currently, we are developing a system for translating the arcane language of the law into rules we can apply to the data, to see how many existing criminal records are now expungeable under the new definition. Our goal is to show which changes to the law would have the greatest impact on the number of expungeable records, to inform future action.
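As a rough illustration of what encoding the law as code can look like, the sketch below expresses eligibility as small rule functions over charge records, then counts how many records each variant of the rules would make eligible. The record fields (`disposition`, `charge_class`, `years_since_disposition`) and the rules themselves are hypothetical simplifications chosen for illustration; they are not Virginia's actual expungement statute or our actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ChargeRecord:
    # Hypothetical, simplified fields; real OCIS-derived records have many more.
    disposition: str              # e.g. "Dismissed", "Guilty", "Nolle Prosequi"
    charge_class: str             # e.g. "Misdemeanor", "Felony"
    years_since_disposition: int


# A rule is a predicate: does this record qualify under that rule?
Rule = Callable[[ChargeRecord], bool]


def non_conviction(record: ChargeRecord) -> bool:
    """Hypothetical rule: charges that did not end in a conviction."""
    return record.disposition in {"Dismissed", "Nolle Prosequi", "Not Guilty"}


def old_misdemeanor(min_years: int) -> Rule:
    """Hypothetical rule factory: misdemeanor convictions at least min_years old."""
    def rule(record: ChargeRecord) -> bool:
        return (record.charge_class == "Misdemeanor"
                and record.disposition == "Guilty"
                and record.years_since_disposition >= min_years)
    return rule


def count_eligible(records: Iterable[ChargeRecord], rules: list[Rule]) -> int:
    """Count records that satisfy at least one rule in a policy."""
    return sum(1 for r in records if any(rule(r) for rule in rules))


# Made-up sample records, not real data.
records = [
    ChargeRecord("Dismissed", "Felony", 2),
    ChargeRecord("Guilty", "Misdemeanor", 8),
    ChargeRecord("Guilty", "Misdemeanor", 4),
    ChargeRecord("Guilty", "Felony", 12),
]

# "Tweak" one parameter of the law and compare the impact of each variant.
for years in (7, 5, 3):
    policy = [non_conviction, old_misdemeanor(years)]
    print(f"misdemeanor wait of {years} years: {count_eligible(records, policy)} eligible")
```

With the sample records, the 7- and 5-year waiting periods each make 2 records eligible and the 3-year variant makes 3. Keeping each rule as a separate function makes it easy to swap a single provision in or out and measure its effect in isolation.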

Racial Disparity Analysis

This data also lets us look at racial disparities across the state and across criminal codes. With guidance from the LAJC, we have built tools and analyses to help decide where to focus its advocacy work.

One of our brigade volunteers, Amanda West, built a dashboard showing which counties have had the largest racial disparities in marijuana charges over time. She built it after analysis of the data showed that possession and manufacture of a controlled substance were the second- and third-most-common charges in Circuit Courts in 2019, and because marijuana legalization and expungement of marijuana-related records had recently passed. Another brigade member, James Bennett, built a tool showing which counties have the largest racial disparities across charges in general. We designed these tools with targeting in mind: the data lets us point them at whichever criminal charges the LAJC wants to explore further. With the LAJC's guidance, we can iterate on our research as their advocacy continues to grow and evolve.
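The text does not say which disparity measure the dashboards use. One common choice is a rate ratio: charges per 1,000 residents for one racial group divided by the same rate for a reference group, where 1.0 means parity. The sketch below assumes that metric, with hypothetical per-county charge counts and population figures, and ranks counties by it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountyCounts:
    """Hypothetical per-county inputs comparing one racial group to a reference group."""
    county: str
    group_charges: int           # e.g. charges against Black residents
    group_population: int
    reference_charges: int       # e.g. charges against white residents
    reference_population: int


def rate_per_1000(charges: int, population: int) -> float:
    """Charges per 1,000 residents of a group."""
    return 1000 * charges / population


def disparity_ratio(c: CountyCounts) -> float:
    """Group charge rate divided by reference charge rate (1.0 means parity)."""
    group_rate = rate_per_1000(c.group_charges, c.group_population)
    reference_rate = rate_per_1000(c.reference_charges, c.reference_population)
    return group_rate / reference_rate


def rank_counties(counties: list[CountyCounts]) -> list[tuple[str, float]]:
    """Sort counties from largest to smallest disparity ratio."""
    scored = [(c.county, round(disparity_ratio(c), 2)) for c in counties]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up numbers purely to exercise the code; not real Virginia data.
sample = [
    CountyCounts("County A", group_charges=30, group_population=10_000,
                 reference_charges=40, reference_population=40_000),
    CountyCounts("County B", group_charges=10, group_population=5_000,
                 reference_charges=50, reference_population=25_000),
]
print(rank_counties(sample))  # [('County A', 3.0), ('County B', 1.0)]
```

Filtering the input records to a single charge (for example, marijuana possession) before computing the counts is what lets the same calculation be pointed at different charges. A county with zero charges in the reference group makes the ratio undefined; this sketch raises `ZeroDivisionError` in that case, so real data would need explicit handling for small counties.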

About the data:


The Commonwealth of Virginia hosts a web tool called the Online Case Information System (OCIS) that allows anyone to search district and circuit court records for information about case dates, locations, results, charges, and much more. Before mid-2019, searching for a record required the specific date of the hearing and the courthouse where the case was heard. This obstacle made it nearly impossible to use the system for any large-scale data analysis.

A member of a sister brigade of Code for Cville, Ben Schoenfeld of Code for Hampton Roads, was contacted by a journalist who asked whether it would be possible to make this system more searchable for a story they were working on. In response, Ben built a tool that regularly scrapes the data from OCIS, anonymizes it, and aggregates it for journalists, researchers, and non-profit organizations to use.


Various ethical concerns must be considered when working with this kind of data.


Ben has done a great deal of work to anonymize this data, and rightfully so: he did not want to create a system that made it easier to run criminal background checks on anyone.

The raw data from OCIS includes the defendant’s full name and the month and day of birth, but not the birth year. In the dataset that Ben provides publicly, names and dates of birth have been scrubbed. Instead, Ben provides anonymized IDs, generated by treating records with the same name and birth month/day combination as belonging to the same person. This imperfect solution allows approximate analysis at the level of individuals without making it easier for anyone to search for data about a specific person.
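The text describes what the anonymized IDs group together, not how they are computed. The sketch below shows one plausible approach under our own assumptions: normalize the name, combine it with the birth month and day, and pass the result through a keyed hash (HMAC-SHA256) with a secret key held by the dataset maintainer. The key, the normalization, and the 16-character truncation are all hypothetical choices.

```python
import hashlib
import hmac

# Hypothetical secret held only by the dataset maintainer. With an unkeyed hash,
# anyone could hash a known person's name and birthday and find their ID.
SECRET_KEY = b"replace-with-a-long-random-secret"


def anonymized_id(full_name: str, birth_month: int, birth_day: int) -> str:
    """Map (name, birth month/day) to a stable pseudonymous ID."""
    # Normalize case and whitespace so trivially different spellings match.
    normalized_name = " ".join(full_name.upper().split())
    key_material = f"{normalized_name}|{birth_month:02d}-{birth_day:02d}"
    digest = hmac.new(SECRET_KEY, key_material.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


a = anonymized_id("Doe, Jane", 3, 14)
b = anonymized_id("DOE,  JANE ", 3, 14)
c = anonymized_id("Doe, Jane", 3, 15)
print(a == b, a == c)  # True False
```

This also shows why the approach is imperfect: two different people who share a name and birthday receive the same ID, while one person whose name is recorded differently across cases (for example, with and without a middle initial) receives two.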

Expunged Records

The core purpose of record expungement is another consideration.

If someone has their record expunged in the Virginia system, the record will not automatically be removed from Ben’s scraped dataset. Ben has anonymized the record for public consumption, but the data is now stored in more than one location, which increases the chance that it will one day be leaked.

Ben has considered this, and a future solution might be to rerun scrapes periodically, look for records that are no longer available, and mark them for removal. However, because data scraping is so resource-intensive, this is not currently a viable solution.
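The removal idea described above amounts to a set difference between two scrapes. The sketch below assumes each case is identified by a hypothetical (court, case number) key and that the new scrape covers the same courts and date range as the old one; neither detail comes from the actual dataset.

```python
# Hypothetical case key: (court, case number). Not the real dataset's schema.
CaseKey = tuple[str, str]


def records_to_remove(previous: set[CaseKey], current: set[CaseKey]) -> set[CaseKey]:
    """Cases seen in an earlier scrape that OCIS no longer returns."""
    return previous - current


def purge(dataset: dict[CaseKey, dict], missing: set[CaseKey]) -> dict[CaseKey, dict]:
    """Return a copy of the dataset without the cases flagged as missing."""
    return {key: row for key, row in dataset.items() if key not in missing}


# Made-up keys for illustration.
previous = {("Circuit A", "CR19-1"), ("Circuit A", "CR19-2"), ("District B", "GC19-7")}
current = {("Circuit A", "CR19-1"), ("District B", "GC19-7")}

missing = records_to_remove(previous, current)
dataset = {key: {"charge": "example"} for key in previous}
cleaned = purge(dataset, missing)
print(sorted(missing))  # [('Circuit A', 'CR19-2')]
```

A case can disappear from a rescrape for reasons other than expungement, such as a failed request, so flagged cases might be better queued for review than deleted outright. The expensive part is producing `current`, which requires rescraping everything, not the comparison itself.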

At Code for Cville, we host only the anonymized versions of this data for our analysis, but we constantly consider how this data could be misused and make every effort to use and store the data made available to us ethically.

As mentioned, searching OCIS before mid-2019 was very difficult. In mid-2019, Virginia quietly rolled out OCIS 2.0, which allows anyone to search the entire* state’s court records by a person’s name. This makes background checks and searches for an individual’s records more transparent, but a tool like Ben’s is still needed for any large-scale data analysis.


[Image: screenshot of the race and records data]

This is a first look at a couple of views from James Bennett's dashboard, built in Streamlit, showing a breakdown by crime and a breakdown by district.

The app is deployed on Streamlit's sharing platform, and its code is available on GitHub.

Amanda West's view of racial disparity in Virginia marijuana charges is also available in Tableau: !/vizhome/Marijuana_Disparities_Virginia/FinalDashboard

Project notes

Dashboards from databases - 2021-08-10 We delivered a first report on the data to LAJC…
Postgres live & python boilerplate - 2021-08-06 The expungement data has moved over into a…
Working with Random Forest - 2021-08-02 https…