This article was written in August for GovLoop’s “Your Data in the Year of Everything Else.” The report can be downloaded here.
Every week, Dr. Susan Gregurick changes her videoconferencing background to a favorite photo from vacations past. The superimposed digital image offers a moment of pre-pandemic serenity for the Associate Director for Data Science at the National Institutes of Health (NIH). Since the start of the pandemic, Gregurick has been working seven days a week to direct and coordinate data efforts related to the many COVID-19 studies, clinical trials and tests under NIH’s umbrella.
On the day of her interview with GovLoop, Gregurick’s backdrop is a restaurant on Sanibel Island, Florida, whose walls are covered inch by inch with an eclectic assemblage of framed photos and artwork. Gregurick remembers that the eye-catching décor captivated her children, who were young at the time of the trip.
“I have two kids at home [now], and they’re both young adults,” Gregurick said. “But you know, they’re only going to tolerate this life for so long before they themselves get a little antsy.”
In addition to the everyday COVID-19 challenges of balancing family, pets and technological needs in her remote setup, Gregurick is tasked with coordinating the data activities of eight coronavirus-focused NIH teams, each with 50 or more dedicated NIH staffers. Each team has a specific COVID-19 focus, such as developing rapid diagnostics, creating data infrastructure for near-real-time clinical data or understanding multisystem inflammatory syndrome in children.
One such team is the Rapid Acceleration of Diagnostics (RADx) program, which aims to speedily scale up the number of available tests and make them faster, easier to use and more accurate. The RADx teams award grants to academic labs and companies, a familiar practice for NIH research.
The NIH approach to managing data during COVID-19 is based on a hub-and-spoke model, Gregurick said. Awarded projects, the spokes, pass data to higher-level data coordination centers, the hubs, which pass it on in turn until it reaches a data aggregator at the top. For RADx data, that final aggregator is the COVID-19 RADx Data Hub, and data undergoes quality assurance and control along the way. The approach aims to make COVID-19 data available to the research, scientific and medical communities.
“Coordination centers basically serve as a hub for data and information, and the grantees are the spokes and they feed that data in,” Gregurick said. “Those coordination centers then become a spoke for a bigger hub. So, it’s sort of building up concentric circles of information.”
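The nesting Gregurick describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NIH's implementation: the `Node` class, the `has_required_fields` quality check, the node names and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A grantee project (a spoke) or a coordination center (a hub)."""
    name: str
    records: list = field(default_factory=list)  # data produced at this node
    spokes: list = field(default_factory=list)   # nodes that feed into this one

    def collect(self, passes_qc):
        """Return this node's records plus everything its spokes pass up,
        keeping only records that pass the quality check."""
        collected = [r for r in self.records if passes_qc(r)]
        for spoke in self.spokes:
            collected.extend(spoke.collect(passes_qc))
        return collected


def has_required_fields(record):
    # Hypothetical quality check: a participant ID and a recognized result.
    return (record.get("participant_id") is not None
            and record.get("result") in {"positive", "negative"})


# Made-up grantees feeding a coordination center, which feeds a top-level hub.
lab_a = Node("lab_a", records=[{"participant_id": "p1", "result": "positive"}])
lab_b = Node("lab_b", records=[{"participant_id": None, "result": "negative"},
                               {"participant_id": "p2", "result": "negative"}])
center = Node("coordination_center", spokes=[lab_a, lab_b])
data_hub = Node("data_hub", spokes=[center])

all_records = data_hub.collect(has_required_fields)  # bird's-eye view
lab_a_only = lab_a.collect(has_required_fields)      # focused view
```

Because a hub is simply a node whose spokes may themselves be hubs, the same `collect` call works at any level, which is what allows both the focused and the aggregate view.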
The advantage of NIH’s hub-and-spoke model is that data can be examined up close or from a bird’s-eye view. NIH researchers can collate COVID-19 data on very specific questions, such as lung capacity in teenagers, or examine the broader health information of participants who’ve been tested.
The reason NIH can aggregate and analyze so much data from different sources is the standardization and frameworks put into place before data is ever input. Grantees must meet data use, standards and sharing requirements that the teams define. Privacy standards are being set up, Gregurick said.
Gregurick has helped lead a trans-NIH working group to establish common data elements. With these foundations, NIH can use algorithms to identify individuals whose records appear in more than one system. Doing so is important to understanding the many different health factors that COVID-19 affects.
“If you’re Jane Doe, and you’re in an All of Us study, and you’re also in a clinical trial, we can identify you as the same person, and therefore we can make sure that the data is linked to that one person,” Gregurick said.
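The article does not say which linkage algorithms NIH uses. As a rough sketch of the general idea, the Python below links records from two sources by hashing a few normalized common data elements. The field names, the choice of SHA-256 and the sample records are assumptions made for illustration, and a production system would need stronger privacy protections, such as salted or keyed hashing.

```python
import hashlib
from collections import defaultdict


def link_key(record):
    """Build a linkage key from normalized common data elements."""
    parts = [
        record["first_name"].strip().lower(),
        record["last_name"].strip().lower(),
        record["date_of_birth"],  # assumed to be ISO format, e.g. "1980-01-31"
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def link_records(*sources):
    """Group records from any number of sources by their linkage key."""
    linked = defaultdict(list)
    for source in sources:
        for record in source:
            linked[link_key(record)].append(record)
    return linked


# Made-up records: the same person entered slightly differently in two systems.
cohort_study = [
    {"first_name": "Jane", "last_name": "Doe", "date_of_birth": "1980-01-31",
     "source": "cohort_study"},
]
clinical_trial = [
    {"first_name": " jane", "last_name": "DOE", "date_of_birth": "1980-01-31",
     "source": "clinical_trial"},
    {"first_name": "John", "last_name": "Roe", "date_of_birth": "1975-06-02",
     "source": "clinical_trial"},
]

linked = link_records(cohort_study, clinical_trial)
jane_records = linked[link_key(cohort_study[0])]
```

Agreeing on common data elements up front is what makes a key like this possible: if each source recorded names or dates in its own format, the normalized values would not match.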
COVID-19 came to the U.S. with plenty of unknowns. In March 2020, many within the medical and research communities mistakenly believed that COVID-19 was dangerous only to elderly populations and affected only the respiratory system. At the federal level, major testing and research initiatives often trailed the virus’s arrival by months. The RADx program, for example, began April 29, 2020, following congressional appropriations.
In a climate of continuing uncertainty, one challenge NIH faces is that data repositories are focused on specific diseases or data types, complicating data-sharing for COVID-19, which has a wide range of health effects across populations.
That problem has been compounded because data-sharing agreements were not in place before COVID-19, a challenge many agencies encountered during the pandemic.
“The real problem is that when you create a data platform that’s very specific for a particular mission, you don’t necessarily think about the transaction of all the different data, like aging and child health, and heart, lung and blood missions that could be a part of COVID,” Gregurick said. “COVID makes you realize that it’s a disease that has many different important contributors.”
Since those initial delays, NIH research has examined COVID-19’s effects on the heart and blood, and has included studies of other demographics, including mothers and children.
Gregurick pointed out that all of these efforts are driven by a common mission: to save lives. With a vaccine and the right response, the pandemic itself will fade, but its long-term health effects will stay with those who contracted and survived the virus.
Interoperable, nuanced data will be vital to treating their conditions.
“We’ll be out of this soon, hopefully. We’ll be out of this particular situation. But the people who’ve gotten sick and who’ve recovered may have lifelong or some periods of time of health issues,” Gregurick said. “We do need this data to understand their health.”