NCI’s Surveillance, Epidemiology, and End Results (SEER) Program in the Division of Cancer Control and Population Sciences (DCCPS) is marking 50 years of cancer surveillance research. Among the major SEER milestones are three new initiatives that are likely to be of particular interest to the data science field:
- Virtual Pooled Registry (VPR). SEER is using technology to remove some of the barriers that researchers face when conducting multisite studies with patient data from SEER registries. In the past, researchers had to submit their protocol and forms to each study site separately for Institutional Review Board (IRB) approval and data access. Now, researchers can use a Templated IRB/Registry Application to apply to all the registry sites at one time, significantly expediting the research review process. The VPR also includes software called Match*Pro that lets researchers automatically match cases across participating registries. SEER currently has 43 registries participating in the VPR, with more being added each year.
- MOSSAIC (Modeling Outcomes using Surveillance data and Scalable Artificial Intelligence for Cancer). In partnership with the Department of Energy (DOE), SEER is using artificial intelligence (AI) and translational AI for advanced surveillance and, ultimately, better cancer care. SEER and DOE are developing several application programming interfaces (APIs) that extract tumor characteristics from electronic pathology reports to aid in predicting cancer diagnosis and improving care. The APIs make it easier to abstract structured data from unstructured pathology reports. For comparison, data extraction using an API was 18,000 times faster than a human performing the same task. This translates to about 65 seconds per report, a significant saving in both time and resources.
- Virtual Tissue Repository (VTR). SEER is scaling up a program aimed at solidifying the infrastructure used in population sampling from pathology labs. Through the VTR Program, SEER’s goal is to give researchers better access to hard-to-find data. This includes data on rare cancers and outcomes (e.g., long-term survival from pancreatic cancer), and data from underreported and underrepresented subgroups of people (who typically aren’t included in clinical trials). By pooling large numbers of cases, SEER will be able to capture rare and valuable data that accurately represent the Nation’s demographics, including African American, Hispanic, and Asian populations.
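To put the MOSSAIC speedup figure in perspective, the arithmetic can be sketched as follows. The source does not say whether the 65 seconds is the human abstraction time or the time saved per report. An 18,000-fold speedup makes the two nearly identical, so this sketch assumes it is the human time. The function names and the batch of 1,000 reports are illustrative only, not NCI figures.

```python
# Assumption: 65 seconds is the time a human abstractor spends per report.
HUMAN_SECONDS_PER_REPORT = 65.0
# From the article: API extraction was 18,000 times faster than a human.
SPEEDUP = 18_000


def api_seconds_per_report(human_seconds: float, speedup: float) -> float:
    """Time the API would need per report under the stated speedup."""
    return human_seconds / speedup


def seconds_saved(human_seconds: float, speedup: float, n_reports: int) -> float:
    """Total time saved over n_reports when the API replaces manual abstraction."""
    per_report_saving = human_seconds - api_seconds_per_report(human_seconds, speedup)
    return n_reports * per_report_saving


api_time = api_seconds_per_report(HUMAN_SECONDS_PER_REPORT, SPEEDUP)
saved_hours = seconds_saved(HUMAN_SECONDS_PER_REPORT, SPEEDUP, 1_000) / 3600

print(f"API time per report: {api_time * 1000:.1f} ms")
print(f"Time saved on 1,000 reports: {saved_hours:.1f} hours")
```

Under this assumption the API spends a few milliseconds per report, so nearly all of the 65 seconds counts as time saved.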
According to Steve Friedman, senior advisor for operations in NCI’s Surveillance Research Program in DCCPS, “Part of our goal at the SEER Program is to improve the completeness and quality of the data we capture, ensuring that the data sets are quite pristine by the time they’re released to the research community.”
He added, “We think we’re on the right path in leveraging new technology to help solve some of the surveillance challenges that we’ve identified—particularly as we look to improve diversity in our data and increase the speed and accuracy in our reporting.”
Read more on NCI’s website.