Analytics Software Developer Tackles the Problem of Coronavirus Data Modeling

This team took up the challenge of providing local officials and healthcare providers with the accurate models they need for their regions.

COVID-19 Data Modeling

The accuracy of coronavirus data models, including models of case progression, infection, fatality and resource use, has been a favorite topic of debate in the media, and it has a serious impact on the people watching the pandemic spread. Michael O’Connell, Chief Analytics Officer of TIBCO, explains, “With exponential growth rates of transmission come exponential errors. When predictive models are uncertain, it is no wonder there is a wide range of human reactions, from fear to indifference.”

O’Connell says the problem of modeling coronavirus is difficult for several reasons. For example:

  • COVID-19 has a long incubation period: People without symptoms can still be infectious. “Because of this, it is difficult to track how many infected people there truly are. This is only exacerbated by the lack of testing available in other parts of the world,” he says.
  • Calculating case fatality rate (CFR) is tricky: O’Connell explains that the best way to calculate CFR is to track a large group of people from the point when they develop symptoms until they recover or die, and then calculate the proportion of that group who die from COVID-19. “While this is ideal, it is not possible in practice,” he says. The way the illness progresses makes both short- and long-term infection scenarios difficult to anticipate. As a result, some people simply divide the total number of deaths by the total number of cases, but this doesn’t account for unreported cases or for the delay between falling ill and dying. “The infection fatality rate (IFR) depends on estimates of the number of infections. We’ve seen a wide variety of variability in estimates of both CFR and IFR,” he comments.
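To make the CFR pitfall concrete, here is a minimal Python sketch comparing the naive ratio (total deaths over total cases) with a crude lag-adjusted ratio that divides the latest deaths by the cases confirmed a fixed number of days earlier. The function names, the 3-day lag and the doubling case counts are hypothetical illustrations, not TIBCO's method or real data.

```python
from typing import Sequence


def naive_cfr(cum_deaths: Sequence[int], cum_cases: Sequence[int]) -> float:
    """Total deaths divided by total confirmed cases on the latest day."""
    return cum_deaths[-1] / cum_cases[-1]


def lag_adjusted_cfr(cum_deaths: Sequence[int], cum_cases: Sequence[int], lag_days: int) -> float:
    """Divide the latest cumulative deaths by cumulative cases `lag_days` earlier.

    A crude correction for the delay between falling ill and dying; the lag is
    an assumed input, not something this function estimates.
    """
    if not 0 <= lag_days < len(cum_cases):
        raise ValueError("lag_days must be between 0 and len(cum_cases) - 1")
    return cum_deaths[-1] / cum_cases[-1 - lag_days]


# Hypothetical cumulative counts for an outbreak whose cases double every day.
cum_cases = [100, 200, 400, 800, 1600, 3200, 6400]
cum_deaths = [0, 1, 2, 4, 8, 16, 32]

print(naive_cfr(cum_deaths, cum_cases))            # 0.005
print(lag_adjusted_cfr(cum_deaths, cum_cases, 3))  # 0.04
```

While cases are growing, the naive ratio understates the share of cases that will eventually be fatal, because many recent cases have not yet had time to resolve. Neither version corrects for unreported cases, which is the gap IFR estimates try to close.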

How TIBCO’s Model is Different

TIBCO recognized the challenges that modeling the coronavirus posed. Its model, which tracks the local spread and impact of the COVID-19 pandemic in real time, uses TIBCO Spotfire analytics software for visual analytics, TIBCO Data Virtualization (TDV) to federate, manage, and govern the underlying data as they are refreshed on a rolling basis, and TIBCO Data Science for predictive modeling.

“We are polling Johns Hopkins’ COVID-19 site, Our World in Data and local public health department websites on an hourly basis,” O’Connell says. “From these polling operations, we are populating a Postgres database with the federated data. We are also incorporating the data from regional healthcare organizations, partners and via our own data collections. This brings data on interventions, testing sites and hospital locations and capacities, to supplement the analysis.”
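The polling-and-loading step can be sketched in Python. This is a minimal illustration under stated assumptions, not TIBCO's pipeline: it uses the standard-library `sqlite3` module as an in-memory stand-in for Postgres, a small hypothetical CSV in the wide layout of the Johns Hopkins CSSE time-series files instead of a live download, and a hypothetical `daily_cases` table. It reshapes the one-column-per-date layout into one row per region and day and upserts it, so that a repeated poll updates revised counts rather than duplicating them.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical sample in the wide layout used by the Johns Hopkins CSSE
# time-series CSVs: one row per region, one column of cumulative cases per date.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,3/1/20,3/2/20,3/3/20
,Exampleland,10.0,20.0,10,12,15
North,Otherland,30.0,40.0,5,5,9
"""

ID_COLUMNS = {"Province/State", "Country/Region", "Lat", "Long"}


def wide_to_long(csv_text):
    """Yield one (country, province, iso_date, cum_cases) tuple per region and date."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            if column in ID_COLUMNS:
                continue
            day = datetime.strptime(column, "%m/%d/%y").date().isoformat()
            yield row["Country/Region"], row["Province/State"], day, int(value)


def refresh(conn, csv_text):
    """Load one polled snapshot; re-running it replaces rows instead of duplicating them."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS daily_cases (
               country TEXT, province TEXT, day TEXT, cum_cases INTEGER,
               PRIMARY KEY (country, province, day))"""
    )
    # SQLite upsert; in Postgres the equivalent is INSERT ... ON CONFLICT ... DO UPDATE.
    conn.executemany(
        "INSERT OR REPLACE INTO daily_cases VALUES (?, ?, ?, ?)", wide_to_long(csv_text)
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
refresh(conn, SAMPLE_CSV)
refresh(conn, SAMPLE_CSV)  # a second "hourly" poll of unchanged data
print(conn.execute("SELECT COUNT(*) FROM daily_cases").fetchone()[0])  # 6
```

In a real deployment the CSV would be downloaded on a schedule and written to Postgres through a database driver; both steps are omitted here so the sketch runs offline.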

“Additionally, TIBCO’s strong data science team is working to ensure the model functions seamlessly with the latest data from regional healthcare organizations to supplement the analysis,” he adds.

The solution includes Python and R scripts for data shaping and predictive analytics. TIBCO is making these scripts available to the visual data science community, along with curated data in a Spotfire starter app, and is making Spotfire available to those who don’t already have it.

“Our goal is to enable organizations to assess the potential impact of the COVID-19 pandemic using sound data science principles versus relying on reports that tend only to stress the best- or worst-case scenarios,” O’Connell says.

In general, the TIBCO analytical model is different because it lets individuals understand the pandemic in their own regions, with the underlying data updated in real time. O’Connell outlines additional unique features of the model:

  • Estimates and predictions of the effective reproduction number (Rt) at local, regional and county levels, which are leading indicators of expected cases and fatalities. These include point-and-click selection of serial interval parameters.
  • Non-parametric super-smoothing analysis of cases and fatalities. This smooths out inconsistencies and errors in reporting and provides summaries and projections of case velocities at local and regional levels.
  • Tracking and projecting local hospital utilization rates, including beds, ICU beds, ventilators and equipment, based on the number of reported infections (in collaboration with Change Healthcare).
  • A global dataset on local government interventions, categorized to match prominent scientific papers and work from leading epidemiology institutions, and including detail from local public health department sources. These are used to annotate case count data with the super-smoother estimates, showing the impact of local mitigation efforts.
  • Collaborations with PerkinElmer and Washington University in St. Louis on understanding the evolution and mutation of SARS-CoV-2, the virus that causes COVID-19, across regions.
  • Models for retail store reopening, including forecasts of e-commerce revenues and store sales, that incorporate localized geospatial analyses of case velocity projections.
  • Models for employee infection status and contact tracing, incorporating symptoms, contact with infected people and local case hotspots.
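The Rt estimates in the first bullet can be illustrated with a simple renewal-equation estimator, in which each day's new cases are divided by the "infection pressure" from recent cases weighted by the serial interval distribution: R_t = I_t / sum over s of w_s * I_(t-s). This Python sketch is an assumption about how such an estimate can work, not TIBCO's model. The serial interval is taken to be gamma-distributed with a user-chosen mean and standard deviation (the kind of parameters a point-and-click selection would set), crudely discretized by evaluating the density at whole days, and the estimate omits the smoothing window and uncertainty intervals that tools such as the EpiEstim R package add.

```python
import math
from typing import List, Sequence


def serial_interval_weights(mean: float, sd: float, max_days: int = 20) -> List[float]:
    """Crudely discretize a gamma-distributed serial interval onto days 1..max_days.

    The gamma density (shape and scale derived from the chosen mean and sd) is
    evaluated at each whole day and renormalized so the weights sum to 1.
    """
    if mean <= 0 or sd <= 0:
        raise ValueError("mean and sd must be positive")
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    density = [
        d ** (shape - 1) * math.exp(-d / scale) / (math.gamma(shape) * scale ** shape)
        for d in range(1, max_days + 1)
    ]
    total = sum(density)
    return [x / total for x in density]


def estimate_rt(incidence: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Renewal-equation point estimate R_t = I_t / sum_s w_s * I_(t-s).

    Days without a full serial-interval history, or with zero infection
    pressure, get NaN instead of an estimate.
    """
    max_lag = len(weights)
    rt = []
    for t, cases_today in enumerate(incidence):
        if t < max_lag:
            rt.append(math.nan)
            continue
        pressure = sum(weights[s - 1] * incidence[t - s] for s in range(1, max_lag + 1))
        rt.append(cases_today / pressure if pressure > 0 else math.nan)
    return rt


weights = serial_interval_weights(mean=5.0, sd=3.0)  # illustrative parameters only
cases = [100 * 1.05 ** day for day in range(40)]  # hypothetical 5% daily growth
print([round(r, 2) for r in estimate_rt(cases, weights)[-3:]])
```

Values above 1 mean cases are growing and values below 1 mean they are shrinking, so a shift in Rt can signal a turn in the case and fatality curves before it is obvious in the raw counts.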
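The super-smoothing bullet refers to a non-parametric smoother. Friedman's super smoother fits several running-line smoothers with different spans and chooses the span locally by cross-validation; implementing it in full is beyond a short example, so the Python sketch below shows only its fixed-span building block, a symmetric running-line smoother, plus a simple "case velocity" taken as the day-over-day change of the smoothed series. The window size, the velocity definition and the sample data are assumptions for illustration.

```python
from typing import List, Sequence


def running_line(y: Sequence[float], half_window: int) -> List[float]:
    """Symmetric running-line smoother: one local least-squares line per point.

    For each day i, fit y = a + b*x over days i - half_window .. i + half_window
    (truncated at the series ends) and evaluate the fitted line at x = i.
    """
    if half_window < 1 or len(y) < 2:
        raise ValueError("need half_window >= 1 and at least two observations")
    n = len(y)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        xs = range(lo, hi)
        m = hi - lo
        x_mean = sum(xs) / m
        y_mean = sum(y[x] for x in xs) / m
        sxx = sum((x - x_mean) ** 2 for x in xs)
        sxy = sum((x - x_mean) * (y[x] - y_mean) for x in xs)
        smoothed.append(y_mean + (sxy / sxx) * (i - x_mean))
    return smoothed


def velocity(smoothed: Sequence[float]) -> List[float]:
    """Day-over-day change of the smoothed series, used here as 'case velocity'."""
    return [b - a for a, b in zip(smoothed, smoothed[1:])]


daily_cases = [20, 25, 31, 12, 40, 46, 50, 30, 61, 66]  # hypothetical, with reporting dips
smooth = running_line(daily_cases, half_window=3)
print([round(v, 1) for v in velocity(smooth)])
```

R users can get the full adaptive-span algorithm from the `supsmu` function in R's base `stats` package.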
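Hospital utilization projections like those in the third bullet can be built by converting projected cases into admissions and then into occupied beds. The Python sketch below assumes a fixed hospitalization rate, same-day admission and a fixed length of stay; all three are hypothetical simplifications (real models typically use delay and length-of-stay distributions), and the same calculation with different rates would apply to ICU beds or ventilators.

```python
from typing import List, Sequence


def project_occupancy(new_cases: Sequence[float], hosp_rate: float, los_days: int) -> List[float]:
    """Project occupied beds per day from daily new cases.

    Assumes a fixed share `hosp_rate` of new cases is admitted the same day and
    each admitted patient occupies a bed for exactly `los_days` days.
    """
    if los_days < 1:
        raise ValueError("los_days must be at least 1")
    admissions = [c * hosp_rate for c in new_cases]
    occupancy = []
    for t in range(len(admissions)):
        # Patients admitted on days t - los_days + 1 through t still occupy a bed on day t.
        start = max(0, t - los_days + 1)
        occupancy.append(sum(admissions[start:t + 1]))
    return occupancy


def utilization(occupancy: Sequence[float], capacity: int) -> List[float]:
    """Share of a hypothetical bed capacity in use each day."""
    return [beds / capacity for beds in occupancy]


projected_cases = [50, 60, 72, 86, 104, 124, 149, 179]  # hypothetical projection
beds = project_occupancy(projected_cases, hosp_rate=0.1, los_days=5)
print([round(u, 3) for u in utilization(beds, capacity=200)])
```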

Technology Steps Up to Meet the Challenge

TIBCO is an excellent example of a software company that saw an opportunity to provide a much-needed capability to fight the pandemic and created a valuable solution.

“Every new disease is a new beast for data scientists to battle and tests our technology in exciting ways,” O’Connell says. “As the field of data science continues to expand, improvements to disease tracking and modeling are increasingly necessary. To ensure that these innovations and improvements take place, it is essential that data and the software behind it are strong, helping developers overcome any challenge.”