This is a review of the Ferguson Imperial model in mathematical epidemiology. This article begins with a familiar review of the political significance of the model. It then describes the general mathematical structure of the model and the computer code that implements it. Finally, it makes a detailed analysis of the bugs in a particular section of the computer code. This is an expanded version of a presentation to the University of Pittsburgh’s Math Biology seminar on October 8, 2020.
In early March 2020, the U.K. government had decided to let the “virus run its course … with the exception of the elderly, who were to be kept indoors.” Policies were quickly reversed in mid-March: on Tuesday, March 17, major newspapers delivered the shocking news of a computational model predicting millions of deaths.
The article describes the Imperial model mathematically and gives a detailed exposition of its equations.
It is a “stochastic, spatially structured, individual based discrete time simulation.”
Individual-based means that every individual from a country’s population (as given by census data) is represented as a separate individual in the model. If the US has a population of 330 million, then the US model has 330 million individuals.
Spatially-structured means that each individual lives at a particular location in the country, determined by realistic demographic data obtained from Oak Ridge National Laboratory at a resolution finer than one kilometer for the United States and Great Britain.
Stochastic means that any susceptible individual might become infected at any time, according to random variables in the model. Different runs of the software produce different outcomes. In one run the infection might fail to spread, and in another run there might be millions of deaths.
Discrete time (and space) means that there are no differential equations or other forms of continuous modeling.
It is a survival model in the sense of hazard theory.
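To make the stochastic, discrete-time, and hazard-based ingredients concrete, here is a minimal Python sketch. It relies on a standard fact from survival analysis: under a constant hazard, the probability that an event occurs within a time step of length dt is 1 - exp(-hazard * dt). The function names, the hazard value, and the step length are assumptions chosen for illustration; none of them is taken from the Imperial code.

```python
import math
import random


def infection_probability(hazard, dt):
    """Probability of infection within a step of length dt under a constant hazard."""
    return 1.0 - math.exp(-hazard * dt)


def step_infections(n_susceptible, hazard, dt, rng):
    """Draw, independently for each susceptible individual, whether infection occurs this step."""
    p = infection_probability(hazard, dt)
    return sum(1 for _ in range(n_susceptible) if rng.random() < p)


# Illustrative values only (assumptions): hazard of 0.1 per day, quarter-day steps.
run_a = step_infections(1000, hazard=0.1, dt=0.25, rng=random.Random(1))
run_b = step_infections(1000, hazard=0.1, dt=0.25, rng=random.Random(2))
print(run_a, run_b)  # different seeds will generally give different counts
```

Passing the random number generator explicitly makes each run reproducible from its seed, which is useful when comparing runs that are expected to differ.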
The Imperial model is very far from being a simple SIR model. Indeed, the Imperial model is orders of magnitude more complex and depends on many more empirically determined parameters. SIR is deterministic, but Imperial is stochastic. Unlike SIR, the Imperial model has no differential equations: where SIR works with continuous functions, the Imperial model treats both time and the number of individuals as discrete. However, just like the SIR model, the Imperial model partitions the population into three compartments: the susceptibles, the infected, and the removed. Like the SEIR model, it has a latent period during which the infected are not infectious. It is therefore useful to keep the SIR picture in mind when studying the Imperial model.
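The compartment picture can be illustrated with a toy individual-based simulation in Python. This is only a sketch under strong simplifying assumptions: the population is well mixed (no spatial structure), the latent and infectious periods are fixed numbers of time steps, and the parameter names (beta, latent_steps, infectious_steps) are hypothetical. It is not the Imperial model, but it shows discrete individuals moving stochastically through susceptible, latent, infectious, and removed states.

```python
import math
import random

S, E, I, R = "S", "E", "I", "R"  # susceptible, latent (exposed), infectious, removed


def simulate(n, n_seed, beta, latent_steps, infectious_steps, n_steps, seed):
    """Toy well-mixed, individual-based, discrete-time SEIR simulation."""
    rng = random.Random(seed)
    state = [I] * n_seed + [S] * (n - n_seed)
    timer = [infectious_steps] * n_seed + [0] * (n - n_seed)
    history = []
    for _ in range(n_steps):
        # Per-step hazard on each susceptible, fixed at the start of the step.
        n_infectious = state.count(I)
        p = 1.0 - math.exp(-beta * n_infectious / n)
        for k in range(n):
            if state[k] == S:
                if rng.random() < p:
                    state[k], timer[k] = E, latent_steps
            elif state[k] == E:
                timer[k] -= 1
                if timer[k] == 0:  # latent period over: becomes infectious
                    state[k], timer[k] = I, infectious_steps
            elif state[k] == I:
                timer[k] -= 1
                if timer[k] == 0:  # infectious period over: removed
                    state[k] = R
        history.append({c: state.count(c) for c in (S, E, I, R)})
    return history


history = simulate(n=200, n_seed=5, beta=0.5, latent_steps=2,
                   infectious_steps=4, n_steps=30, seed=0)
print(history[-1])
```

Because every individual is always in exactly one state, the four counts sum to the population size at every step, which is the discrete analogue of the conservation law in the SIR equations.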
I have thoroughly analyzed the bugs in one function (which is 168 lines of computer code in C) of the Imperial code. The function AssignHouseholdAges is part of the initialization code for the model. It takes an input parameter householdSize giving the number of individuals in the household and it assigns ages to each member of the household, subject to a long list of constraints. For example, parents must have a certain minimum age, parents must not be too different in age from each other, their children must not be too close in age to or too distant in age from the parents, and so forth.
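To give a feel for the kind of constraints involved, the following Python sketch checks a proposed set of ages against a few rules of the type just described, and finds a valid assignment by rejection sampling. Everything here is an assumption made for illustration: the numeric thresholds are hypothetical, the rule that a household has at most two parents is a simplification, and rejection sampling is a technique swapped in for simplicity. It is not the algorithm used by AssignHouseholdAges.

```python
import random

# Hypothetical thresholds, chosen only for illustration.
MIN_PARENT_AGE = 18
MAX_PARENT_AGE_GAP = 10
MIN_PARENT_CHILD_GAP = 16
MAX_PARENT_CHILD_GAP = 45
MAX_AGE = 85


def ages_are_valid(parents, children):
    """Check a proposed household against the illustrative constraints."""
    if any(p < MIN_PARENT_AGE for p in parents):
        return False
    if len(parents) == 2 and abs(parents[0] - parents[1]) > MAX_PARENT_AGE_GAP:
        return False
    for c in children:
        for p in parents:
            if not (MIN_PARENT_CHILD_GAP <= p - c <= MAX_PARENT_CHILD_GAP):
                return False
    return True


def assign_household_ages(household_size, rng, max_tries=10_000):
    """Rejection sampling: draw ages at random until all constraints hold."""
    n_parents = min(household_size, 2)  # simplification: at most two parents
    n_children = household_size - n_parents
    for _ in range(max_tries):
        parents = [rng.randint(0, MAX_AGE) for _ in range(n_parents)]
        children = [rng.randint(0, MAX_AGE) for _ in range(n_children)]
        if ages_are_valid(parents, children):
            return parents + children
    raise RuntimeError("no valid age assignment found")


ages = assign_household_ages(4, random.Random(0))
print(ages)  # two parent ages followed by two child ages
```

The loop is bounded and fails loudly when no valid assignment is found. Separating the constraint check from the sampling also means the check can be tested on its own, independently of the random draws.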
I picked this function partly because of its manageable length, but also because of its relative independence from other parts of the code; I can analyze it in isolation. I have no reason to suspect that it has more bugs than other parts of the Imperial code base.
Serious bugs destroy the overall integrity of the code. A separate question is whether these bugs materially change the predictions of the code. I do not know. It is possible that other uncertainties are so extreme and that the model is so insensitive to its own internal details that the bugs are obscured.