The work of an Italian mathematician in the 1930s may hold the key to epidemic modeling. That's because models that try to replicate reality in all its detail have proven hard to steer during this crisis, leading to poor predictions despite noble and urgent efforts to recalibrate them. On the other hand, overly stylized compartmental models have run headlong into paradoxes such as Sweden's herd immunity. A theorem of de Finetti inspires a different approach. We identify orbits in the space of models (to be explained) that leave unchanged the key decision-making quantities we care about. We use mixtures of IID models to span the orbits, assisted by convexity calculations. This approach is represented in the picture above. The important thing to note is that we are not attempting to find a model that is close to the truth, only close to the orbit. This will make a lot more sense after Section 1, I promise.

Outline

1. An introduction to de Finetti's Theorem
2. Nature's model for an epidemic ... somewhere in a large space
3. Most important questions are symmetrical
4. The orbit of the symmetric group
5. Finding the orbit

1. Introducing de Finetti's Theorem ... or something close to it

Two nearby hospitals risk running out of supplies. Joe and Mary are engaged to build a probabilistic model for binary outcomes. Either there will be a shortage or not: four possible outcomes in total. Mayor Pete lays out the rules, also noting that only coins may be deployed for this particular stochastic modeling exercise:

1. Both locations should have a 50% chance of a shortage
2. Outcomes must be positively correlated
3. The hospitals are interchangeable

Joe is the first to return. He explains that his model uses conditional probability to link the fates of the two hospitals. He flips one coin to determine the fate of hospital A. Then (the part that took him longer to figure out) he flips more coins and begins recording the sequence until either HTH or TTH appears.
The former will happen first with 3/8 probability, he explains, in which case he assigns an outcome to hospital B that is the opposite of hospital A's outcome. Otherwise, with 5/8 probability, it will be the same. Joe is beaming with pride.

Mary goes next. Well, she says, it is obvious that I can use two coins to generate a 1 in 4 chance of a shortage. So I just do that twice, once for each hospital.

Mayor Pete looks at Mary aghast. Not only has Mary failed to link the fates of the two hospitals, she has apparently ignored his first rule entirely. Well, at least you used coins, he says. But did I not tell you that each hospital has a fifty-fifty chance? Oh yes, says Mary, I know. I reverse the result every other time.

Right, says Mayor Pete, rolling his eyes, I'm going to use Joe's model. But Mary, if you don't mind, I'd like to use yours as a backup. Mary graciously agrees, and for the next several days supplies Mayor Pete with the calculations he requests, all of which are based on the probabilities of the four outcomes. Mayor Pete requires insurance calculations, risk measures and various metrics to guide decision making.

The next week Mayor Pete summons Joe and Mary to his office. To their mutual surprise he launches into an accusation of collusion, throwing Joe and Mary's results back at them and demanding to know why they are identical in every respect. Do you not realize how important this is, he asks? There is a reason I want a second opinion. Joe is stunned, but Mary casually takes out a pen and demonstrates wordlessly how many times on average each model will produce each of the four outcomes, given 32 chances to do so.

It seems Mary knew what she was doing after all. Her approach has another advantage that will be important to us. The joint probabilities in Joe's model (i.e. 6/32 and 10/32) cannot be written as a product of individual probabilities of hospital outcomes.
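Mary's wordless pen-and-paper demonstration is easy to reproduce. The sketch below is my own reconstruction, not code from the post: it computes the joint distribution of the four outcomes under both models and confirms they agree (10, 6, 6 and 10 chances out of 32).

```python
from fractions import Fraction as F

# Joe's model: a fair flip decides hospital A, then a pattern race
# (HTH vs TTH) decides whether B matches A.  HTH arrives first with
# probability 3/8, in which case B gets the opposite outcome.
p_opposite, p_same = F(3, 8), F(5, 8)
joe = {}
for a in (0, 1):  # 1 = shortage, each with probability 1/2
    joe[(a, a)] = F(1, 2) * p_same
    joe[(a, 1 - a)] = F(1, 2) * p_opposite

# Mary's model: an equal mixture of two IID scenarios.  In one, each
# hospital independently has a 1/4 chance of shortage; in the other
# (the "reverse the result" days) the chance is 3/4.
mary = {}
for a in (0, 1):
    for b in (0, 1):
        total = F(0)
        for p in (F(1, 4), F(3, 4)):  # the two mixture components
            pa = p if a else 1 - p
            pb = p if b else 1 - p
            total += F(1, 2) * pa * pb
        mary[(a, b)] = total

# Both models assign 5/16, 3/16, 3/16 and 5/16 to the four outcomes,
# i.e. counts of 10, 6, 6 and 10 out of 32 chances.
assert joe == mary
```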
But on Mary's side of the equation, notice that this is possible for each of the two matrices individually. She has demonstrated that Joe's model, in which hospital outcomes are explicitly connected, is equivalent to a mixture of two models in which the hospitals are independent. Moreover, we notice that in each of the scenarios that comprise Mary's mixture, the probabilities of outcomes are the same for hospital A and hospital B. The hospitals are not only independent but also identically distributed.

This can be viewed as a finite-dimensional rendering of the celebrated theorem of Bruno de Finetti. That theorem says that any model for exchangeable random variables can be decomposed into a mixture of much simpler models in which variables are independent and identically distributed. Joe can try to be creative, but assuming strong correlation between hospitals, Mary's approach may be a more straightforward path.

The implications of de Finetti's Theorem, both practical and philosophical, are quite far-reaching. The theorem is sometimes used to motivate Bayesian statistics. It has been called the fundamental theorem of statistical inference. Certainly if one is faced with the need to produce an exchangeable model for something, it should come to mind. More pertinently for us, it may be informative when one is faced with a mathematical conundrum that seemingly remains the very same conundrum after some rearranging of deck chairs.

2. Nature's model for an epidemic may be subtle

Before we get to that, let's agree on a vast space of possible models that might encompass the true one. We shall assume the following of an epidemic:

- The world is divided into M regions that don't interact
- Within each region are N neighborhoods that do interact

These are the only assumptions, so we have done little to tie nature down. We suggest that there is one true model for a region. It gets played out M times with different results each time.
Nature's model is extremely complicated, no doubt, but despite the hint I have given regarding de Finetti's Theorem, there is no requirement that neighborhoods be treated equally. One neighborhood's odds may be skewed more than another's. The joint distribution of the trajectories of neighborhoods may be well beyond our modeling abilities, despite all the tools of stochastic modeling at our disposal.

As we are learning, nature may play a stranger game of chess than we imagined. In a previous article I considered the curious relationship between herd immunity and early stage growth. The space of models probably includes an orbit where this happens:

Observation #1: In the early stages, doubling times are under a week.
Observation #2: Peak infection occurs at 20% penetration.

The classic SIR epidemiological model is an example of an approach that refuses to bend to this kind of observation and instead snaps in two. We quickly review that failure, played backwards from #2 to #1.

Observation #2 means that the actual number of people being infected every day, relative to the number of people who are contagious, must be 4/5 of what it started out as (the susceptible population has dropped to 80%, so the chance of running into someone you can infect has dropped by a factor of 0.8). So if things are balanced at the peak, infection must have been 5/4 of recovery back when the susceptible population was close to 100%.

The difference between 5/4 and 1 is 1/4. That excess of infection over recovery translates to 25% growth, when measured in units of the typical generation time, which we shall assume is 7.5 days. Our doubling time is then about 23 days, much longer than the week demanded by Observation #1.

Thus it seems like a very bad idea to assume nature falls within that particular class of models, and better to work back from extremely mild assumptions about what is true. I have assumed only one shortcoming of nature's model - regions are independent.
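The back-of-envelope calculation above can be checked in a few lines (my own sketch, using the generation time and penetration figures assumed in the text):

```python
import math

# The numbers assumed in the text: peak infection at 20% penetration
# and a typical generation time of 7.5 days.
susceptible_at_peak = 0.8
generation_days = 7.5

# Balance at the peak implies the initial ratio of infection to
# recovery was 1 / 0.8 = 5/4.
R0 = 1 / susceptible_at_peak

# 25% growth per generation implies a doubling time of roughly three
# generations, i.e. about 23 days rather than under a week.
doubling_days = generation_days * math.log(2) / math.log(R0)
print(round(doubling_days, 1))  # 23.3
```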
Of course we know that regions will interact in reality; that would seem to be the definition of a pandemic. But neighborhoods are likely to interact a lot more, so this is perhaps a simplification we can live with. You might even try to remove it by assuming the global dependence is driven by one factor.

3. Most questions are symmetrical

The following are important for decision making:

- A count of the number of cases at a given time
- The time of peak infection
- The slope of the time series of cases on a logarithmic plot
- Doubling times
- Estimates of total use of ICU units extrapolated from the model

And we could list many more. These are aggregate statistics. They are left unchanged if we swap the roles of neighborhoods at the start of the simulation. I won't belabor this point because it is obvious, but when combined with de Finetti's Theorem it goes a long way.

4. The orbit of the symmetric group

Our hunt for nature's true model will never succeed, at least in this particular post. However, we will try to locate a symmetrized version of that model: nature's model shuffled. We explain the symmetrizing operation using something that is easier to visualize.

Permuting a model for three dice

Let's start with dice. For this purpose, suppose Model A assigns probabilities to the outcomes of three dice. For convenience Die 1 = X, Die 2 = Y and Die 3 = Z, and outcomes are labeled x, y and z. Model B is very similar to Model A. The models assign probabilities as follows, where the 0.02 is approximate. Not quite the same, but obviously similar. Both models assign higher probabilities to combinations of die rolls when the product of two dice is close to the third. However, as we move from Model A to Model B we exchange the roles of the dice. Because this is the only change, we say that Model A and Model B lie in the same orbit of the symmetric group, which is the set of all permutations of models.
Each model may be viewed as permuting the inputs of the others (as we all can do by accident when passing positional arguments to functions in the wrong order).

Here we show the probabilities of all three die outcomes for Model A. Different colors correspond to different probabilities. The most likely outcome is (1,1,1). There is a lot going on in this picture (normal dice would of course show all dots the same color). Here's a home-buyer's walkthrough of Model B's living room:

Symmetrizing a model for three dice

There is a similar model for the three dice that treats each die on an equal footing. The E stands for "exchangeable". Notice that if you take any re-ordering of x, y and z, this probability is left unchanged. However, there is still plenty of structure in the model, as you can see from the probabilities it assigns to each of the 216 possible outcomes.

In our case the exchangeable model E and the non-symmetric models A and B all lie on the same orbit. To create the symmetrized model I took an average of three models that were similar to A and B but permutations of each other. In general, if we have a model P for n variables, the symmetrizing operator acting on the model might be written

  (S P)(x_1, ..., x_n) = (1/n!) sum_sigma P(x_sigma(1), ..., x_sigma(n))

where the average is over all possible rule changes in which we interchange the roles of variables. Notice this is an average over the orbit. There are six permutations of three objects, but we only needed three due to symmetry in the formula.

Now suppose I have some quantity of interest to compute that is itself symmetric. For example, we might be interested in the probability that the product of three dice exceeds 10. The answer we get will be exactly the same for all three models, and for that matter any model on the same orbit. Thus...

Any model on nature's orbit will do

Moving from dice back to models for epidemics in neighborhoods, and taking as given that many if not most important questions are symmetric, we realize that any model on the same orbit is as good as another.
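Here is a numerical sketch of that claim, using an arbitrary asymmetric dice model of my own invention (not the model pictured in the post): symmetrizing the model leaves a symmetric quantity, such as the probability that the product exceeds 10, exactly unchanged.

```python
import itertools

# An arbitrary non-symmetric model for three dice: weights favor
# outcomes where the product of the first two dice is close to the third.
def weight(x, y, z):
    return 1.0 / (1.0 + abs(x * y - z))

outcomes = list(itertools.product(range(1, 7), repeat=3))
total = sum(weight(*o) for o in outcomes)
model_a = {o: weight(*o) / total for o in outcomes}

# Symmetrize: average the model over all six permutations of the dice.
perms = list(itertools.permutations(range(3)))
model_e = {
    o: sum(model_a[tuple(o[i] for i in p)] for p in perms) / len(perms)
    for o in outcomes
}

# A symmetric question: probability that the product of the dice exceeds 10.
def prob_product_gt_10(model):
    return sum(pr for o, pr in model.items() if o[0] * o[1] * o[2] > 10)

# The asymmetric model and its symmetrized version agree exactly.
assert abs(prob_product_gt_10(model_a) - prob_product_gt_10(model_e)) < 1e-12
```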
For example, suppose we are interested in the number of infections f after three days. We have

  E_P[f] = E_{S P}[f]    whenever f is symmetric

thus we will accidentally get the right answer, as it were, so long as we are close to the orbit.

5. Finding the orbit

We can find the right orbit using IID models and convexity adjustments.

De Finetti's theorem suggests a mixture of IID models might be near the orbit

Based on the experience of Joe and Mary, we make one final supposition. Not only can we pick any model on the orbit, but there is a good chance that a mixture of independent identically distributed models may get us there. In the dice picture this means we'd consider mixtures of models that treat each die as an independent roll. In the case of our epidemic model it would mean that each neighborhood does not interact with the others.

Of course, assuming no interaction is not a good assumption in isolation, but by taking different mixtures of these models we create dependence. And we will generate a whole bunch of exchangeable models. We know there is an exchangeable model on every orbit, because you can apply the symmetrization operation, and a little reflection might convince you there is exactly one. So as we change up the mixture we will be rapidly traversing the orbits, visiting a lot of exchangeable models and maybe even all of them. Whereas if we were just tinkering with a model that has 10,000 lines of code and exists in a very high dimensional space of models, we might be making a lot of sideways movement.

Here's an attempt at a "big picture". It is certainly big. Can we justify it?

1. No matter how complex nature's model is, there is a symmetrical equivalent.
2. The exchangeable model provides the same answers (to symmetrical questions).
3. The exchangeable model is a mixture of simple IID models (de Finetti's Theorem).

It doesn't matter that we will never get our hands on the truth, which we won't.
It also doesn't matter that we wouldn't be able to physically implement the symmetrization operator that moves nature's model around the orbit to its exchangeable representative. We don't have to (which is good, because by the time you get to 60 neighborhoods there is one permutation for every electron in the universe).

Point 2 is intuitively obvious, I hope. (Physicists: you might like to think of this as moving an operator from the "bra" to the "ket", and you can define the action of permutation on functions in a contragredient fashion if you wish.)

Point 3 is de Finetti's Theorem, in spirit. It is only technically true if we allow signed probabilities, but as a practical matter it is probably good enough with unsigned. It says we can use independent identically distributed (IID) models for neighborhoods, which means that we only need a model for one neighborhood and we assume they all evolve independently. This decomposition is rather remarkable, but Mary has prepared us for the surprise.

Do we need negative weights in the mixture?

It is possible that nature has chosen a model which cannot be decomposed into a positively weighted combination of IID models. Then again, this might not be the end of the world. We can allow a few small negative weights in the mix, even if this feels a little strange (given that the weight we assign to our simple models is interpreted as a probability). Better people, like Feynman and Dirac, found negative probabilities on the route to useful numbers.

"Negative energies and probabilities should not be considered as nonsense. They are well-defined concepts mathematically, like a negative of money." - Paul Dirac, 1942

Then there's the Wigner quasiprobability distribution ... for another day.

[Wigner quasiprobability function courtesy of Geek3]

My point: panic isn't a requirement should negative probabilities show up - though in fact they are less likely to show up in the first place, because epidemics are a classic case of strong dependence (and this diminishes the need for minus signs). That's why I claim it doesn't matter terribly much that de Finetti's original theorem applies to infinite sequences of neighborhoods rather than finite collections, though this part of my argument could benefit from generalizations of de Finetti's Theorem, I am sure.

A few years ago I had the pleasure of working with Gabor Szekely, who proved a number of results about the finite case. Not being as clever as him, my comments are based more on numerical experiments. There are some links to other articles on the finite case below.

Mixing deterministic disease models

There is another strong reason to use simple mixtures of independent identically distributed models: it is very easy to compute things. For instance:

- The model for each neighborhood is a classic epidemiological model like SIR
- We take a mixture of them

We don't have to believe the neighborhoods are really independent, because the mixing takes care of that. Then, to fit to observed phenomena, we can:

1. Use existing results for model predictions
2. Compute convexity adjustments that take account of the mixing
3. Invert the convexity adjustment to find the right orbit

To proceed, the constant parameter SIR model is

  dS/dt = -beta S I,  dI/dt = beta S I - gamma I,  dR/dt = gamma I

Now allow beta to be varied in the de Finetti mixture according to some variation function eta (so a mixture component uses contact rate beta*eta), where eta has unit norm with respect to population density rho. Then in our mixture model, at herd immunity, the fraction of susceptible people is

  S* = (gamma/beta) E[1/eta]

This is the usual answer, gamma/beta, divided by the harmonic mean of eta. The harmonic mean is known in closed form for a number of distributions, which you can see from the Wikipedia page for harmonic mean.
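As a sanity check of that convexity adjustment, here is a small simulation of my own with illustrative parameters (it deliberately ignores, for simplicity, the beta-versus-gamma caveat discussed next):

```python
import random

# Each mixture component is an SIR model whose contact rate is beta
# scaled by a random factor eta, normalized to have mean 1.
random.seed(0)
beta, gamma = 2.0, 1.0
etas = [random.lognormvariate(0.0, 0.5) for _ in range(100_000)]
mean_eta = sum(etas) / len(etas)
etas = [e / mean_eta for e in etas]

# A component with rate beta*eta reaches herd immunity when the
# susceptible fraction falls to gamma / (beta * eta); average over the
# mixture and compare with the usual answer divided by the harmonic mean.
s_constant = gamma / beta
s_mixture = sum(gamma / (beta * e) for e in etas) / len(etas)
harmonic_mean = len(etas) / sum(1.0 / e for e in etas)

assert abs(s_mixture - s_constant / harmonic_mean) < 1e-9
print(s_constant, s_mixture)  # mixing leaves more susceptibles at the peak
```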
You can invert these formulas to correct for how far wrong the standard model is, and you'll get much closer to the orbit of nature's model. A word of warning here: you really want to average only over cases where beta*eta exceeds gamma, so the correction is close to a harmonic mean but not exactly the same.

Mixing stochastic models

I think a better approach than mixing deterministic models, and one more true to de Finetti's theorem, is to use a mixture of stochastic disease models. Here I would suggest browsing the Fixed Income Modeling section of the library, and not just Epidemiology. The terms on the right hand side have closed form formulas, or good approximations, so backing into the right amount of variation is in principle easy.

Takeaway

When epidemic models are chosen at the level of equivalence classes under permutation, which is to say by picking an orbit of the symmetric group, it is difficult to go wildly wrong, and it is also much harder to waste effort moving around an orbit (rather than up and down, traversing all the exchangeable models). Aided by convexity adjustments, mixtures of independent identically distributed models are very easy to use, and the work of de Finetti and others has taught us about their span. The strong dependence between outcomes suggests to me that de Finetti's Theorem is indeed fundamental to epidemic modeling.

Of course, de Finetti's Theorem is already pretty fundamental to statistics, if not quite rising to the status of "the" fundamental theorem, because there isn't one. As an aside, we have a Fundamental Theorem of Algebra, a Fundamental Theorem of Arithmetic, a Fundamental Theorem of Calculus and a Fundamental Theorem of Linear Algebra, but no Fundamental Theorem of Statistics.
A frontrunner must be the Central Limit Theorem (which is, after all, the "central" limit theorem and not the central "limit" theorem - something I only recently learned), but perhaps I have convinced you that de Finetti's Theorem is another strong candidate. If not, perhaps there is an opening in Epidemiology.

Related posts

On the topic of convexity adjustments to help find the right orbit:

- Reconciling herd immunity and growth with the Vasicek model
- Infection attenuation due to repeated contact
- Herd immunity and the harmonic mean
- A way to add even more variation (regime switching Vasicek)

Readings related to de Finetti's Theorem (finite case):

- Finite exchangeable sequences, Diaconis and Freedman
- de Finetti's Theorem for Abstract Finite Exchangeable Sequences, Kerns and Szekely (paywalled)
- Alternative approach by Janson, Konstantopoulos and Yuan

Originally posted here.



Just a note to say that the Python pandemic library is now dockerized, so if you want to help with Swarm Prediction it is even easier:

    docker run xtellurian/pandemic


This is a longer form follow-up to my post describing the open source pandemic package on PyPI (with Python code also available on GitHub). You can use the code to simulate millions of people moving about in two dimensions, crossing paths and, unfortunately, getting sick. This video illustrates the dynamic with a toy-sized town of fifty people. Watch how transmission takes place. You might even discern commuting and households.

The package implements an agent-based model, which is to say that people are modeled individually. This is not unusual (see for example this paper on influenza modeling in Australia that a reader of my prior post noted). An agent model can be conceptualized as taking several million copies of a disease progression model for an individual, and having those models walk around and bump into each other. But how should they walk?

Outline:

- The modeling of illness progression
- The model for population movement (Ornstein-Uhlenbeck processes on the plane)
- The model used to generate a city (sprawling home and work locations)
- Candidate model improvements
- A visual tour of parameters and their sensitivities, stressing the importance of population density and of social distance
- How you can help if you have spare CPU cycles

0. Motivation: deficiencies with some other models for virus spread

Like you, I had heard forecasts and strategy choices referencing how far the virus will penetrate into the population. They did not accord with my intuition. No code was released. In some cases not even an outline of the mathematical approach was made public. Here is why that is a big problem. Two schools of thought emerged:

- Strong form defeatism ("let's head straight for herd immunity")
- Weak form defeatism ("we must flatten the curve, but the area under the curve may remain roughly the same")

These were not the only views taken, fortunately, and countries like New Zealand and South Korea have pursued aggressive elimination strategies.
But defeatism of these varieties was certainly prominent. Perhaps it is borderline excusable. It is seductive if you are primarily guided by simple models that treat the population as a whole. That's because when you calibrate those models to an observed exponential explosion in cases, the mathematics doesn't give you a way out. If you think every person will infect another two, there's only one way that will end.

In contrast, this simple simulation package might, if interpreted carefully, suggest quite different characteristics of disease progression across cities, towns and countries - especially when densities and movements vary considerably. In my previous post I showed a video of two larger towns with population 40,000 that suffer 50% and 75% virus penetration due to differing testing regimes. In the video shown below we remove half the inhabitants and run it again. The result is dramatically different. The smaller town fares much better: virus penetration is less than 30%.

To tilt the scales I set the testing rate per capita three times lower in the small town than in the more populated ones. And in one of the more populous towns I introduced randomized testing; not so for the small town. In other words, the smaller town was set up to fail, but despite these disadvantages it fared substantially better in relative terms. Density matters in this model.

Here is another example, this time with a much larger city of 800,000 people. The virus spreads in this city along the main commuting corridor between the two centers of population. The result is anything but pretty, but again only about 35% of the populace gets the bug, and the percentage falls dramatically as you move into the suburbs. Pro tip: watch this one at higher speed.
It may not be best to interpret density literally as geographical density, but so long as there are significant differences in the intensity of interactions between people (and the frequency of happenstance exposure) from one region to the next, we should be very wary of simplistic models and their tendency towards defeatism.

1. Modeling an individual's illness progression

Turning to the description of the model that generated these simulations, we start with disease progression. This aspect is orthodox. The code assumes that each individual has a health state:

- Vulnerable
- Infected but not symptomatic or as yet tested
- Symptomatic but not as yet tested
- Positive
- Recovered, or
- Deceased

For every moment that passes, there is a probability that an individual will move from one state to another. As per the code:

- Infected or symptomatic people might be tested, and progress to being positive.
- An infected, symptomatic or positive person may either recover or die.

Furthermore:

- The vulnerable colliding with those infected or symptomatic may become infected.

This is the key difference between this simulation and (some) commonly deployed models for pandemics, which assume that anyone can become infected at any time. Put another way, we are using a motion simulation in order to model the exposure (viral load) that individuals receive, as compared with simply assuming it is some fixed or proportional quantity. In this model there is no single rate of infection: your odds of being exposed vary dramatically depending on where you are, which in turn depends on where you live, whether you commute and so forth.

If this type of model (where motion and collisions or near misses create exposure) is new to you and my first video above doesn't bring home the point, then I heartily recommend this video by Grant Sanderson on spatiotemporal models for contagion. He has taken mathematical pedagogy to a very high level. Here are nine simulations of disease spreading.
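The progression rules above can be sketched as a small state machine. The transition probabilities below are placeholders of my own, not the pandemic package's calibrated values:

```python
import random
from enum import Enum

class State(Enum):
    VULNERABLE = 0
    INFECTED = 1      # infected, not yet symptomatic or tested
    SYMPTOMATIC = 2
    POSITIVE = 3
    RECOVERED = 4
    DECEASED = 5

def step(state, exposed, rng=random):
    """Advance one person by one time step (illustrative probabilities)."""
    r = rng.random()
    if state is State.VULNERABLE:
        # Infection only happens via collision with the infectious.
        return State.INFECTED if exposed and r < 0.10 else state
    if state is State.INFECTED:
        if r < 0.05: return State.SYMPTOMATIC
        if r < 0.06: return State.POSITIVE    # tested while asymptomatic
        if r < 0.10: return State.RECOVERED
        return state
    if state in (State.SYMPTOMATIC, State.POSITIVE):
        if r < 0.03: return State.DECEASED
        if r < 0.10: return State.RECOVERED
        if state is State.SYMPTOMATIC and r < 0.20:
            return State.POSITIVE             # tested after symptoms
        return state
    return state  # RECOVERED and DECEASED are absorbing
```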
It is somewhat reminiscent of a forest fire. You would not model a forest fire by assuming that any tree in the country might spontaneously ignite whether or not it was close to the flame front. And even if most trees could recover as quickly as humans do, you would probably not recommend a strategy of simply allowing everything to burn in order to build up immunity against fire.

2. The OU movement model

I now describe the pandemic package's motion model that controls movement. The primary challenge as I saw it was finding a way to deal with movement at different scales without getting super messy, prescriptive, or exploding the parameter space. Which seat you choose on a train will determine your viral load. Which elevator you take could be critical. Whether you place your hand on a table in one position or an inch to the side could make the difference between life and death. These tiny differences are as important as whether you live in San Mateo or San Salvador, or whether you come into the city from the North or South on a regular basis.

One can imagine the list of parameters getting out of hand pretty quickly if we start modeling modes of transport and where you sit on the sofa when you get home. You should surely try to use that data if you have it. I don't. Instead, the pandemic library pulls out that old chestnut Brownian motion, or rather a minor modification, to try to capture unfortunate coincidence on different scales. It says it on the box, actually: individuals follow OU processes.

The OU stands for mathematicians Leonard Ornstein and George Eugene Uhlenbeck (we'll pretend Uhlenbeck means "one length back please" in German). An Ornstein-Uhlenbeck pandemic model, as we might term it, is one where everyone ambles about like Brownian motion - aka a random walk. However, an OU process isn't entirely directionless.
Rather, it is a combination of a stagger and a steady pull towards a target - like someone who has imbibed too much looking for the campground toilet in the dark.

Take another close look at the first video and you may see that some of the points are meandering to work (or school) and back. Others are staying home - including some who are sick. They are all following random walks with a pull. Here is an example of such a walk, courtesy of Shiyu Ji and Wikipedia, where you can read more about uses and properties of OU processes.

In the OU pandemic model as currently implemented, people take random walks but they have invisible springs drawing them towards an attractor. The stiffness of the spring is a parameter. In addition, some percentage of the population wander their way to work and then home on a daily basis (increment the parameter count by one for the working fraction, those of you who are counting). They have two attractors instead of one. In the morning the commuters are pulled by an invisible spring in the direction of their workplace. In the evening they are pulled by another invisible force back home.

Mathematical aside. Technically speaking the walks are regime-switching OU processes (morning regime, evening regime). One advantage of this setup - not currently exploited - is that the probability of not being infected looks a lot like the expectation of the exponential of the integral of a different regime-switching Ornstein-Uhlenbeck process (namely the instantaneous probability of getting exposed, which rises and falls with your commute, drawn to a higher level at work and lower at home). It is a little known fact that this expression for survival admits an asymptotic series expansion where all terms are known.

The non-mathematical version of that aside is that one can construct an OU model of this OU model ... a little bit like a dream inside a dream in Inception ... though doing it might age you prematurely. So back to the top level dream we go.
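For concreteness, here is a minimal Euler-Maruyama sketch of one commuter's regime-switching walk. The parameters and discretization are my own illustrations, not the package's implementation:

```python
import math
import random

# One commuter on the plane: a random stagger plus a linear spring
# pulling toward work in the morning and home in the evening.
kappa, sigma, dt = 3.0, 1.0, 0.01
home, work = (0.0, 0.0), (5.0, 2.0)

def simulate_day(steps_per_half_day=500, rng=random):
    x, y = home
    path = []
    for ax, ay in (work, home):  # morning regime, then evening regime
        for _ in range(steps_per_half_day):
            x += kappa * (ax - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0, 1)
            y += kappa * (ay - y) * dt + sigma * math.sqrt(dt) * rng.gauss(0, 1)
            path.append((x, y))
    return path

random.seed(1)
path = simulate_day()
# By midday the walker is bobbling near work; by nightfall, near home.
midday, nightfall = path[499], path[-1]
```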
The attraction term in the drunken walk is linear. Only the best, perfectly crafted invisible Hookean springs are employed. There is no momentum, because I wasn't sure it was necessary or even a good idea, but we could consider it. Commuting by spring is something Elon Musk might perfect when, God willing, this is all over.

We can critique the realism of OU walks with or without momentum - although let's be honest, you were never that keen to go straight to work. After you stop in at the coffee shop and then run into a few friends, you're approaching OU. At home you wander about too, trying to avoid the horror of home schooling.

When you get to work, in this model, you are still being tugged at by the spring, but that doesn't mean you will all settle into the same exact location and automatically infect everyone you work with. Your random motion keeps you bobbling about in a neighborhood of your attractor, which is to say the water cooler. The same occurs when you go home. People don't converge to a single point, so infection is not guaranteed at either location.

But why listen to me? Open up this page and cut and paste the contents into the terminal. You'll see. In better times we could use this stylized setup to model the spread of gossip or something. Sadly, we have a morbid scenario unfolding before our eyes, and that's why I am trying to enlist your help.

3. Modeling cities

What of initial conditions? The people in the model need to start somewhere, and they start a small random distance from their home. But where is their home? Here is a little piece of code that generates a synthetic collection of home and work locations. I would consider it something of a placeholder, yet I suspect, and I might be wrong, that the model retains some essential connection to population density regardless. When you run the model you will see just how important density is.

The city is generated in a manner vaguely inspired by the Chinese restaurant process.
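Here is a toy sketch of such a generator, as I read the idea (the package's actual code differs):

```python
import random

# Toy city generator: each newcomer settles near a randomly chosen
# existing resident, then drifts slightly away from the origin (sprawl).
def make_city(num_people, sprawl=0.05, spread=1.0, rng=random):
    homes = [(0.0, 0.0)]
    for _ in range(1, num_people):
        x, y = rng.choice(homes)   # pick someone already on the map
        x += rng.gauss(0, spread)  # live near them ...
        y += rng.gauss(0, spread)
        x *= 1 + sprawl            # ... but a touch further out
        y *= 1 + sprawl
        homes.append((x, y))
    return homes

random.seed(2)
homes = make_city(1000)
```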
The general idea is that you start throwing people at a map, but not uniformly. Instead, each new arrival picks a person already on the map at random and decides to live near them. But they also have a tendency to plonk themselves down further away from the origin, thus leading to urban sprawl. More on that below, but your best documentation for this is the code. I'd expect someone will suggest a more realistic generating process.

I will make one remark, however. There would seem to be a danger in confusing implied geometry with actual geometry. The former could be defined as a correction to the latter when you take into account the overly stylized movement model. It is not necessarily the case that swapping out the fake city model for actual coordinates of people from actual demographic data is the best immediate use of time. For while we can certainly question the dynamics of motion, I suspect there is a stretching of the map that makes it more realistic - and with sufficient skullduggery we could reverse engineer that into the code that generates the locations of workplaces and homes. What I mean to say is that deficits in the modeling of movement and geography can cancel out - but only if you let them.

4. What's not in, yet

Quite a few things could be added quite easily - though I am not sure they justify increasing the parameter space, something I am loath to do.

Overloading. The addition of an intensive care state and contingent death hazard rates based on proximal capacity would capture a major effect now missing in the model - namely overloading of the system. However, to be pedantic, this effect can be calculated after the fact to first order, so including it in the simulation is not strictly required. Indeed some PDE models treat death and recovery as the same thing, which is a similar trick.

Fomites. (Hat tip Stephen Luterman.) I had to Google that word too: the passing of disease via inanimate objects.
I can't find literature that makes a quantitative assessment, but including trails of people and not just their contemporaneous positions would not be unreasonable. However my gut tells me that this might be taking movement too literally. It is just one of the ways that proximity leads to transmission, and we aren't modeling whether it is a cough or cross contamination. We can absorb it into the walk size to first order and, if necessary, testing rates - since although they don't move, fomites are otherwise like non-compliant symptomatic people.

Cohorts. Different behavior by age and vulnerability. If data by age is available, then we get back the cost of putting in cohorts. I think this might be the next thing that goes into the model.

Schools. I point out that work, in the model, is wherever anyone goes. It can be school. I think the city simulation could be improved. School may be an important dynamic, especially during lockdown periods in countries like Australia, where most things are closed but schools remain open.

Non-compliant positive people could be included in the model, but this is very close to simply changing the testing rates (non-compliant positives are the same as asymptomatic carriers).

Repeat illness. A post by Daniel Goldstein reads: "New data not published yet suggests that 70% of people develop measurable neutralizing antibodies that peak and stabilize at about 2 weeks. 30% don't, even though they recovered clinically." This may prove to be a thing; we don't know yet.

Gatherings. A hurricane could force many people into shelters (suggestion from Michael Broome). A small change could enable one to study the impact of allowing events involving fifty or five hundred people. However, is this similar to including a small number of large but far-flung households?

5. A visual tour of some parameters and their sensitivity

In mathematics we count one, two, many. Two is usually the hardest, so we skipped it and went straight to many agents.
This does not imply an absurd number of parameters. In this section we introduce the small number of control knobs for the city generation and movement - as distinct from the medical parameters governing illness progression, which are entirely standard.

Kappa [default=3.0] Kappa is the linear coefficient in the restoring force that drags everyone towards their attractors. The larger kappa, the stiffer the invisible spring. Kappa controls how keen people are to get to work and, symmetrically, how keen you are to get home. Some might wish to break this symmetry - and we probably all know people for whom the homebound kappa might be smaller than the workbound kappa - but I don't think it is required. In fact we might convince ourselves that kappa might not even need to be a free parameter and can be fixed once and for all.

The first thing to appreciate about kappa is its role at home. The video below shows a number of OU particles that are all attracted to the origin. They quickly reach an equilibrium state where their typical distance to the origin ceases to change very much. The green circle on the left is the theoretically computed root mean square distance to the origin, which is inversely proportional to the square root of kappa. The larger kappa, the tighter people cluster at home and, to a lesser extent, at work. Typically we would expect higher kappa to lead to more contagion.

h. Average household size [default=2.5] The average household size and the tightness of the household (kappa) are going to drive contagion at home. Households are a recent change to the code. Household sizes are now binomially distributed. A household is merely a group of people with the exact same home attractor. Household size and the relative size of kappa and w (discussed next) are going to determine if everyone gets sick when one person does.
Anecdotally that seems to be the case!

It isn't clear to me that we need household size to be a free parameter for forecasting purposes given we have two other knobs, and I would be inclined to fix it at the national average. However, in the presence of demographic data, varying household size and holding everything else constant might enrich the feature space and possibly help us understand differing infection rates (say between the Bronx and Manhattan).

Before leaving household size, some comment on the relationship to kappa is in order. The interplay between kappa and household size is seemingly straightforward at home. Commuters' mean square distance to home in the evenings can also relate quite closely to the ergodic average if kappa is on the higher side. Here we set kappa=6 to make the point.

All that said, commuting muddies the waters in other ways. Different values of kappa might lead to a faster commute through troubled waters - depending on the geography - or quiet time in the car alone (as it were). The video below shows progression of the disease in a town using nine increasing values of kappa (increasing left to right along the first row, then the second and third). In this and the other comparisons, all parameters are varied relative to a baseline town (see the code for this and other ready-to-go towns and cities). The baseline for kappa is 3.0. Thus the top left simulation is for kappa=1.8 and the bottom right kappa=4.2.

There is, I think you will agree, no glaringly obvious pattern although, as expected, contagion occurs less rapidly for the small values of kappa. It may not be just a home effect. With kappa set in the vicinity of 1.8, some people might not be turning up to work every day - never mind not mingling as closely with their work colleagues as they might.

A caveat. With all of these simulations the initial conditions are important, and those are generated randomly by design.
If you scan back up to the first 3 x 3 video you'll see nine simulations that, though I didn't mention it at the time, use identical parameters. They certainly don't end up the same. Perhaps this will give you newfound respect for experts trying to predict the course of this disease, but also a sense of the noise in these comparisons.

c [default=0.5] The fraction of people who have work attractors. Work, as noted, can be school or just some place people go that is sufficiently removed from where one would otherwise wander to near home.

W [default=1.0] The random walk step size is governed by the variable W. Philosophically I would like to think we could fix this as W=1.0 the way physicists like to set the speed of light c=1, but as a practical matter that doesn't work too well (see the geohashing part of the code, for those who are interested). It might also detract from the message in the video below: walk size matters a lot. Morally, walk size is social distance.

You'll notice that contagion occurs very quickly in the top left simulation - although its geography was a little unlucky, admittedly. The penetration is about 80%. On the bottom right, in comparison, we have a simulation where roughly one third as many people have caught the bug.

One should be leery of interpreting this as a "right to roam". I prefer the interpretation where people are meandering further from the water cooler on average, crossing the street to avoid joggers, choosing alternative means to get to work, commuting at off hours, wearing a mask (to create greater effective distance) and changing what they do when they socialize - such as shouting at each other as they sit in Adirondack chairs placed twenty feet apart.

n [default=40000] The population parameter is, as you might expect, the number of agents in the simulation.
If we hold the city generation parameters constant then this is almost a population density parameter - except for a small sprawl effect.

In this simulation we watch towns with differing populations varying from 24,000 on the top left to 56,000 bottom right. Towns begin with the same proportion of infections, ranging from 30 infections top left to 70 bottom right. I apologize, but because the simulation runs more slowly for the larger populations the progression is not in sync, making it look as if the smaller towns are doing worse than they are.

Nonetheless we see the larger towns fare considerably worse by every proportional measure. The dreaded exponential growth kicks in quickly, powered by asymptomatic carriers who collide with a sufficient number of people to set off the chain reaction. Meanwhile, in the small town things turn pear shaped, but more slowly. The rate of recovery is still an appreciable fraction of the people getting sick. Recovered people are the control rods for this reactor. For some time this takes some heat out of the core, as it were. Had this simulated town done more (a higher rate of testing, or a higher rate of contact tracing, which is equivalent to raising the asymptomatic test rate) it might have done much better.

Need we say it, a relatively small change in density makes a big difference. At the risk of a misleading comparison between this simulation's density and real world density, we note that the top right versus bottom right density ratio happens to be about the same as the density ratio between San Francisco and New York. The two leftmost simulations provide a density ratio of 2:1, which is similar to the density difference between Manhattan and Brooklyn. The density ratio between bottom right and top left approximates the ratio between the densities of New York City versus Los Angeles or, in turn, Los Angeles against Detroit, Cleveland or the Portland metropolitan area. We should not bucket New York with Portland.
Never mind the fact that Tennessee, South Dakota and Alaska are about twenty times less populated than New Jersey per square mile.

Care must be taken, however, as this is a highly stylized model intended to capture close collisions. Offices might be slightly bigger in Missouri than Manhattan, but perhaps not three times as large. People with longer commutes don't stand apart from each other more than those with short commutes, once they get to the office. There are close talkers living in Brandenburg, the least dense city in Germany.

The fact that the larger town started with the same proportion of infections but a numerically larger number than the small town might play a role. However, here is a similar set of simulations in which all towns start with fifty infections, irrespective of size. Things still turn out worse for the more densely packed populations.

You are going to see a lot of discussion about population density. Out of curiosity I quickly scratched out this plot of the logarithm of COVID-19 deaths (as of Apr 4th) versus the logarithm of population density for European countries. It ain't a law of physics, but it is hard to ignore.

I know what you are thinking ... there might be all manner of confounding variables here, and I'm sure you are right. What else is correlated? I did discover in the course of my brief investigations that amongst the "kissy countries" COVID-19 deaths are also correlated strongly with the number of times it is customary to kiss someone on the cheek when meeting. However I hastily add that this correlation completely disappeared after accounting for population density (actually the sign turned negative, a little). I think, therefore, we should take this as a cautionary tale. A lot of things are likely to be correlated, spuriously or otherwise, with population density. If you hear that "X causes COVID-19" then check against the plausible culprit: population density.

Parenthetically, this does leave us with a true mystery.
I leave it to a reader with superior insight to explain why people who live in more densely populated countries kiss each other on the cheek a larger number of times when they meet than those living in less densely populated countries. And before you jump to it, no, I am not convinced that an increase in the number of ceremonial kisses causes children.

Radius [default=0.04] Previously known as "sprawl distance", the radius spaces out homes and workplaces. A work location is selected randomly by first choosing an existing work location and then moving away (on average) a distance equal to the radius. We multiply the radius by a standard normal number. At present this parameter also determines how close people live to each other. The home radius is fixed at four times the work radius parameter. There is no real rationale for this choice beyond a desire for one less parameter.

I'm sure it will not surprise you to learn that decreasing the radius, ceteris paribus, tends to set up a city more likely to be susceptible to contagion. However geometry and luck enter the fray, as seen in the comparison between the bottom left and top right towns.

Two more parameters control the geography simulation (arguably one too many).

s. Sprawl [default=0.25] Controls the extent to which home and work locations tend to drift away from the origin. After an existing home location is chosen, the new location is centered at (1+s) times the existing home position. So when we said the mean distance from the previous home was the radius r, that wasn't entirely true.

e. Sprawl quadratic term [default=0.05] ... actually we also add e times the square of the distance to the origin of the existing home when choosing the center of where the next home is to be built.
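If I read those rules correctly, the center of the next home sits at radial distance (1+s)*d + e*d^2, where d is the existing home's distance to the origin; in vector form that is ((1+s) + e*d) times the existing position. Here is a sketch under that reading, with my own illustrative names - the real logic lives in the package's city generation code:

```python
import math
import random

def next_home(existing, r=0.04, s=0.25, e=0.05):
    """Place a new home near an existing one per the sprawl rules above.

    Center = (1+s) times the existing position, pushed further out by
    e times the squared distance to the origin; then jitter by the home
    radius (fixed at 4x the work radius parameter r) times a standard
    normal draw in each coordinate. Illustrative sketch only.
    """
    x, y = existing
    d = math.hypot(x, y)
    scale = (1 + s) + e * d        # radial distance becomes (1+s)*d + e*d**2
    cx, cy = scale * x, scale * y  # center of the new home
    home_radius = 4 * r
    return (cx + home_radius * random.gauss(0, 1),
            cy + home_radius * random.gauss(0, 1))
```

Setting r=0 strips the jitter and exposes the deterministic center, which makes the outward drift easy to eyeball: a home at distance 1 from the origin spawns neighbors centered at distance 1.3 under the defaults.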
It remains to be seen if this guy gets the chop from Ockham's razor.

I won't make you watch any more poorly produced videos, but different choices of sprawl coefficients can yield different density profiles for the city, and this tips the scales toward higher or lower virus penetration into the periphery.

6. Coming soon, a better use for spare CPU cycles than bitcoin mining

The parameter set has been kept as small as possible, but there are quite a few we have omitted in our discussion. For those who are interested, the names of parameters - some of which you may choose to treat as configuration constants - are found in pandemic/conventions.py.

It is my suspicion that you can go a long way varying a small number of these parameters, but I don't yet have a lot to back up that statement. And while running the model for any one set of initial conditions or parameters is easy, running it for a large number is not. So here's a plan. If you run this script then pretty soon the following will occur:

The simulation starts with parameters chosen to optimize an acquisition function, which is to say they are chosen differently each time but in a way that tries to optimize what will be learned about the simulation.

When it finishes, it sends back the results.

If enough people run the model, we may build up a large database of initial conditions, parameters, and model outcomes (a public database of course) which can serve as a training set for a surrogate model of pandemics. A surrogate model is one that is approximately the same as the simulation yet can be computed thousands or millions of times faster than the code you now run.

So here's a slightly premature call out to all armchair epidemiologists, disease epistemologists, statistical etymologists ("data science" - c'mon) and all you people on LinkedIn offering to be my personal life coach. Do you have a few spare CPU cycles?
Please stop drawing exponential curves and instead do the following:

Open up a terminal.

Open up this page and cut and paste the bash script into the terminal.

I'll make sure the script does a bit more than drawing pretty pictures.

Acknowledgements

It is pretty hard to model disease progression without accidentally reinventing a standard model or something like it. I like to think of myself as a defender of research mores, but like others in a hurry I have written the code more quickly than I can navigate academic paywalls and research prior work, on this particular occasion.

I would like to acknowledge Python's implementation of set authored by Raymond D. Hettinger, which I'm guessing doesn't get a shout out too often. I found Hiroaki Kawai's geohash package to be more than useful.

Open source pandemic libraries

Here are some other libraries that model millions of agents (c.f. pedagogical tools):

The epydemic library by Simon Dobson places people on nodes of a graph.

SimpactCyan (Liesenborgs et al) also models relationships and events.

Please help me flesh out this list.

Parting thoughts

As noted in the prior post, I wrote this code because I couldn't get my hands on any "official" open source spatiotemporal model for contagion and, like many of you, wanted to better understand the dynamics of disease. Agent models such as this one can be used or abused. However I don't think we should be resorting to heterogeneous population models for infection that don't actually model infection at all.

Another way to come at this is to recognize that an agent model can, with a small modification, imitate a heterogeneous population (macroscopic) model - albeit a computationally inefficient one. Suppose I were to introduce a line of code into the simulation that shuffled the positions of every particle on the screen, placing people uniformly irrespective of their home or work locations, and doing this over and over again.
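That hypothetical homogenizing line might look something like this (illustrative names; this is deliberately not in the package):

```python
import random

def homogenize(people, extent=1.0):
    """Re-place every agent uniformly at random, irrespective of their
    home or work attractors. Run every tick, this erases all spatial
    structure, so collisions occur at a single well-mixed rate and the
    agent model imitates a macroscopic (homogeneous population) model.
    """
    for person in people:
        person["pos"] = (random.uniform(-extent, extent),
                         random.uniform(-extent, extent))
```

Here each person is assumed to be a dict with a "pos" entry, purely for illustration; the point is that one uniform re-draw per tick is all it takes to throw away the geography.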
This enforces homogeneity, but doesn't seem terribly sensible. Instead, I encourage you to mess with the agent model code, where density differentials are preserved - if you look closely, there are parties at Westport and Stanwell Tops. I don't think it will do your mathematical intuition any harm - nor your mental health. As noted, this has turned me into something of a cautious conditional optimist, my warning about lack of rigorous estimation notwithstanding.

The dramatic drop in movement we have seen (as registered by cell phone locations) in places like New York City needs to be carefully translated into the strange geometry of this model, but combined with the knife edge behavior you see in the simulations, it suggests a dramatic turnaround is possible. Furthermore, the lags between turning points for things like infection and death in the simulation also give me hope. I won't be surprised if things take a turn for the better - though of course we all wish we never got this far.

Bye for now. A reminder that you are invited to make suggestions and improve this simulation model. It is yours to fork.

Originally published here

This is a longer form followup to my post describing the open source pandemic package on PyPI (with Python code also available on Github). You can use the code to simulate millions of people moving about in two dimensions, crossing paths and, unfortunately, getting sick.

This video illustrates the dynamic with a toy sized town of fifty people. Watch how transmission takes place. You might even discern commuting and households.

The package implements an agent based model, which is to say that people are modeled individually. This is not unusual (see for example this paper on influenza modeling in Australia that a reader of my prior post noted). An agent model can be conceptualized as taking several million copies of a disease progression model for an individual, and having those models walk around and bump into each other. But how should they walk?

Outline:

The modeling of illness progression

The model for population movement (Ornstein-Uhlenbeck processes on the plane)

The model used to generate a city (sprawling home and work locations)

Candidate model improvements

A visual tour of parameters and their sensitivities, stressing the importance of population density and of social distance

How you can help if you have spare CPU cycles

0. Motivation. Deficiencies with some other models for virus spread

Like you, I had heard forecasts and strategy choices referencing how far the virus will penetrate into the population. They did not accord with my intuition. No code was released. In some cases not even an outline of the mathematical approach was made public. Here is why that is a big problem. Two schools of thought emerged:

Strong form defeatism ("let's head straight for herd immunity")

Weak form defeatism ("we must flatten the curve but the area under the curve may remain roughly the same")

These were not the only views taken, fortunately, and countries like New Zealand and South Korea have pursued aggressive elimination strategies.
But defeatism of these varieties was certainly prominent. Perhaps it is borderline excusable. It is seductive if you are primarily guided by simple models that treat the population as a whole. That's because when you calibrate those models to an observed exponential explosion in cases the mathematics doesn't give you a way out. If you think every person will infect another two, there's only one way that will end.

In contrast, this simple simulation package might, if interpreted carefully, suggest quite different characteristics of disease progression across cities, towns and countries - especially when densities and movements vary considerably. In my previous post I showed a video of two larger towns with population 40,000 that suffer 50% and 75% virus penetration due to differing testing regimes. In the video shown below we remove half the inhabitants and run it again. The result is dramatically different. The smaller town fares much better. The virus penetration is less than 30%.

To tilt the scales I set the testing rate per capita three times lower in the small town compared to the more populous towns. And in one of the more populous towns, I introduced randomized testing. Not so for the small town. In other words, the smaller town was set up to fail, but despite these disadvantages it fared substantially better in relative terms. Density matters in this model.

Here is another example, this time with a much larger city of 800,000 people. The virus spreads in this city along the main commuting corridor between the two centers of population. The result is anything but pretty but again, only about 35% of the populace gets the bug, and the percentage falls dramatically as you move into the suburbs. Pro tip: watch this one at higher speed.
It may not be best to interpret density literally as geographical density - but so long as there are significant differences in the intensity of interactions between people (and the frequency of happenstance exposure) from one region to the next, we should be very wary of simplistic models and their tendency towards defeatism.

1. Modeling an individual's illness progression

Turning to the description of the model that generated these simulations, we start with disease progression. This aspect is orthodox. The code assumes that each individual has a health state:

Vulnerable

Infected but not symptomatic or as yet tested

Symptomatic but not as yet tested

Positive

Recovered, or

Deceased

For every moment that passes, there is a probability that an individual will move from one state to another. As per the code:

Infected or symptomatic people might be tested, and progress to being positive.

An infected, symptomatic or positive person may either recover or die.

Furthermore, the vulnerable colliding with those infected or symptomatic may become infected.

This is the key difference between this simulation and (some) commonly deployed pandemic models, which assume that anyone can become infected at any time. Put another way, we are using a motion simulation in order to model the exposure (viral load) that individuals receive - as compared with simply assuming it is some fixed or proportional quantity. In this model there is no single rate of infection - your odds of being exposed vary dramatically depending on where you are, which in turn depends on where you live, whether you commute and so forth.

If this type of model (where motion and collisions or near misses create exposure) is new to you and my first video above doesn't bring home the point, then I heartily recommend this video by Grant Sanderson on spatiotemporal models for contagion. He has taken mathematical pedagogy to a very high level. Here are nine simulations of disease spreading.
It is somewhat reminiscent of a forest fire. You would not model a forest fire by assuming that any tree in the country might spontaneously ignite whether or not it was close to the flame front. And even if most trees could recover as quickly as humans do, you would probably not recommend a strategy of simply allowing everything to burn in order to build up immunity against fire.

2. The OU Movement model

I now describe the pandemic package's motion model that controls movement. The primary challenge, as I saw it, was finding a way to deal with movement at different scales without getting super messy, prescriptive or exploding the parameter space. Which seat you choose on a train will determine your viral load. Which elevator you take could be critical. Whether you place your hand on a table in one position or an inch to the side could make the difference between life and death. These tiny differences are as important as whether you live in San Mateo or San Salvador, or whether you come into the city from the North or South on a regular basis.

One can imagine the list of parameters getting out of hand pretty quickly if we start modeling modes of transport and where you sit on the sofa when you get home. You should surely try to use that data if you have it. I don't. Instead, the pandemic library pulls out that old chestnut Brownian motion, or rather a minor modification, to try to capture unfortunate coincidence on different scales. It says it on the box, actually: individuals follow OU processes.

The OU stands for mathematicians Leonard Ornstein and George Eugene Uhlenbeck (we'll pretend Uhlenbeck means "one length back please" in German). An Ornstein-Uhlenbeck pandemic model, as we might term it, is one where everyone ambles about like Brownian motion - aka a random walk. However an OU process isn't entirely directionless.
Rather, it is a combination of a stagger and a steady pull towards a target - like someone who has imbibed too much looking for the campground toilet in the dark.

Take another close look at the first video and you may see that some of the points are meandering to work (or school) and back. Others are staying home - including some who are sick. They are all following random walks with a pull. Here is an example of such a walk courtesy of Shiyu Ji and wikipedia, where you can read more about uses and properties of OU processes.

In the OU pandemic model as currently implemented, people take random walks but they have invisible springs drawing them towards an attractor. The stiffness of the spring is a parameter. In addition, some percentage of the population wander their way to work and then to home on a daily basis (increment the parameter count by one for the working fraction, those of you who are counting). They have two attractors instead of one. In the morning the commuters are pulled by an invisible spring in the direction of their workplace. In the evening they are pulled by another invisible force back home.

Mathematical aside. Technically speaking, the walks are regime-switching OU processes (morning regime, evening regime). One advantage of this setup - not currently exploited - is that the probability of not being infected looks a lot like the expectation of the exponential of the integral of a different regime-switching Ornstein-Uhlenbeck process (namely the instantaneous probability of getting exposed, which rises and falls with your commute, drawn to a higher level at work and lower at home). It is a little known fact that this expression for survival admits an asymptotic series expansion where all terms are known.

The non-mathematical version of that aside is that one can construct an OU model of this OU model ... a little bit like a dream inside a dream in Inception ... though doing it might age you prematurely. So back to the top level dream we go.
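Concretely, a single step of such a regime-switching walk can be sketched with a simple Euler discretization. This is my own illustrative code under hypothetical parameter names (kappa, w, dt), not the package's implementation:

```python
import random

def ou_step(pos, attractor, kappa=3.0, w=1.0, dt=0.05):
    """One Euler step of the OU walk: a linear spring pull toward the
    attractor plus a Brownian stagger of size w*sqrt(dt) per coordinate."""
    x, y = pos
    ax, ay = attractor
    sd = w * dt ** 0.5
    return (x + kappa * (ax - x) * dt + random.gauss(0, sd),
            y + kappa * (ay - y) * dt + random.gauss(0, sd))

def attractor_now(t_hours, home, work, commuter=True):
    """Regime switching: commuters are pulled toward work during the
    (illustrative) working day, everyone else toward home."""
    return work if (commuter and 9 <= t_hours < 17) else home

def rms_home_distance(kappa, steps=20000):
    """Empirical equilibrium spread around a single attractor; it shrinks
    roughly like one over the square root of kappa."""
    p, total = (0.0, 0.0), 0.0
    for _ in range(steps):
        p = ou_step(p, (0.0, 0.0), kappa=kappa)
        total += p[0] ** 2 + p[1] ** 2
    return (total / steps) ** 0.5
```

Running rms_home_distance for a few values of kappa reproduces the inverse-square-root behavior of the equilibrium spread mentioned in the parameter tour: stiffer springs mean tighter clustering around the water cooler.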
The attraction term in the drunken walk is linear. Only the best, perfectly crafted invisible Hooken springs are employed. There is no momentum because I wasn't sure it was necessary or even a good idea, but we could consider it. Commuting by spring is something Elon Musk might perfect when, God willing, this is all over.We can critique the realism of OU walks with or without momentum - although let's be honest, you were never that keen to go straight to work. After you stop in at the coffee shop and then run into a few friends you're approaching OU. At home you wander about too, trying to avoid the horror of home schooling.When you get to work, in this model, you are still being tugged at by the spring but it doesn't mean you will all settle into the same exact location and automatically infect everyone you work with. Your random motion keeps you bobbling about in a neighborhood of your attractor, which is to say the water cooler. The same occurs when you go home. People don't converge to a single point so infection is not guaranteed at either location.But why listen to me? Open up this page and cut and paste the contents into the terminal. You'll see. In better times we could use this stylized setup to model the spread of gossip or something. Sadly, we have a morbid scenario unfolding before our eyes and that's why I am trying to enlist your help.3. Modeling citiesWhat of initial conditions? The people in the model needs to start somewhere, and they start a small random distance from their home. But where is their home? Here is a little piece of code that generates a synthetic collection of home and work locations. I would consider it something of a placeholder, yet I suspect, and I might be wrong, that the model retains some essential connection to population density regardless. When you run the model you will see just how important density is.The city is generated in a manner vaguely inspired by the Chinese restaurant process. 
The general idea is that you start throwing people at a map but not uniformly. Instead they pick a person already on the map at random and then decide to live near them. But they also have a tendency to plonk themselves down further away from the origin thus leading to urban sprawl. More on that below, but your best documentation for this is the code. I'd expect someone will suggest a more realistic generating process.I will make one remark, however. There would seem to be a danger in confusing implied geometry with actual geometry. The former could be defined as a correction to the latter when you take into account the overly stylized movement model. It is not necessarily the case that swapping out the fake city model for actual coordinates of people from actual demographic data is the best immediate use of time. For while we can certainly question the dynamics of motion I suspect there is a stretching of the map that makes it more realistic - and with sufficient skullduggery we could reverse engineer that into the code that generates the locations of workplaces and homes. What I mean to say is that deficits in the modeling of movement and geography can cancel out - but only if you let them.4. What's not in, yet.Quite a few things could be added quite easily - though I am not sure they justify increasing the parameter space - something I am loathe to do.Overloading. The addition of an intensive care state and contingent death hazard rates based on proximal capacity would capture a major effect now missing in the model - namely overloading of the system. However to be pedantic, this effect can be calculated after the fact to first order so including it in the simulation is not strictly required. Indeed some PDE models treat death and recovery as the same thing, which is a similar trick.Fomites. (Hat tip Stephen Luterman). I had to Google that word too. The passing of disease via inanimate objects. 
I can't find literature that makes quantitative assessment, but including trails of people and not just their contemporaneous positions would not be unreasonable. However my gut tells me that this might be taking movement too literally. It is just one of the ways that proximity leads to transmission and we aren't modeling whether it is a cough or cross contamination. We can absorb it into the walk size to first order and if necessary, testing rates - since although they don't move, fomites are otherwise like non-compliant symptomatic people.Cohorts. Different behavior by age and vulnerability. If the data by age is available, then we get back the cost of putting in cohorts. I think this might be the next thing that goes into the model.Schools. I point out that work, in the model, is wherever anyone goes. It can be school. I think the city simulation could be improved. School may be an important dynamic especially during lockdown periods in countries like Australia, where most things are closed but schools remain open.Non-compliant positive people could be included in the model, but this is very close to simply changing the testing rates (non-compliant positives are the same as asymptomatic carriers).Repeat illness. A post by Daniel Goldstein reads: "New data not published yet suggests that 70% of people develop measurable neutralizing antibodies that peak and stabilize at about 2 weeks. 30% don’t, even though they recovered clinically. This may prove to be a thing, we don't know yet.Gatherings. A hurricane could force many people into shelters (suggestion from Michael Broome). A small change could enable one to study the impact of allowing events involving fifty or five hundred people. However is this similar to including a small number of large but far-flung households?5. A visual tour of some parameters and their sensitivityIn mathematics we count one, two, many. Two is usually the hardest so we skipped it and went straight to many agents. 
This does not imply an absurd number of parameters. In this section we introduce the small number of control knobs for city generation and movement, as distinct from the medical parameters governing illness progression, which are entirely standard.

Kappa [default=3.0]. Kappa is the linear coefficient in the restoring force that drags everyone towards their attractors. The larger kappa, the stiffer the invisible spring. Kappa controls how keen people are to get to work and, symmetrically, how keen you are to get home. Some might wish to break this symmetry - and we probably all know people for whom the homebound kappa might be smaller than the workbound kappa - but I don't think it is required. In fact we might convince ourselves that kappa need not even be a free parameter and can be fixed once and for all.

The first thing to appreciate about kappa is its role at home. The video below shows a number of OU particles that are all attracted to the origin. They quickly reach an equilibrium state where their typical distance to the origin ceases to change very much. The green circle on the left is the theoretically computed root mean square distance to the origin, which is inversely proportional to the square root of kappa. The larger kappa, the tighter people cluster at home and, to a lesser extent, at work. Typically we would expect higher kappa to lead to more contagion.

h. Average household size [default=2.5]. The average household size and the tightness of the household (kappa) are going to drive contagion at home. Households are a recent change to the code. Household sizes are now binomially distributed, and a household is merely a group of people with the exact same home attractor. Household size and the relative sizes of kappa and w (discussed next) will determine whether everyone gets sick when one person does.
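As a concrete illustration, here is one way household sizes with mean 2.5 could be drawn binomially. The offset and the binomial parameters below are my own choices for the sketch, not necessarily those used in the package; the code is the authoritative reference.

```python
import random

def household_sizes(n_households, rng, trials=6, p=0.25):
    """Draw household sizes as 1 + Binomial(trials, p), so no household
    is empty. With trials=6 and p=0.25 the mean size is 1 + 1.5 = 2.5."""
    sizes = []
    for _ in range(n_households):
        size = 1 + sum(rng.random() < p for _ in range(trials))
        sizes.append(size)
    return sizes

rng = random.Random(42)
sizes = household_sizes(10000, rng)
mean_size = sum(sizes) / len(sizes)  # close to 2.5
```

Everyone in a household then shares the same home attractor, so a large household with a stiff kappa is a tight cluster in which one infection can easily become many.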
Anecdotally, that seems to be the case!

It isn't clear to me that we need household size to be a free parameter for forecasting purposes, given we have two other knobs, and I would be inclined to fix it at the national average. However, in the presence of demographic data, varying household size while holding everything else constant might enrich the feature space and possibly help us understand differing infection rates (say between the Bronx and Manhattan).

Before leaving household size, some comment on the relationship to kappa is in order. The interplay between kappa and household size is seemingly straightforward at home. Commuters' mean square distance to home in the evenings can also relate quite closely to the ergodic average if kappa is on the higher side. Here we set kappa=6 to make the point.

All that said, commuting muddies the waters in other ways. Different values of kappa might lead to a faster commute through troubled waters - depending on the geography - or quiet time in the car alone (as it were). The video below shows the progression of the disease in a town for nine increasing values of kappa (increasing left to right along the first row, then the second and third). In this and the other comparisons, all parameters are shown relative to a baseline town (see the code for this and other ready-to-go towns and cities). The baseline for kappa is 3.0, so the top left simulation uses kappa=1.8 and the bottom right kappa=4.2.

There is, I think you will agree, no glaringly obvious pattern, although, as expected, contagion occurs less rapidly for the smaller values of kappa. It may not be just a home effect. With kappa in the vicinity of 1.8, some people might not be turning up to work every day - never mind not mingling as closely with their work colleagues as they might.

A caveat. With all of these simulations the initial conditions are important, and those are generated randomly by design.
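As an aside on kappa: the pull toward an attractor described above is just an Ornstein-Uhlenbeck process, and its equilibrium behavior is easy to check numerically. Below is a minimal discretized sketch, not the package's actual code, with illustrative values for kappa, sigma and dt.

```python
import math
import random

def ou_step(x, kappa, sigma, dt, rng):
    """One Euler-Maruyama step of an Ornstein-Uhlenbeck particle
    pulled toward its attractor at the origin."""
    return x - kappa * x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)

rng = random.Random(0)
kappa, sigma, dt = 3.0, 1.0, 0.01

# Run many independent particles to equilibrium and compare the empirical
# spread with the stationary value sigma / sqrt(2 * kappa).
particles = [0.0] * 1000
for _ in range(1500):
    particles = [ou_step(x, kappa, sigma, dt, rng) for x in particles]

empirical = math.sqrt(sum(x * x for x in particles) / len(particles))
theory = sigma / math.sqrt(2 * kappa)
```

The stationary root mean square distance scales as one over the square root of kappa, which is why the green circle shrinks as kappa grows.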
If you scan back up to the first 3 x 3 video you'll see nine simulations that, though I didn't mention it at the time, use identical parameters. They certainly don't end up the same. Perhaps this will give you newfound respect for experts trying to predict the course of this disease, but also a sense of the noise in these comparisons.

c [default=0.5]. The fraction of people who have work attractors. Work, as noted, can be school or just some place people go that is sufficiently removed from where one would otherwise wander near home.

W [default=1.0]. The random walk step size is governed by the variable W. Philosophically I would like to think we could fix W=1.0 the way physicists set the speed of light c=1, but as a practical matter that doesn't work too well (see the geohashing part of the code, for those who are interested). It might also detract from the message in the video below: walk size matters a lot. Morally, walk size is social distance.

You'll notice that contagion occurs very quickly in the top left simulation - although its geography was a little unlucky, admittedly. The penetration is about 80%. In the bottom right, by comparison, roughly one third as many people have caught the bug.

One should be leery of interpreting this as a "right to roam". I prefer the interpretation where people are meandering further from the water cooler on average, crossing the street to avoid joggers, choosing alternative means of getting to work, commuting at off hours, wearing a mask (to create greater effective distance) and changing what they do when they socialize - such as shouting at each other from Adirondack chairs placed twenty feet apart.

n [default=40000]. The population parameter is, as you might expect, the number of agents in the simulation.
If we hold the city generation parameters constant then this is almost a population density parameter - except for a small sprawl effect.

In this simulation we watch towns with populations varying from 24,000 on the top left to 56,000 on the bottom right. Towns begin with the same proportion of infections, ranging from 30 infections top left to 70 bottom right. I apologize that, because the simulation runs more slowly for the larger populations, the progression is not in sync, making it look as if the smaller towns are doing worse than they are.

Nonetheless, we see the larger towns fare considerably worse by every proportional measure. The dreaded exponential growth kicks in quickly, powered by asymptomatic carriers who collide with a sufficient number of people to set off the chain reaction. Meanwhile, in the small town, things turn pear shaped, but more slowly. The rate of recovery is still an appreciable fraction of the rate at which people are getting sick. Recovered people are the control rods for this reactor, and for some time they take heat out of the core, as it were. Had this simulated town done more (a higher rate of testing, or a higher rate of contact tracing, which is equivalent to raising the asymptomatic test rate) it might have done much better.

Need we say it, a relatively small change in density makes a big difference. At the risk of a misleading comparison between this simulation's density and real world density, we note that the top right versus bottom right density ratio happens to be about the same as the density ratio between San Francisco and New York. The two leftmost simulations provide a density ratio of 2:1, which is similar to the density difference between Manhattan and Brooklyn. The density ratio between right and top left approximates the ratio between the densities of New York City and Los Angeles or, in turn, Los Angeles against Detroit, Cleveland or the Portland metropolitan area. We should not bucket New York with Portland.
Never mind the fact that Tennessee, South Dakota and Alaska have roughly one twentieth the population of New Jersey per square mile.

Care must be taken, however, as this is a highly stylized model intended to capture close collisions. Offices might be slightly bigger in Missouri than Manhattan, but perhaps not three times as large. People with longer commutes don't stand further apart from each other than those with short commutes, once they get to the office. There are close talkers living in Brandenburg, the least dense city in Germany.

The fact that the larger town started with the same proportion of infections, but a numerically larger count than the small town, might play a role. However, here is a similar set of simulations in which all towns start with fifty infections, irrespective of size. Things still turn out worse for the more densely packed populations.

You are going to see a lot of discussion about population density. Out of curiosity I quickly scratched out this plot of the logarithm of COVID-19 deaths (as of Apr 4th) against the logarithm of population density for European countries. It ain't a law of physics, but it is hard to ignore.

I know what you are thinking: there might be all manner of confounding variables here, and I'm sure you are right. What else is correlated? I did discover in the course of my brief investigations that amongst the "kissy countries" COVID-19 deaths are also strongly correlated with the number of times it is customary to kiss someone on the cheek when meeting. However, I hastily add that this correlation completely disappeared after accounting for population density (actually the sign turned a little negative). I think, therefore, we should take this as a cautionary tale. A lot of things are likely to be correlated, spuriously or otherwise, with population density. If you hear that "X causes COVID-19", check against the plausible culprit: population density.

Parenthetically, this does leave us with a true mystery.
I leave it to a reader with superior insight to explain why people who live in more densely populated countries kiss each other on the cheek a larger number of times when they meet than those living in less densely populated countries. And before you jump to it, no, I am not convinced that an increase in the number of ceremonial kisses causes children.

Radius [default=0.04]. Previously known as "sprawl distance", the radius spaces out homes and workplaces. A work location is selected by first choosing an existing work location at random and then moving away from it a distance equal, on average, to the radius: we multiply the radius by a standard normal number. At present this parameter also determines how close people live to each other, since the home radius is fixed at four times the work radius. There is no real rationale for this choice beyond a desire for one less parameter.

I'm sure it will not surprise you to learn that decreasing the radius, ceteris paribus, tends to set up a city more susceptible to contagion. However, geometry and luck enter the fray, as seen in the comparison between the bottom left and top right towns.

Two more parameters control the geography simulation (arguably one too many).

s. Sprawl [default=0.25]. Controls the extent to which home and work locations tend to drift away from the origin. After an existing home location is chosen, the new location is centered at (1+s) times the existing home position. So when we said the mean distance from the previous home was the radius r, that wasn't entirely true.

e. Sprawl quadratic term [default=0.05]. We also add e times the square of the existing home's distance to the origin when choosing the center of where the next home is to be built.
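Putting the radius, s and e parameters together, the home-placement rule described above can be sketched as follows. This is my reconstruction from the prose, with invented function and variable names; the code that generates the city remains the authoritative reference.

```python
import math
import random

def grow_homes(n, radius=0.16, s=0.25, e=0.05, rng=None):
    """Preferential-attachment city growth: each new home picks an
    existing home at random, stretches that position away from the
    origin by the sprawl terms, then scatters by a normal amount.
    The home radius here is four times the default work radius of 0.04."""
    rng = rng or random.Random(1)
    homes = [(0.0, 0.0)]
    for _ in range(n - 1):
        x, y = rng.choice(homes)
        d = math.hypot(x, y)
        # Center is (1+s)*position plus e*d^2 in the radial direction,
        # i.e. a per-coordinate stretch factor of (1+s) + e*d.
        stretch = (1.0 + s) + e * d
        cx, cy = stretch * x, stretch * y
        homes.append((cx + radius * rng.gauss(0.0, 1.0),
                      cy + radius * rng.gauss(0.0, 1.0)))
    return homes

homes = grow_homes(2000)
```

Because every new home is centered slightly beyond its parent, later arrivals drift outward on average - the urban sprawl mentioned in Section 3.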
It remains to be seen whether this one gets the chop from Ockham's razor. I won't make you watch any more poorly produced videos, but different choices of sprawl coefficients yield different density profiles for the city, and this tips the scales toward higher or lower virus penetration into the periphery.

6. Coming soon, a better use for spare CPU cycles than bitcoin mining

The parameter space has been kept as small as possible, but there are quite a few parameters we have omitted in our discussion. For those who are interested, the names of the parameters - some of which you may choose to treat as configuration constants - are found in pandemic/conventions.py.

It is my suspicion that you can go a long way varying a small number of these parameters, but I don't yet have a lot to back up that statement. And while running the model for any one set of initial conditions or parameters is easy, running it for a large number is not. So here's a plan. If you run this script then pretty soon the following will occur:

The simulation starts with parameters chosen to optimize an acquisition function, which is to say they are chosen differently each time, in a way that tries to maximize what will be learned about the simulation.

When it finishes, it sends back the results.

If enough people run the model, we may build up a large database of initial conditions, parameters and model outcomes (a public database, of course) which can serve as a training set for a surrogate model of pandemics. A surrogate model is one that behaves approximately like the simulation yet can be computed thousands or millions of times faster than the code you now run.

So here's a slightly premature call out to all armchair epidemiologists, disease epistemologists, statistical etymologists ("data science" - c'mon) and all you people on LinkedIn offering to be my personal life coach. Do you have a few spare CPU cycles?
Please stop drawing exponential curves and instead do the following:

Open up a terminal.

Open up this page and cut and paste the bash script into the terminal.

I'll make sure the script does a bit more than draw pretty pictures.

Acknowledgements

It is pretty hard to model disease progression without accidentally reinventing a standard model or something like it. I like to think of myself as a defender of research mores, but like others in a hurry I have written the code more quickly than I can navigate academic paywalls and research prior work, on this particular occasion.

I would like to acknowledge Python's implementation of set, authored by Raymond D. Hettinger, which I'm guessing doesn't get a shout out too often. I found Hiroaki Kawai's geohash package to be more than useful.

Open source pandemic libraries

Here are some other libraries that model millions of agents (as distinct from pedagogical tools):

The epydemic library by Simon Dobson places people on nodes of a graph.

SimpactCyan (Liesenborgs et al) also models relationships and events.

Please help me flesh out this list.

Parting thoughts

As noted in the prior post, I wrote this code because I couldn't get my hands on any "official" open source spatiotemporal model for contagion and, like many of you, wanted to better understand the dynamics of disease. Agent models such as this one can be used or abused. However, I don't think we should be resorting to heterogeneous population models for infection that don't actually model infection at all.

Another way to come at this is to recognize that an agent model can, with a small modification, imitate a heterogeneous population (macroscopic) model - albeit a computationally inefficient one. Suppose I were to introduce a line of code into the simulation that shuffled the positions of every particle on the screen, placing people uniformly irrespective of their home or work locations, over and over again.
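That hypothetical shuffling line might look something like the following. This is purely illustrative - the function name and data layout are invented for the sketch, not taken from the package.

```python
import random

def mean_field_shuffle(positions, rng):
    """Randomly permute agents' positions at each time step. Every agent
    is then equally likely to stand anywhere anyone currently stands,
    severing the link between an agent's state and its geography - a
    crude imitation of a well-mixed (macroscopic) model."""
    rng.shuffle(positions)
    return positions

rng = random.Random(7)
positions = [(i * 0.1, i * 0.2) for i in range(100)]
shuffled = mean_field_shuffle(list(positions), rng)
```

Note that the set of occupied locations is preserved exactly; only the assignment of agents to locations is scrambled.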
This enforces homogeneity, but doesn't seem terribly sensible.

Instead, I encourage you to mess with the agent model code, where density differentials are preserved - if you look closely, there are parties at Westport and Stanwell Tops. I don't think it will do your mathematical intuition any harm, nor your mental health. As noted, this has turned me into something of a cautious, conditional optimist, my warning about the lack of rigorous estimation notwithstanding.

The dramatic drop in movement we have seen in places like New York City (as registered by cell phone locations) needs to be carefully translated into the strange geometry of this model, but combined with the knife edge behavior you see in the simulations, it suggests a dramatic turnaround is possible. Furthermore, the lags between turning points for things like infection and death in the simulation also give me hope. I won't be surprised if things take a turn for the better - though of course we all wish we had never got this far.

Bye for now. A reminder that you are invited to make suggestions and improve this simulation model. It is yours to fork.

Originally published here