Summer Insect Pandemics in the United States

Using the crowd-sourced data of iNaturalist.org, it is possible to get an idea of what life forms can be observed, when and where [1]. Many insect orders (Coleoptera, Hemiptera, Hymenoptera, Lepidoptera, Odonata and Orthoptera) include species that lie dormant during the winter months and emerge briefly as adults in the summer [2–10]. For example, the swallowtails of genus Papilio have two to three broods per year, and their observational data in the United States features a strong logistic peak centered roughly around August 1 [2]. Logistic peaks have an extremely simple mathematical definition (see Details). They appear in observational data across the class Insecta, and more generally still. Some of the available COVID-19 pandemic data also closely follows a logistic peak [11, 12]. This Demonstration shows how to use a special logistic peak as a data model and extract comparable fit parameters in a wide variety of cases.

Contributed by: Brad Klee (April 2020)
Open content licensed under CC BY-NC-SA

Details

1. Theory

First, we derive the logistic equation from a three-state combinatorial model [13, 14]. A population of individuals divides equally into males and females. The males and females are paired by a one-to-one mating, which strongly affects population dynamics. Initially, the entire population is in the pupal state P. At each instant of time, one individual pupa (P) becomes an adult (A). If the new adult's mate is already in A, then both move to the mated state M together (in the same time step).

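At any instant of time, the normalized count of adults $a$ must have a slope

$\frac{da}{dt} = 1 - \frac{2a}{p}$,

where $p$ is the normalized pupal population. The first term accounts for the pupa-to-adult transition P → A, while the second accounts for the adult-to-mated transition A → M: with probability $a/p$ the new adult's mate is already waiting in A, so the adult count falls by one instead of rising by one. After rearranging terms and substituting out $p = 1 - t$, we arrive at a simple ordinary differential equation (ODE)

$(1 - t)\,\frac{da}{dt} + 2a = 1 - t$.

This ODE is solved by the polynomial $a(t) = t(1 - t)$, which also satisfies the boundary condition $a(1) = 0$ (all pupae become adults and meet their mate in the time interval $0 \le t \le 1$). This is not exactly what we are looking for, because the function fails to describe observational data.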
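The pairing rule is simple enough to check numerically. The following is a minimal Monte Carlo sketch, not the Demonstration's source code: the function name, the population size and the seed are illustrative assumptions. It compares the simulated adult fraction with the polynomial t(1 - t), which follows from the pairing rule when time is measured in units of the whole population.

```python
import random

def simulate_adult_fraction(n_individuals, seed=0):
    """Simulate pupa -> adult -> mated and return (t, adult fraction) samples."""
    if n_individuals % 2:
        raise ValueError("the population must split into mated pairs")
    rng = random.Random(seed)
    # Individuals 2k and 2k+1 are mates. A shuffled order is equivalent to
    # picking one remaining pupa uniformly at random at every time step.
    order = list(range(n_individuals))
    rng.shuffle(order)
    waiting = [False] * n_individuals  # True while an unmated adult sits in A
    n_adults = 0
    samples = []
    for step, individual in enumerate(order, start=1):
        mate = individual ^ 1
        if waiting[mate]:
            # The mate is already an adult: both move on to the mated state.
            waiting[mate] = False
            n_adults -= 1
        else:
            # The mate is still a pupa: the new adult waits in A.
            waiting[individual] = True
            n_adults += 1
        samples.append((step / n_individuals, n_adults / n_individuals))
    return samples

samples = simulate_adult_fraction(20000)
max_error = max(abs(a - t * (1 - t)) for t, a in samples)
print(f"largest deviation from t(1 - t): {max_error:.4f}")
```

For a population of this size the deviation should shrink roughly like the inverse square root of the population, so larger runs track the polynomial more closely.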

The population observation data for insect pandemics is characterized by exponential tails, so we should modify the update rule to force exponential growth. At each instant of time, rather than choosing one candidate pupa, we will choose candidate pupae. This is equivalent to condensing consecutive time steps into just one. The new system of time sets , and requires that

,

or equivalently, . In other words, .

Iteration over large time steps introduces error and asymmetry. By Euler's method, . Up to a change in scale, this is essentially the logistic map, which is exactly solved by [15]

.

We also get the same function in reverse by solving with and . In forward time, the iteration is written as . The two imperfect solutions permute under time reversal symmetry. In this context, we describe either solution as slowly emerging or quickly emerging depending on the time it takes for to reach 1%. The corresponding logistic peaks are found by taking a derivative of the continuous solution, or first differences of the discrete solution, . The source code gives more details in particular.
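The link between Euler's method and the logistic map can be illustrated with a short sketch. It assumes the continuous equation has the standard logistic form dx/dt = x(1 - x) and that the Euler step is one time unit; the symbols `x` and `x0` and the function names are illustrative, and the Demonstration's source code may use a different scaling. Under these assumptions the iteration is x → x(2 - x), which becomes the r = 2 logistic map after substituting x = 2y, and its closed form is 1 - (1 - x0)^(2^n).

```python
def euler_logistic(x0, steps):
    """Iterate unit-step Euler updates of dx/dt = x (1 - x)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + x * (1.0 - x))  # same as x * (2 - x)
    return xs

def closed_form(x0, n):
    """Exact value of the n-th iterate: 1 - (1 - x0)^(2^n)."""
    return 1.0 - (1.0 - x0) ** (2 ** n)

x0 = 0.001  # assumed small initial fraction
xs = euler_logistic(x0, 14)

# First differences of the discrete solution give the discrete logistic peak.
peak = [later - earlier for earlier, later in zip(xs, xs[1:])]

for n, value in enumerate(xs):
    assert abs(value - closed_form(x0, n)) < 1e-9
```

The first differences rise by roughly a factor of two per step on the early side and collapse much faster on the late side, which is the kind of asymmetry the text attributes to iteration over large time steps.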

2. Data Analysis

Summer insects such as those analyzed here are typically observed only as adults, especially when they are out searching for a mate. After mating they approach the end of their life cycle and die soon thereafter, preventing further observation. Even though the descriptive model is a vast oversimplification, there is some reason to believe that either the quick or slow emergence logistic peak will describe data. In fact, this is the case, and we achieve better than 99% accuracy in a wide variety of cases.

Datasets obtained from [1] describe observations over 5–10 years of significant activity. The data is processed and binned by day from day 1 to day 365, ignoring leap days when they occur (this causes a systematic error in February, but it is negligible because the species considered are dormant in winter). Most of the species were chosen because their seasonality clearly follows a logistic peak; however, this is not true for Papilio butterflies, which have multiple broods. In these cases, data from Eurytides marcellus is used as a baseline. This works well because Eurytides marcellus has similar early-season statistics, but is missing the significant late-season peak.

Day-to-day data is noisy, but can be smoothed by binning into equal time intervals. In fact, comparison with discrete fit functions requires binning anyway. Let be a derived dataset with offset and bin width . The binned data can then be compared directly to either logistic peak , as in the source code. To determine maximum likelihood parameters, we use a method familiar from quantum mechanics, where the probability of one observational function relative to another is given by a normalized dot product of the state vectors [16]. The dot product is written as , with a normalization factor . Maximum likelihood is achieved for some particular parameters when most closely approaches 1. In each case, these parameters are calculated in the source code, and reported as interval width and day of maximum outbreak in the Demonstration. Discrete fits are shown as red points in the plots.
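The binning and overlap steps can be sketched as follows. This is a hedged illustration rather than the Demonstration's source code: the function names, the toy daily counts and the toy model values are all hypothetical, and the overlap is taken here to be the plain normalized dot product (cosine similarity), whereas the source code may use a squared or otherwise scaled variant.

```python
import math

def bin_counts(daily, offset, width):
    """Sum daily counts into consecutive bins of `width` days, starting at index `offset`."""
    return [sum(daily[i:i + width])
            for i in range(offset, len(daily) - width + 1, width)]

def overlap(u, v):
    """Normalized dot product of two equal-length sequences (1 means same shape)."""
    if len(u) != len(v):
        raise ValueError("sequences must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    if norm == 0:
        raise ValueError("overlap is undefined for an all-zero sequence")
    return dot / norm

# Hypothetical daily observation counts and hypothetical model values per bin.
daily = [0, 1, 3, 6, 10, 12, 9, 5, 2, 1, 0, 0]
model = [4.0, 16.0, 21.0, 7.0, 1.0, 0.0]

binned = bin_counts(daily, offset=0, width=2)
score = overlap(binned, model)
print(binned, round(score, 3))
```

In a fit, the offset and bin width would be scanned over a range of values and the pair giving the overlap closest to 1 retained. Because the overlap is insensitive to overall scale, the total number of observations does not need to be fitted separately.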

It is also possible to ask for a least-squares fit of either continuous function to day-to-day data. This is an alternative, possibly preferable, means to estimate parameters . Continuous fits rarely have significantly different values for the extracted parameters, as can be seen from the blue curves. In the worst cases, the difference between discrete and continuous estimates of fall within a window of days, less than 10% error, not too bad considering imperfections of data. In any case, minimum uncertainty can be estimated as at least day due to per-day sample rate and large variations.

3. Discussion

According to our calculations, observation of COVID-19 in China also follows a quickly emerging logistic peak. This is not too surprising when considering similarities to the Kermack–McKendrick model [17]. The states susceptible, infected, removed (S, I, R) can be mapped to the entomological model by S → P, I → A, R → M. However, the Kermack–McKendrick SIR model involves two free parameters in addition to its scale and translational degrees of freedom. Given more data to analyze, an SIR-style pupa, adult, mated (PAM) model would probably do a better job in general. This sounds like an advantage, but it is not quite one. We take a naive perspective and analyze a minimal amount of data, so by Occam's razor, the simpler PAM model used here is at least more appropriate in context, if not preferable.

These figures of 99% accuracy and higher are actually astounding when we consider that the logistic peaks used here have zero shape degrees of freedom. Yet they do have an odd shape, which is the consequence of simple iterative definitions, another amazing fortuity. Other authors have suggested using the logistic equation to fit COVID data [12]. This approach can achieve better than 99% accuracy; however, it does not capture the asymmetry of the COVID data. In a more detailed overlap analysis, we can measure accuracy on 98 data points rather than 15, then find 98.1% versus 98.6% accuracy for symmetric and asymmetric fit functions, respectively. This shows that the asymmetric model is a clear favorite because the difference of 0.5% is a whopping one-fourth of the 2% interval from 98% to 100%. When more data becomes available, we will see whether or not the trend asymmetry continues. Our guess is that it will.

One last issue to mention is anthropocentric bias, though this issue probably should have been mentioned first. Both iNaturalist data [1] and COVID-19 data are observational samplings of a larger population, much of which goes unobserved. Unless the sample density is exactly uniform relative to the unseen population, a practical impossibility, observation counts are not fully equivalent to population counts. This is okay, but less than ideal.

Here is a hypothetical example of the danger of biasing. Person 1 photographs the first Papilio glaucus of the late-summer brood and posts it online. The photograph is seen by persons 2 and 3, who then get their own photographs, which inspire persons 4–8 and so on. This would cause a systematic error for anyone trying to estimate population growth rate from observational data. Undoubtedly it happens, but is it a persistent effect in data? Members of genus Limenitis are very attractive to photograph, but have slow emergence, so this is at least one apparent counterexample to the idea that viral inspiration always leads to a systematic overestimate of the turn-on growth factor.

There is little to no hope that citizen scientists will collectively develop a standard sampling methodology. This is difficult even for professional scientists. Most likely, biases will need to be inferred from existing data. In the future of data analysis, much more work needs to be done developing filter functions or transfer functions that can help to offset misleading biases. Until then, we can at least take away the insect/virus analogy, and be thankful if/when subsequent COVID-19 outbreaks have short lifetimes relative to more desirable examples from the class Insecta.

References

[1] iNaturalist. (Apr 6, 2020) www.inaturalist.org.

[2] Genus Papilio. (Apr 6, 2020) www.inaturalist.org/taxa/47225-Papilio.

[3] Genus Neotibicen. (Apr 6, 2020) www.inaturalist.org/taxa/751082-Neotibicen.

[4] Popillia japonica. (Apr 6, 2020) www.inaturalist.org/taxa/67760-Popillia-japonica.

[5] Libellula luctuosa. (Apr 6, 2020) www.inaturalist.org/taxa/47934-Libellula-luctuosa.

[6] Pterophylla camellifolia. (Apr 6, 2020) www.inaturalist.org/taxa/118951-Pterophylla-camellifolia.

[7] Limenitis archippus. (Apr 6, 2020) www.inaturalist.org/taxa/58586-Limenitis-archippus.

[8] Limenitis arthemis. (Apr 6, 2020) www.inaturalist.org/taxa/60607-Limenitis-arthemis.

[9] Limenitis arthemis astyanax. (Apr 6, 2020) www.inaturalist.org/taxa/58585-Limenitis-arthemis-astyanax.

[10] Bombus impatiens. (Apr 6, 2020) www.inaturalist.org/taxa/118970-Bombus-impatiens.

[11] H. Ritchie. "Coronavirus Source Data." Our World in Data. (Apr 6, 2020) ourworldindata.org/coronavirus-source-data.

[12] R. Allain, "The Promising Math behind 'Flattening the Curve'," Wired, March 24, 2020. www.wired.com/story/the-promising-math-behind-flattening-the-curve.

[13] E. W. Weisstein. "Logistic Equation" from Wolfram MathWorld--A Wolfram Web Resource. (Apr 6, 2020) mathworld.wolfram.com/LogisticEquation.html.

[14] B. Klee and C. Moore. "Socks" [math-fun] mailing list, March 29, 2020.

[15] E. W. Weisstein. "Logistic Map--r=2" from Wolfram MathWorld--A Wolfram Web Resource. (Apr 6, 2020) mathworld.wolfram.com/LogisticMapR=2.html.

[16] D. J. Griffiths, Introduction to Quantum Mechanics, Cambridge: Cambridge University Press, 2017.

[17] E. W. Weisstein. "Kermack–McKendrick Model" from Wolfram MathWorld--A Wolfram Web Resource. (Apr 6, 2020) mathworld.wolfram.com/Kermack-McKendrickModel.html.