# 4. Model selection using the Schwarz criterion

Nevertheless, Goldberg et al. made an important contribution in two key respects. Firstly, their piecewise model is defined by a small number of discrete phases or periods. This brings the advantage of directly modelling the timing and intensity of population events (the date at which the model changed from one phase to the other), and a simple description of the population behaviour in each phase. Secondly, and most importantly, the authors raised the point that a model comparison is required. They test various models, both simpler (one phase) and more complex (up to six phases), in various permutations of logistic and exponential phases. We construct a continuous piecewise model, calculate likelihoods and use the BIC to select the most appropriate number of phases. Finally, we use a goodness-of-fit (GOF) test to show the data are plausible under the best model.

## 3. Continuous piecewise linear modelling

The goal in population modelling is usually to identify specific demographic events. Typically, the objective is to estimate the date of some event that marks a change in the trajectory of the population levels, such as the start of a rapid decline or increase in population levels (perhaps from disease, migration or changes in carrying capacity), and to provide a simple description of the population behaviour between these events, such as a growth rate. A CPL model lends itself well to these objectives since its parameters are the coordinates of the hinge points, which are the relative population size (y) and timing (x) of these events.
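To make the role of the hinge points concrete, the following is a minimal sketch of a CPL model treated as a probability density: the hinge coordinates define the shape, and the curve is rescaled so that the area beneath it is 1. The function name `cpl_pdf`, the hinge values and the time axis are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def cpl_pdf(hinge_x, hinge_y):
    """Build a continuous piecewise linear (CPL) density from hinge points.

    hinge_x: strictly increasing timings of the hinges (x).
    hinge_y: non-negative relative population sizes at the hinges (y).
    Returns a function pdf(x) whose integral over the date range is 1.
    """
    if len(hinge_x) != len(hinge_y) or len(hinge_x) < 2:
        raise ValueError("need at least two hinge points of matching length")
    if any(b <= a for a, b in zip(hinge_x, hinge_x[1:])):
        raise ValueError("hinge_x must be strictly increasing")
    if any(y < 0 for y in hinge_y):
        raise ValueError("hinge_y must be non-negative")

    points = list(zip(hinge_x, hinge_y))
    segments = list(zip(points, points[1:]))

    # The trapezoid rule is exact for linear pieces, so this is the true area.
    area = sum(0.5 * (y0 + y1) * (x1 - x0) for (x0, y0), (x1, y1) in segments)
    if area <= 0:
        raise ValueError("total area must be positive")

    def pdf(x):
        # Density is zero outside the modelled date range.
        if x < hinge_x[0] or x > hinge_x[-1]:
            return 0.0
        for (x0, y0), (x1, y1) in segments:
            if x0 <= x <= x1:
                t = (x - x0) / (x1 - x0)
                return (y0 + t * (y1 - y0)) / area
        return 0.0

    return pdf


# Hypothetical 3-CPL model: stable, then growth, then decline.
pdf = cpl_pdf([0, 1000, 2000, 3000], [1, 1, 3, 2])
```

Only the relative heights of the hinges matter here, because the normalisation removes any overall scale; this is why y is described as a relative rather than absolute population size.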

We choose the number of linear phases (or number of hinge points joining these phases) systematically as part of a model selection process. Given a ¹⁴C dataset, we find the maximum-likelihood (ML) continuous one-piece (or one phase) linear model (1-CPL), then the ML 2-CPL, etc. Although the likelihood increases with the number of parameters (the greater freedom allows the model to fit more closely to the data), we calculate the Schwarz criterion, otherwise commonly misnamed the BIC, to naturally penalize for this increasing complexity. We favour this criterion over AIC since the BIC provides a greater penalty for model complexity than does the AIC, ensuring conservative selection that avoids an overfit model. Indeed, we find the AIC typically favours an unjustifiably complex model, for example, when using toy data where the 'true model' is known. Therefore, we select the model with the lowest BIC as the best model. Model complexity beyond this provides incrementally worse BIC values, and as a result, the turning point in model complexity can be easily found, and superfluous computation for unnecessarily complex CPL models is thus avoided.
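The selection step can be sketched as follows, using the standard definitions BIC = k ln(n) − 2 ln(L) and AIC = 2k − 2 ln(L), where k is the number of free parameters, n the sample size and L the maximised likelihood. The log-likelihood values and sample size below are invented purely for illustration, and the parameter count of 2n − 1 for an n-CPL model is an assumption made for this sketch (one x and one y per interior hinge, two end heights, minus one for normalisation) rather than something stated in this section.

```python
import math


def bic(log_lik, n_params, n_obs):
    """Schwarz criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 * ln(L)."""
    return 2.0 * n_params - 2.0 * log_lik


def cpl_param_count(n_phases):
    # Assumption for this sketch: an n-CPL model has 2n - 1 free parameters.
    return 2 * n_phases - 1


def select_models(max_log_liks, n_obs):
    """Return (best phases by BIC, best phases by AIC, BIC scores).

    max_log_liks maps number of phases -> maximised log-likelihood.
    """
    bic_scores = {
        n: bic(ll, cpl_param_count(n), n_obs) for n, ll in max_log_liks.items()
    }
    aic_scores = {n: aic(ll, cpl_param_count(n)) for n, ll in max_log_liks.items()}
    best_bic = min(bic_scores, key=bic_scores.get)
    best_aic = min(aic_scores, key=aic_scores.get)
    return best_bic, best_aic, bic_scores


# Hypothetical maximised log-likelihoods for 1-CPL to 5-CPL models.
# The likelihood keeps rising with complexity, but with diminishing returns.
example_log_liks = {1: -3000.0, 2: -2980.0, 3: -2970.0, 4: -2966.0, 5: -2963.5}
best_bic, best_aic, bic_scores = select_models(example_log_liks, n_obs=500)
```

With these invented numbers the BIC is lowest for the 3-CPL model while the AIC continues to prefer the 5-CPL model, which mirrors the more conservative behaviour of the BIC described above. Because BIC values worsen beyond the minimum, fitting can stop once the score has clearly turned upwards.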

While a large database provides greater information content to justify a CPL model with many hinge points, it is worth considering the extreme case of fitting a CPL model to a tiny dataset. Figure 2 illustrates that the lack of information content naturally guards against overfitting, and a uniform distribution is always selected (a model with no demographic events and no population fluctuations) where sample sizes are low. This should make intuitive sense: in the light of such sparse evidence we should not infer anything more complex than a constant population.

## We build on this approach and overcome their shortcomings

Large ¹⁴C databases covering long time periods often exhibit a general long-term background increase through time, attributable to some combination of long-term population growth and some unknown rate of taphonomic loss of dateable material through time. Such a dataset may be better explained by a model of exponential growth (requiring just a single lambda parameter) than a CPL model. Therefore, for real datasets, the model selection procedure should also consider other non-CPL models such as an exponential model.
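As an illustration of why the exponential model needs only one parameter, one possible parameterisation (an assumption here, since the text specifies only that a single lambda is required) is an exponential density truncated to the modelled date range $[x_{\min}, x_{\max}]$, with $x$ increasing forward through time:

$$
f(x \mid \lambda) = \frac{\lambda \, e^{\lambda x}}{e^{\lambda x_{\max}} - e^{\lambda x_{\min}}}, \qquad x_{\min} \le x \le x_{\max}, \quad \lambda \neq 0 .
$$

Under this form, $\lambda > 0$ corresponds to growth, $\lambda < 0$ to decline, and the density tends to the uniform distribution $1/(x_{\max} - x_{\min})$ as $\lambda \to 0$. With $k = 1$, the penalty term of the Schwarz criterion is simply $\ln(n)$, so the exponential model can be compared directly against the CPL models on the same BIC scale.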