Page 123 - Textos de Matemática Vol. 47
MODELLING TIME SERIES OF COUNTS: AN INAR APPROACH 113
discrete valued variables the associated distribution functions are step functions, and therefore adjustments are necessary. Several authors propose a randomized PIT obtained by perturbing the step-function nature of the distribution function of discrete random variables. Additionally, [12] introduce a nonrandomized version of the PIT suitable for count data. For details and references see [12].
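As an illustration of the nonrandomized idea, the following minimal sketch spreads each observed count uniformly over the jump of its predictive distribution function and averages the result over all observations. It assumes the construction commonly used for count data; whether it matches [12] in every detail is an assumption. The names `poisson_cdf`, `conditional_pit` and `pit_histogram` are hypothetical, and the Poisson predictive distribution is only a stand-in for whatever predictive distribution the fitted model provides.

```python
import math

def poisson_cdf(x, mu):
    """P(X <= x) for X ~ Poisson(mu); returns 0.0 for x < 0."""
    if x < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for k in range(1, x + 1):
        term *= mu / k
        total += term
    return min(total, 1.0)

def conditional_pit(u, x, cdf):
    """Nonrandomized PIT value F(u | x) for an observed count x.

    `cdf` is the predictive distribution function; it must return 0 for
    negative arguments.
    """
    lower = cdf(x - 1)
    upper = cdf(x)
    if u >= upper:
        return 1.0
    if u <= lower:
        return 0.0
    # Linear interpolation across the jump of the step function at x.
    return (u - lower) / (upper - lower)

def pit_histogram(observations, cdfs, n_bins=10):
    """Relative bin heights of the nonrandomized PIT histogram.

    `cdfs[t]` is the predictive distribution function for `observations[t]`.
    """
    n = len(observations)

    def mean_pit(u):
        return sum(conditional_pit(u, x, cdf)
                   for x, cdf in zip(observations, cdfs)) / n

    edges = [j / n_bins for j in range(n_bins + 1)]
    return [mean_pit(edges[j + 1]) - mean_pit(edges[j]) for j in range(n_bins)]

# Usage with made-up counts and a fixed Poisson(1.3) predictive distribution.
counts = [0, 1, 2, 3, 1]
predictive = [lambda x: poisson_cdf(x, 1.3)] * len(counts)
heights = pit_histogram(counts, predictive, n_bins=5)
```

The bin heights sum to one by construction. Under a well-calibrated predictive distribution they should be roughly equal, so marked U-shapes or humps point to dispersion that is too small or too large.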
Further evaluation of the model based on its predictive performance may be carried out using scoring rules suggested by [12] and [28].
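To make the idea of a scoring rule concrete, the sketch below computes three scores that are commonly used for count forecasts: the logarithmic, quadratic and ranked probability scores, all negatively oriented so that smaller is better. Which rules [12] and [28] actually recommend is not stated here, so this selection is an assumption. The function names and the truncation point `k_max` are hypothetical.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu), mu > 0."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def log_score(pmf, x):
    """Logarithmic score: minus the log predictive probability of x."""
    return -math.log(pmf(x))

def quadratic_score(pmf, x, k_max=200):
    """Quadratic score: -2 p(x) + sum_k p(k)^2, truncated at k_max."""
    norm_sq = sum(pmf(k) ** 2 for k in range(k_max + 1))
    return -2.0 * pmf(x) + norm_sq

def ranked_probability_score(pmf, x, k_max=200):
    """Ranked probability score: sum_k (P(X <= k) - 1{x <= k})^2."""
    total, cdf = 0.0, 0.0
    for k in range(k_max + 1):
        cdf += pmf(k)
        indicator = 1.0 if x <= k else 0.0
        total += (cdf - indicator) ** 2
    return total

# Usage: score the observation x = 1 under a Poisson(1.3) predictive pmf.
pmf = lambda k: poisson_pmf(k, 1.3)
scores = (log_score(pmf, 1), quadratic_score(pmf, 1),
          ranked_probability_score(pmf, 1))
```

In practice each score is averaged over all one-step-ahead predictions, and competing models are compared through their mean scores.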
3.2.4. Information criteria. The Akaike information criterion (AIC) and its many variants have been among the most popular tools for model selection in time series analysis. In the context of time series of counts, [41] studies an automatic criterion for selecting the order of an INAR(p) model based on the corrected Akaike information criterion (AICC) of [22]. Some authors have used AIC as a means of choosing between non-nested models for time series of counts, despite the lack of studies on the performance of the criterion in this framework. Moreover, [38] examine the ability of widely used information criteria such as AIC, BIC and the Hannan-Quinn criterion (HQ) [21] to distinguish between some nonlinear time series models that have been popular with practitioners. After an extensive simulation study, they argue that all three criteria have a useful role to play in a time series model selection exercise.
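For reference, the criteria named above can be computed from a maximized log-likelihood as in the following sketch. It uses the textbook forms of AIC, BIC and HQ, and the usual small-sample correction for AICC; that this is exactly the form used in [22] and [41] is an assumption. The function name and the log-likelihood value in the usage line are hypothetical.

```python
import math

def information_criteria(log_lik, n_params, n_obs):
    """Standard information criteria; smaller values indicate a preferred model.

    log_lik  : maximized log-likelihood of the fitted model
    n_params : number of estimated parameters k
    n_obs    : number of observations n (requires n > k + 1)
    """
    k, n = n_params, n_obs
    aic = -2.0 * log_lik + 2.0 * k
    return {
        "AIC": aic,
        "AICC": aic + 2.0 * k * (k + 1) / (n - k - 1),
        "BIC": -2.0 * log_lik + k * math.log(n),
        "HQ": -2.0 * log_lik + 2.0 * k * math.log(math.log(n)),
    }

# Usage with a made-up log-likelihood for a two-parameter model and n = 241.
criteria = information_criteria(log_lik=-350.0, n_params=2, n_obs=241)
```

All four criteria share the term -2 log L and differ only in how heavily they penalize the number of parameters, which is why they can rank candidate models differently.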
4. Illustration
This section illustrates the modelling procedure with a data set consisting of the number of different IP addresses accessing the server of the pages of the Department of Statistics of the University of Würzburg in two-minute periods from 10 am to 6 pm on 29 November 2005, for a total of 241 observations. This data set was originally studied by [50] and exhibits small but significant autocorrelation, as indicated by Figure 1. The sample mean and variance, $\bar{x} = 1.31$ and $\hat{\sigma}^2 = 1.39$, do not indicate overdispersion: their ratio is about 1.06.
Fitting a PoINAR(1) model to the data yields the CML estimates $\hat{\alpha} = 0.24\,(0.00)$ and $\hat{\lambda} = 1.01\,(0.01)$. The parametric bootstrap exercise with M = 1000 and the residual analysis represented in Figure 2 indicate that the model captures the dynamics of the data. In fact, the variance of the Pearson residuals is 1.05 and the acf of the component residuals in Figure 2(c) indicates that the residuals are white noise. However, Figure 2(b) indicates that at time t = 224 the residual is unusually large, with a large arrival component. This may suggest the occurrence of an additive outlier, meaning that $X_t$ at $t = 224$ may be contaminated by an exogenous source but the effect is not carried over to subsequent observations by the dynamics.
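The ingredients of such a fit can be sketched as follows, assuming the usual PoINAR(1) specification in which each count is a binomial thinning of the previous count plus an independent Poisson arrival. The sketch simulates a path, evaluates the conditional log-likelihood that CML estimation maximizes, and computes Pearson residuals. The function names, the innovation mean `lam` and the seed are hypothetical. The series is simulated and is not the Würzburg data, and the optimizer and the bootstrap loop are omitted.

```python
import math
import random

def poisson_draw(lam, rng):
    """Draw from Poisson(lam) by Knuth's product method (fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_poinar1(n, alpha, lam, seed=0):
    """Simulate n counts from X_t = alpha o X_{t-1} + e_t, e_t ~ Poisson(lam)."""
    rng = random.Random(seed)
    # Start from the stationary marginal, Poisson(lam / (1 - alpha)).
    x = poisson_draw(lam / (1.0 - alpha), rng)
    path = []
    for _ in range(n):
        # Binomial thinning: each of the x units survives with probability alpha.
        survivors = sum(rng.random() < alpha for _ in range(x))
        x = survivors + poisson_draw(lam, rng)
        path.append(x)
    return path

def transition_prob(j, i, alpha, lam):
    """P(X_t = j | X_{t-1} = i): Binomial(i, alpha) convolved with Poisson(lam)."""
    total = 0.0
    for k in range(min(i, j) + 1):
        binom = math.comb(i, k) * alpha ** k * (1.0 - alpha) ** (i - k)
        pois = math.exp(-lam) * lam ** (j - k) / math.factorial(j - k)
        total += binom * pois
    return total

def conditional_loglik(x, alpha, lam):
    """Log-likelihood conditional on the first observation."""
    return sum(math.log(transition_prob(x[t], x[t - 1], alpha, lam))
               for t in range(1, len(x)))

def pearson_residuals(x, alpha, lam):
    """Standardized one-step-ahead prediction errors."""
    residuals = []
    for t in range(1, len(x)):
        mean = alpha * x[t - 1] + lam
        var = alpha * (1.0 - alpha) * x[t - 1] + lam
        residuals.append((x[t] - mean) / math.sqrt(var))
    return residuals

# Usage: a simulated series of the same length as the data set in the text.
series = simulate_poinar1(241, alpha=0.24, lam=1.01, seed=1)
loglik = conditional_loglik(series, 0.24, 1.01)
residuals = pearson_residuals(series, 0.24, 1.01)
```

For a correctly specified model the Pearson residuals have mean close to zero and variance close to one, which is the check behind the reported value of 1.05. A parametric bootstrap repeats the simulation step M times at the fitted parameters and compares statistics of the simulated paths with those of the observed series.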