Dynamic Model Averaging for Bayesian Quantile Regression Mauro Bernardi∗ University of Padova
Roberto Casarin
arXiv:1602.00856v1 [math.ST] 2 Feb 2016
University Ca’ Foscari of Venice
Bertrand B. Maillet A.A.AdvisorsQCG (ABN AMRO), Variances and University ParisDauphine
Lea Petrella Sapienza University of Rome
Abstract We propose a general dynamic model averaging (DMA) approach based on MarkovChain Monte Carlo for the sequential combination and estimation of quantile regression models with timevarying parameters. The efficiency and the effectiveness of the proposed DMA approach and the MCMC algorithm are shown through simulation studies and applications to macroeconomics and finance. JEL Classification: C11, C15, C32, C34. Keywords: Quantile Regression, MarkovChain Monte Carlo, Bayesian Inference, Dynamic Model Averaging, TimeVarying parameters regression, Latent Variables methods.
1. Introduction Quantile regression has been becoming popular in the recent years as a simple, robust and distribution free modeling tool since the seminal works by Koenker and Basset (1978) and Koenker (2005). It indeed provides a way to model the conditional quantiles of a response variable with respect to some covariates, in order to have a more complete picture ∗
Corresponding author, Department of Statistical Sciences, Via C. Battisti, 241, 35121, Padova, Italy, Tel.: +39.(0)49.82.74.165, email:
[email protected] Working Paper
February 3, 2016
of the entire conditional distribution than one can have with traditional linear regression. In fact, problem specific features, like skewness, fattails, outliers, breaks, truncatedcensored data, and heteroskedasticity, can sometimes shadow the nature of the dependence between the variable of interest and the covariates, so that the conditional mean would not be a sufficient statistic to fully understand the nature of that dependence. In particular, the quantile regression approach is appropriate not only when the underlying model is nonlinear or the innovation terms are nonGaussian, but also when modeling the tail behaviour of the underlying distribution is the primary interest. There is a number of published articles on quantile regression both in frequentist and Bayesian framework, dealing with parametric and nonparametric approaches. For a detailed review and references, see for example, Lum and Gelfand (2012) and Koenker (2005). In this article, we will follow a Bayesian approach to quantile regression and will propose a new approach for studying causal relationships in a very general context. Since our proposed quantile regression model can be represented as a mixture of linear and Gaussian models, a data augmentation framework can naturally be applied for inference, through the introduction of latent variables. In presence of these, the Bayesian framework provides indeed a natural inference framework to deal with latent variable models. One of the most challenging issues in quantile regression analyses is, however, related to model specification, with a particular emphasis on evaluation of the impact of the different exogenous regressors. From a classical viewpoint, a technical issue on the estimation process emerges from the fact that the objective function is not differentiable with respect to the regression parameters. The discontinuity in the first order condition of the corresponding objective function makes the derivation of the asymptotics of quantile regression estimators quite difficult, since conventional techniques based on first order Taylor expansion are no more applicable. Bayesian inference allows us to easily deal with model selection, which can be a quite difficult issue in other approaches for inference. In a Bayesian quantile regression framework, we can thus efficiently deal with the model selection problem by evaluating the marginal likelihood corresponding to model specifications differing by including or excluding a given regressor. However, from a computational point of view, this method 2
requires the estimation and the postprocessing of the 2M models when the number of regressors is M, and it appears that it is almost not feasible as soon as M is moderately large. An automatic approach for variable selection, adapting a version of the Stochastic Search Variable Selection (SSVS) by George and McCulloch (1993), has been proposed in Meligkotsidou et al. (2009), Reed et al. (2009), and Alhamzawi and Yu (2012). In this article, we follow a different route and extend the dynamic model averaging (DMA) approach of Raftery et al. (2010) to quantile regression models. DMA models are also becoming increasingly popular in econometrics (see, e.g., Koop and Korobilis 2012, Koop and Tole 2013 and Koop and Korobilis 2013), where it does seem to be the only computationally feasible algorithm currently available in presence of a large number of regression variables. The original DMA algorithm allows us to deal with model selection sequentially over a set of observations and its computational feasibility relies upon the linearity assumption for the underlying model. The DMA technique has been recently extended along different lines. McCormick et al. (2012) propose a DMA algorithm for binary outcome regression models. A modelbased alternative version of the DMA has also been recently elaborated by Belmonte and Koop (2013), which uses a different mechanism for potential timevarying model switching. The extension to a more general model is, nevertheless, still missing in the literature, mainly because of the computational cost implied by a higher model complexity. For example, the extension of DMA to conditionally linear models, which has the quantile regression as a special case, has a very large computation cost. When a new block of observations is added to the sample, a large number of iterations of a Markov Chain Monte Carlo (MCMC) algorithm are needed in order to approximate the posterior distribution of each model. In this article, we show how this cost can be considerably reduced by applying sequential simulation procedures. We thus apply the Sequential MCMC (SMCMC, hereafter) algorithm recently proposed in Dunson and Yang (2013), as it revealed to be an efficient and selftuning algorithm for sequential estimation of both parameter and latent variables. See also Casarin et al. (2016) for a embarrassingly parallel SMCMC algorithm for large set of data. We shall notice that our DMA procedure, providing a sequential selection procedure for 3
quantile regression (see, e.g., Kim 2007), also contributes to the stream of literature variable selection for quantile regression models with static and timevarying coefficients.
The
existing literature on the topic focuses mainly on shrinkage penalization approaches, since they permit to deal simultaneously with both the estimation and model selection problem. In this stream of literature, Wu and Liu (2009) studies the properties of variable selection methods based on least absolute shrinkage and selection operator (LASSO) and smoothly clipped absolute deviation (SCAD) penalties for static models, while Wang et al. (2012) study the SCAD penalization for variable selection in highdimensional quantile regression. Noh et al. (2012) propose a model selection method for timevarying quantile regression, which is based on a basis expansion of the coefficients and on a penalization of the norm of each coefficient vector. An alternative approach to penalization function is represented by the use of the Bayesian Information Criterion (BIC). The theoretical properties of the BIC in a quantile regression context has been recently investigated by Lee et al. (2014), in the case where the number of variables is diverging. In the below developments, we follow a Bayesian approach (see, e.g., Ji et al. 2012) based on MCMC. The extensions with respect the existing literature is twofold. First, we apply MCMC based model selection procedure to timevarying quantile regression. Secondly, we provide a dynamic model selection approach thanks to the combination of DMA and SMCMC. As a side remark, we shall notice that a natural output of the algorithm is a combination of models with dynamic weight. In this sense, our paper also contributes to the stream of literature on model pooling (see, e.g., Hall and Mitchell 2007, Billio et al. 2013 and Fawcett et al. 2015.) Finally, another relevant contribution of the paper is to provide a quantile regression approach to two interesting empirical problems which are currently investigated in economics and finance. The first one is the INFLATION analyzed by Koop and Korobilis (2012). The second dataset contains some explanatory variables for the house price index which has been studied in the real estate literature. We here aim to investigate the empirical relationship between these variables following a quantile regression approach and to provide an answer to the problem of the best pricing models for real estates. The rest of the article is structured as follows. Section 2 introduces a quantile regression 4
model and defines the DMA problem for quantile regression. Section 3 presents our Bayesian model and the SMCMC algorithm used for sequential posterior approximation, respectively. Section 4 shows the effectiveness and efficiency of the method using synthetic data. Section 5 presents the empirical results obtained on some wellknown the macroeconomic and financial dataset. Section 6 concludes the paper. 2. DMA for Bayesian quantile regression 2.1. A timevarying quantile regression model In this section, we introduce the dynamic quantile regression model that extends the approach of Bernardi et al. (2015) to the case where the entire vector of regression parameters evolves stochastically over time.
Specifically, let xt = (1, x2,t , . . . , xM,t )′ ,
t = 1, 2, . . . , T a set of M exogenous regressors. We assume the observed scalar random variable at each point in time, yt , is a linear function of the exogenous variables xt , where the regression parameter vector β t follows a multivariate random walk dynamics as in Harvey (1989): yt = x′t β t + ξt β t = β t−1 + ζ t ,
(1) t = 1, 2, . . . , T,
β 0 ∼ N β 00 , P00 ,
(2) (3)
where ξt ∼ AL (τ, 0, σ), t = 1, 2, . . . , T are i.i.d. random variables with centred Asymmetric
Laplace distribution (ALD) with τ ∈ (0, 1) being the quantile confidence level and σ ∈ R+
is the scale parameter and β 0 ∈ RM is the initial state having mean equal to zero and diffuse variancecovariance matrix P00 = κIM , with κ → +∞. We further assume ζ t ∼ N (0, Ω), i.i.d. and independent of the measurement equation error term ξs , ∀t, s = 1, 2, . . . , T . The linear state space model introduced in Eq.
(1)(3) for modeling timevarying
conditional quantiles is nonGaussian because of the assumption made on the measurement innovation terms. In those circumstances, optimal filtering techniques used to analytically marginalize out the latent states based on the Kalman filter recursions can not be applied 5
(see Durbin and Koopman 2012). However, following Kozumi and Kobayashi (2011) and Bernardi et al. (2015), it is possible to exploit the representation of the centred Asymmetric Laplace probability density function, f (ξt  τ, σ), in terms of locationscale continuous mixture of Normals, that is f (ξt  τ, σ) =
Z
∞
0
1 1 1 2 √ exp − (ξt − λω) σ exp − ω dω, 2δσω σ 2πδσω
for t = 1, 2, . . . , T , where λ =
1−2τ τ (1−τ )
and δ =
2 τ (1−τ )
(4)
in order to let the τ –level quantile of the
measurement error ξt be equal to zero. Thus, it is easy to recognise that the non–Gaussian state space model defined in equations admits a conditionally Gaussian and linear state space representation. More specifically, equations (1)(2) become: yt = x′t β t + λωt + εt ,
ind
εt ∼ N (0, δσωt ) , i.i.d.
ζ t ∼ N (0, Ω) ,
β t = β t−1 + ζ t ,
β 0 ∼ N β 10 , P10 ,
(5) (6) (7)
where ωt , t = 1, 2, . . . , T , are i.i.d. with an exponential distribution of parameter σ −1 , that is ωt ∼ Exp (σ −1 ). In order to complete the description of the model we assume the following prior distributions for the parameters σ and Ω σ ∼ IG (a0 , b0 )
(8)
Ω ∼ IW (c0 , C0 ) ,
(9)
which are Inverse Gamma and Inverse Wishart, respectively, with densities: π (σ  a0 , b0 ) ∝ σ
−(a0 +1)
b0 exp − σ
π (Ω  c0 , Ω0 ) ∝ C0 c0 Ω−(c0 + 6
M +1 2
) exp −tr C Ω−1 , 0
and (a0 , b0 , c0 , Ω0 ) are fixed hyperparameters. 2.2. Dynamic model averaging In this section, we extend the DMA approach of Koop and Korobilis (2012) to the quantile dynamic regression model introduced in the previous section. The problem of choosing the regressors to include in the model can be tackled from two different perspectives. The first approach is to consider a variable dimension model (see, e.g., Marin and Robert 2007). In this paper we follow a second perspective, that is a model choice approach, which requires the estimation of all the submodels. Moreover following the seminal paper of Raftery et al. (2010) and the econometrics application of Koop and Korobilis (2012) we aim to apply recursively over time the model selection procedure in order to detect potential model adequacy over time. Following Raftery et al. (2010) we introduce a timevarying model index Lt which takes values in {1, 2, . . . , K} and indicates the model selected at time t. We assume it has a Markov dynamics with transition matrix P (Lt = k  Lt−1 = j) = πjk ,
k, j ∈ {1, 2, . . . , K} ,
(10)
where k ∈ {1, 2, . . . , K} denotes the model index. Let xkt = (1, xi1 t , . . . , xim(k) t )′ be the set of
(m(k) + 1) variables in the model k, with ij ∈ {2, 3, . . . , M}, j = 1, . . . , m(k) , ij 6= il ∀ l, j and
m(k) ∈ {0, 1, . . . , M − 1}. Then, we assume that each model can be expressed in the form (k)
of equations (1)– (3), where the dynamic regressor parameter vector β t (k)
covariates xt
and the vector of
are specific to the model k = 1, 2, . . . , M. This assumption essentially means
that, at each time t and for each model k = 1, 2, . . . , K, the system variables represented by the dynamic quantile regressors parameters β t and the model indicator variable Lt , i.e., {β t , Lt }, transits to a new state according to the Markovian transition matrix specified in Eq. (10) and the Markovian kernel defined by the latent states dynamics in Eq. (2). The updated states consists of the vector of predicted values at time t + 1 for each model specific (k)
quantile regression parameter vectors β t+1 and the updated time varying probability πk,t+1 associated to that model. As discussed in the previous Section 2.1, the dynamic quantile regression model defined 7
in Eq. (1)–(2) is non–Gaussian preventing the iterative application of the linear Kalman filter and smoother to get updated estimates of the latent dynamics at each point in time. To overcome this problem we again exploit the representation of the Asymmetric Laplace error term, in the measurement equation Eq. (1), as a locationscale mixture of Normals. (k)
The representation relies on the introduction of an additional latent factor ωt , specific to each model k having an i.i.d. exponential prior distribution with shape parameter σ (k) , i.e. (k)
ωt
∼ Exp(1/σ (k) ), i.i.d. ∀t.
For completeness we rewrite equations (5)– (6) for the model selection purposes as follows
yt =
K X k=1
(k) βt
(k) i.i.d.
where εt
q (k) (k)′ (k) (k) (k) (k) I{k} (Lt ) xt β t + λωt + δσ ωt εt
(k)
(k)
= β t−1 + ζ t ,
(11) (12)
(k) i.i.d.
∼ N (0, 1) and ζ t
∼ Nk+1 (0, Ω(k) ). The parameters λ, δ, and τ are define as
above. The approximated model is completed by the inclusion of the prior distribution for the first state defined in Eq. (3). 3. Sequential posterior approximation Following a standard approach to inference for the Bayesian quantile regression models we introduced some auxiliary variables, which allows us to represent the original model in Eq. (1)(3) as a conditionally Gaussian and linear state space model. We denote with z1:t = (z1 , z2 , . . . , zt ) the sequence of vectors z up to time t, then the completedata likelihood
8
of the unobservable components β 1:T and ω t:T and all parameters γ = (σ, Ω) is L (γ, ω1:T , β 1:T  y1:T , x1:T ) ∝
T Y
t=1 T Y
f (yt  β t , ωt , σ, xt ) (
T Y t=1
f (ωt  σ) f (β1 ) T X
T Y t=2
f (βt  βt−1 , Ω)
(yt − λωt − x′t β t )2 1 ∝ (σ × ωt ) exp − 2σδ t=1 ωt t=1 ( P ) T β ′1 β 1 − T −1 t=1 ωt × exp − Ω 2 exp − δ 2κ ( ) T ′ −1 1X × exp − β − β t−1 Ω β t − β t−1 . 2 t=2 t − 21
)
(13)
The Gaussian scale mixture representation makes the sequential inference difficult for the latent variables and the parameters. For this model, the DMA approach of Raftery et al. (2010) could be computationally prohibitive as the estimation of the latent variables and the parameter in the different models is not exact but needs for posterior approximation. In this situation the posterior approximation which relies upon standard MCMC schemes cannot be applied. In this paper we combine the sequential MCMC procedure (SMCMC), due to Dunson and Yang (2013), with the DMA approach to get a feasible sequential DMA procedure for latent variable models. Also, we propose some improvement of the algorithms which allow to reduce the computing time. First, we show how the forgetting factor introduced by Raftery et al. (2010) for the state covariance can be used to further improve the computing time, without jeopardizing the validity of the SMCMC procedure. Second, we show that the computing cost, which increases linearly in time, can be kept fixed by using fixedlag backward sampling procedure in the SMCMC transition kernel. In the following subsections, we present the SMCMC for the onemodel case, its extension to multimodel case, and some strategies for reducing the computational complexity.
9
3.1. Sequential model estimation Let θ t = (σ, Ω, ω 1:t , β1:t ), t ∈ N, denote the time sequence of augmented parameter vectors with values in the measurable spaces Rdt , B Rdt , with nondecreasing dimension
dt = dt−1 + d, t ≥ 1. Furthermore, we assume the augmented parameter vector can be partitioned as θ t = (θ t−1 , η t ) with η t = (ωt , β t ) the latent variable vector of dimension d, associated with the observation yt . In our quantile regression model with timevarying parameters, the prior distribution π (t) (θ t ) = π (γ) π (ω 1:t , β 1:t ), satisfies the compatibility condition π
(t+1)
(θ t ) =
Z
Rd
π (t) θ t , η t+1 dη t+1 ,
and this allows us to simplify the notation.
(14)
Furthermore, we denote with πt (θ t ) =
π (θ t y1:t ) ∝ π (γ) L (γ, ω 1:t , β1:t  y1:t , x1:t ) the posterior distribution at time t with respect to the Lebesgue measure of θ t . In the notation, we dropped out from the posterior the conditioning on the dependent variables y1:t and the covariates x1:t . In the SMCMC algorithm a population of L parallel inhomogeneous Markov chains are (l,j)
used to generate the samples θ t
with j = 1, . . . , mt , l = 1, . . . , L and t = 1, 2, . . . , T
from the sequence of posterior distributions πt , t = 1, 2, . . . , T . Each Markov chain of the population is defined by a sequence of transition kernels Kt (θ, A), t ∈ N, that are operators from Rdt−1 , B Rdt−1 to Rdt , B Rdt , such that Kt (θ, ·) is a probability measure for all θ ∈ Rdt−1 , and Kt (·, A) is measurable for all A ∈ B Rdt . The kernel Kt (θ, A) has πt as stationary distribution and results from the composition
of a jumping kernel, Jt and a transition kernel, Tt , that is Kt (θ, A) = Jt ◦
Ttmt
(θ, A) =
Z
R dt
Jt (θ, dθ ′ ) Ttmt (θ ′ , A) ,
where the fixed dimension transition is defined as Ttmt
(θ, A) = Tt ◦
Ttmt −1
(θ, A) =
Z
R dt
10
Tt (θ, dθ′ )Ttmt −1 (θ ′ , A) ,
(15)
with mt ∈ N, and T 0 = Id is the identity kernel. We assume that the jumping kernel satisfies
where Jt+1 θ t , η t+1
˜t , Jt+1 (θ t , θ t+1 ) = Jt+1 (θ t , η t ) δθt θ
= Jt+1 θ t , θ t , η t+1 .
(16)
This condition ensures that the error
propagation through the jumping kernel can be controlled over the SMCMC iterations. This result is given in Proposition 3.1, and is a simple extension of the result given in L1 norm in the Lemma 3.8 of Dunson and Yang (2013). Let us define the Vnorm, µ(·) = supf ≤V µ(f ), where V : Rn → [1, ∞] and µ is a signed measure (see Meyn and Tweedie (1993) Meyn and Tweedie, Ch. 14). Theorem 3.1. For any probability density p (·), and for θ t−1 ∈ Rdt−1 the following inequality holds πt − Jt ◦ pV ≤ where V˜ =
R
Rdt−1
sup θ t−1 ∈Rdt −1
πt (·  θ t−1 ) − Jt (θ t−1 , ·)V˜ + πt − pV ,
(17)
V ((θ t−1 , η t )) dθ t−1 .
The following result (Theorem 3.14 in Dunson and Yang 2013) shows the convergence of the chain with transition kernel Jt ◦ T mt , to the target distribution, under very general conditions. Theorem 3.2. Let us assume • (Geometric ergodicity) For each t, there exists a function Vt : Rdt → [1, ∞), C > 0 and ρt ∈ (0, 1) such that: R (a) Rdt V (θ t )2 πt (θ t ) dθt ≤ C; R (b) Rd V (θ t )2 πt (η t  θ t )dη t = Vt−1 (θ t−1 ), where θ t = (θ t−1 , η t ); (c) for all θ t ∈ Rdt , Tt (θ t , ·) − πt (·)Vt ≤ Vt (θ t ) ρt .
• (Stationary convergence) The stationary distribution πt of Tt satisfies √ αt = 2 CdH (πt , πt−1 ) → 0, 11
where πt is the marginal posterior of θ t−1 at time t. • (Jumping consistency) For a sequence of λt → 0 the following holds: sup θ t−1 ∈Rdt−1
where V˜t =
R
Rd
Jt (θ t−1 , ·) − πt (·  θ t−1 ) V˜t ,
V ((θ, η)) dθ t−1 .
t Let εt = ρm t . Then for any initial distribution π0 ,
Kt ◦ . . . ◦ K1 ◦ π0 − πt Vt ≤
t t X Y s=1
u=s
!
εu (αs + λs ) .
As regards to the geometric ergodicity and jumping consistency assumptions, there are a few works convergence rates of Gibbs samplers for Bayesian models. Rom´an and Hobert (2012) provide geometric ergodicity for a family of improper priors for Bayesian linear models. See also Jones and Hobert (2004) and Papaspiliopoulos and Roberts (2008). Since in this paper we are applying a Gibbs sampler for a conditionally linear and Gaussian model with proper conditionally conjugate prior distributions these assumption are satisfied following the Rom´an and Hobert (2013). In order to apply the SMCMC one need to specify the transition kernel Tt+1 and the jumping kernel Jt+1 at the iteration t + 1. The transition kernel Tt at the iteration t allows each parallel chain to explore the sample space of a given dimension, dt , and to generate (l,j)
samples θ t
, from the posterior distribution πt given the sample previously generated. We
choose as transition kernel the one of a blocked (or multimove) and collapsed Gibbs sampler, which iterates over the following steps: (l) (l) 1. generate σ from f σ  β 1:t , y1:t , x1:t collapsing the Gibbs with respect to the ω 1:t ; (l) (l) 2. generate ωs from f ωs  σ (l) , Ω(l) , β 1:t , y1:t , x1:t , s = 1, 2, . . . , t; (l) (l) 3. generate β 1:t from f β 1:t  σ (l) , Ω(l) , ω1:t , y1:t , x1:t with a multimove proposal; (l) 4. generate Ω(l) from f Ω  β 1:t , 12
for t = 1, 2, . . . , T , and l = 1, 2, . . . , L. The details of the sampling strategy for the full conditional distributions are provided in Appendix B.1. Due to the stochastic representation of the AL distribution in Eq. (4), we are able to implement a partially collapsed SMCMC based on Gibbs–type updating which relies on data augmentation (see Liu, 1994 and Van Dyk and Park, 2008). The key idea of the complete collapsed Gibbs–type simulation scheme is to avoid simulations from the full conditional distributions of blocks of model parameters when it is possible by analytically marginalising them out.
As shown by
Park and Van Dyk (2009), this approach has several advantages with respect to a systematic sampling because it reduces the computational time and increases the convergence rate of the sampler. In our case, it is only possible to partially integrate out the latent variables (l)
ω1:s , for s = 1, 2, . . . , t from the full conditional of the scale parameters σ (l) in the previous step 3, for l = 1, 2, . . . , L, which reduces the algorithm to a partially collapsed Gibbs sampler type move. As regard to the convergence of partially collapsed Gibbs moves, it is worth noting that, updating the parameters in the order presented above ensures that the posterior distribution at each point in time t = 1, 2, . . . , T is the stationary distribution of the generated Markov chain. In fact, by combining steps 1 and 2 sample draws are produced (l) (l) from π σ (l) , ω 1:t  β 1:s , y1:s , x1:t , for t = 1, 2, . . . , T and l = 1, 2, . . . , L, i.e. the conditional
posterior distribution and the partially collapsed Gibbs sampler is a blocked version of the ordinary Gibbs sampler, see Van Dyk and Park (2008) and Park and Van Dyk (2009). To initialise the Gibbs sampling algorithm we simulate a random draw from the joint prior distribution of the parameters defined in Eq. (8)–(9), and conditionally on that, we (l)
simulate the initial values of the augmented variables ω1 , for l = 1, 2, . . . , L. The jumping kernel Jt+1 at the iteration t + 1, allows the chains to go from a space of dimension dt to one of dimension dt+1 . We choose as jumping kernel of the lth parallel chain to be the transition kernel of a Gibbs sampler with the following full conditional distributions (l) (l) (l) 1. generate ωt+1 from the full conditional f ωt+1  σ , β t+1 , yt+1 , xt+1 ; (l) (l) (l) 2. generate β t+1 from f β t+1  σ (l) , Ω(l) , β t , ωt+1 , yt+1 , xt+1 .
The details of the sampling strategy for the full conditional distributions are provided in 13
Appendix B.2. Thanks to the locationscale mixture representation of the AL distribution, all the full conditional distributions involved in the transition and jumping kernels admits a known closed form representation. This is particularly convenient when T is large or when different models are estimated at the same time, as it is the case here, because the availability of conditional sufficient statistics that can be tracked helps to mitigate the increase in storage and computational burden over time. 3.2. Monitoring convergence The number of iterations of the Sequential MCMC sampler at each time mt are chosen accordingly to the crosschain correlation. In particular,we set the number of iterations at time t, mt , to be the smallest integer s such that rt (s) ≤ 1 − ǫ, where rt (s) is the rate function associated with the transition kernel Tt and ǫ is a given threshold level. As suggested by Dunson and Yang (2013), an upper bound for the rate function is provided by the lags chain autocorrelation, which can be estimated on line using information provided by the output of all the parallel chains, in the following way:
rˆt (s) = max j=1,2,...,p
PL
l=1
PL (s+1,t,l) ¯(s+1,t) (1,t,l) ¯(1,t) − θj θj − θj l=1 θj , 2 12 P 2 21 (s+1,t,l) (s+1,t) (1,t,l) (1,t) L θ − θ¯ θ − θ¯ j
j
l=1
j
(18)
j
(s,t,l)
, is the jth element of the vector θ (s,t,l) staking the fixed parameters γ (l) and (l) (l) the latent states generated up to time t, i.e. ω 1:t , β 1:t , by the l–th chain at the the s–th P (s,t) (s,t,l) iteration. Moreover, θ¯j = L−1 Ll=1 θj is the average of the draws of the s –th iteration where θj
over the L parallel chains.
3.3. Sequential model selection We now turn to case where multiple models are considered and estimated at the same time and when there is uncertainty about which one best represents the data generating process. We assume that the complete set of models can be represented as in equations (11)– (12), where the indicator variable Lt ∈ {1, 2, . . . , K} has a Markovian transition kernel and selects 14
the operative model at each time step t = 1, 2, . . . , T . Furthermore, we assume that models differ by the inclusion or exclusion of a given explanatory variable into the predictor vector xt . If we assume there are M − 1 potential explanatory variables and that the intercept
is included in all models then the number of models is K = 2M −1 . Even for a moderate number of variables, e.g. M = 10, the number of models may be large, e.g. 1, 024, and the specification of the transition matrix may be onerous. Following Raftery et al. (2010), we apply a forgetting factor approach as described in the following. (k) (k) (k) (k) Let πt+1t = P Lt+1 = k  y1:t , xt+1 and πt+1t+1 = P Lt+1 = k  y1:t+1 , xt+1 , where
again y1:t−1 = (y1 , y2 , . . . , yt−1 ). The approximated Hamilton filter recursion consists of the following two steps. At each point in time t, the first step predicts and updates the model probabilities as such: (k,l)
(k,l) πt+1t
(k) πt+1t+1
(πtt )α + ξ
∀l = 1, 2, . . . , L,
(19)
(k,l) (k) (k,l) πt+1t f yt+1  y1:t , xt+1 , Lt = k, ωt+1 ,
(20)
= PK
(j,l)
α j=1 (πtt ) + ξ
∝
L X l=1
,
where the exponent α in equation (19) is a forgetting factor typically settled close to one, and the constant ξ > 0 is a small number introduced to prevent rounding errors due to machine precision error. We fix this constant to ξ = 0.001/K as in Raftery et al. (2010). The forgetting factor α is introduced to avoid the explicit specification and calculation of the complete transition matrix of the Markov chain Lt which consists of K(K − 1) entries. The forgetting factor α can be interpreted as the prior mass assigned to the information prior to time t; the larger the α parameter, the lower the weight assigned to the new observation coming at time t + 1. This means that values of α approximately equal to one correspond to lower levels of prior confidence about the information content of the new observation when updating the model probabilities. As argued by Raftery et al. (2010), although the resulting model defects in the explicit transition matrix specification, this does not constitute a problem for inference in model probabilities as long as data provides enough information about which models better capture the data dynamics. Concerning 15
the model probabilities updating equation (20), it is obtained as Monte Carlo average of (k) (k,l) the approximated predictive distributions f yt+1  y1:t , xt+1 , Lt = k, ωt+1 over the entire
population of independent Markov chains to the extent of marginalising out the simulated
latent factor ωt+1 . The observation predictive distribution at time t + 1 is obtained as byproduct of the Kalman filtering updating equations for the approximated Gaussian model such as:
(k,l)
where yˆt+1t
(k,l) (k,l) (k) (k,l) f yt+1  y1:t , xt+1 , Lt = k, ωt+1 ≈ N yt+1  yˆt+1t , Vt+1 , =
(k) ′
(k,l)
(21)
(k,l)
λωt+1 + xt+1 β t+1t is the predictive mean of yt+1 for the lth
chain and model k ∈ {1, 2, . . . , K} and the variancecovariance matrix equal to −1 (k,l) (k) ′ (k,l) (k) Vt+1 = δσ (k,l) ω (k,l) + xt+1 Pt+1t xt+1 , as obtained by the Kalman filter recursion in
Appendix B.1.
The updated model probabilities are then normalised in the following way: (j,l)
(k,l) π ˜t+1t+1
πt+1t+1
= PK
(j,l)
j=1 πt+1t+1
,
∀l = 1, 2, . . . , L,
(22)
to get a properly defined probability measure. (l,k)
(l,k)
The second step consists in updating the latent factors βt+1 and ωt+1 , for l = 1, 2, . . . , L, and this can be easily done by running the SMCMC algorithm defined in the previous section, sequentially for each model K = 1, 2, . . . , K. This process is then iterated as a new sample observation becomes available. The sequence is initialised at t = 0, by setting the initial (k,l)
1 K
uniformly for all the models k = 1, 2, . . . , K and all the parallel chains of the SMCMC sampler l = 1, 2, . . . , L. Static model parameters σ (k,l) , Ω(k,l) are
model probabilities π00 =
initialised by a draw from their prior distributions defined in equations (8)–(9), respectively. 3.4. Dynamic quantile model averaging
Forecasting using SMCMC methods can efficiently incorporate parameter uncertainty in a straightforward fashion. The steps below outline how to generate onestepahead quantile forecasts, from all the competing models, at each point in time t, and how to combine them using the updated model weights. These steps are performed at the end of each SMCMC 16
(k)
iteration in the sampling period, given the model’s full parameter set, denoted by θ t . At each time point in the estimation period, an estimate of the predictive mean τ –level quantile can be obtained for the kth model such as: (k) (k) (k) (k) (k) (k) (k) qˆτ,t+1 xt+1 , β t+1 = E qτ xt+1 , β t+1  y1:t , xt+1 , β 1:t (k) (k) (k) (k) ′ = xt+1 E β t+1  y1:t , xt+1 , β 1:t ,
(23)
b (k) = E β (k)  y1:t , x(k) , β (k) is the mean of the marginal posterior distribution where β t+1 t+1 t+1 1:t (k)
of the regression parameters for model k, i.e. β t+1 , and can be obtained by numerically (k)
marginalising out the nuisance parameters (σ, Ω)(k) and the latent factors ω1:t+1 using the population of generated chains in the following way: b (k) = E β (k)  y1:t , x(k) , β (k) β t+1 t+1 t+1 1:t
1 X (k) (k) (k,l) (k,l) (k,l) (k,l) , E β t+1  y1:t , xt+1 , β 1:t , ω 1:t , σ , Ω ≈ L L
(24)
l=1
where the expectation in the right term is analytically provided by the Kalman filter L (k,l) (k,l) (k,l) predictive equation (B.6) averaged over the entire set of parameters σ , Ω , ω 1:t l=1
generated by the parallel chains such as:
E
(k) β t+1

(k) (k) y1:t , xt+1 , β 1:t
1 X (k,l) (k,l) (k,l) (k,j) ≈ β t−1t−2 + Pt−1t−2 xt−1 Vt−2 νt−2 , L l=1 L
(25)
where all the quantities involved in the previous expression have been defined in Appendix B.1. The resulting RaoBlackwellised estimate of the quantile function is more efficient than (k)
the simple average over the samples from the full conditional distribution of βt+1 , as argued in Robert and Casella (2004), pp. 130134. The model–averaged one–step–ahead τ –level quantile prediction is then formed by (k)
combining previous forecasts using the predicted model probabilities πt+1t , for each 17
competing model k = 1, 2, . . . , K DMA qˆτ,t+1
(k) (k) xt+1 , βt+1
=
K X
(k) (k) (k) (k) πt+1t qˆτ,t+1 xt+1 , β t+1
k=1
b , πt+1t xt+1 β t+1
k=1
=
K X
(k)
(k)
(k)
(26) (27)
(k)
where the predicting model probabilities πt+1t , for k = 1, 2, . . . , K have been obtained by averaging single model predictive probabilities defined in equation (19) over the L parallel chains:
L
(k) πt+1t
1 X (k,l) π , = L l=1 t+1t
∀k = 1, 2, . . . , K,
(28)
and equation (27) can be obtained by substituting for the definition of the predicted model specific τ –level quantile in equation (23) into equation (26). The multi–model predictions of the τ –level quantile function of the response variable yt at time t, is a weighted average of the (k) (k) (k) model–specific τ –level quantile predictions qˆτ,t xt , β t , where the weights are equal to (k)
the posterior predictive model probabilities for sample t, πtt−1 obtained as in equation (28). Moreover, since model predictive probabilities in equation (28) are obtained by averaging (k,l)
πt+1t over the L independent parallel chains, we can easily provide approximated confidence intervals for the estimated probabilities by calculating their Monte Carlo variance as such: (k) V πt+1t =
2 1 X (k,l) (k) πtt−1 − πtt−1 . L − 1 l=1 L
(29)
3.5. Reducing the computational complexity Fixedlag backward sampling can be used successfully to reduce the computation burden when dealing with a large number of observations. This corresponds to substitute for the fixed–interval backward smoothing algorithm detailed in Appendix B.3 with a fixedlag smoother. The fixedlag smoothing algorithm updates the h ≤ t dynamic latent states closest to the current observation yt , leaving the remaining states from 1 to h − 1 unchanged. At each iteration, the resulting DMASMCMC algorithm only simulates the quantile regression 18
coefficient β t−ht at time t, for a given lag h > 0. See also Anderson and Moore (1979) and Simon (2006). Remark. The fixedlag smoother can be interpreted as localdynamic quantile regression extending the nonparametric approach of Yu and Jones (1998). 4. Simulated examples In this section, we focus on simulated examples to assess the ability of the SMCMC and the DMA approach to recover the true model parameters and the data generating process (DGP). In all simulated datasets, the length of the sample is T = 200 observations, which corresponds to the sample size of the real data examples presented in the next section. The covariates are simulated from a uniform distribution on − T2 , T2 , i.e. xi,t ∼ U − T2 , T2 , for
i = 1, 2, . . . , M and t = 1, 2, . . . , T , and the innovation term is Gaussian, i.e. εt ∼ N (0, νt2 ), with heterosckedastic variance. For all the considered examples, the true Data Generating Process (DGP) is yt = x′t β ∗t + εt ,
∀t = 1, 2, . . . , T,
(30)
where M = 2, xt = (1, x1,t , x2,t ) and the quantile regression parameters dynamics β ∗t = ∗ ∗ ∗ β1,t , β2,t , β3,t are defined as follows.
4.1. Smooth change in quantiles
We assume the true quantile parameters have the following dynamics: the constant term ∗ ∗ β1,t is fixed at β1∗ = 2, β2,t has a change in slope at a given point in time t = 100, while the ∗ third parameter β3,t is characterised by a smooth, sinusoidal transition between two different
19
levels:
∗ β2,t =
0.6 −
0.4t , 100
if t <= 100
−0.2 + 0.4t , otherwise, 100 −1 c (2t − T − 2) ∗ β3,t = a + b 1 + exp , T 1, if t <= 100 νt2 = 0.25, otherwise,
with a = 0.2, b = 2 and c = 5.
4.2. Abrupt change in quantiles ∗ We assume an abrupt change in the constant term β1,t at time t = 100, two different ∗ coefficients for the parameter β2,t and a GARCH(1,1) dynamics for the conditional volatility:
∗ β1,t =
∗ β2,t =
−2, if t <= 100 2, otherwise, 1.6, if t <= 100 0.8,
otherwise,
2 νt2 = a + bνt−1 + cε2t−1 ,
∗ and β3,t = 2, ∀t = 1, 2, . . . , T , with a = 0.05, b = 0.9 and c = 0.05. Note that the relationship
between the coefficients of the DGP and the coefficients β t of the quantile regression is: ∗ ∗ ∗ β1,t = β1,t + νt Φ−1 (τ ), β2,t = β2,t , β3,t = β3,t , where τ is the quantile level and Φ is the
cumulative density function of a standard normal distribution. For the Bayesian estimation, the prior hyper parameter setting is as follows: a0 =, b0 =, 20
β1,t
β1,t
0
2
−1 0
−2 −3
−2
−4 −5
−4
−6 −6
−7 50
100
150
200
50
β2,t
100
150
200
150
200
150
200
β2,t 2
1.2 1
1.5
0.8 0.6 0.4
1
0.2 0
0.5
−0.2 50
100
150
200
50
β3,t
100
β3,t
2.5 2
2.2
1.5
2
1
1.8
0.5 1.6 0 50
100
150
1.4
200
50
100
Figure 1: Posterior estimate at time T of the quantile regression parameters βt = (β1,t , β2,t , β3,t ), t = 1, 2, . . . , T , for the simulated Example 4.1 (left panel) and 4.2 (right panel), with quantile level τ = 0.25 and N = 100 parallel chains. In each plot, red lines indicate true parameters, dark lines indicate posterior medians and grey areas indicate 95% HPD regions.
c0 = and C0 = 0.01I3 where I3 is the 3dim identity matrix. Figure 1 shows the sequential estimates for the two examples. Shift and smooth transition in the scale of the observation noise are captured by a corresponding changes in the intercept β1,t of the quantile regression (see first row of the figure). The second and third row of the same figure show that our DMA procedure is effective in detecting different kind of changes 21
in the regression coefficients. 5. Empirical applications with Dynamic Quantile Model Averaging 5.1. Revisiting inflation studies when causalities vary In this section, we now apply our Dynamic Quantile Model Averaging (DQMA) model to select the best set of predictors to forecast inflation in a generalised Phillips curve framework. Our aims are twofold: 1) to investigate the relevance of covariates other than the unemployment rate and lagged inflation for predicting the current inflation at different quantile levels, and 2) to test whether predictors for high and low inflation are different and/or if their relevance dynamically evolves over time. The latter question is a fundamental issue in empirical economics, since the rate of inflation pressure may have potential effects on the real side of the economy and on the overall level of output produced, and thus may influence the business cycle amplitude and periodicity. Moreover, since periods of high and low inflation pressure typically follow one another according to the business cycle phases, it is particularly relevant for policy makers purposes to distinguish the transmission mechanisms of the monetary policy between the two distinct economic phases, in order to avoid that monetary policy decisions have adverse effects on the level of real variables. Our DQMA model is particularly adapted to answer these questions, precisely because it can, per nature, consider a dynamically evolving linear relation between the covariates and the explained variable’s quantiles, precisely being able to capture and reproduce structural breaks often characterising the evolution of economic variables such as inflation (see also Primiceri (2005); Koop and Onorante (2012); Stock and Watson (2007)). Furthermore, the focus on quantiles of the predicted variable helps in discerning periods characterised by different economic conditions and, in particular, those featured by low or high inflation levels. Thanks to the conditionally Gaussian representation of the DQMA model, following the seminal articles on inflation by Engle (1982) and Bollerslev (1986), we introduce a conditional heteroskedastic volatility error term which greatly enhances the ability of the model to adapt to asymmetric conditional distributions. The dynamic model selection 22
procedure further improves our analysis, allowing us for the model inclusion probabilities at different quantiles level both 1) to be significantly different and 2) to vary over time. This essentially means not only that the importance of each exogenous regressor, as measured by (k)
its inclusion probability i.e. πtt−1 , may be different at various τ –levels – which is indeed quite standard in quantile regression – but also that those probabilities may substantially change during period of high inflation with respect to those of low inflation. We generalise the Autoregressive model of order p with exogenous covariates ARX(p) model developed by Stock and Watson (1999), and previously considered by Koop and Korobilis (2012), within our DQMA framework. The quantile model thus reads:
qτ (yt , β, γ) =
x′t−1 β
+
p X
φj yt−j ,
(31)
j=1
where yt is the inflation defined as the logarithmic percentage price growth yt = Pt 100 log Pt−1 , with Pt being a price index, xt−1 a set of lagged predictors and (yt−1 , . . . , yt−p )
the lagged inflation. In this article, we use the same dataset described in Koop and Korobilis (2012). A detailed description is given in Appendix C. Following standard literature on quantile regression analysis, we focus on five quantile levels, i.e. τ = 0.10 and τ = 0.25 (lower quantile analyses), τ = 0.5 (median analysis), and τ = 0.75 and τ = 0.90 (upper quantile analyses), which correspond, at each time step, to five different inflation levels. Due to the limited space, we report in Fig. 23 the timevarying parameters and the inclusion probabilities from our DQMA model for three quantile levels τ = (0.25, 0.5, 0.75). Note that, to keep the figure readable, we report the inclusion probabilities and associated dynamics only for those variables which are relevant to predict the different conditional quantiles at least one point in time, i.e. those variables whose inclusion probability is greater than 1/2. For comparison purposes, we report in Tab. 1 a summary of the coefficients value for a static quantile regression on the whole sample. Further details of the static regression parameter estimates are given in the supplementary material. The median analysis represents a robust version of the timevarying regression analysis 23
τ φ1 φ2 UNEMP CONS INV GDP HSTARTS EMPLOY PMI TBILL SPREAD DJIA MONEY INFEXP COMPRICE VENDOR
0.10 0.2468
0.25 0.2979 0.1385
0.50 0.3398 0.1739
0.75 0.4062 0.1948
0.90 0.4192 0.2106 0.0991
0.3192
0.4763
0.093 0.075 0.0871 0.0568 0.0063 0.0064 0.0078 0.0092 0.0169 0.0465 0.0507 0.1144 0.1048 0.1206 0.123 0.1305 0.0056
Table 1: Inflation dataset. Posterior mean of the significant parameters (selected by using the 95% credible intervals) of the static quantile regression model, estimated on the whole sample for different quantile levels (columns).
in Koop and Korobilis (2013) and Koop and Onorante (2012). As expected, our findings (middle row plots in Fig.
23), are qualitatively similar to theirs,: coefficients of the
regressors and correspondent probabilities of inclusion in the model change over time. Also, the level of persistence of the inflation, given by the sum of the autoregressive coefficients is about 0.5, which is in line with Koop and Korobilis (2013). See also φ1 and φ2 rows in column τ = 0.5 of Tab. 1. The dynamic evolution of the autoregressive parameters in Fig. 2 instead reveals a rather different picture than that the one suggested by the static parameters’ estimates. In particular, starting from the year 1973, φ1 suddenly increases to a level above 0.6, strongly affecting the overall inflation persistence. It is worth noting that this radical change in the inflation persistence, which receives a lot of attention in the recent empirical literature, coincides with the increase of the price level the US experienced during the seventies (see Fig. C.6). As regards to the Phillips curve, the unemployment rate (UNEMP) has a low probability of inclusion in the median analysis for the static quantile 24
regression (see Tab. 1) but, for the dynamic regression, it can be neglected only before 1973. After this date, UNEMP has a negative impact on INFL, which is coherent with the results of Koop and Onorante (2012), (see Fig. 3). In line with the result obtained in the static regression framework, other potential explanatory variables included in the median dynamic regression are: HSTARTS, EMPLOY, TBILL, DJIA and INFEXP. Three of them (namely HSTARTS, TBILL and INFEXP) are also found significant in the dynamic mean regression of Koop and Korobilis (2013). Despite their explicative power for explaining INFL, those variables are, nevertheless, not relevant for the entire reference period. The total new privately owned housing units (HSTARTS), for example, displays an inclusion probability considerably greater than 1/2 at the end of the period. The dynamic evolution of Treasury Bill rate (TBILL) and the expectations on future inflation levels (INFEXP), are in line with the results of Koop and Korobilis (2013). However, TBILL starts to become irrelevant to explain inflation levels after the end of the 1970s, while, in the same period, INFEXP starts to rise to reach a probability level of one at the end of the period. Another interesting finding concerns the relevance of the employment rate (EMPLOY) and Dow Jones index returns (DJIA) displaying similar inclusion probability dynamics. Both covariates start to became significant inflation predictors after the mid 1980s, at the beginning of the Great Moderation of the business cycle. A possible interpretation for this patterns is that some covariates become relevant to forecast the inflation only during period of low inflation volatility. More importantly, all the inclusion probability, with the notable exception of the Treasury Bill, rate (TBILL) become more relevant at the end of the period, starting from mid 1990s. The lower and upper quantile analyses show that coefficients of the regressors significantly change over quantiles with relevant implications in terms of policy. More specifically, in the static quantile regression we found that the autoregressive terms (φ1 , φ2 ) increase with respect to the medial levels. For low inflation levels, the persistence is about 0.24 (e.g., see column τ = 0.10, Tab. 1), whereas for high inflation level, we find evidence of higher persistence, that is about 0.61 (e.g., see column τ = 0.90, Tab. 1). The dynamic regression report a similar result at the end of the period. Inflation persistence is higher for the highest 25
quantile confidence levels. Although we note that, as expected, during the two inflation peaks of 1975 and 1981, the parameter φ2 is largely negative, denoting a reduction of the inflation persistence. The importance of a timevarying parameters analysis is clear from Fig. 3. For example, following the static regression model fitted on the whole sample, we can falsely conclude that the UNEMP variable is not relevant at any inflation level (see Tab. 1), whereas the timevarying regression provides evidence of a higher probability of inclusion of UNEMP in the models, with a negative coefficient (see green lines in Fig. 32) when predicting moderate inflation level (median analysis).
A similar result holds true
for HSTARTS, which is not always relevant following the static regression, and appears to have a significant impact for the whole period when forecasting moderate inflation levels. INFEXP instead is always significant for all the considered quantile confidence levels in both the static and dynamic regressions. Concerning the sign and magnitude of its impact, the static regression results are confirmed in the dynamic case only for medium and high inflation levels, where it is largely positive (red line in Fig. 2), but not for the lowest quantile (τ = 0.25), where it is positive only before the Great Moderation period. From the static regression, variables such as UNEMP, INV, GDP, EMPLOY, PMI and VENDOR, have no impact both on low and high levels of inflation (see both lower and upper quantiles), whereas variables such as CONS and HSTARTS not significantly influence low levels of inflation (e.g., compare τ = 0.10 and τ = 0.90 columns in Tab. 1), but have a positive impact when inflation is high. The dynamic regression only partially confirms these results revealing some changes. For example, INV, GDP, PMI and VENDOR have a probability of inclusion in the model lower than 1/2 whatever the quantile confidence levels, whilst UNEMP and EMPLOY are always relevant at least for some subperiods (see green and black lines in Fig. 3). The dynamic regression results also reveals a larger number of significant covariates for most or part of the period. More specifically, some covariates, such as MONEY and SPREAD for instance, are relevant only for the first and last quartile regressions. A major advantage of DQMA over traditional static model selection approaches consists in the dynamic update of the posterior inclusion probabilities. However, the benefits of DQMA over competing alternative approaches are not exhausted in the dynamic model 26
τ = 0.25
τ = 0.25
1
1.0
0.5 0.0 0
φ1 φ2 UNEMP INFEXP HSTARTS
0.5 1
1.0
2.0
61 65 69 73 77 81 85 89 93 97 01 05
τ = 0.50 0.4
0.6
0.3 0.2
0.4
0.1
φ1 φ2 UNEMP INFEXP HSTARTS
0.2 0
0
0.2
61 65 69 73 77 81 85 89 93 97 01 05
2.0
0.5
1.0
1
61 65 69 73 77 81 85 89 93 97 01 05
τ = 0.75
1
0.5
TBILL DJIA EMPLOY
0.1
τ = 0.75
0
61 65 69 73 77 81 85 89 93 97 01 05
τ = 0.50
0.8
0.2
TBILL DJIA EMPLOY MONEY
0.0
φ1 φ2 UNEMP INFEXP HSTARTS
1.0 2.0
61 65 69 73 77 81 85 89 93 97 01 05
TBILL DJIA EMPLOY MONEY SPREAD
61 65 69 73 77 81 85 89 93 97 01 05
Figure 2: Inflation dataset. Sequential parameter estimates for the full model for three quantile levels τ = 0.25 (upper panel), τ = 0.5 (middle panel) and τ = 0.75 (bottom panel). The dashed and dotted dark lines in the right panels denote the dynamic evolution of the autoregressive coefficients (φ1 , φ2 ).
selection. Good model selection procedures can be evaluated on the basis of their ability to adequately describe the response variable’s characteristic while retaining parsimony. When 27
τ = 0.25
τ = 0.25
1
1
0.8
0.8
0.6
0.6
0.4 0.2
TBILL DJIA EMPLOY MONEY
0.4
UNEMP INFEXP HSTARTS
0.2
61 65 69 73 77 81 85 89 93 97 01 05
τ = 0.50
61 65 69 73 77 81 85 89 93 97 01 05
τ = 0.50 1
1 0.8 0.6
0.8
0.4 0.6
0.4
UNEMP INFEXP HSTARTS
0
61 65 69 73 77 81 85 89 93 97 01 05
τ = 0.75 1 UNEMP INFEXP HSTARTS
0.9 0.8
0.7
TBILL DJIA EMPLOY MONEY SPREAD
0.7
0.6
0.6
0.5
0.5
0.4
61 65 69 73 77 81 85 89 93 97 01 05
τ = 0.75
0.9 0.8
TBILL DJIA EMPLOY
0.2
0.4 61 65 69 73 77 81 85 89 93 97 01 05
61 65 69 73 77 81 85 89 93 97 01 05
Figure 3: Inflation dataset. Sequential update of the predicted inclusion probabilities πtt−1 for three quantile levels τ = 0.25 (upper panel), τ = 0.5 (middle panel) and τ = 0.75 (bottom panel). The dashed horizontal lines represent 1/2 inclusion probability levels.
model selection is carried on for forecasting purposes, parsimony can be translated in terms of lack of overfitting, and, usually, smaller forecasted confidence intervals. As suggested by 28
8.5
8
7.5
7
6.5
6
61
65
69
73
77
81
85
89
93
97
01
05
Figure 4: Inflation dataset. Expected number of predictors for each quantile confidence level: τ = 0.25, blue line, τ = 0.5, green line, τ = 0.75, red line.
Koop and Korobilis (2012), if the DQMA approach overweights those models comprising a reduced number of predictors, then it preserves out–of–sample forecasting properties without compromising goodness–of–fit properties. The expected number of predictors included by the DQMA model selection procedure can be analytically evaluated using the predicted k inclusion probabilities πtt−1 , for t = 1, 2, . . . , T and k = 1, 2, . . . , K,such as:
E (Stτ ) =
K X
k πtt−1 Stτ,k ,
(32)
k=1
where Stk denotes the number of regressors included in model k = 1, 2, . . . , K at each point in time t = 1, 2, . . . , T , and E (St ) can be interpreted as the average number of predictors included by DQMA at time t. Figure 4 plots the expected number of regressors E (St ) for three different quantile confidence levels τ = (0.25, 0.50, 0.75). It is worth noting that the average number of predictors considers both the significant and irrelevant regressor while Figures 32 only consider those regressor that should be included at least once in the dynamic regressor set. This explains why E (S) is expected to be on average slightly larger than the number of relevant regressors included in Figures 32. Inspecting Figure 4, we note that the shrinkage reduces as the quantile confidence level increases. This essentially confirms the intuition, meaning that the number of predictors which are relevant 29
to explain higher inflation levels is, on average, larger than that for low inflation level. Moreover, we note that for the median regression, E (St0.5 ) is larger than what was found by Koop and Korobilis (2012) for the same dataset and for the same period. A possible explanation for this difference relies on the robustness properties of the median dynamic regression with underweights observations in the extreme tails reducing the variance that enters into the Kalman filter predicting equations. The proposed probability estimates are expected to be more efficient than those based on conditional mean regression. It is worth adding that for all the considered quantile levels, E (St0.25 ) changes over time and converges to a stable level after few periods. 5.2. Revisiting real estate studies with timevarying market conditions In a second application, we hereafter consider the monthly values of the REIT netofS&P 500 return from January 1982 to March 2013, which is an update version of the dataset used in Ling et al. (2000). The REIT netofS&P500 return is defined as the difference between the monthly NAREIT equity index return less the return to the S&P500 index in that month. We mobilise a quantile–based dynamic regression variant of the methodology employed by Ling et al. (2000), which instead consists in a static meanregression analysis with macro fundamental and financial variables. To dynamically update the sets of relevant regressors for predicting future REIT levels, Ling et al. (2000) combine a rollingwindow approach altogether with a stochastic search variable selection method.
Our analysis extends that
of Ling et al. (2000) in two main directions: we thus investigate 1) the predictability of moderately large/low REIT returns; 2) as well as the median return, as both a function of the same set of explanatory variables as in Ling et al. (2000) .1 As the intuition tells us, we should expect here that the determinants of the high/low returns would be rather similar in the broad picture, but rather substantially different depending on the subperiods considered, merely because the underlying market conditions, the mechanisms at stake and the economic forces are not the same. Indeed, our approach is essentially more robust and dynamic per essence, in the sense that the relevant inclusion probabilities in a quantile regression 1
A detailed description of the dataset is given in Appendix C.
30
τ TBILL SP500 TERM PREM DINDP DLEAD DCONST INLF DCONSUM DMBASE REITYLD MKTPE MKTYLD MKTMOM REITMOM JANDUM
0.10
0.25
0.50 7.1140
0.8356 1.0928
0.1743 0.1033 0.6527 0.2283
0.75 0.90 7.0672 11.0337 0.1636 0.9364 0.9582 1.3798 1.1784 1.2903 1.5628
0.0871 0.6281
1.5724 0.0483 1.0396 0.5577 0.0919 1.0030
0.0842
0.0802 0.1378
Table 2: Real estate dataset. Posterior mean of the significant parameters (selected by using the 95% credible intervals) of the static quantile regression model, estimated on the whole sample for different quantile levels (columns).
framework, are updated at each point in time and the regressor parameters follow a latent dynamic process. Dynamically updating the regressors inclusion probabilities at different quantile levels may thus help to explain how the evolving general economic conditions impact the housing bubble–bursts, as it is frequently observed in the market. The quantile regression model we employ has been defined in equation (31), where we exclude the lagged endogenous variables from the set of covariates. The results of the preliminary static quantile model over the entire dataset are presented in Table 2. Inspecting Table 2, it is first evident that while a lot of variables are relevant to explain high returns level as we generally expected, only a few of them explain the conditional quantile at lower confidence levels. Furthermore, MKTPE only impacts the lowest quantile level, while MKTMOM only has influence on the highest ones. Secondly, moving now to the dynamic quantile regression, the analysis of the timevarying coefficients allow to revisit some previous results, as depicted in Figure 5, plotting here the predicted inclusion probabilities πtt−1 of the regressors at different quantile levels. 31
τ = 0.25
τ = 0.25
τ = 0.25
1.0
1.0
1.0
0.9
0.9
0.9
0.8
0.8
0.8
0.7
0.7
0.7
0.6
0.6
0.6
0.5
0.5
0.5
0.4
0.4
0.4
0.3
0.3
0.2 0.1 0.0
0.3
0.2 TBILL TERM PREM
81 83 85 87 89 91 93 95 97 99 01 03 05 07 09 11 13
0.2 DLEAD INFL DCONSUM
0.1 0.0
τ = 0.50
81 83 85 87 89 91 93 95 97 99 01 03 05 07 09 11 13
0.0
τ = 0.50 1.0
1.0
0.9
0.9
0.9
0.8
0.8
0.8
0.7
0.7
0.7
0.6
0.6
0.6
0.5
0.5
0.5
0.4
0.4
0.4
0.3
0.3
0.1 0.0
0.3
0.2 TBILL TERM PREM
81 83 85 87 89 91 93 95 97 99 01 03 05 07 09 11 13
0.2 DLEAD INFL DCONSUM
0.1 0.0
τ = 0.75
81 83 85 87 89 91 93 95 97 99 01 03 05 07 09 11 13
0.0
1.0
1.0
0.9
0.9
0.9
0.8
0.8
0.8
0.7
0.7
0.7
0.6
0.6
0.6
0.5
0.5
0.5
0.4
0.4
0.4
0.3
0.3
0.1 0.0
0.3
0.2
81 83 85 87 89 91 93 95 97 99 01 03 05 07 09 11 13
0.2 DLEAD INFL DCONSUM
0.1 0.0
81 83 85 87 89 91 93 95 97 99 01 03 05 07 09 11 13
τ = 0.75
1.0
TBILL TERM PREM
DMBASE REITYLD REITMOM
0.1
τ = 0.75
0.2
81 83 85 87 89 91 93 95 97 99 01 03 05 07 09 11 13
τ = 0.50
1.0
0.2
DMBASE REITYLD REITMOM
0.1
81 83 85 87 89 91 93 95 97 99 01 03 05 07 09 11 13
0.1 0.0
DMBASE REITYLD REITMOM
81 83 85 87 89 91 93 95 97 99 01 03 05 07 09 11 13
Figure 5: Real estate dataset. Predicted inclusion probabilities πtt−1 , for three quantile levels τ = 0.25 (top panel), τ = 0.5 (middle panel) and τ = 0.75 (bottom panel). The dotted lines represent the 1/2 inclusion probability.
The dynamic evolution of the inclusion probabilities shed light on which variables play a relevant role to predict future quantile levels of the real estate returns, at each time t. As illustrated by equations (26)–(27), the inclusion probabilities play a role in the DQMA model selection procedure because they represent the weights attached to models that include a particular predictor. Figure 5 only represents those regressors that are relevant at least one point in time, i.e. whose posterior inclusion probability is, at least, 1/2, for some date t. A eyeball analysis of Figure 5 so reveals strong evidence of model changes over time for all the 32
confidence levels. TBILL, for example, measuring the cost opportunity of investing in real estates, provide a significant contribution to the prediction of high, medium and low real estate exceeded returns during the period going from the end of the 80s to the beginning of the 90s. Most of the times, however, we observe that the probability of inclusion of a given regressor displays a dynamic which is substantially different when we move from the median to the higher or lower confidence levels. This behaviour is clearly visible from the middle panels of Figure 5, where the variables DLEAD (measuring the macroeconomic situation) and INFL (measuring the impact of the inflation) are never included as predictor for τ = 0.50, while, as expected, they became relevant to predict high real estate returns, τ = 0.75. Interestingly, the percentage change in the monetary base (MBASE) displays an inclusion probability equal to one a the end of the period for both high and low confidence levels, but its relevance for predicting the median real estate return (τ = 0.50) seems to be questionable. Another important result is that, similarly to the standard DMA approach of Koop and Korobilis (2012), the DQMA allow us for dealing with both gradual and abrupt changes in the inclusion probabilities (see, e.g., the abrupt change from almost 1/2 to 1, in the inclusion probability of the monetary base MBASE, for τ = (0.25, 0.75)). Albeit we can also write that there are also many cases where the inclusion probability of a given predictor evolves smoothly over time. In conclusion, our DQMA approach allows us for the dynamic selection of the relevant predictors to forecast the variable of interest over time and across quantiles. In particular, it discriminates between period of high/low real estate returns, providing great benefits for policy purposes. In such a way, policy makers could then choose the most effective instrument regarding the global economic situation of the predicted variables. And our analysis confirms with no surprise that the relevance of the policy instruments evolves over time according to the global economic conditions. 6. Conclusion We propose in this article a new DQMA approach, which for the first time combines, in a tractable way, a natural Bayesian approach and a dynamic quantile regression framework, with model risk and uncertainty on the explanatory variables, that may also exhibit special 33
timevarying features such heteroskedasticity, leptokurticity and potential breaks. After having introduced the classical quantile regression model approach and defined the DMA problem for quantile regression, we propose our Bayesian model, as well as the related SMCMC algorithm used for sequential posterior approximation. Our Bayesian approach to dynamic quantile regression modelling relies on the representation of the quantiles as a linear function of latent variables describing the stochastic evolution of the regressors. The stochastic nature of the regressors as well as the presence of a non–Gaussian misspecified measurement error term and the potentially large number of regressors represent the major challenges afforded by the proposed methodology. Specifically, exploited the location–scale mixture of Gaussians representation of the Asymmetric Laplace distribution we cast the model in a conditionally Gaussian state space model having a double hierarchical latent structure. The latent structure has been subsequently increased by adding a further layer over the discrete space of regressors combinations making batch processing infeasible even for small samples. The proposed sequential updating mechanism extends prior work of Dunson and Yang (2013) to this complex setting. We prove that the proposed combination of transition and jumping kernel which relies on a sequence of Kalman filtering and smoothing steps converges to the posterior. As a result, simulations first prove to be effective and efficient when simulated surrogate data are used. Secondly, revisiting wellknown studies of macroeconomic and financial dataset focusing on the inflation and the realestate market, empirical studies show both 1) a similar general explanation regarding the natural causes of changes in inflation and realestate variations, but 2) a rather different detailed analysis when various subperiods are considered, thus depending on the general market conditions involved, and the channels and economic mechanisms that should be highlighted. Interestingly, since interest (and inflation) rates are at the their historical lowest at the moment (and have generally only evolved downward on the recent period), one might also find of interest to apply our general approach to bond markets, in order to assess whether previous conclusions about monetary policies in a Quantitative Easing context, forecasting of interest curves when the rates are so low, and stress testing when the most probable 34
future moves are likely to be upward, are still valid at the time we are written (especially after the last financial crisis). Acknowledgement Authors’ research is supported by funding from the European Union, Seventh Framework Programme FP7/20072013 under grant agreement SYRTOSSH2012320270, by the Institut Europlace de Finance, Systemic Risk grant, by the Italian Ministry of Education, University and Research (MIUR) PRIN 201011 grant MISURA, by the 2011 Sapienza University of Rome Research Project and by the “Carlo Giannini Research Fellowship”, the “Centro Interuniversitario di Econometria” (CIdE) and “UniCredit Foundation”, and by Global Risk Institute in Financial Services.
This research used the SCSCF
multiprocessor cluster system at University Ca’ Foscari of Venice. We thank conference and seminar participants at the: SIRE Econometrics Workshop, Glasgow, 2014, the 8th CSDA International Conference on Computational and Financial Econometrics Pisa, 2014 and the 6th Italian Congress of Econometrics and Empirical Economics.
35
Appendix A. Proofs of propositions Appendix A.1. Proof of Theorem 1 and 2 Proof of Th. 1. By the assumption on the jumping kernel it follows that πt − Jt ◦ pV = Z Z = sup (p(θ t−1 ) − πt (θ t−1 )Jt (θ t−1 , θ t )) dθ t−1 f (θ t )dθ t f ≤V R dt Rdt−1 Z Z + (Jt (θ t−1 , θ t ) − πt (θ t )πt−1 (θ t−1 )) dθ t−1 f (θ t )dθ t dt−1 R dt Z R (p(θ t−1 ) − πt (θ t−1 )Jt (θ t−1 , η t )) f (θ t )dθt−1 dη t = sup f ≤V Rdt ×Rd Z Z (Jt (θ t−1 , θ t ) − πt (θ t )πt−1 (θ t−1 )) dθ t−1 f (θ t )dθ t , + R dt
Rdt−1
where θ t = (θ t−1 , η t ). Thus one obtain
Z πt − Jt ◦ pV ≤ sup (πt (θ t ) − πt (θ t−1 )Jt (θ t−1 , θ t )) f (θ t )dθt f ≤V R dt Z + sup (πt (θ t−1 )Jt (θ t−1 , θ t ) − p(θ t−1 )Jt (θ t−1 , θt )) f (θt )dθ t f ≤V R dt Z Z ≤ πt (θ t−1 ) sup sup (πt (η t θ t−1 ) − Jt (θ t−1 , θ t )) f (θ t )dη t dθ t−1 Rdt−1
θ t−1 ∈Rdt−1 f ≤V
Rd
Rdt−1
θ t−1 ∈Rdt−1 f ≤V
Rd
Z Z + sup (πt (θ t−1 ) − p(θ t−1 )) Jt (θ t−1 , θ t )f (θ t )dθ t−1 dη t f ≤V Rdt−1 Rd Z Z ≤ πt (θ t−1 ) sup sup (πt (η t θ t−1 ) − Jt (θ t−1 , θ t )) f (θ t )dη t dθ t−1
Z + sup f ≤V
=
Rdt−1
sup
θ t−1 ∈Rdt−1
(πt (θ t−1 ) − p(θ t−1 ))
Z
Rd
f ((θ t−1 , η t ))dη t
πt (η t θt−1 ) − Jt (θ t−1 , η t )V + πt − pV .
Proof of Th. 2. See Dunson and Yang (2013). 36
dθ t−1
(A.1)
Appendix B. Computational details Appendix B.1. Full conditional distributions of the transition kernel In this appendix we detail the kernel we use to simulate from the augmented joint posterior distribution of the parameters and latent states at time t, πt (θ t ). The transition kernel consists of the following full conditional distributions. (l) 1. The full conditional distribution, f σ  β 1:t , y1:t , of the measurement scale parameter
σ is obtained by collapsing the augmented latent variables up to time t, ω 1:t . The distribution is an inverted gamma, IG (a, bl ), with parameters 0
a = a + t,
0
b=b +
t X s=1
′ (l) ρτ ys − xs β s .
where ρτ (y) = y (τ − 1 (y < 0)) is the τ –th level quantile loss function. (l) 2. The full conditional distribution of ωs−1, f ωs−1  σ (l) , Ω(l) , β 1:t , y1:t , x1:t , for s = 1, 2, . . . , t is Inverted Normal, IN (ψs , φ), with parameters ψs =
s
λ2 + 2δ 2 , (ys − x′s β s )2
λ2 + 2δ 2 . φ= δ2σ
3. The sequence of full conditional distributions of the timevarying regression parameters (l)
β 1:t are obtained from the conditional Gaussian state space representation by running the Kalman filter prediction and filtering equations forward up to time t and then processing backward the observations using the Kalman smoothing equations. At each time s = 1, 2, . . . , t, we have the conditional distribution t Y (l) (l) (l) f β 1:t  σ (l) , Ω(l) , ω1:t , y1:t , x1:t = f β s  β st , Pst , s=1
37
(B.1)
where f β s 
(l) (l) β st , Pst
is the density of a normal distribution with mean and variance (l) (l) β st = E β s(l)  ω 1:t , y1:t , x1:t (l) (l) Pst = V β s(l)  ω 1:t , y1:t , x1:t ,
(B.2) (B.3)
with y1:t = (y1 , . . . , yt), which are obtained by running the following Kalman smoother recursion
and
(l) (l) (l) (l) (l) (l) β st = β ss + Pss (Ps+1s )−1 β s+1t − β s+1s (l) (l) (l) (l) (l) (l) (l) (l) Pst = Pss + Pss (Ps+1s )−1 Ps+1t − Ps+1s (Ps+1s )−1 Pss , (l) (l) β ss−1, Pss−1
and
(l) (l) β ss , Pss
predictive and filtering equations:
are obtained through the following Kalman
(l)
(l)
(B.4)
(l)
(l)
(B.5)
β ss−1 = β s−1s−1 Pss−1 = Ps−1s−1 + Ω (l)
(l)
(l)
(l)
(l)
(l)
β ss = β ss−1 + Pss−1xs Vs(l) νs(l)
(B.6) (l)
Pss = Pss−1 − Pss−1xs Vs(l) x′s Pss−1, (l)
(l)
(l)
(l)
(B.7)
(l)
where νs = ys − yˆss−1, with yˆss−1 = λωs + x′s β ss−1 , is the prediction error at time −1 (l) (l) (l) s, and Vs = δσ (l) ωs + x′s Pss−1xs is the variance of the prediction error. (l)
4. The full conditional distribution, f (Ω  β 1:t , y1:t ), of the transition variancecovariance matrix Ω(l) is inverted Wishart, IW (c, C) with parameters (t − 1) , c = c0 + 2
t−1
′ 1X β s+1 − β s β s+1 − β s . C = C0 + 2 s=1
38
Appendix B.2. Full conditional distributions of the jumping kernel In this appendix we detail the jumping kernel used to sequentially update the posterior (l) (l) latent states, ωt+1 , β t+1 , at each time t and for the lth chain of the population, when
new observations become available, ∀l = 1, 2, . . . , L. For the easy of exposition we assume
that the Jumping kernel is applied as a new observation arrives, but the procedure can be easily extended to include updating of block of observations. (l) 1. The full conditional of ωt+1 , f ωt+1  σ (l) , β t+1 , yt+1 , (l) (l) with parameters IN ψt+1 , φ (l) ψt+1
v u λ2 + 2δ 2 =u 2 , t (l) yt+1 − x′t+1 β t+1
φ
(l)
(l)
is Inverted Normal
λ2 + 2δ 2 = 2 (l) . δ σ
(l)
2. The full conditional distribution f (β t+1 σ (l) , Ω(l) , β 1:t , ω 1:t+1 , y1:t+1 ) of the dynamic (l)
regression parameters β t+1 is a normal, N (β t+1t+1 , Pt+1t+1 ), with parameters β t+1t+1 , (l)
and Pt+1t+1 which are the filtered mean and variance obtained by iterating the Kalman filter updating equations (B.4)–(B.7) for one step from time t to time t + 1. Appendix B.3. Fixedlag smoother for dynamic quantile regression In this section we describe the fixedlag smoother for and the corresponding fixedlag simulation smoother that are required to simulate draws from the distribution of the latent regression parameters at time t − h, β t−h conditional to information up to time t, i.e. π β t−h  y1:t , x1:t , ω 1:t , σ, Ω , where h is the fixed lag and t varies from t = 1, 2, . . . , T . For
t = 1, 2, . . . , T , we have
(l)
(l)
f β 1:t  σ , Ω
(l) , ω 1:t , y1:t , x1:t
39
t Y (l) (l) = f β s  β s−hs, Ps−hs , s=1
(B.8)
(l) (l) where f β s−h  β s−hs, Ps−hs is the density of a normal distribution with mean and
variance
(l) (l) (l) β s−hs = E β s−h  ω 1:s , y1:s , x1:s (l) (l) (l) Ps−hs = V β s−h  ω 1:s , y1:s , x1:s ,
(B.9) (B.10)
with y1:s = (y1 , y2 , . . . , ys ), which are obtained by running the following Kalman fixlag smoother recursion (l)
(l)
h,(l)
−1
(l)
(l)
h,(l)
−1
β s−hs = β s−hs−1 + Pss−1 xs Vs(l) νs(l)
(B.11) h,(l)
Ps−hs = Ps−hs−1 + Pss−1xs Vs(l) x′s Pss−1, (l)
(l)
where νs and Vs
are obtained by running the Kalman filter recursion for the augmented h+1,(l)
state space model and Pss−1 0,(l)
Ps
(l)
(B.12)
h,(l)
(l)
(l)
h,(l)
(l) −1 ′ xs
= Pss−1ℓs with ℓs = IK − Pss−1xs Vs
initialised by
= Pss−1, for l = 1, 2, . . . , L. Simulation from the posterior distribution of the regression
parameter can be easily accomplished by exploiting joint normality of the states. Appendix C. Data appendix Appendix C.1. Inflation dataset We consider the changes in the Consumer Price Index (CPI) as a measure of inflation. In order to make our results comparable with those obtained by Koop and Korobilis (2012) we consider the following predictors: the unemployment rate (UNEMP); the percentage change in real personal consumption expenditures (CONS); the percentage change in private residential fixed investment (INV); the percentage change in real GDP (GDP); the logarithmic transformation of the housing starts measured as total new privately owned housing units (HSTARTS); the percentage change in employment measured as all employees total private industries, seasonally adjusted (EMPLOY); the change in the purchasing manager’s composite index provided by the Institute of Supply Management (PMI); the three month Treasury Bill rate (TBILL); the spread between the 10 year and 3 month 40
3 2.5 2 1.5 1 0.5 0
61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99 01 03 05 07
20 10 0 10 20
81 83 85 87 89 91 93 95 97 99 01 03 05 07 09 11 13
Figure C.6: Inflation (top panel) and real estate (bottom panel) datasets. Inflation is defined as the first difference of the US GDP deflator, while the real estate index is measured by the difference between the REIT and the SP&500 returns. Superimposed red dotted lines denote the historical unconditional quantile at confidence levels τ = (0.10, 0.25, 0.50, 0.75, 0.90).
Treasury Bill rates (SPREAD); the percentage change in the Dow Jones Industrial Average index (DJIA); the percentage change in the money supply given by the M1 variable (MONEY); the University of Michigan measure of inflation expectations (INFEXP); the change in the NAPM commodities price index (COMPRICE); and the change in the NAPM vendor deliveries index (VENDOR). Further details on the variables used and their sources can be found in the Data Appendix of Koop and Korobilis (2013). Appendix C.2. Real estate dataset The macro variables are: the current onemonth Tbill rate (TBILL); the spread between the yieldtomaturity (YTM) on a 30year government bond and the Tbill rate (TERM); the spread between the YTM on AAA corporate bonds and the YTM on 30year government 41
bonds (PREM); the percentage change in the industrial production index (INDPRD); the percentage change in the leading economic indicators (DLEAD); the percentage change in construction starts (CONST); the percentage change in the consumer price index (INFL); the percentage change in nondurable consumption (CONSUM); the percentage change in the monetary base (MBASE). Changes in these macroeconomic variables over the prior sixmonth period to avoid noise and to decrease the impact of historical data revisions on the results. The changes are measured from month t − 8 to month t − 2 for predicting month t because there are reporting delays in the noninterestrate variables. The financial variables are: the dividend yield on the S&P 500 (MKTYLD); the dividend yield on the NAREIT Index (REITYLD); the S&P500 priceearnings (PE) ratio (MKTPE); the lagged return on the S&P500 (LMKT); the compounded return to the S&P500 during the previous six months (MKTMOM); the compounded return on the equity NAREIT index over the previous six months (REITMOM); and a January dummy variable (JANDUM). See Ling et al. (2000) Section 3 for further details. References Alhamzawi, R. and Yu, K. (2012). Variable selection in quantile regression via Gibbs sampling. Journal of Applied Statistics, 39(4):799–813. Anderson, B. and Moore, J. (1979). Optimal Filtering. PrenticeHall, Englewood Cliffs, NJ. Belmonte, M. and Koop, G. (2013). Model switching and model averaging in timevarying parameter regression models. Technical report, Dep. of Economics, University of Strathclyde, N. 1302. Bernardi, M., Gayraud, G., and Petrella, L. (2015). Bayesian tail risk interdependence using quantile regression. Bayesian Anal., 10(3):553–603. Billio, M., Casarin, R., Ravazzolo, F., and Van Dijk, H. (2013). Timevarying combinations of predictive densities using nonlinear filtering. Journal of Econometrics, 177:213–232. Bollerslev, T. (1986). Generalized autoregressive conditional heteroskesdaticity. Journal of Econometrics, 31:307–327. Casarin, R., Craiu, R., and Leisen, F. (2016). Embarrassingly parallel sequential markovchain monte carlo for large sets of time series. Statistics and Its Interface, forthcoming. Dunson, D. and Yang, Y. (2013).
Sequential Markov chain Monte Carlo.
http://arxiv.org/abs/1308.3861.
42
Technical report, ArXiv:
Durbin, J. and Koopman, S. (2012). Time series analysis by state space methods. Oxford University Press. Engle, R. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of united kingdom inflation. Econometrica, 50(4):987–1007. Fawcett, N., Kapetanios, G., Mitchell, J., and Price, S. (2015). Generalised density forecast combinations. Journal of Econometrics, forthcoming. George, E. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423):881–889. Hall, S. G. and Mitchell, J. (2007). Combining density forecasts. International Journal of Forecasting, 23:1–13. Harvey, A. C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press. Ji, Y., Lin, N., and Zhang, B. (2012). Model selection in binary and tobit quantile regression using the Gibbs sampler. Comput. Stat. Data Anal., 56(4):827–839. Jones, G. L. and Hobert, J. P. (2004). Sufficient burnin for gibbs samplers for a hierarchical random effects model. Ann. Statist., 32(2):784–817. Kim, M. O. (2007). Quantile regression with varying coefficients. Annals of Statistics, 35:92–108. Koenker, R. (2005). Quantile regression. Cambridge University Press. Koenker, R. and Basset, J. (1978). Regression quantiles. Econometrica, 46:33–50. Koop, G. and Korobilis, D. (2012). Forecasting inflation using dynamic model averaging. International Economic Review, 53(3):867–886. Koop, G. and Korobilis, D. (2013). Large timevarying parameter {VARs}. Journal of Econometrics, 177(2):185 – 198. Koop, G. and Onorante, L. (2012). Estimating Phillips curves in turbulent times using the ECB’s survey of professional forecasters. Technical report, Working paper series, N. 1422, European Central Bank. Koop, G. and Tole, L. (2013). Forecasting the European carbon market. Journal of the Royal Statistical Society: Series A (Statistics in Society), 176(3):723–741. Kozumi, H. and Kobayashi, G. (2011). Gibbs sampling methods for Bayesian quantile regression. Journal of Statistical Computation and Simulation, 81:1565–1578. Lee, E. R., Noh, H., and Park, B. U. (2014). Model selection via Bayesian information criterion for quantile regression models. Journal of the American Statistical Association, 109(505):216–229. Ling, D. C., Naranjo, A., and Ryngaert, M. D. (2000). The predictability of equity REIT returns: Time variation and economic significance. Journal of Real Estate Finance and Economics, 20(2):117–136. Liu, J. (1994). The collapsed gibbs sampler in bayesian computations with applications to a gene regulation problem. Journal of the American Statistical Association, 89:958–966.
43
Lum, K. and Gelfand, A. (2012). Spatial quantile multiple regression using the asymmetric Laplace process. Bayesian Analysis, 7:1–24. Marin, J.M. and Robert, C. (2007). Bayesian Core: A Practical Approach to Computational Statistics. Springer. McCormick, T. H., Raftery, A. E., Madigan, D., and Burd, R. S. (2012). Dynamic logistic regression and dynamic model averaging for binary classification. Biometrics, 68(1):23–30. Meligkotsidou, L., Vrontos, I., and Vrontos, S. (2009). Quantile regression analysis of hedge fund strategies. Journal of Empirical Finance, 16(2):264–279. Meyn, S. P. and Tweedie, R. L. (1993). Markov chains and stochastic stability. Communications and Control Engineering Series. SpringerVerlag London Ltd., London. Noh, H., Chung, K., and Van Keilegom, I. (2012). Variable selection of varying coefficient models in quantile regression. Electronic Journal of Statistics, 6:1220–1238. Papaspiliopoulos, O. and Roberts, G. (2008). Stability of the gibbs sampler for bayesian hierarchical models. Ann. Statist., 36(1):95–117. Park, T. and Van Dyk, D. (2009). Partially collapsed Gibbs samplers: Illustrations and applications. Journal of Computational and Graphical Statistics, 18:528–550. Primiceri, G. (2005). Time varying structural vector auto regressions and monetary policy. Review of economic studies, 72:821–852. Raftery, A., Karny, M., and Ettler, P. (2010). Online prediction under model uncertainty via dynamic model averaging: Application to a cold rolling mill. Technometrics, 52:52–66. Reed, C., Dunson, D., and Yu, K. (2009). Bayesian variable selection in quantile regression. Technical report, Tech. Rep., Department of Mathematical Sciences, Brunel University. Robert, C. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer–Verlag. Rom´ an, J. C. and Hobert, J. P. (2012). Convergence analysis of the gibbs sampler for bayesian general linear mixed models with improper priors. Ann. Statist., 40(6):2823–2849. Rom´ an, J. C. and Hobert, J. P. (2013). Geometric ergodicity of gibbs samplers for bayesian general linear mixed models with proper priors. Linear Algebra and its Applications, 473:54–77. Simon, D. (2006). Optimal state estimation: Kalman, H infinity, and nonlinear approaches. John Wiley & Sons. Stock, J. and Watson, M. (1999). Forecasting inflation. Journal of Monetary Economics, 44:293–335. Stock, J. and Watson, M. (2007). Why has U.S. inflation become harder to forecast? Journal of Monetary Credit and Banking, 39:3–33. Van Dyk, D. and Park, T. (2008). Partially collapsed Gibbs samplers: Theory and methods. Journal of the American Statistical Association, 110:790–796.
44
Wang, L., Wu, Y., and R., L. (2012). Quantile regression for analyzing heterogeneity in ultrahigh dimension. Journal of the American Statistical Association, 107:214–222. Wu, Y. and Liu, Y. (2009). Variable selection in quantile regression. Statistica Sinica, 19:801–817. Yu, K. and Jones, M. C. (1998). Local linear quantile regression. Journal of the American Statistical Association, 93(441):228–237.
45