Jul 22, 2015 - Indeed, the purpose of this exercise is to demonstrate the effectiveness of our method ..... table, locating the corresponding output v...

0 downloads 7 Views 2MB Size

Bayesian Nonparametric Dynamic State Space Modeling with Circular Latent States Satyaki Mazumder1 and Sourabh Bhattacharya2

1 Indian 2 Indian

Institute of Science Education and Research, Kolkata Statistical Institute, Kolkata

Abstract State space models are well-known for their versatility in modeling dynamic systems that arise in various scientific disciplines. Although parametric state space models are wellstudied, nonparametric approaches are much less explored in comparison. In this article we propose a novel Bayesian nonparametric approach to state space modeling assuming that both the observational and evolutionary functions are unknown and are varying with time; crucially, we assume that the unknown evolutionary equation describes dynamic evolution of some latent circular random variable. Based on appropriate kernel convolution of the standard Weiner process we model the time-varying observational and evolutionary functions as suitable Gaussian processes that take both linear and circular variables as arguments. Additionally, for the time-varying evolutionary function, we wrap the Gaussian process thus constructed around the unit circle to form an appropriate circular Gaussian process. We show that our process thus created satisfies desirable properties. For the purpose of inference we develop an MCMC based methodology combining Gibbs sampling and Metropolis-Hastings algorithms. Applications to a simulated dataset, a real wind speed dataset and a real ozone dataset demonstrated quite encouraging performances of our model and methodologies. Keywords: Circular random variable; Kernel convolution; Markov Chain Monte Carlo; State-space model; Weiner process; Wrapped Gaussian process.

Contents 1 Introduction

3

1.1

Flexibility of state space models . . . . . . . . . . . . . . . . . . . . . . . . .

3

1.2

A brief discussion on state space models with circular states . . . . . . . . .

4

1.3

Need for nonparametric approaches to state space models . . . . . . . . . . .

4

1.4

A brief overview of the contributions and organisation of this paper . . . . .

5

2 Gaussian process based dynamic state space model with circular latent states 2.1

8

Gaussian and wrapped Gaussian process representations of the observational and evolutionary functions . . . . . . . . . . . . . . . . . . . . . . . . . . . .

2.2

2.3

9

Bayesian hierarchical structure of our nonparametric model based on circular latent states . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

11

Prior specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

12

3 Look-up table approach to representing the distribution of the latent circular time series

13

3.1

Intuition behind the look-up table approach . . . . . . . . . . . . . . . . . .

13

3.2

Details of the lookup table approach in our circular context . . . . . . . . . .

14

3.3

Joint distribution of of the latent circular variables induced by the look-up table 16

3.4

Advantages of the look-up table approach . . . . . . . . . . . . . . . . . . .

4 Simulation study

18 19

4.1

True model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

19

4.2

Choices of prior parameters and the grid Gz . . . . . . . . . . . . . . . . . .

19

1

4.3

Brief discussion related to impropriety of the posteriors of some unknowns and the remedy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

20

4.4

MCMC details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

21

4.5

Results of our simulation study . . . . . . . . . . . . . . . . . . . . . . . . .

22

5 Real data analysis 5.1

5.2

25

Wind speed data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

25

5.1.1

Brief description of the data . . . . . . . . . . . . . . . . . . . . . . .

25

5.1.2

Prior choices and MCMC implementations . . . . . . . . . . . . . . .

25

5.1.3

Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

26

Ozone level data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

27

5.2.1

A brief description of the data set . . . . . . . . . . . . . . . . . . . .

27

5.2.2

Prior choices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

30

5.2.3

MCMC implementation . . . . . . . . . . . . . . . . . . . . . . . . .

31

5.2.4

Results of ozone level data . . . . . . . . . . . . . . . . . . . . . . . .

31

6 Discussion and conclusion

34

A Appendix

37

A.1 Gaussian process on linear and angular component and its properties . . . .

37

A.2 Covariance structure of our Gaussian process . . . . . . . . . . . . . . . . . .

39

S-1Smoothness properties of our Gaussian process with linear-circular arguments

40

S-1.1 Mean square continuity: . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

41

S-1.2 Mean square differentiability . . . . . . . . . . . . . . . . . . . . . . . . . . .

42

2

S-2MCMC-based inference

44

S-2.1 Full conditional distributions . . . . . . . . . . . . . . . . . . . . . . . . . . .

46

S-2.1.1 Updating β f by Gibbs steps . . . . . . . . . . . . . . . . . . . . . . .

47

S-2.1.2 Updating β g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

48

S-2.1.3 Updating σf2 and σg2 . . . . . . . . . . . . . . . . . . . . . . . . . . . .

49

S-2.1.4 Updating σ2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

50

S-2.1.5 Updating ση2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

50

S-2.1.6 Updating x0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

50

S-2.1.7 Updating g ∗ (1, x0 ) . . . . . . . . . . . . . . . . . . . . . . . . . . . .

51

S-2.1.8 Updating D z . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

52

S-2.1.9 Updating x1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

52

S-2.1.10Updating xt+1 , t = 1, . . . , T − 1 . . . . . . . . . . . . . . . . . . . . .

54

S-2.1.11Updating xT +1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

55

S-2.1.12Updating Kt , t = 1, . . . , T + 1 . . . . . . . . . . . . . . . . . . . . . .

55

1 1.1

Introduction Flexibility of state space models

The versatility of state space models is clearly reflected from their utility in multifarious disciplines such as engineering, finance, medicine, ecology, statistics, etc. One reason for such widespread use of state space models is their inherent flexibility which allows modeling complex dynamic systems through the underlying latent states associated with an “evolutionary equation” and an “observational equation” that corresponds to the observed dynamic data. That most of the established time series models admit appropriate state space repre3

sentations (see, for example, Durbin and Koopman (2001), Shumway and Stoffer (2011)) is vindication of the enormous flexibility of state space models.

1.2

A brief discussion on state space models with circular states

In reality, there may be strong evidences that the observed time series data depends upon some circular time series. For instance, the ozone level time series data depends upon wind direction (see Jammalamadaka and Lund (2006), for example). However, data on wind direction are often not recorded along with ozone level. A concrete example of such a real data, on which we illustrate our model and methodologies, is provided in Section 5. Other examples (see Holzmann et al. (2006)) include time series data on wind speed (linear) and ocean current (linear) which depend upon wind direction (circular); daily peak load of pollutants (linear) and the time of day when the peak is attained (circular); speed (linear) and direction change (circular) of movements of objects, organisms and animals, to name a few. In Section 5, for the purpose of illustration, we apply our model and methodologies on another real example on wind speed data. When both the linear and circular time series data are available, Holzmann et al. (2006) consider hidden Markov models in a discrete mixture context to statistically analyse such data sets. Our aim in this article is to propose a novel nonparametric state space approach when the circular time series data are unobserved, even though they are known to affect the available linear time series data.

1.3

Need for nonparametric approaches to state space models

To date, most of the research on state space models have adhered to the parametric set-up, assuming known forms (either linear or non-linear) of the observational and evolutionary functions. Recently Ghosh et al. (2014) considered a Bayesian nonparametric approach to 4

state space modeling, assuming that these time-varying functional forms are unknown, which they modeled by Gaussian processes. However, in their approach, observational as well as evolutionary functions consist of only linear arguments and the functions were assumed to take values on the real line R. In our case both the functions have linear as well as circular arguments and moreover, the evolutionary function itself is circular. Hence to model our unknown observational and evolutionary functions, it is necessary to construct a new Gaussian process which can take time and angle as arguments. Therefore, a significantly different approach is taken here to deal with the problem. Moreover, as a by-product of the nonparametric approach based on Gaussian processes, it turned out that the latent states have a non-Markov, non-Gaussian, nonparametric distribution with a complex dependence structure, which is suitable for modeling complex, realistic, dynamic systems. Importantly, using our novel methodology, we are able to retain these advantages for a even more challenging set up. These are briefly discussed in Section 6; details will be provided in our future work.

1.4

A brief overview of the contributions and organisation of this paper

In this paper we use Gaussian processes for modeling the unknown observational and evolutionary functions. It is important to note here that both the functions have arguments which are linear as well as circular. Moreover, the evolutionary function itself is circular. Thus, it is clear that the approaches of any other paper (for example Ghosh et al. (2014) for dynamic modeling in linear components and the references therein) previous to ours are no longer appropriate in such a framework. Hence, quite substantial methodological advancement is necessary in our case. 5

We introduce our Bayesian nonparametric state space model with circular latent states in Section 2. The first challenge is to define a Gaussian process taking time and angle as arguments. We construct an appropriate Gaussian process by convolving a suitable kernel with the standard Brownian motion (Weiner process). The Gaussian process so defined enjoys desirable smoothness properties; moreover, as the absolute difference between two time points tends to infinity and/or the absolute difference between two angles tend to π/2 indicating orthogonality, the Gaussian process based covariance tends to zero as it should be. We provide these technical details in the Appendix. We provide further details of our Gaussian process with respect to continuity and smoothness properties in the supplement Mazumder and Bhattacharya (2014b), whose sections, figures and tables have the prefix “S-” when referred to in this paper. The Gaussian process that we create is an appropriate model for the time-varying observational function, but to model the evolutionary function which is circular in nature, we convert this Gaussian process into a wrapped Gaussian process so that it becomes a well-defined circular process. To obtain the joint distribution of the latent states, in Section 3 we employ the “look-up table” approach of Bhattacharya (2007), but substantially modified for our circular setup, which will play an important role in our MCMC based Bayesian inference. A detailed discussion on look-up table is also provided in this section. In Section 4 we illustrate our model and methodologies with a simulation study, where we simulate the data set from a highly non-linear dynamic model, but fit our nonparametric model, pretending that the data-generating mechanism is unknown. Our experiment shows that even in this highly challenging situation our method successfully captures future observations in terms of coverage associated with 95% highest posterior density credible regions. It is observed that the posterior densities of the latent variable at different time points are

6

multimodal; in these cases coloured graphical representation of the posterior densities of the latent variables with higher intensity on the color standing for higher density regions provide useful visual information, which we adopt. As we find, most of the true values of the latent variables fall within the high posterior probability regions. In Section 5 we demonstrate the performance of our dynamic nonparametric model in the case of two real time series datasets comprising wind speed and the level of ozone present in the atmosphere. In the first example, wind direction (the relevant data on circular process) are recorded, but we analyse the wind speed data using our circular latent process model assuming unavailability of this dataset, and assess the fit of the posterior latent process to the actually available wind direction data. Indeed, the purpose of this exercise is to demonstrate the effectiveness of our method in capturing the true latent process in a real data set-up. In the second example, although the ozone level is recorded, the relevant wind direction data are not available, even though ozone level depends upon wind direction; see the discussion in the following paragraph. Hence, we analyse the observed ozone level data considering wind direction as a latent circular process. As such, in both the experiments on real data, we obtain quite encouraging results, particularly in terms capturing the wind directions associated with the data sets, and the set aside observed data meant for forecasting, quite precisely. The simulation study and the wind speed data analysis, however important and interesting, are meant for validation of our model and methodologies, while our actual interest is in analysis of the ozone data using our ideas. Since ozone analysis has been the interest of many researchers so far, it is worth providing a glimpse of the history of such data analysis. It is crucial to note that the way we analyse the data is completely different from the previous approaches existing in the literature, for instance, Reinsel and Tiao (1987), Smith (1989), Jammalamadaka and Lund (2006), Hassanzadeh et al. (2008). Most of these papers except

7

Smith (1989) fit parametric regression models taking ozone data as a dependent variable. Smith (1989) uses extreme value analysis to detect trend in ground level ozone. Jammalamadaka and Lund (2006) point out that ozone level depends on wind direction. None of the other papers take wind direction into consideration. One reason for not taking advantage of the information on wind direction is that the data on wind direction is circular in nature and therefore, the usual statistical techniques are rendered invalid, as rightly pointed out by Jammalamadaka and Lund (2006). But perhaps the more important reason for not accounting for wind direction is the fact that such data are often not recorded along with the ozone level. None of the previous work available in the literature focuses on such an important issue. In this paper we analyze such an ozone data considering wind direction as circular latent (unobserved) variable, and ozone level as observed linear variable, using our novel nonparametric model. Our work differs from the existing ones in two aspects. We are the first to analyze such a data using dynamic modeling in a nonparametric framework. Also, we are the first to treat wind direction as a circular latent variable and include it in such an analysis.

2

Gaussian process based dynamic state space model with circular latent states

We introduce our proposed state space model as follows: For t = 1, 2, . . . T ,

yt = f (t, xt ) + t , t ∼ N (0, σ2 ),

(1)

xt = {g(t, xt−1 ) + ηt } [2π], ηt ∼ N (0, ση2 ),

(2)

8

where {yt ; t = 1, . . . , T } is the time series observed on the real line; {xt ; t = 0, 1, . . . , T } are the latent circular states; f (·, ·) is the unknown observational function taking values on the real line, and g(·, ·) is the unknown evolutionary function with values on the circular manifold. In (2), [2π] stands for the mod 2π operation. Note that

{g(t, xt−1 ) + ηt } [2π] = {g(t, xt−1 ) [2π] + ηt [2π]} [2π] = {g ∗ (t, xt−1 ) + ηt } [2π],

(3)

where g ∗ is the linear counterpart of g, that is, g ∗ (t, xt−1 ) is the linear random variable such that g ∗ (t, xt−1 ) [2π] = g(t, xt−1 ). For convenience, we shall often use representation (3). Indeed, for obtaining the distribution of xt , we shall first obtain the distribution of the linear random variable g ∗ (t, xt−1 ) + ηt and then apply the mod 2π operation to g ∗ (t, xt−1 ) + ηt to compute the distribution of the circular variable xt . Both the observational and the evolutionary functions have arguments t, which is linear in nature, and x, which is angular. The linear argument has been brought in to ensure that the functions are time-varying, that is, the functions are allowed to freely evolve with time.

2.1

Gaussian and wrapped Gaussian process representations of the observational and evolutionary functions

We consider Gaussian and wrapped Gaussian processes to model f and g independently; for this purpose we first construct appropriate Gaussian processes for f and g ∗ by convolving a suitable kernel with the standard Wiener process. The details are provided in Appendix A.1. Once we build such Gaussian processes, we can convert that for modeling g into a wrapped Gaussian process with the mod 2π operation applied to g ∗ . However, since our evolutionary 9

equation given by (2) involves the error term ηt , we will need to compute the distribution of g ∗ (·, ·) + ηt before applying the mod 2π operation. In the Gaussian process construction detailed in Appendix A.1 we assume the mean functions of f and g ∗ to be of forms µf (·, ·) = h(·, ·)0 β f and µg (·, ·) = h(·, ·)0 β g , where h(t, z) = (1, t, cos(z), sin(z))0 ; here z is an angular quantity and β f and β g are parameters in R4 . As shown in Appendix A.2, for any fixed (t1 , z1 ) and (t2 , z2 ), where t1 , t2 are linear quantities and z1 , z2 are angular quantities, the forms of the covariances are given by cf ((t1 , z1 ), (t2 , z2 )) = exp{−σf4 (t1 − t2 )2 } cos(|z1 − z2 |) and cg ((t1 , z1 ), (t2 , z2 )) = exp{−σg4 (t1 − t2 )2 } cos(|z1 − z2 |), where σf and σg are positive, real valued parameters. A very attractive property of our Gaussian process is that whenever |θ1 − θ2 | = π/2, implying orthogonality of two directions, the covariance becomes 0, the difference in time notwithstanding. To see that this is a desirable condition, first note that the sample correlation coefficient between two vectors is cosine of the angle between them. So, if the vectors are orthogonal, then the sample correlation coefficient is zero. This simple intuition seems to encourage development of correlation functions that have this property. The angular correlation function of Dufour and Roy (1976) satisfies this property, albeit it also involves an P infinite sum. The test statistic proposed in Epp et al. (1971), given by ni=1 cos(θi ), where θi is the angle between unit vectors X i and Y i , also satisfies this property. Obviously, as the time difference tends to infinity, then also the covariance tends to zero. That desired continuity and smoothness properties hold for our Gaussian process are proved in Section S-1 of the supplement. Thus, the Gaussian process we constructed seems to have quite reasonable features that are desirable in our linear-circular context. A pertinent question that arises in the context of modeling the circular latent variables directly using wrapped Gaussian process is what if some known transformation of xt , say,

10

zt = ψ(xt ), projecting xt on the Euclidean space, is considered as the relevant (linear) latent process, which is then modeled using the linear Gaussian process based idea of Ghosh et al. (2014)? The issue here is that it is usually feasible to postulate a single Gaussian process, but since there is no unique choice of the transformation ψ, under various such transformations the distribution of the original latent states xt would be different. To avoid this undesirable feature we modeled xt directly using wrapped Gaussian process.

2.2

Bayesian hierarchical structure of our nonparametric model based on circular latent states

Our model admits the following hierarchical representation:

[yt |f, θ f , xt ] ∼ N f (t, xt ), σ2 ; t = 1, . . . , T, [xt |g, θ g , xt−1 ] ∼ N g ∗ (t, xt−1 ), ση2 [2π]; t = 1, . . . , T, [x0 ] ∼ N µx0 , σx20 [2π], [f (·, ·)|θ f ] ∼ GP h(·, ·)0 β f , σf2 cf (·, ·) , [g(·, ·)|θ g ] ∼ GP h(·, ·)0 β g , σg2 cg (·, ·) [2π], [β f , σf2 , β g , σg2 , σ2 , ση2 ] = [β f , σf2 ][β g , σg2 ][σ2 , ση2 ],

(4) (5) (6) (7) (8) (9)

where θ f = (β f , σf , σ )0 and θ g = (β g , σg , ση )0 . In the above, GP stands for “Gaussian Process”. Integrating out f (·, ·) from the above hierarchical structure we obtain that given x1 , . . . , xT , D T = (y1 , . . . , yT )0 has the multivariate normal distribution of dimension T with mean µyt = H DT β f

11

(10)

and covariance matrix Σyt = σf2 Af + σ2 I T ,

(11)

with H 0DT = (h(1, x1 ), . . . , h(T, xT )) and the (i, j)-th element of Af being cf ((i, xi ), (j, xj )). For obtaining the joint distribution of the latent circular state variables, we consider the “look-up” table approach, but before introducing this, which we discuss in details in Section 3, in the next section we provide details regarding the prior distributions of the parameters associated with the above hierarchical structure.

2.3

Prior specifications

We assume the following prior distributions.

[x0 ] ∼ von Mises(µ0 , σ02 ) γ 2 2 (− α2+2 ) [σ ] ∝ (σ ) exp − 2 ; 2σ αη +2 γη 2 2 (− 2 ) exp − 2 ; [ση ] ∝ (ση ) 2ση αg +2 γg 2 2 (− 2 ) [σg ] ∝ (σg ) exp − 2 ; 2σg ( ) α +2 f γ − f 2 [σf2 ] ∝ (σf2 ) exp − 2 ; 2σf

(12) α , γ > 0

(13)

αη , γ η > 0

(14)

αg , γg > 0

(15)

αf , γ f > 0

(16)

[β f ] ∼ N (β f,0 , Σβf,0 )

(17)

[β g ] ∼ N (β g,0 , Σβg,0 ).

(18)

Choice of the prior parameters are discussed in Sections 4 and 5.

12

3

Look-up table approach to representing the distribution of the latent circular time series

For obtaining the joint distribution of the latent circular variables, we employ the look-up table approach of Bhattacharya (2007), but because of the circular nature of the latent states, appropriate modifications are necessary (for treatment of look-up table in linear dynamic system one may see Ghosh et al. (2014)). In the next section we briefly discuss the intuition behind look-up table idea for our circular set-up.

3.1

Intuition behind the look-up table approach

For illustrative purposes let ηt = 0 for all t, yielding the model xt = g ∗ (t, xt−1 ) [2π]. Let us first assume that the entire linear process g ∗ (·) is available. This means that for every input u = (t, z), where t > 0 and z lies on the unit circle, the corresponding g ∗ (u) is available, thus constituting a look-up table, with the first column representing u and the second column representing the corresponding g ∗ (u). Conditional on (t, xt−1 ), xt = g ∗ (t, xt−1 ) can be obtained by simply picking the input (t, xt−1 ) from the first column of the look-up table, locating the corresponding output value g ∗ (t, xt−1 ) in the second column of the lookup table, and then finally reporting xt = g ∗ (t, xt−1 ) [2π]. In practice, we will simulate the Gaussian process g ∗ on a fine enough grid of inputs, and conditional on this simulated process, will simulate from the conditional distribution of g ∗ (t, xt−1 ), given xt−1 , before applying the modulo 2π operation. By making the grid as fine as required, this strategy can be made to approximate xt as accurately as desired; this is formalized in Ghosh et al. (2014) and easily goes through in our circular set-up. By repeating the aforementioned procedure for each t, the joint distribution of the circular state variables can be approximated as closely 13

as desired. Details of our strategy are provided in the next section.

3.2

Details of the lookup table approach in our circular context

We consider a set of grid points in the interval [0, 2π]; let this set be denoted by Gz = {z1 , . . . , zn }. Let D z = (g ∗ (1, z1 ), . . . , g ∗ (n, zn )). Note that D z has a joint multivariate normal distribution of dimension n with mean vector

E[D z |β g , σg2 ] = H Dz β g ,

(19)

V [D z |β g , σg2 ] = σg2 Ag,Dz ,

(20)

and covariance matrix

where H 0Dz = (h(1, z1 ), . . . , h(n, zn )) and the (i, j)-th element of Ag,Dz is cg ((ti , zi ), (tj , zj )). The conditional distribution of D z given (x0 , g ∗ (1, x0 )), β g and σg2 is an n-variate normal with mean vector

E[D z |β g , σg2 , x0 , g ∗ (1, x0 )] = H Dz β g + sg,Dz (1, x0 )(g ∗ (1, x0 ) − h(1, x0 )0 β g )

(21)

and conditional variance

Var D z |β g , σg2 , x0 , g ∗ (1, x0 ) = σg2 (Ag,Dz − sg,Dz (1, x0 )(sg,Dz (1, x0 ))0 ) ,

(22)

where sg,Dz (·, ·) = (cg ((·, ·), (t1 , z1 )), . . . , cg ((·, ·), (tn , zn )))0 . The conditional distribution of g ∗ (t, xt−1 ) given D z and xt−1 is a normal distribution

14

with mean

E[g ∗ (t, xt−1 )|D z , xt−1 , β g , σg2 ] = h(t, xt−1 )0 β g + (sg,Dz (t, xt−1 ))0 A−1 g,Dz (D z − H Dz β g ) (23)

and variance

Var g ∗ (t, xt−1 )|D z , xt−1 , β g , σg2 = σg2 1 − (sg,Dz (t, xt−1 ))0 A−1 g,Dz sg,Dz (t, xt−1 ) .

(24)

With the above distributional details, our procedure of representing the circular latent states in terms of the auxiliary random vector D z , conditional on β g and σg2 , can be described as follows. 1. x0 ∼ π ∗ , where π ∗ is some appropriate prior distribution on the unit circle. 2. Given x0 , β g and σg2 , x1 = g ∗ (1, x0 ) [2π] = g(1, x0 ), where g ∗ (1, x0 ) has a normal distribution with mean h(1, x0 )0 β g and variance σg2 . 3. Given x0 , x1 , β g and σg2 , [D z |x0 , g ∗ (1, x0 ), β g , σg2 ] is a multivariate normal distribution with mean (21) and covariance matrix (22). 4. For t = 2, 3, . . ., x∗t ∼ [g ∗ (t, xt−1 )|D z , xt−1 , β g , σg2 ] which is a normal distribution with mean and variance given by (23) and (24) respectively; xt is related to x∗t via xt = x∗t [2π].

15

3.3

Joint distribution of of the latent circular variables induced by the look-up table

Using the look-up table approach the joint distribution of (D z , x0 , x1 , x2 , . . . , xT , xT +1 ) given β g , ση2 and σg2 is as follows: [x0 , x1 , . . . , xT +1 , D z |β g , ση2 , σg2 ] = [x0 ][x1 = {g ∗ (1, x0 ) + η1 } [2π]|x0 , ση2 , σg2 ] [D z |x0 , g ∗ (1, x0 ), T Y β g , σg2 × xt+1 = {g ∗ ((t + 1), xt ) + ηt+1 } [2π]|β g , σg2 , t=1

2

D z , xt , ση .

(25)

In the above, [x0 ] ∼ π ∗ is a prior distribution on the unit circle, [x1 = {g ∗ (1, x0 ) + η1 } [2π]|x0 , β g , ση2 , σg2 ] follows a wrapped normal distribution, derived from [x∗1 = g ∗ (1, x0 )+η1 |x0 , β g , ση2 , σg2 ], which is a normal distribution with mean µg (1, x0 ) = h(1, x0 )0 β g and variance σg2 + ση2 . As already noted in Section 3, [D z |x0 , g ∗ (1, x0 ), β g , σg2 ] is multivariate normal with mean and covariance matrix given by (21) and (22) respectively, and finally the conditional distribution [xt+1 = {g ∗ ((t + 1), xt ) + ηt+1 } [2π]|β g , σg2 , D z , xt , ση2 ] is again a wrapped normal distribution derived from [x∗t+1 = g ∗ ((t + 1), xt ) + ηt+1 |β g , σg2 , D z , xt , ση2 ], which is a normal distribution with mean µxt given by (23) and variance σx2t = ση2 + σg2 1 − (sg,Dz (t, xt−1 ))0 A−1 g,Dz sg,Dz (t, xt−1 ) .

(26)

For explicit derivations of the conditional distributions associated with (25) it is necessary to bring in some more auxiliary variables. To be specific, note that x∗t = xt + 2πKt , where Kt = hx∗t /2πi, where, for any u, hui denotes the greatest integer not exceeding u. Note that

16

for each t, Kt can take values in the set {· · · , −2, −1, 0, 1, 2, · · · }. Here we view the wrapped number Kt as a random variable; see also Ravindran and Ghosh (2011). Note that x1 given (g ∗ (1, x0 ), β g , ση2 , σg2 , K1 ) has the following distribution:

[x1 |g ∗ (1, x0 ), β g , ση2 , σg2 , K1 ] =

√ 1 2πση

exp − 2σ12 (x1 + 2πK1 − g ∗ (1, x0 ))2 I[0,2π] (x1 ) η 2π(K1 +1)−g ∗ (1,x0 ) 2πK1 −g ∗ (1,x0 ) Φ −Φ ση ση

(27)

and the distribution of K1 given (g ∗ (1, x0 ), ση2 ) is

[K1 |g

∗

(1, x0 ), ση2 , σg2 ]

=Φ

2π(K1 + 1) − g ∗ (1, x0 ) ση

−Φ

2πK1 − g ∗ (1, x0 ) ση

,

(28)

where Φ(·) is the cumulative distribution function of standard normal distribution. Similarly, for t = 2, . . . , T + 1, the distributions of xt given (β g , ση2 , σg2 , D z , xt−1 , Kt ) and Kt given (β g , ση2 , σg2 , D z , xt−1 ), respectively, are

[xt |β g , ση2 , σg2 , D z , xt−1 , Kt ] =

√ 1 2πσxt

− 2σ12 (xt xt

2

+ 2πKt − µxt ) I[0,2π] (xt ) exp 2π(Kt +1)−µxt 2πKt −µxt Φ − Φ σx σx t

[Kt |β g , ση2 , σg2 , D z , xt−1 ]

=Φ

2π(Kt + 1) − µxt σxt

(29)

t

−Φ

2πKt − µxt σxt

,

(30)

where µxt and σxt are given by (23) and (26), respectively. Thus, using the conditionals (27), (28), (29) and (30), the joint distribution of the latent circular variables, conditional on β g , ση2 and σg2 can be represented as [x0 , x1 , . . . , xT +1 |β g , ση2 , σg2 ]

17

=

X

Z

[x0 , x1 , . . . , xT +1 , D z , K1 , . . . , KT +1 |β g , ση2 , σg2 ]dD z

K1 ,...,Kt+1

=

X

Z

[x0 ][D z |x0 , g ∗ (1, x0 ), β g , σg2 ][g ∗ (1, x0 )|x0 , β g , σg2 ]

K1 ,...,Kt+1

× [x1 |x0 , g ∗ (1, x0 ), K1 , ση2 , σg2 ][K1 |x0 , g ∗ (1, x0 ), ση2 , σg2 ] T Y × [xt+1 |β g , σg2 , D z , xt , ση2 , Kt+1 ][Kt+1 |β g , ση2 , σg2 , D z , xt ]dg ∗ (1, x0 )dD z .

(31)

t=1

3.4

Advantages of the look-up table approach

Ghosh et al. (2014) provide ample details on the accuracy of the look-up table approach. In particular, they prove a theorem on the accuracy of the approximation of the distribution of the latent states using the look-up table, show that the joint distribution of the latent states is non-Markovian, even though conditionally on D z , the latent states have a Markov structure. Quite importantly, Ghosh et al. (2014) point out that this approach leads to great computational savings and remarkable numerical stability of the associated MCMC algorithm thanks to the fact that Ag,Dz needs to be inverted only once, even before beginning the MCMC simulations, and that the set of grid points Gz can be chosen so that Ag,Dz is invertible. These advantages clearly remain valid even in our circular set-up.

18

4

Simulation study

4.1

True model

We now illustrate the performance of our model and methodologies using a simulation study. For this purpose we simulate a set of observations of size 101 from the following nonlinear dynamic model:

yt = tan2 (θt )/20 + vt ; tan

θt − π 2

= α tan

θt−1 − π 2

β tan

+ 1 + tan

2

θt−1 −π 2

θt−1 −π 2

+ γ cos(1.2(t − 1)) + ut ,

for t = 1, . . . , 101, where ut and vt are normally distributed with means zero and variances ση2 and σ2 . We set the values of α, β and γ to be 0.05, 0.1 and 0.2, respectively; we fix the values of both ση and σ at 0.1. We consider the first 100 observations of yt as known, and set aside the last observation for the purpose of forecasting.

4.2

Choices of prior parameters and the grid Gz

In this experiment we consider a four-variate normal prior distribution for β f with mean (0, 0, 0, 0)0 and the identity matrix as the covariance matrix. For β g we choose a four-variate normal prior with mean vector (2.5, 0.04, 1.0, 1.0)0 . The choice of the covariance matrix for β g is discussed in the next paragraph. Choices of these prior parameters ensured adequate mixing of our MCMC algorithm. Observe that in (2) of our proposed model, in the mean function of the underlying Gaussian process g, the third and the fourth components of β g are multiplied by cos(xt−1 ) and

19

sin(xt−1 ), respectively, so that an identifiability problem crops up. To counter this problem, we set the third and the fourth components of β g to be 1, throughout the experiment. Therefore, the covariance matrix for β g is chosen to be a diagonal matrix with the entries (1.0, 1.0, 0.0, 0.0)0 . For σ and σf we consider inverse gamma priors with (shape, scale) parameters (4.01, 0.005× 5.01) and (4.01, 0.1 × 5.01), respectively, so that the mode of σ is 0.005 and that of σf is 0.1. We choose the first parameter of the inverse gamma distribution to be equal to 4.01 so that the variance is 200 times the square of the mean of the inverse gamma distribution, which are in this case 0.012 and 0.25, respectively. The choices of second prior parameters for σ and σf yielded adequate mixing of our MCMC algorithm. Finally we divide the interval [0, 2π] into 100 sub-intervals and choose one point from each of the sub-intervals; these values constitute the second component of the two dimensional grid Gz . For the first component of Gz , we select a random number uniformly from each of i h 2πi 2π(i+1) , 100 , i = 0, . . . , 99. the 100 subintervals 100

4.3

Brief discussion related to impropriety of the posteriors of some unknowns and the remedy

An interesting feature associated with our model is the impropriety of the posteriors of σg , ση and K1 , . . . , KT +1 , when they are all allowed to be random. In a nutshell, for any value of Kt , exactly the same value of the circular variable xt is obtained by the mod 2π operation applied to x∗t = xt + 2πKt . Thus, given xt , it is not possible to constrain Kt unless both σg and ση are bounded. Boundedness of σg and ση would ensure that x∗t has finite variance, which would imply finite variability of Kt . Since it is unclear how to select a bounded prior for σg and ση , we obtain the maximum 20

likelihood estimates (MLEs) of these variances and plug in these estimates in our model. To obtain the MLEs, we implemented the simulated annealing methodology (see, for example, Robert and Casella (2004), Liu (2001)) where at each iteration we proposed new values of these variances, then integrated out all the other parameters using averages of Monte Carlo simulations, given the proposed values of σg and ση , so that we obtain the integrated likelihood given the proposed variances; then we calculated the acceptance ratio, and finally decreased the temperature parameter of our simulated annealing algorithm before proceeding to the next iteration. The MLEs turned out to be σ ˆg = 0.1258 and σ ˆη = 0.1348.

4.4

MCMC details

As detailed in Section S-2 of the supplement, our MCMC algorithm updates some parameters using Gibbs steps, and the remaining using random walk Metropolis-Hastings steps. To update σ and σf we implemented normal random walk with variance 0.05; x0 is updated using von-Mises distribution with κ = 3.0, and for updating xt a mixture of two von-Mises distributions with κ = 0.5 and κ = 3.0 is used for t = 1, . . . , T . The wrapping variables Kt ; t = 1, . . . , T , are updated using the discrete normal random walk with variance 1.0. All these choices are made very painstakingly after carefully adjudging the mixing properties of many pilot MCMC runs. The rest of the parameters are updated using Gibbs steps, as detailed in Section S-2 of the supplement. With the above choices of the prior parameters and Gz , and with the above MCMC updating procedure of the parameters, we performed 2, 10, 000 MCMC simulations with a burn-in period consisting of the first 1, 50, 000 iterations. The time taken to run 2, 10, 000 MCMC simulations in a desktop computer with i7 processors is 20 hours and 34 minutes.

21

4.5

Results of our simulation study

The posterior densities of the components of β f are provided in Figure 1. Figure 2 displays the posterior densities of the first two components of β g , and the posterior density of σf . Figure 3 depicts the posterior density of the σ and x101 . The horizontal bold black lines denote the 95% highest posterior density credible intervals and the vertical lines denote the true values. Observe that the true values in each of the cases fall well within the intervals. As already mentioned, it is seen that the densities of most of xt , t = 1, . . . , T, has multiple modes, so that a plot of the posterior probability distribution of the latent process for each time point, rather than ordinary credible regions, is appropriate. Such a plot for the latent time series x1 , . . . , xT is displayed in Figure 4, where regions with progressively higher densities are represented by more progressively intense colors. Quite encouragingly, most of the true values are seen to lie in the high probability regions. Figure 5 depicts the posterior predictive density corresponding to y101 ; the true value is well within the 95% highest posterior density credible interval of the predictive density. A trace plot of y101 for last 60,000 thousand iteration is also provided in Figure 5 as a sample trace plot to show the convergence of our MCMC iterations. Thus, our model performs quite encouragingly, in spite of the true model being highly non-linear and assumed to be unknown. As a result, we expect our model to perform adequately in general situations.

22

posterior density of beta_f_3

posterior density of beta_f_4 3.0

posterior density of beta_f_2

2.0

density

1.0

2.0

density

1.0

density

−0.6

−0.2 0.0

0.2

0.4

0.6

−0.005

0.000

βf1

0.005

0.010

0.0

0.0

0

0

50

100

2 1

density

3

200

4

3.0

posterior density of beta_f_1

−0.6

−0.2 0.0

βf2

0.2

0.4

0.6

−0.6

−0.2 0.0

βf3

0.2

0.4

0.6

βf4

Figure 1: Posterior densities of the four components of β f .

posterior density of beta_g_2

−0.6

−0.4

−0.2 0.0 βg1

0.2

0.4

12 10 6

density

0

0

0

2

1

50

4

100

density

8

150

4 3 2

density

posterior density of sigma_f

200

posterior density of beta_g_1

0.010

0.015 0.020 βg2

0.025

0.10

0.20

0.30

0.40

σf

Figure 2: Posterior densities of the first and second components of β g and the posterior density of σf .

posterior density of x_last

0.15 0.05

0.10

density

10

0.00

5 0

density

15

0.20

posterior density of sigma_e

0.00

0.10

0.20

0.30

0

σe

1

2

3

4

5

x101

Figure 3: Posterior densities of σ and the x101 .

23

6

7

Posterior densities of latent process

time

Figure 4: Depiction of the posterior densities of the latent circular process {xt ; t = 1, . . . , T }; higher the intensity of the color, higher is the posterior density. The black line denotes the true time series.

trace plot of y_last

−0.5

0.0

y_last

1.5 1.0 0.0

0.5

density

2.0

0.5

2.5

posterior density of y_last

−1.0

−0.5

0.0

0.5

1.0

0

y101

10000

30000

50000

iteration numbers

Figure 5: From left: The first plot is that of the posterior predictive density corresponding to y101 , where the vertical line denotes the true value and the bold black horizontal line denotes the 95% highest posterior density credible interval. The second plot is trace plot of y101 for the last 60,000 observations. 24

5 5.1 5.1.1

Real data analysis Wind speed data Brief description of the data

In Section 4 we have demonstrated the effectiveness of our ideas with a simulation study. Now we validate our model and methodologies on a real data using historic wind speed and wind direction data recorded by the National Oceanic and Atmospheric Administration’s National Data Buoy Center (http://www.ndbc.noaa.gov/historical data.shtml); the website has been very kindly brought to our notice by the Associate Editor. Similar to Marzio et al. (2012a), standard meteorological data obtained at the year 2009, monitored at station 41010 (120NM East of Cape Canaveral), which is automatically recorded every 30 minutes (at 20 and 50 past each hour), are considered for our analysis. Here we collect the wind speed and wind direction data for 101 times points starting from 1st January, 2009. Wind directions originally recorded in degrees are converted to radians, ranging from 0 to 2π. Wind speed data are observed in meter per second. It is important to mention that for our analysis we use 100 wind speed data points assuming that the wind direction data have not been observed. The main purpose is here to demonstrate that our method is well-equipped to capture the recorded, real, wind direction data, considered to be latent with respect to our model. A plot of true wind direction data is given in Figure 6 along with the plot of observed wind speed data.

5.1.2

Prior choices and MCMC implementations

We chose the prior parameters so as to obtain reasonable prediction of the future observation (set aside as y101 ), and to obtain adequate mixing of our MCMC algorithm. As such we 25

specify the prior means of β f and β g to be (0, 0, 0, 0)0 and (1, 1, 1, 1)0 , respectively. The prior covariance matrix for β f has been chosen to be an identity matrix of order 4 × 4 and for β g it has been taken to be a diagonal matrix of order 4 × 4, with diagonal elements 0.01, 0.01, 0.0 and 0.0, respectively. Following the discussion in Section 4 we fixed the third and fourth components of β g at 1 throughout the experiment to avoid identifiability issues. The shape parameters for σe and σf in the respective inverse gamma prior distributions are chosen to be 4.01 and the scale parameters are chosen to be 0.01 × 5.01 and 0.001 × 5.01, respectively, so that the prior modes for σe and σf are 0.01 and 0.001, respectively. The choice of the first parameter of inverse gamma is justified in Section 4. The MLEs of ση and σg , obtained using simulated annealing method, are 0.1455 and 0.1258. With all these prior choices our MCMC algorithm as detailed in section S-2 of the supplement has been used. As mentioned in Section 4.4 here also we use the same mixture of von-Mises to update xt , t = 1, . . . , 100. We implemented 2,50,000 MCMC iterations, where the last 1,00,000 iterations have been taken for the analysis after discarding a burn in of period 1,50,000. The time taken to implement 2,50,000 iterations on our i7 machine is 19 hours 58 minutes.

5.1.3

Results

Figures 7 and 8 provide the posterior densities of four components of the vector β f and β g , respectively. The posterior densities of σe and σf are shown in Figure 9. The posterior predictive density of y101 is provided in Figure 11. It is seen that the true value falls well within the 95% highest posterior density credible interval, which shows how well our model and the prior distributions of the parameters succeed in describing the uncertainty present in the data. (see, for instance, Box and Tiao (1973) and Bickel and Doksum (2007)). A

26

trace plot for y101 is also displayed in Figure 11 as a sample demonstration of MCMC convergence. We depict the marginal posterior densities of the latent variables in Figure 10, where progressively higher intensities of the color denote regions of progressively higher posterior densities. It can be seen that the true values of the latent variable, that is, the true values of wind direction (in radians) fall mostly in the corresponding high probability regions. Indeed, it is really encouraging to observe that our model and methods successfully capture the highly non-linear trend, even with a sharp discontinuity at around t = 10, denoting a change point, present in the original wind direction data. This has been possible because of our nonparametric ideas and also because our model allows the unknown observational and the evolutionary function based on Gaussian processes to change with time. To sum up, it can be inferred that our model and methodologies not only capture the true wind directions in the respective high posterior probability regions, but ensure that the posterior probabilities concentrate on relatively small regions, which, in turn, allows us to identify the trend present in the actual process with much precision.

5.2 5.2.1

Ozone level data A brief description of the data set

We now apply our model and methodologies to a real data set obtained from the website http://www.esrl.noaa.gov/gmd/grad/neubrew/OmiDataTimeSeries.jsp. The data concerns the ozone level present in the atmosphere at a particular location and at a particular year. For our analysis we select a location with latitude 40.125 and longitude 105.238, which corresponds to Boulder, Colorado. We collected 101 observations starting from May 15, 2013. The plot of the ozone level data is provided in Figure 12. Although it is expected that the ozone level present in the atmosphere depends upon the direction of wind flow (see 27

wind speed plot for 101 time points

0

2

1

4

6

8

wind speed

4 3 2

wind direction

5

10

6

wind direction plot for 101 time points

0

20

40

60

80

100

0

20

40

time points

60

80

100

time points

Figure 6: Plot of the wind direction and wind speed data

6

7

8

9

10

−0.06

βf1

−0.02

0.02 0.04

0.8 0.6 0.0

0.2

0.4

density

0.6 0.4 0.0

5 0

0.0

5

posterior density of beta_f_4

0.2

10

density

density

20

0.6 0.4 0.2

density

posterior density of beta_f_3 0.8

posterior density of beta_f_2 30

posterior density of beta_f_1

0

1

βf2

2

3

4

−4

βf3

−3

−2

−1

βf4

Figure 7: Posterior densities of the four components of β f for the wind speed data. posterior density of beta_g_2

50

100

density

2

0

0

1

density

3

150

posterior density of beta_g_1

−1.0

−0.8

−0.6 βg1

−0.4

−0.2

0.010

0.015

0.020 βg2

0.025

0.030

Figure 8: Posterior densities of the first two components of β g for the wind speed data. 28

posterior density of sigma_e

5

10

density

4 3

0

0

1

2

density

5

6

15

7

posterior density of sigma_f

0.1

0.2

0.3

0.4

0.5

0.6

0.0

0.1

0.2

0.3

0.4

σe

σf

Figure 9: Posterior densities of σf and σe for the wind speed data.

Posterior densities of wind direction

time points

Figure 10: Representation of the marginal posterior densities of the latent variables corresponding to wind directions as a color plot; progressively higher densities are represented by progressively intense colors. The black line represents the true wind direction data.

29

trace plot of y_last

0.0

3

4

5

y_last

6

0.8 0.4

density

7

8

1.2

posterior density of y_last

3

4

5

6

7

8

0e+00

y101

4e+04

8e+04

iteration numbers

Figure 11: From left: The first panel displays the posterior predictive density of the 101-th observation for the wind speed data. The thick horizontal line denotes the 95% highest posterior density credible interval and the vertical line denotes the true value. The second panel depicts the trace plot of y101 for the last 1,00,000 iterations. Jammalamadaka and Lund (2006)), the data on the direction of wind flow is not available at that particular location and time. Therefore, we expect that our general, nonparametric model and the associated methods will be quite useful in this situation. We retain 100 observations for our analysis and keep aside the last observation for the purpose of prediction. Before applying our model and methods, we first de-trend the data-set. Plot of detrended ozone data is displayed in Figure 12 along with the plot of observed ozone data for 101 days.

5.2.2

Prior choices

We keep the same choices of the prior parameters as done in case of ozone level data. The MLEs of ση and σg , obtained by the simulated annealing method discussed in Section 4.3, are 0.0493 and 0.2269, respectively.

30

5.2.3

MCMC implementation

With these choices of prior parameters we implement our MCMC algorithm detailed in Section S-2 of the supplement with the random walk scales chosen on the basis of informal trial and error method associated with many pilot runs of our MCMC algorithm. As mentioned in 4.4, here also we use the same mixture of von-Mises to update xt , t = 1, . . . , 100. Our final MCMC run is based on 2, 50, 000 iterations of MCMC, of which we discarded the first 2, 00, 000 iterations as the burn-in period. The time taken for 2, 50, 000 iterations of MCMC on our desktop computer with i7 processors, is about 21 hours.

5.2.4

Results of ozone level data

The posterior densities of the four components of β f and the two components of β g are provided in Figures 13 and 14, respectively. The posterior densities of σe and σf are shown in Figure 15. Figure 16 shows the marginal posterior distributions associated with the latent circular process depicted by progressively intense colors, along with the posterior median indicated by black line. Finally, the posterior predictive density corresponding to y101 is provided in Figure 17. Here the thin vertical line denotes the true value of the 101-th observation and the thick line represents the 95% highest density region of the posterior predictive density. As in our previous experiments, here also the true value falls well within the 95% highest posterior density credible interval. Also, as in our previous experiments, trace plot of y101 for the last 50,000 observations, illustrate the convergence of our MCMC iterations.

31

Detrended ozone level plot for 101 days

0

20

40

60

80

−20

0

10

20

detrended ozone level

340 320 300 280

Ozone level

30

40

360

Ozone level plot for 101 days

100

0

20

40

Index

60

80

100

Index

Figure 12: Plot of ozone level for 101 days in Boulder, Colorado.

density

0.3 0.2

−4

−2

0

2

4

−0.10

−0.05

βf1

0.00

0.05

0.10

0.0

0

0.0

0.1

0.1

5

density

0.3

0.4

posterior density of beta_f_4

0.4

15 10

density

0.3 0.2 0.1 0.0

−4

βf2

−2

0

2

4

−4

βf3

−2

0

2

βf4

Figure 13: Posterior densities of the three components of β f for the ozone data. posterior density of beta_g_2

150 50

100

density

2

0

1

density

3

4

200

posterior density of beta_g_1

0

density

posterior density of beta_f_3

0.2

posterior density of beta_f_2

0.4

posterior density of beta_f_1

−0.4

−0.2

0.0

0.2

0.045

βg1

0.050

0.055

βg2

Figure 14: Posterior densities of the first two components of β g for the ozone data. 32

posterior density of sigma_e

0.3

density

0

0.0

0.1

0.2

2 1

density

3

0.4

4

0.5

posterior density of sigma_f

0.0

0.5

1.0

1.5

2.0

2.5

9

10

11

12

13

14

15

16

σe

σf

Figure 15: Posterior densities of σf and σe for the ozone data.

Posterior densities of latent process

days

Figure 16: Depiction of the marginal posterior distributions of the latent variables using progressively intense colors for progressively higher densities, and the median for the latent process of the ozone data.

33

trace plot of y_last 40 20 −40

0

y_last

0.020 0.010 0.000

density

0.030

posterior density of y_101

−60 −40 −20

0

20

40

60

0

y101

10000

30000

50000

iteration numbers

Figure 17: Posterior predictive density of the 101-th observation for the ozone data. The thick horizontal line denotes the 95% highest posterior density credible interval and the vertical line denotes the true value.

6

Discussion and conclusion

In this paper we have proposed a novel nonparametric dynamic state space model where the latent process is in the circular manifold. We assumed that both the observational and the evolutionary functions are time-varying, but have unknown functional forms, which we model nonparametrically via appropriate Gaussian processes. For this purpose we derived a suitable Gaussian processes with both linear and circular arguments using kernel convolution. Previously, some research has been carried out on Gaussian process with circular argument; see, for example, Dufour and Roy (1976), Gneiting (1998). However, most of the previous works considered the circular variable as the only argument. The main issue with the procedure of Dufour and Roy (1976) is that the covariance function turns out to be an infinite sum, and therefore, one has to approximate the infinite sum with proper truncation

34

while applying to data. Hence, the question of error of approximation lurks in their procedure. Gneiting (1998) provided sufficient conditions under which any correlation function on the real line can be treated as a correlation function on circles. For that purpose Gneiting (1998) had to bound the argument of the correlation function on a finite interval, and therefore, the correlation can not tend to zero for the underlying Gaussian process. The kernel convolution method has been used in Shafie et al. (2003), although they derived the Gaussian process on two linear arguments, one in R (real line) and the other in R+ (positive part of the real line). Adler (1981) and Adler and Taylor (2007) dealt with Gaussian processes on manifolds in great details. However, they focused on Gaussian processes with arguments only on single manifold. Here we mention that although we also use the kernel convolution approach to forming appropriate Gaussian processes, our case is substantially different in that our Gaussian process construction is based on both linear and circular arguments. Moreover, we have chosen our kernel appropriately such that the Gaussian process satisfies all desirable smoothness properties. The most elegant property of our Gaussian process is that the covariance function becomes 0 whenever |θ1 − θ2 | = π/2. This implies that whenever two angular observations have orthogonal directions, their correlation turns out to be 0 irrespective of the difference in time. Obviously, we also have shown that as |t1 − t2 | → ∞ then the covariance function tends to 0, that is, as the difference in time goes to ∞, the correlation goes to 0. The main aim of our research is to predict single or multiple future observations given the dynamic data at hand. That is, considering the Bayesian paradigm, our main objective is to obtain posterior predictive distributions. To achieve the posterior predictive distributions, appropriate MCMC simulation techniques needed to be devised. The main MCMC challenge for this model is to simulate the complete latent process; aided by the look up table concept

35

of Bhattacharya (2007) (see also Ghosh et al. (2014)), appropriately adapted to suit the circular context, we could create an MCMC algorithm that has demonstrated very reasonable performances in both simulated and real data situations. Our model and methods are applied to a simulated data where the data is generated from a highly nonlinear model, which is completely different from our own model. This simulation is done purposefully to demonstrate that our method is applicable to any nonlinear dynamic model where the latent process is in the circular manifold. It is also successfully shown that the future observation is well within the 95% credible region of the posterior predictive density. Quite importantly, almost the complete set of true latent variables fell well within their respective high marginal posterior probability density regions. The encouraging results are expected to provide any practitioner with some degree of latitude in applying our model in any practical context. To demonstrate the effectiveness of our model and methodologies in capturing the underlying latent circular process in real data scenarios, we implemented our model on wind speed and direction data of a particular location for a particular period of time. In this experiment we took 100 observations on wind speed data for implementing our method. We pretended that the data on wind direction were unknown. Quite importantly it is noticed that the high probability region of the posterior densities associated with the latent process covered most of the observed wind direction values, and the underlying highly non-linear and discontinuous trend associated with the wind directions has been quite precisely captured. Finally, we applied our model to the level of ozone present in the atmosphere for a particular location over a period of time consisting of hundred observations, where wind direction data, expected to be associated with the ozone data, are not recorded. Even in this real example, our ideas yielded quite encouraging results. In particular, our posterior

36

predictive density for the set-aside “future” observation successfully captured the true, setaside value within the 95% highest posterior density credible interval. These two real data analyses ensure that our model and methodologies are equally good in predicting the future observations of the observed data and in capturing the underlying latent circular process generating the linear observed data. In fine, we remark that in this paper we assumed the observations yt , t = 1, . . . , T , to be in R. However, it is straightforward to extend our theory to Rp by suitably adjusting the kernel convolution technique. The technique can be extended even to cases where the latent xt , t = 1, . . . , T , are also multidimensional. To keep the size of the paper reasonable we skip the multivariate part for this paper.

Acknowledgment The authors are thankful to Moumita Das for very useful discussions. The authors are also thankful to the anonymous referees and an Associate Editor for their valuable comments and suggestions which helped improve our paper significantly.

A A.1

Appendix Gaussian process on linear and angular component and its properties

To define a Gaussian process on linear and angular component we use the well known kernel convolution method. Let k be any d-dimensional kernel such that

37

Z

k 2 (t) dt < ∞.

Here we choose two kernels as follows (in case d = 1)

k1 (t) =

1 −1/4 − 2ψ1 2 t2 , π e ψ

where ψ > 0, and k2 (t) = π −1/2 cos(t) I(0 ≤ t ≤ π), a trigonometric kernel. Based on above two choices of the kernel we propose a new Gaussian process for time and angle as arguments as follows. Z

∞

1 2 −1 −1/4 − 2ψ2 (y−t)

ψ π

X(t, θ) = µ(t, θ) +

e

Z

−1 −3/4

π

dW (y)

−∞

π −1/2

cos(u − θ) dW (u)

0

Z

∞

Z

= µ(t, θ) + ψ π

π

e −∞

−

1 (y−t)2 2ψ 2

cos(u − θ) dW (u) dW (y),

0

where µ(t, θ) is the mean of the process which may depend on time t and angle θ (as in our case mean is assumed to be of the form h(·, ·)0 β, with h(t, θ)0 = (1, t, cos(θ), sin(θ))); W (·) is the one dimensional standard Wiener process. Next, we determine the structure of the covariance of our Gaussian process thus constructed.

38

A.2

Covariance structure of our Gaussian process

With these separable kernels we calculate the covariance function of X(t1 , θ1 ) and X(t2 , θ2 ) for fixed (t1 , θ1 ) and (t2 , θ2 ) as −2 −6/4

cov(X(t1 , θ1 ), X(t2 , θ2 )) = ψ π

Z

∞

Z

e

E −∞

Z

∞

π

Z

e −∞ −2 −6/4

π

−

−

1 (y−t1 )2 2ψ 2

cos(u − θ1 ) dW (u) dW (y)

0

1 (y−t2 )2 2ψ 2

cos(u − θ2 ) dW (u) dW (y)

0

Z

∞

−

e

=ψ π

1 2ψ 2

{(y−t1 )2 +(y−t2 )2 } dy

−∞

Z

π

cos(u − θ1 ) cos(u − θ2 ) du 0

Z 1 −2 −6/4 − 2ψ1 2 (t21 +t22 ) ∞ − 2ψ1 2 2(y2 −y(t1 +t2 )) e = ψ π e dy 2 −∞ Z π [cos(−(θ1 − θ2 )) + cos(2u − (θ1 + θ2 ))] du 0

Z ∞ t +t 1 −2 −6/4 − 2ψ1 2 (t21 +t22 )+ 4ψ1 2 (t1 +t2 )2 − 12 (y− 1 2 2 )2 e ψ = ψ π e dy 2 −∞ Z π cos(2u − (θ1 + θ2 )) du π cos(|θ1 − θ2 |) + 0

√ − 1 (t −t )2 1 = ψ −2 π −6/4 ψ πe 4ψ2 1 2 π cos(|θ1 − θ2 |) 2 1 − 1 |t −t |2 = ψ −1 e 4ψ2 1 2 cos(|θ1 − θ2 |) 2 = σ 2 exp{−σ 4 |t1 − t2 |2 } cos(|θ1 − θ2 |),

where σ 2 =

ψ −1 . 2

Here it is important to remind the reader that in this paper our motive of introducing the kernels k1 and k2 is entirely different from the other existing works involving circular and spherical data, where the goals are density estimation, nonparametric regression and

39

smoothing (see, for example, Hall et al. (1987), Marzio et al. (2009), Marzio and Taylor (2009), Marzio et al. (2011), Marzio et al. (2012b), Marzio et al. (2012a), Marzio et al. (2014)). Hence, our kernels need not satisfy the optimality properties required for the aforementioned works. Indeed, here our goal is to construct an appropriate Gaussian process model for random functions having both time and angle as arguments. The Gaussian process is required to possess desired properties, such as stationarity in time and angle, zero correlation when the directions are orthogonal and/or when the time difference tends to infinity, along with desired continuity and smoothness properties. Moreover, quite importantly, a closed form of the covariance of the Gaussian process is also required, which, as we discuss in Section 6 of our paper, is difficult to obtain in general. With our kernels k1 and k2 , all these properties have been achieved, and in this sense, they are optimal.

Supplementary Material Throughout, we refer to our main paper Mazumder and Bhattacharya (2014a) as MB.

S-1

Smoothness properties of our Gaussian process with linear-circular arguments

Here we assume that µ(t, θ) is twice differentiable with respect to t and θ, and that the derivatives are bounded. Formally, we assume that

∂ 2 µ(t,θ) ∂ 2 µ(t,θ) ∂ 2 µ(t,θ) , ∂θ2 , ∂t∂θ ∂t2

(=

∂ 2 µ(t,θ) ) ∂θ∂t

exist

and are bounded. We denote the covariance function σ 2 exp{−σ 4 |t1 − t2 |2 } cos(|θ1 − θ2 |) (where σ 2 =

ψ −1 ) 2

by K(|t1 − t2 |, |θ1 − θ2 |).

40

S-1.1

Mean square continuity:

1. With respect to time t

E[X(t + h, θ) − X(t, θ)]2 =E[X(t + h, θ)]2 + E[X(t, θ)]2 − 2E[X(t + h, θ)X(t, θ)] =K(0, 0) + K(0, 0) − 2K(h, 0) =2(K(0, 0) − K(h, 0))

Now as h → 0, E[X(t + h, θ) − X(t, θ)]2 → 0 because of the fact that K(h, 0) is continuous in h. 2. With respect to angle θ:

E[X(t, θ + α) − X(t, θ)]2 =E[X(t, θ + α)]2 + E[X(t, θ)]2 − 2E[X(t, θ + α)X(t, θ)] =K(0, 0) + K(0, 0) − 2K(0, α) =2(K(0, 0) − K(0, α))

Now as α → 0, E[X(t, θ + α) − X(t, θ)]2 → 0 because of the fact that K(0, α) is continuous in α. 3. With respect to time t and angle θ:

E[X(t + h, θ + α) − X(t, θ)]2 41

=E[X(t + h, θ + α)]2 + E[X(t, θ)]2 − 2E[X(t + h, θ + α)X(t, θ)] =K(0, 0) + K(0, 0) − 2K(h, α) =2(K(0, 0) − K(h, α))

Now as (h, α) → (0,0) then E[X(t + h, θ + α) − X(t, θ)]2 → 0 because of the fact that K(h, α) is continuous in h and α.

S-1.2

Mean square differentiability

A process X(u), u ∈ Rd , is said to be Mean Square Differentiable at u0 if for any direction p there exists a process Lu0 (p), linear in p, such that

X(u0 + p) = X(u0 ) + Lu0 (p) + R(u0 , p), where p ∈ Rd , and R(u0 , p) satisfies the following R(u0 , p) → 0, in L2 , ||p|| with || · || being the usual Euclidean norm (for details see Banerjee and Gelfand (2003)). However, we have t ∈ R+ and θ ∈ [0, 2π], so we can not directly apply the definition of mean square differentiability that is appropriate for Rd . For our purpose we define a new metric on time and angular space as

d(t1 , t2 , θ1 , θ2 ) = |t1 − t2 | + |θ1 − θ2 |,

42

(recall that we have used the angular distance as a metric on the angular space to represent the covariance as a function of distance in time and angle). Note that d(·, ·, ·, ·) satisfies all the three criteria for being a metric, that is,

1. d(t1 , t2 , θ1 , θ2 ) ≥ 0 2. d(t1 , t2 , θ1 , θ2 ) = 0 iff t1 = t2 , θ1 = θ2 3. d(t1 , t3 , θ1 , θ3 ) ≤ [|t1 − t2 | + |θ1 − θ2 |] + [|t2 − t3 | + |θ1 − θ2 |] = d(t1 , t2 , θ1 , θ2 ) + d(t2 , t3 , θ2 , θ3 )

With the help of this new metric in time and angular space we define Mean Square Differentiability in time and circular domain as Definition 1 A process X(t, θ) is said to be Mean Square Differentiable in L2 sense at (t0 , θ0 ) if for any direction (h, α) there exists a process Lt0 , θ0 (h, α), linear in h, α, such that

X(t0 + h, θ0 + α) = X(t0 , θ0 ) + Lt0 , θ0 (h, α) + R(t0 , θ0 , h, α),

where R(t0 , θ0 , h, α) satisfies the following condition R(t0 , θ0 , h, α) → 0, in L2 as d(h, 0, α, 0) → 0. d(h, 0, α, 0) In our case, since our covariance function K(|t1 − t2 |, |θ1 − θ2 |) has partial derivatives of all orders, the partial derivative processes of all orders exist with covariance structures given by partial derivatives of our covariance function; see Section 2.2 of Adler (1981) for details. In fact, the partial derivative processes are all Gaussian processes, and hence, they 43

are bounded in L2 . Hence, we can apply Taylor series expansion to obtain a linear function Lu0 (p). The following calculation will make the things clear. Following the multivariate Taylor series expansion (using our new metric) we have ∂ ∂ X(t0 + h, θ0 + α) = X(t0 , θ0 ) + h X(t, θ) + α X(t, θ) + R(t0 , θ0 , h, α), ∂t ∂θ t=t0 ,θ=θ0 t=t0 ,θ=θ0 2 2 ∂ X(t,θ) ∂ 2 X(t,θ) where |R(t0 , θ0 , h, α)| ≤ M d (h, 0, α, 0), with M = max ∂ X(t,θ) , , ∂t∂θ ∂θ∂t , ∂t2 2 ∂ X(t,θ) ∂θ2 (using the analogy with multivariate Taylor series expansion in Rd , recall that in ∗ 2

∗

the case of Rd , R(u0 , p) ≤ M ∗ ||p||2 ). Since each of the partial derivative processes is bounded in L2 , it is obvious that M ∗ is also bounded in L2 . Mean square differentiability of our kernel convolved Gaussian process thus follows.

S-2

MCMC-based inference

In our MCMC-based inference we include the problem of forecasting yT +1 , given the observed data set D T . The posterior predictive distribution of yT +1 given D T is given by Z [yT +1 |D T ] =

[yT +1 |D T , x0 , . . . , xT +1 , β f , β g , σ2 , ση2 , σf2 , σg2 ] × [x0 , . . . , xT +1 , β f , β g , σ2 , ση2 , σg2 , σf2 |D T ] dβ f dβ g dσ2 dση2 dσg2 dσf2 dx0 . . . dxT +1 .

Thus, once we have a sample realization from the joint posterior 44

(32)

[x0 , . . . , xT +1 , β f , β g , σ2 , ση2 , σg2 , σf2 |D T ], we can generate a realization from [yT +1 |D T ] by simply simulating from [yT +1 |D T , x0 , . . . , xT +1 , β f , β g , σ2 , ση2 , σf2 , σg2 ], conditional on the realization obtained from the former joint posterior. Observe that the conditional distribution [yT +1 = f (T + 1, xT +1 ) + T +1 |D T , x0 , . . . , xT +1 , β f , σ2 , σf2 ] is normal with mean µyT +1 = h(T + 1, xT +1 )0 β f + sf,DT (T + 1, xT +1 )0 A−1 f,DT (D T − H DT β f )

(33)

and variance

σy2T +1 = σ2 + σf2 1 − (sf,DT (T + 1, xT +1 ))0 A−1 f,DT sf,DT (T + 1, xT +1 ) .

(34)

Using the auxiliary variables K1 , . . . , KT +1 , the posterior distribution of the latent circular variables and the other parameters can be represented as

[x0 , x1 , . . . , xT +1 , β f , β g , σ2 , ση2 , σg2 , σf2 |D T ] X Z = [x0 , x1 , . . . , xT , xT +1 , β f , β g , σ2 , ση2 , σg2 , σf2 , g ∗ (1, x0 ), D z , K1 , . . . , KT , KT +1 |D T ] K1 ,...,KT +1

∝

× dg ∗ (1, x0 )dD z X Z [x0 , x1 , . . . , xT +1 , β f , β g , σ2 , ση2 , σg2 , σf2 , g ∗ (1, x0 ), D z , K1 , . . . , KT , KT +1 , D T ] K1 ,...,KT +1

=

× dg ∗ (1, x0 )dD z X Z [β f ][β g ][σ2 ][ση2 ][σg2 ][σf2 ][x0 ][g ∗ (1, x0 )|x0 , β g , σg2 ][D z |g ∗ (1, x0 ), x0 , β g , σg2 ] K1 ,...,KT +1

[x1 |g ∗ (1, x0 ), ση2 , K1 ][K1 |g ∗ (1, x0 ), ση2 ][D T |x1 , . . . , xT , β f , σ2 , σf2 ]

45

T +1 Y

T +1 Y

t=2

t=2

[xt |β g , ση2 , σg2 , D z , xt−1 , Kt ]

[Kt |β g , ση2 , σg2 , D z , xt−1 ] dg ∗ (1, x0 ) dD z .

(35)

In order to obtain MCMC samples from [x0 , x1 , . . . , xT +1 , β f , β g , σ2 , ση2 , σg2 , σf2 |D T ], we first carry out MCMC simulations from the joint posterior which is proportional to integrand (35). Ignoring g ∗ (1, x0 ), Dz and K1 , . . . , KT +1 in these MCMC simulations and storing the realizations associated with the remaining parameters yield the desired samples.

S-2.1

Full conditional distributions

Here we provide the full conditional distributions of the unknowns. In what follows, we shall express [g ∗ (1, x0 )|x0 , β g , σg2 ][D z |g ∗ (1, x0 ), x0 , β g , σg2 ] as [D z , g ∗ (1, x0 )|x0 , β g , σg2 ]. [β f | · · · ] ∝ [β f ][D T |x1 , . . . , xT , β f , σ2 ] [β g | · · · ] ∝ [β g ][D z , g

∗

(36)

(1, x0 )|x0 , β g , σg2 ]

σg2 , D z , xt−1

T +1 Y

T +1 Y

t=2

t=2

[xt |β g , ση2 , σg2 , D z , xt−1 , Kt ]

Kt |β g , ση2

(37)

[σ2 | · · · ] ∝ [σ2 ][D T |x1 , . . . , xT , β f , σ2 ]

(38)

[σf2 | · · · ] ∝ [σf2 ][D T |x1 , . . . , xT , β f , σf2 ]

(39)

[ση2 | · · · ]

∝

[ση2 ][x1 |g ∗ (1, x0 ), ση2 , K1 ][K1 |g ∗ (1, x0 ), ση2 ]

T +1 Y

[xt |β g , ση2 , σg2 , D z , xt−1 , Kt ]

t=2 T +1 Y

[Kt |β g , σg2 , ση2 , D z , xt−1 ]

(40)

t=2

[σg2 | · · · ]

∝

[σg2 ][D z , g ∗ (1, x0 )|x0 , β g , σg2 ]

T +1 Y

T +1 Y

t=2

t=2

[xt |β g , ση2 , σg2 , D z , xt−1 , Kt ]

46

Kt |β g , σg2 ,

ση2 , D z , xt−1

(41)

[x0 | · · · ] ∝ [x0 ][D z , g ∗ (1, x0 )|x0 , β g , σg2 ]

(42)

[g ∗ (1, x0 )| · · · ] ∝ [g ∗ (1, x0 )|x0 , β g , σg2 ][D z |g ∗ (1, x0 ), x0 , β g , σg2 ][x1 |g ∗ (1, x0 ), x0 , ση2 , K1 ] [K1 |g ∗ (1, x0 ), ση2 ] [D z | · · · ] ∝ [D z |g

∗

(1, x0 ), x0 , β g , σg2 ]

(43) T +1 Y

T +1 Y

t=2

t=2

[xt |β g , σg2 , ση2 , D z , xt−1 , Kt ]

Kt |β g , σg2 , ση2 ,

D z , xt−1 ]

(44)

[x1 | · · · ] ∝ [x1 |g ∗ (1, x0 ), ση2 ][D T |x1 , . . . , xT , β f , σ2 ] [x2 |β g , σg2 , ση2 , D z , x1 , K2 ][K2 |β g , σg2 , ση2 , D z , x1 ] [xT +1 | · · · ] ∝ [xT +1 |β g , σg2 , ση2 , D z , xT , KT +1 ]

(45) (46)

[xt+1 | · · · ] ∝ [xt+1 |β g , σg2 , ση2 , D z , xt ][xt+2 |β g , σg2 , ση2 , D z , xt+1 , Kt+2 ] Kt+2 |β g , σg2 , ση2 , D z , xt+1 ] [D T |x1 , . . . , xT , β f , σ2 ], t = 1, . . . , T − 1

(47)

Finally, we write down the full conditional distribution of Kt , for t = 1, . . . , T + 1, as

S-2.1.1

[K1 | · · · ] ∝ [K1 |g ∗ (1, x0 ), ση2 ][x1 |g ∗ (1, x0 ), β g , ση2 , K1 ]

(48)

[Kt | · · · ] ∝ [xt |β g , ση2 , D z , xt−1 , Kt ][Kt |β g , ση2 , D z , xt−1 ], t = 2, . . . , T + 1.

(49)

Updating β f by Gibbs steps

The full conditional of β f is a multivariate normal distribution with mean E[β f | · · · ] = {H 0DT (σf2 Af,DT + σ2 I)−1 H DT + Σβf,0 }−1 × {H 0DT (σf2 Af,DT + σ2 I)−1 D T + Σ−1 βf,0 β f,0 } 47

(50)

and variance

V [β f | · · · ] = {H 0DT (σf2 Af,DT + σ2 I)−1 H DT + Σβf,0 }−1 .

S-2.1.2

(51)

Updating β g

We first explicitly write down the right hand side of (37).

∗

[β g ][D z , g (1, x0 )|x0 , β g ]

T +1 Y

T +1 Y

t=2

t=2

[xt |β g , ση2 , D z , xt−1 , Kt ]

[Kt |β g , ση2 , D z , xt−1 ]

1 0 −1 ∝ exp − (β g − β g,0 ) Σβg ,0 (β g − β g,0 ) 2 1 0 0 0 0 −1 ∗ 0 0 ∗ 0 exp − [(D z , g ) − (H Dz β g , h (1, x0 )) ] ADz ,g∗ (1,x0 ) [(D z , g ) − (H Dz β g , h (1, x0 )) ] 2 ( T +1 ) T +1 X 1 Y 2 exp − (x + 2πK − µ ) I[0,2π] (xt ) (52) t t xt 2 2σ x t t=2 i=2 Observe that the denominator of [xt |β g , ση2 , D z , xt−1 , Kt ] cancels with the density of [Kt |β g , ση2 , D z , xt−1 ] for each t = 2, . . . , T + 1. Also we note that the indicator function does not involve β g for all t = 2, . . . , T + 1. Therefore, after simplifying the exponent terms and ignoring the indicator function we can write 1 0 −1 [β g | · · · ] ∝ exp − (β g − µβg ) Σβg (β g − µβg ) , 2 where

48

(53)

µβg

1 0 −1 0 0 = E[β g | · · · ] = Σ−1 βg ,0 + 2 [H Dz , h(1, x0 )]ADz ,g ∗ (1,x0 ) [H Dz , h(1, x0 )] σg

0 )−1 T X H 0Dz A−1 H 0Dz A−1 g,Dz sg,Dz (t + 1, xt ) − h(t + 1, xt ) g,Dz sg,Dz (t + 1, xt ) − h(t + 1, xt ) + σx2t t=1 1 0 −1 ∗ Σ−1 βg ,0 β g,0 + 2 [H Dz , h(1, x0 )]ADz ,g ∗ (1,x0 ) [D z , g (1, x0 )] σg ) T 0 −1 X s (t + 1, x ) D h(t + 1, x ) − H A xt+1 + 2πKt+1 − sg,Dz (t + 1, xt )0 A−1 g,D t z t z Dz g,Dz g,Dz + 2 σ xt t=1 (54) and

Σβg

1 0 −1 0 0 = V [β g | · · · ] = Σ−1 βg ,0 + 2 [H Dz , h(1, x0 )]ADz ,g ∗ (1,x0 ) [H Dz , h(1, x0 )] σg

0 )−1 T X H 0Dz A−1 H 0Dz A−1 g,Dz sg,Dz (t + 1, xt ) − h(t + 1, xt ) g,Dz sg,Dz (t + 1, xt ) − h(t + 1, xt ) . + σx2t t=1 (55) Hence [β g | · · · ] follows a tri-variate normal distribution with mean and variance µβg and Σβg , respectively, and therefore, we update βg using Gibbs sampling. S-2.1.3

Updating σf2 and σg2

The mathematical form of the full conditional distributions of σf2 and σg2 are not tractable, so we update σf2 and σg2 by random walk Metropolis-Hastings steps.

49

S-2.1.4

Updating σ2

The mathematical form of the full conditional distribution of σ2 is not tractable, so we update σ2 by a random walk Metropolis-Hastings step. S-2.1.5

Updating ση2

For full conditional distribution of ση2 right hand side of (40) simplifies a bit in the sense that the denominator of [xt |β g , ση2 , D z , xt−1 , Kt ] cancels with the density of [Kt |β g , ση2 , D z , xt−1 ] for t = 2, . . . , T + 1, and the denominator of [x1 |g ∗ (1, x0 ), β g , ση2 , K1 ] cancels with the density of [K1 |g ∗ (1, x0 ), β g , ση2 ], which, in turn, gives the following form:

(

T +1 X 1 [ση2 | · · · ] ∝ [ση2 ] exp − (xt + 2πKt − µxt )2 2 2σxt i=2

)

1 ∗ 2 exp − 2 (x1 + 2πK1 − g ) . (56) 2ση

However, the above equation does not have a closed form; hence, for updating ση2 as well, we use random walk Metropolis-Hastings.

S-2.1.6

Updating x0

The full conditional distribution of x0 is not tractable and hence again here we use random walk Metropolis-Hastings for updating x0 . Now note that x0 is a circular random variable, (old)

so to update x0

(new)

to x0

(old)

we use the vonMises distribution with location parameter x0

50

.

S-2.1.7

Updating g ∗ (1, x0 )

Equation (43), after cancelling the denominator of [x1 |g ∗ (1, x0 ), x0 , β g , ση2 , K1 ] with the density of [K1 |g ∗ (1, x0 ), x0 , β g , ση2 ], and ignoring the indicator function on x0 , reduces to 1 ∗ 2 [g (1, x0 )| · · · ] ∝ [g (1, x0 )|x0 , β g ][D z |g (1, x0 ), x0 , β g ] exp − 2 (x1 + 2πK1 − g ) . 2ση ∗

∗

∗

After further simplification the full conditional distribution of g ∗ (1, x0 ) reduces to 1 ∗ 2 [g (1, x0 )| · · · ] ∝ exp − 2 (g − νg ) , 2γg ∗

(57)

where

∗

νg = E[g (1, x0 )| · · · ] =

−1 1 1 0 −1 + (1 + sg,Dz (1, x0 ) Σg,Dz sg,Dz (1, x0 )) ση2 σg2 1 x1 + 2πK1 −1 ∗ 0 0 + 2 (h(1, x0 ) β g + sg,Dz Σg,Dz D z ) ση2 σg

(58)

and

γg2

∗

= V [g (1, x0 )| · · · ] =

1 1 0 −1 + (1 + sg,Dz (1, x0 ) Σg,Dz sg,Dz (1, x0 )) , ση2 σg2

(59)

with D ∗z = D z − H Dz β g + h(1, x0 )0 β g sg,Dz ,

(60)

Σg,Dz = Ag,Dz − sg,Dz (1, x0 )sg,Dz (1, x0 )0 .

(61)

and

51

Hence [g ∗ | · · · ] follows a normal distribution with mean νg and variance γg . Therefore, we update g ∗ using Gibbs sampling.

S-2.1.8

Updating D z

Here also we observe that in the full conditional distribution of D z , the denominator of [xt |β g , ση2 , D z , xt−1 , Kt ] cancels with the density of [Kt |β g , ση2 , Dz , xt−1 ] for each t = 2, . . . , T + 1. After simplification it turns out that the full conditional distribution of D z is an n-variate normal with mean ( E(D z | · · · ) = ( ×

Σ−1 g,Dz + A−1 g,Dz 2 σg

T X sg,Dz (t + 1, xt )s0g,Dz (t + 1, xt ) σx2t t=1

)−1

! A−1 g,Dz

Σ−1 g,Dz µg,Dz + A−1 g,Dz σg2 T X sg,Dz (t + 1, xt ){xt+1 + 2πKt+1 − β 0g (h(1, t + 1, xt ) − H 0Dz A−1 g,Dz sg,Dz (t + 1, xt ))} σx2t t=1

(62) and covariance matrix ( V (D z | · · · ) =

Σ−1 g,Dz + A−1 g,Dz σg2

T X sg,Dz (t + 1, xt )s0g,Dz (t + 1, xt ) σx2t t=1

)−1

! A−1 g,Dz

.

(63)

Therefore, we update D z using Gibbs sampling. S-2.1.9

Updating x1

For the full conditional distribution of x1 we write down the complete expression of (45) as follows: 52

)

[x1 | · · · ] ∝

√ 1 2πση

exp − 2σ12 (x1 + 2πK1 − g ∗ )2 I[0,2π] (x1 ) η 2π(K1 +1)−g ∗ 2πK1 −g ∗ Φ −Φ ση ση

1 exp {− (D T − µyt )0 Σ−1 yt (D T − µyt )} 2 1 1 2 √ exp − 2 (x2 + 2πK2 − µx2 ) , 2σx2 2πσx2

(64)

where µyt and Σyt are given by (10) and (11) of MB. Here we note that the denominator of [x2 |β g , ση2 , D z , x1 , K2 ] cancels with [K2 |β g , ση2 , D z , x1 ]. Also we ignore the indicator term ∗ 2πK1 −g ∗ associated with x2 . We note that the term Φ 2π(K1σ+1)−g − Φ does not involve ση η ∗ 2πK1 −g ∗ x1 . Hence ignoring Φ 2π(K1σ+1)−g − Φ we get ση η

1 1 ∗ 2 [x1 | · · · ] ∝ √ exp − 2 (x1 + 2πK1 − g ) I[0,2π] (x1 ) 2ση 2πση 1 exp {− (D T − µyt )0 Σ−1 yt (D T − µyt )} 2 1 1 2 √ exp − 2 (x2 + 2πK2 − µx2 ) , 2σx2 2πσx2

(65)

However, it is not possible to get a closed form expression of [x1 | · · · ], so we update it by random walk Metropolis-Hastings.

53

Updating xt+1 , t = 1, . . . , T − 1

S-2.1.10

For xt+1 we have the same structure as for x1 , except for some changes in the parameters. To be precise, the full conditional distribution can be explicitly written as

[xt+1 | · · · ] ∝

1 2πσxt+1

1

2

exp − 2σ2 (xt+1 + 2πKt+1 − µxt+1 ) I[0,2π] (xt+1 ) x 2πK −µ 2π(K t+1+1)−µ xt+1 xt+1 t+1 t+1 − Φ Φ σxt+1 σxt+1 ! 1 1 2 √ exp − 2 (xt+2 + 2πKt+2 − µxt+2 ) 2σxt+2 2πσxt+2 √

1 exp {− (D T − µyt )0 Σ−1 yt (D T − µyt )}. 2 We note here that Φ

2π(K

t+1 +1)−µxt+1

−Φ

2πK

t+1 −µxt+1

(66)

does not involve xt+1 because µxt+1 2π(K +1)−µ xt+1 t+1 and σxt+1 depend on xt , not on xt+1 , and hence we can ignore the term Φ − σxt+1 2πK −µ xt+1 t+1 and rewrite (66) as Φ σx σxt+1

σxt+1

t+1

! 1 1 2 [xt+1 | · · · ] ∝ √ exp − 2 (xt+1 + 2πKt+1 − µxt+1 ) I[0,2π] (xt+1 ) 2σxt+1 2πσxt+1 ! 1 1 √ exp − 2 (xt+2 + 2πKt+2 − µxt+2 )2 2σxt+2 2πσxt+2 1 exp {− (D T − µyt )0 Σ−1 yt (D T − µyt )}. 2

(67)

Here also the expression of the full conditional distribution of xt+1 is not tractable. So, we adopt random walk Metropolis-Hastings to update xt+1 , for t = 1, . . . , T .

54

S-2.1.11

Updating xT +1

The full conditional distribution of xT +1 has probability density function of the form (29) of MB with parameters

µxT +1 = h(1, xT )0 β g + sg,Dz (T + 1, xT )0 A−1 g,Dz (D z − H Dz β g )

(68)

σx2T +1 = ση2 + σg2 {1 − sg,Dz (T + 1, xT )0 A−1 g,Dz sg,Dz (T + 1, xT )}.

(69)

and

We note here that given all unknowns except xT +1 , xT +1 +2πKT +1 follows a truncated normal distribution with left side truncation at 2πKT +1 and right side truncation at 2π(KT +1 + 1) (KT +1 is constant in this case). Hence we update xT +1 + 2πKT +1 using Gibbs sampling and then subtract 2πKT +1 from it to update xT +1 . S-2.1.12

Updating Kt , t = 1, . . . , T + 1

The full conditional distribution of K1 reduces to the following form 1 1 ∗ 2 [K1 | · · · ] ∝ √ exp − 2 (x1 + 2πK1 − g ) I{...,−1,0,1,...} (K1 ), 2ση 2πση

(70)

and similarly the full conditional distribution of Kt becomes 1 1 2 [Kt | · · · ] ∝ √ exp − 2 (xt + 2πKt − µxt ) I{...,−1,0,1,...} (Kt ), 2σxt 2πσxt

(71)

for t = 2, . . . , T +1. We update Kt , for t = 1, . . . , K+1, by random walk Metropolis-Hastings.

55

References Adler, R. J. (1981). The Geometry of Random Fields. Wiley, London. Adler, R. J. and Taylor, J. E. (2007). Random Fields and Geometry. Springer, Boston. Banerjee, S. and Gelfand, A. E. (2003). On Smoothness Properties of Spatial Processes. Journal of Multivariate Analysis, 84, 85–100. Bhattacharya, S. (2007). A Simulation Approach to Bayesian Emulation of Complex Dynamic Computer Models. Bayesian Analysis, 2, 783–816. Bickel, J. P. and Doksum, A. K. (2007). Mathematical Statistics. Number 2nd Edition in Voume I. Pearson Prentice Hall. Box, E. P. G. and Tiao, C. G. (1973). Bayesian Inference in Statistical Analysis. Addison Wesley Publishing Co. Dufour, M. J. and Roy, R. (1976). On Spectral Estimation for a Homogeneous Random Process on the Circle. Stochastic Processes and their Applications, 4, 107–120. Durbin, J. and Koopman, S. J. (2001). Time Series Analysis by State Space Methods. Oxford University Press, Oxford. Epp, R. J., Tukey, J. W., and Watson, G. S. (1971). Testing Unit Vectors for Correlation. Journal of Geophysical Research, 76, 8480–8483. Ghosh, A., Mukhopadhyay, S., Roy, S., and Bhattacharya, S. (2014). Bayesian Inference in Nonpaametric Dynamic State Space Models. Statistical Methodology, 21, 35–48.

56

Gneiting, T. (1998). Simple Tests for the Validity of Correlation Function Models on the Circle. Statistics and Probability Letters, 39, 119–122. Hall, P., Watson, G. S., and Cabrera, J. (1987). Kernel Density Estimation with Spherical Data. Biometrika, 74, 751–762. Hassanzadeh, S., Hosseinibalam, F., and Omidvari, M. (2008). Statistical methods and regression analysis of stratospheric ozone and meteorological variables in Isfahan. Physica A, 387, 2317–2327. Holzmann, H., Munk, A., Suster, M., and Zucchini, W. (2006). Hidden Markov Models for Circular and Linear-Circular Time Series. Environmental and Ecological Statistics, 13, 325–347. Jammalamadaka, R. S. and Lund, J. U. (2006). The Effect of Wind Direction on Ozone Levels: A Case Study. Environmental and Ecological Statistics, 13, 287–298. Liu, J. (2001). Monte Carlo Strategies in Scientific Computing. Springer-Verlag, New York. Marzio, M. D. and Taylor, C. C. (2009). Using Small Bias Nonparametric Density Estimators for Confidence Interval Estimation. Journal of Nonparametric Statistics, 21, 229–240. Marzio, M. D., Panzera, A., and Taylor, C. C. (2009). Local Polynomial Regression for Circular Predictors. Statistics and Probability Letters, 79, 2066–2075. Marzio, M. D., Panzera, A., and Taylor, C. C. (2011). Density Estimation on the Torus. Journal of Statistical Planning and Inference, 141, 2156–2173. Marzio, M. D., Panzera, A., and Taylor, C. C. (2012a). Nonparametric Regression for Circular Responses. Scandinavian Journal of Statistics, 40, 238–255. 57

Marzio, M. D., Panzera, A., and Taylor, C. C. (2012b). Nonparametric Smoothing and Prediction for Non-Linear Circular Time Series. Journal of Time Series Analysis, 33, 620–630. Marzio, M. D., Panzera, A., and Taylor, C. C. (2014). Nonparametric Regression for Spherical Data. Journal of the American Statistical Association, 109, 748–763. Mazumder, S. and Bhattacharya, S. (2014a). Bayesian Nonparametric Dynamic State-Space Modeling with Circular Latent States. Submitted. Mazumder, S. and Bhattacharya, S. (2014b). Supplement to “Bayesian Nonparametric Dynamic State-Space Modeling with Circular Latent States”. Submitted. Ravindran, P. and Ghosh, S. (2011). Bayesian Analysis of Circular Data Using Wrapped Distributions. Journal of Statistical Theory and Practice, 4, 1–20. Reinsel, C. G. and Tiao, C. G. (1987). Impact of Chlorofluoromethanes on Stratospheric Ozone: A Statistical Analysis of Ozone Data for Trends. Journal of the American Statistical Association, 82, 20–30. Robert, C. P. and Casella, G. (2004). Monte Carlo Statistical Methods. Springer-Verlag, New York. Shafie, K., Siegmund, D., Sigal, B., and Worsley, K. J. (2003). Rotation Space Random Fields with an Application to fMRI Data. Annals of Statistics, 31, 1732–1771. Shumway, R. H. and Stoffer, D. S. (2011). Time Series Analysis and Its Applications. Springer-Verlag, New York.

58

Smith, R. L. (1989). Extreme Value Analysis of Environmental Time Series: An Application to Trend Detection in Ground-Level Ozone. Statistical Science, 4, 367–377.

59