Jul 23, 2010 - arXiv:1007.4169v1 [stat.ME] 23 Jul 2010. Assessing Characteristic Scales Using Wavelets â . Michael J. Keim. DNV Renewables (USA) Inc...

0 downloads 9 Views 262KB Size

Assessing Characteristic Scales Using Wavelets † Michael J. Keim DNV Renewables (USA) Inc., Seattle, WA, USA

Donald B. Percival University of Washington, Seattle, WA, USA Summary. Characteristic scale is a notion that pervades the geophysical sciences, but it has no widely accepted precise definition. The wavelet transform decomposes a time series into coefficients that are associated with different scales. The variance of these coefficients can be used to decompose the variance of the time series across different scales. A practical definition for characteristic scale can be formulated in terms of peaks in plots of the wavelet variance versus scale. This paper presents basic theory for characteristic scales based upon the discrete wavelet transform, proposes a natural estimator for these scales and provides a large sample theory for this estimator that permits the construction of confidence intervals for a true unknown characteristic scale. Computer experiments are presented that demonstrate the efficacy of the large sample theory for finite sample sizes. Examples of characteristic scale estimation are given for global temperature records, coherent structures in river flows, the Madden–Julian oscillation in an atmospheric time series and transects of one type of Arctic sea ice.

1. Introduction Time series in geophysics and other areas often seem to be describable as a series of ‘states’ or ‘events’ whose durations tend to cluster around a value known as a characteristic scale. Although the notion of characteristic scale is widespread in the physical sciences, it does not have a precise definition independent of summary statistics that have been proposed to extract it from particular time series. Several definitions for characteristic scale are discussed in von Storch and Zwiers (1999) for time series that can be modeled as a stochastic process Xt , t ∈ Z (the set of all integers). One definition involves quantifying the ‘memory’ of the process. Suppose that P[Xt+τ > 0 | Xt > 0] > 0.5 for small lags τ , but P[Xt+τ > 0 | Xt > 0] = 0.5 at large lags. The smallest τ such that the latter relationship holds is one way to define a characteristic scale. Although this definition has some intuitive appeal because it is based on the length of time that a process takes to ‘forget’ its current positive state, von Storch and Zwiers note that it is of limited practical value; for example, it leads to an infinite characteristic scale for a first-order autoregressive (AR(1)) process, one of the most popular models in time series analysis. Under the additional assumption that Xt is a wide-sense stationary process with variance σ 2 , a more useful definition compares Xt to a process Yt consisting of independent and identically distributed random variables, also with variance σ 2 . The sample mean of Y1 , Y2 , . . . , YN has variance σ 2 /N , whereas that for X1 , X2 , . . . , XN can be expressed as σ 2 /N ′ , where N ′ is referred to as the equivalent sample size. The limit of the ratio N/N ′ as N → ∞ defines a quantity τD known as the decorrelation time. As †Address for correspondence: Donald B. Percival, Applied Physics Laboratory, Box 355640, Univerisity of Washington, Seattle, WA, 98195–5640, USA. E-mail: [email protected]

2

Keim and Percival

von Storch and Zwiers argue, τD is a reasonable definition for characteristic scale for some – but not all – time series. The practicality of this measure is partly due to its relationship with the autocorrelation sequence (ACS) ρk for Xt , namely, τD = 1 + 2

∞ X

ρk .

(1)

k=1

Appropriate estimates of ρk can thus be used as the basis for an estimate of τD ; for an AR(1) process, we have τD = (1 + ρ1 )/(1 − ρ1 ). More generally, von Storch and Zwiers note that the variances of statistics other than the sample mean can be used in a similar manner to define other measures of characteristic scale. Other approaches for defining characteristic scale have been discussed in the literature. Four examples are Simonetti et al. (1985), who cast the definition in terms of the structure function (basically a reformulation of the ACS); Cordes (1986), who uses the shape of the ACS; Higuchi (1988), who links characteristic scale to a measure of fractal dimension; and Tsonis et al. (1998), who define the concept in terms of fluctuations from cumulative sums in combination with detrending via singular spectrum analysis (see Section 6.1 for details). In this article we propose a new definition for characteristic scale based upon the discrete wavelet transform (DWT) of Xt . The DWT is often described as a scale-based transform (see, e.g., Flandrin, 1999, Percival and Walden, 2000, and Nason, 2008). To fix ideas, let us Haar focus on the Haar DWT. This transform yields wavelet coefficients, say Wτ,t , that reflect changes in adjacent averages spanning integer scales τ at times indexed by t; to be precise, Haar Wτ,t ∝

τ −1 τ −1 1X 1X Xt−l − Xt−τ −l . τ τ l=0

l=0

Haar If the ‘events’ in a time series have a characteristic duration of τ , then |Wτ,t | will tend to be large at certain indices t. Under the assumption that Xt is a stationary process, a summary Haar Haar of the ‘largeness’ of |Wτ,t | across t is provided by a time-independent quantity var {Wτ,t } known as the wavelet variance. The wavelet variance provides a scale-based decomposition Haar of the variance of Xt (see Section 2.1 for details), so a large var {Wτ,t } for a particular τ should provide the basis for defining a characteristic scale that is in the neighborhood of τ . The goal of this article is to expand this key idea to define a wavelet-based characteristic scale and to provide theory for a corresponding statistically tractable estimator. The remainder of this paper is organized as follows. Section 2 gives some necessary background on the DWT, the wavelet variance and its sampling theory for intrinsically stationary Gaussian processes. Section 3 proposes a wavelet-based definition of characteristic scale and contrasts it with τD . Section 4 deals with estimation of the wavelet-based characteristic scale and provides some large-sample theory for the proposed estimator. Section 5 reports on a Monte Carlo study that examines the efficacy of the large-sample theory for a representative selection of processes and finite sample sizes. Section 6 gives four examples of estimating characteristic scales for actual time series. The final section (7) has a summary and a discussion of possible extensions.

2. Background on the Wavelet Variance Here we define the wavelet variance, give an interpretation for it and present formulae that allows its computation, after which we review its estimation theory.

ASSESSING CHARACTERISTIC SCALES USING WAVELETS

3

2.1. Definition and Basic Properties of the Wavelet Variance Let {Xt } be an intrinsically stationary process of integer order d ≥ 0, defined as follows. If d = 0, {Xt } itself is stationary in the sense that both E{Xt } and cov {Xt+τ , Xt } exist, are finite and are independent of t; if d > 0, then subjecting {Xt } to a dth order backward difference filter yields a stationary process, namely, (d)

Xt (d−1)

(1)

≡

d X d (−1)k Xt−k , k k=0

(0)

whereas {Xt }, . . . , {Xt } and {Xt } ≡ {Xt } are all nonstationary. Under the assump(d) tion that {Xt } has a spectral density function (SDF) denoted by SX (d) (·), the (generalized) SDF for {Xt } is defined to be SX (f ) =

SX (d) (f ) , [4 sin2 (πf )]d

where 4 sin2 (πf ) defines the squared gain function for a first-order backward difference filter (d) (d) (Yaglom, 1958). We denote the autocovariance sequence (ACVS) for {Xt } by {sτ }. Let {h1,l : l = 0, 1, . . . , L1 − 1} P be a unit-level Daubechies wavelet filter of width L1 = 2, 4, 6, . . . normalized such that l h21,l = 1/2 (Daubechies, 1992). If L1 ≥ 4, use of this filter is equivalent to subjecting the output from a backward difference filter of order d = L1 /2 to a low-pass filter of width L1 /2 (the case L1 = 2 yields the Haar wavelet filter, whose coefficients { 21 , − 12 } are proportional to a first-order backward difference filter). Let g1,l ≡ (−1)l+1 h1,L1 −1−l be the corresponding scaling filter. Let H1 (f ) ≡

LX 1 −1

h1,l e−i2πf l

l=0

define the transfer function for the wavelet filter, and let G1 (f ) denote the same for the scaling filter. For a level j ≥ 2, let Hj (f ) ≡ H1 (2j−1 f )

j−2 Y

G1 (2k f ).

k=0

The inverse Fourier transform of this function gives the impulse response sequence for the jth level wavelet filter {hj,l : l = 0, . . . Lj − 1}, where Lj ≡ (2j − 1)(L1 − 1) + 1. We denote the corresponding squared gain function by Hj (f ) = |Hj (f )|2 . The filter {hj,l } is approximately a bandpass filter with a passband given by |f | ∈ (1/2j+1 , 1/2j ]. The jth level wavelet coefficient process for {Xt } is given by Wj,t ≡

Lj −1

X

hj,l Xt−l .

l=0

The coefficient Wj,t is proportional to changes in adjacent weighted averages with an effective scale (or span) of τj = 2j−1 . Note that scale τj is associated with the frequency interval (1/(4τj ), 1/(2τj )] and the interval of periods [2τj , 4τj ). Under the assumptions that {Xt }

4

Keim and Percival

is an intrinsically stationary process of order d with SDF SX (·) and that L1 ≥ 2d, {Wj,t } is a stationary process with SDF given by Hj (f )SX (d) (f ) . [4 sin2 (πf )]d

Sj (f ) ≡ Hj (f )SX (f ) =

By definition the wavelet variance for {Xt } at scale τj is the variance of {Wj,t }: νj2

≡ var {Wj,t } =

Z

1/2

Sj (f ) df =

−1/2

1/2

Z

−1/2

Hj (f )SX (f ) df.

If {Xt } is a stationary process, then var {Xt } =

∞ X

νj2

j=1

(Percival, 1995), and the wavelet variance for scale τj can be interpreted as the contribution to the overall variance due to changes in adjacent weighted averages over that scale (if {Xt } is nonstationary, the summation above diverges to infinity, but νj2 still has the interpretation of measuring the variability of changes in adjacent weighted averages). The theoretical wavelet variance for an intrinsically stationary process can be computed (d) readily in terms of the ACVS {sτ } for its underlying stationary component. In particular, we can write νj2

=

(d) s0

Lj −d−1

(d) bj,l

X l=0

2

+2

Lj −d−1

X

Lj −d−1−τ

X

s(d) τ

τ =1

(d) (d)

bj,l bj,l+τ ,

l=0

(d)

(0)

where {bj,l } is the dth-order cumulative summation of {hj,l }; i.e., with bj,l ≡ hj,l , we have, for k = 1, . . . , d, l X (k) (k−1) bj,l = bj,n , l = 0, 1, . . . , Lj − k − 1 n=0

(d)

(Lemma 1, Craigmile and Percival, 2005). Using {bj,l }, we can write Wj,t =

Lj −d−1

X

(d)

(d)

bj,l Xt−l .

l=0

(d)

Denote the transfer function and squared gain function for {bj,l } as (d) Bj (f )

≡

Lj −d−1

X l=0

(d)

(d)

(d)

bj,l e−i2πf l and Bj (f ) ≡ |Bj (f )|2 =

Hj (f ) . [4 sin2 (πf )]d

Then we can have (d)

Sj (f ) = Bj (f )SX (d) (f ) and hence νj2 =

Z

1/2

−1/2

(d)

Bj (f )SX (d) (f ) df.

ASSESSING CHARACTERISTIC SCALES USING WAVELETS

5

2.2. Estimation Theory for the Wavelet Variance Given a time series that can be regarded as a realization of a portion X0 , X1 , . . . , XN −1 of length N of the process {Xt }, we can compute the level j wavelet coefficients for indices Lj − 1 ≤ t ≤ N − 1 under the assumption that Mj ≡ N − Lj + 1 > 0. A sufficient (but not necessary) condition for {Wj,t } to be a zero mean stationary process is that L1 > 2d (if L1 = 2d, then {Wj,t } is necessarily stationary, but it might not have zero mean). Assuming 2 that L1 is chosen such that {Wj,t } is a zero mean stationary process, we have νj2 = E{Wj,t } and hence N −1 1 X 2 νˆj2 ≡ Wj,t (2) Mj t=Lj −1

is an unbiased estimator of the wavelet variance. To look at the second moment properties of νˆj2 , we assume that the Wj,t obey a multivariate Gaussian distribution. Using the Isserlis theorem (Isserlis, 1918) and assuming j ≤ k, we find that variance and covariance of νˆj2 and νˆk2 are given by cov {ˆ νj2 , νˆk2 }

2 = Mj

M k −1 X

τ =−(Mk −1)

|τ | 2 1− s2j,k,τ + Mk Mj Mk

LX k −2

N −1 X

s2j,k,t−u ,

(3)

t=Lj −1 u=Lk −1

where {sj,k,τ } is the cross-covariance sequence for the bivariate stationary processes {Wj,t } and {Wk,t }: sj,k,τ ≡ cov {Wj,t+τ , Wk,t } =

Lj −d−1

X l=0

(d)

bj,l

LkX −d−1 m=0

(d)

(d)

bk,m sτ −l+m

(when using (3) to compute var {ˆ νj2 } by letting k = j, the double summation is interpreted as zero). While (3) is an exact result, it is of interest to explore an approximation to cov {ˆ νj2 , νˆk2 } that leads to a practical scheme for estimating it. As N → ∞ and hence Mk → ∞ also, the double summation in (3) becomes negligible, whereas the first summation can be approximated using a Ces` aro sum argument, which, followed by an appeal to Parseval’s theorem, yields cov {ˆ νj2 , νˆk2 } ≈

Z 1/2 ∞ 2Aj,k 2 X 2 sj,k,τ = , where Aj,k ≡ Sj (f )Sk (f ) df. Mj τ =−∞ Mj −1/2

(4)

Suppose that Sˆj (f ), 0 < f < 1/2, is some standard nonparametric SDF estimator whose large-sample distribution is dictated by Sj (f )χ2η /η, where χ2η is a chi-square random variable with η degrees of freedom (see, e.g., Priestley, 1981). Letting Sˆk (f ) be a similar estimator for Sk (f ), it follows from standard theory for multivariate SDF estimation (Priestley, 1981) that 2 E{Sˆj (f )Sˆk (f )} ≈ Sj (f )Sk (f ) +1 η and hence that Aˆj,k ≡

η 2+η

Z

1/2

−1/2

Sˆj (f )Sˆk (f )df

6

Keim and Percival

is an approximately unbiased estimator of Aj,k . Specializing to the case where Sˆj (f ) and Sˆk (f ) are lag window estimators based upon lag windows {wj,τ } and {wk,τ } (Priestley, 1981), we have ! M k −1 X η 2 2 Aˆj,k = (5) νˆj νˆk + 2 wj,τ sˆj,τ wk,τ sˆk,τ , 2+η τ =1 where {ˆ sj,τ } is the biased estimator of the ACVS for Wj,Lj −1 , . . . , Wj,N −1 : sˆj,τ ≡

1 Mj

NX −1−τ

Wj,t+τ Wj,t ,

t=Lj −1

0 ≤ τ ≤ Mj − 1.

A practical scheme for approximating cov {ˆ νj2 , νˆk2 } is to substitute Aˆj,k for Aj,k in (4). Finally, we note that, under a Gaussian assumption on {Xt } and mild conditions on its SDF, νˆj2 is asymptotically normally distributed with mean νj2 and large sample variance 2Aj,j /Mj (Percival, 1995; Mondal, 2007; see Serrouk et al., 2000, for related results that relax the Gaussian assumption). 3. Wavelet-Based Definition of Characteristic Scale Because we can interpret the wavelet variance at a particular scale τj as the contribution to the overall variance of {Xt } due to changes in adjacent weighted averages over that scale, we can formulate a wavelet-based notion of characteristic scale by searching for scales at which νj2 is large compared to its surrounding values, thus leading to the following definitions. Definition 1. (Local characteristic scale τc,j .) Suppose {Xt } is an intrinsically sta2 tionary process such that νj2 ≥ νj±1 for some j ≥ 2, with strict inequality holding in at least one case. Fit a quadratic yk = a + bxk + cx2k that passes through (xk , yk ) ≡ (log2 (τk ), log2 (νk2 )), k = j − 1, j, j + 1. A local characteristic scale is the location at which the quadratic is maximized: τc,j = 2−b/(2c) = 2−β1 /β2 τj , where β1 ≡

yj+1 − yj−1 and β2 ≡ yj+1 − 2yj + yj−1 . 2

(6)

√ √ 2 2 2 We note in passing that √ τj / 2 ≤ τc,j ≤ τj 2 and that the pattern νj−1 < νj2 = νj+1 > νj+2 yields τc,j = τc,j+1 = τj 2. Definition 2. (Global characteristic scale τc .) Suppose {Xt } has a local characteristic scale τc,j such that νj2 > νk2 for all k ∈ Z+ excluding k = j − 1, j, j + 1, where Z+ is the set of positive integers. Then {Xt } is said to have a global characteristic scale τc ≡ τc,j . Figure 1 shows four examples of theoretical wavelet variance curves with local characteristic scales. If we assume that the wavelet variances at scales not depicted in the plots are all smaller than the ones shown, the processes associated with (a), (b) and (d) have a global characteristic scale, but the one for (c) does not. Our definitions for τc,j and τc need some justification. Arguably a more natural definition for characteristic scale would involve a wavelet variance defined over a continuum of scales via a continuous wavelet transform (CWT). A local maxima of a CWT-based wavelet variance curve would then define a local characteristic scale, which would seem to be preferable to our interpolation scheme based on just the dyadic scales τj = 2j−1 . The

ASSESSING CHARACTERISTIC SCALES USING WAVELETS

log2(ν2j )

0

(a)

(b)

(c)

(d)

7

−3

−6

log2(ν2j )

0

−3

−6 0

3

6 log2(τj)

9

0

3

6

9

log2(τj)

Fig. 1. Log of wavelet variances νj2 versus log of τj (circles) for (a) a first-order autoregressive (AR(1)) process, (b) a linear combination of an AR(1) process and a fractionally differenced process; (c) a linear combination of an AR(1) process and white noise; and (d) a linear combination of two AR(1) processes (see Section 5 for precise definitions of all four processes). The vertical dashed lines indicate the locations the characteristic scales τc,j , while the gray curves show the quadratic fit whose maximum location determines τc,j .

. following example suggests the CWT- and DWT-based definitions are quite similar, which leads us to prefer the latter because it is much easier to compute and because its estimator is statistically tractable. Consider an AR(1) process Xt = φXt−1 + ǫt , where {ǫt } is a Gaussian white noise process with zero mean and unit variance. For the Haar DWT, this process has a global characteristic scale when 0.57 < φ < 1. The pluses in Fig. 2 show how τc increases as φ varies from 0.60 to 0.99 in steps of 0.01. Avoiding interpolation issues that arise in using a CWT with a time series sampled over the integers, we can readily extend the definition of the Haar wavelet variance to all scales τ ∈ Z+ : o nP Pτ −1 τ −1 var Xt−l − l=0 Xt−τ −l l=0 . ν 2 (τ ) = 4τ 2 A plot of ν 2 (τ ) versus τ for φ = 0.60, 0.61, . . . , 0.99, shows a unique maximum, which provides us with a CWT-like definition of characteristic scale τ˜c ∈ Z+ . The circles in Fig. 2 show τ˜c versus φ. The agreement between τc and τ˜c is very good and gets better as φ increases. By contrast, if we were to define τc in terms of a quadratic fit in linear/linear rather than log/log space, we obtain the asterisks shown in the figure, which do not agree nearly as well with the CWT-based definition τ˜c . Use of linear/log space or log/linear space

8

Keim and Percival

log2(characteristic scale)

8

* +

7

* + * +

6 5 4 +

3 2

* +* +* +* * * +* +* +* +

* * +* +* +* * * +* +* +

+ * * +* +* * + * +* + +

* +

* +* +

+ *

* +

* ++ +

* +* +* + +

*+ + * +* + 0.6

0.7

0.8

0.9

1.0

φ Fig. 2. Comparison of three wavelet-based measures of characteristic scale for an AR(1) process with a unit-lag autocorrelation of φ. The pluses show τc , which is based on a quadratic fit in log/log space around the peak in the Haar wavelet variance curve evaluated over dyadic scales τj = 2j−1 . The asterisks show a similar measure, but now based on a quadratic fit in linear/linear space. The circles are based on a measure given by the location of the peak of the Haar wavelet variance curve evaluated over all integer-valued scales.

. yields characteristic scales that are nearly identical to those obtained using, respectively, linear/linear space or log/log space. The choice of log/log space over log/linear space is dictated by the fact that the log transform acts as a variance-stabilizing transform for wavelet variance estimators (this can be seen further on by noting that, while the elements of Σ1 in Equation (7) depend on the wavelet variance, this quantity ratios out in the elements of Σ2 in Equation (8)). For a stationary process {Xt } with ACS {ρk }, it is of interest to compare τc to the measure of characteristic scale provided by the decorrelation time of Equation (1). For an AR(1) process, ρk = φ|k| decays exponentially, and we have τD = (1 + φ)/(1 − φ). Figure 3 shows τD versus the Haar-based τc as φ ranges over 0.60, 0.61, . . . , 0.99. The two measures . track each other as φ gets large, with τD = 1.01τc for φ = 0.99. Figure 1(a) shows the . Haar wavelet variance versus τj when φ = 0.7, for which τc = 4.53. By contrast, Fig. 1(b) shows a similar wavelet variance curve for a process that is a linear combination of an AR(1) process with φ = 0.75 and a fractionally differenced (FD) process with long-memory parameter δ = 0.45 (Granger and Joyeux, 1980; Hosking, 1981; Beran, 1984). The two . processes are independent of each other. Here τc = 5.87, whereas τD is infinite because ρk decays hyperbolically due to the influence of the FD process. This fact points out a fundamental difference between the measures τc and τD : whereas the former concentrates on localized properties of {Xt }, the latter is influenced to a large degree by the asymptotic

9

5 4 2

3

log2(τD)

6

7

8

ASSESSING CHARACTERISTIC SCALES USING WAVELETS

2

3

4

5

6

7

8

log2(τc) Fig. 3. Decorrelation time τD versus characteristic scale τc for AR(1) processes with unit-lag autocorrelations of φ = 0.60, 0.61, ..., 0.99 (circles from left to right). If τD and τc had been equal, the circles would have fallen on the dashed line.

. decay rate of the ACS (and cannot be used at all with intrinsically stationary processes with d > 0). 4. Statistical Properties of Characteristic Scale Estimators Suppose we have a time series that can be regarded as a realization of a portion X0 , X1 , . . . , XN −1 of an intrinsically stationary process of order d, based upon which we want to estimate local characteristic scales (assuming such exist) for {Xt }. We start by estimating the wavelet variance out to some maximum scale of interest τJ0 using the unbiased estimator νˆj2 , j = 1, . . . , J0 . If there is some 1 < j < J0 such that the estimates obey the pattern 2 νˆj2 ≥ νˆj±1 (with strict inequality holding in at least one case), we can define an estimator τˆc,j of the characteristic scale in the region of τj by replacing yk with yˆk ≡ log2 (ˆ νk2 ) in equation (6): yˆj+1 − yˆj−1 ˆ ˆ and βˆ2 ≡ yˆj+1 − 2ˆ yj + yˆj−1 . τˆc,j = 2−β1 /β2 τj , where βˆ1 ≡ 2 We want to establish simple – but reasonable – approximations to the sampling properties of this estimator under the assumption that N is large. Motivated by the large-sample theory reviewed at the end of Section 2.2, we start with T 2 2 is multivariate Gaussian with a mean given by the the assumption that νˆj−1 , νˆj2 , νˆj+1 T 2 2 and a covariance matrix dictated by equation (3). true wavelet variances νj−1 , νj2 , νj+1 We use equation (4) to approximate this symmetric matrix by

Aj−1,j−1 /Mj−1 Σ1 ≡ 2 Aj−1,j /Mj−1 Aj−1,j+1 /Mj−1

— Aj,j /Mj Aj,j−1 /Mj

— . — Aj+1,j+1 /Mj+1

(7)

10

Keim and Percival

T T 2 2 The delta method says that log2 νˆj−1 , log2 νˆj2 , log2 νˆj+1 = [ˆ yj−1 , yˆj , yˆj+1 ] is ap T T 2 2 proximately Gaussian with mean log2 νj−1 , log2 νj2 , log2 νj+1 = [yj−1 , yj , yj+1 ] and covariance matrix Σ2 whose elements are Σ2,m,n ≡

2 2 cov {ˆ νj−2+m , νˆj−2+n }

2 2 log2 (2) νj−2+n νj−2+m

+2

2 2 2 2 var {ˆ νj−2+m } var {ˆ νj−2+n } + (cov {ˆ νj−2+m , νˆj−2+n })2 4 4 log2 (2) νj−2+n νj−2+m

,

(8) with m and n = 1, 2 and 3. Since

βˆ1 βˆ2

=

− 21 1

0 −2

1 2

1

yˆj−1 yˆj−1 yˆj ≡ H yˆj , yˆj+1 yˆj+1

it follows that [βˆ1 , βˆ2 ]T is approximately Gaussian with mean [β1 , β2 ]T and covariance HΣ2 H T . Let κ ˆ ≡ −βˆ1 /βˆ2 . Further applications of the delta method say that κ ˆ is approximately Gaussian with mean −β1 /β2 and variance given approximately by σκ2ˆ

≡

var {βˆ1 } β12 var {βˆ2 } var {βˆ1 } var {βˆ2 } + 2(cov {βˆ1 , βˆ2 })2 + + β22 β24 β24 2β1 cov {βˆ1 , βˆ2 } 3β 2 (var {βˆ2 })2 − + 1 6 β2 β23

and that

(9)

2 var {ˆ τc,j } ≈ τc,j σκ2ˆ log2e (2).

We can now provide, for example, an approximate 95% confidence interval (CI) [L− , L+ ] for κ; i.e., . ˆ ± 1.96σκˆ . P[L− ≤ κ ≤ L+ ] ≈ 0.95, where L± = κ

The event L− ≤ κ ≤ L+ is equivalent to the event τj 2L− ≤ τc,j ≤ τj 2L+ , so an approximate 95% CI for τc,j is given by [2−1.96σκˆ τˆc,j , 21.96σκˆ τˆc,j ]. In practical applications, we can estimate σκ2ˆ in a ‘plug-in’ manner by using Aˆj,k from (5) for Aj,k in (7), νˆj2 from (2) for νj2 in (8), and βˆ1 and βˆ2 for β1 and β2 in (9). A caveat about our approach is that it is conditioned upon the observed pattern νˆj2 ≥ 2 νˆj±1 correctly indicating the presence of a local characteristic scale somewhere in the vicinity of τj . As N → ∞, observed patterns will agree better and better with true patterns because of the asymptotic properties of the wavelet variance estimators, but observed patterns might be deceptive for finite sample sizes. A sanity check that sheds some light on the validity of an observed pattern is to generate a large number of independent realizations from a T 2 2 and covariance matrix trivariate Gaussian distribution with mean vector νˆj−1 , νˆj2 , νˆj+1 ˆ dictated by (7) with Aj,k replaced by Aj,k of equation (5). If the proportion of realizations that have a maximum in the same location as the observed pattern is large, then we have some reassurance that the observed pattern is faithfully mimicking the true pattern (see Section 6.3 for an example of this procedure). 5. Monte Carlo Experiments We consider the following four zero-mean Gaussian stationary processes, whose wavelet variances are depicted in Fig. 1:

ASSESSING CHARACTERISTIC SCALES USING WAVELETS

11

Table 1. Results from Monte Carlo experiments (see text for details). process (a)

(b)

(c)

(d)

(d)

N 512 2048 8192 512 2048 8192 512 2048 8192 512 2048 8192 512 2048 8192

τc 4.53

5.87

30.42

3.76

122.96

τˆc 4.69 4.66 4.57 6.16 5.84 5.83 33.51 32.49 31.41 4.00 3.84 3.76 89.31 149.60 153.43

M 992 1000 1000 981 1000 1000 838 964 1000 971 999 1000 454 703 775

% coverage 88.2 87.1 94.4 89.2 90.5 93.8 86.8 88.5 93.1 91.0 96.7 96.1 86.8 88.9 86.7

(a) an AR(1) process with a variance of 4 and a unit-lag autocorrelation of φ = 0.7; √ (b) a process given by √23 Xt + √13 Yt , where {Xt } is an AR(1) process with φ = 0.75, while {Yt } is an FD process with long-memory parameter δ = 0.45; (c) √12 Xt + √12 Yt , where {Xt } is an AR(1) process with φ = 0.95, while {Yt } is a white noise process; and √ (d) √23 Xt + √13 Yt , where {Xt } is an AR(1) process with φ = 0.65, while {Yt } is a similar process with φ = 0.99.

For creating the last three processes, {Xt } and {Yt } are unit variance Gaussian processes such that Xs and Yt are independent for all s and t. For each process and for samples size N = 512, 2048 and 8192, we generated 1000 realizations using an exact simulation method for AR processes (Kay, 1981) and for FD processes (Davies and Harte, 1987; Craigmile, 2003). We recorded the number of replications M for which there was a peak in the Haar wavelet variance curve at either the proper level j or levels j ± 1. For each of these M realizations, we estimated the characteristic scale and computed a 95% CI using the plug-in procedure described above (the estimates Aˆj,k were formed using periodograms, which are a special case of a lag window estimator with wj,τ = 1 for all τ and with η = 2). Table 1 shows the average of the estimated characteristic scales and the percentage of times that the 95% CIs trapped the true characteristic scale. There is a tendency for τˆc to be positively biased. The closeness of coverage percentage to the nominal 95% tends to depend upon the true τc : the smaller τc is, the better the coverage rate. For small sample sizes, the coverage rate tends to be below the nominal 95%. The coverage rates tend to improve with increasing sample size, as asymptotic theory would suggest. These experiments show that the large-sample theory gives useful – but admittedly not perfect – approximations to the variability in τˆc for moderate sample sizes. (Similar results were obtained using Daubechies wavelet filters of lengths L1 = 4 and 8.)

12

Keim and Percival

degrees

(a) 0.5 0.0

−0.5 (b)

8000 6000 4000 2000

meters

3 2 1 0 −1 −2 −3

(c)

12 10 8 6 4 2 0

(d’)

indicator

(d) 1

0

0

500

1000

1500

2000

time index

Fig. 4. Four time series (upper four plots) and one derived series (bottom plot). The series are (a) monthly global temperature anomalies; (b) coherent structures in river flows; (c) 200-hPa velocity potential anomalies in the atmosphere; (d’) Arctic sea ice thickness; and (d) indicator series for medium multiyear Arctic sea ice.

ASSESSING CHARACTERISTIC SCALES USING WAVELETS

13

6. Real-World Examples Here we consider four examples of characteristic scale estimation based upon actual time series and the Haar wavelet.

6.1. Global Temperature Record Figure 4(a) shows a time series of length N = 1560 of monthly global temperature anomalies (land and ocean combined) from January 1880 up to December 2009 (data obtained from http://www.ncdc.noaa.gov/pub/data/anomalies/). The series appears to have a prominent trend upwards. Tsonis et al. (1998) analyzed a closely related series by forming cumulative sums and considering standard deviations of lagged differences of these sums as a function of lag. Because trends adversely affect this method, they elected to detrend the series using the first three empirical orthogonal functions extracted from a singular spectrum analysis (Elsner and Tsonis, 1996). Subtraction of these functions from the time series yielded a residual series that was subjected to a test for non-zero trend using a Cox–Stuart nonparametric test (Cox and Stuart, 1955). Since the test failed to reject the null hypothesis of no trend, they used the residuals to estimate a characteristic scale, obtaining a value of about 20 months. This scale was the value at which the curve of standard deviations versus lag exhibited a change in slope in log/log space. They interpreted this characteristic scale as being due to the influence of El Ni˜ no/La Ni˜ na cycles on global temperatures. Figure 5(a) shows that the Haar wavelet variance curve for this series has a peak at scale . τ5 = 16 months. The corresponding estimated characteristic scale is τˆc,5 = 14.9 months, with an associated 95% CI of [9.6, 23.0]. This interval traps the nominal characteristic scale found by Tsonis et al. (1998). There is, however, some cause for concern here due to the apparent trend in this time series. If we model the series as Xt = a + bt + Yt , where a + bt describes a linear (first-order polynomial) trend and {Yt } is a zero mean stationary process, then the Haar wavelet coefficient process {Wj,t } is stationary, but has a nonzero mean, which is in conflict with the zero mean assumption used to construct νˆj2 of Equation (2). The wavelet coefficients most influenced by a linear trend are those at the largest scales, which explains the upward pattern in the wavelet variance curve of Fig. 5(a) at those scales. Since there is some concern that the wavelet variance estimates at smaller scales might also be adversely affected by the trend, we also considered a wavelet variance curve constructed using a Daubechies wavelet filter of width L1 = 8 (the so-called ‘least asymmetric’ filter). This filter is capable of completely eliminating a trend that is well-approximated by a third-order polynomial because of its embedded backward difference filter of order d = 4 (Craigmile et al., 2004). The wavelet variance curve for this filter also exhibits a peak at . scale τ5 . The corresponding estimated characteristic scale is τˆc,5 = 16.0 months, with an associated 95% CI of [11.1, 23.3], all of which are in reasonable agreement with the results from the Haar wavelet. This example points out that, because of differencing operations embedded within wavelet filters, there is no need to detrend a time series prior to a wavelet analysis if care is taken in selecting a wavelet filter of appropriate length L1 to handle the nature of the apparent trend in the series.

6.2. Coherent Structures in River Flow Figure 4(b) shows a time series capturing so-called ‘coherent structures’ (such as boils or eddies) in river flows (Chickadel et al., 2009; data courtesy of Alex Horner-Devine and

14

Keim and Percival

log2(ν2j )

−5

(a)

20

−8

17

−11

14

−2 log2(ν2j )

(b)

(c)

(d)

−4

−5

−7

−8

−10 0

3

6 log2(τj)

9

0

3

6

9

12

log2(τj)

Fig. 5. Log of wavelet variances νj2 versus log of τj (circles) for (a) monthly global temperature anomalies; (b) coherent structures in river flows; (c) 200-hPa velocity potential anomalies; and (d) medium multiyear Arctic sea ice. The vertical dashed lines indicate the locations the estimated characteristic scales τˆc,j , while the gray curves show the quadratic fit whose maximum location determines τˆc,j . The horizontal solid lines depict 95% confidence intervals for the true characteristic scales.

. Bronwyn Hayworth, Department of Civil and Environmental Engineering, University of Washington). The 2048 values shown in the plot are from a longer series of length N = 29972 that has a sampling interval of ∆ = 1/25 sec and spans a little less than 20 min (the subseries in the plot is from the first 82 sec). This time series is derived from measurements from three transducers and a velocity profiler set on the bottom of the Snohomish River Estuary in Washington State immediately downstream of a sill pointing upwards. The structures are essentially quasi-periodic upwellings from the river that appear as temporary ‘blobs’ on the surface of the river. Each blob dissipates within a second or so, and then another blob forms sometimes later. As the tide increases, the water velocity increases, and the frequency at which the blobs occur appears to increase. Videos of the river surface clearly show these boils qualitatively, but quantifying this little-understood phenomenon using standard Fourier-based spectral analysis is problematic because it appears as a small perturbation in a low-frequency rolloff. By contrast, the scalebased analysis afforded by the wavelet variance (Fig. 5(b)) clearly displays a peak in its decomposition of the sample variance, rendering the phenomenon as interpretable in terms . of a characteristic scale. The estimated standardized characteristic scale is τˆc,6 = 41.1, which corresponds to a physical characteristic scale of τˆc,6 ∆ = 1.6 sec, with an associated 95%

ASSESSING CHARACTERISTIC SCALES USING WAVELETS

15

CI of [1.4, 1.9] sec. The time-evolving properties of the boils can be studied by estimating characteristic scales for time series spanning successive 20-minute time intervals.

6.3. Madden–Julian Oscillation (MJO) Figure 4(c) shows the first 2048 values from a time series of 200-hPa velocity potential anomalies equatorward of latitude 30◦ N and at longitude 80◦ E (one of a number of daily MJO indices available from http://www.cpc.ncep.noaa.gov/). The entire series has 2354 values covering 3 January 1978 to 29 March 2010 with a sampling interval of ∆ = 5 days. This series is one manifestation of the MJO, which Madden and Julian (1994) define as a 40–50 day oscillation appearing in various atmospheric time series collected in the tropics. The periods associated with the MJO have been revised since 1994 based upon subsequent analysis of additional time series – the MJO is now sometimes called a 30–60 day or intraseasonal oscillation. The wavelet variance plot for the velocity potential anomalies (Fig. 5(c)) has a local peak at scale τ2 , with associated standardized local characteristic . . scale τˆc,2 = 2.53, which converts into a physical scale of τˆc,2 ∆ = 12.7 days. The associated 95% CI is [11.9, 13.5] days. Since a physical scale of τ ∆ is associated with the interval of periods [2τ ∆, 4τ ∆], the point estimate τˆc,2 ∆ matches up with 25–51 day oscillations and hence with the description of the MJO as a 30–60 day oscillation. A difficulty with using Fourier-based spectral analysis to deduce the MJO is the lack of a standard way to determine the beginning and end of the frequency interval associated with this broad-band oscillation. The notion of a characteristic scale bypasses this difficulty and opens up a means of objectively tracking how the MJO varies across time and over different time series. In addition to the peak at τ2 , there is a second one at τ7 , which leads to an estimated . physical characteristic scale of τˆc,7 ∆ = 304 days and an associated 95% CI of [79, 1170]. The interval of periods associated with τˆc,7 is [609, 1217] days, so this local characteristic scale suggests an oscillation spanning two to three years that is about 5 times weaker than the MJO. The presence of this weak additional oscillation is conditional upon the peak pattern in the wavelet variance estimates being correct. As an example of the reality check described at the end of Section 4, we generated 100, 000 realization from a trivariate normal distribution with mean [ˆ ν62 , νˆ72 , νˆ82 ]T and covariance dictated by Equation (7), with Aj,k estimated per Equation (5). Of these realizations, 60% obeyed the observed νˆ62 ≤ νˆ72 ≥ νˆ82 pattern, but the remaining 40% did not, casting considerable doubt on the validity of the observed peak pattern. (A similar test on the MJO peak at τ2 yielded 99, 916 realizations with the observed peak pattern and only 84 without.)

6.4. Medium Multiyear Arctic Sea Ice Figure 4(d′ ) shows 2048 measurements of ice thickness taken at 1 m spacings along a transect near the North Pole in April of 1991. The entire set of data consists of N = 49, 998 measurements extending over 50 km and was collected by a U.S. Naval submarine with an upward-looking sonar (the data are archived by the National Snow and Ice Data Center at http://nsidc.org/). We can regard these data as a time series with a spacing of ∆ = 1 m, where here ‘time’ is a surrogate for distance along the submarine’s path under the ice (the submarine was moving in the same direction at a constant speed as much as possible, and the data were recorded at regular intervals of time). Researchers classify sea ice by thickness, with different ice types thought to be driven by different physical processes (Flato, 1995, and World Meteorological Organization, 2007).

16

Keim and Percival

One such type is called medium multiyear ice and has a thickness from 2 to 5 m. The horizontal dashed lines on Fig. 4(d′ ) demark this ice type. Figure 4(d) shows a binary-valued times series indicating the absence or presence (using 0 or 1) of this ice type. Figure 5(d) shows the Haar wavelet variance curve for this indicator series. The curve exhibits a single . broad peak at scale τ7 , leading to an characteristic scale is τˆc,7 = 48.9 m, with an associated 95% CI of [29.6, 80.7] m. We can regard this characteristic scale as an indicator of the ‘typical’ extent of medium multiyear ice. In the face of other evidence that the Arctic climate is dramatically changing, a question of considerable geophysical interest is how stable the characteristic scales for different ice types are both spatially and temporally. Because submarines have collected data on sea-ice thickness throughout the Arctic region since 1958, it is possible to look at temporal and spatial variations in estimated characteristic scales and to use the methodology developed in this paper as one way to assess changes in Arctic climatology over the past 50 years. Finally we note that there is evidence of long-range dependence in series of ice thickness measurements (Percival et al., 2008). This type of dependence maps over into indicator series, which means that the characteristic scale τD of Equation (1) would be infinite for the medium multiyear ice indicators. By contract, the wavelet-based characteristic scale is finite and provides a useful summary of one aspect of the indicator series.

7. Summary and Discussion We have proposed a new definition for the characteristic scale of a time series that can be modeled as an intrinsically stationary process. The definition is based upon local peaks in a plot of the wavelet variance versus scale. Since the wavelet variance provides a scalebased decomposition of the process variance, a characteristic scale corresponds to one that is contributing more to the overall variance than scales surrounding it. This wavelet-based definition of characteristic scale has certain advantages over other definitions, including abilities to (1) focus on localized properties of the process rather than asymptotic decay rates of autocorrelation sequences, (2) handle certain nonstationary processes and (3) handle series with trends that are well approximated by a low-order polynomial. We have developed a large-sample theory for an estimator of the wavelet-based characteristic scale, and we have demonstrated the use of this theory through Monte Carlo experiments and applications to four representative real-world time series. There are several avenues of research that are outside the scope of this article, of which we mention three of particular interest. First, the basis for our large-sample statistical theory for the characteristic scale estimator is that the underlying wavelet coefficient processes are Gaussian. This assumption does not automatically rule out the usefulness of our theory for non-Gaussian processes. The filtering that is required to generate wavelet coefficients produces a central limit effect. Thus, even if a process is non-Gaussian, its associated wavelet coefficient processes might be well approximated as Gaussian, particularly at large scales. The indicator series for medium multiyear Arctic sea ice considered in Section 6.4 is an example of a non-Gaussian series whose large-scale wavelet coefficients are markedly closer in distribution to Gaussian than the original series. While limited tests to date indicate that the Gaussian-based large-sample theory for τˆc,j is reasonably valid for indicator series, there is certainly room for additional research that examines the question of non-Gaussianity more thoroughly. Second, we have assumed our time series to be regularly sampled, but irregularly sampled

ASSESSING CHARACTERISTIC SCALES USING WAVELETS

17

series often occur in practice. The simplest form of irregularity is missing observations in what would otherwise be a regularly sampled series. Mondal and Percival (2010) present statistical theory for an estimator of the wavelet variance that works with gappy time series. This theory can presumably be used as the basis for a characteristic scale estimator for gappy time series. A more serious challenge is to provide a corresponding theory for time series sampled at irregular patterns. A third avenue for additional research is to handle two-dimensional data (collected on a regular grid) that display a degree of characteristic bumpiness. Gilgai patterns in the Bland Plain of New South Wales, Australia, provide an example of this type of data. Milne et al. (2010) have analyzed these patterns using a two-dimensional version of the wavelet variance. The notion of a characteristic scale for a two-dimensional isotropic field could be the basis for an interesting complementary analysis of these data. Acknowledgments This research was supported by U.S. National Science Foundation Grant No. ARC 0529955. Any opinions, findings and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation. The authors thank Yanling Yu for discussions on the Arctic sea ice example. References Beran, J. (1994) Statistics for Long-Memory Processes. New York: Chapman & Hall. Chickadel, C. C., Horner-Devine, A. R., Talke, S. A. and Jessup, A. T. (2009) Vertical boil propagation from a submerged estuarine sill. Geophysical Research Letters, 36, L10601, doi:10.1029/2009GL037278. Cordes, J. M. (1988) Space velocities of radio pulsars from interstellar scintillations. Astrophysical Journal, 311, 183–196. Cox, D. R. and Stuart, A. (1955) Some quick sign tests for trend in location and dispersion. Biometrika, 42, 80–95. Craigmile, P. F. (2003) Simulating a class of stationary Gaussian processes using the Davies–Harte algorithm, with application to long memory processes. Journal of Time Series Analysis, 24, 505–511. Craigmile, P. F., Guttorp, P. and Percival, D. B. (2004) Trend assessment in a long memory dependence model using the discrete wavelet transform. Environmetrics, 15, 313–335. Craigmile, P. F. and Percival, D. B. (2005) Asymptotic decorrelation of between-scale wavelet coefficients. IEEE Transactions on Information Theory, 51, 1039–1048. Davies, R. B. and Harte, D. S. (1987) Tests for Hurst effect. Biometrika, 74, 95–101. Daubechies, I. (1992) Ten Lectures on Wavelets. Philadelphia: SIAM. Elsner, J. B. and Tsonis, A. A. (1996) Singular Spectrum Analysis: A New Tool in Time Series Analysis. New York: Plenum.

18

Keim and Percival

Flandrin, P. (1999) Time-Frequency/Time-Scale Analysis. San Diego: Academic Press. Flato, G. M. (1995) Spatial and temporal variability of Arctic ice thickness. Annals of Glaciology, 21, 323–329. Higuchi, T. (1988) Approach to an irregular time series on the basis of the fractal theory. Physica D, 31, 277–283. Hosking, J. R. M. (1981) Fractional differencing. Biometrika, 68, 165–76. Isserlis, L. (1918) On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables. Biometrika, 12, 134–9. Kay, S. M., (1981) Efficient generation of colored noise. Proceedings of the IEEE, 69, 480–481. Madden, R. A. and Julian, P. R. (1994) Observations of the 40–50-day tropical oscillation – a review. Monthly Weather Review, 122, 814–837. Milne, A. E., Webster, R. and Lark, R. M. (2010) Spectral and wavelet analysis of gilgai patterns from air photography. Australian Journal of Soil Research, 48, 309–325. Mondal, D. (2007) Wavelet Variance Analysis for Time Series and Random Fields. PhD dissertation, Department of Statistics, University of Washington. Mondal, D. and Percival, D. B. (2010) Wavelet variance analysis for gappy time series. Annals of the Institute of Statistical Mathematics, 62, 943–966. Nason, G. P. (2008) Wavelet Methods in Statistics with R. New York: Springer. Percival, D. B. (1995) On estimation of the wavelet variance. Biometrika, 82, 619–631. Percival, D. B., Rothrock, D. A., Thorndike, A. S. and Gneiting, T. (2008) The variance of mean sea-ice thickness: effect of long-range dependence. Journal of Geophysical Research – Oceans, 113, C01004, doi:10.1029/2007JC004391. Percival, D. B. and Walden, A. T. (2000) Wavelet Methods for Time Series Analysis. Cambridge: Cambridge University Press. Priestley, M. B. (1981) Spectral Analysis and Time Series. London: Academic Press. Serroukh, A., Walden, A. T. and Percival, D. B. (2000) Statistical properties and uses of the wavelet variance estimator for the scale analysis of time series. Journal of the American Statistical Association, 95, 184–196. Simonetti, J. H., Cordes, J. M. and Heeschen, D. S. (1985) Flicker of extragalactic radio sources at two frequencies. Astrophysical Journal, 296, 46–59. Tsonis, A. A., Roebber, P. J. and Elsner, J. B. (1998) A characteristic time scale in the global temperature record. Geophysical Research Letters, 25, 2821–2823. von Storch, H. and Zwiers, F. W. (1999) Statistical Analysis in Climate Research. Cambridge: Cambridge University Press.

ASSESSING CHARACTERISTIC SCALES USING WAVELETS

19

World Meteorological Organization (2007) Sea-Ice Information Services in the World, 3rd edn. Publication 574. Geneva: World Meteorological Organization. Yaglom, A. M. (1958) Correlation theory of processes with random stationary nth increments. American Mathematical Society Translations (Series 2), 8, 87–141.