Jun 1, 2013 - H will denote vector or matrix transposition and vector or matrix ..... Nevertheless, our assumption on flat input spectrum is reasonabl...

0 downloads 5 Views 632KB Size

Analysis of Mismatched Estimation Errors Using Gradients of Partition Functions∗ Wasim Huleihel and Neri Merhav Department of Electrical Engineering

arXiv:1306.0094v1 [cs.IT] 1 Jun 2013

Technion - Israel Institute of Technology Haifa 32000, ISRAEL E-mail: {[email protected], [email protected]}.technion.ac.il

Abstract We consider the problem of signal estimation (denoising) from a statistical-mechanical perspective, in continuation to a recent work on the analysis of mean-square error (MSE) estimation using a direct relationship between optimum estimation and certain partition functions. The paper consists of essentially two parts. In the first part, using the aforementioned relationship, we derive single-letter expressions of the mismatched MSE of a codeword (from a randomly selected code), corrupted by a Gaussian vector channel. In the second part, we provide several examples to demonstrate phase transitions in the behavior of the MSE. These examples enable us to understand more deeply and to gather intuition regarding the roles of the real and the mismatched probability measures in creating these phase transitions.

Index Terms Minimum mean-square error (MMSE), mismatched MSE, partition function, statistical-mechanics, conditional mean estimation, phase transitions, threshold effect.

I. I NTRODUCTION The connections and the interplay between information theory, statistical physics and signal estimation have been known for several decades [1-4], and they are still being studied from a variety of aspects, see, for example [5-17] and many references therein. ∗

This research was partially supported by The Israeli Science Foundation (ISF), grant no. 412/12.

October 16, 2018

DRAFT

2

Recently, in [6], the well known I-MMSE relation [8], which relates the mutual information and the derivative of the minimum mean-square error (MMSE), was further explored using a statistical physics perspective. Specifically, in their analysis, the authors of [6] exploit the natural “mapping” between information theory problems and certain models of many-particle systems in statistical mechanics (see, e.g., [18, 19]). One of the main contributions in [6] is the demonstration of the usefulness of statisticalmechanical tools (in particular, utilizing the fact that the mutual information can be viewed as the partition function of a certain physical system) in assessing MMSE via the I-MMSE relation of [8]. More recently, Merhav [5] proposed a more flexible method, whose main idea is that, for the purpose of evaluating the covariance matrix of the MMSE estimator, one may use other information measures, which have the form of a partition function and hence can be analyzed using methods of statistical physics (see, e.g., [18-26] and many references therein). The main advantage of the proposed approach over the I-MMSE relations, is its full generality: Any joint probability function P (x, y), where x and y designate the channel input to be estimated and the channel output, respectively, can be handled (for example, the channel does not have to be additive or Gaussian). Moreover, using this approach, any mismatch, both in the source and the channel, can be considered. This paper is a further development of [5] in the above described direction. Particularly, in [5, Section IV. A], the problem of mismatched estimation of a codeword, transmitted over an additive white Gaussian (AWGN) channel, was considered. It was shown that the mismatched MSE exhibits phase transitions at some rate thresholds, which depend upon the real and the mismatched parameters of the problem, and the behavior of the receiver. To wit, the mismatched MSE acts inherently differently for a pessimistic and optimistic receivers, where in the example considered in [5, Section IV. A] pessimism literally means that the estimator assumes that the channel is worse than it really is (in terms of signal-to-noise ratio (SNR)), and the vice versa for optimism. In this paper, we extend the above described model to a much more general one; the Gaussian vector channel, which has a plenty of applications in communications and signal processing. It is important to emphasize that compared to [5, 6], it will be seen that: (1) the mathematical analysis is much more complicated (consisting of some new concepts), and (2) the notions of pessimism and optimism described above, also play a significant role in this model, although their physical meanings in general are not obvious. Moreover, in contrast to previous work on mismatched estimation, in this paper, the interesting case of channel mismatch is explored, namely, the receiver has a wrong assumption on the channel. In order to demonstrate the usefulness of the theoretical results derived for the general model, we also provide a few examples associated with some specific channel transfer functions, and draw conclusions and insights regarding the threshold effects in the behavior of October 16, 2018

DRAFT

3

the partition function and the MSE. As was mentioned earlier, we consider the Gaussian vector channel model Y = AX + N ,

(1)

where N ∈ Rn is a Gaussian white noise vector and A is a deterministic n × n matrix representing

a linear transformation induced by a given linear system. The vector X ∈ Rn is chosen uniformly at

random from a codebook (which is itself selected at random as well). There are several motivations for codeword estimation. One example is that of a user that, in addition to its desired signal, receives also a relatively strong interference signal, which carries digital information intended to other users, and which comes from a codebook whose rate exceeds the capacity of this crosstalk channel between the interferer and our user, so that the user cannot fully decode this interference. Nevertheless, our user would like to estimate the interference as accurately as possible for the purpose of cancellation. Furthermore, we believe that the tools/concepts developed in this paper for handling matched and mismatched problems, can be used in other applications in signal processing and communication. Such examples are denoising (see for example, [27-29]), mismatched decoding (for example, [30]), blind deconvolution (for example, [31, 32]), and many other applications. Note that although the aforementioned examples are radically different (in terms of their basic models and systematization), they will all suffer from mismatch when estimating the input signals. In the special case of matched estimation, it will be shown that the MMSE is asymptotically given by R 2π Px 1 if R > Rc 2π 0 1+|H(ω)|2 Px β dω, mmse (X | Y ) lim = (2) n→∞ n 0, if R ≤ R c

where

1 Rc = 4π △

Z

0

2π

ln 1 + |H (ω)|2 Px β dω,

(3)

in which mmse (X | Y ) is the estimation error results from estimating X based on Y , using the MMSE estimator, 1/β and Px denote the noise variance and the transmitted power, respectively, and H (ω) is the frequency response of the linear system A. As can be seen from the above formula, for R < Rc the MMSE essentially vanishes since the correct codeword can be reliably decoded, whereas for R > Rc , the MMSE is simply the estimation error which results by the Wiener filter that would have been applied had the input been a zero-mean, i.i.d. Gaussian process, with variance 1/β . Accordingly, it will be seen that for R > Rc the MMSE estimator is simply the Wiener filter. It is important to emphasize that while October 16, 2018

DRAFT

4

the above result may seem to be a natural generalization of the results in [5, 6] (where A is taken to be identity matrix), the analysis (and results) of the mismatched case is by far more complicated and non-trivial. Indeed, it will be seen that in the mismatched case, the MSE is essentially separated into two cases, each exhibiting a completely different behavior. Further physical insights regarding the above result and other results will be presented later on. The remaining part of this paper is organized as follows. In Section II, we first establish notation conventions. Then, the model considered is presented and the problem is formulated. In Section III, the main results are stated and discussed. In Section IV, we provide a few examples which illustrate the theoretical results. In Section V, we discuss the techniques and methodologies that are utilized in order to prove the main results, along with a brief background and summary on the basic relations between the conditional mean estimator, as well as its error covariance matrix and the aforementioned partition function, which were derived in [5]. In Section VI, the main results are proved. Finally, our conclusions appear in Section VII. II. N OTATION C ONVENTIONS

AND

P ROBLEM F ORMULATION

A. Notation Conventions Throughout this paper, scalar random variables (RV’s) will be denoted by capital letters, their sample values will be denoted by the respective lower case letters and their alphabets will be denoted by the respective calligraphic letters. A similar convention will apply to random vectors and their sample values, which will be denoted with same symbols in the bold face font. Thus, for example, X will denote a random vector (X1 , . . . , Xn ) and x = (x1 , . . . , xn ) is a specific vector value in X n , the n-th Cartesian

power of X . The notations xji and X ji , where i and j are integers and i ≤ j , will designate segments (xi , . . . , xj ) and (Xi , . . . , Xj ), respectively. Probability functions will be denoted generically by the

letter P or P ′ . In particular, P (x, y) is the joint probability mass function (in the discrete case) or the joint density (in the continuous case) of the desired channel input vector x and the observed channel output vector y . Accordingly, P (x) will denote the marginal of x, P (y | x) will denote the conditional probability or density of y given x, induced by the channel, and so on. The expectation operator of a generic function f (x, y) with respect to (w.r.t.) the joint distribution of X and Y , P (x, y) , will be denoted by E {f (X, Y )}. Accordingly, E ′ {f (X, Y )} means that the

expectation is performed w.r.t. P ′ (x, y). The conditional expectation of the same function given that

Y = y , denoted E {f (X, Y ) | Y = y} and which is obviously identical to E {f (X, y) | Y = y}, is,

of course, a function of y . On substituting Y in this function, this becomes a random variable which October 16, 2018

DRAFT

5

will be denoted by E {f (X, Y ) | Y }. When using vectors and matrices in a linear-algebraic format, n-dimensional vectors, like x (and X ), will be understood as column vectors, the operators (·)T and (·)H

will denote vector or matrix transposition and vector or matrix conjugate transposition, respectively, and ·

so, xT would be a row vector. For two positive sequences {an } and {bn }, the notation an = bn means equivalence in the exponential order, i.e., limn→∞

1 n

log (an /bn ) = 0. For two sequences {an } and {bn },

the notations an ∼ bn and an . bn mean limn→∞ (an /bn ) = 1 and limn→∞ (an /bn ) ≤ 1, respectively. Finally, the indicator function of an event A will be denoted by 1{A}. B. Model and Problem Formulation Let C = {x0 , . . . , xM −1 } denote a codebook of size M = enR , which is selected at random (and then revealed to the estimator) in the following manner: Each xi is drawn independently under the uniform distribution over the surface of the n-dimensional hyperesphere, which is centered at the origin, and √ whose radius is nPx . Finally, let X assume a uniform distribution over C . We consider the Gaussian vector channel model Y = AX + N ,

(4)

where Y , X and N are random vectors in Rn , designating the channel output vector, the transmitted codeword and the noise vector, respectively. It is assumed that the components of the noise vector, N , are i.i.d., zero-mean, Gaussian random variables with variance 1/β , where β is a given positive constant designating the signal-to-noise ratio (SNR) (for Px = 1), or the inverse temperature in the statisticalmechanical jargon. We further assume that X and N are statistically independent. Finally, the channel matrix, A ∈ Rn×n , is assumed to be a given deterministic Toeplitz matrix, whose entries are given by the coefficients of the impulse response of a given linear system. Specifically, let {hk } denote the generating sequence (or impulse response) of A, so that A = {ai,j }i,j = {hi−j }i,j , and let H (ω) designate the frequency response (Fourier transform) of {hk }. As was mentioned previously, we analyze the problem of mismatched codeword estimation which is formulated as follows: Consider a mismatched estimator which is the conditional mean of X given Y , based on an incorrect joint distribution P ′ (x, y), whereas the true joint distribution continues to be P (x, y). Accordingly, the mismatched MSE is defined as

2 △ mse (X | Y ) = E X − E ′ {X | Y }

(5)

where E ′ {X | Y } is the conditional expectation w.r.t. the mismatched measure P ′ . In this paper, the

following mismatch mechanism is assumed: The input measure is matched, i.e., P (x) = P ′ (x) (namely,

October 16, 2018

DRAFT

6

the mismatched estimator knows the true code), both conditional measures (“channels”) P (· | x) and

P ′ (· | x) are Gaussian, but are associated with different channel matrices. More precisely, while the true

channel matrix (under P ) is A, the assumed channel matrix (under P ′ ) is A′ , another Toeplitz matrix,

generated by the impulse response {h′k }, whose frequency response is H′ (ω). It should be pointed out, however, that the analysis in this paper can be easily carried out also for the case of mismatch in the input distribution, or mismatch in the noise distribution, which has been already considered in [5]. Using the theoretical tools derived in [5], the mismatched MSE (and the MMSE as a special case) will be derived for the model described above. A very important function, which will be pivotal to our derivation of both the mismatched estimator and the MSE, is the partition function, which is defined as follows. Definition 1 (Partition Function) Let λ = (λ1 , . . . , λn )T be a column vector of n real-valued parameters. The partition function w.r.t. the joint distribution P (x, y), denoted by Z (y, λ), is defined as △ X Z (y, λ) = exp λT x P (x, y) . (6) x∈X n In the above definition, it is assumed that the sum (or integral, in the continuous case) converges uniformly at least in some neighborhood of λ = 0 1 . Accordingly, under the above described model, the mismatched partition function is given by △

Z ′ (y, λ) =

X

x∈C

exp λT x P ′ (x, y)

= (2π/β)−n/2

X

x∈C

(7)

h i

2 e−nR exp −β y − A′ x /2 + λT x .

(8)

Remark 1 In the above definition, the role of λ will be understood later on. In a nutshell, the idea [5] is that the gradient of ln Z ′ (y, λ) w.r.t. λ, computed at λ = 0, simply gives the mismatched MSE estimator, E ′ {X | y}, and the expectation of the Hessian of ln Z ′ (y, λ) w.r.t. λ, computed at λ = 0, gives the MSE. Nevertheless, in the next section, where we present the main results, the dependency of the different quantities in λ will not be apparent, as they will already be computed at λ = 0. III. M AIN R ESULTS

AND

D ISCUSSION

In this section, our main results are presented and discussed. The proofs of these results are provided in Section VI. The asymptotic MMSE, which is obtained as a special case of the mismatched case (P = P ′ ), 1

In case that this assumption does not hold, one can instead, parametrize each component λi of λ as a purely imaginary √ number λi = jωi where i = −1, similarly to the definition of the characteristics function. October 16, 2018

DRAFT

7

is given in the following theorem. Theorem 1 (Asymptotic MMSE) Consider the model defined in Subsection II-B, and assume that the sequence {hk }k is square summable. Then, the asymptotic MMSE is given by R 2π Px 1 R > Rc 2π 0 1+|H(ω)|2 Px β dω, mmse (X | Y ) lim = n→∞ n R ≤ Rc 0,

(9)

where

1 Rc = 4π △

Z

0

2π

ln 1 + |H (ω)|2 Px β dω.

(10)

From the above result, it can be seen that for R > Rc the MMSE is simply the estimation error which results by the Wiener filter that would have been applied had the input been a zero-mean, i.i.d. Gaussian process, with variance 1/β . Accordingly, it is also shown in Section VI that the MMSE estimator is exactly the Wiener filter. In the next theorem, we present the mismatched MSE. In contrast to the MMSE, unfortunately, the MSE does not lend itself to a simple closed-form expression. As will be seen in Section VI, this complexity stems from the complicated dependence of the partition function on λ. Nevertheless, despite of the following non-trivial expressions, it should be emphasized that the obtained MSE expression has a singleletter formula, and thus, practically, it can be easily calculated at least numerically. Let us define the following auxiliary variables ′ (ω)|2 β 2 + P β |H (ω)|2 + γ |H 0 x △ Pa (ω) = 2 |H′ (ω)|2 β + γ0

R 2π where γ0 is chosen such that 0 Pa (ω) dω = 2πPx . Next define ′ (ω)|2 β 2 + P β |H (ω)|2 + γ ′ (ω)|2 + γ − 2 |H |H 0 x 0 △ B (ω) = 3 |H′ (ω)|2 + γ0 2 |H′ (ω)|2 |H (ω)|2 + 1 B (ω) 2β β △ C (ω) = r 1 + 4β 2 |H′ (ω)|2 Pa (ω) |H (ω)|2 + β1 i R 2π h Px − C (ω) dω 2 ′ 0 Px γ0 +Px |H (ω)| β △ , ϑ=2+ R 2π B (ω) dω 0 October 16, 2018

(11)

(12)

(13)

(14)

DRAFT

8

and βH′∗ (ω) △ Ξ1 (ω) = − 2 2 ′ |H (ω)| + γ0

2 ϑ −

|H (ω)| + + γ0 Pa (ω) + . r 1 + 4β 2 |H′ (ω)|2 Pa (ω) |H (ω)|2 + β1

|H′ (ω)|2

2

2β 2 |H′ (ω)|2

2

1 β

Let ǫs,0, α1,0 and α2,0 be the solution of the following set of three simultaneous equations: ! Z 2π 1 2ǫs,0 R+ ln dω = 0 4π 0 Px |H′ (ω)|2 α2,0 + 2Px α1,0 ǫs,0 i h Z 2π 4α1,0 ǫ2 + |H′ (ω)|2 α2,0 |H (ω)|2 Px + 1 α2,0 + 2ǫs,0 0,s β 1 dω = Px 2 2π 0 |H′ (ω)|2 α2,0 + 2α1,0 ǫs,0 Z 2π 4α2 ǫ2 1 + Px β |H (ω)|2 + 4 |H′ (ω)|2 α1,0 ǫ2 β + 2 |H′ (ω)|4 α2,0 ǫs,0 β s,0 1,0 s,0 1 dω = 1. 2 2π 0 2 ′ 2βǫs,0 |H (ω)| α2,0 + 2α1,0 ǫs,0

(15)

(16)

(17)

(18)

Then, we define

2 2 △ K (ω) = 2βǫs,0 H′ (ω) α2,0 + 2α1,0 ǫs,0 4 2 △ T (ω) = 4α21,0 ǫ2s,0 1 + Px β |H (ω)|2 + 4β H′ (ω) α1,0 ǫ2s,0 + 2 H′ (ω) α2,0 βǫs,0

2 △ D (ω) = H′ (ω) α2,0 + 2α1,0 ǫs,0 i h 2 △ R (ω) = 4α1,0 ǫ2s,0 + H′ (ω) α2,0 α2,0 |H (ω)|2 Px + 2ǫs,0 2 △ Q (ω) = ǫs,0 Px H′ (ω) α2,0 + 2Px α1,0 ǫs,0 Z 2π Px |H′ (ω)|2 α2,0 △ 1 dω V = 2 2π 0 ǫ ′ s,0 Px |H (ω)| α2,0 + 2Px α1,0 ǫs,0 Z 2π Px |H′ (ω)|2 r2 + 2Px ǫs,0 r1 △ 1 1 dω F = V 2π 0 Px |H′ (ω)|2 α2,0 + 2Px α1,0 ǫs,0 Z 2π 8α1,0 ǫ2 1 + Px β |H (ω)|2 + 4β |H′ (ω)|2 ǫ2 s,0 s,0 △ 1 γ1 = 2π 0 K (ω) 8T (ω) βǫ2s,0 |H′ (ω)|2 α2,0 + 2α1,0 ǫs,0 dω − K2 (ω) Z 2π 2K (ω) βǫs,0 |H′ (ω)|4 − 4T (ω) βǫs,0 |H′ (ω)|2 α2,0 + 2α1,0 ǫs,0 |H′ (ω)|2 △ 1 dω γ2 = 2π 0 K2 (ω) October 16, 2018

(19) (20) (21) (22) (23) (24)

(25)

(26)

(27) DRAFT

9

△

γ3 =

1 2π

Z

2π

0

8α21,0 ǫs,0 1 + Px β |H (ω)|2 + 8βǫs,0 |H′ (ω)|2 α1,0 + 2βα2,0 |H′ (ω)|4 K (ω)

2 2 2 ′ ′ T (ω) 2β |H (ω)| α2,0 + 2α1,0 ǫs,0 + 8βǫs,0 α1,0 |H (ω)| α2,0 + 2α1,0 ǫs,0 dω − K2 (ω) △

Υ (ω) = △

1 2π

△

1 2π

η1 = η2 =

1 η3 = 2π △

−4βα1,0 ǫs,0 α2,0 − βα22,0 |H′ (ω)|2 K2 (ω) Z 2π 4D (ω) ǫ2s,0 − 4R (ω) ǫs,0 dω D3 (ω) 0 i h Z 2π |H′ (ω)|2 |H (ω)|2 Px + 1 α2,0 + 2ǫs,0 β 2 D (ω) 0 |H′ (ω)|2 α2,0 |H (ω)|2 Px + β1 D (ω) − 2R (ω) |H′ (ω)|2 + dω D3 (ω)

Z

2π

0

8D (ω) α1,0 ǫs,0 + 2D (ω) |H′ (ω)|2 α2,0 − 4R (ω) α1,0 dω D3 (ω)

α22,0 D2 (ω) η2 γ3 − γ2 η3 r1 = γ2 η1 − η2 γ1 η1 γ3 − γ1 η3 r2 = γ1 η2 − η1 γ2 η2 Υ (ω) − γ2 Λ (ω) J1 (ω) = γ2 η1 − η2 γ1 η1 Υ (ω) − γ1 Λ (ω) J2 (ω) = γ1 η2 − η1 γ2 R 2π 2ǫ2s,0 Px R 2π dω + J (ω) 2 △ 1 J1 (ω) 0 0 Q(ω) J (ω) = 2π V (1 − F ) △

Λ (ω) =

(28)

(29) (30)

(31)

(32) (33) (34) (35) (36) (37)

2ǫs,0 Px |H′ (ω)|2 dω Q(ω)

△

Ξ2 (ω) = −2J (ω) H′∗ (ω) .

(38) (39)

Finally, let Z 2π Z 2π 1 1 1 ∗ 2 2 ∗ Eg = Px − Re Ξ2 (ω) H (ω) Px dω + |Ξ2 (ω)| |H (ω)| Px + dω, π 0 2π 0 β △

(40)

and Z 2π Z 2π 1 1 1 ∗ 2 2 ∗ Ξ1 (ω) H (ω) Px dω + |Ξ1 (ω)| |H (ω)| Px + Ep = Px − Re dω, π 0 2π 0 β △

October 16, 2018

(41)

DRAFT

10

and we define the following critical rates 1 Re = 4π △

△1

Z

2π 0

2 ln Px γ0 + Px β H′ (ω) dω

1 Rd = + βPx 2 2π

△

+

1 4π

−

1 4π

Z

(42)

2π

Re H′∗ (ω) H (ω) dω 0 ′ (ω)|2 β 2 + β |H (ω)|2 P Z 2π + γ |H 0 x ′ H (ω) 2 β − Px 2 0 |H′ (ω)|2 β + γ0 Z 2π |H′ (ω)|2 β 3 + 2Px β |H (ω)|2 + γ0 0 |H′ (ω)|2 β + γ0

Rc =Re + Rd Z 2π 1 2˜ ǫ △ Rg = − dω ln 4π 0 Px |H′ (ω)|2 α ˜ 2 + 2Px α ˜ 1 ǫ˜

(43)

(44) (45)

where α ˜ 1 and α ˜2 solve the set of two simultaneous equations 1 2π

Z

1 2π

Z

2π 0 2π 0

h i ˜ 2 + 2˜ ǫ 4˜ α1 ǫ˜2 + |H′ (ω)|2 α ˜ 2 |H (ω)|2 Px + β1 α dω = Px 2 |H′ (ω)|2 α ˜ 2 + 2˜ α1 ˜ǫ ˜ 1 ǫ˜2 β + 2 |H′ (ω)|4 α ˜ 2 ǫ˜β 4˜ α21 ˜ ǫ2 1 + Px β |H (ω)|2 + 4 |H′ (ω)|2 α dω = 1. 2 2 ′ 2β˜ ǫ |H (ω)| α ˜ 2 + 2˜ α1 ˜ǫ

(46)

(47)

and ǫ˜ =

1 Px + 2β 4π

Z

2π 0

′ H (ω) − H (ω) 2 dω.

(48)

We are now in a position to state our main theorem.

Theorem 2 (Mismatched MSE) Consider the model defined in Subsection II-B, and assume that the sequence {hk }k is square summable. The (asymptotic) mismatched MSE is given as follows: a) For Rd ≥ 0 0,

mse (X | Y ) = n→∞ n Ep , lim

October 16, 2018

R ≤ Rc

.

(49)

R > Rc DRAFT

11

b) For Rd < 0 0, mse (X | Y ) lim = Eg , n→∞ n E , p

R ≤ Rg Rg < R ≤ Re .

(50)

R > Re

In the jargon of statistical mechanics of spin arrays (see for example [33, Ch. 6]), the ranges of rates R ≤ Rc for Rd ≥ 0, R ≤ Rg for Rd < 0, and R ≤ Rc in the matched case, correspond to the ordered phase (or ferromagnetic phase) in which the partition function is dominated by the correct codeword (and hence so is the posterior). Accordingly, in this range the MSE asymptotically vanishes, which literally means reliable communication. The intermediate range, Rg < R ≤ Re , which appears only in the mismatched case and only for Rd < 0, is analogous to the glassy phase (or “frozen” phase), in which the partition function is dominated by a sub-exponential number of wrong codewords. Intuitively, in this range, we may have the illusion that there is relatively little uncertainty about the transmitted codeword, but this is wrong due to the mismatch (as the main support of the mismatched posterior belongs to incorrect codewords). The remaining range corresponds to the paramagnetic phase, in which the partition function is dominated by an exponential number of wrong codewords. In Section IV, we will link between each one of the two cases Rd ≥ 0 and Rd < 0, to “pessimistic” and “optimistic” behaviors of the receiver, which were already mentioned in the Introduction. It is tempting to think that there should not be a range of rates for which the MSE (MMSE) vanishes, as we deal with an estimation problem rather than a decoding problem. Nonetheless, since codewords are being estimated, and there are a finite number of them, for low enough rates (up to some critical rate) the posterior is dominated by the correct codeword, and thus asymptotically, the estimation can be regarded as a maximum a posteriori probability (MAP) estimation, and so the MSE vanishes. In the same breath, note that this is not the case if mismatch in the input distribution is considered. For example, if the receiver’s assumption on the transmitted energy is wrong, then no matter how low the rate is, there will always be an inherent error which stems from the fallacious averaging over a hypersphere with wrong radius (wrong codebook). Precisely, in this case, the estimated codeword will differ from the real one by p an inevitable scaling of Px′ /Px , where Px′ is the mismatched power.

Finally, it is important to emphasize that the mismatched MSE estimator and the MMSE estimator can

also be obtained as a byproduct of the analysis. However, since they will add only little further insights into the problem, we do not present them here. The interested reader can find their explicit expressions October 16, 2018

DRAFT

12

in Section VI. Remark 2 Although we have assumed that the transmitted codeword has a flat spectrum, the analysis can readily be extended to any input spectral density Sx (ω). In Section VI, we discuss the technical issues that should be considered in order to modify the analysis to hold for this generalization. As a concrete simple example, in the case of MMSE estimation, one obtains R 2π Sx (ω) 1 2π 0 1+|H(ω)|2 Sx (ω)β dω, mmse (X | Y ) lim = n→∞ n 0,

R > Rc

(51) R ≤ Rc

where

1 Rc = 4π △

Z

0

2π

2

ln 1 + |H (ω)| Sx (ω) β dω.

(52)

Nevertheless, our assumption on flat input spectrum is reasonable when there is uncertainty at the encoder concerning the frequency response of the channel, as there are no “preferred” frequencies. Finally, note that as an application of the above issue, one may wish to consider the minimization of the MMSE w.r.t. the input spectral density. IV. E XAMPLES In this section, we provide a few examples in order to illustrate the theoretical results presented in the previous section. In particular, we present and explore the phase diagrams and the MSE’s as functions of the rate and some parameters of the mismatched channel. The main goal in these examples is further understanding of the role of the true and the mismatched probability measures in creating phase transitions. Example 1 We start with a simple example where both H (ω) and H′ (ω) are low-pass filters (LPFs) that differ in their cutoff frequencies and gains H (ω) =

1,

0,

and H′ (ω) =

χ, 0,

|ω| ≤

π 2

,

(53)

|ω| ≤ ωc

(54)

else

else

for some χ > 0 and 0 ≤ ωc ≤ π . In the numerical calculations, we chose β = Px = 1. Figures 1 and 2 show, respectively, the phase diagrams and the MSE’s as functions of R and ωc , for various values of October 16, 2018

DRAFT

13

the gain χ. The first obvious observation is that the maximum range of rates for which the ferromagnetic phase dominates the partition function occurs at ωc = π/2 for each gain, as expected. Next, consider the case of χ = 1, which means that the gain is matched. In this case, it is observed that for ωc ≤ π/2, there are two phases: the ferromagnetic phase and the paramagnetic phase, and hence, based on Theorem 2, Rd ≥ 0. On the other hand, for ωc > π/2, the glassy phase begins to play a role, and thus Rd < 0.

Intuitively speaking, the case of ωc ≤ π/2 corresponds to a pessimistic assumption of the receiver lower bandwidth which translates to lower effective SNR, while ωc > π/2 corresponds to an optimistic assumption - higher effective SNR. These behaviors are consistent with the results obtained in [5], where the case of mismatch in the noise variance was considered (while assuming that A = A′ is the identity matrix). In [5], Rd > 0 simply translates to β > β ′ (the mismatched noise variance is larger than the actual one), namely, the estimator is pessimistic, while in the case of the reversed inequality it is overly optimistic. Accordingly, in the pessimistic case, the partition function exhibits a single phase transition, but at the price of a lower critical rate (compared to the matched case), which means that the range of rates for which reliable communication is possible is smaller. In the optimistic case, however, there is no loss in the critical rate, but there is a price of an additional phase transition. Now, for χ 6= 1, the notions of pessimism and optimism are not a priori obvious. For example, it can be seen that for χ < 1, and for a large enough cutoff frequency ωc , the mismatched estimator can be regarded as an optimistic one. Also, for χ > 1, apparently, the “price” of being too optimistic in the gain results in a dominant range of the glassy phase. Finally, note that the fact that the range of rates for which the ferromagnetic region dominates the partition function (namely, vanishing MSE) is decreasing with the excess of the optimism (e.g., for χ = 1 and increasing of the cutoff frequency) is reasonable2. Indeed, the uncertainty in the frequency domain, causes the receiver to assume that the codewords are distributed in some subspace of the n-dimensional hypersphere. The size of this subspace is, of course, increasing as the receiver’s assumption is more optimistic. Accordingly, the probability of error also increases, and thus the threshold rate for reliable communication decreases.

2

In [5], in contrast to our case, for β < β ′ (Rd < 0), the critical rate Re is fixed for any mismatched noise variance value,

namely, it is independent of the optimistic behavior of the receiver. October 16, 2018

DRAFT

14

χ = 0.75

χ=1

3 2.5

π/2

c

1.5

1.5

1

1 R = Rc

0.5 0

R = Re

2

ω

c

π/2

R = Rg

2.5

R = Re

2

ω

3

R = Rg

0

0.1

0.2

0.3

0.5 0

0.4

R = Rc 0

0.1

0.2

R

0.3

0.4

R χ = 1.5 3

Ferromagnetic

2.5

Glassy

2

π/2

c

ω

Paramagnetic

1.5 1 R = Re

0.5 0

R = Rg 0

0.1

0.2

0.3

0.4

R

Fig. 1.

Example 1: Phase diagram in the plane of R vs. ωc with various gain values. The arrows are directed towards the

boundaries of the various phase transitions.

Example 2 Let H (ω) be a multiband filter 1, H (ω) = 0,

given by ω ± else

3π 8

≤

π 8

or ω ±

7π 8

and let the mismatched filter be given by a band-pass filter 1, ωL ≤ |ω| ≤ ωR ′ H (ω) = , 0, else

≤

π 8

,

(55)

(56)

with constant bandwidth, ωR − ωL = π/8, i.e., smaller than the real one. In the numerical calculations, we again chose β = Px = 1. Figures 3 and 4 show, respectively, the phase diagram and the MSE as functions of R and ωL . First, observe that for ωR < π/4, which means that H′ (ω) and H (ω) October 16, 2018

DRAFT

15

χ = 0.75

χ=1 1

3 2.5 π/2

0.8

2

π/2

c

0.6

ω

ωc

2.5

0.8

2

1

3

1.5 0.4

0.4

1

1 0.2

0.5 0

0.6

1.5

0

0.2

0

0

0.4

0.2

0.5 0

0.2

R

0.4

0

R χ= 1.5 1

3 2.5

0.8

2

ω

c

π/2

0.6

1.5 0.4 1 0.2

0.5 0

0

0.2

0.4

0

R

Fig. 2.

Example 1: Mismatched MSE as a function of R and ωc with various gain values.

are equal to one over non intersecting frequency ranges, there is no ferromagnetic phase, as expected. Accordingly, for ωR > π/4, the ferromagnetic phase begins to play a role, and it can be seen that for π/4 + π/8 < ωR < π/2, which means maximal intersection between the two filters, the range of rates

for which the ferromagnetic phase dominates the partition function is maximal. Since the matched filter has two bands, obviously, the same behavior appears also in the second band. Thus, in this example, we actually obtain two disjoint glassy (and ferromagnetic) regions, which correspond to the two bands of the matched filter. Also, as shown in Fig. 4, in the ranges where no ferromagnetic phase exists, the MSE within the paramagnetic phase is larger than the MSE within the regions where ferromagnetic phase does exists, as one would expect.

Remark 3 Example 2 actually demonstrates that there can be arbitrarily many phase transitions. Generally speaking, for a matched multiband filter with N disjoint bands, and a mismatched bandpass filter (with small enough bandwidth), there are N disjoint glassy and ferromagnetic phases.

October 16, 2018

DRAFT

16

2.5

Rc Ferromagnetic ωR =3π/4

Rg

2

Glassy Paramagnetic

ωL

1.5 ωR = π/2 1 Re ωR = π/4

0.5

Rg

0 0

2

4

6

8

10

R

Fig. 3.

−3

x 10

Example 2: Phase diagram in the plane of R vs. ωL .

Example 3 In this example, we consider more realistic filters. Let H (z) denote a Type-II FIR filter given by (in the Z domain) H (z) = 1 − ej0.8π z −1

and let the mismatched filter be given is H′ (z) = 1 − z0 z −1

1 − z0∗ z −1

2

1 − e−j0.8π z −1

1 − ej0.8π z −1

2

,

1 − e−j0.8π z −1

(57)

(58)

where z0 is a mismatched zero. In the numerical calculations, we chose again β = Px = 1. Fig. 5 shows the amplitude response of the real and the mismatched filters for various angular frequencies defined as △

φ = arg (z0 ). Figures 6 and 7 show, respectively, the phase diagram and the MSE as functions of R and φ. In this example, the roles of the differences between the true and mismatched filters, are emphasized.

Starting with the obvious, observe that the maximal range of rates for which the ferromagnetic region dominates the partition function occurs at φ = 0.8π , as expected. Less trivially, for angular frequencies within the range [0.2π, 0.25π] , the ferromagnetic region is negligible. Looking at Fig. 5, it can be seen that within this range of angular frequencies, the true and the mismatched filters are “almost orthogonal” October 16, 2018

DRAFT

17

1 Rc

2.5

0.9 0.8 ωR =3π/4

2

0.7

R

g

0.6

ωL

1.5 0.5

ωR = π/2

0.4

Re

1

0.3 0.2

ωR = π/4

0.5 R

0.1

g

0 0

2

4

6 R

Fig. 4.

8

10

0

−3

x 10

Example 2: Mismatched MSE as a function of R and ωL .

in the L2 sense, namely, their inner product is almost zero. Accordingly, using the methods in Section VI, it can be easily shown that for orthogonal filters we have that Rg = 0, namely, no ferromagnetic region exists (note that in this example, Rg is never equal to zero since the filters are never orthogonal). Finally, for angular frequencies within the range [0, 0.2π] , the ferromagnetic region returns to play a role. Indeed, Fig. 5 shows that, within this range, the matched and the mismatched filters “share” more similarities (in the sense of larger inner product). Example 4 Let H (z) be given by H (z) = z − 2 cos (0.8π) + z −1 = z · 1 − ej0.8π z −1 1 − e−j0.8π z −1

(59)

H′ (z) = H (z) z −d

(60)

and let the mismatched filter be given as

where d ∈ Z is a mismatched delay. As before, in the numerical calculations, we chose β = Px = 1. October 16, 2018

DRAFT

18

5 Real 0.7 0.5 0.35 0.2 0.3 0.05

4.5 4

Filter Amplitude

3.5 3 2.5 2 1.5 1 0.5 0

Fig. 5.

−3

−2

−1

0 Frequency

1

2

3

Example 3: Amplitude of the real filter and mismatch filters for several phases.

1 0.9 R

Ferromagnetic

c

0.8

Glassy

0.7 Rg

φ

0.6

Paramagnetic

0.5 0.4 0.3

Re

0.2 0.1

R

g

0 0

0.2

0.4

0.6

0.8

1

R

Fig. 6.

Example 3: Phase diagram in the plane of R vs. φ.

October 16, 2018

DRAFT

19

1 0.9

0.9 R

0.8

c

0.8

0.7

0.7 Rg

φ

0.6

0.6

0.5

0.5

0.4

0.4 Re

0.3

0.3

0.2

0.2

0.1

0.1

Rg

0 0

0.2

0.4

0.6

0.8

1

0

R

Fig. 7.

Example 3: Mismatched MSE as a function of R and φ.

Figures 8 and 9 show, respectively, the phase diagram and the MSE as functions of R and d. First, we see that Re is constant, approximately equal to 0.29, which makes sense since Re is given by Z 2π h ′ 2 i 1 ln Px γ0 + H (ω) β dω, Re = 4π 0

(61)

and thus independent of the delay (note that according to (11) γ0 is also independent of the delay). Next, let us take a look at Rd given in (45) Z 2π 1 1 Rd = + βPx Re H′∗ (ω) H (ω) dω 2 2π 0 ′ (ω)|2 β 2 + β |H (ω)|2 P Z 2π + γ |H 0 x ′ 1 H (ω) 2 β + − Px 2 4π 0 2 |H′ (ω)| β + γ0 Z 2π |H′ (ω)|2 β 3 + 2Px β |H (ω)|2 + γ0 1 . − 4π 0 |H′ (ω)|2 β + γ

(62)

0

In contrast to Re , Rd does depend on the delay via the second term, which in the case considered takes

the form Re (H′∗ (ω) H (ω)) = |H (ω)|2 cos (ωd). Actually, in the settings considered, it is easy to show that γ0 = 1/Px = 1, thus obtaining Z 2π Z 2π ′ 1 1 H (ω) 2 dω Rd = Re H′∗ (ω) H (ω) dω − 2π 0 2π 0 October 16, 2018

(63) DRAFT

20

Z 2π ′ ′ H (ω) 2 cos (ωd) dω − 1 H (ω) 2 dω 2π 0 0 Z 2π 1 H′ (ω) 2 [cos (ωd) − 1] dω ≤ 0. = 2π 0 1 = 2π

Z

2π

(64) (65)

Therefore, we obtain that Rd is non-positive, and hence for all φ (except the trivial case of φ = 0) there is a glassy phase. This result is consistent with Figures 8 and 9. More importantly, it can be observed that the MSE vanishes (or equivalently, the ferromagnetic phase dominates the partition function) only in case d = 0, namely, zero delay. This is a reasonable result, as a delay of one sample (linear phase) is enough to cause a serious degradation in the MSE. Actually, for any fixed rate the error is constant, independently of the delay, as one would expect. Finally, note that the MSE is larger in the glassy region than in the paramagnetic region3 . This is also a reasonable result: As the rate increases, and hence more codewords are possible, since the MSE estimator is actually a weighted average (w.r.t. the posterior) over the codewords, the MSE can only decrease (each codeword in the codebook contributes approximately the same estimation error). Accordingly, for small codebooks (low rates) the MSE is larger, since the averaging is performed over “fewer” codewords. V. P ROOF O UTLINE

AND

T OOLS

A. Proof Outline In this section, before getting deep into the proof of Theorem 2, we discuss the techniques and the main steps which will be used in Section VI. Generally speaking, the evaluation of the mismatched partition function, Z ′ (y, λ), for a typical y , essentially boils down to the evaluation of the exponential order of

2 λT X 1 1 ′

Pr y − A X1 − ≈ nǫ (66) 2 β

for every value of ǫ in some range. In case that A′ = I [5, 6], this probability can be calculated fairly easily. Indeed, in this case, the above probability is equivalent to calculating the probability that a randomly chosen vector X on the n-dimensional hypersphere shell would have an empirical correlation coefficient ρ (induced by the constraint ky − xk2 /2 − λT x/β ≈ nǫ) with a given vector y ′ = y + λ/β . Geometrically, this probability is actually the probability that X falls within a cone of half angle arccos (ρ) around y ′ (for more details, see [34, 35]). However, in our case, because of the “interactions”4 between 3 4

Note that the MSE, in contrast to the MMSE, must not be monotonically increasing as a function of the rate. n o 2 In the considered settings, the posterior, is proportional to exp −β ky − A′ xk /2 , and after expansion of the norm, the

exponent includes an “external-field term,” proportional to y H A′ x, and a “pairwise spin-spin interaction term,” proportional to 2

kA′ xk . October 16, 2018

DRAFT

21

10 9

Ferromagnetic

8

Glassy R

e

7

Paramagnetic

d

6 5 4 3 2 1 R

c

0 0

0.2

0.4

0.6

0.8

1

R

Fig. 8.

Example 4: Phase diagram in the plane of R vs. d.

10 0.9 9 0.8

8 R

e

d

7

0.7

6

0.6

5

0.5

4

0.4

3

0.3

2

0.2

1

R

0.1

c

0 0

0.2

0.4

0.6

0.8

1

0

R

Fig. 9.

Example 4: Mismatched MSE as a function of R and d.

October 16, 2018

DRAFT

22

different components of X , which are induced by A′ , the methods in the aforementioned papers are not directly applicable. In our case, the purpose is to estimate the probability that a randomly chosen vector X on the n-dimensional hypersphere shell would fall within the intersection of this hypersphere and the n-dimensional hyperellipsoid (which is induced by the event in (66)). All our attempts to approach this

calculation using the “geometric” route have failed. Thus, we will use a different route. The main idea in our approach is, to “eliminate” the interactions between the different components of X , by passing to the frequency domain. Since A′ is a Toeplitz matrix, according to Szeg¨o’s theorem [36-39], it is asymptotically diagonalized by the discrete Fourier transform (DFT) matrix (if A′ is a circulant matrix then the DFT matrix exactly diagonalizes it). Thus, multiplying both sides of (4) by the √ n−1 DFT matrix, F H = e−j2πml/n / n m,l=0 , we “asymptotically”5 have that ˜ +N ˜ Y˜ = ΣX

(67)

△ △ △ △ ˜ = ˜ = where Σ = diag (σ1 , . . . , σn ), X F H X , Y˜ = F H Y and N F H N . Accordingly, we evaluate

(66), using Pr

(

2 λ ˜T X ˜1 1

˜ 1 ˜ − ΣX ≈ nǫ

−

y 2 β

)

(68)

˜ = F T λ. Now, in order to evaluate (68), it is desirable to estimate the volume6 of the following where λ ˜ ) and δ > 0, we define the conditional δ-type of x ˜ given y ˜ as set: For a given pair of vectors (˜ x, y ( ) k˜ ˜T x ˜ xk2 λ △ y − Σ˜ ˜) = x ˜ ∈ Rn : k˜ Tδ (˜ x|y xk2 − nPx ≤ δ, − − nǫ ≤ δ . (69) 2 β

˜ given y ˜ as it contains all vectors This set is regarded as a conditional type of (wrong) codewords x

which, within δ, have the same energy related to the partition function (8). After calculating the volume of (69), the probability in (68) can then be easily estimated. However, as was previously mentioned, calculating the volume of such a set is a tedious task when approaching it directly. We will use instead ˜ into k bins, each of dimension the following relaxation. We start with partitioning the components of x nb , such that k = n/nb , and we approximate the eigenvalues, which are the diagonal elements of Σ, to

be piecewise constant over these bins. This partition literally means that we transform the original model

5

Rigorously, in the proof, we first assume that A′ is a circulant matrix, and thus (67) is exact for any n. Then, when taking

the limit n → ∞, using Szeg¨o’s theorem, this assumption will be dropped. Finally, note that the assumption of the square summability of the generating sequence {hk } in the theorems presented earlier, is made in order to use Szeg¨o’s theorem. △ R 6 Recall that the volume of a set A ⊂ Rn is defined as Vol {A} = A dx. October 16, 2018

DRAFT

23

in (67) into k subchannels, each having the form Yi,r = σr Xi + Ni , i = (r − 1) nb + 1, . . . , r · nb ,

(70)

for r = 1, . . . , k. With this partitioning in mind, at the final stage of the analysis (after taking the limit n → ∞), we take the limit k → ∞. This partitioning will enable to calculate the desired volume. Then,

using large deviations considerations, the mismatched partition function will be obtained. Finally, in order to derive the MSE, we will use the tools of [5], which are briefly presented in the following subsection.

B. Optimum Estimation Relations - Background and Summary 1) Matched Case: Let X = (X1 , . . . , Xn ) and Y = (Y1 , . . . , Ym ) be two random vectors, jointly distributed according to a given probability function P (x, y). The conditional mean estimator of X 2 ˆ = E {X | Y } is well known to minimize the MSE E Xi − X ˆi based on Y , i.e., X for all i = o n 1, . . . , n. Accordingly, the MMSE in estimating Xi equals to E (Xi − E {Xi | Y })2 , i.e., the expected conditional variance of Xi given Y . More generally, the MMSE error covariance matrix is an n × n

matrix whose (i, j)-th element is given by E {(Xi − E {Xi | Y }) (Xj − E {Xj | Y })}. This matrix can be represented as the expectation (w.r.t. Y ) of the conditional covariance matrix of X given Y , henceforth denoted by cov (X | Y ). In particular, using the orthogonality principle, the MMSE error covariance matrix is given by E {cov (X | Y )} = E XX T − E E {X | Y } E X T | Y .

(71)

E {X | Y = y} = ∇0 g (λ) ln Z (y, λ) E {cov (X | Y )} = E ∇20 ln Z (Y , λ)

(72)

Based on Definition 1, the following relations readily follow

(73)

where for a generic function g, we use ∇0 g (λ) and ∇20 g (λ) to designate ∇λ g (λ)|λ=0 and ∇2λ g (λ) λ=0 , respectively, and ∇λ and ∇2λ denote the gradient and Hessian operators w.r.t. λ, respectively. Finally, it is easy to verify that the following relation holds o n E {cov (X | Y )} = E XX T − E [∇0 ln Z (Y , λ)] [∇0 ln Z (Y , λ)]T ,

(74)

and upon taking the trace of the above equation one obtains △

mmse (X | Y ) = October 16, 2018

n X i=1

o n E (Xi − E {Xi | Y })2 DRAFT

24

=

n X i=1

"

E

Xi2

−E

(

∂ ln Z (Y , λ) ∂λi

2

λ=0

)#

.

(75)

Further relations between information measures and estimation quantities can be found in [5, 6]. 2) Mismatched Case: Consider a mismatched estimator which is the conditional mean of X given Y , based on an incorrect joint distribution P ′ (x, y), whereas the true joint distribution continues to be P (x, y). Then, the following relation holds

△ n T o E cov′ (X | Y ) = E X − E ′ {X | Y } X − E ′ {X | Y }

= E XX T − E P E {X | Y } E ′ X T | Y − E E ′ {X | Y } E X T | Y + E E ′ {X | Y } E ′ X T | Y ,

△

(76)

T

where cov′ (X | Y ) = (X − E ′ {X | Y }) (X − E ′ {X | Y }) . Upon taking the trace of (76), one obtains △

mse (X | Y ) = =

n X

E

i=1 " n X i=1

n

2 o Xi − E ′ {Xi | Y }

∂ ln Z (Y , λ) ∂ ln Z ′ (Y , λ) · ∂λi ∂λi λ=0 λ=0 )# ( ∂ ln Z ′ (Y , λ) 2 . +E ∂λi

E Xi2 − 2E

(77)

λ=0

VI. P ROOF

OF

T HEOREM 2

For a given y , the mismatched partition function is given by7 Z ′ (y, λ) =

X

x∈C

i h

2 e−nR exp −β y − A′ x /2 + λT x

h i

2 = e−nR exp −β y − A′ x0 /2 + λT x0 i h X

2 + e−nR exp −β y − A′ x /2 + λT x x∈C\{x0 } △

= Zc′ (y, λ) + Ze′ (y, λ)

7

(78) (79) (80) (81)

Note that there should be a normalization factor of (2π/β)−n/2 in (78). Nonetheless, since this constant is independent of

λ, it has no effect on the MSE (which is obtained by the gradient of ln Z ′ (y, λ) w.r.t. λ). Hence, for simplicity of notation, it is omitted. October 16, 2018

DRAFT

25

where without loss of generality, the transmitted codeword is assumed to be x0 , and Zc′ (y, λ) and Ze′ (y, λ) are the partial partition functions induced by the correct codeword and the wrong codewords, 2

2

respectively. By the law of large numbers (LLN), ky − A′ x0 k ≈ k(A − A′ ) x0 k + n/β , and therefore, with high probability

2 n β ′ T

A − A x0 + e exp − + λ x0 2 β ( ) ! 2 1 β k(A − A′ ) x0 k T = exp −n R + + + λ x0 . 2 2n

· Zc′ (y, λ) =

−nR

(82) (83)

More precisely, for any ǫ > 0, (

) ! 2 1 β k(A − A′ ) x0 k T exp −n R + + + ǫ + λ x0 ≤ Zc′ (y, λ) 2 2n ( ) ! 2 1 β k(A − A′ ) x0 k ≤ exp −n R + + − ǫ + λT x0 2 2n

with probability tending to one as n → ∞. As for Ze′ (y, λ), we have Z ′ −nR Ze (y, λ) = e N (ǫ) e−nβǫ dǫ

(84)

(85)

R

where △

N (ǫ) =

M −1 X i=1

) 2 λT xi ky − A′ xi k − ≈ nǫ , 1 xi : 2 β (

(86) 2

to wit, N (ǫ) is the number of codewords {xi } in C \ {x0 } for which ky − A′ xi k /2 − λT xi /β ≈ nǫ, namely, between nǫ and n (ǫ + dǫ). We proceed in two steps: First, the typical exponential order of N (ǫ) is computed, and then (85) is calculated. Step 1: Given y , N (ǫ) is a sum of (M − 1) i.i.d. Bernoulli random variables and therefore, its expected value is given by M −1 X

(

) 2 ky − A′ X i k λT X i Pr E {N (ǫ)} = − ≈ nǫ 2 β i=1 ( ) 2 λT X 1 ky − A′ X 1 k nR − ≈ nǫ . = e − 1 · Pr 2 β

(87)

(88)

Assuming that A′ is a circulant matrix8 , it is known that the discrete Fourier transform (DFT) matrix diagonalizes it [36-39], and thus multiplying both sides of equation (4) by the DFT matrix, F H , one 8

Recall that this assumption is only an intermediate step in the analysis, and will be dropped later on. Alternatively, instead of

this assumption, one could use the spectral decomposition theorem, to find an orthonormal basis which diagonalizes the matrix A′ , and project (4) on this basis, to obtain the form of (89). October 16, 2018

DRAFT

26

obtains ˜ +N ˜ Y˜ = Σ′ X

(89)

△ △ △ △ ˜ = ˜ = F H X , Y˜ = F H Y and N F H N . Since a unitary operator where Σ′ = diag (σ1′ , . . . , σn′ ), X √ ˜ is still uniformly drawn on the n-hyperesphere with radius nPx (as in the is applied on X , then X

˜ has the same statistics as before, namely, its components are i.i.d. complex original setting). Similarly, N

Gaussian random variables with zero mean and variance 1/β . For simplicity of notation, in the following, the “tilde” sign over the various variables will be omitted, keeping the original notation. Therefore, instead of evaluating (88), the exponential order of Pr

(

) 2 ky − Σ′ X 1 k λT X 1 − ≈ nǫ , 2 β

(90)

will be evaluated, where F T λ 7→ λ9 . For a given pair of vectors (x, y) and δ > 0, define the conditional δ-type of x given y as △

Tδ (x | y) =

(

x ∈ Rn : kxk2 − nPx ≤ δ,

The following lemma is proved in Appendix A.

) ky − Σ′ xk2 λT x − − nǫ ≤ δ . 2 β

(91)

△

Lemma 1 Let k and nb be natural numbers such that k = n/nb 10 . Define the sets G1,δ = △

{δ · i : i = 0, 1, . . . , ⌈kPx /δ⌉} and G2,δ = {δ · i : i = − ⌈k/δ⌉ , . . . , −1, 0, 1, . . . , ⌈k/δ⌉}. Also, let △ Tˆδ (x | y) =

where

Ś

P

[

δ

∩RδP

k ą

δ Bm (Pm , ρm )

(92)

m=1

designates a Cartesian product, and △ δ Bm (Pm , ρm ) =

(

x∈R

nb

2 mn

b : x(m−1)nb +1 − nb Pm ≤ δ,

( ) ) q X σi′ y¯i xi − nb ρm P˜y,m P˜σ,m ≤ δ Re

(93)

i∈Im

9

Note that F T λ may be a complex quantity (in contrast to λ). This fact will be taken into account later on.

10

Without loss of generality, it is assumed that nb (bin length) is a divisor of n, and that the k various bins have equal sizes.

October 16, 2018

DRAFT

27 △

△

λi βσi′ ,

where Im = [(m − 1) nb + 1, mnb ], y¯i = yi∗ +

△ P˜y,m =

and11

1 nb

P

i∈Im

△ |¯ yi |2 , P˜σ,m =

1 nb

P

i∈Im

k ) 1 X k P = P ∈ G1,δ : Pi − Px ≤ δ k i=1 ) ( q k 1 X △ k ρi P˜y,i P˜σ,i − ρ˜ ≤ δ RδP = ρ ∈ G2,δ : k δ △

(

|σi′ xi |2 ,

(94)

(95)

i=1

k and G k are the k th Cartesian power of G where G1,δ 1,δ and G2,δ , respectively, and 2,δ 2 1 Pn ′ △ n i=1 |σi xi | + Py − 2ǫ ρ˜ = 2 P △ where Py = n1 ni=1 |yi |2 . Then,

Tˆδ/k (x | y) ⊆ Tδ (x | y) ⊆ Tˆδ (x | y) .

(96)

(97)

Next, the eigenvalues, {σi′ }i , are approximated to be piecewise constant over the various bins. At the final stage of the analysis (after taking the limit n → ∞), we will take the limit k → ∞ so that this

′ |2 P , and (with approximation becomes superfluous. Accordingly, under this approximation, P˜σ,m = |σm m

abuse of notation) Tˆδk (x | y) =

P

[

δ

∩RδP

k ą

δ Bm (Pm , ρm )

(98)

m=1

where now △ δ Bm (Pm , ρm ) =

(

x∈R

nb

2 mn

b : x(m−1)nb +1 − nb Pm ≤ δ,

( ) ) q X 2 ′ ′ | P ≤δ y¯i xi − nb ρm P˜y,m |σm Re σm m

(99)

i∈Im

and △ RδP =

(

ρ∈

k G2,δ

where 1 △ k

ρ˜ =

11

) k 1 X ′ q σi ρi P˜y,i Pi − ρ˜ ≤ δ , : k

(100)

i=1

′ 2 i=1 |σi | Pi

Pk

2

+ Py − 2ǫ

.

(101)

The purpose of the subscript symbol in RδP is to emphasize the dependence of it on P. More precisely, these sets should

be understood as joint-power-correlation allocations, which are “living” in the intersection P δ ∩ RδP . Accordingly, Pm and ρm are the power and correlation constraints within the mth bin, respectively. October 16, 2018

DRAFT

28

In the following, the volume of Tδ (x | y) is evaluated. On the one hand, using Lemma 1, one obtains that n o Vol {Tδ (x | y)} ≤ Vol Tˆδk (x | y) ) ( k X ą δ ≤ Bm (Pm , ρm ) Vol

(102) (103)

m=1

P δ ∩RδP

≤ Nδ,k · max Vol δ δ P ∩RP

(

k ą

δ Bm (Pm , ρm )

m=1

)

(104)

where the second inequality follows for the union bound, and Nk,δ is a constant depending on k and δ (but not on n). This constant can be roughly bounded by kP + 2δ k k + δ k δ δ x Nδ,k ≤ P RP ≤ . 2· δ δ

(105)

On the other hand,

n o k Vol {Tδ (x | y)} ≥ Vol Tˆδ/k (x | y) ) ( k ą δ/k ≥ max Vol Bm (Pm , ρm ) . P δ/k ∩Rδ/k P

(106) (107)

m=1

The following lemma is proved in Appendix B. Lemma 2 For every m = 1, . . . , k and ν > 0, n o nn nn o o b b 2 δ 2 ln πeϑo,u ≤ Vol Bm (Pm , ρm ) ≤ exp ln πeϑo . (1 − ν) exp 2 2

(108)

where

ϑ2δ,+ = Pm + δ − Pm (ρm − δ)2

(109)

ϑ2δ,− = Pm − δ − Pm (ρm + δ)2 .

(110)

and

In particular, n o 1 1 δ ln Vol Bm (Pm , ρm ) = ln πePm 1 − ρ2m . δ→0 nb →∞ nb 2 lim lim

Now,

Vol

October 16, 2018

(

k ą m=1

)

δ Bm (Pm , ρm )

=

k Y

m=1

n o δ Vol Bm (Pm , ρm ) .

(111)

(112)

DRAFT

29

Whence, using Lemma 2, (112), (104) and (107), one obtains that ) ( k nb X 2 n/2 ln ϑδ/k,− Vol {Tδ (x | y)} ≥ max (πe) exp 2 P δ/k ∩Rδ/k P

(113)

m=1

and Vol {Tδ (x | y)} ≤ Nδ,k · max (πe)n/2 exp δ δ

(

) k nb X ln ϑ2δ,+ . 2

(114)

1 1 lim lim ln Vol {Tδ (x | y)} = ln (πe) + max n→∞ P∩RP δ→0 n 2

(

) k hX ln Pm 1 − ρ2m 2

(115)

P ∩RP

Thus,

where

m=1

m=1

) k 1X Pi = Px P = P ∈R : k i=1 ( ) q k 1 X ′ △ k RP = ρ ∈ R : σi ρi P˜y,i Pi = ρ˜ . k △

(

k

(116)

(117)

i=1

Finally, the probability in (90), is given by ( ) k 2 T ′ Vol Tδ (x | y) λ X1 ky − Σ X 1 k 1 1 n o , − ≈ nǫ = lim lim lim ln lim ln Pr n→∞ n→∞ n h→0 δ→0 2 β n n Vol T

(118)

x,δ

n is the set of n-dimensional x-complex vectors with norm in which Tx,δ

√

nPx .

n is given by Lemma 3 The volume of Tx,δ

n 1 1 ln Vol Tx,δ = ln (πePx ) . δ→0 n→∞ n 2 lim lim

(119)

Proof: Readily follows by using almost the same proof of Lemma 2 (see Appendix B). Thus, applying Lemma 3 on (118), one obtains12 ) ( 2 n o λT X 1 ky − Σ′ X 1 k · ˜ (ǫ) − ≈ nǫ = exp nΓ Pr 2 β with probability tending to one as n → ∞, and ( ) k X h P △ m 2 ˜ (ǫ) = lim max Γ ln 1 − ρm . h→0 P,RP 2 Px

(120)

(121)

m=1

12

Note that at this stage, using once again the dominated convergence theorem (DCT) [40] and Szeg¨o’s theorem [36-39], we △

can refine the bin sizes by taking the limit h = nb /n = 1/k → 0, and then to solve a variational problem. However, it turns out that it is better to refine the bin sizes only at the last stage of the analysis. October 16, 2018

DRAFT

30

Therefore, using (88) n o · ˜ (ǫ) . E {N (ǫ)} = exp n R + Γ

(122)

n o △ ˜ (ǫ) > 0 . E = ǫ∈R: R+Γ

(123)

To finish step 1, the following lemma is proposed and proved in Appendix C13 . Lemma 4 Let

Then,

1 ln N (ǫ) = n→∞ n −∞, lim

with probability (w.p.) 1.

˜ (ǫ) , R + Γ

ǫ∈E

(124)

else

Step 2: Using Lemma 4, (85), and Varadhan’s theorem [41], one obtains that [42, 33, Ch. 2], n o · ˜ (ǫ) − βǫ Ze′ (Y , λ) = e−nR max exp n R + Γ ǫ∈E n o ˜ , = exp n max Γ (ǫ) − βǫ ǫ∈E

(125) (126)

namely, w.p. 1,

n o ln Ze′ (Y , λ) ˜ (ǫ) − βǫ . = max Γ n→∞ ǫ∈E n lim

(127)

Let Γ (ǫ) be defined as in (121), but without the limit over h. It is verified in Appendix D that the maximization and the limit over h can be interchanged, namely, (127) can be rewritten as follows14 ln Ze′ (Y , λ) ∼ lim max {Γ (ǫ) − βǫ} h→0 ǫ∈E n

(128)

with probability tending to one. For simplicity of notation, in the following, the notion of typical sequences is used to describe an event that is happening with high probability. For example, we say that for a typical 13

˜ (ǫ) > 0, then the energy level ǫ will be “typically” populated Lemma 4 simply states that, if we chose ǫ such that, R + Γ

with an exponential number of codewords, concentrated very strongly around its mean E {N (ǫ)}. Otherwise (which means that E {N (ǫ)} is exponentially small), the energy level ǫ will not be populated by any codewords “typically”. 14

Another approach to “handle” the limit over h is, to first prove the theorem for a linear system whose frequency response is

a staircase function (namely, “ignoring evaluate” the limit over h in (118)). Then, using the fact that every frequency response can be approximated arbitrarily well by a sequence of staircase functions with sufficiently small spacing between jumps (Szeg¨o’s theorem), the main theorem is proved. Note that (128) literally means that the partition function for any transfer function is obtained via a limit (w.r.t. h) of a sequence of partition functions corresponding to staircase functions with spacings h. October 16, 2018

DRAFT

31

realization of y , Ze′ (y, λ) is given by the right hand side of (127), with the meaning that it happens with probability tending to one as n → ∞. Also, in the following, in order not to drag the limit over h, it will be omitted and then reverted when it has a role. Next, an explicit expression for Ze′ (y, λ) is derived. Based on (116), (117), and (121), Γ (ǫ) can be rewritten as max

{Pi }ki=1 ,{ρi }ki=1

s.t.

1 k

k hX Pm 2 ln 1 − ρm 2 Px m=1

k X ′ q ′ 2 σm ρm Pm P˜y,m − 1 |σm |2 Px − 1 σm Pm − 1 2 2 2β

m=1

!

= −ǫ

k 1 X Pm = Px . k

(129)

m=1

Proposition 1 Let {µi }ki=1 be a vector of real scalars such that into

P

i µi

= k. Then, (129) can be transformed

k hX Pm 2 max ln 1 − ρm Px {Pi }ki=1 ,{ρi }ki=1 ,{µi }ki=1 2 m=1 ′ q σi ρi Pi P˜y,i − 1 |σi |2 Px − 1 σi′ 2 Pi − 1 = −µi ǫ, i = 1, . . . , k s.t. 2 2 2β k k 1 X 1 X Pm = Px ; µm = 1. k k m=1

(130)

m=1

Proof of Proposition 1: Given a solution of (130), it is to verify that it is feasible for the optimization ∗ , ρ∗ }, of (129), by taking problem given by (129). Conversely, given a solution, {Pm m q 2 1 1 1 ′ | ρ∗ ′ 2 ∗ ∗ ˜ |σm m Pm Py,m − 2 |σm | Px − 2 |σm | Pm − 2β ∗ µm = − , ǫ

(131)

∗ , ρ∗ , µ∗ } is feasible for (130). Thus, the two problems are equivalent. it can be seen that {Pm m m

Using the first constraint in (130), the optimization problem in (130) can be transformed into 2 2 1 1 k ′ |2 P + 1 − µ ǫ X |σ | P + |σ m m x m Pm h m 2 2β 2 q ln 1 − max Px {Pi }ki=1 ,{µi }ki=1 2 ′ | m=1 |σm Pm P˜y,m s.t.

k k 1 X 1X Pi = Px ; µm = 1. k k i=1

(132)

m=1

Therefore, for a typical realization of the vector y , Ze′ (y, λ) is given by ln Ze′ (y, λ) ∼ max max P E ,{µi }∈Mk n October 16, 2018

DRAFT

32

1 k ′ |2 P + X |σm |2 Px + 12 |σm m h ln Pm 1 − 2 q Px 2 ′ | m=1 |σm Pm P˜y,m

1 2β

− µm ǫ

in which

△

Mk =

(

2 − 2βµm ǫ ,

) k X 1 µi = 1 . (µ1 , . . . , µk ) ∈ Rk : k

(133)

(134)

i=1

Using the subadditivity property of the maximum norm one obtains (for typical y ) k ln Ze′ (y, λ) hX . max n P,{µi } 2 m=1 2 1 1 ′ 2 P m 2 |σm | Px + 2 |σm | Pm + q max ln 1− Px E ′ | |σm Pm P˜y,m

1 2β

− µm ǫ

2 − 2βµm ǫ.

(135)

Note that except the subadditivity, in the above optimization the maximization is carried over {µi } ∈ Rk rather than {µi } ∈ Mk (as it should be), hence increasing further the bound. Changing the variables, µm ǫ 7→ ǫm , the values of ǫm for which the derivative vanishes are the solutions of the following equation 1 ′ |2 P − ǫm + 12 |σm |2 Px + 21 |σm 2 2β m (136) 2 ! − 2β = 0, 1 ′ |2 P + 1 −µ ǫ |σm |2 Px + 21 |σm m m 2 ˜ 2 2β ′ √ |σm | Py,m Pm 1 − Pm P˜y,m

′ | |σm

which after simple algebra, boils down to a quadratic equation whose solutions are q ′ |2 βP + |σ |2 βP + ′ |2 P ˜y,m Pm 2 + |σm 1 + 4β 2 |σm m m x ∗ ǫm,1 = 2β q ′ |2 βP + |σ |2 βP − ′ |2 P ˜y,m Pm 1 + 4β 2 |σm 2 + |σm m m x ∗ ǫm,2 = . 2β

(137)

(138)

Substitution of ǫ∗m,1 in the objective function of (135) reveals that ǫ∗m,1 is not in the objective function domain, and thus only ǫ∗m,2 is considered. In the following, the case ǫ∗m,2 ∈ E is first analyzed. Substituting ǫ∗m,2 in (135), one obtains (for typical y ) k ′ X Pm h ln Ze (y, λ) . max ln 1− Px n P,{µi } 2

1 2

2

|σm | Px +

1 2

′ |2 P |σm m

′ | |σm

m=1

+ p Pm Py,m

1 2β

−

ǫ∗m,2

! 2 − 2βǫ∗m,2 .

(139)

Let γ be the Lagrange multiplier associated with the power constraint. Then, the derivative of the objective function in (139) w.r.t. Pm is given by ′ 2 β+ − σm October 16, 2018

q ′ |2 β 2 P ∗ P ˜ 1 − 1 + 4 |σm m y,m ∗ 2Pm

− γ = 0,

(140)

DRAFT

33

which vanishes at ′ |2 β 1 + β P ˜y,m + γ |σm ∗ Pm = , (141) 2 ′ |2 β + γ |σm P independently of ǫ∗m,2 , and γ is chosen such that i Pi = kPx . Therefore (for typical y ), 2 2 ∗ 2 1 1 1 k ′ ∗ ∗ ′ X h Pm ln Ze (y, λ) 2 |σm | Px + 2 |σm | Pm + 2β − ǫm q . ln 1− − 2βǫ∗m Px n 2 ′ | ∗P ˜y,m m=1 |σm Pm △

= Fpar ,

(142)

△

∗ ). Hence, an upper bound, F , on ln Z ′ (y, λ) /n is obtained. On the other hand, where ǫ∗m = ǫ∗m,2 (Pm par e

by taking ∗

ǫ = µ∗m

Pk

∗ i=1 ǫi

(143)

k kǫ∗ = Pk m , ∗ i=1 ǫi

(144)

and (141), this bound is achieved. Summarizing the above results, Ze′ (y, λ) is given by (for typical y ) Fpar , Γ (ǫ∗ ) + R > 0 ′ ln Ze (y, λ) . (145) ∼ n ∗ Γ (ǫ ) + R ≤ 0 Γ (ǫs ) − βǫs ,

Since at the final step of the calculation, the partition function (or its derivative w.r.t. λ) is evaluated at

λ = 0, the range Γ (ǫ∗ ) + R > 0 should be computed at the vicinity of λ = 0. First, note that P˜y,i , given

in Lemma 1, can be written as P˜y,i

1 2 1 = |σi |2 Px + + Re β β nb

1 σi′

Hence, substituting λ = 0 in (141), one obtains ∗ Pm |λ=0

where γ0 is chosen such that

X

r∈i-th bin

+

1 1 2n ′ 2 β |σi | b

′ |2 β 1 + β P ˜y,m + γ0 |σm λ=0 = 2 ′ |2 β + γ |σm 0 ′ |2 β 2 + β |σ |2 P |σm m x + γ0 = 2 ′ |2 β + γ |σm 0 kPx =

X m

October 16, 2018

y r λr

!

∗ Pm |λ=0 .

X

r∈i-th bin

|λr |2 .

(146)

(147)

(148)

(149)

DRAFT

34 ∗ ∗ ∗| Substitution of Pm λ=0 and ǫm ( Pm |λ=0 )|λ=0 in Γ (ǫ), reveals that k ′ |2 P ∗ | ∗| 1 |σ |2 Px + 12 |σm h X Pm m λ=0 + λ=0 1 − 2 m r Γ (ǫ∗ )|λ=0 = ln 2 Px ′ | ∗| ˜y,m m=1 |σm Pm P λ=0

1 2β

λ=0

2 − ǫ∗m |λ=0 ,

(150)

and that ǫ∗m |λ=0 =

2+

′ |2 β |σm

∗| Pm λ=0

q ′ |2 P ∗| ˜y,m Pm + |σm | βPx − 1 + 4β 2 |σm λ=0 2

2β

Then, substituting (151) in the mth term of the sum in (150), it becomes r ′ |2 β 2 P ∗ | ˜y,m − 1 1 + 4 |σ P m m λ=0 λ=0 , ln 2 2 ˜ ′ 2Px |σm | β Py,m

.

(151)

(152)

λ=0

which after substitution of (147), boils down to

1 . ′ |2 P β Px γ0 + |σm x

(153)

Hence, substituting (153) in (150), one obtains Γ (ǫ∗ )|λ=0 = −

Accordingly, the region

Γ (ǫ∗ ) +

(154)

m=1

R ≤ 0 is equivalent to

R≤

and hence

k ′ 2 hX Px β . ln Px γ0 + σm 2

k △ ′ 2 hX Px β = ln Px γ0 + σm Re , 2

(155)

m=1

Fpar ,

ln Ze′ (y, λ) ∼ n Γ (ǫs ) − βǫs ,

R > Re .

(156)

R ≤ Re

The next step in the evaluation of Z ′ (y, λ), is taking into account Zc′ (y, λ). To this end, the following relation is used ln e−na + e−nb = − min (a, b) . lim n→∞ n

(157)

Accordingly, within the range R > Re , for a typical code and realizations of the vector y , we search rates for which Zc′ (y, 0) > Ze′ (y, 0), namely, ln Zc′ (y, 0) > Fpar |λ=0 . n October 16, 2018

(158)

DRAFT

35

Recall that Fpar |λ=0 is given by Fpar |λ=0 = Γ (ǫ∗ ) − βǫ∗ =−

and that

(159)

k o ′ 2 h Xn β + 2βǫ∗m | ln Px γ0 + Px σm λ=0 , 2

(160)

m=1

2 1 β ln Zc′ (y, 0) ′

A − A x0 =− R+ + n 2 2n = −R −

Hence the inequality in (158) becomes

k 2 1 hβ X ′ − σm − σm Px . 2 2

(162)

m=1

k k ′ 2 1 hβ X ′ hX σm − σm 2 Px ln Px γ0 + Px σm β − − R< 2 2 2 m=1

h + 2

(161)

m=1

k X

m=1

q ′ 2 2 ∗ ′ |2 P ∗| ˜y,m Pm 2 + σm β Pm |λ=0 + |σm | βPx − 1 + 4β 2 |σm λ=0

(163)

k k ′ X ′ 2 1 hX Re σm∗ σm Px = ln Px γ0 + Px σm β + + hβ 2 2 m=1

m=1

+

Substituting

hβ 2

∗, Pm

k X

m=1

′ 2 ∗ σm ( Pm |

λ=0 − Px ) −

h 2

k X

m=1

q ′ |2 P ∗| ˜y,m Pm 1 + 4β 2 |σm λ=0 .

(164)

given in (148), in the last two terms of (164), one obtains R<

k k ′ X ′ 2 1 hX β + + hβ Re σm∗ σm Px ln Px γ0 + Px σm 2 2 m=1 m=1 2 2 ′ k hβ X ′ 2 |σm | β 2 + β |σm | Px + γ0 + σm β − Px 2 2 2 ′ | β+γ m=1 |σm 0 2 ′ 2 k h X |σm | β 3 + 2Px β |σm | + γ0 − . 2 |σ ′ |2 β + γ m=1

m

(165)

0

Refining the bin sizes by taking the limit h → 0, while using Szeg¨o’s theorem, it is shown in Appendix E that (165) becomes △

R < Re + Rd = Rc

(166)

where △

Re = October 16, 2018

1 4π

Z

2π 0

2 ln Px γ0 + H′ (ω) Px β dω,

(167) DRAFT

36

and 1 △1 Rd = + βPx 2 2π +

1 4π

−

1 4π

Z

2π

Re H′∗ (ω) H (ω) dω 0 2 2 ′ Z 2π |H (ω)| β 2 + β |H (ω)| Px + γ0 ′ H (ω) 2 β − Px 2 0 |H′ (ω)|2 β + γ0 Z 2π |H′ (ω)|2 β 3 + 2Px β |H (ω)|2 + γ0 . 0 |H′ (ω)|2 β + γ0

(168)

Hence, within the range R > Re , Zc′ (y, 0) > Ze′ (y, 0) (again, typical code and realization vector y ) for {R < Rc } ∩ {R > Re } = {Re < R < Re + Rd = Rc } ,

(169)

which is a non-empty set if Rd is positive. Next, within the range R ≤ Re , Zc′ (y, 0) > Ze′ (y, 0) for rates which satisfy (for typical code and realization of y ) ln Zc′ (y, 0) > Γ (ǫs ) − βǫs |λ=0 . n

(170)

First, recall that ǫs satisfies R + Γ (ǫs ) = 0, and hence Γ (ǫs ) = −R. Thus, (170) can be rewritten as −R −

which is equivalent to

k 2 1 hβ X ′ − σm − σm Px > −R − βǫs |λ=0 , 2 2

ǫs |λ=0 >

Applying Γ (·) to (172), one obtains

k 2 h X ′ 1 + σm − σm Px . 2β 2 k 2 1 h X ′ + σm − σm Px 2β 2 m=1

and hence R<−Γ

Γ

m=1

max k

{Pi }i=1 ,{ρi }ki=1

s.t.

October 16, 2018

k 2 1 h X ′ + σm − σm Px 2β 2

k 2 h X ′ 1 + σm − σm Px 2β 2

1 k

m=1

!

(172)

m=1

Γ ( ǫs |λ=0 ) > Γ

where

(171)

m=1

!

!

,

(173)

= Rg ,

(174)

△

λ=0

=

k hX Pm ln 1 − ρ2m 2 Px m=1

k ′ X ′ q ′ 2 ∗ σm ρm Pm P˜y,m − 1 σm (Pm − Px ) − Re σm σm Px 2

m=1

!

=0

DRAFT

37 k 1 X Pm = Px . k

(175)

m=1

To conclude, Z ′ (y, λ) is given by (for a typical code and y ) Fpar , R > Re ∨ Rc ln Z ′ (y, λ) Rg < R ≤ Re ∼ Γ (ǫs ) − βǫs , n ′ {Re ≤ R ≤ Rc } ∪ {R ≤ Rg ∧ Re } ln Zc (y, λ) /n, △

(176)

△

where a ∨ b = max (a, b) and a ∧ b = min (a, b). In the following, the relation Rd > 0 =⇒ Re < Rg ,

(177)

is verified. Recall that Rd follows from the requirement that 2 β 1 ln Zc′ (y, 0) ′

A − A x0 − R+ + = 2 2n n

(178)

≥ Fpar |λ=0

(179)

= Γ (ǫ∗ ) − βǫ∗ |λ=0 = −Re − βǫ∗ |λ=0 ,

(180)

which can be rewritten as ∗

R ≤ Re + βǫ |λ=0 −

and thus Rd is given by ∗

Rd = βǫ |λ=0 −

Accordingly, Rd > 0 is equivalent to βǫ∗ |λ=0 >

2 1 β ′

+ A − A x0 , 2 2n

2 β 1 ′

A − A x0 . + 2 2n

β 1

A − A′ x0 2 . + 2 2n

(181)

(182)

(183)

Now, within the range R ≤ Re , Zc′ (y, 0) ≥ Ze′ (y, 0) if (172) βǫs |λ=0 >

β 1

A − A′ x 0 2 . + 2 2n

(184)

However, R ≤ Re is equivalent to ǫ∗ ∈ / E , and thus ǫs |λ=0 ≥ ǫ∗ |λ=0 . Therefore, if Rd > 0, the following holds β 1

A − A′ x 0 2 . βǫs |λ=0 ≥ βǫ∗ |λ=0 > + 2 2n October 16, 2018

(185)

DRAFT

38

Whence, (184) holds true within the whole region R ≤ Re , and therefore Re < Rg . Thus, for Rd > 0, Z ′ (y, λ) becomes (for a typical code realization y ) ln Z ′ (y, λ) Fpar , ∼ n ln Z ′ (y, λ) /n, c

R > Rc .

(186)

R ≤ Rc

If however, Rd < 0, then Rg ≤ Re , and hence (for a typical code realization y ) Fpar , R > Re ln Z ′ (y, λ) ∼ −R − βǫs , Rg < R ≤ R e . n ln Z ′ (y, λ) /n, R ≤ Rg c

(187)

Recall that ǫs is the solution of the equation

Γ (ǫs ) + R = 0,

(188)

where Γ (ǫs ) is given by max

{Pi }ki=1 ,{ρi }ki=1

s.t.

1 k

k Pm hX 2 ln 1 − ρm 2 Px m=1

k X ′ q ′ 2 σm ρm Pm P˜y,m − 1 |σm |2 Px − 1 σm Pm − 1 2 2 2β

m=1

!

= −ǫs

k 1 X Pm = Px . k

(189)

m=1

Similarly to the optimization problem in (132), the above maximization problem can be rewritten as 2 2 1 1 k ′ |2 P + 1 − µ ǫ |σ | P + |σ m x m m s h X Pm m 2 2β 2 q ln max 1 − Px {Pi }ki=1 ,{ρi }ki=1 2 ′ m=1 σm Pm P˜y,m s.t.

k k 1 X 1 X Pm = Px , µm = 1. k k m=1

(190)

m=1

Accordingly, the derivative of the objective function w.r.t. Pm vanishes at ′ |2 α ˜ 4α1 ǫ2s + |σm 2 Py,m α2 + 2ǫs ∗ Pm = 2 ′ |2 α + 2α ǫ |σm 2 1 s

(191)

and the derivative w.r.t. µm it vanishes at 2 ′ |2 α ǫ ˜ 4α21 ǫ2s 1 + Px β |σm |2 + 4 |σm 1 s α2 − β Py,m α2 + βǫs + Px βα2 |σm | µ∗m = 2 ′ |2 α + 2α ǫ 2βǫs |σm 2 1 s October 16, 2018

DRAFT

39

2 ′ |4 α ˜ |σm 2 α2 − β Py,m α2 + 2βǫs + Px βα2 |σm | + (192) 2 ′ |2 α + 2α ǫ 2βǫs |σm 2 1 s P P ∗ ∗ , and α is chosen such that k = where α1 is chosen such that kPx = m Pm 2 m µm . Substituting the

above maximizers in the objective function one obtains k 2ǫs hX ln Γ (ǫs ) = . ′ |2 α + 2P α ǫ 2 P |σ x 2 x 1 s m m=1

For completeness, a closed-form expression for Rg is derived. Based on (174) ! k 2 h X ′ 1 + σm − σm Px Rg = − Γ . 2β 2 m=1

(193)

(194)

λ=0

Using (193), and upon taking the limit h → 0 (while using Szeg¨o’s theorem, as was done in (168)) Z 2π 1 2˜ ǫ dω (195) Rg = − ln 4π 0 Px |H′ (ω)|2 α ˜ 2 + 2Px α ˜ 1 ˜ǫ where α ˜ 1 and α ˜2 solve the simultaneous equations h i 2 1 2 + |H′ (ω)|2 α Z 2π 4˜ |H (ω)| P + α ǫ ˜ ˜ α ˜ + 2˜ ǫ x 1 2 2 β 1 dω = Px 2 2π 0 |H′ (ω)|2 α ˜ 2 + 2˜ α1 ǫ˜ 2 1 + P β |H (ω)|2 + 4 |H′ (ω)|2 α 2˜ Z 2π 4˜ ˜ 1 ǫ˜2 β + 2 |H′ (ω)|4 α ˜ 2 ǫ˜β ǫ α x 1 1 dω = 1, 2 2π 0 2 ′ 2β˜ ǫ |H (ω)| α ˜ 2 + 2˜ α1 ˜ǫ

(196)

(197)

and

ǫ˜ =

1 Px + 2β 4π

Z

2π 0

′ H (ω) − H (ω) 2 dω.

(198)

Obtaining Z ′ (y, λ), using the tools presented in Subsection V-B, the MSE is now derived. The MSE estimator of the ith component (chip) of x′ , within the q th bin, is given by the derivative of Z ′ (y, λ) w.r.t. λqi evaluated at λ = 015 . The derivative of Fpar is given by ( k ∂Fpar hX Px ∂γ =− 2 ∂λqi λ=0 2 Px γ0 + Px σ ′ β ∂λqi l=1

15

A very similar analysis applies also to the derivative

weights proportional to EN (ǫ)e

−βǫ

l

∂ ∂λi

λ=0

) ∂ψl + . ∂λqi λ=0

(199)

ln Z (y, λ), which is essentially a weighted average over xi with

for ǫ ∈ E . Thus, the exponentially dominant weight is due to the term that maximizes

the exponent [5, 6]. Hence, in this case, the commutativity between the derivative w.r.t. λ and the limit n → ∞ is legitimate. Another approach to justify the interchange of the order of these operations is to use well-known results (for example, [43, Ch. 16],[44, 45]) on functional properties of a limit function, which are applicable in our case due to the uniform convergence of the various relevant terms (see Appendix D). October 16, 2018

DRAFT

40 △

Let xq = ∂γ/∂λqi |λ=0 . Using (141), one obtains 2 σq′ 2 β 2 ∂ P˜y,q σq′ 2 + γ0 ∗ + x q ∂Pq ∂λqi λ=0 = 4 ∂λqi λ=0 σ ′ 2 + γ0 q 2 2 + γ0 x q 2 σq′ + γ0 σq′ β 1 + β P˜y,q λ=0 , − 4 2 σq′ + γ0 and for l 6= q

∂Pl∗

∂λqi λ=0

=

|σl′ |2 + γ0

where by using (146)

2

+ γ0 xq xq − 2 |σl′ |2 + γ0 |σl′ |2 β 1 + β P˜y,l λ=0 4 2 σ ′ + γ0 l ∂ P˜y,q 2 yqi = ∂λqi βσq′ nb λ=0 = 1 + |σq |2 Px β. β P˜y,q

(203)

Pr∗ |λ=0 = kPx , it follows that k k X ∂Pr∗ ∂ X ∗ = Pr 0= ∂λqi ∂λqi r=1 r=1 λ=0 λ=0 2 ′ 2 2 ∂ P˜y,q σq β σq′ 2 + γ0 ∂λqi λ=0 = 4 σq′ 2 + γ0 2 ′ |2 β 1 + β P ′ |2 + γ ′ |2 + γ ˜ k + γ |σ |σ − 2 |σ X 0 y,r 0 0 r r r λ=0 , + xq 4 2 ′ r=1 |σr | + γ0

and thus

P

r

′ 2 2 ∂ P˜y,q ′ σq β 2σq∗ β ∂λqi λ=0 xq = = 2 2 yqi , 2 ′ σq′ 2 + γ0 C n b σ q + γ0 C

where △

C=

k X r=1

+ γ0 |σr′ |2 + γ0 − 2 |σr′ |2 β 1 + β P˜y,r λ=0 . 3 2 ′ |σr | + γ0

λ=0

(204)

(205)

(206)

Next, ∂ψl /∂λqi |λ=0 , is calculated. Using the definition of ψm in (11) one obtains ∗ ∂Pq∗ 2 σ ′ 2 ∂Py,q ∗ ˜ ∗ 2β + P P y,q 2 ∂Pq q q λ=0 ∂λq ∂ψq λ=0 ∂λqi λ=0 , r i λ=0 = σq′ β − 2 ∂λqi λ=0 ∂λqi λ=0 1 + 4β 2 σq′ P˜y,q Pq∗ October 16, 2018

(201)

(202)

λ=0

Since γ0 is chosen to satisfy

(200)

(207)

(208)

λ=0

DRAFT

41

and for l 6= q

2 |σ ′ |2 P ˜ 2β ∗ y,l ′ 2 ∂Pl l ∂ψl λ=0 = σl β −r 2 ∂λqi λ=0 ∂λqi λ=0 1 + 4β 2 σl′ P˜y,l

∂Pl∗ ∂λqi λ=0

λ=0

.

(209)

Pl∗ λ=0

Substituting (206), (208) and (209) in (199), the MSE estimator in the range R > Rc and R > Re , for Rd > 0 and Rd < 0, respectively, (note that all the terms are dependent on yqi linearly via xq ) is given

by16 ∂nFpar E {Xqi | Y } ∼ = ξ1,q Yqi ∂λqi λ=0 ′

where ξ1,q

(210)

) ( ′ k X 2σq∗ β Px 1 = − 2 + Bl − Cl 2 ′ 2 ′ 2 σq + γ0 C l=1 Px γ0 + Px σl β 2 σq′ 2 + γ0 Pq∗ 2 1 λ=0 − 1− r 2 2 P ∗ 1 + 4β 2 σ ′ P˜y,q q

λ=0

q λ=0

2 ′ 2β 2 σq′ P˜y,q 2σq∗ β λ =0 r − 2 , 2 2 σq′ + γ0 1 + 4β 2 σq′ P˜y,q Pq∗ λ=0

(211)

λ=0

with

2

+ γ0 − 2 |σl′ |2 + γ0 |σl′ |2 β 1 + β P˜y,l λ=0 Bl = 4 σ ′ 2 + γ0 l 2β 2 |σl′ |2 P˜y,l Bl λ=0 Cl = r . 2 ′ ∗ 2 ˜ 1 + 4β σl Py,l Pl λ=0

|σl′ |2 + γ0

(212)

(213)

λ=0

Next, the MSE estimator in the region Rg < R ≤ Re for Rd < 0 is derived. The derivative of the partition function w.r.t. λqi is given by ∂Fglas ∂ǫs =−β . ∂λqi λ=0 ∂λqi λ=0

(214)

Recall that ǫs is the solution of the equation

Γ (ǫs ) + R = 0, 16

(215)

The relation between the right and the left hand sides of (210) is an asymptotic equality between two random variables, in

the sense that the difference between them converges to zero w.p. 1. October 16, 2018

DRAFT

42

where Γ (ǫs ) is given as max

{Pi }ki=1 ,{ρi }ki=1

s.t.

1 k

k Pm hX 2 ln 1 − ρm 2 Px m=1

k X ′ q ′ 2 σm ρm Pm P˜y,m − 1 |σm |2 Px − 1 σm Pm − 1 2 2 2β

m=1

!

= −ǫs

k 1 X Pm = Px . k

(216)

m=1

Similarly to the optimization problem in (132), the maximization problem in (216) can be rewritten as 2 2 1 1 k ′ |2 P + 1 − µ ǫ X |σ | P + |σ m x m m s Pm h m 2 2β 2 q ln 1 − max Px {Pi }ki=1 ,{ρi }ki=1 2 ′ m=1 σm Pm P˜y,m s.t.

k k 1 X 1 X Pm = Px , µm = 1. k k m=1

(217)

m=1

The derivative of the objective function w.r.t. Pm vanishes at ′ |2 α ˜ 4α1 ǫ2s + |σm 2 Py,m α2 + 2ǫs ∗ Pm = 2 ′ |2 α + 2α ǫ |σm 2 1 s

(218)

the above maximizers in the objective function of (217) one obtains k 2ǫs hX ln Γ (ǫs ) = . ′ |2 α + 2P α ǫ 2 Px |σm 2 x 1 s m=1

(220)

k 2ǫs hX ln + R = 0. ′ |2 α + 2P α ǫ 2 Px |σm 2 x 1 s

(221)

and the derivative w.r.t. µm , vanishes at 2 ′ |2 α ǫ ˜ 4α21 ǫ2s 1 + Px β |σm |2 + 4 |σm 1 s α2 − β Py,m α2 + βǫs + Px βα2 |σm | µ∗m = 2 ′ |2 α + 2α ǫ 2βǫs |σm 2 1 s 2 ′ |4 α ˜ |σm 2 α2 − β Py,m α2 + 2βǫs + Px βα2 |σm | (219) + 2 ′ |2 α + 2α ǫ 2βǫs |σm 2 1 s P P ∗ where α1 is chosen to such that kPx = m Pm , and α2 is chosen such that k = m µ∗m . Substituting

Thus, (215) becomes

m=1

△

△

△

△

△

Let xq = ∂ǫs /∂λqi |λ=0 , α˙ 1,q = ∂α1 /∂λqi |λ=0 , α˙ 2,q = ∂α2 /∂λqi |λ=0 , α1,0 = α1 |λ=0 , α2,0 = α2 |λ=0 △

and ǫs,0 = ǫs |λ=0 . Differentiating (221) w.r.t. λqi one obtains k ′ |2 α X Px |σm 2,0 + 2Px α1,0 ǫs,0 0= 2ǫs,0 m=1

October 16, 2018

DRAFT

43

′ 2 ˙ ′ |2 α ˙ 1,q ǫs,0 + 2Px α1,0 xq 2 Px |σm 2,q + 2Px α 2,0 + 2Px α1,0 ǫs,0 xq − 2ǫs,0 Px |σm | α × 2 ′ |2 α Px |σm 2,0 + 2Px α1,0 ǫs,0 k Px |σ ′ |2 α2,0 xq − ǫs,0 Px |σ ′ |2 α ˙ + 2P α ˙ ǫ X 2,q x 1,q s,0 m m = , 2 ′ | α ǫs,0 Px |σm m=1 2,0 + 2Px α1,0 ǫs,0

(222)

(223)

and thus

U1 U2

(224)

k ′ |2 α X Px |σm ˙ 2,q + 2Px α˙ 1,q ǫs,0 2 ′ P |σ | α2,0 + 2Px α1,0 ǫs,0 m=1 x m

(225)

xq =

where △

U1 =

and △

U2 =

k X

m=1 ǫs,0

′ |2 α Px |σm 2,0

′ |2 α Px |σm 2,0 + 2Px α1,0 ǫs,0

.

(226)

Hence, in order to calculate xq one needs to find ǫs,0 , α1,0 , α2,0 , α˙ 1,q , α˙ 2,q . The terms ǫs,0 , α1,0 , α2,0 are calculated using the set of simultaneous equations Γ (ǫs )|λ=0 + R = 0

(227a)

k 1 X ∗ Pm |λ=0 = Px k

(227b)

m=1

k 1 X ∗ µm |λ=0 = 1, k

(227c)

m=1

and accordingly, the terms α˙ 1,q , α˙ 2,q are calculated using the set of equations k ∗ X ∂Pm =0 ∂λqi λ=0 m=1 k X ∂µ∗m = 0. ∂λqi λ=0

(228a)

(228b)

m=1

Given ǫs,0, α1,0 , α2,0 , closed-form expressions for α˙ 1,q , α˙ 2,q are now derived. Using (218), (228a) can be written as

△ where P˜˙y,q = ∂ P˜y,q /∂λqi

λ=0

△

η1 =

2 η1 α˙ 1,q + η2 α˙ 2,q + η3 xq + ηq σq′ P˜˙ y,q = 0

(229)

, and

k X 4Dm ǫ2s,0 − 4Rm ǫs,0 3 Dm

(230)

m=1 October 16, 2018

DRAFT

44

△

η2 =

k Dm |σ ′ |2 X m

m=1

+ △

η3 =

h

|σm |2 Px + 3 Dm

1 β

2 ′ |2 α |σm 2,0 |σm | Px +

α2,0 + 2ǫs,0

1 β

3 Dm

i

′ |2 Dm − 2Rm |σm

k ′ |2 α X 8Dm α1,0 ǫs,0 + 2Dm |σm 2,0 − 4Rm α1,0 3 Dm

m=1 2 △ α2,0 ηq = 2 , Dq

(231) (232) (233)

in which △ ′ 2 Dm = σm α2,0 + 2α1,0 ǫs,0 i h ′ 2 △ α2,0 α2,0 |σm |2 Px + 2ǫs,0 . Rm = 4α1,0 ǫ2s,0 + σm

(234) (235)

Similarly, using (219), (228b) can be written as

2 γ1 α˙ 1,q + γ2 α˙ 2,q + γ3 xq + γq σq′ P˜˙y,q = 0

where △

γ1 =

2 ′ |2 ǫ2 k 8α1,0 ǫ2 + 4β |σm 1 + P β |σ | X x m s,0 s,0 Km

m=1

− △

γ2 =

(236)

′ |2 α 8Tm βǫ2s,0 |σm 2,0 + 2α1,0 ǫs,0 2 Km

2 4 ′ |2 ′ ′ k X 2Km βǫs,0 |σm | − 4Tm βǫs,0 |σm | α2,0 + 2α1,0 ǫs,0 |σm

m=1

2 Km

2 k ′ 4 ′ |2 α + 8βǫs,0 |σm X 8α21,0 ǫs,0 1 + Px β σm 1,0 + 2βα2,0 |σm | γ3 = Km m=1 2 2 2 ′ ′ Tm 2β |σm | α2,0 + 2α1,0 ǫs,0 + 8βǫs,0 α1,0 |σm | α2,0 + 2α1,0 ǫs,0 − 2 Km 2 ′ 2 △ −4βα1,0 ǫs,0 α2,0 − βα2,0 σq , γq = Kq2

(237)

(238)

△

in which

2 △ ′ 2 Km = 2βǫs,0 σm α2,0 + 2α1,0 ǫs,0 ′ 4 ′ 2 △ α2,0 βǫs,0. α1,0 ǫ2s,0 + 2 σm Tm = 4α21,0 ǫ2s,0 1 + Px β |σm |2 + 4β σm October 16, 2018

(239) (240)

(241) (242)

DRAFT

45

Thus, solving the pair of equations, (229) and (236), one obtains γ3 /γ2 − η3 /η2 xq + η1 /η2 − γ1 /γ2 γ3 /γ1 − η3 /η1 xq + = η2 /η1 − γ2 /γ1

α˙ 1,q = α˙ 2,q

γq /γ2 − ηq /η2 η1 /η2 − γ1 /γ2 γq /γ1 − ηq /η1 η2 /η1 − γ2 /γ1

′ 2 ˙ △ σq P˜y,q = r1 xq + J1q σq′ 2 P˜˙ y,q

′ 2 ˙ △ σq P˜y,q = r2 xq + J2q σq′ 2 P˜˙ y,q .

Substituting α˙ 1,q and α˙ 2,q in (224), simple rearrangement of terms reveals that P P ′ |2 /Q J1,q km=1 2ǫ2s,0 Px /Qm + J2,q km=1 2ǫs,0 Px |σm m σq′ 2 P˜˙y,q xq = V (1 − F )

(243) (244)

(245)

where

k X

′ |2 α Px |σm 2,0 2 ′ m=1 ǫs,0 Px |σm | α2,0 + 2Px α1,0 ǫs,0 ′ |2 r + 2P ǫ r k P |σ ǫ X x 2 x s,0 1 s,0 m △ 1 F = 2 V ′ m=1 ǫs,0 Px |σm | α2,0 + 2Px α1,0 ǫs,0 △ ′ 2 Qm = ǫs,0 Px σm α2,0 + 2Px α1,0 ǫs,0 . △

V =

Let △

Jq =

J1,q

Pk

P ′ |2 /Q + J2,q km=1 2ǫs,0 Px |σm m , V (1 − F )

2 m=1 2ǫs,0 Px /Qm

(246)

(247)

(248)

(249)

and so 2 2σq′∗ yq . xq = Jq σq′ P˜˙y,q = Jq nb β i

Therefore,

∂nFglas E {Xqi | Y } ∼ ∂λqi λ=0 ∂ǫs = − nβ ∂λq ′

i

where

△

ξq,2 = −Jq

λ=0

(250)

(251) = ξq,2 · yqi .

2σq′∗ . h

(252)

(253)

Finally, the mismatched MSE estimator in the region R ≤ Rg and R ≤ Rc for Rd < 0 and Rd > 0, respectively, is derived. Based on (83), it readily follows that ∂ ln ZQ,c ′ E {Xqi | Y } ∼ = Xqi . ∂λqi λ=0

October 16, 2018

(254)

DRAFT

46

To conclude, the mismatched MSE estimator is given as follows. For Rd ≥ 0 E ′ {Xqi | Y } ∼

For Rd < 0

Xqi ,

R ≤ Rc

ξq,1Yqi ,

Xqi , E ′ {Xqi | Y } ∼ ξq,2 Yqi , ξ Y , q,1 qi

.

(255)

R > Rc

R ≤ Rg

(256)

Rg < R ≤ Re R > Re

where the above equalities are asymptotic equalities between two random variables, in the sense that the difference between them converges to zero in probability. The mismatched MSE is given by n X E Xi2 − 2 Re E E (Xi | Y ) E ′∗ (Xi | Y ) mse (X | Y ) = i=1

n 2 o + E E ′ (Xi | Y ) .

(257)

Therefore, based on (257), in order to calculate the MSE, the MMSE estimator should be obtained first. Substituting A = A′ in Rd , given in (168), one can see that Rd = 0. Thus, the MMSE estimator is given by

E {Xqi | Y } ∼

ξ1,q Yqi ,

R > Re .

(258)

R ≤ Re

Xqi ,

In order to find Re , according to (155), γ0 is needed. However, in this case it can readily be verified that γ0 = 1/Px , and thus Rc,M

1 = Re = 4π △

Z

2π 0

ln 1 + |H (ω)|2 βPx dω.

(259)

′ in (211), reveals that Finally, substitution of σm = σm

ξ1,q =

and thus E {Xqi | Y } ∼

October 16, 2018

βσq∗ Px 1 + |σq |2 Px β

βσq∗ Px 1+|σq |2 Px β Yqi , Xqi ,

,

(260)

R > Rc,M .

(261)

R ≤ Rc,M DRAFT

47

Based on the second term of the sum in (257), several cases should be considered. For Rd > 0, since Rc < Rc,M , there are three regions: R < Rc , Rc < R < Rc,M and R > Rc,M . For R < Rc , both the

matched and the mismatched estimators are asymptotically equal to Xqi with high probability, and thus mse (X | Y ) = 0.

(262)

For Rc < R < Rc,M one readily obtains k X

!

k X

1 2 |ξm,1 | |σm | Px + , β

(263)

! ∗ P βσm 1 x 2 |σm | Px + β 1 + |σm |2 Px β m=1 k X 1 2 2 |ξm,1 | |σm | Px + +h β m=1 ! k k X X 1 2 2 ∗ ∗ . |ξm,1 | |σm | Px + = Px − 2h Re ξm,1 σm Px + h β

(264)

mse (X | Y ) = Px − 2h Re n and similarly, for Rc,M < R, mse (X | Y ) = Px − 2h Re n

∗ ∗ ξm,1 σm Px

m=1

k X

+h

m=1

2

∗ ξm,1

(265)

m=1

m=1

Thus, the MSE’s in the last two ranges are the same. In the same way, the MSE for Rd < 0 is calculated. For R ≤ Rg mse (X | Y ) = 0.

(266)

For Rg < R ≤ Re mse (X | Y ) = Px − 2h Re n and for R > Re mse (X | Y ) = Px − 2h Re n

k X

∗ ∗ ξm,2 σm Px

m=1

k X

m=1

∗ ∗ ξm,1 σm Px

! !

+h

k X

1 △ 2 |ξm,2 | |σm | Px + = mse1 , β

(267)

k X

1 △ 2 |ξm,1 | |σm | Px + = mse2 . β

(268)

2

m=1

+h

2

m=1

Finally, take the limit h → 0 (after n → ∞). Using Szeg¨o’s theorem (as was done in (168)), one obtains (i = 1, 2) Px lim msei = Px − n→∞ π

Z

0

2π

Re (Ξ∗i (ω) H∗ (ω)) dω

1 + 2π

Z

0

2π

1 2 |Ξi (ω)| |H (ω)| + dω β 2

(269)

where Ξi (ω), for i = 1, 2, are given in (15) and (39). In the matched case, for R > Rc,M mmse (X | Y ) = October 16, 2018

n X i=1

o n E Xi2 − E |E {Xi | Y }|2

(270)

DRAFT

48

= nPx −

nb k X X

m=1 im =1

o n E |E {Xim | Y }|2

(271)

2 k ∗ P X βσm 1 x 2 = nPx − 1 + |σ |2 P β nb |σm | Px + β m x m=1

= nPx − n

k X

m=1

which upon taking the limit h → 0, becomes

h1

β

k X |σm |2 Px2 Px , h =n 2 1 + |σm |2 Px β + |σm | Px m=1

mmse (X | Y ) 1 = n→∞ n 2π lim

Z

2π

0

Px 1 + |H (ω)|2 Px β

dω.

(272)

(273)

(274)

Remark 4 (Generalization to Any Input Spectral Distribution) As was mentioned in Section III, the above analysis can be modified to hold for any input spectral density Sx (ω). Technically speaking, the following modification should be considered: Let Px,m be the (real) transmitted power over the mth bin. Then, because of the separable form of the partition function over the bins, we will essentially obtain exactly the same results with the exception of Px,m instead of Px . Precisely, instead of Px which appears in the numerator of the logarithm function in (133), one should simply replace it to Px,m . Following the same lines of derivation, at the final stage of the refinement of the bin sizes, we will finally obtain the spectral density Sx (ω) as a limit function of {Px,m }m . VII. C ONCLUSION In this paper, we considered the problem of mismatched estimation of codewords corrupted by a Gaussian vector channel. The derivation was build upon a simple relation between the MSE and a certain function, which can be viewed as a partition function, and hence be analyzed using methods of statistical mechanics. As a special case, the MMSE estimator and its respective estimation error was derived. In particular, it was shown that the MSE essentially separated into two cases each exhibiting a different behavior: In one case, the MSE exhibits single phase transition, which divides the MSE into ferromagnetic and paramagnetic phases. In the other case, the MSE exhibits two phase transitions, which divide the MSE into three phases consisting of the two previous phases and a third glassy phase. Then, using the theoretical results obtained, a few numerical examples were analyzed, by exploring the phase diagrams and the MSE’s as functions of the mismatched parameters in each problem. This leads to physical intuitions regarding the threshold effects and the role of the mismatched measure in creating them. Indeed, it was shown that the aforementioned separation of the MSE is linked to pessimism and optimism behaviors of the receiver, according to its mismatched assumption on the channel. Note that in October 16, 2018

DRAFT

49

contrast to previous related papers [5, 6], in which the explored examples did not completely emphasize the necessity of the use of the analysis techniques of statistical physics for deriving the MSE, we believe that the considered problem in this paper does, as standard information theoretic approaches do not lend themselves to rigorous analysis. Finally, we believe that the tools developed in this paper for handling optimum estimation problems, can be used in other applications. One such application, which has been already considered for a simple model is estimation of signals of partial support [6, Section V. D] which has motivation in compressed sensing applications. It would be natural to generalize the model considered in [6, Section V. D] to a much more rich and applicable one (in the spirit of the considered model in this paper), and perhaps assessing the MSE using the concepts developed in this paper. A PPENDIX A P ROOF

OF

L EMMA 1

Proof: We first show the inclusion Tδ (x | y) ⊆ Tˆδ (x | y) ,

(A.1)

namely, for any x ∈ Tδ (x | y) also x ∈ Tˆδ (x | y). Recall that

2 mn △

nb δ b Bm (Pm , ρm ) = x ∈ R : x(m−1)nb +1 − nb Pm ≤ δ,

( ) ) q X σi′ y¯i xi − nb ρm P˜y,m P˜σ,m ≤ δ , Re

(A.2)

i∈Im

and that △

Tδ (x | y) =

) ky − Σ′ xk2 λT x 2 − − nǫ ≤ δ . x ∈ Rn : kxk − nPx ≤ δ, 2 β

(

First, note that the second constraint in (A.3) can be rewritten as ( ) n 1X ′ Re − ρ σ y ¯ x ≤δ i i i n

(A.3)

(A.4)

i=1

where

ρ˜ =

1 n

2 ′ i=1 |σi xi |

Pn

2

+ Py − 2ǫ

.

(A.5)

Then, for any x ∈ Tδ (x | y), we first show that there exist a sequence {Pm }km=1 ∈ P δ such that for any 1 ≤ m ≤ k,

October 16, 2018

2 mn

b x (m−1)nb +1 − nb Pm ≤ δ.

(A.6) DRAFT

50

2

b in the To this end, for each 1 ≤ m ≤ k, Pm is chosen to be the nearest point to xmn (m−1)nb +1

2

mnb k , namely P set G1,δ m = x(m−1)nb +1 / (nb δ) · δ . Under this choice, obviously, (A.6) holds, and {Pm }km=1 ∈ P δ , since

2 mnb

k k 1 X 1 X x

(m−1)nb +1 Pm − Px = δ − Px k k nb δ m=1 m=1 k

2 1 X

mnb ≤

x(m−1)nb +1 δ − Px ≤ δ n

(A.7)

(A.8)

m=1

where the last equality follows from the fact that x ∈ Tδ (x | y). Next, we show that there exist a

sequence {ρm }km=1 ∈ RδP such that for any 1 ≤ m ≤ k, ( ) q X ′ ˜ ˜ σi y¯i xi − nb ρm Py,m Pσ,m ≤ δ. Re

(A.9)

i∈Im

Similarly, by taking

ρm

P ′y Re σ ¯ x i i i i∈I k m · δ ∈ G2,δ q , = ˜ ˜ nb δ Py,m Pσ,m

obviously, (A.9) holds, and also {Pm , ρm } ∈ P δ ∩ RδP , since P q q k k 1 X 1 X ′ Re ¯i xi i∈Im σi y ˜ ˜ ˜ ˜ q ρm Py,m Pσ,m − ρ˜ = · δ Py,m Pσ,m − ρ˜ k k m=1 m=1 nb δ P˜y,m P˜σ,m ( ) k 1 X X ≤ Re σi′ y¯i xi − ρ˜ n m=1 i∈Im ( ) n 1X ′ σi y¯i xi − ρ ≤ δ = Re n

(A.10)

(A.11)

(A.12)

(A.13)

i=1

where the last equality follows from the fact that x ∈ Tδ (x | y). For the second inclusion, we need to show that Tˆδ/k (x | y) ⊆ Tδ (x | y). For any x ∈ Tˆδ/k (x | y) k

2 X

mnb 2

x(m−1)nb +1 − nPx kxk − nPx = m=1 k k

2 X X

mnb n P − x =

(m−1)nb +1 b m m=1 m=1 k

2 X mn δ

b x ≤ (m−1)nb +1 − nb Pm ≤ k k = δ

(A.14)

m=1

October 16, 2018

DRAFT

51

where the second equality follows from the definition of P δ , the third inequality follows from the triangle δ (P , ρ ). In the same way, for inequality, and the forth inequality follows from the definition of Bm m m

any x ∈ Tˆδ/k (x | y) v ( ) u n X u ′ tPy¯ Re − nρ σ y ¯ x i i i i=1

n 1X ′ 2 |σi xi | n i=1

( ! k ) q k X X X ′ = ˜ ˜ ρm Py,m Pσ,m Re σi y¯i xi − nb m=1 m=1 i∈Im ≤k

δ =δ k

(A.15)

where the first equality follows from the definition of RδP , and the second inequality follows from the δ (P , ρ ). Thus T ˆδ/k (x | y) ⊆ Tδ (x | y) ⊆ Tˆδ (x | y). triangle inequality and the definition of Bm m m

A PPENDIX B P ROOF

OF

L EMMA 2

Proof: For simplicity of notation, the following conventions are used. Calculating the volume of δ (P , ρ ) is equivalent to calculating the volume of the set Bm m m ( ) ( ) n X p △ 2 ∗ n xi yi − nρ Px Py ≤ δ Fδ (Px , ρ) = x ∈ R : kxk − nPx ≤ δ, Re

(B.1)

i=1

△

△

where Px = kxk2 /n and Py = kyk2 /n, for a given vector y ∈ Cn . Due to the symmetry of the vectors x

and y in the DFT domain (recall that in the time domain the considered vectors are real), i.e., xi = x∗n−i

for i = 1, . . . , n (and similarly for y ), for the volume calculation of (B.1), only vectors with dimension n/2 should be considered, while the other half is fixed. Accordingly, the constraints in (B.1) take the

form X n/2 n 2 |xi | − Px ≤ δ, 2 i=1

and

△

X n/2 n p ∗ Re xi yi − ρ Px Py ≤ δ. 2 i=1

Let m = n/2. Consider the following Gaussian measure ) ( m X 1 1 △ m |xi − ayi |2 dx exp − 2 dγG (x) = (πϑ2 )m ϑ

(B.2)

(B.3)

(B.4)

i=1

where a, ϑ2 ∈ R. Then,

m m 1 = γG {Rm } ≥ γG {Fδ (Px , ρ)} October 16, 2018

(B.5) DRAFT

52

) ( m 1 1 X 2 = (B.6) |xi − ayi | dx exp − 2 2 m ϑ Fδ (πϑ ) i=1 Z io n mh p 1 2 dx (B.7) P P (ρ − δ) + a P ≥ exp − P + δ − 2a x y y x m 2 ϑ2 Fδ (πϑ ) n mh io p 1 2 exp − = Vol {Fδ (Px , ρ)} P + δ − 2a . (B.8) P P (ρ − δ) + a P x x y y (πϑ2 )m ϑ2 Z

It is easy to verify that

△

ao =

s

Px (ρ − δ) , Py

(B.9)

and △

ϑ2o = Px + δ − 2a

p

Px Py (ρ − δ) + a2 Py

= Px + δ − Px (ρ − δ)2

(B.10) (B.11)

maximize the right hand side of (B.8) (w.r.t. a and ϑ2 ). Thus, on the one hand, Vol {Fδ (Px , ρ)} ≤ exp m ln πeϑ2o .

(B.12)

m {Fδ (Px , ρ) ∪ Fδc (Px , ρ)} 1 = γG ) ( Z m 1 1 X 2 m = {Fδc (Px , ρ)} |xi − ayi | dx + γG exp − 2 2 )m (πϑ ϑ Fδ

(B.13)

On the other hand,

(B.14)

i=1

m ≤ Vol {Fδ (Px , ρ)} exp −m ln πeϑ2o,u + γG {Fδc (Px , ρ)}

(B.15)

where the last inequality follows by the same considerations as before, and ϑ2o,u = Px − δ − Px (ρ + δ)2 .

(B.16)

Using Boole’s inequality m γG {Fδc (Px , ρ)}

≤

m γG

n

o 2 m x : kxk − mPx > δ + γG

( n ) ) X p ∗ xi yi − mρ Px Py > δ . x : Re

(

i=1

(B.17)

It is easy to verify that the parameters a and ϑ that are maximizing the Gaussian measure are given by P ∗ Re ( m ˜ i=1 xi yi ) △ ρ = (B.18) aM = Py Py P m ∗ 2 Re ( m ρ˜2 1 X 2 2 i=1 xi yi ) △ ˜ = Px − |xi | − (B.19) ϑM = m Py Py i=1

October 16, 2018

DRAFT

53

where ρ˜ and P˜x are the empirical correlation and the input variance, respectively. Let γG,M denote the Gaussian measure associated with the parameters aM , ϑM , namely, γG,M is given by (B.4) with a = aM and ϑ2 = ϑ2M . Accordingly, it is easy to verify that under γG,M , the following hold o n E γG,M kXk2 = mPx ,

(B.20)

and E γG,M

(

Re

"

n X

Xi yi∗

i=1

#)

=m

p Px Py ρ.

(B.21)

Thus, using the LLN, the two terms on the right hand side of (B.17) are negligible as m → ∞, namely, m γG {Fδc (Px , ρ)} ≤ ǫ

(B.22)

Vol {Fδ (Px , ρ)} ≥ (1 − ǫ) exp m ln πeϑ2o,u .

(B.23)

for any ǫ > 0. Thus,

Finally, combining (B.12), (B.23), and taking the limit δ → 0, the lemma follows. A PPENDIX C P ROOF

OF

L EMMA 4

Proof: Recall that ·

E {N (ǫ)} = exp {n (R + Γ (ǫ))} ,

and that

(C.1)

17

·

var {N (ǫ)} = exp {n (R + Γ (ǫ))} (1 − exp {nΓ (ǫ)}) .

(C.2)

Thus, var {N (ǫ)}

2

(E {N (ǫ)})

·

= exp {−n (R + Γ (ǫ))} .

(C.3)

·

For any ǫ ∈ / E , the expectation of N (ǫ) can be written as E {N (ǫ)} = e−nC1 where C1 = R+Γ (ǫ) > 0. Thus, by Markov inequality (since N (ǫ) ∈ N ∪ {0}) ·

P {N (ǫ) > 0} ≤ E {N (ǫ)} = e−nC1 . 17

(C.4)

Given y, N (ǫ) is a sum of M − 1 i.i.d. Bernoulli random variables and therefore its variance is (M − 1) p (1 − p), where

p is the success probability, which in our case, was shown to be given by p = exp {nΓ (ǫ)}. October 16, 2018

DRAFT

54

On the other hand, for any ǫ ∈ E and δ > 0, using Chebyshev’s inequality N (ǫ) var {N (ǫ)} · −nC2 − 1 > δ ≤ =e P E {N (ǫ)} γ (E {N (ǫ)})2

(C.5)

where C2 = R + γ (ǫ) > 0. Thus, in this case, N (ǫ) is concentrated very strongly around E {N (ǫ)}. △

Finally, let An = {|N (ǫ) − E {N (ǫ)} 1 {E }| > δ}. Then, using (C.4) and (C.5), it is easy to verify that ∞ X i=1

P (An ) < ∞.

Thus, using Borel-Cantelli Lemma, one obtains that P lim sup An = 0,

(C.6)

(C.7)

n→∞

and hence (124) follows. A PPENDIX D P ROOF

OF

(128)

Equation (128) follows by the following lemma. Lemma 5 Let f : R × R → R be a smooth function such that g (x) = lim f (x, h) , h→a

(D.1)

uniformly for every x ∈ R. Assume that limh→a maxx f (x, h) exist. Then, lim max f (x, h) = max lim f (x, h) .

h→a

x

x

h→a

(D.2)

Proof of Lemma 5: Let △

λ = lim max f (x, h) ,

(D.3)

g (x0 ) = max g (x) = max lim f (x, h) .

(D.4)

h→a

x

and △

x

x

h→a

Based on (D.3), ∀ǫ1 > 0 there exist δ1 > 0 such that max f (x, h) − λ < ǫ1 x

(D.5)

whenever 0 < h − a < δ1 . Accordingly, by (D.1), ∀ǫ2 > 0 there exist δ2 > 0 such that |f (x, h) − g (x)| < ǫ2 October 16, 2018

(D.6) DRAFT

55

whenever 0 < h − a < δ2 . Let us assume by contradiction that (without loss of generality) △

∆ = |g (x0 ) − λ| > 0.

(D.7)

However, by using the triangle inequality, one obtains that 0 < ∆ = |g (x0 ) − λ| ≤ g (x0 ) − max f (x, h) + max f (x, h) − λ , x

x

(D.8)

and hence

0 < ∆ − ǫ1 ≤ g (x0 ) − max f (x, h) ≤ |g (x0 ) − f (x0 , h)| , x

(D.9)

for 0 < h − a < min (δ1 , δ2 ), which contradicts the assumption in (D.1) (or (D.6)). Thus, δ = 0. Remark 5 As the proof shows, Lemma 5 remains valid for functions f : X × Y → R. In our case, the assumptions of Lemma 5 hold true: the uniform convergence is due to the absolutely (square) summability of the sequence {hk } and Szeg¨o’s theorem, and the existence the limit over the maximization problem indeed exists as was obtained. Thus, the order of limit over h and the maximization over ǫ in (128) can be interchanged. A PPENDIX E D ERIVATION

OF

(168)

Szeg¨o’s theorem [36-39] basically states that, for a sequence of Toeplitz matrices T n = {ti−j }i,j with dimension n × n, for which {tk } is absolutely (square) summable, the following holds Z 2π n−1 1 1X F (T (ω)) dω F (τn,k ) = lim n→∞ n 2π 0

(E.1)

k=0

where {τn,k }k are the eigenvalues of T n , T (ω) is the Fourier transform of {tk }, and F (·) is some polynomial function. Furthermore, if T n are Hermitian, then (E.1) holds true for any continuous function F (·).

In our case, however, the matrices A and A′ are not necessarily Hermitian. Nevertheless, based on (165), it can be seen that the dependency of the various non-linear terms (except the third term) on the ′ |2 , which can be regarded as eigenvalues of the Hermitian matrix AH A, and eigenvalues is only via |σm

so Szeg¨o’s theorem can be applied. Regarding the third term in the right hand side of (165), it can be shown [38] that a product of Toeplitz matrices also satisfies Szeg¨o’s theorem, namely, Z 2π n−1 1X 1 lim F (T (ω) S (ω)) dω F (ρn,k ) = n→∞ n 2π 0

(E.2)

k=0

October 16, 2018

DRAFT

56

where {ρn,k }k are the eigenvalues of product of the Toeplitz matrices, T n S n , and T (ω) and S (ω) are the respective Fourier transforms. Accordingly, since the third term in (165) is originated from a product of Toeplitz matrices (162), (E.2) can be used. Therefore, a direct application of (E.1) and (E.2) on (165), we finally obtain (168). Finally, note that these considerations are utilized to justify the other places in the paper (for example, (195) and (269)) in which Szeg¨o’s theorem is applied.

R EFERENCES [1] R. S. Bucy, “Information and filtering,” Inf. Sci., vol. 18, pp. 179–187, 1979. [2] T. E. Duncan, “On the calculation of mutual information,” SIAM J. Appl. Math., vol. 19, no. 1, pp. 215–220, 1970. [3] T. Kailath, “The innovations approach to detection ans estimation theory,” Proc. IEEE, vol. 58, no. 5, pp. 680–695, May 1970. [4] J. Seidler, “Bounds on the mean-square error and the quality of domain decisions based on mutual information,” IEEE Trans. Inf. Theory, vol. IT-17, no. 6, pp. 655–665, Nov. 1971. [5] N. Merhav, “Optimum estimation via gradients of partition functions and information measures: A statistical-mechanical perspective,” IEEE Trans. Inf. Theory, vol. 57, no. 6, pp. 3887–3898, June 2011. [6] N. Merhav, D. Guo, and S. Shamai, “Statistical physics of signal estimation in Gaussian noise: theory and examples of phase transitions,” IEEE Trans. Inf. Theory, vol. 56, no. 3, pp. 1400–1416, Mar. 2010. [7] D. Guo, “Relative entropy and score function: New information-estimation relationships through arbitrary additive perturbations,” presented at the Int. Symp. Information Theory, Seoul, South Korea, Jun./Jul. 2009. [8] D. Guo, S. Shamai, and S. Verd´u, “Mutual information and minimum mean-square error in Gaussian channels,” IEEE Trans. Inf. Theory, vol. 51, no. 4, pp. 1261–1282, Apr. 2005. [9] ——, “Additive non-Gaussian noise channels: Mutual information and conditional mean estimation,” in Proc. IEEE Int. Symp. Inf. Theory.

Adelaide, Australia, Sep. 2005, pp. 719–723.

[10] ——, “Mutual information and conditional mean estimation in Poisson channels,” IEEE Trans. Inf. Theory, vol. 54, no. 5, pp. 1837–1849, May 2008. [11] D. P. Palomar and S. Verd´u, “Gradient of mutual information in linear vector gaussian channels,” IEEE Trans. Inf. Theory, vol. 52, no. 1, pp. 141–154, Jan. 2006. [12] ——, “Representation of mutual information via input estimates,” IEEE Trans. Inf. Theory, vol. 53, no. 2, pp. 453–470, Feb. 2007. [13] M. Raginsky and T. P. Coleman, “Mutual information and posterior estimates in channels of exponential family type,” in Proc. IEEE Workshop Inf. Theory.

Taormina, Italy, Oct. 2009, pp. 399–403.

[14] S. Verd´u, “Mismatched estimation and relative entropy,” IEEE Trans. Inf. Theory, vol. 56, no. 8, pp. 3712–3720, Aug. 2010. [15] T. Weissman, “The relationship between causal and non-causal mismatched estimation in continuous-time AWGN channels,” IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4256–4273, Sep. 2006. [16] A. Atar and T. Weissman, “Mutual information, relative entropy, and estimation in the Poisson channel,” IEEE Trans. Inf. Theory, vol. 58, no. 3, pp. 1302–1318, Mar. 2012. [17] R. Bustin and S. Shamai, “MMSE of “Bad” coeds,” IEEE Trans. Inf. Theory, vol. 59, no. 2, pp. 733–743, Feb. 2013. October 16, 2018

DRAFT

57

[18] N. Sourlas, “Sping-glass models as error-correcting codes,” Nature, vol. 339, pp. 693–695, Jan. 1989. [19] ——, “Spin glasses, error-correcting codes and finite-temperatures,” Europhys. Lett., vol. 25, pp. 159–164, 1994. [20] F. Cousseau, K. Mimura, T. Omori, and M. Okada, “Statistical mechanics of lossy compression for non-monotonic multilayer perceptrons,” Phys. Rev. E, vol. 78, p. 021124, Jul. 2008. [21] T. Hosaka and Y. Kabashima, “Statistical mechanical approach to error exponents of lossy data compression,” J. Phys. Soc. Japan, vol. 74, no. 1, pp. 488–497, Jan. 2005. [22] Y. Iba, “The Nishimori line and Bayesian statistics,” J. Phys. A: Math. Gen, vol. 32, pp. 3875–3888, 1999. [23] Y. Kabashima and T. Hosaka, “Statistical mechanics for source coding with a fidelity criterion,” Progr. Theoret. Phys., pp. 197–204, 2005. [24] Y. Kabashima, K. Nakamura, and J. vanMourik, “Statistical mechanics of typical set decoding,” Phys. Rev. E, vol. 66, pp. 197–204, 2002. [25] Y. Kitagawa and T. Tanaka, “Optimal spreading sequences in large CDMA systems: A statistical mechanics approach,” in Proc. Int. Symp. Information Theory, vol. 13731377.

Toronto, ON, Canada, Jul. 20, 2008.

[26] T. Tanaka, “A statistical-mechanics approach to large-system analysis of CDMA multiuser detectors,” IEEE Trans. Inf. Theory, vol. 48, no. 11, pp. 2888–2910, Nov. 2002. [27] T. Weissman, A. Ordentlich, G. Seroussi, S. Verd´u, and J. M. Weinberger, “Universal discrete denoising: Known channel,” IEEE Trans. Inf. Theory, vol. 51, no. 1, pp. 5–28, Jan. 2005. [28] G. Gemelos, S. Sigurjonsson, and T. Weissman, “Universal minimax discrete denoising under channel uncertainty,” IEEE Trans. Inf. Theory, vol. 52, no. 8, pp. 3476–3497, Aug. 2006. [29] S. Jalali and T. Weissman, “Denoising via MCMC-based lossy compression,” IEEE Trans. Sig. Process., vol. 60, no. 6, pp. 3092–3100, Jun. 2012. [30] A. Ganti, A. Lapidoth, and E. Telatar, “Mismatched decoding revisited: General alphabets, channels with memory, and the wide-band limit,” IEEE Trans. Inf. Theory, vol. 46, no. 7, p. 23152328, Nov. 2000. [31] D. Donoho, “On minimum entropy deconvolution,” Applied time series analysis, pp. 565–608, 1981. [32] M. Fozunbal, “On regret of parametric mismatch in minimum mean square error estimation,” in Proc. IEEE Int. Symp. Inf. Theory.

Austin, Texas, U.S.A., June 2010, pp. 1408–1412.

[33] A. M´ezard, M. Montanari, Information, Physics and Computation.

Oxford, U.K.: Oxford Univ. Press., 2009.

[34] C. E. Shannon, “Probability of error for optimal codes in a Gaussian channel,” Bell Sys. Technical J., vol. 38, no. 3, pp. 611–656, May 1959. [35] A. D. Wyner, “A bound on the number of distinguishable functions which are time-limited and,” SIAM J. Appl. Math., vol. 24, no. 3, pp. 289–297, May 1973. [36] U. Grenander and G. Szego, Toeplitz Forms and Their Applications. University of Calif. Press, Berkeley and Los Angeles, 1958. [37] A. Widom, Toeplitz Matrices. in Studies in Real and Complex Analysis, edited by I.I. Hirschmann, Jr., MAA Studies in Mathematics, Prentice-Hall, Englewood Cliffs, NJ, 1965. [38] M. R. Gray, Toeplitz and Circulant Matrices: A review.

now, 2006.

[39] A. Bottcher and S. M. Grudsky, Spectral Properties of Banded Toeplitz Matrices. [40] W. Rudin, Principles of Mathematical Analysis.

3rd ed. New York: McGraw-Hill, 1976.

[41] A. Dembo and O. Zeitouni, Large Deviations Techniques and Applications. [42] G. Galavotti, Statistical Mechanics: A Short Treatise. October 16, 2018

SIAM, 2005.

Springer, 1998.

Springer Verlag, New York, 1999. DRAFT

58

[43] V. A. Zorich, Mathematical Analysis II. Springer, 2009. [44] O. J. Frink, “Differentiation of sequences,” Amer. Math. Soc., vol. 41, pp. 553–560, Dec. 1934. [45] R. C. Steinlage, “Nearly uniform convergence and interchange of limits,” Publications de l’Institut Math´ematique, vol. 26, no. 12, pp. 115–129, June 1971.

October 16, 2018

DRAFT