Feb 18, 2013 - The decision tree model captures the complexity of computing functions f : Xm â Y in a setting where the quantity of interest is the ...

2 downloads 37 Views 261KB Size

arXiv:1302.4207v1 [cs.CC] 18 Feb 2013

February 19, 2013

Abstract We completely characterise the complexity in the decision tree model of computing composite relations of the form h = g ◦ (f 1 , . . . , f n ), where each relation f i is boolean-valued. Immediate corollaries include a direct sum theorem for decision tree complexity and a tight characterisation of the decision tree complexity of iterated boolean functions.

1

Introduction

The decision tree model captures the complexity of computing functions f : X m → Y in a setting where the quantity of interest is the number of queries to the input (see [1] for a good review of the model). We are allowed to query individual elements xi ∈ X at unit cost, and seek to compute f (x1 , . . . , xm ) using the minimal number of queries. Here we will be interested in the decision tree complexity of composite functions, i.e. functions of the form h(x) = g(f 1 (x1 ), . . . , f n (xn )), where xi ∈ X mi . One strategy to compute h is first to replace g with the function g¯ given by substituting the values taken by any constant functions f i into g, and then to compute g¯ using efficient algorithms for f 1 , . . . , f n as black boxes. In other words, we use an efficient algorithm for computing g¯(y1 , . . . , yn ), and replace each query to a variable yi with the evaluation of f i (xi ) using an optimal algorithm. As the xi inputs are independent, it is natural to conjecture that this is in fact always the most efficient way to compute g. However, a moment’s thought shows that this cannot be the case in general: indeed, consider the function h(x) = g(f (x)), where f : {0, 1}n → {0, 1}n is the identity function f (x) = x, and g : {0, 1}n → {0, 1} is the function extracting the first bit of the input, g(y) = y1 . Then f requires n queries to be computed, whereas h can be computed with 1 query. Another simple, but arguably less trivial, example of this phenomenon is illustrated in Figure 1 below; the key point in this example is that the worst-case input to the functions f i may not produce an output which is a worst-case input to g. We will show that, nevertheless, the simple algorithm above is optimal if the functions f i are boolean-valued, i.e. when Y = {0, 1}. In fact, we prove such a result for the generalisation of the decision tree model to computing relations. A decision tree computes a relation f ⊆ X m × Y if, on input x ∈ X m , it outputs y such that (x, y) ∈ f (see Section 1.2 for a formal definition of the model). We call this result a composition theorem for decision tree complexity. ∗

[email protected]

1

x1 y1

x1 y2

x2

0 1

(a) Optimal tree for f

2

0

1

1

x3 2

2

x4

0 1

(b) Optimal tree for g

x2 1

2

2

(c) Optimal tree for h

Figure 1: Let f : {0, 1}2 → {0, 1, 2} and g : {0, 1, 2}2 → {0, 1, 2} be defined by the decision trees above (where edges correspond to elements of {0, 1} or {0, 1, 2} in ascending order from left to right). Set h(x1 , x2 , x3 , x4 ) = g(f (x1 , x2 ), f (x3 , x4 )). Then f and g require 2 queries each to be computed, but h can be computed using only 3.

1.1

Related work

This work can be seen as fitting into the framework of “direct sum” results in query complexity, in which there has been significant recent interest (see the thesis [2] for a good review). The basic aim of such work is to show that, given n independent computational problems, the most efficient way to solve them all is simply to use an optimal algorithm for each problem separately. While this claim is intuitively obvious, it is not necessarily easy to prove, and in some cases is even false; see [2] for some examples. In particular, Jain, Klauck and Santha have shown [5] such a direct sum theorem for deterministic and randomised decision tree complexity. As discussed in [5], previous work had dismissed this question as uninteresting, but the proof is not immediate. In the deterministic case, the result of [5] is proven by showing that, given a decision tree T computing n independent outputs, one can produce n independent trees Ti , one for each output, and lower bounding the depth of T in terms of the depth of the trees Ti . This approach does not seem suitable for proving a composition theorem, as given a tree computing h, it is not necessarily possible to produce individual trees for f 1 , . . . , f n . On the other hand, the composition theorem given here in fact implies a direct sum theorem for decision tree complexity (see Section 2.1 for this and other corollaries). Similar, but more complicated, composition theorems to the present work have been proven for ˇ quantities which are important in the study of quantum query complexity. Høyer, Lee and Spalek [3] proved that the so-called non-negative weight adversary bound (which we will not define here; see [3] for details) obeys a composition theorem. Interestingly, their proof is based on generalising the query model to weighted queries, similarly to the approach taken here. A composition theorem was later proven for the general adversary bound (in two parts; a lower bound by Høyer, Lee and ˇ Spalek [4], and an upper bound by Reichardt [7]), which was then used by Reichardt to infer a composition theorem for quantum query complexity [8].

1.2

Relations and decision trees

We use essentially the standard model of decision trees (see e.g. [1] for definitions, and [5] for the extension to relations). Fix finite sets X and Y , and integer m. A decision tree T is a rooted |X|ary tree whose internal vertices are labelled with indices between 1 and n, whose edges are labelled with elements of X, and whose leaves are labelled with elements of Y . T computes a function fT : X n → Y as follows: starting with the root, it queries the input variable indexed by the label

2

of the current vertex and, depending on the outcome, evaluates the corresponding subtree. When a leaf is reached, the tree outputs the label of that leaf. The depth of T is the maximal length of a path from the root to a leaf (i.e. the worst-case number of queries used on any input). We will consider the complexity of computing relations in the decision tree model. Let f ⊆ × Y . We write dom f = {x : ∃y, (x, y) ∈ f } and range f = {y : ∃x, (x, y) ∈ f }, and also f (x) = {y : (x, y) ∈ f }, f −1 (y) = {x : (x, y) ∈ f }. f is said to be a partial function if |f (x)| = 1 for all x ∈ dom f ; in this case, we abuse notation and write f (x) for the single element within the set f (x). f is a total function if in addition dom f = X n . We say that f is constant if there exists y ∈ Y such that f −1 (y) = dom f , and trivial if, for all y ∈ Y , f −1 (y) = dom f . Note that trivial relations are constant, and that the empty relation f = ∅ is trivial.

Xn

We say that a decision tree T computes a relation f if, for all x ∈ dom f , on input x, T outputs y such that (x, y) ∈ f , i.e. fT (x) ∈ f (x) for all x ∈ dom f . If this holds, we say that fT is consistent with f . The decision tree complexity D(f ) is the minimal depth over all decision trees computing f . This quantity is also known as the exact (classical) query complexity of f . This can be generalised to define a weighted variant of decision tree complexity D(f, [w1 , . . . , wn ]), which is equal to the minimal worst-case number of queries required to compute f , given that querying the i’th element of the input costs wi queries. This can again be expressed as the minimal depth of a decision tree computing f with weights w1 , . . . , wn on the inputs, where the depth of a decision tree T with weights on the inputs is simply the maximal sum of the weights of the inputs specified by the labels of vertices of T , on any path from the root to a leaf. The usual decision tree complexity is recovered by setting D(f ) = D(f, [1, . . . , 1]). For f ⊆ X n × Y , let fi←b ⊆ X n × Y be the modification of f we obtain by substituting b as the i’th input to f . That is, fi←b = {(x, y) ∈ f : xi = b}. We can use this definition to write down the following characterisation of weighted decision tree complexity (indeed, this could even be seen as its definition). Fact 1. Let f ⊆ X n × Y , and w1 , . . . , wn ∈ N. Then D(f, [w1 , . . . , wn ]) = 0 if and only if f is constant. If f is not constant, D(f, [w1 , . . . , wn ]) = min max D(fi←b , [w1 , . . . , wn ]) + wi . i∈[n] b∈X

Proof. If f is constant, the tree can simply output y such that f −1 (y) = dom f using no queries. On the other hand, if there exists a tree that computes f and makes no queries, then there is only one fixed value y that is output by the tree; so for all x ∈ dom f , (x, y) ∈ f ; so f −1 (y) = dom f . If f is not constant, then for any i, we can produce a tree computing f with weights w1 , . . . , wn by first querying the i’th variable, obtaining outcome b, then using an optimal tree for fi←b with weights w1 , . . . , wn . This proves the “≤” part. For the “≥” part, given a tree computing f which makes its first query to the i’th variable, where i achieves the minimum above, we can obtain a tree for computing fi←b , for any b ∈ X, by simulating this tree, replacing the first query with the constant b and hence saving weight wi . In the above fact, and the rest of the paper, we write [n] for {1, . . . , n}.

3

1.3

Composition of relations

P For i ∈ [n], let f i ⊆ X mi × Y , and g ⊆ Y n × Z. Set m = i mi . The composition g ◦ (f 1 , . . . , f n ) is defined as {(x, z) ∈ X m × Z : ∃y ∈ Y n , ∀i (xi , yi ) ∈ f i ∧ (y, z) ∈ g}, where we write x = (x1 , . . . , xn ), with xi ∈ X mi . If f and g are total functions, this simply says that for all x ∈ X m , h(x) = g(f 1 (x1 ), . . . , f n (xn )). In words, an algorithm to compute the relation h on input x has to output an arbitrary z such that z ∈ g(y), for some y such that for all i, yi ∈ f i (xi ). Assume that f i is non-trivial for all i. If some of the relations f i are constant (i.e. (f i )−1 (bi ) = dom f i , for some bi ∈ Y ), this expression for h can be simplified by removing these inputs and modifying g appropriately. Write g¯ for the relation obtained by fixing constants to the inputs to g, where possible. In other words, if S ⊆ [n] is the set of indices i such that f i is not constant, then g¯ = {(y, z) ∈ g : yi = bi , i ∈ / S}, h = g¯ ◦ (f 1 , . . . , f n ) = {(x, z) ∈ X m × Z : ∃y ∈ Y n , ∀i ∈ S, (xi , yi ) ∈ f i ∧ (y, z) ∈ g¯}.

2

A composition theorem

We are finally ready to state and prove our main result. Theorem 2. Fix n ∈ N and finite sets X, Z. Let g ⊆ {0, 1}n × Z, and for 1 ≤ i ≤ n, let P f i ⊆ X mi × {0, 1} be non-trivial. Set m = i mi . Define h ⊆ X m × Z by h = g ◦ (f 1 , . . . , f n ). Also let g¯ be the relation given by substituting constant inputs into g. That is, for each i, if (f i )−1 (bi ) = dom f i , then g is replaced with gi←bi (as f i is non-trivial, there is exactly one such bi ). Then D(h) = D(¯ g , [D(f 1 ), . . . , D(f n )]). Proof. To see that D(h) ≤ D(¯ g , [D(f 1 ), . . . , D(f n )]), observe that the algorithm discussed in the introduction achieves this complexity. Formally, we compute h as follows: first form a new relation g¯ by replacing constant relations f i with corresponding constants as inputs to g; then use an optimal tree for computing g¯ with weights D(f 1 ), . . . , D(f n ), replacing each query to the i’th variable with the use of an optimal tree for f i . The more interesting part is the corresponding lower bound, the proof of which proceeds by induction on D(h). For the base case, take D(h) = 0. Then h is constant. By definition, this implies that there exists z such that the following holds: For all x = (x1 , . . . , xn ) such that there exist y 0 ∈ {0, 1}n and z 0 ∈ Z with (xi , yi0 ) ∈ f i for all i and (y 0 , z 0 ) ∈ g, there exists y ∈ {0, 1}n such that (xi , yi ) ∈ f i for all i and (y, z) ∈ g. Let S = {i : f i is not constant}. For all i ∈ S, let ai , bi ∈ X mi be picked such that (ai , 0) ∈ f , (ai , 1) ∈ / f , (bi , 0) ∈ / f and (bi , 1) ∈ f . As f i is not constant, such elements ai , bi exist. Further, for all i ∈ / S, let ci satisfy (f i )−1 (ci ) = dom f i . As f i is not trivial, ci is unique i m i and there exists d ∈ X such that (di , ci ) ∈ f i , but (di , c0i ) ∈ / f i for c0i 6= ci . For s : S → {0, 1}, let the string y(s) ∈ {0, 1}n be defined as follows: y(s)i = s(i) for i ∈ S, and y(s)i = ci for i ∈ / S. For each distinct input x ∈ {0, 1}m whose i’th block is equal to ai or bi (for 4

i ∈ S) or di (for i ∈ / S), a different string y(s) is obtained, and if (x, y) ∈ (f 1 , . . . , f n ), y = y(s). In particular, by choosing s appropriately one can produce any y ∈ dom g¯. As there is exactly one y such that (x, y) ∈ (f 1 , . . . , f n ), and h is constant, (y, z) ∈ g. Thus, for all y ∈ dom g¯, (y, z) ∈ g¯. Hence g¯ is constant and D(¯ g , [D(f 1 ), . . . , D(f n )]) = 0 as required. For the inductive step, assume that D(h) = D(¯ g , [D(f 1 ), . . . , D(f n )]) for all relations h = 1 n i g ◦ (f , . . . , f ) such that f is non-trivial for all i, and such that D(h) ≤ k, for some k. Now let h = g ◦ (f 1 , . . . , f n ) satisfy D(h) = k + 1. As h is not constant, by Fact 1 D(h) =

min max D(hi←b ) + 1

i∈[m] b∈X

i = min min max D(g ◦ (f 1 , . . . , fj←b , . . . , f n )) + 1. i∈[n] j∈[mi ] b∈X

i As f i is non-trivial for all i, for any pair i, j there exists b0 such that fj←b 0 is non-trivial. Thus, i for b achieving the above maximum, fj←b is non-trivial. Let i, j be the indices achieving the above minimum. Then, by the inductive hypothesis, i i ), . . . , D(f n )]), , . . . , f n )) = max D(g¯, [D(f 1 ), . . . , D(fj←b max D(g ◦ (f 1 , . . . , fj←b b∈X

b∈X

where g¯ is the relation obtained by substituting bk as the k’th input to g, for all k 6= i such that i i . )−1 (bi ) = dom fj←b (f k )−1 (bk ) = dom f k , and also substituting bi as the i’th input to g if (fj←b We thus have i ), . . . , D(f n )]) + 1 D(h) = max D(g¯, [D(f 1 ), . . . , D(fj←b b∈X

i ), . . . , D(f n )]) + 1, = D(g¯, [D(f 1 ), . . . , max D(fj←b b∈X

where the second equality holds because decreasing any of the weights cannot increase the decision i is not constant. Then g¯ = g¯ tree complexity. First assume that, for b achieving the maximum, fj←b and i ), . . . , D(f n )]) + 1 D(h) = D(¯ g , [D(f 1 ), . . . , max D(fj←b b∈X

1

≥ D(¯ g , [D(f ), . . . , D(f i ) − 1, . . . , D(f n )]) + 1 ≥ D(¯ g , [D(f 1 ), . . . , D(f n )]) i as required. The first inequality follows from D(f i ) ≤ maxb∈X D(fj←b ) + 1, while the second holds because, to find a decision tree T which computes g¯ with weight D(f i ) on the i’th variable, we can just use an optimal decision tree T 0 for g¯ with weight D(f i ) − 1 on the i’th variable. As T 0 is optimal, we only encounter the i’th variable at most once on any path from the root to a leaf, so the depth of T is upper bounded by the depth of T 0 , plus 1. i On the other hand, now assume that, for b achieving the maximum, fj←b is constant. Then i i it must hold that D(f ) = 1 (if f were already constant, querying any of the variables on which it depends would not achieve the overall minimum; if f i is not constant, and there exists b0 ∈ X i 0 i such that fj←b 0 is not constant, such a b would achieve the maximum over b; hence D(f ) = 1). So we can produce a decision tree for g¯ with weights D(f 1 ), . . . , D(f n ) by computing f i using one query to the j’th variable and then using an optimal decision tree for g¯ with weights D(f 1 ), . . . , D(f i−1 ), 0, D(f i+1 ), . . . , D(f n ). Thus i D(¯ g , [D(f 1 ), . . . , D(f n )]) ≤ max D(g¯, [D(f 1 ), . . . , D(fj←b ), . . . , D(f n )]) + 1 = D(h) b∈X

as required, completing the proof. 5

2.1

Corollaries

We finish by noting a few simple corollaries of Theorem 2 that can be useful in amplifying separations between decision tree complexity and other complexity measures. We begin by considering the case where all the relations f i are the same. Corollary 3. Fix f ⊆ X k ×{0, 1} and g ⊆ {0, 1}n ×Z. Define h ⊆ X nk ×Z as h = g ◦(f, f, . . . , f ). Then D(h) = D(f )D(g). Proof. If f is constant, then D(h) = 0. Otherwise, take f 1 = · · · = f n = f in Theorem 2. Then D(h) = D(g, [D(f ), . . . , D(f )]) = D(f )D(g). Second, we observe that the composition theorem can be used to characterise the decision tree complexity of an iterated boolean function, via the following corollary. k

Corollary 4. For any f : {0, 1}n → {0, 1}, let the k-fold iteration of f , f (k) : {0, 1}n → {0, 1}, be k defined as follows. Set f (1) = f , and for k ≥ 2, split the input x ∈ {0, 1}n into blocks x1 , . . . , xn of nk−1 bits each, and set f (k) (x) = f (f (k−1) (x1 ), . . . , f (k−1) (xn )). Then D(f (k) ) = D(f )k . Proof. If f is constant, then D(f (k) ) = 0 = D(f )k . Otherwise, by Theorem 2, for k ≥ 2, D(f (k) ) = D(f, [D(f (k−1) ), . . . , D(f (k−1) )]) = D(f )D(f (k−1) ). The claim follows by induction. Nisan and Szegedy used an iterated version of the “not all equal” function on 3 bits to prove an asymptotic separation between decision tree complexity and polynomial degree [6], based on lower bounding decision tree complexity by sensitivity. Corollary 4 gives a direct proof of this result. Finally, we obtain the following special case of a direct sum result of Jain, Klauck and Santha [5]. Corollary 5. For any non-trivial P f 1 , . . . , f n ⊆ X k × {0, 1}, let h ⊆ X kn × {0, 1}n be defined by 1 2 n h = (f , f , . . . , f ). Then D(h) = i∈[n] D(f i ). 1 ), . . . , D(f n )]). Let Proof. Take g(y1 , . . . , yn ) = (y1 , . . . , yn ) in Theorem 2. Then D(h) g , [D(fP P = D(¯ i i 1 n S = {i : f is not constant}. Then D(¯ g , [D(f ), . . . , D(f )]) = i∈S D(f ) = i∈[n] D(f i ).

The proof technique of [5] allows the range of each relation f i to be arbitrary, rather than being restricted to {0, 1} as here.

Acknowledgements I would like to thank Graeme Mitchison and Toby Cubitt for helpful discussions and comments on this work, and an anonymous referee for inspiring it.

References [1] H. Buhrman and R. de Wolf. Complexity measures and decision tree complexity: a survey. Theoretical Computer Science, 288:21–43, 2002. [2] A. Drucker. The Complexity of Joint Computation. PhD thesis, MIT, 2012.

6

ˇ [3] P. Høyer, T. Lee, and R. Spalek. Tight adversary bounds for composite functions, 2005. quant-ph/0509067. ˇ [4] P. Høyer, T. Lee, and R. Spalek. Negative weights make adversaries stronger. In Proc. 39th Annual ACM Symp. Theory of Computing, pages 526–535, 2007. quant-ph/0611054. [5] R. Jain, H. Klauck, and M. Santha. Optimal direct sum results for deterministic and randomized decision tree complexity. Inf. Proc. Lett., 110(20):893–897, 2010. arXiv:1004.0105. [6] N. Nisan and M. Szegedy. On the degree of Boolean functions as real polynomials. Computational Complexity, 4(4):301–313, 1994. [7] B. Reichardt. Span programs and quantum query complexity: The general adversary bound is nearly tight for every boolean function. In Proc. 50th Annual Symp. Foundations of Computer Science, pages 544–551, 2009. arXiv:0904.2759. [8] B. Reichardt. Reflections for quantum query algorithms. In Proc. 22nd ACM-SIAM Symp. Discrete Algorithms, pages 560–569, 2011. arXiv:1005.1601.

7