Oct 28, 2010 - to the active list. III. GENERALIZING TO HIGHER DIMENSIONS: SIMPLEXES AND HYPERCUBES. REVIEW OF PRIOR WORK. The first question to decid...

0 downloads 23 Views 511KB Size

No documents

arXiv:1009.4647v2 [hep-ph] 28 Oct 2010

c 2010 Stephen L. Adler

Contents

I. Introduction II. One dimensional adaptive integration

3 4

III. Generalizing to higher dimensions: simplexes and hypercubes. Review of prior work. IV. Simplex properties and applications

8

A. Simplex properties

9

B. Simplex applications

12

V. Simplex subdivision and properties

16

A. Simplex subdivision algorithms

16

B. Subdivision properties

17

VI. Hypercube subdivision and properties VII. Parameterized higher order integration formulas for a general simplex

∗

6

21 24

A. First through third order formulas

30

B. Fourth order formula

33

C. Fifth order formula

34

D. Vandermonde solvers

36

E. Seventh order formula

36

Electronic address: [email protected]

2 F. Ninth order formula

38

G. Leading term in higher order

40

VIII. Derivation of the simplex generating function

40

IX. Parameterized higher order integration formulas for axis-parallel hypercubes 42 A. First and third order formulas

45

B. Fifth order formula

46

C. Seventh order formula

46

D. Ninth order formula

47

E. Leading term in higher order

49

F. One dimension revisited: comparison of Vandermonde moment fitting to standard one dimensional methods X. Function calls needed for integration routines of various orders

49 54

XI. Putting it all together – sketch of the algorithms

56

XII. Test integrals; false positives and their avoidance

59

XIII. Description of programs in the seven directories

62

A. General description

62

B. Inputs

66

C. Outputs (and their use in making memory and running time estimates)

68

XIV. Some sample results, and open questions

71

A. Sample results

71

B. Programming extensions and open questions

77

XV. Acknowledgements XVI. References XVII. Contents of programs in directories

78 79 81

A. Directory simplex123

81

B. Directory simplex4

81

C. Directory simplex579

82

3 D. Directory simplex579 16

82

E. Directory cube13

82

F. Directory cube579

82

G. Directory cube579 16

83

I.

INTRODUCTION

This book is concerned with numerical integration in general p dimensional spaces. To understand why special methods are needed, let us consider for the moment trapezoidal or center-of-bin integration on the unit interval in p = 1 dimension. Since these are both second order methods, to achieve an accuracy of one part in 104 one needs a division of the unit interval into roughly 100 subdivisions, with an evaluation of the integrand function at each. This poses no problem for numerical evaluation, but suppose instead we wish to integrate a function over a 9 dimensional region, achieving a similar accuracy of one part in 104 . One then needs 100 divisions per axis, and (100)9 = 1018 function evaluations, which is a daunting task even for the fastest current computers. So a brute force extension of the trapezoidal rule (or similar higher order methods, such as Simpson’s rule) is not a viable approach when the dimension of the space p is more than around four. In consequence, methods for high dimensional spaces have focused on adaptive algorithms, in which function evaluations are concentrated in regions where the integrand is large and rapidly varying. Both Monte Carlo and deterministic algorithms have been proposed and widely used. Typically, they start from a base region, and then subdivide or refine on one to three or four sides along which the integrand is most rapidly varying. The process is then iterated, leading to finer subdivisions and an improved estimate of the integrand. Most authors, however, have considered it to be computationally prohibitive to proceed at each step by dividing the base region into 2p subregions, so that the maximal length of each side is reduced by a factor of 2 at each step. Such a subdivision would allow localization of isolated integrand peaks in p dimensions, giving a method with the potential of achieving high accuracy for integrations in high dimensional spaces, and high resolution in applications such as template-based pattern recognition. The motivation for this book is the observation that computer speed has dramatically increased in recent years, while the cost of memory has simultaneously dramatically decreased; our current laptop speeds, and memories, are characterized by “giga” rather than the “mega” of two decades ago. So it is now timely to address the problem of formulating practical high dimension integration

4 routines that proceed by 2p subdivision. We will develop methods for adaptive integration over both general simplexes, and axis-parallel hypercubes. Our simplex method is based on combining Moore’s (1992) algorithm for 2p subdivision of a general simplex, with new formulas for parameterized higher order integration over a general simplex that we derive using the centroid approach of Good and Gaskins (1969, 1971), to give a a fully localizable adaptive integration procedure for general dimension p ≥ 1. In addition to giving a hypercube method based on partition into simplexes, we also give a simpler, direct method for integration over hypercubes, constructed by analogy with our methods for simplexes. We focus specifically on a few special base region geometries: the standard simplex (relevant for calculating Feynman parameter integrals in physics), the Kuhn simplex, which can be used to tile the p dimensional side 1 hypercube by symmetrization of the integrand, and the half-side 1 hypercube, which for which we give direct algorithms which are simpler than the simplex-based algorithms. By changes of variable, any multiple integral with fixed limits of integration in each dimension can be converted to an integration over the side 1 or half-side 1 hypercube. In the following sections we develop the theory behind our methods, and then give a suite of Fortran programs, for both serial and MPI parallel computation, implementing them.

II.

ONE DIMENSIONAL ADAPTIVE INTEGRATION

As a simple example, let us sketch how to write an adaptive integration program in one dimension for the integral I=

Z

1

dxf (x)

.

(1)

0

A first estimate can be obtained by using the trapezoidal rule Ia ≃ 0.5[f (0) + f (1)] ,

(2)

and a second estimate obtained by using the center-of-bin rule Ib ≃ f (0.5)

.

(3)

These are both first order accurate methods, but since they are applied to the entire interval (0, 1) there will be a significant error, unless f (x) happens to be a linear function over the interval. If we want an evaluation of the integral with an estimated error ǫ, we test whether |Ia − Ib | < ǫ. If this condition is satisfied, we output Ia and Ib as estimates of the integral. If the condition is not

5 satisfied, we subdivide the interval (0, 1) into two half-sized intervals (0, 0.5) and (0.5, 1). In each subinterval we follow the same procedure. For a subinterval with upper limit xU and lower limit xL , and midpoint xM , we now define Ia ≃ 0.5[f (xU ) + f (xL )] ,

(4)

Ib ≃ f (xM ) .

(5)

and

For the two subintervals, we evaluate the trapezoidal and center-of-bin approximations to the integral, keeping Ia (subinterval) and Ib (subinterval) for the subinterval, multiplied by the subinterval width of 1/2, as contributions to the answer if the “thinning” condition |Ia (subinterval) − Ib (subinterval)| < ǫ

(6)

is met, and subdividing the interval by half again if this condition is not met. When, after a sequence of subdivisions, the condition is met for all subintervals, we have obtained good approximations to both a trapezoidal and center-of-bin evaluation of the integral, Ia ≃ Ib ≃

X

L(subinterval)Ia (subinterval)

,

subintervals

X

L(subinterval)Ib (subinterval) .

subintervals

(7) Here L(subinterval) is the subinterval length, and since the subintervals are a tiling of the interval (0, 1), we clearly have X

L(subinterval) = 1

.

(8)

subintervals

From the difference of Ia and Ib we get an estimate of the error, given by |outdiff| ≡ |Ia − Ib | .

(9)

We can also compute the sum of the absolute values of the local subinterval errors, errsum ≡

X

subintervals

L(subinterval)|Ia (subinterval) − Ib (subinterval)| ≥ |outdiff| .

(10)

When the condition |Ia (subinterval) − Ib (subinterval)| < ǫ is met for all subintervals, then errsum reduces, using Eq. (8), to errsum < ǫ ,

(11)

6 and if all local subinterval errors have the same sign, then we have errsum = |outdiff|. If the process of subdivision has to be stopped before the condition |Ia (subinterval) − Ib (subinterval)| < ǫ is satisfied for all subintervals, with the remaining subregion contributions added to Ia and Ib before the program terminates, then errsum will typically be larger than ǫ. Such premature termination can happen for very irregular or singular functions, or if the parameter ǫ is made too small, or if one subdivides without imposing the thinning condition of Eq. (6). But for smooth functions f (x) and thinning with attainable ǫ the subdivision process will terminate quite rapidly. The reason is that both the trapezoidal and center-of-bin methods are accurate to first order with a second order error, and so the difference Ia (subinterval) − Ib (subinterval) scales as [L(subinterval)]2 as the subinterval length L(subinterval) approaches zero.

The adaptive integration method just sketched is easily programmed, and works well. One does not have to keep track of the relative location of the various subintervals, only of their starting and ending x values. Thus, one maintains a list of active subintervals, stored in any convenient order; when a subinterval is divided the two resulting halves are added to the list of active subintervals, while if a subinterval obeys the thinning condition , its contributions to Ia , Ib , and errsum are added to an accumulation register, and the subinterval is removed from the list of active subintervals. Even faster termination is obtained if Simpson’s rule or an even higher order integration rule is used; see for example the Wikipedia article on the McKeeman (1962) adaptive Simpson rule, and references given there. The idea again is to compute two different evaluations of the integral over each subinterval, giving an error estimate that is used to determine whether to “harvest” the result at that level of subdivision, or to subdivide further. In generalizing to higher dimensional integrals, the same features persist: for each integration subregion, we evaluate a local thinning condition obtained from the difference of two alternative higher order integration rules. If the condition is obeyed, that subregion is “harvested” and deleted from the list of active subregions; if the condition is not obeyed, the subregion is further subdivided and the resulting smaller subregions are added to the active list.

III.

GENERALIZING TO HIGHER DIMENSIONS: SIMPLEXES AND HYPERCUBES. REVIEW OF PRIOR WORK.

The first question to decide in generalizing to higher dimensions is the choice of base region geometry. There are two natural higher dimensional analogs of the one dimensional interval (0, 1). The first is the side 1 hypercube (0, 1) ⊗ (0, 1) ⊗ ... ⊗ (0, 1), and the second is what we will term a

7

FIG. 1: From left to right, the unit standard simplex, the side 1 hypercube, and the half-side 1 hypercube, in 2 dimensions.

standard simplex with vertices (0, 0, ..., 0), (0, 1, 0, 0.....0), (0, 0, 1, 0, 0...., 0), ...., (0, 0, ...., 0, 1). We will also make use of the half-side 1 hypercube, spanning (−1, 1) ⊗ (−1, 1) ⊗ ... ⊗ (−1, 1). These three basic regions are illustrated, in two dimensions, in Fig. 1. Some simple geometric facts are important in setting a strategy. For a p dimensional simplex, the number of vertices is p + 1 and the number of sides connecting vertices is (p + 1)p/2, both of which have polynomial growth. Thus, the indexing problem of keeping track of vertices which define active regions is relatively simple. For a p dimensional hypercube, the number of m-dimensional hypercubes on the boundary (see, e.g., the Wikipedia article on hypercubes) is 2p−m p !/(m !(p − m) !), and so the number of vertices (m = 0) is 2p , and the number of sides connecting vertices

(m = 1) is p 2p−1 , both of which grow exponentially with p. Thus, if one labels hypercubes in terms of their vertices or sides, an exponentially growing index is required for large p. However, for the maximal boundary hypercube, with m = p − 1, the number given by the above formula is just 2p, which again has polynomial, in fact linear, growth. (For example, a square has 2 × 2 = 4 lines as sides, and a cube has 2 × 3 = 6 squares as faces.) So our direct method for hypercubes will use geometric features of the maximal boundary hypercubes for indexing, subdivision, and integration, closely following the methods that we develop for simplexes. Before getting into further details, let us first give a very brief survey of adaptive methods for higher dimensional integration that are currently in the literature. A method that is widely used by physicists to evaluate Feynman parameter integrals is the VEGAS program of Lepage (1978), which uses a hypercube as the base geometry. This is a Monte Carlo method, in which

8 random samplings of the integration volume are done with a separable probability density that is a product of one dimensional densities along each axis. This probability density is then iterated to give a more detailed sampling along axes on which the projection of the integrand is rapidly varying. A deterministic method of Genz and Cools (2003) is based on simplexes as the base regions. The algorithm picks the subregion with the largest estimated error, and subdivides it into up to four equal volume subregions by cutting edges along which the integrand is most rapidly varying. This, and related adaptive algorithms, are discussed in the survey of CUBPACK by Cools and Haegemans (2003). The CUBA set of algorithms described by Hahn (2005) includes both Monte Carlo methods and deterministic methods; the former include refinements of VEGAS and the latter proceed by bisection of the subregion with largest error. A survey of many types of high dimensional integration algorithms, including adaptive algorithms, is contained in the HIntLib Manual of Sch¨ urer (2008). Most of the algorithms just described do not proceed directly to a 2p subdivision of the base region (although the possibility of 2p subdivision is sketched in “Algorithm 2” of Cools and Haegemans (2003)). An algorithm in the literature which makes use of a 2p subdivision was given by Kahaner and Wells (1979). Unlike the algorithms which we develop below, which work directly from the vertex coordinates of a general simplex, the Kahaner Wells algorithm uses changes of variable for both simplex subdivision and integration. It also rank orders the errors for each subregion (as do most of the algorithms described in the preceding paragraph), and at each stage subdivides the subregion with the largest contribution to the total error. While this global method may result in efficiencies in reducing the number of subdivisions needed, it makes parallelization of the algorithm more complicated, since the computations for the different subregions are not independent of one another. Also, when many subregions have errors of similar size, which is often the case, the computational effort involved in rank ordering the errors may not be justified. In the algorithms developed below, as in the one-dimensional example given in Sec. 1, we use a local thinning condition for the subregions, making it easy to turn serial versions of the algorithm into parallel ones. We note, however, that the subdivision and integration methods that we use could also be incorporated into global adaptive algorithms.

IV.

SIMPLEX PROPERTIES AND APPLICATIONS

Any set of p + 1 points in p dimensional space defines a p-simplex, and we will be concerned with integrations over the interior region defined this way. Thus, in 1 dimension, 2 points define a

9 1-simplex that is the line segment joining them, in 2 dimensions, 3 points define a 2-simplex that is a triangle, in three dimensions, 4 points define a 3-simplex that is a tetrahedron, and so forth. We will refer to the p + 1 points, that each define a p-vector, as the vertices of the simplex, and our strategy will be to express all operations, both for the subdivision of simplexes and for calculating approximations to integrals over simplexes, directly in terms of these vertices. Our convention, both here in the text and in the programs, is that the p + 1 vertices of a simplex are enumerated from 0 to p, and the p vector components of each vertex are enumerated from 1 to p. Both will be denoted by subscripts; it should be clear from context and from the range of the index whether an index is the label of a simplex vertex, as in x0 , ..., xp , or the component index of a general point x, as in x1 , ..., xp . In this notation, the ith component of the jth simplex vertex is denoted by a double subscript xji .

A.

Simplex properties

A simplex forms a convex set. This means that for any integer n ≥ 1 and any set of points x1 , ...., xn lying within (or on the boundary) of a simplex, and any set of non-negative numbers α1 , ...., αn which sum to unity, αj ≥ 0,

j = 1, ..., n n X

,

αj = 1 ,

j=1

(12) the point x=

n X

αj xj

(13)

j=1

also lies within (or on the boundary) of the simplex (see, e.g., Osborne (2001)). In constructing integration rules for simplexes, we will be particularly interested in linear combinations of the form of Eq. (13) in which the points x1 , ..., xn are vertices of the simplex. For such sums, one can state a rule which determines precisely where the point x lies with respect to the boundaries of the simplex. Let x0 , x1 , ..., xp be the vertices of a simplex, and let xc denote the centroid of the simplex, p

xc =

1 X xj p+1 j=0

.

(14)

10 Let us denote by x ˜j the coordinates of the vertices with respect to the centroid as origin, x ˜ j = xj − xc

,

(15)

which obey the constraint following from Eq. (14), p X

x ˜j = 0 .

(16)

j=0

Correspondingly, let x denote a general point, and let x ˜ = x − xc denote the general point referred to the centroid as origin. Since we are assuming that the simplex is non-degenerate, the vectors x ˜j span a linearly independent basis for the p-dimensional space, and so we can always expand x ˜ as a linear combination of the x ˜j , x ˜=

p X

αj x ˜j

.

(17)

j=0

This expansion is not unique, since by Eq. (16) we can add a constant a to all of the coefficients αj , without changing the sum in Eq. (17). In particular, we can use this freedom to put the expansion of Eq. (17) in a standard form, which we will assume henceforth, in which the sum of the coefficients αj is unity, p X

αj = 1 .

(18)

j=0

For coefficients (called barycentric coordinates) obeying this unit sum condition, we can use Eqs. (14) and (15) to also write x=

p X

αj xj

.

(19)

j=0

In terms of the expansion of Eqs. (17) through (19) we can now state a rule see Pontryagin (1952) and the Wikipedia article on barycentric coordinates for determining where the point x

lies with respect to the simplex: (1) If all of the αj are strictly positive, the point lies inside the boundaries of the simplex; (2) If a coefficient αj is zero, the point lies on the boundary plane opposite to the vertex xj , and if several of the αj vanish, the point lies on the intersection of the corresponding boundary planes; (3) If any coefficient αj is negative, the point lies outside the simplex. To derive this rule, we observe that a point x lies within the simplex only if it lies on the same side of each boundary plane of the simplex as the simplex vertex opposite that boundary. Let us focus on one particular vertex of the simplex, which we label xp , so that the other p vertices are

11 x0 , ..., xp−1 . These p vertices span an affine hyperplane, which divides the p-dimensional space into two disjoint parts, and constitutes the simplex boundary hyperplane opposite the simplex vertex xp . A general parameterization of this hyperplane takes the form p−1 X

x = x0 +

j=1

βj (xj − x0 ) ,

(20)

that is, we take x0 as a fiducial point on the hyperplane and add arbitrary multiples of a complete basis of vectors xj − x0 in the hyperplane. Rewriting Eq. (20) as x=

p−1 X

γj xj

,

(21)

j=0

with γ0 = 1 −

Pp−1 j=1

βj and γj = βj , j ≥ 1, we see that the p coefficients γj obey the condition p−1 X

γj = 1 .

(22)

j=0

By virtue of this condition, we can also write the hyperplane parameterization of Eq. (21) in terms of coordinates with origin at the simplex centroid, x ˜=

p−1 X

γj x ˜j

.

(23)

j=0

We now wish to determine whether the general point x ˜ lies on the same side of this hyperplane as the vertex x ˜p , or lies on the hyperplane, or lies on the opposite side from x ˜p , by using the expansion of Eqs. (17) and (18), which we rewrite in the form x ˜=

p−1 X

αj x ˜j +

j=0

j=0

−

p−1 αp X x ˜j p

p−1 αp X x ˜j + αp x ˜p p

.

j=0

(24) The first line on the right hand side of Eq. (24) has the form of the hyperplane parameterization of Eq. (23), since by construction the coefficients add up to unity, and so this part of the right hand side is a point on the boundary hyperplane opposite the vertex x ˜p . The second line on the right hand side of Eq. (24) can be rewritten, by using Eq. (16), as αp

p+1 x ˜p p

.

(25)

12 To appreciate the significance of this, we note that the centroid of the p points defining the boundary hyperplane is p−1

x ˜h; c

1 1X x ˜j = − x ˜p = p p

,

(26)

j=0

where we have again used Eq. (16). Therefore the vector from the centroid of the points defining the hyperplane to the vertex x ˜p is x ˜p − x ˜h; c =

p+1 x ˜p p

.

(27)

So Eq. (25) tells us that the point x ˜ is displaced from the hyperplane by a vector parallel to that of Eq. (27), with its length rescaled by the factor αp . Therefore, if αp > 0, the point x ˜ lies on the same side of the boundary hyperplane as the opposite vertex x ˜p . If αp = 0, the point x ˜ lies on the boundary hyperplane, and if αp < 0, the point x ˜ lies on the opposite side of the boundary hyperplane from the vertex x ˜p . Applying this argument to all p + 1 vertices in turn gives the rules stated above. In constructing integration rules for simplexes, we will use the following elementary corollary of the result that we have just derived. Consider the sum ˜ = X

N X

λi x ˜i

,

(28)

i=1

with the coefficients λi obeying λi > 0, N X

i = 1, ..., N

λi < 1 ,

i=1

(29) with the points x ˜i any vertices of a simplex. Some vertices may be omitted, and some used more ˜ lies inside the simplex. To see this, we note than once, in the sum of Eq. (28). Then the point X that by adding a positive multiple of zero in the form of Eq. (16), the sum of Eq. (28) can be reduced to the form of Eqs. (17) and (18), with all expansion coefficients αj strictly positive. By ˜ lies within the simplex. the rule stated above, this implies that the point X

B.

Simplex applications

Our programs for p-dimensional integration make special use of two kinds of simplexes, the unit “standard simplex” introduced above, and the unit Kuhn simplex. In this subsection, we discuss

13 important applications of these two special types of simplexes. To recapitulate, the unit standard simplex has vertices given by x0 =(0, 0, 0, ..., 0)

,

x1 =(1, 0, 0, ..., 0)

,

x2 =(0, 1, 0, ..., 0)

,

x3 =(0, 0, 1, 0, ..., 0)

,

............ xp−1 =(0, 0, 0, ..., 0, 1, 0)

,

xp =(0, 0, 0, ..., 0, 0, 1)

. (30)

It is bounded by axis-parallel hyperplanes xj = 0,

j = 1, ..., p and the diagonal hyperplane

1 = x1 + x2 + ... + xp . Thus, the integral of a function f (x1 , ..., xp ) over the standard simplex can be written as a multiple integral in the form Z

f (x1 , ..., xp )dx1 ...dxp = ×

1

dx1

0 1−x1 −x2 −...−xp−2

standard simplex

Z

Z

0

Z

1−x1

dx2

dxp−1

1−x1 −x2

dx3 ....

0 1−x1 −x2 −...−xp−1

0

Z

Z

dxp f (x1 , ..., xp ) . (31)

0

An important physics application of this formula is the Feynman-Schwinger formula for combining perturbation theory denominators, 1 = p! D0 D1 ...Dp

Z

standard simplex

1 [(1 − x1 − x2 − ... − xp )D0 + x1 D1 + ... + xp Dp ]p+1

,

(32)

which can be proved inductively as follows. For p = 1, the Feynman-Schwinger formula reads 1 = D0 D1

Z

1

dx1

0

1 [(1 − x1 )D0 + x1 D1 ]2

,

(33)

which is easily verified by carrying out the integral. Assume now that this formula holds for dimension p. For p + 1, the formula asserts that 1 =(p + 1) ! D0 D1 ...Dp Dp+1 ×

Z

1

dx1 .... 0

Z

1−x1 −x2 −...−xp−1 0

dxp

Z

1−x1 −x2 −...−xp−1 −xp

dxp+1

0

1 [(1 − x1 − x2 − ... − xp − xp+1 )D0 + x1 D1 + ... + xp Dp + xp+1 Dp+1 ]p+2

. (34)

14 Carrying out the integral over xp+1 , we get Z 1−x1 −x2 −...−xp−1 Z 1 1 1 dxp dx1 .... = p! D0 D1 ...Dp Dp+1 D p+1 − D0 0 0 1 1 P P × − [(1 − pi=1 xi )D0 + x1 D1 + ... + xp Dp ]p+1 [(1 − pi=1 xi )Dp+1 + x1 D1 + ... + xp Dp ]p+1

.

(35)

But applying the induction hypothesis for p dimensions, the right hand side of this equation reduces to 1 1 1 1 − = Dp+1 − D0 D0 D1 ....Dp Dp+1 D1 ....Dp D0 D1 ...Dp Dp+1

,

(36)

which is the result to be proved. In the literature, numerical evaluation of Eq. (32) is usually accomplished by first making changes of variable to convert the simplex integral to an integral over a hypercube, and then using a hypercube-based program such as VEGAS. Using the methods developed below for direct evaluation of integrals over a standard simplex in arbitrary dimensions, the formula of Eq. (32) can also be integrated numerically in its original simplex form. We next turn to the unit Kuhn (1960) simplex, which has the vertices given by x0 =(0, 0, 0, ..., 0)

,

x1 =(1, 0, 0, ..., 0)

,

x2 =(1, 1, 0, ..., 0)

,

x3 =(1, 1, 1, 0, ..., 0)

,

............ xp−1 =(1, 1, 1, ..., 1, 1, 0)

,

xp =(1, 1, 1, ..., 1, 1, 1)

, (37)

and which defines a simplex in which 1 ≥ x1 ≥ x2 ≥ x3 .... ≥ xp−1 ≥ xp . The unit Kuhn simplex in two dimensions is illustrated in Fig. 2. The integral of a function f (x1 , ..., xp ) over a unit Kuhn simplex can be written as a multiple integral in the form Z

unit Kuhn

Z

1

Z

x1

Z

x2

dx3 .... dx2 dx1 f (x1 , ..., xp )dx1 ...dxp = 0 0 0 simplex Z xp−1 Z xp−2 dxp f (x1 , ..., xp ) . dxp−1 × 0

0

(38)

15

FIG. 2: The unit Kuhn simplex in 2 dimensions.

FIG. 3: Kuhn simplex tiling of a unit hypercube in 2 dimensions.

Consider now the integral of the function f (x1 , ..., xp ) over the unit hypercube, Z

1

dx1

0

Z

1

dx2 .... 0

Z

1

dxp−1 0

Z

1

dxp f (x1 , ..., xp ) .

(39)

0

This hypercube can be partitioned into p ! regions, each congruent to the unit Kuhn simplex, by the requirement that in the region corresponding to the permutation P of the coordinate labels 1, ..., p, the coordinates are ordered according to xP (1) ≥ xP (2) ≥ xP (3) .... ≥ xP (p−1) ≥ xP (p) . This partitioning or tiling is illustrated for a square in Fig. 3, and for a cube in three dimensions in Fig. 1 of Plaza (2007). Hence the integral of f over the unit hypercube is equal to the integral of the symmetrized function computed from f , integrated over the unit Kuhn simplex, Z

1

dx1 0

Z

0

1

dx2 .... Z =

Z

1

dxp−1 0

Z

1

dxp f (x1 , ..., xp )

0

X

unit Kuhn simplex p ! permutations P

f (xP (1) , ..., xP (p) )dx1 ...dxp

.

(40)

We will use this equivalence to construct adaptive programs for integration over a unit hypercube, based on first reducing it, by symmetrization, to an integral over a unit Kuhn simplex, and then adaptively subdividing the Kuhn simplex to reduce the integration error as needed.

16 V.

SIMPLEX SUBDIVISION AND PROPERTIES A.

Simplex subdivision algorithms

Two very simple algorithms for subdividing simplexes have been given in the computer graphics literature by Moore (1992). Let us denote the vertices of the starting simplex by x0 , ..., xp , each of which is a p-vector, and from these let us form the p-vectors V (k1 , k2 ) defined by V (k1 , k2 ) =

1 (xk + xk2 ) , 2 1

k1 , k2 = 0, ..., p.

(41)

Thus, V (0, 0) = x0 , V (0, 1) = (1/2)(x0 + x1 ) and so forth, so that the vectors V (k1 , k2 ) consist of the original simplex vertices, together with the midpoints of the original simplex edges. Let k = 0, ..., 2p − 1 be an index which labels the 2p subsimplexes into which the original simplex is divided. Moore then gives two algorithms, which he terms recursive subdivision and symmetric subdivision, for determining the vertices to be assigned to the subsimplex labelled with k. Both make use of the binary representation of k, and of a function determined by this representation, the bitcount function b(k), which is the number of 1 bits appearing in the binary representation of k. The recursive subdivision algorithm proceeds as follows. As the 0 vertex of the subsimplex labelled by k, take the vector V (b(k), b(k)), that is, k1 = k2 = b(k). To get the other vertices, scan along the binary representation of k from right (the units digit) to left. For each 0 encountered, add 1 to k2 , and for each 1 encountered, subtract 1 from k1 . The sequence of vectors V (k1 , k2 ) obtained this way gives the desired p + 1 vertices of the kth subsimplex. The symmetric subdivision algorithm proceeds as follows. As the 0 vertex of the subsimplex labelled by k, take the vector V (0, b(k)), that is, k1 = 0, k2 = b(k). To get the other vertices, scan along the binary representation of k from right (the units digit) to left. For each 0 encountered, add 1 to k2 , and for each 1 encountered, add 1 to k1 . The sequence of vectors V (k1 , k2 ) obtained this way gives the desired p + 1 vertices of the kth subsimplex. The application of these algorithms in the p = 2 case is illustrated in Tables I and II and Figs. 4–7, and in the p = 3 case is illustrated in Tables III and IV, where the notations V (j) (k1 , k2 ) and x(j) both refer to the jth vertex, j = 0, ..., p, of the subdivided simplex labelled by the k in each row. After reviewing these tables, it should be easy to follow the Fortran program for the algorithms given later on. The standard Fortran library does not include a bitcount function, but it does include a function IBIT S(I, P OS, LEN ), which gives the value of the substring of bits of length LEN , starting at position P OS, of the argument I. Thus, IBIT S(k, j, 1) gives the binary

17

FIG. 4: Recursive subdivision of a standard simplex.

FIG. 5: Symmetric subdivision of a standard simplex.

digit (0 or 1) at position j in the binary representation of k, which is all the information needed for the algorithm.

B.

Subdivision properties

This subdivision algorithm has a number of properties that will be useful in applying it to p dimensional integration. 1. As noted by Moore, the subdivided simplexes all have equal volume, equal to the initial simplex volume divided by 2p . This follows from the fact that the general formula for the volume of a simplex with vertices x0 , x1 , ..., xp is V =

1 | det(x1 − x0 , x2 − x0 , ..., xp − x0 )| . p!

(42)

Applying this to the vertices for the subdivided simplexes in Tables I-IV verifies this statement for p = 2, 3, while a proof in the general case is given in Edelsbrunner and Grayson (2000). 2. Again as noted by Moore, both the recursive and symmetric algorithms subdivide Kuhn simplexes into Kuhn simplexes, which however do not all have the same orientation, as illustrated in Fig. 6 and Fig. 7. This follows from the fact that Kuhn simplexes are a tiling

18

TABLE I: Recursive subdivision of a triangle (p = 2) b(k) V (0) (k1 , k2 ) V (1) (k1 , k2 ) V (2) (k1 , k2 ) x(0)

k 0=00

0

(0,0)

(0,1)

(0,2)

x(1)

x0

1 2 (x0

1=01

1

(1,1)

(0,1)

(0,2)

x1

1 2 (x0

2=10

1

(1,1)

(1,2)

(0,2)

x1

1 2 (x1

x2

1 2 (x1

3=11

2

(2,2)

(1,2)

(0,2)

x(2)

+ x1 )

1 2 (x0

+ x2 )

+ x1 )

1 2 (x0

+ x2 )

+ x2 )

1 2 (x0

+ x2 )

+ x2 )

1 2 (x0

+ x2 )

TABLE II: Symmetric subdivision of a triangle (p = 2) k 0=00

b(k) V (0) (k1 , k2 ) V (1) (k1 , k2 ) V (2) (k1 , k2 ) 0

(0,0)

(0,1)

(0,2)

x(0) x0

1=01

1

(0,1)

(1,1)

(1,2)

1 2 (x0

2=10

1

(0,1)

(0,2)

(1,2)

1 2 (x0

(2,2)

1 2 (x0

3=11

2

(0,2)

(1,2)

x(1) 1 2 (x0

+ x1 )

+ x1 )

x1

+ x1 )

1 2 (x0

+ x2 )

+ x2 )

1 2 (x1

+ x2 )

x(2) 1 2 (x0

+ x2 )

1 2 (x1

+ x2 )

1 2 (x1

+ x2 )

x2

of hypercubes, which are divided into hypercubes by axis parallel planes that intersect the midpoints of the hypercube edges. Adding additional diagonal slices intersecting the midpoints of the hypercube edges gives Kuhn tilings of both the original and the subdivided hypercubes. However, as also noted by Moore, when the algorithms are applied to general simplexes, the resultant subdivided simplexes can have different shapes, and are not isomorphic. For p = 2, Fig. 4 shows that recursive subdivision applied to the standard simplex leads to subsimplexes of different shapes, while Fig. 5 shows that symmetric subdivision applied to the standard simplex leads to subsimplexes that are all standard simplexes with dimension reduced by half. However, an examination of the vertices in Table IV shows that already at p = 3, symmetric subdivision of a standard simplex does not lead to subsimplexes that are all half size standard simplexes. For example, for k = 2 in Table IV, there are √ vertices 21 (x0 + x2 ) and 21 (x1 + x3 ), the edge joining which has length 3/2, whereas the √ maximum side length of a half size p = 3 standard simplex is 2/2.

FIG. 6: Recursive subdivision of a Kuhn simplex.

19

TABLE III: Recursive subdivision of a tetrahedron (p = 3) b(k) V (0) (k1 , k2 ) V (1) (k1 , k2 ) V (2) (k1 , k2 ) V (3) (k1 , k2 ) x(0)

k 0=000

0

(0,0)

(0,1)

(0,2)

(0,3)

x(1)

x0

1 2 (x0

x(2)

+ x1 )

1 2 (x0

x(3)

+ x2 )

1 2 (x0

+ x3 )

+ x1 )

1 2 (x0

+ x2 )

1 2 (x0

+ x3 )

+ x2 )

1 2 (x0

+ x2 )

1 2 (x0

+ x3 ) + x3 )

1=001

1

(1,1)

(0,1)

(0,2)

(0,3)

x1

1 2 (x0

2=010

1

(1,1)

(1,2)

(0,2)

(0,3)

x1

1 2 (x1

x2

1 2 (x1

+ x2 )

1 2 (x0

+ x2 )

1 2 (x0

+ x2 )

1 2 (x1

+ x3 )

1 2 (x0

+ x3 )

+ x2 )

1 2 (x1

+ x3 )

1 2 (x0

+ x3 )

+ x3 )

1 2 (x0

+ x3 )

+ x3 )

1 2 (x0

+ x3 )

3=011

2

(2,2)

(1,2)

(0,2)

(0,3)

4=100

1

(1,1)

(1,2)

(1,3)

(0,3)

x1

1 2 (x1

5=101

2

(2,2)

(1,2)

(1,3)

(0,3)

x2

1 2 (x1

+ x3 )

1 2 (x1

+ x3 )

1 2 (x1

6=110

2

(2,2)

(2,3)

(1,3)

(0,3)

x2

1 2 (x2

7=111

3

(3,3)

(2,3)

(1,3)

(0,3)

x3

1 2 (x2

TABLE IV: Symmetric subdivision of a tetrahedron (p = 3) b(k) V (0) (k1 , k2 ) V (1) (k1 , k2 ) V (2) (k1 , k2 ) V (3) (k1 , k2 )

k 0=000

0

(0,0)

(0,1)

(0,2)

(0,3)

x(0) x0

x(1) 1 2 (x0

+ x1 )

x(2) 1 2 (x0

+ x2 )

1 2 (x0

+ x3 )

1 2 (x1

+ x2 )

1 2 (x1

+ x3 )

1 2 (x1

+ x2 )

1 2 (x1

+ x3 )

1 2 (x2

+ x3 )

+ x3 )

1 2 (x1

+ x3 ) + x3 ) + x3 )

1=001

1

(0,1)

(1,1)

(1,2)

(1,3)

1 2 (x0

2=010

1

(0,1)

(0,2)

(1,2)

(1,3)

1 2 (x0

+ x2 )

1 2 (x1

+ x1 )

1 2 (x0

+ x2 )

1 2 (x1

+ x3 )

1 2 (x1

+ x3 )

+ x3 )

1 2 (x2

+ x3 )

x1

+ x1 ) + x1 )

1 2 (x0

+ x2 ) + x2 )

1 2 (x0

+ x3 )

1 2 (x2 1 2 (x2

3=011

2

(0,2)

(1,2)

(2,2)

(2,3)

1 2 (x0

4=100

1

(0,1)

(0,2)

(0,3)

(1,3)

1 2 (x0

+ x2 )

1 2 (x1

+ x2 )

1 2 (x0

+ x3 )

1 2 (x1

5=101

2

(0,2)

(1,2)

(1,3)

(2,3)

1 2 (x0

6=110

2

(0,2)

(0,3)

(1,3)

(2,3)

1 2 (x0

(3,3)

1 2 (x0

7=111

3

(0,3)

(1,3)

(2,3)

x(3)

+ x2 )

x2

x3

3. An important question is whether the maximum side length of the subdivided simplexes decreases at each stage of subdivision. For Kuhn simplexes, the answer is immediate, since subdivision results in Kuhn simplexes of half the dimension. Since the longest side of a unit √ Kuhn simplex in dimension p has length p, after ℓ subdivisions the maximum side length

FIG. 7: Symmetric subdivision of a Kuhn simplex.

20 will be LKuhn max =

√

p/2ℓ

,

(43)

irrespective of whether recursive or symmetric subdivision is used. For standard simplexes, we can get an upper bound on the maximum side length by noting that a unit standard simplex on axes y1 , ..., yp is obtained from a unit Kuhn simplex on axes x1 , ..., xp by the linear transformation yp = xp , yp−1 = xp−1 − xp , yp−2 = xp−2 − xp−1 , ..., y2 = x2 − x3 , y1 = x1 − x2 , since this maps the components of the Kuhn simplex vertices given in Eq. (37) to the corresponding components of the standard simplex vertices given in Eq. (30). By linearity, this relation also holds between vertices of corresponding subdivided simplexes obtained from the initial unit standard and Kuhn simplexes by applying the same midpoint subdivision method (either symmetric or recursive) successively to each. Consequently, the S of a subdivided standard simplex can be length Lstandard of an edge with components E1,...,p K of the corresponding edge of the related Kuhn expressed in terms of the components E1,...,p

simplex by p−1 p X X 1 K 2 ) + (EpK )2 ] 2 Lstandard ≡[ (EjS )2 ] = [ (EjK − Ej+1 j=1 p X

≤2[

j=1

1

(EjK )2 ] 2 = 2LKuhn

.

j=1

(44) Thus the length Lstandard is bounded from above by twice the maximum length corresponding to a subdivided Kuhn simplex, and so Lstandard ≤ max

√

p/2ℓ−1

.

(45)

We have verified this inequality numerically for both the recursive and symmetric subdivision algorithms. The numerical results suggest that the symmetric subdivision algorithm is in fact a factor of 2 better than the bound of Eq. (45), so that Lstandard; max

symmetric

≤

√ ℓ p/2

,

(46)

but we do not have a proof of this. We already see evidence of this difference between the symmetric and recursive algorithms in Tables III and IV. As noted above, from Table IV we √ saw that symmetric subdivision of a p = 3 standard simplex gives an edge of length 3/2,

21 and it is easy to see that this is the longest edge. However, from Table III we see that for k = 5 there are vertices x2 and 12 (x1 + x3 ), the edge joining which, for an initial standard √ simplex, has length 6/2. 4. The result of Eqs. (45) and(46) suggests the stronger conjecture, that after any number ℓ of symmetric (recursive) subdivisions of a standard simplex, the resulting subsimplexes each fit within a hypercube of side 1/2ℓ (1/2ℓ−1 ). A simple argument shows this to be true for ℓ = 1 in any dimension p. Although we do not have a proof for general ℓ, we will use this conjecture in certain of the algorithms constructed below. For Kuhn simplexes, an analogous statement with a hypercube of side 1/2ℓ is true for both symmetric and recursive subdivision, as noted above in the discussion preceding Eq. (43). 5. Finally, we note that although the symmetric algorithm gives the same simplex subdivision after permutation of the starting vertices in dimension p = 2, as can be verified from Table II, it is not permutation symmetric in dimension p = 3, as can be verified from Table IV. For example, interchanging the labels 0 and 1 in the k = 2 line of Table IV gives a set of vertices that is not in the table. This means that with symmetric (as well as recursive) subdivision in dimension p ≥ 3, inequivalent simplex subdivisions can be generated by permuting the labels of the starting vertices. However, we have not incorporated this feature into our programs. The properties just listed show that the symmetric and recursive subdivision algorithms are well suited to adaptive higher dimensional integration. They are easily computable in terms of the vertex coordinates for a general simplex, and give subsimplexes of equal volume, so that it is not necessary to calculate a determinant to obtain the volume. Additionally, the bound on the maximum side length decreases as a constant times 1/2ℓ with increasing order of subdivision ℓ, so that the application of high order integration formulas gives errors that decrease rapidly with ℓ.

VI.

HYPERCUBE SUBDIVISION AND PROPERTIES

We have discussed simplexes first, because as noted in Sec. III, our direct approach to hypercube integration will be based on following as closely as possible the methods that we develop for simplex integration. In our direct hypercube programs (i.e., the ones not based on tiling a side 1 hypercube with Kuhn simplexes), we will start from a half-side 1 hypercube with base region (−1, 1) ⊗ (−1, 1) ⊗ ... ⊗ (−1, 1)

.

(47)

22 This region has inversion symmetry around the origin, and consequently the only monomials that have non-vanishing integrals over this region are ones in which each coordinate appears with an even exponent, considerably simplifying the calculations needed to construct higher order integration rules. Since we restrict ourselves to axis-parallel hypercubes, only p + 1 real numbers are needed to uniquely specify a hypercube: the p coordinates of the centroid xc and the half-side length S. For example, for the region of Eq. (47), the centroid is xc = (0, 0, ..., 0) and the half-side is 1. Once we have adopted this labelling, we can give a very simple subdivision algorithm for hypercubes, constructed in direct analogy with Moore’s simplex subdivision algorithms. The hypercube subdivision algorithm proceeds as follows. Start from a hypercube with centroid xc and half-side S, with sides parallel to the p unit axis vectors u ˆ1 =(1, 0, 0, ..., 0) u ˆ2 =(0, 1, 0, ..., 0) ............ u ˆp−1 =(0, 0, ..., 1, 0) u ˆp =(0, 0, ..., 0, 1)

. (48)

To subdivide it into 2p subhypercubes, take the new half-side as S/2. To get the new centroids xc;k , labelled by k = 0, ..., 2p − 1, scan along the binary representation of k from right (the units digit) to left. Denoting the p digits in this representation by 1 ≤ j ≤ p, let us label the units digit as j = 1, the power of 2 digit as j = 2, the power of 4 digit as j = 3, and so forth. For all ˆj to xc , and if the j th digit is 1, add − 12 S u ˆj to xc . For 1 ≤ j ≤ p, if the j th digit is 0, add 21 S u each given k, this gives the centroid of the kth subhypercube. This algorithm is illustrated for the case of a cube (p = 3) in Table V. This algorithm is simpler than the ones for subdividing simplexes, since it only needs the Fortran IBITS function, but does not require subsequent computation of the bitcount function. It evidently has properties analogous to those of the simplex subdivision algorithms: each subhypercube has the same volume, equal to the original hypercube volume divided by 2p , and every linear dimension of each subhypercube is a factor of 2 smaller than the corresponding linear dimension of the hypercube that preceded it in the subdivision chain. This latter implies that after ℓ subdivisions, the resulting subhypercubes all have dimension reduced by a factor 1/2ℓ .

23

TABLE V: Subdivision of a cube of half-side S and centroid xc (p = 3) k

xc;k − xc

0=000

(S/2, S/2, S/2)

1=001

(−S/2, S/2, S/2)

2=010

(S/2, −S/2, S/2)

3=011 (−S/2, −S/2, S/2) 4=100

(S/2, S/2, −S/2)

5=101 (−S/2, S/2, −S/2) 6=110 (S/2, −S/2, −S/2) 7=111 (−S/2, −S/2, −S/2)

For a hypercube with centroid xc and half-side S, and for a general point x, let us define the coordinate relative to the centroid as x ˜ = x−xc , as we did in the simplex case in Eq. (15). Consider now the set of 2p points x ˜j , j = 1, ..., 2p defined by x ˜1 =(S, 0, 0, ..., 0) x ˜2 =(0, S, 0, ..., 0) ............ x ˜p =(0, 0, ..., S) x ˜p+1 =(−S, 0, 0, ..., 0) x ˜p+2 =(0, −S, 0, ..., 0) ............ x ˜2p =(0, 0, ..., −S)

. (49)

These points are the centroids of the maximal boundary hypercubes, and will play a role in the direct hypercube algorithm analogous to that played by the simplex vertices in the simplex adaptive algorithm. For future use, we need the following result, analogous to that of Eqs. (28) and (29) for the simplex case. Consider the sum ˜ = X

N X i=1

λi x ˜i

,

(50)

24 with the coefficients λi obeying λi > 0, N X

i = 1, ..., N

λi < 1 ,

i=1

(51) with the points x ˜i any of the hypercube boundary points of Eq. (49). Some of these points may be omitted (in which case the corresponding coefficient λi is 0), and some used more than once, ˜ lies inside the hypercube. To see this, we note that the in the sum of Eq. (50). Then the point X ˜ along any axis j is of the form X ˜ j = S(µ+ − µ− ), with µ± each a sum of some projection of X

˜ j ≤ Sµ+ < S for subset of the coefficients λi , and hence 0 ≤ µ± < 1. Therefore −S < −Sµ− ≤ X

˜j , and thus X lies within the hypercube. This proof, again, is simpler than each axis component X the corresponding result in the simplex case.

VII.

PARAMETERIZED HIGHER ORDER INTEGRATION FORMULAS FOR A GENERAL SIMPLEX

We turn next to deriving higher order integration formulas for a general simplex, which are expressed directly in terms of the set of simplex vertices, and which involve parameters that can be changed to sample the function over the simplex in different ways. Two different choices of the parameters then give two different integration rules of the same order, which can be compared to give a local error estimate for use in adaptive integration. Since we want to derive integration rules up to ninth order in accuracy, we start from an expansion of a general function f (˜ x) up to ninth order, with x ˜ as before the p dimensional coordinate referred to the simplex centroid as origin. The expansion reads, x i5 ˜i1 ...˜ ˜i4 + Fi1 ...i5 x ˜ i3 x ˜ i2 x ˜ i1 x ˜i3 + Ei1 i2 i3 i4 x ˜ i2 x ˜ i1 x ˜i2 + Di1 i2 i3 x ˜ i1 x ˜ i1 + C i1 i2 x f (˜ x) =A + Bi1 x xi9 + ... ˜i1 ...˜ xi8 + Ji1 ...i9 x ˜i1 ...˜ xi7 + Ii1 ...i8 x ˜i1 ...˜ xi6 + Hi1 ...i7 x ˜i1 ...˜ +Gi1 ...i6 x

. (52)

We next need expressions for the integral of the monomials appearing in the expansion of Eq. (52) over a general simplex with vertices x0 , ..., xp . A general formula for these integrals has been given by Good and Gaskins (1969, 1971). They define m(ν) as the generalized moment Z ν dx1 ...dxp x ˜ν11 ...˜ xpp , m(ν) = simplex

(53)

25 ν

and show that m(ν) is equal to the coefficient of tν11 ...tpp in the expansion of ∞ X V p !ν1 !...νp ! 1 exp[ Ws ] . (p + ν1 + ... + νp ) ! s

(54)

s=2

Here Ws is a double sum over ith components of the p + 1 simplex vertices labelled by a = 0, ..., p, given by p p X X x ˜ai ti ]s [ Ws =

,

(55)

a=0 i=1

and V is the simplex volume. Good and Gaskins derive this formula by first transforming the original simplex to a standard simplex, followed by lengthy algebraic manipulations to express the resulting formula symmetrically in terms of standard simplex vertices. We give in Sec. VIII below a derivation that proceeds directly, and with manifest symmetry, from the vertices of the original simplex. To proceed to 9th order we need an expansion of the exponential in Eq. (54) through 9th order. Since each Ws is of degree s in the coordinates, the terms in this expansion are as follows: second order : third order : fourth order : fifth order : sixth order : seventh order : eighth order : ninth order :

W2 2 W3 3 W22 W4 + 8 4 W2 W3 W5 + 6 5 W23 W32 W2 W4 W6 + + + 48 18 8 6 W22 W3 W3 W4 W2 W5 W7 + + + 24 12 10 7 W24 W2 W32 W22 W4 W42 W3 W5 W2 W6 W8 + + + + + + 384 36 32 32 15 12 8 W23 W3 W33 W2 W3 W4 W22 W5 W4 W5 W3 W6 W2 W7 W9 + + + + + + + 144 162 24 40 20 18 14 9

. (56)

We are interested in integrals of monomials of the form x ˜i1 ...˜ xin , with n ranging from 1 to 9. Good and Gaskins note that it suffices to consider the case in which all the indices i1 , ..., in are distinct (which is always possible for p ≥ n), since the combinatoric factors are such that this gives a result that also applies to the case when some of the component indices are equal, as must necessarily be P the case when p < n. So we can take νi = 1 , i = 1, ..., n, and i νi = n, with n the order of the

26 monomial. We now can infer from Eq. (56) the moment integrals 1 V

Z

dx1 ...dxp x ˜i1 ...˜ x in =

simplex

p! Sn (p + n) !

,

(57)

with the quantities Sn (with tensor indices suppressed) given in terms of tensors Si1 ...in defined by sums over the vertices, Si1 ...in =

p X

x ˜ji1 ...˜ xjin

,

(58)

j=0

as follows: S2 =Si1 i2 S3 =2Si1 i2 i3 S4 =Si1 i2 Si3 i4 + Si1 i3 Si2 i4 + Si1 i4 Si2 i3 + 6Si1 i2 i3 i4 = Si1 i2 Si3 i4 + 2 terms + 6Si1 i2 i3 i4 S5 =2(Si1 i2 Si3 i4 i5 + 9 terms) + 24Si1 i2 i3 i4 i5 S6 =Si1 i2 Si3 i4 Si5 i6 + 14 terms + 4(Si1 i2 i3 Si4 i5 i6 + 9 terms) + 6(Si1 i2 Si3 i4 i5 i6 + 14 terms) +120Si1 i2 i3 i4 i5 i6 S7 =2(Si1 i2 Si3 i4 Si5 i6 i7 + 104 terms) + 12(Si1 i2 i3 Si4 i5 i6 i7 + 34 terms) +24(Si1 i2 Si3 i4 i5 i6 i7 + 20 terms) + 720Si1 i2 i3 i4 i5 i6 i7 S8 =Si1 i2 Si3 i4 Si5 i6 Si7 i8 + 104 terms + 4(Si1 i2 Si3 i4 i5 Si6 i7 i8 + 279 terms) +6(Si1 i2 Si3 i4 Si5 i6 i7 i8 + 209 terms) +36(Si1 i2 i3 i4 Si5 i6 i7 i8 + 34 terms) + 48(Si1 i2 i3 Si4 i5 i6 i7 i8 + 55 terms) +120(Si1 i2 Si3 i4 i5 i6 i7 i8 + 27 terms) + 5040Si1 i2 i3 i4 i5 i6 i7 i8 S9 =2(Si1 i2 Si3 i4 Si5 i6 Si7 i8 i9 + 1259 terms) + 8(Si1 i2 i3 Si4 i5 i6 Si7 i8 i9 + 279 terms) +12(Si1 i2 Si3 i4 i5 Si6 i7 i8 i9 + 1259 terms) + 24(Si1 i2 Si3 i4 Si5 i6 i7 i8 i9 + 377 terms) +144(Si1 i2 i3 i4 Si5 i6 i7 i8 i9 + 125 terms) + 240(Si1 i2 i3 Si4 i5 i6 i7 i8 i9 + 83 terms) +720(Si1 i2 Si3 i4 i5 i6 i7 i8 i9 + 35 terms) + 40320Si1 i2 i3 i4 i5 i6 i7 i8 i9

. (59)

The rule for forming terms in Eq. (59) from those in Eq. (56) is this: for each Ws in Eq. (56) there is a tensor factor S with s indices, and the product of such factors appears repeated in all nontrivial index permutations, giving the “terms” not shown explicitly in Eq. (59). The numerical coefficient is constructed from the denominator appearing in Eq. (56), multiplied by a numerator

27 consisting of a factor s ! for each Ws , and for each Wsm an additional factor m ! (that is, for Wsm there is altogether a factor (s !)m m !). For example, a W23 in Eq. (56) gives rise to a numerator factor of (2 !)3 3 ! = 48 in Eq. (59), and a W2 W3 W4 in Eq. (56) gives rise to a numerator factor of 2 !3 !4 ! = 288 in Eq. (59). In each case, the product of this numerator factor, times the number of terms in the symmetrized expansion, is equal to n !. For example, 48 × 15 = 720 = 6 !, and 288 × 1260 = 362880 = 9!. Our next step is to combine Eqs. (52), (57), and (59) to get a formula for the integral of the function f over a general simplex, expressed in terms of its expansion coefficients. Since we will always be dealing with symmetrized tensors, it is useful at this point to condense the notation, by labelling the contractions of the expansion coefficients with the tensors S by the partition of n which appears. Thus, we will write C i1 i2 S i1 i2 = C 2 Fi1 i2 i3 i4 i5 (Si1 i2 Si3 i4 i5 + 9 terms) = F3+2 Hi1 i2 i3 i4 i5 i6 i7 (Si1 i2 Si3 i4 Si5 i6 i7 + 104 terms) = H3+2+2

, (60)

and so forth. Since the partitions of n that are relevant only involve n ≥ 2, a complete list of partitions that appear through ninth order is as follows: C

2

D

3

E

4, 2 + 2

F

5, 3 + 2

G 6, 4 + 2, 2 + 2 + 2, 3 + 3 H

7, 5 + 2, 3 + 2 + 2, 4 + 3

I

8, 6 + 2, 4 + 2 + 2, 2 + 2 + 2 + 2, 5 + 3, 3 + 3 + 2, 4 + 4

J

9, 7 + 2, 5 + 2 + 2, 3 + 2 + 2 + 2, 4 + 3 + 2, 6 + 3, 5 + 4, 3 + 3 + 3 . (61)

Employing this condensed notation, we now get the following master formula for the integral of f

28 over a general simplex, 1 V

Z

p! p! C2 + 2D3 (p + 2) ! (p + 3) ! simplex p! p! (6E4 + E2+2 ) + (24F5 + 2F3+2 ) + (p + 4) ! (p + 5) ! p! + (120G6 + 6G4+2 + 4G3+3 + G2+2+2 ) (p + 6) ! p! (720H7 + 24H5+2 + 12H4+3 + 2H3+2+2 ) + (p + 7) ! p! + (5040I8 + 120I6+2 + 48I5+3 + 36I4+4 + 6I4+2+2 + 4I3+3+2 + I2+2+2+2 ) (p + 8) ! p! (40320J9 + 720J7+2 + 240J6+3 + 144J5+4 + 24J5+2+2 + 12J4+3+2 + (p + 9) ! dx1 ...dxp f (˜ x) = A +

+8J3+3+3 + 2J3+2+2+2 ) +...

. (62)

Our procedure is now to match this expansion to discrete sums over the function f evaluated at points on the boundary or interior of the simplex. We will construct these sums using parameter multiples of the vertices of the simplex (in which the summation limits for a, b, c, d are 0 to p for simplexes, and will be 1 to 2p later on when we apply these formulas to hypercubes), Σ1 (λ) =

X a

f (λ˜ xa ) , 0 ≤ λ ≤ 1

Σ2 (λ, σ) =

X

f (λ˜ xa + σ˜ xb ) , 0 ≤ λ, σ , λ + σ ≤ 1

Σ3 (λ, σ, µ) =

X

f (λ˜ xa + σ˜ xb + µ˜ xc ) , 0 ≤ λ, σ, µ , λ + σ + µ ≤ 1

a,b

a,b,c

Σ4 (λ, σ, µ, κ) =

X

a,b,c,d

f (λ˜ xa + σ˜ xb + µ˜ xc + κ˜ xd ) , 0 ≤ λ, σ, µ, κ , λ + σ + µ + κ ≤ 1

, (63)

where the conditions on the parameters λ, σ, µ, κ guarantee, by our discussion of simplex properties, that the points summed over do not lie outside the simplex. Clearly, once we have a formula for Σ4 , we can get a formula for Σ3 by setting κ = 0 and dividing by p + 1 (which becomes 2p in the hypercube case); we can then get Σ2 by further setting µ = 0 and dividing out another factor of p + 1, and so forth. Hence we only exhibit here the expansion of Σ4 in terms of f (˜0) = A and the

29 contractions C2 , ..., J3+2+2+2 appearing in Eq. (62). Abbreviating ξ = p + 1, we have Σ4 = ξ 4 A + ξ 3 (λ2 + σ 2 + µ2 + κ2 )C2 + ξ 3 (λ3 + σ 3 + µ3 + κ3 )D3 + ξ 3 (λ4 + σ 4 + µ4 + κ4 )E4 +2ξ 2 (λ2 σ 2 + λ2 µ2 + λ2 κ2 + σ 2 µ2 + σ 2 κ2 + µ2 κ2 )E2+2 + ξ 3 (λ5 + σ 5 + µ5 + κ5 )F5 +ξ 2 (λ2 σ 3 + σ 2 λ3 + λ2 µ3 + µ2 λ3 + λ2 κ3 + κ2 λ3 + σ 2 µ3 + µ2 σ 3 + σ 2 κ3 + κ2 σ 3 + µ2 κ3 + κ2 µ3 )F3+2 +ξ 3 (λ6 + σ 6 + µ6 + κ6 )G6 + ξ 2 (λ4 σ 2 + λ2 σ 4 + λ4 µ2 + λ2 µ4 + λ4 κ2 + λ2 κ4 + σ 4 µ2 + σ 2 µ4 +σ 4 κ2 + σ 2 κ4 + µ4 κ2 + µ2 κ4 )G4+2 + 6ξ(λ2 σ 2 µ2 + λ2 σ 2 κ2 + λ2 µ2 κ2 + σ 2 µ2 κ2 )G2+2+2 +2ξ 2 (λ3 σ 3 + λ3 µ3 + λ3 κ3 + σ 3 µ3 + σ 3 κ3 + µ3 κ3 )G3+3 + ξ 3 (λ7 + σ 7 + µ7 + κ7 )H7 +ξ 2 (λ5 σ 2 + λ2 σ 5 + λ5 µ2 + λ2 µ5 + λ5 κ2 + λ2 κ5 + σ 5 µ2 + σ 2 µ5 + σ 5 κ2 + σ 2 κ5 + µ5 κ2 + µ2 κ5 )H5+2 +2ξ(λ3 σ 2 µ2 + λ3 σ 2 κ2 + λ3 µ2 κ2 + σ 3 λ2 µ2 + σ 3 λ2 κ2 + σ 3 µ2 κ2 + µ3 λ2 σ 2 + µ3 λ2 κ2 + µ3 σ 2 κ2 +κ3 λ2 σ 2 + κ3 λ2 µ2 + κ3 σ 2 µ2 )H3+2+2 + ξ 2 (λ4 σ 3 + λ3 σ 4 + λ4 µ3 + λ3 µ4 + λ4 κ3 + λ3 κ4 + σ 4 µ3 + σ 3 µ4 +σ 4 κ3 + σ 3 κ4 + µ4 κ3 + µ3 κ4 )H4+3 + ξ 3 (λ8 + σ 8 + µ8 + κ8 )I8 + ξ 2 (λ6 σ 2 + λ2 σ 6 + λ6 µ2 + λ2 µ6 +λ6 κ2 + λ2 κ6 + σ 6 µ2 + σ 2 µ6 + σ 6 κ2 + σ 2 κ6 + µ6 κ2 + µ2 κ6 )I6+2 + 2ξ(λ4 σ 2 µ2 + λ4 σ 2 κ2 + λ4 µ2 κ2 +σ 4 λ2 µ2 + σ 4 λ2 κ2 + σ 4 µ2 κ2 + µ4 λ2 σ 2 + µ4 λ2 κ2 + µ4 σ 2 κ2 + κ4 λ2 σ 2 + κ4 λ2 µ2 + κ4 σ 2 µ2 )I4+2+2 +24λ2 σ 2 µ2 κ2 I2+2+2+2 + ξ 2 (λ5 σ 3 + λ3 σ 5 + λ5 µ3 + λ3 µ5 + λ5 κ3 + λ3 κ5 + σ 5 µ3 + σ 3 µ5 + σ 5 κ3 +σ 3 κ5 + µ5 κ3 + µ3 κ5 )I5+3 + 2ξ(λ2 σ 3 µ3 + λ2 σ 3 κ3 + λ2 µ3 κ3 + σ 2 λ3 µ3 + σ 2 λ3 κ3 + σ 2 µ3 κ3 +µ2 λ3 σ 3 + µ2 λ3 κ3 + µ2 σ 3 κ3 + κ2 λ3 σ 3 + κ2 λ3 µ3 + κ2 σ 3 µ3 )I3+2+2 + 2ξ 2 (λ4 σ 4 + λ4 µ4 + λ4 κ4 + σ 4 µ4 +σ 4 κ4 + µ4 κ4 )I4+4 + ξ 3 (λ9 + σ 9 + µ9 + κ9 )J9 + ξ 2 (λ7 σ 2 + λ2 σ 7 + λ7 µ2 + λ2 µ7 + λ7 κ2 + λ2 κ7 +σ 7 µ2 + σ 2 µ7 + σ 7 κ2 + σ 2 κ7 + µ7 κ2 + µ2 κ7 )J7+2 + 2ξ(λ5 σ 2 µ2 + λ5 σ 2 κ2 + λ5 µ2 κ2 + σ 5 λ2 µ2 +σ 5 λ2 κ2 + σ 5 µ2 κ2 + µ5 λ2 σ 2 + µ5 λ2 κ2 + µ5 σ 2 κ2 + κ5 λ2 σ 2 + κ5 λ2 µ2 + κ5 σ 2 µ2 )J5+2+2 +6(λ3 σ 2 µ2 κ2 + λ2 σ 3 µ2 κ2 + λ2 σ 2 µ3 κ2 + λ2 σ 2 µ2 κ3 )J3+2+2+2 + ξ(λ4 σ 3 µ2 + λ4 σ 2 µ3 + λ4 σ 3 κ2 +λ4 σ 2 κ3 + λ4 µ3 κ2 + λ4 κ2 µ3 + σ 4 λ3 µ2 + σ 4 λ2 µ3 + σ 4 λ3 κ2 + σ 4 λ2 κ3 + σ 4 µ3 κ2 + σ 4 µ2 κ3 +µ4 λ3 σ 2 + µ4 λ2 σ 3 + µ4 λ3 κ2 + µ4 λ2 κ3 + µ4 σ 3 κ2 + µ4 σ 2 κ3 + κ4 λ3 σ 2 + κ4 λ2 σ 3 + κ4 λ3 µ2 +κ4 λ2 µ3 + κ4 σ 3 µ2 + κ4 σ 2 µ3 )J4+3+2 + ξ 2 (λ6 σ 3 + λ3 σ 6 + λ6 µ3 + λ3 µ6 + λ6 κ3 + λ3 κ6 + σ 6 µ3 + σ 3 µ6 +σ 6 κ3 + σ 3 κ6 + µ6 κ3 + µ3 κ6 )J6+3 + ξ 2 (λ5 σ 4 + λ4 σ 5 + λ5 µ4 + λ4 µ5 + λ5 κ4 + λ4 κ5 + σ 5 µ4 + σ 4 µ5 +σ 5 κ4 + σ 4 κ5 + µ5 κ4 + µ4 κ5 )J5+4 + 6ξ(λ3 σ 3 µ3 + λ3 σ 3 κ3 + λ3 κ3 µ3 + σ 3 µ3 κ3 )J3+3+3

.

(64) Evidently, for simplexes Eq. (63) requires (p + 1)4 function evaluations to compute Σ4 , (p + 1)3 function evaluations to compute Σ3 , etc. For hypercubes, with the simplex vertices replaced by the

30 2p points of Eq. (49), and ξ = 2p, Eq. (63) requires (2p)4 function evaluations to compute Σ4 , (2p)3 function evaluations to compute Σ3 , etc. Since it is known that the minimal number of function evaluations for a simplex integration method of order 2t + 1 involves p t /t ! + O(p t−1 ) function calls (Stroud (1971), Grundmann and M¨oller (1978)), and for a hypercube integration method of order 2t + 1 involves (2p) t /t ! + O(p t−1 ) function calls (Lyness, 1965), we will take the leading Σs in our integration formulas to have equal arguments, e.g. Σ4 (λ, λ, λ, λ), Σ3 (λ, λ, λ, λ), etc. This allows the parameterized integration formulas constructed below to have an optimal leading order power dependence on the space dimension p (but reflecting the parameter freedom, the non-leading power terms will not in general be minimal). In the computer program, the following formulas are useful in evaluating the sums using a minimum number of function calls, X

Σ4 (λ, λ, λ, λ) =24

f (λ(˜ xa + x ˜b + x ˜c + x ˜d )) + 12

XX a

Σ3 (λ, λ, λ) =6

f (2λ(˜ xa + x ˜b )) + 4

XX a

b

X

f (λ(˜ xa + x ˜b + x ˜c )) + 3

XX

f (λ(˜ xa + x ˜b )) +

X

X

a

Σ3 (2λ, λ, λ) =2

X

f (3λ˜ xa + λ˜ xb ) +

XX

f (2λ˜ xa + λ˜ xb ) +

a

Σ2 (3λ, λ) = Σ2 (2λ, λ) =

XX a

f (2λ(˜ xa + x ˜b )) +

f (3λ˜ xa ) ,

X

f (4λ˜ xa )

f (3λ˜ xa + λ˜ xb )

b6=a

,

a

b

f (3λ˜ xa + λ˜ xb ) +

XX

f (2λ˜ xa + λ˜ xb ) +

a

X

f (2λ˜ xa ) ,

f (2λ˜ xa + λ(˜ xb + x ˜c )) + 2

XX a

f (4λ˜ xa ) ,

a

b6=a

a b6=a, c6=a, b

+2

X

a

b

XX

f (2λ˜ xa + λ(˜ xb + x ˜c ))

a

b6=a

a

a

Σ2 (λ, λ) =2

X

a b6=a, c6=a, b

a

+6

X

X

f (4λ˜ xa )

,

X

f (3λ˜ xa )

.

a

b6=a

a

b6=a

(65) With these preliminaries in hand, we are now ready to set up integration formulas of first through fourth, fifth, seventh, and ninth order, for integrals over general simplexes.

A.

First through third order formulas

We begin here with integration formulas of first through third order, which may be more useful than high order formulas for integrating functions that are highly irregular, or as explained later on, for integrations in low dimensional spaces. Two different first order accurate estimates of the

31 integral of Eq. (62) are clearly Ia =Σ1 (λ)/ξ = A + second order , Ib =f (˜0) = A , (66) with x ˜ = ˜0 the simplex centroid. Evidently Ib is the dimension p analog of the dimension one center-of-bin rule, and when λ = 1, Ia is the dimension p analog of the dimension one trapezoidal rule. To get a second order accurate formula, we have to match the terms A+

p! C2 (p + 2)!

(67)

in Eq. (62). Solving Σ1 (λ) = ξA + λ2 C2 + ... for C2 , we get C2 ≃ [Σ1 (λ) − ξA]/λ2

,

(68)

which when substituted into Eq. (67) gives the second order accurate formula ξ 1 p! p! Σ1 (λ) . f (˜0) + I = 1− 2 (p + 2)! λ (p + 2)! λ2

(69)

Using two different parameter values λa,b gives two different second order accurate estimates Ia,b of the integral. We give two different methods of getting a third order accurate formula, both of which will play a role in the methods for getting higher odd order formulas. We first note that for λ = 2/(p + 3), we have Σ1 (λ) = ξA + λ2 [C2 + 2D3 /(p + 3)]

,

(70)

and so the coefficients of C2 and D3 are in the same ratio as appears in Eq. (62). Hence defining an overall multiplicative factor κ1 to make both terms match in magnitude, and adding a multiple κ0 of A to make this term match, we get a third order accurate formula Ia =κ1 Σ1 (λ) + κ0 f (˜0) , κ1 =

p! (p + 3)2 λ−2 = (p + 2)! 4(p + 2)(p + 1)

κ0 =1 − ξκ1

,

. (71)

32 An alternative method of getting a third order accurate formula is to look for a match by writing Ib =¯ κ0 f (˜ 0) +

2 X

κi1 Σ1 (λi1 )

i=1

=¯ κ0 A +

2 X

κi1 [ξA + (λi1 )2 C2 + (λi1 )3 D3 ]

i=1

=A +

p! 2p ! C2 + D3 + ... . (p + 2)! (p + 3)! (72)

Equating coefficients of A we get κ ¯0 = 1 − ξ

2 X

κi1

,

(73)

i=1

while equating coefficients of C2 and D3 , we obtain a system of two simultaneous equations for κi1 , i = 1, 2 , q1 =w1 + w2

,

q2 =λ11 w1 + λ21 w2

, (74)

where we have abbreviated q1 =

p! , (p + 2)!

wi =κi1 (λi1 )2 ,

q2 =

2p ! (p + 3)!

i = 1, 2

,

. (75)

This set of equations can be immediately solved to give λ21 q1 − q2 λ21 − λ11 λ1 q1 − q2 w2 = 11 λ1 − λ21

w1 =

, , (76)

giving a second third order accurate formula for any nondegenerate λ11 and λ21 lying in the interval (0,1). We will see later on, in discussing higher odd order integration formulas, that this is our first encounter with a Vandermonde system of equations.

33 B.

Fourth order formula

Although we will subsequently focus on odd-order formulas, we next derive a fourth order formula, which follows a different pattern. Referring to Eq. (62), we see that to get a fourth order formula we have to use the sums of Eq. (63) to match the coefficients of A, C2 , D3 , E4 , and E2+2 . Since only the final one of these, E2+2 , involves two partitions of 4, we can use Σ2 (λ, λ) to extract this, with any positive value of λ ≤ 21 . Since the simplex subdivision algorithm uses the midpoints 1 2 (xa

+ xb ) as the vertices of the subdivided simplex, an efficient way to proceed in this case is

to take λ =

1 2

in Σ2 (λ, λ), so that what is needed is the function value at the midpoints, and to

compute these function values as part of the subdivision algorithm. This also yields the function values at the vertices of the subdivided simplex. We can get A from f (˜ xc ), and we can fit C2 , D3 , and E4 by evaluating Σ1 (λ) with three distinct values of λ. One of these values can be taken as λ = 1, corresponding to the function values at the simplex vertices. The other two are free parameters, and by making two different choices for one of these, we get two different fourth order evaluations of the integral. We worked out the fourth order program before proceeding systematically to the odd order cases, and so used a different notation from that of Eq. (63). Let us write fc , fv , and fs for the sums of function values at the centroid, the vertices, and the side midpoints, fc =f (˜ xc ) fv = fs =

1 p+1

, X

f (˜ xa ) ,

a

X 1 1 xa + x ˜b )) . f ( (˜ (p + 1)p 2 a6=b

(77) Let us also introduce, for n > m, the definition p nm ≡ (p + m)(p + m + 1)...(p + n) .

(78)

A simple calculation then shows that through fourth order terms, we have 1 V

Z

simplex

dx1 ...dxp f (˜ x) −

8p 1 (fs + fv ) p42 p

=k0 A + k2 C2 + k3 D3 + k4 E4

, (79)

34 with coefficients given by 8(p + 1) , p42 1 4 k2 = − , p21 p42 2 2 − , k3 = p31 p42 1 6 − . k4 = p41 p42 k0 =1 −

(80) Defining now fλ =

1 X f (λ˜ xa ) , p+1 a

(81)

so that f1 = fv , we find that through fourth order,

1 (t1 − t2 ), λ1 − λ2 1 p+1 [fv − fc − 2 (fλj − fc )] , j = 1, 2 tj = 1 − λj λj

E4 =

,

2

D3 =

1X [tj − (λj + 1)E4 ] , 2 j=1

C2 =(p + 1)(fv − fc ) − D3 − E4

. (82)

When substituted into Eq. (79), this gives a fourth order formula for the integral, with a second evaluation of the integral obtained by replacing λ2 by a third, distinct value λ3 .

C.

Fifth order formula

We turn next to deriving a fifth order formula. Referring to Eq. (62), we see that to get a fifth order formula we have to use the sums of Eq. (63) to match the coefficients of A, C2 , D3 , E4 , E2+2 , F5 , and F3+2 . Since at most two partitions appear, we can still get the leading two-partition terms from Σ2 (λ, λ), but we must now impose a condition on λ to guarantee that E2+2 and F3+2 appear with coefficients in the correct ratio. From Eq. (62) we see that the ratio of the coefficient of F3+2 to that of E2+2 must be 2/(p + 5), and from Eqs. (63) and (64) with λ = σ and µ = κ = 0, we see that this is obtained with λ=

2 p+5

,

(83)

35 which for any p ≥ 1 obeys the condition 2λ < 1. The overall coefficient of Σ2 needed to fit E2+2 and F3+2 is easily seen to be κ2 =

(p + 5)4 p! λ−4 = 2 !(p + 4) ! 32p 41

,

(84)

where we have used the abbreviated notation of Eq. (78). Thus we have, again from Eq. (64), κ2 Σ2 (λ, λ) =

p! p! E2+2 +2 F3+2 +κ2 [ξ 2 A+ξ(2λ2 C2 +2λ3 D3 +2λ4 E4 +2λ5 F5 )] , (85) (p + 4) ! (p + 5) !

with ξ = p + 1. Since there are four single partition terms, we look for an integration formula of the form κ2 Σ2 (λ, λ) +

4 X

κi1 Σ1 (λi1 ) + κ0 A ,

(86)

i=1

which is to be equated to the sum of terms through fifth order in Eq. (62). The equation for matching the coefficient of A can immediately be solved in terms of the coefficients κi1 , giving κ0 = 1 − R0 ,

2

R0 = ξ κ2 + ξ

4 X

κi1

.

(87)

i=1

The four equations for matching the coefficients of C2 , D3 , E4 , and F5 give a N = 4 Vandermonde system that determines the four coefficients κi1 . Writing an order N Vandermonde system in the standard form N X

xik−1 wi = qk ,

k = 1, ..., N

,

(88)

i=1

the equations determining the κi1 take this form with xi =λi1 , 1 p21 2 q2 = p31 6 q3 = p41 24 q4 = p51 q1 =

wi = κi1 (λi1 )2 − 2ξκ2 λ2

,

− 2ξκ2 λ3

,

− 2ξκ2 λ4

,

− 2ξκ2 λ5

.

,

(89) Solving this system of linear equations, for any nondegenerate values of the parameters 0 < λi1 < 1, gives the coefficients κi1 and completes specification of the integration formula.

36 D.

Vandermonde solvers

Since we will repeatedly encounter Vandermonde equations in setting up parameterized higher order integration formulas, both for simplexes and for hypercubes, we digress at this point to discuss methods of solving a Vandermonde system. The explicit inversion of the Vandermonde system is well known (see, e.g. Neagoe (1996), Heinen and Niederjohn (1997)), and takes the form w1 =

qN − S1 (x2 , ..., xN )qN −1 + S2 (x2 , ..., xN )qN −2 − ... + (−1)N −1 x2 ....xN q1 (x1 − x2 )(x1 − x3 )....(x1 − xN )

,

(90)

with Sj (x2 , ..., xN ) the sum of j-fold products of x2 , ..., xN , S1 (x2 , ..., xN ) =x2 + ... + xN

,

S2 (x2 , ..., xN ) =x2 x3 + ... + x2 xN + x3 x4 + ... + x3 xN + ... + xN −1 xN

, (91)

and so forth. The remaining unknowns w2 through wN are obtained from this formula by cyclic permutation of the indices i = 1, ..., N on the wi and the xi , with the qk held fixed. For N not too large it is straightforward to program this solution, and we include subroutines for the N = 2, 3, 4, 6, 8 cases in the programs. This suffices to solve the Vandermonde equations appearing in the fifth through ninth order simplex formulas, and in the fifth through ninth order hypercube formulas derived below. For large N , programming the explicit solution becomes inefficient and a better procedure is to use a compact algorithm for solving the Vandermonde equations for general N , based on polynomial operations, which has running time proportional to N 2 . A good method of this type, that we have tested, is the algorithm vander.for given in the book Numerical Recipes in Fortran by Press et al. (1992). A similar algorithm for inverting the Vandermonde matrix (that we have not tested) can be found in an on-line paper of Dejnakarintra and Banjerdpongchai, searchable under the title “An Algorithm for Computing the Analytical Inverse of the Vandermonde Matrix”.

E.

Seventh order formula

To get a seventh order formula, we use the sums of Eq. (63) to match the coefficients appearing in Eq. (62) through the term H3+2+2 . Since at most three partitions appear, we can get the leading three-partition terms G2+2+2 and H3+2+2 from Σ3 (λ, λ, λ) by imposing the condition λ=

2 p+7

,

(92)

37 which guarantees that their coefficients are in the correct ratio, and which for any p ≥ 1 obeys the condition 3λ < 1. The overall coefficient of Σ3 needed to fit G2+2+2 and H3+2+2 is κ3 =

(p + 7)6 p! λ−6 = 3 !(p + 6) ! 384p 61

.

(93)

6 X

(94)

We now look for an integration formula of the form κ3 Σ3 (λ, λ, λ) + κ′2 Σ2 (2λ, λ) +

U X

κi2 Σ2 (λi2 , λi2 ) +

κi1 Σ1 (λi1 ) + κ0 A ,

i=1

i=1

with U ≤ 6 since there are 6 two-partition terms to be matched. Equating coefficients of the twopartition terms, we find that the equations for G4+2 −G3+3 and H5+2 −H4+3 are both automatically satisfied by taking κ′2 = 3κ3

.

(95)

This leaves only the two-partition terms E2+2 , F3+2 , G4+2 , and H5+2 to be matched, so we can take the upper limit in the Σ2 summation as U = 4. The four coefficients κi2 are then determined by solving an N = 4 Vandermonde system with inhomogeneous terms q2i , xi =λi2 ,

wi = 2κi2 (λi2 )4

,

1 p41 2 q22 = p51 6 q23 = p61 24 q24 = p71

− (6ξ + 24)κ3 λ4

,

− (6ξ + 36)κ3 λ5

,

− (6ξ + 60)κ3 λ6

,

q21 =

− (6ξ + 108)κ3 λ7

i = 1, ..., 4,

. (96)

We next have to match the 6 single partition terms, using Σ1 sums. To save function calls, we take four of the parameters λi1 to be equal to 2λi2 , with the other two λi1 taken as new, independent parameters. Equating the coefficients of the single partition terms C2 through H7 then gives a N = 6 Vandermonde system determining the coefficients κi1 , with inhomogeneous terms q1i ,

i=

38 1, ..., 6, xi =λi1 ,

wi = κi1 (λi1 )2

,

4

q11 =

X 1 κi2 (λi2 )2 − (3ξ 2 + 15ξ)λ2 κ3 − 2ξ p21

,

i=1

q12 =

2 − 2ξ p31

4 X i=1

κi2 (λi2 )3 − (3ξ 2 + 27ξ)λ3 κ3

,

6 − ξq21 − (3ξ 2 + 51ξ)λ4 κ3 , p41 24 q14 = − ξq22 − (3ξ 2 + 99ξ)λ5 κ3 , p51 120 − ξq23 − (3ξ 2 + 195ξ)λ6 κ3 , q15 = p61 720 − ξq24 − (3ξ 2 + 387ξ)λ7 κ3 . q16 = p71

q13 =

(97) Note that in q13 , ..., q16 , the sums 2

P4

i j i i=1 κ2 (λ2 )

, j = 4, ..., 7 have been eliminated in terms of

q21 , ..., q24 by using the Vandermonde system of Eq. (96). Finally, matching the coefficient of A we get, using Eq. (95) κ0 = 1 − R0 ,

3

2

R0 = (ξ + 3ξ )κ3 + ξ

2

4 X

κi2

+ξ

κi1

.

(98)

i=1

i=1

F.

6 X

Ninth order formula

To get a ninth order formula, we use the sums of Eq. (63) to match the coefficients appearing in Eq. (62) through the final exhibited term J3+2+2+2 . Since at most four partitions appear, we can get the leading four-partition terms J3+2+2+2 and I2+2+2+2 from Σ4 (λ, λ, λ, λ) by imposing the condition λ=

2 p+9

,

(99)

which guarantees that their coefficients are in the correct ratio, and which for any p ≥ 1 obeys the condition 4λ < 1. The overall coefficient of Σ4 needed to fit J3+2+2+2 and I2+2+2+2 is κ4 =

(p + 9)8 p! λ−8 = 4 !(p + 8) ! 6144p 81

.

(100)

39 We now look (with benefit of hindsight) for an integration formula of the form κ4 Σ4 (λ, λ, λ, λ) +

κ′3 Σ3 (2λ, λ, λ)

+

κ′′2 Σ2 (3λ, λ)

+

4 X

κi3 Σ3 (λi3 , λi3 , λi3 ) +

+

κ ¯ i2 Σ2 (λi2 , λi2 ) +

4 X

κi1 Σ1 (3λi3 ) +

κ ¯i1 Σ1 (2λi3 ) + κ0 A ,

i=1

i=1

i=1

4 X

κi2 Σ2 (2λi3 , λi3 )

i=1

i=1

6 X

4 X

(101) with four of the λi2 taken equal to the four λi3 , and the other two λi2 additional parameters. (Again, we reuse parameters wherever similar structures are involved in Eq. (65), so as to save function calls.) We proceed to sketch the remaining calculation, without writing down the detailed form of the resulting Vandermonde equations (which can be read off from the programs, and is fairly complicated). We begin with the three-partition terms. The equations for J5+2+2 − J4+3+2 , J4+3+2 − J3+3+3 , and I4+2+2 − I3+3+2 are all automatically satisfied by taking κ′3 = 6κ4

.

(102)

This leaves four independent matching conditions for G2+2+2 , H3+2+2 , I4+2+2 , and J5+2+2 , which lead to a N = 4 Vandermonde system determining the coefficients κi3 . We turn next to the twopartition terms. We find that the equations for I6+2 −4.5I5+3 +3.5I4+4 and J7+2 −3.5J6+3 +2.5J5+4 are automatically satisfied by taking κ′′2 = 8κ4

.

(103)

The four equations for G4+2 − G3+3 , H5+2 − H4+3 , I6+2 − I4+4 , and J7+2 + J6+3 − 2J5+4 then

give a N = 4 Vandermonde system determining the coefficients κi2 . The remaining independent equations matching two-partition terms, for E2+2 , F3+2 , G4+2 , H5+2 , I6+2 , and J7+2 , then give a

N = 6 Vandermonde system determining the coefficients κ ¯i2 . Turning to the single partition terms, the eight equations obtained by matching coefficients for C2 , D3 , E4 , F5 , G6 , H7 , I8 , and J9 give a N = 8 Vandermonde system determining simultaneously the four coefficients κi1 and the four coefficients κ ¯ i1 . Finally, equating coefficients of A gives κ0 = 1 − R ,

4

3

2

R = (ξ + 6ξ + 8ξ )κ4 + ξ

3

4 X i=1

κi3

4 6 4 X X X i i (κi1 + κ ¯ i1 ) . (104) κ ¯2) + ξ κ2 + +ξ ( 2

i=1

i=1

i=1

40 G.

Leading term in higher order

We have not systematically pursued constructing integration formulas of orders higher than ninth, but this should be possible by the same method. One can, however, see what the pattern will be for the leading term in such formulas. An integration formula of order 2t + 1 will have a leading term Σt (λ, ..., λ), with t arguments λ. The only partition t terms appearing in the continuation of Eq. (62) will be 2 + 2 + .... + 2, containing t terms 2, and 3 + 2 + ... + 2, with one 3 and t − 1 terms 2. Requiring these to have coefficients in the correct ratio restricts λ to be λ=

2 p + 2t + 1

,

(105)

and the leading term in the integration formula will be κt Σt (λ, ..., λ), with κt given by κt =

p! t !(p + 2t) !λ2t

.

(106)

Where nonleading terms give multiple equations of the same order, corresponding to inequivalent partitions of 2t + 1, 2t, ..., one has to include terms proportional to Σt−1 (2λ, λ, ..., λ), Σt−2 (3λ, λ, ..., λ), and other such structures with asymmetric arguments summing to tλ, for the differences of these multiple equations to have consistent solutions. Once such multiplicities have been taken care of, the remaining independent equations will form a number of sets of Vandermonde equations.

VIII.

DERIVATION OF THE SIMPLEX GENERATING FUNCTION

We give here a simple proof of the simplex generating function formulas of Eqs. (54) and (55), using the standard simplex integral Z ν dx1 ...dxp (1 − x1 − x2 − ... − xp )ν0 xν11 ...xpp = standard simplex

Qp a=0 νa ! P (p + pa=0 νa ) !

,

(107)

which we obtain later on as a specialization of the multinomial beta function integral of Eq. (168) , the simplex volume formula of Eq. (42), and the expansion formulas of Eqs. (17) through (19). We start from Z

simplex

dx1 ...dxp

∞ X

ν1 ...νp

Qp Z Pp xi ti )νi i=1 (˜ Q dx1 ...dxp e i=1 x˜i ti = p simplex i=1 νi ! =0

and substitute the expansion of Eq. (17) on the right hand side, giving Z Pp Pp dx1 ...dxp e a=0 αa i=1 x˜ai ti . simplex

,

(108)

(109)

41 We now express the integral over the general simplex in terms of an integral over its barycentric P coordinates αa . Since pa=0 αa = 1, we can rewrite Eq. (19), by subtraction of x0 from both sides, as

p p X X (xa − x0 )αa (xa − x0 )αa =

.

(110)

From this we immediately find for the Jacobian det ∂x1 ...∂xp = | det(xa − x0 )i | = V p ! , ∂α1 ...∂αp

(111)

x − x0 =

a=1

a=0

with V the volume of the simplex. Since the αa , a = 1, ..., p span a standard simplex, we have

transformed the integral of Eq. (109) to the form Z Pp Pp dα1 ...dαp e a=0 αa i=1 x˜ai ti V p!

.

(112)

standard simplex

Expanding the exponential on the right in a power series, we have P Qp Z ∞ X ˜ai ti )νa )νa ( pi=1 x a=0 (αaQ dα1 ...dαp V p! p standard simplex a=0 νa ! ν ...ν =0 1

and then recalling that α0 = 1 − standard simplex, we get

Pp

V p!

a=1 αa ,

∞ X

(113)

and using Eq. (107) to evaluate the integral over the

Qp

Pp x ˜ai ti )νa Pi=1 p (p + a=0 νa ) !

a=0 (

ν1 ...νp =0

,

p

.

(114)

Let us now define Pn as the projector on terms with a total of n powers of the parameters ti , since this is the projector that extracts the nth order moments. Applying Pn to Eq. (114), the denominator is converted to (p + n) !, which can then be pulled outside the sum over the νi , permitting these sums to be evaluated as geometric series, Qp P p ∞ X ˜ai ti )νa a=0 ( Pi=1 x Pn V p ! p (p + a=0 νa ) ! ν ...ν =0 1

=

p

p p X ∞ X Y V p! Pn x ˜ai ti )νa ( (p + n) ! ν ...ν =0 a=0 1

=

V p! Pn (p + n) !

p Y

a=0

i=1

p

[1 −

p X

x ˜ai ti )]−1

.

i=1

(115) Finally, applying to each factor in the product over a the rearrangement −1

(1 − y)

∞ X ys = exp[− log(1 − y)] = exp[ ] s s=1

,

(116)

42 we get Pp ∞ Pp X ˜ai ti ]s V p! a=0 [ i=1 x Pn exp[ ] , (p + n) ! s s=2

where we have used the fact that the s = 1 term in the sum vanishes because

(117) P

˜ai ax

= 0.

Comparing Eq. (117) with the starting equation Eq. (108), we get Eqs. (54) and (55).

IX.

PARAMETERIZED HIGHER ORDER INTEGRATION FORMULAS FOR AXIS-PARALLEL HYPERCUBES

We turn in this section to the problem of deriving higher order integration formulas for axisparallel hypercubes, in analogy with our treatment of the simplex case. Our formulas can be viewed as a generalization of those obtained by Lyness (1965) and McNamee and Stenger (1967). We consider an axis-parallel hypercube of half-side S, and denote by x ˜ coordinates referred to the centroid of the hypercube. Through ninth order, the expansion of a general function f (˜ x) is given as before by Eq. (52). Consider now the moment integrals Z ν dx1 ...dxp x ˜ν11 ...˜ xpp m(ν) =

.

(118)

hypercube

Since the limits of integration for each axis are −S, S, the moment integral factorizes and can be immediately evaluated as m(ν) =

p Y S νℓ +1 [1 + (−1)νℓ ] νℓ + 1

ℓ=1

=0 any νℓ odd =

p Y

2S S νℓ νℓ + 1

ℓ=1 p Y

=V

ℓ=1

,

all νℓ even

S νℓ νℓ + 1

all νℓ even

, (119)

with V = (2S)p in the final line the hypercube volume. We now reexpress this moment integral in terms of sums over the set of 2p hypercube points x ˜j given in Eq. (49), which will play a role in hypercube integration analogous to that played by simplex vertices in our treatment of simplex integration. In analogy with Eq. (58), we define the sum Si1 ...in =

2p X j=1

x ˜ji1 ...˜ xjin

.

(120)

43 Since x ˜ji = Sδji for 1 ≤ j ≤ p and x ˜ji = −Sδji for p + 1 ≤ j ≤ 2p, this sum vanishes unless n is

even and all of the indices i1 ,...,in are equal, in which case it is equal to 2S n . The tensors of Eq. (120) and their direct products form a complete basis on which we can expand moment integrals over the hypercube. We have carried out this calculation two different ways. First, by matching the non-vanishing moment integrals through eighth order, we find 1 dx1 ...dxp x ˜ i1 x ˜ i2 = S i1 i2 , 6 hypercube Z 1 1 1 dx1 ...dxp x ˜ i1 x ˜ i2 x ˜ i3 x ˜i4 = (Si1 i2 Si3 i4 + Si1 i3 Si2 i4 + Si1 i4 Si2 i3 ) − Si1 i2 i3 i4 V hypercube 36 15 1 1 = (Si1 i2 Si3 i4 + 2 terms) − Si1 i2 i3 i4 , 36 15 Z 1 1 dx1 ...dxp x ˜ i1 x ˜ i2 x ˜ i3 x ˜ i4 x ˜ i5 x ˜ i6 = (Si i Si i Si i + 14 terms) V hypercube 216 1 2 3 4 5 6 1 8 − (Si1 i2 Si1 i2 i3 i4 + 14 terms) + Si1 i2 i3 i4 i5 i6 , 90 63 Z 1 1 dx1 ...dxp x ˜ i1 x ˜ i2 x ˜ i3 x ˜ i4 x ˜ i5 x ˜ i6 x ˜ i7 x ˜ i8 = (Si i Si i Si i Si i + 104 terms) V hypercube 1296 1 2 3 4 5 6 7 8 1 (Si i i i Si i i i + 34 terms) + 225 1 2 3 4 5 6 7 8 1 − (Si i Si i Si i i i + 209 terms) 540 1 2 3 4 5 6 7 8 4 + (Si i Si i i i i i + 27 terms) 189 1 2 3 4 5 6 7 8 8 − S i1 i2 i3 i4 i5 i6 i7 i8 . 15 1 V

Z

(121) Combining these formulas with Eq. (52), and using a similar condensed notation to that used in the simplex case (but with the contractions referring now referring to the sums over the hypercube of Eq. (120)), we have for the integral of a general function over the hypercube, through ninth order, 1 V

1 1 1 dx1 ...dxp f (˜ x) =A + C2 + E2+2 − E4 6 36 15 hypercube 1 8 1 G2+2+2 − G4+2 + G6 + 216 90 63 1 1 1 4 8 + I2+2+2+2 + I4+4 − I4+2+2 + I6+2 − I8 1296 225 540 189 15

Z

+... . (122) A second, and more general way, to obtain these results is to construct a generating function,

44 analogous to that of Good and Gaskins used in the simplex case. We start from the formula Z S Z S p Y sinh Stℓ , (123) V −1 dxp e t1 x1 +...+tp xp = dx1 ... Stℓ −S −S ℓ=1

and recall the power series expansion for the logarithm of log

sinh x x ,

1 4 1 6 1 sinh x 1 2 x + x − x8 + ... = x − x 6 180 2835 37800 ∞ X (−1)n+1 22n−1 B2n−1 2n = x , n(2n) ! n=1 (124)

with B2n−1 the Bernoulli numbers B1 =

1 6

, B3 =

1 30

, B5 =

1 42

, B7 =

1 30

, B9 =

5 66

, ... .

Defining, in analogy with the simplex case (with x ˜ai now the components of the 2p vectors x ˜a of Eq. (49)), Wu =

p 2p X X a=1

x ˜ai ti

i=1

!u

=0 , u odd , =2S u

p X

tui , u even

,

i=1

(125) we rewrite Eq. (123), using Eq. (124), as Z S Z S P∞ V −1 dxp e t1 x1 +...+tp xp = e n=1 Kn W2n dx1 ... −S

,

(126)

−S

with Kn =

(−1)n+1 22n−2 B2n−1 n(2n) !

.

(127)

Through eighth order, the right hand side of Eq. (126) is eW2 /12−W4 /360+W6 /5670−W8 /75600+... =1 W2 12 W4 W2 − + 2 360 288 W2 W4 W23 W6 − + + 5670 4320 10368 W8 W42 W2 W6 W 2 W4 W24 − + + − 2 + + ... . 75600 259200 68040 103680 497664

+

(128)

45 Applying the same rule as in the simplex case, of multiplying the coefficient of each term by a combinatoric factor (s !)t t ! for each factor Wst , we recover the numerical coefficients in the expansions of Eqs. (121) and (122). This method can be readily extended to the higher order terms of these expansions. We now follow the procedure used before in the simplex case. We match the expansion of Eq. (122) to sums over the function f evaluated at points within the hypercube, this time constructing these sums using parameter multiples of the 2p points of Eq. (49), which are the centroids of the maximal boundary hypercubes. The formulas of Eqs. (63), (64), and (65) still apply, with sums that extended from 0 to p in the simplex case extending now from 1 to 2p, and with ξ = p + 1 in Eq. (64) replaced by ξ = 2p. We proceed to set up integration formulas of first, third, fifth, seventh, and ninth order, for integrals over an axis-parallel hypercube. Since all odd order terms in the expansion of Eq. (52) integrate to zero by symmetry, to achieve this accuracy it suffices to perform a matching of the non-vanishing terms through zeroth, second, fourth, sixth, and eighth order, respectively. We will see that as a result of the absence of odd order terms, the higher order hypercube formulas are considerably simpler than their general simplex analogs.

A.

First and third order formulas

We begin our derivation of odd order hypercube integration formulas with examples of first and third order accuracy, obtained by matching the first two terms in the expansion of Eq. (122), 1 I = A + C2 + ... 6

.

(129)

Proceeding in direct analogy with the first order formulas of Eq. (66) in the simplex case, we get Ia =Σ1 (λ)/ξ Ib =f (˜0)

,

, (130)

with x ˜ = ˜0 the centroid of the hypercube, and ξ = 2p. These are again analogs of the trapezoidal and center-of-bin methods for the one dimensional case. Similarly, in analogy with the second order accurate formula of Eq. (69) for the simplex, we get the third order accurate hypercube formula ξ 1 I = 1 − 2 f (˜0) + 2 Σ1 (λ) 6λ 6λ

.

(131)

46 For any two nondegenerate values λa,b in the interval (0,1) this gives two different third order accurate estimates Ia,b of the hypercube integral.

B.

Fifth order formula

To get a fifth order formula, we have to use the sums of Eq. (63) to match the coefficients of A, C2 , E2+2 , and E4 appearing on the first line of Eq. (122). Since there is now only a single two-partition term, E2+2 , we can extract it from Σ2 (λ, λ) for any λ in the interval (0, 12 ). So we look for a fifth order formula of the form κ2 Σ2 (λ, λ) +

2 X

κi1 Σ1 (λi1 ) + κ0 A .

(132)

1 72λ4

(133)

i=1

Matching the coefficient of E2+2 gives κ2 =

,

while matching the coefficient of A gives κ0 = 1 − R0 ,

R0 = ξ 2 κ2 + ξ

2 X

κi1

.

(134)

i=1

Matching the coefficients of C2 and E4 gives a N = 2 Vandermonde system c.f. Eq. (88) with xi =(λi1 )2 ,

wi = κi1 (λi1 )2

,

ξ 1 , q1 = − 6 36λ2 −1 ξ q2 = − . 15 36 (135)

C.

Seventh order formula

To get a seventh order formula, we have to match the coefficients appearing on the first two lines of Eq. (122). Since there is only one three-partition term, G2+2+2 , we can extract it from Σ3 (λ, λ, λ) for any λ in the interval (0, 31 ). We look for a seventh order formula of the form κ3 Σ3 (λ, λ, λ) +

2 X

κi2 Σ2 (λi2 , λi2 ) +

3 X

κi1 Σ1 (λi1 ) + κ0 A ,

(136)

i=1

i=1

with matching the coefficient of G2+2+2 requiring κ3 =

1 1296λ6

.

(137)

47 To reduce the number of function calls, we take λi+1 = 2λi2 , i = 1, 2, with only λ11 an additional 1 parameter. Matching the coefficients of the two-partition terms E2+2 and G4+2 gives a N = 2 Vandermonde system with xi =(λi2 )2 ,

wi = 2κi2 (λi2 )4

,

ξ 1 − , 36 216λ2 1 ξ q2 = − − , 90 216 q1 =

(138) while matching the coefficient of A gives κ0 = 1 − R0 ,

3

R0 = ξ κ3 + ξ

2

2 X

κi2

+ξ

3 X

κi1

.

(139)

i=1

i=1

Matching coefficients of the single-partition terms C2 , E4 , and G6 gives the N = 3 Vandermonde system with xi =(λi1 )2 ,

wi = κi1 (λi1 )2

,

2

X 1 ξ2 q1 = − κi2 (λi2 )2 − 2ξ 4 6 432λ

,

i=1

ξ ξ2 1 − + q2 = − 15 36 432λ2 ξ ξ2 8 + . q3 = + 63 90 432

,

(140)

D.

Ninth order formula

To get a ninth order formula, we have to match the coefficients of all three lines of Eq. (122). Since there is only one four-partition term I2+2+2+2 , we can extract it from Σ4 (λ, λ, λ, λ) for any λ in the interval (0, 41 ). We look for a ninth order formula of the form κ4 Σ4 (λ, λ, λ, λ)+

2 X

κi3 Σ3 (λi3 , λi3 , λi3 )+κ′2 Σ2 (3λ, λ)+

3 X

κ ¯ i2 Σ2 (λi2 , λi2 )+

κi1 Σ1 (λi1 )+κ0 A , (141)

i=1

i=1

i=1

4 X

with matching the coefficient of I2+2+2+2 requiring κ4 =

1 31104λ8

.

(142)

48 To reduce the number of function calls, we take λi2 = λi3 , i = 1, 2, and λi1 = 3λi2 , i = 1, 2, 3, with only λ32 and λ41 additional parameters. Matching the coefficients of the three-partition terms G2+2+2 and I4+2+2 gives a N = 2 Vandermonde system with xi =(λi3 )2 ,

wi = 6κi3 (λi3 )6

,

ξ 1 − , 216 1296λ2 1 ξ q32 = − − . 540 1296

q31 =

(143) Taking the difference of the matching equations for I6+2 and I4+4 determines κ′2 to be κ′2 =

237 237 = κ4 8 8164800λ 262.5

,

(144)

while matching the coefficient of A gives κ0 = 1 − R0 ,

R0 = ξ 4 κ4 + ξ 3

2 X

κi3 + ξ 2 (κ′2 +

3 X

κ ¯ i2 ) + ξ

i=1

i=1

4 X

κi1

.

(145)

i=1

Matching the remaining independent two-partition terms E2+2 , G4+2 , and I4+4 gives a N = 3 Vandermonde system with xi =(λi2 )2 ,

wi = 2¯ κi2 (λi2 )4

, 2

q21 =

X 1 ξ2 κi3 (λi3 )4 − κ′2 18λ4 − − 6ξ 36 2592λ4

,

i=1 ξ2

1 ξ − κ′2 90λ6 − + , 90 216 2592λ2 ξ ξ2 1 − κ′2 162λ8 + + . q23 = 225 540 2592

q22 = −

(146)

49 Finally, matching coefficients of the single-partition terms C2 , E4 , G6 , and I8 gives a N = 4 Vandermonde system with xi =(λi1 )2 ,

wi = κi1 (λi1 )2

, 2

3

i=1

i=1

X X ξ3 1 i i 2 ′ 2 2 κ ¯i2 (λi2 )2 κ (λ ) − 2ξ − κ 10ξλ − 3ξ q11 = − 3 3 2 6 7776λ6

,

2

q12 = −

X ξ3 1 κi3 (λi3 )4 − ξ q21 − − κ′2 82ξλ4 − 3ξ 2 4 15 7776λ

,

i=1

8 ξ3 1 − − κ′2 730ξλ6 − ξ 2 q31 − ξ q22 , 2 63 7776λ 2 ξ3 1 8 − − κ′2 6562ξλ8 − ξ 2 q32 − ξ q23 . q14 = − 15 7776 2 q13 =

(147)

E.

Leading term in higher order

As in the simplex analysis, in the hypercube case we have not systematically pursued constructing integration formulas of orders higher than ninth, but this should be possible by the same method. Again, one can see what the pattern will be for the leading term in such formulas. An integration formula of order 2t + 1 will have a leading term Σt (λ, ..., λ), with t arguments λ. The only t-partition term appearing in the continuation of Eq. (122) will be 2 + 2 + .... + 2, containing t terms 2. So λ can be taken to have any value in the interval (0, 1t ), and the leading term in the integration formula will be κt Σt (λ, ..., λ), with κt given by κt =

1 t !6t λ2t

.

(148)

Where nonleading terms give multiple equations of the same order, corresponding to inequivalent partitions of 2t,... one has to include terms proportional to Σt−2 (3λ, λ, ..., λ), and other such structures with asymmetric arguments summing to tλ, for the differences of these multiple equations to have consistent solutions. Once such multiplicities have been taken care of, the remaining independent equations will form a number of sets of Vandermonde equations.

F.

One dimension revisited: comparison of Vandermonde moment fitting to standard one dimensional methods

In this subsection we address several related issues. We first set up a one-dimensional analog of the moment fitting method that we have used in general p dimensions to develop higher order

50 integration formulas. The one dimensional moment fitting equations can be satisfied by leaving the sampling points as free parameters, giving in any order a Vandermonde equation system to determine the weights assigned to the sampling points. Alternatively, the moment fitting equations can be satisfied by restricting the sampling points, which is what is done in Gaussian quadrature, which reduces the number of function calls. We then show that the fifth (and higher) order direct hypercube integration formulas for p dimensions, when restricted to one dimension, involve a larger number of function calls than needed for the case when the sampling points are all free parameters. We interpret this as resulting from linear dependencies in low dimension among the various terms appearing in the generating function of Eq. (128). We study these linear dependencies as a function of dimension p, and suggest that the integration rule of order 2t + 1 has redundant parameters when the spatial dimension p < t. Let f (x) = f0 + f1 x + f2 x2 + f3 x3 + f4 x4 + ... be a function that is power series expandable on the interval (-1,1), and consider the one dimensional integral 1 I= 2

Z

1

f (x)dx −1

=f0 +

f2 f4 f6 f8 + + + + ... . 3 5 7 9 (149)

Defining Σ1 (λ) by Σ1 (λ) = f (λ) + f (−λ) ,

0<λ≤1 ,

(150)

we look for an integration formula of the form I =κ0 f (0) +

n X

κi Σ1 (λi )

i=1

=f0 +

n X

κi [f0 + f2 (λi )2 + f4 (λi )4 + f6 (λi )6 + f8 (λi )8 + ...]

.

i=1

(151)

51 Matching the coefficients of fℓ between Eq. (149) and Eq. (151), we get the system of equations 1 =κ0 + 2

n X

κi

,

i=1

n

X 1 κi (λi )2 =2 3

,

i=1

1 =2 5 1 =2 7 1 =2 9

n X

κi (λi )4

,

κi (λi )6

,

κi (λi )8

,

i=1

n X i=1

n X i=1

(152) and similarly if one wishes to go to higher order than ninth. There are now two ways to proceed to solve the matching equations, to give a discrete approximation to the integral to a given order of accuracy. The first, which is what we have done in getting simplex and hypercube integration formulas, is to regard all of the λi as adjustable parameters, and to determine the coefficients κi to satisfy the system of equations of Eq. (152) to the needed order. Thus, to get a first order accurate formula, we take I ≃ f (0) with all the κi equal to zero,

which is the center-of-bin rule. To get a third order accurate formula we must take κ1 as nonzero

and solve the system 1 =κ0 + 2κ1

,

1 =2κ1 (λ1 )2 3

. (153)

To get a fifth order accurate formula we must take both κ1 and κ2 as nonzero and solve the system 1 =κ0 + 2(κ1 + κ2 )

,

1 =2κ1 (λ1 )2 + 2κ2 (λ2 )2 3 1 =2κ1 (λ1 )4 + 2κ2 (λ2 )4 5

, , (154)

52 to get a seventh order accurate formula we must take κ1,2,3 as nonzero and solve the system 1 =κ0 + 2(κ1 + κ2 + κ3 ) , 1 =2κ1 (λ1 )2 + 2κ2 (λ2 )2 + 2κ3 (λ3 )2 3 1 =2κ1 (λ1 )4 + 2κ2 (λ2 )4 + 2κ3 (λ3 )4 5 1 =2κ1 (λ1 )6 + 2κ2 (λ2 )6 + 2κ3 (λ3 )6 7

, , , (155)

and so forth. Evidently, to get an order 2t + 1 formula, we must take n = t, so that there are t distinct positive sampling points λ1,...,t,and to determine the coefficients κ1,...,t we must solve an order N = t Vandermonde system. The resulting order 2t + 1 integration formula uses 2t + 1 function values. An alternative way to proceed is to adjust the values of the sampling points so that fewer of them are needed to satisfy the matching conditions. This is what is done in the well-known Gaussian integration method, which gives a more efficient scheme, in terms of the number of function calls, starting with third order. Referring to Eq. (153), we can evidently achieve a third order match by taking 2κ0 =0 , 1 λ1 = √ 3

2κ1 = 1

,

. (156)

Similarly, referring to Eq. (154), we can evidently achieve a fifth order match by taking 8 , 9 √ 3 λ1 = √ 5

2κ0 =

2κ1 =

5 9

,

. (157)

Proceeding in this way, we can obtain the general Gaussian integration formula, which for order 2t + 1 integration involves t points. Of course, the usual derivation of the Gaussian integration rule does not proceed this way, but instead uses an argument based on one dimensional polynomial long division to relate the special points λi to zeros of the Legendre polynomials. Since in higher dimensions there is no analogous polynomial division rule, there is no universal higher dimensional analog of the Gaussian integration rule, although there are a multitude of special formulas using

53 specially chosen sampling points in higher dimensions (see, e.g., Stroud (1971)). On the other hand, as we have seen, the method of keeping all the sampling points λi as free parameters, and solving a set of Vandermonde equations to get the coefficients κi , readily extends to higher dimensions. Let us now examine the number of function evaluations required by our general moment fitting formulas for hypercubes, when restricted to one dimension. The first order center-of-bin formula requires just the one function evaluation f (0), and so is the same in all methods. The third order formula of Eq. (131) involves f (0) and one Σ1 (λ), and so uses 3 function values, in agreement with Eq. (153), whereas the Gaussian formula needs 2 function values. Turning to the fifth order formula of Eq. (132), calculation of Σ2 (λ, λ) requires 3 function evaluations, calculation of each of the two Σ1 (λi ) requires 2 function evaluations, and evaluation of f (0) requires one function evaluation, for a total of 8 function evaluations. This is to be compared to the one dimensional moment fitting formula of Eq. (154) which requires 5 function evaluations, and the Gaussian method, which requires 3. The reason that the fifth order integration formula for general p, when specialized to one dimension, requires more function evaluations than the moment fitting method of Eq. (154), is that whereas in two and higher dimensions W4 and W22 are linearly independent, in one dimension they are proportional to one another by virtue of the identity t41 = (t21 )2 . Hence the term Σ2 (λ, λ) in the general integration formula is not needed to get a match, and when this is dropped one has a formula identical in form to that of Eq. (154), requiring only 3 function calls. Turning to the higher order hypercube formulas, we see that the seventh order hypercube formula of Eq. (136) has redundant parameters and function calls for dimension p < 3, since in 2 dimensions W6 , W2 W4 and W23 are linearly dependent by virtue of the algebraic identity 0 = (t21 + t22 )3 − 3(t21 + t22 )(t41 + t42 ) + 2(t61 + t62 ) .

(158)

Similarly, the ninth order hypercube formula of Eq. (141) has redundant parameters and function calls for dimension p < 4, since in 3 dimensions W8 , W42 , W2 W6 , W22 W4 , and W24 are linearly dependent by virtue of the identity 0 =(t21 + t22 + t23 )4 − 6(t81 + t82 + t83 ) + 3(t41 + t42 + t43 )2 +8(t21 + t22 + t23 )(t61 + t62 + t63 ) − 6(t21 + t22 + t23 )2 (t41 + t42 + t43 ) . (159) These results suggest the conjecture that the hypercube formula of order 2t + 1 will involve redundant parameters and function calls for dimension p < t, and we expect an analogous state-

54 ment to apply for the simplex formulas derived by the moment fitting method in Sec. VII. This redundancy for small p is a consequence of the fact that the integration formulas that we have derived for simplexes and hypercubes are universal, in the sense that they involve the same number of parameters irrespective of the dimension p. As p increases, the number of sampling points increases, but the number of parameters, and the size of the Vandermonde systems needed to find coefficients, remains fixed.

X.

FUNCTION CALLS NEEDED FOR INTEGRATION ROUTINES OF VARIOUS ORDERS

We summarize in this section the number of function calls needed for a single call to the integration routines of various orders. These are obtained by running the programs to integrate the function fcn=1, in which case the programs exit without subdividing the base region, giving the desired function call count for two samplings of the integral at the indicated order of accuracy, as well as unity as the output integral (since the programs all compute the integral over the base region, divided by the base region volume). In Table VI we give the function call counting for the simplex integration programs of first through fourth, fifth, seventh, and ninth order. For comparison, in Table VII we give a similar table from the paper of Genz and Cools (2003), which gives the function call counting for one evaluation of the indicated order, plus a second evaluation at a lower order used for error estimation. Unlike our method, which proceeds directly from the vertices of a general simplex, the Genz and Cools program uses integration rules for a standard p-simplex, with an affine transformation needed to treat more general simplexes. Although not directly comparable, the two tables show that the strategy we have used, of incorporating a number of free parameters into the integration which can be used to give different samplings of the integrand, does not lead to an inefficiency of more than a factor of 2 to 3 compared to the method used by Genz and Cools. In Table VIII we give the function call counting for the direct hypercube programs of first, third, fifth, seventh, and ninth order. For comparison, in Table IX we have tabulated tp + (t + 1)p , with the odd order of integration n related to t by n = 2t + 1; this is the number of function calls needed if one uses a p-fold direct product of Gaussian integrations of indicated order, together with a p-fold direct product of Gaussian integrations of the next higher odd order to get an error estimate. One sees from these tables that for t = 1, 2, 3 our parameterized method is more efficient than direct product Gaussian for dimension p ≥ 4, and for t = 4 the parameterized method is more

55

TABLE VI: Function calls by for simplex integration of order n in dimension p by method of Sec. VII n p→ 1

2

3

4

5

6

7

8

9

1

3

4

5

6

7

8

9

10

11

2

5

7

9

11 13

15

17

19

21

3

7 10 13 16 19

22

25

28

31

4

10 16 23 31 40

50

61

73

86

5

20 31 43 56 70

85

101 118 136

7

37 71 117 176 249 337 441 562 701

9

74 168 316 531 827 1219 1723 2356 3136

TABLE VII: Function calls for simplex integration of order n in dimension p from Genz and Cools (2003) n p→ 2

3

4

5

6

7

8

9

3

7

9

11 13 15 17 19

21

5

16 23 31 40 50 61 73

86

7

32 49 86 126 176 237 310 396

9

65 114 201 315 470 675 940 1276

efficient for p ≥ 5. Since the number of function calls in the parameterized method is asymptotically

polynomial of order (2p)t /t !, whereas in the direct product Gaussian method it is exponential in

p, the parameterized method becomes markedly more efficient for large dimension p. These results reinforce the indication from the previous section that, as a very rough rule of thumb, in using integration routines with n = 2t + 1 in dimension p, one should avoid high order routines with t > p. This is true both because in low dimension the higher order routines have redundant function calls, and because the extra computation involved in using a high order routine is justified only when the 2p scaling in the number of subregions, as the program subdivides from level to level, becomes large enough. However, this is only a very general criterion, since the optimum choice or choices of integration routine order will depend on the nature of the function being integrated. Moreover, in dimension p = 1 the programs are so fast on current computers that use of the fifth or seventh order integration routines, while not as efficient as Gaussian integration, still gives good results.

56

TABLE VIII: Function calls for hypercube integration of order n in dimension p by method of Sec. VIII n p→ 1

2

3

4

5

6

7

8

9

1

3

5

7

9

11

13

15

17

19

3

5

9

13

17

21

25

29

33

37

5

12 27 46

69

96

127 162 201

244

7

21 69 153 281 461 701 1009 1393 1861

9

48 192 501 1059 1966 3338 5307 8021 11644

TABLE IX: Function calls for hypercube integration of order n in dimension p by comparison of two product Gaussian rules

XI.

n p→ 1 2

3

4

5

6

7

3

3 5

9

17

33

65

129

5

5 13 35 97 275

793

2315

7

7 25 91 337 1267 4825 18571

9

9 41 189 881 4149 19721 94509

PUTTING IT ALL TOGETHER – SKETCH OF THE ALGORITHMS

We are now ready to give a sketch of the adaptive algorithms incorporating the elements described above. The basic algorithm starts from a base region, which acts as the initial level subregion, which is either a standard simplex, a Kuhn simplex (for hypercube integration treated by tiling with Kuhn simplexes), or a half-side 1 hypercube. It then proceeds recursively through higher levels of subdivision, by evaluating the integral using an integration method of order specified by the user with two different parameter choices, giving two estimates of the integral over the subregion divided by the subregion volume, which we denote by Ia (subregion) and Ib (subregion). (Dividing out the volume is convenient because of the 1/V factor appearing on the left hand side of Eqs. (62) and (122).) If the level number exceeds a user-specified value ithinlev which determines when thinning begins, then a thinning condition is applied. When the user-specified thinning function parameter ithinfun is given the value 1, the thinning condition used is |Ia (subregion) − Ib (subregion)| < ǫ

,

(160)

with ǫ an error measure specified by the user. (Further thinning options will be discussed shortly.) If this condition is met, the results are retained as contributions to the Ia and Ib estimates of the integral divided by the base region volume, and the subregion is not further subdivided. If this condition is not met, then the subregion is subdivided into 2p subregions, and the process is

57 repeated. The process terminates when either the thinning condition is met for all subregions, or a limit to the number of levels of subdivision set by the user is reached. In the latter case, the contributions of the remaining subregions that have not satisfied the thinning condition are added to the Ia and Ib totals, as well as to the sum of the absolute values of the local subinterval errors. With either termination, we get the final estimates of the integral divided by the base region volume, Ia ≃ Ib ≃

X

V (subregion)Ia (subregion) ,

X

V (subregion)Ib (subregion) .

subregions

subregions

(161) Here V (subregion) is the subregion volume divided by the base region volume, and since the subregions are a tiling of the initial base region, we have X

V (subregion) = 1 .

(162)

subregions

From the difference of Ia and Ib we get an estimate of the error, given by |outdiff| ≡ |Ia − Ib | .

(163)

We can also (as in the one dimensional illustration) compute the sum of the absolute values of the local subinterval errors, errsum ≡

X

subregions

V (subregion)|Ia (subregion) − Ib (subregion)|

.

(164)

Comparing Eqs. (161), (163), and (164), we see that errsum and |outdiff| obey the inequality errsum ≥ |outdiff|

,

with equality holding if Ia − Ib has the same sign in all subregions.

(165) When the condition

|Ia (subregion) − Ib (subregion)| < ǫ is met for all subregions, errsum reduces, using Eq. (162), to errsum < ǫ .

(166)

Hence to evaluate the integral to a relative error δ, one should choose ǫ ∼ δ|Ia |

.

(167)

58 Since Ia and Ib give the integral over the base region divided by the base region volume, to get the value of the integral without normalization by the base region volume, one must multiply these outputs by the base region volume V0 . For a standard simplex, V0 = 1/p !, for a side 1 hypercube, V0 = 1, while for a half-side 1 hypercube, V0 = 2p . Note that the thinning condition determining whether to subdivide a subregion does not include a factor of the subregion volume; we are testing variances of the integrand as sampled over the subregion, not variances of the net contribution to the integral. This may seem counter-intuitive, but is motivated by the formulas of Eqs. (162)–(166), by simplicity, and by the fact that it works well in practice. The problem with including a subregion volume weighting factor in the thinning condition is that at a very fine level of subdivision, there are many subregions, and so small error contributions from each can add up to a large error in the total. Since the local test does not involve comparisons of the errors from different regions, the calculation in each subregion proceeds independently from that in all the others. The local thinning condition that we use is equivalent to the “Local Subdivision Strategy” described in the monographs of Krommer and Ueberhuber (1991) and Ueberhuber (1995) using a parameter ǫabs , which plays the role of our ǫ. Using |Ia (subregion) − Ib (subregion)| as the basis for a thinning decision is only one possibility of many. More generally, given A ≡ Ia (subregion) and B ≡ Ib (subregion), one can take as the thinning function any function f (A, B) with the properties f (A, B) ≥ 0 and f (A, B) = 0 iff A = B, imposing now the thinning condition f (A, B) < ǫ. In the programs, we have included three options, (1) f (A, B) = |A − B| as in the discussion above, (2) f (A, B) = |A − B|/|A + B|, and

(3) f (A, B) = (A − B)2 . In many cases, and in particular for polynomial integrals, we found their R1 1 performance (with appropriate ǫ) to be similar, but for the singular integral 0 dx √1−x 2 we found

choice (3) to perform considerably better than the other two.

Three versions of the basic algorithm are presented in each of the directories of programs. In the first, the algorithm subdivides until all subregions obey the thinning condition, or until a preset limit on the level of subdivisions is reached, which is dictated by the available memory. Typically, for simple integrands and moderate dimension p, this happens rather quickly, in other words, the algorithm has saturated capabilities of the machine memory, but not of the machine speed. In a second version labelled “r”, the algorithm is “recirculated” by keeping, at a level limit set by the user which is chosen to avoid exceeding machine memory capabilities, all the subintervals that do not obey the thinning condition. These are then treated one at a time by the same algorithm, up to a second level limit again set by the user. This can take hours or days for high accuracy, high p computations, with a practical limit set by the speed capabilities of the machine. Finally,

59 a third version labelled “m” takes the “recirculating” algorithm and parallelizes it using the MPI (message passing interface) protocol, by distributing to each process of a cluster a large number of the subintervals that do not obey the thinning condition , each of which is then processed by the algorithm sequentially. This speeds up the computation by a factor of the number of processes available. All routines are coded in double precision, but since the ninth order integration formulas involve large numbers in computing coefficients, double precision computation is not enough to give double precision accuracy results, so for the fifth, seventh, and ninth order routines in both the simplex and direct hypercube cases, we also give a quadruple precision real(16) version of

the programs.

The programs present the user with various options. By an appropriate choice of ithinlev, thinning can be delayed, or even suppressed entirely so that all subdivisions take place to the specified subdivision limits. This can give a check that subregions with large contributions, but accidentally small error estimates, have not been harvested prematurely, and when the programs are modified, gives a useful check that the tiling condition of Eq. (162) is obeyed. By a choice of ithinfun, the user can choose which of three preset thinning functions to use, or by modifying the subroutine containing these functions, the user can make another choice of thinning function. For simplex integration, the user can choose whether to use the recursive or the symmetric subdivision algorithm. The user can choose the accuracy of the integration method used: first through fourth, fifth, seventh, or ninth for simplex based routines, and first, third, fifth, seventh, and ninth for the direct hypercube routines. Finally, the user can modify the free parameters in the integration routines, so as to get different samplings of the integrand, which can give a useful assessment of whether the error estimates from the initially used sampling are realistic.

XII.

TEST INTEGRALS; FALSE POSITIVES AND THEIR AVOIDANCE

For verifying the higher order integration programs, and for checking the operation of the adaptive programs, it is essential to have test integrals with known answers. For the standard simplex (c.f. Eqs. (30) and (31)), a useful formula is the multinomial beta function integral, Qp Z Γ(αa ) α −1 Pp , (168) dx1 ...dxp (1 − x1 − x2 − ... − xp )α0 −1 xα1 1 −1 ...xp p = a=0 Γ( standard simplex a=0 αa )

with Γ the usual gamma function (see the Wikipedia article on Dirichlet distributions). When αa − 1 = νa , Z

a = 0, ..., p with νa an integer, this can be rewritten as

standard simplex

dx1 ...dxp (1 − x1 − x2 − ... −

ν xp )ν0 xν11 ...xpp

Qp a=0 νa ! P = (p + pa=0 νa ) !

.

(169)

60 The ν0 = 0 case of this formula is the formula given by Stroud (1971) see also Grundmann and M¨oller (1978) for the integral of a general monomial over the standard simplex. For a unit hypercube, the corresponding formula is Z 1 Z 1 p Y νp ν1 dxp x1 ...xp = dx1 ... 0

0

ℓ=1

1 νℓ + 1

,

(170)

while for a half-side 1 hypercube the corresponding monomial integrals are (c.f. Eqs. (118) and (119)) Z

1

dx1 ... −1

Z

1 −1

ν

dxp xν11 ...xpp = 2p

p Y

ℓ=1

1 νℓ + 1

for all νℓ even, and zero otherwise .

(171)

Testing the simplex programs with the integral of Eq. (169), and starting thinning at level 1, shows that when the order of the monomial is less than or equal to the order of the integration formula used, the iteration terminates at the initial level, and the difference between Ia and Ib is of order the computer truncation error. When a monomial is integrated that is of higher order than the integration formula used, with a small enough error measure ǫ, the adaptive program starts to subdivide the base region. However, a more complicated pattern is seen for the hypercube integrals when evaluated by the direct hypercube algorithms, and this brings us to the issue of false positives. As in the simplex case, when thinning is started at level 1 and the order of the test monomial is less than or equal to the order of the integration formula used, the iteration terminates again at the initial level, and Ia − Ib is of order the truncation error. However, when a monomial is integrated that is of higher order than the integration formula used, the adaptive program does not always start to iterate. For example, using the fifth order hypercube formula in dimension p = 4, the program iterates for the integrand x61 , but not for the integrand x21 x22 x23 . The reason is that the latter function, although of higher order than that of the integration formula, vanishes on the hyperplanes spanning the axes where the fifth order integration formula samples the integrand, and so the Ia and Ib evaluations give the same answer (zero), and the thinning condition is obeyed for arbitrarily small ǫ. This is an example of a false positive, in which the thinning condition is obeyed even though the actual error is large. Any sampling program for evaluating integrals is subject to false positives for functions that take special values (in our case zero, or a constant) on the sampling points. Since the sampling points in the simplex integration formulas are on oblique, rather than axis-parallel, lines or planes, this problem is not so readily seen with the multinomial test functions of Eq. (169), but we have nonetheless found examples of false positives. For example, using fifth order integration and symmetric subdivision, the p = 5 monomial x(1)x(2)x(3)x(4)2 x(5), when computed with thinning

61 starting at any level below 3, develops a false positive at level 2 and gives an answer that is wrong in the fourth decimal place, even though the output error measures suggest much higher accuracy. There are several general ways to guard against false positives. The simplest is to use the freedom of choosing the parameter ithinlev to delay thinning until several subdivisions have taken place. False positives are most dangerous if they occur in the initial few levels, since these have the largest subregions, and if a subregion is prematurely harvested, there is a possibility of significant error. On the other hand, thinning becomes most important after several subdivisions have taken place, when the number of subregions is large. So there can be a useful tradeoff between starting thinning early and starting it late. If computer time permits, one can always do an a posteriori check by choosing ithinlev greater than the limit on the number of levels, which suppresses thinning altogether, and gives the approximate Riemann sum corresponding to the level of subdivision attained. A second general way to guard against false positives is to compute the integral using alternative options, for example, using integration programs of several different orders, or where allowed as an option for simplex integrals, to use recursive instead of symmetric subdivision. In the fifth order p = 5 example noted above, changing to seventh order integration, or changing from symmetric to recursive subdivision while maintaining fifth order integration, both eliminate the false positive at level two. A third way is to add a function with known integral to the integrand, which has significantly different local behavior, and to subtract its known integral from the total at the end. For example, in the hypercube case, consider the integral Z 1 φq (x) , φq (x) = 0= −1

1 1 − 2 2 (q + x) q −1

,

(172)

which exists for any q > 1. Adding a multiple of p Y

φq (xℓ )

(173)

ℓ=1

to the test monomial integrands does not change the expected answer, but forces the adaptive program to start to subdivide at level 1 (for small enough ǫ) in all monomial cases. It is of course not necessary for the added function to have an integral that can be evaluated in closed form. In the p = 5 simplex case discussed above, we eliminated the false positive at level 2 by numerically integrating the function (1+x(1))−1 , and then adding a multiple of this function to the integrand and subtracting its integral from the answer. When adding such an auxiliary function, it is probably a good idea to rescale it so that its order of magnitude is similar to that of the integral being evaluated. Clearly there is an infinite variety of such auxiliary functions that can be added

62 to the integrand, each of which shifts the false positive problem to a different part of integrand function space. Even when one is dealing with generic integrands, in which the program starts to subdivide as expected, adding such functions will alter the pattern of subdivision, and can be used (in addition to changing the integration formula parameters) to give further estimates of the errors in the output values Ia,b provided by the integration algorithm. We do not recommend just changing the integration formula parameters as a way of eliminating false positives. The reason is that the samplings in both the simplex and hypercube cases take place on hyperplanes that are determined by the general structure of the integration formulas, but do not vary as the parameters in the integration formulas are changed. So if a false positive is associated with a zero or constant integrand value on one of these hyperplanes, it will not be eliminated by changing the parameter values. Similar remarks apply to changing the thinning function as a way of eliminating false positives. For related reasons we have not written into the programs another way of creating a criterion for thinning, the comparison of results from integration programs of different orders (say, of fifth and seventh order). In the simplex example discussed above, doing this would eliminate the false positive, since the seventh order routine uses sampling points that avoid the problematic hyperplanes sampled by the fifth order routine. However, in this case one may as well do two seventh order samplings to set up the thinning condition , and thus benefit from the higher accuracy accruing from use of the seventh order routine for smooth integrands.

XIII.

DESCRIPTION OF PROGRAMS IN THE SEVEN DIRECTORIES A.

General description

The Fortran programs are grouped into 7 directories, named simplex123, simplex4, simplex579, simplex579 16, cube13, cube579, and cube579 16. All programs are valid for arbitrary dimension p≥1 The simplex programs all perform adaptive integration over a standard simplex or a Kuhn simplex with one vertex at the origin, using real(8) precision except for simplex579 16, which uses real(16) . The programs in simplex123 perform first through third order integration, the

programs in simplex4 perform fourth order integration, and the programs in simplex 579 perform fifth, seventh, or ninth order integration. The same adaptive program treats both the standard and Kuhn simplex cases, with a subroutine

63 argument “i init” determining which initialization is used. Included in all the simplex packages are programs for integration over a side 1 hypercube with one vertex at the origin, by tiling with Kuhn simplexes followed by adaptive simplex integration. The programs in simplex4 perform fourth order adaptive integration using a different subdivision strategy from that used in all the other cases. In simplex4 the simplex vertices are used as sampling points, with the side midpoints giving the vertices at the next level of subdivision. In all the other programs, only interior points of the simplex are used for sampling. Hence, the simplex4 programs cannot be used to integrate functions which have integrable singularities at the base simplex boundary, whereas the other programs can be used in this case. The programs in cube13 perform first or third order adaptive integration, and those in cube579 perform fifth, seventh, or ninth order adaptive integration, over half-side 1 hypercubes centered on the origin, with real(8) precision. These programs use less memory (by roughly a factor 1/p) than the hypercube tiling programs. The cube programs are valid for arbitrary dimension p ≥ 1. The programs in simplex579 16 are real(16) re-writings of those in simplex579, and the programs in cube579 16 are real(16) re-writings of those in cube579. The real(16) versions are obtained from the corresponding real(8) programs by making the following global substitutions: (1) Replace “d0” by “q0”, (2) replace “implicit real(8)” by “implicit real(16)”, (3) replace “dabs” by “qabs”, (4) replace “d20.13” by “d32.36”. These changes can be made using a “replace all” utility, since the strings that have to be modified do not occur anywhere else in the programs. Note that explicit data type declarations that override the implicit ones are not changed. Each directory contains a package of subprograms, labeled respectively simplexsubs123.for, simplexsubs4.for, simplexsubs579.for, simplexsubs579 16.for, cubesubs123.for, cubesubs579.for, and cubesubs579 16.for. The subroutines in these packages do not have to be accessed by the user in normal operation of the adaptive programs. If they are accessed to alter the programs, we strongly recommend doing several test integrals before and after the changes, to make sure they still operate correctly. Each directory also contains a series of main program files, and each main program file contains the main program proper, as well as a subroutine setting up the function to be integrated, subroutines setting up the free parameters used in the parameterized integrations, a subroutine setting up three options for the thinning function, and in the case of the Kuhn tiling treatment of hypercubes, a subroutine symmetrizing the function to be integrated over all its variables. Each program that requires user setting of input parameters contains comment statements giving instructions. To run the programs, the user must compile and link the subroutine package in a directory with the appropriate main program file in the same directory.

64 As noted in the section on Vandermonde solvers, all programs are self-contained, since their subroutine packages include Vandermonde solvers that compute the explicit solution of the Vandermonde system for the relevant values of N . Because the ninth order simplex integration routines and associated Vandermonde equations involve large numbers in computing coefficients, use of real(16) is recommended if one wants to get answers with real(8) accuracy. Solving the Vandermonde equations to get the coefficient parameters for the integrations need be done only once before adaptive integration begins; this is done in the subroutines with names beginning with “ext”, the output of which is then fed to the integration programs that are used repeatedly in the adaptive integration process. There are three generic types of main programs in each directory. Those with names not ending in “r” or “m” execute adaptive integration to a subdivision level set by the user (and limited by machine memory). Those with names ending in “r” execute the “recirculating” routines, in which after the first stage of subdivision, the remaining subregions are subdivided sequentially in a second stage to a second level of subdivision set by the user. Those with names ending in “m” execute an MPI parallel version of the “recirculating” routines, in which after the first stage of subdivision, the remaining subregions are farmed out to the available processes for a second stage of subdivision to the second level of subdivision set by the user. In order to conserve memory, the labeling of simplex and cube points and the simplex subdivision routines use a lattice built on integer(2) arithmetic. This allows 14 levels of subdivision in the initial stage, since 214 =16384, which is half the maximum integer representable in integer(2). In order to go beyond 14 levels of subdivision in one stage, say to 30 levels of subdivision, one would have to replace 16384 in the subroutines by 230 = 1, 073, 741, 824, which is half the maximum integer representable in integer(4), replace all integer(2) data type declarations by integer(4), and enlarge the level number limits in the programs. The explicit limits in the programs on the number of levels correspond to the requirement that the minimum integer(2) lattice spacing must not be smaller than 1, since in integer arithmetic 1/2 is replaced by 0. Program stages that pass on subdivided regions have a limit of 14 levels, while output stages that do not pass on subdivided regions have a limit of 15 levels. An exception to this rule is in the simplex4 programs, where there is an explicit division by 2 in the programs, and so the corresponding limits are 13 and 14. Note that in integer(2) arithmetic, 16384/2 + 16382/2 = 16384 6= (16384 + 16384)/2 = (−32768)/2 = −16384, which is why in the simplex4 integration program we have not regrouped added terms into parentheses. The recirculating and MPI programs make use of the observation that symmetric (or recursive) subdivision of standard simplexes, symmetric and recursive subdivision of Kuhn simplexes, and

65 hypercube subdivision, all give after ℓ subdivisions a subregion that fits within a hypercube of side 1/2ℓ (or 1/2ℓ−1 ). This observation, which is an unproved conjecture supported by our numerical results in the case of standard simplexes, permits a doubling of the number of levels attainable within integer(2) arithmetic in the “r” and “m” programs, as follows. At the start of the second stage of subdivision, each subregion is translated by a shift vector and is rescaled by a factor which expands it to just fit within the initial lattice containing base region. This permits another 15 (or for recursive subdivision, 14) levels in the second stage with corresponding limits in the simplex4 programs of 14 (or 13) , and so the “r” and “m” programs can subdivide to subregions that have a dimension 2−28 = 3.725 × 10−9 of the base region dimension. Whether this can be attained

in practice for a given dimension p of course depends on available machine memory. Subdivision limits appropriate to the various cases have been incorporated into the main programs. Because simplex points are represented in integer(2) arithmetic, in order to apply the simplex subroutines to a starting simplex that does not have only 0s or 1s in the vertex coordinates (for example, an equilateral triangle), one would have to change the integer(2) data type declarations to real(4) for the programs to work correctly. This change increases the memory requirements, and should not be made unless needed. We note also that with the aim of conserving memory, we have used allocatable memory to store subregion information, allocating memory where needed at each level of subdivision, and deallocating memory when no longer used. Finally, we note that the MPI programs are written using only simple MPI Send and MPI Recv commands. All processes simultaneously carry out the first stage of subdivision, and then each process of rank greater than 0 takes its share of the remaining subregions after the first stage and processes them further. This wastes some processor time, but avoids large data transfers. Only at the end, when all processes of rank greater than 0 have finished, is their output combined in process 0. Because MPI can only pass real(8) numbers as messages, the real(16) MPI programs give only real(8) output. (This is one of the reasons why the explicit real(8) declarations are not modified in the conversion substitutions leading to real(16) programs.) Nevertheless, the MPI programs compute the sensitive parts of the high order integrations in real(16), converting to real(8) only at the end when process outputs are combined. To enhance readability of the programs, we have used indents to show the different levels of “if” chains, except in one place in the MPI programs, where we have given the “if”, “else if”, and “end if” lines statement numbers 97,98,99. We have not indented the contents of “do” loops, since these always begin and end with a statement number, and never with an unnumbered “enddo”. The one exception to this is in the subroutine BestLex used for the symmetrization step in the

66 Kuhn tiling programs for integration over hypercubes, which has been taken verbatim from H. D. Knoble’s (1995) website. This is of course a matter of taste; our feeling was that indenting both the “if” chains and “do” loops would result in so many levels of indents that readability of the programs would be decreased. We also remind the reader that the direct hypercube programs were written by minimal modification of the simplex programs, changing array arguments where needed (e.g., “ip+1” for simplexes becomes “2∗ip” for hypercubes), but not changing array names. So the array names in the direct hypercube subroutines are not the ones that would naturally be chosen if these programs were written without reference to the simplex case.

B.

Inputs

The main programs require the following inputs to be set by the user: 1. ithinlev tells the program when to begin thinning subregions, by harvesting those that obey the thinning condition of Eq. (160). Thinning begins when the total level number exceeds ithin. Thus, with ithin = 0, thinning begins at level 1, while if ithin is greater than or equal to the maximum total level number, there is no thinning. 2. ithinfun tells the program which thinning function option to use. As explained in Sec. X, ithinf un = 1 corresponds to a thinning function f (A, B) = |A − B|, ithinf un = 2 to

f (A, B) = |A − B|/|A + B|, and ithinf un = 3 to f (A, B) = (A − B)2 .

3. isubdivision tells the simplex programs whether to use symmetric subdivision (isubdivision = 1) or recursive subdivision (isubdivision = 2). This parameter does not appear in the main programs in cube13, cube579, and cube579 16, where there is no choice of subdivision methods. 4. iaccuracy tells the programs to use the integration program of order iaccuracy. For example, in the simplex123 programs, to select third order accuracy one sets iaccuracy = 3, and in the simplex 579 programs, to select seventh order integration one sets iaccuracy = 7. This parameter does not appear in the main programs in simplex4, which uses only fourth order integration. 5. ip gives the spatial dimension p of the simplex or hypercube being integrated over, and can take any integer value ≥ 1. Thus, to integrate over a three dimensional cube one would set ip = 3.

67 6. eps sets the parameter ǫ appearing in the thinning condition of Eq. (160). For ithinfun=1, this gives an absolute error criterion; to achieve a given level of relative error, one needs a rough estimate of the value of the integral as given by outa or outb, which can be used to readjust eps by use of Eq. (167). For nonsingular integrands, the eps value when using ithinfun=3 should, as a first guess, be taken as the square of the eps value that one used for ithinfun=1. Note that if ithinlev is greater than the total level number, so that thinning is suppressed, the results are independent of the value given to eps. 7. In all programs other than the hypercube tiling program, the external function is supplied by the user in the subroutine fcn.for. In the tiling programs, fcn.for is instead the symmetrization program for the external function supplied by the user in the subroutine fcn1.for. 8. llim sets the limit to the number of subdivisions in the programs with names not ending in “r” or “m”. It can be any integer between 1 and 15, except in the simplex4 programs, where the range is 1 to 14. In practice, the effective upper limit is set by machine memory. Start with a low value of llim, and then to improve the accuracy, increase it until you get a diagnostic saying memory has been exceeded; the value of llim one less than this is the maximum value llim = LM AX that does not exceed memory. Since the final level l = llim does not further subdivide, this limit is associated with the number of subregions carried forward from level l−1 to the final level. As llim is increased the execution time will increase, and this will also impose an effective upper limit. 9. llim1 and llim2 in the programs with names ending in “r” and “m” set the limit to the number of subdivisions in the first and second stages of subdivision, respectively. The maximum value of llim1 is 14 and of llim2 is 15, except in the simplex4 programs, where the respective limits are 13 and 14, and also except for recursive subdivision, where the maximum value of llim2 is one less than the corresponding value for symmetric subdivision. In all cases, the built-in subdivision limits prevent the program from dividing 1 by 2, giving an integer arithmetic answer of 0. As before, the effective upper limit will be set by machine memory and machine execution speed. In using the “r” and “m” programs, and setting llim2 = 1, the maximum value of llim1 that will not exceed memory is llim1 = LM AX−1, with LM AX the corresponding maximum determined as above for the single stage program. Once LM AX is determined, one can take any value 1 ≤ llim2 ≤ LM AX without exceeding memory. Because of the staging, the numerical output depends only on the sum llim1 + llim2, that

68 is, one is free to redistribute the computational effort between the first and second stages. 10. The parameters for the higher order integration routines are contained in the main program files in subprograms with names beginning with “setparam”. They are given in array constructors, and have been preset to values indicated. The program variable names have been chosen to roughly correspond to the symbol names in the formulas of Secs. VII and VIII. For example, for the direct cube routines, where λ is a free parameter, it is called aalamb; in the order 7 routine for simplex integration, λi1 and λi2 are the respective elements of the array constructors alamb1 (blamb1) and alamb2 (blamb2) corresponding to the first (second) choice of parameter values. (In the fifth order cube and simplex routines, where only one pair of array constructors is needed, they are called alamb (blamb), even though the corresponding quantity is labeled λi1 in Secs. VIIC and IXB.) These presets can be changed by the user to give a different sampling of the integrand in the integration subregions, subject to the following rules: (i) The inequalities in the comment statements must be obeyed, to keep the sampling points inside the subregion, as discussed in Secs. V and VI. (ii) The parameters in each array constructor must have non-degenerate values, so that the corresponding Vandermonde equations will be solvable. If two parameters in an array constructor are very close, solution of the Vandermonde system will have large truncation errors, so care should be taken to keep the parameters in each array constructor reasonably well spaced. (iii) The “a” and “b” array constructors should have different parameter values, since these are used to give the two different integrand evaluations used in the error estimate.

C.

Outputs (and their use in making memory and running time estimates)

Program outputs (except in the MPI case) are written to a file “outdat.txt” and also appear on the screen. In the MPI case, outputs are written to the output file specified by the system for a “print” statement. A brief description of output labeling follows: 1. All programs write out the user-set values of ip, llim (or llim1 and llim2), eps, ithinlev, ithinf un, isubdivision, and iaccuracy. They do not print out the values of the parameters in the array constructors in the subprograms setparam. 2. In all programs, outa and outb give two evaluations of the integral divided by the base region volume, corresponding respectively to the two different samplings of the integrand set by the “a” and “b” parameters in the array constructors, and |outdiff | gives the difference

69 |outa − outb|. The size of |outdiff | gives an estimate of the likely error in the answer; this estimate can be improved by evaluating the integral with a number of different choices of the array constructor parameters, and also by comparing the evaluations obtained using different program options as set by the user-set inputs. As noted in Sec. XI, to get the value of the integral without normalization by the base region volume, one must multiply outa and outb by the base region volume V0 . For a standard simplex, V0 = 1/p !, for a side 1 hypercube, V0 = 1, while for a half-side 1 hypercube, V0 = 2p . 3. In all programs, errsum gives the sum of the absolute values of the local subinterval thinning tests, errsum ≡

X

subregions

V (subregion)|f Ia (subregion, Ib (subregion) | ,

(174)

with f (A, B) the thinning function. As explained above, for the choice ithinf un = 1 this gives an upper bound for |outdiff |, and when Ia (subregion − Ib (subregion) has uniform sign over all subregions, errsum = |outdiff |. However, when signs are not uniform over subregions, errsum for ithinfun=1 can be much larger than the actual error, as in the two Gaussian example discussed below. 4. In all programs, l gives the level number, ind gives the number of subregions carried forward to the next level, indmax gives the maximum value of ind encountered over the course of the various levels that have been executed, f cncalls gives the number of function calls, t current gives the current elapsed time in seconds at the various levels of the first stage, and t f inal gives the total elapsed execution time in seconds. In the approximation in which the geometric series summing the number of function calls over the various levels is approximated by its largest term, corresponding to the highest level attained, and when there is no thinning, f cncalls ≃ T 2p(llim−1) for the single stage program, and f cncalls ≃ T 2p(llim1+llim2−1) for the “r” and “m” programs, with T the appropriate function call value from Table VI or VIII. In the absence of thinning, the exact formula summing the geometric series is f cncalls = T [2p(K+1) − 1]/[2p − 1], with K = llim − 1 for the single stage program and K = llim1 + llim2 − 1 for the “r” and “m” programs. When there is thinning, this

gives an upper bound on the number of function calls.

5. In the “recirculating” programs with main program name ending in “r”, t restart gives the time at which the second stage is initiated, in which the subregions carried forward from the

70 first stage are subdivided sequentially. During the second stage, the program will indicate approximately when it is 10, 20, ..., 90, 100 percent finished in sequentially processing the subregions carried forward from the first stage, by printing this information to the screen (but not by writing it to file). In interactive mode, this permits one to gauge how long the calculation will take to finish; if it looks like the calculation will take longer than one wishes to wait, one can stop execution and restart with different, more tractable, parameter values. Since these numbers are computed by integer division, the actual numbers may be 9,19,... or other similar strings, depending on the residue modulo ten of the number of regions carried forward. One can also estimate the total running time by multiplying t restart by the number of subregions ind carried forward to the second stage from the final level of the first stage, further multiplied by 2p(llim2−llim1) to correct for a difference in the first and second stage level numbers. (Similarly, for the single stage programs, from t current and ind at the output of any level l, the maximum running time, in the absence of thinning, to reach the level limit llim1 is the product t current times ind, further multiplied by ep(llim1−l) .)We generally found that timing values, on a laptop, varied by one or two tenths of a second between identical runs, so estimates of total running time become reliable only when one has proceeded to the point where several seconds have elapsed. The final statistics include indcount, which gives a sum of the ind values at each level of the second stage, and which indicates the ind value that would be needed if the second stage subdivisions were carried out in the first stage by using a larger llim1 value. Because of the staging strategy, the maximum ind value that is required is the much smaller number indmax. 6. In the MPI programs with main program name ending in “m”, t restart is the time at the end of the first stage when the subregions carried forward, numbering indstart in total, are distributed to multiple processes, and f cncalls gives the number of function calls up to this point. If t restart does not appear in the output, the program has completed execution before entering the second stage. Since MPI programs are typically run in batch mode, no intermediate statistics are output during the second stage, but one can make a rough estimate of total second stage running time by multiplying t restart by the number of subregions indstart carried forward, further multiplied by 2p(llim2−llim1) to correct for a difference in the first and second stage level numbers, and dividing by Nprocess − 1 (process 0 serves only as an accumulation register for the output of the remaining Nprocess − 1 processes). If this estimate is too large, one can stop execution and restart with less ambitious parameters.

71 The final statistics include indmaxprocess, which is the maximum of the final indmax over all of the processes.

XIV.

SOME SAMPLE RESULTS, AND OPEN QUESTIONS A.

Sample results

We turn now to some sample results which illustrate the capabilities of our numerical integration programs. Our first example is one given in the paper on VEGAS of Lepage (1978), consisting of the sum of two spherically symmetric Gaussians equally spaced along the diagonal of a cubical integration volume, 1 Ip = 2

1 aπ 1/2

p Z

1

d p x[e−

Pp

2 2 i=1 (x(i)−1/3) /a

+ e−

Pp

2 2 i=1 (x(i)−2/3) /a

] ,

(175)

0

with a = 0.1. In this form Ip can be evaluated by the “cubetile” programs which tile a unit hypercube with Kuhn simplexes. In order to apply the direct hypercube “cube” programs, we make the change of variable x = (1 + y)/2 to rewrite Ip as an integral over a half-side 1 hypercube, p Z 1 P P 1 1 1 p − pi=1 (y(i)+1/3)2 /(4a2 ) − pi=1 (y(i)−1/3)2 /(4a2 ) d y[e + e ] . (176) Ip = p 22 aπ 1/2 −1 In his paper Lepage compares numerical evaluations of Ip for various p with a target value of unity, which is accurate enough for his purposes. However, the programs given here are capable of much higher accuracy with current computers, so we will need a high accuracy evaluation of Ip for comparison purposes. We can get this by noting that the two Gaussians contribute equally to Ip to see this, set x → −x in Eq. (176) , and each individual Gaussian is the pth power of a one

dimensional integral J, giving

Ip =J p

,

1 J= 2aπ 1/2

Z

1

2 /(4a2 )

dye−(y+1/3)

.

−1

(177) The one-dimensional integral J can be evaluated in terms of error functions or complementary error functions, 1 J = [erf 1/(3a) + erf 2/(3a) ] , 2 1 =1 − [erfc 1/(3a) + erfc 2/(3a) ] 2

, (178)

72

TABLE X: Evaluation of J to 13 place accuracy using 3rd, 5th, 7th order cube routines iaccuracy |outdiff | f cncalls t f inal 10−16 0.2 × 105 < .02s

3

10−15 0.5 × 105 < .02s

5

10−13 0.9 × 105 < .02s

7

TABLE XI: Evaluation of powers of J to give expected values of Ip to 13 place accuracy p

Ip = J p

1 0.9999987857663 2 0.9999975715341 3 0.9999963573033 4 0.9999951430740 5 0.9999939288462 6 0.9999927146199 7 0.9999915003951 8 0.9999902861717 9 0.9999890719498

but it can also be evaluated numerically to 13 digit accuracy by running the “cube” program to a depth of twelve total levels. Running the “r” version of the programs with parameter values ip = 1, llim1 = 5, llim2 = 7, ithinlev = 12 (no thinning, which makes the results independent of eps and ithinf un), and iaccuracy = 3, 5, 7, we get from all three runs the result J = 0.9999987857663

.

(179)

The statistics for running time (on a MacBook Pro) and the number of function calls for these runs are given in Table X. Running with iaccuracy = 1 gave only 10 place accuracy with 12 levels, but gave 13 place accuracy when 20 levels were used (which took about a second, rather than hundredths of a second). Thus, for this calculation iaccuracy = 3 is the most cost-effective program. Running a program to raise J to powers then gives the expected results for Ip given in Table XI, with an uncertainty of 1 in the final decimal place. We give in Table XII results for dimensions p = 2, 3, 4, and 5 as obtained from the “r” version of the programs on a laptop, and in Table XIII for p = 7, 9 as obtained by running the “m” version on a 64 process cluster. (Laptop runs were done on a MacBook Pro and an older Dell Inspiron, and for the latter, the timings were rescaled by a factor 0.49 to give timings for a MacBook Pro. We

73 made cluster runs with 128 or 64 processes, and for the former, we rescaled the running time to that for 64 processes. We invite the reader to compare the running times and accuracies summarized in Tables XII and XIII with those that can be obtained from other integration programs.) For all of these runs, where thinning was used, we took ithinf un = 1. “Place accuracy” indicates the decimal place where differences first appear from the 13 place result in Table XI. To within an order of magnitude, this agrees with the difference |outdiff| between the two evaluations of the integral given by the program. The values of errsum for these integrals (not shown) were typically one to three orders of magnitude larger than both |outdiff| and the actual error, indicating that the local subinterval errors do not all have the same sign. Since |outdiff| can be smaller than the actual error, for an unknown integral it cannot be taken as giving the error; in this case the best way to estimate the error is to run the program with different choices of program options and to use the spread of results to estimate the error. From Tables XII and XIII, we see that a minimum of 5 levels is needed to get good accuracy for the double Gaussian example. With 5 levels, the smallest hypercube side is 1/32 = 0.03125, small enough to resolve the double Gaussian characteristic scale of 0.1 in good detail. On the other hand, with only 4 levels, the minimum side is 1/16 = 0.0625, making it harder to resolve a scale of 0.1 and limiting the accuracy to 4 significant figures. Most of the cluster runs were done without thinning, and thus should characterize the accuracy attainable for any function on a unit hypercube with a characteristic scale length of 0.1. For serial runs done with thinning (Table XII), the reduction in running time was proportional to the reduction in number of function calls, and ranged from a saving in the range 30% to a factor of 3.5, for values of ǫ which yield the same or one place less accuracy as when there is no thinning. For parallel cluster runs done with thinning (Table XIII), the reduction in running time is considerably less than the reduction in number of function calls. This arises from the fact that even though the program initially distributes subregions to processes using a shuffling routine that assigns adjacent subregions in the stack to different processes, some processes get subregions (like ones near the Gaussian peak) that are “hard” and so take longer to finish, as compared with processes that get “easy”, readily thinned regions near the Gaussian tails. Consequently, since the final time t f inal records the time when all processes have finished, it is not reduced by thinning in proportion to the number of function calls. From Tables XII and XIII, we see that thinning with a value of ǫ equal to the error level in the runs with no thinning leads, in the double Gaussian examples, to a reduction in accuracy. This reflects the fact that in the double Gaussian case, errsum is typically 2 to 3 orders of magnitude larger than |outdiff|, indicating that the local errors are not of constant sign, and also errsum is 2

74 to 3 orders of magnitude larger than ǫ, indicating that the local thinning condition is not satisfied in all subregions. When local subregion errors alternate in sign and the thinning condition is not uniformly obeyed, there can be cancelations of errors in the output integrals, leading to improved accuracy and a smaller |outdiff|, but thinning can then reduce the degree of cancelation and reduce the accuracy. Finally, we note that the 6 level run with iaccuracy = 9 did not give as many significant figures as the corresponding run with iaccuracy = 7; we believe this is due to the increased truncation errors associated with running the ninth order routine. The cluster which we used was more than an order of magnitude slower in running quadruple precision real(16) as opposed to double precision real(8) code, so it was not feasible for us to investigate this further

by repeating the ninth order 6 level run in quadruple precision. (We did, however, test in quadruple precision that the ninth order routines integrate polynomials of ninth degree or lower to within expected truncation errors.) We also studied the double Gaussian example using the “cubetile” programs. These integrate over a unit hypercube by integrating, over a single Kuhn simplex, the symmetrized function that sums over corresponding points of a Kuhn tiling of the hypercube. Results for this study are given in Table XIV. Because of the p ! symmetrization factor in the number of function calls, the

“cubetile” programs take longer to run than the “cube” programs for a corresponding number of subdivision levels. Because tiling of a hypercube with Kuhn simplexes does not reduce the subregion side length, the p ! increase in number of subregions does not compensate for insufficient resolution when the attainable level number is not large enough. This can be seen from the results in Table XIV. For p = 5, where 5 levels can be run on the cluster in reasonable time, significantly better results are obtained from a 5 level “cubetile” run than are obtained from a 5 level “cube” run, at the price of a factor of 120 more function calls. However, for p = 7 it was not possible to do a 5 level calculation in reasonable cluster running time, so we had to settle for 4 levels, which as we saw above has insufficient resolution to give very high accuracy for the double Gaussian test problem. The “cubetile” results in this case are better than the 4 level “cube” results, reflecting the factor of 5040 more function calls, but the 6 place accuracy achieved is not as good as what can be achieved, in less running time, by using the “cube” program with 5 levels. We conclude that the “cubetile” programs become significantly less useful as the dimension p increases, because of the p ! symmetrization factor necessitated by Kuhn tiling. Our next two examples illustrate results obtained from the “simplex” programs to evaluate integrals over a standard simplex. For our first example, we consider an integral based on the

75

TABLE XII: Double Gaussian results using the “cube” program for ip ≡ p =2 ,3 ,4 ,5 ,7 (timings for MacBook Pro); levels ≡ llim1 + llim2 and “place accuracy” compares to Table XI ip iaccuracy levels

ithinlev

eps t f inal f cncalls

2

3

10

no thinning

–

1.2s

2

5

10

no thinning

–

3.4s

2

5

10

2

2

7

10

2

7

3

−13

10

2.1s

no thinning

–

8.5s

10

2

10−13

2.4s

3

7

no thinning

–

1.7s

3

3

9

3

5

3

2

10

−13

72s

7

2

10−9

2.8s

5

7

3

5

3

2

10

−13

4.1s

9

2

10−13

190s

7

7

2

10−12

13s

4

3

7

2

10−9

55s

4

5

7

4

7

4

2

10

−13

310s

5

2

10−7

3.7s

7

6

5

3

5

2

10

−13

90s

5

2

10−13

6.4s

3

6

2

10

5

5

5

no thinning

5

5

5

2

5

7

5

no thinning

5

7

6

2

7

3

5

no thinning

7

3

5

2

10

7

5

5

2

10−9

−13

– 10

−13

– 10

−9

– −9

7

7

4

no thinning

–

7

7

4

2

10−9

170s 49s 29s 240s

(outa + outb)/2

|outdiff |

0.31 × 107 0.9999975715340 0.4 × 10−12

place accuracy 13

0.94 × 107 0.9999975715339 0.3 × 10−14

13

0.59 × 10

13

7

0.9999975715340

10

−14

0.24 × 108 0.9999975715342 0.5 × 10−12

13

0.68 × 107 0.9999975715342 0.4 × 10−12

13

0.999996358

9

0.39 × 107 0.18 × 10

9

0.71 × 107 0.11 × 10

8

0.2 × 10−8

0.999996357305 0.9 × 10 0.999996356

−11

0.3 × 10−10

0.99999635730 0.2 × 10

12 9

−10

11

0.48 × 109 0.9999963573032 0.3 × 10−13

13

0.32 × 108 0.9999963573033 0.5 × 10−12

13

0.999995143

9

0.12 × 109 0.71 × 10

9

0.83 × 107 0.20 × 10

9

0.14 × 108

0.3 × 10−8

0.99999514305 0.4 × 10 0.99999510

−10

0.3 × 10−8

0.99999514308 0.2 × 10

−10

11 8 11

0.9999940

7

9

0.7 × 10−6

0.99999394

8

0.10 × 109

0.5 × 10

−7

0.99999386

7

0.63 × 10

8

0.2 × 10−6

0.99999386

7

0.50 × 109

0.2 × 10

−6

0.99999393

0.1 × 10−7

8

0.999993926

0.3 × 10

−10

9

0.37 × 10

1700s 0.34 × 10

10

6100s 0.78 × 1010

0.999992

6

840s

10

0.9 × 10−6

0.999991

6

3500s 0.62 × 1010

0.9 × 10

−6

0.999991

0.2 × 10−6

6

−2

3 3

0.11 × 10

10

0.9997

0.60 × 109

0.4 × 10

0.9995

0.4 × 10−2

1200s 0.21 × 10 340s

Feynman-Schwinger formula of Eq. (32), with D0 = 1 and D1 = ... = Dp = a, 1 = p! ap

Z

standard simplex

1 [1 + (a − 1) x(1) + ... + x(p) ]p+1

.

(180)

Taking a = 0.1 (and for p = 5, also a = 0.01) gives an integral that is sharply peaked on the diagonal hyperplane 1 = x(1) + ... + x(p) bounding the simplex. Results for this integral obtained from the “m” version of the program, with symmetric subdivision and no thinning on a 64 process

76

TABLE XIII: Double Gaussian results using the “cube” program for ip ≡ p =7, 9 from a 64 process cluster; levels ≡ llim1 + llim2 and “place accuracy” compares to Table XI ip iaccuracy levels 7

3

5

ithinlev

eps

t f inal

no thinning

–

11s

7

5

5

no thinning

–

120s

7

7

5

no thinning

–

760s

7

1

6

no thinning

–

2900s

7

3

6

no thinning

–

4300s

7

5

6

no thinning

–

7

7

6

no thinning

–

7

7

6

2

7

9

5

no thinning

10

−10

–

7

9

6

no thinning

–

9

3

5

no thinning

–

9

5

5

no thinning

–

9

7

5

no thinning

–

9

7

5

2

10

f cncalls (outa + outb)/2 |outdiff | place accuracy

0.78 × 1010

0.999992

0.9 × 10−6

6

0.44 × 10

0.9999913

0.5 × 10

−6

7

0.99999150

0.3 × 10−7

8

0.999993

0.4 × 10

−6

6

0.9999915

0.8 × 10−7

7

0.99999150

0.9 × 10

−8

8

0.27 × 1012 0.52 × 10

12

0.10 × 1013

17000s 0.56 × 10

13

98000s 0.35 × 1014 0.9999915003 0.1 × 10−9

10

37000s 0.37 × 10

13

0.99999148

8

0.14 × 1013

0.1 × 10

−9

0.99999144

0.6 × 10−7

8

0.99999148

0.1 × 10

−7

8

0.999989

0.1 × 10−5

6

0.9999888

7

380000s 0.13 × 1015

0.8 × 10

−6

0.99998907

0.7 × 10−7

8

100000 s 0.52 × 10

0.9999885

0.7 × 10

7

4000s

510000s 0.18 × 10 7500s

15

0.25 × 1013

49000s 0.17 × 10

−8

11

14

13

−7

TABLE XIV: Double Gaussian results using the “cubetile” program with symmetric subdivision, for ip ≡ p =5, 7 from a 64 process cluster; levels ≡ llim1 + llim2 and“place accuracy” compares to Table XI ip iaccuracy levels

ithinlev

eps t f inal

5

7

5

no thinning –

7

7

4

no thinning –

73s

f cncalls

(outa + outb)/2

120 × 0.27 × 10

9

12000s 5040 × 0.93 × 109

|outdiff |

0.999993928884 0.7 × 10 0.999988

place accuracy 12

−11

0.1 × 10−5

6

cluster, are given in Table XV. “Place accuracy” indicates the decimal place where differences first appear from the exact answer a−p , and this correlates well with the difference |outdiff| between the two evaluations of the integral given by the program. The values of errsum for these integrals (not shown) were nearly identical to |outdiff|. As our second simplex example, we consider the polynomial integral c.f. Eq. (169) 2p+1 = p! (3p + 2)(3p + 1)...(p + 2)(p + 1)

Z

2

standard simplex

[1 − x(1) − ... − x(p)]

p Y

x(i)2

,

(181)

i=1

which is strongly suppressed at all the vertices of the simplex. Running a program to evaluate the exact answer for this integral on the left hand side of Eq. (181), and then evaluating the integral on the right on a laptop using the “simplex” programs, gives the results in Table XVI.

77

TABLE XV: Feynman-Schwinger integral for ip ≡ p = 5, 7, 9 from a 64 process cluster, with no thinning; levels ≡ llim1 + llim2 and “place accuracy” compares to the exact answer a−p ip

a

iaccuracy levels t f inal f cncalls

5 0.1

7

5

5 0.01

7

9

7 0.1

7

5

7 0.1

7

6

9 0.1

7

5

(outa + outb)/2 |outdiff | place accuracy

0.27 × 109

0.99998 × 105

0.12 × 10

12

7

56000s 0.48 × 10

14

0.39s

0.3

5

52000s 0.22 × 1014 0.999998 × 1010 0.3 × 104 160s

0.99995 × 10

20000s 0.15 × 1014 0.9999995 × 107 0.9999 × 10

9

0.2 × 10

3

2. 0.3 × 10

6 5 7

5

4

TABLE XVI: Polynomial integral for ip ≡ p = 4, 5 (timings for a MacBook Pro); levels ≡ llim1 + llim2 and “place accuracy” compares to the exact answer on the left hand side of Eq. (181) ip iaccuracy levels t f inal f cncalls 4

5

6

4

5

8

5

7

5

5

7

6

8.7s

0.63 × 10

8

(outa + outb)/2 0.88095326 × 10

−8

exact answer 0.8809532619056 × 10

|outdiff | −8

0.1 × 10

place accuracy

−17

2200s 0.16 × 1011 0.880953261905 × 10−8 0.8809532619056 × 10−8 0.3 × 10−21 39s

0.27 × 109

0.21591990 × 10−10

0.2159199171337 × 10−10 0.2 × 10−18

1300s 0.86 × 1010 0.2159199171 × 10−10 0.2159199171337 × 10−10 0.7 × 10−21

As our final example, we consider the 1 dimensional singular integral Z 1 1 dx √ = π/2 ≃ 1.57079633 . 1 − x2 0

(182)

In Table XVII we give results for this integral using the “r” version of the “cube” program, with p = ip = 1, llim1 = 14 and llim2 = 15 (that is, using the maximum allowed number of levels), iaccuracy = 5, ithinlev = 0 (that is, thinning starts at the outset), and eps = 10−10 , as a function of the choice of thinning function ithinf un. We see that in this case, ithinf un = 3 gives the fastest evaluation, with ithinf un = 2 next fastest and ithinf un = 1 the slowest. This differs from the double Gaussian examples, where ithinf un = 1 gives better results than either ithinf un = 2 or ithinf un = 3.

B.

Programming extensions and open questions

There are a number of possible extensions of the programs that could be pursued in the future. (1) The MPI version of the programs could be rewritten to include redistribution of the process workload after each level ℓ of the second stage. This would make the reduction in running time when using thinning track more closely with the reduction in the number of function calls. (2) The multistage strategy could be extended to a third (or more) stages, by not harvesting the subregions

8 12 8 10

78

TABLE XVII: Evaluation of a one dimensional singular integral with different thinning functions. Running times were all less than 0.1s; “place accuracy” compares to the exact answer ithinf un f cncalls (outa + outb)/2 1 2 3

|outdiff |

place accuracy

0.24 × 106

1.570778

5

0.13 × 10

5

0.35 × 10−5

1.570778

5

0.45 × 104

0.35 × 10

−5

1.570777

0.43 × 10−5

5

that fail to obey the thinning condition at the end of the second stage, but instead writing them to a memory device which is then read sequentially by a third stage, etc. (3) One could build in an option of permuting the simplex vertices at the start of the simplex programs, which gives a different subdivision, and therefore a different evaluation of the integral for use in estimating errors. (4) Finally, we remark that the same subdivision, thinning, and staging strategies that we have used will apply with any integration formulas that give two different estimates of the answer from each subregion, not just the parameterized moment fitting formulas that we developed in Secs. VII and IX. There are also a number of mathematical questions that we have left open. (1) We found numerical evidence that symmetric subdivision of a standard simplex obeys the bound of Eq. (46) for reduction of side length, and that after ℓ symmetric (recursive) subdivisions, the resulting subsimplexes each fit within a hypercube of side 1/2ℓ (1/2ℓ−1 ). We do not have a proof of these conjectures, but have assumed them true in constructing the programs. (2) Given the regularities in the construction of parameterized fifth, seventh, and ninth order integration formulas for the simplex and hypercube cases, it would be of interest to try to find a general all-orders rule for these. (3) We have not addressed the question of analytic error estimates for the parameterized integration formulas. (4) We have not addressed in any systematic way the question of deciding which thinning function is optimal for a given choice of integrand. (5) It would be of interest to study the systematics, in the moment fitting method, of the tradeoff between the number of parameters that are fixed by appropriate conditions, and the number of function calls.

XV.

ACKNOWLEDGEMENTS

I wish to thank the School of Natural Sciences computing staff, Prentice Bisbal, Kathleen Cooper, Christopher McCafferty, and James Stephens, for their helpful support and advice throughout this project. I also want to acknowledge several helpful conversations with Prentice Bisbal about

79 Fortran language features, and to thank Susan Higgins for drawing the figures. I am grateful to Herman D. Knoble for permission to use in the hypercube tiling routines his program BestLex (which is in the public domain and is not covered by the copyright for this book). This work was partially supported by the Department of Energy under grant DE-FG02-90ER40542. I also wish to acknowledge the hospitality of the Aspen Center for Physics during the summers of 2009 and 2010.

XVI.

REFERENCES

Note: These references make no pretense to completeness; they list what I have found useful in constructing the algorithms discussed in this book. For research groups with a continuing program, I have listed only recent publications that contain earlier references. Cools, R. and Haegemans, A. (2003) Algorithm 824: CUBPACK: A Package for Automatic Cubature; Framework Description. ACM Trans. Math. Software 29, 287-296. Dejnakarintra, M. and Banjerdpongchai (undated), D. An Algorithm for Computing the Analytical Inverse of the Vandermonde Matrix. Searchable on-line. Edelsbrunner, H. and Grayson, D. R. (2000). Edgewise Subdivision of a Simplex. Discrete Comput. Geom. 24, 707-719. Genz, A. and Cools, R. (2003) An Adaptive Numerical Cubature Algorithm for Simplices ACM Trans. Math. Software 29, 297-308. Good, I. J. and Gaskins, R. A. (1969) Centroid Method of Integration. Nature 222, 697-698. Good, I. J. and Gaskins, R. A. (1971) The Centroid Method of Numerical Integration. Numer. Math. 16, 343-359. Grundmann, A. and M¨oller, N. M. (1978) Invariant Integration Formulas for the n-Simplex by Combinatorial Methods. Siam J. Numer. Anal. 15, 282-290. Hahn,

T.

(2005)

CUBA-

a

library

for

multidimensional

numerical

integration.

arXiv:hep-ph/0404043. Heinen, J. A. and Niederjohn, R. J. (1997) Comments on “Inversion of the VanderMonde Matrix”. IEEE Signal Process. Lett. 4, 115. Kahaner, D. K. and Wells (1979), M. B. An Experimental Algorithm for N -Dimensional Adaptive Quadrature. ACM Trans. Math. Software 5, 86-96. Knoble, H. D. (1995), website download of BestLex subroutine.

80 Krommer, A. R. and Ueberhuber, C. W. (1994). Numerical Integration on Advanced Computer Systems, Lecture Notes in Computer Science 848. (Berlin: Springer-Verlag), p183. Kuhn, H. W. (1960). Some combinatorial Lemmas in Topology. IBM J. Res. and Dev. 4, 518-524. Lepage, G. P. (1978). A New Algorithm for Adaptive Multidimensional Integration. J. Comp. Phys. 27, 192-203. Lyness, J. N. (1965) Symmetric Integration Rules for Hypercubes II. Rule Projection and Rule Extension. Math. Comp. 19, 394-407. McKeeman, W. M. (1962). Algorithm 145: Adaptive numerical integration by Simpson’s rule. Commun. ACM 5, 604. McNamee, J. and Stenger, F. (1967). Construction of Fully Symmetric Numerical Integration Formulas. Num. Math. 10, 327-344. Moore, D. (1992) Subdividing Simplices, in Graphics Gems III, D. Kirk ed. (Boston: Academic Press), pp. 244-249 and pp. 534-535. See also Moore, D. (1992), Simplicial Mesh Generation with Applications, Cornell University dissertation. Neagoe, V.-E. Inversion of the Van der Monde Matrix. IEEE Signal Process. Lett. 3, 119-120. Osborne, M. R. (2001) Simplicial Algorithms for Minimizing Polyhedral Functions (Cambridge: Cambridge U. Press), p. 5. Plaza, A. (2007). The eight-tetrahedra longest-edge partition and Kuhn triangulations. Comp. & Math. with Applications 54, 426-433, Fig. 1. Pontryagin, L. S. (1952) Foundations of Combinatorial Topology, English translation. (Rochester: Graylock Press), pp. 10-12. Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992) Numerical Recipes in Fortran, pp. 82-85. Sch¨ urer, R. (2008) HIntLib Manual, available on-line at: mint.sbg.ac.at/HIntLib/manual.pdf . Stroud, A. H. (1971) Approximate Calculation of Multiple Integrals (Englewood Cliffs: PrenticeHall). Ueberhuber, C. W. (1995) Numerical Computation 2: Methods, Software, Analysis (Berlin: Springer-Verlag), pp. 161-162. Wikipedia articles on: Adaptive Simpson’s method, Barycentric coordinates, Dirichlet distribution, Gaussian quadrature, Hypercube, Polynomial long division, Simplex.

81 XVII.

CONTENTS OF PROGRAMS IN DIRECTORIES

Each file listed in this summary contains multiple programs, each of which begins with comment lines describing its function. • The simplex programs take as base region the standard simplex of Eq. (30). When used as part of the cubetile programs, the base region is the Kuhn simplex of Eq. (37).

• The cubetile programs take as base region the side 1 hypercube of Eq. (39).

• The cube programs take as base region the half-side 1 (i.e., side 2) hypercube of Eq. (47). • The numbers after simplex or cube indicate the integration orders that are included. A.

Directory simplex123

This directory contains: • The subprogram file simplexsubs123.for. • Main program files simplexmain123.for, cubetilemain123.for. • Recirculating main program files simplexmain123r.for, cubetilemain123r.for. • MPI parallel main program files simplexmain123m.for, cubetilemain123m.for. B.

Directory simplex4

This directory contains: • The subprogram file simplexsubs4.for. • Main program files simplexmain4.for, cubetilemain4.for. • Recirculating main program files simplexmain4r.for, cubetilemain4r.for. • MPI parallel main program files simplexmain4m.for, cubetilemain4m.for.

82 C.

Directory simplex579

This directory contains: • The subprogram file simplexsubs579.for. • Main program files simplexmain579.for, cubetilemain579.for. • Recirculating main program files simplexmain579r.for, cubetilemain579r.for. • MPI parallel main program files simplexmain579m.for, cubetilemain579m.for. D.

Directory simplex579 16

This directory contains: • The subprogram file simplexsubs579 16.for. • Main program files simplexmain579 16.for, cubetilemain579 16.for. • Recirculating main program files simplexmain579 16r.for, cubetilemain579 16r.for. • MPI parallel main program files simplexmain579 16m.for, cubetilemain579 16m.for. E.

Directory cube13

This directory contains: • The subprogram file cubesubs13.for. • Main program file cubemain13.for. • Recirculating main program file cubemain13r.for. • MPI parallel main program file cubemain13m.for. F.

Directory cube579

This directory contains: • The subprogram file cubesubs579.for.

83 • Main program file cubemain579.for. • Recirculating main program file cubemain579r.for. • MPI parallel main program file cubemain579m.for. G.

Directory cube579 16

This directory contains: • The subprogram file cubesubs579 16.for. • Main program file cubemain579 16.for. • Recirculating main program file cubemain579 16r.for. • MPI parallel main program file cubemain579 16m.for.