Feb 9, 2018 - the law on subsets of Z describing the limiting opinion of each node ... for which the limiting opinion is 1, then A converges in law as...

0 downloads 13 Views 1MB Size

Scott Sheffield†

arXiv:1802.03346v1 [math.PR] 9 Feb 2018

February 12, 2018

Abstract The Schelling model, introduced by Schelling in 1969 as a model for residential segregation in cities, describes how populations of multiple types self-organize to form homogeneous clusters of one type. In this model, vertices in an N -dimensional lattice are initially assigned types randomly. As time evolves, the type at a vertex v has a tendency to be replaced with the most common type within distance w of v. We present the first mathematical description of the dynamical scaling limit of this model as w tends to infinity and the lattice is correspondingly rescaled. We do this by deriving an integro-differential equation for the limiting Schelling dynamics and proving almost sure existence and uniqueness of the solutions when the initial conditions are described by white noise. The evolving fields are in some sense very “rough” but we are able to make rigorous sense of the evolution. In a key lemma, we show that for certain Gaussian fields h, the supremum of the occupation density of h − φ at zero (taken over all 1-Lipschitz functions φ) is almost surely finite, thereby extending a result of Bass and Burdzy. In the one dimensional case, we also describe the scaling limit of the limiting clusters obtained at time infinity, thereby resolving a conjecture of Brandt, Immorlica, Kamath, and Kleinberg.

i

t=0

t>0

t=∞

Figure 1: The Schelling model on the two-dimensional torus with two types (red and blue). At time t = 0 (left) the types of the nodes are chosen uniformly and independently at random. Each node i is associated with a neighborhood (shown in green) and an independent rate one Poisson clock. Every time the clock of a node rings, it updates its type to the most common type in its neighborhood. Eventually we reach a stable configuration (right).

∗ Massachusetts † Massachusetts

Institute of Technology, Cambridge, MA, [email protected] Institute of Technology, Cambridge, MA, [email protected]

1

Contents 1 Introduction 1.1 History . . . . . . . 1.2 Overview . . . . . . 1.3 The Schelling model 1.4 Main results . . . . . 1.5 Notation . . . . . . . 2 The 2.1 2.2 2.3 2.4

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

continuum Schelling model Occupation measures of random fields: basic definitions . . . . . . . . . . . . Supremum of the occupation kernel on Lipschitz functions for moving average Existence and uniqueness of solutions of the continuum Schelling model . . . Long-time behavior of the one-dimensional continuum Schelling model . . . .

. . . . .

2 2 3 5 6 10

. . . . . . . . . Gaussian fields . . . . . . . . . . . . . . . . . .

11 12 13 19 22

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

. . . . .

3 The discrete Schelling model 30 3.1 The early phase of the discrete Schelling model . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.2 Limiting states for the one-dimensional discrete Schelling model . . . . . . . . . . . . . . . . . 37 4 Open problems

1

47

Introduction

The Schelling model [Sch69, Sch71, Sch78] was initially introduced to explain residential segregation in cities, and is one of the earliest and most influential agent-based models studied by economists. Variants of the model have been studied by thousands of researchers within a number of disciplines, e.g. social sciences, statistical mechanics, evolutionary game theory, and computer science, see the works referenced below and [Cla91, PW01, LJ03, VK07, PV07, SS07, DCM08, Odo08, GGS+ 08, GVN09, GBLJ09, SVW09] for a small and incomplete selection of these works. Until recently [BIKK12, BEL14, IKLZ15, BEL15c, BEL15a, BEL15b] most analysis of the model was based either on simulation, non-rigorous analysis or so called “perturbed” versions of the model (discussed below). We will discuss the Schelling model history and give an informal overview of the paper in Sections 1.1 and 1.2, and we provide a precise definition of the model in Section 1.3.

1.1

History

In the original formulation of the model, individuals of two “types” occupy a subset of the nodes of a graph, and at random times an individual moves to a free (i.e., unoccupied) position in the graph. Individuals move to locations at which they will have more neighbors of their own type. Schelling showed, using simulations he implemented manually with pennies and dimes on a ruled sheet of paper [Sch78], that segregation occurs even if the agents have only a weak preference for being in regions with a high density of their own type. His findings have been confirmed later by a huge number of simulations of other researchers, and his findings have strongly influenced debates about the causes of residential segregation [CF08]. The introduction of the model also contributed to Schelling winning the Nobel Memorial Prize in Economics in 2005 [N05]. The first mathematically rigorous results on the model considered a variant where the dynamics describing the transition between states were “perturbed” in the sense that agents have a small probability p > 0 of acting against their preference [You01, Zha04]. The perturbed model was analyzed by studying the stationary distribution of the associated Markov chain. In particular, the stochastically stable states, which are states whose stationary probability is bounded away from zero when p → 0, were studied. The stochastically stable states are proved to be those which minimize the length of the interface between the two types of individuals (i.e., the number of neighboring pairs containing one individual of each type) so that using the terminology of statistical physics the stochastically stable states correspond to Ising model ground states. 2

One interesting property of the unperturbed model is that it can be shown to stabilize almost surely in finite time, i.e., after some point in time the agents stop moving. It has been argued (see e.g. [IKLZ15]) that these limiting stable configurations have at least some properties in common with the segregation patterns observed in real cities, e.g. since they tend to appear more irregular than the stochastically stable states of the unperturbed model. We will not address real world segregation patterns in this paper. The first mathematically rigorous analysis of the unperturbed model by Brandt, Immorlica, Kamath, and Kleinberg [BIKK12] concerns a version of the model on the one-dimensional torus where the neighborhood of a node is given by its nearest 2w + 1 neighbors (including itself) for some w ∈ N. They prove that the configuration of types in the stable limiting configuration consists of intervals of length at most polynomial in w. In [IKLZ15] the authors consider a two-dimensional Schelling model where an agent only changes location if the fraction of his neighbors having the same type as himself is < 1/2 − for some 1. They prove that the expected diameter of the segregated region containing the origin in the final configuration grows at least exponentially in w2 . See also [BEL14, BEL15c, BEL15a, BEL15b] for recent rigorous results on the unperturbed Schelling model in one, two, and three dimensions. The models studied in these papers have a more general initial configuration of types and more general tolerance parameters than the models in [BIKK12, IKLZ15] and the current paper. The authors are particularly interested in parameter values which lead to either very high degree of segregation, total takeover of one type, or almost no changes relative to the initial configuration. We will focus on a variant of the model which we call the single-site-update Schelling model, briefly explained later in this paragraph, in which individual vertices are updated one at a time. This variant of the model is also the one considered in [BEL15c, BEL15a, IKLZ15]. Some of the other papers mentioned above consider the pair-swapping Schelling model, where two individuals of different types will swap positions with each other if this leads to both nodes having more neighbors of their own type. In both variants of the model all the nodes of the considered graph are occupied, i.e., there are no free or unoccupied nodes. In the single-site-update Schelling model unsatisfied individuals change types, instead of swapping with each other. That is, one picks a random individual and allows that individual to change type if desired, instead of picking a pair of individuals and asking them to swap locations if desired. In the single-site-update version, the number of vertices of a given type is not constant. Instead, one imagines that there is a larger “outside world” beyond the graph being considered, and that when a vertex changes type, it corresponds to an individual within the configuration swapping location with someone from the “outside world.” The single-site-update evolution is essentially equivalent to the pair-swapping model evolution in a setting with an “outside world” region (disconnected from the main lattice graph under consideration) that contains a large number of unsatisfied individuals of each type. We will focus on the single-site-update variant in this paper because it is cleaner mathematically (one only has to deal with one individual at a time when making updates), but we will explain at the end of Section 1.4 that our first main result (Theorem 1.1) also holds in the pair-swapping setting.

1.2

Overview

We study an unperturbed, single-site-update version of the Schelling model on an N -dimensional lattice with M ≥ 2 different types. A node is unsatisfied if the most common type in its neighborhood differs from its current type, and the size of the neighborhood is described by a constant w ∈ N. Adapting the vocabulary of majority dynamics, we call the type of the node the opinion of the node. At time zero, each node is assigned an opinion uniformly and independently at random. Each node is associated with an independent Poisson clock, and every time the clock of a node rings it updates its opinion to the most common opinion in its neighborhood. In other words, a node changes its opinion when its Poisson clock rings if and only if the node is currently not satisfied. We prove a dynamical scaling limit result for the early phase of the Schelling dynamics for any N ∈ N and M ∈ {2, 3, . . . }. We define a vector-valued function Y w called the normalized bias function; as explained below, Y w (z) is the vector whose components are (a normalizing constant times) the M opinion densities (minus their expectations) in the radius w box centered at z. We prove that Y w converges in the scaling limit to the solution Y of a differential equation (more precisely, an integro-differential equation) with Gaussian 3

initial data. We call the function Y the continuum bias function, and we call the associated initial value problem the continuum Schelling model. See Theorem 1.1 and Proposition 3.1. Solutions of the differential equation are not unique for all choices of initial data. However, we prove existence and uniqueness of solutions for Gaussian initial data. The basic idea of the argument is to note that even though the initial Gaussian normalized bias function is very rough, the change in its value from the initial time to a finite later time is a.s. a (random) Lipschitz function. Focusing on this difference, we are left with a random ODE in the (more regular) space of Lipschitz functions. To establish existence and uniqueness of the evolution within this space, we wish to apply a variant of the Picard-Lindel¨ of theorem, but doing so requires some sort of continuity in the corresponding ODE, which requires us to understand whether there are situations where two coordinates of the normalized bias function are very close on a large set, so that even small perturbations lead to big changes in the direction the functions are evolving. It turns out that one can show these situations are unlikely by establishing control on the maximal occupation kernel corresponding to the intersection of the initial Gaussian function with a Lipschitz function (where the maximum is taken over all functions with Lipschitz norm bounded by a fixed constant). Analogous results for Brownian local time, established by Bass and Burdzy in [BB01], turn out to be close to what we need, and we are able to adapt the techniques of [BB01] to our higher dimensional setting with a few modifications. In the special case when N = 1 and M = 2 we also prove a scaling limit result for the final configuration of opinions. This confirms a conjecture in [BIKK12]. More precisely, we show in Theorem 1.2 below that the law on subsets of Z describing the limiting opinion of each node converges upon rescaling by w. The theorem says that if we study the model on the rescaled lattice w−1 Z and let A ⊂ w−1 Z be the set of nodes for which the limiting opinion is 1, then A converges in law as w → ∞, viewed as an element in the space of closed subsets of R equipped with the Hausdorff distance. The scaling limit result is proved by studying the long-time behavior of the continuum bias function Y . This is one of the more technically interesting parts of the paper, as a number of tricks are used to rule out anomalous limiting behavior. Our conclusion is that in the limit one obtains a random collection of homogeneous neighborhoods, each of width strictly greater than one. The idea of the proof is to show that if this does not occur, then it will occur if we make a slight perturbation to the initial data, and a delicate analysis of the differential equation is required to show that this is indeed the case. After we describe the continuum dynamics and (for N = 1, M = 2) its limiting behavior, we will need to do some additional work to make the connection with the discrete model. We consider two phases separately: First we study the model up to time Cw−N/2 for C 1, and then (for N = 1, M = 2) we consider times larger than Cw−1/2 . The first phase of the evolution is governed by the differential equation, and we prove that the differential equation predicts the evolution of the discrete model well by bounding the error which accumulates during a short interval of length ∆t. The second phase starts when the solution of the differential equation has almost reached its limiting state with homogeneous intervals. We show that with high probability the homogeneous intervals observed at time Cw−1/2 will continue to exist until all nodes have reached their final opinion. Nodes near the boundary between two intervals at time Cw−1/2 have approximately half of their neighbors of each opinion, which makes it hard to control the evolution of the bias for these nodes; however, we do manage to show that with high probability each interval does not shrink too much before all nodes have reached their final opinion. In the remainder of the introduction we will give a precise definition of the Schelling model and state our main results. In Section 2.3 we show existence and uniqueness of solutions of the continuum Schelling model by using results from Section 2.2, and in Section 2.4 we prove that for the one-dimensional model with M = 2 the sign of the solution converges almost surely at almost every point. In Section 3.1 we prove that the continuum Schelling model describes the discrete Schelling model well for small times and large w. In Section 3.2 we conclude the proof of the scaling limit result for the one-dimensional Schelling model. We also prove (proceeding similarly as in [TT14]) that the opinion of each node converges a.s. for any N ∈ N and M ≥ 2, and we include a lemma which might be related to the typical cluster size for the limiting opinions in higher dimensions. We conclude the paper with a list of open questions in Section 4.

4

1.3

The Schelling model

We will start by defining the Schelling model on a general simple graph G with vertex set V (G) and edge set E(G). The vertex set V (G) may be infinite, but we assume G is locally finite. Let M ∈ {2, 3, 4, . . . }. Each i ∈ V (G) is associated with an opinion in {1, ..., M } and an independent unit rate Poisson clock, i.e., each node is associated with a clock such that the times between two consecutive rings of the clock are distributed as i.i.d. unit rate exponential random variables. Let X(i, t) ∈ {1, . . . , M } denote the opinion of node i at time t ≥ 0, and let N(i) := {j ∈ V (G) : (i, j) ∈ E(G)} ∪ {i} be the neighborhood of i. Every time the Poisson clock of a node rings, the node updates its opinion according to the following rules: (i) The node chooses the most common opinion in its neighborhood if this is unique. In other words, we set X(i, t) = m for m ∈ {1, . . . , M } if for all m0 ∈ {1, . . . , M } \ {m} we have |{j ∈ N (i) : X(j, t) = m}| > |{j ∈ N (i) : X(j, t) = m0 }|. (ii) If there is a draw between different opinions, and the current opinion of the node is one of these opinions, the node keeps its current opinion. (iii) If there is a draw between different opinions, and none of these opinions are equal to the current opinion of the node, the new opinion of the node will be chosen uniformly at random from the set of most common opinions in its neighborhood. Note that almost surely no two Poisson clocks will ring simultaneously, even when |V (G)| is infinite. Furthermore, one can show that for graphs with bounded degree, and for any fixed vertex i ∈ V (G) and a fixed time t ∈ R+ , the set of vertices whose initial opinion may have influenced the opinion of i at time t, given the times at which the various Poisson clocks were ringing, is almost surely finite, see e.g. [TT14, Claim 3.5]. These two observations imply that the configuration at each time t is a.s. determined by the initial configuration and the ring times, along with knowledge about how the draws described in (iii) are resolved. In this paper, we will consider the Schelling model on a lattice, and we consider scaling limits as the neighborhood size tends to infinity, but we will be fairly flexible about the “shape” (circle, square, etc.) of the neighborhood. Let N ∈ N be a parameter describing the dimension of the graph, and let w ∈ N be a parameter we call the window size. Let N = (−1, 1)N (or, alternatively, let N ⊂ (−1, 1)N be a sphere or some other shape; precise conditions on N appear below). Define the neighborhood N(i) of an element i of ZN by N(i) = {j ∈ ZN : w−1 (j − i) ∈ N }. When the dimension N is greater than one, we will mostly work on a torus whose size is a large constant times the neighborhood size; precisely, for a fixed constant R ∈ {3, 4, . . . } and defining the one-dimensional torus S = Z/(RwZ) we will work on the torus SN . We use the torus instead of ZN mainly because we do not establish existence and uniqueness of solutions of a particular differential equation for the model on ZN when N ≥ 2. In the remainder of this section we describe the Schelling model in terms of SN rather than ZN , but we obtain the model on ZN by repeating the description with ZN instead of SN . To simplify notation when considering the Schelling model on SN , we identify an element i ∈ ZN with its equivalence class in SN . Assume the initial opinions of the nodes are i.i.d. random variables satisfying P[X(i, 0) = m] = M −1 for all i ∈ SN and m ∈ {1, ..., M }. Throughout the paper we let R denote the set of rings of the Poisson clocks R = {(i, t) ∈ SN × R+ : the clock of node i ∈ SN rings at time t}.

(1)

N P We define the bias of node i ∈ S towards opinion m ∈ {1, ..., M } at time t > 0, to be the sum j∈N(i) 1X(j,t)=m . When the clock of a node rings, the node updates its opinion to the opinion towards N which it has the strongest bias, with draws resolved as described inP (ii)-(iii) above. We say that P i ∈ S agrees with the most common opinion in its neighborhood at time t if j∈N(i) 1X(j,t)=X(i,t) ≥ j∈N(i) 1X(j,t)=m for any m = {1, ..., M }. In other words, i agrees with the most common opinion in its neighborhood if and only if it would not update its opinion if its Poisson clock were ringing.

5

In the description of the Schelling model above we considered (for simplicity of the description) the neighborhood N = (−1, 1)N , but our results are proved for more general neighborhoods, since the more general case is not significantly more difficult to analyze. Unless otherwise stated, we will assume that N ⊂ (−1, 1)N is an arbitrary open set containing 0, such that ∂N has upper Minkowski dimension strictly smaller than N , and such that the following technical condition is satisfied. If x0 is a unit vector in an arbitrary direction, λ denotes Lebesgue measure, and we let N + sx0 denote the set {x + sx0 : x ∈ N } for some s ∈ R, then [ 1 λ (N + tx0 ) \ (N + sx0 ) > 0. (2) lim inf t→0+ t s≤0

The condition (2) will be used in the proof of Lemmas 2.5 and 2.7. It is obviously satisfied by most of the smooth-boundary regions one would be inclined to consider. We may for example let N = Np , where Np := {x ∈ RN : kxkp < 1} is a metric ball for the `p norm for some p ∈ [1, ∞]. Finally we remark that the first main result of the next section holds for the general N -dimensional Schelling model on the torus with M ∈ {2, 3, . . . } opinions, while the second main result holds for the torus or the real line for N = 1, M = 2, and N = N∞ .

1.4

Main results

Our first main result is a dynamical scaling limit result for the Schelling model. We prove that a function Y w describing the opinions of the nodes in the early phase (times up to order w−N/2 ) of the Schelling model on the torus, converges in law as w → ∞. Denote the N -dimensional torus of side length R by S N = S × · · · × S. Let C(S N × R+ ) denote the set of continuous real-valued functions on S N × R+ , and let CM (S N × R+ ) denote the set of functions which can be written in the form f = (f1 , . . . , fM ) for fm ∈ C(S N × R+ ) and m = 1, . . . , M . Equip CM (S N × R+ ) with the topology of uniform convergence on compact sets. Define the unscaled bias function Ym by X 1 . Ym (i, t) = 1X(j,t)=m − M j∈N(i)

w Then define the (normalized) bias function Y w = (Y1w , . . . , YM ) ∈ CM (S N × R+ ) by

Ymw (x, t) :=

1 Ym (xw, tw−N/2 ) wN/2

m ∈ {1, ..., M },

x ∈ w−1 ZN ,

t ≥ 0,

(3)

e ∈ w−1 ZN which satisfy and for x 6∈ w−1 ZN let Ymw (x, t) be a weighted average of the ≤ 2N points x −1 kx − x ek∞ < w ! N X Y w Ym (x, t) = (1 − w|xk − x ek |) Ymw (e x, t). (4) x e∈w−1 ZN : kx−e xk∞

k=1

See Figure 2. Note that Y w encodes the bias of each node towards each opinion 1, . . . , M . Also note that when defining Y w we do not only scale space; we also scale time by w−N/2 . This implies that the convergence result of the following theorem only describes the evolution of the bias in the very beginning (more precisely, up to times of order O(w−N/2 )) of the Schelling model. Theorem 1.1. In the setting described above, Y w converges in law in CM (S N × R+ ) to a random function Y as w → ∞. The theorem is an immediate consequence of Proposition 3.1 in Section 3.1, which identifies Y as the solution of a particular differential equation with Gaussian initial data. Our second main result is a scaling limit result for the final configuration of opinions in the onedimensional Schelling model. We consider the model on either the torus S or on Z, and have M = 2 6

Normalized bias at time t=0

Normalized bias at time t=10 3

1.5

2

1

1

0.5

Yw

Yw

0

0

-0.5

-1

-1

-2

-1.5 -3 2

4

6

8

10

12

14

2

4

6

x

8

10

12

14

x

Normalized bias at time t=32

Normalized bias at time t=800

5

20

4 15 3 10 2 5

1

Y

w

Y

0 -1

w

0 -5

-2 -10 -3 -15 -4 -20

-5 2

4

6

8

10

12

14

2

x

4

6

8

10

12

14

x

Figure 2: The graphs show the bias function Y w at four different times for the Schelling model on the torus of dimension N = 1 with M = 3 opinions. Each color in the figure corresponds to one of the M opinions. The initial data of Y w (upper left figure) converges in law to the Gaussian field B defined in Section 2 when w → ∞. The evolution of Y w can be well approximated by the differential equation (5) on compact time intervals, see Theorem 1.1 and Proposition 3.1. When we have reached the final configuration of opinions, each function Ymw , m = 1, 2, 3, is piecewise linear with slopes ±2w1/2 and 0, and values in the interval 1 1 1 [−2w1/2 M , 2w1/2 (1 − M )]. Regions in which Ymw equals 2w1/2 (1 − M ) correspond to long intervals where the nodes have limiting opinion m. For the continuum analog Y of Y w , each function Ym converges to ±∞ at almost every point. The thick line at the x axis in the lower right figure represents the limiting opinion of the nodes. The plots are made with window size w = 100 and torus width R = 14.

opinions and neighborhood N = N∞ . For R ∈ {3, 4, . . . } let S be the torus of width R, i.e., S = R/ ∼, where ∼ is the equivalence relation on R defined by x ∼ y iff x − y is an integer multiple of R. By [TT14] (see Proposition 3.9 below for the analogous result for general M ) the opinion of each node converges almost surely as time goes to infinity, hence each node is associated with a unique opinion in {1, 2} describing its limiting opinion.

7

Normalized bias at time t= 10

Normalized bias at time t= 0 6 4 4

3 2

2 1

Yw

Yw

0

0

-1 -2 -2 -3

-4

-4 -6 2

4

6

8

10

12

14

2

4

6

x

10

12

14

12

14

x

Normalized bias at time t= 32

Normalized bias at time t= 800 20

8 6

15

4

10 5

2 w

Y

8

w

Y

0

0

-2

-5

-4

-10

-6

-15

-8

-20 2

4

6

8

10

12

14

2

4

x

6

8

10

x

Figure 3: The graphs show the function Yb w := Y1w − Y2w for the Schelling model on the torus of dimension N = 1 with M = 2 opinions, which evolves approximately as described by (10). The plots are made with window size w = 100 and torus width R = 14.

For V = S and V = R define D(V ) := {A = (A1 , A2 ) : Am ⊂ V is closed for m = 1, 2}. Equip D(V ) with the topology of convergence of A1 and A2 for the Hausdorff distance on compact sets. The appropriately normalized limiting distribution of opinions in the Schelling model, is a random variable in D(V ). The following theorem says that this random variable converges in law in D(V ) as the window size w converges to ∞. In the theorem below we identify S and S with {0, . . . , Rw − 1} and [0, R), respectively. Theorem 1.2. Let V = Z and V = R, or let V = S and V = S. Consider the one-dimensional Schelling model on V as described in Section 1.3 with window size w ∈ N, M = 2 opinions, and N = N∞ . Define Aw ∈ D(V ) by w Aw := (Aw 1 , A2 ),

−1 Aw ∈ V : j ∈ V, lim X(j, t) = m} for m = 1, 2. m := {jw t→∞

8

w −1 Then Aw : j ∈ V} a.s., and Aw converges in law as a random variable in D(V ) to a limiting 1 ∪ A2 = {jw random variable A = (A1 , A2 ). The sets A1 and A2 have disjoint interior and union V a.s., and each set Am for m = 1, 2 is almost surely the union of at most countably many closed intervals each of length strictly larger than 1.

When we prove the theorem in Section 3.2 we will describe the limiting random variable A in terms of the solution Y of the initial value problem mentioned above. We will not describe the law of this random variable further, but we remark that if I is defined to be the maximal interval satisfying either 0 ∈ I ⊂ A1 or 0 ∈ I ⊂ A2 , then the length of I decays at least exponentially; this holds by Lemma 3.8, and since the event considered in this lemma holds independently and with uniformly positive probability on each interval [10k, 10k + 5], k ∈ Z (see the proof of [BIKK12, Theorem 1] for a similar argument).

N(i) i

N(j) j

w=1

N(i) i

w=2 w=3

Figure 4: Left: initial configuration of opinions in the Schelling model on the torus for N = 2, M = 2, R = 9, and w = 1. Middle: one possible final configuration of opinions with initial data as on the left figure. Observe that all nodes agree with the most common opinion in their neighborhood. Right: final configuration of opinions in the Schelling model on the torus for N = 1, M = 2, R = 15, and w = 1, 2, 3.

Figure 5: Final configuration of opinions in the Schelling model on the torus for N = 2, M = 2, and a torus width of 4000 nodes. Left: w = 4, middle: w = 8, and right: w = 12. Simulations by Omer Tamuz. Other results in the paper of independent interest include Theorem 2.1, which establishes existence and uniqueness of the solution of the differential equation mentioned above, and Theorem 2.4, which says that for a certain family of Gaussian fields the supremum of its occupation kernel on Lipschitz functions is finite. Finally we remark that our methods can be adapted easily to certain other variants of the Schelling model. For example, we may consider a perturbed variant of the model, where each node acts against its own preference with probability p ∈ (0, 1) every time its Poisson clocks rings, e.g. it chooses some opinion uniformly at random from {1, . . . , M } instead of changing its opinion to the most common opinion in its neighborhood. In this perturbed model a variant of Theorem 1.1 still holds, but the continuum bias function would evolve slower than with the unperturbed dynamics. The results above are stated for the single-site-update variant of the Schelling model. In the version of the Schelling model studied in certain other papers, however, the nodes swap opinions rather than changing 9

opinions; equivalently, the nodes have a fixed opinion and they change locations in order to be surrounded by nodes of a similar opinion to themselves. In this formulation of the model we would consider a finite grid (e.g. the torus), and each time step could consist of choosing two nodes i, j uniformly at random and swapping their opinions if the opinion of node i (resp. j) equals the most common opinion in the neighborhood of node j (resp. i). Defining Y w using (3) and (4), the continuum approximation Y to Y w would evolve as described by a particular differential equation with random initial data, i.e., a variant of Theorem 1.1 still holds in this setting. This differential equation also describes certain variants closely related to Schelling’s original model, where some nodes are unoccupied and individuals may move to unoccupied sites if they are not satisfied. See the introduction of Section 2 for more details. Our results above also extend easily to other lattices than ZN and SN (assuming N(i) is still defined by (1.3) for each node i).

1.5

Notation

We will use the following notation: • If a and b are two quantities whose values depend on some parameters, we write a b (resp. a b) if there is a constant C independent of the parameters such that a ≤ Cb (resp. a ≥ Cb). We write a b if a b and a b. (We will sometimes abuse notation and use the same terminology when C depends on some parameters but not others, but this will be made clear in context.) • For any N ∈ N let λ denote the Lebesgue measure on RN or the torus S N . • For N ∈ N and a topological space V let C(V N ) denote the space of continuous real-valued functions on V N , equipped with the topology of uniform convergence on compact sets. • For N ∈ N and either V = S or V = R let L(V N ) denote the space of real-valued Lipschitz continuous functions on V N equipped with the topology of uniform convergence on compact sets. For K > 0 define LK (V N ) ⊂ L(V N ) by 0 |y(x) − y(x )| N −1 N N K N . ≤ K2 sup L (V ) = y ∈ L(V ) : kykL∞ ≤ K2 and ∀k ∈ {1, ..., N }, |xk − x0k | x, x0 ∈ V N , xk 6= x0k , 0 xj = xj ∀j 6= k

N • For M, N ∈ N and either V = S or V = R define CM (V N ) (resp. LM (V N ), LK M (V )) to be the M space of functions f = (f1 , ..., fM ) taking values in R , such that for each m ∈ {1, ..., M } we have fm ∈ C(V N ) (resp. fm ∈ L(V N ), fm ∈ LK (V N )).

• For any topological space V let B(V ) denote the Borel σ-algebra. See Sections 1.3, 1.4, and 2 for additional notation.

Acknowledgements We thank Omer Tamuz for introducing us to the Schelling model, for numerous helpful discussions, for comments on an earlier draft of this paper, and for allowing us to use his simulations in Figure 5. We thank Chris Burdzy for our discussion about his paper [BB01], and we thank Matan Harel for our discussions about Proposition 3.10. We thank Nicole Immorlica, Robert Kleinberg, Brendan Lucier, and Rad Niazadeh for discussions about this paper and some of their related work [BIKK12, IKLZ15]. The first author was supported by a fellowship from the Norwegian Research Council. The second author was supported by NSF grants DMS 1209044 and DMS 1712862.

10

2

The continuum Schelling model

In this section we will introduce a differential equation which describes the early phase of the Schelling dynamics when the window size w is large. We call this differential equation with appropriate initial the continuum Schelling model. The main result of this section is the existence and uniqueness of solutions of the differential equation (Theorem 2.1), along with a result on the occupation kernel of Gaussian fields (Theorem 2.4) and some properties of the continuum one-dimensional dynamics (Proposition 2.14). Let N ∈ N, M ∈ {2, 3, . . . }, and R ∈ {3, 4, . . . }, and let V = S or V = R. The solution of the differential equation we will define just below is a function Y = (Y1 , . . . , YM ), Ym : V N ×R+ → R, which is the continuum analog of the function Y w defined by (3) and (4). Define the plurality function p : RM 7→ {0, 1, ..., M } by ym0 , m if ym > 0 max m ∈{1,...,M }\{m} p(y) = if there is no m for which ym > 0 max ym0 . 0 m ∈{1,...,M }\{m}

Letting N ⊂ (−1, 1)N be as in Section 1.3 define the neighborhood of x ∈ V N by N (x) = {x0 ∈ V N : x0 − x ∈ N }, where we view x0 − x modulo addition of an element in RZN if V = S. Consider the following differential equation Z ∂Ym (x, t) = (1 − M −1 )1p(Y (x0 ,t))=m − M −1 1p(Y (x0 ,t))6=m dx0 m ∈ {1, . . . , M }, x ∈ V N , t ≥ 0. ∂t x0 ∈N (x) (5) We will prove in Proposition 3.1 that this differential equation approximates the early phase of the Schelling dynamics well for large window size. An (N, M )-random field is a random map from RN to RM . Let B be a multivariate Gaussian (N, M )random field (see Section 2.1) on the probability space (Ω, F, P) with mean and covariance functions given by mm (x) = 0 and M −1 0 for m = m0 , M 2 λ N (x) ∩ N (x ) Cm,m0 (x, x0 ) = (6) −1 0 for m 6= m0 . M 2 λ N (x) ∩ N (x ) We prove in Lemma 3.2 that B is well-defined as a continuous field. Although we will not need this formulation, we remark that one way to construct B involves √ starting with W = (W1 , W2 , . . . , WM ), where the Wi are i.i.d. instances of white noise (each rescaled by 1/ M ) on N -dimensional space, and then writing M X fi = Wi − 1 W Wi , M i=1

fi sum up to zero a.s. and each W fi describes (in a limiting sense) the “surplus” of individuals so that the W √ f / 2 over the set N (x). with opinion i. We can then let B(x) denote the integral of W Let the initial data of (5) be given by B Y (x, 0) = B(x),

∀x ∈ V N .

(7)

The following theorem will be proved in Section 2.3. Theorem 2.1 (Existence and uniqueness for (5), (7)). Let V = S and N ∈ N, or let V = R and N = 1. Let M ∈ {2, 3, . . . } and R ∈ {3, 4, . . . }, and let N ⊂ (−1, 1)N be as defined in Section 1.3. Then the initial value problem (5), (7) a.s. has a solution Y : V N × [0, ∞) → RM . This solution can be written as the sum of the function (x, t) 7→ B(x) and a function y : V N × [0, ∞) → RM satisfying the following properties. For any t > 0, we have y(·, t) ∈ LtM (V N ), so that in particular y(·, 0) = 0, and y is continuously differentiable in t. The solution Y just described is unique in the space of functions satisfying these properties. 11

Note that we have not proved that the initial value problem (5), (7) is well-defined for N ≥ 2 and V = R, but we believe that the above theorem also holds in this case. As we will discuss in Section 2.3 there exist initial data for which (5) does not have a unique solution (also when N = 1 and/or V = S), and solutions of (5) do not in general vary continuously with the initial data. N 1 N For any ω ∈ Ω and m ∈ {1, ..., M } define the random function φω m : CM (V ) → L (V ) by Z ω φm (y) : x 7→ (1−M −1 )1p(B(x0 )+y(x0 ))=m −M −1 1p(B(x0 )+y(x0 ))6=m dx0 , ∀x ∈ V N , y ∈ CM (V N ). (8) x0 ∈N (x)

ω Also define φω : CM (V N ) → L1M (V N ) by φω = (φω 1 , ..., φM ). Note that solving (5), (7) is equivalent to solving the following initial value problem

∂y (·, t) = φω (y(·, t)), t ≥ 0, ∂t y(x, 0) = 0, x ∈ V N.

(9)

We obtain a solution to (5), (7) by defining Y (x, t) = y(x, t) + B(x) for all t ≥ 0 and x ∈ V N . We also observe that (5) is equivalent to a single differential equation when M = 2. Let sign : R → {−1, 0, 1} denote the sign function, which is defined to be 0 at 0. Defining Yb = Y 1 − Y 2 , the initial value problem (5), (7) is equivalent to ∂ Yb (x, t) = ∂t

Z x0 ∈N (x)

sign Yb (x0 , t) dx0 ,

∀x ∈ V N , t ≥ 0,

(10)

b Yb (x, 0) = B(x), b is the centered Gaussian field with covariances Cov(B(x), b b 0 )) = λ(N (x) ∩ N (x0 )). where B B(x Finally, we will briefly state the analog of (5), (7) for the setting of the pair-swapping variant of the Schelling model. See the end of Section 1.4 for the definition of this model. In this variant of the model the initial data for the continuum approximation Y : S N × R+ → RM to Y w : S N × R+ → RM is still given by (7), while the differential equation describing the evolution of Y is given by Z ∂Ym (x, t) = (λ(S N ) − Λm (t))1p(Y (x0 ,t))=m − Λm (t)1p(Y (x0 ,t0 ))6=m dx0 , ∂t x0 ∈N (x) (11) N N m ∈ {1, . . . , M }, x ∈ S , t ≥ 0, Λm (t) := λ({x ∈ S : p(Y (x, t)) = m}). The main difference between (5) and (11) is that the rate at which Y changes in (11) depends on the overall fraction Λm (t) of points with the various biases. Also observe that the integral of Ym (·, t) is constant in time, which is consistent with the fact that the number of nodes with opinion m is constant. Theorems 1.1 and 2.1 (for the model on the torus) also hold for the pair-swapping Schelling model, and are proved exactly as before. Notice in particular that any solution of (11), (7) can be written on the same form as the N function Y in Theorem 2.1 (except that LtM (V N ) is replaced by LCt M (V ) for some constant C > 1), and Ct N since y(·, t) ∈ LM (V ) we can still apply Theorem 2.4 in this setting. The initial value problem (11), (7) also describes the situation where each node is unoccupied with constant probability in the initial configuration, and individuals (who have a fixed type or opinion) may move to an unoccupied node if this would make them satisfied.

2.1

Occupation measures of random fields: basic definitions

We now give a short introduction to the theory of occupation measures of random fields, which is used frequently in our study of the continuum Schelling model. We refer to [GH80] for further information. Let N, M ∈ N. An (N, M )-random field on a probability space (Ω, F, P) is a collection B of random variables with values in RM , which are indexed by an N -dimensional vector space Vb , i.e., B = {B(x) : x ∈ Vb } 12

or B = (B(x))x∈Vb . If M = 1 we say that the field is an N -random field. In the remainder of the paper we consider Vb = V N for V = R or V = S, where S is a one-dimensional torus. The field B is Gaussian if M = 1 and if, for every k ∈ N and x1 , ..., xk ∈ V N , the random variable (B(x1 ), ..., B(xk )) is multivariate Gaussian. We say that B = (B1 , ..., BM ) is a multivariate (N, M )-Gaussian PM field if, for every α ∈ RM , the weighted sum m=1 αm Bm is a real-valued Gaussian field. By e.g. [Adl10], a multivariate Gaussian field is uniquely determined by its mean m = (m1 , ..., mM ) and covariance matrix C = (Cm,m0 )m,m0 ∈{1,...,M } , which satisfy the following relations with t denoting the transpose of a matrix m(x) = E[B(x)],

C(x, x0 ) = E (B(x) − m(x))t (B(x0 ) − m(x0 )) ,

x, x0 ∈ Vb .

White noise on V N for V = S or V = R is a collection of random variables W = (W (X) : X ∈ B(V N )) such that (i) W (X) ∼ N (0, λ(X)) for any X ∈ B(V N ), (ii) W (X1 ∪X2 ) = W (X1 )+W (X2 ) if X1 , X2 ∈ B(V N ) and X1 ∩ X2 = ∅, and (iii) W (X1 ) and W (X2 ) are independent if X1 , X2 ∈ B(V N ) and X1 ∩ X2 = ∅. Note b = {B(x) b that for a fixed set X ∈ B(V N ) we can define an N -random field B : x ∈ V N } by defining 0 0 N b B(x) := W (x + X), where x + X = {x + x : x ∈ X} for any x ∈ V . We call a field that can be written on this form a moving average Gaussian field. The following definition is from [GH80, Section 21]. See [GH80, Theorem 6.3] for a proof that the occupation kernel α described below is well-defined when µX is a.s. absolutely continuous with respect to Lebesgue measure λ. Definition 2.2 (Occupation measure and occupation kernel). Let N, M ∈ N, let V = S or V = R, and consider an (N, M )-random field B = (B1 (x), ..., BM (x))x∈V N on the probability space (Ω, F, P). • The occupation measure µ = (µX )X∈B(V N ) is defined by µX (A) := λ(X ∩ B −1 (A)),

A ∈ B(RM ).

• If µX is a.s. absolutely continuous with respect to Lebesgue measure λ for all X ∈ B(V N ), let α(a, X) denote the Radon-Nikodym derivative of µX with respect to λ, i.e., Z −1 λ(X ∩ B (A)) = α(a, X) da, A ∈ B(RM ). (12) A

Let α be chosen such that α(·, X) is measurable for each fixed X ∈ B(V N ), and α(a, ·) is a σ-finite measure on (V N , B(V N )) for each a ∈ RM . We call α the occupation kernel of B. • For f : V N → RM let α(f, ·, ·) denote the occupation kernel (provided it exists) of the field B − f . • More generally, for k ∈ {1, ..., N − 1}, x0 = (x01 , . . . , x0k ) ∈ V k , and f : V N → RM , let α(x0 , f, ·, ·) denote the occupation kernel (provided it exists) of the (N − k, M )-random field B(x1 , . . . , xN −k , x01 , . . . , x0k ) − f (x1 , . . . , xN −k , x01 , . . . , x0k ) (x1 ,...,x . )∈V N −k N −k

2.2

Supremum of the occupation kernel on Lipschitz functions for moving average Gaussian fields

In [BB01, BB02] the authors prove that the supremum on Lipschitz curves of Brownian local time is finite. In this section we will prove a higher-dimensional analog of this result, stated in Theorem 2.4 below. We consider the supremum on Lipschitz functions of the occupation kernel of a particular centered moving average Gaussian field B. The idea of the proof is to define various (N + 1)-dimensional boxes on different scales, and bound the number of such boxes intersected by both the graph of B and a Lipschitz function f , uniformly over all 13

choices of f . On each scale we proceed by using that f is approximately constant, while B fluctuates rapidly. We also prove that if f and g are uniformly close then the occupation kernel of B on f and g, respectively, are close with high probability. We start the section by proving existence and basic properties of the occupation kernel of B on any fixed Lipschitz function. Theorem 2.4 will imply that for any solution Y of (5), (7) on S N and any m, m0 ∈ {1, ..., M }, m 6= m0 , the field Ym − Ym0 is close to 0 only for a small subset of S N simultaneously. This will help us to prove existence, uniqueness and other properties of solutions to (5), (7). Since Bm − Bm0 has the law of a constant multiple of Bm , it will be sufficient to obtain our result for the following real-valued field B. Remark 2.3. We will assume throughout the section that B = (B(x))x∈S N is the centered moving average Gaussian field with covariances given by C(x, x0 ) = λ N (x) ∩ N (x0 ) , x, x0 ∈ S N . (13) Observe that B is a moving average Gaussian field as defined in Section 2.1. The proof of the following theorem uses ideas from [BB01, BB02]. See Section 2.1 for the definition and basic properties of the occupation measure of random fields. Theorem 2.4. Let B = (B(x))x∈S N be the centered Gaussian field with covariances given by (13), and let K > 0. For each fixed f ∈ LK (S N ) the occupation kernel α(f, ·, ·) of B −f exists almost surely. Furthermore, almost surely there exists a random field α e = {e α(f ) : f ∈ LK (S N )} satisfying the following properties. (I) For each fixed f ∈ LK (S N ), we have α e(f ) = α(f, S N , 0) a.s. (II) Almost surely, f 7→ α e(f ) is continuous on LK (S N ) equipped with the supremum norm. (III) Almost surely, supf ∈LK (S N ) α e(f ) < ∞. First we will prove existence and various properties of the occupation kernel for a certain class of Gaussian random fields. e = B − f , where B is the centered Gaussian field on Lemma 2.5. Let K > 0 and f ∈ LK (S N ). Define B N S with covariances given by (13). Then the following holds a.s. e has an occupation kernel α(f, ·, ·). (I) B (II) For all k ∈ {1, ..., N − 1} and almost all x0 = (x01 , ..., x0k ) ∈ S k the (N − k)-field e 00 , ..., x00 , x0 , ..., x0 ) x00 7→ B(x 1 N −k 1 k for x00 = (x001 , ..., x00N −k ) ∈ S N −k , has an occupation kernel α(x0 , f, ·, ·). For almost all a ∈ R the following holds for all X 0 ∈ B(S k ) and X 00 ∈ B(S N −k ) Z 00 0 α(f, a, X × X ) = α(x0 , f, a, X 00 ) dx0 . (14) X0

e 0 ) − B(x)|) e Proof of Lemma 2.5. By (13), we have Var(|B(x λ N (x)∆N (x0 , where the implicit constant is independent of x, x0 ∈ S N and ∆ denotes symmetric difference. Since x, x0 ∈ S N , the difference x − x0 is only defined as an element in RN modulo RZN , but we will view x − x0 as an element in RN by choosing the equivalence class such that kx − x0 k1 is minimized (with some arbitrary choice of equivalence class in e 0 ) − B(x)|) e case of draws). By (2) it follows that Var(|B(x kx0 − xk1 , where the implicit constant is again 0 N independent of x, x ∈ S , but may depend on all other parameters. Since the probability density function of a standard normal random variable is bounded, for any x ∈ S N , Z Z −1/2 e 0 ) − B(x)| e lim inf −1 P[|B(x ≤ ] dx0 kx0 − xk1 dx0 < ∞. (15) →0

SN

SN

14

e is a random field for which the left side of (15) is finite, then By [GH80, Theorem 21.12] we know that if B e has an occupation kernel a.s. Applying this result concludes our proof of (I). B The same theorem also implies the existence of the occupation kernel α(x0 , f, ·, ·) for a.e. fixed x0 = 0 (x1 , ..., x0k ). The identity (14) follows by [GH80, Theorem 23.5], since for any k ∈ {1, ..., N − 1} and x = (x001 , . . . , x00N −k , x01 , . . . , x0k ) ∈ S N , x00 = (x001 , . . . , x00N −k ) ∈ S N −k , x0 = (x01 , . . . , x0k ) ∈ S k , Z e x1 , . . . , x e 00 , . . . , x00 ,x0 , . . . , x0 ))| ≤ ] de sup −1 P[|B((e eN −k , x01 , . . . , x0k )) − B((x x 1 N −k 1 k S N −k >0 Z −1/2 ke x − x00 k1 de x < ∞. S N −k

For K > 1 let LK,k (S N ) denote the space of functions f ∈ L(S N ) that satisfy the following properties: (i) for each x ∈ 2−k ZN , f (x) ∈ 2−k Z, (ii) for each x 6∈ 2−k ZN , f (x) is a weighted average of f at the ≤ 2N points x e ∈ 2−k ZN which satisfy kx − x ek∞ < 2−k , where the weights are defined as in (4), (iii) kf kL∞ (S N ) ≤ N −k 0 −k N K2 + 2 , and (iv) if x, x ∈ 2 Z satisfy kx − x0 k1 = 2−k then |f (x) − f (x0 )| ≤ K2N −k−1 + 22−k . Define [ LK,k (S N ). (16) LeK (S N ) = k∈N

The following lemma implies that LeK (S N ) is dense in LK (S N ) ∪ LeK (S N ) for the supremum norm. Lemma 2.6. For any f ∈ LK (S N ) and k ∈ N there is a canonically defined element f k ∈ LK,k (S N ) such that kf − f k kL∞ ≤ 2−k , where the implicit constant is independent of k and f , but may depend on K and k. Proof. For each x ∈ 2−k ZN let f k (x) be the multiple of 2−k which is closest to f (x). For x 6∈ 2−k ZN let f k (0) be defined such that condition (ii) in the definition of LK,k (S N ) is satisfied. It is immediate that the properties (i)-(iv) in the definition of LK,k (S N ) are satisfied. By (13) we may couple B with an instance of white noise W on S N such that for any x ∈ S, we have B(x) = W (N (x)). Define a filtration (Ft )t∈[0,R−2] by [ Ft = σ(W |B(Dt ) ), Dt := N ((s, x2 , . . . , xN )). (17) s∈[0,t], (x2 ,...,xN )∈S N −1

Let 0 ≤ s < t < R − 2 and (x2 , . . . , xN ) ∈ RN −1 . Conditioned on Fs the random variable B(t, x2 , . . . , xN ) is a Gaussian random variable with expectation W (N ((t, x2 , . . . , xN )) ∩ Ds ) and variance depending only on t − s. Therefore there exists a function p : (0, R − 2) × R × R such that p(t − s, W (N ((t, x2 , . . . , xN )) ∩ Ds ), ·) is the probability density function of B(t, x2 , . . . , xN ) conditioned on Fs . By (13), and with x = (t, 0, . . . , 0) and pe(s, a, y) := (2πs)−1/2 exp(−|y − a|2 /(2s)), p(t, a, y) = pe(λ(N (x) \ D0 ), a, y).

(18)

Lemma 2.7. Let p be given by (18). For t ∈ (0, R − 2) and a, y ∈ R, p(t, a, y) t−1/2 ,

(19)

where the implicit constant is independent of t, a, y, but may depend on N . Let δ ∈ (0, 1/2), a ∈ R, and assume f, g : [0, R − 2] → R satisfy kf − gkL∞ ([0,R−2]) < δ. Then Z

R−2

|p(t, a, f (t)) − p(t, a, g(t))| dt δ log(1/δ), 0

where the implicit constant may depend on N . 15

(20)

Proof. Since p is continuous in t by (18) and since p(t, a, y) 1 for t > 2 and any a, y ∈ R, in order to prove (19) it is sufficient to prove lim inf t→0 t1/2 p(t, a, y) 1. By the explicit formula for p this follows from (2). The estimate (20) follows by the exact same argument as for the case p = pe, which is considered in [BB01, Lemma 3.4], except that we use (19) instead of the corresponding estimate for pe. Lemma 2.8. Let d ∈ (0, 1 ∧ (R − 2)], and let f, g be two functions defined on some N -dimensional cube I ⊂ S N with side lengths d. Assume that kf − gkL∞ (I) ≤ δ for some δ > 0. Then for all b ≥ 1 and with α(f, I) = α(f, 0, I) and α(g, I) = α(g, 0, I) as in Definition 2.2, h i log P α(f, I) − α(g, I) ≥ bd(N −3/4) (δ log(δ −1 ))1/2 −b, where the implicit constant is independent of δ, f and g. Proof. Assume without loss of generality that I = [0, d]N . Couple B with an instance of white noise W as described above the statement of Lemma 2.7, and recall the filtration (Ft )t and the sets Dt ⊂ RN defined by (17). For any t ∈ [0, d] define It = [0, t] × [0, d]N −1 ⊂ S N , and let Aft = α(f, It ), Agt = α(g, It ), et = Aft − Agt . Then (Aft )t∈[0,d] , (Agt )t∈[0,d] , and (A et )t∈[0,d] are stochastic processes adapted to the and A filtration (Ft )t∈[0,d] . To simplify notation we write f (t, x0 ) instead of f (t, x01 , . . . , x0N −1 ) for t ∈ R and x0 = (x01 , . . . , x0N −1 ) ∈ RN −1 . Let t1 , t2 ∈ [0, d] satisfy t1 < t2 . By Lemma 2.5 (II) and (19) of Lemma 2.7, "Z # h i E Aft2 − Aft1 | Ft1 = E α(x0 , f (·, x0 ), [t1 , t2 ]) dx0 Ft1 [0,d]N −1

Z

Z

t2 −t1

= d

[0,d]N −1 0 Z t2 −t1 N −1

p s, N ((t2 , x01 , . . . , x0N −1 )) ∩ Dt1 , f (t1 + s, x0 ) ds dx0

s−1/2 ds

0

dN −1/2 . By a similar argument E[Agt2 − Agt1 | Ft1 ] dN −1/2 . By (20) we get further "Z # et − A et | Ft ]| = E |E[A α x0 , fe(·, x0 ), [t1 , t2 ] − α x0 , ge(·, x0 ), [t1 , t2 ] dx0 Ft1 2 1 1 N −1 [0,d] Z Z t2 −t1 ≤ p s, N ((t2 , x01 , . . . , x0N −1 )) ∩ Dt1 , f (t1 + s, x0 ) [0,d]N −1

0

− p s, N ((t2 , x01 , . . . , x0N −1 )) ∩ Dt1 , g(t1 + s, x0 ) ds dx0 dN −1 δ log δ. The lemma now follows by [BB01, Lemma 2.1]. e := dL1/2 e. For m ∈ {1, ..., L} e define the (N + 1)-dimensional Lemma 2.9. Let L > 1, a ∈ R, and define L m rectangle J by e −1 ]. J m = [(m − 1)L−1 , mL−1 ] × [0, L−1 ]N −1 × [a, a + L e which intersect the graph {(x, B(x)) ∈ S N × R : Let A be the number of rectangles J m for m ∈ {1, ..., L} N x ∈ S } of B. Then the following estimate holds for all ξ > 1 and ∈ (0, 1] e 1/2+ ] L−ξ , P[A ≥ L where the implicit constant is independent of a and L, but depends on and ξ.

16

Proof. Let Em be the event that the graph of B intersects J m , let Jem = [(m − 1)L−1 , mL−1 ] × [0, L−1 ]N −1 , 0 0 e −1+/10 }. Define the let xm = mL−1 , 0, ..., 0 ∈ JemPand define the event Em by Em = {|B(xm ) − a| < L m m 0 . By (19), for m2 > m1 , random variable A by A = m0 ≤m 1Em 0 0 e −1+/10 (m2 − m1 )−1/2 L e /10 . | Fm1 L−1 ] ((m2 − m1 ) · L−1 )−1/2 L P[Em 2

e Therefore, for any m ∈ {1, . . . , L}, E[AL − Am | FmL−1 ] e

e L X

e /10 L e 1/2+/10 . d−1/2 L

d=1

e 1/2+/10 } By applying [Bas95, Corollary I.6.12] to a constant multiple of the sequence {Am /L e , we get 1≤m≤L e e 1/2+ ) −L e /10 . log P(AL > L

For any x, x0 ∈ S and b > 1 we have E[|B(x) − B(x0 )|b ] kx − x0 kb/2 for an implicit constant depending on b. By a quantitative version of the Kolmogorov-Chentsov theorem as in e.g. [MS16, Proposition 2.3], the function B is γ-H¨ older continuous with (random) constant C(γ) for any γ < 1/2, and P[C(γ) > C] decays faster than any power of C. In particular, P[C(1/2 − /100) > L/100 ] L−ξ for any ξ. Observe that if 0 C(1/2 − /100) ≤ L/100 and Em occurs, and if L is sufficiently large, then Em also occurs since for some m x ∈ Je , e −1 < L e −1+/10 . |B(xm ) − a| ≤ |B(xm ) − B(x)| + |B(x) − a| ≤ N L/100 (L−1 )1/2−/100 + L Therefore 0 Em ⊂ Em ∪ {C(1/2 − /100) > L/100 },

so e 1/2+ } ∪ {C(1/2 − /100) > L/100 }, e 1/2+ } ⊂ {ALe ≥ L {A ≥ L and the lemma follows by a union bound. Lemma 2.10. For any k ∈ N divide S N into N -dimensional cubes Ij , j = 1, ..., (R2k )N , of side length 2−k , 0 such that the cubes have pairwise disjoint interior. Also divide S N × R into (N + 1)-dimensional cubes Iij −k k N of side length 2 for j = 1, ..., (R2 ) and i ∈ Z, such that the cubes have pairwise disjoint interior. Let ∈ (0, 1/100) and define the three events G1k , G2k , G3k as follows 0 intersecting the graph of both g and • G1k is the event that for any g ∈ LK (S N ) the number of cubes Iij k(N −1/2+) B, is bounded by 2 for any g.

• G2k is the event that for any f ∈ LK (S N ) and j ∈ {1, ..., 2k }, the approximations f k and f k+1 to f defined in Lemma 2.6 satisfy |α(f k , Ij ) − α(f k+1 , Ij )| ≤ 2k(−N +1/4+) . • G3k is the event that for any j ∈ {1, ..., 2k } and any two f, g ∈ LK (S N ) satisfying kf − gkL∞ ≤ 2−k , we have |α(f k , Ij ) − α(g k , Ij )| ≤ 2k(−N +1/4+) . Finally define Gk by Gk :=

\ k0 ≥k

G1k0 ∩

\ k0 ≥k

Then P[Gk ] → 1 as k → ∞. 17

G2k0 ∩

\ k0 ≥k

G3k0 .

Proof. It is sufficient to prove that for b = 1, 2, 3 it holds that limk→∞ P ∩k0 ≥k Gbk0 = 1. We consider the three cases b = 1, 2, 3 separately. All implicit constants may depend on R and K, but not on k. 0 Case b = 1: For any L ∈ N divide S N × R into (N + 1)-dimensional cubes Iij (L) for i ∈ Z and N j ∈ {1, ..., (LR) }, such that the interior of the cubes are disjoint, and each cube has side length L−1 . 0 0 0 e := dL1/2 e ∈ N. Divide S N ×R Observe that Iij = Iij (2k ) with Iij as in the statement of the lemma. Define L e such their interiors into (N + 1)-dimensional rectangles Jij = Jij (L) for i ∈ Z and j ∈ {1, ..., RN LN −1 L}, −1 e e −1 ]. are pairwise disjoint, and such that each rectangle is a translation of Ji1 := [0, L ] × [0, L−1 ]N −1 × [0, L 0 −1 N +1 0 Assume I1,1 (L) = [0, L ] , and that the projection of Iij (L) (resp. Jij ) onto the last coordinate is given e −1 [i − 1, i]). by L−1 [i − 1, i] (resp. L e m L Consider one of the rectangles Jij . Find a cover {Jij }m=1 of Jij of (N + 1)-dimensional rectangles whose e −1 ] (note that unless L e 2 = L, interiors are disjoint, and each of which is a translation of [0, L−1 ]N × [0, L m have a non-empty intersection with the complement of Jij ). Let a small fraction of the rectangles Jij Aij = Aij (L) denote the number of such rectangles that contain a point of the graph of B. By Lemma 2.9 the following holds for all i, j and any ξ > 1 e 1/2+ ] L e −ξ , P[Aij ≥ L

(21)

where the implicit constant depends on and ξ. We will now prove that on the event \ eL := e 1/2+ } E {Aij < L e 1≤j≤RN LN −1 L,−KL≤i

18

For each fixed x ∈ Iij ∩ (2k ZN ), f k (x) can take 2k different values. Conditioned on f k (x) the number of possible realizations of f k |Ij and f k+1 |Ij is bounded by a constant. It follows that there are 2k possibilities for f k |Ij and f k+1 |Ij . Since the number of cubes Ij is 2kN , a union bound implies that log(1 − P[G2k ]) −2k/2 . We conclude by a union bound. Case b = 3: We proceed exactly as in the case b = 2. The result follows by a union bound, Lemma 2.8, and by observing that the number of possible realizations of f k |Ij and g k |Ij for each fixed j is 2k . Proof of Theorem 2.4. We start by proving uniform continuity of g 7→ α(g) on LeK (S N ), where LeK (S N ) is defined by (16). Let η, β > 0, and the choose e k ∈ N sufficiently large such that the event Gek of Lemma 2.10 holds with probability at least 1 − η. Condition on the event Gek . Consider any f, g ∈ LeK (S N ) such e that kf − gkL∞ ≤ 2−k , and let f k , g k ∈ LK,k (S N ) denote the approximations to f, g, respectively, defined in Lemma 2.6. By the definition of the events G1k and G2k , |α(f k+1 ) − α(f k )| 2k(N −1/2+) · 2k(−N +1/4+) = 2k(−1/4+2) for all k ≥ e k and a universal implicit constant. Since f = f k for sufficiently large k the triangle inequality e implies that, after we increase e k if necessary, we have |α(f ) − α(f k )| ≤ β/3. By the same argument e |α(g) − α(g k )| ≤ β/3. By the definition of G1k and G3k , |α(f k ) − α(g k )| 2k(N −1/2+) · 2k(−N +1/4+) = 2k(−1/4+2) . e

e

e

e

e

Increasing e k if necessary, it follows by the triangle inequality that |α(f ) − α(g)| ≤ β with probability at least e N ) satisfying kf − gkL∞ ≤ 2ek . Note that e 1 − η for all f, g ∈ L(S k is a function of β and η, i.e., e k=e k(β, η). Fix some η > 0 and a sequence (βm )m∈N converging to 0. For any m ∈ N and any f, g ∈ LeK (S N ) −m e satisfying kf − gkL∞ ≤ 2−k(βm ,η2 ) we have |α(f ) − α(g)| ≤ βm with probability at least 1 − η2−m . By a union bound it holds with probability at least 1 − η that |α(f ) − α(g)| ≤ βm for all m ∈ N and all −m e f, g ∈ LeK (S N ) satisfying kf − gkL∞ ≤ 2−k(βm ,η2 ) . Since the choice of η was arbitrary this implies uniform continuity of g 7→ α(g). Define α e to be the restriction to LK (S N ) of the continuous extension of α from LeK (S N ) to LK (S N ) ∪ K N e L (S ). Part (II) of the theorem follows by the definition of α e and uniform continuity of α on LeK (S N ). K Part (III) of the theorem follows by using part (II) and that L (S N ) is compact for the supremum norm. Finally we will prove part (I). Let f ∈ LK (S N ), and for each k ∈ N let f k be as in Lemma 2.6. By Lemma 2.8 and the Borel-Cantelli lemma we can find an increasing sequence {kl }l∈N , kl ∈ N, such that α(f kl ) → α(f ) a.s. as l → ∞. By part (II) we have α(f k ) → α e(f ) a.s. as k → ∞. Part (I) now follows by the triangle inequality.

2.3

Existence and uniqueness of solutions of the continuum Schelling model

In this section we will prove Theorem 2.1, i.e., we will prove existence and uniqueness of solutions of the initial value problem (5), (7). First we will see that the theorem does not hold for all choices of initial data, i.e., there exist initial data for which (5) does not have a unique solution. Furthermore, solutions of (5) do not in general vary continuously with the initial data. We will illustrate these properties of (5) by considering the model for N = 1, M = 2, N = N∞ , and V = S. Let Yb be as in (10). The initial data Yb (·, 0) ≡ and Yb (·, 0) ≡ − for > 0, give solutions Yb (x, t) = + 2t and Yb (x, t) = − − 2t, respectively, so the solution does not vary continuously with the initial data. As an example of initial data for which Theorem 2.1 does not hold, assume n := 3R/4 ∈ N, and define periodic initial data by −1 + 3(x − 4k/3) for x ∈ 23 [2k, 2k + 1), k ∈ {0, ..., n − 1}, b Y (x, 0) = 1 − 3(x − 4k/3 − 2/3) for x ∈ 23 [2k + 1, 2k + 2), k ∈ {0, ..., n − 1}. 19

Then we have for all t ∈ [0, 3/2), ∂ Yb 2/3 − 2(x − 4k/3) (x, t) = −2/3 + 2(x − 4k/3 − 2/3) ∂t

for x ∈ 23 [2k, 2k + 1), k ∈ {0, ..., n − 1}, for x ∈ 23 [2k + 1, 2k + 2), k ∈ {0, ..., n − 1}.

For t = 3/2 we have Yb (·, t) ≡ 0. If we only allow for solutions satisfying (5) for all t ≥ 0, we have no solutions since (5) is not satisfied at t = 3/2. If we allow the time derivative not to exist for the single time t = 3/2, solutions are not unique, e.g. Yb (t, x) = 2(t − 3/2) and Yb (t, x) = −2(t − 3/2) are both solutions for t ≥ 3/2. We do not encounter these problems for the Gaussian initial data (7). The problems in the examples above arise since Yb (x, t) is close to 0 for many x ∈ S N simultaneously, and Theorem 2.4 implies that this is not the case for the Gaussian initial data. N Lemma 2.11. For any ω ∈ Ω let φω be defined by (8). For any K > 0 the map φω |LK : LK N M (S ) → M (S ) L1M (S N ) is a.s. continuous for the supremum norm.

Proof. By the definition of φω , for any > 0 and y ∈ LK (S N ) it holds a.s. that kφω (y) − φω (e y )kL∞

sup y e∈L(S N ), ke y −ykL∞ <

≤

Z

sup

X

y e∈L(S N ), ke y −ykL∞ <

1≤m,m0 ≤M, m6=m0

≤

X 1≤m,m0 ≤M, m6=m0

SN

1Bm (x)+ym (x)≥Bm0 (x)+ym0 (x);Bm (x)+eym (x)≤Bm0 (x)+eym0 (x) dx (22)

Z SN

1|(Bm0 (x)−Bm (x))−(ym (x)−ym0 (x))|≤2 dx.

We want to show that a.s., for all y ∈ LK (S N ) the right side of (22) converges to 0 as → 0. Fix e m, m0 ∈ {1, ..., M }, m 6= m0 . The random field B(x) := Bm0 (x) − Bm (x) has the law of a constant multiple of Bm . Also note that ym − ym0 ∈ L2K (S N ). Assume = 2−k for k ∈ N. On the event G1k of Lemma 2.10 e instead of B, and = 1/1000), (with 2K instead of K, B Z 1|(Bm0 (x)−Bm (x))−(ym (x)−ym0 (x))|≤2 dx 2k(N −1/2+1/1000) × 2−kN = 2−k/2+k/1000 SN

for all y ∈ LK (S N ). The lemma now follows by Lemma 2.10. N K )→ Proposition 2.12. For any ω ∈ Ω let φω be defined by (8). For any K > 0 the map φω |LK N : LM (S M (S ) 1 N LM (S ) is a.s. Lipschitz continuous for the supremum norm.

Proof. Recall the set LeK (S N ) defined by (16). By Lemmas 2.6 and 2.11 it is sufficient to prove a.s. Lipschitz continuity on LeK (S N ). By (22) it is sufficient to prove that for any m, m0 ∈ {1, ..., M }, m 6= m0 , and e = Bm − Bm0 , B Z sup e2K (S N ) f ∈L

SN

1|B(x)−f e (x)|≤ dx ,

(23)

e The occupation times formula (12) where the implicit constant is independent of , but depends on B. e − f, implies that a.s. for fixed f and with α(f, ·, ·) denoting the occupation kernel of B Z Z 1|B(x)−f α(f, a, S N ) da. e (x)|≤ dx = SN

−

For fixed a ∈ R, by the definition of α it holds a.s. that α(f, a, S N ) = α(f + a, 0, S N ). By Theorem 2.4 (I) and with α e as in this theorem, we have α(f + a, 0, S N ) = α e(f + a) a.s. Therefore, Z Z 1|B(x)−f α e(f + a) da ≤ 2 sup α e(f ), e (x)|≤ dx = SN

f ∈LK+ (S N )

−

20

which completes the proof of the lemma upon an application Theorem 2.4 (III). We will deduce Theorem 2.1 from the following Banach space version of the theorem known as the PicardLindel¨ of theorem in the theory of differential equations. The theorem is proved in the same manner as the Picard-Lindel¨ of theorem, i.e., by defining a contraction mapping from the integral version of (9), showing that the Picard iterates converge to a solution, and deducing uniqueness from the contraction property, see e.g. [AMR88, Lemma 4.1.6]. The integral in (iii) is the Bochner integral. b k · k) be a Banach space, Lb ⊂ C, b y0 ∈ L, b t0 ∈ R, ∆t > 0, and I = [t0 − ∆t, t0 + ∆t]. Theorem 2.13. Let (C, b b Let φ : C → C be a map satisfying the following properties: b (i) φ is uniformly Lipschitz continuous on the closure of L, (ii) supy∈Lb kφ(y(s))k < ∞, and (iii) y0 +

Rt t0

b φ(y) ds ∈ Lb for any t ∈ I and any continuous curve (y(s))t0 ≤s≤t with values in L.

Then there is a unique curve (y(t))t∈I , such that y(t0 ) = y0 , Proof. The conditions

∂y ∂t

∂y ∂t (t)

= φ(y(t)), and y(t) ∈ Lb for all t ∈ I.

= φ(y(t)) and y(t0 ) = y0 are equivalent to the following Z

t

y(t) = y0 +

φ(y(s)) ds.

(24)

t0

Define y0 (t) = y0 for all t ∈ I, and for n ∈ N and t ∈ I define yn (t) by induction: Z

t

yn+1 (t) = y0 +

φ(yn (s)) ds.

(25)

t0

By assumption (iii) we have yn (t) ∈ Lb for all n ∈ N and t ∈ I. Let C1 ≥ 0 be the Lipschitz constant of φ on b and let C2 ∈ [0, ∞) be defined by C2 := sup b kφ(y)k. Then ky1 (t) − y0 k ≤ C2 |t − t0 | for all t ∈ I by L, y∈L (25). By assumption (i) and induction on n, we get further t0 ∨t

Z

kφ(yn (s)) − φ(yn−1 (s))k ds ≤ C2 C1n |t − t0 |n+1 /(n + 1)!.

kyn+1 (t) − yn (t)k ≤ t0 ∧t

Since Cb is complete it follows that (yn (t))t∈I converges uniformly to a curve (y(t))t∈I , such that y(t) ∈ Cb for any t ∈ I. Further, by sending n → ∞ in (25) it follows by continuity of φ on the closure of Lb and by the dominated convergence theorem for the Bochner integral that y satisfies the integral equation (24). We have y(t) ∈ Lb for all t ∈ I by assumption (iii). This concludes the proof of existence of solutions. To obtain uniqueness of solutions let (e y (s))s∈I for ye(s) ∈ Lb be another solution. By (24) we have ky0 (t) − ye(t)k ≤ C2 |t − t0 | for all t ∈ I. By assumption (i) and induction we get further that for any n ∈ N, Z kyn (t) − ye(t)k ≤

t0 ∨t

kφ(yn (s)) − φ(e y (s))k ds ≤ C2 C1n |t − t0 |n+1 /(n + 1)!.

t0 ∧t

Letting n → ∞ it follows that y = ye. N Theorem 2.1 follows from Theorem 2.13 applied with Cb = CM (S N ), Lb = LK M (S ) for some K > 0, and ω φ = φ as defined by (8).

21

Proof of Theorem 2.1. First consider the case when V = S. Assume the assertion of the theorem is not true, and let e t ∈ [0, ∞) be the supremum of times t ≥ 0 for which (9) has a unique solution in [0, t]. Choose an arbitrary ∆t > 0, let K > e t + ∆t, and define t0 = e t − (∆t)/2. We will prove that the assumptions of Theorem N N b 2.13 are a.s. satisfied with φ = φω , y0 = y(t0 ), Lb = LK M (S ), C = CM (S ) and I = [t0 − ∆t, t0 + ∆t]. We b Condition (i) holds by equip Cb with the norm kf k = max1≤m≤M kfm kL∞ (S N ) for f = (f1 , . . . , fM ) ∈ C. ω N K Proposition 2.12. Condition (ii) is satisfied since kφ (y)k ≤ 2 for any y ∈ LM (S N ) (in fact, this bound holds even for y ∈ CM (S N )). Finally observe that (iii) holds since for any t ∈ I,

Z t

ω N N

y(t0 ) + φ (y(·, s)) ds

≤ 2 t0 + ∆t2 , t0

Rt and since the function y(t0 ) + t0 φω (y(·, s)) ds is Lipschitz continuous with Lipschitz constant at most 2N −1 e t + 2N −1 ∆t in each coordinate. Theorem 2.13 now implies that (9) has a unique solution on I, which is a contradiction. This completes the proof of the theorem for the case V = S. Now consider the case N = 1 and V = R. First we prove uniqueness. It is sufficient to show that given any > 0 solutions are unique on [−−1 , −1 ] × R+ with probability at least 1 − . Let R ∈ {3, 4, . . . } be sufficiently large such that with probability at least 1 − there are real numbers xi (which are random and measurable with respect to σ(B)) for i = 1, 2, 3, 4 such that −R/2 + 1 < x1 < x2 − 1 < −−1 − 1 < −1 + 1 < x3 + 1 < x4 < R/2 − 1, and such that x 7→ p(B(x)) is constant on the intervals [x1 , x2 ] and [x3 , x4 ]. Consider e e the Schelling model on the torus S of width R, and let (B(x)) x∈[−R/2,R/2] be the initial values. Couple B and e [−R/2+1,R/2−1] = B|[−R/2+1,R/2−1] a.s., and observe that if Y : R × R+ solves the Schelling B such that B| model (5), (7) on R, then Y |[x2 ,x3 ] is a solution to the Schelling model on S restricted to [x2 , x3 ]. Here we use that if x 7→ p(B(x)) is constant on an interval of length > 1 then Y (resp. Yb ) evolves independently to the left and to the right of this interval. By uniqueness of solutions to the Schelling model on S, we obtain uniqueness of solutions to the Schelling model on R. e and xi ∈ R for i = 1, 2, 3, 4 be as Existence follows by a similar argument. Let > 0, R ∈ {3, 4, . . . }, B, in the previous paragraph. It is sufficient to prove existence of Y restricted to [x2 , x3 ], since the real line a.s. can be divided into countably many disjoint intervals, such that each interval either (i) has length > 1 and is such that x 7→ p(B(x)) is constant on the interval, or (ii) is between two intervals of type (i). If we find a solution on each interval of type (ii) we can get a global solution by concatenating the solution from the different intervals, since p(Y (x, t)) is constant for all t ≥ 0 and all x in an interval of type (i). By existence of solutions to the Schelling model on the torus, we define Y |[x2 ,x3 ] to be equal to the solution of the Schelling model on the torus restricted to [x2 , x3 ], which concludes the proof.

2.4

Long-time behavior of the one-dimensional continuum Schelling model

The main result in this section is the following proposition. Proposition 2.14. Let V = R or let V = S. Let Y be the solution of the initial value problem (5), (7) on V with M = 2, N = 1 and N = N∞ , and for m = 1, 2 define n o Am := x ∈ R : lim p(Y (x, t)) = m . t→∞

Then R = A1 ∪ A2 a.s., the boundary ∂A1 is a.s. equal to the boundary ∂A2 , and this boundary a.s. consists of a countable collection of points such that the distance between any two of these points is strictly greater than one 1. The proposition implies that the complement of ∂A1 = ∂A2 is a sequence of open intervals, each of length greater than 1, which alternately belong to A1 and A2 . We make no statement about whether the limit does or does not exist at the boundary points themselves.

22

Remark 2.15. The reason the proposition is only stated for M = 2 is that a particular form of monotonicity of (5) holds only for M = 2. More precisely, if M = 2, Y solves (5), (7), and Ye solves (5) with initial data Ye 1 (·, 0) = B + f and Ye 2 (·, 0) = B − f for a strictly positive function f (with f chosen such that Ye is well-defined), then (Ye 1 − Ye 2 ) − (Y 1 − Y 2 ) is a strictly positive function which is increasing in t. However, we do believe that the proposition also holds for M > 2, and if we had established the proposition for all M , then Theorem 1.2 would hold for all M ∈ {2, 3, . . . }. We briefly outline the proof of Proposition 2.14 before we proceed, and we begin with some notation. We say that an opinion m dominates in an interval J in the limit as t → ∞ if the fraction of (x, t) ∈ J × [0, T ] for which p(Y (x, t)) = m converges to 1 as T → ∞ (equivalently, r(m, J, T ) → 1 with the notation introduced below). Intuitively, this means that individuals in the interval J are (regardless of how they started out) increasingly tending to switch their opinions to m in the large T limit. The first part of the argument is to show that if a certain opinion m dominates in an interval J of length ≥ 1 in the limit as t → ∞, and J is not contained in a larger interval satisfying this property, then the interval of length 1 immediately to the right (or left) of J is dominated by some other opinion m0 as t → ∞. This result is stated in Lemma 2.20 (which is in turn immediate from Lemmas 2.18 and 2.19 below). As explained right after Lemma 2.20, from this lemma we can deduce a weak variant of Proposition 2.14 which holds for all M . We conclude the proof of Proposition 2.14 by a perturbative approach. We show that if the result of Proposition 2.14 does not hold, then it will hold for a slight perturbation of the initial data which favors one opinion more. We deduce from this that the set of initial data on which the proposition does not hold is exceptional. If we increase the initial bias towards opinion (say) 1 uniformly by , then we can find some e such that for any x the measure of the set {x0 ∈ N (x) : p(Y (x0 , t)) = 1} positive (random) number `(t) e e > 0, increases by at least `(t). By a detailed analysis of the differential equation we can show that inf t≥0 `(t) which we use to show that there exists at least one interval of length > 1 on which the bias converges, and further (using Lemma 2.20) that this property must hold everywhere. The following lemma is immediate from (5), and will be used throughout the proof of the proposition. Lemma 2.16. Let V = R or let V = S. Let Y be a solution of the initial value problem (5), (7) on V with N = 1 and N = N∞ . Let t ≥ 0, let m ∈ {1, ..., M }, and let J ⊂ V be an interval of length ≥ 1 such that p(Y (x, t)) = m for all x ∈ J. Then p(Y (x, t0 )) = m for all t0 ≥ t and x ∈ J. Remark 2.17. For the discrete Schelling model on Z the final configuration of opinions will always consist of intervals of length at least w + 1 in which all nodes have the same limiting opinion. This can be seen by the following argument. If there is an interval of length w + 1 where all nodes have the same opinion m, then no nodes in this interval will ever change their opinion. Furthermore, if i is the first node to the right of this interval for which m is not the limiting opinion, then nodes i, i + 1, . . . , i + w must all have the same limiting opinion; otherwise node i would not be satisfied in the final configuration. Since we consider the model on Z, there will always be some interval of length w + 1 where all nodes have the same opinion in the initial configuration. By induction on the nodes to the left and right, respectively, of this interval, it follows that all nodes are contained in an interval of length at least w + 1 in which all nodes have the same limiting opinion. The two lemmas we will prove next (which imply Lemma 2.20 when combined) are continuum analogs of this result. Lemma 2.18 says that if some opinion m dominates in an interval I = (a − 1, a) of length exactly 1, but p ◦ Y does not converge pointwise to m in I, then there is some opinion m0 6= m which dominates in the interval Ie = (a, a + 1). We give a brief outline of the proof in the simplified setting where M = 2 and m = 1. Observe that for x ∈ I and t 1, we see from (10) that Yb (x, t) is approximately equal to t r(1, I, t)−r(2, N (x)\I, t) . If r(1, I, t) is very close to 1, but Yb (x, t) < 0, then we must have r(2, N (x)\I, t) very close to 1. We can use this to argue existence of x0 satisfying 0 < x0 − a 1 such that r(2, x0 , t) is close e we see that Yb (x0 , t) is approximately equal to 1. By (10) and since N (x0 ) is approximately equal to I ∪ I, e t) . We can deduce from this that r(2, I, e t) is close to 1, so opinion 2 dominates in the to t r(1, I, t) − r(2, I, e interval I. 23

I

Ie a x0t a + 1/5

a−1

a+1

Figure 6: Illustration of objects in the statement and proof of Lemma 2.18. We assume that opinion m dominates in the interval I, in the sense that r(m, I, t) → 1 as t → ∞. We also assume that there exists an x ∈ I for which p(Y (x, t)) does not converge to m. Using the latter assumption, we prove that if we pick some sufficiently small > 0 then for every sufficiently large t (where in particular √ enough so √ t is large that r(m, I, t) > 1 − ) there exists an xt in the slightly translated interval (a − 1 + , a + ) such that p(Y (xt , t)) 6= m. Using this, we will then prove existence of m0 ∈ {1, . . . , M } \ {m} and x0t ∈ (a, a + 1/5 ) for all sufficiently large t > 0, such that r(m0 , x0t , t) > 1 − 1/5 . Then we use the existence of x0t to prove that e t) = 1. Note that by symmetry, we could have made the analogous argument with Ie on the limt→∞ r(m0 , I, left side of I, instead of the right side. Lemma 2.18. Let Y be a solution of (5) with continuous initial data (chosen such that we have existence of solutions of (5)), N = 1, N = N∞ , and either V = S or V = R. For any m ∈ {1, ..., M }, an interval J ⊂ R, and t ≥ 0, define Z tZ 1 1p(Y (x,t0 ))=m dx dt0 . r(m, J, t) := |J|t 0 J For a ∈ R define I := (a − 1, a) and Ie := (a, a + 1). Assume there exists an m ∈ {1, ..., M } such that limt→∞ r(m, I, t) = 1, and that there exists x ∈ I for which the limit limt→∞ p(Y (x, t)) either does not exist or e t) = takes a value different from m. Then there exists an m0 ∈ {1, ..., M }, m0 6= m, such that limt→∞ r(m0 , I, 1. Proof. Define K := supx∈I∪Ie kY (x, 0)k1 . Let ∈ (0, 1/10), and define d := 1/2 and d0 := 1/5 . For each t ≥ 0 we define xt ∈ R by xt := inf{x ≥ a − 1 + d : p(Y (x, t)) 6= m}, and let m0t ∈ {1, ..., M }\{m} be such that Ym (xt , t) ≤ Ym0t (xt , t). Note that we can find such an m0t since Y is continuous. For m ∈ {1, ..., M }, x ∈ R and t ≥ 0 define 1 r(m, x, t) := t

t

Z

1p(Y (x,t0 ))=m dt0 .

(26)

0

We abuse notation slightly by letting r denote both this function and the function in the statement of the lemma. We will prove that for all sufficiently large t ≥ 0 there exists x0t ∈ (a, a+d0 ) satisfying r(m0t , x0t , t) > 1−d0 . The following relation, which follows directly from (5) and holds for any m1 , m2 ∈ {1, ..., M }, will be used multiple times throughout the proof of this result 1 1 Ym1 (xt , t) − Ym2 (xt , t) − Ym1 (xt , 0) − Ym2 (xt , 0) = r(m1 , N (xt ), t) − r(m2 , N (xt ), t). t t

(27)

We consider the following three cases separately: (I) a − 1 + d ≤ xt < a − 1 + d0 , (II) a − 1 + d0 ≤ xt < a, (III) a ≤ xt ≤ a + d . One of the cases (I)-(III) must occur by the following argument, i.e., we cannot have xt > a + d . If xt > a + d , we would have an interval of length > 1, such that p(Y (x, t)) = m for all x in the interval. Therefore, by Lemma 2.16 we would have p(Y (x, t0 )) = m for all t0 ≥ t and all x in the interval. Since limt→∞ r(m, I, t) = 1, and by the differential equation (5) and Lemma 2.16, it would follow that limt→∞ p(Y (x, t)) = m for all x ∈ I, which is a contradiction to the assumptions of the lemma.

24

First consider cases (I) and (II) defined above. Define IL := (xt − 1, a − 1), IR := (a, xt + 1) and dt := xt − (a − 1) ≥ d . Note that N (xt ) = IL ∪ I ∪ IR , |IR | = dt and |IL | = 1 − dt . By (27), for all sufficiently large t (chosen such that r(m, I, t) > 1 − , which implies r(m0t , I, t) < ) |IR |r(m0t , IR , t) =

1 1 Ym0t (xt , t) − Ym (xt , t) − Ym0t (xt , 0) − Ym (xt , 0) − r(m0t , I, t) + r(m, I, t) t t − |IL |r(m0t , IL , t) + |IL |r(m, IL , t) + |IR |r(m, IR , t)

> 0 − 2K/t − + (1 − ) − (1 − dt ) + 0 + 0 = − 2K/t − 2 + dt . If there is no appropriate x0t and case (I) occurs, dt (1 − d0 ) ≥ dt

sup

r(m0t , x, t) ≥ |IR |r(m0t , IR , t) > −2K/t − 2 + dt ,

x∈[a,a+dt ]

which is a contradiction for sufficiently large t, since dt d0 ≥ 2K/t+2 for all large t. If there is no appropriate x0t and case (II) occurs, dt − (d0 )2 ≥ d0

sup x∈[a,a+d0 ]

r(m0t , x, t) + (dt − d0 )

sup x∈[a+d0 ,a+d0t ]

r(m0t , x, t) ≥ |IR |r(m0t , IR , t) ≥ 2K/t − 2 + dt ,

which is a contradiction for sufficiently large t, since d0 d0 ≥ K/t + for all large t. In case (III) define dt := xt −a < d , IL := I ∩N (xt ) = (xt −1, a) and IR := N (xt )\IL = (a, xt +1). Note that |IL | = 1 − dt and |IR | = 1 + dt . By (27), for any sufficiently large t (such that |IL |r(m, IL , t) > 1 − dt − ) |IR |r(m0t , IR , t) =

1 1 Ym0t (xt , t) − Ym (xt , t) − Ym0t (xt , 0) − Ym (xt , 0) t t − |IL |r(m0t , IL , t) + |IL |r(m, IL , t) + |IR |r(m, IR , t)

≥ − 2K/t + 0 − + (1 − dt − ) + 0, so if there is no appropriate x0t , 1 + dt − (d0 )2 = d0 (1 − d0 ) + (1 + dt − d0 ) ≥ d0

sup x∈[a,a+d0 ]

r(m0t , x, t) + (1 + dt − d0 )

inf

x∈IR \[a,a+d0 ]

r(m0t , x, t)

≥ |IR |r(m0t , IR , t) ≥ −2K/t − 2 − dt + 1, which implies (d0 )2 ≤ 2K/t + 2 + 2dt . This is a contradiction for sufficiently large t. We conclude that an appropriate x0t exists for all large t in all cases (I)-(III). For any t ≥ 0 define e s) < 1 − 3 − 3d0 , r(m, I, s) > 1 − }. St := {s ∈ (K/, ∞) : r(m0t , I, We will prove that |St ∩ [0, t]|/t < d0 for all sufficiently large t > 0. For any t > 0 sufficiently large such that x0t exists and r(m, I, s) > 1 − , and for any s ∈ St it follows from (27) that 1 1 Ym (x0t , s) − Ym0t (x0t , s) = Ym (x0t , 0) − Ym0t (x0t , 0) + (x0t − a)r(m, [a + 1, x0t + 1], s) s s e s) − r(m0t , I, e s) − (x0t − a)r(m0t , [a + 1, x0t + 1], s) + r(m, I, + (1 + a − x0t )r(m, [x0t − 1, a], s) − (1 + a − x0t )r(m0t , [x0t − 1, a], s) ≥ − 2K/s + 0 − d0 + 0 − (1 − 3 − 3d0 ) + (1 − − d0 ) − > 0, 25

where we used the following estimates to obtain the first inequality (1 + a − x0t )r(m, [x0t − 1, a], s) = r(m, I, s) − (x0t − a)r(m, [a − 1, x0t − 1], s) ≥ (1 − ) − d0 , (1 + a − x0t )r(m0t , [x0t − 1, a], s) = r(mt , I, s) − (x0t − a)r(mt , [a − 1, x0t − 1], s) ≤ (1 − r(m, I, s)) − 0 ≤ . Therefore Ym (x0t , s) > Ym0t (x0t , s) for all sufficiently large t and s ∈ St , and it follows from the definition of r that r(m0t , x0t , t) ≤ 1 − |St ∩ [0, t]|/t for all sufficiently large t. Since 1 − d0 < r(m0t , x0t , t) by definition of x0t it follows that |St ∩ [0, t]|/t < d0 for all sufficiently large t. Note that if is sufficiently small, and t, t0 ≥ 0 are such that m0t 6= m0t0 , then it follows from the definition of St and St0 that (St )c ∩ (St0 )c ⊂ {s ≥ 0 : r(m, I, s) ≤ 1 − }. Therefore the estimate |St ∩ [0, t]|/t < d0 for all sufficiently large t and the assumption lims→∞ r(m, I, s) = 1 imply that there is an m0 ∈ {1, ..., M } such that m0t = m0 for all sufficiently large t. Define e s) < 1 − 3 − 3d0 }, S0 := {s ∈ (K/, ∞) : r(m0 , I, and note that |S0 ∩ [0, t]|/t < d0 for all sufficiently large t. Let e > 0. In order to complete the proof of the lemma it is sufficient to show that the set of t0 > 0 e t0 ) < 1 − e e t0 ) < 1 − e such that r(m0 , I, is bounded from above. Let t0 > 0 be such that r(m0 , I, . Choose e 0 0 0 e 0 > 0 such that e > 10d . By definition of r, for any t ∈ [(1 − 10 )t , t ] we have r(m , I, t) < 1 − 21 e . By e t) < 1 − 3 − 3d0 for all t ∈ [(1 − e )t0 , t0 ]. By definition of definition of we have 3 + 3d0 < 12 e , so r(m0 , I, 10 /10. On the other hand we know from the preceding paragraph that S0 this implies that |S0 ∩ [0, t0 ]|/t0 > e /10 for all sufficiently large t, which completes the proof of the lemma. |S0 ∩ [0, t]|/t < d0 < e Lemma 2.19. Let Y be the solution of the initial value problem (5), (7) for N = 1, N = N∞ , and either V = S or V = R. Assume a ∈ R and m ∈ {1, . . . , M } are such that limt→∞ p(Y (x, t)) = m for all x ∈ I := (a − 1, a), and that for x > a arbitrarily close to a, either the limit limt→∞ p(Y (x, t)) does not exist or limt→∞ p(Y (x, t)) 6= m. Then there is an m0 ∈ {1, . . . , M }, m0 6= m, such that, in the notation of Lemma e t) = 1 for Ie := (a, a + 1). 2.18, we have limt→∞ r(m0 , I, Proof. For each t ≥ 0 we can find an xt ∈ R such that limt→∞ xt = a and mt := p(Y (xt , t)) 6= m. By the identity (27) and letting ot (1) denote a term which converges to 0 as t → ∞, 1 0 ≤ (Ymt (xt , t) − Ym (xt , t)) t 1 e · r(mt , N (xt ) ∩ I, e t) = (Ymt (xt , 0) − Ym (xt , 0)) + |N (xt ) ∩ I| · r(mt , N (xt ) ∩ I, t) + |N (xt ) ∩ I| t e · r(m, N (xt ) ∩ I, e t) + ot (1) − |N (xt ) ∩ I| · r(m, N (xt ) ∩ I, t) − |N (xt ) ∩ I| e t) − r(m, I, t) − r(m, I, e t) + ot (1). = r(mt , I, t) + r(mt , I, e t) = 1 Since limt→∞ r(m, I, t) = 1, which implies limt→∞ r(mt , I, t) = 0, it follows that limt→∞ r(mt , I, PM 0 e e and limt→∞ r(m, I, t) = 0. Since k=1 r(k, I, t) = 1 this implies further that there is an m ∈ {1, ..., M }, e t) = 1. m0 6= m, such that mt = m0 for all sufficiently large t > 0. It follows that limt→∞ r(m0 , I, The following lemma is immediate from Lemmas 2.18 and 2.19. Lemma 2.20. Let V = S or V = R, and consider the initial value problem (5), (7) for N = 1 and N = N∞ . Assume m ∈ {1, . . . , M } and I = [a1 , a2 ] ⊂ R is an interval of length ≥ 1 such that limt→∞ r(m, I, t) = 1, and such that I is not contained in any larger interval satisfying this property. Then there is an m0 ∈ {1, . . . , M } \ {m} such that limt→∞ r(m0 , [a2 , a2 + 1], t) = 1. We can deduce a weak version of Proposition 2.14 for the case V = R from Lemma 2.20. This weak version of Proposition 2.14 is the last result of the section which holds also for M > 2. By Lemma 2.16 we know that there will be some interval I satisfying the conditions of Lemma 2.20. Lemma 2.20 therefore says 26

that all x ∈ R will be contained in an interval J ⊂ R of length ≥ 1, such that for some m0 ∈ {1, . . . , M }, we have limt→∞ r(m0 , J, t) = 1. In particular, both this lemma and Proposition 2.14 say that the limiting states of the continuum Schelling model on R can be divided into intervals of length ≥ 1 such that each interval is associated with a particular limiting opinion. The lemma is weaker than Proposition 2.14 in two ways: First, we do not prove that each interval has length strictly larger than 1, and second, instead of proving that p ◦ Y converges pointwise on the intervals we prove a weaker result expressed in terms of the function r. Both these stronger properties are needed when we apply Proposition 2.14 in our proof of Theorem 1.2. For x, x0 ∈ S we define |x − x0 | to be equal to inf k∈N |x − x0 − Rk| when we identify S with the interval [0, R). Lemma 2.21. Consider the initial value problem (10) with N = 1, N = N∞ , and either V = S or V = R, but with the following perturbed initial data for some K > 0 b Yb (x, 0) = B(x) + λ([−(K + 1), K + 1] ∩ [x − 1, x + 1]) b Yb (x, 0) = B(x) + for V = S.

for V = R,

(28)

For 1 > 2 > 0 let Yb 1 and Yb 2 denote the solution of (10), (28) with = 1 and = 2 , respectively. For b 1 , and 2 , and for V = S let J = S. For V = R let J ⊂ [−K, K] be an interval which may depend on B, t ≥ 0 define h(t) := inf{Yb 1 (x, t) − Yb 2 (x, t) : x ∈ J},

`(t) = inf{|x − x0 | : x, x0 ∈ J, Yb 1 (x, t) < 0, Yb 2 (x, t) > 0},

b 1 , and 2 , where the infimum over the empty set is defined to be ∞. There are c0 , c1 > 0 depending on B, such that h(t) h(t) − c1 ; min c0 ; . `(t) ≥ max 2t 2.01t Proof. For all x ∈ J we have Yb 1 (x, 0) ≥ Yb 2 (x, 0) + (1 − 2 ). Since the right side of the differential equation (10) is monotone in Y , this implies that the function t 7→ Yb 1 (x, t) − Yb 2 (x, t) is increasing for each x ∈ J. b b 0 )| : x, x0 ∈ J}. Let yb, ye : V × R+ → R be such that Yb 1 (x, t) = Define c1 := sup{|B(x) − B(x Yb 1 (x, t) + yb(x, t) and Yb 2 (x, t) = Yb 2 (x, t) + ye(x, t), and recall that yb(·, t) and ye(·, t) are Lipschitz continuous 1 , we with constant 2t by Theorem 2.1. For x, x0 ∈ J and t ≥ 0 satisfying Yb 2 (x, t) ≥ 0 and |x − x0 | ≤ h(t)−c 2t have b 0 ) − B(x)) b Yb 2 (x0 , t) = (B(x + (b y (x0 , t) − yb(x, t)) + (Yb 1 (x, t) − Yb 2 (x, t)) + Yb 2 (x, t) ≥ −c1 − 2t|x − x0 | + h(t) + 0 ≥ 0,

1 which implies `(t) ≥ h(t)−c . 2t 3 1 −2 b on J, and observe that c0 ≤ h(t) 3 Define c0 := 1000c2 , where c2 is the 1/3-H¨older constant for B 1000c2 1/3 for all t ≥ 0, so c2 c ≤ h(t) . For x, x0 ∈ J satisfying Yb 2 (x) ≥ 0 and |x − x0 | ≤ min c0 ; h(t) , we have

0

1000

2.01t

b 0 ) − B(x)) b Yb 1 (x0 , t) = (B(x + (b y (x0 , t) − yb(x, t)) + (Yb 1 (x, t) − Yb 2 (x, t)) + Yb 2 (x, t) ≥ −c2 |x − x0 |1/3 − 2t|x − x0 | + h(t) + 0 ≥ −

h(t) h(t) − 2t + h(t) ≥ 0, 1000 2.01t

h(t) which implies `(t) ≥ min c0 ; 2.01t . Combining the above two bounds for `(t) we obtain the lemma. Lemma 2.22. Consider the initial value problem (10) with N = 1, N = N∞ , and either V = S or V = R. Let E be the event that (in the notation of Proposition 2.14), R \ (A1 ∪ A2 ) has measure zero, and each set A1 and A2 can be written as the union of intervals of length > 1. Let E 0 be the event that at least one of the sets A1 and A2 contains an interval of length > 1. Then P[E 0 ∩ E c ] = 0. 27

Yb 1 (x, t)

≥ h(t)

Yb 1 (x, t)

J

Yb 2 (x, t)

≥ `(t)

x

x0 − 1

x0

x0

x00

x0 + 1

x

Yb 2 (x, t)

Figure 7: For 1 > 2 > 0 we consider two solutions Yb 1 and Yb 2 of (10) with initial data perturbed by 1 and 2 , respectively. We prove Proposition 2.14 by showing that the event considered in the proposition must occur for at least one of Yb 1 and Yb 2 . Left: Illustration of h and ` defined in Lemma 2.21. Right: Illustration of the proof of Proposition 2.14, case (b). Proof. For > 0 and fixed K > 1 consider the initial value problem (10) with perturbed initial data (28). Let A1 () and A2 () be defined just as the sets A1 and A2 , respectively, in the statement of Proposition e 2.14. Let E() be the event that the following two properties hold: (i) if V = S then at least one of the sets A1 and A2 contains an interval of length > 1, and if V = R then at least one of the sets A1 ∩ [−K, K] and A2 ∩ [−K, K] contains an interval of length > 1, and (ii) the origin is not contained in an interval I of length > 1 such that limt→∞ sign Y (x, t) exists and is equal for all x ∈ I. Since K was arbitrary and by translation invariance in law in the space variable, in order to complete the proof of the lemma it is sufficient to show e that P[E(0)] = 0. First we will reduce the lemma to proving e 1 ) ∩ E( e 2 )] = 0 P[E(

∀1 > 2 > 0.

(29)

e Assume P[E(0)] > 0. Observe that the initial data (28) are absolutely continuous with respect to the initial data with = 0. By absolute continuity, we can find a sequence (n )n∈N and p > 0 such that n1 6= n2 for e n )] > p for all n ∈ N. If we assume (29) holds this leads to a contradiction, since for any n1 6= n2 and P[E( n ∈ N, " n # n [ X e k) = e n )] ≥ np, P E( P[E( k=1

k=1

which converges to ∞ as n → ∞. We conclude that the lemma will follow once we have established (29). Let Yb 1 (resp. Yb 2 ) denote the solution of (10) with perturbed initial data for = 1 (resp. = 2 ). We e 1 ) and E( e 2 ) occur, and want to derive a contradiction. For an interval will assume that both events E( I ⊂ V and t ≥ 0 define Z tZ 1 sign(Yb 2 (x, t0 )) dx dt0 . r(I, t) := λ(I)t 0 I e 2 ) there By Lemma 2.20 and by absolute continuity of the initial data, we know that on the event E( is almost surely an interval I of length ≥ 1 containing the origin, such that either limt→∞ r(I, t) = 1 or limt→∞ r(I, t) = −1. Without loss of generality we assume limt→∞ r(I, t) = −1; the case limt→∞ r(I, t) = 1 can be treated similarly. We also assume that I is the maximal open interval satisfying this property, and let a be the left end-point of the interval. By Lemma 2.20, limt→∞ r([a − 1, a], t) = 1. If λ(I) > 1, it is immediate from the definition of r and (10) that limt→∞ sign Yb 2 (x, t) = −1 for all x ∈ I, which contradicts (ii) in the definition of E(2 ). Therefore we will assume λ(I) = 1. To conclude the proof of the lemma it is sufficient to derive a contradiction to the occurrence of E(1 ). Let J = [a − 1/2, a + 1/2], and let h and ` be as in Lemma 2.21. Let T ⊂ R+ be the set of times t ≥ 0 for we can find a x0 ∈ [a−1/10, a+1/10] such that Yb 2 takes both positive and negative values arbitrarily close to

28

x0 . Since limt→∞ r([a, a + 1/2], t) = −1 and limt→∞ r([a − 1/2, a], t) = 1, we have limt→∞ λ(T ∩ [0, t])/t → 1. Also observe that for t ∈ T , λ({x ∈ J : Yb 1 (x, t) > 0, Yb 2 (x, t) < 0}) ≥ `(t) ∧ (1/4). By (10) we get further that for all t ∈ T , dh (t) ≥ 2λ({x ∈ J : Yb 1 (x, t) > 0, Yb 2 (x, t) < 0}) ≥ (2`(t)) ∧ (1/2). dt n o h(t) Since `(t) ≥ min c0 ; 2.01t , this implies that limt→∞ h(t) = ∞. h(t)−c1 By the lower bound for ` in Lemma 2.21 we also have dh for all t ∈ T , and since dt (t) ≥ 2`(t) ≥ t limt→∞ h(t) = ∞ this implies that h(t) ≥ ct for some random constant c > 0 and all t ≥ 0. Lemma 2.20 implies that limt→∞ r([a − 1, a] ∪ [a + 1, a + 2], t) = 1. By this result, (10), and limt→∞ r(I, t) = −1,

inf

x∈I

1 1 b 2 inf r(I 0 , t) → 0 as t → ∞. Y (x, t) ≥ inf Yb 2 (x, 0) + r(I, t) + 0 x∈I t t I ⊂[a−1,a]∪[a+1,a+2] : λ(I 0 )=1

It follows that inf x∈I Yb 1 (x, t) ≥ inf x∈I Yb 1 (x, t) + h(t) > 0 for all sufficiently large t. This implies that inf x∈[a−1,a+2] Yb 1 (x, t) > 0 for all sufficiently large t, which is a contradiction to the condition (ii) in the definition of E 0 (1 ). Proof of Proposition 2.14. For the case when V = R the proposition follows immediately from Lemma 2.22, since P[E 0 ] = 1 by Lemma 2.16. Let V = S. For > 0 let E 0 () be the event of Lemma 2.22, but for the perturbed initial data. It is sufficient to prove that for any 1 > 2 > 0, we have P[E 0 (1 )c ∩ E 0 (2 )c ] = 0, since this implies that P[(E 0 )c ] = 0 (see the argument in the second paragraph in the proof of Lemma 2.22 for a similar argument). We assume that both E 0 (2 )c and E 0 (1 )c occur, and will derive a contradiction. First we will argue that the following inequality holds dh ≥ 4`(t). dt

(30)

dh ≥ 2 inf λ({x0 ∈ N (x) : Yb 1 (x0 , t) > 0, Yb 2 (x0 , t) < 0}). x∈S dt

(31)

By (10) we have

Since E 0 (1 )c occurs, each closed interval I 0 ⊂ S of length ≥ 1 must intersects some interval of length ≥ 2`(t) on which Yb 2 (x, t) < 0; otherwise we would have Yb 1 (x, t) > 0 on I 0 . For some fixed x0 ∈ S the interval [x0 − 1/2, x0 + 1/2] therefore intersect some interval of length ≥ 2`(t) on which Yb 2 (x, t) < 0. Let J be an interval satisfying this property and which is maximal in the sense that J is not contained inside any other interval satisfying the property. We have either (a) J ⊂ (x0 − 1, x0 + 1), (b) {x0 − 1, x0 + 1} ∩ J 6= ∅. If case (a) occurs, then the following holds by the definition of ` λ({x0 ∈ N (x0 ) : Yb 1 (x0 , t) > 0, Yb 2 (x0 , t) < 0}) ≥ 2`(t).

(32)

Next assume case (b) occurs, and without loss of generality assume (x0 − 1) ∈ J. See Figure 7 for an illustration. We have `(t) ≤ 1/2; otherwise E 0 (2 )c and E 0 (1 )c cannot both occur, since Yb 1 would be positive on any maximal interval of length < 1 on which Yb 2 is negative. Let x0 ∈ [x0 − 1/2, x0 ) be the right end-point of J, and observe that Yb 1 is positive and Yb 2 negative on the interval (x0 − `(t), x0 ). Also observe that (x0 − `(t), x0 ) ⊂ N (x0 ) since x0 ∈ [x0 − 1/2, x0 ) and `(t) ≤ 1/2. Define x00 := inf{x > x0 : Yb 1 (x, t) < 0}. We have x00 < x0 + 1 − `(t) < x0 + 1 − `(t), where the first inequality holds since Yb 1 (x0 , t) > 0 and E 0 (1 )c occurs. We also have x00 > x0 + 2`(t); otherwise we would have Yb 2 < 0 on (x0 − `(t), x0 + `(t)) by the definition of `(t), which contradicts the definition of x0 . It follows 29

that Yb 1 is positive and Yb 2 negative on the interval (x00 −`(t), x00 ) ⊂ N (x0 ). Since the intervals (x0 −`(t), x0 ) and (x00 − `(t), x00 ) are disjoint, we see that (32) holds also in case (b). We obtain (30) by combining (31) and (32). Next we argue that h(t) ≥ 2c0 t for all t ≥ 0. Let τ = inf{t ≥ 0 : h(t) ≤ 2c0 t}. We see that τ = ∞, since h(0) > 0, and since (30) and Lemma 2.21 imply that for t ∈ [0, τ ], h(t) 8 dh ≥ 4 min c0 ; ≥ c0 . dt 2.01t 2.01 We conclude that h(t) ≥ 2c0 t for all t ≥ 0. In particular, we have limt→∞ h(t) = ∞, so by Lemma 2.21 h(t) 4h(t) 1 we have `(t) ≥ h(t)−c ≥ 2.01t for all sufficiently large t, which implies by (30) that dh 2t dt ≥ 2.01t for all sufficiently large t. Further we get h(t) ≥ ct4/2.01 for some random constant c > 0. The function h cannot grow superlinearly, since Yb 2 and Yb 1 grow at most linearly in time. Therefore we obtain a contradiction, which concludes the proof.

3

The discrete Schelling model

3.1

The early phase of the discrete Schelling model

The main purpose of this section is to prove the following proposition, which says that the solution of the initial value problem (5), (7) describes the early phase of the discrete Schelling model well for large w. Recall that we rescaled time by w−N/2 when we defined Y w . Therefore the proposition only provides information about times up to order w−N/2 for the discrete Schelling model. Proposition 3.1. Let N ∈ N and V = S, or let N = 1 and V = R. Let M ∈ {2, 3, . . . }, let N be as defined in Section 1.3, let Y be the solution of the initial value problem (5), (7), and let Y w be given by (3). There is a coupling of Y and Y w for w ∈ N such that a.s. Y w converges uniformly to Y on compact subsets of V N × R+ as w → ∞. First we observe that the initial values for the discrete and continuum Schelling model can be coupled. Lemma 3.2. Let V = S or V = R. There is a uniquely defined continuous centered Gaussian (N, M )random field B on V N with covariances given by (6). The initial data Y w (·, 0) of the normalized discrete bias function defined by (3) converges in distribution to B. Proof. We first define a smoothed version Yˇ w ∈ CM (V N ) of Y w (·, 0), and prove convergence of Yˇ w to B. For any i ∈ ZN let A(i) denote the square of side length 1 centered at i, and for i ∈ ZN let wN + i = {x ∈ RN : w−1 (x − i) ∈ N }. If V = S (resp. V = R) define V = S (resp. V = Z). For m ∈ {1, . . . , M } define the smoothed unscaled bias function Yˇm : V × R+ → R by X 1 Yˇm (i, t) = λ(A(j) ∩ (wN + i)) 1X(j,t)=m − , i ∈ V, t ≥ 0. M j∈V

Then define Ymw by (3) and (4), but using Yˇ instead of Y. It follows by e.g. [AP86] that for each m ∈ {1, . . . , M }, Yˇmw converges in law to Bm in C(V N ). Note in particular that the entropy integral considered in [AP86] is finite as required, since ∂N has an upper Minkowski dimension strictly smaller than 2. By this convergence result, we see that the law of Y w (·, 0) is tight in CM (V N ). By convergence of the finite dimensional distributions and hence uniqueness of the limit, we get that (B(x))x∈V N exists and that Y w (·, 0) converges in law to B in CM (V N ). Continuity of B follows e.g. by applying the Kolmogorov-Chentsov theorem as in the construction of Brownian motion, see the proof of Lemma 2.9. To conclude the proof it is sufficient to show that Ymw (·, 0) − Yˇmw converges in law to 0 as w → ∞ for any m ∈ {1, . . . , M }. Let qw ∈ N denote the following measure for the number of lattice points which are near the boundary of wN qw = |{j ∈ ZN : 0 < λ(A(j) ∩ (wN )) < 1}|. 30

Observe that for each i, Ymw (i, 0) − Yˇmw (i) is the weighted sum of qw i.i.d. centered random variables with 1 M −1 values in {− M , M }, divided by wN/2 , where the weights are in [0, 1]. Since the upper Minkowski dimension of ∂N is smaller than N , it holds that qw = O(wN − ) for some > 0. Therefore, for any i ∈ V, we have P[|Ymw (i, 0) − Yˇmw (i)| > w−/3 ] exp(−w−/10 ). A union bound now gives that Ymw (·, 0) − Yˇmw converges in law to 0 as w → ∞. The evolution of Y w is random, while the evolution of Y (given its random initial conditions) is deterministic. The next lemma states that if Y w and Y are coupled in such a way that they are likely to be close at time t0 , then it is likely that they remain close at time t0 + ∆t. Informally, this means that the evolution of Y w is approximately deterministic and approximately follows the same evolution rule as Y . One reason the lemma is challenging to prove is that the evolution of Y w (resp. Y ) may be very sensitive to small perturbations when the bias is approximately as strong towards two different opinions. To bound the effect of this we will use Theorem 2.4, which will imply that the measure of the set of points at which this happens is not too large, uniformly for t in a compact set. Lemma 3.3. Let M , N , R, R, w and N be as in Section 1.3, and consider the Schelling model on V = S. Let T > 0, ∆t > 0 and t0 ∈ {0, ∆t, 2∆t, . . . , dT /∆te∆t}. Consider an arbitrary coupling of Y w (which is defined by (3)) and Y (which solves (5), (7)). Let (Ft )t≥0 denote the filtration Ft = σ Y w |S N ×[0,t] , X|SN ×[0,tw−N/2 ] , {(j, s) ∈ R : s ∈ [0, tw−N/2 ]} . For any δ > 0 define the event Eδw := {kY w (·, t0 ) − Y (·, t0 )kL∞ (S N ) < δ}. There exists a random function v : N → [0, 1] and a random constant c > 1, such that limw→∞ v(w) = 1, and such that for all w ∈ N, 1Eδw P E(t0 + ∆t) ≤ E(t0 )(1 + c∆t) + c∆t2 | Ft0 ≥ 1Eδw v(w), E(t) := kY w (·, t) − Y (·, t)kL∞ (S N ) . The constant c depends on T and the σ-algebra generated by (B(x))x∈S N , and the function v depends on T, δ, ∆t and the σ-algebra generated by (B(x))x∈S N . Lemma 3.3 will follow almost immediately from the following lemma. For any f ∈ CM (S N ) and w ∈ N define kf kL∞ (SN ) := supm∈{1,...,M } supi∈SN |fm (w−1 i)|. Lemma 3.4. The result of Lemma 3.3 holds if we replace L∞ (S N ) by L∞ (SN ) in the definition of the event Eδw and in the second indented equation. The next lemma, which is a discrete version of the estimate (23), will help us to prove Lemma 3.4. Lemma 3.5. For any T > 0 and ∆t > 0 there are random constants c and w0 such that for any δ ∈ (0, 1), if et,δ,w := {i ∈ SN : ∃m, m0 ∈ {1, ..., M }, m 6= m0 , such that |Ym (iw−1 , t) − Ym0 (iw−1 , t)| < δ}, A

t ≥ 0,

et,δ,w | < cδwN for all t ∈ {0, ∆t, 2∆t, . . . , dT /∆te∆t} and all sufficiently large w ≥ w0 . The constant then |A c satisfies the same properties as in Lemma 3.3, and the constant w0 depends on T , ∆t, and the σ-algebra generated by (B(x))x∈S N . Proof. Note that for any m, m0 ∈ {1, . . . , M } satisfying m 6= m0 the field Ym (·, t) − Ym0 (·, t) has the law of a constant multiple of Bm plus some element of L2t (S N ). The estimate (23) implies that λ(At,δ ) <

1 cδ, 2

At,δ := {x ∈ S N : |Ym (x, t) − Ym0 (x, t)| < δ}.

(33)

By e.g. [Str11, Lemma 4.2.6] and since At,δ is open, X et,δ,w |. λ(At,δ ) = lim w−N 1|Ym (iw−1 ,t)−Ym0 (iw−1 ,t)|<δ = lim w−N |A w→∞

w→∞

i∈SN

et,δ,w | < cδwN for all sufficiently large values of w for fixed t ∈ [0, T ], so |A et,δ,w | < cδwN for all Therefore |A sufficiently large values of w and all t ∈ {0, ∆t, 2∆t, . . . , dT /∆te∆t}. 31

Proof of Lemma 3.4. Throughout the proof the implicit constant of will depend only on T and (B(x))x∈S N . The function v will change throughout the proof, but will always satisfy the properties of the function v in the statement of the lemma. The expression ”for all sufficiently large w” means that a statement is true for all w 1. Fix m ∈ {1, ..., M }. For any i ∈ Se let Ui,t = 1X(i,w−N/2 t− )6=m , and let Vi be i.i.d. Bernoulli random variables with P[Vi = 0] = M −1 and P[Vi = 1] = 1 − M −1 . Recall the definition (1) of R, and for i ∈ SN define the following sets R0 , R00 , R0i , R00,i ⊂ R R0 := {(j, tw−N/2 ) ∈ R : t ∈ (t0 , t0 + ∆t]},

R00 := {(j, t) ∈ R0 : 6 ∃s ∈ [t0 , t) such that (j, s) ∈ R0 },

R0i := {(j, t) ∈ R0 : j ∈ N(i)},

R00,i := {(j, t) ∈ R00 : j ∈ N(i)} = R00 ∩ R0i .

Note that several of the random variables or sets we have defined above depend on w, but we have chosen not to indicate the w dependence in order to simplify notation. We have k(Ymw (·, t0 + ∆t) − Ym (·, t0 + ∆t)) − (Ymw (·, t0 ) − Ym (·, t0 ))kL∞ (SN ) X X = sup w−N/2 1X(i,t− )6=X(i,t)=m − w−N/2 1m=X(i,t− )6=X(i,t) N j∈S

(i,t)∈R0j

− (1 − M −1 )

Z

(i,t)∈R0j

t0 +∆t Z

t0

1p(Y (x,t))=m dx dt + M −1

x∈N (jw−1 )

Z

t0 +∆t

t0

Z x∈N (jw−1 )

1p(Y (x,t))6=m dx dt .

To conclude the proof of the lemma it is sufficient to show that with probability > v(w), the right side is ∆t(∆t + δ). By the triangle inequality, k(Ymw (·, t0

+ ∆t) − Ym (·, t0 + ∆t)) −

(Ymw (·, t0 )

− Ym (·, t0 ))kL∞ (SN ) ≤

7 X

(Av + A0v ),

v=1

where A1

A2

A3

A4

A5

A6 A7

X −N/2 X −N/2 = sup w 1X(i,t− )6=X(i,t)=m − w 1p(Y w (iw−1 ,t))=m Ui,t , j∈SN (i,t)∈R0j (i,t)∈R0j X −N/2 X = sup w 1p(Y w (iw−1 ,t))=m Ui,t − w−N/2 1p(Y w (iw−1 ,t0 ))=m Ui,t , j∈SN (i,t)∈R0j (i,t)∈R0j X −N/2 X −N/2 = sup w 1p(Y w (iw−1 ,t0 ))=m Ui,t − w 1p(Y w (iw−1 ,t0 ))=m Ui,t0 , j∈SN (i,t)∈R0j (i,t)∈R00,j X −N/2 X −N/2 = sup w 1p(Y w (iw−1 ,t0 ))=m Ui,t0 − w 1p(Y w (iw−1 ,t0 ))=m Vi , j∈SN (i,t)∈R00,j (i,t)∈R00,j Z −N/2 X = sup A5,j , A5,j = w 1p(Y w (iw−1 ,t0 ))=m Vi − ∆t(1 − M −1 ) 1p(Y w (x,t0 ))=m dx , j∈SN x∈N (jw−1 ) (i,t)∈R00,j Z Z −1 = (1 − M )∆t sup 1p(Y w (x,t0 ))=m dx − 1p(Y (x,t0 ))=m dx , N −1 −1 j∈S x∈N (jw ) x∈N (jw ) Z Z t0 +∆t Z −1 = (1 − M ) sup ∆t 1p(Y (x,t0 ))=m dx − 1p(Y (x,t))=m dx dt , j∈SN x∈N (jw−1 ) t0 x∈N (jw−1 )

where we view iw−1 as an element of S N by identifying S (resp. S) with [0, R) (resp. {0, . . . , Rw − 1}). Define A0v exactly as Av , except that = m is replaced by 6= m, 1 − M −1 is replaced by M −1 , Ui,t is replaced by 1 − Ui,t , and Vi is replaced by 1 − Vi . 32

For each j ∈ {1, ..., 7} we will show that with probability > v(w) we have Aj ∆t(∆t + δ). The term A0j can be bounded exactly as Aj for all j, and the proof of its bound will therefore be omitted. First we show that A1 ∆t(δ + ∆t) with probability > v(w). Since |R0 | is a Poisson random variable with parameter RN ∆twN/2 , we may assume that |R0 | < 2RN ∆twN/2 , since P[|R0 | < 2RN ∆twN/2 ] ≥ v(w). This implies kY w (·, t) − Y (·, t)kL∞ (S N ) ≤ δ + c∆t for all t ∈ [t0 , t0 + ∆t] and some appropriate c as in the statement of the lemma. Under these assumptions, X X X 1p(Y w (iw−1 ,t))=0 ≤ w−N/2 A1 ≤ w−N/2 1|Ym0 (iw−1 ,t)−Ym (iw−1 ,t)|≤2δ+2c∆t . (i,t)∈R0 1≤m,m0 ≤M,m6=m0

(i,t)∈R0

By independence of R0 and Ft0 , conditioned on Ft0 the right side has the law of the sum of i.i.d. {0, 1}-valued random variables. By Lemma 3.5 the probability that this random variable equals 1 is δ + ∆t. The bound for A1 now follows by the assumed upper bound for |R0 | and a Chernoff bound. We will bound A2 by using (23) and the assumption kY w (·, t0 ) − Y (·, t0 )kL∞ (SN ) < δ. As above we e ⊂ SN by assume |R0 | < 2RN ∆twN/2 . Define A e = {i ∈ SN : ∃m0 ∈ {1, ..., M }\{m} such that |Ymw0 (w−1 i, t0 ) − Ymw (w−1 i, t0 )| ≤ 2w−N/2 |R0 |}. A e (∆t+δ)wN for all sufficiently large w. If i ∈ A e the assumption kY (·, t0 )−Y w (·, t0 )kL∞ (SN ) < We claim that |A| 0 δ implies that there exists m ∈ {1, ..., M }\{m}, such that |Ym (iw−1 , t0 ) − Ym0 (iw−1 , t0 )| ≤ 2w−N/2 |R0 | + 2δ < 4RN ∆t + 2δ.

(34)

It follows by Lemma 3.5 that the set of nodes i satisfying (34) is wN (∆t + δ), and our claim follows. If i ∈ SN is such that there is a t ∈ [t0 , t0 + ∆t] for which p(Y w (iw−1 , t)) 6= p(Y w (iw−1 , t0 )), then we e by the definition of Y w . Therefore must have i ∈ A X A2 ≤ w−N/2 1i∈Ae. (i,t)∈R0

e and proceeding exactly as in the proof of We conclude the bound for A2 by using independence of R0 and A A1 . Next we claim that with probability > v(w) we have A3 < (∆t)2 . For each i ∈ SN let Pi be the Poisson random variable with parameter ∆tw−N/2 which denotes the number of rings of Poisson clock i during the interval (t0 , t0 + ∆t]. Then X A3 ≤ w−N/2 max{Pi − 1, 0}. i∈SN

Since E[max{Pi − 1, 0}] (∆t)2 w−N , we have E[A3 ] (∆t)2 w−N/2 . By Chebyshev’s inequality P[A3 ≥ (∆t)2 ] w−N/2 , which implies our claim. We will bound A4 by approximating the region where p(Y w (iw−1 , t0 )) = m by small cubes of side length −1 L , and by proving that when a node i is sampled uniformly from one of these cubes and L w, then Ui,t0 and Vi have approximately the same distribution. As in our proof for the bound of A3 we can assume |R0 | < 2RN ∆twN/2 . Let L = dw1/2 e and divide S N into (RL)N disjoint cubes of side length L−1 . For any e ⊂ SN by i ∈ SN let IiL denote the cube containing iw−1 . Define A ⊂ S and A A = {x ∈ S N : p(Y (x, t0 )) = m},

e = {i ∈ SN : IiL ⊂ A}. A

We have A4 ≤ w−N/2

X

|1p(Y w (iw−1 ,t0 ))=m − 1i∈Ae| + sup w−N/2 j∈SN

(i,t)∈R0

33

X

(Ui,t0 − Vi ).

e (i,t)∈R00,j ,i∈A

(35)

We will prove that A4 < δ∆t with probability > v(w). Any i ∈ SN for which 1p(Y w (iw−1 ,t0 ))=m 6= 1i∈Ae must satisfy one of the following conditions: (i) p(Y w (iw−1 , t0 )) 6= p(Y (iw−1 , t0 )), or (ii) p(Y (iw−1 , t0 )) = m and e i 6∈ A. We will prove that the number of nodes satisfying one of the conditions (i)-(ii) is wN δ with probability > v(w). If i ∈ SN satisfies (i), by definition of Eδw there is an m0 ∈ {1, ..., M }, m0 6= m, such that |Ym (iw−1 , t0 )−Ym0 (iw−1 , t0 )| < 2δ, and the wanted result follows by Lemma 3.5. If i satisfies (ii), the function Ym (·, t0 )−Ym0 (·, t0 ) intersects zero in IiL . By our estimates for the event G1k in Proposition 2.10, it holds with probability > v(w) that the number of such cubes is < LN −1/2+1/100 for all t ∈ [0, T ]. Using L = dw1/2 e, it follows that for w > δ −4 the number of nodes i ∈ Se satisfying (ii) is LN −1/2+1/100 (w/L)N wN δ with probability > v(w). This completes the proof that the number of nodes satisfying one of the conditions (i)-(iii) is wN δ with probability > v(w). Given any i ∈ SN the events {∃t ∈ (t0 , t0 + ∆t] such that (i, t) ∈ R0 } and 1p(Y w (i,t0 ))=m 6= 1i∈Ae are independent. Proceeding as when bounding A2 and A3 , we see that the first term on the right side of (35) is δ∆t with probability > v(w). Next we will prove that the second term on the right side of (35) converges to 0 in probability as w → ∞. Since supx∈S |Y w (x, t0 )| < δ + supx∈S |Y (x, t0 )| < ∞ on Eδw , the difference in probability between the events {Ui,t0 = 1} and {Vi = 1} is w−N/2 when we sample i uniformly from one of the cubes IjL . Since R0 is independent of Ui,t0 and Vi for all i ∈ N(j), the second term on the right side of (35) is stochastically dominated by w−N/2 times the sum of < |R0 | ∆twN/2 i.i.d. random variables taking values in {−1, 0, 1} and with expectation w−N/2 . Our claim follows by a Chernoff bound and a union bound. f0 0,j denote the first Next we claim that A5 < δ∆t with probability > v(w). If |R00,j | ≥ ∆t2N wN/2 let R N N/2 0 f0 0,j d∆t2 w e rings of the Poisson clocks during the interval [t0 , t0 + ∆t], and if |R | < ∆t2N wN/2 let R 0,j

denote the union of R00,j and (d∆t2N wN/2 e − |R00,j |) pairs (i, t0 + ∆t), where the i’s are pairwise different and sampled independently and uniformly from N(j). By the triangle inequality and letting ∆ denote symmetric difference, X Vi A5,j ≤ w−N/2 f0 0,j (i,t)∈R00,j ∆R

X + w−N/2 f0 (i,t)∈R

−1 1p(Y w (iw−1 ,t0 ))=m Vi − ∆t(1 − M ) 1p(Y w (iw−1 ,t0 ))=m dx . x∈N (jw−1 )

(36)

Z

0,j

We will prove that P[A5,j > w−1/100 ] decays faster than any power of w when w → ∞, which is sufficient to complete the proof of our bound for A5 . We see immediately that the first term on the right side of (36) decays sufficiently fast. By independence of R00,j and Ft0 , the second sum on the right side of (36) is, conditioned on Ft0 , equal in law to w−N/2 times the sum of ∆twN/2 independent bounded centered random variables. We obtain the desired bound by a Chernoff bound. Now we will prove that A6 δ∆t with probability > v(w). By first using kYmw (·, t0 )−Ym (·, t0 )kL∞ (S N ) < δ and (22), and then using Ym − Bm ∈ LtM (S N ) and (23) for all t ∈ [t0 , t0 + ∆t], we get Z X A6 ≤ ∆t sup 1|(Ym (t0 ,x0 )−Ym0 (t0 ,x0 ))|≤2δ dx0 N 1≤m,m0 ≤M,m6=m0 x∈S

X

≤ ∆t

|x−x0 |≤1

Z sup

2t0 (S N ) 1≤m,m0 ≤M,m6=m0 f ∈L

SN

1|(Bm (x)−Bm0 (x))−f (x)|<2δ dx

δ∆t. Finally we will bound A7 . By Lemma 2.12 and kYm (·, t) − Ym (·, t0 )kL∞ (S N ) ≤ 2∆t for all t ∈ [t0 , t0 + ∆t], we have A7 (∆t)2 . Combining the above estimates for Aj , j = 1, ..., 7, we obtain the lemma by a union bound. 34

Proof of Lemma 3.3. For any x ∈ S N there are αi (x) ∈ [0, 1] and xi (x) ∈ SN for i = 1, . . . , 2N such that P2N P2N kxi (x) − xk∞ ≤ w−1 and i=1 αi (x) = 1, and such that for any t ≥ 0, Y w (x, t) = i=1 αi (x)Y w (xi (x), t). For any x ∈ S N define ∆Y w (x) := Y w (x, t0 + ∆t) − Y w (x, t0 ) and ∆Y (x) := Y (x, t0 + ∆t) − Y (x, t0 ). Observe that N

w

∆Y (x) − ∆Y (x) =

2 X

N

w

αi (x)(∆Y (xi (x)) − ∆Y (xi (x))) +

i=1

2 X

αi (x)(∆Y (xi (x)) − ∆Y (x)).

By uniform continuity of Y , which follows from uniform continuity of B, X sup αi (x)(∆Y (xj (x)) − ∆Y (x)) → 0 as w → ∞, x∈S N

(37)

i=1

(38)

i

and the rate of convergence depends only on T and the σ-algebra generated by (B(x))x∈S N . By Lemma 3.4, w (39) 1Eδw P sup k∆Y (x) − ∆Y (x)k ≤ c∆t(∆t + δ) | Ft0 ≥ 1Eδw v(w). x∈SN

We obtain the desired bound for ∆Y w (x) − ∆Y (x) by combining (37), (38) and (39). The following lemma will be needed to transfer the result of Proposition 3.1 from S to R. It says that a discrete version of Lemma 2.16 holds with high probability for large w. Lemma 3.6. Consider the Schelling model on S (resp. Z) with (in the notation of Section 1.3) N = 1, M ∈ {2, 3, . . . } and N = N∞ . Let Y w ∈ CM (S × R+ ) (resp. Y w ∈ CM (R × R+ )) be given by (3). Fix an interval I ⊂ R of length > 1 and some m ∈ {1, . . . , M }. For any > 0 and t ≥ 0 define the event Et by ( ) Et :=

Ymw (x, t) >

sup m0 ∈{1,...,M }\{m}

Ymw0 (x, t) + , ∀x ∈ I

.

c S For any > 0, limw→∞ P E0 \ t∈[0,−1 ] Et0 = 0. Proof. Define Yemw (x, t) := Ymw (x, t) − supm0 ∈{1,...,M }\{m} Ymw0 (x, t). For any interval I ⊂ R and w ∈ N, let wI = {wx : x ∈ I}. Define stopping times T and Ti for i ∈ wI by Ti = inf{t ≥ 0 : Yemw (w−1 i, t) ≤ 0}, Since

S

t∈[0,−1 ]

0 c

Et

T = inf Ti . i∈(wI)

⊂ {T ≤ −1 } it is sufficient by a union bound to prove that for each fixed i ∈ wI log P[E0 ; Ti = T ≤ −1 ] −w,

where the implicit constant can depend on all parameters except w and i. Letting tn := −1 w−0.1 n for n ∈ {0, 1, . . . , dw0.1 e}, we observe that 1 {E0 ; Ti = T ∈ [tn , tn+1 ]} ⊂ E0 ; tn ≤ Ti = T ; Yemw (w−1 i, tn ) < Yemw (w−1 i, 0) 2 1 e w −1 ∪ E0 ; |{(j, w−1/2 t) ∈ R : j ∈ N(i), t ∈ [tn , tn+1 ]}|w−1/2 > Ym (w i, 0) . 100 It follows by a union bound that dw0.1 e−1

P[E0 ;

Ti = T ≤

−1

]≤

X n=0

dw0.1 e−1

+

X n=0

1 e w −1 w −1 e P E0 ; tn ≤ Ti = T ; Ym (w i, tn ) < Ym (w i, 0) 2

1 e w −1 P E0 ; |{(j, w−1/2 t) ∈ R : j ∈ N(i), t ∈ [tn , tn+1 ]}|w−1/2 > Ym (w i, 0) . 100 35

Since Yemw (w−1 i, 0) > on the event E0 , the logarithm of the last sum is −w, so to conclude the proof of the lemma it is sufficient to show that for each fixed n ∈ {0, 1, . . . , dw−0.1 e − 1}, 1 e w −1 w −1 e log P E0 ; tn ≤ Ti = T ; Ym (w i, tn ) < Ym (w i, 0) −w, (40) 2 where the implicit constant can depend on all parameters except w, i, and n. Fix n ∈ {0, 1, . . . , dw−0.1 e − 1}, and define N+ := {j ∈ N(i) : X(j, 0) 6= m, jw ∈ I},

R+ := {j ∈ N+ : ∃t ∈ [0, tn ] such that (j, tw−1/2 ) ∈ R},

N− := {j ∈ N(i) : X(j, 0) = m, jw 6∈ I},

R− := {j ∈ N− : ∃t ∈ [0, tn ] such that (j, tw−1/2 ) ∈ R}.

By large deviation estimates for Bernoulli random variables, o n c b −w, b := |R± | − (1 − e−w−1/2 tn )|N± | < w0.1 . log P E E

(41)

Furthermore, observe that |N+ | − |N− | = |{j ∈ N(i) : (jw) ∈ I}| − |{j ∈ N(i) : X(j, 0) = m}| 2w + 1 1/2 w −1 + w Ym (w i, 0) ≥ (w + 1) − M

(42)

> −w1/2 Ymw (w−1 i, 0), b where the inequality on the second line follows by the definition of Y w . By (42), the definition of E, −1 −1/2 −w−1/2 tn | ≤ 2 w , and |1 − e 1t ≤T Ye w (w−1 i, tn ) ≥ 1t ≤T Ye w (w−1 i, 0) + 2w−1/2 |R+ | − 2w−1/2 |R− | , n

m

n

m

b it follows that on the event {E0 ; tn ≤ Ti = T ; E}, Yemw (w−1 i, tn ) ≥ Yemw (w−1 i, 0) − 2−1 w−1/2 Ymw (w−1 i, 0) − 4w−0.4 . Since log P[2−1 w−1/2 Ymw (w−1 i, 0) > Yemw (w−1 i, 0)] −w, this result and (41) implies (40). Proposition 3.1 now follows by iterating the estimate of Lemma 3.3. Proof of Proposition 3.1. First consider the case V = S and N ∈ N. Couple the discrete and continuum Schelling model as in Lemma 3.2. Let T, ∆t > 0. Conditioned on B, let c and v be the (random) constant and function, respectively, of Lemma 3.3. Recall that c depends on B and T , while v depends on B, T , ∆t, w and the error E(t0 ) with E as in Lemma 3.3. By Lemma 3.3, and with F0 and Ec∆t 2 for t0 = 0 as in that lemma, w 1E w 2 P kY (·, ∆t) − Y (·, ∆t)kL∞ (S N ) < c2 (∆t)3 + 2c(∆t)2 | F0 > 1E w 2 v(w). c∆t

c∆t

Iterating the result of Lemma 3.3, we get further that for any n ∈ N, h 1E w 2 P kY w (·, ne∆t) − Y (·, ne∆t)kL∞ (S N ) < ∆t(1 + c∆t)ne+1 − ∆t, c∆t

i n e ∈ {0, ..., n} | F0 > 1E w

c∆t2

v(w)n .

w We need n0 := dT /∆te time steps to reach time T , so conditioned on F0 and on the event Ec∆t 2 , with n0 probability at least v(w) and ∆t < 1/(100c),

kY w (·, n e∆t) − Y (·, n e∆t)kL∞ (S N ) < ∆t(1 + c∆t)n0 +1 − ∆t < (2ecT − 1)∆t,

36

∀e n ∈ {0, . . . , n0 }.

(43)

With probability converging to 1 as w → ∞, for any interval I = [∆te n, ∆t(e n + 1)] and node i ∈ SN , the total number of times during I at which the Poisson clock of a node in N(i) rings, is ≤ 10N wN/2 ∆t. Therefore, with probability converging to 1 as w → ∞, sup

sup

sup kY w (iw−1 , n∆t + d) − Y w (iw−1 , n∆t)kL∞ (S N ) ≤ 10N ∆t.

i∈SN 0≤n≤n0 d∈[0,∆t]

Combining this estimate with (43), for any given 0 > 0 and for all w sufficiently large as compared to 0 , " # 1E w 2 P sup sup kY (x, t) − Y w (x, t)kL∞ (S N ) ≤ (2ecT + 10N )∆t F0 > 1E w 2 (v(w)n0 − 0 ). c∆t

c∆t

x∈S N t∈[0,T ]

w Since P[Ec∆t 2 ] → 1 as w → ∞, for all sufficiently large w, "

P

sup x∈S N

w

sup kY (x, t) − Y (x, t)kL∞ (S N ) ≤ (2e

# cT

N

+ 10 )∆t > v(w)n0 − 20 .

t∈[0,T ]

We first make (2ecT + 10N )∆t arbitrarily small by decreasing ∆t, and then we make v(w)n0 − 20 arbitrarily close to 1 by sending 0 → 0 and w → ∞. It follows that supt∈[0,T ] kY w (·, t) − Y (·, t)kL∞ (S N ) → 0 in probability. By the Skorokhod representation theorem we can couple the model for different values of w, such that we obtain almost sure convergence. This concludes the proof in the case V = S. Now consider the case V = R and N = 1. Let > 0. For R > 2(−1 + 2) define the event ER by ER = {∃a− ∈ [−R/2 + 2, −−1 ], a+ ∈ [−1 , R/2 − 2] : Y1 (x, 0) >

sup

Ym0 (x, 0) + ,

m0 ∈{2,...,M }

∀x ∈ [a− − 2, a− ] ∪ [a+ , a+ + 2]}. Choose R sufficiently large such that P[ER ] > 1 − /2. Let Y (resp. Yb ) denote the solution of (5), (7) on R (resp. S = [−R/2, R/2]), and let Y w (resp. Yb w ) be given by (3) for the Schelling model on Z (resp. S). We will argue that we can couple Y, Yb , Y w , and Yb w such that with probability at least 1 − , Y w → Y uniformly on [−−1 , −1 ] × [0, −1 ]. This will be sufficient to complete the proof of the proposition since was arbitrary. By the convergence result for the torus proved above, we can couple Yb w and Yb such that Yb w |[−R/2,R/2]×[0,−1 ] converges uniformly to Yb |[−R/2,R/2]×[0,−1 ] . Furthermore, on ER we can couple Y and Yb such that Y |[a− ,a+ ] = Yb |[a− ,a+ ] , since the law of the initial conditions are the same, and since Lemma 2.16 implies that p(Y (x, t)) = p(Yb (x, t)) = 1 for all x ∈ [a− − 1, a− ] ∪ [a+ , a+ + 1] and t ≥ 0. To complete the proof of the proposition it is sufficient to prove that on ER we can couple Y w and Yb w such that Y w |[a− ,a+ ]×[1,−1 ] = Yb w |[a− ,a+ ]×[1,−1 ] with probability at least 1 − /2. Consider a coupling of Y w and Yb w such that the initial opinion of the nodes corresponding to the interval − [a − 1, a+ + 1] is identical for the models on Z and S, and such that the set of rings of Poisson clocks corresponding to this interval, i.e. the set {(i, t) ∈ R : (a− −1)w ≤ i ≤ (a+ +1)w}, is the same for the models on Z and S. We also assume that draws as described in (iii) of Section 1.3 are resolved in the same way. By Lemma 3.6, p(Y w (x, t)) = p(Yb w (x, t)) = 1 for all x ∈ [a− − 2, a− ] ∪ [a+ , a+ + 2] and t ≥ 0 with probability at least 1 − /2 for sufficiently large w. On the event that this happens Y w |[a− ,a+ ]×[0,−1 ] = Yb w |[a− ,a+ ]×[0,−1 ] for all t ∈ [0, −1 ], so we have obtained an appropriate coupling.

3.2

Limiting states for the one-dimensional discrete Schelling model

In this section we will first conclude the proof of Theorem 1.2. Then we will prove that the opinion of each node in the Schelling model in any dimension converges almost surely, and we will present a result on stable configurations in the higher-dimensional Schelling model. The main inputs to our proof of Theorem 1.2 are Propositions 2.14 and 3.1. We consider a coupling of the discrete and continuum Schelling model as in Proposition 3.1, and choose a sufficiently large t ≥ 0 such 37

that the limiting configuration of the continuum Schelling model described in Proposition 2.14 is almost obtained; more precisely, we choose t sufficiently large such that with high probability 0 is contained in an interval of length strictly larger than 1 on which p ◦ Y (·, t) is constant. Let m ∈ {1, . . . , M } denote the value of p ◦ Y (·, t) in this interval. Recall that by the scaling we used when defining Y w in (3), a time t for Y corresponds to time tw−1/2 for the discrete Schelling model. To conclude the proof it will be sufficient to prove that p ◦ Y w = m in the interval of length > 1 identified above until all nodes in this interval have changed opinion to m. We will first prove a lemma (Lemma 3.7) which says, roughly speaking, that p ◦ Y = m in the interval for a macroscopic time with high probability, and then we prove (Lemma 3.8) that conditioned on the event of Lemma 3.7, p ◦ Y = m in the interval throughout dw0.02 e time intervals of length w−0.01 with very high probability. In each step of the proof we allow the interval on which p ◦ Y = m to shrink slightly. For nodes i bounded away from the boundary of the interval, we can guarantee that p ◦ Y = m by using (among other properties) that the fraction of nodes in N(i) which have a bias towards m is strictly larger than 1/2 for (almost) the full time interval we consider; therefore the bias of i towards m will have an upwards drift and never become negative. For nodes i near the boundary of our interval, however, up to half of the nodes in N(i) may have a bias towards another opinion than m, so we do not necessarily have an upward drift, and the node may eventually get a bias towards another opinion. For such nodes we can guarantee that the bias will not become negative too fast, by using that the node typically has a strong bias towards m at the beginning of the time interval we consider. We show that the interval on which p ◦ Y = m shrinks sufficiently slowly, such that all nodes on a subinterval of length > 1 get opinion m before the interval vanishes. Define X X w 1X(j,t)=m0 sup Y m (i, t) := 1X(j,t)=m − m0 ∈{1,...,M }\m

j∈N(i)

j∈N(i)

! =w

1/2

Ymw (i/w, tw1/2 )

−

sup m0 ∈{1,...,M }\m

Ymw0 (i/w, tw1/2 )

.

w

Observe that if Y m (i, t) > 0 then m is the most common opinion in the neighborhood of node i at time t. In the statement and proof of the following lemma wI = {wx : x ∈ I} for any interval I ⊂ R and w ∈ N. Lemma 3.7. Couple the discrete and continuum Schelling model on V as described in Proposition 3.1, where V = R or V = S, and N = 1, M ∈ {2, 3, . . . }, and N = N∞ . Let {A1 , . . . , AM } be as defined in Proposition b to be the event that the set R\∪1≤m≤M Am has measure zero, and that each set Am can be 2.14, and define E b occurs, choose a ∈ V in a σ(B)-measurable way such that written as the union of intervals of length > 1. If E a ∈ ∪m∈{1,...,M } Am almost surely, and let m ∈ {1, . . . , M } be such that a ∈ Am . Let c1 , c2 ∈ (0, 1/10), let I 0 be the connected component of Am containing a, and let I ⊂ I 0 be the open interval with left (resp. right) b occurs, end-point at distance c2 from the left (resp. right) end-point of I 0 . Let E = Ecw1 ,c2 be the event that E w

−1/2

that I has a length between 1 + c2 and c−1 2 , and Y m (i, t) > 0 for all i ∈ (wI) ∩ Z and t ∈ [c1 b c ] = 1. Then limc2 →0 limc1 →0 limw→∞ P[E ∪ (E)

w−1/2 , c1 ].

Proof. First we give a brief outline of the proof. For small c1 and large w it holds with high probability (by −1/2 Proposition 3.1) that all nodes in wI have a large bias towards opinion m at time c1 w−1/2 . In particular, w −1/2 w−1/2 Y m (i, c1 w−1/2 ) 1 for nodes i in wI. We consider the system until (roughly speaking) the first w −1/2 time Tb > c1 w−1/2 at which Y m (i, Tb) ≤ 0 for some node i in wI; note that until this time occurs all nodes in wI will have a bias towards m. We show that Tb > c1 with high probability by arguing that each w −1/2 individual node i in wI is unlikely to be the first node in wI for which Y m (i, t) ≤ 0 if t ∈ [c1 w−1/2 , c1 ]. −1/2 Let the nodes i∗1 and i∗2 represent the two end-points of (wI) ∩ Z. If i = i∗1 and for t ∈ [c1 w−1/2 , Tb], the w evolution of w−1/2 Y m (i, t) is (approximately) bounded below by a Brownian motion with a weak downward drift starting from a large positive value, since we know that at least half of the neighbors of i∗1 have a bias 38

w

towards m. This implies that Y m (i, t) will not reach zero before time c1 1 with high probability. We argue similarly for i = i∗2 . If i is contained in wI and has distance Ω(w) from the boundary of wI, then w −1/2 Y m (i, t) has an upward drift for t ∈ [c1 w−1/2 , Tb], since the fraction of neighbors of i which have a bias w towards m is uniformly above 1/2; therefore Y m (i, t) is unlikely to get negative before time c1 . If i is close w to the boundaries of wI, but not equal to i∗1 or i∗2 , we conclude that Y m (i, t) is unlikely to get negative by comparing with i∗1 or i∗2 . b = 0, so we may assume P[E] b > 0. We will condition on the Note that the lemma clearly holds if P[E] b throughout the proof of the lemma. In other words, all objects we define are defined conditional event E b Let Ie ⊂ I 0 be the open interval with left (resp. right) end-point at distance c2 /2 from the left (resp. on E. e Let i∗ (resp. i∗ ) be the smallest (resp. largest) element of right) end-point of I 0 , and observe that I ⊂ I. 1 2 (wI) ∩ Z, and let R denote the set of rings as defined in (1). Define the following random variables A1 , A2 ∈ {0, 1, 2, . . . }, where | · | denotes the number of elements in a set −1/2

A1 := |{(j, t) ∈ R : c1

w−1/2 ≤ t ≤ c1 , j ∈ {i∗1 − w, . . . , i∗2 + w}|, −1/2

A2 := |{(j, t) ∈ R : j ∈ {i∗1 − w, . . . , i∗2 + w}, 0 ≤ t ≤ c1 −1/2

∃(j, t0 ) ∈ R such that c1

w−1/2 ,

(44)

w−1/2 ≤ t0 ≤ c1 }|,

Let A3 ∈ R be a random variable which is equal to the infimum in [0, 1] such that the following inequalities are satisfied 1 − A3 w−1/2 , 2 1 ≤ + A3 w−1/2 , 2

|{i ∈ N (i∗k ) ∩ (wI) : X(i, 0) 6= m}|w−1 ≥

k = 1, 2,

|{i ∈ N (i∗k ) \ (wI) : X(i, 0) = m}|w−1

k = 1, 2.

Define A4 ∈ R by

(45)

−1/2 A4 := inf Ymw (x, c1 ) − x∈I

X

−1/2

Ymw0 (x, c1

) .

m0 6=m

Let E be the event 1 E := ∀i ∈ (wI), |{j ∈ N (i) ∩ (wI) : X(i, 0) 6= m}|w−1 > − w−1/4 2 1 ∩ |{j ∈ N (i) \ (wI) : X(i, 0) = m}|w−1 < + w−1/4 . 2 e=E ecw ,c by Define the event E 1 2 e = {A1 < 2c1 c−1 w} ∩ {A2 < 2c1/2 c−1 w1/2 } ∩ {A3 < c−1/10 } ∩ {A4 > 2} ∩ {1 + c2 < λ(I) < c−1 } ∩ E. (46) E 2 1 2 1 2 The probability of the first, second and sixth event on the right side of (46) converge to 1 as w → ∞ for any fixed c1 , c2 ∈ (0, 1/10) if we condition on the fifth event {1 + c2 < λ(I) < c−1 2 }. For M = 2 the probability of the third event on the right side of (46) converges to a constant as w → ∞, and it converges to 1 when first w → ∞ and then c1 → 0. For M > 2 the probability of the P third event on the right side of (46) converges to 1 as w → ∞. It is immediate from (5) that Ym (·, t) − m0 6=m Ym0 (·, t) → ∞ uniformly on I as t → ∞. Therefore it follows from Proposition 3.1 that the probability of the fourth event on the right side of (46) converges to a constant as w → ∞, and it converges to 1 as first w → ∞ and then c1 → 0. The probability of the fifth event on the right side of (46) is independent of w and c1 , and converges to 1 as c2 → 0. Therefore e = 1. lim lim lim P[E]

c2 →0 c1 →0 w→∞

39

(47)

−1/2

For any j ∈ N let t1j be the jth smallest element of {t ≥ c1 w−1/2 : ∃x ∈ N (i∗1 ) such that (x, t) ∈ R}. w w w Then define i1j ∈ N (i∗1 ) such that (i1j , t1j ) ∈ R, and define Rj1 := (Y m (i∗1 , tj )−Y m (i∗1 , t− j ))1(i1 >i∗ )∨(Y (i1 ,t1 )<0) . 1

j

m

j

j

Define (i2j , t2j ) (resp. Rj2 ) exactly as (i1j , t1j ) (resp. Rj1 ), but with i∗2 in place of i∗1 , and where we require ij < i∗2 instead of ij > i∗1 in the indicator in the definition of Rj2 . Now define the following stopping times Tj for j = 1, 2, 3 X −1/2 T1 := inf t ≥ c1 w−1/2 : Rj1 ≤ −w1/2 , j : t1j ≤t X −1/2 Rj2 ≤ −w1/2 , T2 := inf t ≥ c1 w−1/2 : (48) j : t2j ≤t n −1/2 T3 := inf t ≥ c1 w−1/2 : ∃i ∈ wI such that i ∈ {i∗1 + c2 w + 1, i∗1 + c2 w + 2, . . . , i∗2 − c2 w − 1} o w and Y m (i, t) ≤ 0 . We will now argue that w

Y (i, t) > 0

∀i ∈ wI

if

e occurs. t < T1 ∧ T2 ∧ T3 and E

(49)

e occurs and t < T1 ∧ T2 ∧ T3 , then Y(i, t) > 0 for any To prove this it is sufficient to show that if E i ∈ {i∗1 , . . . , i∗1 + c2 w} ∪ {i∗2 − c2 w, . . . , i∗2 }, since the inequality Y(i, t) > 0 clearly holds for other i by the definition of T3 . For i ∈ {i∗1 , . . . , i∗1 + c2 w} this follows by first observing that the bias is positive for all nodes −1/2 in N(i) \ N(i∗1 ) throughout [c1 w−1/2 , t], so X w w w w −1/2 −1/2 Y m (i, t) − Y m (i, c1 w−1/2 ) ≥ Y m (i∗1 , t) − Y m (i∗1 , c1 w−1/2 ) ≥ Rj1 , j : t1j ≤t w

w

−1/2

and then using the definition of T1 and Y m (i∗1 , c1 w−1/2 ) ≥ A4 w1/2 > 2w1/2 to conclude that Y m (i, t) > 0. We argue similarly for i ∈ {i∗2 − c2 w, . . . , i∗2 }, and can conclude that (49) holds. e ∩ {c1 < T1 ∧ T2 ∧ T3 } ⊂ E, Since (49) implies that E e c ] + P[E; e T1 ≤ T2 ∧ T3 ; T1 ≤ c1 ] + P[E; e T2 ≤ T1 ∧ T3 ; T2 ≤ c1 ] + P[E; e T3 ≤ T1 ∧ T2 ; T3 ≤ c1 ]. (50) P[E c ] ≤ P[E We will bound the probability of each term on the right side separately. By (47), and since the second and third term on the right side have the same probability, it is sufficient to bound the second term and the fourth term on the right side of (50). First we bound the second term on the right side of (50). Recall the definition of (i1j , t1j )j∈N above. Define ej to be equal to 2 (resp. −2) iff both the conditions X(i1 , 0) 6= m (resp. X(i1 , 0) = m) and i1 ∈ (wI) (resp. R j j j ej = 0 otherwise. Observe that R1 , R ej ∈ {−2, 0, 2} for all j ∈ N, and i1j 6∈ (wI)) are satisfied, and define R j ej ≤ R1 on the event E e if t1 < T1 ∧ T2 ∧ T3 and if there is no t0 such that (i1 , t0 ) is contained in the set that R j j j 1/2 e considered when defining A2 in (44). Therefore, using that A1 < 2c1 c−1 w and A2 < 2c c−1 w1/2 on E, 1

2

1E;T e 1 ≤T2 ∧T3

−1/2

c1

inf w−1/2 ≤t≤c1 ∧T1

X

Rj1 ≥ 1E;T e 1 ≤T2 ∧T3

j : t1j ≤t

min −1

1≤k≤2c1 c2 w

k X

2

ej − 2c1/2 c−1 w1/2 . R 1 2

(51)

j=1

e (in particular, the requirement on A3 ) and for w sufficiently large, conditioned on σ((X|t=0 ) On the event E ej )j∈N stochastically dominates a sequence (R bj )j∈N of i.i.d. random variables which are equal the sequence (R −1/10 −1/2 −1/10 −1/2 1 1 to 2 (resp. −2) with probability 2 ( 2 − c1 w ) (resp. 12 ( 12 + c1 w )), and equal to 0 otherwise. 40

ej conditioned on i∗ . Letting W = (Wt )t≥0 be a This follows since the sequence (i1j , t1j ) is independent of R 1 Pdtwe b −1/10 Brownian motion with drift −2c1 and Var[Wt ] = 2t the process t 7→ w−1/2 j=1 R j converges in law to (Wt )t≥0 . Therefore e T1 ≤ T2 ∧ T3 ; T1 ≤ c1 ] lim lim lim P[E;

c2 →0 c1 →0 w→∞

e T1 ≤ T2 ∧ T3 ; ≤ lim lim lim P E; c2 →0 c1 →0 w→∞

X

inf

−1/2 −1/2 c1 w ≤t≤c1 ∧T1

e T1 ≤ T2 ∧ T3 ; ≤ lim lim lim P E; c2 →0 c1 →0 w→∞

min

1/2

inf

c2 →0 c1 →0

0≤t≤2c1 c−1 2

k X

1≤k≤2c1 c−1 2 w

" ≤ lim lim P

Wt − 2c1 c−1 2 <−

1 2

j : t1j ≤t

1 Rj1 < − w1/2 2

ej − 2c1/2 c−1 w1/2 R 1 2

j=1

1 1/2 <− w 2

#

= 0. In order to bound the fourth term on the right side of (50), a union bound and the upper bound on the e implies that it is sufficient to prove that for fixed c1 , c2 ∈ (0, 1/10) and fixed length of I on the event E, i ∈ [−c2 w, c2 w] ∩ Z, log P[E 0 ; T ≤ c1 ] −w, e ∩ {T = T3 ≤ T1 ∧ T2 } ∩ {i ∈ {i∗1 + c2 w + 1, i∗1 + c2 w + 2, . . . , i∗2 − c2 w − 1}}, E 0 := E T := inf{t ≥

−1/2 c1 w−1/2

:

w Y m (i, t)

(52)

≤ 0},

where the implicit constant is independent of w, but may depend on c1 , c2 . We prove this by a similar approach as our bound for the second term on the right side of (50), and will therefore only give a brief justifica−1/2 tion. For any j ∈ N let t0j be the jth smallest element of {t ≥ c1 w−1/2 : ∃x ∈ N(i) such that (x, t) ∈ R}. w w e0 to Then define i0j ∈ N(i) such that (i0j , t0j ) ∈ R, and define Rj0 := (Y m (i, t0j ) − Y m (i, (t0j )− )). Define R j 0 0 0 be equal to 2 (resp. −2) iff both the conditions X(ij , 0) 6= m (resp. X(ij , 0) = m) and ij ∈ (wI) (resp. e0 = 0 otherwise. As above, using A1 < 2c1 c−1 w and A2 < 2c1/2 c−1 w1/2 i0j 6∈ (wI)) are satisfied, and define R j 2 1 2 on E 0 ,

1E 0

−1/2

c1

inf w−1/2 ≤t≤c1 ∧T

w

w

−1/2

Y m (i, t) − Y m (i, c1

w−1/2 ) ≥ 1E 0

min −1

1≤k≤2c1 c2 w

k X

ej0 − 2c1/2 c−1 w1/2 . R 1 2

j=1

e0 )j∈N on the event E 0 , stochastically dominates a sequence (R b0 )j∈N of i.i.d. In this case the sequence (R j j random variables which are equal to 2 (resp. −2) with probability 21 (1 + c2 )( 12 − w−1/4 ) (resp. 12 (1 − c2 )( 12 + w−1/4 )), and equal to 0 otherwise, where the terms (1 ± c2 ) follow from the condition λ(I) > 1 + c2 and the terms ( 21 ± w−1/4 ) follow from the definition of the event E. For sufficiently large w the process Pdtwe b 1 0 0 0 t 7→ w−1/2 j=1 R j − 2 c2 dtwe stochastically dominates a Brownian motion W = (Wt )t≥0 satisfying W0 = 0 0 and Var(Wt ) = 2t. Therefore " # w w −1/2 −1/2 0 0 −1/2 lim P[E ; T ≤ c1 ] ≤ lim P E ; inf ) < −w Y m (i, t) − Y m (i, c1 w w→∞

w→∞

−1/2

c1

w−1/2 ≤t≤c1 ∧T

" ≤ lim P w→∞

inf

0≤t≤2c1 c−1 2

Wt −

1/2 2c1 c−1 2

which implies (52). 41

1 + c2 tw1/2 2

# < −1 = 0,

Lemma 3.8. Consider the setting described in Lemma 3.7, and let I, c1 , c2 , and E be as in that lemma. Define open intervals In for n ∈ {0, . . . , dw0.02 e} inductively by I0 := I, and by letting In ⊂ In−1 be the open interval such that the left (resp. right) end-point of In has distance w−0.1 from the left (resp. right) end-point of In−1 . For n ∈ {1, . . . , dw0.02 e} define tn := nw−0.01 , and let En be the following event w

−1/2

En = {Y m (i, t) > 0, ∀i ∈ (wIn ), ∀t such that c1 w−1/2 ≤ t ≤ tn }. Then log P E ∩ En−1 ∩ Enc −w for all n ∈ {1, . . . , dw0.02 e}, where the implicit constant is independent of w and n, but may depend on all other constants, including c1 and c2 . Proof. We first give a brief outline of the proof. Note that in this lemma we consider only times of order 1, rather than times of order w−1/2 as in the remainder of the paper; this means that a uniformly positive fraction of the Poisson clocks have been ringing. As in the proof of Lemma 3.7, we consider some time T w which represents the first time at which Y m (i, t) ≤ 0 for some i in In , and we show that for each i in In it is very unlikely that i is the first node for which this happens for T ≤ tn . We assume the event E ∩ En−1 occurs. We divide N(i) into three parts (see N1 (i), N2 (i), N3 (i) below). For each k ∈ {1, 2, 3} we can w obtain a good (lower) bound on the contribution to Y m (i, tn−1 ) coming from nodes in Nk (i), since we know approximately how many Poisson clocks which have been ringing before time tn−1 , and since nodes in In−1 −1/2 (corresponding to N1 (i) ∪ N3 (i), plus maybe part of N2 (i)) have a bias towards m during [c1 w−1/2 , tn−1 ] on the event E ∩ En−1 . Conditioning on the fraction of nodes in Nk (i), k ∈ {1, 2, 3}, of opinion m at time w tn−1 , we compare the evolution of Y m (i, t) on [tn−1 , T ] to a random walk of a certain step size distribution, and we argue that this random walk is unlikely to hit 0 before time tn , which completes the proof. b > 0. We assume throughout the proof that E b occurs; As in the proof of Lemma 3.7 we may assume P[E] b in particular, some variables we define may exist only conditioned on E. Define the following stopping times −1 for i ∈ [a − c−1 2 w, a + c2 w] ∩ Z w

Ti := inf{t ≥ tn−1 : Y m (i, t) ≤ 0}, c−1 2 w, a

T := inf{Ti : i ∈ (wIn )},

c−1 2 w]

Fix i ∈ [a − + ∩ Z. By a union bound it is sufficient to prove the following estimate, where the implicit constant is independent of w, n, and i log P Eni ; Ti < tn −w, Eni := E ∪ En−1 ∪ {i ∈ (wIn )} ∪ {Ti = T < tn }. (53) Divide the neighborhood N(i) into three disjoint parts Nk (i) for k = 1, 2, 3 satisfying the following requirements; existence of appropriate neighborhoods is immediate by using the definition of In and λ(In ) > 1 |N1 (i)| = w,

|N2 (i)| = w − w0.9 + 1,

N1 (i) ⊂ {j ∈ N(i) : j ∈ (wIn )},

|N3 (i)| = w0.9 ,

N3 (i) ⊂ {j ∈ N(i) : j ∈ (wIn−1 )}.

w,k

For any t ≥ 0 define Y m (i, t) for k = 1, 2, 3 by X X 1 w,k Y m (i, t) := 1X(j,t)=m − |Nk (i)| − M j∈Nk (i)

j∈Nk (i)

−1/2

M −1 1X(j,t)6=m − |Nk (i)| . M

On E ∩ En−1 the nodes in N1 (i) have bias m throughout [c1 w−1/2 , tn−1 ]. For each node the probability that its clock rings during [0, tn−1 ] is (1 − e−tn−1 ), and when this happens for the first time for some −1/2 t ∈ [c1 w−1/2 , tn−1 ] the node updates its opinion to m iff its current opinion is different. When this w,1 happens Y m increases by 2. Except on an event of exponentially small probability, for large w the number −1/2 1 of nodes with an opinion different from m at time c1 w−1/2 differ from MM−1 w by at most 100 w1/2+1/100 , −1/2 −1/2 , tn−1 ] differ from and the number of nodes in N1 (i) whose Poisson clock rings during t ∈ [c1 w 1 (1 − e−tn−1 )w by at most 100 w1/2+1/100 . Therefore w,1 M −1 −tn−1 1/2+1/100 log P E; En−1 ; Y m (i, tn−1 ) − 2 (1 − e )w > w −w, (54) M 42

w,2

By a similar argument the following inequalities hold. Note that we only get a lower bound for Y m (i, tn−1 ) since we do not know the bias of the nodes in N2 (i) throughout [0, tn−1 ] 1 w,2 log P E; En−1 ; Y m (i, tn−1 ) < −2 (1 − e−tn−1 )(w − w0.9 ) − w1/2+1/100 −w, M (55) w,3 M −1 −tn−1 0.9 )w > w1/2+1/100 −w. log P E; En−1 ; Y m (i, tn−1 ) − 2 (1 − e M P3 w w,k First assume M > 2. By (54) and (55), and by using Y m (i, tn−1 ) = k=1 Y m (i, tn−1 ), M −2 w −tn−1 P E; En−1 ; i ∈ (wIn ); Y m (i, tn−1 ) < (1 − e )w −w. M

(56)

Defining A1 := |{(j, t) ∈ R : j ∈ N(i), t ∈ [tn−1 , tn ]}| M −2 4M (1

w

−tn−1

w

)w] −w. Using this estimate, (56) and Y m (i, tn ) ≥ Y m (i, tn−1 ) − 2A1 −e we have log P[A1 > we obtain (53) for the case M > 2. In the remainder of the proof assume M = 2. Defining T w := {tn−1 + w−0.3 , tn−1 + 2w−0.3 , . . . , tn−1 + 0.3−0.01 dw ew−0.3 }, observe by a union bound that for all sufficiently large w X 1 w w w w i i 0.85 P En ; inf Y m (i, t) ≤ 0 ≤ P En ; s < Ti ; Y m (i, s) < Y m (i, tn−1 ); Y m (i, tn−1 ) > w tn−1 ≤t≤Ti ∧tn 2 s∈T w X + P(Eni ; |{(j, t) ∈ R : j ∈ N(i), t ∈ [s − w−0.3 , s]}| > w0.8 ) s∈T w

w + P Eni ; Y m (i, tn−1 ) ≤ w0.85 . The logarithm of the second sum on the right side is −w. The logarithm of the third term on the right side is also −w by (54)-(55), so to prove (53), and thereby complete the proof of the lemma, it is sufficient to prove the following estimate for any s ∈ T w 1 w w w 0.85 i −w. (57) log P En ; s < Ti ; Y m (i, s) < Y m (i, tn−1 ); Y m (i, tn−1 ) > w 2 Fix s ∈ T w , define the following random variables R+ := |{j ∈ N+ (i) : ∃t ∈ (tn−1 , s] such that (j, t) ∈ R}|,

N+ (i) := {j ∈ N1 (i) : X(j, tm−1 ) 6= m},

R− := |{j ∈ N− (i) : ∃t ∈ (tn−1 , s] such that (j, t) ∈ R}|,

N− (i) := {j ∈ N2 (i) ∪ N3 (i) : X(j, tm−1 ) = m},

and observe that 1 w 1 w w Eni ∩ {s < Ti } ∩ Y m (i, s) < Y m (i, tn−1 ) ⊂ Eni ∩ {s < Ti } ∩ R+ − R− < − Y m (i, s) . 2 4

(58)

w,k

By the definition of Y m (i, tn−1 ), |N+ (i)| =

1 w,1 (w − Y m (i, tn−1 )), 2

|N− (i)| =

1 w,2 w,3 (w + Y m (i, tn−1 ) + Y m (i, tn−1 )). 2

By large deviation estimates for Bernoulli random variables log P[R+ < (1 − e−(s−tn−1 ) )|N+ (i)| − w1/2+1/100 ] −w, log P[R− > (1 − e−(s−tn−1 ) )|N− (i)| + w1/2+1/100 ] −w, 43

(59)

so using (59) and 1 − e−(s−tn−1 ) < 2w−0.01 , for all sufficiently large w, 1 w log P Eni ; s < Ti ; R+ − R− < − Y m (i, s) −w. 4 We obtain (57) by using this estimate and (58). Proof of Theorem 1.2. Couple the discrete and continuum Schelling model as in Proposition 3.1. Almost surely there is an m∗ ∈ {1, 2} such that 0 ∈ Am∗ . Let I 0 ⊂ Am∗ (resp. I w ⊂ Aw m∗ ) be the connected 0 w w component of Am∗ (resp. Aw m∗ ) containing to origin, where I is empty if 0 6∈ Am∗ . Let > 0. Define I ⊂ I 0 0 c by I := {x ∈ I : dist(x, (I ) ) > }. By translation invariance in law of both the discrete and continuum Schelling model it is sufficient to prove that P[I ⊂ I w ] > 1 − for all sufficiently large w ∈ N. Consider the objects defined in the statement of Lemma 3.7 for a = 0. Since M = 2 we know by b = 1, so limc →0 limc →0 limw→∞ P[E] = 1. Let c1 , c2 ∈ (0, 1/2) be such that Proposition 2.14 that P[E] 2 1 limw→∞ P[E] > /100 and such that c2 . Observe that if n o −1/2 E 0 := ∀i ∈ (wI) ∩ Z, ∃t ∈ (c1 w−1/2 , w0.01 ) such that (i, t) ∈ R and the events En for n ∈ {0, . . . , dw0.02 e} are defined as in Lemma 3.8, then E 0 ∩ Edw0.02 e ⊂ {I ⊂ I w }. It follows by a union bound that dw0.02 e 0 c

w

c

P[I 6⊂ I ] ≤ P[(E ) ] + P[E ] +

P[E; E0c ]

+

X

P[E; En−1 ; Enc ].

(60)

n=1

The first term on the right side of (60) converges to 0 as w → ∞, and the second term on the right side of (60) is smaller than /3 for all sufficiently large w by our choice of c1 and c2 . The third term on the right side of (60) is identically equal to zero. The last term on the right side of (60) is smaller than /3 for all sufficiently large w by Lemma 3.8. Therefore P[I ⊂ I w ] > 1 − for all sufficiently large w, which concludes the proof. The following proposition says that the opinion of the nodes in the discrete Schelling model converge almost surely. For the case M = 2 it was proved in [TT14, Theorem 1.1], and our proof proceeds similarly as the proof for this case. See e.g. [Mor95, GH00] for other related results. Observe that this result is not used in our proof of our main results, and is included only as a statement of independent interest. Proposition 3.9. Consider the Schelling model on either ZN or on the torus SN where (in the notation on Section 1.3) N is invariant upon reflection through the origin, N ∈ N, and M ∈ {2, 3, . . . }. For each node i the opinion X(i, t) converges almost surely as t → ∞, i.e., for each node i there is a random time T > 0 such that X(i, t) = X(i, T ) for all t ≥ T . Proof. The proof is identical for the two cases ZN and SN . We will only present it for the case of ZN , but the result for SN follows by replacing ZN by SN throughout the proof. Let E be the set of undirected edges of the graph on which the Schelling model takes place, i.e. (i, j) = (j, i) ∈ E for i, j ∈ ZN iff j ∈ N(i) (equivalently, since N is invariant upon reflection through the origin, i ∈ N(j)). For each i, j ∈ ZN such that (i, j) ∈ E, associate a positive real number wij , such that for any i, j, k for which (i, j), (i, k) ∈ E wij (2w + 1)N + 1 < . wik (2w + 1)N − 1

(61)

P Choose the wij ’s such that (i,j)∈E wij < ∞. The existence of appropriate wij satisfying these properties follows by [TT14, Proposition 3.4], since the degree of each node is bounded by (2w + 1)N , and since our graph satisfies the growth criterion considered in [TT14]. For each i ∈ ZN and t ≥ 0 define Jti by X X Jti := wij 1X(j,t)6=X(i,t) − wij 1X(j,t− )6=X(i,t− ) . j∈N(i)

j∈N(i)

44

If t is a time at which the Poisson clock of node i rings, the first (resp. second) term on the right side expresses how many neighbors of i that disagree with i after (resp. before) i updates its opinion at time t. Assume the clock of node i rings at time t, and consider the three rules (i)-(iii) from Section 1.3 that i follows when updating its opinion. By our constraint (61), Jti < 0 if one of the rules (i) or (iii) apply, while Jti = 0 if i does not change its opinion, which is the case when (ii) applies. Given a node i and the value of wij for all j ∈ N(i), the number of possible values for Jti is finite. It follows that we can find a real number i > 0, such that we have either Jti < −i or Jti = 0 for all times t ≥ 0 for which the clock of node i rings. Next define the Lyapunov function L : [0, ∞) → R+ by X Lt = wij 1X(i,t)6=X(j,t) . (i,j)∈E

Note that Lt < ∞ for all t ≥ 0 by our assumption of summability of wij . Since Lt − Lt− = Jti if the clock of node i rings at time t, Lt is decreasing in t. It follows that there exists some L ≥ 0 such that limt→∞ Lt = L. Fix i ∈ ZN , and let T > 0 be such that Lt − L < i for all t ≥ T . Since Lt − Lt− = Jti for any time t at which the clock of node i rings, we see that Jti = 0 for all times t > T . It follows that i never updates its opinion after time T , which completes the proof of the lemma. We end the section with a result which may be related to the limiting opinions of the Schelling model on ZN for N ≥ 2. We say that A ⊂ ZN is connected if, for any two i, j ∈ A, there is an n ∈ N and a sequence {ik }0≤k≤n such that i0 = i, in = j, ik ∈ A and kik − ik−1 k1 ≤ 1 for all k ∈ {1, ..., n}. We say that A ⊂ ZN is stable if all nodes of ZN agree with the most common opinion in their neighborhood when all nodes in A have opinion 1 and all nodes in ZN \A have opinion 2. Note that the definition of stability depends on N and w. The diameter of a set A ⊂ ZN is defined by supi,j∈A ki − jk∞ . We say that A is a smallest stable shape for the Schelling model if it is a connected stable subset of ZN of minimal diameter. We observed before the statement of Lemma 2.18 that in the one-dimensional Schelling model on Z the final configuration of opinions consists of monochromatic intervals of length at least w + 1. We also observe that the smallest stable shapes for the Schelling model on Z are sets A ⊂ Z consisting of w + 1 consecutive integers. In particular, this means that all nodes are part of a monochromatic stable shape in the final configuration. By Theorem 1.2 the blocks with constant opinion in the final configuration of the Schelling model on Z or S have length of order w. One might guess that the smallest stable shape is related to the diameter of a typical cluster in the limiting configuration of opinions also in higher dimensions. There exist stable configurations where the cluster sizes are smaller than the diameter of a stable shape (e.g. a checkerboard configuration when N has the shape of a cube), but these seem unlikely to occur since they are typically unstable, in the sense that changing the opinion of a small number of nodes may cause a large cluster of nodes to obtain the same opinion. We thank Omer Tamuz for suggesting the approach used in the upper bound of the following proposition. Proposition 3.10. Let N > 1, p ∈ [1, ∞], and assume N = Np := {x ∈ RN : kxkp < 1}. The diameter d of a smallest stable shape for the Schelling model satisfies w2 d wN +1 , where the implicit constants can be chosen independently of p and w, but the constant in the upper bound may depend on N . Proof. We will prove the lower bound w2 d by induction on the dimension N . We will only do the case p < ∞, but the case p = ∞ can be done in exactly the same way. We start with the case N = 2. See Figure 8 for an illustration. Assume A is a stable shape, and define ik1 and i2 as follows for k = {1, . . . , bw/2c} i2 := min{i2 ∈ Z : ∃i1 ∈ Z such that (i1 , i2 ) ∈ A},

ik1 := min{i1 ∈ Z : (i1 , i2 + k − 1) ∈ A}.

Then define ik := (ik1 , i2 + k − 1) ∈ A. We will prove by induction on k that ik1 − ik+1 ≥ w − k for all 1 bw/2c 2 k ∈ {1, . . . , bw/2c}, which is sufficient to obtain the lower bound w d since it implies that i11 −i1 w2 . 2 For any i = (i1 , i2 ) ∈ Z define N− (i) = {(i01 , i02 ) ∈ N(i) : i02 < i2 ∨ ((i02 = i2 ) ∧ (i01 < i1 ))}, 45

N+ (i) = N(i) \ N− (i).

N(i2 ) A ⊂ Z2

i3 = (i31 , i2 + 2)

i2 = (i21 , i2 + 1)

i1 = (i11 , i2 )

Figure 8: Illustration of the lower bound in Proposition 3.10 for N = 2, w = 6, and p = 1. The points of Z2 marked in blue is a subset of a stable shape A, while elements of Z \ A are shown in red. The green dotted line separates N− (i2 ) (lower part) and N+ (i2 ) (upper part).

First let k = 1. By definition of i2 and i11 , A ∩ N− (i1 ) = ∅. Since i1 agrees with the most common opinion in its neighborhood this implies A ∩ N(i1 ) = N+ (i1 ). In particular, this implies by the definition of i21 and since (i11 − w + 1, i2 + 1) ∈ N+ (i1 ) that i11 − i21 ≥ w − 1. Now assume k > 1 and that i`1 − i`+1 ≥ w − ` for ` ∈ {1, . . . , k − 1}. This assumption implies by the 1 definition of i`1 that A ∩ N− (ik ) ⊂ {(ik1 + w − k + 1, i2 + k − 2), (ik1 + w − k + 2, i2 + k − 2), . . . , (ik1 + w − 1, i2 + k − 2)}, in particular, |A ∩ N− (ik )| ≤ k. Since ik ∈ A agrees with the most common opinion in its neighborhood by stability of A, we must have |N+ (ik ) \ A| ≤ k − 1. By N = Np for p ∈ [1, ∞), {(ik1 − w + 1, i2 + k), (ik1 − w + 2, i2 + k), . . . , (ik1 − w + k, i2 + k)} ⊂ N+ (ik ), where we note that the set on the left side has k elements. Since |N+ (ik ) \ A| ≤ k − 1 this implies ik1 − ik+1 ≥ 1 w − k. This completes our proof by induction, and hence completes the proof of the lower bound for d in the case when N = 2. Now assume the lower bound w2 d has been proved for dimension 2, . . . , N − 1 for some N > 2. We want to show that it also holds in dimension N . Define iN := min{iN ∈ Z : ∃i1 , . . . iN −1 ∈ Z such that (i1 , i2 , . . . , iN ) ∈ A}. Then {i = (i1 , . . . , iN ) ∈ A : iN < iN } = ∅, so since all elements of A ∩ {i = (i1 , . . . , iN ) ∈ ZN : iN = iN } agree with the most common opinion in its neighborhood and by, for any i0 = (i01 , . . . , i0N ) ∈ ZN , symmetry of N(i0 ) upon reflection through the plane iN = i0N , the following (N − 1)-dimensional set must be stable if the neighborhood of any j ∈ ZN −1 is given by {j 0 ∈ ZN −1 : kj − j 0 kp < w} {(i1 , . . . , iN −1 ) ∈ ZN −1 : (i1 , . . . , iN −1 , iN ) ∈ A}. By the induction hypothesis this set has diameter w2 , so A also has diameter w2 . This concludes the proof of the lower bound by induction. Now we will prove the upper bound d wN +1 . Let r = 22N wN +1 , and define A0 = {i ∈ ZN : kik∞ ≤ r. Let all nodes in A0 have opinion 1, and let all nodes in ZN \A0 have opinion 2. We define decreasing sets An ⊂ ZN , n ∈ N, by induction as follows. For each n ∈ N we choose one element of i ∈ An−1 which does not agree with the most common opinion in its neighborhood, we change the opinion of this node to 2, and we define An = An−1 \{i}. We continue this procedure until An = ∅ or until all nodes in An agree with the 46

most common opinion in its neighborhood. Let n e denote the time at which the process terminates. We will prove by contradiction that Ane 6= ∅. This will imply the existence of a stable shape of diameter r wN +1 . Define En = {(i, j) : i ∈ An , j ∈ ZN \An }. By our choice of the node i in each step, the sequence (|En |)1≤n≤en is strictly decreasing. Assuming Ane = ∅ this implies |A0 | ≤ |E0 |. We have |A0 | = (2r + 1)N . We can find a constant CN > 0 depending on N , such that there are < CN (2r + 1)N −1 w nodes in A0 which have a neighbor in ZN \A0 . By using this and that |N(i)| ≤ (2w + 1)N for any i ∈ SN , we get |E0 | ≤ CN (2r + 1)N −1 w(2w + 1)N . Using |A0 | ≤ |E0 | these estimates imply (2r + 1) ≤ CN w(2w + 1)N . This is a contradiction to our definition of r, and we conclude that Ane 6= ∅.

4

Open problems

This paper explains the limiting configuration of opinions in the Schelling model when N = 1 and M = 2 (see Theorem 1.2). One open problem is to understand the limiting configuration of opinions in cases where N ≥ 2 and/or M > 2. In particular, it remains an open question to understand the following situations: (i) SN for N ≥ 2, (ii) ZN for N ≥ 2, and (iii) S or Z for M > 2. Case (i) could possibly be understood by studying the long-time behavior of solutions Y of the initial value problem (5), (7) for N ≥ 2. If we knew that the limit Y (x) := limt→∞ p(Y (x, t)) exists for almost every x ∈ S N a.s., the field (Y (x))x∈S N would likely describe law of the limiting opinions in the discrete model. Observe that non-trivial limiting configurations (i.e., limiting configurations with more than one limiting opinion) happen with positive probability, e.g. if the torus width R is at least 3, and the initial data are such that p(Y ((x1 , . . . , xN ), 0)) equals 1 (resp. 2) for x1 ∈ [0, 1] (resp. x1 ∈ [R − 5/4, R − 1/4]). For such initial data we will have p(Y ((x1 , . . . , xN ), t)) equal to 1 (resp. 2) for x1 ∈ [0, 1] (resp. x1 ∈ [R − 5/4, R − 1/4]) and all t ≥ 0. The continuum Schelling model (5), (7) may be less helpful for understanding case (ii). Even if we had established existence and uniqueness of solutions of (5), (7) on RN for N ≥ 2 (see Theorem 2.1), the solution Y may be of only limited help for understanding the final configurations of opinions in the discrete model. Observe that there are no bounded continuum stable shapes, where a continuum stable shape is a set D ⊂ RN which is such that if m ∈ {1, . . . , M }, t0 ≥ 0 and p(Y (x, t0 )) = m for all x ∈ D then p(Y (x, t)) = m for all t ≥ t0 (see above Proposition 3.10 for the discrete definition). Since there are no bounded continuum stable shapes, we expect that the limit limt→∞ p(Y (x, t)) a.s. does not exist for any fixed x ∈ RN , at least when M = 2. The existence of this limit is necessary in order for the continuum Schelling model to describe the limiting configurations of opinions in the discrete model. The continuum Schelling model (5), (7) is related to the discrete Schelling model upon rescaling the lattice by w−1 . If we proved a scaling limit result for the discrete model by proving convergence of the opinions in the continuum model, the diameter of a typical cluster in the discrete model would therefore be of order w. Figure 5 suggests that the diameter of the limiting clusters on Z2 grow superlinearly in w. The typical cluster size may be related to the size of the smallest (discrete) stable shape for the model; see Proposition 3.10 for upper and lower bounds on the diameter of the smallest stable shape. An independently interesting problem (which involves no probability) is to resolve the sizable discrepancy between the upper and lower bounds in Proposition 3.10. One could try to explicitly construct the minimal stable shape for each given w, and compute its size. Case (iii) could be understood by studying the long-time behavior of the solutions of (5), (7) for N = 1. We believe Theorem 1.2 also holds for M > 2, i.e., the limiting opinions in the Schelling model have a scaling limit upon rescaling the lattice by w−1 , and the limiting law can be described by an M -tuple (A1 , . . . , AM ), where the sets Am have a.s. disjoint interior and can be written as the union of intervals each of length larger than 1 a.s. This version of Theorem 1.2 with M > 2 would be immediate from the approach in Section 3.2 if we had established the corresponding version of Proposition 2.14 (see Remark 2.15).

47

References [Adl10]

Robert J. Adler. The geometry of random fields, volume 62 of Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2010. Reprint of the 1981 original [ MR0611857].

[AMR88] R. Abraham, J. E. Marsden, and T. Ratiu. Manifolds, tensor analysis, and applications, volume 75 of Applied Mathematical Sciences. Springer-Verlag, New York, second edition, 1988. [AP86]

Kenneth S. Alexander and Ronald Pyke. A uniform central limit theorem for set-indexed partialsum processes with finite variance. Ann. Probab., 14(2):582–597, 1986.

[Bas95]

Richard F. Bass. Probabilistic techniques in analysis. Probability and its Applications (New York). Springer-Verlag, New York, 1995.

[BB01]

Richard F. Bass and Krzysztof Burdzy. The supremum of Brownian local times on H¨older curves. Ann. Inst. H. Poincar´e Probab. Statist., 37(6):627–642, 2001.

[BB02]

Richard F. Bass and Krzysztof Burdzy. Erratum to: “The supremum of Brownian local times on H¨ older curves” [Ann. Inst. H. Poincar´e Probab. Statist. 37 (2001), no. 6, 627–642; MR1863273 (2002j:60146)]. Ann. Inst. H. Poincar´e Probab. Statist., 38(5):799–800, 2002.

[BEL14]

G. Barmpalias, R. Elwes, and A. Lewis-Pye. Digital morphogenesis via Schelling segregation. In 55th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2014, pages 156– 165. IEEE Computer Soc., Los Alamitos, CA, 2014.

[BEL15a] G. Barmpalias, R. Elwes, and A. Lewis-Pye. From randomness to order: unperturbed Schelling segregation in two or three dimensions. ArXiv e-prints, April 2015. [BEL15b] G. Barmpalias, R. Elwes, and A. Lewis-Pye. Minority population in the one-dimensional Schelling model of segregation. ArXiv e-prints, August 2015. [BEL15c] G. Barmpalias, R. Elwes, and A. Lewis-Pye. Tipping points in 1-dimensional Schelling models with switching agents. J. Stat. Phys., 158(4):806–852, 2015. [BIKK12] Christina Brandt, Nicole Immorlica, Gautam Kamath, and Robert Kleinberg. An analysis of one-dimensional Schelling segregation. In STOC’12—Proceedings of the 2012 ACM Symposium on Theory of Computing, pages 789–803. ACM, New York, 2012. [CF08]

W. Clark and M. Fossett. Understanding the social context of the schelling segregation model. Proceedings of the National Academy of Sciences, 105(11):4109–4114, 2008.

[Cla91]

W. A. V. Clark. Residential preferences and neighborhood racial segregation: A test of the schelling segregation model. Demography, 28(1):1–19, 1991.

[DCM08] L. Dall’Asta, C. Castellano, and M. Marsili. Statistical physics of the schelling model of segregation. J. Stat. Mech, 7, 2008. [GBLJ09] S. Grauwin, E. Bertin, R. Lemoy, and P. Jensen. Competition between collective and individual dynamics. Proceedings of the National Academy of Sciences of the United States of America , National Academy of Sciences, 106(49):20622–20626, 2009. [GGS+ 08] Stefan Gerhold, Lev Glebsky, Carsten Schneider, Howard Weiss, and Burkhard Zimmermann. Computing the complexity for schelling segregation models. Communications in Nonlinear Science and Numerical Simulation, 13(10):2236 – 2245, 2008. [GH80]

Donald Geman and Joseph Horowitz. Occupation densities. Ann. Probab., 8(1):1–67, 1980.

48

[GH00]

Y. Ginosar and R. Holzman. The majority action on infinite graphs: strings and puppets. Discrete Matematics, 215(1-3):59–72, 2000.

[GVN09]

L. Gauvin, J. Vannemenus, and J.-P. Nadal. Phase diagram of a schelling segregation model. European Physical Journal B, 70:293–304, 2009.

[IKLZ15] N. Immorlica, B. Kleinberg, B. Lucier, and M. Zadomighaddam. Exponential Segregation in a Two-Dimensional Schelling Model with Tolerant Individuals. ArXiv e-prints, November 2015. [LJ03]

J. Laurie and K. Jaggi. Role of ’vision’ in neighbourhood racial segregation: A variant of the schelling segregation model. Urban Stud, 40(13):2687–2704, 2003.

[Mor95]

G. Moran. On the period-two-property of the majority operator in infinite graphs. Transactions of the American Mathematical Society, 347(5):1649–1667, 1995.

[MS16]

J. Miller and S. Sheffield. Liouville quantum gravity and the Brownian map II: geodesics and continuity of the embedding. ArXiv e-prints, May 2016.

[N05]

The Royal Swedish Academy of Sciences. Advanced information on the Bank of Sweden Prize in Economic Sciences in Memory of Alfred Nobel, 2005.

[Odo08]

G. Odor. Self-organising, two temperature ising model describing human segregation. International journal of modern physics C, 3:393–398, 2008.

[PV07]

Romans Pancs and Nicolaas J. Vriend. Schelling’s spatial proximity model of segregation revisited. Journal of Public Economics, 91(12):1 – 24, 2007.

[PW01]

M. Pollicott and H. Weiss. The dynamics of schelling-type segregation models and a non-linear graph laplacian variational problem. Adv. Appl. Math., 27:17–40, 2001.

[Sch69]

T. Schelling. Models of segregation. The American Economic Review, pages 488–493, 1969.

[Sch71]

Thomas Schelling. Dynamic models of segregation. Journal of Mathematical Sociology, 1, 1971.

[Sch78]

T. C. Schelling. Micromotives and Macrobehavior. New York, NY: Norton, 1978.

[SS07]

D. Stauffer and S. Solomon. Ising, schelling and self-organising segregation. European Physical Journal B, 57:473–479, 2007.

[Str11]

Daniel W. Stroock. Essentials of integration theory for analysis, volume 262 of Graduate Texts in Mathematics. Springer, New York, 2011.

[SVW09] A. Singh, D. Vainchtein, and H. Weiss. Schelling’s segregation model: Parameters, scaling, and aggregation. Demographic Research, 21:341–365, 2009. [TT14]

O. Tamuz and R. J. Tessler. Majority dynamics and the retention of information. 2014.

[VK07]

D. Vinkovic and A. Kirman. A physical analogue of the schelling model. Proceedings of the National Academy of Sciences, 103(51):19251–19265, 2007.

[You01]

H.P. Young. Individual strategy and social structure: an evolutionary theory of institutions. Princeton University Press, 2001.

[Zha04]

J. Zhang. A dynamic model of residential segregation. Journal of Mathematical Sociology, 28(3):147–170, 2004.

49