Feb 19, 2014 - As an alternative, erasure codes (for instance, Reed-Solomon codes) have been used by Facebook, OceanStore, RAID-6. [2] and others to i...

0 downloads 24 Views 1MB Size

Bradley Department of Electrical and Computer Engineering † Hume Center for National Security and Technology, Virginia Tech, Blacksburg, VA USA Email: {tandonr, adhiraj, tcc, rbuehrer}@vt.edu

arXiv:1310.0054v2 [cs.IT] 19 Feb 2014

Abstract Distributed storage systems in the presence of a wiretapper are considered. A distributed storage system (DSS) is parameterized by three parameters (n, k, d), in which a file stored across n distributed nodes, can be recovered from any k out of n nodes. This is called as the reconstruction property of a DSS. If a node fails, any d out of (n − 1) nodes help in the repair of the failed node so that the regeneration property of the DSS is preserved. For such a (n, k, d)-DSS, two types of wiretapping scenarios are investigated: (a) Type-I (node) adversary which can wiretap the data stored on any l < k nodes; and a more severe (b) Type-II (repair data) adversary which can wiretap the contents of the repair data that is used to repair a set of l failed nodes over time. The focus of this work is on the practically relevant setting of exact repair regeneration in which the repair process must replace a failed node by its exact replica. We make new progress on several non-trivial instances of this problem which prior to this work have been open. The main contribution of this paper is the optimal characterization of the secure storage-vs-exact-repair-bandwidth tradeoff region of a (n, k, d)-DSS, with n ≤ 4 and any l < k in the presence of both Type-I and Type-II adversaries. While the problem remains open for a general (n, k, d)-DSS with n > 4, we present extensions of these results to a (n, n − 1, n − 1)-DSS, in presence of a Type-II adversary that can observe the repair data of any l = (n − 2) nodes. The key technical contribution of this work is in developing novel information theoretic converse proofs for the Type-II adversarial scenario. From our results, we show that in the presence of Type-II attacks, the only efficient point in the storage-vs-exact-repair-bandwidth tradeoff is the MBR (minimum bandwidth regenerating) point. This is in sharp contrast to the case of a Type-I attack in which the storage-vs-exactrepair-bandwidth tradeoff allows a spectrum of operating points beyond the MBR point.

I. I NTRODUCTION A continuous rise in the volume of data managed across various systems, calls for new storage mechanisms that maintain this data reliably. Distributed storage is the default technique for storing data in all new generation applications. The data from a file is stored in a decentralized manner on several un-reliable nodes/disks that when collectively used are capable of recovering the entire file. To ensure robustness to disk failures, the simplest scheme is to replicate and store the data across several disks. For instance, the Google File System (GFS) and the Hadoop Distributed File System (HDFS) store 3 copies of the data across several nodes [1]. While replication is robust to failures, it is not a scalable strategy to store large volumes of data. As an alternative, erasure codes (for instance, Reed-Solomon codes) have been used by Facebook, OceanStore, RAID-6 [2] and others to introduce redundancy into the storage system. However, to repair a failed node, the entire file must be downloaded from the remaining alive nodes. Thus, the repair process for such codes can be excessively bandwidth intensive. The concept of regenerating codes for distributed storage was introduced in the seminal work by Dimakis et al. [3]. A typical distributed storage system (DSS) consists of n storage nodes each with a storage capacity of α (symbols or units of data) such that the entire file of size B can be recovered by accessing any k ≤ n nodes. This is called as the reconstruction property of the DSS. Whenever a node fails, d nodes (where k ≤ d ≤ n) participate in the repair process by sending β units of data each. This procedure is termed as the regeneration of a failed node and β is often referred to as the per-node repair bandwidth. In [3], the authors show that the in order to store a file of size B, the parameters of a DSS must necessarily satisfy B≤

k−1 X i=0

min (α, (d − i)β) .

(1)

Thus, in order to store a file of size B, there exists a fundamental tradeoff between α (storage) and dβ (total repair bandwidth). Furthermore, it was shown that the reconstruction-regeneration requirements of a DSS can be equivalently formulated as a multicasting problem over an appropriately defined graph. This revelation along with the celebrated result of network coding for multicasting over graphs [4] were used to show that this tradeoff is indeed achievable. However, this tradeoff is in general achievable only for the functional-repair case. In functional repair, a failed node is replaced by a new node such that the resulting DSS has the same reconstruction and regeneration capabilities as before. In particular, the content of the repaired node may not necessarily be identical to the failed node even though the desirable properties of the DSS are preserved. In contrast to functional repair, exact repair regeneration requires the repair process to replace a failed node with an identical new node. Exact repair is appealing for many practical applications where the data has to be stored intact. The file recovery

K

Insecure Repair

X X+

X

+K

1

X + K2 Secure Repair

X : Data

2K

X + K1

K1 + K2

Secure Node 2

+K

X : Data K1 , K2 : Keys

K

+K

1

X + K2

K1 + K2

Secure Node 3 Data Collector (3, 2, 2)-DSS with Type-I (node) security.

X

1,

K : Key

X + 3K

[n k d] = [3 2 2]

X

[n k d] = [3 2 2]

Secure Node 3

Fig. 1.

X + K1

+

3K

+2

+

X

X + 3K

X + K2

Secure Node 1

Secure Node 1

Secure Node 2

X + K1

X +K

Repaired Node

2

X +K

X + 2K

Failed Node

Repaired Node

K

Failed Node

X + K2 , K

1 + K2

2

Data Collector

Fig. 2.

(3, 2, 2)-DSS with Type-II (repair) security.

process is also easier in this case compared to the functional repair scenario since the file reconstruction procedure need not change whenever a failed node is replaced. While characterizing the storage-vs-bandwidth tradeoff for the case of exact repair remains a challenging open problem in general, two extreme points of this tradeoff (depending on whether α or β is minimized first) have been studied extensively. They are the minimum storage regenerating (MSR) and the minimum bandwidth regenerating (MBR) points for which the explicit exact-repair regenerating codes have been developed (see [5], [6] and references therein). Beyond these results, Tian [7] has recently characterized the optimal exact-repair tradeoff for the (4, 3, 3)-DSS where it has been shown that there is a gap between the optimal tradeoffs for functional repair and exact repair. Sensitive data such as personal, health and financial records are increasingly being stored in a DSS. Securing such data from adversaries/eavesdroppers is necessary to ensure data secrecy for the users. Hence a DSS should be secure apart from satisfying the reconstruction and regenerating requirements. Two types of eavesdropping attacks can potentially occur in a DSS, (a) Type-I attack, in which the eavesdropper can wiretap the storage contents of l nodes in the DSS and (b) Type-II attack, in which the eavesdropper wiretaps the contents of the repair data (and thereby the storage content as well) of l nodes in the DSS. Throughout this paper, we assume that l < k since k is the minimum number of nodes required to reconstruct the entire file of size B. Else, if l ≥ k, the eavesdropper can recover the file by using the reconstruction property of the DSS. Encryption and decryption i.e., cryptographic approaches to ensure data security have been deemed to offer less secrecy compared to other information/coding theoretic approaches [11]. Further, handling secure key management issues in a DSS are more complicated compared to developing information theoretically secure codes that offer the desired level of secrecy. The concept of security in a DSS against Type-II attacks was introduced in [9] where, linear codes that achieve the optimal tradeoff region in the bandwidth limited regime were proposed. In the context of DSS, secure exact repair regenerating codes are beneficial as the functional repair process may reveal additional information such as coding coefficients that are used in the process of regenerating a functionally equivalent node [8]. Optimal exact repair codes that are secure against eavesdropping have been explored for the MSR and MBR points in [10]-[12]. The codes developed in [10] achieve the MBR point for all (n, k, d) configurations with any l < k. The MSR code in [10] was shown to be optimal for Type-I attacks while it was shown to be optimal for Type-II attacks with l = 1 in [11]. The maximum file size B that can be securely stored using linear MSR codes with exact repair was studied in [12] and proved to be optimal for d = n − 1. As an example, consider the (3, 2, 2)-DSS and consider the setting in which the wiretapper can only read the contents of any l = 1 node (Type-I adversary). For this setting, a secure exact repair code for this problem is shown in Fig. 1. When the 1st node fails, the other nodes send their data contents to enable its repair. Using these two data symbols, the 1st node can recover its initial data contents (since two symbols can be recovered using two linearly independent combinations) thus satisfying the exact repair requirements. Since the eavesdropper is unaware of the secure key K, it cannot recover the data symbol X by wiretapping on any single node in the DSS. Now, consider the same (3, 2, 2)-DSS in the presence of a more powerful Type-II adversary that can observe the repair data of any l = 1 node. It is clear that the code in Figure 1 will leak out the entire data X during the repair of node 1, as the wiretapper can use X + 2K, X + 3K, (2 linear combinations of 2 symbols) to recover X. Thus, a different secure exact repair DSS that can handle a Type-II adversary is shown in Fig. 2. It is seen that the storage capacity should be increased to α = 2 in this case compared to the Type-I attack (where α = 1). Since the eavesdropper is not aware of the keys K1 and K2 , it cannot get any information about the message symbol X even if it observes the repair data of any one node. From these schemes, the secure repair scheme in Figure 2 requires higher storage per-node (α = 2) compared to the secure node scheme in Figure 1 (α = 1 per-node). Thus, a natural question arises: does there exist a DSS with a smaller storage per

node α < 2 that can store a file of size B = 1, with repair bandwidth β = 1 while still preserving the security of the repair 0 0 process? Or, equivalently, is there any other more efficient tradeoff pair (α , β ) than (α, β) = (2, 1) that can store a file of size B = 1 while ensuring secure exact repair ? In this paper, we answer this question in the negative through an information theoretic converse proof that shows that if secure and exact repair requirements are imposed, then the storage per-node cannot be smaller than α = 2 and hence the scheme in Figure 2 is optimal. Equivalently, our result shows that the only efficient point in the (α, β) tradeoff for this example is the MBR point which corresponds to α = dβ. This result is extended and generalized for any (n, n − 1, n − 1)-DSS and Type-I/Type-II adversaries for l = (k − 1) = (n − 2). We next summarize the main contributions of this paper. 1) We characterize and prove the optimal exact repair region that is achievable under Type-I and Type-II attacks for the (n, k, d)-DSS with n ≤ 4 and l < k. Specifically, we look at (3, 2, 2)-DSS with l = 1, (4, 2, 3)-DSS with l = 1 and (4, 3, 3)-DSS with l = 1, 2. The main contributions of this paper are the novel converse proofs that characterize the strorage-vs-bandwidth tradeoff region against Type-II attacks for these distributed storage systems. 2) For the (4,3,3)-DSS with l = 1 and Type-I attack, we leverage upon the characterization of the exact repair region obtained in the absence of an eavesdropper in [7]. Specifically, we introduce the additional security constraint in order to obtain the secure storage-vs-bandwidth tradeoff region in this case. Further, a novel code construction for this storage and bandwidth efficient region of the tradeoff curve is also introduced. This code construction also achieves the optimal exact repair region for (4, 3, 3) in the absence of any eavesdroppers and is shown to be read efficient (in terms of the number of disk reads required to repair/recover a file [13]) when compared to the code given in [7]. 3) These results are extended and proved for a general (n, n − 1, n − 1)-DSS when any l = n − 2 nodes are compromised in the DSS. The rest of this paper is organized as follows. In Section II we describe the system and the eavesdropper’s model. The main theorems that describe the optimal achievable storage-bandwidth tradeoff regions are given in Section III and are proved in the Appendix. The intuition behind the converse proofs is explained in Section IV. The coding schemes that achieve these optimal regions are presented in Section V. Finally conclusions are drawn in Section VI1 . II. S YSTEM M ODEL A (n, k, d, α, β, B)-DSS consists of n storage nodes that store a file F of size B across n nodes, with each node capable of storing up to α units of data. A data collector can connect to any k < n nodes and must be able to reconstruct the entire file F . This is known as the MDS property of the DSS [3]. We focus on the scenario of single node failures in which at any given point any one node in the system could fail. For the repair of a failed node, any d out of the remaining (n − 1) alive nodes can be accessed by downloading up to β ≤ α symbols to repair the failed node. The parameter dβ is referred to as the total repair bandwidth. From an information theoretic perspective, the goal is to store a file F , whose entropy is B, i.e., H(F ) = B.

(2)

We next introduce random variables corresponding to the data stored in the nodes and the random variables used in the file recovery and the repair process. Let Wi denote the content that is stored at node i, for i = 1, 2 . . . , n. Hence, the storage constraint implies H(Wi ) ≤ α,

i = 1, 2, . . . , n.

Due to the MDS property (i.e., the file B must be reconstructed from any k ≤ n nodes), we also have H F |W{k} = 0,

(3)

(4)

where W{k} is the data stored in any subset of k storage nodes. Next, we consider the repair of a failed node j from any d remaining nodes. We denote the data sent by node i to repair node j by Sij . Due to the repair bandwidth constraint, we have H(Sij ) ≤ β,

(5)

and for exact repair of node j from the repair data of d nodes, we also have H(Wj |Sr1 j , Sr2 j , . . . , Srd j ) = 0, {ri }di=1 ∈ [1, n] 6= j.

(6)

Finally, we note that any repair data sent by node i, i.e., Sij is a function of the data stored in node i, i.e., H(Sij |Wi ) = 0. 1 Parts

of this paper will be presented at ITA-2014 [16] and ICC-2014 [17].

A. Adversarial Models We next formalize the Type-I and Type-II security constraints, each corresponding to different capabilities of the wiretapping adversary. • Type-I (node) security: in this setting, the wiretapper can read the storage content of any l < k nodes. Hence, we require that the information leakage by revealing the data of any l storage nodes must be zero, i.e., I F ; W{l} = 0, (7) •

where W{l} is the data stored in any subset of l nodes. Type-II (repair) security: in this setting, the wiretapper can read the repair data of any l < k nodes over time. Hence, for the repair of any l nodes to be secure, we require I (F ; Sn1 , Sn2 , . . . , Snl ) = 0,

(8)

where Sni is the data (downloaded from d repair nodes) used in the repair of node ni . Furthermore, it is worth noting that any DSS that is secure under Type-II secrecy constraint is also secure under Type-I secrecy constraint but the reverse statement is not true in general. To illustrate these constraints, consider the (n, k, d) = (3, 2, 2)-DSS. Then, the constraints regarding file-regeneration and exact repair are as follows: File Regeneration: H(F |W1 , W2 ) = H(F |W1 , W3 ) = H(F |W2 , W3 ) = 0. Exact Repair: H(W1 |S21 , S31 ) = H(W2 |S12 , S32 ) = H(W3 |S13 , S23 ) = 0. Repair data functionality: H(S12 , S13 |W1 ) = H(S21 , S23 |W2 ) = H(S31 , S32 |W1 ) = 0. For this example, the parameter l can be either 0 and 1. The non-trivial case is l = 1, for which the Type-I constraints can be written as: I(F ; W1 ) = I(F ; W2 ) = I(F ; W2 ) = 0. whereas the Type-II security constraints are: I(F ; S21 , S31 ) = I(F ; S12 , S32 ) = I(F ; S13 , S23 ) = 0. We next define the Type I (respectively Type-II) secrecy capacity of a DSS as the maximum file size that can be stored under the constraints placed on storage (3), file regeneration (4), repair bandwidth constraint (5), exact repair (6) and Type-I (respectively Type-II) secrecy constraint. Formally, we define the Type-I secrecy capacity as BIS =

(3)−(6),(7)

S BII =

(3)−(6),(8)

max

H(F ),

(9)

max

H(F ).

(10)

and the Type-II secrecy capacity as

The study of distributed storage systems in the presence of a passive wiretapper was initiated in [9]. It was shown that for any (n, k, d)-DSS with either Type-I or Type-II secrecy constraint (characterized by the parameter l), the following is an upper bound on the maximum secure file size B S : BS ≤

k−1 X i=l

min(α, (d − i)β).

(11)

Intuitively, this result can be interpreted as follows. In the presence of a wiretapper, the maximum file size a DSS can store must necessarily reduce (compared to (1)) because l nodes are wiretapped. Since l nodes are compromised, at most (k − l) nodes can help a data collector in recovering the entire file while keeping it secure from l nodes. Hence the summation is over (k − l) nodes as opposed to k nodes [10]. To illustrate by an example, consider the (3, 2, 2)-DSS with l = 1 for which the

(n, k, d) = (3, 2, 2) l = 1

↵= =

Gap Optimal Tradeo↵

(n, k, d) = (4, 2, 3)

1 2

MBR

(↵ = 2 )

1 Fig. 3.

2

↵

1

Secure (α, β) tradeoff for (3, 2, 2)-DSS and l = 1.

Fig. 4.

↵ BS BS

Optimal Tradeo↵

Type I Security

(2, 1)

(↵ = )

↵= =

Optimal Tradeo↵

Type II Security

(1, 1)

l=1

Gap

BS

Optimal Tradeo↵

Type I Security 1

↵ BS

Type II Security

(1, 12 )

( 32 , 12 )

(↵ = )

MBR (↵ = 3 ) 3 2

↵

Secure (α, β) tradeoff for (4, 2, 3)-DSS and l = 1.

upper bound in (11) simplifies to B S ≤ min(α, β).

(12)

Normalizing (12) throughout by B S , and defining α = α/B S , β = β/B S , we can equivalently write 1 ≤ min(α, β). This bound gives rise to a region in the (α, β) plane and is shown in Fig. 3 with the corner point (α, β) = (1, 1). Furthermore, it is not difficult to show that this tradeoff is achievable under Type-I security constraints by showing that it is possible to store a file of size B S = 1 using α = 1 and β = 1. The scheme is shown in Figure 1. Hence, the upper bound in (12) along with the scheme of Figure 1 imply that the optimal (α, β)-tradeoff for (3, 2, 2)-DSS, l = 1 in presence of a Type-I adversary is given by BIS ≤ min(α, β).

(13)

As for the more stringent Type-II security is concerned, it has been shown by Shah et al in [10], using the Product-Matrix framework at the MBR point (α = dβ), that it is possible to store a file of size B S = 1 using α = 2 and β = 1. This scheme (shown in Figure 2) shows that the following (α, β)-tradeoff region is achievable: α S BII ≤ min ,β . (14) 2 On the other hand, the bound in (12) implies that the optimal (α, β)-tradeoff region for the Type-II secrecy constraint is contained in: S BII ≤ min(α, β).

(15)

Thus, there exists a significant gap between these two regions for the Type-II secrecy constraint. Furthermore, it is not clear whether the gap is due to the weakness of the upper bound or if there exists an improved achievable scheme that can close this gap. In this paper, we show that this gap is fundamental to the secure exact repair problem and the upper bound can be improved. By deriving a novel information theoretic converse, we show that for the (3, 2, 2)-DSS, l = 1 and Type-II secrecy constraint, we show that the optimal secure storage-repair-bandwidth tradeoff is given by α S BII ≤ min ,β . (16) 2 III. M AIN R ESULTS In this section, we outline the main theorems that describe the optimal secure storage-vs-exact repair-bandwidth tradeoffs (in short referred to as (α, β)-tradeoff region) under the exact repair requirement. Theorem 1: The optimal (α, β)-tradeoff regions for (3, 2, 2)-DSS with l = 1 under exact repair are given by: BIS ≤ min(α, β), (17) α S BII ≤ min ,β . (18) 2 The converse proof for (17) follows directly from (11) and is therefore omitted. The main contribution is the converse proof for the bound (18) which is given in Section IV.

↵= =

(n, k, d) = (4, 3, 3) l = 1 Gap 1 2

1 1 2, 2

(↵ = )

(n, k, d) = (4, 3, 3) l = 2

Gap

Optimal Tradeo↵

Optimal Tradeo↵

Type I Security

Type II Security

Optimal Tradeo↵

1, 13 MBR (↵ = 3 )

Outer bound [8] 3 5

2 3

1

1

↵

1

Secure (α, β) tradeoff for (4, 3, 3)-DSS and l = 1.

Fig. 6.

↵ BS BS

Optimal Tradeo↵

Type I Security

1 3

1 2

↵= =

BS

3 2 5, 5

2 5

Fig. 5.

↵ BS

Type II Security

(1, 1)

(3, 1)

(↵ = )

MBR (↵ = 3 )

3

↵

Secure (α, β) tradeoff for (4, 3, 3)-DSS and l = 2.

Theorem 2: The optimal (α, β)-tradeoff regions for (4, 2, 3)-DSS with l = 1 under exact repair are given by: BIS ≤ min(α, 2β) 2α S , 2β . BII ≤ min 3 The bound in (19) follows directly from (11). The converse proof for (20) is given in Section VII-A in the Appendix. Theorem 3: The optimal (α, β)-tradeoff regions for (4, 3, 3)-DSS with l = 1 under exact repair are given by: α + 6β S BI ≤ min min(α, 2β) + min(α, β), 3 S BII ≤ min (α, 3β) .

(19) (20)

(21) (22)

The first term in (21) is directly from the upper bound given in (11). The converse proof for the second term in (21) is one novel contribution of this paper. This proof is leveraged off the proof of exact repair given for the (4, 3, 3)-DSS without any secrecy constraints in [7]. This is presented in Section VII-B1 in the Appendix. A novel converse proof for the bound in (22) under the Type-II constraints is given in Section VII-B2 in the Appendix. Theorem 4: The optimal (α, β)-tradeoff regions for (4, 3, 3)-DSS with l = 2 under exact repair are given by: BIS ≤ min(α, β) (23) α S BII ≤ min ,β . (24) 3 The bound in (23) follows directly from (11). Hence the proof is skipped. The converse proof for (24) is given in Section VII-C in the Appendix. Theorem 5: The optimal (α, β)-tradeoff regions for (n, n − 1, n − 1)-DSS with l = n − 2 under exact repair are given by: BIS ≤ min(α, β) α S BII ≤ min ,β . n−1

(25) (26)

This is an extension of the Theorems 1 and 4 to a general (n, n − 1, n − 1)-DSS. This is the worst case scenario with respect to the (n, n − 1, n − 1)-DSS since l = k − 1 nodes are compromised and hence the file size that can be stored securely is the minimum among all possible l < k scenarios. Similar to the Theorems 1-4 the proof of (25) is omitted as it is derived from (11). The proof for (26) is a novel extension of the converse proofs given for (18) (24) and is provided in the Appendix in Section VII-D. Figs. 3-7 show the optimal (α, β) tradeoff regions described by Theorems 1-5. From these theorems, it is seen that the only efficient point in the optimal (α, β)-tradeoff region for the Type-II constraints is the MBR point where α = dβ. However, this is different from the optimal tradeoff-region achievable under Type-I constraints as seen in these results. Thus there is a gap between the optimal regions achievable under these two constraints i.e., the file size that can be securely stored under Type-II constraints is lower than the file size achieved under Type-I constraint.

(n, k, d) = (n, n

1, n

1) l = n

↵= =

Gap Optimal Tradeo↵

Fig. 7.

BS

Type II Security

(1, 1)

(n

(↵ = )

MBR

1, 1)

↵ = (n

n

1

↵ BS

Optimal Tradeo↵

Type I Security 1

2

1)

↵

1

Secure (α, β) tradeoff for (n, n − 1, n − 1)-DSS and l = n − 2.

Further notice that the gap between these two constraints with respect to the secure file size increases as n increases in the DSS. For the Type-I attacks, when l = n − 2 and k = n − 1, only one node can securely store the file and hence the maximum file size that can be securely stored is given by (25) which is independent of n, while it decreases due to the wiretapping process as seen in (26) (a more stringent constraint on the DSS). IV. P ROOF OF T HEOREM 1 AND I NTUITION BEHIND C ONVERSE P ROOFS In this section, we present the proof of the Theorem 1 i.e., (3, 2, 2)-DSS with l = 1 in order to explain the intuition behind the novel converse proofs that establish the secure storage-vs-bandwidth repair tradeoff region against Type-II attacks. The converse proofs for the other Theorems are presented in the Appendix. Let F be the random variable that denotes the file stored on the DSS. For the (3, 2, 2)-DSS with l = 1, the bound in (11) (corresponding to Type-I attack) is given by H(F ) ≤ min(α, β).

(27)

In the Type-I attack, the stored content of any l = 1 node must not reveal any information about the file F . For instance, the content stored in node 1 must not reveal any information about F , i.e., the DSS must satisfy I(F ; W1 ) = 0. Using such Type-I security constraints, one can readily show that H(F ) ≤ min(α, β) as follows: H(F ) = H(F |W1 )

(28)

= H(F, W1 ) − H(W1 )

(29)

= H(W1 , W2 ) + H(F |W1 , W2 ) −H(W1 ) | {z }

(31)

= H(W2 |W1 )

(33)

≤ H(F, W1 , W2 ) − H(W1 )

(30)

=0 (file regeneration)

= H(W1 , W2 ) − H(W1 )

(32)

Next, the conditional entropy H(W2 |W1 ) can be expanded in two distinct ways as follows: (a) By bounding this term as H(W2 |W1 ) ≤ H(W2 ) ≤ α,

(34)

gives the bound H(F ) ≤ α. (b) On the other hand, we can also bound this term as H(W2 |W1 ) ≤ H(W2 , S32 |W1 )

(35)

= H(S32 |W1 ) + H(W2 |S32 , W1 ) {z } |

(36)

≤ H(S32 ) ≤ β,

(38)

=0 (repair of node 2)

= H(S32 |W1 ) which gives the bound H(F ) ≤ β.

(37)

While this bound H(F ) ≤ min(α, β) is tight for the case of Type-I attack, it turns out to be strictly sub-optimal for the case of more severe Type-II attack. In particular, for the Type-II attack, in order to secure the DSS from an eavesdropper that wiretaps the repair process of any single node (l = 1), the DSS must satisfy the following three security constraints: I(F ; S21 , S31 ) = I(F ; S12 , S32 ) = I(F ; S13 , S23 ) = 0,

(39)

to indicate the secure repair of nodes 1, 2 and 3 respectively. If the eavesdropper can read (S21 , S31 ), i.e., the repair data for W1 , then two cases can arise: • if there is no redundancy in the repair process (α = dβ), then it is clear that if the stored data is secure, then the repair data will also be secure. • However, for α < dβ, more information about the file could be leaked in general. This aspect is not captured by the proof for the Type-I attack and one needs to carefully deal with the security of the repair process. We have the following sequence of bounds for secure repair of node 1, H(F ) = H(F |S21 , S31 )

(40)

= H(F, S21 , S31 ) − H(S21 , S31 )

(41)

= H(W2 , S21 , S31 ) + H(F |W2 , S21 , S31 ) −H(S21 , S31 ) | {z }

(43)

= H(W2 , S21 , S31 ) − H(S21 , S31 , W1 ) + H(W1 |S21 , S31 ) | {z }

(45)

≤ H(F, W2 , S21 , S31 ) − H(S21 , S31 )

(42)

=0 (file regeneration)

= H(W2 , S21 , S31 ) − H(S21 , S31 )

(44)

=0 (repair of node 1)

= H(W2 , S21 , S31 ) − H(S21 , S31 , W1 )

(46)

= H(W2 , S21 , S31 ) − H(S21 , S31 , W1 , S12 , S13 ) + H(S12 , S13 |W1 , S21 , S31 ) {z } |

(47)

=0

≤ H(W2 , S21 , S31 ) − H(S21 , S31 , S12 , S13 ),

(48)

where (43) follows from the file regeneration requirement from nodes (1, 2), (47) follows from the fact that (S12 , S13 ) are functions of W1 . Similarly, for secure repair of node 2, we have H(F ) ≤ H(W1 , S12 , S32 ) − H(S12 , S32 , S21 , S23 ).

(49)

Adding (48) and (49), we have 2H(F ) ≤ H(W2 , S21 , S31 ) + H(W1 , S12 , S32 ) − H(S21 , S31 , S12 , S13 ) − H(S12 , S32 , S21 , S23 )

= H(W2 , S21 , S31 ) + H(W1 , S12 , S32 ) − 2H(S21 , S12 ) − H(S31 , S13 |S21 , S12 ) − H(S23 , S32 |S21 , S12 )

≤ H(W2 , S21 , S31 ) + H(W1 , S12 , S32 ) − H(S21 , S12 ) − H(S12 , S21 , S13 , S31 , S23 , S32 )

(50) (51) (52)

= H(W2 , S21 , S31 ) + H(W1 , S12 , S32 ) − H(S21 , S12 ) − H(S12 , S21 , S13 , S31 , S23 , S32 , W2 )

(53)

= H(W1 , S12 , S32 ) − H(S21 , S12 )

(55)

≤ H(W2 , S21 , S31 ) + H(W1 , S12 , S32 ) − H(S21 , S12 ) − H(W2 , S21 , S31 )

(54)

≤ H(W1 , W3 , S12 , S32 ) − H(S21 , S12 )

(56)

= H(W3 , S12 ) + H(W1 , S32 |W3 , S12 ) −H(S21 , S12 ) {z } |

(57)

=0

= H(W3 , S12 ) − H(S21 , S12 )

(58)

≤ H(W3 ) + H(S12 ) − H(S12 )

(60)

≤ H(W3 , S12 ) − H(S12 )

(59)

= H(W3 )

(61)

≤ α.

(62)

where • (53) is obtained by noting that W2 is a function of the repair data S12 , S32 , • (54) follows from the fact that conditioning reduces entropy, • (58) is obtained by noting the fact that (W1 , W2 ) can be obtained from (W3 , S12 ). To note this, first observe that (S32 , S31 )

are functions of W3 , hence using (S32 , S12 ), we can repair W2 (and subsequently obtain S21 ). Thus, using (S21 , S31 ), we can repair W1 . Hence from (62), we have H(F ) ≤ α2 . Thus along with the bound H(F ) ≤ β, we have the proof for α H(F ) ≤ min ,β (63) 2 Interestingly, this tradeoff between (α, β) only has one efficient point (see Fig. 3), corresponding to the case when α = 2β (or the MBR point for the (3, 2, 2) DSS). Extending this approach to more general (n, k, d)-DSS is far from trivial. The proof presented above essentially couples two security constraints corresponding to the secure repair of nodes 1 and 2. In general, we believe that it suffices to consider the coupling between kl security constraints, (each constraint corresponding to the secure repair of l nodes). As we show in the Appendix, this statement turns out to be true for the (4, 2, 3), (4, 3, 3) and the (n, n − 1, n − 1) DSS and helps us in establishing the corresponding optimal tradeoff regions. V. ACHIEVABILITY P ROOFS In this section, we present coding schemes that achieve the bounds mentioned in Theorems 1-5. We first present the achievability of the MBR point under Type-II secrecy constraint. Later, the coding schemes that achieve a various spectrum of points beyond the MBR point in the presence of a Type-I adversary are presented. A. Achievability under Type-II security constraint As mentioned earlier, the MBR point is the only valid point in the (α, β) tradeoff region in the presence of Type-II adversary. The MBR point is defined by the (α, β) relationship α = dβ [3]. Substituting this in (11), we get that the optimal secure file size B S must satisfy k l S B ≤ kd − β − ld − β. (64) 2 2 Notice that (64) is identical to the tradeoff regions specified by Theorems 1-5 with the corresponding values of l. This equivalence is further explained below. Secure codes that achieve this MBR point for a general (n, k, d)-DSS with any l < k compromised nodes have been described in [10] for both Type-I and Type-II attacks (since α = dβ at the MBR point, the eavesdropper does not get any additional information by wiretapping the repair process i.e., Type-I and Type-II attacks are equivalent at the MBR point). For S example, in the (3, 2, 2)-DSS with l = 1, the MBR point in (64) is simplified as BII = β; α = 2β, which is the α = 2, β = 1, S BII = 1 point shown in Fig. 3, whose achievability is given in Fig. 2. In the (4, 3, 3)-DSS with l = 1, (64) is given by S S BII = 3β; α = 3β, which is also the α = 3, β = 1, BII = 3 point shown in Fig. 5. Along similar lines, with l = 2, the MBR S S = 1 point shown in Fig. 6. For the (4, 2, 3)-DSS, with = β; α = 3β, i.e., the α = 3, β = 1, BII point in (64) is given by BII S = 2, i.e., MBR point in Fig. 4. For the (n, n − 1, n − 1)-DSS, with l = n − 2, we have l = 1, we have α = 3, β = 1 and BII S = 1, i.e., the MBR point in Fig. 7. For more information on the codes that achieve the MBR point, α = n − 1, β = 1 and BII please see [10]. B. Achievability under Type-I security constraint At the MBR point, the Type-I and Type-II security constraints are equivalent and hence the MBR point is achievable under both these constraints [10]. Beyond this MBR point, a spectrum of points in the secure storage-vs-exact-repair-bandwidth tradeoff region are achievable in the presence of a Type-I adversary. We next present the various coding schemes that achieve these points. 1) (α, β) = (1, 1), BIS = 1 in a (3, 2, 2)-DSS, l = 1: The coding scheme that achieves this point is shown in Fig. 1. In this scheme, X ∈ Fq and K the secure key, is a random variable that is uniformly distributed over Fq such that q > 3. From Fig. 1, it is easy to see that the repair and the reconstruction processes are straightforward. ¯ = (1, 1 ) 2) (α, β) = (2, 1), BIS = 2 in a (4, 2, 3)-DSS, l = 1: This point corresponds to the normalized (α, β) pair i.e., (¯ α, β) 2 in the Fig. 4. Let a1 , a2 denote the message symbols to be stored in the DSS. Choose keys k1 and k2 to securely store the data on the DSS. Here a1 , a2 , k1 , k2 ∈ F5 . Let x1 , x2 , x3 , x4 denote 4 linearly independent combinations of the message and key symbols. They are stored in the DSS as shown in Table I. Notice that the eavesdropper cannot infer any information by observing the data contents on any single storage node. Further, the repair and reconstruction processes are straightforward. ¯ = (1, 1) 3) (α, β) = (1, 1), BIS = 2 in a (4, 3, 3)-DSS, l = 1: This point corresponds to the normalized (α, β) pair i.e., (¯ α, β) 2 2 S in the Fig. 5. Below, we present a coding scheme that can store a file of size B = 2 securely in a (4, 3, 3)-DSS with l = 1 when each node stores α = 1 unit of data and sends β = 1 unit of data for the repair of failed nodes. Let a1 , a2 ∈ F2 denote the message symbols to be stored in the DSS. Choose a key k that is uniformly distributed over F2 in order to securely store the data on the DSS. The resulting DSS is shown in Table. II. Since the eavesdropper is unaware

TABLE I A (4, 2, 3) SECURE EXACT REPAIR CODE FOR (α, β) = (2, 1), BS = 2.

node node node node

1 2 3 4

first symbol x1 x3 x1 + x3 x1 + x4

second symbol x2 x4 x2 + x4 x2 + x3

TABLE II A (4, 3, 3) SECURE EXACT REPAIR CODE FOR (α, β) = (1, 1), BS = 2.

node 1 a1 + k

node 2 a1 + k

node 3 a1 + a2 + k

node 4 k

of the key k, it cannot decode either a1 or a2 from any of the nodes in the DSS, thereby securing the DSS against Type-I attacks. Further, it is easy to see that the file (a1 , a2 ) can be recovered from any k = 3 nodes (by using 3 linearly independent combinations of 3 symbols a1 , a2 and k). Consider the repair process of the 1st node. The nodes 2, 3 and 4 send their data contents in order to repair this failed node. See that a1 and k can be recovered from these three data symbols and that a1 + k, the contents of the 1st node can be restored exactly. ¯ = (3, 2) 4) (α, β) = (3, 2), BIS = 5 in a (4, 3, 3)-DSS, l = 1: This point corresponds to the normalized (α, β) pair i.e., (¯ α, β) 5 5 S in the Fig. 5. This can be shown through a coding scheme that can securely store a file of size B = 5 over n = 4 nodes where each node stores 3 units of data and sends 2 units of data for repair of other nodes. Let (a1 , a2 , a3 , a4 , a5 ) ∈ Fq denote the message symbols to be stored in the DSS. Let (k1 , k2 , k3 ) be the secure keys that are used by the DSS to keep the data secure from the eavesdropper. It is assumed that these keys are uniformly distributed over Fq for some q > 8. Let A = [a1 , a2 , a3 , a4 , a5 ]T and K = [k1 , k2 , k3 ]T represent the message and the secure key vectors. Let GA X = [A K] = GA A + GK K, (65) GK GA be the 1 × 12 vector of symbols to be stored on a DSS. Let be a 8 × 12 Vandermonde matrix such that the ith row is GK defined as [1, i, i2 , i3 , . . . , i11 ] where i = 1, 2, . . . , 8. Here, GA is a 5 × 12 sub matrix obtained from the above Vandermonde matrix given by 1, 1, 12 , . . . , 111 1, 2, 22 , . . . , 211 (66) .. . 1, 5, 52 , . . . , 511

and GK is a 3 × 12 sub matrix given by

1, 6, 62 , . . . , 611 1, 7, 72 , . . . , 711 1, 8, 82 , . . . , 811

(67)

I(A; [X]3 ) = 0

(68)

Since we use a 8 × 12 Vandermonde matrix, we define all the elements over F9 . The resulting elements of X are stored in the DSS as shown in Table III. In order to achieve secrecy, i.e., the eavesdropper should not get any information about {aj }5j=1 by observing the contents of any single node, the following condition must hold true.,

where [X]3 is any 3 elements of the vector X (since any 3 elements of X are linearly independent combinations of A and K). We can represent [X]3 as ˆA G [X]3 = [A K] ˆ , (69) GK 8×3

TABLE III A (4, 3, 3) SECURE EXACT REPAIR CODE FOR (α, β) = (3, 2), BS = 5.

node node node node

1 2 3 4

first symbol x1 x3 x5 x7

second symbol x2 x4 x6 x8

third symbol x9 = x6 + x7 x10 = x1 + x8 x11 = x2 + x3 x12 = x4 + x5

TABLE IV R EPAIR PROCEDURE OF A (4, 3, 3) SECURE EXACT REPAIR CODE WHEN NODE 1 FAILS .

node 2 node 3 node 4

first symbol x3 x6 x7

second symbol x10 = x1 + x8 x11 = x2 + x3 x8

GA that correspond to [X]3 . Notice that these sub matrices are also Vandermonde GK matrices. Thus we have the following set of inequalities, ˆA, G ˆ K are sub matrices of where G

ˆAA + G ˆ K K) I(A; [X]3 ) = I(A; G ˆAA + G ˆ K K) − H(G ˆAA + G ˆ K K|A) = H(G ˆAA + G ˆ K K) − H(G ˆ K K) = H(G ˆK ≤ 3 − rank G = 3 − 3 = 0,

The last two inequalities follow from the fact that the sub-Vandermonde matrices have a maximum rank of 3. Repair Process: Since this is a cyclic construction, every node is equivalent. Without loss of generality, it suffices to consider the repair of any single node. Consider the case when node 1 fails and the other nodes participate in the repair process. Since β = 2, each node sends 2 symbols to aid the repair of the 1st node. The repair process of node 1 is shown in Table IV. The first symbol of node 1 is recovered by combining the second symbols sent by nodes 2 and 4. The second symbol of node 1 is obtained by a combination of the first symbol sent by node 2 and the second symbol sent by node 3. Finally, the third symbol of node 1 is recovered by a combination of the first symbols sent by nodes 3 and 4. Hence, upon receiving the 6 symbols, it can be seen that the 1st node can exactly recover its original symbols, thereby satisfying the requirements of an exact repair code. Remark 1: This code construction can be easily extended to the exact repair case in the absence of an eavesdropper. In the absence of an eavesdropper (no requirement for keys), this code can store a file size of 8 {xj }8j=1 , no linear combinations required in n = 4 nodes with each node storing α = 3 units of data and sending β = 2 units for the repair process. Thus this code achieves a normalized storage-bandwidth tradeoff point ( 83 , 14 ) which is shown to be optimal in [7]. Further, this code is also efficient in terms of the disk reads required for a repair process compared to the code proposed for the exact repair of a (4, 3, 3)-DSS in [7]. It is seen that the number of disk reads required in this proposed code is 6 (6 symbols obtained directly from the nodes) while it is 9 in the case of the code proposed in [7] (see [7] for more details). This behavior is akin to the codes proposed in [13] that are designed to be data-read and download efficient during a repair process. 5) (α, β) = (1, 1), B S = 1 in a (4, 3, 3)-DSS, l = 2: Let a ∈ Fq denote the message symbol to be stored on the DSS. Since l = 2, any two nodes need to be secure from the eavesdropping attack. Hence we choose two keys k1 , k2 that are uniformly distributed in Fq in order to securely store a. Let X = [x1 , x2 , x3 , x4 ]T denote the storage contents of the 4 nodes in the DSS such that a k X = [GA GK ] k1 = GA a + GK 1 , (70) k2 k2 where GA and GK are chosen such that rank ([GA GK ]) = 3 and rank (GK ) = 2 (for instance Vandermonde matrices similar to Section V-B4). {xi }4i=1 are the linear combinations of the message symbol a and the secure keys k1 , k2 . The rank of the concatenated

matrix [GA GK ] ensures that the file can be recovered from any 3 out of the 4 nodes in the DSS (3 linearly independent combinations of 3 symbols), thereby satisfying the reconstruction property of the DSS. By following the similar arguments in Section V-B4, it can be shown that the rank of the matrix GK which is 2, ensures that the storage contents on any two nodes are secure from a Type-I attack by an eavesdropper. Remark 2: The above coding scheme can be extended to a general (n, n − 1, n − 1)-DSS with l = n − 2 to achieve the (α, β) = (1, 1), B S = 1 point under exact repair and Type-I secrecy as shown in Fig. 7. Remark 3: Although the same (α, β) = (3, 1) pair is achievable under Type-II secrecy when l = 1 or l = 2 for the (4, 3, 3)DSS, it is seen that the maximum file size that can be securely stored when l = 1 is B S = 3 while it is B S = 1 when l = 2. Intuitively, when 2 nodes in the DSS are compromised, the secrecy constraints are stringent compared to the case when only one node is compromised and hence the file size that can be securely stored decreases when the number of compromised nodes increases. VI. C ONCLUSION Securing distributed storage systems against passive eavesdropping attacks is addressed in this paper. A complete characterization of the storage-bandwidth tradeoff region is provided for a (n, k, d)-DSS for n ≤ 4 and l < k under exact repair and Type-I, Type-II secrecy constraints. These results have been extended to the (n, n − 1, n − 1)-DSS when n − 2 nodes are compromised. Novel regenerating codes that are read and download repair efficient are presented in order to achieve these optimal secure tradeoff regions. The converse proof for the optimal tradeoff region under Type-I secrecy constraint was leveraged off the proof presented for the exact repair tradeoff region in the absence of an eavesdropper. Novel converse proofs were presented under the Type-II secrecy constraints, thereby characterizing the maximum file size that can be securely stored in a DSS. As expected, the file size that can be securely stored decreases when the number of compromised nodes increases in the DSS. The gap in the file size that can be securely stored under Type-I and Type-II attacks increases as n increases, thereby indicating the severe limitations of the DSS under Type-II attacks. Further, the secure storage-vs-exact repair bandwidth tradeoff region obtained for the Type-II attacks indicates that the MBR point is the only achievable point in the corresponding tradeoff region. Extending these results to a general (n, k, d)-DSS is part of our ongoing work. ACKNOWLEDGEMENT This work was supported in part by L3 Communications. The authors would also like to thank Dr. Soheil Mohajer for helpful discussions regarding the converse proof of (4, 2, 3)-DSS. R EFERENCES [1] K. V. Rashmi, N. B. Shah, D. Gu, H. Kuang, D. Borthakur and K. Ramchandran, “A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster”, in arXiv:1309.0186, Sept. 2013. [2] K. V. Rashmi, N. B. Shah and P. V. Kumar, “Enabling node repair in any erasure code for distributed storage”, in arXiv:1101.0133, Jun. 2011. [3] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. Wainwright and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4539–4551, Sept. 2010. [4] R. Ahlswede, N. Cai, S.-Y. R. Li and R. W. Yeung, “Network information flow,” IEEE Trans. Inf. Theory, vol. 46, pp. 1204–1216, Jul. 2000. [5] N. B. Shah, K. V. Rashmi, P. V. Kumar and K. Ramchandran, “Distributed storage codes with repair-by-transfer and non-achievability of interior points on the storage-bandwidth tradeoff,” IEEE Trans. Inf. Theory, vol. 58, no. 3, pp. 1837–1852, Mar. 2012. [6] V. Cadambe, S. Jafar, H. Maleki, K. Ramchandran and C. Suh, “Asymptotic interference alignment for optimal repair of MDS codes in distributed storage,” IEEE Trans Inf. Theory, vol. 59, no. 5, pp. 2974–2987, May. 2013. [7] C. Tian, “Rate region of the (4,3,3) exact-repair regenerating codes,” in Proc. Intern. Symp. Inf Theory, ISIT, Istanbul, Turkey, Jun. 2013. [8] O. O. Koyluoglu, A. S. Rawat, and S. Vishwanath, “ Secure Cooperative Regenerating Codes for Distributed Storage Systems,”, in arXiv:1210.3664, Oct. 2012. [9] S. Pawar, S. El Rouayheb, and K. Ramchandran, “Securing dynamic distributed storage systems against eavesdropping and adversarial attacks,” IEEE Trans. Inf. Theory, vol. 58, pp. 6734–6753, Mar. 2012. [10] N. B. Shah, K. V. Rashmi, and P. V. Kumar, “Information-theoretically secure regenerating codes for distributed storage,” in Proc. IEEE Global Commun. Conf., GLOBECOM, Houston, TX, Dec. 2011. [11] A. S. Rawat, O. O. Koyluoglu, N. Silberstein, and S. Vishwanath, ”Optimal locally repairable and secure codes for distributed storage systems,” in arXiv:1210.6954, Aug. 2013. [12] S. Goparaju, S. El Rouayheb, R. Calderbank and H. Vincent Poor, “Data Secrecy in Distributed Storage Systems under Exact Repair,” in Proc. IEEE International Symposium on Network Coding, NETCOD, Calgary, Canada, Jun. 2013. [13] K. V. Rashmi, N. B. Shah and K. Ramchandran, “A Piggybacking Design Framework for Read-and Download-efcient Distributed Storage Codes”, in Proc. IEEE International Symposium on Information Theory, ISIT, Istanbul, Turkey, Jun. 2013. [14] F. Cheng, R. W. Yeung, and K. W. Shum, “Imperfect Secrecy in Wiretap Channel II”, in arXiv:1202.0859, Dec. 2012. [15] K. Rashmi, N. B. Shah, P. V. Kumar, and K. Ramchandran, “Explicit Construction of Optimal Exact Regenerating Codes for Distributed Storage”, in Proc. Allerton Conference on Control, Computing, and Communication, Urbana-Champaign, IL, 2009, pp. 1243–1249. [16] R. Tandon, S. Amuru, T. C. Clancy and R. M. Buehrer, “Distributed Storage Systems with Secure and Exact Repair-New Results”, in Proc. Information Theory and Applications Workshop (ITA), San Diego, CA, Feb. 2014. [17] R. Tandon, S. Amuru, T. C. Clancy and R. M. Buehrer, “On Secure Distributed Storage Systems with Exact Repair”, in Proc. IEEE International Conference on Communications (ICC), Sydney, Australia, May 2014.

VII. A PPENDIX A. Proof of Theorem 2: (4, 2, 3)-DSS, l = 1 In this section, we present the proof for the Type-II setting for the the (4, 2, 3)-DSS and l = 1. In particular we will show that 2α S , 2β . (71) BII ≤ min 3 S To this end, we focus on proving that BII ≤ 2α 3 . • File regeneration from any k = 2 nodes:

•

H(F |W1 , W2 ) = 0

(72)

H(F |W1 , W4 ) = 0

(74)

H(F |W2 , W4 ) = 0

(76)

H(F |W1 , W3 ) = 0

(73)

H(F |W2 , W3 ) = 0

(75)

H(F |W3 , W4 ) = 0.

(77)

H(W1 |S21 , S31 , S41 ) = 0

(78)

Exact repair requirements: H(W2 |S12 , S32 , S42 ) = 0

H(W3 |S13 , S23 , S43 ) = 0

H(W4 |S14 , S24 , S34 ) = 0. •

•

(79) (80) (81)

Secure repair of any l = 1 node: I(F ; S21 , S31 , S41 ) = 0

(82)

I(F ; S12 , S32 , S42 ) = 0

(83)

I(F ; S13 , S23 , S43 ) = 0

(84)

I(F ; S14 , S24 , S34 ) = 0.

(85)

Repair data from a node is a function of stored data: H(S12 , S13 , S14 |W1 ) = 0

(86)

H(S31 , S32 , S34 |W3 ) = 0

(88)

H(S21 , S23 , S24 |W2 ) = 0

(87)

H(S41 , S42 , S43 |W4 ) = 0.

(89)

For secure repair of node 1, we have H(F ) = H(F |S21 , S31 , S41 )

= H(F, S21 , S31 , S41 ) − H(S21 , S31 , S41 )

(149)

= H(F, S21 , S31 , S41 ) − H(S21 , S31 , S41 , S12 )

= H(F, S21 , S31 , S41 ) − H(S12 , S21 ) − H(S31 , S41 |S12 , S21 ).

(90) (91) (92) (93)

Similarly, for secure repair of node 2, we have H(F ) = H(F |S12 , S32 , S42 )

= H(F, S12 , S32 , S42 ) − H(S12 , S32 , S42 ) (150)

= H(F, S12 , S32 , S42 ) − H(S12 , S31 , S41 , S21 )

= H(F, S12 , S32 , S42 ) − H(S12 , S21 ) − H(S32 , S42 |S12 , S21 ).

(94) (95) (96) (97)

Adding (90) and (94), we obtain 2H(F ) = H(F, S21 , S31 , S41 ) + H(F, S12 , S32 , S42 ) − 2H(S12 , S21 ) − H(S31 , S41 |S12 , S21 ) − H(S32 , S42 |S12 , S21 ) (98) ≤ H(F, S21 , S31 , S41 ) + H(F, S12 , S32 , S42 ) − H(S12 , S21 ) − H(S31 , S41 , S32 , S42 , S12 , S21 )

≤ H(F, S21 , S31 , S41 ) + H(F, S12 , S32 , S42 ) − H(S21 ) − H(S31 , S41 , S32 , S42 , S12 , S21 ).

(99)

(100)

Notice that, H(S31 , S41 , S32 , S42 , S12 , S21 ) = H(S31 , S41 , S32 , S42 , S12 , S21 , W1 , W2 , F ) ≥ H(F, S21 , S31 , S41 )

= H(S21 , S31 , S41 ) + H(F |S21 , S31 , S41 )

(153)

= H(S21 , S31 , S41 ) + H(F ).

(101) (102) (103) (104)

Using (101) in (98), we get 2H(F ) ≤ H(F, S21 , S31 , S41 ) + H(F, S12 , S32 , S42 ) − H(S21 ) − (H(S21 , S31 , S41 ) + H(F )).

(105)

We bound the first term in the above equation as follows H(F, S21 , S31 , S41 ) ≤ H(F, S21 , S31 , S41 , W4 ) (74)

= H(S21 , S31 , S41 , W4 )

(106) (107)

= H(W4 ) + H(S41 |W4 ) + H(S21 , S31 |S41 , W4 )

(108)

= H(W4 ) + H(S21 , S31 , S41 ) − H(S41 )

(110)

≤ H(W4 ) + H(S21 , S31 |S41 )

(109)

≤ α + H(S21 , S31 , S41 ) − H(S41 ).

(111)

H(F, S12 , S32 , S42 ) = H(F, S21 , S31 , S41 ).

(112)

H(F, S12 , S32 , S42 ) ≤ α + H(S21 , S31 , S41 ) − H(S41 ).

(113)

By symmetry, we can show that

Hence, we can bound

Substituting (111) and (113) in (105), we have 2H(F ) ≤ 2α + 2H(S21 , S31 , S41 ) − 2H(S41 ) − H(S21 ) − H(F ) − H(S21 , S31 , S41 )

(114)

3H(F ) ≤ 2α + H(S21 , S31 , S41 ) − 2H(S21 ) − H(S31 )

(116)

3H(F ) ≤ 2α + H(S21 , S31 , S41 ) − 2H(S31 ) − H(S21 )

(118)

3H(F ) ≤ 2α + H(S21 , S31 , S41 ) − 2H(S41 ) − H(S31 )

(120)

H(S21 , S31 , S41 ) ≤ H(S21 ) + H(S31 ) + H(S41 ),

(122)

3H(F ) ≤ 2α + H(S21 , S31 , S41 ) − 2H(S41 ) − H(S21 ).

(115)

Similarly, we have 3H(F ) ≤ 2α + H(S21 , S31 , S41 ) − 2H(S21 ) − H(S41 )

(117)

3H(F ) ≤ 2α + H(S21 , S31 , S41 ) − 2H(S31 ) − H(S41 )

(119) (121)

By using the fact that

and summing all the above 6 inequalities, we have S 3H(F ) ≤ 2α =⇒ BII ≤

B. Proof of Theorem 3: (4, 3, 3)-DSS, l = 1

2α . 3

(123)

1) Converse for Type-I secrecy constraint: We note that for (4, 3, 3)-DSS and l = 1, under Type-I secrecy constraint, the optimal (α, β)-tradeoff is given by α + 6β S . (124) BI ≤ min min(α, 2β) + min(α, β), 3 We note that the following bound BIS ≤ min(α, 2β) + min(α, β).

(125)

follows directly from (11). Hence, in the remainder of this section, we will show that α + 6β . (126) 3 This proof of this bound closely follows a recent result in [7] which presents the optimal (α, β)-tradeoff for the (4, 3, 3)-DSS in the absence of secrecy constraint. The key difference is carefully incorporating the Type-I secrecy constraint as we will highlight during the proof. We proceed (similar to [7]) as follows: BIS ≤

8H(W4 ) + 4H(S31 , S21 ) + 4H(S32 ) ≥ 4H(S31 , S21 , W4 ) + 4H(S32 , W4 )

≥ 4H(S31 , S21 , W4 ) + 4H(S32 , W4 ) − 2I(S21 ; W3 |W4 ) − 2I(W3 ; W4 |S32 , S34 , S24 )

(127)

(a)

= 4H(S31 , S21 , W4 ) + 2H(W4 ) + 2H(W3 , W4 , S21 ) − 2H(W3 , W4 ) + 2H(S32 , S34 , S24 )

+ 2H(W3 , W4 , S24 , S32 , S34 ) − 2H(W4 , S32 , S34 , S24 ) h i ≥ 4H(S31 , S21 , W4 ) + 2H(W4 ) + 2 H(F ) + H(W4 ) − 2H(W3 , W4 ) + 2H(S32 , S34 , S24 )

(b)

+ 2H(W3 , W4 , S24 , S32 , S34 ) − 2H(W4 , S32 , S34 , S24 ),

(128)

where • (a) follows by expanding all the mutual information expressions; then using the fact that (S32 , S34 ) is a function of W3 ; and finally using the symmetry (henceforth referred in short by (sym)), which implies that H(S32 , W4 ) = H(S24 , W3 ) = H(S21 , W4 ). In general we have H(S, W) = H(π(S), π(W)) where S ⊆ {Sij }4i,j=1 , i 6= j, W ⊆ {Wj }4j=1 and π is a permutation of the indices i, j [7]. • (b) is the key inequality that invokes the secrecy constraint; and can be proved as follows H(W3 , W4 , S21 ) = H(F, W3 , W4 , S21 ) ≥ H(F, W4 )

= H(F ) + H(W4 ) − I(F ; W4 ) = H(F ) + H(W4 ).

in which the first equality follows from the fact that the file F can be recovered from (W3 , W4 , S21 ) and the last equality follows from the Type-I secrecy constraint, i.e., I(F ; W4 ) = 0. Next, we lower bound the following term that appears in (128): (sym)

H(S31 , S21 , W4 ) + H(S32 , S34 , S24 ) = H(S34 , S24 , W1 ) + H(S32 , S34 , S24 ) = H(W1 |S34 , S24 ) + H(S32 |S34 , S24 ) + 2H(S34 , S24 )

≥ H(W1 , S32 |S34 , S24 ) + 2H(S34 , S24 )

= H(W1 , S32 , S34 , S24 ) + H(S34 , S24 ) (rep)

= H(W1 , W2 , W4 ) + H(S34 , S24 )

= H(W1 , W2 , W4 , F ) + H(S34 , S24 ) ≥ H(W4 , F ) + H(S34 , S24 )

= H(W4 ) + H(F ) + H(S34 , S24 ),

(129)

where the fourth equality in obtained by noticing that the file F can be recovered using (W1 , S32 , S34 , S24 ) and the fact that Sij is a function of Wi . This is shown below. H(W1 , S32 , S34 , S24 ) = H(W1 , S14 , S32 , S34 , S24 ) = H(W1 , W4 , S32 ) = H(W1 , S12 , W4 , S42 , S32 ) = H(W1 , W2 , W4 ), This property is henceforth denoted by (rep) or repair in the rest of the converse proof. The equation (129) follows from Type-I secrecy constraint. Substituting (129) in (128), we have the following 8H(W4 ) + 4H(S31 , S21 ) + 4H(S32 )

h i ≥ 2H(S31 , S21 , W4 ) + 2H(W4 ) + 4 H(F ) + H(W4 ) − 2H(W3 , W4 ) + 2H(W3 , W4 , S24 )

− 2H(W4 , S32 , S34 , S24 ) + 2H(S34 , S24 ) h i = 2H(S31 , S21 , W4 ) + 2H(W4 ) + 4 H(F ) + H(W4 ) − 2H(W3 , W4 ) + 2H(W3 , W4 , S24 )

(sym)

− 2H(W4 , S32 , S34 , S24 ) + 2H(S31 , S21 )

≥ 2H(S31 , S21 , W4 ) + 2H(S31 , S21 , W4 ) − 2H(W3 , W4 ) + 2H(W3 , W4 , S24 ) h i − 2H(W4 , S32 , S34 , S24 ) + 4 H(F ) + H(W4 ) h i = 4H(S31 , S21 , W4 ) + 4 H(F ) + H(W4 ) − 2H(W3 , W4 ) + 2H(W3 , W4 , S24 ) − 2H(W4 , S32 , S34 , S24 ).

(130)

Next, we have the following inequality:

H(W3 , W4 , S24 ) − H(W3 , W4 ) = H(S24 |W3 , W4 )

≥ H(S24 |S13 , W3 , W4 )

= H(S24 , S13 , W3 , W4 ) − H(S13 , W3 , W4 ).

(131)

Using (131) to further lower bound (130), we get

h i 8H(W4 ) + 4H(S31 , S21 ) + 4H(S32 ) ≥ 4H(S31 , S21 , W4 ) + 4 H(F ) + H(W4 )

− 2H(S13 , W3 , W4 ) + 2H(W3 , W4 , S24 , S13 ) − 2H(W4 , S32 , S34 , S24 )

(rep)

= 2H(S31 , S21 , W4 ) + 2H(S31 , S21 , W4 , W1 ) − 2H(S13 , W3 , W4 ) + 2H(W3 , W4 , S24 , S13 ) h i − 2H(W4 , S32 , S34 , S24 ) + 4 H(F ) + H(W4 )

(sym)

= 2H(S31 , S21 , W4 ) + 2H(S13 , S23 , W4 , W3 )

Next, we have the following inequality

− 2H(S13 , W3 , W4 ) + 2H(W3 , W4 , S24 , S13 ) h i − 2H(W4 , S32 , S34 , S24 ) + 4 H(F ) + H(W4 ) .

(132)

H(S13 , S23 , W4 , W3 ) + H(W3 , W4 , S24 , S13 ) − H(S13 , W3 , W4 ) = H(S23 |S13 , W3 , W4 ) + H(S24 |S13 , W3 , W4 ) + H(S13 , W3 , W4 )

≥ H(S23 , S24 |S13 , W3 , W4 ) + H(S13 , W3 , W4 )

= H(S23 , S24 , S13 , W3 , W4 ).

(133)

Using (133) in (132), we obtain

h i 8H(W4 ) + 4H(S31 , S21 ) + 4H(S32 ) ≥ 2H(S31 , S21 , W4 ) + 4 H(F ) + H(W4 )

+ 2H(S23 , S24 , S13 , W3 , W4 ) − 2H(W4 , S32 , S34 , S24 ) h i ≥ 2H(S31 , S21 , W4 ) + 4 H(F ) + H(W4 )

+ 2H(S23 , S24 , S13 , W3 , W4 ) − 2H(W4 , S32 , S34 , S24 , S14 ) h i (c) = 2H(S31 , S21 , W4 ) + 4 H(F ) + H(W4 ) + 2H(S23 , S24 , S13 , W3 , W4 ) − 2H(S32 , S34 , S24 , S14 ),

(134)

where (c) follows as W4 is a function of (S14 , S24 , S34 ). Next, we have (sym)

2H(S23 , S24 , S13 , W3 , W4 ) = H(S23 , S24 , S14 , W3 , W4 ) + H(S34 , S32 , S14 , W2 , W4 ) (rep)

= H(S23 , S24 , S14 , S32 , S34 , W3 , W4 ) + H(S34 , S32 , S14 , S24 , S23 , W2 , W4 )

= H(W3 , W4 |S14 , S24 , S34 , S23 , S32 )

+ H(W2 , W4 |S14 , S24 , S34 , S23 , S32 )

+ 2H(S14 , S24 , S34 , S23 , S32 )

≥ H(W2 , W3 , W4 , S14 , S24 , S34 , S23 , S32 ) + H(S14 , S24 , S34 , S23 , S32 )

≥ H(W2 , W3 , W4 ) + H(S14 , S24 , S34 , S23 , S32 )

= H(W2 , W3 , W4 , F ) + H(S14 , S24 , S34 , S23 , S32 ) ≥ H(W4 , F ) + H(S14 , S24 , S34 , S23 , S32 ) h i = H(W4 ) + H(F ) + H(S14 , S24 , S34 , S23 , S32 ),

(135)

where (135) follows from the Type-I secrecy constraint. Using (135) in (134) gives h i 8H(W4 ) + 4H(S31 , S21 ) + 4H(S32 ) ≥ 2H(S31 , S21 , W4 ) + 5 H(F ) + H(W4 )

− 2H(S32 , S34 , S24 , S14 ) + H(S14 , S24 , S34 , S23 , S32 ) h i (sym) = 5 H(F ) + H(W4 ) + H(S31 , S21 , W4 ) + H(S14 , S24 , W3 )

− 2H(S32 , S34 , S24 , S14 ) + H(S14 , S24 , S34 , S23 , S32 ) h i (rep) = 5 H(F ) + H(W4 ) + H(S31 , S21 , W4 ) + H(S14 , S24 , S32 , S34 , W3 )

− 2H(S32 , S34 , S24 , S14 ) + H(S14 , S24 , S34 , S23 , S32 ) h i = 5 H(F ) + H(W4 ) + H(S31 , S21 , W4 ) + H(W3 |S14 , S24 , S32 , S34 )

+ H(S23 |S14 , S24 , S32 , S34 ) h i ≥ 5 H(F ) + H(W4 )

+ H(S31 , S21 , W4 ) + H(W3 , S23 |S14 , S24 , S32 , S34 ).

(136)

Finally, using symmetry, we write H(S31 , S21 , W4 ) as: (sym)

H(S31 , S21 , W4 ) = H(S14 , S34 , W2 ) (rep)

= H(W2 , S14 , S34 , S23 , S24 ).

(137)

On the other hand, we also have H(W3 , S23 |S14 , S24 , S32 , S34 ) = H(W3 , S23 , S14 , S24 , S32 , S34 ) − H(S14 , S24 , S32 , S34 ) (sym)

= H(W3 , S23 , S14 , S24 , S32 , S34 ) − H(S14 , S24 , S32 , S23 )

= H(W3 , S32 |S14 , S34 , S23 , S24 )

≥ H(W3 , S32 |W2 , S14 , S34 , S23 , S24 ).

(138)

Using (137) and (138), we have from (136)

h i 8H(W4 ) + 4H(S31 , S21 ) + 4H(S32 ) ≥ 5 H(F ) + H(W4 ) + H(W2 , W3 , S32 , S14 , S24 , S34 , S23 ) h i (rep) = 5 H(F ) + H(W4 ) + H(W2 , W3 , W4 , S32 , S14 , S24 , S34 , S23 ) h i ≥ 5 H(F ) + H(W4 ) + H(W2 , W3 , W4 ) h i = 5 H(F ) + H(W4 ) + H(W2 , W3 , W4 , F ) h i ≥ 5 H(F ) + H(W4 ) + H(W4 , F ) h i = 5 H(F ) + H(W4 ) + H(W4 ) + H(F ) h i = 6 H(F ) + H(W4 ) ,

(139) (140)

where (139) follows from Type-I secrecy constraint. Hence, from (140), we have

6H(F ) ≤ 2H(W4 ) + 4H(S31 , S21 ) + 4H(S32 ) ≤ 2α + 12β,

(141) (142)

which implies that 3H(F ) ≤ α + 6β and hence we have the proof of the bound: BIS ≤

α + 6β . 3

(143)

S ≤ min(α, 2β) + min(α, β) 2) Converse for Type-II Constraint: For this case, we will show that the existing bound (11) BII is strictly loose and the secure storage-repair bandwidth tradeoff is given by S BII ≤ min(α, 3β).

(144)

H(F |W1 , W2 , W3 ) = 0

(145)

H(F |W1 , W3 , W4 ) = 0

(147)

S To this end, we focus on proving that BII ≤ α. • File regeneration from any k = 3 nodes:

•

H(F |W1 , W2 , W4 ) = 0

(146)

H(F |W2 , W3 , W4 ) = 0.

(148)

Exact repair requirements: H(W1 |S21 , S31 , S41 ) = 0

H(W2 |S12 , S32 , S42 ) = 0

H(W3 |S13 , S23 , S43 ) = 0

•

(149) (150) (151)

H(W4 |S14 , S24 , S34 ) = 0.

(152)

I(F ; S21 , S31 , S41 ) = 0

(153)

I(F ; S12 , S32 , S42 ) = 0

(154)

I(F ; S13 , S23 , S43 ) = 0

(155)

I(F ; S14 , S24 , S34 ) = 0.

(156)

Secure repair of any l = 1 node:

•

Repair data from a node is a function of stored data: H(S12 , S13 , S14 |W1 ) = 0

H(S21 , S23 , S24 |W2 ) = 0

H(S31 , S32 , S34 |W3 ) = 0

H(S41 , S42 , S43 |W4 ) = 0.

(157) (158) (159) (160)

For secure repair of node 1, we have (153)

H(F ) = H(F |S21 , S31 , S41 )

= H(F, S21 , S31 , S41 ) − H(S21 , S31 , S41 )

= H(F, S21 , S31 , S41 ) − H(S21 ) − H(S31 |S21 ) − H(S41 |S31 , S21 ).

(161)

Similarly, for secure repair of node 2, we have (154)

H(F ) = H(F |S12 , S32 , S42 ) (150)

= H(F |S12 , S32 , S42 , W2 )

(158)

= H(F |S12 , S32 , S42 , W2 , S21 , S23 )

≤ H(F |S21 , S23 , S42 )

= H(F, S21 , S23 , S42 ) − H(S21 , S23 , S42 )

= H(F, S21 , S23 , S42 ) − H(S21 ) − H(S23 |S21 ) − H(S42 |S23 , S21 ).

(162)

Adding (161) and (162), we obtain 2H(F ) ≤ H(F, S21 , S31 , S41 ) + H(F, S21 , S23 , S42 ) − 2H(S21 ) − H(S31 |S21 ) − H(S23 |S21 ) − H(S41 |S31 , S21 ) − H(S42 |S23 , S21 )

≤ H(F, S21 , S31 , S41 ) + H(F, S21 , S23 , S42 ) − 2H(S21 ) − H(S31 , S23 |S21 ) − H(S41 |S31 , S21 ) − H(S42 |S23 , S21 )

= H(F, S21 , S31 , S41 ) + H(F, S21 , S23 , S42 ) − H(S21 ) − H(S21 , S31 , S23 ) − H(S41 |S31 , S21 ) − H(S42 |S23 , S21 )

= H(F, S21 , S31 , S41 ) + H(F, S21 , S23 , S42 ) − H(S21 , S31 , S23 )

− H(S21 ) − H(S41 |S21 ) − H(S42 |S21 ) + I(S41 ; S31 |S21 ) + I(S42 ; S23 |S21 )

≤ H(F, S21 , S31 , S41 ) + H(F, S21 , S23 , S42 ) − H(S21 , S31 , S23 )

− H(S21 ) − H(S41 , S42 |S21 ) + I(S41 ; S31 |S21 ) + I(S42 ; S23 |S21 )

= H(F, S21 , S31 , S41 ) + H(F, S21 , S23 , S42 )

− H(S21 , S23 , S31 ) − H(S41 , S42 , S21 ) + I(S41 ; S31 |S21 ) + I(S42 ; S23 |S21 ) h i = H(F, S21 , S31 , S41 ) + H(F, S21 , S23 , S42 ) − 2H(S21 , S23 , S31 ) + I(S41 ; S31 |S21 ) + I(S42 ; S23 |S21 ),

where in (163), we have replaced H(S41 , S42 , S21 ) by H(S31 , S23 , S21 ) by invoking the following symmetry property: H(S41 , S42 , S21 ) = H(S31 , S32 , S21 ) = H(S31 , S23 , S21 ).

(163)

Next, consider the term H(F, S21 , S31 , S41 ) appearing in (163): H(F, S21 , S31 , S41 ) ≤ H(F, W4 , S23 , S21 , S31 , S41 )

= H(W4 , S23 , S21 , S31 ) + H(F, S41 |W4 , S23 , S21 , S31 )

= H(W4 , S23 , S21 , S31 )

= H(W4 ) + H(S23 , S21 , S31 ) − I(W4 ; S21 , S31 , S23 )

≤ H(W4 ) + H(S23 , S21 , S31 ) − I(S41 ; S21 , S31 , S23 )

(164) (165)

≤ H(W4 ) + H(S23 , S21 , S31 ) − I(S41 ; S21 , S31 )

≤ H(W4 ) + H(S23 , S21 , S31 ) − I(S41 ; S31 |S21 )

≤ α + H(S23 , S21 , S31 ) − I(S41 ; S31 |S21 ).

(166)

where (164) follows from the fact that S41 is a function of W4 ; and the fact that the file F can be recovered from W4 , S23 , S21 , S31 . Equation (165) follows from the fact that S41 is a function of W4 and by using data processing inequality. Similarly, for the term H(F, S21 , S23 , S42 ) in (163): H(F, S21 , S23 , S42 ) ≤ H(F, W4 , S21 , S23 , S42 , S31 )

= H(W4 , S23 , S21 , S31 ) + H(F, S42 |W4 , S23 , S21 , S31 )

= H(W4 , S23 , S21 , S31 )

= H(W4 ) + H(S23 , S21 , S31 ) − I(W4 ; S21 , S31 , S23 )

≤ H(W4 ) + H(S23 , S21 , S31 ) − I(S42 ; S21 , S31 , S23 )

≤ H(W4 ) + H(S23 , S21 , S31 ) − I(S42 ; S21 , S23 )

≤ H(W4 ) + H(S23 , S21 , S31 ) − I(S42 ; S23 |S21 )

≤ α + H(S23 , S21 , S31 ) − I(S42 ; S23 |S21 ).

Adding (166) and (167), we arrive at h i H(F, S21 , S31 , S41 ) + H(F, S21 , S23 , S42 ) ≤ 2α + 2H(S23 , S21 , S31 ) − I(S41 ; S31 |S21 ) − I(S42 ; S23 |S21 ).

(167)

(168)

Substituting (168) in (163), we obtain

2H(F ) ≤ 2α + 2H(S23 , S21 , S31 ) − I(S41 ; S31 |S21 ) − I(S42 ; S23 |S21 ) − 2H(S21 , S23 , S31 ) + I(S41 ; S31 |S21 ) + I(S42 ; S23 |S21 )

= 2α,

which yields H(F ) ≤ α and hence we have the proof for

S BII ≤ α.

(169) (170)

(171)

C. Proof of Theorem 4: (4, 3, 3)-DSS, l = 2 Here, we focus on the case of (4, 3, 3)-DSS and l = 2. For this case, we will show that the secure storage-repair bandwidth tradeoff is given by α S BII ≤ min ,β . (172) 3 For secure repair of any l = 2 nodes, we have the following 42 = 6 constraints: I(F ; S21 , S31 , S41 , S12 , S32 , S42 ) = 0

(173)

I(F ; S21 , S31 , S41 , S13 , S23 , S43 ) = 0

(174)

I(F ; S12 , S32 , S42 , S13 , S23 , S43 ) = 0

(175)

I(F ; S21 , S31 , S41 , S14 , S24 , S34 ) = 0

(176)

I(F ; S12 , S32 , S42 , S14 , S24 , S34 ) = 0

(177)

I(F ; S13 , S23 , S43 , S14 , S24 , S34 ) = 0.

(178)

With these secure repair constraints, and exact repair requirements, we will now show that α S BII ≤ . 3 For the secure repair of nodes 1 and 2, we have

(179)

(173)

H(F ) = H(F |S21 , S31 , S41 , S12 , S32 , S42 )

= H(F, S21 , S31 , S41 , S12 , S32 , S42 ) − H(S21 , S31 , S41 , S12 , S32 , S42 ).

(180)

We first focus on the term H(F, S21 , S31 , S41 , S12 , S32 , S42 ) and bound it as follows: H(F, S21 , S31 , S41 , S12 , S32 , S42 ) ≤ H(F, W4 , S21 , S31 , S41 , S12 , S32 , S42 )

= H(W4 , S21 , S31 , S41 , S12 , S32 , S42 ) + H(F |W4 , S21 , S31 , S41 , S12 , S32 , S42 )

= H(W4 , S21 , S31 , S41 , S12 , S32 , S42 ) + H(F |W4 , W1 , W2 , S21 , S31 , S41 , S12 , S32 , S42 ) (181) = H(W4 , S21 , S31 , S41 , S12 , S32 , S42 )

(182)

= H(W4 , S21 , S31 , S41 , S12 , S32 , S42 , S23 , S13 )

(183)

= H(W4 , S21 , S31 , S12 , S32 , S23 , S13 )

(184)

= H(W4 , U123 ),

(185)

where (181) follows from the fact that W1 can be obtained from (S21 , S31 , S41 ) and W2 can be obtained from (S12 , S32 , S42 ). Next, (182) follows from (146). (183) follows from the fact that W2 (and hence S23 ) is a function of (S12 , S32 , S42 ). Similarly, W1 (and hence S13 ) is a function of (S21 , S31 , S41 ). Finally, (184) follows from the fact that (S41 , S42 ) are functions of W4 . In (185), we have defined U123 , (S21 , S12 , S31 , S13 , S32 , S23 ).

(186)

Substituting (185) in (180), we have H(F ) ≤ H(W4 , U123 ) − H(S21 , S31 , S41 , S12 , S32 , S42 ) (a)

= H(W4 , U123 ) − H(S21 , S31 , S41 , S12 , S32 , S42 , S23 , S13 )

= H(W4 , U123 ) − H(S21 , S12 , S31 , S13 , S32 , S23 ) − H(S41 , S42 |S21 , S12 , S31 , S13 , S32 , S23 )

(b)

= H(W4 , U123 ) − H(U123 ) − H(S41 , S42 |U123 )

where (a) follows from the fact that W2 (and hence S23 ) is a function of (S12 , S32 , S42 ). Similarly, W1 (and hence S13 ) is a function of (S21 , S31 , S41 ). In (b), we have used the variable U123 as defined in (186). In summary, for secure repair of nodes 1 and 2, we have H(F ) ≤ H(W4 , U123 ) − H(U123 ) − H(S41 , S42 |U123 ).

(187)

Similarly, for secure repair of nodes 1 and 3, we have H(F ) ≤ H(W4 , U123 ) − H(U123 ) − H(S41 , S43 |U123 ),

(188)

and for secure repair of nodes 2 and 3, we can obtain H(F ) ≤ H(W4 , U123 ) − H(U123 ) − H(S42 , S43 |U123 ).

(189)

Summing up (187), (188) and (189), we obtain 3H(F ) ≤ 3H(W4 , U123 ) − 3H(U123 ) − H(S41 , S42 |U123 ) − H(S41 , S43 |U123 ) − H(S42 , S43 |U123 ).

(190)

We next note that for any four random variables (X, Y, Z, U ), the following inequality holds: H(X, Y |U ) + H(X, Z|U ) + H(Y, Z|U ) ≥ 2H(X, Y, Z|U ).

(191)

This inequality can be proved as follows: H(X, Y |U ) + H(X, Z|U ) + H(Y, Z|U ) = H(X, Y |U ) + H(Z|U ) + H(X|Z, U ) + H(Y, Z|U )

≥ H(X, Y |U ) + H(Z|X, Y, U ) + H(X|Y, Z, U ) + H(Y, Z|U )

= 2H(X, Y, Z|U ).

Using the inequality (191) by letting X = S41 , Y = S42 , Z = S43 and U = U123 in (190), we obtain 3H(F ) ≤ 3H(W4 , U123 ) − 3H(U123 ) − H(S41 , S42 |U123 ) − H(S41 , S43 |U123 ) − H(S42 , S43 |U123 ) ≤ 3H(W4 , U123 ) − 3H(U123 ) − 2H(S41 , S42 , S43 |U123 )

= 3H(W4 , U123 ) − H(U123 ) − 2H(S41 , S42 , S43 , U123 )

= 3H(W4 , U123 ) − H(U123 ) − 2H(W4 , S41 , S42 , S43 , U123 ) + 2H(W4 |S41 , S42 , S43 , U123 )

= 3H(W4 , U123 ) − H(U123 ) − 2H(W4 , S41 , S42 , S43 , U123 )

(192)

≤ 3H(W4 , U123 ) − H(U123 ) − 2H(W4 , U123 ) = H(W4 , U123 ) − H(U123 )

≤ H(W4 ) + H(U123 ) − H(U123 )

= H(W4 ) ≤ α,

(193)

where (192) follows from the fact that H(W4 |S41 , S42 , S43 , U123 ) = 0. Hence, from (193), we have shown that 3H(F ) ≤ α, and thus we have the proof for α S (194) BII ≤ . 3 D. Proof of Theorem 5: (n, n − 1, n − 1)-DSS, l = (n − 2). In this section, we present the proof for the Type-II setting for the more general (n, n − 1, n − 1)-DSS and l = n − 2. In particular, we will show that α S BII ≤ min ,β . (195) n−1

S ≤ To this end, we focus on proving that BII

α n−1 .

From Type-II security requirement we require:

I(F ; Sπ1 , Sπ2 , . . . , Sπn−2 ) = 0,

(196)

where Sπr is the repair data (from the remaining d = (n − 1) alive nodes) that is used to repair the πr th node. Note that there n are n−2 = n(n − 1)/2 such constraints; each corresponding to the secure repair of a set of l = (n − 2) nodes. Let us consider the first k = (n − 1) nodes, i.e., nodes 1, 2, . . . , n − 1. For secure repair of any l = (n − 2) out of these (n − 1) nodes, we have kl = n−1 n−2 = (n − 1) constraints. Before describing these constraints, we note that the repair data (coming from the remaining (n − 1) alive nodes) for node i is given by: Si = (S1i , . . . , S(i−1)i , S(i+1)i , . . . , Sni ).

(197)

S[1:n−1] , (S1 , S2 , . . . , Sn−1 ),

(198)

Using this, we define

where S[1:n−1] is the collective repair data that is used to repair the first k = (n − 1) nodes. Next, we define Uij , (Sij , Sji ),

(199)

where Uij consists of the repair data Sij that node i sends in repair of node j, and the repair data Sji that node j sends in the repair of node i. Using this, we define for any set A ⊂ {1, . . . , n}: (j)

SA , {Uij : i ∈ A}

UA , {Uij : (i, j) ∈ A, i 6= j}.

(200) (201)

With these definitions in place, we can write the (n − 1) Type-II secrecy constraints for the first (n − 1) nodes as follows: I(F ; S[1:n−1] \ {S1 }) = 0

(202)

I(F ; S[1:n−1] \ {S2 }) = 0 .. .

(203)

I(F ; S[1:n−1] \ {Sn−1 }) = 0.

(204)

S[1:n−1] \ {Si } , (S1 , . . . , Si−1 , Si+1 , . . . , Sn−1 ),

(205)

where we have defined

for i = 1, 2, . . . , (n − 1). Using the constraint (202) (i.e., secure repair of nodes (2, 3, . . . , n − 1)), we have the following: H(F ) = H(F |S[1:n−1] \ {S1 })

= H(F, S[1:n−1] \ {S1 }) − H(S[1:n−1] \ {S1 }).

(206)

Let us focus on the first term appearing in (206): H(F, S[1:n−1] \ {S1 }) ≤ H(F, Wn , S[1:n−1] \ {S1 })

= H(Wn , S[1:n−1] \ {S1 }) + H(F |Wn , S[1:n−1] \ {S1 })

= H(Wn , S2 , S3 , . . . , Sn−1 ) + H(F |Wn , S2 , S3 , . . . , Sn−1 )

≤ H(Wn , S2 , S3 , . . . , Sn−1 ) + H(F |Wn , W2 , W3 , . . . , Wn−1 )

= H(Wn , S2 , S3 , . . . , Sn−1 )

≤ H(Wn , S1 , S2 , S3 , . . . , Sn−1 )

= H(Wn , U[1:n−1] , S1 , S2 , S3 , . . . , Sn−1 ) = H(Wn , U[1:n−1] ) + H(S1 , S2 , S3 , . . . , Sn−1 |Wn , U[1:n−1] )

= H(Wn , U[1:n−1] ) + H(Sn1 , Sn2 , Sn3 , . . . , Sn(n−1) |Wn , U[1:n−1] ) = H(Wn , U[1:n−1] ),

(207)

where (207) follows from the fact that (Sn1 , Sn2 , . . . , Sn(n−1) ) are all functions of Wn . Next, we focus on the second term appearing in (206): H(S[1:n−1] \ {S1 }) = H(S2 , . . . , Sn−1 )

= H(S2 , . . . , Sn−1 , S21 , S2n , . . . , S(n−1)1 , S(n−1)n )

= =

(n) H(U[1:n−1] , S[2,3,...,n−1] ) (n) H(U[1:n−1] ) + H(S[2,3,...,n−1] |U[1:n−1] ),

(208) (209) (210)

where (208) follows from the fact that Wi (and hence (Si1 , Sin )) is a function of Si . Thus, as we have S2 , we can add S21 , S2n ; similarly, as we have Si , we can add (Si1 , Sin ) without increasing the entropy, for i = 2, 3, . . . , n. Finally, (209) follows by directly expanding all the terms S2 , S3 , . . . , Sn−1 and compactly expressing all the variables by using the definitions (n) of U[1:n−1] and S[2,3,...,n−1] which were defined in (201) and (200). Using (207) and (210) in (206), we obtain (n) H(F ) ≤ H(Wn , U[1:n−1] ) − H(U[1:n−1] ) − H S[2,3,...,n−1] |U[1:n−1] . (211) In summary, from the secure repair constraint of nodes {1, . . . , n − 1} \ {1}, we have (n) H(F ) ≤ H(Wn , U[1:n−1] ) − H(U[1:n−1] ) − H S[1:n−1]\{1} |U[1:n−1] . Similarly, for the secure repair of nodes {1, . . . , n − 1} \ {i}, we can obtain (n) H(F ) ≤ H(Wn , U[1:n−1] ) − H(U[1:n−1] ) − H S[1:n−1]\{i} |U[1:n−1] .

(212)

(213)

There are total of (n − 1) such bounds for i = 1, 2, . . . , (n − 1). Summing up these (n − 1) bounds, we obtain (n − 1)H(F ) ≤ (n − 1)H(Wn , U[1:n−1] ) − (n − 1)H(U[1:n−1] ) −

n−1 X i=1

(n) H S[1:n−1]\{i} |U[1:n−1] .

(214)

We next focus on the summand appearing in (214) for which we have the following inequality: n−1 X i=1

(n) (n) H S[1:n−1]\{i} |U[1:n−1] ≥ (n − 2)H S[1:n−1] |U[1:n−1] ,

(215)

which can be shown readily as follows: n−1 X i=1

n−1 X (n) (n) (n) H S[1:n−1]\{i} |U[1:n−1] = H S[1:n−1]\{i} |U[1:n−1] + H S[1:n−1]\{1} |U[1:n−1] = = =

i=2 n−1 X i=2 n−1 X

i=2 n−1 X

(216)

(n) (n) H S[1:n−1]\{i} |U[1:n−1] + H S[2,3,...,n−1] |U[1:n−1]

(217)

(n) H S[1:n−1]\{i} |U[1:n−1] + H U2n , U3n , . . . , U(n−1)n |U[1:n−1]

(218)

(n) H S[1:n−1]\{i} |U[1:n−1] + H U2n |U[1:n−1]

i=2 + H U3n |U2n , U[1:n−1] + . . . + H U(n−1)n |U2n , U3n , . . . , U(n−2)n , U[1:n−1] n−1 X (n) (n) ≥ H S[1:n−1]\{i} |U[1:n−1] + H U2n |S[1:n−1]\{2} , U[1:n−1] i=2

(n) (n) + H U3n |S[1:n−1]\{3} U[1:n−1] + . . . + H U(n−1)n |S[1:n−1]\{(n−1)} , U[1:n−1] (n) (219) = (n − 2)H S[1:n−1] |U[1:n−1] .

This completes the proof for the bound (215). Using (215) to further bound (214), we obtain

(n) (n − 1)H(F ) ≤ (n − 1)H(Wn , U[1:n−1] ) − (n − 1)H(U[1:n−1] ) − (n − 2)H S[1:n−1] |U[1:n−1] (n) = (n − 1)H(Wn , U[1:n−1] ) − H(U[1:n−1] ) − (n − 2)H S[1:n−1] , U[1:n−1] (n) = (n − 1)H(Wn , U[1:n−1] ) − H(U[1:n−1] ) − (n − 2)H S[1:n−1] , Wn , U[1:n−1] ≤ (n − 1)H(Wn , U[1:n−1] ) − H(U[1:n−1] ) − (n − 2)H Wn , U[1:n−1]

(220)

= H(Wn , U[1:n−1] ) − H(U[1:n−1] )

≤ H(Wn ) + H(U[1:n−1] ) − H(U[1:n−1] ) = H(Wn ) ≤ α,

(221) (n)

(n)

where (220) follows from the fact that Wn is a function of S[1:n−1] . To note this, we observe that S[1:n−1] among other variables, consists of (S1n , S2n , . . . , S(n−1)n ), which is precisely the repair data for regenerating the information stored in node n (i.e., Wn ). Hence, (221) implies that (n − 1)H(F ) ≤ α, and hence we have the proof for the bound: α S . (222) BII ≤ n−1