

arXiv:1806.03329v1 [eess.SP] 8 Jun 2018

1 Department of Electrical Engineering, PUC-Rio (email: [email protected]).
2 Center for Telecommunications Studies, PUC-Rio (email: [email protected]).
3 QC2Lab, University of Calgary (email: [email protected]).
4 Department of Industrial Engineering, PUC-Rio (email: [email protected]).

June 12, 2018

Abstract

Manipulation of the detected backscattered Rayleigh signal inside the bandwidth of a frequency-swept optical sub-carrier propagating into an optical fiber permits an efficient localization of faults through a Fourier operator. When the bandwidth is restricted, analysis in the frequency domain can overcome the spatial resolution limitation while also inducing a high-dimensional problem. Introducing the Lasso as a signal processing technique paired with the Baseband Subcarrier Sweep (BSS) framework allows for a methodology to consistently evaluate fiber defects. In this work, a novel technique for optical fiber monitoring within the BSS framework, hereinafter called the BSS-Lasso, is proposed and tested in simulated and real-world environments, taking into account both reflective and non-reflective events. The results show that, for fiber links ranging from 2 to 15 km with up to 3 faults, over 80% of faults are detected within a 50 m range, and indicate that the proposed methodology significantly outperforms current state-of-the-art BSS-based supervision techniques. Finally, the BSS-Lasso allows for precise, low-cost, transmitter-embedded full characterization of optical fiber links.

1 Introduction

As society becomes more dependent on the fast distribution of information, robust operation of telecommunication links has become a top priority for network managers, since even the shortest service outage affects thousands or even millions of users; it has recently been estimated that 80% of all long-distance data traffic in the world is carried by optical fibers [1]. With the evolution of data transmission protocols and network architectures, link supervision technology must also evolve to meet this sought-after robustness while keeping a low impact on the quality of data transmission.

The most successful and well-established supervision technique for the physical layer of optical networks is Optical Time-Domain Reflectometry (OTDR) [2]. Its main advantage is the possibility of obtaining high accuracy and high resolution in long-distance monitoring [3, 4]. However, in order to fully acquire information from a fiber stretch using this technique, data transmission is generally suspended for a non-negligible time frame, since the OTDR pulse carries a considerable amount of optical power spread over a broad spectral bandwidth. As a consequence, although technically efficient, standard OTDR monitoring is usually economically burdensome (see [5] and the references therein for a wider discussion).

To tackle this issue, several other reflectometry-based techniques have been proposed in the technical literature [6–10]. Generally, they seek a monitoring routine to which a narrow spectral channel can be allocated, widening the simultaneous data traffic capability. In other words, an equilibrium is pursued between an efficient monitoring capacity, both in distance and resolution, and the coexistence with data transmission. Thus, in practice, network operators need to balance the pros and cons of different monitoring techniques in order to choose the one that best fits their network, from both a technical and an economical perspective.

Following this rationale, a novel transparent and cost-effective reflectometry-based technique has recently been developed [11]. Due to its characteristics, the technique will henceforth be dubbed Baseband Subcarrier Sweep (BSS). From a technical point of view, the main goals of the BSS monitoring method are to achieve reasonable spatial resolution and dynamic range in optical fiber monitoring, while requiring negligible a priori knowledge about the fiber and causing negligible interference on data traffic. Of these four goals, three have been successfully achieved, with a dynamic range limited to 7 dB being the major hindrance of the method. Furthermore, from an implementation point of view, this technique fits the architecture of the so-called Mobile Fronthaul [12, 13], a ubiquitous concept for next-generation mobile networks, and can be seamlessly incorporated into the optical transmitter (a so-called transmitter-embedded technique) with minimum cost overhead [13].

Generally speaking, the nature of the BSS-based monitoring technique is to measure the fiber's transfer function by detecting the Rayleigh backscattered portion of the propagating optical signal modulated by a swept tone covering several frequencies within an allocated low-frequency bandwidth. The resulting probing signal has a periodic structure in the frequency domain, which can be represented as a linear combination of spatial-dependent phasors that take the fault positions as arguments [9]. Therefore, by identifying which spatial-dependent phasors are present in the monitoring signal, one consequently identifies the set of fault positions present in the fiber.
Although intuitive, the key drawback of this methodology is the necessity to perform an extensive combinatorial search in order to precisely determine the fault positions, an utterly non-trivial task for naive optimization methods [11, 14] to complete in reasonable computational time. In this context, techniques based on high-dimensional analysis emerge as an effective tool. In this work, a widely-used high-dimensional signal processing method known as the Least Absolute Shrinkage and Selection Operator (Lasso) [15] is employed in order to perform computationally efficient fault detection [16]. Fundamentally, the Lasso performs a variable selection from an over-complete dictionary, electing the components that best fit the original signal. Therefore, by designing the over-complete dictionary with the appropriate sinusoidal-based functions of all possible fault positions (e.g., performing a meter-by-meter discretization of the fiber length), the Lasso methodology can be used to evaluate the optical fiber link in practical time.

In fact, the Lasso has already been adapted to fit BSS-based monitoring techniques [9]. Nevertheless, the key issue regarding this previous work was the absence of a complete mathematical model describing the impact of reflective and non-reflective events on the acquired signal; in [9], only non-reflective events were considered within the dictionary. In order to compensate for the unaccounted contribution of reflective events, the partial model either produces spurious events or induces shifts in the real event positions, thus reducing the accuracy and robustness of the monitoring method. Therefore, the main objectives and contributions of this work are threefold:

1. To extend the methodology proposed in [9] to account for a hybrid reflective and non-reflective modeling framework, which represents most practical cases of fiber link supervision. The proposed model makes use of the signal description derived in [11] to construct the over-complete dictionary that feeds the Lasso.

2. To propose an extension of the standard Lasso methodology to accommodate an ex-post analysis. By making use of the properties of reflective events, a tailored heuristic, hereinafter referred to as BSS-Lasso, is designed to enhance the fault position detection.

3. To design and validate a methodology to properly reconstruct the time-domain profile of a fiber based on the BSS-Lasso method. The model that relates the frequency-domain to the time-domain profile allows precise estimation of the fault and reflection magnitudes, a feature that enables the full characterization of the fiber's profile.

Finally, as a minor contribution, a large library of faulty fiber profiles in the frequency domain is made available so that it can be used to perform extensive tests and comparisons with current state-of-the-art monitoring techniques. All data of this test bench is available in [17]. This step, together with the possibility of recreating frequency-domain profiles that mimic experimentally acquired data, is of great importance for future proposals and allows one to validate and compare the capacity of different monitoring routines.

Due to the interdisciplinary character of this work, which involves optical backreflection measurements and high-dimensional data processing, an introductory background is provided. Firstly, the data acquisition and mathematical model of the fault location problem are briefly presented in Section 2, where the high-dimensional feature of the problem naturally arises from the characteristics of the supervision technique, and its constraints are pointed out. In Section 3, the framework of the Lasso is introduced, along with a discussion on how the idiosyncrasies of the stated problem allow for the tailored heuristic, the BSS-Lasso, to be proposed. With the background from these two Sections, experimental fault location results using real-world fibers are presented in Section 4. Moreover, in Section 5, the comparison of the BSS-Lasso with state-of-the-art monitoring techniques is performed in the constructed test bench in order to provide a statistically relevant analysis of the technique’s monitoring capability. The paper is concluded in Section 6, where the advances achieved by the BSS-Lasso and on-going research are summarized.

2 Experimental Setup

Sweeping the frequency of an optical sub-carrier tone to evaluate fault locations in an optical fiber has been extensively studied, and is referred to as Incoherent Optical Frequency-Domain Reflectometry (I-OFDR) [18]. If the bandwidth of the frequency sweep is sufficiently large (on the order of GHz) to yield a granular spatial resolution (on the order of meters), a Fourier transform can be employed to extract the desired information, but at the high cost of consuming a significant portion of the available transmission band. Conversely, if the bandwidth of the frequency sweep is reduced (e.g., to a few kilohertz), the spatial resolution achievable through a Fourier transform is not sufficient for precise detection, but the technique allows for seamless adaptation into certain data transmission formats, in particular Sub-Carrier Multiplexed optical network architectures, where the baseband of the optical carrier is left unoccupied [11, 13]. The BSS technique falls within the latter class, which makes it attractive from the point of view of adaptation to transmitters and coexistence with data transmission. The data acquisition is accomplished by modulating the electric field of an optical carrier with a sinusoidal signal whose frequency is increased in a stepwise fashion that respects the steady-state condition of the fiber, while the amplitude and phase of the backscattered signal inside the frequency-swept bandwidth are determined by a complex frequency beat detector.

2.1 Data Acquisition and Mathematical Model

The experimental setup for data acquisition is presented in Fig. 1. A Network Analyzer (NA) generates the low-frequency step-wise swept tone that directly modulates the current of a laser diode biased with a Laser Bias Source (LBS). This causes the electric field of the optical carrier to be modulated accordingly, thus creating a low-frequency optical sub-carrier channel with bandwidth defined by the step-wise sweep range, which propagates through the fiber; this channel is henceforth dubbed the monitoring channel. The optical circulator, placed immediately before the Fiber Under Test (FUT), allows one to direct any counter-propagating signal, such as the Rayleigh backscattered portion of the incoming signal, to a photodetector. Due to the elastic character of Rayleigh scattering, the detected backscattered signal conserves the original modulation, so that a frequency analysis inside the low-frequency optical subcarrier channel can be performed with the correct apparatus. In this case, the NA itself allows for complex (amplitude and phase) frequency beat detection, so the resulting electrical signal is amplified and directed to the NA for frequency response analysis.

A key challenge in adapting BSS-based technologies to fiber monitoring is precisely the spatial resolution limitation imposed by the Fourier transform analysis of a frequency sweep with limited range. As discussed in [11], analysis of the resulting signal directly in the frequency domain permits one to overcome this limitation and extend the achieved spatial resolution. In order to model the backscattered signal inside the monitoring channel bandwidth, use is made of the fact that the OTDR profile $\{P(z)\}_{z \in [0, L]}$ of a fiber with length $L$ meters can be suitably approximated by a linear combination of step (breaks) and peak (reflections) functions with a single slope (the fiber's attenuation coefficient) [19]. More specifically,

Figure 1: Experimental Setup of the baseband subcarrier sweep monitoring. Data acquisition is performed by measuring the steady-state amplitude and phase values for each frequency step. LD: Laser Diode; NA: Network Analyzer; PD: Photodiode; LBS: Laser Bias Source; BEAT: complex frequency beat detector.

$$P(z) = e^{-2\alpha z} \left( \sum_{b \in B} \phi_b \big[ u(z) - u(z - X_b) \big] + \sum_{r \in R} \theta_r \big[ \delta(z - X_r) \big] \right), \quad \forall\, z \in [0, L], \tag{1}$$

where $\{X_b\}_{b \in B}$ are the positions of the non-reflective events, with $B$ the set of non-reflective event indexes. Similarly, $\{X_r\}_{r \in R}$ are the positions of the reflective events, with $R \subseteq B$ the set of reflective event indexes. In (1), $u(z)$ and $\delta(z)$ denote, respectively, the Heaviside step and Dirac impulse functions. Therefore, the fiber profile contains the level shifts that correspond to non-reflective faults and the spikes that correspond to reflective events. Note that, although $P(z)$ translates the amount of power being backscattered and/or reflected from each position of the fiber, the coefficients $\phi_b$ are not directly related to the fault magnitudes as usually presented in an OTDR; this relation is given by

$$\xi_b = \sqrt{1 - \frac{\phi_b}{\prod_{j \in B \mid j < b} \xi_j^2}}, \quad \forall\, b \in B, \tag{2}$$

where the $\xi_b$ are the fault magnitudes of the OTDR profile in linear scale [11].

As the modulated electric field of the optical carrier propagates into the fiber, the backscattered signal (conserving the modulation properties) propagates in the opposite direction. Since the modulating function is a sinusoidal wave, the modulated optical electric field can be cast into a phasor form with parameter $\kappa$, where $\kappa = \omega n / c$ is the wave vector, with $c$ denoting the speed of light, $n$ the group index of refraction of the fiber's core, and $\omega$ the angular frequency of the low-frequency modulation of the electric field. Because the signal is analyzed after the opto-electrical conversion, the optical phasor with the optical wave vector is omitted [11]. The electrical power at a specific frequency, $S(f)$, detected after the optical circulator, corresponds to the integral of $P(z)$ over $z \in [0, L]$ multiplied by the modulating function. Formally,

$$S(f) = \int_{z \in [0, L]} P(z)\, A\, e^{j 2 \kappa z}\, dz = \int_{z \in [0, L]} \left( \sum_{b \in B} \phi_b \big[ u(z) - u(z - X_b) \big] + \sum_{r \in R} \theta_r \big[ \delta(z - X_r) \big] \right) A\, e^{j 2 z (\kappa + j \alpha)}\, dz, \tag{3}$$

where it should be noted that, for convenience, the intrinsic attenuation coefficient of the optical fiber has been moved into the exponential term. For presentation purposes, the experimental parameters on which the measured backscattered signal depends, such as the Rayleigh backscattering coefficient of the fiber $C$, the photodiode's responsivity $R$, the power launched into the fiber $P_0$, the portion of the modulation depth occupied by the monitoring signal $m$, the photodetector gain $G$, and the Noise-Equivalent Power (NEP) of the Network Analyzer $N_{na}$, are merged into $A = C R P_0 m G N_{na}$, as they are assumed to be constant throughout the measurements. The role of each of these parameters will be discussed in depth in Section 4.

The integral in equation (3) can be solved analytically by making use of the properties of the Heaviside and Dirac impulse functions. More precisely, the Heaviside steps change only the limits of the integral, while the Dirac impulses evaluate the multiplying function at their displacement points. The result is the following complex function, which enables one to associate the frequency profile characteristics of the fiber with both reflective and non-reflective events:

$$S(f) = \sum_{b \in B} \Phi_b\, S_B(f, X_b) + \sum_{r \in R} \Theta_r\, S_R(f, X_r), \tag{4}$$

where

$$\begin{aligned}
\Phi_b &= A \phi_b; \qquad \Theta_r = A \theta_r; \\
S_B(f, X_b) &= \frac{e^{j \frac{4 \pi f n}{c} X_b}\, e^{-2 \alpha X_b} - 1}{j \frac{4 \pi f n}{c} - 2 \alpha}; \\
S_R(f, X_r) &= e^{j \frac{4 \pi f n}{c} X_r}\, e^{-2 \alpha X_r}.
\end{aligned} \tag{5}$$

The non-reflective ($S_B$) and reflective ($S_R$) terms in (5) correspond to the so-called spatial-dependent phasors [9, 11]. The representation in (4) is more suitable for monitoring identification and numerical analysis, since one can explore the fiber length $[0, L]$ to construct the sets $B$ and $R$ that adequately recover the acquired signal, thus precisely indicating the fault positions. In Section 3, an efficient methodology based on the Lasso to identify the sets $B$ and $R$ in (4) is presented.
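As an illustration of (4)–(5), the following minimal sketch evaluates the spatial-dependent phasors with NumPy. The function names, the default group index `n = 1.47` and the attenuation value `alpha = 4.6e-5` m⁻¹ are illustrative assumptions rather than values taken from this work, and the constant $A$ is set to one.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def s_break(f, x, n=1.47, alpha=4.6e-5):
    """Non-reflective phasor S_B(f, X_b) of Eq. (5); n and alpha are assumed values."""
    k2 = 4.0 * np.pi * f * n / C_LIGHT  # 2 * kappa = 4 * pi * f * n / c
    return (np.exp(1j * k2 * x) * np.exp(-2.0 * alpha * x) - 1.0) / (1j * k2 - 2.0 * alpha)


def s_reflect(f, x, n=1.47, alpha=4.6e-5):
    """Reflective phasor S_R(f, X_r) of Eq. (5)."""
    k2 = 4.0 * np.pi * f * n / C_LIGHT
    return np.exp(1j * k2 * x) * np.exp(-2.0 * alpha * x)


def model_signal(f, breaks, reflections, n=1.47, alpha=4.6e-5):
    """S(f) of Eq. (4) with A = 1.

    breaks and reflections are lists of (coefficient, position in meters).
    """
    f = np.asarray(f, dtype=float)
    s = np.zeros(f.shape, dtype=complex)
    for phi, x in breaks:
        s += phi * s_break(f, x, n, alpha)
    for theta, x in reflections:
        s += theta * s_reflect(f, x, n, alpha)
    return s
```

A call such as `model_signal(np.linspace(1e3, 2e4, 100), [(1.0, 3664.0)], [(0.1, 3664.0)])` returns the complex frequency profile of a hypothetical fiber with one reflective fault; the sweep range used here is likewise only an example.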

2.2 Equivalence of time- and frequency-domain monitoring

Following the results of the previous section, it becomes clear that identifying the sets $B$ and $R$, which corresponds to fully characterizing the optical fiber, can be performed either in the frequency domain using $S(f)$ and the low-frequency profile [9] or in the time domain using $P(z)$ [19]. Developing a consistent mathematical procedure that identifies these sets while utilizing the low-frequency profile of the fiber is the main goal of this work. Before the method is described, however, it is worthwhile to analyze the mathematical model and validate its description. For this purpose, the OTDR is an excellent reference since it is the standard procedure for characterizing a fiber link, as it provides the time-domain profile of the fiber.

Given a fiber link for which the reflective and non-reflective events have been identified, the validation of the mathematical model can be performed by evaluating the low-frequency response of the fiber from two different, but in principle equivalent, points of view: (i) the signal output from the Network Analyzer, i.e., the low-frequency profile of the fiber, can be compared to the model function $S(f)$ using the known $B$ and $R$ sets; (ii) the OTDR profile of the fiber $\{P(z)\}_{z \in [0, L]}$ combined with (3) generates the low-frequency profile, which can also be compared to the model function $S(f)$ given $B$ and $R$. The agreement between these two branches of analysis, depicted in the diagram of Fig. 2 for clarity, validates the model and allows for a direct translation between the results in the frequency and time domains.

In the first panel of Fig. 3, the OTDR profile, or time-domain profile, of an illustrative fiber is depicted as a reference; this trace is visually useful since the reflective and non-reflective events are directly identifiable. The respective positions are determined from it to be 3664 m and 10036 m. The frequency-domain profiles of the same fiber are depicted: (i) in the second panel of Fig. 3, following the upper analysis branch; (ii) in the third panel of Fig. 3, following the lower analysis branch. Together with the experimentally acquired frequency-domain profiles of the example fiber (in black), the model function $S(f)$, calculated based on the known event positions, is also depicted (in red). It should be noted that only the real part of the profiles is depicted for ease of visualization, even though the imaginary counterpart follows a similar pattern. Apart from slight mismatches arising from the imperfection of the measurement apparatus, the correspondence between the results validates the mathematical model described in (3)–(4).

Figure 2: Branches of analysis clarified in the form of a diagram. In the upper branch, the output signal from the BSS monitoring technique is compared to the $S(f)$ model signal calculated based on the fault positions. In the lower branch, the signal acquired by a standard OTDR device is processed according to the model of Eq. (3) and, then, compared to the same $S(f)$. For the sake of presentation, $g(f, z)$ is defined as the integrand in (3).
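The lower analysis branch can be mimicked numerically, as in the sketch below: a sampled profile $P(z)$ built from (1) is integrated as in (3) and compared with the closed form (4)–(5). All numerical values (group index, attenuation, event coefficients, sweep range, 1 m sampling) are hypothetical, $A$ is set to one, and assigning the reflection to the event at 10036 m is an assumption made only for this example. Each Dirac impulse is approximated by a single sample of height $\theta_r / dz$.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # m/s


def otdr_profile(z, dz, breaks, reflections, alpha):
    """Sampled P(z) of Eq. (1); a Dirac impulse becomes one sample of height theta/dz."""
    shape = np.zeros_like(z)
    for phi, x in breaks:
        shape += phi * (z < x)  # u(z) - u(z - X_b) for z >= 0
    for theta, x in reflections:
        shape[int(round(x / dz))] += theta / dz
    return np.exp(-2.0 * alpha * z) * shape


def frequency_from_otdr(freqs, z, dz, profile, n):
    """Lower branch: Riemann sum of P(z) * exp(j * 2 * kappa * z) over z, with A = 1."""
    k2 = 4.0 * np.pi * np.asarray(freqs, dtype=float)[:, None] * n / C_LIGHT
    return np.sum(profile[None, :] * np.exp(1j * k2 * z[None, :]), axis=1) * dz


def model_signal(freqs, breaks, reflections, n, alpha):
    """Closed form of Eqs. (4)-(5) with A = 1."""
    f = np.asarray(freqs, dtype=float)
    k2 = 4.0 * np.pi * f * n / C_LIGHT
    s = np.zeros(f.shape, dtype=complex)
    for phi, x in breaks:
        s += phi * (np.exp((1j * k2 - 2.0 * alpha) * x) - 1.0) / (1j * k2 - 2.0 * alpha)
    for theta, x in reflections:
        s += theta * np.exp((1j * k2 - 2.0 * alpha) * x)
    return s


# Hypothetical fiber: a break at 3664 m and a reflective fiber end at 10036 m.
n, alpha, dz = 1.47, 4.6e-5, 1.0
breaks = [(0.4, 3664.0), (0.6, 10036.0)]
reflections = [(5.0, 10036.0)]
z = np.arange(0.0, 10036.0 + dz, dz)
freqs = np.linspace(1e3, 2e4, 40)

profile = otdr_profile(z, dz, breaks, reflections, alpha)
s_otdr = frequency_from_otdr(freqs, z, dz, profile, n)
s_model = model_signal(freqs, breaks, reflections, n, alpha)
rel_err = np.linalg.norm(s_otdr - s_model) / np.linalg.norm(s_model)
print(f"relative mismatch between the two branches: {rel_err:.2e}")
```

The residual mismatch in this sketch comes only from the discretization of the integral; with measured data, apparatus imperfections would add to it.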

3 BSS-Lasso monitoring technique

The main objective of this work is to develop an efficient methodology to identify the spatial-dependent phasors that compose the frequency profile acquired using the BSS probing apparatus. More specifically, the methodology aims to determine the sets of non-reflective and reflective event locations, respectively $B$ and $R$, based on the data acquisition architecture described in Fig. 1. For this purpose, the theoretical results developed in Section 2 are combined with a high-dimensional technique based on $\ell_1$ regularization, known as the Lasso [15]. The proposed BSS-Lasso method involves three main stages, which are thoroughly described in this section. The first stage, called the selection stage, extends the model constructed in [9] to account for reflective events; the second, named the correction stage, exploits properties of the problem to design a heuristic that corrects errors related to reflective events; finally, the third stage, called the treatment stage, performs a refinement step in order to output unambiguous results. For completeness, we begin by presenting the foundations of the Lasso.

3.1 The Lasso technique

The Lasso [15] was originally designed to perform computationally efficient variable selection in regression analysis through $\ell_1$ regularization (see [20, 21] for a systematic literature review on variable selection techniques and extensions of the Lasso). The goal of the Lasso is, thus, to provide an adequate selection of predictors in order to successfully explain the dynamics of a given variable while reducing the number of selected explanatory variables through the $\ell_1$-norm penalty. Formally, it consists of a standard least-squares formulation with the addition of a penalty on the $\ell_1$-norm of the selections:

$$\min_{\beta} \left\{ \lVert y - M \beta \rVert_2^2 + \lambda\, \beta^\top w \;\middle|\; \beta \ge 0 \right\}, \tag{6}$$

where $y$ is the observed/dependent variable, $M$ stands for a matrix of explanatory/independent variables, $\beta$ is a decision vector of coefficients, and $w$ and $\lambda$ are, respectively, a positive vector and a positive scalar parameter. The $\ell_1$-norm penalty term (second term of the objective function) performs a regularization of the $\beta$ vector around zero. In other words, it penalizes "unnecessary" deviations from zero, thus acting as a variable selection mechanism. In this regularization, $w$ stands for the penalty weight at each coordinate of $\beta$, and $\lambda$ defines the penalty magnitude for the total regularization term.

Figure 3: Time- and frequency-domain reconciliation based on the two branches of analysis in Fig. 2. In the upper panel, the time-domain profile is depicted as a reference. In the second panel, the output of the BSS monitoring technique (black) is compared to the model function (red). In the third panel, the frequency-domain profile calculated from the time-domain profile and (3) (black) is compared to the model function (red).

It is important to note that the standard Lasso does not restrict the vector $\beta$ to be non-negative. Nevertheless, since a permanent power loss or a momentary power peak (induced by a fault or reflection, respectively) can only generate frequency-domain signals with positive coefficients, the positiveness constraint ensures the physical soundness of the model.

A key challenge in applying the Lasso in practice is an adequate definition of the penalty parameters $w$ and $\lambda$. Usually, practitioners set $w = 1$ and apply an information criterion (e.g., the Extended Bayesian Information Criterion (EBIC) [22, 23]) to select, from a pre-defined set $\Lambda$ of penalty values, the magnitude of $\lambda$. Therefore, one needs to solve the optimization problem (6) for all $\lambda \in \Lambda$ and pick the one that results in the best (i.e., minimal) EBIC. Algorithm 1 showcases the Lasso procedure as conducted in this work¹.

Algorithm 1 Lasso
    for $\lambda$ in $\Lambda$ do
        $\hat{\beta}(\lambda) \leftarrow \arg\min_{\beta \ge 0} \lVert y - M \beta \rVert_2^2 + \lambda\, \beta^\top w$
    end for
    return $\hat{\beta} \leftarrow \arg\min_{\hat{\beta}(\lambda)} \mathrm{EBIC}(\hat{\beta}(\lambda))$

For the sake of simplicity, $\mathrm{EBIC}(\hat{\beta}(\lambda))$ represents the evaluation of the Extended BIC from the solution of (6) with penalty $\lambda$. Henceforth, Algorithm 1 will be referred to as Lasso($y$, $M$, $w$).
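A minimal sketch of Algorithm 1 is given below, under stated assumptions. It solves (6) by cyclic coordinate descent with a non-negativity clip, which is a stand-in for whichever solver is actually used, and it scores each solution with one common form of the EBIC for Gaussian linear models, $n \log(\mathrm{RSS}/n) + k \log n + 2 \gamma_{\mathrm{EBIC}} k \log p$. The function names, the EBIC parameter value 0.5, and the explicit `lambdas` argument are illustrative choices, not taken from this work.

```python
import numpy as np


def nonneg_lasso(y, M, w, lam, n_sweeps=500, tol=1e-10):
    """Coordinate descent for min ||y - M b||_2^2 + lam * w^T b subject to b >= 0."""
    y = np.asarray(y, dtype=float)
    M = np.asarray(M, dtype=float)
    w = np.asarray(w, dtype=float)
    beta = np.zeros(M.shape[1])
    col_sq = np.sum(M * M, axis=0)
    resid = y.copy()  # equals y - M @ beta for beta = 0
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(M.shape[1]):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual excluding its own contribution.
            rho = M[:, j] @ resid + col_sq[j] * beta[j]
            new = max(0.0, (rho - 0.5 * lam * w[j]) / col_sq[j])
            if new != beta[j]:
                resid -= M[:, j] * (new - beta[j])
                max_step = max(max_step, abs(new - beta[j]))
                beta[j] = new
        if max_step < tol:
            break
    return beta


def ebic(y, M, beta, gamma_ebic=0.5):
    """Assumed EBIC form for a Gaussian linear model (smaller is better)."""
    n_obs, n_var = M.shape
    k = int(np.count_nonzero(beta))
    rss = float(np.sum((np.asarray(y, dtype=float) - M @ beta) ** 2))
    return (n_obs * np.log(max(rss, 1e-300) / n_obs)
            + k * np.log(n_obs) + 2.0 * gamma_ebic * k * np.log(n_var))


def lasso(y, M, w, lambdas):
    """Algorithm 1: solve (6) for every penalty value and keep the minimal-EBIC solution."""
    best = None
    for lam in lambdas:
        beta = nonneg_lasso(y, M, w, lam)
        score = ebic(y, M, beta)
        if best is None or score < best[0]:
            best = (score, beta, lam)
    return best[1], best[2]
```

For large dictionaries, a dedicated package would normally replace the plain Python loops; the sketch only makes the structure of the procedure explicit.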

3.2 Model design and selection stage of the BSS-Lasso

In order to identify the set of fault locations on a given fiber link based on its frequency-domain profile, equation (4) is accommodated into the Lasso framework discussed in Subsection 3.1. The procedure starts by creating a spatial discretization of the fiber length $L$ into the locations $X = \{X_1, \cdots, X_q\}$. For a reasonable granularity, the fault locations are within a negligible distance of given elements in $X$. Additionally, a byproduct of the data acquisition architecture discussed in Section 2 is a set of frequencies $F = \{f_1, \cdots, f_m\}$ for which the probing signal has been precisely evaluated. Therefore, the matrix $M$ in (6) can be defined as

$$M = \frac{1}{L} \cdot \begin{bmatrix} M_B & M_R \end{bmatrix}, \tag{7}$$

where matrices $M_B$ and $M_R$ represent the fault and reflection signals as follows:

$$M_B = \begin{bmatrix} \mathrm{Re}\{S_B(F, X)\} \\ \mathrm{Im}\{S_B(F, X)\} \end{bmatrix}; \qquad M_R = \begin{bmatrix} \mathrm{Re}\{S_R(F, X)\} \\ \mathrm{Im}\{S_R(F, X)\} \end{bmatrix}. \tag{8}$$

¹ The Lasso can be efficiently executed through several open-source packages, among which we highlight glmnet [24], which utilizes coordinate descent with Fortran subroutines and is available in several languages such as Julia, Matlab and R.

In (7), $M$ contains the real and imaginary parts of the signals $S_B$ and $S_R$ defined in (5), generated by every possible fault and reflection location within the given fiber discretization $X$ and every frequency in $F$. Furthermore, a normalization procedure (division by $L$ in (7)) is performed to avoid numerical issues, and an intercept with an associated zero penalty can be added to account for possible measurement effects that offset the acquired data. According to the model description, the dependent variable $y$ is an instance of $S(f)$, which is also decomposed into its real and imaginary parts:

$$y = \begin{bmatrix} \mathrm{Re}\{S(F)\} \\ \mathrm{Im}\{S(F)\} \end{bmatrix}. \tag{9}$$

Within this design, the decision vector $\beta$ measures the coefficients $\Phi$ and $\Theta$ in equation (4), thus being decomposed into non-reflective and reflective coefficients: $\beta = [\{\beta_b^B\}_{b=1}^{q}, \{\beta_r^R\}_{r=1}^{q}]^\top$. Therefore, the position within $X$ in which a fault is located can be identified by picking the elements of $\{\beta_b^B\}_{b=1}^{q}$ that are greater than zero after executing Algorithm 2 with $M$ and $y$ defined in (7) and (9), respectively. Note that, as a byproduct of the proposed methodology, the network operator can also distinguish whether the fault position has a reflective event, i.e., whether the corresponding $\beta^R$ is also greater than zero. The selection stage is explicitly presented in Algorithm 2.

Algorithm 2 BSS-Lasso – Selection stage
    $\hat{\beta}^{(1)} \leftarrow$ Lasso($y$, $M$, $1$)
    return $\hat{\beta}^{(1)}$

Remark 1 The monitoring technique discussed in [9], called SincLasso, is a particular instance of the selection stage obtained by dropping matrix $M_R$ from the model, thereby not taking reflective events into account.
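The construction of $M$ and $y$ in (7)–(9) can be sketched with NumPy broadcasting as follows. The function names and the default values of `n` and `alpha` are illustrative assumptions; the optional zero-penalty intercept is left out.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # m/s


def build_dictionary(freqs, positions, length, n=1.47, alpha=4.6e-5):
    """Matrix M of Eqs. (7)-(8): shape (2m, 2q), columns [M_B, M_R] divided by L."""
    f = np.asarray(freqs, dtype=float)[:, None]      # shape (m, 1)
    x = np.asarray(positions, dtype=float)[None, :]  # shape (1, q)
    k2 = 4.0 * np.pi * f * n / C_LIGHT               # 2 * kappa per frequency
    s_r = np.exp(1j * k2 * x) * np.exp(-2.0 * alpha * x)  # S_R(F, X)
    s_b = (s_r - 1.0) / (1j * k2 - 2.0 * alpha)            # S_B(F, X)
    m_b = np.vstack([s_b.real, s_b.imag])
    m_r = np.vstack([s_r.real, s_r.imag])
    # Because of the 1/L normalization, fitted coefficients are scaled by L.
    return np.hstack([m_b, m_r]) / length


def stack_signal(s):
    """Vector y of Eq. (9): real parts stacked over imaginary parts."""
    s = np.asarray(s, dtype=complex)
    return np.concatenate([s.real, s.imag])
```

With a hypothetical 100 m grid over an 8 km link, `build_dictionary(np.linspace(1e3, 2e4, 50), np.arange(0.0, 8001.0, 100.0), 8000.0)` returns a 100 × 162 matrix whose first 81 columns are the non-reflective candidates and whose last 81 columns are the reflective ones.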

3.3 Correction stage of the BSS-Lasso

In [9], it was observed that the proposed SincLasso monitoring scheme regularly failed to accurately identify fault positions whenever the event had a reflective component. More specifically, significant shifts in the estimated fault position with respect to the real position were observed. Furthermore, a similar pattern persisted in the BSS-Lasso selection stage, even with the explicit inclusion of reflective events in the modeling design. Nevertheless, it was observed that, although the fault selection remained shifted in the presence of a reflection, its reflective counterpart was consistently accurate. Therefore, by making use of this empirical observation, a correction stage was designed to accommodate an ex-post analysis of the selection stage in order to improve the supervision of fibers with reflective events.

The correction stage has a similar motivation to the Adaptive Lasso [25], in that it uses Lasso selections to modify the penalty vector $w$. More precisely, reflections only exist accompanied by a fault at the same position (i.e., $R \subseteq B$). Therefore, since the selection stage has shown much greater accuracy for the reflective selections, the weight vector $w$ can be adjusted at the respective fault position, reducing its penalty so that Algorithm 1 can be re-executed with the adjusted $w$. The correction stage is presented next.


Algorithm 3 BSS-Lasso – Correction stage
    Let $\hat{\beta}^{(1)}$ be the selection stage output. Let $Q = \{q + 1, ..., 2q\}$, $\epsilon > 0$, $0 < \gamma < 1$.
    Initialize penalty vectors $w^{(k)} \leftarrow 1$, $k = 2, 3$.
    for $k = 2$ to $3$ do
        i. If $\max_{j \in Q} \{\hat{\beta}_j^{(k-1)}\} = 0 \implies$ return $\hat{\beta}^{(k-1)}$
        ii. Adjust penalty vector $w^{(k)}$: $w_{i-q}^{(k)} \leftarrow \gamma$, $\forall\, i \in Q \mid \hat{\beta}_i^{(k-1)} > \epsilon \cdot \max_{j \in Q} \{\hat{\beta}_j^{(k-1)}\}$
        iii. Run the Lasso with penalty $w^{(k)}$: $\hat{\beta}^{(k)} \leftarrow$ Lasso($y$, $M$, $w^{(k)}$)
    end for
    return $\hat{\beta}^{(3)}$

In Algorithm 3, $\epsilon$ is a small number that acts as a sensitivity threshold. Recall from the model design in Subsection 3.2 that coordinates $\{1, ..., q\}$ in $\beta$ correspond to the non-reflective event candidates ($\{\beta_b^B\}_{b=1}^{q}$) and coordinates $\{q + 1, ..., 2q\}$ to the reflective event candidates ($\{\beta_r^R\}_{r=1}^{q}$). Therefore, $\gamma \in (0, 1)$ is defined to reduce the penalty weight on the first $q$ elements of $\beta$ according to the reflective selections of the previous iteration, such that $w^{(k)}$ has all elements equal to one, except those that represent faults with non-negligible reflective selections, which receive a reduced penalty value $\gamma$.

More precisely, if the selection stage outputs any non-zero reflective selections, the BSS-Lasso enters the correction stage, where up to two more iterations are computed in order to address the previously mentioned shifting phenomenon. For each iteration $k$, a reduced penalty vector $w^{(k)}$ is created based on the selections of the previous iteration, and the Lasso is executed accordingly. The rationale behind the correction stage is that, if a reflection was found at a given position, it generally indicates the presence of a fault at that location. Additionally, if Algorithm 3, due to the reduced penalty, selects at the correct location a fault that was previously shifted, this often causes the algorithm to reject its previous incorrect position because of the $\ell_1$-norm penalty, thus correcting the shifting phenomenon.

For illustrative purposes, Fig. 4 depicts the monitoring accuracy of the SincLasso, the BSS-Lasso up to the selection stage, and the complete BSS-Lasso for an 8 km fiber link. Note that, for the reflective fault at 4 km, both the SincLasso and the BSS-Lasso up to the selection stage incurred an error of over 400 m, while the complete BSS-Lasso presented a 10 m error due to the correction stage.

Remark 2 In this work, the proposed algorithm has up to three overall Lasso iterations. Although additional iterations might be justifiable to further improve the quality of the method, there is no consistent stopping criterion that avoids cycling. In fact, two overall iterations are sufficient for the existence of a correction stage. However, it was empirically observed that a third one, which acts as a refinement step, significantly improves the precision of the method. Furthermore, the benefits of additional iterations (more than three) were observed to be marginal, non-existent or even negative. For these reasons, the number of iterations was fixed at three in Algorithm 3, which results in a computationally effective procedure.
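The reweighting loop of the correction stage can be sketched as below. The Lasso solver is passed in as a callable with the signature `lasso(y, M, w)`, so that any implementation of Algorithm 1 can be plugged in; the default values of `eps` and `gamma` are placeholders, since no specific values are stated here.

```python
import numpy as np


def correction_stage(beta1, y, M, lasso, eps=1e-3, gamma=0.5, max_iter=3):
    """Correction stage sketch: reweight fault penalties where reflections were selected.

    beta1 is the selection-stage output of length 2q (q fault entries, then q
    reflection entries). lasso is any callable implementing Lasso(y, M, w).
    eps and gamma are hypothetical defaults.
    """
    beta_prev = np.asarray(beta1, dtype=float)
    q = beta_prev.size // 2
    for _ in range(2, max_iter + 1):  # iterations k = 2, 3
        refl = beta_prev[q:]
        peak = refl.max()
        if peak <= 0.0:
            # Step i: no reflective selection, nothing to correct.
            return beta_prev
        # Step ii: unit penalties, reduced to gamma at the fault entries that
        # share a position with a non-negligible reflective selection.
        w = np.ones(2 * q)
        w[np.flatnonzero(refl > eps * peak)] = gamma
        # Step iii: re-run the Lasso with the adjusted penalty vector.
        beta_prev = np.asarray(lasso(y, M, w), dtype=float)
    return beta_prev
```

Passing the solver as an argument keeps the sketch independent of the particular Lasso implementation and of how the set $\Lambda$ is chosen.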

3.4 Treatment stage of the BSS-Lasso

Ultimately, Algorithm 3 outputs a $2q$-dimensional vector indicating the fault locations. However, in many cases, the result is not presented as a handful of isolated selections, but rather as a set, or cluster, of event positions. Although the appearance of such clusters may have its roots in the limited spatial resolution of the monitoring technique, the Lasso is known to produce clusters whenever the over-complete dictionary variables present high levels of correlation [20]. In fact, depending on the fiber discretization, adjacent columns of the explanatory matrix $M$ represent positions close enough that their induced signals are almost indistinguishable, and thus strongly correlated. As a consequence, the selection and correction stages oftentimes output multiple selections around the real fault position as a cluster (see Fig. 5 for an illustrative example containing an output from the correction stage and the final BSS-Lasso result), and it might be useful to refine the result in order to obtain more accurate selections to better assist the network operator.
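The strong correlation between adjacent dictionary columns can be checked numerically. The sketch below compares reflective columns built from $S_R$ in (5) for candidate positions 1 m apart and 2 km apart; the sweep range, group index and attenuation are hypothetical values chosen only for illustration.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # m/s


def reflective_column(freqs, x, n=1.47, alpha=4.6e-5):
    """Stacked real/imaginary parts of S_R(F, x) from Eq. (5); n, alpha are assumed."""
    f = np.asarray(freqs, dtype=float)
    s = np.exp(1j * 4.0 * np.pi * f * n / C_LIGHT * x) * np.exp(-2.0 * alpha * x)
    return np.concatenate([s.real, s.imag])


def cosine_similarity(a, b):
    """Normalized inner product between two dictionary columns."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


freqs = np.linspace(1e3, 2e4, 200)  # hypothetical sweep, 1 kHz to 20 kHz
col = reflective_column(freqs, 4000.0)
near = cosine_similarity(col, reflective_column(freqs, 4001.0))
far = cosine_similarity(col, reflective_column(freqs, 6000.0))
print(f"1 m apart: {near:.8f}   2 km apart: {far:.3f}")
```

Under these assumptions, columns one grid step apart are numerically almost collinear, which is the regime in which the Lasso tends to spread a selection over neighboring positions.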

Figure 4: Comparison of the normalized selections resulting from SincLasso, BSS-Lasso up to the selection stage and complete BSS-Lasso for an 8 km fiber link with a reflective fault at 4 km and a non-reflective fault at 8 km.

A handful of techniques appear in the technical literature to handle correlated explanatory variables within the Lasso framework. The commonly used approach performs an a priori clustering of the correlated explanatory variables before solving the Lasso problem [26]. However, such a priori clustering is not suited to the particular application of this work, since every explanatory variable is correlated with its adjacent neighbors with respect to the distance grid employed in matrix M. To overcome this issue, an alternative is to narrow down the clusters of selections into single selections after the Lasso processing. From a technical point of view, a cluster of selections represents a degenerate projection of y onto the column space of M, i.e., one possible way to express this projection in terms of the column vectors of M. As a consequence, it is likely that the real fault is among the cluster positions and closer to the selections with higher magnitude. Therefore, one can interpret each cluster of selections as a probability density for the respective true fault position.

In this context, several approaches can be applied to directly narrow down the clusters of selections. Firstly, since the clusters can be interpreted as probability densities, a computationally efficient approach is to estimate the true fault position as a weighted average of the positions within each cluster, where the weights are given by the selection magnitudes. Another possibility is to consider all possible combinations of positions with a single selection per cluster and choose the one that yields the smallest least-squares model fitting error (minimal $\ell_2$ norm).
This approach can be implemented either by a combinatorial set of ordinary least squares computations or via Mixed Integer Quadratic Programming (MIQP) [27, 28]. Despite showing the best empirical results, it is, in theory, affected by the curse of dimensionality and can be computationally burdensome, i.e., instances with wide and/or numerous clusters (e.g., a fiber with numerous faults) might be intractable in reasonable computational time. Nevertheless, it should be emphasized that, for most practical cases, this approach can be computed in the order of seconds. Finally, a viable alternative to the combinatorial least squares for fibers with multiple events is the least absolute error (minimal $\ell_1$ norm), which can be solved using Mixed Integer Linear Programming (MILP) algorithms. Although still a combinatorial procedure (due to its integer nature), MILP problems are widely recognized to be computationally cheaper to solve than MIQP problems.

Owing to its better empirical performance, the treatment of the clusters of selections is performed using least squares in this work. It should be highlighted that this choice is more consistent with the proposed BSS-Lasso methodology, since the Lasso already uses the $\ell_2$ norm for the model fitting (see the first term of the objective function in (6)). In fact, this choice of ex-post cluster treatment can be seen as an instance of the Lasso’s original design [15], in which the $\ell_1$ regularization term (second term of the objective function in (6)) is replaced by the $\ell_0$ semi-norm and written as a constraint bounded by

1 for each cluster. More precisely, let $\{C_i\}_{i=1}^{C}$ be the family of $C$ clusters resulting from the BSS-Lasso and $\mathcal{M} = \big[\{M_j\}_{j \in C_i}\big]_{i=1}^{C}$ the explanatory matrix restricted to the position indexes within each cluster. Then the cluster treatment procedure can be formulated as the following mathematical programming problem (see [27, 28] for a wider discussion and efficient formulations of this problem):

$$\min_{\beta} \left\{ \left\| y - \mathcal{M}\beta \right\|_2^2 \;:\; \left\| \beta^i \right\|_0 \le 1, \ \forall\, i \in \{1, \dots, C\}; \ \beta \ge 0 \right\}, \tag{10}$$

where $\beta^i = \{\beta_j\}_{j \in C_i}$. Therefore, in this context, the original $\ell_0$-Lasso is solved, but without its highly combinatorial nature, since the $\ell_0$ search is performed in a drastically reduced space when compared to the original problem. The treatment stage of the BSS-Lasso is thus presented in Algorithm 4.

Algorithm 4 BSS-Lasso – Treatment stage
  Let $\hat{\beta}^{(3)}$ be the correction stage output.
  $i \leftarrow 1$, $C_i \leftarrow \{\}$, $C \leftarrow i$.
  for $j = 1$ to $q$ do
    If $\hat{\beta}^{(3)}_j > 0 \implies C_i \leftarrow C_i \cup \{j\}$
    Else $i \leftarrow i + 1$; $C_i \leftarrow \{\}$; $C \leftarrow i$
  end for
  Construct $\mathcal{M} = \big[\{M_j\}_{j \in C_i}\big]_{i=1}^{C}$
  Solve the $\ell_0$-Lasso (10) and return its optimal decision vector

For future reference, BSS-Lasso will be identified as the sequential computing of Algorithms 2–4.

Figure 5: Time-domain profile in the top panel and BSS-Lasso selections in the bottom panel for a 5 km fiber link with a reflective event at 3 km and a non-reflective event at 5 km. The reflective event results in a single selection, while the non-reflective one induces a cluster of selections.
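The treatment stage can be illustrated with a minimal Python sketch. It is not the authors' implementation (the paper reports Julia code and MIQP formulations [27, 28]); it simply enumerates every combination with at most one column per cluster and keeps the non-negative least-squares fit with the smallest squared error, which is only practical for a few narrow clusters. The function names `find_clusters` and `treat_clusters`, the real-valued toy data, and the decision to skip empty clusters (Algorithm 4 as written opens a new cluster at every zero entry) are assumptions made for illustration.

```python
import itertools
import numpy as np

def find_clusters(beta_hat, tol=0.0):
    """Group consecutive indices with beta_hat > tol into clusters (empty ones skipped)."""
    clusters, current = [], []
    for j, b in enumerate(beta_hat):
        if b > tol:
            current.append(j)
        elif current:
            clusters.append(current)
            current = []
    if current:
        clusters.append(current)
    return clusters

def treat_clusters(M, y, clusters):
    """Exhaustive search for problem (10): at most one column per cluster, beta >= 0."""
    best_beta = np.zeros(M.shape[1])
    best_err = float(y @ y)                      # error of the all-zero solution
    options = [[None] + list(c) for c in clusters]
    for choice in itertools.product(*options):
        cols = [j for j in choice if j is not None]
        if not cols:
            continue
        coef, *_ = np.linalg.lstsq(M[:, cols], y, rcond=None)
        if np.any(coef < 0):                     # violates beta >= 0, discard
            continue
        err = float(np.sum((y - M[:, cols] @ coef) ** 2))
        if err < best_err:
            best_err = err
            best_beta = np.zeros(M.shape[1])
            best_beta[cols] = coef
    return best_beta, best_err

# Toy example (hypothetical data): two clusters around the true columns 3 and 8.
rng = np.random.default_rng(0)
M = rng.normal(size=(20, 12))
y = 2.0 * M[:, 3] + 1.5 * M[:, 8]
beta_stage3 = np.array([0, 0, .4, .9, .3, 0, 0, .2, .7, .1, 0, 0])
clusters = find_clusters(beta_stage3)
beta_final, err_final = treat_clusters(M, y, clusters)
selected = np.flatnonzero(beta_final > 1e-9).tolist()
```

Including the "no selection" option for each cluster matches the constraint $\|\beta^i\|_0 \le 1$, and discarding fits with negative coefficients enforces $\beta \ge 0$. The number of combinations grows as the product of the cluster widths, which is the combinatorial burden the text refers to.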

4

Experimental Validation

Validation of the BSS-Lasso in a real environment consists of comparing its estimated event positions with the reference event positions determined using a standard OTDR device for different fibers. In practice, the number of experimentally tested links is limited by the availability of fibers with different lengths in the laboratory and by the possibility of connecting these fibers to form links with different numbers of events and different lengths. Six fiber links were available, for which the profiles have been individually measured; combinations of these six fibers two-by-two, to compose two-event links, produced 30 more examples. Prior to measuring the fibers, however, a few experimental parameters concerning the data acquisition and the minimum monitoring signal modulation power must be defined.

4.1

Experimental parameters characterization

As shown in the experimental setup of Fig. 1, the optoelectric conversion and signal amplification are performed by a photodetector, which is composed mainly of a photodiode and two amplifiers. From an optical perspective, as long as the backscattered optical signal reaching the photodiode carries power greater than the NEP, the output electrical signal will carry information about the fiber. In the same way, but from an electrical point of view, the signal reaching the NA must be greater than its NEP ($N_{na}$), so that the measured frequency profile reflects the fiber’s characteristics. Since an important characteristic of the Baseband Subcarrier Sweep monitoring technique is its coexistence with data transmission in a transmitter-embedded configuration, as discussed in [11], it is paramount to find a balance between the monitoring signal power and the capacity of meeting the previously mentioned power requirements in both the photodetector and the NA. In this subsection, this balance is analyzed and the experimental parameters used throughout the BSS-Lasso validation are presented.

Analysis of the optical signal-to-noise ratio (SNR) requires knowledge of a few parameters, all of which are summarized in the upper part of Table 1. The laser diode’s bias current is set at 70 mA, yielding an output optical power of 4 dBm; this allows for a 52 mA excursion inside its linear region. An insertion loss of approximately 1 dB in the optical circulator (port 1 → port 2) sets the input optical power to the fiber under test (FUT) to ∼3 dBm. Thus, considering a Rayleigh backscattering coefficient C of −72 dB/m [3], an attenuation coefficient α of 0.2 dB/km, a 6372 m optical fiber, and another 1 dB of insertion loss in the optical circulator (port 2 → port 3), the optical power arriving at the photodiode in the steady-state regime would be −33.20 dBm (or 477 nW). This value has been experimentally verified to be −34.02 dBm.
The employed photodetector exhibits a NEP of 200 pW in a full bandwidth condition [29], so a comfortable 33 dB SNR at the photodiode is ensured. Note, however, that this SNR represents the total optical power arriving at the photodiode, including the DC component. As will be discussed presently, a 10% portion of the laser’s full modulation depth is occupied by the monitoring signal, so the SNR for the signal of interest is approximately 23 dB. Electrical signal analysis begins right after the opto-electrical conversion of the photodiode: although the two amplification stages of the photodetector exhibit noise figures of their own and also amplify noise coming from the photodiode, the optical SNR is sufficiently high so that one can consider the output signal of the photodetector to maintain the same SNR. Thus, taking into account the received optical power of 477 nW and the photodetector parameters (responsivity, transimpedance gain, and the 2nd stage voltage gain) and also the portion of the full modulation depth occupied by the monitoring signal, all shown in Table 1, it is possible to calculate an amplitude of 298.37 mV for the input signal entering port 2 of the NA at low frequencies. According to the mathematical model and the experimental results, the amplitude of the electrical signal has an inverse dependence with the frequency and, as the latter increases, the former is expected to dramatically decrease. The value for the low-frequency measurement, however, has been experimentally measured to be 303.57 mV, which is far above Nna (−95 dBm [30]), setting the electrical SNR at 47.32 dB and validating both the optical and electrical signal analysis. To address the SNR reduction as the frequency increases, which can translate into up to 30 dB reduction at the maximum swept frequency, an averaging process is employed. 
This consists of acquiring several traces inside the same interval and taking the arithmetic average, a resource already available in the employed NA. At the same time, since the frequency-domain profile is determined in a steady-state regime, the intermediate frequency bandwidth of the NA frequency beat detector is set to a low value (150 Hz), translating into a considerably long time constant of 6.64 s. This long time constant translates into an intrinsic averaging of the measured amplitude and phase of the input signal, which, by itself, diminishes noise contributions to the measurement, especially at higher frequencies, and reduces the number of samples required by the averaging procedure. Therefore, in order to reach a compromise between faster data acquisition and negligible noise contribution, a total of 10 samples has been set as the default for the averaging procedure; the total data acquisition process amounts to less than 2 minutes.

As mentioned in the previous analysis, the portion of the laser’s full modulation depth occupied by the monitoring signal was set to 10%, the minimal monitoring power that still allows for accurate fiber characterization. This value directly impacts the capacity for concurrent data transmission in a direct modulation scheme and experimentally characterizes the BSS-Lasso technique in the context of monitoring and data coexistence. To validate this parameter, the amplitude of the signal entering the modulation port of the laser was varied from 50 mVpp to 1.6 Vpp, which corresponds, approximately, to 2% and 62% of the laser’s full modulation depth, respectively. The BSS-Lasso was tested for each signal amplitude in the same fiber testbed (with the parameters described in Table 1): the frequency-domain fiber profile was determined with the BSS setup, and the errors between the real event positions and those given by the output of the BSS-Lasso were evaluated. The experimental result is presented in Fig. 6.

Table 1: Experimental Parameters for Data Acquisition

Domain          Parameter                          Value
Optical         Fiber input power (P0)             3 dBm
                Fiber length                       6372 m
                Circulator loss                    1 dB
                Fiber attenuation (α)              0.2 dB/km
Opto/Electric   PD responsivity @ 1550 nm (R)      1 A/W
Electrical      PD transimpedance gain             626 V/A
                PD 2nd stage voltage gain          1x10^4 V/V
                Laser Bias Source (LBS)            70 mA
                NA modulation power                0 dBm
                Modulation depth (m)               10%
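The optical SNR and low-frequency amplitude figures quoted in this subsection can be cross-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check under the stated assumptions: it takes the 477 nW received power, the 200 pW NEP and the Table 1 values as given, and multiplies the photodetector chain as power × responsivity × transimpedance gain × second-stage gain × modulation depth, which is an assumed reading of how the 298.37 mV figure was obtained. Variable names are illustrative.

```python
import math

# Values taken from the text and Table 1
p_rx_w = 477e-9          # optical power at the photodiode [W]
nep_w = 200e-12          # photodetector NEP, full bandwidth [W]
responsivity = 1.0       # PD responsivity at 1550 nm [A/W]
g_tia = 626.0            # PD transimpedance gain [V/A]
g_stage2 = 1e4           # PD 2nd stage voltage gain [V/V]
mod_depth = 0.10         # share of the modulation depth used for monitoring

def dbm_to_watt(p_dbm):
    """Convert a power level in dBm to watts."""
    return 1e-3 * 10 ** (p_dbm / 10)

# SNR of the total optical power with respect to the NEP
snr_total_db = 10 * math.log10(p_rx_w / nep_w)
# SNR of the monitoring signal alone (10% of the modulation depth)
snr_monitor_db = snr_total_db + 10 * math.log10(mod_depth)
# Low-frequency amplitude at the NA input (assumed simple product of gains)
v_na_in = p_rx_w * responsivity * g_tia * g_stage2 * mod_depth

print(f"-33.20 dBm = {dbm_to_watt(-33.20) * 1e9:.1f} nW")
print(f"total optical SNR   = {snr_total_db:.2f} dB")
print(f"monitoring-only SNR = {snr_monitor_db:.2f} dB")
print(f"NA input amplitude  = {v_na_in * 1e3:.1f} mV")
```

Under these assumptions the total SNR comes out near 33.8 dB, the monitoring-signal SNR near 23.8 dB, and the amplitude near 298.6 mV, in line with the rounded values in the text; the small gap to the quoted 298.37 mV presumably comes from rounding of the received power.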

Figure 6: Impact of the low-frequency monitoring signal’s amplitude, expressed as a percentage of the laser’s full modulation depth, on the BSS-Lasso event position estimation. The 5% mark represents a clear lower bound on the amplitude of the monitoring signal for reliable results to be obtained.

The results validate the experimental parameters presented in Table 1 and show that, for monitoring signals with amplitude greater than 10% of the modulation depth, position estimates are within 30 m of the actual positions. When the monitoring signal’s amplitude falls below 5% of the modulation depth, on the other hand, the distance error rapidly increases. This indicates that the BSS-Lasso can be implemented to continually monitor a fiber link while occupying only a tenth of the laser’s full modulation depth, thus leaving a considerable portion of the laser’s electro-optical transfer function available for data transmission. All the experimental results presented in this paper have been acquired using the parameters described in Table 1.


4.2

Experimental results

In Fig. 7, the event position estimation error of the BSS-Lasso, measured against the reference results of a standard OTDR device, is presented in the form of a scatter plot. Three error thresholds are defined in the scatter plot: the first delimits a low-error interval, where the results are deemed to be very close to the reference, and contains almost 70% of the results; the second delimits results that are fairly close to the reference position, and holds over 93% of the total results. A few outliers can be observed with errors above 100 m. The error distribution, mainly concentrated around the zero-error point and within the 0-50 m error interval, attests to the ability of the BSS-Lasso to characterize the assembled experimental fiber links.

Figure 7: Distribution of position estimation errors for all the experimentally measured fiber profiles. Apart from a few outliers, the positions are within a 100 m error interval of the actual fault positions.

Interesting observations can be made from the results of Fig. 7. First of all, it is clear that the estimation error tends to be closer to zero for single-fiber links. As will also be discussed in Section 5, the estimation is hindered by the number of events present in the fiber link. Secondly, it is noticeable that the presence of reflective events induces a better estimation of the event positions, as all but one of them fell within a 50 m error even with a higher number of events (two non-reflective events and one reflective event). This characteristic, which will also be studied in Section 5, indicates that the modeling of reflective events paired with the correction stage increases the robustness of the estimation.

Up to this point, accurate determination of the fault positions has been demonstrated using the BSS-Lasso. However, determination of the fault magnitudes is of utmost importance for full characterization of the optical fiber link, and can be performed following Eq. (2). Nevertheless, even though the coefficients $\phi_b$ determined by the BSS-Lasso allow for accurate estimation of the BSS frequency-domain profile, they include the bias naturally induced by the Lasso [15]. Moreover, the relationship between the $\phi_b$ and the actual fault magnitudes $\xi_b$ (Eq. (2)) is extremely non-linear, so small deviations in the former induce severely imprecise results for the latter.

To ensure correct estimation of the fault magnitudes, a reconstruction procedure of the time-domain profile of the fiber, which does not depend on the amplitudes determined by the BSS-Lasso, is proposed. It is as follows: using the event position estimates, fiber link profiles are created in the form of Eq. (1), for which the respective non-reflective magnitudes $\xi_b$ are varied within a predetermined interval of [0, 5] dB in two sequences of steps, a coarser one of 0.5 dB and then a finer one of 0.1 dB, and the reflective magnitudes $\theta_r$ are varied within a predetermined interval of [0, 20] dB in steps of 2 dB; from each created $P(z)$, a frequency-domain profile $S(f)$ is calculated using Eq. (3); each created $S(f)$ is compared with the frequency-domain profile estimated by the BSS-Lasso using the $\ell_2$ error norm; the profile that minimizes this metric is presented as the reconstructed fiber profile. Presented in Table 2, for four experimental fiber links, are: the real fault magnitudes, calculated from the profile measured by a standard OTDR device; the magnitudes calculated using Eq. (2) and the BSS-Lasso coefficients; and the magnitudes determined through the reconstruction procedure.

Table 2: Fault Magnitude Comparison

Link   Real Magnitudes [dB]   Calculated Magnitudes [dB]   Reconstructed Magnitudes [dB]
1      1.9, 23.0              2.77, ∞                      1.6, 22.0
2      2.0, 20.7              1.79, ∞                      2.0, 22.0
3      2.9, 17.5              3.53, ∞                      3.0, 22.0
4      0.9, 22.2              1.92, ∞                      1.0, 22.0

In other words, the reconstruction procedure is equivalent to the one depicted in the lower branch of Fig. 2, but instead of using experimentally acquired OTDR profiles, it uses artificially created P (z) based on the results of the BSS-Lasso. The result, as can be perceived from Table 2, is a much more accurate estimate of the fault magnitudes. Finally, an example of the reconstruction procedure for the first fiber link of Table 2, i.e., the artificially generated P (z) that translates into S (f ) that best approximates the estimated frequency-domain signal given by the BSS-Lasso, is depicted in Fig. 8.
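The coarse-then-fine magnitude sweep of the reconstruction procedure can be sketched as follows. This is a simplified illustration for a single magnitude, not the authors' code: `forward` stands in for the chain Eq. (1) → Eq. (3), which is defined elsewhere in the paper, and the `toy_forward` model used in the example is purely hypothetical. A link with several events would sweep the Cartesian product of the per-event grids in the same way.

```python
import numpy as np

def grid(lo, hi, step):
    """Inclusive grid from lo to hi, rounded to avoid floating-point drift."""
    n = int(round((hi - lo) / step))
    return np.round(lo + step * np.arange(n + 1), 6)

def coarse_to_fine(target, forward, lo=0.0, hi=5.0, coarse=0.5, fine=0.1):
    """Return the magnitude in [lo, hi] whose forward(magnitude) is closest to
    target in the l2 sense: coarse sweep first, then a fine sweep around it."""
    def best(candidates):
        errors = [np.linalg.norm(forward(x) - target) for x in candidates]
        return float(candidates[int(np.argmin(errors))])

    x_coarse = best(grid(lo, hi, coarse))
    return best(grid(max(lo, x_coarse - coarse), min(hi, x_coarse + coarse), fine))

# Hypothetical stand-in for Eq. (1) + Eq. (3): a flat profile scaled by the loss.
def toy_forward(loss_db):
    return 10 ** (-loss_db / 10) * np.ones(4)

estimated_profile = toy_forward(2.3)   # plays the role of the BSS-Lasso estimate
reconstructed_db = coarse_to_fine(estimated_profile, toy_forward)
```

The two-pass sweep evaluates 11 coarse and at most 11 fine candidates instead of the 51 points of a single 0.1 dB grid, at the cost of assuming that the error is reasonably well behaved around the coarse minimum.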

Figure 8: Time-domain profile reconstruction based on the results of the BSS-Lasso. In the upper panel, the estimated frequency-domain profile of the BSS-Lasso is plotted against the $S(f)$ that corresponded to the minimum $\ell_2$ error norm during the reconstruction step. In the lower panel, the original OTDR profile acquired with a standard OTDR device is depicted along with the reconstructed profile.

5

Simulation Results

Table 3: Selection errors for each set of simulated links

              1 fault                          2 faults                         3 faults
Error [m]     SincLasso  BSS-1    BSS-Lasso    SincLasso  BSS-1    BSS-Lasso    SincLasso  BSS-1    BSS-Lasso
[0, 50]       52.20%     52.20%   89.80%       49.60%     49.45%   81.50%       50.70%     50.77%   77.50%
(50, 100]     1.60%      1.60%    1.60%        1.20%      1.45%    2.95%        2.60%      2.87%    4.03%
(100, 200]    11.80%     11.90%   7.20%        4.40%      5.85%    5.65%        3.33%      4.93%    4.30%
(200, ∞)      34.40%     34.30%   1.40%        44.80%     43.25%   9.90%        43.37%     41.43%   14.17%

Table 4: Contingency table (±50 m)

                   Fault Present                           Fault Absent
                   SincLasso    BSS-1      BSS-Lasso       SincLasso    BSS-1      BSS-Lasso
Fault Found        True Positives                          False Positives
                   3035         3034       4853            2808         3251       1379
Fault Neglected    False Negatives                         True Negatives
                   2945         2966       1147            2959557      2958128    2957685
Measures           Sensitivity                             Specificity
                   50.58%       50.57%     80.88%          99.91%       99.89%     99.95%

In order to illustrate the robustness of the proposed monitoring methodology, extensive computational tests are conducted on the BSS-Lasso in this section. To do so, a large-scale test bench was created: three sets of 1000 fiber links each, containing, respectively, one, two, and three faults, were randomly generated according to the following steps:

1. Fiber lengths are sampled uniformly within L ∈ [2, 15] km.
2. Given the fiber length sampled in Step 1, the fault locations are also sampled uniformly within [2, L], with a mandatory fault at the end of the fiber.
3. In order to sample reflective events, a 50% chance of having an associated reflection is attributed to each fault sampled in Step 2.
4. The magnitudes of the events are then randomly chosen, with faults ranging uniformly within [1, 5] dB and reflections up to 20 dB.
5. The time-domain profile $P(z)$ of the fiber link is constructed using Eq. (1) and the parameters sampled in Steps 1–4.
6. The methodology presented in Section 2 is utilized to obtain the frequency-domain profile $S(f)$ from $P(z)$, using Eq. (3) with a set of frequencies within [100, 100000] Hz in steps of 100 Hz.

For reproducibility purposes, the complete test bench of fiber links used in this section is available in [17]. All tests were conducted in the Julia language on an Intel Core i7-490K CPU at 4.00 GHz with 32 GB of RAM. The simulation parameters used throughout the test bench were: a 10 m length discretization; a reduced penalty factor γ = 0.5; and a sensitivity threshold of 0.05. For the sake of comparison, the performance of the BSS-based monitoring technology presented in [9], known as SincLasso, was also evaluated on the same test bench. Furthermore, in order to quantify the contribution of the correction stage described in Algorithm 3, the results of the BSS-Lasso up to the selection stage (named BSS-1 for presentation purposes) are also studied.

The test results are presented in two separate tables for a more rigorous analysis. In Table 3, the errors are stratified into four distance intervals and the respective percentage of position estimates within each interval is evaluated for the three techniques. Table 4 presents the so-called contingency table. The idea is to perform a binary classification of fault/no fault events based on the output of the techniques.
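Steps 1–4 of the test bench protocol can be sketched in Python as below. This is an illustrative reading, not the published generator (the actual test bench is the one referenced in [17]): it assumes that the interval [2, L] is expressed in kilometres, that the requested number of faults includes the mandatory end-of-fiber fault, and that reflection magnitudes are drawn uniformly from [0, 20] dB, none of which is stated explicitly in the text. The function name `sample_link` and the dictionary layout are hypothetical.

```python
import random

def sample_link(n_faults, rng):
    """Draw one simulated link following Steps 1-4 (positions in m, magnitudes in dB)."""
    length_m = rng.uniform(2000.0, 15000.0)                       # Step 1
    # Step 2: n_faults - 1 free positions plus the mandatory end-of-fiber fault
    positions = sorted(rng.uniform(2000.0, length_m) for _ in range(n_faults - 1))
    positions.append(length_m)
    events = []
    for z in positions:
        event = {
            "position_m": z,
            "loss_db": rng.uniform(1.0, 5.0),                     # Step 4 (fault)
            "reflection_db": None,
        }
        if rng.random() < 0.5:                                    # Step 3
            event["reflection_db"] = rng.uniform(0.0, 20.0)       # Step 4 (assumed range)
        events.append(event)
    return {"length_m": length_m, "events": events}

rng = random.Random(2018)            # fixed seed so the draw is repeatable
links = [sample_link(3, rng) for _ in range(1000)]
share_reflective = sum(
    e["reflection_db"] is not None for link in links for e in link["events"]
) / 3000
```

With 3000 sampled events, `share_reflective` should sit close to 0.5, which is the proportion that the later discussion relates to the accuracy ceiling of the SincLasso.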
More precisely, in the contingency table, True Positives are estimates within the low-error interval, i.e., those whose selection error is 50 m or lower. Similarly, False Positives are estimates outside the low-error interval. It should be highlighted that Table 4 was constructed by combining the results of the three sets of fiber links, totaling 6000 events.

The first conclusion drawn from both tables is the clear dominance of the BSS-Lasso over the other two techniques: the BSS-Lasso places over 80% of fault estimates within the low-error interval, while the other techniques barely reach 50%. A second observation is that the number of events clearly impacts the accuracy of the methodology, which decreases from 89.80% for the set of fiber links with a single event to 77.50% for the set with three events. It is also striking that the selection stage of the BSS-Lasso performs almost identically to the SincLasso, indicating that the inclusion of the reflections in the dictionary of the Lasso is not sufficient to accurately detect reflective events. Furthermore, the near 50% accuracy limit of the SincLasso correlates strongly with Step 3 of the test bench creation protocol, in which the probability of a reflective event was set to 50%. This result reflects the lack of precision induced by a reflection and indicates that the BSS-Lasso not only deals with such reflective events but also uses their presence to improve the estimation through the correction stage.

The BSS-Lasso also excels in the contingency table, with fewer than half the False Negatives and False Positives of the other techniques. Even though a finer analysis of the specificity² is hindered by the high number of positions that do not present a fault event, the results are coherent with the sparse nature of the problem itself. Furthermore, the BSS-Lasso’s precision³ of 77.87% indicates that a selection generally does represent a true fault. This is extremely important from a practical point of view, as dispatching an in-field unit to repair a non-existent fault can be costly. At the same time, although the sensitivity⁴ is roughly 80%, analysis of the events that were not identified shows a two-fold behavior: either the fault magnitude was low and the BSS-Lasso neglected its presence; or the magnitude of an associated reflective event was so low that it was not identified and the shift correction was not employed, resulting in an error of over 50 m. In the first case, which corresponds to the majority of errors, the small faults do not entirely compromise the link’s operation capacity. In the second case, though representing a more serious defect in fault location, the operator would have knowledge of the presence of the fault but with reduced accuracy.
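The summary measures quoted for the BSS-Lasso follow directly from the counts in Table 4 and the footnote definitions of specificity, precision and sensitivity. A short Python check, with the helper name `classification_measures` chosen only for illustration:

```python
def classification_measures(tp, fn, fp, tn):
    """Sensitivity, specificity and precision from contingency-table counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }

# BSS-Lasso counts from Table 4
bss_lasso = classification_measures(tp=4853, fn=1147, fp=1379, tn=2957685)

for name, value in bss_lasso.items():
    print(f"{name}: {100 * value:.2f}%")
# sensitivity: 80.88%
# specificity: 99.95%
# precision: 77.87%
```

The specificity is dominated by the very large number of true negatives (grid positions without a fault), which is why the text treats precision and sensitivity as the more informative measures.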

6

Conclusions

Physical layer supervision is essential for robust operation of optical fiber links, which currently carry an estimated 80% of all long-distance data traffic worldwide. Different optical networks present differing supervision requirements such as dynamic range, spatial resolution, co-existence with data transmission, or ease of adaptation into the transmission system. Baseband Subcarrier Sweep-based monitoring techniques offer all of the above except for a high dynamic range, which makes them especially interesting for short- to medium-haul optical fiber links. The high-dimensional problem induced by the mathematical model, however, dictates that a signal processing technique must be employed for data analysis. The Lasso, which performs sparse estimation given an over-complete dictionary of candidates, is employed as the analysis mechanism together with a tailored heuristic to yield robust and precise estimations for the fault locations.

The BSS-Lasso presents several desired properties in the context of optical fiber link characterization, as corroborated by the results: (i) high specificity and sensitivity for reflective and non-reflective events; (ii) successful detection of multiple faults within a low-error interval; and (iii) low computational burden, having required less than two minutes for all links simulated and experimentally acquired in this paper. To summarize, the BSS-Lasso is able to extend the frequency-domain profile model characterization to include all the possible events present in a fiber link, both reflective and non-reflective. Also, both experimental and simulation results show the efficacy of the ex-post analysis, or correction stage, which, besides accurately identifying the reflective events, also makes use of their presence to increase the precision of the fault locations. Furthermore, validation of the mathematical model and correct manipulation of the BSS-Lasso results allow for the reconstruction of the time-domain profile of fibers and the estimation of the magnitude of faults. In conclusion, the BSS-Lasso allows for precise, low-cost, transmitter-embedded full characterization of optical fiber links in practical computational time.

Even though the fiber characterization technique described by the BSS-Lasso has reached a self-consistent and complete methodology, with no loose ends in terms of either data acquisition or signal processing, several ideas could be explored in future works to improve the technique. From a data-acquisition point of view, some of these are: (i) the analysis of the impact of the optical carrier’s bandwidth and polarization on the measured frequency-domain profile; (ii) the extension of the frequency sweep range and whether this allows for increased estimation accuracy; and (iii) the employment of a lower-NEP detector with the goal of extending the limited dynamic range of the BSS-Lasso. In the signal processing part, some of these are: (i) determining whether the dynamic range is limited by an impossibility of acquiring signals from distant portions of the fiber, or by the sparse characteristic of the Lasso, which neglects signal contributions that are too small; and (ii) identifying a more consistent and efficient methodology for the cluster analysis step. Finally, the possibility of embedding the software into a micro-controlled unit running Julia, allied with a dedicated low-cost complex frequency beat detector (instead of the NA), would allow the creation of an independent measurement device, such as a standard OTDR, and is also a viable future development of this work.

² Specificity = #True Negatives/(#True Negatives + #False Positives)
³ Precision = #True Positives/(#True Positives + #False Positives)
⁴ Sensitivity = #True Positives/(#True Positives + #False Negatives)

Acknowledgements

The authors would like to thank Christiano Nascimento and Breno Perlingeiro for technical support. Financial support from the Brazilian agency Capes is acknowledged.

References

[1] S. Kumar and M. J. Deen, Fiber optic communications: fundamentals and applications. Wiley, 2014.
[2] M. K. Barnoski, M. D. Rourke, S. M. Jensen, and R. T. Melville, “Optical time domain reflectometer,” Applied Optics, vol. 16, no. 9, pp. 2375–2379, Sept. 1977.
[3] D. Derickson, Fiber Optic - Test and Measurement, 1st ed. Prentice Hall, 1998.
[4] F. Liu and C. J. Zarowski, “Events in fiber optics given noisy OTDR data - Part I. GSR/MDL method,” IEEE Transactions on Instrumentation and Measurement, vol. 50, no. 1, pp. 47–58, Feb. 2001.
[5] P. Eraerds, M. Legré, J. Zhang, H. Zbinden, and N. Gisin, “Photon counting OTDR: advantages and limitations,” Journal of Lightwave Technology, vol. 28, no. 6, pp. 952–964, Mar. 2010.
[6] S. Hangai and Y. Taki, “Detection of faults in short fiber by the phase compensated reflectometer,” IEEE Transactions on Instrumentation and Measurement, vol. 39, no. 1, pp. 238–241, Feb. 1990.
[7] X. Dong, A. Wang, J. Zhang, H. Han, T. Zhao, X. Liu, and Y. Wang, “Combined attenuation and high-resolution fault measurements using chaos-OTDR,” IEEE Photonics Journal, vol. 7, no. 6, pp. 1–7, Dec. 2015.
[8] G. C. Amaral, J. D. Garcia, L. E. Y. Herrera, G. P. Temporão, P. J. Urban, and J. P. von der Weid, “Automatic fault detection in WDM-PON with tunable photon counting OTDR,” Journal of Lightwave Technology, vol. 33, no. 24, pp. 5025–5031, Dec. 2015.
[9] G. C. Amaral, J. D. Garcia, B. Fanzeres, P. J. Urban, and J. P. von der Weid, “Multiple fiber fault location with low-frequency sub-carrier tone sweep,” IEEE Photonics Technology Letters, vol. 29, no. 13, pp. 1116–1119, Jul. 2017.
[10] F. Calliari, L. E. Y. Herrera, J. P. von der Weid, and G. C. Amaral, “High-dynamic and high-resolution automatic photon counting OTDR for optical fiber network monitoring,” in Proceedings of the 6th International Conference on Photonics, Optics and Laser Technology - Volume 1: PHOTOPTICS, INSTICC. SciTePress, 2018, pp. 82–90.
[11] G. C. Amaral, A. Baldivieso, J. D. Garcia, D. C. Villafani, R. G. Leibel, L. E. Y. Herrera, P. J. Urban, and J. P. von der Weid, “A low-frequency tone sweep method for in-service fault location in sub-carrier multiplexed optical fiber networks,” Journal of Lightwave Technology, vol. 35, no. 10, pp. 2017–2025, May 2017.
[12] A. Pizzinat, P. Chanclou, F. Saliou, and T. Diallo, “Things you should know about fronthaul,” Journal of Lightwave Technology, vol. 33, no. 5, pp. 1077–1083, Mar. 2015.
[13] P. J. Urban, G. C. Amaral, and J. P. von der Weid, “Fiber monitoring using a sub-carrier band in a sub-carrier multiplexed radio-over-fiber transmission system for applications in analog mobile fronthaul,” Journal of Lightwave Technology, vol. 34, no. 13, pp. 3118–3125, Jul. 2016.
[14] B. K. Natarajan, “Sparse approximate solutions to linear systems,” SIAM Journal on Computing, vol. 24, no. 2, pp. 227–234, 1995.
[15] R. Tibshirani, “Regression shrinkage and selection via the Lasso,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996.
[16] M. Parrilla, J. Anaya, and C. Fritsch, “Digital signal processing techniques for high accuracy ultrasonic range measurements,” IEEE Transactions on Instrumentation and Measurement, vol. 40, no. 4, pp. 759–763, Aug. 1991.
[17] “Test bench.” [Online]. Available: https://github.com/raphaelsaavedra/LassoFiberAnalysis.jl
[18] S. Liehr, N. Nöther, and K. Krebber, “Incoherent optical frequency domain reflectometry and distributed strain detection in polymer optical fibers,” Measurement Science and Technology, vol. 21, no. 1, pp. 1–4, Nov. 2010.
[19] J. P. von der Weid, M. H. Souto, J. D. Garcia, and G. C. Amaral, “Adaptive filter for automatic identification of multiple faults in a noisy OTDR profile,” Journal of Lightwave Technology, vol. 34, no. 14, pp. 3418–3424, Jul. 2016.
[20] T. Hastie, R. Tibshirani, and J. Friedman, The elements of statistical learning: data mining, inference, and prediction, 2nd ed. Springer, 2009.
[21] F. Peres and F. Fogliatto, “Variable selection methods in multivariate statistical process control: A systematic literature review,” Computers & Industrial Engineering, vol. 115, no. 1, pp. 603–619, Jan. 2018.
[22] A. F. M. Smith and D. J. Spiegelhalter, “Bayes factors and choice criteria for linear models,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 2, no. 42, pp. 213–220, 1980.
[23] J. Chen and Z. Chen, “Extended BIC for small-n-large-p sparse GLM,” Statistica Sinica, vol. 2, no. 22, pp. 555–574, 2012.
[24] J. Friedman, T. Hastie, and R. Tibshirani, “glmnet: Lasso and elastic-net regularized generalized linear models,” R package version, vol. 1, no. 4, 2009.
[25] H. Zou, “The adaptive lasso and its oracle properties,” Journal of the American Statistical Association, vol. 101, no. 476, pp. 1418–1429, Jan. 2006.
[26] P. Bühlmann, P. Rütimann, S. van de Geer, and C.-H. Zhang, “Correlated variables in regression: Clustering and sparse estimation,” Journal of Statistical Planning and Inference, vol. 11, no. 143, pp. 1835–1858, Nov. 2013.
[27] D. Bertsimas, A. King, and R. Mazumder, “Best subset selection via a modern optimization lens,” The Annals of Statistics, vol. 44, no. 2, pp. 813–852, Apr. 2016.
[28] D. Bertsimas and A. King, “OR Forum – An algorithmic approach to linear regression,” Operations Research, vol. 64, no. 1, pp. 2–16, Jan.–Feb. 2016.
[29] 10-MHz Adjustable Balanced Photoreceivers, New Focus. [Online]. Available: https://www.newport.com/p/2117-FC
[30] E5061B Network Analyzer, Keysight Technologies. [Online]. Available: https://literature.cdn.keysight.com/litweb/pdf/5990-4392EN.pdf?id=1790097