Abstract:
All-optical switching, in place of electronic switching, of high data-rate lightpaths at intermediate nodes is one of the key enabling technologies for economically scalable future data networks. This replacement of electronic switching with optical switching at intermediate nodes, however, presents new challenges for fault detection and localization in reconfigurable all-optical networks. Presently, fault detection and localization techniques, as implemented in SONET/G.709 networks, rely on electronic processing of parity checks at intermediate nodes. If similar techniques were adapted to all-optical reconfigurable networks, optical signals would need to be tapped out at intermediate nodes for parity checks. This additional electronic processing would break the all-optical transparency paradigm and thus significantly diminish the cost advantages of all-optical networks. In this thesis, we propose new fault-diagnosis approaches specifically tailored to all-optical networks, with the objective of keeping both the diagnostic capital expenditure and the diagnostic operational effort low. Instead of the aforementioned passive monitoring paradigm based on parity checks, we propose a proactive lightpath probing paradigm: optical probing signals are sent along a set of lightpaths in the network, and the network state (i.e., the failure pattern) is then inferred from the results of this set of end-to-end lightpath measurements. Moreover, we assume that a subset of network nodes (up to all the nodes) is equipped with diagnostic agents, including both transmitters/receivers for probe transmission/detection and software processes for probe management, to perform fault detection and localization. The design objectives of this proposed proactive probing paradigm are twofold: i) to minimize the number of lightpath probes to keep the diagnostic operational effort low, and ii) to minimize the amount of diagnostic hardware to keep the diagnostic capital expenditure low.
The network fault-diagnosis problem can be mathematically modeled with a group-testing-over-graphs framework. In particular, the network is abstracted as a graph in which the failure status of each node/link is modeled with a random variable (e.g., a Bernoulli random variable). A probe over any path in the graph results in a value, defined as the probe syndrome, which is a function of all the random variables associated with that path. A network failure pattern is inferred from the set of probe syndromes resulting from a set of optimally chosen probes. This framework enriches the traditional group-testing problem by introducing a topological structure, and can be extended to model many other network-monitoring problems (e.g., packet delay, packet drop ratio, noise, etc.) by choosing appropriate state variables. Under the group-testing-over-graphs framework with a probabilistic failure model, we initiate an information-theoretic approach to minimizing the average number of lightpath probes needed to identify all possible network failure patterns. Specifically, we have established an isomorphic mapping between the fault-diagnosis problem in network management and the source-coding problem in information theory. This mapping suggests that the minimum average number of lightpath probes required is lower bounded by the information entropy of the network state, and that efficient source-coding algorithms (e.g., the run-length code) can be translated into scalable fault-diagnosis schemes under some additional probe feasibility constraints. Our analytical and numerical investigations yield a guideline for designing scalable fault-diagnosis algorithms: each probe should provide approximately 1 bit of state information, and thus the total number of probes required is approximately equal to the entropy of the network state.
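As a rough illustration of the entropy bound described above, the following Python sketch computes the entropy of the network state under independent Bernoulli link failures and compares it with the number of probes used by a simple adaptive binary-splitting scheme. The linear (chain) topology, the assumption that every node hosts a diagnostic agent so that any contiguous segment can be probed, and all function names are assumptions made for this example. Binary splitting is a generic group-testing strategy used here for simplicity; it is not the run-length-based scheme of the thesis.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) random variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def state_entropy(failure_probs):
    """Entropy of the network state, assuming independent link failures."""
    return sum(binary_entropy(p) for p in failure_probs)


def probe(state, start, end):
    """Probe syndrome for the segment of links start..end-1.

    Returns 1 if at least one link on the probed segment has failed,
    and 0 otherwise (hypothetical syndrome model for this sketch).
    """
    return int(any(state[start:end]))


def locate_failures(state):
    """Adaptive binary splitting over a chain of links.

    Returns (sorted list of failed link indices, number of probes used).
    """
    probes_used = 0
    failed = []
    pending = [(0, len(state))]  # segments that still need a probe
    while pending:
        lo, hi = pending.pop()
        probes_used += 1
        if probe(state, lo, hi) == 0:
            continue  # whole segment is healthy, nothing more to learn
        if hi - lo == 1:
            failed.append(lo)  # a failed single-link segment is localized
            continue
        mid = (lo + hi) // 2
        pending.append((mid, hi))
        pending.append((lo, mid))
    return sorted(failed), probes_used


if __name__ == "__main__":
    probs = [0.05] * 8                    # assumed per-link failure probability
    state = [0, 0, 1, 0, 0, 0, 0, 1]      # example failure pattern (1 = failed)
    failed_links, n_probes = locate_failures(state)
    print("entropy lower bound (bits):", round(state_entropy(probs), 3))
    print("failed links:", failed_links, "probes used:", n_probes)
```

In the fault-free case this scheme needs a single end-to-end probe. The entropy value is a bound on the average number of probes over all failure patterns, so a particular pattern (such as the two-failure example here) can require many more probes than the bound suggests.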
To address the hardware cost of diagnosis, we also developed a probabilistic analysis framework to characterize the trade-off between hardware cost (i.e., the number of nodes equipped with Tx/Rx pairs) and diagnosis capability (i.e., the probability of successful failure detection and localization). Our results suggest that, for practical situations, the hardware cost can be reduced significantly by accepting a small amount of uncertainty about the failure status.

Description:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. MIT Barker Engineering Library copy: printed in pages. Also issued printed in pages. Includes bibliographical references (leaves 255-262).