Increasing the robustness of networked systems

Kandula, Srikanth

Author(s)

Kandula, Srikanth

DownloadFull printable version (56.13Mb)

Other Contributors

Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.

Advisor

Dina Katabi.

Terms of use

M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582

Metadata

Show full item record

Abstract

What popular news do you recall about networked systems? You've probably heard about the several hour failure at Amazon's computing utility that knocked down many startups for several hours, or the attacks that forced the Estonian government web-sites to be inaccessible for several days, or you may have observed inexplicably slow responses or errors from your favorite web site. Needless to say, keeping networked systems robust to attacks and failures is an increasingly significant problem. Why is it hard to keep networked systems robust? We believe that uncontrollable inputs and complex dependencies are the two main reasons. The owner of a web-site has little control on when users arrive; the operator of an ISP has little say in when a fiber gets cut; and the administrator of a campus network is unlikely to know exactly which switches or file-servers may be causing a user's sluggish performance. Despite unpredictable or malicious inputs and complex dependencies we would like a network to self-manage itself, i.e., diagnose its own faults and continue to maintain good performance. This dissertation presents a generic approach to harden networked systems by distinguishing between two scenarios. For systems that need to respond rapidly to unpredictable inputs, we design online solutions that re-optimize resource allocation as inputs change. For systems that need to diagnose the root cause of a problem in the presence of complex subsystem dependencies, we devise techniques to infer these dependencies from packet traces and build functional representations that facilitate reasoning about the most likely causes for faults. We present a few solutions, as examples of this approach, that tackle an important class of network failures. Specifically, we address (1) re-routing traffic around congestion when traffic spikes or links fail in internet service provider networks, (2) protecting websites from denial of service attacks that mimic legitimate users and (3) diagnosing causes of performance problems in enterprises and campus-wide networks. Through a combination of implementations, simulations and deployments, we show that our solutions advance the state-of-the-art.

Description

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.

Includes bibliographical references (p. 133-143).

Date issued

2009

URI

http://hdl.handle.net/1721.1/46609

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Doctoral Theses