19th ITC Specialist Seminar 2008

Keynotes and Invited Talks

Keynote: Locating IP Congested Links with Unicast Probes
Patrick Thiran (EPFL)

How can we locate congested links in the Internet from end-to-end measurements using active probing mechanisms? As IP multicast is not widely deployed, we want to use only IP unicast probes and avoid relying on tight temporal synchronization between beaconing nodes, which is difficult to achieve between distant sites. Like other problems in network tomography or traffic matrix estimation, this inverse problem is ill-conditioned: the end-to-end measurement outcomes do not allow us to uniquely identify the variables representing the status of the IP links. To overcome this critical problem, current methods use the unrealistic assumption that all IP links have the same prior probability of being congested. We find that this assumption is not needed: spatial correlations are sufficient either to learn these probabilities or to identify the variances of the link loss rates. We can then use the learned probabilities or variances as priors to rapidly find the congested links at any time, with an order of magnitude gain in accuracy over existing algorithms. These solutions scale well and are therefore applicable in today's Internet, as shown by the results obtained both by simulation and by a real implementation using the PlanetLab network over the Internet.

Joint work with Hung X. Nguyen, University of Adelaide.

Keynote: Delay Tolerant Bulk Data Transfers on the Internet or How to Book Some Terabytes on "Red-Eye" Bandwidth
Pablo Rodriguez (Telefonica Research)

Many emerging scientific and industrial applications require the capability to transfer large quantities of data, ranging from tens of terabytes to petabytes. Examples include the transport of high-definition movies between studios and theaters, and the transport of large quantities of data from telescopes and particle accelerators/colliders to laboratories all around the world. A convenient property of many of these applications is their ability to tolerate delivery delays from a few hours to a few days. Such Delay-Tolerant Bulk (DTB) transfers are currently being serviced through the use of the postal system to transport hard drives and DVDs, or through the use of expensive dedicated networks.

In this talk, based on traffic data from 200+ links of a large transit ISP, we show that the naive approach of using end-to-end (E2E) connection-oriented transfers can be prohibitively expensive under the widely used 95-percentile charging schemes. We also show that the available bandwidth of E2E connections is subject to time-of-day effects. Based on these observations, we proceed to design a system for performing Store-and-Forward transfers of DTB data, and we evaluate the performance of our system under two scenarios: (1) 95-percentile charging, and (2) flat-rate charging under time-of-day capacity constraints.
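The sensitivity of 95-percentile charging to the timing of a transfer can be illustrated with a minimal sketch. It assumes the common convention of billing on the 95th percentile of 5-minute rate samples, computed with the nearest-rank method. The load profile, the transfer rate and the helper names (`percentile_95`, `add_transfer`) are hypothetical and are not taken from the talk.

```python
import math

def percentile_95(samples_mbps):
    """Nearest-rank 95th percentile of per-interval traffic rates (Mbps)."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

def add_transfer(samples_mbps, start, length, extra_mbps):
    """Return a copy of the load with extra_mbps added to `length` intervals from `start`."""
    loaded = list(samples_mbps)
    for i in range(start, start + length):
        loaded[i] += extra_mbps
    return loaded

# Hypothetical one-day load in 5-minute intervals (288 samples):
# 100 Mbps during the night (first 96 intervals), 400 Mbps otherwise.
baseline = [100.0] * 96 + [400.0] * 192

# The same 8-hour, 200 Mbps bulk transfer, scheduled at two different times.
night = add_transfer(baseline, start=0, length=96, extra_mbps=200.0)   # fills the night valley
peak = add_transfer(baseline, start=120, length=96, extra_mbps=200.0)  # rides on top of the peak

print(percentile_95(baseline))  # 400.0
print(percentile_95(night))     # 400.0, the billed rate is unchanged
print(percentile_95(peak))      # 600.0, the billed rate increases
```

In this toy profile the night-time transfer is free under percentile billing, while the same volume sent during busy hours raises the billed rate by half. A store-and-forward design can exploit this by holding data until such valleys occur.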

About the speaker

Pablo Rodriguez is the Internet Scientific Director at Telefonica R&D. Prior to Telefonica, he was a Researcher at Microsoft Research Cambridge within the Systems and Networking group. While at Microsoft, he developed the Avalanche P2P system and worked in content distribution, wireless systems, network coding, and large-scale complex networks. He was also involved in large-scale measurement studies of very popular services, such as the Windows Update system, FolderShare, and Xbox Live.

During his early research career he worked as a Member of Technical Staff at Bell-Labs, NJ, where he developed systems to improve the performance of wireless applications. Prior to Bell-Labs, he worked as a software architect for various startups in Silicon Valley, including Inktomi (acquired by Yahoo!) and Tahoe Networks (now part of Nokia).

He received his Ph.D. from the Swiss Federal Institute of Technology (EPFL, Lausanne). During his Ph.D. he also worked at AT&T Labs (Shannon Laboratory, Florham Park, NJ). He completed postgraduate studies at King's College, London and a B.S. / M.S. in Telecommunication Engineering at the Universidad Pública de Navarra, Spain. He holds more than 20 patents. He is the General Chair for ACM/SIGCOMM '09, a member of the editorial board for IEEE / ACM Transactions on Networking (ToN), and an editor for ACM Computer Communications Review SIGCOMM / CCR. He is also a steering committee member for IEEE / HotWeb, and has been the editor for the IEEE / JSAC special issue on P2P streaming.

Modeling the evolution of the AS-level Internet: Integrating aspects of traffic, geography and economy
Petter Holme (KTH)

Modeling Internet growth is important both for understanding the current network and for predicting and improving its future. To date, models have typically attempted to explain a subset of the following characteristics: network structure, traffic flow, geography, and economy. I will discuss a discrete, agent-based model that integrates all these aspects. The model will be compared to empirical data. It produces network topology, dynamics, and (more speculatively) spatial distributions that are similar to those of the Internet.

TCP Root Cause Analysis Revisited
Ernst W. Biersack (Eurécom)

Throughput is the key performance metric for long TCP connections. The achieved throughput results from the aggregate effects of the network path, the parameters of the TCP end points, and the application on top of TCP. Finding out which of these factors is limiting the throughput of a TCP connection, referred to as TCP root cause analysis, is important for end users who want to understand the origins of the performance limitations they experience, for ISPs that need to troubleshoot their network, and for application designers who need to know how to interpret the performance of their application.

The seminal work in the area of TCP root cause analysis by Zhang et al. (Zhang, Y., Breslau, L., Paxson, V., Shenker, S.: On the characteristics and origins of internet flow rates. In: Proceedings of ACM SIGCOMM 2002) has shown that for more than 50 % of the TCP connections analyzed, it is not the network bandwidth that limits the throughput but rather the application or mechanisms such as TCP slow start or too small a receiver window.

In this talk we will describe our system for TCP root cause analysis. The system takes as input bidirectional packet header traces captured at a measurement point anywhere along the path from source to destination and produces as output quantitative information about the causes limiting TCP throughput for each connection. We only analyze long-lived TCP connections with at least 100 packets, for which TCP slow start no longer dominates the throughput performance.

A great number of parameters determine the performance of a TCP connection, such as round-trip time, receiver advertised window, link capacities, available bandwidth, or version of TCP. We extract these parameters from the packet header trace and use them to derive the most likely root cause (or causes).

We distinguish three main classes of root causes: (i) limitations due to the application, (ii) limitations due to the TCP end-hosts and (iii) limitations due to the network.

For a single long-lived TCP connection, it is very likely that different limitation causes will be observed during different periods of time. For instance, some Internet applications such as BitTorrent or HTTP/1.1 operate by switching between active transfer periods and passive keep-alive periods.

A major challenge is to detect those different periods and analyze them separately. For each connection, the periods where the throughput is determined by the application are isolated first. The remaining parts of the connection consist of one or more so-called bulk transfer periods, which are then further analyzed for limitations due to the TCP end-hosts and limitations due to the network.
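The two-stage idea above can be sketched in a few lines. This is a deliberately simplified illustration and not the algorithm of the toolkit presented in the talk. It assumes that periods can be separated by a fixed idle-gap threshold, and it tests only one end-host limitation, the receiver advertised window, using the standard bound that TCP can have at most one window of data in flight per round-trip time. All function names, the threshold, the tolerance and the synthetic trace are hypothetical.

```python
def split_periods(packets, idle_gap_s):
    """Group (timestamp_s, payload_bytes) packets into periods separated by idle gaps."""
    periods, current = [], []
    for ts, size in packets:
        if current and ts - current[-1][0] > idle_gap_s:
            periods.append(current)
            current = []
        current.append((ts, size))
    if current:
        periods.append(current)
    return periods

def period_throughput_bps(period):
    """Average rate in bit/s; the first packet only marks the start of the interval."""
    duration = period[-1][0] - period[0][0]
    if duration <= 0:
        return 0.0
    delivered = sum(size for _, size in period[1:])
    return 8 * delivered / duration

def receiver_window_bound_bps(rwnd_bytes, rtt_s):
    """Upper bound on throughput: one receiver window per round-trip time."""
    return 8 * rwnd_bytes / rtt_s

def looks_window_limited(period, rwnd_bytes, rtt_s, tolerance=0.9):
    """True if the period runs close to the receiver-window bound (hypothetical test)."""
    bound = receiver_window_bound_bps(rwnd_bytes, rtt_s)
    return period_throughput_bps(period) >= tolerance * bound

# Synthetic trace: a slow, application-paced phase followed by a faster bulk phase.
slow = [(i * 0.1, 1460) for i in range(50)]           # one segment every 100 ms
bulk = [(10 + i * 0.02, 1460) for i in range(200)]    # one segment every 20 ms
periods = split_periods(slow + bulk, idle_gap_s=1.0)

for p in periods:
    print(len(p), looks_window_limited(p, rwnd_bytes=14600, rtt_s=0.2))
```

With a 14600-byte window and a 200 ms round-trip time the bound is 584 kbit/s. Only the second period runs close to that bound, so only it is flagged as a candidate for a receiver-window limitation.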

Analyzing Internet traffic at packet level generally involves large amounts of raw data, derived data, and results from various analysis sub-tasks. The analysis typically proceeds in an iterative manner and is done using ad-hoc methods and many specialized scripts. Instead, we have decided to use a database management system, PostgreSQL in our case, to manage the different data and implement the various analysis procedures.

We will not only describe our system and present some of the results obtained, but also discuss the difficulties encountered, relate our experience using PostgreSQL, and indicate possibilities for carrying the work further.

More details can be found in M. Siekkinen, G. Urvoy-Keller, E. W. Biersack, and D. Collange. A Root Cause Analysis Toolkit for TCP. Computer Networks, 52(9):1846–1858, 2008.

At http://avband2.eurecom.fr/ we make available a service for performing TCP root cause analysis that is particularly interesting for users connected to the Internet via DSL, cable, or WLAN hotspots.

Keywords: Network tomography, passive measurements, TCP, performance, troubleshooting.

Location and Fairness in Self-Organized Networks
Sonja Buchegger (Deutsche Telekom Laboratories)

Network nodes may experience large disparities in utility according to their location in the network topology, mainly attributable to differences in traffic load due to shortest-path routing. These disparities become more problematic in resource-constrained self-organized networks, such as mobile ad-hoc, peer-to-peer, wireless mesh or sensor networks, than they have been in traditional infrastructure-based networks.

The impact of node location has so far received relatively little attention. For example, it is common practice to assume the random-waypoint mobility model in mobile ad-hoc networks, implying that over time node location will be evenly distributed. We are interested in the effect of location on node utility when this assumption is removed.

Applying insights from graph theory and social network analysis, we use centrality metrics to quantify the effect of location in several network topologies. We then use metrics and methods for equity taken from economics. Combining such metrics, we can better understand how choices of mechanisms and topologies impact the total network performance as well as its distribution over the network nodes.
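As a small illustration of combining a centrality-style metric with an equity metric, the sketch below counts how many source-destination pairs route through each node of a line topology under shortest-path routing (an unnormalized betweenness count) and summarizes the resulting disparity with the Gini coefficient. The choice of the Gini coefficient, the line topology and the function names are assumptions made for this example; the talk does not state which metrics were used.

```python
def relay_load_on_line(n):
    """Pairs (a, b) with a < i < b whose shortest path crosses node i of an n-node line."""
    return [i * (n - 1 - i) for i in range(n)]

def gini(values):
    """Gini coefficient: 0 means perfectly equal; values near 1 mean highly unequal."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Mean absolute difference over all ordered pairs, normalized by twice the mean.
    abs_diffs = sum(abs(a - b) for a in values for b in values)
    return abs_diffs / (2 * n * total)

line_load = relay_load_on_line(5)   # [0, 3, 4, 3, 0]: central nodes relay the most
ring_load = [1, 1, 1, 1, 1]         # by symmetry, every node of a 5-node ring relays equally

print(line_load, gini(line_load))
print(ring_load, gini(ring_load))
```

The line topology concentrates relay work on the central nodes and yields a clearly non-zero Gini coefficient, while the symmetric ring yields zero, which matches the intuition that location determines load under shortest-path routing.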

As a concrete example of the general problem, we investigate how incentives for cooperation (such as payment or reputation systems for traffic relay in mobile ad-hoc networks) exacerbate or alleviate node utility disparities due to location. We show that location matters and that without location awareness, such incentive schemes can be unfair.

The generalization of the problem definition (of location-dependent fairness) and the methodology (of quantifying the distributions of both location and utility as well as evaluating the suitability of different notions of fairness) can be applied to several domains, such as evaluation of mobility models, strategic node behavior (location changes) in mobile networks, placement of access points in wireless mesh networks, of road-side units in vehicular networks, and of sinks in sensor networks, topology control of overlay networks, location-aware incentives for cooperation, and in general the evaluation of fairness of networking protocols.

A snapshot of 3G mobile traffic
Fabio Ricciato (FTW and Univ. Salerno)

Third-generation (3G) mobile networks are becoming increasingly popular as access infrastructures to the global Internet. 3G networks are still undergoing a fast evolution phase, due to a continuous process of architectural upgrades, increase of network capacity, terminal evolution and tariff change. In such an evolving framework, traffic monitoring is the key to obtain and maintain understanding about the behaviour of the network and its user population.

In this talk I will report on our experience with monitoring an operational 3G network. The core of our research is focused on developing a concept for anomaly detection in support of the network operation process. We take a distributional change-detection approach. A number of traffic features are measured per mobile terminal (number of connection openings, downloaded packets, etc.) at different aggregation time-scales from 1 minute up to 1 hour, from which time series of feature distributions are derived. Building upon the notion of KL divergence, we have developed a change-detection scheme aimed at reporting "anomalous deviations" from what has been observed "in the past". This requires a preliminary understanding of the "typical" (normal) temporal characteristics of the distributions, to be achieved by an extensive exploration of the data at hand. The talk provides an overview of such exploration, highlighting the key findings about the characteristics and temporal (ir)regularities of feature distributions. The work is based on two large datasets, collected one year apart, covering several weeks of data from an operational network. Furthermore, we discuss the design principles of our anomaly detection system, and report on our operational experience.
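The basic building block of such a distributional approach can be sketched as follows. This is only a minimal illustration of comparing a current feature distribution against a reference one using KL divergence, and it is not the change-detection scheme developed by the authors. The bin edges, the add-one smoothing, the threshold and the sample data are all hypothetical.

```python
import bisect
import math

def distribution(values, edges, pseudo=1.0):
    """Smoothed histogram over bins [e_i, e_{i+1}); the last bin is open-ended.

    Assumes non-negative values and edges[0] == 0. The pseudo-count keeps
    every bin non-empty so that the KL divergence stays finite.
    """
    counts = [pseudo] * len(edges)
    for v in values:
        counts[bisect.bisect_right(edges, v) - 1] += 1
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """D(p || q) in nats for two distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def is_anomalous(current, reference, edges, threshold):
    """Flag the current time bin if its distribution diverges from the reference."""
    p = distribution(current, edges)
    q = distribution(reference, edges)
    return kl_divergence(p, q) > threshold

# Hypothetical feature: connection openings per mobile terminal in one time bin.
edges = [0, 1, 5, 20]
reference = [0] * 50 + [2] * 30 + [8] * 15 + [30] * 5
similar = [0] * 48 + [2] * 33 + [8] * 14 + [30] * 5
shifted = [0] * 10 + [2] * 10 + [8] * 20 + [30] * 60   # many terminals suddenly very active

print(is_anomalous(similar, reference, edges, threshold=0.1))  # False
print(is_anomalous(shifted, reference, edges, threshold=0.1))  # True
```

In practice the reference would have to reflect the normal temporal patterns of the traffic, such as time-of-day cycles, which is why the abstract stresses a preliminary exploration of the typical behaviour of the distributions.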

Reconstruction of traffic and topology from active measurements
Gábor Vattay (Eötvös University)

In this talk I concentrate on one question: can active-measurement-based monitoring and management be a viable competitor to passive methods in a Future Internet? The advantage of active methods is that they raise limited or no privacy / security concerns. Data can be shared without anonymization. They can also be implemented without the collaboration of Internet service providers. Yet, these methods provide key information about their networks. The disadvantage of active methods is their limited detail and accuracy.

We investigate two interrelated problems in detail: the efficiency of active topology discovery and active network tomography.

We present new analytic results for the link and node discovery probability and for the number of links and nodes discovered by traceroute-like shortest-path-based methods. Our most important finding is that the link discovery probability in real networks differs by several orders of magnitude from the value predicted by the mean-field approximation. As a consequence, the number of discovered nodes grows linearly. We also give an analytic relation between the true and the discovered degree distribution. We demonstrate this in simulated growing complex networks and in real Internet data provided by traceroute measurements performed on the PlanetLab infrastructure. The result can also be applied to Peer-to-Peer overlay networks, where the average number of Internet routers affected by the P2P traffic can be estimated.

Network tomography enables us to investigate the delay fluctuations in the discovered routers and links. We demonstrate that these measurements have by now reached the accuracy of passive methods. Detailed analysis of the entire delay distribution is possible. It includes fitting of queuing models with long-range-dependent cross traffic and the determination of the Hurst exponent inside the network. We briefly discuss the prospects of a new branch of statistics based on ensembles with network structure.