With increasing link bandwidths and router capacities in today's Internet,
it becomes increasingly difficult for network operators to be aware of the
amount and nature of the traffic they have to manage. This information
is needed for several reasons. First, it is necessary for network
operators to be aware of network-related problems such as hardware or
software failures, misconfigurations, and attacks as soon as possible.
Second, traffic volume information is needed for both billing of customers
and provisioning of the network. Third, network operators need information
on traffic to be able to actively influence traffic passing over the network,
which is commonly referred to as traffic engineering.
The high bandwidths involved make it very hard to perform such analysis at the packet-trace level. Packet counters in routers can be used to infer the total number of packets per time unit passing a link or a router, but they do not provide the information necessary for billing and traffic engineering. To tackle these problems, the notion of network flows, or netflows for short, was introduced and deployed.
A netflow in this context is a unidirectional stream of related packets. A netflow has a start and an end, determined by time gaps during which no related packet has been observed. This concept leads to a drastic reduction in data volume while preserving information such as the endpoints of the flow and the nature of the data it carries. It is particularly interesting when it comes to traffic engineering.
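The definition above can be illustrated with a minimal sketch that groups a time-ordered packet sequence into unidirectional flows and closes a flow once no related packet has been seen for a given gap. The names `Packet`, `Flow`, and `aggregate_flows`, the choice of flow key (for example a source/destination address and port tuple), and the 60-second timeout are assumptions made for illustration only; they are not prescribed by the text and do not describe any particular router implementation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, NamedTuple


class Packet(NamedTuple):
    timestamp: float  # arrival time in seconds
    key: tuple        # assumed flow key, e.g. (src, dst, sport, dport, proto)
    size: int         # packet size in bytes


@dataclass
class Flow:
    key: tuple
    start: float      # timestamp of the first packet
    end: float        # timestamp of the most recent packet
    packets: int = 0
    bytes: int = 0


def aggregate_flows(packets: Iterable[Packet], timeout: float = 60.0) -> List[Flow]:
    """Group time-ordered packets into unidirectional flows.

    A flow ends when no packet with the same key has been observed for
    more than `timeout` seconds (an illustrative value).
    """
    active: Dict[tuple, Flow] = {}
    finished: List[Flow] = []
    for pkt in packets:
        flow = active.get(pkt.key)
        # A gap longer than the timeout closes the existing flow.
        if flow is not None and pkt.timestamp - flow.end > timeout:
            finished.append(flow)
            flow = None
        if flow is None:
            flow = Flow(pkt.key, pkt.timestamp, pkt.timestamp)
            active[pkt.key] = flow
        flow.end = pkt.timestamp
        flow.packets += 1
        flow.bytes += pkt.size
    # Flows still open at the end of the trace are reported as well.
    finished.extend(active.values())
    return finished
```

Because the key is direction-sensitive, the two directions of a connection yield two separate flows, which matches the unidirectional definition. Each flow record keeps only endpoints, timestamps, and counters, which is where the reduction in data volume comes from.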
As traffic engineering involves changes to routers' forwarding tables, it is infeasible to do this on a per-connection or per-host basis. Instead, one can define which packets are accounted to which flow and then do traffic engineering on a per-flow basis. Here one wants to concentrate on large and long-lived flows.
But what are large flows, and what are long-lived flows? Are large flows long-lived? Are long-lived flows large? These are but a few of the questions that arise when working with netflows. Clearly, we lack a basic understanding of the nature of the netflows that can be observed in the Internet. There are a few findings related to the behavior of flows on a very fine timescale. We have also learned that the size of netflows is consistent with Zipf's law. This means that a very small number of flows contribute most of the bytes and packets going over a link. But we have no information on what these few large flows look like, where they come from, and how they behave over time.
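To make the concentration implied by a Zipf-like size distribution concrete, the following sketch computes the share of bytes carried by the largest flows. The function name `top_share` is hypothetical, and the flow sizes are synthetic, generated from an idealized rule in which size is proportional to 1/rank; they are not measurements.

```python
from typing import Sequence


def top_share(flow_bytes: Sequence[int], fraction: float) -> float:
    """Share of total bytes carried by the largest `fraction` of flows."""
    total = sum(flow_bytes)
    if total == 0:
        return 0.0
    ranked = sorted(flow_bytes, reverse=True)
    # Always count at least one flow, even for very small fractions.
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / total


# Synthetic flow sizes with size proportional to 1/rank (illustrative only).
sizes = [1_000_000 // rank for rank in range(1, 1001)]
share = top_share(sizes, 0.01)
print(f"top 1% of flows carry {share:.0%} of the bytes")
```

Under this idealized assumption, a small fraction of the flows accounts for a disproportionate share of the bytes. How real traces compare is exactly the kind of question raised above.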
It is these questions that we try to answer. The answers will allow us to make more efficient use of netflow information for applications like billing and traffic engineering. We also hope to find new aspects of network traffic that allow for new applications and further our understanding of what exactly happens on today's and tomorrow's Internet.