The Internet consists of interconnected autonomous systems (ASs). An AS typically consists of many networks, where a network (also called an IP network) was defined in the previous section. Recall from Section 4.3 that each autonomous system is administered independently. The administrator of an autonomous system chooses the intra-AS routing algorithm for that AS, and is responsible for administering that AS and no others. Datagrams must also be routed among the ASs, and this is the job of inter-AS routing protocols. As discussed in Section 4.3, this hierarchical organization of the Internet has permitted the Internet to scale. In this section we examine the intra-AS and inter-AS routing protocols for that are commonly used in the Internet.
An intra-AS routing protocol is used to configure and maintain the routing tables within an autonomous system (AS). Once the routing tables are configured, datagrams are routed within the AS as described in the previous section. Inter-AS routing protocols are also known as interior gateway protocols. Historically, three routing protocols have been used extensively for routing within an autonomous system in the Internet: RIP (the Routing Information Protocol), and OSPF (Open Shortest Path First), and IGRP (Cisco's propriety Interior Gateway Routing Protocol).
The Routing Information Protocol (RIP) was one of the earliest intra-AS Internet routing protocols and is still in widespread use today. It traces its origins and its name to the Xerox Network Systems (XNS) architecture. The widespread deployment of RIP was due in great part to its inclusion in 1982 of the Berkeley Software Distribution (BSD) version of UNIX supporting TCP/IP. RIP version 1 is defined in [RFC 1058], with a backwards compatible version 2 defined in [RFC 1723].
RIP is a distance vector protocol that operates in a manner very close to the idealized protocol we examined in Section 4.2.3. The version of RIP specified in RFC 1058 uses hop count as a cost metric, i.e., each link has a cost of 1, and limits the maximum cost of a path to 15. This limits the use of RIP to autonomous systems that are less than 15 hops in diameter.Recall that in distance vector protocols, neighboring routers exchange routing information with each other. In RIP, the routing tables are exchanged between neighbors every 30 seconds using RIP's. This is done with RIP's so-called response message, with each response message containing that host's routing table entries for up to 25 destination networks. These response messages containing routing tables are also called advertisements.
Let us take a look at a simple example of how RIP advertisements work. Consider the portion of an AS shown in Figure 4.5-2. In this figure, the rectangles denote routers and the lines connecting the rectangles denote networks. Note that the routers are labeled A, B, etc. and the networks are labeled 1, 10, 20, 30, etc. For visual convenience, some of the routers and networks are not labeled. Dotted lines in the figure indicate that the autonomous system continues on and perhaps loops back. Thus this autonomous system has many more routers and links than are shown in the figure.
Figure 4.5-2: A portion of an autonomous system.
Now suppose that the routing table for router D is as shown in Figure 4.5-3. Note that the routing table has three columns. The first column is for the destination network, the second column indicates the next router along the shortest path to the destination network, and the third column indicates the number of hops (i.e., the number of networks that have to be traversed, including the destination network, to get to the destination network along the shortest path). For this example, the table indicates that to send a datagram from router D to destination network 1, the datagram should be first sent to neighboring router A; moreover, the table indicates that destination network 1 is two hops away along the shortest path. Also note that the table indicates that network 30 is seven hops away via router B. In principle, the routing table should have one row for each network in the AS. (Although aggregation, a topic beyond the scope of this book, can be used to aggregate entries.) It should also have at least one row for networks that are outside of the AS. The table in Figure 4.5-3, and the subsequent tables to come, are only partially complete.
of hops to
Now suppose that 30 seconds later, router D receives from router A the advertisement shown in Figure 4.5-4. Note that this advertisement is nothing other but the routing table in router A! This routing table says, in particular, that network 30 is only 4 hops away from router A.
of hops to
Router D, upon receiving this advertisement, merges the advertisement (Figure 4.5-4) with the "old" routing table (Figure 4.5-3). In particular, router D learns that there is now a path through router A to network 30 that is shorter than the path through router B. Thus, router D updates its routing table to account for the "shorter" shortest path, as shown in Figure 4.5-5. How is it, you might ask, that the shortest path to network 30 became shorter. This is because either this decentralized distance vector algorithm was still in the process of converging (see Section 4.2), or new links and/or routers were added to the AS, which changed the actual shortest paths in the network.
of hops to
Returning now to the general properties of RIP, if a router does not hear from its neighbor at least once every 180 seconds, that neighbor is considered to be no longer reachable, i.e., either the neighbor has died or the connecting link has gone down. When this happens, RIP modifies its local routing table and then propagates this information by sendind advertisements to its neighboring routers (the ones that are still reachable). A router can also request information about its neighbor's cost to a given destination using RIP's request message. Routers send RIP request and response messages to each other over UDP using port number 520.The UDP packet is carried between routers in a standard IP packet. The fact that RIP uses a transport layer protocol (UDP) on top of a network layer protocol (IP) to implement network layer functionality (a routing algorithm) may seem rather convoluted (it is!). Looking a little deeper at how RIP is implemented will clear this up.
Figure 4.5-6 sketches how RIP is typically implemented in a UNIX system, e.g., for example, a UNIX workstation serving as a router. A process called routed (pronounced "route dee") executes the RIP protocol, i.e., maintains the routing table and exchanges messages with routed processes running in neighboring routers. Because RIP is implemented as an application-layer process (albeit a very special one that is able to manipulate the routing tables within the UNIX kernel), it can send and receive messages over a standard socket and use a standard transport protocol. Thus, RIP is an application-layer protocol (see Chapter 2), running over UDP.
Figure 4.5-6: Implementation of RIP as the routed daemon
Finally, let us take a quick look at a RIP routing table. The
RIP routing table below in Figure 4.5-7 is taken from a UNIX
router giroflee.eurecom.fr. If you give a
-rn command on a UNIX system, you can view the routing
table for that host or router. Performing a netstat on
giroflee.eurecom.fr yields the following routing
Destination Gateway Flags Ref Use Interface -------------------- -------------------- ----- ----- ------ --------- 127.0.0.1 127.0.0.1 UH 0 26492 lo0 192.168.2. 192.168.2.5 U 2 13 fa0 193.55.114. 184.108.40.206 U 3 58503 le0 192.168.3. 192.168.3.5 U 2 25 qaa0 220.127.116.11 18.104.22.168 U 3 0 le0 default 22.214.171.124 UG 0 143454
Table 4.5-7: RIP routing table from giroflee.eurecom.fr
The router giroflee is connected to three networks. The
second, third and fourth rows in the table tell us that these
three networks are attached to
giroflee's network interfaces fa0, le0 and qaa0.
giroflee interfaces have IP addresses
192.168.2.5, 126.96.36.199 and 192.168.3.5, respectively. To
transmit a packet to any host belonging to one of these three
giroflee will simply send the outgoing
IP datagram over the appropriate interface. Of particular
interest to us is the default route.
Any IP datagram that is not destined for one of the networks
explicitly listed in the routing table will be forwarded to the
router with IP address
188.8.131.52; this router
is reached by sending the datagram over the default network
interface. The first entry in the routing table is the
so-called loopback interface. When IP sends a datagram to the
loopback interface, the packet is simply returned back to IP;
this is useful for debugging purposes. The address 184.108.40.206 is
a special multicast (Class D) IP address. We will examine IP
multicast in Section 4.8.
Like RIP, the Open Shortest Path First (OSPF) routing is used for intra-AS routing. The "Open" in OSPF indicates that the routing protocol specification is publicly available (e.g., as opposed to Cisco's IGRP protocol). The most recent version of OSPF, version 2, is defined in RFC 2178 – a public document.
OSPF was conceived as the successor to RIP and as such has a number of advanced features. At its heart, however, OSPF is a link-state protocol that uses flooding of link state information and a Dijkstra least cost path algorithm. With OSPF, a router constructs a complete topological map (i.e., a directed graph) of the entire autonomous system. The router then locally runs Dijkstra's shortest path algorithm to determine a shortest path tree to all networks with itself as the root node. The router's routing table is then obtained from this shortest path tree. Individual link costs are configured by the network administrator.
Let us now contrast and compare the advertisements sent by RIP and OSPF. With OSPF, a router periodically sends routing information to all other routers in the autonomous system, not just to its neighboring routers. This routing information sent by a router has one entry for each of the router's neighbors; the entry gives the distance (i.e., link state) from the router to the neighbor. On the otherhand, a RIP advertisement sent by a router contains information about all the networks in the autonomous system, although this information is only sent to its neighboring routers. In a sense, the advertising techniques of RIP and OSPF are duals of each other.
Some of the advances embodied in OSPF include the following:
Security. All exchanges between OSPF routers (e.g., link state updates) are authenticated. This means that only trusted routers can participate in the OSPF protocol within a domain, thus preventing malicious intruders (or networking students taking their newfound knowledge out for a joyride) from injecting incorrect information into router tables.
Multiple same-cost paths. When multiple paths to a destination have the same cost, OSPF allows multiple paths to be used (i.e., a single path need not be not chosen for carrying all traffic when multiple equal cost paths exist).
Different cost metrics for different TOS traffic. OSPF allows each link to have different costs for different TOS (type of service) IP packets. For example, a high bandwidth satellite link might be configured to have a low cost (and hence be attractive) for non-time critical traffic, but a very high cost metric for delay-sensitive traffic. In essence, OSPF sees different network topologies for different classes of traffic, and hence can compute different routes for each type of traffic.
Integrated support for unicast and multicast routing. Multicast OSPF (RFC 1584) provides simple extensions to OSPF to provide for multicast routing (a topic we cover in more depth in Section 4.8). MOSPF uses the existing OSPF link database and adds a new type of link state advertisement to the existing OSPF link state broadcast mechanism.
Support for hierarchy within a single routing domain. Perhaps the most significant advance in OSPF is the ability to hierarchically structure an autonomous system. Section 4.3 has already looked at the many advantages of hierarchical routing structures. We cover the implementation of OSPF hierarchical routing in the remainder of this section.
As OSPF autonomous system can be configured into "areas." Each area runs its own OSPF link state routing algorithm, with each router in an area broadcasting its link state to all other routers in that area. The internal details of an area thus remain invisible to all routers outside the area. Intra-area routing involves only those routers within the same area.
Within each area, one of more area border routers are responsible for routing packets outside the area. Exactly one OSPF area in the AS is configured to be the backbone area. The primary role of the backbone area is to route traffic between the other areas in the AS. The backbone always contains all area border routers in the AS and may contain non border routers as well. Inter-area routing within the AS requires that the packet be first routed to an area border router (ntradomain routing), then routed though the backbone to the area border router that is in the destination area, and then routed to the final destination.
Figure 4.5-7: Hierarchically structured OSPF AS with four areas.
A diagram of a hierarchically structured OSPF network is shown in Figure 4.4-5 . We can identify four types of OSPF routers in Figure 4.5-7:
internal routers. These routers, shown in black, are in a non-backbone areas and only perform intra-AS routing.
area border routers. These routers, shown in blue, belong to both an area and the backbone.
backbone routers (non border routers). These routers, shown in gray, perform routing within the backbone but themselves are not area border routers. Within a non-backbone area, internal routers learn of the existence of routes to other areas from information (essentially a link state advertisement, but advertising the cost of a route to another area, rather than a link cost) broadcast within the area by its backbone routers.
boundary routers. A boundary router, shown in blue, exchanges routing information with routers belonging to other autonomous systems. This router might, for example, use BGP to perform inter-AS routing. It is through such a boundary router that other routers learn about paths to external networks.
The Interior Gateway Routing Protocol (IGRP) [Cisco97] is a proprietary routing algorithm developed by Cisco Systems, Inc. in the mid-1980's as a successor for RIP. IGRP is a distance vector protocol. Several cost metrics (including delay, bandwidth, reliability, and load) can be used in making routing decisions, with the weight given to each of the metrics being determined by the network administrator. This ability to use administrator-defined costs in making route selections is an important difference from RIP; we will see shortly that so-called policy-based interdomain Internet routing protocols such as BGP also allow administratively defined routing decisions to be made. Other important differences from RIP include the use of a reliable transport protocol to communicate routing information, the use of update messages that are sent only when routing table costs change (rather than periodically) , and the use of a distributed diffusing update routing algorithm [Garcia-Luna-Aceves 1991] to quickly compute loop free routing paths.
The Border Gateway Protocol version 4, specified in RFC 1771 (see also RFC 1772, RFC 1773), is the de facto standard interdomain routing protocol in today's Internet. It is commonly referred to as BGP4 or simply as BGP. As an inter-autonomous system routing protocol, it provides for routing between autonomous systems (that is, administrative domains).
While BGP has the same general flavor as the distance vector protocol that we studied in Section 4.2, it is more appropriately characterized as a path vector protocol. This is because BGP in a router does not propagate cost information (e.g., number of hops to a destination), but instead propagates path information, such as the sequence of ASs on a route to a destination AS. We will examine the path information in detail shortly. We note though that while this information includes the names of the ASs on a route to the destination, they do not contain cost information. Nor does BGP specify how a specific route to a particular destination should be chosen among the routes that have been advertised. That decision is a policy decision that is left up to the domain's administrator. Each domain can thus choose its routes according to whatever criteria it chooses (and need not even inform its neighbors of its policy!) -- allowing a significant degree of autonomy in route selection. In essence, BGP provides the mechanisms to distribute path information among the interconnected autonomous systems, but leaves the policy for making the actual route selections up to the network administrator.
Let's begin with a grossly simplified description of how BGP works. This will help us see the forest through the trees. As discussed in Section 4.3, as far as BGP is concerned, the whole Internet is a graph of ASs, and each AS is identified by an AS number. At any instant of time, a given AS X may, or may not, know of a path of ASs that lead to a given destination AS Z. As an example, suppose X has listed in its BGP table such a path XY1Y2Y3Z from itself to Z. This means that X knows that it can send datagrams to Z through the ASs X, Y1, Y2 and Y3, Z. When X sends updates to its BGP neighbors (i.e., the neighbors in the graph), X actually sends the enitre path information, XY1Y2Y3Z, to its neighbors (as well as other paths to other ASs). If, for example, W is a neighbor of X, and W receives an advertisement that includes the path XY1Y2Y3Z, then W can list a new entry WXY1Y2Y3Z in its BGP table .However, we should keep in mind that W may decide to not create this new entry for one of several reasons. For example, W would not create this entry if W is equal to (say) Y2, thereby creating an undesirable loop in the routing; or if W already has a path to Z in its tables, and this existing path is preferable (with respect to the metric used by BGP at W) to WXY1Y2Y3Z ; or, finally, if W has a policy decision to not forward datagrams through (say) Y2 .
In BGP jargon, the immediate neighbors in the graph of ASs are called peers. BGP information is proprogated through the network by exchanges of BGP messages between peers. The BGP protocol defines the four types of messages: OPEN, UPDATE, NOTIFICATION and KEEPALIVE.
OPEN: BGP peers communicate using the TCP protocol and port number 179. TCP thus provides for reliable and congestion controlled message exchange between peers. In contrast, recall that we earlier saw that two RIP peers, e.g., the routed's in Figure 4.5-6 communicate via unreliable UDP. When a BGP gateway wants to first establish contact with a BGP peer (e.g., after the gateway itself or a connecting link has just be booted), an OPEN message is sent to the peer. The OPEN message allows a BGP gateway to identify and authenticate itself, and provide timer information. If the OPEN is acceptable to the peer, it will send back a KEEPALIVE message.
UPDATE: A BGP gateway uses the UPDATE message to advertise a path to a given destination (e.g., XY1Y2Y3Z) to the BGP peer. The UPDATE message can also be used to withdraw routes that had previously been advertised (that is, to tell a peer that a route that it had previously advertised is no longer a valid route).
KEEPALIVE: This BGP message is used to let a peer know that the sender is alive but that the sender doesn't have other information to send. It also serves as an acknowledgment to a received OPEN message.
NOTIFICATION: This BGP message is used to inform a peer that an error has been detected (e.g., in a previously transmitted BGP message) or that the sender is about to close the BGP session.
Recall from our discussion above that BGP provides mechanisms for distributing path information but does not mandate policies for selecting a route from those available. Within this framework, it is thus possible for an AS such as Hatfield.net to implement a policy such as "traffic from my AS should not cross the AS McCoy.net," since it knows the identities of all AS's on the path. (The Hatfield and the McCoy's are two famous feuding families in the US). But what about a policy that would prevent the McCoy's from sending traffic through the Hatfield's network? The only means for an AS to control the traffic it passes though its AS (known as "transit" traffic - traffic that neither originates in, nor is destined for, the network, but instead is "just passing through") is by controlling the paths that it advertises. For example, if the McCoy's are immediate neighbors of the Hatfields, the Hatfields could simply not advertise any routes to the McCoy's that contain the Hatfield network. But restricting transit traffic by controlling an AS's route advertisement can only be partially effective. For example, if the Jones are between the Hatfield's and the McCoy's, and the Hatfield's advertise routes to the Jones' that pass through the Hatfields, then the Hatfields can not prevent (using BGP mechanisms) the Jones from advertising these routes to the McCoys.
Very often an AS will have muliple gateway routers that provide connections to other ASs. Even though BGP is an inter-AS protocol, it can still be used inside an AS as a pipe to exchange BGP updates among gateway routers belonging to the same AS. BGP connections inside an AS are called Internal BGP (IBGP), whereas BGP connections between ASs are called External BGP (EBGP).
As noted above, BGP, which is the successor to EGP, is becoming the de facto standard for inter-AS routing for the public Internet. BGP is used for example at the major network access points (NAP's) where major Internet carries connect to each other and exchange traffic. To see the contents of today's (less than four hours out of date) BGP routing table (large!) at one of the major NAP's in the US (which include Chicago and San Francisco ), click here.
This completes our brief introduction of BGP. Although BGP is complex, it plays a central role in the Internet. We encourage readers to see the references [Halabi 97] and [Huitema 95] to learn more about BGP.
Having now studied the details of specific inter-AS and intra-AS routing protocols deployed in today's Internet, let us conclude by considering perhaps the most fundamental question we could ask about these protocols in the first place (hopefully, you have been wondering this all along, and have not lost the forest for the trees!):
Why are different inter-As and intra-AS routing protocols used?
The answer to this question gets at the heart of the differences between the goals of routing within an AS and among ASs:
Policy. Among ASs, policy issues dominate. It may well be important that traffic originating in a given AS specifically not be able to pass through another specific AS. Similarly, a given AS may well want to control what transit traffic it carries between other ASs. We have seen that BGP specifically carries path attributes and provide for controlled distribution of routing information so that such policy-based routing decisions can be made. Within an AS, everything is nominally under the same administrative control, and thus policy issues play a much less important role in choosing routes within the AS.
Scale. The ability of a routing algorithm and its data structures to scale to handle routing to/among large numbers of networks is a critical issue in inter-AS routing. Within an AS, scalability is less of a concern. For one thing, if a single administrative domain become too large, it is always possible to divide it into two ASs and perform inter-AS routing between the two new ASs. (Recall that OSPF allows such a hierarchy to be built by splitting an AS into "areas").
Performance. Because inter-AS routing is so policy-oriented, the quality (e.g., performance) of the routes used is often of secondary concern (i.e., a longer or more costly route that satisfies a certain policy criteria may well be taken over a route that is shorter but does not meet that criteria). Indeed, we saw that among ASs, there is not even the notion of preference or costs associated with routes. Within a single AS, however, such policy concerns can be ignored, allowing routing to focus more on the level of performance realized on a route.
If you are interested in an Internet Draft relating to a certain subject or protocol enter the keyword(s) here.
Return to Table Of Contents
Copyright Keith W. Ross and James F. Kurose, 1996–2000. All rights reserved.