Lessons Learned from Building Large-Scale Testbeds
==================================================

Testbeds are becoming an increasingly important tool for evaluating new network concepts, protocols, technologies & applications. Traditionally, testbeds are built for a specific project and rarely survive the end of that project. What we need are adaptable testbeds which can easily be adjusted to new research agendas, not unlike simulators, where an experimenter needs to modify only a small part to obtain results. However, there remain many hurdles to the full integration of testbeds into the scientific discovery process:

* Cost
* Ease of deployment
* Maintenance
* Ease of use
* Repeatability
* Measurements & correct analysis

Many experiments require only off-the-shelf hardware, whose cost still follows Moore's Law. The same law is also quickly pushing the cost of programmable radios into very attractive regions. However, deployment- and management-related modifications still dominate the cost if they cannot be amortized over a larger number of nodes. What is also often severely underestimated is the cost and effort required to develop the software infrastructure needed to efficiently & reliably manage what is effectively a highly distributed system.

Over the last few years we built Orbit, a large-scale wireless testbed, with a four-year, US$5.45M NSF grant. Orbit is one of the few testbeds which was designed from the outset as an open, shared facility.

What lessons have we learned?
-----------------------------

Not surprisingly, building large-scale testbeds with cheap, off-the-shelf hardware is still very difficult. While the design and building of a testbed itself poses many interesting research challenges, it also requires considerable engineering effort and expertise which may not be readily available in a university setting.
We were also very conscious of the fact that our users would primarily have a simulation background and little practical expertise with controlling large distributed systems.

The design goals for Orbit can be summarized as:

* maximum flexibility
* repeatability
* accuracy & observability

We wanted to give experimenters access to the 'bare metal'. While we know that many will opt for the comfort of a fully tested operating system pre-installed on each node, we recognized that many innovations will come from being able to modify EVERYTHING. At the same time, however, we need to maintain full manageability of all resources. The initial design achieved that through hardware modifications of the deployed testbed nodes. This was cost-effective at a scale of 500 nodes, but prevented us from easily spawning additional smaller-scale testbeds, as requested by many in our user community. We have rectified this situation by moving all the custom capabilities onto a single PCI board, which now allows us to cost-effectively produce new testbed nodes even in smaller quantities. We are very close to offering Orbit testbed kits through a US computer manufacturer, with multiple orders of various sizes already placed.

One of the primary motivators for building Orbit was the lack of repeatability of most experiments reported on in our community. Not only does that hinder our ability to verify results, as is common practice in other scientific fields, it also hampers building on and honestly comparing with related work. Therefore, repeatability at the basic level should allow an experimenter to at least repeat the experiment. To achieve this we need to capture all the experiment parameters and externalities, such as software versions and configuration parameters (e.g. kernel parameters).
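
As a minimal sketch of what capturing parameters and externalities could look like, the following Python snippet gathers experiment settings and software versions into one record and derives a digest from it, so that two runs can be checked for identical configuration. It is an illustration only and not part of Orbit: the function names, the experiment name, and the parameter and kernel setting values are all hypothetical.

```python
import hashlib
import json
import platform


def build_manifest(name, parameters, kernel_parameters):
    """Collect experiment settings and software versions into one record."""
    return {
        "experiment": name,
        "parameters": dict(parameters),
        "kernel_parameters": dict(kernel_parameters),
        # Externalities read from the machine the experiment runs on.
        "software": {
            "python": platform.python_version(),
            "os": platform.system(),
            "os_release": platform.release(),
        },
    }


def manifest_digest(manifest):
    """Hash a canonical JSON form so that key order does not matter."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical experiment name, parameters, and kernel setting.
manifest = build_manifest(
    "rate-adaptation-demo",
    {"nodes": 4, "channel": 6, "duration_s": 60},
    {"net.core.rmem_max": 262144},
)
print(json.dumps(manifest, indent=2, sort_keys=True))
print(manifest_digest(manifest))
```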
Testbeds also often contain other resources to emulate real-world settings, such as interference sources or sensor stimulation, which not only need to be managed but also captured to support repeatability. For this, we developed the Orbit Management Framework (OMF), which allows an experimenter to describe the entire experiment in a single file and to initiate and control it from a single testbed console. This method also enables unattended batch-mode operation and, with it, higher utilization of the testbed resources. OMF V4 is currently in the final stages of testing on four different testbeds (2 indoor, 2 outdoor), demonstrating its adaptability to different environments and wireless technologies. For instance, the management plane of one outdoor testbed is running over a commercial WiMAX service, whereas it normally runs over fixed Ethernet.

Most experiments require extensive measurements to be collected and analyzed. Instrumenting a testbed in such a way that the measurements and their collection do not interfere with the experiment itself is not only a challenge but often requires additional infrastructure and, with it, cost. The efficient collection of results from distributed sources is also of crucial importance: it not only reduces the time between experiments but also reduces analysis and interpretation errors.

For this, we have developed the Orbit Measurement Library (OML). OML provides a low-effort way to instrument applications, services, and infrastructure. It cleanly separates the decision on what measurements are available (application developer) from what should be collected in a specific experiment (experimenter). The latter depends heavily on the type of experiment and the granularity and volume of measurements required. OML collects the measurements over the separate control network into a single relational database.
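
The separation between available and collected measurements can be sketched in a few lines of Python. This is not OML's actual API: the `Collector` class, the `AVAILABLE_POINTS` table, and the measurement point names are hypothetical, and a local in-memory SQLite database stands in for collection over the control network. The application declares every measurement point it can report, the experimenter enables a subset, and only enabled points are stored.

```python
import sqlite3
import time

# Declared by the application developer: every measurement point the
# application can report, with its column names (hypothetical examples).
AVAILABLE_POINTS = {
    "udp_out": ("seq_no", "pkt_len"),
    "radio": ("rssi_dbm", "noise_dbm"),
}


class Collector:
    """Stores only the measurement points an experiment asked for."""

    def __init__(self, db, enabled_points):
        unknown = set(enabled_points) - set(AVAILABLE_POINTS)
        if unknown:
            raise ValueError(f"unknown measurement points: {sorted(unknown)}")
        self.db = db
        self.enabled = set(enabled_points)
        # One table per enabled point; names come from the fixed table above.
        for name in sorted(self.enabled):
            cols = ", ".join(f"{c} REAL" for c in AVAILABLE_POINTS[name])
            db.execute(
                f"CREATE TABLE IF NOT EXISTS {name} (node TEXT, ts REAL, {cols})"
            )

    def inject(self, point, node, *values):
        """Called by the application; samples of disabled points are dropped."""
        if point not in self.enabled:
            return False
        if len(values) != len(AVAILABLE_POINTS[point]):
            raise ValueError(f"wrong number of values for {point}")
        marks = ", ".join("?" for _ in range(len(values) + 2))
        self.db.execute(
            f"INSERT INTO {point} VALUES ({marks})", (node, time.time(), *values)
        )
        return True


db = sqlite3.connect(":memory:")
collector = Collector(db, enabled_points=["radio"])  # experimenter's choice
collector.inject("radio", "node1-1", -61.0, -95.0)
collector.inject("radio", "node1-2", -70.0, -94.0)
collector.inject("udp_out", "node1-1", 1, 512)  # not enabled, so ignored
# Analysis is then plain SQL against a single database.
print(db.execute("SELECT node, rssi_dbm FROM radio ORDER BY rssi_dbm").fetchall())
```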
The database approach ensures that all measurements are in one place and that all related metadata are captured as well. In addition, most popular analysis tools contain SQL adaptors, reducing the need for custom conversion and filter tools. OML is readily available to Orbit users, and we expect to obtain permission to release OML for wider use under an open source license shortly.

In summary, we strongly believe in the impact testbeds have in validating new network concepts, protocols, technologies & applications in more real-life settings. Not only will that lead to new insights, but it will also increase the confidence of decision makers to more readily transition research results into production networks, products, and services. However, experience has taught us that building an ecosystem of testbeds will be greatly helped by the adoption of a common management framework which not only dramatically simplifies the deployment of a testbed but also supports the entire life cycle of experimental validation. A common framework will also enable a user to easily move between testbeds, which will not only allow her to use the right testbed for the right task but also maximize the benefit the entire community gets from all the available resources. We believe that with OMF and its related tools we have created one such framework.