Trust In Cyberspace

Chapter 5: Trustworthy Systems from Untrustworthy Components

It is easy to build a system that is less trustworthy than its least trustworthy component. The challenge is to do better: to build systems that are more trustworthy than even their most trustworthy components. Such designs can be seen as "trustworthiness amplifiers." The prospect that a system could be more trustworthy than any of its components might seem implausible. But classical engineering is full of designs that accomplish analogous feats. In building construction, for example, one might find two beams that are each capable of supporting a 200-pound load being laminated together to obtain an element that will support in excess of 400 pounds. Can this sort of thing be done for trustworthiness of computing components, services, and systems? For some dimensions of trustworthiness it already has. Today, many computing services are implemented using replication, and multiple processors must fail before the service becomes unavailable; the service is more reliable than any single component processor. Secrecy, another dimension of trustworthiness, provides a second example: encrypting an already encrypted text, but with a different key, can (although not always; see Menezes et al., 1997) increase the effective key length, hence the work factor for conducting a successful attack. Again, note how design (multiple encryption, in this case) amplifies a trustworthiness property (secrecy).

Replication and multiple encryption amplify specific dimensions of trustworthiness. But the existence of these techniques and others like them also suggests a new approach for implementing networked information system (NIS) trustworthiness: A system's structure, rather than its individual components, should be the major source of trustworthiness. This chapter explores that theme. By pointing out connections between what is known for specific trustworthiness dimensions and what is needed, the intent is to inspire investigations that would support a vision of trustworthiness by design. Detailed descriptions of specific research problems would be premature at this point because too little is known. Accordingly, this chapter is more abstract than the other technical chapters in this volume. Getting to the point where specific technical problems have been identified will itself constitute a significant step forward.

Replication and Diversity

Diversity can play a central role in implementing trustworthiness. The underlying principle is simple: some members of a sufficiently diverse population will survive any given attack, although different members might be immune to different attacks. Long understood in connection with the biological world, this principle can also be applied for implementing fault tolerance and certain security properties, two key dimensions of trustworthiness.

Amplifying Reliability

A server can be viewed abstractly as a component that receives requests from clients, processes them, and produces responses. A reliable service can be constructed using a collection of such servers. Each client request is forwarded to a sufficient number of servers so that a correct response can be determined, even if some of the servers are faulty. The forwarding may be performed concurrently, as in active replication (Schneider, 1990), or, when failures are restricted to more benign sorts, serially (forwarding to the next server only if the previous one has failed), as in the primary backup approach (Alsberg and Day, 1976).
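The two forwarding disciplines described above can be sketched as follows. This is a minimal illustration, not the protocols of the cited papers: servers are modeled as hypothetical Python callables, a crash is modeled as a raised exception, and the "concurrent" forwarding of active replication is simulated with a simple loop.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Hypothetical server model: a callable mapping a request to a response.
Server = Callable[[str], Hashable]


def active_replication(request: str, servers: Sequence[Server]) -> Hashable:
    """Forward the request to every replica and return the majority response."""
    responses = []
    for server in servers:
        try:
            responses.append(server(request))
        except Exception:
            # A crashed replica contributes no vote.
            continue
    if not responses:
        raise RuntimeError("all replicas failed")
    value, votes = Counter(responses).most_common(1)[0]
    # Require a strict majority of all replicas, so wrong answers are outvoted.
    if votes <= len(servers) // 2:
        raise RuntimeError("no majority among replica responses")
    return value


def primary_backup(request: str, servers: Sequence[Server]) -> Hashable:
    """Try replicas in order; move to the next only if the current one crashes."""
    for server in servers:
        try:
            return server(request)
        except Exception:
            continue
    raise RuntimeError("all replicas failed")


# Example replicas (assumptions made for this sketch).
def good(request: str) -> str:
    return request.upper()


def lying(request: str) -> str:
    return "garbage"  # responds, but incorrectly


def crashed(request: str) -> str:
    raise ConnectionError("replica is down")
```

Under these assumptions, `active_replication("ping", [good, lying, good])` returns the correct answer because the faulty replica is outvoted, whereas `primary_backup("ping", [lying, good])` returns the wrong answer. That difference is why the serial approach suits only the more benign failures, such as crashes.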

This use of replication amplifies the reliability of the components. Observe that the amplification occurs whether or not the servers employed are especially reliable, provided the servers fail independently. The failure-independence requirement is actually an assumption about diversity. Specifically, in this context, "attacks" correspond to server failures, and failure-independence of servers is equivalent to positing a server population with sufficient diversity so that each attack fells only a single server. Processors that are physically separated, are powered from different sources, and communicate over narrow-bandwidth links approximate such a population, at least with respect to random hardware failures. So, this replication-based design effectively amplifies server fault tolerance against random hardware failures. Error-correcting codes, used to tolerate transient noise bursts during message transmissions, and alternative-path routing, used to tolerate router and link outages, can also be viewed in these terms: reliability is achieved by using replicas that fail independently.
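The effect of the failure-independence assumption can be made concrete with a little arithmetic. The sketch below assumes each replica fails independently with the same probability; the function names and the example probability of 0.1 are illustrative choices, not figures from this report.

```python
from math import comb


def prob_all_fail(p_fail: float, n: int) -> float:
    """Probability that all n independent replicas are down at the same time."""
    return p_fail ** n


def prob_majority_correct(p_fail: float, n: int) -> float:
    """Probability that more than half of n independent replicas are correct."""
    need = n // 2 + 1
    p_ok = 1.0 - p_fail
    # Sum the binomial probabilities for k = need .. n correct replicas.
    return sum(
        comb(n, k) * p_ok ** k * p_fail ** (n - k) for k in range(need, n + 1)
    )
```

With a per-replica failure probability of 0.1, three replicas are all down together with probability 0.001, and a majority of three is correct with probability 0.972, compared with 0.9 for a single server. If failures are correlated, for instance because replicas share a power source, these products no longer apply and the amplification shrinks. For voting, amplification also requires that each replica be correct more often than not.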

Notice, however, that replication can diminish another aspect of trustworthiness—privacy—because replicating a service or database increases the number of locations where the data can be compromised (Randell and Dobson, 1986). Use of selective combinations of secret sharing and cryptographic techniques (so-called threshold cryptography) may, in some cases, reduce the exposure (DeSantis et al., 1994). And replication is not the only example in which techniques for enhancing one aspect of trustworthiness can adversely affect another.
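As an illustration of the secret-sharing building block mentioned above, the following sketch uses Shamir's polynomial scheme, a standard technique chosen here for illustration and not necessarily the construction in DeSantis et al. (1994). A secret is split into shares held at different locations, and any `threshold` of them suffice to recover it, so compromising fewer locations reveals nothing about the secret. The field size and function names are assumptions made for this sketch.

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1  # a Mersenne prime; the field size is an illustrative choice

Share = Tuple[int, int]


def split_secret(secret: int, threshold: int, num_shares: int) -> List[Share]:
    """Split `secret` into `num_shares` points; any `threshold` recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in the field")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, evaluated mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares: List[Share]) -> int:
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8 or later).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

For example, with `shares = split_secret(123456789, 3, 5)`, any three of the five shares passed to `recover_secret` yield the original value. Replicating shares instead of the data itself keeps the availability benefit of replication while limiting what a single compromised site exposes.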

Design and implementation errors in hardware or software components are not so easily tolerated by replication. The problem is that replicas of a single component define a population that lacks the necessary diversity. This is because attacks are now the stimuli that cause components to encounter errors and, since all replicas share design and implementation errors, a single attack will affect all replicas. However, if differently designed and implemented components were used, the necessary diversity would be present in the population. This approach was first articulated in connection with computer programming by Elmendorf,1 who called it "fault-tolerant programming" (Elmendorf, 1972), and subsequently it has been refined by researchers and employed in a variety of control applications, including railway and avionics (Voges, 1988). However, the approach is expensive—each program is developed and tested independently N times and by separate development teams. More troubling than cost, though, are the experimental results that raise questions about whether separate development teams do indeed create populations with sufficient diversity when these teams start with the identical specifications (Knight and Leveson, 1986). See Ammann and Knight (1991) for an overall assessment of the practical issues concerning design diversity.

1 Dionysius Lardner in 1834 also pointed out the virtues of this approach to computing. See Voges (1988), page 4, for the Lardner quote: "The most certain and effectual check upon errors which arise in the process of computation is to cause the same computations to be made by separate and independent computers; and this check is rendered still more decisive if they make their computations by different methods."

There are circumstances, however, in which replication can amplify resilience to software design and implementation errors. Program execution typically is determined not only by input data but also by other aspects of the system state. And, as a result of other system activity, the system state may differ from one execution of a given program to the next, causing different logic to be exercised in that program. Thus, an error that causes one execution of the program to fail might not be triggered in a subsequent execution, even for the same input data. Experiences along these lines have been reported by programmers of Tandem systems in which system support for transactions makes it particularly easy to build software that reruns programs after apparent software failures (Gray and Reuter, 1997). Further supporting experiences are reported in Huang et al. (1995), who show that periodic server restarts decrease the likelihood of server crashes. Interestingly, it is this same phenomenon that gives rise to so-called Heisenbugs (Gray and Reuter, 1997): transient failures that are difficult to reproduce because they are triggered by circumstances beyond the control of a tester. Particularly troubling are Heisenbugs that surface only after a tester adds instrumentation to facilitate debugging a system.
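The rerun idea can be sketched as a small retry wrapper. This is only an illustration of re-executing after an apparent software failure; it is not the Tandem transaction mechanism, and the function name and parameters are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T], attempts: int = 3, delay_s: float = 0.0
) -> T:
    """Rerun `task` after a failure, up to `attempts` executions in total."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:
            # The failure may depend on transient system state, so try again.
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_s)
    raise RuntimeError(f"task failed after {attempts} attempts") from last_error
```

A task that fails only under some transient system state is likely to succeed on a later attempt. A deterministic bug fails on every attempt, so rerunning offers no protection against it.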

Amplifying Security

Diversity not only can amplify reliability, but it can also be used to amplify immunity to more coordinated and hostile forms of attack. For such attacks, simple replication of components provides no benefit. These attacks are not random or independent; after successfully attacking one replica, an attacker can be expected to target other replicas and repeat that attack. A vulnerability in one replica constitutes a vulnerability for all replicas, and a population of identical replicas will lack the necessary diversity to survive. But a more diverse population—even though its members might each support the same functionality—can provide a measure of immunity from attacks.

The diversity necessary for deflecting hostile attacks can be viewed in terms of protocols, interfaces, and their implementations. Any attack will necessarily involve accessing interfaces because attacks exploiting vulnerabilities in standard protocols can be viewed as attacks against an interface. The attack will succeed owing to vulnerabilities associated with the semantics of those interfaces or because of flaws in the implementation of those interfaces. Different components or systems that provide the same functionality might do so by supporting dissimilar interfaces, by supporting similar interfaces having different implementations, or by supporting similar interfaces having similar implementations. With greater similarity comes increased likelihood of common vulnerabilities. For example, in UNIX implementations from different vendors, there will be some identical interfaces (because that is what defines UNIX) with identical implementations, some identical interfaces in which the implementations differ, and some internal interfaces that are entirely dissimilar. A Windows NT implementation is less similar to a UNIX system than another UNIX system would be. Thus, a successful attack against one UNIX implementation is more likely to succeed against the other UNIX implementations than against Windows NT. Unfortunately, realities of the marketplace and the added complexities when diverse components are used in building a system reduce the practicality of aggressively employing diversity in designing systems.

Findings

1. Replication and diversity can be employed to build systems that amplify the trustworthiness of their components. Research is needed to understand the limits and potential of this approach. How can diversity be added to a collection of replicas? How can responses from a diverse set of replicas be combined so that responses from corrupted components are ignored?

2. Research is also needed to understand how to measure similarities between distinct implementations of the same functionality and to determine the extent to which distinct implementations share vulnerabilities.

Monitor, Detect, Respond

Monitoring and detection constitute a second higher-level design approach that can play a role in implementing trustworthiness: attacks or failures are allowed to occur, but they are detected and a suitable and timely response is initiated. This approach has been applied both with respect to security and to fault tolerance. Its use for fault tolerance is broadly accepted, but its role in providing security is somewhat controversial.

Physical plant security typically is enforced by using such a combined approach—locks keep intruders out, and alarms, video surveillance cameras, and the threat of police response not only serve as deterrents but also enable the effects of an intrusion to be redressed. This combined approach is especially attractive when shortcomings in prevention technology are suspected. For example, in addition to antiforgery credit card technology and authorization codes for each transaction, credit card companies monitor and compare each transaction with profiles of past cardholder activity. A combined approach may be even more cost-effective than solely deploying prevention technology of sufficient strength.

Limitations in Detection

Whatever the benefits, the monitor-detect-respond approach is limited by the available detection technology: response is not possible without detection. For example, when this approach is used for security, the detection subsystem must recognize attacks (and report them) or must recognize acceptable behavior (and report exceptions) (Lunt, 1993). To recognize attacks, the detection subsystem must be imbued with some characterization of those attacks. This characterization might be programmed explicitly (perhaps as a set of pattern-matching rules for some aspect of system behavior) or derived by the detection subsystem itself from observing attacks. Notice that whatever means is employed, new attacks might go unrecognized. Systems that recognize acceptable behavior employ, in effect, some model for that behavior. Again, whether the model is programmed explicitly or generated by observing past acceptable behavior, the detection subsystem can be fooled by new behavior, for example, the worker who stays uncharacteristically late to meet a deadline.
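The two styles of detection can be contrasted in a short sketch. The signature patterns and the hours-of-activity profile below are hypothetical examples invented for illustration; real detectors use far richer rules and models.

```python
import re
from dataclasses import dataclass
from typing import List

# Explicitly programmed attack characterizations (hypothetical rule set).
SIGNATURES = {
    "path traversal": re.compile(r"\.\./"),
    "sql injection": re.compile(r"(?i)union\s+select"),
}


def match_signatures(event: str) -> List[str]:
    """Return the names of all known attack patterns found in the event."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(event)]


@dataclass
class HoursProfile:
    """Model of acceptable behavior derived from observed login hours."""

    earliest: int = 24
    latest: int = -1

    def observe(self, hour: int) -> None:
        """Widen the profile to include a past, accepted login hour."""
        self.earliest = min(self.earliest, hour)
        self.latest = max(self.latest, hour)

    def is_anomalous(self, hour: int) -> bool:
        """Report an exception when activity falls outside the learned range."""
        return not (self.earliest <= hour <= self.latest)
```

An event such as `"GET /../../etc/passwd"` matches the path-traversal rule, but an attack with no corresponding rule returns an empty list and goes unrecognized. Conversely, a profile trained on logins between 8 and 18 reports a login at hour 22 as anomalous, which is the false alert raised by the worker who stays late.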

With only approximate models to drive the detection subsystem, some attacks might not be detected and some false alerts might occur. Undetected attacks are successful attacks. And with false alerts, one detection problem is simply transformed into another one, with false alerts being conveyed to human operators for analysis. An operator constantly dealing with false alerts will become less attentive and less likely to notice a bona fide attack. Attackers might even try to exploit human frailty by causing false alerts so that subsequent real attacks are less likely to attract notice.
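A rough calculation shows why false alerts can swamp an operator. The sketch below applies Bayes' rule to compute the fraction of alerts that correspond to real attacks; all the rates used in the example are illustrative assumptions, not measurements.

```python
def alert_precision(
    attack_rate: float, detection_rate: float, false_alert_rate: float
) -> float:
    """Fraction of raised alerts that correspond to real attacks."""
    true_alerts = attack_rate * detection_rate
    false_alerts = (1.0 - attack_rate) * false_alert_rate
    total = true_alerts + false_alerts
    return true_alerts / total if total else 0.0
```

Suppose, hypothetically, that 1 event in 1,000 is an attack, that 99 percent of attacks are detected, and that 1 percent of benign events raise an alert. Then `alert_precision(0.001, 0.99, 0.01)` is about 0.09: roughly 10 of every 11 alerts reaching the operator are false.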

Any detection subsystem must gather information about the system it is monitoring. Deploying the necessary instrumentation for this surveillance may require modifications to existing systems components. That, however, could be difficult with commercial off-the-shelf components, since their internals are rarely available for view or modification. It also may become increasingly difficult if there is greater use of encryption for preserving confidentiality of communications, since that restricts the places in the system where monitoring can be performed. Data must be collected at the right level, too. Logs of low-level events might be difficult to parse; keeping only logs of events at higher levels of abstraction might enable an attack to be conducted below the level of the surveillance. A final difficulty with using the monitor-detect-respond approach to augment prevention mechanisms is its implicit reliance on prevention technology. The surveillance and detection mechanisms must be protected from attack and subversion.

Response and Reconfiguration

For the monitor-detect-respond paradigm to work, a suitable response must be available to follow up the detection of a failure or attack.

When it is failures that are being detected, system reconfiguration to isolate the faulty components seems like a reasonable response. For systems whose components are physically close, solutions for this system-management problem are understood reasonably well. But for systems spanning a wide-area network, like a typical networked information system (NIS), considerably less is known. The problem is that communication delays now can be significant, giving rise to open questions about trade-offs involving the granularity and flexibility of the system-management functions that must be added to implement reconfigurations. And there is also the question of how to integrate partitions once they can be reconnected.

When hostile attacks are being detected, further concerns come into play. Isolating selected subsystems might be the sensible response, but knowing how and when to do so requires additional research into how to design an NIS that can continue functioning, perhaps in a degraded mode, once partitioned. Having security functionality be degraded in response to an attack is unwise though, since the resulting system could then admit a two-phase attack. The first phase causes the system to reconfigure and become more vulnerable to attack; the second phase of the attack exploits one of those new vulnerabilities. Finally, system reconfiguration mechanisms also must be protected from attacks that could compromise system availability. Triggering the reconfiguration mechanism, for example, could be the basis for a denial-of-service attack.

Perfection and Pragmatism

The monitor-detect-respond paradigm is theoretically limited by, among other things, the capabilities of the detection subsystem that it employs. This is more of a problem for attack monitoring than for failure monitoring. Specifically, a failure detector for a given system is unlikely to grow less effective over time, whereas an attack detector will grow less effective because new attacks are constantly being devised. Other common defensive measures, such as virus scanners and firewalls, are similarly flawed in theory but useful nevertheless.

There is nothing wrong with deploying theoretically limited solutions. What is known as "defense in depth" in the security community argues for using a collection of mechanisms so that the burden of perfection is placed on no single mechanism. One mechanism covers the flaws of another. Implicit in defense in depth, however, is a presumption about coverage. An attack that penetrates one mechanism had better not penetrate all of the others. Unfortunately, this coverage presumption is one that is not easily discharged: attack detectors are never accompanied by useful characterizations of their coverage, partly because no good characterizations exist for the space of attacks. Just as structural engineers employ error bars and safety factors, security engineers need ways to understand the limitations of their materials. What is needed can be seen as another place where the research into a "theory of insecurity" (advocated in Chapter 4) would have value, by providing a method by which vulnerabilities could be identified and their system-wide implications understood.
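If coverage characterizations did exist, checking the coverage presumption would be straightforward. The sketch below assumes, purely hypothetically, that the space of attacks can be enumerated and that each mechanism comes with the set of attacks it stops; as noted above, no such characterizations are available in practice.

```python
from typing import Dict, Iterable, Set


def uncovered_attacks(
    attack_space: Iterable[str], mechanisms: Dict[str, Set[str]]
) -> Set[str]:
    """Attacks that penetrate every mechanism in the collection."""
    covered: Set[str] = set().union(*mechanisms.values())
    return set(attack_space) - covered


# Hypothetical attack space and coverage sets, invented for illustration.
ATTACKS = {"phishing", "buffer overflow", "sql injection", "insider misuse"}
MECHANISMS = {
    "firewall": {"buffer overflow"},
    "virus scanner": {"phishing"},
    "intrusion detector": {"buffer overflow", "sql injection"},
}
```

Here `uncovered_attacks(ATTACKS, MECHANISMS)` returns `{"insider misuse"}`: three layers are deployed, yet one class of attack penetrates all of them. Overlapping coverage, as with the firewall and the intrusion detector, adds depth only for the attacks both happen to stop.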

Findings

1. Monitoring and detection can be employed to build systems that amplify the trustworthiness of their components. But research is needed to understand the limits and potential of this approach.

2. Limitations in system monitoring technology and in technology to recognize events, like attacks and failures, impose fundamental limits on the use of monitoring and detection for implementing trustworthiness. For example, the limits and coverage of the various approaches to intruder and anomaly detection are not well understood.

Placement of Trustworthiness Functionality

In traditional uniprocessor computing systems, functionality for enforcing security policies and tolerating failures is often handled by the kernel, a small module at the lowest level of the system software. That architecture was attractive for three reasons:

• Correct operation of the kernel—hence, security and fault-tolerance functionality for the entire system—depended on no other software and, therefore, could not be compromised by flaws in other system software.

• Keeping the kernel small facilitated understanding it and gaining assurance in the entire system's security and fault-tolerance functionality.

• By segregating security and fault-tolerance functionality, both of which are subtle to design and implement, fewer programmers with those skills were required, and all programmers could leverage the efforts of the few.

Whether such an architecture is suitable for building an NIS seems less clear. For such a system to be scalable and to tolerate the failure of any single component, the "kernel" would have to span some of the network infrastructure and perhaps multiple processors. And, because NIS components are likely to be distributed geographically, ensuring unimpeded access to a "kernel" might force it, too, to be geographically distributed. A "kernel" that must span multiple, geographically distributed processors is not likely to be small or easily understood, making alternative architectures seem more attractive. For example, an argument might be made for placing security and fault-tolerance functionality at the perimeter of the system, so that processors minimize their dependence on network infrastructure and other parts of the system.

An effort was made, associated with the Trusted Network Interpretation (the so-called Red Book) of the Trusted Computer System Evaluation Criteria (TCSEC), to extend the "kernel" concept, for the security context, from a single computer to an entire network (U.S. DOD, 1987). According to the Red Book, there was a piece of the "kernel" in each processing component, and communication between components was assumed to be secure. This approach was found to be infeasible for large networks or even relatively small nonhomogeneous ones.

Too few NISs have been built, and even fewer have been carefully analyzed, for any sort of consensus to have emerged about what architectures are best or even about what aspects of an NIS and its environment are important in selecting an architecture. The two extant NISs discussed in Chapter 2—the public telephone network (PTN) and the Internet—give some feel for viable architectures and their consequences. A proposed third system under discussion within government circles, the so-called minimum essential information infrastructure (MEII), gives insight into difficulties and characteristics associated with specifying a sort of "kernel" for an NIS. Therefore, the remainder of this section reviews these three systems and architectures. While only a start, this exercise suggests that further research in the area could lead to insights that would be helpful to NIS designers.

Public Telephone Network

The PTN is structured around a relatively small number of highly reliable components. A single modern telephone switch can handle all of the traffic for a town with tens of thousands of residents; long-distance traffic for the entire country is routed through only a few hundred switches. All of these switches are designed to be highly available, with downtime measured in small numbers of minutes per year. Control of the PTN is handled by a few centrally managed computers. The end systems (telephones) do not participate in PTN management and are not expected to have processing capacity.

The use of only a small number of components allows telephone companies to leverage their scarce human resources. PTN technicians are needed to operate, monitor, maintain, test, and upgrade the software in only a relatively small number of machines. Having centralized control simplifies network-wide load management, since the state of the system is both accessible and easily changed. But the lack of diversity, combined with this centralization, does little to prevent widespread outages. First, shared vulnerabilities and common-mode failures are more than a possibility; they have already occurred. Second, after propagating only a short distance (i.e., through a relatively small number of components), a failure or attack can affect a significant portion of the system.

As discussed in Chapter 2, the PTN maintains state for each call being handled. This, in turn, facilitates resource reservations per call that enable quality of service guarantees per call: a connection, once established, receives 56 kbps (kilobits per second) of dedicated bandwidth. But establishing a connection in the PTN is not guaranteed. If a telephone switch does not have sufficient bandwidth available, then it will decline to process a call. Consequently, existing connections are in no way affected by increases in offered load.2
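The per-call reservation discipline can be sketched as a simple admission controller. The class and method names are hypothetical, and the model ignores rerouting through other switches; it only shows that a new call is declined when capacity is exhausted, so existing reservations are never diluted.

```python
from typing import Set


class Switch:
    """Toy model of a switch that reserves fixed bandwidth for each call."""

    CHANNEL_KBPS = 56  # dedicated bandwidth per connection, as in the text

    def __init__(self, capacity_kbps: int) -> None:
        self.capacity_kbps = capacity_kbps
        self.calls: Set[str] = set()  # per-call state kept by the switch

    def reserved_kbps(self) -> int:
        return len(self.calls) * self.CHANNEL_KBPS

    def connect(self, call_id: str) -> bool:
        """Admit the call only if a full channel can still be reserved."""
        if call_id in self.calls:
            return True
        if self.reserved_kbps() + self.CHANNEL_KBPS > self.capacity_kbps:
            return False  # declined; existing calls keep their bandwidth
        self.calls.add(call_id)
        return True

    def hang_up(self, call_id: str) -> None:
        """Release the reservation held by the call, if any."""
        self.calls.discard(call_id)
```

A switch created as `Switch(capacity_kbps=112)` admits two calls and declines a third until one of the first two hangs up. Increased offered load shows up as declined calls, not as degraded service for calls already in progress.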

Internet

The Internet, by and large, exemplifies a more distributed architecture than the PTN. It is built from thousands of routers that are run by many different organizations and (as a class) are somewhat less reliable than telephone switches. Control in the Internet is decentralized, and delivery of packets is not guaranteed. Routers communicate with each other to determine the current network topology and automatically route packets, or discard them for lack of resources. The end systems (i.e., hosts) are responsible for transforming the Internet's "best effort" service into something stronger, and hosts are assumed to have processing capacity for this purpose.
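How an end system can turn "best effort" delivery into something stronger can be sketched with a stop-and-wait retransmission loop. This is a deliberately simplified illustration and not TCP: the network model, which drops every `drop_every`-th transmission and never loses acknowledgments, is an assumption made so that the example is deterministic.

```python
from typing import List, Sequence


class LossyNetwork:
    """Best-effort channel that silently drops every `drop_every`-th packet."""

    def __init__(self, drop_every: int) -> None:
        self.drop_every = drop_every
        self.sent = 0
        self.delivered: List[str] = []

    def send(self, seq: int, payload: str) -> bool:
        """Return True if the packet arrived and was acknowledged."""
        self.sent += 1
        if self.sent % self.drop_every == 0:
            return False  # dropped for lack of resources; no acknowledgment
        if seq == len(self.delivered):
            # The receiver accepts only the next expected sequence number.
            self.delivered.append(payload)
        return True


def reliable_send(
    payloads: Sequence[str], network: LossyNetwork, max_tries: int = 5
) -> List[int]:
    """Retransmit each packet until acknowledged; return tries per packet."""
    tries_per_packet = []
    for seq, payload in enumerate(payloads):
        for attempt in range(1, max_tries + 1):
            if network.send(seq, payload):
                tries_per_packet.append(attempt)
                break
        else:
            raise TimeoutError(f"packet {seq} not acknowledged")
    return tries_per_packet
```

All the recovery logic lives in the sending host; the network itself keeps no record of the connection. That placement is what lets routers stay simple, at the cost of requiring processing capacity in every end system.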

The reliability of the Internet comes from the relatively high degree of redundancy and absence of centralized control. To be sure, any given end system on the Internet experiences lower availability than, for instance, a typical telephone. However, the network as a whole will remain up despite outages. No single make of computer or operating system is run everywhere in the Internet, though many share a common pedigree. Diversity of hardware and software protects the Internet from some common-mode design and implementation failures and contributes to the reliability of the whole. But the Internet's routing infrastructure is built using predominantly Cisco routers, with Bay and a few other companies supplying the rest. In that regard, the Internet is like the PTN, relying largely on switches from Lucent, with Nortel, Siemens, and a few others supplying the rest.

2 If the call is declined by a switch, then the call may be routed via other switches or it may be declined altogether by returning a busy signal to the call initiator.

With protocol implementations installed in the tens of millions of end systems, it is relatively difficult to install changes to the Internet's protocols. This, then, is one of the disadvantages of an architecture that depends on end-system processing. Even installing a change in the Internet's routers is difficult because of the large number of organizations involved.

As discussed in Chapter 2, the Internet's routers, by design, do not maintain state for connections—indeed, connections are known only to the end systems. Different packets between a pair of end systems can travel different routes, and that provides a simple and natural way to tolerate link and router outages. The statelessness of the Internet's routers means that router memory capacity does not limit the number of end systems nor the number of concurrently open connections. However, there is a disadvantage to this statelessness: routers are unable to offer hosts true service guarantees, and the service furnished to a host can be affected by increases in load caused by other hosts.

In addition to supporting end-system scaling, the statelessness of the Internet helps avoid a problem often associated with distributed architectures: preserving constraints that link the states of different system components. Preservation of constraints, especially when outages of components must be tolerated, can require complex coordination protocols. Note that consistency constraints do link the routing tables in each of the Internet's routers. But these are relatively weak consistency constraints and are, therefore, easy to maintain. Even so, the Internet experiences routing-state maintenance problems, known as "routing flaps." (Routing response is dampened to help deal with this problem, at the level of the Border Gateway Protocol.) State per connection would be much harder to maintain because of the sheer numbers and the short-lived nature of the connections.
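The dampening of routing response mentioned above can be illustrated with a penalty that grows with each flap and decays exponentially over time, in the general style of BGP route flap damping. The class name and every parameter value below are illustrative assumptions, not the settings of any particular router.

```python
class FlapDamper:
    """Suppress a route that changes state too often within a short period."""

    def __init__(
        self,
        penalty_per_flap: float = 1000.0,
        suppress_above: float = 2000.0,
        reuse_below: float = 750.0,
        half_life_s: float = 900.0,
    ) -> None:
        self.penalty_per_flap = penalty_per_flap
        self.suppress_above = suppress_above
        self.reuse_below = reuse_below
        self.half_life_s = half_life_s
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now: float) -> None:
        # The penalty halves every half_life_s seconds without a flap.
        elapsed = max(0.0, now - self.last_update)
        self.penalty *= 0.5 ** (elapsed / self.half_life_s)
        self.last_update = max(now, self.last_update)

    def record_flap(self, now: float) -> None:
        """Charge the route for one withdraw/announce cycle at time `now`."""
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty > self.suppress_above:
            self.suppressed = True

    def is_suppressed(self, now: float) -> bool:
        """Re-admit the route once its penalty has decayed far enough."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_below:
            self.suppressed = False
        return self.suppressed
```

With these values a single flap is tolerated, three flaps within 20 seconds suppress the route, and the route becomes usable again about half an hour later if it stays stable. The gap between the two thresholds provides hysteresis, so a route hovering near one threshold does not itself oscillate between suppressed and usable.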

Minimum Essential Information Infrastructure

A minimum essential information infrastructure (MEII) is a highly trustworthy communications subsystem—a network whose services are immune to failures and attacks. The notion of an MEII was originally proposed in connection with providing support for NISs that control critical infrastructures.3 The MEII essentially was to be a "kernel" for many, if not all, NISs.

3 According to Anderson et al. (1998), the term "MEII" is credited to Rich Mesic, a RAND researcher who was involved in a series of information-warfare exercises run by RAND starting in 1995.

The study committee believes that implementing a single MEII for the nation would be misguided and infeasible. An independent study conducted by RAND (Anderson et al., 1998) also arrives at this conclusion. One problem is the incompatibilities that inevitably would be introduced as nonhardened parts of NISs are upgraded to exploit new technologies. NISs constantly evolve to exploit new technology, and an MEII that did not evolve in concert would rapidly become useless.

A second problem with a single national MEII is that "minimum" and "essential" depend on context and application (see Box 5.1), so one size cannot fit all. For example, water and power are essential services. Losing either in a city for a day is troublesome, but losing it for a week is unacceptable, as is having either out for even a day for an entire state. A hospital has different minimum information needs for normal operation (e.g., patient health records, billing and insurance records) than it does during a civil disaster. Finally, the trustworthiness dimensions that should be preserved by an MEII depend on the customer: local law enforcement agents may not require secrecy in communications when handling a civil disaster but would in day-to-day crime fighting.

Despite the impracticality of having a single national MEII, providing all of the trustworthiness functionality for an NIS through a "kernel" could be a plausible design option. Here are likely requirements:

• The "kernel" should degrade gracefully, shedding less essential functions if necessary to preserve more essential functions. For example, low-speed communications channels might remain available after high-speed ones are gone; recent copies of data might, in some cases, be used in place of the most current data.4

• The "kernel" should, to the extent possible, be able to function even if all elements of the infrastructure are not functioning. An example is the PTN, whose essential components have backup battery power enabling them to continue operating for a few hours after a power failure and without telephone company emergency generators (which might not be functioning).

• The "kernel" must be designed with restart and recovery in mind. It should be possible to restore the operation, starting from nothing, if necessary.
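The first requirement, graceful degradation, can be sketched as priority-based shedding. The function names, priorities, and capacity costs below are hypothetical; the point is only that functions are kept in order of how essential they are, and shedding starts from the least essential end.

```python
from typing import List, Tuple

# (name, priority, cost): a lower priority number means more essential.
Function = Tuple[str, int, int]


def select_functions(functions: List[Function], capacity: int) -> List[str]:
    """Keep the most essential functions that fit in the remaining capacity."""
    kept = []
    remaining = capacity
    for name, _priority, cost in sorted(functions, key=lambda f: f[1]):
        if cost > remaining:
            # Strict priority: never run a less essential function
            # while a more essential one has been shed.
            break
        kept.append(name)
        remaining -= cost
    return kept


# Hypothetical workload for illustration.
FUNCTIONS = [
    ("high-speed data", 2, 100),
    ("emergency voice", 0, 10),
    ("low-speed data", 1, 20),
]
```

With full capacity (130 units in this example) all three functions run. When capacity drops to 40, `select_functions(FUNCTIONS, 40)` keeps emergency voice and low-speed data and sheds the high-speed channel, mirroring the low-speed example in the first requirement. The strict-priority rule is a design choice; a greedy variant could instead fill leftover capacity with smaller, less essential functions.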

Note that neither the PTN nor the Internet exhibits all three of these characteristics, although the PTN probably comes closer than the Inter

4 Applications that depend on a gracefully degrading MEII must themselves be able to function in the full spectrum of resource availability that such an MEII might provide.

166 trust in cyberspace     

BOX 5.1

Taxonomy of Applications for Support by a Minimum Essential Information Infrastructure

• Military. Short-term strategic communications and information management needs of the Armed Forces as required to operate national defense systems, gather intelligence, and conduct operations against hostile powers.

• Nonmilitary federal government. Communications and information needs of the federal government to communicate with the military and local governments, to coordinate civil responses to natural disasters, and to direct national law enforcement against internal threats, terrorists, and organized crime.

• National information and news. Infrastructure required to communicate national issues rapidly to the U.S. public. Current examples include national radio and television networks (both broadcast and cable) and the national emergency broadcast program and national newspapers.

• National power and telecommunications services. Communications required to operate natural gas distribution, fuel distribution, the electric power distribution grids, and the public switched telephone network at a moderate level allowing nonmilitary communication.

• National economy. Communications required to operate public and private banking systems, stock exchanges, and other economic institutions; the concept may also extend to social service programs, which include income distribution components.

• Local government. Communications and information management needs of state and municipal governments to coordinate civil responses to natural disasters, to communicate with federal authorities, and to direct local law enforcement, fire, and health and safety personnel.

• Local information and news. Infrastructure required to communicate local information to a local area rapidly. Current examples include local television, radio, and newspapers.

• Nongovernment civil. Communications and information management needs of civil institutions, such as the Red Cross, hospitals, ambulance services, and other critical and safety-related civil institutions.

• Local power and telecommunications. Communications required to operate local power grids and telephony networks at a restricted level.

• Local economic and mercantile. Communication infrastructure required to operate local banks, markets, stores, and other essential mercantile infrastructure.

• Transportation. Communications infrastructure needed to manage air traffic, signaling and control infrastructure for controlling railroads, and infrastructure for automobile traffic signaling and control of traffic congestion in cities.

net.5 The development of a "kernel" exhibiting all three of the characteristics might well require new research, and an attempt to build such a "kernel" could reveal technical problems that are not, on the surface, apparent. Implementing an NIS using such a "kernel" could also be a

5There is some question as to whether the PTN can be disconnected and then restarted from scratch.

trustworthy systems from untrustworthy components 167

    

useful research exercise, since it might reveal other important characteristics the "kernel" should possess.

An alternative vision of the specification for a trustworthy "kernel" is as a computer network—hardware, communications lines, and software—that has a broad spectrum of operating modes. At one end of the spectrum, resource utilization is optimized; at the other end—entered in response to an attack—routings are employed that may be suboptimal but more trustworthy because they use diverse and replicated routings. In the more conservative mode, packets might be duplicated or fragmented6 by using technology that is effective for communicating information even when a significant fraction of the network has been compromised.7
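The idea of fragmenting traffic across diverse routes can be illustrated with a small sketch. The code below is not Rabin's dispersal algorithm; it is a much simpler stand-in, assumed here for illustration only, that splits a message into k fragments plus one XOR parity fragment, so that the message survives the loss of any single fragment (for example, one failed route). The function names `disperse` and `reconstruct` are hypothetical. The sketch addresses loss only; it does nothing to detect fragments that have been altered in transit.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(p ^ q for p, q in zip(a, b))


def disperse(message: bytes, k: int) -> List[bytes]:
    """Split message into k equal-size fragments plus one XOR parity fragment."""
    size = -(-len(message) // k)              # ceiling division
    padded = message.ljust(size * k, b"\0")   # zero-pad to a multiple of k
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def reconstruct(received: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the message when at most one of the k+1 fragments is missing.

    A missing fragment is represented by None; length is the original
    message length, needed to strip the padding.
    """
    missing = [i for i, frag in enumerate(received) if frag is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one lost fragment")
    repaired = list(received)
    if missing:
        present = [frag for frag in received if frag is not None]
        # XOR of all surviving fragments equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return b"".join(repaired[:-1])[:length]
```

In use, each of the k+1 fragments would travel over a different route. A scheme in the spirit of Rabin (1989) generalizes this so that any m of n fragments suffice, which is what allows a larger fraction of the network to fail.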

Two points follow from this vision. First, for such a multimode MEII implementation to be viable, it must possess some degree of diversity; there might well be a point after which hardening by using trustworthy components should defer to design goals driven by diversity. Second, detecting the occurrence of an attack is a prerequisite to making an operating-mode change that constitutes a defense in this MEII vision. Tools for monitoring the global status of the network thus become important, especially since a coordinated attack might be recognized only by observing activity in a significant fraction of the network.

A third plausible architecture for supporting trustworthiness functionality is to use some sort of a service broker that would monitor the status of the communications infrastructure. This service broker would sense problems and provide information to restore service dynamically, interconnecting islands of unaffected parts of the communications infrastructure. For example, it might be used in commandeering for priority uses some unaffected parts that normally operate as private intranets.

Findings

1. Attempting to build a single MEII for the nation would be misguided and a waste of resources because of the differing requirements of NISs.

2. Little is known about the advantages and disadvantages of different NIS system architectures and about where best to allocate in a system the responsibility for trustworthiness functionality. A careful analysis of existing systems would be one way to learn about the trustworthiness consequences of different architectures.

3. The design of systems that exhibit graceful degradation has great potential, but little is known about supporting or exploiting such systems.

6See, for example, Rabin (1989).

7Note that this multimode scheme implements resistance to attacks by using techniques traditionally used for supporting fault tolerance, something that seems especially attractive because a single mechanism is then being used to satisfy multiple requirements for trustworthiness. On the other hand, single mechanisms do present a common failure mode risk.

Nontraditional Paradigms

Other less architecturally oriented design approaches have been investigated for amplifying trustworthiness properties, most notably amplifying fault tolerance. These approaches are more algorithmic in flavor. Further research is recommended to develop the approaches and to better understand the extent and domain of their applicability.

Self-stabilization, for example, has been used to implement system services that recover from transient failures (Schneider, 1993). Informally, a self-stabilizing algorithm is one that is guaranteed to return to some predefined set of acceptable states after it has been perturbed and to do so without appealing to detectors or centralized controllers of any sort. For example, some communications protocols depend on the existence of a token that is passed among participants and empowers its holder to take certain actions (e.g., send a message). A self-stabilizing token management protocol would always return the system to the state in which there is a single token, even after a transient failure causes loss or duplication of the token. More generally, the design of network management and routing protocols could clearly benefit from a better understanding of control algorithms having similar convergent properties. The goal should be control schemes that are robust by virtue of the algorithm being used rather than the robustness of individual components.
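A self-stabilizing token protocol of this kind can be sketched in a few lines. The example below follows Dijkstra's classic K-state token ring, chosen here as an illustration; the text above does not name a specific protocol. Each machine holds a counter in the range 0 to k-1, with k assumed to be at least the number of machines, and a machine holds a token exactly when a purely local test on its own counter and its predecessor's counter succeeds. The function names and the random scheduler are assumptions of the sketch.

```python
import random
from typing import List


def privileged(x: List[int]) -> List[int]:
    """Return the indices of machines that currently hold a token."""
    n = len(x)
    holders = []
    if x[0] == x[n - 1]:                  # machine 0 compares with the last machine
        holders.append(0)
    # every other machine compares with its predecessor
    holders.extend(i for i in range(1, n) if x[i] != x[i - 1])
    return holders


def step(x: List[int], k: int, rng: random.Random) -> None:
    """Let one arbitrarily chosen token holder make its move."""
    i = rng.choice(privileged(x))         # at least one holder always exists
    if i == 0:
        x[0] = (x[0] + 1) % k             # machine 0 advances its counter
    else:
        x[i] = x[i - 1]                   # others copy their predecessor


def stabilize(x: List[int], k: int, rng: random.Random,
              max_steps: int = 10_000) -> int:
    """Run until exactly one token remains; return the number of moves taken."""
    steps = 0
    while len(privileged(x)) != 1:
        if steps >= max_steps:
            raise RuntimeError("did not stabilize within max_steps")
        step(x, k, rng)
        steps += 1
    return steps
```

Starting from an arbitrary counter assignment, such as one left behind by a transient failure that duplicated the token, repeated calls to `step` lead to a state with a single token, and every later move preserves that property. No machine consults a failure detector or a central controller; each acts only on its own state and its predecessor's.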

It may also be possible to develop a science base for algorithms that amplify resilience or other dimensions of trustworthiness by relying on group behavior. Metaphors and observations about the nature of our natural world—flocking birds, immunological systems,8 and crystalline structures in physics—might provide ideas for methods to manage networks of computers and the information they contain. The design approaches outlined above—population diversity and monitor-detect-respond—have clear analogies with biological concepts. Studying the organization of free markets and game theory for algorithmic content might be another source of ideas. Of course, there are significant differences between an NIS and the natural world; these differences might restrict the applicability of natural group behavior algorithms to NISs.

8With regard to the immunology metaphor, sophisticated attacks are like biological weapons, which have always proven effective in overcoming natural immunity.

For example, the actions and behaviors of natural systems arise not from deterministic programming but from complex, sometimes random, interactions of the individual elements. Instead of exhibiting the desirable robust behaviors, collections of programmed computers might instead become synchronized or converge in unintended ways. Clearly, research is needed to establish what ideas can apply to an NIS and to understand how they can be leveraged. See Anderson et al. (1998) for a discussion of how biological metaphors might be applied to the design of an MEII.

Finding

A variety of research directions involving new types of algorithms—self-stabilization, emergent behavior, biological metaphors—have the potential to be useful in defining systems that are trustworthy. Their strengths and weaknesses are not well understood, and further research is called for.

References

Alsberg, P.A., and J.D. Day. 1976. "A Principle for Resilient Sharing of Distributed Resources," pp. 627-644 in Proceedings of the 2nd International Conference on Software Engineering. Los Alamitos, CA: IEEE Computer Society Press.

Ammann, P.E., and J.C. Knight. 1991. "Design Fault Tolerance," Reliability Engineering and System Safety, 32(1):25-49.

Anderson, Robert H., Phillip M. Feldman, Scott Gerwehr, Brian Houghton, Richard Mesic, John D. Pinder, and Jeff Rothenberg. 1998. A "Minimum Essential Information Infrastructure" for U.S. Defense Systems: Meaningful? Feasible? Useful? Santa Monica, CA: RAND National Defense Research Institute, in press.

DeSantis, A., Y. Desmedt, Y. Frankel, and M. Yung. 1994. "How to Share a Function Securely," pp. 522-533 in Proceedings of the 26th ACM Symposium on the Theory of Computing. New York: ACM Press.

Elmendorf, W.R. 1972. "Fault-Tolerant Programming," pp. 79-83 in Proceedings of the 2nd International Symposium on Fault-tolerant Computing (FTCS-2). Los Alamitos, CA: IEEE Computer Society Press.

Gray, James, and Andreas Reuter. 1997. Transaction Processing: Concepts and Techniques. San Mateo, CA: Morgan Kaufmann Publishers.

Huang, Yennun, Chandra Kintala, Nick Kolettis, and N. Dudley Fulton. 1995. "Software Rejuvenation: Analysis, Module, and Applications," pp. 381-390 in Proceedings of the 25th Symposium on Fault-tolerant Computing. Los Alamitos, CA: IEEE Computer Society Press.

Knight, J.C., and Nancy G. Leveson. 1986. "An Experimental Evaluation of the Assumption of Independence in Multi-version Programming," IEEE Transactions on Software Engineering, 12(1): 96-109.

Lunt, Teresa F. 1993. "A Survey of Intrusion Detection Techniques," Computers and Security, 12(4):405-418.

Menezes, Alfred J., Paul C. Van Oorschot, and Scott A. Vanstone. 1996. Handbook of Applied Cryptography. CRC Press Series on Discrete Mathematics and Its Applications. Boca Raton, FL: CRC Press, October.

Rabin, M.O. 1989. "Efficient Dispersal of Information for Security, Load Balancing, and Fault Tolerance," Journal of the ACM, 36(2):335-348. Available online at <http://www.ACM.org/pubs/citations/journals/jacm/1989-36-2/p335-rabin>.

Randell, B., and J. Dobson. 1986. "Reliability and Security Issues in Distributed Computing Systems," pp. 113-118 in Proceedings of the Fifth Symposium on Reliability in Distributed Software and Database Systems. Los Alamitos, CA: IEEE Computer Society Press.

Schneider, Fred B. 1990. "Implementing Fault-tolerant Services Using the State Machine Approach: A Tutorial," ACM Computing Surveys, 22(4):299-319.

Schneider, Marco. 1993. "Self-stabilization," ACM Computing Surveys, 25(1): 45-67.

U.S. Department of Defense (DOD). 1987. Trusted Network Interpretation of the Trusted Computer System Evaluation Criteria, NCSC-TG-005, Library Number S228,526, Version 1, the "Red Book." Ft. Meade, MD: National Computer Security Center.

Voges, Udo. 1988. Software Diversity in Computerized Control Systems. Vol. 2 in the series Dependable Computing and Fault Tolerance Systems. Vienna, Austria: Springer-Verlag.

 


Trust in Cyberspace
Committee on Information Systems Trustworthiness, National Research Council (1999) 352 pages   6 x 9

 

6

The Economic and Public Policy Context

    

Factors that cause networked information systems (NISs) to be less trustworthy than they might be—environmental disruption, human user and operator errors, attacks by hostile parties, and design and implementation errors—are examined in this report. In a number of instances, research and development efforts have yielded state-of-the-art technological solutions that could be deployed to enhance NIS trustworthiness. Why are such technological solutions not used more widely in practice?

Some experts posit that the benefits from increased trustworthiness are difficult to estimate or trade off, and consumers therefore direct their expenditures toward other investments that they perceive will have more definitive returns. Similarly, producers tend to be reluctant to invest in products, features, and services that further trustworthiness when their resources can be directed (e.g., toward increasing functionality) where the likelihood of profit appears greater. Thus, there seems to be a market failure for trustworthiness. Other factors, such as aspects of public policy, also tend to inhibit the use of existing solutions.

As this report makes clear, while the deployment of extant technologies can improve the trustworthiness of NISs, in many critical areas answers are not known. Research is needed. Most of the research activity related to trustworthiness involves federal government funding. (Although the private sector conducts "research," most of this effort is development that is directed toward specific products.) Inasmuch as the federal government is the major funder of basic and applied research in computing and communications, this chapter examines its interests and research emphases related to trustworthiness. Certain aspects of trustworthiness (e.g., security) are historically critical areas for federal agencies responsible for national security interests. The National Security Agency (NSA) and Defense Advanced Research Projects Agency (DARPA), both part of the Department of Defense (DOD), have particularly influential roles in shaping research priorities and funding for trustworthiness.

In this chapter, there is a greater emphasis on security than on other dimensions of trustworthiness, because the federal government has placed tremendous emphasis on computer and communications security consistent with the importance of this technology in supporting national security activities. As the broader concept of trustworthiness becomes increasingly important, especially in light of the recent concern for protection of critical infrastructures, increased attention to the nonsecurity dimensions of trustworthiness by the federal government may be warranted. This is not to say that attention to security is or will become unimportant—indeed, security vulnerabilities are expected to increase in both number and severity in the future. Additionally, the success of security in the marketplace is mixed at best, so a discussion of the reasons for this situation merits some attention here.

This chapter begins with a discussion of risk management, which provides the analytical framework to assess rationales for people's investment in trustworthiness or their failure to do so. The risk management discussion leads to an analysis of the costs that consumers encounter in their decisions regarding trustworthiness. These first two sections articulate reasons that there is a disincentive for consumers to invest in trustworthiness. Producers also face disincentives (but different ones) to invest in trustworthiness, as discussed in the third section. Then there is a discussion of standards and criteria and possible roles that they may play to address the market failure problem. The important role of cryptography is explicated in Chapters 2 and 4; here, the focus is on the question of why cryptography is not more widely used. The federal government's many interests in trustworthiness include facilitating the use of technology to improve trustworthiness today and fostering research to support advances in trustworthiness. This chapter concludes with a discussion of the federal agencies involved with conducting and/or sponsoring research in trustworthiness. Two agencies with central roles in this arena—the NSA and DARPA—are examined in some detail.

Risk Management

The motivation to invest in trustworthiness is to manage risks. While it is conceivable to envision positive benefits deriving from trustworthi