dc.description.abstract | Traditional approaches to hazard analysis and safety-related risk management are based on an accident model that focuses on failure events in static engineering designs and linear notions of causality. They are therefore limited in their ability to include complex human decision-making, software errors, system accidents (versus component failure accidents), and organizational risk factors in the analysis. These traditional accident models do not adequately capture the dynamic complexity and non-linear interactions that characterize accidents in complex systems, i.e., what Perrow called system accidents. System accidents often result from adaptation and degradation of safety over time: the system migrates to a high-risk state not through any particular decision to do so but through a series of decisions or adaptations (asynchronous evolution), until almost any slight error or deviation can lead to a major loss.
To handle this more comprehensive view of accidents, risk management tools and models need to treat systems as dynamic processes that are continually adapting to achieve their ends and to react to changes in themselves and their environment. Leveson’s new accident model, STAMP (Systems-Theoretic Accident Modeling and Processes), provides the foundation for such a risk management approach by describing the process leading up to an accident as an adaptive feedback function that fails to maintain safety constraints as performance changes over time to meet a complex set of goals and values.
This report describes a new type of hazard analysis, called STPA (STAMP-based Analysis), that is based on this new model of accident causation. STPA is illustrated by applying it to TCAS II, a complex aircraft collision avoidance system, and to a public water safety system in Canada. In the first example (TCAS II), STPA is used to analyze an existing system design. A formal and executable modeling/specification language called SpecTRM-RL is used to model and simulate the technical and human components in the system and to provide the support required for the STPA analysis. The results are compared with traditional hazard analysis techniques, including a high-quality TCAS II fault tree analysis created by MITRE for the FAA. The STPA analysis was found to be more comprehensive and complete than the fault tree analysis.
The second example of STPA (the public water system) illustrates its application to the organizational and social components of open systems as well as the technical components. In this example, STPA is used to drive the design process rather than to evaluate an existing design. Again, SpecTRM-RL models are used to support the analysis, but this time system dynamics models are added. SpecTRM-RL allows us to capture the system’s static structure (hardware, software, operational procedures, and management controls) and is useful in performing hazard analyses that examine complex socio-technical safety control structures. The addition of system dynamics models allows simulation and modeling of the system’s behavioral dynamics and the effects of changes over time.
STPA allowed us to examine the impact of organizational decision-making and technical design decisions on system risk and resiliency. The integration of STPA, SpecTRM-RL, and system dynamics creates the potential for a simulation and analysis environment to support and guide the initial technical and operational system design as well as organizational and management policy design. The results of STPA analysis can also be used to support organizational learning and performance monitoring throughout the system’s life cycle so that degradation of safety and increases in risk can be detected before a catastrophe results. | en_US |