HIP-HOPS

Name: HiP-HOPS
Author: University of Hull

May 8

HIP-HOPS: Automated Dependability Analysis and Evolutionary Design of Systems

- HiP-HOPS tool website

Challenge

Imagine that you are developing a steer-by-wire system for cars. Fig. 1 shows a high-level MATLAB/Simulink model of a steer-by-wire system designed by Volvo.

Fig. 1. Example of a complex system: top-level model of a steer-by-wire system for cars by Volvo

The model is already complex, with many components and interactions; most of the boxes are subsystems enclosing architectures of components. In the course of design, analysts would be asked to identify hazards related to this system, for example loss of steering or incorrect steering. One of the questions they would then ask is: which potential failures of components, and which combinations of them, could give rise to these hazards? And is the system adequately safe and reliable?

Now imagine that there are N components in this system, and you wish to examine the effect of combinations of two failures.

There are N*(N-1)/2 unique combinations.

If N=1000 then this is 1000*999/2 = 499500 unique combinations of two component failures to be examined. The task would be impossible to complete manually by a team of analysts. It is clear that some automation would benefit this type of analysis.
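The count above is the binomial coefficient C(N, 2). The short Python check below uses only the standard library's `math.comb`. It reproduces the figure for pairs and shows how much faster the count grows for triple failures. The function name `failure_combinations` is illustrative only.

```python
from math import comb


def failure_combinations(n_components: int, order: int) -> int:
    """Number of distinct sets of `order` simultaneous component failures."""
    return comb(n_components, order)


n = 1000
pairs = failure_combinations(n, 2)
assert pairs == n * (n - 1) // 2      # identical to N*(N-1)/2
print(pairs)                          # 499500
print(failure_combinations(n, 3))     # 166167000 triple-failure combinations
```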

At the University of Hull, we have spent over 25 years developing a method and tools that simplify dependability analysis and design optimisation of systems by partly automating the process. The method is known as Hierarchically Performed Hazard Origin and Propagation Studies (HiP-HOPS) and is supported by a commercial tool. The following subsections explain the principles that underpin two capabilities of HiP-HOPS: Dependability Analysis and Evolutionary Design Optimisation.

Dependability Analysis

The purpose of dependability analysis is to determine whether a proposed design meets its safety and reliability requirements. This analysis typically starts by identifying hazards associated with the system (for example “loss of steering” in the Volvo example).

The goal of the analysis is then to identify the causes of each hazard in the architecture of the system and to demonstrate, using probabilities, that the hazard is sufficiently unlikely.

In the current industrial practice this analysis is largely manual, done by analysts using a technique called Fault Tree Analysis. Fault trees are logical graphs which show how low-level failures, component faults and other conditions combine and propagate through a system to cause hazards at the outputs. The process starts with the identification of a particular hazard or undesirable event to be analysed, and then works backwards to identify the root causes step-by-step. Since each intermediate event can have multiple possible causes, whether singly or in combination, a tree of Boolean logic is formed from AND and OR gates (e.g. see Fig. 2).

Fig. 2. Modelling and Dependability Analysis in HiP-HOPS.
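The AND/OR logic of a fault tree can be sketched in a few lines of Python. This is a minimal illustration, not HiP-HOPS code. The sketch assumes that basic events are independent and that each event appears only once in the tree. Under those assumptions, an AND gate multiplies probabilities and an OR gate combines them as 1 - Π(1 - p). The event names and probabilities are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Basic:
    name: str
    prob: float                      # probability that this basic event occurs


@dataclass
class Gate:
    kind: str                        # "AND" or "OR"
    children: List["Node"]


Node = Union[Basic, Gate]


def probability(node: Node) -> float:
    """Top-event probability, assuming independent, non-repeated basic events."""
    if isinstance(node, Basic):
        return node.prob
    ps = [probability(child) for child in node.children]
    if node.kind == "AND":           # all inputs must occur
        result = 1.0
        for p in ps:
            result *= p
        return result
    if node.kind == "OR":            # at least one input occurs
        none_occur = 1.0
        for p in ps:
            none_occur *= 1.0 - p
        return 1.0 - none_occur
    raise ValueError(f"unknown gate {node.kind!r}")


# Hypothetical tree: loss of steering if the sensor fails,
# or if both the primary and the backup ECU fail.
loss_of_steering = Gate("OR", [
    Basic("steering angle sensor fails", 1e-4),
    Gate("AND", [Basic("primary ECU fails", 1e-3),
                 Basic("backup ECU fails", 1e-3)]),
])
print(f"{probability(loss_of_steering):.2e}")   # 1.01e-04
```

The gate-by-gate formula is only valid when no event appears more than once in the tree. Shared causes such as a common power supply break that assumption. This is one reason why fault tree tools usually compute probabilities from minimal cut sets instead.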

While software tool support exists and the probabilistic calculations can be automated, creating the fault tree itself is often still a manual process relying on the expertise and informal knowledge of the analysts involved. In HiP-HOPS, this process is largely automated. The analysis is performed on an architectural system model which identifies material, energy, and data transactions among components, as illustrated in Fig. 3.

Fig. 3. Modelling and Dependability Analysis in HiP-HOPS.

The model can be hierarchical if necessary to manage complexity. In the case of a hierarchical model, subsystems enclose architectures of more basic subsystems and components.

The dependability analysis process in HiP-HOPS then proceeds in three phases. The first phase is the annotation phase: each component in the model is annotated with its local error logic, describing the errors that can occur in the component and how it responds to deviations of its inputs. HiP-HOPS defines a language for the description of this error logic. In the basic version of this language, the error logic of a component can be specified as a list of internal failure modes of the component and a list of errors or deviations as they can be observed at component outputs.

Each component failure mode is optionally accompanied by quantitative data, for example a failure and a repair rate. Output errors carry Boolean expressions which describe their causes as a logical combination of component faults and similar errors observed at component inputs.

For example, one can specify:

“omission-component.output” is caused by “internalFailure” OR “omission-component.input”

Collectively, a set of failure expressions that logically explain all possible errors at all output ports of a component provides a model of the error logic of the component under examination. This model can be stored in a library. For simple components, e.g. sensors and actuators, such models could be re-used across different applications to simplify the manual part of the analysis and the overall application of the proposed technique.
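HiP-HOPS uses its own annotation language, which is not reproduced here. As a hypothetical illustration of the idea, a component's error logic can be stored as plain data in a reusable library entry and then evaluated. The entry below encodes the example expression above. The Python encoding and the failure rate value are assumptions made for this sketch.

```python
from typing import Set, Tuple, Union

# An expression is an event name, or ("AND" | "OR", [sub-expressions]).
Expr = Union[str, Tuple[str, list]]

# Hypothetical library entry for a simple component: internal failure modes
# with an assumed failure rate (per hour), and one Boolean cause per output deviation.
COMPONENT_MODEL = {
    "failure_modes": {"internalFailure": 1e-6},
    "output_deviations": {
        "omission-component.output": ("OR", ["internalFailure",
                                             "omission-component.input"]),
    },
}


def holds(expr: Expr, occurred: Set[str]) -> bool:
    """Evaluate an error expression given the set of events that occurred."""
    if isinstance(expr, str):
        return expr in occurred
    op, args = expr
    values = [holds(arg, occurred) for arg in args]
    if op == "AND":
        return all(values)
    if op == "OR":
        return any(values)
    raise ValueError(f"unknown operator {op!r}")


cause = COMPONENT_MODEL["output_deviations"]["omission-component.output"]
print(holds(cause, {"internalFailure"}))            # True
print(holds(cause, {"omission-component.input"}))   # True
print(holds(cause, set()))                          # False
```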

The second phase of the HiP-HOPS dependability analysis process is the synthesis phase. Using the error logic associated with components, algorithms automatically determine how errors propagate through connections in the model to cause functional failures at system outputs. These are the failures that analysts are typically interested in identifying and analysing; in a car, for example, they may include the loss of steering or braking. Because HiP-HOPS shows how individual component failure modes combine and propagate to functional failures at system outputs, an analyst can see, for instance, that a system failure such as loss of braking can be traced back to an actuator failure.

This global view is captured in a set of interconnected fault trees. These fault trees show how the leaf nodes of the trees (representing the component failure modes and their local effects) can logically combine and propagate through the system to cause the top events of the fault trees, which represent the functional failures of the system.
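The synthesis step can be pictured as a backward traversal, sketched below. The traversal starts from a deviation at a system output. Whenever the local error logic refers to an input deviation, it follows the connection to the upstream component and substitutes that component's error logic. The sensor → controller → actuator chain, the port names and the "in:" prefix are all assumptions made for this illustration. The sketch does not handle feedback loops, and it does not share subtrees between top events. A real tool must do both.

```python
from typing import Dict, Tuple, Union

Expr = Union[str, Tuple[str, list]]

# Hypothetical local error logic. Leaves prefixed "in:" are input deviations;
# all other leaves are internal failure modes of that component.
ERROR_LOGIC: Dict[str, Dict[str, Expr]] = {
    "sensor":     {"omission-out": "sensorFailure"},
    "controller": {"omission-out": ("OR", ["cpuFailure", "in:omission-in"])},
    "actuator":   {"omission-out": ("OR", ["motorFailure", "in:omission-in"])},
}

# (component, input deviation) -> (upstream component, output deviation)
CONNECTIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("controller", "omission-in"): ("sensor", "omission-out"),
    ("actuator", "omission-in"): ("controller", "omission-out"),
}


def synthesise(component: str, deviation: str) -> Expr:
    """Expand an output deviation into a fault tree over basic events."""
    return _expand(component, ERROR_LOGIC[component][deviation])


def _expand(component: str, expr: Expr) -> Expr:
    if isinstance(expr, str):
        if expr.startswith("in:"):
            upstream = CONNECTIONS.get((component, expr[3:]))
            if upstream is None:            # unconnected input: external event
                return f"{component}.{expr[3:]}"
            return synthesise(*upstream)    # follow the connection backwards
        return f"{component}.{expr}"        # qualify internal failure mode
    op, args = expr
    return (op, [_expand(component, arg) for arg in args])


print(synthesise("actuator", "omission-out"))
# ('OR', ['actuator.motorFailure', ('OR', ['controller.cpuFailure', 'sensor.sensorFailure'])])
```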

The interconnections between the trees represent dependencies in the model, e.g. the failure of a common power supply or a global condition that may affect more than one system function. Common cause failures, such as the flooding of physically co-located components, can also be represented in HiP-HOPS.

Once the fault trees have been synthesised, HiP-HOPS analyses this global system error model in the third phase, the analysis phase.

HiP-HOPS can perform both qualitative and quantitative analysis of fault trees. Qualitative analysis is used to establish the minimal cut sets of the fault trees — the smallest combinations of failure events necessary to cause system failure — which more readily indicate how system failures may occur. Quantitative analysis is also possible when probabilistic parameters have been provided at component level and is used to predict the reliability and availability of the system.
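Both kinds of analysis can be illustrated with a naive sketch, which is not the algorithm HiP-HOPS uses. The tree is first expanded into cut sets: an OR gate takes the union of its children's cut sets, and an AND gate merges one cut set from each child. Absorption then removes any cut set that contains a smaller one. The top-event probability is approximated by the rare-event approximation, which sums the probabilities of the minimal cut sets. This assumes independent basic events and gives an upper bound that is accurate when the probabilities are small. The two-channel system with a shared power supply, and its probabilities, are hypothetical.

```python
from itertools import product
from math import prod
from typing import Dict, FrozenSet, Set, Tuple, Union

Expr = Union[str, Tuple[str, list]]
CutSets = Set[FrozenSet[str]]


def cut_sets(expr: Expr) -> CutSets:
    """Expand an AND/OR tree into its cut sets (sum-of-products form)."""
    if isinstance(expr, str):
        return {frozenset([expr])}
    op, args = expr
    children = [cut_sets(arg) for arg in args]
    if op == "OR":
        return set().union(*children)
    if op == "AND":                 # pick one cut set from each child, merge them
        return {frozenset().union(*combo) for combo in product(*children)}
    raise ValueError(f"unknown operator {op!r}")


def minimise(sets: CutSets) -> CutSets:
    """Absorption: drop any cut set that strictly contains another one."""
    return {s for s in sets if not any(other < s for other in sets)}


def rare_event_probability(mcs: CutSets, p: Dict[str, float]) -> float:
    """Sum of minimal cut set probabilities (independent basic events)."""
    return sum(prod(p[event] for event in cs) for cs in mcs)


# Two redundant channels that share one power supply.
top = ("AND", [("OR", ["chanA", "power"]), ("OR", ["chanB", "power"])])
p = {"chanA": 1e-3, "chanB": 1e-3, "power": 1e-5}
mcs = minimise(cut_sets(top))
print(sorted(sorted(cs) for cs in mcs))          # [['chanA', 'chanB'], ['power']]
print(f"{rare_event_probability(mcs, p):.2e}")   # 1.10e-05
```

The minimal cut sets show immediately that the shared power supply is a single point of failure. Applying the gate-by-gate formula directly to this tree would treat the two occurrences of "power" as independent. That would give roughly 1.02e-6, an order of magnitude too low.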

In the final stage of the analysis, the complex body of logic encoded in the set of interconnected fault trees is simplified by an automated algorithm which translates it into a simple table of direct relationships between component and system failures (Fig. 4).

Fig. 4. FMEA Generation in HiP-HOPS

In a similar way to a classical Failure Modes and Effects Analysis (FMEA), this table determines, for each component in the system and for each failure mode of that component, the effect of that failure mode on the system. The table shows which system failures (if any) each failure mode causes, both by itself and in conjunction with other events.

Note that in a classical manual FMEA only the effects of single failures are typically assessed. Thus, one advantage of generating an FMEA from fault trees is that fault trees record the effects of combinations of component failures and this useful information can also be transferred into the FMEA. The FMEA shows all the functional effects to which a particular component failure mode contributes, both individually and as part of a combination.
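Given the minimal cut sets of each top event, a table of this kind can be derived mechanically. A failure mode that forms a cut set on its own has a direct effect. A failure mode that belongs to a larger cut set contributes to the same effect in combination with the other members. The sketch below assumes a simple dictionary layout, and its cut sets are hypothetical example data rather than HiP-HOPS output.

```python
from collections import defaultdict

# Hypothetical minimal cut sets for each system-level failure (top event).
MCS = {
    "loss of steering": [{"power"}, {"chanA", "chanB"}],
    "incorrect steering": [{"chanA", "monitorMiss"}],
}


def generate_fmea(mcs_by_top):
    """Map each failure mode to its direct and combined system effects."""
    table = defaultdict(lambda: {"direct": set(), "combined": set()})
    for top, sets in mcs_by_top.items():
        for cs in sets:
            for event in cs:
                if len(cs) == 1:                      # single point of failure
                    table[event]["direct"].add(top)
                else:                                 # needs the other events too
                    others = tuple(sorted(cs - {event}))
                    table[event]["combined"].add((top, others))
    return dict(table)


for mode, effects in sorted(generate_fmea(MCS).items()):
    print(mode, "| direct:", sorted(effects["direct"]),
          "| with others:", sorted(effects["combined"]))
```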

This is particularly useful because a failure mode that contributes to multiple system failures is potentially more significant than one that causes only a single top event. This type of FMEA can also help analysts to determine the level of fault tolerance in the system, i.e., whether the system can tolerate any single failure or any combination of two, three or more component failures.
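Both observations follow directly from the minimal cut sets. In a coherent (AND/OR-only) fault tree, the system tolerates any combination of k failures exactly when every minimal cut set contains more than k events. The significance of a failure mode can be approximated crudely by the number of distinct top events it contributes to. The sketch and its data are hypothetical.

```python
MCS = {
    "loss of steering": [{"power"}, {"chanA", "chanB"}],
    "incorrect steering": [{"chanA", "monitorMiss"}],
}


def tolerated_failures(mcs_by_top):
    """Largest k such that no combination of k failures causes any top event."""
    return min(len(cs) for sets in mcs_by_top.values() for cs in sets) - 1


def contribution_counts(mcs_by_top):
    """Number of distinct top events each failure mode contributes to."""
    counts = {}
    for sets in mcs_by_top.values():
        for event in set().union(*sets):
            counts[event] = counts.get(event, 0) + 1
    return counts


print(tolerated_failures(MCS))   # 0: "power" alone causes loss of steering
print(max(contribution_counts(MCS).items(), key=lambda kv: kv[1]))   # ('chanA', 2)

# Replacing the single supply with two redundant ones removes the singleton.
redundant = {**MCS, "loss of steering": [{"powerA", "powerB"}, {"chanA", "chanB"}]}
print(tolerated_failures(redundant))   # 1: any single failure is tolerated
```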

Evolutionary Design Optimisation

Let us assume now that a team of analysts is designing a system, and that a tool such as HiP-HOPS suggests that the system does not meet its dependability requirements. At this stage we need to improve the design so that it does meet the requirements. There is typically a range of options available to improve a design, including:

replacing a component with a more reliable and expensive component
replacing part of the architecture with a more dependable alternative
replicating components in fault tolerant schemes so that failures are tolerated
increasing the frequency of maintenance, an action that prolongs the useful life of components and thereby increases the reliability of the system.

The difficulty is that a typical system design offers a very large number of possibilities for substitution, replication and maintenance scheduling. For instance, in a system of N components with two possible suppliers for each component, there are 2^N configurations, which is about 1.27×10³⁰ configurations when N=100. Each configuration has its own dependability and cost performance. In such situations analysts are confronted with a multi-objective optimisation problem, where the objectives may include dependability, cost, weight and other properties.
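The size of such a design space is the product of the number of options at each independent choice. With two options for each of 100 components, the product is 2^100. A quick check in Python, where the helper name is illustrative:

```python
from math import prod


def design_space_size(options_per_component):
    """Number of configurations when each component's choice is independent."""
    return prod(options_per_component)


size = design_space_size([2] * 100)
assert size == 2 ** 100
print(f"{size:.3e}")                        # 1.268e+30
print(design_space_size([2, 3, 4]))         # 24: mixed numbers of options
```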

It would be prohibitively expensive to investigate more than a handful of these possible configurations manually. Therefore, to optimise such designs, we have developed an extension of HiP-HOPS that employs genetic algorithms to perform multi-objective optimisation of architectures with respect to dependability and other attributes.

As with dependability analysis in HiP-HOPS, the process starts from a model of the system (see Fig.3). However, this time the model is not fixed — it has variability, i.e., components can have multiple alternative implementations. These points of variability may involve different parameters of components or may involve architectural changes, e.g. replacing a single component with a more fault-tolerant design using primary and backup components. For example, a sensor can be chosen from two different suppliers, with each choice having its own cost, weight, performance, and failure characteristics. Subsystems can also carry alternatives, e.g. a subsystem can have two different implementations that provide the functions using different sets of components and different architectures. There can be options for replication of components with known patterns of fault tolerance, e.g. a primary-standby configuration, or multiple parallel channels with majority voting. Finally, there can be options for the scheduling of component maintenance.
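A common way to encode such a variable model for a genetic algorithm is to store one gene per variation point. Each gene holds the index of the selected option, and decoding the genome fixes every variation point to a concrete choice. The variation points and options below are hypothetical. They mirror the kinds of variability described above: supplier choice, replication scheme and maintenance interval. This is not the HiP-HOPS model format.

```python
import random

# Hypothetical variation points, each with its list of alternative options.
VARIATION_POINTS = {
    "wheel_sensor": ["supplier_A", "supplier_B"],
    "controller": ["single", "primary_standby", "triple_modular_voting"],
    "maintenance_interval_h": [500, 1000, 2000],
}


def random_genome(rng: random.Random) -> dict:
    """A candidate design encoded as one option index per variation point."""
    return {vp: rng.randrange(len(opts)) for vp, opts in VARIATION_POINTS.items()}


def resolve(genome: dict) -> dict:
    """Fix every variation point to its selected option (a concrete design)."""
    return {vp: VARIATION_POINTS[vp][i] for vp, i in genome.items()}


rng = random.Random(42)
print(resolve(random_genome(rng)))
```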

Fig. 5. Evolutionary Optimisation of Design in HiP-HOPS

Once the system model has been annotated with these variable possibilities and any further required information, such as cost and failure data, the model is given to HiP-HOPS, which then applies an evolutionary optimisation process. In this process, HiP-HOPS creates a population of candidate designs by resolving the variability of the model, i.e., fixing the variation points in the model by selecting particular design options. Each candidate design is then evaluated against the objectives of the optimisation using the analysis algorithms of HiP-HOPS; the dependability of a candidate design is calculated automatically from the generated fault trees. External plugins can also be developed to enable more precise evaluation of cost, weight or other objective functions.
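To make the evaluation step concrete, here is a deliberately simplified objective function for one resolved candidate. It is a sketch under stated assumptions, not how HiP-HOPS computes its objectives. The system is assumed to fail if any subsystem fails (a series OR gate). A replicated subsystem is assumed to fail only if all of its channels fail (an AND gate). All failures are treated as independent, and the option data are hypothetical.

```python
from math import prod

# Hypothetical option data: (unit cost, per-channel failure probability, channels).
OPTIONS = {
    "sensor": {"supplier_A": (40.0, 1e-4, 1), "supplier_B": (55.0, 2e-5, 1)},
    "controller": {"single": (100.0, 1e-4, 1), "duplex": (100.0, 1e-4, 2)},
}


def evaluate(design: dict) -> tuple:
    """Return (cost, system failure probability) for a resolved design."""
    cost = 0.0
    subsystem_failures = []
    for subsystem, choice in design.items():
        unit_cost, p, channels = OPTIONS[subsystem][choice]
        cost += unit_cost * channels                # every channel is paid for
        subsystem_failures.append(p ** channels)    # all channels must fail
    system_failure = 1.0 - prod(1.0 - q for q in subsystem_failures)
    return cost, system_failure


cost, q = evaluate({"sensor": "supplier_A", "controller": "duplex"})
print(cost, f"{q:.3e}")
```

Objectives like these are exactly what the optimiser trades off. Duplicating the controller doubles its cost but makes the cheaper sensor the dominant contributor to failure.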

Once candidate designs have been evaluated, they are ranked according to their performance and a Pareto frontier is formed showing the best designs in the current population. Roulette wheel selection, a random process biased towards the better performing designs, is used to select candidates to form the parents of the next generation. Through application of classic genetic operators such as mutation and crossover, a new population is then formed and the process of evaluation and ranking is iterated. The result of this process over a number of successive generations is a gradual improvement of the average performance of the population that is evident in the progressive improvement of the Pareto frontier. The process is terminated on meeting certain constraints or after a specified number of generations. The result is a set of models that give optimal or near optimal trade-offs among the objectives of the optimisation.
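The loop just described can be sketched as a small multi-objective genetic algorithm that minimises cost and unreliability together. This is a minimal illustration rather than the HiP-HOPS implementation. Several choices are assumptions: the toy problem, the fitness of 1/(1 + number of dominating designs) used to weight the roulette wheel, uniform crossover, the mutation rate, and the fixed number of generations used for termination. The toy space has only 36 designs, so brute force would also work here. The point is the shape of the loop.

```python
import random
from math import prod

# Hypothetical problem: per subsystem, alternative (cost, failure probability).
OPTIONS = [
    [(10, 1e-3), (25, 1e-4), (60, 1e-6)],
    [(5, 5e-3), (20, 5e-5)],
    [(15, 2e-3), (30, 2e-4), (70, 4e-6)],
    [(8, 1e-3), (18, 1e-5)],
]


def evaluate(genome):
    cost = sum(OPTIONS[i][g][0] for i, g in enumerate(genome))
    unreliability = 1 - prod(1 - OPTIONS[i][g][1] for i, g in enumerate(genome))
    return cost, unreliability


def dominates(a, b):
    """a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(genomes, scores):
    return [g for g, s in zip(genomes, scores)
            if not any(dominates(o, s) for o in scores)]


def next_generation(pop, scores, rng, mutation_rate=0.1):
    # Designs dominated by fewer others get a larger slice of the roulette wheel.
    weights = [1.0 / (1 + sum(dominates(o, s) for o in scores)) for s in scores]
    children = []
    while len(children) < len(pop):
        mum, dad = rng.choices(pop, weights=weights, k=2)    # roulette wheel
        child = [rng.choice(pair) for pair in zip(mum, dad)]  # uniform crossover
        for i in range(len(child)):                           # mutation
            if rng.random() < mutation_rate:
                child[i] = rng.randrange(len(OPTIONS[i]))
        children.append(child)
    return children


def optimise(pop_size=30, generations=40, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randrange(len(opts)) for opts in OPTIONS] for _ in range(pop_size)]
    archive = {}                     # non-dominated designs found so far
    for _ in range(generations):
        scores = [evaluate(g) for g in pop]
        for g, s in zip(pop, scores):
            archive[tuple(g)] = s
        front = pareto_front(list(archive), list(archive.values()))
        archive = {g: archive[g] for g in front}
        pop = next_generation(pop, scores, rng)
    return archive


for genome, (cost, q) in sorted(optimise().items(), key=lambda kv: kv[1]):
    print(genome, cost, f"{q:.2e}")
```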

Via this process, designers can make informed decisions about the selection of components and subsystems, the location and type of replication, and maintenance scheduling, while ensuring that dependability requirements are met and costs are minimised.

As an example of this architectural optimisation process, HiP-HOPS was applied to a high-level abstract design of a vehicle pre-collision system (see Toyota paper), and an evolutionary optimisation technique was used to find balanced solutions with respect to dependability and cost.

The pre-collision system is an automotive safety technology that avoids or reduces the damage caused by a collision. It supports drivers by issuing warnings when a potential collision threat is identified, and it activates emergency braking if the driver fails to apply the brakes. To improve the fault tolerance of the system, a number of fault tolerance mechanisms were considered. These mechanisms may be applied at various locations in the system architecture to achieve greater dependability, albeit at an increased cost.

The mechanisms include self-protection, self-checking, checkpoint-restart and process-pair. Self-protection and self-checking are error-detection functions. In self-protection, a component protects itself from external disturbances by detecting errors propagated from other components. In self-checking, a component detects its own internal errors and prevents them from propagating to other components. Checkpoint-restart not only detects failures but also recovers from errors by restarting the component. Finally, process-pair is a fault tolerance technique that uses redundancy realised by two identical software components. These are typical mechanisms for the detection and correction of errors, and they give a rich range of options to consider in early design.

To model situations where these fault-tolerant components fail to detect failures that they should detect, an additional event, "miss", was included in the analysis. Fault tolerance mechanisms may also fail themselves, so the event "failure" represents an internal malfunction of a fault-tolerant component. Failure expressions and failure rates were included for each component in the system, based on reasonable assumptions about plausible hardware and software failures. Finally, the HiP-HOPS optimisation algorithm was used to select the optimal locations and types of fault tolerance mechanisms for an improved version of the system.
From a total design space of about 12⁷ ≈ 3.6×10⁷ configurations, it found 8 Pareto-optimal solutions in just 5 minutes, providing a good trade-off between risk and cost while meeting the required constraints.
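The role of the "miss" and "failure" events can be illustrated with one plausible error expression for a component wrapped by a detection mechanism such as self-checking. In this expression, an undetected erroneous output occurs if the component errs and the mechanism misses it, or if the mechanism itself fails. This structure, the independence assumption and all numbers are hypothetical choices for this sketch. They are not the expressions or data used in the Toyota study.

```python
def protected_output_error(p_error: float, p_miss: float, p_mech_failure: float) -> float:
    """P(undetected erroneous output) for the assumed expression
    (error AND miss) OR mechanism_failure, with independent events."""
    p_undetected = p_error * p_miss
    return 1.0 - (1.0 - p_undetected) * (1.0 - p_mech_failure)


unprotected = 1e-4                                    # hypothetical raw error probability
protected = protected_output_error(1e-4, 0.05, 1e-7)  # 5 % miss rate, rare mechanism fault
print(f"{unprotected:.1e} -> {protected:.2e}")
```

In an optimisation run, each such mechanism would also add cost. This is why the optimiser weighs where a mechanism pays for itself rather than applying it everywhere.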

The case study showed that insight into the optimal use of fault tolerance can be arrived at much more rapidly with the aid of automated tool support. The vast number of different options, let alone the time required to evaluate and compare these options, would make an equivalent manual process infeasible. Thus metaheuristic approaches allow a designer to obtain significant improvements in reliability and cost performance.

Advanced Concepts

Extensions to HiP-HOPS over the years include:

Pandora, an algebraic framework for analysis of temporal fault trees and prediction of dependability in dynamic systems. Pandora can analyse state-sensitive fault trees describing sequencing of faults and created from architectural models and state machines.

Evolutionary algorithms for the automatic allocation of safety requirements as Safety Integrity Levels or Development Assurance Levels; these automate requirement allocation as prescribed by modern safety standards such as the automotive standard ISO 26262 and the ARP aerospace safety standards.

New Fuzzy and Bayesian concepts for safety analysis under uncertainty that are integrated into the HiP-HOPS method.

Andromeda: A method and tool for model-based synthesis of safety arguments and safety cases

Nature-inspired algorithms capturing the social intelligence of penguins. This work has been applied in automotive design and has received attention from the BBC (article and interview) and other global media (Automotive IQ, EE Journal, Daily Mail).

Contribution to EAST-ADL (MAENAD EU project) and AADL, two emerging languages with dependability analysis and optimisation capabilities for the design of automotive and avionics systems, respectively.

Resources on HiP-HOPS:

Fault Tree Analysis fundamentals
Engineering Failure Analysis and Design Optimisation with HiP-HOPS
Application of HiP-HOPS by Toyota on Software Fault Tolerance
Safety Requirement Allocation according to automotive safety standard ISO26262 with HiP-HOPS
Andromeda: Model-based synthesis of safety arguments and safety cases
HiP-HOPS tool website
