Safety of Autonomy and SoS using Executable Digital Dependability Identities (EDDIs)
EDDIs will be useful to you if you develop an autonomous system, such as a driverless car or a mobile robot, or a complex system of systems, such as a multi-robot factory system or a swarm of drones performing a collaborative task, e.g. inspection and maintenance.
The autonomy and open/cooperative nature of these systems pose new challenges for dependability. We explore how these can be addressed with new dependability monitoring technologies.
Autonomous vehicles, unmanned aerial vehicles, distributed and cloud-controlled robotics, telehealth systems, smart energy grids, and the internet of things are often cooperative Systems of Systems (SoS), and making them dependable is challenging. They pose a number of difficulties for dependability assurance.
Challenges
The first difficulty is caused by the distribution and the often heterarchical organisation of systems. A heterarchy is a system or organisation where the elements are unranked and non-hierarchical or can be ranked in different ways. SoS are inherently distributed, loosely connected, and non-hierarchical. Individual systems within a SoS are produced by different stakeholders and there is no overarching specification or authority that can guarantee their dependability when they meet in various configurations. None of the systems typically has total control and authority over others. This means that the dependability of the overall system cannot be interpreted as a set of goals that are related to the behaviour of one system and to which other systems contribute. The latter is possible in more conventional systems organised as hierarchies of subsystems.
It is possible, for example, to express the safety requirements of a car as a set of integrity requirements that must be achieved by its components, as dictated by the safety standard ISO 26262. However, there is no single reference starting point from which one could express the safety requirements for the totality of a transport system composed of connected autonomous cars and smart infrastructure. A car comprises a hierarchy of components, while the connected transport system is a heterarchy of systems in which no system has priority or absolute control where safety is concerned. This heterarchical organisation poses a major challenge for the state of the art in dependability. The challenge applies both to new standards and to cutting-edge research, e.g. on model-based safety analysis, model checking or other formal methods. Indeed, both standards and current research mostly assume a hierarchical organisation of the system, decomposition of systems into subsystems, and clear hierarchical authority of control.
A second challenge is caused by the inevitable incompleteness of any dependability model constructed a priori, at design time, for a SoS. A traffic system of connected and autonomous cars and smart infrastructure does not have a finite set of configurations. Given the unpredictable nature of SoS and the infinity of configurations, any a priori dependability models are likely to be incomplete. Indeed, all state-of-the-art dependability analysis and assurance techniques assume a bounded system, which means that full a priori certification before operation using these techniques is impossible when the SoS is unbounded and its configurations cannot be enumerated. These systems operate in highly dynamic and unpredictable environments, where systems collaborate with other systems and adapt their behaviour in response to changes in the context of operation, workload, physical infrastructure, and network topology.
Finally, there is increased uncertainty in SoS. It can arise from many sources: a) limited observability of the system and its environment, caused by a lack of sensors or by sensor failure; b) unreliability of measurements; c) inaccuracy, indeterminism or the probabilistic nature of the inferences drawn by AI components, e.g. machine learning algorithms; d) limited knowledge concerning the services and dependability-relevant properties of collaboration partners in a cooperative or open system; e) limited knowledge regarding the trustworthiness and quality of third-party information.
Executable Digital Dependability Identities
To address the challenges identified above, we take a new approach to dynamic dependability assurance of SoS. The approach uses a network of intelligent dependability monitoring agents, called Executable Digital Dependability Identities (EDDIs), to deliver dynamic dependability assurance within a SoS.
An agent for a system carries information in the form of dependability models (such as fault trees, Conditional Safety Certificates and Bayesian Nets). It combines these models with information shared by the agents of other cooperating systems and with information from the environment to provide dependability management at runtime, e.g. event monitoring, detection and diagnosis of faults, certification of safe operations, risk prediction, and adaptation when things go wrong in unpredictable circumstances. The approach aims to facilitate the self-certified dependable operation of SoS even if the system design evolves during operation.
An EDDI is both a dependability monitor of a physical system, observing and enforcing its dependability, and an agent in a distributed multi-agent system managing dependability in a SoS. The architecture of an EDDI agent is shown in Fig.1.
An EDDI interfaces with its respective system, with operators and with the SoS. Through System Inputs, it receives information about the state of the system and its parameters. Through System Outputs, it communicates corrective measures to its system. Through SoS Inputs, it receives dependability guarantees and state information from other EDDIs in the SoS. Through SoS Outputs, it communicates dependability demands and its own state information to other EDDIs in the SoS. An EDDI also has an interface to operators.
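The four data channels described above can be pictured as a small data structure. The following Python sketch is purely illustrative: the class name, field names, method names and message shapes are hypothetical and are not taken from any EDDI implementation.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class EddiInterfaces:
    """Hypothetical container for the data channels of one EDDI."""
    # System Inputs: latest state and parameters of the monitored system.
    system_inputs: dict[str, Any] = field(default_factory=dict)
    # System Outputs: corrective measures queued for the monitored system.
    system_outputs: list[str] = field(default_factory=list)
    # SoS Inputs: guarantees and state received from peer EDDIs, keyed by peer id.
    sos_inputs: dict[str, dict[str, Any]] = field(default_factory=dict)
    # SoS Outputs: demands and own state queued for peer EDDIs.
    sos_outputs: list[dict[str, Any]] = field(default_factory=list)

    def receive_from_peer(self, peer_id: str, guarantees: list[str],
                          state: dict[str, Any]) -> None:
        """Store the latest guarantees and state reported by one peer."""
        self.sos_inputs[peer_id] = {"guarantees": guarantees, "state": state}

    def publish_to_peers(self, demands: list[str]) -> None:
        """Queue our demands together with a snapshot of our own state."""
        self.sos_outputs.append(
            {"demands": demands, "state": dict(self.system_inputs)}
        )


# Illustrative use with made-up names and values.
eddi = EddiInterfaces()
eddi.system_inputs["speed_mps"] = 12.5
eddi.receive_from_peer("car_B", ["keeps_distance"], {"speed_mps": 11.0})
eddi.publish_to_peers(["keep_distance"])
eddi.system_outputs.append("reduce_speed")
```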
Low-level detection and processing of events is done by an Event Monitor and a Diagnostic Engine. A set of High-level Reasoning Apps further processes detected and diagnosed events and performs further dependability management operations. A dynamic Model Validation and Repair component checks the models used by the High-level Reasoning Apps at runtime for correctness and completeness. We sketch the technology of the components:

Fig.1. EDDI architecture
Event Monitor: This mechanism evaluates the occurrence of events, i.e. a dynamic list of dependability-relevant events referenced by the High-level Reasoning Apps at each point in time, using real-time sensory data stored in the time series store. The time series store is a repository holding data produced by the component or system, e.g. sensor readings or maintenance events. One implementation is a list of circular buffers that hold a shifting time window of current and recent parameter values, so that historical values of monitored parameters can be stored and accessed by the monitor. The expressions confirming the occurrence of events can be complex logical expressions combining constraints monitored over time, and can contain differentiation and integration operators to enable reasoning about past trends. A three-valued logic, in which an ‘unknown’ value is added to ‘true’ and ‘false’, is also employed so that expressions can be evaluated in the context of incomplete information. In certain cases this logic can mask unknown truth values (for example, a disjunction with one true operand is true whatever the value of the other) and, ultimately, compute the known or unknown truth value of compound expressions from the known or unknown truth values of their constituents. In practice, this allows the monitor to produce early alarms in the presence of (detected) sensor failures and incomplete process data without violating the logic specified in monitoring expressions. The event monitor can also filter out spurious measurements that could trigger false alarms by using expressions that check the consistency of deviations over a period of time.
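As a rough illustration of these ideas, the Python sketch below combines a circular-buffer store, a persistence check that filters single spikes, and three-valued AND/OR operators. It assumes Kleene-style three-valued logic with None standing for ‘unknown’; the class, function and parameter names are hypothetical and do not describe the actual EDDI monitor.

```python
from collections import deque

UNKNOWN = None  # third truth value, alongside True and False


def and3(a, b):
    """Three-valued AND: a False operand masks an unknown one."""
    if a is False or b is False:
        return False
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return True


def or3(a, b):
    """Three-valued OR: a True operand masks an unknown one."""
    if a is True or b is True:
        return True
    if a is UNKNOWN or b is UNKNOWN:
        return UNKNOWN
    return False


class TimeSeriesStore:
    """One circular buffer per parameter, holding a shifting window of readings."""

    def __init__(self, window):
        self.window = window
        self.buffers = {}

    def record(self, name, value):
        # A value of None marks a reading from a sensor known to have failed.
        self.buffers.setdefault(name, deque(maxlen=self.window)).append(value)

    def persistently_above(self, name, limit, samples):
        """Three-valued check that the last `samples` readings all exceed `limit`."""
        recent = list(self.buffers.get(name, ()))[-samples:]
        if any(v is not None and v <= limit for v in recent):
            return False      # one known in-range reading refutes the event
        if len(recent) < samples or any(v is None for v in recent):
            return UNKNOWN    # not enough trustworthy data to decide
        return True


store = TimeSeriesStore(window=5)
for reading in [81, 95, 96, 97]:
    store.record("temperature", reading)
store.record("pressure", None)            # pressure sensor has failed
for reading in [1.0, 1.1, 9.0]:           # a single spike in vibration
    store.record("vibration", reading)

overheating = store.persistently_above("temperature", 90, samples=3)
overpressure = store.persistently_above("pressure", 10, samples=1)
shaking = store.persistently_above("vibration", 5, samples=3)
alarm = or3(overheating, overpressure)
```

Here the alarm can still be raised although the pressure reading is unknown, because the true operand masks the unknown one, while the isolated vibration spike does not confirm an event.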
Components of the system that incorporate machine learning introduce a probabilistic type of uncertainty, and they are treated differently from events whose state can be confirmed with certainty to be true, false or unknown. AI components generate events which have a probability of being true or false; for instance, a camera that performs pattern recognition will not always recognise an object correctly.
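To make the probabilistic nature of such events concrete, the sketch below applies Bayes' rule to a detector that sometimes misses objects and sometimes raises false detections. It is a generic worked example rather than the method EDDIs use, and the function name and all numbers are hypothetical.

```python
def posterior_given_detection(prior, true_positive_rate, false_positive_rate):
    """Bayes' rule: probability that the object is present, given a detection."""
    evidence = (true_positive_rate * prior
                + false_positive_rate * (1.0 - prior))
    return true_positive_rate * prior / evidence


# Made-up figures for a camera-based detector: the object is present in 10%
# of frames, is detected 90% of the time when present, and is falsely
# reported in 10% of frames where it is absent.
p = posterior_given_detection(
    prior=0.1, true_positive_rate=0.9, false_positive_rate=0.1
)
print(f"P(object present | detection) = {p:.2f}")
```

With these figures a detection raises the probability that the object is present from 0.1 to 0.5, which is why such an event cannot simply be treated as true.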
Diagnostic Engine: One potential issue in low-level event monitoring is that detected conditions may reflect the symptoms of failures and not their underlying causes. Some of those symptoms are expected to require further diagnosis before the High-level Reasoning Apps can draw conclusions about the health of the system, certify operations or take corrective measures. The second low-level mechanism of EDDI, the Diagnostic Engine, addresses this by trying to localise the root causes of detected anomalous symptoms. Diagnosis can be achieved by traversing the branches of fault trees in which the initial symptoms appear as top events and potential causes appear as leaf nodes.
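The traversal idea can be sketched as a recursive walk from the top event down to the leaves, pruning branches that the monitor has already ruled out. This is a minimal sketch under simplifying assumptions (only AND and OR gates, observations given as a dictionary); the tree, event names and function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    gate: str = "LEAF"                       # "AND", "OR" or "LEAF"
    children: list = field(default_factory=list)


def diagnose(node, observed):
    """Return the leaf causes that can still explain `node`.

    `observed` maps event names to True/False; events that are missing
    have not been checked yet and are treated as still possible.
    """
    if observed.get(node.name) is False:
        return []                            # the monitor has ruled this event out
    if node.gate == "LEAF":
        return [node.name]
    results = [diagnose(child, observed) for child in node.children]
    if node.gate == "AND" and any(not r for r in results):
        return []                            # an AND gate needs every input to be possible
    return [leaf for r in results for leaf in r]


# Hypothetical fault tree for a braking symptom.
top = Node("no_braking_response", "OR", [
    Node("brake_actuator_failure"),
    Node("loss_of_brake_command", "AND", [
        Node("primary_bus_failure"),
        Node("backup_bus_failure"),
    ]),
])

observed = {"no_braking_response": True, "backup_bus_failure": False}
causes = diagnose(top, observed)
```

Because the backup bus is known to be healthy, the AND branch is pruned and only the actuator failure remains as a candidate root cause.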
High-level Reasoning Apps: Following the primary detection and diagnosis of events, dependability-relevant events are handled by a set of model-based High-level Apps. Each App examines the impact of those events on the model upon which it operates. Apps use protocols to communicate with other EDDIs in the SoS. In light of information about the system and other relevant systems, and by executing their own models, Apps deliver certification and further dependability management functions. Apps operate on models such as Conditional Safety Certificates (ConSerts) and Bayesian Nets, which show relationships between the causes and effects of failure. All these models define causal relationships between events and can be encoded in the form of directed graphs. Models contain events augmented with monitoring expressions that can be evaluated by the Event Monitor at runtime to verify the occurrence of these events. Different models can be used by the Apps to deliver different functionalities:
- ConSerts define a success logic and are used to check whether the safety goals that they specify can be met over a configuration of systems involved in the delivery of these goals in the SoS. A ConSert App examines whether the safety goals specified by the ConSert can be met through verified satisfaction of the demands communicated to other relevant systems. A car, for example, may require guarantees that other cars will keep their distance in order to satisfy its own safety goals. Based on that evaluation, the App can certify, or decline to certify, an operation that assumes satisfaction of these goals as a precondition.
- Bayesian networks are used for dynamic probabilistic calculation of evolving risks for the system and the SoS. As evidence of events arrives, it is used to recalculate probabilities and risk, and the updated estimates form the basis of further decisions to ensure safe operation.
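The ConSert evaluation described above can be sketched as a check that every demand attached to a guarantee is covered by guarantees received from other systems, and that the required local runtime evidence holds. The data structure, the guarantee names and the function below are hypothetical simplifications, not the ConSert metamodel itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guarantee:
    name: str
    demands: frozenset       # guarantees required from other systems
    evidence: frozenset      # runtime conditions checked locally


def certified_guarantees(consert, peer_guarantees, local_evidence):
    """Return the names of guarantees whose demands and evidence all hold."""
    return [
        g.name for g in consert
        if g.demands <= peer_guarantees and g.evidence <= local_evidence
    ]


# Hypothetical ConSert of a car that wants to follow another car closely.
car_consert = [
    Guarantee(
        "close_following_safe",
        demands=frozenset({"lead_car_keeps_distance",
                           "lead_car_broadcasts_braking"}),
        evidence=frozenset({"radar_ok"}),
    ),
    Guarantee(
        "normal_following_safe",
        demands=frozenset(),
        evidence=frozenset({"radar_ok"}),
    ),
]

partial = certified_guarantees(
    car_consert,
    peer_guarantees={"lead_car_keeps_distance"},
    local_evidence={"radar_ok"},
)
full = certified_guarantees(
    car_consert,
    peer_guarantees={"lead_car_keeps_distance", "lead_car_broadcasts_braking"},
    local_evidence={"radar_ok"},
)
```

In the first call only the less demanding guarantee can be certified, because one demand on the lead car is not met; in the second call both can.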
Model Validation and Repair: EDDI is a largely model-based system, and the models upon which EDDI relies may in practice deviate from reality. Scenarios may not have been anticipated or may have been incorrectly described. To address this issue, EDDI incorporates a dynamic Model Validation and Repair component. Deviations observed between system conditions and model predictions are captured and can be used to suspend the monitor and revert to a default safe mode.
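One simple way to picture this behaviour is a validator that compares model predictions with observations and switches to a safe mode after repeated deviations. The tolerance, the count of consecutive deviations and all names in this Python sketch are assumptions made for illustration; the source does not specify how deviations are detected.

```python
class ModelValidator:
    """Hypothetical validator: enters safe mode after repeated model deviations."""

    def __init__(self, tolerance, max_deviations):
        self.tolerance = tolerance            # largest acceptable prediction error
        self.max_deviations = max_deviations  # consecutive deviations tolerated
        self.deviations = 0
        self.mode = "monitoring"

    def check(self, predicted, observed):
        """Compare one prediction with one observation and return the mode."""
        if self.mode == "safe_mode":
            return self.mode                  # stay in safe mode once entered
        if abs(predicted - observed) > self.tolerance:
            self.deviations += 1
        else:
            self.deviations = 0               # agreement resets the count
        if self.deviations >= self.max_deviations:
            self.mode = "safe_mode"           # suspend the monitor, revert to safe default
        return self.mode


validator = ModelValidator(tolerance=0.5, max_deviations=2)
pairs = [(1.0, 1.1), (1.0, 2.0), (1.0, 2.5), (1.0, 1.0)]
modes = [validator.check(predicted, observed) for predicted, observed in pairs]
```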
The EDDI concept has the potential to address open problems in the dependability assurance of autonomous systems and SoS. Early results and applications are reported in the SESAME EU project.
Resources on EDDIs:
- Metamodels for some of the models used by EDDIs can be found in Githu