Software safety

Software safety is an engineering discipline that aims to ensure that software, which is used in safety-related systems, does not contribute to any hazards such a system might pose.
There are numerous standards that govern the way how safety-related software should be developed and assured in various domains. Most of them classify software according to their criticality and propose techniques and measures that should be employed during the development and assurance:

Software for generic electronic safety-related systems: IEC 61508
Automotive software: ISO 26262
Railway software: EN 50716
Airborne software: DO-178C/ED-12C)
Air traffic management software: DO-278A/ED-109A
Medical devices: IEC 62304
Nuclear power plants: IEC 60880

Terminology

System Safety is the overarching discipline that aims to achieve safety by reducing risks in technical systems to an acceptable level. According to the widely adopted system safety standard IEC 61508, safety is “freedom from unacceptable risk of harm”. As software alone – which can be considered as pure information – cannot cause any harm by itself, the term software safety is sometimes dismissed and replaced by “software system safety”. This stresses that software can only cause harm in the context of a technical system, that has some effect on its environment.
The goal of software safety is to make sure that software does not cause or contribute to any hazards in the system where it is used and that it can be assured and demonstrated that this is the case. This is typically achieved by the assignment of a "safety level" to the software and the selection of appropriate processes for the development and assurance of the software.

Assignment of safety levels

One of the first steps when creating safety-related software is to classify software according to its safety-criticality. Various standards suggest different levels, e.g. Software Levels A-E in DO-178C, SIL 1-4 in IEC 61508, ASIL A-D in ISO 26262.
The assignment is typically done in the context of an overarching system, where the worst case consequences of software failures are investigated. For example, automotive standard ISO 26262 requires the performance of a Hazard and Risk Assessment on vehicle level to derive the ASIL of the software executed on a component.

Process adherence and assurance

It is essential to use an adequate development and assurance process, with appropriate methods and techniques, commensurate with the safety criticality of the software. Software safety standards recommend and sometimes forbid the use of such methods and techniques, depending on the safety level.
Most standards suggest a lifecycle model and prescribe required activities to be executed during the various phases of the software. For example, IEC 61508 requires that software is specified adequately, that the software design should be modular and testable, that adequate programming languages are used, documented code reviews are performed and that testing should be performed an several layers to achieve an adequately high test coverage.
The focus on the software development and assurance process stems from the fact that software quality is heavily influenced by the software process, as suggested by IEC 25010. It is claimed that the process influences the internal software quality attributes and these in turn influence external software quality attributes.
The following activities and topics addressed in the development process contribute to safe software.

Documentation

Comprehensive documentation of the complete development and assurance process is required by virtually all software safety standards. Typically, this documentation is reviewed and endorsed by third parties and therefore a prerequisite for the approval of safety-related software. The documentation ranges from various planning documents, requirements specifications, software architecture and design documentation, test cases on various abstraction levels, tool qualification reports, review evidence, verification and validation results etc. Fig C.2 in EN 50716 lists 32 documents that need to be created along the development lifecycle.

Traceability

Traceability is the practice to establish relationships between different types of requirements and between requirements and design, implementation and testing artefacts. According to EN 50716, the objective “is to ensure that all requirements can be shown to have been properly met and that no untraceable material has been introduced”. By documenting and maintaining traceability, it becomes possible to follow e.g. a safety requirement into the design of a system, further on into the software source code, and to an appropriate test case and test execution.

Software implementation

Safety standards can have requirements directly affecting the implementation of the software in source code, such as e.g. the selection of an appropriate programming language, the size and complexity of functions, the use of certain programming constructs and the need for coding standards. Part 3 of IEC 61508 contains the following requirements and recommendations:

Use of a strongly typed programming language. Some languages are better suited than others for safety-related systems. Languages that support strong typing can detect more faults during the compilation process that would otherwise only be detected during runtime. Therefore, assembler is typically discouraged, whereas high level languages especially geared towards for the safety-related market are recommended.
Use of an appropriate coding standard defining a “safe” language subset, e.g. MISRA C. MISRA-C is a coding standard for the C programming language that aims to improve code quality and safety by disallowing error prone constructs, or features that are compiler dependent.
Limiting the use of recursion, pointers and interrupts.
Disallowing “unstructured control flow in programs”, i.e. avoiding jumping in an unstructured way, e.g. by using “goto”-like statements.

Test coverage

Appropriate test coverage needs to be demonstrated, i.e. depending on the safety level more rigorous testing schemes have to be applied. A well known requirement regarding test coverage depending on the software level is given in DO-178C:

Level C: Statement coverage is required - i.e. "every statement in the program has been invoked at least once" during testing.
Level B: Branch coverage is required - i.e. "every point of entry and exit in the program has been invoked at least once and every decision in the program has taken on all possible outcomes at least once."
Level A: Modified condition/decision coverage - an extension of branch coverage, with the requirement that "each condition in a decision has been shown to independently affect that decision's outcome."

Independence

Software safety standards typically require some activities to be executed with independence, i.e. by a different person, by a person with different reporting lines, or even by an independent organization. This ensures that conflicts of interest are avoided and increases the chances that faults are identified.
For example, EN 50716 Figure 2 requires the roles “implementer”, “tester” and “verifier” to be held by different people, the role “validator” to be held by a person with different reporting line and the role “assessor” to be held by a person from a different organizational unit.
DO-178C and DO-278A require several activities to be executed “with independence”, with independence being defined as “separation of responsibilities which ensures the accomplishment of objective evaluation”.

Open questions and issues

Software failure rates

In system safety engineering, it is common to allocate upper bounds for failure rates of subsystems or components. It must then be shown that these subsystems or components do not exceed their allocated failure rates, or otherwise redundancy or other fault tolerance mechanisms must be employed. This approach is not practicable for software.
Software failure rates cannot be predicted with any confidence. Although significant research in the field of software reliability has been conducted, current software safety standards do not require any of these methods to be used or even discourage their usage, e.g. DO178C states: “Many methods for predicting software reliability based on developmental metrics have been published, for example, software structure, defect detection rate, etc. This document does not provide guidance for those types of methods, because at the time of writing, currently available methods did not provide results in which confidence can be placed.” ARP 4761 clause 4.1.2 states that software design errors “are not the same as hardware failures. Unlike hardware failures, probabilities of such errors cannot be quantified.”

Safety and security

Software safety and security may have differing interests in some cases. On the one hand safety-related software that is not secure can pose a safety risk, on the other hand, some security practices contradict established safety practices.

Artificial intelligence

Software that employs artificial intelligence techniques such as machine learning follows a radically different lifecycle. In addition, the behavior is harder to predict than for a traditionally developed system. Hence, the question whether and how these technologies can be used, is under current investigation. Currently, standards generally do not endorse their use. For example, EN 50716 states that artificial intelligence and machine learning are not recommended for any safety integrity level.

Agile development methods

Agile software development, which typically features many iterations, is sometimes still stigmatized as being too chaotic for safety-related software development. This might be partially caused by statements such as "working software over comprehensive documentation", which is found in the manifesto for agile development. Although most software safety standards present the software lifecycle in the traditional waterfall-like sequence, some do contain statements that allow for more flexible lifecycles. DO-178C states that "The processes of a software life cycle may be iterative, that is, entered and reentered." EN 50716 contains Annex C that shows how iterative development lifecycles can be used in line with the requirements of the standard.

Goals

Functional safety is achieved through engineering development to ensure correct execution and behavior of software functions as intended
Safety consistent with mission requirements, is designed into the software in a timely, cost effective manner.
On complex systems involving many interactions safety-critical functionality should be identified and thoroughly analyzed before deriving hazards and design safeguards for mitigations.
Safety-critical functions lists and preliminary hazards lists should be determined proactively and influence the requirements that will be implemented in software.
Contributing factors and root causes of faults and resultant hazards associated with the system and its software are identified, evaluated and eliminated or the risk reduced to an acceptable level, throughout the lifecycle.
Reliance on administrative procedures for hazard control is minimized.
The number and complexity of safety critical interfaces is minimized.
The number and complexity of safety critical computer software components is minimized.
Sound human engineering principles are applied to the design of the software-user interface to minimize the probability of human error.
Failure modes, including hardware, software, human and system are addressed in the design of the software.
Sound software engineering practices and documentation are used in the development of the software.
Safety issues and safety attributes are addressed as part of the software testing effort at all levels.
Software is designed for human machine interface, ease of maintenance and modification or enhancement
Software with safety-critical functionality must be thoroughly verified with objective analysis and preferably test evidence that all safety requirements have been met per established criteria.