Software diversity
Software diversity is a research field about the comprehension and engineering of diversity in the context of software.
Areas
The different areas of software diversity are discussed in surveys on diversity for fault tolerance or for security. The main areas are:
- design diversity, n-version programming, data diversity for fault tolerance
- randomization
- software variability
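As an illustration of n-version programming, the following minimal sketch runs three independently written versions of the same function and takes a majority vote over their results. The function names, the choice of an integer square root as the task, and the deliberately injected fault are all assumptions made for this example, not part of any particular system.

```python
from collections import Counter


def isqrt_newton(n):
    """Version 1: integer square root by Newton iteration."""
    if n < 2:
        return n
    x = n
    y = (x + 1) // 2
    while y < x:
        x = y
        y = (x + n // x) // 2
    return x


def isqrt_bisect(n):
    """Version 2: integer square root by binary search."""
    lo, hi = 0, n + 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    return lo


def isqrt_faulty(n):
    """Version 3: linear search with a deliberately injected fault for n == 16."""
    if n == 16:
        return 5  # injected fault, for illustration only
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r


def majority_vote(versions, value):
    """Run every version on the same input and return the majority result."""
    results = [version(value) for version in versions]
    winner, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError("no majority among versions: %r" % results)
    return winner


VERSIONS = [isqrt_newton, isqrt_bisect, isqrt_faulty]
```

With this setup, majority_vote(VERSIONS, 16) still yields the correct result because the two healthy versions outvote the faulty one. The scheme only helps when the versions fail independently, which is the motivation for design diversity.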
Techniques
Code transformations
It is possible to amplify software diversity through automated transformation processes that create synthetic diversity. A "multicompiler" is a compiler that embeds a diversification engine. A multi-variant execution environment is responsible for selecting the variants to execute and comparing their output.
Fred Cohen was among the very early promoters of such an approach. He proposed a series of rewriting and code-reordering transformations that aim at producing massive quantities of different versions of operating system functions. These ideas have been developed over the years and have led to the construction of integrated obfuscation schemes to protect key functions in large software systems.
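The interplay between a diversification engine and a multi-variant execution environment can be sketched on a toy stack machine. In this hypothetical example, diversify plays the role of the engine by inserting no-op instructions at seed-dependent positions, and monitored_run plays the role of the execution environment by running all variants and comparing their output. The instruction set and all names are assumptions made for illustration; real multicompilers operate on compiler intermediate representations or machine code.

```python
import random

# Toy program computing (2 + 3) * 4 on a stack machine.
PROGRAM = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]


def diversify(program, seed):
    """Return a variant with 0 to 2 NOPs inserted before each instruction."""
    rng = random.Random(seed)
    variant = []
    for instr in program:
        for _ in range(rng.randint(0, 2)):
            variant.append(("NOP",))
        variant.append(instr)
    return variant


def run(program):
    """Interpret a program and return the value left on top of the stack."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "NOP":
            pass  # changes the code layout, not the behavior
        else:
            raise ValueError("unknown instruction: %r" % (instr,))
    return stack[-1]


def monitored_run(variants):
    """Run every variant and fail if their outputs disagree."""
    outputs = [run(variant) for variant in variants]
    if len(set(outputs)) != 1:
        raise RuntimeError("variants diverged: %r" % outputs)
    return outputs[0]
```

Calling monitored_run on a list of variants produced by diversify(PROGRAM, seed) returns the common output. If one variant is tampered with so that its result changes, the comparison fails and the divergence is reported instead of a result.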
Another approach to increasing software diversity for protection consists in adding randomness to certain core processes, such as memory loading. Randomness implies that all versions of the same program run differently from each other, which in turn creates a diversity of program behaviors. This idea was initially proposed and experimented with by Stephanie Forrest and her colleagues.
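Randomized memory loading can be illustrated with a simulated loader. The sketch below is a simplified model, not an actual operating system mechanism: the symbol table, the page size, and the number of candidate base addresses are assumptions chosen for the example. Each load places the whole program image at a random page-aligned base address, so absolute addresses differ from one run to the next.

```python
import random

PAGE_SIZE = 0x1000

# Hypothetical symbol table: offsets relative to the start of the image.
SYMBOLS = {"main": 0x0400, "parse_input": 0x0A20, "secret_key": 0x2010}


def load_image(symbols, rng, slots=2 ** 16):
    """Pick a random page-aligned base and return absolute symbol addresses."""
    base = rng.randrange(1, slots) * PAGE_SIZE
    return {name: base + offset for name, offset in symbols.items()}


def addresses_for_run(seed):
    """Simulate one program start; the seed stands in for loader entropy."""
    return load_image(SYMBOLS, random.Random(seed))
```

An attack that hard-codes the address of secret_key observed in one run only works in another run if the loader happens to pick the same base. In this simple model the distances between symbols stay fixed, so learning one address reveals the others.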
Recent work on automatic software diversity explores different forms of program transformations that slightly vary the behavior of programs. The goal is to evolve one program into a population of diverse programs that all provide similar services to users, but with different code. This diversity of code protects users against a single attack that could otherwise crash all programs at the same time.
Transformation operators include:
- code layout randomization: reorder functions in code
- globals layout randomization: reorder and pad globals
- stack variable randomization: reorder variables in each stack frame
- heap layout randomization
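Code layout randomization, the first operator in the list above, can be sketched at source level with Python's standard ast module. This is only an analogy for what a diversifying compiler does on binaries: the function reorder_functions and the sample program are hypothetical, and the sketch assumes that all module-level statements calling the functions come after every definition, so that reordering the definitions does not change behavior. It requires Python 3.9 or later for ast.unparse.

```python
import ast
import random

SAMPLE = '''
def double(x):
    return 2 * x

def inc(x):
    return x + 1

def main():
    return inc(double(20))
'''


def reorder_functions(source, seed):
    """Return source code with its top-level functions in a shuffled order."""
    tree = ast.parse(source)
    rng = random.Random(seed)
    # Positions in the module body that hold function definitions.
    slots = [i for i, node in enumerate(tree.body)
             if isinstance(node, ast.FunctionDef)]
    funcs = [tree.body[i] for i in slots]
    rng.shuffle(funcs)
    # Put the shuffled definitions back into the same positions.
    for i, func in zip(slots, funcs):
        tree.body[i] = func
    return ast.unparse(tree)
```

Each seed yields a variant whose functions may appear in a different order while main() still returns the same value. The other operators in the list follow the same pattern at a different granularity: they permute or pad the layout of globals, stack variables, or heap objects instead of functions.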