Name mangling
In compiler construction, name mangling is a technique used to solve various problems caused by the need to resolve unique names for programming entities in many modern programming languages.
It provides means to encode added information in the name of a function, structure, class or another data type, to pass more semantic information from the compiler to the linker.
The need for name mangling arises where a language allows different entities to be named with the same identifier as long as they occupy a different namespace or have different type signatures. It is required in these cases because each signature might require a different, specialized calling convention in the machine code.
Any object code produced by compilers is usually linked with other pieces of object code by a type of program called a linker. The linker needs a great deal of information on each program entity. For example, to correctly link a function it needs its name, the number of arguments and their types, and so on.
The simple programming languages of the 1970s, like C, only distinguished subroutines by their name, ignoring other information including parameter and return types.
Later languages, like C++, defined stricter requirements for routines to be considered "equal", such as the parameter types, return type, and calling convention of a function. These requirements enable method overloading and detection of some bugs.
These stricter requirements needed to work with extant programming tools and conventions. Thus, added requirements were encoded in the name of the symbol, since that was the only information a traditional linker had about a symbol.
Examples
C
Although name mangling is not generally required or used by languages that do not support function overloading, like C and classic Pascal, they use it in some cases to provide added information about a function.
For example, compilers targeted at Microsoft Windows platforms support a variety of calling conventions, which determine the manner in which parameters are sent to subroutines and results are returned. Because the different calling conventions are incompatible with one another, compilers mangle symbols with codes detailing which convention should be used to call the specific routine.
The mangling scheme for Windows was established by Microsoft and has been informally followed by other compilers including Digital Mars, Borland, and GNU Compiler Collection when compiling code for the Windows platforms. The scheme even applies to other languages, such as Pascal, D, Delphi, Fortran, and C#. This allows subroutines written in those languages to call, or be called by, extant Windows libraries using a calling convention different from their default.
When compiling the following C examples:
int _cdecl f(int x) { return 0; }
int _stdcall g(int y) { return 0; }
int _fastcall h(int z) { return 0; }
32-bit compilers emit, respectively:
_f
_g@4
@h@4
In the stdcall and fastcall mangling schemes, the function is encoded as _name@X and @name@X respectively, where X is the number of bytes, in decimal, of the arguments in the parameter list. In the case of cdecl, the function name is merely prefixed by an underscore.
The 64-bit convention on Windows has no leading underscore. This difference may in some rare cases lead to unresolved externals when porting such code to 64 bits. For example, Fortran code can use 'alias' to link against a C method by name as follows:
SUBROUTINE f
!DEC$ ATTRIBUTES C, ALIAS:'_f' :: f
END SUBROUTINE
This will compile and link fine under 32 bits, but generate an unresolved external _f under 64 bits. One workaround for this is not to use 'alias' at all. Another is to use the Fortran 2003 BIND option:
SUBROUTINE f() BIND(C, NAME="f")
END SUBROUTINE
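The 32-bit decoration rules for cdecl, stdcall and fastcall described in this section can be sketched in C++ as follows. This is a minimal illustration, not any compiler's implementation: the names CallConv and decorate are hypothetical, and the sketch assumes the caller already knows the total size of the parameter list in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Calling conventions covered by the text (32-bit x86 Windows only).
enum class CallConv { Cdecl, Stdcall, Fastcall };

// Hypothetical helper: returns the decorated symbol name that the scheme
// described above would produce for a C function.
// argBytes is the total size of the parameter list, in bytes.
std::string decorate(const std::string& name, CallConv cc, std::size_t argBytes) {
    assert(!name.empty());
    switch (cc) {
        case CallConv::Cdecl:     // underscore prefix only
            return "_" + name;
        case CallConv::Stdcall:   // _name@bytes
            return "_" + name + "@" + std::to_string(argBytes);
        case CallConv::Fastcall:  // @name@bytes
            return "@" + name + "@" + std::to_string(argBytes);
    }
    return name;  // not reached for valid enumerators
}
```

With a single 4-byte int parameter, decorate("f", CallConv::Cdecl, 4), decorate("g", CallConv::Stdcall, 4) and decorate("h", CallConv::Fastcall, 4) reproduce the three symbols listed above.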
In C, most compilers also mangle static functions and variables in translation units using the same mangling rules as for their non-static versions. If functions with the same name are also defined and used in different translation units, they will mangle to the same name, potentially leading to a clash. However, these functions are not equivalent: each is called only from within its own translation unit. Compilers are usually free to emit arbitrary mangling for these functions, because it is illegal to access them from other translation units directly, so they never need linking between different object files. To prevent linking conflicts, compilers use the standard mangling but emit so-called 'local' symbols. When many such translation units are linked, there may be multiple definitions of a function with the same name, but the resulting code calls only the definition from its own translation unit. This is usually done using the relocation mechanism.
C++
C++ compilers are the most widespread users of name mangling. The first C++ compilers were implemented as translators to C source code, which would then be compiled by a C compiler to object code; because of this, symbol names had to conform to C identifier rules. Even later, with the emergence of compilers that produced machine code or assembly directly, the system's linker generally did not support C++ symbols, and mangling was still required.
The C++ language does not define a standard decoration scheme, so each compiler uses its own. C++ also has complex language features, such as classes, templates, namespaces, and operator overloading, that alter the meaning of specific symbols based on context or usage. Meta-data about these features can be disambiguated by mangling the name of a symbol. Because the name-mangling systems for such features are not standardized across compilers, few linkers can link object code that was produced by different compilers.
Simple example
A single C++ translation unit might define two functions named f, alongside a third function g, with the following signatures:
int f();
int f(int);
void g();
These are distinct functions, with no relation to each other apart from the name. The C++ compiler will therefore encode the type information in the symbol name, the result being something resembling:
int __f_v();
int __f_i(int);
void __g_v();
Even though its name is unique, g() is still mangled: name mangling applies to all C++ symbols.
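The illustrative scheme above (two underscores, the function name, an underscore, then a code for the parameter types) can be sketched as a toy function. This is not a real compiler's scheme: the name toyMangle and the rule of taking the first letter of each parameter type are assumptions made purely for illustration, and the rule would collide for types that share an initial letter.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy mangler mirroring the illustrative names above:
// "__" + name + "_" + one letter per parameter type,
// or "v" when the parameter list is empty.
std::string toyMangle(const std::string& name,
                      const std::vector<std::string>& paramTypes) {
    assert(!name.empty());
    std::string out = "__" + name + "_";
    if (paramTypes.empty()) {
        return out + "v";  // no parameters: encoded like void
    }
    for (const std::string& type : paramTypes) {
        assert(!type.empty());
        out += type.front();  // first letter only, e.g. "int" -> 'i'
    }
    return out;
}
```

Under these assumptions, toyMangle("f", {}) and toyMangle("f", {"int"}) yield distinct symbols for the two overloads, which is the property a linker that only compares names needs.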
Complex example
The mangled symbols in this example, in the comments below the respective identifier name, are those produced by the GNU GCC 3.x compilers, according to the IA-64 ABI:
export module org.wikipedia;
import std;
using std::ostream;
using std::string;
using std::string_view;
export namespace org::wikipedia
All mangled symbols begin with _Z; for nested names, this is followed by N, then a series of <length, id> pairs (the length being the number of characters in the identifier that follows), and finally E. For example, org::wikipedia::Article::format becomes:
_ZN3org9wikipedia7Article6formatE
For functions, this is then followed by the type information; as format() takes no parameters, this is simply v; hence:
_ZN3org9wikipedia7Article6formatEv
For printTo, the standard type std::ostream is used, which has the special alias So; a reference to this type is therefore RSo, with the complete name for the function being:
_ZN3org9wikipedia7Article7printToERSo
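The nested-name rule described here (the _Z prefix, N, a run of length-prefixed identifiers, E, then the parameter encoding) can be sketched as a small function. This is a simplified illustration under stated assumptions: the name mangleNested is hypothetical, the parameter codes are passed in already encoded, and the sketch does not model substitutions, templates, qualifiers, or names attached to C++20 modules.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: encodes a nested name as
//   _Z N <length><id> <length><id> ... E <paramCodes>
// paramCodes is the already-encoded parameter list, e.g. "v" or "RSo".
std::string mangleNested(const std::vector<std::string>& parts,
                         const std::string& paramCodes) {
    assert(parts.size() >= 2);  // a nested name has at least one qualifier
    std::string out = "_ZN";
    for (const std::string& id : parts) {
        // Each component is its length in decimal followed by the identifier.
        out += std::to_string(id.size()) + id;
    }
    out += "E";
    return out + paramCodes;
}
```

For instance, mangleNested({"org", "wikipedia", "Article", "format"}, "v") yields _ZN3org9wikipedia7Article6formatEv, where 3, 9, 7 and 6 are the lengths of the four identifiers.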
How different compilers mangle the same functions
There is no standardized scheme by which even trivial C++ identifiers are mangled, and consequently different compilers mangle public symbols in radically different ways. Consider how different C++ compilers mangle the same functions:
Notes:
- The Compaq C++ compiler on OpenVMS VAX and Alpha and Tru64 UNIX has two name mangling schemes. The original, pre-standard scheme is known as the ARM model, and is based on the name mangling described in the C++ Annotated Reference Manual. With the advent of new features in standard C++, particularly templates, the ARM scheme became more and more unsuitable – it could not encode certain function types, or produced identically mangled names for different functions. It was therefore replaced by the newer American National Standards Institute model, which supported all ANSI template features, but was not backward compatible.
- On IA-64, a standard application binary interface exists, which defines a standard name-mangling scheme, and which is used by all the IA-64 compilers. GNU GCC 3.x has further adopted the name mangling scheme defined in this standard for use on other, non-Intel platforms.
- The Visual Studio and Windows SDK include the program undname, which prints the C-style function prototype for a given mangled name.
- On Microsoft Windows, the Intel compiler and Clang use the Visual C++ name mangling for compatibility.
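On toolchains that follow the IA-64 (Itanium) C++ ABI, mangled names can also be decoded programmatically. The following sketch assumes GCC or Clang with a runtime that provides the abi::__cxa_demangle function from the <cxxabi.h> header; it does not apply to Visual C++. The wrapper name demangle is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <cxxabi.h>  // GCC/Clang-specific header; assumed to be available

// Returns the human-readable form of an Itanium-ABI mangled name,
// or an empty string if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // Passing null for the buffer and length asks the runtime to allocate.
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return std::string();
    }
    std::string result(raw);
    std::free(raw);  // the returned buffer is allocated with malloc
    return result;
}
```

Under these assumptions, demangle("_Z1fi") gives f(int). Return types are not shown because this ABI does not encode them for ordinary functions.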
Handling of C symbols when linking from C++
The job of the common C++ idiom:
#ifdef __cplusplus
extern "C" {
#endif
    /* ... */
#ifdef __cplusplus
}
#endif
is to ensure that the symbols within are "unmangled": that the compiler emits a binary file with their names undecorated, as a C compiler would do. As C language definitions are unmangled, the C++ compiler needs to avoid mangling references to these identifiers.
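For example, a C standard library header usually contains contents resembling:
#ifdef __cplusplus
extern "C" {
#endif
    /* C function declarations */
#ifdef __cplusplus
}
#endif
Thus, C++ code that calls the functions declared in such a header uses their correct, unmangled names. If the extern "C" had not been used, the C++ compiler would produce code that refers to mangled versions of those names.
Since those symbols do not exist in the C runtime library, link errors would result.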
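The difference between the two kinds of linkage can be illustrated with a short sketch. The function names c_add and cpp_add are hypothetical; in a real project the C-linkage function would normally be defined in a separate file built by a C compiler, whereas here it is defined in the same file so that the example is self-contained.

```cpp
#include <cassert>

// Hypothetical C API, declared the way a header shared by C and C++ would.
#ifdef __cplusplus
extern "C" {
#endif

int c_add(int a, int b);  // C linkage: no C++ type information in the symbol

#ifdef __cplusplus
}
#endif

// Ordinary C++ function: its symbol name encodes the parameter types.
int cpp_add(int a, int b) { return a + b; }

// Definition of the C-linkage function (kept in this file for brevity).
extern "C" int c_add(int a, int b) { return a + b; }
```

Both functions are called identically from C++ source. On a toolchain following the IA-64 (Itanium) C++ ABI, inspecting the object file with a symbol-listing tool such as nm would typically show cpp_add as _Z7cpp_addii, while c_add appears without any C++ mangling.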
Standardized name mangling in C++
It would seem that standardized name mangling in the C++ language would lead to greater interoperability between compiler implementations. However, such a standardization by itself would not suffice to guarantee C++ compiler interoperability, and it might even create a false impression that interoperability is possible and safe when it is not. Name mangling is only one of several application binary interface (ABI) details that need to be decided and observed by a C++ implementation. Other ABI aspects, like exception handling, virtual table layout, and structure and stack frame padding, also cause differing C++ implementations to be incompatible. Further, requiring a particular form of mangling would cause issues for systems where implementation limits dictate a particular mangling scheme. A standardized requirement for name mangling would also prevent an implementation where mangling was not required at all, for example a linker that understood the C++ language.
The C++ standard therefore does not attempt to standardize name mangling. On the contrary, the Annotated C++ Reference Manual actively encourages the use of different mangling schemes to prevent linking when other aspects of the ABI are incompatible.
Nevertheless, as detailed in the section above, on some platforms the full C++ ABI has been standardized, including name mangling.