Polyglot (computing)


In computing, a polyglot is a computer program or script written in a valid form of multiple programming languages or file formats. The name was coined by analogy to multilingualism. A polyglot file is composed by combining syntax from two or more different formats.
When the formats are to be compiled or interpreted as source code, the file can be called a polyglot program. File formats and source code syntax are both fundamentally streams of bytes, and exploiting this commonality is key to developing polyglots. Polyglot files have limited practical applications in compatibility, but can also present a security risk when used to bypass validation or to exploit a vulnerability.

History

Polyglot programs have been crafted as challenges and curios in hacker culture since at least the early 1990s. A notable early example, named simply polyglot, was published on the Usenet group rec.puzzles in 1991, supporting eight languages, though this was inspired by even earlier programs. In 2000, a polyglot program was named a winner in the International Obfuscated C Code Contest.
In the 21st century, polyglot programs and files gained attention as a covert channel mechanism for the propagation of malware.

Construction

A polyglot is composed by combining syntax from two or more different formats, leveraging syntactic constructs that are either common to the formats or language-specific but carry a different meaning in each language. A file is a valid polyglot if it can be successfully interpreted by multiple interpreting programs. For example, a PDF-ZIP polyglot might be both opened as a valid PDF document and decompressed as a valid ZIP archive. To maintain validity across interpreting programs, one must ensure that constructs specific to one interpreter are not interpreted by another, and vice versa.
This is often accomplished by hiding language-specific constructs in segments interpreted as comments or plain text of the other format.

Examples

C, PHP, and Bash

Two commonly used techniques for constructing a polyglot program are to make use of languages that use different characters for comments, and to redefine various tokens as others in different languages. These are demonstrated in this public domain polyglot written in ANSI C, PHP and bash:

#define a /*
#<?php
echo "\010Hello, world!\n";// 2> /dev/null > /dev/null \ ;
// 2> /dev/null; x=a;
$x=5; // 2> /dev/null \ ;
if (($x))
// 2> /dev/null; then
return 0;
// 2> /dev/null; fi
#define e ?>
#define b */
#include <stdio.h>
#define main() int main(void)
#define printf printf(
#define true )
#define function
function main()
{
printf "Hello, world!\n"true/* 2> /dev/null | grep -v true*/;
return 0;
}
#define c /*
main
#*/

Note the following:
  • A hash sign marks a preprocessor statement in C, but is a comment in both bash and PHP.
  • "//" is a comment in both PHP and C and the root directory in bash.
  • Shell redirection is used to eliminate undesirable outputs.
  • Even on commented out lines, the "<?php" and "?>" PHP indicators still have effect.
  • The statement "function main" is valid in both PHP and bash; C #defines are used to convert it into "int main" at compile time.
  • Comment indicators can be combined to perform various operations.
  • "if (($x))" is a valid statement in both bash and PHP.
  • printf is a bash shell builtin which is identical to the C printf except for its omission of brackets.
  • The final three lines are only used by bash, to call the main function. In PHP the main function is defined but not called and in C there is no need to explicitly call the main function.

SNOBOL4, Win32Forth, PureBasic v4.x, and REBOL

The following is written simultaneously in SNOBOL4, Win32Forth, PureBasic v4.x, and REBOL:

*BUFFER : A.A ;. @ To Including?
Macro SkipThis; OUTPUT = Char "Hello, World !"
;OneKeyInput Input ; Char
End; SNOBOL4 + PureBASIC + Win32Forth + REBOL = <3
EndMacro: OpenConsole : PrintN
Repeat : Until Inkey : Macro SomeDummyMacroHere
REBOL Print
"Hello, world !" EndMacro: func set-modes
system/ports/input Input set-modes
system/ports/input NOP:: EndMacro
; Wishing to refine it with new language ? Go on !

MS-DOS batch file and Perl

The following file runs as an MS-DOS batch file, then re-runs itself in Perl:

@rem = ' --PERL--
@echo off
perl "%~dpnx0" %*
goto endofperl
@rem ';
#!perl
print "Hello, world!\n";
__END__
:endofperl

This allows creating Perl scripts that can be run on MS-DOS systems with minimal effort. Note that there is no requirement for a file to perform exactly the same function in the different interpreters.

Types

Polyglot types include:
  • stacks, where multiple files are concatenated with each other
  • parasites, where a secondary file format is hidden within comment fields in a primary file format
  • zippers, where two files are mutually arranged within each other's comments
  • cavities, where a secondary file format is hidden within null-padded areas of the primary file
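
A stack can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the shell prefix, the function name build_stack_polyglot, and the member name hello.txt are all hypothetical. It prepends a short shell script to a ZIP archive and then reads the result back with the standard zipfile module, which locates the archive's central directory from the end of the data and so tolerates the leading bytes.

```python
import io
import zipfile

# Hypothetical shell prefix: it exits before the shell reaches the ZIP bytes.
SHELL_PREFIX = b'#!/bin/sh\necho "Hello from sh"\nexit 0\n'


def build_stack_polyglot(prefix: bytes, members: dict) -> bytes:
    """Return the prefix bytes followed by a ZIP archive holding `members`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in members.items():
            archive.writestr(name, text)
    return prefix + buffer.getvalue()


polyglot = build_stack_polyglot(SHELL_PREFIX, {"hello.txt": "Hello from ZIP"})

# Read the combined bytes back as a ZIP archive, ignoring the shell prefix.
with zipfile.ZipFile(io.BytesIO(polyglot)) as archive:
    names = archive.namelist()
    payload = archive.read("hello.txt").decode()

print(names, payload)
```

Other ZIP readers may be stricter about data placed before the archive, so this behaviour should be checked against each tool that is expected to open the file.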

Benefits

Polyglot markup

Polyglot markup has been proposed as a useful combination of the benefits of HTML5 and XHTML. Such documents can be parsed as either HTML or XML, and will produce the same DOM structure either way. For example, in order for an HTML5 document to meet these criteria, the two requirements are that it must have an HTML5 doctype, and be written in well-formed XHTML. The same document can then be served as either HTML or XHTML, depending on browser support and MIME type.
As expressed by the html-polyglot recommendation, to write a polyglot HTML5 document, the following key points should be observed:
  1. Processing instructions and the XML declaration are both forbidden in polyglot markup
  2. Specifying a document’s character encoding
  3. The DOCTYPE
  4. Namespaces
  5. Element syntax
  6. Element content
  7. Text
  8. Attributes
  9. Named entity references
  10. Comments
  11. Scripting and styling polyglot markup
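The most basic possible polyglot markup document would therefore look like this:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="" lang="">
  <head>
    <title>The title element must not be empty.</title>
  </head>
  <body>
  </body>
</html>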

In a polyglot markup document, non-void elements cannot be self-closing even if they are empty, as this is not valid HTML. For example, to add an empty textarea to a page, one cannot use <textarea/>, but has to use <textarea></textarea> instead.
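
The idea that one document can be read by both an XML parser and an HTML parser can be checked informally with Python's standard library. The sketch below assumes a minimal document with an HTML5 doctype and the XHTML namespace; the names DOCUMENT and TagCollector are invented for the example. Note that html.parser is a lenient tokenizer rather than a full HTML5 parser, so this is only a plausibility check, not a conformance test.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Assumed minimal polyglot document: HTML5 doctype plus well-formed XHTML.
DOCUMENT = (
    '<!DOCTYPE html>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="" lang="">\n'
    '  <head>\n'
    '    <title>The title element must not be empty.</title>\n'
    '  </head>\n'
    '  <body>\n'
    '  </body>\n'
    '</html>\n'
)


class TagCollector(HTMLParser):
    """Record the start tags seen while reading the text as HTML."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# Parse as XML: this raises ParseError if the text is not well-formed.
xml_root = ET.fromstring(DOCUMENT)
xml_title = xml_root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title").text

# Read the same text as HTML and collect the element names.
collector = TagCollector()
collector.feed(DOCUMENT)
collector.close()
html_tags = collector.tags

print(xml_root.tag, xml_title, html_tags)
```

Both readers see the same four elements in the same order, which is the property polyglot markup aims for.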