String literal
A string literal or anonymous string is a literal for a string value in source code. Commonly, a programming language includes a string literal construct that is a series of characters enclosed in bracket delimiters, usually quotation marks. In many languages, the text "foo" is a string literal that encodes the text foo, but there are many other variations.
Syntax
Bracket delimited
A bracketed string literal is delimited by a start and an end character. The language can specify the use of any characters as delimiters.
Quotation is the most common way to delimit a string literal. Many languages support double quotes, single quotes, or both. When both are supported, delimiter collision can be minimized by treating one style of quotes as normal text when enclosed in quotes of the other style. In Python, for example, a literal whose outer quotes are double may contain single quotes, which are then regular text.
An empty string is written as "" or ''.
Paired delimiters are two different characters, one used at the beginning of a literal and the other at the end. With paired delimiters, the language can support embedding quotes in the literal text as long as they are all paired. For example, PostScript uses parentheses, and m4 uses a backtick (`) at the start and an apostrophe (') at the end. Tcl allows both quotes and braces, as in "The quick brown fox" or the same text enclosed in braces; this derives from the single quotations in Unix shells and the use of braces in C for compound statements, since blocks of code in Tcl are syntactically the same thing as string literals. That the delimiters are paired is essential for making this feasible.
Quotation is most commonly via unpaired quotes, but some tools and character sets support paired quotes. The Unicode character set includes paired versions:
“Hi there!”
‘Hi there!’
„Hi there!“
«Hi there!»
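The nesting behavior of paired delimiters can be illustrated with a minimal Python sketch. The function name read_paired_literal is hypothetical, and the scanner assumes the opening and closing delimiters differ; it handles only the balancing rule described above, not escape sequences or the full syntax of PostScript, m4, or Tcl.

```python
def read_paired_literal(source, start=0, opener="(", closer=")"):
    """Return (text, end) for a paired-delimiter literal at source[start].

    Nested opener/closer pairs are kept as ordinary text; the literal ends
    at the closer that balances the first opener. Illustrative only, and
    it assumes opener != closer.
    """
    if not source.startswith(opener, start):
        raise ValueError("literal must begin with the opening delimiter")
    depth = 0
    i = start
    while i < len(source):
        if source.startswith(opener, i):
            depth += 1                      # one level deeper
            i += len(opener)
        elif source.startswith(closer, i):
            depth -= 1                      # one level shallower
            i += len(closer)
            if depth == 0:                  # balanced: the literal ends here
                body = source[start + len(opener): i - len(closer)]
                return body, i
        else:
            i += 1
    raise ValueError("unbalanced delimiters")


# PostScript-style parentheses and m4-style backtick/apostrophe pairs.
print(read_paired_literal("(The quick (brown fox)) show"))
print(read_paired_literal("`a `b' c'", opener="`", closer="'"))
```

An unbalanced closing delimiter ends the literal early, which is why pairing alone does not remove the need for some escape mechanism.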
Whitespace delimited
A language might support multi-line strings. In YAML, string literals may be specified by the relative positioning of whitespace and indentation:
- title: An example multi-line string in YAML
  body : |
    This is a multi-line string.
    "special" metacharacters may
    appear here. The extent of this string is
    represented by indentation.
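How indentation alone can mark the extent of a string is sketched below in Python. This is a deliberate simplification of YAML's literal block scalars, not a YAML parser: the function name read_block_scalar and its key_indent parameter are hypothetical, and chomping indicators, tabs, and error cases are ignored.

```python
def read_block_scalar(lines, key_indent):
    """Collect an indentation-delimited block of text.

    `lines` are the source lines that follow the line ending in '|', and
    `key_indent` is the indentation of that line. The block continues
    while lines are blank or indented more deeply than `key_indent`; the
    first non-blank line fixes how much indentation is stripped.
    """
    block = []
    block_indent = None
    for line in lines:
        stripped = line.strip()
        indent = len(line) - len(line.lstrip(" "))
        if stripped and indent <= key_indent:
            break                       # a less-indented line ends the string
        if block_indent is None and stripped:
            block_indent = indent       # set by the first content line
        block.append(line[block_indent:] if stripped else "")
    while block and block[-1] == "":
        block.pop()                     # drop trailing blank lines
    return "\n".join(block) + "\n" if block else ""


source_lines = [
    "    This is a multi-line string.",
    '    "special" metacharacters may',
    "    appear here.",
    "- title: the next entry",
]
print(read_block_scalar(source_lines, key_indent=2), end="")
```

Because no character inside the block acts as a delimiter, quotes and other metacharacters need no escaping.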
Word delimited
Some languages, such as Perl and PHP, allow string literals that are delimited the same as words in a natural language. In Perl code, for example, words such as red, green, and blue can be string literals even though they are not quoted.
Perl treats a non-reserved sequence of alphanumeric characters as a string literal in most contexts. For example, the following two lines of Perl are equivalent:
$y = "x";
$y = x;
Declarative notation
The length of a literal can be encoded at the beginning of the text, which removes the need to mark the beginning and end of the string. For example, in FORTRAN, string literals were written in Hollerith notation, where a decimal count of the number of characters was followed by the letter H, and then the characters of the string.
A drawback of this technique is that it is relatively error-prone unless length insertion is automated, especially for multi-byte encodings. Its advantages are that it removes the need to search for the end delimiter and therefore requires less computational overhead, prevents delimiter collision, and allows the inclusion of metacharacters that might otherwise be mistaken for commands.
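The length-prefix idea can be sketched in Python as follows. The helper names to_hollerith and read_hollerith are hypothetical, and the count here is in Python characters (code points), which sidesteps rather than solves the multi-byte issue mentioned above.

```python
def to_hollerith(text):
    """Encode text as a length-prefixed (Hollerith-style) literal."""
    return f"{len(text)}H{text}"


def read_hollerith(source, start=0):
    """Decode one length-prefixed literal at source[start].

    Returns (text, end). No delimiter search is needed: the count says
    exactly how many characters belong to the string.
    """
    i = start
    while i < len(source) and source[i] in "0123456789":
        i += 1
    if i == start or i >= len(source) or source[i] != "H":
        raise ValueError("expected <count>H")
    count = int(source[start:i])
    begin = i + 1
    end = begin + count
    if end > len(source):
        raise ValueError("literal is shorter than its declared length")
    return source[begin:end], end


literal = to_hollerith('say "hi"')
print(literal)                  # 8Hsay "hi"
print(read_hollerith(literal))
```

Automating the count, as to_hollerith does, avoids the miscounting that makes hand-written literals of this kind error-prone.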
Delimiter collision
When using quoting, if one wishes to represent the delimiter itself in a string literal, one runs into the problem of delimiter collision. For example, if the delimiter is a double quote, one cannot simply represent a double quote itself by the literal """ as the second quote is interpreted as the end of the string literal, not as the value of the string, and similarly one cannot write "This is "in quotes", but invalid." as the middle quoted portion is instead interpreted as outside of quotes. There are various solutions, the most general-purpose of which is using escape sequences, such as "\"" or "This is \"in quotes\" and properly escaped.", but there are many other solutions.
Paired quotes, such as braces in Tcl, allow nested strings but do not otherwise solve the problem of delimiter collision, since an unbalanced closing delimiter cannot simply be included.
Doubling up
A number of languages, including Pascal, BASIC, DCL, Smalltalk, SQL, J, and Fortran, avoid delimiter collision by doubling up on the quotation marks that are intended to be part of the string literal itself:
'This Pascal string''contains two apostrophes'''
"I said, ""Can you hear me?"""
Dual quoting
Some languages, such as Fortran, Modula-2, JavaScript, Python, and PHP allow more than one quoting delimiter; in the case of two possible delimiters, this is known as dual quoting. Typically, this consists of allowing the programmer to use either single quotations or double quotations interchangeably – each literal must use one or the other.
"This is John's apple."
'I said, "Can you hear me?"'
This does not allow having a single literal with both delimiters in it, however. This can be worked around by using several literals and using string concatenation:
'I said, "This is ' + "John's" + ' apple."'
Python has string literal concatenation: consecutive string literals are concatenated even without an operator, so this can be reduced to:
'I said, "This is '"John's"' apple."'
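The dual-quoting examples above can be checked directly in Python. The variable names are chosen for illustration only; the last assignment uses escape sequences as an alternative, to show that all three spellings denote the same string.

```python
# Each literal uses one quote style and contains the other as plain text.
double_quoted = "This is John's apple."
single_quoted = 'I said, "Can you hear me?"'

# A value containing both quote characters, written three ways.
with_operator = 'I said, "This is ' + "John's" + ' apple."'
adjacent = 'I said, "This is ' "John's" ' apple."'   # implicit concatenation
escaped = "I said, \"This is John's apple.\""        # escape sequences

assert with_operator == adjacent == escaped
print(adjacent)
```

Implicit concatenation happens when the source is compiled, whereas the + operator is, in principle, an ordinary expression.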
Delimiter quoting
C++11 introduced so-called raw string literals. They consist, essentially, of R" end-of-string-id ( content ) end-of-string-id "; that is, after R" the programmer can enter up to 16 characters except whitespace characters, parentheses, or backslash, which form the end-of-string id (eos id), then an opening parenthesis is required. Then follows the actual content of the literal: any sequence of characters may be used, other than the terminating sequence itself. Finally, to terminate the string, a closing parenthesis, the eos id, and a quote are required.
The simplest case of such a literal is with empty content and empty eos id: R"()".
The eos id may itself contain quotes: a literal that opens with R""( and closes with )"" is valid.
Escape sequences don't work in raw string literals.
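The raw string literal layout described above can be modeled with a short Python regular expression. This is a sketch of the delimiter-matching rule only, not a C++ lexer: the names RAW_LITERAL and read_raw_literal are hypothetical, and encoding prefixes and other lexical details are ignored.

```python
import re

# R" eos-id ( content ) eos-id "
# The eos id is at most 16 characters, none of them whitespace, a
# parenthesis, or a backslash; the same id must reappear before the
# closing quote (enforced here with a named backreference).
RAW_LITERAL = re.compile(
    r'R"(?P<id>[^\s()\\]{0,16})\((?P<body>.*?)\)(?P=id)"',
    re.DOTALL,
)


def read_raw_literal(source):
    """Return (eos_id, content) for a raw string literal at the start of source."""
    match = RAW_LITERAL.match(source)
    if match is None:
        raise ValueError("not a raw string literal")
    return match.group("id"), match.group("body")


print(read_raw_literal('R"()"'))
print(read_raw_literal('R"eos(a)" is still content)eos"'))
print(read_raw_literal(r'R"(C:\new\table)"'))   # backslashes stay literal
```

Choosing an eos id that does not occur in the content is what lets the content include )" without ending the literal.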
D supports a few quoting delimiters, with such strings starting with q" plus an opening delimiter and ending with the respective closing delimiter and ". Available delimiter pairs are (), <>, {}, and []; an unpaired non-identifier delimiter, such as /, is its own closing delimiter and does not nest. The paired delimiters nest, so a literal may contain balanced inner pairs of its own delimiters.
Similar to C++11, D allows here-document-style literals with end-of-string ids. In D, the end-of-string id must be an identifier.
In some programming languages, such as sh and Perl, different delimiters are treated differently, for example by performing string interpolation or not, and thus care must be taken when choosing which delimiter to use; see the section on different kinds of strings below.
Multiple quoting
A further extension is the use of multiple quoting, which allows the author to choose which characters should specify the bounds of a string literal.
For example, in Perl:
qq^I said, "Can you hear me?"^
qq@I said, "Can you hear me?"@
qq§I said, "Can you hear me?"§
all produce the desired result. Although this notation is more flexible, few languages support it; besides Perl, Ruby and C++11 also support it. A variant of multiple quoting is the use of here-document-style strings.
Lua provides a limited form of multiple quoting, particularly to allow nesting of long comments or embedded strings. Normally one uses [[ and ]] to delimit literal strings, but the opening brackets can include any number of equal signs, and only closing brackets with the same number of signs close the string. For example:
local ls = [=[A string that can contain [[ and ]] without ending]=]
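The level-matching rule for Lua long brackets suggests a simple way to quote arbitrary text: add equal signs until the closing bracket can no longer be confused with the content. The Python sketch below illustrates that idea; the function name lua_long_string is hypothetical, and it does not account for Lua's special treatment of a newline placed immediately after the opening bracket.

```python
def lua_long_string(text):
    """Wrap text in Lua long brackets, choosing a level that avoids collision."""
    level = 0
    while True:
        closer = "]" + "=" * level + "]"
        # The closer must first appear exactly where the content ends,
        # including matches that straddle the content/closer boundary.
        if (text + closer).find(closer) == len(text):
            break
        level += 1
    opener = "[" + "=" * level + "["
    return opener + text + closer


print(lua_long_string("plain text"))
print(lua_long_string("local path = [[C:\\Windows]]"))
```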
Multiple quoting is particularly useful with regular expressions that contain usual delimiters such as quotes, as this avoids needing to escape them. An early example is sed, where in the substitution command s/regex/replacement/ the default slash / delimiters can be replaced by another character, as in s,regex,replacement,.
Constructor functions
Another option, rarely used in modern languages, is to construct the string with a function rather than represent it as a literal. It is generally avoided because the computation is done at run time rather than at parse time.
For example, early forms of BASIC did not include escape sequences or any other workarounds listed here, and thus one was instead required to use the CHR$ function, which returns a string containing the character corresponding to its argument. In ASCII the quotation mark has the value 34, so to represent a string with quotes on an ASCII system one would write
"I said, " + CHR$(34) + "Can you hear me?" + CHR$(34)
In C, a similar facility is available via sprintf (or the bounded snprintf) and the %c "character" format specifier, though in the presence of other workarounds this is generally not used:
char buffer[32];
snprintf(buffer, sizeof buffer, "This is %cin quotes.%c", 34, 34);
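For comparison, the CHR$ approach shown above translates directly to Python's built-in chr function. This is only an illustration of the constructor-function technique; in practice Python code would use an escape sequence or the other quote style.

```python
QUOTE = chr(34)  # 34 is the ASCII code of the double quotation mark

# The quotation marks are produced at run time by a function call,
# so no delimiter ever appears inside a literal.
message = "I said, " + QUOTE + "Can you hear me?" + QUOTE
print(message)  # prints: I said, "Can you hear me?"
```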
These constructor functions can also be used to represent nonprinting characters, though escape sequences are generally used instead. A similar technique can be used in C++ with the std::string stringification operator.