string literal

From cppreference.com

Syntax

" (unescaped_character|escaped_character)* " (1)
L " (unescaped_character|escaped_character)* " (2)
u8 " (unescaped_character|escaped_character)* " (3) (since C++11)
u " (unescaped_character|escaped_character)* " (4) (since C++11)
U " (unescaped_character|escaped_character)* " (5) (since C++11)
prefix(optional) R "delimiter( raw_characters )delimiter" (6) (since C++11)

Explanation

unescaped_character - Any valid character except the double-quote ", backslash \, or new-line character
escaped_character - See escape sequences
prefix - One of L, u8, u, U
delimiter - A character sequence made of any source character but parentheses, backslash and spaces (can be empty, and at most 16 characters long)
raw_characters - Any character sequence, except that it must not contain the closing sequence )delimiter"


1) Narrow multibyte string literal. The type of an unprefixed string literal is const char[].
2) Wide string literal. The type of a L"..." string literal is const wchar_t[].
3) UTF-8 encoded string literal. The type of a u8"..." string literal is const char[].
4) UTF-16 encoded string literal. The type of a u"..." string literal is const char16_t[].
5) UTF-32 encoded string literal. The type of a U"..." string literal is const char32_t[].
6) Raw string literal. Used to avoid escaping of any character. Anything between the delimiters becomes part of the string. prefix, if present, has the same meaning as described above.

Notes

The null character ('\0', L'\0', char16_t(), etc) is always appended to the string literal: thus, a string literal "Hello" is a const char[6] holding the characters 'H', 'e', 'l', 'l', 'o', and '\0'.

String literals placed side-by-side are concatenated at translation phase 6 (after the preprocessor). That is, "Hello," " world!" yields the (single) string "Hello, world!". If the two strings have the same encoding prefix (or neither has one), the resulting string will have the same encoding prefix (or no prefix).

If one of the strings has an encoding prefix and the other doesn't, the one that doesn't will be considered to have the same encoding prefix as the other.
L"x =%" PRId16 // at phase 4, PRId16 expands to "d"
                 // at phase 6, L"x =%" and "d" form L"x =%d"

If a UTF-8 string literal and a wide string literal are side by side, the program is ill-formed. (since C++11)

Any other combination of encoding prefixes may or may not be supported by the implementation. The result of such a concatenation is implementation-defined.

String literals have static storage duration, and thus exist in memory for the life of the program.

String literals can be used to initialize character arrays. If an array is initialized like char str[] = "foo";, str will contain a copy of the string "foo".

The compiler is allowed, but not required, to combine storage for equal or overlapping string literals. That means that identical string literals may or may not compare equal when compared by pointer.

bool b = "bar" == 3+"foobar"; // could be true or false, implementation-defined

Attempting to modify a string literal results in undefined behavior: they may be stored in read-only storage (such as .rodata) or combined with other string literals:

const char* pc = "Hello";
char* p = const_cast<char*>(pc);
p[0] = 'M'; // undefined behavior

In C, string literals are of type char[], and can be assigned directly to a (non-const) char*. C++03 allowed it as well (but deprecated it, as literals are const in C++). C++11 no longer allows such assignments without a cast.

A string literal is not necessarily a C string: if a string literal has embedded null characters, it represents an array which contains more than one string.

const char* p = "abc\0def"; // std::strlen(p) == 3, but the array has size 8

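A hex escape sequence extends over every hex digit that follows it. If a character that happens to be a valid hex digit comes right after an intended hex escape in a string literal, it becomes part of the escape, and the literal may fail to compile because the resulting value is out of range. String concatenation can be used as a workaround: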

//const char* p = "\xfff"; // error: hex escape sequence out of range
const char* p = "\xff""f"; // OK: the literal is const char[3] holding {'\xff','f','\0'}

The encoding of narrow multibyte string literals (1) and wide string literals (2) is implementation-defined. For example, gcc selects them with the command-line options -fexec-charset and -fwide-exec-charset.

Example

#include <iostream>

char array1[] = "Foo" "bar";
// same as
char array2[] = { 'F', 'o', 'o', 'b', 'a', 'r', '\0' };

const char* s1 = R"foo(
Hello
World
)foo";
//same as
const char* s2 = "\nHello\nWorld\n";

int main()
{
    std::cout << array1 << '\n';
    std::cout << array2 << '\n';

    std::cout << s1;
    std::cout << s2;
}

Output:

Foobar
Foobar

Hello
World

Hello
World

See also