Endianness
In computing, endianness is the order in which the bytes of a word are transmitted over a data communication medium or addressed in computer memory. It relates only the significance of each byte to its earliness, that is, to how early the byte is transmitted or how low its address is. Endianness is primarily expressed as big-endian or little-endian.
Computers store information in various-sized groups of binary bits. Each group is assigned a number, called its address, that the computer uses to access that data. On most modern computers, the smallest data group with an address is eight bits long and is called a byte. Larger groups comprise two or more bytes; for example, a 32-bit word contains four bytes.
There are two principal ways a computer could number the individual bytes in a larger group, starting at either end. A big-endian system stores the most significant byte of a word at the smallest memory address and the least significant byte at the largest. A little-endian system, in contrast, stores the least-significant byte at the smallest address. Of the two, big-endian is thus closer to the way the digits of numbers are written left-to-right in English, comparing digits to bytes and assuming addresses increase from left to right.
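The two layouts can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular machine: the value 0x0A0B0C0D and the helper name `layout` are assumptions chosen for the example, and list index 0 stands in for the smallest memory address.

```python
# Hypothetical 32-bit example value; any four-byte integer would do.
value = 0x0A0B0C0D

# int.to_bytes returns the bytes in "address order": index 0 is the smallest address.
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")


def layout(data: bytes) -> str:
    """Render bytes as hex pairs, smallest address on the left."""
    return " ".join(f"{b:02X}" for b in data)


print("big-endian:   ", layout(big))     # most significant byte at the smallest address
print("little-endian:", layout(little))  # least significant byte at the smallest address
```

Read left to right, the big-endian line matches the way the hexadecimal number is written, while the little-endian line is its byte-wise reverse.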
Both types of endianness are in widespread use in digital electronic engineering. The initial choice of endianness of a new design is often arbitrary, but later technology revisions and updates perpetuate the existing endianness to maintain backward compatibility. Big-endianness is the dominant ordering in networking protocols, such as in the Internet protocol suite, where it is referred to as network order, transmitting the most significant byte first. Conversely, little-endianness is the dominant ordering for processor architectures and their associated memory. File formats can use either ordering; some formats use a mixture of both or contain an indicator of which ordering is used throughout the file.
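As a rough sketch of how network order differs from a typical host order, the following uses Python's standard `struct` module, whose `!` prefix selects network (big-endian) byte order. The address 192.168.0.1 is only an illustrative value.

```python
import struct
import sys

# Illustrative value: the IPv4 address 192.168.0.1 as a 32-bit integer.
addr = 0xC0A80001

network = struct.pack("!I", addr)  # network order: most significant byte first
little = struct.pack("<I", addr)   # explicit little-endian
native = struct.pack("=I", addr)   # whatever order the host processor uses

print("host byte order:", sys.byteorder)
print("network order:  ", list(network))
print("little-endian:  ", list(little))

# Decoding with the same format string recovers the original integer.
decoded = struct.unpack("!I", network)[0]
```

Packing and unpacking with an explicit byte-order prefix keeps the code independent of the endianness of the machine it runs on.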
Bi-endianness is supported by numerous computer architectures that offer switchable endianness in data fetches and stores or in instruction fetches. Other orderings are generically called middle-endian or mixed-endian.
Origin
Endianness is primarily expressed as big-endian or little-endian, terms introduced by Danny Cohen in an Internet Experiment Note published in 1980. Cohen borrowed the terms from an absurd episode in the satirical novel Gulliver's Travels by Jonathan Swift. In the imaginary land of Lilliput, scholars are fiercely divided by a never-ending debate over how to correctly break the shell of a boiled egg. Those who insist on breaking the big end of the shell are Big-Endians, while their opponents who break the opposite end of the shell are Little-Endians.
Characteristics
Computer memory consists of a sequence of storage cells; in machines that support byte addressing, those cells are called bytes. Each byte is identified and accessed in hardware and software by its memory address. If the total number of bytes in memory is n, then addresses are enumerated from 0 to n − 1.
Computer programs often use data structures or fields that may consist of more data than can be stored in one byte. In the context of this article, where its type cannot be arbitrarily complicated, a field consists of a consecutive sequence of bytes and represents a simple data value which, at least potentially, can be manipulated by one single hardware instruction. On most systems, the address of a multi-byte simple data value is the address of its first byte. There are exceptions to this rule; for example, the Add instruction of the IBM 1401 addresses variable-length fields at their low-order position, with their lengths being defined by a word mark set at their high-order position. When an operation such as addition is performed, the processor begins at the low-order positions, at the high addresses of the two fields, and works its way down to the high-order positions.
Another important attribute of a byte being part of a field is its significance.
These attributes of the parts of a field play an important role in the sequence in which the bytes are accessed by the computer hardware, or more precisely, by the low-level algorithms contributing to the results of a computer instruction.
Numbers
Positional number systems are the predominant way of representing, and particularly of manipulating, integer data by computers. In pure form, this is valid for moderately sized non-negative integers, e.g., of the C data type unsigned. In such a number system, the value of a digit that contributes to the whole number is determined not only by its value as a single digit, but also by the position it holds in the complete number, called its significance. These positions can be mapped to memory mainly in two ways:
- Decreasing numeric significance with increasing memory addresses, known as big-endian, and
- Increasing numeric significance with increasing memory addresses, known as little-endian.
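The two mappings can be written out as positional sums, treating each byte as one base-256 digit. The function names below are hypothetical and chosen only for this sketch; index 0 of the byte sequence stands for the lowest address.

```python
def value_big_endian(data: bytes) -> int:
    """Significance decreases as the address (index) increases."""
    total = 0
    for digit in data:
        # Each step shifts the digits seen so far one base-256 place to the left.
        total = total * 256 + digit
    return total


def value_little_endian(data: bytes) -> int:
    """Significance increases as the address (index) increases."""
    total = 0
    for address, digit in enumerate(data):
        # The byte at address i carries the weight 256**i.
        total += digit * 256 ** address
    return total


memory = bytes([0x12, 0x34])
print(hex(value_big_endian(memory)))     # the same two bytes read as big-endian
print(hex(value_little_endian(memory)))  # and as little-endian
```

Both helpers agree with Python's built-in `int.from_bytes` for the corresponding `byteorder` argument.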
Text
When character strings are to be compared with one another, e.g., in order to support some mechanism like sorting, this is very frequently done lexicographically, where a single positional element also has a positional value. Lexicographical comparison almost always means that the first character ranks highest, as in a telephone book. Almost all machines that can do this using a single instruction are big-endian or at least mixed-endian.
Integer numbers written as text are always represented most significant digit first in memory, which is similar to big-endian, independently of text direction.
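One consequence of the lexicographic rule is that fixed-width unsigned integers stored big-endian sort correctly when their bytes are compared like strings, while the little-endian images do not. The following sketch assumes 4-byte unsigned values and uses arbitrary sample numbers.

```python
# Arbitrary sample values, all of which fit in four bytes.
values = [1, 256, 255, 65536, 2]

# Python compares bytes objects lexicographically: the first byte ranks highest.
sorted_by_big = sorted(values, key=lambda n: n.to_bytes(4, "big"))
sorted_by_little = sorted(values, key=lambda n: n.to_bytes(4, "little"))

print(sorted_by_big)     # matches numeric order
print(sorted_by_little)  # does not match numeric order in general
```

With big-endian images, the byte that ranks highest lexicographically is also the most significant byte, so the two orders coincide.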
Byte addressing
When memory bytes are printed sequentially from left to right, little-endian representation of integers has the significance increasing from right to left. In other words, it appears backwards when visualized, which can be counterintuitive.
This behavior arises, for example, in FourCC or similar techniques that involve packing characters into an integer, so that it becomes a sequence of specific characters in memory. For example, take the string "JOHN", stored in hexadecimal ASCII. On big-endian machines, the value appears left-to-right, coinciding with the correct string order for reading the result. But on a little-endian machine, one would see "N H O J". Middle-endian machines complicate this even further; for example, on the PDP-11, the 32-bit value is stored as two 16-bit words "JO" "HN" in big-endian, with the characters in the 16-bit words being stored in little-endian, resulting in "O J N H".
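The "JOHN" example can be reproduced with a short sketch. The function `memory_image` and its `order` labels are hypothetical names introduced here; the PDP-11 branch simply follows the description above (16-bit words in big-endian order, bytes within each word in little-endian order).

```python
def memory_image(text: str, order: str) -> bytes:
    """Return the bytes of a 4-character code as they would sit in memory."""
    # Pack the characters into one 32-bit integer, first character most significant.
    value = int.from_bytes(text.encode("ascii"), "big")
    if order == "big":
        return value.to_bytes(4, "big")
    if order == "little":
        return value.to_bytes(4, "little")
    if order == "pdp":
        high_word = value >> 16     # "JO"
        low_word = value & 0xFFFF   # "HN"
        # High word first, but each 16-bit word is stored low byte first.
        return high_word.to_bytes(2, "little") + low_word.to_bytes(2, "little")
    raise ValueError(f"unknown byte order: {order}")


for order in ("big", "little", "pdp"):
    print(order, memory_image("JOHN", order).decode("ascii"))
```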
Byte swapping
Byte-swapping consists of rearranging bytes to change endianness. Many compilers provide built-ins for this purpose that are likely to be compiled into native processor instructions. Software interfaces for swapping include:
- Standard network endianness functions. Windows has a 64-bit extension.
- BSD and Glibc functions.
- macOS macros.
- The function in C++23.
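What such a swap computes can be sketched with shifts and masks. The helper `bswap32` below is a hypothetical name for illustration only and is not the interface of any of the libraries listed above.

```python
def bswap32(x: int) -> int:
    """Reverse the byte order of a 32-bit unsigned integer."""
    if not 0 <= x <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit unsigned value")
    return (
        ((x & 0x000000FF) << 24)    # lowest byte moves to the top
        | ((x & 0x0000FF00) << 8)
        | ((x & 0x00FF0000) >> 8)
        | ((x & 0xFF000000) >> 24)  # highest byte moves to the bottom
    )


print(hex(bswap32(0x12345678)))
```

Swapping twice returns the original value, so the same routine converts in both directions between big-endian and little-endian.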
Some compilers have built-in facilities for byte swapping. For example, the Intel Fortran compiler supports a non-standard specifier when opening a file. Other compilers have options for generating code that globally enables the conversion for all file I/O operations. This permits the reuse of code on a system with the opposite endianness without code modification.
Considerations
Simplified access to part of a field
On most systems, the address of a multi-byte value is the address of its first byte; little-endian systems of that type have the property that, for sufficiently low data values, the same value can be read from memory at different lengths without using different addresses. For example, a 32-bit memory location holding a sufficiently small value can be read at the same address as either 8-bit, 16-bit, 24-bit, or 32-bit, all of which retain the same numeric value. Although this little-endian property is rarely used directly by high-level programmers, it is occasionally employed by code optimizers as well as by assembly language programmers. While not allowed by C++, such type punning code is allowed as "implementation-defined" by the C11 standard and commonly used in code interacting with hardware.
Calculation order
Some operations in positional number systems have a natural or preferred order in which the elementary steps are to be executed. This order may affect their performance on small-scale byte-addressable processors and microcontrollers. However, high-performance processors usually fetch multi-byte operands from memory in the same amount of time they would have fetched a single byte, so the complexity of the hardware is not affected by the byte ordering.
Addition, subtraction, and multiplication start at the least significant digit position and propagate the carry to the subsequent more significant position. On most systems, the address of a multi-byte value is the address of its first byte. The implementation of these operations is marginally simpler using little-endian machines, where this first byte contains the least significant digit.
Comparison and division start at the most significant digit and propagate a possible carry to the subsequent less significant digits. For fixed-length numerical values, the implementation of these operations is marginally simpler on big-endian machines.
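The two preferred directions can be sketched byte by byte. Both helpers below are hypothetical illustrations that operate on equal-length byte sequences in which index 0 is the lowest address; they are not a model of any specific processor.

```python
def add_little_endian(a: bytes, b: bytes) -> bytes:
    """Add two little-endian fields, walking upward from address 0."""
    if len(a) != len(b):
        raise ValueError("fields must have the same length")
    result = bytearray(len(a))
    carry = 0
    for i in range(len(a)):          # address 0 holds the least significant byte
        total = a[i] + b[i] + carry
        result[i] = total & 0xFF
        carry = total >> 8           # carry moves on to the next higher address
    return bytes(result)             # a carry out of the top byte is discarded


def compare_big_endian(a: bytes, b: bytes) -> int:
    """Compare two big-endian fields, walking upward from address 0."""
    if len(a) != len(b):
        raise ValueError("fields must have the same length")
    for x, y in zip(a, b):           # address 0 holds the most significant byte
        if x != y:
            return -1 if x < y else 1  # the first difference decides the result
    return 0


total = add_little_endian((0x01FF).to_bytes(2, "little"), (0x0001).to_bytes(2, "little"))
print(hex(int.from_bytes(total, "little")))
print(compare_big_endian((300).to_bytes(4, "big"), (70000).to_bytes(4, "big")))
```

In both cases the loop starts at the field's own address and only ever moves to higher addresses, which is the marginal simplification the text describes.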
Some big-endian processors contain hardware instructions for lexicographically comparing varying-length character strings.
The normal data transport by an assignment statement is, in principle, independent of the endianness of the processor.
Hardware
Many historical and extant processors use a big-endian memory representation, either exclusively or as a design option. The IBM System/360 uses big-endian byte order, as do its successors System/370, ESA/390, and z/Architecture. The PDP-10 uses big-endian addressing for byte-oriented instructions. The IBM Series/1 minicomputer uses big-endian byte order. The Motorola 6800 / 6801, the 6809 and the 68000 series of processors use the big-endian format. Solely big-endian architectures include the IBM z/Architecture and OpenRISC. The PDP-11 minicomputer, however, uses little-endian byte order, as does its VAX successor.
The Datapoint 2200 used simple bit-serial logic with little-endian to facilitate carry propagation. When Intel developed the 8008 microprocessor for Datapoint, they used little-endian for compatibility. However, as Intel was unable to deliver the 8008 in time, Datapoint used a medium-scale integration equivalent, but the little-endianness was retained in most Intel designs, including the MCS-48 and the 8086 and its x86 successors, including IA-32 and x86-64 processors. The MOS Technology 6502 family, the Zilog Z80, the Altera Nios II, the Atmel AVR, the Andes Technology NDS32, the Qualcomm Hexagon, and many other processors and processor families are also little-endian.
The Intel 8051, unlike other Intel processors, expects 16-bit addresses for LJMP and LCALL in big-endian format; however, instructions store the return address onto the stack in little-endian format.