Pointer (computer programming)

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.
Using pointers significantly improves performance for repetitive operations, like traversing iterable data structures. In particular, it is often much cheaper in time and space to copy and dereference pointers than it is to copy and access the data to which the pointers point.
Pointers are also used to hold the addresses of entry points for called subroutines in procedural programming and for run-time linking to dynamic link libraries. In object-oriented programming, pointers to functions are used for binding methods, often using virtual method tables.
A pointer is a simple, more concrete implementation of the more abstract reference data type. Several languages, especially low-level languages, support some type of pointer, although some have more restrictions on their use than others. While "pointer" has been used to refer to references in general, it more properly applies to data structures whose interface explicitly allows the pointer to be manipulated as a memory address, as opposed to a magic cookie or capability which does not allow such. Because pointers allow both protected and unprotected access to memory addresses, there are risks associated with using them, particularly in the latter case. Primitive pointers are often stored in a format similar to an integer; however, attempting to dereference or "look up" such a pointer whose value is not a valid memory address could cause a program to crash. To alleviate this potential problem, as a matter of type safety, pointers are considered a separate type parameterized by the type of data they point to, even if the underlying representation is an integer. Other measures may also be taken, to verify that the pointer variable contains a value that is both a valid memory address and within the numerical range that the processor is capable of addressing.

History

In 1955, Soviet Ukrainian computer scientist Kateryna Yushchenko created the Address programming language that made possible indirect addressing and addresses of the highest rank – analogous to pointers. This language was widely used on the Soviet computers. However, it was unknown outside the Soviet Union and usually Harold Lawson is credited with the invention, in 1964, of the pointer. In 2000, Lawson was presented the Computer Pioneer Award by the IEEE "or inventing the pointer variable and introducing this concept into PL/I, thus providing for the first time, the capability to flexibly treat linked lists in a general-purpose high-level language". His seminal paper on the concepts appeared in the June 1967 issue of CACM entitled: PL/I List Processing. According to the Oxford English Dictionary, the word pointer first appeared in print as a stack pointer in a technical memorandum by the System Development Corporation.

Formal description

In computer science, a pointer is a kind of reference.
A data primitive is any datum that can be read from or written to computer memory using one memory access.
A data aggregate is a group of primitives that are logically contiguous in memory and that are viewed collectively as one datum. When an aggregate is entirely composed of the same type of primitive, the aggregate may be called an array; in a sense, a multi-byte word primitive is an array of bytes, and some programs use words in this way.
In the context of these definitions, a byte is the smallest primitive; each memory address specifies a different byte. The memory address of the initial byte of a datum is considered the memory address of the entire datum.
A memory pointer is a primitive, the value of which is intended to be used as a memory address; it is said that a pointer points to a memory address. It is also said that a pointer points to a datum when the pointer's value is the datum's memory address.
More generally, a pointer is a kind of reference, and it is said that a pointer references a datum stored somewhere in memory; to obtain that datum is to dereference the pointer. The feature that separates pointers from other kinds of reference is that a pointer's value is meant to be interpreted as a memory address, which is a rather low-level concept.
References serve as a level of indirection: A pointer's value determines which memory address is to be used in a calculation. Because indirection is a fundamental aspect of algorithms, pointers are often expressed as a fundamental data type in programming languages; in statically typed programming languages, the type of a pointer determines the type of the datum to which the pointer points.

Architectural roots

Pointers are a very thin abstraction on top of the addressing capabilities provided by most modern architectures. In the simplest scheme, an address, or a numeric index, is assigned to each unit of memory in the system, where the unit is typically either a byte or a word – depending on whether the architecture is byte-addressable or word-addressable – effectively transforming all of memory into a very large array. The system would then also provide an operation to retrieve the value stored in the memory unit at a given address.
In the usual case, a pointer is large enough to hold more addresses than there are units of memory in the system. This introduces the possibility that a program may attempt to access an address which corresponds to no unit of memory, either because not enough memory is installed or the architecture does not support such addresses. The first case may, in certain platforms such as the Intel x86 architecture, be called a segmentation fault. The second case is possible in the current implementation of AMD64, where pointers are 64 bit long and addresses only extend to 48 bits. Pointers must conform to certain rules, so if a non-canonical pointer is dereferenced, the processor raises a general protection fault.
On the other hand, some systems have more units of memory than there are addresses. In this case, a more complex scheme such as memory segmentation or paging is employed to use different parts of the memory at different times. The last incarnations of the x86 architecture support up to 36 bits of physical memory addresses, which were mapped to the 32-bit linear address space through the PAE paging mechanism. Thus, only 1/16 of the possible total memory may be accessed at a time. Another example in the same computer family was the 16-bit protected mode of the 80286 processor, which, though supporting only 16 MB of physical memory, could access up to 1 GB of virtual memory, but the combination of 16-bit address and segment registers made accessing more than 64 KB in one data structure cumbersome.
In order to provide a consistent interface, some architectures provide memory-mapped I/O, which allows some addresses to refer to units of memory while others refer to device registers of other devices in the computer. There are analogous concepts such as file offsets, array indices, and remote object references that serve some of the same purposes as addresses for other types of objects.

Uses

Pointers are directly supported without restrictions in languages such as PL/I, C, C++, Pascal, FreeBASIC, and implicitly in most assembly languages. They are used mainly to construct references, which in turn are fundamental to construct nearly all data structures, and to pass data between different parts of a program.
In functional programming languages that rely heavily on lists, data references are managed abstractly by using primitive constructs like cons and the corresponding elements car and cdr, which can be thought of as specialised pointers to the first and second components of a cons-cell. This gives rise to some of the idiomatic "flavour" of functional programming. By structuring data in such cons-lists, these languages facilitate recursive means for building and processing data—for example, by recursively accessing the head and tail elements of lists of lists; e.g. "taking the car of the cdr of the cdr". By contrast, memory management based on pointer dereferencing in some approximation of an array of memory addresses facilitates treating variables as slots into which data can be assigned imperatively.
When dealing with arrays, the critical lookup operation typically involves a stage called address calculation which involves constructing a pointer to the desired data element in the array. In other data structures, such as linked lists, pointers are used as references to explicitly tie one piece of the structure to another.
Pointers are used to pass parameters by reference. This is useful if the programmer wants a function's modifications to a parameter to be visible to the function's caller. This is also useful for returning multiple values from a function.
Pointers can also be used to allocate and deallocate dynamic variables and arrays in memory. Since a variable will often become redundant after it has served its purpose, it is a waste of memory to keep it, and therefore it is good practice to deallocate it when it is no longer needed. Failure to do so may result in a memory leak.

C pointers

In C, the basic syntax to define a pointer is:

int *ptr;

This declares a variable ptr that stores a pointer to an object of type int. Other types can be used in place of int; for example, bool *ptr would declare a pointer to an object of type bool.
After a pointer has been declared, it can be assigned an address. In C, the address of a variable can be retrieved with the & unary operator:

// Declare a variable for a pointer to int
int *ptr;
// Declare a variable for an int
int a = 5;
// Assign the address of a to the pointer variable
ptr = &a;

To dereference the pointer, yielding the object it points to, an asterisk can be used:

printf;
// Prints: 5

There are two styles of declarations of a pointer-to-int named ptr:

int *ptr;, which emphasises that the expression *ptr has type int.
int* ptr;, which emphasises that ptr has type int*. This style is more prevalent in C++ than C.

Both int *x, y; and int* x, y; declare x to be a pointer to int, but declare y to be an int. This is because the * applies only to the variable immediately following it.
The asterisk can also be used on the left-hand side of an assignment to change the object a without having to be in the same scope.

ptr = 8;

If a is accessed later, its new value will be 8.
Because the C language does not specify an implicit initialization for objects of automatic storage duration, pointer variables can sometimes point to unexpected locations, causing undefined behavior. To combat this, pointers are sometimes initialized with a null pointer value, represented in C by the NULL macro; in C23 and later, nullptr is also available as an alternative, which is of type nullptr_t.

int *ptr = NULL;
int *ptr = nullptr; // since C23

Dereferencing a null pointer produces undefined behavior, which can result in unpredictable bugs and results.
This example may be clearer if memory is examined directly.
Assume that a is located at address 0x8130 in memory and ptr at 0x8134; also assume this is a 32-bit machine such that an int is 32-bits wide. The following is what would be in memory after the following code snippet is executed:

int a = 5;
int *ptr = NULL;

Address	Contents
'
'

By assigning the address of a to ptr:

ptr = &a;

yields the following memory values:

Address	Contents
'
'

Then by dereferencing ptr by coding:

ptr = 8;

the computer will take the contents of ptr, 'locate' that address, and assign 8 to that location yielding the following memory:

Address	Contents
'
'

Clearly, accessing a will yield the value of 8 because the previous instruction modified the contents of a by way of the pointer ptr.