Garbage collection (computer science)
In computer science, garbage collection is a form of automatic memory management. The garbage collector attempts to reclaim memory that was allocated by the program, but is no longer referenced; such memory is called garbage. Garbage collection was invented by American computer scientist John McCarthy around 1959 to simplify manual memory management in Lisp.
Garbage collection relieves the programmer from doing manual memory management, where the programmer specifies what objects to de-allocate and return to the memory system and when to do so. Other, similar techniques include stack allocation, region inference, and memory ownership, and combinations thereof. Garbage collection may take a significant proportion of a program's total processing time, and affect performance as a result.
Resources other than memory, such as network sockets, database handles, windows, file descriptors, and device descriptors, are not typically handled by garbage collection, but rather by other methods. Some such methods de-allocate memory also.
Overview
Many programming languages require garbage collection, either as part of the language specification or effectively for practical implementation. These are said to be garbage-collected languages. Other languages, such as C and C++, were designed for use with manual memory management, but have garbage-collected implementations available. Some languages, like Ada, Modula-3, and C++/CLI, allow both garbage collection and manual memory management to co-exist in the same application by using separate heaps for collected and manually managed objects. Still others, like D, are garbage-collected but allow the user to manually delete objects or even disable garbage collection entirely when speed is required.Although many languages integrate GC into their compiler and runtime system, post-hoc GC systems also exist, such as Automatic Reference Counting. Some of these post-hoc GC systems do not require recompilation.
Advantages
GC frees the programmer from manually de-allocating memory. This helps avoid some kinds of errors:- Dangling pointers, which occur when a piece of memory is freed while there are still pointers to it, and one of those pointers is dereferenced. By then the memory may have been reassigned to another use, with unpredictable results.
- Double free bugs, which occur when the program tries to free a region of memory that has already been freed, and perhaps already been allocated again.
- Certain kinds of memory leaks, in which a program fails to free memory occupied by objects that have become unreachable, which can lead to memory exhaustion.
Disadvantages
The moment when the garbage is actually collected can be unpredictable, resulting in stalls scattered throughout a session. Unpredictable stalls can be unacceptable in real-time environments, in transaction processing, or in interactive programs. Incremental, concurrent, and real-time garbage collectors address these problems, with varying trade-offs.
Strategies
Tracing
is the most common type of garbage collection, so much so that "garbage collection" often refers to tracing garbage collection, rather than other methods such as reference counting. The overall strategy consists of determining which objects should be garbage collected by tracing which objects are reachable by a chain of references from certain root objects, and considering the rest as garbage and collecting them. However, there are a large number of algorithms used in implementation, with widely varying complexity and performance characteristics.Escape analysis
is a compile-time technique that can convert heap allocations to stack allocations, thereby reducing the amount of garbage collection to be done. This analysis determines whether an object allocated inside a function is accessible outside of it. If a function-local allocation is found to be accessible to another function or thread, the allocation is said to "escape" and cannot be done on the stack. Otherwise, the object may be allocated directly on the stack and released when the function returns, bypassing the heap and associated memory management costs.Availability
Generally speaking, higher-level programming languages are more likely to have garbage collection as a standard feature. In some languages lacking built-in garbage collection, it can be added through a library, as with the Boehm garbage collector for C and C++.Most functional programming languages, such as ML, Haskell, and APL, have garbage collection built in. Lisp is especially notable as both the first functional programming language and the first language to introduce garbage collection.
Other dynamic languages, such as Ruby and Julia, JavaScript and ECMAScript also tend to use GC. Object-oriented programming languages such as Smalltalk, ooRexx, RPL and Java usually provide integrated garbage collection. Notable exceptions are C++ and Delphi, which have destructors.
BASIC
and Logo have often used garbage collection for variable-length data types, such as strings and lists, so as not to burden programmers with memory management details. On the Altair 8800, programs with many string variables and little string space could cause long pauses due to garbage collection. Similarly the Applesoft BASIC interpreter's garbage collection algorithm repeatedly scans the string descriptors for the string having the highest address in order to compact it toward high memory, resulting in Big O notation| performance and pauses anywhere from a few seconds to a few minutes. A replacement garbage collector for Applesoft BASIC by Randy Wigginton identifies a group of strings in every pass over the heap, reducing collection time dramatically. BASIC.SYSTEM, released with ProDOS in 1983, provides a windowing garbage collector for BASIC that is many times faster.C and C++
has never offered official support for garbage collection. C++ added garbage collection support in C++11 to the standard library, however this was removed in C++23 due to no compilers implementing support for the feature. The features that were part of this were related to pointer safety.Although garbage collection support in the standard library was removed, some garbage collectors such as Boehm garbage collector can still be used. Boehm GC uses mark-and-sweep garbage collection. It can also be used in leak detection mode, where memory management is still manual however leaks and double-free errors can be detected and reported. Its use can be called from the header
.One can still abstract away manual object destructions in C++ by using the "resource acquisition is initialization" idiom and smart pointers.
std::unique_ptr ties lifetimes to ownership, while std::shared_ptr uses reference counting to determine lifetime. std::weak_ptr can be used to obtain a pointer without increasing the reference count. Unlike garbage collection, RAII is deterministic.Objective-C
While the Objective-C traditionally had no garbage collection, with the release of OS X 10.5 in 2007 Apple introduced garbage collection for Objective-C 2.0, using an in-house developed runtime collector.However, with the 2012 release of OS X 10.8, garbage collection was deprecated in favor of LLVM's automatic reference counter that was introduced with OS X 10.7. Furthermore, since May 2015 Apple even forbade the usage of garbage collection for new OS X applications in the App Store. For iOS, garbage collection has never been introduced due to problems in application responsivity and performance; instead, iOS uses ARC.
Limited environments
Garbage collection is rarely used on embedded or real-time systems because of the usual need for very tight control over the use of limited resources. However, garbage collectors compatible with many limited environments have been developed. The Microsoft.NET Micro Framework,.NET nanoFramework and Java Platform, Micro Edition are embedded software platforms that, like their larger cousins, include garbage collection.Java
Garbage collectors available in Java OpenJDKs virtual machine include:- Serial
- Parallel
- CMS
- G1
- ZGC
- Epsilon
- Shenandoah
- GenZGC
- GenShen
- IBM Metronome
- SAP
- Azul C4
Compile-time use
This form of garbage collection has been studied in the Mercury programming language, and it saw greater usage with the introduction of LLVM's automatic reference counter into Apple's ecosystem in 2011.
Real-time systems
Incremental, concurrent, and real-time garbage collectors have been developed, for example by Henry Baker and by Henry Lieberman.In Baker's algorithm, the allocation is done in either half of a single region of memory. When it becomes half full, a garbage collection is performed which moves the live objects into the other half and the remaining objects are implicitly deallocated. The running program has to check that any object it references is in the correct half, and if not move it across, while a background task is finding all of the objects.
Generational garbage collection schemes are based on the empirical observation that most objects die young. In generational garbage collection, two or more allocation regions are kept, which are kept separate based on the object's age. New objects are created in the "young" generation that is regularly collected, and when a generation is full, the objects that are still referenced from older regions are copied into the next oldest generation. Occasionally a full scan is performed.
Some high-level language computer architectures include hardware support for real-time garbage collection.
Most implementations of real-time garbage collectors use tracing. Such real-time garbage collectors meet hard real-time constraints when used with a real-time operating system.