Version control

Version control is the software engineering practice of controlling, organizing, and tracking the different versions in the history of computer files – primarily source code text files, but generally any type of file.
Version control is a component of software configuration management.
A version control system is a software tool that automates version control. Alternatively, version control is embedded as a feature of some systems such as word processors, spreadsheets, collaborative web docs, and content management systems, such as Wikipedia's page history.
Version control includes options to view old versions and to revert a file to a previous version.

Overview

As teams develop software, it is common to deploy multiple versions of the same software, and for different developers to work on one or more different versions simultaneously. Bugs or features of the software are often only present in certain versions. Therefore, for the purposes of locating and fixing bugs, it is vitally important to be able to retrieve and run different versions of the software to determine in which version the problem occurs. It may also be necessary to develop two versions of the software concurrently: for instance, where one version has bugs fixed, but no new features, while the other version is where new features are worked on.
At the simplest level, developers could retain multiple copies of the different versions of the program and label them appropriately. This simple approach has been used in many large software projects. While this method can work, it is inefficient, as many near-identical copies of the program have to be maintained. It requires a lot of self-discipline on the part of developers and often leads to mistakes. Since the code base is the same, it also requires granting read-write-execute permission to a set of developers, so someone must manage permissions to keep the code base from being compromised, which adds further complexity. Consequently, systems that automate some or all of the revision control process have been developed, abstracting away most of these operational steps.
Moreover, in software development, legal and business practice, and other environments, it has become increasingly common for a single document or snippet of code to be edited by a team, the members of which may be geographically dispersed and may pursue different and even contrary interests. Sophisticated revision control that tracks and accounts for ownership of changes to documents and code may be extremely helpful or even indispensable in such situations.
Revision control may also track changes to configuration files, such as those typically stored in /etc or /usr/local/etc on Unix systems. This gives system administrators another way to easily track changes made and a way to roll back to earlier versions should the need arise.
Many version control systems identify the version of a file as a number or letter, called the version number, version, revision number, revision, or revision level. For example, the first version of a file might be version 1. When the file is changed the next version is 2. Each version is associated with a timestamp and the person making the change. Revisions can be compared, restored, and, with some types of files, merged.
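The numbering scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names Revision and History are hypothetical and do not correspond to any particular version control system, and the full file content is stored for every revision.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    number: int          # 1 for the first version, 2 for the next, ...
    author: str          # person who made the change
    timestamp: datetime  # when the change was recorded
    content: str         # full file content at this revision


class History:
    """Hypothetical single-file history with sequential revision numbers."""

    def __init__(self):
        self._revisions = []

    def commit(self, author, content):
        # The next revision number is one more than the count so far.
        rev = Revision(len(self._revisions) + 1, author,
                       datetime.now(timezone.utc), content)
        self._revisions.append(rev)
        return rev

    def get(self, number):
        # Revision numbers are 1-based.
        return self._revisions[number - 1]

    def revert(self, number, author):
        # Restoring an old version is recorded as a new revision,
        # so the existing history is never rewritten.
        return self.commit(author, self.get(number).content)


history = History()
history.commit("alice", "hello\n")
history.commit("bob", "hello world\n")
restored = history.revert(1, "alice")
print(restored.number, repr(restored.content))
```

In this sketch a revert does not delete revision 2; it adds a third revision whose content matches revision 1, which keeps the record of who changed what intact.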

History

IBM's OS/360 IEBUPDTE software update tool dates back to 1962, arguably a precursor to version control system tools. Two source management and version control packages that were heavily used by IBM 360/370 installations were The Librarian and Panvalet.
A full system designed for source code control was started in 1972: the Source Code Control System, again for the OS/360. SCCS's user manual, especially the introduction, which was published on December 4, 1975, implied that it was the first deliberate revision control system. The Revision Control System followed in 1982 and, later, Concurrent Versions System added network and concurrent development features to RCS. After CVS, a dominant successor was Subversion, followed by the rise of distributed version control tools such as Git.

Structure

Revision control manages changes to a set of data over time. These changes can be structured in various ways.
Often the data is thought of as a collection of many individual items, such as files or documents, and changes to individual files are tracked. This is in line with the notion of keeping files separate but causes problems when identity changes, which happens when files are renamed, split or merged. Accordingly, some systems, such as Git, consider changes to the data as a whole instead, which is less intuitive for simple changes but simplifies more complex changes.
When data that is under revision control is modified, after being retrieved by checking out, this is not in general immediately reflected in the revision control system, but must instead be checked in or committed. A copy outside of revision control is known as a "working copy". As a simple example, when editing a computer file, the data stored in memory by the editing program is the working copy, which is committed by saving the file. Concretely, one may print out a document, edit it by hand, and only later manually input the changes into a computer and save it. For source code control, the working copy is instead a copy of all files in a particular revision, generally stored locally on the developer's computer; in this case saving the file only changes the working copy, and checking into the repository is a separate step.
If multiple people are working on a single data set or document, they are implicitly creating branches of the data, and thus issues of merging arise, as discussed below. For simple collaborative document editing, this can be prevented by using file locking or simply avoiding working on the same document that someone else is working on.
Revision control systems are often centralized, with a single authoritative data store, the repository, and check-outs and check-ins done with reference to this central repository. Alternatively, in distributed revision control, no single repository is authoritative, and data can be checked out and checked into any repository. When checking into a different repository, this is interpreted as a merge or patch.

Graph structure

In terms of graph theory, revisions are generally thought of as a line of development with branches off of this, forming a directed tree, visualized as one or more parallel lines of development branching off a trunk. In reality the structure is more complicated, forming a directed acyclic graph, but for many purposes "tree with merges" is an adequate approximation.
Revisions occur in sequence over time, and thus can be arranged in order, either by revision number or timestamp. Revisions are based on past revisions, though it is possible to largely or completely replace an earlier revision, such as "delete all existing text, insert new text". In the simplest case, with no branching or undoing, each revision is based on its immediate predecessor alone, and they form a simple line, with a single latest version, the "HEAD" revision or tip. In graph theory terms, drawing each revision as a point and each "derived revision" relationship as an arrow, this is a linear graph. If there is branching, so multiple future revisions are based on a past revision, or undoing, so a revision can depend on a revision older than its immediate predecessor, then the resulting graph is instead a directed tree, and has multiple tips, corresponding to the revisions without children. In principle the resulting tree need not have a preferred tip – just various different revisions – but in practice one tip is generally identified as HEAD. When a new revision is based on HEAD, it is either identified as the new HEAD, or considered a new branch. The list of revisions from the start to HEAD is the trunk or mainline. Conversely, when a revision can be based on more than one previous revision, the resulting process is called a merge, and is one of the most complex aspects of revision control. This most often occurs when changes occur in multiple branches, which are then merged into a single branch incorporating both changes. If these changes overlap, it may be difficult or impossible to merge, and require manual intervention or rewriting.
In the presence of merges, the resulting graph is no longer a tree, as nodes can have multiple parents, but is instead a rooted directed acyclic graph. The graph is acyclic since parents are always backwards in time, and rooted because there is an oldest version. Assuming there is a trunk, merges from branches can be considered as "external" to the tree – the changes in the branch are packaged up as a patch, which is applied to HEAD, creating a new revision without any explicit reference to the branch, and preserving the tree structure. Thus, while the actual relations between versions form a DAG, this can be considered a tree plus merges, and the trunk itself is a line.
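The graph view described above can be made concrete with a small sketch. It assumes a hypothetical history in which each revision records the revisions it is based on; the revision names r1 to r5 and the helper functions are illustrative only.

```python
# Each revision maps to the list of revisions it is based on (its parents).
# Hypothetical history: a branch is created at r2 and merged back at r5.
parents = {
    "r1": [],
    "r2": ["r1"],
    "r3": ["r2"],        # trunk continues
    "r4": ["r2"],        # branch starts from r2
    "r5": ["r3", "r4"],  # merge: two parents
}


def tips(graph):
    """Revisions without children, i.e. that nothing else is based on."""
    has_child = {p for ps in graph.values() for p in ps}
    return sorted(r for r in graph if r not in has_child)


def roots(graph):
    """Revisions without parents (the oldest versions)."""
    return sorted(r for r, ps in graph.items() if not ps)


def merges(graph):
    """Revisions based on more than one previous revision."""
    return sorted(r for r, ps in graph.items() if len(ps) > 1)


def ancestors(graph, rev):
    """All revisions reachable by following parent arrows from rev."""
    seen = set()
    stack = list(graph[rev])
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(graph[r])
    return seen


print(tips(parents), roots(parents), merges(parents))
```

Before r5 exists, the same functions report two tips (r3 and r4), matching the directed-tree picture; the merge revision is what turns the tree into a directed acyclic graph.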
In distributed revision control, in the presence of multiple repositories these may be based on a single original version, but there need not be an original root; instead there can be a separate root for each repository. This can happen, for example, if two people start working on a project separately. Similarly, in the presence of multiple data sets that exchange data or merge, there is no single root, though for simplicity one may think of one project as primary and the other as secondary, merged into the first with or without its own revision history.

Specialized strategies

Engineering revision control developed from formalized processes based on tracking revisions of early blueprints or bluelines. This system of control implicitly allowed returning to an earlier state of the design, for cases in which an engineering dead-end was reached in the development of the design. A revision table was used to keep track of the changes made. Additionally, the modified areas of the drawing were highlighted using revision clouds.

In business and law

Version control is widespread in business and law. Indeed, "contract redline" and "legal blackline" are some of the earliest forms of revision control, and are still employed in business and law with varying degrees of sophistication. The most sophisticated techniques are beginning to be used for the electronic tracking of changes to CAD files, supplanting the "manual" electronic implementation of traditional revision control.

In game development

Game development often involves large binary files and teams working together across different disciplines. As a result, game studios use version control systems with good support for large binary files, file locking, and fast synchronization. Common tools include Perforce and several newer cloud-based systems.

Atomic operations

An operation is atomic if the system is left in a consistent state even if the operation is interrupted. The commit operation is usually the most critical in this sense. Commits tell the revision control system to make a group of changes final, and available to all users. Not all revision control systems have atomic commits; Concurrent Versions System lacks this feature.
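One common way to get all-or-nothing behaviour for a single file is to write the new content to a temporary file and then rename it over the old one. The sketch below illustrates that idea only; the function name atomic_write is hypothetical, and a real revision control system must make a whole group of changes atomic, which takes more machinery than this.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at path with data, all or nothing."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new content to a temporary file in the same directory,
    # so that the final rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the new file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, remove the temporary file; the old content is untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process is interrupted before the rename, readers still see the complete old file; after the rename they see the complete new one.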

File locking

The simplest method of preventing "concurrent access" problems involves locking files so that only one developer at a time has write access to the central "repository" copies of those files. Once one developer "checks out" a file, others can read that file, but no one else may change that file until that developer "checks in" the updated version.
File locking has both merits and drawbacks. It can provide some protection against difficult merge conflicts when a user is making radical changes to many sections of a large file. However, if the files are left exclusively locked for too long, other developers may be tempted to bypass the revision control software and change the files locally, forcing a difficult manual merge when the other changes are finally checked in. In a large organization, files can be left "checked out" and locked and forgotten about as developers move between projects; these tools may or may not make it easy to see who has a file checked out.
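The check-out and check-in discipline described above amounts to a table of exclusive locks. The following is a minimal in-memory sketch; the LockTable class and its method names are hypothetical and not taken from any specific tool.

```python
class LockError(Exception):
    """Raised when a lock cannot be taken or released."""


class LockTable:
    """Hypothetical in-memory table of exclusive file locks."""

    def __init__(self):
        self._owners = {}  # path -> developer holding the lock

    def check_out(self, path, developer):
        owner = self._owners.get(path)
        if owner is not None and owner != developer:
            # Someone else holds the lock: reading is fine, writing is not.
            raise LockError(f"{path} is locked by {owner}")
        self._owners[path] = developer

    def check_in(self, path, developer):
        if self._owners.get(path) != developer:
            raise LockError(f"{path} is not locked by {developer}")
        del self._owners[path]

    def locked_by(self, path):
        # Makes it easy to see who has a file checked out.
        return self._owners.get(path)
```

The locked_by query addresses the drawback noted above: without some way to list lock owners, forgotten locks are hard to track down.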

Version merging

Most version control systems allow multiple developers to edit the same file at the same time. The first developer to "check in" changes to the central repository always succeeds. The system may provide facilities to merge further changes into the central repository, and preserve the changes from the first developer when other developers check in.
Merging two files can be a very delicate operation, and usually possible only if the data structure is simple, as in text files. The result of a merge of two image files might not result in an image file at all. The second developer checking in files will need to take care with the merge, to make sure that the changes are compatible and that the merge operation does not introduce its own logic errors within the files. These problems limit the availability of automatic or semi-automatic merge operations mainly to simple text-based documents, unless a specific merge plugin is available for the file types.
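For text files, the usual approach compares both edited versions against their common ancestor, a so-called three-way merge. The sketch below is deliberately simplified: it assumes lines were only modified in place, so all three versions have the same number of lines, and the function name merge3 is illustrative. Real merge tools also handle inserted and deleted lines.

```python
def merge3(base, ours, theirs):
    """Line-by-line three-way merge.

    Simplifying assumption: all three versions have the same number of
    lines, i.e. lines were modified in place but never added or removed.
    Returns (merged_lines, conflicting_line_indexes).
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:          # both sides agree (or neither changed)
            merged.append(o)
        elif o == b:        # only theirs changed this line
            merged.append(t)
        elif t == b:        # only ours changed this line
            merged.append(o)
        else:               # both changed it differently: conflict
            merged.append(o)
            conflicts.append(i)
    return merged, conflicts


base = ["a", "b", "c"]
merged, conflicts = merge3(base, ["A", "b", "c"], ["a", "b", "C"])
print(merged, conflicts)
```

Even when no conflict is reported, the merged result can still contain logic errors, since the two changes may be incompatible in meaning while touching different lines.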
The concept of a reserved edit can provide an optional means to explicitly lock a file for exclusive write access, even when a merging capability exists.

Baselines, labels and tags

Most revision control tools will use only one of these similar terms to refer to the action of identifying a snapshot or the record of the snapshot. Typically only one of the terms baseline, label, or tag is used in documentation or discussion; they can be considered synonyms.
In most projects, some snapshots are more significant than others, such as those used to indicate published releases, branches, or milestones.
When both the term baseline and either of label or tag are used together in the same context, label and tag usually refer to the mechanism within the tool of identifying or making the record of the snapshot, and baseline indicates the increased significance of any given label or tag.
Most formal discussion of configuration management uses the term baseline.

Distributed revision control

Distributed revision control systems take a peer-to-peer approach, as opposed to the client–server approach of centralized systems. Rather than a single, central repository on which clients synchronize, each peer's working copy of the codebase is a bona-fide repository.
Distributed revision control conducts synchronization by exchanging patches from peer to peer. This results in some important differences from a centralized system:

No canonical, reference copy of the codebase exists by default; only working copies.
Common operations are fast, because there is no need to communicate with a central server. Rather, communication is only necessary when pushing or pulling changes to or from other peers.
Each working copy effectively functions as a remote backup of the codebase and of its change-history, providing inherent protection against data loss.

Best practices

Following best practices is necessary to obtain the full benefits of version control. Best practice may vary by version control tool and the field to which version control is applied. The generally accepted best practices in software development include: making small, incremental changes; making commits which involve only one task or fix (as a corollary, committing only code which works and does not knowingly break existing functionality); using branching to complete functionality before release; writing clear and descriptive commit messages that make what, why, and how clear in either the commit description or the code; and using a consistent branching strategy. Other software development best practices, such as code review and automated regression testing, may assist in following version control best practices.

Costs and benefits

Costs and benefits will vary dependent upon the version control tool chosen and the field in which it is applied. This section speaks to the field of software development, where version control is widely applied.

Costs

In addition to the costs of licensing the version control software, using version control requires time and effort. The concepts underlying version control must be understood and the technical particulars required to operate the version control software chosen must be learned. Version control best practices must be learned and integrated into the organization's existing software development practices. Management effort may be required to maintain the discipline needed to follow best practices in order to obtain useful benefit.

Benefits

Allows for reverting changes

A core benefit is the ability to keep history and revert changes, allowing the developer to easily undo changes. This gives the developer more opportunity to experiment, eliminating the fear of breaking existing code.

Branching simplifies deployment, maintenance and development

Branching assists with deployment and release management. Branching and merging, together with the production, packaging, and labeling of source code patches and the easy application of patches to code bases, simplify the maintenance and concurrent development of the multiple code bases associated with the various stages of the deployment process: development, testing, staging, production, etc.

Damage mitigation, accountability and process and design improvement

There can be damage mitigation, accountability, process and design improvement, and other benefits associated with the record keeping provided by version control, the tracking of who did what, when, why, and how.
When bugs arise, knowing what was done when helps with damage mitigation and recovery by assisting in the identification of what problems exist, how long they have existed, and determining problem scope and solutions. Previous versions can be installed and tested to verify conclusions reached by examination of code and commit messages.

Simplifies debugging

Version control can greatly simplify debugging. The application of a test case to multiple versions can quickly identify the change which introduced a bug. The developer need not be familiar with the entire code base and can focus instead on the code that introduced the problem.
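When revisions are ordered and a test can tell a good revision from a bad one, the offending change can be located by binary search rather than by testing every revision. Some tools automate this search (Git, for example, has a bisect command). The sketch below shows the idea only; first_bad and is_bad are hypothetical names, and it assumes that once the bug appears it is present in every later revision.

```python
def first_bad(revisions, is_bad):
    """Return the earliest revision for which is_bad(revision) is true.

    Assumes revisions are ordered oldest to newest, the last one is bad,
    and once the bug appears it stays in every later revision.
    """
    lo, hi = 0, len(revisions) - 1   # invariant: revisions[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid          # bug is at mid or earlier
        else:
            lo = mid + 1      # bug was introduced after mid
    return revisions[lo]


# Hypothetical example: 100 revisions, bug introduced in revision 37.
tested = []


def is_bad(rev):
    tested.append(rev)        # record which revisions were actually tested
    return rev >= 37


print(first_bad(list(range(1, 101)), is_bad), len(tested))
```

Each test halves the range of candidate revisions, so 100 revisions need at most 7 test runs instead of up to 100.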

Improves collaboration and communication

Version control enhances collaboration in multiple ways. Since version control can identify conflicting changes, i.e. incompatible changes made to the same lines of code, there is less need for coordination among developers.
The packaging of commits, branches, and all the associated commit messages and version labels, improves communication between developers, both in the moment and over time. Better communication, whether instant or deferred, can improve the code review process, the testing process, and other critical aspects of the software development process.

Integration

Some of the more advanced revision-control tools offer many other facilities, allowing deeper integration with other tools and software-engineering processes.

Integrated development environment

Plugins are often available for IDEs such as Oracle JDeveloper, IntelliJ IDEA, Eclipse, Visual Studio, Delphi, NetBeans IDE, Xcode, and GNU Emacs. Advanced research prototypes generate appropriate commit messages.

Common terminology

Terminology can vary from system to system, but some terms in common usage include:

Baseline

An approved revision of a document or source file to which subsequent changes can be made. See baselines, labels and tags.

Blame

A search for the author and revision that last modified a particular line.

Branch

A set of files under version control may be branched or forked at a point in time so that, from that time forward, two copies of those files may develop at different speeds or in different ways independently of each other.

Change

A change represents a specific modification to a document under version control. The granularity of the modification considered a change varies between version control systems.

Change list

On many version control systems with atomic multi-change commits, a change list, change set, update, or patch identifies the set of changes made in a single commit. This can also represent a sequential view of the source code, allowing the examination of source as of any particular changelist ID.

Checkout

To check out is to create a local working copy from the repository. A user may specify a specific revision or obtain the latest. The term 'checkout' can also be used as a noun to describe the working copy. When a file has been checked out from a shared file server, it cannot be edited by other users. Think of it like a hotel: when you check out, you no longer have access to its amenities.

Clone

Cloning means creating a repository containing the revisions from another repository. This is equivalent to pushing or pulling into an empty repository. As a noun, two repositories can be said to be clones if they are kept synchronized, and contain the same revisions.

Commit (verb)

To commit is to write or merge the changes made in the working copy back to the repository. A commit contains metadata, typically the author information and a commit message that describes the change.

Commit message

A short note, written by the developer, stored with the commit, which describes the commit. Ideally, it records why the modification was made, a description of the modification's effect or purpose, and non-obvious aspects of how the change works.

Conflict

A conflict occurs when different parties make changes to the same document, and the system is unable to reconcile the changes. A user must resolve the conflict by combining the changes, or by selecting one change in favour of the other.

Delta compression

Most revision control software uses delta compression, which retains only the differences between successive versions of files. This allows for more efficient storage of many different versions of files.
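The idea of storing only differences can be illustrated with Python's standard difflib module. This is a sketch under simplifying assumptions, not the storage format of any real system: make_delta and apply_delta are hypothetical helpers, and the delta keeps references to unchanged line ranges plus the text of new lines.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe how to turn old into new, storing only the new lines."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: store only a reference into the old version.
            delta.append(("copy", i1, i2))
        elif j1 < j2:
            # Replaced or inserted lines: store their text.
            delta.append(("insert", new[j1:j2]))
        # Deleted lines need no entry: they are simply never copied.
    return delta


def apply_delta(old, delta):
    """Rebuild the new version from the old version and a delta."""
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(old[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


old = ["a", "b", "c", "d"]
new = ["a", "x", "c", "d", "e"]
delta = make_delta(old, new)
print(apply_delta(old, delta) == new)
```

Only the changed lines are stored in full; the rest of the new version is reconstructed from the previous one, which is why many near-identical versions take little extra space.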

Dynamic stream

A stream in which some or all file versions are mirrors of the parent stream's versions.

Export

Exporting is the act of obtaining the files from the repository. It is similar to checking out except that it creates a clean directory tree without the version-control metadata used in a working copy. This is often used prior to publishing the contents, for example.

Fetch

See pull.

Forward integration

The process of merging changes made in the main trunk into a development branch.

Head

Also sometimes called tip, this refers to the most recent commit, either to the trunk or to a branch. The trunk and each branch have their own head, though HEAD is sometimes loosely used to refer to the trunk.

Import

Importing is the act of copying a local directory tree into the repository for the first time.

Initialize

To create a new, empty repository.

Interleaved deltas

Some revision control software uses interleaved deltas, a method that allows storing the history of text-based files in a more efficient way than by using delta compression.

Label

See tag.

Locking

When a developer locks a file, no one else can update that file until it is unlocked. Locking can be supported by the version control system, or via informal communications between developers.

Mainline

Similar to trunk, but there can be a mainline for each branch.

Merge

A merge or integration is an operation in which two sets of changes are applied to a file or set of files. Some sample scenarios are as follows:

A user, working on a set of files, updates or syncs their working copy with changes made, and checked into the repository, by other users.
A user tries to check in files that have been updated by others since the files were checked out, and the revision control software automatically merges the files.
A branch is created, the code in the files is independently edited, and the updated branch is later incorporated into a single, unified trunk.
A set of files is branched, a problem that existed before the branching is fixed in one branch, and the fix is then merged into the other branch.

Promote

The act of copying file content from a less controlled location into a more controlled location. For example, from a user's workspace into a repository, or from a stream to its parent.

Pull, push

Copy revisions from one repository into another. Pull is initiated by the receiving repository, while push is initiated by the source. Fetch is sometimes used as a synonym for pull, or to mean a pull followed by an update.
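The difference between the two operations is only in which side initiates the copy. A minimal sketch follows, assuming a hypothetical Repo class that stores revisions in a dictionary keyed by revision identifier; real systems also transfer ancestry information and update branch heads.

```python
class Repo:
    """Hypothetical repository holding revisions keyed by identifier."""

    def __init__(self, name):
        self.name = name
        self.revisions = {}  # revision id -> content

    def pull(self, source):
        """Initiated by the receiver: copy revisions it lacks from source."""
        missing = {rid: content
                   for rid, content in source.revisions.items()
                   if rid not in self.revisions}
        self.revisions.update(missing)
        return sorted(missing)

    def push(self, target):
        """Initiated by the source: copy revisions the target lacks."""
        return target.pull(self)


origin = Repo("origin")
origin.revisions = {"r1": "first", "r2": "second"}

clone = Repo("clone")
clone.pull(origin)           # pulling into an empty repository acts as a clone
clone.revisions["r3"] = "third"
pushed = clone.push(origin)  # send the new revision back
print(pushed)
```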

Resolve

The act of user intervention to address a conflict between different changes to the same document.

Reverse integration

The process of merging different team branches into the main trunk of the versioning system.

Revision and version

A version is any change in form. In SVK, a Revision is the state at a point in time of the entire tree in the repository.

Share

The act of making one file or folder available in multiple branches at the same time. When a shared file is changed in one branch, it is changed in other branches.

Stream

A container for branched files that has a known relationship to other such containers. Streams form a hierarchy; each stream can inherit various properties from its parent stream.

Tag

A tag or label refers to an important snapshot in time, consistent across many files. These files at that point may all be tagged with a user-friendly, meaningful name or revision number. See baselines, labels and tags.

Trunk

The trunk is the unique line of development that is not a branch.

Update

An update merges changes made in the repository into the local working copy. Update is also the term used by some CM tools for the change package concept. It is synonymous with checkout in revision control systems that require each repository to have exactly one working copy.

Unlocking

Releasing a lock.

Working copy

The working copy is the local copy of files from a repository, at a specific time or revision. All work done to the files in a repository is initially done on a working copy, hence the name. Conceptually, it is a sandbox.