Document comparison
In computing, document comparison, also known as redlining or blacklining, is a process by which changes are identified between two versions of the same document for the purposes of document editing and review. Document comparison is a common task in the legal and financial industries.
The software-based document comparison process compares a reference document to a target document, and produces a third document which indicates information that has either been added to or removed from the reference document to produce the target document.
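The process described above can be illustrated with a minimal sketch. It assumes plain-text input and word-level granularity, and uses Python's standard `difflib` module. The function name `redline` and the `[-...-]` / `{+...+}` markers are illustrative choices, not the notation of any particular comparison product.

```python
import difflib


def redline(reference: str, target: str) -> str:
    """Return a third text that marks words removed from the reference
    as [-...-] and words added in the target as {+...+}."""
    ref_words = reference.split()
    tgt_words = target.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=tgt_words, autojunk=False)

    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(" ".join(ref_words[i1:i2]))
        if tag in ("delete", "replace"):
            # Words present only in the reference document.
            out.append("[-" + " ".join(ref_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            # Words present only in the target document.
            out.append("{+" + " ".join(tgt_words[j1:j2]) + "+}")
    return " ".join(out)


print(redline("The fee is 100 dollars", "The fee is 250 dollars"))
# The fee is [-100-] {+250+} dollars
```

Real comparison software works on formatted documents rather than whitespace-separated words, so this sketch only shows the reference, target, and redline relationship.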
Common document formats for comparison include word processing documents, spreadsheets, presentations, and Portable Document Format documents.
Overview
In the broadest definition, document comparison can refer to any act of marking changes made between two versions of the same document and presenting those changes in a third document via a graphical user interface. Programs differ in the types of changes they register. Some limit comparison to text and table content in word processing documents, while others also register changes made in spreadsheets, presentations, and PDF documents. Certain programs also compare plain text files and images embedded in documents, such as JPEG, TIFF, BMP, and PNG images.
Document comparison solutions mark changes made to the following types of documents:
| Document | Types of change |
| --- | --- |
| Word processing documents | Text in paragraphs and in text boxes; bullets and numbering; tables of contents; applied styles; design and layout elements; tables, including additions and deletions of rows; embedded objects; inserted images. |
| Spreadsheets | Values; formulas; additions and deletions of rows and columns; applied styles; design and layout elements. |
| Presentations | Text, table, image and other changes. |
| PDF Documents | Text, table, image and other changes. |
Document comparison software commonly presents several views of the compared documents in separate windows of a GUI, displayed on one or more computer monitors. Each window shows one of the following:
- the original document
- the modified document
- the redline document, and
- the list of changes made between document versions.
History
Prior to personal computers, document comparison entailed printing two versions of a single document and reviewing the hard copies in detail for changes. The process was prone to human error and consumed considerable administrative time. A ruler and a red pen were used to draw strike-through lines over deleted text and to double-underline inserted text. The term "redline" came from using a red pen on the original/current version. When the document was placed in a copy machine, the copies came out black, thus the term "blackline."
With the advent of personal computers and the ubiquity of word processing software, the need arose to manage changes made to document versions shared via disk, and later email. Mitigating the risks associated with document changes became essential as the volume of document and revision sharing increased. Early document comparison software checked all the text in two documents for changes and then presented those changes in a third redline/comparison version.
As documents changed and evolved, so did document comparison solutions. Word processing software began using tables to manage a variety of document layouts, and many document comparison solutions had difficulty comparing tables across document versions. These solutions first converted tables to text arrays and then compared the resulting arrays. In many cases the software did not perform sufficient checks, and users were not informed of sections that had not been successfully compared. In the second generation, Microsoft’s Track Changes option was also introduced. With Track Changes, all changes made to a document were captured and stored inside the document. Flaws in the functionality of Track Changes could render documents unusable, and some comparison offerings again had difficulty managing the complex process of comparing in a Track Changes environment.
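The table-to-text-array approach can be sketched as follows. This is an illustration under stated assumptions, not a reconstruction of how any specific product worked: tables are modelled as lists of rows of cell strings, and the names `table_to_rows` and `compare_tables` are hypothetical.

```python
import difflib

Table = list[list[str]]


def table_to_rows(table: Table) -> list[str]:
    """Flatten each table row into a single tab-separated text line."""
    return ["\t".join(cell.strip() for cell in row) for row in table]


def compare_tables(reference: Table, target: Table) -> list[tuple[str, str]]:
    """Compare two tables as arrays of text lines.

    Returns ("deleted", row) and ("inserted", row) entries in document order.
    """
    ref_rows = table_to_rows(reference)
    tgt_rows = table_to_rows(target)
    matcher = difflib.SequenceMatcher(a=ref_rows, b=tgt_rows, autojunk=False)

    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            changes.extend(("deleted", row) for row in ref_rows[i1:i2])
        if tag in ("insert", "replace"):
            changes.extend(("inserted", row) for row in tgt_rows[j1:j2])
    return changes


old = [["Item", "Price"], ["A", "1"], ["B", "2"]]
new = [["Item", "Price"], ["A", "1"], ["B", "3"], ["C", "4"]]
print(compare_tables(old, new))
# [('deleted', 'B\t2'), ('inserted', 'B\t3'), ('inserted', 'C\t4')]
```

Because each row is flattened to one string, a single edited cell is reported as a whole-row deletion plus insertion, which hints at why this approach handled tables coarsely.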
Before third-generation technology, organizations commonly had to use multiple documents for one product. A main document with various supporting documents would be used to present and share the necessary information. Later software, however, enabled multiple types of information to be presented in a single document. Compound documents could include text, tables, and various styles, and could also include a range of embedded objects, such as Excel, Visio, ChemDraw, and SmartDraw objects, and inserted images in a range of types. While this enhancement greatly increased the usefulness of documents, it added an entirely new layer of risk for organizations that needed to fully understand changes made to document versions. Most document comparison programs do not yet include mechanisms to mitigate the risk posed by changes inside embedded objects. Software that can compare embedded objects provides pixel-to-pixel comparison of images and cell-level comparison of embedded Excel spreadsheets, along with other changes made to these complex, compound documents.
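Pixel-to-pixel and cell-level comparison both amount to checking two grids position by position. The sketch below assumes the embedded object has already been extracted into a list of rows (pixel tuples for an image, cell values or formula strings for a spreadsheet). The function name `diff_grid` is hypothetical, and extracting objects from real file formats is outside its scope.

```python
from itertools import zip_longest


def diff_grid(reference, target):
    """Return (row, col, old, new) for every position whose value differs.

    Positions that exist in only one grid are reported with None on the
    missing side, so a genuine None value is indistinguishable from a
    missing position in this simplified sketch.
    """
    changes = []
    for r, (ref_row, tgt_row) in enumerate(
        zip_longest(reference, target, fillvalue=[])
    ):
        for c, (old, new) in enumerate(
            zip_longest(ref_row, tgt_row, fillvalue=None)
        ):
            if old != new:
                changes.append((r, c, old, new))
    return changes


# Cell-level comparison of two small spreadsheets (values and formulas).
sheet_a = [["Qty", "Price"], [2, "=A2*10"]]
sheet_b = [["Qty", "Price"], [3, "=A2*12"]]
print(diff_grid(sheet_a, sheet_b))
# [(1, 0, 2, 3), (1, 1, '=A2*10', '=A2*12')]

# Pixel-to-pixel comparison of two 1x2 RGB images.
image_a = [[(0, 0, 0), (255, 255, 255)]]
image_b = [[(0, 0, 0), (255, 0, 0)]]
print(diff_grid(image_a, image_b))
# [(0, 1, (255, 255, 255), (255, 0, 0))]
```

A positional comparison like this reports every shifted cell as changed when a row or column is inserted, so practical tools would need alignment logic on top of it.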