Diff: Difference between revisions

From Wikipedia, the free encyclopedia
imported>Escape Orbit
Reverted 1 edit by 129.45.25.85 (talk): ??
 
imported>C.Fred
m Reverted edit by 41.198.159.95 (talk) to last version by Gurkubondinn
 
Line 1: Line 1:
{{Short description|Standard UNIX utility for file comparison}}
{{Short description|Shell command for comparing file content}}
{{About|the utility program|the general topic of file comparison|File comparison|diffs in Wikipedia|:Help:Diff|other uses|DIFF (disambiguation){{!}}DIFF}}
{{About|the command with Unix heritage|the general topic of file comparison|File comparison|diffs in Wikipedia|:Help:Diff|other uses|DIFF (disambiguation){{!}}DIFF}}
{{Lowercase title|title=diff}}
{{Lowercase title|title=diff}}
{{Infobox software
{{Infobox software
Line 24: Line 24:
| AsOf =  
| AsOf =  
}}
}}
In [[computing]], the utility '''diff''' is a [[data comparison]] tool that computes and displays the differences between the contents of files. Unlike [[edit distance]] notions used for other purposes, diff is line-oriented rather than character-oriented, but it is like [[Levenshtein distance]] in that it tries to determine the smallest set of deletions and insertions to create one file from the other. The utility displays the changes in one of several standard formats, such that both humans and computers can parse the changes and use them for [[patch (Unix)|patching]].
<code>'''diff'''</code> is a [[shell (computing)|shell]] [[command (computing)|command]] that [[data comparison|compares]] the content of files and reports differences. The term ''diff'' is also used to identify the output of the command and is [[verbed|used as a verb]] for running the command. To diff files, one runs diff to create a diff.<ref>Eric S. Raymond (ed.), [http://catb.org/jargon/html/D/diff.html "diff"] {{Webarchive|url=https://web.archive.org/web/20140131071625/http://www.catb.org/jargon/html/D/diff.html |date=2014-01-31 }}, ''The Jargon File'', version 4.4.7</ref>


Typically, ''diff'' is used to show the changes between two versions of the same file. Modern implementations also support [[binary file]]s.<ref>MacKenzie ''et al.''  "Binary Files and Forcing Text Comparison" in ''Comparing and Merging Files with GNU Diff and Patch''.  Downloaded 28 April 2007.  [https://www.gnu.org/software/diffutils/manual/html_node/Binary.html] {{Webarchive|url=https://web.archive.org/web/20171219223414/http://www.gnu.org/software/diffutils/manual/html_node/Binary.html|date=2017-12-19}}</ref> The output is called a "diff", or a [[patch (computing)|patch]], since the output can be applied with the [[Unix]] program [[patch (Unix)|{{Mono|patch}}]]. The output of similar file comparison utilities is also called a "diff"; like the use of the word "[[grep]]" for describing the act of searching, the word ''diff'' became a generic term for calculating data difference and the results thereof.<ref>Eric S. Raymond (ed.), [http://catb.org/jargon/html/D/diff.html "diff"] {{Webarchive|url=https://web.archive.org/web/20140131071625/http://www.catb.org/jargon/html/D/diff.html |date=2014-01-31 }}, ''The Jargon File'', version 4.4.7</ref> The [[POSIX]] standard specifies the behavior of the "diff" and "patch" utilities and their file formats.<ref>{{cite book|author1 = IEEE Computer Society|author2-link = The Open Group|author2 = The Open Group|date=26 September 2008|title = Standard for Information Technology&mdash;Portable Operating System Interface (POSIX) Base Specifications, Issue 7|pages = 2599–2607|author1-link = IEEE Computer Society}} IEEE Std. 1003.1-2001 specifies traditional, "ed script", and context diff output formats; IEEE Std. 1003.1-2008 added the (by then more common) unified format.</ref>
Typically, the command is used to compare [[text file]]s, but it does support comparing [[binary file]]s. If one of the input files contains non-textual data, then the command defaults to brief-mode in which it reports only a summary indication of whether the files differ. With the {{code|--text}} option, it always reports line-based differences, but the output may be difficult to understand since binary data is generally not structured in lines like text is.<ref>MacKenzie ''et al.''  "Binary Files and Forcing Text Comparison" in ''Comparing and Merging Files with GNU Diff and Patch''.  Downloaded 28 April 2007.  [https://www.gnu.org/software/diffutils/manual/html_node/Binary.html] {{Webarchive|url=https://web.archive.org/web/20171219223414/http://www.gnu.org/software/diffutils/manual/html_node/Binary.html|date=2017-12-19}}</ref>  
 
Although the command is primarily used ad hoc to analyze changes between two files, a special use is for creating a [[Patch (computing)|patch file]] for use with the <code>[[patch (Unix)|patch]]</code> command {{endash}} which was specifically designed to use a diff output report as a patch file.
[[POSIX]] standardized the {{code|diff}} and {{code|patch}} commands including their shared file format.<ref>{{cite book|author1 = IEEE Computer Society|author2-link = The Open Group|author2 = The Open Group|date=26 September 2008|title = Standard for Information Technology&mdash;Portable Operating System Interface (POSIX) Base Specifications, Issue 7|pages = 2599–2607|author1-link = IEEE Computer Society}} IEEE Std. 1003.1-2001 specifies traditional, "ed script", and context diff output formats; IEEE Std. 1003.1-2008 added the (by then more common) unified format.</ref>


== History ==
== History ==
diff was developed in the early 1970s on the Unix operating system, which was emerging from [[Bell Labs]] in Murray Hill, New Jersey. It was part of the 5th Edition of Unix released in 1974,<ref>https://minnie.tuhs.org/cgi-bin/utree.pl?file=V5/usr/source/s1/diff1.c</ref> and was written by [[Douglas McIlroy]] and [[James W. Hunt|James Hunt]]. The underlying research was published in a 1976 paper that McIlroy co-wrote with Hunt, who developed an initial prototype of {{Mono|diff}}.<ref name="diff paper">{{cite journal|author1=James W. Hunt|author2=M. Douglas McIlroy|title=An Algorithm for Differential File Comparison|volume=41|journal=Computing Science Technical Report, Bell Laboratories|date=June 1976|url=http://www.cs.dartmouth.edu/~doug/diff.pdf|access-date=2015-05-06|archive-date=2014-12-26|archive-url=https://web.archive.org/web/20141226005228/http://www.cs.dartmouth.edu/~doug/diff.pdf|url-status=live}}</ref> The algorithm this paper described became known as the [[Hunt–Szymanski algorithm]].
The original {{Mono|diff}} [[utility software|utility]] was developed in the early 1970s for the Unix operating system, at [[Bell Labs]] in Murray Hill, New Jersey. It was part of the 5th Edition of Unix released in 1974,<ref>https://minnie.tuhs.org/cgi-bin/utree.pl?file=V5/usr/source/s1/diff1.c {{Bare URL inline|date=August 2025}}</ref> and was written by [[Douglas McIlroy]] and [[James W. Hunt|James Hunt]]. The underlying research was published in a 1976 paper that McIlroy co-wrote with Hunt, who developed an initial prototype of {{Mono|diff}}.<ref name="diff paper">{{cite journal|author1=James W. Hunt|author2=M. Douglas McIlroy|title=An Algorithm for Differential File Comparison|volume=41|journal=Computing Science Technical Report, Bell Laboratories|date=June 1976|url=http://www.cs.dartmouth.edu/~doug/diff.pdf|access-date=2015-05-06|archive-date=2014-12-26|archive-url=https://web.archive.org/web/20141226005228/http://www.cs.dartmouth.edu/~doug/diff.pdf|url-status=live}}</ref> The algorithm this paper described became known as the [[Hunt–Szymanski algorithm]].


McIlroy's work was preceded and influenced by [[Stephen C. Johnson|Steve Johnson]]'s comparison program on [[GECOS]] and [[Mike Lesk]]'s {{Mono|proof}} program. {{Mono|Proof}} also originated on Unix and, like {{Mono|diff}}, produced line-by-line changes and even used angle-brackets (">" and "&lt;") for presenting line insertions and deletions in the program's output. The [[heuristic]]s used in these early applications were, however, deemed unreliable. The potential usefulness of a diff tool provoked McIlroy into researching and designing a more robust tool that could be used in a variety of tasks, but perform well in the processing and size limitations of the [[PDP-11]]'s hardware. His approach to the problem resulted from collaboration with individuals at Bell Labs including [[Alfred Aho]], Elliot Pinson, [[Jeffrey Ullman]], and Harold S. Stone.
McIlroy's work was preceded and influenced by [[Stephen C. Johnson|Steve Johnson]]'s comparison program on [[GECOS]] and [[Mike Lesk]]'s {{Mono|proof}} program. {{Mono|Proof}} also originated on Unix and, like {{Mono|diff}}, produced line-by-line changes and even used angle-brackets (">" and "&lt;") for presenting line insertions and deletions in the program's output. The [[heuristic]]s used in these early applications were, however, deemed unreliable. The potential usefulness of a diff tool provoked McIlroy into researching and designing a more robust tool that could be used in a variety of tasks, but perform well in the processing and size limitations of the [[PDP-11]]'s hardware. His approach to the problem resulted from collaboration with individuals at Bell Labs including [[Alfred Aho]], Elliot Pinson, [[Jeffrey Ullman]], and Harold S. Stone.
Line 35: Line 38:
In the context of Unix, the use of the [[ed (Unix)|{{Mono|ed}}]] line editor provided {{Mono|diff}} with the natural ability to create machine-usable "edit scripts". These edit scripts, when saved to a file, can, along with the original file, be reconstituted by {{Mono|ed}} into the modified file in its entirety. This greatly reduced the [[secondary storage]] necessary to maintain multiple versions of a file. McIlroy considered writing a post-processor for {{Mono|diff}} where a variety of output formats could be designed and implemented, but he found it more frugal and simpler to have {{Mono|diff}} be responsible for generating the syntax and reverse-order input accepted by the {{Mono|ed}} command.
In the context of Unix, the use of the [[ed (Unix)|{{Mono|ed}}]] line editor provided {{Mono|diff}} with the natural ability to create machine-usable "edit scripts". These edit scripts, when saved to a file, can, along with the original file, be reconstituted by {{Mono|ed}} into the modified file in its entirety. This greatly reduced the [[secondary storage]] necessary to maintain multiple versions of a file. McIlroy considered writing a post-processor for {{Mono|diff}} where a variety of output formats could be designed and implemented, but he found it more frugal and simpler to have {{Mono|diff}} be responsible for generating the syntax and reverse-order input accepted by the {{Mono|ed}} command.


In 1984, [[Larry Wall]] created a separate utility, [[patch (Unix)|patch]],
In 1984, [[Larry Wall]] created the [[patch (Unix)|{{mono|patch}}]] utility (releasing its source code on the ''mod.sources'' and ''net.sources'' newsgroups<ref>{{cite newsgroup
releasing its source code on the ''mod.sources'' and ''net.sources'' newsgroups.<ref>{{cite newsgroup
  | title = A patch applier--YOU WANT THIS!!!
  | title = A patch applier--YOU WANT THIS!!!
  | author = Larry Wall
  | author = Larry Wall
Line 69: Line 71:
  | archive-url = https://web.archive.org/web/20220219233604/https://groups.google.com/g/mod.sources/c/xSQM63e39YY/m/apNNJSkJi0gJ
  | archive-url = https://web.archive.org/web/20220219233604/https://groups.google.com/g/mod.sources/c/xSQM63e39YY/m/apNNJSkJi0gJ
  | url-status = live
  | url-status = live
  }}</ref> This program modifies files using output from {{Mono|diff}} and has the ability to match context.
  }}</ref>) for patching text files: it uses the output from {{Mono|diff}}, together with the input file that holds the content before the changes, to create a file with the content after the changes.


[[X/Open]] Portability Guide issue 2 of 1987 includes diff. Context mode was added in POSIX.1-2001 (issue 6). Unified mode was added in POSIX.1-2008 (issue 7).<ref>{{man|cu|diff|SUS}}</ref>
[[X/Open]] Portability Guide issue 2 of 1987 includes diff. Context mode was added in POSIX.1-2001 (issue 6). Unified mode was added in POSIX.1-2008 (issue 7).<ref>{{man|cu|diff|SUS}}</ref>
Line 76: Line 78:


== Algorithm ==
== Algorithm ==
The operation of {{Mono|diff}} is based on solving the [[longest common subsequence problem]].<ref name="diff paper" />
Unlike [[edit distance]] notions used for other purposes, {{code|diff}} is line-oriented rather than character-oriented, but it is like [[Levenshtein distance]] in that it tries to determine the smallest set of deletions and insertions to create one file from the other.  


In this problem, given two sequences of items:
The operation of {{Mono|diff}} is based on solving the [[longest common subsequence problem]].<ref name="diff paper" /> In this problem, given two sequences of items:


  {{underline|a}} {{underline|b}} {{underline|c}} {{underline|d}} {{underline|f}} {{underline|g}} h {{underline|j}} q {{underline|z}}
  {{underline|a}} {{underline|b}} {{underline|c}} {{underline|d}} {{underline|f}} {{underline|g}} h {{underline|j}} q {{underline|z}}
Line 93: Line 95:
  +  - +  -  + + + +
  +  - +  -  + + + +
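
The longest-common-subsequence approach described above can be illustrated with a minimal Python sketch. It uses plain dynamic programming rather than the optimized algorithm from the 1976 paper, and the names <code>lcs_table</code> and <code>simple_diff</code> are hypothetical helpers chosen for this illustration. The first sequence is the one shown above; the second sequence is an assumption chosen so that the resulting changes match the <code>+ - + - + + + +</code> pattern.

```python
def lcs_table(a, b):
    """Build a table where table[i][j] is the LCS length of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def simple_diff(a, b):
    """Return (tag, item) pairs: ' ' for common, '-' only in a, '+' only in b."""
    table = lcs_table(a, b)
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # Dropping a[i] keeps an LCS of maximal length: report a deletion.
            result.append(("-", a[i]))
            i += 1
        else:
            # Otherwise b[j] is not matched here: report an insertion.
            result.append(("+", b[j]))
            j += 1
    result.extend(("-", item) for item in a[i:])
    result.extend(("+", item) for item in b[j:])
    return result


first = list("abcdfghjqz")        # sequence from the text above
second = list("abcdefgijkrxyz")   # assumed second sequence for illustration
changes = [(tag, item) for tag, item in simple_diff(first, second) if tag != " "]
print(" ".join(tag + item for tag, item in changes))
# prints: +e -h +i -q +k +r +x +y
```

The real {{Mono|diff}} applies the same idea to whole lines instead of single letters; the quadratic table used here is only practical for small inputs.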


== Usage ==
== Use ==
The <code>diff</code> command is invoked from the command line, passing it the names of two files: <code>diff ''original'' ''new''</code>. The output of the command represents the changes required to transform the ''original'' file into the ''new'' file.
The <code>diff</code> command accepts two arguments like: <code>diff ''original'' ''new''</code>. Commonly, the arguments each identify normal files, but if the two arguments identify directories, then the command compares corresponding files in the directories. With the <code>-r</code> option, it recursively descends matching subdirectories to compare files with corresponding relative paths.


If ''original'' and ''new'' are directories, then {{Mono|diff}} will be run on each file that exists in both directories. An option, <code>-r</code>, will recursively descend any matching subdirectories to compare files between directories.
===Default output format===
The example below shows the original and new file content as well as the resulting <code>diff</code> output in the default format. The output is shown with coloring to improve readability. By default, diff outputs [[plain text]], but GNU diff does use color [[syntax highlighting|highlighting]] when the {{code|--color}} option is used.{{cn|date=July 2025}}


Any of the examples in the article use the following two files, ''original'' and ''new'':
{{Col-begin}}
{{Col-begin}}
{{Col-break|width=33%}}
{{Col-break|width=33%}}
Line 162: Line 164:
</syntaxhighlight>
</syntaxhighlight>
{{col-break|width=33%}}
{{col-break|width=33%}}
The command '''<code>diff original new</code>''' produces the following ''normal diff output'':
output:
{{pre|
{{pre|
0a1,6
0a1,6
Line 186: Line 188:
> important new additions
> important new additions
> to this document.}}}}
> to this document.}}}}
{{col-end}}


'''Note''': ''Here, the diff output is shown with colors to make it easier to read. The diff utility does not produce colored output; its output is [[plain text]]. However, many tools can show the output with colors by using [[syntax highlighting]].''
In this default format, {{code|a}} stands for added, {{code|d}} for deleted and {{code|c}} for changed. The line number of the original file appears before the single-letter code and the line number of the new file appears after. The [[Less-than sign|less-than]] and [[Greater-than sign|greater-than]] signs (at the beginning of lines that are added, deleted or changed) indicate which file the lines appear in. Addition lines are added to the original file to appear in the new file. Deletion lines are deleted from the original file to be missing in the new file.
{{col-end}}
In this traditional output format, '''<samp>a</samp>''' stands for ''added'', '''<samp>d</samp>''' for ''deleted'' and '''<samp>c</samp>''' for ''changed''. Line numbers of the original file appear before <samp>a</samp>/<samp>d</samp>/<samp>c</samp> and those of the new file appear after. The [[Less-than sign|less-than]] and [[Greater-than sign|greater-than]] signs (at the beginning of lines that are added, deleted or changed) indicate which file the lines appear in. Addition lines are added to the original file to appear in the new file. Deletion lines are deleted from the original file to be missing in the new file.


By default, lines common to both files are not shown. Lines that have moved are shown as added at their new location and as deleted from their old location.<ref>{{cite book|title=Comparing and Merging Files with GNU Diff and Patch|url=https://www.gnu.org/software/diffutils/manual/|author1=David MacKenzie|author2=Paul Eggert|author3=Richard Stallman|isbn=978-0-9541617-5-0|publisher=Network Theory|year=1997|location=Bristol|access-date=2015-03-17|archive-date=2015-03-31|archive-url=https://web.archive.org/web/20150331031946/http://www.gnu.org/software/diffutils/manual/|url-status=live}}</ref> However, some diff tools highlight moved lines.
By default, lines common to both files are not shown. Lines that have moved are shown as added at their new location and as deleted from their old location.<ref>{{cite book|title=Comparing and Merging Files with GNU Diff and Patch|url=https://www.gnu.org/software/diffutils/manual/|author1=David MacKenzie|author2=Paul Eggert|author3=Richard Stallman|isbn=978-0-9541617-5-0|publisher=Network Theory|year=1997|location=Bristol|access-date=2015-03-17|archive-date=2015-03-31|archive-url=https://web.archive.org/web/20150331031946/http://www.gnu.org/software/diffutils/manual/|url-status=live}}</ref> However, some diff tools highlight moved lines.
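
The change commands of the default format, such as <code>0a1,6</code> in the output above, can be read mechanically. The following Python sketch is one possible way to split such a command into its parts; <code>parse_change_command</code> and the regular expression are hypothetical names written for this illustration, not part of any diff implementation, and the command <code>5,7c8,10</code> is an invented example.

```python
import re

# <original range><a|c|d><new range>, where a range is "N" or "N,M".
CHANGE_COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


def parse_change_command(line):
    """Split a default-format change command into (original, op, new).

    Each range is returned as a (first, last) tuple of line numbers;
    a single line number N is returned as (N, N).
    """
    match = CHANGE_COMMAND.match(line)
    if match is None:
        raise ValueError(f"not a change command: {line!r}")
    start1, end1, op, start2, end2 = match.groups()
    original = (int(start1), int(end1 or start1))
    new = (int(start2), int(end2 or start2))
    return original, op, new


print(parse_change_command("0a1,6"))     # ((0, 0), 'a', (1, 6))
print(parse_change_command("5,7c8,10"))  # ((5, 7), 'c', (8, 10))
```

In the first call, lines 1 to 6 of the new file are reported as added after line 0 of the original file, that is, at its very beginning.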


== Output variations{{anchor|variations}} ==
{{anchor|variations}}


=== Edit script ===
=== Edit script ===
An [[Ed (text editor)|ed script]] can still be generated by modern versions of diff with the <code>-e</code> option. The resulting edit script for this example is as follows:
An [[Ed (software)|ed]] script can be generated by modern versions of diff with the <code>-e</code> option. The resulting edit script for this example is as follows:


  24'''a'''
  24'''a'''
Line 217: Line 218:
  .
  .


In order to transform the content of file ''original'' into the content of file ''new'' using {{Mono|ed}}, we should append two lines to this diff file, one line containing a <code>w</code> (write) command, and one containing a <code>q</code> (quit) command (e.g. by {{code|lang=bash|printf "w\nq\n" >> mydiff}}). Here we gave the diff file the name ''mydiff'' and the transformation will then happen when we run {{code|lang=bash|ed -s original < mydiff}}.
To transform the content of the original file into the content of the new file using {{Mono|ed}}, one appends two lines to this diff file: one containing a <code>w</code> (write) command and one containing a <code>q</code> (quit) command (e.g. by {{code|lang=bash|printf "w\nq\n" >> mydiff}}). Here the diff file is named ''mydiff''; the transformation happens when one runs {{code|lang=bash|ed -s original < mydiff}}.


=== Context format ===
=== Context format ===
Line 279: Line 280:
+ to this document.
+ to this document.
</syntaxhighlight>
</syntaxhighlight>
'''Note''': ''Here, the diff output is shown with colors to make it easier to read. The diff utility does not produce colored output; its output is [[plain text]]. However, many tools can show the output with colors by using [[syntax highlighting]].''


=== Unified format ===
=== Unified format ===
The ''unified format'' (or ''unidiff'')<ref>{{Cite web|url=https://www.gnu.org/software/diffutils/manual/html_node/Detailed-Unified.html|title=Detailed Description of Unified Format|website=GNU Diffutils (version 3.7, 7 January 2018)|access-date=29 January 2020|archive-date=18 January 2020|archive-url=https://web.archive.org/web/20200118142136/http://www.gnu.org/software/diffutils/manual/html_node/Detailed-Unified.html|url-status=live}}</ref><ref>{{Cite web|url=https://www.artima.com/weblogs/viewpost.jsp?thread=164293|title=Unified Diff Format|last=van Rossum|first=Guido|website=All Things Pythonic|access-date=2020-01-29|archive-date=2019-12-25|archive-url=https://web.archive.org/web/20191225234517/https://www.artima.com/weblogs/viewpost.jsp?thread=164293|url-status=live}}</ref> inherits the technical improvements made by the context format, but produces a smaller diff with old and new text presented immediately adjacent. Unified format is usually invoked using the "<code>-u</code>" [[command-line option]]. This output is often used as input to the [[patch (Unix)|patch]] program. Many projects specifically request that "diffs" be submitted in the unified format, making unified diff format the most common format for exchange between software developers.
The ''unified format'' (or ''unidiff'')<ref>{{Cite web|url=https://www.gnu.org/software/diffutils/manual/html_node/Detailed-Unified.html|title=Detailed Description of Unified Format|website=GNU Diffutils (version 3.7, 7 January 2018)|access-date=29 January 2020|archive-date=18 January 2020|archive-url=https://web.archive.org/web/20200118142136/http://www.gnu.org/software/diffutils/manual/html_node/Detailed-Unified.html|url-status=live}}</ref><ref>{{Cite web|url=https://www.artima.com/weblogs/viewpost.jsp?thread=164293|title=Unified Diff Format|last=van Rossum|first=Guido|website=All Things Pythonic|access-date=2020-01-29|archive-date=2019-12-25|archive-url=https://web.archive.org/web/20191225234517/https://www.artima.com/weblogs/viewpost.jsp?thread=164293|url-status=live}}</ref> inherits the technical improvements made by the context format, but produces a smaller diff with old and new text presented immediately adjacent. Unified format is usually invoked using the "<code>-u</code>" [[command-line option]]. This output is often used as input to the [[patch (Unix)|patch]] program. Many projects specifically request that "diffs" be submitted in the unified format, making unified diff format the most common format for exchange between software developers.


Unified context diffs were originally developed by Wayne Davison in August 1990 (in '''unidiff''' which appeared in Volume 14 of comp.sources.misc). [[Richard Stallman]] added unified diff support to the [[GNU|GNU Project]]'s diff utility one month later, and the feature debuted in '''GNU diff''' 1.15, released in January 1991. GNU diff has since generalized the context format to allow arbitrary formatting of diffs.
Unified context diffs were originally developed by Wayne Davison in August 1990 (in '''unidiff''' which appeared in Volume 14 of comp.sources.misc). [[Richard Stallman]] added unified diff support to the [[GNU|GNU Project]]'s diff one month later, and the feature debuted in '''GNU diff''' 1.15, released in January 1991. GNU diff has since generalized the context format to allow arbitrary formatting of diffs.


The format starts with the same two-line [[Header (computing)|header]] as the context format, except that the original file is preceded by "<samp>---</samp>" and the new file is preceded by "<samp>+++</samp>". Following this are one or more '''change hunks''' that contain the line differences in the file. The unchanged, contextual lines are preceded by a space character, addition lines are preceded by a [[plus sign]], and deletion lines are preceded by a [[minus sign]].
The format starts with the same two-line [[Header (computing)|header]] as the context format, except that the original file is preceded by "<samp>---</samp>" and the new file is preceded by "<samp>+++</samp>". Following this are one or more '''change hunks''' that contain the line differences in the file. The unchanged, contextual lines are preceded by a space character, addition lines are preceded by a [[plus sign]], and deletion lines are preceded by a [[minus sign]].
Line 344: Line 344:
+to this document.
+to this document.
</syntaxhighlight>
</syntaxhighlight>
'''Note''': ''Here, the diff output is shown with colors to make it easier to read. The diff utility does not produce colored output; its output is [[plain text]]. However, many tools can show the output with colors by using [[syntax highlighting]].''


Note that to successfully separate the file names from the timestamps, the delimiter between them is a tab character. This is invisible on screen and can be lost when diffs are copy/pasted from console/terminal screens.
To successfully separate the file names from the timestamps, the delimiter between them is a tab character. This is invisible on screen and can be lost when diffs are copy/pasted from console/terminal screens.
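
The line prefixes and the tab-separated header described above can be observed with Python's standard <code>difflib</code> module, which produces unified-format output independently of the {{Mono|diff}} command. This is only a sketch: the file names, dates and line contents below are invented for the illustration.

```python
import difflib

# Invented sample content; each list item is one line including its newline.
original = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "BETA\n", "gamma\n", "delta\n"]

diff_lines = list(
    difflib.unified_diff(
        original,
        new,
        fromfile="original",
        tofile="new",
        fromfiledate="2020-01-01",
        tofiledate="2020-01-02",
    )
)

# repr() makes the otherwise invisible tab in the header lines visible.
for line in diff_lines:
    print(repr(line))
```

The first two printed lines are <code>'--- original\t2020-01-01\n'</code> and <code>'+++ new\t2020-01-02\n'</code>, showing the tab between file name and timestamp; the remaining lines begin with <code>@@</code>, a space, <code>-</code> or <code>+</code>.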


=== Extensions ===
=== Extensions ===
Line 355: Line 354:
  Index: path/to/file.cpp
  Index: path/to/file.cpp


The special case of files that do not end in a newline is not handled. Neither the unidiff utility nor the POSIX diff standard defines a way to handle this type of file. (Indeed, such files are not "text" files by strict POSIX definitions.<ref>http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_403 {{Webarchive|url=https://web.archive.org/web/20130429195728/http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_403 |date=2013-04-29 }} Section 3.206</ref>) GNU diff and git produce "\ No newline at end of file" (or a translated version) as a diagnostic, but this behavior is not portable.<ref>{{cite web |title=Incomplete Lines (Comparing and Merging Files) |url=https://www.gnu.org/software/diffutils/manual/html_node/Incomplete-Lines.html |website=www.gnu.org}}</ref> GNU patch does not seem to handle this case, while git-apply does.<ref>{{cite web |title=git: apply.c |url=https://github.com/git/git/blob/69c786637d7a7fe3b2b8f7d989af095f5f49c3a8/apply.c#LL2901C35-L2901C39 |publisher=Git |date=8 May 2023}}</ref>
The special case of files that do not end in a newline is not handled. Neither {{code|unidiff}} nor the POSIX {{code|diff}} standard defines a way to handle this type of file. (Indeed, such files are not "text" files by strict POSIX definitions.<ref>http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_403 {{Webarchive|url=https://web.archive.org/web/20130429195728/http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_403 |date=2013-04-29 }} Section 3.206</ref>) GNU diff and git produce "\ No newline at end of file" (or a translated version) as a diagnostic, but this behavior is not portable.<ref>{{cite web |title=Incomplete Lines (Comparing and Merging Files) |url=https://www.gnu.org/software/diffutils/manual/html_node/Incomplete-Lines.html |website=www.gnu.org}}</ref> GNU patch does not seem to handle this case, while git-apply does.<ref>{{cite web |title=git: apply.c |url=https://github.com/git/git/blob/69c786637d7a7fe3b2b8f7d989af095f5f49c3a8/apply.c#LL2901C35-L2901C39 |publisher=Git |date=8 May 2023}}</ref>


The [[Patch (Unix)|patch]] program does not necessarily recognize implementation-specific diff output. GNU patch is, however, known to recognize git patches and act a little differently.<ref>{{cite web |title=patch.c\src - patch.git - GNU patch |url=https://git.savannah.gnu.org/cgit/patch.git/tree/src/patch.c?id=c835ecc67b7e37c0d0b7dd7e032209fdaa285808#n1919 |website=git.savannah.gnu.org |quote=In git-style diffs, the "before" state of each patch refers to the initial state before modifying any files,..}}</ref>
The [[Patch (Unix)|patch]] program does not necessarily recognize implementation-specific diff output. GNU patch is, however, known to recognize git patches and act a little differently.<ref>{{cite web |title=patch.c\src - patch.git - GNU patch |url=https://git.savannah.gnu.org/cgit/patch.git/tree/src/patch.c?id=c835ecc67b7e37c0d0b7dd7e032209fdaa285808#n1919 |website=git.savannah.gnu.org |quote=In git-style diffs, the "before" state of each patch refers to the initial state before modifying any files,..}}</ref>
Line 393: Line 392:
'''spiff''' is a variant of ''diff'' that ignores differences in floating point calculations with roundoff errors and [[Space (punctuation)|whitespace]], both of which are generally irrelevant to source code comparison. [[Telcordia Technologies|Bellcore]] wrote the original version.<ref name="dontcallme"/><ref name="hpux"/> An [[HPUX]] port is the most current public release. spiff does not support binary files. spiff outputs to the [[standard output]] in standard diff format and accepts inputs in the [[C (programming language)|C]], [[Bourne shell]], [[Fortran]], [[Modula-2]] and [[Lisp (programming language)|Lisp]] [[programming language]]s.<ref>{{cite web|url=http://www.math.utah.edu/cgi-bin/man2html.cgi?/usr/local/man/man1/spiff.1|title=SPIFF 1|date=1988-02-02|access-date=2013-06-16|archive-date=2016-10-02|archive-url=https://web.archive.org/web/20161002135508/http://www.math.utah.edu/cgi-bin/man2html.cgi?%2Fusr%2Flocal%2Fman%2Fman1%2Fspiff.1|url-status=live}}</ref><ref>{{cite web|url=http://hpux.connect.org.uk/hppd/hpux/Text/spiff-1.0/man.html|title=Man page|location=UK|first=Daniel W|last=Nachbar|date=1988-02-02|access-date=2013-06-16|archive-date=2012-09-10|archive-url=https://web.archive.org/web/20120910004102/http://hpux.connect.org.uk/hppd/hpux/Text/spiff-1.0/man.html|url-status=live}}</ref><ref name="dontcallme">{{cite web|url=https://github.com/dontcallmedom/spiff|title=spiff|author=dontcallmedotcom|website=[[GitHub]]|access-date=2013-06-16|archive-date=2015-03-26|archive-url=https://web.archive.org/web/20150326181245/https://github.com/dontcallmedom/spiff|url-status=live}}</ref><ref>{{cite web|url=https://stackoverflow.com/a/1489107/2291035|date=2009-09-28|author=Davide|title=stackoverflow|access-date=2013-06-16|archive-date=2022-02-19|archive-url=https://web.archive.org/web/20220219233604/https://stackoverflow.com/questions/1428177/diff-tool-that-ignores-floating-point-formats-but-not-values-in-text/1489107|url-status=live}}</ref><ref 
name="hpux">{{cite web|url=http://hpux.connect.org.uk/hppd/hpux/Text/spiff-1.0/|title=HP-UX Porting and Archiving|location=UK|first=Daniel W|last=Nachbar|date=1999-12-01|access-date=2013-06-13|archive-date=2012-09-05|archive-url=https://web.archive.org/web/20120905154135/http://hpux.connect.org.uk/hppd/hpux/Text/spiff-1.0/|url-status=live}}</ref>
'''spiff''' is a variant of ''diff'' that ignores differences in floating point calculations with roundoff errors and [[Space (punctuation)|whitespace]], both of which are generally irrelevant to source code comparison. [[Telcordia Technologies|Bellcore]] wrote the original version.<ref name="dontcallme"/><ref name="hpux"/> An [[HPUX]] port is the most current public release. spiff does not support binary files. spiff outputs to the [[standard output]] in standard diff format and accepts inputs in the [[C (programming language)|C]], [[Bourne shell]], [[Fortran]], [[Modula-2]] and [[Lisp (programming language)|Lisp]] [[programming language]]s.<ref>{{cite web|url=http://www.math.utah.edu/cgi-bin/man2html.cgi?/usr/local/man/man1/spiff.1|title=SPIFF 1|date=1988-02-02|access-date=2013-06-16|archive-date=2016-10-02|archive-url=https://web.archive.org/web/20161002135508/http://www.math.utah.edu/cgi-bin/man2html.cgi?%2Fusr%2Flocal%2Fman%2Fman1%2Fspiff.1|url-status=live}}</ref><ref>{{cite web|url=http://hpux.connect.org.uk/hppd/hpux/Text/spiff-1.0/man.html|title=Man page|location=UK|first=Daniel W|last=Nachbar|date=1988-02-02|access-date=2013-06-16|archive-date=2012-09-10|archive-url=https://web.archive.org/web/20120910004102/http://hpux.connect.org.uk/hppd/hpux/Text/spiff-1.0/man.html|url-status=live}}</ref><ref name="dontcallme">{{cite web|url=https://github.com/dontcallmedom/spiff|title=spiff|author=dontcallmedotcom|website=[[GitHub]]|access-date=2013-06-16|archive-date=2015-03-26|archive-url=https://web.archive.org/web/20150326181245/https://github.com/dontcallmedom/spiff|url-status=live}}</ref><ref>{{cite web|url=https://stackoverflow.com/a/1489107/2291035|date=2009-09-28|author=Davide|title=stackoverflow|access-date=2013-06-16|archive-date=2022-02-19|archive-url=https://web.archive.org/web/20220219233604/https://stackoverflow.com/questions/1428177/diff-tool-that-ignores-floating-point-formats-but-not-values-in-text/1489107|url-status=live}}</ref><ref 
name="hpux">{{cite web|url=http://hpux.connect.org.uk/hppd/hpux/Text/spiff-1.0/|title=HP-UX Porting and Archiving|location=UK|first=Daniel W|last=Nachbar|date=1999-12-01|access-date=2013-06-13|archive-date=2012-09-05|archive-url=https://web.archive.org/web/20120905154135/http://hpux.connect.org.uk/hppd/hpux/Text/spiff-1.0/|url-status=live}}</ref>


LibXDiff is an LGPL [[Library (computing)|library]] that provides an interface to many algorithms from 1998. An improved Myers algorithm with [[Rabin fingerprint]] was originally implemented (as of the final release of 2008),<ref>{{cite web |last1=Libenzi |first1=Davide |title=LibXDiff |url=http://freshmeat.sourceforge.net/projects/xdiff-lib |website=SourceForge FreshMeat |language=en |access-date=2020-06-28 |archive-date=2020-07-01 |archive-url=https://web.archive.org/web/20200701070301/http://freshmeat.sourceforge.net/projects/xdiff-lib |url-status=live }}</ref> but [[git]] and [[libgit2]]'s fork has since expanded the repository with many of its own. One algorithm called "histogram" is generally regarded as much better than the original Myers algorithm, both in speed and quality.<ref>{{cite journal |last1=Nugroho |first1=Yusuf Sulistyo |last2=Hata |first2=Hideaki |last3=Matsumoto |first3=Kenichi |title=How different are different diff algorithms in Git?: Use --histogram for code changes |website=Empirical Software Engineering |pages=790–823 |language=en |doi=10.1007/s10664-019-09772-z |date=January 2020|s2cid=59608676 |doi-access=free |arxiv=1902.02467 }}</ref><ref>{{cite web |title=algorithm - What's the difference between 'git diff --patience' and 'git diff --histogram'? |url=https://stackoverflow.com/a/32367597/ |website=Stack Overflow |quote=This does indeed show that histogram diff slightly beats Myers, while patience is much slower than the others. |access-date=2020-06-28 |archive-date=2022-02-19 |archive-url=https://web.archive.org/web/20220219233605/https://stackoverflow.com/questions/32365271/whats-the-difference-between-git-diff-patience-and-git-diff-histogram/32367597 |url-status=live }}</ref> This is the modern version of ''LibXDiff'' used by Vim.<ref name="brabandt2018"/>
LibXDiff is an LGPL [[Library (computing)|library]] that provides an interface to many algorithms from 1998. An improved Myers algorithm with [[Rabin fingerprint]] was originally implemented (as of the final release of 2008),<ref>{{cite web |last1=Libenzi |first1=Davide |title=LibXDiff |url=https://freshmeat.sourceforge.net/projects/xdiff-lib |website=SourceForge FreshMeat |language=en |access-date=2020-06-28 |archive-date=2020-07-01 |archive-url=https://web.archive.org/web/20200701070301/http://freshmeat.sourceforge.net/projects/xdiff-lib |url-status=live }}</ref> but [[git]] and [[libgit2]]'s fork has since expanded the repository with many of its own. One algorithm called "histogram" is generally regarded as much better than the original Myers algorithm, both in speed and quality.<ref>{{cite journal |last1=Nugroho |first1=Yusuf Sulistyo |last2=Hata |first2=Hideaki |last3=Matsumoto |first3=Kenichi |title=How different are different diff algorithms in Git?: Use --histogram for code changes |website=Empirical Software Engineering |pages=790–823 |language=en |doi=10.1007/s10664-019-09772-z |date=January 2020|s2cid=59608676 |doi-access=free |arxiv=1902.02467 }}</ref><ref>{{cite web |title=algorithm - What's the difference between 'git diff --patience' and 'git diff --histogram'? |url=https://stackoverflow.com/a/32367597/ |website=Stack Overflow |quote=This does indeed show that histogram diff slightly beats Myers, while patience is much slower than the others. |access-date=2020-06-28 |archive-date=2022-02-19 |archive-url=https://web.archive.org/web/20220219233605/https://stackoverflow.com/questions/32365271/whats-the-difference-between-git-diff-patience-and-git-diff-histogram/32367597 |url-status=live }}</ref> This is the modern version of ''LibXDiff'' used by Vim.<ref name="brabandt2018"/>


==See also==
==See also==
{{colbegin}}
{{colbegin}}
*[[Comparison of file comparison tools]]
*{{Annotated link|cmp (Unix)|cmp}}
*[[Delta encoding]]
*{{Annotated link|Comparison of file comparison tools}}
*[[Difference operator]]
*{{Annotated link|Delta encoding}}
*[[Edit distance]]
*{{Annotated link|Difference operator}}
**[[Levenshtein distance]]
*{{Annotated link|File Compare}}
*[[History of software configuration management]]
*{{Annotated link|History of software configuration management}}
*[[Longest common subsequence problem]]
*{{Annotated link|Revision control}}
*[[Microsoft File Compare]]
*{{Annotated link|Software configuration management}}
*[[WinDiff|Microsoft WinDiff]]
*{{Annotated link|WinDiff}}
*[[Revision control]]
*[[Software configuration management]]
{{colend}}
 
===Other free file comparison tools===
{{colbegin}}
*[[cmp (Unix)|cmp]]
*[[comm]]
*[[tkdiff]]
*[[WinMerge]] (Microsoft Windows)
*[[meld (software)|meld]]
*[[Pretty Diff]]
{{colend}}
{{colend}}



Latest revision as of 01:30, 14 October 2025

diff is a shell command that compares the content of files and reports differences. The term diff is also used to identify the output of the command and is used as a verb for running the command. To diff files, one runs diff to create a diff.[1]

Typically, the command is used to compare text files, but it does support comparing binary files. If one of the input files contains non-textual data, then the command defaults to brief-mode in which it reports only a summary indication of whether the files differ. With the --text option, it always reports line-based differences, but the output may be difficult to understand since binary data is generally not structured in lines like text is.[2]

Although the command is primarily used ad hoc to analyze changes between two files, a special use is for creating a patch file for use with the patch command, which was specifically designed to use a diff output report as a patch file. POSIX standardized the diff and patch commands including their shared file format.[3]

History

The original diff utility was developed in the early 1970s for the Unix operating system at Bell Labs in Murray Hill, New Jersey. It was part of the 5th Edition of Unix released in 1974,[4] and was written by Douglas McIlroy and James Hunt. This research was published in a 1976 paper co-written with James W. Hunt, who developed an initial prototype of diff.[5] The algorithm this paper described became known as the Hunt–Szymanski algorithm.

McIlroy's work was preceded and influenced by Steve Johnson's comparison program on GECOS and Mike Lesk's proof program. proof also originated on Unix and, like diff, produced line-by-line changes and even used angle brackets (">" and "<") for presenting line insertions and deletions in the program's output. The heuristics used in these early applications were, however, deemed unreliable. The potential usefulness of a diff tool provoked McIlroy into researching and designing a more robust tool that could be used in a variety of tasks but still perform well within the processing and size limitations of the PDP-11's hardware. His approach to the problem resulted from collaboration with individuals at Bell Labs including Alfred Aho, Elliot Pinson, Jeffrey Ullman, and Harold S. Stone.

In the context of Unix, the use of the ed line editor provided diff with the natural ability to create machine-usable "edit scripts". These edit scripts, when saved to a file, can, along with the original file, be reconstituted by ed into the modified file in its entirety. This greatly reduced the secondary storage necessary to maintain multiple versions of a file. McIlroy considered writing a post-processor for diff where a variety of output formats could be designed and implemented, but he found it more frugal and simpler to have diff be responsible for generating the syntax and reverse-order input accepted by the ed command.

In 1984, Larry Wall created the patch utility (releasing its source code on the mod.sources and net.sources newsgroups[6][7][8]) for patching text files. It uses the output from diff, plus the diff input file holding the content before changes, to create a file with the content after changes.

X/Open Portability Guide issue 2 of 1987 includes diff. Context mode was added in POSIX.1-2001 (issue 6). Unified mode was added in POSIX.1-2008 (issue 7).[9]

In diff's early years, common uses included comparing changes in the source of software code and markup for technical documents, verifying program debugging output, comparing filesystem listings and analyzing computer assembly code. The output targeted for ed was motivated by the goal of compressing a sequence of modifications made to a file. The Source Code Control System (SCCS) and its ability to archive revisions emerged in the late 1970s as a consequence of storing edit scripts from diff.

Algorithm

Unlike edit distance notions used for other purposes, diff is line-oriented rather than character-oriented, but it is like Levenshtein distance in that it tries to determine the smallest set of deletions and insertions to create one file from the other.

The operation of diff is based on solving the longest common subsequence problem.[5] In this problem, given two sequences of items:

a b c d f g h j q z
a b c d e f g i j k r x y z

and we want to find a longest sequence of items that is present in both original sequences in the same order. That is, we want to find a new sequence which can be obtained from the first original sequence by deleting some items, and from the second original sequence by deleting other items. We also want this sequence to be as long as possible. In this case it is

a b c d  f  g  j  z

From a longest common subsequence it is only a small step to get diff-like output: if an item is absent in the subsequence but present in the first original sequence, it must have been deleted (as indicated by the '-' marks, below). If it is absent in the subsequence but present in the second original sequence, it must have been inserted (as indicated by the '+' marks).

e   h i   q   k r x y
+   - +   -   + + + +
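The step from a longest common subsequence to '+' and '-' marks can be sketched in Python. This is a minimal illustration using the textbook dynamic-programming table, not the Hunt–Szymanski algorithm or the code of any diff implementation; the function name edit_marks is a hypothetical name chosen for this example.

```python
def edit_marks(a, b):
    """Return (mark, item) pairs: ' ' for common items, '-' for items
    only in a (deleted), '+' for items only in b (inserted)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk both sequences, always moving in a direction that keeps the
    # remaining common subsequence as long as possible.
    marks = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            marks.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            marks.append(("-", a[i]))
            i += 1
        else:
            marks.append(("+", b[j]))
            j += 1
    marks.extend(("-", item) for item in a[i:])
    marks.extend(("+", item) for item in b[j:])
    return marks


if __name__ == "__main__":
    for mark, item in edit_marks("abcdfghjqz", "abcdefgijkrxyz"):
        print(mark, item)
```

The table costs time and memory proportional to the product of the two lengths, which is why practical implementations use more refined algorithms.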

Use

The diff command accepts two arguments like: diff original new. Commonly, the arguments each identify normal files, but if the two arguments identify directories, then the command compares corresponding files in the directories. With the -r option, it recursively descends matching subdirectories to compare files with corresponding relative paths.

Default output format

The example below shows the original and new file content as well as the resulting diff output in the default format. The output is shown with coloring to improve readability. By default, diff outputs plain text, but GNU diff does use color highlighting when the --color option is used.


In this default format, a stands for added, d for deleted and c for changed. The line number of the original file appears before the single-letter code and the line number of the new file appears after. The less-than and greater-than signs (at the beginning of lines that are added, deleted or changed) indicate which file the lines appear in. Addition lines are added to the original file to appear in the new file. Deletion lines are deleted from the original file to be missing in the new file.

By default, lines common to both files are not shown. Lines that have moved are shown as added at their new location and as deleted from their old location.[10] However, some diff tools highlight moved lines.


Edit script

An ed script can be generated by modern versions of diff with the -e option. The resulting edit script for this example is as follows:

24a

This paragraph contains
important new additions
to this document.
.
17c
check this document. On
.
11,15d
0a
This is an important
notice! It should
therefore be located at
the beginning of this
document!

.

In order to transform the content of the original file into the content of the new file using ed, one appends two lines to this diff file, one line containing a w (write) command, and one containing a q (quit) command (e.g. by printf "w\nq\n" >> mydiff). Here we gave the diff file the name mydiff and the transformation will then happen when we run ed -s original < mydiff.
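To illustrate why the commands in such a script appear in reverse line order, the following Python sketch applies the a (append), c (change) and d (delete) commands to a list of lines. It is a simplified model under stated assumptions, not a reimplementation of ed: the name apply_ed_script is hypothetical, only these three commands are understood, and inserted text that itself consists of a single "." is not handled.

```python
import re

# An address (N or N,M) followed by a one-letter command.
COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])$")


def apply_ed_script(lines, script):
    """Apply a diff -e style script (a/c/d commands only) to lines."""
    result = list(lines)
    script_lines = iter(script)
    for command in script_lines:
        match = COMMAND.match(command)
        if match is None:
            raise ValueError(f"unsupported ed command: {command!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)
        operation = match.group(3)

        new_text = []
        if operation in "ac":
            # Input mode: collect text up to the terminating "." line.
            for text in script_lines:
                if text == ".":
                    break
                new_text.append(text)

        if operation == "a":
            result[start:start] = new_text      # insert after line `start`
        elif operation == "c":
            result[start - 1:end] = new_text    # replace lines start..end
        else:
            del result[start - 1:end]           # delete lines start..end
    return result


if __name__ == "__main__":
    original = ["one", "two", "three", "four"]
    script = ["4a", "five", ".", "2c", "TWO", ".", "1d"]
    print(apply_ed_script(original, script))
```

Because the script works from the end of the file toward the beginning, each command's line numbers still refer to the untouched part of the original file, so no offsets need to be tracked.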

Context format

The Berkeley distribution of Unix made a point of adding the context format (-c) and the ability to recurse on filesystem directory structures (-r), adding those features in 2.8 BSD, released in July 1981. The context format of diff introduced at Berkeley helped with distributing patches for source code that may have been changed minimally.

In the context format, any changed lines are shown alongside unchanged lines before and after. The inclusion of any number of unchanged lines provides a context to the patch. The context consists of lines that have not changed between the two files and serve as a reference to locate the lines' place in a modified file and find the intended location for a change to be applied regardless of whether the line numbers still correspond. The context format introduces greater readability for humans and reliability when applying the patch, and an output which is accepted as input to the patch program. This intelligent behavior is not possible with the traditional diff output.

The number of unchanged lines shown above and below a change hunk can be defined by the user, even zero, but three lines is typically the default. If the context of unchanged lines in a hunk overlaps with an adjacent hunk, then diff will avoid duplicating the unchanged lines and merge the hunks into a single hunk.
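The merging of hunks whose context overlaps can be observed with the difflib module from Python's standard library, which produces a context-style diff of its own. This is only a sketch of the behavior using difflib, not the diff command itself; count_context_hunks is a hypothetical helper name.

```python
import difflib


def count_context_hunks(old_lines, new_lines, context):
    """Count hunks in a context diff with `context` unchanged lines."""
    diff = difflib.context_diff(old_lines, new_lines, n=context, lineterm="")
    # Each hunk is introduced by a separator line made only of asterisks.
    return sum(1 for line in diff if set(line) == {"*"})


if __name__ == "__main__":
    old = [f"line {i}" for i in range(10)]
    new = list(old)
    new[2] = "changed 2"   # first change
    new[6] = "changed 6"   # second change, three unchanged lines later

    # With three lines of context the two changes share context lines
    # and are reported as one hunk; with one line they stay separate.
    print(count_context_hunks(old, new, 3))
    print(count_context_hunks(old, new, 1))
```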

A "!" represents a change between lines that correspond in the two files, whereas a "+" represents the addition of a line, and a "-" the removal of a line. A blank space represents an unchanged line. At the beginning of the patch is the file information, including the full path and a time stamp delimited by a tab character. At the beginning of each hunk are the line numbers that apply for the corresponding change in the files. A number range appearing between sets of three asterisks applies to the original file, while sets of three dashes apply to the new file. The hunk ranges specify the starting and ending line numbers in the respective file.

The command diff -c original new produces the following output:

*** /path/to/original	timestamp
--- /path/to/new	timestamp
***************
*** 1,3 ****
--- 1,9 ----
+ This is an important
+ notice! It should
+ therefore be located at
+ the beginning of this
+ document!
+
  This part of the
  document has stayed the
  same from version to
***************
*** 8,20 ****
  compress the size of the
  changes.

- This paragraph contains
- text that is outdated.
- It will be deleted in the
- near future.

  It is important to spell
! check this dokument. On
  the other hand, a
  misspelled word isn't
  the end of the world.
--- 14,21 ----
  compress the size of the
  changes.

  It is important to spell
! check this document. On
  the other hand, a
  misspelled word isn't
  the end of the world.
***************
*** 22,24 ****
--- 23,29 ----
  this paragraph needs to
  be changed. Things can
  be added after it.
+
+ This paragraph contains
+ important new additions
+ to this document.

Unified format

The unified format (or unidiff)[11][12] inherits the technical improvements made by the context format, but produces a smaller diff with old and new text presented immediately adjacent. Unified format is usually invoked using the "-u" command-line option. This output is often used as input to the patch program. Many projects specifically request that "diffs" be submitted in the unified format, making unified diff format the most common format for exchange between software developers.

Unified context diffs were originally developed by Wayne Davison in August 1990 (in unidiff which appeared in Volume 14 of comp.sources.misc). Richard Stallman added unified diff support to the GNU Project's diff one month later, and the feature debuted in GNU diff 1.15, released in January 1991. GNU diff has since generalized the context format to allow arbitrary formatting of diffs.

The format starts with the same two-line header as the context format, except that the original file is preceded by "---" and the new file is preceded by "+++". Following this are one or more change hunks that contain the line differences in the file. The unchanged, contextual lines are preceded by a space character, addition lines are preceded by a plus sign, and deletion lines are preceded by a minus sign.

A hunk begins with range information and is immediately followed with the line additions, line deletions, and any number of the contextual lines. The range information is surrounded by double at signs, and combines onto a single line what appears on two lines in the context format (above). The format of the range information line is as follows:

@@ -l,s +l,s @@ optional section heading

The hunk range information contains two hunk ranges. The range for the hunk of the original file is preceded by a minus symbol, and the range for the new file is preceded by a plus symbol. Each hunk range is of the format l,s where l is the starting line number and s is the number of lines the change hunk applies to for each respective file. In many versions of GNU diff, each range can omit the comma and trailing value s, in which case s defaults to 1. Note that the only really interesting value is the l line number of the first range; all the other values can be computed from the diff.

The hunk range for the original should be the sum of all contextual and deletion (including changed) hunk lines. The hunk range for the new file should be a sum of all contextual and addition (including changed) hunk lines. If hunk size information does not correspond with the number of lines in the hunk, then the diff could be considered invalid and be rejected.
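The consistency rule above can be checked mechanically. The following Python sketch parses a range information line and compares the stated sizes against the hunk body. It assumes well-formed body lines that each start with a space, plus or minus sign; the names parse_hunk_header and hunk_is_consistent are hypothetical and are not taken from patch or any other tool.

```python
import re

HUNK_HEADER = re.compile(
    r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@(?: (.*))?$"
)


def parse_hunk_header(line):
    """Return (old_start, old_size, new_start, new_size, heading)."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start = int(match.group(1))
    # An omitted ",s" part means the size defaults to 1.
    old_size = int(match.group(2)) if match.group(2) is not None else 1
    new_start = int(match.group(3))
    new_size = int(match.group(4)) if match.group(4) is not None else 1
    return old_start, old_size, new_start, new_size, match.group(5) or ""


def hunk_is_consistent(header, body):
    """Check that the header sizes match the lines in the hunk body."""
    _, old_size, _, new_size, _ = parse_hunk_header(header)
    # Context and deletion lines belong to the original file,
    # context and addition lines belong to the new file.
    old_count = sum(1 for line in body if line[:1] in (" ", "-"))
    new_count = sum(1 for line in body if line[:1] in (" ", "+"))
    return old_count == old_size and new_count == new_size


if __name__ == "__main__":
    body = [
        " this paragraph needs to",
        " be changed. Things can",
        " be added after it.",
        "+",
        "+This paragraph contains",
        "+important new additions",
        "+to this document.",
    ]
    print(hunk_is_consistent("@@ -22,3 +23,7 @@", body))
```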

Optionally, the hunk range can be followed by the heading of the section or function that the hunk is part of. This is mainly useful to make the diff easier to read. When creating a diff with GNU diff, the heading is identified by regular expression matching.[13]

If a line is modified, it is represented as a deletion and addition. Since the hunks of the original and new file appear in the same hunk, such changes would appear adjacent to one another.[14] An occurrence of this in the example below is:

-check this dokument. On
+check this document. On

The command diff -u original new produces the following output:

--- /path/to/original	timestamp
+++ /path/to/new	timestamp
@@ -1,3 +1,9 @@
+This is an important
+notice! It should
+therefore be located at
+the beginning of this
+document!
+
 This part of the
 document has stayed the
 same from version to
@@ -8,13 +14,8 @@
 compress the size of the
 changes.

-This paragraph contains
-text that is outdated.
-It will be deleted in the
-near future.
-
 It is important to spell
-check this dokument. On
+check this document. On
 the other hand, a
 misspelled word isn't
 the end of the world.
@@ -22,3 +23,7 @@
 this paragraph needs to
 be changed. Things can
 be added after it.
+
+This paragraph contains
+important new additions
+to this document.

To successfully separate the file names from the timestamps, the delimiter between them is a tab character. This is invisible on screen and can be lost when diffs are copy/pasted from console/terminal screens.
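The role of the tab delimiter can be demonstrated with Python's difflib module, which writes its unified diff headers with the same tab between name and date. This is a sketch using difflib, not the diff command; the file names and timestamps are made-up values and split_header is a hypothetical helper.

```python
import difflib


def split_header(header_line):
    """Split '--- name<TAB>timestamp' (or '+++ ...') into its parts."""
    rest = header_line[4:]              # drop the '--- ' or '+++ ' marker
    name, _, stamp = rest.partition("\t")
    return name, stamp


if __name__ == "__main__":
    diff = list(difflib.unified_diff(
        ["a", "b"], ["a", "c"],
        fromfile="my files/original", tofile="my files/new",
        fromfiledate="2024-01-01", tofiledate="2024-01-02",
        lineterm="",
    ))
    print(split_header(diff[0]))
    # If the tab is replaced by a space, as can happen when copying from
    # a terminal, the name and the date can no longer be told apart.
    print(split_header(diff[0].replace("\t", " ")))
```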

Extensions

There are some modifications and extensions to the diff formats that are used and understood by certain programs and in certain contexts. For example, some revision control systems—such as Subversion—specify a version number, "working copy", or any other comment instead of or in addition to a timestamp in the diff's header section.

Some tools allow diffs for several different files to be merged into one, using a header for each modified file that may look something like this:

Index: path/to/file.cpp

The special case of files that do not end in a newline is not handled. Neither unidiff nor the POSIX diff standard defines a way to handle this type of file. (Indeed, such files are not "text" files by strict POSIX definitions.[15]) GNU diff and git produce "\ No newline at end of file" (or a translated version) as a diagnostic, but this behavior is not portable.[16] GNU patch does not seem to handle this case, while git-apply does.[17]

The patch program does not necessarily recognize implementation-specific diff output. GNU patch is, however, known to recognize git patches and act a little differently.[18]

Implementations and related programs

Changes since 1975 include improvements to the core algorithm, the addition of useful features to the command, and the design of new output formats. The basic algorithm is described in the papers An O(ND) Difference Algorithm and its Variations by Eugene W. Myers[19] and in A File Comparison Program by Webb Miller and Myers.[20] The algorithm was independently discovered and described in Algorithms for Approximate String Matching, by Esko Ukkonen.[21] The first editions of the diff program were designed for line comparisons of text files expecting the newline character to delimit lines. By the 1980s, support for binary files resulted in a shift in the application's design and implementation.

GNU diff and diff3 are included in the diffutils package with other diff and patch related utilities.[22]

Formatters and front-ends

Postprocessors sdiff and diffmk render side-by-side diff listings and apply change marks to printed documents, respectively. Both were developed elsewhere in Bell Labs in or before 1981.

Diff3 compares one file against two other files by reconciling two diffs. It was originally conceived by Paul Jensen to reconcile changes made by two people editing a common source. It is also used by revision control systems, e.g. RCS, for merging.[23]

Emacs has Ediff for showing the changes a patch would provide in a user interface that combines interactive editing and merging capabilities for patch files.

Vim provides vimdiff to compare from two to eight files, with differences highlighted in color.[24] While it historically invoked the diff program, modern Vim uses code from git's fork of the xdiff library (LibXDiff), providing improved speed and functionality.[25]

GNU Wdiff[26] is a front end to diff that shows the words or phrases that changed in a text document of written language even in the presence of word-wrapping or different column widths.

colordiff is a Perl wrapper for 'diff' and produces the same output but with colorization for added and deleted bits.[27] diff-so-fancy and diff-highlight are newer analogues.[28] "delta" is a Rust rewrite that highlights changes and the underlying code at the same time.[29]

Patchutils contains tools that combine, rearrange, compare and fix context diffs and unified diffs.[30]

Algorithmic derivatives

Utilities that compare source files by their syntactic structure have been built mostly as research tools for some programming languages;[31][32][33] some are available as commercial tools.[34][35] In addition, free tools that perform syntax-aware diff include:

  • C++: zograscope, AST-based.[36]
  • HTML: Daisydiff,[37] html-differ.
  • XML: xmldiffpatch by Microsoft and xmldiffmerge for IBM.[38][39]
  • JavaScript: astii (AST-based).
  • Multi-language: Pretty Diff (format code and then diff)[40]

spiff is a variant of diff that ignores differences in floating point calculations with roundoff errors and whitespace, both of which are generally irrelevant to source code comparison. Bellcore wrote the original version.[41][42] An HPUX port is the most current public release. spiff does not support binary files. spiff outputs to the standard output in standard diff format and accepts inputs in the C, Bourne shell, Fortran, Modula-2 and Lisp programming languages.[43][44][41][45][42]

LibXDiff is an LGPL library that provides an interface to many algorithms from 1998. An improved Myers algorithm with Rabin fingerprint was originally implemented (as of the final release of 2008),[46] but git and libgit2's fork has since expanded the repository with many of its own. One algorithm called "histogram" is generally regarded as much better than the original Myers algorithm, both in speed and quality.[47][48] This is the modern version of LibXDiff used by Vim.[25]

See also

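  • cmp
  • Comparison of file comparison tools
  • Delta encoding
  • Difference operator
  • File Compare
  • History of software configuration management
  • Revision control
  • Software configuration management
  • WinDiff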

References



  1. Eric S. Raymond (ed.), "diff", The Jargon File, version 4.4.7
  2. MacKenzie et al. "Binary Files and Forcing Text Comparison" in Comparing and Merging Files with GNU Diff and Patch. Downloaded 28 April 2007.
  3. IEEE Std. 1003.1-2001 specifies traditional, "ed script", and context diff output formats; IEEE Std. 1003.1-2008 added the (by then more common) unified format.
  4. https://minnie.tuhs.org/cgi-bin/utree.pl?file=V5/usr/source/s1/diff1.c
  5. a b Script error: No such module "Citation/CS1".
  6. Script error: No such module "citation/CS1".
  7. Script error: No such module "citation/CS1".
  8. Script error: No such module "citation/CS1".
  9. Template:Man
  10. Script error: No such module "citation/CS1".
  11. Script error: No such module "citation/CS1".
  12. Script error: No such module "citation/CS1".
  13. 2.2.3 Showing Which Sections Differences Are in, GNU diffutils manual
  14. Unified Diff Format by Guido van Rossum, June 14, 2006
  15. http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_403 Section 3.206
  16. Script error: No such module "citation/CS1".
  17. Script error: No such module "citation/CS1".
  18. Script error: No such module "citation/CS1".
  19. Script error: No such module "Citation/CS1".
  20. Script error: No such module "Citation/CS1".
  21. Script error: No such module "Citation/CS1".
  22. GNU Diff utilities. Made available by the Free Software Foundation. Free Documentation. Free source code.
  23. Script error: No such module "citation/CS1".
  24. Script error: No such module "citation/CS1".
  25. a b Script error: No such module "citation/CS1".
  26. Script error: No such module "citation/CS1".
  27. Script error: No such module "citation/CS1".
  28. Script error: No such module "citation/CS1".
  29. Script error: No such module "citation/CS1".
  30. Script error: No such module "citation/CS1".
  31. Script error: No such module "Citation/CS1".
  32. Script error: No such module "Citation/CS1".
  33. Grass. Cdiff: A syntax directed Diff for C++ programs. Proceedings USENIX C++ Conf., pp. 181-193, 1992
  34. Compare++, http://www.coodesoft.com/
  35. SmartDifferencer, http://www.semanticdesigns.com/Products/SmartDifferencer
  36. Script error: No such module "citation/CS1".
  37. DaisyDiff, https://code.google.com/p/daisydiff/
  38. xmldiffpatch, http://msdn.microsoft.com/en-us/library/aa302294.aspx
  39. xmldiffmerge, http://www.alphaworks.ibm.com/tech/xmldiffmerge
  40. Cheney, Austin. Pretty Diff - Documentation. http://prettydiff.com/documentation.php
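  41. a b dontcallmedotcom. "spiff". GitHub. https://github.com/dontcallmedom/spiff Retrieved 2013-06-16.
  42. a b Nachbar, Daniel W (1999-12-01). "HP-UX Porting and Archiving". UK. http://hpux.connect.org.uk/hppd/hpux/Text/spiff-1.0/ Retrieved 2013-06-13.
  43. "SPIFF 1". 1988-02-02. http://www.math.utah.edu/cgi-bin/man2html.cgi?/usr/local/man/man1/spiff.1 Retrieved 2013-06-16.
  44. Nachbar, Daniel W (1988-02-02). "Man page". UK. http://hpux.connect.org.uk/hppd/hpux/Text/spiff-1.0/man.html Retrieved 2013-06-16.
  45. Davide (2009-09-28). "stackoverflow". https://stackoverflow.com/a/1489107/2291035 Retrieved 2013-06-16.
  46. Libenzi, Davide. "LibXDiff". SourceForge FreshMeat. https://freshmeat.sourceforge.net/projects/xdiff-lib Retrieved 2020-06-28.
  47. Nugroho, Yusuf Sulistyo; Hata, Hideaki; Matsumoto, Kenichi (January 2020). "How different are different diff algorithms in Git?: Use --histogram for code changes". Empirical Software Engineering, pp. 790–823. doi:10.1007/s10664-019-09772-z. arXiv:1902.02467.
  48. "algorithm - What's the difference between 'git diff --patience' and 'git diff --histogram'?". Stack Overflow. https://stackoverflow.com/a/32367597/ Retrieved 2020-06-28.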