Comm: Difference between revisions

From Wikipedia, the free encyclopedia
imported>Maidenhair
Remove unnecessary nowiki tags
 
imported>John of Reading
m Comparison to diff: Typo fixing, replaced: signficantly → significantly
 
Line 1: Line 1:
{{Short description|Standard UNIX utility for comparing files}}
{{Short description |Shell command for comparing files}}
{{Other uses}}
{{Other uses}}
{{for|the Portuguese Order of Merit|ComM}}
{{for|the Portuguese Order of Merit|ComM}}
Line 22: Line 22:
| website                =  
| website                =  
}}
}}
The {{mono|'''comm'''}} command in the [[Unix]] family of computer [[operating system]]s is a utility that is used to compare two [[computer file|files]] for common and distinct lines. {{Mono|comm}} is specified in the [[POSIX]] standard. It has been widely available on [[Unix-like]] operating systems since the mid to late 1980s.<!-- Case Larsen BSD 1989 (as found in OpenBSD), Richard Stallman and David MacKenzie Gnu 1986 -->
<code>'''comm'''</code> is a [[shell (computing)|shell]] [[command (computing)|command]] for comparing two [[computer file |files]] for common and distinct lines. It reads the files as lines of text and outputs text as three columns. The first two columns contain lines unique to the first and second file, respectively. The last column contains lines common to both. Columns are typically separated with the [[tab character]]. If the input text contains lines beginning with the separator character, the output columns can become ambiguous.


==History==
For efficiency, standard implementations of {{code|comm}} expect both input files to be sequenced in the same line [[collation]] order, sorted lexically. The <code>[[sort (Unix)|sort]]</code> command can be used for this purpose. The {{code|comm}} algorithm makes use of the collating sequence of the current [[Locale (computer software)|locale]]. If the lines in the files are not both collated in accordance with the current locale, the result is undefined.
Written by [[Lee E. McMahon]], {{Mono|comm}} first appeared in [[Version 4 Unix]].<ref name="reader">{{cite tech report
 
The command is specified in the [[POSIX]] standard. It has been widely available on [[Unix-like]] operating systems since the mid to late 1980s. Originally implemented by [[Lee E. McMahon]], the command first appeared in [[Version 4 Unix]].<ref name="reader">{{cite tech report
  | first1      = M. D.
  | first1      = M. D.
  | last1      = McIlroy
  | last1      = McIlroy
Line 33: Line 34:
  | title      = A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 |series=CSTR
  | title      = A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 |series=CSTR
  | number      = 139
  | number      = 139
  | institution = Bell Labs}}</ref>
  | institution = Bell Labs}}</ref> The version in [[GNU]] [[coreutils]] was written by [[Richard Stallman]] and David MacKenzie.<ref>{{Cite web|url=https://linux.die.net/man/1/comm|title = Comm(1): Compare two sorted files line by line - Linux man page}}</ref>
 
The version of {{mono|'''comm'''}} bundled in [[GNU]] [[coreutils]] was written by [[Richard Stallman]] and David MacKenzie.<ref>{{Cite web|url=https://linux.die.net/man/1/comm|title = Comm(1): Compare two sorted files line by line - Linux man page}}</ref>
 
==Usage==
{{Mono|comm}} reads two files as input, regarded as lines of text. {{Mono|comm}} outputs one file, which contains three columns.  The first two columns contain lines unique to the first and second file, respectively. The last column contains lines common to both. This functionally is similar to {{Mono|[[diff]]}}.
 
Columns are typically distinguished with the {{Mono|''<tab>''}} character. If the input files contain lines beginning with the separator character, the output columns can become ambiguous.
 
For efficiency, standard implementations of {{Mono|comm}} expect both input files to be sequenced in the same line [[collation]] order, sorted lexically. The [[sort (Unix)]] command can be used for this purpose.
 
The {{Mono|comm}} algorithm makes use of the collating sequence of the current [[Locale (computer software)|locale]]. If the lines in the files are not both collated in accordance with the current locale, the result is undefined.
 
==Return code==
Unlike {{Mono|diff}}, the return code from {{Mono|comm}} has no logical significance concerning the relationship of the two files. A return code of 0 indicates success, a return code >0 indicates an error occurred during processing.


==Example==
==Example==
Line 91: Line 78:
|}
|}


==Comparison to diff==
==Limits==
In general terms, {{Mono|diff}} is a more powerful utility than {{Mono|comm}}. The simpler {{Mono|comm}} is best suited for use in scripts.
Up to a full line must be buffered from each input file during line comparison, before the next output line is written.


The primary distinction between {{Mono|comm}} and {{Mono|diff}} is that {{Mono|comm}} discards information about the order of the lines prior to sorting.
Some implementations read lines with the function {{code|readlinebuffer()}} which does not impose any line length limits if system memory suffices.


A minor difference between {{Mono|comm}} and {{Mono|diff}} is that {{Mono|comm}} will not try to indicate that a line has "changed" between the two files; lines are either shown in the "from file #1", "from file #2", or "in both" columns. This can be useful if one wishes two lines to be considered different even if they only have subtle differences.
Other implementations read lines with the function <code>[[fgets]]()</code>. This function requires a fixed buffer. For these implementations, the buffer is often sized according to the [[POSIX]] macro {{code|LINE_MAX}}.


==Other options==
==Comparison to diff==
'''{{Mono|comm}}''' has [[command-line option]]s to suppress any of the three columns. This is useful for scripting.
Although also a file comparison command, <code>[[diff]]</code> reports significantly different information than {{code |comm}}. In general, {{code|diff}} is more powerful than {{code|comm}}. The simpler {{code|comm}} is best suited for use in scripts.


There is also an option to read one file (but not both) from standard input.
The primary distinction between {{code|comm}} and {{code|diff}} is that {{code|comm}} discards information about the order of the lines prior to sorting.
 
==Limits==
Up to a full line must be buffered from each input file during line comparison, before the next output line is written.


Some implementations read lines with the function {{Mono|readlinebuffer()}} which does not impose any line length limits if system memory suffices.
A minor difference between {{code|comm}} and {{code|diff}} is that {{code|comm}} will not try to indicate that a line has changed between the two files; lines are either shown in the "from file #1", "from file #2", or "in both" columns. This can be useful if one wishes two lines to be considered different even if they only have subtle differences.


Other implementations read lines with the function {{Mono|[[fgets]]()}}. This function requires a fixed buffer. For these implementations, the buffer is often sized according to the [[POSIX]] macro {{Mono|LINE_MAX}}.
Unlike for {{code|diff}}, the [[exit code]] of {{code|comm}} does not indicate whether the files match. As is typical, 0 indicates success, and other positive values indicate an error.


==See also==
==See also==
* [[Comparison of file comparison tools]]
* {{Annotated link |Comparison of file comparison tools}}
* [[List of Unix commands]]
* {{Annotated link |List of POSIX commands}}
* [[cmp (Unix)]] – character oriented file comparison
* {{Annotated link |cmp (Unix)}}
* [[cut (Unix)]] – splitting column-oriented files
* {{Annotated link |cut (Unix)}}


==References==
==References==

Latest revision as of 15:42, 20 October 2025


comm is a shell command for comparing two files for common and distinct lines. It reads the files as lines of text and outputs text as three columns. The first two columns contain lines unique to the first and second file, respectively. The last column contains lines common to both. Columns are typically separated with the tab character. If the input text contains lines beginning with the separator character, the output columns can become ambiguous.

For efficiency, standard implementations of comm expect both input files to be sorted lexically in the same collation order. The sort command can be used for this purpose. The comm algorithm uses the collating sequence of the current locale. If either file is not sorted in accordance with the current locale, the result is undefined.
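The sorting requirement can be sketched as follows. This is a minimal illustration, not a canonical recipe: the scratch directory, the file names and the choice of the C locale are assumptions made for the example, and mktemp -d is widely available but not required by POSIX. The point is only that sort and comm run under the same locale.

```shell
# Work in a scratch directory; the file names are illustrative only.
tmpdir=$(mktemp -d)
printf 'banana\napple\neggplant\n' > "$tmpdir/foo"
printf 'zucchini\nbanana\napple\n' > "$tmpdir/bar"

# Pin the locale so that sort and comm agree on the collating sequence.
LC_ALL=C sort "$tmpdir/foo" > "$tmpdir/foo.sorted"
LC_ALL=C sort "$tmpdir/bar" > "$tmpdir/bar.sorted"

# Compare the sorted copies under the same locale.
result=$(LC_ALL=C comm "$tmpdir/foo.sorted" "$tmpdir/bar.sorted")
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -r "$tmpdir"
```

Sorting under one locale and comparing under another is the usual way to end up with the undefined result described above.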

The command is specified in the POSIX standard. It has been widely available on Unix-like operating systems since the mid to late 1980s. Originally implemented by Lee E. McMahon, the command first appeared in Version 4 Unix.[1] The version in GNU coreutils was written by Richard Stallman and David MacKenzie.[2]

Example

$ cat foo
apple
banana
eggplant
$ cat bar
apple
banana
banana
zucchini
$ comm foo bar
                  apple
                  banana
          banana
eggplant
          zucchini

This shows that both files contain apple and one banana, that only bar has a second banana and zucchini, and that only foo has eggplant.

In more detail, the output has the following form. The column of each line is determined by its number of leading tab characters. In the table below, \t represents a tab character and \n represents a newline.

    0   1   2   3   4   5   6   7   8   9
0  \t  \t   a   p   p   l   e  \n
1  \t  \t   b   a   n   a   n   a  \n
2  \t   b   a   n   a   n   a  \n
3   e   g   g   p   l   a   n   t  \n
4  \t   z   u   c   c   h   i   n   i  \n
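The leading-tab convention can be decoded mechanically. The sketch below labels each output line by counting its leading tabs; the function name classify and the labels only-1, only-2 and both are hypothetical names chosen for this illustration. It assumes that no input line itself begins with a tab, which is exactly the ambiguous case mentioned earlier.

```shell
# A literal tab character, used in the patterns below.
tab=$(printf '\t')

# classify: read comm output on stdin and label each line by its column.
classify() {
    while IFS= read -r line; do
        case $line in
            "$tab$tab"*) printf 'both: %s\n' "${line#"$tab$tab"}" ;;
            "$tab"*)     printf 'only-2: %s\n' "${line#"$tab"}" ;;
            *)           printf 'only-1: %s\n' "$line" ;;
        esac
    done
}

# Feed in the same five lines as the example output above.
labels=$(printf '\t\tapple\n\t\tbanana\n\tbanana\neggplant\n\tzucchini\n' | classify)
printf '%s\n' "$labels"
```

A line from the first file that starts with a tab would be mislabelled by this sketch, which is why such input makes the columns ambiguous.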

Limits

Up to a full line must be buffered from each input file during line comparison, before the next output line is written.

Some implementations read lines with the function readlinebuffer(), which imposes no limit on line length as long as system memory suffices.

Other implementations read lines with the function fgets(). This function requires a fixed buffer. For these implementations, the buffer is often sized according to the POSIX macro LINE_MAX.
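Whether a given input could run into a fixed-buffer limit can be checked roughly from the shell. This is a sketch under assumptions: it relies on the getconf utility reporting LINE_MAX, and on awk counting bytes when run in the C locale. The variable names are illustrative, and the sample input stands in for a real file.

```shell
# Query the limit that fgets()-based implementations often size their buffer to.
line_max=$(getconf LINE_MAX)
printf 'LINE_MAX is %s\n' "$line_max"

# Find the longest line in some sample input, not counting the newline.
longest=$(printf 'apple\nbanana\neggplant\n' |
    LC_ALL=C awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }')
printf 'longest line is %s bytes\n' "$longest"

# LINE_MAX counts the trailing newline, so allow one extra byte.
if [ $((longest + 1)) -le "$line_max" ]; then
    fits=yes
else
    fits=no
fi
printf 'fits within LINE_MAX: %s\n' "$fits"
```

Implementations that buffer lines dynamically are not bound by this value, so the check only matters for the fixed-buffer case.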

Comparison to diff

Although also a file comparison command, diff reports significantly different information than comm. In general, diff is more powerful than comm. The simpler comm is best suited for use in scripts.
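In scripts, comm is commonly used for set-like operations on sorted lists. The sketch below uses the column-suppression options -1, -2 and -3 defined by POSIX, which the text above does not describe; the scratch directory and variable names are assumptions for the example, and the file contents mirror foo and bar from the example section.

```shell
# Recreate the two sorted example files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'apple\nbanana\neggplant\n' > "$tmpdir/foo"
printf 'apple\nbanana\nbanana\nzucchini\n' > "$tmpdir/bar"

# Suppress columns 1 and 2: keep only lines common to both files.
common=$(LC_ALL=C comm -12 "$tmpdir/foo" "$tmpdir/bar")

# Suppress columns 2 and 3: keep only lines unique to foo.
only_foo=$(LC_ALL=C comm -23 "$tmpdir/foo" "$tmpdir/bar")

# Suppress columns 1 and 3: keep only lines unique to bar.
only_bar=$(LC_ALL=C comm -13 "$tmpdir/foo" "$tmpdir/bar")

printf 'common:\n%s\nonly in foo:\n%s\nonly in bar:\n%s\n' \
    "$common" "$only_foo" "$only_bar"

rm -r "$tmpdir"
```

When only one column remains, it is printed without leading tabs, so the result can be passed directly to other line-oriented tools.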

The primary distinction between comm and diff is that comm requires sorted input, so any information about the order of the lines before sorting is lost.

A minor difference between comm and diff is that comm does not try to indicate that a line has changed between the two files; each line is shown in exactly one of the "from file #1", "from file #2", or "in both" columns. This can be useful when two lines should be treated as different even if they differ only subtly.

Unlike that of diff, the exit code of comm does not indicate whether the files match. As is typical, 0 indicates success, and a value greater than 0 indicates an error.
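The difference in exit codes can be observed directly. In this sketch the file names and variable names are illustrative. Because the exit status of comm says nothing about the comparison, the script tests for a match by checking whether comm -3 (the POSIX option that suppresses the common column) prints anything. That check is one possible approach, not the only one.

```shell
# Two sorted files that differ in their second line.
tmpdir=$(mktemp -d)
printf 'apple\nbanana\n' > "$tmpdir/a"
printf 'apple\ncherry\n' > "$tmpdir/b"

# comm succeeds regardless of whether the files match.
comm_status=0
comm "$tmpdir/a" "$tmpdir/b" > /dev/null || comm_status=$?

# diff reports differences through its exit status.
diff_status=0
diff "$tmpdir/a" "$tmpdir/b" > /dev/null || diff_status=$?

# With comm, a match has to be inferred from the output instead.
if [ -z "$(comm -3 "$tmpdir/a" "$tmpdir/b")" ]; then
    same=yes
else
    same=no
fi

printf 'comm: %s, diff: %s, same lines: %s\n' "$comm_status" "$diff_status" "$same"
rm -r "$tmpdir"
```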

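See also

  * Comparison of file comparison tools
  * List of POSIX commands
  * cmp (Unix): character-oriented file comparison
  * cut (Unix): splitting column-oriented files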

References

  1. McIlroy, M. D. A Research Unix reader: annotated excerpts from the Programmer's Manual, 1971–1986 (Technical report). CSTR 139. Bell Labs.
  2. "Comm(1): Compare two sorted files line by line - Linux man page". https://linux.die.net/man/1/comm
