Backus–Naur form: Difference between revisions
imported>Retro →Impact: Remove redundancy. |
imported>EncyclopedianWP No edit summary |
| Line 1: | Line 1: | ||
{{Short description|Formalism to describe programming languages}} | {{Short description|Formalism to describe programming languages}} | ||
{{Distinguish|Boyce–Codd normal form}}In [[computer science]], '''Backus–Naur form''' ('''BNF''', pronounced {{IPAc-en|ˌ|b|æ|k|ə|s|_|ˈ|n|aʊər}}), also known as '''Backus normal form''', is a notation system for defining the [[Syntax (programming languages)|syntax]] of [[ | {{Distinguish|Boyce–Codd normal form}}In [[computer science]], '''Backus–Naur form''' ('''BNF''', pronounced {{IPAc-en|ˌ|b|æ|k|ə|s|_|ˈ|n|aʊər}}), also known as '''Backus normal form''', is a notation system for defining the [[Syntax (programming languages)|syntax]] of [[programming language]]s and other [[formal language]]s, developed by [[John Backus]] and [[Peter Naur]]. It is a [[metasyntax]] for [[context-free grammar]]s, providing a precise way to outline the rules of a language's structure. | ||
It has been widely used in official specifications, manuals, and textbooks on [[programming language theory]], as well as to describe [[ | It has been widely used in official specifications, manuals, and textbooks on [[programming language theory]], as well as to describe [[document format]]s, [[instruction set]]s, and [[communication protocol]]s. Over time, variations such as [[extended Backus–Naur form]] (EBNF) and [[augmented Backus–Naur form]] (ABNF) have emerged, building on the original framework with added features. | ||
==Structure== | ==Structure== | ||
BNF specifications outline how symbols are combined to form syntactically valid sequences. Each BNF consists of three core components: a set of [[Nonterminal symbol|non-terminal symbols]], a set of [[ | BNF specifications outline how symbols are combined to form syntactically valid sequences. Each BNF consists of three core components: a set of [[Nonterminal symbol|non-terminal symbols]], a set of [[terminal symbol]]s, and a series of derivation rules.<ref name="janikow">{{cite web |last1=Janikow |first1=Cezary Z. |title=What is BNF? |url=http://www.cs.umsl.edu/~janikow/cs4280/bnf.pdf}}</ref> Non-terminal symbols represent categories or variables that can be replaced, while terminal symbols are the fixed, literal elements (such as keywords or punctuation) that appear in the final sequence. Derivation rules provide the instructions for replacing non-terminal symbols with specific combinations of symbols. | ||
A derivation rule is written in the format: | A derivation rule is written in the format: <symbol> ::= __expression__ | ||
where: | where: | ||
* <code><[[symbol]]></code><ref name="class">{{cite web |last=Naur |first=Peter |date=1961 |title=A COURSE OF ALGOL 60 PROGRAMMING with special reference to the DASK ALGOL system |url=http://archive.computerhistory.org/resources/text/algol/ACM_Algol_bulletin/1064048/frontmatter.pdf |access-date=26 March 2015 |publisher=Regnecentralen |publication-place=Copenhagen}}</ref> is a non-terminal symbol, enclosed in angle brackets (<>), identifying the category to be replaced | * <code><[[symbol]]></code><ref name="class">{{cite web |last=Naur |first=Peter |date=1961 |title=A COURSE OF ALGOL 60 PROGRAMMING with special reference to the DASK ALGOL system |url=http://archive.computerhistory.org/resources/text/algol/ACM_Algol_bulletin/1064048/frontmatter.pdf |access-date=26 March 2015 |publisher=Regnecentralen |publication-place=Copenhagen}}</ref> is a non-terminal symbol, enclosed in angle brackets (<>), identifying the category to be replaced | ||
* {{Code|1=::=}} is a metasymbol meaning "is replaced by," | * {{Code|1=::=}} is a metasymbol meaning "is replaced by," | ||
| Line 18: | Line 15: | ||
For example, in the rule {{Code|1=<opt-suffix-part> ::= "Sr." {{!}} "Jr." {{!}} ""|2=bnf}}, the entire line is the derivation rule, "Sr.", "Jr.", and "" (an empty string) are terminal symbols, and {{Code|1=<opt-suffix-part>|2=bnf}} is a non-terminal symbol. | For example, in the rule {{Code|1=<opt-suffix-part> ::= "Sr." {{!}} "Jr." {{!}} ""|2=bnf}}, the entire line is the derivation rule, "Sr.", "Jr.", and "" (an empty string) are terminal symbols, and {{Code|1=<opt-suffix-part>|2=bnf}} is a non-terminal symbol. | ||
Generating a valid sequence involves starting with a designated start symbol and iteratively applying the derivation rules.<ref name="janikow" | Generating a valid sequence involves starting with a designated start symbol and iteratively applying the derivation rules.<ref name="janikow" /> This process can extend sequences incrementally. To allow flexibility, some BNF definitions include an optional "delete" symbol (represented as an empty alternative, e.g., {{Code|1=<item> ::=<thing> {{!}}|2=bnf}} ), enabling the removal of certain elements while maintaining syntactic validity.<ref name="janikow" /> | ||
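The generation process described above can be sketched in Python. This is only an illustration under stated assumptions: the dictionary encoding of the grammar, the helper name <code>derive</code>, the rule <code><name></code> and the terminal "Smith" are hypothetical and introduced for the example; only <code><opt-suffix-part></code> and its alternatives come from the text. The sketch starts from a start symbol and repeatedly replaces non-terminals until only terminals remain, with the empty alternative playing the role of the "delete" symbol.

```python
from itertools import product

# Hypothetical encoding: each non-terminal maps to a list of alternatives,
# and each alternative is a list of symbols. Keys of the dictionary are
# non-terminals; every other string is a terminal.
GRAMMAR = {
    "<name>": [["Smith", "<opt-suffix-part>"]],    # illustrative rule, not from the article
    "<opt-suffix-part>": [["Sr."], ["Jr."], [""]],  # rule quoted in the text
}


def derive(symbol, grammar):
    """Return every terminal sequence derivable from `symbol`.

    Only suitable for non-recursive grammars, since a recursive rule
    describes infinitely many sequences.
    """
    if symbol not in grammar:
        # A terminal denotes itself; the empty string contributes nothing.
        return [[symbol]] if symbol else [[]]
    results = []
    for alternative in grammar[symbol]:
        # Expand each symbol of the alternative, then combine the pieces.
        expansions = [derive(s, grammar) for s in alternative]
        for combo in product(*expansions):
            results.append([tok for part in combo for tok in part])
    return results


sentences = [" ".join(tokens) for tokens in derive("<name>", GRAMMAR)]
print(sentences)
```

Enumerating every derivation is practical only for small, non-recursive grammars; for recursive ones a generator would typically pick alternatives at random or bound the derivation depth.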
==Example== | ==Example== | ||
A practical illustration of BNF is a specification for a simplified U.S. [[Address (geography)|postal address]]: | A practical illustration of BNF is a specification for a simplified U.S. [[Address (geography)|postal address]]: | ||
<syntaxhighlight lang="bnf"><postal-address> ::= <name-part> <street-address> <zip-part> | <syntaxhighlight lang="bnf"> <postal-address> ::= <name-part> <street-address> <zip-part> | ||
<name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL> | <personal-part> <name-part> | <name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL> | <personal-part> <name-part> | ||
| Line 33: | Line 30: | ||
<opt-suffix-part> ::= "Sr." | "Jr." | <roman-numeral> | "" | <opt-suffix-part> ::= "Sr." | "Jr." | <roman-numeral> | "" | ||
<opt-apt-num> ::= "Apt" <apt-num> | ""</syntaxhighlight> | <opt-apt-num> ::= "Apt" <apt-num> | ""</syntaxhighlight> | ||
This translates into English as: | This translates into English as: | ||
* A postal address consists of a name-part, followed by a [[street name|street-address]] part, followed by a [[ZIP Code|zip-code]] part. | * A postal address consists of a name-part, followed by a [[street name|street-address]] part, followed by a [[ZIP Code|zip-code]] part. | ||
* A name-part consists of either: a personal-part followed by a [[last name]] followed by an optional [[Suffix (name)|suffix]] (Jr., Sr., or dynastic number) and [[end-of-line]], or a personal-part followed by a name-part (this rule illustrates the use of [[Recursion (computer science)|recursion]] in BNFs, covering the case of people who use multiple first and middle names and initials).<ref>{{FOLDOC|Backus-Naur+Form}}</ref> | * A name-part consists of either: a personal-part followed by a [[last name]] followed by an optional [[Suffix (name)|suffix]] (Jr., Sr., or dynastic number) and [[end-of-line]], or a personal-part followed by a name-part (this rule illustrates the use of [[Recursion (computer science)|recursion]] in BNFs, covering the case of people who use multiple first and middle names and initials).<ref>{{FOLDOC|Backus-Naur+Form}}</ref> | ||
| Line 46: | Line 43: | ||
==History== | ==History== | ||
The concept of using [[Rewrite rule|rewriting rules]] to describe language structure traces back to at least [[Pāṇini]], an ancient Indian Sanskrit grammarian who lived sometime between the 6th and 4th centuries [[Before Christ|BC]].<ref>{{cite web |title=Panini biography |url=http://www-gap.dcs.st-and.ac.uk/~history/Biographies/Panini.html |access-date=2014-03-22 |publisher=School of Mathematics and Statistics, University of St Andrews, Scotland}}</ref> His notation for describing [[Sanskrit]] word structure is equivalent in power to that of BNF and exhibits many similar properties.<ref>{{cite journal |last=Ingerman |first=Peter Zilahy |date=March 1967 |title="Pāṇini-Backus Form" Suggested |journal=Communications of the ACM |volume=10 |issue=3 |page=137 |doi=10.1145/363162.363165 |s2cid=52817672 |doi-access=free}}</ref> | The concept of using [[Rewrite rule|rewriting rules]] to describe language structure traces back to at least [[Pāṇini]], an ancient Indian Sanskrit grammarian who lived sometime between the 6th and 4th centuries [[Before Christ|BC]].<ref>{{cite web |title=Panini biography |url=http://www-gap.dcs.st-and.ac.uk/~history/Biographies/Panini.html |access-date=2014-03-22 |publisher=School of Mathematics and Statistics, University of St Andrews, Scotland}}</ref> His notation for describing [[Sanskrit]] word structure is equivalent in power to that of BNF and exhibits many similar properties.<ref name="Ingerman1967">{{cite journal |last=Ingerman |first=Peter Zilahy |date=March 1967 |title="Pāṇini-Backus Form" Suggested |journal=Communications of the ACM |volume=10 |issue=3 |page=137 |doi=10.1145/363162.363165 |s2cid=52817672 |doi-access=free}}</ref> | ||
In Western society, grammar was long regarded as a subject for teaching rather than scientific study; descriptions were informal and targeted at practical usage. This perspective shifted in the first half of the 20th century, when linguists such as [[Leonard Bloomfield]] and [[Zellig Harris]] began attempts to formalize language description, including [[Phrase structure rules|phrase structure]]. Meanwhile, mathematicians explored related ideas through [[Semi-Thue system|string rewriting rules]] as [[formal logical systems]], such as [[Axel Thue]] in 1914, [[Emil Post]] in the 1920s–40s,<ref>{{cite journal |last=Post |first=Emil L. |year=1943 |title=Formal Reductions of the General Combinatorial Decision Problem |journal=American Journal of Mathematics |volume=65 |issue=2 |pages=197–215 |doi=10.2307/2371804}}</ref> and [[ | In Western society, grammar was long regarded as a subject for teaching rather than scientific study; descriptions were informal and targeted at practical usage. This perspective shifted in the first half of the 20th century, when linguists such as [[Leonard Bloomfield]] and [[Zellig Harris]] began attempts to formalize language description, including [[Phrase structure rules|phrase structure]]. Meanwhile, mathematicians explored related ideas through [[Semi-Thue system|string rewriting rules]] as [[formal logical systems]], such as [[Axel Thue]] in 1914, [[Emil Post]] in the 1920s–40s,<ref>{{cite journal |last=Post |first=Emil L. |year=1943 |title=Formal Reductions of the General Combinatorial Decision Problem |journal=American Journal of Mathematics |volume=65 |issue=2 |pages=197–215 |doi=10.2307/2371804}}</ref> and [[Alan Turing]] in 1936. [[Noam Chomsky]], teaching linguistics to students of [[information theory]] at [[MIT]], combined linguistics and mathematics, adapting Thue's formalism to describe natural language syntax.
In 1956, he introduced a clear distinction between generative rules (those of [[context-free grammar]]s) and transformation rules.<ref>{{cite journal |last=Chomsky |first=Noam |year=1956 |title=Three models for the description of language |journal=IRE Transactions on Information Theory |volume=2 |issue=3 |pages=113–24 |doi=10.1109/TIT.1956.1056813 |s2cid=19519474}}</ref><ref name="Chomsky19573">{{cite book |last=Chomsky |first=Noam |title=Syntactic Structures |title-link=Syntactic Structures |publisher=Mouton |year=1957 |location=The Hague}}</ref> | ||
BNF itself emerged when [[John Backus]], a programming language designer at [[IBM]], proposed a [[metalanguage]] of ''metalinguistic formulas'' to define the syntax of the new programming language IAL, known today as [[ALGOL 58]], in 1959.<ref name="Backus.19695">{{Cite book |last=Backus |first=J. W. |title=Proceedings of the International Conference on Information Processing |publisher=UNESCO |year=1959 |pages=125–132 |contribution=The syntax and semantics of the proposed international algebraic language of the Zurich ACM-GAMM Conference}}</ref> This notation was formalized in the [[ALGOL 60]] report, where [[Peter Naur]] named it ''Backus normal form'' in the committee's 1963 report.<ref name=" | BNF itself emerged when [[John Backus]], a programming language designer at [[IBM]], proposed a [[metalanguage]] of ''metalinguistic formulas'' to define the syntax of the new programming language IAL, known today as [[ALGOL 58]], in 1959.<ref name="Backus.19695">{{Cite book |last=Backus |first=J. W. |title=Proceedings of the International Conference on Information Processing |publisher=UNESCO |year=1959 |pages=125–132 |contribution=The syntax and semantics of the proposed international algebraic language of the Zurich ACM-GAMM Conference}}</ref> This notation was formalized in the [[ALGOL 60]] report, where [[Peter Naur]] named it ''Backus normal form'' in the committee's 1963 report.<ref name="ALGOL60RPT">Revised ALGOL 60 report section 1.1. {{cite web |title=ALGOL 60 |url=http://www.masswerk.at/algol60/report.htm |access-date=April 18, 2015}}</ref> Whether Backus was directly influenced by Chomsky's work is uncertain.<ref>{{cite web |last=Fulton |first=Scott M., III |date=20 March 2007 |title=John W. 
Backus (1924 - 2007) |url=http://betanews.com/2007/03/20/john-w-backus-1924-2007 |access-date=Jun 3, 2014 |publisher=BetaNews, Inc.}}</ref><ref>{{cite report |url=https://archive.computerhistory.org/resources/text/Oral_History/Backus_John/Backus_John_1.oral_history.2006.102657970.pdf |title=Oral History of John Backus |author=John Backus |date=Sep 2006 |publisher=Computer History Museum |editor=Grady Booch}} Here: p.25</ref> | ||
[[Donald Knuth]] argued in 1964 that BNF should be read as ''Backus–Naur form'', as it is "not a [[Normal form (abstract rewriting)|normal form]] in the conventional sense," unlike [[Chomsky normal form]].<ref>{{cite journal |last=Knuth |first=Donald E. |year=1964 |title=Backus Normal Form vs. Backus Naur Form |journal=Communications of the ACM |volume=7 |issue=12 |pages=735–736 |doi=10.1145/355588.365140 |s2cid=47537431 |doi-access=free}}</ref> In 1967, Peter Zilahy Ingerman suggested renaming it ''Pāṇini Backus form'' to acknowledge Pāṇini's earlier, independent development of a similar notation.<ref | [[Donald Knuth]] argued in 1964 that BNF should be read as ''Backus–Naur form'', as it is "not a [[Normal form (abstract rewriting)|normal form]] in the conventional sense," unlike [[Chomsky normal form]].<ref>{{cite journal |last=Knuth |first=Donald E. |year=1964 |title=Backus Normal Form vs. Backus Naur Form |journal=Communications of the ACM |volume=7 |issue=12 |pages=735–736 |doi=10.1145/355588.365140 |s2cid=47537431 |doi-access=free}}</ref> In 1967, Peter Zilahy Ingerman suggested renaming it ''Pāṇini Backus form'' to acknowledge Pāṇini's earlier, independent development of a similar notation.<ref name="Ingerman1967" /> | ||
In the ALGOL 60 report, Naur described BNF as a ''metalinguistic formula'':<ref name=ALGOL60RPT | In the ALGOL 60 report, Naur described BNF as a ''metalinguistic formula'':<ref name="ALGOL60RPT" /> | ||
{{blockquote|Sequences of characters enclosed in the brackets <> represent metalinguistic variables whose values are sequences of symbols. The marks "::{{=}}" and "{{pipe}}" (the latter with the meaning of "or") are metalinguistic connectives. Any mark in a formula, which is not a variable or a connective, denotes itself. Juxtaposition of marks or variables in a formula signifies juxtaposition of the sequence denoted.}} | {{blockquote|Sequences of characters enclosed in the brackets <> represent metalinguistic variables whose values are sequences of symbols. The marks "::{{=}}" and "{{pipe}}" (the latter with the meaning of "or") are metalinguistic connectives. Any mark in a formula, which is not a variable or a connective, denotes itself. Juxtaposition of marks or variables in a formula signifies juxtaposition of the sequence denoted.}} | ||
| Line 60: | Line 57: | ||
This is exemplified in the report's section 2.3, where comments are specified:<blockquote>For the purpose of including text among the symbols of a program the following "comment" conventions hold: | This is exemplified in the report's section 2.3, where comments are specified:<blockquote>For the purpose of including text among the symbols of a program the following "comment" conventions hold: | ||
{| class="wikitable" | {| class="wikitable" | ||
! The sequence of basic symbols: | ! The sequence of basic symbols: | ||
! is equivalent to | ! is equivalent to | ||
| Line 77: | Line 74: | ||
Naur altered Backus's original symbols for ALGOL 60, changing <code>:≡</code> to <code>::=</code> and the overbarred "{{overline|or}}" to <code>|</code>, using commonly available characters.<ref name="Backus.1969"> | Naur altered Backus's original symbols for ALGOL 60, changing <code>:≡</code> to <code>::=</code> and the overbarred "{{overline|or}}" to <code>|</code>, using commonly available characters.<ref name="Backus.1969"> | ||
{{Cite book |last=Backus |first=J. W. |author-link=John W. Backus |title=Proceedings of the International Conference on Information Processing |publisher=UNESCO |year=1959 |pages=125–132 |contribution=The syntax and semantics of the proposed international algebraic language of the Zurich ACM-GAMM Conference |contribution-url=http://www.softwarepreservation.org/projects/ALGOL/paper/Backus-Syntax_and_Semantics_of_Proposed_IAL.pdf/view}}</ref>{{rp|14}} | {{Cite book |last=Backus |first=J. W. |author-link=John W. Backus |title=Proceedings of the International Conference on Information Processing |publisher=UNESCO |year=1959 |pages=125–132 |contribution=The syntax and semantics of the proposed international algebraic language of the Zurich ACM-GAMM Conference |contribution-url=http://www.softwarepreservation.org/projects/ALGOL/paper/Backus-Syntax_and_Semantics_of_Proposed_IAL.pdf/view}}</ref>{{rp|14}} | ||
BNF is very similar to [[Canonical form (Boolean algebra)|canonical-form]] [[Boolean algebra]] equations (used in logic-circuit design), reflecting Backus's mathematical background as a FORTRAN designer.<ref name=" | BNF is very similar to [[Canonical form (Boolean algebra)|canonical-form]] [[Boolean algebra]] equations (used in logic-circuit design), reflecting Backus's mathematical background as a FORTRAN designer.<ref name="class" /> Studies of Boolean algebra were commonly part of a mathematics curriculum, which may have informed Backus's approach. Neither Backus nor Naur described the names enclosed in <code>< ></code> as non-terminals—Chomsky's terminology was not originally used in describing BNF. Naur later called them "classes" in 1961 course materials.<ref name="class" /> In the ALGOL 60 report, they were "metalinguistic variables," with other symbols defining the target language. | ||
[[Saul Rosen]], involved with the [[Association for Computing Machinery]] since 1947, contributed to the transition from IAL to ALGOL and edited Communications of the ACM. He described BNF as a metalanguage for ALGOL in his 1967 book.<ref>{{cite book |author=Saul Rosen |title=Programming Systems and Languages |location=New York |publisher=McGraw Hill |series=McGraw Hill Computer Science Series |date=Jan 1967 |isbn=978-0070537088 |url=https://archive.org/details/programmingsyste0000unse/mode/1up |url-access=registration}}</ref> Early ALGOL manuals from IBM, Honeywell, Burroughs, and Digital Equipment Corporation followed this usage. | [[Saul Rosen]], involved with the [[Association for Computing Machinery]] since 1947, contributed to the transition from IAL to ALGOL and edited Communications of the ACM. He described BNF as a metalanguage for ALGOL in his 1967 book.<ref>{{cite book |author=Saul Rosen |title=Programming Systems and Languages |location=New York |publisher=McGraw Hill |series=McGraw Hill Computer Science Series |date=Jan 1967 |isbn=978-0070537088 |url=https://archive.org/details/programmingsyste0000unse/mode/1up |url-access=registration}}</ref> Early ALGOL manuals from IBM, Honeywell, Burroughs, and Digital Equipment Corporation followed this usage. | ||
| Line 86: | Line 83: | ||
BNF significantly influenced programming language development, notably as the basis for early [[compiler-compiler]] systems. Examples include Edgar T. Irons' "A Syntax Directed Compiler for ALGOL 60" and Brooker and Morris' "A Compiler Building System," which directly utilized BNF.<ref>{{cite book |last1=McKeeman |first1=W. M. |title=A Compiler Generator |last2=Horning |first2=J.J. |last3=Wortman |first3=D. B. |publisher=Prentice-Hall |year=1970 |isbn=978-0-13-155077-3}}</ref> Others, like Schorre's [[META II]], adapted BNF into a programming language, replacing <code>< ></code> with quoted strings and adding operators like $ for repetition, as in:<syntaxhighlight lang="ebnf"> EXPR = TERM $('+' TERM .OUT('ADD') | '-' TERM .OUT('SUB')); </syntaxhighlight> | BNF significantly influenced programming language development, notably as the basis for early [[compiler-compiler]] systems. Examples include Edgar T. Irons' "A Syntax Directed Compiler for ALGOL 60" and Brooker and Morris' "A Compiler Building System," which directly utilized BNF.<ref>{{cite book |last1=McKeeman |first1=W. M. |title=A Compiler Generator |last2=Horning |first2=J.J. |last3=Wortman |first3=D. B. |publisher=Prentice-Hall |year=1970 |isbn=978-0-13-155077-3}}</ref> Others, like Schorre's [[META II]], adapted BNF into a programming language, replacing <code>< ></code> with quoted strings and adding operators like $ for repetition, as in:<syntaxhighlight lang="ebnf"> EXPR = TERM $('+' TERM .OUT('ADD') | '-' TERM .OUT('SUB')); </syntaxhighlight> | ||
This influenced tools like [[yacc]], a widely used [[parser generator]] rooted in BNF principles.<ref>{{Citation |title=Source forge |type=project |url= | This influenced tools like [[yacc]], a widely used [[parser generator]] rooted in BNF principles.<ref name="bnfparser2">{{Citation |title=Source forge |type=project |url=https://bnfparser2.sourceforge.net/ |contribution=BNF parser²}}</ref> BNF remains one of the oldest computer-related notations still referenced today, though its variants often dominate modern applications. | ||
Examples of its use as a metalanguage include defining arithmetic expressions:<syntaxhighlight lang="bnf"> <expr> ::= <term> | <expr> <addop> <term> </syntaxhighlight>Here, {{Code|1=<expr>|2=bnf}} can recursively include itself, allowing repeated additions. | Examples of its use as a metalanguage include defining arithmetic expressions:<syntaxhighlight lang="bnf"> <expr> ::= <term> | <expr> <addop> <term> </syntaxhighlight>Here, {{Code|1=<expr>|2=bnf}} can recursively include itself, allowing repeated additions. | ||
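The recursive rule above can be checked mechanically. The following Python sketch is an illustration, not a reference implementation: it assumes that <code><term></code> is an unsigned integer and that <code><addop></code> is <code>+</code> or <code>-</code>, neither of which the rule itself defines, and the function names are hypothetical. Because the rule is left-recursive, the sketch unrolls the recursion into a loop that consumes one <code><addop> <term></code> pair per pass.

```python
import re

# Assumption: <term> is an unsigned integer and <addop> is "+" or "-";
# the rule quoted in the text leaves both undefined.
TOKEN = re.compile(r"\s*(\d+|[+-])")


def tokenize(text):
    """Split `text` into integer and operator tokens, skipping whitespace."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_expr(text):
    """Check whether `text` matches <expr> ::= <term> | <expr> <addop> <term>."""
    try:
        tokens = tokenize(text)
    except ValueError:
        return False
    if not tokens or not tokens[0].isdigit():
        return False                  # an <expr> must begin with a <term>
    rest = tokens[1:]
    if len(rest) % 2:                 # the remainder must be (<addop> <term>) pairs
        return False
    # Each pass corresponds to one application of the recursive alternative.
    for op, term in zip(rest[0::2], rest[1::2]):
        if op not in "+-" or not term.isdigit():
            return False
    return True


print(is_expr("1 + 2 - 3"))
print(is_expr("1 +"))
```

Rewriting left recursion as iteration is a common choice for hand-written recognizers, since a naive recursive-descent function for <code><expr></code> would call itself without consuming any input.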
| Line 92: | Line 89: | ||
==BNF representation of itself== | ==BNF representation of itself== | ||
{{Tall image|Bnf-syntax-diagram.png|250|500|BNF [[syntax diagram]]|alt=BNF syntax diagram}} | |||
BNF's syntax itself may be represented with a BNF like the following: | BNF's syntax itself may be represented with a BNF like the following: | ||
<syntaxhighlight lang="bnf"> | <syntaxhighlight lang="bnf" line linelinks="syntax"> | ||
<syntax> ::= <rule> | <rule> <syntax> | |||
<rule> ::= <opt-whitespace> "<" <rule-name> ">" <opt-whitespace> "::=" <opt-whitespace> <expression> <line-end> | |||
<opt-whitespace> ::= " " <opt-whitespace> | "" | |||
<expression> ::= <list> | <list> <opt-whitespace> "|" <opt-whitespace> <expression> | |||
<line-end> ::= <opt-whitespace> <EOL> | <line-end> <line-end> | |||
<list> ::= <term> | <term> <opt-whitespace> <list> | |||
<term> ::= <literal> | "<" <rule-name> ">" | |||
<literal> ::= '"' <text1> '"' | "'" <text2> "'" | |||
<text1> ::= "" | <character1> <text1> | |||
<text2> ::= "" | <character2> <text2> | |||
<character> ::= <letter> | <digit> | <symbol> | |||
<letter> ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z" | "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z" | |||
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | |||
<symbol> ::= "|" | " " | "!" | "#" | "$" | "%" | "&" | "(" | ")" | "*" | "+" | "," | "-" | "." | "/" | ":" | ";" | ">" | "=" | "<" | "?" | "@" | "[" | "\" | "]" | "^" | "_" | "`" | "{" | "}" | "~" | |||
<character1> ::= <character> | "'" | |||
<character2> ::= <character> | '"' | |||
<rule-name> ::= <letter> | <rule-name> <rule-char> | |||
<rule-char> ::= <letter> | <digit> | "-" | |||
</syntaxhighlight> | </syntaxhighlight> | ||
Note that "" is the [[empty string]]. | Note that "" is the [[empty string]]. | ||
The original BNF did not use quotes as shown in the <code><literal></code> rule. This assumes that no [[whitespace (computer science)|whitespace]] is necessary for proper interpretation of the rule. | The original BNF did not use quotes as shown in the [[#syntax-8|<code><literal></code>]] rule. This assumes that no [[whitespace (computer science)|whitespace]] is necessary for proper interpretation of the rule. | ||
<code><EOL></code> represents the appropriate [[newline|line-end]] specifier (in [[ASCII]], carriage-return, line-feed or both depending on the [[operating system]]). <code><rule-name></code> and <code><text></code> are to be substituted with a declared rule's name/label or literal text, respectively. | [[#syntax-5|<code><EOL></code>]] represents the appropriate [[newline|line-end]] specifier (in [[ASCII]], carriage-return, line-feed or both depending on the [[operating system]]). [[#syntax-17|<code><rule-name></code>]] and <code><text></code> are to be substituted with a declared rule's name/label or literal text, respectively. | ||
In the U.S. postal address example above, the entire block-quote is a <code><syntax></code>. | In the U.S. postal address example above, the entire block-quote is a [[#syntax-1|<code><syntax></code>]]. Each line or unbroken grouping of lines is a rule; for example, one rule begins with <code><name-part> ::=</code>. The other part of that rule (aside from a line-end) is an expression, which consists of two lists separated by a vertical bar <code>|</code>. These two lists consist of some terms (three terms and two terms, respectively). Each term in this particular rule is a rule-name. | ||
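The decomposition of a rule into a rule-name, an expression, lists and terms can be sketched in Python. This is a simplified illustration, not a full parser for the grammar above: the function name <code>parse_rule</code> is hypothetical, and the sketch assumes that a rule fits on one line and that <code>|</code> never occurs inside a quoted literal. It is applied to the <code><opt-apt-num></code> rule from the postal address example.

```python
import re

# Matches one term: a <rule-name> or a literal in double or single quotes.
TERM = re.compile(r'<[A-Za-z][A-Za-z0-9-]*>|"[^"]*"|\'[^\']*\'')


def parse_rule(line):
    """Split one BNF rule into its name and a list of alternatives.

    Assumes the rule fits on one line and that "|" never occurs inside
    a quoted literal; a complete parser would follow the full grammar.
    """
    head, _, body = line.partition("::=")
    name = head.strip()
    # Each "|"-separated part is one list; each regex match is one term.
    alternatives = [TERM.findall(part) for part in body.split("|")]
    return name, alternatives


name, alternatives = parse_rule('<opt-apt-num> ::= "Apt" <apt-num> | ""')
print(name)
print(alternatives)
```

Here the expression consists of two lists: the first holds a literal and a rule-name, and the second holds only the empty-string literal.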
==Variants== | ==Variants== | ||
| Line 129: | Line 124: | ||
=== EBNF === | === EBNF === | ||
{{Main|Extended Backus–Naur form}} | {{Main|Extended Backus–Naur form}} | ||
There are many variants and extensions of BNF, generally either for the sake of simplicity and succinctness, or to adapt it to a specific application. One common feature of many variants is the use of [[regular expression]] repetition operators such as <code>*</code> and <code>+</code>. The [[extended Backus–Naur form]] (EBNF) is a common one. | There are many variants and extensions of BNF, generally either for the sake of simplicity and succinctness, or to adapt it to a specific application. One common feature of many variants is the use of [[regular expression]] repetition operators such as <code>*</code> and <code>+</code>. The [[extended Backus–Naur form]] (EBNF) is a common one. | ||
| Line 135: | Line 131: | ||
=== ABNF === | === ABNF === | ||
{{Main|ABNF}} | {{Main|ABNF}} | ||
[[Augmented Backus–Naur form]] (ABNF) and Routing Backus–Naur form (RBNF)<ref>[http://tools.ietf.org/html/rfc5511 RBNF].</ref> are extensions commonly used to describe [[Internet Engineering Task Force]] (IETF) [[protocol (computing)|protocol]]s. | [[Augmented Backus–Naur form]] (ABNF) and Routing Backus–Naur form (RBNF)<ref>[http://tools.ietf.org/html/rfc5511 RBNF].</ref> are extensions commonly used to describe [[Internet Engineering Task Force]] (IETF) [[protocol (computing)|protocol]]s. | ||
| Line 140: | Line 137: | ||
=== Others === | === Others === | ||
Many BNF specifications found online today are intended to be human-readable and are non-formal. | Many BNF specifications found online today are intended to be human-readable and are non-formal. These often include many of the following syntax rules and extensions: | ||
* Optional items enclosed in square brackets: <code>[<item-x>]</code>. | * Optional items enclosed in square brackets: <code>[<item-x>]</code>. | ||
* Items existing 0 or more times are enclosed in curly brackets or suffixed with an asterisk (<code>*</code>) such as <code><word> ::= <letter> {<letter>}</code> or <code><word> ::= <letter> <letter>*</code> respectively. | * Items existing 0 or more times are enclosed in curly brackets or suffixed with an asterisk (<code>*</code>) such as <code><word> ::= <letter> {<letter>}</code> or <code><word> ::= <letter> <letter>*</code> respectively. | ||
| Line 157: | Line 154: | ||
* XACT X4MR System,<ref>{{Citation | title = Act world | contribution = Tools | url = http://www.actworld.com/tools/| archive-url = https://web.archive.org/web/20130129075050/http://www.actworld.com/tools/| url-status = dead| archive-date = 2013-01-29}}</ref> a rule-based expert system for programming language translation | * XACT X4MR System,<ref>{{Citation | title = Act world | contribution = Tools | url = http://www.actworld.com/tools/| archive-url = https://web.archive.org/web/20130129075050/http://www.actworld.com/tools/| url-status = dead| archive-date = 2013-01-29}}</ref> a rule-based expert system for programming language translation | ||
* [[XPL]] Analyzer, a tool which accepts simplified BNF for a language and produces a parser for that language in XPL; it may be integrated into the supplied SKELETON program, with which the language may be debugged<ref>If the target processor is System/360, or related, even up to z/System, and the target language is similar to PL/I (or, indeed, XPL), then the required code "emitters" may be adapted from XPL's "emitters" for System/360.</ref> (a [[SHARE (computing)|SHARE]] contributed program, which was preceded by ''A Compiler Generator''<ref>{{cite book |title=A Compiler Generator |first1=W. M. |last1=McKeeman |first2=J.J. |last2=Horning |first3=D. B. |last3=Wortman |year=1970 |publisher=Prentice-Hall |isbn=978-0-13-155077-3 |url=https://archive.org/details/compilergenerato00mcke |url-access=registration}}</ref>) | * [[XPL]] Analyzer, a tool which accepts simplified BNF for a language and produces a parser for that language in XPL; it may be integrated into the supplied SKELETON program, with which the language may be debugged<ref>If the target processor is System/360, or related, even up to z/System, and the target language is similar to PL/I (or, indeed, XPL), then the required code "emitters" may be adapted from XPL's "emitters" for System/360.</ref> (a [[SHARE (computing)|SHARE]] contributed program, which was preceded by ''A Compiler Generator''<ref>{{cite book |title=A Compiler Generator |first1=W. M. |last1=McKeeman |first2=J.J. |last2=Horning |first3=D. B. |last3=Wortman |year=1970 |publisher=Prentice-Hall |isbn=978-0-13-155077-3 |url=https://archive.org/details/compilergenerato00mcke |url-access=registration}}</ref>) | ||
* bnfparser<sup>2</sup>,<ref | * bnfparser<sup>2</sup>,<ref name="bnfparser2" /> a universal syntax verification utility | ||
* bnf2xml,<ref>[ | * bnf2xml,<ref>[https://sourceforge.net/projects/bnf2xml/ bnf2xml]</ref> Markup input with XML tags using advanced BNF matching | ||
* [[JavaCC]],<ref>{{Cite web |url=https://javacc.java.net/ |title=JavaCC |access-date=2013-09-25 |archive-url=https://web.archive.org/web/20130608172614/https://javacc.java.net/ |archive-date=2013-06-08 |url-status=dead }}</ref> Java Compiler Compiler (JavaCC), a parser generator for Java | * [[JavaCC]],<ref>{{Cite web |url=https://javacc.java.net/ |title=JavaCC |access-date=2013-09-25 |archive-url=https://web.archive.org/web/20130608172614/https://javacc.java.net/ |archive-date=2013-06-08 |url-status=dead }}</ref> Java Compiler Compiler (JavaCC), a parser generator for Java | ||
===Similar software=== | ===Similar software=== | ||
*[[GNU bison]], GNU version of yacc | * [[GNU bison]], GNU version of yacc | ||
*[[Yacc]], parser generator (most commonly used with the [[Lex (software)|Lex]] preprocessor) | * [[Yacc]], parser generator (most commonly used with the [[Lex (software)|Lex]] preprocessor) | ||
* Racket's parser tools, lex and yacc-style parsing (Beautiful Racket edition) | * Racket's parser tools, lex and yacc-style parsing (Beautiful Racket edition) | ||
*[[Qlik]] Sense, a BI tool, uses a variant of BNF for scripting <ref>{{cite web |title=Script Syntax - Qlik Sense on Windows |url=https://help.qlik.com/en-US/sense/May2021/Subsystems/Hub/Content/Sense_Hub/Scripting/script-syntax.htm |access-date=10 January 2022 |website=Qlik.com |publisher=QlikTech International AB |ref=qlikscriptsyntax}}</ref> | * [[Qlik]] Sense, a BI tool, uses a variant of BNF for scripting <ref>{{cite web |title=Script Syntax - Qlik Sense on Windows |url=https://help.qlik.com/en-US/sense/May2021/Subsystems/Hub/Content/Sense_Hub/Scripting/script-syntax.htm |access-date=10 January 2022 |website=Qlik.com |publisher=QlikTech International AB |ref=qlikscriptsyntax}}</ref> | ||
* BNF Converter (BNFC<ref>{{Citation |title=Language technology |url=http://bnfc.digitalgrammars.com/ |contribution=BNFC |place=[[Sweden|SE]] |publisher=Chalmers}}</ref>), operating on a variant called "labeled Backus–Naur form" (LBNF). In this variant, each production for a given non-terminal is given a label, which can be used as a constructor of an [[algebraic data type]] representing that nonterminal. The converter is capable of producing types and parsers for [[abstract syntax]] in several languages, including [[Haskell (programming language)|Haskell]] and Java | * BNF Converter (BNFC<ref>{{Citation |title=Language technology |url=http://bnfc.digitalgrammars.com/ |contribution=BNFC |place=[[Sweden|SE]] |publisher=Chalmers}}</ref>), operating on a variant called "labeled Backus–Naur form" (LBNF). In this variant, each production for a given non-terminal is given a label, which can be used as a constructor of an [[algebraic data type]] representing that nonterminal. The converter is capable of producing types and parsers for [[abstract syntax]] in several languages, including [[Haskell (programming language)|Haskell]] and Java | ||
| Line 185: | Line 182: | ||
==External links== | ==External links== | ||
* {{Citation | url = http://www.garshol.priv.no/download/text/bnf.html | title = BNF and EBNF: What are they and how do they work? | first = Lars Marius | last = Garshol | publisher = Priv | place = [[Norway|NO]]}}. | * {{Citation | url = http://www.garshol.priv.no/download/text/bnf.html | title = BNF and EBNF: What are they and how do they work? | first = Lars Marius | last = Garshol | publisher = Priv | place = [[Norway|NO]]}}. | ||
* {{IETF RFC|5234}} | * {{IETF RFC|5234}} — Augmented BNF for Syntax Specifications: ABNF. | ||
* {{IETF RFC|5511}} | * {{IETF RFC|5511}} — Routing BNF: A Syntax Used in Various Protocol Specifications. | ||
* ISO/IEC 14977:1996(E) ''Information technology | * ISO/IEC 14977:1996(E) ''Information technology – Syntactic metalanguage – Extended BNF'', available from {{Citation | contribution = Publicly available | title = Standards | url = http://standards.iso.org/ittf/PubliclyAvailableStandards/ | publisher = ISO}} or from {{Citation | url = http://www.cl.cam.ac.uk/~mgk25/iso-14977.pdf | first = Marcus | last = Kuhn | publisher = CAM | place = [[United Kingdom|UK]] | title = Iso 14977 }} <small>(the latter is missing the cover page, but is otherwise much cleaner)</small> | ||
===Language grammars=== | ===Language grammars=== | ||
| Line 193: | Line 190: | ||
* {{Citation | url = http://savage.net.au/SQL/ | title = Savage | contribution = BNF grammars for SQL-92, SQL-99 and SQL-2003 | publisher = Net | place = [[Australia|AU]]}}, freely available BNF grammars for [[SQL]]. | * {{Citation | url = http://savage.net.au/SQL/ | title = Savage | contribution = BNF grammars for SQL-92, SQL-99 and SQL-2003 | publisher = Net | place = [[Australia|AU]]}}, freely available BNF grammars for [[SQL]]. | ||
* {{Citation | url = http://cui.unige.ch/db-research/Enseignement/analyseinfo/BNFweb.html | contribution = BNF Web Club | publisher = Unige | place = [[Switzerland|CH]] | title = DB research | access-date = 2007-01-25 | archive-url = https://web.archive.org/web/20070124000335/http://cui.unige.ch/db-research/Enseignement/analyseinfo/BNFweb.html | archive-date = 2007-01-24 | url-status = dead }}, freely available BNF grammars for SQL, [[Ada (programming language)|Ada]], [[Java (programming language)|Java]]. | * {{Citation | url = http://cui.unige.ch/db-research/Enseignement/analyseinfo/BNFweb.html | contribution = BNF Web Club | publisher = Unige | place = [[Switzerland|CH]] | title = DB research | access-date = 2007-01-25 | archive-url = https://web.archive.org/web/20070124000335/http://cui.unige.ch/db-research/Enseignement/analyseinfo/BNFweb.html | archive-date = 2007-01-24 | url-status = dead }}, freely available BNF grammars for SQL, [[Ada (programming language)|Ada]], [[Java (programming language)|Java]]. | ||
* {{Citation | url = http://www.thefreecountry.com/sourcecode/grammars.shtml | contribution = | * {{Citation | url = http://www.thefreecountry.com/sourcecode/grammars.shtml | contribution = Free Programming Language Grammars for Compiler Construction | title = Source code | publisher = The free country}}, freely available BNF/[[Extended Backus–Naur form|EBNF]] grammars for C/C++, [[Pascal (programming language)|Pascal]], [[COBOL]], [[Ada (programming language)|Ada 95]], [[PL/I]]. | ||
* {{Citation | url = | * {{Citation | url = https://exp-engine.svn.sourceforge.net/viewvc/exp-engine/engine/trunk/docs/ | archive-url = https://archive.today/20121225083955/http://exp-engine.svn.sourceforge.net/viewvc/exp-engine/engine/trunk/docs/ | url-status = dead | archive-date = 2012-12-25 | contribution = BNF files related to the STEP standard | title = Exp engine | publisher = Source forge | type = [[Apache Subversion|SVN]] }}. Includes [[List of STEP (ISO 10303) parts|parts 11, 14, and 21]] of the [[ISO 10303]] (STEP) standard. | ||
{{Metasyntax}} | {{Metasyntax}} | ||
Latest revision as of 21:27, 12 November 2025
In computer science, Backus–Naur form (BNF, pronounced /ˌbækəs ˈnaʊər/), also known as Backus normal form, is a notation system for defining the syntax of programming languages and other formal languages, developed by John Backus and Peter Naur. It is a metasyntax for context-free grammars, providing a precise way to outline the rules of a language's structure.
It has been widely used in official specifications, manuals, and textbooks on programming language theory, as well as to describe document formats, instruction sets, and communication protocols. Over time, variations such as extended Backus–Naur form (EBNF) and augmented Backus–Naur form (ABNF) have emerged, building on the original framework with added features.
Structure
BNF specifications outline how symbols are combined to form syntactically valid sequences. Each BNF consists of three core components: a set of non-terminal symbols, a set of terminal symbols, and a series of derivation rules.[1] Non-terminal symbols represent categories or variables that can be replaced, while terminal symbols are the fixed, literal elements (such as keywords or punctuation) that appear in the final sequence. Derivation rules provide the instructions for replacing non-terminal symbols with specific combinations of symbols.
A derivation rule is written in the format: <symbol> ::= __expression__
where:
- <symbol>[2] is a non-terminal symbol, enclosed in angle brackets (<>), identifying the category to be replaced;
- ::= is a metasymbol meaning "is replaced by";
- __expression__ is the replacement, consisting of one or more sequences of symbols, each either a terminal symbol (e.g., literal text like "Sr." or ",") or a non-terminal symbol (e.g., <last-name>), with alternatives separated by a vertical bar (|).
For example, in the rule <opt-suffix-part> ::= "Sr." | "Jr." | "", the entire line is the derivation rule, "Sr.", "Jr.", and "" (an empty string) are terminal symbols, and <opt-suffix-part> is a non-terminal symbol.
Generating a valid sequence involves starting with a designated start symbol and iteratively applying the derivation rules.[1] This process can extend sequences incrementally. To allow flexibility, some BNF definitions include an optional "delete" symbol (represented as an empty alternative, e.g., <item> ::=<thing> | ), enabling the removal of certain elements while maintaining syntactic validity.[1]
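The generation process described above can be sketched in Python. This is only an illustration under stated assumptions: the toy grammar, the dictionary representation, and the function names `derive` and `language` are hypothetical and do not come from any BNF tool.

```python
import random

# Hypothetical toy grammar: each key is a non-terminal, each value is a list
# of alternatives, and each alternative is a list of symbols.
GRAMMAR = {
    "<greeting>": [["<salutation>", " ", "<name>", "<opt-suffix-part>"]],
    "<salutation>": [["Hello"], ["Dear"]],
    "<name>": [["Smith"], ["Jones"]],
    "<opt-suffix-part>": [[" Sr."], [" Jr."], [""]],
}


def derive(symbol, grammar, rng):
    """Expand `symbol` by applying one randomly chosen rule at each step."""
    if symbol not in grammar:
        return symbol  # a terminal symbol denotes itself
    alternative = rng.choice(grammar[symbol])
    return "".join(derive(s, grammar, rng) for s in alternative)


def language(symbol, grammar):
    """Enumerate every sequence derivable from `symbol`.

    Only terminates for grammars without recursion, such as the toy one above.
    """
    if symbol not in grammar:
        return {symbol}
    results = set()
    for alternative in grammar[symbol]:
        partial = {""}
        for s in alternative:
            partial = {p + t for p in partial for t in language(s, grammar)}
        results |= partial
    return results


if __name__ == "__main__":
    print(derive("<greeting>", GRAMMAR, random.Random(0)))
```

Here `<greeting>` plays the role of the start symbol, and the empty alternative in `<opt-suffix-part>` corresponds to the optional "delete" symbol.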
Example
A practical illustration of BNF is a specification for a simplified U.S. postal address:
<postal-address> ::= <name-part> <street-address> <zip-part>
<name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL> | <personal-part> <name-part>
<personal-part> ::= <first-name> | <initial> "."
<street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>
<zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>
<opt-suffix-part> ::= "Sr." | "Jr." | <roman-numeral> | ""
<opt-apt-num> ::= "Apt" <apt-num> | ""
This translates into English as:
- A postal address consists of a name-part, followed by a street-address part, followed by a zip-code part.
- A name-part consists of either: a personal-part followed by a last name followed by an optional suffix (Jr., Sr., or dynastic number) and end-of-line, or a personal-part followed by a name-part (this rule illustrates the use of recursion in BNFs, covering the case of people who use multiple first and middle names and initials).[3]
- A personal-part consists of either a first name or an initial followed by a dot.
- A street address consists of a house number, followed by a street name, followed by an optional apartment specifier, followed by an end-of-line.
- A zip-part consists of a town-name, followed by a comma, followed by a state code, followed by a ZIP-code followed by an end-of-line.
- An opt-suffix-part consists of a suffix, such as "Sr.", "Jr." or a roman-numeral, or an empty string (i.e. nothing).
- An opt-apt-num consists of a prefix "Apt" followed by an apartment number, or an empty string (i.e. nothing).
Note that many things (such as the format of a first-name, apartment number, ZIP-code, and Roman numeral) are left unspecified here. If necessary, they may be described using additional BNF rules.
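The recursion in the <name-part> rule can be illustrated with a small recognizer. This is a minimal sketch, not part of the specification: because the grammar leaves <first-name>, <last-name>, <initial>, and <roman-numeral> unspecified, the regular expressions below are assumptions chosen for illustration, the input is assumed to be already split into tokens, and the end of the token list stands in for <EOL>.

```python
import re

# Assumed token shapes; the grammar above leaves these unspecified.
WORD = re.compile(r"[A-Z][a-z]+")   # stands in for <first-name> and <last-name>
INITIAL = re.compile(r"[A-Z]\.")    # <initial> "."
ROMAN = re.compile(r"[IVX]+")       # stands in for <roman-numeral>
SUFFIXES = {"Sr.", "Jr."}


def is_personal_part(token):
    """<personal-part> ::= <first-name> | <initial> "." """
    return bool(WORD.fullmatch(token) or INITIAL.fullmatch(token))


def is_name_part(tokens):
    """<name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL>
                     | <personal-part> <name-part>
    """
    if len(tokens) < 2 or not is_personal_part(tokens[0]):
        return False
    rest = tokens[1:]
    # First alternative: a last name, then an optional suffix, then end of line.
    if WORD.fullmatch(rest[0]):
        if len(rest) == 1:
            return True
        if len(rest) == 2 and (rest[1] in SUFFIXES or ROMAN.fullmatch(rest[1])):
            return True
    # Second alternative: the recursive case for additional names or initials.
    return is_name_part(rest)
```

For instance, `is_name_part(["J.", "R.", "Smith", "Jr."])` succeeds only by taking the recursive alternative once before the first alternative applies.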
History
The concept of using rewriting rules to describe language structure traces back to at least Pāṇini, an ancient Indian Sanskrit grammarian who lived sometime between the 6th and 4th centuries BC.[4] His notation for describing Sanskrit word structure is equivalent in power to that of BNF and exhibits many similar properties.[5]
In Western society, grammar was long regarded as a subject for teaching rather than scientific study; descriptions were informal and targeted at practical usage. This perspective shifted in the first half of the 20th century, when linguists such as Leonard Bloomfield and Zellig Harris began attempts to formalize language description, including phrase structure. Meanwhile, mathematicians explored related ideas through string rewriting rules as formal logical systems, such as Axel Thue in 1914, Emil Post in the 1920s–40s,[6] and Alan Turing in 1936. Noam Chomsky, teaching linguistics to students of information theory at MIT, combined linguistics and mathematics, adapting Thue's formalism to describe natural language syntax. In 1956, he introduced a clear distinction between generative rules (those of context-free grammars) and transformation rules.[7][8]
BNF itself emerged in 1959, when John Backus, a programming language designer at IBM, proposed a metalanguage of metalinguistic formulas to define the syntax of the new programming language IAL, known today as ALGOL 58.[9] The notation was formalized in the ALGOL 60 report, and Peter Naur named it Backus normal form in the committee's 1963 report.[10] Whether Backus was directly influenced by Chomsky's work is uncertain.[11][12]
Donald Knuth argued in 1964 that BNF should be read as Backus–Naur form, as it is "not a normal form in the conventional sense," unlike Chomsky normal form.[13] In 1967, Peter Zilahy Ingerman suggested renaming it Pāṇini Backus form to acknowledge Pāṇini's earlier, independent development of a similar notation.[5]
In the ALGOL 60 report, Naur described BNF as a metalinguistic formula:[10]
Sequences of characters enclosed in the brackets <> represent metalinguistic variables whose values are sequences of symbols. The marks "::=" and "|" (the latter with the meaning of "or") are metalinguistic connectives. Any mark in a formula, which is not a variable or a connective, denotes itself. Juxtaposition of marks or variables in a formula signifies juxtaposition of the sequence denoted.
This is exemplified in the report's section 2.3, where comments are specified:
For the purpose of including text among the symbols of a program the following "comment" conventions hold:
| The sequence of basic symbols: | is equivalent to |
| --- | --- |
| ; comment <any sequence not containing ';'>; | ; |
| begin comment <any sequence not containing ';'>; | begin |
| end <any sequence not containing 'end' or ';' or 'else'> | end |

Equivalence here means that any of the three structures shown in the left column may be replaced, in any occurrence outside of strings, by the symbol shown in the same line in the right column without any effect on the action of the program.
Naur altered Backus's original symbols for ALGOL 60, changing :≡ to ::= and the overbarred "or" to |, using commonly available characters.[14]
BNF is very similar to canonical-form Boolean algebra equations (used in logic-circuit design), reflecting Backus's mathematical background as a FORTRAN designer.[2] Studies of Boolean algebra were commonly part of a mathematics curriculum, which may have informed Backus's approach. Neither Backus nor Naur described the names enclosed in < > as non-terminals—Chomsky's terminology was not originally used in describing BNF. Naur later called them "classes" in 1961 course materials.[2] In the ALGOL 60 report, they were "metalinguistic variables," with other symbols defining the target language.
Saul Rosen, involved with the Association for Computing Machinery since 1947, contributed to the transition from IAL to ALGOL and edited Communications of the ACM. He described BNF as a metalanguage for ALGOL in his 1967 book.[15] Early ALGOL manuals from IBM, Honeywell, Burroughs, and Digital Equipment Corporation followed this usage.
Impact
BNF significantly influenced programming language development, notably as the basis for early compiler-compiler systems. Examples include Edgar T. Irons' "A Syntax Directed Compiler for ALGOL 60" and Brooker and Morris' "A Compiler Building System," which directly utilized BNF.[16] Others, like Schorre's META II, adapted BNF into a programming language, replacing < > with quoted strings and adding operators like $ for repetition, as in:
EXPR = TERM $('+' TERM .OUT('ADD') | '-' TERM .OUT('SUB'));
This influenced tools like yacc, a widely used parser generator rooted in BNF principles.[17] BNF remains one of the oldest computer-related notations still referenced today, though its variants often dominate modern applications.
Examples of its use as a metalanguage include defining arithmetic expressions:
<expr> ::= <term> | <expr> <addop> <term>
Here, <expr> can recursively include itself, allowing repeated additions.
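The effect of this left-recursive rule can be sketched with a small hand-written evaluator. The sketch makes several assumptions that are not in the rule itself: <term> is taken to be an unsigned integer, <addop> is taken to be "+" or "-", and the function names are hypothetical. A naive recursive-descent function would call itself forever on a left-recursive rule, so the recursion is rewritten as a loop, which preserves the left-to-right grouping.

```python
import re

# Assumed lexical structure: unsigned integers and the operators + and -.
TOKEN = re.compile(r"\s*(\d+|[+-])")


def tokenize(text):
    """Split the input into integer and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_expr(tokens):
    """Evaluate <expr> ::= <term> | <expr> <addop> <term>.

    The left recursion is unrolled into a loop: one <term>, then any number
    of <addop> <term> pairs, combined from left to right.
    """
    if not tokens or not tokens[0].isdigit():
        raise SyntaxError("expected <term>")
    value = int(tokens[0])
    i = 1
    while i < len(tokens):
        op = tokens[i]
        if op not in ("+", "-"):
            raise SyntaxError("expected <addop>")
        if i + 1 >= len(tokens) or not tokens[i + 1].isdigit():
            raise SyntaxError("expected <term> after <addop>")
        term = int(tokens[i + 1])
        value = value + term if op == "+" else value - term
        i += 2
    return value
```

With these assumptions, `parse_expr(tokenize("10 - 4 - 3"))` groups as (10 - 4) - 3, mirroring how the rule nests each <expr> on the left.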
BNF today is one of the oldest computer-related languages still in use.
BNF representation of itself
BNF's syntax itself may be represented with a BNF like the following:
<syntax> ::= <rule> | <rule> <syntax>
<rule> ::= <opt-whitespace> "<" <rule-name> ">" <opt-whitespace> "::=" <opt-whitespace> <expression> <line-end>
<opt-whitespace> ::= " " <opt-whitespace> | ""
<expression> ::= <list> | <list> <opt-whitespace> "|" <opt-whitespace> <expression>
<line-end> ::= <opt-whitespace> <EOL> | <line-end> <line-end>
<list> ::= <term> | <term> <opt-whitespace> <list>
<term> ::= <literal> | "<" <rule-name> ">"
<literal> ::= '"' <text1> '"' | "'" <text2> "'"
<text1> ::= "" | <character1> <text1>
<text2> ::= "" | <character2> <text2>
<character> ::= <letter> | <digit> | <symbol>
<letter> ::= "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z" | "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<symbol> ::= "|" | " " | "!" | "#" | "$" | "%" | "&" | "(" | ")" | "*" | "+" | "," | "-" | "." | "/" | ":" | ";" | ">" | "=" | "<" | "?" | "@" | "[" | "\" | "]" | "^" | "_" | "`" | "{" | "}" | "~"
<character1> ::= <character> | "'"
<character2> ::= <character> | '"'
<rule-name> ::= <letter> | <rule-name> <rule-char>
<rule-char> ::= <letter> | <digit> | "-"
Note that "" is the empty string.
The original BNF did not use quotes as shown in <literal> rule. This assumes that no whitespace is necessary for proper interpretation of the rule.
<EOL> represents the appropriate line-end specifier (in ASCII, carriage-return, line-feed or both depending on the operating system). <rule-name> and <text> are to be substituted with a declared rule's name/label or literal text, respectively.
In the U.S. postal address example above, the entire block-quote is a <syntax>. Each line or unbroken grouping of lines is a rule; for example one rule begins with <name-part> ::=. The other part of that rule (aside from a line-end) is an expression, which consists of two lists separated by a vertical bar |. These two lists consist of some terms (three terms and two terms, respectively). Each term in this particular rule is a rule-name.
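The <syntax>, <rule>, <expression>, <list>, and <term> structure described above can be mirrored by a small reader. This is a simplified sketch rather than a full implementation of the grammar: it assumes one rule per line, does not validate rule names, silently ignores any text that is not a recognised term, and the names `TERM`, `parse_bnf`, and `SAMPLE` are hypothetical.

```python
import re

# A term is a rule name in angle brackets or a quoted literal; a bare "|"
# separates alternatives. Quoted literals are tried before the bare bar,
# so a literal such as "|" is kept as a terminal.
TERM = re.compile(r'<[^<>]+>|"[^"]*"|\'[^\']*\'|\|')


def parse_bnf(text):
    """Map each rule name to a list of alternatives (each a list of terms)."""
    grammar = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        head, sep, body = line.partition("::=")
        if not sep:
            raise SyntaxError(f"missing '::=' in rule: {line!r}")
        alternatives = [[]]
        for token in TERM.findall(body):
            if token == "|":
                alternatives.append([])  # start the next <list>
            else:
                alternatives[-1].append(token)
        grammar[head.strip()] = alternatives
    return grammar


SAMPLE = r'''<opt-suffix-part> ::= "Sr." | "Jr." | ""
<term> ::= <literal> | "<" <rule-name> ">"
<expression> ::= <list> | <list> <opt-whitespace> "|" <opt-whitespace> <expression>
'''

if __name__ == "__main__":
    for name, alternatives in parse_bnf(SAMPLE).items():
        print(name, alternatives)
```

Each outer list corresponds to an <expression>, each inner list to a <list>, and each string to a <term>.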
Variants
EBNF
There are many variants and extensions of BNF, generally either for the sake of simplicity and succinctness, or to adapt it to a specific application. One common feature of many variants is the use of regular expression repetition operators such as * and +. The extended Backus–Naur form (EBNF) is a common one.
Another common extension is the use of square brackets around optional items. Although not present in the original ALGOL 60 report (instead introduced a few years later in IBM's PL/I definition), the notation is now universally recognised.
ABNF
Augmented Backus–Naur form (ABNF) and Routing Backus–Naur form (RBNF)[18] are extensions commonly used to describe Internet Engineering Task Force (IETF) protocols.
Parsing expression grammars build on the BNF and regular expression notations to form an alternative class of formal grammar, which is essentially analytic rather than generative in character.
Others
Many BNF specifications found online today are intended to be human-readable and are non-formal. These often include many of the following syntax rules and extensions:
- Optional items enclosed in square brackets: [<item-x>].
- Items existing 0 or more times are enclosed in curly brackets or suffixed with an asterisk (*) such as <word> ::= <letter> {<letter>} or <word> ::= <letter> <letter>* respectively.
- Items existing 1 or more times are suffixed with an addition (plus) symbol, +, such as <word> ::= <letter>+.
- Terminals may appear in bold rather than italics, and non-terminals in plain text rather than angle brackets.
- Where items are grouped, they are enclosed in simple parentheses.
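The optional and repetition extensions listed above add convenience rather than expressive power, since each can be rewritten as plain BNF using recursion and an empty alternative. The helper functions below are a hypothetical sketch of one such rewriting (other equivalent encodings exist); they return the plain-BNF rule as text and assume the caller supplies a fresh rule name.

```python
def optional(name, item):
    """Plain-BNF rule for [item]: the item or the empty string."""
    return f'{name} ::= {item} | ""'


def zero_or_more(name, item):
    """Plain-BNF rule for item* or {item}: right recursion with an empty case."""
    return f'{name} ::= {item} {name} | ""'


def one_or_more(name, item):
    """Plain-BNF rule for item+: one item, or one item followed by more."""
    return f"{name} ::= {item} | {item} {name}"


if __name__ == "__main__":
    # <word> ::= <letter>+ rewritten without the + operator.
    print(one_or_more("<word>", "<letter>"))
```

Under this encoding, the example rule <word> ::= <letter>+ becomes a single recursive rule with two alternatives.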
Software using BNF or variants
Software that accepts BNF (or a superset) as input
- ANTLR, a parser generator written in Java
- Coco/R, compiler generator accepting an attributed grammar in EBNF
- DMS Software Reengineering Toolkit, program analysis and transformation system for arbitrary languages
- GOLD, a BNF parser generator
- RPA BNF parser.[19] Online (PHP) demo parsing: JavaScript, XML
- XACT X4MR System,[20] a rule-based expert system for programming language translation
- XPL Analyzer, a tool which accepts simplified BNF for a language and produces a parser for that language in XPL; it may be integrated into the supplied SKELETON program, with which the language may be debugged[21] (a SHARE contributed program, which was preceded by A Compiler Generator[22])
- bnfparser2,[17] a universal syntax verification utility
- bnf2xml,[23] Markup input with XML tags using advanced BNF matching
- JavaCC,[24] Java Compiler Compiler™ (JavaCC™), the Java parser generator
Similar software
- GNU bison, GNU version of yacc
- Yacc, parser generator (most commonly used with the Lex preprocessor)
- Racket's parser tools, lex and yacc-style parsing (Beautiful Racket edition)
- Qlik Sense, a BI tool, uses a variant of BNF for scripting[25]
- BNF Converter (BNFC[26]), operating on a variant called "labeled Backus–Naur form" (LBNF). In this variant, each production for a given non-terminal is given a label, which can be used as a constructor of an algebraic data type representing that nonterminal. The converter is capable of producing types and parsers for abstract syntax in several languages, including Haskell and Java
See also
- Augmented Backus–Naur form (ABNF)
- Compiler Description Language (CDL)
- Definite clause grammar – a more expressive alternative to BNF used in Prolog
- Extended Backus–Naur form (EBNF)
- Meta-II – an early compiler writing tool and notation
- Syntax diagram – railroad diagram
- Translational Backus–Naur form (TBNF)
- Van Wijngaarden grammar – used in preference to BNF to define Algol68
- Wirth syntax notation – an alternative to BNF from 1977
References
External links
- Garshol, Lars Marius, BNF and EBNF: What are they and how do they work? (http://www.garshol.priv.no/download/text/bnf.html).
- RFC 5234, Augmented BNF for Syntax Specifications: ABNF.
- RFC 5511, Routing BNF: A Syntax Used in Various Protocol Specifications.
- ISO/IEC 14977:1996(E) Information technology – Syntactic metalanguage – Extended BNF, available from ISO's publicly available standards (http://standards.iso.org/ittf/PubliclyAvailableStandards/) or from Marcus Kuhn (http://www.cl.cam.ac.uk/~mgk25/iso-14977.pdf) (the latter is missing the cover page, but is otherwise much cleaner)
Language grammars
- Script error: No such module "citation/CS1"., the original BNF.
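- Savage, "BNF grammars for SQL-92, SQL-99 and SQL-2003" (http://savage.net.au/SQL/), freely available BNF grammars for SQL.
- "BNF Web Club", DB research, Unige, CH (http://cui.unige.ch/db-research/Enseignement/analyseinfo/BNFweb.html, archived 2007-01-24 at https://web.archive.org/web/20070124000335/http://cui.unige.ch/db-research/Enseignement/analyseinfo/BNFweb.html), freely available BNF grammars for SQL, Ada, Java.
- "Free Programming Language Grammars for Compiler Construction", Source code, The free country (http://www.thefreecountry.com/sourcecode/grammars.shtml), freely available BNF/EBNF grammars for C/C++, Pascal, COBOL, Ada 95, PL/I.
- "BNF files related to the STEP standard", Exp engine, Source forge (SVN) (archived 2012-12-25 at https://archive.today/20121225083955/http://exp-engine.svn.sourceforge.net/viewvc/exp-engine/engine/trunk/docs/). Includes parts 11, 14, and 21 of the ISO 10303 (STEP) standard.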
- ↑ a b c Script error: No such module "citation/CS1".
- ↑ a b c Script error: No such module "citation/CS1".
- ↑ Template:Talk other
- ↑ Script error: No such module "citation/CS1".
- ↑ a b Script error: No such module "Citation/CS1".
- ↑ Script error: No such module "Citation/CS1".
- ↑ Script error: No such module "Citation/CS1".
- ↑ Script error: No such module "citation/CS1".
- ↑ Script error: No such module "citation/CS1".
- ↑ a b Revised ALGOL 60 report section 1.1. Script error: No such module "citation/CS1".
- ↑ Script error: No such module "citation/CS1".
- ↑ Template:Cite report Here: p.25
- ↑ Script error: No such module "Citation/CS1".
- ↑ Script error: No such module "citation/CS1".
- ↑ Script error: No such module "citation/CS1".
- ↑ Script error: No such module "citation/CS1".
- ↑ a b Script error: No such module "citation/CS1".
- ↑ RBNF.
- ↑ Script error: No such module "citation/CS1".
- ↑ Script error: No such module "citation/CS1".
- ↑ If the target processor is System/360, or related, even up to z/System, and the target language is similar to PL/I (or, indeed, XPL), then the required code "emitters" may be adapted from XPL's "emitters" for System/360.
- ↑ Script error: No such module "citation/CS1".
- ↑ bnf2xml
- ↑ Script error: No such module "citation/CS1".
- ↑ Script error: No such module "citation/CS1".
- ↑ Script error: No such module "citation/CS1".