8-bit clean
Latest revision as of 20:46, 10 October 2025
In computer networking, a system is 8-bit clean if it processes 8-bit character encodings without altering the high bit or treating any byte as an in-band control code. This property can describe both a communications protocol and the software and devices that implement such protocols. Although many early email systems only supported 7-bit data, the vast majority of modern email systems are 8-bit clean.
History
Until the early 1990s, many programs and data transmission channels were character-oriented and treated some characters like end-of-text (ETX) as control characters. Others assumed a stream of seven-bit characters, with values between 0 and 127; for example, the ASCII standard used only seven bits per character, avoiding an eight-bit representation in order to save on data transmission costs. On computers and data links using 8-bit bytes, this left the top bit of each byte free for use as a parity bit, flag bit, or metadata control bit. Seven-bit systems and data links are unable to directly handle more complex character codes which are commonplace in non-English-speaking countries with larger alphabets.
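The effect of a seven-bit link on eight-bit text can be illustrated with a small sketch. The helper names `strip_high_bit` and `add_even_parity` are hypothetical and only model the behaviour described above; they are not taken from any particular system.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Model a 7-bit link (an assumption for illustration): clear bit 7 of every byte."""
    return bytes(b & 0x7F for b in data)


def add_even_parity(code: int) -> int:
    """Use the spare top bit of a 7-bit code as an even-parity bit."""
    ones = bin(code & 0x7F).count("1")
    # Set bit 7 only when the 7-bit code has an odd number of 1 bits.
    return (code & 0x7F) | 0x80 if ones % 2 else code & 0x7F


ascii_text = "cafe".encode("ascii")      # every byte is below 0x80
latin1_text = "café".encode("latin-1")   # 'é' is the single byte 0xE9

ascii_survives = strip_high_bit(ascii_text) == ascii_text
latin1_survives = strip_high_bit(latin1_text) == latin1_text

# 0xE9 with the high bit cleared is 0x69, which is ASCII 'i'.
garbled = strip_high_bit(latin1_text).decode("ascii")
```

In this model pure ASCII passes unchanged, while the Latin-1 text arrives as a different, still valid-looking ASCII string, which is why such corruption was easy to miss.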
Binary files consisting of 8-bit octets cannot be transmitted through 7-bit data channels directly. To work around this, binary-to-text encodings have been devised which use only 7-bit ASCII characters. Some of these encodings are uuencoding, Ascii85, SREC, BinHex, Kermit and MIME's Base64. EBCDIC-based systems cannot handle all characters used in uuencoded data. However, the Base64 encoding does not have this problem.
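As a rough illustration of why binary-to-text encodings help, the following sketch pushes arbitrary octets through a simulated 7-bit channel, once raw and once as Base64. The `strip_high_bit` function is a hypothetical stand-in for such a channel; `base64` is the Python standard library module.

```python
import base64


def strip_high_bit(data: bytes) -> bytes:
    """Hypothetical 7-bit channel: clears the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


payload = bytes(range(256))              # every possible octet value once

# Sent raw, the upper half of the byte range is destroyed.
raw_ok = strip_high_bit(payload) == payload

# Base64 output consists of ASCII characters only, so no byte has bit 7 set.
encoded = base64.b64encode(payload)
is_seven_bit = all(b < 0x80 for b in encoded)

# The encoded form therefore crosses the channel unchanged and decodes exactly.
received = strip_high_bit(encoded)
round_trip_ok = base64.b64decode(received) == payload
```

The cost of this approach is size: Base64 represents every three octets as four characters.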
SMTP and NNTP
Historically, various media were used to transfer messages, some of which only supported 7-bit data, so during the 20th century an 8-bit message was likely to be garbled in transit. Some implementations ignored the formal discouragement of 8-bit data and allowed bytes with the high bit set to pass through. Such implementations are said to be 8-bit clean. In general, a communications protocol is said to be 8-bit clean if it correctly passes through the high bit of each byte in the communication process.
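One simple way to probe a channel for this property is to send every byte value through it and compare the result. The sketch below assumes a channel can be modelled as a function from bytes to bytes; the three example channels are hypothetical. Passing the probe is a sanity check rather than a proof, since a real channel might mangle only particular byte sequences.

```python
from typing import Callable

Channel = Callable[[bytes], bytes]


def is_8bit_clean(channel: Channel) -> bool:
    """Return True if all 256 byte values pass through the channel unchanged."""
    probe = bytes(range(256))
    return channel(probe) == probe


def transparent(data: bytes) -> bytes:
    """Passes every byte through untouched."""
    return data


def seven_bit(data: bytes) -> bytes:
    """Clears the high bit of each byte, as a 7-bit link would."""
    return bytes(b & 0x7F for b in data)


def etx_terminated(data: bytes) -> bytes:
    """Treats ETX (0x03) as an in-band end-of-text marker and truncates there."""
    end = data.find(b"\x03")
    return data if end == -1 else data[:end]


results = {
    "transparent": is_8bit_clean(transparent),
    "seven_bit": is_8bit_clean(seven_bit),
    "etx_terminated": is_8bit_clean(etx_terminated),
}
```

The last channel keeps the high bit intact but still fails, because it interprets a byte as a control code.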
Many early communications protocol standards, such as RFC 780, RFC 788, RFC 821, RFC 2821 and RFC 5321 (for SMTP), RFC 977 (for NNTP) and RFC 1056, were designed to work over such "7-bit" communication links. They specifically require the use of ASCII "transmitted as an 8-bit byte with the high-order bit cleared to zero", and some of these[1] explicitly restrict all data to 7-bit characters.
For the first few decades of email networks (1971 to the early 1990s), most email messages were plain text in the 7-bit US-ASCII character set.[2]
The RFC 788 definition of SMTP, like its predecessor RFC 780, limits Internet Mail to lines of 7-bit US-ASCII characters, each at most 1000 characters long.[3][4][5][6]
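A check for these two limits might look like the sketch below. The function name `fits_legacy_limits` is hypothetical, and the sketch assumes that the 1000-character limit counts the terminating CRLF, which the text above does not state.

```python
MAX_LINE = 1000  # assumed here to include the trailing CRLF


def fits_legacy_limits(message: bytes) -> bool:
    """Check that every CRLF-delimited line is 7-bit and within MAX_LINE."""
    for line in message.split(b"\r\n"):
        if len(line) + 2 > MAX_LINE:       # +2 accounts for the CRLF itself
            return False
        if any(b > 0x7F for b in line):    # high bit set: not 7-bit US-ASCII
            return False
    return True


plain_ok = fits_legacy_limits(b"Hello, world\r\n")
high_bit_ok = fits_legacy_limits(b"caf\xe9\r\n")
longest_ok = fits_legacy_limits(b"a" * 998 + b"\r\n")
too_long_ok = fits_legacy_limits(b"a" * 999 + b"\r\n")
```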
Later, the format of email messages was redefined in order to support messages that are not entirely US-ASCII text (text messages in character sets other than US-ASCII, and non-text messages, such as audio and images).[6] The header field Content-Transfer-Encoding=binary requires an 8-bit clean transport. The header field Content-Transfer-Encoding=8BIT does not designate 8-bit clean, since CRLF has special significance.
RFC 3977[7] specifies that "NNTP operates over any reliable bi-directional 8-bit-wide data stream channel" and changes the character set for commands to UTF-8. However, RFC 5536[8] still limits the character set to ASCII, with non-ASCII data carried using the MIME encodings of RFC 2047 and RFC 2231.[9]
The Internet community generally adds features by extension, allowing communication in both directions between upgraded machines and not-yet-upgraded machines, rather than declaring formerly standards-compliant legacy software to be broken and requiring that all software worldwide be upgraded to the latest standard. The recommended way to take advantage of 8-bit clean links between machines is to use the ESMTP (RFC 1869) 8BITMIME extension[10][11] for message bodies and the SMTP SMTPUTF8[12] extension for message headers. Despite this, some mail transfer agents, notably Exim and qmail, relay mail to servers that do not advertise 8BITMIME without performing the conversion to 7-bit MIME (typically quoted-printable, "Q-P conversion") required by the 8BITMIME specification. This "just-send-8" attitude does not, in fact, cause problems in practice because virtually all modern email servers are 8-bit clean.[13]
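The decision a conforming sender faces can be sketched as follows. This is a simplified model, not the behaviour of any particular mail transfer agent: `prepare_body` is a hypothetical helper, the set of advertised extensions is passed in as plain strings, and `quopri` is the quoted-printable module from the Python standard library.

```python
import quopri
from typing import Iterable, Tuple


def prepare_body(body: bytes, advertised: Iterable[str]) -> Tuple[str, bytes]:
    """Pick a Content-Transfer-Encoding for a body, given the server's extensions."""
    if all(b < 0x80 for b in body):
        return "7bit", body                       # nothing to protect
    if "8BITMIME" in {name.upper() for name in advertised}:
        return "8bit", body                       # server accepts 8-bit bodies
    # Server did not advertise 8BITMIME: convert to 7-bit quoted-printable.
    return "quoted-printable", quopri.encodestring(body)


body = "café".encode("utf-8")

modern = prepare_body(body, ["PIPELINING", "8BITMIME"])
legacy = prepare_body(body, ["PIPELINING"])

legacy_is_seven_bit = all(b < 0x80 for b in legacy[1])
legacy_round_trip = quopri.decodestring(legacy[1]) == body
```

A "just-send-8" agent corresponds to always taking the second branch regardless of what the server advertises. With Python's `smtplib`, the advertised set could be queried through `SMTP.has_extn("8bitmime")` after `ehlo()`, although wiring that in is outside this sketch.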
See also
- 32-bit clean
- MIME § Content-Transfer-Encoding
- Telnet
Notes
References
- ↑ RFC 780: Appendix A, RFC 788: 4.5.2., RFC 821: Appendix B, RFC 1056: 4.
- ↑ John Beck. "Email Explained". 2011.
- ↑ "SIMPLE MAIL TRANSFER PROTOCOL". RFC 788.
- ↑ Template:Cite RFC
- ↑ Dan Sugalski. "E-mail with Attachments". "The Perl Journal". Summer 1999. "When mail was standardized way back in 1982 with RFC822, ... The only limits placed on the body were the character set (7-bit ASCII) and the maximum line length (1000 characters)."
- ↑ a b Template:Cite RFC
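- ↑ C. Feather. "Network News Transfer Protocol (NNTP)". RFC 3977.
- ↑ K. Murchison (ed.). "Netnews Article Format". RFC 5536.
- ↑ "MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations". RFC 2231.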
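- ↑ Theodore Ts'o, Keith Moore, Mark Crispin. "8-bit transmission in NNTP". IETF-SMTP mail list. 12 September 1994. Archived from the original (http://www.imc.org/ietf-smtp/old-archive/msg02018.html) at https://web.archive.org/web/20120320233721/http://www.imc.org/ietf-smtp/old-archive/msg02018.html on 20 March 2012. Retrieved 3 April 2010.
- ↑ "comp.mail.mime FAQ, part 3 'What's ESMTP, and how does it affect MIME?'". Usenet FAQs. 8 August 1997. Archived from the original (http://www.uni-giessen.de/faq/archiv/mail.mime-faq.part1-9/msg00002.html) at https://web.archive.org/web/20120118070711/http://www.uni-giessen.de/faq/archiv/mail.mime-faq.part1-9/msg00002.html on 18 January 2012. Retrieved 3 April 2010.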
- ↑ "SMTP Extension for Internationalized Email". RFC 8531.
- ↑ Script error: No such module "citation/CS1".