UTF-EBCDIC

UTF-EBCDIC is a character encoding capable of encoding all 1,112,064 valid character code points in Unicode using 1 to 5 bytes (in contrast to a maximum of 4 for UTF-8).[1] It is meant to be EBCDIC-friendly, so that legacy EBCDIC applications on mainframes may process the characters without much difficulty. Its advantages for existing EBCDIC-based systems are similar to UTF-8's advantages for existing ASCII-based systems. Details on UTF-EBCDIC are defined in Unicode Technical Report #16.

To produce the UTF-EBCDIC encoded version of a series of Unicode code points, an encoding based on UTF-8 (known in the specification as UTF-8-Mod) is applied first, creating what the specification calls an I8 sequence. The main difference between this encoding and UTF-8 is that it allows Unicode code points U+0080 through U+009F (the C1 control codes) to be represented as a single byte and therefore later mapped to the corresponding EBCDIC control codes. To achieve this, UTF-8-Mod uses 101xxxxx instead of 10xxxxxx as the format for trailing bytes in a multi-byte sequence. As a trailing byte can then hold only 5 bits rather than 6, the UTF-8-Mod encoding of a code point above U+03FF can be longer than its UTF-8 encoding.
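The UTF-8-Mod step can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `utf8_mod_encode` is hypothetical, and the range boundaries are derived from the rule that every trailing byte carries 5 payload bits, assuming the lead-byte layouts of Technical Report #16 (110yyyyy, 1110zzzz, 11110www, 111110rr).

```python
def utf8_mod_encode(cp: int) -> bytes:
    """Encode one Unicode code point as a UTF-8-Mod (I8) byte sequence."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp:#x}")
    # ASCII and the C1 controls (U+0000..U+009F) are stored as a single byte.
    if cp < 0xA0:
        return bytes([cp])
    # (highest code point, lead-byte marker, number of trailing bytes)
    layouts = (
        (0x3FF, 0xC0, 1),     # 110yyyyy + 1 trailing byte  (10 bits)
        (0x3FFF, 0xE0, 2),    # 1110zzzz + 2 trailing bytes (14 bits)
        (0x3FFFF, 0xF0, 3),   # 11110www + 3 trailing bytes (18 bits)
        (0x3FFFFF, 0xF8, 4),  # 111110rr + 4 trailing bytes (22 bits)
    )
    for limit, lead, n_trail in layouts:
        if cp <= limit:
            break
    trailing = []
    for _ in range(n_trail):
        trailing.append(0xA0 | (cp & 0x1F))  # 101xxxxx: 5 payload bits each
        cp >>= 5
    # Whatever bits remain go into the lead byte.
    return bytes([lead | cp] + trailing[::-1])


print(utf8_mod_encode(0x41).hex())      # 41 (same as ASCII)
print(utf8_mod_encode(0x85).hex())      # 85 (a C1 control, one byte)
print(utf8_mod_encode(0x400).hex())     # e1a0a0 (three bytes; UTF-8 needs two)
print(utf8_mod_encode(0x10FFFF).hex())  # f9a1bfbfbf (five bytes)
```

The result is still ASCII-based; a separate byte-for-byte mapping is needed to obtain UTF-EBCDIC.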

The UTF-8-Mod transformation leaves the data in an ASCII-based format (for example, U+0041 "A" is still encoded as 0x41), so each byte is then fed through a reversible (one-to-one) lookup table to produce the final UTF-EBCDIC encoding. For example, 0x41 in this table maps to 0xC1; thus the UTF-EBCDIC encoding of U+0041 (Unicode's "A") is 0xC1 (EBCDIC's "A").
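The lookup step is a plain byte substitution, which the following Python sketch illustrates. The table here is deliberately partial: it lists only the four byte pairs quoted in this article (0x41 to 0xC1, 0x5F to 0x6D, 0xBF to 0x73, 0xC2 to 0x76), and the names `I8_TO_EBCDIC`, `EBCDIC_TO_I8` and `map_bytes` are hypothetical. A real converter would use the full 256-entry table from Technical Report #16.

```python
# Partial I8 -> UTF-EBCDIC byte map (illustration only, not the full table).
I8_TO_EBCDIC = {0x41: 0xC1, 0x5F: 0x6D, 0xBF: 0x73, 0xC2: 0x76}

# Because the mapping is one-to-one, inverting the dictionary gives the
# reverse direction.
EBCDIC_TO_I8 = {ebcdic: i8 for i8, ebcdic in I8_TO_EBCDIC.items()}


def map_bytes(data: bytes, table: dict) -> bytes:
    """Substitute every byte via `table`.

    Raises KeyError for bytes that this partial table does not cover.
    """
    return bytes(table[b] for b in data)


i8 = b"\x41"                              # I8 sequence for U+0041 "A"
ebcdic = map_bytes(i8, I8_TO_EBCDIC)      # b"\xc1", EBCDIC "A"
assert map_bytes(ebcdic, EBCDIC_TO_I8) == i8  # the mapping round-trips
```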

UTF-EBCDIC is rarely used, even on the EBCDIC-based mainframes for which it was designed. IBM EBCDIC-based mainframe operating systems, such as z/OS, usually use UTF-16 for complete Unicode support. For example, IBM Db2, COBOL, PL/I, Java and the IBM XML toolkit support UTF-16 on IBM mainframes.

Codepage layout

There are 160 characters with single-byte encodings in UTF-EBCDIC (compared to 128 in UTF-8). The single-byte portion resembles IBM-1047 rather than IBM-37, as the location of the square brackets shows: [ and ] are at hex AD and BD respectively, whereas CCSID 37 has them at hex BA and BB.
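The 160 versus 128 figure can be checked by counting code points per encoded length. The sketch below is illustrative only: the helper names `i8_length` and `utf8_length` are hypothetical, and the UTF-EBCDIC boundaries assume 5 payload bits per trailing byte. Since the final table lookup is byte-for-byte, the UTF-8-Mod length is also the UTF-EBCDIC length.

```python
def i8_length(cp: int) -> int:
    """Bytes needed for `cp` in UTF-8-Mod, and therefore in UTF-EBCDIC."""
    for length, limit in enumerate((0x9F, 0x3FF, 0x3FFF, 0x3FFFF, 0x3FFFFF), start=1):
        if cp <= limit:
            return length
    raise ValueError("code point out of range")


def utf8_length(cp: int) -> int:
    """Bytes needed for `cp` in standard UTF-8."""
    for length, limit in enumerate((0x7F, 0x7FF, 0xFFFF, 0x10FFFF), start=1):
        if cp <= limit:
            return length
    raise ValueError("code point out of range")


single_byte_i8 = sum(1 for cp in range(0x110000) if i8_length(cp) == 1)
single_byte_utf8 = sum(1 for cp in range(0x110000) if utf8_length(cp) == 1)
print(single_byte_i8, single_byte_utf8)  # 160 128

# Length comparison at a few boundary code points.
for cp in (0x9F, 0x3FF, 0x400, 0x3FFF, 0x4000, 0x10FFFF):
    print(f"U+{cp:04X}: UTF-EBCDIC {i8_length(cp)} bytes, UTF-8 {utf8_length(cp)} bytes")
```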

[Code chart: 16 × 16 grid of the byte values 0x00 to 0xFF, one row per high nibble; cell contents not reproduced here.]

  Start bytes for a sequence of that many bytes. Tooltip shows the lowest code point encoded using that start byte.

  Start byte where not all combinations of continuation bytes are valid, either because it is an invalid overlong form (the tooltip shows the code point of the first valid sequence), or because it encodes a code point greater than U+10FFFF.

  Continuation bytes. Tooltip shows the hexadecimal value of the 5 bits they add.

  Unused, including lead bytes that can only start an invalid overlong form. For example, 0x76 is unused because even the sequence 0x76 0x73 (which maps to the UTF-8-Mod sequence 0xC2 0xBF) would merely be an overlong encoding of U+005F (properly encoded as UTF-8-Mod 0x5F, UTF-EBCDIC 0x6D).
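The overlong example can be verified by decoding the two-byte UTF-8-Mod form by hand. The following Python sketch is illustrative: the function name `i8_decode_two_bytes` is hypothetical, and it handles only the two-byte layout 110yyyyy 101xxxxx.

```python
def i8_decode_two_bytes(lead: int, trail: int) -> int:
    """Return the value carried by a 2-byte UTF-8-Mod sequence 110yyyyy 101xxxxx."""
    if lead & 0xE0 != 0xC0:
        raise ValueError("not a two-byte lead byte")
    if trail & 0xE0 != 0xA0:
        raise ValueError("not a trailing byte")
    return ((lead & 0x1F) << 5) | (trail & 0x1F)


# 0xBF is the largest trailing byte, so this is the highest value that the
# lead byte 0xC2 can reach.
value = i8_decode_two_bytes(0xC2, 0xBF)
print(f"U+{value:04X}")  # U+005F

# Code points below U+00A0 already have a one-byte form, so any two-byte
# sequence decoding to such a value is overlong.
print(value < 0xA0)  # True

# The first lead byte that can reach U+00A0 is 0xC5.
print(f"U+{i8_decode_two_bytes(0xC5, 0xA0):04X}")  # U+00A0
```

By the same arithmetic, the lead bytes 0xC0 through 0xC4 top out at U+009F, so none of them can start a valid sequence.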

Oracle UTFE

Oracle UTFE is an Oracle database variation of Unicode 3.0 UTF-8, similar to the CESU-8 variant of UTF-8: supplementary characters are encoded as two 4-byte characters rather than as a single 4- or 5-byte character. It is used only on EBCDIC platforms.[2]
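Why a supplementary character would occupy two 4-byte units can be seen from the arithmetic. The Python sketch below rests on assumptions: that, as in CESU-8, the character is first split into a UTF-16 surrogate pair, and that each surrogate is then encoded with the 4-byte UTF-8-Mod layout. Oracle's actual implementation is not documented here, both function names are hypothetical, and any final EBCDIC byte mapping is left out.

```python
def to_surrogates(cp: int):
    """Split a supplementary code point (U+10000..U+10FFFF) into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000
    return 0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)


def i8_four_bytes(value: int) -> bytes:
    """Encode a value in U+4000..U+3FFFF as 11110www 101zzzzz 101yyyyy 101xxxxx."""
    if not 0x4000 <= value <= 0x3FFFF:
        raise ValueError("value does not use the 4-byte layout")
    return bytes([
        0xF0 | (value >> 15),
        0xA0 | ((value >> 10) & 0x1F),
        0xA0 | ((value >> 5) & 0x1F),
        0xA0 | (value & 0x1F),
    ])


high, low = to_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")  # D83D DE00

# Both surrogates fall in U+4000..U+3FFFF, so each takes four bytes.
pair = i8_four_bytes(high) + i8_four_bytes(low)
print(len(pair))  # 8
```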

References

  1. Script error: No such module "citation/CS1".
  2. Script error: No such module "citation/CS1".
