Base64: Difference between revisions

imported>Bruce1ee
m Reverted edit by 2600:1011:A18B:8D74:D9B2:DDAF:BD88:CB7C (talk) to last version by Karl 334
imported>Monkeysmashingkeyboards
m Reverted edits by ~2025-34682-05 (talk) (WS)
Line 1: Line 1:
{{Short description|Group of binary-to-text encoding schemes}}
{{Short description| Encoding for a sequence of byte values using 64 printable characters}}
In [[computer programming]], '''Base64''' is a group of [[binary-to-text encoding]] schemes that transforms [[binary data]] into a sequence of [[Graphic character|printable]] characters, limited to a set of 64 unique characters.  More specifically, the source binary data is taken 6 bits at a time, then this group of 6 bits is mapped to one of 64 unique characters.


As with all binary-to-text encoding schemes, Base64 is designed to carry data stored in binary formats across channels that only reliably support text content. Base64 is particularly prevalent on the [[World Wide Web]]<ref>{{cite web |title= Base64 encoding and decoding – Web APIs |url= https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding |publisher= MDN Web Docs |archive-url= https://web.archive.org/web/20141111151440/https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding |archive-date= 2014-11-11 |url-status=live}}</ref> where one of its uses is the ability to embed [[image files]] or other binary assets inside textual assets such as [[HTML]] and [[CSS]] files.<ref>{{cite web |title= When to base64 encode images (and when not to) |date= 28 August 2011 |url=https://www.davidbcalhoun.com/2011/when-to-base64-encode-images-and-when-not-to/ |archive-url= https://web.archive.org/web/20230829143759/https://www.davidbcalhoun.com/2011/when-to-base64-encode-images-and-when-not-to/ |archive-date=2023-08-29 |url-status=live}}</ref>
'''Base64''' is a [[binary-to-text encoding]] that uses 64 [[Graphic character|printable characters]] to represent each 6-bit segment of a sequence of byte<ref>technically [[octet (computing)|octet]]</ref> values. As for all binary-to-text encodings, Base64 encoding enables [[data transmission|transmitting]] [[binary data]] on a [[communication channel]] that only supports text.


Base64 is also widely used for sending [[e-mail]] attachments, because [[Simple Mail Transfer Protocol|SMTP]]&nbsp;– in its original form&nbsp;– was designed to transport [[7-bit ASCII]] characters only. Encoding an attachment as Base64 before sending, and then decoding when received, assures older SMTP servers will not interfere with the attachment.  
Compared with the original data, Base64-encoded data is about 33% larger, plus roughly 4% more if line breaks are inserted at a typical line length.
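The 33% figure follows from the 4-for-3 mapping: every three input bytes become four output characters, and a final partial group is padded to four. The following Python sketch illustrates this, assuming padded output with no line breaks; the function name <code>encoded_length</code> is illustrative and not part of any standard.

```python
import base64


def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input (no line breaks)."""
    # Round the byte count up to a whole number of 3-byte groups,
    # then emit 4 characters per group.
    return 4 * ((n_bytes + 2) // 3)


data = b"Many hands make light work."
# The formula agrees with the standard-library encoder.
assert encoded_length(len(data)) == len(base64.b64encode(data))

# Relative size increase; it approaches 1/3 as the input grows.
overhead = encoded_length(len(data)) / len(data) - 1
```

For inputs whose length is not a multiple of three, the padding makes the overhead slightly larger than one third.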


Base64 encoding causes an overhead of 33–37% relative to the size of the original binary data (33% by the encoding itself; up to 4% more by the inserted line breaks).
The earliest uses of this encoding were for dial-up communication between systems running the same [[operating system]] – for example, [[uuencoding|uuencode]] for [[UNIX]] and [[BinHex]] for the [[TRS-80]] (later adapted for the [[Macintosh]]) – and could therefore make more assumptions about what characters were safe to use. For instance, uuencode uses uppercase letters, digits, and many punctuation characters, but no lowercase.<ref name="rfc 1421">{{cite IETF |title= Privacy Enhancement for Internet Electronic Mail: Part I: Message Encryption and Authentication Procedures |rfc= 1421 |date=February 1993 |publisher=[[Internet Engineering Task Force|IETF]] |access-date= March 18, 2010}}</ref><ref name="rfc 2045">{{cite IETF |title= Multipurpose Internet Mail Extensions: (MIME) Part One: Format of Internet Message Bodies |rfc= 2045 |date=November 1996 |publisher=[[Internet Engineering Task Force|IETF]] |access-date= March 18, 2010}}</ref><ref name="rfc 3548">{{cite IETF |title= The Base16, Base32, and Base64 Data Encodings |rfc= 3548 |date=July 2003 |publisher=[[Internet Engineering Task Force|IETF]] |access-date= March 18, 2010}}</ref><ref name="autogenerated2006"/>


{{TOC limit|3}}
==Applications==
[[File:35_mm_angle_of_view_vs_focal_length.svg|thumb|link={{filepath:35_mm_angle_of_view_vs_focal_length.svg}}|Example of an SVG file containing embedded JPEG images encoded in Base64<ref>&lt;image xlink:href="data:image/jpeg;base64,<code>JPEG contents encoded in Base64</code>" ... /&gt;</ref>]]
Notable applications of Base64:
 
; Web pages: Encoding as Base64 is prevalent on the [[World Wide Web]]<ref>{{cite web |title= Base64 encoding and decoding – Web APIs |url= https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding |publisher= MDN Web Docs |archive-url= https://web.archive.org/web/20141111151440/https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding |archive-date= 2014-11-11 |url-status=live}}</ref> where it is often used to embed binary data such as a digital image in text such as [[HTML]] and [[CSS]].<ref>{{cite web |title= When to base64 encode images (and when not to) |date= 28 August 2011 |url=https://www.davidbcalhoun.com/2011/when-to-base64-encode-images-and-when-not-to/ |archive-url= https://web.archive.org/web/20230829143759/https://www.davidbcalhoun.com/2011/when-to-base64-encode-images-and-when-not-to/ |archive-date=2023-08-29 |url-status=live}}</ref>
 
; E-mail attachment: Base64 is widely used for sending [[e-mail]] attachments, because [[Simple Mail Transfer Protocol|SMTP]]&nbsp;– in its original form&nbsp;– was designed to transport [[7-bit ASCII]] characters only. Encoding an attachment as Base64 before sending, and then decoding it when received, ensures that older SMTP servers correctly transmit messages with attached binary information.
 
; Embed binary data in a text file: For example, to include the data of an image in a script to avoid depending on external files.
 
; Embed binary data in XML: To embed binary data in an [[XML]] file, using a syntax similar to <code><nowiki><data encoding="base64">...</data></nowiki></code> e.g. [[favicon]]s in [[Firefox]]'s exported <code>bookmarks.html</code>.
 
; Embed PDF file: To embed a [[PDF]] file in an HTML page.<ref>{{Cite web |title=Encode PDF (Portable Document Format) File (.pdf) to Base64 Data |url=https://base64.online/encoders/encode-pdf-to-base64?utm_campaign=og |access-date=2024-03-21 |website=base64.online |language=en}}</ref>
 
; Embedded elements: Although not part of the official specification for the [[SVG]] format, some viewers can interpret Base64 when used for embedded elements, such as raster images inside SVG files.<ref>{{cite web|url=http://jsfiddle.net/MxHPq/|title=Edit fiddle |website=jsfiddle.net}}</ref>
 
; Prevent delimiter collision: To transmit and store text that might otherwise cause [[delimiter collision]].
 
; LDAP Data Interchange Format: To encode character strings in [[LDAP Data Interchange Format]] files.
 
; Data URI scheme: The [[data URI scheme]] can use Base64 to represent file contents. For instance, background images and fonts can be specified in a [[CSS]] stylesheet file as <code>data:</code> URIs, instead of being supplied in separate files.
 
; Leverage clipboard: To store/transmit relatively small amounts of binary data via a computer's text [[Clipboard (computing)|clipboard]] functionality, especially in cases where the information doesn't warrant being permanently saved or when information must be quickly sent between a wide variety of different, potentially incompatible programs. An example is the representation of the public keys of [[cryptocurrency]] recipients as Base64 encoded text strings, which can be easily copied and pasted into users' [[Cryptocurrency wallet|wallet software]].
 
; Support human verification: Binary data that must be quickly verified by humans as a safety mechanism, such as [[Checksum|file checksums]] or [[Public key fingerprint|key fingerprints]], is often represented in Base64 for easy checking, sometimes with additional formatting, such as separating each group of four characters in the representation of a [[Pretty Good Privacy|PGP]] key fingerprint with a space.


==Design==
; QR code encoding: A [[QR code]], which contains binary data, is sometimes stored as Base64 since it is more likely that a QR code reader accurately decodes text than binary data. Also, some devices more readily save text from a QR code than potentially malicious binary data.
The particular set of 64 characters chosen to represent the 64-digit values for the base varies between implementations. The general strategy is to choose 64 characters that are common to most encodings and that are also printable. This combination leaves the data unlikely to be modified in transit through information systems, such as email, that were traditionally not [[8-bit clean]].<ref name="autogenerated2006">{{cite IETF |title= The Base16, Base32, and Base64 Data Encodings |rfc= 4648 |date=October 2006 |publisher=[[Internet Engineering Task Force|IETF]] |access-date= March 18, 2010}}</ref> For example, [[MIME]]'s Base64 implementation uses <code>A</code>–<code>Z</code>, <code>a</code>–<code>z</code>, and <code>0</code>–<code>9</code> for the first 62 values. Other variations share this property but differ in the symbols chosen for the last two values; an example is [[UTF-7]].


The earliest instances of this type of encoding were created for dial-up communication between systems running the same [[operating system|OS]] – for example, [[uuencoding|uuencode]] for [[UNIX]] and [[BinHex]] for the [[TRS-80]] (later adapted for the [[Macintosh]]) – and could therefore make more assumptions about what characters were safe to use. For instance, uuencode uses uppercase letters, digits, and many punctuation characters, but no lowercase.<ref name="rfc 1421">{{cite IETF |title= Privacy Enhancement for Internet Electronic Mail: Part I: Message Encryption and Authentication Procedures |rfc= 1421 |date=February 1993 |publisher=[[Internet Engineering Task Force|IETF]] |access-date= March 18, 2010}}</ref><ref name="rfc 2045">{{cite IETF |title= Multipurpose Internet Mail Extensions: (MIME) Part One: Format of Internet Message Bodies |rfc= 2045 |date=November 1996 |publisher=[[Internet Engineering Task Force|IETF]] |access-date= March 18, 2010}}</ref><ref name="rfc 3548">{{cite IETF |title= The Base16, Base32, and Base64 Data Encodings |rfc= 3548 |date=July 2003 |publisher=[[Internet Engineering Task Force|IETF]] |access-date= March 18, 2010}}</ref><ref name="autogenerated2006"/>
==Alphabet==
The set of characters used to represent the values for each base-64 digit (value from 0 to 63) differs slightly between the variations of Base64. The general strategy is to use printable characters that are common to most [[character encoding]]s. This tends to result in data remaining unchanged as it moves through information systems, such as email, that were traditionally not [[8-bit clean]].<ref name="autogenerated2006">{{cite IETF |title= The Base16, Base32, and Base64 Data Encodings |rfc= 4648 |date=October 2006 |publisher=[[Internet Engineering Task Force|IETF]] |access-date= March 18, 2010}}</ref> Typically, an encoding uses <code>A</code>–<code>Z</code>, <code>a</code>–<code>z</code>, and <code>0</code>–<code>9</code> for the first 62 values. Many variants use <code>+</code> and <code>/</code> for the last two.


==Base64 table from RFC 4648==
<span id="Base64table">Per [https://datatracker.ietf.org/doc/html/rfc4648#section-4 RFC 4648 §4], the following table lists the characters used for each numeric value.</span> To indicate padding, <code>=</code> is used.
<span id="Base64table">This is the Base64 alphabet defined in [https://datatracker.ietf.org/doc/html/rfc4648#section-4 RFC 4648 §4].</span> See also {{sectionlink||Variants summary table}}.


{|class="wikitable" style="text-align:center"
{|class="wikitable" style="text-align:center"
|+ Base64 alphabet defined in RFC 4648.
|+ Base64 alphabet
!scope="col"| Index !!scope="col"| Binary !!scope="col"| {{abbr|Char.|Character}}
!scope="col"| value !!scope="col"| {{abbr|char|Character}}
|rowspan="17"|
|rowspan="17"|
!scope="col"| Index !!scope="col"| Binary !!scope="col"| {{abbr|Char.|Character}}
!scope="col"| value !!scope="col"| {{abbr|char|Character}}
|rowspan="17"|
|rowspan="17"|
!scope="col"| Index !!scope="col"| Binary !!scope="col"| {{abbr|Char.|Character}}
!scope="col"| value !!scope="col"| {{abbr|char|Character}}
|rowspan="17"|
|rowspan="17"|
!scope="col"| Index !!scope="col"| Binary !!scope="col"| {{abbr|Char.|Character}}
!scope="col"| value !!scope="col"| {{abbr|char|Character}}
|-
|  0 || 000000 || <code>A</code> || 16 || 010000 || <code>Q</code> || 32 || 100000 || <code>g</code> || 48 || 110000 || <code>w</code>
|-
|-
1 || 000001 || <code>B</code> || 17 || 010001 || <code>R</code> || 33 || 100001 || <code>h</code> || 49 || 110001 || <code>x</code>
0 || <code>A</code> || 16 || <code>Q</code> || 32 || <code>g</code> || 48 || <code>w</code>
|-
|-
2 || 000010 || <code>C</code> || 18 || 010010 || <code>S</code> || 34 || 100010 || <code>i</code> || 50 || 110010 || <code>y</code>
1 || <code>B</code> || 17 || <code>R</code> || 33 || <code>h</code> || 49 || <code>x</code>
|-
|-
3 || 000011 || <code>D</code> || 19 || 010011 || <code>T</code> || 35 || 100011 || <code>j</code> || 51 || 110011 || <code>z</code>
2 || <code>C</code> || 18 || <code>S</code> || 34 || <code>i</code> || 50 || <code>y</code>
|-
|-
4 || 000100 || <code>E</code> || 20 || 010100 || <code>U</code> || 36 || 100100 || <code>k</code> || 52 || 110100 || <code>0</code>
3 || <code>D</code> || 19 || <code>T</code> || 35 || <code>j</code> || 51 || <code>z</code>
|-
|-
5 || 000101 || <code>F</code> || 21 || 010101 || <code>V</code> || 37 || 100101 || <code>l</code> || 53 || 110101 || <code>1</code>
4 || <code>E</code> || 20 || <code>U</code> || 36 || <code>k</code> || 52 || <code>0</code>
|-
|-
6 || 000110 || <code>G</code> || 22 || 010110 || <code>W</code> || 38 || 100110 || <code>m</code> || 54 || 110110 || <code>2</code>
5 || <code>F</code> || 21 || <code>V</code> || 37 || <code>l</code> || 53 || <code>1</code>
|-
|-
7 || 000111 || <code>H</code> || 23 || 010111 || <code>X</code> || 39 || 100111 || <code>n</code> || 55 || 110111 || <code>3</code>
6 || <code>G</code> || 22 || <code>W</code> || 38 || <code>m</code> || 54 || <code>2</code>
|-
|-
8 || 001000 || <code>I</code> || 24 || 011000 || <code>Y</code> || 40 || 101000 || <code>o</code> || 56 || 111000 || <code>4</code>
7 || <code>H</code> || 23 || <code>X</code> || 39 || <code>n</code> || 55 || <code>3</code>
|-
|-
9 || 001001 || <code>J</code> || 25 || 011001 || <code>Z</code> || 41 || 101001 || <code>p</code> || 57 || 111001 || <code>5</code>
8 || <code>I</code> || 24 || <code>Y</code> || 40 || <code>o</code> || 56 || <code>4</code>
|-
|-
| 10 || 001010 || <code>K</code> || 26 || 011010 || <code>a</code> || 42 || 101010 || <code>q</code> || 58 || 111010 || <code>6</code>
| 9 || <code>J</code> || 25 || <code>Z</code> || 41 || <code>p</code> || 57 || <code>5</code>
|-
|-
| 11 || 001011 || <code>L</code> || 27 || 011011 || <code>b</code> || 43 || 101011 || <code>r</code> || 59 || 111011 || <code>7</code>
| 10 || <code>K</code> || 26 || <code>a</code> || 42 || <code>q</code> || 58 || <code>6</code>
|-
|-
| 12 || 001100 || <code>M</code> || 28 || 011100 || <code>c</code> || 44 || 101100 || <code>s</code> || 60 || 111100 || <code>8</code>
| 11 || <code>L</code> || 27 || <code>b</code> || 43 || <code>r</code> || 59 || <code>7</code>
|-
|-
| 13 || 001101 || <code>N</code> || 29 || 011101 || <code>d</code> || 45 || 101101 || <code>t</code> || 61 || 111101 || <code>9</code>
| 12 || <code>M</code> || 28 || <code>c</code> || 44 || <code>s</code> || 60 || <code>8</code>
|-
|-
| 14 || 001110 || <code>O</code> || 30 || 011110 || <code>e</code> || 46 || 101110 || <code>u</code> || 62 || 111110 || <code>+</code>
| 13 || <code>N</code> || 29 || <code>d</code> || 45 || <code>t</code> || 61 || <code>9</code>
|-
|-
| 15 || 001111 || <code>P</code> || 31 || 011111 || <code>f</code> || 47 || 101111 || <code>v</code> || 63 || 111111 || <code>/</code>
| 14 || <code>O</code> || 30 || <code>e</code> || 46 || <code>u</code> || 62 || <code>+</code>
|-
|-
| colspan="12" | || colspan="2" {{n/a|Padding}} || =
| 15 || <code>P</code> || 31 || <code>f</code> || 47 || <code>v</code> || 63 || <code>/</code>
|}
|}
Note that Base64URL encoding replaces '+' with '-' and '/' with '_' so that the encoded string can be used in URLs and filenames without escaping.
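The effect of swapping the last two alphabet characters can be seen with Python's standard <code>base64</code> module. In this sketch the two input bytes are chosen only because their first two 6-bit groups have the values 62 and 63; the variable names are illustrative.

```python
import base64

# Bits 11111011 11111111 split into the 6-bit groups 62, 63 and 60
# (the last group is zero-filled), so both special characters appear.
raw = bytes([0xFB, 0xFF])

standard = base64.b64encode(raw)          # uses '+' and '/'
url_safe = base64.urlsafe_b64encode(raw)  # uses '-' and '_' instead

# Only the characters for indices 62 and 63 differ between the two.
assert standard.replace(b"+", b"-").replace(b"/", b"_") == url_safe
```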


==Examples==
==Examples==
The example below uses [[ASCII]] text for simplicity, but this is not a typical use case, as it can already be safely transferred across all systems that can handle Base64. The more typical use is to encode [[binary data]] (such as an image); the resulting Base64 data will only contain 64 different ASCII characters, all of which can reliably be transferred across systems that may corrupt the raw source bytes.
To simplify explanation, the example below uses [[ASCII]] text for input even though this is not a typical use. More commonly, input is [[binary data]], such as an image, and the result then represents binary data in a printable text format.
 
For the input data:
 
Many hands make light work.
 
The typical Base64 representation is:


Here is a well-known [[idiom]] from [[distributed computing]]:
TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu


{{Quote box
===Encoding when no padding needed===
| align = none
| style = margin:1em 0;
| border = 2px
| fontsize = 800
| quote = Many hands make light work.
}}


When the quote (without trailing whitespace) is encoded into Base64, it is represented as a byte sequence of 8-bit-padded [[ASCII]] characters encoded in [[MIME]]'s Base64 scheme as follows (newlines and white spaces may be present anywhere but are to be ignored on decoding):
Each input sequence of 6 bits (which can encode 2<sup>6</sup>&nbsp;=&nbsp;64 values) is mapped to a Base64 alphabet letter. Therefore, Base64 encoding results in four characters for each three input bytes. Assuming the input is ASCII or similar, the byte-data for the first three characters 'M', 'a', 'n' are values <code>77</code>, <code>97</code>, and <code>110</code> which in 8-bit binary representation are <code>01001101</code>, <code>01100001</code>, and <code>01101110</code>. Joining these representations and splitting into 6-bit groups gives:


{{Quote box
  010011 010110 000101 101110
  | align = none
| style = margin:1em 0;
| border = 2px
| fontsize = 800
| quote={{mono|1=TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu}}
}}


In the above quote, the encoded value of ''Man'' is ''TWFu''. Encoded in ASCII, the characters ''M'', ''a'', and ''n'' are stored as the byte values <code>77</code>, <code>97</code>, and <code>110</code>, which are the 8-bit binary values <code>01001101</code>, <code>01100001</code>, and <code>01101110</code>. These three values are joined together into a 24-bit string, producing <code>010011010110000101101110</code>. Groups of 6 bits (6 bits have a maximum of 2<sup>6</sup>&nbsp;=&nbsp;64 different binary values) are [[Binary number#Counting in binary|converted into individual numbers]] from start to end (in this case, there are four numbers in a 24-bit string), which are then converted into their corresponding Base64 character values.
These four 6-bit values (19, 22, 5, and 46) map to the Base64 characters {{code|TWFu}}.
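The grouping of three bytes into four 6-bit values can be sketched in Python as follows. This is a minimal illustration for a complete three-byte group only, assuming the RFC 4648 alphabet; the names <code>ALPHABET</code> and <code>encode_triplet</code> are hypothetical and not taken from any library.

```python
import base64
import string

# RFC 4648 alphabet: A-Z, a-z, 0-9, then '+' and '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_triplet(three_bytes: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly three bytes")
    # Join the three bytes into one 24-bit integer.
    joined = int.from_bytes(three_bytes, "big")
    # Split it into four 6-bit values, most significant first.
    sextets = [(joined >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[s] for s in sextets)


# Agrees with the standard-library encoder for the bytes of 'Man'.
assert encode_triplet(b"Man") == base64.b64encode(b"Man").decode("ascii")
```

Applying the same step to each successive three-byte group of a longer input yields the full encoding when the input length is a multiple of three.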


As this example illustrates, Base64 encoding converts three [[octet (computing)|octets]] into four encoded characters.
The following table shows how input is encoded. For example, the letter 'M' has the value 77 (per ASCII and similar). The first 6 bits of that value are <code>010011</code>, or 19 in decimal, which maps to the Base64 letter 'T', which in turn has the value 84 (per ASCII and similar).


{| class="wikitable" style="text-align:center;"
{| class="wikitable" style="text-align:center;"
|+ Encoding of the source string ⟨Man⟩ in Base64
|+ Encoding 'M', 'a', 'n' as Base64
|- style="font-weight:bold;"
|- style="font-weight:bold;"
! rowspan=2 scope="row" | Source <br/>ASCII text
! rowspan=2 scope="row" | input <br> (ASCII)
! scope="row" | Character
! scope="row" | letter (ASCII)
| colspan="8" | M
| colspan="8" | M
| colspan="8" | a
| colspan="8" | a
| colspan="8" | n
| colspan="8" | n
|-
|-
! scope="row" | Octets
! scope="row" | 8-bit <br> decimal value
| colspan="8" | 77 (0x4d)
| colspan="8" | 77
| colspan="8" | 97 (0x61)
| colspan="8" | 97
| colspan="8" | 110 (0x6e)
| colspan="8" | 110
|-
|-
! colspan=2 scope="row" | Bits
! colspan=2 scope="row" | bits
| 0 || 1 || 0 || 0 || 1 || 1 || 0 || 1
| 0 || 1 || 0 || 0 || 1 || 1 || 0 || 1
| 0 || 1 || 1 || 0 || 0 || 0 || 0 || 1
| 0 || 1 || 1 || 0 || 0 || 0 || 0 || 1
| 0 || 1 || 1 || 0 || 1 || 1 || 1 || 0
| 0 || 1 || 1 || 0 || 1 || 1 || 1 || 0
|-
|-
! rowspan=3 scope="row" | Base64<br/>encoded
! rowspan=3 scope="row" | encoded <br> (Base64)
! scope="row" | Sextets
! scope="row" | 6-bit <br> decimal value
| colspan="6" | 19
| colspan="6" | 19
| colspan="6" | 22
| colspan="6" | 22
Line 116: Line 132:
| colspan="6" | 46
| colspan="6" | 46
|- style="font-weight:bold;"
|- style="font-weight:bold;"
! scope="row" | Character
! scope="row" | letter <br> (Base64 alphabet)
| colspan="6" | T
| colspan="6" | T
| colspan="6" | W
| colspan="6" | W
Line 122: Line 138:
| colspan="6" | u
| colspan="6" | u
|-
|-
! scope="row" | Octets
! scope="row" | byte
| colspan="6" | 84 (0x54)
| colspan="6" | 84
| colspan="6" | 87 (0x57)
| colspan="6" | 87
| colspan="6" | 70 (0x46)
| colspan="6" | 70
| colspan="6" | 117 (0x75)
| colspan="6" | 117
|}
|}


<code>=</code> padding characters might be added to make the last encoded block contain four Base64 characters.
===Encoding with one padding character===
 
[[Hexadecimal]] to [[octal]] transformation is useful to convert between binary and Base64. Such conversion is available for both advanced calculators and programming languages. For example, the hexadecimal representation of the 24 bits above is 4D616E. The octal representation is 23260556. Those 8 octal digits can be split into pairs ({{nowrap|23 26 05 56}}), and each pair is converted to decimal to yield {{nowrap|19 22 05 46}}. Using those four decimal numbers as indices for the Base64 alphabet, the corresponding ASCII characters are ''TWFu''.
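The hexadecimal-to-octal route can be reproduced in a few lines of Python. This sketch assumes the RFC 4648 alphabet and a single 24-bit group; the variable names are illustrative.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

value = 0x4D616E                 # the 24 bits of 'M', 'a', 'n'
octal = format(value, "08o")     # eight octal digits, zero-padded on the left
# Each pair of octal digits covers 6 bits, i.e. one Base64 index.
pairs = [octal[i:i + 2] for i in range(0, 8, 2)]
indices = [int(pair, 8) for pair in pairs]
encoded = "".join(ALPHABET[i] for i in indices)
```

Zero-padding to eight digits matters when the leading bits of the group are zero, since the octal string would otherwise be too short to split into four pairs.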


If there are only two significant input octets (e.g., 'Ma'), or when the last input group contains only two octets, all 16 bits will be captured in the first three Base64 digits (18 bits); the two [[least significant bit]]s of the last content-bearing 6-bit block will turn out to be zero, and discarded on decoding (along with the succeeding <code>=</code> padding character):
If the input consists of a number of bytes that is 2 more than a multiple of 3 (e.g. 'M', 'a'), then the last 2 bytes (16 bits) are encoded in 3 Base64 digits (18 bits). The two [[least significant bit]]s of the last content-bearing 6-bit block are treated as zero for encoding and discarded for decoding (along with the trailing <code>=</code> padding character).


{|class="wikitable" style="text-align:center;"
{|class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
|- style="font-weight:bold;"
! rowspan=2 scope="row"  | Source <br/>ASCII text
! rowspan=2 scope="row"  | input <br> (ASCII)
! scope="row"            | Character
! scope="row"            | letter (ASCII)
| colspan="8"            | M
| colspan="8"            | M
| colspan="8"            | a
| colspan="8"            | a
| colspan="8" rowspan="2"  {{n/a|}}
| colspan="8" rowspan="2"  {{n/a|}}
|-
|-
! scope="row" | Octets
! scope="row" | 8-bit <br> decimal value
| colspan="8" | 77 (0x4d)
| colspan="8" | 77
| colspan="8" | 97 (0x61)
| colspan="8" | 97
|-
|-
! colspan=2 scope="row" | Bits
! colspan=2 scope="row" | bits
| 0 || 1 || 0 || 0 || 1 || 1
| 0 || 1 || 0 || 0 || 1 || 1
| 0 || 1 || 0 || 1 || 1 || 0
| 0 || 1 || 0 || 1 || 1 || 0
Line 160: Line 174:
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
|-
|-
! rowspan=3 scope="row" | Base64<br/>encoded
! rowspan=3 scope="row" | encoded <br> (Base64)
! scope="row" | Sextets
! scope="row" | 6-bit <br> decimal value
| colspan="6" | 19
| colspan="6" | 19
| colspan="6" | 22
| colspan="6" | 22
Line 167: Line 181:
| colspan="6"  {{n/a|Padding}}
| colspan="6"  {{n/a|Padding}}
|- style="font-weight:bold;"
|- style="font-weight:bold;"
! scope="row" | Character
! scope="row" | letter <br> (Base64 alphabet)
| colspan="6" | T
| colspan="6" | T
| colspan="6" | W
| colspan="6" | W
Line 173: Line 187:
| colspan="6" | =
| colspan="6" | =
|-
|-
! scope="row" | Octets
! scope="row" | byte
| colspan="6" | 84 (0x54)
| colspan="6" | 84
| colspan="6" | 87 (0x57)
| colspan="6" | 87
| colspan="6" | 69 (0x45)
| colspan="6" | 69
| colspan="6" | 61 (0x3D)
| colspan="6" | 61
|}
|}


If there is only one significant input octet (e.g., 'M'), or when the last input group contains only one octet, all 8 bits will be captured in the first two Base64 digits (12 bits); the four [[least significant bit]]s of the last content-bearing 6-bit block will turn out to be zero, and discarded on decoding (along with the succeeding two <code>=</code> padding characters):
===Encoding with two padding characters===
 
If the input consists of a number of bytes that is 1 more than a multiple of 3 (e.g. 'M'), then the last 8 bits are represented in 2 Base64 digits (12 bits). The four [[least significant bit]]s of the last content-bearing 6-bit block are treated as zero for encoding and discarded for decoding (along with the trailing two <code>=</code> padding characters):


{| class="wikitable" style="text-align:center;"
{| class="wikitable" style="text-align:center;"
|- style="font-weight:bold;"
|- style="font-weight:bold;"
! rowspan=2 scope="row"    | Source <br/>ASCII text
! rowspan=2 scope="row"    | input <br> (ASCII)
! scope="row"              | Character
! scope="row"              | letter (ASCII)
| colspan="8"              | M
| colspan="8"              | M
| colspan="16" rowspan="2"  {{n/a|}}
| colspan="16" rowspan="2"  {{n/a|}}
|-
|-
! scope="row" | Octets
! scope="row" | 8-bit <br> decimal value
| colspan="8" | 77 (0x4d)
| colspan="8" | 77
|-
|-
! colspan=2 scope="row" | Bits
! colspan=2 scope="row" | bits
| 0 || 1 || 0 || 0 || 1 || 1
| 0 || 1 || 0 || 0 || 1 || 1
| 0 || 1
| 0 || 1
Line 214: Line 230:
| {{n/a|{{fsp}}}}
| {{n/a|{{fsp}}}}
|-
|-
! rowspan=3 scope="row" | Base64 <br/>encoded
! rowspan=3 scope="row" | encoded <br> (Base64)
! scope="row" | Sextets
! scope="row" | 6-bit <br> decimal value
| colspan="6" | 19
| colspan="6" | 19
| colspan="6" | 16
| colspan="6" | 16
Line 221: Line 237:
| colspan="6"  {{n/a|Padding}}
| colspan="6"  {{n/a|Padding}}
|- style="font-weight:bold;"
|- style="font-weight:bold;"
! scope="row" | Character
! scope="row" | letter <br> (Base64 alphabet)
| colspan="6" | T
| colspan="6" | T
| colspan="6" | Q
| colspan="6" | Q
Line 227: Line 243:
| colspan="6" | =
| colspan="6" | =
|-
|-
! scope="row" | Octets
! scope="row" | byte
| colspan="6" | 84 (0x54)
| colspan="6" | 84
| colspan="6" | 81 (0x51)
| colspan="6" | 81
| colspan="6" | 61 (0x3D)
| colspan="6" | 61
| colspan="6" | 61 (0x3D)
| colspan="6" | 61
|}
|}


===Output padding===
===Decoding with padding===
Because Base64 is a six-bit encoding, and because the decoded values are divided into 8-bit octets, every four characters of Base64-encoded text (4 sextets = {{times|4|6}} = 24 bits) represents three octets of unencoded text or data (3 octets = {{times|3|8}} = 24 bits). This means that when the length of the unencoded input is not a multiple of three, the encoded output must have padding added so that its length is a multiple of four. The padding character is <code>=</code>, which indicates that no further bits are needed to fully encode the input. (This is different from <code>A</code>, which means that the remaining bits are all zeros.) The example below illustrates how truncating the input of the above quote changes the output padding:
When decoding, each sequence of four encoded characters is converted to three output bytes, but with a single padding character the final 4 characters decode to only two bytes, or with two padding characters, the final 4 characters decode to a single byte. For example:


<!-- This is the encoding of **THE WHOLE** of the above passage and the ending fits in with both the above encoding and the first line of the following example, verified using
{| class="wikitable" style="text-align:center;"
http://www.motobit.com/util/base64-decoder-encoder.asp
In the previous version, the example started with a space, which was not visible and thus quite misleading. -->
{|class="wikitable"
! scope="col" colspan=2 | Input
! scope="col" colspan=2 | Output
! scope="col" rowspan=2 | Padding
|-
! scope="col" | Text
! scope="col" | Length
! scope="col" | Text
! scope="col" | Length
|-
| ''light {{bg|lightgrey|wor}}{{bg|#cef2e0|k.}}'' || 11
| {{mono|1=bGlnaHQg{{bg|lightgrey|d29y}}{{bg|#cef2e0|2=ay4=}}}} || 16
| 1
|-
| ''light {{bg|lightgrey|wor}}{{bg|#cef2e0|k}}'' || 10
| {{mono|1=bGlnaHQg{{bg|lightgrey|d29y}}{{bg|#cef2e0|2=aw==}}}} || 16
| 2
|-
| ''light {{bg|lightgrey|wor}}'' || 9
| {{mono|1=bGlnaHQg{{bg|lightgrey|d29y}}}} || 12
| 0
|-
| ''light {{bg|lightgrey|wo}}'' || 8
| {{mono|1=bGlnaHQg{{bg|lightgrey|2=d28=}}}} || 12
| 1
|-
| ''light {{bg|lightgrey|w}}'' || 7
| {{mono|1=bGlnaHQg{{bg|lightgrey|2=dw==}}}} || 12
| 2
|}
 
The padding character is not essential for decoding, since the number of missing bytes can be inferred from the length of the encoded text. In some implementations, the padding character is mandatory, while for others it is not used. An exception in which padding characters are required is when multiple Base64 encoded files have been concatenated.
 
===Decoding Base64 with padding===
When decoding Base64 text, four characters are typically converted back to three bytes. The only exceptions are when padding characters exist. A single <code>=</code> indicates that the four characters will decode to only two bytes, while <code>==</code> indicates that the four characters will decode to only a single byte. For example:
 
{| class="wikitable"
! Encoded !! Padding !! Length !! Decoded
! Encoded !! Padding !! Length !! Decoded
|-
|-
Line 291: Line 268:
| ''light {{bg|lightgrey|wor}}''
| ''light {{bg|lightgrey|wor}}''
|}
|}
Another way to interpret the padding character is to consider it as an instruction to discard 2 trailing bits from the bit string each time a <code>=</code> is encountered. For example, when `{{mono|1=bGlnaHQg{{bg|lightgrey|2=dw==}}}}` is decoded, we convert each character (except the trailing occurrences of <code>=</code>) into their corresponding 6-bit representation, and then discard 2 trailing bits for the first <code>=</code> and another 2 trailing bits for the other <code>=</code>. In this instance, we would get 6 bits from the <code>d</code>, and another 6 bits from the <code>w</code> for a bit string of length 12, but since we remove 2 bits for each <code>=</code> (for a total of 4 bits), the <code>dw==</code> ends up producing 8 bits (1 byte) when decoded.
Another way to interpret the padding character is to consider it as an instruction to discard 2 trailing bits from the bit string each time a <code>=</code> is encountered. For example, when {{mono|1=bGlnaHQg{{bg|lightgrey|2=dw==}}}} is decoded, we convert each character (except the trailing occurrences of <code>=</code>) into their corresponding 6-bit representation, and then discard 2 trailing bits for the first <code>=</code> and another 2 trailing bits for the other <code>=</code>. In this instance, we would get 6 bits from the <code>d</code>, and another 6 bits from the <code>w</code> for a bit string of length 12, but since we remove 2 bits for each <code>=</code> (for a total of 4 bits), the <code>dw==</code> ends up producing 8 bits (1 byte) when decoded.
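This bit-discarding interpretation can be written out directly. The Python sketch below is illustrative rather than a robust decoder: it assumes the RFC 4648 alphabet, accepts only trailing <code>=</code> characters, and does not skip whitespace; <code>decode_discarding_bits</code> is a hypothetical name.

```python
import base64
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def decode_discarding_bits(text: str) -> bytes:
    """Decode padded Base64 by dropping 2 trailing bits per '=' character."""
    content = text.rstrip("=")
    padding = len(text) - len(content)
    # Concatenate the 6-bit representation of every content character.
    bits = "".join(format(ALPHABET.index(ch), "06b") for ch in content)
    # Each '=' removes two trailing bits from the bit string.
    if padding:
        bits = bits[: -2 * padding]
    # Regroup the remaining bits into 8-bit bytes.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# Matches the standard-library decoder on the example from the text.
assert decode_discarding_bits("bGlnaHQgdw==") == base64.b64decode("bGlnaHQgdw==")
```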
 
===Decoding without padding===
Use of the padding character in encoded text is ''not'' essential for decoding, because the number of missing bytes can be inferred from the length of the encoded text. In some variants the padding character is mandatory, while in others it is not used. Notably, padding characters are required when [[string concatenation|concatenating]] Base64-encoded strings.


===Decoding Base64 without padding===
Without padding, after decoding each sequence of 4 encoded characters, there may be 2 or 3 encoded characters left over. A single remaining encoded character is not possible because a single Base64 character only contains 6 bits, and 8 bits are required to create a byte. The first character contributes 6 bits, and the second character contributes its first 2 bits. The following table demonstrates decoding encoded strings that have 2, 3 or no left-over characters.
Without padding, after normal decoding of four characters to three bytes over and over again, fewer than four encoded characters may remain. In this situation, only two or three characters can remain. A single remaining encoded character is not possible, because a single Base64 character only contains 6 bits, and 8 bits are required to create a byte, so a minimum of two Base64 characters are required: The first character contributes 6 bits, and the second character contributes its first 2 bits. For example:


{| class="wikitable"
{| class="wikitable" style="text-align:center;"
! Length !! Encoded !! Length !! Decoded
! Encoded !! Length <br> of last group !! Decoded !! Decoded length <br> of last group
|-
|-
| 2 || {{mono|1=bGlnaHQg{{bg|lightgrey|dw}}}}
| {{mono|1=bGlnaHQg{{bg|lightgrey|dw}}}} || 2
| 1 || ''light {{bg|lightgrey|w}}''
| ''light {{bg|lightgrey|w}}'' || 1
|-
|-
| 3 || {{mono|1=bGlnaHQg{{bg|lightgrey|d28}}}}
| {{mono|1=bGlnaHQg{{bg|lightgrey|d28}}}} || 3
| 2 || ''light {{bg|lightgrey|wo}}''
| ''light {{bg|lightgrey|wo}}'' || 2
|-
|-
| 4 || {{mono|1=bGlnaHQg{{bg|lightgrey|d29y}}}}
| {{mono|1=bGlnaHQg{{bg|lightgrey|d29y}}}} || 4
| 3 || ''light {{bg|lightgrey|wor}}''
| ''light {{bg|lightgrey|wor}}'' || 3
|}
|}
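
One possible way to decode unpadded text, sketched here in Python, is to infer the length of the last group from the total length and restore the padding before handing the text to an ordinary decoder. The helper name <code>decode_unpadded</code> is an assumption made for this illustration; other decoders may handle unpadded input differently.

```python
import base64


def decode_unpadded(encoded: str) -> bytes:
    """Decode Base64 text whose trailing '=' characters were omitted (illustrative)."""
    remainder = len(encoded) % 4
    if remainder == 1:
        # A single left-over character carries only 6 bits, not enough for a byte.
        raise ValueError("a single left-over Base64 character cannot encode a byte")
    # 2 left-over characters need "==", 3 need "=", 0 need nothing.
    padding = "=" * (-len(encoded) % 4)
    return base64.b64decode(encoded + padding)


for text in ("bGlnaHQgdw", "bGlnaHQgd28", "bGlnaHQgd29y"):
    print(text, decode_unpadded(text))
```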


Decoding without padding is not performed consistently among decoders. In addition, allowing padless decoding by definition allows multiple strings to decode into the same set of bytes, which can be a security risk.<ref>{{cite conference |last1=Chalkias |first1=Konstantinos |last2=Chatzigiannis |first2=Panagiotis |title=Base64 Malleability in Practice |conference=ASIA CCS '22: 2022 ACM on Asia Conference on Computer and Communications Security |date=30 May 2022 |pages=1219–1221 |doi=10.1145/3488932.3527284 |url=https://eprint.iacr.org/2022/361.pdf}}</ref>
Decoding without padding is not performed consistently among decoders{{clarify|date=October 2025}}. In addition, allowing padless decoding by definition allows multiple strings to decode into the same set of bytes{{clarify|date=October 2025}}, which can be a security risk.<ref>{{cite conference |last1=Chalkias |first1=Konstantinos |last2=Chatzigiannis |first2=Panagiotis |title=Base64 Malleability in Practice |conference=ASIA CCS '22: 2022 ACM on Asia Conference on Computer and Communications Security |date=30 May 2022 |pages=1219–1221 |doi=10.1145/3488932.3527284 |url=https://eprint.iacr.org/2022/361.pdf}}</ref>
 
==Variants==
Variations of Base64 differ in the alphabet used and structural aspects like maximum line length. The most commonly used alphabet is that described by RFC 4648 and most variations only differ in the last two letters used. The following table describes more commonly used encodings that are specified by an [[Request for Comments|RFC]].


==Implementations and history==
===Variants summary table===
Implementations may have some constraints on the alphabet used for representing some bit patterns. This notably concerns the last two characters used in the alphabet at positions 62 and 63, and the character used for padding (which may be mandatory in some protocols or removed in others). The table below summarizes these known variants and provides links to the subsections below.
{|class="wikitable" style="text-align:center"
{|class="wikitable" style="text-align:center"
! rowspan=2 | Encoding
! rowspan=2 | Encoding<ref>Some specifications describe a Base64 encoding without naming it. This column identifies Base64 encodings in a descriptive way if no particular name is specified.</ref>
! colspan=3 | Encoding characters
! rowspan=2 | Specification
! colspan=3 | Separate encoding of lines
! colspan=3 | Alphabet
! rowspan=2 | Decoding non-encoding characters
! colspan=3 | Lines
|-
|-
! 62nd
! 62nd
! 63rd
! 63rd
! ''pad''
! pad
! Separators
! Separators
! Length
! Length
! Checksum
! Checksum
|-
|-
! {{rh}} | {{nowrap|[https://datatracker.ietf.org/doc/html/rfc1421 RFC 1421]}}: Base64 for [[#Privacy-enhanced mail|Privacy-Enhanced Mail]] (deprecated)
! {{rh}} | Base 64 Encoding
| <code>+</code> || <code>/</code>  
| {{rh}} | {{nowrap|[https://datatracker.ietf.org/doc/html/rfc4648#section-4 RFC 4648 §4]}}
| <code>=</code> || {{Yes}} || 64 || {{Yes}}, in PEM CRC || {{No}}
| <code>+</code> || <code>/</code>
| <code>=</code> || {{No}} || || {{No}}
|-
|-
! {{rh}} | {{nowrap|[https://datatracker.ietf.org/doc/html/rfc2045 RFC 2045]}}: Base64 transfer encoding for [[#MIME|MIME]]
! {{rh}} | Base 64 Encoding with URL and Filename Safe Alphabet
| <code>+</code> || <code>/</code>
| {{rh}} | {{nowrap|[https://datatracker.ietf.org/doc/html/rfc4648#section-5 RFC 4648 §5]}}
| <code>=</code> || {{Yes}} || 76 || {{No}} || {{No}}
| <code>-</code> || <code>_</code>
| <code>=</code><br><small>optional</small> || {{No}} || || {{No}}
|-
|-
! {{rh}} | {{nowrap|[https://datatracker.ietf.org/doc/html/rfc2152 RFC 2152]}}: Base64 for [[#UTF-7|UTF-7]]
! {{rh}} | for [[#MIME|MIME]]
| {{rh}} | {{nowrap|[https://datatracker.ietf.org/doc/html/rfc2045 RFC 2045]}}
| <code>+</code> || <code>/</code>
| <code>+</code> || <code>/</code>
| || {{No}} || || {{No}} || {{Yes}}
| <code>=</code> || {{Yes}} || 76 || {{No}}
|-
|-
! {{rh}} | {{nowrap|[https://datatracker.ietf.org/doc/html/rfc3501#section-5.1.3 RFC 3501]}}: Base64 encoding for IMAP mailbox names
! {{rh}} | for [[#Privacy-enhanced mail|Privacy-Enhanced Mail]] (deprecated)
| <code>+</code> || <code>,</code>
| {{rh}} | {{nowrap|[https://datatracker.ietf.org/doc/html/rfc1421 RFC 1421]}}
| || {{No}} || || {{No}} || {{No}}
| <code>+</code> || <code>/</code>  
| <code>=</code> || {{Yes}} || 64 || {{Yes}}, in PEM CRC
|-
|-
! {{rh}} | {{nowrap|[https://datatracker.ietf.org/doc/html/rfc4648#section-4 RFC 4648 §4]}}: base64 (standard){{efn|name=common|This variant is intended to provide common features where they are not desired to be specialized by implementations, ensuring robust engineering. This is particularly in light of separate line encodings and restrictions, which have not been considered when previous standards have been co-opted for use elsewhere. Thus, the features indicated here may be overridden.}}
! {{rh}} | for [[#UTF-7|UTF-7]]
| {{rh}} | {{nowrap|[https://datatracker.ietf.org/doc/html/rfc2152 RFC 2152]}}
| <code>+</code> || <code>/</code>
| <code>+</code> || <code>/</code>
| <code>=</code> || {{No}} || || {{No}} || {{No}}
| || {{No}} || || {{No}}
|-
|-
! {{rh}} | {{nowrap|[https://datatracker.ietf.org/doc/html/rfc4648#section-5 RFC 4648 §5]}}: base64url (URL- and filename-safe standard){{efn|name=common}}
! {{rh}} | for IMAP mailbox names
| <code>-</code> || <code>_</code>
| {{rh}} | {{nowrap|[https://datatracker.ietf.org/doc/html/rfc3501#section-5.1.3 RFC 3501]}}
| <code>=</code> optional || {{No}} || || {{No}} || {{No}}
| <code>+</code> || <code>,</code>
| || {{No}} || || {{No}}
|-
|-
! {{rh}} | {{nowrap|[https://datatracker.ietf.org/doc/html/rfc9580 RFC 9580]}}: ASCII armor for [[#OpenPGP|OpenPGP]]
! {{rh}} | Textual Encodings of PKIX, PKCS, and CMS Structures
| {{rh}} | {{nowrap|[https://datatracker.ietf.org/doc/html/rfc7468 RFC 7468]}}
| <code>+</code> || <code>/</code>  
| <code>+</code> || <code>/</code>  
| <code>=</code> || {{Yes}} || 76 || {{Yes}}, (CRC24) || {{No}}
| <code>=</code> || {{Yes}} || 64 || {{No}}
|-
|-
! {{rh}} | Other variations
! {{rh}} | ASCII armor for [[#OpenPGP|OpenPGP]]
| colspan="7" | See {{section link||Applications not compatible with RFC 4648 Base64}}
| {{rh}} | {{nowrap|[https://datatracker.ietf.org/doc/html/rfc9580 RFC 9580]}}
| <code>+</code> || <code>/</code>
| <code>=</code> || {{Yes}} || 76 || {{Yes}}, (CRC24)
|}
|}
{{notelist}}


===Privacy-enhanced mail===
===RFC 4648===
The first known standardized use of the encoding now called MIME Base64 was in the [[Privacy-enhanced Electronic Mail]] (PEM) protocol, proposed by {{IETF RFC|989}} in 1987. PEM defines a "printable encoding" scheme that uses Base64 encoding to transform an arbitrary sequence of [[octet (computing)|octets]] to a format that can be expressed in short lines of 6-bit characters, as required by transfer protocols such as [[SMTP]].<ref>{{cite IETF |title=Privacy Enhancement for Internet Electronic Mail |rfc=989 |date=February 1987 |publisher=[[Internet Engineering Task Force|IETF]] |access-date=March 18, 2010}}</ref>
{{IETF RFC|4648}} describes various encodings, including Base64, and discusses the use of line feeds in encoded data, the use of padding in encoded data, the use of non-alphabet characters in encoded data, the use of different encoding alphabets, and canonical encodings. The variant that it calls ''Base 64 Encoding'' and ''base64'' is intended for general use.
 
The current version of PEM (specified in {{IETF RFC|1421}}) uses a 64-character alphabet consisting of upper- and lower-case [[Roman letters]] (<code>A</code>–<code>Z</code>, <code>a</code>–<code>z</code>), the numerals (<code>0</code>–<code>9</code>), and the <code>+</code> and <code>/</code> symbols. The <code>=</code> symbol is also used as a padding suffix.<ref name="rfc 1421"/> The original specification, {{IETF RFC|989}}, additionally used the <code>*</code> symbol to delimit encoded but unencrypted data within the output stream.


To convert data to PEM printable encoding, the first byte is placed in the [[most significant bit|most significant]] eight bits of a 24-bit [[data buffer|buffer]], the next in the middle eight, and the third in the [[least significant bit|least significant]] eight bits. If there are fewer than three bytes left to encode (or in total), the remaining buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string: "<code>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/</code>", and the indicated character is output.
The RFC also specifies a second Base64 encoding, which it calls ''Base 64 Encoding with URL and Filename Safe Alphabet'' and which is intended for representing relatively long identifying information. For example, a database persistence framework for [[Java (programming language)|Java]] objects might use Base64 encoding to encode a relatively large unique id (generally 128-bit [[UUID]]s) as a string for use as an HTTP parameter in an HTTP form or an HTTP GET [[URL]]. Also, many [[application software|applications]] need to encode binary data in a way that is convenient for inclusion in a URL, including in hidden web form fields, and Base64 is a convenient encoding to render them in a compact way.


The process is repeated on the remaining data until fewer than four octets remain. If three octets remain, they are processed normally. If fewer than three octets (24 bits) are remaining to encode, the input data is right-padded with zero bits to form an integral multiple of six bits.
Using standard Base64 in a [[URL]] requires encoding the <code>+</code>, <code>/</code> and <code>=</code> characters as special [[percent-encoding|percent-encoded]] hexadecimal sequences (<code>+</code> becomes <code>%2B</code>, <code>/</code> becomes <code>%2F</code> and <code>=</code> becomes <code>%3D</code>), which makes the string longer and harder to read. Using a different alphabet allows for encoding as Base64 without requiring this extra markup. Typically, <code>+</code> and <code>/</code> are replaced by <code>-</code> and <code>_</code>, respectively, so that URL encoders/decoders are no longer necessary and the length of the encoded value is unaffected, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general. [[YouTube#Uploading|YouTube]] is a popular site that uses such an alphabet.<ref>{{cite web |title=Here's Why YouTube Will Practically Never Run Out of Unique Video IDs |url=https://www.mentalfloss.com/article/77598/heres-why-youtube-will-never-run-out-unique-video-ids |website=www.mentalfloss.com |access-date=27 December 2021 |language=en |date=23 March 2016}}</ref> Some variants allow or require omitting the padding <code>=</code> signs to avoid them being confused with field separators, or require that any such padding be percent-encoded. Some libraries {{which|date=December 2020}} encode <code>=</code> as <code>.</code>, potentially exposing applications to relative path attacks when a folder name is encoded from user data.{{Citation needed|date=June 2022}}
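
The difference between percent-encoding standard Base64 and using the URL-safe alphabet can be illustrated with Python's standard library. The two input bytes below are chosen only because they happen to produce <code>+</code>, <code>/</code> and <code>=</code> in the standard encoding; the variable names are arbitrary.

```python
import base64
from urllib.parse import quote

raw = bytes([0xFB, 0xFF])  # sample input that yields '+', '/' and '=' in standard Base64

standard = base64.b64encode(raw).decode("ascii")          # "+/8="
percent_encoded = quote(standard, safe="")                # "%2B%2F8%3D", longer and harder to read
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")  # "-_8=", '+' and '/' replaced
url_safe_unpadded = url_safe.rstrip("=")                  # "-_8", for variants that omit padding

print(standard, percent_encoded, url_safe, url_safe_unpadded)
```

Note that <code>urlsafe_b64encode</code> keeps the <code>=</code> padding; stripping it, as in the last line, is a choice made by the application rather than by the library.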


After encoding the non-padded data, if two octets of the 24-bit buffer are padded-zeros, two <code>=</code> characters are appended to the output; if one octet of the 24-bit buffer is filled with padded-zeros, one <code>=</code> character is appended. This signals the decoder that the zero bits added due to padding should be excluded from the reconstructed data. This also guarantees that the encoded output length is a multiple of 4 bytes.
===RFC 3548===
{{IETF RFC|3548}}, entitled ''The Base16, Base32, and Base64 Data Encodings'', is an informational (non-normative) memo that attempts to unify the {{IETF RFC|1421}} and {{IETF RFC|2045}} specifications of Base64 encodings, alternative-alphabet encodings, and the Base32 (which is seldom used) and Base16 encodings. RFC 4648 obsoletes RFC 3548.


PEM requires that all encoded lines consist of exactly 64 printable characters, with the exception of the last line, which may contain fewer printable characters. Lines are delimited by whitespace characters according to local (platform-specific) conventions.
Unless an encoder is written to a specification that refers to {{IETF RFC|3548}} and specifically requires otherwise{{clarify|date=October 2025}}, RFC 3548 forbids an encoder from generating messages containing characters outside the encoding alphabet or without padding, and it also declares that a decoder must reject data that contain characters other than the encoding alphabet.<ref name="rfc 3548" />


===MIME===
===MIME===
{{Main|MIME}}
The [[MIME]] (Multipurpose Internet Mail Extensions) specification lists Base64 as one of two [[binary-to-text encoding]] schemes (the other being [[quoted-printable]]).<ref name="rfc 2045"/> MIME's Base64 encoding is based on that of the {{IETF RFC|1421}} version of PEM: it uses the same 64-character alphabet and encoding mechanism as PEM and uses the <code>=</code> symbol for output padding in the same way, as described at {{IETF RFC|2045}}.
The [[MIME]] (Multipurpose Internet Mail Extensions) specification lists Base64 as one of two [[binary-to-text encoding]] schemes (the other being [[quoted-printable]]).<ref name="rfc 2045"/> MIME's Base64 encoding is based on that of the {{IETF RFC|1421}} version of PEM: it uses the same 64-character alphabet and encoding mechanism as PEM and uses the <code>=</code> symbol for output padding in the same way, as described at {{IETF RFC|2045}}.


MIME does not specify a fixed length for Base64-encoded lines, but it does specify a maximum line length of 76 characters. Additionally, it specifies that any character outside the standard set of 64 encoding characters (For example CRLF sequences), must be ignored by a compliant decoder, although most implementations use a CR/LF [[newline]] pair to delimit encoded lines.
MIME does not specify a fixed length for Base64-encoded lines, but it does specify a maximum line length of 76 characters. Additionally, it specifies that a compliant decoder must ignore any character outside the standard set of 64 encoding characters (for example, CRLF sequences), although most implementations use a CR/LF [[newline]] pair to delimit encoded lines.


Thus, the actual length of MIME-compliant Base64-encoded binary data is usually about 137% of the original data length ({{fract|4|3}}×{{fract|78|76}}), though for very short messages the overhead can be much higher due to the overhead of the headers. Very roughly, the final size of Base64-encoded binary data is equal to 1.37 times the original data size + 814 bytes (for headers). The size of the decoded data can be approximated with this formula:
Thus, the actual length of MIME-compliant Base64-encoded binary data is usually about 137% of the original data length ({{fract|4|3}}×{{fract|78|76}}), though for very short messages the overhead can be much higher due to the overhead of the headers. Very roughly, the final size of Base64-encoded binary data is equal to 1.37 times the original data size + 814 bytes (for headers). The size of the decoded data can be approximated with this formula:
  bytes = (string_length(encoded_string) − 814) / 1.37
  bytes = (string_length(encoded_string) − 814) / 1.37
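
The {{fract|4|3}}×{{fract|78|76}} factor can be checked with a short Python sketch. It assumes 76-character lines that each end in a two-byte CRLF pair and ignores headers entirely; Python's <code>base64.encodebytes</code> emits a bare line feed after every 76 characters, so the sketch converts those to CRLF. The function <code>estimate_decoded_size</code> simply restates the rough formula above, including its 814-byte header allowance, and is not an exact inverse.

```python
import base64

payload = bytes(5700)  # 100 full lines' worth of input (57 bytes encode to 76 characters)

# encodebytes() wraps at 76 characters with "\n"; MIME line breaks are CRLF.
body = base64.encodebytes(payload).replace(b"\n", b"\r\n")
ratio = len(body) / len(payload)
print(len(body), round(ratio, 4))  # 7800 encoded bytes, ratio about 1.37


def estimate_decoded_size(encoded_length: int) -> float:
    """Rough estimate from the formula above; 814 is the assumed header overhead."""
    return (encoded_length - 814) / 1.37
```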
===Privacy-enhanced mail===
The first known standardized use of the encoding now called MIME Base64 was in the [[Privacy-Enhanced Mail]] (PEM) protocol, proposed by {{IETF RFC|989}} in 1987. PEM defines a "printable encoding" scheme that uses Base64 encoding to transform an arbitrary sequence of bytes to a format that can be expressed in short lines of 6-bit characters, as required by transfer protocols such as [[SMTP]].<ref>{{cite IETF |title=Privacy Enhancement for Internet Electronic Mail |rfc=989 |date=February 1987 |publisher=[[Internet Engineering Task Force|IETF]] |access-date=March 18, 2010}}</ref>

The current version of PEM (specified in {{IETF RFC|1421}}) uses a 64-character alphabet consisting of upper- and lower-case [[Roman letters]] (<code>A</code>–<code>Z</code>, <code>a</code>–<code>z</code>), the numerals (<code>0</code>–<code>9</code>), and the <code>+</code> and <code>/</code> symbols. The <code>=</code> symbol is also used as a padding suffix.<ref name="rfc 1421"/> The original specification, {{IETF RFC|989}}, additionally used the <code>*</code> symbol to delimit encoded but unencrypted data within the output stream.

To convert data to PEM printable encoding, the first byte is placed in the [[most significant bit|most significant]] eight bits of a 24-bit [[data buffer|buffer]], the next in the middle eight, and the third in the [[least significant bit|least significant]] eight bits. If there are fewer than three bytes left to encode (or in total), the remaining buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string: "<code>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/</code>", and the indicated character is output.

The process is repeated on the remaining data until fewer than four bytes remain. If three bytes remain, they are processed normally. If fewer than three bytes (24 bits) are remaining to encode, the input data is right-padded with zero bits to form an integral multiple of six bits.

After encoding the non-padded data, if two bytes of the 24-bit buffer are padded-zeros, two <code>=</code> characters are appended to the output; if one byte of the 24-bit buffer is filled with padded-zeros, one <code>=</code> character is appended. This signals the decoder that the zero bits added due to padding should be excluded from the reconstructed data. This also guarantees that the encoded output length is a multiple of 4 characters.

PEM requires that all encoded lines consist of exactly 64 printable characters, with the exception of the last line, which may contain fewer printable characters. Lines are delimited by whitespace characters according to local (platform-specific) conventions.
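
The buffer-based procedure described above can be sketched as follows. This is an illustrative Python rendering rather than code from any PEM implementation; the names <code>encode_printable</code> and <code>wrap_lines</code> are hypothetical, and a plain line feed stands in for the platform-specific line delimiter.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_printable(data: bytes) -> str:
    """Encode bytes three at a time through a 24-bit buffer (illustrative sketch)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i : i + 3]
        # First byte goes in the most significant 8 bits; missing bytes stay zero.
        buffer = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
        # Use the buffer six bits at a time, most significant first, as alphabet indices.
        chars = [ALPHABET[(buffer >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # One '=' per byte of the buffer that held only padding zeros.
        missing = 3 - len(chunk)
        if missing:
            chars[4 - missing :] = "=" * missing
        out.append("".join(chars))
    return "".join(out)


def wrap_lines(text: str, width: int = 64) -> str:
    """Split encoded text into lines of exactly `width` characters, except the last."""
    return "\n".join(text[i : i + width] for i in range(0, len(text), width))


print(wrap_lines(encode_printable(b"light w")))  # bGlnaHQgdw==
```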


===UTF-7===
===UTF-7===
{{Main|UTF-7}}
[[UTF-7]], described first in {{IETF RFC|1642}}, which was later superseded by {{IETF RFC|2152}}, introduced a system called ''modified Base64''. This data encoding scheme is used to encode [[UTF-16]] as [[ASCII]] characters for use in 7-bit transports such as [[SMTP]]. It is a variant of the Base64 encoding used in MIME.<ref>{{cite IETF |title=UTF-7 A Mail-Safe Transformation Format of Unicode |rfc=1642 |date=July 1994 |publisher=[[Internet Engineering Task Force|IETF]] |access-date=March 18, 2010}}</ref><ref>{{cite IETF |title=UTF-7 A Mail-Safe Transformation Format of Unicode |rfc=2152 |date=May 1997 |publisher=[[Internet Engineering Task Force|IETF]] |access-date=March 18, 2010}}</ref>
[[UTF-7]], described first in {{IETF RFC|1642}}, which was later superseded by {{IETF RFC|2152}}, introduced a system called ''modified Base64''. This data encoding scheme is used to encode [[UTF-16]] as [[ASCII]] characters for use in 7-bit transports such as [[SMTP]]. It is a variant of the Base64 encoding used in MIME.<ref>{{cite IETF |title=UTF-7 A Mail-Safe Transformation Format of Unicode |rfc=1642 |date=July 1994 |publisher=[[Internet Engineering Task Force|IETF]] |access-date=March 18, 2010}}</ref><ref>{{cite IETF |title=UTF-7 A Mail-Safe Transformation Format of Unicode |rfc=2152 |date=May 1997 |publisher=[[Internet Engineering Task Force|IETF]] |access-date=March 18, 2010}}</ref>


Line 393: Line 389:
[[OpenPGP]], described in {{IETF RFC|9580}}, specifies "[[ASCII armor]]", which is identical to the "Base64" encoding described by MIME, with the addition of an optional 24-bit [[cyclic redundancy check|CRC]]. The [[checksum]] is calculated on the input data before encoding; the checksum is then encoded with the same Base64 algorithm and, prefixed by the "<code>=</code>" symbol as the separator, appended to the encoded output data.<ref>{{cite IETF |title=OpenPGP Message Format |rfc=9580 |date=July 2024 |publisher=[[Internet Engineering Task Force|IETF]] |access-date=February 13, 2025}}</ref>
[[OpenPGP]], described in {{IETF RFC|9580}}, specifies "[[ASCII armor]]", which is identical to the "Base64" encoding described by MIME, with the addition of an optional 24-bit [[cyclic redundancy check|CRC]]. The [[checksum]] is calculated on the input data before encoding; the checksum is then encoded with the same Base64 algorithm and, prefixed by the "<code>=</code>" symbol as the separator, appended to the encoded output data.<ref>{{cite IETF |title=OpenPGP Message Format |rfc=9580 |date=July 2024 |publisher=[[Internet Engineering Task Force|IETF]] |access-date=February 13, 2025}}</ref>


===RFC 3548===
===Javascript (DOM Web API) ===
{{IETF RFC|3548}}, entitled ''The Base16, Base32, and Base64 Data Encodings'', is an informational (non-normative) memo that attempts to unify the {{IETF RFC|1421}} and {{IETF RFC|2045}} specifications of Base64 encodings, alternative-alphabet encodings, and the Base32 (which is seldom used) and Base16 encodings.
The <code>atob()</code> and <code>btoa()</code> JavaScript methods, defined in the HTML5 draft specification,<ref>{{cite web|title=7.3. Base64 utility methods|url=https://w3c.github.io/html/webappapis.html#atob|website=HTML 5.2 Editor's Draft|publisher=[[World Wide Web Consortium]]|access-date=2 January 2018}} Introduced by [http://html5.org/tools/web-apps-tracker?from=5813&to=5814 changeset 5814] {{Webarchive|url=https://web.archive.org/web/20140222225511/http://html5.org/tools/web-apps-tracker?from=5813&to=5814 |date=2014-02-22 }}, 2021-02-01.</ref><ref>{{cite web|title=Window: btoa() method|date=24 June 2025 |url=https://developer.mozilla.org/en-US/docs/Web/API/Window/btoa|access-date=2025-07-31}}</ref> provide Base64 encoding and decoding functionality to web pages. The <code>btoa()</code> method outputs padding characters, but these are optional in the input of the <code>atob()</code> method.<br />
Example: Encoding of the beginning of a GIF file: <code>btoa("GIF89a")</code> ↦ <code>"R0lGODlh"</code>.


Unless implementations are written to a specification that refers to {{IETF RFC|3548}} and specifically requires otherwise, RFC 3548 forbids implementations from generating messages containing characters outside the encoding alphabet or without padding, and it also declares that decoder implementations must reject data that contain characters outside the encoding alphabet.<ref name="rfc 3548" />
===With atypical alphabet order===
Several variants use alphabets similar to the common variants, but in a different order.


===RFC 4648===
; Unix password: Unix stores password hashes computed with [[crypt (C)|'''crypt''']] in the [[passwd#Password file|<code>/etc/passwd</code> file]] using an encoding called <span id="B64">B64</span>. crypt's alphabet puts the punctuation <code>.</code> and <code>/</code> before the alphanumeric characters. crypt uses the alphabet "<code>./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</code>" without padding. An advantage over RFC 4648 is that sorting encoded ASCII data results in the same order as sorting the plain ASCII data.
{{IETF RFC|4648}} obsoletes {{IETF RFC|3548}} and focuses on Base64/32/16:


: ''This document describes the commonly used Base64, Base32, and Base16 encoding schemes. It also discusses the use of line feeds in encoded data, the use of padding in encoded data, the use of non-alphabet characters in encoded data, use of different encoding alphabets, and canonical encodings.''
; GEDCOM: The '''[[GEDCOM]]''' 5.5 standard for genealogical data interchange encodes multimedia files in its text-line hierarchical file format. GEDCOM uses the same alphabet as crypt, which is <span class="nowrap">"<code>./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</code>"</span>.<ref>{{cite web|url=http://homepages.rootsweb.ancestry.com/~pmcbride/gedcom/55gctoc.htm |title=The GEDCOM Standard Release 5.5 |publisher=Homepages.rootsweb.ancestry.com |access-date=2012-06-21}}</ref>


===URL applications===
; bcrypt: '''[[bcrypt]]''' hashes are designed to be used in the same way as traditional crypt(3) hashes, but bcrypt's alphabet is in a different order than crypt's. bcrypt uses the alphabet <span class="nowrap">"<code>./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789</code>"</span>.<ref>{{cite web|url=https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/crypt/bcrypt.c?rev=1.1&content-type=text/x-cvsweb-markup|title=src/lib/libc/crypt/bcrypt.c r1.1|author-link=Niels Provos|first=Niels|last=Provos|date=1997-02-13|access-date=2018-05-18}}</ref>
Base64 encoding can be helpful when fairly lengthy identifying information is used in an HTTP environment. For example, a database persistence framework for [[Java (programming language)|Java]] objects might use Base64 encoding to encode a relatively large unique id (generally 128-bit [[UUID]]s) into a string for use as an HTTP parameter in HTTP forms or HTTP GET [[URL]]s. Also, many applications need to encode binary data in a way that is convenient for inclusion in URLs, including in hidden web form fields, and Base64 is a convenient encoding to render them in a compact way.


Using standard Base64 in [[URL]] requires encoding of '<code>+</code>', '<code>/</code>' and '<code>=</code>' characters into special [[percent-encoding|percent-encoded]] hexadecimal sequences ('<code>+</code>' becomes '<code>%2B</code>', '<code>/</code>' becomes '<code>%2F</code>' and '<code>=</code>' becomes '<code>%3D</code>'), which makes the string unnecessarily longer.
; Xxencoding: '''[[Xxencoding]]''' uses a mostly-alphanumeric character set similar to crypt, but using <code>+</code> and <code>-</code> rather than <code>.</code> and <code>/</code>. Xxencoding uses the alphabet <span class="nowrap">"<code>+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</code>"</span>.


For this reason, '''modified Base64 for URL''' variants exist (such as  '''base64url''' in {{IETF RFC|4648}}), where the '<code>+</code>' and '<code>/</code>' characters of standard Base64 are respectively replaced by '<code>-</code>' and '<code>_</code>', so that using [[percent-encoding|URL encoders/decoders]] is no longer necessary and has no effect on the length of the encoded value, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general. A popular site to make use of such is [[YouTube#Uploading|YouTube]].<ref>{{cite web |title=Here's Why YouTube Will Practically Never Run Out of Unique Video IDs |url=https://www.mentalfloss.com/article/77598/heres-why-youtube-will-never-run-out-unique-video-ids |website=www.mentalfloss.com |access-date=27 December 2021 |language=en |date=23 March 2016}}</ref> Some variants allow or require omitting the padding '<code>=</code>' signs to avoid them being confused with field separators, or require that any such padding be percent-encoded. Some libraries {{which|date=December 2020}} will encode '<code>=</code>' to '<code>.</code>', potentially exposing applications to relative path attacks when a folder name is encoded from user data.{{Citation needed|date=June 2022}}
; 6PACK: '''6PACK''', used with some [[terminal node controller]]s, uses an alphabet consisting of the values 0x00 through 0x3f.<ref>{{cite web|url=http://private.freepage.de/cgi-bin/feets/freepage_ext/41030x030A/rewrite/alexs/xfr/flexnet/6pack_en/6pack.htm|title=6PACK a "real time" PC to TNC protocol|access-date=2013-05-19|archive-date=2012-02-24|archive-url=https://web.archive.org/web/20120224051938/http://private.freepage.de/cgi-bin/feets/freepage_ext/41030x030A/rewrite/alexs/xfr/flexnet/6pack_en/6pack.htm|url-status=dead}}</ref>


===Javascript (DOM Web API) ===
; Bash: [[Bash (Unix shell)|'''Bash''']] supports numeric literals in any base from 2 to 64. For base 64, Bash uses the alphabet <span class="nowrap">"<code>0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_</code>"</span>.<ref>{{cite web |title=Shell Arithmetic |url=https://www.gnu.org/software/bash/manual/html_node/Shell-Arithmetic.html |website=Bash Reference Manual |access-date=8 April 2020 |quote=Otherwise, numbers take the form [base#]n, where the optional base is a decimal number between 2 and 64 representing the arithmetic base, and n is a number in that base.}}</ref>
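
The sort-order property mentioned above for the crypt alphabet can be illustrated with a short Python sketch. It is an assumption-laden demonstration: it applies the crypt alphabet to the ordinary most-significant-bit-first packing of RFC 4648 without padding, which is not necessarily how crypt itself packs bits, and the helper name <code>encode_sortable</code> is hypothetical.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
CRYPT = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
TO_CRYPT = str.maketrans(STANDARD, CRYPT)


def encode_sortable(data: bytes) -> str:
    """Unpadded Base64 using an alphabet whose characters are in ascending ASCII order."""
    return base64.b64encode(data).decode("ascii").rstrip("=").translate(TO_CRYPT)


words = [b"pear", b"apple", b"fig", b"apples", b"Zebra"]
# Sorting by the encoded form gives the same order as sorting the raw bytes.
print(sorted(words, key=encode_sortable) == sorted(words))

# With the RFC 4648 alphabet the order can differ: b"\x00" encodes to "AA" and
# b"\xff" to "/w==", and "/" sorts before "A" in ASCII.
print(base64.b64encode(b"\x00") < base64.b64encode(b"\xff"))
```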
The <code>atob()</code> and <code>btoa()</code> JavaScript methods, defined in the HTML5 draft specification,<ref>{{cite web|title=7.3. Base64 utility methods|url=https://w3c.github.io/html/webappapis.html#atob|website=HTML 5.2 Editor's Draft|publisher=[[World Wide Web Consortium]]|access-date=2 January 2018}} Introduced by [http://html5.org/tools/web-apps-tracker?from=5813&to=5814 changeset 5814], 2021-02-01.</ref> provide Base64 encoding and decoding functionality to web pages. The <code>btoa()</code> method outputs padding characters, but these are optional in the input of the <code>atob()</code> method.


===Other applications===
===With atypical alphabet===
[[File:35_mm_angle_of_view_vs_focal_length.svg|thumb|link={{filepath:35_mm_angle_of_view_vs_focal_length.svg}}|Example of an SVG file containing embedded JPEG images encoded in Base64<ref>&lt;image xlink:href="data:image/jpeg;base64,<code>JPEG contents encoded in Base64</code>" ... /&gt;</ref>]]
Some variants use a Base64 alphabet that is significantly different from the alphabets used in the most common Base64 variants (like RFC 4648).
Base64 can be used in a variety of contexts:


* Base64 can be used to transmit and store text that might otherwise cause [[delimiter collision]]
; Uuencoding: The '''[[Uuencoding]]''' alphabet includes no lowercase characters, instead using ASCII codes 32 ("<code>&nbsp;</code>" (space)) through 95 ("<code>_</code>"), consecutively. Uuencoding uses the alphabet <span class="nowrap">"<code>&nbsp;!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_</code>"</span>. Avoiding all lower-case letters was helpful, because many older printers only printed uppercase. Using consecutive ASCII characters saved computing power, because it was only necessary to add 32, without requiring a lookup table. Its use of most punctuation characters and the space character may limit its usefulness in some applications, such as those that use these characters as syntax.{{citation needed|date=April 2016}}
* Base64 is used to encode character strings in [[LDAP Data Interchange Format]] files
* Base64 is often used to embed binary data in an [[XML]] file, using a syntax similar to <code><nowiki><data encoding="base64">…</data></nowiki></code> e.g. [[favicon]]s in [[Firefox]]'s exported <code>bookmarks.html</code>.
* Base64 is used to encode binary files such as images within scripts, to avoid depending on external files.
* Base64 can be used to embed [[PDF]] files in HTML pages.<ref>{{Cite web |title=Encode PDF (Portable Document Format) File (.pdf) to Base64 Data |url=https://base64.online/encoders/encode-pdf-to-base64?utm_campaign=og |access-date=2024-03-21 |website=base64.online |language=en}}</ref>
* The [[data URI scheme]] can use Base64 to represent file contents. For instance, background images and fonts can be specified in a [[CSS]] stylesheet file as <code>data:</code> URIs, instead of being supplied in separate files.
* Although not part of the official specification for the [[SVG]] format, some viewers can interpret Base64 when used for embedded elements, such as raster images inside SVG files.<ref>{{cite web|url=http://jsfiddle.net/MxHPq/|title=Edit fiddle |website=jsfiddle.net}}</ref>
* Base64 can be used to store/transmit relatively small amounts of binary data via a computer's text [[Clipboard (computing)|clipboard]] functionality, especially in cases where the information doesn't warrant being permanently saved or when information must be quickly sent between a wide variety of different, potentially incompatible programs. An example is the representation of the public keys of [[cryptocurrency]] recipients as Base64 encoded text strings, which can be easily copied and pasted into users' [[Cryptocurrency wallet|wallet software]].
* Binary data that must be quickly verified by humans as a safety mechanism, such as [[Checksum|file checksums]] or [[Public key fingerprint|key fingerprints]], is often represented in Base64 for easy checking, sometimes with additional formattings, such as separating each group of four characters in the representation of a [[Pretty Good Privacy|PGP]] key fingerprint with a space.
* [[QR code]]s which contain binary data will sometimes store it encoded in Base64 rather than simply storing the raw binary data, as there is a stronger guarantee that all QR code readers will accurately decode text, as well as the fact that some devices will more readily save text from a QR code than potentially malicious binary data.


=== Applications not compatible with RFC 4648 Base64 ===
; BinHex: [[BinHex|'''BinHex 4''']] (HQX), which was used within the [[classic Mac OS]], excludes some visually confusable characters like '<code>7</code>', '<code>O</code>', '<code>g</code>' and '<code>o</code>'. Its alphabet includes additional punctuation characters. It uses the alphabet <span class="nowrap">"<code><nowiki>!"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr</nowiki></code>"</span>.
Some applications use a Base64 alphabet that is significantly different from the alphabets used in the most common Base64 variants (see [[Base64#Variants summary table|Variants summary table]] above).


* The '''[[Uuencoding]]''' alphabet includes no lowercase characters, instead using ASCII codes 32 ("<code>&nbsp;</code>" (space)) through 95 ("<code>_</code>"), consecutively. Uuencoding uses the alphabet <span class="nowrap">"<code>&nbsp;!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_</code>"</span>. Avoiding all lower-case letters was helpful, because many older printers only printed uppercase. Using consecutive ASCII characters saved computing power, because it was only necessary to add 32, without requiring a lookup table. Its use of most punctuation characters and the space character may limit its usefulness in some applications, such as those that use these characters as syntax.{{citation needed|date=April 2016}}
; UTF-8: A [[UTF-8]] environment can use non-synchronized continuation bytes as base64: <code>0b10<b>xxxxxx</b></code>. See [[UTF-8#Comparison with other encodings|UTF-8#Self-synchronization]].
* [[BinHex|'''BinHex 4''']] (HQX), which was used within the [[classic Mac OS]], excludes some visually confusable characters like '<code>7</code>', '<code>O</code>', '<code>g</code>' and '<code>o</code>'. Its alphabet includes additional punctuation characters. It uses the alphabet <span class="nowrap">"<code><nowiki>!"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr</nowiki></code>"</span>.
* A [[UTF-8]] environment can use non-synchronized continuation bytes as base64: <code>0b10<b>xxxxxx</b></code>. See [[UTF-8#Comparison with other encodings|UTF-8#Self-synchronization]].
* Several other applications use alphabets similar to the common variations, but in a different order:
** Unix stores password hashes computed with [[crypt (C)|'''crypt''']] in the [[passwd#Password file|<code>/etc/passwd</code> file]] using an encoding called <span id="B64">B64</span>. crypt's alphabet puts the punctuation <code>.</code> and <code>/</code> before the alphanumeric characters. crypt uses the alphabet "<code>./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</code>" without padding. An advantage over RFC 4648 is that sorting encoded ASCII data results in the same order as sorting the plain ASCII data.
** The '''[[GEDCOM]]''' 5.5 standard for genealogical data interchange encodes multimedia files in its text-line hierarchical file format. GEDCOM uses the same alphabet as crypt, which is <span class="nowrap">"<code>./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</code>"</span>.<ref>{{cite web|url=http://homepages.rootsweb.ancestry.com/~pmcbride/gedcom/55gctoc.htm |title=The GEDCOM Standard Release 5.5 |publisher=Homepages.rootsweb.ancestry.com |access-date=2012-06-21}}</ref>
** '''[[bcrypt]]''' hashes are designed to be used in the same way as traditional crypt(3) hashes, but bcrypt's alphabet is in a different order than crypt's. bcrypt uses the alphabet <span class="nowrap">"<code>./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789</code>"</span>.<ref>{{cite web|url=https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/crypt/bcrypt.c?rev=1.1&content-type=text/x-cvsweb-markup|title=src/lib/libc/crypt/bcrypt.c r1.1|author-link=Niels Provos|first=Niels|last=Provos|date=1997-02-13|access-date=2018-05-18}}</ref>
** '''[[Xxencoding]]''' uses a mostly-alphanumeric character set similar to crypt, but using <code>+</code> and <code>-</code> rather than <code>.</code> and <code>/</code>. Xxencoding uses the alphabet <span class="nowrap">"<code>+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz</code>"</span>.
** '''6PACK''', used with some [[terminal node controller]]s, uses an alphabet from 0x00 to 0x3f.<ref>{{cite web|url=http://private.freepage.de/cgi-bin/feets/freepage_ext/41030x030A/rewrite/alexs/xfr/flexnet/6pack_en/6pack.htm|title=6PACK a "real time" PC to TNC protocol|access-date=2013-05-19|archive-date=2012-02-24|archive-url=https://web.archive.org/web/20120224051938/http://private.freepage.de/cgi-bin/feets/freepage_ext/41030x030A/rewrite/alexs/xfr/flexnet/6pack_en/6pack.htm|url-status=dead}}</ref>
** [[Bash (Unix shell)|'''Bash''']] supports numeric literals in Base64. Bash uses the alphabet <span class="nowrap">"<code>0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_</code>"</span>.<ref>{{cite web |title=Shell Arithmetic |url=https://www.gnu.org/software/bash/manual/html_node/Shell-Arithmetic.html |website=Bash Reference Manual |access-date=8 April 2020 |quote=Otherwise, numbers take the form [base#]n, where the optional base is a decimal number between 2 and 64 representing the arithmetic base, and n is a number in that base.}}</ref>


==See also==
* [[8BITMIME]]
* [[8BITMIME]] {{endash}} 8-bit data transmission for SMTP
* [[Ascii85]] (also called Base85)
* {{Annotated link |Ascii85}}
* [[Base16]]
* [[Base16]] {{endash}} Encoding for a sequence of byte values using hexadecimal
* [[Base32]]
* {{Annotated link |Base32}}
* [[Base36]]
* {{Annotated link |Base36}}
* [[Base62]]
* {{Annotated link |Base62}}
* [[Binary-to-text encoding]] for a comparison of various encoding algorithms
* {{Annotated link |Binary number}}
* [[Binary number]]
* [[URL]]


==References==

Revision as of 17:01, 18 November 2025


Base64 is a binary-to-text encoding that uses 64 printable characters to represent each 6-bit segment of a sequence of byte[1] values. As for all binary-to-text encodings, Base64 encoding enables transmitting binary data on a communication channel that only supports text.

Compared with the original data, Base64-encoded data is about 33% larger, and about 4% larger again if line breaks are inserted at a typical line length.

The earliest uses of this encoding were for dial-up communication between systems running the same operating system – for example, uuencode for UNIX and BinHex for the TRS-80 (later adapted for the Macintosh) – and could therefore make more assumptions about what characters were safe to use. For instance, uuencode uses uppercase letters, digits, and many punctuation characters, but no lowercase.[2][3][4][5]

Applications

File:35 mm angle of view vs focal length.svg
Example of an SVG file containing embedded JPEG images encoded in Base64[6]

Notable applications of Base64:

Web pages
Encoding as Base64 is prevalent on the World Wide Web[7] where it is often used to embed binary data, such as a digital image, in text such as HTML and CSS.[8]
E-mail attachment
Base64 is widely used for sending e-mail attachments, because SMTP – in its original form – was designed to transport 7-bit ASCII characters only. Encoding an attachment as Base64 before sending, and then decoding when received, assures older SMTP servers correctly transmit messages with attached binary information.
Embed binary data in a text file
For example, to include the data of an image in a script to avoid depending on external files.
Embed binary data in XML
To embed binary data in an XML file, using a syntax similar to <data encoding="base64">...</data> e.g. favicons in Firefox's exported bookmarks.html.
Embed PDF file
To embed a PDF file in an HTML page.[9]
Embedded elements
Although not part of the official specification for the SVG format, some viewers can interpret Base64 when used for embedded elements, such as raster images inside SVG files.[10]
Prevent delimiter collision
To transmit and store text that might otherwise cause delimiter collision.
LDAP Data Interchange Format
To encode character strings in LDAP Data Interchange Format files.
Data URI scheme
The data URI scheme can use Base64 to represent file contents. For instance, background images and fonts can be specified in a CSS stylesheet file as data: URIs, instead of being supplied in separate files.
Leverage clipboard
To store/transmit relatively small amounts of binary data via a computer's text clipboard functionality, especially in cases where the information doesn't warrant being permanently saved or when information must be quickly sent between a wide variety of different, potentially incompatible programs. An example is the representation of the public keys of cryptocurrency recipients as Base64 encoded text strings, which can be easily copied and pasted into users' wallet software.
Support human verification
Binary data that must be quickly verified by humans as a safety mechanism, such as file checksums or key fingerprints, is often represented in Base64 for easy checking, sometimes with additional formatting, such as separating each group of four characters in the representation of a PGP key fingerprint with a space.
QR code encoding
A QR code that carries binary data sometimes stores it encoded as Base64 rather than as raw binary, since a QR code reader is more likely to decode text accurately than binary data. Also, some devices more readily save text from a QR code than potentially malicious binary data.

Alphabet

The set of characters used to represent the values for each base-64 digit (value from 0 to 63) differs slightly between the variations of Base64. The general strategy is to use printable characters that are common to most character encodings. This tends to result in data remaining unchanged as it moves through information systems, such as email, that were traditionally not 8-bit clean.[5] Typically, an encoding uses A–Z, a–z, and 0–9 for the first 62 values. Many variants use + and / for the last two.

Per RFC 4648 §4, the following table lists the characters used for each numeric value. To indicate padding, = is used.

Base64 alphabet
value char value char value char value char
0 A 16 Q 32 g 48 w
1 B 17 R 33 h 49 x
2 C 18 S 34 i 50 y
3 D 19 T 35 j 51 z
4 E 20 U 36 k 52 0
5 F 21 V 37 l 53 1
6 G 22 W 38 m 54 2
7 H 23 X 39 n 55 3
8 I 24 Y 40 o 56 4
9 J 25 Z 41 p 57 5
10 K 26 a 42 q 58 6
11 L 27 b 43 r 59 7
12 M 28 c 44 s 60 8
13 N 29 d 45 t 61 9
14 O 30 e 46 u 62 +
15 P 31 f 47 v 63 /

Note that Base64URL encoding replaces '+' with '-' and '/' with '_' to make the encoded string HTTP-safe and avoid the need for escaping.

Examples

To simplify explanation, the example below uses ASCII text for input even though this is not a typical use. More commonly, input is binary data, such as an image, and the result then represents binary data in a printable text format.

For the input data:

Many hands make light work.

The typical Base64 representation is:

TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu
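
The example above can be reproduced with a short sketch. It assumes Python and its standard `base64` module, neither of which the article prescribes; the variable names are illustrative only.

```python
import base64

text = "Many hands make light work."

# Base64 operates on bytes, so the text is first converted to ASCII bytes.
encoded = base64.b64encode(text.encode("ascii")).decode("ascii")

# Decoding reverses the process and yields the original bytes.
decoded = base64.b64decode(encoded).decode("ascii")

print(encoded)  # TWFueSBoYW5kcyBtYWtlIGxpZ2h0IHdvcmsu
print(decoded)  # Many hands make light work.
```

The input is 27 bytes long, a multiple of 3, so the 36-character result needs no padding.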

Encoding when no padding needed

Each input sequence of 6 bits (which can encode 2^6 = 64 values) is mapped to a Base64 alphabet letter. Therefore, Base64 encoding results in four characters for each three input bytes. Assuming the input is ASCII or similar, the byte-data for the first three characters 'M', 'a', 'n' are values 77, 97, and 110 which in 8-bit binary representation are 01001101, 01100001, and 01101110. Joining these representations and splitting into 6-bit groups gives:

010011 010110 000101 101110

These four groups map to the string TWFu.

The following table shows how input is encoded. For example, the letter 'M' has the value 77 (per ASCII and similar). The first 6 bits of the value are 010011, or 19 decimal, which maps to the Base64 letter 'T', which has the value 84 (per ASCII and similar).

Encoding 'M', 'a', 'n' as Base64
Input letter (ASCII): M a n
Input 8-bit decimal value: 77 97 110
Bits, grouped by input byte: 01001101 01100001 01101110
Bits, grouped by Base64 digit: 010011 010110 000101 101110
Encoded 6-bit decimal value: 19 22 5 46
Encoded letter (Base64 alphabet): T W F u
Encoded byte (ASCII): 84 87 70 117
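
The regrouping of three 8-bit bytes into four 6-bit values can be sketched as follows. This is a minimal Python illustration that handles exactly three bytes; the function name `regroup_three_bytes` is hypothetical, and the alphabet is the RFC 4648 one listed earlier.

```python
import string

# RFC 4648 alphabet: A-Z, a-z, 0-9, then '+' and '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def regroup_three_bytes(chunk: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters."""
    if len(chunk) != 3:
        raise ValueError("this sketch handles exactly three bytes")
    # Join the three 8-bit representations into one 24-bit string.
    bits = "".join(f"{byte:08b}" for byte in chunk)
    # Split the 24 bits into four 6-bit groups.
    groups = [bits[i:i + 6] for i in range(0, 24, 6)]
    # Use each group's value as an index into the alphabet.
    return "".join(ALPHABET[int(group, 2)] for group in groups)

print(regroup_three_bytes(b"Man"))  # TWFu
```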

Encoding with one padding character

If the input consists of a number of bytes that is 2 more than a multiple of 3 (e.g. 'M', 'a'), then the last 2 bytes (16 bits) are encoded in 3 Base64 digits (18 bits). The two least significant bits of the last content-bearing 6-bit block are treated as zero for encoding and discarded for decoding (along with the trailing = padding character).

Input letter (ASCII): M a
Input 8-bit decimal value: 77 97
Bits, grouped by input byte: 01001101 01100001
Bits, grouped by Base64 digit: 010011 010110 000100 (the last two bits are zero fill)
Encoded 6-bit decimal value: 19 22 4 Padding
Encoded letter (Base64 alphabet): T W E =
Encoded byte (ASCII): 84 87 69 61

Encoding with two padding characters

If the input consists of a number of bytes that is 1 more than a multiple of 3 (e.g. 'M'), then the last 8 bits are represented in 2 Base64 digits (12 bits). The four least significant bits of the last content-bearing 6-bit block are treated as zero for encoding and discarded for decoding (along with the trailing two = padding characters):

Input letter (ASCII): M
Input 8-bit decimal value: 77
Bits, grouped by input byte: 01001101
Bits, grouped by Base64 digit: 010011 010000 (the last four bits are zero fill)
Encoded 6-bit decimal value: 19 16 Padding Padding
Encoded letter (Base64 alphabet): T Q = =
Encoded byte (ASCII): 84 81 61 61
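
The three cases above (no padding, one padding character, two padding characters) can be checked with a small sketch. It assumes Python's standard `base64` module; the helper `padding_needed` is a hypothetical name introduced for illustration.

```python
import base64

def padding_needed(byte_count: int) -> int:
    """Number of '=' characters appended for an input of this many bytes."""
    return (3 - byte_count % 3) % 3

# Map each sample input to its encoded form and its padding count.
results = {
    data: (base64.b64encode(data).decode("ascii"), padding_needed(len(data)))
    for data in (b"Man", b"Ma", b"M")
}

print(results[b"Man"])  # ('TWFu', 0)
print(results[b"Ma"])   # ('TWE=', 1)
print(results[b"M"])    # ('TQ==', 2)
```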

Decoding with padding

When decoding, each sequence of four encoded characters is converted to three output bytes, but with a single padding character the final 4 characters decode to only two bytes, or with two padding characters, the final 4 characters decode to a single byte. For example:

Encoded | Padding | Length | Decoded
bGlnaHQgdw== | == | 1 | light w
bGlnaHQgd28= | = | 2 | light wo
bGlnaHQgd29y | None | 3 | light wor

Another way to interpret the padding character is to consider it as an instruction to discard 2 trailing bits from the bit string each time a = is encountered. For example, when bGlnaHQgdw== is decoded, each character (except the trailing occurrences of =) is converted into its corresponding 6-bit representation, and then 2 trailing bits are discarded for the first = and another 2 trailing bits for the other =. In this instance, we would get 6 bits from the d, and another 6 bits from the w for a bit string of length 12, but since we remove 2 bits for each = (for a total of 4 bits), the dw== ends up producing 8 bits (1 byte) when decoded.
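
The "discard 2 bits per =" reading can be sketched directly. This is a minimal, unoptimized Python illustration that assumes the RFC 4648 alphabet; `decode_padded` is a hypothetical name, and real decoders are usually table-driven rather than string-based.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def decode_padded(encoded: str) -> bytes:
    """Decode padded Base64 by dropping 2 trailing bits per '=' character."""
    body = encoded.rstrip("=")
    pad_count = len(encoded) - len(body)
    if len(encoded) % 4 != 0 or pad_count > 2:
        raise ValueError("expected padded Base64 text")
    # Each character contributes its 6-bit alphabet index.
    bits = "".join(f"{ALPHABET.index(char):06b}" for char in body)
    # Each '=' is an instruction to discard 2 trailing bits.
    if pad_count:
        bits = bits[:-2 * pad_count]
    # The remaining bits form whole bytes.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

print(decode_padded("dw=="))  # b'w'
print(decode_padded("TWE="))  # b'Ma'
```

For `dw==`, the two characters give 12 bits, 4 bits are discarded for the two padding characters, and the remaining 8 bits are the byte for `w`.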

Decoding without padding

Use of the padding character in encoded text is not essential for decoding. The number of missing bytes can be inferred from the length of the encoded text. In some variants, the padding character is mandatory, while for others it is not used. Notably, padding characters are required when concatenating Base64-encoded strings.

Without padding, after decoding each sequence of 4 encoded characters, there may be 2 or 3 encoded characters left over. A single remaining encoded character is not possible because a single Base64 character only contains 6 bits, and 8 bits are required to create a byte. The first character contributes 6 bits, and the second character contributes its first 2 bits. The following table demonstrates decoding encoded strings that have 2, 3 or no left-over characters.

Encoded | Length of last group | Decoded | Decoded length of last group
bGlnaHQgdw | 2 | light w | 1
bGlnaHQgd28 | 3 | light wo | 2
bGlnaHQgd29y | 4 | light wor | 3

Decoding without padding is not performed consistently among decoders. In addition, allowing padless decoding by definition allows multiple strings to decode into the same set of bytes, which can be a security risk.[11]
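
One way to decode unpadded text is to infer the missing padding from the length and restore it before decoding. The sketch below assumes Python's standard `base64` module; `decode_unpadded` is a hypothetical helper, not a function of that module.

```python
import base64

def decode_unpadded(encoded: str) -> bytes:
    """Decode Base64 text whose trailing '=' characters were omitted."""
    if len(encoded) % 4 == 1:
        # 6 bits are not enough to form a byte, so this length is invalid.
        raise ValueError("a single left-over character cannot encode a byte")
    # Restore the padding implied by the length, then decode normally.
    padding = "=" * (-len(encoded) % 4)
    return base64.b64decode(encoded + padding)

print(decode_unpadded("TQ"))    # b'M'
print(decode_unpadded("TWE"))   # b'Ma'
print(decode_unpadded("TWFu"))  # b'Man'
```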

Variants

Variations of Base64 differ in the alphabet used and structural aspects like maximum line length. The most commonly used alphabet is that described by RFC 4648 and most variations only differ in the last two letters used. The following table describes more commonly used encodings that are specified by an RFC.

Encoding[12] | Specification | 62nd char | 63rd char | Pad | Line separators | Line length | Checksum
Base 64 Encoding | RFC 4648 §4 | + | / | = | No | | No
Base 64 Encoding with URL and Filename Safe Alphabet | RFC 4648 §5 | - | _ | = (optional) | No | | No
for MIME | RFC 2045 | + | / | = | Yes | 76 | No
for Privacy-Enhanced Mail (deprecated) | RFC 1421 | + | / | = | Yes | 64 | Yes, in PEM CRC
for UTF-7 | RFC 2152 | + | / | | No | | No
for IMAP mailbox names | RFC 3501 | + | , | | No | | No
Textual Encodings of PKIX, PKCS, and CMS Structures | RFC 7468 | + | / | = | Yes | 64 | No
ASCII armor for OpenPGP | RFC 9580 | + | / | = | Yes | 76 | Yes (CRC24)

RFC 4648

RFC 4648 describes various encodings including Base64, and it discusses the use of line feeds in encoded data, the use of padding in encoded data, the use of non-alphabet characters in encoded data, use of different encoding alphabets, and canonical encodings. The variant that it calls Base 64 Encoding and base64 is intended for general use.

The RFC also specifies a second Base64 encoding, which it calls Base 64 Encoding with URL and Filename Safe Alphabet, that is intended for representing relatively long identifying information. For example, a database persistence framework for Java objects might use Base64 encoding to encode a relatively large unique id (generally 128-bit UUIDs) as a string for use as an HTTP parameter in an HTTP form or an HTTP GET URL. Also, many applications need to encode binary data in a way that is convenient for inclusion in a URL, including in hidden web form fields, and Base64 is a convenient encoding to render them in a compact way.

Using standard Base64 in a URL requires encoding the +, / and = characters as special percent-encoded hexadecimal sequences (+ becomes %2B, / becomes %2F and = becomes %3D), which makes the string longer and harder to read. Using a different alphabet allows for encoding as Base64 without requiring this extra markup. Typically, + and / are replaced by - and _, respectively, so that using URL encoders/decoders is no longer necessary and has no effect on the length of the encoded value, leaving the same encoded form intact for use in relational databases, web forms, and object identifiers in general. YouTube is a popular site that uses such an encoding.[13] Some variants allow or require omitting the padding = signs to avoid them being confused with field separators, or require that any such padding be percent-encoded. Some libraries encode = as ., potentially exposing applications to relative path attacks when a folder name is encoded from user data.
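
The difference between the two alphabets in a URL context can be seen in a short sketch. It assumes Python's standard `base64` and `urllib.parse` modules; the three input bytes are chosen only because they happen to produce + and / in the standard alphabet.

```python
import base64
from urllib.parse import quote

raw = bytes([0xFB, 0xFF, 0xFE])  # encodes to the 6-bit values 62, 63, 63, 62

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

# Standard Base64 must be percent-encoded before being placed in a URL.
escaped = quote(standard, safe="")

print(standard)  # +//+
print(escaped)   # %2B%2F%2F%2B
print(url_safe)  # -__-
```

The URL-safe form keeps its original length of four characters, while the percent-encoded standard form grows to twelve.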

RFC 3548

RFC 3548, entitled The Base16, Base32, and Base64 Data Encodings, is an informational (non-normative) memo that attempts to unify the RFC 1421 and RFC 2045 specifications of Base64 encodings, alternative-alphabet encodings, and the Base32 (which is seldom used) and Base16 encodings. RFC 4648 obsoletes RFC 3548.

Unless an encoder is written to a specification that refers to RFC 3548 and specifically requires otherwise, RFC 3548 forbids an encoder from generating messages containing characters outside the encoding alphabet or without padding, and it also declares that a decoder must reject data that contain characters other than the encoding alphabet.[4]

MIME

The MIME (Multipurpose Internet Mail Extensions) specification lists Base64 as one of two binary-to-text encoding schemes (the other being quoted-printable).[3] MIME's Base64 encoding is based on that of the RFC 1421 version of PEM: it uses the same 64-character alphabet and encoding mechanism as PEM and uses the = symbol for output padding in the same way, as described at RFC 2045.

MIME does not specify a fixed length for Base64-encoded lines, but it does specify a maximum line length of 76 characters. Additionally, it specifies that any character outside the standard set of 64 encoding characters (for example, CRLF sequences) must be ignored by a compliant decoder, although most implementations use a CR/LF newline pair to delimit encoded lines.

Thus, the actual length of MIME-compliant Base64-encoded binary data is usually about 137% of the original data length (4/3 × 78/76), though for very short messages the overhead can be much higher because of the headers. Very roughly, the final size of Base64-encoded binary data is equal to 1.37 times the original data size + 814 bytes (for headers). The size of the decoded data can be approximated with this formula:

bytes = (string_length(encoded_string) − 814) / 1.37
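
The 4/3 × 78/76 factor can be reproduced with a small calculation. The sketch below is in Python and rests on two stated assumptions: every line, including the last, is 76 characters at most and ends with a two-byte CRLF, and message headers (the 814 bytes mentioned above) are left out. The function name `mime_body_length` is hypothetical.

```python
import math

def mime_body_length(byte_count: int, line_length: int = 76) -> int:
    """Encoded size in bytes: Base64 characters plus a CRLF after each line."""
    # Four output characters for every three input bytes, rounded up.
    chars = 4 * math.ceil(byte_count / 3)
    # Each line of up to `line_length` characters is followed by CR and LF.
    lines = math.ceil(chars / line_length)
    return chars + 2 * lines

size = mime_body_length(57_000)
print(size)                     # 78000
print(round(size / 57_000, 4))  # 1.3684
```

A 57-byte input fills exactly one 76-character line, which is why 57,000 bytes give a ratio of exactly 78000/57000, or about 1.37.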

Privacy-enhanced mail

The first known standardized use of the encoding now called MIME Base64 was in the Privacy-Enhanced Mail (PEM) protocol, proposed by RFC 989 in 1987. PEM defines a "printable encoding" scheme that uses Base64 encoding to transform an arbitrary sequence of bytes to a format that can be expressed in short lines of 6-bit characters, as required by transfer protocols such as SMTP.[14]

The current version of PEM (specified in RFC 1421) uses a 64-character alphabet consisting of upper- and lower-case Roman letters (A–Z, a–z), the numerals (0–9), and the + and / symbols. The = symbol is also used as a padding suffix.[2] The original specification, RFC 989, additionally used the * symbol to delimit encoded but unencrypted data within the output stream.

To convert data to PEM printable encoding, the first byte is placed in the most significant eight bits of a 24-bit buffer, the next in the middle eight, and the third in the least significant eight bits. If there are fewer than three bytes left to encode (or in total), the remaining buffer bits will be zero. The buffer is then used, six bits at a time, most significant first, as indices into the string: "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", and the indicated character is output.

The process is repeated on the remaining data until fewer than four bytes remain. If three bytes remain, they are processed normally. If fewer than three bytes (24 bits) are remaining to encode, the input data is right-padded with zero bits to form an integral multiple of six bits.

After encoding the non-padded data, if two bytes of the 24-bit buffer are padded-zeros, two = characters are appended to the output; if one byte of the 24-bit buffer is filled with padded-zeros, one = character is appended. This signals the decoder that the zero bits added due to padding should be excluded from the reconstructed data. This also guarantees that the encoded output length is a multiple of 4 bytes.
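
The buffer-based procedure described above can be sketched as follows. This is a minimal Python illustration, not PEM's reference code: it omits line wrapping, and the function name `encode_with_buffer` is hypothetical.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_with_buffer(data: bytes) -> str:
    """Encode bytes three at a time through a 24-bit buffer."""
    out = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        # First byte goes in the most significant 8 bits, the next in the
        # middle 8, the third in the least significant 8; missing bytes
        # leave zero bits.
        buffer = 0
        for position, byte in enumerate(chunk):
            buffer |= byte << (16 - 8 * position)
        # Read the buffer six bits at a time, most significant first.
        chars = [ALPHABET[(buffer >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters built only from missing bytes are replaced by '='.
        missing = 3 - len(chunk)
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)

print(encode_with_buffer(b"Ma"))  # TWE=
print(encode_with_buffer(b"M"))   # TQ==
```

Because every chunk produces exactly four characters, the output length is always a multiple of 4.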

PEM requires that all encoded lines consist of exactly 64 printable characters, with the exception of the last line, which may contain fewer printable characters. Lines are delimited by whitespace characters according to local (platform-specific) conventions.

UTF-7

UTF-7, described first in RFC 1642, which was later superseded by RFC 2152, introduced a system called modified Base64. This data encoding scheme is used to encode UTF-16 as ASCII characters for use in 7-bit transports such as SMTP. It is a variant of the Base64 encoding used in MIME.[15][16]

The "Modified Base64" alphabet consists of the MIME Base64 alphabet, but does not use the "=" padding character. UTF-7 is intended for use in mail headers (defined in RFC 2047), and the "=" character is reserved in that context as the escape character for "quoted-printable" encoding. Modified Base64 simply omits the padding and ends immediately after the last Base64 digit containing useful bits, leaving up to three unused bits in the last Base64 digit.
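
Modified Base64 can be approximated as ordinary Base64 over the UTF-16 (big-endian) form of the text with the padding stripped. The sketch below assumes Python's standard `base64` module and its built-in `utf-7` codec; `modified_base64` is a hypothetical helper, and the `+` and `-` shift markers shown in the decoding line are part of UTF-7 itself rather than of the Base64 step.

```python
import base64

def modified_base64(text: str) -> str:
    """Base64 of the UTF-16 (big-endian) form of text, without '=' padding."""
    utf16 = text.encode("utf-16-be")
    return base64.b64encode(utf16).decode("ascii").rstrip("=")

pound = "\u00a3"  # the pound sign, U+00A3

print(modified_base64(pound))        # AKM
print(b"+AKM-".decode("utf-7") == pound)  # True
```

The pound sign occupies 16 bits in UTF-16, which fill three Base64 digits (18 bits) and leave two unused bits in the last digit.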

OpenPGP

OpenPGP, described in RFC 9580, specifies "ASCII armor", which is identical to the "Base64" encoding described by MIME, with the addition of an optional 24-bit CRC. The checksum is calculated on the input data before encoding; the checksum is then encoded with the same Base64 algorithm and, prefixed by the "=" symbol as the separator, appended to the encoded output data.[17]
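
The checksum step can be sketched in Python. The initial value 0xB704CE and generator 0x1864CFB used here are the CRC-24 parameters commonly cited for OpenPGP; they are not stated in this article, so treat them as assumptions to verify against the specification. The function names are hypothetical.

```python
import base64

CRC24_INIT = 0xB704CE   # assumed initial value (see lead-in)
CRC24_POLY = 0x1864CFB  # assumed generator polynomial (see lead-in)

def crc24(data: bytes) -> int:
    """Bitwise CRC-24 over the raw (unencoded) input data."""
    crc = CRC24_INIT
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF

def armor_checksum_line(data: bytes) -> str:
    """'=' separator followed by the Base64 form of the 3-byte checksum."""
    checksum = crc24(data).to_bytes(3, "big")
    return "=" + base64.b64encode(checksum).decode("ascii")

print(armor_checksum_line(b""))  # =twTO
```

Since the checksum is exactly three bytes, its Base64 form is always four characters with no padding of its own.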

JavaScript (DOM Web API)

The atob() and btoa() JavaScript methods, defined in the HTML5 draft specification,[18][19] provide Base64 encoding and decoding functionality to web pages. The btoa() method outputs padding characters, but these are optional in the input of the atob() method.
Example: Encoding of the beginning of a GIF file: btoa("GIF89a") → "R0lGODlh".

With atypical alphabet order

Several variants use alphabets similar to the common variants, but in a different order.

Unix password
Unix stores password hashes computed with crypt in the /etc/passwd file using an encoding called B64. crypt's alphabet puts the punctuation . and / before the alphanumeric characters. crypt uses the alphabet "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" without padding. An advantage over RFC 4648 is that sorting encoded ASCII data results in the same order as sorting the plain ASCII data.
GEDCOM
The GEDCOM 5.5 standard for genealogical data interchange encodes multimedia files in its text-line hierarchical file format. GEDCOM uses the same alphabet as crypt, which is "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".[20]
bcrypt
bcrypt hashes are designed to be used in the same way as traditional crypt(3) hashes, but bcrypt's alphabet is in a different order than crypt's. bcrypt uses the alphabet "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".[21]
Xxencoding
Xxencoding uses a mostly-alphanumeric character set similar to crypt, but using + and - rather than . and /. Xxencoding uses the alphabet "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".
6PACK
6PACK, used with some terminal node controllers, uses an alphabet from 0x00 to 0x3f.[22]
Bash
Bash supports numeric literals in Base64. Bash uses the alphabet "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ@_".[23]
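
The sorting property mentioned for crypt's alphabet follows from its characters being in ascending ASCII order. The sketch below illustrates only that property by swapping alphabets on unpadded RFC 4648 output; it assumes Python's standard `base64` module and does not attempt to reproduce the exact bit packing or hash format of crypt(3) or any other tool listed above. All names are hypothetical.

```python
import base64
import string

STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
# The alphabet quoted above for crypt: './', digits, uppercase, lowercase.
ASCII_ORDERED = "./" + string.digits + string.ascii_uppercase + string.ascii_lowercase

TO_ASCII_ORDERED = str.maketrans(STANDARD, ASCII_ORDERED)

def encode_ascii_ordered(data: bytes) -> str:
    """Unpadded Base64 using the ASCII-ordered alphabet."""
    standard = base64.b64encode(data).decode("ascii").rstrip("=")
    return standard.translate(TO_ASCII_ORDERED)

samples = [b"pear", b"apple", b"\xd0\x9f", b"Zebra"]

by_bytes = sorted(samples)
by_ordered = sorted(samples, key=encode_ascii_ordered)
by_standard = sorted(samples, key=lambda d: base64.b64encode(d).decode("ascii"))

print(by_bytes == by_ordered)   # True
print(by_bytes == by_standard)  # False
```

With the RFC 4648 alphabet the digits represent the values 52 to 61 but sort before the letters, so the sample starting with byte 0xD0 moves to the front.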

With atypical alphabet

Some variants use a Base64 alphabet that is significantly different from the alphabets used in the most common Base64 variants (like RFC 4648).

Uuencoding
The Uuencoding alphabet includes no lowercase characters, instead using ASCII codes 32 (" " (space)) through 95 ("_"), consecutively. Uuencoding uses the alphabet " !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_". Avoiding all lower-case letters was helpful, because many older printers only printed uppercase. Using consecutive ASCII characters saved computing power, because it was only necessary to add 32, without requiring a lookup table. Its use of most punctuation characters and the space character may limit its usefulness in some applications, such as those that use these characters as syntax.
BinHex
BinHex 4 (HQX), which was used within the classic Mac OS, excludes some visually confusable characters like '7', 'O', 'g' and 'o'. Its alphabet includes additional punctuation characters. It uses the alphabet "!"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr".
UTF-8
A UTF-8 environment can use non-synchronized continuation bytes as base64: 0b10xxxxxx. See UTF-8#Self-synchronization.
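
The "add 32" mapping described for Uuencoding can be sketched as follows. This minimal Python illustration covers only the character mapping for one three-byte group; it leaves out the line-length prefix and file framing of real uuencoded data, and the function names are hypothetical.

```python
def six_bit_values(chunk: bytes) -> list:
    """Split exactly three bytes into four 6-bit values."""
    if len(chunk) != 3:
        raise ValueError("this sketch handles exactly three bytes")
    buffer = int.from_bytes(chunk, "big")
    return [(buffer >> shift) & 0x3F for shift in (18, 12, 6, 0)]

def uu_style_chars(chunk: bytes) -> str:
    """Map each 6-bit value to ASCII by adding 32, with no lookup table."""
    return "".join(chr(value + 32) for value in six_bit_values(chunk))

print(six_bit_values(b"Man"))  # [19, 22, 5, 46]
print(uu_style_chars(b"Man"))  # 36%N
```

The same 6-bit values 19, 22, 5 and 46 that give TWFu in the RFC 4648 alphabet become the characters with ASCII codes 51, 54, 37 and 78 here.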

See also

8BITMIME – 8-bit data transmission for SMTP
Ascii85
Base16 – Encoding for a sequence of byte values using hexadecimal
Base32
Base36
Base62
Binary number

References


  1. technically octet
  2. a b Template:Cite IETF
  3. a b Template:Cite IETF
  4. a b Template:Cite IETF
  5. a b Template:Cite IETF
  6. <image xlink:href="data:image/jpeg;base64,JPEG contents encoded in Base64" ... />
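  7. "Base64 encoding and decoding – Web APIs". MDN Web Docs. https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/Base64_encoding_and_decoding. Archived from the original on 2014-11-11.
  8. "When to base64 encode images (and when not to)". 28 August 2011. https://www.davidbcalhoun.com/2011/when-to-base64-encode-images-and-when-not-to/. Archived from the original on 2023-08-29.
  9. "Encode PDF (Portable Document Format) File (.pdf) to Base64 Data". base64.online. https://base64.online/encoders/encode-pdf-to-base64?utm_campaign=og. Retrieved 2024-03-21.
  10. "Edit fiddle". jsfiddle.net. http://jsfiddle.net/MxHPq/.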
  11. Script error: No such module "citation/CS1".
  12. Some specifications describe a Base64 encoding without naming it. This column identifies Base64 encodings in a descriptive way if no particular name is specified.
  13. Script error: No such module "citation/CS1".
  14. Template:Cite IETF
  15. Template:Cite IETF
  16. Template:Cite IETF
  17. Template:Cite IETF
  18. "7.3. Base64 utility methods". HTML 5.2 Editor's Draft. World Wide Web Consortium. https://w3c.github.io/html/webappapis.html#atob. Retrieved 2 January 2018. Introduced by changeset 5814, 2021-02-01.
  19. Script error: No such module "citation/CS1".
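  20. "The GEDCOM Standard Release 5.5". Homepages.rootsweb.ancestry.com. http://homepages.rootsweb.ancestry.com/~pmcbride/gedcom/55gctoc.htm. Retrieved 2012-06-21.
  21. Provos, Niels (1997-02-13). "src/lib/libc/crypt/bcrypt.c r1.1". https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/crypt/bcrypt.c?rev=1.1&content-type=text/x-cvsweb-markup. Retrieved 2018-05-18.
  22. "6PACK a "real time" PC to TNC protocol". http://private.freepage.de/cgi-bin/feets/freepage_ext/41030x030A/rewrite/alexs/xfr/flexnet/6pack_en/6pack.htm. Archived from the original on 2012-02-24. Retrieved 2013-05-19.
  23. "Shell Arithmetic". Bash Reference Manual. https://www.gnu.org/software/bash/manual/html_node/Shell-Arithmetic.html. Retrieved 8 April 2020.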
