Machine code: Difference between revisions

From Wikipedia, the free encyclopedia
imported>Belbury
MOS:ORDER, enlarge image
 
imported>Explicit
m Removing link(s) because G8: Redirect to deleted page Detection of Intrusions and Malware, and Vulnerability Assessment (XFDcloser)
 
(3 intermediate revisions by 3 users not shown)
Line 1: Line 1:
{{Short description|Lowest level instructions executed by a computer}}
{{Short description |Instructions directly executable by a computer}}
{{for|code that is completely internal to some CPUs and normally inaccessible to programmers|Microcode}}
{{for|code that is completely internal to some CPUs and normally inaccessible to programmers|Microcode}}
{{redirect|Native code|the French colonial legal system|Native code (France)}}
{{redirect|Native code|the French colonial legal system|Native code (France)}}
Line 7: Line 7:
{{Program execution}}
{{Program execution}}


In [[computer programming]], '''machine code''' is [[computer program|computer code]] consisting of '''machine language''' [[instruction set architecture|instructions]], which are used to control a computer's [[central processing unit]] (CPU). For conventional [[binary number|binary computer]]s, machine code is the binary<ref group=nb>On nonbinary machines it is, e.g., a decimal representation.</ref> representation of a computer program that is actually read and interpreted by the computer. A program in machine code consists of a sequence of machine instructions (possibly interspersed with data).<ref name="Stallings_2015"/>
In [[computing]], '''machine code''' is [[data]] [[encoded]] and structured to control a [[computer]]'s [[central processing unit]] (CPU) via its programmable [[Interface (computing)|interface]]. A [[computer program]] consists primarily of sequences of machine-code instructions.<ref name="Stallings_2015"/> Machine code is classified as [[native (computing)|native]] with respect to its host CPU since it is the language that the CPU interprets directly.<ref name="Managed"/> A [[interpreter (software)|software interpreter]] is a [[virtual machine]] that processes virtual machine code.


Each machine code instruction causes the CPU to perform a specific task. Examples of such tasks include:
A machine-code instruction causes the CPU to perform a specific task such as:
# Load a [[Word (computer architecture)|word]] from [[Random-access memory|memory]] to a [[Processor register|CPU register]]
* Load a [[Word (computer architecture)|word]] from [[Random-access memory|memory]] to a [[Processor register|CPU register]]
# Execute an [[arithmetic logic unit]] (ALU) operation on one or more registers or memory locations
* Execute an [[arithmetic logic unit]] (ALU) operation on one or more registers or memory locations
# [[jump instruction|Jump]] or [[Addressing mode#Skip|skip]] to an instruction that is not the next one
* [[jump instruction|Jump]] or [[Addressing mode#Skip|skip]] to an instruction that is not the next one
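The three kinds of task listed above can be sketched with a toy fetch-and-execute loop. This is only an illustration: the four numeric opcodes, the three-field instruction layout, and the names <code>run</code>, <code>PROGRAM</code> and <code>MEMORY</code> are invented here and do not correspond to any real instruction set.

```python
# Toy machine, invented for illustration; it does not model any real ISA.
# Each instruction is a tuple of three numbers: (opcode, a, b).
LOAD, ADD, JUMP_IF_NONZERO, HALT = 0, 1, 2, 3


def run(program, memory):
    """Execute the program and return the final register contents."""
    regs = [0, 0, 0, 0]
    pc = 0  # program counter: index of the next instruction
    while True:
        op, a, b = program[pc]  # fetch
        pc += 1                 # by default, continue with the next one
        if op == LOAD:          # load a word from memory into a register
            regs[a] = memory[b]
        elif op == ADD:         # ALU operation on two registers
            regs[a] = regs[a] + regs[b]
        elif op == JUMP_IF_NONZERO:  # jump to an instruction that is not the next one
            if regs[a] != 0:
                pc = b
        elif op == HALT:
            return regs
        else:
            raise ValueError(f"unknown opcode {op}")


MEMORY = [3, -1, 0]  # counter, step, initial accumulator
PROGRAM = [
    (LOAD, 0, 0),             # r0 = 3
    (LOAD, 1, 1),             # r1 = -1
    (LOAD, 2, 2),             # r2 = 0
    (ADD, 2, 0),              # r2 += r0
    (ADD, 0, 1),              # r0 -= 1
    (JUMP_IF_NONZERO, 0, 3),  # loop back while r0 != 0
    (HALT, 0, 0),
]

print(run(PROGRAM, MEMORY)[2])  # sums 3 + 2 + 1
```

A real CPU does the same thing in hardware, with each instruction packed into bits rather than a Python tuple.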


In general, each architecture family (e.g., [[x86]], [[ARM architecture family|ARM]]) has its own [[instruction set architecture]] (ISA), and hence its own specific machine code language. There are exceptions, such as the [[VAX]] architecture, which includes optional support of the [[PDP-11]] instruction set; the [[IA-64]] architecture, which includes optional support of the [[IA-32]] instruction set; and the [[PowerPC 600#PowerPC 615|PowerPC 615]] microprocessor, which can natively process both [[PowerPC]] and x86 instruction sets.
An [[instruction set architecture]] (ISA) defines the interface to a CPU and varies between families of CPU designs such as [[x86]] and [[ARM architecture family |ARM]]. Generally, machine code written for one family does not run on another, but there are exceptions. The [[VAX]] architecture includes optional support of the [[PDP-11]] instruction set. The [[IA-64]] architecture includes optional support of the [[IA-32]] instruction set. The [[PowerPC 600#PowerPC 615|PowerPC 615]] can natively process both [[PowerPC]] and x86 instructions.


Machine code is a strictly numerical language, and it is the lowest-level interface to the CPU intended for a programmer. [[Assembly language]] provides a direct map between the numerical machine code and a human-readable mnemonic. In assembly, numerical [[opcode]]s and operands are replaced with mnemonics and labels. For example, the [[x86]] architecture has available the 0x90 opcode; it is represented as [[NOP (code)|NOP]] in the assembly [[source code]]. While it is possible to write programs directly in machine code, managing individual bits and calculating numerical [[memory address|addresses]] is tedious and error-prone. Therefore, programs are rarely written directly in machine code. However, an existing machine code program may be edited if the assembly source code is not available.
==Assembly language==
[[File:Machine language and assembly language.jpg|thumb|319x319px|Translation of assembly into machine code]]


The majority of programs today are written in a [[high-level programming language|high-level language]]. A high-level program may be translated into machine code by a [[compiler]].
[[Assembly language]] provides a relatively direct mapping from [[human-readable]] [[source code]] to machine code. Assembly source code represents the numerical codes of machine code as mnemonics and labels.<ref name="Dourish_2004"/> For example, <code>[[NOP (code)|NOP]]</code> in assembly for an [[x86]] processor represents the x86 architecture [[opcode]] 0x90 in machine code. While it is possible to write a program in machine code, doing so is tedious and error-prone. Therefore, programs are usually written in assembly or, more commonly, in a [[high-level programming language]].
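The mnemonic-to-opcode mapping can be illustrated with a lookup table. The sketch below is deliberately tiny and not a real assembler: it handles only single-byte instructions without operands, and the names <code>OPCODES</code>, <code>assemble</code> and <code>disassemble</code> are made up for the example. The x86 <code>NOP</code> value is the one given above; the two Z80 entries (<code>NOP</code> = 0x00 and <code>DEC B</code> = 0x05, i.e. binary 00000101) are added here to show that the encoding depends on the architecture.

```python
# Single-byte, operand-free instructions only; a real assembler also
# handles operands, labels and multi-byte encodings.
OPCODES = {
    "x86": {"NOP": 0x90},
    "Z80": {"NOP": 0x00, "DEC B": 0x05},  # Z80 values added for this sketch
}


def assemble(arch, lines):
    """Translate mnemonics into machine-code bytes for the given architecture."""
    table = OPCODES[arch]
    return bytes(table[line.strip().upper()] for line in lines)


def disassemble(arch, code):
    """Translate machine-code bytes back into mnemonics."""
    reverse = {byte: mnemonic for mnemonic, byte in OPCODES[arch].items()}
    return [reverse[byte] for byte in code]


print(assemble("x86", ["nop"]).hex())           # same mnemonic ...
print(assemble("Z80", ["nop", "dec b"]).hex())  # ... different bytes
print(disassemble("Z80", bytes([0x05])))
```

Because the table is one-to-one in both directions, the same data drives assembly and disassembly.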


==Instruction set==
==Instruction set==
{{main |Instruction set}}
A machine instruction encodes an operation as a pattern of [[bit]]s based on the specified format for the machine's instruction set.<ref group=nb>On early [[Decimal computer|decimal machines]], patterns of characters, digits and digit sign</ref><ref name="sco-p251">{{harvnb|Tanenbaum|1990|p= [https://archive.org/details/structuredcomput00tane/page/251 251]}}</ref>


Every processor or processor family has its own [[instruction set]]. Machine instructions are patterns of [[bit]]s<ref group=nb>On early [[Decimal computer|decimal machines]], patterns of characters, digits and digit sign</ref> that specify some particular action.<ref name="sco-p251">{{harvnb|Tanenbaum|1990|p= [https://archive.org/details/structuredcomput00tane/page/251 251]}}</ref> An instruction set is described by its [[UNIVAC_1100/2200_series#Instruction_format|instruction format]]. Some ways in which instruction formats may differ:<ref name="sco-p251"/>
Instruction sets differ in various ways. Instructions of a set might all be the same length or different instructions might have different lengths; they might be smaller than, the same size as, or larger than the [[Word (computer architecture)|word]] size of the architecture. The number of instructions may be relatively small or large. Instructions may or may not have to be aligned on particular memory boundaries, such as the architecture's word boundary.<ref name="sco-p251"/>
* all instructions may have the same length or instructions may have different lengths;
* the number of instructions may be small or large;
* instructions may or may not align with the architecture's [[Word (computer architecture)|word length]].


A processor's instruction set needs to execute the circuits of a computer's [[Logic level|digital logic level]]. At the digital level, the program needs to control the computer's registers, bus, memory, ALU, and other hardware components.<ref name="sco-p162">{{harvnb|Tanenbaum|1990|p= [https://archive.org/details/structuredcomput00tane/page/162 162]}}</ref> To control a computer's [[Computer architecture|architectural]] features, machine instructions are created. Examples of features that are controlled using machine instructions:
An instruction set needs to execute the circuits of a computer's [[Logic level|digital logic level]]. At the digital level, the program needs to control the computer's registers, bus, memory, ALU, and other hardware components.<ref name="sco-p162">{{harvnb|Tanenbaum|1990|p= [https://archive.org/details/structuredcomput00tane/page/162 162]}}</ref> To control a computer's [[Computer architecture|architectural]] features, machine instructions are created. Examples of features that are controlled using machine instructions:
* [[memory segmentation|segment registers]]<ref name="sco-p231">{{harvnb|Tanenbaum|1990|p= [https://archive.org/details/structuredcomput00tane/page/231 231]}}</ref>
* [[memory segmentation|segment registers]]<ref name="sco-p231">{{harvnb|Tanenbaum|1990|p= [https://archive.org/details/structuredcomput00tane/page/231 231]}}</ref>
* [[protected mode|protected address mode]]<ref name="sco-p237">{{harvnb|Tanenbaum|1990|p= [https://archive.org/details/structuredcomput00tane/page/237 237]}}</ref>
* [[protected mode|protected address mode]]<ref name="sco-p237">{{harvnb|Tanenbaum|1990|p= [https://archive.org/details/structuredcomput00tane/page/237 237]}}</ref>
Line 51: Line 49:
* Input/output
* Input/output


==Assembly languages==
==={{anchor |Overlapping instructions}}Overlapping instruction===
{{main|Assembly language}}
On processor architectures with [[variable-length instruction set]]s<ref name="Jacob-Jakubowski-Venkatesan_2007"/> (such as [[Intel]]'s [[x86]] processor family) it is, within the limits of the control-flow [[self-synchronizing code|resynchronizing]] phenomenon known as the [[Kruskal count]],<ref name="Lagarias-Rains-Vanderbei_2001"/><ref name="Jacob-Jakubowski-Venkatesan_2007"/><ref name="Andriesse-Bos_2014"/><ref name="Jakubowski_2016"/><ref name="Jämthagen_2016"/> sometimes possible through opcode-level programming to deliberately arrange the resulting code so that two code paths share a common fragment of opcode sequences.<ref group="nb" name="NB_Merging_or_branching"/> These are called ''overlapping instructions'', ''overlapping opcodes'', ''overlapping code'', ''overlapped code'', ''instruction scission'', or ''jump into the middle of an instruction''.<ref name="HN_2021"/><ref name="Kinder_2010"/><ref name="RE_2013"/>
[[File:Machine language and assembly language.jpg|thumb|319x319px|Translation of assembly language into machine language]]
 
A much more human-friendly rendition of machine language, named [[assembly language]], uses [[Assembly language#Opcode mnemonics and extended mnemonics|mnemonic code]]s to refer to machine code instructions, rather than using the instructions' numeric values directly, and uses [[Symbol table|symbolic names]] to refer to storage locations and sometimes [[Processor register|registers]].<ref name="Dourish_2004"/> For example, on the [[Zilog Z80]] processor, the machine code <code>00000101</code>, which causes the CPU to decrement the <code>B</code> [[general-purpose register]], would be represented in assembly language as <code>DEC B</code>.<ref name="Zaks_1982"/>
In the 1970s and 1980s, overlapping instructions were sometimes used to preserve memory space. One example was in the implementation of error tables in [[Microsoft]]'s [[Altair BASIC]], where ''interleaved instructions'' mutually shared their instruction bytes.<ref name="Gates"/><ref name="Jacob-Jakubowski-Venkatesan_2007"/><ref name="HN_2021"/> The technique is rarely used today, but may still be needed where extreme byte-level size optimization is required, such as in [[boot loader]]s, which have to fit into [[boot sector]]s.<ref group="nb" name="NB_DR-DOS_707"/>
 
It is also sometimes used as a [[code obfuscation]] technique as a measure against [[disassembly]] and tampering.<ref name="Jacob-Jakubowski-Venkatesan_2007"/><ref name="Jakubowski_2016"/>
 
The principle is also used in shared code sequences of [[fat binaries]] which must run on multiple instruction-set-incompatible processor platforms.<ref group="nb" name="NB_Merging_or_branching"/>
 
This property is also used to find [[unintended instruction]]s called [[gadget (machine instruction sequence)|gadget]]s in existing code repositories and is used in [[return-oriented programming]] as an alternative to [[code injection]] for exploits such as [[return-to-libc attack]]s.<ref name="Shacham_2007"/><ref name="Jacob-Jakubowski-Venkatesan_2007"/>
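A minimal sketch of how one byte sequence can hold overlapping instructions, assuming x86 code with a 32-bit operand size. The decoder below is hypothetical and knows only two encodings: 0x90 (<code>NOP</code>, one byte) and 0xB8 (<code>MOV EAX, imm32</code>, one opcode byte followed by a four-byte little-endian immediate). Any other byte is printed as raw data.

```python
def decode(code, start):
    """Linearly decode `code` from byte offset `start` (two opcodes only)."""
    out = []
    pc = start
    while pc < len(code):
        op = code[pc]
        if op == 0x90:                       # NOP: 1 byte
            out.append("nop")
            pc += 1
        elif op == 0xB8 and pc + 5 <= len(code):
            # MOV EAX, imm32: opcode + 4-byte little-endian immediate
            imm = int.from_bytes(code[pc + 1:pc + 5], "little")
            out.append(f"mov eax, {imm:#010x}")
            pc += 5
        else:                                # not modeled: emit as data
            out.append(f"db {op:#04x}")
            pc += 1
    return out


CODE = bytes([0xB8, 0x90, 0x90, 0x90, 0x90])

print(decode(CODE, 0))  # entering at the first byte: one MOV
print(decode(CODE, 1))  # entering one byte later: four NOPs
```

Entering the same five bytes at offset 0 or at offset 1 yields two different instruction streams, which is the effect that obfuscation and gadget searches rely on.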
 
===Microcode===
In some computers, the machine code of the [[computer architecture |architecture]] is implemented by an even more fundamental underlying layer called [[microcode]], providing a common machine language interface across a line or family of different models of computer with widely different underlying [[dataflow]]s. This is done to facilitate [[porting]] of machine language programs between different models.<ref>{{Cite book |last1=Kent |first1=Allen |url=https://books.google.com/books?id=EjWV8J8CQEYC&pg=PA33 |title=Encyclopedia of Computer Science and Technology: Volume 28 - Supplement 13: AerosPate Applications of Artificial Intelligence to Tree Structures |last2=Williams |first2=James G. |date=1993-04-05 |publisher=CRC Press |isbn=978-0-8247-2281-4 |pages=33–34 |language=en}}</ref> An example of this use is the IBM [[System/360]] family of computers and their successors.<ref>{{Cite journal |last=Tucker |first=S. G. |date=31 December 1967 |title=Microprogram control for SYSTEM/360 |url=https://ieeexplore.ieee.org/document/5388391 |journal=IBM Systems Journal |volume=6 |issue=4 |pages=222–241 |doi=10.1147/sj.64.0222 |issn=0018-8670 |via=IEEE Xplore|url-access=subscription }}</ref>


==Examples==
==Examples==
Line 82: Line 89:


===MIPS===
===MIPS===
The [[MIPS architecture]] provides a specific example for a machine code whose instructions are always 32 bits long.<ref name="Harris_2007"/>{{Rp|299}} The general type of instruction is given by the ''op'' (operation) field, the highest 6 bits. J-type (jump) and I-type (immediate) instructions are fully specified by ''op''. R-type (register) instructions include an additional field ''funct'' to determine the exact operation. The fields used in these types are:
The [[MIPS architecture]] provides a specific example for a machine code whose instructions are always 32 bits long.<ref name="Harris_2007"/>{{Rp|299}} The general type of instruction is given by the ''op'' (operation) field, the highest 6 bits. J-type (jump) and I-type (immediate) instructions are fully specified by ''op''. R-type (register) instructions include an additional ''funct'' (function) field to determine the exact operation. The fields used in these types are:


     6      5    5    5    5      6 bits
     6      5    5    5    5      6 bits
Line 109: Line 116:
   000010 00000 00000 00000 10000 000000  binary
   000010 00000 00000 00000 10000 000000  binary
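The fixed 32-bit layout can be reproduced with shifts and masks. The sketch below packs the 6/5/5/5/5/6-bit fields and prints them in the grouping used above. The helper names are invented for the example. The jump word is read here as ''op'' = 2 with an address field of 1024, and the R-type values (''funct'' = 32 for <code>add</code>, registers 1, 2 and 6) are an assumption taken from the standard MIPS encoding rather than from this section.

```python
def encode_r(rs, rt, rd, shamt, funct, op=0):
    """Pack an R-type instruction: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_j(op, target):
    """Pack a J-type instruction: op(6) target(26)."""
    return (op << 26) | (target & 0x3FFFFFF)


def fields(word):
    """Render a 32-bit word as bit groups of 6 5 5 5 5 6."""
    bits = f"{word:032b}"
    cuts = [0, 6, 11, 16, 21, 26, 32]
    return " ".join(bits[a:b] for a, b in zip(cuts, cuts[1:]))


print(fields(encode_j(op=2, target=1024)))                     # jump
print(fields(encode_r(rs=1, rt=2, rd=6, shamt=0, funct=32)))   # add
```

The first line printed matches the binary pattern shown above; the second shows how ''funct'' selects the operation when ''op'' is zero.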


==Overlapping instructions==
==Bytecode==
<!-- This section header is used in incoming redirects -->
Machine code is similar to, yet fundamentally different from, [[bytecode]]. Like machine code, bytecode is typically generated (e.g., by a compiler) from source code. But, unlike machine code, bytecode is not directly executable by a CPU. An exception is a processor designed to use bytecode as its machine code, such as the [[Java processor]]. If bytecode is processed by a software interpreter, then that interpreter is a [[virtual machine]] for which the bytecode is its machine code.
On processor architectures with [[variable-length instruction set]]s<ref name="Jacob-Jakubowski-Venkatesan_2007"/> (such as [[Intel]]'s [[x86]] processor family) it is, within the limits of the control-flow [[self-synchronizing code|resynchronizing]] phenomenon known as the [[Kruskal count]],<ref name="Lagarias-Rains-Vanderbei_2001"/><ref name="Jacob-Jakubowski-Venkatesan_2007"/><ref name="Andriesse-Bos_2014"/><ref name="Jakubowski_2016"/><ref name="Jämthagen_2016"/> sometimes possible through opcode-level programming to deliberately arrange the resulting code so that two code paths share a common fragment of opcode sequences.<ref group="nb" name="NB_Merging_or_branching"/> These are called ''overlapping instructions'', ''overlapping opcodes'', ''overlapping code'', ''overlapped code'', ''instruction scission'', or ''jump into the middle of an instruction''.<ref name="HN_2021"/><ref name="Kinder_2010"/><ref name="RE_2013"/>
 
In the 1970s and 1980s, overlapping instructions were sometimes used to preserve memory space. One example were in the implementation of error tables in [[Microsoft]]'s [[Altair BASIC]], where ''interleaved instructions'' mutually shared their instruction bytes.<ref name="Gates"/><ref name="Jacob-Jakubowski-Venkatesan_2007"/><ref name="HN_2021"/> The technique is rarely used today, but might still be necessary to resort to in areas where extreme optimization for size is necessary on byte-level such as in the implementation of [[boot loader]]s which have to fit into [[boot sector]]s.<ref group="nb" name="NB_DR-DOS_707"/>
 
It is also sometimes used as a [[code obfuscation]] technique as a measure against [[disassembly]] and tampering.<ref name="Jacob-Jakubowski-Venkatesan_2007"/><ref name="Jakubowski_2016"/>
 
The principle is also used in shared code sequences of [[fat binaries]] which must run on multiple instruction-set-incompatible processor platforms.<ref group="nb" name="NB_Merging_or_branching"/>
 
This property is also used to find [[unintended instruction]]s called [[gadget (machine instruction sequence)|gadget]]s in existing code repositories and is used in [[return-oriented programming]] as alternative to [[code injection]] for exploits such as [[return-to-libc attack]]s.<ref name="Shacham_2007"/><ref name="Jacob-Jakubowski-Venkatesan_2007"/>
 
==Relationship to microcode==
{{Unreferenced section|date=September 2024}}
In some computers, the machine code of the [[computer architecture|architecture]] is implemented by an even more fundamental underlying layer called [[microcode]], providing a common machine language interface across a line or family of different models of computer with widely different underlying [[dataflow]]s. This is done to facilitate [[porting]] of machine language programs between different models. An example of this use is the IBM [[System/360]] family of computers and their successors.
 
==Relationship to bytecode==
Machine code is generally different from [[bytecode]] (also known as p-code), which is either executed by an interpreter or itself compiled into machine code for faster (direct) execution. An exception is when a processor is designed to use a particular bytecode directly as its machine code, such as is the case with [[Java processor]]s.
 
Machine code and assembly code are sometimes called ''[[native (computing)|native]] code'' when referring to platform-dependent parts of language features or libraries.<ref name="Managed"/>
 
==Storing in memory==
{{Unreferenced section|date=September 2024}}
From the point of view of the CPU, machine code is stored in RAM, but is typically also kept in a set of caches for performance reasons. There may be different caches for instructions and data, depending on the architecture.
 
The CPU knows what machine code to execute, based on its internal program counter. The program counter points to a memory address and is changed based on special instructions which may cause programmatic branches. The program counter is typically set to a hard coded value when the CPU is first powered on, and will hence execute whatever machine code happens to be at this address.


Similarly, the program counter can be set to execute whatever machine code is at some arbitrary address, even if this is not valid machine code. This will typically trigger an architecture specific protection fault.
==Storage==
During execution, machine code is generally stored in RAM although running from ROM is supported by some devices. Regardless, the code may also be cached in more specialized memory to enhance performance. There may be different caches for instructions and data, depending on the architecture.<ref>{{Cite journal |last1=Su |first1=Chao |last2=Zeng |first2=Qingkai |date=2021 |title=Survey of CPU Cache-Based Side-Channel Attacks: Systematic Analysis, Security Models, and Countermeasures |journal=Security and Communication Networks |language=en |volume=2021 |issue=1 |article-number=5559552 |doi=10.1155/2021/5559552 |doi-access=free |issn=1939-0122}}</ref>


The CPU is oftentimes told, by page permissions in a paging based system, if the current page actually holds machine code by an execute bit — pages have multiple such permission bits (readable, writable, etc.) for various housekeeping functionality. E.g. on [[Unix-like]] systems memory pages can be toggled to be executable with the {{code|mprotect()}} system call, and on Windows, {{code|VirtualProtect()}} can be used to achieve a similar result. If an attempt is made to execute machine code on a non-executable page, an architecture specific fault will typically occur. Treating [[data as machine code]], or finding new ways to use existing machine code, by various techniques, is the basis of some security vulnerabilities.
From the point of view of a [[process (computing)|process]], the machine code lives in ''code space'', a designated part of its [[virtual address space |address space]]. In a [[Thread (computing)|multi-threading]] environment, different threads of one process share code space along with data space, which reduces the overhead of [[context switching]] considerably as compared to process switching.<ref name=":2">{{Cite web |title=CS 537 Notes, Section #3A: Processes and Threads |url=https://pages.cs.wisc.edu/~bart/537/lecturenotes/processes-threads.html |access-date=2025-07-18 |website=pages.cs.wisc.edu |publisher=School of Computer, Data & Information Sciences, University of Wisconsin-Madison}}</ref>


Similarly, in a segment based system, segment descriptors can indicate whether a segment can contain executable code and in what [[protection ring|rings]] that code can run.
==Readability==
Machine code is generally considered to be not human readable,{{Sfn|Samuelson|1984|p=683}} with [[Douglas Hofstadter]] comparing it to examining the atoms of a [[DNA]] molecule.{{Sfn|Hofstadter|1979|p=[https://archive.org/details/godelescherbach00doug/page/290 290]}} However, various tools and methods support understanding machine code.


From the point of view of a [[process (computing)|process]], the ''code space'' is the part of its [[virtual address space|address space]] where the code in execution is stored. In [[computer multitasking|multitasking]] systems this comprises the program's [[code segment]] and usually [[shared libraries]]. In [[Thread (computing)|multi-threading]] environment, different threads of one process share code space along with data space, which reduces the overhead of [[context switching]] considerably as compared to process switching.
[[Disassembly]] decodes machine code to assembly language, which is possible because assembly instructions can often be mapped one-to-one to machine instructions.{{Sfn|Tanenbaum|1990|p=[https://archive.org/details/structuredcomput00tane/page/398 398]}}


==Readability by humans==
A [[decompiler]] converts machine code to a [[High-level programming language |high-level language]], but the result can be relatively [[Obfuscation (software)|obfuscated]] and hard to understand.
Machine code can be seen as a set of electrical pulses that make the instructions readable to the computer; it is not readable by humans,{{Sfn|Samuelson|1984|p=683}} with [[Douglas Hofstadter]] comparing it to examining the atoms of a [[DNA]] molecule.{{Sfn|Hofstadter|1979|p=[https://archive.org/details/godelescherbach00doug/page/290 290]}} However, various tools and methods exist to decode machine code to human-readable [[source code]]. One such method is [[disassembly]], which easily decodes it back to its corresponding assembly language [[source code]] because assembly language forms a one-to-one mapping to machine code.{{Sfn|Tanenbaum|1990|p=[https://archive.org/details/structuredcomput00tane/page/398 398]}}


Machine code may also be decoded to [[High-level programming language|high-level language]] under two conditions. The first condition is to accept an [[Obfuscation (software)|obfuscated]] reading of the source code. An obfuscated version of source code is displayed if the machine code is sent to a [[decompiler]] of the source language. The second condition requires the machine code to have information about the source code encoded within. The information includes a [[symbol table]] that contains [[debug symbol]]s. The symbol table may be stored within the executable, or it may exist in separate files. A [[debugger]] can then read the symbol table to help the programmer interactively [[debugging|debug]] the machine code in [[Instruction cycle|execution]].
A program can be associated with [[debug symbol]]s (either embedded in the [[executable#native executable|native executable]] or in a separate file) that allow it to be mapped to external source code. A [[debugger]] reads the symbols to help a programmer interactively [[debugging |debug]] the program. Examples include:


* The [[SHARE Operating System]] (1959) for the [[IBM 709]], [[IBM 7090]], and [[IBM 7094]] computers allowed for a loadable code format named [[SQUOZE]]. SQUOZE was a compressed binary form of [[assembly language]] code and included a symbol table.
* The [[SHARE Operating System]] (1959) for the [[IBM 709]], [[IBM 7090]], and [[IBM 7094]] computers allowed for a loadable code format named [[SQUOZE]]. SQUOZE was a compressed binary form of [[assembly language]] code and included a symbol table.
* Modern IBM mainframe [[operating system]]s, such as [[z/OS]], have available a symbol table named ''Associated data'' (ADATA). The table is stored in a file that can be produced by the [[IBM High-Level Assembler]] (HLASM),<ref name="IBM_ADA"/><REF name=IBM_ADATA /> IBM's [[COBOL]] compiler,<ref name="IBM_COBOL"/> and IBM's [[PL/I]] compiler,<ref>{{cite web |date=2025-03-17 |title=SYSADATA message information |url=https://www.ibm.com/docs/en/epfz/6.1?topic=guide-sysadata-message-information |url-status=live |work=Enterprise PL/I for z/OS 6.1 information |language=en-US}}</ref> either as a separate SYSADATA file or as ADATA records in a [[Generalized object output file]] (GOFF).<ref name=IBM_GOFF />
* Modern IBM mainframe [[operating system]]s, such as [[z/OS]], have available a symbol table named ''Associated data'' (ADATA). The table is stored in a file that can be produced by the [[IBM High-Level Assembler]] (HLASM),<ref name="IBM_ADA"/><ref name=IBM_ADATA /> IBM's [[COBOL]] compiler,<ref name="IBM_COBOL"/> and IBM's [[PL/I]] compiler,<ref>{{cite web |date=2025-03-17 |title=SYSADATA message information |url=https://www.ibm.com/docs/en/epfz/6.1?topic=guide-sysadata-message-information |work=Enterprise PL/I for z/OS 6.1 information |language=en-US}}</ref> either as a separate SYSADATA file or as ADATA records in a [[Generalized object output file]] (GOFF).<ref name=IBM_GOFF /> This obsoletes the TEST records from [[OS/360]], although it is still possible to request them and to use them in the [[Time Sharing Option|TSO]] TEST command.
* [[Microsoft Windows]] has available a symbol table<ref name="Microsoft_Symbols"/> that is stored in a [[program database]] (<code>.pdb</code>) file.<ref name="Microsoft_PDB"/>
* [[Windows]] uses a symbol table<ref name="Microsoft_Symbols"/> that is stored in a [[program database]] ({{mono|.pdb}}) file.<ref name="Microsoft_PDB"/>
* Most [[Unix-like]] operating systems have available symbol table formats named [[stabs]] and [[DWARF]]. In [[macOS]] and other [[Darwin (operating system)|Darwin]]-based operating systems, the debug symbols are stored in DWARF format in a separate <code>.dSYM</code> file.
* Most [[Unix-like]] operating systems have available symbol table formats named [[stabs]] and [[DWARF]]. In [[macOS]] and other [[Darwin (operating system)|Darwin]]-based operating systems, the debug symbols are stored in DWARF format in a separate {{mono|.dSYM}} file.


==See also==
==See also==
{{Wiktionary|machine code}}
{{Wiktionary|machine code}}
* [[Assembly language]]
* {{Annotated link |Endianness}}
* [[Endianness]]
* {{Annotated link |List of programming languages by type#Machine languages|List of machine languages}}
* [[List of programming languages by type#Machine languages|List of machine languages]]
* {{Annotated link |Machine code monitor}}
* [[Machine code monitor]]
* {{Annotated link |Micro-Professor MPF-I}}
* [[Overhead code]]
* {{Annotated link |Executable#native executable|native executable}}
* [[P-code machine]]
* {{Annotated link |Object code}}
* [[Reduced instruction set computer]] (RISC)
* {{Annotated link |P-code machine}}
* [[Very long instruction word]]
* {{Annotated link |Reduced instruction set computer}} (RISC)
* Teaching Machine Code: [[Micro-Professor MPF-I]]
* {{Annotated link |Very long instruction word}}


==Notes==
==Notes==
Line 173: Line 157:


==References==
==References==
{{reflist|refs=
<references>
<ref name="Andriesse-Bos_2014">{{cite conference |title=Instruction-Level Steganography for Covert Trigger-Based Malware |first1=Dennis |last1=Andriesse |first2=Herbert |last2=Bos |author-link2=:d:Q56565972 |date=2014-07-10<!-- /11 --> |editor-first=Sven |editor-last=Dietrich |conference=11th [[International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment]] (DIMVA) |publisher=[[Springer International Publishing]] |series=[[Lecture Notes in Computer Science]] |publication-place=Egham, UK; Switzerland |location=Vrije Universiteit Amsterdam, Amsterdam, Netherlands |doi=10.1007/978-3-319-08509-8_3 |s2cid=4634611 |id=LNCS 8550 |issn=0302-9743 |eissn=1611-3349 |isbn=978-3-31908508-1 |pages=41–50 [45] |url=https://www.cs.vu.nl/~herbertb/papers/stega_dimva14.pdf |access-date=2023-08-26 |url-status=live |archive-url=https://web.archive.org/web/20230826135254/https://www.cs.vu.nl/~herbertb/papers/stega_dimva14.pdf |archive-date=2023-08-26}} (10 pages)</ref>
<ref name="Andriesse-Bos_2014">{{cite conference |title=Instruction-Level Steganography for Covert Trigger-Based Malware |first1=Dennis |last1=Andriesse |first2=Herbert |last2=Bos |author-link2=:d:Q56565972 |date=2014-07-10<!-- /11 --> |editor-first=Sven |editor-last=Dietrich |conference=11th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA) |publisher=[[Springer International Publishing]] |series=[[Lecture Notes in Computer Science]] |publication-place=Egham, UK; Switzerland |location=Vrije Universiteit Amsterdam, Amsterdam, Netherlands |doi=10.1007/978-3-319-08509-8_3 |s2cid=4634611 |id=LNCS 8550 |issn=0302-9743 |eissn=1611-3349 |isbn=978-3-31908508-1 |pages=41–50 [45] |url=https://www.cs.vu.nl/~herbertb/papers/stega_dimva14.pdf |access-date=2023-08-26 |url-status=live |archive-url=https://web.archive.org/web/20230826135254/https://www.cs.vu.nl/~herbertb/papers/stega_dimva14.pdf |archive-date=2023-08-26}} (10 pages)</ref>
<ref name="Dourish_2004">{{cite book |url=https://books.google.com/books?id=DCIy2zxrCqcC&pg=PA7 |title=Where the Action is: The Foundations of Embodied Interaction |last=Dourish |first=Paul |author-link=Paul Dourish |publisher=[[MIT Press]] |date=2004 |access-date=2023-03-05 |page=7 |isbn=0-262-54178-5}}</ref>
<ref name="Dourish_2004">{{cite book |url=https://books.google.com/books?id=DCIy2zxrCqcC&pg=PA7 |title=Where the Action is: The Foundations of Embodied Interaction |last=Dourish |first=Paul |author-link=Paul Dourish |publisher=[[MIT Press]] |date=2004 |access-date=2023-03-05 |page=7 |isbn=0-262-54178-5}}</ref>
<ref name="Gates">{{citation |title=Personal communication |first=William "Bill" Henry |last=Gates |author-link=William Henry Gates III |date=}} (NB. According to {{citeref|Jacob|Jakubowski|Venkatesan|2007|Jacob et al|style=plain}}.)</ref>
<ref name="HN_2021">{{cite web |title=Unintended Instructions on x86 |date=2021 |work=Hacker News |url=https://news.ycombinator.com/item?id=27113890 |access-date=2021-12-24 |url-status=live |archive-url=https://web.archive.org/web/20211225000914/https://news.ycombinator.com/item?id=27113890 |archive-date=2021-12-25}}</ref>
<ref name="IBM_ADA">{{cite web |url=https://www.ibm.com/docs/en/hla-and-tf/1.6?topic=information-associated-data-architecture |title=Associated Data Architecture |work=High Level Assembler and Toolkit Feature}}</ref>
<ref name="IBM_ADATA">{{cite book | title = High Level Assembler for z/OS & z/VM & z/VSE - 1.6 -HLASM Programmer's Guide | id = SC26-4941-07 | date = October 2022 | edition = Eighth | section = Associated data file output | section-url = https://www.ibm.com/docs/en/SSENW6_1.6.0/pdf/asmp1024_pdf.pdf#page=304 | pages = 278–332 | url = https://www.ibm.com/docs/en/SSENW6_1.6.0/pdf/asmp1024_pdf.pdf | publisher = [[IBM]] | access-date = February 14, 2025}}</ref>
 
<ref name="IBM_COBOL">{{cite web |url=https://www.ibm.com/docs/en/cobol-zos/6.2?topic=appendixes-cobol-sysadata-file-contents |title=COBOL SYSADATA file contents |work=Enterprise COBOL for z/OS}}</ref>
<ref name="IBM_GOFF">{{cite book | title = z/OS - 3.1 - MVS Program Management: Advanced Facilities | id = SA23-1392-60 | date = December 18, 2024 | edition = | section = Appendix C. Generalized object file format (GOFF) | section-url = https://www.ibm.com/docs/en/SSLTBW_3.1.0/pdf/ieab200_v3r1.pdf#page=203 | pages = 201–240 | url = https://www.ibm.com/docs/en/SSLTBW_3.1.0/pdf/ieab200_v3r1.pdf | publisher = [[IBM]] | access-date = February 14, 2025}}</ref>
<ref name="Jacob-Jakubowski-Venkatesan_2007">{{cite conference |title=Towards Integral Binary Execution: Implementing Oblivious Hashing Using Overlapped Instruction Encodings |first1=Matthias |last1=Jacob |first2=Mariusz H. |last2=Jakubowski |first3=Ramarathnam |last3=Venkatesan |author-link3=:d:Q102402462 |conference=Proceedings of the 9th workshop on Multimedia & Security (MM&Sec '07) |location=Dallas, Texas, US |date=20–21 September 2007 |publisher=[[Association for Computing Machinery]] |isbn=978-1-59593-857-2 |citeseerx=10.1.1.69.5258 |doi=10.1145/1288869.1288887 |s2cid=14174680 |pages=129–140 |url=https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/jacob07overlap.pdf |access-date=2021-12-25 |url-status=live |archive-url=https://web.archive.org/web/20180904062911/https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/jacob07overlap.pdf |archive-date=2018-09-04}} (12 pages)</ref>
<ref name="Jakubowski_2016">{{cite web |title=Graph Based Model for Software Tamper Protection |first=Mariusz H. |last=Jakubowski |date=February 2016 |publisher=[[Microsoft]] |url=https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/mariuszj-jacob07overlap.ppt |access-date=2023-08-19 |url-status=live |archive-url=https://web.archive.org/web/20191031000757/https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/mariuszj-jacob07overlap.ppt |archive-date=2019-10-31}}<!-- ACM Multimedia and Security '07. Dallas, TX (USA) Apply overlapped code towards obfuscation and tamper-resistance via OH. Disassembly tends to resynchronize naturally – but we can prevent this. ... Kruskal count: Such disassembly synchronizes in about B2/16 step --></ref>
<ref name="Jämthagen_2016">{{cite book |title=On Offensive and Defensive Methods in Software Security |last=Jämthagen |first=Christopher |date=November 2016 |issn=1654-790X |number=89 |isbn=978-91-7623-942-1 |type=Thesis |publisher=Department of Electrical and Information Technology, [[Lund University]] |publication-place=Lund, Sweden |page=96 |url=https://lucris.lub.lu.se/ws/portalfiles/portal/15764406/dissertation.pdf |access-date=2023-08-26 |url-status=live |archive-url=https://web.archive.org/web/20230826135321/https://lucris.lub.lu.se/ws/portalfiles/portal/15764406/dissertation.pdf |archive-date=2023-08-26}} (1+xvii+1+152 pages)</ref>
<ref name="Kinder_2010">{{cite book |title=Static Analysis of x86 Executables |trans-title=Statische Analyse von Programmen in x86 Maschinensprache |type=Dissertation |first=Johannes |last=Kinder |location=Munich, Germany |date=2010-09-24 |publisher=[[Technische Universität Darmstadt]] |id=D17 |url=http://infoscience.epfl.ch/record/167546/files/thesis.pdf |access-date=2021-12-25 |url-status=live |archive-url=https://web.archive.org/web/20201112013336/https://os.zhdk.cloud.switch.ch/tind-tmp-epfl/d6128d9d-0768-42e2-9576-1529206df956?response-content-disposition=attachment%3B%20filename%2A%3DUTF-8%27%27thesis.pdf&response-content-type=application%2Fpdf&AWSAccessKeyId=ded3589a13b4450889b2f728d54861a6&Expires=1605231216&Signature=%2FqOKvUdS%2FETy6xHfdFh5q4UJ82k%3D |archive-date=2020-11-12}} (199 pages)</ref>
<ref name="Lagarias-Rains-Vanderbei_2001">{{cite book |title=The Mathematics of Preference, Choice and Order |first1=Jeffrey "Jeff" Clark |last1=Lagarias |author-link1=Jeffrey Clark Lagarias |first2=Eric Michael |last2=Rains |author-link2=Eric Michael Rains |first3=Robert J. |last3=Vanderbei |chapter=The Kruskal Count |author-link3=Robert J. Vanderbei |date=2009 |orig-date=2001-10-13 |arxiv=math/0110143 |series=Studies in Choice and Welfare |editor-first1=Stephen |editor-last1=Brams |editor-first2=William V. |editor-last2=Gehrlein |editor-first3=Fred S. |editor-last3=Roberts |publisher=[[Springer-Verlag]] |publication-place=Berlin / Heidelberg, Germany |isbn=978-3-540-79127-0 |pages=371–391|doi=10.1007/978-3-540-79128-7_23}} (22 pages)</ref>
<ref name="Managed">{{cite web |last1=Gregory |first1=Kate |title=Managed, Unmanaged, Native: What Kind of Code Is This? |url=https://www.developer.com/net/cplus/article.php/2197621/Managed-Unmanaged-Native-What-Kind-of-Code-Is-This.htm |website=Developer.com |access-date=2008-09-02|archive-url=https://web.archive.org/web/20090923210338/https://www.developer.com/net/cplus/article.php/2197621/Managed-Unmanaged-Native-What-Kind-of-Code-Is-This.htm |archive-date=2009-09-23|date=2003-04-28}}</ref>
<ref name="Microsoft_PDB">{{cite web |url=https://learn.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/querying-the-dot-pdb-file?view=vs-2022 |title=Querying the .Pdb File |website=Microsoft Learn|date=12 January 2024}}</ref>
<ref name="Microsoft_Symbols">{{cite web |url=https://learn.microsoft.com/en-us/windows-hardware/drivers/debugger/symbols |title=Symbols for Windows debugging |website=Microsoft Learn|date=20 December 2022}}</ref>
<ref name="RE_2013">{{cite web |title=What is "overlapping instructions" obfuscation? |date=2013-04-07 |work=Reverse Engineering Stack Exchange |url=https://reverseengineering.stackexchange.com/questions/1531/what-is-overlapping-instructions-obfuscation |access-date=2021-12-25 |url-status=live |archive-url=https://web.archive.org/web/20211225002323/https://reverseengineering.stackexchange.com/questions/1531/what-is-overlapping-instructions-obfuscation |archive-date=2021-12-25}}</ref>
<ref name="Shacham_2007">{{cite conference |title=The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86) |first=Hovav |last=Shacham |conference=Proceedings of the ACM, CCS 2007 |publisher=[[ACM Press]] |date=2007 |url=https://hovav.net/ucsd/dist/geometry.pdf |access-date=2021-12-24 |url-status=live |archive-url=https://web.archive.org/web/20211215203157/https://hovav.net/ucsd/dist/geometry.pdf |archive-date=2021-12-15}}</ref>
<ref name="Stallings_2015">{{cite book |last=Stallings |first=William |title=Computer Organization and Architecture 10th edition |date=2015 |page=776 |publisher=Pearson Prentice Hall |isbn=978-93-325-7040-5}}</ref>
<ref name="Zaks_1982">{{cite book |url=https://archive.org/details/ProgrammingTheZ80 |title=Programming the Z80 |edition=Third Revised |last=Zaks |first=Rodnay |author-link=Rodnay Zaks |publisher=[[Sybex]] |date=1982 |access-date=2023-03-05 |pages=67, 120, 609 |isbn=0-89588-094-6}}</ref>
</references>
}}


==Sources==

Latest revision as of 23:36, 30 December 2025


File:W65C816S Machine Code Monitor.jpeg
Machine language monitor running on a W65C816S microprocessor, displaying code disassembly and dumps of processor register and memory


In computing, machine code is data encoded and structured to control a computer's central processing unit (CPU) via its programmable interface. A computer program consists primarily of sequences of machine-code instructions.[1] Machine code is classified as native with respect to its host CPU since it is the language that the CPU interprets directly.[2] A software interpreter is a virtual machine that processes virtual machine code.

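A machine-code instruction causes the CPU to perform a specific task, such as loading a value from memory into a register, performing an arithmetic or logic operation, or jumping to another instruction.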

An instruction set architecture (ISA) defines the interface to a CPU and varies by CPU design family, such as x86 and ARM. Generally, machine code compatible with one family is not compatible with others, but there are exceptions: the VAX architecture includes optional support for the PDP-11 instruction set, the IA-64 architecture includes optional support for the IA-32 instruction set, and the PowerPC 615 can natively process both PowerPC and x86 instructions.

Assembly language

File:Machine language and assembly language.jpg
Translation of assembly into machine code

Assembly language provides a relatively direct mapping from human-readable source code to machine code. Assembly language source code represents the numerical codes of machine code as mnemonics and labels.[3] For example, NOP in assembly for an x86 processor represents the x86 architecture opcode 0x90 in machine code. While it is possible to write a program in machine code, doing so is tedious and error-prone. Therefore, programs are usually written in assembly or, more commonly, in a high-level programming language.
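The mnemonic-to-opcode mapping can be illustrated with a minimal sketch of a toy assembler. It is not a real assembler: the table covers only three single-byte x86 opcodes, and the names OPCODES and assemble are hypothetical, chosen for this illustration.

```python
# Toy assembler: translates a few single-byte x86 mnemonics to their opcodes.
# Only instructions without operands are covered; this is an illustration,
# not a usable assembler.
OPCODES = {
    "NOP": 0x90,  # no operation
    "RET": 0xC3,  # near return
    "HLT": 0xF4,  # halt
}

def assemble(source: str) -> bytes:
    """Translate one mnemonic per line into machine-code bytes."""
    out = bytearray()
    for line in source.splitlines():
        mnemonic = line.split(";")[0].strip().upper()  # drop comments
        if not mnemonic:
            continue  # skip blank lines
        out.append(OPCODES[mnemonic])
    return bytes(out)

print(assemble("nop\nnop\nret").hex(" "))  # 90 90 c3
```

Real assemblers also resolve labels and encode operands, which this sketch omits.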

Instruction set

A machine instruction encodes an operation as a pattern of bits based on the specified format for the machine's instruction set.[nb 1][4]

Instruction sets differ in various ways. Instructions of a set might all be the same length or different instructions might have different lengths; they might be smaller than, the same size as, or larger than the word size of the architecture. The number of instructions may be relatively small or large. Instructions may or may not have to be aligned on particular memory boundaries, such as the architecture's word boundary.[4]

An instruction set must drive the circuits at a computer's digital logic level. At that level, a program needs to control the computer's registers, bus, memory, ALU, and other hardware components.[5] Machine instructions are created to control a computer's architectural features. Examples of features that are controlled using machine instructions:

The criteria for instruction formats include:

  • Instructions most commonly used should be shorter than instructions rarely used.[4]
  • The memory transfer rate of the underlying hardware determines the flexibility of the memory fetch instructions.
  • The number of bits in the address field requires special consideration.[9]

Determining the size of the address field is a choice between space and speed.[9] On some computers, the number of bits in the address field may be too small to access all of the physical memory. Also, virtual address space needs to be considered. Another constraint may be a limitation on the size of registers used to construct the address. Whereas a shorter address field allows the instructions to execute more quickly, other physical properties need to be considered when designing the instruction format.
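The space side of this trade-off is simple arithmetic: an address field of n bits can name 2^n distinct locations. The sketch below shows this for a few widths; 15 and 26 are the widths of the IBM 709x Y field and the MIPS J-type target field described in the examples of this article, and the function name is hypothetical.

```python
def addressable_locations(address_bits: int) -> int:
    """Number of distinct locations an address field of this width can name."""
    return 2 ** address_bits

for bits in (15, 16, 26):
    print(bits, addressable_locations(bits))
# 15 32768
# 16 65536
# 26 67108864
```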

Instructions can be separated into two types: general-purpose and special-purpose. Special-purpose instructions exploit architectural features that are unique to a computer. General-purpose instructions control architectural features common to all computers.[10]

General-purpose instructions control:

  • Data movement from one place to another
  • Monadic operations that have one operand to produce a result
  • Dyadic operations that have two operands to produce a result
  • Comparisons and conditional jumps
  • Procedure calls
  • Loop control
  • Input/output

Overlapping instruction

On processor architectures with variable-length instruction sets[11] (such as Intel's x86 processor family), it is sometimes possible, within the limits of the control-flow resynchronizing phenomenon known as the Kruskal count,[12][11][13][14][15] to deliberately arrange code through opcode-level programming so that two code paths share a common fragment of opcode sequences.[nb 2] These are called overlapping instructions, overlapping opcodes, overlapping code, overlapped code, instruction scission, or jump into the middle of an instruction.[16][17][18]
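A minimal sketch of how the same bytes can decode differently depending on the entry point is given below. It is a toy decoder that recognizes only four 32-bit x86 encodings and is not a real disassembler; the byte sequence and the names CODE and decode are constructed for this illustration.

```python
# Toy linear-sweep decoder for four x86 encodings (32-bit mode):
#   B8 imm32 -> mov eax, imm32   (5 bytes)
#   03 C1    -> add eax, ecx     (2 bytes)
#   90       -> nop              (1 byte)
#   C3       -> ret              (1 byte)
# Any other byte is shown as raw data.
CODE = bytes.fromhex("b8 03 c1 90 c3 c3")

def decode(code: bytes, start: int) -> list[str]:
    out = []
    pc = start
    while pc < len(code):
        op = code[pc]
        if op == 0xB8 and pc + 5 <= len(code):
            imm = int.from_bytes(code[pc + 1:pc + 5], "little")
            out.append(f"{pc}: mov eax, {imm:#010x}")
            pc += 5
        elif op == 0x03 and code[pc + 1:pc + 2] == b"\xc1":
            out.append(f"{pc}: add eax, ecx")
            pc += 2
        elif op == 0x90:
            out.append(f"{pc}: nop")
            pc += 1
        elif op == 0xC3:
            out.append(f"{pc}: ret")
            pc += 1
        else:
            out.append(f"{pc}: db {op:#04x}")
            pc += 1
    return out

print(decode(CODE, 0))  # ['0: mov eax, 0xc390c103', '5: ret']
print(decode(CODE, 1))  # ['1: add eax, ecx', '3: nop', '4: ret', '5: ret']
```

Entering at offset 0 treats bytes 1 to 4 as the immediate operand of a single instruction, whereas entering at offset 1 decodes the same bytes as three instructions. The sweep continues past the first ret only to show that both decodings meet again at offset 5; a CPU would return at the first ret it executes.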

In the 1970s and 1980s, overlapping instructions were sometimes used to preserve memory space. One example was in the implementation of error tables in Microsoft's Altair BASIC, where interleaved instructions mutually shared their instruction bytes.[19][11][16] The technique is rarely used today, but may still be needed where extreme byte-level size optimization is required, such as in boot loaders that have to fit into boot sectors.[nb 3]

It is also sometimes used as a code obfuscation technique as a measure against disassembly and tampering.[11][14]

The principle is also used in shared code sequences of fat binaries which must run on multiple instruction-set-incompatible processor platforms.[nb 2]

This property is also used to find unintended instructions, called gadgets, in existing code repositories; such gadgets are used in return-oriented programming as an alternative to code injection for exploits such as return-to-libc attacks.[20][11]

Microcode

In some computers, the machine code of the architecture is implemented by an even more fundamental underlying layer called microcode, providing a common machine language interface across a line or family of different models of computer with widely different underlying dataflows. This is done to facilitate porting of machine language programs between different models.[21] An example of this use is the IBM System/360 family of computers and their successors.[22]

Examples

IBM 709x

The IBM 704, 709, 704x and 709x store one instruction in each instruction word; IBM numbers the bits from the left as S, 1, ..., 35. Most instructions have one of two formats:

Generic
  S,1-11   Opcode
  12-13    Flag, ignored in some instructions
  14-17    Unused
  18-20    Tag
  21-35    Y

Index register control, other than TSX
  S,1-2    Opcode
  3-17     Decrement
  18-20    Tag
  21-35    Y

For all but the IBM 7094 and 7094 II, there are three index registers designated A, B and C; indexing with multiple 1 bits in the tag subtracts the logical OR of the selected index registers, and loading with multiple 1 bits in the tag loads all of the selected index registers. The 7094 and 7094 II have seven index registers, but when powered on they are in multiple tag mode, in which they use only three of the index registers in a fashion compatible with earlier machines; they require a Leave Multiple Tag Mode (LMTM) instruction in order to access the other four index registers.

The effective address is normally Y-C(T), where C(T) is 0 for a tag of 0, the logical OR of the selected index registers in multiple tag mode, or the selected index register if not in multiple tag mode. However, the effective address for index register control instructions is just Y.
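The address calculation can be sketched as follows. This is a simplified model rather than a description of the hardware: it assumes that tag bits 1, 2 and 4 select index registers A, B and C in multiple tag mode, that the tag value selects a register number directly outside that mode, and that the subtraction wraps around within the 15-bit Y field. The function and parameter names are hypothetical.

```python
ADDRESS_MASK = 0o77777  # Y is a 15-bit field (bits 21-35)

def effective_address(y, tag, index_regs, multiple_tag_mode=True,
                      index_control=False):
    """Return Y - C(T), or just Y for index register control instructions.

    index_regs maps a register number to its contents; in multiple tag mode
    only registers 1, 2 and 4 (A, B and C) are used (assumed numbering).
    """
    if index_control or tag == 0:
        return y & ADDRESS_MASK
    if multiple_tag_mode:
        c = 0
        for bit in (1, 2, 4):          # each set tag bit selects a register
            if tag & bit:
                c |= index_regs[bit]   # logical OR of the selected registers
    else:
        c = index_regs[tag]            # tag names a single register
    return (y - c) & ADDRESS_MASK      # assumed 15-bit wraparound

regs = {1: 0o10, 2: 0o3, 4: 0o4}
print(oct(effective_address(0o100, 1, regs)))  # 0o70
print(oct(effective_address(0o100, 3, regs)))  # 0o65
```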

A flag with both bits 1 selects indirect addressing; the indirect address word has both a tag and a Y field.

In addition to transfer (branch) instructions, these machines have skip instructions that conditionally skip one or two words; e.g., Compare Accumulator with Storage (CAS) does a three-way compare and conditionally skips to NSI, NSI+1 or NSI+2, depending on the result.

MIPS

The MIPS architecture provides a specific example of a machine code whose instructions are always 32 bits long.[23] The general type of instruction is given by the op (operation) field, the highest 6 bits. J-type (jump) and I-type (immediate) instructions are fully specified by op. R-type (register) instructions include an additional funct (function) field to determine the exact operation. The fields used in these types are:

   6      5     5     5     5      6 bits
[  op  |  rs |  rt |  rd |shamt| funct]  R-type
[  op  |  rs |  rt | address/immediate]  I-type
[  op  |        target address        ]  J-type

rs, rt, and rd indicate register operands; shamt gives a shift amount; and the address or immediate fields contain an operand directly.[23]

For example, adding the registers 1 and 2 and placing the result in register 6 is encoded:[23]

[  op  |  rs |  rt |  rd |shamt| funct]
    0     1     2     6     0     32     decimal
 000000 00001 00010 00110 00000 100000   binary

Load a value into register 8, taken from the memory cell 68 cells after the location listed in register 3:[23]

[  op  |  rs |  rt | address/immediate]
   35     3     8           68           decimal
 100011 00011 01000 00000 00001 000100   binary

Jumping to the address 1024:[23]

[  op  |        target address        ]
    2                 1024               decimal
 000010 00000 00000 00000 10000 000000   binary
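The three encodings above can be reproduced by shifting each field into its bit position. The following is a minimal sketch; the function names are hypothetical and no range checking of the fields is done.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """Pack an R-type word: 6 + 5 + 5 + 5 + 5 + 6 bits."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op, rs, rt, imm):
    """Pack an I-type word: 6 + 5 + 5 + 16 bits."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_j(op, target):
    """Pack a J-type word: 6 + 26 bits."""
    return (op << 26) | (target & 0x3FFFFFF)

def fields(word):
    """Format a 32-bit word in the 6-5-5-5-5-6 grouping used above."""
    b = f"{word:032b}"
    return " ".join((b[0:6], b[6:11], b[11:16], b[16:21], b[21:26], b[26:32]))

print(fields(encode_r(0, 1, 2, 6, 0, 32)))  # 000000 00001 00010 00110 00000 100000
print(fields(encode_i(35, 3, 8, 68)))       # 100011 00011 01000 00000 00001 000100
print(fields(encode_j(2, 1024)))            # 000010 00000 00000 00000 10000 000000
```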

Bytecode

Machine code is similar to, yet fundamentally different from, bytecode. Like machine code, bytecode is typically generated (e.g., by a compiler) from source code. Unlike machine code, however, bytecode is not directly executable by a CPU, unless the processor is designed to use bytecode as its machine code, as a Java processor is. If bytecode is processed by a software interpreter, then that interpreter is a virtual machine whose machine code is the bytecode.
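The relationship between an interpreter and its bytecode can be sketched with a toy stack-based virtual machine. The instruction set below (PUSH, ADD, MUL, HALT) is invented for this illustration and does not correspond to any real bytecode format.

```python
# Hypothetical bytecode: one byte per opcode; PUSH takes a one-byte operand.
PUSH, ADD, MUL, HALT = range(4)

def run(bytecode: bytes) -> int:
    """Fetch, decode and execute bytecode until HALT; return the top of stack."""
    stack = []
    pc = 0  # program counter of the virtual machine
    while True:
        op = bytecode[pc]
        if op == PUSH:
            stack.append(bytecode[pc + 1])
            pc += 2
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
            pc += 1
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op} at offset {pc}")

# (2 + 3) * 4
program = bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT])
print(run(program))  # 20
```

The loop plays the role that the fetch-decode-execute cycle of a CPU plays for native machine code.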

Storage

During execution, machine code is generally stored in RAM although running from ROM is supported by some devices. Regardless, the code may also be cached in more specialized memory to enhance performance. There may be different caches for instructions and data, depending on the architecture.[24]

From the point of view of a process, the machine code lives in code space, a designated part of its address space. In a multi-threading environment, different threads of one process share code space along with data space, which reduces the overhead of context switching considerably as compared to process switching.[25]

Readability

Machine code is generally considered not to be human-readable, with Douglas Hofstadter comparing it to examining the atoms of a DNA molecule. However, various tools and methods support understanding machine code.

Disassembly decodes machine code to assembly language, which is possible since assembly instructions can often be mapped one-to-one to machine instructions.
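As an illustration of this one-to-one mapping, the sketch below decodes the three MIPS words from the MIPS example in this article back into assembly text. It handles only the add, lw and j instructions; the function name and the fallback .word output are choices made for this illustration.

```python
def disassemble(word: int) -> str:
    """Decode a 32-bit MIPS word; only add, lw and j are recognized."""
    op = word >> 26
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    if op == 0:                                   # R-type
        rd = (word >> 11) & 0x1F
        funct = word & 0x3F
        if funct == 32:
            return f"add ${rd}, ${rs}, ${rt}"
    elif op == 35:                                # I-type load word
        imm = word & 0xFFFF
        if imm & 0x8000:                          # sign-extend the 16-bit offset
            imm -= 0x10000
        return f"lw ${rt}, {imm}(${rs})"
    elif op == 2:                                 # J-type jump
        return f"j {word & 0x3FFFFFF}"
    return f".word {word:#010x}"                  # unrecognized: show raw data

for w in (0x00223020, 0x8C680044, 0x08000400):
    print(disassemble(w))
# add $6, $1, $2
# lw $8, 68($3)
# j 1024
```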

A decompiler converts machine code to a high-level language, but the result can be relatively obfuscated and hard to understand.

A program can be associated with debug symbols (either embedded in the native executable or in a separate file) that allow it to be mapped to external source code. A debugger reads the symbols to help a programmer interactively debug the program. Examples include:

See also


Notes


  1. On early decimal machines, patterns of characters, digits and digit sign
  2. While overlapping instructions on processor architectures with variable-length instruction sets can sometimes be arranged to merge different code paths back into one through control-flow resynchronization, overlapping code for different processor architectures can sometimes also be crafted to cause execution paths to branch into different directions depending on the underlying processor, as is sometimes used in fat binaries.
  3. For example, the DR-DOS master boot records (MBRs) and boot sectors (which also hold the partition table and BIOS Parameter Block, leaving fewer than 446 and 423 bytes, respectively, for the code) were traditionally able to locate the boot file in the FAT12 or FAT16 file system by themselves and load it into memory as a whole. Their counterparts in MS-DOS and PC DOS instead rely on the system files occupying the first two directory entry locations in the file system, and on the first three sectors of IBMBIO.COM being stored at the start of the data area in contiguous sectors containing a secondary loader that loads the remainder of the file into memory (requiring SYS to take care of all these conditions). When FAT32 and logical block addressing (LBA) support was added, Microsoft even switched to requiring i386 instructions and split the boot code over two sectors for code size reasons. This was not an option for DR-DOS, as it would have broken backward- and cross-compatibility with other operating systems in multi-boot and chain load scenarios, and with older IBM PC–compatible PCs. Instead, the DR-DOS 7.07 boot sectors resorted to self-modifying code, opcode-level programming in machine language, controlled utilization of (documented) side effects, multi-level data/code overlapping and algorithmic folding techniques to still fit everything into a physical sector of only 512 bytes without giving up any of their extended functions.


