Machine code

From Wikipedia, the free encyclopedia
Jump to navigation Jump to search

Template:Short description Script error: No such module "For". Script error: No such module "redirect hatnote". Template:Use dmy dates Template:Use list-defined references

File:W65C816S Machine Code Monitor.jpeg
Machine language monitor running on a W65C816S microprocessor, displaying code disassembly and dumps of processor register and memory

Template:Program execution

In computing, machine code is data encoded and structured to control a computer's central processing unit (CPU) via its programmable interface. A computer program consists primarily of sequences of machine-code instructions.[1] Machine code is classified as native with respect to its host CPU since it is the language that CPU interprets directly.[2] A software interpreter is a virtual machine that processes virtual machine code.

A machine-code instruction causes the CPU to perform a specific task such as:

An instruction set architecture (ISA) defines the interface to a CPU and varies by groupings or families of CPU design such as x86 and ARM. Generally, machine code compatible with one family is not with others, but there are exceptions. The VAX architecture includes optional support of the PDP-11 instruction set. The IA-64 architecture includes optional support of the IA-32 instruction set. And, the PowerPC 615 can natively process both PowerPC and x86 instructions.

Assembly language

File:Machine language and assembly language.jpg
Translation of assembly into machine code

Assembly language provides a relatively direct mapping from a human-readable source code to machine code. The assembly language source code represents numerical codes in machine code, as mnemonics and labels.[3] For example, NOP in assembly for an X86 processor, represents the x86 architecture opcode 0x90 in machine code. While it is possible to write a program in machine code, doing so is tedious and error-prone. Therefore, programs are usually written in assembly or, more commonly, in a high-level programming language.

Instruction set

A machine instruction encodes an operation as a pattern of bits based on the specified format for the machine's instruction set.[nb 1][4]

Instruction sets differ in various ways. Instructions of a set might all be the same length or different instructions might have different lengths; they might be smaller than, the same size as, or larger than the word size of the architecture. The number of instructions may be relatively small or large. Instructions may or may not have to be aligned on particular memory boundaries, such as the architecture's word boundary.[4]

An instruction set needs to execute the circuits of a computer's digital logic level. At the digital level, the program needs to control the computer's registers, bus, memory, ALU, and other hardware components.[5] To control a computer's architectural features, machine instructions are created. Examples of features that are controlled using machine instructions:

The criteria for instruction formats include:

  • Instructions most commonly used should be shorter than instructions rarely used.[4]
  • The memory transfer rate of the underlying hardware determines the flexibility of the memory fetch instructions.
  • The number of bits in the address field requires special consideration.[9]

Determining the size of the address field is a choice between space and speed.[9] On some computers, the number of bits in the address field may be too small to access all of the physical memory. Also, virtual address space needs to be considered. Another constraint may be a limitation on the size of registers used to construct the address. Whereas a shorter address field allows the instructions to execute more quickly, other physical properties need to be considered when designing the instruction format.

Instructions can be separated into two types: general-purpose and special-purpose. Special-purpose instructions exploit architectural features that are unique to a computer. General-purpose instructions control architectural features common to all computers.[10]

General-purpose instructions control:

  • Data movement from one place to another
  • Monadic operations that have one operand to produce a result
  • Dyadic operations that have two operands to produce a result
  • Comparisons and conditional jumps
  • Procedure calls
  • Loop control
  • Input/output

Script error: No such module "anchor".Overlapping instruction

On processor architectures with variable-length instruction sets[11] (such as Intel's x86 processor family) it is, within the limits of the control-flow resynchronizing phenomenon known as the Kruskal count,[12][11][13][14][15] sometimes possible through opcode-level programming to deliberately arrange the resulting code so that two code paths share a common fragment of opcode sequences.[nb 2] These are called overlapping instructions, overlapping opcodes, overlapping code, overlapped code, instruction scission, or jump into the middle of an instruction.[16][17][18]

In the 1970s and 1980s, overlapping instructions were sometimes used to preserve memory space. One example were in the implementation of error tables in Microsoft's Altair BASIC, where interleaved instructions mutually shared their instruction bytes.[19][11][16] The technique is rarely used today, but might still be necessary to resort to in areas where extreme optimization for size is necessary on byte-level such as in the implementation of boot loaders which have to fit into boot sectors.[nb 3]

It is also sometimes used as a code obfuscation technique as a measure against disassembly and tampering.[11][14]

The principle is also used in shared code sequences of fat binaries which must run on multiple instruction-set-incompatible processor platforms.[nb 2]

This property is also used to find unintended instructions called gadgets in existing code repositories and is used in return-oriented programming as alternative to code injection for exploits such as return-to-libc attacks.[20][11]

Microcode

In some computers, the machine code of the architecture is implemented by an even more fundamental underlying layer called microcode, providing a common machine language interface across a line or family of different models of computer with widely different underlying dataflows. This is done to facilitate porting of machine language programs between different models.[21] An example of this use is the IBM System/360 family of computers and their successors.[22]

Examples

IBM 709x

The IBM 704, 709, 704x and 709x store one instruction in each instruction word; IBM numbers the bit from the left as S, 1, ..., 35. Most instructions have one of two formats:

Generic
S,1-11
12-13 Flag, ignored in some instructions
14-17 unused
18-20 Tag
21-35 Y
Index register control, other than TSX
S,1-2 Opcode
3-17 Decrement
18-20 Tag
21-35 Y

For all but the IBM 7094 and 7094 II, there are three index registers designated A, B and C; indexing with multiple 1 bits in the tag subtracts the logical or of the selected index registers and loading with multiple 1 bits in the tag loads all of the selected index registers. The 7094 and 7094 II have seven index registers, but when they are powered on they are in multiple tag mode, in which they use only the three of the index registers in a fashion compatible with earlier machines, and require a Leave Multiple Tag Mode (LMTM) instruction in order to access the other four index registers.

The effective address is normally Y-C(T), where C(T) is either 0 for a tag of 0, the logical or of the selected index registers in multiple tag mode or the selected index register if not in multiple tag mode. However, the effective address for index register control instructions is just Y.

A flag with both bits 1 selects indirect addressing; the indirect address word has both a tag and a Y field.

In addition to transfer (branch) instructions, these machines have skip instruction that conditionally skip one or two words, e.g., Compare Accumulator with Storage (CAS) does a three way compare and conditionally skips to NSI, NSI+1 or NSI+2, depending on the result.

MIPS

The MIPS architecture provides a specific example for a machine code whose instructions are always 32 bits long.[23]Template:Rp The general type of instruction is given by the op (operation) field, the highest 6 bits. J-type (jump) and I-type (immediate) instructions are fully specified by op. R-type (register) instructions include an additional field funcational to determine the exact operation. The fields used in these types are:

   6      5     5     5     5      6 bits
[  op  |  rs |  rt |  rd |shamt| funct]  R-type
[  op  |  rs |  rt | address/immediate]  I-type
[  op  |        target address        ]  J-type

rs, rt, and rd indicate register operands; shamt gives a shift amount; and the address or immediate fields contain an operand directly.[23]Template:Rp

For example, adding the registers 1 and 2 and placing the result in register 6 is encoded:[23]Template:Rp

[  op  |  rs |  rt |  rd |shamt| funct]
    0     1     2     6     0     32     decimal
 000000 00001 00010 00110 00000 100000   binary

Load a value into register 8, taken from the memory cell 68 cells after the location listed in register 3:[23]Template:Rp

[  op  |  rs |  rt | address/immediate]
   35     3     8           68           decimal
 100011 00011 01000 00000 00001 000100   binary

Jumping to the address 1024:[23]Template:Rp

[  op  |        target address        ]
    2                 1024               decimal
 000010 00000 00000 00000 10000 000000   binary

Bytecode

Machine code is similar to yet fundamentally different from bytecode. Like machine code, bytecode is typically generated (i.e. by a compiler) from source code. But, unlike machine code, bytecode is not directly executable by a CPU. An exception is if a processor is designed to use bytecode as its machine code, such as the Java processor. If bytecode is processed by an software interpreter, then that interpreter is a virtual machine for which the bytecode is its machine code.

Storage

During execution, machine code is generally stored in RAM although running from ROM is supported by some devices. Regardless, the code may also be cached in more specialized memory to enhance performance. There may be different caches for instructions and data, depending on the architecture.[24]

From the point of view of a process, the machine code lives in code space, a designated part of its address space. In a multi-threading environment, different threads of one process share code space along with data space, which reduces the overhead of context switching considerably as compared to process switching.[25]

Readability

Machine code is generally considered to be not human readable,Template:Sfn with Douglas Hofstadter comparing it to examining the atoms of a DNA molecule.Template:Sfn However, various tools and methods support understanding machine code.

Disassembly decodes machine code to assembly language which is possible since assembly instructions can often be mapped one-to-one to machine instructions.Template:Sfn

A decompiler converts machine code to a high-level language, but the result can be relatively obfuscated; hard to understand.

A program can be associated with debug symbols (either embedded in the native executable or in a separate file) that allow it to be mapped to external source code. A debugger reads the symbols to help a programmer interactively debug the program. Example include:

See also

Template:Sister project

Notes

Template:Reflist

References

  1. Script error: No such module "citation/CS1".
  2. Script error: No such module "citation/CS1".
  3. Script error: No such module "citation/CS1".
  4. a b c Script error: No such module "Footnotes".
  5. Script error: No such module "Footnotes".
  6. Script error: No such module "Footnotes".
  7. Script error: No such module "Footnotes".
  8. Script error: No such module "Footnotes".
  9. a b Script error: No such module "Footnotes".
  10. Script error: No such module "Footnotes".
  11. a b c d e Script error: No such module "citation/CS1". (12 pages)
  12. Script error: No such module "citation/CS1". (22 pages)
  13. Script error: No such module "citation/CS1". (10 pages)
  14. a b Script error: No such module "citation/CS1".
  15. Script error: No such module "citation/CS1". (1+xvii+1+152 pages)
  16. a b Script error: No such module "citation/CS1".
  17. Script error: No such module "citation/CS1". (199 pages)
  18. Script error: No such module "citation/CS1".
  19. Script error: No such module "citation/CS1". (NB. According to Template:Citeref.)
  20. Script error: No such module "citation/CS1".
  21. Script error: No such module "citation/CS1".
  22. Script error: No such module "Citation/CS1".
  23. a b c d e Script error: No such module "citation/CS1".
  24. Script error: No such module "Citation/CS1".
  25. Script error: No such module "citation/CS1".
  26. Script error: No such module "citation/CS1".
  27. Script error: No such module "citation/CS1".
  28. Script error: No such module "citation/CS1".
  29. Script error: No such module "citation/CS1".
  30. Script error: No such module "citation/CS1".
  31. Script error: No such module "citation/CS1".
  32. Script error: No such module "citation/CS1".

Sources

  • Script error: No such module "citation/CS1".
  • Script error: No such module "Citation/CS1".
  • Script error: No such module "citation/CS1".

Further reading

  • Script error: No such module "citation/CS1".
  • Script error: No such module "citation/CS1".
  • Script error: No such module "citation/CS1".

Template:Application binary interface Template:Types of programming languages Template:Authority control
Cite error: <ref> tags exist for a group named "nb", but no corresponding <references group="nb"/> tag was found