Uninitialized variable: Difference between revisions

imported>Assyrio
m Example of the C language: changed for-loop to reflect modern C, in terms of C99 which is the current minimally supported standard.
 
imported>DreamRimmer bot II
m Bot: Implementing outcome of RfC: converting list-defined references from {{reflist|refs=…}} to <references>…</references> for VisualEditor compatibility
 
(One intermediate revision by one other user not shown)
Latest revision as of 17:01, 24 December 2025

In computing, an uninitialized variable is a variable that is declared but is not set to a definite known value before it is used. It will have some value, but not a predictable one. As such, it is a programming error and a common source of bugs in software.

Example of the C language

A common assumption made by novice programmers is that all variables are set to a known value, such as zero, when they are declared. While this is true for many languages, it is not true for all of them, and so the potential for error is there. Languages such as C typically place local variables on the stack, and the collection of variables allocated for a subroutine is known as a stack frame. The computer sets aside the appropriate amount of space for the stack frame, but it usually does so simply by adjusting the value of the stack pointer and does not set the memory itself to any new state (typically for efficiency). Therefore, whatever that memory contains at the time appears as the initial values of the variables that occupy those addresses.

Here's a simple example in C:

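#include <stdio.h>

void count(void) {
    int k;  // declared but never given a starting value

    for (int i = 0; i < 10; i++) {
        k = k + 1;  // reads k while its value is still indeterminate
    }

    printf("%d", k);
}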

The final value of k is undefined. The answer that it must be 10 assumes that it started at zero, which may or may not be true. Note that in the example, the variable i is initialized to zero by the first clause of the for statement.
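
A corrected version, as a minimal sketch, gives k a definite starting value at its declaration. Returning the count instead of printing it is a change made here only so that the result is easy to check:

```c
// Sketch of a fix: k is initialized before the loop reads it.
int count(void) {
    int k = 0;  // definite starting value

    for (int i = 0; i < 10; i++) {
        k = k + 1;
    }

    return k;
}
```

With the initializer in place, the result no longer depends on what the stack happened to contain.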

Another example arises with structs. In the code snippet below, struct Student holds information about a student. The function registerStudent leaks memory contents because it fails to fully initialize the members of newStudent. The members age, semester and studentNumber are assigned explicitly, but the initialization of firstName and lastName is incomplete: strcpy[1] writes only the characters of the source string and its terminating null byte, so when a name occupies fewer than 16 bytes, the remaining bytes of the 16-byte array keep whatever the stack held before. After the struct is copied to output with memcpy(),[2] those leftover bytes of stack memory are leaked to the caller.

#include <stdio.h>
#include <string.h>

struct Student {
    unsigned int studentNumber;
    char firstName[16];
    char lastName[16];
    unsigned int age;
    unsigned int semester;
};

// refer to as Student for simplicity
typedef struct Student Student;

// Assumed to be defined elsewhere; returns the next free student number.
unsigned int getNewStudentNumber(void);

int registerStudent(Student* output, int age, char* firstName, char* lastName) {
    // If any of these pointers is NULL, return -1.
    if (!output || !firstName || !lastName) {
        fprintf(stderr, "Error! Some parameter is NULL.\n");
        return -1;
    }

    // Make sure each string fits in 16 bytes (including the null byte)
    // in order to avoid overflows
    if (strlen(firstName) > 15 || strlen(lastName) > 15) {
        fprintf(stderr, "firstName and lastName cannot be longer than 15 characters!\n");
        return -1;
    }

    // Initializing the members
    Student newStudent;
    newStudent.age = age;
    newStudent.semester = 1;
    newStudent.studentNumber = getNewStudentNumber();

    // strcpy stops at the null byte, so the rest of each array stays uninitialized
    strcpy(newStudent.firstName, firstName);
    strcpy(newStudent.lastName, lastName);

    // Copying the result to output, including the uninitialized bytes
    memcpy(output, &newStudent, sizeof(Student));
    return 0;
}
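
One way to close the leak, sketched here under the assumption that zero is an acceptable filler, is to clear the whole object with memset before assigning any member, so that the unused tail of each name array holds zeros rather than old stack contents. The name registerStudentZeroed and the stub body of getNewStudentNumber are hypothetical and exist only for this illustration; the error messages are omitted for brevity.

```c
#include <string.h>

typedef struct Student {
    unsigned int studentNumber;
    char firstName[16];
    char lastName[16];
    unsigned int age;
    unsigned int semester;
} Student;

// Hypothetical stand-in for the helper used above; returns 1, 2, 3, ...
static unsigned int getNewStudentNumber(void) {
    static unsigned int next;  // static storage duration: starts at zero
    return ++next;
}

int registerStudentZeroed(Student* output, unsigned int age,
                          const char* firstName, const char* lastName) {
    if (!output || !firstName || !lastName) {
        return -1;
    }
    // Each name must fit in 16 bytes including the null byte.
    if (strlen(firstName) > 15 || strlen(lastName) > 15) {
        return -1;
    }

    Student newStudent;
    // Clear every byte first, so nothing from the stack survives.
    memset(&newStudent, 0, sizeof newStudent);

    newStudent.age = age;
    newStudent.semester = 1;
    newStudent.studentNumber = getNewStudentNumber();
    strcpy(newStudent.firstName, firstName);
    strcpy(newStudent.lastName, lastName);

    memcpy(output, &newStudent, sizeof(Student));
    return 0;
}
```

Clearing the object once is simpler to audit than padding each array separately, at the cost of writing a few bytes twice.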

In any case, even when a variable is implicitly initialized to a default value like 0, this is typically not the correct value. Initialized does not mean correct if the value is a default one. (However, default initialization to NULL (or nullptr since C23 or in C++) is good practice for pointers and arrays of pointers, since it marks them as invalid until they are set to their correct value.) In C, variables with static storage duration that are not initialized explicitly are initialized to zero (or to a null pointer, for pointers).[3]
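
The static-storage rule can be seen in a short sketch. The names callCount, lastMessage and staticsStartAtZero are hypothetical; neither variable has an initializer, yet both have defined starting values:

```c
#include <stdbool.h>
#include <stddef.h>

static int callCount;            // no initializer: starts at 0
static const char* lastMessage;  // no initializer: starts as a null pointer

// Reports whether both statics still hold their implicit initial values.
bool staticsStartAtZero(void) {
    return callCount == 0 && lastMessage == NULL;
}
```

An automatic (local) variable declared the same way inside a function would hold an indeterminate value instead.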

Not only are uninitialized variables a frequent cause of bugs, but this kind of bug is particularly serious because it may not be reproducible: for instance, a variable may remain uninitialized only in some branch of the program. In some cases, programs with uninitialized variables may even pass software tests.
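
The following sketch shows how a single branch can hide the problem. In the hypothetical function signOf, declaring sign without an initializer would leave it unset whenever x is zero, so tests that use only non-zero inputs would never expose the bug; initializing at the declaration covers every path:

```c
// Returns -1, 0 or 1 according to the sign of x.
int signOf(int x) {
    // Buggy variant: "int sign;" with no initializer would make the
    // x == 0 path return an indeterminate value.
    int sign = 0;

    if (x > 0) {
        sign = 1;
    } else if (x < 0) {
        sign = -1;
    }
    // No branch assigns sign when x == 0; the initializer covers that case.

    return sign;
}
```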

Impacts

Uninitialized variables are powerful bugs since they can be exploited to leak arbitrary memory, to overwrite arbitrary memory, or to gain code execution, depending on the case. Exploiting software that uses address space layout randomization (ASLR) often requires knowing the base address of the software in memory. An uninitialized variable that can be made to leak a pointer from the program's address space can be used to bypass ASLR.

Use in languages

Uninitialized variables are a particular problem in languages such as assembly language, C, and C++, which were designed for systems programming. The development of these languages involved a design philosophy in which conflicts between performance and safety were generally resolved in favor of performance. The programmer was given the burden of being aware of dangerous issues such as uninitialized variables.

In other languages, variables are often initialized to known values when created. Examples include:

  • VHDL initializes all standard variables to the special value 'U'. It is used in simulation, for debugging, to let the user know when don't-care initial values, through the multi-valued logic, affect the output.
  • Java does not have uninitialized variables. Fields of classes and objects that do not have an explicit initializer and elements of arrays are automatically initialized with the default value for their type (false for boolean, 0 for all numerical types, null for all reference types).[4] Local variables in Java must be definitely assigned before they are accessed; otherwise it is a compile error.
  • Python internally initializes local variables to NULL (distinct from None) and raises an UnboundLocalError when such a variable is accessed before it has been assigned a valid value.
  • D initializes all variables unless the programmer explicitly specifies otherwise.

Even in languages where uninitialized variables are allowed, many compilers attempt to identify uses of uninitialized variables and report them as compile-time warnings or errors. Some languages assist this task by offering constructs to handle the initializedness of variables; for example, C# has a special flavour of call-by-reference parameters to subroutines (specified as out instead of the usual ref), asserting that the variable is allowed to be uninitialized on entry but will be initialized afterwards.
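
C has no counterpart to out, but a similar contract can be followed by convention. The sketch below uses a hypothetical helper, tryParseLong, that writes to *out on every path, so the caller's variable is never left indeterminate even when it was declared without an initializer. It assumes that text and out are non-null.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

// Parses a whole string as a decimal integer.
// *out is always written: the parsed value on success, 0 on failure.
bool tryParseLong(const char* text, long* out) {
    char* end;
    errno = 0;
    long value = strtol(text, &end, 10);

    // Fail on empty input, trailing characters, or out-of-range values.
    if (end == text || *end != '\0' || errno == ERANGE) {
        *out = 0;
        return false;
    }

    *out = value;
    return true;
}
```

A caller can then write `long n; if (tryParseLong(s, &n)) { ... }` and read n afterwards without relying on its initial contents. Unlike in C#, nothing in the language enforces the convention.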

References

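  1. strcpy. https://man7.org/linux/man-pages/man3/strcpy.3.html
  2. memcpy(). https://man7.org/linux/man-pages/man3/memcpy.3.html
  3. "ISO/IEC 9899:TC3 (Current C standard)". 7 September 2007. p. 126. https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf. Retrieved 26 September 2008. Section 6.7.8, paragraph 10.
  4. "Java Language Specification: 4.12.5 Initial Values of Variables". Sun Microsystems. https://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.12.5. Retrieved 18 October 2008.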

Further reading

  • "CWE-457 Use of Uninitialized Variable". https://cwe.mitre.org/data/definitions/457.html