Preface
Class files are stored on disk. Before the JVM can use a class, it has to load the class file into memory. This article describes that process in detail.
1. Overview of Class Loading
The Java Virtual Machine loads data describing a class from the Class file into memory, performs verification, transformation, resolution, and initialization on the data, ultimately forming Java types that can be directly used by the virtual machine. This process is known as the virtual machine’s class loading mechanism.
Unlike languages that require linking at compile time, in the Java language, the loading, linking, and initialization processes of types are completed at runtime. This strategy poses additional challenges for ahead-of-time compilation in Java, and it may slightly increase the performance overhead during class loading. However, it provides Java applications with high extensibility and flexibility. The innate ability of Java to dynamically expand is achieved through the characteristics of dynamic loading and dynamic linking during runtime.
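As a minimal sketch of this run-time flexibility, the following example picks an implementation class by name only when the program runs. The class name DynamicLoadingDemo and the helper newList are illustrative names, not part of any library; Class.forName and getDeclaredConstructor().newInstance() are standard JDK calls.

```java
import java.util.List;

public class DynamicLoadingDemo {
    // Loads and instantiates a List implementation whose name is only known at run time.
    @SuppressWarnings("unchecked")
    static List<String> newList(String className) {
        try {
            // Class.forName loads, links, and (by default) initializes the named class.
            Class<?> type = Class.forName(className);
            return (List<String>) type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // The implementation can be chosen on the command line, without recompiling.
        String name = args.length > 0 ? args[0] : "java.util.LinkedList";
        List<String> list = newList(name);
        list.add("loaded at run time");
        System.out.println(list.getClass().getName() + " -> " + list);
    }
}
```

The calling code is compiled only against the List interface; which concrete class is loaded and linked is decided while the program is running.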
From the moment a type is loaded into the virtual machine’s memory until it is unloaded, its lifecycle passes through seven stages: loading, verification, preparation, resolution, initialization, use, and unloading. Verification, preparation, and resolution are collectively referred to as linking (also translated as the connection phase).
The order of loading, verification, preparation, initialization, and unloading is fixed, and class loading must start these stages in this order. The resolution stage is the exception: in some cases it can begin after the initialization stage. This flexibility exists to support Java’s runtime binding, also known as dynamic binding or late binding.
Note that the fixed order applies to when each stage starts, not to when it proceeds or finishes. The stages usually overlap during execution, with one stage invoking or activating another while it is still running.
2. Loading Stage
2.1 Three Tasks Completed during Loading
Loading is the first stage in the class loading process, which, in simple terms, involves loading the class file of a Java class into the machine’s memory. Subsequently, a prototype of the Java class is constructed in memory, referred to as the ‘class template object.’ This class template object essentially represents a snapshot of the Java class in the JVM’s memory.
During the loading stage, the JVM needs to accomplish the following three tasks:
1. Obtain the binary byte stream that defines the class, using the fully qualified name of the class. In other words, retrieve the byte stream of the class file.
2. Transform the static storage structure represented by this byte stream into the data structure of the method area. Note: The class file is a static binary file containing constants, structures, fields, methods, and attributes, all of which are statically stored in the class file. The transformation is necessary to convert it into a runtime data structure required by the method area.
3. Generate a java.lang.Class object representing the class in memory, serving as the access point for various data of this class in the method area.
The generated Class object resides in the heap. It is the entry point through which a program reaches the field and method data of the class, and it links directly to the class data in the method area.
The ‘Java Virtual Machine Specification’ does not provide extremely specific requirements for these three points, leaving significant flexibility to the virtual machine implementation and the adaptability to Java applications. For instance, the rule ‘Obtain the binary byte stream that defines the class through the fully qualified name of a class’ does not specify that the binary byte stream must be obtained from a particular Class file or, more precisely, does not specify where and how to obtain it.
It is within these gaps that users of the Java virtual machine can construct a fairly open and broad stage during the loading stage. Throughout the evolution of Java, creative developers have explored various possibilities on this stage, and many influential Java technologies are built upon this foundation. Examples include:
- Reading from ZIP archives, which later became the foundation for JAR, EAR, and WAR formats.
- Retrieving from the network, with the most typical application being Web Applets.
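Because the specification leaves the source of the bytes open, a class loader can obtain them from anywhere. The sketch below uses a source not listed above, chosen only because it is self-contained: it assembles the bytes of an empty class in memory and hands them to defineClass. The names InMemoryClassDemo, GeneratingLoader, and GeneratedHello are hypothetical, and the hand-written class file layout is a minimal assumption (no fields, no methods, default package).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class InMemoryClassDemo {
    // Builds the bytes of an empty public class in the default package that extends Object.
    static byte[] emptyClassBytes(String name) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(0xCAFEBABE);                           // magic number
            out.writeShort(0);                                  // minor version
            out.writeShort(52);                                 // major version 52 (Java 8)
            out.writeShort(5);                                  // constant pool count (entries #1..#4)
            out.writeByte(7); out.writeShort(2);                // #1 Class, name at #2
            out.writeByte(1); out.writeUTF(name);               // #2 Utf8, this class name
            out.writeByte(7); out.writeShort(4);                // #3 Class, name at #4
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #4 Utf8, superclass name
            out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                  // this_class = #1
            out.writeShort(3);                                  // super_class = #3
            out.writeShort(0);                                  // interfaces count
            out.writeShort(0);                                  // fields count
            out.writeShort(0);                                  // methods count
            out.writeShort(0);                                  // attributes count
        } catch (IOException e) {
            throw new IllegalStateException(e);                 // not expected for an in-memory stream
        }
        return buffer.toByteArray();
    }

    // A loader whose "source" is the byte array generated above.
    static class GeneratingLoader extends ClassLoader {
        @Override
        protected Class<?> findClass(String name) {
            byte[] bytes = emptyClassBytes(name);
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    static Class<?> load(String name) {
        try {
            return new GeneratingLoader().loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> generated = load("GeneratedHello");
        System.out.println(generated.getName() + " defined by " + generated.getClassLoader());
    }
}
```

The loader first delegates to its parent, which cannot find GeneratedHello, and only then falls back to findClass. Reading the bytes from a ZIP entry or a network socket would change only the body of findClass.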
2.2 Class Model and Location of Class Instances
During loading, the corresponding class structure is created in the JVM, and the data structure of the class is stored in the method area.
Each loaded class corresponds to one object of type java.lang.Class, which encapsulates the data structure of the class located in the method area. Class objects are stored in the heap.
From the outside, a program accesses the Class object in the heap to reach the data structure of a class, for example a hypothetical Order class. This is how reflection works: the Class object serves as the entry point to the data structures, methods, and field information in the method area.
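The following sketch shows the Class object acting as that entry point. The Order class here is a hypothetical stand-in with one field and one method; getDeclaredField and getDeclaredMethod are standard reflection calls.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class ReflectionEntryDemo {
    // Hypothetical stand-in for the Order class mentioned in the text.
    static class Order {
        private long id = 42L;
        long getId() { return id; }
    }

    // Reads field and method metadata through the Class object.
    static String describe(Class<?> type) {
        try {
            Field field = type.getDeclaredField("id");
            Method getter = type.getDeclaredMethod("getId");
            return type.getSimpleName() + ": field " + field.getType() + " " + field.getName()
                    + ", method " + getter.getName() + "()";
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Order order = new Order();
        // Every instance leads back to the single Class object of its type.
        System.out.println(order.getClass() == Order.class); // true
        System.out.println(describe(Order.class));           // Order: field long id, method getId()
    }
}
```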
2.3 Loading of Array Classes
For array classes, the situation is somewhat different. Array classes themselves are not created by class loaders; they are dynamically constructed in memory by the Java Virtual Machine. However, array classes still have a close relationship with class loaders because the element type of the array (referred to as the ‘Element Type,’ denoting the type obtained by removing all dimensions from the array) ultimately relies on the class loader to complete the loading process. The creation process of an array class (hereafter referred to as C) follows these rules:
1. If the component type (referred to as the ‘Component Type,’ denoting the type obtained by removing one dimension from the array, distinct from the previously mentioned element type) of the array is a reference type, then recursively use the loading process defined in this section to load this component type. Array C will be identified in the class namespace of the class loader loading this component type (this is crucial, as a type must be uniquely determined in conjunction with the class loader).
2. If the component type is not a reference type (e.g., the component type of an int[] array is int), the Java Virtual Machine marks array C as associated with the bootstrap class loader.
3. The accessibility of the array class is consistent with the accessibility of its component type. If the component type is not a reference type, the accessibility of its array class defaults to public and can be accessed by all classes and interfaces.
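These rules can be observed through the reflection API. The sketch below is only an illustration; Item is a hypothetical application class, and the helper method names are made up for this example.

```java
import java.lang.reflect.Modifier;

public class ArrayClassDemo {
    // Hypothetical application class, loaded by whichever loader loads this demo.
    static class Item { }

    // Rule 1: an array of a reference type is tied to the loader of its component type.
    static boolean sharesLoaderWithComponent() {
        return Item[].class.getClassLoader() == Item.class.getClassLoader();
    }

    // Rule 2: a primitive array is associated with the bootstrap loader, reported as null.
    static boolean primitiveArrayUsesBootstrapLoader() {
        return int[].class.getClassLoader() == null;
    }

    // Rule 3: an array class with a primitive component type is public.
    static boolean primitiveArrayIsPublic() {
        return Modifier.isPublic(int[].class.getModifiers());
    }

    public static void main(String[] args) {
        // The component type removes one dimension, the element type removes all of them.
        System.out.println(int[][].class.getComponentType() == int[].class); // true
        System.out.println(sharesLoaderWithComponent());                     // true
        System.out.println(primitiveArrayUsesBootstrapLoader());             // true
        System.out.println(primitiveArrayIsPublic());                        // true
    }
}
```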
3. Connection Stage
3.1 Verification
Some actions of the loading stage and the connection stage are interleaved, for example parts of the class file format verification. The connection stage may therefore begin before the loading stage has finished. Actions that run in the middle of loading in this way still belong to the connection stage, and the two stages still start in a fixed order: loading first, then connection.
Verification is the first step in the connection stage. The goal of this stage is to ensure that the byte stream contained in the Class file complies with all the constraints specified in the ‘Java Virtual Machine Specification,’ guaranteeing that this information, when executed as code, will not jeopardize the security of the virtual machine itself.
The verification stage is crucial, as its rigor directly determines whether the Java Virtual Machine can withstand malicious code attacks. From the perspective of code volume and execution performance, the workload of the verification stage constitutes a significant proportion of the overall class loading process.
In a broad sense, the verification stage typically involves the following four phases of verification:
- File format verification
- Semantic checks
- Bytecode verification
- Symbolic reference verification
3.1.1 Format Check
The first phase involves verifying whether the byte stream adheres to the specifications of the Class file format and can be processed by the current version of the virtual machine. This stage includes the following verification points:
1. Whether it begins with the magic number 0xCAFEBABE.
2. Whether the major and minor version numbers are within the acceptable range for the current Java Virtual Machine.
3. Whether the constants in the constant pool have unsupported constant types (checking constant tag flags).
4. Whether various index values pointing to constants reference non-existent constants or constants of incompatible types.
5. Whether any CONSTANT_Utf8_info constant contains data that does not conform to the UTF-8 encoding.
6. Whether any part of the Class file, or the file as a whole, has had information deleted or appended.
In reality, the verification points in the first phase extend beyond those listed above. The points mentioned are only a small excerpt from the HotSpot virtual machine source code. The primary goal of this verification stage is to ensure that the input byte stream can be correctly parsed and stored in the method area, meeting the format requirements for describing Java type information. This verification phase operates based on the binary byte stream. Only after passing this verification phase is the byte stream allowed to enter the method area in the Java Virtual Machine’s memory for storage. Subsequently, the following three verification stages are all conducted based on the storage structure in the method area, without directly reading or manipulating the byte stream.
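The first two checks in the list can be sketched in a few lines. This is not the HotSpot implementation; it only reads the 8-byte header of a class file, and the limit MAX_SUPPORTED_MAJOR is an assumption chosen for the example (52 corresponds to Java 8).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ClassHeaderCheck {
    // Assumption for this sketch: pretend the "current VM" accepts class files up to Java 8.
    static final int MAX_SUPPORTED_MAJOR = 52;

    // Checks the magic number and the version fields at the start of a class file.
    static String check(byte[] classBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                return "bad magic number";
            }
            int minor = in.readUnsignedShort(); // minor version precedes major in the file
            int major = in.readUnsignedShort();
            if (major > MAX_SUPPORTED_MAJOR) {
                return "unsupported version " + major + "." + minor;
            }
            return "header ok, version " + major + "." + minor;
        } catch (IOException e) {
            return "truncated class file";      // EOFException: fewer than 8 bytes available
        }
    }

    public static void main(String[] args) {
        byte[] java8Header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        System.out.println(check(java8Header)); // header ok, version 52.0
        System.out.println(check(new byte[8])); // bad magic number
    }
}
```

A real virtual machine reports such failures by throwing errors such as java.lang.ClassFormatError or java.lang.UnsupportedClassVersionError rather than returning strings.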
3.1.2 Semantic Checks
The second phase involves semantic analysis of the information described in the bytecode to ensure that it complies with the requirements of the ‘Java Language Specification.’ Possible verification points in this stage include:
- Whether the class has a superclass (excluding java.lang.Object, all classes should have a superclass).
- Whether the class inherits from a class that is not allowed to be inherited (a class marked with the ‘final’ modifier).
- If the class is not an abstract class, whether it implements all the methods required by its superclass or interfaces.
- Whether fields and methods in the class conflict with those in the superclass (e.g., overriding a final field or having method overloads that violate rules, such as having consistent method parameters but different return types).
- …
The primary purpose of the second phase is to perform semantic verification on the metadata information of the class, ensuring that there are no conflicting metadata information contrary to the ‘Java Language Specification.’
3.1.3 Bytecode Verification
The third phase, which is the most complex of the entire verification process, aims to determine whether the bytecode can be executed correctly without causing harm to the virtual machine. For example, the JVM verifies bytecode for:
- Whether there are jumps to non-existent instructions during bytecode execution.
- Whether correct types of parameters are passed in function calls.
- Whether variable assignments are made to correct data types, and more.
- Ensuring that the data types on the operand stack work in harmony with the instruction code sequence at all times. For example, situations where an int type is placed on the operand stack, but it is loaded into the local variable table as a long type should be prevented.
If the bytecode of a method fails bytecode verification, the method is certainly defective. The reverse does not hold: passing bytecode verification does not prove that the method is safe. However extensive and rigorous the checks in this stage are, they cannot guarantee the absence of all problems.
3.1.4 Symbolic Reference Verification
The final verification behavior occurs when the virtual machine transforms symbolic references into direct references. This transformation takes place in the third stage of connection — the resolution stage. Symbolic reference verification can be seen as a compatibility check for various information outside the class itself (symbolic references in the constant pool). In simple terms, it checks whether the class lacks or is denied access to certain external classes, methods, fields, and other resources it depends on. This stage usually involves verifying the following:
- Whether the fully qualified name described by the string in the symbolic reference can find the corresponding class.
- Whether the specified class contains fields and methods that match the simple name and descriptor given in the symbolic reference.
- Whether the class, fields, and methods in the symbolic reference are accessible to the current class, given their access modifiers (private, protected, public, or package-private).
The main purpose of symbolic reference verification is to ensure that the resolution process can be executed normally. If symbolic reference verification cannot pass, the Java Virtual Machine will throw a subclass of the java.lang.IncompatibleClassChangeError, such as:
- java.lang.IllegalAccessError
- java.lang.NoSuchFieldError
- java.lang.NoSuchMethodError, etc.
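Because these errors share a common superclass, code that guards against stale dependencies can catch them in one place. The sketch below only simulates the failure by throwing the error manually; a real failure would come from the JVM during resolution. The names LinkageErrorHierarchyDemo and runGuarded are illustrative.

```java
public class LinkageErrorHierarchyDemo {
    // True if the given error type is a kind of IncompatibleClassChangeError.
    static boolean isIncompatibleChange(Class<? extends Throwable> type) {
        return IncompatibleClassChangeError.class.isAssignableFrom(type);
    }

    // Runs an action and reports any resolution failure that stems from a changed class.
    static String runGuarded(Runnable action) {
        try {
            action.run();
            return "ok";
        } catch (IncompatibleClassChangeError e) {
            return "stale dependency: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(isIncompatibleChange(IllegalAccessError.class)); // true
        System.out.println(isIncompatibleChange(NoSuchFieldError.class));   // true
        // Simulated failure: thrown by hand here, not by the JVM.
        System.out.println(runGuarded(() -> { throw new NoSuchMethodError("demo"); }));
    }
}
```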
The verification phase is a crucial but not mandatory stage in the virtual machine’s class loading mechanism. Verification only ever passes or fails, and once it has passed it has no further effect on the runtime behavior of the program. If all code in the program (self-written code, third-party libraries, externally loaded code, dynamically generated code, and so on) has been thoroughly tested and verified, one might consider using the -Xverify:none parameter in production to disable most class verification measures and shorten class loading time. Note that this option has been deprecated since JDK 13.
3.2 Preparation
The preparation stage is where memory is formally allocated for the variables defined in the class (static variables, marked with the ‘static’ modifier) and where their initial values are set. Conceptually, the memory used by these variables is allocated in the method area, but the method area itself is only a logical region. Before JDK 7, HotSpot implemented the method area with the permanent generation, which matched this logical concept exactly. From JDK 7 onward, class variables are stored in the Java heap together with the Class object, so the statement ‘class variables are in the method area’ is merely a description of the logical concept.
Regarding the preparation stage, there are two concepts that can be easily confused and need special emphasis:
Firstly, only class variables are allocated memory during this phase, not instance variables. Instance variables will be allocated in the Java heap along with the object during instantiation.
Secondly, the initial value mentioned here is, under ‘normal circumstances,’ the zero value for the data type. For instance, if a class variable is defined as follows:
public static int value = 123;
The variable ‘value’ has an initial value of 0 after the preparation stage, not 123. No Java method has been executed at this point, and the ‘putstatic’ instruction that assigns 123 to ‘value’ is compiled into the class constructor <clinit>() method. The assignment of 123 to ‘value’ is therefore executed only during the class initialization stage.
There is one special case. If the attribute table of a class field contains a ConstantValue attribute, the variable is set to the value specified by that attribute during the preparation stage. Suppose the definition of ‘value’ is changed as follows. The compiler then generates a ConstantValue attribute for ‘value’, and the preparation stage sets ‘value’ to 123 according to that attribute:
public static final int value = 123;
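The zero value can be observed from Java code by reading a class variable before its own assignment has run. The sketch below is one way to do that; Holder, seenEarly, and the peek methods are names invented for this example. Note that the constant case also reflects compile-time constant inlining by javac, so it illustrates the effect rather than isolating the ConstantValue attribute itself.

```java
public class PreparationDemo {
    static class Holder {
        // These run first inside <clinit>(), before the assignment to `value` below.
        static int seenEarly = peekValue();
        static int seenConstant = peekConstant();

        static int value = 123;          // assigned by putstatic during initialization
        static final int CONSTANT = 123; // compile-time constant with a ConstantValue attribute

        static int peekValue() { return value; }       // sees the zero value from preparation
        static int peekConstant() { return CONSTANT; } // sees 123 regardless of ordering
    }

    public static void main(String[] args) {
        System.out.println(Holder.seenEarly);    // 0
        System.out.println(Holder.value);        // 123
        System.out.println(Holder.seenConstant); // 123
    }
}
```

Calling a method is used here because the compiler rejects a direct forward reference such as static int seenEarly = value; when value is declared later.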
3.3 Resolution
The resolution stage is the process in which the Java Virtual Machine replaces symbolic references in the constant pool with direct references. Symbolic references, such as CONSTANT_Class_info, appear as types of constants in the constant pool. What is the connection between symbolic references and direct references in the resolution stage?
Symbolic References: Symbolic references use a set of symbols to describe the target being referred to. Symbols can take the form of any unambiguous literal, as long as they can be used to unequivocally locate the target.
Direct References: Direct references are pointers, relative offsets, or handles that can directly point to the target. Direct references are directly related to the virtual machine’s implementation of memory layout. The direct reference for the same symbolic reference may not be the same across different instances of the virtual machine. If a direct reference exists, the referenced target must already exist in the virtual machine’s memory.
4. Initialization Stage
Class initialization is the final step in the class loading process, and it is the point at which the Java Virtual Machine starts executing the Java program code written in the class. During the preparation stage, class variables were already given their zero values. In the initialization stage, class variables and other resources are initialized according to the code the programmer wrote.
The initialization process consists of executing the class constructor <clinit>() method.
The <clinit>() method is generated automatically by the compiler, which merges all class-variable assignments and all statements in static blocks, in the order in which they appear in the source file. The JVM guarantees that the <clinit>() method of the superclass has finished before the <clinit>() method of the subclass runs, so static statements in the superclass execute before those in the subclass.
The <clinit>() method is not required for every class or interface. If a class has no static blocks and no assignments to class variables, the compiler does not need to generate it.
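The ordering rules can be made visible by recording each step in a list. In this sketch, Base, Derived, and the note helper are hypothetical names; creating a Derived instance triggers initialization of Base first and then of Derived.

```java
import java.util.ArrayList;
import java.util.List;

public class ClinitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static int note(String message) {
        LOG.add(message);
        return 0;
    }

    static class Base {
        static { note("Base static block"); }
    }

    static class Derived extends Base {
        // Assignments and static blocks are merged in source order.
        static int a = note("Derived.a assigned");
        static { note("Derived static block"); }
        static int b = note("Derived.b assigned");
    }

    static List<String> run() {
        new Derived();               // triggers initialization of Base, then Derived
        return new ArrayList<>(LOG); // copy, so callers cannot alter the log
    }

    public static void main(String[] args) {
        // [Base static block, Derived.a assigned, Derived static block, Derived.b assigned]
        System.out.println(run());
    }
}
```

Calling run() a second time returns the same four entries, because a class is initialized only once per class loader.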
The Java Virtual Machine must ensure that the class constructor <clinit>() method is correctly synchronized in a multi-threaded environment. If multiple threads attempt to initialize a class simultaneously, only one thread executes the <clinit>() method, while the other threads block and wait until the active thread completes the initialization.
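A small experiment can confirm this guarantee. In the sketch below, several threads touch the same uninitialized class at once, and a counter records how often its static block runs. The class names, the 100 ms pause, and the thread count are arbitrary choices for the illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class ClinitOnceDemo {
    static final AtomicInteger INIT_RUNS = new AtomicInteger();

    static class Slow {
        static {
            INIT_RUNS.incrementAndGet();
            pause(100); // keep <clinit>() busy so the other threads really have to wait
        }
        static void touch() { }
    }

    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Starts the given number of threads that all trigger initialization of Slow.
    static int initializeFromThreads(int threadCount) {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    return;
                }
                Slow.touch(); // invokestatic: initializes Slow if necessary
            });
            workers[i].start();
        }
        start.countDown(); // release all workers at the same time
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return INIT_RUNS.get();
    }

    public static void main(String[] args) {
        System.out.println(initializeFromThreads(8)); // 1
    }
}
```

The same blocking behavior means that a long-running or deadlocking static block stalls every thread that needs the class, so static initializers are best kept short.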
The JVM specifies six situations in which a class must be initialized:
- Encountering bytecode instructions such as new, getstatic, putstatic, invokestatic.
- When a reflective call is made on a type through the java.lang.reflect package and the type has not been initialized yet.
- During the initialization of a class, if the superclass has not been initialized, the superclass is initialized first.
- When the JVM starts, and a main class (containing the main() method) is specified by the user, the JVM initializes this main class. The class is linked and initialized before the main() method is called, and subsequent classes needed for execution are loaded, linked, and initialized in sequence.
- When a java.lang.invoke.MethodHandle instance resolves to a method handle of kind REF_getStatic, REF_putStatic, REF_invokeStatic, or REF_newInvokeSpecial and the class that the handle refers to has not been initialized, that class is initialized first.
- When an interface defines a default method (new in JDK 8) and a class implementing that interface is initialized, the interface must be initialized before the class.
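The default-method rule in the last item is easy to get backwards, so here is a sketch that records the initialization order. WithDefault, NoDefault, and Impl are hypothetical types. The fields named MARK use a method call as their initializer so that initializing each interface leaves a trace.

```java
import java.util.ArrayList;
import java.util.List;

public class InterfaceInitDemo {
    static final List<String> LOG = new ArrayList<>();

    static int note(String who) {
        LOG.add(who);
        return 0;
    }

    interface WithDefault {
        int MARK = note("WithDefault"); // runs when this interface is initialized
        default void hello() { }
    }

    interface NoDefault {
        int MARK = note("NoDefault");   // runs only if this interface is initialized
    }

    static class Impl implements WithDefault, NoDefault {
        static { note("Impl"); }
    }

    static List<String> run() {
        new Impl();                  // initializing Impl first initializes WithDefault
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(run()); // [WithDefault, Impl]
    }
}
```

The interface without a default method does not appear in the log: initializing a class does not initialize such an interface, which is only initialized when one of its own non-constant fields is used.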
4.2 Case 2
public class SuperClass {
    public static void main(String[] args) {
        // Creates an array of Parent references; no Parent instance is created.
        Parent[] parents = new Parent[2];
    }
}

class Parent {
    static {
        System.out.println("parent init");
    }
    public static int value = 123;
}

class Child extends Parent {
    static {
        System.out.println("child init");
    }
}
In this case, there is no output. The ‘new’ keyword appears in the source, but initialization is triggered by the ‘new’ bytecode instruction. The expression ‘new Parent[2]’ compiles to the ‘anewarray’ instruction instead (‘newarray’ is used only for arrays of primitive types), which creates the array class and does not initialize Parent.
4.3 Case 3
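public class SuperClass {
    public static void main(String[] args) {
        // value2 is a compile-time constant, so this read does not touch Parent or Child.
        System.out.println(Child.value2);
    }
}

class Parent {
    static {
        System.out.println("parent init");
    }
    public static int value = 123;
    public static final int value2 = 456;
}

class Child extends Parent {
    static {
        System.out.println("child init");
    }
}
The output is 456, and neither Parent nor Child is initialized. Because value2 is a static final field with a constant initializer, it is a compile-time constant: the compiler records it in the ConstantValue attribute of the field and copies the value 456 directly into the compiled SuperClass. The compiled SuperClass therefore contains no symbolic reference to Parent or Child, and running it triggers no initialization.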
Summary:
This article mainly introduces the process of loading class files into memory by the JVM, which can be divided into three steps: loading, linking, and initialization.
- The loading stage is responsible for creating class template objects based on binary data.
- Linking includes verification, preparation, and resolution.
- Initialization assigns values to class variables and executes the <clinit>() method.