Have you ever wondered how your Java files are executed, where your local variables and class data are stored, or how the JVM works? If so, and you are still confused about any of it, this post is for you.
In this post, we will discuss the JVM architecture in detail.
Before we start with the JVM architecture, let us revisit how C code is compiled and linked. We will briefly look at how C programs are compiled and executed, then go through an overview of Java compilation, and finally move on to the main architecture.
Let us suppose there are 3 source files: a1.c (with the main function), a2.c (with function f1) and a3.c (with function f2). These files are sent to the compiler, which generates machine code for each of them (a1.obj, a2.obj and a3.obj). The machine code is then sent to the linker, which generates an executable, i.e. the a.exe file. This a.exe file is then loaded into RAM and executed directly by the CPU. Hence, the execution process in C/C++ is quite fast.
Let us suppose there are 3 Java source files: a1.java (with the main method), a2.java (with method f1) and a3.java (with method f2). These files are sent to the compiler, which generates bytecode, i.e. the class files (a1.class, a2.class and a3.class). The bytecode is then handed to the JVM, which resides in RAM. (Remember, there is no separate linker step at build time in Java; linking happens inside the JVM when classes are loaded.) The JVM then loads the class files (with the help of class loaders) from their respective class paths and verifies that the bytecode is well-formed and does not violate the JVM's safety rules. After verifying the bytecode, the execution engine executes it, translating it into machine code at run time. Hence, the process tends to be slower than in C, because this translation happens while the program runs.
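To make this flow concrete, here is a minimal sketch. The class names A1, A2 and A3 are only illustrative stand-ins for the three source files above, and they are placed in a single file purely for brevity; the compiler still emits one class file per class.

```java
// A1.java (hypothetical file name). Compiling with `javac A1.java`
// produces A1.class, A2.class and A3.class, one per class.
class A2 {
    static int f1() {
        return 1;
    }
}

class A3 {
    static int f2() {
        return 2;
    }
}

class A1 {
    // Uses the other two classes; the JVM loads and links them
    // at run time, the first time they are actually needed.
    static int run() {
        return A2.f1() + A3.f2();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 3
    }
}
```

Running `java A1` afterwards starts the JVM, which loads A1.class and then pulls in A2 and A3 on demand.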
The JVM is a virtual machine that resides in memory and is responsible for translating bytecode into machine code, actions or operating system calls. For example, a request to establish a socket connection to a remote machine will involve an operating system call. Different operating systems handle sockets in different ways, but the programmer doesn't need to worry about such details. It is the responsibility of the JVM to handle these translations so that the operating system and the CPU architecture on which the Java software is running are irrelevant to the developer. (See figure below.)
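As a small illustration of this idea, the sketch below asks the JVM to describe the platform it is running on. The class name PlatformDemo is made up for this example; the point is only that the same class file runs unchanged on any operating system, and the JVM supplies the platform-specific answers.

```java
import java.io.File;

class PlatformDemo {
    // Builds a short description of the host platform using
    // standard system properties exposed by the JVM.
    static String describePlatform() {
        return System.getProperty("os.name")
                + " / " + System.getProperty("os.arch")
                + " / path separator '" + File.separator + "'";
    }

    public static void main(String[] args) {
        // The output differs per machine, but the bytecode does not.
        System.out.println(describePlatform());
    }
}
```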
The JVM consists of 3 components:
1. ClassLoader Subsystem: This subsystem is responsible for the loading, linking and initialization of our class files.
- Loading: Have you ever wondered where Java classes like String or Object come from when you use them in your project? They must reside somewhere on your system, and they must be loaded before your application's own classes. The core Java API classes (in Java 8 and earlier, the runtime classes in rt.jar, the internationalization classes in i18n.jar, and others) are loaded by the BOOTSTRAP CLASSLOADER. The classes in JAR files in the lib/ext directory of the JRE are then loaded by the EXTENSION CLASSLOADER. Finally, user-defined classes on the class path are loaded as and when required by the SYSTEM CLASSLOADER. So there are basically 3 class loaders: the Bootstrap, Extension and System class loaders. (Since Java 9, rt.jar and lib/ext no longer exist, and the Extension class loader has been replaced by the Platform class loader.)
- Linking: During this process the bytecode is verified (checked to be structurally valid and safe to execute), prepared (i.e. memory is allocated for static variables and they are set to their default values) and resolved (all symbolic references are replaced with direct references into the method area).
- Initialization: In this phase the static initializer blocks are executed and static variables are assigned the values given in the code.
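The loading and initialization steps can be observed from ordinary Java code. The following is a rough sketch, with LoaderDemo and its nested class Lazy being hypothetical names: it checks which loader is responsible for String, walks the loader chain of an application class, and records when a static initializer actually runs.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderDemo {
    // Records events in the order in which they happen.
    static final List<String> events = new ArrayList<>();

    static class Lazy {
        static int counter = 10;            // assigned during initialization
        static {
            events.add("Lazy initialized"); // static initializer block
        }
    }

    // The bootstrap loader is represented as null in the Java API.
    static boolean stringLoadedByBootstrap() {
        return String.class.getClassLoader() == null;
    }

    // Application classes are loaded by a regular ClassLoader object.
    static ClassLoader applicationLoader() {
        return LoaderDemo.class.getClassLoader();
    }

    static List<String> run() {
        events.add("before first use");
        int c = Lazy.counter;               // first use triggers initialization
        events.add("after first use, counter=" + c);
        return events;
    }

    public static void main(String[] args) {
        System.out.println("String via bootstrap: " + stringLoadedByBootstrap());
        // Walk from the application loader up through its parents.
        for (ClassLoader l = applicationLoader(); l != null; l = l.getParent()) {
            System.out.println("loader: " + l);
        }
        System.out.println(run());
    }
}
```

The names printed for the loaders depend on the Java version in use, so no particular output is assumed here. What should hold in any case is that Lazy is initialized only when its static field is first touched, not when LoaderDemo starts.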
2. Runtime Memory Areas: As we know, loading a class or storing a variable, constant, etc. requires memory, and executing code requires memory too. This memory comes from specially designated runtime data areas inside the JVM. The JVM has 5 runtime data areas.
- METHOD AREA: It holds all the class-level data (static variables, method code, etc.).
- HEAP AREA: It holds the object data.
- STACK AREA: For each thread, a separate memory area (TLA) is allocated for its stack. Each entry in the stack is known as a stack frame, and one frame is pushed for every method call. A frame consists of 3 parts: the local variable array, the operand stack and the frame data.
- PC REGISTERS AREA: Each thread has its own PC register, which holds the address of the instruction that thread is currently executing.
- NATIVE METHOD STACK AREA: This holds the stacks used by the native methods in our code.
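The difference between per-thread stacks and the shared heap can be sketched as follows. MemoryAreasDemo is a made-up name, and the comments describe where each piece of data conceptually lives according to the areas listed above, not how any particular JVM lays out memory.

```java
import java.util.concurrent.atomic.AtomicInteger;

class MemoryAreasDemo {
    // Class-level data: belongs to the class, not to any object.
    static final int ITERATIONS = 1000;

    // 'local' and 'i' live in the stack frame of the calling thread;
    // the AtomicInteger object lives on the heap and is shared.
    static int work(AtomicInteger shared) {
        int local = 0;
        for (int i = 0; i < ITERATIONS; i++) {
            local++;
            shared.incrementAndGet();
        }
        return local;
    }

    // Returns {local count of thread 1, local count of thread 2, shared count}.
    static int[] run() {
        AtomicInteger shared = new AtomicInteger(); // one heap object
        int[] results = new int[2];
        Thread t1 = new Thread(() -> results[0] = work(shared));
        Thread t2 = new Thread(() -> results[1] = work(shared));
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new int[] { results[0], results[1], shared.get() };
    }

    public static void main(String[] args) {
        int[] r = run();
        // Each thread counted privately to 1000; the heap object saw both.
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints 1000 1000 2000
    }
}
```

Each thread keeps its own copy of `local` because each has its own stack, while both threads update the same heap object.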
3. Execution Engine: This is the central part of the JVM architecture. It is responsible for translating your bytecode into machine code that the underlying machine can execute. It consists of 4 parts:
- INTERPRETER: The interpreter reads and executes our bytecode one instruction at a time.
- JIT COMPILER: Interpreting loop code instruction by instruction every time it runs can consume a lot of time. The Just-In-Time compiler comes to the rescue: it compiles a whole block of frequently executed bytecode into machine code once, so that it can be reused. It first converts the hotspot code into intermediate code with the help of the intermediate code generator. The code optimizer then optimizes the intermediate code, which is then converted into native (machine) code by the target code generator.
- PROFILER: How do we know whether the code contains loops or recursion while our bytecode is being interpreted? This is the profiler's job. The profiler identifies the hotspots in our bytecode and sends them to the JIT compiler.
- GARBAGE COLLECTOR: It is responsible for removing unused objects from the heap area to make room for new object allocations. This process of allocating new objects and removing unused ones to make space for further allocations is known as memory management. We will discuss it in detail in the next post.
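The interplay of interpreter, profiler and JIT compiler is easiest to see with a method that is called very often. The sketch below uses a hypothetical class HotLoopDemo and simply times several batches of calls. On a HotSpot JVM, later batches are often faster once the method has been compiled, but the timings vary from machine to machine and nothing here guarantees a particular result.

```java
class HotLoopDemo {
    // A small method that becomes "hot" when it is called many times.
    static long sumTo(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    // Calls sumTo repeatedly and returns the elapsed time in nanoseconds.
    static long timeBatch(int calls) {
        long start = System.nanoTime();
        long sink = 0;
        for (int c = 0; c < calls; c++) {
            sink += sumTo(1_000);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the work cannot be discarded as unused.
        if (sink == -1) {
            System.out.println("unreachable");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        for (int batch = 1; batch <= 5; batch++) {
            System.out.println("batch " + batch + ": " + timeBatch(20_000) + " ns");
        }
    }
}
```

On HotSpot, running the same class with `java -Xint HotLoopDemo` forces interpreter-only execution for comparison, and `java -XX:+PrintCompilation HotLoopDemo` logs which methods the JIT compiler picks up. Both flags are HotSpot-specific and may not exist on other JVMs.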
Sometimes, we require native code or libraries in our application. Access to this native code is provided by the JNI (Java Native Interface), and the code itself is stored in the native method libraries. The whole JVM architecture is shown in the diagram below:
The Java Virtual Machine exists only in the memory of our computer. Reproducing a machine within our computer's memory requires an instruction set for that machine, and that is the bytecode instruction set.
To examine bytecode, we can use the Java class file disassembler, javap. By examining bytecode instructions in detail, we gain valuable insight into the inner workings of the Java Virtual Machine and Java itself. Each bytecode instruction performs a specific function of extremely limited scope, such as pushing a value onto the stack or popping a value off the stack. Combinations of these basic functions represent the complex high-level tasks defined as statements in the Java programming language. As amazing as it seems, sometimes dozens of bytecode instructions are used to carry out the operation specified by a single Java statement. When we combine these bytecode instructions with the Virtual Machine, Java gains its platform independence, which is a large part of what makes it such a powerful and versatile programming language.
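As a quick way to try this out, here is a minimal class to disassemble. The class name Adder is only an example.

```java
class Adder {
    // A single Java statement that maps to a handful of bytecode instructions.
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```

After compiling with `javac Adder.java`, running `javap -c Adder` prints the bytecode of each method. For `add` you would typically see the two arguments being pushed onto the operand stack (`iload_0`, `iload_1`), added (`iadd`) and the result returned (`ireturn`), although the exact listing can vary slightly between compiler versions.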