Introduction

This manual describes the internals of the Jato virtual machine.

The reader is expected to have basic knowledge of the C programming language, Java Virtual Machine [Lindholm99] [Sun05] [JCP11], Java Native Interface [Sun03], and machine architecture.

The Virtual Machine

The virtual machine core manages classes and objects at runtime. It implements JVM semantics but at code level is mostly decoupled from classfiles, bytecode execution, and the Java runtime.

Classes

Classes are internally represented by struct vm_class and externally by java/lang/VMClass.

Objects

Objects are internally represented by struct vm_object and externally by java/lang/VMObject.

Methods

Methods are internally represented by struct vm_method and externally by java/lang/reflect/VMMethod.

Fields

Fields are internally represented by struct vm_field and externally by java/lang/reflect/VMField.

Exceptions

Exceptions are not handle by the core VM itself. Instead, they are dealt with by the JIT runtime support code.

The Just-in-Time Compiler

The compiler is divided into the following passes: control-flow graph construction, bytecode parsing, instruction selection, and code emission. The compiler analyzes the given bytecode sequence to find basic blocks for constructing the control-flow graph. This pass is done first to simplify parsing of bytecode branches. Bytecode sequence is then parsed and converted to an expression tree. The tree is given to the instruction selector to lower the IR to three-address code. Code emission phase converts that sequence to machine code which can be executed.

Programs are compiled one method at a time. Invocation of a method is replaced with an invocation of a special per-method JIT trampoline that is responsible for compiling the actual target method upon first invocation.

Compiler Passes

Subroutine Inlining

The first compiler pass is inlines all subroutines which are used to represent finally blocks in bytecode. This effectively eliminates the use of JSR, JSR_W and RET bytecode instruction which simplifies the rest of the JIT compiler pipeline.

Control-Flow Graph Analysis

The control-flow graph analysis pass detects basic block boundaries from bytecode and constructs a CFG.

Bytecode Parsing ("BC2IR")

The JVM has a stack-based architecture which means instructions operate on a stack. Modern CPUs, on the other hand, are register-based which means that instructions operate on registers. One of the first things the JIT compiler needs to do is to convert the stack-based bytecode into a register-based intermediate representation. The algorithm that does the conversion is referred to as BC2IR in literature. Entry point to the algorithm can be found in jit/bytecode-to-ir.c::convert_to_ir().

Instruction Selection

The instruction selector takes the HIR as an input and outputs LIR. The actual instruction selector is generated from a Monoburg rules file (e.g. arch/x86/insn-selector_32.brg).

Static-Single Assignment (SSA) Form

Static-Single Assignment Form is a representation form where every variable is defined exactly once and every use of a variable refers at most one definition. This intermediate representation form offers an uniform support for a large group of data analysis problems, and many optimizations and data flow algorithms are more efficient on the SSA form than on the normal Control Flow Graph form.

Support for SSA form implies two major steps: translating from LIR form to SSA form (lir_to_ssa function in jit/ssa.c) and translating from SSA form to LIR form (ssa_to_lir function in jit/ssa.c). The back translation is needed because the SSA form introduces virtual phi functions that cannot be processed by the code emission stage. Between these two steps some optimizations can be applied. Jato offers a simple variant of dead code elimination (jit/dce.c), copy folding (__rename_variables in jit/ssa.c) and a simple variant of array bound check elimination (jit/abc.c). All these optimizations are applied only if array bounds check elimination is required.

Liveness Analysis

Liveness analysis is a dataflow analysis that calculates which variables are in use at a given program point. The results of liveness analysis are used by the register allocator.

Register Allocation

The register allocator uses the linear scan register allocation algorithm [Poletto99] [Traub98] with interval splitting optimization [Wimmer04] [Wimmer05].

Resolution Blocks

The resolution blocks were introduced to solve the problem with reloading registers when jumping from one (source) basic block to another basic block (destination).

Typically when you have a virtual register which has live ranges before source block and in the destination block, what you must do is put MOV instructions at the end of the source basic block which restore register value from a spill slot. Actually you must put them before the JMP instruction if there is one at the end of source block.

Now, it might be the case that the ending JMP instruction is using a virtual register, which is allocated the same machine register as is to one of the virtual registers which are reloaded. If movs are before the jump, then the register which is used by JMP would by clobbered by those reload instructions and we would not jump into the right place. This happens because register allocator sees all those virtual registers as not live in the source block. As one might think of many different solutions to this problem, introducing resolution blocks was one that was probably the simplest one to implement. We put reload MOVs into an intermediate block on each (?) edge to avoid the problem of clobbering registers allocated to not-yet-dead virtual registers.

Code Generation

TODO

Intermediate Representations

The compiler uses two different intermediate representations: high-level intermediate representation (HIR) in the frontend and low-level intermediate representation (LIR) in the backend. The HIR is an abstract syntax tree (AST) of the compiled bytecode whereas LIR is corresponds almost one-to-one to the target machine instructions.

The JIT compiler operates on one method at a time called a compilation unit. A compilation unit is made up of one or more basic blocks which represent straight-line code. In HIR, each basic block has a list of one or more statements that can either be standalone or operate on one or two expression trees.

The instruction selector emits LIR for a compilation unit from HIR expression tree. This intermediate representation is essentially a sequence of instructions that mimic the native instruction set. One notable exception is branch targets which are represented as pointers to instructions. The pointers are converted to real machine code targets with back-patching during code emission.

Both HIR and LIR are standard intermediate representations that are documented in detail in, for example, [Muchnick97] and [Burke99]. Literature also describes a middle-level intermediate representation (MIR) but the compiler does not use that.

Intermediate Representations

High-Level Intermediate Representation (HIR)

For the front-end, we use a high-level intermediate representation (HIR) that is a forest of expression trees. That is, a compilation unit (a method) is divided into basic blocks that contain a list of statements and each statement can operate on an expression tree. Examples of statements include STMT_STORE that stores an expression to a local variable and STMT_IF that does conditional branch. The simplest form of expression is EXPR_VALUE which represents a constant value but there are more complex types of expressions including EXPR_BINOP for binary operations and EXPR_INVOKE for method invocation.

  • struct statement

  • struct expression

Low-Level Intermediate Representation (LIR)

  • struct operand

  • struct insn

Machine Architecture Support

TODO

Application Binary Interface (ABI)

TODO

Instruction Encoding

TODO

Java Runtime Interface

The virtual machine relies on GNU Classpath to provide essential Java libraries including java/lang/Object and java/lang/Class. GNU Classpath integration is documented in more detail here:

References

  • [Lindholm99] Tim Lindholm and Frank Yellin. The Java™ Virtual Machine Specification, 2nd Ed. 1999. URL

  • [Burke99] Michael Burke et al. The Jalapeno Dynamic Optimizing Compiler for Java. 1999. URL

  • [JCP11] JCP. Maintenance Review of JSR 924 (Java™ Virtual Machine Specification) for Java SE 7. 2011. URL

  • [Muchnick97] Steven Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann. 1997. ISBN 1558603204.

  • [Poletto99] Massimiliano Poletto and Vivek Sarkar. Linear scan register allocation. 1999. URL

  • [Sun03] Sun Microsystems. Java Native Interface 5.0 Specification. 2003. URL

  • [Sun05] Sun Microsystems. Clarifications and Amendments to the Java Virtual Machine Specification. 2005. URL

  • [Traub98] Omri Traub, Glenn Holloway, and Michael D. Smith M. Quality and Speed in Linear-scan Register Allocation. 1998. URL

  • [Wimmer04] Christian Wimmer. Linear Scan Register Allocation for the Java HotSpot™ Client Compiler. 2004. URL

  • [Wimmer05] Christian Wimmer and Hanspeter Mössenböck. Optimized Interval Splitting in a Linear Scan Register Allocator. 2005. URL