Documentation
Reference Documentation
Here is some related documentation that you will probably find useful when hacking on Jato:
Java Virtual Machine
- Lindholm, T., and Yellin, F. 1999. The Java™ Virtual Machine Specification, 2nd Ed. URL
- Sun Microsystems. 2005. Clarifications and Amendments to the Java Virtual Machine Specification. URL
- Sun Microsystems. 2003. Java Native Interface 5.0 Specification. URL
- JCP. 2011. Maintenance Review of JSR 924 (Java™ Virtual Machine Specification) for Java SE 7. URL
Intel i386/x86-64
- Intel® 64 and IA-32 Architectures Software Developer's Manuals. URL
- System V Application Binary Interface Intel386™ Architecture Processor Supplement. URL
- AMD64 Application Binary Interface. URL
PowerPC
- Power Instruction Set Architecture. URL
Dynamic Compilation
- Arnold, M., et al. 2000. Adaptive Optimization in the Jalapeño JVM. URL
- Burke, M., et al. 1999. The Jalapeño Dynamic Optimizing Compiler for Java. URL
- Parikh, V., and Stichnoth, J. 1998. Fast, Effective Code Generation in a Just-in-Time Java Compiler. URL
Method Invocation
- Alpern, B., et al. 2001. Efficient Implementation of Java Interfaces: Invokeinterface Considered Harmless. URL
Register Allocation
- Poletto, M., and Sarkar, V. 1999. Linear Scan Register Allocation. URL
- Traub, O., Holloway, G., and Smith, M. D. 1998. Quality and Speed in Linear-scan Register Allocation. URL
- Wimmer, C. 2004. Linear Scan Register Allocation for the Java HotSpot™ Client Compiler. URL
- Wimmer, C., and Mössenböck, H. 2005. Optimized Interval Splitting in a Linear Scan Register Allocator. URL
Garbage Collection
- Diwan, A., Moss, E., and Hudson, R. 1992. Compiler Support for Garbage Collection in a Statically Typed Language. URL
- Agesen, O. 1998. GC Points in a Threaded Environment. URL
Exception Handling
- Lee, S., et al. 2000. Efficient Java Exception Handling in Just-in-Time Compilation. URL
Optimization
- Würthinger, T., Wimmer, C., and Mössenböck, H. 2007. Array Bounds Check Elimination for the Java HotSpot™ Client Compiler. URL
- Wimmer, C., and Mössenböck, H. 2008. Automatic Array Inlining in Java Virtual Machines. URL
Compiler Design Overview
The Front-End
The front-end is responsible for parsing bytecodes and generating expression trees for them, which are then consumed by the instruction selector. When you add front-end support for a bytecode, you're strongly encouraged to write the corresponding back-end passes (instruction selection and code emission) at the same time to make sure the high-level intermediate representation makes sense.
For the front-end, we use a high-level intermediate representation (HIR) that
is a forest of expression trees. That is, a compilation unit (a method) is
divided into basic blocks. Each basic block contains a list of statements,
and each statement can operate on an expression tree. Examples of statements
include STMT_STORE, which stores an expression to a local variable, and
STMT_IF, which performs a conditional branch. The simplest form of
expression is EXPR_VALUE, which represents a constant value, but
there are more complex types of expressions, including binary operations
(EXPR_BINOP) and method invocation (EXPR_INVOKE).
The relationships between a compilation unit, basic blocks, statements, and
expressions are illustrated in Figure 1.
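The containment hierarchy described above can be sketched in C as follows. This is only an illustration: the enumerator names come from the text, but every struct layout, field name, and helper function (expr_eval, count_statements) is an assumption made for this sketch and is not taken from include/jit/statement.h or include/jit/expression.h.

```c
#include <assert.h>
#include <stddef.h>

/* Expression and statement kinds named in the text. */
enum expr_type { EXPR_VALUE, EXPR_BINOP };
enum stmt_type { STMT_STORE, STMT_IF };

/* Hypothetical layout: a node is either a constant or a binary operation. */
struct expression {
	enum expr_type type;
	long value;                       /* EXPR_VALUE: the constant */
	char op;                          /* EXPR_BINOP: '+' or '*' */
	struct expression *left, *right;  /* EXPR_BINOP: operands */
};

/* A statement is the root of one expression tree in the forest. */
struct statement {
	enum stmt_type type;
	unsigned long local_index;        /* STMT_STORE: destination local */
	struct expression *expr;          /* value stored or condition tested */
	struct statement *next;           /* next statement in the basic block */
};

struct basic_block {
	struct statement *stmt_list;
	struct basic_block *next;
};

/* One compilation unit per method. */
struct compilation_unit {
	struct basic_block *bb_list;
};

/* Walk an expression tree and compute its constant value. */
long expr_eval(const struct expression *e)
{
	if (e->type == EXPR_VALUE)
		return e->value;
	assert(e->type == EXPR_BINOP);
	long l = expr_eval(e->left);
	long r = expr_eval(e->right);
	return e->op == '+' ? l + r : l * r;
}

/* Count the statements of every basic block in a compilation unit. */
size_t count_statements(const struct compilation_unit *cu)
{
	size_t n = 0;

	for (const struct basic_block *bb = cu->bb_list; bb; bb = bb->next)
		for (const struct statement *s = bb->stmt_list; s; s = s->next)
			n++;
	return n;
}
```

In this sketch, a store of `1 + 2` to local 0 would be one STMT_STORE statement whose expr field points at an EXPR_BINOP node with two EXPR_VALUE children.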
The individual bytecodes are converted either to statements or expressions,
depending on whether they have side effects and on how the results of
the operations are used by other bytecodes (see include/jit/statement.h
and include/jit/expression.h for further details).
You can find more information about the bytecode instruction set in
Chapter 6 of the
Java Virtual Machine Specification.
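One common way to turn stack-based bytecode into expression trees is to keep a compile-time stack of expressions that mirrors the operand stack. The sketch below assumes that approach; it is not necessarily how Jato's front-end is organized. The opcode values are the ones defined by the Java Virtual Machine Specification, while struct converter, convert(), and all field names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Opcode values as defined by the JVM specification. */
enum {
	OPC_ICONST_1 = 0x04,
	OPC_ICONST_2 = 0x05,
	OPC_ISTORE_0 = 0x3b,
	OPC_IADD     = 0x60
};

enum expr_type { EXPR_VALUE, EXPR_BINOP };

struct expression {
	enum expr_type type;
	int value;                        /* EXPR_VALUE only */
	struct expression *left, *right;  /* EXPR_BINOP only */
};

/* Hypothetical STMT_STORE: store an expression to a local variable. */
struct statement {
	unsigned local_index;
	struct expression *expr;
};

#define MAX_NODES 16

struct converter {
	struct expression pool[MAX_NODES];   /* node storage, no malloc */
	size_t nr_exprs;
	struct expression *stack[MAX_NODES]; /* mirrors the operand stack */
	size_t depth;
	struct statement stmts[MAX_NODES];
	size_t nr_stmts;
};

static struct expression *new_expr(struct converter *c, enum expr_type type,
				   int value, struct expression *l,
				   struct expression *r)
{
	struct expression *e;

	if (c->nr_exprs == MAX_NODES)
		return NULL;
	e = &c->pool[c->nr_exprs++];
	e->type = type;
	e->value = value;
	e->left = l;
	e->right = r;
	return e;
}

/* Returns 0 on success, -1 on an unsupported opcode or malformed input. */
int convert(struct converter *c, const unsigned char *code, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		struct expression *e, *l, *r;

		switch (code[i]) {
		case OPC_ICONST_1:
		case OPC_ICONST_2:
			/* No side effect: becomes an expression on the stack. */
			e = new_expr(c, EXPR_VALUE, code[i] - OPC_ICONST_1 + 1,
				     NULL, NULL);
			if (!e || c->depth == MAX_NODES)
				return -1;
			c->stack[c->depth++] = e;
			break;
		case OPC_IADD:
			if (c->depth < 2)
				return -1;
			r = c->stack[--c->depth];
			l = c->stack[--c->depth];
			e = new_expr(c, EXPR_BINOP, 0, l, r);
			if (!e)
				return -1;
			c->stack[c->depth++] = e;
			break;
		case OPC_ISTORE_0:
			/* Side effect: becomes a statement rooted at the popped tree. */
			if (c->depth < 1 || c->nr_stmts == MAX_NODES)
				return -1;
			c->stmts[c->nr_stmts].local_index = 0;
			c->stmts[c->nr_stmts].expr = c->stack[--c->depth];
			c->nr_stmts++;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

Feeding the sequence iconst_1, iconst_2, iadd, istore_0 to a zero-initialized converter yields a single store statement whose expression is a binary operation over the constants 1 and 2, and leaves the expression stack empty.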
Figure 1: Conceptual model of the Compiler
The Back-End
The back-end is responsible for instruction selection, register allocation, and code emission. The compiler doesn't do any optimizations yet. Both instruction selection and code emission are architecture-specific, whereas register allocation has only some per-architecture parts. The instruction selector takes the HIR as input and outputs a list of instructions for each basic block as a low-level intermediate representation (LIR), as illustrated in Figure 1. The per-architecture LIR is very similar to the target machine code. The exception is branch instructions, whose target offsets cannot be calculated until very late in the code emission phase.
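Branch offsets have to wait because the address of a target basic block is unknown until the code before it has been emitted. A minimal sketch of one way to handle this, assuming a two-pass scheme with backpatching, is shown below. The x86 encodings used (0x90 for NOP, 0xE9 for JMP rel32 with a little-endian displacement relative to the end of the jump) are real, but struct lir_block and emit_blocks() are hypothetical and do not reflect Jato's actual emitter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 4

/* Hypothetical LIR view of a basic block: straight-line code followed by
 * an optional unconditional branch to another block. */
struct lir_block {
	size_t nr_nops;      /* stand-in for the block's ordinary instructions */
	int branch_target;   /* index of the target block, or -1 for none */
};

/* Emit all blocks into buf, then resolve branch displacements once every
 * block address is known. Returns bytes emitted, or 0 on error. */
size_t emit_blocks(const struct lir_block *blocks, size_t nr_blocks,
		   uint8_t *buf, size_t size)
{
	size_t block_start[MAX_BLOCKS];
	size_t fixup_pos[MAX_BLOCKS];   /* offset of each block's rel32 field */
	size_t pos = 0;

	if (nr_blocks > MAX_BLOCKS)
		return 0;

	/* Pass 1: emit code, leaving room for the unknown displacements. */
	for (size_t i = 0; i < nr_blocks; i++) {
		size_t need = blocks[i].nr_nops;

		if (blocks[i].branch_target >= 0)
			need += 5;      /* opcode byte + 4-byte displacement */
		if (need > size - pos)
			return 0;

		block_start[i] = pos;
		for (size_t n = 0; n < blocks[i].nr_nops; n++)
			buf[pos++] = 0x90;      /* NOP */
		if (blocks[i].branch_target >= 0) {
			buf[pos++] = 0xE9;      /* JMP rel32 */
			fixup_pos[i] = pos;
			for (int k = 0; k < 4; k++)
				buf[pos++] = 0; /* placeholder */
		}
	}

	/* Pass 2: backpatch each displacement. */
	for (size_t i = 0; i < nr_blocks; i++) {
		int target = blocks[i].branch_target;
		int32_t rel;
		uint32_t bits;

		if (target < 0)
			continue;
		if ((size_t)target >= nr_blocks)
			return 0;

		/* rel32 is measured from the end of the jump instruction. */
		rel = (int32_t)block_start[target] - (int32_t)(fixup_pos[i] + 4);
		bits = (uint32_t)rel;
		for (int k = 0; k < 4; k++)
			buf[fixup_pos[i] + k] = (uint8_t)(bits >> (8 * k));
	}
	return pos;
}
```

Forward branches are the reason the second pass is needed: when the jump in the first block is emitted, the start address of a later block has not been recorded yet.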
The architecture-specific instruction selector is generated with Monoburg, a code generator generator that produces tree-pattern matchers from a Burg-like specification.
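To give a feel for what a tree-pattern matcher does, here is a small hand-written sketch in C. It is not Monoburg output and does not use Monoburg's specification syntax. It is also greedy, whereas Burg-style matchers choose a minimum-cost cover by dynamic programming. The LIR opcode names, struct insn_list, and select_insns() are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum expr_type { EXPR_VALUE, EXPR_BINOP };

struct expression {
	enum expr_type type;
	long value;                       /* EXPR_VALUE only */
	struct expression *left, *right;  /* EXPR_BINOP only (addition) */
};

/* Hypothetical LIR opcodes for an x86-like target. */
enum lir_op { LIR_MOV_IMM_REG, LIR_ADD_IMM_REG, LIR_ADD_REG_REG };

#define MAX_INSNS 16

struct insn_list {
	enum lir_op ops[MAX_INSNS];
	size_t nr;
};

static void emit(struct insn_list *l, enum lir_op op)
{
	assert(l->nr < MAX_INSNS);
	l->ops[l->nr++] = op;
}

/* Cover the tree with instruction patterns, trying the most specific
 * pattern first at each node. */
void select_insns(const struct expression *e, struct insn_list *out)
{
	if (e->type == EXPR_VALUE) {
		/* Pattern: reg <- EXPR_VALUE */
		emit(out, LIR_MOV_IMM_REG);
		return;
	}

	select_insns(e->left, out);

	if (e->right->type == EXPR_VALUE) {
		/* Pattern: reg <- EXPR_BINOP(reg, EXPR_VALUE).
		 * The constant is folded into the add as an immediate. */
		emit(out, LIR_ADD_IMM_REG);
		return;
	}

	/* Pattern: reg <- EXPR_BINOP(reg, reg) */
	select_insns(e->right, out);
	emit(out, LIR_ADD_REG_REG);
}
```

For the tree `(1 + 2) + (3 + 4)`, this sketch selects five instructions: each inner addition becomes a move followed by an add-immediate, and the outer addition becomes a register-to-register add. Register assignment is left out, since in the pipeline described above it belongs to the register allocator.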