In the previous instalment in this series, we discussed why benchmarks can be limiting, especially when evaluating potential changes to the JIT itself. In this article, we will backtrack a bit and discuss how custom assembly snippets can actually be benchmarked in Java using the JVMCI (JVM Compiler Interface). This article is heavy on code snippets, for anyone wanting to follow along with future articles at home.
Let's say we have the following program and we run it:
package com.overlyenthusiastic.examples;

public class Substitute {
    public static void main(String[] aArgs) throws InterruptedException {
        System.out.println(getInteger());
        Thread.sleep(1_000);
        System.out.println(getInteger());
    }

    private static int getInteger() {
        return 7;
    }
}
What would you expect this to output? Naively, you might assume it prints two lines, both containing 7. Such assumptions are sometimes safe when dealing with sane co-workers but, to me at least, this is far too pedestrian: I would like it to print 7 on the first line and 8 on the second.
To achieve this, we're going to use the JVMCI (the same interface used by Graal) to intercept the compilation request and hand back a custom blob of machine code. First, we'll need a compiler directives file and some command line arguments to force compilation of this method and dump its assembly for us to modify:
-XX:+UnlockDiagnosticVMOptions
-XX:-TieredCompilation
-XX:CompilerDirectivesFile=./samples/src/main/resources/substitute_compiler_directives
-XX:PrintAssemblyOptions=intel
-XX:+UseG1GC
-Xcomp
The substitute_compiler_directives file referenced by these flags contains:
[
  {
    match: "com.overlyenthusiastic.examples.Substitute::main",
    inline: [
      "-*::*"
    ]
  },
  {
    match: "com.overlyenthusiastic.examples.Substitute::getInteger",
    Exclude: false,
    c2: {
      PrintNMethods: true
    }
  },
  {
    match: "*::*",
    Exclude: true
  }
]
With these in place, I get the following output for getInteger:
    sub rsp, 0x18
    mov qword ptr [rsp+0x10], rbp
    cmp dword ptr [r15+0x20], dword 0 ; Thread::_nmethod_disarmed_guard_value
    jne entry_barrier
method_start:
    mov eax, 0x7
    add rsp, 0x10
    pop rbp
safepoint0_check:
    cmp rsp, qword ptr [r15+0x448] ; JavaThread::_poll_data::_polling_word
    ja safepoint0
    ret
safepoint0:
    movabs r10, qword safepoint0_check
    mov qword ptr [r15+0x460], r10 ; JavaThread::_saved_exception_pc
    jmp Runtime::Safepoint
entry_barrier:
    call Runtime::EntryBarrier
    jmp method_start
; [Exception Handler]
    jmp Runtime::ExceptionHandler
; [Deopt Handler Code]
    call deopt
deopt:
    sub qword ptr [rsp], 0x5
    jmp Runtime::Deoptimise
That's far more ceremony than I'd like. Let's change the 7 to an 8 and trim it down to:
    cmp dword ptr [r15+0x20], dword 0
    mov eax, 0x8
    ret
The cmp is there to trick the JVM into thinking we've got an entry barrier, which is required for methods installed through the JVMCI. These three instructions assemble to the 14 bytes 41817F2000000000B808000000C3, which we can feed into the VM using a dummy compiler like this:
package com.overlyenthusiastic;

import jdk.vm.ci.code.*;
import jdk.vm.ci.code.site.*;
import jdk.vm.ci.hotspot.*;
import jdk.vm.ci.meta.*;
import jdk.vm.ci.runtime.*;

import java.util.HexFormat;

public class DummyCompiler implements JVMCICompiler {
    @Override
    public CompilationRequestResult compileMethod(CompilationRequest aRequest) {
        // On-stack replacement (OSR) requests have a real entry BCI; we only handle normal entry.
        if (aRequest.getEntryBCI() != -1) {
            return HotSpotCompilationRequestResult.failure("OSR is not supported.", false);
        }
        // cmp dword ptr [r15+0x20], dword 0; mov eax, 0x8; ret
        final byte[] myMachineCode = HexFormat.of().parseHex("41817F2000000000B808000000C3");
        // Mark ID that tells the VM where the nmethod entry barrier instruction is.
        final int myEntryBarrierPatchId = (int) (long) ((HotSpotVMConfigAccess) ((HotSpotJVMCIRuntime) JVMCI.getRuntime()).getConfig())
                .getConstant("CodeInstaller::ENTRY_BARRIER_PATCH", Long.class);
        final HotSpotCompilationRequest myRequest = (HotSpotCompilationRequest) aRequest;
        JVMCI.getRuntime().getHostJVMCIBackend().getCodeCache().installCode(aRequest.getMethod(),
                new HotSpotCompiledNmethod("dummy", myMachineCode, myMachineCode.length,
                        // The entry barrier cmp starts at offset 0.
                        new Site[] { new Mark(0, myEntryBarrierPatchId) },
                        new Assumptions.Assumption[0], new ResolvedJavaMethod[0], new HotSpotCompiledCode.Comment[0],
                        new byte[0], 8, // empty data section, 8-byte alignment
                        new DataPatch[0], false, 16, // no data patches, not immutable PIC, 16-byte total frame size
                        StackSlot.get(ValueKind.Illegal, 0, false),
                        (HotSpotResolvedJavaMethod) aRequest.getMethod(), aRequest.getEntryBCI(),
                        myRequest.getId(), myRequest.getJvmciEnv(), false),
                null, // no existing InstalledCode object to reuse
                new HotSpotSpeculationLog(),
                true); // make this the method's default code
        return HotSpotCompilationRequestResult.success(0);
    }

    @Override
    public boolean isGCSupported(int aGcIdentifier) {
        return true; // Not dealing with anything needing GC barriers.
    }
}
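The hex string passed to parseHex can be reproduced by hand. The following self-contained sketch (EncodeTrimmedSnippet and its helpers are hypothetical names of my own, not part of JVMCI) encodes the three trimmed instructions byte by byte using the standard x86-64 layout of REX prefix, opcode, ModRM, displacement and little-endian immediate. It is only meant to show where each byte comes from; for anything larger you would reach for a real assembler.

```java
import java.util.HexFormat;

// Hand-encodes the three trimmed instructions to show where each byte comes from.
public class EncodeTrimmedSnippet {
    private static final HexFormat HEX = HexFormat.of().withUpperCase();

    // ModRM byte: mod (2 bits) | reg or opcode extension (3 bits) | rm (3 bits).
    static int modRm(int mod, int reg, int rm) {
        return (mod << 6) | (reg << 3) | rm;
    }

    // Appends a single byte as two hex digits.
    static void emit(StringBuilder out, int value) {
        out.append(HEX.toHexDigits((byte) value));
    }

    // Appends a 32-bit immediate in little-endian order, as x86 stores it.
    static void emitImm32(StringBuilder out, int value) {
        for (int i = 0; i < 4; i++) {
            emit(out, value >>> (8 * i));
        }
    }

    static String encode() {
        final int r15 = 15;
        final int eax = 0;
        StringBuilder out = new StringBuilder();

        // cmp dword ptr [r15+0x20], dword 0
        emit(out, 0x41);                    // REX prefix with only B set: supplies bit 3 of r15's number
        emit(out, 0x81);                    // group-1 ALU op with a 32-bit immediate
        emit(out, modRm(0b01, 7, r15 & 7)); // mod=01 (8-bit displacement), /7 = CMP, rm = low bits of r15
        emit(out, 0x20);                    // the 8-bit displacement
        emitImm32(out, 0);                  // the immediate being compared against

        // mov eax, 0x8: opcode B8 plus the register number, then a 32-bit immediate
        emit(out, 0xB8 + eax);
        emitImm32(out, 8);

        // ret
        emit(out, 0xC3);
        return out.toString();
    }

    public static void main(String[] args) {
        String encoded = encode();
        System.out.println(encoded + " (" + HexFormat.of().parseHex(encoded).length + " bytes)");
    }
}
```

Running main should print the same 14-byte string that DummyCompiler hands to parseHex.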
We'll also need a JVMCIServiceLocator implementation so that the JVMCI runtime can find a factory for our compiler:
package com.overlyenthusiastic;

import jdk.vm.ci.runtime.*;
import jdk.vm.ci.services.*;

import javax.annotation.Nullable;

public class DummyServiceLocator extends JVMCIServiceLocator {
    @Nullable
    @Override
    public <S> S getProvider(Class<S> aService) {
        if (aService == JVMCICompilerFactory.class) {
            //noinspection unchecked
            return (S) new JVMCICompilerFactory() {
                @Override
                public String getCompilerName() {
                    return "Dummy";
                }

                @Override
                public JVMCICompiler createCompiler(JVMCIRuntime aRuntime) {
                    return new DummyCompiler();
                }
            };
        }
        return null;
    }
}
And a META-INF/services/jdk.vm.ci.services.JVMCIServiceLocator file in our resources containing:
com.overlyenthusiastic.DummyServiceLocator
And finally some command line arguments to go along with this:
-XX:+UnlockDiagnosticVMOptions
-XX:-TieredCompilation
-XX:CompileThreshold=1
-XX:CompilerDirectivesFile=./samples/src/main/resources/substitute_compiler_directives
-XX:+UseG1GC
-XX:+UnlockExperimentalVMOptions
-XX:+EnableJVMCI
-XX:+UseJVMCICompiler
-Djvmci.Compiler=Dummy
-Xbootclasspath/a:./samples/target/classes
--add-modules jdk.internal.vm.ci
--add-exports jdk.internal.vm.ci/jdk.vm.ci.code=ALL-UNNAMED
--add-exports jdk.internal.vm.ci/jdk.vm.ci.hotspot=ALL-UNNAMED
--add-exports jdk.internal.vm.ci/jdk.vm.ci.meta=ALL-UNNAMED
--add-exports jdk.internal.vm.ci/jdk.vm.ci.runtime=ALL-UNNAMED
--add-exports jdk.internal.vm.ci/jdk.vm.ci.services=ALL-UNNAMED
Phew. Ok, that was a lot. Let's run our application now.
7
8
Ok! What's happening here is that the first call to getInteger goes through the interpreter, while -XX:CompileThreshold=1 causes a compilation to be queued immediately and carried out in the background. That compilation completes while the main thread sleeps, so by the time we call getInteger a second time, our new and improved code hands the caller a bigger (and therefore better) int.
For isolated examples like this one, we don't actually want our toy compiler to act as the VM's JIT compiler. Instead, we can compile individual methods on demand by adding the following method to DummyCompiler:
// Requires: import java.lang.reflect.Executable;
public CompilationRequestResult compileMethod(Executable aMethod) {
    final ResolvedJavaMethod myMethod = HotSpotJVMCIRuntime.runtime().getHostJVMCIBackend().getMetaAccess().lookupJavaMethod(aMethod);
    // An entry BCI of -1 requests a normal (non-OSR) compilation.
    return compileMethod(new HotSpotCompilationRequest((HotSpotResolvedJavaMethod) myMethod, -1, 0, 0));
}
This can be manually invoked using:
final Method myMethod = SomeClass.class.getMethod("someMethod");
new DummyCompiler().compileMethod(myMethod); // TODO: Check the result object for an error
This lets us drop -XX:+UseJVMCICompiler, -Djvmci.Compiler=Dummy and -Xbootclasspath/a:... from the command line (we still need -XX:+EnableJVMCI), and makes the approach suitable for larger programs too, for example to replace a single method in a testing or staging environment, if that were ever desired. Placing such a call at the top of our main method results in the following output:
8
8
Delivering non-trivial code through JVMCI by pasting in raw machine code (for example, snippets containing embedded object literals, raw data constants, implicit and explicit exceptions, working safepoints, and so on) is possible, but frustrating. Instead, I'll use a custom assembler (the source of which is too large to fit in the margin of this blog) to parse snippets, as shown below, that carry additional mark-up communicating metadata to the runtime: the location of the nmethod_entry_barrier from above, calls that need to be relocated or fixed up, exception information, and so on. Future articles in this series will show the assembly being benchmarked, but this is the last (and, I suppose, first) time I'll provide the accompanying machine code and site/mark information. It is verbose, likely to be rendered useless by JVM changes, effortful to produce, and unlikely to ever be used by a reader.
Here is the getInteger example with annotated site/mark information:
    sub rsp, dword 0x18 ;!mark verified_entry
    mov qword ptr [rsp+0x10], rbp
    cmp dword ptr [r15+0x20], dword 0 ;!mark nmethod_entry_barrier
    jne entry_barrier
method_start:
    mov eax, 0x8
    add rsp, 0x10
    pop rbp
safepoint0_check:
    cmp rsp, qword ptr [r15+0x448] ;!site safepoint ; JavaThread::_poll_data::_polling_word
    ja safepoint0
    ret
safepoint0:
    lea r10, [rip + rel safepoint0_check]
    mov qword ptr [r15+0x460], r10 ; JavaThread::_saved_exception_pc
    jmp Runtime::Safepoint ;!site internal_call
entry_barrier:
    call Runtime::EntryBarrier ;!site internal_call
    jmp method_start
    jmp Runtime::ExceptionHandler ;!mark exception_handler_entry !site call
    call deopt ;!mark deopt_handler_entry
deopt:
    sub qword ptr [rsp], 0x5
    jmp Runtime::Deoptimise ;!site internal_call
This assembles to the following 88 bytes: 4881EC1800000048896C241041817F20000000007527B8080000004883C4105D493BA7480400007701C34C8D15000000004D899760040000E900000000E800000000EBD2E900000000E80000000048832C2405E900000000.
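The ;!mark and ;!site comments in the listing are where the metadata lives. The assembler itself isn't shown, so the following is only a hypothetical sketch of how such directives might be pulled off each line. It assumes a syntax inferred from the listing: directives start at the first ";!", come in "!kind name" pairs, and a later ";" begins an ordinary comment. MarkupSketch, Directive and directives are illustrative names, not the actual tool.

```java
import java.util.ArrayList;
import java.util.List;

public class MarkupSketch {
    // One metadata directive attached to an instruction, e.g. kind "mark", name "verified_entry".
    record Directive(String kind, String name) {}

    // Extracts "!kind name" directives from a single assembly line (assumed syntax, see above).
    static List<Directive> directives(String line) {
        List<Directive> result = new ArrayList<>();
        int start = line.indexOf(";!");
        if (start < 0) {
            return result; // no mark-up on this line (plain comments are ignored)
        }
        String tail = line.substring(start + 1);
        int comment = tail.indexOf(';');
        if (comment >= 0) {
            tail = tail.substring(0, comment); // drop a trailing ordinary comment
        }
        String[] tokens = tail.trim().split("\\s+");
        int i = 0;
        while (i < tokens.length) {
            if (tokens[i].startsWith("!") && i + 1 < tokens.length) {
                result.add(new Directive(tokens[i].substring(1), tokens[i + 1]));
                i += 2;
            } else {
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(directives("cmp rsp, qword ptr [r15+0x448] ;!site safepoint ; JavaThread::_poll_data::_polling_word"));
        System.out.println(directives("jmp Runtime::ExceptionHandler ;!mark exception_handler_entry !site call"));
    }
}
```

A real assembler would then turn each directive into the matching Mark, Infopoint or Call at the instruction's byte offset.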
The sites are as follows, with offsets in bytes from the start of the code:
// ...
final long myDeoptBlobAddress = getAddress("CompilerToVM::Data::SharedRuntime_deopt_blob_unpack");
final long myEntryBarrierAddress = getAddress("CompilerToVM::Data::nmethod_entry_barrier");
final long mySafepointBlobAddress = getAddress("CompilerToVM::Data::SharedRuntime_polling_page_return_handler");
final long myExceptionHandlerAddress = getAddress("CompilerToVM::Data::SharedRuntime_polling_page_return_handler");
final int myVerifiedEntryId = (int) getConstant("CodeInstaller::VERIFIED_ENTRY");
final int myEntryBarrierPatchId = (int) getConstant("CodeInstaller::ENTRY_BARRIER_PATCH");
final int myExceptionHandlerEntryId = (int) getConstant("CodeInstaller::EXCEPTION_HANDLER_ENTRY");
final int myDeoptHandlerEntryId = (int) getConstant("CodeInstaller::DEOPT_HANDLER_ENTRY");

final DebugInfo myDebugInfo = new DebugInfo(new BytecodeFrame(null, aRequest.getMethod(), 0,
        true, false, new JavaValue[0], new JavaKind[0], 0, 0, 0),
        new VirtualObject[0]);
myDebugInfo.setReferenceMap(new HotSpotReferenceMap(new Location[0], new Location[0], new int[0], 0));

final Site[] mySites = new Site[] {
        new Mark(0x00, myVerifiedEntryId),                           // sub rsp, dword 0x18
        new Mark(0x0C, myEntryBarrierPatchId),                       // cmp dword ptr [r15+0x20], dword 0
        new Infopoint(0x20, myDebugInfo, InfopointReason.SAFEPOINT), // cmp rsp, qword ptr [r15+0x448]
        new Call(new HotSpotForeignCallTarget(mySafepointBlobAddress), 0x38, 5, true, myDebugInfo),    // jmp Runtime::Safepoint
        new Call(new HotSpotForeignCallTarget(myEntryBarrierAddress), 0x3D, 5, true, myDebugInfo),     // call Runtime::EntryBarrier
        new Mark(0x44, myExceptionHandlerEntryId),                   // jmp Runtime::ExceptionHandler
        new Call(new HotSpotForeignCallTarget(myExceptionHandlerAddress), 0x44, 5, true, myDebugInfo), // jmp Runtime::ExceptionHandler
        new Mark(0x49, myDeoptHandlerEntryId),                       // call deopt
        new Call(new HotSpotForeignCallTarget(myDeoptBlobAddress), 0x53, 5, true, myDebugInfo),        // jmp Runtime::Deoptimise
};
// ...

private static long getConstant(String aName) {
    return ((HotSpotVMConfigAccess) ((HotSpotJVMCIRuntime) JVMCI.getRuntime()).getConfig())
            .getConstant(aName, Long.class);
}

private static long getAddress(String aName) {
    return ((HotSpotVMConfigAccess) ((HotSpotJVMCIRuntime) JVMCI.getRuntime()).getConfig())
            .getFieldValue(aName, Long.class, "address");
}
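The offsets used in the sites fall straight out of the instruction lengths. As a sanity check, here is a sketch that slices the 88-byte hex string into the instructions of the annotated listing (the split is my own, worked out from the encodings; SiteOffsets and Insn are hypothetical names) and computes each instruction's offset as the running sum of the lengths before it.

```java
public class SiteOffsets {
    // One instruction from the annotated listing and its encoding as hex.
    record Insn(String text, String hex) {}

    static final Insn[] LISTING = {
        new Insn("sub rsp, dword 0x18", "4881EC18000000"),
        new Insn("mov qword ptr [rsp+0x10], rbp", "48896C2410"),
        new Insn("cmp dword ptr [r15+0x20], dword 0", "41817F2000000000"),
        new Insn("jne entry_barrier", "7527"),
        new Insn("mov eax, 0x8", "B808000000"),
        new Insn("add rsp, 0x10", "4883C410"),
        new Insn("pop rbp", "5D"),
        new Insn("cmp rsp, qword ptr [r15+0x448]", "493BA748040000"),
        new Insn("ja safepoint0", "7701"),
        new Insn("ret", "C3"),
        new Insn("lea r10, [rip + rel safepoint0_check]", "4C8D1500000000"),
        new Insn("mov qword ptr [r15+0x460], r10", "4D899760040000"),
        new Insn("jmp Runtime::Safepoint", "E900000000"),
        new Insn("call Runtime::EntryBarrier", "E800000000"),
        new Insn("jmp method_start", "EBD2"),
        new Insn("jmp Runtime::ExceptionHandler", "E900000000"),
        new Insn("call deopt", "E800000000"),
        new Insn("sub qword ptr [rsp], 0x5", "48832C2405"),
        new Insn("jmp Runtime::Deoptimise", "E900000000"),
    };

    // Byte offset of each instruction: the running sum of the lengths of all earlier ones.
    static int[] offsets(Insn[] listing) {
        int[] result = new int[listing.length];
        int offset = 0;
        for (int i = 0; i < listing.length; i++) {
            result[i] = offset;
            offset += listing[i].hex().length() / 2; // two hex digits per byte
        }
        return result;
    }

    public static void main(String[] args) {
        int[] offsets = offsets(LISTING);
        for (int i = 0; i < LISTING.length; i++) {
            System.out.printf("0x%02X  %s%n", offsets[i], LISTING[i].text());
        }
    }
}
```

The entry barrier cmp lands at 0x0C, the safepoint poll at 0x20 and the final jmp at 0x53, matching the Mark, Infopoint and Call offsets above.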
To benchmark the JVMCI-installed getInteger against its stock equivalent, we'll use JMH:
package com.overlyenthusiastic.benchmarks;

import com.overlyenthusiastic.DummyCompiler;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import org.openjdk.jmh.runner.options.TimeValue;

import javax.annotation.Nullable;
import java.lang.reflect.Method;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
public class Article3 {
    @Setup
    public void setup() throws Exception {
        // Install the hand-written machine code for getIntegerJvmci before any measurement starts.
        final Method myMethod = Article3.class.getMethod("getIntegerJvmci");
        @Nullable
        final Object myFailure = new DummyCompiler().compileMethod(myMethod).getFailure();
        if (myFailure != null) {
            throw new IllegalStateException("Failed to compile: " + myFailure);
        }
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    @Benchmark
    public int getIntegerJvmci() {
        // Should never run: setup() installs JVMCI-compiled code that executes instead.
        throw new IllegalStateException("Unreachable");
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    @Benchmark
    public int getIntegerStock() {
        return 8;
    }

    public static void main(String[] aArgs) throws Exception {
        new Runner(new OptionsBuilder()
                .mode(Mode.AverageTime)
                .timeUnit(TimeUnit.NANOSECONDS)
                .warmupIterations(3)
                .measurementIterations(5)
                .shouldDoGC(true)
                .measurementTime(TimeValue.seconds(5))
                .warmupTime(TimeValue.seconds(3))
                .forks(3)
                .threads(1)
                .include(Article3.class.getName())
                .jvmArgs("-Xmx8G", "-XX:+UseG1GC",
                        "-XX:+UnlockExperimentalVMOptions", "-XX:+UnlockDiagnosticVMOptions",
                        "-XX:+EnableJVMCI", "-XX:-TieredCompilation",
                        "-XX:CompileCommand=print,com/overlyenthusiastic/benchmarks/Article3.*",
                        "-XX:PrintAssemblyOptions=intel",
                        "--add-modules", "jdk.internal.vm.ci",
                        "--add-exports", "jdk.internal.vm.ci/jdk.vm.ci.code=ALL-UNNAMED",
                        "--add-exports", "jdk.internal.vm.ci/jdk.vm.ci.hotspot=ALL-UNNAMED",
                        "--add-exports", "jdk.internal.vm.ci/jdk.vm.ci.meta=ALL-UNNAMED",
                        "--add-exports", "jdk.internal.vm.ci/jdk.vm.ci.runtime=ALL-UNNAMED",
                        "--add-exports", "jdk.internal.vm.ci/jdk.vm.ci.services=ALL-UNNAMED")
                .build()).run();
    }
}
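The same five --add-exports arguments appear both on the earlier command line and in the jvmArgs above. If that repetition becomes annoying, a small helper can generate them. JvmciArgs and jvmciModuleArgs are hypothetical names of my own, not part of JMH or JVMCI; the package list is simply the five used in this article.

```java
import java.util.List;
import java.util.stream.Stream;

public class JvmciArgs {
    // The jdk.internal.vm.ci packages this article's examples export to the unnamed module.
    static final List<String> PACKAGES = List.of("code", "hotspot", "meta", "runtime", "services");

    // Builds "--add-modules jdk.internal.vm.ci" followed by one "--add-exports" pair per package.
    static List<String> jvmciModuleArgs() {
        return Stream.concat(
                        Stream.of("--add-modules", "jdk.internal.vm.ci"),
                        PACKAGES.stream().flatMap(p -> Stream.of(
                                "--add-exports", "jdk.internal.vm.ci/jdk.vm.ci." + p + "=ALL-UNNAMED")))
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", jvmciModuleArgs()));
    }
}
```

One option would be to pass the result to JMH via .jvmArgsAppend(JvmciArgs.jvmciModuleArgs().toArray(new String[0])), leaving only the remaining VM flags in jvmArgs.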
Running the benchmark produces the following, unsurprisingly boring, result:
Benchmark                 Mode  Cnt  Score   Error  Units
Article3.getIntegerJvmci  avgt    5  0.771 ± 0.026  ns/op
Article3.getIntegerStock  avgt    5  0.772 ± 0.017  ns/op
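To put "identical" on slightly firmer footing: JMH's Error column is the half-width of a confidence interval around the mean score (99.9% in its default output), so a quick check is whether the two intervals overlap. The sketch below (ScoreOverlap is a hypothetical name) does exactly that with the numbers from the table. Overlap does not prove the two are equal, but it does mean this run gives no evidence that they differ.

```java
public class ScoreOverlap {
    // A JMH average-time score with its reported error (confidence interval half-width).
    record Score(String name, double mean, double error) {
        double low() { return mean - error; }
        double high() { return mean + error; }
    }

    // True when the intervals [mean - error, mean + error] share at least one point.
    static boolean overlaps(Score a, Score b) {
        return a.low() <= b.high() && b.low() <= a.high();
    }

    public static void main(String[] args) {
        Score jvmci = new Score("getIntegerJvmci", 0.771, 0.026);
        Score stock = new Score("getIntegerStock", 0.772, 0.017);
        System.out.printf("%s: [%.3f, %.3f] ns/op%n", jvmci.name(), jvmci.low(), jvmci.high());
        System.out.printf("%s: [%.3f, %.3f] ns/op%n", stock.name(), stock.low(), stock.high());
        System.out.println("overlap: " + overlaps(jvmci, stock));
    }
}
```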
It turns out that identical code has identical performance. Let's hope that in part 4, where we actually start changing the assembly we use, we can observe some differences.
© 2024-2025 James Venning, All Rights Reserved
Any trademarks are properties of their respective owners. All content and any views or opinions expressed are my own and not associated with my employer. This site is not affiliated with Oracle®.