HotSpot AOT Internals and performance results

Author: Phillip Poole
HotSpot AOT Internals and performance results Vladimir Kozlov Igor Veresov HotSpot Compiler Team, Oracle October 5, 2017

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |


Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


AOT motivations
• Needed for the longer-term strategy of supporting a future Java-based JIT compiler.
• Provide faster startup for applications, since hot methods and class initializers will be readily available. Expected to be important for the cloud.
• Provide quicker time to peak performance. Statically generated code could be the first pass in a multi-tiered compilation system.
• Density improvement: sharing AOT-compiled code (application dependent).
• “Prevent global warming”. AOT uses much less CPU power by running compiled code from the start instead of interpreting. Non-tiered (static) AOT also avoids JIT compilation and profiling for the corresponding Java methods.


AOT functionality overview
A new JDK tool, jaotc, is used for AOT compilation. It uses Graal-core as the code-generating backend.
In JDK 9, Libelf was used to produce AOT shared libraries in ELF format. In JDK 10 we removed the dependency on Libelf and added support for macOS and Windows.

To use jaotc, the user has to specify a list of .class files, .jar files, or Java module names as input and the resulting AOT library name as output (unnamed.so is used if no name is specified):
    jaotc --output libHelloWorld.so HelloWorld.class
    jaotc --output libjava.base.and.javac.so --module java.base:jdk.compiler

The user can specify which methods to compile or exclude with the --compile-commands flag:
    jaotc --output libjava.base.so --compile-commands base.txt --module java.base
The command file can contain two commands: exclude or compileOnly.


AOT functionality overview
AOT code can be compiled in two modes, controlled by the --compile-for-tiered flag (currently off by default):
• Non-tiered AOT-compiled code behaves similarly to statically compiled C++ code (or C1 JIT-compiled code in the Client VM): no profiling information is collected, and no JIT recompilation happens unless the AOT code is deoptimized.
• Tiered AOT-compiled code does collect profiling information. The profiling is the same simple profiling (invocation and back-branch counters) done by C1 methods compiled at Tier 2. If AOT methods hit the AOT invocation thresholds, they are first recompiled by C1 at Tier 3 in order to gather full profiling information. C2 JIT recompilation needs this information to produce optimal code and reach peak application performance.
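The two modes can be pictured as a small state machine. The sketch below is a toy model, not HotSpot code: the class name, the state names, and both threshold parameters are hypothetical, the Tier 3 to C2 transition is simplified to a single counter, and deoptimization is left out.

```java
// Toy model of how an AOT-compiled method moves between tiers.
// All names and thresholds here are hypothetical, chosen for illustration.
final class AotMethodModel {
    enum State { AOT_CODE, C1_TIER3, C2_TIER4 }

    private final boolean compiledForTiered; // models --compile-for-tiered
    private final int aotThreshold;          // hypothetical AOT invocation threshold
    private final int tier3Threshold;        // hypothetical Tier 3 threshold
    private State state = State.AOT_CODE;
    private int counter;

    AotMethodModel(boolean compiledForTiered, int aotThreshold, int tier3Threshold) {
        this.compiledForTiered = compiledForTiered;
        this.aotThreshold = aotThreshold;
        this.tier3Threshold = tier3Threshold;
    }

    /** Simulates one invocation and returns the code state after it. */
    State invoke() {
        if (state == State.AOT_CODE && !compiledForTiered) {
            return state; // non-tiered: no counters, no recompilation
        }
        if (state == State.C2_TIER4) {
            return state; // already at the top tier
        }
        counter++;
        if (state == State.AOT_CODE && counter >= aotThreshold) {
            state = State.C1_TIER3; // recompile with C1 to gather full profiles
            counter = 0;
        } else if (state == State.C1_TIER3 && counter >= tier3Threshold) {
            state = State.C2_TIER4; // enough profile data for C2
        }
        return state;
    }
}
```

In this model a non-tiered method stays in AOT_CODE no matter how often it is invoked, while a tiered one passes through C1_TIER3 before reaching C2_TIER4.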


AOT functionality overview
Currently the same JVM version and runtime configuration must be used for AOT compilation and for execution:
    jaotc -J-XX:+UseParallelGC -J-XX:-UseCompressedOops --output libHelloWorld.so HelloWorld.class
    java -XX:+UseParallelGC -XX:-UseCompressedOops -XX:AOTLibrary=./libHelloWorld.so HelloWorld

The JVM version and configuration are recorded in the AOT library and verified during execution. If verification fails, the AOT library is not used and the JVM continues to run, or exits if the diagnostic flag -XX:+UseAOTStrictLoading is specified. AOT recompilation is required when Java is updated.
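The verification rule above can be sketched as follows. This is an illustrative model only: the class, the enum, and the representation of a configuration as a flag map are assumptions, not the JVM's actual data structures.

```java
import java.util.Map;

// Illustrative model of the load-time check; not the JVM implementation.
final class AotConfigCheck {
    enum Outcome { USE_LIBRARY, IGNORE_LIBRARY, EXIT_VM }

    /**
     * Compares the configuration recorded in the library with the current one.
     * strictLoading models -XX:+UseAOTStrictLoading.
     */
    static Outcome verify(Map<String, Boolean> recorded,
                          Map<String, Boolean> current,
                          boolean strictLoading) {
        if (recorded.equals(current)) {
            return Outcome.USE_LIBRARY;
        }
        // Mismatch: run without the library, or exit under strict loading.
        return strictLoading ? Outcome.EXIT_VM : Outcome.IGNORE_LIBRARY;
    }
}
```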


AOT functionality overview
The jaotc tool does not resolve referenced classes that are neither system classes nor part of the classes being compiled. Such classes have to be added to the class path:
    jaotc --output=libfoo.so --jar foo.jar -J-cp -J./
or the additional Java modules have to be specified:
    jaotc --output=libactivation.so --module java.activation -J--add-module=java.se.ee
Otherwise a ClassNotFoundException may be thrown during AOT compilation.


AOT functionality overview
During JVM startup, the AOT initialization code looks for well-known AOT libraries in a well-known location ($JAVA_HOME/lib) and for libraries specified by the -XX:AOTLibrary JVM flag. If shared libraries are found, they are loaded and used. If none can be found, AOT is turned off.
    java -XX:AOTLibrary=./libHelloWorld.so,./libjava.base.so HelloWorld
The JVM knows the AOT library names for the following Java modules: java.base, jdk.compiler (javac), jdk.scripting.nashorn, jdk.internal.vm.ci (JVMCI), jdk.internal.vm.compiler (Graal).
Note that users have to compile and install the well-known AOT libraries themselves.


AOT functionality overview
A set of AOT libraries can be generated for different execution environments. The JVM knows the following well-known names for AOT libraries generated for specific runtime configurations. It looks for them in the $JAVA_HOME/lib directory and loads the one that corresponds to the current runtime configuration:

    Runtime configuration                           Library name
    -XX:-UseCompressedOops -XX:+UseG1GC             libjava.base.so
    -XX:+UseCompressedOops -XX:+UseG1GC             libjava.base-coop.so
    -XX:-UseCompressedOops -XX:+UseParallelGC       libjava.base-nong1.so
    -XX:+UseCompressedOops -XX:+UseParallelGC       libjava.base-coop-nong1.so
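The naming pattern in this table can be expressed as a small function. The helper below is hypothetical (it is not a JDK API) and assumes that the "-nong1" suffix applies to any non-G1 collector, although only ParallelGC is listed above.

```java
// Hypothetical helper that reproduces the naming table above; not part of the JDK.
final class WellKnownAotLibrary {
    /** Returns the well-known java.base AOT library name for a configuration. */
    static String javaBaseLibraryName(boolean useCompressedOops, boolean useG1GC) {
        StringBuilder name = new StringBuilder("libjava.base");
        if (useCompressedOops) {
            name.append("-coop");   // compressed oops variant
        }
        if (!useG1GC) {
            name.append("-nong1");  // assumption: any non-G1 collector
        }
        return name.append(".so").toString();
    }

    public static void main(String[] args) {
        System.out.println(javaBaseLibraryName(true, false));
    }
}
```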


AOT functionality overview
• Code sections in an AOT library are treated by the JVM as an extension of the existing CodeCache. When a Java class is loaded, the JVM checks whether corresponding AOT-compiled methods exist in the loaded AOT libraries and adds links to them from the Java method descriptors.
• AOT-compiled code follows the same invocation, deoptimization, and unloading rules as normal JIT-compiled code.
• To detect changes in classes, AOT uses class fingerprinting. During AOT compilation a fingerprint is generated for each class and stored in the AOT library. During execution, when a class is loaded and AOT-compiled code is found for it, the fingerprint of the loaded class is compared to the one stored in the AOT library. On a mismatch, the AOT code for that particular class is not used (the AOT code is marked non-entrant).
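The fingerprint check can be illustrated with a short sketch. The slides do not say how HotSpot computes the fingerprint, so this example substitutes a SHA-256 digest of the class file bytes as a stand-in; the class and method names are hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Sketch of a fingerprint comparison. SHA-256 is a stand-in (assumption):
// the real HotSpot fingerprint algorithm is not described in the slides.
final class FingerprintCheck {
    /** Computes the stand-in fingerprint of a class file. */
    static byte[] fingerprint(byte[] classBytes) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(classBytes);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** True if the AOT code compiled against storedFingerprint may be used. */
    static boolean canUseAotCode(byte[] storedFingerprint, byte[] loadedClassBytes) {
        return Arrays.equals(storedFingerprint, fingerprint(loadedClassBytes));
    }
}
```

The fingerprint would be computed once at AOT compilation time and stored; at class load time only the comparison runs.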


Graal changes
• Indirect load of constants, including constant replacement
• Class initialization
• Profiling (tiered compilation support)
• Inlining


Constants
• Constants embedded in the code (field offsets, GC barriers, etc.): need to be validated before any code is run.
• Global constants (heap top/end, card table base, etc.): eagerly initialized when the library is loaded. GraalHotSpotVMConfigNode represents these values. They fold to constants in JIT mode and become indirect loads in AOT mode.
• Local constants (classes, objects, method counters): lazily initialized at runtime.


Constant replacement
• Automatic: constants are replaced by nodes that provide indirection and handle lazy resolution if necessary (ReplaceConstantNodesPhase).
• Classes, method counters, and string constants are replaced with ResolveConstantNode and ResolveMethodAndLoadCountersNode.
• Some well-known class constants are eagerly resolved and replaced with LoadConstantIndirectlyNode (for example, primitive array classes).
• Class mirror constants are replaced with indirections through class constants (LoadJavaMirrorWithKlassPhase).


Constant replacement optimizations
• Currently there is a single resolution for each constant (placed in a dominating block).
• Dominating class initialization nodes (InitializeKlassNode) are reused.
• Resolution of the root method's holder class and its superclasses can be omitted, since the holder is guaranteed to be initialized (and hence resolved). It is replaced by LoadConstantIndirectlyNode.


Constant replacement
Later all these nodes are lowered into snippets that do the following (for example, for class constants):
    KlassPointer result = LoadConstantIndirectlyNode.loadKlass(constant);
    if (probability(VERY_SLOW_PATH_PROBABILITY, result.isNull())) {
        result = ResolveConstantStubCall.resolveKlass(constant, EncodedSymbolNode.encode(constant));
    }
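The load-then-resolve pattern in this snippet can be mimicked in plain Java. The class below is a hypothetical analogy, not Graal code: the slot stands in for the indirectly loaded cell, and the resolver function stands in for the runtime stub call.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Plain-Java analogy of lazy constant resolution; all names are hypothetical.
final class LazyConstantSlot<T> {
    private final AtomicReference<T> slot = new AtomicReference<>(); // null = unresolved
    private final String symbol;                 // symbolic name of the constant
    private final Function<String, T> resolver;  // stands in for the runtime stub
    private int slowPathCalls;                   // demo counter, not thread-safe

    LazyConstantSlot(String symbol, Function<String, T> resolver) {
        this.symbol = symbol;
        this.resolver = resolver;
    }

    T load() {
        T result = slot.get();                 // fast path: one indirect load
        if (result == null) {                  // very slow path: first use only
            result = resolver.apply(symbol);
            slowPathCalls++;
            slot.compareAndSet(null, result);  // publish for later loads
        }
        return result;
    }

    int slowPathCalls() {
        return slowPathCalls;
    }
}
```

After the first call to load(), later calls take only the fast path.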


Class initialization
• InitializeKlassNode is inserted at every initialization point through the new parser plugin interface ClassInitializationPlugin (see HotSpotClassInitializationPlugin for the implementation).
• Some optimizations are possible in the parsing phase (type >=: holder, except for interfaces).
• A separate phase (EliminateRedundantInitializationPhase) uses data flow analysis to remove redundant class initialization nodes. It works well after loop peeling.
• Lowered into an if-null-then-call-runtime diamond.
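The effect of removing redundant initialization nodes can be shown on the simplest case, a single straight-line block. This is a deliberately reduced sketch with hypothetical names; the actual phase performs data flow analysis over the whole control flow graph, which is not modeled here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Reduced illustration for one straight-line block; the real phase handles
// arbitrary control flow via data flow analysis.
final class RedundantInitElimination {
    /**
     * Takes the classes named by initialization points in program order and
     * returns the points that must be kept: the first one for each class.
     */
    static List<String> eliminate(List<String> initPoints) {
        Set<String> initialized = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String klass : initPoints) {
            if (initialized.add(klass)) { // add() is false if already initialized
                kept.add(klass);
            }
        }
        return kept;
    }
}
```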


Tiered support
• Similar to level 2 profiling, but with higher thresholds.
• Counts invocations and back branches, calling back into the runtime to re-JIT.
• Profiling nodes are inserted in the parser via the new plugin interface ProfilingPlugin (see HotSpotProfilingPlugin for the implementation).
• They are later processed in FinalizeProfileNodesPhase, which assigns inlinee notification frequencies and random sources (more about this later).
• Profiling nodes can be merged if they profile the same thing and are in straight-line control flow (works well with loop unrolling).


Kinds of profiling
• Profiling nodes are lowered in two ways: normal profiling and probabilistic profiling.
• Normal profiling is what you’d expect (see ProfileSnippets).
• Probabilistic profiling tries to minimize cache line ping-ponging (see ProbabilisticProfileSnippets).
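One common way to build a probabilistic counter is to update it with probability 1/2^k and add 2^k on each update, so that the expected value still equals the number of events while the shared cache line is written far less often. The sketch below shows that generic scheme; it is an assumption for illustration and is not taken from ProbabilisticProfileSnippets. The class name and the injected random source are hypothetical.

```java
import java.util.Random;

// Generic probabilistic counter (assumed scheme, not the Graal snippet):
// update with probability 1/2^k, adding 2^k per update.
final class ProbabilisticCounter {
    private final int log2Period; // k: an update happens once per 2^k events on average
    private final Random random;  // real code would use a per-thread random source
    private long value;

    ProbabilisticCounter(int log2Period, Random random) {
        if (log2Period < 0 || log2Period > 30) {
            throw new IllegalArgumentException("log2Period must be in [0, 30]");
        }
        this.log2Period = log2Period;
        this.random = random;
    }

    /** Records one event; writes the counter only on a fraction of calls. */
    void increment() {
        int mask = (1 << log2Period) - 1;
        if ((random.nextInt() & mask) == 0) { // true with probability 1/2^k
            value += 1L << log2Period;        // compensate for skipped updates
        }
    }

    /** Estimated number of events recorded so far. */
    long value() {
        return value;
    }
}
```

With k = 0 the counter is exact; larger k trades precision for fewer writes.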


Probabilistic profiling
• Threads executing the same methods compete for the same cache line, where the counters are.
• The idea is to not do the increment every time, but to do it with some predefined probability.
• Branch on random: if (random() & ((1