/ Java EE Support Patterns

8.13.2014

Java Performance Optimization - DZone Refcard

This post is to inform you that I published recently a recent refcard on Java Performance Optimization which is now available from DZone. Refcardz are basically “Cheat Sheets” which can be very useful for Developers and Application Support individuals.

I highly recommend that you download your own copy today, it is FREE!


For now, find below a small teaser:

It is possible to define “optimal performance” in different ways, but the basic elements are: the ability of a Java program to perform its computing tasks within the business response time requirements, and the ability of an application to fulfill its business functions under high volume, in a timely manner, with high reliability and low latency. Sometimes the numbers themselves become patternized: for some major websites, a page response time of 500ms maximum per user function is considered optimal. This Refcard will include target numbers when appropriate, but in most cases you will need to decide these on your own, based on business requirements and existing performance benchmarks…”

Please feel free to comment and ask me any question on this subject.

Thank you.
P-H

3.18.2014

Java 8 is now released!

This post is to inform you that Java 8 has been officially released by Oracle. You can download the official release here.

In a nutshell:

  • Lambda expressions: a new language feature  that enables you to treat functionality as a method argument, or code as data.
  • Other significant enhancements and changes to the Java language and standard libraries including default methods, the new java.util.stream package, and the Date-Time API.
  • Compact Profiles contain predefined subsets of the Java SE platform and enable applications that do not require the entire Platform to be deployed and run on small devices.
  • Security enhancements include updates to the Java Cryptography Architecture; limited doPrivileged, a mechanism that enables code to assert a subset of its privileges; SSL/TLS Server Name Indication (SNI) Extension; and keystore enhancements.
  • JavaFX documentation has been updated for this release.
  • A new JavaScript engine, Nashorn, is included in JDK 8.
  • Java Mission Control 5.3 is included in JDK 8.
  • Decommission of the JVM Permanent Generation space and replaced by the Metaspace.

I will publish more articles in the upcoming weeks, including tutorials on Java Mission Control 5.3 which is now part of the JDK 8.

In the meantime, I suggest that you revisit my original article on the replacement of the HotSpot PermGen space by the Metaspace. I will publish the part 2 in the upcoming weeks.

3.13.2014

ClassNotFoundException: is it slowing down your JVM?

Most Java developers are familiar with the infamous and very common java.lang.ClassNotFoundException. While the sources of this problem are generally well understood (missing classes/libraries in classpath, class loader delegation problems etc.), the impact to the overall JVM and performance is often unknown. This situation can lead to significant impacts to your application response time and scalability.

Large Java EE enterprise systems with multiple applications deployed are the most exposed to this type of problem due to the large number of different application classloaders active at runtime. This increases the risk of facing “undetected” ClassNotFoundException unless a clear business impact is identified and close log monitoring is implemented, results: on-going performance impact along with possible JVM class loading IO and Thread lock contention.

The following article and sample programs will demonstrate that any occurrence of ClassNotFoundException found from your client production systems should be taken very seriously and addressed promptly.

Java class loading: the missing link for optimal performance

Proper understanding of this performance problem starts with proper knowledge of the Java class loading model. ClassNotFoundException essentially means a failure of the JVM to locate and/or load a particular Java class such as:

  • The Class.forName() method
  • The ClassLoader.findSystemClass() method
  • The ClassLoader.loadClass() method
While class loading of your application Java classes should occur only once in the JVM life-cycle (or through dynamic re-deploy functions), some applications also rely on dynamic class loading operations.

Regardless, repetitive valid and “failing” class loading operations can be quite intrusive, especially when the loading process is attempted by the default JDK java.lang.ClassLoader itself. Indeed, the default JDK 1.7+ behaviour, due to backward compatibility, will allow only one class to be loaded at the time, unless the class loader is tagged as “parallel capable”. Please keep in mind the even if the synchronization is done at the Class level only, a repetitive class loading failure against the same class name is still going to trigger thread lock contention depending of the Java thread concurrency level you are dealing with. The situation was much worst back with JDK 1.6 with synchronization done systematically at the class loader instance level.






For this reason, Java EE containers such as JBoss WildFly 8  are using their own internal concurrent class loaders for the loading of your application classes. These class loaders implements locking at a finer level of granularity, thus allowing loading different classes from the same instance of the class loader concurrently. This is also aligned with the latest JDK 1.7+ improvement introducing the support for Multithreaded Custom Class Loaders which also help prevent certain class loader deadlock scenarios.

That being said, class loading of system level classes such as java.* and Java EE container modules still fall back on the default JDK ClassLoader. This means that repetitive class loading failure for the same class name e.g. ClassNotFoundException can still trigger severe thread lock contention. This is precisely what we will replicate and demonstrate for the rest of the article.

Thread lock contention - problem replication

In order to recreate and simulate this problem, we created a simple application as per below specifications:

  • A JAX-RS (REST) Web Service executing Class.forName() against a dummy Class name “located” from the system package level:
             String className = "java.lang.WrongClassName";
              Class.forName(className);
            
The simulation essentially executes concurrently with 20 threads the JAX-RS Web service. Each invocation generates a ClassNotFoundException. The logging was fully disabled in order to reduce impact on IO and focus on the class loading contention only.

 Now let’s look at the results from JVisualVM from our 30-60 seconds run. We can clearly see a lot of BLOCKED threads waiting to acquire a lock on an Object monitor.



Analysis of the JVM thread dump is clearly exposing the problem: Thread lock contention. We can see from the execution stack trace that JBoss delegates the loading of the class to the JDK ClassLoader…why? This is because our wrong Java class name is detected to be part of the system class path e.g. java.*. In this situation, JBoss will delegate the loading to the system class loader, triggering systematic synchronization for that particular class name and waiters from the other threads waiting to acquire a lock to load the same Class name.

Many threads waiting to acquire LOCK 0x00000000ab84c0c8…

"default task-15" prio=6 tid=0x0000000014849800 nid=0x2050 waiting for monitor entry [0x000000001009d000]              
   java.lang.Thread.State: BLOCKED (on object monitor)                                                                  
            at java.lang.ClassLoader.loadClass(ClassLoader.java:403)                                                             
               - waiting to lock <0x00000000ab84c0c8> (a java.lang.Object)
  // Waiting to acquire a LOCK held by Thread “default task-20”                                                     
               at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)                                                     
               at java.lang.ClassLoader.loadClass(ClassLoader.java:356)    // JBoss now delegates to system ClassLoader..                                                          
               at org.jboss.modules.ConcurrentClassLoader.performLoadClass(ConcurrentClassLoader.java:371)                          
               at org.jboss.modules.ConcurrentClassLoader.loadClass(ConcurrentClassLoader.java:119)                                 
               at java.lang.Class.forName0(Native Method)                                                                           
               at java.lang.Class.forName(Class.java:186)                                                                           
               at org.jboss.tools.examples.rest.MemberResourceRESTService.SystemCLFailure(MemberResourceRESTService.java:176)       
               at org.jboss.tools.examples.rest.MemberResourceRESTService$Proxy$_$$_WeldClientProxy.SystemCLFailure(Unknown Source) 
               at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)                                                       
               at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)                             
               at java.lang.reflect.Method.invoke(Method.java:601)    
……………………..                                                              

Culprit thread - default task-20

"default task-20" prio=6 tid=0x000000000e3a3000 nid=0x21d8 runnable [0x0000000010e7d000]                             
   java.lang.Thread.State: RUNNABLE                                                                                   
               at java.lang.Throwable.fillInStackTrace(Native Method)                                                             
               at java.lang.Throwable.fillInStackTrace(Throwable.java:782)                                                        
               - locked <0x00000000a09585c8> (a java.lang.ClassNotFoundException)                                                 
               at java.lang.Throwable.<init>(Throwable.java:287)                                                                  
               at java.lang.Exception.<init>(Exception.java:84)                                                                   
               at java.lang.ReflectiveOperationException.<init>(ReflectiveOperationException.java:75)                             
at java.lang.ClassNotFoundException.<init>(ClassNotFoundException.java:82) // ClassNotFoundException!                                      at java.net.URLClassLoader$1.run(URLClassLoader.java:366)                                                          
               at java.net.URLClassLoader$1.run(URLClassLoader.java:355)                                                          
               at java.security.AccessController.doPrivileged(Native Method)                                                      
               at java.net.URLClassLoader.findClass(URLClassLoader.java:354)                                                      
               at java.lang.ClassLoader.loadClass(ClassLoader.java:423)                                                           
               - locked <0x00000000ab84c0e0> (a java.lang.Object)                                                                  
               at java.lang.ClassLoader.loadClass(ClassLoader.java:410)                                                           
- locked <0x00000000ab84c0c8> (a java.lang.Object)   // java.lang.ClassLoader: LOCK acquired                                                             at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)                                                   
               at java.lang.ClassLoader.loadClass(ClassLoader.java:356)                                                           
               at org.jboss.modules.ConcurrentClassLoader.performLoadClass(ConcurrentClassLoader.java:371)                        
               at org.jboss.modules.ConcurrentClassLoader.loadClass(ConcurrentClassLoader.java:119)                               
               at java.lang.Class.forName0(Native Method)                                                                         
               at java.lang.Class.forName(Class.java:186)                                                                         
               at org.jboss.tools.examples.rest.MemberResourceRESTService.SystemCLFailure(MemberResourceRESTService.java:176)     
               at org.jboss.tools.examples.rest.MemberResourceRESTService$Proxy$_$$_WeldClientProxy.SystemCLFailure(Unknown Source)
…………………………………

Now let’s replace our Class name by a Java class tagged as part of the “application” package and re-run the test under the same load conditions.

String className = "org.ph.WrongClassName";
Class.forName(className);



As we can see, we are no longer dealing with BLOCKED threads…why is that? Let’s have a look at the JVM thread dump to better understand this change of behaviour.

"default task-51" prio=6 tid=0x000000000dd33000 nid=0x200c runnable [0x000000001d76d000]                               
   java.lang.Thread.State: RUNNABLE                                                                                    
               at java.io.WinNTFileSystem.getBooleanAttributes(Native Method)    // IO overhead due to JAR file search operation                                                   
               at java.io.File.exists(File.java:772)                                                                                
               at org.jboss.vfs.spi.RootFileSystem.exists(RootFileSystem.java:99)                                                   
               at org.jboss.vfs.VirtualFile.exists(VirtualFile.java:192)                                                            
               at org.jboss.as.server.deployment.module.VFSResourceLoader$2.run(VFSResourceLoader.java:127)                         
               at org.jboss.as.server.deployment.module.VFSResourceLoader$2.run(VFSResourceLoader.java:124)                         
               at java.security.AccessController.doPrivileged(Native Method)                                                        
               at org.jboss.as.server.deployment.module.VFSResourceLoader.getClassSpec(VFSResourceLoader.java:124)                  
               at org.jboss.modules.ModuleClassLoader.loadClassLocal(ModuleClassLoader.java:252)                                    
               at org.jboss.modules.ModuleClassLoader$1.loadClassLocal(ModuleClassLoader.java:76)                                   
               at org.jboss.modules.Module.loadModuleClass(Module.java:526)                                                         
               at org.jboss.modules.ModuleClassLoader.findClass(ModuleClassLoader.java:189)   // JBoss now fully responsible to load the class                                      
               at org.jboss.modules.ConcurrentClassLoader.performLoadClassUnchecked(ConcurrentClassLoader.java:444) // Unchecked since using JDK 1.7 e.g. tagged as “safe” JDK                
               at org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:432)                   
               at org.jboss.modules.ConcurrentClassLoader.performLoadClass(ConcurrentClassLoader.java:374)                          
               at org.jboss.modules.ConcurrentClassLoader.loadClass(ConcurrentClassLoader.java:119)                                 
               at java.lang.Class.forName0(Native Method)                                                                            
               at java.lang.Class.forName(Class.java:186)                                                                           
               at org.jboss.tools.examples.rest.MemberResourceRESTService.AppCLFailure(MemberResourceRESTService.java:196)          
               at org.jboss.tools.examples.rest.MemberResourceRESTService$Proxy$_$$_WeldClientProxy.AppCLFailure(Unknown Source)    
               at sun.reflect.GeneratedMethodAccessor60.invoke(Unknown Source)    
……………….                                                  

The above execution stack trace is quite revealing:

  • Since the Java class name was not detected to be part of the Java system package, no ClassLoader delegation was performed, thus no synchronization.
  • Since JBoss considers JDK 1.7+ as a “safe” JDK, the ConcurrentClassLoader .performLoadClassUnchecked() method was used, not triggering any Object monitor lock.
  • No synchronization means no Thread lock contention triggered as a result of the non-stop ClassNotFoundException error.
It is still important to note that while JBoss is doing a great job at preventing Thread lock contention in this scenario, the repetitive class loading attempt will still degrade the performance to a certain degree due to IO overhead associated with excessive JAR file search operations, again re-enforcing the need to take immediate corrective actions.

Final words

I hope that you enjoyed this article and now have a better understanding of potential performance impacts due to excessive class loading operations. While JDK 1.7 and modern Java EE containers brought great improvements on class loader related issues such as deadlocks and thread lock contention, potential problematic scenarios still remain. For this reason, I highly recommend that you closely monitor your application behaviour, logs and ensure that class loader related errors such as java.lang.ClassNotFoundException and java.lang.NoClassDefFoundError are aggressively corrected.

I’m looking forward for your comments and please share your troubleshooting experience with Java class loaders.

1.27.2014

Gartner Magic Quadrant for Application Performance Monitoring 2013

As IT architects and consultants, it is crucial to ensure a performing, stable and scalable production environment for you client applications. Proper monitoring along with a comprehensive capacity planning model is crucial in order to achieve such goals.

While most performance data can be extracted at runtime through various OS commands and open source tools, large and complex IT production environments often requires the deployment of commercial grade APM solutions for an efficient management of the operations.

Before providing recommendations to your client and purchase the first APM product you find on the Web, it is important to properly review the different APM solutions and compare their product capabilities vs. your client operation needs and budget.

A trusted source of information I have been following since a few years now is the Gartner Magic Quadrant report for APM. Gartner’s APM yearly report is essentially an extensive research of every vendor that meets their qualifications. The process also includes customer interviews. The final report places each vendor on two axes, Completeness of Vision and Ability to Execute, and then places them accordingly on a Quadrant. If not done already, I recommend that you obtain your own copy of the 2013 report.

I will publish more articles this year and share my recent experience with top APM solutions such as New Relic, AppDynamics, Compuware DynaTrace and Dell Foglight.

Please do not hesitate to share your experience along with your constructive comments on APM solutions.



Reference

11.14.2013

Java VM – Beware of the YoungGen space

As you may have seen from our previous performance oriented articles, a healthy JVM is one of the most important goals to achieve for optimal application performance and stability. Such health assessment is very often only focusing on the frequency (avoidance) of major collections or detecting the presence of memory leaks. What about the sizing and footprint of the Young Generation space or short lived objects?

This article is based on a true story and a recent troubleshooting experience we had with one of our IT clients. It will demonstrate, through a sample application, that an excessive memory footprint and frequency of minor collections can trigger performance problems as severe as its bigger brother, the Old Generation or tenured space.

JVM health diagnostic

If you are new to the JVM tuning world, you will soon realize that there are no universal solutions applicable for all applications. Regardless of the high quality material that you may find from the Web from various sources, you still need to do your due diligence and properly understand the type of application that you are dealing with, including its sensitivity to JVM GC pauses (some application requires very low JVM pause time < 1%).

Java profiling (including memory leak detection) along with performance & load testing are good examples of extra work that you will need to perform in order to gather all proper data and facts about your application memory footprint and JVM runtime health.

That being said, what do we mean by a “healthy” JVM? Please answer the following questions on the best of your knowledge.

** If you answer NO, please assume a confidence level of 90%+, otherwise answer I DON’T KNOW.

  • Is your Java heap or OldGen space leaking over time (after major collections)?
  • Is your application currently affected by large and/or frequent JVM GC pauses?
  • Is you JVM overall pause time higher than 5% or higher than an established ideal baseline?
  • Is your application response time currently affected by the JVM GC activity on a regular basis and beyond the application tolerance point?
  • Did you observe over the last 3 months occurrences of java.lang.OutOfMemoryError errors?
  • Did you observe over the last 3 months occurrences of JVM crashes (sudden JVM failure with core dump & crash report)?
  • Do you believe that your JVM is currently unstable and/or requires too much human intervention (regular restart etc.)?
If you answered YES or I DON’T KNOW to any of those, this means that you or your production performance tuning team have some work to do here, including a review of the current JVM GC policy.

If you answered NO to all of those with high confidence level, it means that you have likely achieved a solid application and JVM stability, congratulation. I still recommend that you re-assess the situation in between major releases and incremental load forecasts.

Young Generation: Stop the world, really?

As we saw from the quick JVM health assessment exercise, one of the points refers to the JVM overall pause time. This essentially means how much time the JVM is spending during the “stop the world” events. During such periods, application threads are suspended and not performing any work, increasing response time of your application. This metric is crucial since large JVM pauses will trigger unstable and unpredictable response times.



One common misperception that I have seen over the last few years is that YoungGen or minor collections are fully transparent and not effecting the application response time. This statement could almost be true if your Java heap size is small (YG space < 1 GB) and dealing with moderate short lived objects footprint or allocation rate. In this scenario, if the minor collections are executed very fast (< 20 ms) and not too frequently (every 30 seconds+), the overall JVM pause time contributed by the YoungGen space will remain small (<< 1%). However, the situation can change very quickly if the YG memory allocation rate increases (increased footprint per request, traffic surge etc.).

I recommend the following articles for more details about the YoungGen space and concurrent collectors available for the HotSpot JVM.

# Oracle HotSpot mostly concurrent collectors: CMS vs. G1

# Oracle HotSpot minor collections exhaustive coverage

Regardless of the HotSpot GC policy that you are using, including the mostly concurrent collectors such as CMS or G1, the YoungGen space collection remains a “stop the world” event. To our knowledge, Azul Zing C4 is the only JVM collector advertised as a true continuously concurrent compacting collector. We did not have a chance to experiment with this collector at this point. I welcome anyone with C4 tuning experience to share their observations, especially true fact figures vs. mostly concurrent collectors such as G1.

Now that we covered some of the theory, let’s deep dive into our sample application and review the performance testing results against various YoungGen footprint and allocation rates.

Sample application specifications

In order to compare the responsiveness and JVM pause time % between various YG allocation rates, we created a sample application as per below specifications:

  • A JAX-RS (REST) Web Service was created and exposed via the jvm URI as per below attributes.
    @GET
    @Path("/jvm")
    @Produces(MediaType.APPLICATION_JSON)
    public Integer jvm() {}

Each invocation of jvm is performing the following logic:

1.     Allocate a pre-determined size of short-lived objects (eligible for fast YG GC).

In addition, an initial memory footprint of 1 GB of long-lived objects (not eligible for GC) is allocated at class loading time in order to create some noise for the CMS collector.

The YG short lived objects memory allocation and OldGen static retention was simply achieved through the creation of a static array of primitive byte values as per below code snippet. The true memory footprint can be observed as per the JVM heap dump analysis using MAT.

private final static int LONG_LIVED_OBJ_FOOTPRINT = (1024 * 1024 * 1024);
private final static int SHORT_LIVED_OBJ_FOOTPRINT = (100 * 1024 * 1024);

// 1 GB static memory footprint
private final static byte byteArrayLongLivedObj[] = new byte[LONG_LIVED_OBJ_FOOTPRINT];

// 100 MB memory allocation (waste) created per execution
public void generateShortLivedObj(String objId) {          
  byte byteArrayShortLivedObj[] = new byte[SHORT_LIVED_OBJ_FOOTPRINT];
}




Finally, find below the environment specifications and software’s used to create, execute and monitor this YG comparison performance testing.


Performance testing results and observations

The following performance testing simulated a real life application that was dealing with high JVM pause time and severe degradation under peak load. 3 runs were executed, one for the baseline, and 2 runs after simulating improvements (reduction) of the application memory footprint per request.

Baseline

  • 10 concurrent threads
  • 100 MB of short lived objects created per execution per JVM process
The short lived objects memory footprint may look extreme but this is indeed what we were dealing with initially.

Results

  • Average response time: 140 ms
  • Throughput: 68 req / sec
  • JVM overall pause time: 25.8%
  • YG collection frequency: 7 collections per second
  • Rate of GC: 308 909 MB per minute






As per JVisualVM, it looks like the JVM is healthy (no memory leak, stable & low OldGen etc.). However, when you deep dive further in the verbose:gc logs, you then realize that the overall JVM pause time is 25.8%, and all due to excessive frequency of YG collections. This is strongly proving the point to properly analyze the verbose:gc logs vs. only focusing on the JVM tenured space trends.



Testing & tuning #1

  • 10 concurrent threads
  • 50 MB of short lived objects created per execution per JVM process
This run simulates an initial improvement of the application footprint and memory allocation rate from 100 MB to 50 MB per allocation. We can clearly see an improvement to all figures, especially the throughput by simply reducing the application memory footprint per request.

Results

  • Average response time: 119 ms  -21
  • Throughput: 79 req / sec  +11
  • JVM overall pause time: 15.59%  -10.21
  • YG collection frequency: 3-4 collections per second  -3
  • Rate of GC: 164 950 MB per minute  -143 959






Testing & tuning #2

  • 10 concurrent threads
  • 5 MB of short lived objects created per execution per JVM process
This run simulates a much reduced application footprint and memory allocation rate from 100 MB to only 5 MB per allocation.

Results

  • Average response time: 107 ms  -33
  • Throughput: 90 req / sec  +22
  • JVM overall pause time: 1.9%  -23.9
  • YG collection frequency: 1 collection every 2-3 seconds * significant reduction
  • Rate of GC: 15 841 MB per minute  -293 068






As you can see, the final improvement to the application footprint and memory allocation did significantly decrease the JVM pause time to a more acceptable 1.9%. It is important to note that throughout these 3 tests, the OldGen footprint and CMS activity did not have any substantial impact on the JVM pause time, the performance problem was due to the excessive activity and high volume of stop the world events associated with the YG collections.

Solutions and recommendations

Our problem case demonstrated that we can reduce the JVM pause time associated with the excessive YG collection activity by tuning and reducing the memory footprint per application request, thus reducing the allocation rate and YG GC frequency.

However, when such tuning strategy is not possible in the short term, it is worth exploring other solutions. Similar results can potentially be achieved through the following capacity improvement strategies:

·        Horizontal and vertical scaling: split the traffic via an increased number of JVM processes, at the expense of the available hardware, thus reducing the allocation rate and frequency of YG collections. This essentially means throwing hardware at the problem. My recommendation is always to fine tune your application memory footprint first, and then explore in parallel other scaling options.
·        Java heap size & YG ratio tuning: increasing the size of the YG space will definitely help reducing the frequency of stop the world YG collections. Now please be careful that you don’t “starve” the OldGen space otherwise you will simply move the problem with even more severe consequences such as JVM thrashing and OOM events.

Final words

I hope that you enjoyed this article and now have a better understanding of potential performance impact of excessive JVM YG collections.

I recommend that you do the following exercise after reading this article:

  • Pick one of your busiest applications.
  • Review the verbose:gc log and determine the JVM pause time via GCMV.
  • Determine the frequency and impact of the YG collections and identify tuning opportunities.
I’m looking forward for your comments and please share your JVM tuning experience.

JCG Member DZone MVB