/ How to analyze Thread Dump – Part 4: IBM VM ~ Java EE Support Patterns

6.04.2012

How to analyze Thread Dump – Part 4: IBM VM

This article is part 4 of our Thread Dump analysis series which will provide you with an overview of what is a JVM Thread Dump for the IBM VM and the different Threads and data points that you will find. As you will see and learn, the IBM VM Thread Dump format is different but provides even more out-of-the-box troubleshooting data.

** UPDATE: Thread Dump analysis tutorial videos now available here.

At this point, you should know how Threads interact with the Java EE container and what a Thread Dump is. Before we go any further in the deep dive analysis patterns, you also need to understand the IBM VM Thread Dump format since this is the typical Thread Dump data to expect when using IBM WAS on IBM VM.

IBM VM Thread Dump breakdown overview

In order for you to better understand, find below a diagram showing you a visual breakdown of an IBM 1.6 VM Thread Dump and its common data points found:


As you can, there are extra runtime data that you will not find from a HotSpot VM Thread Dump. Please keep in mind that you may not need to review all these data points but you still need to understand what data is available depending of your problem case.
The rest of the article will cover each Thread Dump portion in more detail.

# Thread Dump generation event
The first portion provides you with detail on how this Thread Dump was generated. IBM Thread Dump can be generated as a result of a “signal 3” or “user” e.g. kill -3 <Java pid> or automatically as a result of severe JVM conditions such as an OutOfMemoryError.

0SECTION       TITLE subcomponent dump routine
NULL           ===============================
1TISIGINFO     Dump Event "user" (00004000) received
1TIDATETIME    Date:                 2012/03/12 at 20:52:13
1TIFILENAME    Javacore filename:    /apps/wl11g/domains/app/javacore.20120312.205205.1949928.0004.txt
1TIREQFLAGS    Request Flags: 0x81 (exclusive+preempt)
1TIPREPSTATE   Prep State: 0x4 (exclusive_vm_access)

0SECTION       TITLE subcomponent dump routine
NULL           ===============================
1TISIGINFO     OUTOFMEMORY received
1TIDATETIME    Date:                 2012/06/01 at 09:52:12
1TIFILENAME    Javacore filename:    /usr/WebSphere/AppServer/javacore311328.1338524532.txt

# HW and OS environment detail
The next section provides you with some detail on the current hardware and OS that this IBM VM is running from:

0SECTION       GPINFO subcomponent dump routine
NULL           ================================
2XHOSLEVEL     OS Level         : AIX 5.3
2XHCPUS        Processors -
3XHCPUARCH       Architecture   : ppc64
3XHNUMCPUS       How Many       : 6
3XHNUMASUP       NUMA is either not supported or has been disabled by user

# JRE detail and Java start-up arguments
This section is very useful as it provides you with a full view on your JRE major version and patch level along with all JVM start-up arguments.

0SECTION       ENVINFO subcomponent dump routine
NULL           =================================
1CIJAVAVERSION JRE 1.6.0 IBM J9 2.4 AIX ppc64-64 build jvmap6460sr9-20101124_69295
1CIVMVERSION   VM build 20101124_069295
1CIJITVERSION  JIT enabled, AOT enabled - r9_20101028_17488ifx2
1CIGCVERSION   GC - 20101027_AA
1CIRUNNINGAS   Running as a standalone JVM
…………………………………………………………………………………………

# User and environment variables
This section provides you with a listing of current user and environment variables such as File Descriptor limit.

1CIUSERLIMITS  User Limits (in bytes except for NOFILE and NPROC)
NULL           ------------------------------------------------------------------------
NULL           type                            soft limit           hard limit
2CIUSERLIMIT   RLIMIT_AS                        unlimited            unlimited
2CIUSERLIMIT   RLIMIT_CORE                     1073741312            unlimited
2CIUSERLIMIT   RLIMIT_CPU                       unlimited            unlimited
2CIUSERLIMIT   RLIMIT_DATA                      unlimited            unlimited
2CIUSERLIMIT   RLIMIT_FSIZE                     unlimited            unlimited
2CIUSERLIMIT   RLIMIT_NOFILE                         4096                 4096
2CIUSERLIMIT   RLIMIT_RSS                        33554432            unlimited
2CIUSERLIMIT   RLIMIT_STACK                      33554432           4294967296

# Java Heap detail and GC history
Similar to HotSpot VM 1.6+, IBM VM Thread Dump also contains information on the Java Heap capacity and utilization along with memory segments allocated for each memory space of the Java process.  

Please keep in mind that deeper Java Heap analysis will require you to analyze the Heap Dump binary snapshot as per below tutorial.

Finally, an history of the garbage collection process is also present.

0SECTION       MEMINFO subcomponent dump routine
NULL           =================================
1STHEAPFREE    Bytes of Heap Space Free: 51104BC8
1STHEAPALLOC   Bytes of Heap Space Allocated: 80000000

1STSEGTYPE     Internal Memory
…………………………………………………………………………………………

1STSEGTYPE     Object Memory
…………………………………………………………………………………………

1STSEGTYPE     Class Memory
…………………………………………………………………………………………

1STSEGTYPE     JIT Code Cache
…………………………………………………………………………………………

1STSEGTYPE     JIT Data Cache
…………………………………………………………………………………………

STGCHTYPE     GC History 
3STHSTTYPE     00:52:07:523048405 GMT j9mm.51 -   SystemGC end: newspace=466136480/483183616 oldspace=899251600/1610612736 loa=80530432/80530432
3STHSTTYPE     00:52:07:523046694 GMT j9mm.139 -   Reference count end: weak=40149 soft=87504 phantom=33 threshold=17 maxThreshold=32
3STHSTTYPE     00:52:07:522164027 GMT j9mm.91 -   GlobalGC end: workstackoverflow=0 overflowcount=0 weakrefs=40149 soft=87504 threshold=17 phantom=33 finalizers=4947 newspace=466136480/483183616 oldspace=899251600/1610612736 loa=80530432/80530432
3STHSTTYPE     00:52:07:522152764 GMT j9mm.90 -   GlobalGC collect complete

# Java and JVM object monitor lock and deadlock detail
This Thread Dump portion is very important. Quite often Thread problems involve Threads waiting between each other due to locks on particular Object monitors e.g. Thread B waiting to acquire a lock on Object monitor held by Thread A. Deadlock conditions can also be triggered from time to time; especially for non-Thread safe implementations.

The IBM VM Thread Dump provides a separate section where you can analyze lock(s) held by each Thread including waiting chain(s) e.g. Many Threads waiting to acquire the same Object monitor lock.

0SECTION       LOCKS subcomponent dump routine
NULL           ===============================
NULL          
1LKPOOLINFO    Monitor pool info:
2LKPOOLTOTAL     Current total number of monitors: 1034
NULL          
1LKMONPOOLDUMP Monitor Pool Dump (flat & inflated object-monitors):
2LKMONINUSE      sys_mon_t:0x0000000115B53060 infl_mon_t: 0x0000000115B530A0:
3LKMONOBJECT       java/util/Timer$TimerImpl@0x0700000000C92AA0/0x0700000000C92AB8: <unowned>
3LKNOTIFYQ            Waiting to be notified:
3LKWAITNOTIFY            "Thread-7" (0x0000000114CAB400)
…………………………………………………………………………

## Threads waiting chain
2LKMONINUSE      sys_mon_t:0x000000012462FE00 infl_mon_t: 0x000000012462FE40:
3LKMONOBJECT       com/inc/server/app/Request@0x07000000142ADF30/0x07000000142ADF48: owner "Thread-30" (0x000000012537F300), entry count 1
3LKNOTIFYQ            Waiting to be notified:
3LKWAITNOTIFY            "Thread-26" (0x0000000125221F00)
3LKWAITNOTIFY            "Thread-27" (0x0000000125252000)
3LKWAITNOTIFY            "Thread-28" (0x000000012527B800)
3LKWAITNOTIFY            "Thread-29" (0x00000001252DDA00)
3LKWAITNOTIFY            "Thread-31" (0x0000000125386200)
3LKWAITNOTIFY            "Thread-32" (0x0000000125423600)
3LKWAITNOTIFY            "Thread-33" (0x000000012548C500)
3LKWAITNOTIFY            "Thread-34" (0x00000001255D6000)
3LKWAITNOTIFY            "Thread-35" (0x00000001255F7900)
…………………………………………………………………………

# Java EE middleware, third party & custom application Threads
Similar to the HotSpot VM Thread Dump format, this portion is the core of the Thread Dump and where you will typically spend most of your analysis time. The number of Threads found will depend on your middleware software that you use, third party libraries (that might have its own Threads) and your application (if creating any custom Thread, which is generally not a best practice).

The following Thread in the example below is in BLOCK state which typically means it is waiting to acquire a lock on an Object monitor. You will need to search in the earlier section and determine which Thread is holding the lock so you can pinpoint the root cause.

3XMTHREADINFO      "[STUCK] ExecuteThread: '162' for queue: 'weblogic.kernel.Default (self-tuning)'" J9VMThread:0x000000013ACF0800, j9thread_t:0x000000013AC88B20, java/lang/Thread:0x070000001F945798, state:B, prio=1
3XMTHREADINFO1            (native thread ID:0x1AD0F3, native priority:0x1, native policy:UNKNOWN)
3XMTHREADINFO3           Java callstack:
4XESTACKTRACE                at org/springframework/jms/connection/SingleConnectionFactory.createConnection(SingleConnectionFactory.java:207(Compiled Code))
4XESTACKTRACE                at org/springframework/jms/connection/SingleConnectionFactory.createQueueConnection(SingleConnectionFactory.java:222(Compiled Code))
4XESTACKTRACE                at org/springframework/jms/core/JmsTemplate102.createConnection(JmsTemplate102.java:169(Compiled Code))
4XESTACKTRACE                at org/springframework/jms/core/JmsTemplate.execute(JmsTemplate.java:418(Compiled Code))
4XESTACKTRACE                at org/springframework/jms/core/JmsTemplate.send(JmsTemplate.java:475(Compiled Code))
4XESTACKTRACE                at org/springframework/jms/core/JmsTemplate.send(JmsTemplate.java:467(Compiled Code))
…………………………………………………………………………………………………………

# JVM class loader summary
Finally, the last section of the IBM VM Thread Dump provides you with a detailed class loader summary. This is very crucial data when dealing with Class Loader related issues and leaks. You will find the number and type of loaded Classes for each active Class loader in the running JVM.

I suggest that you review the following case study for a complete tutorial on how to pinpoint root cause for this type of issues when using IBM VM.


0SECTION       CLASSES subcomponent dump routine
NULL           =================================
1CLTEXTCLLOS        Classloader summaries
1CLTEXTCLLSS               12345678: 1=primordial,2=extension,3=shareable,4=middleware,5=system,6=trusted,7=application,8=delegating
2CLTEXTCLLOADER            p---st-- Loader *System*(0x0700000000878898)
3CLNMBRLOADEDLIB           Number of loaded libraries 6
3CLNMBRLOADEDCL                  Number of loaded classes 3721
2CLTEXTCLLOADER            -x--st-- Loader sun/misc/Launcher$ExtClassLoader(0x0700000000AE8F40), Parent *none*(0x0000000000000000)
3CLNMBRLOADEDLIB           Number of loaded libraries 0
3CLNMBRLOADEDCL                  Number of loaded classes 91
2CLTEXTCLLOADER            -----ta- Loader sun/misc/Launcher$AppClassLoader(0x07000000008786D0), Parent sun/misc/Launcher$ExtClassLoader(0x0700000000AE8F40)
3CLNMBRLOADEDLIB           Number of loaded libraries 3
3CLNMBRLOADEDCL                  Number of loaded classes 15178
……………………………………………………………………………………………

I hope this article has helped to understand the basic view of an IBM VM Thread Dump. The next article (part 5) will provide you with a tutorial on how to analyze a JVM Thread Dump via a step by step tutorial and technique I have used over the last 10 years.
Please feel free to post any comment and question.

17 comments:

Hi all,

Please note that part 5 which is a step by step tutorial on Thread Dump analysis will be available in about 2 weeks from now.

Please ensure that you understand the base and Thread Dump format before moving to part 5.

Thanks.
P-H

By IBM VM, do you mean J9? By JVM Thread Dump, do you mean HotSpot?

Hi Stephane,

Yes, IBM VM ex: JRE 1.6.0 IBM J9 2.4

IBM VM is not HotSpot. HotSpot refers to Sun JVM implementation. They share common properties but these are 2 different JVM implementation.

Thread Dump format is also different between Sun HotSpot and IBM VM, please review part 3 for Thread Dump format for HotSpot.

http://javaeesupportpatterns.blogspot.ca/2012/02/how-to-analyze-thread-dump-part-3.html

Thanks.
P-H

Hi,

Can you please send the link for part 5.

Hi Jagadeesh,

The part 5 is not available yet. I'm planning to complete and release part 5 in about 1 week.

Thanks for your interest in this series.

P-H

Hi,

Please note that the part 5 is now available.
http://javaeesupportpatterns.blogspot.ca/2012/07/how-to-analyze-thread-dump-part-5.html

Thanks.
P-H

Hello,

We are using Weblogic version 10 and our application is in cluster environment.

Very often our portal dispatcher goes to FAILED state. When we verified the log files, we got this error.



This is the current setting in weblogic.

Max Stuck Thread Time : 600 sec
Stuck Thread Count: 1

By increasing the struck thread count will our problem gets resolved?
or do we need to increase both?
Please share your suggestion

Thanks,
Raghav

Hello,

My application error is not displayed in my previous comments.
Hence posting again.

------ Critical Subsystem Thread Pool has failed. Setting server state to FAILED. Server failed as the number of stuck threads has exceeded the max limit of 1

Hi Raghavendra,

Looks like you are hitting the classic issue of the over sensitive Weblogic stuck thread detection feature added since WLS 9.2. When enabled, it can disable/force shutdown the entire server for only 1 stuck thread.

Like we did for many of my clients production system, my recommendation is for you to disable the feature forcing Weblogic to take action when stuck thread is detected. You can still monitor the stuck threads and determine root cause as per Thread Dump analysis while keeping server up and running. It is only useful in the event stuck threads are using resources such as infinite looping, eating CPU & Java Heap.

To disable go to >> Environment >> WorkManager >> Select the workmanager you have facing problem with >> then check the Ignore Stuck Threads field

Also go under >> Environment >> servers (select each managed server) >> Overload >> Select ignore, take no action for both Failure Action & Panic Action

Regards,
P-H

Hello,

Thanks for your response.

Cannot find any entry/record in below path!

To disable go to >> Environment >> WorkManager >> Select the workmanager you have facing problem with >> then check the Ignore Stuck Threads field


This option is already set in the weblogic.

Also go under >> Environment >> servers (select each managed server) >> Overload >> Select ignore, take no action for both Failure Action & Panic Action

How to solve my problem? Please suggest some solution :(


Many thanks,
Raghav

hi Raghav,

Please go again under Environment >> servers (select each managed server) >> Overload >>

Can you please tell me what you see in the Stuck Thread Count field? The value should be 0. This will prevent Weblogic to change server in FAILED state.

Please confirm.

Thanks.
P-H

Hello,

Details are below.

Stuck Thread Count: 1
Max Stuck Thread Time : 600 sec

Thanks,
Raghav

Perfect, this is the problem. With this value, Weblogic will automatically go in FAILED state as soon as only ONE stuck Thread is found.

Please change the value from 1 to 0. This should fix your problem.

Regards,
P-H

Hi!

Could you please hint me how to match thread id from trace.log and thread id from thread dump?

Hi Banepmn,

Can you please give me detail of data found from your trace.log? Are you referring to the AIX CPU per Thread data?

Thanks.
P-H

Hi HeapUpNoRegrets,

I will see what I can find as reference such as IBM related articles.

Please stay tuned as I will also release Thread Dump analysis videos shortly which will be very useful for you to understand the analysis process.

Regards,
P-H

Hi,

In future articles will you teach us how to use IBM thread analyser tool?

Post a Comment