Core Dump Analysis with mdb/dbx

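We had a Java application dump core. pstack and jstack show all threads, but do not reveal which thread is guilty. Finding that one thread is simple with mdb, where $C prints the stack of the thread selected when the core is loaded (here the one that raised the abort, signal 6):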

# mdb /usr/bin/java /var/core/core_hostname_java_8081_8081_1276625241_1986
mdb: core file data for mapping at ffb80000 not saved: Bad address
Loading modules: [ libumem.so.1 libc.so.1 libuutil.so.1 ld.so.1 ]
> $C
4d2fe5f8 libc.so.1`_lwp_kill+8(6, 0, 20f04, ff36932c, ff38a000, ff38abdc)
4d2fe658 libumem.so.1`umem_do_abort+0x1c(3c, 4d2fe5a8, 6, 20e40, ff376ad8, 0)
4d2fe6b8 libumem.so.1`umem_err_recoverable+0x7c(ff377b54, a, 20d38, 656ebd84, ff38d0e8, ff377b5f)
4d2fe718 libumem.so.1`process_free+0x114(59c2008, 1, 0, 3e3a1000, 1ec08, 656d3e9c)
4d2fe778 libxy_xyzclient_native.so.solaris`XYZfree+0x1b8(59c2008, 65725b48, 15b, 4, 45a, 4d2fe9f8)
4d2fe810 libxy_xyzclient_native.so.solaris`XYZ_XYZ_FreeUser+0x28(59c2008, 4d2fea64, 4d2fe9e0, ffffff80, 80000000, 0)
4d2fe880 libxy_xyzclient_native.so.solaris`Java_net_xyz_xyzserver_XYZUser_free+0x24(68204c4, 4d2fea64, 0, 4, 45a, 4d2fe9f8)
...
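
When there are many cores to triage, the same reading of the stack can be scripted. The sketch below is only an illustration: it assumes the `$C` output has been saved as text, that frames look exactly like the ones above, and that the first frame outside libc.so.1 and libumem.so.1 is the interesting one. The function name and the list of "system" objects are my own assumptions, not part of mdb.

```python
import re

# One frame of mdb `$C` output: frame pointer, then object`symbol+offset(args).
FRAME_RE = re.compile(
    r"^(?P<fp>[0-9a-f]+)\s+(?P<obj>[^`\s]+)`(?P<sym>[^+(\s]+)"
    r"(?:\+(?:0x)?[0-9a-f]+)?\("
)

# Assumption: frames in these objects only report the problem, they do not cause it.
SYSTEM_OBJECTS = {"libc.so.1", "libumem.so.1"}


def first_application_frame(stack_text):
    """Return (object, symbol) of the first frame outside SYSTEM_OBJECTS, or None."""
    for line in stack_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # skip "..." and any other non-frame lines
        if match.group("obj") not in SYSTEM_OBJECTS:
            return match.group("obj"), match.group("sym")
    return None


# First frames of the stack shown above, pasted as sample input.
SAMPLE = """\
4d2fe5f8 libc.so.1`_lwp_kill+8(6, 0, 20f04, ff36932c, ff38a000, ff38abdc)
4d2fe658 libumem.so.1`umem_do_abort+0x1c(3c, 4d2fe5a8, 6, 20e40, ff376ad8, 0)
4d2fe6b8 libumem.so.1`umem_err_recoverable+0x7c(ff377b54, a, 20d38, 656ebd84, ff38d0e8, ff377b5f)
4d2fe718 libumem.so.1`process_free+0x114(59c2008, 1, 0, 3e3a1000, 1ec08, 656d3e9c)
4d2fe778 libxy_xyzclient_native.so.solaris`XYZfree+0x1b8(59c2008, 65725b48, 15b, 4, 45a, 4d2fe9f8)
...
"""

if __name__ == "__main__":
    print(first_application_frame(SAMPLE))
```

For the stack above this points at XYZfree in the vendor's native library, which is the same conclusion you reach by eye.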


Or using dbx (included in Oracle Solaris Studio):

# dbx /usr/bin/java /var/core/core_hostname_java_8081_8081_1276625241_1986
 For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.7' in your .dbxrc
Reading java
core file header read successfully
Reading ld.so.1
Reading libumem.so.1
... (omitted) ...
Reading libpthread.so.1
Reading libcmd.so.1
Reading libaio.so.1
WARNING!!
A loadobject was found with an unexpected checksum value.
See `help core mismatch' for details, and run `proc -map'
to see what checksum values were expected and found.
dbx: warning: Some symbolic information might be incorrect.
t@null (l@79) terminated by signal ABRT (Abort)
0xff2c642c: _postfork1_child+0x00ac:    add      %i4, 212, %o1
Current function is XYZfree
dbx: warning: can't find file "/tmp/builds/XYZ-UI-8.0.031/XYZ-UI-8.0.031-source/CommonLibraries/nplexlib/unix/../src/mdebug.c"
dbx: warning: see `help finding-files'
(dbx) lwps                                                                 
  l@1 LWP suspended in door_create_server()
  l@2 LWP suspended in _postfork1_child()
... (omitted) ...
  l@77 LWP suspended in _postfork1_child()
  l@78 LWP suspended in _postfork1_child()
o>l@79 signal SIGABRT in _postfork1_child()
  l@80 LWP suspended in libc_init()
  l@81 LWP suspended in _postfork1_child()
  l@82 LWP suspended in door_create_server()
... (omitted) ...
(dbx) select l@79
(dbx) where
  [1] _postfork1_child(0x6, 0x0, 0x20f04, 0xff36932c, 0xff38a000, 0xff38abdc), at 0xff2c642c
  [2] umem_do_abort(0x3e, 0x68f7e2b8, 0x6, 0x20e40, 0xff376ad8, 0x0), at 0xff369188
  [3] umem_err_recoverable(0xff377b54, 0xa, 0x20d38, 0x652ebd84, 0xff38d0e8, 0xff377b5f), at 0xff36932c
  [4] process_free(0x7262008, 0x1, 0x0, 0x3e3a1000, 0x1ec08, 0x652d3e9c), at 0xff36b504
=>[5] XYZfree(ptr = 0x7262008, f = 0x65325b48 "../src/xy_xyz_user.c", l = 347), line 311 in "mdebug.c"
 
... (omitted) ...
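
Scanning a long lwps listing for the one line that says "signal" can also be scripted. This is a rough sketch that assumes the listing has been saved as text and that the signalled LWP is printed as in the transcript above ("l@N signal SIGxxx in func()"); other dbx versions may format it differently. The function name is hypothetical.

```python
import re

# Matches the signalled LWP line of dbx `lwps` output, e.g.
# "o>l@79 signal SIGABRT in _postfork1_child()"
SIGNAL_RE = re.compile(r"(l@\d+)\s+signal\s+(\w+)\s+in\s+(\w+)\(\)")


def find_signalled_lwp(lwps_text):
    """Return (lwp, signal, function) for the first LWP reported with a signal, or None."""
    for line in lwps_text.splitlines():
        match = SIGNAL_RE.search(line)
        if match is not None:
            return match.groups()
    return None


# A few lines of the lwps listing shown above, pasted as sample input.
SAMPLE_LWPS = """\
  l@1 LWP suspended in door_create_server()
  l@78 LWP suspended in _postfork1_child()
o>l@79 signal SIGABRT in _postfork1_child()
  l@80 LWP suspended in libc_init()
"""

if __name__ == "__main__":
    print(find_signalled_lwp(SAMPLE_LWPS))
```

The returned LWP id (l@79 here) is the one used with select and where above.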

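Both traces show libumem aborting inside process_free, called from XYZfree in the vendor's native library. Now off to the SW vendor... :-)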
