Thursday, July 15, 2010

Core Dump Analysis with mdb/dbx

We had a Java application core dump. pstack and jstack show all threads, but they do not reveal which thread is the guilty one. Finding that thread is simple:

# mdb /usr/bin/java /var/core/core_hostname_java_8081_8081_1276625241_1986
mdb: core file data for mapping at ffb80000 not saved: Bad address
Loading modules: [ ]
> $C
4d2fe5f8`_lwp_kill+8(6, 0, 20f04, ff36932c, ff38a000, ff38abdc)
4d2fe658`umem_do_abort+0x1c(3c, 4d2fe5a8, 6, 20e40, ff376ad8, 0)
4d2fe6b8`umem_err_recoverable+0x7c(ff377b54, a, 20d38, 656ebd84, ff38d0e8, ff377b5f)
4d2fe718`process_free+0x114(59c2008, 1, 0, 3e3a1000, 1ec08, 656d3e9c)
4d2fe778`XYZfree+0x1b8(59c2008, 65725b48, 15b, 4, 45a, 4d2fe9f8)
4d2fe810`XYZ_XYZ_FreeUser+0x28(59c2008, 4d2fea64, 4d2fe9e0, ffffff80, 80000000, 0)
4d2fe880`Java_net_xyz_xyzserver_XYZUser_free+0x24(68204c4, 4d2fea64, 0, 4, 45a, 4d2fe9f8)
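In mdb, $C only prints the stack of the thread that raised the fatal signal. If the interesting thread were a different one, the stacks of all threads could be dumped as well. A minimal sketch against the same core (::status and ::walk thread | ::findstack are standard mdb dcmds; output omitted):

```
# mdb /usr/bin/java /var/core/core_hostname_java_8081_8081_1276625241_1986
> ::status
> ::walk thread | ::findstack -v
> $q
```

::status shows the process state and the signal that terminated it; the walk/findstack pipeline prints a stack trace for every LWP in the core.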

Or using dbx (included in Oracle Solaris Studio):

# dbx /usr/bin/java /var/core/core_hostname_java_8081_8081_1276625241_1986
 For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.7' in your .dbxrc
Reading java
core file header read successfully
... (omitted) ...
A loadobject was found with an unexpected checksum value.
See `help core mismatch' for details, and run `proc -map'
to see what checksum values were expected and found.
dbx: warning: Some symbolic information might be incorrect.
t@null (l@79) terminated by signal ABRT (Abort)
0xff2c642c: _postfork1_child+0x00ac:    add      %i4, 212, %o1
Current function is XYZfree
dbx: warning: can't find file "/tmp/builds/XYZ-UI-8.0.031/XYZ-UI-8.0.031-source/CommonLibraries/nplexlib/unix/../src/mdebug.c"
dbx: warning: see `help finding-files'
(dbx) lwps                                                                 
  l@1 LWP suspended in door_create_server()
  l@2 LWP suspended in _postfork1_child()
... (omitted) ...
  l@77 LWP suspended in _postfork1_child()
  l@78 LWP suspended in _postfork1_child()
o>l@79 signal SIGABRT in _postfork1_child()
  l@80 LWP suspended in libc_init()
  l@81 LWP suspended in _postfork1_child()
  l@82 LWP suspended in door_create_server()
... (omitted) ...
(dbx) select l@79
(dbx) where
  [1] _postfork1_child(0x6, 0x0, 0x20f04, 0xff36932c, 0xff38a000, 0xff38abdc), at 0xff2c642c
  [2] umem_do_abort(0x3e, 0x68f7e2b8, 0x6, 0x20e40, 0xff376ad8, 0x0), at 0xff369188
  [3] umem_err_recoverable(0xff377b54, 0xa, 0x20d38, 0x652ebd84, 0xff38d0e8, 0xff377b5f), at 0xff36932c
  [4] process_free(0x7262008, 0x1, 0x0, 0x3e3a1000, 0x1ec08, 0x652d3e9c), at 0xff36b504
=>[5] XYZfree(ptr = 0x7262008, f = 0x65325b48 "../src/xy_xyz_user.c", l = 347), line 311 in "mdebug.c"
... (omitted) ...

Now off to the SW vendor... :-)

Monday, July 12, 2010

Read/Write Performance Observations

As I mentioned in my previous post, we have now moved more active mailboxes to the Sun Oracle 7000 Storage System. Active means incoming mail, POP3/IMAP4 accesses, etc.


First we will take a look at disk read latency. We can see that more than 50% of the reads complete in less than 10 ms.

A sanity check on one of the NFS clients confirms this:

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device  
   73.6   36.5  494.4  317.7  0.0  0.3    0.1    2.4   1  22 filer:/filesystemb 
   39.8   31.7  322.2  445.4  0.0  0.1    0.4    1.8   1  12 filer:/filesystemb

We can see that the average service time over these 10-second samples is around 2 ms. Usually I don't read too much into the %b value, other than when it is constantly at 100%: newer file systems read and write in bursts, which makes %b a poor problem indicator.
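That 2 ms figure can be cross-checked by averaging the asvc_t column over the samples, weighted by operations per second (r/s + w/s). A minimal awk sketch, fed the two sample lines from above (column positions as in iostat -xn output):

```shell
# Weighted average service time across iostat -xn samples:
# $1 = r/s, $2 = w/s, $8 = asvc_t (ms).
awk '
  { ops = $1 + $2; sum += ops * $8; total += ops }
  END { printf "weighted avg asvc_t: %.2f ms\n", sum / total }
' <<'EOF'
   73.6   36.5  494.4  317.7  0.0  0.3    0.1    2.4   1  22 filer:/filesystemb
   39.8   31.7  322.2  445.4  0.0  0.1    0.4    1.8   1  12 filer:/filesystemb
EOF
```

This prints a weighted average of about 2.16 ms, consistent with the eyeball estimate.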

Next we will take a look at the read I/Os. We can see a moderate number of read I/Os per disk.

We can also see the bandwidth usage for this operation. For comparison, I've marked the same physical disk.


Writes are handled differently by the Oracle Storage 7000 systems: synchronous writes are written to the SLOG/SSD devices, while asynchronous writes are collected in the ARC and then flushed to disk. You can see this as long bars in the latency graph.

A different view of this is the I/O per second graph for writes. The yellow area on the graph represents the SLOG devices. You can see an almost straight line, whereas the blue area has some peaks.

The last graph shows the same data as I/O bytes per second. Here we see roughly the same peaks.


Looking at the read/write ratio shows that we issue far more write I/Os than read I/Os.



Analytics helps us tremendously in monitoring how the system behaves during the migration. This is especially important with workloads that cannot be easily simulated.

As we already had ZFS experience, the patterns have not been too surprising. What does surprise me is the high write portion. On our FC-attached storage array, we see a roughly 50/50 read/write ratio for the same workload using VxFS. While this is not a problem at all, I will investigate the reason further.
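To put a number on that split from the client side, the r/s and w/s columns of an iostat -xn capture can simply be summed. A small sketch; the input lines are hypothetical, chosen to mimic a write-heavy interval like the one Analytics shows:

```shell
# Sum r/s ($1) and w/s ($2) over all samples and print the read/write split.
awk '
  { r += $1; w += $2 }
  END { printf "reads %.0f%%, writes %.0f%%\n", 100*r/(r+w), 100*w/(r+w) }
' <<'EOF'
   30.2   71.8  211.0  540.3  0.0  0.2    0.1    1.9   1  10 filer:/filesystemb
   28.4   69.6  198.7  512.9  0.0  0.2    0.2    2.1   1  11 filer:/filesystemb
EOF
```

Run over a longer capture, this gives a ratio that can be compared directly against the 50/50 figure from the VxFS setup.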

Friday, July 9, 2010

ARC Cache revisited

As we move more and more active mailboxes onto our OpenStorage box, it's time to have a look at the ARC cache.

If you remember, I already blogged a while ago about the ARC. At that time we had only inactive mailboxes, with almost no access other than incoming mails.

This is how it looks with active mailboxes:

I've colored all the hits. As you can see, there is a huge number of metadata hits and prefetched metadata hits. You can also see some data hits and prefetched data hits.

We're now constantly adding more and more mailboxes to the system, resulting in more and more metadata. As the amount of data grows, the L2ARC will become more important. Stay tuned for a blog entry about how the L2ARC behaves...