DTrace saving the planet again....
...well, maybe not the planet, but DTrace was again a big help in finding another performance problem.
During peak hours, and when a spam wave was coming in, we saw a spike on our T5120 systems.
Where we usually see a very low load of about 1-2, the load jumped to 400 within a few minutes.
Something didn't seem right.
As with the previous DTrace problem, we saw that our userland processes had a high system time, so most of what they were doing (or not doing) was happening in the kernel.
It was time to use one of the secret weapons Solaris admins have: lockstat (which is built on DTrace).
Sorry for the lengthy output...
# lockstat -i 133 -I sleep 5
Profiling interrupt: 43328 events in 5.088 seconds (8516 events/sec)
Count indv cuml rcnt nsec CPU+PIL Caller
-------------------------------------------------------------------------------
155 0% 0% 0.00 2347 cpu[42] rdccr_delay+0x8
152 0% 1% 0.00 2184 cpu[53] rdccr_delay+0x4
151 0% 1% 0.00 2135 cpu[58] rdccr_delay+0x8
150 0% 1% 0.00 2201 cpu[10] rdccr_delay+0x4
149 0% 2% 0.00 2486 cpu[24] rdccr_delay+0x8
147 0% 2% 0.00 2296 cpu[57] default_lock_delay+0x74
147 0% 2% 0.00 2476 cpu[56] default_lock_delay+0x74
147 0% 3% 0.00 2138 cpu[46] default_lock_delay+0x74
146 0% 3% 0.00 2148 cpu[55] default_lock_delay+0x74
145 0% 3% 0.00 2157 cpu[61] rdccr_delay+0x8
143 0% 4% 0.00 2098 cpu[49] rdccr_delay+0x8
143 0% 4% 0.00 2231 cpu[45] rdccr_delay+0x4
143 0% 4% 0.00 2025 cpu[28] rdccr_delay+0x8
142 0% 5% 0.00 2000 cpu[63] rdccr_delay+0x4
141 0% 5% 0.00 2106 cpu[55] rdccr_delay+0x4
141 0% 5% 0.00 2173 cpu[52] rdccr_delay+0x4
141 0% 6% 0.00 2464 cpu[48] default_lock_delay+0x74
141 0% 6% 0.00 2097 cpu[4] default_lock_delay+0x74
140 0% 6% 0.00 2195 cpu[54] rdccr_delay+0x4
140 0% 7% 0.00 2084 cpu[14] rdccr_delay+0x4
140 0% 7% 0.00 2475 cpu[8] default_lock_delay+0x74
139 0% 7% 0.00 2196 cpu[53] rdccr_delay+0x8
139 0% 8% 0.00 2123 cpu[44] rdccr_delay+0x8
139 0% 8% 0.00 2167 cpu[41] default_lock_delay+0x74
139 0% 8% 0.00 2386 cpu[31] rdccr_delay+0x8
139 0% 9% 0.00 2551 cpu[16] default_lock_delay+0x74
139 0% 9% 0.00 2598 cpu[16] rdccr_delay+0x8
139 0% 9% 0.00 2204 cpu[11] default_lock_delay+0x74
138 0% 10% 0.00 2238 cpu[62] rdccr_delay+0x4
138 0% 10% 0.00 2163 cpu[51] rdccr_delay+0x8
138 0% 10% 0.00 2164 cpu[47] rdccr_delay+0x4
138 0% 11% 0.00 2221 cpu[22] default_lock_delay+0x74
138 0% 11% 0.00 2158 cpu[22] rdccr_delay+0x4
137 0% 11% 0.00 2082 cpu[59] rdccr_delay+0x4
137 0% 11% 0.00 2190 cpu[54] rdccr_delay+0x8
137 0% 12% 0.00 2139 cpu[47] default_lock_delay+0x74
137 0% 12% 0.00 2169 cpu[46] rdccr_delay+0x4
137 0% 12% 0.00 2234 cpu[36] default_lock_delay+0x74
137 0% 13% 0.00 2491 cpu[32] rdccr_delay+0x8
137 0% 13% 0.00 2074 cpu[23] default_lock_delay+0x74
137 0% 13% 0.00 2198 cpu[19] default_lock_delay+0x74
137 0% 14% 0.00 2185 cpu[5] default_lock_delay+0x74
136 0% 14% 0.00 2076 cpu[60] rdccr_delay+0x8
136 0% 14% 0.00 2160 cpu[50] default_lock_delay+0x74
136 0% 15% 0.00 2225 cpu[43] rdccr_delay+0x4
136 0% 15% 0.00 2250 cpu[39] rdccr_delay+0x8
136 0% 15% 0.00 2088 cpu[20] rdccr_delay+0x4
136 0% 16% 0.00 2209 cpu[7] rdccr_delay+0x4
135 0% 16% 0.00 2253 cpu[62] default_lock_delay+0x74
135 0% 16% 0.00 2191 cpu[61] rdccr_delay+0x4
135 0% 17% 0.00 2539 cpu[40] rdccr_delay+0x4
135 0% 17% 0.00 2233 cpu[36] rdccr_delay+0x4
135 0% 17% 0.00 2047 cpu[28] rdccr_delay+0x4
135 0% 17% 0.00 2192 cpu[19] rdccr_delay+0x8
135 0% 18% 0.00 2203 cpu[1] default_lock_delay+0x74
134 0% 18% 0.00 2354 cpu[37] default_lock_delay+0x74
134 0% 18% 0.00 2411 cpu[37] rdccr_delay+0x4
134 0% 19% 0.00 2062 cpu[23] rdccr_delay+0x4
134 0% 19% 0.00 2317 cpu[15] rdccr_delay+0x8
133 0% 19% 0.00 2161 cpu[41] rdccr_delay+0x4
133 0% 20% 0.00 2211 cpu[19] rdccr_delay+0x4
133 0% 20% 0.00 2282 cpu[9] rdccr_delay+0x4
133 0% 20% 0.00 2093 cpu[4] rdccr_delay+0x8
133 0% 21% 0.00 2753 cpu[0] default_lock_delay+0x74
132 0% 21% 0.00 2444 cpu[48] rdccr_delay+0x4
132 0% 21% 0.00 2223 cpu[45] rdccr_delay+0x8
132 0% 21% 0.00 2238 cpu[39] default_lock_delay+0x74
132 0% 22% 0.00 2164 cpu[22] rdccr_delay+0x8
131 0% 22% 0.00 2572 cpu[56] rdccr_delay+0x8
131 0% 22% 0.00 2121 cpu[46] rdccr_delay+0x8
131 0% 23% 0.00 2539 cpu[40] rdccr_delay+0x8
131 0% 23% 0.00 2246 cpu[36] rdccr_delay+0x8
131 0% 23% 0.00 2388 cpu[31] rdccr_delay+0x4
131 0% 24% 0.00 1957 cpu[26] default_lock_delay+0x74
131 0% 24% 0.00 2252 cpu[18] rdccr_delay+0x4
131 0% 24% 0.00 2374 cpu[17] default_lock_delay+0x74
131 0% 24% 0.00 2287 cpu[12] rdccr_delay+0x4
131 0% 25% 0.00 2307 cpu[3] rdccr_delay+0x4
131 0% 25% 0.00 2221 cpu[1] rdccr_delay+0x8
130 0% 25% 0.00 2088 cpu[60] rdccr_delay+0x4
130 0% 26% 0.00 2198 cpu[51] default_lock_delay+0x74
130 0% 26% 0.00 2141 cpu[44] default_lock_delay+0x74
130 0% 26% 0.00 2264 cpu[42] default_lock_delay+0x74
130 0% 27% 0.00 2203 cpu[33] default_lock_delay+0x74
130 0% 27% 0.00 2110 cpu[14] default_lock_delay+0x74
130 0% 27% 0.00 2170 cpu[12] default_lock_delay+0x74
130 0% 27% 0.00 2251 cpu[7] default_lock_delay+0x74
130 0% 28% 0.00 2065 cpu[2] default_lock_delay+0x74
129 0% 28% 0.00 2028 cpu[63] rdccr_delay+0x8
129 0% 28% 0.00 2090 cpu[59] default_lock_delay+0x74
129 0% 29% 0.00 2418 cpu[56] rdccr_delay+0x4
129 0% 29% 0.00 2099 cpu[49] rdccr_delay+0x4
129 0% 29% 0.00 2290 cpu[38] default_lock_delay+0x74
129 0% 30% 0.00 2256 cpu[35] default_lock_delay+0x74
129 0% 30% 0.00 2324 cpu[17] rdccr_delay+0x8
129 0% 30% 0.00 2688 cpu[0] rdccr_delay+0x4
128 0% 30% 0.00 2237 cpu[30] default_lock_delay+0x74
128 0% 31% 0.00 2330 cpu[30] rdccr_delay+0x4
128 0% 31% 0.00 2291 cpu[29] rdccr_delay+0x4
128 0% 31% 0.00 1969 cpu[26] rdccr_delay+0x8
128 0% 32% 0.00 2224 cpu[18] default_lock_delay+0x74
128 0% 32% 0.00 2104 cpu[6] rdccr_delay+0x8
127 0% 32% 0.00 2148 cpu[60] default_lock_delay+0x74
127 0% 33% 0.00 2191 cpu[51] rdccr_delay+0x4
127 0% 33% 0.00 2143 cpu[49] default_lock_delay+0x74
127 0% 33% 0.00 2247 cpu[38] rdccr_delay+0x4
127 0% 33% 0.00 2261 cpu[30] rdccr_delay+0x8
127 0% 34% 0.00 2288 cpu[29] default_lock_delay+0x74
127 0% 34% 0.00 2027 cpu[28] default_lock_delay+0x74
127 0% 34% 0.00 2189 cpu[13] default_lock_delay+0x74
127 0% 35% 0.00 2497 cpu[8] rdccr_delay+0x4
127 0% 35% 0.00 2079 cpu[6] rdccr_delay+0x4
126 0% 35% 0.00 2174 cpu[50] rdccr_delay+0x4
126 0% 35% 0.00 2449 cpu[48] rdccr_delay+0x8
126 0% 36% 0.00 2229 cpu[34] rdccr_delay+0x4
126 0% 36% 0.00 2267 cpu[21] rdccr_delay+0x4
126 0% 36% 0.00 2569 cpu[16] rdccr_delay+0x4
126 0% 37% 0.00 2294 cpu[15] rdccr_delay+0x4
126 0% 37% 0.00 2213 cpu[5] rdccr_delay+0x4
126 0% 37% 0.00 2265 cpu[3] default_lock_delay+0x74
125 0% 37% 0.00 2523 cpu[32] rdccr_delay+0x4
125 0% 38% 0.00 2371 cpu[17] rdccr_delay+0x4
125 0% 38% 0.00 2242 cpu[9] default_lock_delay+0x74
124 0% 38% 0.00 2185 cpu[50] rdccr_delay+0x8
124 0% 39% 0.00 2201 cpu[43] rdccr_delay+0x8
124 0% 39% 0.00 2172 cpu[41] rdccr_delay+0x8
124 0% 39% 0.00 2162 cpu[33] rdccr_delay+0x8
124 0% 39% 0.00 2237 cpu[21] default_lock_delay+0x74
123 0% 40% 0.00 2153 cpu[58] default_lock_delay+0x74
123 0% 40% 0.00 2240 cpu[39] rdccr_delay+0x4
123 0% 40% 0.00 2121 cpu[33] rdccr_delay+0x4
123 0% 41% 0.00 2256 cpu[21] rdccr_delay+0x8
123 0% 41% 0.00 2360 cpu[15] default_lock_delay+0x74
123 0% 41% 0.00 2181 cpu[11] rdccr_delay+0x8
122 0% 41% 0.00 2249 cpu[57] rdccr_delay+0x8
122 0% 42% 0.00 2120 cpu[55] rdccr_delay+0x8
122 0% 42% 0.00 2136 cpu[52] rdccr_delay+0x8
122 0% 42% 0.00 2231 cpu[34] rdccr_delay+0x8
122 0% 43% 0.00 2115 cpu[20] default_lock_delay+0x74
122 0% 43% 0.00 2211 cpu[5] rdccr_delay+0x8
121 0% 43% 0.00 2274 cpu[35] rdccr_delay+0x8
121 0% 43% 0.00 2263 cpu[29] rdccr_delay+0x8
121 0% 44% 0.00 2125 cpu[25] rdccr_delay+0x8
121 0% 44% 0.00 2298 cpu[3] rdccr_delay+0x8
121 0% 44% 0.00 2711 cpu[0] rdccr_delay+0x8
120 0% 45% 0.00 2002 cpu[63] default_lock_delay+0x74
120 0% 45% 0.00 2124 cpu[43] default_lock_delay+0x74
120 0% 45% 0.00 2539 cpu[40] default_lock_delay+0x74
120 0% 45% 0.00 2018 cpu[27] default_lock_delay+0x74
120 0% 46% 0.00 2184 cpu[25] rdccr_delay+0x4
120 0% 46% 0.00 2089 cpu[23] rdccr_delay+0x8
120 0% 46% 0.00 2202 cpu[13] rdccr_delay+0x8
120 0% 47% 0.00 2243 cpu[10] default_lock_delay+0x74
119 0% 47% 0.00 2158 cpu[58] rdccr_delay+0x4
119 0% 47% 0.00 2489 cpu[24] rdccr_delay+0x4
119 0% 47% 0.00 2085 cpu[14] rdccr_delay+0x8
118 0% 48% 0.00 2064 cpu[59] rdccr_delay+0x8
118 0% 48% 0.00 2187 cpu[47] rdccr_delay+0x8
118 0% 48% 0.00 2277 cpu[35] rdccr_delay+0x4
118 0% 48% 0.00 2063 cpu[27] rdccr_delay+0x4
118 0% 49% 0.00 2259 cpu[10] rdccr_delay+0x8
117 0% 49% 0.00 1937 cpu[26] rdccr_delay+0x4
117 0% 49% 0.00 2174 cpu[11] rdccr_delay+0x4
117 0% 49% 0.00 2516 cpu[8] rdccr_delay+0x8
117 0% 50% 0.00 2113 cpu[4] rdccr_delay+0x4
116 0% 50% 0.00 2228 cpu[61] default_lock_delay+0x74
116 0% 50% 0.00 2111 cpu[44] rdccr_delay+0x4
116 0% 51% 0.00 2285 cpu[42] rdccr_delay+0x4
116 0% 51% 0.00 2432 cpu[37] rdccr_delay+0x8
116 0% 51% 0.00 1997 cpu[27] rdccr_delay+0x8
116 0% 51% 0.00 2472 cpu[24] default_lock_delay+0x74
116 0% 52% 0.00 2113 cpu[6] default_lock_delay+0x74
116 0% 52% 0.00 2040 cpu[2] rdccr_delay+0x4
115 0% 52% 0.00 2219 cpu[54] default_lock_delay+0x74
114 0% 52% 0.00 2221 cpu[38] rdccr_delay+0x8
114 0% 53% 0.00 2466 cpu[32] default_lock_delay+0x74
114 0% 53% 0.00 2148 cpu[25] default_lock_delay+0x74
114 0% 53% 0.00 2103 cpu[20] rdccr_delay+0x8
114 0% 53% 0.00 2259 cpu[9] rdccr_delay+0x8
113 0% 54% 0.00 2175 cpu[13] rdccr_delay+0x4
112 0% 54% 0.00 2367 cpu[31] default_lock_delay+0x74
112 0% 54% 0.00 2191 cpu[12] rdccr_delay+0x8
112 0% 55% 0.00 2060 cpu[2] rdccr_delay+0x8
112 0% 55% 0.00 2174 cpu[1] rdccr_delay+0x4
111 0% 55% 0.00 2240 cpu[57] rdccr_delay+0x4
111 0% 55% 0.00 2196 cpu[45] default_lock_delay+0x74
110 0% 56% 0.00 2189 cpu[52] default_lock_delay+0x74
108 0% 56% 0.00 2257 cpu[7] rdccr_delay+0x8
107 0% 56% 0.00 2224 cpu[53] default_lock_delay+0x74
107 0% 56% 0.00 2236 cpu[34] default_lock_delay+0x74
106 0% 57% 0.00 2267 cpu[18] rdccr_delay+0x8
105 0% 57% 0.00 2224 cpu[62] rdccr_delay+0x8
55 0% 57% 0.00 2643 cpu[8] cpu_halt+0x10c
54 0% 57% 0.00 2251 cpu[41] cpu_halt+0x10c
54 0% 57% 0.00 2469 cpu[12] cpu_halt+0x10c
54 0% 57% 0.00 2326 cpu[11] cpu_halt+0x10c
54 0% 57% 0.00 2373 cpu[7] cpu_halt+0x10c
54 0% 58% 0.00 2380 cpu[5] cpu_halt+0x10c
54 0% 58% 0.00 2535 cpu[3] cpu_halt+0x10c
53 0% 58% 0.00 2409 cpu[34] cpu_halt+0x10c
52 0% 58% 0.00 2691 cpu[24] cpu_halt+0x10c
52 0% 58% 0.00 2365 cpu[19] cpu_halt+0x10c
52 0% 58% 0.00 2520 cpu[17] cpu_halt+0x10c
52 0% 58% 0.00 2345 cpu[9] cpu_halt+0x10c
51 0% 58% 0.00 2315 cpu[53] cpu_halt+0x10c
51 0% 58% 0.00 2311 cpu[52] cpu_halt+0x10c
51 0% 59% 0.00 2471 cpu[42] cpu_halt+0x10c
51 0% 59% 0.00 2358 cpu[38] cpu_halt+0x10c
51 0% 59% 0.00 2367 cpu[21] cpu_halt+0x10c
What the h... are these rdccr_delay functions? Don't ask me :-) But looking at the code, rdccr_delay appears inside a mutex construct. Mutex? Sounds like we have a bad locking situation here...
OK, let's check whether we have mutex spins:
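To get a feel for why a single hot mutex hurts this badly, here is a toy illustration (my own sketch in Python, not Solaris code, and not how adaptive mutexes are implemented): many threads funneling through one shared lock serialize completely, so most of their time goes into waiting rather than working, which is exactly the pattern lockstat's spin statistics expose.

```python
import threading
import time

# Toy illustration: 8 threads all contending for ONE shared lock.
# Each thread records how long it spent waiting to acquire the lock;
# with a single lock, every operation serializes behind it.

shared_lock = threading.Lock()
results = []

def worker(n_ops):
    waited = 0.0
    for _ in range(n_ops):
        t0 = time.perf_counter()
        with shared_lock:  # every thread funnels through here
            waited += time.perf_counter() - t0
            # the actual "work" under the lock is trivial
    results.append(waited)

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"8 threads done; total time spent waiting on the lock: {sum(results):.4f}s")
```

The more threads you add, the worse the ratio of waiting to working gets, while the total useful work stays the same.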
# lockstat -C sleep 5
Adaptive mutex spin: 289321 events in 5.074 seconds (57021 events/sec)
Count indv cuml rcnt nsec Lock Caller
-------------------------------------------------------------------------------
174359 60% 60% 0.00 863315 0x6003fe15a38 nfs4_start_open_seqid_sync+0x68
87472 30% 90% 0.00 424820 0x6003fe15e98 nfs4_start_open_seqid_sync+0x68
2572 1% 91% 0.00 41206 door_knob door_return+0x1e8
2499 1% 92% 0.00 25247 door_knob door_upcall_limited+0x27c
2469 1% 93% 0.00 36722 door_knob door_return+0x23c
1664 1% 94% 0.00 260935 0x6003fe15e98 open_owner_rele+0x8
1506 1% 94% 0.00 53714 door_knob door_return+0xbc
1490 1% 95% 0.00 168535 0x6003fe15e98 find_open_owner_nolock+0x2c
1484 1% 95% 0.00 54843 door_knob door_return+0x44
1410 0% 96% 0.00 71905 door_knob door_upcall_limited+0x94
884 0% 96% 0.00 769573 0x6003fe15a38 open_owner_rele+0x8
762 0% 96% 0.00 894325 0x6003fe15a38 find_open_owner_nolock+0x2c
730 0% 97% 0.00 75239 0x6003fe15e98 nfs4_start_open_seqid_sync+0x48
386 0% 97% 0.00 766709 0x6003fe15a38 nfs4_start_open_seqid_sync+0x48
359 0% 97% 0.00 196996 0x6003c393000 wait_for_recovery+0x4
357 0% 97% 0.00 558062 0x6003fe15a38 nfs4_end_open_seqid_sync+0xc
350 0% 97% 0.00 17691 0x6003c393000 nfs4_rfscall+0x48
317 0% 97% 0.00 895414 0x6003c2a4000 wait_for_recovery+0x4
307 0% 97% 0.00 1561398 0x60033a86e80 sigunintr+0x38
275 0% 97% 0.00 123310 0x6003c2a4000 nfs4_rfscall+0x48
274 0% 97% 0.00 24396 0x6003c393000 nfs4_rfscall+0x17c
271 0% 98% 0.00 92475 0x6003fe15e98 nfs4_get_otw_cred+0x34
268 0% 98% 0.00 73946 0x6003fe15e98 nfs4_end_open_seqid_sync+0xc
260 0% 98% 0.00 236558 0x60033a86e80 clock_tick_process+0xc8
257 0% 98% 0.00 483864 0x6003c393000 nfs4_start_open_seqid_sync+0x9c
219 0% 98% 0.00 4798 mod_lock mod_hold_stub+0x18
211 0% 98% 0.00 64277 0x6003c2a4000 nfs4_rfscall+0x17c
197 0% 98% 0.00 18179 0x6003c393000 nfs4_start_open_seqid_sync+0x4
181 0% 98% 0.00 950176 0x6003c2a4000 nfs4_attr_cache+0xc
175 0% 98% 0.00 823990 0x60033a86e80 sigintr+0x2c
163 0% 98% 0.00 628697 0x6003fe15a38 nfs4_get_otw_cred+0x34
161 0% 98% 0.00 685230 0x6003fe15a38 nfs4open_otw+0xfa8
160 0% 98% 0.00 547855 0x6003fe15a38 open_owner_hold+0xc
159 0% 98% 0.00 643441 0x6003fe15a38 nfs4_get_otw_cred_by_osp+0x178
132 0% 98% 0.00 524536 0x60033a86e80 post_syscall+0x320
128 0% 98% 0.00 57596 0x6003c2a4000 nfs4_start_open_seqid_sync+0x4
123 0% 99% 0.00 4786 mod_lock mod_release_stub+0x10
120 0% 99% 0.00 15053 0x6003c392000 find_open_owner+0x4
115 0% 99% 0.00 13426 0x6003fe15e98 nfs4open_otw+0xfa8
115 0% 99% 0.00 222285 0x6003c393000 find_open_owner+0x4
113 0% 99% 0.00 173366 0x6003c393000 nfs4_attr_cache+0xc
109 0% 99% 0.00 8928 0x6003c392000 nfs4write+0x174
108 0% 99% 0.00 14343 0x6003fe15e98 open_owner_hold+0xc
105 0% 99% 0.00 924250 0x6003c2a4000 nfs4_start_open_seqid_sync+0x9c
103 0% 99% 0.00 86073 0x6003fe15e98 nfs4_get_otw_cred_by_osp+0x178
94 0% 99% 0.00 2909 0x60031e81380 hook_run+0xd8
93 0% 99% 0.00 9617 0x6003c392000 nfs4write+0x78
89 0% 99% 0.00 9063 0x6003c392000 nfs4_rfscall+0x48
89 0% 99% 0.00 5480 0x6003c24ef28 mir_wput+0xc4
88 0% 99% 0.00 8461 0x6003c392000 nfs4_rfscall+0x17c
87 0% 99% 0.00 1146262 0x60033a86e80 lookuppnat+0x2c
84 0% 99% 0.00 295989 0x6003c393000 nfs4_delegation_accept+0x2c8
83 0% 99% 0.00 4963 0x6003c393ae8 nfs_rw_exit+0xc
79 0% 99% 0.00 10948 0x6003c392000 wait_for_recovery+0x4
75 0% 99% 0.00 13338 0x6003c393000 nfs4_start_fop+0x378
75 0% 99% 0.00 380086 0x6003c2a4000 find_open_owner+0x4
61 0% 99% 0.00 2234 zone0+0x18 zone_getspecific+0x8
59 0% 99% 0.00 3084 0x60031e81380 hook_run+0x10
57 0% 99% 0.00 6601 0x60034fb6a40 e1000g_rxfree_func+0x48
54 0% 99% 0.00 12081 0x6003c393ae8 nfs_rw_enter_sig+0x8
49 0% 99% 0.00 1100 0x6003505a6a0 e1000g_send+0x1534
43 0% 99% 0.00 5959 0x6003c392000 nfs4_async_start+0x278
43 0% 99% 0.00 641531 0x6003c2a4000 nfs4_delegation_accept+0x2c8
43 0% 99% 0.00 916554 0x6003c2a4000 nfs4open_otw+0xbc0
43 0% 99% 0.00 7897 0x60037b50a38 nfs_idmap_str_uid+0xb0
40 0% 99% 0.00 19531 0x6003c392000 nfs4write+0x4dc
38 0% 99% 0.00 1135 0x60037554900 tcp_wput+0xa4
37 0% 99% 0.00 101535 0x6003c2a4000 nfs4_start_fop+0x378
35 0% 99% 0.00 6144 0x60037b50a38 nfs_idmap_str_gid+0xb0
34 0% 99% 0.00 23000 0x6003c392000 nfs4write+0x234
30 0% 99% 0.00 14483 0x6003c393000 nfs4open_otw+0x5e4
28 0% 99% 0.00 452930 0x6003c2a4000 nfs4write+0x174
27 0% 99% 0.00 1138896 0x6003c2a4000 nfs4read+0x240
This looks like congestion of NFSv4 operations!
I was not able to find much more information about this issue, so I contacted my Solaris guru of choice. He was able to find a matching bug:
"NFSv4 can be much slower and more expensive than NFSv3 when creating files at high concurrency"
This sounded exactly like what our application does: putting spam mails into folders, using parallel threads. :-)
Now it was time to open a support case. About 10 days after opening it, we received an Interim Diagnostic Relief (IDR) patch.
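The workload pattern is simple to sketch. The following is a hypothetical stand-in for our delivery code (names like deliver_mail are my own, and it writes to a local temp directory for illustration): many parallel threads each creating one small file per mail. On an NFSv4 mount, each create is an over-the-wire OPEN, and at high concurrency those OPENs pile up, which is what the bug describes.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of the workload: parallel threads creating many
# small files ("spam mails") in the same folder. Run against an NFSv4
# mount, this is the high-concurrency file-creation pattern from the bug.

def deliver_mail(folder, i):
    # one small file per mail
    path = os.path.join(folder, f"mail_{i}.eml")
    with open(path, "w") as f:
        f.write("Subject: spam\n\nbuy now\n")
    return path

folder = tempfile.mkdtemp()  # stands in for an NFS-mounted mail folder
with ThreadPoolExecutor(max_workers=16) as pool:
    paths = list(pool.map(lambda i: deliver_mail(folder, i), range(200)))

print(f"created {len(paths)} files in {folder}")
```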
And this is how it helped us (Sunday/Monday on the right):
Again, it was relatively easy to find the root cause of our problem with the help of a few *stat commands.
The question remains: why did it happen?
First of all, NFSv4 is still a relatively new protocol, with new bugs. We also tried NFSv3, but from what we've seen, version 4 is much more efficient for our workload (small files).
Second, while our workload is a perfect fit for these CoolThreads CPUs (normally ;-), we're still on new ground.
My guess is the world will see many more concurrency problems like these in the future.
Comments
If you're creating files heavily in the same directory, you may also run into 6886066 (not NFSv4-specific, though).
The IDR contains the cv_signal change and there should soon be a patch available:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6996729
The IDR has solved the problem for us...