In my first part I wrote about storing long term analytics data in Splunk.
Wouldn't it be nice to also have storage capacity tracked with Splunk?
This is how it's done:
1. Get pool properties
#!/bin/ksh
# list capacity all Pools in a System to ${outputdir}/${poolname}.pools.log
# Example: listPools.ksh /tmp 10.16.5.14
typeset outputdir=$1
typeset ipname=$2
typeset debug=$3
typeset user=monitor
if [ -z "$1" -o -z "$2" ]; then
printf "\nUsage: $0 <output Dir≷ <ZFSSA ipname> [ debug ]\n\n"
exit 1
fi
mkdir -p ${outputdir}
dat=$(date +'%y-%m-%d %H:%M:%S')
ssh -T ${user}@${ipname} << --EOF-- > ${outputdir}/${ipname}.pools.log
script
run('status');
run('storage');
var poollist=list();
printf("Time,pool,avail,compression,used,space_percentage\\n");
for(var k=0; k<:poollist.length; k++) {
run('select ' + poollist[k]);
var space_used=get('used')/1024/1024/1024;
var space_avail=get('avail')/1024/1024/1024;
var compression=get('compression');
var space_percentage=space_used/(space_used + space_avail)*100;
printf("$dat,%s,%0.0f,%0.2f,%0.0f,%0.0f\\n",poollist[k],space_avail,compression,space_used,space_percentage);
run('done');
}
run('done');
--EOF--
exit $?
2. Write Splunk props.conf
As an exercise...
3. Enjoy Splunk Dashboards:
4. Repeat for projects and shares.
PS: If I find the time, I will eventually package this into a Splunk App.
My 2nd Head
Overflow Buffer for my 1st Head
Wednesday, October 12, 2011
Friday, October 7, 2011
Splunking Oracle's ZFS Appliance
We have a bunch of Oracle ZFS Appliances. What I really like is their integrated dtrace based analytics feature.
However, some things are missing or causing problems:
-Storing long-term analytics data on the appliances produces a lot of data on the internal disks. This can fill up your appliance and in the worst case slow down the appliance software
-Scaling the timeline out too much, makes peaks invisible. This is probably a problem of the rendering software used on the appliance (JavaScript)
-Comparing all our appliances is not possible. There is no central analytics console.
As we are a heavy Splunk user, I sat together with our friendly storage consultant from Oracle and we brought these two great products closer together:
This is how we did it:
1. Setting up analytics worksheets
First we had to create the analytics worksheets. This is best done using the CLI interface, as the order of drilldowns should be always the same. Otherwise fields in the generated csv file might be in a different order on every appliance. Doing this in the BUI is possible, but hard...
I would also recommend to store the worksheet under a separate Appliance User.
Sample CLI commands:
analytics
worksheets
create Monitor
select worksheet-???
dataset
set name=io.ops[op]
set drilldowns=read,write
set seconds=3600
commit
dataset
set name=nfs4.ops
set seconds=3600
commit
...
2. Fetch Analytics Data
Script Excerpt:
ssh -T ${user}@${ipname} << --EOF--
${outputdir}/${wsname}.${ipname}.out
script
run('analytics');
run('worksheets');
var ws=list();
printf("Worksheets:%d\\n",ws.length);
printf("%s\\n",ws);
for(var i=0; i<ws.length; i++)="" p="" {<=""></ws.length;>
run('select ' + ws[i]);
var wsname=get('name');
printf("Worksheet Name:%s\\n",wsname);
if ( wsname == "$wsname" ) {
var ds=list();
for(var j=0; j<ds.length; j++)="" p="" {<=""></ds.length;>
run('select ' + ds[j]);
var dsname=get('name');
printf("zfssa_%s\\n",dsname);
dump(run('csv'));
run('done');
}
}
run('done');
}
run('done');
--EOF--
3. Configure Splunk Inputs
4. Create Splunk Dashboards
5. Enjoy Analytics Data Under Splunk
Happy Spelunking...
However, some things are missing or causing problems:
-Storing long-term analytics data on the appliances produces a lot of data on the internal disks. This can fill up your appliance and in the worst case slow down the appliance software
-Scaling the timeline out too much, makes peaks invisible. This is probably a problem of the rendering software used on the appliance (JavaScript)
-Comparing all our appliances is not possible. There is no central analytics console.
As we are a heavy Splunk user, I sat together with our friendly storage consultant from Oracle and we brought these two great products closer together:
This is how we did it:
1. Setting up analytics worksheets
First we had to create the analytics worksheets. This is best done using the CLI interface, as the order of drilldowns should be always the same. Otherwise fields in the generated csv file might be in a different order on every appliance. Doing this in the BUI is possible, but hard...
I would also recommend to store the worksheet under a separate Appliance User.
Sample CLI commands:
analytics
worksheets
create Monitor
select worksheet-???
dataset
set name=io.ops[op]
set drilldowns=read,write
set seconds=3600
commit
dataset
set name=nfs4.ops
set seconds=3600
commit
...
2. Fetch Analytics Data
Script Excerpt:
ssh -T ${user}@${ipname} << --EOF--
${outputdir}/${wsname}.${ipname}.out
script
run('analytics');
run('worksheets');
var ws=list();
printf("Worksheets:%d\\n",ws.length);
printf("%s\\n",ws);
for(var i=0; i<ws.length; i++)="" p="" {<=""></ws.length;>
run('select ' + ws[i]);
var wsname=get('name');
printf("Worksheet Name:%s\\n",wsname);
if ( wsname == "$wsname" ) {
var ds=list();
for(var j=0; j<ds.length; j++)="" p="" {<=""></ds.length;>
run('select ' + ds[j]);
var dsname=get('name');
printf("zfssa_%s\\n",dsname);
dump(run('csv'));
run('done');
}
}
run('done');
}
run('done');
--EOF--
3. Configure Splunk Inputs
4. Create Splunk Dashboards
5. Enjoy Analytics Data Under Splunk
Happy Spelunking...
Sunday, May 22, 2011
Splunk: Unscaling units
I'm working on a Splunk Application for Solaris.
One of the commands that is of interest to me is the fsstat(1m) command output. Here's the output for two filesystem types (zfs, nfs4):
This makes it much easier to parse the data.
A splunk search with multikv will split this into several fields:
We will now have single line events with the fields new_file, name_remov, name_chng etc...
The trouble is, that the fsstat command scales values automatically into human readable format. This can not be disabled.
But we are lucky, Splunk is able to solve the problem. To unscale e.g. read_ops, we add a bit Splunk magic to the search:
| rex field=read_ops "(?[\d\.]+)(?\w+)?" | eval read_ops_unscaled=case(
read_ops_unit=="",read_ops_amount,
read_ops_unit=="K",read_ops_amount*1024, read_ops_unit=="M",read_ops_amount*1024*1024, read_ops_unit=="G",read_ops_amount*1024*1024*1024, read_ops_unit=="T",read_ops_amount*1024*1024*1024*1024)
Now we have created a new field called read_ops_unscaled.
Wasn't this cool?
As this is quite hard to type I have created macros for every field that has to be scaled.
After this, I have created a master macro called `unscale_fsstat` which calls all other macros. Now it is trivial to run a search and do some stats on the results.
Happy Splunking!
One of the commands that is of interest to me is the fsstat(1m) command output. Here's the output for two filesystem types (zfs, nfs4):
solaris# fsstat zfs nfs4 1 1
new name name attr attr lookup rddir read read write write
file remov chng get set ops ops ops bytes ops bytes
2.21K 881 521 585K 1.22K 1.71M 9.34K 1.66M 21.3G 765K 10.7G zfs
0 0 0 0 0 0 0 0 0 0 0 nfs4
0 0 0 20 0 0 0 279 997K 142 997K zfs
0 0 0 0 0 0 0 0 0 0 0 nfs4
new name name attr attr lookup rddir read read write write
file remov chng get set ops ops ops bytes ops bytes
2.21K 881 521 585K 1.22K 1.71M 9.34K 1.66M 21.3G 765K 10.7G zfs
0 0 0 0 0 0 0 0 0 0 0 nfs4
0 0 0 20 0 0 0 279 997K 142 997K zfs
0 0 0 0 0 0 0 0 0 0 0 nfs4
While Splunk is very flexible in parsing whatever output, for command outputs it is better to do a little pre-formatting:
-Make headers single line
-Drop the summary line (activity since fs loaded/mounted)
-Find a solution to be able to do stats on the autoscale values (K,M,G,T)
First, I wrote a script to adjust the output. The output looks like this now:
./fsstat.pl zfs nfs4
new_file name_remov name_chng attr_get attr_set lookup_ops rddir_ops read_ops read_bytes write_ops write_bytes fstype
1 0 1 9 1 27 0 260 1.14M 145 1.18M zfs
0 0 0 0 0 0 0 0 0 0 0 nfs4
new_file name_remov name_chng attr_get attr_set lookup_ops rddir_ops read_ops read_bytes write_ops write_bytes fstype
1 0 1 9 1 27 0 260 1.14M 145 1.18M zfs
0 0 0 0 0 0 0 0 0 0 0 nfs4
This makes it much easier to parse the data.
A splunk search with multikv will split this into several fields:
sourcetype="solaris_fsstat" |multikv
We will now have single line events with the fields new_file, name_remov, name_chng etc...
The trouble is, that the fsstat command scales values automatically into human readable format. This can not be disabled.
But we are lucky, Splunk is able to solve the problem. To unscale e.g. read_ops, we add a bit Splunk magic to the search:
| rex field=read_ops "(?
read_ops_unit=="",read_ops_amount,
read_ops_unit=="K",read_ops_amount*1024, read_ops_unit=="M",read_ops_amount*1024*1024, read_ops_unit=="G",read_ops_amount*1024*1024*1024, read_ops_unit=="T",read_ops_amount*1024*1024*1024*1024)
Now we have created a new field called read_ops_unscaled.
Wasn't this cool?
As this is quite hard to type I have created macros for every field that has to be scaled.
After this, I have created a master macro called `unscale_fsstat` which calls all other macros. Now it is trivial to run a search and do some stats on the results.
Happy Splunking!
Sunday, March 13, 2011
Adjusting ZFS resilvering speed
There are two kernel parameters that can be adjusted if ZFS resilvering speed is too slow/fast:
zfs_resilver_delay /* number of ticks to delay resilver */
and
In some cases the values can be too low or two high (e.g. when using Mirroring vs. RAIDZ).
A boost could be:
whereas a handbrake is e.g.:
Disclaimer: Use at your own risk. Do not try on production systems without contacting support first.
zfs_resilver_delay /* number of ticks to delay resilver */
and
zfs_resilver_min_time_ms /* min millisecs to resilver per txg */
In some cases the values can be too low or two high (e.g. when using Mirroring vs. RAIDZ).
A boost could be:
# echo zfs_resilver_delay/W0|mdb -kw
# echo zfs_resilver_min_time_ms/W0t3000|mdb -kw
whereas a handbrake is e.g.:
# echo zfs_resilver_delay/W2|mdb -kw
# echo zfs_resilver_min_time_ms/W0t300|mdb -kwDisclaimer: Use at your own risk. Do not try on production systems without contacting support first.
Tuesday, March 1, 2011
Useful Dtrace One-Liners....
Finding write operations for a process. Especially when writing to a NFS share...
Finding the top userland stacks for a process
# dtrace -n 'syscall:::entry /execname == "execname"/ \
{ @[ustack()] = count();}'
Finding the same for a certain system call
# dtrace -n 'syscall::mmap:entry /execname == "execname"/ \
{ @[ustack()] = count();}'
# dtrace -n 'fsinfo:::write /execname == "execname"/ \
{ printf("%s", args[0]->fi_pathname) }'
Finding the top userland stacks for a process
{ @[ustack()] = count();}'
Finding the same for a certain system call
# dtrace -n 'syscall::mmap:entry /execname == "execname"/ \
{ @[ustack()] = count();}'
Friday, January 21, 2011
Measuring Read/Write durations with DTrace
I had the situation, where I wanted to see if read/write operations take too much time.
It was something that I thought could be done easily with DTrace. Unfortunately, my DTrace skills are a bit rusty, so I contacted my personal DTrace guru Javier, who gave me a script.
Here is the script for read operations:
slow_read.d:
#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option switchrate=10hz
syscall::*read:entry
{
this->filistp = curthread->t_procp->p_user.u_finfo.fi_list;
this->ufentryp = (uf_entry_t *)((uint64_t)this->filistp + (uint64_t)arg0 * (uint64_t)sizeof (uf_entry_t));
this->filep = this->ufentryp->uf_file;
self->offset = this->filep->f_offset;
this->vnodep = this->filep != 0 ? this->filep->f_vnode : 0;
self->vpath = this->vnodep ? (this->vnodep->v_path != 0 ? cleanpath(this->vnodep->v_path) : "") : "";
self->sc_trace = this->vnodep ? this->vnodep->v_type == VREG || this->vnodep->v_type == VDIR ? 1 : 0 : 0;
self->start=timestamp;
}
syscall::*read:return
/self->sc_trace && (timestamp - self->start) > 30000000 /
{
printf ("Read from %s took %d ms\n", self->vpath, ((timestamp-self->start)/1000000));
self->vpath = 0;
self->offset = 0;
self->sc_trace = 0;
self->start = 0;
}
The script will return all read operations that take longer than 30ms (adjustable by changing the value of 30000000) , with the respective file name. Of course, if you have large files, this dtrace script doesn't necesserily make sense, but it's perfect for my small files environment...
Running the script will produce this output:
./slow_read.d
Read from /blabla/xyz.log took 205 ms
Read from /blabla/xyz.log took 42 ms
Read from /blabla/xyz.log took 65 ms
...
Turning this into scripts for writes was easy... just replace the syscall into "write".
slow_write.d:
#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option switchrate=10hz
syscall::*write:entry
{
this->filistp = curthread->t_procp->p_user.u_finfo.fi_list;
this->ufentryp = (uf_entry_t *)((uint64_t)this->filistp + (uint64_t)arg0 * (uint64_t)sizeof (uf_entry_t));
this->filep = this->ufentryp->uf_file;
self->offset = this->filep->f_offset;
this->vnodep = this->filep != 0 ? this->filep->f_vnode : 0;
self->vpath = this->vnodep ? (this->vnodep->v_path != 0 ? cleanpath(this->vnodep->v_path) : "") : "";
self->sc_trace = this->vnodep ? this->vnodep->v_type == VREG || this->vnodep->v_type == VDIR ? 1 : 0 : 0;
self->start=timestamp;
}
syscall::*write:return
/self->sc_trace && (timestamp - self->start) > 30000000 /
{
printf ("Write to %s took %d ms\n", self->vpath, ((timestamp-self->start)/1000000));
self->vpath = 0;
self->offset = 0;
self->sc_trace = 0;
self->start = 0;
}
With little effort, I have now another view of how my storage is behaving.
It was something that I thought could be done easily with DTrace. Unfortunately, my DTrace skills are a bit rusty, so I contacted my personal DTrace guru Javier, who gave me a script.
Here is the script for read operations:
slow_read.d:
#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option switchrate=10hz
syscall::*read:entry
{
this->filistp = curthread->t_procp->p_user.u_finfo.fi_list;
this->ufentryp = (uf_entry_t *)((uint64_t)this->filistp + (uint64_t)arg0 * (uint64_t)sizeof (uf_entry_t));
this->filep = this->ufentryp->uf_file;
self->offset = this->filep->f_offset;
this->vnodep = this->filep != 0 ? this->filep->f_vnode : 0;
self->vpath = this->vnodep ? (this->vnodep->v_path != 0 ? cleanpath(this->vnodep->v_path) : "
self->sc_trace = this->vnodep ? this->vnodep->v_type == VREG || this->vnodep->v_type == VDIR ? 1 : 0 : 0;
self->start=timestamp;
}
syscall::*read:return
/self->sc_trace && (timestamp - self->start) > 30000000 /
{
printf ("Read from %s took %d ms\n", self->vpath, ((timestamp-self->start)/1000000));
self->vpath = 0;
self->offset = 0;
self->sc_trace = 0;
self->start = 0;
}
The script will return all read operations that take longer than 30ms (adjustable by changing the value of 30000000) , with the respective file name. Of course, if you have large files, this dtrace script doesn't necesserily make sense, but it's perfect for my small files environment...
Running the script will produce this output:
./slow_read.d
Read from /blabla/xyz.log took 205 ms
Read from /blabla/xyz.log took 42 ms
Read from /blabla/xyz.log took 65 ms
...
Turning this into scripts for writes was easy... just replace the syscall into "write".
slow_write.d:
#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option switchrate=10hz
syscall::*write:entry
{
this->filistp = curthread->t_procp->p_user.u_finfo.fi_list;
this->ufentryp = (uf_entry_t *)((uint64_t)this->filistp + (uint64_t)arg0 * (uint64_t)sizeof (uf_entry_t));
this->filep = this->ufentryp->uf_file;
self->offset = this->filep->f_offset;
this->vnodep = this->filep != 0 ? this->filep->f_vnode : 0;
self->vpath = this->vnodep ? (this->vnodep->v_path != 0 ? cleanpath(this->vnodep->v_path) : "
self->sc_trace = this->vnodep ? this->vnodep->v_type == VREG || this->vnodep->v_type == VDIR ? 1 : 0 : 0;
self->start=timestamp;
}
syscall::*write:return
/self->sc_trace && (timestamp - self->start) > 30000000 /
{
printf ("Write to %s took %d ms\n", self->vpath, ((timestamp-self->start)/1000000));
self->vpath = 0;
self->offset = 0;
self->sc_trace = 0;
self->start = 0;
}
With little effort, I have now another view of how my storage is behaving.
Subscribe to:
Posts (Atom)







