Friday, January 21, 2011

Measuring Read/Write durations with DTrace

I had the situation, where I wanted to see if read/write operations take too much time.

It was something that I thought could be done easily with DTrace. Unfortunately, my DTrace skills are a bit rusty, so I contacted my personal DTrace guru Javier, who gave me a script.

Here is the script for read operations:


slow_read.d:

#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10hz

syscall::*read:entry
{
        this->filistp = curthread->t_procp->p_user.u_finfo.fi_list;
        this->ufentryp = (uf_entry_t *)((uint64_t)this->filistp + (uint64_t)arg0 * (uint64_t)sizeof (uf_entry_t));
        this->filep = this->ufentryp->uf_file;
        self->offset = this->filep->f_offset;
        this->vnodep = this->filep != 0 ? this->filep->f_vnode : 0;
        self->vpath = this->vnodep ? (this->vnodep->v_path != 0 ? cleanpath(this->vnodep->v_path) : "") : "";
        self->sc_trace = this->vnodep ? this->vnodep->v_type == VREG || this->vnodep->v_type == VDIR ? 1 : 0 : 0;
        self->start=timestamp;
}

syscall::*read:return
/self->sc_trace && (timestamp - self->start) > 30000000 /
{
        printf ("Read from %s took %d ms\n", self->vpath, ((timestamp-self->start)/1000000));
        self->vpath = 0;
        self->offset = 0;
        self->sc_trace = 0;
        self->start = 0;
}


The script will return all read operations that take longer than 30ms (adjustable by changing the value of 30000000) , with the respective file name. Of course, if you have large files, this dtrace script doesn't necesserily make sense, but it's perfect for my small files environment...

Running the script will produce this output:

./slow_read.d
Read from /blabla/xyz.log took 205 ms
Read from /blabla/xyz.log  took 42 ms
Read from /blabla/xyz.log took 65 ms

...


Turning this into scripts for writes was easy... just replace the syscall into "write".

slow_write.d:
#!/usr/sbin/dtrace -s

#pragma D option quiet
#pragma D option switchrate=10hz

syscall::*write:entry
{
        this->filistp = curthread->t_procp->p_user.u_finfo.fi_list;
        this->ufentryp = (uf_entry_t *)((uint64_t)this->filistp + (uint64_t)arg0 * (uint64_t)sizeof (uf_entry_t));
        this->filep = this->ufentryp->uf_file;
        self->offset = this->filep->f_offset;
        this->vnodep = this->filep != 0 ? this->filep->f_vnode : 0;
        self->vpath = this->vnodep ? (this->vnodep->v_path != 0 ? cleanpath(this->vnodep->v_path) : "") : "";
        self->sc_trace = this->vnodep ? this->vnodep->v_type == VREG || this->vnodep->v_type == VDIR ? 1 : 0 : 0;
        self->start=timestamp;
}

syscall::*write:return
/self->sc_trace && (timestamp - self->start) > 30000000 /
{
        printf ("Write to %s took %d ms\n", self->vpath, ((timestamp-self->start)/1000000));
        self->vpath = 0;
        self->offset = 0;
        self->sc_trace = 0;
        self->start = 0;
}



With little effort, I have now another view of how my storage is behaving.