Monitoring disk usage

i.e. How to find out what is causing the disk to spin up and down

Introduction

So, you have just got smart spindown working on your MBWE, but the disk keeps spinning up and down, and you want to find out what is accessing it. Here's how to do it.

Prerequisites

Enabled SSH access.

A text editor and the know-how to use it. I recommend installing nano, but vi will also do just fine.

The how-to

  1. Make sure you are in superuser mode.

    # su

  2. First, stop syslogd and klogd if you haven't done so already. We will be logging disk access, and we don't want those entries written to a log file: that would cause disk activity, which causes more log entries, which causes more disk access, and so on. You get the point. If you don't, don't mess with your MBWE.

    To prevent them from starting at boot, edit /etc/inittab and comment out these lines by putting a # at the start of each:

    ::respawn:/sbin/syslogd -n -m 0
    ::respawn:/sbin/klogd -n

  3. Reboot.

  4. Enable I/O debugging:

    # echo 1 > /proc/sys/vm/block_dump

  5. From now on, all disk activity will be logged to the kernel ring buffer. You can view its contents with

    # dmesg

    The output will begin with a lot of messages from your last boot, so you might want to clear the buffer (this prints it one last time and then empties it) with

    # dmesg -c

  6. Now, leave your MBWE alone for a while.

    After a while, your ring buffer will be full of lines like

    <7>dmesg(10078): READ block 2525104 on md1

    Typically, one line describes one block access. The leading <7> is the kernel log level (debug). Then comes the name of the executable, its process ID in parentheses, and finally a description of what happened. So this line means that dmesg (process 10078) read block 2525104 from the disk (md1).

    When a process is writing to the disk, the line will look like:

    <7>tail(10976): dirtied inode 45 (temperature.log) on ram1

    This means that tail was writing to the file temperature.log on my RAM disk. At this point the data has only reached the write buffer in memory; it will be written to the disk a little later. The actual write is indicated by a line like this:

    <7>pdflush(36): WRITE block 984 on ram1

  7. So, after your MBWE spins up unexpectedly, check the dmesg output to see what caused it.

    Remember to turn I/O debugging off afterwards with

    # echo 0 > /proc/sys/vm/block_dump

  8. The next tutorial, Reducing disk usage, shows how to eliminate a few of the most common causes of disk activity.
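
Reading through a full ring buffer by eye gets tedious, so here is a rough sketch of how the block_dump lines could be tallied per process and device. The function name summarize_block_dump is made up for this example. It assumes that awk and sort are available on your box (check your busybox build) and that the log lines look like the samples in step 6; other kernels may format them differently.

```shell
#!/bin/sh
# Hypothetical helper: count block_dump entries per process and device.
# Reads dmesg-style text on stdin and prints lines of the form
# "count process device", busiest first.
summarize_block_dump() {
    awk '
        # Only look at lines that resemble block_dump output.
        / (READ|WRITE) block [0-9]+ on / || / dirtied inode [0-9]+ / {
            line = $0
            sub(/^<[0-9]+>/, "", line)      # drop the "<7>" log level

            proc = line
            sub(/\([0-9]+\):.*/, "", proc)  # keep the text before "(pid):"

            dev = line
            sub(/.* on /, "", dev)          # keep the text after the last " on "

            count[proc " " dev]++
        }
        END { for (k in count) print count[k], k }
    ' | sort -rn
}
```

After pasting the function into your root shell, feed it the ring buffer:

    # dmesg | summarize_block_dump

Processes near the top of the list are the most likely culprits. Entries on RAM-backed devices such as ram1 do not touch the physical disk and can usually be ignored.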