The Kernel Column – 3.9 draws near

Jon Masters summarises the latest news from the Linux kernel community as the final 3.8 kernel release approaches and preparation for 3.9 begins


Linus Torvalds announced the 3.8-rc7 kernel from the LinuxConf AU (Australia) – or LCA – conference, where he and certain other kernel hackers are usually to be found at this time of year. In his email he notes that network access had been “horrendous”, but he still pulled in some last-minute 3.8 kernel fixes. This is likely to be very close to 3.8 final, which will be out by the time that you read this. The rc7 release was actually a little “better and smaller… due to the fact that there is no internet under water” where Linus went on a week-long diving break immediately prior to LCA. We will have a full summary of the latest 3.8 kernel features in our next issue.

Kernel probes

Recently, your author had cause to track down one of those gnarly low-level system issues that ensnared various different pieces of core software having complex interactions. A system management daemon was sending unwanted signals to various programs, but it was not clear exactly why, or where those unwanted signals were really coming from. There were more conventional ways to debug the problem, but your author does so much enjoy having a little fun.

The Linux kernel has many different and very useful debugging facilities, which these days even include a full GDB stub implementation for remote control of the kernel from another system via a serial port, USB or other interface. Still, a mainstay of kernel debugging remains the humble ‘printk’. Printk is a kernel function that is used to generate a log message in the same way that printf is used in regular user-space application code to output messages onto the terminal. Often, kernel hackers will insert liberal calls to printk throughout sections of code being debugged or analysed for other purposes. Such calls may look like the following, which logs the name of the currently running task (known as a process outside of the kernel):

printk("currently running task: %s\n", current->comm);

The output from this call will be visible in the kernel’s ‘ring buffer’, a circular log in which older entries are overwritten as newer ones are created. The kernel log is visible (sometimes requiring root privileges) using the ‘dmesg’ command on the terminal. While calling printk is often sufficient to provide the kind of information desired for debug, in order to cause that printk call to happen one would traditionally have to rebuild the entire kernel and reboot, or at least unload and reload a suspect driver. Either situation is potentially disruptive and may alter the behaviour of the system sufficiently to cause difficulty reproducing whatever problem is being worked on. Sometimes, it can take many hours to re-trigger awkward problems.
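The ‘older entries are overwritten’ behaviour of a ring buffer can be sketched in a few lines of ordinary user-space C. This is only an illustration under simplifying assumptions: the names ring_log, LOG_SLOTS and LOG_MSG_LEN are made up for the example, and the real kernel log stores variable-length records rather than the fixed-size slots used here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOG_SLOTS   4   /* hypothetical capacity: messages retained */
#define LOG_MSG_LEN 64  /* hypothetical maximum message size, including NUL */

struct ring_log {
    char slots[LOG_SLOTS][LOG_MSG_LEN];
    unsigned long next;  /* total number of messages ever written */
};

/* Store a message, overwriting the oldest slot once the log is full. */
void ring_log_write(struct ring_log *log, const char *msg)
{
    assert(log != NULL && msg != NULL);
    char *slot = log->slots[log->next % LOG_SLOTS];
    strncpy(slot, msg, LOG_MSG_LEN - 1);
    slot[LOG_MSG_LEN - 1] = '\0';  /* long messages are truncated */
    log->next++;
}

/* Number of messages currently retained (never more than LOG_SLOTS). */
unsigned long ring_log_count(const struct ring_log *log)
{
    return log->next < LOG_SLOTS ? log->next : LOG_SLOTS;
}

/* Return the i-th retained message, where index 0 is the oldest survivor. */
const char *ring_log_get(const struct ring_log *log, unsigned long i)
{
    assert(i < ring_log_count(log));
    unsigned long first = log->next - ring_log_count(log);
    return log->slots[(first + i) % LOG_SLOTS];
}

/* Print the retained messages, oldest first. */
void ring_log_dump(const struct ring_log *log)
{
    for (unsigned long i = 0; i < ring_log_count(log); i++)
        printf("%s\n", ring_log_get(log, i));
}
```

Starting from a zero-initialised struct ring_log, writing six messages into this four-slot log leaves only the last four; the first two have been silently overwritten. The same effect is why an old message can vanish from the kernel log on a busy system.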

Enter kernel probes, known formally as kprobes. These provide a lightweight mechanism to dynamically modify running kernel code in-place, to insert new function calls or simply to instrument running functions for debug purposes. The two varieties of interest here are kprobes and jprobes (a third, the kretprobe, fires when a function returns). A kprobe allows a specific code location (which could be almost anywhere within the kernel) to be dynamically patched with a call to a (provided) function, while a jprobe is a streamlined version of a kprobe that is intended only to insert a call to new code at the entry point of an existing kernel function. It is the latter jprobe option that this author used to debug the kernel’s signal handling code recently. I provided the following function within my own custom kernel module:

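static int jp_func_send_signal(int sig, struct siginfo *info,
                               struct task_struct *t, int group)
{
    printk("sending signal %d from %s [%d] to %s [%d]\n",
           sig, current->comm, current->pid, t->comm, t->pid);
    /* a jprobe handler must hand control back via jprobe_return() */
    jprobe_return();
    return 0; /* never reached */
}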

I registered this handler to fire on every call to the kernel’s built-in signal-sending function, send_signal, using the following code:

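static struct jprobe jp_send_signal = {
    .entry = jp_func_send_signal,
    /* send_signal is static, so its address is hard-coded instead */
    //.kp.addr = (kprobe_opcode_t *) send_signal,
    .kp.addr = (kprobe_opcode_t *) 0xc0036aac,
};

int init_module(void)
{
    /* registration call reconstructed; it was lost from the listing */
    register_jprobe(&jp_send_signal);
    /* the tail of this printk was truncated; the %p argument is assumed */
    printk("registered jprobe at %p\n", jp_send_signal.kp.addr);
    return 0;
}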

With the addition of a few standard header includes, a MODULE_LICENSE macro, and a call to unregister_jprobe on module unload, I had created a simple kernel module that could be loaded into my running system to debug the signal handling code without rebooting. Much more complicated examples could include altering the behaviour of the running kernel or even inserting fixes to obviate security problems, much in the same way that the Ksplice utility from Oracle allows hotfixes to be applied. You can read more about how to write and use kprobes within the Documentation directory of the Linux kernel source. And that problem I was debugging? By using kernel probes, I was able to track down which offending process was sending unwanted kill signals to a core system utility that I was using.

Ongoing development

Alan Cox has announced that he is “leaving the Linux world and Intel for a bit for family reasons”. He says “I’m aware that ‘family reasons’ is usually management speak for ‘I think the boss is an asshole’ but I’d like to assure everyone that while I frequently think Linus is an asshole (and therefore very good as kernel dictator), I am departing quite genuinely for family reasons”. Alan Cox has been a continuous presence in the kernel community ever since I first became involved with Linux 18 years ago (and even well before that). He will be missed, and we wish him only the best.

Anatol Pomozov posted a message entitled ‘Improving AIO cancellation’. AIO (Asynchronous I/O) is a mechanism (implemented through a small family of io_* system calls) that user-space applications can use to request that one or more file I/O operations take place without waiting for completion. The individual parts of the request may complete in arbitrary order and the whole mechanism is generally used in performance-critical database or other specialist application code. This is likely also the case for Anatol, who works at Google. Google engineers have scaling problems many of us only dream of, and it is frequently the case that they will find interesting corner cases of inefficiency in older kernel code. In this case, Anatol notes that the kernel AIO mechanism provides a cancellation interface, but that a cancelled request nonetheless still results in actual read I/O taking place on a physical device (the result is just ignored). He proposes various ways to modify the AIO code such that dropped requests really are dropped completely.
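For readers who have not met kernel AIO before, the following is a minimal user-space sketch of the submit-then-collect pattern described above. It is not taken from Anatol’s patches. The sys_io_* wrappers and the aio_read_start helper are hypothetical names invented for this example (glibc does not wrap these system calls itself), and the sketch queues only a single read before immediately waiting for it. A pending request would be cancelled with the io_cancel system call, which is not exercised here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Thin wrappers around the raw kernel AIO system calls. */
long sys_io_setup(unsigned nr_events, aio_context_t *ctx)
{
    return syscall(SYS_io_setup, nr_events, ctx);
}

long sys_io_submit(aio_context_t ctx, long nr, struct iocb **iocbs)
{
    return syscall(SYS_io_submit, ctx, nr, iocbs);
}

long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                      struct io_event *events, struct timespec *timeout)
{
    return syscall(SYS_io_getevents, ctx, min_nr, nr, events, timeout);
}

long sys_io_destroy(aio_context_t ctx)
{
    return syscall(SYS_io_destroy, ctx);
}

/*
 * Hypothetical helper: read up to len bytes from the start of path using
 * one kernel AIO request. Returns the number of bytes read, or a negative
 * value on failure.
 */
ssize_t aio_read_start(const char *path, void *buf, size_t len)
{
    aio_context_t ctx = 0;            /* must be zero before io_setup() */
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    ssize_t result = -1;
    int fd;

    assert(path != NULL && buf != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (sys_io_setup(1, &ctx) < 0) {
        close(fd);
        return -1;
    }

    /* Describe the read: which file, which buffer, how much, from where. */
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;

    /*
     * Queue the request, then wait for its completion event. A real
     * application would do other work between these two calls.
     */
    if (sys_io_submit(ctx, 1, list) == 1 &&
        sys_io_getevents(ctx, 1, 1, &ev, NULL) == 1)
        result = (ssize_t)ev.res;     /* bytes read, or negative errno */

    sys_io_destroy(ctx);
    close(fd);
    return result;
}
```

The separation between io_submit and io_getevents is the whole point of the interface: many requests can be queued in one call and their completions reaped later, in whatever order the kernel finishes them.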

In another interesting Google patch, Derek Basehore of the Google Chromium project posted a couple of patches entitled ‘don’t wait on disk start on resume’. Rather than block the entire resume process on spinning up any rotational media (which can take some time relative to the overall resume), the patches insert the request to spin up in the disk I/O elevator (the algorithm that controls disk I/O traffic) and allow the remainder of the resume to continue. Actual disk I/O is blocked until the disk is available.

Finally this month, a little analysis of kernel release history shows that the most common day for a new kernel release is a Sunday. According to Linus, who got curious after he was a day later than planned on the 3.8-rc4 release, the “whole release in the middle of the week thing feels odd to me”.