The development cycle for the 2.6.36 kernel has rapidly come to a close, and with it comes the preparation of new features for the 2.6.37 ‘merge window’, which immediately follows the 2.6.36 release. More on those new features in a moment. New features are always exciting areas to work on: they pay the bills and keep kernel developers’ brains stimulated in appropriate ways. At this point, Linux averages 5.5 changes per hour, every hour of every day, and is perhaps one of the most active software projects in human history.
That rate of churn brings a few downsides we should be mindful of, not the least of which are horribly embarrassing security regressions. I’ve mentioned this topic before, but recent events make it worth revisiting.
This month saw not one but two large, widespread security problems (affecting many kernel versions), on top of the compatibility system call issue mentioned last time, which itself affected pretty much every system out there. The first bug stems from a change in 2.6.23 that removed an arbitrary limit on the command-line arguments passed to new processes.
Unfortunately, that change introduced a stack overflow problem that can be used to crash systems or gain root access. The second issue concerns support for running 32-bit x86 binaries on 64-bit systems, and the special filtering needed in the system call entry path to make sure that addresses, values and parameters are properly converted from one format to the other (through zero extension and so forth).
Sadly, a fix made several years ago had since been undone by a later kernel change, leaving a gaping security hole silently in place.
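The class of bug described above can be illustrated with a small userspace sketch, modelled loosely on a 32-bit system call dispatcher: the bounds check examines only the low 32 bits of a register, while the table lookup uses all 64. The table size and function names are hypothetical, and rather than calling through a real table the functions just return the index that would be used (or a rejection marker), so the sketch is safe to run.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table size; the real number of 32-bit system calls varies. */
#define NR_COMPAT_SYSCALLS 337u
/* Marker meaning "the dispatcher rejected this system call number". */
#define REJECTED UINT64_MAX

/* Flawed: checks only the low 32 bits, then indexes with all 64 bits. */
static uint64_t dispatch_index_flawed(uint64_t reg)
{
    if ((uint32_t)reg >= NR_COMPAT_SYSCALLS)
        return REJECTED;
    return reg;                  /* any upper-half bits leak into the index */
}

/* Fixed: zero-extend first, so the value checked is the value used. */
static uint64_t dispatch_index_fixed(uint64_t reg)
{
    uint64_t nr = (uint32_t)reg; /* zero extension clears bits 63..32 */
    if (nr >= NR_COMPAT_SYSCALLS)
        return REJECTED;
    return nr;
}
```

With a value such as 0x100000005, whose low half is a valid number (5) but whose upper half is non-zero, the flawed version passes the check and yields an index far beyond the table, while the zero-extending version yields 5.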
Security is important, but these issues are representative of a wider concern. As Jon Corbet noted several times in Linux Weekly News, there are few (if any) people employed solely to audit the kernel for newly introduced or reintroduced security bugs. I believe the problem is even bigger: we don’t really have many people whose job is to ensure that the kernel is free of regressions and bugs in general.
There are plenty of people working on the kernel, especially on stabilising it for ‘enterprise’ products, and some are even writing test suites for it. But apart from a few lonesome volunteers, nobody spends all of their time rigorously testing and auditing the upstream kernels released by Linus or the stable kernels released by Greg Kroah-Hartman.
Ted Ts’o, one of the original developers of the ext2 file system, recently got into a discussion along these lines, noting in an email about the ‘stable’ series kernels that kernel developers “are not paid to fix bugs for random folks who want [to] run the latest stable kernel”. Perhaps it’s high time someone was paid to help end these security embarrassments.
Some of the more exciting things being worked on ahead of the 2.6.37 (or perhaps 2.6.38) kernel include Arnd Bergmann’s efforts to rid the Linux world of the Big Kernel Lock once and for all, Tejun Heo’s work on concurrency-managed workqueues, and Mathieu Desnoyers’ poking at long-standing scheduler behaviour. Many other patches are being prepared for 2.6.37 and will be discussed next time, once the merge window is open.
Big Kernel Lock
I’ve mentioned the Big Kernel Lock on a number of occasions here, because it really is an important topic. The BKL has been with us ever since Alan Cox did the first work on SMP (symmetric multiprocessing – the fancy way to refer to multiple processors) for kernel 2.0.
Back then it was revolutionary, but today it is a very coarse-grained locking mechanism, and the many better, finer-grained alternatives are now tantalisingly close to replacing this scalability nightmare completely. Fingers crossed, it looks like Arnd’s work will finally be complete in 2.6.37.
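To make ‘coarse-grained’ concrete, here is a userspace analogue (not kernel code) using POSIX threads: one global mutex, standing in for the BKL, serialises updates to every object, whereas a per-object lock serialises only the users of that particular object. The names are illustrative, and pthread mutexes are only a stand-in for the kernel’s own locking primitives.

```c
#include <assert.h>
#include <pthread.h>

#define ITERATIONS 100000L

/* Coarse-grained: one global lock, in the spirit of the BKL. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

struct counter {
    pthread_mutex_t lock;   /* fine-grained: protects only this counter */
    long value;
};

static void counter_init(struct counter *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->value = 0;
}

static void inc_coarse(struct counter *c)
{
    pthread_mutex_lock(&big_lock);   /* blocks updates to ALL counters */
    c->value++;
    pthread_mutex_unlock(&big_lock);
}

static void inc_fine(struct counter *c)
{
    pthread_mutex_lock(&c->lock);    /* blocks only users of this counter */
    c->value++;
    pthread_mutex_unlock(&c->lock);
}

struct job {
    void (*inc)(struct counter *);
    struct counter *target;
};

static void *worker(void *arg)
{
    struct job *j = arg;
    for (long i = 0; i < ITERATIONS; i++)
        j->inc(j->target);
    return NULL;
}

/* Two threads, each hammering its own counter with the given strategy. */
static void run_pair(void (*inc)(struct counter *),
                     struct counter *a, struct counter *b)
{
    struct job ja = { inc, a }, jb = { inc, b };
    pthread_t ta, tb;
    pthread_create(&ta, NULL, worker, &ja);
    pthread_create(&tb, NULL, worker, &jb);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
}
```

Both strategies produce correct totals; the difference is that under the global lock the two threads, though touching unrelated counters, can never make progress at the same time, which is exactly the scalability problem that grows with processor count.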
Concurrency-managed workqueues. Tejun Heo has been hard at work reducing the number of kernel threads (special processes run by the kernel itself) on an average system from many hundreds to a saner, more manageable level. He is doing this by implementing a generic mechanism through which the kernel schedules units of deferred work (work items) to run in process context on its behalf, handled by shared pools of worker threads rather than by dedicated threads for each workqueue.
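The core idea of many queues sharing a small pool of threads can be sketched in userspace C with POSIX threads. This is an illustrative assumption, not Tejun’s implementation: all names are hypothetical, and the sketch omits the concurrency management that gives the kernel feature its name (roughly, bringing in another worker when a running one blocks).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_WORKERS 2   /* hypothetical: a small shared pool */

struct work_item {
    void (*fn)(struct work_item *);
    struct work_item *next;
};

struct worker_pool {
    pthread_mutex_t lock;
    pthread_cond_t more_work;
    struct work_item *head, *tail;   /* one FIFO shared by every caller */
    bool stopping;
    pthread_t threads[NR_WORKERS];
};

static void *worker_thread(void *arg)
{
    struct worker_pool *pool = arg;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (!pool->head && !pool->stopping)
            pthread_cond_wait(&pool->more_work, &pool->lock);
        struct work_item *w = pool->head;
        if (!w) {                    /* stopping and queue fully drained */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        pool->head = w->next;
        if (!pool->head)
            pool->tail = NULL;
        pthread_mutex_unlock(&pool->lock);
        w->fn(w);                    /* run the item outside the lock */
    }
}

static void pool_start(struct worker_pool *pool)
{
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->more_work, NULL);
    pool->head = pool->tail = NULL;
    pool->stopping = false;
    for (int i = 0; i < NR_WORKERS; i++)
        pthread_create(&pool->threads[i], NULL, worker_thread, pool);
}

/* Any number of callers can feed the same fixed set of workers. */
static void queue_work(struct worker_pool *pool, struct work_item *w)
{
    w->next = NULL;
    pthread_mutex_lock(&pool->lock);
    if (pool->tail)
        pool->tail->next = w;
    else
        pool->head = w;
    pool->tail = w;
    pthread_cond_signal(&pool->more_work);
    pthread_mutex_unlock(&pool->lock);
}

/* Let the workers drain any queued items, then wait for them to exit. */
static void pool_stop(struct worker_pool *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->stopping = true;
    pthread_cond_broadcast(&pool->more_work);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < NR_WORKERS; i++)
        pthread_join(pool->threads[i], NULL);
}

/* Example work function: atomically counts completed items. */
static atomic_int items_done;
static void count_item(struct work_item *w)
{
    (void)w;
    atomic_fetch_add(&items_done, 1);
}
```

However many callers use queue_work(), the sketch never creates more than NR_WORKERS threads; the kernel version instead manages per-CPU pools that grow and shrink as needed.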
Scheduling niceties. Mathieu Desnoyers (maintainer of LTTng) recently noticed some ways to improve scheduler performance by tinkering with the default amounts of time assigned to various processes. A lot more work is going on there, but one simple change already under discussion is a START_NICE scheduler feature that temporarily adjusts the ‘nice’ level (and hence the scheduling weight) of a newly forked process, splitting time between the new process and its parent rather than sending the new process to the back of the line for processor time. This can really improve responsiveness.
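Whichever direction a temporary nice adjustment takes, its effect under the Completely Fair Scheduler comes from load weights: a task’s virtual runtime advances at a rate inversely proportional to its weight, and the task with the smallest virtual runtime runs next. The sketch below uses the weights from CFS’s nice-to-weight table for nice levels 0 to 5, but plain integer division instead of the kernel’s fixed-point arithmetic, so treat it as an approximation; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CFS load weights for nice 0..5 (nice 0 = 1024; each step is ~1.25x). */
static const uint32_t nice_weight[] = { 1024, 820, 655, 526, 423, 335 };

/* Approximate virtual runtime gained for delta_ns of real runtime. */
static uint64_t vruntime_delta(uint64_t delta_ns, int nice)
{
    assert(nice >= 0 && nice <= 5);  /* only this slice of the table */
    return delta_ns * 1024 / nice_weight[nice];
}
```

For 10ms of real runtime, a nice-0 task’s virtual runtime grows by 10ms while a nice-5 task’s grows by roughly 30.6ms, so the niced task uses up its fair share about three times as quickly.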
That’s all for this month. Until next time, don’t forget to subscribe to my free kernel podcast, newly updated and available at Kernelpodcast.org.