Linux 3.19 and Linux 4.0 – the kernel column

Jon Masters updates us as Linus contemplates Linux 4.0 and debate continues over how to handle the year 2038 ‘Y2K’ problem


Linus Torvalds released the final 3.19 kernel roughly on cue, noting that “nothing all that exciting happened [since the 3.19-rc7 release candidate], and while I was tempted a couple of times to do an rc8, there really wasn’t any reason for it.” As mentioned in last month’s issue, the new kernel includes a number of exciting new features: support for Intel’s MPX Memory Protection Extensions (which we covered in detail previously), a new HSA driver for AMD GPU devices, enhanced RAID 5 and 6 support in Btrfs, and the final promotion of Android’s Binder IPC mechanism out of the kernel’s staging tree. As usual, KernelNewbies have an excellent summary of the various patches with links to commits.

One of the less-heralded features merged into Linux 3.19 is DeviceTree Overlays. These are relevant to embedded devices such as the Raspberry Pi, which uses a DeviceTree to describe the many assorted platform (non-discoverable) devices that form the System-on-Chip (SoC) upon which it is built. Modern SoCs have dozens or even hundreds of devices, including a plethora of IO interfaces (such as USB and networking). In most cases, these are attached directly to the CPU cores on the same chip, with no PCI-like enumerable bus in between. Because the kernel cannot probe for such devices, it has to be handed a DeviceTree data structure at boot time that describes the physical topology and structure of the system. Overlays extend DeviceTrees with run-time additions that describe the devices contained within Ras Pi add-ons or BeagleBone capes (for example). Thus, Linux 3.19 will make life much easier for those using custom Ras Pi boards.

Linux 4.0?

With the release of Linux 3.19 came the near-immediate opening of the merge window for the subsequent release. The merge window is a period of up to several weeks (the actual duration varying depending upon how Linus is feeling), during which intrusive and disruptive changes to the kernel are allowed. Traditionally these could be quite impactful, and while it is still true that significant churn happens in the merge window, these days most of the 10,000+ changesets – collections of patches – applied during this small window of time have already spent an entire 7-8 week kernel cycle in Stephen Rothwell’s linux-next test kernels, or in the equivalent maintainer development tree for certain non-core parts of the kernel.

In much the same way that Linux development has become less eventful in terms of churn, kernel version numbers have become less significant. There was a time when kernels adopted an odd/even numbering scheme in which an unstable 2.5 kernel series preceded the stable 2.6 series. But that was done away with during 2.6 development and there never was a 2.7 series. This was in large part due to the transition to the modern Git-driven source development process and the 8-week kernel development cycle. As a consequence, Linux 3.0 came about not because of some new earth-shattering development, but simply because Linus felt that the version numbers had grown too high. This same logic has been applied again, with Linus warning, “We are getting to release numbers where I have to take off my socks to count that high again”. Thus it is highly likely that the next kernel will be Linux 4.0, and even more likely that the change in version number will be entirely meaningless, except in product marketing literature, which will likely use it to full effect.

The merge window for what will become the next kernel has already brought with it some goodies. For example, Linux has gained support for lazytime, a file system mount option that improves performance by intentionally delaying the write-back of certain file access times, so that simply reading files won’t result in many writes updating the associated metadata. This concept is of course not new. Kernels have long supported mount options such as relatime that are in widespread use (and the default for a number of distributions). These existing options also change the kernel behaviour – by only updating file access times under certain conditions – but they break strict POSIX compliance in the process. The new lazytime option instead performs every update of the file access times, but it holds the new values in memory. They are not written back to disk until the kernel has some other reason to write the file’s metadata, or a certain amount of time has passed (24 hours in the current implementation).
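To make the difference concrete, here is a toy Python model of the two policies. It is a sketch rather than kernel code: the relatime rule follows the conditions documented in mount(8) (refresh the access time if it is not newer than the modification or change time, or if it is at least a day old), while the LazyInode class, its field names and its flush() method are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60  # seconds in 24 hours


def relatime_needs_update(atime, mtime, ctime, now):
    """Simplified relatime rule: refresh atime only if it is not newer
    than mtime or ctime, or if it is at least a day old."""
    return atime <= mtime or atime <= ctime or now - atime >= DAY


@dataclass
class LazyInode:
    """Toy model of an inode under lazytime (hypothetical, not kernel code)."""
    atime_mem: int = 0     # access time held in memory
    atime_disk: int = 0    # access time last written to disk
    dirty_since: int = -1  # when memory first diverged from disk, -1 if clean
    disk_writes: int = 0   # number of metadata writes caused by atime

    def read(self, now):
        # Every read updates the in-memory atime, so it stays accurate.
        self.atime_mem = now
        if self.dirty_since < 0:
            self.dirty_since = now
        # Write back only once the timestamp has been dirty for 24 hours.
        if now - self.dirty_since >= DAY:
            self.flush()

    def flush(self):
        # Also what happens when the inode is written for another reason.
        if self.dirty_since >= 0:
            self.atime_disk = self.atime_mem
            self.dirty_since = -1
            self.disk_writes += 1


inode = LazyInode()
for t in range(0, 3600, 60):  # one read per minute for an hour
    inode.read(t)
print(inode.atime_mem, inode.atime_disk, inode.disk_writes)  # 3540 0 0

inode.flush()  # e.g. an unrelated metadata change forces a write
print(inode.atime_disk, inode.disk_writes)  # 3540 1
```

Sixty reads produce sixty in-memory updates but no disk writes until something else forces one. Under relatime, by contrast, most of those reads would not update the access time at all, which is where the loss of strict POSIX behaviour comes from.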

Binding Android APIs

Linux 3.19 promotes Android’s Binder out of the Linux staging tree and into the kernel proper. Binder is an Inter-Process Communication (IPC) mechanism, used to allow two different processes (applications, or ‘apps’) on an Android phone or device to communicate with one another by having one process call a runtime method provided by the other. It contrasts with traditional Unix IPC mechanisms such as System V IPC (SysV IPC). For the longest time, Binder lived as a driver module contained within Android patchsets that needed to be applied to an upstream Linux kernel before it could be used on an Android device. Several years ago, Binder was moved into the special drivers/staging subdirectory of the kernel. This is a place intended for unstable test driver code that isn’t quite ready for production, but which early adopters nonetheless have a need or a desire to use.

Binder was never really a conventional driver in the sense of supporting some specific hardware add-on device, but its presence in staging served two purposes. The first was political: Linux is still seen as a Unix-like operating system by many developers. Those people did not like the notion of Google inventing a new IPC mechanism and thrusting it upon the masses without a fight. Moving the code into staging placated those with concerns, while beginning to carry the code in the same location as the remainder of the Linux kernel source. The second was practical: while Binder was kept out-of-tree, it had to be maintained separately and then patched into kernels before they could be used on Android systems. After a sufficient period of time there was a certain acceptance that millions of phones and other devices are running Android and all of these are using Binder. As Greg Kroah-Hartman said when promoting it out of staging: “No matter what comes in the future, we are going to have to support this API”.

Ongoing development

There was much debate over the past month concerning the continuing problem of the year 2038. This is the point in time at which the 32-bit time_t used in 32-bit Linux, Unix and Unix-like systems overflows, such that time is seen to jump back to the year 1901. The cause of the ‘bug’ is simple: 32-bit integers can only encode 32 bits of information. Unix systems view time as the passage of seconds since a magic epoch of January 1 1970 (about the birth of Unix), and they encode time as a signed quantity of seconds added to or subtracted from this value. A signed 32-bit counter reaches its maximum at 03:14:07 UTC on January 19 2038, and one second later it wraps around to its most negative value. Modern 64-bit Linux, Unix and Unix-like systems have updated APIs that allow for an unfathomable amount of time (292 billion years) to be represented in a full 64 bits, but there are many legacy 32-bit systems running 32-bit code, and there will be more 64-bit systems running legacy 32-bit code using special ABIs, such as x32 and ILP32.
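The arithmetic behind the overflow is easy to check. The following Python sketch emulates a signed 32-bit counter of seconds since the epoch; wrap_int32 and to_utc are illustrative helper names, not part of any real API, and the 292 billion year figure assumes an average year of 365.25 days.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Reduce an integer to a signed 32-bit two's-complement value."""
    return (value + 2**31) % 2**32 - 2**31


def to_utc(seconds):
    """Convert seconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


# The last second a signed 32-bit time_t can represent.
last_good = to_utc(INT32_MAX)
# One second later the counter wraps to its most negative value.
after_wrap = to_utc(wrap_int32(INT32_MAX + 1))

print(last_good.isoformat())   # 2038-01-19T03:14:07+00:00
print(after_wrap.isoformat())  # 1901-12-13T20:45:52+00:00

# Range of a signed 64-bit count of seconds, in years either side of 1970.
years_64 = 2**63 / (365.25 * 24 * 60 * 60)
print(f"{years_64 / 1e9:.0f} billion years")  # 292 billion years
```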