Processes are one of the most fundamental aspects of Linux. To carry out any task in the system, a process is required. A process is usually created by running a binary program executable, which in turn gets created from a piece of code.
It is very important to understand the transition of a piece of code to a process, how a process is born, and the states that it acquires during its lifetime and death.
In this article, we will explore in detail how a piece of code is converted first into a binary executable program and then into a process, identifiers associated with a process, the memory layout of a process, different states associated with a process and finally a brief summary of the complete life cycle of a process in Linux.
So, in short, if you are new to the concept of computing processes and are interested in learning more about it, read on…
A process is nothing but an executable program in action. While an executable program contains machine instructions to carry out a particular task, it is when that program is executed (which gives birth to a corresponding process) that the task gets done. In the following section, we will start from scratch and take a look at how an executable program comes into existence and then how a process is born out of it.
From code to an executable program
In this section we will briefly discuss the transformation of a piece of code to a program and then to a process.
The life of a software program begins when the developer starts writing code for it. Each and every software program that you use is written in a particular programming language. If you are new to the term ‘code’ then you could simply think of it as a set of instructions that the software program follows for its functioning. There are various software programming languages available for writing code.
Once the code is written, the second step is to convert it into an executable program. For code written in the C language, you have to compile it to create an executable program. The compilation process converts the instructions written in a programming language (the code) into machine-level instructions (the program executable). So, a program executable contains machine code that the processor can execute directly once the operating system has loaded it.
A compiler is used for compiling software programs. To compile C source files on Linux, the GCC compiler can be used. For example, the following command can be used to convert the C programming language source file (helloWorld.c) into an executable program (hello):
gcc -Wall helloWorld.c -o hello
This command should produce an executable program named ‘hello’ within the current working directory.
From an executable program to a process
An executable program is a passive entity that does nothing until it is run; but when it is run, a new entity is created which is nothing but a process. For example, an executable program named hello can be executed by running the command ./hello from the directory where hello is present.
Once the program is executed, you can check through the ps command that a corresponding process is created. To learn more about the ps command, read its manpage.
There are three particularly important identifiers associated with a process in Linux: the Process ID (PID), the Parent Process ID (PPID) and the Group ID, each of which is described in the accompanying boxout.
You will note that a process named init (systemd on many current distributions) is the first process that gets created in a Linux system. Its process ID is 1. All the other user processes are init’s children, grandchildren and so on. The command pstree can be used to display the complete hierarchy of active processes in a Linux system.
Memory layout of a Linux process
The memory layout of a Linux process consists of the following memory segments…
Stack – The stack is the segment where local variables and function arguments (that are defined in program code) reside. The contents of the stack are stored in LIFO (last in, first out) order. Whenever a function is called, memory for the new function is allocated on the stack. The stack grows dynamically as required, but only up to a certain limit.
Memory mapping – This region is used for mapping files into the address space of the process. The reason for this is that input/output operations on a memory-mapped file cost less processor time than ordinary I/O from disk (where files are usually stored). As a result, this region is mostly used for loading dynamic libraries.
Heap – There are two main limitations of the stack: first, the stack size limit is not very high and second, all the variables on the stack are lost once the function in which they are defined returns. This is where the heap memory segment comes in handy. This segment allows you to allocate a very large chunk of memory that can live for as long as the complete program. This means that memory allocated on the heap is not deallocated until the program terminates or the programmer frees it explicitly through a function call.
BSS and data segments – The BSS segment stores those static and global variables that are not explicitly initialised, while the data segment stores those variables that are explicitly initialised to some value. Note that global variables are those which are not defined inside any function and have the same scope and lifetime as the program. The only exception is a variable that is defined inside a function with the static keyword: its scope is limited to the function, but it lives for as long as the program. Such variables share the segments where the global variables reside: the BSS or the data segment.
Text segment – This segment contains all the machine-level instructions of the program, which the processor reads and executes. You cannot modify this segment through the code, as it is write-protected. Any attempt to do so typically results in a segmentation fault, which crashes the program.
Note: In the real world, the memory layout is actually a bit more complex, but this simplified version should give you enough idea about the concept.
Different states of a Linux process
To get a dynamic view of the processes in Linux, use the top command. This command provides a real-time view of the Linux system in terms of processes. The eighth column in the output of this command represents the current state of each process. A process state gives a broad indication of whether the process is currently running, stopped, sleeping etc. Let’s discuss the different process states in detail.
A process in Linux can have any of the following four states…
Running – A process is said to be in the running state when it is either actually executing or waiting in the scheduler’s queue to be executed (which means that it is ready to run). That is why this state is sometimes also known as ‘runnable’. It is represented by R.
Waiting or Sleeping – A process is said to be in this state if it is waiting for an event to occur or waiting for some resource-specific operation to complete. So, depending upon these scenarios, a waiting state can be subcategorised into an interruptible (S) or uninterruptible (D) state respectively.
Stopped – A process is said to be in the stopped state when it receives a signal to stop. This usually happens when the process is being debugged. This state is represented by T.
Zombie – A process is said to be in the zombie state when it has finished execution but is waiting for its parent to retrieve its exit status. This state is represented by Z.
Apart from these four states, the process is said to be dead after it crosses over the zombie state; ie when the parent retrieves its exit status. ‘Dead’ is not exactly a state, since a dead process ceases to exist.
A process life cycle
From the time when a process is created, to the time when it quits (or gets killed), it goes through various stages. In this section, we will discuss the complete life cycle of a Linux process from its birth to its death.
When a Linux system is first booted, a compressed kernel executable is loaded into memory. This executable creates the init process (or the first process in the system) which is responsible for creation of all the other processes in a Linux system.
A running process can create child processes. A child process is created through the fork() function; a different program can then be loaded into it through the exec() family of functions. When fork() is used, the new (child) process runs in the same mode as the parent and logically gets a copy of all the memory segments of the parent, but in practice both keep on using the same segments until either of them (parent or child) tries to modify one, at which point the modified part is copied. exec(), on the other hand, does not create a process at all: it replaces the program running in the calling process with a new one, so the process gets a fresh address space while keeping its process ID. Both fork() and exec() are system calls, so the process enters kernel mode to carry them out. Note that the parent process needs to be in the running state (and actually being executed by the processor) in order to create a new process.
Depending upon the kernel scheduler, a running process may get preempted and put into the queue of processes ready for execution.
If a process needs to do things such as acquiring a hardware resource or performing file I/O, it usually makes a system call, which causes the process to enter kernel mode. If the resource is busy or the file I/O is taking time, the process enters the sleeping state. When the resource is ready or the file I/O is complete, the kernel wakes the process up and it can continue running in kernel mode or go back to user mode. Note that there is no guarantee that the process will start executing immediately, as this depends purely on the scheduler, which might put the process into the queue of processes ready for execution.
If a process is running in debug mode (ie a debugger is attached to the process), it might receive a stop signal when it encounters a debug breakpoint. At this stage the process enters the stopped state and the user gets time to debug the process: memory status, variable values etc.
A process might return or quit gracefully, or it might get killed by another process. In either case, it enters the zombie state where, except for the entry of the process in the process table (maintained by the kernel), there is nothing left of the process. This entry is not wiped out until the parent process fetches the return status of the process. A return status signifies whether the process did its work correctly or encountered some error. The command echo $? can be used to fetch the status of the last command run through the command line (by convention, only a return status of 0 means success). Once the process enters the zombie state, it cannot go back to any other state because there is nothing left of the process that could run again.
If the parent process gets killed or exits before the child process, then the child process becomes an orphan. All the orphan processes are adopted by the init process, which means that init becomes the new parent of these processes.