Brief introduction to OS concepts: interrupts, system call, context switch
In the previous post, we mentioned system calls, interrupts, signals, etc. Let’s try to understand them in this post.
Kernel
The kernel is the core of the operating system and is responsible, among other things, for process management. The system switches into kernel mode, typically because of:
- A synchronous interrupt: an exception/trap (e.g. division by zero or an invalid memory access) or a system call
- An asynchronous interrupt caused by an external device
Context Switch
A context switch is the process of switching from one process to another: the state of the current process is saved so that it can resume execution from the same point later. A context switch occurs, for instance, when the scheduler decides to run a different process or when an interrupt occurs.
- A switch from user mode to kernel mode occurs. The state of the current process is saved to the process table, including the registers, the program counter, and the memory reference bits in the page table.
- When the new process starts, the memory management unit (MMU) is reloaded with the memory map of the new process. The process switch may invalidate the memory cache and related tables, forcing them to be dynamically reloaded from main memory both upon entering and upon leaving the kernel.
A context switch is generally expensive because it involves switching the memory address space. Switching between threads of the same process is faster because the address space stays the same and only the processor state (e.g. the program counter and registers) has to be switched.
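To get a feel for how often context switches actually happen, here is a minimal sketch that reads the counters the kernel keeps for the current process. It assumes a Unix-like system where Python's `resource` module is available; the helper names `context_switch_counts` and `switches_during` are made up for this example.

```python
import resource
import time


def context_switch_counts():
    """Return (voluntary, involuntary) context switches of this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def switches_during(action):
    """Run action() and report how many context switches happened meanwhile."""
    vol_before, invol_before = context_switch_counts()
    action()
    vol_after, invol_after = context_switch_counts()
    return vol_after - vol_before, invol_after - invol_before


if __name__ == "__main__":
    # Sleeping gives up the CPU, so it usually shows up as voluntary switches.
    voluntary, involuntary = switches_during(lambda: time.sleep(0.05))
    print(f"voluntary={voluntary} involuntary={involuntary}")
```

A voluntary switch means the process gave up the CPU itself (for example by sleeping or waiting for I/O), while an involuntary one means the scheduler preempted it. The exact numbers depend on the machine and its load.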
The reverse of a system call is often called a user-mode helper function: it invokes a user-space application from the kernel.
System Call
A system call is a request from a user-space process for the kernel to perform a privileged service on its behalf. Here is how it works at a high level:
- When a system call is made, the user program pushes the parameters of the call onto the user stack and then calls the user-space library procedure (often written in assembly language), which puts the system call number in a place where the operating system expects it, such as a register.
- A TRAP instruction (a deliberately raised, synchronous exception) is executed to switch from user mode to kernel mode, and execution continues at a fixed address within the kernel. This kernel code uses the system call number as an index into a table of pointers to locate the system call handler (a.k.a. system call service routine).
  The TRAP instruction resembles a procedure call in that the next instruction is taken from a distant location and the return address is saved on the stack for later use. They differ in two ways, however. First, the TRAP instruction switches into kernel mode, while a procedure call does not change the mode. Second, whereas a procedure call gives the relative or absolute address of its target, the TRAP instruction cannot jump to an arbitrary address.
- The system call handler runs. Once it has completed its work, control returns to the user-space library procedure, which in turn returns to the user program in the usual way procedure calls return.
- The user program cleans up the user stack. Because the stack grows downward, the compiled code increments the stack pointer exactly enough to remove the parameters pushed before the system call. The program is now free to continue.
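The dispatch step described above (the system call number used as an index into a table of handlers) can be sketched as a toy model in Python. This is only an illustration, not how a real kernel is written: the numbers `SYS_GETPID` and `SYS_WRITE`, the `trap` function, and the `my_getpid` wrapper are all hypothetical, and the handlers simply forward to Python's `os` module.

```python
import errno
import os

# Hypothetical system call numbers for this toy model. Real numbers are
# defined by the kernel and differ between architectures.
SYS_GETPID = 0
SYS_WRITE = 1


def sys_getpid():
    return os.getpid()


def sys_write(fd, data):
    return os.write(fd, data)


# The "table of pointers": the system call number is the index.
SYSCALL_TABLE = [sys_getpid, sys_write]


def trap(number, *args):
    """Stand-in for the TRAP instruction plus the kernel's dispatch code."""
    if not 0 <= number < len(SYSCALL_TABLE):
        # The caller cannot jump to an arbitrary handler, only to table entries.
        raise OSError(errno.ENOSYS, "unknown system call number")
    handler = SYSCALL_TABLE[number]
    return handler(*args)


def my_getpid():
    """Stand-in for the user-space library procedure that wraps the trap."""
    return trap(SYS_GETPID)


if __name__ == "__main__":
    print(my_getpid())
```

The point of the table is that user code only ever supplies a number. Which code runs for that number is decided entirely on the kernel side.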
A system call may block the calling process. For example, a process that calls wait()/waitpid() is paused until a child process completes. With a non-blocking call, control returns to the caller immediately; however, the parent then does not know when the child will complete and when its output or message will be available, which makes programming with non-blocking calls harder.
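Here is a small sketch of the difference, assuming a Unix-like system with `os.fork`. The function name `wait_demo` and the exit code 7 are arbitrary choices for the example. A pipe keeps the child alive until the parent has made its non-blocking call, so the outcome does not depend on timing.

```python
import os


def wait_demo():
    """Return (nonblocking_result, exit_code) for a child that exits with 7."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: block in read() until the parent says go, then exit.
        os.close(write_fd)
        os.read(read_fd, 1)
        os._exit(7)

    os.close(read_fd)
    # Non-blocking: returns at once; (0, 0) means "child still running".
    nonblocking_result = os.waitpid(pid, os.WNOHANG)

    os.write(write_fd, b"x")  # let the child finish
    os.close(write_fd)
    # Blocking: the parent is suspended here until the child has exited.
    _, status = os.waitpid(pid, 0)
    return nonblocking_result, os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(wait_demo())
```

With `os.WNOHANG` the parent has to come back and ask again later, which is exactly the extra bookkeeping that makes non-blocking code harder to write.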
Asynchronous Interrupt
An asynchronous interrupt is a signal raised by a device and handled by the kernel:
- A device raises an interrupt by asserting a signal on the bus line it has been assigned. The signal is detected by the interrupt controller, which handles it immediately if no other interrupt is in progress and no higher-priority request is pending. Otherwise, the request is ignored for the moment.
- To handle the interrupt, the controller puts an interrupt number on the address lines and asserts a signal to interrupt the CPU until the CPU services the interrupt.
- The interrupt signal causes the CPU to stop the running process and save its current context, e.g. the program counter, the program status word, and other registers. There are a few choices of where to save it:
  - Internal registers: the interrupt cannot be acknowledged until the OS has read out all of the saved information, otherwise another interrupt could overwrite these registers. Interrupts therefore stay disabled for a long time, and interrupts and data can be lost.
  - Current stack: this may be the stack of a user process, so the stack pointer may not even be legal, which would cause a fatal error when the hardware tries to write to the address it points to. It might also point near the end of a page, so that after several memory writes the page boundary is crossed and a page fault is generated. A page fault during hardware interrupt processing raises the problem of where to save the state needed to handle the page fault.
  - Kernel stack: this is the most common approach, as the stack pointer is much more likely to be legal and to point to a pinned page. However, switching into kernel mode may change memory management unit (MMU) contexts and invalidate the cache and the translation lookaside buffer (TLB). Reloading all of these, statically or dynamically, increases the time needed to service an interrupt and thus wastes CPU time.
Regardless of how the CPU handles interrupts, OS designers often aim for precise interrupts, which leave the machine in a well-defined state.
- After the CPU saves the state, it uses the number on the address lines as an index into the interrupt vector table (or interrupt descriptor table) to fetch a new program counter. This program counter points to the start of the corresponding interrupt handler, or interrupt-service procedure (ISP).
The interrupt vector table can be hardwired into the machine, or it can be anywhere in memory, with a CPU register (loaded by the operating system) pointing to its origin.
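The save, look up, run, restore sequence above can be sketched as a toy model. Nothing here touches real hardware: `ToyCPU`, `keyboard_isp`, the interrupt number 1, and the addresses are all hypothetical, and the vector table maps a number to a Python function instead of a program counter value.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCPU:
    pc: int = 0x1000          # program counter of the interrupted program
    psw: int = 0              # program status word
    kernel_stack: list = field(default_factory=list)
    vector_table: dict = field(default_factory=dict)  # number -> handler
    log: list = field(default_factory=list)

    def interrupt(self, number):
        # 1. Save the interrupted context on the kernel stack.
        self.kernel_stack.append((self.pc, self.psw))
        # 2. Use the interrupt number as an index to fetch the handler.
        handler = self.vector_table[number]
        # 3. Run the interrupt-service procedure.
        handler(self)
        # 4. Restore the saved context so the interrupted program resumes.
        self.pc, self.psw = self.kernel_stack.pop()


def keyboard_isp(cpu):
    cpu.pc = 0xF000  # hypothetical address of the handler code
    cpu.log.append("keyboard")


cpu = ToyCPU()
cpu.vector_table[1] = keyboard_isp
cpu.interrupt(1)
```

After `cpu.interrupt(1)` returns, the program counter is back at its original value and the kernel stack is empty, which is what lets the interrupted program continue as if nothing had happened.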
Interrupt-service Procedure
The CPU delays acknowledging the interrupt in order to prevent race conditions involving multiple interrupts. It is left to the ISP to acknowledge the interrupt by writing a certain value to one of the interrupt controller’s I/O ports, which tells the controller that it is free to issue another interrupt.
A large amount of work often has to be done in response to a device interrupt, but it is also undesirable to keep other interrupts blocked for that long. These two needs (work and speed) conflict with each other. Linux resolves this problem by splitting the interrupt handler into two halves. The top half is the routine that actually responds to the interrupt. The bottom half is a routine that is scheduled by the top half to be executed later, at a safer time. The top half saves device data to a device-specific buffer, schedules its bottom half, and exits. The bottom half then performs the remaining work, such as waking up processes or starting another I/O operation. This setup permits the top half to service a new interrupt while the bottom half is still working.
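The top half / bottom half split can be pictured with a short simulation. This is not Linux kernel code, just a sketch in plain Python: the names `device_buffer`, `bottom_halves`, `top_half`, `bottom_half`, and `run_bottom_halves` are invented for the example, and upper-casing a string stands in for the "slow" work.

```python
from collections import deque

device_buffer = deque()   # device-specific buffer filled by the top half
bottom_halves = deque()   # deferred work, run later at a safer time
processed = []


def bottom_half():
    """Deferred part: does the slow work on everything buffered so far."""
    while device_buffer:
        processed.append(device_buffer.popleft().upper())


def top_half(data):
    """Runs while interrupts are blocked: save the data, schedule, and exit."""
    device_buffer.append(data)
    if bottom_half not in bottom_halves:
        bottom_halves.append(bottom_half)


def run_bottom_halves():
    """Called later, once it is safe to spend time on deferred work."""
    while bottom_halves:
        bottom_halves.popleft()()


top_half("a")
top_half("b")   # a second interrupt arrives before the deferred work has run
run_bottom_halves()
```

The top half does almost nothing, so a second interrupt can be accepted quickly. The expensive part runs once, later, over everything that has accumulated in the buffer.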
A lower-priority interrupt handler can itself be interrupted by a higher-priority interrupt if the OS supports nested interrupts. The kernel saves the execution context of the interrupted handler and starts handling the new interrupt.
Signal
A signal is a form of inter-process communication (IPC). It is generated either by the kernel or by a process (via the kill() system call) and is delivered asynchronously to a process or thread.
Signal generation: when a process sends a signal with kill(), the kernel checks that the calling process has sufficient privileges to send it; otherwise an error is returned. To generate the signal, the OS simply sets a bit in a bit array maintained in the Process Control Block (PCB) data structure (task_struct in Linux) of the receiving process. Each bit of the array corresponds to a particular signal, and when a bit is set, the corresponding signal is pending.
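The pending state can be observed from user space. The sketch below assumes a Unix-like system and a single-threaded Python program; `pending_demo` and `received` are names made up for the example. It blocks SIGUSR1, sends it to itself, and then asks the kernel which signals are pending.

```python
import os
import signal

received = []


def pending_demo():
    """Return True if SIGUSR1 showed up as pending while it was blocked."""
    signal.signal(signal.SIGUSR1, lambda signum, frame: received.append(signum))
    # Block SIGUSR1 so the kernel cannot deliver it yet.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
    try:
        os.kill(os.getpid(), signal.SIGUSR1)   # generation: marked as pending
        was_pending = signal.SIGUSR1 in signal.sigpending()
    finally:
        # Unblocking lets the kernel deliver the pending signal.
        signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGUSR1})
    return was_pending


if __name__ == "__main__":
    print(pending_demo(), received)
```

Between the `os.kill` call and the unblocking, the signal has been generated but not delivered, which is the state the pending bit records.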
Signal delivery: the receiving process handles the signal the next time it is in kernel mode, for example when it is scheduled. Before switching back to user mode, the kernel always checks the pending signals for the process. This check must happen in kernel space because some signals, namely SIGSTOP and SIGKILL, can never be ignored by a process: the kernel acts on them unconditionally. For any other signal, a process has four ways to handle it:
- Take the default action, which depends on the signal: e.g. ignore it (SIGCHLD), terminate the process (SIGTERM), or terminate the process with a core dump (SIGSEGV).
- Block the signal, i.e. the signal stays pending and no action is taken until it is unblocked.
- Catch the signal, i.e. handle it with a handler function that the process has registered.
- Ignore the signal (by setting its disposition to SIG_IGN).
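Catching and ignoring look like this in practice. This is a minimal sketch assuming a Unix-like system and that the code runs in the main thread (Python only lets the main thread install handlers); `on_usr1`, `caught`, and `kill_ignorable` are names chosen for the example.

```python
import os
import signal

caught = []


def on_usr1(signum, frame):
    """Handler the process registered; runs when SIGUSR1 is delivered."""
    caught.append(signal.Signals(signum).name)


# Catch: run on_usr1 instead of the default action (terminate the process).
signal.signal(signal.SIGUSR1, on_usr1)
# Ignore: the kernel discards SIGUSR2 for this process.
signal.signal(signal.SIGUSR2, signal.SIG_IGN)

os.kill(os.getpid(), signal.SIGUSR1)
os.kill(os.getpid(), signal.SIGUSR2)

# SIGKILL cannot be caught or ignored, so the kernel rejects the request.
try:
    signal.signal(signal.SIGKILL, signal.SIG_IGN)
    kill_ignorable = True
except OSError:
    kill_ignorable = False
```

Without the two `signal.signal` calls, either of these signals would terminate the process, since that is the default action for SIGUSR1 and SIGUSR2.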
Examples
- When a user presses Ctrl+C, the kernel sends SIGINT to the foreground process (a child of the shell). Its default behavior is similar to that of SIGTERM, i.e. it terminates the process, and like SIGTERM it can be caught so that the process can shut down gracefully (SIGTERM is also the default signal sent by the kill command).
- When you start a program from a terminal, you can press Ctrl+Z to send SIGTSTP to the foreground application, which suspends it. You can then run bg to let it continue running in the background.
- As I have mentioned in this post, when a child process terminates, the kernel sends SIGCHLD to the parent process, which can then collect the child’s status with waitpid().
- When a Unix system shuts down, the init process sends SIGTERM, which processes can catch, to all processes and then waits a fixed amount of time (often between 5 and 20 seconds) to give them time to clean up. After that, init sends SIGKILL, which cannot be caught, to any processes still running in order to force termination.
- When a program instructs the CPU to read or write an invalid physical memory address, SIGBUS is sent to the program. When there is a segmentation fault, i.e. the process references an invalid memory address, SIGSEGV is sent instead. The default action for both SIGBUS and SIGSEGV is to terminate the process with a core dump.
- When you exit the terminal, SIGHUP is sent to its child processes, which terminates them by default. Daemons are different: since they don’t interact directly with the user, many of them instead reload their configuration files when they receive SIGHUP.
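The SIGTERM-then-SIGKILL pattern from the shutdown example can be tried on a single child process. This is a rough sketch, not what an init system actually runs: `shutdown_child` is a hypothetical helper, the grace period is shortened to a fraction of a second, and the child is deliberately written to ignore SIGTERM.

```python
import os
import signal
import time


def shutdown_child(grace_seconds=0.2):
    """Send SIGTERM, wait briefly, then SIGKILL; return the fatal signal's name."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            # Child: a stubborn process that ignores SIGTERM and never exits.
            os.close(read_fd)
            signal.signal(signal.SIGTERM, signal.SIG_IGN)
            os.write(write_fd, b"r")       # tell the parent we are ready
            while True:
                time.sleep(1)
        finally:
            os._exit(1)

    os.close(write_fd)
    os.read(read_fd, 1)                    # wait until the child ignores SIGTERM
    os.close(read_fd)

    os.kill(pid, signal.SIGTERM)           # polite request; can be caught or ignored
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if os.waitpid(pid, os.WNOHANG)[0] == pid:
            return None                    # child exited within the grace period
        time.sleep(0.01)

    os.kill(pid, signal.SIGKILL)           # cannot be caught, blocked, or ignored
    _, status = os.waitpid(pid, 0)
    return signal.Signals(os.WTERMSIG(status)).name


if __name__ == "__main__":
    print(shutdown_child())
```

Because the child ignores SIGTERM, only the SIGKILL ends it, and the status returned by `waitpid` records which signal was responsible.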