What Happens When You Type a Command into a Linux Terminal

10 minute read

An operating system is built on four foundational subsystems: process management, memory management, the file system, and I/O management. Process management is probably the most visible to users. In this article, we will look at basic process-management concepts, such as system calls and file descriptors, by walking through what happens when you type a command at a Unix/Linux prompt.

The Keyboard

Let’s first try to understand the keyboard. The operating system inputs and outputs data via hardware components called devices. Devices are generally categorized as block or stream (character) devices, although some devices, such as clocks or touch screens, do not belong to either. A keyboard is an example of a stream device.

Devices communicate with the computer through an electronic component called a controller or adapter. The controller can be a chip on the motherboard or a printed circuit card inserted into a PCIe slot, and it has a connector that attaches directly to the device through a physical cable.

The device controller receives data from the connected device and stores it in a set of registers or a local buffer: temporary storage where data waits to be transferred from an input device or to an output device. The OS reads and writes these registers and buffers through the device driver.

The controller’s job is to convert the serial bit stream into a block of bytes and perform any necessary error correction. Each device controller is in charge of a specific type of device, e.g. disk drives, audio devices, or video displays. The OS commonly exchanges data with devices through memory-mapped I/O, where the CPU reads and writes the controller's registers as if they were ordinary memory, or through direct memory access (DMA), where the controller moves data directly between the device and main memory without the CPU copying each byte.

[Figure: device controller]

Normal keyboards have fewer than 128 keys, so only 7 bits are needed to represent the key number. The eighth bit is set to 0 on a key press and to 1 on a key release. When a key is struck, its key number (scan code) is put in an I/O register. Every keystroke generates two interrupts: one when the key is struck and another when it is released. At each interrupt, the keyboard driver reads the I/O port associated with the keyboard to find out what happened.
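To make the 7-bit key number and the release bit concrete, here is a small C sketch that splits a scan code with two bit masks. It assumes the classic PC/XT layout described above (scan code set 1); `decode_scancode()` and `print_event()` are hypothetical helpers, not part of any real driver interface.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper types/functions; assumes the PC/XT scan code set 1
 * layout: low 7 bits = key number, high bit = release flag. */
struct key_event {
    unsigned key;   /* key number, 0..127 */
    bool released;  /* true if bit 7 was set (key release) */
};

struct key_event decode_scancode(unsigned char code)
{
    struct key_event ev;
    ev.key = code & 0x7Fu;              /* which key */
    ev.released = (code & 0x80u) != 0;  /* 0 = press, 1 = release */
    return ev;
}

void print_event(unsigned char code)
{
    struct key_event ev = decode_scancode(code);
    printf("scan code 0x%02X -> key %u %s\n", code, ev.key,
           ev.released ? "released" : "pressed");
}
```

For example, in scan code set 1 the A key sends 0x1E when pressed and 0x9E (0x1E with the high bit set) when released, so `print_event(0x1E)` and `print_event(0x9E)` report the same key number, 30, once as pressed and once as released.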

The device driver is the part of the OS that communicates with the device controller, and the controller notifies the driver through interrupts. The driver can preprocess the data and deliver only corrected lines to user programs. This is cooked mode: characters are stored until an entire line has been accumulated, because the user may decide to erase part of it before pressing Enter.

The other mode is raw mode, in which each character is passed to the program as it arrives, without interpreting special characters such as backspace. Cooked mode remains the default on most Unix systems.
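Programs can switch the terminal out of cooked mode through the POSIX termios interface. The sketch below clears the ICANON (line editing) and ECHO flags so that read() returns after a single keystroke. `make_noncanonical()` and `read_one_key()` are illustrative names, and this is deliberately not full raw mode: ISIG stays on, so Ctrl-C still sends a signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: turn cooked (canonical) settings into raw-like ones.
 * Only the struct is modified; the caller applies it with tcsetattr(). */
void make_noncanonical(struct termios *t)
{
    t->c_lflag &= ~(tcflag_t)(ICANON | ECHO); /* no line editing, no echo */
    t->c_cc[VMIN] = 1;  /* read() returns once at least 1 byte is available */
    t->c_cc[VTIME] = 0; /* no inter-byte timeout */
}

/* Hypothetical helper: read one keystroke without waiting for Enter, then
 * restore the original cooked settings. Returns the byte, or -1 on error
 * (for example, when stdin is not a terminal). */
int read_one_key(void)
{
    struct termios saved, raw;
    unsigned char c;

    if (tcgetattr(STDIN_FILENO, &saved) == -1)
        return -1;
    raw = saved;
    make_noncanonical(&raw);
    if (tcsetattr(STDIN_FILENO, TCSANOW, &raw) == -1)
        return -1;

    ssize_t n = read(STDIN_FILENO, &c, 1);

    tcsetattr(STDIN_FILENO, TCSANOW, &saved); /* always go back to cooked mode */
    return n == 1 ? c : -1;
}
```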

After the driver delivers the input to the program, the program sends characters to the current window for display. Usually a block of characters, such as a whole line, is written with a single system call.

The Shell

When you enter a command at the terminal, a minimal shell reads the input with the library function getline(), which reads from the standard input stream (stdin) and stores the input in a buffer as a string. The buffer is then split into tokens (for example with strtok()) and stored in an array, i.e.

["ls", "-l", "*.txt", NULL]: args[0] = "ls", args[1] = "-l", args[2] = "*.txt", args[3] = NULL
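The reading and tokenizing steps can be sketched in C as follows. This assumes a minimal shell that splits only on spaces, tabs, and newlines with strtok(); real shells also handle quoting, escapes, and operators such as | and >. The names `tokenize()`, `read_command()`, and `MAX_ARGS` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 64 /* hypothetical limit on arguments per command */

/* Hypothetical helper: split a command line in place into a NULL-terminated
 * argv-style array. args must have room for max_args pointers (max_args >= 1).
 * Returns the argument count. */
int tokenize(char *line, char *args[], int max_args)
{
    int argc = 0;
    char *tok = strtok(line, " \t\n");
    while (tok != NULL && argc < max_args - 1) {
        args[argc++] = tok;
        tok = strtok(NULL, " \t\n");
    }
    args[argc] = NULL; /* exec functions expect this terminator */
    return argc;
}

/* Hypothetical helper: read one line from stdin with getline() and tokenize it.
 * Returns the argument count, or -1 on EOF (Ctrl-D) or error. */
int read_command(char **line, size_t *cap, char *args[])
{
    if (getline(line, cap, stdin) == -1)
        return -1;
    return tokenize(*line, args, MAX_ARGS);
}
```

A read loop would start with `char *line = NULL; size_t cap = 0;`, call `read_command(&line, &cap, args)` for each prompt, and `free(line)` at the end; getline() grows the buffer as needed.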

Next, the shell checks whether the command contains wildcard characters such as * and, if so, expands each pattern into the list of matching files and directories (a minimal shell can do this with the glob() library function). This process is called globbing. The shell also checks whether the command is an alias (aliases are usually defined in user-specific initialization files such as .bashrc and .profile) and whether it is a built-in command, i.e. one implemented and executed inside the shell interpreter itself, such as cd. Built-ins are also faster to run, because their code is already in the shell's memory and no program has to be found on disk and loaded.
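For a minimal shell, the POSIX glob() function can do the expansion. The sketch below, with the hypothetical name `expand_pattern()`, prints every path matching a pattern. Note that bash does its globbing with its own internal code rather than by calling this function.

```c
#define _POSIX_C_SOURCE 200809L
#include <glob.h>
#include <stdio.h>

/* Hypothetical helper: expand a wildcard pattern such as "*.txt" with glob().
 * Prints each match and returns the number of matches (0 if nothing matched),
 * or -1 on error. */
long expand_pattern(const char *pattern)
{
    glob_t g;
    int rc = glob(pattern, 0, NULL, &g);

    if (rc == GLOB_NOMATCH)
        return 0;  /* bash would keep the unmatched pattern as-is by default */
    if (rc != 0)
        return -1; /* GLOB_NOSPACE or GLOB_ABORTED */

    for (size_t i = 0; i < g.gl_pathc; i++)
        printf("%s\n", g.gl_pathv[i]); /* matches come back sorted */

    long n = (long)g.gl_pathc;
    globfree(&g); /* release the list glob() allocated */
    return n;
}
```

Calling `expand_pattern("*.txt")` in a directory lists the .txt files there, which is the list the shell would substitute for the pattern in the argument array.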

If the command is not a built-in, the shell calls getenv("PATH") to get the value of $PATH from the environment. (A shell that scans the raw environment instead first splits the "PATH=..." entry at the '='.) It then uses strtok(path, ":") to split the value into an array of directories, using ':' as the delimiter. For each directory, the shell builds path[x] + "/" + argv[0] in a buffer and calls access() to check whether a file exists at that location and can be executed.
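The PATH search can be sketched like this. `find_in_path()` is a hypothetical helper: it copies the getenv() result because strtok() modifies its input, and it uses access() with X_OK to test for an executable file. One simplification is that strtok() skips empty PATH entries, which POSIX shells treat as the current directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: search $PATH for an executable named cmd.
 * On success, writes the full path into out and returns out;
 * returns NULL if no PATH directory contains an executable cmd. */
char *find_in_path(const char *cmd, char *out, size_t outlen)
{
    const char *path = getenv("PATH"); /* value only, e.g. "/usr/bin:/bin" */
    if (path == NULL || strchr(cmd, '/') != NULL)
        return NULL; /* no PATH, or cmd is already a path: nothing to search */

    char *copy = strdup(path); /* strtok() writes into its argument */
    if (copy == NULL)
        return NULL;

    char *result = NULL;
    for (char *dir = strtok(copy, ":"); dir != NULL; dir = strtok(NULL, ":")) {
        /* path[x] + "/" + argv[0] */
        int n = snprintf(out, outlen, "%s/%s", dir, cmd);
        if (n < 0 || (size_t)n >= outlen)
            continue; /* candidate does not fit in the buffer */
        if (access(out, X_OK) == 0) { /* exists and is executable */
            result = out;
            break;
        }
    }
    free(copy);
    return result;
}
```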

fork() and execve()

The shell now makes a fork() system call (implemented with clone() on Linux) to create a copy of itself. When fork() is called, the calling process traps into the kernel, which creates a process control block (PCB; task_struct and thread_info in Linux) and a kernel stack for the child.

Most of the child's descriptor is filled in from the parent's values, except for the PID (process ID). Linux finds an available PID, updates the PID hash table to point to the new task structure, and sets the fields in task_struct that link it to the previous and next processes on the task list (because the task list is a linked list, process structures need not occupy contiguous memory).
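From user space, the visible result is that fork() returns twice: in the parent it returns the child's new PID, and in the child it returns 0. This sketch, with the hypothetical name `fork_pid_demo()`, has the child check that getppid() matches the parent's PID and report the result through its exit status.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: fork once and compare the PIDs each side sees.
 * Returns 1 if the child's getppid() equals the forking process's PID,
 * 0 if not, -1 on error. */
int fork_pid_demo(void)
{
    pid_t parent = getpid();
    fflush(stdout);     /* avoid duplicating buffered output in the child */

    pid_t pid = fork(); /* returns twice */
    if (pid == -1)
        return -1;
    if (pid == 0) {     /* child: fork() returned 0 */
        printf("child:  pid=%ld ppid=%ld\n", (long)getpid(), (long)getppid());
        fflush(stdout); /* _exit() does not flush stdio buffers */
        _exit(getppid() == parent ? 0 : 1);
    }

    /* parent: fork() returned the child's PID */
    printf("parent: pid=%ld child=%ld\n", (long)parent, (long)pid);
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```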

The OS does not actually copy the parent's memory segments into the child. Instead, it gives the child its own page table pointing to the parent's pages, marks those pages read-only, and uses the copy-on-write technique. Whenever either process (the child or the parent) tries to write to such a page, it gets a protection fault; the kernel then allocates a new copy of the page for the faulting process and marks it read/write, so only pages that are actually written have to be copied.
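Copy-on-write itself is invisible to user code, but its effect is easy to observe: after fork(), a write in the child does not change the parent's copy. The sketch below, the hypothetical `cow_demo()`, shows that the parent still sees the original value after the child has overwritten it. It demonstrates the separate address spaces, not the page-level copying, which happens inside the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 42; /* after fork(), its page is shared copy-on-write */

/* Hypothetical demo: returns the parent's value of counter after the child
 * has overwritten its own copy, or -1 on error. */
int cow_demo(void)
{
    fflush(stdout);
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        counter = 99; /* this write goes to the child's private copy of the page */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) == -1)
        return -1;
    return counter;   /* the parent's page was never modified */
}
```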

After fork(), the child calls one of the exec family of functions (execl(), execle(), execv(), execvp(), and so on), all of which end up in the execve() system call, to run the user's command. execve() replaces the current contents of the process with the program the user asked to run.

  1. The kernel finds and verifies the executable file and copies the argument and environment strings into the kernel. The environment was inherited from the parent process during fork(), and the exec call passes it on to the new program.

  2. The child's old address space and page tables are released, and new page tables are set up with no pages of the new program in memory yet. The argument and environment strings, saved in temporary kernel space, are copied onto the new stack, caught signals are reset to their default actions, and the registers are initialized to all zeros.

The first value on the new stack is the argument count, followed by an array of pointers to the argument strings, terminated by a 0 (NULL) pointer. Right after that comes a second array of pointers, each pointing to a zero-terminated environment string of the form NAME=value; this array is also terminated by a 0 pointer.

  3. When the new program starts running, it immediately takes a page fault that causes the first page of code to be paged in from the executable file. This is why programs can start quickly: they fault in only the pages they actually need.
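The argument and environment arrays described above are both NULL-terminated, which is how a program finds their ends. The sketch below, with the hypothetical helpers `vec_len()` and `dump_initial_vectors()`, walks both arrays the way a program sees them after execve().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: count the entries of a NULL-terminated pointer array
 * such as argv or envp. execve() builds both arrays on the new stack with a
 * NULL pointer at the end, so the terminator is all we need. */
size_t vec_len(char *const v[])
{
    size_t n = 0;
    while (v[n] != NULL)
        n++;
    return n;
}

/* Hypothetical helper: print argc, argv and the environment as received. */
void dump_initial_vectors(int argc, char *const argv[], char *const envp[])
{
    printf("argc = %d (argv[argc] is %s)\n", argc,
           argv[argc] == NULL ? "NULL" : "not NULL");
    for (int i = 0; i < argc; i++)
        printf("argv[%d] = \"%s\"\n", i, argv[i]);
    for (size_t i = 0; envp[i] != NULL; i++)
        printf("envp[%zu] = \"%s\"\n", i, envp[i]);
}
```

A program can call it from `int main(int argc, char *argv[])` using the POSIX `environ` variable, which it declares itself as `extern char **environ;`, for example `dump_initial_vectors(argc, argv, environ);`.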

If you want the shell itself to run the program instead of forking a child process, you can use the exec built-in command directly. However, because exec replaces the shell, the shell process is gone when the program terminates, which may end your session and log you out of the system (over SSH, the connection closes):

$ exec ls
Connection to 127.0.0.1 closed
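The fork(), exec, and wait sequence described in this section can be sketched as a small C function. `run_command()` is a hypothetical name; it uses execvp(), which searches PATH itself before calling execve(). Following common shell conventions, it reports 127 when exec fails (shells use 127 for "command not found") and 128 + N when the child is killed by signal N.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv (NULL-terminated) the way a shell does.
 * Returns the child's exit status, 128 + signal number if it was killed
 * by a signal, or -1 if fork() or waitpid() failed. */
int run_command(char *const argv[])
{
    fflush(stdout);
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        execvp(argv[0], argv); /* on success, this never returns */
        perror(argv[0]);       /* reached only if exec failed */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

For example, `char *args[] = {"ls", "-l", NULL}; run_command(args);` runs `ls -l` in a child and returns its exit status, while the calling process keeps running, unlike with the exec built-in.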

The Main Program

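Depending on the program, the process will make whatever system calls it needs. You can use the strace command to see which system calls a process makes, either by starting a program under strace or by attaching to a running process:

$ strace -p <pid>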

Open Files and File Descriptors

Open files can be regular files, directories, pipes, sockets, devices, and so on, that processes read or modify. A file descriptor is the handle a process uses to access an open file: when a file is opened, the OS creates an entry describing it, and the file descriptor is the small integer that identifies that entry in the process's table of open files.

lsof lists the open files of a process by reading the /proc file system; you can also list a process's file descriptors directly:

$ lsof -p <pid>
$ ls -l /proc/<pid>/fd

open() returns the lowest-numbered file descriptor that is not in use, so the first file a program opens usually gets descriptor 3: descriptors 0, 1, and 2 are already taken by the standard streams, i.e. standard input (stdin), standard output (stdout), and standard error (stderr). These streams are the default input and output channels between the application and the terminal, or whatever they have been redirected to.
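The /proc view that lsof uses can also be read directly. This Linux-specific sketch lists /proc/self/fd, the descriptor table of the calling process, and shows which number open() hands out; `list_open_fds()` and `open_and_report()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (Linux only): print this process's open descriptors,
 * like `ls -l /proc/self/fd`, and return how many there are (-1 on error).
 * The directory stream used for the listing is itself an open descriptor,
 * so it appears in the output and in the count. */
int list_open_fds(void)
{
    DIR *d = opendir("/proc/self/fd");
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        char link[64], target[4096];
        snprintf(link, sizeof link, "/proc/self/fd/%s", e->d_name);
        ssize_t n = readlink(link, target, sizeof target - 1); /* what the fd refers to */
        target[n > 0 ? n : 0] = '\0';
        printf("fd %s -> %s\n", e->d_name, target);
        count++;
    }
    closedir(d);
    return count;
}

/* Hypothetical helper: open a file read-only and report the descriptor it
 * got; with 0, 1 and 2 already in use, this is normally 3. */
int open_and_report(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd != -1)
        printf("%s opened as fd %d\n", path, fd);
    return fd;
}
```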

The ls Program

Let’s examine the ls program with strace. We see a common pattern of system calls: open()/openat() => read() => fstat() => mmap() => close().

$ strace ls
execve("/bin/ls", ["ls"], 0x7ffe1f4741b0 /* 23 vars */) = 0
...
fstat(3, {st_mode=S_IFREG|0644, st_size=27848, ...}) = 0
mmap(NULL, 27848, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f066d110000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20b\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=154832, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f066d10e000
mmap(NULL, 2259152, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f066ccc8000
mprotect(0x7f066cced000, 2093056, PROT_NONE) = 0
mmap(0x7f066ceec000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x24000) = 0x7f066ceec000
mmap(0x7f066ceee000, 6352, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f066ceee000
close(3)                                = 0

The ls program opens a directory with the opendir() function (which uses the open()/openat() system call) and reads its entries with the readdir() function (which uses the getdents() family of system calls). To close the directory, it uses the closedir() function, which calls close() internally. You can read more about how files are opened and looked up in this article.
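A stripped-down version of that opendir()/readdir()/closedir() sequence might look like this. It is a sketch, not how GNU ls is written: `list_dir()` is a hypothetical name, entries are printed unsorted, and hidden entries such as . and .. are included.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: a minimal `ls -a`. opendir() opens the directory,
 * readdir() returns one entry at a time (the C library fetches them from
 * the kernel in batches), and closedir() releases the descriptor.
 * Returns the number of entries, or -1 on error. */
long list_dir(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL) {
        perror(path);
        return -1;
    }

    long n = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        puts(entry->d_name); /* unsorted, unlike real ls */
        n++;
    }
    closedir(dir);
    return n;
}
```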

  1. To open a file, the program calls open()/openat(), which returns a file descriptor if the file can be opened. Otherwise it returns -1 with an error code such as ENOENT, unless the O_CREAT flag was given, in which case a missing file is created.

  2. After opening the file, the program reads its header to identify the file type, target platform, version, and so on. In the trace above, this is the dynamic loader examining the shared libraries that ls depends on (such as libselinux.so.1): it reads 832 bytes for a 64-bit ELF file, or 512 bytes for a 32-bit one.

  3. After reading the header, the program can call fstat(int fd, struct stat *info), or stat() on a path name, which fills the struct stat that info points to with file information such as the permissions and size. ls also calls write(1, …) to print its output, since file descriptor 1 is stdout.
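The open() => fstat() => close() part of the pattern can be sketched as follows. `describe_file()` is a hypothetical helper that prints the file type, permission bits, and size taken from struct stat.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: fetch metadata through a file descriptor.
 * Returns the size in bytes, or -1 on error. */
long long describe_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    close(fd); /* the descriptor number is free again for the next open() */

    const char *type = S_ISREG(st.st_mode) ? "regular file"
                     : S_ISDIR(st.st_mode) ? "directory"
                     : S_ISCHR(st.st_mode) ? "character device"
                     : "other";
    printf("%s: %s, mode %04o, %lld bytes\n", path, type,
           (unsigned)(st.st_mode & 07777), (long long)st.st_size);
    return (long long)st.st_size;
}
```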

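The mmap() system call maps a file, or anonymous memory, into the process's address space. In the trace above, it is used both to map the shared library's code and data from descriptor 3 and, with MAP_ANONYMOUS, simply to allocate memory.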
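As a sketch of file-backed mmap(), the hypothetical `count_lines()` below maps a whole file read-only and scans it in memory instead of calling read(). The pages are brought in by page faults on first access, just like the executable's pages after execve().

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count '\n' characters by mapping the file.
 * Returns the line count, or -1 on error. */
long count_lines(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) { /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    long lines = 0;
    for (size_t i = 0; i < len; i++) /* first touch of each page faults it in */
        if (p[i] == '\n')
            lines++;

    munmap((void *)p, len);
    return lines;
}
```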

Data is not actually copied directly between the process and the hard drive. Instead, write() copies the data from the process into a kernel buffer (the page cache), and the OS later copies it from the buffer to the storage device; a process that needs the data on disk right away can force this with fsync(). Hence the process does not know when the data is actually written. Similarly, read() first checks whether the requested data is already in the buffer and, if so, copies it to the process's memory. Otherwise, the process is blocked while the data is read into the buffer, and it resumes once the data can be copied. The process has to be blocked rather than given empty data, because a read() that returns no data would falsely indicate end of file.
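When a program needs to know that its data has actually reached the disk, it calls fsync() after write(). The sketch below, the hypothetical `write_durably()`, also loops on write(), because a single call may write fewer bytes than requested.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a string to a file and wait until it has been
 * flushed to the storage device. write() only copies into the kernel's
 * buffer; fsync() blocks until the file's dirty data is written out.
 * Returns 0 on success, -1 on error. */
int write_durably(const char *path, const char *data)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    size_t len = strlen(data);
    while (len > 0) {
        ssize_t n = write(fd, data, len); /* may be a partial write */
        if (n == -1) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            close(fd);
            return -1;
        }
        data += n;
        len -= (size_t)n;
    }

    if (fsync(fd) == -1) { /* force the buffered data to the device */
        close(fd);
        return -1;
    }
    return close(fd);
}
```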

  4. The process calls close() to close the file, which releases the file descriptor for reuse by a subsequent open().
  5. Upon completion, the process calls exit(), and its memory and other resources are freed. Process termination is covered in the Process Termination section below.
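The reuse of released descriptors follows from the rule that open() returns the lowest unused number. This small sketch, with the hypothetical name `fd_reuse_demo()`, opens /dev/null twice, closes the first descriptor, and checks that the next open() gets the same number back.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if the descriptor freed by close() is handed
 * out again by the next open(), 0 if not, -1 on error. */
int fd_reuse_demo(void)
{
    int a = open("/dev/null", O_RDONLY);
    if (a == -1)
        return -1;
    int b = open("/dev/null", O_RDONLY);
    if (b == -1) {
        close(a);
        return -1;
    }
    printf("opened fds %d and %d\n", a, b);

    close(a);                            /* a's number is now the lowest free one */
    int c = open("/dev/null", O_RDONLY); /* so open() returns it again */
    printf("after close(%d), the next open() returned %d\n", a, c);

    int reused = (c == a);
    close(b);
    if (c != -1)
        close(c);
    return reused;
}
```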

Unnamed Pipe

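$ ls | tee file.txt
  • The shell requests an unnamed pipe from the kernel with the pipe() system call, which returns two file descriptors: one for the read end and one for the write end.

  • The shell forks one child per command. Before calling exec, the ls child uses dup2() to make the write end its standard output, and the tee child makes the read end its standard input, so whatever ls writes, tee reads.

  • The shell waits for both children and collects their exit codes. Both commands run regardless: even if ls fails and produces no output, tee still runs, reading empty input.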
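The steps above can be sketched in C. `run_pipeline()` and `spawn()` are hypothetical names; the sketch handles a single `|`, uses execvp() for the PATH search, and returns the exit status of the right-hand command, which is what bash reports in $? unless `set -o pipefail` is on.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork one side of the pipeline, make pipe end `end`
 * the child's stdin or stdout (`target`), close the original pipe
 * descriptors, and exec the command. Returns the child's PID, or -1. */
static pid_t spawn(char *const argv[], const int pipefd[2], int end, int target)
{
    pid_t pid = fork();
    if (pid == 0) {
        dup2(end, target); /* e.g. write end -> stdout for the left command */
        close(pipefd[0]);  /* the copy made by dup2() is all the child needs */
        close(pipefd[1]);
        execvp(argv[0], argv);
        perror(argv[0]);   /* reached only if exec failed */
        _exit(127);
    }
    return pid;
}

/* Hypothetical helper: run `left | right`. Returns right's exit status,
 * or -1 on error. */
int run_pipeline(char *const left[], char *const right[])
{
    int pipefd[2]; /* pipefd[0] = read end, pipefd[1] = write end */
    if (pipe(pipefd) == -1)
        return -1;

    fflush(stdout);
    pid_t lp = spawn(left, pipefd, pipefd[1], STDOUT_FILENO);
    pid_t rp = spawn(right, pipefd, pipefd[0], STDIN_FILENO);

    /* The parent must close both ends too: while any write end is open,
     * the reader never sees end-of-file. */
    close(pipefd[0]);
    close(pipefd[1]);

    int status, result = -1;
    if (lp > 0)
        waitpid(lp, &status, 0); /* reap left; its status is not returned here */
    if (rp > 0 && waitpid(rp, &status, 0) != -1 && WIFEXITED(status))
        result = WEXITSTATUS(status);
    return result;
}
```

With this helper, `ls | tee file.txt` corresponds to `run_pipeline((char *[]){"ls", NULL}, (char *[]){"tee", "file.txt", NULL})`.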

Process Termination

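A process can be terminated in four ways:

  • Normal exit (voluntary), e.g. the process finishes its work and calls exit(), as in the examples above
  • Error exit (voluntary), e.g. the process discovers an error it cannot recover from, such as a missing input file, and exits with a nonzero status
  • Fatal error (involuntary), e.g. an error caused by the process, often due to a program bug such as an invalid memory access or a division by zero
  • Killed by another process (involuntary), e.g. another process uses the kill() system call to send it a signal such as SIGKILL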
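A parent can tell these cases apart from the status that waitpid() returns. In this sketch, the hypothetical `terminate_child()` forks a child that ends in one of the four ways, with raise(SIGSEGV) standing in for a program bug, and decodes the status with WIFEXITED/WEXITSTATUS and WIFSIGNALED/WTERMSIG, mapping a death by signal N to 128 + N as shells do for $?.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical labels for the four ways a process can end. */
enum ending { NORMAL_EXIT, ERROR_EXIT, FATAL_ERROR, KILLED };

/* Hypothetical demo: fork a child that ends the requested way, then decode
 * waitpid()'s status. Returns the exit code, 128 + N if the child died from
 * signal N, or -1 on error. */
int terminate_child(enum ending mode)
{
    fflush(stdout);
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        switch (mode) {
        case NORMAL_EXIT: exit(EXIT_SUCCESS); /* work done */
        case ERROR_EXIT:  exit(2);            /* gave up on an error it detected */
        case FATAL_ERROR:                     /* stands in for a bad memory access */
            signal(SIGSEGV, SIG_DFL);         /* make sure the default action applies */
            raise(SIGSEGV);
            break;
        case KILLED:      pause(); break;     /* wait for the parent's signal */
        }
        _exit(0);
    }
    if (mode == KILLED)
        kill(pid, SIGKILL); /* the kill() system call delivers the signal */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```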

When a process exits, the C library's exit() first flushes and closes all open streams that still hold unwritten buffered data. The kernel then closes every open file descriptor and releases the process's page table, its pages, and any disk space those pages occupy when they are swapped out. The exit status is kept for the parent, and the kernel sends SIGCHLD to the parent process:

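  • The parent can collect the exit status by calling waitpid() (or wait()), which also removes the child's entry from the process table.
  • Until the parent does so, the finished child remains a zombie process: it no longer runs, but its process-table entry, holding the exit status, is kept. (On Linux, a parent that explicitly sets SIGCHLD to SIG_IGN tells the kernel to reap its children automatically, so they never linger as zombies.)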
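The zombie state can be observed on Linux through /proc. In this sketch, the hypothetical `zombie_demo()` calls waitid() with WNOWAIT, which blocks until the child has exited but leaves it unreaped, reads the child's state letter from /proc/<pid>/stat, and only then reaps it with waitpid().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (Linux only): read the one-letter state of a process
 * from /proc/<pid>/stat (R = running, S = sleeping, Z = zombie, ...).
 * Returns '?' on error. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    char *p = strrchr(buf, ')'); /* the state follows "(command name) " */
    return (p != NULL && p[1] == ' ') ? p[2] : '?';
}

/* Hypothetical demo: returns the child's state just before it is reaped. */
char zombie_demo(void)
{
    fflush(stdout);
    pid_t pid = fork();
    if (pid == -1)
        return '?';
    if (pid == 0)
        _exit(7);

    /* Wait for the child to exit, but leave it unreaped (WNOWAIT). */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1)
        return '?';
    char state = proc_state(pid);
    printf("child %ld state before waitpid(): %c\n", (long)pid, state);

    int status;
    waitpid(pid, &status, 0); /* reap: the zombie entry disappears */
    printf("collected exit status %d\n", WEXITSTATUS(status));
    return state;
}
```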

We have repeatedly mentioned system calls, interrupts, signals, and so on. We will explain these terms in this post.
