Typing a Command into a Linux Terminal


Any operating system is built upon four foundational subsystems: process management, memory management, the filesystem, and I/O management. Process management is probably the most visible to users. In this article, we will cover the basic concepts of process management, such as system calls and file descriptors, by walking through what happens when you type a command at a Unix/Linux prompt.

The Keyboard

Let’s first look at the keyboard. The operating system reads and writes data through hardware components called devices. Devices are generally categorized as block devices or stream (character) devices, although some devices, such as clocks or touch screens, do not fit either category. A keyboard is an example of a stream device. A device communicates with the computer through an electronic component called a controller or adapter. The controller can be a chip on the motherboard or a printed circuit card that is inserted into a PCIe slot. The controller has a connector that links it to the device, usually through a physical cable.

device controller

A controller usually has control registers, which the operating system uses to issue commands such as delivering or accepting data, and data buffers, which the operating system can read and write. There are a number of ways for the CPU to communicate with the control registers and the device data buffers, but they are beyond the scope of this post.

Normal keyboards have fewer than 128 keys, so only 7 bits are needed to represent the key number. The eighth bit is set to 0 on a key press and to 1 on a key release. When a key is struck, the corresponding key number (or scan code) is put in an I/O register. Each keystroke therefore generates two interrupts: one when the key is pressed and another when it is released. At each interrupt, the keyboard driver reads what happened from the I/O port associated with the keyboard. We will revisit interrupts at the end of the article.
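
The bit layout described above can be sketched in a few lines of Python. This is only an illustration of the 7-bit key number plus press/release bit; the function name decode_scan_code and the sample byte values are made up for the example, and real keyboards use more elaborate scan code sets.

```python
def decode_scan_code(byte):
    """Split one keyboard byte into (key_number, event)."""
    key_number = byte & 0x7F        # low 7 bits: which key
    released = (byte & 0x80) != 0   # eighth bit: 0 = press, 1 = release
    return key_number, "release" if released else "press"


# One keystroke produces two bytes (and two interrupts): press, then release.
for raw in (0x1E, 0x9E):
    print(decode_scan_code(raw))
# (30, 'press')
# (30, 'release')
```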

The driver can also preprocess the data and deliver only complete, corrected lines to user programs. This is known as cooked mode. The characters must be stored until an entire line has been accumulated, because the user may subsequently decide to erase part of it. The other mode is raw mode, where the data is passed through without interpreting any special characters. Even in raw mode the characters must be buffered, because the program may not yet have requested input and the user should still be able to type ahead.
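
As a rough sketch of cooked mode, the generator below buffers characters, applies an erase character, and only hands over a line once it is complete. It is a toy model and not how the kernel's terminal driver is written; the name cooked_lines and the choice of DEL (0x7f) as the erase character are assumptions for the example.

```python
def cooked_lines(chars, erase="\x7f"):
    """Yield complete lines from a stream of typed characters, applying erase."""
    line = []
    for ch in chars:
        if ch == erase:
            if line:
                line.pop()        # the user erased the previous character
        elif ch == "\n":
            yield "".join(line)   # only now is the line handed to the program
            line = []
        else:
            line.append(ch)       # keep buffering until the line is complete


typed = "lz\x7fs -l\n"            # the user types "lz", erases "z", then "s -l"
print(list(cooked_lines(typed)))  # ['ls -l']
```

Characters typed after the last newline stay in the buffer, which is why a program in cooked mode sees nothing until Enter is pressed.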

After the driver delivers the input to the program, the program sends characters to the current window, where they are displayed. Usually a block of characters, for example a line, is written in one system call.

The Shell

When you enter a command in the terminal, a simple shell reads the input with the library function getline(), which reads from the standard input stream (stdin) and stores the line in a buffer as a string. The buffer is then broken into tokens, typically with strtok() (getopt() only parses options once the arguments are already split), and the tokens are stored in an array, e.g.

["ls", "-l", "*.txt", NULL]:  args[0] = "ls", args[1] = "-l", args[2] = "*.txt", args[3] = NULL
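
The read-and-tokenize step can be sketched in Python as follows. It is a simplification that splits on whitespace only and ignores quoting; read_command is a hypothetical helper, and a Python list needs no terminating NULL entry the way a C argument array does.

```python
import io


def read_command(stream):
    """Read one line (like getline()) and split it into whitespace-separated tokens."""
    line = stream.readline()   # includes the trailing "\n", if any
    return line.split()        # split() with no argument drops all whitespace


# io.StringIO stands in for standard input so the example is repeatable.
args = read_command(io.StringIO("ls -l *.txt\n"))
print(args)  # ['ls', '-l', '*.txt']
```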

(Optional) The shell checks whether the command contains wildcard characters and, if so, expands them by pattern matching (the C library offers glob() for this). For example, * expands to the list of matching files and directories. This process is called shell globbing. The shell also checks aliases, which are stored in user-specific initialization files such as .bashrc and .profile, and checks whether the command is a built-in, i.e. one implemented and executed inside the shell interpreter itself. Built-in commands are also faster to run, because they are already in memory with the shell.
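
A minimal sketch of wildcard expansion, using Python's fnmatch module against a fixed list of names so the result is repeatable. The helper expand is hypothetical. Keeping an unmatched pattern as-is mirrors the default behaviour of bash, and unlike a real shell this sketch does not treat leading dots specially.

```python
import fnmatch


def expand(args, names):
    """Replace each wildcard argument with the sorted names it matches."""
    out = []
    for arg in args:
        if any(c in arg for c in "*?["):
            matches = sorted(fnmatch.filter(names, arg))
            out.extend(matches if matches else [arg])  # no match: keep the pattern
        else:
            out.append(arg)
    return out


print(expand(["ls", "-l", "*.txt"], ["a.txt", "b.c", "notes.txt"]))
# ['ls', '-l', 'a.txt', 'notes.txt']
```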

If the command is not a built-in, the shell calls getenv("PATH") to get the value of $PATH from the environment. (A shell that scans the raw environment strings instead first splits the entry at '=' to separate the name from the value.) It then uses strtok(path, ":") to split the value into individual directories with ":" as the delimiter, storing them in a char ** array allocated with malloc(). For each directory, the shell builds a candidate path in a buffer, i.e. path[x] + '/' + args[0], and uses access() to check whether the file at that location is accessible.
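
The PATH search can be sketched with Python's os module, whose os.access() wraps the same access() call. find_in_path is a hypothetical helper, and a real shell also caches results and handles commands that contain a slash.

```python
import os


def find_in_path(command, path_value):
    """Return the first executable candidate for command in path_value, or None."""
    for directory in path_value.split(":"):   # like strtok(path, ":")
        # An empty PATH entry conventionally means the current directory.
        candidate = os.path.join(directory or ".", command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print(find_in_path("ls", os.environ.get("PATH", "")))
```

The printed path depends on the system, so no expected output is shown.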

fork() and execve()

The shell now makes a fork() (or clone()) system call to create a copy of itself.

  • (Optional) Details of fork(): The calling process traps to the kernel, which creates a process descriptor (task_struct in Linux), a kernel stack, and a thread_info structure for the child. Most of the descriptor contents are copied from the parent’s descriptor, except for the PID. Linux finds an available PID and updates the PID hash table to point to the new task structure. It also sets the fields in task_struct that point to the previous and next processes, which links the child into the task list. Because the task list is a linked list, it does not need to occupy contiguous memory.
  • (Optional) The operating system does not actually copy the parent’s memory segments to the child. Instead, it gives the child its own page tables, points them at the parent’s pages (marked read-only), and uses the copy-on-write technique. Whenever either process (the child or the parent) tries to write to a page, it gets a protection fault; the kernel then allocates a new copy of the page for the faulting process and marks it read/write. Only pages that are actually written have to be copied.
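
The visible effect of fork() can be sketched with Python's os.fork(), which wraps the same system call. Copy-on-write itself cannot be observed from Python, but its result can: a write in the child does not change the parent's copy. The helper name fork_and_modify is hypothetical, and a pipe is used only to report the child's value back to the parent.

```python
import os


def fork_and_modify():
    """Fork, change a variable in the child, and return (child_value, parent_value)."""
    counter = 0
    read_end, write_end = os.pipe()

    pid = os.fork()
    if pid == 0:                          # child: has its own copy of counter
        counter += 100
        os.write(write_end, str(counter).encode())
        os._exit(0)                       # leave without running parent code

    os.close(write_end)                   # parent keeps only the read end
    child_value = int(os.read(read_end, 32))
    os.close(read_end)
    os.waitpid(pid, 0)                    # reap the child
    return child_value, counter


print(fork_and_modify())  # (100, 0)
```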

After the fork(), the child calls one of the exec family of functions (execl, execle, execlp, execv, execvp, execve), all of which ultimately use the execve() system call to run the user’s command.

  • (Optional) Details of execve(): execve() replaces the current contents of the process with the program requested by the user. The kernel finds and verifies the executable file and copies the arguments and environment strings into the kernel. It then releases the old address space and its page tables. The new page tables are set up to indicate that no pages are in memory, except perhaps one stack page, but that the address space is backed by the executable file on disk. When the new process starts running, it immediately gets a page fault, which causes the first page of code to be paged in from the executable file. Nothing has to be loaded in advance, so programs can start quickly and fault in just the pages they need and no more. Finally, the arguments and environment strings are copied to the new stack, the signals are reset, and the registers are initialized to all zeros. The new command can then start running.
  • (Optional) If you want the shell process itself to run the program instead of forking a child, you can use the exec built-in directly. However, when the program terminates there is no shell left to return to, so the terminal session ends and you may be logged out of the system.
[email protected]:/tmp/a/b$ exec ls
Connection to 127.0.0.1 closed
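
The fork-then-exec sequence of this section can be sketched with Python's os module, which wraps the same calls. The helper run is hypothetical, and exit status 127 for a command that cannot be executed is a shell convention assumed here.

```python
import os


def run(argv):
    """Run argv in a child process and return its exit status."""
    pid = os.fork()
    if pid == 0:                        # child
        try:
            os.execvp(argv[0], argv)    # searches PATH, then replaces this process
        except OSError:
            os._exit(127)               # exec failed, e.g. command not found
    _, status = os.waitpid(pid, 0)      # parent (the "shell") waits for the child
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)         # negative value: child was killed by a signal


print("exit status:", run(["ls", "-l"]))
```

Nothing after a successful os.execvp() runs in the child, because the Python interpreter in that process has been replaced by the new program.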

The Main Program

Depending on the program, it will make the system calls it needs. You can use the strace command to see which system calls are made. First, however, let’s briefly look at open files and file descriptors. An open file can be a regular file, a directory, a stream, a network file, etc. that is being read or modified by a process. A file descriptor is the handle that a process uses to access the file. When a file is opened, the OS creates an entry that holds information about the open file, and the file descriptor is an integer that refers to that entry. lsof lists all open files by reading from the /proc filesystem:

 $ lsof -p <pid> or ls /proc/<pid>/fd

It is worth remembering that a newly opened file usually gets file descriptor 3 or higher, because 0, 1, and 2 are already taken by the standard streams, i.e. standard input (stdin), standard output (stdout), and standard error (stderr). These streams are the input and output channels that carry data between devices and applications.
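
A small sketch of how descriptor numbers are handed out, using os.open() and os.close(), which wrap the corresponding system calls. The kernel always returns the lowest free number, so a closed descriptor is reused by the next open. The helper open_two and the file name demo.txt are made up for the example.

```python
import os
import tempfile


def open_two():
    """Open a file twice, close the first descriptor, then open again."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "demo.txt")
        fd1 = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
        fd2 = os.open(path, os.O_RDONLY)   # next free number after fd1
        os.close(fd1)
        fd3 = os.open(path, os.O_RDONLY)   # lowest free number: fd1's old slot
        os.close(fd2)
        os.close(fd3)
        return fd1, fd2, fd3


fd1, fd2, fd3 = open_two()
print(fd1, fd2, fd3)   # typically 3 4 3 when only the standard streams are open
```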

Let’s examine the ‘ls’ program with strace. We see a common pattern of system calls: open()/openat() => read() => fstat() => mmap() => close().

$ strace ls
execve("/bin/ls", ["ls"], 0x7ffe1f4741b0 /* 23 vars */) = 0
...
fstat(3, {st_mode=S_IFREG|0644, st_size=27848, ...}) = 0
mmap(NULL, 27848, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f066d110000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20b\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=154832, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f066d10e000
mmap(NULL, 2259152, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f066ccc8000
mprotect(0x7f066cced000, 2093056, PROT_NONE) = 0
mmap(0x7f066ceec000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x24000) = 0x7f066ceec000
mmap(0x7f066ceee000, 6352, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f066ceee000
close(3)                                = 0

The ‘ls’ program handles the directory with the opendir() function (which uses the open()/openat() syscall) to open the directory and the readdir() function (which uses getdents()) to read directory entries. Later on, it closes the directory stream with closedir() (which uses close() internally). More generally, a program opens a file with open()/openat(), reads it with read(), may get file information with stat()/fstat(), and finally calls close(), which makes the file descriptor available for reuse by a subsequent open().

You can read more about the process of opening and searching file in this article.

After opening the file, the program reads the header to identify the file type, target platform, version, and so on. In the trace above this is the ELF header of each shared library, read with read(fd, buf, 512) or read(fd, buf, 832) on 32-bit and 64-bit systems respectively.

After reading the header, the program can call fstat(int fd, struct stat *info), which returns the file information in the memory area pointed to by the info argument. A similar syscall is stat(), which takes a filename instead of a file descriptor. You might also notice the mmap() syscall, which maps a file into memory or, with MAP_ANONYMOUS, allocates fresh memory. ‘ls’ also writes its output with write(1, …) using fd 1, which shows that the child process prints the output, not the parent process. Upon completion, the child process frees its memory and calls exit(). Process termination is covered in the next section.
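
The open => read => fstat => close pattern can be reproduced with Python's os module, whose functions map onto the system calls of the same names. The helper peek is hypothetical, and the temporary file with a fake ELF magic number exists only to make the example repeatable.

```python
import os
import tempfile


def peek(path, header_size=832):
    """Return the first four bytes and the size of the file at path."""
    fd = os.open(path, os.O_RDONLY)          # open()/openat()
    try:
        header = os.read(fd, header_size)    # read(): the first bytes of the file
        info = os.fstat(fd)                  # fstat(): metadata via the descriptor
    finally:
        os.close(fd)                         # close(): the number can be reused
    return header[:4], info.st_size


with tempfile.NamedTemporaryFile() as f:
    f.write(b"\x7fELF" + b"\0" * 60)         # 4 magic bytes + 60 padding bytes
    f.flush()
    print(peek(f.name))                      # (b'\x7fELF', 64)
```

Running the script itself under strace should show a similar sequence of calls, mixed with the interpreter's own start-up activity.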

What happens with an unnamed pipe, e.g. $ ls | tee file.txt

  • An unnamed pipe is requested from the kernel with the pipe() syscall, which returns a pair of file descriptors: one process writes to the pipe, and the other process reads from it.
  • The write end of the pipe is duplicated with dup2(int oldfd, int newfd) onto the stdout of the process that will execute ls. On success, dup2() returns the new descriptor. The read end of the same pipe is duplicated with dup2() onto the stdin of the process that will execute tee.
  • The shell collects the exit codes of both child processes. Even if the first one fails to produce any output, the second runs anyway with empty input.
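
The three steps above can be sketched with os.pipe(), os.fork(), os.dup2() and os.execvp(), which wrap the system calls of the same names. The helper pipeline is hypothetical, and it assumes both children exit normally rather than being killed by a signal.

```python
import os


def pipeline(left, right):
    """Run `left | right` and return the exit statuses of both commands."""
    read_end, write_end = os.pipe()

    left_pid = os.fork()
    if left_pid == 0:
        os.dup2(write_end, 1)        # stdout of left -> write end of the pipe
        os.close(read_end)
        os.close(write_end)
        try:
            os.execvp(left[0], left)
        except OSError:
            os._exit(127)

    right_pid = os.fork()
    if right_pid == 0:
        os.dup2(read_end, 0)         # stdin of right <- read end of the pipe
        os.close(read_end)
        os.close(write_end)
        try:
            os.execvp(right[0], right)
        except OSError:
            os._exit(127)

    # The parent must close both ends, or `right` never sees end-of-file.
    os.close(read_end)
    os.close(write_end)
    _, left_status = os.waitpid(left_pid, 0)
    _, right_status = os.waitpid(right_pid, 0)
    return os.WEXITSTATUS(left_status), os.WEXITSTATUS(right_status)
```

Calling pipeline(["ls"], ["tee", "file.txt"]) would mirror the command in the heading; both commands run concurrently and the parent waits for each of them.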

Process Termination

A process can be terminated in 4 ways:

  • Normal exit (voluntary), e.g. in the examples above, exit() is called when the work is done
  • Error exit (voluntary), i.e. the process discovers an error it cannot continue from and exits on its own
  • Fatal error (involuntary), i.e. an error caused by the process itself, often due to a program bug
  • Killed by another process (involuntary), i.e. another process sends it a signal with kill()

When a process exits, all open streams with unwritten buffered data are flushed and closed, and open files are closed with close(). The OS must release the process’s page tables, its pages, and the disk space that the pages occupy when they are on disk. The exit status is passed to the kernel, and the kernel sends SIGCHLD to the parent process:

  • The parent process can collect the exit status by calling waitpid(), or
  • It can leave the signal unhandled and never call waitpid(), in which case the terminated child remains a zombie process until the parent reaps it or exits.
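
Two of the termination paths, a normal exit and being killed by another process, can be observed from the parent with os.waitpid(). This is a minimal sketch; the helper names child_exit_code and child_killed are made up, and the signal number 9 for SIGKILL is the Linux value.

```python
import os
import signal


def child_exit_code(code):
    """Fork a child that exits normally with the given code."""
    pid = os.fork()
    if pid == 0:
        os._exit(code)                  # normal, voluntary exit
    _, status = os.waitpid(pid, 0)      # reaps the child, so no zombie remains
    return os.WIFEXITED(status), os.WEXITSTATUS(status)


def child_killed():
    """Fork a child and kill it from the parent."""
    pid = os.fork()
    if pid == 0:
        signal.pause()                  # child sleeps until a signal arrives
        os._exit(0)
    os.kill(pid, signal.SIGKILL)        # involuntary termination
    _, status = os.waitpid(pid, 0)
    return os.WIFSIGNALED(status), os.WTERMSIG(status)


print(child_exit_code(7))   # (True, 7)
print(child_killed())       # (True, 9)
```

Between the child's exit and the parent's waitpid() call the child is a zombie: it no longer runs, but its entry stays in the process table so the status can still be collected.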

We have repeatedly mentioned Linux jargon such as system call, interrupt, and signal. We will explain these terms in this post.
