What Operating Systems can and can't do

Kapil Hari Paranjape

What is an OS good for?

An operating system is a suite of programs. To answer the question ``What does this suite of programs do?'', we need to ask ourselves what we use computers for. Here are some possible answers:

Playing games.
This is not as frivolous an answer as it sounds. It turns out that the Unix1 system has its roots in a game written by Dennis Ritchie! More generally, one can use a computer for entertainment such as creating and experiencing music, paintings and movies.
Document preparation.
This was one of the principal new ideas behind the design of the Unix system. The computer can be used effectively as a desktop publishing tool.
Programming.
This is the classical use of a computer. However, this has a modern flavour with the notion of providing not just the tools to write and run programs but also additional tools to develop and manage large suites of programs.
Communications.
This is one of the more modern uses of a computer. While some basic form of communication has always been present, it is only after about 1985 that it has become imperative that individual computers be ``well-behaved'' components of a network that communicates in ``real-time''.
Data Management.
Again one of the classical uses of computers is to manage large quantities of data and convert that into a useful ``humanly accessible'' form. As one application this could be used to generate time-tables for organisations or individuals.
Surely one could add to this list quite easily. Moreover, as computers develop one can visualise that more such applications will emerge. One of the amazing aspects of the design of the Unix operating system is that it has been able to cope with this changing world quite successfully (so far!). The GNU/Linux system that we have chosen as our practical example for this course provides the most modern working solution. Further it is a solution which enhances the traditional Unix solution in numerous ways (GNU stands for Gnu's Not Unix!).

Other than user requirements, three other critical ingredients have gone into the practical solution that we see before us today. The next input is from technology. The existence and availability of technology is often neglected, but when one looks at GNU/Linux it is clear that its development and growth is closely tied to the emergence of comparatively cheap computer hardware. In the 1980s A. Tanenbaum had designed a nice operating system called Minix which worked on the PC/XT, which had an 8086-class chip. However, it was limited by the fact that the hardware did not offer any form of protection (see below) and so a badly designed program could crash the machine. With 80386-based machines becoming widely available by the early 1990s, many users of Minix wanted to ``enhance'' it to make use of the memory protection features of the 80386. This was quite complicated, though Bruce Evans did in fact do a lot of work. One of the Minix users was not quite satisfied with this piecemeal solution however and decided to write it all from ``scratch''--this user was Linus Torvalds. Thus the emergence of the 80386 was closely tied to the inception of the idea of Linux.

Neither Tanenbaum nor Torvalds may like it, but the former had a big influence on the operating system we see before us. This is an instance of the influence of ``theoretical'' computer science on our system, which comes about in two ways. On the one hand, algorithms and data structures that computer scientists create do eventually make their way into real programs. On the other hand, the logical ideas and metaphors created by TCS people are important as ways in which we learn to visualise problems and look for their algorithmic solutions. One need not look too deep to find concepts like ``B-trees'' and ``semaphores'' in our modern operating system. On a personal note, I first heard about a categorical view of programming in a TCS seminar at IIT, Kanpur in 1980 or so. Clearly this became a ``hot'' TCS topic for a while. Today similar ideas have led to the much touted ``object-oriented'' programming paradigm.

The GNU aspect of the practical solution is perhaps not as evident as the ones above. We must thank Richard Stallman for pointing it out and further strengthening it by means of the General Public License (GPL). An important aspect of programming is re-usability--if it were not, we would just hand everyone a computer with no programs and say ``Get to work''. We need to re-use what others have created before us. At the same time we often need to re-work and improve or adapt that work. In order that this process be carried forward, we should of course provide others access to all the material that we made use of and that which we added to it. These principles are what the GPL clearly underlines. We could not use, learn from, study and adapt a system to our needs were it not for ``Open Source''--this much is perhaps obvious. But we need to create a system that will remain usable, learnable, studiable and adaptable--for that we need the GPL copyleft.

When we combine all these inputs, we can list some of the features that we expect from our Operating System:

Multi-tasking
While I am typing these notes at the computer, the computer needs to respond to my keyboard interaction perhaps 2-3 times a second. On the other hand a typical computer (CPU) can do something of the order of a billion operations per second. For efficiency, it is thus vital that the computer can do multiple tasks ``at once''. Of course, most CPU's are ``single-threaded'' and can actually do only one thing at a time, so it is up to the operating system (usually the kernel) to schedule multiple tasks so that no time (CPU cycles) is wasted (idle); of course, it is up to us to provide those tasks!
Multi-user
The Unix system was designed at a time when a computer was typically shared by many users, thus making it imperative that the design take this into account. Even nowadays, when many computers are personal computers, this is useful. The reason is that we use the computer in different ways at different times and we would want these different ways to be ``orthogonal''. When one is typing a text one does not want to accidentally delete programs; when one is browsing the internet one does not want this communication channel to be used to mangle or destroy important data stored on the machine; and so on.
Reflexive
Technically, this means that the system can describe (or even replace!) itself. In practice this means that we can use the system to build itself. In particular, the kernel should have a description that allows it to be re-created with modifications as required. One could demand more reflexivity than already exists with the GNU/Linux system, but it is already quite usefully so. For example, it is possible to use the GNU/Linux system on one type of computer architecture to build the same system on another architecture.
Orthogonal
This is the principle of ``minimal interference''. For example the work of one user should impinge on that of another only if both of them desire it and only to the extent that both of them do. Some would say that this applies to the kernel as well--it should keep out of the way of user programs unless those programs need it!
Persistent
Loosely speaking, things should be available during the time when they are needed. Here is an anecdotal explanation. At TIFR, the typing of film club notices was often given to ``novice'' volunteers. One such volunteer typed the entire notice (about 1000 words) and was horrified to find that the computer had ignored everything after the first 128 characters! In other words, the operating system should try not to spring nasty surprises on the users by deleting their data.
Abstraction
We (more and more) wish to deal with computers at a higher-level of abstraction than bit manipulation. This is somewhat analogous to what we do in mathematics. At the lowest level all of mathematics deals with the fundamental undefined entities (numbers for some people, sets for others and points, lines, planes for yet others). However, it would be foolish of us to do all of mathematics at this level. Similarly, the operating system needs to offer us some high-level abstract objects to manipulate and use as a base to create even higher-level ones. At the same time it should be possible within our system to get ``close to the metal'' if need be.

Let us further examine the chosen method of offering a solution. The solution suite is divided into three parts: the kernel, the libraries and the utilities (the latter are also called applications).

The user typically interacts with the utilities and (perversely!) for that reason we will discuss that aspect the least during this course. A better reason is that a sufficiently good understanding of the kernel and the libraries should give enough clues about how utilities can be (and are) written; on the other hand there are far too many interesting utilities to learn in one lifetime let alone in one course! However, we will learn about a few utilities along the way as these will be essential in order to run practical tests and examples under the GNU/Linux system.

The libraries serve two purposes. On the one hand they are programs that are used frequently and so have been written in an optimised fashion by ``super''-users (in the sense of very good programmers!). This allows other programmers to develop programs fast as they need not ``re-invent the wheel''; an important philosophical basis of the design that we have seen earlier. A more important aspect of the library (from the point of view of this course) is that it provides methods by which we can interact with the kernel.

The kernel is so named because it is the central program in the operating system. One of the functions of the kernel is to perform ``hardware abstraction''. The kernel gives us a few basic object types that we can use to deal with the plethora of hardware currently available without really having to worry about the specialities of each device. In addition, the kernel provides us with ``software abstraction'' by replacing bit manipulation with the manipulation of higher level objects.

The basic objects that were originally used by the Unix system to deal with all our requirements are files and processes. With the emergence of networks came the addition of the notion of sockets, which can be (but need not be) thought of as a special kind of file. In particular, documents (respectively programs) usually start their life as a source file (e. g. TEX or C) which is processed into binary files (e. g. DVI or a.out) which can be used for printing or viewing (respectively for creating new processes). Data is also stored in files, which can be ``flat'' source files or binary files (e. g. DB files) which are hash-indexed for quick searching; various processing mechanisms are provided to visualise this data--i. e. to convert it into other files. Games are processes that interact with the user through the manipulation of devices which are also represented via the file abstraction. This allows the games to be written in a device-independent fashion (e. g. music is stored as WAV or Ogg); similar ideas also allow us to provide non-textual means to visualise data (e. g. PBM or JPEG). The ``slow'' communication channels (such as e-mail) are also handled by files which store (spool) incoming and outgoing communication. The interactive usage of computers (which is nowadays the primary method by which computers are used) is handled by devices called ``terminals'' which are also a type of device file.

If we accept this point of view it is clear that we should now more specifically ask ourselves what we would like to do with files and processes--to be modern we will also include sockets in this question. Thus the three principal sub-questions that we can ask are: What are the services that we would like our Operating System to provide with regard to Files, Processes and Sockets?

Before going ahead we should mention some attempts that have been made to split up the job of the kernel into smaller pieces. The current structure of the Linux kernel is monolithic. One of the proposed designs is to separate the ``hardware abstraction'' layer from those parts of the kernel that support the files (file-system), processes (execution or process server) and the network layer. Such an approach is called a micro-kernel. Interestingly, Minix was such a solution; perhaps it was before its time or perhaps it was too much of a ``toy''. A further extension of this idea takes the point of view that with the expanding hardware arena, even the drivers for devices need to be moved out of the kernel--one may wonder what the kernel does in that case! Such a kernel is called an exo-kernel. Some of the features of this approach can already be found in Linux with the introduction of ``modular'' device drivers, file-systems and so on. However, the separation in an exo-kernel takes place at the level of the processor task space. In particular, a fault in a particular device driver or file-system should not make it possible to crash the kernel; this is not as yet a feature of the Linux kernel, thus it is not quite an exo-kernel.

What can we do with files?

As in the previous section let us ask ourselves what operations we perform on files (and perhaps add in some operations we wish we could do with files).
  1. (The Triumvirate!) Create, destroy and possibly maintain versions of files.
  2. Read from, write to, append to and make insertions into files.
  3. Edit the attributes of files, like the name of the file, who has access to it and what level of access is permitted.
  4. Access files from a variety of different storage devices--fixed disk, floppies, etc.
The names that we can give to files need to have a structure so that we can organise the files according to their use and/or the context. For example, we typically refer to executable files as ``binary'' files so we would like to place them all in one lot. On the other hand those who create the executables may wish to group the files along with the source code that was used to create them. There are numerous possible choices of the structure of this space of names or namespace for files. The simplest and most elegant is used in the Unix system--a rooted tree. We usually visualise this tree upside down, so it should probably be called the root of a tree but we are not biologists! Since this namespace should consist of human-readable names, the rule for creating names is simple--names can consist of any letter, digit or punctuation symbol except that `/' is treated specially. The latter symbol is reserved to denote the branching of the tree. Thus / denotes the root of the tree and all names must start with this. A name like /usr/local/lib/stow lies below a node at level three (where the level of a node is the distance of the node from the root of the tree). The namespace then consists of the leaves of this tree.

There are other possible choices for the namespace. For example, the convention on the internet is to use a forest as a namespace--these names are called URL's. The root of each tree in the forest has a name that describes the way in which one accesses this tree. For example, if we use the http protocol to access the port 8080 on the host the.end.of.the.internet then the URL is http://the.end.of.the.internet:8080/. A similar convention is found in DOS, where the root of a tree gets a single letter name followed by a `:'.

Now the namespace must itself be structured in terms of files since ``everything in Unix is a file''. One way to do this (and this is the traditional solution in Unix) is by means of ``special'' files called directories. Note that a name that ends in `/' doesn't really belong to our namespace since this denotes a node and not a leaf. At the same time, we can ask what the relation is between /usr and /usr/lib. We thus extend our namespace to allow names to end in `/' except that such a name should be the name of a directory, which is a specially structured file that contains (at least) a list of all leaves that lie directly below the node. This allows us to dynamically manage the tree. Every time we look for a file with the name (say) /usr/local/lib/stow/apache, we recurse as follows. We examine the directory /usr/local/lib/stow/ for the existence of the leaf apache in it. Note that /usr/local/lib/stow/ is a leaf in the directory /usr/local/lib/, so that is where we can find it, and so on. It is further convenient (but not essential!) to mandate that in order for /usr/lib to make sense, the name /usr should point to the directory /usr/. (Let me point out that I have used ``point'' in the above sentence knowingly.)
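
The recursion just described can be mimicked (inefficiently) from user space; here is a minimal sketch in C that resolves a path one component at a time by searching each directory for the next leaf, using the standard opendir/readdir interface. The path used is just the example from the text and need not exist on your machine.

    /* Sketch: resolve a path one component at a time, the way the
     * name lookup was described above.  Purely illustrative. */
    #include <dirent.h>
    #include <stdio.h>
    #include <string.h>

    int lookup(const char *path)
    {
        char prefix[1024] = "/";          /* start at the root of the tree */
        char copy[1024];
        strncpy(copy, path, sizeof copy - 1);
        copy[sizeof copy - 1] = '\0';

        for (char *name = strtok(copy, "/"); name; name = strtok(NULL, "/")) {
            DIR *d = opendir(prefix);     /* the directory file for this node */
            if (!d)
                return -1;
            struct dirent *e;
            int found = 0;
            while ((e = readdir(d)) != NULL)
                if (strcmp(e->d_name, name) == 0) { found = 1; break; }
            closedir(d);
            if (!found) {
                printf("no leaf %s below %s\n", name, prefix);
                return -1;
            }
            if (prefix[strlen(prefix) - 1] != '/')
                strcat(prefix, "/");
            strcat(prefix, name);         /* descend one level */
        }
        printf("resolved: %s\n", prefix);
        return 0;
    }

    int main(void) { return lookup("/usr/local/lib/stow/apache"); }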

What we have described so far is the space of names of files but not files themselves. We do not dictate what a file may or may not contain except for the special files called directories. In this sense all files are binary or data files. It is thus useful to assign attributes to files to indicate what we may or may not do with them--in what way we can ``access'' them. In traditional Unix these attributes are restricted to the read, write and execute attributes which are further distributed among the users under the headings user, group and world. Nowadays, people like to have extended file attributes including the possibility of providing detailed ``access control lists''.
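
As a small illustration, the traditional attributes of a file can be read with the stat system call; the owner, group and the nine permission bits (user, group, world) live in the st_uid, st_gid and st_mode fields. A minimal sketch (the file name is just an example):

    #include <stdio.h>
    #include <sys/stat.h>

    int main(void)
    {
        struct stat sb;
        if (stat("/etc/passwd", &sb) == -1) {   /* any existing file will do */
            perror("stat");
            return 1;
        }
        /* Print the owner, group and the rwxrwxrwx bits for user/group/world. */
        printf("uid=%d gid=%d mode=%o\n",
               (int) sb.st_uid, (int) sb.st_gid,
               (unsigned) (sb.st_mode & 0777));
        return 0;
    }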

A directory is essentially an association list (or a list of pairs) which associates a pointer to actual data with each leaf that lies below the directory node. Thus, it is clear that different names could point to the same data; two such names are said to be hard linked. Since the relationship between these names is symmetric it does not always satisfy our requirements. For example one user may create a file and want to only allow another to read it; at the same time the second user may have a different way of organising files--in other words the same file could have a different name.

One solution for this is to allow another kind of special file which only contains the name of another file. When an access is made to such a special file the operating system would then ``automatically'' use the name inside it to access the ``actual'' data. Such a special file is called a symbolic link.
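
A sketch of creating and inspecting such a special file with the symlink and readlink system calls (the names ``alias'' and ``data'' are made up):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char target[256];
        /* Create a special file "alias" whose contents are the name "data". */
        if (symlink("data", "alias") == -1)
            perror("symlink");
        /* readlink returns the stored name without following it. */
        ssize_t n = readlink("alias", target, sizeof target - 1);
        if (n == -1) {
            perror("readlink");
            return 1;
        }
        target[n] = '\0';
        printf("alias points to %s\n", target);
        return 0;
    }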

Some other kinds of special files were also present in the traditional Unix file system. We saw earlier that ``everything'' in Unix is a file. So, in particular, access to devices should also be through a file interface. For example, as I am typing this the operating system is reading my keystrokes from a file called /dev/pts/3. Such files are called device special files and consist of names that point to devices. Devices are classified according to a ``major'' and a ``minor'' number and this is the information stored in the parent directory of the device. In our modern Linux system the creation and deletion of device special files can be managed by a separate file system called devfs so we don't really have to bother with them too much at this point.

Finally, the tree of files needs to be dynamic--growing and receding according to need. We might want to add to the tree all the files residing on a floppy or somewhere on the network. This is achieved by ``mounting'' the new file system at some directory node. Some have even suggested that the use of nodes as directories is too limited. In their view, it should be possible to attach a process to any node; this process will provide the files required by the part of the name space that lies below the node. Thus ``special'' files are a particular case of a more general mechanism called ``translators'' in the GNU/Hurd operating system that we might discuss towards the end of this course.

Handling processes

In the discussion on what we want of an Operating System we mentioned that we need to handle multiple processes. What are the operations that we need in order to handle processes? One of the differences between processes and files is that the task of handling processes is carried out by other processes, whereas operations on files are carried out by processes. With this in mind let us detail the operations on processes:
  1. Create new processes.
  2. Allocate/deallocate memory.
  3. Access files (including special files).
  4. Set alarms and find out the running time, and other resources used.
  5. Keep track of context such as the method by which the process was started, the user who started the process and so on.
  6. Destroy or terminate processes.
  7. Inter-process communication.
The operations that the operating system provides to running processes are called system calls. In one way these could be thought of as interprocess communication between some process and the kernel. Indeed that is the way in which many system calls are handled in the micro-kernel context. However, in the Linux context, the kernel is not a process and so system calls are managed differently.

Now, since only processes can create new processes there must be a first process (the parent of all processes). This process ``init'' is started by the operating system and we shall see more details about that when we examine the way the computer boots up. Since each process can have child processes we see that the structure of processes is again tree-like. Perhaps for this reason Linux provides a file-system like view of the space of all processes under the proc file-system (usually mounted at /proc). (However, in this view the tree is flattened and all processes are referred to by their process identification number--PID.)
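
A glimpse of this flattened view can be had by listing the numeric entries below /proc, each of which is a directory describing one running process. A minimal sketch, assuming the proc file-system is mounted at /proc as is usual on GNU/Linux:

    #include <ctype.h>
    #include <dirent.h>
    #include <stdio.h>

    int main(void)
    {
        DIR *d = opendir("/proc");
        if (!d) { perror("opendir"); return 1; }
        struct dirent *e;
        while ((e = readdir(d)) != NULL)
            if (isdigit((unsigned char) e->d_name[0]))
                printf("process with PID %s\n", e->d_name);   /* one entry per PID */
        closedir(d);
        return 0;
    }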

But how does a process start anyway?

The basic operation to create processes that is provided by the Unix system is the fork. A process that calls this causes a replica of itself to be created as a child process. The value returned by this to the calling process tells that process the PID of the child process. The child process meanwhile starts running at the same point but is given 0 as the value returned by fork. In this way the child and parent can recognise their own identities.
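
A minimal sketch of this idiom in C:

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();               /* both processes continue from here */
        if (pid == -1) {
            perror("fork");
            return 1;
        }
        if (pid == 0) {
            /* fork returned 0: this is the child. */
            printf("child: my PID is %d\n", (int) getpid());
        } else {
            /* fork returned the child's PID: this is the parent. */
            printf("parent: created child %d\n", (int) pid);
            wait(NULL);                   /* collect the child when it exits */
        }
        return 0;
    }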

A process often needs more memory than is currently available to it or no longer needs memory that was previously allotted to it. The malloc and free calls are provided to the process to manage its memory needs. Note that the memory so allocated may not be contiguous to the memory previously allotted and so the kernel maintains certain memory maps for the running process that indicate which chunks of actual memory are available to it. Processes that allocate memory to themselves must be careful to keep track of it or they will be the cause of memory leaks.
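
For example (the size requested below is arbitrary):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Ask for more memory; the chunk need not be contiguous with
         * anything allocated earlier. */
        double *table = malloc(1000 * sizeof *table);
        if (table == NULL) {
            perror("malloc");
            return 1;
        }
        table[0] = 3.14;
        printf("first entry: %g\n", table[0]);
        free(table);                      /* return it, or we leak memory */
        return 0;
    }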

More often than not what we really want to do is to run a different program in a new process. This operation is called exec and is essentially carried out by allocating memory for the new process, writing a program in that memory and then jumping to the newly created process image after deallocating the memory assigned to one's own process image. We thus see that exec does not create a new PID.
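
The usual idiom combines the two operations: fork a child and let the child exec the new program, so that the parent keeps its own image and PID while the child's image is replaced. A minimal sketch, running /bin/ls purely as an example:

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();
        if (pid == -1) { perror("fork"); return 1; }
        if (pid == 0) {
            /* In the child: replace this process image with /bin/ls.
             * execl only returns if it fails. */
            execl("/bin/ls", "ls", "-l", (char *) NULL);
            perror("execl");
            _exit(127);
        }
        wait(NULL);                       /* parent waits for ls to finish */
        return 0;
    }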

Every process has a number of attributes. In particular, the process has access to the command line which was invoked in order to execute it and a certain number of ``environment'' variables that were set by the process that called it. In addition, there are the maps to memory locations that we saw above and the file descriptors that we shall see below. Further, in order to handle security matters a process also has two pairs of user/group identification numbers. One of them is the ``real'' UID/GID pair while the other is the ``effective'' UID/GID pair. We will see the utility of this when we examine how security and access control are implemented.

Another important set of attributes is a collection of flags called signals--actually the signal handling instructions. There are a number of different signals that can be sent to a running process by another running process (provided it satisfies certain access control restrictions--see below). Some of these signals are sent by the operating system as well--these are ``reserved''. For all the other signals, the running process can decide how it wants to deal with them: it can choose to ignore certain signals or it can set up certain sub-routines called signal handlers to deal with them. Note that these handlers are entered from a context more unpredictable than that available to the final end-point of a ``goto'' and so these sub-routines need care in writing! Signals that are not ignored or handled will cause a running process to terminate.
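
A sketch of installing a signal handler with the sigaction call; SIGINT (the interrupt from the keyboard) is chosen only as an example, and the handler confines itself to setting a flag since it runs in such an unpredictable context:

    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static volatile sig_atomic_t got_signal = 0;

    /* Handlers run in an unpredictable context, so do as little as possible:
     * just set a flag and let the main loop notice it. */
    static void handler(int sig)
    {
        (void) sig;
        got_signal = 1;
    }

    int main(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGINT, &sa, NULL);     /* catch Ctrl-C instead of dying */

        while (!got_signal)
            pause();                      /* sleep until some signal arrives */
        printf("caught SIGINT, exiting cleanly\n");
        return 0;
    }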

Signals give a rather minimal way for running processes to communicate with each other since each signal is essentially just a single up/down flag. To provide more detailed access, the operating system typically allows each process to create a queue where others can deposit messages for it. In order for processes to exchange large chunks of data one also allows the primitive operation of allocating shared memory. Such memory can be shared by two or more processes. In order to synchronise their access to such memory, processes can also create new sets of flags called semaphores in order to signal each other. The exact semantics of these flags are left to the processes to decide upon.
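
Here is a minimal sketch of shared memory with a semaphore used for signalling, assuming a GNU/Linux-like system where anonymous shared mappings and process-shared POSIX semaphores are available (older systems may need -lrt or -lpthread when linking):

    #include <semaphore.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* A page of memory shared between parent and child, with a semaphore
     * (kept inside the shared region) used to signal "data ready". */
    struct shared { sem_t ready; int value; };

    int main(void)
    {
        struct shared *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (s == MAP_FAILED) { perror("mmap"); return 1; }
        sem_init(&s->ready, 1, 0);        /* 1 = shared between processes */

        if (fork() == 0) {                /* child: produce a value */
            s->value = 42;
            sem_post(&s->ready);          /* tell the parent it may look */
            _exit(0);
        }
        sem_wait(&s->ready);              /* parent: wait for the child */
        printf("child left the value %d\n", s->value);
        wait(NULL);
        return 0;
    }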

Interaction between processes and files

Earlier we said that processes are ways of dealing with files; roughly that processes are ``methods'' for file ``objects''. Reality is a bit more complex as usual. We now discuss three aspects of the interaction between processes and files--only the first discusses how processes manipulate files. Then we see how program files are converted into running processes. Finally, security and protection mechanisms require a close interaction between the access control attributes of files and the capabilities of processes.

How processes handle files

Processes do not actually ``realise'' that they are dealing with files. Since processes run in the memory space of the processor (CPU) that actually interprets and executes commands, the processes only see chunks of memory. The operating system provides access to files-as-chunks-of-memory via various file handling mechanisms. These are available in three guises: as file-descriptors, as streams and as buffers. The last is provided by memory-mapped I/O, which is a new feature of the Linux kernel and is the easiest to conceptualise--a file is mapped as a chunk of memory; unfortunately not all files can as yet be dealt with this way. The file-descriptor approach is the most classical way of handling files and requires the process to know quite a bit about the actual type of file (ordinary or special, type of device in the latter case and so on) that it is dealing with in order to correctly read and write data. The streams approach is actually not provided by the operating system at all, but is provided by the C library that most programs use as a more conceptual interface to system calls.
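
A sketch of the third guise, in which a file is mapped into memory with the mmap system call (the file name is arbitrary, and not every kind of file can be mapped this way):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/etc/passwd", O_RDONLY);       /* any regular file */
        if (fd == -1) { perror("open"); return 1; }
        struct stat sb;
        fstat(fd, &sb);

        /* The file now looks like an ordinary chunk of read-only memory. */
        char *p = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        fwrite(p, 1, sb.st_size, stdout);             /* print the contents */
        munmap(p, sb.st_size);
        close(fd);
        return 0;
    }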

When a process starts it is provided with three ready-made file-descriptors. We have already seen examples of system calls when we saw that we need operations for processes to deal with files. The open system call allows a process to open a new file descriptor and the close system call allows a process to close one of its file descriptors. The read and write system calls are the natural calls that one can make use of on an open file descriptor. In addition, there are ``control'' functions that one can use for device files. One can reposition the place at which operations are taking place (in order to append or insert) by means of the lseek system call. All these operate at the level of file descriptors.
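
A small sketch using these calls directly; descriptor 1 is one of the three ready-made descriptors (standard output), and the file name is made up:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("notes.txt", O_RDWR | O_CREAT, 0644);
        if (fd == -1) { perror("open"); return 1; }

        write(fd, "hello, file\n", 12);   /* write at the current position */
        lseek(fd, 0, SEEK_SET);           /* reposition to the beginning */

        char buf[64];
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0)
            write(1, buf, n);             /* descriptor 1: standard output */
        close(fd);
        return 0;
    }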

Note that file descriptors are inherited by a child process that is forked. Also note that a file descriptor comes along with a ``position'' which is the location in the file at which read/write/seek operations will take place. However, file control operations such as open/close act independently. Thus, a parent can transfer control to the child by opening the file before the child is forked and closing it after; the child inherits the open file descriptor. This has important security implications as we shall see.

How programs become processes

Typically, a running process contains instructions which are in the form of ``machine code''. Programs are typically written in a more humanly accessible form like a ``high-level'' language. Languages like Perl, Python, Lisp and the shell programming language are interpreted by running processes. Thus, files that contain scripts for these only lead to ``data'' for a certain running process. On the other hand, languages like C, Fortran and Pascal are ``compiled''. This process is divided into steps as follows. At first these are converted into object code, which is a kind of machine code except that externally defined objects are represented symbolically. The next step involves linking this code--which means that all symbols need to be resolved. This produces (on the GNU/Linux system) an ELF format executable. In order for this to work, among all the object files being linked there must be exactly one symbol called main; this is the entry point for the program which is the output of the linking process.

When one asks the operating system to execute such a program, the operating system must load the program and set it up as a process. This involves memory allocation, relocation and dynamic linking. Now, one way would be for the kernel to allocate a contiguous chunk of memory for each program as its ``working space''. The problem then arises that if one runs a large number of small programs and some of them terminate but others keep running then memory becomes ``fragmented'' preventing larger programs from getting loaded. If a large program is made up of smaller fragments that can be relocated then this can be solved. However, the operating system has then to do a lot of work in order to relocate the fragments; moreover, all this relocation information must be stored in the program header, making that quite complex. A much better solution called memory paging is available. For this to work, the underlying hardware must support ``page faults''.

In this solution, each program is allocated virtual memory in a contiguous manner in integer multiples of a unit called a memory page. The kernel maintains a table for each process that maps a virtual memory page to a ``backing store'' of real address space--but doesn't map all of them to the processor memory if not enough is available. Whenever the program tries to access memory that is not backed by processor memory, a ``page fault'' is triggered by the memory-management unit of the underlying hardware. This interrupts the kernel which then suspends the process until it can make a memory map that provides backing store for this memory location. When the process has not requested allocation of that portion of memory at all, a segmentation violation signal is sent by the operating system to the process.

This allows a useful extension called ``shared libraries''. Many running processes may actually be using the same routines from the standard libraries to print, read from files and so on. The code for these can thus be kept in the same backing store by the operating system for all the programs. The data segments that are accessed by these code fragments will (in general) be different as these will belong to different processes. This leads to the possibility of dynamic linking of programs--a certain number of symbols are resolved into locations within the shared library. The operating system links these code fragments with the program when it creates the process and its virtual address space.
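
A taste of dynamic linking can be had from user space through the dlopen interface, which asks the dynamic linker to map a shared library and resolve a symbol at run time. A minimal sketch (the library libm and the symbol cos are merely convenient examples; link with -ldl):

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /* Ask the dynamic linker to map the shared maths library. */
        void *lib = dlopen("libm.so.6", RTLD_LAZY);
        if (!lib) { fprintf(stderr, "%s\n", dlerror()); return 1; }

        /* Resolve the symbol "cos" to its address inside the library. */
        double (*cosine)(double) = (double (*)(double)) dlsym(lib, "cos");
        if (!cosine) { fprintf(stderr, "%s\n", dlerror()); return 1; }

        printf("cos(0) = %g\n", cosine(0.0));
        dlclose(lib);
        return 0;
    }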

Security of files and processes

Every time a process requests additional system resources, the operating system must be able to decide whether to allocate these resources or not. One of the simplest ways of achieving this is to assign a ``user-identification'' number or UID to each running process. The availability of each resource is then individually assigned for each user. For example, certain files could be ``owned'' by the users, others could be readable, writable, appendable, executable and so on depending on permissions. This turns out to be too limiting in a number of ways. There must be ways for different users to share resources. Thus groups of users are created and permissions can be granted or denied on the basis of a ``group-identification'' number or GID. Thus UID/GID pairs occur in two places--each process is assigned one and each file is assigned one. Access to a file by a process is determined by comparing these.

Even this can be limiting at times. For example, when a user logs in to the system the system allocates resources such as a new terminal access point. It is clear that the users should not in general be allowed to create new terminal access points for other users; this function should be reserved for the system alone. Thus it should be possible for a process to start as one user and then be converted to another user. One mechanism for this is to have the notion of ``real'' and ``effective'' UID/GID pairs for each process. Any process could be temporarily run with an ``effective'' ID different from its real one. Again it is clear that this facility should only be granted on a limited basis. The decision about which programs are allowed to change their UID/GID's when they are loaded is made on the basis of additional permission bits that are stored by the file-system.
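
The two pairs can be inspected with a handful of system calls; for an ordinary program they coincide, while for a ``set-uid'' program the effective IDs are taken from the owner of the program file. A minimal sketch:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* For an ordinary program these pairs are equal; for a set-uid
         * program the effective UID comes from the owner of the file. */
        printf("real UID %d, effective UID %d\n",
               (int) getuid(), (int) geteuid());
        printf("real GID %d, effective GID %d\n",
               (int) getgid(), (int) getegid());
        return 0;
    }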

Another useful mechanism is to impose per-process restrictions--like the number of open files, the amount of memory that can be allocated and so on. The process in turn can be allowed to ``lock'' access to certain files for a limited period of time--this means that during that period only this process can access that file. Similarly, a process could be permitted to lock itself into memory for a certain critical section so that it is not swapped out, which would delay its execution. Moreover, it may be a good idea to do this while the process is manipulating passwords so that the passwords are never written to the disk (in the swap area).
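
For example, an advisory lock on (a region of) a file can be requested with fcntl, and a piece of memory can be pinned against swapping with mlock. A minimal sketch (the file name is made up, and mlock may require a suitable resource limit or privilege):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("spool.dat", O_RDWR | O_CREAT, 0600);
        if (fd == -1) { perror("open"); return 1; }

        struct flock fl;
        memset(&fl, 0, sizeof fl);
        fl.l_type = F_WRLCK;              /* exclusive (write) lock */
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;                     /* 0 means "to the end of the file" */
        if (fcntl(fd, F_SETLK, &fl) == -1)
            perror("fcntl lock");

        char secret[64] = "a password";
        if (mlock(secret, sizeof secret) == -1)   /* keep it out of swap */
            perror("mlock");
        /* ... work with the locked file and the pinned memory ... */
        munlock(secret, sizeof secret);
        close(fd);
        return 0;
    }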

As Linux evolves it is becoming clear that even the mechanisms described above have a number of limitations leading to incorrect use and security compromises. Thus the notion of ``capabilities'' as a way of limiting process resource access has come into play along with access control lists as a way for files to dictate ``who'', ``how'' and ``when'' they can be accessed. The important change is (in the words of a kernel hacker) to make access to resources a question to which the answer is dynamically decided rather than a fact which is statically stored on the system.

Communication

Quite often there are different ways in which communication interfaces are required on a working computer. The different hardware components of the computer may want to communicate with each other or with the central processing unit. This is managed with certain hardware features called interrupts. Next, different processes may want to communicate amongst themselves or with the operating system. This is handled by the inter-process communication features of the operating system that we have already seen. Finally, different computers may want to communicate with each other or with humans. An important aspect of all these forms of communication is asynchronicity or rather un-predictability. The different interacting entities can never say when a message will arrive or, once one is sent, when it will reach its destination (rather like the helplessness one feels after sending an important letter by post!). The processes that interact in this fashion must take this into account.

One wasteful method is called polling. The process waiting for a message (or an acknowledgement, which is also a message, albeit a shorter one) keeps coming back to a specific location to check if some data has arrived. Another method is to block or sleep waiting for data. In other words, a process tells the operating system to ``wake it up'' when data is available. Finally, there is the method of signals. The process continues to run after telling the system to send it a signal whenever some data is ready. Once the signal has been sent it is up to the process to handle the data in a fashion that interferes constructively with its on-going task. Clearly, the latter method is the best if one can manage it.
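
A sketch of the second method, blocking until data is ready, using the select system call on standard input (descriptor 0):

    #include <stdio.h>
    #include <sys/select.h>
    #include <unistd.h>

    int main(void)
    {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(0, &readable);             /* watch standard input */

        /* Sleep until the kernel says there is something to read;
         * a timeout of NULL means "wait for ever". */
        if (select(1, &readable, NULL, NULL, NULL) == -1) {
            perror("select");
            return 1;
        }
        char buf[128];
        ssize_t n = read(0, buf, sizeof buf);
        if (n > 0)
            write(1, buf, n);             /* echo what arrived */
        return 0;
    }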

A pipe is a one-way communication channel on a Unix system that one process writes to and another one reads from. A read from an empty pipe blocks (or sleeps) until some process writes into it, and a write to a full pipe blocks until some process reads from it; the operating system uses a signal (the ``broken pipe'' signal) to inform a writer when no reader remains.
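
A minimal sketch of a pipe shared between a parent and the child it forks:

    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fd[2];                        /* fd[0] = read end, fd[1] = write end */
        if (pipe(fd) == -1) { perror("pipe"); return 1; }

        if (fork() == 0) {                /* child: the writer */
            close(fd[0]);
            const char *msg = "hello through the pipe\n";
            write(fd[1], msg, strlen(msg));
            close(fd[1]);
            _exit(0);
        }
        close(fd[1]);                     /* parent: the reader */
        char buf[128];
        ssize_t n;
        while ((n = read(fd[0], buf, sizeof buf)) > 0)
            write(1, buf, n);
        close(fd[0]);
        wait(NULL);
        return 0;
    }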

Another complication of communication over ``long distances'', and in particular between distinct machines, is that bi-directional communication becomes essential--even if all that one side is doing is sending acknowledgements. This is because there is a non-trivial possibility of some part of the communication being lost due to a temporary failure in the physical layer of the communication.

The common metaphor of a telephone is the most natural one to use in this context. Another way to think about it is as bidirectional pipes. Each end of the connection is called a socket. There are various namespaces for these sockets depending on the type of communications being used. One of the most common is the INET or Internet Protocol (IP) namespace. Each socket is then associated with an address of the form a.b.c.d:e where a, b, c and d are integers between 0 and 255 (some addresses are reserved); e is an integer between 0 and 65535 (again some ports are usually allocated for certain types of communications). An IP communication channel thus takes the form of a pair of such end-points, a.b.c.d:e on one side and w.x.y.z:t on the other.

How does one use sockets to communicate? First of all we create a socket, then we ``bind'' to a certain name in a name space. We can then use this socket to ``listen'' for incoming connections. On the other end, the machine creates a socket and again gets a name for it. Next, it tries to ``connect'' to a listening socket. Once it succeeds the channel is ``established''. Note that the creation of a channel is asymmetric since one side must initiate a connection to the other side which must be waiting for such initiations. However, once a channel is opened it is symmetric since such a channel is bidirectional. This method of using the IP namespace is called the TCP/IP method.
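
A sketch of the listening side of this sequence in the IP namespace; the connecting side performs only socket and connect. The port 8080 is just an example:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_STREAM, 0);      /* a TCP/IP socket */
        if (s == -1) { perror("socket"); return 1; }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);     /* any local address */
        addr.sin_port = htons(8080);                  /* bind to port 8080 */

        if (bind(s, (struct sockaddr *) &addr, sizeof addr) == -1) {
            perror("bind");
            return 1;
        }
        listen(s, 5);                                 /* wait for connections */

        int c = accept(s, NULL, NULL);                /* one end of the channel */
        if (c != -1) {
            const char *msg = "hello from the listening side\n";
            write(c, msg, strlen(msg));               /* the channel is bidirectional */
            close(c);
        }
        close(s);
        return 0;
    }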

Another method is a ``packet''-oriented approach. After creating a socket, one just sends packets out onto the wire in search of a specified receiving address. If someone else has created a socket to receive these packets then the packets are received--otherwise they are lost ``in the ether''. This is a rough description of the UDP protocol.
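
The packet-oriented approach is even simpler; here is a sketch that sends a single UDP packet. The address 127.0.0.1 and the port 9999 are arbitrary, and nothing guarantees that anyone is listening:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);       /* a UDP/IP socket */
        if (s == -1) { perror("socket"); return 1; }

        struct sockaddr_in to;
        memset(&to, 0, sizeof to);
        to.sin_family = AF_INET;
        to.sin_port = htons(9999);
        inet_pton(AF_INET, "127.0.0.1", &to.sin_addr);

        const char *msg = "a packet sent into the ether";
        /* If no socket is bound to 127.0.0.1:9999 the packet is simply lost. */
        sendto(s, msg, strlen(msg), 0, (struct sockaddr *) &to, sizeof to);
        close(s);
        return 0;
    }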

Our modern operating system must support both kinds of protocols. There are also numerous variants and a number of different namespaces associated with those. We should try to support as many as possible. Note that the actual sequence and kind of data that can be sent in the data packets or over the bidirectional channel is a matter to be decided between the communicating processes. In order that certain ``Well-Known-Services'' are properly recognised and looked for, the Internet Engineering Task Force (IETF) has created a number of ``standard'' port and address assignments so that machines that do not share the same application programs or operating system or even hardware can safely and usefully exchange data. Examples of such protocols are the HyperText Transfer Protocol http, File Transfer Protocol ftp, Simple Mail Transfer Protocol smtp and so on. Since these protocols lie in the realm of application programs we will not discuss them in this course on operating systems. Instead we will focus on the provision of the basic framework of sockets, their creation and use.



Footnotes

... Unix1
Here and elsewhere we will not elaborate on this. All trademarks are owned by their respective owners!


Kapil Hari Paranjape 2003-09-12