A working knowledge of the UNIX runtime environment
is useful to UNIX programmers and non-programmers alike.
Knowing how a program is started, how it terminates,
and how its memory is laid out at runtime will (i) allow programmers
to program more effectively and (ii) give non-programmers a better understanding
of how the UNIX runtime environment compares with that of other operating systems.
In this article we first present these topics, then briefly discuss
the different object file formats, and conclude with
a discussion of program loading.
Note that since the UNIX environment
is closely tied to the C programming language, the material presented in this
article is specific to C and C++ programs. Of course, this does not
mean that the same ideas cannot be applied to other languages such
as Lisp, FORTRAN, and Ada.
Runtime Process
Program start up
In order for the UNIX kernel to
execute a program, there must be an entry point in the program to tell
the kernel where to begin execution. As far as most C programmers are
concerned, this entry point is the 'main' routine. What most C programmers
do not know, however, is that a special start up routine
gets called before 'main'. This routine is responsible for obtaining
the command-line arguments and environment variables from the kernel and
passing them to the main routine as 'argc', 'argv', and
an optional list of environment variables. (For
a C++ program, the start up routine also handles the global constructor
and destructor lists.) The GNU compiler/linker, for example, defines this
start up function as '_start' in the file 'crt1.o'. At the last
stage of the compilation process, the link editor links this file (along
with other start up object files) with the program, in addition to any libraries
needed to resolve unresolved references.
Program termination
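There are various ways in which a
program can be terminated. The most common way to exit a program is to
'return' from the main routine. Alternatively, 'exit', '_exit', or 'abort'
can be called from within any function to terminate the program. Of these,
only calling 'exit' or returning from 'main' performs the normal user-level
clean-up, i.e., running any handlers registered with 'atexit' and flushing
and closing open standard I/O streams. '_exit' terminates the process
immediately and 'abort' terminates it abnormally; neither runs the 'atexit'
handlers. In every case the kernel closes any file descriptors that are
still open.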
Runtime Data Structure
UNIX object and executable files
come in various flavors. A handful of different formats are currently
in use on various operating systems. Some of the common formats are
COFF (Common Object File Format), which is used on SunOS 4.0.x and, in an
extended version (XCOFF), on AIX; a.out, which is used on various flavors
of UNIX, including SunOS 4.1.x and Linux versions prior to 1.2.13; and ELF
(Executable and Linkable Format), which has become the de facto standard
and is currently used on Solaris and many other operating systems.
Although the formats differ internally,
they share a common notion known as the 'segment'.
A segment, often called a section in ELF, is an area in an object
file that holds a particular type of data (e.g., symbol table entries,
global variables, etc.). An a.out file, for example, contains three major
segments: BSS (Block Started by Symbol), text, and data. The BSS segment does
not take up space in an object file, where only its size is recorded; it holds
global and static variables that have no explicit initializer. The text
segment holds the actual code, translated into machine instructions. The data
segment is where all the initialized global and static variables reside.
Two segments are
fundamental to each format mentioned above: the text segment and the data
segment. The text segment, as mentioned, is where machine instructions are
located. The data segment is where global and static variables are defined.
At runtime, a program's data falls into three categories
known as static, stack, and heap data. Static data refers to global or
'static' variables whose storage is determined at compile time,
e.g.,
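#include <stdlib.h>                    /* malloc, free */
int int_array[100];                    /* static data (global) */
int main(void) {
    static float float_array[100];     /* static data */
    double double_array[100];          /* stack data */
    char *pchar;
    pchar = (char *)malloc(100);       /* 100 bytes of heap data */
    /* .... */
    free(pchar);
    return (0);
}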
where both int_array and float_array
are static data. Stack data, on the other hand, refers to variables that
exist only within the scope of a function; that is, stack data is memory
allocated at runtime for local (automatic) variables, e.g., double_array
in the above example. Heap data is memory that is dynamically allocated at
runtime (e.g., the 100 bytes that pchar points to above). This data remains
in memory until it is either freed explicitly or the program terminates.
Runtime Loading
To execute a program, the kernel
maps the segments of the executable file directly into virtual memory. The
kernel also assigns different permissions to the resulting memory
regions, depending on the segment each one holds. For a segment that
remains unchanged (e.g., the text segment) the kernel makes the block of
memory read-only: it may be read and executed, but not written. Similarly,
for a segment that is bound to change (e.g., the data segment) the kernel
assigns read and write permission to that block of memory.
For a program that is dynamically
linked, the kernel has to perform the additional work of maintaining a single
copy of each shared library in memory, to be shared by the different processes
that use it. That is, multiple copies of a program can run simultaneously
and yet all share one common copy of the library.
Conclusion
In this article, we have briefly described
the runtime environment of the UNIX operating system. We first examined
how a program is started and how it terminates.
Next, we surveyed various object file formats that are commonly used in today's
operating systems. Knowing the format, in turn, gives us an easy
way to see how an executable file is mapped into memory.
Suggested Reading
- Peter van der Linden,
``Expert C Programming: Deep C Secrets,'' SunSoft Press, 1994. An excellent
(not to mention entertaining) book on C. This book should not be used
as a reference, but rather in conjunction with, say, K&R.
- W. Richard Stevens,
``Advanced Programming in the UNIX Environment,'' Addison Wesley, 1992.
Everything you will ever want to know about programming in the UNIX environment.
A must-have book for UNIX programmers.
- Executable and Linkable
Format Specification. This is only useful if you want to know the format
of an ELF file.
- Gintaras Gircys,
``Understanding and Using COFF,'' O'Reilly & Associates, 1988. Only
useful if you need to know about COFF.
- Various man-pages.
In particular, a.out(5), coff(5), elf(3).
Nguyen Trung
Copyright © 1996 by VACETS and Nguyen Trung