Under Linux, a system call is invoked through a software interrupt
(an exception-class transfer), raised by the instruction
int $0x80. Vector 0x80 is used to transfer control to the kernel.
This interrupt vector is initialized during system startup, along with
other important vectors like the system clock vector.
As of version 0.99.2 of Linux, there are 116 system calls.
Documentation for these can be found in section 2 of the man pages.
When a user invokes a system call, execution flow is as follows:
Each call is vectored through a stub in libc. Each call within the libc
library is generally a syscallX() macro, where X is the
number of parameters used by the actual routine. Some system calls are
more complex than others because of variable-length argument lists,
but even these complex system calls must use the same entry point:
they just have more parameter setup overhead. Examples of a complex
system call include open() and ioctl().
Each syscall macro expands to an assembly routine which sets up the
calling stack frame and calls _system_call() through an
interrupt, via the instruction int $0x80.
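The general shape of such a stub can be sketched in portable C. This is only an illustration of the pattern, not the libc source: DEMO_SYSCALL1, demo_trap() and demo_setuid() are hypothetical names, the call number 23 is chosen for illustration, and an ordinary function call stands in for the int $0x80 instruction so that the sketch compiles anywhere.

```c
/* Hypothetical stand-in for the trap into the kernel. It only records
 * what the stub handed over and reports success. */
long demo_last_nr;
long demo_last_arg;

long demo_trap(long nr, long arg)
{
    demo_last_nr = nr;
    demo_last_arg = arg;
    return 0;
}

/* Hypothetical one-argument stub generator: defines a function `name`
 * that passes the call number and its single argument to the trap. */
#define DEMO_SYSCALL1(type, name, nr, atype, a) \
type name(atype a)                              \
{                                               \
    return (type)demo_trap((nr), (long)(a));    \
}

/* Generates: int demo_setuid(int uid) { ... } */
DEMO_SYSCALL1(int, demo_setuid, 23, int, uid)
```

Calling demo_setuid(1000) in this sketch hands call number 23 and the argument 1000 to demo_trap(). A real stub would place these values in registers before executing the trap instruction.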
For example, the setuid system call is coded as
Which will expand to:
The macro definition for the syscallX() macros can be found in
the user-space system call library code can be found in
At this point no system code for the call has been executed. Not
until the int $0x80 is executed does the call transfer to the kernel entry
point _system_call(). This entry point is the same for all system
calls. It is responsible for saving all registers, checking that
a valid system call was invoked, and then ultimately transferring
control to the actual system call code via the offsets in the
_sys_call_table. It is also responsible for calling
_ret_from_sys_call() when the system call has been completed, but
before returning to user space.
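The validate-then-dispatch step described above can be sketched roughly as follows. This is a user-space illustration under stated assumptions, not kernel code: demo_system_call(), demo_call_table and the two handlers are hypothetical, every handler takes a single long argument, and the sketch assumes that an out-of-range call number is rejected with -ENOSYS.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical handler type: each demo call takes one long argument. */
typedef long (*demo_handler_t)(long arg);

/* Two made-up handlers, one that succeeds and one that always fails. */
static long demo_sys_double(long arg) { return arg * 2; }
static long demo_sys_fail(long arg)   { (void)arg; return -EPERM; }

/* Stand-in for _sys_call_table: indexed by system call number. */
static const demo_handler_t demo_call_table[] = {
    demo_sys_double,   /* call number 0 */
    demo_sys_fail,     /* call number 1 */
};

#define DEMO_NR_CALLS (sizeof demo_call_table / sizeof demo_call_table[0])

/* Stand-in for the common entry point: check the call number first,
 * then jump through the table to the actual handler. */
long demo_system_call(unsigned long nr, long arg)
{
    if (nr >= DEMO_NR_CALLS)
        return -ENOSYS;            /* assumed result for an invalid number */
    return demo_call_table[nr](arg);
}
```

Because every call funnels through one function, the range check and any common bookkeeping are written once, and adding a call amounts to adding a table entry.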
Actual code for system_call entry point can be found in
Actual code for many of the system calls can be found in
/usr/src/linux/kernel/sys.c, and the rest are found elsewhere.
find is your friend.
After the system call has executed, _ret_from_sys_call()
is called. It checks to see if the scheduler should be run, and if
so, calls it.
Upon return from the system call, the syscallX()
macro code checks for a negative return value, and if there is one,
puts a positive copy of the return value in the global variable
_errno, so that it can be accessed by code like perror().
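The return-value convention can be illustrated with a small helper. This is a sketch of the convention only, not the macro code itself: demo_finish_syscall() is a hypothetical name, and it assumes the stub returns -1 to its caller after storing the error code, as C library wrappers conventionally do.

```c
#include <errno.h>

/* Hypothetical post-processing of a raw kernel return value:
 * a negative value encodes an error code, anything else is the result. */
long demo_finish_syscall(long raw)
{
    if (raw < 0) {
        errno = (int)-raw;   /* store the positive copy of the error code */
        return -1;           /* assumed failure value seen by the caller */
    }
    return raw;              /* success: pass the result through unchanged */
}
```

For instance, demo_finish_syscall(-EACCES) returns -1 and leaves errno set to EACCES, which is the value perror() would then report.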