C programming in Mkos version p1
I started writing Mkos in assembly only, but decided early in the project I’d like to both be able to write kernel code in C, and to support C as a user space language. Let’s see how C programming is supported in Mkos version p1.
The C standard library
All standard library code is written for Mkos; it does not use a 3rd party standard library implementation. The standard library is incomplete; I’m writing each part of it as it’s needed for the OS.
Overview
The Mkos build system uses the 16-bit OpenWatcom toolchain to build a C program. The wcc compiler compiles C code to Relocatable Object Module Format (OMF) object files. The wlink linker links the OMF files and emits a program image.
After linking, a patch tool populates a small header in the program image with data the kernel needs to load the program.
Compiling
The wcc compiler takes C programs as input and compiles them to OMF object files. This is the set of wcc options we use that affect the output object file. These options are common across the different areas of the code base (kernel code, user code, etc.).
0
: emit 8088/8086 instructions
This option makes wcc emit only 8086 instructions, which is required for our goal of being backward compatible with the 8086.
od
: disable all optimizations.
The rationale behind disabling optimizations is that they may make the generated code diverge from the source code in ways we don’t expect, making debugging more difficult. At this stage of development, we want to accept larger code size and suboptimal performance in return for a more straightforward debugging experience.
ms
: small memory model.
OpenWatcom defines several memory models, each of which refers to a specific combination of code and data models. The small memory model uses small code and small data, so that’s the option we use.
s
: remove stack overflow checks.
By default, wcc emits code at the beginning of every function to call a function called __STK to check for a stack overflow. The __STK function is presumably defined in the libraries OpenWatcom includes for its supported OSes. Since we don’t use those, it is undefined and wlink fails. Enabling this option removes the stack overflow checks.
we
: treat all warnings as errors.
This option causes wcc to exit with an error code when it generates a warning. A wcc failure in turn causes the entire Mkos build to fail, making it easier to find bugs at build time.
zl
: suppress generation of library file names and references in
object file.
By default, wcc inserts into the object file the names of C libraries corresponding to memory model and floating-point options. Since we don’t use the OpenWatcom libraries, this always causes wlink to fail. Enabling this option stops wcc from placing these names into the object file.
zld
: suppress generation of file dependency information in object
file.
By default, wcc inserts into the object file the names and time stamps of all files referenced by the source file. This information is used by the wmake utility. We don’t use wmake, so we enable this option for a cleaner object file.
zls
: remove automatically inserted symbols (e.g. runtime library
references)
We don’t use OpenWatcom’s default library information or runtime libraries, so we enable this option for a cleaner object file.
In addition to the options above, we use the ad, adt, and add options to control automatic generation of Make-style dependency rules (.d files), which are used by Make in the usual way to account for changes in C file #include directives.
Linking
The wlink linker takes OMF files generated by wcc and links them together, along with any of their dependencies and the MKZ header template. The output is an intermediate program image. wlink also generates a memory map text file for the program.
The wlink linker is driven by a system of directives. Each directive is given by name, followed by any arguments it has. The option directive is used to set the values of more general options. Directives may be specified on the command line or in linker script files. We specify the map option for every program on the wlink command line. We use the resulting map file to build the final MKZ program image. The map file is also useful for debugging. Kernel and user programs have very slightly different linker scripts.
This is the linker script for the kernel:
output raw
format dos
option fillchar=0xde
option nodefaultlibs
order
clname DATA segaddr=0x1000
clname CODE
output raw
The output directive overrides the normal operating-system-specific executable format and creates a raw binary image.
format dos
The format directive specifies the format of the output file. While we aren’t actually using a DOS file format, this directive is required. Without it, wlink assumes an OS/2 format and begins output at an undesired offset in the output file.
option fillchar=0xde
The fillchar option specifies the byte value used to fill gaps in the output image. We change this from its default value of 0 to make it potentially easier to identify these areas when debugging at runtime.
option nodefaultlibs
This option instructs wlink to ignore default libraries when searching for any library files.
order
clname DATA segaddr=0x1000
clname CODE
The order directive specifies the order in which classes are placed in the output image. Any class name not listed is placed after the listed ones.
We make sure the DATA class is placed first so the information for the program loader is in a memory location that the kernel knows at load time. The value of segaddr refers to the segment address where the kernel starts.
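To illustrate why a non-zero fill byte can help when inspecting an image, here is a minimal host-side sketch that finds the longest run of the 0xde fill value in a buffer. The names `longest_fill_run` and `MKOS_FILL_BYTE` are hypothetical and not part of Mkos, and the sketch assumes the image has already been read into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill value taken from "option fillchar=0xde" in the linker scripts. */
#define MKOS_FILL_BYTE 0xDEu

/*
 * Hypothetical helper (not part of Mkos): returns the length of the longest
 * run of fill bytes in image[0..len) and, when start is not NULL, stores the
 * index where that run begins. Returns 0 if no fill byte is present.
 */
size_t longest_fill_run(const uint8_t *image, size_t len, size_t *start)
{
    size_t best_len = 0, best_start = 0, run_len = 0;

    assert(image != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if (image[i] == MKOS_FILL_BYTE) {
            run_len++;
            if (run_len > best_len) {
                best_len = run_len;
                best_start = i + 1 - run_len;
            }
        } else {
            run_len = 0;
        }
    }
    if (start != NULL) {
        *start = best_start;
    }
    return best_len;
}
```

A run found this way is only a hint: 0xde can also occur in real code or data, so the map file remains the authoritative source for where the gaps are.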
This is the linker script for user programs. It is very similar to the kernel script, with two differences.
output raw
format dos
option fillchar=0xde
option nodefaultlibs
option start=main_
order
clname DATA segaddr=0x2000
clname CODE
option start=main_
The start option defines the entry point for the output image. The value main_ corresponds to the main function in the C program being linked. This is necessary for the wlink map file to report the correct entry point address, which in turn is necessary to correctly populate the MKZ data.
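The segaddr values in the two scripts are segment values, not byte addresses. As a rough sketch, assuming the standard 8086 real-mode translation (segment times 16 plus offset, truncated to 20 address bits), the corresponding byte addresses can be computed as follows. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 8086 real-mode translation: linear = segment * 16 + offset,
 * truncated to the 20 address bits the 8086 provides. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Usage with the segaddr values from the two linker scripts. */
void linear_address_examples(void)
{
    assert(linear_address(0x1000, 0x0000) == 0x10000u); /* kernel script */
    assert(linear_address(0x2000, 0x0000) == 0x20000u); /* user script */
    assert(linear_address(0x2000, 0x0012) == 0x20012u); /* offset 0x12 in the user segment */
}
```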
Finalizing the executable
The MKZ file format
The MKZ file format is the executable file format for Mkos. It is a simple file format consisting of a small header, followed by the rest of the program image generated by wlink. These are the fields:
Offset | Segment | Description |
---|---|---|
0x0 | DATA | 32-bit memory offset of program entry point |
0x4 | DATA | Near address of the “return to kernel” instruction |
0x10 | DATA | Program arguments count (argc) |
0x12 | DATA | Array of pointers to program argument strings (argv) |
0x0 | TEXT | Far jump back to kernel |
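The DATA-segment offsets in the table can be written down as a C structure and checked at compile time. This is only a host-side sketch, not the Mkos source. The struct and field names are hypothetical, near pointers are modelled as 16-bit integers, the bytes between 0x6 and 0x10 are assumed to be unused padding, and the 64-byte total (which determines the number of argv slots) is an assumption based on the size this page gives for the header area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MKZ_HEADER_SIZE 64u                               /* assumed total size */
#define MKZ_ARGV_SLOTS  ((MKZ_HEADER_SIZE - 0x12u) / 2u)  /* assumed: 16-bit entries fill the rest */

/* Hypothetical view of the MKZ header at the start of the DATA segment. */
struct mkz_header {
    uint32_t entry_point;            /* 0x00: 32-bit offset of program entry point */
    uint16_t return_to_kernel;       /* 0x04: near address of "return to kernel" */
    uint8_t  reserved[10];           /* 0x06: assumed unused padding up to 0x10 */
    uint16_t argc;                   /* 0x10: program arguments count */
    uint16_t argv[MKZ_ARGV_SLOTS];   /* 0x12: near pointers to argument strings */
};

/* Compile-time checks that the layout matches the table's offsets. */
_Static_assert(offsetof(struct mkz_header, entry_point) == 0x00, "entry point offset");
_Static_assert(offsetof(struct mkz_header, return_to_kernel) == 0x04, "return address offset");
_Static_assert(offsetof(struct mkz_header, argc) == 0x10, "argc offset");
_Static_assert(offsetof(struct mkz_header, argv) == 0x12, "argv offset");
_Static_assert(sizeof(struct mkz_header) == MKZ_HEADER_SIZE, "header size");
```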
The first 4 bytes in the MKZ header DATA segment contain the 32-bit absolute offset of the program’s entry point at runtime. The next 60 bytes (total 64 bytes) in the DATA segment are reserved for the program arguments (argc/argv, in C terminology). Both of these fields are used by the kernel to load and execute the program. The first 5 bytes of the TEXT (code) segment are 0xea <IP> <CS>, which encode a far jump instruction. CS:IP is the 32-bit address of a subroutine to re-initialize the kernel. This is how the user program transfers control back to the kernel.
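The 5-byte far jump can be illustrated with a short encoder. This sketch follows the standard 8086 encoding of a direct far jump (opcode 0xEA, then the 16-bit offset, then the 16-bit segment, each little-endian). The function name and the example CS:IP values are hypothetical and say nothing about where the Mkos kernel routine actually lives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAR_JMP_OPCODE 0xEAu
#define FAR_JMP_LENGTH 5u

/*
 * Writes "JMP FAR cs:ip" into buf. Returns the number of bytes written,
 * or 0 if the buffer is too small to hold the instruction.
 */
size_t encode_far_jump(uint8_t *buf, size_t buf_len, uint16_t cs, uint16_t ip)
{
    assert(buf != NULL);

    if (buf_len < FAR_JMP_LENGTH) {
        return 0;
    }
    buf[0] = FAR_JMP_OPCODE;
    buf[1] = (uint8_t)(ip & 0xFFu);   /* IP low byte */
    buf[2] = (uint8_t)(ip >> 8);      /* IP high byte */
    buf[3] = (uint8_t)(cs & 0xFFu);   /* CS low byte */
    buf[4] = (uint8_t)(cs >> 8);      /* CS high byte */
    return FAR_JMP_LENGTH;
}
```

For example, a jump to the made-up address 0x1000:0x0234 is encoded as the bytes ea 34 02 00 10.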
As mentioned above, we link an object file establishing the MKZ header into every program. The MKZ header is first in link order, so the header starts at the beginning of the program image at build time, and therefore at the beginning of the user space data area at load time. Then the kernel can refer to its data by absolute memory locations during the load and execute procedure.
The information for the MKZ header isn’t available at link time, so what is linked is a template that merely reserves the required memory. After linking, the build system uses the wlink-generated memory map file to populate the MKZ header’s program entry point field. The kernel populates the other two fields at runtime.
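The patch step can be sketched as a function that overwrites the first 4 bytes of the linked image. This is a minimal illustration, not the actual Mkos patch tool. The function name is hypothetical, the entry point is assumed to have already been extracted from the map file (parsing is not shown), and the field is assumed to be stored little-endian, as is native on the 8086.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MKZ_ENTRY_FIELD_SIZE 4u

/*
 * Hypothetical patch step: stores the 32-bit entry point at offset 0 of
 * the image, least significant byte first. Returns 0 on success, or -1
 * if the image is too small to contain the field.
 */
int mkz_patch_entry_point(uint8_t *image, size_t image_len, uint32_t entry)
{
    assert(image != NULL);

    if (image_len < MKZ_ENTRY_FIELD_SIZE) {
        return -1;
    }
    for (size_t i = 0; i < MKZ_ENTRY_FIELD_SIZE; i++) {
        image[i] = (uint8_t)((entry >> (8u * i)) & 0xFFu);
    }
    return 0;
}
```

Writing the bytes one at a time keeps the result independent of the byte order of the machine running the build.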
Loading
The kernel provides only a single syscall to execute a program: exec. exec replaces the process image in user space with the new program to run. Then it executes the new program. The MKZ file format supports the load and execute procedure.
The exec system call
The exec system call takes the following arguments:
- path: the path to the file containing the user program to execute
- argv: an array of pointers to strings containing program arguments
- argc: the count of elements in argv
It loads and executes the program as follows:
1. Copies the program image from the file specified by the path argument into memory, using the user segment as the load base.
2. Copies argc and argv into the corresponding area of the MKZ header in user space.
3. Saves the kernel’s stack (SP register) into memory in the kernel’s address space.
4. Initializes the user’s DS, SS, and SP registers.
5. Pushes the address of the “return to kernel” instruction in the MKZ header onto the user’s stack.
6. Writes the address of the “return to kernel” instruction into the user’s data segment at the location referenced by the above pointer.
7. Sets the AX and DX registers to argc and argv.
8. Jumps to the program entry point through the pointer stored in the MKZ header.
Now the user program begins executing. It has a working stack and can access its argc and argv values. When the program reaches the end of its main function, the ret instruction at the end of the function causes the CPU to pop the address of the jmp instruction in the MKZ header and jump to it (near jump). Then it executes that jmp instruction to return to the kernel (a far jump). At this point, the user program has ended and the kernel restarts.