C programming in Mkos version p1

I started writing Mkos in assembly only, but decided early in the project that I wanted both to write kernel code in C and to support C as a user space language. Let’s see how Mkos version p1 supports C programming.

The C standard library

All standard library code is written for Mkos; it does not use a third-party standard library implementation. The standard library is incomplete; I’m writing each part of it as the OS needs it.

Overview

The Mkos build system uses the 16-bit OpenWatcom toolchain to build a C program. The wcc compiler compiles C code to Relocatable Object Module Format (OMF) object files. The wlink linker links the OMF files and emits a program image.

After linking, a patch tool populates a small header in the program image with data the kernel needs to load the program.

Compiling

The wcc compiler takes C programs as input and compiles them to OMF object files. This is the set of wcc options we use that affect the output object file. These options are common across the different areas of the code base (kernel code, user code, etc.).

0: emit 8088/8086 instructions

This option makes wcc emit only 8086 instructions, which is required for our goal of being backward compatible with the 8086.

od: disable all optimizations.

The rationale behind disabling optimizations is that they may make the generated code diverge from the source code in ways we don’t expect, making debugging more difficult. At this stage of development, we want to accept larger code size and suboptimal performance in return for a more straightforward debugging experience.

ms: small memory model.

OpenWatcom defines several memory models, each of which refers to a specific combination of code and data models. The small memory model uses small code and small data (near calls and near pointers, with code and data each limited to 64 KB), so that’s the option we use.

s: remove stack overflow checks.

By default, wcc emits code at the beginning of every function to call a function named __STK to check for a stack overflow. The __STK function is presumably defined in the libraries OpenWatcom includes for its supported OSes. Since we don’t use those, it is undefined and wlink fails. Enabling this option removes the stack overflow checks.

we: treat all warnings as errors.

This option causes wcc to exit with an error code when it generates a warning. A wcc failure in turn causes the entire Mkos build to fail, making it easier to find bugs at build time.

zl: suppress generation of library file names and references in object file.

By default, wcc inserts into the object file the names of C libraries corresponding to memory model and floating-point options. Since we don’t use the OpenWatcom libraries, this always causes wlink to fail. Enabling this option stops wcc from placing these names into the object file.

zld: suppress generation of file dependency information in object file.

By default, wcc inserts into the object file the names and time stamps of all files referenced by the source file. This information is used by the wmake utility. We don’t use wmake, so we enable this option for a cleaner object file.

zls: remove automatically inserted symbols (e.g. runtime library references)

We don’t use OpenWatcom’s default library information or runtime libraries, so we enable this option for a cleaner object file.

In addition to the options above, we use the ad, adt, and add options to control automatic generation of Make-style dependency rules (.d files), which are used by Make in the usual way to account for changes in C file #include directives.

Linking

The wlink linker takes OMF files generated by wcc and links them together, along with any of their dependencies and the MKZ header template. The output is an intermediate program image. wlink also generates a memory map text file for the program.

The wlink linker is driven by a system of directives. Each directive is given by name, followed by any arguments it has. The option directive is used to set the values of more general options.

Directives may be specified on the command line or in linker script files. We specify the map option for every program on the wlink command line. We use the resulting map file to build the final MKZ program image. The map file is also useful for debugging. Kernel and user programs have very slightly different linker scripts.

This is the linker script for the kernel:

output raw
format dos
option fillchar=0xde
option nodefaultlibs
order
  clname DATA segaddr=0x1000
  clname CODE

output raw

The output directive overrides the normal operating system specific executable format and creates a raw binary image.

format dos

The format directive specifies the format of the output file. While we aren’t actually using a DOS file format, this directive is required. Without it, wlink assumes an OS/2 format and begins output at an undesired offset in the output file.

option fillchar=0xde

The fillchar option specifies the byte value used to fill gaps in the output image. We change this from its default value of 0 to make it potentially easier to identify these areas when debugging at runtime.
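To illustrate why a distinctive fill byte helps, here is a minimal sketch of a host-side helper that locates the longest run of fill bytes in an image buffer. The function name and the idea of scanning the image this way are assumptions for illustration; they are not part of the Mkos build.

```c
#include <stddef.h>
#include <stdint.h>

#define MKOS_FILL_BYTE 0xDEu  /* matches "option fillchar=0xde" */

/* Returns the length of the longest run of fill bytes in image[0..len).
 * If start is not NULL, stores the offset where that run begins.
 * Hypothetical helper: a long run of 0xDE most likely marks a linker gap. */
size_t longest_fill_run(const uint8_t *image, size_t len, size_t *start)
{
    size_t best = 0, best_start = 0, run = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        if (image[i] == MKOS_FILL_BYTE) {
            run++;
            if (run > best) {
                best = run;
                best_start = i + 1 - run;
            }
        } else {
            run = 0;
        }
    }
    if (start != NULL)
        *start = best_start;
    return best;
}
```

With the default fill value of 0, the same scan would also match zero-initialized data, which is the ambiguity the non-zero fill byte avoids.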

option nodefaultlibs

This option instructs wlink to ignore default libraries when searching for any library files.

order
  clname DATA segaddr=0x1000
  clname CODE

The order directive specifies the order in which classes are placed in the output image. Any class name not listed is placed after the listed ones.

We make sure the DATA class is placed first so the information for the program loader is in a memory location that the kernel knows at load time. The value of segaddr refers to the segment address where the kernel starts.
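Note that segaddr is a segment value, not a byte address. On the 8086, a segment and offset combine into a physical address as segment * 16 + offset. A small sketch of that arithmetic (the function name is made up for illustration):

```c
#include <stdint.h>

/* 8086 address translation: physical = segment * 16 + offset.
 * The result needs more than 16 bits, so it is returned as 32 bits. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

For the kernel script, linear_address(0x1000, 0) evaluates to 0x10000, so the first byte of the DATA class sits at physical address 0x10000.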

This is the linker script for user programs. It is very similar to the kernel script, with two differences: the start option and the segaddr value.

output raw
format dos
option fillchar=0xde
option nodefaultlibs
option start=main_
order
  clname DATA segaddr=0x2000
  clname CODE

option start=main_

The start option defines the entry point for the output image. The value main_ corresponds to the main function in the C program being linked; OpenWatcom’s register calling convention appends an underscore to function names. This is necessary for the wlink map file to report the correct entry point address, which in turn is necessary to correctly populate the MKZ data.
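As a rough sketch of how a patch tool might read the entry point back out of the map file, the function below scans one line of text for an entry point record. It assumes the record has the form "Entry point address: SSSS:OOOO" with hexadecimal segment and offset; check that against a real map file before relying on it. The function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Scans one line of map-file text for an entry point record.
 * Returns 1 and fills *seg and *off on a match, 0 otherwise.
 * ASSUMPTION: the record looks like "Entry point address: SSSS:OOOO". */
int parse_entry_point(const char *line, unsigned *seg, unsigned *off)
{
    static const char prefix[] = "Entry point address:";
    const char *p = strstr(line, prefix);

    if (p == NULL)
        return 0;
    p += sizeof prefix - 1;  /* skip past the prefix text */
    return sscanf(p, " %x:%x", seg, off) == 2;
}
```

A caller would feed it each line of the map file in turn and stop at the first match.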

Finalizing the executable

The MKZ file format

The MKZ file format is the executable file format for Mkos. It is a simple file format consisting of a small header, followed by the rest of the program image generated by wlink. These are the fields:

Offset  Segment  Description
0x0     DATA     32-bit memory offset of program entry point
0x4     DATA     Near address of the “return to kernel” instruction
0x10    DATA     Program arguments count (argc)
0x12    DATA     Array of pointers to program argument strings (argv)
0x0     TEXT     Far jump back to kernel

The first 4 bytes in the MKZ header DATA segment contain the 32-bit absolute offset of the program’s entry point at runtime. The next 60 bytes (total 64 bytes) in the DATA segment are reserved for the program arguments (argc/argv, in C terminology). Both of these fields are used by the kernel to load and execute the program. The first 5 bytes of the TEXT (code) segment are 0xEA <IP> <CS>, which is a far jump instruction. CS:IP is the 32-bit far address (segment and offset) of a subroutine to re-initialize the kernel. This is how the user program transfers control back to the kernel.
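The 5-byte far jump can be spelled out byte by byte. The sketch below encodes the 8086 direct far jump: opcode 0xEA, then the 16-bit IP, then the 16-bit CS, each little-endian. The function name and the target address in the usage note are made up for illustration; the real kernel address is not given here.

```c
#include <stdint.h>

#define FAR_JMP_OPCODE 0xEAu
#define FAR_JMP_SIZE   5

/* Writes the 5-byte 8086 "jmp far CS:IP" instruction into out.
 * The operand is stored as IP first, then CS, each little-endian. */
void encode_far_jmp(uint8_t out[FAR_JMP_SIZE], uint16_t cs, uint16_t ip)
{
    out[0] = FAR_JMP_OPCODE;
    out[1] = (uint8_t)(ip & 0xFF);  /* IP low byte  */
    out[2] = (uint8_t)(ip >> 8);    /* IP high byte */
    out[3] = (uint8_t)(cs & 0xFF);  /* CS low byte  */
    out[4] = (uint8_t)(cs >> 8);    /* CS high byte */
}
```

For a hypothetical target of 0x1000:0x0234, this produces the bytes EA 34 02 00 10.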

As mentioned above, we link an object file establishing the MKZ header into every program. The MKZ header is first in link order, so the header starts at the beginning of the program image at build time, and therefore at the beginning of the user space data area at load time. Then the kernel can refer to its data by absolute memory locations during the load and execute procedure.
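One way to picture the DATA part of the header is as a C struct whose field offsets match the field table. This is only a sketch for a modern hosted C11 compiler, not the Mkos source: the struct and field names are invented, the gap between 0x6 and 0x10 is assumed to be unused, and the argv capacity is derived from the 64-byte total on the assumption that each pointer is a 16-bit near pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define MKZ_HEADER_SIZE 64
/* ASSUMPTION: argv holds 16-bit near pointers up to the end of the header. */
#define MKZ_ARGV_SLOTS  ((MKZ_HEADER_SIZE - 0x12) / 2)

struct mkz_header {
    uint32_t entry;                 /* 0x00: program entry point */
    uint16_t return_addr;           /* 0x04: near address of the "return to kernel" jmp */
    uint8_t  reserved[10];          /* 0x06: ASSUMPTION: unused gap up to 0x10 */
    uint16_t argc;                  /* 0x10: argument count */
    uint16_t argv[MKZ_ARGV_SLOTS];  /* 0x12: pointers to argument strings */
};

/* Compile-time checks that the struct matches the documented offsets. */
_Static_assert(offsetof(struct mkz_header, entry) == 0x00, "entry offset");
_Static_assert(offsetof(struct mkz_header, return_addr) == 0x04, "return_addr offset");
_Static_assert(offsetof(struct mkz_header, argc) == 0x10, "argc offset");
_Static_assert(offsetof(struct mkz_header, argv) == 0x12, "argv offset");
_Static_assert(sizeof(struct mkz_header) == MKZ_HEADER_SIZE, "header size");
```

Because the header sits at the start of the user data area, code that knows the user segment can reach every field at a fixed offset from it.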

The information for the MKZ header isn’t available at link time, so what is linked is a template that merely reserves the required memory. After linking, the build system uses the wlink-generated memory map file to populate the MKZ header’s program entry point field. The kernel populates the other two fields at runtime.
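The patch step amounts to overwriting the first 4 bytes of the linked image. Here is a minimal sketch, assuming the image has already been read into a buffer. How the build system derives the 32-bit value from the map file isn't described here, so the sketch takes it as a parameter; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MKZ_ENTRY_OFFSET 0x0u
#define MKZ_ENTRY_SIZE   4u

/* Stores a 32-bit value, little-endian, in the entry point field.
 * Returns 0 on success, -1 if the image cannot hold the field. */
int mkz_patch_entry(uint8_t *image, size_t len, uint32_t entry)
{
    size_t i;

    if (image == NULL || len < MKZ_ENTRY_OFFSET + MKZ_ENTRY_SIZE)
        return -1;
    for (i = 0; i < MKZ_ENTRY_SIZE; i++)
        image[MKZ_ENTRY_OFFSET + i] = (uint8_t)(entry >> (8 * i));
    return 0;
}
```

Only the 4 reserved bytes change; the rest of the template, including any fill bytes, is left as wlink wrote it.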

Loading

The kernel provides only a single syscall to execute a program: exec. exec replaces the process image in user space with the new program to run. Then it executes the new program. The MKZ file format supports the load and execute procedure.

The exec system call

The exec system call takes the following arguments:

  • path: the path to the file containing the user program to execute

  • argv: an array of pointers to strings containing program arguments

  • argc: the count of elements in argv

It loads and executes the program as follows:

  • copies the program image from the file specified by the path argument into memory, using the user segment as the load base.

  • copies argc and argv into the corresponding area of the MKZ header in user space.

  • saves the kernel’s stack pointer (SP register) into memory in the kernel’s address space.

  • initializes the user’s DS, SS, and SP registers.

  • pushes the address of the “return to kernel” instruction in the MKZ header onto the user’s stack.

  • writes the address of the “return to kernel” instruction into the user’s data segment at the location referenced by the above pointer.

  • sets the AX and DX registers to argc and argv, the registers in which OpenWatcom’s register calling convention passes the first two arguments.

  • jumps to the program entry point, through the pointer stored in the MKZ header.

Now the user program begins executing. It has a working stack and can access its argc and argv values. When the program reaches the end of its main function, the ret instruction at the end of the function causes the CPU to pop the address of the jmp instruction in the MKZ header and jump to it (near jump). Then it executes that jmp instruction to return to the kernel (a far jump). At this point, the user program has ended and the kernel restarts.
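The load steps and the return path described above can be modeled on a host machine with a byte array standing in for the user segment. This is a simplified sketch, not kernel code: it assumes a single 64 KB segment shared by data and stack, an initial SP of 0xFFFE, and a caller-supplied return address. It leaves out argv because where the argument strings live is not described here. All names are invented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define USER_SEG_SIZE   0x10000u  /* one 64 KB segment */
#define MKZ_ARGC_OFFSET 0x10u     /* argc field in the MKZ header */
#define INITIAL_SP      0xFFFEu   /* ASSUMPTION: stack starts at the top */

struct user_state {
    uint8_t  mem[USER_SEG_SIZE];  /* stands in for the user segment */
    uint16_t sp;                  /* user stack pointer */
    uint16_t ax;                  /* carries argc into main */
};

static void push16(struct user_state *u, uint16_t value)
{
    u->sp = (uint16_t)(u->sp - 2);
    u->mem[u->sp] = (uint8_t)(value & 0xFF);
    u->mem[(uint16_t)(u->sp + 1)] = (uint8_t)(value >> 8);
}

static uint16_t pop16(struct user_state *u)
{
    uint16_t lo = u->mem[u->sp];
    uint16_t hi = u->mem[(uint16_t)(u->sp + 1)];

    u->sp = (uint16_t)(u->sp + 2);
    return (uint16_t)(lo | (hi << 8));
}

/* Models the load: copy the image, store argc, set up the stack, and
 * push the address that main's ret will pop. Returns 0 on success,
 * -1 if the image is too small for a header or too large to fit. */
int model_exec(struct user_state *u, const uint8_t *image, size_t len,
               uint16_t argc, uint16_t return_ip)
{
    if (len < MKZ_ARGC_OFFSET + 2 || len > INITIAL_SP)
        return -1;
    memcpy(u->mem, image, len);                         /* load at segment base */
    u->mem[MKZ_ARGC_OFFSET] = (uint8_t)(argc & 0xFF);   /* argc into the header */
    u->mem[MKZ_ARGC_OFFSET + 1] = (uint8_t)(argc >> 8);
    u->sp = INITIAL_SP;
    push16(u, return_ip);                               /* target of main's ret */
    u->ax = argc;
    return 0;
}

/* Models the near ret at the end of main: pops the next IP value. */
uint16_t model_ret(struct user_state *u)
{
    return pop16(u);
}
```

In this model, the value returned by model_ret is the address of the far jump in the MKZ header, which is where the real CPU would continue before jumping back to the kernel.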