6. The Compiler
In the Modula-2 language, as defined by Niklaus Wirth, identifiers may be used before they are declared, except when they are used in another declaration (this restriction does not apply to pointers). This forces the compiler to make at least two passes over the source.
To avoid imposing unnecessary restrictions while still providing reasonable performance, a two-pass design was selected: the first pass performs syntax analysis and declaration analysis; the second pass performs semantic analysis and code generation.
The compiler has an integrated text editor. If errors are encountered, the editor is invoked at the end of the current compiler pass (or sooner, if an error is found while processing an import list or once 20 errors have been found).
The compiler also has a built-in "make" processor. A makefile must exist before this process is invoked. Although you can create a makefile with the editor, we recommend using GENMAKE, the utility provided for that purpose (it can be invoked from the compiler menu by selecting G).
The compiler can generate two kinds of object files. By default, it generates M2O ("Modula-2 Object") files; this format is unique to this compiler and is optimized for our requirements and those of Modula-2. Through the environment variable M2OUTPUT, however, you can specify that standard OBJ files be generated instead.
6.1 The integrated compiler: MC
MC [workModule] [/p mainModule] [/s maxIds idSpace] [/c] [/e] [/m] [/d-]
The '/p' option may be used to indicate the name of the main module of the program that you are working on (the name should not have an extension). If this option is not used, but workModule is entered without an extension, the same name is also used for mainModule.
The '/s' option allows you to change the default sizes of the compiler identifier tables. The two arguments specify the maximum number of different identifiers to be processed, and the total string space to be allocated to store these identifiers. The default values are 2000 and 12000, respectively.
If you use the '/c' command line option, the compiler starts compiling "workModule" immediately and, if no errors are encountered, returns you straight to the DOS prompt. This is useful when running the compiler from a batch file:
MC myprog /c
The '/e' option will send you straight into the editor.
The '/m' option invokes the make processor, which will look for the file mainModule.MAK for the dependencies list.
The '/d-' option changes the default value of the compiler directives $R and $T from '+' to '-', disabling most runtime error checking by default. It does not disable stack overflow checking ($S), and we do not recommend disabling that check.
The compiler always sets the DOS errorlevel to 0 if the last compile was successful (no errors); otherwise, the DOS errorlevel is set to 1.
If the compiler is invoked without the '/c', '/e' or '/m' options, you will get a screen that looks something like this:
Modula-2 compiler, Version 3.5
(C) Copyright 1987-1995 Fitted Software Tools. All rights
Memory model in use: LARGE
Output file format: M2O
Runtime environment: Modula-2
Heap in use: 0K
Available Heap: 251K
Work module: work.mod
Program New DOS Quit
Compile Edit Genmake Make Link eXecute >
The options at this point are:
Program Specify the name of the main program module.
New Specify another "Work module".
DOS Invoke a new DOS shell. At the DOS prompt, type EXIT to return to the compiler.
Quit Return to DOS.
Compile Compile the "Work module".
Edit Edit the "Work module".
Genmake Invoke the GenMake program, passing it the name in Program (and '/OBJ', if appropriate).
Make Recompile all the necessary modules according to the rules in a makefile (the makefile is assumed to have the name in Program and the extension .MAK). Note: if errors are encountered while compiling one of the modules, the make process is aborted. After fixing the errors, select Make again.
Link Invoke the linker (M2Link), passing it the name in Program and '/L'.
eXecute You are prompted for any arguments you want to pass to the program; the program is then executed.
6.2 The freestanding compiler: M2COMP
M2COMP filename [/m] [/s maxIds idSpace] [/d-]
filename is the name of the module to compile or, if the /m option is used, the name of the makefile to process. The DOS errorlevel is set to 0 if the compilation (or make) is successful and to 1 otherwise.
6.3 The compilation process
6.3.1 The input file
If the module to be compiled is already loaded into one of the editor buffers, that source is compiled. Otherwise, the compiler tries to open the named file.
6.3.2 The imported modules
The compiler and the linker cooperate in assuring that all the modules that refer to a particular definition module will have been compiled against the same version of that definition module.
To this end, the compiler places a "module key" in the 'module header' and 'module import' records of the object file. The module key is the date of the DEF file used when compiling the implementation module or when processing the IMPORT statement. For this reason, the compiler does not look in the editor buffers for the DEF files needed to process an IMPORT list; these are always read from disk.
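The idea behind the module key can be pictured with a small Python model. This is only a sketch: the record layout, the module names and the key_mismatches helper are hypothetical, and the real comparison is made by the linker on object-file records. It shows why a module compiled against an older copy of a DEF file is caught: the date it recorded no longer matches the key in the imported module's header.

```python
from datetime import datetime

# Hypothetical stand-in for the key records: for each compiled module, the key
# of its own DEF file (None for a program module) and the key it recorded for
# every definition module it imported.
modules = {
    "Storage": {"key": datetime(1995, 3, 1, 10, 0), "imports": {}},
    "Lists": {"key": datetime(1995, 3, 2, 9, 30),
              "imports": {"Storage": datetime(1995, 3, 1, 10, 0)}},
    "Main": {"key": None,
             "imports": {"Lists": datetime(1995, 3, 2, 9, 30),
                         # compiled against an older Storage.DEF
                         "Storage": datetime(1995, 2, 27, 8, 15)}},
}


def key_mismatches(modules):
    """Return (importer, imported) pairs whose recorded key disagrees with
    the key in the imported module's own header record."""
    problems = []
    for name, record in modules.items():
        for imported, key in record["imports"].items():
            exporter = modules.get(imported)
            if exporter is not None and exporter["key"] != key:
                problems.append((name, imported))
    return problems


print(key_mismatches(modules))  # [('Main', 'Storage')]
```

In this model, recompiling Main against the current Storage.DEF would give it the new key and clear the mismatch.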
6.3.3 The output file
The output from compiling a main module or an implementation module is a single file with the same name as the source file and the extension 'M2O' (Modula-2 Object). OBJ files are created instead if the environment variable M2OUTPUT so specifies.
The compilation of a definition module does not generate any new output files. If the compilation is successful (no errors), the compiler simply 'touches' the source file, updating its modification time.
6.4 A warning
Because the compiler uses the date of the DEF file as that module's key, you must not modify a DEF file unless you intend to recompile all the modules that use it, nor may you copy the file in a way that does not preserve its date.
In particular, if you are going to be transferring your modules between computers, you must use some procedure that will preserve all the DEF files' dates.
Note, too, that when you use OBJ files you are not protected by this module version checking.
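If the files pass through a modern machine on the way, one date-preserving procedure is sketched below in Python. The copy_def_files helper is hypothetical and not part of this compiler's tools; it relies on the standard library function shutil.copy2, which copies a file's metadata (including its modification time) along with its contents, whereas shutil.copy gives the copy a new date.

```python
import os
import shutil


def copy_def_files(src_dir, dst_dir):
    """Copy every .DEF file from src_dir to dst_dir, keeping its date."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        if name.upper().endswith(".DEF"):   # DOS names are case-insensitive
            # copy2 preserves the modification time, i.e. the module key
            copied.append(shutil.copy2(os.path.join(src_dir, name), dst_dir))
    return copied
```

Whatever tool you use, it is worth checking a copied DEF file's date against the original before relinking.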
6.5 Compiler directives
Certain code generation options may be set through directives included in the program text. These directives must appear immediately at the beginning of a comment; several directives may be given in a single comment, separated by commas, for example (* $S-, $R+ *). A '+' sets the directive to TRUE, a '-' sets it to FALSE, and a '=' resets the directive to the value it had before the last '+' or '-'.
The following compiler directives are defined:
- $A Alignment. Default $A+. If enabled, all newly declared variables are aligned on a word boundary. Record fields are packed (not aligned) regardless of the setting of this option.
- $S Stack overflow checking. Default $S+. If enabled, stack overflow checking is performed on entry to a procedure and when copying open arrays to a procedure's local stack frame.
- $R Range checking. Default $R+. If enabled, before any assignment is made to a variable of a subrange type, the value to be assigned is tested against the limits of the subrange type.
- $T Array subscript and NIL pointer checking. Default $T+. If enabled, any time a subscript operation is performed on an array, the subscript value is checked to confirm that the operation would not generate an address outside the bounds of the array. In addition, before a pointer is dereferenced, its value is checked for NIL.
- $L Generate line number information. Default $L-. If this option is enabled, the compiler will include a list of source code line numbers and their corresponding object code offsets in the output file. This line number information is passed on to the .DBG file when the program is linked with the /L option.
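The '+', '-' and '=' rules above can be modelled as a small state table, sketched here in Python. The Directives class is hypothetical, the defaults are copied from the list above (the effect of '/d-' is not modelled), and the text does not say how much history '=' can unwind, so this sketch assumes a single remembered value per directive.

```python
# Defaults taken from the list above.
DEFAULTS = {"A": True, "S": True, "R": True, "T": True, "L": False}


class Directives:
    """Current value of every directive plus the value before the last '+'/'-'.

    Assumption: only one previous value is remembered per directive.
    """

    def __init__(self):
        self.value = dict(DEFAULTS)
        self.prior = dict(DEFAULTS)

    def apply(self, text):
        """Apply a directive list such as '$S-, $R+' (comment brackets removed)."""
        for item in text.split(","):
            item = item.strip()
            if len(item) != 3 or item[0] != "$" or item[1] not in self.value:
                raise ValueError(f"bad directive {item!r}")
            name, op = item[1], item[2]
            if op in ("+", "-"):
                self.prior[name] = self.value[name]
                self.value[name] = op == "+"
            elif op == "=":
                self.value[name] = self.prior[name]
            else:
                raise ValueError(f"bad directive {item!r}")


d = Directives()
d.apply("$S-, $R+")
print(d.value["S"], d.value["R"])   # False True
d.apply("$S=")
print(d.value["S"])                 # True
```

Note that '=' restores the value before the last change, not the default: after $R- followed by $R+, a $R= turns range checking off again.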
6.6 Runtime errors
When a runtime error is detected during the execution of a program, the runtime error handler terminates the program and writes a message indicating the type of error encountered and its location (module name, line number and PC address).
6.6.1 Trapping runtime errors in your program
The library module System provides a means of intercepting runtime errors. The following runtime error numbers are currently defined and may be passed to your error handler routine:
0 stack overflow ($S option)
1 range error ($R or $T option)
2 integer/cardinal overflow (divide by zero)
3 floating point error
4 function did not execute a RETURN
5 HALT was invoked
6 CASE selector out of range
6.7 Compiler size limits
The following are the code and data size limits imposed by this compiler:
- A string constant cannot exceed 80 characters. This is also the maximum length of an identifier.
- When using the HUGE memory model, each compilation module is assigned its own data segment, which can be up to 64k in size. In the data segment, the compiler allocates the space for all the module's global variables and some of the module's constants.
- When using the LARGE memory model, all the modules' data are combined, at link time, into a single data segment (64k maximum).
- The maximum size of a data structure is 65532 bytes.
- The maximum amount of space allocated for variables local to a procedure is 32000 bytes.
- The compiler will not generate code to pass a value parameter larger than 65000 bytes in a procedure call.
The following are the compiler's internal limits:
- The maximum number of different (namewise) identifiers that can be processed in a single compilation is 2000. This may be overridden at compiler invocation with the '/s' option.
- The total number of characters in all the different (namewise) identifiers processed cannot exceed 12000. This may be overridden at compiler invocation with the '/s' option.
- No single procedure can be translated into more than 10k bytes of object code.
- An array of 8k bytes is used to keep track of all the initialized data for a module. This imposes a limit on the total amount of string, real and long constants used in the compilation module.
6.8 The language supported
This release of the compiler will translate a program written in the Modula-2 language as defined by Niklaus Wirth in the 3rd edition of his book "Programming in Modula-2", with the exceptions noted below:
- Integer and Cardinal arithmetic overflow is not detected.
- INLINE, ASM, CLASS, INHERIT, OVERRIDE, INIT and DESTROY are reserved words in this implementation.
- For programmers who "grew up" in the hex world, CHAR literals may also be written in hexadecimal: 20X is the ASCII "space" character.
- You may return, from a function procedure, a value of any type (including structured types).
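A sketch of how such a literal maps to a character, in Python. The char_literal helper is hypothetical; it handles the hexadecimal X form described above and the standard octal C form of Wirth's definition (for example 40C, also a space).

```python
def char_literal(text):
    """Character denoted by a hex ('20X') or standard octal ('40C') CHAR literal."""
    digits, suffix = text[:-1], text[-1]
    if suffix == "X":
        code = int(digits, 16)      # hexadecimal form provided by this compiler
    elif suffix == "C":
        code = int(digits, 8)       # standard Modula-2 octal form
    else:
        raise ValueError(f"not a CHAR literal: {text!r}")
    if code > 255:
        raise ValueError(f"character code out of range: {text!r}")
    return chr(code)


print(repr(char_literal("20X")), repr(char_literal("41X")))   # ' ' 'A'
```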
6.8.1 LONGINT and LONGCARD
This compiler implements the standard types LONGINT and LONGCARD.
Operands of the type LONGINT or LONGCARD may appear in any expression, just like INTEGER or CARDINAL. But that is about it!
Subranges of these types are not supported.
No standard procedure, except INC, DEC and the ones listed later in this document, will accept operands of one of these types.
A variable of type LONGINT or LONGCARD cannot be used as the control variable in a FOR loop. Neither can CASE labels be of a LONG type.
Constants of type LONGINT or LONGCARD can be coded in decimal only and must be terminated by an 'L' if the value is less than 65536. Example:
123L and 123567 are valid LONGCARD or LONGINT constants
-1L and -348762 are valid LONGINT constants
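The rule for telling a long constant from a short one can be written as a small classifier, sketched here in Python. The constant_kind helper is hypothetical. For simplicity it treats a leading minus sign as part of the literal and compares the magnitude with 65536 (in the language the minus is really a unary operator), and it returns None for anything that is not a plain decimal constant.

```python
import re

# Optional sign, decimal digits, optional 'L' suffix.
_DECIMAL = re.compile(r"(-?)([0-9]+)(L?)")


def constant_kind(text):
    """Return 'LONG', 'SHORT', or None if text is not a decimal constant."""
    match = _DECIMAL.fullmatch(text)
    if match is None:
        return None                 # e.g. hex or octal forms, not handled here
    _sign, digits, suffix = match.groups()
    if suffix == "L" or int(digits) >= 65536:
        return "LONG"
    return "SHORT"


for literal in ("123L", "123567", "-1L", "-348762", "123", "0FFH"):
    print(literal, constant_kind(literal))
```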
6.8.2 LONGREAL
The standard type LONGREAL is implemented.
The rules for the use of LONGREALs are the same as for REALs.
The types REAL and LONGREAL are not compatible, and no automatic conversion from one type to another is ever performed -- the standard procedures SHORT and LONG should be used to convert between these types.
Constants of type LONGREAL are no different from REAL constants.
The type of the constant is determined by context. You may, however, "type" a constant by the use of the SHORT or LONG procedure. Example:
CONST longreal1 = LONG(1.0);
6.8.3 Additional or augmented standard procedures
6.8.3.1 NEW and DISPOSE (pointer argument)
NEW and DISPOSE have been deleted from the language definition in the 3rd edition of Wirth's book. We implement them as follows:
NEW(p) Invokes the procedure ALLOCATE, which must conform to the type:
PROCEDURE ( VAR ADDRESS, CARDINAL )
passing along p and the size of the type that p points to.
DISPOSE(p) Invokes the procedure DEALLOCATE, which must conform to the type:
PROCEDURE ( VAR ADDRESS, CARDINAL )
passing along p and the size of the type that p points to.
The procedures ALLOCATE and DEALLOCATE must, therefore, be defined in the module using NEW and/or DISPOSE, or imported from some other module, such as Storage.
6.8.3.2 LONG and SHORT
PROCEDURE LONG( INTEGER ) :LONGINT;
PROCEDURE LONG( CARDINAL ) :LONGCARD;
PROCEDURE LONG( REAL ) :LONGREAL;
PROCEDURE SHORT( LONGINT ) :INTEGER;
PROCEDURE SHORT( LONGCARD ) :CARDINAL;
PROCEDURE SHORT( LONGREAL ) :REAL;
LONG takes an INTEGER, CARDINAL or REAL and returns the equivalent LONGINT, LONGCARD or LONGREAL, respectively. SHORT takes a LONGINT, LONGCARD or LONGREAL and returns the equivalent INTEGER, CARDINAL or REAL, respectively.
6.8.3.3 FLOAT and TRUNC
Because there are two sizes each of integer/cardinal and real types, TRUNC and FLOAT behave as follows:
PROCEDURE FLOAT( CARDINAL ) :REAL;
PROCEDURE FLOAT( LONGCARD ) :LONGREAL;
PROCEDURE TRUNC( REAL ) :CARDINAL;
PROCEDURE TRUNC( LONGREAL ) :LONGCARD;
The HALT procedure was enhanced to take an optional CARDINAL argument: the runtime error number to be generated.
When called without an argument, HALT generates a runtime error number 5.
6.9 Objects exported by the pseudo module SYSTEM
TYPE BYTE
Takes 1 byte of storage. Only assignment is defined for this type. If the formal parameter of a procedure is of type BYTE, the corresponding actual parameter may be of any type that takes 1 byte of storage.
If the formal parameter of a procedure is of type ARRAY OF BYTE, the corresponding actual parameter may be of any type.
TYPE WORD
Takes 1 word (2 bytes) of storage. Only assignment is defined for this type. If the formal parameter of a procedure is of type WORD, the corresponding actual parameter may be of any type that takes 1 word of storage.
If the formal parameter of a procedure is of type ARRAY OF WORD, the corresponding actual parameter may be of any type. Care should be taken in this case, as the size of the parameter passed is rounded up to an even number of bytes.
TYPE ADDRESS
The type ADDRESS is compatible with all pointer types. ADDRESS itself is defined as a POINTER TO WORD. In this implementation, the type ADDRESS is not compatible with any arithmetic type, because the Intel 8086 family of processors uses segmented addresses. It would not be hard to implement automatic conversions between LONGCARD and ADDRESS, but we feel that this would be contrary to the spirit of the language, whereby the compiler is not expected to perform any "magic" tricks. Instead, two functions are provided for that purpose: FLAT and PTR.
For compatibility with other compilers, we relaxed the above a little: ADDRESS + CARDINAL and ADDRESS - CARDINAL are legal expressions. The CARDINAL is added to or subtracted from the offset portion of the ADDRESS, and the result is still an ADDRESS.
Also, INC and DEC can take an ADDRESS as their first argument. The operation is, however, performed on the offset portion of the ADDRESS only.
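The effect of offset-only arithmetic is sketched below in Python, with an ADDRESS modelled as a (segment, offset) pair; the helper names are hypothetical. The text does not say what happens when the offset overflows, so the sketch assumes it wraps modulo 65536, as 16-bit 8086 arithmetic does, and never touches the segment.

```python
# An ADDRESS is modelled as a (segment, offset) pair of 16-bit values.

def address_add(address, n):
    """ADDRESS + CARDINAL (and INC): only the offset changes.

    Assumption: the offset wraps modulo 65536; the segment is never adjusted.
    """
    segment, offset = address
    return segment, (offset + n) & 0xFFFF


def address_sub(address, n):
    """ADDRESS - CARDINAL (and DEC): only the offset changes."""
    segment, offset = address
    return segment, (offset - n) & 0xFFFF


print([hex(x) for x in address_add((0x1234, 0xFFF0), 0x20)])  # ['0x1234', '0x10']
```

Under that assumption, stepping past the end of a 64K offset range lands back at the start of the same segment rather than on the next byte in memory; arithmetic across segment boundaries needs FLAT and PTR.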
SEG and OFS
These are field definitions for POINTER types. If you import them, you may access the segment or offset portion of a pointer variable using regular field selection syntax.
PROCEDURE ADR
ADR( designator )
Returns the address of designator (type ADDRESS).
PROCEDURE FLAT
FLAT( ADDRESS )
returns a LONGCARD "flat" address.
PROCEDURE PTR
PTR( LONGCARD )
returns an ADDRESS corresponding to the "flat" address represented by the LONGCARD.
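On the 8086, the "flat" (linear) address of segment:offset is segment * 16 + offset, so many different pointers denote the same byte. The Python sketch below models FLAT and PTR on that basis; the function names mirror the SYSTEM procedures but are only a model. Which of the equivalent pairs PTR actually returns is not stated here, so the sketch assumes a normalized result with an offset of 0..15.

```python
def flat(address):
    """FLAT: the 20-bit linear address, segment * 16 + offset."""
    segment, offset = address
    return segment * 16 + offset


def ptr(linear):
    """PTR: one segment:offset pair for a linear address.

    Assumption: the result is normalized so that the offset is 0..15.
    """
    return linear >> 4, linear & 0xF


# Two different pointers to the same byte:
print(flat((0x1234, 0x0010)) == flat((0x1235, 0x0000)))  # True
# Stepping 32 bytes past the end of a 64K offset range:
print([hex(x) for x in ptr(flat((0x1234, 0xFFF0)) + 0x20)])  # ['0x2235', '0x0']
```

The second example shows the intended use: convert to a flat LONGCARD, do the arithmetic there, and convert back.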
PROCEDURE SEGMENT
SEGMENT( designator )
returns the segment portion of the address of 'designator'. Example:
DX := SEGMENT( buffer );
would assign to DX the segment value of ADR(buffer).
PROCEDURE OFFSET
OFFSET( designator )
returns the offset portion of the address of 'designator'.
PROCEDURE NEWPROCESS
NEWPROCESS(p:PROC; a:ADDRESS; n:CARDINAL; VAR p1:ADDRESS)
creates a new process whose entry point is p and workspace is at a for n bytes. p1 is the new process pointer. This process is not activated until a TRANSFER to p1 is done.
The starting priority of the new process is the current processor priority at the time NEWPROCESS is invoked (please refer to the section on Module Priorities).
PROCEDURE TRANSFER
TRANSFER( VAR p1, p2 :ADDRESS)
suspends the current process, assigning it to p1, and resumes p2. The current process is assigned to p1 only after p2 has been identified; it is, therefore, safe for p1 and p2 to be the same variable.
The process is resumed at the priority level at which it was running when it was suspended.
PROCEDURE IOTRANSFER
IOTRANSFER( VAR p1, p2 :ADDRESS; intVector :CARDINAL )
issues a TRANSFER from p1 to p2 (just the way TRANSFER does it) after installing the current process for reactivation when an interrupt comes in through interrupt vector intVector.
When the interrupt occurs, the interrupt vector is reloaded with its previous value. A TRANSFER is done to the I/O process (the one that issued the IOTRANSFER) such that p2 now contains the value of the process that was running when the interrupt occurred.
An 8086 inline assembler is provided. Once ASSEMBLER is imported from SYSTEM, you can enter inline assembler code by bracketing it with the keywords ASM and END. Assembler input is free form. Comments are entered as in regular Modula-2. Example:
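ASM
loop:  MOV AL, [SI]          (* fetch the next character *)
       MOV [DI], AL          (* copy it; the 8086 has no memory-to-memory MOV *)
       INC SI INC DI         (* increment pointers *)
       CMP AL, 0 JNE loop    (* repeat until the terminating 0 is copied *)
END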
The assembler accepts all the 8086/8088 opcode mnemonics. Address operands can be coded in just about any form acceptable to other assemblers, except that the only operator supported is '+'.
Operand type overrides are: WORD, BYTE, FAR, NEAR and are not to be followed by the keyword POINTER or PTR. Example:
label: MOV AX, ES:[BX,DI+5]
MOV AX, ES:5[DI+BX]
MOV WORD [BX], 1
CALL NEAR [DI]
TEST BYTE i+2, 1
All the mnemonics and register names must be entered in upper case. In case you need to use a Modula-2 name that conflicts with one of the assembler reserved symbols, you may precede it with a '@'. Example:
MOV @AX, AX
would generate a move from register AX to variable AX.
All Modula-2 variables can generally be accessed in assembler.
Record field names are not accessible from assembler, and the assembler does nothing automatically on your behalf. For example, if you specify a VAR parameter as an operand to an instruction, you are naming the stack location that holds the address of the actual parameter, not the parameter itself. Example:
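PROCEDURE p( VAR done :BOOLEAN );
BEGIN
  ASM LES DI, done             (* ES:DI := address of the actual parameter *)
      MOV BYTE ES:[DI], TRUE   (* store TRUE through that address *)
  END
END p;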
is the correct way of storing TRUE in done.
The following types of constants may be accessed in assembler: INTEGER, CARDINAL, BOOLEAN, CHAR and enumeration constants.
All labels declared inside an ASM section are local to that section of code, but a label name may not match any name known in the scope of the current procedure. Labels can only be referenced in jump instructions.
All jumps are optimized by the compiler. There is, therefore, no need (or capability) to specify the size of a jump. In particular, the compiler will turn a conditional jump out of range into a reverse conditional jump over a far jump to the original destination.
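The jump rewriting described above can be sketched in Python. This is a model under stated assumptions, not the compiler's code: a short conditional jump is taken to be 2 bytes with a signed 8-bit displacement measured from the end of the instruction, and the long jump is modelled as the 3-byte JMP with a 16-bit displacement (the manual calls it a far jump and does not give the encoding). The OPPOSITE table covers only a subset of the condition codes.

```python
# Opposite condition for each conditional-jump suffix (a subset, for illustration).
OPPOSITE = {"E": "NE", "NE": "E", "Z": "NZ", "NZ": "Z", "C": "NC", "NC": "C",
            "L": "GE", "GE": "L", "G": "LE", "LE": "G",
            "B": "AE", "AE": "B", "A": "BE", "BE": "A"}


def encode_conditional(cond, here, target):
    """Return the (mnemonic, displacement) list emitted for J<cond> at `here`."""
    disp = target - (here + 2)              # short Jcc is 2 bytes long
    if -128 <= disp <= 127:
        return [("J" + cond, disp)]
    # Out of range: the reversed condition skips over a 3-byte JMP rel16.
    return [("J" + OPPOSITE[cond], 3), ("JMP", target - (here + 2 + 3))]


print(encode_conditional("E", 0x100, 0x110))   # [('JE', 14)]
print(encode_conditional("E", 0x100, 0x300))   # [('JNE', 3), ('JMP', 507)]
```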
Remember, this is a Modula-2 compiler, not an assembler! The inline assembler capability is provided for use in exceptional situations only.
ASSEMBLER - 8087 support
All the 8087 math coprocessor instructions are supported by the inline assembler. There are some restrictions, however.
Only the following operand types are supported by the load and store instructions: INTEGER, LONGINT, REAL and LONGREAL. You may not, therefore, load or store a value in temporary real or decimal format.
The meaning of the "no operand" form of the arithmetic instructions was retained:
FADD, FSUB, FMUL and FDIV all operate on the two top elements of the 8087 stack, using ST(1) as the destination and removing ST.
FSUBR subtracts ST(1) from ST (FDIVR divides ST by ST(1)), leaving the result in ST(1) and removing ST.
The 2 operand format of the arithmetic instructions was not implemented. You may not, therefore, specify a destination register other than ST, except in the "and pop" versions of the instructions.
With a regular assembler, in register to register operations, you can specify the register that gets the result of the operation (the destination register). By definition, the destination register is also the first operand of the instruction.
With our inline assembler, ST is always the destination of the operation, except in the "and pop" form of the instructions, in which case the register specified in the instruction "gets the result".
For consistency, we decided that ST should always be the first operand of the instruction, even when the "and pop" form is used.
The meaning of FSUBP, FSUBRP, FDIVP and FDIVRP is, therefore:
FSUBP ST(1) means FSUBRP ST(1),ST -> ST(1):=ST-ST(1)
FSUBRP ST(1) means FSUBP ST(1),ST -> ST(1):=ST(1)-ST
FDIVP ST(1) means FDIVRP ST(1),ST -> ST(1):=ST/ST(1)
FDIVRP ST(1) means FDIVP ST(1),ST -> ST(1):=ST(1)/ST
and ST is popped.
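The table can be checked with a small Python model of the 8087 register stack, in which the list's last element is ST and ST(i) is stack[-1 - i]. The helper names are hypothetical and the model ignores rounding and exceptions; it simply applies the meanings given above (FADDP and FMULP are included for completeness, since their operand order does not matter), and maps each "no operand" form to the equivalent "and pop" form on ST(1).

```python
# Inline-assembler 'and pop' forms, per the table above: the result goes to
# ST(i) and ST is then popped.
POP_FORMS = {
    "FADDP": lambda st, sti: st + sti,
    "FMULP": lambda st, sti: st * sti,
    "FSUBP": lambda st, sti: st - sti,    # ST(i) := ST - ST(i)
    "FSUBRP": lambda st, sti: sti - st,   # ST(i) := ST(i) - ST
    "FDIVP": lambda st, sti: st / sti,    # ST(i) := ST / ST(i)
    "FDIVRP": lambda st, sti: sti / st,   # ST(i) := ST(i) / ST
}

# "No operand" forms: operate on ST and ST(1), result in ST(1), ST removed.
NO_OPERAND = {"FADD": "FADDP", "FMUL": "FMULP",
              "FSUB": "FSUBRP", "FSUBR": "FSUBP",
              "FDIV": "FDIVRP", "FDIVR": "FDIVP"}


def pop_form(stack, mnemonic, i=1):
    """Execute e.g. FSUBP ST(i); stack is a list whose last element is ST."""
    st, sti = stack[-1], stack[-1 - i]
    stack[-1 - i] = POP_FORMS[mnemonic](st, sti)
    stack.pop()
    return stack


def no_operand(stack, mnemonic):
    """Execute e.g. FSUB with no operand."""
    return pop_form(stack, NO_OPERAND[mnemonic], 1)


print(pop_form([10.0, 4.0], "FSUBP"))    # [-6.0]
print(no_operand([10.0, 4.0], "FSUB"))   # [6.0]
```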
Arbitrary inline code may be generated using the INLINE procedure, which takes the form
INLINE ( value [ ,value, value ... ] )
where value can be a small cardinal (<= 255) literal (the 1 byte value is inserted into the code stream), a CONSTant (2 bytes are inserted into the code stream), or a variable reference (the variable's offset is inserted into the code stream). For example:
INLINE (0CCH) (* generate INT 3 as debug break point *)
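The byte-emission rules for INLINE can be sketched in Python. The inline_bytes helper and its (kind, number) argument format are hypothetical, and a variable's offset is assumed to be a 16-bit value; two-byte values are stored low byte first, as described under data representation. The second example uses the 8086 opcode B8H (MOV AX, immediate word) followed by a two-byte CONSTant.

```python
import struct


def inline_bytes(values):
    """Bytes emitted for INLINE(value, ...) under the rules above.

    Each value is a (kind, number) pair:
      ("literal", n)   small cardinal literal <= 255  -> 1 byte
      ("const", n)     named CONSTant                 -> 2 bytes
      ("var", offset)  variable reference             -> its 2-byte offset
    """
    out = bytearray()
    for kind, n in values:
        if kind == "literal":
            if not 0 <= n <= 255:
                raise ValueError("literal must be 0..255")
            out.append(n)
        elif kind in ("const", "var"):
            out += struct.pack("<H", n & 0xFFFF)   # low byte first
        else:
            raise ValueError(f"unknown kind {kind!r}")
    return bytes(out)


print(inline_bytes([("literal", 0xCC)]).hex())                     # cc
print(inline_bytes([("literal", 0xB8), ("const", 0x1234)]).hex())  # b83412
```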
6.10 The generated object code
6.10.1 Data type representation
Type          Size       Representation
CHAR          1 byte
INTEGER       2 bytes    2's complement
CARDINAL      2 bytes
LONGCARD      4 bytes
LONGINT       4 bytes    2's complement
BOOLEAN       1 byte     1 = TRUE, 0 = FALSE
REAL          4 bytes    Intel 8087 format
LONGREAL      8 bytes    Intel 8087 format
BITSET        1 word     bit 0 is the low-order bit, bit 15 the high-order bit
Enumerations  1 byte
SETs          1-8 words  sets of up to 256 elements
POINTERs      4 bytes    Intel 8086/88 format
PROCEDUREs    4 bytes    POINTER to the procedure entry point
Addresses are represented in the default Intel 8086 format:
first word: byte offset
second word: segment
Numeric values are likewise represented the way the Intel 8086 processor family likes them: low order byte first, high order byte last.
6.10.2 The runtime memory map
The compiler generates code using the "large" or "huge" memory model only.
In the "huge" memory model, each module has its own data and code segments. In the "large" memory model, each module has its own code segment. The entire program has one data segment.
The linker binds all the code segments first, and then all the data segments. The stack is allocated above the data segments. All the remaining memory is available for the heap.
When a program is loaded for execution, here is what the memory looks like:
From low to high addresses:
          +-----------------------------------+
          | Interrupt vectors                 |
          | DOS                               |
PSP       | Program segment prefix            |
PSP+100h  | Program code segments             |
          | Program data segment(s)           |
StackSeg  | Stack                             |
HeapTop   | Heap                              |
          | ...                               |
          | COMMAND.COM (transient portion)   |
          +-----------------------------------+
Label names on the left are the ones exported by System.
The system uses interrupt vector 192 (0C0H) at location 0000:0300. Interrupt 192 is issued by a program when a runtime error occurs, when HALT is invoked or when a coroutine other than the main one terminates via a return.
The first word (offset 0) in every code segment contains the data segment value for that particular module (for the program, in the case of the "large" memory model).
6.10.3 Procedure calling conventions
Procedure parameters are pushed onto the stack, first argument first. Control is then transferred to the procedure through a FAR call (a NEAR call is used to invoke nested procedures). It is the called procedure's responsibility to remove its parameters from the stack before returning.
6.10.3.1 Parameter passing (all except open array parameters)
If the formal parameter of a procedure is a value parameter, the actual parameter is copied onto the stack.
If the formal parameter is a variable parameter (VAR), the address of the actual parameter is pushed onto the stack (first the segment portion of the address and then the offset part).
6.10.3.2 Parameter passing (open array parameters)
If the formal parameter is an open array, the address and HIGH value of the corresponding actual parameter are pushed onto the stack (the HIGH value first, and then the address, as above).
If the open array parameter is a value parameter, the value of the actual parameter is copied into the stack on procedure entry.
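The resulting push order can be listed with a short Python sketch. The pushes helper, its tuple format and the WriteBuf example are hypothetical, and a value parameter is shown as a single word for brevity (larger values occupy several words).

```python
def pushes(args):
    """Return, in push order, the words a caller pushes for the given arguments.

    Each argument is a tuple describing how its formal parameter is declared:
      ("value", word)                   value parameter (one word, for brevity)
      ("var", segment, offset)          VAR parameter
      ("open", segment, offset, high)   open array parameter (VAR or value)
    """
    sequence = []
    for arg in args:                      # first argument first
        kind = arg[0]
        if kind == "value":
            sequence.append(("value", arg[1]))
        elif kind == "var":
            sequence += [("segment", arg[1]), ("offset", arg[2])]
        elif kind == "open":
            sequence += [("HIGH", arg[3]), ("segment", arg[1]), ("offset", arg[2])]
        else:
            raise ValueError(f"unknown parameter kind {kind!r}")
    return sequence


# Hypothetical PROCEDURE WriteBuf(VAR buf :ARRAY OF CHAR; VAR count :CARDINAL),
# called with a 10-element buffer at 2000:0010 and count at 2000:0040.
for word in pushes([("open", 0x2000, 0x0010, 9), ("var", 0x2000, 0x0040)]):
    print(word)
```

Since the 8086 stack grows toward lower addresses, the first argument's words end up highest in the stack frame, farthest from the return address.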
6.10.3.3 Returning values from a function procedure
One byte results are returned in AL, two byte results are returned in AX, and four byte results are returned in DX:AX (DX has the high order part of the result).
LONGREALs and arbitrary structures are returned in the stack, at a location reserved for that purpose by the caller. When invoking a function that returns a LONGREAL or a structured type, an extra parameter is pushed onto the stack: the two byte offset, in the SS segment, of where to place the result. This choice allows for full reentrancy of the code generated.
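How a four-byte result divides between the two registers is sketched below in Python; the helper names are hypothetical. A negative LONGINT is handled through its 32-bit two's complement form, so DX receives the sign-carrying high word.

```python
def to_dx_ax(value):
    """Split a 4-byte result into (DX, AX); DX gets the high-order word."""
    value &= 0xFFFFFFFF          # two's complement view of negative LONGINTs
    return value >> 16, value & 0xFFFF


def from_dx_ax(dx, ax, signed=False):
    """Rebuild the 4-byte value; signed=True reads it as a LONGINT."""
    value = (dx << 16) | ax
    if signed and value & 0x80000000:
        value -= 1 << 32
    return value


dx, ax = to_dx_ax(70000)
print(hex(dx), hex(ax))          # 0x1 0x1170
```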
6.11 Module priorities
Eight module priority levels are supported in this implementation, from 0 (highest priority) to 7 (lowest).
Priorities are implemented by masking off, on the 8259 interrupt controller, all the interrupts at or below the current priority level.
Because the PC usually runs with several of the interrupt levels disabled, it is not easy to decide what interrupt mask value should represent "no priority" for your particular application. The implementation of NEWPROCESS, therefore, assumes that you have enabled all the interrupts that your program will be able to process before you create your processes. The value in the interrupt mask register of the 8259 at the time a process is created determines that process's initial priority level once it gets started. Because of this, invoking NEWPROCESS from inside a priority module is usually not what you want to do!
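The masking scheme can be illustrated by computing the 8259 mask byte for each level, sketched here in Python. The imr_for_priority helper is hypothetical, and two assumptions are made that the text does not spell out: priority level n corresponds to IRQ n (IRQ 0 being the 8259's highest-priority line on the PC), and the "no priority" mask is simply OR-ed in. A 1 bit in the mask register disables the corresponding interrupt line.

```python
def imr_for_priority(priority, no_priority_mask=0x00):
    """8259 interrupt mask register value while running at `priority` (0..7).

    Assumptions: level n corresponds to IRQ n; no_priority_mask holds the
    lines that are disabled anyway when no priority is in effect.
    """
    if not 0 <= priority <= 7:
        raise ValueError("priority must be in 0..7")
    # Mask IRQ priority..7, i.e. every level at or below the current one.
    return no_priority_mask | ((0xFF << priority) & 0xFF)


for level in (0, 3, 7):
    print(level, format(imr_for_priority(level, 0x04), "08b"))
```

Running at priority 0 masks every line; running at priority 7 masks only IRQ 7, plus whatever lines were already disabled.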
Execution priorities are changed when entering/exiting procedures in modules that have a priority specification, and during the execution of some form of a TRANSFER.
We highly recommend that you study the communications program provided, paying particular attention to the module Kernel, for an example of how to use priorities with this system.
NOTE: The compiler does not restrict the priority level specified (any number is accepted). You must, therefore, exercise care in defining a module's priority level. On the other hand, it is easy to add priority levels by modifying the runtime module M2Procs.
6.12 Memory models
In general, you may compile the same code under either the LARGE or the HUGE memory model. The only thing to watch for is inline assembler code.
Under the HUGE memory model, the compiler generates code to reload DS after any call to an imported procedure or through a procedure variable. Under the LARGE memory model, this is not necessary, as a single data segment is defined. If you write inline assembler code that modifies DS, please restore it, even if the next thing you do is a RETurn; that way, your routine will work under either the LARGE or the HUGE memory model.
Under the HUGE memory model, access to external variables is done through an indirect pointer, whereas in the LARGE memory model an external variable resides in the program's only data segment and is, therefore, directly accessible.