STAGE2 INTRODUCTION

COPYRIGHT:

     Written: 06/15/79
     Updated: 06/17/79

     This introductory material is the property of:

          Dick Curtiss
          843 NW 54th
          Seattle, Washington 98107

     Permission is granted to copy for personal use only. This material may NOT be used for publication without written permission of the author.

STAGE2 PROGRAMMING TECHNIQUE:

STAGE2 is unlike conventional languages and requires getting used to a different way of approaching a problem. By its nature, STAGE2 forces a top down approach to problem solving using stepwise refinement and recursive descent. Expect to read this material several times before it sinks in. The best way to learn STAGE2 is to study examples and experiment. Good luck!

EXAMPLE PROBLEM:

Suppose it is desired to recognize the following WHILE statement and translate it into assembler type language.

     WHILE ( X < Y ) PRINT X*Y : X = X + 1 ; Comment

A top level macro would be used to recognize the "WHILE" as a keyword. As part of the recognition process the statement would be broken into three parts, "WHILE", "X < Y", and the rest of the line.

The first step in the translation process is to generate a looping label. Next, "X < Y" would be passed on to a set of macros designed to generate a sequence of assembler language instructions which evaluate the conditional expression. Then a "jump if false" instruction would be generated to branch to the loop exit label. Next, the remainder of the line would be passed on to macros designed to recursively break apart multiple statement lines and process each single statement with still other specialized macros. After decomposition of the "WHILE" statement is complete, a jump instruction is generated to branch to the looping label. Finally, "WHILE" statement processing is completed by generating a loop exit label.

EXAMPLE MACRO EXPANSION:

     LOOP:
          LOAD X
          CMP-LT Y
          J-FALSE EXIT
          LOAD X
          MULTIPLY Y
          CALL PRINT-RESULT
          LOAD X
          ADD #1
          STORE X
          JUMP LOOP
     EXIT:

EXAMPLE MACROS:

     WHILE$($)$#              To recognize WHILE statement
     LOOP:!F13%               output looping label
     CONDITION !20%           macro call to parse conditional expression
     J-FALSE EXIT!F13%        output exit jump
     STATEMENT !30%           macro call to parse remainder of line
     JUMP LOOP!F13%           output loop jump
     EXIT:!F13%               output loop exit label
     %                        ------------------------- end of macro
     CONDITION $<$#           To recognize less than compare
     LOAD !10!F13%            output load X instruction
     CMP-LT !20!F13%          output compare Y instruction
     %                        ------------------------- end of macro
     STATEMENT $:$#           To recognize and split multiple statement
     PROCESS !10%             macro call to process single statement
     STATEMENT !20%           recursive macro call to parse rest of line
     %                        ------------------------- end of macro
     STATEMENT $;$#           To recognize and split statement with comment
     PROCESS !10%             macro call to process single statement
     %                        ------------------------- end of macro
     STATEMENT $#             To recognize single statement
     PROCESS !10%             macro call to process single statement
     %                        ------------------------- end of macro
     PROCESS PRINT$#          To recognize and process PRINT statement
     *** macro code not shown
     %                        ------------------------- end of macro
     PROCESS $=$#             To recognize and process assignment stmt.
     *** macro code not shown
     %                        ------------------------- end of macro

READING MACROS:

Macros consist of a template line followed by one or more code body lines. The macro is terminated by an empty code body line.

Macro templates, which are terminated by a special template end character, consist of character strings with special parameter flag characters interspersed.

Code body lines are terminated by a special code body line end character. An empty code body line (macro terminator) has the code body line end character in column 1. A special escape character is used in code body lines for parameter reference and invocation of processor functions.
Characters in a line following the special end characters are taken as comment only.

SPECIAL CHARACTERS:

     #    Template end of line
     $    Template parameter flag
     %    Code body end of line
     !    Code body escape
     (    Left bracket
     )    Right bracket

These special characters are user selectable on the first line of input to STAGE2. The particular characters shown above were arbitrarily chosen for the examples which follow.

MACRO EXAMPLE:

This macro may be used to store information into the STAGE2 built in memory.

     MEM[$]=$#      1. Template line
     !F3%           2. Store into memory
     %              3. End of macro

The template in line 1 contains two parameter flags (maximum of nine allowed). For a string to match the template it must contain the literal characters in the order shown in the template line. The parameter flag characters, "$", will match any balanced strings including a null string. A balanced string is one containing equal numbers of left and right bracketing characters, usually "(" and ")".

Line 2 contains a processor function request. The escape character "!" followed by "F" followed by a digit specifies one of ten possible functions. The "F" can actually be any non-numeric or special character. The function "3" shown in the example instructs STAGE2 to store parameter string 2 into the memory using parameter string 1 for access to memory. In other words the string in parameter 1 is given a value in memory and that value is the string in parameter 2.

Parameter 1 is the string segment represented by the first "$" in the template and parameter 2 is the string segment represented by the second "$" in the template. The maximum number of parameters is nine.

Line 3 is an empty code body line or macro terminator.
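
The matching rule described above can be sketched in Python. This is only an illustration under stated assumptions, not the STAGE2 algorithm itself: the names is_balanced and match_template are hypothetical, the sketch handles a single template rather than the processor's full set of templates, it tries the shortest parameter string first, and its nesting check is slightly stricter than the "equal numbers" wording above.

```python
def is_balanced(s, left="(", right=")"):
    """True if brackets in s nest properly (assumption: a right bracket
    may never appear before its matching left bracket)."""
    depth = 0
    for ch in s:
        if ch == left:
            depth += 1
        elif ch == right:
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def match_template(template, text, flag="$"):
    """Return the list of parameter strings, or None if text does not match.

    text is the input line with any "#" terminator already removed.
    """
    literals = template.split(flag)  # n flags give n + 1 literal pieces

    def walk(pos, index, params):
        literal = literals[index]
        if not text.startswith(literal, pos):
            return None
        pos += len(literal)
        if index == len(literals) - 1:
            # Last literal piece: the whole line must have been consumed.
            return params if pos == len(text) else None
        # Try the shortest balanced candidate first, then longer ones.
        for end in range(pos, len(text) + 1):
            candidate = text[pos:end]
            if is_balanced(candidate):
                result = walk(end, index + 1, params + [candidate])
                if result is not None:
                    return result
        return None

    return walk(0, 0, [])


print(match_template("MEM[$]=$", "MEM[EQUATION]=A=2*(B+C)"))
print(match_template("MEM[$]=$", "MEM[ABC)]=UNBALANCED STRING"))
```

The first call prints the two parameter strings "EQUATION" and "A=2*(B+C)"; the second prints None because the only possible first parameter, "ABC)", is not balanced.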

Strings for a successful match:

     MEM[25]=TWENTY FIVE#          Parameter 1 = "25"
                                   Parameter 2 = "TWENTY FIVE"
                                   Parameters 3-9 = ""
     MEM[ABC]=HELLO#               P1 = "ABC"       P2 = "HELLO"
     MEM[EQUATION]=A=2*(B+C)#      P1 = "EQUATION"  P2 = "A=2*(B+C)"
     MEM[X=Y]=Z#                   P1 = "X=Y"       P2 = "Z"

Strings for a match failure:

     MM[12]="E" MISSING#
     MEM [XYZ]=SPACE AFTER "MEM"#
     MEM[ABC)]=UNBALANCED STRING#
     MEM[ABC]=UNBALANCED (STRING#

MACRO EXAMPLE:

This macro may be used to print information stored in the memory.

     PRINT MEM[$]#      4. Template line
     !10=!11!F14%       5. Extract info and output
     %                  6. End of macro

The template in line 4 contains 1 parameter flag which represents the string which will be used to access the memory.

The first escape character in line 5 is followed by a non-zero digit, "1", which is taken to be a reference to parameter string 1. The digit, "0", following the parameter reference is a conversion code (0-8 allowed). Conversion "0" copies the parameter string unchanged to the constructed line. The constructed line can be thought of as a scratch string which is empty at the start of a code body line scan. In summary the three characters "!10" instruct STAGE2 to append parameter 1 to the constructed line.

The next character in line 5 is a literal "=" which is appended to the constructed line.

Next is another escape character followed by the digit "1". This is another reference to parameter string 1. This time, however, the conversion digit is a "1" which instructs STAGE2 to append information from the memory to the constructed line using the specified parameter string for access. At this point 3 items have been appended to the constructed line: parameter string 1, "=", and a string from the memory.

The next character in line 5 is another escape. This time, however, the following character is non-numeric indicating a processor function request. The next character, the digit "1", specifies the output function. The following digit, "4", specifies the output channel.
"!F14" causes output of the constructed line to channel 4. When the channel number is ommitted output is to channel 3 by default. Processing of line 5 is now complete and line 6 terminates the macro. Strings for successful match: PRINT MEM[25]# Channel 4 output = "25=TWENTY FIVE" PRINT MEM[ABC]# Ch4 = "ABC=HELLO" PRINT MEM[PDQ]# Ch4 = "PDQ=" nothing stored previously MACRO EXAMPLE: This macro will also display information stored in the memory but formatted into fields. FORMAT MEM[$]# 7. Template line !11!26% 8. Extract info !F14% 9. Output 1111111= 22222222222222222% 10. Format % 11. End of macro As in line 5, the "!11" in line 8 appends information from the memory to the constructed line using parameter 1 for access. Then "!2" is a reference to parameter 2. The following conversion digit, "6", instructs STAGE2 to copy the constructed line into the specified parameter. This also results in clearing the constructed line to null. Processing of line 8 is complete as the end of line character comes next. It is possible, however, to have more operations appear in that same code body line (i.e. "!11!26 !F14#"). The character after the 6 in this alternative is ignored so a space is shown. At this point parameter 1 still has the string resulting from the template match and parameter 2 contains the string extracted from the memory using parameter 1 for access. The output request in line 9 is like that of line 5 except that the constructed line is empty (null). This condition instructs STAGE2 to use the following code body line as a formatting template which should not be confused with macro templates. Fields of numeric characters refer to corresponding parameter strings. Non-numerics in the formatting template appear in the output line as is. Parameter strings are inserted into the fields left justified (leading blanks are not suppressed) and blank filled or truncated on the right depending on parameter length and field width. 

Strings for successful match:

     FORMAT MEM[25]#      Ch4 = "25     = TWENTY FIVE      "
     FORMAT MEM[ABC]#     Ch4 = "ABC    = HELLO            "
     FORMAT MEM[PDQ]#     Ch4 = "PDQ    =                  "

EXAMPLE RUN:

FILE "MEMORY.INP"

     #$%!0 (+-*/)                   0. Special character selection
     MEM[$]=$#                      1. Template line
     !F3%                           2. Store into memory
     %                              3. End of macro
     PRINT MEM[$]#                  4. Template line
     !10=!11!F14%                   5. Extract info and output
     %                              6. End of macro
     FORMAT MEM[$]#                 7. Template line
     !11!26%                        8. Extract info
     !F14%                          9. Output
     1111111= 22222222222222222%    10. Format
     %                              11. End of macro
     END#                           12. Template line
     !F0%                           13. Terminate processing
     %%                             14. End of macros
     MEM[25]=TWENTY FIVE#
     MEM[ABC]=HELLO#
     MEM[EQUATION]=A=2*(B+C)#
     MEM[X=Y]=Z#
     MM[12]="E" MISSING#
     MEM [XYZ]=SPACE AFTER "MEM"#
     MEM[ABC)]=UNBALANCED STRING#
     MEM[ABC]=UNBALANCED (STRING#
     PRINT MEM[25]#
     PRINT MEM[ABC]#
     PRINT MEM[PDQ]#
     FORMAT MEM[25]#
     FORMAT MEM[ABC]#
     FORMAT MEM[PDQ]#
     END#

Note: The "#" characters shown at the end of the input lines are optional in the CP/M implementation, as carriage return is sufficient for an end of line condition. The special character, when used, is the same terminator used for macro template end of line. It can be used if it is desired to allow comment information in the source input stream.

COMMAND LINES:

     STAGE2 CH3,CH4=MEMORY.INP
     TYPE CH3
     TYPE CH4

FLAG LINE:

The first line read by STAGE2 is used to specify the user's special symbol selections.

     column    description

        1      End of template and source input lines
        2      Template parameter flag
        3      End of code body line
        4      Escape character for code body (parameter or function ref.)
        5      The character for zero
        6      Space character for formatted output
        7      Open bracket (arithmetic expressions and balanced strings)
        8      Addition operator
        9      Subtraction operator
       10      Multiplication operator
       11      Division operator
       12      Closing bracket (to match #7)

PARAMETER CONVERSIONS:

There are a maximum of ten parameters, numbered 0 through 9. Parameter 0 is a special case. There are nine possible parameter conversions, numbered 0 through 8.
Most of this discussion will refer to specific parameters but the remarks apply generally to parameters 1 through 9. In the descriptions below, CL stands for the constructed line and P1, P2, ... for parameter strings 1, 2, ...

     !10    Append parameter string 1 to the constructed line.

     !20    Append P2 to the CL.

     !11    Append MEM(P1) to the CL. Using P1 for access, append the value of the symbol to the CL.

               CASE
                  ( P1 = null )       Generate error message and trace back
                  ( P1 undefined )    Append null to the CL
                  ( otherwise )       Append MEM(P1) to the CL
               fin

     !12    Similar to conversion 1 except when P1 is undefined.

               CASE
                  ( P1 = null )       Generate error message and trace back
                  ( P1 undefined )    MEM(P1) = S1 ; define from symbol generator
                                      S1 = S1 + 1  ; increment symbol generator
                                      Append MEM(P1)
                                      fin
                  ( otherwise )       Append MEM(P1) to the CL
               fin

     !13    Useful only in conjunction with context-controlled iteration. (Described after !17)

     !14    Append EVAL(P1) ; Evaluate the parameter string as an arithmetic expression and append a string of digits to the CL to represent the result. Non-numeric items in the expression will be taken as symbols for memory reference. An undefined symbol is treated as zero. If null, P1 will be treated as zero. A symbol with a non-numeric value will cause an error message and traceback.

     !15    Append LEN(P1) to the CL ; Append a string of digits to the CL to represent the length of the parameter string. A null string results in a single zero digit.

     !16    P1 = CL, CL = null ; Copies the CL into parameter 1, replacing whatever might have been there before. Also, the CL is cleared. The character immediately following "!16" will be ignored. If it is the end of code body line character, processing will continue on the following line. Otherwise, processing will continue with the next character. When used inside of an iteration loop the string placed in the specified parameter is not retained from one iteration to the next or after exit from the loop.

     !17    This starts a context controlled iteration loop.
            The current value of the specified parameter is saved as the iteration process will supply new values for the parameter. The original value will be restored after exit from the loop.

            The CL is scanned for break characters which are specified following the digit "7". All of the characters up to the end of line character will be used as break characters. If no break characters are specified the CL scan is broken on each character. When a break character or the end of the CL is reached, scanning stops and the scanned string (excluding the break character) is copied into the specified parameter. The scanned string and break character are deleted from the CL. Break characters enclosed in brackets will not be recognized as the scanned string would not be balanced.

            After scanning stops, code body lines are expanded within the loop which ends at an "!F8". After all lines within the loop have been processed, scanning of the CL continues unless the CL is null. When the CL is null the iteration loop is terminated.

     !F8    Processor function to define the scope of an iteration loop.

     !13    Append BREAK(P1) to the CL ; The break character is the single character immediately following the specified parameter which represents a substring of the line being scanned. When the end of line is reached, the break character is null.

     !18    Append a string of digits to the CL to represent the internal storage code for the character in P1. Unless P1 contains exactly one character an error message and traceback will result.

SYMBOL GENERATOR:

     !0     Parameter "0" is a reference to an internal symbol generator. Within a given macro expansion up to ten unique symbols (actually integers or strings of digits) are available; "!00" through "!09". After the macro expansion is complete the symbol generator is incremented so that future macro expansions will get different symbols.

PROCESSOR FUNCTIONS:

There are eleven processor functions, numbered 0 through 9 and E.
Some processor functions assume use of specific parameters.

     !F0    Terminate processing.

     !F1    Output request. The output request must appear at the end of a code body line. The CL is output if it is not null. If it is null the following code body line will be used as a format specification for output. A format line specifies exactly the number of characters in the line to be output. Non-numeric characters in the format specification are output exactly as they are. Parameter fields in the format specification are denoted by strings of identical digits. "22222" is a five character field into which parameter string 2 will be inserted, left justified with blank fill or truncation on the right as required. A given parameter may be referenced for more than one field in the formatted line.

               !F14     Output the CL to channel 4.
               !F1      Output to channel 3 as default channel.
               !F12R    Output to channel 2 after rewind.

     !F2    Change I/O channels and copy text from the specified input channel to the specified output channel. If P1 is null no copying takes place. Copying continues up to an input line whose initial substring matches P1. The line which matches P1 is ignored, copying stops and the input channel is positioned to the line following the matched line. If no match line is found end of file terminates the copy.

               WHEN ( input channel specified )
                  make it the new current input (CI) channel
               fin
               ELSE
                  the current input channel number is unchanged

               WHEN ( output channel specified )
                  copy to the specified channel
               fin
               ELSE
                  copy to channel 3

               5!F2      CI = 5 , out is 3
               2R!F2     CI = 2 , Rewind 2 , out is 3
               !F24      CI unchanged , out is 4
               2!F23R    CI = 2 , out is 3 , Rewind 3 before copy

            In all cases no copy takes place if P1 is null.

     !F3    MEM(P1) = P2 ; Using parameter string 1 for access, store parameter string 2 into memory. (i.e. the value of P1 is defined to be P2). If P1 is null an error message and traceback result.

     !F4    Set the skip counter unconditionally. The skip counter applies to macro code body lines.
            The skip feature allows conditional expansion of portions of a macro code body. Parameter string 1 is evaluated as an arithmetic expression (see conversion 4 description) and the result is placed in the skip counter.

               SKIP = EVAL(P1)

     !F5    Set skip counter based on string compare for equality. The test condition is specified by a character following "!F5".

               !F50    IF ( P1 == P2 ) SKIP = EVAL(P3)
               !F51    IF ( P1 <> P2 ) SKIP = EVAL(P3)

     !F6    Set skip counter based on the relative values of 2 arithmetic expressions. The test condition is specified by a character following "!F6".

               !F6-    IF ( P1 <  P2 ) SKIP = EVAL(P3)
               !F60    IF ( P1 == P2 ) SKIP = EVAL(P3)
               !F61    IF ( P1 <> P2 ) SKIP = EVAL(P3)
               !F6+    IF ( P1 >  P2 ) SKIP = EVAL(P3)

     !F7    Count-controlled iteration. The CL is evaluated as an arithmetic expression (see conversion 4 description) and the resulting value is placed in an iteration counter. The loop, which ends at an "!F8", is repeated with the iteration counter decremented for each iteration. The loop terminates when the counter reaches zero.

     !F8    Defines the scope of count-controlled loops and context-controlled loops. Loop nesting is permitted. Skipping out of loops is permitted. Skipping over entire loops is tricky business (see Waite's book, page 398).

     !F9    Terminates expansion of the current macro.

     !FE    Force an error message and traceback. All macro calls to the current level will be output in reverse order to channel 4. The last traceback line is the current input line.

-----------------------------------------------------------------------

STAGE2 PROGRAM:

This is a highly simplified description of the STAGE2 algorithm.

     PROGRAM;

     PROCEDURE MATCH ( STRING );
     BEGIN
        attempt to match STRING against macro templates
        IF MATCH_SUCCESSFUL THEN
           FOR each line of macro code body DO
              BEGIN
                 scan code body line and perform operations
                 IF CONSTRUCTED_LINE <> NULL THEN
                    MATCH ( CONSTRUCTED_LINE );   {note recursive call}
              END
        ELSE
           output STRING to channel 3
     END;

     BEGIN { ---------------- program starts here ------------ }
        INPUT_FLAG_LINE    {gets special character definitions from
                            the first line input from channel 1}
        INPUT_MACROS       {reads macro code bodies into memory from
                            channel 1 and builds templates into tree
                            structure for the template matching
                            algorithm}
        INPUT_NEXT_LINE    {gets first source line from input file
                            (channel 1) - this is the first line
                            following the last macro}
        WHILE NOT END_OF_FILE DO
           BEGIN
              MATCH ( LINE );   {attempt to match the line against
                                 all macro templates}
              INPUT_NEXT_LINE
           END;
     END.

Except for switching channels or rewinding channels, the STAGE2 user has no control over input. The processor has a built in loop for input as can be seen in the above program. Output, however, is under user control through macro body processing. If STAGE2 fails to match an input line against a macro template the line will be output to channel 3 as is and the processor will go on to the next input line.
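
The control flow of the program above, in particular the recursive call on constructed lines and the pass-through of unmatched lines to channel 3, can be tried out with a small Python model. Everything in it is an assumption made for illustration: the names match, run, statement_split and statement_single are hypothetical, macros are tried in list order rather than through STAGE2's template tree, bracket balancing is ignored, and each macro body is reduced to a function that returns its constructed lines. The three macros mimic the STATEMENT macros of the WHILE example; no PROCESS macros are defined, so PROCESS lines fall through to channel 3.

```python
def match(string, macros, channel3):
    """Expand the first macro whose matcher accepts string, else pass it on."""
    for matcher, body in macros:
        params = matcher(string)
        if params is not None:
            for constructed in body(params):
                if constructed:                           # non-null constructed line
                    match(constructed, macros, channel3)  # note recursive call
            return
    channel3.append(string)       # no template matched: output to channel 3


def statement_split(separator):
    """Matcher standing in for 'STATEMENT $:$' or 'STATEMENT $;$' (simplified)."""
    def matcher(string):
        if not string.startswith("STATEMENT "):
            return None
        head, found, tail = string[len("STATEMENT "):].partition(separator)
        return (head, tail) if found else None
    return matcher


def statement_single(string):
    """Matcher standing in for 'STATEMENT $'."""
    if string.startswith("STATEMENT "):
        return (string[len("STATEMENT "):],)
    return None


MACROS = [
    (statement_split(":"), lambda p: ["PROCESS " + p[0], "STATEMENT " + p[1]]),
    (statement_split(";"), lambda p: ["PROCESS " + p[0]]),
    (statement_single, lambda p: ["PROCESS " + p[0]]),
]


def run(source_lines, macros):
    """Built-in input loop: feed every source line to match()."""
    channel3 = []
    for line in source_lines:
        match(line, macros, channel3)
    return channel3


for out in run(["STATEMENT A=1:B=2;note", "HELLO", "STATEMENT C=3"], MACROS):
    print(out)
```

The first source line is split recursively into two PROCESS lines, the comment after ";" is dropped, and "HELLO" matches nothing, so it is copied to the output unchanged.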