
Building a Scripting Engine-Compiler

					Let's Build a Scripting Engine-Compiler
Blunt Axe Basic Project Bxbasic
By: S. Arbayo

Alpha Version 2.0 Updated Edition

Copyright :(c) sarbayo 2001-2009


Preface
Welcome to Bxbasic
What is Bxbasic ?
Bxbasic is presented as a programming tutorial, to develop and construct a Console Mode Scripting Engine and Byte Code Compiler. The Bxbasic dialect, included here, is a subset of the GW-Basic and QBasic programming languages.

How did Bxbasic get started ?
Well, this whole thing started a few years back, when a number of programmers who frequented the Rapid-Q developer’s group site, myself included, decided that we just didn't like what was going on out there in the realm of Rapid-Q and Basic programming in general. So, we set out to form our own programmer’s group, that being the QDepartment group. A number of us were toying around with the various dialects of Basic currently available and we just weren't satisfied with what we were seeing.

Microsoft long ago abandoned QBasic/QuickBasic entirely, a language many of us got a great deal of enjoyment programming in. Some of us program as an occupation and others just for pleasure. At any rate, Visual Basic (VB) costs an absolute fortune and some of the other alternatives just don't have the 'Touch and Feel' of QBasic. Not that QBasic is the greatest dialect or language ever written, by any means, but (IMHO) it's just a fun environment to program in. Unfortunately, QBasic (and Quick Basic 4.5) are still relegated to the world of 16 bit programming.

When I first got involved with this, I was just looking for a tool that I could use to recompile some old Quick Basic programs I had written so that they could run in a 32 bit environment. Like so many others, I experimented with the various dialects of Basic now available, with mixed results. Some Basics claimed to be nearly QBasic, others claimed to need only minimal rewriting. Some were available for 'free' while others cost quite a hefty chunk. Some of the so-called 'free' ones ended up really being crippled or minimal versions of a commercially available full featured product. Rapid-Q, one that I thought I liked (and so did a lot of other people), ended up requiring a full 50% rewrite before I could get any pre-existing code to run on it. And then, it didn't do all the things I needed it to do.

One day, someone suggested that maybe we should try to develop our own version of QBasic. If we could do that, then we would control what it did, what it didn't do and what it might do. So, after spending a few years learning what you need to know about writing interpreters and compilers, I began writing Bxbasic. From that point forward, in my spare time, I’ve been slogging through it: writing code, finding out what worked and what didn't, and finding out why not. Little by little, I started putting together the beginnings of a QBasic like scripting engine (interpreter). I needed to start with what I considered the most rudimentary aspects of the "Console Mode" GW-Basic and QBasic dialects. Anyway, I figured I'd start with a "Console Mode" scripting engine-compiler and work up gradually to bigger and better things. I'd add features a little at a time and eventually build this into something I might actually be able to use. That's where we are now.


CHAPTER - 1
INTRODUCTION
I recently completed translating Jack Crenshaw's now famous (although never completed) "Let's Build a Compiler" series, written in Turbo Pascal for the Motorola 68000 CPU, over to Ansi-C, re-targeted for the Intel x86 CPU. Having done that, I became inspired to rewrite my own tutorial series 'Blunt Axe Basic' in a similar format.

A word about VM: If you've already read a book, an article or someone else's tutorial about compilers and you already know about Virtual Machines (VM), then you may already know more about compilers than I do. If you want to learn about VM, then this is not the article you should be reading. Except for what you see here, the term VM does not appear in this tutorial. It is beyond the scope of this tutorial to get into a discussion of Virtual Machines, stacks or registers.

This is a very hands-on approach to crafting a compiler and it does not follow any pre-defined set of rules. When we need a new function or component, we will build (hack) that function or component as we need it, modifying or rewriting the compiler as we go. It is simply my intent to take some of the mystery out of what goes into making a working scripting engine / compiler. That said, here goes.

The HACK
The Hacker's approach of writing a Scripting Engine and Compiler
Welcome to what I hope is a very low tech approach to learning C, C++, scripting engines, interpreters, compilers and x86 assembly language. It is not required that you already know C programming, but you will need an Ansi-C compiler. Don't worry, they can be gotten cheaply or for free; more on that in a moment. The only assumption I'll make is that you know how to use a C compiler and/or you've read the compiler's documentation.

I began this project in early 2001. At that time I was an absolute beginner in C programming. Even though I've been coding since the early 1980's, none of the programs I had written, and none of the languages I had coded in, prepared me for programming in 'C' and beginning this project. I intended this as a tutorial, for the purposes of self-teaching and possibly enlightening others who would like to know how to design and construct their own Scripting Engines and Compilers. Primarily, this project is a hobby, but for years I've wanted to construct my own programming tools and then one day I just decided to do it.

I was thinking that someone might be asking: "Why a Scripting Engine-Compiler and not just a Native Code Compiler?" Well, that's a valid question. I'd like to answer that with the help of a little pro and con.


Native Code Compiler:
Pros:
• generally results in very fast executables, because the output is Assembly Code which is then assembled directly into the CPU's native language,
• small, compact programs, because each instruction in the source file is reduced directly into native code.

Cons:
• can require a considerable knowledge of assembly language on the part of the writer,
• the output is assembly language which is targeted to a specific CPU and Operating System,
• without considerable effort, they are generally not cross-platform or transportable,
• depending on the programmer’s level of skill, it can take a long time to write.

Scripting Engine-Compiler:
Pros:
• if designed properly, can offer very good performance,
• can be written using just about any compilable language,
• when written in C/C++ they can easily be made cross-platform,
• scripted languages are in very wide usage, such as HTML, JavaScript and Perl, to name only a few,
• can be readily used by programmers who simply wish to obfuscate their Basic, HTML, JavaScript or other code from prying eyes.

Cons:
• not as fast as Native Code, because each instruction must be interpreted,
• a larger executable size, because the script, after being reduced to a byte code, is appended to the engine.

I'm sure I haven't covered all the pros and cons, but, this covers some of the main concerns.

BXBASIC
In this chapter we will lay out the ground work for our engine by constructing some of the basic components we will need, like source file input, display output and error handling. Then we'll begin to define our language by setting up some preliminary parameters and syntax requirements. When the foundation is laid out, we'll construct a rudimentary scanner and parser and begin executing our first programs.

As a prototype language I've chosen to use Basic as a base, beginning with the GW and QBasic models. Why Basic ? Just about everyone who programs knows Basic and how it should behave under most circumstances. Despite the fact that there are dozens of new languages out there these days, Basic is by no means an obsolete language nor is it a language only for beginners. Besides, as I've learned, once you have the basic principles of a given language down, you can easily retarget it to any other language of your choosing by simply redefining the language parameters, which as you will see is not that hard to do.

I coined the term "blunt axe basic" and call this language Bxbasic (which I refer to here simply as Bxb). It is being written in Ansi-C and compiled and tested using two different C compilers: Power C and LccWin32.
Information on Power C (a commercial product) can be found at: http://www.mixsoftware.com
Information on LccWin32 (free, download) can be found at: http://www.cs.virginia.edu/~Lcc.win32/


There is one header file that LccWin32 requires for this project but doesn't come with: tcconio.h. This is an add-on and may be easily downloaded from: http://tech.groups.yahoo.com/group/QDepartment/files/

GETTING STARTED
Okay, rather than getting bogged down in Backus-Naur Form and the technical aspects of compiler design and language theories (not that they aren't important to know), I'll just leave that for you to do on your own, take a cue from Jack Crenshaw and use his "KISS" method (Keep It Simple Sidney). We'll just jump right in and start building our engine and we'll learn as we go. Additionally, rather than simply coding fragments of our compiler and hopping from topic to topic, I'll try to stay on track and keep designing and adding new components to our engine. We'll keep building onto a working machine, making improvements, fixing bugs and adding new functionality. By the end of this first chapter we'll have a rudimentary, but functioning, Scripting Engine.

Before we begin, I’d like to mention that if you are new to C and this is perhaps your first project, please feel free to ask any questions, whether they are Bxb or C/C++ related. Either I or someone else in the QDepartment group will be glad to assist you in any way we can, at: http://tech.groups.yahoo.com/group/QDepartment/

The first thing we need is to create the basic skeleton of our Bxbasic program, which of course will be written in C. That means we need to make some declarations for our C compiler by declaring the headers we will be needing. The headers are files that contain definitions and routines that the C compiler needs to compile our program. Copy these lines of code to your favorite editor or IDE:

/* bxbasic.c.v01 */

/* #define Power_C */           /* Power-C version  */
#define LccWin32                /* LccWin32 version */

/* --- declare headers --- */
#include <stdio.h>
#include <conio.h>
#include <io.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <malloc.h>
#ifdef Power_C
    #include <bios.h>
#endif
#ifdef LccWin32
    #include <tcconio.h>
#endif


Save this file as:

bxbasic.c

But, keep it open because we're going to keep adding to it.

**NOTE:
If you are using an IDE:
In your compiler's IDE (Lcc), under the heading "Projects", create a project named Bxbasic and include Bxbasic.c in that project. Bxbasic.c will be the only file “INCLUDED” in the project throughout this series.

DO NOT “INCLUDE” the additional files we create into the project. Doing so will generate compiler errors!

Notice this line of code in the above:

#define LccWin32

This acts like a switch, in that, by defining LccWin32 we are turning that switch ON. We are in effect telling the compiler's preprocessor that the name LccWin32 exists; you can think of it as a flag whose value is set to TRUE. The last few lines of the above code act much like an IF-Then-EndIF statement. Where; IF: LccWin32 is defined, (i.e. TRUE), Then: include file: tcconio.h, EndIF. Otherwise, we would have defined Power_C. If you are using another compiler, then use the name of that compiler instead; I'll explain why later.

Okay, that done, let's define some constants. Globally defined constants are fixed values, as far as the compiler is concerned, whose values we may want to change at some future point. But, rather than trying to track them all down wherever they are and change them throughout the program, they need only be changed here, once. All constants must be declared prior to the point where they are used within the program. In fact, unlike GW-Basic, which allows you to declare variables as you use them, C requires that all variables be declared beforehand. Add these lines of code just beneath our includes:

/* --- declare constants --- */
#define BUFSIZE 81
#define LINE_NUM 6
#define TOKEN_LEN 21
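To see the switch and the named constant in isolation, here is a small sketch that is separate from bxbasic.c. The helper names console_header and max_line_chars are made up for this illustration; only LccWin32, Power_C and BUFSIZE come from the listings above.

```c
#define LccWin32            /* turn the LccWin32 switch ON              */
#define BUFSIZE 81          /* named constant: change it here, once     */

/* Hypothetical helper (not part of bxbasic.c): reports which console
   header the switch above would cause to be included.                 */
const char *console_header(void)
{
#ifdef Power_C
    return "bios.h";        /* compiled only when Power_C is defined    */
#elif defined(LccWin32)
    return "tcconio.h";     /* compiled only when LccWin32 is defined   */
#else
    return "none";          /* neither switch is ON                     */
#endif
}

/* Hypothetical helper: the most characters a BUFSIZE-byte buffer can
   hold once room is left for the '\0' terminator.                     */
int max_line_chars(void)
{
    return BUFSIZE - 1;
}
```

With the switch set as shown, console_header() returns "tcconio.h" and max_line_chars() returns 80. Comment out the LccWin32 line and define Power_C instead, and the same source selects "bios.h" with no other change.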

Now we need to declare some global variables that we will be using. Add these lines of code to bxbasic.c:


/* ------ global vars ------------ */
FILE *f_in, *f_out;         /* these are the i/o file handles      */
char *prog_name;            /* program source-file name            */
char p_string[BUFSIZE];     /* file input string                   */
char **array1;              /* pointer to program array            */
char **array2;              /* pointer to line number array        */
char t_holder[20];          /* token data holder                   */
char s_holder[BUFSIZE];     /* xstring (print) data holder         */
char token[TOKEN_LEN];      /* the token string                    */
char *xstring;              /* the print string                    */
int nrows;                  /* numbers of lines in source file     */
int ncolumns=BUFSIZE;       /* dimension for array1[][columns]     */
int line_ndx;               /* current execution line              */
int s_pos, e_pos;           /* pointers to start & end of token    */

The above lines declare variables whose scope is global, versus locally declared variables whose scope exists only within the function in which they are declared. These variables are accessible from all functions throughout the program. As with all variables, global variables must be declared prior to their usage. In C, a variable declared inside a function is local to that function; a variable declared outside of any function, as these are, is global. To the side of each variable is a description of its intended purpose. Two are file handles, several are character strings, character arrays and pointers, and the rest are integers.

Now we need to declare some "function prototypes" for the functions and subroutines we will be constructing in this chapter. Add these lines, beneath our global vars (variables):

/* ----- function prototypes ----- */
void a_bort(int,int);
void line_cnt(char *argv[]);
void program_array(void);
void pgm_parser(void);
void get_token(void);
void parser(void);
void beep(void);
void cls(void);
void xstring_array(void);
void get_prnstring(void);
void go_to(void);

Prototypes are very much like declaring a variable or a constant. Most versions of C require that all functions be declared (or prototyped) prior to their usage. Example: void a_bort(int,int) a_bort is the function name. The word "void" in front of the name signifies that this function does not return a value to the calling process. (int,int) signifies that 2 integer values are passed to function a_bort. Thus "void a_bort(int,int)" returns no value and it receives 2 integers upon entry. Okay, our basic skeleton is done, for the moment. We'll make changes and add to it as we progress.
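As a small illustration of why prototypes matter, the sketch below calls two functions before their bodies appear in the file. The names note_error, last_code and report are hypothetical and are not part of bxbasic.c.

```c
/* prototypes: name, return type and parameter types only */
void note_error(int, int);
int  last_code(void);

static int saved_code = 0;      /* remembers the most recent error code */

/* report() uses both functions BEFORE their bodies appear below.
   The prototypes above are what make that legal.                  */
int report(int code, int line_ndx)
{
    note_error(code, line_ndx);
    return last_code();
}

/* returns nothing (void), receives 2 integers */
void note_error(int code, int line_ndx)
{
    (void)line_ndx;             /* unused in this sketch */
    saved_code = code;
}

/* returns an integer, receives nothing */
int last_code(void)
{
    return saved_code;
}
```

If the two prototype lines are removed, a compiler that enforces declarations before use will complain at the calls inside report().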


THE MAIN PROGRAM
Unlike C, Basic programs generally begin at the first executable line of code, such as line #1. C programs, on the other hand, always begin with the first executable line of code within function "main" (all lowercase, since C is case sensitive), and function "main" need not be at the beginning of the program. In fact, it is quite common to find main at the very end of the program source listing. If no function main exists, the compiler (strictly speaking, the linker) will probably throw a fit. For now, since we can't compile a single line of code and test it without main, our function main will be at the beginning of our program. Now add function "main" to bxbasic.c:

/* ----- begin program ------------- */
int main(int argc, char *argv[])
{
    printf("BXBasic Interpreter\n");
    return 0;
}
/*-------------------------------*/

That done, compile what we have so far, to make sure all is working as expected. **(if you are having trouble at this point, feel free to ask questions). Hopefully, it compiled without any errors. Let's do a bug test: from the DOS prompt, execute:

bxbasic.exe

and press [Enter].

**(if you are unfamiliar with using DOS or the Command Line, just ask.) Hopefully,......it executed properly.

ERROR HANDLER
Okay, to make sure the program is executing in a predictable manner and rather than just crashing when it encounters a problem, we need to add an abort routine that will, at the very least, tell us what might have gone wrong, such as forgetting to add the name of our Bxbasic program on the command line. Add the following code for function a_bort to our program, beneath "main":


void a_bort(int code,int line_ndx)
{
    switch(code)
    {
        case 1:
            printf("Unspecified Program Name.\n");
            printf("Enter:\"bxbasic program_name.bas\"\n");
            printf("code(%d)\n",code);
            break;
        default:
            printf("Program aborted, undefined error.");
            break;
    }
    exit(1);
}
/*-------------------------------*/

Now let's alter our function "main" so that it reads as follows:

int main(int argc, char *argv[])
{
    int x=0, ab_code=1;

    printf("BXBasic Interpreter\n");
    if(argc != 2)
    {
        a_bort(ab_code, x);
    }
    return 0;
}
/*-------------------------------*/

Now recompile Bxbasic.c. Let's test this by executing Bxbasic from the command line, again. Just type:

Bxbasic
and press [Enter].

Well, function "a_bort" has given us an error message, something about needing to add the Basic program name and it tells us that this was error code number (1). Now type:

Bxbasic test.bas
and press enter. This time, no error message! The key is in the parameters passed to function main: "(int argc, char *argv[])". This refers to command line arguments passed to the program by the operating system. "argc" is the number of arguments and "argv[]" is a string array that holds the separate arguments. The line: if(argc != 2) is where the program determines if a "basic" program name was specified on the command line, such as:


C:\> Bxbasic test.bas

If none was specified, the program terminates by calling the error handler, function "a_bort", passing to it the proper abort-code.
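The argument test can also be looked at on its own. The sketch below is not part of bxbasic.c; source_name is a hypothetical helper that applies the same "argc must be 2" rule and hands back the file name instead of aborting.

```c
#include <stddef.h>     /* NULL */

/* Hypothetical helper (not part of bxbasic.c): returns the source file
   name given on the command line, or NULL when the argument count is
   not exactly 2 (the program's own name plus one file name).          */
const char *source_name(int argc, char *argv[])
{
    if (argc != 2)
    {
        return NULL;    /* this is where main calls a_bort(ab_code, x) */
    }
    return argv[1];     /* argv[0] is the program's own name           */
}
```

For the command line "Bxbasic test.bas", argc is 2 and argv[1] is "test.bas". For "Bxbasic" alone, argc is 1 and the helper returns NULL.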

LOADING THE PROGRAM - 1
Now that we can specify our source file's name, we need a method of reading-in our source code. What I'd like to do is to be able to read the entire source file into memory, into an array, with each array element holding an entire line of code. To do this we need to know how wide and how long our program array has to be. We have already established that our program lines will be a maximum of 80 characters in width, so that's one dimension of our array. Now we need to know how large it has to be, i.e.: how many lines it has to hold. So, let's start off by counting them. We'll use function "line_cnt" to do that, here it is:

void line_cnt(char *argv[])
{
    int line_counter=0, ab_code=2;
    int fnam_len, x=0;

    fnam_len = strlen(argv[1]);
    fnam_len++;                              /* room for the '\0'         */
    prog_name = malloc(fnam_len * sizeof(char));
    strcpy(prog_name, argv[1]);
    f_in = fopen(prog_name,"r");             /* does program_name.bas exist */
    if(f_in == NULL)                         /* file not found            */
    {
        a_bort(ab_code, x);
    }
    else
    {
        while(!feof(f_in))                   /* until EOF, read-in and    */
        {
            fgets(p_string, BUFSIZE, f_in);  /* count each line           */
            if(!feof(f_in))
            {
                line_counter++;
            }
        }
        fclose(f_in);
    }
    nrows=line_counter;
}
/*-------------------------------*/
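An alternative way to write the counting loop, shown here only as a sketch, is to test the return value of fgets directly rather than calling feof. The function name count_reads and its FILE * parameter are hypothetical; the listing above remains the version this tutorial uses. One difference worth knowing: with the feof test as listed, a final line that does not end in a newline is not counted, while the loop below does count it.

```c
#include <stdio.h>

#define BUFSIZE 81

/* Hypothetical variant of the counting loop: fgets returns NULL when
   nothing more can be read, so its return value ends the loop.       */
int count_reads(FILE *fp)
{
    char buf[BUFSIZE];
    int  count = 0;

    while (fgets(buf, BUFSIZE, fp) != NULL)
    {
        count++;        /* one successful read = one program line */
    }
    return count;
}
```

As in line_cnt, a source line longer than 80 characters takes more than one read, so it is counted more than once.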

Copy function "line_cnt" to bxbasic.c and place it just under function "a_bort". "line_cnt" receives the source file name as a parameter, within the parenthesis, in the statement: char *argv[] On attempting to open the source file, if the filename is not found or the file can't be opened, "a_bort" is called with the appropriate error code. So that we can test this, modify the code for functions "main" and "a_bort" so that they read as follows:


int main(int argc, char *argv[])
{
    int x=0, ab_code=1;

    printf("BXBasic Interpreter\n");
    if(argc != 2)
    {
        a_bort(ab_code, x);
    }
    strncpy(t_holder, argv[1], sizeof(t_holder) - 1);  /* t_holder is only 20 bytes, */
    t_holder[sizeof(t_holder) - 1] = '\0';             /* so copy at most 19 chars   */
    line_cnt(argv);
    return 0;
}
/*-------------------------------*/

void a_bort(int code,int line_ndx)
{
    switch(code)
    {
        case 1:
            printf("Unspecified Program Name.\n");
            printf("Enter:\"bxbasic program_name.bas\"\n");
            printf("code(%d)\n",code);
            break;
        case 2:
            printf("Program file:\"%s\" not found.\n", t_holder);
            printf("Enter: \"bxbasic program_name.bas\"\n");
            printf("Program Terminated.\ncode(%d)\n", code);
            break;
        default:
            printf("Program aborted, undefined error.");
            break;
    }
    exit(1);
}
/*-------------------------------*/

Compile bxbasic.c Test it by entering:

Bxbasic test.bas

Okay, now we are able to test for "no source filename provided" and "file not found" situations. So, now we need to create "Test.bas" so that we can attempt to successfully read-in our source file. Let's start with something simple, copy these lines to a new text file:

1  REM test.bas version 1.1
2  CLS
3  PRINT "hello world!"
4  BEEP
5  END


and save it in the same working directory as:

test.bas

Once again, at the command line enter:

Bxbasic test.bas

and press [Enter]. Nothing happened, did it ? Hmm, or did it ? Actually, it did exactly what it was supposed to do. It located test.bas, read it in, counted the number of lines in the file and terminated normally. We just didn't ask it to report to us the number of lines it found. Let’s fix that by adding one line of code, a print statement, to "main":

int main(int argc, char *argv[])
{
    int x=0, ab_code=1;

    printf("bxbasic interpreter\n");
    if(argc != 2)
    {
        a_bort(ab_code, x);
    }
    strncpy(t_holder, argv[1], sizeof(t_holder) - 1);  /* t_holder is only 20 bytes, */
    t_holder[sizeof(t_holder) - 1] = '\0';             /* so copy at most 19 chars   */
    line_cnt(argv);
    printf("nrows=%d\n",nrows);
    return 0;
}
/*-------------------------------*/

Now re-compile and try it.

Bxbasic test.bas

nrows=5        (displayed)

Okay, that seems to be working fine. Now delete that last line we added or /*comment it out*/, we won't need it any more.

**Note: examine the listing for test.bas, we've already begun to define our language.

1  REM test.bas version 1.1
2  CLS
3  PRINT "hello world!"
4  BEEP
5  END

• First, each line begins with a line number, which is located in column 1, at the far left of the line. Many languages, including assembly language, allow for what are referred to as "labels" and labels generally begin at column 1. So, don't think of the numbers as line-numbers, but think of them as labels. In the near future we will make labels optional, i.e.: not every line needs a number. Since in "Standard Basic" line-numbers were required and GW-Basic required them and QuickBasic allowed them, we'll do the same.
• Second, statements are indented, or at least don't begin at column 1. We'll play with this concept further down the road, but for now, we'll assume each line has a "label" which is then followed by a "keyword".
• Third, all "keywords" are in caps (uppercase), that's "Standard Basic".

LOADING THE PROGRAM - 2
Now we know our routine can locate our source file. It can read it in and correctly count the number of lines of source code. Not bad ! Now we know how long our program array has to be. What we need now is a routine that will create our two dimensional program array[ ][ ] and load the source file into memory. Let's call this function "program_array". Here it is:

void program_array()
{
    int ii, len, pi;
    char ch;
    char ln_holder[LINE_NUM];

    array1 = malloc(nrows * sizeof(char *));
    for(ii = 0; ii < nrows; ii++)
    {
        array1[ii] = malloc(ncolumns * sizeof(char));
    }
    array2 = malloc(nrows * sizeof(char *));
    for(ii = 0; ii < nrows; ii++)
    {
        array2[ii] = malloc(LINE_NUM * sizeof(char));
    }
    f_in = fopen(prog_name,"r");
    ii = 0;
    while(!feof(f_in))                      /* p_string holds incoming-data  */
    {
        fgets(p_string, BUFSIZE, f_in);
        if(!feof(f_in))
        {
            len = strlen(p_string);         /* pass p_string to array1[]     */
            strcpy(array1[ii], p_string);   /* (strlen, not sizeof: index    */
                                            /*  81 is past the end of a row) */
            array1[ii][len] = '\0';         /* add string terminator         */
            pi = 0;                         /* ----- fill array2[ii] here -- */
            ch = p_string[pi];
            while(isdigit(ch) && (pi < LINE_NUM - 1))  /* stay inside ln_holder */
            {
                ln_holder[pi] = ch;
                pi++;
                ch = p_string[pi];
            }
            ln_holder[pi] = '\0';
            strcpy(array2[ii], ln_holder);
        }
        ii++;
    }
    fclose(f_in);
}
/*-------------------------------*/

Function "program_array" begins by dynamically creating two arrays; array1 and array2.

Array1: that's our program array and it is dimensioned nrows by ncolumns, that is; a length of 5 rows by a width of 81 columns.
Array2: is a storage area to keep track of our line labels.

Following that, it opens the test.bas source file for input. A while-loop, shown here; while(!feof(f_in)), reads the input file and stores each line in the program array. We use the C fgets function to read-in each line of text from the source file and place it in p_string, which acts as our input buffer. p_string is then copied to the array and a string terminator, '\0', is added to the end. The line labels are collected from the beginning of each line and placed into array2 and then it closes the input file when we've read the last line. Now, add a Call to "program_array", to "main" so that it reads:

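int main(int argc, char *argv[])
{
    int x=0, ab_code=1;

    printf("bxbasic interpreter\n");
    if(argc != 2)
    {
        a_bort(ab_code, x);
    }
    strncpy(t_holder, argv[1], sizeof(t_holder) - 1);  /* t_holder is only 20 bytes, */
    t_holder[sizeof(t_holder) - 1] = '\0';             /* so copy at most 19 chars   */
    line_cnt(argv);
    program_array();
    return 0;
}
/*-------------------------------*/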
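The row-by-row allocation that program_array performs can be sketched on its own. The helpers alloc_rows and free_rows below are hypothetical and are not part of bxbasic.c; they add two things the tutorial listing leaves out for brevity, a check of each malloc result and a matching routine to release the memory.

```c
#include <stdlib.h>

/* free the first 'count' rows, then the table of row pointers */
void free_rows(char **rows, int count)
{
    int ii;

    if (rows == NULL) return;
    for (ii = 0; ii < count; ii++)
    {
        free(rows[ii]);
    }
    free(rows);
}

/* allocate nrows strings of ncolumns bytes each; NULL on failure */
char **alloc_rows(int nrows, int ncolumns)
{
    char **rows;
    int ii;

    rows = malloc(nrows * sizeof(char *));          /* table of row pointers */
    if (rows == NULL) return NULL;
    for (ii = 0; ii < nrows; ii++)
    {
        rows[ii] = malloc(ncolumns * sizeof(char)); /* one row of text       */
        if (rows[ii] == NULL)
        {
            free_rows(rows, ii);                    /* undo the partial work */
            return NULL;
        }
        rows[ii][0] = '\0';                         /* start as empty string */
    }
    return rows;
}
```

For test.bas, a call such as alloc_rows(5, 81) would give five rows, each able to hold an 80 character line plus its terminator, addressed as rows[0] through rows[4].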

Before we compile and test this, let's add a diagnostic print statement to program_array, so that we can see what it's reading in. Near the end of program_array, just after the strcpy into array2, add this print statement:

printf("%s",p_string);

as shown here:

            ln_holder[pi] = '\0';
            strcpy(array2[ii], ln_holder);
            printf("%s",p_string);
        }


Okay, compile Bxbasic.c and run it;

Bxbasic test.bas
If it displayed our source code, then we're doing ok. You can delete that last print line we just added, it’s no longer needed.

PROGRAM PARSER
Now that our program is loaded into memory, the next thing we need is a function to serve as a program parser, to control the program flow. Here's the code for function "pgm_parser", copy this to bxbasic.c :

void pgm_parser()
{
    line_ndx = 0;
    while(line_ndx < nrows)
    {
        s_pos = 0;
        e_pos = 0;
        get_token();
        parser();
        line_ndx++;
    }
}
/*-------------------------------*/

As you can see, "pgm_parser" is a fairly short routine. All it does is create a continuous while-loop of calling two other functions, get_token() and parser(). It does this while the condition exists (true) that the value of variable, line_ndx is less than the value of nrows. After returning from the two routines, the variable line_ndx is incremented and the loop repeats, as long as: line_ndx is still less than nrows. The program terminates normally when line_ndx is equal to nrows. Next, we need to fetch the keyword and store it in "token". Here is the code for function "get_token" :

void get_token()
{
    char ch;
    int pi=0, ti=0, ab_code=3;
    int stlen, x=line_ndx;

    strcpy(p_string, array1[line_ndx]);
    stlen = strlen(p_string);
    ch = p_string[pi];
    while((isupper(ch) == 0) && (pi < stlen))   /* skip to first uppercase  */
    {
        pi++;
        ch = p_string[pi];
    }
    if(pi == stlen)                             /* no keyword on this line  */
    {
        a_bort(ab_code, x);
    }
    while(isupper(ch) && (ti < TOKEN_LEN - 1))  /* copy keyword, but stay   */
    {                                           /* inside token[]           */
        token[ti] = ch;
        ti++;
        pi++;
        ch = p_string[pi];
    }
    token[ti]='\0';
}
/*-------------------------------*/

Add the above code to bxbasic.c. If you recall, p_string was our input buffer while reading in our source file. Well, we are going to be using p_string throughout the program. It will become our general purpose data string. p_string will always hold the current program line information we are working on, at any given moment. Two other variables we will see a lot of are "ch" and "pi" and they will be used in conjunction with p_string. "ch" is a character variable and will hold the current character (thus "ch") we are working with or testing. "pi" is an integer variable and will be the program line "position index" (thus "pi") which will point to the current character position being tested.

Here is where we first begin to define our language. Our keywords (or tokens) will all be made up of uppercase alpha characters, such as: REM, CLS, PRINT, GOTO. That simplifies telling a program statement apart from anything else on the program line and helps us to predict what should come next. get_token begins by copying the current program line, from the array, into p_string. It then advances to the first uppercase alpha character. If, by some chance, it never encounters a valid token it then calls the a_bort routine.
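The scan that get_token performs can be tried out on a single line of text. The function below is a hypothetical stand-alone version, not part of bxbasic.c: it takes the line and the output buffer as parameters instead of using the globals, and it returns a flag instead of calling a_bort.

```c
#include <ctype.h>

/* Hypothetical stand-alone version of the scan: skip to the first
   uppercase letter in 'line', then copy the run of uppercase letters
   into 'token' (at most size-1 characters; size must be at least 1).
   Returns 1 if a keyword was found, 0 if there is none.              */
int first_keyword(const char *line, char *token, int size)
{
    int pi = 0, ti = 0;

    while (line[pi] != '\0' && !isupper((unsigned char)line[pi]))
    {
        pi++;                       /* skip the label, spaces, etc. */
    }
    while (isupper((unsigned char)line[pi]) && ti < size - 1)
    {
        token[ti++] = line[pi++];   /* collect the keyword          */
    }
    token[ti] = '\0';
    return ti > 0;
}
```

Given the third line of test.bas, 3 PRINT "hello world!", the label and the space are skipped and token receives PRINT. The lowercase text inside the quotes is never reached, because copying stops at the first character that is not an uppercase letter.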

TOKEN PARSER
Now that we have our token, we need a parser to decipher the token and execute the program statement. Here is the code for function "parser":

void parser()
{
    int ab_code=4, x=line_ndx;

    if(strcmp(token, "REM") == 0)
    {
        /* return */
    }
    else if(strcmp(token, "PRINT") == 0)
    {
        xstring_array();
        get_prnstring();
    }
    else if(strcmp(token, "GOTO") == 0)
    {
        go_to();
    }
    else if(strcmp(token, "BEEP") == 0)
    {
        beep();
    }
    else if(strcmp(token, "CLS") == 0)
    {
        cls();
    }
    else if(strcmp(token, "END") == 0)
    {
        printf("\nEnd of Program\n");
        line_ndx = nrows;
    }
    else
    {
        a_bort(ab_code, x);
    }
}
/*-------------------------------*/

Add this code to bxbasic.c. Parser uses a simple, although not especially efficient, series of IF-ELSE statements to compare the string held in token with the string in the double quotes. If the two strings match, the code in that block is executed. If a match is not found, then program flow branches to the error handler routine. Each of the Basic Keywords represents a different course of action, such as CLS (clear screen), BEEP, GOTO, PRINT, etc. We can't compile and test this yet, because we haven't added the individual routines that each keyword represents. So that we can test what we have, modify parser by commenting out everything from PRINT to CLS, so that only REM, END and the error handler will compile, as follows;

void parser()
{
    int ab_code=4, x=line_ndx;

    if(strcmp(token, "REM") == 0)
    {
        /* return */
    }
    /* <--here
    else if(strcmp(token, "PRINT") == 0)
    {
        xstring_array();
        get_prnstring();
    }
    else if(strcmp(token, "GOTO") == 0)
    {
        go_to();
    }
    else if(strcmp(token, "BEEP") == 0)
    {
        beep();
    }
    else if(strcmp(token, "CLS") == 0)
    {
        cls();
    }
    ---> */
    else if(strcmp(token, "END") == 0)
    {
        printf("\nEnd of Program\n");
        line_ndx = nrows;
    }
    else
    {
        a_bort(ab_code, x);
    }
}
/*-------------------------------*/

Next, we need to add a Call to pgm_parser, to the function "main" to set things in motion:

int main(int argc, char *argv[])
{
    int x=0, ab_code=1;

    printf("bxbasic interpreter\n");
    if(argc != 2)
    {
        a_bort(ab_code, x);
    }
    strcpy(t_holder, argv[1]);
    line_cnt(argv);
    program_array();
    pgm_parser();
    return 0;
}
/*-------------------------------*/

Now, compile: bxbasic.c and execute:

Bxbasic test.bas
What the “!#%^&$” ? "Program aborted, undefined error." ? Oops!


What went wrong ? Well, actually nothing,... not really. Bxbasic correctly "interpreted" the source code for test.bas and acted accordingly. Let's look at the source code for test.bas:

1 REM test.bas version 1.1
2 CLS
3 PRINT "hello world!"
4 BEEP
5 END

Do you see the problem ? Line 2 calls CLS. The problem is that CLS is not yet a valid keyword. Remember, we just commented it out. So, what Bxbasic did was:
• it correctly interpreted REM; if you look at the code block for REM you will see that it does nothing at all and simply returns to the calling function,
• it then looked for CLS but did not find it, because it doesn't exist yet, and fell through to the error handler at the final "else".
If you look at the block for "else" it reads:

    else
    {
        a_bort(ab_code, x);
    }

That part is okay, but look at the value for ab_code, at the top of parser:

void parser()
{
    int ab_code=4, x=line_ndx;

ab_code equals 4. Now let's take a look at our error handler a_bort:

    switch(code)
    {
        case 1:
            printf("Unspecified Program Name.\n");
            printf("Enter:\"bxbasic program_name.bas\"\n");
            printf("code(%d)\n",code);
            break;
        case 2:
            printf("Program file:\"%s\" not found.\n", t_holder);
            printf("Enter: \"bxbasic program_name.bas\"\n");
            printf("Program Terminated.\ncode(%d)\n", code);
            break;
        default:
            printf("Program aborted, undefined error.");
            break;
    }

as you can see, there is no case 4. There isn't even a case 3. So, in that event, it falls thru to the default message: "Program aborted, undefined error.". Plain enough, let’s fix that. We need a case 3 and a case 4. To fix case 3 we need to look back at get_token:

void get_token()
{
    char ch;
    int pi=0, ti=0, ab_code=3;

that's where it errors if no uppercase alpha was found. Here's the code for "case 3" and "case 4", add them to a_bort:

        case 3:
            printf("\nSyntax error: in program line: %d.\n",(line_ndx+1));
            printf("%s", p_string);
            printf("Keywords must be in UpperCase:\ncode(%d)\n", code);
            break;
        case 4:
            printf("\nSyntax error: in program line: %d.\n",(line_ndx+1));
            printf("%s", p_string);
            printf("Unknown Command.\ncode(%d)\n", code);
            break;

Now, recompile Bxbasic.c Run it again:

Bxbasic test.bas

Doesn't that make more sense now ? The error message is pretty exact and to the point. It tells us where the error is located, what the error looks like and what the source of the problem is. Okay, so let's fix that. Since we're not ready to add all the routines that we're missing, let's just do a quick fix so that it works without any errors by adding some REMs. Change test.bas so that it reads:

1 REM test.bas version 1.2
2 REM CLS
3 REM PRINT "hello world!"
4 REM BEEP
5 END

Save test.bas and run it again. Much better ! We have a "normal" program termination.

EXPAND VOCABULARY
Now we can expand our language vocabulary by adding the missing pieces. We'll begin with BEEP. Here's the code for "beep" :

void beep()
{
    printf("%c", 7);
}
/*-------------------------------*/

Add this code to bxbasic.c.


Well that was painless. This is the smallest function in the program and all it does is sound a beep on the computer's speaker, but this can be a useful tool. Ascii character (7) is the "bell". Even though it makes a beep sound, for historical reasons, it's called the bell. Next, let's add CLS, so that we can clear the display screen. Here's the code for function "cls":

void cls()
{
#ifdef Power_C
    clrscrn();              /* CLS - PowerC.ver */
#endif
#ifdef LccWin32
    clrscr();               /* CLS - LCC.ver */
#endif
}
/*------ end cls ------*/

Add this code to bxbasic.c. Well, I admit this looks a little congested, but it's really pretty simple: how you clear the display screen, it turns out, is dependent on the compiler. Recall at the very beginning of Bxbasic when we "defined" the compiler we used:

#define LccWin32

Well, this is a place where that comes in really handy. Instead of having to rewrite entire segments of code, depending on the compiler, we can include all the code needed and let the compiler sort it all out. In this case, only the spelling of the clear-screen function differs between compilers. If you are using a compiler other than the two listed, add the correct call for clear screen for your compiler, in the same manner as the above.

Now, go back to parser and remove the comments from BEEP and CLS, so that only PRINT and GOTO are commented out, like so:

    /*
    else if(strcmp(token, "PRINT") == 0)
    {
        xstring_array();
        get_prnstring();
    }
    else if(strcmp(token, "GOTO") == 0)
    {
        go_to();
    }
    */

Recompile Bxbasic.c. Now, we need to make some changes to test.bas. Remove the REMs from lines 2 and 4, so that it reads:

1 REM test.bas version 1.3
2 CLS
3 REM PRINT "hello world!"
4 BEEP
5 END

Save test.bas and test it:

Bxbasic test.bas
Okay, now we're getting somewhere! Our vocabulary at present is: REM, CLS, BEEP and END. Okay, so it's not all that impressive, but we actually have a working "scripting-engine" that's up and running (in a small way). Let's add PRINT. Here's the code, copy it to bxbasic.c :

void xstring_array()
{
    char ch;
    int pi=0, si=0, ab_code;
    int stlen, x=line_ndx;

    stlen = strlen(p_string);
    ch = p_string[pi];                      /* start with the first character */
    while((ch != '\"') && (pi < stlen))     /* scan for 1st quote */
    {
        pi++;
        ch = p_string[pi];
    }
    s_pos = pi;
    ch = ' ';
    while((ch != '\"') && (pi < stlen))     /* scan for 2nd quote */
    {
        si++;                   /* keep a count of number of characters */
        pi++;
        ch = p_string[pi];
    }
    if((si <= 1) && (pi < stlen))           /* if it's an empty string, error */
    {
        ab_code=5;
        a_bort(ab_code, x);
    }
    else if(pi >= stlen)                    /* if no quoted string found, error */
    {
        ab_code=6;
        a_bort(ab_code, x);
    }
    else
    {
        si++;                               /* increment data count and allocate */
        xstring = malloc(si * sizeof(char));
    }
}
/*-------------------------------*/

void get_prnstring()
{
    char ch;
    int pi=s_pos, si=0;

    pi++;
    ch = p_string[pi];
    while(ch != '\"')               /* copy characters until 2nd quote */
    {
        s_holder[si] = ch;
        pi++;
        si++;
        ch = p_string[pi];
    }
    s_holder[si]='\0';              /* add NULL terminator to end of string */
    strcpy(xstring, s_holder);      /* transfer data to print string */
    pi++;
    ch = p_string[pi];
    if(ch == ',')                   /* print string followed by a tab */
    {
        printf("%s%c", xstring, '\t');
    }
    else if(ch == ';')              /* print string only */
    {
        printf("%s", xstring);
    }
    else                            /* print string followed by newline */
    {
        printf("%s\n", xstring);
    }
    free(xstring);
}
/*-------------------------------*/

In the first function, the "while-loop" says: until you find a double-quote (") keep moving down the line, one-step at a time. Specifically, we are looking for a "quoted string". When we find the first quote, we will be at the beginning of the string. If a program line read:

10 PRINT "hello world"
---------^

we must skip over everything that comes before the first " (double quote). In the second while-loop, we need to capture each character up to the second ", which signals the end of the string. "s_holder" (for string_holder) is our character array that will save each character as we move down the string. Integer variables "pi" and "si" are the position pointers (like arrows) pointing to the next character to fetch in p_string and the next storage cell in s_holder.
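The two loops can be tried in isolation with a small sketch like the one below. The name copy_quoted and its parameters are invented for this example; unlike the Bxbasic routines it uses no globals, reports problems through its return value instead of calling a_bort, and accepts an empty string rather than treating it as an error.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of bxbasic.c: copy the text between the
   first pair of double quotes in `line` into `out` (capacity `cap`).
   Returns the number of characters copied, or -1 if a quote is missing
   or the text does not fit in `out`. */
int copy_quoted(const char *line, char *out, size_t cap)
{
    size_t pi = 0, si = 0;

    assert(line != NULL && out != NULL && cap > 0);

    while (line[pi] != '\0' && line[pi] != '"')     /* scan for 1st quote */
        pi++;
    if (line[pi] == '\0')
        return -1;                                  /* no opening quote */
    pi++;                                           /* step past it */

    while (line[pi] != '\0' && line[pi] != '"') {   /* copy up to 2nd quote */
        if (si + 1 >= cap)
            return -1;                              /* would overflow out */
        out[si++] = line[pi++];
    }
    if (line[pi] == '\0')
        return -1;                                  /* no closing quote */
    out[si] = '\0';
    return (int)si;
}
```

Here pi and si play the same roles as in get_prnstring: pi walks the program line while si walks the destination buffer.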

Example:

10 PRINT "hello world"

p_string: [10 PRINT "hello world"]
pi-----------------------^            pi=14
s_holder: [hello      ]
si-------------^                      si=4

"pi" and "si" each point to a different location and are incremented after each step. "ch" is planted with the next character and the character is then stored in s_holder. When the final " is reached, we are done. Before we can test the print function, we need to update our error handler. Copy these additions to a_bort:

        case 5:
            printf("\nSyntax error: in program line: %d.\n", (line_ndx+1));
            printf("%s", p_string);
            printf("Empty string or double quotes.\ncode(%d)\n", code);
            break;
        case 6:
            printf("\nSyntax error: in program line: %d.\n", (line_ndx+1));
            printf("%s", p_string);
            printf("No closing quotes.\ncode(%d)\n", code);
            break;

and change the comments in parser so that only GOTO is commented out, like so:

    /*
    else if(strcmp(token, "GOTO") == 0)
    {
        go_to();
    }
    */

Okay, now recompile bxbasic.c. We also need to fix test.bas. Remove the REM from line 3 so that it reads:

1 REM test.bas version 1.4
2 CLS
3 PRINT "hello world!"
4 BEEP
5 END

Okay, now you can run it, type; Bxbasic test.bas

If all went as planned:
• line 2 cleared the display,
• line 3 printed hello world! ,
• line 4 made a beep sound,
• line 5 terminated the program normally.

PROGRAM BRANCHING
Basic has always been berated because of its unstructured nature and the overuse of the GOTO command. Truth be known, the GOTO command is just another means of redirecting program flow; even C has a "goto" command. The x86 assembly language has a "goto" command, actually several of them, only they're not called GOTOs, they're called "JMP"s, short for JUMPS. A quick glance at my x86 reference manual and I counted 32 different JUMPS, but they all do one thing in common, they GOTO ! A GOTO or a JUMP is an explicit jump to another part of the program. Unlike a GOSUB or a CALL command, it has no return command (unless you use another jump to come back). Basic language programs can be constructed to be as structured as any other programming language, it's up to the programmer. That said, let's GOTO. Here is the code we need, add this to bxbasic.c :

void go_to()
{
    char ch;
    char gtl_holder[LINE_NUM];
    int pi=0, lh=0, ab_code;
    int xtest, stlen, x=line_ndx;

    ch = ' ';
    gtl_holder[0] = '\0';
    while(isupper(ch) == 0)         /* advance to the word: GOTO */
    {
        ch = p_string[pi];
        pi++;
    }
    while(isupper(ch))              /* advance past the GOTO */
    {
        ch = p_string[pi];
        pi++;
    }
    ch = p_string[pi];
    if(isdigit(ch) == 0)            /* error, expected a number */
    {
        ab_code=7;
        a_bort(ab_code, x);
    }
    while(isdigit(ch))
    {
        gtl_holder[lh] = ch;
        pi++;
        lh++;
        ch = p_string[pi];
    }
    gtl_holder[lh] = '\0';          /* add string terminator */

    pi = -1;                /* now compare gtl_holder[] to array2[n] */
    xtest = -1;
    while(xtest != 0)
    {
        pi++;
        if(pi == nrows)             /* checked before reading array2[pi] */
        {
            strcpy(t_holder, gtl_holder);
            ab_code=8;
            a_bort(ab_code, x);     /* error, line not found */
        }
        xtest = strcmp(array2[pi], gtl_holder);
    }
    pi--;
    line_ndx = pi;                  /* set line_ndx to the goto_line */
}
/*-------------------------------*/

In a Basic statement such as:

GOTO 100
we aren't necessarily expecting to jump to the 100th program line. What we are really expecting to do is jump to the line that has a label called "100". We begin by extracting the label name that we will be looking for. In this case: "100". Then we search the label name array (array2), looking for a match. When we find a match, what we are really after is the array["index"]. Since each program line is stored sequentially within the array, all we need to know is which program-line-number relates to which array index. i.e.: here is how the program looks stored in the array:

line no.    array2[n]            keyword/statement
10          array2[0]--> [10]    CLS:
20                [1]--> [20]    BEEP:
30                [2]--> [30]    PRINT "hello world":
40                [3]--> [40]    LET a$="that's all folks":
50                [4]--> [50]    GOTO 20:

In this example, "array2" contains the Basic program line numbers, or rather labels. Therefore, in the Basic statement, in line 50, it says:

GOTO 20:
Our while-loop scans through array2[] looking for "20" and finds it at array2[1]. The program would then branch to the program line stored at that same index, array1[1].
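The lookup itself is just a linear search over the label strings. The sketch below shows that step alone; find_label and its parameters are hypothetical names used only here, and the label array is passed in as an argument instead of being the global array2.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the search inside go_to(): return the index
   of `label` within labels[0..nrows-1], or -1 if no program line
   carries that label. */
int find_label(const char *const labels[], int nrows, const char *label)
{
    int i;

    assert(labels != NULL && label != NULL);

    for (i = 0; i < nrows; i++) {
        if (strcmp(labels[i], label) == 0)
            return i;               /* index of the matching program line */
    }
    return -1;                      /* label not found */
}
```

With the five labels from the table above, searching for "20" gives index 1. A caller that mimics go_to would then store that index minus one in line_ndx, because the main loop increments line_ndx before fetching the next line.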


**Note: we have to adjust the line-number by decrementing it by 1, because "pgm_parser" will automatically increment it when we return from here and branch to that line. Before we progress, we have to make some modifications to the error handler. Replace function "a_bort" with this version:

void a_bort(int code,int line_ndx)
{
    free(array1);                   /* clear array memory */
    free(array2);
    free(prog_name);
    free(token);
    free(xstring);
    beep();

    switch(code)
    {
        case 1:
            printf("Unspecified Program Name.\n");
            printf("Enter:\"bxbasic program_name.bas\"\n");
            printf("code(%d)\n",code);
            break;
        case 2:
            printf("Program file:\"%s\" not found.\n", t_holder);
            printf("Enter: \"bxbasic program_name.bas\"\n");
            printf("Program Terminated.\ncode(%d)\n", code);
            break;
        case 3:
            printf("\nSyntax error: in program line:");
            printf(" %d.\n%s",(line_ndx+1),p_string);
            printf("Keywords must be in UpperCase:\n");
            printf("code(%d)\n", code);
            break;
        case 4:
            printf("\nSyntax error: in program line:");
            printf(" %d.\n%s",(line_ndx+1),p_string);
            printf("Unknown Command.\ncode(%d)\n", code);
            break;
        case 5:
            printf("\nSyntax error: in program line:");
            printf(" %d.\n%s",(line_ndx+1),p_string);
            printf("Empty string or double quotes.\n");
            printf("code(%d)\n",code);
            break;
        case 6:
            printf("\nSyntax error: in program line:");
            printf(" %d.\n%s",(line_ndx+1),p_string);
            printf("No closing quotes.\ncode(%d)\n", code);
            break;
        case 7:
            printf("\nSyntax error: in program line:");
            printf(" %d.\n%s",(line_ndx+1),p_string);
            printf("Use format: GOTO 1234:\n");
            printf(" single space--^\n");
            break;
        case 8:
            printf("\nGOTO Error: no such line-number:");
            printf(" %s:\nin program line:",t_holder);
            printf(" %d:\n%s",(line_ndx+1),p_string);
            printf("Program Terminated\ncode(%d)\n", code);
            break;
        default:
            printf("Program aborted, undefined error.");
            break;
    }
    exit(1);
}
/*-------------------------------*/

Now modify "main" so that it reads as follows:

int main(int argc, char *argv[])
{
    int x=0, ab_code=1;

    printf("bxbasic interpreter\n");
    if(argc != 2)
    {
        a_bort(ab_code, x);
    }
    strcpy(t_holder, argv[1]);
    line_cnt(argv);
    program_array();
    pgm_parser();
    /* --- end of program --- */
    free(array1);
    free(array2);
    free(prog_name);
    return 0;
}
/*-------------------------------*/

and remove the comments from around the GOTO statement in "parser":

    /*                                      <-- remove
    else if(strcmp(token, "GOTO") == 0)
    {
        go_to();
    }
    */                                      <-- remove

Okay, save bxbasic.c and compile it. But, don't run it yet ! One last thing to do, we need to add some GOTOs to our Basic program: test.bas,

1 REM test.bas version 1.5
2 CLS
3 PRINT "hello world!"
4 BEEP
5 GOTO 9
6 PRINT "Back at line #6"
7 PRINT "The End."
8 END
9 PRINT "Now at line #9"
10 BEEP
11 GOTO 6

Save test.bas and give it a whirl. Pretty neat, huh ? Here's a flow chart for test.bas :

And, that is how we do that !

CONCLUSION
Play around with test.bas and come up with your own examples. Try to see where it breaks and whether the error handler catches the errors. At this stage there may be bugs in Bxbasic, but we'll uncover them as we progress, so don't worry too much if you do find a bug in the code, it's expected. I hope this has been a positive and interesting learning experience and that you look forward to the next Chapter. In Chapter 2 we'll clean up some of the code for Bxbasic Chapter 1 and start working with variables and assigning values. Also, I hope I haven't injected too many typos into this chapter. I hate reading other people's stuff and finding all their mistakes and typos. In case I've left something out or missed something somewhere, I'm attaching the entire source code for bxbasic.c version 1 along with this document.


CHAPTER - 2
INTRODUCTION
At this point, our parser can: recognize a comment: REM, recognize the END of program and execute a: PRINT, GOTO, BEEP and CLS. The un-optimized compile of Bxbasic.c with LccWin32 is currently at 33K, that's not too bad, but we have a long way to go. We're also going to start breaking the program down into smaller units by putting things into modules, rather than just dumping everything in Bxbasic.c. Then, we'll jump into working with variables and assigning values to them and add some more functionality.

MODIFYING BXBASIC
Because our error handler is rapidly going to outgrow Bxbasic.c, we're going to give it its own file.
1) with your editor, create a new file in the same working directory and name it "Error.c",
2) now copy this prototype at the very top:

/* bxbasic : error.c : alpha version */
/* ----- function prototypes ----- */
void a_bort(int,int);

3) now, after adding a couple of blank lines, copy the entire function a_bort() into the file "Error.c", save it and close it.
4) delete function a_bort() from Bxbasic.c.
5) in the function prototypes, near the top of Bxbasic.c, delete the prototype for a_bort(),

/* ----- function prototypes ----- */
void a_bort(int,int);           <---- delete

6) now, beneath the prototypes and just above "main", add this new code:

/* --- include functions --- */
#include "error.c"


by placing this "include" here, it tells the compiler to load that module and compile it as well. In the near future we will be adding more to this list. I'd like to make a modification to function get_token(), one that will be useful later on and add a new function. This new function, called get_upper(), will be used when we need to find an uppercase alpha. Here is the new code for get_token() and get_upper() :

void get_token()
{
    char ch;
    int pi=0, ti=0, ab_code=3;
    int stlen, x=line_ndx;

    strcpy(p_string, array1[line_ndx]);
    stlen = strlen(p_string);
    pi = get_upper(pi, stlen);
    ch = p_string[pi];
    if(pi == stlen)
    {
        a_bort(ab_code, x);
    }
    while(isupper(ch))
    {
        token[ti] = ch;
        ti++;
        pi++;
        ch = p_string[pi];
    }
    token[ti]='\0';
    e_pos = pi;
}
/*-------------------------------*/

int get_upper(int pi, int stlen)
{
    char ch;

    ch = p_string[pi];
    while((isupper(ch) == 0) && (pi < stlen))
    {
        pi++;
        ch = p_string[pi];
    }
    return pi;
}
/*-------------------------------*/

/* ----- function prototypes ----- */
...
...
int get_upper(int,int);

Okay, add these to Bxbasic.c, along with the new prototype for function get_upper and save it. Now compile Bxbasic.c and run Test.bas again. Make sure it's still working. Then, change Test.bas in a way that will cause an error, to make sure the error handler is still with us.


VARIABLES
There are a number of ways in which variables can be dealt with. In the C language there are different mechanisms that can be employed: unions, structures, arrays and maybe others as well. When I began this study, I pondered at length just how to dynamically create a variable at a given point in the program and have it be globally accessible from anywhere in the program.

If you will recall, in C you have "global variables" and you have "local variables". Local variables are limited in scope. That is, they exist only within the confines of the function in which they are created. From the perspective of a local variable, it lives in a sealed box. So, that leaves out local variables as an option. The only other possibility is to create our variables in the outside world and access them globally. If you remember, when we declared our global vars, like "p_string", "token" and others, we were able to use them at will, throughout the program. So, what we need is to do something like that.

But, that raises another interesting problem: "How do you dynamically create a variable, long after the program is compiled and running ?" In this instance, the "program" is Bxbasic itself ! This has nothing to do with Test.bas. If a test.bas statement says:

LET MyName$ = "Steve"

how am I going to create a variable, within Bxbasic, at runtime, called "MyName$" ? In C, you will recall, you have to declare your variables even before the program is compiled. Well, that's not very "dynamic". I suppose you could limit your variables to: a,b,c,....x,y,z or A,B,C,...X,Y,Z but that's not very pretty either. That limits you to a grand total of 26 uppercase vars and 26 lowercase vars. Or, you could do: aa, bb, cc,...,zz and: AA, BB, CC,...,ZZ (ad infinitum). That's less than perfect and that's a lot of declaring to have to do.
Let's determine what constitutes a variable and what its requirements are:
1) it needs a name,
2) the name has to be of undetermined length, but within limits,
3) it has to be of a particular "type", i.e.: character, string, integer, etc.
4) based on the type, memory space is needed to store the value,
5) a means of telling a character string from an integer value : $,%,!,#

Examine this diagram:

        variable
1)-->   name                value
2)-->   [abc          ]     [100]  <--(4
        [count%       ]     [10 ]
        [quantity     ]     [1  ]
        [             ]     [   ]
        [             ]     [   ]

Based on the above list, this diagram illustrates most of what we need. Notice how much it looks like an array, or rather a pair of arrays? One is a character array, for holding the variables' names, and the other a value array, to hold the "numeric value". Since variable names can vary in length, we need to allow for several characters. But how many characters? Ten, twenty ? And, we don't know how many variables we will be needing. Ten, one hundred, one thousand ?

To answer this, let's do an experiment. Let's start off simple. We'll begin with an "integer type" of variable and we'll create a character "name" array that will allow fifteen alpha characters for the name. Let's start with the possibility for one hundred variables. So, that's a two dimensional character array that's one hundred in length by fifteen in width, and a second integer array, that's one hundred in length, to hold the value. That's easy enough. Our variable arrays will look something like this:

variable name           value (data)
[              ]        [     ]
[              ]        [     ]
[              ]        [     ]
[              ]        [     ]
[              ]        [     ]
...                     ...

The first thing we need to do is to define two more constants. We'll call them: MAX_VARS and VAR_NAME. MAX_VARS will equal 100 and VAR_NAME will equal 16 (remember, strings need a terminating '\0' at the end). Add those to the list of constants as shown here:

/* --- declare constants --- */
#define BUFSIZE     81
#define LINE_NUM    6
#define TOKEN_LEN   21
#define MAX_VARS    100
#define VAR_NAME    16

Next, we'll create the "integer value" array and for short we will call it the "iv_stack" . Then, the "integer name" array, which we'll call the "in_stack". Add these to the list of "global vars", beneath "ncolumns":

/* ------ global vars ------------ */
...
int iv_stack[MAX_VARS];             /* stack:integer variable values */
char in_stack[MAX_VARS][VAR_NAME];  /* integer variable names */

Now, how do we make this work? The first thing we need to do is to clear the arrays. That is, initialize the "value stack" to zero and null the "name stack". That way we can be assured that they will be empty when we begin. So, to do that let's add an initialization function :

void init_vars()
{
    int ndx;

    for(ndx=0; ndx < MAX_VARS; ndx++)
    {
        iv_stack[ndx] = 0;
        in_stack[ndx][0] = '\0';
    }
}
/*-------------------------------*/

Copy this code to the end of Bxbasic.c. And, add this to the function prototypes:

/* ----- function prototypes ----- */
...
void init_vars(void);

We will also need to instruct "main()" to initialize our arrays at the start of the program. So modify "main()" to include a call to "init_vars()", just above "pgm_parser()" :

int main(int argc, char *argv[])
{
    int x=0, ab_code=1;

    printf("bxbasic interpreter\n");
    if(argc != 2)
    {
        a_bort(ab_code, x);
    }
    strcpy(t_holder, argv[1]);
    line_cnt(argv);
    program_array();
    init_vars();
    pgm_parser();
    /* --- end of program --- */
    free(array1);
    free(array2);
    free(prog_name);
    return 0;
}
/*-------------------------------*/

Okay, compile Bxbasic.c and run Test.bas Just to make sure there aren't any obvious bugs in the code. It's not going to do anything special at this point, the variables aren't plugged in yet. Now take a look at this diagram :

variable name       value       index
[abc          ]     [100]       [0]
[count%       ]     [10 ]       [1]
[quantity     ]     [1  ]       [2]

You'll notice that what the arrays share in common is the index. When we store a variable name in the "name stack", we will only need its index to tell us where to store the value in the "value stack". Now, in theory, if we want to store a new variable in the "stack", all we have to do is look for the first empty element in the stack, i.e.: the first null.

Example:

        variable name       index       value
        [abc          ]     [0]         [100]
        [count%       ]     [1]         [10 ]
        [quantity     ]     [2]         [1  ]
null    [             ]     [3]         [0  ]
null    [             ]     [4]         [0  ]

It doesn't work to look for the first zero in "value", because zero is a valid numerical value. Any variable could be assigned the value of 0. After we have stored a variable, when we want to find it and retrieve or alter its value, all we have to do is a string-to-string comparison of the name we have in-hand to the names in the "name stack". When we find a match, we go straight across to the "value stack", using the same index, and retrieve the value. As shown here, if we need the value of variable "abc", then abc's array index points directly to the correct value array index:

        variable name           index       value
        [abc          ]   -->   [0]   -->   [100]
        [count%       ]         [1]         [10 ]
        [quantity     ]         [2]         [1  ]
null    [             ]         [3]         [0  ]
null    [             ]         [4]         [0  ]
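The paired arrays and the shared index can be tried out with a small standalone sketch. Everything prefixed sk_ below is made up for this illustration and only mirrors in_stack, iv_stack, MAX_VARS and VAR_NAME; it is not the Bxbasic code itself, and it reports a full table through its return value instead of calling a_bort.

```c
#include <assert.h>
#include <string.h>

#define SK_MAX_VARS 100             /* mirrors MAX_VARS */
#define SK_VAR_NAME 16              /* mirrors VAR_NAME */

/* static storage starts out zeroed, so every name begins as a null */
static char sk_names[SK_MAX_VARS][SK_VAR_NAME];
static int  sk_values[SK_MAX_VARS];

/* Return the index of `name` in the name stack, or -1 if it is not there. */
int sk_find(const char *name)
{
    int i;

    assert(name != NULL);
    for (i = 0; i < SK_MAX_VARS; i++) {
        if (sk_names[i][0] != '\0' && strcmp(sk_names[i], name) == 0)
            return i;
    }
    return -1;
}

/* Assign `value` to `name`, creating the variable in the first empty
   slot if needed. Returns the index used, or -1 if the name is empty,
   too long, or the table is full. */
int sk_assign(const char *name, int value)
{
    int i, slot;

    assert(name != NULL);
    if (name[0] == '\0' || strlen(name) >= SK_VAR_NAME)
        return -1;
    slot = sk_find(name);
    if (slot < 0) {
        for (i = 0; i < SK_MAX_VARS && slot < 0; i++) {
            if (sk_names[i][0] == '\0')
                slot = i;           /* first null marks a free slot */
        }
        if (slot < 0)
            return -1;              /* out of stack space */
        strcpy(sk_names[slot], name);
    }
    sk_values[slot] = value;        /* same index in the value stack */
    return slot;
}
```

Assigning abc, then xyz, then abc again uses index 0, index 1 and index 0 once more, which is the behaviour the diagram above describes.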


ASSIGNMENTS
Now that we have a place to put our variables, we need the ability to assign a value to them. A typical statement might read:

LET a = 10
so here's the code for "parse_let()" and three more small utility functions that we will be needing :

void parse_let()
{
    char ch, varname[VAR_NAME], cvalue[6];
    int pi, si=0, stlen, ndx=0, x=line_ndx;
    int vflag=0, vi_pos=0, ivalue, ab_code;

    stlen = strlen(p_string);
    pi = e_pos;
    /* ------- retrieve variable name from statement ------- */
    pi = get_alpha(pi, stlen);
    ch = p_string[pi];
    if(pi == stlen)                 /* error: didn't find it */
    {
        ab_code=11;
        a_bort(ab_code, x);
    }
    while((isalnum(ch) != 0))       /* now copy name to varname */
    {
        varname[si] = ch;
        si++;
        pi++;
        ch = p_string[pi];
    }
    varname[si] = '\0';
    /* ---------- we now have complete varname ---------- */
    /* --- now compare name to array for existing variable --- */
    /* the index is tested first, so in_stack[MAX_VARS] is never read */
    while((ndx < MAX_VARS) && (strcmp(in_stack[ndx],varname) != 0))
    {
        if(vflag == 0)
        {
            if(in_stack[ndx][0] == '\0')    /* found a null */
            {
                vi_pos = ndx;       /* mark this array index */
                vflag = 1;          /* set the exit flag to true */
            }
        }
        ndx++;                      /* increment the index */
    }
    if(ndx == MAX_VARS)             /* did we reach the end of the stack */
    {
        if(vflag == 0)              /* was it full, error if so */
        {
            ab_code=12;
            a_bort(ab_code, x);
        }
        ndx = vi_pos;               /* if not, store this variable */
        strcpy(in_stack[ndx], varname);
    }
    pi = iswhite(pi);
    ch = p_string[pi];
    /* ------- now get assignment value --------- */
    if(ch == '=')
    {
        pi = get_digit(pi, stlen);
        ch = p_string[pi];
        vi_pos = 0;
        /* at most 5 digits, so the terminator still fits in cvalue[6] */
        while((isdigit(ch) != 0) && (vi_pos < 5))
        {
            cvalue[vi_pos] = ch;
            pi++;
            vi_pos++;
            ch = p_string[pi];
        }
        cvalue[vi_pos] = '\0';
        ivalue = atoi(cvalue);
        iv_stack[ndx] = ivalue;
    }
}
/*-------------------------------*/

int get_alpha(int pi, int stlen)
{
    char ch;

    ch = p_string[pi];
    while((isalpha(ch) == 0) && (pi < stlen))
    {
        pi++;
        ch = p_string[pi];
    }
    return pi;
}
/*-------------------------------*/

int get_digit(int pi, int stlen)
{
    char ch;

    ch = p_string[pi];
    while((isdigit(ch) == 0) && (pi < stlen))
    {
        pi++;
        ch = p_string[pi];
    }
    return pi;
}
/*-------------------------------*/

int iswhite(int pi)
{
    char ch;

    ch = p_string[pi];
    while(isspace(ch) != 0)         /* if next char is "whitespace" */
    {
        pi++;                       /* get rid of it */
        ch = p_string[pi];
    }
    return pi;
}
/*-------- end iswhite ----------*/

/* ----- function prototypes ----- */
void parse_let(void);
int get_alpha(int,int);
int get_digit(int,int);
int iswhite(int);

Okay, copy these new functions and the prototypes to Bxbasic.c and save it. Add these error handlers to Error.c, just above "default".

        case 11:
            printf("\nSyntax error: in program line: %d.\n",(line_ndx+1));
            printf("%s", p_string);
            printf("Usage LET (variable assignment):\ncode(%d)\n", code);
            break;
        case 12:
            printf("\nOut of stack space: in line: %d.\n", (line_ndx+1));
            printf("%s", p_string);
            printf("(new variable assignment):\ncode(%d)\n", code);
            break;

Now, we need to add a LET keyword to our parser that will call parse_let(). As I've shown here, add the LET block to parser(), just under the REM block :

void parser()
{
    int ab_code=4, x=line_ndx;

    if(strcmp(token, "REM") == 0)
    {
        /* return */
    }
    else if(strcmp(token, "LET") == 0)
    {
        parse_let();
    }

Now save everything and go ahead and compile Bxbasic.c. With any luck it compiled without errors. Now try it with this new code for Test.bas:

1 REM test.bas version 2.1
2 CLS
3 PRINT "hello world!"
4 BEEP
5 GOTO 12
6 PRINT "Back at line #6"
7 PRINT "The End."
8 END
9 PRINT "Now at line #9"
10 BEEP
11 GOTO 6
12 LET abc = 100
13 PRINT "abc = 100"
14 GOTO 9

If it ran without breaking, the second line should have read:

abc = 100

(displayed)

Well, that's really no big deal, because if you look at line 13, we instructed it to say that. So, because of the mere fact that it didn't crash, we might safely presume that it did in fact store both the name and value for our variable "abc". Before we continue, let's confirm that it did in fact store the information. At the very end of parse_let, add the print statement as I have shown it here:

[snip]...
        }
        cvalue[vi_pos] = '\0';
        ivalue = atoi(cvalue);
        iv_stack[ndx] = ivalue;
    }
    printf("name=%s value=%d ndx=%d\n",in_stack[ndx],iv_stack[ndx],ndx);
}
/*-------------------------------*/

Okay, save it and compile Bxbasic.c. And, run Test.bas again. This time, the second line should read:

name=abc value=100 ndx=0

(displayed)

There! That's proof that our program is correctly parsing a variable assignment statement. Since this was our first variable, it correctly stored it at index "0". Now try this version of Test.bas (pay close attention to lines 12 and 15):

1 REM test.bas version 2.2
2 CLS
3 PRINT "hello world!"
4 BEEP
5 GOTO 12
6 PRINT "Back at line #6"
7 PRINT "The End."
8 END
9 PRINT "Now at line #9"
10 BEEP
11 GOTO 6
12 LET abc = 100
13 LET xyz = 999
14 LET qwerty = 12345
15 LET abc = 32123
16 GOTO 9

How about that ?! Not only can we assign a new variable, we can go back and change its value. Pretty neat huh ? Function parse_let() is designed to dissect, or parse, a LET statement, such as:

LET a=1
The first part just locates the variable name.

In part two there's a pair of nested "if"s that add conditions to our search.
• IF "vflag" is zero,
• Then IF "in_stack[ndx]" contains a "null",
• Then assign "vi_pos" this index and set vflag to 1,
• Then increment "ndx" and continue looping.

The effect of all this is, as we scroll through the names stored in the array, we also want to be looking for an empty slot in which to store a new variable, if varname is not already in use. So, "vi_pos" is a place-holder, marking the first free-space in the array. Once we find a free-space, we no longer need to keep looking for one, so we skip over that step on subsequent loops.
• If we did not find a match for varname in the array, then "ndx" will be equal to MAX_VARS. That tells us that we reached the end of the array.
• If "vflag" is equal to "0", it also tells us that there was no more stack space to hold a new variable. We have run out of available space, so we call the abort handler.
• If, however, "vflag" is not "0", then we assign "ndx" with the index marker held by "vi_pos" and we copy varname onto the stack at the position pointed to by "ndx".

Example:

                 ndx|name
       in_stack[0]  [abc ]
               [1]  [a1  ]
               [2]  [xyz ]
    vi_pos =   [3]  [0   ]  <--- free space
               [4]  [0   ]
               ...  ...
        [MAX_VARS]  --- end of stack

In part three, we are now looking for the assignment value. We advance forward to the first digit of our assignment and, with the use of a while-loop, we copy each digit into array "cvalue[]". The digits are then converted to a numeric value. That value is then stored in the iv_stack, using the same "ndx" as in the name stack. Example:

LET sum=100
           ndx|name                       ndx|value
 name_stack[0] [abc ]         value_stack[0] [1  ]
           [1] [b1  ]                    [1] [10 ]
           [2] [xyz ]                    [2] [3  ]
  vi_pos = [3] [sum ]     <------->      [3] [100]
           [4] [0   ]                    [4] [0  ]

Oh, don't forget to delete this print statement from the bottom of parse_let() : "printf("name=%s value=%d ndx=%d\n",in_stack[ndx],iv_stack[ndx],ndx);"


PRINTING VARIABLES
Well, the obvious next step is to be able to access our variables and do something with them. The first thing we will do is to retrieve our variables and display their values. Here's a list of what we need:
1) a function that will locate a variable name in the in_stack,
2) return its index number,
3) using the stack-index, retrieve the value from the iv_stack,
4) display the value.

Here is the code for function "get_prnvar()" :

void get_prnvar()
{
    char ch;
    int pi, ivalue;

    pi = e_pos;
    pi = iswhite(pi);
    e_pos = pi;
    ivalue = get_varvalue();        /* subroutine to get the var value */
    pi = e_pos;
    pi = iswhite(pi);
    ch = p_string[pi];
    /* --- comma="tab", semi-colon="no \n", Default is: colon="\n". --- */
    if(ch == ',')
    {
        printf("%d%c", ivalue, '\t');
    }
    else if(ch == ';')
    {
        printf("%d", ivalue);
    }
    else
    {
        printf("%d\n", ivalue);
    }
}
/*-------------------------------*/

int get_varvalue()
{
    char ch, varname[VAR_NAME];
    int pi, si=0, ndx=0, ivalue;
    int ab_code=13, x=line_ndx;

    pi = e_pos;
    ch = p_string[pi];
    while((isalnum(ch) != 0))
    {
        varname[si] = ch;
        si++;
        pi++;
        ch = p_string[pi];
    }
    varname[si] = '\0';
    /* --- now compare name to array --- */
    /* test the index first, so we never read past the end of in_stack */
    while((ndx < MAX_VARS) && (strcmp(in_stack[ndx], varname) != 0))
    {
        ndx++;                      /* find varname in stack */
    }
    if(ndx == MAX_VARS)             /* error: did not find it */
    {
        a_bort(ab_code, x);
    }
    ivalue = iv_stack[ndx];         /* copy stack-value to row */
    e_pos = pi;
    return ivalue;
}
/*-------------------------------*/

/* ----- function prototypes ----- */
void get_prnvar(void);
int get_varvalue(void);

Copy get_prnvar(), get_varvalue() and the function prototypes to Bxbasic.c and save it. Before we can use get_prnvar(), we need to make a few modifications to our other print functions that will allow this to work. Here are new versions of xstring_array() and get_prnstring() :

void xstring_array()
{
    char ch, quote='\"';
    int pi, si=0, ab_code;
    int stlen, x=line_ndx;

    pi = e_pos;
    pi = iswhite(pi);
    e_pos = pi;
    ch = p_string[pi];
    if(ch == ':')               /* if next character is a ":", get out */
    {
        return;
    }
    if(isalpha(ch))             /* if next character is an alpha, */
    {
        return;                 /* it's a varname, get out */
    }
    stlen = strlen(p_string);
    if((ch != quote) || (pi == stlen))  /* next character must be a */
    {
        ab_code=9;                      /* quote, or error: */
        a_bort(ab_code, x);
    }
    else
    {
        ch = ' ';
        while((ch != quote) && (pi < stlen))
        {
            si++;
            pi++;
            ch = p_string[pi];
        }
        if((si <= 1) && (pi < stlen))
        {
            ab_code=5;
            a_bort(ab_code, x);
        }
        else if(pi >= stlen)
        {
            ab_code=6;
            a_bort(ab_code, x);
        }
        else
        {
            si++;
            xstring = malloc(si * sizeof(char));
        }
    }
}
/*-------------------------------*/

void get_prnstring()
{
    char ch, quote='\"';
    int pi, si=0;

    pi = e_pos;
    ch = p_string[pi];
    if(ch == ':')               /* if character is a colon, */
    {
        printf("\n");           /* print a new line */
        return;
    }
    else if(isalpha(ch))        /* if an alpha, get variable */
    {
        get_prnvar();           /* and print value */
        return;
    }
    pi++;
    ch = p_string[pi];
    while(ch != quote)          /* else, it's a quoted string */
    {
        xstring[si] = ch;
        pi++;
        si++;
        ch = p_string[pi];
    }
    xstring[si]='\0';
    pi++;
    pi = iswhite(pi);
    ch = p_string[pi];
    if(ch == ',')
    {
        printf("%s%c", xstring, '\t');
    }
    else if(ch == ';')
    {
        printf("%s", xstring);
    }
    else
    {
        printf("%s\n", xstring);
    }
    free(xstring);
}
/*-------------------------------*/
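Both get_prnvar() and get_prnstring() end with the same three-way choice, based on the character that follows the printed item. As a side note, that rule can be sketched on its own. The helper format_item() below is hypothetical and is not meant to be copied into Bxbasic.c; it writes into a buffer instead of the screen so that the result is easy to inspect.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: formats one integer PRINT item into "buf",
   followed by the ending selected by the separator character "sep".
   Returns the number of characters written, as snprintf() does. */
int format_item(char *buf, size_t size, int ivalue, char sep)
{
    const char *ending;

    assert(buf != NULL && size > 0);
    if (sep == ',')
        ending = "\t";      /* comma: tab over to the next item */
    else if (sep == ';')
        ending = "";        /* semi-colon: no new line */
    else
        ending = "\n";      /* default: new line */
    return snprintf(buf, size, "%d%s", ivalue, ending);
}
```

For instance, formatting the value 100 with a ',' separator leaves "100" followed by a tab in the buffer, which is what PRINT abc, does on the screen.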

Copy these new versions of xstring_array() and get_prnstring() to Bxbasic.c and save it. Examine the changes made to these functions and make sure you understand them. Let's test what we have so far. Compile Bxbasic.c and try it with this new version of Test.bas :

1 REM test.bas version 2.3
2 CLS
3 PRINT "hello world!"
4 LET abc = 100
5 PRINT abc,
6 LET xyz = 999
7 LET qwerty = 12345
8 LET abc = 32123
9 PRINT xyz,
10 PRINT qwerty,
11 PRINT abc
12 PRINT "The End."
13 END

I hope it compiled and ran without errors. Assuming it did, you can see this thing is really working. And, quite well I might add ! Notice the commas at the end of lines: 5, 9 and 10. That tells the print routine to add a TAB. We still have a little bit of house keeping to do. We need to add two error handler routines to function a_bort. Copy these to file Error.c, in their correct order :


case 9:
    printf("\nSyntax error: in program line: %d.\n",(line_ndx+1));
    printf("%s", p_string);
    printf("Missing quotes.\ncode(%d)\n", code);
    break;
case 13:
    printf("\nVariable not found: in line: %d.\n", (line_ndx+1));
    printf("%scode(%d)\n", p_string, code);
    break;

Now save Error.c. Re-compile Bxbasic.c. Before running Test.bas, change the code so that it will generate an error, such as :

PRINT testing testing
(or)
PRINT "testing testing
(or)
PRINT 12345

Do that now. Now run Bxbasic. Were the messages what you expected ?

ERASE VARIABLES
Now that our scripting-engine can : o create variables, o assign them values and o display them, we should have the option to clear or erase them. This is really a pretty simple task at this point, because we already have the tools to do that. When the program starts, "main()" calls init_vars(), which zeros everything out, so all we have to do to clear our variables is to call "init_vars()" again. To accomplish that, all we need to do is expand our language vocabulary by adding the keyword CLEAR to parser(). Add CLEAR to parser(), just after LET, as I've done here :


if(strcmp(token, "REM") == 0) { /* return */ } else if(strcmp(token, "LET") == 0) { parse_let(); } else if(strcmp(token, "CLEAR") == 0) { init_vars(); }

Save Bxbasic.c and re-compile it. Now add a CLEAR command to Test.bas, as in this example (line 12), and then run it :

[snip]
6 LET xyz = 999
7 LET qwerty = 12345
8 LET abc = 32123
9 PRINT xyz,
10 PRINT qwerty,
11 PRINT abc
12 CLEAR
13 PRINT abc
[snip]

Go ahead and run Test.bas. If it worked correctly, you should have gotten the error message :

Variable not found: in line: 13.
13 PRINT abc
code(13)
(displayed)

that's because, after the CLEAR command, "abc" does not exist anymore. At present CLEAR erases all the variables at once. It doesn't give us the option to erase individual variables, yet.

VARIABLES AT WORK
It's nice that we can assign values to variables and have them displayed on the screen, but, now it's time to put our variables to work. After all, the purpose of having variables is to make them do work. Our first task will be to use a variable's value to locate a character position on the display screen and print a message beginning at that position. The Basic keyword LOCATE performs just that function. So, once again, we are expanding our Bxbasic vocabulary and adding the keyword LOCATE. Here is the code for "locate()" :


void locate() { char ch; int pi, stlen, row_x, col_y; int ab_code=10, x=line_ndx; stlen = strlen(p_string); pi = e_pos; pi = iswhite(pi); ch = p_string[pi]; /* -------- rows -------------- */ if(isalpha(ch)) /* this is a variable name */ { e_pos = pi; row_x = get_varvalue(); pi = e_pos; pi = iswhite(pi); ch = p_string[pi]; } else /* error: failed to find an alpha */ { a_bort(ab_code, x); } pi = iswhite(pi); ch = p_string[pi]; if(ch == ',') /* comma separates row and column */ { pi++; pi = iswhite(pi); ch = p_string[pi]; } else { a_bort(ab_code, x); } /* -------- columns -------------- */ if(isalpha(ch)) /* this is a variable name */ { e_pos = pi; col_y = get_varvalue(); } else { a_bort(ab_code, x); } /* --- now position cursor --- */ /* ---------- Power-C version ----------- */ #ifdef Power_C poscurs(row_x, col_y); #endif /* ---------- Lcc version --------------- */ #ifdef LccWin32 row_x++; col_y++; gotoxy(col_y,row_x); #endif } /*-------------------------------*/


/* ----- function prototypes ----- */ void locate(void);

Copy function locate() and its prototype to Bxbasic.c and save it. Now we need to update function parser(). Copy LOCATE to parser(), as I've shown here, between CLEAR and PRINT :

else if(strcmp(token, "CLEAR") == 0) { init_vars(); } else if(strcmp(token, "LOCATE") == 0) { locate(); } else if(strcmp(token, "PRINT") == 0)

Be sure and save Bxbasic.c. Now, add "case 10" to function a_bort(), in Error.c :

case 10:
    printf("\nSyntax error: in program line: %d.\n",(line_ndx+1));
    printf("%s", p_string);
    printf("Use: LOCATE var_x, var_y: .\ncode(%d)\n", code);
    break;

Save Error.c and compile Bxbasic.c. Then, run this version of Test.bas:

1 REM test.bas version 2.4
2 CLS
3 PRINT "hello world!"
4 LET abc = 2
5 LET xyz = 10
6 LOCATE abc , xyz
7 PRINT "hello world!"
8 LET abc = 4
9 LET xyz = 20
10 LOCATE abc, xyz
11 PRINT "hello world!"
12 END

Real working variables !!!


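Study the code for "locate()". The console-mode display consists of 25 rows down by 80 columns across, beginning at cursor position "0,0".

(Diagram: the screen, with position 0,0 at the upper-left corner, 80 columns across, 25 rows down, and 25,80 marking the lower-right corner.)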

You might have noticed one thing function locate() does not do is accommodate numeric constants. Such as:

LOCATE 10, 20
or
LOCATE abc, 20

Not yet. We can easily amend locate() so that it will work with numbers (constants) as well as variables. Currently, locate() tests for alpha characters only, such as this example in the rows section :

/* -------- rows -------------- */
if(isalpha(ch))                 /* this is a variable name */
{
    e_pos = pi;
    row_x = get_varvalue();
    pi = e_pos;
    pi = iswhite(pi);
    ch = p_string[pi];
}

All we have to do, to expand the capability for handling numbers, is to add a statement block that will test for digits, such as this:

else if(isdigit(ch))            /* this is a number */
{
    while((isdigit(ch) != 0) && (si < 2))   /* rows[] holds 2 digits */
    {
        rows[si] = ch;
        pi++;
        si++;
        ch = p_string[pi];
    }
    rows[si] = '\0';
    row_x = atoi(rows);         /* convert alpha to integer */
}

The while-loop reads the digits and copies them to the "rows" string. Notice the last line, you will see where the C, atoi function (ascii-to-integer) is called to translate the character string "rows" into an integer. That's just about all there is to it.
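The digit-reading step is the same for rows and columns, so it can be sketched once as a small stand-alone function. This is only an illustration: read_number() is a made-up name, and it takes the line and position as parameters instead of using the globals p_string and e_pos.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: copies at most "max_digits" digits from "line",
   starting at *pos, converts them to an integer and advances *pos
   past the digits that were read. */
int read_number(const char *line, int *pos, int max_digits)
{
    char digits[8];
    int si = 0;

    assert(max_digits > 0 && max_digits < (int)sizeof(digits));
    while (isdigit((unsigned char)line[*pos]) && si < max_digits) {
        digits[si] = line[*pos];
        si++;
        (*pos)++;
    }
    digits[si] = '\0';
    return atoi(digits);        /* ascii-to-integer, as in locate() */
}
```

The limit on the number of digits matters: a two-digit buffer such as "rows[3]" has room for only two characters plus the terminating null, so the loop has to stop there even if the script contains a longer number.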


Here is the new code for "locate()" :

void locate()
{
    char ch, rows[3], cols[3];
    int pi, stlen, row_x, col_y;
    int si=0, ab_code=10, x=line_ndx;

    stlen = strlen(p_string);
    pi = e_pos;
    pi = iswhite(pi);
    ch = p_string[pi];
    /* -------- rows -------------- */
    if(isalpha(ch))                 /* this is a variable name */
    {
        e_pos = pi;
        row_x = get_varvalue();
        pi = e_pos;
        pi = iswhite(pi);
        ch = p_string[pi];
    }
    else if(isdigit(ch))            /* this is a number */
    {
        while((isdigit(ch) != 0) && (si < 2))   /* rows[] holds 2 digits */
        {
            rows[si] = ch;
            pi++;
            si++;
            ch = p_string[pi];
        }
        rows[si] = '\0';
        row_x = atoi(rows);         /* convert alpha to integer */
    }
    else                            /* error: failed to find an alpha */
    {
        a_bort(ab_code, x);
    }
    pi = iswhite(pi);
    ch = p_string[pi];
    if(ch == ',')                   /* comma separates row and column */
    {
        pi++;
        pi = iswhite(pi);
        ch = p_string[pi];
    }
    else
    {
        a_bort(ab_code, x);
    }
    /* -------- columns -------------- */
    if(isalpha(ch))                 /* this is a variable name */
    {
        e_pos = pi;
        col_y = get_varvalue();
    }
    else if(isdigit(ch))            /* this is a number */
    {
        si = 0;
        while((isdigit(ch) != 0) && (si < 2))   /* cols[] holds 2 digits */
        {
            cols[si] = ch;
            pi++;
            si++;
            ch = p_string[pi];
        }
        cols[si] = '\0';
        col_y = atoi(cols);         /* convert alpha to integer */
    }
    else
    {
        a_bort(ab_code, x);
    }
    /* --- now position cursor --- */
    /* ---------- Power-C version ----------- */
#ifdef Power_C
    poscurs(row_x, col_y);
#endif
    /* ---------- Lcc version --------------- */
#ifdef LccWin32
    row_x++;
    col_y++;
    gotoxy(col_y,row_x);
#endif
}
/*-------------------------------*/

Replace the existing locate function with this one. Save it and then re-compile Bxbasic.c. Test it by using numbers as well as variables for the x and y screen co-ordinates. Now we have a more versatile Locate function.

VARIABLES PART II
We've made a lot of progress in creating and assigning variables, but, our variables are only partially "dynamic". That is, we can create a new variable and assign a value to it, at will, and even erase it, but,
o what if we need more than 100 integer variables ?
o is our method very efficient ?
o what if we only needed ten or twenty variables ?

If we were writing an end-user application, we could just count how many variables we needed and make sure that we had allowed for enough. But, remember, we are creating a scripting-engine that will be used to write end-user applications and we can never know how many variables of any given type will be needed. Let's take another look back at how we created our variables.


First we defined the constant "MAX_VARS" :

#define MAX_VARS 100

And we followed that by declaring two global arrays:

int iv_stack[MAX_VARS];
char in_stack[MAX_VARS][VAR_NAME];

The iv_stack doesn't consume that much memory space, at:

100 x integer = 200 bytes

(that figure assumes 2-byte integers, as on a 16-bit compiler). The in_stack, on the other hand, is:

100 x 16 characters = 1600 bytes

So, our overhead is about 1800 bytes for 100 integer type variables. If we want to lengthen the variable name to 20 or 30 characters, that number jumps up to 2200 or 3200 bytes. And, that's still for only 100 variables.

For our variables to be both dynamic and memory efficient, we need to be able to start with zero variables and allocate as much storage space as needed. To accomplish this we need to change from our static (fixed) arrays to "pointers" to arrays. We've already seen "pointers" in use, just take another look at the global vars section. **array1, **array2 and *xstring are examples of what we need to use for our variables. *xstring is a dynamic one dimensional character array, or string. While **array1 is a dynamic two dimensional character array. That is, it has both length and width. This is exactly what we need ! We need a one dimensional array for iv_stack and a two dimensional array for in_stack. Like this :

int *iv_stack;
char **in_stack;
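The overhead figures for the static arrays depend on the size of an integer for the compiler in use, so it can be worth letting the compiler do the arithmetic. A minimal sketch, assuming nothing beyond standard C; the function name stack_overhead() is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes used by the two static arrays for "count" variables whose
   names are stored in "name_len" characters each. */
size_t stack_overhead(size_t count, size_t name_len)
{
    assert(count > 0 && name_len > 0);
    return count * sizeof(int)                  /* the value stack */
         + count * name_len * sizeof(char);     /* the name stack  */
}
```

With 2-byte integers, stack_overhead(100, 16) works out to the 1800 bytes quoted above; on a compiler with 4-byte integers the same call gives 2000 bytes.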

Accessing dynamic arrays is no different than accessing static arrays. But, creating them is not as simple. A little more effort goes into creating and adding to a dynamic array. You have to keep track of how many variables you currently have, in order to add a new variable. This is not really a problem though. You just use an integer variable to keep a count of the total number of variables:

int imax_vars;

When a new variable is added, the count gets incremented. Simple! Implementing this is going to require several small changes and some code additions.
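Before making the changes, here is the bookkeeping in miniature. This is a stand-alone sketch and not the Bxbasic code: it wraps the two arrays and the counter in a hypothetical struct int_table, and the names table_add() and table_clear() are invented for the example. It also checks the result of every allocation, which is an addition of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define VAR_NAME 16

/* Illustration only: a self-contained table, not the Bxbasic globals. */
struct int_table {
    int   *values;      /* plays the role of iv_stack  */
    char **names;       /* plays the role of in_stack  */
    int    count;       /* plays the role of imax_vars */
};

/* Adds one variable; returns its index, or -1 if memory ran out. */
int table_add(struct int_table *t, const char *name, int value)
{
    int *v;
    char **n;
    char *slot;

    assert(t != NULL && name != NULL && strlen(name) < VAR_NAME);
    /* realloc() on a NULL pointer behaves like malloc() */
    v = realloc(t->values, (t->count + 1) * sizeof *v);
    if (v == NULL) return -1;
    t->values = v;
    n = realloc(t->names, (t->count + 1) * sizeof *n);
    if (n == NULL) return -1;
    t->names = n;
    slot = malloc(VAR_NAME);            /* room for one name */
    if (slot == NULL) return -1;
    strcpy(slot, name);
    t->names[t->count] = slot;
    t->values[t->count] = value;
    return t->count++;                  /* the count gets incremented */
}

/* Releases everything and goes back to "zero variables". */
void table_clear(struct int_table *t)
{
    int i;

    for (i = 0; i < t->count; i++)
        free(t->names[i]);
    free(t->names);
    free(t->values);
    t->names = NULL;
    t->values = NULL;
    t->count = 0;
}
```

Starting from an empty table ({NULL, NULL, 0}), each call to table_add() grows both arrays by exactly one element, so the only limit on the number of variables is the memory available.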


So, we will start at the top :
• begin by deleting MAX_VARS from the list of constants,

/* --- declare constants --- */
#define BUFSIZE 81
#define LINE_NUM 6
#define TOKEN_LEN 21
#define VAR_NAME 16

• copy these new versions of iv_stack and in_stack, and the addition of imax_vars, to the global-vars list:

int *iv_stack;      /* stack:integer variable values */
char **in_stack;    /* stack:integer variable names */
int imax_vars=0;    /* stack:integer variable counter */

• in the function prototypes, delete init_vars() from the list,
• add these two new function prototypes to the prototypes list,

void init_int(void);
void clr_int(void);

• from "main()", delete the call to init_vars(),
• replace the existing parse_let() with this one :

void parse_let()
{
    char ch, varname[VAR_NAME], cvalue[6];
    int pi, si=0, stlen, ndx=0, x=line_ndx;
    int vflag=0, vi_pos=0, ivalue, ab_code;

    stlen = strlen(p_string);
    pi = e_pos;
    /* ------- retrieve variable name from statement ------- */
    pi = get_alpha(pi, stlen);
    ch = p_string[pi];
    if(pi == stlen)                     /* error: didn't find it */
    {
        ab_code=11;
        a_bort(ab_code, x);
    }
    while((isalnum(ch) != 0))           /* now copy name to varname */
    {
        varname[si] = ch;
        si++;
        pi++;
        ch = p_string[pi];
    }
    varname[si] = '\0';
    /* ---------- we now have complete varname ---------- */
    /* --- now compare name to array for existing variable --- */
    while((ndx < imax_vars) && (strcmp(in_stack[ndx], varname) != 0))
    {
        if(vflag == 0)
        {
            if(in_stack[ndx][0] == '\0')    /* found a null */
            {
                vi_pos = ndx;               /* mark this array index */
                vflag = 1;                  /* set the exit flag to true */
            }
        }
        ndx++;                              /* increment the index */
    }
    if(ndx == imax_vars)        /* did we reach the end of the stack */
    {
        ndx = vi_pos;           /* next available stack location */
        if(vflag == 0)
        {
            init_int();         /* initialize a new integer variable */
            ndx = imax_vars;
            ndx--;
            strcpy(in_stack[ndx], varname);     /* save new varname */
        }
        else                    /* if not, store this variable */
        {
            strcpy(in_stack[ndx], varname);
        }
    }
    pi = iswhite(pi);
    ch = p_string[pi];
    /* ------- now get assignment value --------- */
    if(ch == '=')
    {
        pi = get_digit(pi, stlen);
        ch = p_string[pi];
        vi_pos = 0;
        /* cvalue[6] has room for 5 digits plus the terminating null */
        while((isdigit(ch) != 0) && (vi_pos < 5))
        {
            cvalue[vi_pos] = ch;
            pi++;
            vi_pos++;
            ch = p_string[pi];
        }
        cvalue[vi_pos] = '\0';
        ivalue = atoi(cvalue);
        iv_stack[ndx] = ivalue;
    }
    else
    {
        ab_code=0;
        a_bort(ab_code, x);
    }
}
/*-------------------------------*/


• and replace get_varvalue() with this one :

int get_varvalue() { char ch, varname[VAR_NAME]; int pi, si=0, ndx=0, ivalue; int ab_code=13, x=line_ndx; pi = e_pos; ch = p_string[pi]; while((isalnum(ch) != 0)) { varname[si] = ch; si++; pi++; ch = p_string[pi]; } varname[si] = '\0'; /* --- now compare name to array --- */ while((ndx < imax_vars) && (strcmp(in_stack[ndx], varname) != 0)) { ndx++; /* find varname in stack */ } if(ndx == imax_vars) /* error: did not find it */ { a_bort(ab_code, x); } ivalue = iv_stack[ndx]; /* copy stack-value to row */ e_pos = pi; return ivalue; } /*-------------------------------*/

• add these two new functions to Bxbasic.c :

void init_int()
{
    int ndx;
    unsigned size;

    if(imax_vars == 0)
    {
        ndx = imax_vars;
        imax_vars++;
        size = imax_vars;
        iv_stack = malloc(size * sizeof(int));      /* iv_stack is an int array */
        in_stack = malloc(size * sizeof(char *));
        size = VAR_NAME;
        in_stack[ndx] = malloc(size * sizeof(char));
    }
    else
    {
        ndx = imax_vars;
        imax_vars++;
        size = imax_vars;
        iv_stack = realloc(iv_stack, size * sizeof(int));
        in_stack = realloc(in_stack, size * sizeof(char *));
        size = VAR_NAME;
        in_stack[ndx] = malloc(size * sizeof(char));
    }
}
/*---------- end init_int -----------*/

void clr_int() { int ndx; if(imax_vars > 0) { free(iv_stack); for(ndx=0; ndx < imax_vars; ndx++) { free(in_stack[ndx]); } free(in_stack); imax_vars = 0; } } /*---------- end clr_int -----------*/

• now, in parser(), change the CLEAR block to read :

else if(strcmp(token, "CLEAR") == 0)
{
    clr_int();
}

There, that's it! Save Bxbasic.c. Examine the code changes made to parse_let() and get_varvalue(). Be sure you understand what is happening in init_int() and clr_int(). Okay, re-compile Bxbasic.c and try it with Test.bas. The changes should be completely transparent, in that everything appears to be working as it did before. The difference is that there is no fixed limit on the number of variables, other than available memory.


MODULES
Now that we have a working scripting-engine, with a vocabulary of nine keywords and dynamically created variables, it's time to start shuffling some of our routines into modules of their own, where they can be grouped by function and type. We need to create a header file that will contain the prototypes of the functions we will be moving out of Bxbasic.c. Begin by :
• creating a new file,
• name it "Prototyp.h",
• copy these prototypes to "Prototyp.h":

/* bxbasic : Prototyp.h : alpha version */
/* ----- function prototypes ----- */
/* Error.c */
void a_bort(int,int);
/* Input.c */
void line_cnt(char *argv[]);
void program_array(void);
/* Output.c */
void beep(void);
void cls(void);
void get_prnstring(void);
void get_prnvar(void);
void locate(void);
/* Utility.c */
int get_upper(int,int);
int get_alpha(int,int);
int get_digit(int,int);
int iswhite(int);
/* Variable.c */
void parse_let(void);
int get_varvalue(void);
void init_int(void);
void clr_int(void);

• And save it.

Notice the file-names in the comments, that's to help us remember where they are if we need to modify them. Now:
• Delete those same prototypes from Bxbasic.c.

We will now create the individual module files and move those functions into them. The first thing we will do is:


• modify Error.c. Change the heading to read as follows:

/* bxbasic : error.c : alpha version */
/* ----- includes ----- */
#include "prototyp.h"

• delete the original function prototype,
• save Error.c and close it.
• Create a new file and name it "Utility.c",
• copy this information and these functions into file Utility.c:

/* bxbasic : Utility.c : alpha version */
/* ----- function prototypes ----- */
#include "prototyp.h"

int get_upper(int pi, int stlen) { char ch; ch = p_string[pi]; while((isupper(ch) == 0) && (pi < stlen)) { pi++; ch = p_string[pi]; } return pi; } /*-------------------------------*/

int get_alpha(int pi, int stlen) { char ch; ch = p_string[pi]; while((isalpha(ch) == 0) && (pi < stlen)) { pi++; ch = p_string[pi]; } return pi; } /*-------------------------------*/


int get_digit(int pi, int stlen) { char ch; ch = p_string[pi]; while((isdigit(ch) == 0) && (pi < stlen)) { pi++; ch = p_string[pi]; } return pi; } /*-------------------------------*/

int iswhite(int pi)
{
    char ch;

    ch = p_string[pi];
    while(isspace(ch) != 0)     /* if next char is "whitespace" */
    {
        pi++;                   /* get rid of it */
        ch = p_string[pi];
    }
    return pi;
}
/*-------- end iswhite ----------*/

• now delete each of these functions from Bxbasic.c :

get_upper();
get_alpha();
get_digit();
iswhite();

• save Utility.c and close it.
• Create a new file and name it "Output.c",
• copy this information and these functions into Output.c:

/* bxbasic : Output.c : alpha version */
/* ----- includes ----- */
#include "prototyp.h"


void beep() { printf("%c", 7); } /*-------------------------------*/

void cls() { #ifdef Power_C clrscrn(); /* CLS - PowerC.ver */ #endif #ifdef LccWin32 clrscr(); /* CLS - LCC.ver */ #endif } /*------ end cls ------*/

void get_prnstring()
{
    char ch, quote='\"';
    int pi, si=0;

    pi = e_pos;
    ch = p_string[pi];
    if(ch == ':')
    {
        printf("\n");
        return;
    }
    else if(isalpha(ch))
    {
        get_prnvar();
        return;
    }
    pi++;
    ch = p_string[pi];
    while(ch != quote)
    {
        xstring[si] = ch;
        pi++;
        si++;
        ch = p_string[pi];
    }
    xstring[si]='\0';
    pi++;
    pi = iswhite(pi);
    ch = p_string[pi];
    if(ch == ',')
    {
        printf("%s%c", xstring, '\t');
    }
    else if(ch == ';')
    {
        printf("%s", xstring);
    }
    else
    {
        printf("%s\n", xstring);
    }
    free(xstring);
}
/*-------------------------------*/

void get_prnvar() { char ch; int pi, ivalue; pi = e_pos; pi = iswhite(pi); e_pos = pi; ivalue = get_varvalue(); pi = e_pos; pi = iswhite(pi); ch = p_string[pi]; /* --- comma="tab", semi-colon="no \n", Default is: colon="\n". --- */ if(ch == ',') { printf("%d%c", ivalue, '\t'); } else if(ch == ';') { printf("%d", ivalue); } else { printf("%d\n", ivalue); } } /*-------------------------------*/

void locate()
{
    char ch, rows[3], cols[3];
    int pi, stlen, row_x, col_y;
    int si=0, ab_code=10, x=line_ndx;

    stlen = strlen(p_string);
    pi = e_pos;
    pi = iswhite(pi);
    ch = p_string[pi];
    /* -------- rows -------------- */
    if(isalpha(ch))                 /* this is a variable name */
    {
        e_pos = pi;
        row_x = get_varvalue();
        pi = e_pos;
        pi = iswhite(pi);
        ch = p_string[pi];
    }
    else if(isdigit(ch))            /* this is a number */
    {
        while((isdigit(ch) != 0) && (si < 2))   /* rows[] holds 2 digits */
        {
            rows[si] = ch;
            pi++;
            si++;
            ch = p_string[pi];
        }
        rows[si] = '\0';
        row_x = atoi(rows);         /* convert alpha to integer */
    }
    else                            /* error: failed to find an alpha */
    {
        a_bort(ab_code, x);
    }
    pi = iswhite(pi);
    ch = p_string[pi];
    if(ch == ',')                   /* comma separates row and column */
    {
        pi++;
        pi = iswhite(pi);
        ch = p_string[pi];
    }
    else
    {
        a_bort(ab_code, x);
    }
    /* -------- columns -------------- */
    if(isalpha(ch))                 /* this is a variable name */
    {
        e_pos = pi;
        col_y = get_varvalue();
    }
    else if(isdigit(ch))            /* this is a number */
    {
        si = 0;
        while((isdigit(ch) != 0) && (si < 2))   /* cols[] holds 2 digits */
        {
            cols[si] = ch;
            pi++;
            si++;
            ch = p_string[pi];
        }
        cols[si] = '\0';
        col_y = atoi(cols);         /* convert alpha to integer */
    }
    else
    {
        a_bort(ab_code, x);
    }
    /* --- now position cursor --- */
    /* ---------- Power-C version ----------- */
#ifdef Power_C
    poscurs(row_x, col_y);
#endif
    /* ---------- Lcc version --------------- */
#ifdef LccWin32
    row_x++;
    col_y++;
    gotoxy(col_y,row_x);
#endif
}
/*-------------------------------*/

• now delete each of these functions from Bxbasic.c :

beep();
cls();
get_prnstring();
get_prnvar();
locate();

• save Output.c and close it.
• Create a new file and name it "Variable.c",
• copy this information and these functions into Variable.c :

/* bxbasic : Variable.c : alpha version */
/* ----- includes ----- */
#include "prototyp.h"

void parse_let()
{
    char ch, varname[VAR_NAME], cvalue[6];
    int pi, si=0, stlen, ndx=0, x=line_ndx;
    int vflag=0, vi_pos=0, ivalue, ab_code;

    stlen = strlen(p_string);
    pi = e_pos;
    /* ------- retrieve variable name from statement ------- */
    pi = get_alpha(pi, stlen);
    ch = p_string[pi];
    if(pi == stlen)                     /* error: didn't find it */
    {
        ab_code=11;
        a_bort(ab_code, x);
    }
    while((isalnum(ch) != 0))           /* now copy name to varname */
    {
        varname[si] = ch;
        si++;
        pi++;
        ch = p_string[pi];
    }
    varname[si] = '\0';
    /* ---------- we now have complete varname ---------- */
    /* --- now compare name to array for existing variable --- */
    while((ndx < imax_vars) && (strcmp(in_stack[ndx], varname) != 0))
    {
        if(vflag == 0)
        {
            if(in_stack[ndx][0] == '\0')    /* found a null */
            {
                vi_pos = ndx;               /* mark this array index */
                vflag = 1;                  /* set the exit flag to true */
            }
        }
        ndx++;                              /* increment the index */
    }
    if(ndx == imax_vars)        /* did we reach the end of the stack */
    {
        ndx = vi_pos;           /* next available stack location */
        if(vflag == 0)
        {
            init_int();         /* initialize a new integer variable */
            ndx = imax_vars;
            ndx--;
            strcpy(in_stack[ndx], varname);     /* save new varname */
        }
        else                    /* if not, store this variable */
        {
            strcpy(in_stack[ndx], varname);
        }
    }
    pi = iswhite(pi);
    ch = p_string[pi];
    /* ------- now get assignment value --------- */
    if(ch == '=')
    {
        pi = get_digit(pi, stlen);
        ch = p_string[pi];
        vi_pos = 0;
        /* cvalue[6] has room for 5 digits plus the terminating null */
        while((isdigit(ch) != 0) && (vi_pos < 5))
        {
            cvalue[vi_pos] = ch;
            pi++;
            vi_pos++;
            ch = p_string[pi];
        }
        cvalue[vi_pos] = '\0';
        ivalue = atoi(cvalue);
        iv_stack[ndx] = ivalue;
    }
    else
    {
        ab_code=0;
        a_bort(ab_code, x);
    }
}
/*-------------------------------*/


int get_varvalue() { char ch, varname[VAR_NAME]; int pi, si=0, ndx=0, ivalue; int ab_code=13, x=line_ndx; pi = e_pos; ch = p_string[pi]; while((isalnum(ch) != 0)) { varname[si] = ch; si++; pi++; ch = p_string[pi]; } varname[si] = '\0'; /* --- now compare name to array --- */ while((ndx < imax_vars) && (strcmp(in_stack[ndx], varname) != 0)) { ndx++; /* find varname in stack */ } if(ndx == imax_vars) /* error: did not find it */ { a_bort(ab_code, x); } ivalue = iv_stack[ndx]; /* copy stack-value to row */ e_pos = pi; return ivalue; } /*-------------------------------*/

void init_int() { int ndx; unsigned size; if(imax_vars == 0) { ndx = imax_vars; imax_vars++; size = imax_vars; iv_stack = malloc(size * sizeof(int)); in_stack = malloc(size * sizeof(char *)); size = VAR_NAME; in_stack[ndx] = malloc(size * sizeof(char)); } else { ndx = imax_vars; imax_vars++; size = imax_vars; iv_stack = realloc(iv_stack, size * sizeof(int)); in_stack = realloc(in_stack, size * sizeof(char *)); size = VAR_NAME; in_stack[ndx] = malloc(size * sizeof(char)); } } /*---------- end init_int -----------*/


void clr_int() { int ndx; if(imax_vars > 0) { free(iv_stack); for(ndx=0; ndx < imax_vars; ndx++) { free(in_stack[ndx]); } free(in_stack); imax_vars = 0; } } /*---------- end clr_int -----------*/

• now delete each of these functions from Bxbasic.c :

parse_let();
get_varvalue();
init_int();
clr_int();

• save Variable.c and close it.
• Create a new file and name it "Input.c",
• copy this information and these functions into Input.c :

/* bxbasic : Input.c : alpha version */
/* ----- includes ----- */
#include "prototyp.h"


void line_cnt(char *argv[]) { int line_counter=0, ab_code=2; int fnam_len, x=0; fnam_len = strlen(argv[1]); fnam_len++; prog_name = malloc(fnam_len * sizeof(char)); strcpy(prog_name, argv[1]); f_in = fopen(prog_name,"r"); /* does program_name.bas exist */ if(f_in == NULL) /* file not found */ { a_bort(ab_code, x); } else { while(!feof(f_in)) /* until EOF, read-in and */ { fgets(p_string, BUFSIZE, f_in); /* count each line */ if(!feof(f_in)) { line_counter++; } } fclose(f_in); } nrows=line_counter; } /*-------------------------------*/

void program_array()
{
    char ch, ln_holder[LINE_NUM];
    int ii, len, pi;

    array1 = malloc(nrows * sizeof(char *));
    for(ii = 0; ii < nrows; ii++)
    {
        array1[ii] = malloc(ncolumns * sizeof(char));
    }
    array2 = malloc(nrows * sizeof(char *));
    for(ii = 0; ii < nrows; ii++)
    {
        array2[ii] = malloc(LINE_NUM * sizeof(char));
    }
    f_in = fopen(prog_name,"r");
    ii = 0;
    while(!feof(f_in))              /* p_string holds incoming-data */
    {
        fgets(p_string, BUFSIZE, f_in);
        if(!feof(f_in))
        {
            /* use the length of the text, not the size of the buffer */
            len = strlen(p_string);
            /* ----- fill array1[] here ----- */
            strcpy(array1[ii], p_string);   /* pass p_string to array1[] */
            array1[ii][len] = '\0';         /* add string terminator */
            /* ----- fill array2[] here ----- */
            pi = 0;
            ch = p_string[pi];
            while(isdigit(ch))
            {
                ln_holder[pi] = ch;
                pi++;
                ch = p_string[pi];
            }
            ln_holder[pi] = '\0';
            strcpy(array2[ii], ln_holder);
        }
        ii++;
    }
    fclose(f_in);
}
/*-------------------------------*/

• now delete each of these functions from Bxbasic.c :

line_cnt();
program_array();

• save Input.c and close it.
• Replace the old "function includes" list in Bxbasic.c, with this one :

/* --- function includes --- */
#include "prototyp.h"
#include "error.c"
#include "utility.c"
#include "output.c"
#include "variable.c"
#include "input.c"

Done! Now, re-compile Bxbasic.c. Make sure we didn't introduce any new errors. It should compile without errors, but, if you should encounter any, backtrack and make sure you followed each step correctly. If it compiled without errors, test it out with Test.bas. Try changing Test.bas around and add some more code, get to know its limits and breaking points.


CONCLUSION
We've covered quite a bit in this section. We learned how to use a pair of static arrays as storage space for variables, in which we stored the variable name and the value. Ultimately we converted the process to using dynamically created arrays. Once we created variables, we learned how to access and make use of them. And, we've expanded our language definition by adding another keyword in the process. We now have integer variables, but, there are other variable types we need to add, too. We still have a long way to go.


CHAPTER - 3
INTRODUCTION
Welcome. In the last chapter we covered quite a lot of ground. We jumped right into creating variables, experimented with using static arrays for variables and settled on using dynamic arrays. We learned how to assign variables and retrieve them and we even put our variables to work using our new keyword LOCATE . I expect this chapter to be quite interesting, we're going to incorporate a great deal of new functionality and we're going to do some experimentation in this issue. Well, let's get started.

COMPILERS
I'd like to start this chapter with a discussion on the subject of compilers. After all, compilers are part of what we are doing here. Since we haven't discussed byte-code, we are not really ready for a full-on byte-code compiler at this point in our project, but, I'd like to give you a hint about how byte-code compilers work. Heck, we'll even build one using the tools we have so far !

There are different types of compilers. There are Native Code compilers, which output Assembly or Machine Language for a targeted CPU and Operating System. Assembly Language requires an assembler to assemble the Assembly code into Machine Language or Machine Code. Machine Language is the actual ones and zeros (binary code) that the cpu understands. Example of Assembly Language:

...
Display:                    ; block label
    mov   ah, [bx]          ; get new byte
    mov   es:[di], ax       ; move byte to memory
    add   di, 320           ; point to new position
    inc   bx                ; point to next byte
    loop  DISPLAY           ; continue looping
...

Assembly Language (or Assembler) is extremely powerful, because you are communicating with the cpu directly at the cpu level. But, Assembly Language is also extremely low-level and not user friendly, in that Assembler lacks the familiar constructs of a higher level language. For instance, Assembler doesn't have the basic ability to perform something simple like:

PRINT "hello world!"
To do that requires a lengthy set of instructions that copy a string in memory, byte-for-byte, either directly into the display hardware's memory or calls a BIOS or Operating System routine to perform the operation. To write programs in Assembler requires an intimate knowledge of the target cpu, computer hardware and Operating System.


A high-level language Compiler or Interpreter, such as Basic, C/C++, Pascal, JavaScript and others, is comprised of large libraries of these Assembly or Machine Language routines. As an example, let's assume for a moment that from the cpu's point of view there is only one way to display a message on the screen. What differentiates one high level language from another is the command for performing that task.
• One language might call it PRINT as in Basic,
• another might call it WRITE as in Pascal,
• C uses "printf",
• while C++ uses "cout",
• and an OOP language like JavaScript requires an entire sentence.

You see what I mean. In actuality, each high level language has its own libraries of routines for performing various tasks. Each language uses its own keywords and methods for calling the libraries of Machine Language routines to achieve the same or similar result.

Then there are Byte-Code or P-Code (short for pseudo-code) compilers. A typical byte-code compiler doesn't convert or assemble the source code directly into Assembly or Machine code. Instead, it begins with the Scripting Engine or Interpreter, which is made up of the entire library of Assembly or Machine Language routines that make up the specific language or dialect. Then it runs the source code through a series of filters that translate the source code into a character coded representation of the language. For instance, let's say that the keyword PRINT can be represented by the integer value zero ("0") and the source code has a print statement like this:

PRINT "hello world!"
the byte-code compiler would reduce this to something like:

0 hello world!
The Byte-Code compiler would have read the above statement, extracted and interpreted the keyword PRINT and translated it into the single byte value of zero. Next, based on the fact that the compiler now knows that this is a PRINT statement, it then proceeds to interpret the context of the PRINT statement, which in this case is the constant character string: hello world!. Both of these elements, the "token" zero and the character string, are then stored in memory until the compiler is ready to write the fully tokenized source code to a file.


THE BYTE-CODE
Here is an illustration of how keywords can be tokenized into byte code :

    Keywords    Byte value
    PRINT       0
    LET         1
    GOTO        2
    CLS         3
    LOCATE      4
    END         5
    IF          6
    FOR         7
    ...         ...

Source code:

    1 REM test.bas version 2
    2 CLS
    3 PRINT "hello world!"
    4 LET abc = 2
    5 LET xyz = 10
    6 LOCATE abc , xyz
    7 PRINT "hello world!"
    8 LET abc = 4
    9 LET xyz = 20
    10 LOCATE abc, xyz
    11 PRINT "hello world!"
    12 GOTO 100
    50 END

Byte-code:

    3
    0 hello world!
    1 abc=2
    1 xyz=10
    4 abc,xyz
    0 hello world!
    1 abc=4
    1 xyz=20
    4 abc,xyz
    0 hello world!
    2 100
    5

Each keyword is represented by a number value. At runtime, the scripting-engine does not have to read-in each letter that makes up a keyword, compare it to a list of keywords and then take action based on the result of the comparison. The engine merely reads the single token and takes direct action based on the integer value of the token.
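To make the idea concrete, here is a minimal sketch of the keyword-to-byte-value mapping from the illustration above. It is only an assumption about how such a table might look, not code from Bxbasic; the names keywords[], keyword_to_token() and token_to_keyword() are invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword table: the index of each entry is its byte value,
   matching the illustration (PRINT=0, LET=1, GOTO=2, ...). */
static const char *const keywords[] = {
    "PRINT", "LET", "GOTO", "CLS", "LOCATE", "END", "IF", "FOR"
};

#define KEYWORD_COUNT ((int)(sizeof(keywords) / sizeof(keywords[0])))

/* Compiler side: return the byte value for a keyword,
   or -1 if the word is not in the table. */
int keyword_to_token(const char *word)
{
    int i;

    for (i = 0; i < KEYWORD_COUNT; i++) {
        if (strcmp(word, keywords[i]) == 0) {
            return i;
        }
    }
    return -1;
}

/* Reverse lookup: return the keyword for a byte value,
   or NULL if the value is out of range. */
const char *token_to_keyword(int token)
{
    if (token < 0 || token >= KEYWORD_COUNT) {
        return NULL;
    }
    return keywords[token];
}
```

The string comparisons happen once, at compile time. At runtime an engine would only need the integer, for example as the selector of a switch statement.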

THE MERGE
After the source code has been compiled, how and from where does the scripting engine read the compiled and tokenized source code ? Well it turns out to be a rather simple matter of attaching or merging the byte-coded source to the engine. Example:


• you begin by telling the scripting engine where the engine ends and the byte-code begins. This is as simple as planting the engine with an "offset" value, which is the file length of the engine, in actual bytes, not kb,

    [Figure: scripting engine, with its file length marked as the offset]

• the compiler interprets the source code, which is now reduced to byte-code,

    [Figure: source code -> compiler -> byte-code]

• the compiler then merges the engine and byte-code,

    [Figure: scripting engine + byte-code -> merge -> program.exe (scripting engine followed by byte-code, with the offset marking where the byte-code begins)]

At runtime, the first thing the engine does is read the "offset" value. It then opens the executable file for "Reading" and moves the file index-pointer to the "offset" position at the back end of the "program.exe". The engine now starts reading the byte-coded statements back into memory and begins executing the program. There are a number of ways that you can merge the Engine with the Byte-Code, but, it all boils down to copying the engine and byte-code into one new executable file.
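The merge described above boils down to two byte-for-byte copies into one output file. Here is a minimal sketch of that idea. The function names copy_stream() and merge_streams() are made up for this illustration, and it works on streams that are already open (in binary mode) so that the error handling can stay out of the way.

```c
#include <stdio.h>

/* Copy every remaining byte of `in` to `out`; return the number copied. */
long copy_stream(FILE *in, FILE *out)
{
    int c;
    long count = 0;

    while ((c = fgetc(in)) != EOF) {
        fputc(c, out);
        count++;
    }
    return count;
}

/* Write the engine followed by the payload to `out`. The return value is
   the offset at which the payload begins, which is simply the length of
   the engine in bytes. */
long merge_streams(FILE *engine, FILE *payload, FILE *out)
{
    long offset = copy_stream(engine, out);

    copy_stream(payload, out);
    return offset;
}
```

Note that fgetc() returns an int, so the loop can tell a real byte apart from EOF. Whatever is appended after the engine, the offset never changes as long as the engine file itself doesn't.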


THE COMPILER
As I stated before, we have no byte-code to work with, but, that doesn't have to stop us from constructing a rudimentary compiler and even testing it. What we can do is just use the source code we have and pretend it's byte-code. What I mean is, append the actual source code itself to the back-end of the engine. Let's get started. The first thing we need to do is put together the program skeleton for our compiler, with all the headers, definitions, declarations and prototypes, just like we did for Bxbasic when we started.
• Start by opening a new file and name it "Bxcomp.c", do this in the same working directory as Bxbasic.c.
• If your C compiler requires it, like Lcc does, simply create a new project using the same working directory, with an object filename as "Bxcomp.exe", now copy the following into Bxcomp.c :

    /* bxbasic: Bxcomp.c : alpha version.01 */

    /* --- declare headers --- */
    #include <stdio.h>
    #include <io.h>
    #include <stdlib.h>
    #include <string.h>

    /* --- declare constants --- */
    #define BUFSIZE 81
    #define PATH 81

    /* ------ global vars ------------ */
    FILE *f_in, *f_out;         /* these are the i/o file handles */
    char *prog_name;            /* program source-file name */
    char p_string[BUFSIZE];     /* file input string */
    char t_holder[20];          /* token data holder */
    int nrows;                  /* numbers of lines in source file */

    /* ----- function prototypes ----- */
    void line_cnt(char *argv[]);
    void merge_source(void);
    void a_bort(int);

As you can see, this is somewhat reduced in size compared to Bxbasic.
• now we need to add our "main()" function, copy this code to Bxcomp.c:

    /* ----- begin program ------------- */
    int main(int argc, char *argv[])
    {
        int ab_code=1;

        printf("Bxbasic Compiler\n");
        if(argc != 2)
        {
            a_bort(ab_code);
        }
        strcpy(t_holder, argv[1]);
        line_cnt(argv);
        merge_source();
        return 0;
    }
    /*-------------------------------*/

You'll notice that, except for some minor changes, it resembles what we've done before. Also, main() calls two functions: line_cnt(), which we've seen before, and merge_source().
• Here's the code for line_cnt(), copy it to Bxcomp.c :

    void line_cnt(char *argv[])
    {
        int line_counter=0, ab_code=2;
        int fnam_len;

        fnam_len = strlen(argv[1]);
        fnam_len++;
        prog_name = malloc(fnam_len * sizeof(char));
        strcpy(prog_name, argv[1]);
        f_in = fopen(prog_name,"r");    /* does program_name.bas exist */
        if(f_in == NULL)                /* file not found */
        {
            a_bort(ab_code);
        }
        else
        {
            while(!feof(f_in))          /* until EOF, read-in and */
            {
                fgets(p_string, BUFSIZE, f_in);     /* count each line */
                if(!feof(f_in))
                {
                    line_counter++;
                }
            }
            fclose(f_in);
        }
        nrows=line_counter;
    }
    /*-------------------------------*/

As you can see, this is our same line counting function that we use in Bxbasic.c. We need this because we need to tell the engine how long the program storage array has to be.
• Next, as you might have guessed, function merge_source() is going to merge our two files together.
• Here is the code for merge_source() :

    void merge_source()
    {
        char ch, dot='.', *destin, source[20];
        int ii, data, ab_code=3;
        unsigned size=PATH;

        destin = malloc(size * sizeof(char));
        strcpy(destin, prog_name);          /* copy source file name */
        ii = 0;
        ch = '\0';
        while(ch != dot)
        {
            ch = prog_name[ii];             /* make source.bas = source. */
            ii++;
        }
        destin[ii] = '\0';
        strcat(destin, "exe");              /* append "exe" to filename */

        /* --- open destination file (write-binary) --- */
        f_out = fopen(destin,"wb");
        printf("Destination file: %s\n",destin);

        /* --- read-in scripting engine (read-binary) --- */
        strcpy(source, "engine.exe");
        f_in = fopen(source, "rb");
        if(f_in == NULL)
        {
            a_bort(ab_code);
        }
        printf("Source file: %s\n",source);
        while(! feof(f_in))
        {
            data = fgetc(f_in);             /* data = incoming stream */
            if(! feof(f_in))
            {
                fputc(data, f_out);         /* write to destination file */
            }
        }
        fclose(f_in);                       /* done copying engine */

        /* --- store nrows --- */
        fprintf(f_out,"%d\n", nrows);       /* save number of rows */

        /* --- read-in Basic source file --- */
        f_in = fopen(prog_name,"rb");
        printf("Source file: %s\n",prog_name);
        while(!feof(f_in))
        {
            data = fgetc(f_in);             /* data = incoming stream */
            if(!feof(f_in))
            {
                fputc(data, f_out);         /* write source to destination */
            }
        }
        fclose(f_in);
        fclose(f_out);
        free(prog_name);
        printf("Number of Rows=%d\nDone!\n",nrows);
    }
    /*------ end merge_source -------*/

There are three variables I'd like to point out to you : dot, destin and data.

"destin" is an abbreviation for "destination" and is the string variable that is going to contain the name of our destination (or merged) file. In this snippet of code you will see that :

    strcpy(destin, prog_name);          /* copy source file name */
    ii = 0;
    ch = '\0';
    while(ch != dot)
    {
        ch = prog_name[ii];             /* make source.bas = source. */
        ii++;
    }
    destin[ii] = '\0';
    strcat(destin, "exe");              /* append "exe" to filename */

• we begin by copying the source file name, which is held in the string : prog_name, into destin.
• The second step is to increment through the filename until we reach "dot", which of course is the dot separating the filename from the extension.
• Once we have the index of the "dot", the rest of the filename is truncated by placing a '\0' just after the dot.
• Then, the correct extension, "exe", is concatenated onto the end of our destination filename.
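One thing to watch for: the loop in merge_source() keeps scanning until it finds a dot, so a filename typed without an extension would send it past the end of the string. A hedged alternative, which is not part of Bxcomp.c, is to let strrchr() locate the last dot and to fall back to the whole name when there is none. The name make_exe_name() and its parameters are invented for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Write into dst the name in src with its extension replaced by ".exe".
   If src has no dot at all, ".exe" is simply appended.
   Returns 0 on success, or -1 if dst is too small to hold the result. */
int make_exe_name(const char *src, char *dst, size_t dst_size)
{
    const char *dot = strrchr(src, '.');    /* last dot, or NULL */
    size_t stem = dot ? (size_t)(dot - src) : strlen(src);

    if (stem + 5 > dst_size) {              /* stem + ".exe" + '\0' */
        return -1;
    }
    memcpy(dst, src, stem);                 /* copy the part before the dot */
    strcpy(dst + stem, ".exe");             /* add the new extension */
    return 0;
}
```

This differs from the original in that it searches for the last dot rather than the first, and it assumes src is a bare filename; a path whose directory part contains a dot would need extra care.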

The next procedure is to open the destination file for Writing. Having done that, we open the file : "engine.exe" for Reading (binary) as our source file. In this segment of code, we use the integer variable "data" to read the input stream from "engine.exe", one byte at a time and write it to the destination file.

    while(! feof(f_in))
    {
        data = fgetc(f_in);             /* data = incoming stream */
        if(! feof(f_in))
        {
            fputc(data, f_out);         /* write to destination file */
        }
    }
    fclose(f_in);                       /* done copying engine */

When we have read and copied the entire engine file, that file is closed, but, the output file is left open; we're not through with it yet. This statement writes the number of rows in the Basic source file to the destination file, as a line of text :

    fprintf(f_out,"%d\n", nrows);       /* save number of rows */

That number is the first thing the engine will need in order to create the program arrays before the rest of the source file can be read-in. The next step is to open the source file for Reading and Write the data, byte for byte, to the destination file. When that is done, the merger is complete. The only other thing we need to add is a small error handler. Here is the code, add this to the end of Bxcomp.c :

    void a_bort(int code)
    {
        switch(code)
        {
            case 1:
                printf("Unspecified Program Name.\n");
                printf("Enter:\"bxcomp source.bas\"\n");
                printf("code(%d)\n",code);
                break;
            case 2:
                printf("Program file:\"%s\" not found.\n", t_holder);
                printf("Enter: \"bxcomp source.bas\"\n");
                printf("Program Terminated.\ncode(%d)\n", code);
                break;
            case 3:
                printf("Engine.exe not found.\n");
                printf("Copy Script-Engine to this directory.\n");
                printf("Program Terminated.\ncode(%d)\n", code);
                break;
            default:
                printf("Program aborted, undefined error.");
                break;
        }
        exit(1);
    }
    /*-------------------------------*/

Now save Bxcomp.c and compile it. Hopefully, if it's not asking too much, it compiled without error.

BXCOMP.EXE
Now, how can we test Bxcomp.exe to make sure it's working ? Remember, we can't use it yet with Bxbasic, because Bxbasic is lacking the code to read-in the source code correctly. We can't compile and execute a source file, but, we can at least compile something, anything, to test that Bxcomp.exe is in fact working. Here's what we are going to do: we're going to make a copy of Bxbasic.exe and name it Engine.exe, then we are going to execute Bxcomp.exe with "Test.bas" on the command line and create a dummy "Test.exe" file. When that is done, by using a special tool, we are then going to look into the actual Test.exe file and confirm that Test.bas has indeed been appended to the end of Engine.exe. Let's get started :
• make a copy of Bxbasic.exe, in the same working directory, and rename it Engine.exe,
• now, make sure we have a copy of Test.bas, from ten to twenty lines long, in the same directory as well,
• write down on a piece of paper the exact file lengths, in bytes, (not kb), for both Engine.exe and Test.bas,
• now, at the command line, enter : BXCOMP TEST.BAS
• Bxcomp will then display some diagnostic information,
• if all went well, we should have a Test.exe file,
• at the command line, enter : DIR *.exe and examine the file length of Test.exe,
• it should be the combined length of the two numbers you wrote down, PLUS three bytes, is it ?
• the three extra bytes come from "nrows", which is written out as text: two digit characters (for a source file of ten to ninety-nine lines) plus a newline character.

Now, let's examine Test.exe a little closer. We are going to use a tool I included for use with this chapter, a little gem of a program called : "List.com" :
• be sure List.com is in this working directory (or at least in the search path),
• at the command line, enter : LIST TEST.EXE
• except for the very top and bottom lines, the screen is full of smiley faces and garbage,
• that's okay, what we are looking at is an Ascii form of the Binary Machine Language that makes up Engine.exe, (not very interesting is it ?),
• now, press the letter : W, notice how things moved around a bit,
• now, press the letter : B, and what do you see ?

There it is ! At the very end of Test.exe is our Test.bas source code. Question: how many lines did Test.bas have in it ? On the line above line #1 (or whatever the first line number is), at the end of the line, there is a number. Is that the correct number of lines in Test.bas ?


Pretty interesting, isn't it. Oh, if you are interested in learning Machine Language,
• press the letter : T, (top of page),
• now, press : ALT H, (hexadecimal mode),

Sorry, you're on your own from here. To exit from List.com, simply press: X
If you want a listing of List.com's commands, press: ? or the F1 key.

Well, wasn't that an interesting little excursion ? We still have one glaring problem: unfortunately, we can't execute Test.exe. If you will recall, the engine at the front of Test.exe is only a copy of Bxbasic.exe. If you execute Test.exe all that happens is you get the Bxbasic error handler. We can fix that !

THE ENGINE
It would sure be nice if we could throw together an engine as quickly as we threw together the compiler, without having to start from scratch, wouldn't it ? Actually, we can. And, a fully functioning one too, not just a dummy engine. We already have an up and running scripting engine that reads-in our source file, Test.bas, and executes whatever we tell it to, within the confines of our limited language. What if we modified it so that instead of reading-in the source code from Test.bas, it read-in the source from the executable file itself ? Does this sound familiar ? Here's what we need to do:
• begin by creating a new file, in the same working directory and name it : "Engine.c",
• if you are using Lcc, you will probably need to create a new project, do so, using the same directory, with the object file as "Engine.exe".
• since you've seen most all this code before, I won't waste time explaining it, just copy all of this code to Engine.c :

    /* bxbasic: Engine.c : alpha version.01 */
    /* #define Power_C */
    #define LccWin32

    /* --- declare headers --- */
    #include <stdio.h>
    #include <conio.h>
    #include <io.h>
    #include <stdlib.h>
    #include <ctype.h>
    #include <string.h>
    #include <malloc.h>

    #ifdef Power_C
    #include <bios.h>           /* Power-C version */
    #endif
    #ifdef LccWin32
    #include <tcconio.h>        /* LccWin32 version */
    #endif

    /* --- declare constants --- */
    #define BUFSIZE 81
    #define LINE_NUM 6
    #define TOKEN_LEN 21
    #define VAR_NAME 16

    /* ------ global vars ------------ */
    FILE *f_in, *f_out;         /* these are the i/o file handles */
    char *prog_name;            /* program source-file name */
    char p_string[BUFSIZE];     /* file input string */
    char **array1;              /* pointer to program array */
    char **array2;              /* pointer to line number array */
    char t_holder[20];          /* token data holder */
    char s_holder[BUFSIZE];     /* xstring (print) data holder */
    char token[TOKEN_LEN];      /* the token string */
    char *xstring;              /* the print string */
    int nrows;                  /* numbers of lines in source file */
    int ncolumns=BUFSIZE;       /* dimension for array1[][columns] */
    int line_ndx;               /* current execution line */
    int s_pos, e_pos;           /* pointers to start & end of token */
    /**/
    int *iv_stack;              /* stack:integer variable values */
    char **in_stack;            /* stack:integer variable names */
    int imax_vars=0;            /* stack:integer variable counter */
    /**/

    /* ----- function prototypes ----- */
    void load_bas1(void);
    void pgm_parser(void);
    void get_token(void);
    void parser(void);
    void xstring_array(void);
    void go_to(void);

    /* --- function includes --- */
    #include "prototyp.h"
    #include "error.c"
    #include "utility.c"
    #include "output.c"
    #include "variable.c"

    #include "enginput.c"

    /* ----- begin program ------------- */
    int main(int argc, char *argv[])
    {
        int x=0, ab_code=1;

        printf("bxbasic Engine\n");
        strcpy(token, argv[0]);     /* make argv[0] (source filename) global */
        load_bas1();
        pgm_parser();
        /* --- end of program --- */
        free(array1);
        free(array2);
        free(prog_name);
        return 0;
    }
    /*-------------------------------*/

    void pgm_parser()
    {
        line_ndx = 0;
        while(line_ndx < nrows)
        {
            s_pos = 0;
            e_pos = 0;
            get_token();
            parser();
            line_ndx++;
        }
    }
    /*-------------------------------*/

    void get_token()
    {
        char ch;
        int pi=0, ti=0, ab_code=3;
        int stlen, x=line_ndx;

        strcpy(p_string, array1[line_ndx]);
        stlen = strlen(p_string);
        pi = get_upper(pi, stlen);
        ch = p_string[pi];
        if(pi == stlen)
        {
            a_bort(ab_code, x);
        }
        while(isupper(ch))
        {
            token[ti] = ch;
            ti++;
            pi++;
            ch = p_string[pi];
        }
        token[ti]='\0';
        e_pos = pi;
    }
    /*-------------------------------*/

    void parser()
    {
        int ab_code=4, x=line_ndx;

        if(strcmp(token, "REM") == 0)
        {
            /* return */
        }
        else if(strcmp(token, "LET") == 0)
        {
            parse_let();
        }
        else if(strcmp(token, "CLEAR") == 0)
        {
            clr_int();
        }
        else if(strcmp(token, "LOCATE") == 0)
        {
            locate();
        }
        else if(strcmp(token, "PRINT") == 0)
        {
            xstring_array();
            get_prnstring();
        }
        else if(strcmp(token, "GOTO") == 0)
        {
            go_to();
        }
        else if(strcmp(token, "BEEP") == 0)
        {
            beep();
        }
        else if(strcmp(token, "CLS") == 0)
        {
            cls();
        }
        else if(strcmp(token, "END") == 0)
        {
            printf("\nEnd of Program\n");
            line_ndx = nrows;
        }
        else
        {
            a_bort(ab_code, x);
        }
    }
    /*-------------------------------*/

    void xstring_array()
    {
        char ch, quote='\"';
        int pi, si=0, ab_code;
        int stlen, x=line_ndx;

        pi = e_pos;
        pi = iswhite(pi);
        e_pos = pi;
        ch = p_string[pi];
        if(ch == ':')           /* if next character is a ":", get out */
        {
            return;
        }
        if(isalpha(ch))         /* if next character is an alpha, */
        {
            return;             /* it's a varname, get out */
        }
        stlen = strlen(p_string);
        if((ch != quote) || (pi == stlen))  /* next character must be a */
        {
            ab_code=9;                      /* quote, or error: */
            a_bort(ab_code, x);
        }
        else
        {
            ch = ' ';
            while((ch != quote) && (pi < stlen))
            {
                si++;
                pi++;
                ch = p_string[pi];
            }
            if((si <= 1) && (pi < stlen))
            {
                ab_code=5;
                a_bort(ab_code, x);
            }
            else if(pi >= stlen)
            {
                ab_code=6;
                a_bort(ab_code, x);
            }
            else
            {
                si++;
                xstring = malloc(si * sizeof(char));
            }
        }
    }
    /*-------------------------------*/


    void go_to()
    {
        char ch;
        char gtl_holder[LINE_NUM];
        int pi=0, lh=0, ab_code;
        int xtest, stlen, x=line_ndx;

        ch = ' ';
        gtl_holder[0] = '\0';
        while(isupper(ch) == 0)         /* advance to the word: GOTO */
        {
            ch = p_string[pi];
            pi++;
        }
        while(isupper(ch))              /* advance past the GOTO */
        {
            ch = p_string[pi];
            pi++;
        }
        ch = p_string[pi];
        if(isdigit(ch) == 0)            /* error, expected a number */
        {
            ab_code=7;
            a_bort(ab_code, x);
        }
        while(isdigit(ch))
        {
            gtl_holder[lh] = ch;
            pi++;
            lh++;
            ch = p_string[pi];
        }
        gtl_holder[lh] = '\0';          /* add string terminator */
        pi = -1;            /* now compare gtl_holder[] to array2[n] */
        xtest = -1;
        while(xtest != 0)
        {
            pi++;
            if(pi == nrows)             /* test this before touching array2[pi] */
            {
                strcpy(t_holder, gtl_holder);
                ab_code=8;
                a_bort(ab_code, x);     /* error, line not found */
            }
            xtest = strcmp(array2[pi], gtl_holder);
        }
        pi--;
        line_ndx = pi;                  /* set line_ndx to the goto_line */
    }
    /*-------------------------------*/

Now save this as Engine.c . Except for a few key differences, this code is almost a clone of Bxbasic.c. Note :
• that prototypes has a new function : "load_bas1()",
• and the includes has an addition as well : "enginput.c",
• and "main()" calls "load_bas1()".

    /* ----- function prototypes ----- */
    void load_bas1(void);

    /* --- function includes --- */
    #include "enginput.c"

    int main(int argc, char *argv[])
    {
        int x=0, ab_code=1;

        printf("bxbasic Engine\n");
        strcpy(token, argv[0]);     /* make argv[0] (source filename) global */
        load_bas1();
        pgm_parser();
        ...

So, now we need the code for "Enginput.c" and "load_bas1". First we need to create a new file and name it : "Enginput.c". Here is the code :

    /* Enginput.c : alpha version */
    /* ----- includes ----- */
    #include "prototyp.h"

    void load_bas1()
    {
        char ch, *tmp=token;
        char ln_holder[LINE_NUM];
        int ii, len, pi;
        unsigned buffr=10;
        long offset=12345;              /* offset: Engine.exe file length */

        /* --- open file --- */
        f_in = fopen(tmp,"r");          /* source file is now: "filename.exe" */
        fseek(f_in, offset, SEEK_SET);
        fgets(p_string, buffr, f_in);   /* fetch "nrows" */
        nrows = atoi(p_string);

        /* --- create arrays --- */
        array1 = malloc(nrows * sizeof(char *));
        for(ii = 0; ii < nrows; ii++)
        {
            array1[ii] = malloc(ncolumns * sizeof(char));
        }
        array2 = malloc(nrows * sizeof(char *));
        for(ii = 0; ii < nrows; ii++)
        {
            array2[ii] = malloc(LINE_NUM * sizeof(char));
        }
        ii = 0;
        while(!feof(f_in))              /* p_string holds incoming-data */
        {
            fgets(p_string, BUFSIZE, f_in);
            if(!feof(f_in))
            {
                /* strlen(), not sizeof(): sizeof(p_string) is always */
                /* BUFSIZE, one past the end of array1[ii]            */
                len = strlen(p_string);     /* pass p_string to array1[] */
                /* ----- fill array1[] here ----- */
                strcpy(array1[ii], p_string);
                array1[ii][len] = '\0';     /* add string terminator */
                /* ----- fill array2[] here ----- */
                pi = 0;
                ch = p_string[pi];
                while(isdigit(ch))
                {
                    ln_holder[pi] = ch;
                    pi++;
                    ch = p_string[pi];
                }
                ln_holder[pi] = '\0';
                strcpy(array2[ii], ln_holder);
            }
            ii++;
        }
        fclose(f_in);
    }
    /*-------------------------------*/

Copy this to "Enginput.c" and save it. You will notice that, except for the beginning part, this looks almost like "program_array()". Two things I would like to point out. The first is this line, in the variables section :

    long offset=12345;              /* offset: Engine.exe file length */

The "OFFSET" is the engine's file length, in bytes. This is important ! (the "12345" is a dummy number, a place holder)

Second, this is the part that positions the file index pointer to the end of the engine and the beginning of the source code (or byte-code) and reads-in "nrows" :

    /* --- open file --- */
    f_in = fopen(tmp,"r");          /* source file is now: "filename.exe" */
    fseek(f_in, offset, SEEK_SET);
    fgets(p_string, buffr, f_in);   /* fetch "nrows" */
    nrows = atoi(p_string);
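The seek-and-read step can be tried out on its own, away from the engine. This sketch assumes a helper named read_nrows_at(), which is not part of Enginput.c; it takes a stream that is already open and an offset, and it adds the return-value checks that load_bas1() leaves out.

```c
#include <stdio.h>
#include <stdlib.h>

/* Seek to `offset` in fp and read the line count that was stored there
   as a line of decimal text. Returns the count, or -1 if the seek or
   the read fails. On success the file position is left at the first
   byte after the count line, that is, at the first source line. */
int read_nrows_at(FILE *fp, long offset)
{
    char buf[16];

    if (fseek(fp, offset, SEEK_SET) != 0) {
        return -1;
    }
    if (fgets(buf, sizeof buf, fp) == NULL) {
        return -1;
    }
    return atoi(buf);
}
```

If the offset is wrong by even one byte, the text read here will not be the count line, which is why the exact file length of the engine matters so much.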

Now compile Engine.c. It should compile without errors. If you are not using Lcc or Power-C and you do get an error, any errors should be easily correctable.


TEST FIRE
The next thing to do is to "Fire It Up" :
• you'll need to drop down to the DOS command line and get the exact file length of Engine.exe, in bytes. If you remember, that number is going to be our OFFSET,
• using your editor, open "Enginput.c",
• in "load_bas1()" replace the numbers "12345" with the correct OFFSET value, MAKE NO OTHER CHANGES!
• With my current version of Lcc, the un-optimized compile of Engine.exe yielded a file length of 44060,
• now, having entered the OFFSET, save Enginput.c.
• Re-Compile Engine.c,
• again, look at the file length, make sure it's the same and didn't change in any way, if it did change, go back and fix it,
• before we proceed, take a look at the file length of Test.bas and write it down. My Test.bas file length was 259 bytes.

Now we are ready to fire this baby up ! While holding your breath, at the command line, enter : BXCOMP TEST.BAS

If you got the full diagnostic report, as before when we ran Bxcomp.exe, you can exhale now. If not, exhale anyway. If there was a problem, based on the error message, correct the problem and go through this procedure from the beginning. Assuming we got an error free compile, look at the file length of Test.exe and make sure it adds up. Mine was :

      44060   Engine.exe
        259   Test.bas
    +     3   nrows
      -----
      44322   Test.exe
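If you would rather have a program report a file length than read it off a directory listing, a small helper can do it. This is only a sketch under my own assumptions: file_length() is an invented name, and it relies on seeking to the end of a stream opened in binary mode, which works on the common DOS, Windows and Unix compilers but is not strictly guaranteed by the C standard.

```c
#include <stdio.h>

/* Return the length in bytes of a stream opened in binary mode,
   or -1 on error. The current file position is restored. */
long file_length(FILE *fp)
{
    long here = ftell(fp);              /* remember where we were */
    long size;

    if (here < 0 || fseek(fp, 0L, SEEK_END) != 0) {
        return -1;
    }
    size = ftell(fp);                   /* position at end = length */
    if (fseek(fp, here, SEEK_SET) != 0) {
        return -1;
    }
    return size;
}
```

With the three lengths in hand, the check from the tally is simply: length of Test.exe = length of Engine.exe + length of the "nrows" line + length of Test.bas.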

If it didn't add-up something is wrong. Go back and try to find out what might be wrong. If it did add-up, at the command line, enter : TEST

It should have run exactly as it would have had you entered : BXBASIC TEST.BAS

Pretty cool, huh ?

Now we have a Scripting-Engine and a Source-Code Compiler ! With tools like these, a person could transform their scripts into completely self-executing programs.


BACK TO BXBASICS
I'd like to shift gears now and focus in on variables again, or rather, things we might do with them. We have them working pretty well now, since we've made them dynamic. We are no longer limited by the number of variables we can have and they are considerably more memory efficient. Other than the fact that we only have integer variables at this point, the only limit we currently have on them is that we can only assign them a constant value. What we are in need of is the ability to calculate a value as a function of an assignment. We need the ability to add, subtract and multiply both constants and variables, in the same expression. What we need is an Expression Parser. By no means am I a mathematical wizard, but, I am intrigued by mathematical algorithms. I have found that Expression Parsers, next to Compression Algorithms, are right up there. Several months ago, having done a little bit of research on the subject, I pondered the various aspects of Expression Parsing and set out to write one of my own. After many writes and rewrites, I succeeded in handcrafting one that works with a great deal of accuracy. Although, it lacks elegance and I have to admit, it operates in kind of a weird way, I was rather proud of it, being it was my first attempt. I've always held the belief that you shouldn't shy away from attempting something just because you don't know the established or even correct way of doing it.

EXPRESSION PARSER
Originally, I was going to use my expression parser, but, then I decided that it would be a far better idea to use one of Jack Crenshaw's parsers. I mentioned in Chapter 1 that I had just finished translating Jack Crenshaw's tutorial series from Pascal into C. That gave me the wonderful opportunity to really study his various methods of designing and implementing Expression and Recursive Descent Parsers. I have to say, Jack really knows his stuff. What I've done for this issue is to take one of Jack's parsers and make the little tweaks to adapt it for use in Bxbasic. We're going to take this a little easy at first, and start off with a parser that will handle numeric expressions only. Then, we will add variables into the equation. I'll start by giving you all the code first, then, I'll do a bit of explaining how it works. First, we need to create a new file to drop the parser and its supporting functions in:
• with your editor, create a new file and name it "Rdparser.c",
• here is the code, in its entirety :

    /* bxbasic : Rdparser.c : alpha version */
    /* special credits to: Jack Crenshaw's tutorial: "How to Build a Compiler" */
    /* ----- function prototypes ----- */
    #include "prototyp.h"

    int rdp_main()              /* Recursive Descent Parser Main */
    {
        int value;

        value = Expression();
        return value;
    }
    /*-------------------------------*/

    int Expression()            /* Parse and Translate an Expression */
    {
        char ch;
        int pi, Value;

        pi = e_pos;
        ch = p_string[pi];
        if(IsAddop(ch))
        {
            Value = 0;
        }
        else
        {
            Value = Term();
            pi = e_pos;
            ch = p_string[pi];
        }
        while(IsAddop(ch))
        {
            switch(ch)
            {
                case '+':
                    Match('+');
                    Value = Value + Term();
                    break;
                case '-':
                    Match('-');
                    Value = Value - Term();
                    break;
                default:
                    break;
            }
            pi = e_pos;
            ch = p_string[pi];
        }
        return Value;
    }
    /*-------------------------------*/

    int Term()                  /* Parse and Translate a Math Term */
    {
        char ch;
        int pi, Value;

        Value = Factor();
        pi = e_pos;
        ch = p_string[pi];
        while(IsMultop(ch))
        {
            switch(ch)
            {
                case '*':
                    Match('*');
                    Value = Value * Factor();
                    break;
                case '/':
                    Match('/');
                    Value = Value / Factor();
                    break;
                default:
                    break;
            }
            pi = e_pos;
            ch = p_string[pi];
        }
        return Value;
    }
    /*-------------------------------*/

    int Factor()                /* Parse and Translate a Math Factor */
    {
        char ch;
        int pi, value;

        pi = e_pos;
        ch = p_string[pi];
        if(ch == '(')
        {
            Match('(');
            value = Expression();
            Match(')');
        }
        else
        {
            value = GetNum();
        }
        return value;
    }
    /*-------------------------------*/

    void Match(char x)          /* Match a Specific Input Character */
    {
        char ch, string[6];
        int pi, ab_code=12, ln=line_ndx;

        pi = e_pos;
        ch = p_string[pi];
        if(ch != x)
        {
            strcpy(string, "\" \"");
            string[1] = x;
            strcpy(t_holder, string);
            a_bort(ab_code,ln);
        }
        else
        {
            _GetChar();
            SkipWhite();
        }
    }
    /*-------------------------------*/

    void _GetChar()
    {
        e_pos++;
    }
    /*-------------------------------*/

    int Is_White(char ch)
    {
        int test=0;

        if((ch == ' ') || (ch == '\t'))
        {
            test = -1;
        }
        return test;
    }
    /*-------------------------------*/

    void SkipWhite()            /* Skip Over Leading White Space */
    {
        char ch;
        int pi;

        pi = e_pos;
        ch = p_string[pi];
        while(Is_White(ch))
        {
            _GetChar();
            pi = e_pos;
            ch = p_string[pi];
        }
    }
    /*-------------------------------*/

    int GetNum()                /* Get a Number */
    {
        char ch;
        int pi, Value=0, ab_code=12, ln=line_ndx;

        pi = e_pos;
        ch = p_string[pi];
        if(! isdigit(ch))
        {
            strcpy(t_holder, "Integer");
            a_bort(ab_code,ln);
        }
        while(isdigit(ch))
        {
            Value = 10 * Value + ch - '0';
            _GetChar();
            pi = e_pos;
            ch = p_string[pi];
        }
        SkipWhite();
        return Value;
    }
    /*-------------------------------*/

    int IsAddop(char ch)        /* Recognize an Addop */
    {
        int rval=0;

        if((ch == '+') || (ch == '-'))
        {
            rval = 1;
        }
        return rval;
    }
    /*-------------------------------*/

    int IsMultop(char ch)       /* Recognize a Multop */
    {
        int rval=0;

        if((ch == '*') || (ch == '/'))
        {
            rval = 1;
        }
        return rval;
    }
    /*-------------------------------*/

• Okay, copy this into "Rdparser.c", and just save it,
• now we need to make a few changes to some of the existing files.
• In Bxbasic.c, add the "rdparser.c" to the "function-includes" list, as I've shown here :

    /* --- function includes --- */
    ...
    #include "rdparser.c"

• in "Prototyp.h", add this block of prototypes to the list :

    /* bxbasic : Prototyp.h : alpha version */
    /* ----- function prototypes ----- */
    ...
    ...
    /* Rdparser.c */
    int rdp_main(void);
    int Expression(void);
    int Term(void);
    int Factor(void);
    void Match(char);
    void _GetChar(void);
    int GetNum(void);
    int IsAddop(char);
    int IsMultop(char);
    int Is_White(char);
    void SkipWhite(void);

• in function "parse_let()", of file "Variable.c", at the very bottom part, from just above the "now get assignment value" section, change it to read as follows :

        ...
        ...
        pi = iswhite(pi);
        ch = p_string[pi];
        e_pos = pi;
        /* ------- now get assignment value --------- */
        Match('=');
        iv_stack[ndx] = rdp_main();
    }
    /*-------------------------------*/

•

in function "a_bort()", of file "Error.c", change "case 12" to read as follows :

case 12:
    printf("\nExpected %s ",t_holder);
    printf(": in line: %d.\n", (line_ndx+1));
    printf("%scode(%d)\n", p_string, code);
    break;

Okay, now save all of this and compile Bxbasic.c. With any luck, there were no errors. Now modify "Test.bas" to read as follows :

1 REM test.bas version 3.1
2 CLS
3 PRINT "hello world!"
4 LET abc = 2 + 2
5 PRINT abc
6 END

Save it and run:

    BXBASIC TEST.BAS

Did it work okay ? If not, try to track down the bug and correct it.

PRECEDENCE
Now change line 4, in Test.bas, to this :

4 LET abc = 2+3*4

and run it. The correct answer is 14 and not 20 ! Why not 20 ? After all:

    2+3 = 5
and 5*4 = 20 !

Because of Precedence. If you recall from your algebra classes, some "operators" have a higher precedence than others. Multiply and Divide have a higher precedence than Add or Subtract. That means that when evaluating a left-to-right math expression, a Multiply and/or a Divide has to be performed before the Add or Subtract, (that is, unless an operation is within parentheses). The correct evaluation is:

    3*4 = 12
and 2+12 = 14

voila !
If the expression had been written: (2+3)*4 then the correct answer would have been 20. Now change line 4 to read as follows:

4 LET abc = 2*(3+4)*5/10

and run it. Operators of equal precedence are evaluated in a left-to-right manner, as neither is of higher precedence than the other. In the above expression, the correct procedure is :

     (3+4) = 7
then 2*7 = 14
then 14*5 = 70
then 70/10 = 7


FLOW CHART
Let's take a look at how the parser works. We'll begin with a simple equation and single step through it. Examine the code as we walk through this. parse_let() first retrieves the variable name and locates the correct index in the stack and then calls: Match('='):

Expression: abc = 2+2            ** "ch" will represent the current character.

parse_let:
  Match(=):                      [= 2+2]
    _GetChar:
    SkipWhite:                   (ch=2) [2+2]
    [ret]
rdp_main:                        (iv_stack[ndx]=rdp_main)
  Expression:
    Term:                        (value=Term)
      Factor:
        GetNum:
          _GetChar:
          SkipWhite:             (ch=+) [+2]
          [ret(value=2)]
    Match(+):                    [+2]
      _GetChar:
      SkipWhite:                 (ch=2)
      [ret]
    Term:                        (value=value+Term)
      Factor:
        GetNum:
          _GetChar:
          SkipWhite:             (ch=\n)
          [ret(value=2)]
    [ret(value=2+2)]
iv_stack[ndx] = 4

Were you able to follow that ? It may seem that Term() and Factor() get called a lot, for no real reason. Why not just take a short-cut and jump straight to GetNum() ? Well, the procedure as it is laid out, even though it may waste a few steps, guarantees that the hierarchy and integrity of precedence is preserved. The above expression is just about as simple an equation as you can have, therefore, not all the features are put to use.


RECURSION
In a simple expression like:

2+2
the equation is evaluated in a straight left to right manner. This expression does not have the qualities that require "Recursion". What is Recursion ? Well, if you take a look at rdparser.c and the Flow Chart above, you will notice that :

    Expression calls Term
and Term calls Factor
and Factor calls GetNum

you could call this a Descent Parser, as in: "step down to the next level". But, if you take another look at Factor, you will see that in some cases Factor can call Expression:

    Expression calls Term
and Term calls Factor
and Factor calls Expression

Recursion is a sort of loop, where procedures at lower levels of the parser make repeated calls to procedures higher up in the parser, which again, work their way down. This is called Recursive Descent. In the example above, our parser is descending up to the point where Factor calls Expression. At that point it becomes recursive. It's like sliding a thread through the eye of a needle and then doing it again, repeatedly. Consider this expression:

    2*(3+4)

single step through this,


parse_let:
  Match(=):                          [= 2*(3+4)]
    _GetChar:
    SkipWhite:                       (ch=2) [2*(3+4)]
    [ret]
rdp_main:                            (iv_stack[ndx]=rdp_main)
  Expression:
    Term:
      Factor:
        GetNum:
          _GetChar:
          SkipWhite:                 (ch=*) [*(3+4)]
          [ret(value=2)]
      Match(*):                      [*(3+4)]
        _GetChar:
        SkipWhite:                   (ch="(") [(3+4)]
        [ret]
      Factor:                        (Value=Value*Factor)
        Match("("):                  (ch="(")
          _GetChar:
          SkipWhite:                 (ch=3) [3+4)]
          [ret]
        Expression:                  ** --- Recursion --- **
          Term:                      (ch=3) [3+4)]
            Factor:
              GetNum:                (value=GetNum)
                _GetChar:
                SkipWhite:           (ch=+) [+4)]
                [ret(value=3)]
          Match(+):                  (ch=+)
            _GetChar:
            SkipWhite:               (ch=4) [4)]
            [ret]
          Term:                      (Value=Value+Term)
            Factor:
              GetNum:                (value=GetNum)
                _GetChar:
                SkipWhite:           (ch=")")
                [ret(value=4)]
          [ret(value=3+4)]
        Match(")"):                  (ch=")")
          _GetChar:
          SkipWhite:                 (ch=\n)
          [ret]
        [ret(value=7)]
      [ret(value=2*7)]
    [ret(value=14)]
iv_stack[ndx] = 14


Even though there was only one "recursion", this gives you an idea of how it works. You jump back up to the top and start working your way down again. When you have reached the end of that portion of the equation, the resulting value is returned to Factor and the expression picks up where it left off before the recursion. This is by no means the only way to design a parser, it's just the cleanest I've seen. If you care to design your own, remember to "Flow Chart" it, as we've done here, to work the bugs out.


VARIABLES REVISITED
Now that we have that taken care of, it's time to make provisions for incorporating variables into our expressions. To accomplish that is only going to require the addition of a couple of lines to Factor(), in Rdparser.c. Here is the new code for Factor() :

int Factor()
{
    char ch;
    int pi, value;

    /* Parse and Translate a Math Factor */
    pi = e_pos;
    ch = p_string[pi];
    if(ch == '(')
    {
        Match('(');
        value = Expression();
        Match(')');
    }
    else
    {
        if(isalpha(ch))
        {
            value = get_varvalue();
            SkipWhite();
        }
        else
        {
            value = GetNum();
        }
    }
    return value;
}
/*-------------------------------*/

Copy this code to Rdparser.c, replacing the existing Factor(). Save Rdparser.c and re-compile Bxbasic.c. Now copy this new version of Test.bas :

1 REM test.bas version 3.2
2 CLS
3 PRINT "hello world!"
4 LET xylophone = 50
5 LET yazoo = 100
6 LET abc = yazoo/xylophone
7 LET xyz = yazoo/10
8 LOCATE abc , xyz
9 PRINT "hello world!"
10 LET quasar = 2
11 LET zapp = 4
12 LET abc = (quasar * quasar * zapp + zapp)/5
13 LET xyz = ((quasar*quasar)*zapp)+zapp
14 LOCATE abc, xyz
15 PRINT "hello world!"
50 END

Save it and run : BXBASIC TEST.BAS

CONCLUSION
Well, I think we've just about covered everything for this installment. We've covered quite a bit of ground, this time :
• we wrote a (pseudo) compiler,
• we constructed a scripting-engine,
• and we added an expression parser that handles numbers and variables.

There's still more to come. Here is a short list of some of the things we will tackle in the next and up-coming chapters:
* variables: long-integers, floating-point, character strings,
* converting the original Basic source code into a byte-code,
* compiling the byte-code into an executable program,


CHAPTER - 4
INTRODUCTION
In this chapter we will be taking steps towards producing a form of byte-code from the source file. To accomplish that, we will need to begin filtering out unwanted comments, pre-interpreting program statements and determining their context.

At present our Test.bas scripts have been written in a way so as to not cause any unwanted side effects. Each line begins with a line label (or line number), but, in reality the labels or numbers are not required, at all, unless there is a GOTO statement. In that case, the destination line must be labeled.

If you recall, I've previously stated that a particular language's definition is what sets it apart from any other language. Besides the Keywords used in a programming language, you could say the "look and feel" are just as much a part of the language definition. In older versions of Standard Basic, it was common to expect every program line to be numbered (or labeled). That gave Basic a particular "look and feel". By our definition, so far, line numbers are entirely optional, with the exception of GOTO statements and that's only because we haven't incorporated "block labels" yet.

What I'd like to do in this chapter is more clearly define the language and give it a less rigid "look and feel". In doing that we will be adding "block labels". Both line numbers and labels will be entirely optional except of course where they represent the target of a GOTO statement. Here is how our language definition is looking at present:

Keywords
REM       a comment line,
LET       required, variable assignments,
CLEAR     erases all variable names and values,
LOCATE    may use either constants or variables,
PRINT     may print quoted strings or variables,
GOTO      uses a (constant) line number,
BEEP      makes a sound,
CLS       clears display,
END       normal program termination.

Syntax
o line numbers are optional, but they must appear in column 1,
o only one statement per program line,
o all variable names must begin with an alpha character,
o variable names may be alpha-numeric of mixed upper and lower case,
o variables are of type integer, only,
o blank lines are not permitted,
o print keyword only prints one variable or string,
o number of spaces between keywords and variables is of no consequence,
o variable assignments may be constants or the result of an expression,
o variable names may be up to 15 characters in length.


Program
o on the command line, the source filename must include the extension,
o the entire source file, including comments, is loaded into memory,
o the parser uses a string comparison to detect keywords,
o program errors call the error handler and program is terminated,
o variable names and values are stored in a pair of dynamic arrays.

In the test.bas scripts we've been writing, each line contained a keyword; blank or empty lines have not been permitted. Currently, a blank line generates an error. Blank lines can often make a program more readable, in a visual sense. Also, besides the REM keyword for comments, Basic allows the single quote (') mark (or apostrophe) as a line comment; we will begin to allow for that as well.

NEW LOOK-N-FEEL
Here is some of what I have planned for this chapter:
• the .bas extension will not be required on the command line, but source filename must end in .bas, i.e.: C:> Bxbasic Test
• read-in source file and write modified source to a temporary "source.tmp" file,
• filter out single quote (') commented lines,
• blank lines will be allowed, they will be filtered out,
• block labels, must begin with an uppercase alpha,
• block labels can be up to 30 alpha-numeric characters,
• block labels must be terminated with a colon, ":",
• labels may be up to 30 characters, no blank spaces,
• if used, line numbers and block labels must appear in column number 1,
• un-labeled program lines should be indented, but, it's not required,
• GOTO statements can use numbers or alpha-numeric labels as the target,
• LET keyword is entirely optional, any program line that does not begin with a keyword is assumed to be an assignment. Assignment is the default,
• keywords will be converted to a token, a numeric byte-code representation,
• addition of other variable types,
• variable names may be up to 32 characters in length,

To accomplish all of these things is going to require a considerable amount of re-writing of some existing functions, the addition of several new functions and discarding a couple that we've outgrown. We will begin with the source code input.

SOURCE INPUT
Many of the changes to bxbasic from its current form are going to take place in file Input.c. We will begin by changing the way in which the source code is handled. The source will be read into memory and be written out to a temporary file. In the process of writing the source file to the temporary file, the comments that use the single quote (') and blank lines will be filtered out. Comments that use the keyword REM will be filtered out in a later step.


The way in which the final program code is to be stored in memory will be handled here as well. The line numbers or block labels will be stored in a character string array. The keywords will be converted into a numeric code and stored in an integer array and the remaining portion of the program statement will be stored in a character string array. Here is a diagram of how this will be set up:

Labels Array    Byte Code Array    Statement Array
[10     ]       [4 ]               ["hello world!" ]
[20     ]       [1 ]               [abc=2*(3+4)    ]
[Label  ]       [  ]               [               ]
[ ...   ]       [  ]               [               ]

Each array will be of the same length, which is the number of program lines, and will share the same indexes. Thus; program line[index] = label[index] = byte[index] = statement[index]

Now open file "Input.c". Numerous changes will take place here, function program_array() will no longer exist and several new functions will be added. Begin by deleting function program_array(). As you can see (below), line_cnt() bears almost no resemblance to the prior version. The first part determines whether or not the ".bas" extension was provided on the command line and adds it if not. The ".bas" extension on the command line is now optional. The source file is then opened and function load_src() is called. Two other functions are called and the temporary file is deleted before returning to "main()". Copy this to Input.c, replacing the current version;

void line_cnt(char *argv[])
{
    int pi, len, ii=0, x=0;
    int ab_code=2;
    unsigned fnam_len;

    nrows = 0;
    /* --- get length of command line argument --- */
    fnam_len = strlen(argv[1]);
    len = fnam_len;
    fnam_len += 5;                          /* add padding */
    prog_name = malloc(fnam_len * sizeof(char));
    strcpy(prog_name, argv[1]);
    /* --- does source filename end in: .bas --- */
    pi = (len-4);
    if(pi < 0)                              /* name is shorter than ".bas" */
    {
        pi = 0;
    }
    for(; pi < len; pi++)
    {
        p_string[ii] = prog_name[pi];
        ii++;
    }
    p_string[ii] = '\0';                    /* terminate before comparing */
    if(strcmp(p_string, ".bas") != 0)
    {
        strcat(prog_name, ".bas");          /* if not, add it */
    }
    p_string[0] = '\0';
    strcpy(s_holder, prog_name);
    /* --- open filename.bas for Read --- */
    f_in = fopen(prog_name,"r");
    if(f_in == NULL)                        /* error:file not found */
    {
        a_bort(ab_code, x);
    }
    else
    {
        load_src();       /* read source-file, write temp-file */
        loader_1();       /* read temp-file, load temp arrays */
        loader_2();       /* load program arrays */
        free(prog_name);
        x = remove("source.tmp");   /* if test is '0', successful */
    }
}
/*------- end line_cnt ----------*/

The first part of "load_src()" opens file "source.tmp" for writing. Each line is read in from the source file, blank lines are discarded and upon encountering a valid program line, save_tmp() is called. Copy this to Input.c.

void load_src()
{
    char *tmp="source.tmp";
    int pi, len;

    f_out = fopen(tmp,"w");                 /* open output file */
    while(!feof(f_in))
    {
        fgets(p_string, BUFSIZE, f_in);
        pi = 0;
        pi = iswhite(pi);
        len = strlen(p_string);
        if((len > 2) && (pi < len))         /* skip over blank lines */
        {
            if(!feof(f_in))
            {
                save_tmp();
            }
        }
    }
    fclose(f_in);
    fclose(f_out);
}
/*------- end load_src ----------*/


At the start of function "save_tmp()", xstring is padded with blank spaces to force indenting of un-labeled lines. That helps make statements easier to parse. Depending on the context of the program line, either xstring or p_string is written to "source.tmp" and the line counter is incremented. Single quote comment lines are not written out. Copy to Input.c;

void save_tmp()
{
    char ch;
    int pi, len;

    /* --- setup xstring to write indented line --- */
    strcpy(xstring, " ");
    strcat(xstring, p_string);
    pi = 0;
    ch = p_string[pi];
    /* --- test for a Label: --- */
    if(isupper(ch) != 0)
    {
        len = (LLEN-2);
        while(isalnum(ch) != 0)             /* loop thru "Label:" */
        {
            pi++;
            ch = p_string[pi];
        }
        if((ch == ':') && (pi <= len))
        {
            pi++;
            p_string[pi] = '\0';
            strcat(p_string, "\n\0");
            fprintf(f_out,"%s", p_string);  /* write block label */
            nrows++;
        }
        else
        {
            fprintf(f_out,"%s", xstring);   /* write indented line */
            nrows++;
        }
    }
    /* --- test for numbered line --- */
    else if(isdigit(ch))
    {
        fprintf(f_out,"%s", p_string);      /* write numbered line */
        nrows++;
    }
    else
    {
        pi = iswhite(pi);
        ch = p_string[pi];
        if(ch != '\'')                      /* eliminate comment lines */
        {
            fprintf(f_out,"%s", xstring);   /* write indented line */
            nrows++;
        }
    }
}
/*------- end save_tmp ----------*/


Function loader_1() creates three temporary arrays for processing the source code. File source.tmp is then opened and read in, one line at a time for processing. Functions: tmp_label(), tmp_byte() and tmp_prog() are then called sequentially. Copy this to Input.c:

void loader_1()
{
    char ch, ln_holder[LLEN];
    char *tmp="source.tmp";
    int ii, len, pi;
    unsigned size=ncolumns;

    /* --- create temp arrays --- */
    temp_prog = malloc(nrows * sizeof(char *));
    for(ii = 0; ii < nrows; ii++)
    {
        temp_prog[ii] = malloc(size * sizeof(char));
    }
    temp_label = malloc(nrows * sizeof(char *));
    for(ii = 0; ii < nrows; ii++)
    {
        temp_label[ii] = malloc(LLEN * sizeof(char));
    }
    temp_byte = malloc(nrows * sizeof(int));
    /* --- open source.tmp for Read --- */
    f_in = fopen(tmp,"r");
    ii = 0;
    while(!feof(f_in))               /* p_string holds incoming-data */
    {
        fgets(p_string, BUFSIZE, f_in);
        if(!feof(f_in))
        {
            len = strlen(p_string);
            /* ----- fill temp_label[] here ----- */
            tmp_label(ii);           /* call label filter */
            pi = e_pos;
            ch = p_string[pi];
            if(ch == ':')            /* found a label */
            {
                temp_byte[ii] = -1;
                strcpy(temp_prog[ii], "\n\0");
            }
            /* ----- fill temp_byte[] here ----- */
            else
            {
                tmp_byte(ii);        /* call byte filter */
                tmp_prog(ii);
            }
        }
        ii++;
    }
    fclose(f_in);
}
/*-------------------------------*/


Function "tmp_label()" tests the beginning of the program line to determine if it contains a line number or block label. If it does, the label is extracted and stored in the temp_label[] array. Copy this to Input.c:

void tmp_label(int ii)
{
    char ch, ln_label[LLEN];
    int pi;

    /* ----- fill temp_label[] here ----- */
    pi = 0;
    ch = p_string[pi];
    if(isalnum(ch))
    {
        while(isalnum(ch))
        {
            ln_label[pi] = ch;
            pi++;
            ch = p_string[pi];
        }
        ln_label[pi] = '\0';
        strcpy(temp_label[ii], ln_label);
    }
    else
    {
        strcpy(temp_label[ii], " \0");
    }
    e_pos = pi;
}
/*-------------------------------*/

Function "tmp_byte()" begins the process of converting each keyword in a statement into a byte code token. Any comment lines that are designated by using a single quote (') are tokenized to a "0". If the first word in the line begins with an uppercase, then a call to get_byte() is made. A line beginning with a lowercase alpha is presumed to be an assignment statement. Anything else will generate an error. The byte code is then stored in the temp_byte[] array. Copy this code to Input.c:

void tmp_byte(int ii)
{
    char ch;
    int pi, si, byte;
    int x=ii, ab_code=4;

    /* ----- fill temp_byte[] here ----- */
    pi = e_pos;
    pi = iswhite(pi);
    ch = p_string[pi];
    if(ch == '\'')                      /* it's a comment */
    {
        byte = 0;
        strcpy(temp_prog[ii], "\n\0");
    }
    else
    {
        if(isupper(ch))                 /* is this a keyword */
        {
            e_pos = pi;
            byte = get_byte(ii);        /* call get_byte */
            pi = e_pos;
        }
        else if(isalpha(ch))            /* a possible assignment */
        {
            si = pi;                    /* save pointer position */
            while(isalnum(ch))
            {
                pi++;
                ch = p_string[pi];
            }
            pi = iswhite(pi);
            ch = p_string[pi];
            if(ch == '=')               /* a variable assignment */
            {
                byte = 1;
                pi = si;
            }
            else
            {
                a_bort(ab_code, x);     /* not an assignment */
            }
        }
        else
        {
            a_bort(ab_code, x);         /* not a keyword or variable */
        }
    }
    temp_byte[ii] = byte;
    e_pos = pi;
}
/*-------------------------------*/

Function "get_byte()" isolates the keyword in the program line and compares it to the keyword list. The integer variable "byte" is then assigned the token value. This keyword and byte code list will grow as the language is expanded. In the event that the "keyword" string is not a valid keyword, then a test is made to verify that it is an assignment, otherwise an error is generated. Copy this code to Input.c:

int get_byte(int ii)
{
    char ch, keyword[TOKEN_LEN];
    int pi, si=0, byte;
    int x=ii, ab_code=4;

    pi = e_pos;
    ch = p_string[pi];
    while(isalnum(ch))
    {
        keyword[si] = ch;
        si++;
        pi++;
        ch = p_string[pi];
    }
    keyword[si] = '\0';
    /* --- assign byte code --- */
    if(strcmp(keyword, "REM") == 0)         byte=0;
    else if(strcmp(keyword, "LET") == 0)    byte=1;
    else if(strcmp(keyword, "CLEAR") == 0)  byte=2;
    else if(strcmp(keyword, "LOCATE") == 0) byte=3;
    else if(strcmp(keyword, "PRINT") == 0)  byte=4;
    else if(strcmp(keyword, "GOTO") == 0)   byte=5;
    else if(strcmp(keyword, "BEEP") == 0)   byte=6;
    else if(strcmp(keyword, "CLS") == 0)    byte=7;
    else if(strcmp(keyword, "END") == 0)    byte=8;
    else
    {
        pi = iswhite(pi);
        ch = p_string[pi];
        if(ch == '=')                   /* a variable assignment */
        {
            byte = 1;
            pi = e_pos;                 /* push pointer back */
        }
        else
        {
            a_bort(ab_code, x);         /* not a keyword or variable */
        }
    }
    e_pos = pi;
    return byte;
}
/*-------------------------------*/

Function "tmp_prog()" copies the remainder of the program statement into array temp_prog[]. Copy this to Input.c and save it:

void tmp_prog(int ii)
{
    char ch, prog[BUFSIZE];
    int pi, si=0, len;

    len = strlen(p_string);
    pi = e_pos;
    pi = iswhite(pi);
    if((pi < len) && (temp_byte[ii] != 0))
    {
        ch = p_string[pi];
        while(ch != '\0')
        {
            prog[si] = ch;
            si++;
            pi++;
            ch = p_string[pi];
        }
        prog[si] = '\0';
    }
    else
    {
        strcpy(prog, "\n\0");
    }
    strcpy(temp_prog[ii], prog);
}
/*-------------------------------*/

Function "loader_2()" begins by recounting the number of actual program lines, discounting any remaining comment lines. With the corrected number of program lines, the runtime program arrays: array1[ ], label_nam[ ] and byte_array[ ] are created.

The data stored in the temp arrays is then transferred into the new arrays. The final task is to delete the temp arrays from memory. Copy this code to Input.c:

void loader_2()
{
    int ndx, ii, line_count=0, lines=nrows;
    unsigned size;

    /* --- re-count number of lines --- */
    for(ndx=0; ndx < nrows; ndx++)
    {
        if(temp_byte[ndx] != 0)
        {
            line_count++;
        }
    }
    nrows = line_count;
    /* --- create program arrays --- */
    array1 = malloc(nrows * sizeof(char *));
    label_nam = malloc(nrows * sizeof(char *));
    for(ii = 0; ii < nrows; ii++)
    {
        label_nam[ii] = malloc(LLEN * sizeof(char));
    }
    byte_array = malloc(nrows * sizeof(int));
    /* --- transfer temp_arrays to program_arrays --- */
    ndx = 0;
    for(ii=0; ii < lines; ii++)
    {
        if(temp_byte[ii] != 0)
        {
            strcpy(label_nam[ndx], temp_label[ii]);
            byte_array[ndx] = temp_byte[ii];
            /**/
            size = strlen(temp_prog[ii]);
            size++;
            array1[ndx] = malloc(size * sizeof(char));
            strcpy(array1[ndx], temp_prog[ii]);
            ndx++;
        }
    }
    /* --- free temp array memory --- */
    for(ii=0; ii < lines; ii++)
    {
        free(temp_label[ii]);
        free(temp_prog[ii]);
    }
    free(temp_label);
    free(temp_byte);
    free(temp_prog);
}
/*-------------------------------*/

Save and close file Input.c.

OTHER CHANGES
Here are the code revisions to be made to Bxbasic.c. Copy the following definitions to Bxbasic.c, replacing the existing ones and save it:

/* --- declare constants --- */
#define BUFSIZE 256
#define LINE_NUM 6
#define TOKEN_LEN 21
#define VAR_NAME 33
#define LLEN 33

/* ------ global vars ------------ */
FILE *f_in, *f_out;       /* these are the i/o file handles   */
char *prog_name;          /* program source-file name         */
char p_string[BUFSIZE];   /* file input string                */
char **array1;            /* program array                    */
char t_holder[20];        /* token data holder                */
char s_holder[BUFSIZE];   /* xstring (print) data holder      */
int nrows;                /* numbers of lines in source file  */
int ncolumns=BUFSIZE;     /* dimension for array1[][columns]  */
int line_ndx;             /* current execution line           */
int s_pos, e_pos;         /* pointers to start & end of token */
/**/
char xstring[BUFSIZE];    /* the print string                 */
char **temp_prog;         /* temp program array               */
char **temp_label;        /* temp label name array            */
int *temp_byte;           /* temp byte code array             */
int *byte_array;          /* byte code array                  */
char **label_nam;         /* labels name array                */
int token;                /* token: current byte code         */
/**/
int *iv_stack;            /* stack:integer variable values    */
char **in_stack;          /* stack:integer variable names     */
int imax_vars=0;          /* stack:integer variable counter   */
/**/

Copy this new version of "main()" to Bxbasic.c:

int main(int argc, char *argv[])
{
    int x=0, ab_code=1;

    printf("Bxbasic Interpreter\n");
    if(argc != 2)
    {
        a_bort(ab_code, x);
    }
    strcpy(t_holder, argv[1]);
    line_cnt(argv);
    pgm_parser();
    /* --- end of program --- */
    clr_arrays();
    return 0;
}
/*-------------------------------*/

You will notice that "main()" calls "line_cnt()", then calls "pgm_parser()" directly after. Function "get_token()" is now considerably smaller. Copy this to Bxbasic.c:

void get_token()
{
    strcpy(p_string, array1[line_ndx]);
    token = byte_array[line_ndx];
}
/*-------------------------------*/

Function "parser()" is now dramatically changed. All keywords have been tokenized into numeric values. A switch-case statement now replaces the IF-ELSE statement. The switch-case is faster than performing a string comparison because only two integer values need to be compared. Copy this to Bxbasic.c, replacing the existing one:

void parser()
{
    int ab_code=4, x=line_ndx;

    switch(token)
    {
        case 1:         /* LET */
            parse_let();
            break;
        case 2:         /* CLEAR */
            clr_int();
            break;
        case 3:         /* LOCATE */
            locate();
            break;
        case 4:         /* PRINT */
            xstring_array();
            get_prnstring();
            break;
        case 5:         /* GOTO */
            go_to();
            break;
        case 6:         /* BEEP */
            beep();
            break;
        case 7:         /* CLS */
            cls();
            break;
        case 8:         /* END */
            printf("\nEnd of Program\n");
            line_ndx = nrows;
            break;
        case -1:        /* block label */
            break;
        default:
            a_bort(ab_code, x);
            break;
    }
}
/*-------------------------------*/

void xstring_array() { char ch, quote='\"'; int pi, si=0, ab_code; int stlen, x=line_ndx; pi = e_pos; pi = iswhite(pi); e_pos = pi; ch = p_string[pi]; if(ch == ':') /* if next character is a ":", get out */ { return; } if(isalpha(ch)) /* if next character is an alpha, */ { return; /* it's a varname, get out */ } stlen = strlen(p_string); if((ch != quote) || (pi == stlen)) /* next character must be a */ { ab_code=9; /* quote, or error: */ a_bort(ab_code, x); } else { ch = ' '; while((ch != quote) && (pi < stlen)) { si++; pi++; ch = p_string[pi]; } if((si <= 1) && (pi < stlen)) { ab_code=5; a_bort(ab_code, x); } else if(pi >= stlen) { ab_code=6; a_bort(ab_code, x); } } } /*-------------------------------*/


void go_to()
{
    char ch;
    char goto_label[LLEN];
    int pi, si=0, ab_code=8;
    int xtest, stlen, x=line_ndx;

    pi = e_pos;
    pi = iswhite(pi);
    ch = p_string[pi];
    while(isalnum(ch) && (si < (LLEN-1)))   /* stay inside goto_label[] */
    {
        goto_label[si] = ch;
        pi++;
        si++;
        ch = p_string[pi];
    }
    goto_label[si] = '\0';          /* add string terminator */
    pi = -1;
    /* now compare goto_label[] to label_nam[n] */
    xtest = -1;
    while(xtest != 0)
    {
        pi++;
        if(pi == nrows)             /* test before reading label_nam[pi] */
        {
            strcpy(t_holder, goto_label);
            a_bort(ab_code, x);     /* error, label not found */
        }
        xtest = strcmp(label_nam[pi], goto_label);
    }
    pi--;
    line_ndx = pi;                  /* set line_ndx to the goto_line */
}
/*-------------------------------*/

The two above functions contain only minor changes, mainly to accommodate changes made in Input.c. Copy and save to Bxbasic.c. We are through with Bxbasic.c, close it. Open file Error.c. Change the top portion of a_bort() to read as follows, by deleting the "free()" statements:

void a_bort(int code,int line_ndx)
{
    beep();
    switch(code)
    [snip]...

now delete "case 7" and change "case 8" to read as follows:


[snip]...
    case 8:
        printf("\nGOTO Error: no such label:");
        printf(" %s:\nin program line:",t_holder);
        printf(" %d:\nGOTO %s",(line_ndx+1),p_string);
        printf("Program Terminated\ncode(%d)\n", code);
        break;
[snip]...

Save this and close Error.c. Now open Utility.c and add this new function, "clr_arrays()":

void clr_arrays()
{
    int ii;

    /* --- free program array memory --- */
    for(ii=0; ii < nrows; ii++)
    {
        free(label_nam[ii]);
        free(array1[ii]);
    }
    free(label_nam);
    free(byte_array);
    free(array1);
}
/*-------- end clr_arrays ----------*/

Then save and close Utility.c. Now open file Prototyp.h and replace the prototypes for Input.c and Utility.c with these:

/* ----- function prototypes ----- */
[snip]...
/* Input.c */
void line_cnt(char *argv[]);
void load_src(void);
void save_tmp(void);
void loader_1(void);
void tmp_label(int);
void tmp_byte(int);
int get_byte(int);
void tmp_prog(int);
void loader_2(void);
/* Utility.c */
int get_upper(int,int);
int get_alpha(int,int);
int get_digit(int,int);
int iswhite(int);
void clr_arrays(void);
[snip]...

Now save Prototyp.h and close it. Well that should just about do it ! Compile Bxbasic. If we're lucky there will be no complaints from the compiler. Try it with this version of Test.bas:

' test.bas version 4.1
    CLS
    PRINT "hello world!"
    LET xylophone = 50
    LET yazoo = 100
    LET abc = yazoo/xylophone
    xyz = yazoo/10
    LOCATE abc , xyz
    PRINT "hello world!"
' ------------------------------------------
    quasar = 2
    zapp = 4
    abc = (quasar * quasar * zapp + zapp)/5
    xyz = ((quasar * quasar) * zapp) + zapp
    LOCATE abc, xyz
    PRINT "hello world!"
' ------------------------------------------
    PRINT:
    PRINT " 2*(3+4)*5/10 = ";
    abc = 2*(3+4)*5/10
    PRINT abc
TheEnd:
    END
' ------------------------------------------


Let's try adding some block labels. Run this Test.bas:

' test.bas version 4.2
    CLS
' now jump to a block label
    GOTO OverThere
'
TheBeginning:
    PRINT "We're at The Beginning!"
    GOTO TheEnd
'
There:
    PRINT "We're There!"
    GOTO TheBeginning
'
JumpBack1:
    PRINT "We Jumped Back 1!"
    GOTO There
'
OverThere:
    PRINT "We're Over There!"
    GOTO JumpBack1
'
TheEnd:
    END
'

That completes this section. Except for adding new variable types, I think we've done all the things on our list we wanted to in order to expand the definition of Bxbasic and give it a better "look and feel".


COMPILER
Realistically, transforming the source code into byte code at run time probably doesn't benefit the Bxbasic interpreter all that much, as the source code has to be loaded and manipulated prior to being executed. This probably wouldn't improve performance too much on smaller programs, unless there was a lot of repetition of a certain task. In most cases, the time spent on the byte coding process would overshadow any benefit in improved performance.

Well I guess the next question would be, how do these changes translate to the compiler ? What changes have to be made so that these new capabilities can be carried over to the compiler and the engine ? After all, the purpose of using byte code is to pre-interpret much of the source code, rendering an abbreviated set of instructions, which can then be affixed to the runtime engine. At runtime, the engine would load the abbreviated set of instructions and execute them without the need to interpret their meaning or context. This would really produce a performance benefit at runtime.

As an example, for any given program line, Bxbasic version 3 had to first determine whether or not there was a line number and jump over it if there was one. Then, extract the keyword, one character at a time, and do a string comparison against a list before it could execute the first command. At this point in version 4, because the entire program line is now divided across three arrays, the line number or label is completely ignored. The parser jumps directly to the token byte-code and quickly does a comparison of two integers, in a single step, and executes the command.

What do we need to do to make this work ? Well, not really very much. We have already written nearly all the code that's needed. All we need to do is cut-n-paste it where it needs to go. Since all the code that makes the source file to byte-code transformation is contained within Input.c, that's where we have to look for the answers.
The new Input.c begins with "line_cnt()" and ends with "loader_2()". By the time we are done with "loader_2()" the transformation is complete and the byte code is now in memory. In Bxbasic, the next step is for the program parser to begin program execution. For our purposes, though, we don't want to execute the program, but instead, we want to merge the byte-code to the engine, as we did the last time with the original source code. So, all we need to do is have "main()" call "line_cnt()", let it do its thing, then make a final call to "merge_source()" to combine the two into the final executable. Here is the new code for Bxcomp.c:

/* bxbasic: Bxcomp.c : alpha version.02 */
/* --- declare headers --- */
#include <stdio.h>
#include <io.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <malloc.h>

/* --- declare constants --- */
#define BUFSIZE 256
#define PATH 81
#define TOKEN_LEN 21
#define LLEN 33

/* ------ global vars ------------ */
FILE *f_in, *f_out;         /* these are the i/o file handles */
char *prog_name;            /* program source-file name */
char p_string[BUFSIZE];     /* file input string */
char **array1;              /* program array */
char t_holder[20];          /* token data holder */
char s_holder[BUFSIZE];     /* xstring (print) data holder */
int nrows;                  /* numbers of lines in source file */
int ncolumns=BUFSIZE;       /* dimension for array1[][columns] */
int line_ndx;               /* current execution line */
int s_pos, e_pos;           /* pointers to start & end of token */
/**/
char xstring[BUFSIZE];      /* the print string */
char **temp_prog;           /* temp program array */
char **temp_label;          /* temp label name array */
int *temp_byte;             /* temp byte code array */
int *byte_array;            /* byte code array */
char **label_nam;           /* labels name array */
int token;                  /* token: current byte code */
/**/
/* ----- function prototypes ----- */
void write_src(void);
void merge_source(void);
#include "prototyp.h"
/* --- function includes --- */
#include "input.c"
#include "utility.c"
/* ----- begin program ------------- */

int main(int argc, char *argv[])
{
    int ab_code=1, x=0;

    printf("Bxbasic Compiler\n");
    if(argc != 2)
    {
        a_bort(ab_code,x);
    }
    strcpy(t_holder, argv[1]);
    line_cnt(argv);
    merge_source();
    /* --- end of program --- */
    clr_arrays();
    return 0;
}
/*-------------------------------*/


void merge_source()
{
    char ch, dot='.', *destin, source[20];
    int ii, data, ab_code=3, x=0;
    unsigned size=PATH;

    destin = malloc(size * sizeof(char));
    strcpy(destin, s_holder);           /* copy source file name */
    ii = 0;
    ch = '\0';
    while(ch != dot)
    {
        ch = s_holder[ii];              /* make source.bas = source. */
        ii++;
    }
    destin[ii] = '\0';
    strcat(destin, "exe");              /* append "exe" to filename */
    /* --- open destination file (write-binary) --- */
    f_out = fopen(destin,"wb");
    printf("Destination file: %s\n",destin);
    /* --- read-in scripting engine (read-binary) --- */
    strcpy(source, "engine.exe");
    f_in = fopen(source, "rb");
    if(f_in == NULL)
    {
        a_bort(ab_code,x);
    }
    printf("Source file: %s\n",source);
    while(! feof(f_in))
    {
        data = fgetc(f_in);             /* data = incoming stream */
        if(! feof(f_in))
        {
            fputc(data, f_out);         /* write to destination file */
        }
    }
    fclose(f_in);                       /* done copying engine */
    /* --- store byte code --- */
    printf("Source file:%s\n",s_holder);
    write_src();
    fclose(f_out);
    printf("Program lines=%d\nDone!\n",nrows);
    free(destin);
}
/*------ end merge_source -------*/


void write_src()
{
    char *tmp="source.tmp";
    int ii;

    /* --- store nrows --- */
    fprintf(f_out,"%d\n", nrows);               /* write nrows */
    /* --- write byte code --- */
    for(ii=0; ii < nrows; ii++)
    {
        fprintf(f_out,"%s\n", label_nam[ii]);   /* write block label */
    }
    for(ii=0; ii < nrows; ii++)
    {
        fprintf(f_out,"%d\n", byte_array[ii]);  /* write byte_code */
    }
    for(ii=0; ii < nrows; ii++)
    {
        fprintf(f_out,"%s", array1[ii]);        /* write statement */
    }
}
/*-------------------------------*/

void a_bort(int code,int x)
{
    switch(code)
    {
        case 1:
            printf("Unspecified Program Name.\n");
            printf("Enter:\"bxcomp source.bas\"\n");
            printf("code(%d)\n",code);
            break;
        case 2:
            printf("Program file:\"%s\" not found.\n", t_holder);
            printf("Enter: \"bxcomp source.bas\"\n");
            printf("Program Terminated.\ncode(%d)\n", code);
            break;
        case 3:
            printf("Engine.exe not found.\n");
            printf("Copy Script-Engine to this directory.\n");
            printf("Program Terminated.\ncode(%d)\n", code);
            break;
        case 4:
            printf("\nSyntax error: in program line:");
            printf(" %d.\n%s",(x+1),p_string);
            printf("Unknown Command.\ncode(%d)\n", code);
            break;
        default:
            printf("Program aborted, undefined error.");
            break;
    }
    exit(1);
}
/*-------------------------------*/


Copy this code to Bxcomp.c, replacing the existing code, and save it. Examine "merge_source()" and you will see that the top portion is pretty much unchanged; it does what it did in the prior version. The difference is in the latter portion, where a call is made to a new function, "write_src()". In "write_src()", the line count and the three data arrays are appended sequentially to the engine.

Compile Bxcomp.c. Make sure there are no errors. Copy this Basic source code to Test.bas:

' test.bas version 4.3
Start1:
CLS
GOTO Jump
Return:
GOTO TheEnd
Jump:
PRINT "hello world!"
' -----------
LET xylophone = 50
LET yazoo = 100
LET abc = yazoo/xylophone
xyz = yazoo/10
LOCATE abc , xyz
PRINT "hello world!"
' -----------------------------------------
quasar = 2
zapp = 4
abc = (quasar * quasar * zapp + zapp)/5
xyz = ((quasar * quasar) * zapp) + zapp
LOCATE abc, xyz
PRINT "hello world!"
' -----------------------------------------
PRINT:
PRINT " 2*(3+4)*5/10 = ";
abc = 2*(3+4)*5/10
PRINT abc
GOTO Return
TheEnd:
END
' ------------------------------------------

Test Bxcomp.exe by entering at the command line:

BXCOMP TEST

You CAN NOT execute Test.exe yet; we still have some work to do on Engine.c. Using the "List.com" utility, enter on the command line:

LIST TEST.EXE

Now press: W
press: B


And there you have it. You are at the very bottom of the file, so arrow up until you get to:

Start1

On the line above and to the right, you will see "26". That is the number of program lines (nrows). "Start1" and the lines below it are the data in the labels array. Arrow down to "TheEnd", the last label. That is where the byte code begins. Below the byte codes are the program statements. Compare the source code in Test.bas with what you see in Test.exe.

THE ENGINE
Fortunately, Bxbasic and the Runtime Engine share many of the same components. The only differences from Bxbasic reside in the main engine module: Engine.c and Enginput.c. Since the byte coding has all been done, "load_bas1()" in Enginput.c only has to re-create the three arrays and load the byte-code. The parser takes it from there. Here is the new code for load_bas1():

/* Enginput.c : alpha version */
/* ----- includes ----- */
#include "prototyp.h"

void load_bas1()
{
    char ch, *tmp=s_holder;
    char ln_holder[LINE_NUM];
    int ii, len, pi;
    unsigned size, buffr=10;
    long offset = 28240;                /* offset: Engine.exe file length */

    /* --- open file --- */
    f_in = fopen(tmp,"r");              /* source file is now: "filename.exe" */
    fseek(f_in, offset, SEEK_SET);
    fgets(p_string, buffr, f_in);       /* fetch "nrows" */
    nrows = atoi(p_string);
    /* --- create program arrays --- */
    array1 = malloc(nrows * sizeof(char *));
    label_nam = malloc(nrows * sizeof(char *));
    for(ii = 0; ii < nrows; ii++)
    {
        label_nam[ii] = malloc(LLEN * sizeof(char));
    }
    byte_array = malloc(nrows * sizeof(int));
    /* --- read byte code --- */
    for(ii=0; ii < nrows; ii++)
    {
        fgets(p_string, BUFSIZE, f_in); /* read in labels */
        len = strlen(p_string);
        len--;
        p_string[len] = '\0';
        strcpy(label_nam[ii], p_string);
    }
    for(ii=0; ii < nrows; ii++)
    {
        fgets(p_string, buffr, f_in);   /* read in byte code */
        byte_array[ii] = atoi(p_string);
    }
    for(ii=0; ii < nrows; ii++)
    {
        fgets(p_string, BUFSIZE, f_in); /* read in statement */
        len = strlen(p_string);
        size = (len + 1);
        array1[ii] = malloc(size * sizeof(char));
        strcpy(array1[ii], p_string);
    }
    fclose(f_in);
}
/*-------------------------------*/

Copy this new version of load_bas1() to Enginput.c and save it. Here is the new version of Engine.c, (you will notice that it is almost identical to Bxbasic.c):

/* bxbasic: Engine.c : alpha version.02 */
/* #define Power_C */
#define LccWin32
/* --- declare headers --- */
#include <stdio.h>
#include <conio.h>
#include <io.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <malloc.h>
#ifdef Power_C
    #include <bios.h>
#endif                      /* Power-C version */
#ifdef LccWin32
    #include <tcconio.h>
#endif                      /* LccWin32 version */

/* --- declare constants --- */
#define BUFSIZE 256
#define LINE_NUM 6
#define TOKEN_LEN 21
#define VAR_NAME 33
#define LLEN 33

/* ------ global vars ------------ */
FILE *f_in, *f_out;         /* these are the i/o file handles */
char *prog_name;            /* program source-file name */
char p_string[BUFSIZE];     /* file input string */
char **array1;              /* pointer to program array */
char t_holder[20];          /* token data holder */
char s_holder[BUFSIZE];     /* xstring (print) data holder */
int nrows;                  /* numbers of lines in source file */
int ncolumns=BUFSIZE;       /* dimension for array1[][columns] */
int line_ndx;               /* current execution line */
int s_pos, e_pos;           /* pointers to start & end of token */
/**/
char xstring[BUFSIZE];      /* the print string */
char **temp_prog;           /* temp program array */
char **temp_label;          /* temp label name array */
int *temp_byte;             /* temp byte code array */
int *byte_array;            /* byte code array */
char **label_nam;           /* labels name array */
int token;                  /* token: current byte code */
/**/
int *iv_stack;              /* stack:integer variable values */
char **in_stack;            /* stack:integer variable names */
int imax_vars=0;            /* stack:integer variable counter */
/**/
/* ----- function prototypes ----- */
void load_bas1(void);
void pgm_parser(void);
void get_token(void);
void parser(void);
void xstring_array(void);
void go_to(void);

/* --- function includes --- */
#include "prototyp.h"
#include "error.c"
#include "utility.c"
#include "output.c"
#include "variable.c"
#include "enginput.c"
#include "rdparser.c"

/* ----- begin program ------------- */

int main(int argc, char *argv[])
{
    int x=0, ab_code=1;

    printf("Bxbasic Engine\n");
    strcpy(s_holder, argv[0]);          /* make argv[0] global */
    load_bas1();
    pgm_parser();
    /* --- end of program --- */
    clr_arrays();
    return 0;
}
/*-------------------------------*/

void pgm_parser()
{
    line_ndx = 0;
    while(line_ndx < nrows)
    {
        s_pos = 0;
        e_pos = 0;
        get_token();
        parser();
        line_ndx++;
    }
}
/*-------------------------------*/

void get_token()
{
    strcpy(p_string, array1[line_ndx]);
    token = byte_array[line_ndx];
}
/*-------------------------------*/


void parser()
{
    int ab_code=4, x=line_ndx;

    switch(token)
    {
        case 1:     /* LET */
            parse_let();
            break;
        case 2:     /* CLEAR */
            clr_int();
            break;
        case 3:     /* LOCATE */
            locate();
            break;
        case 4:     /* PRINT */
            xstring_array();
            get_prnstring();
            break;
        case 5:     /* GOTO */
            go_to();
            break;
        case 6:     /* BEEP */
            beep();
            break;
        case 7:     /* CLS */
            cls();
            break;
        case 8:     /* END */
            printf("\nEnd of Program\n");
            line_ndx = nrows;
            break;
        case -1:    /* block label */
            break;
        default:
            a_bort(ab_code, x);
            break;
    }
}
/*-------------------------------*/

void xstring_array()
{
    char ch, quote='\"';
    int pi, si=0, ab_code;
    int stlen, x=line_ndx;

    pi = e_pos;
    pi = iswhite(pi);
    e_pos = pi;
    ch = p_string[pi];
    if(ch == ':')               /* if next character is a ":", get out */
    {
        return;
    }
    if(isalpha(ch))             /* if next character is an alpha, */
    {
        return;                 /* it's a varname, get out */
    }
    stlen = strlen(p_string);
    if((ch != quote) || (pi == stlen))  /* next character must be a */
    {
        ab_code=9;                      /* quote, or error: */
        a_bort(ab_code, x);
    }
    else
    {
        ch = ' ';
        while((ch != quote) && (pi < stlen))
        {
            si++;
            pi++;
            ch = p_string[pi];
        }
        if((si <= 1) && (pi < stlen))
        {
            ab_code=5;
            a_bort(ab_code, x);
        }
        else if(pi >= stlen)
        {
            ab_code=6;
            a_bort(ab_code, x);
        }
    }
}
/*-------------------------------*/

void go_to()
{
    char ch;
    char goto_label[LLEN];
    int pi, si=0, ab_code=8;
    int xtest, stlen, x=line_ndx;

    pi = e_pos;
    pi = iswhite(pi);
    ch = p_string[pi];
    while(isalnum(ch) && (si < LLEN-1)) /* don't overrun goto_label[] */
    {
        goto_label[si] = ch;
        pi++;
        si++;
        ch = p_string[pi];
    }
    goto_label[si] = '\0';              /* add string terminator */
    pi = -1;                    /* now compare goto_label[] to label_nam[n] */
    xtest = -1;
    while(xtest != 0)
    {
        pi++;
        if(pi == nrows)         /* test the index before using it */
        {
            strcpy(t_holder, goto_label);
            a_bort(ab_code, x);         /* error, label not found */
        }
        xtest = strcmp(label_nam[pi], goto_label);
    }
    pi--;
    line_ndx = pi;                      /* set line_ndx to the goto_line */
}
/*-------------------------------*/

Copy and save this code to Engine.c, replacing all the existing code. Now compile Engine.c and get the Engine.exe file length. Remember, we need that number for the "offset" value. Once you have that number, open Enginput.c and replace the value in this statement with the correct file length:

long offset = 28240;    /* offset: Engine.exe file length */

Save Enginput.c and recompile Engine.c. Now at the command line enter:

BXCOMP TEST

Then execute:

TEST.EXE

Well, how's that ?! Our first real Byte-Code Compiler and Scripting Engine !

FLOATING POINT
Before we wrap up this session, let's see if we can add at least one more variable type to Bxbasic. Integers are somewhat limited in that you can't express real numbers, such as:

x = 10/3
We are limited to a whole number as the result of the expression (here, 3). That's okay if all you want is the whole-number value, but in many cases you need the fractional part as well. For this reason, among others, I've selected Double Precision Floating Point for this new variable type.

We are getting to the point in Bxbasic where making a small change, like the addition of a new variable type, has wide-reaching implications. It affects everything from the functions in Input.c to those in Output.c. We are going to need the ability to distinguish one variable type from another. In C, apart from the type declaration, there is no special characteristic that distinguishes one variable name from another, so you can't have two variables in the same scope with the same name (the exact same spelling and case), even if they are of different types.


Example:

char variable_name;
int variable_name;
would generate an error. Fortunately, Standard Basic provides a method for doing just that. Basic uses a "Type Classification Symbol", which is appended to the variable name and sets it apart from any other variable, even one of the same name. Example:

abc = 100
abc# = 100.5
abc$ = "this is a test"
is perfectly valid in Standard Basic. The Classification Symbol is not part of the variable name; it merely distinguishes the "type" of the variable. (This is sometimes loosely called type-casting, although it declares a type rather than converting a value.) So, each time we add a new variable type, we have to tell the program what type of variable we are referring to in a statement or expression. Currently, Bxbasic uses type integer as the default data type, and we haven't had to specify a type because integer is the only variable type we have. We can continue to use type integer as the default, as long as we specify the type of any variable that is not type integer. Example:

abc = 100
abc# = 1.5
abc x abc#
----------
150.0
With the exception of variables of type integer, every time a variable is used we will have to know its type, so that we know where to find it and how to redirect the program flow to handle that particular variable. The first thing we need to do is declare, in Bxbasic.c, the new variable type and the supporting arrays it will need. Open Bxbasic.c and add these new declarations to the "global vars" list:

/* ------ global vars ------------ */
[snip]...
/**/
char var_type;          /* current variable type */
double *dv_stack;       /* stack:double float values */
char **dn_stack;        /* stack:double float names */
int dmax_vars=0;        /* stack:double float counter */
/**/


In function parser(), change "case 2" to read as follows:

void parser()
{
    int ab_code=4, x=line_ndx;

    switch(token)
    {
        case 1:     /* LET */
            parse_let();
            break;
        case 2:     /* CLEAR */
            clr_vars();
            break;
[snip]...

Save Bxbasic.c and close it. The next thing we need to deal with are the functions in Variable.c. Since nearly all the functions in Variable.c need to be modified and new ones added, here is the entire listing for Variable.c :

/* bxbasic : Variable.c : alpha version */
/* ----- includes ----- */
#include "prototyp.h"

void parse_let()
{
    char ch, varname[VAR_NAME];
    int pi, stlen, ndx=0;
    int ab_code=11, x=line_ndx;

    stlen = strlen(p_string);
    pi = e_pos;
    /* --- retrieve variable name from statement --- */
    pi = get_alpha(pi, stlen);
    if(pi == stlen)             /* error: didn't find it */
    {
        a_bort(ab_code, x);
    }
    e_pos = pi;
    strcpy(varname, get_varname());
    pi = e_pos;
    ch = p_string[pi];          /* get the type character */
    /* --- we now have varname and type --- */
    /* --- compare name to double array --- */
    if(ch == '#')               /* double sign */
    {
        ndx = get_dblndx(varname);
        pi++;
        pi = iswhite(pi);
        e_pos = pi;
        /* --- now get assignment value --- */
        Match('=');
        dv_stack[ndx] = rdp_main();
    }
    /* --- compare name to integer array --- */
    else                        /* no type sign */
    {
        ndx = get_intndx(varname);
        pi = iswhite(pi);
        e_pos = pi;
        /* --- now get assignment value --- */
        Match('=');
        iv_stack[ndx] = (int) rdp_main();
    }
}
/*-------------------------------*/

This code in get_intndx() used to be integrated into parse_let(). This function now determines if the integer variable name already exists and creates it if not, then returns the variable index.

int get_intndx(char *name)
{
    char varname[VAR_NAME];
    int ndx=0, vflag=0, vi_pos=0;

    strcpy(varname, name);
    while((ndx < imax_vars) && (strcmp(in_stack[ndx], varname) != 0))
    {
        if(vflag == 0)
        {
            if(in_stack[ndx][0] == '\0')    /* found a null */
            {
                vi_pos = ndx;       /* mark this array index */
                vflag = 1;          /* set the exit flag to true */
            }
        }
        ndx++;                      /* increment the index */
    }
    if(ndx == imax_vars)            /* did we reach the end of the stack */
    {
        ndx = vi_pos;               /* next available stack location */
        if(vflag == 0)
        {
            init_int();             /* initialize a new integer variable */
            ndx = imax_vars;
            ndx--;
            strcpy(in_stack[ndx], varname);     /* save new varname */
        }
        else                        /* if not, store this variable */
        {
            strcpy(in_stack[ndx], varname);
        }
    }
    return ndx;
}
/*-------------------------------*/


Function get_dblndx() is identical, (except for the references to the doubles_stacks,) to get_intndx().

int get_dblndx(char *name)
{
    char varname[VAR_NAME];
    int ndx=0, vflag=0, vi_pos=0;

    strcpy(varname, name);
    while((ndx < dmax_vars) && (strcmp(dn_stack[ndx], varname) != 0))
    {
        if(vflag == 0)
        {
            if(dn_stack[ndx][0] == '\0')    /* found a null */
            {
                vi_pos = ndx;       /* mark this array index */
                vflag = 1;          /* set the exit flag to true */
            }
        }
        ndx++;                      /* increment the index */
    }
    if(ndx == dmax_vars)            /* did we reach the end of the stack */
    {
        ndx = vi_pos;               /* next available stack location */
        if(vflag == 0)
        {
            init_dbl();             /* initialize a new double variable */
            ndx = dmax_vars;
            ndx--;
            strcpy(dn_stack[ndx], varname);     /* save new varname */
        }
        else                        /* if not, store this variable */
        {
            strcpy(dn_stack[ndx], varname);
        }
    }
    return ndx;
}
/*-------------------------------*/

Function get_varname() was previously integrated into parse_let(), too.

char *get_varname()
{
    char ch;
    static char varname[VAR_NAME];
    int pi, si=0;

    pi = e_pos;
    ch = p_string[pi];
    while((isalnum(ch) != 0))
    {
        varname[si] = ch;
        si++;
        pi++;
        ch = p_string[pi];
    }
    varname[si] = '\0';
    e_pos = pi;
    return varname;
}
/*-------------------------------*/


Function get_varvalue() now returns a value of type double and distinguishes between integer and double variables and handles them separately. Notice how type integer is the default.

double get_varvalue()
{
    char ch, varname[VAR_NAME];
    int pi, ndx=0, ab_code=13, x=line_ndx;
    double value;

    /* --- get varname --- */
    strcpy(varname, get_varname());
    /* --- get var type --- */
    pi = e_pos;
    ch = p_string[pi];
    var_type = ch;
    /* --- now compare to var type array --- */
    if(ch == '#')
    {
        while((ndx<dmax_vars)&&(strcmp(dn_stack[ndx],varname) != 0))
        {
            ndx++;                  /* find varname in stack */
        }
        if(ndx == dmax_vars)        /* error: did not find it */
        {
            a_bort(ab_code, x);
        }
        value = dv_stack[ndx];
        _GetChar();                 /* increment character pointer */
    }
    else
    {
        while((ndx<imax_vars)&&(strcmp(in_stack[ndx],varname) != 0))
        {
            ndx++;                  /* find varname in stack */
        }
        if(ndx == imax_vars)        /* error: did not find it */
        {
            a_bort(ab_code, x);
        }
        value = (double) iv_stack[ndx];
    }
    return value;
}
/*-------------------------------*/

void clr_vars()
{
    clr_int();
    clr_dbl();
}
/*-------------------------------*/


void init_int()
{
    int ndx;
    unsigned size;

    if(imax_vars == 0)
    {
        ndx = imax_vars;
        imax_vars++;
        size = imax_vars;
        iv_stack = malloc(size * sizeof(int));
        in_stack = malloc(size * sizeof(char *));
        size = VAR_NAME;
        in_stack[ndx] = malloc(size * sizeof(char));
    }
    else
    {
        ndx = imax_vars;
        imax_vars++;
        size = imax_vars;
        iv_stack = realloc(iv_stack, size * sizeof(int));
        in_stack = realloc(in_stack, size * sizeof(char *));
        size = VAR_NAME;
        in_stack[ndx] = malloc(size * sizeof(char));
    }
}
/*---------- end init_int -----------*/

void clr_int()
{
    int ndx;

    if(imax_vars > 0)
    {
        free(iv_stack);
        for(ndx=0; ndx < imax_vars; ndx++)
        {
            free(in_stack[ndx]);
        }
        free(in_stack);
        imax_vars = 0;
    }
}
/*---------- end clr_int -----------*/

void init_dbl()
{
    int ndx;
    unsigned size;

    if(dmax_vars == 0)
    {
        ndx = dmax_vars;
        dmax_vars++;
        size = dmax_vars;
        dv_stack = malloc(size * sizeof(double));
        dn_stack = malloc(size * sizeof(char *));
        size = VAR_NAME;
        dn_stack[ndx] = malloc(size * sizeof(char));
    }
    else
    {
        ndx = dmax_vars;
        dmax_vars++;
        size = dmax_vars;
        dv_stack = realloc(dv_stack, size * sizeof(double));
        dn_stack = realloc(dn_stack, size * sizeof(char *));
        size = VAR_NAME;
        dn_stack[ndx] = malloc(size * sizeof(char));
    }
}
/*---------- end init_dbl -----------*/

void clr_dbl()
{
    int ndx;

    if(dmax_vars > 0)
    {
        free(dv_stack);
        for(ndx=0; ndx < dmax_vars; ndx++)
        {
            free(dn_stack[ndx]);
        }
        free(dn_stack);
        dmax_vars = 0;
    }
}
/*---------- end clr_dbl -----------*/

Copy these functions to Variable.c. Save and close it. Now we need to make changes to the functions in Rdparser.c and add some new ones. Since Rdparser.c evaluates all the math expressions and has to handle all numeric variables, regardless of type, it is better to perform all calculations using just ONE type. It would be impractical to have separate routines that each processed variables of a single type. Since Type Double is the largest (in bytes) and most precise numeric type we have, we convert all numeric variables to type Double, evaluate the expression in Doubles to the highest degree of accuracy, and then convert the result to the intended type. With that said, here is the complete listing for Rdparser.c:


/* bxbasic : Rdparser.c : alpha version */
/* special credits to: Jack Crenshaw's "How to Build a Compiler" */
/* ----- function prototypes ----- */
#include "prototyp.h"

double rdp_main()               /* Recursive Descent Parser Main */
{
    double value;

    value = Expression();
    return value;
}
/*-------------------------------*/

double Expression()             /* Parse and Translate an Expression */
{
    char ch;
    int pi;
    double Value;

    pi = e_pos;
    ch = p_string[pi];
    if(IsAddop(ch))
    {
        Value = 0;
    }
    else
    {
        Value = Term();
        pi = e_pos;
        ch = p_string[pi];
    }
    while(IsAddop(ch))
    {
        switch(ch)
        {
            case '+':
                Match('+');
                Value = Value + Term();
                break;
            case '-':
                Match('-');
                Value = Value - Term();
                break;
            default:
                break;
        }
        pi = e_pos;
        ch = p_string[pi];
    }
    return Value;
}
/*-------------------------------*/


double Term()                   /* Parse and Translate a Math Term */
{
    char ch;
    int pi;
    double Value;

    Value = Factor();
    pi = e_pos;
    ch = p_string[pi];
    while(IsMultop(ch))
    {
        switch(ch)
        {
            case '*':
                Match('*');
                Value = Value * Factor();
                break;
            case '/':
                Match('/');
                Value = Value / Factor();
                break;
            default:
                break;
        }
        pi = e_pos;
        ch = p_string[pi];
    }
    return Value;
}
/*-------------------------------*/

double Factor()                 /* Parse and Translate a Math Factor */
{
    char ch;
    int pi;
    double value;

    pi = e_pos;
    ch = p_string[pi];
    if(ch == '(')
    {
        Match('(');
        value = Expression();
        Match(')');
    }
    else
    {
        if(isalpha(ch))
        {
            value = get_varvalue();
            SkipWhite();
        }
        else
        {
            value = GetNum();
        }
    }
    return value;
}
/*-------------------------------*/


void Match(char x)              /* Match a Specific Input Character */
{
    char ch, string[6];
    int pi, ab_code=12, ln=line_ndx;

    pi = e_pos;
    ch = p_string[pi];
    if(ch != x)
    {
        strcpy(string, "\" \"");
        string[1] = x;
        strcpy(t_holder, string);
        a_bort(ab_code,ln);
    }
    else
    {
        _GetChar();
        SkipWhite();
    }
}
/*-------------------------------*/

void _GetChar()
{
    e_pos++;
}
/*-------------------------------*/

int Is_White(char ch)
{
    int test=0;

    if((ch == ' ') || (ch == '\t'))
    {
        test = -1;
    }
    return test;
}
/*-------------------------------*/


void SkipWhite()                /* Skip Over Leading White Space */
{
    char ch;
    int pi;

    pi = e_pos;
    ch = p_string[pi];
    while(Is_White(ch))
    {
        _GetChar();
        pi = e_pos;
        ch = p_string[pi];
    }
}
/*-------------------------------*/

double GetNum()                 /* Get a Number */
{
    char ch;
    int pi, ab_code=12, ln=line_ndx;
    double value=0;

    pi = e_pos;
    ch = p_string[pi];
    if((! isdigit(ch)) && (ch != '.'))
    {
        strcpy(t_holder, "Numeric Value");
        a_bort(ab_code,ln);
    }
    value = asc_2_dbl();
    pi = e_pos;
    ch = p_string[pi];
    if(isdigit(ch))
    {
        while(isdigit(ch))
        {
            pi++;
            ch = p_string[pi];
        }
        e_pos = pi;
    }
    SkipWhite();
    return value;
}
/*-------------------------------*/


int IsAddop(char ch)            /* Recognize an Addop */
{
    int rval=0;

    if((ch == '+') || (ch == '-'))
    {
        rval = 1;
    }
    return rval;
}
/*-------------------------------*/

int IsMultop(char ch)           /* Recognize a Multop */
{
    int rval=0;

    if((ch == '*') || (ch == '/'))
    {
        rval = 1;
    }
    return rval;
}
/*-------------------------------*/

double asc_2_dbl()
{
    char ch, cvalue[33];
    int pi, vi_pos=0;
    double fvalue;

    pi = e_pos;
    ch = p_string[pi];
    /* the length test must apply to digits as well as to '.', */
    /* and must leave room in cvalue[] for the terminator      */
    while(((isdigit(ch)) || (ch == '.')) && (vi_pos < 32))
    {
        cvalue[vi_pos] = ch;
        pi++;
        vi_pos++;
        ch = p_string[pi];
    }
    cvalue[vi_pos] = '\0';
    fvalue = atof(cvalue);      /* convert ascii to double */
    e_pos = pi;
    return fvalue;
}
/*------- end asc_2_dbl ---------*/

You will notice that now many of the functions return a type Double and throughout, that variable "value" is of type Double. Replace Rdparser.c with this version and save it.


Now, in Input.c, two very minor changes need to be made to functions tmp_byte() and get_byte(). Open file Input.c and copy these two functions, replacing the existing ones:

void tmp_byte(int ii)
{
    char ch;
    int pi, si, byte;
    int x=ii, ab_code=4;

    /* ----- fill temp_byte[] here ----- */
    pi = e_pos;
    pi = iswhite(pi);
    ch = p_string[pi];
    if(ch == '\'')                  /* it's a comment */
    {
        byte = 0;
        strcpy(temp_prog[ii], "\n\0");
    }
    else
    {
        if(isupper(ch))             /* is this a keyword */
        {
            e_pos = pi;
            byte = get_byte(ii);    /* call get_byte */
            pi = e_pos;
        }
        else if(isalpha(ch))        /* a possible assignment */
        {
            si = pi;                /* save pointer position */
            while(isalnum(ch))
            {
                pi++;
                ch = p_string[pi];
            }
            /* NOTE: the test that belongs here is missing from this  */
            /* copy of the listing; the next lines are an assumption, */
            /* reconstructed to mirror the same test in get_byte()    */
            pi = iswhite(pi);
            ch = p_string[pi];
            if(strchr("=#", ch))    /* a variable assignment */
            {
                byte = 1;
                pi = si;            /* push pointer back */
            }
            else
            {
                a_bort(ab_code, x); /* not an assignment */
            }
        }
        else
        {
            a_bort(ab_code, x);     /* not a keyword or variable */
        }
    }
    temp_byte[ii] = byte;
    e_pos = pi;
}
/*-------------------------------*/

int get_byte(int ii)
{
    char ch, keyword[TOKEN_LEN];
    int pi, si=0, byte;
    int x=ii, ab_code=4;

    pi = e_pos;
    ch = p_string[pi];
    while(isalnum(ch))
    {
        keyword[si] = ch;
        si++;
        pi++;
        ch = p_string[pi];
    }
    keyword[si] = '\0';
    /* --- assign byte code --- */
    if(strcmp(keyword, "REM") == 0)             byte=0;
    else if(strcmp(keyword, "LET") == 0)        byte=1;
    else if(strcmp(keyword, "CLEAR") == 0)      byte=2;
    else if(strcmp(keyword, "LOCATE") == 0)     byte=3;
    else if(strcmp(keyword, "PRINT") == 0)      byte=4;
    else if(strcmp(keyword, "GOTO") == 0)       byte=5;
    else if(strcmp(keyword, "BEEP") == 0)       byte=6;
    else if(strcmp(keyword, "CLS") == 0)        byte=7;
    else if(strcmp(keyword, "END") == 0)        byte=8;
    else
    {
        pi = iswhite(pi);
        ch = p_string[pi];
        if(strchr("=#", ch))        /* a variable assignment */
        {
            byte = 1;
            pi = e_pos;             /* push pointer back */
        }
        else
        {
            a_bort(ab_code, x);     /* not a keyword or variable */
        }
    }
    e_pos = pi;
    return byte;
}
/*-------------------------------*/

That's it. Now save Input.c and close.


Open file Output.c and replace function get_prnvar() with this version:

void get_prnvar()
{
    char ch;
    int pi;
    double value;

    pi = e_pos;
    pi = iswhite(pi);
    e_pos = pi;
    value = get_varvalue();
    pi = e_pos;
    pi = iswhite(pi);
    ch = p_string[pi];
    /* --- comma="tab", semi-colon="no \n", Default is: colon="\n". --- */
    if(var_type == '#')
    {
        if(ch == ',')
            printf("%.2f%c", value, '\t');
        else if(ch == ';')
            printf("%.2f", value);
        else
            printf("%.2f\n", value);
    }
    else
    {
        if(ch == ',')
            printf("%d%c", (int) value, '\t');
        else if(ch == ';')
            printf("%d", (int) value);
        else
            printf("%d\n", (int) value);
    }
}
/*-------------------------------*/

Save Output.c and close it. Now, let's finish up with file Prototyp.h. Copy these modified versions of the Variable.c and Rdparser.c sections to Prototyp.h:

/* bxbasic : Prototyp.h : alpha version */
/* ----- function prototypes ----- */
[snip]...
/* Variable.c */
void parse_let(void);
double get_varvalue(void);
char *get_varname(void);
int get_intndx(char *);
int get_dblndx(char *);
void clr_vars(void);
void init_int(void);
void clr_int(void);
void init_dbl(void);
void clr_dbl(void);

/* Rdparser.c */
double rdp_main(void);
double Expression(void);
double Term(void);
double Factor(void);
void Match(char);
void _GetChar(void);
double GetNum(void);
int IsAddop(char);
int IsMultop(char);
int Is_White(char);
void SkipWhite(void);
double asc_2_dbl(void);

Save Prototyp.h and close it. Well, that should be all that we need. As you see, most of the significant changes take place in Variable.c and Rdparser.c. The rest of the changes are mostly minor. Compile Bxbasic.c. Assuming that it compiles without errors, run this new version of Test.bas:

' test.bas version 4.4
Start1:
CLS
PRINT "hello world!"
' -----------
LET xylophone# = 50.3
LET yazoo# = 101.25
LET abc = yazoo#/xylophone#
xyz = yazoo#/10
LOCATE abc , xyz
PRINT "hello world!"
' -----------------------------------------
quasar = 2
zapp = 4
abc = (quasar * quasar * zapp + zapp)/5
xyz = ((quasar * quasar) * zapp) + zapp
LOCATE abc, xyz
PRINT "hello world!"
' -----------------------------------------
PRINT:
PRINT " 2*(3+4)*5/10 = ";
abc = 2*(3+4)*5/10
PRINT abc
' -----------------------------------------
PRINT:
PRINT "xylophone# = ";
PRINT xylophone#
PRINT "yazoo# = ";
PRINT yazoo#
TheEnd:
CLEAR
END
' ------------------------------------------

CONCLUSION
I think we've covered just about everything we intended to in our redefinition of Bxbasic. Except for adding Type Double to the Engine, the Engine is up to date as well. There's still more to come.


CHAPTER - 5
INTRODUCTION
Welcome back. In the last chapter we made considerable progress, by making some changes to the appearance of Bxbasic and adding a second variable type. Let's get started by carrying those changes over to the engine.

ENGINE UPDATE
Open Engine.c and add these declarations for the type double variables under the "global vars" list:

/**/
char var_type;          /* current variable type */
double *dv_stack;       /* stack:double float values */
char **dn_stack;        /* stack:double float names */
int dmax_vars=0;        /* stack:double float counter */
/**/

In function parser(), also in Engine.c, change the block for CLEAR as follows:

void parser()
{
    int ab_code=4, x=line_ndx;

    switch(token)
    {
        case 1:     /* LET */
            parse_let();
            break;
        case 2:     /* CLEAR */
            clr_vars();
            break;
        case 3:     /* LOCATE */
[snip]...

Save Engine.c and close it. Now, in Enginput.c, we just need to delete one line. In the variable declarations for load_bas1(), delete the line that says:

char ln_holder[LINE_NUM];

so that it reads as follows:

void load_bas1()
{
    char ch, *tmp=s_holder;
    int ii, len, pi;
    unsigned size, buffr=10;
    long offset = 37072;        /* offset: Engine.exe file length */

Okay, save Enginput.c and re-compile Engine.c. Now we need to update the offset. Get the new file length for Engine.exe and replace the old offset with the new number. Re-compile Engine.c and make sure the file length still matches the offset. If it does, we have one more step to complete. In Bxbasic, we made some changes that affect Bxcomp. We don't need to edit Bxcomp.c, since the changes are in the files it includes; we just need to recompile Bxcomp.c so that they take effect. Recompile Bxcomp.c now. Now, using Bxcomp.exe, let's compile Test.bas:

BXCOMP TEST

and execute:

TEST

That's all there is to it!

LONG INTEGER
Let's add Long Integers to Bxbasic. The one thing Basic doesn't clearly define is a Type Symbol for Long Integers. Basic does use the "%" symbol (optionally) for integers. Since regular integers are our default type, requiring no type sign, we will use that symbol to represent Long Integers. To accomplish that, all we have to do is duplicate what we did when we added Doubles. Add these variable declarations to the "global vars" list in Bxbasic.c:

/* ------ global vars ------------ */
[snip]...
/**/
long *lv_stack;         /* stack:long variable values */
char **ln_stack;        /* stack:long variable names */
int lmax_vars=0;        /* stack:long variable counter */
/**/

Save Bxbasic.c and close it. Now we need to make some additions to Variable.c. All we need to add to parse_let() is the portion that does the type comparison.


Here's the new listing for parse_let() :

void parse_let() { char ch, varname[VAR_NAME]; int pi, stlen, ndx=0; int ab_code=11, x=line_ndx; stlen = strlen(p_string); pi = e_pos; /* --- retrieve variable name from statement --- */ pi = get_alpha(pi, stlen); if(pi == stlen) /* error: didn't find it */ { a_bort(ab_code, x); } e_pos = pi; strcpy(varname, get_varname()); pi = e_pos; ch = p_string[pi]; /* get the type character */ /* --- we now have varname and type --- */ /* --- compare name to double array --- */ if(ch == '#') /* double sign */ { ndx = get_dblndx(varname); pi++; pi = iswhite(pi); e_pos = pi; /* --- now get assignment value --- */ Match('='); dv_stack[ndx] = rdp_main(); } /* --- compare name to long array --- */ else if(ch == '%') /* long sign */ { ndx = get_lngndx(varname); pi++; pi = iswhite(pi); e_pos = pi; /* --- now get assignment value --- */ Match('='); lv_stack[ndx] = (long) rdp_main(); } /* --- compare name to integer array --- */ else /* no type sign */ { ndx = get_intndx(varname); pi = iswhite(pi); e_pos = pi; /* --- now get assignment value --- */ Match('='); iv_stack[ndx] = (int) rdp_main(); } } /*-------------------------------*/

Add this new listing for get_lngndx() :


int get_lngndx(char *name) { char varname[VAR_NAME]; int ndx=0, vflag=0, vi_pos=0; strcpy(varname, name); while((ndx < lmax_vars) && (strcmp(ln_stack[ndx], varname) != 0)) { if(vflag == 0) { if(ln_stack[ndx][0] == '\0') /* found a null */ { vi_pos = ndx; /* mark this array index */ vflag = 1; /* set the exit flag to true */ } } ndx++; /* increment the index */ } if(ndx == lmax_vars) /* did we reach the end of the stack */ { ndx = vi_pos; /* next available stack location */ if(vflag == 0) { init_lng(); /* initialize a new long variable */ ndx = lmax_vars; ndx--; strcpy(ln_stack[ndx], varname); /* save new varname */ } else /* if not, store this variable */ { strcpy(ln_stack[ndx], varname); } } return ndx; } /*-------------------------------*/

And here's the newly modified listing for get_varvalue() :

    double get_varvalue()
    {   char ch, varname[VAR_NAME];
        int pi, ndx=0, ab_code=13, x=line_ndx;
        double value;

        /* --- get varname --- */
        strcpy(varname, get_varname());
        /* --- get var type --- */
        pi = e_pos;
        ch = p_string[pi];
        var_type = ch;
        /* --- now compare to var type array --- */
        if(ch == '#')
        {
            while((ndx<dmax_vars)&&(strcmp(dn_stack[ndx],varname) != 0))
            {
                ndx++;                      /* find varname in stack */
            }
            if(ndx == dmax_vars)            /* error: did not find it */
            {
                a_bort(ab_code, x);
            }
            value = dv_stack[ndx];
            _GetChar();                     /* increment character pointer */
        }
        else if(ch == '%')
        {
            while((ndx<lmax_vars)&&(strcmp(ln_stack[ndx],varname) != 0))
            {
                ndx++;                      /* find varname in stack */
            }
            if(ndx == lmax_vars)            /* error: did not find it */
            {
                a_bort(ab_code, x);
            }
            value = (double) lv_stack[ndx];
            _GetChar();                     /* increment character pointer */
        }
        else
        {
            while((ndx<imax_vars)&&(strcmp(in_stack[ndx],varname) != 0))
            {
                ndx++;                      /* find varname in stack */
            }
            if(ndx == imax_vars)            /* error: did not find it */
            {
                a_bort(ab_code, x);
            }
            value = (double) iv_stack[ndx];
        }
        return value;
    }
    /*-------------------------------*/

Here's the new code for clr_vars(), init_lng() and clr_lng() :

void clr_vars() { clr_int(); clr_dbl(); clr_lng(); } /*-------------------------------*/

    void init_lng()
    {   int ndx;
        unsigned size;

        if(lmax_vars == 0)
        {
            ndx = lmax_vars;
            lmax_vars++;
            size = lmax_vars;
            lv_stack = malloc(size * sizeof(long));
            ln_stack = malloc(size * sizeof(char *));
            size = VAR_NAME;
            ln_stack[ndx] = malloc(size * sizeof(char));
        }
        else
        {
            ndx = lmax_vars;
            lmax_vars++;
            size = lmax_vars;
            lv_stack = realloc(lv_stack, size * sizeof(long));
            ln_stack = realloc(ln_stack, size * sizeof(char *));
            size = VAR_NAME;
            ln_stack[ndx] = malloc(size * sizeof(char));
        }
    }
    /*---------- end init_lng -----------*/

void clr_lng() { int ndx; if(lmax_vars > 0) { free(lv_stack); for(ndx=0; ndx < lmax_vars; ndx++) { free(ln_stack[ndx]); } free(ln_stack); lmax_vars = 0; } } /*---------- end clr_lng -----------*/

Copy all these functions to Variable.c. Then save and close it. In Input.c, there are two small changes that need to be made to tmp_byte() and get_byte(). In tmp_byte(), in the "if(isalpha(ch))" section, we need to add the type symbol for Type Long Integer. Change that part to read as follows:

    [snip]...
        else if(isalpha(ch))                /* a possible assignment */
        {
            si = pi;                        /* save pointer position */
            while(isalnum(ch))
            {
                pi++;
                ch = p_string[pi];
            }
            pi = iswhite(pi);
            ch = p_string[pi];
            if(strchr("=#%",ch))            /* a variable assignment */
            {
                byte = 1;
                pi = si;
            }
            else                            /* not an assignment */
            {
                a_bort(ab_code, x);
            }
        }
    [snip]...

We need to make a similar addition to get_byte(). Here is the part we need to change: [snip]... else { pi = iswhite(pi); ch = p_string[pi]; if(strchr("=#%", ch)) /* a variable assignment */ { byte = 1; pi = e_pos; /* push pointer back */ } else { a_bort(ab_code, x); /* not a keyword or variable */ } } [snip]...

Copy these changes to Input.c. Then save and close it. Now, in Output.c, we need to add the ability to print Long Integer values. Change get_prnvar() so that this section reads as follows: [snip]... /* --- comma="tab", semi-colon="no \n", Default is: colon="\n". --- */ if(var_type == '#') { if(ch == ',') printf("%.2f\t", value); else if(ch == ';') printf("%.2f", value); else printf("%.2f\n", value); } else if(var_type == '%') { if(ch == ',') printf("%ld\t", (long) value); else if(ch == ';') printf("%ld", (long) value); else printf("%ld\n", (long) value); } else { if(ch == ',') printf("%d\t", (int) value); else if(ch == ';') printf("%d", (int) value); else printf("%d\n", (int) value); } [snip]...


While we're at it, I discovered a small error in locate() that I overlooked last time. In the ROWS and COLUMNS sections, the return value from get_varvalue() has to be type-cast to integer. Change both these parts as follows:

    [snip]...
        /* -------- rows -------- */
        if(isalpha(ch))                     /* this is a variable name */
        {
            e_pos = pi;
            row_x = (int) get_varvalue();
            pi = e_pos;
            pi = iswhite(pi);
            ch = p_string[pi];
        }
    [snip]...
    [snip]...
        /* -------- columns -------- */
        if(isalpha(ch))                     /* this is a variable name */
        {
            e_pos = pi;
            col_y = (int) get_varvalue();
        }
    [snip]...

Make these changes to Output.c. Then save and close it. The last thing we need to do is to update Prototyp.h. Modify the Variable.c part to include : get_lngndx(), init_lng() and clr_lng(): [snip]... /* Variable.c */ void parse_let(void); double get_varvalue(void); char *get_varname(void); int get_intndx(char *); int get_dblndx(char *); int get_lngndx(char *); void init_int(void); void init_dbl(void); void init_lng(void); void clr_vars(void); void clr_int(void); void clr_dbl(void); void clr_lng(void); [snip]...

I believe that's everything. Save Prototyp.h and close it.


Now compile Bxbasic.c. Try Bxbasic.exe with this version of Test.bas that incorporates long integers:

    ' test.bas version 5.1
    Start1:
        CLS
        PRINT "hello world!"
    ' ------------------------------------------ double float
        LET xylophone# = 50.3
        LET yazoo# = 101.25
        LET abc = yazoo#/xylophone#
        xyz = yazoo#/10
        LOCATE abc , xyz
        PRINT "hello world!"
    ' ------------------------------------------ long integers
        quasar% = 2
        zapp% = 4
        abc% = (quasar% * quasar% * zapp% + zapp%)/5
        xyz% = ((quasar% * quasar%) * zapp%) + zapp%
        LOCATE abc%, xyz%
        PRINT "hello world!"
    ' ------------------------------------------
        PRINT: PRINT " 2*(3+4)*5/10 = ";
        abc = 2*(3+4)*5/10
        PRINT abc
    ' ------------------------------------------
        PRINT: PRINT "xylophone# = ";
        PRINT xylophone#
        PRINT "yazoo# = ";
        PRINT yazoo#
        PRINT: PRINT "abc%=";
        PRINT abc%
        PRINT "xyz%=";
        PRINT xyz%
    ' ------------------------------------------
    TheEnd:
        CLEAR
        END
    ' ------------------------------------------


SINGLE PRECISION
Well, to round this part out, we might as well add Single Precision Floating Point variables too. Basic does provide a Type Symbol for Single Precision numbers: the "!" symbol. With that in hand, let's add Single Precision. Since this is all pretty fresh in our minds, we should have no trouble doing this part. All we have to do is duplicate what we just did in the last section. Here's the new code for Bxbasic.c; add these declarations for type "float" to the "global vars":

    /* ------ global vars ------------ */
    [snip]...
    /**/
    float *fv_stack;        /* stack:float variable values */
    char **fn_stack;        /* stack:float variable names */
    int fmax_vars=0;        /* stack:float variable counter */
    /**/
    [snip]...

Copy the above to Bxbasic.c. Save and close it. Here are the changes for file Variable.c : In parse_let(), change the comparison portion to read as follows:

    void parse_let()
    [snip]...
        /* --- compare name to double array --- */
        if(ch == '#')                       /* double sign */
        {
            ndx = get_dblndx(varname);
            pi++;
            pi = iswhite(pi);
            e_pos = pi;
            /* --- now get assignment value --- */
            Match('=');
            dv_stack[ndx] = rdp_main();
        }
        /* --- compare name to float array --- */
        else if(ch == '!')                  /* float sign */
        {
            ndx = get_fltndx(varname);
            pi++;
            pi = iswhite(pi);
            e_pos = pi;
            /* --- now get assignment value --- */
            Match('=');
            fv_stack[ndx] = (float) rdp_main();
        }
        /* --- compare name to long array --- */
        else if(ch == '%')                  /* long sign */
        {
            ndx = get_lngndx(varname);
            pi++;
            pi = iswhite(pi);
            e_pos = pi;
            /* --- now get assignment value --- */
            Match('=');
            lv_stack[ndx] = (long) rdp_main();
        }
        /* --- compare name to integer array --- */
        else                                /* no type sign */
        {
            ndx = get_intndx(varname);
            pi = iswhite(pi);
            e_pos = pi;
            /* --- now get assignment value --- */
            Match('=');
            iv_stack[ndx] = (int) rdp_main();
        }
    [snip]...

Now we need to add a new function:

get_fltndx():

    int get_fltndx(char *name)
    {   char varname[VAR_NAME];
        int ndx=0, vflag=0, vi_pos=0;

        strcpy(varname, name);
        while((ndx < fmax_vars) && (strcmp(fn_stack[ndx], varname) != 0))
        {
            if(vflag == 0)
            {
                if(fn_stack[ndx][0] == '\0')    /* found a null */
                {
                    vi_pos = ndx;               /* mark this array index */
                    vflag = 1;                  /* set the exit flag to true */
                }
            }
            ndx++;                              /* increment the index */
        }
        if(ndx == fmax_vars)        /* did we reach the end of the stack */
        {
            ndx = vi_pos;           /* next available stack location */
            if(vflag == 0)
            {
                init_flt();         /* initialize a new float variable */
                ndx = fmax_vars;
                ndx--;
                strcpy(fn_stack[ndx], varname);     /* save new varname */
            }
            else                    /* if not, store this variable */
            {
                strcpy(fn_stack[ndx], varname);
            }
        }
        return ndx;
    }
    /*-------------------------------*/

And, in get_varvalue(), we need to modify the compare section:

    double get_varvalue()
    [snip]...
        /* --- now compare to var type array --- */
        if(ch == '#')
        {
            while((ndx<dmax_vars)&&(strcmp(dn_stack[ndx],varname) != 0))
            {
                ndx++;                      /* find varname in stack */
            }
            if(ndx == dmax_vars)            /* error: did not find it */
            {
                a_bort(ab_code, x);
            }
            value = dv_stack[ndx];
            _GetChar();                     /* increment character pointer */
        }
        else if(ch == '!')
        {
            while((ndx<fmax_vars)&&(strcmp(fn_stack[ndx],varname) != 0))
            {
                ndx++;                      /* find varname in stack */
            }
            if(ndx == fmax_vars)            /* error: did not find it */
            {
                a_bort(ab_code, x);
            }
            value = (double) fv_stack[ndx];
            _GetChar();                     /* increment character pointer */
        }
        else if(ch == '%')
        {
            while((ndx<lmax_vars)&&(strcmp(ln_stack[ndx],varname) != 0))
            {
                ndx++;                      /* find varname in stack */
            }
            if(ndx == lmax_vars)            /* error: did not find it */
            {
                a_bort(ab_code, x);
            }
            value = (double) lv_stack[ndx];
            _GetChar();                     /* increment character pointer */
        }
        else
        {
            while((ndx<imax_vars)&&(strcmp(in_stack[ndx],varname) != 0))
            {
                ndx++;                      /* find varname in stack */
            }
            if(ndx == imax_vars)            /* error: did not find it */
            {
                a_bort(ab_code, x);
            }
            value = (double) iv_stack[ndx];
        }
    [snip]...

Change clr_vars() to read as follows:

void clr_vars() { clr_int(); clr_dbl(); clr_lng(); clr_flt(); } /*-------------------------------*/

Now we need to add two more new functions, init_flt() and clr_flt() :

void init_flt() { int ndx; unsigned size; if(fmax_vars == 0) { ndx = fmax_vars; fmax_vars++; size = fmax_vars; fv_stack = malloc(size * sizeof(float)); fn_stack = malloc(size * sizeof(char *)); size = VAR_NAME; fn_stack[ndx] = malloc(size * sizeof(char)); } else { ndx = fmax_vars; fmax_vars++; size = fmax_vars; fv_stack = realloc(fv_stack, size * sizeof(float)); fn_stack = realloc(fn_stack, size * sizeof(char *)); size = VAR_NAME; fn_stack[ndx] = malloc(size * sizeof(char)); } } /*---------- end init_flt -----------*/


void clr_flt() { int ndx; if(fmax_vars > 0) { free(fv_stack); for(ndx=0; ndx < fmax_vars; ndx++) { free(fn_stack[ndx]); } free(fn_stack); fmax_vars = 0; } } /*---------- end clr_flt -----------*/

Okay, copy these new functions and changes to Variable.c. Then save and close it. In file Input.c, we need to add the symbol for single precision in tmp_byte() and get_byte() :

    void tmp_byte(int ii)
    [snip]...
        else if(isalpha(ch))                /* a possible assignment */
        {
            si = pi;                        /* save pointer position */
            while(isalnum(ch))
            {
                pi++;
                ch = p_string[pi];
            }
            pi = iswhite(pi);
            ch = p_string[pi];
            if(strchr("=#%!",ch))           /* a variable assignment */
            {
                byte = 1;
                pi = si;
            }
            else                            /* not an assignment */
            {
                a_bort(ab_code, x);
            }
        }
    [snip]...


int get_byte(int ii) [snip]... else { pi = iswhite(pi); ch = p_string[pi]; if(strchr("=#%!", ch)) /* a variable assignment */ { byte = 1; pi = e_pos; /* push pointer back */ } else { a_bort(ab_code, x); /* not a keyword or variable */ } } [snip]...

Copy these changes to Input.c. Save and close it. Only one small addition has to be made to Output.c. That's adding the float symbol to the print routine in get_prnvar(). The "print:double" routine will work with regular floats as well as doubles, so all we need to do is to add the symbol to the "double" routine:

void get_prnvar() [snip]... /* --- comma="tab", semi-colon="no \n", Default is: colon="\n". --- */ if(strchr("#!", var_type)) { if(ch == ',') printf("%.2f\t", value); else if(ch == ';') printf("%.2f", value); else printf("%.2f\n", value); } else if(var_type == '%') { if(ch == ',') printf("%ld\t", (long) value); else if(ch == ';') printf("%ld", (long) value); else printf("%ld\n", (long) value); } else { if(ch == ',') printf("%d\t", (int) value); else if(ch == ';') printf("%d", (int) value); else printf("%d\n", (int) value); } [snip]...

Save these changes to Output.c and close it. The last thing to do is add the new prototypes to Prototyp.h.

    /* ----- function prototypes ----- */
    [snip]...
    /* Variable.c */
    void parse_let(void);
    double get_varvalue(void);
    char *get_varname(void);
    int get_intndx(char *);
    int get_dblndx(char *);
    int get_lngndx(char *);
    int get_fltndx(char *);
    void init_int(void);
    void init_dbl(void);
    void init_lng(void);
    void init_flt(void);
    void clr_vars(void);
    void clr_int(void);
    void clr_dbl(void);
    void clr_lng(void);
    void clr_flt(void);
    [snip]...

Okay, save these changes to Prototyp.h and close it. Now re-compile Bxbasic.c. Assuming it compiled without errors, try Bxbasic.exe with this version of Test.bas:

    ' test.bas version 5.2
    Start1:
        CLS
    ' ------------------------------------------
        abc = 11100
        xyz = 32000
        abc% = 33000
        xyz% = 99000
        abc! = 33000.33
        xyz! = 99000.47
        abc# = 333000.33
        xyz# = 999000.47
    ' integer
        PRINT: PRINT "abc=";
        PRINT abc
        PRINT "xyz=";
        PRINT xyz
    ' long
        PRINT "abc%=";
        PRINT abc%
        PRINT "xyz%=";
        PRINT xyz%
    ' float
        PRINT "abc!=";
        PRINT abc!
        PRINT "xyz!=";
        PRINT xyz!
    ' double
        PRINT "abc#=";
        PRINT abc#
        PRINT "xyz#=";
        PRINT xyz#
    ' ------------------------------------------
    TheEnd:
        CLEAR
        END
    ' ------------------------------------------

OPTIMIZATION
You may have observed that each time we add a new variable type, the amount of code grows considerably. Well, we can do something about that. One thing we can do is eliminate those four functions: get_intndx(), get_lngndx(), get_fltndx() and get_dblndx(). With the help of some C magic, using pointers, we can create one generic function that does what all four functions currently do. Here is a snapshot of what they all have in common. The key points are the references to imax_vars, in_stack and init_int():

    [snip]...
        while((ndx < imax_vars) && (strcmp(in_stack[ndx], varname) != 0))
        {
            if(vflag == 0)
            {
                if(in_stack[ndx][0] == '\0')
                {
                    vi_pos = ndx;
                    vflag = 1;
                }
            }
            ndx++;
        }
        if(ndx == imax_vars)
        {
            ndx = vi_pos;
            if(vflag == 0)
            {
                init_int();
                ndx = imax_vars;
                ndx--;
                strcpy(in_stack[ndx], varname);
            }
            else
            {
                strcpy(in_stack[ndx], varname);
            }
        }
    [snip]...

Do you see what I mean? They each make references to variable "?max_vars", name stack array "?n_stack[ndx]" and function "init_???()". Instead of using the literal names of those objects, we could use a generic form of each object. That would be a pointer to the object that we want to reference, instead of the actual object name. We simply assign the generic objects prior to entry to the generic function, which then acts as though we had given it the real object names. Example: we declare three generic objects:

    a pointer to the name_stack:    char **nam_stack;
    a global variable:              int max_vars;
    a pointer to a function:        void (*init_fn)();

Two of these are pointers; max_vars is simply a copy of the counter for the current type. Then we assign the generic objects from the real ones:

    [snip]...
        if(ch == '#')                       /* double sign */
        {
            nam_stack = dn_stack;   /* indirect reference to name_stack */
            max_vars = dmax_vars;
            init_fn = init_dbl;     /* indirect reference to function */
            ndx = get_varndx(varname);
    [snip]...

Then, in the function itself, we replace the real object names with the pointers. I admit, this sounds a little complex, and it is, but this is one of those great features of the C language that allows you to refer to an object indirectly. It's called "indirection". Okay, to get started, open Bxbasic.c and add these three objects to the "global vars" list:

    [snip]...
    /**/
    char **nam_stack;
    int max_vars;
    void (*init_fn)();
    /**/
    [snip]...

Save Bxbasic.c and close it. Now, open Variable.c. Here is the new listing for parse_let():

    void parse_let()
    {   char ch, varname[VAR_NAME];
        int pi, stlen, ndx=0;
        int ab_code=11, x=line_ndx;

        stlen = strlen(p_string);
        pi = e_pos;
        /* --- retrieve variable name from statement --- */
        pi = get_alpha(pi, stlen);
        if(pi == stlen)                     /* error: didn't find it */
        {
            a_bort(ab_code, x);
        }
        e_pos = pi;
        strcpy(varname, get_varname());
        pi = e_pos;
        ch = p_string[pi];                  /* get the type character */
        /* --- we now have varname and type --- */
        /* --- compare name to double array --- */
        if(ch == '#')                       /* double sign */
        {
            nam_stack = dn_stack;   /* indirect reference to name_stack */
            max_vars = dmax_vars;
            init_fn = init_dbl;     /* indirect reference to function */
            ndx = get_varndx(varname);
            pi++;
            pi = iswhite(pi);
            e_pos = pi;
            /* --- now get assignment value --- */
            Match('=');
            dv_stack[ndx] = rdp_main();
        }
        /* --- compare name to float array --- */
        else if(ch == '!')                  /* float sign */
        {
            nam_stack = fn_stack;   /* indirect reference to name_stack */
            max_vars = fmax_vars;
            init_fn = init_flt;     /* indirect reference to function */
            ndx = get_varndx(varname);
            pi++;
            pi = iswhite(pi);
            e_pos = pi;
            /* --- now get assignment value --- */
            Match('=');
            fv_stack[ndx] = (float) rdp_main();
        }
        /* --- compare name to long array --- */
        else if(ch == '%')                  /* long sign */
        {
            nam_stack = ln_stack;   /* indirect reference to name_stack */
            max_vars = lmax_vars;
            init_fn = init_lng;     /* indirect reference to function */
            ndx = get_varndx(varname);
            pi++;
            pi = iswhite(pi);
            e_pos = pi;
            /* --- now get assignment value --- */
            Match('=');
            lv_stack[ndx] = (long) rdp_main();
        }
        /* --- compare name to integer array --- */
        else                                /* no type sign */
        {
            nam_stack = in_stack;   /* indirect reference to name_stack */
            max_vars = imax_vars;
            init_fn = init_int;     /* indirect reference to function */
            ndx = get_varndx(varname);
            pi = iswhite(pi);
            e_pos = pi;
            /* --- now get assignment value --- */
            Match('=');
            iv_stack[ndx] = (int) rdp_main();
        }
    }
    /*-------------------------------*/

Copy this to Variable.c. Notice how three extra lines of code are added under each variable type. That's okay, because when we're done we'll cut out about 90 lines of redundant code. Now, delete these four functions from Variable.c:

    int get_intndx();
    int get_dblndx();
    int get_lngndx();
    int get_fltndx();

Here is the new function that will replace them:

    int get_varndx(char *name)
    {   char varname[VAR_NAME];
        int ndx=0, vflag=0, vi_pos=0;

        strcpy(varname, name);
        while((ndx < max_vars) && (strcmp(nam_stack[ndx], varname) != 0))
        {
            if(vflag == 0)
            {
                if(nam_stack[ndx][0] == '\0')   /* found a null */
                {
                    vi_pos = ndx;               /* mark this array index */
                    vflag = 1;                  /* set the exit flag to true */
                }
            }
            ndx++;                              /* increment the index */
        }
        if(ndx == max_vars)         /* did we reach the end of the stack */
        {
            ndx = vi_pos;           /* next available stack location */
            if(vflag == 0)
            {
                (*init_fn)();       /* initialize a new variable */
                ndx = max_vars;
                ndx--;
            }
            strcpy(nam_stack[ndx], varname);
        }
        return ndx;
    }
    /*-------------------------------*/

Copy get_varndx() to Variable.c. Here is a new listing for function get_varvalue() :

    double get_varvalue()
    {   char ch, varname[VAR_NAME];
        int pi, ndx=0, ab_code=13, x=line_ndx;
        double value;

        /* --- get varname --- */
        strcpy(varname, get_varname());
        /* --- get var type --- */
        pi = e_pos;
        ch = p_string[pi];
        var_type = ch;
        /* --- now compare to var type array --- */
        if(ch == '#')
        {
            nam_stack = dn_stack;   /* indirect reference to name_stack */
            max_vars = dmax_vars;
            _GetChar();             /* increment character pointer */
        }
        else if(ch == '!')
        {
            nam_stack = fn_stack;   /* indirect reference to name_stack */
            max_vars = fmax_vars;
            _GetChar();             /* increment character pointer */
        }
        else if(ch == '%')
        {
            nam_stack = ln_stack;   /* indirect reference to name_stack */
            max_vars = lmax_vars;
            _GetChar();             /* increment character pointer */
        }
        else
        {
            nam_stack = in_stack;   /* indirect reference to name_stack */
            max_vars = imax_vars;
        }
        while((ndx < max_vars) && (strcmp(nam_stack[ndx], varname) != 0))
        {
            ndx++;                  /* find varname in stack */
        }
        if(ndx == max_vars)         /* error: did not find it */
        {
            a_bort(ab_code, x);
        }
        if(ch == '#')
        {
            value = dv_stack[ndx];
        }
        else if(ch == '!')
        {
            value = (double) fv_stack[ndx];
        }
        else if(ch == '%')
        {
            value = (double) lv_stack[ndx];
        }
        else
        {
            value = (double) iv_stack[ndx];
        }
        return value;
    }
    /*-------------------------------*/

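Copy this code to Variable.c. Now we need to make a few adjustments to our variable initialization functions. Each init function grows its stacks and bumps its counter, so the generic copies have to be refreshed before get_varndx() uses them again. For each of the four functions listed below, add the two lines to the bottom of the function, as I've shown here: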

    void init_int()
    [snip]...
        }
        nam_stack = in_stack;
        max_vars = imax_vars;
    }
    /*---------- end init_int -----------*/

    void init_lng()
    [snip]...
        }
        nam_stack = ln_stack;
        max_vars = lmax_vars;
    }
    /*---------- end init_lng -----------*/

    void init_dbl()
    [snip]...
        }
        nam_stack = dn_stack;
        max_vars = dmax_vars;
    }
    /*---------- end init_dbl -----------*/

    void init_flt()
    [snip]...
        }
        nam_stack = fn_stack;
        max_vars = fmax_vars;
    }
    /*---------- end init_flt -----------*/

Make note that each pair of lines is unique to each function. Don't copy one pair of lines to all four functions. After you make these changes, save Variable.c and close it. Now open Prototyp.h. Delete those four functions from the "Variable.c" section and add the new one. It should now read as follows:

    /* ----- function prototypes ----- */
    [snip]...
    /* Variable.c */
    void parse_let(void);
    double get_varvalue(void);
    char *get_varname(void);
    int get_varndx(char *);
    void init_int(void);
    void init_dbl(void);
    void init_lng(void);
    void init_flt(void);
    void clr_vars(void);
    void clr_int(void);
    void clr_dbl(void);
    void clr_lng(void);
    void clr_flt(void);
    [snip]...


Save Prototyp.h and close it. Re-compile Bxbasic.c. If there were any compiler errors, double check your source code. Run this Test.bas to make sure it's running just as it did before:

    ' test.bas version 5.3
    Start1:
        CLS
        PRINT "hello world!"
    ' ------------------------------------------ double float
        LET xylophone# = 50.3
        LET yazoo# = 101.25
        LET abc = yazoo#/xylophone#
        xyz = yazoo#/10
        LOCATE abc , xyz
        PRINT "hello world!"
    ' ------------------------------------------ long integers
        quasar% = 2
        zapp% = 4
        abc% = (quasar% * quasar% * zapp% + zapp%)/5
        xyz% = ((quasar% * quasar%) * zapp%) + zapp%
        LOCATE abc%, xyz%
        PRINT "hello world!"
    ' ------------------------------------------
        PRINT: PRINT " 2*(3+4)*5/10 = ";
        abc = 2*(3+4)*5/10
        PRINT abc
    ' ------------------------------------------
        abc = 11100
        xyz = 32000
        abc% = 33000
        xyz% = 99000
        abc! = 33000.33
        xyz! = 99000.47
        abc# = 333000.33
        xyz# = 999000.47
    ' integers
        PRINT: PRINT "abc=";
        PRINT abc
        PRINT "xyz=";
        PRINT xyz
    ' long integers
        PRINT "abc%=";
        PRINT abc%
        PRINT "xyz%=";
        PRINT xyz%
    ' float
        PRINT "abc!=";
        PRINT abc!
        PRINT "xyz!=";
        PRINT xyz!
    ' double
        PRINT "abc#=";
        PRINT abc#
        PRINT "xyz#=";
        PRINT xyz#
    ' ------------------------------------------
    TheEnd:
        CLEAR
        END
    ' ------------------------------------------

There should be no noticeable difference in the way it processes any of the four variable types. The source is just more compact now.
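As a point of comparison, the same indirection can be expressed by handing the lookup a pointer to one small structure instead of setting three globals first. The following is only a sketch under assumptions: VarTable, table_grow() and table_index() are hypothetical names that do not appear in Bxbasic, the value of VAR_NAME is a guess, and only the name stack is handled (the value stacks would still need to grow alongside it).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define VAR_NAME 33     /* assumed size; the real value is in the project headers */

/* Hypothetical bundle of what the lookup needs for one variable type. */
typedef struct {
    char **names;       /* name stack */
    int    count;       /* number of slots in the stack */
} VarTable;

/* Append one empty slot; returns its index, or -1 if memory runs out. */
static int table_grow(VarTable *t)
{
    char **grown = realloc(t->names, (size_t)(t->count + 1) * sizeof *grown);
    char *slot;

    if (grown == NULL)
        return -1;
    t->names = grown;
    slot = malloc(VAR_NAME);
    if (slot == NULL)
        return -1;
    slot[0] = '\0';
    t->names[t->count] = slot;
    return t->count++;
}

/* Find name; otherwise reuse the first empty slot, or grow the table. */
int table_index(VarTable *t, const char *name)
{
    int ndx, free_slot = -1;

    assert(t != NULL && name != NULL && strlen(name) < VAR_NAME);
    for (ndx = 0; ndx < t->count; ndx++) {
        if (strcmp(t->names[ndx], name) == 0)
            return ndx;                     /* already known */
        if (free_slot < 0 && t->names[ndx][0] == '\0')
            free_slot = ndx;                /* remember first empty slot */
    }
    if (free_slot < 0)
        free_slot = table_grow(t);
    if (free_slot >= 0)
        strcpy(t->names[free_slot], name);
    return free_slot;
}
```

A caller would keep one VarTable per type, for example `VarTable longs = { NULL, 0 };`, and call `table_index(&longs, varname)` in the "%" branch. Because the table travels with the call, nothing has to be refreshed after the stack is reallocated.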

PRINTING NUMBERS
Take a look at the code segment below from get_prnvar(); [snip]... if(strchr("#!", var_type)) { if(ch == ',') printf("%.2f\t", value); else if(ch == ';') printf("%.2f", value); else printf("%.2f\n", value); } else if(var_type == '%') { if(ch == ',') printf("%ld\t", (long) value); else if(ch == ';') printf("%ld", (long) value); else printf("%ld\n", (long) value); [snip]...

Notice the part that displays the values for doubles and singles (#!). The printf() statements read:

    printf("%.2f", value);

minus the "\t" and "\n". Run the last Test.bas and look at the output. See how two digits are displayed past the decimal point:

    abc!=33000.33
    xyz!=99000.47
    abc#=333000.33
    xyz#=999000.47

That is because of this statement: "%.2f", in the above printf command. What we've done here is tell printf() that regardless of how many digits follow the decimal point, only print two digits. The problem with that is that in many cases there will be more than two digits. We can solve that by eliminating the ".2" so that it reads: "%f". Let's do that now. Open Output.c and change:

    printf("%.2f\t", value);
    printf("%.2f", value);
    printf("%.2f\n", value);

so that they read:

    printf("%f\t", value);
    printf("%f", value);
    printf("%f\n", value);

Now, save Output.c. Re-compile Bxbasic.c. Run Test.bas again. There, that's better,... well,... sort of....

    abc!=33000.328125
    xyz!=99000.468750
    abc#=333000.330000
    xyz#=999000.470000
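The extra digits on the two single precision lines come from the storage type, not from printf(). As a minimal sketch (through_float() and show_both() are hypothetical helper names, and the figures assume the usual 32-bit IEEE float), the round trip a value makes through a float can be reproduced on its own:

```c
#include <assert.h>
#include <stdio.h>

/* Store a double in a float and widen it again, which is the same
   round trip a value makes when it is kept in a float stack. */
double through_float(double value)
{
    float narrowed = (float) value;     /* roughly 7 significant digits */
    return (double) narrowed;
}

/* Print the same number as a double and after the float round trip. */
void show_both(double value)
{
    double narrowed = through_float(value);

    /* widening back to double does not change the stored float */
    assert((float) narrowed == (float) value);
    printf("double: %f\n", value);
    printf("float : %f\n", narrowed);
}
```

Calling `show_both(33000.33)` prints the value once as typed and once as the nearest value a float can hold, which is where 33000.328125 comes from.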
Now we have a new problem. It's printing not only all the digits, but always six decimal places, padded out with zeros. That's not good! Well, we could probably ponder this problem for hours and still not come up with a suitable solution, but let's give it a try. Let's list the issues:
• first, can we get rid of all those trailing zeros,
• single precision values don't use enough bytes to store the decimal part accurately, so we end up with a long, drawn out, decimal value,
• can we round-up single precision values, to perhaps two decimal places.
Yes! But not in a numerical sense. That is, unless we use the "%.2f" or something similar, we can't manipulate how the numbers are displayed. Unless, that is, we convert the "value" to a character string. i.e.:

    assignment:               abc#=333000.33
    print abc# displays:      333000.330000

We start by converting abc# into a character string:

    (string)=(abc#)

Then, snip off the trailing zeros:

    string = 333000.330000
                      ^^^^ snip

and then print string:

    print string displays:    333000.33
Okay, that will take care of trailing zeros, but, now how about rounding up those single precision values to two decimal places ? How do you round-up a character string ? Well, believe it or not, that's not really all that difficult to do, it's just a little more of a drawn out process. Let's say our "single" (!) is assigned a value equal to = 33000.33, and we do the same as we did for abc#, we convert it to a character string:

(string)=(abc!)
now "string" contains:

    33000.328125

Now, we know that string has six decimal digits, but we don't know how long string is. For all we know, string could equal 1.123456 or 123456.123456. So we get the length of string:

    len = strlen(string);          (len = 12)

    3 3 0 0 0 . 3 2 8 1  2  5
    1 2 3 4 5 6 7 8 9 10 11 12

We know that the decimal point is going to be 6 characters less than "len". So, "len" is 12 and the decimal (or "dot") is at 6. Now recall, we want to round up, leaving two decimal places, so let's add 2 to "dot", so that it's pointing to the 2 (.32):

    3 3 0 0 0 . 3 2 8 1  2  5
    1 2 3 4 5 6 7 8 9 10 11 12
                  ^         ^
                  dot       len

Now all we have to do is:
• while "len" > "dot":
    • if character[len] is 5 or greater,
        • increment character[len-1],
        • delete character[len],
        • decrement "len", loop
    • else
        • delete character[len],
        • decrement "len", loop
Here is the result in string, after each pass:


    start:    33000.328125
    pass-1:   33000.32813
    pass-2:   33000.3281
    pass-3:   33000.328
    pass-4:   33000.33
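The passes above can be sketched as a small stand-alone function. This is an illustration only, not the listing used in Output.c: trim_decimals() and its keep parameter are hypothetical, it locates the decimal point with strchr() rather than counting back from the length, and it assumes keep is at least 1. When the digit to its left is a "9" it simply drops the last character without carrying.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Shorten a "%f"-style string to `keep` decimal places (keep >= 1),
   working from the last character back toward the decimal point. */
void trim_decimals(char *s, int keep)
{
    size_t len = strlen(s);
    char *dot = strchr(s, '.');
    size_t last;

    assert(dot != NULL && keep >= 1);
    last = (size_t)(dot - s) + (size_t) keep;   /* index of last digit kept */
    while (len - 1 > last) {
        size_t i = len - 1;
        /* 5 or greater: bump the digit to the left, unless it is a 9 */
        if (s[i] >= '5' && s[i - 1] != '9' && s[i - 1] != '.')
            s[i - 1]++;
        s[i] = '\0';                            /* delete the last character */
        len--;
    }
}
```

With `char s[] = "33000.328125"; trim_decimals(s, 2);` the string goes through the same four passes and ends as "33000.33". Because every pass can carry into the next digit, the result is not always what printf("%.2f") would give: "0.144500" ends up as "0.15" here, while rounding the number itself to two places gives 0.14.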

As long as character[len-1] isn't a "9" this works okay, but, even if it is, we can take care of that too. Okay, now it's time to make these changes. Open Output.c. Here is the new code for get_prnvar() :

void get_prnvar() { char ch, *val_strng; int pi; double value; pi = e_pos; pi = iswhite(pi); e_pos = pi; value = get_varvalue(); pi = e_pos; pi = iswhite(pi); ch = p_string[pi]; /* --- convert value to string$ --- */ val_strng = value2strng(value); if(ch == ',') printf("%s \t", val_strng); else if(ch == ';') printf("%s ", val_strng); else printf("%s \n", val_strng); } /*-------------------------------*/

I admit, get_prnvar() got a little smaller, that's because our new function; value2strng() is going to make up the difference. Notice the call to value2strng() returns a character string pointer. Copy this to Output.c. Here is the new code for value2strng():

    char *value2strng(double value)
    {   static char buffer[81];
        int pi, len, idx=0, dot=0;
        char ch, chx=var_type;

        /* --- convert to ascii, here --- */
        sprintf(buffer, "% f", value);
        len = strlen(buffer);
        ch = buffer[idx];
        /* --- trim trailing zeros --- */
        if(chx == '#')                      /* Double: '#' */
        {
            idx = (len-1);
            ch = buffer[idx];
            while(ch == '0')
            {
                idx--;
                ch = buffer[idx];
            }
            buffer[(idx+1)] = '\0';
        }
        /* --- round up to .nn --- */
        else if(chx == '!')                 /* Single: '!' */
        {
            dot = (len-5);
            idx = (len-1);
            while(idx > dot)
            {
                if((buffer[idx] >= '5') && (buffer[(idx-1)] == '9'))
                {
                    buffer[idx] = '\0';
                }
                else if(buffer[idx] >= '5')
                {
                    buffer[(idx-1)]++;
                    buffer[idx] = '\0';
                }
                else
                {
                    buffer[idx] = '\0';
                }
                idx--;
            }
        }
        /* --- trim to dot --- */
        else                                /* Integers: '%' */
        {
            idx = (len-7);
            buffer[idx] = '\0';
        }
        return buffer;
    }
    /*------ end value2strng -------*/

Copy value2strng() to Output.c. Then save and close it. Open Prototyp.h. We need to add the Prototype for value2strng():


/* ----- function prototypes ----- */ [snip]... /* Output.c */ void beep(void); void cls(void); void get_prnstring(void); void get_prnvar(void); void locate(void); char *value2strng(double); [snip]...

Now save Prototyp.h and close it. Re-compile Bxbasic.c. And then run Test.bas again.

    abc!= 33000.33
    xyz!= 99000.47
    abc#= 333000.33
    xyz#= 999000.47

You might have noticed that the numbers are shifted over one space on the display. That is one of Basic's Standards: it reserves a space for a minus sign, which is what the blank in the "% f" format gives us. So, now we have done what we set out to do. We eliminated trailing zeros and rounded single precision values to two decimal places.
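The reserved sign column and the zero trimming can be tried in isolation. The sketch below is not part of Bxbasic: format_trimmed() is a hypothetical helper, it uses snprintf() (C99) with a caller-supplied buffer instead of a static one, and unlike the listing above it also drops a decimal point that is left with nothing after it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format value with a reserved sign column and no trailing zeros.
   Returns buf; buf is left empty if it is too small for the result. */
char *format_trimmed(char *buf, size_t size, double value)
{
    int n;
    size_t i;

    assert(buf != NULL && size > 0);
    n = snprintf(buf, size, "% f", value);  /* ' ' flag: blank or '-' first */
    if (n < 0 || (size_t) n >= size) {
        buf[0] = '\0';
        return buf;
    }
    i = strlen(buf);
    while (i > 0 && buf[i - 1] == '0')      /* drop trailing zeros */
        i--;
    if (i > 0 && buf[i - 1] == '.')         /* drop a bare decimal point */
        i--;
    buf[i] = '\0';
    return buf;
}
```

For a positive number the first character is a blank, and for a negative number it is the minus sign, so columns of mixed values stay lined up.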

ENGINE-COMPILER
You guessed it. It's time to do an update on our engine and compiler. But, in addition to the obvious matter of incorporating the new variable types into the engine and compiler, I'd like to also take care of a matter of the "offset". You know, that nasty little number we have to add to the engine's input routine, so that it knows where the engine ends and the byte code starts. What I'd like to do is to make Bxcomp.exe automatically fetch the engine's file length and then embed that number, (the offset), into the executable file itself. That way, we don't have to keep doing all that housekeeping that we've been doing:

compile
get offset
add offset
recompile
Where to embed the offset ? Well, I think the most obvious and perhaps the safest place is at the tail end of the executable file.


Example:

[Diagram: file layout, left to right: engine | nrows | byte code | offset; markers: 0, file length]

From the perspective of Bxcomp.exe, it already knows the length of engine.exe. After writing the byte code, it just tacks on the offset. From the perspective of Test.exe, it only needs to move the file pointer to the end of the file:

[Diagram: file layout, left to right: engine | nrows | byte code | offset; markers: 0, file length; file pointer at the end of the file]
then, back up to the beginning of the offset:

[Diagram: file layout, left to right: engine | nrows | byte code | offset; markers: 0, file length; file pointer backed up to the beginning of the offset]
and read-in the offset value and proceed from there. It's that simple !
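The seek-to-the-end, back-up, read sequence can be sketched on its own. This is a hedged illustration rather than the Bxcomp.c or Enginput.c code: write_offset(), read_offset() and OFFSET_WIDTH are hypothetical names, the offset is stored as a fixed-width field so the reader knows exactly how far to back up, and the file is assumed to be open in binary mode so that a newline is a single byte.

```c
#include <stdio.h>
#include <stdlib.h>

#define OFFSET_WIDTH 10   /* assumed fixed field width, in digits */

/* Append the offset as a zero-padded decimal field plus a newline.
   Returns 0 on success, -1 on a write error. */
int write_offset(FILE *f_out, long offset)
{
    return fprintf(f_out, "%0*ld\n", OFFSET_WIDTH, offset) > 0 ? 0 : -1;
}

/* Move to the end of the file, back up over the trailer and parse it.
   Returns the stored offset, or -1 on failure. */
long read_offset(FILE *f_in)
{
    char field[OFFSET_WIDTH + 2];          /* digits + newline + '\0' */

    if (fseek(f_in, -(long)(OFFSET_WIDTH + 1), SEEK_END) != 0)
        return -1L;
    if (fgets(field, sizeof field, f_in) == NULL)
        return -1L;
    return atol(field);
}
```

With a fixed width there is no need to guess how many digits the offset has. Note that standard C does not strictly promise SEEK_END on binary streams, although the common desktop compilers support it.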

Here is a code snippet from Bxcomp.c that illustrates fetching the offset and then writing it:

[snip]...
    /* --- get engine file-length offset --- */
    offset = filelength(fileno(f_in));
    printf("Offset=%ld\n", offset);
    while(! feof(f_in))
[snip]...
[snip]...
    /* --- write byte code --- */
[snip]...
    /* --- write offset --- */
    fprintf(f_out,"%ld\n", offset);         /* write offset */
}
/*-------------------------------*/
[snip]...

And, here is a snippet from Enginput.c that illustrates retrieving the offset:


[snip]...
    /* --- get offset from end-of-file --- */
    f_len = filelength(fileno(f_in));
    offset = (f_len-7);
    fseek(f_in, offset, SEEK_SET);
    fgets(p_string, buffr, f_in);
[snip]...
    offset = atol(p_string);
[snip]...

With that said, let's add the code. Open Bxcomp.c. Begin by adding global variable "offset" to the "global vars" section:

/* ------ global vars ------------ */
[snip]...
long offset;                /* offset:engine file length */

Now, copy these new versions of functions merge_source() and write_src() to Bxcomp.c:

void merge_source()
{
    char ch, dot='.', *destin, source[20];
    int ii, data, ab_code=3, x=0;
    unsigned size=PATH;

    destin = malloc(size * sizeof(char));
    strcpy(destin, s_holder);               /* copy source file name */
    ii = 0;
    ch = '\0';
    while(ch != dot)
    {
        ch = s_holder[ii];                  /* make source.bas = source. */
        ii++;
    }
    destin[ii] = '\0';
    strcat(destin, "exe");                  /* append "exe" to filename */

    /* --- open destination file (write-binary) --- */
    f_out = fopen(destin,"wb");
    printf("Destination file: %s\n",destin);

    /* --- read-in scripting engine (read-binary) --- */
    strcpy(source, "engine.exe");
    f_in = fopen(source, "rb");
    if(f_in == NULL)
    {
        a_bort(ab_code,x);
    }
    printf("Source file: %s\n",source);

    /* --- get engine file-length offset --- */
    offset = filelength(fileno(f_in));
    printf("Offset=%ld\n", offset);
    while(! feof(f_in))
    {
        data = fgetc(f_in);                 /* data = incoming stream */
        if(! feof(f_in))
        {
            fputc(data, f_out);             /* write to destination file */
        }
    }
    fclose(f_in);                           /* done copying engine */

    /* --- store byte code --- */
    printf("Source file:%s\n",s_holder);
    write_src();
    fclose(f_out);
    printf("Program lines=%d\nDone!\n",nrows);
    free(destin);
}
/*------ end merge_source -------*/

void write_src()
{
    char *tmp="source.tmp";
    int ii;

    /* --- store nrows --- */
    fprintf(f_out,"%d\n", nrows);               /* write nrows */

    /* --- write byte code --- */
    for(ii=0; ii < nrows; ii++)
    {
        fprintf(f_out,"%s\n", label_nam[ii]);   /* write block label */
    }
    for(ii=0; ii < nrows; ii++)
    {
        fprintf(f_out,"%d\n", byte_array[ii]);  /* write byte_code */
    }
    for(ii=0; ii < nrows; ii++)
    {
        fprintf(f_out,"%s", array1[ii]);        /* write statement */
    }
    /* --- write offset --- */
    fprintf(f_out,"%ld\n", offset);             /* write offset */
}
/*-------------------------------*/

Save Bxcomp.c and close it.


Now open Enginput.c. Here is a replacement for the upper portion of load_bas1(), up to the /* --- get nrows --- */ part, (the sections from /* --- create program arrays --- */ on down remain the same):

void load_bas1()
{
    char ch, *tmp=s_holder;
    int ii, len, pi;
    unsigned size, buffr=10;
    long offset, f_len;         /* offset: Engine.exe file length */

    /* --- open file --- */
    f_in = fopen(tmp,"r");      /* source file is now: "filename.exe" */

    /* --- get offset from end-of-file --- */
    f_len = filelength(fileno(f_in));
    offset = (f_len-7);
    fseek(f_in, offset, SEEK_SET);
    fgets(p_string, buffr, f_in);
    while((p_string[0] == '\n') && (offset < f_len))
    {
        offset++;
        fseek(f_in, offset, SEEK_SET);
        fgets(p_string, buffr, f_in);
    }
    offset = atol(p_string);

    /* --- get nrows --- */
    fseek(f_in, offset, SEEK_SET);
    fgets(p_string, buffr, f_in);
    nrows = atoi(p_string);

    /* --- create program arrays --- */
[snip]...

Save Enginput.c and close it. Now open Engine.c. We need to add the declarations for the new variable types to Engine.c, in the "global vars" section. Here's what the variable types portion should look like:

/* ------ global vars ------------ */
[snip]...
char var_type;              /* current variable type            */
/**/
int *iv_stack;              /* stack:integer variable values    */
char **in_stack;            /* stack:integer variable names     */
int imax_vars=0;            /* stack:integer variable counter   */
/**/
double *dv_stack;           /* stack:double float values        */
char **dn_stack;            /* stack:double float names         */
int dmax_vars=0;            /* stack:double float counter       */
/**/
long *lv_stack;             /* stack:long variable values       */
char **ln_stack;            /* stack:long variable names        */
int lmax_vars=0;            /* stack:long variable counter      */
/**/
float *fv_stack;            /* stack:float variable values      */
char **fn_stack;            /* stack:float variable names       */
int fmax_vars=0;            /* stack:float variable counter     */
/**/
char **nam_stack;           /* generic pointer to name stacks   */
int max_vars;               /* variable for vars count          */
void (*init_fn)();          /* generic function pointer         */
/**/
[snip]...

Copy the above to Engine.c, then save and close it. The next step is to compile what we have. Begin by compiling Bxcomp.c. Next, compile Engine.c. Hopefully, again, there were no compiler errors reported. Now, using Bxcomp.exe, compile Test.bas. Enter:

BXCOMP TEST

Bxcomp.exe will report something like this:

Bxbasic Compiler
Destination file: test.exe
Source file: engine.exe
Offset=40656
Source file:test.bas
Program lines=47
Done!
Before executing Test.exe, use the "List.com" utility to examine Test.exe. Enter: LIST TEST.EXE then press: w press: b

and there you have it. The very last line is the "Offset". Exit "List.com"; press: x


Now enter: TEST

Voilà!

CONCLUSION
We've covered everything I'd intended to in this installment, except for adding variable type Character String. That will have to wait for the next chapter. We now have a Byte-code Engine that handles four different, numeric data types and a Compiler that embeds the byte code offset within the executable itself.


CHAPTER - 6
INTRODUCTION
Welcome back. In the last chapter we added more numeric variable types and enhanced performance and functionality by a little bit. For the purposes of controlling how numbers are displayed, numeric values are now converted to a character string. We also made an improvement to the way the compiler produces the executable by generating and embedding the offset value. In this chapter, I'd like to continue with variables and start out by adding character string variables. Since we already know how to generate variables and many of the components are already in place for those purposes, all we should need to do is to create the character string specific portions.

CHARACTER STRINGS
To start with, we need the ability to assign a character string, whether it is a single character, a string of characters, or even a null (empty) string, to a character variable. Example:

astring = "this is an example"
Typically, a string is enclosed within a pair of double quotes, as shown in the example. However, just as numeric variables can be manipulated in numerous ways, character strings can also be manipulated in as many or more ways. String variables can be truncated (shortened), concatenated (added to), characters removed, characters inserted, assigned the contents of other strings and even accept characters from the keyboard. What we will do is take this just one step at a time, start small and work up to the more glamorous features of string assignments and manipulations. To start with, let's begin with the example shown above. We first need the basic ability to say:

astring = "this is an example"
Some languages, which are considered to be loosely typed, allow any variable to be of any given type, be it numeric or character, at any given time. These variables are referred to as Type Variant. Example:

DIM my_variable as integer
my_variable = 12345
DIM my_variable as string
my_variable = "this is a string"

Or

my_variable = 12345
…
my_variable = "this is a string"

Both statements within the same program could be valid. The reason for this has a lot to do with how the variables are created and stored. In these instances, the variables are probably not stored using arrays, but most probably in a combination of structures, unions and linked lists. In the above example the context of the variable, how it's being used, determines whether or not a variable is numeric or character in nature. This is really an interesting idea and it provides a lot of freedom on the part of the programmer, in that a variable's type can change "on the fly".

**(My personal feeling is that use of variant type variables is a poor programming practice and potentially leads to many hard to find programming bugs.)

Do note that the programmer also has to keep closer track of their variable types and how they are being used, because there is no obvious distinction between the two examples. Besides C, JavaScript and Perl are just a few languages I'm familiar with that do not use a variable type symbol. When you are creating and defining your own language specifications, you get to make these kinds of decisions. We will deal with more of that a little later on.

On a side note, I have always thought that when dealing with type-cast variables, it would be of greater benefit to know the variable type beforehand, rather than after the fact, i.e.: if the type symbol were provided at the beginning of the variable name instead of after the variable name.

Just as our other variable types utilize a symbol (or the lack of one) to distinguish one data type from the other, the character string commonly uses the "$" symbol to specify the data type. For our purposes we will use the "$" symbol to indicate a string variable. We already have the ability to print a double quoted character string. The code needed to perform that function already exists:

[snip]... while(ch != quote) { xstring[si] = ch; pi++; si++; ch = p_string[pi]; } xstring[si]='\0'; pi++; pi = iswhite(pi); ch = p_string[pi]; if(ch == ',') printf("%s\t", xstring); else if(ch == ';') printf("%s", xstring); else printf("%s\n", xstring); [snip]...

All we need to do now is to cut-n-paste the segments we need to extract the string and then make the assignment by storing the name and string content in the arrays.

STRING VARIABLES
We will begin by adding the array declarations to Bxbasic.c, in the "global vars" section. Open bxbasic.c. And, copy this portion:


/* ------ global vars ------------ */
[snip]...
/**/
char **sv_stack;            /* stack:string variable array      */
char **sn_stack;            /* stack:string variable names      */
int smax_vars=0;            /* stack:string variable counter    */
/**/
[snip]...

Save Bxbasic.c and close it. We need to inform the code parsing routines in Input.c that we are adding a new variable type. Functions tmp_byte() and get_byte() need the "$" symbol added. Open Input.c. And, modify tmp_byte(), (in this section shown below,) to read as follows:

void tmp_byte(int ii)
[snip]...
    else if(isalpha(ch))                /* a possible assignment */
    {
        si = pi;                        /* save pointer position */
        while(isalnum(ch))
        {
            pi++;
            ch = p_string[pi];
        }
        pi = iswhite(pi);
        ch = p_string[pi];
        if(strchr("=#%!$", ch))
        {
            byte = 1;                   /* a variable assignment */
            pi = si;
        }
        else
        {
            a_bort(ab_code, x);         /* not an assignment */
        }
    }
[snip]...

Now do the same for this segment of get_byte():

int get_byte(int ii)
[snip]...
    else
    {
        pi = iswhite(pi);
        ch = p_string[pi];
        if(strchr("=#%!$", ch))
        {
            byte = 1;                   /* a variable assignment */
            pi = e_pos;                 /* push pointer back */
        }
        else
        {
            a_bort(ab_code, x);         /* not a keyword or variable */
        }
    }
[snip]...

Make these changes. Then save and close Input.c. Next, we need to make a provision within function parse_let(), in file Variable.c, for handling type "$" variables. Open Variable.c. And, make this addition to parse_let():

void parse_let()
[snip]...
    /* --- we now have varname and type --- */
    /* --- is this a character string --- */
    if(ch == '$')
    {
        nam_stack = sn_stack;           /* indirect reference to name_stack */
        max_vars = smax_vars;
        init_fn = init_str;             /* indirect reference to function */
        ndx = get_varndx(varname);
        pi++;
        pi = iswhite(pi);
        e_pos = pi;
        /* --- now get assignment string --- */
        Match('=');
        strng_assgn(ndx);
    }
    /* --- compare name to double array --- */
    else if(ch == '#')                  /* double sign */
    {
        nam_stack = dn_stack;           /* indirect reference to name_stack */
[snip]...

Pay special attention to this line: else if(ch == '#') /* double sign */

Remember to add the "else" to the beginning of the IF statement. An error will be generated if you overlook this. Now add these three new functions to Variable.c:


void init_str() { int ndx; unsigned size; if(smax_vars == 0) { ndx = smax_vars; smax_vars++; size = smax_vars; sv_stack = malloc(size * sizeof(char *)); sv_stack[ndx] = malloc(1 * sizeof(char)); sn_stack = malloc(size * sizeof(char *)); size = VAR_NAME; sn_stack[ndx] = malloc(size * sizeof(char)); } else { ndx = smax_vars; smax_vars++; size = smax_vars; sv_stack = realloc(sv_stack, size * sizeof(char *)); sv_stack[ndx] = malloc(1 * sizeof(char)); sn_stack = realloc(sn_stack, size * sizeof(char *)); size = VAR_NAME; sn_stack[ndx] = malloc(size * sizeof(char)); } nam_stack = sn_stack; max_vars = smax_vars; } /*---------- end init_str -----------*/

void clr_str() { int ndx; if(smax_vars > 0) { for(ndx=0; ndx < smax_vars; ndx++) { free(sv_stack[ndx]); free(sn_stack[ndx]); } free(sv_stack); free(sn_stack); smax_vars = 0; } } /*---------- end clr_str -----------*/
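The grow-by-one pattern in init_str() and the clean-up in clr_str() can be looked at apart from the interpreter. The following is a hedged sketch only: StrTable, strtab_add(), strtab_assign(), strtab_clear() and NAME_LEN are hypothetical names standing in for the global stacks and VAR_NAME, and the sketch relies on the standard C rule that realloc() on a NULL pointer behaves like malloc(), which removes the need for a separate first-time branch.

```c
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 33            /* assumed stand-in for VAR_NAME */

typedef struct {
    char **values;             /* plays the role of sv_stack */
    char **names;              /* plays the role of sn_stack */
    int    count;              /* plays the role of smax_vars */
} StrTable;

/* Add an empty slot for a new variable; returns its index, or -1. */
int strtab_add(StrTable *t, const char *name)
{
    int ndx = t->count;
    char **v, **n;

    /* realloc(NULL, size) acts like malloc(size) */
    v = realloc(t->values, (ndx + 1) * sizeof *v);
    if (v == NULL) return -1;
    t->values = v;
    n = realloc(t->names, (ndx + 1) * sizeof *n);
    if (n == NULL) return -1;
    t->names = n;

    t->values[ndx] = malloc(1);
    t->names[ndx]  = malloc(NAME_LEN);
    if (t->values[ndx] == NULL || t->names[ndx] == NULL) {
        free(t->values[ndx]);
        free(t->names[ndx]);
        return -1;
    }
    t->values[ndx][0] = '\0';                  /* start as a null string */
    strncpy(t->names[ndx], name, NAME_LEN - 1);
    t->names[ndx][NAME_LEN - 1] = '\0';
    t->count++;
    return ndx;
}

/* Replace the stored string, resizing the slot to fit; 0 on success. */
int strtab_assign(StrTable *t, int ndx, const char *text)
{
    char *p = realloc(t->values[ndx], strlen(text) + 1);
    if (p == NULL) return -1;
    strcpy(p, text);
    t->values[ndx] = p;
    return 0;
}

/* Release everything, in the spirit of clr_str(). */
void strtab_clear(StrTable *t)
{
    int i;
    for (i = 0; i < t->count; i++) {
        free(t->values[i]);
        free(t->names[i]);
    }
    free(t->values);
    free(t->names);
    t->values = NULL;          /* needed so the next realloc() starts fresh */
    t->names = NULL;
    t->count = 0;
}
```

A table is started with StrTable t = {NULL, NULL, 0}; after which strtab_add() hands back index 0, 1, 2 and so on. Resetting the pointers to NULL in strtab_clear() is what makes the single realloc() path safe after a CLEAR.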


void strng_assgn(int ndx)
{
    char ch, quote='\"';
    int pi, stlen, si=0, ab_code=6, x=line_ndx;
    unsigned size;

    stlen = strlen(p_string);
    pi = e_pos;                         /* plant "pi" with first quote */
    pi++;
    ch = p_string[pi];
    /* --- we now have first character --- */
    /* --- fill buffer with string --- */
    si = 0;
    while((ch != quote) && (pi < stlen))
    {
        s_holder[si] = ch;
        si++;
        pi++;
        ch = p_string[pi];
    }
    s_holder[si] = '\0';
    if(pi == stlen)                     /* error:if at end of line */
    {
        a_bort(ab_code,x);
    }
    /* --- copy buffer to string_stack --- */
    size = strlen(s_holder);
    size++;
    sv_stack[ndx] = realloc(sv_stack[ndx], size * sizeof(char));
    strcpy(sv_stack[ndx], s_holder);
    printf(">%s<\n",sv_stack[ndx]);     /* temporary */
}
/*------ end strng_assgn -------*/

Note the "temporary" print statement at the end. That is so that we can test this new feature, since we don't yet have the means of displaying string variables. Copy these three functions to Variable.c. We need to add a "clear string vars" to our CLEAR utility. Modify clr_vars() as follows:

void clr_vars() { clr_int(); clr_dbl(); clr_lng(); clr_flt(); clr_str(); } /*-------------------------------*/

Save and close Variable.c. Now, we need to update the prototypes for variables in file Prototyp.h. Open Prototyp.h.


And, make these changes to the Variable.c section:

/* ----- function prototypes ----- */
[snip]...
/* Variable.c */
void parse_let(void);
double get_varvalue(void);
char *get_varname(void);
int get_varndx(char *);
void strng_assgn(int);

void init_int(void);
void init_dbl(void);
void init_lng(void);
void init_flt(void);
void init_str(void);

void clr_vars(void);
void clr_int(void);
void clr_dbl(void);
void clr_lng(void);
void clr_flt(void);
void clr_str(void);
[snip]...

Save Prototyp.h and close it. Now compile Bxbasic.c. Now run it using this shortened version of Test.bas:

' test.bas version 6.1
Start1:
CLS
PRINT "hello world!"
abc$ = "test"
xyz$ = ""
END

The output should have been something like this:

hello world!
>test<
><
The third line printed was a null string (empty), as we assigned:

xyz$ = ""
Remember to delete that temporary print statement in strng_assgn(). We won't need it anymore.
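The quote-scanning step inside strng_assgn() can also be isolated and tested by itself. This is a minimal sketch under stated assumptions: extract_quoted() is a hypothetical helper, not a Bxbasic function, and it takes the line, the position of the opening quote and an output buffer as parameters instead of using the globals p_string, e_pos and s_holder. It adds a bounds check on the output buffer that the listing above does not have.

```c
#include <stddef.h>

/* Hypothetical helper: copy the text between the quote at line[start]
   and its closing quote into out[]. Returns the index just past the
   closing quote, or -1 if line[start] is not a quote, the closing
   quote is missing, or the text does not fit in out[]. */
int extract_quoted(const char *line, int start, char *out, size_t outsize)
{
    int pi = start;
    size_t si = 0;

    if (outsize == 0 || line[pi] != '"')
        return -1;
    pi++;                                   /* step past opening quote */
    while (line[pi] != '"' && line[pi] != '\0') {
        if (si + 1 >= outsize)
            return -1;                      /* would overflow out[] */
        out[si++] = line[pi++];
    }
    if (line[pi] != '"')
        return -1;                          /* ran off the end of the line */
    out[si] = '\0';
    return pi + 1;
}
```

An empty pair of quotes simply produces a null string, which matches the xyz$ = "" case shown above.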


PRINT STRING VARIABLES
Next we need the ability to display our string variables. Since that's going to involve the "output routines", we first need to look at the existing components in Output.c to determine what we already have and what more we need. Examine this snippet from get_prnstring():

[snip]...
    pi = e_pos;
    ch = p_string[pi];
    if(ch == ':')
    {
        printf("\n");
        return;
    }
    else if(isalpha(ch))                <-----
    {
        get_prnvar();
        return;
    }
[snip]...

At this point, where we reach the second IF statement, we already know we are dealing with a variable.

i.e.:
        abc$
 ch = --^
The only other option would be a quoted string. The problem is that the code assumes it's one of our numeric variables: it makes a call to get_prnvar(), and that function is not compatible with printing character string variables. So, clearly, we first need to establish whether this is a numeric or a string variable and then branch accordingly. The difficulty is that "ch" contains the first character in the variable name. Nothing wrong with that, except that what we want to know is the last character, the "variable type symbol".

i.e.:
        abc$
 ch = -----^

This is one case where knowing the variable type up front would sure make things simpler. What we need to do now is scan ahead and locate the type symbol. That may not seem like any big deal, and it isn't, except that it's a waste of time, or rather "clock cycles". If we call a simple routine to look ahead and determine what the type symbol is, then while we are at it we may as well take the time to extract the variable name, or we have wasted those clock cycles. The only problem with that is that the numeric variable print routines are already laid out. The get name function is called farther down the road. We would have to rearrange the timing of events in that part of the program as well to accommodate this function.

By no means is our engine "etched in stone". This is just an inconvenience that we are going to have to deal with, sooner or later. In these kinds of situations, if you don't recognize these types of issues and deal with them as they come up, you will get a snowball effect: every little detail that stole a hundred clock cycles builds into a thousand clock cycles and then ten thousand clock cycles, until your engine crawls at a snail's pace. This is especially critical when you are building a programming language, with which application programs will be written. A prime example of this would be a programming tool called DBase, made popular in the mid 1980's. Everyone used it, but everyone hated it, because it was soooooo slooooow. We will deal with this and other bullet holes we still have, just not at this moment. For now, let's pretend that this is not the main issue.
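To make the trade-off concrete, here is a hedged sketch of a single scan that does both jobs at once: it copies the variable name and reports the type symbol that follows it, so the characters are only walked over one time. The function scan_variable() and its parameters are hypothetical and are not part of the Bxbasic sources, which work on the globals p_string and e_pos instead.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: starting at line[pos], copy the alphanumeric
   variable name into name[] (truncating if it does not fit), store the
   character that follows the name in *type, and return its position. */
int scan_variable(const char *line, int pos, char *name,
                  size_t namesize, char *type)
{
    size_t n = 0;

    while (isalnum((unsigned char)line[pos])) {
        if (n + 1 < namesize)
            name[n++] = line[pos];      /* keep room for the terminator */
        pos++;
    }
    if (namesize > 0)
        name[n] = '\0';
    *type = line[pos];                  /* '$', '#', '!', '%' or something else */
    return pos;
}
```

For an integer variable, which carries no symbol, *type ends up holding whatever follows the name (a blank, a delimiter or the end of the line), so a caller should only test it against the specific symbols it cares about.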


What we will do, is add a little routine that simply looks ahead and finds out what type of variable we are dealing with. Actually, all we really want to know is, is this a "string" variable or not. If it is, then great, if it isn't, we fall through and proceed as we would have, before. So, the second IF statement should look something like this:

[snip]...
    else if(isalpha(ch))
    {
        type = get_vartype();           <---- get type
        if(type == '$')
        {
            get_strvar();               <---- is a string
        }
        else
        {
            get_prnvar();               <---- not a string
        }
        return;
    }
[snip]...

Do you see the point of the call to get_vartype() and the subsequent if-else that follows ? Open Output.c. Replace the existing version of get_prnstring() with this revised version :

void get_prnstring()
{
    char ch, quote='\"', type;
    int pi, si=0;

    pi = e_pos;
    ch = p_string[pi];
    if(ch == ':')
    {
        printf("\n");
        return;
    }
    else if(isalpha(ch))
    {
        type = get_vartype();
        if(type == '$')
        {
            get_strvar();
        }
        else
        {
            get_prnvar();
        }
        return;
    }
    pi++;
    ch = p_string[pi];
    while(ch != quote)
    {
        xstring[si] = ch;
        pi++;
        si++;
        ch = p_string[pi];
    }
    xstring[si]='\0';
    pi++;
    pi = iswhite(pi);
    ch = p_string[pi];
    if(ch == ',')
        printf("%s\t", xstring);
    else if(ch == ';')
        printf("%s", xstring);
    else
        printf("%s\n", xstring);
}
/*-------------------------------*/

Here is the code for function get_vartype():

char get_vartype() { char ch; int pi; pi = e_pos; ch = p_string[pi]; while(isalnum(ch)) { pi++; ch = p_string[pi]; } return ch; } /*-------------------------------*/

See how simple this is? Copy this to Output.c. Here is the code for get_strvar(). Since there is only one character variable type (string), all we need to do is find the index, then display it. Copy this to Output.c:

void get_strvar()
{
    char ch, varname[VAR_NAME];
    int pi, ndx=0, ab_code=13, x=line_ndx;

    /* --- get varname --- */
    strcpy(varname, get_varname());
    /* --- get stack index --- */
    while((ndx < smax_vars) && (strcmp(sn_stack[ndx], varname) != 0))
    {
        ndx++;                          /* find varname in stack */
    }
    if(ndx == smax_vars)                /* error: did not find it */
    {
        a_bort(ab_code, x);
    }
    pi = e_pos;
    pi++;
    pi = iswhite(pi);
    ch = p_string[pi];
    /* --- display string$ --- */
    if(ch == ',')
        printf("%s\t", sv_stack[ndx]);
    else if(ch == ';')
        printf("%s", sv_stack[ndx]);
    else
        printf("%s\n", sv_stack[ndx]);
}
/*-------------------------------*/

Save Output.c and close it. With the addition of these two new functions, we need to update the prototypes. Revise Prototyp.h to include the new functions under the "Output.c" heading:

/* bxbasic : Prototyp.h : alpha version */ /* ----- function prototypes ----- */ [snip]... /* Output.c */ void beep(void); void cls(void); void get_prnstring(void); void get_prnvar(void); void locate(void); char *value2strng(double); char get_vartype(void); void get_strvar(void); [snip]...

Save Prototyp.h and close it. Remember to delete this line:

printf(">%s<\n",sv_stack[ndx]);     /* temporary */

from the bottom of function strng_assgn() in Variable.c, so that it now reads as follows:


[snip]... /* --- copy buffer to string_stack --- */ size = strlen(s_holder); size++; sv_stack[ndx] = realloc(sv_stack[ndx], size * sizeof(char)); strcpy(sv_stack[ndx], s_holder); } /*------ end strng_assgn -------*/

Compile Bxbasic.c. Now try it with this Test.bas: ' test.bas version 6.2 Start1: CLS PRINT "hello world!" abc$ = "test"; xyz$ = ""; abc$ = "test" END Try some variations of the above.

OPTIMIZING
Okay, it's time to make a few adjustments to the code. One function we've outgrown is xstring_array(). Originally, it served a minor purpose, but now it really doesn't and it wastes clock cycles. We can relocate the one remaining thing it does and then eliminate it. Open Bxbasic.c. In the "Function Prototypes Section", delete the prototype for xstring_array(), so that it reads as follows:

/* bxbasic.c : alpha version.06 */ [snip]... /* ----- function prototypes ----- */ void pgm_parser(void); void get_token(void); void parser(void); void go_to(void); [snip]...


In function parser(), change the PRINT block, by deleting the call to xstring_array(), so that it reads:

[snip]... case 3: /* LOCATE */ locate(); break; case 4: /* PRINT */ get_prnstring(); break; case 5: /* GOTO */ [snip]...

Now, delete the function xstring_array() itself. Save Bxbasic.c and close it. Open file Output.c. And, copy this version of get_prnstring():

void get_prnstring() { char ch, quote='\"', type; int pi, si=0, stlen; int ab_code, x=line_ndx; pi = e_pos; pi = iswhite(pi); ch = p_string[pi]; e_pos = pi; if(ch == ':') { printf("\n"); return; } else if(isalpha(ch)) { type = get_vartype(); if(type == '$') { get_strvar(); } else { get_prnvar(); } return; } stlen = strlen(p_string); if((ch != quote) || (pi == stlen)) /* next character must be a */ { ab_code=9; /* quote, or error: */ a_bort(ab_code, x); } else { pi++; ch = p_string[pi];


(Continued) while((ch != quote) && (pi < stlen)) { xstring[si] = ch; pi++; si++; ch = p_string[pi]; } xstring[si]='\0'; if(pi >= stlen) { ab_code=6; /* error: no closing quote */ a_bort(ab_code, x); } } /* --- advance to next character --- */ pi++; pi = iswhite(pi); ch = p_string[pi]; /* --- print quoted string --- */ if(ch == ',') printf("%s\t", xstring); else if(ch == ';') printf("%s", xstring); else printf("%s\n", xstring); } /*-------------------------------*/

Save Output.c and close it. One final thing, open Error.c. In function a_bort(), delete the block for "case 5": [snip]... case 5: break; case 6: printf("\nSyntax error: in program line:"); printf(" %d.\n%s",(line_ndx+1),p_string); printf("No closing quotes.\ncode(%d)\n", code); break; [snip]...

Save Error.c and close it. Now recompile Bxbasic.c. Test it using test.bas again. Make sure it's still working.

When we were working on the last topic, I stated that eventually we needed to fix what was wrong with get_prnstring() and what was wrong with our procedure. Well, we might as well do it now. If you recall, the issue at that point was: should we use a separate routine just to determine the variable type, and possibly waste cpu time in the process, or should we refine the code by rewriting the parts we have done up to that point that are affected by it? If you examine the code we have so far, where it concerns functions get_varname() and get_varvalue(), you will see that several functions that use those routines have to be taken into consideration. I will list them:

• get_prnstring(),
• get_strvar(),
• get_prnvar(),
• locate(),
• factor(),

Typically, the following routines: get_prnvar(), locate() and factor(), made use of a satisfactory method of calling get_varname(), as shown here:

value = get_varvalue();

They did it in a rather discreet way, by placing a call to get_varvalue(), which in turn called get_varname() as its first task:

double get_varvalue()
[snip]...
    /* --- get varname --- */
    strcpy(varname, get_varname());
    /* --- get var type --- */
[snip]...

This is a very clean way of doing it. After all, none of those functions cares in the least what the variable name is; each only seeks the result, "value". Unfortunately, now that we have a non numeric variable type, this becomes an issue. We must now separate the call to get_varname() from get_varvalue() in order for things to operate correctly, without redundancy and without wasting code and cpu cycles. Until we come up with a better way of doing it, the obvious solution is for the calling function to first place the call to get_varname(), store the variable name and then proceed with the call to get_varvalue(). This sequence of events would allow get_prnstring() to make the determination of the variable type, after the call to get_varname() and prior to branching to the correct print routine, without wasting clock cycles. Here is the corrected code for all the affected functions:

double get_varvalue() { char ch, varname[VAR_NAME]; int pi, ndx=0, ab_code=13, x=line_ndx; double value; /* --- get varname --- */ strcpy(varname, s_holder);


(Continued) /* --- get var type --- */ pi = e_pos; ch = p_string[pi]; var_type = ch; /* --- now compare to var type array --- */ if(ch == '#') { nam_stack = dn_stack; /* indirect reference to name_stack */ max_vars = dmax_vars; _GetChar(); /* increment character pointer */ } else if(ch == '!') { nam_stack = fn_stack; /* indirect reference to name_stack */ max_vars = fmax_vars; _GetChar(); /* increment character pointer */ } else if(ch == '%') { nam_stack = ln_stack; /* indirect reference to name_stack */ max_vars = lmax_vars; _GetChar(); /* increment character pointer */ } else { nam_stack = in_stack; /* indirect reference to name_stack */ max_vars = imax_vars; } while((ndx < max_vars) && (strcmp(nam_stack[ndx], varname) != 0)) { ndx++; /* find varname in stack */ } if(ndx == max_vars) /* error: did not find it */ { a_bort(ab_code, x); } if(ch == '#') { value = dv_stack[ndx]; } else if(ch == '!') { value = (double) fv_stack[ndx]; } else if(ch == '%') { value = (double) lv_stack[ndx]; } else { value = (double) iv_stack[ndx]; } return value; } /*-------------------------------*/


double Factor() { char ch; int pi; double value;

/* Parse and Translate a Math Factor */

pi = e_pos; ch = p_string[pi]; if(ch == '(') { Match('('); value = Expression(); Match(')'); } else { if(isalpha(ch)) { strcpy(s_holder, get_varname()); value = get_varvalue(); SkipWhite(); } else { value = GetNum(); } } return value; } /*-------------------------------*/

void locate()
{
    char ch, rows[3], cols[3];
    int pi, stlen, row_x, col_y;
    int si=0, ab_code=10, x=line_ndx;

    stlen = strlen(p_string);
    pi = e_pos;
    pi = iswhite(pi);
    ch = p_string[pi];
    /* -------- rows -------- */
    if(isalpha(ch))                     /* this is a variable name */
    {
        e_pos = pi;
        strcpy(s_holder, get_varname());
        row_x = (int) get_varvalue();
        pi = e_pos;
        pi = iswhite(pi);
        ch = p_string[pi];
    }
    else if(isdigit(ch))                /* this is a number */
    {
        while(isdigit(ch) != 0)
        {
            rows[si] = ch;
            pi++;
            si++;
            ch = p_string[pi];
        }
        rows[si] = '\0';
        row_x = atoi(rows);             /* convert alpha to integer */
    }
    else
    {
        a_bort(ab_code, x);             /* error: failed to find an alpha */
    }
    pi = iswhite(pi);
    ch = p_string[pi];
    if(ch == ',')                       /* comma separates row and column */
    {
        pi++;
        pi = iswhite(pi);
        ch = p_string[pi];
    }
    else
    {
        a_bort(ab_code, x);
    }
    /* -------- columns -------- */
    if(isalpha(ch))                     /* this is a variable name */
    {
        e_pos = pi;
        strcpy(s_holder, get_varname());
        col_y = (int) get_varvalue();
    }
    else if(isdigit(ch))                /* this is a number */
    {
        si = 0;
        while(isdigit(ch) != 0)
        {
            cols[si] = ch;
            pi++;
            si++;
            ch = p_string[pi];
        }
        cols[si] = '\0';
        col_y = atoi(cols);             /* convert alpha to integer */
    }
    else
    {
        a_bort(ab_code, x);
    }
    /* -------- now position cursor -------- */
    /* --- Power-C version --- */
#ifdef Power_C
    poscurs(row_x, col_y);
#endif
    /* --- Lcc version --- */
#ifdef LccWin32
    row_x++;
    col_y++;
    gotoxy(col_y,row_x);
#endif
}
/*-------------------------------*/


void get_strvar() { char ch, varname[VAR_NAME]; int pi, ndx=0, ab_code=13, x=line_ndx; /* --- get varname --- */ strcpy(varname, s_holder); /* --- get stack index --- */ while((ndx < smax_vars) && (strcmp(sn_stack[ndx], varname) != 0)) { ndx++; /* find varname in stack */ } if(ndx == smax_vars) /* error: did not find it */ { a_bort(ab_code, x); } pi = e_pos; pi++; pi = iswhite(pi); ch = p_string[pi]; /* --- display string$ --- */ if(ch == ',') printf("%s\t", sv_stack[ndx]); else if(ch == ';') printf("%s", sv_stack[ndx]); else printf("%s\n", sv_stack[ndx]); } /*-------------------------------*/

void get_prnstring() { char ch, quote='\"'; int pi, si=0, stlen; int ab_code, x=line_ndx; pi = e_pos; pi = iswhite(pi); ch = p_string[pi]; e_pos = pi; if(ch == ':') { printf("\n"); return; } else if(isalpha(ch)) { strcpy(s_holder, get_varname()); pi = e_pos; ch = p_string[pi]; if(ch == '$') { get_strvar(); } else { get_prnvar(); } return;


(Continued) } stlen = strlen(p_string); if((ch != quote) || (pi == stlen)) /* next character must be a */ { ab_code=9; /* quote, or error: */ a_bort(ab_code, x); } else { pi++; ch = p_string[pi]; while((ch != quote) && (pi < stlen)) { xstring[si] = ch; pi++; si++; ch = p_string[pi]; } xstring[si]='\0'; if(pi >= stlen) { ab_code=6; /* error: no closing quote */ a_bort(ab_code, x); } } /* --- advance to next character --- */ pi++; pi = iswhite(pi); ch = p_string[pi]; /* --- print quoted string --- */ if(ch == ',') printf("%s\t", xstring); else if(ch == ';') printf("%s", xstring); else printf("%s\n", xstring); } /*-------------------------------*/

Copy all of the above corrected functions to their respective files. Delete function get_vartype() and delete it from the prototypes list as shown below:

/* ----- function prototypes ----- */
[snip]...
/* Output.c */
void beep(void);
void cls(void);
void get_prnstring(void);
void get_prnvar(void);
void locate(void);
char *value2strng(double);
char get_vartype(void);             <----- delete
void get_strvar(void);
[snip]...


After making the changes and saving the files, recompile Bxbasic.c. To make sure things are still working, execute Test.bas, again. Even though we added at least one line of code to each of the above functions, hopefully we didn't contribute to slowing the engine down. These types of things must always be weighed in terms of diminishing returns. At what point does performance become overshadowed by increased executable size?

PRINT STATEMENTS
One of the more useful things we can do, at this point, is to expand the print routines so that we can have multiple PRINT statements on a single line. Example:

PRINT "abc="; abc, "xyz="; xyz
The above example condenses four PRINT statements into one line. To monitor and control the sequence of print events, we will need a print parser. The print parser should (much as it does now) simply determine what it has before it to print. The difference is that it works in a loop. After printing the first item in a list of items, it advances to the next item on the list and keeps looping until the end of line is encountered. Example:

print_parser:
loop: while (not:end of line)
    if (quoted string)
        print_string:
    else if (string_variable)
        print_stringvar:
    else if (numeric_variable)
        print_numvar:
    else
        (error)
    then advance (to next item)
    return to loop:
end while:

Each item in the list will have to be delimited (separated) by a token. A token can be represented by anything or any character. For our purposes, since we already use the semi-colon and the comma in our print routines, we may as well use them as the delimiters. The comma also serves as the token for a tab. What we also need is a token for a newline. If we use a colon as the token for the newline, then we will have the ability to print on multiple lines as well as print multiple items, within the same statement. Example:

PRINT "abc="; abc: "xyz="; xyz:
   newline ------^
Using the above example, the parser would begin by printing a quoted string:

"abc=";


which is immediately followed by a semi-colon. The next item on the list is an integer variable:

abc:
which in turn is followed by a colon. After the newline, another quoted string and integer variable are printed:

"xyz="; xyz:
Here is the code for the parser, function parse_print():

void parse_print()
{
    char ch, quote='\"';
    int pi, si=0, stlen;
    int ab_code=9, x=line_ndx;

    pi = e_pos;
    pi = iswhiter(pi);
    ch = p_string[pi];
    e_pos = pi;

    /* --- print newline --- */
    if(strchr(":\n", ch))
    {
        printf("\n");
        return;
    }
    /* --- LOOP: multiple print statements --- */
    while(ch != '\n')
    {
        /* --- print variable --- */
        if(isalpha(ch))
        {
            strcpy(s_holder, get_varname());
            pi = e_pos;
            ch = p_string[pi];
            if(ch == '$')
            {
                get_strvar();       /* string variable */
            }
            else
            {
                get_prnvar();       /* numeric variable */
            }
        }
        /* --- next char is a quote -- */
        else if(ch == quote)
        {
            get_prnstring();
        }
        /* --- error: --- */
        else
        {
            a_bort(ab_code, x);
        }
        /* --- return from subroutines --- */
        pi = e_pos;
        ch = p_string[pi];

        /* --- is it end of statement --- */
        if(ch != '\n')
        {
            pi++;
            pi = iswhiter(pi);
            ch = p_string[pi];
            e_pos = pi;
        }
        /* --- LOOP: if more to print --- */
    }
}
/*-------- end parse_print --------*/

Copy function parse_print() to file Output.c. Here are new versions of functions: get_prnstring(), get_strvar() and get_prnvar(), copy these to Output.c as well :

void get_prnstring()
{
    char ch, quote='\"';
    int pi, si=0, stlen;
    int ab_code=6, x=line_ndx;

    stlen = strlen(p_string);
    pi = e_pos;
    pi++;
    ch = p_string[pi];
    while((ch != quote) && (pi < stlen))
    {
        xstring[si] = ch;
        pi++;
        si++;
        ch = p_string[pi];
    }
    xstring[si]='\0';

    /* --- error: no closing quote --- */
    if(pi >= stlen)
    {
        a_bort(ab_code, x);
    }
    /* --- advance to next character --- */
    pi++;
    pi = iswhiter(pi);
    ch = p_string[pi];
    e_pos = pi;

    /* --- print quoted string --- */
    if(ch == ',')
        printf("%s\t", xstring);
    else if(ch == ';')
        printf("%s", xstring);
    else
        printf("%s\n", xstring);
}
/*-------- end get_prnstring --------*/

void get_strvar()
{
    char ch, varname[VAR_NAME];
    int pi, ndx=0, ab_code=13, x=line_ndx;

    /* --- get varname --- */
    strcpy(varname, s_holder);

    /* --- get stack index --- */
    while((ndx < smax_vars) && (strcmp(sn_stack[ndx], varname) != 0))
    {
        ndx++;                      /* find varname in stack */
    }
    if(ndx == smax_vars)            /* error: did not find it */
    {
        a_bort(ab_code, x);
    }
    pi = e_pos;
    pi++;
    pi = iswhiter(pi);
    ch = p_string[pi];
    e_pos = pi;

    /* --- display string$ --- */
    if(ch == ',')
        printf("%s\t", sv_stack[ndx]);
    else if(ch == ';')
        printf("%s", sv_stack[ndx]);
    else
        printf("%s\n", sv_stack[ndx]);
}
/*--------- end get_strvar -----------*/

void get_prnvar()
{
    char ch, *val_strng;
    int pi;
    double value;

    pi = e_pos;
    pi = iswhiter(pi);
    e_pos = pi;
    value = get_varvalue();
    pi = e_pos;
    pi = iswhiter(pi);
    ch = p_string[pi];
    e_pos = pi;

    /* --- convert value to string$ --- */
    val_strng = value2strng(value);
    if(ch == ',')
        printf("%s \t", val_strng);
    else if(ch == ';')
        printf("%s ", val_strng);
    else
        printf("%s \n", val_strng);
}
/*--------- end get_prnvar -----------*/

Copy the above to Output.c. Save and close it. Note that all of the above use a new version of iswhite(), named iswhiter(). The difference is that iswhite() skips over all white space, including spaces, tabs and newlines, whereas iswhiter() skips only blank spaces. We need this change because the end of line (newline) is the terminating condition of the "while" loop. If we inadvertently skipped over the newline, we would never know when we had reached the end of the list. Such a mistake could crash the program, because the loop would blindly keep reading whatever is in memory past that point.

int iswhiter(int pi)
{
    char ch, space=32;

    ch = p_string[pi];
    while(ch == space)              /* if next char is "space" */
    {
        pi++;                       /* get rid of it */
        ch = p_string[pi];
    }
    return pi;
}
/*-------- end iswhiter ---------*/

Copy function "iswhiter()" to Utility.c and save it. Open file Prototyp.h. Copy the prototype for iswhiter() to the prototypes for Utility.c:

[snip]...
/* Utility.c */
int get_upper(int,int);
int get_alpha(int,int);
int get_digit(int,int);
int iswhite(int);
void clr_arrays(void);
int iswhiter(int);
[snip]...


And, copy the prototype for parse_print(), under the Output.c heading:

[snip]...
/* Output.c */
void beep(void);
void cls(void);
void get_prnstring(void);
void get_prnvar(void);
void locate(void);
char *value2strng(double);
void get_strvar(void);
void parse_print(void);
[snip]...

Save Prototyp.h and close it. The last thing we have to do is to update the block for PRINT (case 4), in function parser(), file Bxbasic.c. Open Bxbasic.c. Change the PRINT block so that it calls parse_print(), as follows:

[snip]...
        case 3:     /* LOCATE */
            locate();
            break;
        case 4:     /* PRINT */
            parse_print();
            break;
        case 5:     /* GOTO */
[snip]...

Save Bxbasic.c. Close it and recompile Bxbasic.c. Now run Bxbasic.exe with this version of Test.bas:

' test.bas version 6.3
Start1:
CLS
PRINT "hello world!"
' ------------------------------------------ double float
LET xylophone# = 50.3
LET yazoo# = 101.25
LET abc = yazoo#/xylophone#
xyz = yazoo#/10
LOCATE abc , xyz
PRINT "hello world!"
' ------------------------------------------ long integers
quasar% = 2
zapp% = 4
abc% = (quasar% * quasar% * zapp% + zapp%)/5
xyz% = ((quasar% * quasar%) * zapp%) + zapp%
LOCATE abc%, xyz%
PRINT "hello world!"
' ------------------------------------------
abc = 2*(3+4)*5/10
PRINT "": " 2*(3+4)*5/10 ="; abc
' ------------------------------------------
abc = 11100
xyz = 32000
abc% = 33000
xyz% = 99000
abc! = 33000.33
xyz! = 99000.47
abc# = 333000.33
xyz# = 999000.47
' integers
PRINT "": "abc="; abc: "xyz="; xyz
' long integers
PRINT "abc%="; abc%: "xyz%="; xyz%
' float
PRINT "abc!="; abc!: "xyz!="; xyz!
' double
PRINT "abc#="; abc#: "xyz#="; xyz#
' ------------------------------------------
' ------------------------------------------
TheEnd:
CLEAR
END
' ------------------------------------------

CLEARING VARIABLES
In today's computer world of megabytes and gigabytes of available RAM and Virtual Ram memory, the need to clear or delete variables may be a frivolous issue. Not too long ago, conserving available memory was more than a minor issue. There was a time when memory was so limited, that program size was restricted so that all the variables could fit into memory and recycling variables space was a normal practice. Perhaps because I was around and programming during those days, I've always thought it a good practice to keep an eye out for excessive variable space being used and memory leaks. In programming environments like GW-Basic and QuickBasic, you had the ability to Clear or Erase variables to save memory. We currently have the ability to erase all variables at once, but, sometimes you only want to clear a particular variable or a list of variables. What I'd like to add to what we currently have, is the ability to clear variables individually and by type. This isn't too difficult to do, either. Consider what we just completed, in the last section, about printing variables either individually or by a list of variables. Nearly the same or similar type of code can be used, the difference being the target of the procedure is for clearing variables rather than printing variables. Example: • determine if there is a variables list, • if not, clear all variables, • if yes, process list,

• a "loop" condition could be established and an entire list of variables or variable types could be processed in a single statement.

At present, clr_vars() just deletes all variables at once. Below is the modified code for clr_vars(). Now, in this new version, if there is an alpha character following the keyword, it is presumed to be a "list" of variables. In that event, function parse_clr() is called, otherwise it drops through and all variables are cleared, just as before.

void clr_vars()
{
    char ch;
    int pi;

    pi = e_pos;
    pi = iswhiter(pi);
    ch = p_string[pi];
    e_pos = pi;
    if(isalpha(ch))
    {
        parse_clr();                /* a list of variables */
    }
    else
    {
        clr_int();                  /* clear all variables */
        clr_dbl();
        clr_lng();
        clr_flt();
        clr_str();
    }
}
/*--------- end clr_vars --------*/

Function parse_clr() uses the "while" loop to process the entire list, up to the "newline" character. We will use an uppercase abbreviation of the variables type to signify that variable type, such as INT for integers, etc. Inside the loop, it has to determine whether or not a particular list item is a variable type or a single variable. If the name of the item being analyzed is in uppercase, it will be compared to the variable types. On finding a match, that "clear var" routine will be called and the entire variable array will be erased from memory. If no match is found or the name is not in uppercase, then the "clear individual var" routine will be called :

void parse_clr()
{
    char ch, varname[VAR_NAME];
    int pi;

    pi = e_pos;
    ch = p_string[pi];
    while(ch != '\n')
    {
        if(isupper(ch))
        {
            strcpy(varname, get_varname());
            if(strcmp("INT", varname) == 0)
            {
                clr_int();              /* clear variable type */
            }
            else if(strcmp("LNG", varname) == 0)
            {
                clr_lng();
            }
            else if(strcmp("FLT", varname) == 0)
            {
                clr_flt();
            }
            else if(strcmp("DBL", varname) == 0)
            {
                clr_dbl();
            }
            else if(strcmp("STR", varname) == 0)
            {
                clr_str();
            }
            else
            {
                clr_indvar(varname);    /* individual var */
            }
        }
        else
        {
            strcpy(varname, get_varname());
            clr_indvar(varname);        /* individual var */
        }
        pi = e_pos;
        pi = iswhiter(pi);
        ch = p_string[pi];
        if(ch != '\n')
        {
            pi++;
            pi = iswhiter(pi);
            ch = p_string[pi];
        }
        e_pos = pi;
        /* --- loop: until done --- */
    }
}
/*-------- end parse_clr ---------*/

Function clr_indvar() is the routine that clears individual variables, based on their name. You will recognize part of this function as being similar to the variable assignment routine. Once the variable's array index has been located, its entries in the value array and the name array are set to zero. Although this function does not release the variable's memory, setting the first character of the name array cell to ASCII character zero makes that cell available for reassignment.

void clr_indvar(char *name)
{
    char ch, varname[VAR_NAME];
    int pi, ndx=0;

    strcpy(varname, name);
    pi = e_pos;
    ch = p_string[pi];

    /* --- indirect pointers --- */
    if(ch == '$')
    {
        if(smax_vars == 0)
        {
            return;
        }
        nam_stack = sn_stack;
        max_vars = smax_vars;
        _GetChar();
    }
    else if(ch == '#')
    {
        if(dmax_vars == 0)
        {
            return;
        }
        nam_stack = dn_stack;
        max_vars = dmax_vars;
        _GetChar();
    }
    else if(ch == '!')
    {
        if(fmax_vars == 0)
        {
            return;
        }
        nam_stack = fn_stack;
        max_vars = fmax_vars;
        _GetChar();
    }
    else if(ch == '%')
    {
        if(lmax_vars == 0)
        {
            return;
        }
        nam_stack = ln_stack;
        max_vars = lmax_vars;
        _GetChar();
    }
    else
    {
        if(imax_vars == 0)
        {
            return;
        }
        nam_stack = in_stack;
        max_vars = imax_vars;
    }
    /* --- get variable index --- */
    while((ndx < max_vars) && (strcmp(nam_stack[ndx], varname) != 0))
    {
        ndx++;                          /* find varname in stack */
    }
    if(ndx == max_vars)
    {
        return;                         /* reached end: not found */
    }
    /* --- zero individual variable --- */
    else if((ch == '$') && (ndx < max_vars))
    {
        sv_stack[ndx][0] = '\0';        /* clear string */
        sn_stack[ndx][0] = '\0';
    }
    else if((ch == '#') && (ndx < max_vars))
    {
        dv_stack[ndx] = 0;              /* clear double */
        dn_stack[ndx][0] = '\0';
    }
    else if((ch == '!') && (ndx < max_vars))
    {
        fv_stack[ndx] = 0;              /* clear float */
        fn_stack[ndx][0] = '\0';
    }
    else if((ch == '%') && (ndx < max_vars))
    {
        lv_stack[ndx] = 0;              /* clear long */
        ln_stack[ndx][0] = '\0';
    }
    else if(ndx < max_vars)
    {
        iv_stack[ndx] = 0;              /* clear integer */
        in_stack[ndx][0] = '\0';
    }
}
/*-------- end clr_indvar --------*/

Copy these three functions to file Variable.c. Save and close it. Add to Prototyp.h, the prototypes for the two new functions, under the Variable.c heading.

/* ----- function prototypes ----- */
[snip]...
/* Variable.c */
[snip]...
void parse_clr(void);
void clr_indvar(char *);
[snip]...

Compile Bxbasic.c. Test it on this version of Test.bas:

' test.bas version 6.4
Start1:
CLS
PRINT "hello world!"
' ------------------------------------------
abc = 11100
xyz = 32000
abc% = 33000
xyz% = 99000
abc! = 33000.33
xyz! = 99000.47
abc# = 333000.33
xyz# = 999000.47
' integers
PRINT "": "abc="; abc: "xyz="; xyz
' long integers
PRINT "abc%="; abc%: "xyz%="; xyz%
' float
PRINT "abc!="; abc!: "xyz!="; xyz!
' double
PRINT "abc#="; abc#: "xyz#="; xyz#
' ------------------------------------------
test$ = "test"
CLEAR abc, xyz, INT, abc%, xyz%, LNG, abc!, xyz!, FLT
CLEAR abc#, xyz#, DBL, test$, STR
PRINT test$
CLEAR
' ------------------------------------------
TheEnd:
END
' ------------------------------------------

The PRINT test$ statement, just prior to "TheEnd" should generate an error, since test$ no longer exists at that point. Try changing that statement to any of the variables on the clear list. Now delete that PRINT statement and run Test.bas again.

ENGINE UPDATE
Since most everything we've done in this chapter has related to either output or variables, updating the engine should be pretty easy. All we should have to do is duplicate the things we did to Bxbasic.c. Open Engine.c. Add the declarations for the string variables to the "global vars":

/* ------ global vars ------------ */
[snip]...
/**/
char **sv_stack;        /* stack:string variable array   */
char **sn_stack;        /* stack:string variable names   */
int smax_vars=0;        /* stack:string variable counter */
/**/
[snip]...

From the function prototypes section, delete the prototype for xstring_array(), so that the section reads as follows:


[snip]...
/* ----- function prototypes ----- */
void load_bas1(void);
void pgm_parser(void);
void get_token(void);
void parser(void);
void go_to(void);
[snip]...

In parser(), change "case 4" so that it calls parse_print():

[snip]...
        case 3:     /* LOCATE */
            locate();
            break;
        case 4:     /* PRINT */
            parse_print();
            break;
        case 5:     /* GOTO */
[snip]...

And, delete function "xstring_array()" entirely. Save Engine.c. Now, recompile Engine.c. Since everything has already been compiled in Bxbasic, it should compile with no errors. Using Bxcomp.exe, compile then execute this version of Test.bas:

' test.bas version 6.5
Start1:
CLS
Hello$ = "hello world!"
PRINT Hello$
' ------------------------------------------ double float
LET xylophone# = 50.3
LET yazoo# = 101.25
LET abc = yazoo#/xylophone#
xyz = yazoo#/10
LOCATE abc , xyz
PRINT Hello$
' ------------------------------------------ long integers
quasar% = 2
zapp% = 4
abc% = (quasar% * quasar% * zapp% + zapp%)/5
xyz% = ((quasar% * quasar%) * zapp%) + zapp%
LOCATE abc%, xyz%
PRINT Hello$
' ------------------------------------------
abc = 2*(3+4)*5/10
PRINT "": " 2*(3+4)*5/10 ="; abc
' ------------------------------------------
abc = 11100
xyz = 32000
abc% = 33000
xyz% = 99000
abc! = 33000.33
xyz! = 99000.47
abc# = 333000.33
xyz# = 999000.47
' integers
PRINT "": "abc="; abc: "xyz="; xyz
' long integers
PRINT "abc%="; abc%: "xyz%="; xyz%
' float
PRINT "abc!="; abc!: "xyz!="; xyz!
' double
PRINT "abc#="; abc#: "xyz#="; xyz#
' ------------------------------------------
test$ = "test"
CLEAR abc, xyz, INT, abc%, xyz%, LNG, abc!, xyz!, FLT
CLEAR abc#, xyz#, DBL, test$, STR
CLEAR
' ------------------------------------------
TheEnd:
END
' ------------------------------------------

CONCLUSION
We've covered a few of the things on our short list:
• we added character string variables,
• we can print string variables,
• we can have a list of items to be printed in a single print statement,
• and we have a number of options for clearing variables from memory.
There's still more to come.


CHAPTER - 7
INTRODUCTION
Welcome back. Before we start, I'd like to point out, if I haven't already, that the methods and illustrations shown in this series are by no means considered (by anyone and most of all myself), to be the best, the most preferred, the most efficient, the recommended, the correct, or the only ways of accomplishing the tasks covered in this series. I'm sure there are far better methods of doing any number of things we cover here. As I've previously stated, in no way do I consider myself an expert on this subject or any subject for that matter. I simply love computer programming and I'd like to be able to share that passion with others of like mind. A few of the methods shown here are creations of my own, while most of the methods illustrated are things I've learned from others. In some cases I will opt to use a particular method over another simply because it makes more sense or is just easier to understand. As we all know, program code can often be very cryptic and not easy to comprehend. I try to avoid using code that fits into that category. In fact, I tend to latch on to coding methods that are easier to understand and convey to others, rather than coding methods that may be considered better or more advanced. Simply put, if it's not easy to understand, I don't use it. What I hope to accomplish in this series is to remove some of the mystique surrounding this subject and learn as much as I can with the help and input from others. That said, let's get started. We've accomplished quite a bit in the previous six chapters. Here's a run-down of what we now have: • we've constructed a fully functioning language Interpreter, • and runtime Script Engine, • and Byte Code Compiler that transforms scripts into executable programs. 
We also have a small but growing vocabulary:
• BEEP :produces a sound
• CLS :clear display
• END :terminate program
• LET :variable assignments
• CLEAR :erase variables
• LOCATE :display positioning
• PRINT :display
• GOTO :program control
And, our variable types include:
• Integers
• Long Integers
• Single Precision Floating Point
• Double Precision Floating Point
• Character Strings
The next extensions to our language definition that I'd like to cover in this chapter are Subroutines and For/Next loops.


GOSUB-RETURN
Subroutines are small segments of code that can often be used by more than one function. A subroutine typically performs a specific task, like looking up a variable name or retrieving a value or performing a display function. After performing the specific task, the subroutine returns control to the calling function and program execution resumes from that point onward. In many cases, a program function is made up mostly of general purpose subroutines. The GOTO function is useful, in a limited way, when we need to completely change the program flow from the direction it is currently taking. In most cases, as in a modular style of programming, the entire program flow can be directed or redirected with the use of subroutines. Various dialects of Basic use GOSUB or CALL SUB or some other variation for calling a subroutine. At this point, we will use the GOSUB keyword for calling a subroutine and the RETURN keyword will be used for returning program control to the point where the subroutine was called. Unlike function calls, Gosubs use Block Labels or line numbers as the destination for the subroutine call. Example:

...
GOSUB Start1
...                     <----- resume
...
END
' ------------------
Start1:
CLS
Hello$ = "hello world!"
PRINT Hello$
RETURN
' ------------------

In the above example, Gosub jumps to the block label Start1:. After executing the subroutine, upon encountering the RETURN token, the program resumes execution on the line following the original GOSUB. To execute a gosub, you first have to record (or store) the program line number that is currently being executed and then treat the remainder of the statement as though it were a GOTO statement. Why a "goto"? Well, that's easy, using the "goto" function offers these benefits:
• it allows us to recycle existing code,
• it identifies the Block Label that is the destination and its program line index,
• it then sets the program line counter to the desired program line index.
With those things done, the next line to be executed will be the subroutine in question, just as if it were a GOTO. After completing execution of the subroutine and upon encountering the RETURN token, the program line that was previously saved is now retrieved and the program line counter is set to that line number and execution resumes on the line following that point.


Example:

line#   Label     keyword   statement     return stack
100               GOSUB     Start1        [100]<---ptr
101               ...       ...           [   ]

On encountering the gosub token, the current line number, (100), is placed on a "return stack". This stack is a dynamic integer array that can increase in size with each gosub. Each time a gosub is encountered and the current line number is placed on top of the stack, a pointer is positioned to point to the top of the stack. Therefore, on encountering a RETURN, the stack element that is being pointed to by the "pointer" is loaded into the program line counter and the "pointer" is decremented to point to the previous entry on the stack. Example: (subroutine)

line#   Label     keyword   statement     return stack
200     Start1:   ...       ...           [100]<---ptr
201               ...       ...           [   ]
...
290               RETURN

Line number 100 is reloaded into the program line counter, variable "line_ndx" is incremented and execution resumes on the following line, which would be line number 101. Here's the code. There are two global variables needed by the gosub function:

/* ------ global vars ------------ */
[snip]...
/**/
int *gosub_stack;       /* Gosub: stack array            */
int gs_ptr=-1;          /* Gosub: stack position pointer */
/**/
[snip]...

gosub_stack is the "return stack" and gs_ptr is the stack pointer. The new functions for Gosub and Return are do_gosub() and do_return(). For now, rather than placing the function prototypes for Gosub and Return in file Prototyp.h, these prototypes will be added to the prototypes section in Bxbasic.c.

/* ----- function prototypes ----- */
[snip]...
void do_gosub(void);
void do_return(void);
[snip]...

Update function parser() by adding "case 9" and "case 10" for the Gosub and Return byte codes, as shown:


void parser()
[snip...]
        case 9:     /* GOSUB */
            do_gosub();
            break;
        case 10:    /* RETURN */
            do_return();
            break;
[snip...]

Our Gosub function is called do_gosub(). Below is the code:

void do_gosub()
{
    unsigned size;

    gs_ptr++;
    if(gs_ptr == 0)
    {
        size = 1;
        gosub_stack = malloc(size * sizeof(int));
    }
    else
    {
        size = (gs_ptr + 1);
        gosub_stack = realloc(gosub_stack, size * sizeof(int));
    }
    gosub_stack[gs_ptr] = line_ndx;
    go_to();
}
/*------- end do_gosub ----------*/

On entry, "gs_ptr" is incremented and then tested to determine if the stack exists yet. If the result is zero, this is the first Gosub and the stack is created with a single element. Otherwise, an additional element is added to the stack. Either way, "gs_ptr" now points to the top element and the current line number is placed in that stack element. The Return function is called do_return(). On entry to do_return(), gs_ptr is tested to determine if it has a value of less than zero. If that is the case, there has been a Return without a matching Gosub. This should generate an error but, currently, it does not; it simply returns to parser().

void do_return()
{
    unsigned size;

    if(gs_ptr < 0)                          /* return w/no "Gosub" */
    {
        return;
    }
    else if(gs_ptr == 0)
    {
        line_ndx = gosub_stack[gs_ptr];     /* restore line number */
        free(gosub_stack);
    }
    else
    {
        line_ndx = gosub_stack[gs_ptr];     /* restore line number */
    }
    gs_ptr--;                               /* decrement stack pointer */
}
/*------- end do_return ----------*/

In the second "IF", if gs_ptr is equal to zero, then that means that there is only one stack element. "line_ndx", which is the "program line counter", is given the value contained in the top stack element and the stack is now empty so the stack array is freed. If the stack contains two or more elements, "line_ndx" is assigned the top stack element's value, then the "gs_ptr" is decremented. Copy all of the above to Bxbasic.c, in their respective places. There is one minor "fix" that needs to be made to function go_to(). Change the bottom half of go_to() to read as follows:

void go_to()
[snip]...
    while(xtest != 0)
    {
        pi++;
        if(pi == nrows)
        {
            strcpy(t_holder, goto_label);
            a_bort(ab_code, x);         /* error, label not found */
        }
        xtest = strcmp(label_nam[pi], goto_label);
    }
    pi--;
    line_ndx = pi;                      /* set line_ndx to the goto_line */
}
/*-------------------------------*/

In file Input.c, function get_byte() has to be updated. We need to add GOSUB and RETURN to the byte code list:

int get_byte(int ii)
[snip...]
    /* --- assign byte code --- */
    if(strcmp(keyword, "REM") == 0)             byte=0;
    else if(strcmp(keyword, "LET") == 0)        byte=1;
    else if(strcmp(keyword, "CLEAR") == 0)      byte=2;
    else if(strcmp(keyword, "LOCATE") == 0)     byte=3;
    else if(strcmp(keyword, "PRINT") == 0)      byte=4;
    else if(strcmp(keyword, "GOTO") == 0)       byte=5;
    else if(strcmp(keyword, "BEEP") == 0)       byte=6;
    else if(strcmp(keyword, "CLS") == 0)        byte=7;
    else if(strcmp(keyword, "END") == 0)        byte=8;
    else if(strcmp(keyword, "GOSUB") == 0)      byte=9;
    else if(strcmp(keyword, "RETURN") == 0)     byte=10;
[snip...]

Make the above changes and save Input.c. Compile Bxbasic.c and then test it using this version of Test.bas:

' test.bas version 7.1
GOSUB Start1
GOSUB DoubleFloat
GOSUB LongIntegers
GOSUB RDParser
GOSUB PrintVars
GOSUB ClearVars
GOTO TheEnd
' ------------------------------------------
Start1:
CLS
Hello$ = "hello world!"
PRINT Hello$
RETURN
' ------------------------------------------ double float
DoubleFloat:
LET xylophone# = 50.3
LET yazoo# = 101.25
LET abc = yazoo#/xylophone#
xyz = yazoo#/10
LOCATE abc , xyz
PRINT Hello$
RETURN
' ------------------------------------------ long integers
LongIntegers:
quasar% = 2
zapp% = 4
abc% = (quasar% * quasar% * zapp% + zapp%)/5
xyz% = ((quasar% * quasar%) * zapp%) + zapp%
LOCATE abc%, xyz%
PRINT Hello$
RETURN
' ------------------------------------------
RDParser:
abc = 2*(3+4)*5/10
PRINT "": " 2*(3+4)*5/10 ="; abc
RETURN
' ------------------------------------------
PrintVars:
abc = 11100
xyz = 32000
abc% = 33000
xyz% = 99000
abc! = 33000.33
xyz! = 99000.47
abc# = 333000.33
xyz# = 999000.47
' integers
PRINT "": "abc="; abc: "xyz="; xyz
' long integers
PRINT "abc%="; abc%: "xyz%="; xyz%
' float
PRINT "abc!="; abc!: "xyz!="; xyz!
' double
PRINT "abc#="; abc#: "xyz#="; xyz#
RETURN
' ------------------------------------------
ClearVars:
test$ = "test"
CLEAR abc, xyz, INT, abc%, xyz%, LNG, abc!, xyz!, FLT
CLEAR abc#, xyz#, DBL, test$, STR
CLEAR
RETURN
' ------------------------------------------
TheEnd:
END
' ------------------------------------------

The above Test.bas processes the list of gosubs sequentially, that is, one at a time. What I mean is, it calls a subroutine, then, returns from that subroutine and then calls the next. Now, try this version of Test.bas:

' test.bas version 7.2
GOSUB Start1
GOTO TheEnd
' ------------------------------------------
Start1:
CLS
Hello$ = "hello world!"
PRINT Hello$
GOSUB DoubleFloat
RETURN
' ------------------------------------------ double float
DoubleFloat:
LET xylophone# = 50.3
LET yazoo# = 101.25
LET abc = yazoo#/xylophone#
xyz = yazoo#/10
LOCATE abc , xyz
PRINT Hello$
GOSUB LongIntegers
RETURN
' ------------------------------------------ long integers
LongIntegers:
quasar% = 2
zapp% = 4
abc% = (quasar% * quasar% * zapp% + zapp%)/5
xyz% = ((quasar% * quasar%) * zapp%) + zapp%
LOCATE abc%, xyz%
PRINT Hello$
GOSUB RDParser
RETURN
' ------------------------------------------
RDParser:
abc = 2*(3+4)*5/10
PRINT "": " 2*(3+4)*5/10 ="; abc
GOSUB PrintVars
RETURN
' ------------------------------------------
PrintVars:
abc = 11100
xyz = 32000
abc% = 33000
xyz% = 99000
abc! = 33000.33
xyz! = 99000.47
abc# = 333000.33
xyz# = 999000.47
' integers
PRINT "": "abc="; abc: "xyz="; xyz
' long integers
PRINT "abc%="; abc%: "xyz%="; xyz%
' float
PRINT "abc!="; abc!: "xyz!="; xyz!
' double
PRINT "abc#="; abc#: "xyz#="; xyz#
GOSUB ClearVars
RETURN
' ------------------------------------------
ClearVars:
test$ = "test"
CLEAR abc, xyz, INT, abc%, xyz%, LNG, abc!, xyz!, FLT
CLEAR abc#, xyz#, DBL, test$, STR
CLEAR
RETURN
' ------------------------------------------
TheEnd:
END
' ------------------------------------------

Do you see the difference?


This time the gosubs are nested rather than called one after another. One gosub is called and, before returning to its caller, it calls another gosub, and so on, until a RETURN is finally encountered. At that point, like falling dominos, each gosub returns to the one before it.

FOR-NEXT
For-Next loops are among the more popular for performing repetitive processes where the condition of the loop is specified by a predetermined or calculated number of iterations, or cycles. The typical For-Next loop looks something like this:

FOR count = start TO end STEP increment
    [process...]
    [ ... ]
    [ ... ]
NEXT count

The basic For-Next loop could be described as:

while [value-A] is less than [value-B]
loop:
    compare [value-A to value-B]
    do [block... ]
    then [increment value-A ]
    goto loop:
end while:

The FOR statement is comprised of three numeric values:
• starting assignment, :count = start
• the upper limit, :TO end
• the increment, :[STEP increment]
The "starting assignment" sets the beginning value for the iteration, or loop counter. In most cases this would be the lower limit, such as a "0" or a "1", in a normal "counting up" type of condition. Example:
FOR x = 0
In the above, "x" is the variable used to maintain the count. Conversely, the starting assignment could also be set to a higher value in a "counting down" type of condition. Such as:
FOR x = 100
The counter's start value must be set, as there is no default starting value. The "upper limit" is the point at which, if the counter reaches that predetermined value, the condition has been met and the loop is terminated. The "upper limit" value may be either greater than or less than the starting value of the counter. Based on whether the upper limit is greater than or less than the start value, the counter will either increment or decrement until the condition is met.


Example:
FOR x = 0 TO 10
In the above, the condition could be stated as:

repeat: [this.....] 11 times:
Because the limit is inclusive, the counter takes each value from 0 through 10. There is no default value for the upper limit and it must be specified. The "increment" or "STEP" value is the number that is added to the counter each time it increments. The STEP [value] is optional and if it is not specified, the default value is "1". Meaning, if no increment value is specified, "1" is added to the counter after each loop. Before the loop can be executed, those three values must be determined.

LOOPS.C :
To begin with, create a new file and name it: "Loops.c". Copy this header information to the top of that file:

    /* bxbasic : Loops.c : alpha version */

    /* ----- function prototypes ----- */
    #include "prototyp.h"

DO FOR( ) :
Now, into file: Loops.c, copy the code below for function do_for():

    void do_for()
    {   char ch, varname[VAR_NAME];
        int pi, f_ndx, Inc=1, ab_code;
        int start, next_tru, x=line_ndx;
        long From, Final;

        /* --- assign FROM --- */
        f_ndx = get_From();
        /* --- get TO --- */
        Final = get_To();
        /* --- get STEP --- */
        pi = e_pos;
        ch = p_string[pi];
        if(ch == 'S')
        {
            Inc = get_Step();
        }
        /* --- setup for-loop conditions --- */
        From = lv_stack[f_ndx];
        start = line_ndx;               /* register: line counter */
        fornxt_flg++;                   /* increment For/Next flag */
        /* --- increment loop --- */
        if(From < Final)
        {
            for(; From <= Final; From += Inc)
            {
                next_tru = 1;
                line_ndx = (start + 1); /* set pointer to: here+1 */
                while(next_tru == 1)
                {
                    get_token();
                    s_pos = 0;
                    e_pos = 0;
                    parser();
                    if(token == 12)     /* encountered a: Next */
                    {
                        next_tru = 0;
                        token = 0;
                    }
                    else
                    {
                        line_ndx++;
                    }
                }
            }
        }
        /* --- decrement loop --- */
        else
        {
            for(; From >= Final; From += Inc)
            {
                next_tru = 1;
                line_ndx = (start + 1); /* set pointer to: here+1 */
                while(next_tru == 1)
                {
                    get_token();
                    s_pos = 0;
                    e_pos = 0;
                    parser();
                    if(token == 12)     /* encountered a: Next */
                    {
                        next_tru = 0;
                        token = 0;
                    }
                    else
                    {
                        line_ndx++;
                    }
                }
            }
        }
        fornxt_flg--;                   /* decrement For-Next flag */
    }
    /*-------- end do_for ----------*/

The first thing function do_for() does, under the heading "assign FROM", is to call function get_From(), which returns a value in variable "f_ndx". I will explain get_From() in greater detail below, but for the moment, suffice it to say that get_From() makes the assignment for the starting value.


GET FROM( ) :
Here is the code for function get_From():

    int get_From()
    {   char ch, varname[VAR_NAME];
        int pi, f_ndx, sav_pi, ab_code, x=line_ndx;

        pi = e_pos;
        sav_pi = pi;
        /* --- get varname --- */      /* FOR x%=a TO b STEP c: */
        strcpy(varname, get_varname());
        pi = e_pos;
        ch = p_string[pi];
        if(ch != '%')                   /* variable type must be: long */
        {
            ab_code=5;
            a_bort(ab_code, x);
        }
        /* --- assign FROM --- */
        e_pos = sav_pi;
        parse_let();
        pi = e_pos;
        ch = p_string[pi];
        /* --- get long array index --- */
        nam_stack = ln_stack;
        max_vars = lmax_vars;
        f_ndx = get_varndx(varname);
        return f_ndx;
    }
    /*-------- end get_from ----------*/

As in the previous example:

    FOR count = start TO end STEP increment

the first task is to identify the variable which will be used as the loop counter. Under the heading: "get varname", function get_varname() is called and variable "varname" is assigned the name of the counter. Beneath that you will see this statement:

    if(ch != '%')       /* variable type must be: long */

What we are going to do (for the sake of simplicity) is to make the loop counter a variable of "type long". So, the corrected statement should read as follows:

    FOR count% = start
     long ---^


We could of course use integers, floats, or any and all types of variables, but to do so would add a degree of complexity. Not that it would be too complex to do; it's just that when you add more options, you add more code. So, for the sake of minimizing the amount of bloat, we will use "type long" variables. The advantage of type long, over type integer, is that longs have a higher upper limit. The disadvantage of type long, over type float, is that increments have to be in whole numbers. None of this is etched in stone; we can change this at any time, if we have a need to.

Under the heading: "assign FROM", you will see that function parse_let() is called to make the assignment of the start value to the loop counter:

    count% = start

Recycling existing code in this manner, by calling parse_let() in this case, can be an efficient means of constructing a new function. There is one problem though: parse_let() was not designed to return, to the calling function, any information about where in the array the value was stored. In other words, after calling parse_let(), we have no array[index] reference by which to access the counter. It is for this reason that, prior to calling parse_let(), we call get_varname(). At the very least, we will already know what the name of the counter variable is.

After returning from parse_let(), we can recycle another pre-existing function by calling get_varndx(), in the following statement:

    f_ndx = get_varndx(varname);

Function get_varndx() returns the array[index], in "f_ndx", of the variable name that we have provided it. Since we are only using "type long" variables, we provide get_varndx() with the information it needs so that it is looking in the correct variables stack array for "varname". So, by recycling three existing functions and writing a minimal amount of new code, get_From() returns to do_for() the array[index] of the counter variable.

The second step (in do_for()), under the heading "get TO", is to get the upper limit value. For this purpose, function get_To() is called and it returns a value in variable "Final".
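The idea of looking a variable up by name to recover its array index can be sketched on its own. This is only an illustration: find_var_index(), NAME_LEN and the layout of the names array are assumptions made for the sketch and are not the actual get_varndx() or its data structures.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 33     /* assumed name length, for this sketch only */

/* Hypothetical stand-in for a name lookup: returns the index of `name`
   in names[0..count-1], or -1 when the name is not present. */
int find_var_index(char names[][NAME_LEN], int count, const char *name)
{
    int i;

    assert(names != NULL && name != NULL);

    for (i = 0; i < count; i++) {
        if (strcmp(names[i], name) == 0)
            return i;               /* same slot indexes the value array */
    }
    return -1;                      /* not found */
}
```

With the index in hand, the caller can read or write the matching slot of a parallel value array, which is the role "f_ndx" plays for the loop counter.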

GET TO( ) :
Here is the code for function get_To():

    long get_To()
    {   char ch, varname[VAR_NAME];
        int pi, ab_code, x=line_ndx;
        long Final;

        pi = e_pos;
        ch = p_string[pi];
        /* --- get TO --- */
        if(ch != 'T')
        {
            strcpy(t_holder,"TO");
            ab_code=7;
            a_bort(ab_code,x);
        }
        else
        {
            strcpy(varname, get_varname());
            pi = e_pos;
            ch = p_string[pi];
            if(strcmp(varname, "TO") != 0)
            {
                strcpy(t_holder,"TO");
                ab_code=7;
                a_bort(ab_code,x);
            }
        }
        pi = iswhite(pi);
        ch = p_string[pi];
        e_pos = pi;
        /* --- get Final value --- */
        if(isalnum(ch))
        {
            Final = (long) get_avalue();
        }
        else                /* error: failed to find a value */
        {
            strcpy(t_holder,"TO");
            ab_code=15;
            a_bort(ab_code, x);
        }
        return Final;
    }
    /*-------- end get_to ----------*/

Despite its appearance, function get_To() does only one thing: it calls function get_avalue() (we will cover that in a moment) to return the "TO", or upper limit value, in this statement:

    Final = (long) get_avalue();

Since the "TO", or upper limit value,

    count = start TO end
                  ---^

is not optional, three error tests are made to confirm that a valid value is provided. The upper limit value is then returned to do_for().

The third step, under the heading of "get STEP", is to get the increment value and get_Step() is called for that purpose.


GET STEP( ) :
The STEP, or increment value, is optional.

    FOR count = start TO end STEP increment
                              ----^

In the above example, a test is made to determine if a Step value is even provided, with this code segment:

    if(ch == 'S')
    {
        Inc = get_Step();
    }
    [snip...]

Here is the code for get_Step():

    int get_Step()
    {   char ch, varname[VAR_NAME];
        int pi, Inc;
        int ab_code, x=line_ndx;
        char sav_ch=' ';

        strcpy(varname, get_varname());
        if(strcmp(varname, "STEP") != 0)
        {
            strcpy(t_holder, "STEP");
            ab_code=7;
            a_bort(ab_code,x);
        }
        pi = e_pos;
        pi = iswhiter(pi);
        ch = p_string[pi];
        /* --- get Inc value --- */
        if(IsAddop(ch))
        {
            sav_ch = ch;
            pi++;
            pi = iswhiter(pi);
            ch = p_string[pi];
        }
        e_pos = pi;
        if(isalnum(ch))
        {
            Inc = (int) get_avalue();
        }
        else                /* error: failed to find a value */
        {
            strcpy(t_holder,"STEP");
            ab_code=15;
            a_bort(ab_code, x);
        }
        if(sav_ch == '-')
        {
            Inc = (0 - Inc);
        }
        return Inc;
    }
    /*-------- end get_Step ----------*/

If a Step value is furnished, a simple test is made, by using function IsAddop(), to determine whether or not it is a negative value. Then, function get_avalue() is called to return the Step value in variable "Inc".

The final stage of function do_for() is the increment and decrement sections. They are almost identical, except that one is for incrementing the loop counter and the other does the opposite. Note that both sections add "Inc" to the counter, so a loop that counts down depends on a negative STEP value being supplied. You could say that this is the real "meat and potatoes" of the For-Next loop.

    [snip...]
    /* --- increment loop --- */
    if(From < Final)
    {
        for(; From <= Final; From += Inc)
        {
            next_tru = 1;
            line_ndx = (start + 1);     /* set pointer to: here+1 */
            while(next_tru == 1)
            {
                get_token();
                s_pos = 0;
                e_pos = 0;
                parser();
                if(token == 12)         /* encountered a: Next */
                {
                    next_tru = 0;
                    token = 0;
                }
                else
                {
                    line_ndx++;
                }
            }
        }
    }
    [snip...]

In the above, a C "for()" statement is utilized to execute the loop. The "for()" is provided with the variables "From", "Final" and "Inc". As shown in this code snippet:

    {
        next_tru = 1;
        line_ndx = (start + 1);     /* set pointer to: here+1 */
        while(next_tru == 1)

the program line index (line_ndx) is set so that it points to the program line that follows the FOR, as in this example:

    FOR abc% = zero TO ten STEP one
    PRINT "hello world."            <--- line_ndx
    [...]
    [...]
    NEXT abc%

Then, a "while" loop is used to take control of program execution, with the condition that "next_tru" is TRUE. In this case "1" is TRUE. What follows, you may recognize as closely resembling function pgm_parser(). The difference is that here, we want to keep looping until we encounter the "NEXT" token. You can see that get_token() and parser() are both called from inside the "while" loop. Until a token for "Next" is encountered, the line_ndx is incremented to point to the next program line and execution continues. Upon encountering the token for "Next", the "while(condition)" is set to FALSE, thus terminating this "while" loop. Variable "token" is also reset to "0" at that point, so that an enclosing For-Next loop does not mistake an inner "NEXT" for its own.

    while(next_tru == 1)
    {
        get_token();
        s_pos = 0;
        e_pos = 0;
        parser();
        if(token == 12)         /* encountered a: Next */
        {
            next_tru = 0;
            token = 0;
        }
        else
        {
            line_ndx++;         /* execute next program line */
        }
    }

At that point, the "for()" resumes control and variable "From" is incremented and tested.

    [snip...]
    {
        for(; From <= Final; From += Inc)
        {
            next_tru = 1;
            line_ndx = (start + 1);     /* set pointer to: here+1 */
            while(next_tru == 1)
            {
    [snip...]
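The interplay between the outer "for()" and the inner "while" can be tried out in isolation with a toy model. This is only a sketch under stated assumptions: the program is reduced to an array of byte codes, run_for() is a hypothetical name, executing a body line is replaced by a simple counter, and nested loops are left out.

```c
#include <assert.h>

/* Byte codes as used in the text: PRINT=4, FOR=11, NEXT=12. */
enum { TOK_PRINT = 4, TOK_FOR = 11, TOK_NEXT = 12 };

/* Hypothetical toy model: replays the lines that follow the FOR at index
   `start` once per counter value, stopping each pass at NEXT. Returns
   the total number of body lines "executed". */
int run_for(const int *tokens, int nlines, int start,
            long from, long final, long inc)
{
    int executed = 0;
    int line;

    assert(inc > 0 && tokens[start] == TOK_FOR);   /* counting up only */

    for (; from <= final; from += inc) {
        line = start + 1;                   /* first line after the FOR */
        while (line < nlines && tokens[line] != TOK_NEXT) {
            executed++;                     /* stand-in for parser() */
            line++;
        }
    }
    return executed;
}
```

For a four line program of FOR, PRINT, PRINT, NEXT with the counter running from 1 to 3, the two body lines are replayed three times.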

For as long as variable "From" has not passed variable "Final", "From" is incremented and the looping continues. Here is the code for function do_next():


    void do_next()
    {   int ab_code=16, x=line_ndx;

        if(fornxt_flg == 0)
        {
            a_bort(ab_code,x);
        }
    }
    /*-------- end do_next ---------*/

As you can see, do_next() is quite small. In fact, it is mostly symbolic: all it does is verify that the program is currently engaged in a For-Next loop, by testing variable "fornxt_flg", and generate an error if it isn't.

Make sure you understand how these functions work. Copy all of the above functions to file: Loops.c. Save and close it. Open file Prototyp.h and add to it this list of new prototypes:

    /* ----- function prototypes ----- */
    [snip...]
    /* Loops.c */
    void do_for(void);
    void do_next(void);
    int get_From(void);
    long get_To(void);
    int get_Step(void);
    [snip...]

Save and close Prototyp.h.

GET AVALUE( ) :
Here is the code for function get_avalue(), which I mentioned previously:


    double get_avalue()
    {   char ch, varname[VAR_NAME];
        int pi, si=0;
        double value=0;

        pi = e_pos;
        ch = p_string[pi];
        if(isalpha(ch))                 /* this is a variable name */
        {
            e_pos = pi;
            strcpy(s_holder, get_varname());
            value = get_varvalue();
            pi = e_pos;
        }
        else if((isdigit(ch)) || (IsAddop(ch)))    /* this is a number */
        {
            if(IsAddop(ch))
            {
                varname[si] = ch;
                si++;
                pi++;
                ch = p_string[pi];
            }
            while(isdigit(ch) != 0)
            {
                varname[si] = ch;
                pi++;
                si++;
                ch = p_string[pi];
            }
            varname[si] = '\0';
            value = atof(varname);      /* convert alpha to float */
        }
        pi = iswhite(pi);
        e_pos = pi;
        return value;
    }
    /*-------- end get_avalue --------*/

This function is a generic routine that returns a "type double" value for either a variable or a numeric character string. Copy this function to file Variable.c. Add the prototype for get_avalue() to the "Variable.c" section in file Prototyp.h:

    /* ----- function prototypes ----- */
    [snip...]
    /* Variable.c */
    [...]
    double get_avalue(void);
    [snip...]
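As listed, the digit loop in get_avalue() stops at the first character that is not a digit, so a literal such as 2.5 would be read as 2. If fractional literals were ever wanted, one possible approach is to let the standard library do the scanning. The sketch below is an assumption-laden illustration, not a replacement for get_avalue(): read_number() is a hypothetical name and it works on a plain string and position rather than on p_string and e_pos.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: reads a signed numeric literal starting at
   text[*pos] and advances *pos past it. When no number is found,
   *pos is left unchanged and 0.0 is returned. */
double read_number(const char *text, int *pos)
{
    char *end;
    double value;

    assert(text != NULL && pos != NULL && *pos >= 0);

    value = strtod(text + *pos, &end);  /* handles sign, digits, '.' */
    if (end == text + *pos)
        return 0.0;                     /* nothing was converted */
    *pos = (int)(end - text);
    return value;
}
```

Note that strtod() skips leading white space and also accepts exponent notation, which may or may not be desirable in a Basic dialect.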

Now we need to do the housekeeping chores of adding the declarations to the other files.


Open Bxbasic.c. Add this declaration for variable "fornxt_flg" to the "global vars" section:

    /* ------ global vars ------------ */
    [snip...]
    /**/
    int fornxt_flg=0;       /* For/Next: global flag */
    /**/

Update the "function includes" by adding "loops.c" to the list, as shown here:

    /* --- function includes --- */
    #include "prototyp.h"
    #include "error.c"
    #include "utility.c"
    #include "output.c"
    #include "variable.c"
    #include "input.c"
    #include "rdparser.c"
    #include "loops.c"
    [snip...]

Now, in function parser(), we need to add "case 11" and "case 12" for the FOR and NEXT tokens.

    void parser()
    [snip...]
        case 11:        /* FOR */
            do_for();
            break;
        case 12:        /* NEXT */
            do_next();
            break;
    [snip...]

Now save and close Bxbasic.c. As you may have guessed, function get_byte() in file Input.c has to be updated with the new byte codes as well. Add the lines of code for byte codes "11" and "12" to the list, as shown here:

    int get_byte(int ii)
    [snip...]
        /* --- assign byte code --- */
        if(strcmp(keyword, "REM") == 0)             byte=0;
        else if(strcmp(keyword, "LET") == 0)        byte=1;
        else if(strcmp(keyword, "CLEAR") == 0)      byte=2;
        else if(strcmp(keyword, "LOCATE") == 0)     byte=3;
        else if(strcmp(keyword, "PRINT") == 0)      byte=4;
        else if(strcmp(keyword, "GOTO") == 0)       byte=5;
        else if(strcmp(keyword, "BEEP") == 0)       byte=6;
        else if(strcmp(keyword, "CLS") == 0)        byte=7;
        else if(strcmp(keyword, "END") == 0)        byte=8;
        else if(strcmp(keyword, "GOSUB") == 0)      byte=9;
        else if(strcmp(keyword, "RETURN") == 0)     byte=10;
        else if(strcmp(keyword, "FOR") == 0)        byte=11;
        else if(strcmp(keyword, "NEXT") == 0)       byte=12;
        else
    [snip...]

Save and close file Input.c. We have added some new error codes that need to be incorporated into the error handler. Here is the current code listing for function a_bort():

    void a_bort(int code,int line_ndx)
    {
        beep();
        switch(code)
        {
            case 1:
                printf("Unspecified Program Name.\n");
                printf("Enter:\"bxbasic program_name.bas\"\n");
                printf("code(%d)\n",code);
                break;
            case 2:
                printf("Program file:\"%s\" not found.\n", t_holder);
                printf("Enter: \"bxbasic program_name.bas\"\n");
                printf("Program Terminated.\ncode(%d)\n", code);
                break;
            case 3:
                printf("\nSyntax error: in program line:");
                printf(" %d.\n%s",(line_ndx+1),p_string);
                printf("Keywords must be in UpperCase:\n");
                printf("code(%d)\n", code);
                break;
            case 4:
                printf("\nSyntax error: in program line:");
                printf(" %d.\n%s",(line_ndx+1),p_string);
                printf("Unknown Command.\ncode(%d)\n", code);
                break;
            case 5:
                printf("\nVariable Type error: in program line:");
                printf(" %d.\n%s",(line_ndx+1),p_string);
                printf("Type must be: Long \"%c\".\ncode(%d)\n",'%',code);
                break;
            case 6:
                printf("\nSyntax error: in program line:");
                printf(" %d.\n%s",(line_ndx+1),p_string);
                printf("No closing quotes.\ncode(%d)\n", code);
                break;
            case 7:
                printf("\nSyntax error: in program line:");
                printf(" %d.\n%s",(line_ndx+1),p_string);
                printf("Expected: \"%s\".\ncode(%d)\n",t_holder,code);
                break;
            case 8:
                printf("\nGOTO Error: no such label:");
                printf(" %s:\nin program line:",t_holder);
                printf(" %d:\nGOTO %s",(line_ndx+1),p_string);
                printf("Program Terminated\ncode(%d)\n", code);
                break;
            case 9:
                printf("\nSyntax error: in program line: %d.\n",(line_ndx+1));
                printf("%s", p_string);
                printf("Missing quotes.\ncode(%d)\n", code);
                break;
            case 10:
                printf("\nSyntax error: in program line: %d.\n",(line_ndx+1));
                printf("%s", p_string);
                printf("Use: LOCATE var_x, var_y: .\ncode(%d)\n", code);
                break;
            case 11:
                printf("\nSyntax error: in program line: %d.\n",(line_ndx+1));
                printf("%s", p_string);
                printf("Usage LET (variable assignment):\ncode(%d)\n", code);
                break;
            case 12:
                printf("\nExpected %s ",t_holder);
                printf(": in line: %d.\n", (line_ndx+1));
                printf("%scode(%d)\n", p_string, code);
                break;
            case 13:
                printf("\nVariable not found: in line: %d.\n", (line_ndx+1));
                printf("%scode(%d)\n", p_string, code);
                break;
            case 14:
                printf("\nInvalid operator: in line: %d.\n", (line_ndx+1));
                printf("%scode(%d)\n", p_string, code);
                break;
            case 15:
                printf("\nSyntax error: in line: %d.\n", (line_ndx+1));
                printf("%s {value} not found.\n%s", t_holder, p_string);
                printf("code(%d)\n", code);
                break;
            case 16:
                printf("\nFOR NEXT error: in line: %d.\n", (line_ndx+1));
                printf("NEXT without a FOR.\nNEXT %s", p_string);
                printf("code(%d)\n", code);
                break;
            default:
                printf("Program aborted, undefined error.");
                break;
        }
        exit(1);
    }
    /*-------------------------------*/

Copy this to file Error.c and save it. Now, compile Bxbasic.c. This should have compiled without error. If not, go back over the code and confirm that everything is where it should be. Assuming it did compile, try Bxbasic.exe with this new version of Test.bas:

    ' test.bas version 7.3
    CLS
    GOSUB TOP
    GOSUB Center
    GOSUB Bottom
    GOTO TheEnd
    '---------------------
    Center:
    FOR x% = 1 TO 5
    PRINT "*";
    FOR y% = 1 TO 28
    PRINT " ";
    NEXT y%
    PRINT "*":
    NEXT x%
    RETURN
    '---------------------
    TOP:
    Bottom:
    FOR x% = 1 TO 30
    PRINT "*";
    NEXT x%
    PRINT "":
    RETURN
    '---------------------
    TheEnd:
    END
    '---------------------

As you can see, this Test.bas makes use of GOSUBs as well as FOR/NEXT. It should generate a box made of stars in the upper left corner of the display, like this:

    ******************************
    *                            *
    *                            *
    *                            *
    *                            *
    *                            *
    ******************************

POWER & MODULUS
There are still two arithmetic operators that need to be included in the math expression parser. They are the "power-of" and "modulus" operators. The power-of operator's symbol is the "^" and it is used to raise a number to the power of another number. What that means is, the first number is used as a factor as many times as the value of the second number. Example:

    10 ^ 1      equates to: 10
    10 ^ 2      equates to: 10*10
    10 ^ 3      equates to: 10*10*10

ten to the power of 3 is:

    10 ^ 3 = 10*10*10 = 1000
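The repeated multiplication described above can be written out as a short C sketch. The name int_power() is hypothetical, and the sketch only covers whole, non-negative exponents; it is meant to illustrate the arithmetic, not to stand in for a general power function.

```c
#include <assert.h>

/* Hypothetical illustration: multiplies `base` into the result
   `exponent` times, so int_power(10.0, 3) is 10*10*10. */
double int_power(double base, int exponent)
{
    double result = 1.0;
    int i;

    assert(exponent >= 0);      /* whole, non-negative exponents only */

    for (i = 0; i < exponent; i++)
        result *= base;
    return result;
}
```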
The modulus operator's symbol in some languages (C in particular) is the "%", but in Basic, the abbreviation "MOD" is used. I suppose the reason why Basic doesn't use the "%" symbol is because that is the same symbol used to identify Integer variables. I guess that can get confusing. (Standard Basic does not appear to distinguish between integers and long integers). Modulus is used to return the remainder of a division problem. Example:

10 MOD 3 = 1
What is taking place in the above example can best be characterized as: the remainder of: 10/3 = 1 Modulus always returns the remainder of a division expression. For our purposes, the word MOD is rather cumbersome and unworkable. The way our expression parser is designed, it uses ascii characters as operators, such as "*/+-". So, on one hand, we have an operator symbol that is used as a variable type symbol and on the other hand we have a word operator where we need a character symbol. Fortunately, because of the way the expression parser was written, we can write an expression that will accept the "%" symbol as the modulus symbol and accept that same character as a variable type symbol, as long as we don't confuse the modulus operator for the long integer operator.


Example:

    varA = varB % varC

or:

    varA% = varB% % varC%

As shown in the first example, as long as there is a space between the variable name and the modulus symbol, it will be recognized as the modulus operator. In the second example, varB% will be identified as a type long integer and the "%" character will be correctly identified as the modulus symbol. Perhaps to reduce the number of blank spaces in source code, you will frequently see programmers write an expression like this:

    varA=varB+varC

or

    varA=varB*varC

If, however, a person were to write the following expression:

varA=varB%varC
This example would generate an error, because the % symbol, in this case, would incorrectly be identified as a type long integer symbol and not a modulus operator. Because the symbol is in direct contact with the variable name, it can only be interpreted in that way. Additionally, the parser would fail to locate an expression operator. The expression would not be interpreted as being valid. The same expression would have to be written as:

varA = varB % varC
to be correctly interpreted. In the case of the word MOD, if it were allowed to be used, this could help eliminate some of the confusion in writing expressions. Example:

    varA=varB MOD varC

or

    varA%=varB%MOD varC%

As long as the word MOD did not come in direct contact with either of the variable names, and a variable did not begin or end with the letters "MOD" this could be made to work. Of course, MOD would be a reserved word and no variable could be named MOD. Since our expression parser can not accept a word as an operator, as currently written, we would need to prescreen the expression and replace any occurrences of MOD with the % operator. To substitute the word MOD with the % operator at runtime could significantly slow program execution. This would need to be done during the "Input" phase of the program. Since the word MOD is three characters in length, each of those three characters would need to be changed. A suitable substitution would be the modulus symbol followed by two blank spaces. Example:

    varA = varB MOD varC
    varA = varB %   varC

This could be accomplished with a code segment looking something like this:
**(assume that variable "temp" is a character string and contains the word "MOD")


    if(strcmp(temp, "MOD") == 0)        /* compare(temp,"MOD") */
    {
        p_string[pi] = '%';
        pi++;
        p_string[pi] = ' ';
        pi++;
        p_string[pi] = ' ';
    }

This might be acceptable, but the problem with this is that at runtime, the parser would need to skip past those two extra blank spaces to reach the variable name that followed. This solution would add extra clock cycles to the execution time. A better solution might be to eliminate those extra clock cycles by squeezing those extra blank spaces out. What is required to do that is a routine that would in effect rewrite that entire line of source code. Example: instead of this:

    varA = varB MOD varC
    varA = varB %   varC

you would have this:

    varA = varB MOD varC
    varA = varB % varC

That would result in two fewer characters for the parser to have to identify and process. This could be accomplished with a routine similar to this:
**(assume that "si" points to the character "M" and "pi" points to the first character after the word MOD):

    if(strcmp(temp, "MOD") == 0)        /* compare(temp,"MOD") */
    {
        p_string[si] = '%';
        si++;
        while(pi < len)                 /* squeeze p_string */
        {
            p_string[si] = p_string[pi];
            pi++;
            si++;
        }
        p_string[si] = '\0';
    }
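The squeeze can also be tried out as a stand-alone function on an ordinary string. What follows is a minimal sketch, assuming the whole line is available in one writable buffer; squeeze_mod() is a hypothetical name, it uses memmove() instead of the copy loop, and it makes no attempt to skip over quoted strings.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch: replaces every stand-alone word MOD in `line`
   with '%' and closes the two character gap, in place. */
void squeeze_mod(char *line)
{
    size_t len, i;

    assert(line != NULL);
    len = strlen(line);

    for (i = 0; i + 3 <= len; i++) {
        /* MOD must not be glued to a longer name on either side */
        int starts_word = (i == 0) || !isalnum((unsigned char)line[i - 1]);
        int ends_word   = !isalnum((unsigned char)line[i + 3]);

        if (starts_word && ends_word && strncmp(line + i, "MOD", 3) == 0) {
            line[i] = '%';
            /* shift the tail, including the '\0', two places left */
            memmove(line + i + 1, line + i + 3, len - (i + 3) + 1);
            len -= 2;
        }
    }
}
```

Given "varA = varB MOD varC", the buffer ends up holding "varA = varB % varC", while a name such as MODEL is left alone.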

This would produce the desired effect.

Since many compilers display the number of lines compiled and the speedy performance of the compiler in terms of compile time, someone might be thinking that all of this just adds more work for the compiler to have to do and that it would slow down the compiler. Displaying compiler performance is nothing more than commercial advertising. I would like to make it clear that compiler performance is not important. Today, in fact, how long the compiler takes to compile a source file is of no consequence. What is important is performance at runtime. Anything that can be done to increase performance at runtime is worth consideration.

To enable these features, we obviously have to incorporate them into the math expression parser, but there is something to consider: operator precedence. In different languages, different operators have different degrees of precedence. In C, the "power of" operator is in fact a function called "pow()" and not a symbolic character. Function calls have a higher precedence than any operator. In Standard Basic, the "power of" symbol ("^") also has the highest precedence. In C, the modulus operator has the same precedence as multiply and divide; in Standard Basic, it is grouped between the Multi-Ops and the AddOps, but more closely associated with the Multi-Ops.

For our purposes, we will simplify this and group the "power of" and modulus operators with the Multi-Ops. This will give these operators the same precedence as multiply and divide. By doing it this way, we need only to add them to the "switch-case" block in Term(), in Rdparser.c. Example:

    switch(ch)
    {
        case '*':
            Match('*');
            Value = Value * Factor();
            break;
        case '/':
            Match('/');
            Value = Value / Factor();
            break;
        case '^':
            Match('^');
            Value = pow(Value, Factor());
            break;
        case '%':
            Match('%');
            Value = (int) Value % (int) Factor();
            break;
    }
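One consequence of giving all four operators the same precedence is worth seeing in numbers: an expression such as 2*3^2 is worked strictly left to right and comes out as 36, where conventional precedence would give 18. The sketch below illustrates that with a deliberately tiny evaluator. It is an assumption-heavy toy: eval_term() is a hypothetical name, operands are single digits, and it uses integer arithmetic rather than the doubles used by Term().

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical toy: evaluates single-digit operands joined by * / ^ %
   strictly left to right, the way one loop with a single precedence
   level would treat them. */
long eval_term(const char *s)
{
    long value, rhs, result;
    char op;

    assert(isdigit((unsigned char)*s));
    value = *s++ - '0';

    while (*s != '\0') {
        op = *s++;
        assert(isdigit((unsigned char)*s));
        rhs = *s++ - '0';
        switch (op) {
        case '*': value *= rhs; break;
        case '/': assert(rhs != 0); value /= rhs; break;
        case '%': assert(rhs != 0); value %= rhs; break;
        case '^':                       /* repeated multiplication */
            for (result = 1; rhs > 0; rhs--)
                result *= value;
            value = result;
            break;
        default:
            assert(!"unknown operator");
        }
    }
    return value;
}
```

Parentheses remain the way to force a different grouping, for example writing 2*(3^2) when 18 is the intended result. The asserts on a zero right-hand side also point at something to keep in mind for the "/" and "%" cases in general.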

There are numerous other small changes that will have to be made for all of this to take effect. The largest is the addition of the new function for handling the modulus word-to-symbol conversion. Let's call this new function get_MOD(). Here is the code :

    void get_MOD(int pi)
    {   char ch, temp[VAR_NAME];
        int i, si=0, xi, len;

        len = strlen(p_string);
        pi++;
        pi = iswhite(pi);
        ch = p_string[pi];
        while( pi < (len - 2))
        {
            if(ch == 'M')                       /* is this a MOD */
            {
                si = pi;
                i = 0;
                while(isalnum(ch))
                {
                    temp[i] = ch;
                    pi++;
                    i++;
                    ch = p_string[pi];
                }
                temp[i] = '\0';
                if(strcmp(temp, "MOD") == 0)    /* compare(temp,"MOD") */
                {
                    p_string[si] = '%';
                    si++;
                    xi = si;                    /* save si pointer */
                    while(pi < len)             /* squeeze p_string */
                    {
                        p_string[si] = p_string[pi];
                        pi++;
                        si++;
                    }
                    p_string[si] = '\0';
                    pi = xi;                    /* restore pi */
                }
                pi++;
            }
            else
            {
                pi++;
            }
            ch = p_string[pi];
        }
    }
    /*---------- end get_MOD ---------*/

Open file: Input.c. Copy this code to that file. Now we have to add the calls to get_MOD(). In function tmp_byte(), amend this segment to read as follows, by adding the call to get_MOD():

    void tmp_byte(int ii)
    [snip...]
        else if(isalpha(ch))            /* a possible assignment */
        {
            si = pi;                    /* save pointer position */
            while(isalnum(ch))
            {
                pi++;
                ch = p_string[pi];
            }
            pi = iswhite(pi);
            ch = p_string[pi];
            if(strchr("=#%!$", ch))
            {
                byte = 1;               /* a variable assignment */
    /*-->*/     get_MOD(pi);            /* scan for a MOD expression */
                pi = si;
            }
            else
            {
                a_bort(ab_code, x);     /* not an assignment */
            }
        }
    [snip...]

Two changes have to be made to function get_byte(). The first is for the LET byte code and the second is in the final "else" block, as shown. Make these changes:

    int get_byte(int ii)
    {
    [snip...]
        /* --- assign byte code --- */
        if(strcmp(keyword, "REM") == 0)         byte=0;
        else if(strcmp(keyword, "LET") == 0)
        {
            byte=1;
            get_MOD(pi);        /* scan for a MOD expression */
        }
    [snip...]
        else
        {
            pi = iswhite(pi);
            ch = p_string[pi];
            if(strchr("=#%!$", ch))
            {
                byte = 1;       /* a variable assignment */
                get_MOD(pi);    /* scan for a MOD expression */
                pi = e_pos;     /* push pointer back */
            }
    [snip...]

Save Input.c and close it. Open file Rdparser.c. Copy this new code for Term():

    double Term()       /* Parse and Translate a Math Term */
    {   char ch;
        int pi;
        double Value, power;

        Value = Factor();
        pi = e_pos;
        ch = p_string[pi];
        while(IsMultop(ch))
        {
            switch(ch)
            {
                case '*':
                    Match('*');
                    Value = Value * Factor();
                    break;
                case '/':
                    Match('/');
                    Value = Value / Factor();
                    break;
                case '^':
                    Match('^');
                    Value = pow(Value, Factor());
                    break;
                case '%':
                    Match('%');
                    Value = (int) Value % (int) Factor();
                    break;
                default:
                    break;
            }
            pi = e_pos;
            ch = p_string[pi];
        }
        return Value;
    }
    /*-------------------------------*/

Save Rdparser.c and close it. The pow() function used by the new "power of" operator is declared in the math header file. Open Bxbasic.c. Add the "include" for the math header file:

    /* --- declare headers --- */
    #include <stdio.h>
    #include <conio.h>
    #include <io.h>
    #include <stdlib.h>
    #include <ctype.h>
    #include <string.h>
    #include <malloc.h>
    #include <math.h>

Save Bxbasic.c and close it. Open file Prototyp.h and add the prototype for get_MOD() to the list under the heading for Input.c:

    /* ----- function prototypes ----- */
    [snip...]
    /* Input.c */
    void line_cnt(char *argv[]);
    void load_src(void);
    void save_tmp(void);
    void tmp_byte(int);
    void loader_1(void);
    void tmp_label(int);
    void tmp_byte(int);
    int get_byte(int);
    void tmp_prog(int);
    void loader_2(void);
    void get_MOD(int);
    [snip...]

Save Prototyp.h and close it. Now compile Bxbasic.c. With that done, our parser will now evaluate "power of" expressions that use the "^" symbol and modulus expressions that use either the "%" symbol or the MOD keyword. Now try this version of Test.bas:

    ' test.bas version 7.4
    GOSUB Start1
    END
    ' ------------------------------------------
    Start1:
    CLS
    power = 10 ^ 2
    PRINT "power="; power
    '
    ten = 10
    three = 3
    ' we can use the modulus operator: %
    mod = ten % three
    PRINT "mod="; mod
    ' or we can use the keyword: MOD
    mod = ten MOD (30 MOD 9)
    PRINT "mod="; mod
    RETURN


COMPILER UPDATE
There is just one minor addition to be made to Bxcomp.c. Add the declaration for VAR_NAME to the constants list, as shown below:

    /* --- declare constants --- */
    #define BUFSIZE 256
    #define PATH 81
    #define TOKEN_LEN 21
    #define LLEN 33
    #define VAR_NAME 33

ENGINE UPDATE
There are several additions needed to be made to Engine.c. Beginning at the top, add the "include" for the math header ("math.h") as shown here:

    /* --- declare headers --- */
    #include <stdio.h>
    #include <conio.h>
    #include <io.h>
    #include <stdlib.h>
    #include <ctype.h>
    #include <string.h>
    #include <malloc.h>
    #include <math.h>
    [snip...]

Add these variables to the "global vars" list:

    /* ------ global vars ------------ */
    [snip...]
    /**/
    int *gosub_stack;       /* Gosub: stack array */
    int gs_ptr=-1;          /* Gosub: stack position pointer */
    /**/
    int fornxt_flg=0;       /* For/Next: global flag */
    /**/
    [snip...]

The prototypes for do_gosub() and do_return() have to be added to the function prototypes list:

    /* ----- function prototypes ----- */
    void load_bas1(void);
    void pgm_parser(void);
    void get_token(void);
    void parser(void);
    void go_to(void);
    void do_gosub(void);
    void do_return(void);
    [snip...]

The "include" for file: Loops.c needs to be added to the includes list:

    /* --- function includes --- */
    #include "prototyp.h"
    #include "error.c"
    #include "utility.c"
    #include "output.c"
    #include "variable.c"
    #include "enginput.c"
    #include "rdparser.c"
    #include "loops.c"
    [snip...]

Make the additions to parser() so that it reads as follows:

    void parser()
    {   int ab_code=4, x=line_ndx;

        switch(token)
        {
            case 1:         /* LET */
                parse_let();
                break;
            case 2:         /* CLEAR */
                clr_vars();
                break;
            case 3:         /* LOCATE */
                locate();
                break;
            case 4:         /* PRINT */
                parse_print();
                break;
            case 5:         /* GOTO */
                go_to();
                break;
            case 6:         /* BEEP */
                beep();
                break;
            case 7:         /* CLS */
                cls();
                break;
            case 8:         /* END */
                printf("\nEnd of Program\n");
                line_ndx = nrows;
                break;
            case 9:         /* GOSUB */
                do_gosub();
                break;
            case 10:        /* RETURN */
                do_return();
                break;
            case 11:        /* FOR */
                do_for();
                break;
            case 12:        /* NEXT */
                do_next();
                break;
            case -1:        /* block label */
                break;
            default:
                a_bort(ab_code, x);
                break;
        }
    }
    /*-------------------------------*/

Function go_to() needs only one small adjustment, near the end. Make sure it reads exactly like this:

    void go_to()
    {   char ch;
        char goto_label[LLEN];
        int pi, si=0, ab_code=8;
        int xtest, stlen, x=line_ndx;

        pi = e_pos;
        pi = iswhite(pi);
        ch = p_string[pi];
        while(isalnum(ch))
        {
            goto_label[si] = ch;
            pi++;
            si++;
            ch = p_string[pi];
        }
        goto_label[si] = '\0';      /* add string terminator */
        pi = -1;
        /* now compare gtl_holder[] to array2[n] */
        xtest = -1;
        while(xtest != 0)
        {
            pi++;
            if(pi == nrows)
            {
                strcpy(t_holder, goto_label);
                a_bort(ab_code, x); /* error, label not found */
            }
            xtest = strcmp(label_nam[pi], goto_label);
        }
        pi--;
        line_ndx = pi;              /* set line_ndx to the goto_line */
    }
    /*-------------------------------*/

Copy these functions for do_gosub() and do_return() to Engine.c:

    void do_gosub()
    {   unsigned size;

        gs_ptr++;
        if(gs_ptr == 0)
        {
            size = 1;
            gosub_stack = malloc(size * sizeof(int));
        }
        else
        {
            size = (gs_ptr + 1);
            gosub_stack = realloc(gosub_stack, size * sizeof(int));
        }
        gosub_stack[gs_ptr] = line_ndx;
        go_to();
    }
    /*------- end do_gosub ----------*/

    void do_return()
    {   unsigned size;

        if(gs_ptr < 0)
        {
            return;                             /* return w/no "Gosub" */
        }
        else if(gs_ptr == 0)
        {
            line_ndx = gosub_stack[gs_ptr];     /* restore line number */
            free(gosub_stack);
        }
        else
        {
            line_ndx = gosub_stack[gs_ptr];     /* restore line number */
        }
        gs_ptr--;                               /* decrement stack pointer */
    }
    /*------- end do_return ----------*/
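The grow-on-push, free-when-empty pattern used by do_gosub() and do_return() can be exercised outside the engine. The sketch below is one possible way to package it, not the engine's own code: ReturnStack, rs_push() and rs_pop() are hypothetical names, and unlike the listing above it checks the result of realloc() before using it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrapper around a growable stack of return lines.
   `top` is -1 while the stack is empty, mirroring gs_ptr. */
typedef struct {
    int *items;
    int top;
} ReturnStack;

/* Pushes a line number; returns 1 on success, 0 if memory ran out. */
int rs_push(ReturnStack *rs, int line)
{
    int *grown;

    assert(rs != NULL);
    /* realloc(NULL, n) behaves like malloc(n), so one call covers both */
    grown = realloc(rs->items, (size_t)(rs->top + 2) * sizeof *grown);
    if (grown == NULL)
        return 0;
    rs->items = grown;
    rs->items[++rs->top] = line;
    return 1;
}

/* Pops into *line; returns 0 for a RETURN without a GOSUB. */
int rs_pop(ReturnStack *rs, int *line)
{
    assert(rs != NULL && line != NULL);
    if (rs->top < 0)
        return 0;
    *line = rs->items[rs->top--];
    if (rs->top < 0) {              /* last entry gone: release memory */
        free(rs->items);
        rs->items = NULL;
    }
    return 1;
}
```

A stack declared as `ReturnStack rs = { NULL, -1 };` starts out empty; nested GOSUBs push their line numbers and each RETURN pops the most recent one.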

I think that's it. Make all the above changes and additions to Bxcomp.c and Engine.c. Save Bxcomp.c and Engine.c and close them. Now re-compile Bxcomp.c and Engine.c. With Bxcomp.exe, compile and test the different versions of Test.bas used in this chapter.

    ' test.bas version 7.5
    GOSUB Start1
    GOSUB DoubleFloat
    GOSUB LongIntegers
    GOSUB RDParser
    GOSUB PrintVars
    GOSUB ClearVars
    GOTO TheEnd
    ' ------------------------------------------
    Start1:
    CLS
    Hello$ = "hello world!"
    PRINT Hello$
    RETURN
    ' ------------------------------------------double float
    DoubleFloat:
    LET xylophone# = 50.3
    LET yazoo# = 101.25
    LET abc = yazoo#/xylophone#
    xyz = yazoo#/10
    LOCATE abc , xyz
    PRINT Hello$
    RETURN
    ' ------------------------------------------long integers
    LongIntegers:
    quasar% = 2
    zapp% = 4
    abc% = (quasar% * quasar% * zapp% + zapp%)/5
    xyz% = ((quasar% * quasar%) * zapp%) + zapp%
    LOCATE abc%, xyz%
    PRINT Hello$
    RETURN
    ' ------------------------------------------
    RDParser:
    abc = 2*(3+4)*5/10
    PRINT "": " 2*(3+4)*5/10 ="; abc
    RETURN
    ' ------------------------------------------
    PrintVars:
    abc = 11100
    xyz = 32000
    abc% = 33000
    xyz% = 99000
    abc! = 33000.33
    xyz! = 99000.47
    abc# = 333000.33
    xyz# = 999000.47
    ' integers
    PRINT "": "abc="; abc: "xyz="; xyz
    ' long integers
    PRINT "abc%="; abc%: "xyz%="; xyz%
    ' float
    PRINT "abc!="; abc!: "xyz!="; xyz!
    ' double
    PRINT "abc#="; abc#: "xyz#="; xyz#
    RETURN
    ' ------------------------------------------
    ClearVars:
    test$ = "test"
    CLEAR abc, xyz, INT, abc%, xyz%, LNG, abc!, xyz!, FLT
    CLEAR abc#, xyz#, DBL, test$, STR
    CLEAR
    RETURN
    ' ------------------------------------------
    TheEnd:
    END
    ' ------------------------------------------

##################################################

' test.bas version 7.6
    GOSUB Start1
    GOTO TheEnd
' ------------------------------------------
Start1:
    CLS
    Hello$ = "hello world!"
    PRINT Hello$
    GOSUB DoubleFloat
    RETURN
' ------------------------------------------ double float
DoubleFloat:
    LET xylophone# = 50.3
    LET yazoo# = 101.25
    LET abc = yazoo#/xylophone#
    xyz = yazoo#/10
    LOCATE abc , xyz
    PRINT Hello$
    GOSUB LongIntegers
    RETURN
' ------------------------------------------ long integers
LongIntegers:
    quasar% = 2
    zapp% = 4
    abc% = (quasar% * quasar% * zapp% + zapp%)/5
    xyz% = ((quasar% * quasar%) * zapp%) + zapp%
    LOCATE abc%, xyz%
    PRINT Hello$
    GOSUB RDParser
    RETURN
' ------------------------------------------
RDParser:
    abc = 2*(3+4)*5/10
    PRINT "": " 2*(3+4)*5/10 ="; abc
    GOSUB PrintVars
    RETURN
' ------------------------------------------
PrintVars:
    abc = 11100
    xyz = 32000
    abc% = 33000
    xyz% = 99000
    abc! = 33000.33
    xyz! = 99000.47
    abc# = 333000.33
    xyz# = 999000.47
' integers
    PRINT "": "abc="; abc: "xyz="; xyz
' long integers
    PRINT "abc%="; abc%: "xyz%="; xyz%
' float
    PRINT "abc!="; abc!: "xyz!="; xyz!
' double
    PRINT "abc#="; abc#: "xyz#="; xyz#
    GOSUB ClearVars
    RETURN
' ------------------------------------------
ClearVars:
    test$ = "test"
    CLEAR abc, xyz, INT, abc%, xyz%, LNG, abc!, xyz!, FLT
    CLEAR abc#, xyz#, DBL, test$, STR
    CLEAR
    RETURN
' ------------------------------------------
TheEnd:
    END
' ------------------------------------------
##################################################

' test.bas version 7.7
    CLS
    GOSUB TOP
    GOSUB Center
    GOSUB Bottom
    GOTO TheEnd
'---------------------
Center:
    FOR x% = 1 TO 5
        PRINT "*";
        FOR y% = 1 TO 28
            PRINT " ";
        NEXT y%
        PRINT "*":
    NEXT x%
    RETURN
'---------------------
TOP:
Bottom:
    FOR x% = 1 TO 30
        PRINT "*";
    NEXT x%
    PRINT "":
    RETURN
'---------------------
TheEnd:
    END
'---------------------
##########################

' test.bas version 7.8
    GOSUB Start1
    END
' ------------------------------------------
Start1:
    CLS
    power = 10 ^ 2
    PRINT "power="; power
'
    ten = 10
    three = 3
' we can use the modulus operator: %
    mod = ten % three
    PRINT "mod="; mod
' or we can use the keyword: MOD
    mod = ten MOD (30 MOD 9)
    PRINT "mod="; mod
    RETURN
' ------------------------------------------

CONCLUSION
Our vocabulary has grown slightly. Here's how it looks:
• BEEP    :produces a sound
• CLS     :clear display
• END     :terminate program
• LET     :variable assignments
• CLEAR   :erase variables
• LOCATE  :display positioning
• PRINT   :display
• GOTO    :program control
• GOSUB   :call a subroutine
• RETURN  :return from a subroutine
and we have expanded our math expression capabilities with the "power of" and modulus operators. There's still more to come.


CHAPTER - 8
INTRODUCTION
In this chapter I'd like to take on what I think is perhaps the most difficult addition to our compiler thus far: the parsing and execution of conditional expressions, IF, THEN and ELSE. Before we get started with the coding, there are some issues that need to be discussed first. Some of these issues include:
• the different forms of the IF construct in its usage,
• executing multiple statements on a single line,
• evaluating complex expressions and logical operators,
• recursion and nested levels of recursion.
These factors will affect how we will implement the IF construct.

IF-THEN-ELSE:
In the early days of PCs, there were very few programming languages available. You had low-level languages like Machine Language (which required a 4 year degree in cryptology to understand how to use it) and Assembly Language, and a few high level languages, among them Basic. Just as now, there were many flavors of the Basic programming language. Just about every personal computer available came with a dialect of Basic bundled along with the operating system. Although Standard Basic existed, almost no one used it. Every PC vendor provided their own version or subset of the language, hence there was no real standard that everyone followed.

Among the most widely available Basics was MS-Basic from Microsoft. In fact, Basic is where Microsoft got its start. In those early days, before venturing into operating systems, Microsoft specialized in porting Basic to the most popular CPUs and microcomputers then in use. Microsoft became the main source of Basic for PC manufacturers and vendors. It's no surprise that Microsoft's various dialects and incarnations of Basic became the de facto standard for the Basic language.

In versions of Basic such as MS-Basic and GW-Basic, the IF construct took on the general form of a single line statement:

10 ...
20 IF (expression) THEN (statement) ELSE (statement)
30 ...

In this example, if the expression is true, the statement following the keyword "THEN" is executed. Then, normal program execution resumes on the following line, (line 30). If the expression is false, the program resumes with the statement following the "ELSE". Beginning with QuickBasic, the IF statement was expanded to more closely resemble the IF construct then in use by other languages:


...the multi-line statement:

10 ...
20 IF (expression) THEN
30     (statement block)
40     (...)
50 ELSE
60     (statement block)
   ...

In the multi-line statement, if the expression is true, execution may begin on the next line down (line 30) and execute a block of statements, up to the "ELSE" keyword.

...alternately:

IF (expression) THEN
    (statement block)
ELSE IF (expression) THEN
    (statement block)
ELSE
    (statement block)
ENDIF

This latest form included the additions of ELSE IF and ENDIF, where the IF statement is terminated by the "ENDIF" keyword. As a comparison, in C, the IF statement has a more simplified appearance:

if(expression)
    (statement);
else
    (statement);

Or:

if(expression)
{
    (statement block)
    (...)
}
else
{
    (statement block)
    (...)
}

As you can see, there is no THEN keyword, nor is there an ENDIF. The reason for this is in the way that C is structured, the operative word being "structured". The keyword "THEN" is not incorporated; it is implied. THEN may be implied in one of two ways:
a) the expression may be immediately followed by a single statement,
b) or by a statement block enclosed in a pair of left and right curly braces, "{ }".
Both a) and b) also apply to the "ELSE" keyword.


The "ENDIF" is implied by the final right curly brace "}" of the "else" statement block. Example:

                     if(expression)
implied THEN  -->    {
                     }
                     else
                     {
implied ENDIF -->    }

MULTIPLE STATEMENTS:
Incorporating the conditional IF statement is not simple, and accommodating Basic's different flavors of it can be somewhat difficult. What makes it difficult is the diversity of the two forms shown. Enabling both types of statement requires a piece of code to differentiate one form from the other and then separate pieces of code to interpret each of the two forms. Another difficulty concerns executing multiple statements on a single line. Some languages allow multiple statements on a single line, such as:

10 CLS: PRINT: PRINT "hello world"

and of course, Basic is one of them. On the face of it, this doesn't seem so bad, but it creates the impression that fewer lines of code result in tighter or more efficient code. That would be a false conclusion. In our case, where each statement is tokenized for the purposes of compiling and improving performance, having multiple statements on a single program line does not work very well. In our implementation, only one keyword and statement are allowed per program line. This is because each program array element contains only one token. Example:

Basic statement:          byte code:    statement:
PRINT "hello world"       [4]           ["hello world"]

If the program parser is reading and interpreting a program in a linear fashion, where newline characters have no special meaning and the program is read from beginning to end and newlines are merely skipped over, then multiple statement lines are not a disadvantage. However, that does not serve a useful purpose in our implementation. In the multiple statement example shown above, we would have to pre-scan the source code and mechanically reduce each source code line into single statement lines, prior to compiling.


Example:
Basic multi-statement line:

10 CLS: PRINT: PRINT "hello world"

would have to be reduced to:

                          byte code:    statement:
CLS                       [7]           []
PRINT:                    [4]           [:]
PRINT "hello world"       [4]           ["hello world"]
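The pre-scan described above, which reduces a multi-statement line to single statements, could be sketched roughly as follows. This is an illustration only and not part of Bxcomp.c: next_statement() and MAX_STMT are hypothetical names. It assumes that a colon separates statements unless it sits inside a double quoted string, and, unlike the table above, it drops the separating colon instead of keeping it with the statement.

```c
#include <stddef.h>

#define MAX_STMT 64   /* assumed upper bound on one statement's length */

/* Hypothetical pre-scan helper: copy the statement that starts at
   line[*pos] into out[], stopping at a ':' that is outside double
   quotes or at the end of the line. Returns 1 if a statement was
   copied, 0 when the line is exhausted. */
int next_statement(const char *line, size_t *pos, char out[MAX_STMT])
{
    size_t n = 0;
    int in_quotes = 0;

    while (line[*pos] == ' ') {              /* skip leading blanks */
        (*pos)++;
    }
    if (line[*pos] == '\0') {
        return 0;
    }
    while (line[*pos] != '\0') {
        char ch = line[*pos];

        if (ch == '"') {
            in_quotes = !in_quotes;          /* entering or leaving a string */
        } else if (ch == ':' && !in_quotes) {
            (*pos)++;                        /* consume the separator */
            break;
        }
        if (n < MAX_STMT - 1) {              /* silently truncate overlong text */
            out[n++] = ch;
        }
        (*pos)++;
    }
    while (n > 0 && out[n - 1] == ' ') {     /* trim trailing blanks */
        n--;
    }
    out[n] = '\0';
    return 1;
}
```

Starting with pos set to 0 and calling next_statement() repeatedly on the example line yields CLS, then PRINT, then PRINT "hello world", after which the function returns 0.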

The same would have to apply to single line IF statements:
Basic statement:

IF(expression) THEN PRINT "hello world" ELSE PRINT "good bye"

would have to be reduced to:

                          byte code:    statement:
IF (expression)           [17]          [expression]
THEN                      [18]          []
PRINT "hello world"       [4]           ["hello world"]
ELSE                      [19]          []
PRINT "good bye"          [4]           ["good bye"]

A more complex example would be:

IF (expression) THEN CLS:PRINT "hello world":GOTO 100 ELSE (statement)

Since we are defining our own language, in both of these cases it would just be easier to make it a requirement, as part of the language definition, that only one statement per line be allowed and IF statements must conform to that requirement as well. Therefore, IF statements should take the form of:

IF (expression) THEN
    CLS
    PRINT
    PRINT "hello world"
    GOTO 100
ELSE
    (statement block)
    (...)


Notice I didn't move the THEN keyword down to the line below the IF. In practice, "THEN" should be implied just like in C, so we really don't need it at all. But, for some small bit of conformity, we can keep the THEN at the end of the expression line as a sort of delimiter, even though we know it really serves no purpose. The logic is as follows:
1) if expression is True:
   a) begin execution on next line.
2) if expression is False:
   a) advance to ELSE token,
   b) begin execution on the line that follows.

Doing things this way makes the language cleaner and less confusing to read and debug.

ELSEIF:
I also like the usage of the ELSEIF keyword. I think an ELSEIF token is much more dynamic than the combined "ELSE" and "IF" tokens. When implementing and tokenizing this "ELSE" "IF" statement:

IF (expression) THEN
    PRINT "hello"
ELSE IF (expression) THEN
    PRINT "good bye"
ELSE
    PRINT "gone"

it would have to be reduced to:

                             byte code:    statement:
1    IF(expression)          [17]          [expression]
2    PRINT "hello"           [4]           ["hello"]
3    ELSE                    [19]          []
4    IF(expression)          [17]          [expression]
5    PRINT "good bye"        [4]           ["good bye"]
6    ELSE                    [19]          []
7    PRINT "gone"            [4]           ["gone"]

This looks a bit confusing, because it appears that you have two ELSE's. In theory, if an expression is False, the program pointer should be advanced to the ELSE in the IF/ELSE construct. In the above example, that should be line six, but, it looks like it could be line three. Also, the question arises, is the ELSE in line six associated with the IF in line one, or the IF in line four ?


In order to use the "ELSE IF" form, a pre-scanner would have to look ahead, past the ELSE, to see if it is followed by an IF, to correctly interpret the meaning. Simply changing ELSE IF to the single keyword ELSEIF, and giving it its own unique token, is so much easier to deal with. Thus, the ELSEIF keyword and token would look more like:

ELSEIF (expression) THEN

                          byte code:    statement:
IF(expression)            [17]          [expression]
PRINT "hello"             [4]           ["hello"]
ELSEIF(expression)        [19]          [expression]
PRINT "good bye"          [4]           ["good bye"]
ELSE                      [20]          []
PRINT "gone"              [4]           ["gone"]
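For comparison, here is roughly what the look-ahead for the two-word "ELSE IF" form involves. It is a sketch under assumptions, not code from Bxcomp.c: classify_else() and the KW_ constants are hypothetical names, keywords are assumed to be upper case, and the single-word ELSEIF is accepted as well so that both spellings map to the same result.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical classification results; not Bxbasic token numbers. */
enum { KW_OTHER = 0, KW_ELSE, KW_ELSEIF };

/* True if s begins with the word IF (and not IFFY, IF2, ...). */
static int starts_with_if(const char *s)
{
    return strncmp(s, "IF", 2) == 0 && !isalnum((unsigned char)s[2]);
}

/* Decide whether a source line opens with ELSE, with ELSE IF / ELSEIF,
   or with something else entirely. */
int classify_else(const char *line)
{
    while (*line == ' ' || *line == '\t') {
        line++;                               /* skip indentation */
    }
    if (strncmp(line, "ELSE", 4) != 0) {
        return KW_OTHER;
    }
    line += 4;
    if (starts_with_if(line)) {
        return KW_ELSEIF;                     /* one word: ELSEIF */
    }
    if (isalnum((unsigned char)*line)) {
        return KW_OTHER;                      /* e.g. a name such as ELSEWHERE */
    }
    while (*line == ' ' || *line == '\t') {
        line++;                               /* the look-ahead past ELSE */
    }
    return starts_with_if(line) ? KW_ELSEIF : KW_ELSE;
}
```

With a dedicated ELSEIF keyword the second scan disappears: the keyword table alone decides which token a line receives.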

In this example, ELSEIF has its own individual token, (19). As you can see in the above, the token for ELSE really does nothing. It performs no function. It's more of a "label", or a "sign post", that indicates where the THEN statement ends and where in the program to jump to if the Boolean expression is false. Using this method can actually offer an improvement in performance and is easier to understand. By using this construct, an IF statement-block would have:
a) only one IF token,
b) unlimited ELSEIF tokens,
c) only one ELSE token.
such as:

IF (expression) THEN
    (statement block)
ELSEIF (expression) THEN
    (statement block)
ELSEIF (expression) THEN
    (statement block)
ELSEIF (expression) THEN
    (statement block)
ELSE
    (statement)


ENDIF:
The only thing missing in the above is an "ENDIF" keyword. The reason for having an ENDIF keyword is not immediately apparent, but it is an important addition. It has a lot to do with "nested IF's" and the ELSE. Nested IF's are IF expressions that are within the statement block following an IF, ELSEIF or an ELSE. Here is an example of a nested IF without an ENDIF keyword:

1    IF (expression-1) THEN
2        IF (expression-2) THEN
3            (statement block)
4    ELSEIF (expression-3)
5        (statement block)
6    ELSE
7        (statement block)

Despite its appearance, this could be a valid construct. The only problem with it is that there is no delimiter to signify the ending point of the nested IF, that's expression-2 on line 2. As a side note, recall that indenting a program statement is only symbolic and makes the source code easier to read; it means nothing to the compiler. Just because a statement appears indented does not mean that it will be interpreted that way by the compiler. How do we know that the compiler won't misinterpret the above statements as this:

1    IF (expression-1) THEN
2        IF (expression-2) THEN
3            (statement block)
4        ELSEIF (expression-3)
5            (statement block)
6        ELSE
7            (statement block)

Now, consider the above construct written in C:

 1    if(expression-1)
 2    {
 3        if(expression-2)
 4        {   (statement)
 5        }
 6    }
 7    else if(expression-3)
 8    {   (statement)
 9    }
10    else
11    {   (statement)
12    }

As you can see, C uses the curly braces as delimiters, (i.e.: where something begins or ends). It's not too difficult to see that the nested IF, within the curly braces of expression-1, is a completely separate and valid construct. If expression-1 is True, then expression-2 is evaluated. In the overall picture, whether or not expression-2 is True or False doesn't matter. After the evaluation and subsequent execution of expression-2, normal program execution resumes on line 12, the line following the right curly brace "}" after the ELSE.

 1    if(expression-1)
 2    {
 3        if(expression-2)
 4        {   (statement)
 5        }
 6    }
 7    else if(expression-3)
 8    {
 9    }
10    else
11    {
12    }                        <-- jump here

Basic uses no delimiters such as curly braces. Our only delimiter so far is the THEN keyword. To make the Basic construct work properly we need a delimiter, a way of telling the compiler what we mean. Example:

IF (expression-1) THEN
    IF (expression-2) THEN
        (statement block)
    ENDIF
ELSEIF (expression-3)
    (statement block)
ELSE
    (statement block)
ENDIF

Notice that in this version, there are two ENDIF's, one after the statement block in expression-2 and one after the statement block in ELSE. If we assume that expression-1 is True, after evaluating expression-2, the program pointer has to point to the ending point, beyond the last line in the statement block for the ELSE. Without the terminating ENDIF, consider the following:

IF (expression-1) THEN                  =True
    IF (expression-2) THEN
        (statement block)
    ENDIF
ELSEIF (expression-3)
    (statement block)
ELSE
    PRINT "hello"
    PRINT "world"
GOSUB DoSomething

This example moves the program pointer to the second line after the ELSE. This would be incorrect. We have to make a presumption that there is the possibility that there might be more than just one program statement after the ELSE. We can't just jump to the second line as in the example above.


Now consider the addition of the ENDIF, in the following:

IF (expression-1) THEN                  =True
    IF (expression-2) THEN
        (statement block)
    ENDIF
ELSEIF (expression-3)
    (statement block)
ELSE
    PRINT "hello"
    PRINT "world"
ENDIF
GOSUB DoSomething

Without the ENDIF we would have no way of executing a statement block after the ELSE. We would be limited to only a single, one line statement. With this in mind, our language definition has to specify that each IF has to have a corresponding ENDIF. Example:

IF (expression-1) THEN
    IF (expression-2) THEN
        (statement block)
    ENDIF
ENDIF

This rule has to apply regardless of whether or not there is an ELSEIF or an ELSE. We have to be consistent in the way we instruct the compiler to handle situations. Now, of course we could redefine our language to minimize the number of keywords and use symbolic tokens like parens and curly braces instead. Perl and JavaScript do it that way, but then, both Perl and JavaScript borrow much of their syntax from the C language. With only a few minor differences, their control statements look and feel just like C.
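The rule that every IF needs its own ENDIF is easy to verify mechanically once a program has been reduced to tokens. The following is a minimal sketch, assuming the token numbers 13 to 16 for IF, ELSEIF, ELSE and ENDIF that the IF parser tests for later in this chapter. The function check_if_blocks() and the T_ names are hypothetical and are not part of Bxcomp.c.

```c
/* Token numbers assumed to match those tested in Ifendif.c. */
enum { T_IF = 13, T_ELSEIF = 14, T_ELSE = 15, T_ENDIF = 16 };

/* Hypothetical compile-time check over an array of n tokens.
   Returns -1 if every IF has a matching ENDIF. Otherwise returns the
   index of the first stray ELSEIF, ELSE or ENDIF, or n if at least
   one IF is still open when the program ends. */
int check_if_blocks(const int *tok, int n)
{
    int depth = 0;      /* number of IFs still waiting for an ENDIF */
    int i;

    for (i = 0; i < n; i++) {
        switch (tok[i]) {
        case T_IF:
            depth++;
            break;
        case T_ELSEIF:
        case T_ELSE:
            if (depth == 0) {
                return i;       /* ELSE or ELSEIF outside any IF */
            }
            break;
        case T_ENDIF:
            if (depth == 0) {
                return i;       /* ENDIF without an IF */
            }
            depth--;
            break;
        default:
            break;              /* any other statement */
        }
    }
    return (depth == 0) ? -1 : n;
}
```

A check of this kind lets the compiler reject a missing ENDIF before the engine ever has to search for one at run time.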

EXPRESSIONS:
The product of an "IF" expression is referred to as a Boolean value. (Boolean is pronounced: "Boo-lee-an".) In computer programs, as well as in algebraic expressions, there is often a need to know if a thing or a condition is True or False. An expression that returns a True or False as the answer is called a Boolean Expression and it uses Boolean Logic. Human beings use logic every day and generally without ever thinking about it. There are different types of logic where it pertains to human and animal behavior. Some require an in-depth analysis of a situation or condition to arrive at a conclusion. Logic could be described as the ability to ask a question pertaining to a particular situation that will result in a desirable action or inaction. While many things we do require skill or physical ability, playing a game like Chess is an example where game strategy is the product of numerous logical conclusions. Chess matches may be won based on the ability and speed at which a series of "IF" questions can be asked and answered, based on current, possible and changing conditions, and arriving at the correct logical conclusion.


Boolean Logic can be expressed as a question where the answer is an unqualified Yes or No. In other words a simple True or False answer.
• a Boolean False is represented by a value of zero (0), also referred to as a NULL,
• generally, a Boolean True may be represented by any non-zero value, often times a one (1), or a minus one (-1), but, there is no absolute rule governing this.

Example:

Boolean True:(1):        2 = 2
Boolean True:(-1):       2 = 1+1
Boolean False:(0)        2 = 3
Boolean False:(NULL)     2 = 1+2

The desired result of a Boolean expression is not the resulting value of the expression, such as:

result = 2+2
but rather a Yes or No as to whether or not a condition is True or False, such as: Yes or No: is the Left-hand content equal to the Right-hand content ? Example:

True or False = is the light on ?

if the answer to the question is Yes, then the Boolean answer is 1.

True or False = is the door closed ?

if the answer to the question is Yes, then the Boolean answer is 1.

True or False = is the door closed AND is the light off ?

The question is: "is the door closed ?" AND "is the light off ?"
if the answer to both questions is Yes, then the Boolean answer is 1.

True or False = is the door closed OR is the light off ?

The question is: "is the door closed ?" OR "is the light off ?"
if the answer to either question is Yes, then the Boolean answer is 1.


This is Boolean Logic, a simple Yes or No, True or False. Computers, by their very nature, are very well suited for performing Boolean Logic. For our compiler to make Boolean decisions, it will have to deal with different data types and they can come from many different sources. At present, our compiler has two basic data types, numeric values and character strings:
• numeric data can come from a numeric constant or a numeric variable and it can be an integer or floating point value,
• character strings can be a double quoted string, or a string variable.
In the future, our compiler will also need to be able to deal with external sources and devices for its data. Obviously, when doing comparisons, we can only compare numbers to numbers and strings to strings. I suppose you could compare strings to their ASCII number values, but that's a whole different thing. Before we can evaluate an expression we need to make a distinction: is this a numeric expression or is it a character string comparison ? Both sides of the expression have to be of the same data type:

does: (number value = number value)
does: (character string = character string)

After we have determined what the data type is, we can proceed to evaluating the expression. Character string expressions are often just a comparison of one string to another, such as:

True or False: does: abc$ = xyz$
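The numeric-versus-string distinction can be illustrated with a small classifier. This is a sketch under assumptions and not the get_type() function used by the IF parser: classify_operand(), comparable() and the KIND_ constants are hypothetical names. It assumes, as in the Test.bas programs, that a string variable name ends in "$" and that a string constant starts with a double quote.

```c
#include <ctype.h>

/* Hypothetical operand kinds; not the type codes used by Bxbasic. */
enum { KIND_UNKNOWN = 0, KIND_NUMERIC, KIND_STRING };

/* Classify the operand that starts at s by looking at its first
   characters only. */
int classify_operand(const char *s)
{
    while (*s == ' ') {
        s++;                                   /* skip leading blanks */
    }
    if (*s == '"') {
        return KIND_STRING;                    /* quoted string */
    }
    if (isdigit((unsigned char)*s) || *s == '(') {
        return KIND_NUMERIC;                   /* constant or parenthesized math */
    }
    if (isalpha((unsigned char)*s)) {
        while (isalnum((unsigned char)*s)) {
            s++;                               /* walk over the variable name */
        }
        return (*s == '$') ? KIND_STRING : KIND_NUMERIC;
    }
    return KIND_UNKNOWN;
}

/* Both sides of a comparison must be of the same, known kind. */
int comparable(const char *left, const char *right)
{
    int kind = classify_operand(left);

    return kind != KIND_UNKNOWN && kind == classify_operand(right);
}
```

With this sketch, comparable("abc$", "xyz$") is true, while comparable("abc", "xyz$") is false and would be reported as an error.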
Numeric expressions often require more than simply comparing two values. On many occasions the expression will contain an algebraic expression that must first be calculated before the logical expression can be evaluated. Example:

IF abc = (2*(x+y)+(xyz/10))
In this expression, the right-hand value is unknown until the mathematical calculation is made. Boolean expressions can also be composed of compound expressions, where a logical AND or a logical OR is a factor in the expression. Logical AND and OR are called "logical operators" and are used when more than one set of conditions must be evaluated to arrive at a single conclusion. Example:

IF abc=(2*(x+y)) AND abc$=xyz$

In this example, the expression contains a logical AND and this sets up a condition where, in order for the result to be True, both the left-hand expression and the right-hand expression must be True. Anything else would be False.

            left AND right
True:         1   |   1
False:        1   |   0
False:        0   |   1
False:        0   |   0

Or:

IF abc=(2*(x+y)) OR abc$=xyz$
In this example, for the logical OR to return a result of True, either the left-hand expression or the right-hand expression may be True, or both the left and right may be True.

            left OR right
True:         1   |   1
True:         1   |   0
True:         0   |   1
False:        0   |   0

Also, the logical OR shown here is an inclusive OR: it is still True when both sides are True. Its exclusive counterpart is called an XOR, for Exclusive OR. For a logical XOR to be True, only one side of the equation may be True. In other words, either the left side is Exclusively True or the right side is Exclusively True, but not both. Example:

            left XOR right
True:         1   |   0
True:         0   |   1
False:        1   |   1
False:        0   |   0

Theoretically, there is no limit to the depth of the expression, or number of logical operators. There are other logical operators as well, but, we will not get into them yet. For now, we will work on AND and OR. Anytime the Boolean parser encounters a numeric expression that contains a mathematical problem, the formula will be passed to the math expression parser where it will be calculated before evaluating the Boolean expression. The resulting value will then be used to evaluate the Boolean expression.
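The three truth tables translate directly into C. The helpers below are only an illustration, with hypothetical names that do not appear in Ifendif.c. They first reduce any non-zero value to 1, so that a True represented as 1 and a True represented as -1 are treated alike.

```c
/* Any non-zero value counts as True, so normalise to 0 or 1 first. */
static int as_bool(int v)
{
    return v != 0;
}

/* True only when both sides are True. */
int bool_and(int left, int right)
{
    return as_bool(left) && as_bool(right);
}

/* True when either side, or both, is True (inclusive OR). */
int bool_or(int left, int right)
{
    return as_bool(left) || as_bool(right);
}

/* True when exactly one side is True (exclusive OR). */
int bool_xor(int left, int right)
{
    return as_bool(left) != as_bool(right);
}
```

The normalising step matters whenever a result is later compared against 1 directly, because a True stored as -1 would otherwise fail that comparison.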

RECURSION:
Up to this point we haven't discussed recursion even though we have been using it since we added the math expression parser. Recursion takes place when a function is entered into and that same function is called again, before the function has completed its original task. Sometimes a function is called recursively by another function or even by itself.


Example:

void widget()
{
    if(expression)
    {
        sprocket();
    }
}

void sprocket()
{
    if(expression)
    {
        widget();
    }
}
This example is really a little bit misleading though. It gives the impression that the same function was re-entered, without regard to any variables that may be in use. Ordinarily, every function creates its own set of local variables and shares all global variables. Special consideration has to be given to variables where recursion is concerned. As just stated, when a function is entered into, all the local variables are created on the stack. When a function is entered a second time, through recursion, an all new set of variables is created on the stack. The two instances are completely separate and do not share the same set of local variables. Therefore, unless information is passed to the second instance, in the form of parameters, it has no idea of what the prior instance's variables contain. Example:

void widget(int abc)            /* first instance:abc=5 */
{
    if(expression)
    {
        sprocket(abc);
    }
    else
    {
        (...)
    }
}

void sprocket(int abc)
{
    if(expression)
    {
        abc++;
        widget(abc);
    }
}

void widget(int abc)            /* second instance:abc=6 */
{
    if(expression)
    {
        sprocket(abc);
    }
    else
    {
        (...)
    }
}

In the first instance of widget(), let's say the variable abc has a value of 5. Variable abc's value is then passed to function sprocket(), where the value of abc is incremented just prior to the recursive call to function widget(). As you can see, the value of abc is then passed to widget() for a second time.
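The separation between instances can be observed with a small experiment. The function below is a hypothetical demonstration, loosely modelled on widget(), and is not part of Bxbasic. It calls itself once with abc + 1, keeps a local copy of its own parameter, and counts every entry in a static variable, which is the one kind of local variable that all instances share.

```c
/* Hypothetical demo; widget_demo() is not part of Bxbasic. */
int widget_demo(int abc, int *total_calls)
{
    static int entered = 0;  /* static: one permanent copy shared by every instance */
    int mine = abc;          /* local: a fresh copy on the stack for each instance  */

    entered++;
    if (abc < 6) {
        widget_demo(abc + 1, total_calls);  /* the next instance gets its own 'mine' */
    }
    *total_calls = entered;  /* the static counter has seen every call so far */
    return mine;             /* still the value this instance was entered with */
}
```

Called as widget_demo(5, &n), the first instance returns 5 even though a second instance ran with 6 in the meantime, and n comes back as 2 because the static counter was incremented by both instances.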


In the second instance of widget(), the value of abc would be 6, while abc's original value of 5 would still remain in the first instance. Why is this so ? It's because for each instance of a function, a whole new set of local variables is created on the stack. Unlike Global variables, Local variables are not assigned a permanent memory location. Local variable space is created on the stack and destroyed dynamically, as the program progresses. As a function is exited, the variable space it was allocated is freed, thus erasing those variables and their values. So, each recursive call to a function is like entering the function for the first time, with no regard for any prior existing instances and variables.

When program execution returns to a pre-existing instance of a function, all the local variables will still be holding their original values, unless a return value alters the value of a specific variable or a variable is declared as "static". Static local variables do have a permanent memory location and thus are affected by recursive calls. Global variables can be altered by any secondary instance of a function, as would be expected. If variable values in a primary instance need to be modified by a recursive secondary instance, then those values will need to be assigned to either static variables, global variables, pointers or structures.

What else do we need to know about recursion ? The IF parser has to take over program execution while there are program statements in the statement block following an IF or ELSEIF. Example:

IF abc = xyz AND qaz < zaq THEN
    CLS
    PRINT "hello world"
    FOR i=1 to 10 STEP 1
        PRINT "."
    NEXT i
ELSEIF abc = xyz OR qaz < zaq THEN
    IF a$ = b$
        GOTO BlockLabel
    ENDIF
ENDIF

Unlike the regular program parser, the IF parser has to be able to deal with recursion, such as recursive calls to "IF". Normally, pgm_parser() increments the line counter and calls function parser(), which executes one program statement and returns and the cycle is repeated. The IF parser has to know when the end of the statement block that is currently being executed has been reached, so that when the end is reached, it can jump over the rest of the statements within the IF construct. In the example above, when the last "NEXT i" in the loop has been executed, the parser has to jump to the last ENDIF before exiting and returning program control to pgm_parser(). If this did not happen, all of the statements that followed would be executed. Doing so could be disastrous.
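Locating the end of a statement block comes down to counting nested IF tokens. The sketch below shows the idea on a plain array of tokens. It is an illustration under assumptions and not the code used in Ifendif.c: match_endif() and match_else() are hypothetical names, the token numbers 13 to 16 are assumed for IF, ELSEIF, ELSE and ENDIF, and a depth counter with a bounds check is used where a recursive search would also work.

```c
/* Token numbers assumed for this sketch. */
enum { TOK_PRINT = 4, TOK_IF = 13, TOK_ELSEIF = 14, TOK_ELSE = 15, TOK_ENDIF = 16 };

/* Index of the ENDIF that closes the IF at tok[ndx], or -1 if missing. */
int match_endif(const int *tok, int n, int ndx)
{
    int depth = 0;                      /* nested IFs opened after tok[ndx] */

    for (ndx++; ndx < n; ndx++) {
        if (tok[ndx] == TOK_IF) {
            depth++;
        } else if (tok[ndx] == TOK_ENDIF) {
            if (depth == 0) {
                return ndx;             /* this ENDIF belongs to our IF */
            }
            depth--;                    /* it closed a nested IF */
        }
    }
    return -1;
}

/* Index of the next ELSEIF, ELSE or ENDIF that belongs to the block
   starting at tok[ndx], skipping over nested IF blocks; -1 if none. */
int match_else(const int *tok, int n, int ndx)
{
    int depth = 0;

    for (ndx++; ndx < n; ndx++) {
        int t = tok[ndx];

        if (t == TOK_IF) {
            depth++;
        } else if (t == TOK_ENDIF) {
            if (depth == 0) {
                return ndx;
            }
            depth--;
        } else if ((t == TOK_ELSEIF || t == TOK_ELSE) && depth == 0) {
            return ndx;
        }
    }
    return -1;
}
```

For a program tokenized as IF, IF, PRINT, ELSE, PRINT, ENDIF, ELSEIF, PRINT, ELSE, PRINT, ENDIF, the outer IF at index 0 is matched with the ELSEIF at index 6 and the ENDIF at index 10, while the nested IF at index 1 is matched with the ELSE at index 3 and the ENDIF at index 5.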

IF PARSER:
Well, I guess it's time we got started with some coding. Create a new file and name it "Ifendif.c".


Copy this header information to the top of file Ifendif.c:

/* bxbasic : Ifendif.c : alpha version */

/* ----- function prototypes ----- */
#include "prototyp.h"

Now copy all of the following functions, from do_if() through get_op() to file Ifendif.c. I will be adding some documentation here, in between the functions, don't copy the documentation to the file.

void do_if()
{
    int bool, els_ndx, end_ndx;

    bool = boolexpress();
    /* --- now take action --- */
    if(bool == 1)
    {
        els_ndx = find_else(line_ndx);      /* find next elseif/else */
        end_ndx = find_endif(line_ndx);     /* find the endif */
        line_ndx++;                         /* increment line */
        while(line_ndx < els_ndx)
        {
            s_pos = 0;
            e_pos = 0;
            get_token();
            if(token == 5)
            {
                break;
            }
            parser();
            line_ndx++;
        }
        if(token == 5)
        {
            line_ndx--;
            return;
        }
        line_ndx = end_ndx;
    }
    else
    {
        els_ndx = find_else(line_ndx);      /* find this elseif/else */
        els_ndx--;
        line_ndx = els_ndx;
    }
}
/*------- end do_if ----------*/

Function "do_if()" is the main "IF" parser function. It begins by calling "boolexpress()", which returns a Boolean value in "bool". If "bool" is True, the next section will temporarily replace the program parser and execute the statement block following the IF expression. If the value of bool is False, the next step is to call function "find_else()".


int find_else(int ndx)
{
    int tok=0;

    while((tok != 14) && (tok != 15) && (tok != 16))
    {
        ndx++;                          /* find: elseif, else, endif */
        tok = byte_array[ndx];
        if(tok == 13)
        {
            ndx = find_endif(ndx);      /* find next exit point */
        }
    }
    return ndx;
}
/*------- end find_else -------*/

"find_else()" is used to find the line index of the next ELSEIF, ELSE, or ENDIF that corresponds to the current "IF" expression. It returns the line index number to the calling function.

int find_endif(int ndx)
{
    int tok=0;

    while(tok != 16)                    /* find the exit point */
    {
        ndx++;
        tok = byte_array[ndx];
        if(tok == 13)
        {
            ndx = find_endif(ndx);
        }
    }
    return ndx;
}
/*------- end find_endif -------*/

"find_endif()" is used to locate the line index of the ENDIF that corresponds to the current "IF" expression. It returns the line index number to the calling function.

int boolexpress()
{
    int bool, type, a_bool, or_bool, op;
    int ab_code=17, x=line_ndx;

    type = get_type();              /* what type of comparison is it */
    if((type == 1) || (type == 2))
    {
        bool = Nboolterm(type);     /* numeric evaluation */
    }
    else if((type == 3) || (type == 4))
    {
        bool = Sboolterm(type);     /* a string evaluation */
    }
    else
    {
        a_bort(ab_code,x);
    }
    /* --- process AND / OR --- */
    op = IsAndOrOp();
    while(op != 0)
    {
        if(op == 1)                 /* do: AND */
        {
            a_bool = AndOrBoolExp();
            if((bool == 1) && (a_bool == 1))
            {
                bool = 1;
            }
            else
            {
                bool = 0;
            }
        }
        else if(op == 2)            /* do: OR */
        {
            or_bool = AndOrBoolExp();
            if((bool == 1) || (or_bool == 1))
            {
                bool = 1;
            }
        }
        op = IsAndOrOp();
    }
    return bool;
}
/*------- end boolexpress --------*/

"boolexpress()" is called by do_if() to evaluate the "IF(expression)" and return a Boolean value. It begins by determining whether the expression is a numeric or character evaluation, then makes the evaluation. In the event that there are multiple expressions, the next step uses a while loop to evaluate any AND or OR expressions.

int AndOrBoolExp()
{
    int bool, type;
    int ab_code=17, x=line_ndx;

    type = get_type();              /* what type of comparison is it */
    if((type == 1) || (type == 2))
    {
        bool = Nboolterm(type);     /* numeric evaluation */
    }
    else if((type == 3) || (type == 4))
    {
        bool = Sboolterm(type);     /* a string evaluation */
    }
    else
    {
        a_bort(ab_code,x);
    }
    return bool;
}
/*------- end AndOrBoolExp --------*/


"AndOrBoolExp()" works in a way similar to the way boolexpress() does.

int Nboolterm(int type)
{
    char ch;
    int pi, bool, a_bool=0;
    double lvalue;

    if(type == 1)
    {
        lvalue = get_avalue();      /* variable or digit value */
    }
    else
    {
        lvalue = rdp_main();        /* expression within parens */
    }
    bool = Nrelation(lvalue);
    return bool;
}
/*------- end Nboolterm --------*/

"Nboolterm()" evaluates all numeric expressions. If the leftmost part of the expression is a variable name or digits, get_avalue() is called; if it is an algebraic expression within parens, rdp_main() is called to get the left-hand value of the expression. It then passes that value to "Nrelation()", which gets the right-hand value and makes the comparison.

int Nrelation(double lvalue) { int bool, op, type; int ab_code=17, x=line_ndx; double rvalue; op = get_op(); /* get eval op type */ type = get_type(); /* get right side type */ if(type == 1) { rvalue = get_avalue(); /* variable or digit value */ } else if(type == 2) { rvalue = rdp_main(); /* expression within parens */ } else { a_bort(ab_code,x); } bool = eval_value(lvalue,rvalue,op); /* evaluate operators */ return bool; } /*------- end Nrelation --------*/

"Nrelation()" gets the operator between the left and right values and then gets the right hand value of the expression. It then evaluates the expression using the left and right hand values. A Boolean value is returned based on the result.


int eval_value(double lval,double rval,int op) { int bool=0; if(op == 1) { if(lval == rval) { bool = 1; } } else if(op == 2) { if(lval < rval) { bool = 1; } } else if(op == 3) { if(lval > rval) { bool = 1; } } else if(op == 4) { if(lval <= rval) { bool = 1; } } else if(op == 5) { if(lval >= rval) { bool = 1; } } else if(op == 6) { if(lval != rval) { bool = 1; } } return bool; } /*------- end eval_value -------*/

"eval_value()" makes the actual evaluation of the left and right hand values. The operator is passed to "eval_value()" in variable "op" as an integer from 1 to 6.


int Sboolterm(int type) { char lstring[BUFSIZE]; int bool, ndx; if(type == 3) { ndx = get_string(); /* string variable */ strcpy(lstring, sv_stack[ndx]); } else { get_qstring(); /* quoted string */ strcpy(lstring, s_holder); } bool = Srelation(lstring); return bool; } /*------- end Sboolterm --------*/

"Sboolterm()" evaluates all character and string expressions. If the left hand part is a variable name, the character string is copied to "lstring", otherwise the quoted string is copied to "lstring".

int Srelation(char *lstr) { char lstring[BUFSIZE], rstring[BUFSIZE]; int bool, ndx, op, type; int ab_code=17, x=line_ndx; strcpy(lstring, lstr); op = get_op(); /* get eval op type */ type = get_type(); /* get right side type */ if(type == 3) { ndx = get_string(); /* string variable */ strcpy(rstring, sv_stack[ndx]); } else if(type == 4) { get_qstring(); /* quoted string */ strcpy(rstring, s_holder); } else { a_bort(ab_code,x); } bool = eval_string(lstring,rstring,op); return bool; } /*------- end Srelation --------*/

"Srelation()" gets the operator and the right hand part of the expression. It then compares the two character strings.


int eval_string(char *lstr,char *rstr,int op) { char lstring[BUFSIZE], rstring[BUFSIZE]; int bool=0, comp; strcpy(lstring, lstr); strcpy(rstring, rstr); comp = strcmp(lstring, rstring); /* --- now test expression --- */ if(op == 1) { if(comp == 0) { bool = 1; } } else if(op == 2) { if(comp < 0) { bool = 1; } } else if(op == 3) { if(comp > 0) { bool = 1; } } else if(op == 4) { if(comp <= 0) { bool = 1; } } else if(op == 5) { if(comp >= 0) { bool = 1; } } else if(op == 6) { if(comp != 0) { bool = 1; } } return bool; } /*------- end eval_string -------*/

"eval_string()" is passed the operator in variable "op" as an integer. It then does a string comparison to evaluate the character strings.

int IsAndOrOp()
{
    char ch;
    int pi, op=0;

    pi = e_pos;
    pi = iswhiter(pi);
    ch = p_string[pi];
    if(ch == '&')
    {
        op = 1;
        pi++;
        pi = iswhiter(pi);
    }
    else if(ch == '|')
    {
        op = 2;
        pi++;
        pi = iswhiter(pi);
    }
    e_pos = pi;
    return op;
}
/*------- end IsAndOrOp --------*/

"IsAndOrOp()" is used to determine whether or not the expression just evaluated is followed by an AND or OR.

int get_type()
{
    char ch;
    int pi, type=0;

    pi = e_pos;
    pi = iswhiter(pi);
    ch = p_string[pi];
    if(isalpha(ch))
    {
        type = get_vtype(pi);       /* variable name */
    }
    else if(isdigit(ch))
    {
        type = 1;                   /* number value */
    }
    else if(ch == '(')
    {
        type = 2;                   /* expression in parens */
    }
    else if(ch == '\"')
    {
        type = 4;                   /* quoted string */
    }
    e_pos = pi;
    return type;
}
/*------- end get_type ---------*/

"get_type()" is a utility used to determine the type of expression that needs to be evaluated, such as a digit value, variable name, algebraic expression or quoted string.


int get_vtype(int pi) { char ch; int type=0; ch = p_string[pi]; while(isalnum(ch)) { pi++; ch = p_string[pi]; } if(ch == '$') { type = 3; /* a string variable */ } else if(strchr(" =<>%!#", ch)) { type = 1; /* a numeric variable */ } return type; } /*------- end get_vtype --------*/

"get_vtype()" determines if a variable name is a string variable or a numeric variable type.

int get_string() { char ch, varname[VAR_NAME]; int pi, ndx=0, ab_code=13, x=line_ndx; /* --- get varname --- */ strcpy(varname, get_varname()); /* --- get stack index --- */ while((ndx < smax_vars) && (strcmp(sn_stack[ndx], varname) != 0)) { ndx++; /* find varname in stack */ } if(ndx == smax_vars) /* error: did not find it */ { a_bort(ab_code, x); } pi = e_pos; pi++; pi = iswhiter(pi); ch = p_string[pi]; e_pos = pi; return ndx; } /*------- end get_string -------*/

"get_string()" returns the array index for a string variable.


void get_qstring() { char ch, quote='\"'; int pi, si=0, stlen, ab_code=6, x=line_ndx; stlen = strlen(p_string); pi = e_pos; /* plant "pi" with first quote */ pi++; ch = p_string[pi]; /* --- we now have first character --- */ /* --- fill buffer with string --- */ si = 0; while((ch != quote) && (pi < stlen)) { s_holder[si] = ch; si++; pi++; ch = p_string[pi]; } s_holder[si] = '\0'; if(pi == stlen) /* error:if at end of line */ { a_bort(ab_code,x); } pi++; e_pos = pi; } /*------- end get_qstring -------*/

"get_qstring()" is used to copy a double quoted string into global variable "s_holder".

int get_op()
{
    char ch;
    int pi, op, ab_code=18, x=line_ndx;

    pi = e_pos;
    ch = p_string[pi];
    if(strchr("\"$%!#", ch))
    {
        pi++;                       /* increment past current symbol */
    }
    pi = iswhite(pi);
    ch = p_string[pi];
    if(ch == '=')
    {
        op = 1;                     /* an is_equal evaluation */
    }
    else if(ch == '<')
    {
        pi++;
        ch = p_string[pi];
        if(ch == '>')
        {
            op = 6;                 /* a not_equal evaluation */
        }
        else if(ch == '=')
        {
            op = 4;                 /* a less-than or equal eval */
        }
        else
        {
            op = 2;                 /* a less-than evaluation */
        }
    }
    else if(ch == '>')
    {
        pi++;
        ch = p_string[pi];
        if(ch == '=')
        {
            op = 5;                 /* a greater-than or equal eval */
        }
        else
        {
            op = 3;                 /* a greater-than evaluation */
        }
    }
    else
    {
        a_bort(ab_code,x);
    }
    if(strchr("=>", ch))
    {
        pi++;
    }
    e_pos = pi;
    return op;
}
/*------- end get_op -------*/

"get_op()" returns an integer token for the operator of an expression, such as =, <, >, <=, >= or <>. Save Ifendif.c and close it. Open Bxbasic.c. Add "Ifendif.c" to the function includes list as shown here:

/* --- function includes --- */
#include "prototyp.h"
#include "error.c"
#include "utility.c"
#include "output.c"
#include "variable.c"
#include "input.c"
#include "rdparser.c"
#include "loops.c"
#include "ifendif.c"


To function parser(), add to the switch/case: case 13 through 16, as shown:

void parser()
[snip]
        case 13:            /* IF */
            do_if();
            break;
        case 14:            /* ELSEIF */
            do_if();
            break;
        case 15:            /* ELSE */
            return;
            break;
        case 16:            /* ENDIF */
            return;
            break;
[snip]

Save Bxbasic.c and close it. Open Error.c and add error codes 17 and 18 to a_bort():

void a_bort(int code,int line_ndx) [snip] case 17: printf("\nIF:Operand Type error: in line: "); printf("%d.\n%s\nNot a valid ", (line_ndx+1), p_string); printf("variable type.\ncode(%d)\n", code); break; case 18: printf("\nRelational Operator Type error: in line: "); printf("%d.\n%s\nValid operators:", (line_ndx+1), p_string); printf(" =<> .\ncode(%d)\n", code); break; [snip]

Save Error.c and close it. Open Input.c. Copy these new versions of these functions:


int get_byte(int ii) { char ch, keyword[TOKEN_LEN]; int pi, si=0, byte; int x=ii, ab_code=4; pi = e_pos; ch = p_string[pi]; while(isalnum(ch)) { keyword[si] = ch; si++; pi++; ch = p_string[pi]; } keyword[si] = '\0'; /* --- assign byte code --- */ if(strcmp(keyword, "REM") == 0) byte=0; else if(strcmp(keyword, "LET") == 0) { byte=1; get_MOD(pi); } /* scan for a MOD expression */ else if(strcmp(keyword, "CLEAR") == 0) byte=2; else if(strcmp(keyword, "LOCATE") == 0) byte=3; else if(strcmp(keyword, "PRINT") == 0) byte=4; else if(strcmp(keyword, "GOTO") == 0) byte=5; else if(strcmp(keyword, "BEEP") == 0) byte=6; else if(strcmp(keyword, "CLS") == 0) byte=7; else if(strcmp(keyword, "END") == 0) byte=8; else if(strcmp(keyword, "GOSUB") == 0) byte=9; else if(strcmp(keyword, "RETURN") == 0) byte=10; else if(strcmp(keyword, "FOR") == 0) byte=11; else if(strcmp(keyword, "NEXT") == 0) byte=12; else if(strcmp(keyword, "IF") == 0) byte=13; else if(strcmp(keyword, "ELSEIF") == 0) byte=14; else if(strcmp(keyword, "ELSE") == 0) byte=15; else if(strcmp(keyword, "ENDIF") == 0) byte=16; else { pi = iswhite(pi); ch = p_string[pi]; if(strchr("=#%!$", ch)) /* a variable assignment */ { byte = 1; get_MOD(pi); /* scan for a MOD expression */ pi = e_pos; /* push pointer back */ } else { a_bort(ab_code, x); /* not a keyword or variable */ } } e_pos = pi; return byte; } /*---------- end get_byte ----------*/


void loader_2() { int ndx, ii, line_count=0, lines=nrows; unsigned size; /* --- re-count number of lines --- */ for(ndx=0; ndx < nrows; ndx++) { if(temp_byte[ndx] != 0) { line_count++; } if((temp_byte[ndx] == 13) || (temp_byte[ndx] == 14)) { token_if(ndx); /* tokenize expression */ } } nrows = line_count; /* --- create program arrays --- */ array1 = malloc(nrows * sizeof(char *)); label_nam = malloc(nrows * sizeof(char *)); for(ii = 0; ii < nrows; ii++) { label_nam[ii] = malloc(LLEN * sizeof(char)); } byte_array = malloc(nrows * sizeof(int)); /* --- transfer temp_arrays to program_arrays --- */ ndx = 0; for(ii=0; ii < lines; ii++) { if(temp_byte[ii] != 0) { strcpy(label_nam[ndx], temp_label[ii]); byte_array[ndx] = temp_byte[ii]; /**/ size = strlen(temp_prog[ii]); size++; array1[ndx] = malloc(size * sizeof(char)); strcpy(array1[ndx], temp_prog[ii]); ndx++; } } /* --- free temp array memory --- */ for(ii=0; ii < lines; ii++) { free(temp_label[ii]); free(temp_prog[ii]); } free(temp_label); free(temp_byte); free(temp_prog); } /*---------- end loader_2 ----------*/


And, add this new function to file Input.c:

void token_if(int ndx)
{
    char temp[TOKEN_LEN];
    int loc;

    strcpy(p_string, temp_prog[ndx]);
    s_pos = 0;
    strcpy(temp, "AND");            /* replace AND w/token */
    loc = 0;
    while(loc >= 0)
    {
        loc = find_strng(temp);
        loc--;
        if(loc > 0)
        {
            p_string[loc] = '&';
            p_string[(loc+1)] = ' ';
            p_string[(loc+2)] = ' ';
        }
    }
    strcpy(temp, "OR");             /* replace OR w/token */
    loc = 0;
    while(loc >= 0)
    {
        loc = find_strng(temp);
        loc--;
        if(loc > 0)
        {
            p_string[loc] = '|';
            p_string[(loc+1)] = ' ';
        }
    }
    strcpy(temp, "THEN");           /* replace THEN w/EOL */
    loc = 0;
    while(loc >= 0)
    {
        loc = find_strng(temp);
        loc--;
        if(loc > 0)
        {
            p_string[loc] = '\n';
            p_string[(loc+1)] = '\0';
        }
    }
    strcpy(temp_prog[ndx], p_string);
}
/*---------- end token_if ----------*/

"token_if()" tokenizes the AND's, OR's and THEN's in an "IF" statement. Save Input.c and close it.


Open file Utility.c. Copy this function to it:

int find_strng(char *tmp) { char ch, cx, quote='\"'; char temp[TOKEN_LEN], xxstring[TOKEN_LEN]; int pi, i, mark, len, len2; strcpy(xxstring, tmp); len = strlen(p_string); /* locate xxstring within p_string */ len2 = strlen(xxstring); pi = s_pos; /* plant pi w/starting position */ ch = p_string[pi]; cx = xxstring[0]; /* plant cx with 1st char in xxstring */ while(pi < len) { while((ch != cx) && (pi < len)) /* find cx in p_string */ { if(ch == quote) /* line contains a quoted string */ { pi++; /* advance past quoted strings */ ch = p_string[pi]; while(ch != quote) { pi++; ch = p_string[pi]; } } pi++; ch = p_string[pi]; } if((pi == len) || (pi > len)) { mark = 0; /* an error trap, in case string not found */ return mark; } mark = pi; /* this marks the beginning of search string */ for(i=0; i < len2; i++) /* load temp with test string */ { temp[i] = ch; pi++; ch = p_string[pi]; } temp[i] = '\0'; if(strcmp(temp, xxstring) != 0) /* test failed, loop again */ { pi = (mark+1); /* advance pi by 1 from "mark" */ ch = p_string[pi]; } else { mark++; pi = len; } } return mark; } /*-------- end find_strng ---------*/

"find_strng()" is a utility used to locate a character string within a source code statement. Save Utility.c and close it.


Open file Prototyp.h. Make these additions and adjustments:

/* Input.c */
void line_cnt(char *argv[]);
void load_src(void);
void save_tmp(void);
void tmp_byte(int);
void loader_1(void);
void tmp_label(int);
void tmp_byte(int);
int get_byte(int);
void tmp_prog(int);
void loader_2(void);
void get_MOD(int);
void token_if(int);

/* Utility.c */
int get_upper(int,int);
int get_alpha(int,int);
int get_digit(int,int);
int iswhite(int);
void clr_arrays(void);
int iswhiter(int);
int find_strng(char *);

/* Ifendif.c */
void do_if(void);
int boolexpress(void);
int Nboolterm(int);
int Nrelation(double);
int eval_value(double,double,int);
int AndOrBoolExp(void);
int IsAndOrOp(void);
int Sboolterm(int);
int Srelation(char *);
int eval_string(char *,char *,int);
int get_type(void);
int get_vtype(int);
int get_string(void);
void get_qstring(void);
int get_op(void);
int find_endif(int);
int find_else(int);

Save Prototyp.h and close it. Okay, now compile Bxbasic.c.


Try it with this version of Test.bas:

' test.bas version 8.1
abc = 99
xyz = 33
abc$ = "test"
xyz$ = "testing"
'
IF xyz$ = "testing" AND abc >= xyz THEN
    PRINT "if:expression = true"
ELSEIF abc <= 100 OR abc$ <= "hello" THEN
    PRINT "elseif:expression = true"
ELSE
    PRINT "else:expressions = false"
ENDIF
PRINT "done"
' ------------------------------------------
TheEnd:
END
' ------------------------------------------

Come up with some "IF" expressions of your own.

ENGINE:
Now let's bring the engine up to date. Open Engine.c. Make the addition of Ifendif.c to the function includes:

/* --- function includes --- */
#include "prototyp.h"
#include "error.c"
#include "utility.c"
#include "output.c"
#include "variable.c"
#include "enginput.c"
#include "rdparser.c"
#include "loops.c"
#include "ifendif.c"

Add case 13 through 16 to function parser():


void parser()
[snip]
        case 13:            /* IF */
            do_if();
            break;
        case 14:            /* ELSEIF */
            do_if();
            break;
        case 15:            /* ELSE */
            return;
            break;
        case 16:            /* ENDIF */
            return;
            break;
[snip]

Save Engine.c and close it. Now compile both Bxcomp.c and Engine.c. Using Bxcomp.exe, compile Test.bas:

Bxcomp Test

CONCLUSION
Well, there you have it: a compiler that will evaluate and execute conditional expressions using the IF/ELSE construct. I told you this part wasn't going to be easy, and it wasn't. The mechanics of implementing conditional expressions require a good deal of understanding, especially in this case, where recursion is quite common. Another thing: this implementation of IF/ELSE is a design of our own. It doesn't exactly follow either the GW or the QBasic model. One reason for that is that we were taking the easy way out. Rather than add all the code that would be needed to implement both versions, we combined the two into a single form that works quite well. By simply adding the code that is unique to each of the two types (GW and QB), a combined version of the IF/ELSE construct could be created that would accept and execute both versions. If you would like, as a side project, you can make a version that functions so that it mirrors both. There's still more to come.


CHAPTER - 9
INTRODUCTION
I hope you found the last chapter interesting. I have to say, having completed the "IF" parser, things should be a lot easier from here on out. Both the math and the "IF" expression parsers are really tough nuts to crack. With them out of the way we should be able to make good progress. In this chapter I'd like to resume where we left off on working with string variables and the various string functions. There are numerous functions for dealing with character strings that slice and dice them up any way you want. Where we left off the last time we worked with character strings was assigning quoted strings to a variable. The first thing we need to do is add the ability to reassign one string to another. That's where we will begin.

STRING ASSIGNMENTS:
Let's begin by going back to file Variable.c. Open that file now. In function parse_let(), we need to branch out to a section that will deal exclusively with strings. Take a look at parse_let() and locate the part that identifies the "$" character:

[snip]
    strcpy(varname, get_varname());
    pi = e_pos;
    ch = p_string[pi];              /* get the type character */
    /* --- we now have varname and type --- */
    /* --- is this a character string --- */
    if(ch == '$')
    {
        nam_stack = sn_stack;       /* indirect reference to name_stack */
        max_vars = smax_vars;
        init_fn = init_str;         /* indirect reference to function */
        ndx = get_varndx(varname);
        pi++;
        pi = iswhite(pi);
        e_pos = pi;
        /* --- now get assignment string --- */
        Match('=');
        strng_assgn(ndx);
    }
[snip]

This part we want to change. What we want to do at this point is make a call to a separate function that is going to steer us in the right direction. Change parse_let() so that this part reads as follows:


void parse_let() { char ch, varname[VAR_NAME]; int pi, stlen, ndx=0; int ab_code=11, x=line_ndx; stlen = strlen(p_string); pi = e_pos; /* --- retrieve variable name from statement --- */ pi = get_alpha(pi, stlen); if(pi == stlen) /* error: didn't find it */ { a_bort(ab_code, x); } e_pos = pi; strcpy(varname, get_varname()); pi = e_pos; ch = p_string[pi]; /* get the type character */ /* --- we now have varname and type --- */ /* --- is this a character string --- */ if(ch == '$') { e_pos = pi; parse_str(varname); } [snip]

The rest of parse_let() remains the same. As you can see, we now call a new function, parse_str(), and pass the "varname" along to it. Since character strings can be manipulated in so many ways, we might as well have a separate parser for dealing with them. Create a new file and name it "Strings.c", then copy this header information to the top:

/* bxbasic : Strings.c : alpha version */
/* ----- includes ----- */
#include "prototyp.h"

Here is the code for our new function parse_str():

void parse_str(char *name)
{
    char ch, varname[VAR_NAME];
    int pi, ndx;
    int ab_code=11, x=line_ndx;

    strcpy(varname, name);
    pi = e_pos;
    ch = p_string[pi];
    nam_stack = sn_stack;           /* indirect reference to name_stack */
    max_vars = smax_vars;
    init_fn = init_str;             /* indirect reference to function */
    ndx = get_varndx(varname);
    pi++;
    pi = iswhite(pi);
    e_pos = pi;
    /* --- now get assignment string --- */
    Match('=');
    pi = e_pos;
    ch = p_string[pi];
    if(ch == '\"')
    {
        strng_assgn(ndx);           /* quotes string */
    }
    else
    {
        strvar_assgn(ndx);          /* copy variable */
    }
}
/*---------- end parse_str ----------*/

As you can see, this is pretty much a copy of the part we deleted from parse_let(). At the bottom, you can see that we add a call to a new function: strvar_assgn(). Here is the code for that new function:

void strvar_assgn(int ndx) { char ch, varname[VAR_NAME]; int pi, indx=0, ab_code=13, x=line_ndx; unsigned size; strcpy(varname, get_varname()); while((indx < smax_vars) && (strcmp(sn_stack[indx], varname) != 0)) { indx++; /* find varname in stack */ } if(indx == smax_vars) /* error: did not find it */ { a_bort(ab_code, x); } /* --- copy string to string --- */ size = strlen(sv_stack[indx]); size++; sv_stack[ndx] = realloc(sv_stack[ndx], size * sizeof(char)); strcpy(sv_stack[ndx], sv_stack[indx]); } /*------ end strvar_assgn -------*/

It's pretty straightforward. It receives the integer variable "ndx" as a parameter, which is the array index of the destination variable. Then a call is made to function get_varname() to get the name of the source variable, and that name is copied to "varname". A simple look-up routine is executed to locate the array index for the source variable. The destination variable is resized and the character data is copied. Real simple.
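
The look-up and the resize-and-copy step can be sketched in isolation as follows. This is an illustration under stated assumptions, not the Bxbasic code: find_name() and assign_string() are hypothetical helpers, the stacks are passed in as parameters, and two checks are added that the original leaves out (a failed realloc() and a variable assigned to itself):

```c
#include <stdlib.h>
#include <string.h>

/* Returns the index of name in names[0..count-1], or -1 if it is absent. */
int find_name(char *const *names, int count, const char *name)
{
    int i;

    for (i = 0; i < count; i++) {
        if (strcmp(names[i], name) == 0) {
            return i;
        }
    }
    return -1;
}

/* Makes *dst a private copy of src. Returns 0 on success, or -1 if
   memory runs out (in which case *dst is left untouched). */
int assign_string(char **dst, const char *src)
{
    size_t size;
    char *p;

    if (*dst == src) {              /* abc$ = abc$ : nothing to do */
        return 0;
    }
    size = strlen(src) + 1;         /* room for the terminating '\0' */
    p = realloc(*dst, size);
    if (p == NULL) {
        return -1;
    }
    memcpy(p, src, size);
    *dst = p;
    return 0;
}
```

Keeping the result of realloc() in a temporary pointer means the old string is not lost if the call fails.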


Copy the code for the above two functions to file Strings.c. Below is the code for function strng_assgn(), which is currently in file Variable.c. Copy this to Strings.c as well, and delete it from file Variable.c:

void strng_assgn(int ndx) { char ch, quote='\"'; int pi, stlen, si=0, ab_code=6, x=line_ndx; unsigned size; stlen = strlen(p_string); pi = e_pos; /* plant "pi" with first quote */ pi++; ch = p_string[pi]; /* --- we now have first character --- */ /* --- fill buffer with string --- */ si = 0; while((ch != quote) && (pi < stlen)) { s_holder[si] = ch; si++; pi++; ch = p_string[pi]; } s_holder[si] = '\0'; if(pi == stlen) /* error:if at end of line */ { a_bort(ab_code,x); } /* --- copy buffer to string_stack --- */ size = strlen(s_holder); size++; sv_stack[ndx] = realloc(sv_stack[ndx], size * sizeof(char)); strcpy(sv_stack[ndx], s_holder); } /*------ end strng_assgn -------*/

There are a few changes to be made to file Prototyp.h. Add the following list to that file:

[snip]
/* Strings.c */
void parse_str(char *);
void strng_assgn(int);
void strvar_assgn(int);
[snip]

Additionally, delete the declaration for:

void strng_assgn(int);

from the "Variable.c" list. If we don't do it now, it will cause us problems later. In file: Bxbasic.c, we need to add the filename: "Strings.c" to the list of includes. Do so as shown here:

[snip]
/* --- function includes --- */
#include "prototyp.h"
#include "error.c"
#include "utility.c"
#include "output.c"
#include "variable.c"
#include "input.c"
#include "rdparser.c"
#include "loops.c"
#include "ifendif.c"
#include "strings.c"
[snip]

Now save everything and compile Bxbasic.c. That done, copy this version of Test.bas:

' test.bas version 9.1
CLS
abc$ = "testing"
xyz$ = abc$
PRINT abc$
PRINT xyz$
' ------------------------------------------
TheEnd:
END
' ------------------------------------------

and enter:

Bxbasic Test
Okay, now we have variable to variable assignments.


STRING FUNCTIONS:
Here is a list of some of the character string manipulation functions we will add in this chapter:
• CHR$(code)
• LEFT$(xyz$, n)
• RIGHT$(xyz$, n)
• MID$(xyz$, start, len)
• SPACE$(n)
• STR$(n)
• STRING$(n, char)
The one thing which they all share in common, that I find a bit odd, is the syntax in which they are written. For example, a statement using CHR$ is written as:

abc$ = CHR$(code)
What I find odd is that the function name comes after the destination variable name, as opposed to before it. I suppose that is just one of the many idiosyncrasies of the Basic language. The above statement would appear more logical if it were written as:

CHR$(abc$, code)
Also, that would make tokenizing the statement a lot simpler, because you would know at the beginning of the statement that it was going to result in a string function. As an example, in C, the function name is followed by parentheses, and inside the parentheses are the variables to be operated on. Such as:

strcpy(abc, xyz)
The reason I mention this has to do with tokenizing the statement. If you don't tokenize the string functions at compile time, then at runtime you have to waste time making the determination of whether or not a string variable assignment relates to another string variable or to a string function. For instance, let's say you have written an application (a database) with your newly created language and you have one thousand string variables in use (not unrealistic). Before you can execute this statement:

abc$ = CHR$(code)
you have to:
• 1) identify abc$,
• 2) make the determination: is CHR$: a variable name, or a function,
• 3) then branch based on the results of step 2).
If, however, the statement had been written as:

CHR$(abc$, code)
CHR$ could be tokenized and branching could take place at the start of the statement. In a worst case scenario, CHR$ could be compared to a list of string functions and the program would branch. The way that I would like to approach this, at least for now, would be to accommodate the standard format that is used in Basic and then when we have more time, we can expand the language to include the method used in the last example. The way I propose to resolve this is to go ahead and tokenize the function name, at compile time, so that the program can branch immediately on encountering the function token.


As an example, this statement:

abc$ = CHR$(code)
would be tokenized to:

abc$ = 1(code)
Normally, in a character string expression, the first character encountered on the right-hand side of the equation is going to be either a double quote, ("), or an alpha character representing a variable name. Such as:

abc$ = "hello"

or

abc$ = xyz$

What is not expected is a digit:

abc$ = 1
In fact, finding a digit would make it an invalid statement. In the tokenized statement shown above, the digit would represent a valid statement which could be verified by the presence of the parens:

abc$ = 1(code)
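
The branching rule just described can be sketched as a small classifier. This is only an illustration that follows the tokenized form as it is written above (a digit followed by an opening paren); classify_rhs() and the enum names are made up here and are not part of Bxbasic:

```c
#include <ctype.h>

enum rhs_kind { RHS_INVALID = 0, RHS_QUOTED, RHS_FUNCTION, RHS_VARIABLE };

/* Looks at the start of the right-hand side of a string assignment
   and reports which kind of assignment it is. */
enum rhs_kind classify_rhs(const char *rhs)
{
    while (*rhs == ' ' || *rhs == '\t') {       /* skip white space */
        rhs++;
    }
    if (*rhs == '"') {
        return RHS_QUOTED;                      /* abc$ = "hello" */
    }
    if (isdigit((unsigned char)*rhs)) {
        while (isdigit((unsigned char)*rhs)) {  /* step over the token */
            rhs++;
        }
        /* a numeric token is only valid when parens follow it */
        return (*rhs == '(') ? RHS_FUNCTION : RHS_INVALID;
    }
    if (isalpha((unsigned char)*rhs)) {
        return RHS_VARIABLE;                    /* abc$ = xyz$ */
    }
    return RHS_INVALID;
}
```

The point of the design is that a single character test at the start of the right-hand side is enough to pick the branch, with no string comparison against a list of function names at runtime.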
We already have experience tokenizing the right-hand side of the expression. We first had to do it when we changed "MOD" to "%" in function get_MOD() and we've done it again, most recently in tokenizing IF/THEN/ELSE statements. So, we will begin there, near the final input stage of the compiler.

TOKENS:
The first thing we are going to do is open file: Input.c. Now scroll down to function loader_2(). At this point in the program, we have already tokenized the IF/ELSE statements and have the entire program ready to run. What we will do is, at this point:

[snip]
void loader_2()
{
    int ndx, ii, line_count=0, lines=nrows;
    unsigned size;
    /* --- re-count number of lines --- */
    for(ndx=0; ndx < nrows; ndx++)
    {
        if(temp_byte[ndx] != 0)
        {
            line_count++;
        }
        if((temp_byte[ndx] == 13) || (temp_byte[ndx] == 14))
        {
            token_if(ndx);          /* tokenize expression */
        }
[snip]

Just after this "if" expression, we will add an "else if" (shown below) that will look for "LET" (byte code 1) assignment statements.

[snip]
        else if(temp_byte[ndx] == 1)
        {
            str_functn(ndx);        /* tokenize string functions */
        }
[snip]

This will call a separate parsing routine that will then scan the program line looking for string function names. When one is found, the function name will be replaced by a numeric token. We could just as well use a single ascii character and the result would be the same. In the long run, it might be worth an experiment to see which way offers better performance. For now though, because digits are more visually representative of a token's ranking in a list, we will use numbers for tokens. Here is the new version of loader_2():

void loader_2()
{
    int ndx, ii, line_count=0, lines=nrows;
    unsigned size;

    /* --- re-count number of lines --- */
    for(ndx=0; ndx < nrows; ndx++)
    {
        if(temp_byte[ndx] != 0)
        {
            line_count++;
        }
        if((temp_byte[ndx] == 13) || (temp_byte[ndx] == 14))
        {
            token_if(ndx);          /* tokenize expression */
        }
        else if(temp_byte[ndx] == 1)
        {
            str_functn(ndx);        /* tokenize string functions */
        }
    }
    nrows = line_count;
    /* --- create program arrays --- */
    array1 = malloc(nrows * sizeof(char *));
    label_nam = malloc(nrows * sizeof(char *));
    for(ii = 0; ii < nrows; ii++)
    {
        label_nam[ii] = malloc(LLEN * sizeof(char));
    }
    byte_array = malloc(nrows * sizeof(int));
    /* --- transfer temp_arrays to program_arrays --- */
    ndx = 0;
    for(ii=0; ii < lines; ii++)
    {
        if(temp_byte[ii] != 0)
        {
            strcpy(label_nam[ndx], temp_label[ii]);
            byte_array[ndx] = temp_byte[ii];
            /**/
            size = strlen(temp_prog[ii]);
            size++;
            array1[ndx] = malloc(size * sizeof(char));
            strcpy(array1[ndx], temp_prog[ii]);
            ndx++;
        }
    }
    /* --- free temp array memory --- */
    for(ii=0; ii < lines; ii++)
    {
        free(temp_label[ii]);
        free(temp_prog[ii]);
    }
    free(temp_label);
    free(temp_byte);
    free(temp_prog);
}
/*---------- end loader_2 ----------*/

Here is the code for str_functn():

void str_functn(int ndx)            /* tokenize string functions */
{
    char ch, temp[VAR_NAME];
    int pi=0, type;

    strcpy(p_string, temp_prog[ndx]);
    type = get_vtype(pi);
    if(type != 3)                   /* is it a string variable */
    {
        return;                     /* exit if not */
    }
    while(IsEqu(pi) == 0)           /* advance to '=' sign */
    {
        pi = e_pos;
        pi++;
    }
    pi = e_pos;
    pi++;
    pi = iswhite(pi);
    ch = p_string[pi];
    if(isalpha(ch) == 0)
    {
        return;                     /* a quoted string */
    }
    else if(isupper(ch) == 0)
    {
        return;                     /* not uppercase */
    }
    e_pos = pi;
    strcpy(temp, get_varname());    /* get varname */
    e_pos = pi;
    get_strfunc(temp,ndx);          /* identify function */
}
/*---------- end str_functn ----------*/

At the beginning of this function, the program line is copied to the program string. Then a call is made to get_vtype() to see if this is a (type=3) string operation. If not, we make a quick exit. That is followed by a pair of tests to see if the right side is a quoted string or if the varname/keyword is in lower case. All keywords must be in upper case. Passing those two tests, we then call get_strfunc(). Here is that code:

void get_strfunc(char *name,int ndx)        /* identify function */
{
    char varname[VAR_NAME], temp[TOKEN_LEN];

    strcpy(varname, name);
    temp[0] = '\0';
    /* --- now compare to functions --- */
    if(strcmp(varname, "CHR") == 0)
    {
        strcpy(temp, " 1");         /* is this a chr$(n) assnmnt */
    }
    else if(strcmp(varname, "LEFT") == 0)
    {
        strcpy(temp, " 2");         /* a left$(a$,n) assnmnt */
    }
    else if(strcmp(varname, "RIGHT") == 0)
    {
        strcpy(temp, " 3");         /* a right$(a$,n) assnmnt */
    }
    else if(strcmp(varname, "MID") == 0)
    {
        strcpy(temp, " 4");         /* a mid$(a$,s,n) assnmnt */
    }
    else if(strcmp(varname, "SPACE") == 0)
    {
        strcpy(temp, " 5");         /* a space$(n) assnmnt */
    }
    else if(strcmp(varname, "STR") == 0)
    {
        strcpy(temp, " 6");         /* a str$(x) assnmnt */
    }
    else if(strcmp(varname, "STRING") == 0)
    {
        strcpy(temp, " 7");         /* a string$(n,a) assnmnt */
    }
    str_copy(temp,ndx);             /* replace function name w/token */
}
/*---------- end get_strfunc ----------*/


This function should not be too difficult to understand. It simply compares the "varname/keyword" to the list of keywords. If a match is found, a string containing the token is copied to string variable "temp". At that point, function str_copy() is called. Here is that code:

void str_copy(char *temp,int ndx) { char ch, tok[TOKEN_LEN]; /* replace function name w/token */ int pi, si=0; strcpy(tok, temp); pi = e_pos; ch = tok[si]; if(ch != '\0') { while(ch != '\0') { p_string[pi] = ch; si++; pi++; ch = tok[si]; } strcpy(temp_prog[ndx], p_string); e_pos = pi; } } /*---------- end str_copy ----------*/

If a valid keyword has been located, the keyword will be overwritten by the contents of "tok". Then the program line is written back to the temp-program array. Otherwise, the program line in question is simply skipped over and the next line examined. This extra work may slow down the compiler to a small degree, but that doesn't matter. All of this is to the benefit of the runtime engine and improved performance. Here is a small function that we need to add, too:

int IsEqu(int pi)                   /* is equal sign */
{
    char ch;
    int bool=0;

    pi = iswhite(pi);
    e_pos = pi;
    ch = p_string[pi];
    if(ch == '=')
    {
        bool = 1;
    }
    return bool;
}
/*---------- end IsEqu ----------*/

Copy all of the above functions to file Input.c and save it.


Open file: Prototyp.h. Add these prototypes to the list under "Input.c":

void str_functn(int);
int IsEqu(int);
void get_strfunc(char *,int);
void str_copy(char *,int);

and save it. That takes care of tokenizing string functions. Next, we will put them to use.
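
As a side project, the chain of strcmp() tests in get_strfunc() could also be driven by a table, which makes adding a new function a one-line change. A minimal sketch, assuming the same names and token numbers as above (lookup_strfunc() and the struct are hypothetical):

```c
#include <stddef.h>
#include <string.h>

struct str_func {
    const char *name;               /* keyword without the trailing '$' */
    int token;                      /* numeric token that replaces it */
};

static const struct str_func str_funcs[] = {
    { "CHR",    1 },
    { "LEFT",   2 },
    { "RIGHT",  3 },
    { "MID",    4 },
    { "SPACE",  5 },
    { "STR",    6 },
    { "STRING", 7 }
};

/* Returns the token (1 to 7) for a string function name,
   or 0 if name is not in the table. */
int lookup_strfunc(const char *name)
{
    size_t i;
    size_t count = sizeof str_funcs / sizeof str_funcs[0];

    for (i = 0; i < count; i++) {
        if (strcmp(str_funcs[i].name, name) == 0) {
            return str_funcs[i].token;
        }
    }
    return 0;
}
```

A return value of 0 plays the role of the empty "temp" string in get_strfunc(): the name is an ordinary variable and the line is left alone.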

STRINGS:
Begin by opening file: Strings.c. We are going to make a number of changes to what we already have in Strings.c. These changes will help streamline the whole character string operation. We will begin with function parse_str(). Here is the new code:

void parse_str(char *name)
{
    char ch, varname[VAR_NAME];
    int pi, ndx;
    unsigned size;

    strcpy(varname, name);
    pi = e_pos;
    ch = p_string[pi];
    nam_stack = sn_stack;           /* indirect reference to name_stack */
    max_vars = smax_vars;
    init_fn = init_str;             /* indirect reference to function */
    ndx = get_varndx(varname);
    pi++;
    pi = iswhite(pi);
    e_pos = pi;
    /* --- now get assignment string --- */
    Match('=');
    pi = e_pos;
    ch = p_string[pi];
    if(ch == '\"')
    {
        strng_assgn();              /* quotes string */
    }
    else if(isdigit(ch))
    {
        asn_function();             /* string function */
    }
    else
    {
        strvar_assgn();             /* copy variable */
    }
    size = strlen(s_holder);
    size++;
    sv_stack[ndx] = realloc(sv_stack[ndx], size * sizeof(char));
    strcpy(sv_stack[ndx], s_holder);
}
/*---------- end parse_str ----------*/

As you can see, the bottom section is completely different. First, if the character on the right-side of the expression is a numeric token (digit), the program branches to: asn_function(), where all string functions will be parsed. Second, the actual string assignment does not take place until returning to this function from any of the three branches. Those branches store their results in string variable "s_holder". This, I hope, will make these routines more versatile. Here is the new code for function strng_assgn():

void strng_assgn()
{
    char ch, quote='\"';
    int pi, stlen, si=0, ab_code=6, x=line_ndx;

    stlen = strlen(p_string);
    pi = e_pos;                     /* plant "pi" with first quote */
    pi++;
    ch = p_string[pi];
    /* --- fill buffer with string --- */
    si = 0;
    while((ch != quote) && (pi < stlen))
    {
        s_holder[si] = ch;
        si++;
        pi++;
        ch = p_string[pi];
    }
    s_holder[si] = '\0';
    if(pi == stlen)                 /* error:if at end of line */
    {
        a_bort(ab_code,x);
    }
}
/*---------- end strng_assgn ----------*/

As you can see, the assignment portion has been stripped from the bottom. Here is the new code for function strvar_assgn():


void strvar_assgn()
{
    int indx;

    /* --- get string index --- */
    indx = get_strndx();
    strcpy(s_holder, sv_stack[indx]);
}
/*---------- end strvar_assgn ----------*/

Whoa! It wasn't a very big function to begin with, but, now it's down to only three lines of code! That has a lot to do with the next function. What I've done is take the common portion of several other functions and reduce it to a single callable function that performs a single task. That way, rather than having redundant code in lots of functions, each function just calls this one routine. Here is the code for function get_strndx():

int get_strndx()
{
    char varname[VAR_NAME];
    int indx=0, ab_code=13, x=line_ndx;

    /* --- get string name --- */
    strcpy(varname, get_varname());
    while((indx < smax_vars)&&(strcmp(sn_stack[indx],varname) != 0))
    {
        indx++;
    }
    if(indx == smax_vars)           /* error: did not find it */
    {
        a_bort(ab_code, x);
    }
    return indx;
}
/*---------- end get_strndx ----------*/

This routine simply looks up the given string variable's name in sn_stack[] and returns its index, which is also its position in sv_stack[]. Here is the code for new function asn_function():

void asn_function()
{
    int type;

    type = (int) get_avalue();
    switch(type)
    {
        case 1:
            chrstr();
            break;
        case 2:
            leftstr();
            break;
        case 3:
            rightstr();
            break;
        case 4:
            midstr();
            break;
        case 5:
            spacestr();
            break;
        case 6:
            strsval();
            break;
        case 7:
            stringstr();
            break;
        default:
            /* error */
            break;
    }
}
/*---------- end asn_function ----------*/

This is a simple "switch-case" that calls the appropriate character string function based on the token. The first is the CHR$ function. Example:

abc$ = CHR$(32)
This has the effect of loading a single ascii character into string variable abc$. Here is the code for chrstr():

void chrstr()
{
    int pi;
    long xxxchar;

    pi = e_pos;                     /* pi enters pointing to: ( */
    pi++;                           /* advance to alpha/num: 10) */
    pi = iswhite(pi);
    e_pos = pi;
    xxxchar = (long) get_avalue();
    s_holder[0] = (int) xxxchar;
    s_holder[1] = '\0';
}
/*---------- end chrstr ----------*/

Next is LEFT$, where a left substring is copied to the destination string. Example:

abc$ = LEFT$(xyz$, 3)
This would have the effect of copying the three left most characters of the source string to the destination string. Here is a good example of where I think a more logical syntax, when using string functions, would be: LEFT$(abc$, xyz$, 3) (maybe it's just me, I don't know). Here is the code for leftstr():

void leftstr()
{
    int i, pi, indx, count, len;

    pi = e_pos;                     /* pi enters pointing to: ( */
    pi++;                           /* advance to first alpha: a$,n) */
    pi = iswhite(pi);
    e_pos = pi;
    /* --- get string index --- */
    indx = get_strndx();
    pi = e_pos;                     /* pi re-enters pointing to: $,n) */
    pi += 2;                        /* advance to first digit: n) */
    pi = iswhite(pi);
    e_pos = pi;
    /* --- get count --- */
    count = (int) get_avalue();
    if(count < 1)
    {
        count = 0;
    }
    len = strlen(sv_stack[indx]);
    if(count > len)
    {
        count = len;
    }
    for(i=0; i < count; i++)
    {
        s_holder[i] = sv_stack[indx][i];
    }
    s_holder[count] = '\0';
}
/*---------- end leftstr ----------*/

There are a couple of safe-guards here, in case the count (number of characters to transfer) is too little or too great. The value of count is corrected so that the program does not crash. The function allows integer variables to be used and since you can't predict what a variable might contain at any given time, it is better to be safe. Also, this allows the program to keep running without throwing an error.
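The clamping described above can be looked at in isolation. This is only a sketch with hypothetical names (left_copy and right_copy are not Bxbasic functions), and it assumes the caller supplies a destination buffer at least as large as the source string plus the terminator:

```c
#include <string.h>

/* Hypothetical helper: copy the first `count` characters of src to dst.
 * A count below zero is treated as 0 and a count past the end of the
 * string is cut back to the string length, so nothing is read or
 * written out of bounds. */
static void left_copy(char *dst, const char *src, int count)
{
    int len = (int)strlen(src);

    if (count < 0)   count = 0;
    if (count > len) count = len;
    memcpy(dst, src, (size_t)count);
    dst[count] = '\0';
}

/* Hypothetical helper: the same clamp, but copying from the right end. */
static void right_copy(char *dst, const char *src, int count)
{
    int len = (int)strlen(src);

    if (count < 0)   count = 0;
    if (count > len) count = len;
    memcpy(dst, src + (len - count), (size_t)count);
    dst[count] = '\0';
}
```

With the source string "testing", left_copy() with a count of 4 yields "test", and a count of 99 simply yields the whole string instead of running off the end.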


Next is RIGHT$. This is the exact opposite of LEFT$:

abc$ = RIGHT$(xyz$, 3)
This would copy the three right-most characters to the destination. Here is the code:

void rightstr()
{
    int ii, pi, indx, count, len, left;

    pi = e_pos;                     /* pi enters pointing to: ( */
    pi++;                           /* advance to first alpha: a$,n) */
    pi = iswhite(pi);
    e_pos = pi;
    /* --- get string index --- */
    indx = get_strndx();
    pi = e_pos;                     /* pi re-enters pointing to: $,n) */
    pi += 2;                        /* advance to first digit: n) */
    pi = iswhite(pi);
    e_pos = pi;
    /* --- get count --- */
    count = (int) get_avalue();
    if(count < 1)
    {
        count = 0;
    }
    len = strlen(sv_stack[indx]);
    if(count > len)
    {
        count = len;
    }
    left = (len - count);
    for(ii=0; ii < count; ii++, left++)
    {
        s_holder[ii] = sv_stack[indx][left];
    }
    s_holder[count] = '\0';
}
/*---------- end rightstr ----------*/

Next is MID$. This allows you to copy a chunk out of the center of a character string. You can specify both the left boundary and the number of characters to transfer. Example:

abc$ = MID$(xyz$, 5, 10)
This would begin at character number 5, from the left, and copy the next 10 characters to the destination. Here is the code:


void midstr()
{
    char ch;
    int ii, pi, indx, count, len, left;

    pi = e_pos;                     /* pi enters pointing to: ( */
    pi++;                           /* advance to first alpha: a$,s,n) */
    pi = iswhite(pi);
    e_pos = pi;
    /* --- get string index --- */
    indx = get_strndx();
    pi = e_pos;                     /* pi re-enters pointing to: $,s,n) */
    pi += 2;                        /* advance to first al/num: s,n) */
    pi = iswhite(pi);
    e_pos = pi;
    /* --- get left start point --- */
    left = (int) get_avalue();
    pi = e_pos;                     /* pi re-enters pointing to: ,n) */
    pi = iswhite(pi);               /* -or- pointing to: ) */
    ch = p_string[pi];
    e_pos = pi;
    if(ch == ')')
    {
        count = 255;                /* force it to upper limit */
    }
    else
    {
        pi++;                       /* advance to first alpha/num: n) */
        pi = iswhite(pi);
        e_pos = pi;
        /* --- get count --- */
        count = (int) get_avalue();
    }
    if(count < 1)
    {
        count = 0;
    }
    len = strlen(sv_stack[indx]);
    left--;                         /* correct string[index] */
    if(left < 0)                    /* guard: start before first char */
    {
        left = 0;
    }
    if(left > len)                  /* guard: start past end of string */
    {
        left = len;
    }
    if((left + count) > len)
    {
        count = (len - left);
    }
    for(ii=0; ii < count; ii++, left++)
    {
        s_holder[ii] = sv_stack[indx][left];
    }
    s_holder[count] = '\0';
}
/*---------- end midstr ----------*/
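The index arithmetic in MID$ is easy to get wrong, because Basic counts characters from 1 and C counts from 0. Here is a standalone sketch of just that arithmetic. The name mid_copy is hypothetical, and the clamping of the start position is my own addition so that an out-of-range start yields an empty string:

```c
#include <string.h>

/* Hypothetical helper: copy `count` characters of src, beginning at the
 * 1-based position `start`, into dst. dst must be at least as large as
 * src plus the terminator. */
static void mid_copy(char *dst, const char *src, int start, int count)
{
    int len  = (int)strlen(src);
    int left = start - 1;                 /* Basic counts from 1, C from 0 */

    if (left < 0)   left = 0;             /* start before the first char */
    if (left > len) left = len;           /* start past the end: empty result */
    if (count < 0)  count = 0;
    if (count > len - left) count = len - left;   /* do not run off the end */
    memcpy(dst, src + left, (size_t)count);
    dst[count] = '\0';
}
```

For the source string "testing", a start of 3 and a count of 3 gives "sti", while a start of 5 and a count of 10 gives only "ing", since that is all that remains.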

Next is SPACE$. This populates the destination string with "n" number of blank spaces. Here is the code:


void spacestr()
{
    char space=32;
    int ii, pi, count;

    pi = e_pos;                     /* pi enters pointing to: $(n) */
    pi++;                           /* advance to first alpha/num: n) */
    pi = iswhite(pi);
    e_pos = pi;
    /* --- get count --- */
    count = (int) get_avalue();
    if(count < 1)
    {
        count = 0;
    }
    else if(count > 255)
    {
        count = 255;
    }
    for(ii=0; ii < count; ii++)
    {
        s_holder[ii] = space;
    }
    s_holder[count] = '\0';
}
/*---------- end spacestr ----------*/
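Stripped of the parsing, SPACE$ is a bounded fill. The sketch below shows only that part. The names fill_chars and MAX_FILL are hypothetical, and the limit of 255 is taken from the code above. The same helper would also cover STRING$, which fills with a character other than a blank.

```c
#include <string.h>

#define MAX_FILL 255    /* same upper limit the text uses */

/* Hypothetical helper: write `count` copies of `fill` into dst and
 * terminate it. dst must have room for MAX_FILL + 1 characters. */
static void fill_chars(char *dst, int count, char fill)
{
    if (count < 0)        count = 0;
    if (count > MAX_FILL) count = MAX_FILL;
    memset(dst, fill, (size_t)count);
    dst[count] = '\0';
}
```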

Next is STR$. This converts a number value to a character string. Example:

abc$ = STR$(n)
where "n" would be a numeric variable. Here is the code:

void strsval()
{
    char ch;
    int pi;
    double value;

    pi = e_pos;                     /* pi enters pointing to: (abc) */
    pi++;                           /* advance to first alpha: abc) */
    pi = iswhite(pi);
    e_pos = pi;
    /* --- get value --- */
    value = get_avalue();
    /* --- convert value to string --- */
    strcpy(s_holder, value2strng(value));
}
/*---------- end strsval ----------*/
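The conversion itself is done by value2strng(), which is not listed here. Purely as an illustration of what such a routine might do, this sketch uses snprintf() with the "%g" format. Both the name number_to_text and the choice of format are my assumptions, not the Bxbasic implementation:

```c
#include <stdio.h>

/* Hypothetical stand-in for value2strng(): format a double as text.
 * Returns 0 on success, or -1 if the text did not fit in dst[cap]. */
static int number_to_text(double value, char *dst, size_t cap)
{
    int n = snprintf(dst, cap, "%g", value);

    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

With "%g", whole numbers come out without a trailing decimal point, so 199 becomes "199" and -199 becomes "-199".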


Next is STRING$. This function is a little like CHR$ and SPACE$ combined. It populates the destination string with "n" number of an ascii character. Example:

abc$ = STRING$(10, "*")
This would fill the destination with 10 asterisks. Here is the code:

void stringstr()
{
    char ch, char_x, quote='\"';
    int ii, pi, count;
    long xxx;

    pi = e_pos;                     /* pi enters pointing to: (num, chr) */
    pi++;                           /* advance to first number: num, chr) */
    pi = iswhite(pi);
    e_pos = pi;
    /* --- get count --- */
    count = (int) get_avalue();
    if(count < 1)
    {
        count = 0;
    }
    else if(count > 255)
    {
        count = 255;
    }
    pi = e_pos;                     /* pi re-enters pointing to: ,chr) */
    pi++;
    pi = iswhite(pi);
    e_pos = pi;
    ch = p_string[pi];
    /* --- get character --- */
    if(ch == quote)                 /* is it a quoted char: "*") */
    {
        pi++;
        ch = p_string[pi];
        char_x = ch;
        while(ch != ')')            /* advance to paren */
        {
            pi++;
            ch = p_string[pi];
        }
        e_pos = pi;
    }
    else
    {
        xxx = (long) get_avalue();
        char_x = xxx;
    }
    for(ii=0; ii < count; ii++)
    {
        s_holder[ii] = char_x;
    }
    s_holder[count] = '\0';
}
/*---------- end stringstr ----------*/


Copy all these (above) functions to file: Strings.c. And, save it. Add these prototypes to the list under "Strings.c" in Prototyp.h:

void    asn_function(void);
void    chrstr(void);
void    leftstr(void);
void    rightstr(void);
void    midstr(void);
void    spacestr(void);
void    strsval(void);
void    stringstr(void);
int     get_strndx(void);

and save it. I think that's it. I hope I didn't leave anything out. Compile Bxbasic.c. Using Bxbasic.exe, try this version of Test.bas:

' test.bas version 9.2
CLS
xyz = 42
abc$ = CHR$(xyz)
PRINT "abc$ = ";abc$
'
xyz$ = "testing"
abc$ = LEFT$(xyz$, 4)
PRINT abc$
'
abc$ = RIGHT$(xyz$, 5)
PRINT abc$
'
abc$ = MID$(xyz$, 3, 3)
PRINT abc$
'
abc$ = SPACE$(3)
PRINT ">";abc$;"<"
'
abc$ = STR$(199)
PRINT abc$
'
abc$ = STRING$(10, xyz)
PRINT ">";abc$;"<"
' ------------------------------------------
TheEnd:
END
' ------------------------------------------

CONCATENATION:
The obvious thing missing is the ability to combine or concatenate strings and characters together, such as in this expression:

abc$ = CHR$(34) + "Hello" + CHR$(32) + "world!" + CHR$(34)
Unfortunately, the Basic language does not incorporate special "escape" characters the way many other languages do. In the above example, ascii character 34 is the double quote, ("). Some special characters present a problem when you want to print them or copy them to a character string. The double quote is one of them. For instance, if you wanted a string to contain the following:

"hello world!"
including the pair of quotes, if you were new to programming, you might try to do something like:

abc$ = ""hello world!""
which would be a completely invalid statement. That is because, (in Basic,) the double quote is treated as a "delimiter". A delimiter is a character that is used to designate the beginning or ending of an object or a separator between two objects. In a language like C, the expression could easily be made legal by adding an escape character, like:

"\"hello world!\""
In this example, the inner most pair of quotes are treated like a normal text character and included in the string, because the back-slash, "\", is the escape character. It basically says; "the character that follows is a normal text character". Since Basic has no escape character, we have to concatenate special characters using the CHR$ function. That is why the first statement reads the way it does:

abc$ = CHR$(34) + "Hello" + ...
Situations like this make string concatenation an everyday occurrence in Basic. Of course there are any number of reasons for needing to concatenate strings. This makes concatenation something we can't do without.

CODING IT:
The simplest string concatenation statement might look something like this:

abc$ = abc$ + xyz$
If we examine this statement, we can see that the right-hand side of the expression;

abc$ + xyz$
is a compound statement. Compound statements are not new to us. We've seen them in PRINT statements, where we print out multiple strings. What is new though, is when the statement gets a little more complex by calling string functions, such as:

abc$ = CHR$(34) + CHR$(32) + CHR$(34)


It's clear that this kind of statement would have to be run through a looping function tokenizer, at the input stage of the compiler. Currently, our tokenizing routine accepts the simplest statements:

abc$ = CHR$(34)
To make these routines perform in a way so that they accept a statement like this:

abc$ = CHR$(34) + CHR$(32) + CHR$(34)
will require a considerably more versatile piece of code than we have now. It will have to ignore quoted strings and string variables while it scans across the statement, looking for keyword names, until it reaches the end of the program line. Actually, that isn't going to be all that hard to do. Here is the code with the changes:

void str_functn(int ndx)            /* tokenize string functions */
{
    char ch, temp[VAR_NAME];
    int pi=0, type, len;

    strcpy(p_string, temp_prog[ndx]);
    type = get_vtype(pi);
    if(type != 3)                   /* is it a string variable */
    {
        return;                     /* exit if not */
    }
    while(IsEqu(pi) == 0)           /* advance to '=' sign */
    {
        pi = e_pos;
        pi++;
    }
    pi = e_pos;
    len = strlen(p_string);
    /* --- loop --- */
    while(pi < len)
    {
        pi = get_upper(pi, len);    /* advance to uppercase */
        if(pi < len)
        {
            e_pos = pi;
            strcpy(temp, get_varname());    /* get varname */
            s_pos = e_pos;
            ch = p_string[s_pos];
            if(ch == '$')           /* valid '$' symbol */
            {
                e_pos = pi;
                get_strfunc(temp,ndx);      /* identify function */
            }
            pi = e_pos;
            pi++;
        }
    }
}
/*---------- end str_functn ----------*/


void str_copy(char *temp,int ndx)   /* replace function name w/token */
{
    char ch, tok[TOKEN_LEN];
    int pi, si=0;

    strcpy(tok, temp);
    pi = e_pos;
    ch = tok[si];
    if(ch != '\0')
    {
        while(ch != '\0')
        {
            p_string[pi] = ch;
            si++;
            pi++;
            ch = tok[si];
        }
        strcpy(temp_prog[ndx], p_string);
        e_pos = pi;
    }
    else
    {
        e_pos = s_pos;              /* not a function */
    }
}
/*---------- end str_copy ----------*/

If you compare this version of function str_functn() to its prior version, you can see the main difference is the bottom part, marked "loop". That part of the code looks a little different, but, it does pretty much the same thing that it did before, just inside of a loop. The loop condition could be described as: do this for as long as "pi" is less than "len". The routine will now translate and tokenize all function names in a statement. Copy these two functions to file: Input.c, replacing the existing ones. The next thing to change is going to be in the execution routines that are in the runtime engine. Those are in file: Strings.c. Here is the new code for function parse_str():

void parse_str(char *name)
{
    char ch, varname[VAR_NAME], temp[BUFSIZE];
    int pi, ndx;
    unsigned size;

    strcpy(varname, name);
    pi = e_pos;
    ch = p_string[pi];
    nam_stack = sn_stack;           /* indirect reference to name_stack */
    max_vars = smax_vars;
    init_fn = init_str;             /* indirect reference to function */
    ndx = get_varndx(varname);
    pi++;
    pi = iswhite(pi);
    e_pos = pi;
    /* --- now get assignment string --- */
    Match('=');
    strcpy(temp, str_express());
    /* --- make string assignment --- */
    size = strlen(temp);
    size++;
    sv_stack[ndx] = realloc(sv_stack[ndx], size * sizeof(char));
    strcpy(sv_stack[ndx], temp);
}
/*---------- end parse_str ----------*/

You will notice that from "Match" on down, it's all different. Now, instead of calling the string functions, a single call is made to function str_express(). Also, str_express() returns a character string that is copied to "temp". In the previous version, the string functions stored their data in the global variable "s_holder" and upon returning back to this function, the data in "s_holder" was copied to the destination variable. The first part of what I just said still takes place, but, now the data is stored in a temporary variable, which is then returned back to this function. Here is the code for str_express():

char *str_express()
{
    char ch;
    static char temp[BUFSIZE];
    int pi, len;

    temp[0] = '\0';
    len = strlen(p_string);
    pi = e_pos;
    while(pi < len)
    {
        ch = p_string[pi];
        if(ch == '\"')
        {
            strng_assgn();          /* quotes string */
        }
        else if(isdigit(ch))
        {
            asn_function();         /* string function */
        }
        else
        {
            strvar_assgn();         /* copy variable */
        }
        strcat(temp, s_holder);     /* concatenate */
        pi = e_pos;
        pi++;
        pi = iswhite(pi);
        e_pos = pi;
        if(pi < len)
        {
            Match('+');             /* concatenate symbol */
            pi = e_pos;
        }
    }
    return temp;
}
/*---------- end str_express ----------*/

This function does what the old parse_str() used to do, but, it does it inside of a loop. That is how we will enable string concatenations. Again, the loop condition is to loop until the end of line is reached. If the program statement is a simple assignment, then it falls through and returns to the calling function. There is only one other minor change that has to be made and that is to: strng_assgn(). Here is the new version:

void strng_assgn()
{
    char ch, quote='\"';
    int pi, stlen, si=0, ab_code=6, x=line_ndx;

    stlen = strlen(p_string);
    pi = e_pos;                     /* plant "pi" with first quote */
    pi++;
    ch = p_string[pi];
    /* --- fill buffer with string --- */
    si = 0;
    while((ch != quote) && (pi < stlen))
    {
        s_holder[si] = ch;
        si++;
        pi++;
        ch = p_string[pi];
    }
    s_holder[si] = '\0';
    if(pi == stlen)                 /* error:if at end of line */
    {
        a_bort(ab_code,x);
    }
    e_pos = pi;
}
/*---------- end strng_assgn ----------*/

Copy these three (above) functions to Strings.c, replacing the two existing ones. We need to add a new prototype for str_express() to prototyp.h. Copy this to the list under the heading of "Strings.c":


char *str_express(void);

That's all we need. Save everything and re-compile Bxbasic.c. Using Bxbasic.exe, here is a new Test.bas to try:

' test.bas version 9.3
CLS
xyz = 42
xyz$ = "testing"
'
abc$ = LEFT$(xyz$, 4) + CHR$(xyz) + RIGHT$(xyz$, 5) + SPACE$(3) + MID$(xyz$, 3, 3) + STR$(-199) + STRING$(10, xyz)
PRINT abc$
'
abc$ = CHR$(34) + "Hello" + CHR$(32) + "world!" + CHR$(34)
PRINT abc$
' ------------------------------------------
TheEnd:
END
' ------------------------------------------

**Note: In the first abc$ assignment, what appears as two lines, is actually all on one program line. As:
abc$ = LEFT$(xyz$, 4) + CHR$(xyz) + RIGHT$(xyz$, 5) + SPACE$(3) + MID$(xyz$, 3, 3) + STR$(-199) + STRING$(10, xyz)
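For anyone who wants to see the concatenation loop from str_express() in isolation, here is a cut-down, standalone sketch. It is not the engine code: concat_expr is a hypothetical name, it works on a plain C string instead of the global p_string, and it understands only two kinds of term, a quoted literal and the tokenized form of CHR$(n), written here as 1(n). Unlike the engine, it also refuses to write past the end of the output buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical mini-evaluator: concatenates terms separated by '+'.
 * Returns 0 on success, -1 on a syntax error or if the result would
 * not fit in out[cap]. */
static int concat_expr(const char *src, char *out, size_t cap)
{
    size_t n = 0;
    const char *p = src;

    if (cap == 0) return -1;
    for (;;) {
        while (*p == ' ') p++;                      /* skip blanks */
        if (*p == '"') {                            /* quoted literal */
            p++;
            while (*p != '\0' && *p != '"') {
                if (n + 1 >= cap) return -1;        /* keep room for '\0' */
                out[n++] = *p++;
            }
            if (*p != '"') return -1;               /* missing closing quote */
            p++;
        } else if (p[0] == '1' && p[1] == '(') {    /* token 1 = CHR$ */
            char *end;
            long code = strtol(p + 2, &end, 10);
            if (end == p + 2 || *end != ')') return -1;
            if (code < 1 || code > 255) return -1;
            if (n + 1 >= cap) return -1;
            out[n++] = (char)code;
            p = end + 1;
        } else {
            return -1;                              /* unknown term */
        }
        while (*p == ' ') p++;
        if (*p == '\0') break;                      /* end of line: done */
        if (*p != '+') return -1;                   /* expect the '+' symbol */
        p++;
    }
    out[n] = '\0';
    return 0;
}
```

The shape is the same as in the engine: evaluate one term, append it, then either stop at the end of the line or insist on a '+' before the next term.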

ENGINE UPDATE:
It might not be a bad idea if we incorporated these additions to our Engine and Compiler. Since all of the additions involved pre-existing files, we don't have to make too many changes to the Engine. Open Engine.c. To the "function includes" section, add the "include" for file Strings.c, as shown here:

[snip]
/* --- function includes --- */
#include "prototyp.h"
#include "error.c"
#include "utility.c"
#include "output.c"
#include "variable.c"
#include "enginput.c"
#include "rdparser.c"
#include "loops.c"
#include "ifendif.c"
#include "strings.c"
[snip]

That's it. Now save Engine.c. For the compiler update, open file: Bxcomp.c. We are going to need to borrow two functions that are located in other files, to complete the update to Bxcomp.c. These functions are being used as part of the new input routine. They are:
• get_varname(), from file: Variable.c
• get_vtype(), from file: Ifendif.c

I said we need to borrow them, what I mean by that is, we are going to copy them over to Bxcomp.c. You might be wondering: why not just add an "#include" statement for those two files? If we did that, then we would have to include several other files too, or else the C compiler would generate several warnings and errors due to missing functions. If we added all the files and functions required just to stop the errors, we would just be bloating the compiler for no good reason. It will just be a lot easier to copy those two functions and put them right where we need them. Here are the two functions we need, copy them to the bottom of Bxcomp.c:

char *get_varname()
{
    char ch;
    static char varname[VAR_NAME];
    int pi, si=0;

    pi = e_pos;
    ch = p_string[pi];
    while((isalnum(ch) != 0))
    {
        varname[si] = ch;
        si++;
        pi++;
        ch = p_string[pi];
    }
    varname[si] = '\0';
    e_pos = pi;
    return varname;
}
/*-------------------------------*/


int get_vtype(int pi)
{
    char ch;
    int type=0;

    ch = p_string[pi];
    while(isalnum(ch))
    {
        pi++;
        ch = p_string[pi];
    }
    if(ch == '$')
    {
        type = 3;                   /* a string variable */
    }
    else if(strchr(" =<>%!#", ch))
    {
        type = 1;                   /* a numeric variable */
    }
    return type;
}
/*------- end get_vtype --------*/

There! No more needs to be done. The prototypes are already declared in Prototyp.h and anything else we need will take care of itself. Save Bxcomp.c and close it. Now compile Bxcomp.c. With a little luck, we'll have no errors. Now compile Engine.c. There should be no errors here either. Now, using Bxcomp.exe, compile Test.bas, Enter:

Bxcomp Test
Now execute Test.exe. That should have been rather painless.

CONCLUSION
Now we have character string functions and the ability to parse and concatenate compound string expressions. And, we did all that with a very small amount of added code.


CHAPTER - 10
INTRODUCTION
In this issue, I'd like to pick up where we left off on our scripting engine and add some more functions. In the last chapter we added a number of character string functions. Now I think it's time we explored the subject of console input and inputting data directly from the keyboard. A program, especially a programming language, can't do much if it isn't able to interact with the user. So, for this issue we will focus on the different ways our language will interact with the outside world.

INKEY$:
There is one more string function I'd like to add and that is the INKEY$ function. Even though this is a character input function, it is both a console input and string function. The Inkey$ is a little different from some of the other keyboard input functions in that it has a unique feature in how it works. Here is a Basic statement showing how the Inkey$ is used:

a$ = INKEY$
There doesn't appear to be anything special between this and any of the other string functions we have covered already, except that we know that it accepts keyboard input. There are two things different about it though:
• it will only accept a single character from the keyboard,
• it will not wait (pause) for the user to strike a key.

"Okay," you might be asking, "if it will only accept the input of a single character and it won't wait for me to even do that, what good is it?" Well, it turns out that the Inkey$ function is one of the most widely used forms of keyboard input and it is extremely versatile, at the same time. Allow me to explain.

1) Inkey$'s popularity of usage:
• in one scenario, it allows the program to ask a question of the user where the answer requires a simple Yes or No. In that case, pressing the "Y" key for Yes, or the "N" key for No. Example: "Do you accept these terms ?: (Y)es or (N)o"
• there are many other situations where a single key-press is all that is required, such as a pause in the program, like responding to an error, where the user is asked to press a key to continue.

2) Inkey$'s versatility:
• while this function does not pause the program until the user has hit a key on the keyboard, it is generally put into a loop, where a number of other things can be done, in the background during the time that elapses before the user strikes a key. Example:

Loop:
    a$ = INKEY$
    LOCATE x, y
    PRINT "Time: "; TIME$           ' update clock display
    GOSUB CheckPort                 ' check input port for data
    ...
    IF a$ = "" THEN
        GOTO Loop
    ENDIF
    ...

  Because computers operate at very high cpu speeds these days, programs can be written so that the program is multitasking, in the background, during the time it takes for the user to respond to a prompt to press a key.
• programs such as data entry or word processing programs can benefit greatly by trapping and decoding each key that is pressed, in order to detect and respond to control keys that affect how the program functions. Example:

Loop:
    a$ = INKEY$
    IF a$ = "" THEN                 ' no key pressed
        GOTO Loop
    ELSEIF a$ = UpArrow THEN        ' prior data field
        GOTO Prior
    ELSEIF a$ = DownArrow THEN      ' next data field
        GOTO Next
    ELSE
        data$ = data$ + a$          ' add character to string
        GOTO Loop
    ENDIF
    ...

By this example, you can see, Inkey$ can be much more versatile than the other keyboard input functions. To enable Inkey$, we will need to add a string function to the list of routines in file Input.c. In function get_strfunc(), we will add an eighth string function to the list. Here is the new get_strfunc() as it should appear:

void get_strfunc(char *name,int ndx)    /* identify function */
{
    char varname[VAR_NAME], temp[TOKEN_LEN];

    strcpy(varname, name);
    temp[0] = '\0';
    /* --- now compare to functions --- */
    if(strcmp(varname, "CHR") == 0)
    {
        strcpy(temp, " 1");         /* is this a chr$(n) assnmnt */
    }
    else if(strcmp(varname, "LEFT") == 0)
    {
        strcpy(temp, " 2");         /* a left$(a$,n) assnmnt */
    }
    else if(strcmp(varname, "RIGHT") == 0)
    {
        strcpy(temp, " 3");         /* a right$(a$,n) assnmnt */
    }
    else if(strcmp(varname, "MID") == 0)
    {
        strcpy(temp, " 4");         /* a mid$(a$,s,n) assnmnt */
    }
    else if(strcmp(varname, "SPACE") == 0)
    {
        strcpy(temp, " 5");         /* a space$(n) assnmnt */
    }
    else if(strcmp(varname, "STR") == 0)
    {
        strcpy(temp, " 6");         /* a str$(x) assnmnt */
    }
    else if(strcmp(varname, "STRING") == 0)
    {
        strcpy(temp, " 7");         /* a string$(n,a) assnmnt */
    }
    else if(strcmp(varname, "INKEY") == 0)
    {
        strcpy(temp, " 8");         /* an inkey$ assnmnt */
    }
    str_copy(temp,ndx);             /* replace function name w/token */
}
/*---------- end get_strfunc ----------*/

Copy this to Input.c, and save it. The next changes we need to make will be in file Strings.c. Beginning with asn_function(), we will add a "case 8:" to the list of string functions. Here is the new code:

void asn_function()
{
    int type;

    type = (int) get_avalue();
    switch(type)
    {
        case 1:
            chrstr();
            break;
        case 2:
            leftstr();
            break;
        case 3:
            rightstr();
            break;
        case 4:
            midstr();
            break;
        case 5:
            spacestr();
            break;
        case 6:
            strsval();
            break;
        case 7:
            stringstr();
            break;
        case 8:
            inkeystr();
            break;
        default:
            /* error */
            break;
    }
}
/*---------- end asn_function ----------*/

As you can see, "case 8:" makes a call to function inkeystr() and below is the code. In the first statement, a keyboard test function is called to determine whether or not a key has been pressed:
• if not, global string "s_holder" is assigned a null character,
• when a key has been pressed, the C library function getch() is called to read the character from the keyboard buffer.

Certain keys issue a control code, which is made up of two characters. The first is ascii character zero (null) and the second is the code that identifies the key. Because a null would terminate the string, inkeystr() stores a blank space in its place. From within a program you can test for a control key by either testing the length of the string, to see if its length is one character or two characters, or by testing character[0] to see if it is a blank space.

void inkeystr()
{
    char ch;
    int chr;

    chr = kbhit();                  /* test for keyboard hit */
    if(chr == 0)
    {
        s_holder[0] = '\0';
    }
    else
    {
        ch = getch();               /* read key pressed */
        if(ch == 0)                 /* a control code */
        {
            s_holder[0] = ' ';
            ch = getch();           /* read secondary code */
            s_holder[1] = ch;
            s_holder[2] = '\0';
        }
        else                        /* an ascii character */
        {
            s_holder[0] = ch;
            s_holder[1] = '\0';
        }
    }
}
/*---------- end inkeystr ----------*/

Copy this to Strings.c and save it. Next, we will have to update the prototypes in file: Prototyp.h. Under the "Strings.c" list, add inkeystr():

/* Strings.c */
void    parse_str(char *);
void    strng_assgn(void);
void    strvar_assgn(void);
void    asn_function(void);
void    chrstr(void);
void    leftstr(void);
void    rightstr(void);
void    midstr(void);
void    spacestr(void);
void    strsval(void);
void    stringstr(void);
int     get_strndx(void);
char    *str_express(void);
void    inkeystr(void);

Now, compile Bxbasic.c. With the new Bxbasic.exe, compile and run this Test.bas:

' test.bas version 10.1
CLS
Start:
abc$ = INKEY$
IF abc$ = "" THEN
    GOTO Start
ENDIF
PRINT abc$
' ------------------------------------------
TheEnd:
END
' ------------------------------------------
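The one-or-two character convention used by inkeystr() can be decoded with a small routine like the one below. This is only a sketch: classify_key and the enum are hypothetical names, and it operates on an ordinary C string so that it does not depend on kbhit() or getch(), which are compiler-specific console functions.

```c
#include <string.h>

enum key_kind { KEY_NONE, KEY_ASCII, KEY_EXTENDED };

/* Hypothetical helper: classify a string produced by the INKEY$ logic.
 *   ""      -> no key was waiting
 *   1 char  -> an ordinary ascii key, *code is that character
 *   2 chars -> a control key, *code is the second (identifying) byte
 * *code is left untouched when no key was pressed. */
static enum key_kind classify_key(const char *s, int *code)
{
    size_t len = strlen(s);

    if (len == 0) {
        return KEY_NONE;
    }
    if (len == 1) {
        *code = (unsigned char)s[0];
        return KEY_ASCII;
    }
    *code = (unsigned char)s[1];
    return KEY_EXTENDED;
}
```

On a PC keyboard the up-arrow key typically reports a second byte of 72, which is the letter 'H', so a result of KEY_EXTENDED with a code of 72 would correspond to the UpArrow test in the earlier Basic example. Treat that value as an assumption to verify on your own system.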


INPUT:
"Input" is also a data entry function that accepts input from the keyboard . There are several differences between Input and Inkey$: • • • • • Input will accept as many characters as you type, data may be stored in either a character string or a numeric variable, an optional quoted string, that is to be displayed, may be included in the statement multiple variables may appear in the same statement, entry is terminated by the Enter key (return).

The standard format is:

INPUT [;] ["prompt"] variable [;prompt;[variable]]
A typical statement using Input might look like this:

INPUT "Enter your Name: "; name$
For multiple variables, the text and cursor position may be made to appear on the same line, or on separate lines. Example, for this statement:

INPUT "Enter your name: "; name$; "your age: "; age
the output would be:

Enter your name: [Dave]
your age: [20]
Inserting a semi-colon between the word INPUT and the "quoted string" will cause the returns (enter-key), terminating data input, to be ignored. Example:

          INPUT ;"Enter your name: "; name$; "your age: "; age
**semi-colon----^
the output would be:

Enter your name: [Dave]your age: [20]
Delimiters after each quoted string or variable may be:
• a comma ","
• a semi-colon ";" or
• a colon ":".

A comma will cause the next string or cursor position to be shifted to the right, one tab position. A semi-colon will cause the next string or cursor position to be right beside the last string or cursor position. A colon will cause a newline to be printed and the next display position will be in the left column. The following statement:

INPUT ;"Enter your name: ": name$: "your age: ": age:
               colon------^ -----^ ------------^ ---^

will be displayed as:

Enter your name: [Dave] your age: [20]
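The three delimiter rules can be summed up in a few lines of C. This is a hypothetical helper, not part of Getinput.c, and it simplifies the comma case to a plain tab character, since the engine's actual tab handling has not been shown:

```c
#include <stddef.h>

/* Hypothetical helper: the text to emit when an INPUT statement reaches
 * a delimiter. Returns NULL for anything that is not a delimiter. */
static const char *delimiter_text(char d)
{
    switch (d) {
    case ',': return "\t";      /* move to the next tab position */
    case ';': return "";        /* stay right beside the previous item */
    case ':': return "\n";      /* newline, back to the left column */
    default:  return NULL;
    }
}
```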
The first thing we need to do is add a new keyword to the list in function get_byte(), in file Input.c. "Input" will be item number and byte code 17 on the list:

int get_byte(int ii)
{
    char ch, keyword[TOKEN_LEN];
    int pi, si=0, byte;
    int x=ii, ab_code=4;

    pi = e_pos;
    ch = p_string[pi];
    while(isalnum(ch))
    {
        keyword[si] = ch;
        si++;
        pi++;
        ch = p_string[pi];
    }
    keyword[si] = '\0';

    /* --- assign byte code --- */
    if(strcmp(keyword, "REM") == 0)          byte=0;
    else if(strcmp(keyword, "LET") == 0)
    {
        byte=1;
        get_MOD(pi);                /* scan for a MOD expression */
    }
    else if(strcmp(keyword, "CLEAR") == 0)   byte=2;
    else if(strcmp(keyword, "LOCATE") == 0)  byte=3;
    else if(strcmp(keyword, "PRINT") == 0)   byte=4;
    else if(strcmp(keyword, "GOTO") == 0)    byte=5;
    else if(strcmp(keyword, "BEEP") == 0)    byte=6;
    else if(strcmp(keyword, "CLS") == 0)     byte=7;
    else if(strcmp(keyword, "END") == 0)     byte=8;
    else if(strcmp(keyword, "GOSUB") == 0)   byte=9;
    else if(strcmp(keyword, "RETURN") == 0)  byte=10;
    else if(strcmp(keyword, "FOR") == 0)     byte=11;
    else if(strcmp(keyword, "NEXT") == 0)    byte=12;
    else if(strcmp(keyword, "IF") == 0)      byte=13;
    else if(strcmp(keyword, "ELSEIF") == 0)  byte=14;
    else if(strcmp(keyword, "ELSE") == 0)    byte=15;
    else if(strcmp(keyword, "ENDIF") == 0)   byte=16;
    else if(strcmp(keyword, "INPUT") == 0)   byte=17;
    else
    {
        pi = iswhite(pi);
        ch = p_string[pi];
        if(strchr("=#%!$", ch))     /* a variable assignment */
        {
            byte = 1;
            get_MOD(pi);            /* scan for a MOD expression */
            pi = e_pos;             /* push pointer back */
        }
        else
        {
            a_bort(ab_code, x);     /* not a keyword or variable */
        }
    }
    e_pos = pi;
    return byte;
}
/*---------- end get_byte ----------*/
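As the keyword list grows, the same lookup can also be written as a table. What follows is only a sketch and not part of Bxbasic: keyword_table and keyword_byte() are hypothetical names, and the position of each keyword in the table doubles as its byte code. It does not call get_MOD() for LET and does not handle the variable-assignment fallback, both of which would stay in get_byte().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: names and codes copied from the if-chain
   in get_byte(); the array index is the byte code. */
static const char *const keyword_table[] = {
    "REM", "LET", "CLEAR", "LOCATE", "PRINT", "GOTO",
    "BEEP", "CLS", "END", "GOSUB", "RETURN", "FOR",
    "NEXT", "IF", "ELSEIF", "ELSE", "ENDIF", "INPUT"
};

/* Return the byte code for keyword, or -1 if it is not in the table. */
int keyword_byte(const char *keyword)
{
    size_t count = sizeof keyword_table / sizeof keyword_table[0];
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(keyword, keyword_table[i]) == 0)
            return (int)i;
    }
    return -1;
}
```

With a table like this, adding a keyword means appending one string, provided the new keyword takes the next free byte code.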

Copy this change and save it.

Next, in Bxbasic.c, we need to add a new "include" for file Getinput.c to the list:

/* --- function includes --- */
#include "prototyp.h"
#include "error.c"
#include "utility.c"
#include "output.c"
#include "variable.c"
#include "input.c"
#include "rdparser.c"
#include "loops.c"
#include "ifendif.c"
#include "strings.c"
#include "getinput.c"

To function parser(), add "case 17", as shown here:

void parser()
[snip]
        case 17:        /* INPUT */
            get_input();
            break;
        case -1:        /* block label */
            break;
        default:
            a_bort(ab_code, x);
            break;
    }
}
/*-------------------------------*/

Copy these changes and save it. Getinput.c is a new file and will contain the three functions that comprise the Input function. Create Getinput.c. Now copy this header to the top of the file:

/* bxbasic : Getinput.c : alpha version */
/* ----- includes ----- */
#include "prototyp.h"

Beginning with get_input(), this is the function that is called by parser() and it uses a while-loop to process single or multiple variables in a program statement. The first thing this function does is check to see if the first character in p_string is a semi-colon:

    if(ch == ';')           /* do not echo newline */
    {
        loc = 1;
        pi++;
        pi = iswhite(pi);
        ch = p_string[pi];
        e_pos = pi;
    }

The semi-colon in that position signifies that the return entered at the end of an input is not to be echoed. This allows the cursor to remain on the same line, instead of dropping down a line when the enter key is pressed. If a semi-colon is found, the variable "loc" is set to 1. Later in the program, "loc" will be tested to see whether or not it is set. As shown below, the while-loop loops until the end-of-line. Inside the loop, if the character encountered is a double quote, then what follows is a quoted string, and the program branches to the routine that displays the text. If an alpha character is encountered, what follows is a variable.

    while(pi < len)         /* process to end of line */
    {
        if(ch == '\"')      /* INPUT "Print string"; */
        {
            [snip]
        }
        /* --- a string$ or numeric variable --- */
        else if(isalpha(ch))
        {
            [snip]
        }

The part that handles the variables is shown here. Function get_vtype() is called to determine whether this is a string or numeric variable. If it is a string variable, input_str() is called, otherwise input_val() is called.

        else if(isalpha(ch))
        {
            type = get_vtype(pi);
            strcpy(varname, get_varname());
            if(type == 3)           /* a string$ assignment */
            {
                input_str(varname,loc);
                pi = e_pos;
                ch = p_string[pi];
            }
            else                    /* type==1: numeric assignment */
            {
                input_val(varname,loc);
                pi = e_pos;
                ch = p_string[pi];
            }
            [snip]

Here is function get_input():

void get_input()
{
    char ch, varname[VAR_NAME];
    int pi, len, type, loc=0;
    int ab_code=19, x=line_ndx;

    pi = e_pos;
    ch = p_string[pi];
    if(ch == ';')                   /* do not echo newline */
    {
        loc = 1;
        pi++;
        pi = iswhite(pi);
        ch = p_string[pi];
        e_pos = pi;
    }
    len = strlen(p_string);
    while(pi < len)                 /* process to end of line */
    {
        if(ch == '\"')              /* INPUT "Print string"; */
        {
            get_prnstring();
            pi = e_pos;
            pi++;
            pi = iswhite(pi);
            ch = p_string[pi];
            e_pos = pi;
        }
        /* --- a string$ or numeric variable --- */
        else if(isalpha(ch))
        {
            type = get_vtype(pi);
            strcpy(varname, get_varname());
            if(type == 3)           /* a string$ assignment */
            {
                input_str(varname,loc);
                pi = e_pos;
                ch = p_string[pi];
            }
            else                    /* type==1: numeric assignment */
            {
                input_val(varname,loc);
                pi = e_pos;
                ch = p_string[pi];
            }
        }
        else
        {
            a_bort(ab_code, x);
        }
    }
}
/*------ end get_input ------*/

In function input_str(), the first thing done is to get the variable's index position in the string variables array. get_varndx() will search for the varname and create it if it does not already exist.

    ndx = get_varndx(varname);

If "loc" was set, the next thing to do is to preserve the cursor's row and column position. To do that, cursor_row() and cursor_col() are called.

    /* --- get cursor --- */
    if(loc == 1)
    {
        row = cursor_row();
        col = cursor_col();
        [snip]

The next step is to get the keyboard input. You will notice in this section that there is some LccWin32 specific code. There is an anomaly with that compiler: in some situations, the keyboard buffer retains whatever character was in there last. When gets() is called, the first character position in the input field is then pre-filled with that phantom character. I don't know why that happens, but it does. The way I have found to get rid of it is to call getch() first, before calling gets().

    /* --- get data-input --- */
#ifdef LccWin32
    ch = getch();
#endif
    gets(string);
    len = strlen(string);
    reset_cursor(loc,len,col,row);

After the data has been input, reset_cursor() is called to reposition the cursor on the same line.

void input_str(char *name, int loc)
{
    char ch, varname[VAR_NAME], string[BUFSIZE];
    int pi, ndx, len, row, col;
    unsigned xsize;

    strcpy(varname, name);
    nam_stack = sn_stack;           /* indirect ref.to name_stack */
    max_vars = smax_vars;
    init_fn = init_str;             /* indirect ref.to function */
    ndx = get_varndx(varname);

    /* --- get cursor --- */
    if(loc == 1)
    {
        row = cursor_row();
        col = cursor_col();
        /* --- get data-input --- */
#ifdef LccWin32
        ch = getch();
#endif
        gets(string);
        len = strlen(string);
        reset_cursor(loc,len,col,row);
    }
    else
    {
#ifdef LccWin32
        ch = getch();
#endif
        gets(string);
    }
    /* --- store data --- */
    xsize = strlen(string);
    xsize++;
    sv_stack[ndx] = realloc(sv_stack[ndx], xsize * sizeof(char));
    strcpy(sv_stack[ndx], string);  /* save new string */
    pi = e_pos;
    pi++;
    ch = p_string[pi];
    if(strchr(":;,", ch))
    {
        pi++;
        set_TabNl(ch);              /* Tab-NewLine */
    }
    pi = iswhite(pi);
    e_pos = pi;
}
/*------- end input_str ----------*/
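A word of caution about gets(): it has no way to limit the number of characters it stores, so a long input line can overrun string[], and the function was removed from the C standard in C11. If your compiler provides fgets(), a bounded replacement might look like the sketch below. read_line() is a hypothetical helper, not part of Bxbasic; it takes the stream as a parameter so that it can be tried on a file as well as on stdin.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from in into buf (at most size-1 characters),
   strip the trailing newline, and return the length stored.
   Returns -1 on end-of-file or read error. */
int read_line(char *buf, size_t size, FILE *in)
{
    size_t len;
    int c;

    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[--len] = '\0';          /* complete line: drop newline */
    } else {
        /* line was longer than the buffer: discard the rest */
        while ((c = getc(in)) != EOF && c != '\n')
            ;
    }
    return (int)len;
}
```

Under these assumptions, the call gets(string) in input_str() would become read_line(string, BUFSIZE, stdin).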

Function input_val() starts out in a similar way to input_str(). The main difference is in the lower portion, where it assigns the numbers that were input to the variable. Since there are four different numeric types (double, float, long and integer), there is a routine for each, based on the variable type symbol.

void input_val(char *name, int loc)
{
    char ch, cx, varname[VAR_NAME], string[VAR_NAME];
    int pi, ndx, len, row, col;

    strcpy(varname, name);
    pi = e_pos;
    ch = p_string[pi];

    /* --- get cursor --- */
    if(loc == 1)
    {
        row = cursor_row();
        col = cursor_col();
        /* --- get data-input --- */
#ifdef LccWin32
        cx = getch();
#endif
        gets(string);
        len = strlen(string);
        reset_cursor(loc,len,col,row);
    }
    else
    {
#ifdef LccWin32
        cx = getch();
#endif
        gets(string);
    }
    /* --- double --- */
    if(ch == '#')
    {
        nam_stack = dn_stack;       /* indirect ref.to name_stack */
        max_vars = dmax_vars;
        init_fn = init_dbl;         /* indirect ref.to function */
        ndx = get_varndx(varname);
        dv_stack[ndx] = (double) atof(string);
        pi++;
    }
    /* --- float --- */
    else if(ch == '!')
    {
        nam_stack = fn_stack;       /* indirect ref.to name_stack */
        max_vars = fmax_vars;
        init_fn = init_flt;         /* indirect ref.to function */
        ndx = get_varndx(varname);
        fv_stack[ndx] = (float) atof(string);
        pi++;
    }
    /* --- long --- */
    else if(ch == '%')
    {
        nam_stack = ln_stack;       /* indirect ref.to name_stack */
        max_vars = lmax_vars;
        init_fn = init_lng;         /* indirect ref.to function */
        ndx = get_varndx(varname);
        lv_stack[ndx] = atol(string);
        pi++;
    }
    /* --- integer --- */
    else
    {
        nam_stack = in_stack;       /* indirect ref.to name_stack */
        max_vars = imax_vars;
        init_fn = init_int;         /* indirect ref.to function */
        ndx = get_varndx(varname);
        iv_stack[ndx] = atoi(string);
    }
    pi = iswhite(pi);
    ch = p_string[pi];
    if(strchr(":;,", ch))
    {
        pi++;
        set_TabNl(ch);              /* Tab-NewLine */
    }
    pi = iswhite(pi);
    e_pos = pi;
}
/*---------- end input_val ----------*/
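One limitation of atof(), atol() and atoi() is that they cannot report that the user typed something that is not a number; the variable simply ends up holding zero. If that matters, strtod() and strtol() can be used instead. The sketch below is an illustration only: struct num_value and convert_input() are hypothetical names, and the real input_val() stores straight into dv_stack[], fv_stack[], lv_stack[] or iv_stack[].

```c
#include <errno.h>
#include <float.h>
#include <stdlib.h>

/* Hypothetical holder for one converted value. */
struct num_value {
    char   suffix;  /* '#', '!', '%' or 0 for a plain integer */
    double d;       /* used for '#' and '!' */
    long   l;       /* used for '%' and plain integer */
};

/* Convert text according to the variable type symbol.
   Returns 0 on success, -1 if no number was found or the
   value is out of range for the requested type. */
int convert_input(char suffix, const char *text, struct num_value *out)
{
    char *end;

    errno = 0;
    out->suffix = suffix;
    out->d = 0.0;
    out->l = 0;

    if (suffix == '#' || suffix == '!') {
        out->d = strtod(text, &end);
        if (end == text || errno == ERANGE)
            return -1;
        if (suffix == '!') {
            if (out->d > FLT_MAX || out->d < -FLT_MAX)
                return -1;
            out->d = (float)out->d;     /* round to single precision */
        }
    } else {
        out->l = strtol(text, &end, 10);
        if (end == text || errno == ERANGE)
            return -1;
    }
    return 0;
}
```

A caller could re-prompt when -1 is returned, rather than silently storing zero.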

Copy these functions to Getinput.c, and save it. We have several new routines to add to file: Output.c. The first, set_TabNl() prints either a tab or newline.

void set_TabNl(int ch)
{
    if(ch == ',')
    {
        printf("\t");
    }
    else if(ch == ':')
    {
        printf("\n");
    }
}
/*---------- end set_TabNl ----------*/

reset_cursor() is called to reposition the cursor on the input line after the return key has been pressed.

void reset_cursor(int loc, int len, int col, int row)
{
    if(loc == 1)
    {
        col += len;
#ifdef Power_C
        poscurs(row, col);          /* PowerC.ver */
#endif
#ifdef LccWin32
        gotoxy(col,row);            /* LCC.ver */
#endif
    }
}
/*---------- end reset_cursor ----------*/

The next two functions, cursor_col() and cursor_row(), get the column and row position of the cursor.

int cursor_col()
{
    int col;

#ifdef Power_C
    col = curscol();                /* PowerC.ver */
#endif
#ifdef LccWin32
    col = wherex();                 /* LCC.ver */
#endif
    return col;
}
/*---------- end cursor_col ----------*/

int cursor_row()
{
    int row;

#ifdef Power_C
    row = cursrow();                /* PowerC.ver */
#endif
#ifdef LccWin32
    row = wherey();                 /* LCC.ver */
#endif
    return row;
}
/*---------- end cursor_row ----------*/

Copy these to Output.c, and save it. Update file: Prototyp.h to reflect these changes and additions:

/* Output.c */
void beep(void);
void cls(void);
void get_prnstring(void);
void get_prnvar(void);
void locate(void);
char *value2strng(double);
void get_strvar(void);
void parse_print(void);
void reset_cursor(int,int,int,int);
void set_TabNl(int);
int cursor_col(void);
int cursor_row(void);

/* Getinput.c */
void get_input(void);
void input_str(char *,int);
void input_val(char *,int);

In file: Error.c, we need to add a "case 19" error handler as shown here:

void a_bort(int code,int line_ndx)
{
[snip]
        case 19:
            printf("\nINPUT : error: in statement: %d.\n",(line_ndx+1));
            printf("INPUT %sUsage: INPUT \"Enter your ", p_string);
            printf("name\"; name$:\ncode(%d)\n", code);
            break;
        default:
            printf("Program aborted, undefined error.");
            break;
    }
    exit(1);
}
/*-------------------------------*/

There is only one minor change that needs to be made to parse_str() in file: Strings.c. Copy this version to that file:

void parse_str(char *name)
{
    char ch, varname[VAR_NAME], temp[BUFSIZE];
    int pi, ndx;
    unsigned size;

    strcpy(varname, name);
    nam_stack = sn_stack;       /* indirect reference to name_stack */
    max_vars = smax_vars;
    init_fn = init_str;         /* indirect reference to function */
    ndx = get_varndx(varname);
    pi = e_pos;
    pi++;
    pi = iswhite(pi);
    e_pos = pi;
    /* --- now get assignment string --- */
    Match('=');
    strcpy(temp, str_express());
    /* --- make string assignment --- */
    size = strlen(temp);
    size++;
    sv_stack[ndx] = realloc(sv_stack[ndx], size * sizeof(char));
    strcpy(sv_stack[ndx], temp);
}
/*---------- end parse_str ----------*/
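Both parse_str() and input_str() assign the result of realloc() directly back to sv_stack[ndx]. If realloc() ever fails it returns NULL, the old pointer is lost, and the strcpy() that follows writes through a null pointer. A more defensive version of the "store data" step might look like this sketch; store_string() is a hypothetical helper, not part of Bxbasic.

```c
#include <stdlib.h>
#include <string.h>

/* Replace the string held in *slot with a copy of text.
   text must not point into the memory currently held by *slot.
   Returns 0 on success; on allocation failure returns -1 and
   leaves the old contents of *slot untouched. */
int store_string(char **slot, const char *text)
{
    size_t size = strlen(text) + 1;     /* include the terminator */
    char *grown = realloc(*slot, size);

    if (grown == NULL)
        return -1;
    memcpy(grown, text, size);
    *slot = grown;
    return 0;
}
```

Under that assumption, the last four statements of parse_str() would reduce to a single call such as store_string(&sv_stack[ndx], temp), followed by whatever error handling you prefer.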

We're done! Now compile Bxbasic.c. If we have no errors, compile and run this version of Test.bas:

' test.bas version 10.2
    CLS
    PRINT "Press any key: ";
Start:
    abc$ = INKEY$
    IF abc$ = "" THEN
        GOTO Start
    ENDIF
    PRINT abc$
'
    PRINT "Enter your name:"
    INPUT ;"First: "; first$; " Initial: "; init$; " Last: "; last$:
    PRINT "Enter your age and birth date:"
    INPUT ;"Age: "; age; " Month: "; mo; "/Day: "; day; "/Year: "; year%:
'
    PRINT first$, init$, last$
    PRINT age, mo, day, year%
' ------------------------------------------
TheEnd:
    END
' ------------------------------------------

PRINTING FUNCTIONS:
In a previous chapter we added special string functions for manipulating character strings and characters in string assignments. One thing left to do is to extend those abilities to the Print command. As long as the statement is not an assignment expression, just about any string expression can be placed on the Print command line. Example:

Assignment: abc$ = LEFT$(xyz$, 5)
Print:      PRINT LEFT$(xyz$, 5)

One difference concerning string expressions is that Print statements cannot have the plus sign "+" between string elements or functions. Instead, string elements and functions must be separated by either a comma, a colon or a semi-colon. Example:

abc$ = LEFT$(abc$, 8) + MID$(abc$, 19, 13) + RIGHT$(abc$, 7)
            ----------^           ---------^
PRINT LEFT$(abc$, 8); MID$(abc$, 19, 13); RIGHT$(abc$, 7)
           ---------^         ----------^

To begin, all we have to do is add a routine to Input.c for tokenizing string functions, similar to the one we use for string assignments, shown here:

    else if(temp_byte[ndx] == 1)
    {
        str_functn(ndx);        /* tokenize string functions */
    }

In loader_2(), if the byte code is equal to "1", then the tokenizing routine is called. Because of the differences between Print and assignment statements, we need to create a new function for this purpose. The token for the Print command is a "4", so we will use the following:

    else if(temp_byte[ndx] == 4)
    {
        str_funct2(ndx);        /* tokenize string functions */
    }

Here is the new section of code for loader_2(), in file: Input.c:

void loader_2()
{
    int ndx, ii, line_count=0, lines=nrows;
    unsigned size;

    /* --- re-count number of lines --- */
    for(ndx=0; ndx < nrows; ndx++)
    {
        if(temp_byte[ndx] != 0)
        {
            line_count++;
        }
        if((temp_byte[ndx] == 13) || (temp_byte[ndx] == 14))
        {
            token_if(ndx);          /* tokenize expression */
        }
        else if(temp_byte[ndx] == 1)
        {
            str_functn(ndx);        /* tokenize string functions */
        }
        else if(temp_byte[ndx] == 4)
        {
            str_funct2(ndx);        /* tokenize string functions */
        }
    }
    nrows = line_count;
[snip]

Function str_funct2() is pretty much a copy of str_functn(), minus the assignment portions at the beginning:

void str_funct2(int ndx)            /* tokenize string functions */
{                                   /* Print */
    char ch, temp[VAR_NAME];
    int pi=0, len;

    strcpy(p_string, temp_prog[ndx]);
    len = strlen(p_string);
    /* --- loop --- */
    while(pi < len)
    {
        pi = get_upper(pi, len);    /* advance to uppercase */
        if(pi < len)
        {
            e_pos = pi;
            strcpy(temp, get_varname());    /* get varname */
            s_pos = e_pos;
            ch = p_string[s_pos];
            if(ch == '$')           /* valid '$' symbol */
            {
                e_pos = pi;
                get_strfunc(temp,ndx);      /* identify function */
            }
            pi = e_pos;
            pi++;
        }
    }
}
/*---------- end str_funct2 ----------*/

Copy these to file Input.c, and save it. Since the Print command is part of Output.c, we will be working with that file next. Currently, the loop that handles printing, in function parse_print(), looks like this:

[snip]
    while(ch != '\n')
    {
        if(isalpha(ch))         /* --- print variable --- */
        {
            strcpy(s_holder, get_varname());
            pi = e_pos;
            ch = p_string[pi];
            if(ch == '$')
            {
                get_strvar();   /* string variable */
            }
            else
            {
                get_prnvar();   /* numeric variable */
            }
        }
        else if(ch == quote)    /* --- next char is a quote -- */
        {
            get_prnstring();
        }
        /* --- error: --- */
[snip]

Up to this point, we have either been printing quoted strings or variables. So, we've been looking for quotes or alpha characters. Since the string functions have been tokenized into digits, we now need to branch if we encounter a digit. So, we will add this code segment:

        else if(isdigit(ch))
        {
            prn_strfunc();      /* string function */
        }

Here is the new code for parse_print():

void parse_print()
{
    char ch, quote='\"';
    int pi, ab_code=9, x=line_ndx;

    pi = e_pos;
    pi = iswhiter(pi);
    ch = p_string[pi];
    e_pos = pi;
    /* --- print newline --- */
    if(strchr(":\n", ch))
    {
        printf("\n");
        return;
    }
    /* --- LOOP: multiple print statements --- */
    while(ch != '\n')
    {
        if(isalpha(ch))         /* --- print variable --- */
        {
            strcpy(s_holder, get_varname());
            pi = e_pos;
            ch = p_string[pi];
            if(ch == '$')
            {
                get_strvar();   /* string variable */
            }
            else
            {
                get_prnvar();   /* numeric variable */
            }
        }
        else if(isdigit(ch))
        {
            prn_strfunc();      /* string function */
        }
        else if(ch == quote)    /* --- next char is a quote -- */
        {
            get_prnstring();
        }
        /* --- error: --- */
        else
        {
            a_bort(ab_code, x);
        }
        /* --- return from subroutines --- */
        pi = e_pos;
        ch = p_string[pi];
        /* --- is it end of statement --- */
        if(ch != '\n')
        {
            pi++;
            pi = iswhiter(pi);
            ch = p_string[pi];
            e_pos = pi;
        }
        /* --- LOOP: if more to print --- */
    }
}
/*-------- end parse_print --------*/

Since all of the string function routines have already been written, all we have to do is call them, by making a call to asn_function(), when we need them and print the results. Here is the code for function prn_strfunc():

void prn_strfunc()
{
    char ch;
    int pi;

    /* --- get function --- */
    asn_function();
    pi = e_pos;
    pi++;
    pi = iswhiter(pi);
    ch = p_string[pi];
    e_pos = pi;
    /* --- display string$ --- */
    if(ch == ',')       printf("%s\t", s_holder);
    else if(ch == ';')  printf("%s", s_holder);
    else                printf("%s\n", s_holder);
}
/*--------- end prn_strfunc -----------*/

Copy these additions and changes to Output.c, and save it. Now all we have to do is to update these two lists in Prototyp.h:

/* Input.c */
void line_cnt(char *argv[]);
void load_src(void);
void save_tmp(void);
void tmp_byte(int);
void loader_1(void);
void tmp_label(int);
void tmp_byte(int);
int get_byte(int);
void tmp_prog(int);
void loader_2(void);
void get_MOD(int);
void token_if(int);
void str_functn(int);
int IsEqu(int);
void get_strfunc(char *,int);
void str_copy(char *,int);
void str_funct2(int);

/* Output.c */
void beep(void);
void cls(void);
void get_prnstring(void);
void get_prnvar(void);
void locate(void);
char *value2strng(double);
void get_strvar(void);
void parse_print(void);
void reset_cursor(int,int,int,int);
void set_TabNl(int);
int cursor_col(void);
int cursor_row(void);
void prn_strfunc(void);

Now re-compile Bxbasic.c. Using Bxbasic.exe, try this Test.bas:

' test.bas version 10.3
    CLS
    abc$ = "This is a test of the Emergency Broadcast System"
    PRINT "Test: >"; CHR$(251); "< End Test"
    PRINT ">"; LEFT$(abc$, 14); "<"
    PRINT ">"; RIGHT$(abc$, 26); "<"
    PRINT ">"; MID$(abc$, 23, 19); "<"
    PRINT ">"; SPACE$(10); "<"
    PRINT ">"; STR$(1000); "<"
    PRINT ">"; STRING$(10, 251); "<"
' ------------------------------------------
TheEnd:
    END
' ------------------------------------------

Feel free to experiment with these and make sure they're working as they should.

PRINT EXPRESSIONS:
While we are on the subject, we may as well add math expressions to the Print command. In the Basic language, just about any math or string expression can be placed on the Print command line, as I've said before, just as long as the expression is not an assignment statement. Example:

Assignment: abc = 100*(10/xyz)+(2*xyz)
Print:      PRINT 100*(10/xyz)+(2*xyz)

We have already written the math routines, in Rdparser.c. Now all we have to do is make them accessible to the Print command. Currently, the Print parser examines the Print statement and is expecting either a variable name, a tokenized string function (digit), or a quoted string, as in this snippet shown here:

    if(isalpha(ch))         /* print variable */
    [snip]
    else if(isdigit(ch))    /* string function */
    [snip]
    else if(ch == quote)    /* quoted string */
    [snip]

Our print parser cannot handle statements like these:

PRINT abc + xyz
PRINT (abc + xyz)
PRINT 1 + (abc * xyz)

Except for printing simple variables, we have no capability of printing the result of an expression. In order to display this statement:

PRINT abc + xyz

we will need to make some fundamental changes to the way our print parser works. Presently, if the current character in the statement is an alpha character, we get the variable name and branch to either get_strvar() or get_prnvar(), as shown here:

    if(isalpha(ch))         /* --- print variable --- */
    {
        strcpy(s_holder, get_varname());
        pi = e_pos;
        ch = p_string[pi];
        if(ch == '$')
        {
            get_strvar();   /* string variable */
        }
        else
        {
            get_prnvar();   /* numeric variable */
        }
    }
    [snip]

The problem with that is, if we are going to call the math parser, the math parser expects to retrieve the variable name for itself. Calling get_varname() has the effect of advancing the program pointer "pi" beyond the variable name. What we need to do is isolate the call to get_prnvar() from that process. We can do that by changing the process in this way:

    if(isalpha(ch))         /* --- print variable --- */
    {
        type = get_vtype(pi);
        if(type == 3)
        {
            strcpy(s_holder, get_varname());
            get_strvar();   /* string variable */
        }
        else
        {
            get_prnvar();   /* variable or expression */
        }
    }
    [snip]

Here, if the variable name type is a string variable "$", we can proceed with getting the variable name and then making the branch to get_strvar(). Otherwise, we call get_prnvar(), without the varname. At get_prnvar(), we would then make the call to rdp_main(), the math parser:

void get_prnvar()
{
[snip]
    /* --- call math parser --- */
    value = rdp_main();
[snip]

The value returned in variable: "value" will then be displayed in the normal manner. To display the results of this statement:

PRINT 1 + (abc * xyz)
we will need to change this handler:

    else if(isdigit(ch))
    {
        prn_strfunc();          /* string function */
    }

to this:

    else if(isdigit(ch))
    {
        proc_digit();           /* string function or expression */
    }

This new function, proc_digit(), will determine whether we have a string function or an expression and branch accordingly. We call function prn_strfunc() when a string function has been tokenized to look like this:

2(abc$, 3)

In this expression:

1+(abc * xyz)

the subtle difference is that the digit is followed by an AddOp, the "plus" symbol. In the string function statement, the digit is followed by the left parenthesis. So, we can distinguish a string function from a math expression by knowing what character follows the digit(s). We can do that by advancing to the end of the digits and isolating the next character. We can also scan across and look for a parenthesis. Example:

            2(abc$, 3)
find paren---^
             ^---first character past digits

If the positions are the same, we know we have located a string function.

Next example:

         1 + (abc * xyz)
find paren---^
           ^---first character past digits

Here the pointers do not point to the same character, so we know it is not a string function and, therefore, it must be a math expression. Finding an expression, as in this case, would branch to the math parser. In this next statement:

PRINT (abc + xyz)

since the first character is the left paren, we can safely assume that this is an expression. No other type of statement would have a left paren as the first character in the statement. So all we need is to add a test for that character:

    else if(ch == paren)
    {
        get_prnvar();           /* expression */
    }

We would call rdp_main(), the math parser, through get_prnvar(). For file Output.c, here is the new code:

void parse_print()
{
    char ch, quote='\"', paren='(';
    int pi, type, ab_code=9, x=line_ndx;

    pi = e_pos;
    pi = iswhiter(pi);
    ch = p_string[pi];
    e_pos = pi;
    /* --- print newline --- */
    if(strchr(":\n", ch))
    {
        printf("\n");
        return;
    }
    /* --- LOOP: multiple print statements --- */
    while(ch != '\n')
    {
        if(isalpha(ch))         /* --- print variable --- */
        {
            type = get_vtype(pi);
            if(type == 3)
            {
                strcpy(s_holder, get_varname());
                get_strvar();   /* string variable */
            }
            else
            {
                get_prnvar();   /* variable or expression */
            }
        }
        else if(isdigit(ch))
        {
            proc_digit();       /* string function or expression */
        }
        else if(ch == paren)
        {
            get_prnvar();       /* expression */
        }
        else if(ch == quote)    /* quoted string */
        {
            get_prnstring();
        }
        /* --- error: --- */
        else
        {
            a_bort(ab_code, x);
        }
        /* --- return from subroutines --- */
        pi = e_pos;
        ch = p_string[pi];
        /* --- is it end of statement --- */
        if(ch != '\n')
        {
            pi++;
            pi = iswhiter(pi);
            ch = p_string[pi];
            e_pos = pi;
        }
        /* --- LOOP: if more to print --- */
    }
}
/*-------- end parse_print --------*/

void proc_digit()
{
    char ch, val_strng[VAR_NAME];
    int pi, len, px;
    double value;

    pi = e_pos;
    len = strlen(p_string);
    px = get_paren(pi, len);        /* find next paren: "(" */
    pi = get_NextOp(pi);            /* find next Op */
    if(pi == px)                    /* string function: 1(abc$) */
    {
        prn_strfunc();              /* if px is paren----^ */
    }
    else
    {
        value = rdp_main();         /* math expression */
        strcpy(val_strng, value2strng(value));
        pi = e_pos;
        pi = iswhiter(pi);
        ch = p_string[pi];
        e_pos = pi;
        if(ch == ',')       printf("%s \t", val_strng);
        else if(ch == ';')  printf("%s ", val_strng);
        else                printf("%s \n", val_strng);
    }
}
/*-------- end proc_digit --------*/

void get_prnvar()
{
    char ch, val_strng[VAR_NAME];
    int pi;
    double value;

    /* --- call math parser --- */
    value = rdp_main();
    /* --- convert value to string$ --- */
    strcpy(val_strng, value2strng(value));
    pi = e_pos;
    pi = iswhiter(pi);
    ch = p_string[pi];
    e_pos = pi;
    if(ch == ',')       printf("%s \t", val_strng);
    else if(ch == ';')  printf("%s ", val_strng);
    else                printf("%s \n", val_strng);
}
/*--------- end get_prnvar -----------*/
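The same three-way test on the trailing delimiter now appears in prn_strfunc(), proc_digit() and get_prnvar(). If you would rather keep it in one place, it could be factored out along the lines of this sketch. format_item() is a hypothetical helper, not part of Bxbasic, and it assumes a compiler that provides the C99 function snprintf(). It follows the prn_strfunc() convention of adding no extra space after the text.

```c
#include <stdio.h>

/* Write text into out, followed by the output implied by delim:
   ','  -> tab
   ';'  -> nothing
   anything else -> newline
   Returns the value of snprintf(), i.e. the length required. */
int format_item(char *out, size_t size, const char *text, char delim)
{
    const char *tail;

    if (delim == ',')
        tail = "\t";
    else if (delim == ';')
        tail = "";
    else
        tail = "\n";
    return snprintf(out, size, "%s%s", text, tail);
}
```

Writing into a buffer rather than calling printf() directly makes the rule easy to check on its own; the caller would then print the buffer with printf("%s", out).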

Copy these changes to file Output.c and save it.


These new functions go in file: Utility.c:

int get_paren(int pi, int stlen)
{
    char ch;

    ch = p_string[pi];
    while((strchr("()", ch) == 0) && (pi < stlen))
    {
        pi++;
        ch = p_string[pi];
    }
    return pi;
}
/*---------- end get_paren ----------*/

int get_NextOp(int pi)
{
    char ch;

    ch = p_string[pi];
    while(isalnum(ch))
    {
        pi++;
        ch = p_string[pi];
    }
    pi = iswhiter(pi);
    return pi;
}
/*---------- end get_NextOp ----------*/
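To try the string-function test away from the rest of the interpreter, the idea can be reduced to a small stand-alone routine. This is a simplified sketch, not the code used by proc_digit(): is_string_function() and skip_blanks() are hypothetical names, the statement is passed in as a parameter instead of being read from p_string, and only digits (rather than any alphanumeric character) are skipped before looking for the left parenthesis.

```c
#include <ctype.h>

/* Skip spaces and tabs starting at pi; return the new index. */
static int skip_blanks(const char *s, int pi)
{
    while (s[pi] == ' ' || s[pi] == '\t')
        pi++;
    return pi;
}

/* Return 1 if s, starting at pi, looks like a tokenized string
   function such as "2(abc$, 3)", and 0 if it looks like a math
   expression such as "1 + (abc * xyz)". */
int is_string_function(const char *s, int pi)
{
    /* advance past the leading digits (the token or the number) */
    while (isdigit((unsigned char)s[pi]))
        pi++;
    pi = skip_blanks(s, pi);
    /* a string function has its paren directly after the digits */
    return s[pi] == '(';
}
```

For the two examples discussed earlier, is_string_function("2(abc$, 3)", 0) yields 1 and is_string_function("1 + (abc * xyz)", 0) yields 0.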

Update these prototype lists in file: Prototyp.h:

/* Output.c */
void beep(void);
void cls(void);
void get_prnstring(void);
void get_prnvar(void);
void locate(void);
char *value2strng(double);
void get_strvar(void);
void parse_print(void);
void reset_cursor(int,int,int,int);
void set_TabNl(int);
int cursor_col(void);
int cursor_row(void);
void prn_strfunc(void);
void proc_digit(void);

/* Utility.c */
int get_upper(int,int);
int get_alpha(int,int);
int get_digit(int,int);
int iswhite(int);
void clr_arrays(void);
int iswhiter(int);
int find_strng(char *);
int get_paren(int,int);
int get_NextOp(int);

Now compile Bxbasic.c. With Bxbasic.exe, run this version of Test.bas:

' test.bas version 10.4
    CLS
    PRINT ">"; STR$(1000); "<"
    PRINT ">"; STRING$(10, 251); "<"
    PRINT CHR$(247)
    abc = 10
    xyz = 3
    PRINT abc * xyz
    PRINT 1 + (abc * xyz)
    PRINT (abc / xyz) * 2
    PRINT 1+(2*5)/3
' ------------------------------------------
TheEnd:
    END
' ------------------------------------------

INPUT$:
The function INPUT$ is another string function and its usage appears like this:

a$ = INPUT$(n)

Where (n) represents the length, in number of characters, that the string is required to be. Up to 255 characters may be entered. An example of how this might be used is when prompting the user to enter a date, or a phone number, or any type of data that has a specific number of characters.

Example:

Enter Phone:          [123]-[555]-[1234]
                       ^^^   ^^^   ^^^^
area$ = INPUT$(3)   '---^     ^     ^
prefix$ = INPUT$(3) '---------^     ^
suffix$ = INPUT$(4) '---------------^

Enter Date:           [12]/[01]/[2003]
                       ^^   ^^   ^^^^
Month$ = INPUT$(2)  '---^    ^    ^
Day$ = INPUT$(2)    '--------^    ^
Year$ = INPUT$(4)   '-------------^

Input$ does not require a return to terminate the input. As soon as the correct number of characters has been entered, Input$ automatically terminates. We start by adding INPUT to the bottom of the list of string functions in get_strfunc(), in file: Input.c:

void get_strfunc(char *name,int ndx)        /* identify function */
{
    char varname[VAR_NAME], temp[TOKEN_LEN];

    strcpy(varname, name);
    temp[0] = '\0';
    /* --- now compare to functions --- */
    if(strcmp(varname, "CHR") == 0)
    {
        strcpy(temp, " 1");         /* is this a chr$(n) assnmnt */
    }
    else if(strcmp(varname, "LEFT") == 0)
    {
        strcpy(temp, " 2");         /* a left$(a$,n) assnmnt */
    }
    else if(strcmp(varname, "RIGHT") == 0)
    {
        strcpy(temp, " 3");         /* a right$(a$,n) assnmnt */
    }
    else if(strcmp(varname, "MID") == 0)
    {
        strcpy(temp, " 4");         /* a mid$(a$,s,n) assnmnt */
    }
    else if(strcmp(varname, "SPACE") == 0)
    {
        strcpy(temp, " 5");         /* a space$(n) assnmnt */
    }
    else if(strcmp(varname, "STR") == 0)
    {
        strcpy(temp, " 6");         /* a str$(x) assnmnt */
    }
    else if(strcmp(varname, "STRING") == 0)
    {
        strcpy(temp, " 7");         /* a string$(n,a) assnmnt */
    }
    else if(strcmp(varname, "INKEY") == 0)
    {
        strcpy(temp, " 8");         /* an inkey$ assnmnt */
    }
    else if(strcmp(varname, "INPUT") == 0)
    {
        strcpy(temp, " 9");         /* an input$ assnmnt */
    }
    str_copy(temp,ndx);             /* replace function name w/token */
}
/*---------- end get_strfunc ----------*/

Make this addition to Input.c, and save it. In function asn_function(), in file: Strings.c, we need to add INPUT$ to the list of function calls, as well. We add "case 9", which calls function inputstr():

void asn_function()
{
    int type;

    type = (int) get_avalue();
    switch(type)
    {
        case 1:
            chrstr();           /* a$ = CHR$(n) */
            break;
        case 2:
            leftstr();          /* a$ = LEFT$(x$,n) */
            break;
        case 3:
            rightstr();         /* a$ = RIGHT$(x$,n) */
            break;
        case 4:
            midstr();           /* a$ = MID$(x$,n,n) */
            break;
        case 5:
            spacestr();         /* a$ = SPACE$(n) */
            break;
        case 6:
            strsval();          /* a$ = STR$(n) */
            break;
        case 7:
            stringstr();        /* a$ = STRING$(n,c) */
            break;
        case 8:
            inkeystr();         /* a$ = INKEY$ */
            break;
        case 9:
            inputstr();         /* a$ = INPUT$(n) */
            break;
        default:
            /* error */
            break;
    }
}
/*---------- end asn_function ----------*/

Function inputstr() begins by getting the count of characters to be input. All normal keyboard characters, between ASCII 32 and 126, may be entered and added to the destination string. The backspace key will function as expected, by moving the cursor back one position and moving back the string character pointer.

void inputstr()
{
    char ch;
    int pi, count, i, len;

    pi = e_pos;                     /* pi enters pointing to: (n) */
    pi++;                           /* advance to number: n) */
    e_pos = pi;
    /* --- get count --- */
    count = (int) get_avalue();
    /* --- zero s_holder --- */
    for(i=0; i <= count; i++)
    {
        s_holder[i] = '\0';
    }
    i = 0;
    while(i < count)
    {
        ch = getche();
        if((ch > 31) && (ch < 127))         /* ascii chars */
        {
            s_holder[i] = ch;
            i++;
        }
        else if((ch == 8) && (i > 0))       /* backspace */
        {
            s_holder[i] = '\0';
            i--;
        }
    }
    s_holder[count] = '\0';
}
/*---------- end inputstr ----------*/
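Because inputstr() reads the console directly with getche(), it is awkward to test without typing at the keyboard. One way around that is to pass the key source in as a function pointer, as in the sketch below. None of this is part of Bxbasic: fixed_input(), scripted_key() and key_script are hypothetical names, and the scripted source simply replays a string in place of real keystrokes.

```c
/* Scripted key source used only for demonstration; a console
   build would pass a wrapper around getche() instead. */
static const char *key_script = "";

static int scripted_key(void)
{
    return *key_script ? (unsigned char)*key_script++ : -1;
}

/* Collect up to count printable characters into dest, which must
   hold count+1 bytes.  Backspace (8) removes the last character.
   Stops when count characters are stored or when next_key()
   returns a negative value.  Returns the number stored. */
int fixed_input(char *dest, int count, int (*next_key)(void))
{
    int i = 0, key;

    while (i < count && (key = next_key()) >= 0) {
        if (key > 31 && key < 127) {
            dest[i++] = (char)key;      /* printable ASCII */
        } else if (key == 8 && i > 0) {
            i--;                        /* backspace: drop last char */
        }
    }
    dest[i] = '\0';
    return i;
}
```

With key_script set to "12\bX3" and a count of 3, the buffer ends up holding "1X3": the backspace removes the "2" before the "X" is stored.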

Copy this to Strings.c and save it. Update the prototype list for Strings.c, in file: Prototyp.h:

/* Strings.c */
void parse_str(char *);
void strng_assgn(void);
void strvar_assgn(void);
void asn_function(void);
void chrstr(void);
void leftstr(void);
void rightstr(void);
void midstr(void);
void spacestr(void);
void strsval(void);
void stringstr(void);
int get_strndx(void);
char *str_express(void);
void inkeystr(void);
void inputstr(void);

Compile Bxbasic.c. With Bxbasic.exe, run this version of Test.bas:

' test.bas version 10.5
    CLS
    abc$ = INPUT$(10)
    PRINT "": abc$
' ------------------------------------------
TheEnd:
    END
' ------------------------------------------

LINE INPUT:
The Line Input command is similar to the Input command, with only a minor difference. Instead of accepting input for multiple variables and variable types and displaying multiple prompts, Line Input will only accept input for a single string variable. The usage is shown here:

LINE INPUT [;]["prompt";] abc$
Up to 255 characters may be entered and input is terminated with the [enter] key. Since LINE INPUT is actually two keywords, separated by a single space, we can't test for this in the normal way that we test for keywords, in function get_byte(), in Input.c. Here is the code fragment we use to retrieve the keyword:

    while(isalnum(ch))
    {
        keyword[si] = ch;
        si++;
        pi++;
        ch = p_string[pi];
    }
    keyword[si] = '\0';

As you can see, as soon as we encounter the space character, this while-loop is broken:

LINE INPUT abc$
    ^---break while-loop

Fortunately, this is the only occurrence of the keyword "Line", so we can assume (for now), that this is an implied "Line Input" statement. We will then test for the keyword "Line", as shown here:

    else if(strcmp(keyword, "LINE") == 0)    byte=18;

In file: Input.c, here is the code for get_byte():

int get_byte(int ii)
{
[snip]
    else if(strcmp(keyword, "INPUT") == 0)   byte=17;
    else if(strcmp(keyword, "LINE") == 0)    byte=18;
    else
[snip]
}
/*---------- end get_byte ----------*/

In function tmp_prog(), we need to advance the program index "pi" past the keyword "INPUT", so that it doesn't get written to the program array. If we didn't, the program statement would end up looking like this:

INPUT abc$
and the program would assume that INPUT was a variable name and attempt to make an assignment, which of course would cause an error.

void tmp_prog(int ii)
{
    char ch, prog[BUFSIZE];
    int pi, si=0, len;

    len = strlen(p_string);
    pi = e_pos;
    pi = iswhite(pi);
    /* --- correct: LINE INPUT --- */
    if(temp_byte[ii] == 18)
    {
        ch = p_string[pi];
        while(isupper(ch))
        {
            pi++;                   /* advance past INPUT */
            ch = p_string[pi];
        }
        pi = iswhite(pi);
        ch = p_string[pi];
        while(ch != '\0')
        {
            prog[si] = ch;
            si++;
            pi++;
            ch = p_string[pi];
        }
        prog[si] = '\0';
    }
    else if((pi < len) && (temp_byte[ii] != 0))
    {
        ch = p_string[pi];
        while(ch != '\0')
        {
            prog[si] = ch;
            si++;
            pi++;
            ch = p_string[pi];
        }
        prog[si] = '\0';
    }
    else
    {
        strcpy(prog, "\n\0");
    }
    strcpy(temp_prog[ii], prog);
}
/*---------- end tmp_prog ----------*/
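Note that tmp_prog() skips whatever uppercase word follows "LINE" without checking that it really is "INPUT". If you prefer a stricter test, the two-word keyword can be matched in one step, as in this sketch. match_line_input() is a hypothetical helper, not part of Bxbasic, and it works on a string passed as a parameter instead of on p_string.

```c
#include <ctype.h>
#include <string.h>

/* If s begins with "LINE", one or more blanks, then "INPUT" as a
   whole word, return the index of the first character after
   "INPUT".  Otherwise return -1. */
int match_line_input(const char *s)
{
    int pi = 4;

    if (strncmp(s, "LINE", 4) != 0)
        return -1;
    if (s[pi] != ' ' && s[pi] != '\t')
        return -1;                      /* e.g. "LINEX" */
    while (s[pi] == ' ' || s[pi] == '\t')
        pi++;
    if (strncmp(s + pi, "INPUT", 5) != 0)
        return -1;
    pi += 5;
    if (isalnum((unsigned char)s[pi]))
        return -1;                      /* e.g. "INPUTS" */
    return pi;
}
```

A return value of -1 would let the caller report an error for a statement such as "LINE abc$", instead of silently treating it as Line Input.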

Copy these changes to Input.c and save it. The next thing we need to do is add Line-Input to parser() in Bxbasic.c. As shown below, add a "case 18", which will call function get_lninput():


    void parser()
    {
        [snip]
            case 17:            /* INPUT */
                get_input();
                break;
            case 18:            /* LINE INPUT */
                get_lninput();
                break;
            case -1:            /* block label */
                break;
            default:
                a_bort(ab_code, x);
                break;
        }
    }
    /*-------------------------------*/

In file: Getinput.c, we will add the new function get_lninput():

    void get_lninput()
    {
        char ch, varname[VAR_NAME];
        int pi, loc=0;

        pi = e_pos;
        ch = p_string[pi];
        if(ch == ';')               /* do not echo newline */
        {
            loc = 1;
            pi++;
            pi = iswhite(pi);
            ch = p_string[pi];
            e_pos = pi;
        }
        if(ch == '\"')              /* Prompt: "enter string"; */
        {
            get_prnstring();
            pi = e_pos;
            pi++;
            pi = iswhite(pi);
            ch = p_string[pi];
            e_pos = pi;
        }
        /* --- input string --- */
        strcpy(varname, get_varname());
        input_str(varname,loc);
    }
    /*---------- end get_lninput ----------*/


Update the prototype list for Getinput.c, in file: Prototyp.h:

    /* Getinput.c */
    void get_input(void);
    void input_str(char *,int);
    void input_val(char *,int);
    void get_lninput(void);

Now re-compile Bxbasic.c. With Bxbasic.exe, run this Test.bas and make note of the comments:

    ' test.bas version 10.6
    CLS
    PRINT "1: ";
    LINE INPUT abc$
    '          ^-----notice no prompt
    PRINT abc$
    LINE INPUT "2:enter a string: "; abc$
    PRINT abc$
    LINE INPUT ;"3:enter a string: "; abc$
    '          ^------no echo:return
    PRINT abc$
    LINE INPUT ;"4:enter a string: "; abc$,
    ' no echo--^ insert tab---------------^
    PRINT abc$
    ' ------------------------------------------
    TheEnd:
    END
    ' ------------------------------------------

MATH FUNCTIONS:
We've already added string functions; now it's time we added some math functions. The standard math functions are:
• ABS(n): returns absolute value of n.
• ASC($): returns ascii code of string-character.
• ATN(n): returns arctangent of n.
• COS(n): returns cosine of n.
• SIN(n): returns sine of n.
• TAN(n): returns tangent of n.
• SQRT(n): returns square root of n.
• INT(n): converts n to integer.


The typical usage for all of these would be similar to this:

    abc = ABS(xyz)
    abc = ASC(a$)

The functions return these types of values:
• int = ABS(n)
• int = ASC($)
• float = ATN(n)
• float = COS(n)
• float = SIN(n)
• float = TAN(n)
• float = SQRT(n)
• int = INT(n)

To enable math functions we will begin in file: Input.c, with function str_functn(). Math functions are similar to string functions, both in appearance and in where they show up in the statement:

    abc$ = CHR$(32)
    abc = ASC(a$)

so we can use some of the code we already have in str_functn() to help us and we will add a little more. Here is the code snippet that, if there appears to be a string function, causes a branch in the program:

    strcpy(temp, get_varname());        /* get varname */
    s_pos = e_pos;
    ch = p_string[s_pos];
    if(ch == '$')                       /* valid '$' symbol */
    {
        e_pos = pi;
        get_strfunc(temp,ndx);          /* identify function */
    }
    [snip]

At the beginning of this section, a possible function name, in uppercase, has been detected. The final test is whether there is a "$" symbol or not. Math functions do not have the "$" symbol. For our purposes, we could add an "else" clause and branch to a procedure that will process the suspected math function. Like this:

    if(ch == '$')                       /* valid '$' symbol */
    {
        e_pos = pi;
        get_strfunc(temp,ndx);          /* identify function */
    }
    else
    {
        e_pos = pi;
        get_mathfunc(temp,ndx);         /* identify math function */
    }
    [snip]

For file: Input.c, here is the new code for str_functn() and str_funct2(), since both are affected in the same way:

    void str_functn(int ndx)                /* tokenize string functions */
    {
        char ch, temp[VAR_NAME];
        int pi=0, len;

        strcpy(p_string, temp_prog[ndx]);
        while(IsEqu(pi) == 0)               /* advance to '=' sign */
        {
            pi = e_pos;
            pi++;
        }
        pi = e_pos;                 /* leave this, IsEqu() changes e_pos */
        len = strlen(p_string);
        /* --- loop --- */
        while(pi < len)
        {
            pi = get_upper(pi, len);        /* advance to uppercase */
            if(pi < len)
            {
                e_pos = pi;
                strcpy(temp, get_varname());    /* get varname */
                s_pos = e_pos;
                ch = p_string[s_pos];
                if(ch == '$')                   /* valid '$' symbol */
                {
                    e_pos = pi;
                    get_strfunc(temp,ndx);      /* identify function */
                }
                else
                {
                    e_pos = pi;
                    get_mathfunc(temp,ndx);     /* identify math function */
                }
                pi = e_pos;
                pi++;
            }
        }
    }
    /*---------- end str_functn ----------*/


    void str_funct2(int ndx)        /* tokenize string functions: Print */
    {
        char ch, temp[VAR_NAME];
        int pi=0, len;

        strcpy(p_string, temp_prog[ndx]);
        len = strlen(p_string);
        /* --- loop --- */
        while(pi < len)
        {
            pi = get_upper(pi, len);        /* advance to uppercase */
            if(pi < len)
            {
                e_pos = pi;
                strcpy(temp, get_varname());    /* get varname */
                s_pos = e_pos;
                ch = p_string[s_pos];
                if(ch == '$')                   /* valid '$' symbol */
                {
                    e_pos = pi;
                    get_strfunc(temp,ndx);      /* identify function */
                }
                else
                {
                    e_pos = pi;
                    get_mathfunc(temp,ndx);     /* identify math function */
                }
                pi = e_pos;
                pi++;
            }
        }
    }
    /*---------- end str_funct2 ----------*/

There is one detail we have to deal with though. When we tokenize this statement containing a string function:

abc$ = CHR$(32)
into this:

abc$ = 1(32)
we will encounter a problem when we go to tokenize a math function:

abc = ASC(x$)
If we use the same procedure and replace the function name with a numeric token, we will have no way of telling string functions from math functions. Example:

    abc$ = 1(32)      ' string function
    abc = 1(x$)       ' math function

In an assignment statement, as shown here, you can infer the function type from the destination variable type. But what about in a Print statement?

    (Print): 1(32)    ' string
    (Print): 1(x$)    ' math


We could say that all tokens from 1 to 127 are string function tokens and tokens 128 to 255 are math functions.

    (Print): 1(32)      ' string
    (Print): 128(x$)    ' math

That might work, but then we would need to test each function for range before we branched. That brings up another problem, though, and it has to do with parsing math expressions. Example:

    abc = 1*(x/3)
    abc = 1(x$)

Obviously, the second expression would generate an error, because the math parser would be expecting an operator between the number and the left paren. It might just be easier to add a symbol to the token, to distinguish one token type from the other. Example:

    (Print): 1(32)      ' string
    (Print): 1*(x/3)    ' math
    (Print): ?1(x$)     ' math

Since we have already made provisions for dealing with numerically tokenized functions, we can, with no real difficulty, add the "?" symbol as a token representing math functions. The "?" character serves no other purpose in our program, so when we encounter a "?" we will know what it represents.

Here is the code for our new function: get_mathfunc(), which will tokenize math functions:

    void get_mathfunc(char *name,int ndx)   /* identify math function */
    {
        char varname[VAR_NAME], temp[TOKEN_LEN];

        strcpy(varname, name);
        temp[0] = '\0';
        /* --- now compare to functions --- */
        if(strcmp(varname, "ABS") == 0)
        {
            strcpy(temp, " ?1");            /* a abs(n) assnmnt */
        }
        else if(strcmp(varname, "ASC") == 0)
        {
            strcpy(temp, " ?2");            /* a asc(a$) assnmnt */
        }
        else if(strcmp(varname, "ATN") == 0)
        {
            strcpy(temp, " ?3");            /* a atn(n) assnmnt */
        }
        else if(strcmp(varname, "COS") == 0)
        {
            strcpy(temp, " ?4");            /* a cos(n) assnmnt */
        }
        else if(strcmp(varname, "SIN") == 0)
        {
            strcpy(temp, " ?5");            /* a sin(n) assnmnt */
        }
        else if(strcmp(varname, "TAN") == 0)
        {
            strcpy(temp, " ?6");            /* a tan(n) assnmnt */
        }
        else if(strcmp(varname, "SQRT") == 0)
        {
            strcpy(temp, " ?7");            /* a sqrt(n) assnmnt */
        }
        else if(strcmp(varname, "INT") == 0)
        {
            strcpy(temp, " ?8");            /* a int(n) assnmnt */
        }
        str_copy(temp,ndx);         /* replace function name w/token */
    }
    /*---------- end get_mathfunc ----------*/
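As an aside, the same name-to-token mapping could also be written as a lookup table. This is only an illustrative alternative, not the code Bxbasic uses; the names math_names, math_token_number() and math_token_text() are hypothetical, while the token numbers 1 to 8 and the " ?n" token text follow the listing above.

```c
#include <stdio.h>
#include <string.h>

/* Function names in token order: index 0 is token 1, index 7 is token 8. */
static const char *const math_names[] = {
    "ABS", "ASC", "ATN", "COS", "SIN", "TAN", "SQRT", "INT"
};

/* Return the token number (1..8) for a math function name, or 0 if unknown. */
int math_token_number(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof math_names / sizeof math_names[0]; i++) {
        if (strcmp(name, math_names[i]) == 0)
            return (int)i + 1;
    }
    return 0;
}

/* Write the token text, e.g. " ?7", into out. Returns 1 on success and
   0 (with out set to an empty string) if the name is not a math function. */
int math_token_text(const char *name, char *out, size_t cap)
{
    int n = math_token_number(name);

    if (n == 0 || cap < 4) {
        if (cap > 0)
            out[0] = '\0';
        return 0;
    }
    snprintf(out, cap, " ?%d", n);
    return 1;
}
```

With a table, adding a ninth function means adding one string rather than another else-if branch, at the cost of tying token numbers to array order.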

Copy these changes and additions to Input.c and save it. Since math functions will have to be incorporated into and callable from the math parser, we will need to add some code to allow that to happen. In file: Rdparser.c, in function Factor(), as shown here:

    {
        if(isalpha(ch))             /* variable name */
        {
            strcpy(s_holder, get_varname());
            value = get_varvalue();
            SkipWhite();
        }
        else                        /* numeric value */
        {
            value = GetNum();
        }

we test for variable names and number values. What we need to do now is add a test for the "?" character, which is our cue that this is a function. Here is how it should look:

    else if(ch == '?')              /* math function */
    {
        value = math_functn();
        SkipWhite();
    }

When the parser encounters the "?" symbol, it will then branch to math_functn(). Here is the new code for Factor():


    double Factor()             /* Parse and Translate a Math Factor */
    {
        char ch;
        int pi;
        double value;

        pi = e_pos;
        ch = p_string[pi];
        if(ch == '(')
        {
            Match('(');
            value = Expression();
            Match(')');
        }
        else
        {
            if(isalpha(ch))             /* variable name */
            {
                strcpy(s_holder, get_varname());
                value = get_varvalue();
                SkipWhite();
            }
            else if(ch == '?')          /* math function */
            {
                value = math_functn();
                SkipWhite();
            }
            else                        /* numeric value */
            {
                value = GetNum();
            }
        }
        return value;
    }
    /*---------- end Factor ----------*/
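To illustrate how a "?n(...)" token slots into a recursive-descent parser, here is a small self-contained sketch. It is not the Bxbasic parser: the names src, eval(), factor(), term() and expression() are hypothetical, variables are not supported, and only tokens 1 (ABS) and 8 (INT) are handled, both truncating to a long in the same way get_ABS() and get_INT() do in this chapter.

```c
#include <stdlib.h>

static const char *src;     /* hypothetical stand-in for p_string + e_pos */

static double expression(void);

static void skip_white(void)
{
    while (*src == ' ')
        src++;
}

static double factor(void)
{
    double value;
    char *end;

    skip_white();
    if (*src == '(') {                      /* parenthesized expression */
        src++;
        value = expression();
        skip_white();
        if (*src == ')')
            src++;
    } else if (*src == '?') {               /* math function token: ?n(expr) */
        int type;
        src++;                              /* step past '?' */
        type = (int)strtol(src, &end, 10);  /* read the token number */
        src = end;
        value = factor();                   /* the "(expr)" argument */
        switch (type) {
        case 1: value = (double)labs((long)value); break;   /* ABS */
        case 8: value = (double)(long)value;       break;   /* INT */
        default: break;                     /* unknown token: pass through */
        }
    } else {                                /* numeric value */
        value = strtod(src, &end);
        src = end;
    }
    skip_white();
    return value;
}

static double term(void)
{
    double value = factor();

    while (*src == '*' || *src == '/') {
        char op = *src++;
        double rhs = factor();
        value = (op == '*') ? value * rhs : value / rhs;
    }
    return value;
}

static double expression(void)
{
    double value = term();

    while (*src == '+' || *src == '-') {
        char op = *src++;
        double rhs = term();
        value = (op == '+') ? value + rhs : value - rhs;
    }
    return value;
}

/* Evaluate a tokenized expression such as "?8(10 / 4) * 2". */
double eval(const char *text)
{
    src = text;
    return expression();
}
```

Because the function argument is itself parsed by factor(), a tokenized call can appear anywhere a number or variable could, which is the property the "?" test in Factor() relies on.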

Here is the code for math_functn(), and it is followed by the individual math functions:

    double math_functn()
    {
        int pi, type;
        double value=0;

        pi = e_pos;                     /* enter: ?1(x) */
        pi++;                           /* ----^        */
        e_pos = pi;
        type = (int) get_avalue();
        switch(type)
        {
            case 1:
                value = get_ABS();      /* a = ABS(exp) */
                break;
            case 2:
                value = get_ASC();      /* a = ASC(str$) */
                break;
            case 3:
                value = get_ATN();      /* a = ATN(exp) */
                break;
            case 4:
                value = get_COS();      /* a = COS(exp) */
                break;
            case 5:
                value = get_SIN();      /* a = SIN(exp) */
                break;
            case 6:
                value = get_TAN();      /* a = TAN(exp) */
                break;
            case 7:
                value = get_SQRT();     /* a = SQRT(exp) */
                break;
            case 8:
                value = get_INT();      /* a = INT(exp) */
                break;
            default:
                /* error */
                break;
        }
        return value;
    }
    /*---------- end math_functn ----------*/

    double get_ABS()
    {
        long ivalue;            /* do not change because: abs(int!) */
        double value;

        ivalue = (long) Factor();
        value = (double) abs(ivalue);
        return value;
    }
    /*---------- end get_ABS ----------*/

    double get_ASC()
    {
        char ch;
        int pi, ndx, len;
        double value;

        pi = e_pos;
        pi++;
        pi = iswhite(pi);
        ch = p_string[pi];
        e_pos = pi;
        if(isalpha(ch))                 /* string varname */
        {
            ndx = get_strndx();
            ch = sv_stack[ndx][0];
        }
        else                            /* quoted "string" */
        {
            pi++;
            ch = p_string[pi];
        }
        len = strlen(p_string);
        pi = get_paren(pi, len);        /* advance to: ')' */
        pi++;
        e_pos = pi;
        value = (double) ch;
        return value;
    }
    /*---------- end get_ASC ----------*/

    double get_ATN()
    {
        double value;

        value = Factor();
        value = atan(value);
        return value;
    }
    /*---------- end get_ATN ----------*/

    double get_COS()
    {
        double value;

        value = Factor();
        value = cos(value);
        return value;
    }
    /*---------- end get_COS ----------*/

    double get_SIN()
    {
        double value;

        value = Factor();
        value = sin(value);
        return value;
    }
    /*---------- end get_SIN ----------*/

    double get_TAN()
    {
        double value;

        value = Factor();
        value = tan(value);
        return value;
    }
    /*---------- end get_TAN ----------*/

    double get_SQRT()
    {
        double value;

        value = Factor();
        value = sqrt(value);
        return value;
    }
    /*---------- end get_SQRT ----------*/

    double get_INT()
    {
        long ivalue;
        double value;

        ivalue = (long) Factor();
        value = (double) ivalue;
        return value;
    }
    /*---------- end get_INT ----------*/

Copy these changes and additions to Rdparser.c, and save it. Update these prototype lists in: Prototyp.h:

    /* Input.c */
    void line_cnt(char *argv[]);
    void load_src(void);
    void save_tmp(void);
    void tmp_byte(int);
    void loader_1(void);
    void tmp_label(int);
    void tmp_byte(int);
    int get_byte(int);
    void tmp_prog(int);
    void loader_2(void);
    void get_MOD(int);
    void token_if(int);
    void str_functn(int);
    int IsEqu(int);
    void get_strfunc(char *,int);
    void str_copy(char *,int);
    void str_funct2(int);
    void get_mathfunc(char *,int);

    /* Rdparser.c */
    double rdp_main(void);
    double Expression(void);
    double Term(void);
    double Factor(void);
    void Match(char);
    void _GetChar(void);
    double GetNum(void);
    int IsAddop(char);
    int IsMultop(char);
    int Is_White(char);
    void SkipWhite(void);
    double asc_2_dbl(void);
    double math_functn(void);
    double get_ABS(void);
    double get_ASC(void);
    double get_ATN(void);
    double get_COS(void);
    double get_SIN(void);
    double get_TAN(void);
    double get_SQRT(void);
    double get_INT(void);

Compile Bxbasic.c. Now, with Bxbasic.exe, run this Test.bas:

    ' test.bas version 10.7
    CLS
    xyz = 99
    abc = ABS(xyz - 1.75)
    PRINT abc
    abc = ASC("test")
    PRINT abc
    abc! = ATN(xyz / 3)
    PRINT abc!
    abc! = COS(5.8 * .0174533)
    PRINT abc!
    abc! = SIN(xyz / 11)
    PRINT abc!
    abc! = TAN(xyz / 10)
    PRINT abc!
    abc! = SQRT(xyz)
    PRINT abc!
    abc = INT(xyz / 3.1)
    PRINT abc
    ' ------------------------------------------
    TheEnd:
    END
    ' ------------------------------------------

PRINT MATH FUNCTIONS:
The final thing to do is to enable the printing of math functions, such as:

PRINT COS(5.8 * .0174533)
This will only require the addition of a few minor pieces of code. Basically, all we have to do is insert a call to rdp_main(), the math parser. There is one little problem though, concerning printing functions that we need to work around. The Print routine, that displays numeric values, uses the global variable "var_type" as a variable data type flag, to determine whether or not to display decimal places and if so, how many. Given this statement:

PRINT abc%
parse_print() will call get_prnvar(), which will in turn call rdp_main():

    parse_print:
        get_prnvar:
            rdp_main:
                get_varvalue: var_type = %

var_type is set during a call to get_varvalue() and it is assigned the variable type symbol. That's fine for variables, but math functions do not have a data type symbol, even though they do return a particular data type. When we print the result of a math function, we need to tell the print routine what data type to expect. If we don't, it will assume the data is of type integer, by default. Most math functions do not return an integer, so what would end up being printed would be in error. In the above illustration, we can set var_type to whatever we want on entering get_prnvar() and prior to calling rdp_main(), but the contents will change by the time we get back from rdp_main(). Since we cannot rely on using var_type, what we can do is assign a local variable, in get_prnvar(), the symbol "?", as a flag. This flag will indicate to the print routine that, when it's time to do the printing, it is dealing with a function and it needs to look at an alternate global variable ("func_type") for the data type. The global variable "func_type" will be set during the call to math_functn(). Here is the flow of events: in parse_print(), when we encounter the "?" symbol, we will assign "var_type" the character "?", which will be passed on to get_prnvar(), as shown here:

    else if(ch == '?')
    {
        var_type = '?';
        get_prnvar();               /* math function */
    }

Then, on entering get_prnvar(), as shown here, "ch" will be temporarily assigned the symbol in var_type:

    void get_prnvar()
    {
        char ch=var_type,
        [snip]

In function math_functn(), in Rdparser.c, the math functions 1, 2 and 8 return an integer value and the rest return floats. We will add a switch/case to assign variable func_type with the correct data type, as shown here:

    switch(type)
    {
        case 1:
        case 2:
        case 8:
            func_type = '%';
            break;
        default:
            func_type = '!';
            break;
    }
    [snip]

On returning to get_prnvar(), var_type will be re-assigned with the character contained in func_type:

    if(ch == '?')               /* was this a math function */
    {
        var_type = func_type;
    }
    [snip]

which will inform the print routine of the correct function data type.


Here is a new illustration:

    parse_print: var_type = ?
        get_prnvar: ch = var_type
            rdp_main:
                math_functn: func_type = [%!]
            if(ch == ?) var_type = func_type
            Print

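The effect of the type flag can be sketched in isolation. In the following stand-alone example, func_result_type() and format_by_type() are hypothetical helpers; the real formatting is done by value2strng(), whose code is not shown in this section, so the "%ld" and "%f" formats are assumptions made only for illustration.

```c
#include <stdio.h>

/* Return the result type symbol for a math function token:
   tokens 1 (ABS), 2 (ASC) and 8 (INT) are integers, the rest are floats. */
char func_result_type(int token)
{
    switch (token) {
    case 1:
    case 2:
    case 8:
        return '%';
    default:
        return '!';
    }
}

/* Format value according to a type symbol. '%' prints as an integer;
   any other symbol prints with decimal places (assumed format). */
void format_by_type(double value, char type, char *out, size_t cap)
{
    if (type == '%')
        snprintf(out, cap, "%ld", (long)value);
    else
        snprintf(out, cap, "%f", value);
}
```

If the type symbol were left at its integer default, a result such as COS(...) would lose its decimal places, which is the printing error the func_type flag is meant to prevent.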
Here are the new parse_print() and get_prnvar():

    void parse_print()
    {
        [snip]
        {
            if(isalpha(ch))             /* --- print variable --- */
            {
                type = get_vtype(pi);
                if(type == 3)
                {
                    strcpy(s_holder, get_varname());
                    get_strvar();       /* string variable */
                }
                else
                {
                    get_prnvar();       /* variable or expression */
                }
            }
            else if(isdigit(ch))
            {
                proc_digit();           /* string function or expression */
            }
            else if(ch == '?')
            {
                var_type = '?';
                get_prnvar();           /* math function */
            }
            else if(ch == paren)
            {
                get_prnvar();           /* expression */
            }
        [snip]
    /*-------- end parse_print --------*/


    void get_prnvar()
    {
        char ch=var_type, val_strng[VAR_NAME];
        int pi;
        double value;

        /* --- call math parser --- */
        value = rdp_main();
        if(ch == '?')               /* was this a math function */
        {
            var_type = func_type;
        }
        /* --- convert value to string$ --- */
        strcpy(val_strng, value2strng(value));
        var_type = '\0';            /* clear var_type and func_type */
        func_type = '\0';
        pi = e_pos;
        pi = iswhiter(pi);
        ch = p_string[pi];
        e_pos = pi;
        if(ch == ',')
            printf("%s \t", val_strng);
        else if(ch == ';')
            printf("%s ", val_strng);
        else
            printf("%s \n", val_strng);
    }
    /*--------- end get_prnvar -----------*/

Copy these changes to Output.c and save it. Copy this new code segment to math_functn(), in file: Rdparser.c:

    double math_functn()
    {
        [snip]
            default:
                /* error */
                break;
        }
        switch(type)
        {
            case 1:
            case 2:
            case 8:
                func_type = '%';
                break;
            default:
                func_type = '!';
                break;
        }
        return value;
    }
    /*---------- end math_functn ----------*/


In file: Bxbasic.c, add variable "func_type" to the global vars list:

    /* ------ global vars ------------ */
    ...
    [snip]
    char func_type;             /* current function type */
    /**/

Now compile Bxbasic.c. And, try this Test.bas:

    ' test.bas version 10.8
    CLS
    abc = 10
    xyz = 99
    PRINT ABS(xyz - 1.75)
    PRINT ASC("test")
    PRINT ATN(xyz / 3)
    PRINT COS(5.8 * .0174533)
    PRINT SIN(xyz / 11)
    PRINT TAN(xyz / 10)
    PRINT SQRT(xyz)
    PRINT INT(xyz / 3.1)
    ' ------------------------------------------
    TheEnd:
    END
    ' ------------------------------------------

CONCLUSION
Well, we've covered quite a lot in this chapter. We've added:
• INKEY$
• INPUT
• INPUT$
• LINE INPUT
• Math Functions
• Printing String Functions
• Printing Expressions
• Printing Math Functions

There's still more to come.


CHAPTER - 11
INTRODUCTION
Our next project will be to add disk file access. We will build routines to create disk files, store and retrieve data. Let's get started.

ENGINE UPDATE:
We first need to bring our runtime engine up to date with our Bxbasic interpreter. In the last chapter we made a number of changes to Bxbasic.c that we haven't carried over to Engine.c. Fortunately, because of the way we have designed Bxbasic.c and Engine.c, we only need to make some very small changes. Begin by opening Engine.c. In the 'function includes' section, add this 'include' for getinput.c:

    /* --- function includes --- */
    ...
    #include "getinput.c"

Next, add these two cases to the parser:

    case 17:            /* INPUT */
        get_input();
        break;
    case 18:            /* LINE INPUT */
        get_lninput();
        break;

That's it! Now compile Engine.c. Using Bxcomp.exe, try compiling a few of the Test.bas examples from the last chapter, like those shown here:

    ' test.bas version 11.1
    CLS
    Start:
    abc$ = INKEY$
    IF abc$ = "" THEN
    GOTO Start
    ENDIF
    PRINT abc$
    ' ------------------------------------------
    TheEnd:
    END
    ' ------------------------------------------

    ' test.bas version 11.2
    CLS
    PRINT "Press any key: ";
    Start:
    abc$ = INKEY$
    IF abc$ = "" THEN
    GOTO Start
    ENDIF
    PRINT abc$
    '
    PRINT "Enter your name:"
    INPUT ;"First: "; first$; " Initial: "; init$; " Last: "; last$:
    PRINT "Enter your age and birth date:"
    INPUT ;"Age: "; age; " Month: "; mo; "/Day: "; day; "/Year: "; year%:
    '
    PRINT first$, init$, last$
    PRINT age, mo, day, year%
    ' ------------------------------------------
    TheEnd:
    END
    ' ------------------------------------------

    ' test.bas version 11.3
    CLS
    abc = 10
    xyz = 99
    PRINT ABS(xyz - 1.75)
    PRINT ASC("test")
    PRINT ATN(xyz / 3)
    PRINT COS(5.8 * .0174533)
    PRINT SIN(xyz / 11)
    PRINT TAN(xyz / 10)
    PRINT SQRT(xyz)
    PRINT INT(xyz / 3.1)
    ' ------------------------------------------
    TheEnd:
    END
    ' ------------------------------------------


DISK FILE I/O:
Next up, we will begin working with sequential disk file input. Accessing disk files is perhaps one of the more important functions of a programming language. After all, often times the data that a user application works with is data stored in a disk file. In addition, after accepting user input, the data will need to be stored to disk. To store or retrieve data in a sequential disk file, the first step is to open the file for access. In the Standard Basic dialect, the statement looks something like this:

    OPEN "I", #1, "filename"

This is called the short format.

Let's begin by dissecting the OPEN statement. First off, the OPEN keyword will be tokenized during the compilation process. So, what we will have remaining, in the program statement, is the following:

"I", #1, "filename"
• The first character, (within double quotes, the letter I), signifies the mode of access. The modes of access are: INPUT, OUTPUT, APPEND and RANDOM. In this case, the letter 'I' symbolizes the INPUT mode.
• What follows is the hash-mark (or pound sign) followed by the number one. To access a file, you need to maintain what is called a File-Handle, for each file. In this example, the File-Handle for this file will be represented by the number '1'. For historical reasons, (and none other that I can think of), in the various dialects of Basic, File-Handles have been limited to a total of 15 file handles. We will do away with that limitation and set the number of file handles at 99.
• After the handle number comes the file name and path of the file that is being accessed.
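The three parts of the statement can be pulled apart with a few standard library calls. The following is a minimal sketch, assuming the path is always a quoted literal and contains no embedded quotes; parse_open_args() is a hypothetical helper and not the routine Bxbasic uses, which is built up step by step below.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parse the text that follows OPEN, e.g.:  "I", #1, "filename"
   On success, stores the mode letter, the file number (1..99) and the
   path, and returns 1. Returns 0 if any part is missing or out of range. */
int parse_open_args(const char *s, char *mode, int *file_num,
                    char *path, size_t cap)
{
    const char *p = s;
    const char *end;
    char *num_end;
    size_t n;

    while (*p != '\0' && !isupper((unsigned char)*p))   /* find mode letter */
        p++;
    if (*p == '\0')
        return 0;
    *mode = *p++;

    while (*p != '\0' && !isdigit((unsigned char)*p))   /* find file number */
        p++;
    if (*p == '\0')
        return 0;
    *file_num = (int)strtol(p, &num_end, 10);
    if (*file_num < 1 || *file_num > 99)
        return 0;
    p = num_end;

    p = strchr(p, '"');                     /* opening quote of the path */
    if (p == NULL)
        return 0;
    p++;
    end = strchr(p, '"');                   /* closing quote of the path */
    if (end == NULL)
        return 0;
    n = (size_t)(end - p);
    if (n + 1 > cap)
        return 0;                           /* path would not fit */
    memcpy(path, p, n);
    path[n] = '\0';
    return 1;
}
```

The order of the scans mirrors the order of the fields: mode first, then the handle number after the hash-mark, then the path.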

In the code listed below, we begin by making the determination of whether the access method is correct. Given this program statement:

"I", #1, "filename"
we must first determine if the character within the quotes represents the access mode.

    len = strlen(p_string);
    pi = e_pos;
    pi = get_upper(pi, len);
    /* ------------------------------ OPEN "I", #1, "path" */
    if(pi < len)
    {
        e_pos = pi;
        get_iomode(len);
    }

If the character contained within the quotes is an uppercase alpha character, then the get_iomode() function is called. Based on that alpha character, the next step is to determine the access mode and the file handle number. Here, the mode character is assigned to variable: io_mode.


    pi = e_pos;
    ch = p_string[pi];
    io_mode = ch;

Next, we need to advance past the hash-mark, to the file handle number.

    pi = get_digit(pi, len);        /* advance to the digit */
    fileno = get_avalue();          /* its value will be a 'float' */
    ndx = (int) fileno;             /* convert to integer */
    ndx--;
    fopen_short(io_mode, ndx, len);

Here, the file number will be assigned to the integer variable ndx and function fopen_short() is called. Now, we extract the file-name and path from the program statement. We have to allow for the possibility that the file-name and path will either be hard-coded into the program, within a double quoted string, or contained within a string variable. Example:

OPEN "I", #1, "path"
or

OPEN "I", #1, pathname$
Here is where we make that determination and the action taken:

    ch = p_string[pi];
    if(ch == quote)             /* OPEN "I", #1, "path" */
    {
        strng_assgn();
        strcpy(io_path, s_holder);
    }
    else                        /* OPEN "I", #1, pathname$ */
    {
        indx = get_strndx();
        strcpy(io_path, sv_stack[indx]);
    }

There are three details about any open file that need to be kept track of. Those are the path, mode and the actual file-pointer (handle) which is assigned by the operating system. We will be using a Structure to keep track of those details. Here is what that structure looks like:

    struct io_handles               /* File: handles structure */
    {
        char path[PATH];
        char mode;
        FILE *fptr;
    };
    struct io_handles fp[IOARRAY];  /* File: handles structure */

Here is where the disk file is opened and the path, mode and handle are stored in the structure:


    fp[ndx].mode = io_mode;             /* copy to structure */
    strcpy(fp[ndx].path, io_path);
    /* --- now open file for I/O-'type' --- */
    if(io_mode == 'I')
    {
        handle = fopen(io_path, "r");   /* open file for Input */
        fp[ndx].fptr = handle;
    }
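The handle table idea can be tried on its own. Below is a minimal sketch, assuming input mode only; open_input() and close_slot() are hypothetical helpers, while the structure layout and the IOARRAY and PATH sizes follow the text. Unlike the fragment above, this sketch records the path only after fopen() succeeds, so a failed open leaves the slot marked as free.

```c
#include <stdio.h>
#include <string.h>

#define IOARRAY 99
#define PATH    129

struct io_handles               /* one entry per file number */
{
    char path[PATH];            /* empty path means "slot is closed" */
    char mode;
    FILE *fptr;
};

static struct io_handles fp[IOARRAY];

/* Open slot ndx (0-based) for input. Returns 0 on success, -1 on failure. */
int open_input(int ndx, const char *io_path)
{
    FILE *handle;

    if (ndx < 0 || ndx >= IOARRAY)
        return -1;                          /* file number out of range */
    if (fp[ndx].path[0] != '\0')
        return -1;                          /* file number already in use */
    if (io_path[0] == '\0' || strlen(io_path) >= PATH)
        return -1;                          /* unusable path */
    handle = fopen(io_path, "r");
    if (handle == NULL)
        return -1;                          /* could not be opened */
    strcpy(fp[ndx].path, io_path);
    fp[ndx].mode = 'I';
    fp[ndx].fptr = handle;
    return 0;
}

/* Close slot ndx if it is open. Returns 0 on success, -1 otherwise. */
int close_slot(int ndx)
{
    if (ndx < 0 || ndx >= IOARRAY)
        return -1;
    if (fp[ndx].path[0] == '\0')
        return -1;                          /* nothing to close */
    fclose(fp[ndx].fptr);
    fp[ndx].path[0] = '\0';
    fp[ndx].mode = '\0';
    fp[ndx].fptr = NULL;
    return 0;
}
```

Using the first byte of the path as the "open" marker is the same convention the chapter's code relies on when it tests fp[ndx].path[0].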

FILE OPEN CODE:
Now it's time to start building the code. In many cases, we will only be adding small snippets of code to existing program files and functions. In others, we will be creating new files and functions. Let's begin by adding some startup code to file Input.c.
• Open Input.c.
• In function: get_byte(), add the following to the list of keyword assignments:

    ...
    else if(strcmp(keyword, "OPEN") == 0)     byte=19;
    else if(strcmp(keyword, "CLOSE") == 0)    byte=20;
    else
    ...

• Save Input.c and close it.
• Now open file Bxbasic.c.
• In the constants declaration area, add the two new constants, as shown here:

    /* --- declare constants --- */
    #define BUFSIZE 256
    #define TOKEN_LEN 21
    #define VAR_NAME 33
    #define LLEN 33
    #define IOARRAY 99
    #define PATH 129

• Now, to the global vars area, at the bottom of the list, add the structure shown here:

    /* ------ global vars ------------ */
    ...
    /**/
    struct io_handles               /* File: handles structure */
    {
        char path[PATH];
        char mode;
        FILE *fptr;
    };
    struct io_handles fp[IOARRAY];  /* File: handles structure */
    /**/

• In the function includes area, add this 'include':

    /* --- function includes --- */
    ...
    #include "fileio.c"

• In function 'Main', add the function call to: clr_iohandles(), as shown here:

    int main(int argc, char *argv[])
    {
        int x=0, ab_code=1;

        printf("Bxbasic Interpreter\n");
        if(argc != 2)
        {
            a_bort(ab_code, x);
        }
        strcpy(t_holder, argv[1]);
        line_cnt(argv);
        clr_iohandles();            /* clear I/O file handles */
        pgm_parser();

• Now, to function: parser(), add case 19 and case 20, as shown here:

    void parser()
    {
        int ab_code=4, x=line_ndx;

        switch(token)
        ...
            case 18:            /* LINE INPUT */
                get_lninput();
                break;
            case 19:            /* OPEN */
                parse_open();
                break;
            case 20:            /* CLOSE */
                do_fclose();
                break;

• Now save Bxbasic.c and close it.

• Next, open file: Prototyp.h.
• Add a new section called: Fileio.c, containing the following prototypes:

    /* Fileio.c */
    void clr_iohandles(void);
    void parse_open(void);
    void get_iomode(int);
    void fopen_short(char,int,int);
    void do_fclose(void);

• Save Prototyp.h and close it.
• Next, create a new file and name it: Fileio.c, and copy the header and all of the following functions into that file:

    /* bxbasic : Fileio.c : alpha version */
    /* ----- includes ----- */
    #include "prototyp.h"

    void clr_iohandles()            /* clear all file handles */
    {
        int ii;

        for(ii=0; ii<IOARRAY; ii++)
        {
            fp[ii].path[0] = '\0';  /* reset "path" to 0 */
            fp[ii].mode = '\0';
            fp[ii].fptr = '\0';
        }
    }
    /*-------- end clr_iohandles --------*/


    void parse_open()
    {
        int pi, len, ab_code=20, x=line_ndx;

        len = strlen(p_string);
        pi = e_pos;
        pi = get_upper(pi, len);
        /* ------------------------------ OPEN "I", #1, "path" */
        if(pi < len)
        {
            e_pos = pi;
            get_iomode(len);
        }
        /* ------------------------------ if not an alpha, abort */
        else
        {
            strcpy(xstring, "Invalid Open Mode:");
            a_bort(ab_code,x);      /* i.e.: not: "I", #1, "path" */
        }
    }
    /*------ end parse_open -------*/

    void get_iomode(int len)
    {
        char ch, io_mode;
        int pi, ndx, maxfiles=(IOARRAY-1), x=line_ndx, ab_code=20;
        double fileno;

        pi = e_pos;
        ch = p_string[pi];
        io_mode = ch;
        pi = get_digit(pi, len);
        if(pi == len)                       /* at end of line */
        {
            strcpy(xstring, "Invalid File Format:");
            a_bort(ab_code,x);
        }
        else
        {
            e_pos = pi;
            fileno = get_avalue();
            ndx = (int) fileno;
            ndx--;
            if((ndx < 0) || (ndx > maxfiles))   /* if it's out of range, abort */
            {
                strcpy(xstring, "Invalid File Number:\n1 to 99:");
                a_bort(ab_code,x);
            }
            else if(fp[ndx].path[0] != '\0')    /* if it's not closed, abort */
            {
                strcpy(xstring, "File Number In Use:");
                a_bort(ab_code,x);      /* if path[0]==\0, it's closed */
            }
            else
            {
                fopen_short(io_mode, ndx, len);
            }
        }
    }
    /*------ end get_iomode -------*/


    void fopen_short(char io_mode, int ndx, int len)
    {
        char ch, quote='\"', io_path[PATH];
        int pi, indx, x=line_ndx, ab_code=20;
        FILE *handle;

        pi = e_pos;
        ch = p_string[pi];
        while((ch != quote) && (isalpha(ch) == 0) && (pi < len))
        {
            pi++;                   /* advance to: quote or alpha char */
            ch = p_string[pi];      /* "filename" or filename$ */
        }
        if(pi == len)
        {
            strcpy(xstring, "Invalid File Name:");
            a_bort(ab_code,x);
        }
        e_pos = pi;
        if(ch == quote)             /* OPEN "I", #1, "path" */
        {
            strng_assgn();
            strcpy(io_path, s_holder);
        }
        else                        /* OPEN "I", #1, pathname$ */
        {
            indx = get_strndx();
            strcpy(io_path, sv_stack[indx]);
        }
        pi = e_pos;
        fp[ndx].mode = io_mode;     /* copy to structure */
        strcpy(fp[ndx].path, io_path);
        /* --- now open file for I/O-'type' --- */
        if(io_mode == 'I')
        {
            handle = fopen(io_path, "r");   /* open file for Input */
            strcpy(xstring, "File Not Found:");
        }
        else
        {
            strcpy(xstring, "Invalid Mode:\tI,O,A,R:\n");
            a_bort(ab_code,x);      /* incorrect mode specified */
        }
        /* ----- */
        if(handle == NULL)
        {
            a_bort(ab_code,x);      /* could not be opened */
        }
        else
        {
            fp[ndx].fptr = handle;
        }
    }
    /*------ end fopen_short -------*/


This last function: do_fclose(), (below), is needed to close a file once it has been opened. The CLOSE statement generally will accept a single file handle number, multiple file handle numbers or no file handle number at all. Example:

    CLOSE 1
or
    CLOSE 1,2,3
or
    CLOSE

In the last form, the CLOSE command will close ALL open files with the single statement. do_fclose() is divided into two sections. The top section closes all open files, while the bottom section closes only those open files that are specified in the statement line.
• Copy this function to file Fileio.c as well:

    void do_fclose()
    {
        char ch;
        int ii, pi, ndx, len, maxfiles=(IOARRAY-1), x=line_ndx, ab_code=22;
        double fileno;
        FILE *handle;

        len = strlen(p_string);
        pi = e_pos;
        pi = get_digit(pi, len);
        if(pi == len)                       /* clear all file handles */
        {
            for(ii=0; ii<IOARRAY; ii++)
            {
                if(fp[ii].path[0] != '\0')  /* if path[0] != \0, it's open */
                {
                    handle = fp[ii].fptr;
                    fclose(handle);
                    fp[ii].path[0] = '\0';  /* reset "path" to 0 */
                    fp[ii].mode = '\0';
                    fp[ii].fptr = '\0';
                }
            }
        }
        else
        {
            ch = p_string[pi];
            while(pi < len)
            {
                if(isdigit(ch) == 0)
                {
                    pi = get_digit(pi, len);
                }
                if(pi < len)
                {
                    e_pos = pi;
                    fileno = get_avalue();
                    ndx = (int) fileno;
                    ndx--;
                    if((ndx < 0) || (ndx > maxfiles))
                    {
                        strcpy(xstring, "Invalid File Number:\t1 to 99:\n");
                        a_bort(ab_code,x);  /* if it's out of range, abort */
                    }
                    else if(fp[ndx].path[0] != '\0')    /* if path[0] !=\0 */
                    {
                        handle = fp[ndx].fptr;          /* it's open */
                        fclose(handle);
                        fp[ndx].path[0] = '\0';         /* reset "path" to 0 */
                        fp[ndx].mode = '\0';
                        fp[ndx].fptr = '\0';
                    }
                    pi = e_pos;
                    ch = p_string[pi];
                }
            }
        }
    }
    /*------ end do_fclose -------*/
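The decision between "close everything" and "close these handles" comes down to whether any digits follow the keyword. A minimal sketch of that test, using a hypothetical helper parse_close_list() that is not part of Bxbasic, could look like this:

```c
#include <ctype.h>
#include <stdlib.h>

/* Collect the file numbers that follow CLOSE, e.g. "1,2,3".
   Returns how many were found (at most max). A result of 0 means no
   numbers were given, which corresponds to closing all open files. */
int parse_close_list(const char *s, int *out, int max)
{
    int count = 0;

    while (*s != '\0' && count < max) {
        if (isdigit((unsigned char)*s)) {
            char *end;
            out[count++] = (int)strtol(s, &end, 10);
            s = end;                /* continue after the number */
        } else {
            s++;                    /* skip commas, spaces, '#', etc. */
        }
    }
    return count;
}
```

A caller would treat a count of zero as the bare CLOSE form and otherwise close each listed number after checking it against the 1 to 99 range.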

• Save file Fileio.c and close it.
• Next, open file Error.c.
• To the bottom of function a_bort(), add cases: 20, 21 and 22, as shown here:

    case 20:
        printf("\nOPEN : error: in statement: %d.\n",(line_ndx+1));
        printf("%s\tOPEN %sUsage: OPEN \"I\", #1, ",xstring,p_string);
        printf("\"filename\"\ncode(%d)\n", code);
        break;
    case 21:
        printf("\nINPUT : error: in statement: ");
        printf("%d.\n\tINPUT %s%s",(line_ndx+1),p_string,xstring);
        printf("Usage: INPUT#1, input$, (var,,)\ncode(%d)\n", code);
        break;
    case 22:
        printf("\nCLOSE : error: in statement: ");
        printf("%d.\n\tCLOSE %s%s",(line_ndx+1),p_string,xstring);
        printf("Usage: CLOSE 1, (handle,,)\ncode(%d)\n", code);
        break;

• Save Error.c and close it.

The next step is to compile and test what we have done, so far. At this time, compile Bxbasic.c. If all was copied correctly, there should be no compiler errors. Now, copy this script to: Test.bas:


    ' test.bas version 11.4
    CLS
    '
    PRINT "Opening Test File"
    OPEN "I", #1, "test.txt"
    CLOSE 1
    ' ------------------------------------------
    TheEnd:
    END
    ' ------------------------------------------

Before we can run the program, we need to create the file "test.txt". Create an empty file, (just hit the return key about three or four times), named "test.txt", then save and close it. Now we can try it. At the command line, enter:

    Bxbasic test

If the program ran with no errors, all that will appear on the screen is the message:

    Opening Test File

If that is the case, then the program executed correctly. All we instructed the program to do was to open the file "test.txt" and then close it. That's all we can do at this point. We have no ability, yet, to do anything more with the file. Now, try this version of Test.bas:

    ' test.bas version 11.5
    CLS
    filename$ = "test.txt"
    '
    PRINT "Opening Test File"
    OPEN "I", #1, filename$
    CLOSE 1
    ' ------------------------------------------
    TheEnd:
    END
    ' ------------------------------------------


READING FILE DATA:
Now that we have the ability to open and close disk files, we need to build a mechanism for reading-in the data that is stored in the file. A typical disk access statement might look like this:

    OPEN "I", #1, "datafile.dat"
    INPUT#1, firstname$, lastname$, street$, city$,...
    CLOSE 1

You will notice that the command for reading-in data is the INPUT command. Previously, we used the INPUT command for accepting keyboard input. We could have used READ instead, except that the keyword READ is already reserved for another purpose. Actually, since we are in the position of being able to redefine our language, we could substitute the keyword READ for INPUT, despite its other intended use. Or, we could use it interchangeably with INPUT.

    INPUT#1, firstname$, lastname$, street$, city$,...
    READ#1, firstname$, lastname$, street$, city$,...

As a creator of your own language, you can use any keyword you like. For now, and for reasons of compatibility with the Basic dialect, we will just use INPUT.

When using the INPUT keyword, we need a means of letting the compiler know that we really want to input data from a disk file, instead of the keyboard. You will notice that the keyword INPUT is followed by the hash-mark and the file handle, #1. This is the method we will use to differentiate between keyboard and disk input. Otherwise, it looks just like an ordinary INPUT statement.

This type of disk file access is referred to as "Sequential File Access", because the records are read one after another, in the order they are stored, starting from the beginning of the file. In our format, each record is stored on a single line and a comma separates each data item. This is often referred to as a comma delimited file. Example:

"Fred","Smith","1234 Maple St.","Anytown",....\n
"Bob","Smith","4321 Elm St.","Anytown",... \n
As shown here, each character string is contained within a pair of double quotes, with a comma between them. The end-of-line represents the end of the record. The next line will contain the next data record.
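To make the record layout concrete, here is a minimal sketch, in C, of how one line of such a file can be taken apart, field by field. It is not part of Bxbasic: the function name next_field() and its parameters are made up for this illustration, and it works on a line that is already in memory rather than on an open file.

```c
#include <stddef.h>

/* Copy the next field of a comma delimited record into out.
   Quoted fields lose their quotes; bare fields (numbers) are copied as-is.
   Returns a pointer just past the field's trailing comma, or NULL when
   the record is exhausted. (Hypothetical helper, not Bxbasic code.) */
const char *next_field(const char *p, char *out, size_t outsz)
{
    size_t n = 0;

    if (p == NULL || *p == '\0' || *p == '\n')
        return NULL;                      /* nothing left on this line */

    while (*p == ' ')                     /* skip leading blanks */
        p++;

    if (*p == '"') {                      /* quoted string field */
        p++;
        while (*p != '\0' && *p != '"') {
            if (n + 1 < outsz)            /* truncate, never overflow */
                out[n++] = *p;
            p++;
        }
        if (*p == '"')
            p++;                          /* step over the closing quote */
    } else {                              /* bare (numeric) field */
        while (*p != '\0' && *p != ',' && *p != '\n') {
            if (n + 1 < outsz)
                out[n++] = *p;
            p++;
        }
    }
    out[n] = '\0';

    while (*p != '\0' && *p != ',' && *p != '\n')
        p++;                              /* skip anything before the comma */
    if (*p == ',')
        p++;
    return p;
}
```

Calling next_field() repeatedly, feeding it the pointer it returned last time, walks across the record until it returns NULL at the end of the line. Note that a comma inside a quoted string, as in a street address, is treated as data and not as a separator.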

SEQUENTIAL INPUT CODE:
We will begin by making a few modifications to some existing files and functions.

• Open file: Prototyp.h
• and add the four prototypes shown below, to the Fileio.c group:

/* Fileio.c */
void clr_iohandles(void);
void parse_open(void);
void get_iomode(int);
void fopen_short(char,int,int);
void do_fclose(void);
void input_io(void);
void get_finput(int,int);
void read_fstring(int,char *);
void read_fvalue(int,char *);

• Save Prototyp.h and close it.
• Now, open file: Getinput.c.
• In the function: get_input(), alter the top portion of the code as follows:

void get_input()
{
    char ch, varname[VAR_NAME];
    int pi, len, type, loc=0;
    int ab_code=19, x=line_ndx;

    pi = e_pos;
    ch = p_string[pi];
    if(ch == '#')                    /* this is a file: [INPUT#1, var:] */
    {
        input_io();
        return;
    }
    else if(ch == ';')
    {
        loc = 1;                     /* do not echo newline */
        pi++;
        pi = iswhite(pi);
        ch = p_string[pi];
        e_pos = pi;
    }
    [snip]

• Now, save and close Getinput.c.
• Open file: Fileio.c.
• Add the following four functions:

void input_io()
{
    int pi, port, len, maxfiles=(IOARRAY-1), x=line_ndx, ab_code=21;
    double fileno;

    len = strlen(p_string);
    pi = e_pos;
    pi = get_digit(pi, len);
    if(pi == len)                        /* at end of line */
    {
        strcpy(xstring, "Invalid Input Format:");
        a_bort(ab_code,x);
    }
    else
    {
        e_pos = pi;
        fileno = get_avalue();
        port = (int) fileno;
        port--;
        if((port < 0) || (port > maxfiles))
        {
            strcpy(xstring, "Invalid File Number:\t1 to 99:\n");
            a_bort(ab_code,x);           /* if it's out of range, abort */
        }
        else if(fp[port].mode != 'I')
        {
            strcpy(xstring, "FILE is not OPEN for INPUT:\n");
            a_bort(ab_code,x);           /* file not opened for Input */
        }
        else
        {
            get_finput(port,len);
        }
    }
}
/*------- end input_io ----------*/

void get_finput(int port, int len)
{
    char ch, varname[VAR_NAME];
    int pi, type;

    pi = e_pos;
    ch = p_string[pi];
    while((pi < len) && (ch != '\n'))    /* process up to line terminator */
    {
        if(isalpha(ch) == 0)
        {
            pi = get_alpha(pi, len);
            ch = p_string[pi];
        }
        if(pi < len)
        {
            e_pos = pi;
            type = get_vtype(pi);
            e_pos = pi;
            strcpy(varname, get_varname());
            if(type == 3)                /* a string$ assignment */
            {
                read_fstring(port, varname);
            }
            else                         /* type==0/1: numeric assignment */
            {
                read_fvalue(port, varname);
            }
            pi = e_pos;
            ch = p_string[pi];
        }
    }
}
/*------- end get_finput ----------*/

void read_fstring(int port, char *name)
{
    char chIn='\0', varname[VAR_NAME], temp[BUFSIZE];
    int ii=0, ndx=0, indx=0, xsize;
    FILE *handle;

    strcpy(varname, name);
    nam_stack = sn_stack;                /* indirect ref.to name_stack */
    max_vars = smax_vars;
    init_fn = init_str;                  /* indirect ref.to function */
    ndx = get_varndx(varname);
    /* ------- read file input ------- */
    handle = fp[port].fptr;
    while((chIn != '\"') && (!feof(handle)))     /* find opening quote */
    {
        ii = fgetc(handle);
        chIn = (char) ii;
    }
    chIn = '\0';
    if(!feof(handle))
    {
        while((chIn != '\"') && (!feof(handle))) /* copy up to closing quote */
        {
            ii = fgetc(handle);
            chIn = (char) ii;
            /* skip the EOF marker and stay inside temp[] */
            if((ii != EOF) && (chIn != '\"') && (indx < (BUFSIZE-1)))
            {
                temp[indx] = chIn;
                indx++;
            }
        }
    }
    temp[indx] = '\0';
    /* -------- save data --------- */
    xsize = strlen(temp);
    xsize++;
    sv_stack[ndx] = realloc(sv_stack[ndx], xsize * sizeof(char));
    strcpy(sv_stack[ndx], temp);         /* save new string */
}
/*------- end read_fstring ----------*/
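Two details matter in any character-by-character reader of this kind: fgetc() returns an int, so that the EOF marker can be told apart from real data, and the destination buffer has a fixed size. The following is a minimal, stand-alone sketch of the same quote-to-quote scan with both points made explicit. The name read_quoted() and its return convention are assumptions made for this example; it is not the Bxbasic source.

```c
#include <stdio.h>

/* Read the next double-quoted field from fp into out (at most outsz-1
   characters). Returns 1 if a complete field was read, 0 if the end of
   the file came first. (Hypothetical helper, for illustration only.) */
int read_quoted(FILE *fp, char *out, size_t outsz)
{
    int c;                      /* int, not char: fgetc() may return EOF */
    size_t n = 0;

    out[0] = '\0';
    while ((c = fgetc(fp)) != EOF && c != '"')
        ;                       /* skip ahead to the opening quote */
    if (c == EOF)
        return 0;

    while ((c = fgetc(fp)) != EOF && c != '"') {
        if (n + 1 < outsz)      /* over-long fields are silently truncated */
            out[n++] = (char)c;
    }
    out[n] = '\0';
    return c == '"';            /* 1 only if the closing quote was seen */
}
```

Testing the value returned by fgetc() directly, instead of calling feof() afterwards, means the EOF marker can never end up stored in the string.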


void read_fvalue(int port, char *name)
{
    char chi='\0', varname[VAR_NAME], temp[BUFSIZE];
    int pi, type, ii=0, ndx=0, indx=0;
    FILE *handle;

    strcpy(varname, name);
    pi = e_pos;
    type = get_Nvtype(pi);
    /* ------ get var-type index ------ */
    if(type == 4)                        /* --- double float --- */
    {
        nam_stack = dn_stack;
        max_vars = dmax_vars;
        init_fn = init_dbl;
    }
    else if(type == 3)                   /* --- single float --- */
    {
        nam_stack = fn_stack;
        max_vars = fmax_vars;
        init_fn = init_flt;
    }
    else if(type == 2)                   /* --- long integer --- */
    {
        nam_stack = ln_stack;
        max_vars = lmax_vars;
        init_fn = init_lng;
    }
    else                                 /* type == 0/1: --- integer --- */
    {
        nam_stack = in_stack;
        max_vars = imax_vars;
        init_fn = init_int;
    }
    ndx = get_varndx(varname);
    /* -------- read file data --------- */
    handle = fp[port].fptr;
    while((isdigit(chi) == 0) && (!feof(handle)))
    {
        ii = fgetc(handle);              /* get digit or: . or: - */
        chi = (char) ii;
        if((chi == '-') || (chi == '.'))
        {
            temp[indx] = chi;
            indx++;
        }
    }
    /* parentheses added: && binds tighter than || */
    while(((isdigit(chi) != 0) || (chi == '.')) && (!feof(handle)))
    {
        temp[indx] = chi;
        indx++;
        ii = fgetc(handle);
        chi = (char) ii;
    }
    temp[indx] = '\0';
    /* --------- store data -------- */
    if(type == 4)
    {
        dv_stack[ndx] = (double) atof(temp);
    }
    else if(type == 3)
    {
        fv_stack[ndx] = atof(temp);
    }
    else if(type == 2)
    {
        lv_stack[ndx] = atol(temp);
    }
    else                                 /* type == 0/1 */
    {
        iv_stack[ndx] = atoi(temp);
    }
}
/*------- end read_fvalue ----------*/
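The conversion functions atoi(), atol() and atof() have no way of reporting a failure, so a field that contains no number at all quietly turns into zero. If we ever want to tell a real zero from bad data, the standard strtol() and strtod() functions can be used instead. Here is a minimal sketch; the wrapper names parse_long() and parse_double() are made up for this example and do not exist in Bxbasic.

```c
#include <errno.h>
#include <stdlib.h>

/* Convert text to a long. Returns 1 on success, 0 if text does not start
   with a valid, in-range integer. (Hypothetical helper.) */
int parse_long(const char *text, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || errno == ERANGE)
        return 0;               /* no digits at all, or value out of range */
    *out = v;
    return 1;
}

/* Same idea for floating point values. (Hypothetical helper.) */
int parse_double(const char *text, double *out)
{
    char *end;
    double v;

    errno = 0;
    v = strtod(text, &end);
    if (end == text || errno == ERANGE)
        return 0;
    *out = v;
    return 1;
}
```

Both wrappers accept trailing characters after the number, such as the comma that follows a field, so they could be handed the same temp buffer that read_fvalue() builds.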

• Save file Fileio.c and close it.

As you can see, the read_fvalue() function is the longest. That's because it has to deal with four different numerical data types. Now, let's compile Bxbasic.c. Before we can test it, we need to make a new Test.bas and test.txt file. Copy this line of data to our data file: test.txt:

"hello world","next string",32000,650000,1.123,3000000.123
and delete the excess returns we put in previously. Now, copy this new script for Test.bas:

' test.bas version 11.6
'
    CLS
    PRINT "Opening Test File"
    OPEN "I", #1, "test.txt"
    INPUT#1, input$, next$, valuea, valueb%, valuec!, valued#
    CLOSE 1
    PRINT input$, next$, valuea, valueb%, valuec!, valued#
' ------------------------------------------
TheEnd:
    END
' ------------------------------------------

Okay, now for the moment of truth. Execute:

Bxbasic test

Now, let's try something a little more challenging. Copy the following into the data file: Test.txt

"hello world","next string",32000,650000,1.123,3000000.123
"hello ","world",2000,50000,0.123,5000000.123


And, change Test.bas so that it reads:

' test.bas version 11.7
'
    CLS
    PRINT "Opening Test File"
    OPEN "I", #1, "test.txt"
    INPUT#1, input$, next$, valuea, valueb%, valuec!, valued#
    PRINT input$, next$, valuea, valueb%, valuec!, valued#
'
    INPUT#1, input$, next$, valuea, valueb%, valuec!, valued#
    PRINT input$, next$, valuea, valueb%, valuec!, valued#
    CLOSE 1
' ------------------------------------------
TheEnd:
    END
' ------------------------------------------

Now, execute:

Bxbasic Test

As you can see, we are able to read-in and display multiple sequential records.

ENGINE UPDATE II:
Let's bring our engine up to date, so that we can try out our new features. Just as we made changes to Bxbasic.c, we need to make the exact same changes to Engine.c. Make these modifications to Engine.c:

• Constants:

/* --- declare constants --- */
#define BUFSIZE     256
#define TOKEN_LEN   21
#define VAR_NAME    33
#define LLEN        33
#define IOARRAY     99
#define PATH        129

• Global Vars:

/* ------ global vars ------------ */
...
/**/
struct io_handles                /* File: handles structure */
{
    char path[PATH];
    char mode;
    FILE *fptr;
};
struct io_handles fp[IOARRAY];   /* File: handles structure */
/**/

• Includes:

/* --- function includes --- */
...
#include "fileio.c"

• Function: Main:

int main(int argc, char *argv[])
{
    int x=0, ab_code=1;

    printf("Bxbasic Interpreter\n");
    if(argc != 2)
    {
        a_bort(ab_code, x);
    }
    strcpy(t_holder, argv[1]);
    line_cnt(argv);
    clr_iohandles();             /* clear I/O file handles */
    pgm_parser();
    ...

• Function: Parser:

void parser()
{
    int ab_code=4, x=line_ndx;

    switch(token)
    ...
        case 18:                 /* LINE INPUT */
            get_lninput();
            break;
        case 19:                 /* OPEN */
            parse_open();
            break;
        case 20:                 /* CLOSE */
            do_fclose();
            break;

Now:
• compile Bxcomp.c,
• compile Engine.c.

Using the same Test.bas and test.txt examples we used in this chapter, now compile them with our new Bxcomp.exe. Use:

Bxcomp Test

Then execute: Test.exe

END OF FILE:
The latest Test.bas works as intended, but, as far as reading file input, it has very limited capability. For one thing, we can only read-in data from a file of a predetermined length. As shown here:

    ...
    OPEN "I", #1, "test.txt"
    INPUT#1, input$, next$, valuea, valueb%, valuec!, valued#
    PRINT input$, next$, valuea, valueb%, valuec!, valued#
'
    INPUT#1, input$, next$, valuea, valueb%, valuec!, valued#
    PRINT input$, next$, valuea, valueb%, valuec!, valued#
    CLOSE 1
    ...

We can only read-in a certain number of records and we have to know when to stop, before we exceed the length of the file. If we didn't stop, we'd begin to read-in random garbage in memory. Example:

    ...
    OPEN "I", #1, "test.txt"
Loop:
    INPUT#1, input$, next$, valuea, valueb%, valuec!, valued#
    PRINT input$, next$, valuea, valueb%, valuec!, valued#
    GOTO Loop
    ...

Well, clearly this would end up in an endless loop! Of course, we could do something smart, like this:

    ...
    OPEN "I", #1, "test.txt"
Loop:
    INPUT#1, input$, next$, valuea, valueb%, valuec!, valued#
    PRINT input$, next$, valuea, valueb%, valuec!, valued#
'
    IF input$ = "" THEN
        GOTO ExitLoop
    ENDIF
    GOTO Loop
'
ExitLoop:
    ...

The only problem with that is, that testing for an empty input$ :

    IF input$ = "" THEN

gives no indication that we have reached the end of the file. We could have simply had an error reading in the data. What we need is, in the event of a read error, the ability to actually test if we have reached the end of the file. Something like this:

    ...
    IF EOF(1) THEN
        GOTO ExitLoop
    ENDIF
    ...

Placed in the proper context, something like this would allow us to continually test for an End Of File situation, before we start corrupting our data. Example:

    ...
    OPEN "I", #1, "test.txt"
Loop:
    IF EOF(1) THEN
        GOTO ExitLoop
    ENDIF
'
    INPUT#1, input$, next$, valuea, valueb%, valuec!, valued#
    IF input$ = "" THEN
        GOTO Loop
    ENDIF
'
    PRINT input$, next$, valuea, valueb%, valuec!, valued#
    GOTO Loop
'
ExitLoop:
    ...

In this sequence, we test for an End Of File before each read and then again in the event of a read error. Here is a flow chart:

Loop:
    1) Test:  end of file ?     if so: exit this loop:
    2) Input: read file
    3) Test:  is data empty ?   if so: jump back to Loop:
    4) Print: data
    5) Goto:  Loop:
ExitLoop:
    6) ...
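Step 3 is not just a precaution. The C library function feof(), which our EOF test will be built on, only reports true after a read has already run into the end of the file. It does not look ahead. When the last record ends with a newline, the test in step 1 still passes once more, the read in step 2 comes back empty, and only then does the next test in step 1 succeed. The sketch below walks through the same flow chart in plain C. The function name count_records() is made up for this example, and it counts the records instead of printing them.

```c
#include <stdio.h>

/* Count the non-empty lines in fp, following the Loop:/ExitLoop: chart.
   (Hypothetical helper, for illustration only.) */
int count_records(FILE *fp)
{
    char line[256];
    int count = 0;

    for (;;) {
        if (feof(fp) || ferror(fp))   /* 1) end of file (or hard error)? */
            break;
        if (fgets(line, sizeof line, fp) == NULL)
            continue;                 /* 2)+3) read came back empty: retest */
        if (line[0] == '\n')
            continue;                 /* 3) blank line: no data either */
        count++;                      /* 4) here the record would be printed */
    }                                 /* 5) back to the top of the loop */
    return count;
}
```

The ferror() test is there so that a genuine read error cannot turn the loop into an endless one. In this sketch, a line longer than the buffer would be counted more than once.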
Now that we know what we need, let's begin. It's easy enough to test for an empty string; that mechanism is already in place. What we don't have is the mechanism for testing for an end of file condition. That is going to require a new function, which we will then tie in to our existing IF/ELSE parser. Here is a snip from function boolexpress():

int boolexpress()
{...
    type = get_type();           /* what type of comparison is it */
    ...

Boolexpress() begins by making a call to function get_type(), where a determination is made as to the specific type of expression being evaluated. In other words, is it a numeric variable or a character string comparison? Here is a snip from get_type():

int get_type()
{ ...
    if(isalpha(ch))              /* variable name */
    {
        s_pos = pi;
        type = get_vtype(pi);
    }
    else if(isdigit(ch))         /* number value */
    {
        type = 1;
    }
    else if(ch == '(')           /* expression in parens */
    {
        type = 2;
    }
    else if(ch == '\"')          /* quoted string */
    {
        type = 4;
    }...

In the first expression test, if the character in question is an alpha character, then function get_vtype() is called. A reminder, our Basic statement will look like this:

IF EOF(1) THEN
Since the IF will be tokenized, the character in question will be the letter 'E'. Here is a snip from get_vtype():

int get_vtype(int pi)
{...
    while(isalnum(ch))
    {
        pi++;
        ch = p_string[pi];
    }
    if(ch == '$')
    {
        type = 3;                /* a string variable */
    }
    else if(strchr(" =<>%!#;", ch))
    {
        type = 1;                /* a numeric variable */
    }...

Get_vtype() begins by skipping over all alpha-numeric characters. The purpose is to get to the character that follows, if there is one. Obviously, if the variable type is a string or a numeric variable, then the type is returned to the caller. In our case, what follows is going to be a left paren '('. So we need to add another test here, in the event that a left paren is encountered. Like so:

    ...
    else if(ch == '(')           /* a function */
    {
        f_flag = if_eof();       /* execute EOF Only! */
        f_flag--;
        if(f_flag == s_pos)      /* it's an EOF(#) */
        {
            type = 5;
            e_pos = pi;
        }
    }...

You'll see here that another call is made to a function named if_eof(). We can't assume that just because we encounter a left paren, that we have a: Test for EOF statement. In the future, we may want to test for the results of other functions. EOF(1) is a function call, with the parameter "1" being passed to it. And, what we are after is the returned result of that function call. So, now we need to find out whether or not this truly is a call to EOF(). To do that, we will use an existing utility for determining the existence of a particular character string, find_strng().

int if_eof()
{...
    strcpy(tmp, "EOF");
    flag = find_strng(tmp);
    return flag;
    ...

If the character string "EOF" does exist in the current statement and at our current position, then it must be a function call to EOF(). That being the case, as shown here:

    if(f_flag == s_pos)          /* it's an EOF(#) */
    {
        type = 5;
        e_pos = pi;
    }...

we would then set the type value to "5". With the type returned to boolexpress(), we can now act on that. As shown below:

int boolexpress()
{...
    type = get_type();           /* what type of comparison is it */
    ...
    else if(type == 5)           /* an EOF evaluation */
    {
        bool = is_eof();
        if(bool != 0)
        {
            bool = 1;
        }
    }

we would then make a call to a new function; is_eof(). Here, in is_eof(), we will make the actual test as to whether or not we have reached the EOF. The actual code that performs that is shown here:

    handle = fp[port].fptr;
    bool = feof(handle);
    return bool;

however, before that can be done, the standard validations need to be executed. Like: is the file open? is it the right file handle,...? With all that said, here is the code portion:

• Begin by opening file Ifendif.c:
• and modify boolexpress() so that it has these changes:

int boolexpress()
{
    int bool, type, a_bool, or_bool, op;
    int ab_code=17, x=line_ndx;

    type = get_type();               /* what type of comparison is it */
    if((type == 1) || (type == 2))
    {
        bool = Nboolterm(type);      /* numeric evaluation */
    }
    else if((type == 3) || (type == 4))
    {
        bool = Sboolterm(type);      /* a string evaluation */
    }
    else if(type == 5)
    {
        bool = is_eof();             /* an EOF evaluation */
        if(bool != 0)
        {
            bool = 1;
        }
    }
    else
    {
        a_bort(ab_code,x);
    }
    /* --- process AND / OR --- */
    ...
    ...
/*------- end boolexpress --------*/

• Next, rewrite get_vtype() to read like so:

int get_vtype(int pi)
{
    char ch;
    int type=0, f_flag=0;

    ch = p_string[pi];
    while(isalnum(ch))
    {
        pi++;
        ch = p_string[pi];
    }
    if(ch == '$')
    {
        type = 3;                    /* a string variable */
    }
    else if(strchr(" =<>%!#;", ch))
    {
        type = 1;                    /* a numeric variable */
    }
    else if(ch == '(')               /* a function */
    {
        f_flag = if_eof();           /* execute EOF Only! */
        f_flag--;
        if(f_flag == s_pos)          /* it's an EOF(#) */
        {
            type = 5;
            e_pos = pi;
        }
    }
    return type;
}
/*------- end get_vtype --------*/
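The decision that get_vtype() makes can be tried out in isolation. The sketch below is a stand-alone approximation, not the Bxbasic function: the name classify_token() is made up, it takes the statement text as a parameter instead of using the p_string, s_pos and e_pos globals, and it checks for "EOF" with strncmp() instead of calling find_strng(). Only the return codes 3, 1 and 5 are taken from the listing above.

```c
#include <ctype.h>
#include <string.h>

/* Classify the token starting at s[pos]:
   3 = string variable, 1 = numeric variable, 5 = call to EOF(),
   0 = anything else. (Hypothetical helper, for illustration only.) */
int classify_token(const char *s, int pos)
{
    int start = pos;
    char ch;

    while (isalnum((unsigned char)s[pos]))
        pos++;                          /* skip over the name itself */
    ch = s[pos];                        /* the character that follows it */

    if (ch == '$')
        return 3;
    /* strchr() also matches the terminating '\0', so rule that out first */
    if (ch != '\0' && strchr(" =<>%!#;", ch) != NULL)
        return 1;
    if (ch == '(' && pos - start == 3 && strncmp(s + start, "EOF", 3) == 0)
        return 5;
    return 0;
}
```

With this in hand, "input$ = ..." classifies as 3, "valuea = 1" as 1, and "EOF(1) THEN" as 5, while any other name followed by a left paren falls through to 0.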

• next, here are the new functions: if_eof() and is_eof():

int if_eof()
{
    char tmp[4];
    int flag=0;

    strcpy(tmp, "EOF");          /* this function is a file handler only */
    flag = find_strng(tmp);      /* flag is: (mark+1) */
    return flag;
}
/*-------- end if_eof ---------*/

int is_eof()
{
    char ch;
    int pi, len, bool, port;
    int maxfiles=(IOARRAY-1), x=line_ndx, ab_code=23;
    double fileno;
    FILE *handle;

    len = strlen(p_string);
    pi = e_pos;                  /* this function is a file handler only */
    e_pos = get_digit(pi, len);
    fileno = get_avalue();
    port = (int) fileno;
    port--;
    if((port < 0) || (port > maxfiles))      /* if it's out of range, abort */
    {
        strcpy(xstring, "Invalid File Number:\n1 to 99:");
        a_bort(ab_code,x);
    }
    else if(fp[port].path[0] == '\0')        /* if path[0]==\0, it's closed */
    {
        strcpy(xstring, "File Handle Is Closed.\n");
        a_bort(ab_code,x);                   /* if closed, abort */
    }
    handle = fp[port].fptr;
    bool = feof(handle);
    return bool;
}
/*------- end is_eof ----------*/

• copy these to file Ifendif.c,
• then save and close it.
• Next, open file: Error.c:
• to function a_bort(), add the following:

void a_bort(int code,int line_ndx)
{
    ...
        case 23:
            printf("\nIF EOF : error: in statement: ");
            printf("%d.\n\tIF %s%s",(line_ndx+1),p_string,xstring);
            printf("Usage: IF EOF(1) THEN...\ncode(%d)\n", code);
            break;

• now, save Error.c and close it.
• Open file Prototyp.h
• and under the "Ifendif.c" heading,
• add these two prototypes:

/* Ifendif.c */
...
int if_eof(void);
int is_eof(void);

Okay, that's done! Now, recompile Bxbasic.c. Before we can test this out, we need to create a data file:

• Open our data file: TEST.TXT
• and have it read as follows:

"hello world","next string", 32000, 650000, 1.123, 3000000.123
"hello ","world", 2000, 50000, 0.123, 5000000.123
"hello world","next string", 32000, 650000, 1.123, 3000000.123

• Change Test.bas to read as follows:

' test.bas version 11.8
    CLS
    PRINT "Opening Test File"
    OPEN "I", #1, "test.txt"
'
Start:
    IF EOF(1) THEN
        GOTO Finish
    ENDIF
    INPUT#1, input$, next$, valuea, valueb%, valuec!, valued#
'
    IF input$="" THEN
        GOTO Start
    ENDIF
    PRINT input$, next$, valuea, valueb%, valuec!, valued#
    GOTO Start
'
Finish:
    CLOSE 1
' ------------------------------------------
TheEnd:
    END
' ------------------------------------------

Okay, now execute:

Bxbasic Test

As you can see, Test.bas can now detect the End Of File and terminate the loop before it starts corrupting the data or getting caught up in an endless loop.

FILE OUTPUT:
Since we can now read-in file data, the next obvious step would be to enable file output. The standard Basic keyword for File Output is the WRITE command. Its usage is something like this:

    WRITE#1, output$, [var,,]

As you can see, its usage parallels the INPUT# command. The only thing different is the direction that the data is traveling in. Since writing to a file is very similar to reading from a file, the group of functions for writing are also very similar to those for reading. Since we already know the mechanics behind reading data in from a file, and there are many similarities, we needn't get bogged down with the details of writing to a file, except for the different methods of writing. With that said, let's get started! The first thing we need to do is add the WRITE keyword to our keyword lists. Begin by:

• opening file Input.c

• in function get_byte() add the "else if()" for WRITE, as shown here:

get_byte()
...
    else if(strcmp(keyword, "LINE") == 0)     byte=18;
    else if(strcmp(keyword, "OPEN") == 0)     byte=19;
    else if(strcmp(keyword, "CLOSE") == 0)    byte=20;
    else if(strcmp(keyword, "WRITE") == 0)    byte=21;
    else
    {...

• that's all for Input.c, save and close it.
• Next, open Bxbasic.c,
• and add "case 21" to function parser(), as shown:

parser()
...
        case 20:                 /* CLOSE */
            do_fclose();
            break;
        case 21:                 /* WRITE */
            write_io();
            break;
        ...

• Save Bxbasic.c and close it.
• Now open file Prototyp.h
• and under the heading of "Fileio.c", add these prototypes:

/* Fileio.c */
...
void write_io(void);
void get_foutput(int,int);
int write_fstring(int,int,char *);
int write_fvalue(int,int,char *);
...

• Save Prototyp.h and close it.

The next thing we need to do is enable the ability to open a file for output. For Sequential Access files, there are two methods for opening a file for writing:

    1) open/create a new file for: Write
    2) open/existing file for: Append

These are the two new methods we will add:

        case 'O':                /* open file for output */
            handle = fopen(io_path, "w");
            break;
        case 'A':                /* open file for Append */
            handle = fopen(io_path, "a");
            break;
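The three Basic mode letters map directly onto the mode strings of the C library's fopen(). As a compact way of seeing that mapping, here is a small sketch. The helper names stdio_mode() and open_basic() are made up for this example and are not part of Bxbasic, which does the same job inside the switch statement of fopen_short().

```c
#include <stdio.h>

/* Map a Basic OPEN mode letter to the matching fopen() mode string.
   Returns NULL for letters this sketch does not handle.
   (Hypothetical helper, for illustration only.) */
const char *stdio_mode(char io_mode)
{
    switch (io_mode) {
    case 'I': return "r";   /* Input:  the file must already exist          */
    case 'O': return "w";   /* Output: create, or truncate an existing file */
    case 'A': return "a";   /* Append: create, or write at the end          */
    default:  return NULL;
    }
}

/* Open path in the given Basic mode.
   Returns NULL on a bad mode letter or if fopen() itself fails. */
FILE *open_basic(const char *path, char io_mode)
{
    const char *mode = stdio_mode(io_mode);
    return mode ? fopen(path, mode) : NULL;
}
```

The practical difference between "w" and "a" is what happens to a file that already exists: "w" empties it first, while "a" keeps the contents and adds new data after them.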

• Now, open file Fileio.c
• and change function fopen_short() to read as follows:

void fopen_short(char io_mode, int ndx, int len)
{
    char ch, quote='\"', io_path[PATH];
    int pi, indx, x=line_ndx, ab_code=20;
    FILE *handle;

    pi = e_pos;
    ch = p_string[pi];
    while((ch != quote) && (isalpha(ch) == 0) && (pi < len))
    {
        pi++;                        /* advance to: quote or alpha char */
        ch = p_string[pi];           /* "filename" or filename$ */
    }
    if(pi == len)
    {
        strcpy(xstring, "Invalid File Name:");
        a_bort(ab_code,x);
    }
    e_pos = pi;
    if(ch == quote)                  /* OPEN "I", #1, "path" */
    {
        strng_assgn();
        strcpy(io_path, s_holder);
    }
    else                             /* OPEN "I", #1, pathname$ */
    {
        indx = get_strndx();
        strcpy(io_path, sv_stack[indx]);
    }
    pi = e_pos;
    fp[ndx].mode = io_mode;          /* copy to structure */
    strcpy(fp[ndx].path, io_path);
    /* --- now open file for I/O-'type' --- */
    switch(io_mode)
    {
        case 'I':                    /* open file for Input */
            handle = fopen(io_path, "r");
            strcpy(xstring, "File Not Found:");
            break;
        case 'O':                    /* open file for output */
            handle = fopen(io_path, "w");
            strcpy(xstring, "Unable to Open File:");
            break;
        case 'A':                    /* open file for Append */
            handle = fopen(io_path, "a");
            strcpy(xstring, "Unable to Open File:");
            break;
        default:                     /* incorrect mode specified */
            strcpy(xstring, "Invalid Mode:\tI,O,A,R:\n");
            a_bort(ab_code,x);
    }
    /* ----- */
    if(handle == NULL)               /* could not be opened */
    {
        a_bort(ab_code,x);
    }
    else
    {
        fp[ndx].fptr = handle;
    }
}
/*------ end fopen_short -------*/

Now we will add the functions that will process those two methods of writing.

• add these new functions to file Fileio.c:

void write_io()
{
    char chmode;
    int pi, len, port, maxfiles=(IOARRAY-1), x=line_ndx, ab_code=24;
    double fileno;

    len = strlen(p_string);
    pi = e_pos;
    pi = get_digit(pi, len);             /* WRITE#1, output$ */
    if(pi == len)                        /* at end of line */
    {
        strcpy(xstring, "Invalid Write Format:");
        a_bort(ab_code,x);
    }
    else
    {
        e_pos = pi;
        fileno = get_avalue();
        port = (int) fileno;
        port--;
        if((port < 0) || (port > maxfiles))
        {
            strcpy(xstring, "Invalid File Number:\t1 to 99:\n");
            a_bort(ab_code,x);           /* if it's out of range, abort */
        }
        chmode = fp[port].mode;          /* I/O mode: read only after range check */
        if((chmode != 'O') && (chmode != 'A'))
        {
            strcpy(xstring, "FILE not OPEN for OUTPUT:\n");
            a_bort(ab_code,x);           /* file not opened for Output or APPEND */
        }
        else
        {
            get_foutput(port,len);
        }
    }
}
/*------- end write_io ----------*/

void get_foutput(int port, int len)
{
    char ch, nl='\n', varname[VAR_NAME];
    int pi, type, wflag=0, maxfiles=(IOARRAY-1), x=line_ndx, ab_code=24;
    double fileno;
    FILE *handle;

    pi = e_pos;
    ch = p_string[pi];
    while((pi < len) && (ch != '\n'))    /* process up to line terminator */
    {
        if(isalpha(ch) == 0)
        {
            pi = get_alpha(pi, len);
            ch = p_string[pi];
        }
        if(pi < len)
        {
            e_pos = pi;
            type = get_vtype(pi);
            e_pos = pi;
            strcpy(varname, get_varname());
            if(type == 3)                /* output a string$ */
            {
                wflag = write_fstring(wflag,port,varname);
            }
            else                         /* type==0/1: output a value */
            {
                wflag = write_fvalue(wflag,port,varname);
            }
            pi = e_pos;
            ch = p_string[pi];
        }
    }
    handle = fp[port].fptr;
    fprintf(handle, "%c", nl);
}
/*------- end get_foutput ----------*/

int write_fstring(int wflag, int port, char *name)
{
    char quote='\"', comma=',', varname[VAR_NAME], temp[BUFSIZE];
    int ndx;
    FILE *handle;

    strcpy(varname, name);
    nam_stack = sn_stack;                /* indirect ref.to name_stack */
    max_vars = smax_vars;
    init_fn = init_str;                  /* indirect ref.to function */
    ndx = get_varndx(varname);
    strcpy(temp, sv_stack[ndx]);
    /* ----- write file output ----- */
    handle = fp[port].fptr;
    if(wflag > 0)
    {
        fprintf(handle, "%c", comma);
    }
    fprintf(handle, "%c%s%c", quote, temp, quote);
    wflag++;
    return wflag;
}
/*------- end write_fstring ----------*/

int write_fvalue(int wflag, int port, char *name)
{
    char ch, comma=',', varname[VAR_NAME], temp[BUFSIZE];
    int pi, type, ndx=0, indx=0, len, idx;
    int ivalue;
    long lvalue;
    float fvalue;
    long double dvalue;
    FILE *handle;

    strcpy(varname, name);
    pi = e_pos;
    type = get_Nvtype(pi);
    /* ------ get var-type index ------ */
    if(type == 4)                        /* --- double float --- */
    {
        nam_stack = dn_stack;
        max_vars = dmax_vars;
        init_fn = init_dbl;
    }
    else if(type == 3)                   /* --- single float --- */
    {
        nam_stack = fn_stack;
        max_vars = fmax_vars;
        init_fn = init_flt;
    }
    else if(type == 2)                   /* --- long integer --- */
    {
        nam_stack = ln_stack;
        max_vars = lmax_vars;
        init_fn = init_lng;
    }
    else                                 /* type == 0/1: --- integer --- */
    {
        nam_stack = in_stack;
        max_vars = imax_vars;
        init_fn = init_int;
    }
    ndx = get_varndx(varname);
    /* --------- store value -------- */
    if(type == 4)
    {
        dvalue = dv_stack[ndx];
        sprintf(temp, "%Lf", dvalue);    /* convert to ascii, here */
    }
    else if(type == 3)
    {
        fvalue = fv_stack[ndx];
        sprintf(temp, "%f", fvalue);     /* convert to ascii, here */
    }
    else if(type == 2)
    {
        lvalue = lv_stack[ndx];
#ifdef Power_C
        sprintf(temp, "%Ld", lvalue);    /* convert to ascii, here */
#endif
#ifdef LccWin32
        sprintf(temp, "%d", lvalue);     /* convert to ascii, here */
#endif
    }
    else                                 /* type == 0/1 */
    {
        ivalue = iv_stack[ndx];
        sprintf(temp, "%d", ivalue);     /* convert to ascii, here */
    }
    /* --- this trims trailing zeros --- */
    len = strlen(temp);
    idx = (len-1);
    ch = temp[idx];
    if((type == 4) || (type == 3))
    {
        if(ch == '0')
        {
            while(ch == '0')
            {
                temp[idx] = '\0';
                idx--;
                ch = temp[idx];
                if(ch == '.')
                {
                    temp[idx] = '\0';
                }
            }
        }
    }
    handle = fp[port].fptr;
    if(wflag > 0)
    {
        fprintf(handle, "%c", comma);
    }
    fprintf(handle, "%s", temp);
    wflag++;
    return wflag;
}
/*------- end write_fvalue ----------*/

That's it!

• Save Fileio.c and close it.
• Compile Bxbasic.c

The basic chain of events isn't too hard to follow:

Enter:
    write_io()                       (validate statement)
        get_foutput()                (handle multiple variables) (repeat)
            string variable:         write_fstring()
            numeric variable:        write_fvalue()
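One detail inside write_fvalue() is worth looking at on its own: the "%f" conversion always produces six decimal places, so the trailing zeros have to be trimmed off before the value is written. Here is a minimal stand-alone sketch of that step. The function name format_trimmed() is made up for this example, it takes a plain double, and it uses snprintf() so that the buffer size is respected. It is an illustration of the idea and not the Bxbasic code.

```c
#include <stdio.h>
#include <string.h>

/* Format value with "%f", then drop trailing zeros and a bare '.'.
   (Hypothetical helper, for illustration only.) */
void format_trimmed(double value, char *out, size_t outsz)
{
    size_t len;

    snprintf(out, outsz, "%f", value);
    if (strchr(out, '.') == NULL)
        return;                       /* no fraction part: nothing to trim */
    len = strlen(out);
    while (len > 0 && out[len - 1] == '0')
        out[--len] = '\0';            /* remove one trailing zero */
    if (len > 0 && out[len - 1] == '.')
        out[--len] = '\0';            /* "32000." becomes "32000" */
}
```

Checking for the decimal point first guarantees that only zeros to the right of it are removed, so the zeros that are part of a whole number are never touched.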
To test it, copy this new Test.bas:

' test.bas version 11.9
'
    CLS
    PRINT "Opening Test File"
    OPEN "I", #1, "test.txt"
'
    OPEN "O", #2, "test2.txt"
'
Start:
    IF EOF(1) THEN
        GOTO Finish
    ENDIF
    INPUT#1, input$, next$, valuea, valueb%, valuec!, valued#
    IF input$="" THEN
        GOTO Start
    ENDIF
    PRINT input$, next$, valuea, valueb%, valuec!, valued#
    WRITE#2, valued#, valuec!, valueb%, valuea, next$, input$
'   write data in reversed order!
    GOTO Start
'
Finish:
    CLOSE 1, 2
' ------------------------------------------
TheEnd:
    END
' ------------------------------------------

As you can see, we are going to open file handle #2 for Output.

Execute

Bxbasic test

Open file: Test2.txt and examine the contents. Now we will Append to an existing file. Copy this version of Test.bas:

' test.bas version 11.10
'
    CLS
    PRINT "Opening Test File"
    OPEN "I", #1, "test.txt"
'
    OPEN "A", #2, "test2.txt"
'
Start:
    IF EOF(1) THEN
        GOTO Finish
    ENDIF
    INPUT#1, input$, next$, valuea, valueb%, valuec!, valued#
    IF input$="" THEN
        GOTO Start
    ENDIF
    PRINT input$, next$, valuea, valueb%, valuec!, valued#
    WRITE#2, input$, next$, valuea, valueb%, valuec!, valued#
    GOTO Start
'
Finish:
    CLOSE 1, 2
' ------------------------------------------
TheEnd:
    END
' ------------------------------------------

Execute

Bxbasic test

Now examine Test2.txt. As you can see, not only can we Write data to a new file, but, we can also Append new data to an existing file.


LINE INPUT:
The lines of a file are not always made up of comma delimited strings and numeric data, i.e.:

"hello world","next string", 32000, 650000, 1.123, 3000000.123
There are times when you will want to read-in an entire line of text from a file, up to the 'newline' character. Take this for example:

: The Microsoft Macro Assembler (MASM)
:
:Mixed-Language Support for Variables and Procedures
:--------------------------------------------------
:All EXTRN, PUBLIC, and PROC items, as well as uses of the .MODEL
:directive, support a language type.
As you can see, since this is not stored as a series of quote-comma-quote delimited strings, we would have a difficult time trying to read-in this type of a file. The answer to this problem is to have a mechanism for just such a case, where we can read-in the entire line, using the newline character as the delimiter. We already have a command called LINE INPUT, for accepting keyboard input and inserting it into a string variable. What we need is an extension of that, to accommodate file input. i.e.:

LINE INPUT#1, input$
Then, process the file input as we normally would, except we'd have a special handler for the "newline" delimited character string. Let's begin!

• Start by opening file: Getinput.c
• in function get_lninput(), we will add the code shown below, that will trap for the hash-mark "#" character.

void get_lninput()
{
    char ch, varname[VAR_NAME];
    int pi, loc=0;

    pi = e_pos;
    ch = p_string[pi];
    if(ch == '#')
    {
        p_string[pi] = '@';
        input_io();
        return;
    }
    else if(ch == ';')               /* do not echo newline */
    {
        ...

On encountering the "#" hash-mark, it causes program branching to function input_io(). Additionally, since input_io() is used by the other file read functions, within "p_string", the hash-mark "#" will be converted into a "@" symbol. Since all the other file related functions also use the hash-mark, we need a way to tell the engine that this is a special case and to handle it differently.

• Save Getinput.c and close it.
• Next, open Fileio.c,
• and replace function input_io() with this version:

void input_io()
{
    char ch;
    int pi, port, len, maxfiles=(IOARRAY-1), x=line_ndx, ab_code=21;
    double fileno;

    len = strlen(p_string);
    pi = e_pos;
    ch = p_string[pi];
    pi = get_digit(pi, len);
    if(pi == len)                        /* at end of line */
    {
        strcpy(xstring, "Invalid Input Format:");
        a_bort(ab_code,x);
    }
    else
    {
        e_pos = pi;
        fileno = get_avalue();
        port = (int) fileno;
        port--;
        if((port < 0) || (port > maxfiles))
        {
            strcpy(xstring, "Invalid File Number:\t1 to 99:\n");
            a_bort(ab_code,x);           /* if it's out of range, abort */
        }
        else if(fp[port].mode != 'I')
        {
            strcpy(xstring, "FILE is not OPEN for INPUT:\n");
            a_bort(ab_code,x);           /* file not opened for Input */
        }
        else
        {
            get_finput(port, len, ch);
        }
    }
}
/*------- end input_io ----------*/

The change is only minor. We will be passing get_finput() the character variable "ch". For a LINE INPUT# statement, "ch" contains the "@" character; for an ordinary INPUT# statement it still contains the "#" character.

• Now replace function get_finput() with this version:

416

void get_finput(int port, int len, char chx) { char ch, varname[VAR_NAME]; int pi, type; pi = e_pos; ch = p_string[pi]; while((pi < len) && (ch != '\n')) /* process up to line terminator */ { if(isalpha(ch) == 0) { pi = get_alpha(pi, len); ch = p_string[pi]; } if(pi < len) { e_pos = pi; type = get_vtype(pi); e_pos = pi; strcpy(varname, get_varname()); if(chx == '@') /* line input# assignment */ { read_fline(port, varname); break; } else if(type == 3) /* a string$ assignment */ { read_fstring(port, varname); } else /* type==0/1: numeric assignment */ { read_fvalue(port, varname); } pi = e_pos; ch = p_string[pi]; } } } /*------- end get_finput ----------*/

What we've done is added this condition: if(chx == '@') /* line input# assignment */ { read_fline(port, varname); break; }

• Now, add the new function: read_fline():

void read_fline(int port, char *name)
{   char varname[VAR_NAME], temp[BUFSIZE];
    int ndx=0, xsize;
    FILE *handle;

    strcpy(varname, name);
    nam_stack = sn_stack;               /* indirect ref.to name_stack */
    max_vars = smax_vars;
    init_fn = init_str;                 /* indirect ref.to function */
    ndx = get_varndx(varname);
    /* ------- read file input ------- */
    handle = fp[port].fptr;
    if(fgets(temp, BUFSIZE, handle) == NULL)
    {   temp[0] = '\0';                 /* EOF or read error: empty string */
    }
    xsize = strlen(temp);
    if((xsize > 0) && (temp[xsize-1] == '\n'))
    {   xsize--;
        temp[xsize] = '\0';             /* remove '\n', only if present */
    }
    xsize++;                            /* room for the terminator */
    sv_stack[ndx] = realloc(sv_stack[ndx], xsize * sizeof(char));
    strcpy(sv_stack[ndx], temp);        /* save new string */
}
/*------- end read_fline ----------*/
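The core of read_fline() is "read one line, drop the line terminator". Here is a minimal standalone sketch of just that step, using only standard C library calls. The function name read_line_trimmed() is hypothetical, and unlike the engine it also strips a carriage return, which is an assumption about files written on DOS/Windows.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from 'handle' into 'buf' (capacity 'size') and strip the
   line terminator if there is one. Returns the length of the stored text,
   or -1 when nothing could be read (end of file or error). */
int read_line_trimmed(FILE *handle, char *buf, int size)
{
    size_t len;

    if (fgets(buf, size, handle) == NULL) {
        buf[0] = '\0';
        return -1;
    }
    len = strcspn(buf, "\r\n");     /* index of the first CR or LF, if any */
    buf[len] = '\0';
    return (int)len;
}
```

Testing the return value of fgets(), rather than calling feof() first, matters because feof() only becomes true after a read has already failed. A blank line comes back as an empty string, which is exactly what the IF input$="" test in Test.bas expects.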

• Save Fileio.c and close it.
• Open file: Prototyp.h and make this one change and addition:

/* Fileio.c */
...
void get_finput(int,int,char);
...
void read_fline(int,char *);

• Save Prototyp.h and close it.
• Compile Bxbasic.c

The next step is to copy this to: Test.txt:

hello world, next string, 32000, 650000, 1.123, 3000000.123
hello , world, 2000, 50000, 0.123, 5000000.123
hello world, next string", 32000, 650000, 1.123, 3000000.123

Now, copy this version of Test.bas:

' test.bas version 11.11
'
    CLS
    PRINT "Opening Test File"
    OPEN "I", #1, "test.txt"
'
Start:
    IF EOF(1) THEN
        GOTO Finish
    ENDIF
    LINE INPUT#1, input$
'
    IF input$="" THEN
        GOTO Start
    ENDIF
    PRINT input$
    GOTO Start
'
Finish:
    CLOSE 1, 2
' ------------------------------------------
TheEnd:
    END
' ------------------------------------------

Try it,

Bxbasic test


PRINT TO FILE:
If we wanted to write information to a 'newline' delimited file, we couldn't do it with our current WRITE command. WRITE could be modified in some way to allow it, but the designers of Basic already came up with a method of handling that. What they did was to expand the PRINT command, which already does quite a lot, to output to a 'newline' delimited file. PRINT-ing information to a file is quite similar to PRINT-ing to a printer. Just the plain text string is transmitted, without commas or quotes, and at the end of the line a newline character, "\n", is sent as a line terminator. So, we are going to modify the PRINT command, just as we modified the LINE INPUT command. Start by:
• opening file Output.c
• and make the following change, by adding the trap for the hash-mark:

void parse_print()
{   char ch, quote='\"', paren='(';
    int pi, type, ab_code=9, x=line_ndx;

    pi = e_pos;
    pi = iswhiter(pi);
    ch = p_string[pi];
    e_pos = pi;
    /* --- print to file --- */
    if(ch == '#')
    {   p_string[pi] = '@';             /* change #1 to @1 */
        write_io();
        return;
    }
    /* --- print newline --- */
    else if(strchr(":\n", ch))
    {   printf("\n");
        return;
    }
    ...

• Save Output.c and close it.
• Now, open Prototyp.h and make this one modification and addition:

/* Fileio.c */
...
void get_foutput(int,int,char);         /*<-- modify */
...
void print_fstring(int,char *);         /*<-- add */

• Save Prototyp.h and close it.
• Now, open Fileio.c
• and replace write_io() with this version:

void write_io()
{   char ch, chmode;
    int pi, len, port, maxfiles=(IOARRAY-1), x=line_ndx, ab_code=24;
    double fileno;

    len = strlen(p_string);
    pi = e_pos;
    ch = p_string[pi];
    pi = get_digit(pi, len);            /* WRITE#1, output$ */
    if(pi == len)                       /* at end of line */
    {   strcpy(xstring, "Invalid Write Format:");
        a_bort(ab_code,x);
    }
    else
    {   e_pos = pi;
        fileno = get_avalue();
        port = (int) fileno;
        port--;
        if((port < 0) || (port > maxfiles))
        {   strcpy(xstring, "Invalid File Number:\t1 to 99:\n");
            a_bort(ab_code,x);          /* if it's out of range, abort */
        }
        chmode = fp[port].mode;         /* I/O mode: read only after range check */
        if((chmode != 'O') && (chmode != 'A'))
        {   strcpy(xstring, "FILE not OPEN for OUTPUT:\n");
            a_bort(ab_code,x);          /* file not opened for Output or APPEND */
        }
        else
        {   get_foutput(port, len, ch);
        }
    }
}
/*------- end write_io ----------*/
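The order of the two tests in write_io() matters: the file number has to be validated before it is used as an array index. A minimal sketch of that check on its own follows. The table io_table, the size of 99 slots and the function check_output_port() are assumptions made for illustration (the error message in the text only tells us that file numbers run from 1 to 99).

```c
#define IOARRAY 99                  /* assumed: file numbers 1 to 99 */

struct io_slot {
    char mode;                      /* 'I', 'O', 'A' or '\0' when closed */
};

static struct io_slot io_table[IOARRAY];

/* Validate a 1-based file number for output.
   Returns 0 if usable, 1 if out of range, 2 if not open for output. */
int check_output_port(int fileno)
{
    int port = fileno - 1;          /* convert to 0-based index */

    if ((port < 0) || (port >= IOARRAY)) {
        return 1;                   /* never index the table with this */
    }
    if ((io_table[port].mode != 'O') && (io_table[port].mode != 'A')) {
        return 2;
    }
    return 0;
}
```

Reading the mode only after the range test means an out-of-range number can never touch memory outside the table, even if the abort routine were ever changed to return instead of exiting.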

• Replace get_foutput() with this version:

void get_foutput(int port, int len, char chx)
{   char ch, nl='\n', varname[VAR_NAME];
    int pi, type, wflag=0, maxfiles=(IOARRAY-1), x=line_ndx, ab_code=24;
    double fileno;
    FILE *handle;

    pi = e_pos;
    ch = p_string[pi];
    while((pi < len) && (ch != '\n'))   /* process up to line terminator */
    {   if(isalpha(ch) == 0)
        {   pi = get_alpha(pi, len);
            ch = p_string[pi];
        }
        if(pi < len)
        {   e_pos = pi;
            type = get_vtype(pi);
            e_pos = pi;
            strcpy(varname, get_varname());
            if(chx == '@')              /* PRINT to file */
            {   print_fstring(port,varname);
                break;
            }
            else if(type == 3)          /* output a string$ */
            {   wflag = write_fstring(wflag,port,varname);
            }
            else                        /* type==0/1: output a value */
            {   wflag = write_fvalue(wflag,port,varname);
            }
            pi = e_pos;
            ch = p_string[pi];
        }
    }
    handle = fp[port].fptr;
    fprintf(handle, "%c", nl);
}
/*------- end get_foutput ----------*/

• Now, add this new function: print_fstring():

void print_fstring(int port, char *name) { char varname[VAR_NAME], temp[BUFSIZE]; int ndx; FILE *handle; strcpy(varname, name); nam_stack = sn_stack; /* indirect ref.to name_stack */ max_vars = smax_vars; init_fn = init_str; /* indirect ref.to function */ ndx = get_varndx(varname); strcpy(temp, sv_stack[ndx]); /* ----- write file output ----- */ handle = fp[port].fptr; fprintf(handle, "%s", temp); } /*------- end print_fstring ----------*/

• Save Fileio.c and close it.
• Now, recompile Bxbasic.c.

Now try this new version of Test.bas:

' test.bas version 11.12
'
    CLS
    PRINT "Opening Test File"
    OPEN "I", #1, "test.txt"
    OPEN "O", #2, "test2.txt"
'
Start:
    IF EOF(1) THEN
        GOTO Finish
    ENDIF
    LINE INPUT#1, input$
'
    IF input$="" THEN
        GOTO Start
    ENDIF
    PRINT input$
    PRINT#2, input$
    GOTO Start
'
Finish:
    CLOSE 1, 2
' ------------------------------------------
TheEnd:
    END
' ------------------------------------------

Execute

Bxbasic Test

and then examine Test2.txt. Pretty neat, huh ?

ENGINE UPDATE III:
Okay, only one thing left to do now. Let's bring our engine up to speed by making it fully functional with Bxbasic.exe.
• Open Engine.c
• in function parser(), add the new "case 21":

parser()
    ...
    case 20:                            /* CLOSE */
        do_fclose();
        break;
    case 21:                            /* WRITE */
        write_io();
        break;
    ...

• Save Engine.c and close it.

That's it ! Now, recompile Bxcomp.c and recompile Engine.c. Using:

Bxcomp test
compile and run all the Test.bas versions in this chapter.

CONCLUSION
In this chapter we've covered quite a bit concerning disk file I/O. At this point we have added these new elements of functionality to our engine:
o OPEN#
o CLOSE#
o INPUT#
o LINE INPUT#
o WRITE#
o PRINT#
o EOF(#)

CHAPTER – 11 SUPPLEMENTAL
INTRODUCTION
This is a supplemental to Chapter 11.
• The purpose is to make a number of updates and code fixes. Mostly, I've changed many if/else statements into switch/case statements.
• To help eliminate a lot of redundant code, I've added a couple of new utility functions. These new functions are called in place of the same code showing up all over the place.

Each group is identified by the name of the file it belongs to, in boldface. The functions contained here should replace the existing ones in the Chapter-11 set of files. Not all files are affected, only those listed in this document.

Prototyp.h:
/* Utility.c */
...
int get_alnum(int,int);
int while_isalnum(int);
...
/* Fileio.c */
...
void reset_handle(int);
void zero_handle(int);
...

Utility.c:
int get_alnum(int pi, int len)
{   char ch;

    ch = p_string[pi];
    while((isalnum(ch) == 0) && (pi < len))
    {   pi++;
        ch = p_string[pi];
    }
    return pi;
}
/*---------- end get_alnum ---------*/

int while_isalnum(int pi)
{   char ch;

    ch = p_string[pi];
    while(isalnum(ch))
    {   pi++;
        ch = p_string[pi];
    }
    return pi;
}
/*-------- end while_isalnum ---------*/
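To see how the two scanners cooperate, here is a standalone sketch that takes the string as a parameter instead of using the global p_string. The names skip_to_alnum() and skip_alnum_run() are hypothetical stand-ins for get_alnum() and while_isalnum(); the cast to unsigned char is an addition that keeps isalnum() well defined for characters above 127.

```c
#include <ctype.h>

/* Skip forward to the first alphanumeric character at or after 'pi'.
   Returns 'len' if there is none. */
int skip_to_alnum(const char *s, int pi, int len)
{
    while ((pi < len) && !isalnum((unsigned char)s[pi])) {
        pi++;
    }
    return pi;
}

/* Skip forward past a run of alphanumeric characters starting at 'pi'.
   Stops at the first non-alphanumeric character (or the terminator). */
int skip_alnum_run(const char *s, int pi)
{
    while (isalnum((unsigned char)s[pi])) {
        pi++;
    }
    return pi;
}
```

Calling the first to find the start of a name and the second to find its end leaves the index on the type character (such as "$"), which is how get_vtype() uses while_isalnum().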

Fileio.c:
void reset_handle(int ndx)
{   FILE *handle;

    handle = fp[ndx].fptr;
    fclose(handle);                     /* it's open */
    zero_handle(ndx);
}
/*------ end reset_handle -------*/

void zero_handle(int ndx)
{   fp[ndx].path[0] = '\0';             /* reset "path" to 0 */
    fp[ndx].mode = '\0';
    fp[ndx].fptr = NULL;                /* a pointer: use NULL, not '\0' */
}
/*------ end zero_handle -------*/

void clr_iohandles()
{   int ii;

    for(ii=0; ii<IOARRAY; ii++)         /* clear all file handles */
    {   zero_handle(ii);
    }
}
/*-------- end clr_iohandles --------*/

void do_fclose() { char ch; int ii, pi, ndx, len, maxfiles=(IOARRAY-1), x=line_ndx, ab_code=22; double fileno; len = strlen(p_string); pi = e_pos; pi = get_digit(pi, len); if(pi == len) /* clear all file handles */ { for(ii=0; ii<IOARRAY; ii++) { if(fp[ii].path[0] != '\0') /* if path[0] != \0, it's open */ { reset_handle(ii); } } } else { ch = p_string[pi]; while(pi < len) { if(isdigit(ch) == 0) { pi = get_digit(pi, len); } if(pi < len) { e_pos = pi; fileno = get_avalue(); ndx = (int) fileno; ndx--; if((ndx < 0) || (ndx > maxfiles)) { strcpy(xstring, "Invalid File Number:\t1 to 99:\n"); a_bort(ab_code,x); /* if it's out of range, abort */ } else if(fp[ndx].path[0] != '\0') /* if path[0] !=\0 */ { reset_handle(ndx); } pi = e_pos; ch = p_string[pi]; } } } } /*------ end do_fclose -------*/

void read_fvalue(int port, char *name)
{   char chi='\0', varname[VAR_NAME], temp[BUFSIZE];
    int pi, type, ii=0, ndx=0, indx=0;
    FILE *handle;

    strcpy(varname, name);
    pi = e_pos;
    type = get_Nvtype(pi);
    /* ------ get var-type index ------ */
    switch(type)
    {   case 4:                         /* --- double float --- */
            nam_stack = dn_stack;
            max_vars = dmax_vars;
            init_fn = init_dbl;
            break;
        case 3:                         /* --- single float --- */
            nam_stack = fn_stack;
            max_vars = fmax_vars;
            init_fn = init_flt;
            break;
        case 2:                         /* --- long integer --- */
            nam_stack = ln_stack;
            max_vars = lmax_vars;
            init_fn = init_lng;
            break;
        default:                        /* --- integer --- */
            nam_stack = in_stack;
            max_vars = imax_vars;
            init_fn = init_int;
            break;
    }
    ndx = get_varndx(varname);
    /* -------- read file data --------- */
    handle = fp[port].fptr;
    while((isdigit(chi) == 0) && (!feof(handle)))
    {   ii = fgetc(handle);             /* get digit or: . or: - */
        chi = (char) ii;
        if((chi == '-') || (chi == '.'))
        {   temp[indx] = chi;
            indx++;
        }
    }
    /* note the extra parentheses: && binds tighter than || */
    while(((isdigit(chi) != 0) || (chi == '.')) && (!feof(handle)))
    {   temp[indx] = chi;
        indx++;
        ii = fgetc(handle);
        chi = (char) ii;
    }
    temp[indx] = '\0';
    /* --------- store data -------- */
    switch(type)
    {   case 4:
            dv_stack[ndx] = (double) atof(temp);
            break;
        case 3:
            fv_stack[ndx] = atof(temp);
            break;
        case 2:
            lv_stack[ndx] = atol(temp);
            break;
        default:                        /* type == 0/1 */
            iv_stack[ndx] = atoi(temp);
            break;
    }
}
/*------- end read_fvalue ----------*/
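The scanning part of read_fvalue() can be tried out on its own. Below is a minimal sketch, under the assumption that the file holds comma-delimited fields like Test.txt. The function read_number() is hypothetical; it keeps the character in an int so that EOF can be tested directly, and it only accepts a "-" as the first character of a number.

```c
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

/* Read the next numeric field from a comma-delimited stream.
   Leading non-numeric characters are skipped. Returns 1 and stores the
   value on success, 0 if end of file is reached before any number. */
int read_number(FILE *handle, double *out)
{
    char temp[64];
    int c, indx = 0;

    /* skip to the first digit, sign or decimal point */
    do {
        c = fgetc(handle);
    } while ((c != EOF) && !isdigit(c) && (c != '-') && (c != '.'));
    if (c == EOF) {
        return 0;
    }
    /* collect an optional leading '-', then digits and '.' */
    while ((c != EOF) && (indx < 63) &&
           (isdigit(c) || (c == '.') || ((c == '-') && (indx == 0)))) {
        temp[indx++] = (char)c;
        c = fgetc(handle);
    }
    temp[indx] = '\0';
    *out = atof(temp);              /* convert ascii to a value */
    return 1;
}
```

Like the engine's version, this consumes the delimiter that ends the number. It is not a full parser: a lone "-" inside text would be read as the value 0, so it is only suitable for files laid out like the test data.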

int write_fvalue(int wflag, int port, char *name)
{   char ch, comma=',', varname[VAR_NAME], temp[BUFSIZE];
    int pi, type, ndx=0, indx=0, len, idx;
    int ivalue;
    long lvalue;
    float fvalue;
    long double dvalue;
    FILE *handle;

    strcpy(varname, name);
    pi = e_pos;
    type = get_Nvtype(pi);
    /* ------ get var-type index ------ */
    switch(type)
    {   case 4:                         /* --- double float --- */
            nam_stack = dn_stack;
            max_vars = dmax_vars;
            init_fn = init_dbl;
            break;
        case 3:                         /* --- single float --- */
            nam_stack = fn_stack;
            max_vars = fmax_vars;
            init_fn = init_flt;
            break;
        case 2:                         /* --- long integer --- */
            nam_stack = ln_stack;
            max_vars = lmax_vars;
            init_fn = init_lng;
            break;
        default:                        /* type == 0/1 --- integer --- */
            nam_stack = in_stack;
            max_vars = imax_vars;
            init_fn = init_int;
            break;
    }
    ndx = get_varndx(varname);
    /* --------- store value -------- */
    switch(type)
    {   case 4:
            dvalue = dv_stack[ndx];
            sprintf(temp, "%Lf", dvalue);   /* convert to ascii, here */
            break;
        case 3:
            fvalue = fv_stack[ndx];
            sprintf(temp, "%f", fvalue);    /* convert to ascii, here */
            break;
        case 2:
            lvalue = lv_stack[ndx];
#ifdef Power_C
            sprintf(temp, "%Ld", lvalue);   /* convert to ascii, here */
#endif
#ifdef LccWin32
            sprintf(temp, "%ld", lvalue);   /* a long needs %ld, not %d */
#endif
            break;
        default:                            /* type == 0/1 */
            ivalue = iv_stack[ndx];
            sprintf(temp, "%d", ivalue);    /* convert to ascii, here */
            break;
    }
    /* --- this trims trailing zeros --- */
    len = strlen(temp);
    idx = (len-1);
    ch = temp[idx];
    if((type == 4) || (type == 3))
    {   if(ch == '0')
        {   while(ch == '0')
            {   temp[idx] = '\0';
                idx--;
                ch = temp[idx];
                if(ch == '.')
                {   temp[idx] = '\0';
                }
            }
        }
    }
    handle = fp[port].fptr;
    if(wflag > 0)
    {   fprintf(handle, "%c", comma);
    }
    fprintf(handle, "%s", temp);
    wflag++;
    return wflag;
}
/*------- end write_fvalue ----------*/
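The zero trimming at the end of write_fvalue() is easier to follow as a small function of its own. This is a minimal sketch with a hypothetical name, trim_zeros(); instead of checking the variable type, it checks for a decimal point, so that integer text such as "100" is never shortened.

```c
#include <string.h>

/* Trim trailing zeros from a "%f"-style decimal string, and the decimal
   point too if nothing is left after it: "1.500000" becomes "1.5" and
   "3.000000" becomes "3". Strings without a '.' are left untouched. */
void trim_zeros(char *text)
{
    size_t len;

    if (strchr(text, '.') == NULL) {
        return;                     /* integer text: "100" must stay "100" */
    }
    len = strlen(text);
    while ((len > 0) && (text[len - 1] == '0')) {
        text[--len] = '\0';         /* drop one trailing zero */
    }
    if ((len > 0) && (text[len - 1] == '.')) {
        text[--len] = '\0';         /* drop a dangling decimal point */
    }
}
```

The loop can never run past the decimal point, because the "." itself is not a "0" and stops it.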

Getinput.c:
void input_val(char *name, int loc) { char ch, chx, varname[VAR_NAME], string[VAR_NAME]; int pi, ndx, len, row, col; strcpy(varname, name); pi = e_pos; ch = p_string[pi]; /* --- get cursor --- */ switch(loc) { case 1: row = cursor_row(); col = cursor_col(); /* --- get data-input --- */ #ifdef LccWin32 chx = getch(); #endif gets(string); len = strlen(string); reset_cursor(loc,len,col,row); break; default: #ifdef LccWin32 chx = getch(); #endif gets(string); break; } /* --- double --- */ switch(ch) { case '#': nam_stack = dn_stack; /* indirect ref.to name_stack max_vars = dmax_vars; init_fn = init_dbl; /* indirect ref.to function ndx = get_varndx(varname); dv_stack[ndx] = (double) atof(string); pi++; break; /* --- float --- */ case '!': nam_stack = fn_stack; /* indirect ref.to name_stack max_vars = fmax_vars; init_fn = init_flt; /* indirect ref.to function ndx = get_varndx(varname); fv_stack[ndx] = (float) atof(string); pi++; break; /* --- long --- */ case '%': nam_stack = ln_stack; /* indirect ref.to name_stack max_vars = lmax_vars;

*/ */

*/ */

*/


(Continued) init_fn = init_lng; /* indirect ref.to function */ ndx = get_varndx(varname); lv_stack[ndx] = atol(string); pi++; break; /* --- integer --- */ default: nam_stack = in_stack; /* indirect ref.to name_stack */ max_vars = imax_vars; init_fn = init_int; /* indirect ref.to function */ ndx = get_varndx(varname); iv_stack[ndx] = atoi(string); break; } pi = iswhite(pi); ch = p_string[pi]; if(strchr(":;,", ch)) { pi++; set_TabNl(ch); /* Tab-NewLine */ } pi = iswhite(pi); e_pos = pi; } /*---------- end input_val ----------*/

void input_str(char *name, int loc) { char ch, chx, varname[VAR_NAME], string[BUFSIZE]; int pi, ndx, len, row, col; unsigned xsize; strcpy(varname, name); nam_stack = sn_stack; /* indirect ref.to name_stack */ max_vars = smax_vars; init_fn = init_str; /* indirect ref.to function */ ndx = get_varndx(varname); /* --- get cursor --- */ switch(loc) { case 1: row = cursor_row(); col = cursor_col(); /* --- get data-input --- */ #ifdef LccWin32 chx = getch(); #endif gets(string); len = strlen(string); reset_cursor(loc,len,col,row); break;


default: #ifdef LccWin32 chx = getch(); #endif gets(string); break; } /* --- store data --- */ xsize = strlen(string); xsize++; sv_stack[ndx] = realloc(sv_stack[ndx], xsize * sizeof(char)); strcpy(sv_stack[ndx], string); /* save new string */ pi = e_pos; pi++; ch = p_string[pi]; if(strchr(":;,", ch)) { pi++; set_TabNl(ch); } pi = iswhite(pi); e_pos = pi;

/* Tab-NewLine */

} /*---------- end input_str ----------*/

Ifendif.c:
int boolexpress() { int bool, type, a_bool, or_bool, op; int ab_code=17, x=line_ndx; type = get_type(); /* what type of comparison is it */ switch(type) { case 1: case 2: bool = Nboolterm(type); /* numeric evaluation */ break; case 3: case 4: bool = Sboolterm(type); /* a string evaluation */ break; case 5: bool = is_eof(); /* an EOF evaluation */ if(bool != 0) { bool = 1; } break;


(Continued) default: a_bort(ab_code,x); break; } /* --- process AND / OR --- */ op = IsAndOrOp(); while(op != 0) { if(op == 1) { a_bool = AndOrBoolExp(); if((bool == 1) && (a_bool == 1)) { bool = 1; } else { bool = 0; } } else if(op == 2) { or_bool = AndOrBoolExp(); if((bool == 1) || (or_bool == 1)) { bool = 1; } } op = IsAndOrOp(); } return bool; } /*------- end boolexpress --------*/

/* do: AND */

/* do: OR */

int AndOrBoolExp() { int bool, type; int ab_code=17, x=line_ndx; type = get_type(); /* what type of comparison is it */ switch(type) { case 1: case 2: bool = Nboolterm(type); /* numeric evaluation */ break; case 3: case 4: bool = Sboolterm(type); /* a string evaluation */ break; default: a_bort(ab_code,x); break; } return bool; } /*------- end AndOrBoolExp --------*/


int Nrelation(double lvalue) { int bool, op, type; int ab_code=17, x=line_ndx; double rvalue; op = get_op(); /* get eval op type */ type = get_type(); /* get right side type */ switch(type) { case 1: rvalue = get_avalue(); /* variable or digit value */ break; case 2: rvalue = rdp_main(); /* expression within parens */ break; default: a_bort(ab_code,x); break; } bool = eval_value(lvalue,rvalue,op); /* evaluate operators */ return bool; } /*------- end Nrelation --------*/

int eval_value(double lval,double rval,int op) { int bool=0; switch(op) { case 1: if(lval == rval) { bool = 1; } break; case 2: if(lval < rval) { bool = 1; } break; case 3: if(lval > rval) { bool = 1; } break; case 4: if(lval <= rval) { bool = 1; } break; case 5: if(lval >= rval) { bool = 1; }


(Continued) break; case 6: if(lval != rval) { bool = 1; } break; default: break; } return bool; } /*------- end eval_value -------*/

int Srelation(char *lstr) { char lstring[BUFSIZE], rstring[BUFSIZE]; int bool, ndx, op, type; int ab_code=17, x=line_ndx; strcpy(lstring, lstr); op = get_op(); /* get eval op type */ type = get_type(); /* get right side type */ switch(type) { case 3: ndx = get_string(); /* string variable */ strcpy(rstring, sv_stack[ndx]); break; case 4: get_qstring(); strcpy(rstring, s_holder); break; default: a_bort(ab_code,x); break;

/* quoted string */

} bool = eval_string(lstring,rstring,op); return bool; } /*------- end Srelation --------*/

int eval_string(char *lstr,char *rstr,int op) { char lstring[BUFSIZE], rstring[BUFSIZE]; int bool=0, comp; strcpy(lstring, lstr); strcpy(rstring, rstr);


(Continued) comp = strcmp(lstring, rstring); /* --- now test expression --- */ switch(op) { case 1: if(comp == 0) { bool = 1; } break; case 2: if(comp < 0) { bool = 1; } break; case 3: if(comp > 0) { bool = 1; } break; case 4: if(comp <= 0) { bool = 1; } break; case 5: if(comp >= 0) { bool = 1; } break; case 6: if(comp != 0) { bool = 1; } break; default: break; } return bool; } /*------- end eval_string -------*/

int get_op()
{   char ch;
    int pi, op, ab_code=18, x=line_ndx;

    pi = e_pos;
    ch = p_string[pi];
    if(strchr("\"$%!#", ch))
    {   pi++;                           /* increment past current symbol */
    }
    pi = iswhite(pi);
    ch = p_string[pi];
    switch(ch)
    {   case '=':
            op = 1;                     /* an is_equal evaluation */
            break;
        case '<':
            pi++;
            ch = p_string[pi];
            if(ch == '>')
            {   op = 6;                 /* a not_equal evaluation */
            }
            else if(ch == '=')
            {   op = 4;                 /* a less-than or equal eval */
            }
            else
            {   op = 2;                 /* a less-than evaluation */
            }
            break;
        case '>':
            pi++;
            ch = p_string[pi];
            if(ch == '=')
            {   op = 5;                 /* a greater-than or equal eval */
            }
            else
            {   op = 3;                 /* a greater-than evaluation */
            }
            break;
        default:
            a_bort(ab_code,x);
            break;
    }
    if(strchr("=>", ch))
    {   pi++;
    }
    e_pos = pi;
    return op;
}
/*------- end get_op -------*/
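The operator codes tie get_op() and eval_value() together: 1 is "=", 2 is "<", 3 is ">", 4 is "<=", 5 is ">=" and 6 is "<>". Here is a minimal standalone sketch of that pairing. The functions parse_relop() and apply_relop() are hypothetical; they work on a plain string rather than on p_string and e_pos, and return 0 for an unknown operator instead of aborting.

```c
/* Parse a relational operator at the start of 's'.
   Returns the op code (1..6) or 0 if there is none, and stores the number
   of characters the operator occupies in '*used'. */
int parse_relop(const char *s, int *used)
{
    int op = 0;

    *used = 0;
    switch (s[0]) {
    case '=':
        op = 1; *used = 1;
        break;
    case '<':
        if (s[1] == '>')      { op = 6; *used = 2; }    /* not equal */
        else if (s[1] == '=') { op = 4; *used = 2; }    /* less or equal */
        else                  { op = 2; *used = 1; }    /* less than */
        break;
    case '>':
        if (s[1] == '=')      { op = 5; *used = 2; }    /* greater or equal */
        else                  { op = 3; *used = 1; }    /* greater than */
        break;
    default:
        break;                                          /* not an operator */
    }
    return op;
}

/* Evaluate "lval op rval" for an op code from parse_relop(): 1 or 0. */
int apply_relop(double lval, double rval, int op)
{
    switch (op) {
    case 1:  return lval == rval;
    case 2:  return lval <  rval;
    case 3:  return lval >  rval;
    case 4:  return lval <= rval;
    case 5:  return lval >= rval;
    case 6:  return lval != rval;
    default: return 0;
    }
}
```

The same op codes serve for strings: eval_string() simply feeds the result of strcmp() through the identical six comparisons against zero.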

int get_vtype(int pi)
{   char ch;
    int type=0, f_flag=0;

    pi = while_isalnum(pi);
    ch = p_string[pi];
    if(ch == '$')
    {   type = 3;                       /* a string variable */
    }
    else if(strchr(" =<>%!#;", ch))
    {   type = 1;                       /* a numeric variable */
    }
    else if(ch == '(')                  /* a function */
    {   f_flag = if_eof();              /* execute EOF Only! */
        f_flag--;
        if(f_flag == s_pos)             /* it's an EOF(#) */
        {   type = 5;
            e_pos = pi;
        }
    }
    return type;
}
/*------- end get_vtype --------*/

Input.c:
void save_tmp() { char ch; int pi, len; /* --- setup xstring to write indented line --- */ strcpy(xstring, " "); strcat(xstring, p_string); pi = 0; ch = p_string[pi]; /* --- test for a Label: --- */ if(isupper(ch) != 0) { len = (LLEN-2); pi = while_isalnum(pi); ch = p_string[pi]; if((ch == ':') && (pi <= len)) { pi++; p_string[pi] = '\0'; strcat(p_string, "\n\0"); fprintf(f_out,"%s", p_string); nrows++; } else { fprintf(f_out,"%s", xstring); nrows++; } } /* --- test for numbered line --- */

/* loop thru "Label:" */

/* write block label */

/* write indented line */


(Continued) else if(isdigit(ch)) { fprintf(f_out,"%s", p_string); /* write numbered line */ nrows++; } else { pi = iswhite(pi); ch = p_string[pi]; if(ch != '\'') /* eliminate comment lines */ { fprintf(f_out,"%s", xstring); /* write indented line */ nrows++; } } } /*------- end save_tmp ----------*/

void tmp_byte(int ii) { char ch; int pi, si, byte; int x=ii, ab_code=4; /* ----- fill temp_byte[] here ----- */ pi = e_pos; pi = iswhite(pi); ch = p_string[pi]; if(ch == '\'') /* it's a comment */ { byte = 0; strcpy(temp_prog[ii], "\n\0"); } else { if(isupper(ch)) /* is this a keyword */ { e_pos = pi; byte = get_byte(ii); /* call get_byte */ pi = e_pos; } else if(isalpha(ch)) /* a possible assignment */ { si = pi; /* save pointer position */ pi = while_isalnum(pi); pi = iswhite(pi); ch = p_string[pi]; if(strchr("=#%!$", ch)) /* a variable assignment */ { byte = 1; get_MOD(pi); /* scan for a MOD expression */ pi = si; } else { a_bort(ab_code, x); /* not an assignment */ } }


(Continued) else { a_bort(ab_code, x); } } temp_byte[ii] = byte; e_pos = pi; } /*---------- end tmp_byte ----------*/ /* not a keyword or variable */

Variable.c:
void parse_let() { char ch, varname[VAR_NAME]; int pi, stlen, ndx=0; int ab_code=11, x=line_ndx; stlen = strlen(p_string); pi = e_pos; /* --- retrieve variable name from statement --- */ pi = get_alpha(pi, stlen); if(pi == stlen) /* error: didn't find it */ { a_bort(ab_code, x); } e_pos = pi; strcpy(varname, get_varname()); pi = e_pos; ch = p_string[pi]; /* get the type character */ /* --- we now have varname and type --- */ /* --- is this a character string --- */ switch(ch) { case '$': e_pos = pi; parse_str(varname); break; /* compare name to double array */ case '#': /* double sign */ nam_stack = dn_stack; /* indirect reference to name_stack */ max_vars = dmax_vars; init_fn = init_dbl; /* indirect reference to function */ ndx = get_varndx(varname); pi++; pi = iswhite(pi); e_pos = pi; Match('='); /* now get assignment value */


(Continued) dv_stack[ndx] = rdp_main(); break; /* compare name to float array */ case '!': /* float sign */ nam_stack = fn_stack; /* indirect reference to name_stack */ max_vars = fmax_vars; init_fn = init_flt; /* indirect reference to function */ ndx = get_varndx(varname); pi++; pi = iswhite(pi); e_pos = pi; Match('='); /* now get assignment value */ fv_stack[ndx] = (float) rdp_main(); break; /* compare name to long array */ case '%': /* long sign */ nam_stack = ln_stack; /* indirect reference to name_stack */ max_vars = lmax_vars; init_fn = init_lng; /* indirect reference to function */ ndx = get_varndx(varname); pi++; pi = iswhite(pi); e_pos = pi; Match('='); /* now get assignment value */ lv_stack[ndx] = (long) rdp_main(); break; /* compare name to integer array */ default: /* no type sign */ nam_stack = in_stack; /* indirect reference to name_stack */ max_vars = imax_vars; init_fn = init_int; /* indirect reference to function */ ndx = get_varndx(varname); pi = iswhite(pi); e_pos = pi; Match('='); /* now get assignment value */ iv_stack[ndx] = (int) rdp_main(); break; } } /*-------- end parse_let ---------*/

double get_varvalue() { char ch, varname[VAR_NAME]; int pi, ndx=0, ab_code=13, x=line_ndx; double value; /* --- get varname --- */ strcpy(varname, s_holder); /* --- get var type --- */


(Continued) pi = e_pos; ch = p_string[pi]; var_type = ch; /* --- now compare to var type array --- */ switch(ch) { case '#': nam_stack = dn_stack; /* indirect reference to name_stack */ max_vars = dmax_vars; _GetChar(); /* increment character pointer */ break; case '!': nam_stack = fn_stack; /* indirect reference to name_stack */ max_vars = fmax_vars; _GetChar(); /* increment character pointer */ break; case '%': nam_stack = ln_stack; /* indirect reference to name_stack */ max_vars = lmax_vars; _GetChar(); /* increment character pointer */ break; default: nam_stack = in_stack; /* indirect reference to name_stack */ max_vars = imax_vars; break; } while((ndx < max_vars) && (strcmp(nam_stack[ndx], varname) != 0)) { ndx++; /* find varname in stack */ } if(ndx == max_vars) /* error: did not find it */ { a_bort(ab_code, x); } switch(ch) { case '#': value = dv_stack[ndx]; break; case '!': value = (double) fv_stack[ndx]; break; case '%': value = (double) lv_stack[ndx]; break; default: value = (double) iv_stack[ndx]; break; } return value; } /*--------- end get_varvalue ----------*/


void clr_indvar(char *name) { char ch, varname[VAR_NAME]; int pi, ndx=0; strcpy(varname, name); pi = e_pos; ch = p_string[pi]; /* --- indirect pointers --- */ switch(ch) { case '$': if(smax_vars == 0) { return; } nam_stack = sn_stack; max_vars = smax_vars; _GetChar(); break; case '#': if(dmax_vars == 0) { return; } nam_stack = dn_stack; max_vars = dmax_vars; _GetChar(); break; case '!': if(fmax_vars == 0) { return; } nam_stack = fn_stack; max_vars = fmax_vars; _GetChar(); break; case '%': if(lmax_vars == 0) { return; } nam_stack = ln_stack; max_vars = lmax_vars; _GetChar(); break; default: if(imax_vars == 0) { return; } nam_stack = in_stack; max_vars = imax_vars; break; } /* --- get variable index --- */ while((ndx < max_vars) && (strcmp(nam_stack[ndx], varname) != 0)) { ndx++; /* find varname in stack */ }


(Continued) if(ndx == max_vars) { return; } /* reached end: not found */

/* --- zero individual variable --- */ else if((ch == '$') && (ndx < max_vars)) { sv_stack[ndx][0] = '\0'; /* clear string */ sn_stack[ndx][0] = '\0'; } else if((ch == '#') && (ndx < max_vars)) { dv_stack[ndx] = 0; /* clear double */ dn_stack[ndx][0] = '\0'; } else if((ch == '!') && (ndx < max_vars)) { fv_stack[ndx] = 0; /* clear float */ fn_stack[ndx][0] = '\0'; } else if((ch == '%') && (ndx < max_vars)) { lv_stack[ndx] = 0; /* clear long */ ln_stack[ndx][0] = '\0'; } else if(ndx < max_vars) { iv_stack[ndx] = 0; /* clear integer */ in_stack[ndx][0] = '\0'; } } /*-------- end clr_indvar --------*/

String.c:
void stringstr()
{   char ch, char_x, quote='\"';
    int ii, pi, count, ivalue;
    double fvalue;

    pi = e_pos;             /* pi enters pointing to: (num, chr) */
    pi++;                   /* advance to first number: num, chr) */
    pi = iswhite(pi);
    e_pos = pi;
    /* --- get count --- */
    count = (int) get_avalue();
    if(count < 1)
    {   count = 0;
    }
    else if(count > 255)
    {   count = 255;
    }
    pi = e_pos;             /* pi re-enters pointing to: ,chr) */
    pi++;
    pi = iswhite(pi);
    e_pos = pi;
    ch = p_string[pi];
    /* --- get character --- */
    if(ch == quote)         /* is it a quoted char: "*") */
    {   pi++;
        ch = p_string[pi];
        char_x = ch;
        ivalue = char_x;    /* the fill loop below uses ivalue */
        while(ch != ')')    /* advance to paren */
        {   pi++;
            ch = p_string[pi];
        }
        e_pos = pi;
    }
    else
    {   fvalue = get_avalue();
        ivalue = fvalue;
    }
    for(ii=0; ii < count; ii++)
    {   s_holder[ii] = ivalue;
    }
    s_holder[count] = '\0';
}
/*---------- end stringstr ----------*/
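Stripped of its parsing, stringstr() does one thing: it fills a buffer with "count" copies of one character, after clamping the count to the range 0 to 255. A minimal sketch of that fill step follows; the function string_fill() is hypothetical and takes the destination buffer as a parameter instead of writing to s_holder.

```c
#include <string.h>

/* Build a string of 'count' copies of the character code 'code' in 'dest'.
   'dest' must have room for at least 256 bytes.
   The count is clamped to the range 0..255. */
void string_fill(char *dest, int count, int code)
{
    if (count < 0) {
        count = 0;
    } else if (count > 255) {
        count = 255;
    }
    memset(dest, code, (size_t)count);
    dest[count] = '\0';
}
```

Whether the character arrives as a quoted literal or as a numeric code, it ends up as the same single integer argument, which is why both branches of stringstr() must set ivalue before the fill loop runs.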

void chrstr() { int pi, ivalue; double fvalue; pi = e_pos; pi++; pi = iswhite(pi); e_pos = pi; fvalue = get_avalue(); ivalue = fvalue; s_holder[0] = ivalue; s_holder[1] = '\0'; /* pi enters pointing to: ( */ /* advance to alpha/num: 10) */

} /*---------- end chrstr ----------*/


Loops.c:
void do_for() { char ch, varname[VAR_NAME]; int pi, f_ndx, Inc=1, ab_code; int start, next_tru, x=line_ndx; long From, Final; /* --- assign FROM --- */ f_ndx = get_From(); /* --- get TO --- */ Final = get_To(); /* --- get STEP --- */ pi = e_pos; ch = p_string[pi]; if(ch == 'S') { Inc = get_Step(); } /* --- setup for-loop conditions --- */ From = lv_stack[f_ndx]; start = line_ndx; fornxt_flg++;

/* register: line counter */ /* increment For/Next flag */

/* --- increment loop --- */ if(From < Final) { for(; lv_stack[f_ndx] <= Final; lv_stack[f_ndx] += Inc) { next_tru = 1; line_ndx = (start + 1); /* set pointer to: here+1 */ while(next_tru == 1) { get_token(); s_pos = 0; e_pos = 0; parser(); if(token == 12) /* encountered a: return */ { next_tru = 0; token = 0; } else { line_ndx++; } } } } /* --- decrement loop --- */ else { for(; lv_stack[f_ndx] >= Final; lv_stack[f_ndx] -= Inc) { next_tru = 1; line_ndx = (start + 1); /* set pointer to: here+1 */ while(next_tru == 1) { get_token();


(Continued) s_pos = 0; e_pos = 0; parser(); if(token == 12) { next_tru = 0; token = 0; } else { line_ndx++; } } } } fornxt_flg--; } /*-------- end do_for ----------*/ /* decrement For-Next flag */

/* encountered a: return */

Bxcomp.c:
int get_vtype(int pi) { char ch; int type=0; pi = while_isalnum(pi); ch = p_string[pi]; if(ch == '$') { type = 3; /* a string variable */ } else if(strchr(" =<>%!#", ch)) { type = 1; /* a numeric variable */ } return type; } /*------- end get_vtype --------*/

That's all !


CHAPTER - 12
INTRODUCTION
If you recall, in Chapter 1, I said we would be exploring Assembly language, too. In this chapter I would like to branch into Assembly Language. Not in a huge way, just a beginning you might say. What I would like to do is start work on a Basic to Assembly language compiler. What I mean by that is, a compiler that will read-in a Basic source file, but, instead of just converting the source code into a byte code, it would be translated directly into 80x86 Assembly language. Now, the result of doing that would be an assembly language source file and not an executable.

Why not just output an executable ?
Well, that would require several more steps and would be very time consuming. We would need to build an assembler that would output Machine language and possibly even a Linker. To do so would be very instructive, I'm sure, but it would take too long before we could begin to test our assembly language programs.

Then how will we make our assembly language source files into executables ?
The approach I'd like to take is to use existing off the shelf x86 assemblers that are readily available and use them to assemble our source code into executables. That will shorten our development time. The only down side is that our assembler source code won't be very transportable to other CPUs or operating systems. Currently, Bxbasic as an interpreter is very transportable, having been written in Ansi-C. It can be compiled and executed on just about any CPU or OS that has an Ansi-C compiler.

Isn't working in Assembly Language going to be hard ?
Yes, but, it's not impossible. Actually, quite a lot of the work is already done. We already have a compiler that translates Basic source code into byte code. All we need to do to make an assembly language compiler is translate the byte code we now have into Assembly language and write it to a source file.

Should I start learning Assembly Language ?
You will need to do some homework on your own, to get an understanding of what Assembly language is all about and the 80x86 Instruction Set. You can either do some research on the Internet and find a good tutorial or perhaps find some books on x86 Assembly language at a good used book store. I will list some sources for reference material and Internet links at the end of this chapter. It is not required that you have prior experience in Assembly Language programming, but, this will not be an x86 Assembly Language tutorial, either. I will try to do my best to keep this as simple as possible. Rather than teaching you how to program in Assembly language, we will jump right in with both feet and start outputting assembly code. It will be expected that you will do the research on your own to be able to follow along.
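To give a feel for what "translate and write it to a source file" means, here is a very small sketch. It is not the Bxbasm code: emit_line() and emit_let_int() are hypothetical names, and the two instructions are generic Intel-style 16 bit assembly, written on the assumption that the variable has been declared elsewhere in the output; the exact operand syntax your assembler wants may differ.

```c
#include <stdio.h>

/* Hypothetical emitter: write one tab-indented assembly instruction. */
void emit_line(FILE *out, const char *text)
{
    fprintf(out, "\t%s\n", text);
}

/* Hypothetical translation of "LET name = value" for a 16 bit integer:
   load the constant into AX, then store AX in the variable. */
void emit_let_int(FILE *out, const char *varname, int value)
{
    char line[80];

    snprintf(line, sizeof line, "MOV AX, %d", value);
    emit_line(out, line);
    snprintf(line, sizeof line, "MOV [%s], AX", varname);
    emit_line(out, line);
}
```

For example, emit_let_int(f_out, "a", 5) would write the two lines "MOV AX, 5" and "MOV [a], AX" to the output file. The compiler never executes anything itself; it only prints the text of a program that the assembler will turn into machine code later.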

THE ASSEMBLER:
There are several assemblers available and some really good ones are free for the downloading. I would like to start out really simple, as simple as I can and try to make this as uncomplicated as possible. The assembler that I want to use in the beginning is called A86. A86 is:
• a 16 bit assembler, (shareware),
• it's just about the easiest assembler to use,
• it generates binary ".COM" files,
• it doesn't need an additional linker,
• A86 comes with a disassembler: D86,
• D86 is a great little debugging tool that will really come in handy.

At this point, (if you haven't already,) download A86.zip from the same directory that this series is located in. To begin using A86, all you will need to do is copy the A86.zip file to a directory of its own, such as: C:>\programs...\A86\ and unzip the contents into that directory. It requires no setup or installation. For further instructions on using A86 read the file: "A86read.txt".

BXBASM:
I think it's time to get started. As I indicated previously, we have already written a great deal of the code that we will need to build the Bxb Assembly language compiler. Most of it can just be copied from Bxbasic.c and Bxcomp.c and then we can add the new routines we will need on top of that. We are going to borrow heavily from Jack Crenshaw's original tutorial on the subject of writing an assembly language compiler. Start by creating a new C program file and name it: "Bxbasm.c". You should be able to do this in the same C working directory that we have been using. If you are using Lcc-Win32, create a new project for Bxbasm.c in the same directory. Now begin by copying this header information to the top of Bxbasm.c:

    /* bxbasm.c : alpha version.12 */
    #define Power_C
    /* #define LccWin32 */

    /* --- declare headers --- */
    #include <stdio.h>
    #include <conio.h>
    #include <io.h>
    #include <stdlib.h>
    #include <ctype.h>
    #include <string.h>
    #include <malloc.h>
    #include <math.h>
    #ifdef Power_C
    #include <bios.h>           /* Power-C version */
    #endif
    #ifdef LccWin32
    #include <tcconio.h>        /* LccWin32 version */
    #endif

    /* --- declare constants --- */
    #define BUFSIZE   256
    #define TOKEN_LEN 21
    #define VAR_NAME  33
    #define LLEN      33
    #define PATH      81

If you are using Lcc-Win32, be sure you remove the comments from the

#define LccWin32
and add comments around

#define Power_C
Now copy the declarations for the global variables. You will recall that most of these are used by line_cnt() and its supporting routines:

    /* ------ global vars ------------ */
    FILE *f_in, *f_out;         /* these are the i/o file handles     */
    char *prog_name;            /* program source-file name           */
    char p_string[BUFSIZE];     /* file input string                  */
    char **array1;              /* program array                      */
    char t_holder[20];          /* token data holder                  */
    char s_holder[BUFSIZE];     /* xstring (print) data holder        */
    int nrows;                  /* numbers of lines in source file    */
    int ncolumns=BUFSIZE;       /* dimension for array1[][columns]    */
    int line_ndx;               /* current execution line             */
    int s_pos, e_pos;           /* pointers to start & end of token   */
    char xstring[BUFSIZE];      /* the print string                   */
    char **temp_prog;           /* temp program array                 */
    char **temp_label;          /* temp label name array              */
    int *temp_byte;             /* temp byte code array               */
    int *byte_array;            /* byte code array                    */
    char **label_nam;           /* labels name array                  */
    int token;                  /* token: current byte code           */
    char var_type;              /* current variable type              */
    int clrscreen=0;            /* flag: CLRSCN                       */
    /**/

Now, copy the function prototypes and #includes:


    /* ----- function prototypes ----- */
    void pgm_parser(void);
    void get_token(void);
    void parser(void);
    void Header(void);
    void open_destin(void);
    void Prolog(void);
    void Epilog(void);
    void Do_cls(void);
    void Do_end(void);
    void Do_functions(void);
    void ClrScrn(void);
    void a_bort(int,int);
    void writeln(char *);
    void write_str(char *);
    void PostLabel(char *);
    void assemble(void);
    #include "prototyp.h"

    /* --- function includes --- */
    #include "input.c"
    #include "utility.c"

As you can see, there are several new functions. All of these have to do with the generation of the assembly source code. Here is the code for function main():

    /* ----- begin program ------------- */
    int main(int argc, char *argv[])
    {   int x=0, ab_code=1;

        printf("Bxbasm Compiler\n");
        if(argc != 2)
        {
            a_bort(ab_code, x);
        }
        strcpy(t_holder, argv[1]);
        line_cnt(argv);
        Header();
        pgm_parser();
        Epilog();
        assemble();
        /* --- end of program --- */
        clr_arrays();
        return 0;
    }
    /*---------- end main ----------*/


As you can see, main() looks pretty similar to what we've been using. It makes three function calls that are new: Header(), Epilog() and assemble(). Two functions we already have, that don't need any changes, are pgm_parser() and get_token(). Copy these beneath main():

    void pgm_parser()
    {
        line_ndx = 0;
        while(line_ndx < nrows)
        {
            s_pos = 0;
            e_pos = 0;
            get_token();
            parser();
            line_ndx++;
        }
    }
    /*---------- end pgm_parser ----------*/

    void get_token()
    {
        strcpy(p_string, array1[line_ndx]);
        token = byte_array[line_ndx];
    }
    /*---------- end get_token ----------*/

The next function we need is parser(), but, this version needs a few modifications made to it. Copy parser() to Bxbasm.c and place it after get_token():

    void parser()
    {   int ab_code=4, x=line_ndx;

        switch(token)
        {
    /*      case 1:                 /* LET */
    /*      case 2:                 /* CLEAR */
    /*      case 3:                 /* LOCATE */
    /*      case 4:                 /* PRINT */
    /*      case 5:                 /* GOTO */
    /*      case 6:                 /* BEEP */
            case 7:                 /* CLS */
                Do_cls();
                break;
            case 8:                 /* END */
                Do_end();
                break;
    /*      case 9:                 /* GOSUB */
    /*      case 10:                /* RETURN */
    /*      case 11:                /* FOR */
    /*      case 12:                /* NEXT */
    /*      case 13:                /* IF */
    /*      case 14:                /* ELSEIF */
    /*      case 15:                /* ELSE */
    /*      case 16:                /* ENDIF */
    /*      case -1:                /* block label */
            default:
                a_bort(ab_code, x);
                break;
        }
    }
    /*---------- end parser ----------*/

As you can see, nearly everything except case-7 and case-8 is commented out. The reason for this is easily explained. Take a look at case-7. Case-7 relates to the byte code for CLS, (clear screen). The first thing our new compiler is going to do is compile the assembly code for Clear Screen. Case-8 relates to the byte code for END. After we have cleared the screen, we will call the assembly routine to end the program. Why not start with case-1 and case-2 for CLS and END ? Well, that would require a complete rewrite of the byte coding routines we already have. It's just simpler to use the existing byte codes and enable the cases that we already have as we need them. Note that case-7 calls function Do_cls() and case-8 calls Do_end(); both are new functions.

ASSEMBLY ROUTINES:
Now we begin adding the new functions that we will be needing. Recall that in main(), the function call right after line_cnt() is function "Header()". Here is the code for Header():

    void Header()
    {
        open_destin();
        writeln(";\t*************BxbAsm Compiler*************");
        writeln(";");
        writeln(" jmp\tSTART");
        writeln(";");
        Prolog();
    }
    /*---------- end Header ----------*/

Header() incorporates three new functions; open_destin(), writeln() and Prolog(). Function open_destin() is used to open the destination file for our assembly language source code.


writeln() is a general purpose utility that will write to the output file. Everything enclosed in a writeln() statement and passed as a parameter, is part of the assembly source code and will be written to disk. In Assembly language, the semicolon symbol, ";", serves the purpose of a comment, which of course is ignored by the assembler. So, in the above writeln() statements there is actually only one line of assembly code:

"jmp START"
for "jump to START". This is like a GOTO statement in Basic, where "START" can be thought of as a label. Here is the code for function open_destin():

    void open_destin()
    {   char ch, dot='.', *destin;
        int ii;
        unsigned size=PATH;

        destin = malloc(size * sizeof(char));
        strcpy(destin, s_holder);           /* copy source file name */
        ii = 0;
        ch = '\0';
        while(ch != dot)
        {
            ch = s_holder[ii];              /* make source.bas = source. */
            ii++;
        }
        destin[ii] = '\0';
        strcat(destin, "asm");              /* append "asm" to filename */
        /* --- copy asm source filename --- */
        strcpy(t_holder, destin);
        /* --- open destination file (write) --- */
        f_out = fopen(destin,"w");
        printf("Destination file: %s\n",destin);
    }
    /*---------- end open_destin ----------*/
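The loop in open_destin() relies on the file name containing a dot and on the result fitting in the buffers it is copied to. As a hedged aside, here is a minimal sketch of the same "replace the extension with asm" step with those two assumptions checked. The helper name make_asm_name() and its parameters are hypothetical and are not part of Bxbasm; it also uses const and size_t, which an older compiler may or may not accept.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Bxbasm: derive "name.asm" from a
   source file name. A name without a dot simply gets ".asm" appended.
   Returns 0 on success, -1 if dst is too small. */
int make_asm_name(const char *src, char *dst, size_t dst_size)
{
    const char *dot = strrchr(src, '.');            /* last dot, if any */
    size_t stem = dot ? (size_t)(dot - src) : strlen(src);

    if (stem + sizeof(".asm") > dst_size)           /* stem + ".asm" + NUL */
        return -1;
    memcpy(dst, src, stem);                         /* copy up to the dot */
    strcpy(dst + stem, ".asm");                     /* append new extension */
    return 0;
}
```

Called with "test.bas" or with plain "test", the sketch fills dst with "test.asm". One limitation to keep in mind: it looks for the last dot anywhere in the name, so a path whose directory part contains a dot would need extra handling.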

"open_destin" is an abbreviation for "Open Destination File". If you look closely, you will notice that it is almost a direct copy of the top portion of merge_source(), from Bxcomp.c. Prolog() is the final statement in Header(). Here is the code for function Prolog():


    void Prolog()               /* Write the Prolog */
    {
        writeln("; --------------------------------------------------");
        writeln("START PROC NEAR");
        writeln("; --------------------------------------------------");
    }
    /*---------- end Prolog ----------*/

In Prolog() you will see that there is only one line of assembly code that is not a comment. What that statement is doing is declaring a "procedure" named "START". In assembly language a "procedure" is a block of code similar to a "function". So, think of it as a function named "start". Just as C has a main() function, Assembly has a "start". The main program begins and ends in "start". Also, it is referred to as a "near" procedure as opposed to a "far" procedure. For now, all of our procedures will be "near". Procedure "START" is the target of our "jump" in Header(). Since we have a Prolog, we might as well have an Epilog. Here is the code for function Epilog():

    void Epilog()               /* Write the Epilog */
    {
        writeln(";");
        PostLabel("DONE");
        writeln(" INT 20H");
        writeln("START\tENDP");
        writeln("; --------------------------------------------------");
        Do_functions();
        fclose(f_out);
        printf("Done.\n");
    }
    /*---------- end Epilog ----------*/

Function Epilog() will serve the purpose of writing the ending point of our assembly language source code. After the main program has been translated and written to file, Epilog() will write the assembly code that signifies to the assembler the end of the main program and then it will write any additional functions or procedures that are called for.
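To see how Header(), Prolog() and Epilog() bracket whatever the parser writes, here is a minimal stand-alone sketch of the same sequence. It is not the Bxbasm code: emit_line() and emit_skeleton() are hypothetical names, the comment lines are left out, and the output goes to whatever FILE pointer the caller supplies rather than to the global f_out.

```c
#include <stdio.h>

/* Hypothetical stand-in for writeln(): one line of assembly text. */
static void emit_line(FILE *out, const char *text)
{
    fprintf(out, "%s\n", text);
}

/* Sketch of the Header()/Prolog()/Epilog() bracket around the lines
   that the parser would generate for the main program. */
void emit_skeleton(FILE *out, const char *const *body, int body_lines)
{
    int i;

    emit_line(out, " jmp\tSTART");          /* Header(): first instruction */
    emit_line(out, "START PROC NEAR");      /* Prolog(): open main proc    */
    for (i = 0; i < body_lines; i++)        /* output of pgm_parser()      */
        emit_line(out, body[i]);
    emit_line(out, "DONE:");                /* Epilog(): PostLabel("DONE") */
    emit_line(out, " INT 20H");             /* return to DOS               */
    emit_line(out, "START\tENDP");          /* close main proc             */
}
```

Passing stdout and the two body lines " call CLRSCN" and " jmp DONE" prints the same core instructions that the chapter's first Test.asm contains, minus the comments and the CLRSCN procedure itself.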

PROCEDURES:
If you recall, in parser(), the targets of case-7 and case-8 are functions Do_cls() and Do_end(). Here is the code for those two functions, copy these to Bxbasm.c:


    void Do_cls()               /* call clear-screen */
    {
        writeln(" call CLRSCN");
        clrscreen = 1;
    }
    /*---------- end Do_cls ----------*/

    void Do_end()               /* jump to Done */
    {
        writeln(" jmp DONE");
    }
    /*---------- end Do_end ----------*/

These two functions simply write a single line of assembly code each. In Do_cls(), global variable "clrscreen" serves as a flag and if the Basic source code contains a CLS instruction, the flag is set to a True value. Ultimately, if the flag is True, the clear screen procedure will be written to the assembly source file. Notice how this process differs from Bxb's Engine.exe. The Engine contains all of the functions that will ever be needed to execute any Bxbasic source file. A program compiled by Bxbasm.exe will only contain those procedures that are actually needed and called for in the generated assembly source file. Function Do_functions() will serve the purpose of controlling which additional functions and procedures need to be written to the assembly source code file. Here is the code:

    void Do_functions()
    {
        if(clrscreen == 1)
        {
            ClrScrn();
        }
    }
    /*---------- end Do_functions ----------*/

In this case, "clear screen" is the only function we have, so, if the flag "clrscreen" is set to True, function "ClrScrn()" will be called and the procedure for clearing the screen will be written to the assembly source file. As we add more functionality to our compiler we will add more flags and more functions to Do_functions(). Here is the code for ClrScrn():


    void ClrScrn()              /* clear screen function */
    {
        writeln("CLRSCN PROC NEAR");
        writeln(" push ax");                /* save ax */
        writeln(" mov ax, 2");              /* set crt mode: 80x25 bw */
        writeln(" INT 10H");                /* bios interrupt */
        writeln(" pop ax");                 /* restore ax */
        writeln(" ret");
        writeln("CLRSCN ENDP");
        writeln("; --------------------------------------------------");
    }
    /*---------- end ClrScrn ----------*/

As you can see, this is almost entirely x86 assembly language. Loading AX with 2 puts 0 in AH and 2 in AL, so BIOS Interrupt 10H is called with function 0, "set display mode", and mode 2, which is 80x25 text. Setting the mode in effect clears the display. This is one of many built-in BIOS and DOS routines that we can use to accomplish any number of tasks using assembly language.
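The flag mechanism behind Do_functions() can also be written as a table, which may be easier to extend as more procedures are added. The following is only a sketch under stated assumptions, not the way Bxbasm does it: the names NEED_CLRSCN, NEED_EXAMPLE, runtime_proc and emit_runtime() are hypothetical, the EXAMPLE procedure is a placeholder that does nothing, and the text is collected in a caller-supplied buffer instead of being written to f_out.

```c
#include <stddef.h>
#include <string.h>

enum { NEED_CLRSCN = 1, NEED_EXAMPLE = 2 };     /* hypothetical flag bits */

struct runtime_proc {
    unsigned flag;          /* bit set when the program calls the proc */
    const char *text;       /* assembly source of the procedure        */
};

static const struct runtime_proc procs[] = {
    { NEED_CLRSCN,
      "CLRSCN PROC NEAR\n push ax\n mov ax, 2\n INT 10H\n pop ax\n ret\nCLRSCN ENDP\n" },
    { NEED_EXAMPLE,         /* placeholder entry, for illustration only */
      "EXAMPLE PROC NEAR\n ret\nEXAMPLE ENDP\n" },
};

/* Append only the procedures whose flag bit is set in `needed`.
   Returns the number of procedures written, or -1 if out is too small. */
int emit_runtime(unsigned needed, char *out, size_t out_size)
{
    size_t used = 0, i;
    int count = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';
    for (i = 0; i < sizeof procs / sizeof procs[0]; i++) {
        size_t len;
        if ((needed & procs[i].flag) == 0)
            continue;                       /* never called: leave it out */
        len = strlen(procs[i].text);
        if (used + len + 1 > out_size)
            return -1;                      /* caller's buffer too small */
        memcpy(out + used, procs[i].text, len + 1);
        used += len;
        count++;
    }
    return count;
}
```

With this arrangement, supporting a new command means adding one flag bit and one table entry, rather than another global variable and another if-block.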

UTILITIES:
The next two functions; write_str() and writeln(), are the output routines that write the assembly code to the source file. Copy these to Bxbasm.c:

    void write_str(char *string)
    {
        fprintf(f_out,"%s", string);
    }
    /*---------- end write_str ----------*/

    void writeln(char *string)
    {
        write_str(string);
        fprintf(f_out, "\n");
    }
    /*---------- end writeln ----------*/

Function PostLabel() is used for writing a block label to the source file.


    void PostLabel(char *string)        /* Post a Label To Output */
    {
        fprintf(f_out, "%s:\n", string);
    }
    /*---------- end PostLabel ----------*/

Function assemble() calls A86.com to assemble the ".asm" source file into the ".COM" command file.

    void assemble()
    {
        strcpy(s_holder, "a86 ");
        strcat(s_holder, t_holder);
        system(s_holder);               /* call A86 assembler */
    }
    /*---------- end assemble ----------*/
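assemble() builds the command line with strcpy() and strcat() and ignores what system() returns. As a hedged alternative, here is a sketch that checks both steps. The function names are hypothetical, snprintf() is a C99 routine that an older compiler such as Power C may not provide, and whether A86 reports assembly errors through its exit status is an assumption that would need checking against the A86 documentation.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build "a86 <file>" into cmd without overflow.
   Returns 0 on success, -1 if the command would not fit. */
int build_a86_command(char *cmd, size_t cmd_size, const char *asm_file)
{
    int n = snprintf(cmd, cmd_size, "a86 %s", asm_file);
    return (n < 0 || (size_t)n >= cmd_size) ? -1 : 0;
}

/* Hypothetical helper: run the assembler. Returns 0 only if the
   command was built and system() reported a zero status. */
int run_assembler(const char *asm_file)
{
    char cmd[128];

    if (build_a86_command(cmd, sizeof cmd, asm_file) != 0)
        return -1;
    return system(cmd) == 0 ? 0 : -1;
}
```

A caller could then print a message and stop when run_assembler() fails, instead of telling the user to run a command file that was never produced.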

Copy functions: PostLabel(), assemble() and a_bort() to Bxbasm.c:

    void a_bort(int code,int x)
    {
        switch(code)
        {
            case 1:
                printf("Unspecified Program Name.\n");
                printf("Enter:\"bxbasm source.bas\"\n");
                printf("code(%d)\n",code);
                break;
            case 2:
                printf("Program file:\"%s\" not found.\n", t_holder);
                printf("Enter: \"bxbasm source.bas\"\n");
                printf("Program Terminated.\ncode(%d)\n", code);
                break;
            case 3:
                break;
            case 4:
                printf("\nSyntax error: in program line:");
                printf(" %d.\n%s",(x+1),p_string);
                printf("Unknown Command.\ncode(%d)\n", code);
                break;
            default:
                printf("Program aborted, undefined error.");
                break;
        }
        exit(1);
    }
    /*---------- end a_bort ----------*/


That's it! That's all there is for now.

IMPORTANT: Before you proceed, make sure the copy of A86.com, that came in the Sources.zip file, is in the same working directory that Bxbasm.exe will reside in. When Bxbasm.exe is executed, A86.com will automatically be called to assemble the source file into the command file.

Now compile Bxbasm.c. Hopefully there will be no errors reported. Now, using Bxbasm.exe, compile this version of Test.bas:

    ' test.bas version 12.1
    CLS
    ' ------------------------------------------
    END
    ' ------------------------------------------

From the DOS prompt, enter:

Bxbasm Test
The result will be the new assembly source file and the command file: Test.asm and Test.com. Using your editor, open Test.asm and take a look at it.
• I recommend using Windows "Notepad" or Dos "Edit". Whatever you use, take note that Test.asm is a pure ascii text file and has no embedded control codes. If you use a word processor, be careful: the assembler cannot read embedded control codes.
You will notice that Test.asm is considerably larger than Test.bas but, besides the fact that there are quite a few comment lines added, this is pure Assembly Language! Assuming there were no errors, and there shouldn't have been any, let's try Test.com. From the Dos prompt, enter:

Test
It should have cleared the screen and left the cursor at the Dos prompt on the top line. If it did all that, then our Assembly Language compiler is working ! Oh, one more thing, at the Dos prompt, enter:

dir *.com
Look at the file size for Test.com. It's in bytes !!! Not Kbytes, but bytes !!!


RE-ORGANIZE:
Before moving on, it's time we clean up Bxbasm.c a bit by moving some functions to other files. Create a new file and name it: Asmutils.c and copy this header information to the top of that file:

    /* bxbasm : Asmutils.c : alpha version */

    /* ----- function prototypes ----- */
    #include "prototyp.h"

Now copy this list of functions from Bxbasm.c to Asmutils.c:

    Header()
    open_destin()
    Prolog()
    Epilog()
    writeln()
    write_str()
    PostLabel()
    assemble()

and delete them from Bxbasm.c. Save Asmutils.c and close it. Open Prototyp.h. Copy the following prototypes to that file:

    /* Asmutils.c */
    void Header(void);
    void open_destin(void);
    void Prolog(void);
    void Epilog(void);
    void writeln(char *);
    void write_str(char *);
    void PostLabel(char *);
    void assemble(void);

Now create another new file and name it:

Asmfunct.c
and copy this header information to the top of the file:


    /* bxbasm : Asmfunct.c : alpha version */

    /* ----- function prototypes ----- */
    #include "prototyp.h"

Now copy this list of functions from Bxbasm.c to Asmfunct.c:

    Do_cls()
    Do_end()
    Do_functions()
    ClrScrn()

and delete them from Bxbasm.c. Copy these prototypes to Prototyp.h:

    /* Asmfunct.c */
    void Do_cls(void);
    void Do_end(void);
    void Do_functions(void);
    void ClrScrn(void);

Now create another new file and name it:

Asmerror.c
and copy this header information to the top of the file:

    /* bxbasm : Asmerror.c : alpha version */

    /* ----- includes ----- */
    #include "prototyp.h"

and copy function a_bort() from Bxbasm.c to Asmerror.c. We don't need to add the prototype for a_bort() to Prototyp.h, because it is already declared there. Now, change the function prototypes and includes, in Bxbasm.c, to read as follows:


    /* ----- function prototypes ----- */
    void pgm_parser(void);
    void get_token(void);
    void parser(void);
    #include "prototyp.h"

    /* --- function includes --- */
    #include "input.c"
    #include "utility.c"
    #include "asmutils.c"
    #include "asmfunct.c"
    #include "asmerror.c"

Save and close the files we have been working on. Re-compile Bxbasm.c to make sure we didn't break it. Test it again by compiling Test.bas with Bxbasm.exe. Enter:

Bxbasm Test

BEEP:
Along the lines of keeping it simple, let's take the next small step and add the BEEP command. Open Bxbasm.c and modify the global variables section to include the variable "beepsound":

    /* ------ global vars ------------ */
    [snip]
    int clrscreen=0;            /* flag: CLRSCN */
    int beepsound=0;            /* flag: BEEP   */
    /**/
    [snip]

Now make the following changes to function parser() so that case-6 calls function Do_beep(), as follows:


    void parser()
    [snip]
            case 6:                 /* BEEP */
                Do_beep();
                break;
            case 7:                 /* CLS */
                Do_cls();
                break;
            case 8:                 /* END */
                Do_end();
                break;
    [snip]

Save Bxbasm.c and close it. Now open file: Asmfunct.c. Change Do_functions() by adding the "beepsound" expression, as follows:

    void Do_functions()
    {
        if(clrscreen == 1)
        {
            ClrScrn();
        }
        if(beepsound == 1)
        {
            BeepSnd();
        }
    }
    /*---------- end Do_functions ----------*/

Now, add functions Do_beep() and BeepSnd() as follows:

    void Do_beep()
    {
        writeln(" call BEEP");
        beepsound = 1;
    }
    /*---------- end Do_beep ----------*/

    void BeepSnd()
    {
        writeln("BEEP PROC NEAR");
        writeln(" push ax");
        writeln(" push dx");
        writeln(" mov dl, 7");
        writeln(" mov ah, 2");
        writeln(" INT 21H");
        writeln(" pop dx");
        writeln(" pop ax");
        writeln(" ret");
        writeln("BEEP ENDP");
        writeln("; --------------------------------------------------");
    }
    /*---------- end BeepSnd ----------*/

You may notice that Procedure Beep is rather lengthy for just making a beep sound. That's partially because it has to push and pop two registers. In this procedure we use Interrupt 21H, which is a DOS Interrupt. The command code is placed in register AH, in this case 2 is the code for "display a character" and the character to be displayed is loaded in register DL. Then Interrupt 21H is called to perform the specific Dos function.

Save file Asmfunct.c and close it. Now open file Prototyp.h. Add the prototypes for Do_beep() and BeepSnd() in the Asmfunct.c list:

    /* Asmfunct.c */
    [snip]
    void Do_beep(void);
    void BeepSnd(void);

Save Prototyp.h and close it. Now compile Bxbasm.c. Using Bxbasm.exe, compile this version of Test.bas:

    ' test.bas version 12.2
    CLS
    BEEP
    ' ------------------------------------------
    END
    ' ------------------------------------------

From the DOS prompt, enter:

Bxbasm Test
Okay, it's still not much, but, I said we were going to take it slowly. Using your editor, open Test.asm and examine the assembly source code.


Now execute the program by entering:

Test
Well, did that work okay ?

".COM":
Before we continue I will need to explain some details about ".COM" programs and how the A86 assembler works. Without going into too much detail about how ".COM" programs work and why they differ from ".EXE" programs, I will try to give a very general idea of what is going on.

All command files, (".com" programs that is,) are intended to be executed from address 100H within the program segment. The reason for this is more historical than anything else. PC-Dos was written at a time when 256KBytes RAM was considered to be quite a lot. In reality, it was just enough memory to load the DOS and one command program. When DOS loads a ".com" program, it gives the program a single 64KB segment of memory. DOS begins by creating what is called the "Program Segment Prefix" at offset 00H of that segment and uses the space up to offset 0FFH for its own purposes. The ".com" program is loaded directly above that, beginning at offset 0100H, and the stack sits at the top of the same segment. Here is a diagram of this region of memory:

    0000H   +----------------------------+
            | used by DOS:               |
            | Program Segment Prefix     |
    00FFH   +----------------------------+
    0100H   | ".COM" program area        |
            ~                            ~
            ~                            ~
            | Stack RAM                  |
    0FFFEH  |                            |
    0FFFFH  +----------------------------+

With this in mind, the first program instruction to be executed has to begin right at location 100H. This causes a little bit of a problem though. The assembler wants the program variables to be placed at the beginning of the program, before the START procedure. If your program uses no variables, then there is no problem. However, not many programs function well without variables. So, we place a "jump" instruction at the beginning of the program, then allow the assembler to allocate space for the variables, which is then followed by the main body of the program.

    0100H   jmp:
            vars..............
    Start   ~
            ~

You might have wondered why the assembly source code generated by Bxbasm begins with a jump instruction ("jmp"). As stated above, this is because the assembler is expecting all the variables to be declared at the beginning of the source code. Up to this point we have not had any variables to declare.
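To make the layout concrete, here is a small sketch that builds a ".COM" image by hand: a short jump, then a data area, then the code. It is an illustration only and says nothing about the exact bytes A86 would choose for the same source. The function name build_com_image() is hypothetical. The opcodes used are the standard 8086 encodings: 0EBH followed by a one-byte displacement for a short jump, and 0CDH 20H for INT 20H.

```c
#include <stddef.h>
#include <string.h>

/* Sketch: build a tiny .COM image by hand. DOS loads the image at
   offset 100H, so the layout in memory is:
     0100H        jmp short, over the data area
     0102H        data bytes (the "variables")
     0102H+len    INT 20H (return to DOS)
   Returns the image length, or 0 if it does not fit. */
size_t build_com_image(const unsigned char *data, size_t len,
                       unsigned char *img, size_t img_size)
{
    size_t total = 2 + len + 2;
    unsigned char *p = img;

    if (len > 127 || total > img_size)  /* a short jump reaches 127 bytes */
        return 0;

    *p++ = 0xEB;                        /* jmp short ...                 */
    *p++ = (unsigned char)len;          /* ... skipping len data bytes   */
    memcpy(p, data, len);               /* data area                     */
    p += len;
    *p++ = 0xCD;                        /* INT ...                       */
    *p++ = 0x20;                        /* ... 20H                       */
    return total;
}
```

The jump displacement is counted from the byte after the jump instruction, which is why it equals the size of the data area. Without the jump, the processor would start executing the data bytes at 0100H as if they were instructions.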


Like Assembly Language, a number of high level programming languages require that all variables be declared at the beginning of the program, for these same reasons. As you've seen in Bxbasic and in many other dialects of Basic, this is not always a requirement. However, many new dialects of Basic do make it a requirement that all variables be declared.

PRINT:
Let's take the next leap forward and expand our compiler to generate the assembly code to correspond to the PRINT command. We will begin by printing a quoted string, such as:

PRINT "hello world".
Before we begin, unless we are going to make it a requirement that all variables be declared, we will need to develop a scanner that will pre-scan the Basic source code and seek out variables, constants and character strings. This would be to identify them so that we can assign names and declare them before we write the main body of the program. Since we are starting by printing a quoted string, we will need to:
• create the look-ahead scanner that will identify a double quoted character string,
• generate a unique variable name to store it under,
• write variable-name and quoted string to the .asm source file,
• replace the quoted string in memory with the variable-name.

Example:

    *) Basic source:  [PRINT "hello world"]
    *) byte code:     [ 4 ] ["hello world"]

    1) identify the print byte code, [ 4 ]; if it is a print statement,
       then begin scanning for the quoted string, ["hello world"].
    2) generate a unique variable name for the quoted string: Var-1
    3) write to the asm file:
           Var-1 DB 'hello world',13,10,'$'
    4) replace the quoted string with the varname:
           [ 4 ] [ Var-1 ]
       where [ 4 ] is the byte code and [ Var-1 ] is the new variable name.

In the assembly source file, variable names and quoted strings look like this:

    Var   DB  'hello world',13,10,'$'
    Abc   DB  'this is a test',13,10,'$'

In the left column is the variable name, followed by "DB" which means: Define Byte. That is followed by the quoted string and the ascii codes for a Return and LineFeed. This type of string is terminated by a dollar sign.
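One consequence of the dollar sign terminator is worth seeing in code: whatever prints such a string stops at the first "$", so the text itself cannot contain one. The sketch below is a hypothetical helper, not part of Bxbasm, that measures a string the same way.

```c
#include <stddef.h>

/* Sketch: length of a '$'-terminated string, i.e. the number of
   characters in front of the first '$'. Returns -1 if no '$' is
   found within max_len bytes. */
int dollar_strlen(const char *s, size_t max_len)
{
    size_t i;

    for (i = 0; i < max_len; i++)
        if (s[i] == '$')
            return (int)i;              /* terminator found at index i */
    return -1;                          /* unterminated string */
}
```

For the bytes of 'hello world',13,10,'$' the result is 13: eleven letters and a space, plus the Return and LineFeed. For a text such as "cost: $5$" the result is 6, because everything from the first dollar sign on would be cut off.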


Let's get started: Begin by opening Bxbasm.c. Add two new global var's: "var_count" and "printstring", as shown here:

    /* ------ global vars ------------ */
    [snip]
    int clrscreen=0;            /* flag: CLRSCN      */
    int beepsound=0;            /* flag: BEEP        */
    int var_count=0;            /* variables counter */
    int printstring=0;          /* flag: PRINT       */
    /**/
    [snip]

In function parser(), change the code for case-4 to read as follows:

    void parser()
    [snip]
            case 4:                 /* PRINT */
                Do_print();
                break;
    [snip]

Save Bxbasm.c and close it. Open file: Asmutils.c. Copy this new version of function Prolog():

    void Prolog()               /* Write the Prolog */
    {
        ScanVars();             /* pre-scan for quoted strings */
        writeln("; --------------------------------------------------");
        writeln("START PROC NEAR");
        writeln("; --------------------------------------------------");
    }
    /*---------- end Prolog ----------*/

Note the addition of the call to ScanVars(). Here is the code for ScanVars(), copy this code to Asmutils.c:


    void ScanVars()             /* identify and write all variables */
    {   char ch, NVar[12], a_string[BUFSIZE];
        int pi, x;

        line_ndx = 0;
        while(line_ndx < nrows)
        {
            get_token();
            if(token == 4)                      /* the print token */
            {
                strcpy(NVar, NewVarname());     /* get a varname */
                write_str(NVar);
                write_str("\tDB \'");
                pi = 0;
                pi = iswhite(pi);
                ch = p_string[pi];
                if(ch == '\"')                  /* jump over quote */
                {
                    pi++;
                    ch = p_string[pi];
                    x = 0;
                    /* copy up to the closing quote; also stop at the
                       end of the line if the closing quote is missing */
                    while(ch != '\"' && ch != '\0')
                    {
                        a_string[x] = ch;
                        x++;
                        pi++;
                        ch = p_string[pi];
                    }
                    a_string[x] = '\0';
                    strcat(a_string, "\',13,10,\'$\'");
                    writeln(a_string);
                    /* replace string with varname */
                    strcpy(array1[line_ndx], NVar);
                }
            }
            line_ndx++;
        }
    }
    /*---------- end ScanVars ----------*/

This function proceeds to scan through each program line, looking for a "Print" token. When it finds one, it calls NewVarname() to generate a unique name for identification. Even though this will be a quoted string, the assembler needs a name with which to identify it. The var-name and the character string are then written to the source file:

    Var   DB  'hello world',13,10,'$'

Then the quoted string in the program-array is replaced with the var-name:

    instead of:     Print "hello world"
    it will read:   Print Var
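The extraction step can be isolated into a small function that is easy to test on its own. The sketch below is a hypothetical helper, not the ScanVars() code: it takes the text that follows PRINT, quotes included, and formats one DB line into a caller-supplied buffer. It also rejects input that the simple DB form used here could not hold; treating an embedded apostrophe as an error is an assumption about the assembler's string syntax. snprintf() is a C99 routine and may be missing from older compilers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn the operand of PRINT (for example
   "hello world", including the double quotes) into one DB line.
   Returns 0 on success, -1 on a missing quote, on a ' or $ inside
   the text, or if out is too small. */
int format_db_line(const char *name, const char *operand,
                   char *out, size_t out_size)
{
    const char *open, *close;
    size_t len;
    int n;

    open = strchr(operand, '"');
    if (open == NULL)
        return -1;                          /* no quoted string at all */
    close = strchr(open + 1, '"');
    if (close == NULL)
        return -1;                          /* unterminated string */
    len = (size_t)(close - open - 1);
    if (memchr(open + 1, '\'', len) || memchr(open + 1, '$', len))
        return -1;                          /* would end the text early */
    n = snprintf(out, out_size, "%s\tDB '%.*s',13,10,'$'",
                 name, (int)len, open + 1);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

For the name VARX1 and the operand "hello world" the buffer receives VARX1, a tab, and then DB 'hello world',13,10,'$', which is the same line ScanVars() writes.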

Later, when the main body of the program is written to the source file, the "Print" statement will refer to the varname, rather than to the string itself. Here is the code for function NewVarname():


    char *NewVarname()          /* generate a unique varname */
    {   char Val[6];
        static char NVar[12];

        var_count++;
        strcpy(NVar, "VARX");
        sprintf(Val, "%d", var_count);
        strcat(NVar, Val);
        return NVar;
    }
    /*---------- end NewVarname ----------*/

Copy the above to file: Asmutils.c. Save and close it. Open file Asmfunct.c. Change Do_functions() to read as follows:

    void Do_functions()
    {
        if(clrscreen == 1)
        {
            ClrScrn();
        }
        if(beepsound == 1)
        {
            BeepSnd();
        }
        if(printstring == 1)
        {
            PrintStr();
        }
    }
    /*---------- end Do_functions ----------*/

Copy these two new functions: Do_print() and PrintStr() (below) to Asmfunct.c:

    void Do_print()
    {   char temp[BUFSIZE];

        strcpy(temp, p_string);
        strcat(temp, ")");
        write_str(" mov dx, offset(");
        writeln(temp);
        writeln(" call PRINTSTR");
        printstring = 1;
    }
    /*---------- end Do_print ----------*/


Examine the write_str statement above. It's not clear by looking at the statements, but what will be written to file will look very much like this:

    mov dx, offset(VARX1)

What this statement is doing is loading the DX register with the offset address of variable VARX1. That address is where the quoted string begins in memory, once the assembler has stored it there. The Dos Interrupt we will be using, INT 21H, requires that the offset address of the string be loaded in the DX register. From that point, procedure "PRINTSTR" will be called, where it will print the entire string. Here is the code for PrintStr():

    void PrintStr()
    {
        writeln("PRINTSTR PROC NEAR");
        writeln(" push ax");
        writeln(" mov ah, 9");
        writeln(" INT 21H");
        writeln(" pop ax");
        writeln(" ret");
        writeln("PRINTSTR ENDP");
        writeln("; --------------------------------------------------");
    }
    /*---------- end PrintStr ----------*/

Notice how register AH is loaded with the value 9. That is the "INT 21H" code for "Display String". Save Asmfunct.c and close it. Open file: Prototyp.h. Update these prototype lists to include the four new functions:

    /* Asmutils.c */
    [snip]
    void ScanVars(void);
    char *NewVarname(void);

    /* Asmfunct.c */
    [snip]
    void Do_print(void);
    void PrintStr(void);
    [snip]

Save Prototyp.h and close it. Now compile Bxbasm.c. With Bxbasm.exe, compile this version of Test.bas:


    ' test.bas version 12.3
    CLS
    BEEP
    PRINT "hello world!"
    BEEP
    PRINT "this is a test"
    ' ------------------------------------------
    END
    ' ------------------------------------------

Then try it, enter:

Test

REVIEW:
Now let's take a look at the generated assembly code in Test.asm:

    ;       *************BxbAsm Compiler*************
    ;
     jmp    START
    ;
    VARX1   DB 'hello world!',13,10,'$'
    VARX2   DB 'this is a test',13,10,'$'
    ; --------------------------------------------------
    START PROC NEAR
    ; --------------------------------------------------
     call CLRSCN
     call BEEP
     mov dx, offset(VARX1)
     call PRINTSTR
     call BEEP
     mov dx, offset(VARX2)
     call PRINTSTR
     jmp DONE
    ;
    DONE:
     INT 20H
    START   ENDP
    ; --------------------------------------------------
    CLRSCN PROC NEAR
     push ax
     mov ax, 2
     INT 10H
     pop ax
     ret
    CLRSCN ENDP
    ; --------------------------------------------------
    BEEP PROC NEAR
     push ax
     push dx
     mov dl, 7
     mov ah, 2
     INT 21H
     pop dx
     pop ax
     ret
    BEEP ENDP
    ; --------------------------------------------------
    PRINTSTR PROC NEAR
     push ax
     mov ah, 9
     INT 21H
     pop ax
     ret
    PRINTSTR ENDP
    ; --------------------------------------------------

See how procedure "Start" mirrors Test.bas ? Does it all make sense ? If you don't understand what is taking place, go back and review the new functions we just added to Asmutils.c and Asmfunct.c, then refer back to Test.asm to see the results of each step. The generated code works fine, but, it's not perfect. For one thing, look at this line of assembly code:

    VARX1   DB 'hello world!',13,10,'$'

When you ran Test.com, you may have noticed that a newline was automatically printed after the quoted text. That is because of the : ",13,10," after the quoted string. Recall I said they were the Return and LineFeed ascii codes. In a real world situation, we would not want a newline to be automatically printed unless we knew that one was needed. We may want to follow the text string with some more data on the same line. So, we have to stop the: ",13,10," from being printed and instead, just print the quoted string and the dollar sign terminator. Like this:

    VARX1   DB 'hello world!','$'

Okay that's easy enough to change. We need to add some code to our program to test that what follows is in fact a newline character or some other data, such as a Tab or something else. Example:

    PRINT "hello world",

In this example, the quoted string is followed by a comma, which represents a Tab character. So we would then need to send the Tab, as a single character, to the display. So we need a routine for outputting a single character. Actually, we already have something that might do the trick, we just need to "tweak" it a little bit. What I'm referring to is the BEEP routine. Take a look at this assembly code:


    BEEP PROC NEAR
     push ax            ; save registers
     push dx            ;       "
     mov dl, 7          ; code for beep
     mov ah, 2          ; code for print character
     INT 21H            ; dos request
     pop dx             ; restore registers
     pop ax             ;       "
     ret
    BEEP ENDP

This routine has 10 lines of generated code and 8 lines of actual assembly code, which is quite a lot for something that just makes a beep sound. What would be a little more efficient is a generic routine for outputting a single character to the display. That routine could then be called by a simplified BEEP routine. Example:

    BEEP PROC NEAR
     push dx            ; save registers
     mov dl, 7          ; code for beep
     call PRINTCHR      ; print it
     pop dx             ; restore registers
     ret
    BEEP ENDP

    PRINTCHR PROC NEAR
     push ax            ; save registers
     mov ah, 2          ; code for print character
     INT 21H            ; dos request
     pop ax             ; restore registers
     ret
    PRINTCHR ENDP

Okay, this is even more lines of generated code, but in the long run it could reduce the number of lines generated. We now have a BEEP routine that has 5 lines of assembly code and a generic "print single character" routine that has 5 lines of assembly code. If we wanted to create a routine that outputs a Tab character, that would only add 5 more lines of assembly code. The only problem with this idea is that every time we wanted to output a different character, we would need to create a new procedure for it, and there are potentially 256 different character codes. A better solution might be to not create a new procedure for each character but, instead, do something like this:

        ...
        push dx
        mov dl, 7           ; code for beep
        call PRINTCHR
        pop dx
        ...
    ; --------------------------------------------------
    PRINTCHR PROC NEAR
        push ax
        mov ah, 2
        INT 21H
        pop ax
        ret
    PRINTCHR ENDP
    ; --------------------------------------------------

In this example, the setting up of the character to be output is done in the main body of the program. Doing it this way reduces the number of setup lines to 4 and procedure PRINTCHR remains at 5 lines of assembly code. This will result in only adding 4 lines of assembly code for a given character to be output. There still may be a better way, but, let's try it this way and see how well it works. Open file: Asmfunct.c. Rewrite function Do_beep() to look like this:

    void Do_beep()
    {
        writeln(" push dx");
        writeln(" mov dl, 7 ; code for beep");
        writeln(" call PRINTCHR");
        writeln(" pop dx");
        printchrctr = 1;
    }
    /*---------- end Do_beep ----------*/

Now, delete function BeepSnd(), we won't be needing it any more and replace it with this code for function PrintChr():

    void PrintChr()
    {
        writeln("PRINTCHR PROC NEAR");
        writeln(" push ax");
        writeln(" mov ah, 2");
        writeln(" INT 21H");
        writeln(" pop ax");
        writeln(" ret");
        writeln("PRINTCHR ENDP");
        writeln("; --------------------------------------------------");
    }
    /*---------- end PrintChr ----------*/

Now, rewrite function Do_functions()so that it reads like this:

    void Do_functions()
    {
        if(clrscreen == 1)
        {
            ClrScrn();
        }
        if(printstring == 1)
        {
            PrintStr();
        }
        if(printchrctr == 1)
        {
            PrintChr();
        }
    }
    /*---------- end Do_functions ----------*/

Save Asmfunct.c and close it. Now, open Prototyp.h. Delete this prototype: void BeepSnd(void); and replace it with this prototype: void PrintChr(void); Save Prototyp.h and close it. Now, open file: Bxbasm.c. And, from the "global vars" section, delete the declaration for: int beepsound=0; /* flag: BEEP */


Save Bxbasm.c and close it. Open file: Asmutils.c. We need to make a change to function ScanVars(), in the line that prints the newline, after the quoted string has been printed. Delete the ",13,10," from the "strcat" line: strcat(a_string, "\',13,10,\'$\'"); so that it looks like: strcat(a_string, "\',\'$\'"); That section should end up looking like this:

    void ScanVars()         /* identify and write all variables */
    [snip]
                strcat(a_string, "\',\'$\'");
                writeln(a_string);
                strcpy(array1[line_ndx], NVar);     /* replace string with varname */
            }
        }
        line_ndx++;
        }
    }
    /*---------- end ScanVars ----------*/

Save Asmutils.c and close it. We haven't fixed everything yet, but, let's make sure what we do have is working correctly. Compile Bxbasm.c. Now, using Bxbasm.exe, compile Test.bas again. Enter:

    Bxbasm Test

Let's take a look at the assembly code generated for Test.asm. Compare your output with this one, it should be exactly the same:

    ;
    ; *************BxbAsm Compiler*************
        jmp START
    ;
    VARX1 DB 'hello world!','$'
    VARX2 DB 'this is a test','$'
    ; --------------------------------------------------
    START PROC NEAR
    ; --------------------------------------------------
        call CLRSCN
        push dx
        mov dl, 7           ; code for beep
        call PRINTCHR
        pop dx
        mov dx, offset(VARX1)
        call PRINTSTR
        push dx
        mov dl, 7           ; code for beep
        call PRINTCHR
        pop dx
        mov dx, offset(VARX2)
        call PRINTSTR
        jmp DONE
    ;
    DONE:
        INT 20H
    START ENDP
    ; --------------------------------------------------
    CLRSCN PROC NEAR
        push ax
        mov ax, 2
        INT 10H
        pop ax
        ret
    CLRSCN ENDP
    ; --------------------------------------------------
    PRINTSTR PROC NEAR
        push ax
        mov ah, 9
        INT 21H
        pop ax
        ret
    PRINTSTR ENDP
    ; --------------------------------------------------
    PRINTCHR PROC NEAR
        push ax
        mov ah, 2
        INT 21H
        pop ax
        ret
    PRINTCHR ENDP
    ; --------------------------------------------------

Not too bad, is it ? The main body is a little cluttered, but, that can't be helped. Now execute:

Test.com
Notice how this time both strings are printed on the same line, as they should be.

The next thing we need to do is redesign the PRINT routines so that they reflect the real world situations we will encounter. We wrote the Print routines with just one purpose in mind: to display a single quoted string. We've broadened that definition a bit by no longer assuming that each string is followed by a newline.


As it stands, function ScanVars() looks for a "Print" byte code then proceeds to generate a new variable identifier. In a real situation, we can't automatically do that, because it may just be an empty PRINT: statement with no text string at all. In that case, all the statement is saying is: PRINT (a newline) It would be a wasted effort to go through the trouble of generating a unique variable name. Additionally, there is the distinct possibility that the print statement will contain multiple strings, or variables delimited by commas or semi-colons. Therefore, we need to build a loop that will cycle through the program line until it reaches the end of the line. We haven't begun working with assigning variables yet, so for now we just have to seek out multiple quoted strings. For the sake of argument, let's look at this code fragment from function parse_print(), in file: Output.c:

    void parse_print()
    [snip]
        /* --- LOOP: multiple print statements --- */
        while(ch != '\n')
        {
            /* --- print variable --- */
            if(isalpha(ch))
            {
                strcpy(s_holder, get_varname());
                pi = e_pos;
                ch = p_string[pi];
                if(ch == '$')
                {
                    get_strvar();       /* string variable */
                }
                else
                {
                    get_prnvar();       /* numeric variable */
                }
            }
            /* --- next char is a quote -- */
            else if(ch == quote)
            {
                get_prnstring();
            }
            /* --- error: --- */
            [snip]
            /* --- LOOP: if more to print --- */
    [snip]

The loop we end up with is probably going to resemble this. For now though, all we need to be concerned with is being able to handle multiple strings and delimiters.


MULTIPLE STRINGS:
We will begin this section by making the changes we've talked about that will allow us to print multiple strings that reside on the same program line. Here is the new code for ScanVars():

    void ScanVars()         /* identify and write all variables */
    {
        char ch;
        int pi;

        line_ndx = 0;
        while(line_ndx < nrows)
        {
            get_token();
            if(token == 4)              /* the print token */
            {
                pi = 0;
                ch = ' ';
                while(ch != '\n')
                {
                    pi = iswhiter(pi);
                    ch = p_string[pi];
                    if(ch == '\"')      /* found a quoted string */
                    {
                        e_pos = pi;
                        GetNewVar();    /* get new varname */
                        pi = e_pos;
                    }
                    else
                    {
                        pi++;           /* get next char */
                    }
                }
            }
            line_ndx++;
        }
        writeln("NEWLINE DB 13,10,\'$\'");
    }
    /*---------- end ScanVars ----------*/

You'll see how it's changed from the last version. It looks a little smaller in size, but that's because the parts that do most of the work have been cut out and put in their own functions. Two things are worth pointing out:
• first, there is the call to GetNewVar(); this is where all the work gets done,
• second, there is the addition of the writeln statement just prior to exiting the function. We are going to create a default variable for the newline. Whenever the program calls for a newline, this string will be sent to the display.

Here is the code for GetNewVar():

    void GetNewVar()
    {
        char ch, NVar[12], a_string[BUFSIZE];
        int pi, x=0;

        pi = e_pos;
        s_pos = pi;                     /* save pi value */
        /* --- get new varname --- */
        strcpy(NVar, NewVarname());
        write_str(NVar);
        write_str("\tDB \'");
        pi++;
        ch = p_string[pi];
        while(ch != '\"')               /* copy up to quote */
        {
            a_string[x] = ch;
            x++;
            pi++;
            ch = p_string[pi];
        }
        a_string[x] = '\0';
        e_pos = pi;
        strcat(a_string, "\',\'$\'");
        writeln(a_string);              /* write declaration */
        /* --- save varname in array --- */
        StorVar(NVar);
        pi = e_pos;
        ch = p_string[pi];
        if(strchr(":;,", ch))
        {
            pi++;
            e_pos = pi;
        }
    }
    /*---------- end GetNewVar ----------*/

The first part is pretty much as it was, before being moved here. The "save varname" section is different. We now make a call to another new function StorVar() and pass to it "NVar". Here's the code for StorVar():

    void StorVar(char *name)
    {
        char ch, varname[12];
        int pi, x=0, ii, len;
        unsigned size;

        strcpy(varname, name);
        len = strlen(varname);          /* how many chars to xfer */
        ii = (s_pos + len);
        if(s_pos > 0)
        {
            for(;x < s_pos; x++)
            {
                s_holder[x] = p_string[x];
            }
            x = 0;
        }
        for(pi=s_pos; pi < ii; pi++,x++)
        {
            s_holder[pi] = varname[x];  /* copy varname */
        }
        x = pi;
        s_pos = pi;
        pi = e_pos;
        pi++;
        ch = p_string[pi];
        while(ch != '\0')
        {
            s_holder[x] = p_string[pi];
            x++;
            pi++;
            ch = p_string[pi];
        }
        s_holder[x] = '\0';
        strcpy(p_string, s_holder);
        /* --- save p_string to array --- */
        x = strlen(p_string);
        len = strlen(array1[line_ndx]);
        if(x > len)
        {
            size = (x+1);
            ii = line_ndx;
            array1[ii] = realloc(array1[ii], size * sizeof(char));
        }
        strcpy(array1[line_ndx], p_string);     /* store to array */
        e_pos = s_pos;
    }
    /*---------- end StorVar ----------*/

This function does quite a number of things and requires some explaining. In general;
• 1) it copies the varname to a temporary string,
• 2) then it transfers the rest of the string, beyond the second quote, to the temp string. example:
      p_string = "hello world", "testing", "123"
      temp     = Var1, "testing", "123"
• 3) in the event that this is the second or third pass through, the first part of p_string will be copied to the temp string up to the first quote. s_pos will be pointing to the index of the first quote, example:
      p_string = Var1, "testing", "123"
      temp     = Var1,
      s_pos--------^
  then it resumes at step one, above,
• it then will replace p_string with this revised version,
• it will repeat this process for each quoted string.
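The steps above can be sketched in isolation. The following is a minimal illustration, not the author's StorVar(): the function name replace_literal() and its parameters are hypothetical, and instead of patching the line in place it builds the result in a freshly sized buffer, so the result can never overrun the old storage.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Replace line[start..end] (inclusive: both quote characters and the
   text between them) with `name`. `*line` must be heap-allocated.
   Returns 0 on success, -1 if memory could not be obtained. */
int replace_literal(char **line, size_t start, size_t end, const char *name)
{
    size_t old_len = strlen(*line);
    size_t name_len = strlen(name);
    size_t tail_len, new_len;
    char *buf;

    assert(start <= end && end < old_len);
    tail_len = old_len - (end + 1);         /* chars after closing quote */
    new_len = start + name_len + tail_len;

    buf = malloc(new_len + 1);
    if(buf == NULL)
    {
        return -1;
    }
    memcpy(buf, *line, start);              /* part before the first quote */
    memcpy(buf + start, name, name_len);    /* the variable name */
    memcpy(buf + start + name_len,
           *line + end + 1, tail_len + 1);  /* the rest, including '\0' */
    free(*line);
    *line = buf;
    return 0;
}
```

Called once per quoted string, with start and end set to the positions of the opening and closing quotes, this gives the same kind of rewritten program line as the examples shown in the list.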

To start;
• we need to get a count of how many characters to transfer, that is, what is the varname length,
• in this statement:
      ii = (s_pos + len);
  variable ii contains the sum of s_pos and len,
• s_pos is the index of where the first quote is located,
• under the section marked: "save p_string to array", after the varname and the remainder of the program line have been transferred, the new program line is copied to array1[line_ndx].

There is the potential for a very real problem, one that could crash the program, and that is if the new program line is longer than the old program line. Example:

    array1[line_ndx] = ["do", "ra", "me"]
    p_string         = [Varx1, Varx2, Varx3]

As you can see, we can't just copy p_string into array1[], due to the size difference. What is needed in this case is to resize array1[]. That is exactly what the "if" expression does, in that event.

Okay, open file: Asmutils.c. Copy these (above) three functions to it. Additionally, we need to make a copy of function get_varname(), and put it in this file too. get_varname() is in file: Variable.c. Copy it to the bottom of Asmutils.c. We will need it later, in Do_print(). One last thing, in function Epilog(), add the line shown below so that it displays how many program lines have been compiled:

    void Epilog()           /* Write the Epilog */
    [snip]
        printf("Lines compiled: %d\n", nrows);
        printf("Done.\n");
    }
    /*---------- end Epilog ----------*/

Now save Asmutils.c and close it. Open file Asmfunct.c. Here is the new version of Do_print():

    void Do_print()
    {
        char ch, temp[VAR_NAME];
        int pi;

        pi = e_pos;
        pi = iswhiter(pi);
        ch = p_string[pi];
        e_pos = pi;
        printstring = 1;
        /* --- print newline --- */
        if(strchr(":\n", ch))
        {
            writeln(" mov dx, offset(NEWLINE)");
            writeln(" call PRINTSTR");
            return;
        }
        /* --- LOOP: multiple print statements --- */
        while(strchr("\n\0", ch) == 0)
        {
            strcpy(temp, get_varname());
            pi = e_pos;
            strcat(temp, ")");
            /* --- write assembly --- */
            write_str(" mov dx, offset(");
            writeln(temp);
            writeln(" call PRINTSTR");
            pi = iswhiter(pi);
            ch = p_string[pi];
            if(ch == ',')
            {
                writeln(" push dx");
                writeln(" mov dl, 9 ; code for tab");
                writeln(" call PRINTCHR");
                writeln(" pop dx");
                printchrctr = 1;
            }
            else if(strchr(":\n", ch))
            {
                writeln(" mov dx, offset(NEWLINE)");
                writeln(" call PRINTSTR");
            }
            /* --- is it end of statement --- */
            if(strchr("\n\0", ch) == 0)
            {
                pi++;
                pi = iswhiter(pi);
                ch = p_string[pi];
                e_pos = pi;
            }
        }
    }
    /*---------- end Do_print ----------*/

The first thing we have here, under the heading: "print newline", is a utility for handling the empty PRINT statement. As you can see, if the statement consists of a colon or a newline, then register DX is loaded with the offset for variable NEWLINE and PRINTSTR is called. Next is the loop structure that scans across the print statement extracting each variable name. Notice this portion at the top of the loop:

    while(strchr("\n\0", ch) == 0)
    {
        strcpy(temp, get_varname());

Here is why we needed to make a copy of function get_varname(). Rather than write a new function, when we already have one that does the job, we just copied it. After the varname is written to the source file, we begin looking for a delimiter, such as a comma, colon or newline. If the next character is a comma, the "generic print character" routine is written to the source file, with the ascii code for a Tab.

    writeln(" mov dl, 9 ; code for tab");

Also, the printchrctr flag is set:

    printchrctr = 1;

If the next character following the varname is a semi-colon, then no special action is taken. This will force the next item to be printed on the same line. The loop then continues back to the top, looking for more variables to write.

Copy function Do_print() to Asmfunct.c. Save and close it. Now compile Bxbasm.c. Using Bxbasm.exe, compile this version of Test.bas:

    ' test.bas version 12.4
    CLS
    PRINT
    PRINT "hello ";
    PRINT "world!": "test"; "ing", "test"; "ing",
    PRINT "123"
    PRINT "", "Assembly"; " Language": "", "", "is "; "pretty "; "cool!"
    ' ------------------------------------------
    END
    ' ------------------------------------------

Doesn't that last print line look interesting ? Take a look at Test.asm, see how that all translated. Now execute:

Test.com
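To check what Test.com displays against what the delimiter rules predict, the rules can be modelled directly in C. This is only a sketch under stated assumptions: render_print() and put_crlf() are hypothetical helpers that are not part of Bxbasm, the function works on the Basic source text rather than on generated assembly, and it assumes the statement holds only quoted strings and delimiters.

```c
#include <assert.h>
#include <string.h>

/* Append a carriage return + line feed, as the NEWLINE string does. */
static size_t put_crlf(char *out, size_t o)
{
    memcpy(out + o, "\r\n", 2);
    return o + 2;
}

/* Build in `out` the characters one PRINT statement would display.
   `stmt` is the text following the PRINT keyword; `out` must be big
   enough to hold the result. */
void render_print(const char *stmt, char *out)
{
    size_t o = 0;
    int ended_on_item = 1;  /* 1: nothing or a string came last */
    const char *p;

    assert(stmt != NULL && out != NULL);
    for(p = stmt; *p != '\0'; )
    {
        if(*p == '"')                   /* quoted string: copy it */
        {
            p++;
            while(*p != '\0' && *p != '"')
            {
                out[o++] = *p++;
            }
            if(*p == '"')
            {
                p++;
            }
            ended_on_item = 1;
        }
        else if(*p == ',')              /* comma: send a Tab */
        {
            out[o++] = '\t';
            ended_on_item = 0;
            p++;
        }
        else if(*p == ';')              /* semi-colon: no action */
        {
            ended_on_item = 0;
            p++;
        }
        else if(*p == ':')              /* colon: newline */
        {
            o = put_crlf(out, o);
            ended_on_item = 0;
            p++;
        }
        else
        {
            p++;                        /* skip blanks */
        }
    }
    if(ended_on_item)                   /* no trailing delimiter */
    {
        o = put_crlf(out, o);
    }
    out[o] = '\0';
}
```

For example, passing the text after PRINT from the third PRINT line of Test.bas yields "world!", a newline, "testing", a Tab, "testing" and a final Tab, with no newline at the end, so the "123" from the next statement lands on the same display line.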


CONCLUSION
Well this has been interesting, don't you think ?
• BEEP,
• CLS,
• END and
• PRINT,
in Assembly Language! We will be doing a lot more work in Assembly Language in the chapters to come.


CHAPTER - 13
INTRODUCTION
Welcome back. In the last chapter, working on our Assembly Language compiler, we got to the point where our vocabulary included; BEEP, CLS, END and PRINT. I'd like to resume where we left off and expand our language definition even further. We can begin by adding program redirection with the GOTO and GOSUB keywords. Let's get started.

GOTO-GOSUB-RETURN:
In years past, the Basic programming language has often been berated because of programmers' liberal use of the GOTO command. While Basic is by no means the only language with a GOTO command, early versions of Basic lent themselves to misuse of the GOTO. This may be because those early versions were somewhat limited in their ability to build structured programs. If you are one of those who hates Basic's GOTO command, then you're going to love Assembly language for all its "jump" instructions. A "jump" is just another word for "goto" and Assembly language is built on "jumps". In fact, the result of any conditional expression is a "jump". Assembly language has no IF-Then-Else construct, as such. In assembly language an IF construct would resemble something like this:

        CMP AL, BL          ; compare value-A to value-B
        JZ Step1            ; IF equal, THEN jump to Step1
        ...                 ; ELSE continue
        ...
        JMP Step2
        ...
    Step1:
        MOV AX, [SI]        ;do stuff
        ...

The way you view an IF/Then/Else construct in Basic is different from how you see it in Assembler. In Basic you would see it as:

    IF(this expression is True)
    THEN        (do this list of things)
                [list.....]
    or ELSE     (jump over the list and resume)

In Assembler you would see it as:

    IF(this expression is True)
    THEN        (do this list of things)
                [list.....]
    or ELSE     (jump to: LABEL)
    LABEL:
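The compare-and-jump shape of the CMP / JZ / JMP listing can also be imitated in C with goto, which may make the flow easier to follow for readers coming from a structured language. This is only an illustration; the function name equal_flag() is hypothetical, and the labels step1 and step2 stand in for Step1 and Step2.

```c
#include <assert.h>

/* Returns 1 when a equals b, otherwise 0, using only a test followed
   by jumps, in the manner of the assembly listing. */
int equal_flag(int a, int b)
{
    int result;

    if(a == b) goto step1;      /* CMP AL, BL  /  JZ Step1 */
    result = 0;                 /* ELSE: fall through and continue */
    goto step2;                 /* JMP Step2 */
step1:
    result = 1;                 /* THEN: do stuff */
step2:
    assert(result == (a == b)); /* agrees with the structured form */
    return result;
}
```

Notice that the "ELSE" part comes first in the code and must end with its own jump, otherwise execution would fall straight into the "THEN" part.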
Basic's GOSUB closely resembles an assembly language "Call" instruction, where the routine reached by a "Call" always ends with a "Return" instruction. Let's begin by enabling the GOTO, GOSUB, RETURN and Block-Label tokens in parser(). Open file Bxbasm.c. We need to remove the comments around case 5, case 9, case 10 and case -1, as I've done here:

    void parser()
    {
        int ab_code=4, x=line_ndx;

        switch(token)
        {
    /*      case 1:         /* LET */
    /*      case 2:         /* CLEAR */
    /*      case 3:         /* LOCATE */
            case 4:         /* PRINT */
                Do_print();
                break;
            case 5:         /* GOTO */
                Do_goto();
                break;
            case 6:         /* BEEP */
                Do_beep();
                break;
            case 7:         /* CLS */
                Do_cls();
                break;
            case 8:         /* END */
                Do_end();
                break;
            case 9:         /* GOSUB */
                Do_gosub();
                break;
            case 10:        /* RETURN */
                Do_return();
                break;
    /*      case 11:        /* FOR */
    /*      case 12:        /* NEXT */
    /*      case 13:        /* IF */
    /*      case 14:        /* ELSEIF */
    /*      case 15:        /* ELSE */
    /*      case 16:        /* ENDIF */
            case -1:        /* block label */
                Do_label();
                break;
            default:
                a_bort(ab_code, x);
                break;
        }
    }
    /*---------- end parser ----------*/

Make the above changes to parser() in Bxbasm.c. Save and close Bxbasm.c. Open file: Asmfunct.c. A GOTO command is an explicit jump to a specific label or location within the program. That translates very easily to assembly language. The only information needed is the destination label or memory location. Here is the code for Do_goto():

    void Do_goto()
    {
        char temp[LLEN];

        strcpy(temp, get_varname());
        /* --- write assembly --- */
        write_str(" jmp ");
        writeln(temp);
    }
    /*---------- end Do_goto ----------*/

As you can see, we use function get_varname() to retrieve the destination label from the program statement. This is one of the benefits of creating multiple use routines. We didn't have to create a new function from scratch for retrieving the label name. Function get_varname() works just as well. We are going to need a function for handling the output of labels, so here is the code for Do_label():

    void Do_label()
    {
        char temp[LLEN];

        strcpy(temp, label_nam[line_ndx]);
        PostLabel(temp);
    }
    /*---------- end Do_label ----------*/

As I indicated previously, the assembly language "Call" is very similar to Basic's "Gosub" command. Both commands "jump" to a subroutine or function marked by a label name and on encountering the "Return" command, the program returns to the step immediately following the "Call" or "Gosub". Here is the code for Do_gosub():

    void Do_gosub()
    {
        char temp[LLEN];

        strcpy(temp, get_varname());
        /* --- write assembly --- */
        write_str(" call ");
        writeln(temp);
    }
    /*---------- end Do_gosub ----------*/

As you can see, this function is almost identical to the GOTO function. Here is the code for function Do_return():

    void Do_return()
    {
        char temp[LLEN];        /* not used by this function */

        /* --- write assembly --- */
        writeln(" ret");
        writeln("; --------------------------------------------------");
    }
    /*---------- end Do_return ----------*/

Now copy the above new functions to file: Asmfunct.c. Save and close file: Asmfunct.c. Now add the prototypes for the new functions we just added, to file: Prototyp.h, as shown below, under the heading for Asmfunct.c:

    /* Asmfunct.c */
    void Do_cls(void);
    void Do_end(void);
    void Do_functions(void);
    void ClrScrn(void);
    void Do_beep(void);
    void Do_print(void);
    void PrintStr(void);
    void PrintChr(void);
    void Do_goto(void);
    void Do_label(void);
    void Do_gosub(void);
    void Do_return(void);

We need to borrow a routine from Bxbasic's: Ifendif.c; get_vtype(). This is another case of using a pre-built routine. Copy the code for function get_vtype(), below, to file: Asmutils.c.

    int get_vtype(int pi)
    {
        char ch;
        int type=0;

        ch = p_string[pi];
        while(isalnum(ch))
        {
            pi++;
            ch = p_string[pi];
        }
        if(ch == '$')
        {
            type = 3;               /* a string variable */
        }
        else if(strchr(" =<>%!#", ch))
        {
            type = 1;               /* a numeric variable */
        }
        return type;
    }
    /*------- end get_vtype --------*/

Okay, make sure you did everything correctly and save everything. Compile Bxbasm.c. Now, using Bxbasm.exe, compile this version of Test.bas:

    ' test.bas version 13.1
    CLS
    GOSUB Printit
    GOTO TheEnd
    ' ------------------------------------------
    Printit:
    PRINT "hello world!"
    RETURN
    ' ------------------------------------------
    TheEnd:
    END
    ' ------------------------------------------

Enter:

Bxbasm Test

Then execute Test.com. Before we continue, using your editor, examine the assembly code that was output for file: Test.asm.


LOCATE:
The next command we will add to Bxbasm is the LOCATE command. The "Locate" command is used for positioning the cursor to a specified Row and Column on the display screen. Since we are currently using the standard 25 by 80 console mode display, our screen grid ranges from Row 0, Column 0 to Row 24, Column 79. More simply put, from: 0,0 to 24,79. The first thing we need to do, is to enable the token for the LOCATE keyword, that is "case-3", in function parser(). In file: Bxbasm.c, in function parser(), remove the comments from "case-3" and add the function call to Do_locate(), as shown below:

    void parser()
    {
        int ab_code=4, x=line_ndx;

        switch(token)
        {
    /*      case 1:         /* LET */
    /*      case 2:         /* CLEAR */
            case 3:         /* LOCATE */
                Do_locate();
                break;
    [snip]
    /*---------- end parser ----------*/

Save file Bxbasm.c. Below is the code for function Do_locate(). You will notice that, except for the bottom part, it closely resembles function locate(), from Bxbasic's Output.c. Really, the only thing different here is that instead of acting on the command, we are instead writing to the .asm source file the instructions which will perform that function.

    void Do_locate()
    {
        char ch, rows[VAR_NAME], cols[VAR_NAME];
        int pi, ab_code=3, x=line_ndx;

        pi = e_pos;
        pi = iswhite(pi);
        ch = p_string[pi];
        e_pos = pi;
        /* --- get row --- */
        if(isalnum(ch))
        {
            strcpy(rows, get_varname());    /* digit-or-varname */
            pi = e_pos;
        }
        else                                /* failed to find alpha/num */
        {
            a_bort(ab_code, x);
        }
        pi = iswhite(pi);
        ch = p_string[pi];
        if(ch == ',')           /* comma separates row and column */
        {
            pi++;
            pi = iswhite(pi);
            ch = p_string[pi];
        }
        else
        {
            a_bort(ab_code, x);
        }
        e_pos = pi;
        /* --- get column --- */
        if(isalnum(ch))
        {
            strcpy(cols, get_varname());    /* digit-or-varname */
        }
        else
        {
            a_bort(ab_code, x);
        }
        /* --- now position cursor --- */
        /* --- write assembly --- */
        writeln(" mov ah, 2\t\t; code:set cursor");
        write_str(" mov dh, ");
        write_str(rows);
        writeln("\t\t; row");
        write_str(" mov dl, ");
        write_str(cols);
        writeln("\t\t; column");
        writeln(" INT 10H");
    }
    /*---------- end Do_locate ----------*/

Copy Do_locate() to file Asmfunct.c. Save and close file Asmfunct.c. Now we need to add a prototype for this function to Prototyp.h. Under the heading of Asmfunct.c, add the prototype as shown here:

    /* Asmfunct.c */
    [snip]
    void Do_locate(void);

Now recompile Bxbasm.c. And, with Bxbasm.exe, try out this new version of Test.bas:

    ' test.bas version 13.2
    CLS
    LOCATE 10, 20
    GOSUB Printit
    GOTO TheEnd
    ' ------------------------------------------
    Printit:
    PRINT "hello", "world!"
    RETURN
    ' ------------------------------------------
    TheEnd:
    END
    ' ------------------------------------------

With your editor, examine the assembly source code for Test.asm and make sure you understand what is taking place.

VARIABLE ASSIGNMENTS:
The next thing to do is to begin assigning variables. We will begin with character strings. First, you have to remember (or realize if you don't already), that Assembly Language is an extremely primitive form of programming. Unlike high level languages, Assembly does not have any built-in character string handling functions. I.E., you can't give assembler an expression as you would in Basic, such as:

ABC$ = "hello"
Assembler has no means of coping with that kind of statement. Not by itself at least. If we were to transcribe the above statement and output a line of source code that looked like this:

    ABC$    DB 'hello'

Then this would no longer be a variable, it would be a "constant", just like the constants generated when we translate a Print statement. In the above, the word "hello" exists inside a tiny space between the quotes. A string variable can not be boxed in like that. Unlike a single character variable or a numeric variable, a string changes its shape. It has to have the ability to grow or shrink in size, as the case may be, as the data and its size change. Then, how do we deal with this? What we are going to do during the scanning process, when we encounter an assignment statement such as the one above, is to output a statement that looks more like this:

    ABC$    DB 0
            DB 256 DUP(' ')

Okay, but, what does this output mean ?
• Well, the first thing we need to do is get and maintain the length of the character string. That count will be stored in the single byte, on the first line right after the variable name, which is initially assigned a value of zero.
• On the second line is an instruction to reserve a storage area of 256 bytes that will be used to contain the character string data.

You might be asking: 256 bytes ? Isn't that a lot for a string that says: "hello" ?

Well, yes. It is a lot. But, we can't be certain that it will always hold only five characters. Since Assembly language has no built-in functions for dealing with strings of data, we have to build our own, from scratch. String functions are very complex to construct and they take time. What we are going to do is take a "short-cut", an easy way out of a complicated situation. Since we don't yet have any mechanism for dynamically allocating memory resources for data storage, we are going to "impose" ourselves on the resources and just grab as much memory as we need to solve our problem. We are going to give our variable plenty of extra room, much more than it needs, but we are setting an upper limit of 256 characters.

There is an old saying: "...you can always throw money at a problem"
In our case: "...you can always throw memory at a problem"

Since we have not yet constructed a character string handling function and lack any memory allocation resources, we will begin at "square one" and build a primitive string handler. There are external resources at our disposal that we will learn to use and they will facilitate writing programs in Assembly Language. We have already begun to use some of them, such as when we print information to the screen by using the "INT 10" or "INT 21" interrupt function calls. These are part of the BIOS and DOS built-in functions. The BIOS and DOS are made up of dozens of these functions.

The word BIOS means "Basic Input Output System". That's everything the computer does to communicate with the outside world. The system BIOS has a very long list of functions that we can and will be using in writing our programs. DOS means Disk Operating System. The operating system (MS-DOS) has a huge library of functions that we will be using as well. If it were not for these two systems, the computer wouldn't function. It would just sit there.
Everything the computer does after you turn the power switch ON is a result of the BIOS and the DOS (yes, even Windows. Dos is now built directly into the Windows boot-strap loader). Both the BIOS and DOS system's functions are easily accessible in assembly programming.

Beyond the BIOS and DOS, there are what are called Standard Libraries of routines. They are pre-written utilities and functions, like dynamic memory allocation and string handling routines, that we will be discussing in the future. Programmers have been writing programs in Assembly language for many years and they have assembled large collections of functions and routines that they can "cut-n-paste" into a program to speed up the process of writing programs. Why reinvent the wheel every time you want to start writing a new program?

Getting Started:
When the Basic source code has a PRINT command, our compiler generates a variable name and stores the character string data, near the beginning of the program, in association with the variable name. Actually, it's a "constant" (or, more formally, a literal) and not a variable, because the data doesn't change. After the character string is assigned to a permanent memory location, the actual character string in the program is replaced with the name it has been assigned. So, when the Print command is executed, the Print function is passed the name associated with the data, which of course refers to its memory address. We will be using a method similar to this when dealing with assigning character strings to a string variable.

Starting with file Bxbasm.c, make the following changes to the global vars list, so that they include the three new variables shown here:

    /* ------ global vars ------------ */
    [snip]
    int clrscreen=0;        /* flag: CLRSCN */
    int var_count=0;        /* variables counter */
    int printstring=0;      /* flag: PRINT */
    int printchrctr=0;      /* flag: PRINT */
    char **VarIndx;         /* variables name index */
    int VarNdxCnt=0;        /* variables count */
    int stringcopy=0;       /* flag: string copy */
    /**/
    [snip]

Also, make the following change to the "function includes" list:

    /* --- function includes --- */
    #include "input.c"
    #include "utility.c"
    #include "asmutils.c"
    #include "asmfunct.c"
    #include "asmerror.c"
    #include "asmvars.c"

In function parser(), remove the comments from around "case-1", and add the call to Do_let(), as shown here:

    void parser()
    {
        int ab_code=4, x=line_ndx;

        switch(token)
        {
            case 1:         /* LET */
                Do_let();
                break;
    /*      case 2:         /* CLEAR */
            case 3:         /* LOCATE */
                Do_locate();
                break;
    [snip]

Make those changes and save Bxbasm.c.


Open file: Asmutils.c. We need to make some changes by adding a section that will scan for LET statements. Here is what the new code segment (in snippets) is going to look like:

    /* --- LET: statement --- */
    if(token == 1)
    {
        pi = 0;
        e_pos = pi;

To start with, we need to extract the Left-Hand (or destination) variable name from the program line:

        /* --- get varname --- */
        strcpy(varname, get_varname());
        strcat(varname, "$");

Notice the last line in the above. What we are going to do is include the "$" symbol in the variable name. That is going to help make the variable name a unique identifier. The next thing the scanner will do (below) is call a special function (check_VarName()) that is going to look up the variable's name, in a lookup table, to see if it has already been declared.

        if(check_VarName(varname) == 0)
        {
            write_str(varname);
            writeln("\tDB 0");
            writeln("\t\tDB 256 DUP(\' \')");
        }

Why ? Because you can only declare a variable (or identifier) once in a program. A variable name can be referenced numerous times throughout a program, but we don't want every occurrence of a variable's name to generate a new declaration of the same variable. So, if the variable was previously declared, in an earlier part of the program, its name would already exist in the lookup table and the set of instructions that follows would be skipped. Function check_VarName() will be explained later, but suffice it to say that it returns a Boolean value based on the results of the lookup table. If variable name "X" does not exist yet, the name will be written to the output file along with a byte declaration, followed by an un-named declaration of 256 bytes of blank space. Here is what the output, for a variable ABC$, would look like:

    ABC$    DB 0
            DB 256 DUP(' ')
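The layout above, one length byte followed by a fixed block of blanks, can be modelled in C to see how an assignment would use it. This is a sketch only: the struct bx_string and the functions bx_init() and bx_assign() are hypothetical names, not part of Bxbasm. One assumption is made explicit in the code: a single length byte can count at most 255 characters, so the sketch truncates longer text at 255 even though 256 bytes are reserved.

```c
#include <assert.h>
#include <string.h>

#define STR_CAPACITY 256            /* matches: DB 256 DUP(' ') */

struct bx_string
{
    unsigned char len;              /* matches: DB 0 (current length) */
    char data[STR_CAPACITY];        /* the reserved storage area */
};

/* Set the variable to its declared state: length 0, all blanks. */
void bx_init(struct bx_string *s)
{
    assert(s != NULL);
    s->len = 0;
    memset(s->data, ' ', STR_CAPACITY);
}

/* Copy `text` into the variable and record its length.
   Returns the number of characters actually stored. */
size_t bx_assign(struct bx_string *s, const char *text)
{
    size_t n;

    assert(s != NULL && text != NULL);
    n = strlen(text);
    if(n > 255)
    {
        n = 255;                    /* assumption: limit of one length byte */
    }
    memcpy(s->data, text, n);
    memset(s->data + n, ' ', STR_CAPACITY - n);    /* re-blank the rest */
    s->len = (unsigned char)n;
    return n;
}
```

After bx_assign(&v, "hello") the length byte holds 5 and the first five bytes of the storage area hold the text; the string can later grow or shrink without a new declaration.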

Next, we enter a "while" loop to process an entire program line up to the end-of-line. Just as when dealing with the PRINT statement, we will call GetNewVar() to handle the generation of a "constant identifier" and storing the string data:

        pi = e_pos;
        pi++;
        ch = ' ';
        while(ch != '\n')
        {
            pi = iswhiter(pi);
            ch = p_string[pi];
            if(ch == '\"')          /* found a quoted string */
            {
                e_pos = pi;
                GetNewVar();        /* get new varname */
                pi = e_pos;
            }
            else
            {
                pi++;               /* get next char */
            }
        }
    }

Here is the entire new code for function ScanVars():

    void ScanVars()         /* identify and write all variables */
    {
        char ch, varname[VAR_NAME];
        int pi, ii;

        line_ndx = 0;
        while(line_ndx < nrows)
        {
            get_token();
            /* --- LET: statement --- */
            if(token == 1)
            {
                pi = 0;
                e_pos = pi;
                /* --- get varname --- */
                strcpy(varname, get_varname());
                strcat(varname, "$");
                if(check_VarName(varname) == 0)
                {
                    write_str(varname);
                    writeln("\tDB 0");
                    writeln("\t\tDB 256 DUP(\' \')");
                }
                pi = e_pos;
                pi++;
                ch = ' ';
                while(ch != '\n')
                {
                    pi = iswhiter(pi);
                    ch = p_string[pi];
                    if(ch == '\"')      /* found a quoted string */
                    {
                        e_pos = pi;
                        GetNewVar();    /* get new varname */
                        pi = e_pos;
                    }
                    else
                    {
                        pi++;           /* get next char */
                    }
                }
            }
            /* --- PRINT: statement --- */
            else if(token == 4)
            {
                pi = 0;
                ch = ' ';
                while(ch != '\n')
                {
                    pi = iswhiter(pi);
                    ch = p_string[pi];
                    if(ch == '\"')      /* found a quoted string */
                    {
                        e_pos = pi;
                        GetNewVar();    /* get new varname */
                        pi = e_pos;
                    }
                    else
                    {
                        pi++;           /* get next char */
                    }
                }
            }
            line_ndx++;
        }
        writeln("NEWLINE DB 13,10,\'$\'");
        /* --- clear var name index --- */
        for(ii=0; ii < VarNdxCnt; ii++)
        {
            free(VarIndx[ii]);
        }
        free(VarIndx);
    }
    /*---------- end ScanVars ----------*/

Notice the additional code segment at the bottom of the above function. That routine erases the "lookup table" when it is no longer needed. The lookup itself is done by check_VarName(): if the variable's name does not yet exist, it is added to the lookup table and 0 (False) is returned to the calling function; if the name already exists, 1 (True) is returned. Here is the code for the check_VarName() function:

    int check_VarName(char *name)
    {
        char varname[VAR_NAME];
        int ndx=0, bool=0;

        strcpy(varname, name);
        if(VarNdxCnt == 0)              /* no vars exist yet */
        {
            InitVars();
            ndx = (VarNdxCnt-1);
            strcpy(VarIndx[ndx], varname);
        }
        else
        {
            while((ndx < VarNdxCnt)&&(strcmp(varname,VarIndx[ndx]) != 0))
            {
                ndx++;
            }
            if(ndx >= VarNdxCnt)        /* var name does not exist */
            {
                InitVars();
                ndx = (VarNdxCnt-1);
                strcpy(VarIndx[ndx], varname);
            }
            else
            {
                bool = 1;               /* var name exists */
            }
        }
        return bool;
    }
    /*---------- end check_VarName ----------*/

Function InitVars() is the actual routine that maintains the lookup table's size. The lookup table is in the form of an array and it is dynamically lengthened for each new variable. Doing it this way places no upper limit on the number of variable names:

    void InitVars()
    {
        int ndx;
        unsigned size;

        if(VarNdxCnt == 0)
        {
            ndx = VarNdxCnt;
            VarNdxCnt++;
            size = VarNdxCnt;
            VarIndx = malloc(size * sizeof(char *));
            size = VAR_NAME;
            VarIndx[ndx] = malloc(size * sizeof(char));
        }
        else
        {
            ndx = VarNdxCnt;
            VarNdxCnt++;
            size = VarNdxCnt;
            VarIndx = realloc(VarIndx, size * sizeof(char *));
            size = VAR_NAME;
            VarIndx[ndx] = malloc(size * sizeof(char));
        }
    }
    /*---------- end InitVars ----------*/
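For readers who want to experiment with the "check, then grow" idea outside of Bxbasm, here is a small self-contained sketch. It is not the code used in this project: the NameTable type and the functions table_check_or_add() and table_free() are hypothetical names invented for the illustration, and the sketch adds the out-of-memory checks that the listings above leave out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable table of names; not the Bxbasm globals. */
typedef struct {
    char **names;   /* array of heap-allocated name strings */
    int    count;   /* number of names stored */
} NameTable;

/* Returns 1 if name was already present, 0 if it was just added,
   and -1 if memory ran out. */
int table_check_or_add(NameTable *t, const char *name)
{
    char **grown;
    char  *copy;
    int    i;

    for (i = 0; i < t->count; i++) {
        if (strcmp(t->names[i], name) == 0)
            return 1;                       /* already declared */
    }
    /* realloc(NULL, n) behaves like malloc(n), so one path suffices */
    grown = realloc(t->names, (size_t)(t->count + 1) * sizeof *grown);
    if (grown == NULL)
        return -1;
    t->names = grown;

    copy = malloc(strlen(name) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, name);
    t->names[t->count++] = copy;
    return 0;
}

/* Release every name, then the table itself, and reset the count. */
void table_free(NameTable *t)
{
    int i;

    for (i = 0; i < t->count; i++)
        free(t->names[i]);
    free(t->names);
    t->names = NULL;
    t->count = 0;
}
```

Starting from an empty table (names set to NULL, count set to 0), the first call with "abc$" returns 0 and a second call with the same name returns 1, which is the same 0/1 convention that check_VarName() uses.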

Function GetNewVar() has been modified to write the correct output to the source file by including the string length:

    void GetNewVar()
    {
        char ch, NVar[12], a_string[BUFSIZE], count[4];
        int pi, x=0;

        pi = e_pos;
        s_pos = pi;
        /* --- get new varname --- */
        strcpy(NVar, NewVarname());
        write_str(NVar);
        write_str("\tDB ");
        pi++;
        ch = p_string[pi];
        while(ch != '\"')               /* copy up to quote */
        {
            a_string[x] = ch;
            x++;
            pi++;
            ch = p_string[pi];
        }
        a_string[x] = '\0';
        e_pos = pi;
        itoa(x, count, 10);             /* x: is the string length */
        writeln(count);
        write_str("\t\tDB \'");
        strcat(a_string, "\',\'$\'");
        writeln(a_string);              /* write declaration */
        /* --- save varname in array --- */
        StorVar(NVar);
        pi = e_pos;
        ch = p_string[pi];
        if(strchr(":;,", ch))
        {
            pi++;
            e_pos = pi;
        }
    }
    /*---------- end GetNewVar ----------*/
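To see the shape of the declaration that GetNewVar() produces without running the whole compiler, the formatting step can be sketched on its own. This is only an illustration: format_string_const() is a hypothetical function, and it uses the standard snprintf() rather than itoa(), which is not part of standard C.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: builds the two-line declaration for one
   string constant, e.g. name "VARX1" and text "test". Returns 0 on
   success, -1 if the text is longer than 255 bytes or the output
   buffer is too small. */
int format_string_const(char *out, size_t outsize,
                        const char *name, const char *text)
{
    size_t len = strlen(text);
    int n;

    if (len > 255)
        return -1;               /* the length must fit in one byte */
    n = snprintf(out, outsize, "%s\tDB %u\n\t\tDB '%s','$'\n",
                 name, (unsigned)len, text);
    if (n < 0 || (size_t)n >= outsize)
        return -1;               /* output was truncated */
    return 0;
}
```

The sketch does not handle text that itself contains a single quote, which would need special treatment in the assembler source.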

Copy the above functions to file: Asmutils.c. We need to create a new file to contain the variables functions we will be building. Create a new file and name it: Asmvars.c. Copy this header information to the top:

    /* bxbasm : AsmVars.c : alpha version */

    /* ----- includes ----- */
    #include "prototyp.h"

    /* variable types: int=no symbol, long=%, float=!, double=#, string=$ */

The first function we will add to it is: Do_let(). This function for the most part is a copy of the upper portion of function parse_let() from Bxbasic:

    void Do_let()
    {
        char ch, varname[VAR_NAME];
        int pi, stlen, ndx=0;
        int ab_code=11, x=line_ndx;

        stlen = strlen(p_string);
        pi = e_pos;
        /* --- retrieve variable name from statement --- */
        pi = get_alpha(pi, stlen);
        if(pi == stlen)         /* error: didn't find it */
        {
            a_bort(ab_code, x);
        }
        e_pos = pi;
        strcpy(varname, get_varname());
        pi = e_pos;
        ch = p_string[pi];      /* get the type character */
        /* --- we now have varname and type --- */
        if(ch == '$')
        {
            e_pos = pi;
            copy_string(varname);
        }
        else
        {
            a_bort(0, x);
        }
    }
    /*---------- end Do_let ----------*/

After retrieving the destination variable's name, Do_let() makes a call to function copy_string(). By this point in the program we have already completed the input scanning; we are now translating each program statement into Assembly language and writing the assembly code to the source file. Function copy_string() writes the instructions that will, in effect, copy the source-string to the destination-string:

    void copy_string(char *name)
    {
        char ch, varname[VAR_NAME];
        int pi, len;

        len = strlen(p_string);
        strcpy(varname, name);
        /* --- left varname --- */
        strcat(varname, "$)");
        write_str(" mov di, offset(");
        write_str(varname);
        writeln("\t; destination string");
        /* --- get right varname --- */
        pi = e_pos;
        pi = get_alpha(pi, len);
        e_pos = pi;
        strcpy(varname, get_varname());
        pi = e_pos;
        ch = p_string[pi];
        if(ch == '$')
        {
            strcat(varname, "$");
        }
        write_str(" mov si, offset(");
        write_str(varname);
        write_str(")");
        writeln("\t; source string");
        writeln(" call COPYSTR");
        stringcopy = 1;
    }
    /*---------- end copy_string ----------*/

The source file output from this function will look like this:

    mov di, offset(destination$)
    mov si, offset(source$)
    call COPYSTR

It also sets the flag: "stringcopy", so that the data transfer procedure will be written to the source file. Copy the above two functions to file Asmvars.c. Next we need to make some changes to file: Asmfunct.c. Start by making the changes to function Do_functions() as shown here:

    void Do_functions()
    {
        if(clrscreen == 1)
        {
            ClrScrn();
        }
        if(printstring == 1)
        {
            PrintStr();
        }
        if(printchrctr == 1)
        {
            PrintChr();
        }
        if(stringcopy == 1)
        {
            StringCopy();
        }
    }
    /*---------- end Do_functions ----------*/

Here is function StringCopy(). This writes the string copy procedures to the assembly source file. Read the comments at the right of the assembly code. Note that register "di" will reference the destination string's address and register "si" will reference the source string's address:

    void StringCopy()
    {
        writeln("COPYSTR PROC NEAR");
        writeln(" push ax");
        writeln(" push bx");
        writeln(" push cx");
        writeln(" xor cx, cx");                 /* clear cx */
        writeln(" mov cl, byte ptr [si]");      /* get string length */
        writeln(" mov [di], cl");               /* copy to destination */
        writeln(" inc di");                     /* move to first character space */
        writeln(" inc si");                     /* move to first character */
        writeln("CS1:");
        writeln(" mov al, [si]");               /* copy character to 'al' */
        writeln(" mov byte ptr [di], al");      /* copy to destination */
        writeln(" inc si");                     /* next */
        writeln(" inc di");
        writeln(" loop CS1");                   /* decrement cx, next loop */
        writeln("CS2:");
        writeln(" mov byte ptr [di], '$'");     /* add terminator */
        writeln(" pop cx");
        writeln(" pop bx");
        writeln(" pop ax");
        writeln(" ret");
        writeln("COPYSTR ENDP");
        writeln("; --------------------------------------------------");
    }
    /*---------- end StringCopy ----------*/
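If the assembly is hard to follow, it may help to see the same data movement modeled in C. The sketch below is an illustration only and is not generated by Bxbasm: the BxString type and the function bx_copy() are hypothetical names, with the length byte and the 256-byte data area laid out the way the declarations reserve them.

```c
#include <string.h>

#define STR_SPACE 256   /* data bytes per string, as in "DB 256 DUP(' ')" */

/* Hypothetical C model of one string variable: a length byte
   followed by the data area. */
typedef struct {
    unsigned char len;
    char          data[STR_SPACE];
} BxString;

/* Models COPYSTR: copy the length byte, then the characters, then
   append the '$' terminator. A length of at most 255 always leaves
   room for the terminator in the 256-byte data area. */
void bx_copy(BxString *dst, const BxString *src)
{
    dst->len = src->len;                        /* mov [di], cl */
    memcpy(dst->data, src->data, src->len);     /* the CS1 loop */
    dst->data[src->len] = '$';                  /* add terminator */
}
```

One difference worth knowing about: memcpy() with a length of zero copies nothing, whereas the "loop" instruction decrements CX before testing it, so the assembly version appears to rely on the source string never being empty.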


Make note of these two lines:

    writeln(" mov cl, byte ptr [si]");      /* get string length */
    ...
    writeln(" loop CS1");                   /* decrement cx, next loop */

The second line uses the "loop" instruction. Before a "loop", the CX register is loaded with the number of times the loop has to be cycled through; each pass decrements CX and jumps back to the label until CX reaches zero. In this case, we load CL with the length of the source string, contained in the address referenced by [si]. (Because CX is decremented before it is tested, a length of zero would not skip the loop, so this routine assumes the source string is not empty.) In these two lines:

    writeln(" inc di");     /* move to first character space */
    writeln(" inc si");     /* move to first character */

both the source and the destination address pointers have to be incremented so that they are each pointing at the first data cell. Here is an illustration of how this looks in memory:

    si--->[5][h|e|l|l|o]
    length-^  ^---1st character space
    di--->[0][ | | | | |...]
    length-^  ^---1st character space

As you can see, at the outset, they both point to the string length cell. In this line:

    writeln(" mov byte ptr [di], '$'");     /* add terminator */

after the string data has been transferred, the string will be terminated by a "$" symbol. Function Do_print() has to be modified to work with the new data format and be able to print a string variable:

    void Do_print()
    {
        char ch, temp[VAR_NAME];
        int pi;

        pi = e_pos;
        pi = iswhiter(pi);
        ch = p_string[pi];
        e_pos = pi;
        printstring = 1;
        /* --- print newline --- */
        if(strchr(":\n", ch))
        {
            writeln(" mov dx, offset(NEWLINE)");
            writeln(" call PRINTSTR");
            return;
        }
        /* --- LOOP: multiple print statements --- */
        while(strchr("\n\0", ch) == 0)
        {
            strcpy(temp, get_varname());
            pi = e_pos;
            strcat(temp, ")");
            /* --- write assembly --- */
            write_str(" mov dx, offset(");
            writeln(temp);
            writeln(" inc dx");
            writeln(" call PRINTSTR");
            pi = iswhiter(pi);
            ch = p_string[pi];
            if(ch == ',')
            {
                writeln(" push dx");
                writeln(" mov dl, 9\t\t; code:for tab");
                writeln(" call PRINTCHR");
                writeln(" pop dx");
                printchrctr = 1;
            }
            else if(strchr(":\n", ch))
            {
                writeln(" mov dx, offset(NEWLINE)");
                writeln(" call PRINTSTR");
            }
            /* --- is it end of statement --- */
            if(strchr("\n\0", ch) == 0)
            {
                pi++;
                pi = iswhiter(pi);
                ch = p_string[pi];
                e_pos = pi;
            }
        }
    }
    /*---------- end Do_print ----------*/

Copy the above functions to file: Asmfunct.c. Save and close it. In file: Prototyp.h, update the function prototypes so that they include all of the following:

    [snip]
    /* Asmutils.c */
    void Header(void);
    void open_destin(void);
    void Prolog(void);
    void Epilog(void);
    void writeln(char *);
    void write_str(char *);
    void PostLabel(char *);
    void assemble(void);
    void ScanVars(void);
    char *NewVarname(void);
    void StorVar(char *);
    void GetNewVar(void);
    /* char *get_varname(void);     Note only */
    /* int get_vtype(int);          Note only */
    int check_VarName(char *);
    void InitVars(void);
    /* Asmfunct.c */
    void Do_cls(void);
    void Do_end(void);
    void Do_functions(void);
    void ClrScrn(void);
    void Do_beep(void);
    void Do_print(void);
    void PrintStr(void);
    void PrintChr(void);
    void Do_goto(void);
    void Do_label(void);
    void Do_gosub(void);
    void Do_return(void);
    void Do_locate(void);
    void StringCopy(void);
    /* Asmvars.c */
    void Do_let(void);
    void copy_string(char *);
    [snip]

Save everything and compile Bxbasm.c. Then, with Bxbasm.exe, compile this version of Test.bas:

    ' test.bas version 13.3
    abc$ = "test"
    CLS
    LOCATE 10, 20
    GOSUB Printit
    GOTO TheEnd
    ' ------------------------------------------
    Printit:
    PRINT "hello", "world!"
    RETURN
    ' ------------------------------------------
    TheEnd:
    END
    ' ------------------------------------------

With your editor, take a look at the assembly source code generated in Test.asm. Now compile this Test.bas:

    ' test.bas version 13.4
    abc$ = "test"
    abc$ = "testing"
    CLS
    LOCATE 10, 20
    GOSUB Printit
    GOTO TheEnd
    ' ------------------------------------------
    Printit:
    PRINT "hello", "world!"
    RETURN
    ' ------------------------------------------
    TheEnd:
    END
    ' ------------------------------------------

Below is the assembly source code generated. Notice how abc$ exists only once in the variables declarations. Look at the program code, after "Start". See how both assignments to variable abc$ are written:

    ;
    ; *************BxbAsm Compiler*************
            jmp START
    ;
    abc$    DB 0
            DB 256 DUP(' ')
    VARX1   DB 4
            DB 'test','$'
    VARX2   DB 7
            DB 'testing','$'
    VARX3   DB 5
            DB 'hello','$'
    VARX4   DB 6
            DB 'world!','$'
    NEWLINE DB 13,10,'$'
    ; --------------------------------------------------
    START PROC NEAR
    ; --------------------------------------------------
            mov di, offset(abc$)    ; destination string
            mov si, offset(VARX1)   ; source string
            call COPYSTR
            mov di, offset(abc$)    ; destination string
            mov si, offset(VARX2)   ; source string
            call COPYSTR
    [snip]

UTILS-INPUT:
We need to clean up some of the clutter that's starting to build up in Asmutils.c. It has a few genuine utilities that we may use in the future, but we are rapidly filling it up with the input scanning routines. What we need to do is create a separate "input" file for storing the input, scanning and prologue routines. To begin with, open file: Bxbasm.c. Change the "function includes" section to read as follows:

    /* --- function includes --- */
    #include "input.c"
    #include "utility.c"
    #include "asminput.c"
    #include "asmutils.c"
    #include "asmfunct.c"
    #include "asmerror.c"
    #include "asmvars.c"
    [snip]

Next, create a new file and name it: "Asminput.c". Then copy this header information to the top of the file:

    /* bxbasm : Asminput.c : alpha version */

    /* ----- function prototypes ----- */
    #include "prototyp.h"

From Asmutils.c, copy these functions to Asminput.c:

    Header()
    open_destin()
    Prolog()
    ScanVars()
    check_VarName()
    InitVars()
    GetNewVar()
    StorVar()
    NewVarname()
    Epilog()
    assemble()

Save Asminput.c. Delete those same functions from Asmutils.c, so that only these functions remain in Asmutils.c:

    writeln()
    write_str()
    PostLabel()
    get_varname()
    get_vtype()

Save Asmutils.c. In file: Prototyp.h, change the prototypes for those files so that they read as follows:

    /* Asmutils.c */
    void writeln(char *);
    void write_str(char *);
    void PostLabel(char *);
    /* char *get_varname(void);     Note only */
    /* int get_vtype(int);          Note only */
    /* Asminput.c */
    void Header(void);
    void open_destin(void);
    void Prolog(void);
    void ScanVars(void);
    int check_VarName(char *);
    void InitVars(void);
    void GetNewVar(void);
    void StorVar(char *);
    char *NewVarname(void);
    void Epilog(void);
    void assemble(void);
    [snip]

Save file Prototyp.h. Re-compile Bxbasm.c and make sure nothing broke in the process. Then recompile Test.bas and make sure the source output still looks the same and make sure Test.com still works.

DYNAMIC STRINGS:
What we've just completed, in learning how to declare variable space and how to store variables, was instructive, but, it's not very practical. For one thing, our program's size would grow by 256 bytes for each string variable. Now I don't mean how much memory it uses, I mean the actual program's file size. Why ? Because, every time we declare a variable, the assembler reserves 256 bytes of data space at the front end of our program, before the executable code begins.

Example:

    jmp Start
    abc$ [0][...256 bytes...]
    xyz$ [0][...256 bytes...]
    Var1 [5][hello]
    Var2 [5][world]
    ...
    ...
    Start:              begin program code
    mov ax, [blah]
    xor cx, cx

... or graphically:

    [jmp][..............][mov ax,blah][xor cx,cx]......
         Variables       code begins              end of program

In this example, a little over 512 bytes (2 x 256 data bytes, plus the two length bytes) were added to the program size by just those two variables, abc$ and xyz$. In a program that had only 10 string variables, a full 2.5KB would be added to the program size. How do we get around this obstacle ? Well, there are some good StdLib (Standard Library) routines for dealing with this, but we are not there yet. So we have to come up with our own solution. We need something that (while it doesn't have to be perfect) will at least work in a similar fashion, as far as declaring and assigning variables, and will reduce the program's size. Allow me to illustrate:

    [jmp][.][mov ax,blah][xor cx,cx][...]              [..............]
         declarations    code begins    end of program variables space

What this example attempts to illustrate is, that, we can move the string data area to the back of the program. In fact, beyond the memory region used by the program code. Rather than having the assembler allocate storage space for strings (256 bytes each) at assemble-time, the memory allocation is performed at runtime. Remember, we still have no dynamic memory allocation routines.

"Hmmm,... that's a neat trick, if you can do it !"
you might be thinking. Yes, it is! But it's not that hard to do, not really.

I digress:
It is a common practice for memory allocation routines to employ what is called the: "ZSEG" (or similar name beginning with the letter "Z"), which is a "dummy-segment". We have not had to deal with memory segments up to this point, because, in the ".COM" format programs, all of the Code, Stack and Data reside in one segment of memory, the "Code Segment".

In ".EXE" format programs, each portion resides in its own memory region or "segment". Thus you have a Code Segment, a Data Segment, a Stack Segment and possibly even other segments of memory. The "ZSEG" is a Segment Name that is written to the very END of the program. In other words, after the last piece of assembly code has been written to the source file, the following segment name is tacked on to the end:

    [program code]
    ...
    ; end of program
    ;---------------
    ZSEG SEGMENT
    ZSEG ENDS

The segment name is declared and then terminated, with no code inserted. The assembler doesn't realize that there is no code inside ZSEG, so it creates a pointer to the segment anyway. Why this "sleight of hand" trickery ? First,

• neither the BIOS nor DOS has a really good memory management tool,
• for that matter they don't have a memory management tool, (well,....DOS kind of has something, more on that later),
• after the program loads into memory, all the rest of unused memory is available for whatever you wish to do with it,
• the problem is, that DOS (and the BIOS) don't tell you how much memory you have available, nor where it is located,
• presumably, the available memory begins at the point where the program code ends in memory.

Second, there is a practical reason for naming the segment: "ZSEG";

• many assemblers will position segments in memory in alphabetical order,
• some assemblers require that the segment names be declared at the front of the source code, just like constants and variables,
• if you should use an assembler that had this requirement, and you declared the segment as "MY_SEG", you would have no guarantee that it would be positioned where you want it, at the end of the program,
• by naming it "ZSEG", you stand a good chance that it will be located at the end, right where you want it.

What we are aiming at is a means of identifying where the program ends and where available memory begins. We need something that we can use as a reference. For us, there is a problem with using ZSEG. It has to do with the version of the A86 assembler we are using. It does not allow any segments to be declared and referenced other than the Code Segment, where our program resides. Therefore, trying to declare ZSEG as a "dummy segment", either at the front or the back of the program, won't work for us. There is a solution, however. All we have to do (and this works as far as I can tell) is to place a label at the end of the program, like so:

    ...
    ; end of program
    ;---------------
    ZSEG:

At least A86 seems to accept this trailing label without complaining.


With the "ZSEG:" label pointing to the first byte past the end of the program, we can now successfully reference available memory that exists beyond the program. You might be thinking:

"Okay, so how do we assign variables to this region of memory ?"
Well, we are going to use some of what we already have, with a few changes. This is what our current memory model looks like:

    abc$ [0][...256 bytes...]
    xyz$ [0][...256 bytes...]

One byte is declared for the string length, with 256 bytes for the data. Here is what we are going to change that to:

    abc$ [0][word:address]
    xyz$ [0][word:address]

The string length byte will stay, but it will be followed by a Word (two bytes) that will contain the memory offset (address) of the data area in the higher region of memory. The input scanning routine, ScanVars(), will write this to the source file for each string variable:

    abc$    DB 0        ; string length
            DW 0        ; address offset
    xyz$    DB 0
            DW 0

Since address offsets are two bytes long, each variable declaration will only add three bytes to the program size. How do we implement and keep track of offset addressing ? We are going to use two new system variables, EndProg and NextFree. When the program first starts, we are going to retrieve the offset of label "ZSEG:" and store that address in variable EndProg. NextFree will always point to the next available offset address, which will be 256 bytes farther down the line from the variable before it. So, at the start of the program, NextFree will have the same offset as EndProg. That means we will plant the offset (address of ZSEG:) in NextFree, too.

Example:

    EndProg     DW 0
    NextFree    DW 0
    ;-------------------------
    Start:
            mov ax, ZSEG
            mov EndProg, ax
            mov NextFree, ax
            blah...
            blah...
    ; end of program
    ;-------------------
    ZSEG:

When the program encounters a string variable assignment, the variable is assigned the offset address held in NextFree. NextFree will then be incremented by 256 bytes, ready for the next variable. Allocating 256 bytes of memory space for each variable is far from efficient, but we aren't doing anything else with that memory anyway, and we haven't yet constructed a tool that will dynamically shrink and expand string data space. Ideally, that is the direction we are heading in. Open file: Asminput.c. The first code modification is to function Prolog():

    void Prolog()               /* Write the Prolog */
    {
        ScanVars();             /* pre-scan for quoted strings */
        writeln("; --------------------------------------------------");
        writeln("START PROC NEAR");
        writeln("; --------------------------------------------------");
        writeln(" mov ax, ZSEG ; last byte of program");
        writeln(" div SIXTN");
        writeln(" mov ah, 0");
        writeln(" inc al");
        writeln(" mul SIXTN");
        writeln(" mov EndProg, ax ; store value");
        writeln(" mov NextFree, ax");
        writeln("; --------------------------------------------------");
    }
    /*---------- end Prolog ----------*/

You may ask;

"Okay, you explained zseg, EndProg and NextFree, but what do these lines mean ?"

    div SIXTN
    mov ah, 0
    inc al
    mul SIXTN

Well, (without "spilling the beans" too much yet) in real mode, memory is addressed in little Blocks consisting of sixteen bytes each (such a block is usually called a "paragraph"). A "Segment" address is simply the number of the 16-byte Block at which a segment begins. The x86 cpu has, among others, two types of registers that can be described as: "Segment Registers" and "Offset Registers". A Segment Register points to one particular 16-byte Block, (beginning with 00H). The Offset Register will point to an address or "Offset" that is relative to the Segment in memory. For instance, in this instruction:

    mov es, 10      ; decimal segment

the Segment Register named ES is assigned the value of 10. (This is shorthand: the x86 cannot load a segment register with an immediate value, so in real code the value would pass through a general register first.) ES is then pointing to the first byte of 16-byte block number 10 in memory. That translates to address 160 (10 x 16) in memory. In this instruction:

    mov si, 10      ; decimal offset

the Offset Register named SI is pointed to address 10, relative to the Segment Register. In other words, paired with ES above, SI is pointing to address 170 (160 + 10) in memory. The code fragment shown:

    • div SIXTN     ; divides AX (offset of zseg) by sixteen,
    • mov ah, 0     ; discards the remainder (rounds the number down),
    • inc al        ; adds one to it,
    • mul SIXTN     ; then multiplies it by sixteen, again.
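Both pieces of arithmetic can be checked with a few lines of C. This is just a sketch for experimenting: linear_address() and next_boundary() are hypothetical names, and next_boundary() imitates the 8-bit "div" and "mul" used above, including the fact that the quotient has to fit in the 8-bit AL register.

```c
#include <assert.h>

/* Linear (physical) address in real mode: segment * 16 + offset. */
unsigned long linear_address(unsigned segment, unsigned offset)
{
    return (unsigned long)segment * 16UL + offset;
}

/* Models the four instructions with their 8-bit operands:
     div SIXTN   -> AL = AX / 16, AH = AX % 16
     mov ah, 0   -> discard the remainder
     inc al      -> step to the next 16-byte block
     mul SIXTN   -> AX = AL * 16
   The 8-bit divide only works while the quotient fits in AL. */
unsigned next_boundary(unsigned zseg)
{
    unsigned al = zseg / 16;

    assert(al < 255);        /* quotient (and the inc) must fit in 8 bits */
    al = al + 1;
    return al * 16;
}
```

With these, linear_address(10, 10) gives 170, and next_boundary(0x020B) gives 0x0210, the same pair of values that shows up in the D86 exercise later on. Note that an offset that is already an exact multiple of sixteen is still moved up by sixteen, and that the assert reflects a limit of the 8-bit divide: as far as I can tell, the sequence only works while the offset of ZSEG: stays below 0FF0H.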

What does this do? It determines the address of the first free 16-byte boundary above the program. Doing that allows us to begin using memory on a fresh boundary, rather than at the very first byte after the program code. Whew! Here is the new ScanVars():

    void ScanVars()             /* identify and write all variables */
    {
        char ch, varname[VAR_NAME];
        int pi, ii;

        line_ndx = 0;
        while(line_ndx < nrows)
        {
            get_token();
            /* --- LET: statement --- */
            if(token == 1)
            {
                ScanLet();
            }
            /* --- PRINT: statement --- */
            else if(token == 4)
            {
                ScanPrint();
            }
            line_ndx++;
        }
        SystemConst();          /* write system constants */
        /* --- clear var name index --- */
        for(ii=0; ii < VarNdxCnt; ii++)
        {
            free(VarIndx[ii]);
        }
        free(VarIndx);
    }
    /*---------- end ScanVars ----------*/

As you can see, this function has been cleaned up a bit. The LET and PRINT routines have been moved out to their own functions. We will be adding additional things that need to be scanned to ScanVars(), so we will need to keep things organized. Function SystemConst() will write system constants and variables to the source file:

    void SystemConst()
    {
        writeln("; --------------------------------------------------");
        writeln("; Do Not Delete !!!: Constants and system variables");
        writeln("; --------------------------------------------------");
        writeln("NEWLINE DB 13,10,\'$\'");
        writeln("C10 DW 10 ; constant for division");
        writeln("C10000 DW 10000 ; constant for division");
        writeln("SIXTN DB 16 ; segment multiplier");
        writeln("EndProg DW 0 ; ZSEG: last segment");
        writeln("NextFree DW 0");
        writeln("; --------------------------------------------------");
    }
    /*---------- end SystemConst ----------*/

Function ScanLet() will scan the assignment statement and will write the variable declaration to the source file. It will also handle the processing of the quoted string:

    void ScanLet()
    {
        char ch, varname[VAR_NAME];
        int pi;

        pi = 0;
        e_pos = pi;
        /* --- get varname --- */
        strcpy(varname, get_varname());
        strcat(varname, "$");
        if(check_VarName(varname) == 0)
        {
            write_str(varname);
            writeln("\tDB 0");
            writeln("\t\tDW 0");
        }
        pi = e_pos;
        pi++;
        ch = ' ';
        while(ch != '\n')
        {
            pi = iswhiter(pi);
            ch = p_string[pi];
            if(ch == '\"')          /* found a quoted string */
            {
                e_pos = pi;
                GetNewVar();        /* get new varname */
                pi = e_pos;
            }
            else
            {
                pi++;               /* get next char */
            }
        }
    }
    /*---------- end ScanLet ----------*/

This is the PRINT scanning handler which was previously part of ScanVars():

    void ScanPrint()
    {
        char ch;
        int pi;

        pi = 0;
        ch = ' ';
        while(ch != '\n')
        {
            pi = iswhiter(pi);
            ch = p_string[pi];
            if(ch == '\"')          /* found a quoted string */
            {
                e_pos = pi;
                GetNewVar();        /* get new varname */
                pi = e_pos;
            }
            else
            {
                pi++;               /* get next char */
            }
        }
    }
    /*---------- end ScanPrint ----------*/

Epilog() has been modified to write the label "ZSEG:" at the end of the source file:

    void Epilog()               /* Write the Epilog */
    {
        writeln(";");
        PostLabel("DONE");
        writeln(" INT 20H");
        writeln("START\tENDP");
        writeln("; --------------------------------------------------");
        /* --- write functions --- */
        Do_functions();
        /* --- mark end of program --- */
        writeln("ZSEG:");
        writeln("; --------------------------------------------------");
        fclose(f_out);
        printf("Lines compiled: %d\n", nrows);
        printf("Done.\n");
    }
    /*---------- end Epilog ----------*/

Copy these added functions and changes to file: Asminput.c and save it. The next group of changes will be to file: Asmfunct.c. Below is the new code for function StringCopy(). Take a good look at this function and make sure you understand what the sequence of events is. In particular, the section labeled "MakeNew":

    void StringCopy()
    {
        writeln("COPYSTR PROC NEAR");
        writeln(" push ax");
        writeln(" push bx");
        writeln(" push cx");
        writeln(" mov al, byte ptr [di] ; get destin's count");
        writeln(" cmp al, 0 ; is the count 0");
        writeln(" jz MakeNew ; create new variable space");
        writeln(" jmp CSMake ; else");
        writeln("MakeNew:");
        writeln(" push di ; save Destin's address");
        writeln(" mov ax, NextFree ; get next address");
        writeln(" add NextFree, 256 ; increment to next");
        writeln(" inc di ; Word pointer to offset");
        writeln(" mov [di], ax ; set address");
        writeln(" pop di ; get original offset");
        writeln("CSMake:");
        writeln(" xor cx, cx ; clear cx");
        writeln(" mov cl, byte ptr [si] ; copy source count");
        writeln(" mov [di], cl ; copy string count");
        writeln(" inc di ; get offset pointer");
        writeln(" mov ax, [di] ; copy offset");
        writeln(" mov di, ax ; point di to high offset");
        writeln(" inc si");
        writeln("CS1:");
        writeln(" mov al, [si]");               /* copy character to 'al' */
        writeln(" mov byte ptr [di], al");      /* copy to destination */
        writeln(" inc si");                     /* next */
        writeln(" inc di");
        writeln(" loop CS1");                   /* decrement cx, next loop */
        writeln("CS2:");
        writeln(" mov byte ptr [di], '$'");     /* add terminator */
        writeln(" pop cx");
        writeln(" pop bx");
        writeln(" pop ax");
        writeln(" ret");
        writeln("COPYSTR ENDP");
        writeln("; --------------------------------------------------");
    }
    /*---------- end StringCopy ----------*/
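The "MakeNew" logic is easier to reason about when it is modeled in C. The sketch below is only an illustration under stated assumptions: the pool array stands in for the memory past ZSEG:, next_free plays the role of NextFree, and StrDesc, desc_assign() and desc_data() are hypothetical names. Unlike the assembly, the model also checks for running out of space.

```c
#include <string.h>

#define POOL_SIZE 1024   /* hypothetical stand-in for memory past ZSEG: */
#define SLOT_SIZE 256    /* bytes handed to each string variable */

static char     pool[POOL_SIZE];
static unsigned next_free = 0;      /* plays the role of NextFree */

/* Three-byte descriptor: "DB 0" length plus "DW 0" offset. */
typedef struct {
    unsigned char  len;
    unsigned short offset;   /* where the data lives inside pool */
} StrDesc;

/* Models the revised COPYSTR. A zero length marks a variable with no
   storage yet, so a slot is carved off first (the MakeNew branch).
   Returns 0 on success, -1 if the pool is exhausted. */
int desc_assign(StrDesc *dst, const char *text, unsigned char len)
{
    if (dst->len == 0) {                       /* MakeNew */
        if (next_free + SLOT_SIZE > POOL_SIZE)
            return -1;
        dst->offset = (unsigned short)next_free;
        next_free += SLOT_SIZE;
    }
    dst->len = len;                            /* CSMake */
    memcpy(pool + dst->offset, text, len);
    pool[dst->offset + len] = '$';             /* terminator */
    return 0;
}

/* Returns a pointer to the variable's '$'-terminated data. */
const char *desc_data(const StrDesc *d)
{
    return pool + d->offset;
}
```

Assigning "test" and then "testing" to the same descriptor reuses the first slot, while a second variable receives the slot 256 bytes farther along. The model also makes one property of the scheme visible: because a length of zero is what marks a variable as unallocated, a variable that currently holds an empty string would be given a fresh slot on its next assignment.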

The changes we are making, in how we access string variables, also affect the print routines. Here is the revised Do_print():

    void Do_print()
    {
        char ch, temp[VAR_NAME];
        int pi;

        pi = e_pos;
        pi = iswhiter(pi);
        ch = p_string[pi];
        e_pos = pi;
        printstring = 1;
        /* --- print newline --- */
        if(strchr(":\n", ch))
        {
            writeln(" mov dx, offset(NEWLINE)");
            writeln(" call PRINTSTR");
            return;
        }
        /* --- LOOP: multiple print statements --- */
        while(strchr("\n\0", ch) == 0)
        {
            strcpy(temp, get_varname());
            pi = e_pos;
            ch = p_string[pi];
            /* --- write string variable --- */
            if(ch == '$')
            {
                strcat(temp, "$)");
                write_str(" mov bx, offset(");
                writeln(temp);
                writeln(" inc bx");
                writeln(" mov dx, [bx]");
                writeln(" call PRINTSTR");
                pi++;
            }
            /* --- write string constant --- */
            else
            {
                strcat(temp, ")");
                write_str(" mov dx, offset(");
                writeln(temp);
                writeln(" inc dx");
                writeln(" call PRINTSTR");
            }
            pi = iswhiter(pi);
            ch = p_string[pi];
            if(ch == ',')
            {
                writeln(" push dx");
                writeln(" mov dl, 9\t\t; code:for tab");
                writeln(" call PRINTCHR");
                writeln(" pop dx");
                printchrctr = 1;
            }
            else if(strchr(":\n", ch))
            {
                writeln(" mov dx, offset(NEWLINE)");
                writeln(" call PRINTSTR");
            }
            /* --- is it end of statement --- */
            if(strchr("\n\0", ch) == 0)
            {
                pi++;
                pi = iswhiter(pi);
                ch = p_string[pi];
                e_pos = pi;
            }
        }
    }
    /*---------- end Do_print ----------*/

Copy these two functions to Asmfunct.c and save it. Next we need to update the file: Prototyp.h. Make the additions to the list under Asminput.c, as shown here:

    /* Asminput.c */
    void Header(void);
    void open_destin(void);
    void Prolog(void);
    void ScanVars(void);
    int check_VarName(char *);
    void InitVars(void);
    void GetNewVar(void);
    void StorVar(char *);
    char *NewVarname(void);
    void Epilog(void);
    void assemble(void);
    void ScanLet(void);
    void ScanPrint(void);
    void SystemConst(void);

Save Prototyp.h and close it. Compile Bxbasm.c. Try Bxbasm.exe out on this Test.bas:

    ' test.bas version 13.5
    CLS
    abc$ = "test"
    PRINT abc$
    abc$ = "testing"
    xyz$ = "123"
    PRINT abc$, xyz$
    GOSUB Printit
    GOTO TheEnd
    ' ------------------------------------------
    Printit:
    PRINT "hello", "world!"
    RETURN
    ' ------------------------------------------
    TheEnd:
    END
    ' ------------------------------------------

It's not obvious by the display, but, the variables abc$ and xyz$ are being accessed and printed from high memory. Pretty neat, huh ? With your editor, take a look at the assembly source code in Test.asm. Make sure you look at the very end of the source code.

Here are some snippets:

    [snip]
    abc$    DB 0
            DW 0
    VARX1   DB 4
            DB 'test','$'
    VARX2   DB 7
            DB 'testing','$'
    xyz$    DB 0
            DW 0

    [snip]
    ; --------------------------------------------------
    ; Do Not Delete !!!: Constants and system variables
    ; --------------------------------------------------
    NEWLINE DB 13,10,'$'
    C10 DW 10 ; constant for division
    C10000 DW 10000 ; constant for division
    SIXTN DB 16 ; segment multiplier
    EndProg DW 0 ; ZSEG: last segment
    NextFree DW 0

    [snip]
            mov ax, ZSEG ; last byte of program
            div SIXTN
            mov ah, 0
            inc al
            mul SIXTN
            mov EndProg, ax ; store value
            mov NextFree, ax
    ; --------------------------------------------------
            call CLRSCN
            mov di, offset(abc$) ; destination string
            mov si, offset(VARX1) ; source string
            call COPYSTR

    [snip]
    COPYSTR PROC NEAR
            push ax
            push bx
            push cx
            mov al, byte ptr [di] ; get destin's count
            cmp al, 0 ; is the count 0
            jz MakeNew ; create new variable space
            jmp CSMake ; else
    MakeNew:
            push di ; save Destin's address
            mov ax, NextFree ; get next address
            add NextFree, 256 ; increment to next
            inc di ; Word pointer to offset
            mov [di], ax ; set address
            pop di ; get original offset
    CSMake:
            xor cx, cx ; clear cx
            mov cl, byte ptr [si] ; copy source count
            mov [di], cl ; copy string count
            inc di ; get offset pointer
            mov ax, [di] ; copy offset
            mov di, ax ; point di to high offset
            inc si
    CS1:
            mov al, [si]
            mov byte ptr [di], al
            inc si
            inc di
            loop CS1
    CS2:
            mov byte ptr [di], '$'
            pop cx
            pop bx
            pop ax
            ret
    COPYSTR ENDP
    ; --------------------------------------------------
    ZSEG:
    ; --------------------------------------------------

D86:
An exercise:
We've not really discussed A86 and D86, but I'd like to take a moment and do an exercise using D86. D86 is the counterpart to A86: it is the debugger, and it disassembles the program as you step through it. To run this exercise, you will need to have downloaded the A86.ZIP file. If you have not done that yet, do so now. It should be located in the same subdirectory as this tutorial series. Assuming you have followed my prior instructions concerning the installation of A86.ZIP, you should have a separate subdirectory on your system named "...\A86", or something similar to that. In that subdirectory both A86 and D86 should be located. You are going to need to copy two files from your Bxbasm working directory, where Bxbasm.exe is located, over to the \..\A86\ subdirectory. The two files you will need to copy are: Test.com and Test.sym

• copy them now.
• now you will need to drop down to the MS-DOS prompt.
• change directories to the ...\A86 subdirectory.

At the DOS prompt, enter:

    D86 Test [enter]

You should see:

    100 JMP START

Does that look familiar? Beneath that is a bunch of stuff (like assembly instructions) that does not look familiar. Ignore that. What you are seeing is really the variables and constants declaration area, at the beginning of the program. It looks like assembly instructions because D86 is trying to translate everything it sees from the binary machine code into x86 assembly code. In the bottom left corner of the screen is a block of letters followed by numbers. This is a real-time display of the registers and their contents.
• The registers that have an X are general purpose registers.
• Registers with an I are special index registers.
• Registers that have a P are pointer registers.
• Registers that have an S are segment registers.
Notice how all the segment registers have the same value. The one un-named register, in the right column, is the status flag register.
• Press the F1 key, once. Now that should look quite familiar,
• take a look at the AX register and note its contents,
• press the F1 key again, just once; that will execute the first instruction,
• notice the new contents of AX: AX 020B. That is the offset address of ZSEG: (cool, huh?),
• while keeping an eye on the changes going on in AX, press the F1 key until the cursor reaches address: 0147
• at that point, AX will contain the address (or offset) for our first variable: 0210. Is that cool or what?
• now, press the letter J (case doesn't matter) and press the Space-Bar once,
• now key in: 0204 and press the Enter key.

We just jumped to address 0204, within sight of the end of the program code. See how address 020A is a RET instruction and it is followed by ZSEG:. Now does this start to make sense? Take a look at the contents of the AX register. Remember, this is the calculated offset for our first variable. Below ZSEG:, do you see offset address 0210? From memory offset 0210 onward, it's ours to play with. To quit D86, press the letter Q and then [enter]. If you want to run through it again, do so. I suggest you read the D86 documentation.
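The jump from 020B to 0210 that you watched in AX is the rounding done by the first five instructions of START. As a rough check, the same arithmetic can be written in C. This is only a sketch: the function name next_paragraph is made up for the illustration, and the 8-bit registers are modeled with uint8_t, on the understanding that DIV with a byte operand leaves the quotient in AL and the remainder in AH.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of Bxbasm). It models:
 *     mov ax, ZSEG / div SIXTN / mov ah, 0 / inc al / mul SIXTN
 * and returns the value that ends up in EndProg and NextFree. */
uint16_t next_paragraph(uint16_t zseg)
{
    /* An 8-bit DIV needs the quotient to fit in AL. */
    assert(zseg < 0x1000);

    uint8_t al = (uint8_t)(zseg / 16); /* div SIXTN: quotient goes to AL   */
                                       /* mov ah, 0: remainder is dropped  */
    al = (uint8_t)(al + 1);            /* inc al: step to next 16-byte line */
    return (uint16_t)(al * 16);        /* mul SIXTN: AX = AL * 16          */
}
```

With the value seen in D86, next_paragraph(0x020B) gives 0x0210. The result always moves up to the next boundary, so an offset that is already a multiple of 16, such as 0x0200, also becomes 0x0210.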

CONCLUSION
Well, what can I say? We're knee deep in Assembly Language now. There's still more to come. Here is a short list of some of the things we will tackle in the next and up-coming chapters:
o dynamically adjusted string lengths,
o variable to variable assignments,
o integer and floating point variables,
o evaluating expressions.


CHAPTER - 14
INTRODUCTION
Welcome back. So far we have a small, but, growing vocabulary with a great number of things still to deal with. What I'd like to start out with, in this chapter, is to make a couple of improvements regarding string handling. First, if we were to attempt to compile a Basic line of code like this:

abc$ = ""
which is a "null" string or a zero-length string, the whole program would crash. Don't try it, just trust me. "Why", you might ask, "would we ever want to make a null assignment?" There are a number of reasons, which include freeing up memory or simply resetting a string's length to zero, and more I'm sure. At any rate, making a null assignment is a perfectly valid thing to do. If for no other reason, we should at least be able to handle a null string so that the program doesn't crash when we do encounter one. Another thing I'd like to enable is variable to variable assignments. Example:

abc$ = xyz$
We currently have the capability to assign a constant (a quoted string) to a variable, but, more times than not, string assignments involve copying strings, or adding strings together, or slicing strings up in a way that you end up with only a portion of the original string. Then, we need to make sure we can still print our strings after making these changes and improvements. A key thing to remember is that when you change just one little thing, you have to look at the bigger picture and see how many other things were affected by the changes you made.

NULL, VAR-to-VAR:
Currently, this is what our memory model looks like, for this Basic statement:

abc$ = "hello world"

low memory region:

abc$   DB 0     (byte)[length]               <--- base address
       DW 0     (word)[high memory offset]   <--- address of data area
VARX1  DB 11    (byte)[length]
       DB 'hello world','$'                  <--- constant string
...-------------------------------
program code region:

...-------------------------------
high memory region:

[...256 bytes...]                            <--- string data area
If instead, the above statement were a null assignment, like this:

abc$ = ""
while our model wouldn't change, the declaration generated would look like this:
low memory region:

abc$   DB 0     (byte)[length]               <--- base address
       DW 0     (word)[high memory offset]   <--- address of data area
VARX1  DB 0     (byte)[length]
       DB '','$'                             <--- constant string
...-------------------------------
As you can see, the string constant is empty. In order to process this statement, we would need to be able to:
• assign a zero value to abc$'s length,
• assign abc$ a data area, in high memory, if it does not yet exist,
• insert the high memory address in abc$'s offset Word,
• insert the proper string terminator at the correct place in abc$'s data area.
Perhaps the way to proceed is:
• determine if the source string's length is zero,
• if so, redirect and take that course of action;
• the destination's length will need to be set to zero and
• in the data area, an ascii zero (a byte with the value 0) should be placed at offset(0).
The code would be something like this:

    mov   cl, byte ptr [si]   ; get source's length
    cmp   cl, 0               ; compare length to zero
    jz    ZeroLen             ; is it zero, jump to subroutine
[snip]------------------------------
ZeroLen:
    mov   al, 0
    mov   byte ptr [di], al   ; insert an ascii zero in
                              ; destination's data area

That small addition should pretty much take care of assigning null strings. There is one correction that needs to be made before we get too far into this. In the last chapter, when we checked to see whether or not a variable had been assigned a high memory address, we checked its length value to see if it was zero or not. That will no longer work after we make the above changes, since a string can now have a zero length. We need another means of checking whether or not a variable has been assigned a data area. The simplest answer comes from the "offset Word":
low memory region:

abc$   DB 0     (byte)[length]               <--- base address
       DW 0     (word)[high memory offset]   <--- address of data area

As you can see, when the assembly code is written to the source file, the Offset Word is given a value of zero. As long as that value reads zero we can be assured that it has not been assigned an upper memory address. A zero offset address would not be valid. That would be a reference to byte zero in the PSP. Therefore, the Offset Word is the best thing to use in determining whether or not a variable has been assigned. Here are the code changes to COPYSTR that will enable zero length strings:

[snip]
    push  di                  ; save Destination address
    inc   di
    mov   ax, [di]            ; load destin's high-memory offset
    cmp   ax, 0               ; compare:is the offset=0
    jz    MakeNew             ; if zero, jump:create new data area
    jmp   CSMake
MakeNew:;---------------------------------------------
    mov   ax, NextFree        ; get next available offset
    add   NextFree, 256
    mov   [di], ax            ; store offset
CSMake:;----------------------------------------------
    pop   di                  ; retrieve destination
    xor   cx, cx
    mov   cl, byte ptr [si]   ; load source's length
    mov   [di], cl            ; copy source length to destination
    inc   di
    mov   ax, [di]            ; copy destination's offset to AX
    mov   di, ax              ; point DI to high-memory offset
    cmp   cl, 0               ; compare source string's length
    jz    ZeroLen             ; is it zero, jump
    inc   si
CS1:;-------------------------------------------------
    mov   al, [si]            ; transfer string
    mov   byte ptr [di], al
    inc   si
    inc   di
    loop  CS1
    jmp   CS2
ZeroLen:;---------------------------------------------
    mov   al, 0               ; create zero length string
    mov   byte ptr [di], al
    inc   di
CS2:;-------------------------------------------------
    mov   byte ptr [di], '$'  ; add terminator
[snip]

Be sure you study the comments, so that you understand the sequence of events. Here is the new code for function StringCopy(), copy this to file: Asmfunct.c:

void StringCopy()
{
    writeln("COPYSTR PROC NEAR");
    writeln(" push ax");
    writeln(" push bx");
    writeln(" push cx");
    writeln(" push di ; save Destin's address");
    writeln(" inc di ; get Word");
    writeln(" mov ax, [di] ; destination's high-memory offset");
    writeln(" cmp ax, 0 ; is the offset=0, not yet assigned");
    writeln(" jz MakeNew ; create new variable space");
    writeln(" jmp CSMake ; else");
    writeln("MakeNew:");
    writeln(" mov ax, NextFree ; get next address");
    writeln(" add NextFree, 256 ; increment to next");
    writeln(" mov [di], ax ; set new high-memory address");
    writeln("CSMake:");
    writeln(" pop di ; destin length:count");
    writeln(" xor cx, cx ; clear cx");
    writeln(" mov cl, byte ptr [si] ; get source length");
    writeln(" mov [di], cl ; copy length to destin count");
    writeln(" inc di ; advance to high-memory pointer");
    writeln(" mov ax, [di] ; copy offset address to AX");
    writeln(" mov di, ax ; re-point DI to high-memory address");
    writeln(" cmp cl, 0 ; string length");
    writeln(" jz ZeroLen ; is it zero");
    writeln(" inc si");
    writeln("CS1:");
    writeln(" mov al, [si]");
    writeln(" mov byte ptr [di], al");
    writeln(" inc si");
    writeln(" inc di");
    writeln(" loop CS1");
    writeln(" jmp CS2");
    writeln("ZeroLen:");
    writeln(" mov al, 0");
    writeln(" mov byte ptr [di], al");
    writeln(" inc di");
    writeln("CS2:");
    writeln(" mov byte ptr [di], '$' ; add terminator");
    writeln(" pop cx");
    writeln(" pop bx");
    writeln(" pop ax");
    writeln(" ret");
    writeln("COPYSTR ENDP");
    writeln("; --------------------------------------------------");
}
/*---------- end StringCopy ----------*/

That's all it's going to take to enable zero length strings. Copy this code and save it. Now to the issue of printing: we have two different methods of printing string data, one for variables and another for string constants. Not that it's that big a deal, but there should be as much consistency as possible when and where possible. This Basic statement:

PRINT abc$

generates this assembly code:

    mov   bx, offset(abc$)    <--- pointer to length byte
    inc   bx                  <--- pointer to Offset Word
    mov   dx, [bx]            <--- load DX w/offset address
    call  PRINTSTR

While this statement:

PRINT "hello world"

generates this assembly code:

    mov   dx, offset(VARX)    <--- pointer to length byte
    inc   dx                  <--- pointer to string constant
    call  PRINTSTR

The reason for this, if you recall, is because constants are stored in the declarations area, in low memory, right beside the string length byte. Like so:

abc$   DB 0     ;(byte)[length]               <--- base address
       DW 0     ;(word)[high memory offset]   <--- address of data area
VARX1  DB 11    ;(byte)[length]
       DB 'hello world','$'                   <--- constant string

What we need to do is to make the compiler generate the same code for either situation. What this will require is the addition of a Word declaration for VARX1, just like we do for abc$. Here is what that will look like:

VARX1  DB 11
       DW                     <--- Offset Word
       DB 'hello world','$'

There is one difference though: we will actually know ahead of time what the offset will be in this case. Because the string constant isn't going anywhere, the offset is going to be just three bytes from the "base" (the length byte). In other words, the first character in the string will be in the third byte above the length byte. So, the offset will be: VARX1 + 3. Here is what the assembly code will look like:

VARX1  DB 11
       DW (VARX1+3)
       DB 'hello world','$'

So, the output for abc$ and the constant will now look like this:

abc$   DB 0
       DW 0
VARX1  DB 11
       DW (VARX1+3)
       DB 'hello world','$'
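To see why the offset works out to VARX1+3, the declaration can be laid out by hand in a byte array. This is only an illustration in C: the helper name build_const and the base address passed to it are assumptions made for the sketch, and the Word is stored low byte first, the way the 8086 stores it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes a string-constant declaration into mem[]
 * starting at 'base', in the same order as the assembly declaration:
 *     DB length / DW (base+3) / DB 'text','$'
 * Returns the offset that was stored in the Offset Word. */
uint16_t build_const(uint8_t *mem, uint16_t base, const char *text)
{
    size_t len = strlen(text);
    assert(len <= 255);                      /* length must fit in one byte  */

    uint16_t data = (uint16_t)(base + 3);    /* 1 length byte + 2 Word bytes */
    mem[base]     = (uint8_t)len;            /* DB length                    */
    mem[base + 1] = (uint8_t)(data & 0xFF);  /* DW: low byte first           */
    mem[base + 2] = (uint8_t)(data >> 8);    /*     then high byte           */
    memcpy(&mem[data], text, len);           /* DB 'text'                    */
    mem[data + len] = '$';                   /* DOS print terminator         */
    return data;
}
```

For a constant placed at 0103H, the Offset Word would hold 0106H, which is the address the print code loads into DX.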

Here are the changes that need to be made to correct the print routines, beginning in file: Asminput.c:

void ScanLet()
{   char ch, varname[VAR_NAME];
    int pi;

    pi = 0;
    e_pos = pi;
    /* --- get varname --- */
    strcpy(varname, get_varname());
    strcat(varname, "$");
    if(check_VarName(varname) == 0)
    {
        write_str(varname);
        writeln("\tDB 0");
        writeln("\tDW 0");
    }
    pi = e_pos;
    pi++;
    ch = ' ';
    while(ch != '\n')
    {
        pi = iswhiter(pi);
        ch = p_string[pi];
        if(ch == '\"')              /* found a quoted string */
        {
            e_pos = pi;
            GetNewVar();            /* get new varname */
            pi = e_pos;
        }
        else
        {
            pi++;                   /* get next char */
        }
    }
}
/*---------- end ScanLet ----------*/

In GetNewVar(), this is the sequence that generates the offset address for the string constant:

[snip]
    writeln(count);
    write_str("\tDW (");
    write_str(NVar);
    writeln("+3)");
[snip]

void GetNewVar()
{   char ch, NVar[12], a_string[BUFSIZE], count[4];
    int pi, x=0;

    pi = e_pos;
    s_pos = pi;
    /* --- get new varname --- */
    strcpy(NVar, NewVarname());
    write_str(NVar);                /* NVar DB # */
    write_str("\tDB ");
    pi++;
    ch = p_string[pi];
    while(ch != '\"')               /* copy up to quote */
    {
        a_string[x] = ch;
        x++;
        pi++;
        ch = p_string[pi];
    }
    a_string[x] = '\0';
    e_pos = pi;
    itoa(x, count, 10);
    writeln(count);
    write_str("\tDW (");
    write_str(NVar);
    writeln("+3)");
    write_str("\tDB \'");
    strcat(a_string, "\',\'$\'");
    writeln(a_string);              /* write declaration */
    /* --- save varname in array --- */
    StorVar(NVar);
    pi = e_pos;
    ch = p_string[pi];
    if(strchr(":;,", ch))
    {
        pi++;
        e_pos = pi;
    }
}
/*---------- end GetNewVar ----------*/
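The three lines that GetNewVar() writes for a constant can also be built in a single step. The sketch below is not the tutorial's code: format_const_decl is a hypothetical function that puts the same text into a caller-supplied buffer with snprintf(), which sidesteps the small fixed-size count[4] buffer. It assumes that writeln() ends each line with a newline.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical alternative to the write_str()/writeln() sequence:
 * formats the declaration for constant 'name' holding 'text' into buf.
 * Returns 0 on success, -1 if buf is too small. */
int format_const_decl(char *buf, size_t size,
                      const char *name, const char *text)
{
    int n;

    assert(buf != NULL && name != NULL && text != NULL);
    n = snprintf(buf, size,
                 "%s\tDB %u\n"        /* length byte             */
                 "\tDW (%s+3)\n"      /* offset of first char    */
                 "\tDB '%s','$'\n",   /* the string data itself  */
                 name, (unsigned)strlen(text), name, text);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Called with "VARX1" and "hello world", the buffer receives the same DB, DW and DB lines as the declaration for the constant.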

Copy those two functions to Asminput.c and save it. These are the changes that need to be made to file: Asmfunct.c:

void StringCopy()
{
    writeln("COPYSTR PROC NEAR");
    writeln(" push ax");
    writeln(" push bx");
    writeln(" push cx");
    writeln(" push di ; save Destin's address");
    writeln(" inc di ; get Word");
    writeln(" mov ax, [di] ; destination's high-memory offset");
    writeln(" cmp ax, 0 ; is the offset=0, not yet assigned");
    writeln(" jz MakeNew ; create new variable space");
    writeln(" jmp CSMake ; else");
    writeln("MakeNew:");
    writeln(" mov ax, NextFree ; get next address");
    writeln(" add NextFree, 256 ; increment to next");
    writeln(" mov [di], ax ; set new high-memory address");
    writeln("CSMake:");
    writeln(" pop di ; destin length:count");
    writeln(" xor cx, cx ; clear cx");
    writeln(" mov cl, byte ptr [si] ; get source length");
    writeln(" mov [di], cl ; copy length to destin count");
    writeln(" inc di ; advance to high-memory pointer");
    writeln(" mov ax, [di] ; copy offset address to AX");
    writeln(" mov di, ax ; re-point DI to high-memory address");
    writeln(" cmp cl, 0 ; string length");
    writeln(" jz ZeroLen ; is it zero");
    writeln(" inc si");
    writeln(" mov ax, [si] ; copy address to AX");
    writeln(" mov si, ax ; point SI to string address");
    writeln("CS1:");
    writeln(" mov al, [si] ; data transfer");
    writeln(" mov byte ptr [di], al");
    writeln(" inc si");
    writeln(" inc di");
    writeln(" loop CS1");
    writeln(" jmp CS2");
    writeln("ZeroLen:");
    writeln(" mov al, 0 ; write an ascii zero");
    writeln(" mov byte ptr [di], al");
    writeln(" inc di");
    writeln("CS2:");
    writeln(" mov byte ptr [di], '$' ; add terminator");
    writeln(" pop cx");
    writeln(" pop bx");
    writeln(" pop ax");
    writeln(" ret");
    writeln("COPYSTR ENDP");
    writeln("; --------------------------------------------------");
}
/*---------- end StringCopy ----------*/
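If the register shuffling in COPYSTR is hard to follow, it may help to see the same flow modeled in C. This is a sketch under a few assumptions, not code from Bxbasm: the names copy_str, get_word and put_word are invented here, the byte array mem stands in for the program's 64K segment, and both the source and the destination are assumed to carry an Offset Word, as in the version of COPYSTR just shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DATA_SIZE 256   /* bytes reserved per string, as in COPYSTR */

/* Read and write a Word stored low byte first, as the 8086 does. */
static uint16_t get_word(const uint8_t *mem, uint16_t at)
{
    return (uint16_t)(mem[at] | (mem[at + 1] << 8));
}

static void put_word(uint8_t *mem, uint16_t at, uint16_t value)
{
    mem[at]     = (uint8_t)(value & 0xFF);
    mem[at + 1] = (uint8_t)(value >> 8);
}

/* Models COPYSTR. 'next_free' plays the part of NextFree; 'di' and
 * 'si' are the base addresses (length bytes) of destination and source. */
void copy_str(uint8_t *mem, uint16_t *next_free, uint16_t di, uint16_t si)
{
    uint8_t  len;
    uint16_t dst, src;

    assert(mem != NULL && next_free != NULL);

    if (get_word(mem, (uint16_t)(di + 1)) == 0) {      /* Offset Word zero? */
        put_word(mem, (uint16_t)(di + 1), *next_free); /* MakeNew           */
        *next_free = (uint16_t)(*next_free + DATA_SIZE);
    }
    len = mem[si];                               /* source length           */
    dst = get_word(mem, (uint16_t)(di + 1));     /* destination data area   */
    src = get_word(mem, (uint16_t)(si + 1));     /* source data area        */

    mem[di] = len;                               /* copy length to destin   */
    if (len == 0) {
        mem[dst++] = 0;                          /* ZeroLen: write a zero   */
    } else {
        memmove(&mem[dst], &mem[src], len);      /* CS1: transfer the data  */
        dst = (uint16_t)(dst + len);
    }
    mem[dst] = '$';                              /* CS2: add terminator     */
}
```

The point to notice is that the test for "already has a data area" looks only at the Offset Word, so copying a zero-length string into a variable a second time reuses its data area instead of taking another 256 bytes.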

In Do_print(), you will see that the code to output a string variable and a constant are now the same:

void Do_print()
{   char ch, temp[VAR_NAME];
    int pi;

    pi = e_pos;
    pi = iswhiter(pi);
    ch = p_string[pi];
    e_pos = pi;
    printstring = 1;
    /* --- print newline --- */
    if(strchr(":\n", ch))
    {
        writeln(" mov dx, offset(NEWLINE)");
        writeln(" call PRINTSTR");
        return;
    }
    /* --- LOOP: multiple print statements --- */
    while(strchr("\n\0", ch) == 0)
    {
        strcpy(temp, get_varname());
        pi = e_pos;
        ch = p_string[pi];
        /* --- write string variable --- */
        if(ch == '$')
        {
            strcat(temp, "$)");
            write_str(" mov bx, offset(");
            writeln(temp);
            writeln(" inc bx");
            writeln(" mov dx, [bx]");
            writeln(" call PRINTSTR");
            pi++;
        }
        /* --- write string constant --- */
        else
        {
            strcat(temp, ")");
            write_str(" mov bx, offset(");
            writeln(temp);
            writeln(" inc bx");
            writeln(" mov dx, [bx]");
            writeln(" call PRINTSTR");
        }
        pi = iswhiter(pi);
        ch = p_string[pi];
        if(ch == ',')
        {
            writeln(" push dx");
            writeln(" mov dl, 9\t\t; code:for tab");
            writeln(" call PRINTCHR");
            writeln(" pop dx");
            printchrctr = 1;
        }
        else if(strchr(":\n", ch))
        {
            writeln(" mov dx, offset(NEWLINE)");
            writeln(" call PRINTSTR");
        }
        /* --- is it end of statement --- */
        if(strchr("\n\0", ch) == 0)
        {
            pi++;
            pi = iswhiter(pi);
            ch = p_string[pi];
            e_pos = pi;
        }
    }
}
/*---------- end Do_print ----------*/

Copy these changes and save them. Now compile Bxbasm.c. First, we will test that we can assign a zero length string. Try this Test.bas:

' test.bas version 14.1
CLS
abc$ = ""
PRINT "abc=", abc$
' ------------------------------------------
TheEnd:
END
' ------------------------------------------

It should have just displayed:

abc=

As long as it didn't crash, we're okay.

Second, to verify that we can make variable to variable assignments, compile this Test.bas:

' test.bas version 14.2
CLS
abc$ = "this is abc string"
xyz$ = abc$
PRINT "abc=", abc$
PRINT "xyz=", xyz$
' ------------------------------------------
TheEnd:
END
' ------------------------------------------

Before going on to the next subject, take a minute and examine the source listing in: Test.asm.


CONCATENATE:
Concatenating a string is where you stick two or more character strings together to form a single string. Example:

abc$ = abc$ + xyz$
This is a widely used function and is probably one of the most common string manipulation functions. Even though there are numerous character string manipulation utilities in the "Standard Libraries", we are going to handcraft our own "string cat" utility, just to get the feel of what it takes to do this sort of thing. After all, the purpose of learning Assembly Language is to learn how to do things at the lowest level possible. Here is a Basic "string-cat" statement that we want to translate:

abc$ = abc$ + xyz$
Let's build a model of what we need to do:
• we will assume that abc$ was assigned: "hello "
• and xyz$ was assigned: "world"

First, let's look at the variables declaration area (in low memory):

abc$   DB 6           ; <--- string length
       DW [0200H]     ; <--- high memory offset
xyz$   DB 5           ; <--- string length
       DW [0300H]     ; <--- high memory offset

We are going to assume that we have 256 bytes of data space reserved for each character string. Here's a list of things we need to take into consideration:
• what is the current string length of abc$ (destination),
• what is the length of xyz$ (source string),
• will the addition of xyz$ exceed the size limit of abc$,
• if so, should we abort the process or do we add only as much as will fit.
Second, let's look at the data area for each (in high memory):
offset:

[0200H]:[h|e|l|l|o| |$|....
[0300H]:[w|o|r|l|d|$|...
What we want to end up with is:
offset:

[0200H]:[h|e|l|l|o| |w|o|r|l|d|$|...
[0300H]:[w|o|r|l|d|$|...
To answer the above questions, we need to start by:
• examining the length of abc$: is it at the maximum length already (255 bytes)? If so, then we should abort the procedure,
• if not, add xyz$'s length to abc$'s: does it exceed the maximum length? If so, subtract abc$'s length from the maximum; the result is the quantity to transfer,
• if xyz$ is null, then the procedure can be aborted.
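The rules just listed boil down to a small calculation of how many characters to transfer. Here is that calculation as a minimal C sketch; the name cat_count and the MAX_LEN macro are made up for the illustration, and the choice made is "add only as much as will fit" rather than aborting when the result would be too long.

```c
#include <assert.h>

#define MAX_LEN 255   /* largest length a single length byte can hold */

/* Hypothetical helper: how many characters of the source may be
 * appended to the destination. Returns 0 when nothing should be
 * copied (destination already full, or source empty). */
int cat_count(int dest_len, int src_len)
{
    assert(dest_len >= 0 && dest_len <= MAX_LEN);
    assert(src_len  >= 0 && src_len  <= MAX_LEN);

    if (dest_len == MAX_LEN || src_len == 0)
        return 0;                          /* abort: nothing to do      */
    if (dest_len + src_len > MAX_LEN)
        return MAX_LEN - dest_len;         /* add only as much as fits  */
    return src_len;                        /* the whole source fits     */
}
```

For "hello " (6) and "world" (5) the whole source fits, so the count is 5. A destination of 250 characters with a source of 10 would only take 5 more.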

If the transfer is do-able, then we set a pointer to the end of abc$ (the destination), where the first character will be inserted, and a pointer to the first character in xyz$:
offset:

[0200H]:[h|e|l|l|o| |$|....
set:DI:--------------^
[0300H]:[w|o|r|l|d|$|...
set:SI:--^
And then, using a loop, move the contents of [SI] to [DI]. Next, to output the correct assembly code, we need to dissect the Basic statement. There are two ways you can write a string-cat statement. Example:

abc$ = abc$ + xyz$
or:

abc$ = xyz$ + qaz$
While the first statement is a straight concatenation, the second is a compound statement, because, it is an assignment plus a concatenation. So, from the parsing perspective, we need to be able to handle both situations. Additionally, if you look at the first example, the first half of that statement is redundant:

abc$ = abc$
as it appears to be reassigning abc$ to abc$. Question: Do we have to reassign abc$ to abc$, again, as the statement implies, before we can concatenate xyz$ ? No. We can skip right over that part of the statement and move right to the: "+ xyz$".

In fact, we can use the fact that both the left and right variables are the same to assume that we are dealing with a stringcat statement. If we compare the left and right variable names and they prove to be the same:

does: "abc$" = "abc$"
then we jump right to the stringcat subroutine. Everything that follows will involve string concatenations to abc$. If, on the other hand, using this statement as an example:

abc$ = xyz$ + qaz$
the left and right variable names are not the same, this statement can reasonably be assumed to be a normal assignment statement. It is only after that, that we encounter the "+" plus sign, that we can detect that this is also a stringcat statement. We would then branch to the stringcat routine.


Beginning with file: Asmvars.c, there are some minor changes that need to be made to Do_let():

void Do_let()
{   char ch, varname[VAR_NAME];
    int pi, stlen, ndx=0;
    int ab_code=6, x=line_ndx;

    stlen = strlen(p_string);
    pi = e_pos;
    /* --- retrieve variable name from statement --- */
    pi = get_alpha(pi, stlen);
    if(pi == stlen)                 /* error: didn't find it */
    {
        a_bort(ab_code, x);
    }
    e_pos = pi;
    strcpy(varname, get_varname());
    pi = e_pos;
    ch = p_string[pi];              /* get the type character */
    /* --- we now have varname and type --- */
    if(ch == '$')                   /* string variable assignment */
    {
        e_pos = pi;
        copy_string(varname);
    }
    else
    {
        ab_code = 5;
        a_bort(ab_code, x);
    }
}
/*---------- end Do_let ----------*/

Below is a new copy_string(). In this version, under the heading of "string cat", the left and right variable names are compared:

    /* --- string cat --- */
    if(strcmp(name, varname) == 0)
    {
        cat_string(len);            /* are destin & source the same */
    }                               /* a$ = a$+b$ */

If the comparison is True, we jump directly to function: cat_string(). If the comparison is false, we continue on to the string copy routine. It is there that we again test for a string concatenation, after the initial assignment:

void copy_string(char *name)
{   char ch, varname[VAR_NAME];
    int pi, len;

    len = strlen(p_string);
    strcpy(varname, name);
    /* --- destination varname --- */
    write_str(" mov di, offset(");
    strcat(varname, "$)");
    write_str(varname);
    writeln("\t; destination string");
    /* --- get source varname --- */
    pi = e_pos;
    pi = get_alpha(pi, len);
    e_pos = pi;
    strcpy(varname, get_varname());
    /* --- string cat --- */
    if(strcmp(name, varname) == 0)
    {
        cat_string(len);            /* are destin & source the same */
    }                               /* a$ = a$+b$ */
    /* --- string copy --- */
    else
    {
        pi = e_pos;
        ch = p_string[pi];
        if(ch == '$')
        {
            strcat(varname, "$");
            pi++;
        }
        write_str(" mov si, offset(");
        write_str(varname);
        write_str(")");
        writeln("\t; source string");
        writeln(" call COPYSTR");
        stringcopy = 1;
        /* --- plus: string cat --- */
        pi = iswhiter(pi);
        ch = p_string[pi];
        if(ch == '+')               /* is there a string cat */
        {
            strcpy(varname, name);
            write_str(" mov di, offset(");
            strcat(varname, "$)");
            write_str(varname);
            writeln("\t; destination string");
            /* --- string cat --- */
            cat_string(len);
        }
    }
}
/*---------- end copy_string ----------*/


Below is the new function: cat_string(). In either of these statements:

abc$ = abc$ + xyz$
string cat--^
abc$ = xyz$ + qaz$
            ^-------string cat
we advance to the "+" plus sign and that is where this next function begins. In a normal assignment statement, the first thing that happens is that the instruction to load DI with the offset of abc$ is output to the source file:

    mov   di, offset(abc$)

In a "stringcat" situation, it is important that we preserve the offset of abc$, because we will repeatedly need to refer to it during each stringcat operation. So, we need to output a "push" instruction so that we save DI on the stack:

    writeln(" push di");

Later, we will "pop" it back before each stringcat. We would then load SI with the offset for xyz$, the source string, and make the call to the "CATSTR" procedure:

void cat_string(int len)
{   char ch, varname[VAR_NAME];
    int pi;

    pi = e_pos;
    pi++;
    pi = iswhiter(pi);
    ch = p_string[pi];
    if(ch == '+')
    {
        while((pi < len) && (ch != '\n'))
        {
            if(ch == '+')
            {
                writeln(" push di");
                pi = get_alpha(pi, len);
                e_pos = pi;
                strcpy(varname, get_varname());
                pi = e_pos;
                ch = p_string[pi];
                if(ch == '$')
                {
                    strcat(varname, "$");
                }
                write_str(" mov si, offset(");
                write_str(varname);
                write_str(")");
                writeln("\t; source string");
                writeln(" call CATSTR");
                writeln(" pop di");
            }
            else
            {
                pi++;
            }
            pi = iswhiter(pi);
            ch = p_string[pi];
        }
        stringcat = 1;
    }
}
/*---------- end cat_string ----------*/

This routine continues to loop until the end of line is reached. For this statement:

abc$ = abc$ + xyz$ + qaz$
here is what the source code would look like:

    push  di
    mov   si, offset(xyz$)
    call  CATSTR
    pop   di
    push  di
    mov   si, offset(qaz$)
    call  CATSTR
    pop   di
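The idea of "one CATSTR call per plus sign" can be tried out in isolation. The following is a simplified stand-in for cat_string(), not the function itself: emit_cat_calls is a hypothetical name, it works on a plain C string instead of p_string and e_pos, and it collects the generated assembly in a buffer rather than writing it to the source file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for cat_string(): 'rest' is the
 * part of the statement that starts at the first '+', for example
 * "+ xyz$ + qaz$". One CATSTR call is appended to 'out' per operand.
 * Returns the number of operands handled, or -1 if 'out' is too small. */
int emit_cat_calls(const char *rest, char *out, size_t size)
{
    int    count = 0;
    size_t used  = 0;

    assert(rest != NULL && out != NULL && size > 0);
    out[0] = '\0';
    while ((rest = strchr(rest, '+')) != NULL) {
        char name[32];
        int  n;

        rest++;                                   /* step over the '+'   */
        if (sscanf(rest, " %31[A-Za-z0-9$]", name) != 1)
            break;                                /* no operand follows  */
        n = snprintf(out + used, size - used,
                     "    push di\n"
                     "    mov si, offset(%s)\n"
                     "    call CATSTR\n"
                     "    pop di\n", name);
        if (n < 0 || (size_t)n >= size - used)
            return -1;                            /* buffer too small    */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Given "+ xyz$ + qaz$", it produces two push/mov/call/pop groups, one for xyz$ and one for qaz$, in that order. DI is saved and restored around every call so that each CATSTR starts from the base address of the destination.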

Save these functions to file: Asmvars.c. Here are the changes and additions to file: Asmfunct.c:

void Do_functions()
{
    if(clrscreen == 1)
    {
        ClrScrn();
    }
    if(printstring == 1)
    {
        PrintStr();
    }
    if(printchrctr == 1)
    {
        PrintChr();
    }
    if(stringcopy == 1)
    {
        StringCopy();
    }
    if(stringcat == 1)
    {
        StringCat();
    }
}
/*---------- end Do_functions ----------*/

Here is the code for procedure: CATSTR. Look carefully at the comments and follow the program flow. See how each of the requirements, previously listed, is met:

void StringCat()
{
    writeln("CATSTR PROC NEAR");
    writeln(" push ax");
    writeln(" push bx");
    writeln(" push cx");
    writeln(" xor ax, ax ; clear registers");
    writeln(" xor bx, bx");
    writeln(" xor cx, cx");
    writeln(" mov al, byte ptr [di] ; load destin length");
    writeln(" cmp al, 255 ; is destin full");
    writeln(" jz CatEnd ; if so, exit");
    writeln(" mov bl, byte ptr [si] ; load source length");
    writeln(" cmp bl, 0 ; is source empty");
    writeln(" jz CatEnd");
    writeln(" mov cx, ax");
    writeln(" add cx, bx");
    writeln(" cmp cx, 255 ; is total len > 255");
    writeln(" jl CatStart");
    writeln(" sub cx, 255 ; how much over");
    writeln(" sub bl, cl ; truncate length");
    writeln("CatStart:");
    writeln(" mov cl, bl ; copy transfer count");
    writeln(" add [di], bl ; add transfer count to destin length");
    writeln(" inc di ; advance to high-memory pointer");
    writeln(" mov dx, [di] ; copy offset address to DX");
    writeln(" mov di, dx ; re-point DI to high-memory address");
    writeln(" add di, ax ; move pointer to end of destination");
    writeln(" inc si ; advance to high-memory pointer");
    writeln(" mov dx, [si] ; copy address to DX");
    writeln(" mov si, dx ; point SI to source address");
    writeln("Cat1:");
    writeln(" mov al, [si] ; data transfer");
    writeln(" mov byte ptr [di], al");
    writeln(" inc si");
    writeln(" inc di");
    writeln(" loop Cat1");
    writeln(" mov byte ptr [di], '$' ; add terminator");
    writeln("CatEnd:");
    writeln(" pop cx");
    writeln(" pop bx");
    writeln(" pop ax");
    writeln(" ret");
    writeln("CATSTR ENDP");
    writeln("; --------------------------------------------------");
}
/*---------- end StringCat ----------*/

Copy and save these to Asmfunct.c.

In file Asmerror.c, there is a minor change that needs to be made to a_bort():

void a_bort(int code,int x)
[snip]
        case 5:
            printf("\nVariable Type error: in program line:");
            printf(" %d.\n%s",(line_ndx+1),p_string);
            printf("Type must be: String \"%c\".\ncode(%d)\n",'$',code);
            break;
        case 6:
            printf("\nSyntax error: in program line: %d.\n",(line_ndx+1));
            printf("%s", p_string);
            printf("Useage: LET (variable=assignment):\ncode(%d)\n", code);
            break;
        default:
            printf("Program aborted, undefined error.");
            break;
    }
    exit(1);
}
/*---------- end a_bort ----------*/

In file: Bxbasm.c, add the flag variable: "stringcat":

/* ------ global vars ------------ */
[snip]
int VarNdxCnt=0;        /* variables count */
int stringcopy=0;       /* flag: string copy */
int stringcat=0;        /* flag: string concatenate */
[snip]

In file Prototyp.h, update these lists to include the prototypes for the new functions:

/* Asmfunct.c */
void Do_cls(void);
void Do_end(void);
void Do_functions(void);
void ClrScrn(void);
void Do_beep(void);
void Do_print(void);
void PrintStr(void);
void PrintChr(void);
void Do_goto(void);
void Do_label(void);
void Do_gosub(void);
void Do_return(void);
void Do_locate(void);
void StringCopy(void);
void StringCat(void);

/* Asmvars.c */
void Do_let(void);
void copy_string(char *);
void cat_string(int);

Copy these changes and save everything. Now re-compile Bxbasm.c. Try Bxbasm.exe by compiling this Test.bas:

' test.bas version 14.3
xyz$ = "test "
qaz$ = "string"
abc$ = xyz$ + qaz$
PRINT abc$
' ------------------------------------------
TheEnd:
END
' ------------------------------------------

Execute Test.com. Examine the assembly code generated and make sure you understand what is taking place in each step. Here's what the main part of the code should look like:

    jmp   START
;
xyz$      DB 0
          DW 0
VARX1     DB 5
          DW (VARX1+3)
          DB 'test ','$'
qaz$      DB 0
          DW 0
VARX2     DB 6
          DW (VARX2+3)
          DB 'string','$'
abc$      DB 0
          DW 0
; --------------------------------------------------
; Do Not Delete !!!: Constants and system variables
; --------------------------------------------------
NEWLINE   DB 13,10,'$'
C10       DW 10               ; constant for division
C10000    DW 10000            ; constant for division
SIXTN     DB 16               ; segment multiplier
EndProg   DW 0                ; ZSEG: last segment
NextFree  DW 0
; --------------------------------------------------
; --------------------------------------------------
START PROC NEAR
; --------------------------------------------------
    mov   ax, ZSEG            ; last byte of program
    div   SIXTN
    mov   ah, 0
    inc   al
    mul   SIXTN
    mov   EndProg, ax         ; store value
    mov   NextFree, ax
; --------------------------------------------------
    mov   di, offset(xyz$)    ; destination string
    mov   si, offset(VARX1)   ; source string
    call  COPYSTR
    mov   di, offset(qaz$)    ; destination string
    mov   si, offset(VARX2)   ; source string
    call  COPYSTR
    mov   di, offset(abc$)    ; destination string
    mov   si, offset(xyz$)    ; source string
    call  COPYSTR
    mov   di, offset(abc$)    ; destination string
    push  di
    mov   si, offset(qaz$)    ; source string
    call  CATSTR
    pop   di
    mov   bx, offset(abc$)
    inc   bx
    mov   dx, [bx]
    call  PRINTSTR
    mov   dx, offset(NEWLINE)
    call  PRINTSTR
TheEnd:
    jmp   DONE
[snip]

That's all we are going to do with character strings, for the moment.

INTEGERS:
I think the next logical thing to do is incorporate numeric variables. Numbers, in some ways, are easier to deal with than character strings, but, in other ways they can be more difficult. When dealing with strings, you are working with actual ascii characters that can be stored and displayed in any manner you wish. As we've seen, in the sections we've just completed, after we have stored a character string, we can then output the data to the display simply by giving the print routine the base address of the character string in question. There are both BIOS and DOS routines for outputting characters to the display. This is not quite true when it comes to numeric values.


For this discussion, if we limit ourselves to integer values, an integer can range from 0 to 65535 and be stored in a single binary Word (two bytes, (16 bits)). If we are willing to cut the value down to 255, we can store a short integer in a single byte (8 bits). Example:

abc    DB 255       ; single byte
xyz    DW 65535     ; word (2 bytes)

As easy as it is to store large amounts of data in numeric form, using fewer characters (or bytes) than character strings, the first problem that arises is that numeric values do not translate easily to the display screen. The PC display screen is a character based device. What that means is, that numbers have to be translated into ascii characters in order to be displayed on the screen. Each digit has to be represented by a single character. In the above, the variable "abc", is a single byte and holds the value of 255. To display that number on the screen, is going to require a minimum of three bytes, one for each digit. Example:

[2][5][5]
So, an integer number can range in size from 1 to 5 characters in length. What this means is that we are going to have to construct a mechanism for converting any number value into its character representation before we can print that number.

Our current print routines are set up for printing character strings by furnishing the base address of where the string is stored in memory. Also, each string is terminated by a "$" symbol.

• In reality, there is no requirement that a string be terminated by any character in particular. In our implementation, we are using the "$" symbol because the particular DOS print routine we are using treats that symbol as a terminator.

All we have to do is make the value-to-character conversion, store the resulting character string in a given location, terminate it and pass the base address (offset) to the print routine that we are currently using. For that purpose, we will use a character variable named "IntString":

IntString   DB  '          ','$'
                 ^^^^^^^^^^

which will consist of 10 blank character spaces and it will be located in the system variables area. In file: Asminput.c, make the following change to SystemConst() by adding "IntString":


void SystemConst() { writeln("; --------------------------------------------------"); writeln("; Do Not Delete !!!: Constants and system variables"); writeln("; --------------------------------------------------"); writeln("NEWLINE DB 13,10,\'$\'"); writeln("C10 DW 10 ; constant for division"); writeln("C10000 DW 10000 ; constant for division"); writeln("SIXTN DB 16 ; segment multiplier"); writeln("EndProg DW 0 ; ZSEG: last segment"); writeln("NextFree DW 0"); writeln("IntString DB ' ','$' ; integer string"); writeln("; --------------------------------------------------"); } /*---------- end SystemConst ----------*/

From now on, when making a variable assignment, since we are now going to be working with multiple variable types, we will need to determine what type of variable the statement pertains to. It can either be an integer or a string variable. To make this determination, we will use function get_vtype() and branch accordingly.

    /* --- get varname --- */
    strcpy(varname, get_varname());
    type = get_vtype(pi);               /* get variable type */

The next thing we need to do is devise some way to distinguish one variable type from another. In Basic, historically the "$" symbol represents a character string and an integer can either use the "!" symbol or no symbol at all (the default). Although we have been using the "$" symbol for string variables by tacking it onto the end of the variable name, unfortunately the "!" is not an allowed character in variable names. If we use no character at all, then we will have a problem with the system generated variable names used to reference string constants. That is because they don't have a symbol either. There will be no way to distinguish "VARX1" from "IntVar", in terms of variable type. What we will have to do is, append a "$" symbol to system generated variables as well as regular string variables. **Note: remember, that when quoted strings are converted to variable names during the input phase, the string is replaced with the system generated name in the program array. That is why we can't just ignore this. This way, when we encounter a variable name, if it doesn't have a terminating symbol, it has to be an integer. In a Basic statement, such as:

abc = 100
abc! = 100
both cases would be interpreted correctly as being an integer variable. For the moment we will append an "I" to the end of integer variables, we will have to come up with something more creative in the future. Here is ScanLet() with the additions and changes made:


void ScanLet() { char ch, varname[VAR_NAME]; int pi, type, ab_code=5, x=line_ndx; pi = 0; e_pos = pi; /* --- get varname --- */ strcpy(varname, get_varname()); type = get_vtype(pi); /* get variable type */ /* --- integer --- */ if(type == 1) { strcat(varname, "I"); if(check_VarName(varname) == 0) { write_str(varname); writeln("\tDW 0"); } } /* --- string --- */ else if(type == 3) { strcat(varname, "$"); if(check_VarName(varname) == 0) { write_str(varname); writeln("\tDB 0"); writeln("\tDW 0"); } pi = e_pos; pi++; ch = ' '; while(ch != '\n') { pi = iswhiter(pi); ch = p_string[pi]; if(ch == '\"') /* found a quoted string */ { e_pos = pi; GetNewVar(); /* get new varname */ pi = e_pos; } else { pi++; /* get next char */ } } } else { a_bort(ab_code, x); } } /*---------- end ScanLet ----------*/
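The naming convention used by ScanLet() can be sketched in isolation. The two helpers below, classify_name() and asm_name(), are hypothetical and are not part of Bxbasm; the real get_vtype() works on the global p_string and is not shown in this chapter. The sketch only assumes the type codes that ScanLet() tests (1 for integer, 3 for string) and the "I" and "$" suffixes described above.

```c
#include <string.h>

/* Hypothetical helper, not part of Bxbasm: classify a Basic variable
   name by its trailing symbol, using the codes ScanLet() tests
   (1 = integer, 3 = string). Returns 0 for an empty name. */
int classify_name(const char *name)
{
    size_t len = strlen(name);

    if(len == 0)
    {
        return 0;
    }
    if(name[len - 1] == '$')        /* "abc$" : string variable */
    {
        return 3;
    }
    return 1;                       /* "abc" or "abc!" : integer */
}

/* Hypothetical helper: build the assembler name. Integers lose any
   trailing '!' and gain an "I"; strings keep their "$". */
void asm_name(const char *name, char *out, size_t outsize)
{
    size_t len = strlen(name);

    if(outsize < 2)                 /* no room for a name plus suffix */
    {
        if(outsize == 1)
        {
            out[0] = '\0';
        }
        return;
    }
    if(len > 0 && name[len - 1] == '!')
    {
        len--;                      /* drop the optional '!' marker */
    }
    if(len > outsize - 2)
    {
        len = outsize - 2;          /* leave room for "I" and '\0' */
    }
    memcpy(out, name, len);
    out[len] = '\0';
    if(classify_name(name) == 1)
    {
        strcat(out, "I");
    }
}
```

With this convention both "abc" and "abc!" map to the assembler name "abcI", while "abc$" is left unchanged.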

In function NewVarname(), the only change is the appending of the "$" to the variable name:

char *NewVarname()
{
    char Val[6];
    static char NVar[12];

    var_count++;                    /* generate a unique varname */
    strcpy(NVar, "VARX");
    sprintf(Val, "%d", var_count);
    strcat(NVar, Val);
    strcat(NVar, "$");
    return NVar;
}
/*---------- end NewVarname ----------*/

Copy these changes to Asminput.c and save it. In file: Asmvars.c, function Do_let(), we will now branch according to the variable assignment type:

void Do_let()
{
    char ch, varname[VAR_NAME];
    int pi, stlen, ndx=0, type;
    int ab_code=6, x=line_ndx;

    stlen = strlen(p_string);
    pi = e_pos;
    /* --- retrieve variable name from statement --- */
    pi = get_alpha(pi, stlen);
    if(pi == stlen)                 /* error: didn't find it */
    {
        a_bort(ab_code, x);
    }
    e_pos = pi;
    strcpy(varname, get_varname());
    type = get_vtype(pi);           /* get variable type */
    /* --- we now have varname and type --- */
    if(type == 3)
    {
        copy_string(varname);       /* string variable assignment */
    }
    else if(type == 1)
    {
        asn_integer(varname);       /* integer assignment */
    }
}
/*---------- end Do_let ----------*/

Function asn_integer() will output the source code for simple value-to-variable assignments:

void asn_integer(char *name)
{
    char ch, varname[VAR_NAME], data[10];
    int pi, len, ii=0;

    len = strlen(p_string);
    strcpy(varname, name);
    strcat(varname, "I");
    /* --- write varname --- */
    write_str(" mov ");
    write_str(varname);
    write_str(", ");
    /* --- get value --- */
    pi = e_pos;
    pi = get_digit(pi, len);
    ch = p_string[pi];
    while(isdigit(ch) && (ii < 9))  /* leave room for the '\0' */
    {
        data[ii] = ch;
        ii++;
        pi++;
        ch = p_string[pi];
    }
    data[ii] = '\0';
    /* --- write value --- */
    writeln(data);
}
/*---------- end asn_integer ----------*/

Integer assignments are very simple and straightforward. Copy these to Asmvars.c and save it. Here are the additions and changes to file: Asmfunct.c.

void Do_functions()
{
    if(clrscreen == 1)
    {
        ClrScrn();
    }
    if(printstring == 1)
    {
        PrintStr();
    }
    if(printchrctr == 1)
    {
        PrintChr();
    }
    if(stringcopy == 1)
    {
        StringCopy();
    }
    if(stringcat == 1)
    {
        StringCat();
    }
    if(printinteger == 1)
    {
        PrintInt();
    }
}
/*---------- end Do_functions ----------*/

Functions PrintInt(), PrintInt1() and PrintInt2() output the assembly code for these three procedures: PRINTINT, FIXINT and INTASC. Procedure PRINTINT (for: Print Integer) is the general print handler for printing numbers. INTASC (for: Integer to Ascii) is the conversion function that translates the value into an ascii character string. During the conversion process, the resulting ascii characters are stored in the system variable: "IntString".
(This function is based on a routine from the book "The Serious Assembler", ISBN:0-671-55963-X.)

FIXINT (for: Left Fix Integer) has to "left-fix" or remove the leading blank spaces in the string. Because function INTASC does not know how many characters the resulting string will require, it stores the characters in a right to left pattern as it converts the value. Example:

value = 100
if you take the value of 100 and move it into register AX:

mov     ax, value

and then divide AX by ten, the result will be stored in AX and any remainder will be stored in register DX:

div     ax, 10

register    AX      DX
value       10      0

Then, take the number in DX and add 30H (48 decimal) to it, the result will be the ascii code for that number. The number in DX (zero) plus 30H, will result in ascii code 48 (decimal).

DX = 0 + 48 = 48, ascii character "0"
That ascii code is then stored in the right most position in variable "IntString":

IntString:[ | | | | | | | | |0]
-----------------------------^
The procedure is repeated with each character being stored one position to the left of the one before it, until done:

IntString:[ | | | | | | |1|0|0]
-------------------------^

Before we can print this, we need to call FIXINT, to shift everything to the far left and add the terminating "$" symbol:

IntString:[1|0|0|$| | | | | | ]
-----------------^
We can now call our generic print string routine.
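The two steps just described can be modelled in C to make the buffer movements easier to follow. This is only a sketch: int_to_field() and left_fix() are hypothetical names standing in for INTASC and FIXINT, the field width of 10 matches "IntString", and the assembly versions differ in detail (they work through registers and divide by 10,000 first).

```c
#include <string.h>

#define FIELD 10    /* width of IntString, not counting the '$' */

/* Model of INTASC: blank the field, then store digits right to left.
   Returns the index of the leftmost digit that was written. */
int int_to_field(unsigned int value, char field[FIELD + 1])
{
    int pos = FIELD;

    memset(field, ' ', FIELD);
    field[FIELD] = '$';
    do
    {
        pos--;
        field[pos] = (char)('0' + value % 10);  /* remainder + 30H */
        value /= 10;                            /* quotient for next pass */
    } while(value != 0 && pos > 0);
    return pos;
}

/* Model of FIXINT: skip the leading blanks, shift the digits to the
   far left and terminate them with '$'. */
void left_fix(char field[FIELD + 1])
{
    int src = 0, dst = 0;

    while(src < FIELD && field[src] == ' ')
    {
        src++;
    }
    while(src < FIELD)
    {
        field[dst++] = field[src++];
    }
    field[dst] = '$';
}
```

For the value 100, int_to_field() leaves seven blanks followed by "100", and left_fix() then moves the digits so the buffer begins with "100$", which is all the DOS print routine will see.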

void PrintInt() { writeln("PRINTINT PROC NEAR"); writeln(" mov ax, dx"); writeln(" xor dx, dx ; clear High Order Word"); writeln(" mov si, offset(IntString)"); writeln(" mov cx, 10"); writeln(" call INTASC ; convert integer to ascii"); writeln(" call FIXINT"); writeln(" mov dx, offset(IntString)"); writeln(" call PRINTSTR"); writeln(" ret"); writeln("PRINTINT ENDP"); writeln("; --------------------------------------------------"); } /*---------- end PrintInt ----------*/

void PrintInt1() { writeln("FIXINT PROC NEAR"); writeln(" mov ax, offset(IntString)"); writeln(" mov di, ax"); writeln(" mov si, ax"); writeln(" mov cl, 10"); writeln("Fix1:"); writeln(" cmp byte ptr [si], ' ' ; is it a space"); writeln(" jnz Fix2 ; if not, exit loop"); writeln(" inc si"); writeln(" loop Fix1"); writeln(" jmp FixEnd"); writeln("Fix2:"); writeln(" mov al, byte ptr [si]"); writeln(" mov [di], al"); writeln(" inc si"); writeln(" inc di"); writeln(" loop Fix2"); writeln(" mov [di], '$'"); writeln("FixEnd:"); writeln(" ret"); writeln("FIXINT ENDP"); writeln("; --------------------------------------------------"); } /*---------- end PrintInt1 ----------*/


void PrintInt2() { writeln("INTASC PROC NEAR"); writeln(" mov di, si ; save start of string"); writeln("IA1:"); writeln(" mov BYTE PTR [si], ' ' ; fill character"); writeln(" inc si ; point to next field position"); writeln(" loop IA1 ; loop until done"); writeln(" div C10000 ; divide by 10,000"); writeln(" mov bx, ax ; save quotient"); writeln(" mov ax, dx ; move remainder back to ax"); writeln("IA2:"); writeln(" mov cx, 4 ; number of digits to print"); writeln("IA3:"); writeln(" xor dx, dx ; clear High Order Word"); writeln(" div C10 ; divide by ten"); writeln(" add dl, '0' ; convert to ascii digit"); writeln(" dec si ; step backwards thru buffer"); writeln(" cmp si, di ; out of space ?"); writeln(" jb IAX ; yes, quit"); writeln(" mov [si], dl ; store digit"); writeln(" or ax, ax ; all digits printed ?"); writeln(" jnz IA4 ; no, keep on going"); writeln(" or bx, bx ; any more work ?"); writeln(" jz IAX ; no, can quit"); writeln("IA4:"); writeln(" loop IA3 ; next digit"); writeln("IA5:"); writeln(" or bx, bx ; more work to do ?"); writeln(" jz IAX ; no, can quit"); writeln(" mov ax, bx ; get next 4 digits"); writeln(" xor bx, bx ; show no more digits"); writeln(" jmp IA2 ; keep on going"); writeln("IAX:"); writeln(" ret"); writeln("INTASC ENDP"); writeln("; --------------------------------------------------"); } /*---------- end PrintInt2 ----------*/

Be sure to study the comments for INTASC. We need to make some changes to our print function. We also need to handle printing integers by outputting the call to PRINTINT:

void Do_print() { char ch, temp[VAR_NAME]; int pi, type; pi = e_pos; pi = iswhiter(pi);


(Continued) ch = p_string[pi]; e_pos = pi; printstring = 1; /* --- print newline --- */ if(strchr(":\n", ch)) { writeln(" mov dx, offset(NEWLINE)"); writeln(" call PRINTSTR"); return; } /* --- LOOP: multiple print statements --- */ while(strchr("\n\0", ch) == 0) { strcpy(temp, get_varname()); pi = e_pos; type = get_vtype(pi); /* --- write integer variable --- */ if(type == 1) { strcat(temp, "I"); write_str(" mov dx, "); writeln(temp); writeln(" call PRINTINT"); printinteger = 1; } /* --- write string variable --- */ else if(type== 3) { strcat(temp, "$)"); write_str(" mov bx, offset("); writeln(temp); writeln(" inc bx"); writeln(" mov dx, [bx]"); writeln(" call PRINTSTR"); pi++; } /* --- write integer variable --- */ else { strcat(temp, "I"); write_str(" mov dx, "); writeln(temp); writeln(" call PRINTINT"); printinteger = 1; } pi = iswhiter(pi); ch = p_string[pi]; if(ch == ',') { writeln(" push dx"); writeln(" mov dl, 9\t\t; code:for tab"); writeln(" call PRINTCHR"); writeln(" pop dx"); printchrctr = 1; } else if(strchr(":\n", ch)) { writeln(" mov dx, offset(NEWLINE)"); writeln(" call PRINTSTR"); } /* --- is it end of statement --- */ if(strchr("\n\0", ch) == 0)


(Continued) { pi++; pi = iswhiter(pi); ch = p_string[pi]; e_pos = pi;

} } } /*---------- end Do_print ----------*/

Copy these (above) functions to Asmfunct.c and save it. We need to add the flag variable: "printinteger" to the global vars section in Bxbasm.c:

/* ------ global vars ------------ */
[snip]
int stringcopy=0;               /* flag: string copy */
int stringcat=0;                /* flag: string concatenate */
int printinteger=0;             /* flag: print integer */
[snip]

In file: Asmerror.c, we need to make a few additions to a_bort():

void a_bort(int code,int x)
[snip]
        case 5:
            printf("\nVariable Type error: in program line:");
            printf(" %d.\n%sType must be: ",(line_ndx+1),p_string);
            printf("String\"%c\" or Integer\"%c\"\n",'$','!');
            printf("code(%d)\n",code);
            break;
        case 6:
            printf("\nSyntax error: in program line: %d.\n",(line_ndx+1));
            printf("%s", p_string);
            printf("Usage: LET (variable=assignment):\ncode(%d)\n", code);
            break;
        default:
            printf("Program aborted, undefined error.");
            break;
[snip]

In file: Prototyp.h, update these prototype lists:

/* Asmfunct.c */
void Do_cls(void);
void Do_end(void);
void Do_functions(void);
void ClrScrn(void);
void Do_beep(void);
void Do_print(void);
void PrintStr(void);
void PrintChr(void);
void Do_goto(void);
void Do_label(void);
void Do_gosub(void);
void Do_return(void);
void Do_locate(void);
void StringCopy(void);
void StringCat(void);
void PrintInt(void);
void PrintInt1(void);
void PrintInt2(void);
/* Asmvars.c */
void Do_let(void);
void copy_string(char *);
void cat_string(int);
void asn_integer(char *);

Copy and save everything. Now, re-compile Bxbasm.c. Before we go any further, let's first make sure our old string functions still work. With Bxbasm.exe, try this Test.bas:

' test.bas version 14.4
xyz$ = "test "
qaz$ = "string"
abc$ = xyz$ + qaz$
PRINT abc$
' ------------------------------------------
TheEnd:
END
' ------------------------------------------

Okay, if that still works we can proceed. Now try this Test.bas:

' test.bas version 14.5
value = 100
PRINT value
' ------------------------------------------
TheEnd:
END
' ------------------------------------------

Examine Test.asm. Here is what it should look like: ; ; *************BxbAsm Compiler*************

jmp START ; valueI DW 0 ; -------------------------------------------------; Do Not Delete !!!: Constants and system variables ; -------------------------------------------------NEWLINE DB 13,10,'$' C10 DW 10 ; constant for division C10000 DW 10000 ; constant for division SIXTN DB 16 ; segment multiplier EndProg DW 0 ; ZSEG: last segment NextFree DW 0 IntString DB ' ','$' ; integer string ; -------------------------------------------------; -------------------------------------------------START PROC NEAR ; -------------------------------------------------mov ax, ZSEG ; last byte of program div SIXTN mov ah, 0 inc al mul SIXTN mov EndProg, ax ; store value mov NextFree, ax ; -------------------------------------------------mov valueI, 100 mov dx, valueI call PRINTINT mov dx, offset(NEWLINE) call PRINTSTR TheEnd: jmp DONE ; DONE: INT 20H START ENDP ; -------------------------------------------------PRINTSTR PROC NEAR push ax mov ah, 9


(Continued) INT 21H pop ax ret PRINTSTR ENDP ; -------------------------------------------------PRINTINT PROC NEAR mov ax, dx xor dx, dx ; clear High Order Word mov si, offset(IntString) mov cx, 10 call INTASC ; convert integer to ascii call FIXINT mov dx, offset(IntString) call PRINTSTR ret PRINTINT ENDP ; -------------------------------------------------FIXINT PROC NEAR mov ax, offset(IntString) mov di, ax mov si, ax mov cl, 10 Fix1: cmp byte ptr [si], ' ' ; is it a space jnz Fix2 ; if not, exit loop inc si loop Fix1 jmp FixEnd Fix2: mov al, byte ptr [si] mov [di], al inc si inc di loop Fix2 mov [di], '$' FixEnd: ret FIXINT ENDP ; -------------------------------------------------INTASC PROC NEAR mov di, si ; save start of string IA1: mov BYTE PTR [si], ' ' ; fill character inc si ; point to next field position loop IA1 ; loop until done div C10000 ; divide by 10,000 mov bx, ax ; save quotient mov ax, dx ; move remainder back to ax IA2: mov cx, 4 ; number of digits to print IA3: xor dx, dx ; clear High Order Word div C10 ; divide by ten add dl, '0' ; convert to ascii digit dec si ; step backwards thru buffer


(Continued) cmp si, di ; out of space ? jb IAX ; yes, quit mov [si], dl ; store digit or ax, ax ; all digits printed ? jnz IA4 ; no, keep on going or bx, bx ; any more work ? jz IAX ; no, can quit IA4: loop IA3 ; next digit IA5: or bx, bx ; more work to do ? jz IAX ; no, can quit mov ax, bx ; get next 4 digits xor bx, bx ; show no more digits jmp IA2 ; keep on going IAX: ret INTASC ENDP ; -------------------------------------------------ZSEG: ; --------------------------------------------------

CONCLUSION
Well, these are humble beginnings, but, we are making some progress. We are able to write Assembly Language functions and call them from the main program. We can declare and assign values to integer variables, even though we don't have any math routines yet. We can concatenate character strings, but I admit, that's only a start when it comes to string variables. There's still more to come.


CHAPTER - 15
INTRODUCTION
Welcome back! In the last chapter we began working with integer variables. We devised a method for declaring and assigning values to them and we also made provisions for displaying them on the screen. In this chapter we will add the basic math functions by building a Recursive Descent Parser. Actually, we will be using the same Recursive Descent Parser we are already using in our Bxb Scripting Engine. This parser is based on the one originally developed by Jack Crenshaw, in his tutorial series "Let's Build a Compiler". We will of course make a few changes, from the one we are now using, so that it outputs assembly source code.

COMPILING EXPRESSIONS:
We want to be able to input an expression such as the one below:

qaz = abc * ((xyz + 1) * 3) / 3
and have that translated directly into compilable code. Remember, the expression will NOT be interpreted at runtime and then executed. Instead, the interpretation process will take place entirely at compile time and the output will be the pure executable code.

To build the assembly code for the above expression, we will be using what is called a "Recursive Descent, Left-to-Right" parser. That means we have to dissect the expression according to the precedence of each expression element, in a normal left-to-right algebraic fashion. The element with the highest precedence is the element contained within the deepest level of parentheses. Then, working towards the outer levels of parens', that is followed by any multiply or divide terms, with the lowest level of precedence being any add or subtract terms.

In the expression above, working from left-to-right, the first thing we need to do is save the value of "abc" for later use by pushing it onto the stack, like this:

mov ax, abc         ; load variable
push ax             ; save this value
                    ;
                    ;   Stack  [ abc ]
                    ;          [     ]

With that said, the first element (the inner most parens’) to compile is:

(xyz + 1)           ; abc * ((xyz + 1) * 3) / 3
                    ;         ^----^

and here is what the assembly code for that statement would look like:


                    ;                           Stack
mov ax, xyz         ; load variable                [ abc ]
push ax             ; save this value           -->[ xyz ]
                    ;                              [     ]
                    ;                           Stack
mov ax, 1           ; load a constant              [ abc ]
pop bx              ; restore value to add      <--[     ]
add ax, bx          ;

You might be wondering: "why go through so many steps to add '1' to 'xyz', why not just do something like:"

mov ax, xyz add ax, 1
"and reduce the number of operations from five to two?" Well, in this instance you certainly could. There is no reason why you couldn't reduce the number of steps, but, that might not work in all cases. We will save this for a later topic on Optimization. With a new value for "xyz", the next term is:

(xyz * 3)           ; abc * (xyz * 3) / 3
                    ;        ^----^

and here is the assembly code to perform that operation:

                    ;                           Stack
push ax             ; save this value              [ abc ]
mov ax, 3           ; load a constant           -->[ xyz ]
                    ;                              [     ]
                    ;                           Stack
pop bx              ; restore value to             [ abc ]
mul bx              ; multiply                  <--[     ]

at this point the AX register contains the result of the multiply operation. Still working from left-to-right, (even though it looks like we are working from right-to-left,) the next element to compile is:

abc * xyz           ; abc * (xyz) / 3
                    ; ^------^

and here is the assembly code: (remember, "abc" was on the stack),

                    ;                           Stack
pop bx              ; restore value to          <--[     ]
mul bx              ; multiply                     [     ]

again, register AX contains the result of the multiply operation.


At this point there remains only one element left to compile:

abc / 3             ; abc / 3
                    ; ^----^

We will begin by pushing the result of the last operation ("abc") onto the stack:

                    ;                           Stack
push ax             ; save this value           -->[ abc ]
mov ax, 3           ; load a constant              [     ]
                    ;                           Stack
pop bx              ; restore value to          <--[     ]
                    ; divide                       [     ]

At this point: ax = 3, bx = abc. There is one major problem with this arrangement. In all division operations, the AX register is the dividend and the second operand is the divisor. If we were to carry out this operation, we would be dividing 3 by the value of "abc" and that is the opposite of what we want to do. So, we need to exchange or "flip" the contents of the two registers. Here is how we do that:

xchg ax, bx         ; [ax]<--->[bx]
mov dx, 0           ; clear dx for integer divide
div bx              ; perform division

Now register AX contains the final result. The next step is to store that in variable "qaz":

lea di, qaz         ; load destination register with qaz's offset
mov [di], ax        ; store result

Done ! Our new math parser will need to do all of the above for us.
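To check the walkthrough, the same instruction sequence can be replayed in C, one statement per instruction. This is only a sketch: eval_walkthrough() is a hypothetical name, ordinary C ints stand in for the 16-bit registers (so overflow behaves differently from the real code), and a small array stands in for the CPU stack.

```c
/* Replay of the generated code for:  qaz = abc * ((xyz + 1) * 3) / 3
   Hypothetical model: ax and bx mimic the registers, stack[] the stack. */
int eval_walkthrough(int abc, int xyz)
{
    int stack[8], sp = 0;
    int ax, bx, tmp;

    ax = abc;                           /* mov ax, abc */
    stack[sp++] = ax;                   /* push ax     */
    ax = xyz;                           /* mov ax, xyz */
    stack[sp++] = ax;                   /* push ax     */
    ax = 1;                             /* mov ax, 1   */
    bx = stack[--sp];                   /* pop bx      */
    ax = ax + bx;                       /* add ax, bx  : xyz + 1 */
    stack[sp++] = ax;                   /* push ax     */
    ax = 3;                             /* mov ax, 3   */
    bx = stack[--sp];                   /* pop bx      */
    ax = ax * bx;                       /* mul bx      : (xyz + 1) * 3 */
    bx = stack[--sp];                   /* pop bx      : abc comes back */
    ax = ax * bx;                       /* mul bx      */
    stack[sp++] = ax;                   /* push ax     */
    ax = 3;                             /* mov ax, 3   */
    bx = stack[--sp];                   /* pop bx      */
    tmp = ax; ax = bx; bx = tmp;        /* xchg ax, bx */
    ax = ax / bx;                       /* div bx      */
    return ax;                          /* value stored in qaz */
}
```

With abc = 10 and xyz = 3 the function returns 40, which is 10 * ((3 + 1) * 3) / 3.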


RECURSIVE DESCENT PARSER:
We will begin with the small change that needs to be made to file Bxbasm.c. Modify the "includes" section to look like this:

/* --- function includes --- */
#include "input.c"
#include "utility.c"
#include "asminput.c"
#include "asmutils.c"
#include "asmfunct.c"
#include "asmerror.c"
#include "asmvars.c"
#include "asmrdp.c"

The next change will be to function "asn_integer()" in file: Asmvars.c. Make these changes and save it:

void asn_integer(char *name) { char ch, varname[VAR_NAME]; int pi; pi = e_pos; pi = iswhite(pi); e_pos = pi; strcpy(varname, name); strcat(varname, "I"); /* --- call: recursive descent parser --- */ Assignment(); /* --- store data --- */ write_str(" mov "); write_str(varname); writeln(", ax\t\t; store variable"); } /*---------- end asn_integer ----------*/

Now we need to create a new file to hold all the functions for our parser. Open a new file and name it "Asmrdp.c", and copy this header to that file:


/* bxbasm : Asmrdp.c : alpha version */ /* special credits to: Jack Crenshaw's "Let's Build a Compiler" */ /* ----- function prototypes ----- */ #include "prototyp.h"

Now copy the following functions into file: "Asmrdp.c":

void Assignment() /* Recursive Descent Parser Main */ { Match('='); a_Expression(); } /*---------- end Assignment ----------*/

void a_Expression()             /* Parse and Translate an Expression */
{
    char ch;
    int pi;

    pi = e_pos;
    ch = p_string[pi];
    if(IsAddop(ch))
    {
        writeln(" mov ax, 0\t\t; negate a number");
    }
    else
    {
        a_Term();
        pi = e_pos;
        ch = p_string[pi];
    }
    while(IsAddop(ch))
    {
        writeln(" push ax\t\t\t; save this value");
        switch(ch)
        {
            case '+':
                Add();
                break;
            case '-':
                Subtract();
                break;
            default:
                break;
        }
        pi = e_pos;
        ch = p_string[pi];
    }
}
/*---------- end a_Expression ----------*/

void a_Term()                   /* Parse and Translate a Math Term */
{
    char ch;
    int pi;

    a_Factor();
    pi = e_pos;
    ch = p_string[pi];
    while(IsMultop(ch))
    {
        writeln(" push ax\t\t\t; save this value");
        switch(ch)
        {
            case '*':
                Multiply();
                break;
            case '/':
                Divide();
                break;
            default:
                break;
        }
        pi = e_pos;
        ch = p_string[pi];
    }
}
/*---------- end a_Term ----------*/

void a_Factor() /* Parse and Translate a Math Factor */ { char ch, Value[10]; int pi; pi = e_pos; ch = p_string[pi]; if(ch == '(') { Match('('); a_Expression(); Match(')'); } else if(isalpha(ch)) /* variable name */ { Ident(); } else { strcpy(Value, a_GetNum()); /* numeric value */ write_str(" mov ax, "); write_str(Value); writeln("\t\t; load a constant"); } } /*---------- end a_Factor ----------*/


void Ident() /* Parse and Translate an Identifier */ { char ch, Name[VAR_NAME]; int pi, type; strcpy(Name, get_varname()); pi = e_pos; type = get_Nvtype(pi); /* get variable type */ if(type == 1) { strcat(Name, "I"); } SkipWhite(); pi = e_pos; ch = p_string[pi]; if(ch == '(') { Match('('); Match(')'); write_str(" call "); writeln(Name); } else { write_str(" mov ax, "); write_str(Name); writeln("\t\t; load variable"); } } /*---------- end Ident ----------*/

char *a_GetNum()                /* Get a Number */
{
    char ch;
    static char Value[10];
    int pi, ndx=0, ab_code=12, ln=line_ndx;

    Value[0] = '\0';
    pi = e_pos;
    ch = p_string[pi];
    if(! isdigit(ch))
    {
        strcpy(t_holder, "Numeric Value");
        a_bort(ab_code,ln);
    }
    while((isdigit(ch)) && (ndx < 9))   /* leave room for the '\0' */
    {
        Value[ndx] = ch;
        ndx++;
        pi++;
        ch = p_string[pi];
    }
    Value[ndx] = '\0';
    e_pos = pi;
    SkipWhite();
    return Value;
}
/*---------- end a_GetNum ----------*/

void Add() /* Recognize and Translate an Add */ { Match('+'); a_Term(); writeln(" pop bx\t\t\t; restore value to add"); writeln(" add ax, bx"); } /*---------- end Add ----------*/

void Subtract() /* Recognize and Translate a Subtract */ { Match('-'); a_Term(); writeln(" pop bx\t\t\t; restore value to subtract"); writeln(" sub ax, bx"); writeln(" neg ax"); } /*---------- end Subtract ----------*/

void Multiply() /* Recognize and Translate a Multiply */ { Match('*'); a_Factor(); writeln(" pop bx\t\t\t; restore value to multiply"); writeln(" mul bx"); } /*---------- end Multiply ----------*/

void Divide() /* Recognize and Translate a Divide */ { Match('/'); a_Factor(); writeln(" pop bx\t\t\t; restore value to divide"); writeln(" xchg ax, bx"); writeln(" xor dx, dx\t\t; clear dx for integer divide"); writeln(" div bx"); } /*---------- end Divide ----------*/
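Subtract() and Divide() both have to cope with the operands arriving in the "wrong" registers: after "pop bx", the left-hand operand is in BX and the right-hand operand is in AX. The sketch below models what the emitted instructions compute. The function names are hypothetical, and plain C integers stand in for the 16-bit registers.

```c
/* Model of the code emitted by Subtract(): "left" was pushed earlier,
   "right" is the value a_Term() has just left in AX. */
int model_subtract(int left, int right)
{
    int bx = left;                  /* pop bx */
    int ax = right;

    ax = ax - bx;                   /* sub ax, bx : right - left */
    ax = -ax;                       /* neg ax     : left - right */
    return ax;
}

/* Model of the code emitted by Divide(): the registers are swapped so
   that the left operand becomes the dividend. right must not be zero. */
unsigned int model_divide(unsigned int left, unsigned int right)
{
    unsigned int bx = left;         /* pop bx */
    unsigned int ax = right;
    unsigned int tmp;

    tmp = ax; ax = bx; bx = tmp;    /* xchg ax, bx */
    return ax / bx;                 /* xor dx, dx / div bx */
}
```

Add() and Multiply() need no such correction because addition and multiplication give the same result in either operand order.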


void Match(char x) /* Match a Specific Input Character */ { char ch, string[6]; int pi, ab_code=12, ln=line_ndx; pi = e_pos; ch = p_string[pi]; if(ch != x) { strcpy(string, "\" \""); string[1] = x; strcpy(t_holder, string); a_bort(ab_code,ln); } else { _GetChar(); SkipWhite(); } } /*---------- end Match ----------*/

void _GetChar() { e_pos++; /* advance pointer */ } /*---------- end _GetChar ----------*/

int IsAddop(char ch)            /* Recognize an Addop */
{
    int bool=0;

    if((ch == '+') || (ch == '-'))
    {
        bool = 1;
    }
    return bool;
}
/*---------- end IsAddop ----------*/

int IsMultop(char ch)           /* Recognize a Multop */
{
    int bool=0;

    if(strchr("*^/%", ch))
    {
        bool = 1;
    }
    return bool;
}
/*---------- end IsMultop ----------*/

int Is_White(char ch) { int bool=0; if((ch == ' ') || (ch == '\t')) { bool = -1; } return bool; } /*---------- end Is_White ----------*/

void SkipWhite()                /* Skip Over Leading White Space */
{
    char ch;
    int pi;

    pi = e_pos;
    ch = p_string[pi];
    while(Is_White(ch))
    {
        _GetChar();             /* advance & save pointer */
        pi = e_pos;
        ch = p_string[pi];
    }
}
/*---------- end SkipWhite ----------*/
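To see how the pieces fit together without the rest of Bxbasm, here is a stripped-down, self-contained model of the same parser. It is only a sketch under several assumptions: the names (compile_expr(), emit(), src, out) are hypothetical, output goes to a memory buffer instead of the .asm file, variable names are emitted without the "I" suffix, and unary minus, function calls and error reporting are left out.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static const char *src;         /* hypothetical input cursor */
static char out[2048];          /* hypothetical output buffer */

static void emit(const char *text)          /* append one line */
{
    size_t used = strlen(out);
    snprintf(out + used, sizeof(out) - used, "%s\n", text);
}

static void skip(void) { while(*src == ' ') src++; }

static void expression(void);

static void factor(void)        /* parens, variable name or number */
{
    char line[64], tok[32];
    int n = 0;

    skip();
    if(*src == '(')
    {
        src++;
        expression();
        skip();
        if(*src == ')') src++;
        return;
    }
    while(isalnum((unsigned char)*src) && n < 31)
    {
        tok[n++] = *src++;
    }
    tok[n] = '\0';
    snprintf(line, sizeof(line), "mov ax, %s", tok);
    emit(line);
}

static void term(void)          /* factor { (*|/) factor } */
{
    factor();
    skip();
    while(*src == '*' || *src == '/')
    {
        char op = *src++;
        emit("push ax");
        factor();
        emit("pop bx");
        if(op == '*')
        {
            emit("mul bx");
        }
        else
        {
            emit("xchg ax, bx");
            emit("xor dx, dx");
            emit("div bx");
        }
        skip();
    }
}

static void expression(void)    /* term { (+|-) term } */
{
    term();
    skip();
    while(*src == '+' || *src == '-')
    {
        char op = *src++;
        emit("push ax");
        term();
        emit("pop bx");
        if(op == '+')
        {
            emit("add ax, bx");
        }
        else
        {
            emit("sub ax, bx");
            emit("neg ax");
        }
        skip();
    }
}

const char *compile_expr(const char *text)  /* returns emitted lines */
{
    src = text;
    out[0] = '\0';
    expression();
    return out;
}
```

Calling compile_expr("xyz + 1") yields the five lines "mov ax, xyz", "push ax", "mov ax, 1", "pop bx" and "add ax, bx", the same sequence shown in the walkthrough at the start of this chapter.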

Save file Asmrdp.c and close it. You will notice that these functions parallel the parser we use in the Bxb Scripting Engine. Now, in file: Asmutils.c, add this new function:

int get_Nvtype(int pi)          /* determine variable type */
{
    char ch;
    int type=0;

    ch = p_string[pi];
    while(isalnum(ch))
    {
        pi++;
        ch = p_string[pi];
    }
    if(strchr(" )=+*-/\n", ch))
    {
        type = 1;               /* an integer */
    }
    else if(ch == '%')
    {
        type = 2;               /* a long integer */
    }
    else if(ch == '!')
    {
        type = 3;               /* a float */
    }
    else if(ch == '#')
    {
        type = 4;               /* a double float */
    }
    return type;
}
/*------- end get_Nvtype --------*/

Add these new prototypes to file: Prototyp.h, under these headings:

/* Asmutils.c */
void writeln(char *);
void write_str(char *);
void PostLabel(char *);
/* char *get_varname(void);     Note only */
/* int get_vtype(int);          Note only */
int get_Nvtype(int);

/* Asmrdp.c */
void Assignment(void);
void a_Expression(void);
void a_Term(void);
void a_Factor(void);
void Ident(void);
char *a_GetNum(void);
void Add(void);
void Subtract(void);
void Multiply(void);
void Divide(void);
/* void Match(char);            Note only */
/* void _GetChar(void);         Note only */
/* int IsAddop(char);           Note only */
/* int IsMultop(char);          Note only */
/* int Is_White(char);          Note only */
/* void SkipWhite(void);        Note only */

Now compile Bxbasm.c. Using Bxbasm.exe, compile this version of Test.bas:

' test.bas version 15.1
abc = 10
xyz = 3
qaz = abc * ((xyz + 1) * 3) / 3
PRINT qaz
' ------------------------------------------
END
' ------------------------------------------

Now execute Test.com. Here is the source code for Test.asm, (abbreviated) showing the output of the recursive descent parser: ; ; *************BxbAsm Compiler*************

jmp START ; abcI DW 0 xyzI DW 0 qazI DW 0 ; -------------------------------------------------; Do Not Delete !!!: Constants and system variables ; -------------------------------------------------NEWLINE DB 13,10,'$' C10 DW 10 ; constant for division C10000 DW 10000 ; constant for division SIXTN DB 16 ; segment multiplier EndProg DW 0 ; ZSEG: last segment NextFree DW 0 IntString DB ' ','$' ; integer string ; -------------------------------------------------; -------------------------------------------------START PROC NEAR ; -------------------------------------------------mov ax, ZSEG ; last byte of program div SIXTN mov ah, 0 inc al mul SIXTN


(Continued) mov EndProg, ax ; store value mov NextFree, ax ; -------------------------------------------------mov ax, 10 ; load a constant lea di, abcI ; load variable offset mov [di], ax ; store variable mov ax, 3 ; load a constant lea di, xyzI ; load variable offset mov [di], ax ; store variable mov ax, abcI ; load variable push ax ; save this value mov ax, xyzI ; load variable push ax ; save this value mov ax, 1 ; load a constant pop bx ; restore value to add add ax, bx push ax ; save this value mov ax, 3 ; load a constant pop bx ; restore value to multiply mul bx pop bx ; restore value to multiply mul bx push ax ; save this value mov ax, 3 ; load a constant pop bx ; restore value to divide xchg ax, bx mov dx, 0 ; clear dx for integer divide div bx lea di, qazI ; load variable offset mov [di], ax ; store variable [snip] jmp DONE ... [snip] ZSEG: ; --------------------------------------------------


NEGATIVE INTEGERS:
Now try compiling and executing this version of Test.bas:

' test.bas version 15.2
abc = -10
PRINT abc
abc = 10
xyz = 3
qaz = abc * ((xyz + 1) * 3) / 3
PRINT qaz
' ------------------------------------------
END
' ------------------------------------------

Do you see the error? The problem is that we haven't made any provisions for handling negative values. We can easily remedy this though. In file: "Asmfunct.c", we have a function named "PrintInt2()", this function handles outputting the code for printing an integer value to the screen. What we need to do is add a local variable to procedure "INTASC" that will act as a sign-flag, indicating a positive or negative value. Just as in the beginning of the assembly file, where we declare all the global variables, we will declare a local variable in procedure INTASC, which will be preceded by a "jump" instruction:

INTASC PROC NEAR
    jmp IA0
sign DB 0               ; operand 'sign': local variable
IA0:

After we have loaded the variable value into register AX, we compare it against zero to see whether it is negative:

    cmp ax, 0           ; is ax negative
    jge PosVal
    neg ax              ; change to a positive value
    mov sign, 1         ; set sign flag (negative)
PosVal:
    div C10000          ; divide by 10,000

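If the value in register AX is greater than or equal to zero, it is a positive number and needs no special handling. The instruction states:

compare ax to 'zero'

followed by:

if it's greater than or equal to zero, jump to label: "PosVal"

Otherwise the value is negative, so it is converted to its positive equivalent and the sign flag is set.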


After the digits have been inserted into the display string, the last thing to do is to insert a minus sign if the value is negative, and reset the sign flag to zero:

IAX:
    mov al, sign        ; sign flag
    cmp al, 0           ; if => 0, positive value
    jz IAXit
    dec si              ; next data position
    mov B[si], '-'      ; insert minus 'sign'
IAXit:
    mov sign, 0         ; reset sign flag
    ret
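The same idea can be sketched in C, which may make the assembly easier to follow. This is only an illustration under stated assumptions: the function name format_int_field and its parameters are hypothetical and are not part of Bxbasm. It fills the field with spaces, writes the digits from right to left and then, if the sign flag was set, places a minus sign in front of the leading digit.

```c
#include <string.h>

/* Hypothetical helper (not part of Bxbasm): right-justify 'value' in
   field[0..width-1]. 'field' must have room for width + 1 characters.
   Intended for values in the 16-bit range used by the text. */
void format_int_field(int value, char *field, int width)
{
    long magnitude = value;
    int negative = 0;           /* plays the role of the 'sign' flag */
    int pos = width;            /* one past the last character */

    memset(field, ' ', (size_t)width);  /* fill character */
    field[width] = '\0';

    if (magnitude < 0) {
        negative = 1;           /* set sign flag (negative) */
        magnitude = -magnitude; /* change to a positive value */
    }
    do {
        if (pos == 0)
            return;             /* out of space, quit */
        field[--pos] = (char)('0' + (int)(magnitude % 10));
        magnitude /= 10;
    } while (magnitude != 0);

    if (negative && pos > 0)
        field[--pos] = '-';     /* insert minus sign */
}
```

Because the flag here is a local variable, it starts at zero on every call. The assembly version keeps its flag in memory between calls, which is why it has to be reset explicitly.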

Here is the new code for function "PrintInt2()". Copy this to file Asmfunct.c, replacing the old version:

void PrintInt2()
{
    writeln("INTASC PROC NEAR");
    writeln(" jmp IA0");
    writeln("sign DB 0 ; operand 'sign': local variable");
    writeln("IA0:");
    writeln(" mov di, si ; save start of string");
    writeln("IA1:");
    writeln(" mov B[si], ' ' ; fill character");
    writeln(" inc si ; point to next field position");
    writeln(" loop IA1 ; loop until done");
    writeln(" cmp ax, 0 ; is ax negative");
    writeln(" jge PosVal");
    writeln(" neg ax ; change to a positive value");
    writeln(" mov sign, 1 ; set sign flag (negative)");
    writeln("PosVal:");
    writeln(" div C10000 ; divide by 10,000");
    writeln(" mov bx, ax ; save quotient");
    writeln(" mov ax, dx ; move remainder back to ax");
    writeln("IA2:");
    writeln(" mov cx, 4 ; number of digits to print");
    writeln("IA3:");
    writeln(" xor dx, dx ; clear High Order Word");
    writeln(" div C10 ; divide by ten");
    writeln(" add dl, '0' ; convert to ascii digit");
    writeln(" dec si ; step backwards thru buffer");
    writeln(" cmp si, di ; out of space ?");
    writeln(" jb IAX ; yes, quit");
    writeln(" mov [si], dl ; store digit");
    writeln(" or ax, ax ; all digits printed ?");
    writeln(" jnz IA4 ; no, keep on going");
    writeln(" or bx, bx ; any more work ?");
    writeln(" jz IAX ; no, can quit");
    writeln("IA4:");
    writeln(" loop IA3 ; next digit");
    writeln("IA5:");
    writeln(" or bx, bx ; more work to do ?");
    writeln(" jz IAX ; no, can quit");
    writeln(" mov ax, bx ; get next 4 digits");
    writeln(" xor bx, bx ; show no more digits");
    writeln(" jmp IA2 ; keep on going");
    writeln("IAX:");
    writeln(" mov al, sign ; sign flag");
    writeln(" cmp al, 0 ; if 0+ positive value");
    writeln(" jz IAXit");
    writeln(" dec si ; next data position");
    writeln(" mov B[si], '-' ; insert minus 'sign'");
    writeln("IAXit:");
    writeln(" mov sign, 0 ; reset sign flag");
    writeln(" ret");
    writeln("INTASC ENDP");
    writeln("; --------------------------------------------------");
}
/*---------- end PrintInt2 ----------*/

Now recompile Bxbasm.c. Using Bxbasm.exe, recompile this version of Test.bas:

' test.bas version 15.3
abc = 10
PRINT abc
xyz = 3
qaz = abc * ((xyz + 1) * 3) / 3
PRINT qaz
' Negative value
abc = -10
PRINT abc
xyz = 3
qaz = abc * ((xyz + 1) * 3) / 3
PRINT qaz
' ------------------------------------------
END
' ------------------------------------------

Now execute Test.com. As you can see, it now properly handles negative integer values.


FLOATING POINT NUMBERS:
Well, if doing calculations using short integers was all we would ever need to do, what we currently have would work just fine. However, that's not a real world situation. We can't expect to do very much by just using integer values. This brings up the issue of "Real Numbers". Real numbers are the numbers we use every day. They represent both whole and fractional values, range from very large to very small, and include negative values. Since it is not within the scope of this text to explain how to perform multi-digit algebra and calculus, I will leave it to the reader to research those topics, if they are of interest to you.

We currently have a math parser that, by using the x86's integer math instructions, can perform simple 16-bit arithmetic. The x86 is capable of doing some 32-bit arithmetic, but I feel it would be a waste of time to take that route. I would prefer to move on to the x87 instruction set, which is generally faster for most situations, especially when performing "real" number calculations. Library routines do exist for doing real number calculations using just the x86 instruction set, but I will leave that for the reader to investigate. Instead, I will concentrate on introducing the x87 math processor and the instructions that perform what we have accomplished so far with the x86 integer instructions. I will also expand the variable types to include 32-bit Long Integers, 64-bit Extended Long Integers, and 32-bit and 64-bit Floating Point numbers.

For future reference, when I refer to the "x87", it will be a general reference to the Intel Math Processor or FPU (Floating Point Unit), which includes the 8087, 80287 and 80387, as well as the FPUs built into the x486, x586, Pentium, etc. I will try not to use examples that are specific to a single processor to the exclusion of others. Also, this will not be an in-depth study of the x87 or how it functions. I will explain just as much as is needed for the reader to be able to understand the assembly code we will be writing.

THE x87 FPU:
The x87 Floating Point Unit (math processor) differs greatly from the x86 in both its instruction set and its architecture. You should be fairly familiar with the architecture and the registers of the x86:

The x86:
General registers:    Segment registers:    Special registers:
ax: [ah][al]          cs: [      ]          sp: [      ]
bx: [bh][bl]          ds: [      ]          bp: [      ]
cx: [ch][cl]          es: [      ]          si: [      ]
dx: [dh][dl]          ss: [      ]          di: [      ]
16 bits/8 bits
flags: [][][][][][][][]

The x86 CPU has four 16-bit general purpose registers (each of which can also be addressed as two 8-bit halves, giving eight 8-bit registers), four 16-bit segment registers and four 16-bit special purpose registers, plus the flags register.


The x87:
x87 data registers: (stack)
ST(0): [                    ]
ST(1): [                    ]
ST(2): [                    ]
ST(3): [                    ]
ST(4): [                    ]
ST(5): [                    ]
ST(6): [                    ]
ST(7): [                    ]
80 bits......

The x87 has eight 80-bit registers called Stack(0) through Stack(7), written ST(0) through ST(7) in assembly code. There are also several special purpose registers, such as the control and status words. For the time being, all we need to be concerned with is the eight Stack Registers. All x87 math calculations occur on the Stack Registers and, in particular, on Stack Register (0). That is to say, most calculations involve Stack(0) and one other Stack register or a memory operand. Here is a familiar x86 instruction that adds two values:

add ax, bx

the same instruction on the x87 would be:

FADD ST(0), ST(1)
or
FADD ST(1)

In the last example, when ST(0) is not stated, ST(0) is the default destination. All x87 instruction mnemonics begin with the "prefix" letter "F", such as FSUB, FMUL, FDIV, etc. For example:
• The instruction to load an integer onto the Stack is "FILD", for: (prefix) "F" Integer LoaD.
• The instruction to move a value from the Stack to a memory location as an integer is "FIST", for: (prefix) "F" Integer STore.

Once a value is loaded onto the x87 Stack, it is treated as a real number. That is, the x87 makes no distinction between integer and floating point numbers: all numbers are converted to 80-bit Real Numbers. I will refrain from getting into a discussion of exponents, mantissas and scientific notation. Suffice it to say that all numbers, once loaded onto the Stack, are converted to the same format, which Intel calls "Temporary Real".

The x87 FPU can load three types of numbers directly from memory: Integers, Floating Point and BCD (Binary Coded Decimal). Integers are stored in memory in the normal binary format, in three different sizes:

Word Integer:  [16 bits]
Short Integer: [.....32 bits.....]
Long Integer:  [..........64 bits..........]


Floating Point numbers are stored in a form of scientific notation in three different sizes:

Short Real: [.....32 bits.....]
Long Real:  [..........64 bits..........]
Float Real: [...............80 bits...............]

BCD is stored in a packed decimal format, with two decimal digits in each byte (it is not stored as ASCII characters):

Packed BCD: [...............80 bits...............]
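To make the packed decimal layout concrete, here is a small C sketch. It assumes the usual x87 arrangement: bytes 0 through 8 hold eighteen decimal digits, two per byte with the least significant pair first, and the top bit of byte 9 holds the sign. The function name pack_bcd is hypothetical and is not used anywhere in Bxbasm.

```c
#include <string.h>

/* Hypothetical helper: pack 'value' into a 10-byte packed-BCD image.
   Bytes 0..8 hold 18 decimal digits, two per byte, least significant
   pair first; bit 7 of byte 9 is the sign. Digits beyond the 18th
   are silently dropped in this sketch. */
void pack_bcd(long long value, unsigned char out[10])
{
    unsigned long long magnitude;
    int i;

    memset(out, 0, 10);
    if (value < 0) {
        out[9] = 0x80;                              /* negative */
        magnitude = 0ULL - (unsigned long long)value;
    } else {
        magnitude = (unsigned long long)value;
    }
    for (i = 0; i < 9 && magnitude != 0; i++) {
        unsigned low, high;

        low = (unsigned)(magnitude % 10);           /* low nibble  */
        magnitude /= 10;
        high = (unsigned)(magnitude % 10);          /* high nibble */
        magnitude /= 10;
        out[i] = (unsigned char)((high << 4) | low);
    }
}
```

For example, the value 1234 is stored as the bytes 34h, 12h followed by zeros, which is quite different from the four ASCII characters '1', '2', '3', '4'.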
For our purposes, we will be using the Integer and Real types of data storage. Except for the brand name on the package, the x86 and x87 are very different machines and bear no similarity. In fact, the x86 and x87 can't even communicate information or data between their registers (not absolutely true, but for practical purposes it is). For instance, there is no way to pass the contents of the x86's AX register (or any other general register, for that matter) to the x87's Stack. The only thing the x86 and x87 share in common is the same data space: RAM memory!

[Diagram: the x86 CPU and the x87 FPU are not connected to each other directly; both are connected to memory.]

If an x86 routine needs the result of an x87 computation to complete a task, the x86 has to signal the x87 to make the computation and then store the result in the shared data area, where the x86 can act on the resultant data. A mechanism exists whereby the x87 signals the x86 that the task is complete and the data is ready for use.

There are several idiosyncrasies about the x87 that you have to get used to:
• for one thing, unlike the x86, which can "push" registers onto the x86 memory stack until you are "blue-in-the-face" or run out of memory, the x87 has a hardware stack that is eight storage elements in length or, more simply put, eight stack spaces,
• the x87's eight Stack Registers and the x87's eight stack spaces are one and the same. You can never have more than eight (8) things on the x87's stack at any given time. There are no exceptions!
• if you attempt to load a ninth data item onto the Stack, you will lose the first data item that was loaded onto the Stack, as it will be pushed right off the Stack (the FPU flags this as a stack overflow),
• the x87's Stack is a "First-In Last-Out" (FILO) stack and each previous data item on the stack gets pushed "down" on the stack, from the Stack(0) position,
• unlike the x86's stack, where the top-of-stack's address changes with every push and pop, the x87's top-of-stack is always ST(0),
• there are no explicit "push" or "pop" instructions for the x87. Instead there are "Load" and "Store" instructions. Some "Store" and math instructions perform what can be considered a "pop".

If a math expression requires more than eight Stack elements, you have to handle the overflow manually, by moving things off the Stack to a temporary storage area in memory. Otherwise Stack items can be lost. You then have to move them back onto the Stack as the Stack empties.


For example, let's say you want to push three integer values onto the Stack, which are stored in memory locations mem1, mem2 and mem3:

x87 instruction:        x87 Stack
FILD mem1               ST(0):[ 100]
                        ST(1):[    ]
                        ST(2):[    ]

next, push mem2:

x87 instruction:        x87 Stack
FILD mem2               ST(0):[ 200]
                        ST(1):[ 100]
                        ST(2):[    ]

next, push mem3:

x87 instruction:        x87 Stack
FILD mem3               ST(0):[ 300]
                        ST(1):[ 200]
                        ST(2):[ 100]
                        ST(3):[    ]

With the limited Operand/Stack space available on the x87, performing math on the x87 is a careful ballet of managing the Stack Registers. If you do not intentionally "pop" the Stack frequently, you will fill it up. The x87 has a primary set of instructions, as well as a secondary set of instructions that "pop" the Stack after each operation.

Let's say we want to add those numbers we previously pushed onto the Stack and then store the result back in memory. Here are the steps using the primary set of instructions, each followed by an illustration of the activity on the Stack:

(x87 Stack as we left it):
                        x87 Stack
                        ST(0):[ 300]
                        ST(1):[ 200]
                        ST(2):[ 100]
                        ST(3):[    ]

FADD ST(1)              ; ST(0) = ST(0)+ST(1)
                        x87 Stack
                        ST(0):[ 500]
                        ST(1):[ 200]
                        ST(2):[ 100]
                        ST(3):[    ]

FADD ST(2)              ; ST(0) = ST(0)+ST(2)
                        x87 Stack
                        ST(0):[ 600]
                        ST(1):[ 200]
                        ST(2):[ 100]
                        ST(3):[    ]

Now, save the data:

FIST mem1               ; store result to memory location: mem1
                        x87 Stack
                        ST(0):[ 600]
                        ST(1):[ 200]
                        ST(2):[ 100]
                        ST(3):[    ]

As you can see, the Stack is still populated with three data items: the result and two of the operands. Now we will perform the same operations, but using the x87's secondary instruction set, which cleans up the Stack after each operation:

(x87 Stack as it started):
                        x87 Stack
                        ST(0):[ 300]
                        ST(1):[ 200]
                        ST(2):[ 100]
                        ST(3):[    ]

FADDP ST(1)             ; add ST(0) to ST(1) and POP: the sum is the new ST(0)
                        x87 Stack
                        ST(0):[ 500]
                        ST(1):[ 100]
                        ST(2):[    ]

FADDP ST(1)             ; add ST(0) to ST(1) and POP: the sum is the new ST(0)
                        x87 Stack
                        ST(0):[ 600]
                        ST(1):[    ]

Now, save the data:

FISTP mem1              ; store result to memory location: and POP Stack
                        x87 Stack
                        ST(0):[    ]
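The Stack activity illustrated above can also be modeled with a few lines of C. This is only a sketch for following along: the names fpu_stack, fpu_load, fpu_addp and fpu_store_pop are hypothetical, and the model shifts an array where the real hardware moves a top-of-stack pointer. It also simply refuses a ninth load instead of reproducing the hardware's overflow behavior.

```c
#define FPU_SLOTS 8

/* Hypothetical model of the x87 register stack. */
struct fpu_stack {
    double slot[FPU_SLOTS];     /* slot[0] plays the role of ST(0) */
    int    used;                /* how many slots hold data */
};

/* Remove ST(0); everything below moves up one position. */
static void fpu_pop(struct fpu_stack *f)
{
    int i;
    for (i = 0; i < f->used - 1; i++)
        f->slot[i] = f->slot[i + 1];
    f->used--;
}

/* Like FILD/FLD: push a value. Returns -1 if all eight slots are in use. */
int fpu_load(struct fpu_stack *f, double value)
{
    int i;
    if (f->used == FPU_SLOTS)
        return -1;
    for (i = f->used; i > 0; i--)
        f->slot[i] = f->slot[i - 1];
    f->slot[0] = value;
    f->used++;
    return 0;
}

/* Like FADDP ST(1): ST(1) = ST(1) + ST(0), then pop. */
int fpu_addp(struct fpu_stack *f)
{
    if (f->used < 2)
        return -1;
    f->slot[1] += f->slot[0];
    fpu_pop(f);
    return 0;
}

/* Like FISTP: store ST(0) to memory as an integer, then pop.
   The cast truncates; the real FPU rounds per its control word. */
int fpu_store_pop(struct fpu_stack *f, long *dest)
{
    if (f->used < 1)
        return -1;
    *dest = (long)f->slot[0];
    fpu_pop(f);
    return 0;
}
```

Loading 100, 200 and 300, calling fpu_addp() twice and then fpu_store_pop() leaves 600 in memory and an empty model stack, matching the last illustration.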

This time, the Stack is left empty and in a state that is ready for the next operation. You may have noticed that, when ST(0) is the destination of an operation, only the second operand needs to be stated. In other words, if you only state one operand, ST(0) is implied to be the "Destination Operand" by default. In the case of the Store instruction, however, ST(0) is always implied as the source register for the Store operation.
• "Loads" are always to the top of the Stack, ST(0), and
• "Stores" are always from the top of the Stack, ST(0).
Of course, for most other operations, two operands (register or memory) can be stated at any time.

The x87 instruction set has a complete complement to the x86 math instructions, plus a number of others as well. There is no need to use the x86 instructions for Integer Math and the x87 for Floating Point Math. For our purposes, since the x87 is generally faster, it may be beneficial to just use the x87 instruction set. Of course, in doing so, the programs written for the x87 will not run on older machines that are not equipped with the x87 co-processor. With the advent of the x486, the FPU was built directly into the x86 CPU chip, and since most machines are x486's, x586's or better, that is not a major issue these days. Machines that are less than an x486 (although they do exist) are clearly in the minority. Even so, there are Library Routines that can mimic the x87 instruction set on an x86, so we will not dwell on whether or not we should write exclusively for the x87.


QUIZ:
Consider the following expression. Remember, each successive expression within parens has to be pushed onto the Stack, because the innermost expression has to be performed first:

x = a*(b+(c-(d/(e*(f+(g-(h/(i*(j+(k-l))))))))))

How can an expression like this be performed using the x87?
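Before answering, it helps to see how deep the Stack would get if every operand were simply loaded in the order it appears. The following C sketch walks an expression the way a recursive descent parser does and counts the values that would be on the Stack at once. It is an illustration only: the function name max_stack_depth is hypothetical, it expects just the right-hand side of the assignment, and it does not give away the answer to the quiz.

```c
#include <ctype.h>

static const char *cursor;  /* current position in the expression text */
static int depth;           /* values currently on the simulated Stack */
static int max_depth;       /* deepest the Stack has been so far */

static void expression(void);

static void skip_spaces(void)
{
    while (*cursor == ' ')
        cursor++;
}

static void factor(void)
{
    skip_spaces();
    if (*cursor == '(') {
        cursor++;
        expression();
        skip_spaces();
        if (*cursor == ')')
            cursor++;
    } else if (isalnum((unsigned char)*cursor)) {
        while (isalnum((unsigned char)*cursor))
            cursor++;
        depth++;                    /* a load pushes one value */
        if (depth > max_depth)
            max_depth = depth;
    }
}

static void term(void)
{
    factor();
    skip_spaces();
    while (*cursor == '*' || *cursor == '/') {
        cursor++;
        factor();
        depth--;                    /* operate-and-pop leaves one value */
        skip_spaces();
    }
}

static void expression(void)
{
    term();
    skip_spaces();
    while (*cursor == '+' || *cursor == '-') {
        cursor++;
        term();
        depth--;
        skip_spaces();
    }
}

/* Returns the largest number of values on the Stack at any one time. */
int max_stack_depth(const char *text)
{
    cursor = text;
    depth = 0;
    max_depth = 0;
    expression();
    return max_depth;
}
```

For the right-hand side of the quiz expression this returns 12, which is four more than the x87 has room for, while a flat expression such as a+b+c never needs more than two Stack spaces.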

THE PARSER:
Having recently completed writing the Recursive Descent Parser (and a darn good one at that, thanks to Jack Crenshaw) you may be wondering: "how much rewriting will we have to do to make the parser work with the x87?" Well, very little, I'm happy to say. Since the architecture of the parser is pretty rock solid, all we need to do is substitute the x86 instructions with the x87 instructions. Also, since the x87 has instructions that perform a dual operation by "popping" the Stack in addition to the main instruction, there will actually be fewer instructions. Here is an example of x86 code that loads two operands into the registers, adds them together, pops the stack and then stores the result:

    mov ax, mem1            ; load data item
    push ax                 ; push onto stack
    mov ax, mem2            ; load data item
    pop bx                  ; pop stack to register
    add ax, bx              ; add values
    mov si, offset(mem3)    ; load memory address
    mov [si], ax            ; store data

Now, the same operation using the x87:

    FILD mem1
    FILD mem2
    FADDP ST(1)
    FISTP mem3

Not only is the process done in fewer steps,
• it is performed quicker,
• it can be used on floating point numbers and
• it can be used on extremely large multidigit numbers.

The x86 code shown above can only perform 16 bit integer math. You can do 32 bit integer math on the x86, but that almost doubles the number of steps involved. Using the instructions above, the x87 can perform 16 bit or 80 bit math without adding one single instruction.


CODING THE x87:
As illustrated by the above example, coding for the x87 can actually be a lot easier, once you know which x87 instructions to use. Using functions from our parser, here are the Add(), Subtract(), Multiply() and Divide() routines, written in the original x86 code and then followed by the respective x87 code that is a direct replacement:

*************************************ADD*************************************
x86 code:

void Add()                  /* Recognize and Translate an Add */
{
    Match('+');
    a_Term();
    writeln(" pop bx\t\t\t; restore value to add");
    writeln(" add ax, bx");
}
/*---------- end Add ----------*/

x87 code:

void Add()                  /* Recognize and Translate an Add */
{
    Match('+');
    a_Term();
    writeln(" FADDP ST(1)\t\t; add to ST(0) and pop");
}
/*---------- end Add ----------*/

*************************************SUBTRACT*************************************
x86 code:

void Subtract()             /* Recognize and Translate a Subtract */
{
    Match('-');
    a_Term();
    writeln(" pop bx\t\t\t; restore value to subtract");
    writeln(" sub ax, bx");
    writeln(" neg ax");
}
/*---------- end Subtract ----------*/

x87 code:

void Subtract()             /* Recognize and Translate a Subtract */
{
    Match('-');
    a_Term();
    writeln(" FSUBP ST(1), ST\t; subtract and pop");
}
/*---------- end Subtract ----------*/

*************************************MULTIPLY*************************************
x86 code:

void Multiply()             /* Recognize and Translate a Multiply */
{
    Match('*');
    a_Factor();
    writeln(" pop bx\t\t\t; restore value to multiply");
    writeln(" mul bx");
}
/*---------- end Multiply ----------*/

x87 code:

void Multiply()             /* Recognize and Translate a Multiply */
{
    Match('*');
    a_Factor();
    writeln(" FMULP ST(1)\t\t; multiply and pop");
}
/*---------- end Multiply ----------*/

*************************************DIVIDE*************************************
x86 code:

void Divide()               /* Recognize and Translate a Divide */
{
    Match('/');
    a_Factor();
    writeln(" pop bx\t\t\t; restore value to divide");
    writeln(" xchg ax, bx");
    writeln(" xor dx, dx\t\t; clear dx for integer divide");
    writeln(" div bx");
}
/*---------- end Divide ----------*/

x87 code:

void Divide()               /* Recognize and Translate a Divide */
{
    Match('/');
    a_Factor();
    writeln(" FDIVP ST(1)\t\t; divide and pop");
}
/*---------- end Divide ----------*/
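The four x87 routines above differ only in the mnemonic they write out. As a side illustration, that relationship can be captured in a small table-style lookup. The sketch below is not part of Bxbasm: the names x87_pop_op and emit_x87_binary are hypothetical, and it writes into a character buffer instead of calling writeln(). It produces the load, operate, store sequence for a single integer operation, using the same operand spellings as the routines above.

```c
#include <stdio.h>

/* Hypothetical lookup: the "operate and pop" line for each operator,
   spelled the way the x87 versions of the parser routines write it. */
static const char *x87_pop_op(char op)
{
    switch (op) {
    case '+': return "FADDP ST(1)";
    case '-': return "FSUBP ST(1), ST";
    case '*': return "FMULP ST(1)";
    case '/': return "FDIVP ST(1)";
    default:  return NULL;
    }
}

/* Hypothetical emitter for: dest = left op right (16-bit integers).
   Returns 0 on success, -1 for an unknown operator or a short buffer. */
int emit_x87_binary(char *out, size_t size, const char *left,
                    char op, const char *right, const char *dest)
{
    const char *mnemonic = x87_pop_op(op);
    int n;

    if (mnemonic == NULL)
        return -1;
    n = snprintf(out, size, " FILD %s\n FILD %s\n %s\n FISTP %s\n",
                 left, right, mnemonic, dest);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Calling emit_x87_binary(buf, sizeof buf, "mem1", '+', "mem2", "mem3") fills the buffer with the four instructions FILD mem1, FILD mem2, FADDP ST(1) and FISTP mem3, one per line.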

In the x86 versions, each function ranged from two lines to as many as four lines of x86 code, while the x87 version has reduced each function to a single line of x87 code. You can now clearly see the immediate benefit of using the x87 in the math parser. As illustrated above, the x86 was designed as a general purpose machine, and that means that several steps may be required to perform a specific task. On the other hand, the x87 was designed for the specific task of loading numbers from memory directly into the math registers, performing the math and then storing the result directly back into memory. This greatly reduces the amount of guesswork and ambiguity that can exist when using the x86 for performing math. With the x87, the process is straightforward:
• load the operands,
• do the math,
• store the result.

To see just how the x87 works, let's compile an Assembly Language test program that calculates this expression:

result = var1 * (var2 + (var4 - var3))
Here is the Assembler code. Copy this to your text editor and save it as "Test.asm", in the same directory that contains both A86.com and D86.com.

;[begin]
; **************************
    jmp START
;
var1I   DW 10
var2F   DD .123
var3F   DD 1.5
var4I   DW 2
resultD DQ 0
; --------------------------------------------------
; --------------------------------------------------
START PROC NEAR
; --------------------------------------------------
; --------------------------------------------------
    FINIT               ; initialize x87
    FILD var1I          ; load integer
    FLD var2F           ; load float
    FILD var4I          ; load integer
    FLD var3F           ; load float
    FSUBP ST(1), ST     ; subtract and pop
    FADDP ST(1)         ; add to ST(0) and pop
    FMULP ST(1)         ; multiply and pop
    FSTP resultD        ; store float, pop stack
    jmp DONE
;
DONE:
    INT 20H
START ENDP
; --------------------------------------------------
;[end]

• Now, drop down to the DOS Prompt and enter: A86 Test.asm
  This will assemble the source code into "Test.com".

Now we will use the D86 debugger to single step through the instructions:
• again, at the DOS Prompt, enter: D86 Test
• as soon as D86 is loaded, press the F10 key, on the top row of the keyboard. This opens the x87 window.

What you see in the upper right portion of the screen are the eight x87 Stack Registers and, beneath them, the x87 flags and special purpose registers. Now, we will single step through the program:
• press the F1 key once. The instruction pointer moves to "START" and the first x87 instruction, FINIT, which initializes the x87,
• keep pressing the F1 key, one step at a time, so that you can single step the program,
• single step up to the DONE: label, but don't go beyond that point.

As you saw, each FLD and FILD pushed a value onto the Stack. Then the FSUBP, FADDP and FMULP instructions were executed. The final instruction, FSTP, saved the result to memory. That's how the x87 works. Pretty neat, huh?

Now, press the HOME key once. This will take you back to the very first "JMP" instruction, where you can repeat the process. Repeat this several times until you get a feel for what is going on. Here is the equation as executed by the x87:

((var4 - var3) + var2) * var1 = result
((2 - 1.5) + .123) * 10 = 6.23
When you are ready to quit, press the letter "Q" and press "Enter".
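If you would like to check the value that D86 shows in resultD, the same sequence of operations can be written out in C. This is just a cross-check, with the hypothetical function name run_test_program: it uses C's double type where the x87 works in 80-bit Temporary Real, so the last few digits may differ slightly from what the debugger displays.

```c
/* Mirrors Test.asm one step at a time; st0 stands in for ST(0). */
double run_test_program(void)
{
    int    var1 = 10;       /* var1I DW 10   */
    float  var2 = 0.123f;   /* var2F DD .123 */
    float  var3 = 1.5f;     /* var3F DD 1.5  */
    int    var4 = 2;        /* var4I DW 2    */
    double st0;

    st0 = (double)var4 - (double)var3;  /* FSUBP: 2 - 1.5     */
    st0 = st0 + (double)var2;           /* FADDP: 0.5 + .123  */
    st0 = st0 * (double)var1;           /* FMULP: 0.623 * 10  */
    return st0;                         /* FSTP resultD       */
}
```

The returned value is approximately 6.23. It is not exactly 6.23 because .123 cannot be represented exactly in a 32-bit float.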


THE CODE:
Since the x87 has the capability to work with three different sizes of integers (16-bit, 32-bit and 64-bit) and three sizes of floating point numbers (32-bit, 64-bit and 80-bit), we will begin by building the input routines that will translate them from Basic Script into meaningful Assembly Code. Each of the integer and floating point data types can be represented by variable names with the additional use of a data type symbol:

symbol:   data type:               size:
none      16 bit integer           DW
%         32 bit integer           DD
~         64 bit integer           DQ
!         32 bit floating point    DD
#         64 bit floating point    DQ

For the time being we will not assign the 80 bit floating point data type. In a normal algebraic expression the destination for the result is on the left hand side of the equation, while the mathematical expression is on the right side. Such as:

result = term1 + term2

Using the above expression, the first thing we need to do is determine what the destination's data type and storage size are, so that we can output the correct data size when we declare the variables in the assembly code. In the above expression, no data type symbol is given for the result, therefore we can conclude that it is a 16 bit integer and would be written to the output file as:

result    DW    0

Actually, since we have already begun appending a variable type identifier to the variable names (in the case of integers it is the capital letter "I"), we need to add that to the variable definition as well:

resultI   DW    0

For the other data types we will need to add an identifier letter as well. Here is the list of identifiers that we will use for each data type:

Letter:   symbol:   data type:               size:
I         none      16 bit integer           DW
L         %         32 bit integer           DD
Q         ~         64 bit integer           DQ
F         !         32 bit floating point    DD
D         #         64 bit floating point    DQ

In usage, here is how the various numeric data types will be written to the '.asm' file:

SixteenBitIntegerI      DW  ?
ThirtytwoBitIntegerL    DD  ?
SixtyfourBitIntegerQ    DQ  ?
ThirtytwoBitFloatF      DD  ?
SixtyfourBitFloatD      DQ  ?
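The table above maps naturally onto a small lookup structure. The following C sketch shows one way to express it. Note that this is not how Bxbasm itself does it (the compiler uses the if/else chains shown later in this chapter), and the names num_type, find_num_type and format_declaration are hypothetical.

```c
#include <stdio.h>

/* Hypothetical table: one row per numeric data type. */
struct num_type {
    char        symbol;     /* Basic type symbol, 0 when there is none */
    char        letter;     /* identifier letter appended to the name  */
    const char *directive;  /* assembler storage directive             */
};

static const struct num_type num_types[] = {
    { 0,   'I', "DW" },     /* 16 bit integer        */
    { '%', 'L', "DD" },     /* 32 bit integer        */
    { '~', 'Q', "DQ" },     /* 64 bit integer        */
    { '!', 'F', "DD" },     /* 32 bit floating point */
    { '#', 'D', "DQ" }      /* 64 bit floating point */
};

/* Returns the table row for a type symbol, or NULL if it is unknown. */
const struct num_type *find_num_type(char symbol)
{
    size_t i;
    for (i = 0; i < sizeof num_types / sizeof num_types[0]; i++) {
        if (num_types[i].symbol == symbol)
            return &num_types[i];
    }
    return NULL;
}

/* Builds a declaration line such as "resultI<TAB>DW 0" in 'out'.
   Returns 0 on success, -1 for an unknown symbol or a short buffer. */
int format_declaration(char *out, size_t size, const char *name, char symbol)
{
    const struct num_type *t = find_num_type(symbol);
    int n;

    if (t == NULL)
        return -1;
    n = snprintf(out, size, "%s%c\t%s 0", name, t->letter, t->directive);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With this table, adding the 80 bit floating point type later would mean adding one row rather than another branch.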


Let's get started putting together the source code we will be needing. First we will make some changes to some existing code and then add some brand new code to handle our new data types. In file: Asminput.c, replace the existing version of function ScanLet(), with this version here:

void ScanLet()
{
    char ch, varname[VAR_NAME];
    int pi, type, ab_code=5, x=line_ndx;

    pi = 0;
    e_pos = pi;
    /* --- get varname --- */
    strcpy(varname, get_varname());
    type = get_vtype(pi);               /* get variable type */
    /* --- process numeric expressions --- */
    if(type == 1)
    {
        ScanNumeric(varname);
    }
    /* --- string --- */
    else if(type == 3)
    {
        ScanString(varname);
    }
    else
    {
        a_bort(ab_code, x);
    }
}
/*---------- end ScanLet ----------*/

You will notice that function ScanLet() now calls two new functions: ScanNumeric() and ScanString(). Now add the new function ScanString() as well:

void ScanString(char *Name)
{
    char ch, varname[VAR_NAME];
    int pi, ab_code=8, x=line_ndx;

    strcpy(varname, Name);
    strcat(varname, "$");
    if(check_VarName(varname) == 0)
    {
        write_str(varname);
        writeln("\tDB 0");
        writeln("\t\tDW 0");
    }
    pi = e_pos;
    pi++;
    ch = p_string[pi];
    pi = get_rightExp(pi);
    ch = p_string[pi];
    while(ch != '\n')
    {
        if(ch == '\"')                  /* found a quoted string */
        {
            e_pos = pi;
            GetNewVar();                /* get new varname */
            pi = e_pos;
        }
        else if(isalpha(ch))            /* skip over varnames */
        {
            pi = get_NextOp(pi);
        }
        else if(isdigit(ch))            /* error: digits */
        {
            a_bort(ab_code, x);
        }
        pi = get_rightExp(pi);
        ch = p_string[pi];
    }
}
/*---------- end ScanString ----------*/

Save file: Asminput.c and close it. Function ScanString() does pretty much the same thing as the string portion of the previous version of ScanLet(). Now create a new file and name it: AsmNInpt.c. Copy this header to the top of the file:

/* bxbasm : AsmNInpt.c : alpha version */
/* ----- function prototypes ----- */
#include "prototyp.h"
[snip]

Now copy the following functions to that same file:

void ScanNumeric(char *name)
{
    char ch, varname[VAR_NAME];
    int pi, type, len, ab_code=5, x=line_ndx;

    strcpy(varname, name);
    pi = e_pos;
    len = strlen(varname);
    s_pos = pi - len;
    /* --- get var-type length --- */
    type = get_Nvtype(pi);
    /* --- variable declarations --- */
    Sav_Destin(type, varname);          /* save destination name */
    /* --- handle expressions --- */
    pi = e_pos;
    len = strlen(p_string);
    while(pi < len)
    {
        ch = p_string[pi];
        while((isalnum(ch) == 0) && (pi < len) && (ch != '.'))
        {
            pi++;
            ch = p_string[pi];
        }
        if(pi < len)
        {
            e_pos = pi;
            if((isdigit(ch)) || (ch == '.'))
            {
                GetNewTmp();            /* get new temp varname */
            }
            else if(isalpha(ch))        /* a variable name */
            {
                AddNewVar();
            }
            pi = e_pos;
            len = strlen(p_string);
        }
    }
}
/*---------- end ScanNumeric ----------*/

Function ScanNumeric(), having its roots in what was ScanLet(), has been redesigned to make use of the new numeric data types. Notice this segment of code:

[snip]
    /* --- get var-type length --- */
    type = get_Nvtype(pi);
[snip]

Here a call is made to the new function get_Nvtype(), where the specific numeric data type is determined.

void Sav_Destin(int type, char *Name)
{
    static char varname[VAR_NAME];
    int ab_code=5, x=line_ndx;

    strcpy(varname, Name);
    /* --- variable declarations --- */
    if(type == 1)                       /* 16-Bit Word Integer */
    {
        strcat(varname, "I");
        if(check_VarName(varname) == 0)
        {
            write_str(varname);
            writeln("\tDW 0");
        }
    }
    else if(type == 2)                  /* 32-Bit Long Integer */
    {
        strcat(varname, "L");
        if(check_VarName(varname) == 0)
        {
            write_str(varname);
            writeln("\tDD 0");
        }
    }
    else if(type == 3)                  /* 32-Bit Floating Point */
    {
        strcat(varname, "F");
        if(check_VarName(varname) == 0)
        {
            write_str(varname);
            writeln("\tDD 0");
        }
    }
    else if(type == 4)                  /* 64-Bit Double Float */
    {
        strcat(varname, "D");
        if(check_VarName(varname) == 0)
        {
            write_str(varname);
            writeln("\tDQ 0");
        }
    }
    else if(type == 5)                  /* 64-Bit Quad Integer */
    {
        strcat(varname, "Q");
        if(check_VarName(varname) == 0)
        {
            write_str(varname);
            writeln("\tDQ 0");
        }
    }
    else
    {
        a_bort(ab_code, x);
    }
    /* --- save destination name --- */
    StorNVar(varname);
}
/*---------- end Sav_Destin ----------*/

In function Sav_Destin() the data type's size is written to the .asm file.


void AddNewVar()
{
    char ch, varname[VAR_NAME];
    int pi, type;

    pi = e_pos;
    s_pos = pi;
    /* --- get varname --- */
    strcpy(varname, get_varname());
    pi = e_pos;
    /* --- get variable type --- */
    type = get_Nvtype(pi);
    if(type == 1)
    {
        strcat(varname, "I");
    }
    else if(type == 2)
    {
        strcat(varname, "L");
    }
    else if(type == 3)
    {
        strcat(varname, "F");
    }
    else if(type == 4)
    {
        strcat(varname, "D");
    }
    else if(type == 5)
    {
        strcat(varname, "Q");
    }
    /* --- save varname in array --- */
    StorNVar(varname);
}
/*---------- end AddNewVar ----------*/

void GetNewTmp()
{
    char ch, NVar[12], a_string[BUFSIZE];
    int pi, x=0, dotflag=0;
    long itest=0;
    double ftest=0;

    pi = e_pos;
    s_pos = pi;
    /* --- get new varname --- */
    strcpy(NVar, NewFVarname());
    /* --- get numeric value --- */
    ch = p_string[pi];
    while(isdigit(ch) || (ch == '.'))   /* copy digits */
    {
        if(ch == '.')
        {
            dotflag = 1;                /* flag: floating point number */
        }
        a_string[x] = ch;
        x++;
        pi++;
        ch = p_string[pi];
    }
    a_string[x] = '\0';
    e_pos = pi;
    /* --- integers --- */
    if(dotflag == 0)
    {
        itest = atol(a_string);
        if(itest <= 32767)
        {
            strcat(NVar, "I");
            write_str(NVar);            /* NVar DW integer */
            write_str("\tDW ");
        }
        else if(itest <= 2147483647)
        {
            strcat(NVar, "L");
            write_str(NVar);            /* NVar DD long */
            write_str("\tDD ");
            strcat(NVar, "%");
        }
        else
        {
            strcat(NVar, "Q");
            write_str(NVar);            /* NVar DQ quad */
            write_str("\tDQ ");
            strcat(NVar, "~");
        }
    }
    /* --- floats --- */
    else
    {
        ftest = atof(a_string);
        if(ftest <= 2147483647)
        {
            strcat(NVar, "F");
            write_str(NVar);            /* NVar DD float */
            write_str("\tDD ");
            strcat(NVar, "!");
        }
        else
        {
            strcat(NVar, "D");
            write_str(NVar);            /* NVar DQ double */
            write_str("\tDQ ");
            strcat(NVar, "#");
        }
    }
    writeln(a_string);                  /* write temp declaration */
    /* --- save varname in array --- */
    StorNVar(NVar);
}
/*---------- end GetNewTmp ----------*/

Function GetNewTmp() matches the value of a variable with the correct data type and size and then writes the data type to the .asm source file.
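The size test at the heart of GetNewTmp() can be looked at on its own. The sketch below is a standalone C illustration using the same thresholds. Note two assumptions: the function name literal_suffix is hypothetical, and it uses the C99 type long long with strtoll(), where GetNewTmp() uses long with atol(). On a compiler where long is only 32 bits wide, atol() cannot return a value above 2147483647, so the 64-bit branch would never be taken there.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the identifier letter for a numeric
   constant written as text, using the thresholds from GetNewTmp(). */
char literal_suffix(const char *text)
{
    if (strchr(text, '.') == NULL) {
        /* --- integers --- */
        long long value = strtoll(text, NULL, 10);

        if (value <= 32767LL)
            return 'I';             /* fits in DW */
        if (value <= 2147483647LL)
            return 'L';             /* fits in DD */
        return 'Q';                 /* needs DQ   */
    } else {
        /* --- floats --- */
        double value = strtod(text, NULL);

        return (value <= 2147483647.0) ? 'F' : 'D';
    }
}
```

For example, "10" yields 'I', "40000" yields 'L', "3000000000" yields 'Q', "1.5" yields 'F' and "5000000000.5" yields 'D'.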


void StorNVar(char *name)
{
    char ch, varname[12];
    int pi, x=0, ii, len;
    unsigned size;

    strcpy(varname, name);
    len = strlen(varname);
    ii = (s_pos + len);                 /* how many chars to xfer */
    if(s_pos > 0)
    {
        for(x=0; x < s_pos; x++)
        {
            s_holder[x] = p_string[x];
        }
        x = 0;
    }
    for(pi=s_pos; pi < ii; pi++,x++)
    {
        s_holder[pi] = varname[x];      /* copy varname */
    }
    x = pi;
    s_pos = pi;
    pi = e_pos;
    ch = p_string[pi];
    while(ch != '\0')
    {
        s_holder[x] = p_string[pi];
        x++;
        pi++;
        ch = p_string[pi];
    }
    s_holder[x] = '\0';
    strcpy(p_string, s_holder);
    /* --- save p_string to array --- */
    x = strlen(p_string);
    len = strlen(array1[line_ndx]);
    if(x > len)
    {
        size = (x+1);
        ii = line_ndx;
        array1[ii] = realloc(array1[ii], size * sizeof(char));
    }
    strcpy(array1[line_ndx], p_string); /* store to array */
    e_pos = s_pos;
}
/*---------- end StorNVar ----------*/

Function StorNVar() replaces the line of program code stored in memory (array1[n]) with the newly revised version as it pertains to the current renamed variable.
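Stripped of the global variables, what StorNVar() does is a string splice: keep everything before the old name or constant, insert the new name, then append everything after it. Here is a minimal standalone sketch of that operation in C. The function name splice_name and its parameters are hypothetical, and unlike StorNVar() it works on its arguments only and checks the size of the output buffer.

```c
#include <string.h>

/* Hypothetical helper: copy 'line' into 'out', replacing the characters
   at positions start..end-1 with 'name'. Returns 0 on success, or -1 if
   the positions are invalid or 'out' is too small. */
int splice_name(char *out, size_t size, const char *line,
                size_t start, size_t end, const char *name)
{
    size_t line_len = strlen(line);
    size_t name_len = strlen(name);
    size_t tail_len;

    if (start > end || end > line_len)
        return -1;
    tail_len = line_len - end;
    if (start + name_len + tail_len + 1 > size)
        return -1;

    memcpy(out, line, start);                       /* text before   */
    memcpy(out + start, name, name_len);            /* the new name  */
    memcpy(out + start + name_len, line + end,
           tail_len + 1);                           /* rest and '\0' */
    return 0;
}
```

For instance, replacing positions 4 and 5 (the constant "10") of the line "x = 10 + y" with the name "VARX1I" produces "x = VARX1I + y".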


char *NewFVarname()                     /* generate a unique varname */
{
    char Val[6];
    static char NVar[12];

    var_count++;
    strcpy(NVar, "VARX");
    sprintf(Val, "%d", var_count);
    strcat(NVar, Val);
    return NVar;
}
/*---------- end NewFVarname ----------*/

That is the end of file 'AsmNInpt.c'. Save the file and close it. Now open file: Asmutils.c, and add these new functions:

int get_Nvtype(int pi)
{
    char ch;
    int type=0;

    ch = p_string[pi];
    while(isalnum(ch))
    {
        pi++;
        ch = p_string[pi];
    }
    if(strchr(" )=+*-/\n", ch))
    {
        type = 1;           /* a 16 Bit Word Integer */
    }
    else if(ch == '%')
    {
        type = 2;           /* a 32 Bit Long Integer */
    }
    else if(ch == '!')
    {
        type = 3;           /* a 32 Bit Floating Point */
    }
    else if(ch == '#')
    {
        type = 4;           /* a 64 Bit Double Float */
    }
    else if(ch == '~')
    {
        type = 5;           /* a 64 Bit Long Integer */
    }
    return type;
}
/*------- end get_Nvtype --------*/


int get_rightExp(int pi)                /* get to rightside expression */
{
    char ch;

    ch = p_string[pi];
    while((isalnum(ch) == 0) && (ch != '\"') && (ch != '\n'))
    {
        pi++;
        ch = p_string[pi];
    }
    return pi;
}
/*---------- end get_rightExp ----------*/

Save file: Asmutils.c and close it. In file: Bxbasm.c, modify the "function includes" section to read as follows:

/* --- function includes --- */
#include "input.c"
#include "utility.c"
#include "asminput.c"
#include "asmutils.c"
#include "asmfunct.c"
#include "asmerror.c"
#include "asmvars.c"
#include "asmrdp.c"
#include "asmninpt.c"

Save file: Bxbasm.c , and close it. Now open file: Asmvars.c, and copy these new functions into that file:

/* variable types:
      int          = no sym     16-bit
      long int     = %          32-bit
      float        = !          32-bit
      double float = #          64-bit
      Dlong int    = ~          64-bit
      string       = $          byte
*/

void Do_let()
{   char ch, varname[VAR_NAME];
    int pi, stlen, ndx=0, type;
    int ab_code=6, x=line_ndx;

    stlen = strlen(p_string);
    pi = e_pos;

    /* --- retrieve variable name from statement --- */
    pi = get_alpha(pi, stlen);
    if(pi == stlen)                       /* error: didn't find it */
    {   a_bort(ab_code, x);
    }
    e_pos = pi;
    strcpy(varname, get_varname());
    type = get_vtype(pi);                 /* get variable type */

    /* --- we now have varname and type --- */
    if(type == 3)
    {   copy_string(varname);             /* string variable assignment */
    }
    else
    {   type = get_Nvtype(pi);            /* get var-type length */
        /* --- floating point --- */
        if((type == 3) || (type == 4))
        {   asn_float(type,varname);
        }
        /* --- integers --- */
        else if(type != 0)
        {   asn_integer(type,varname);
        }
        else
        {   a_bort(ab_code, x);
        }
    }
}
/*---------- end Do_let ----------*/

void asn_integer(int type, char *name)
{   char ch, varname[VAR_NAME];
    int pi;

    pi = e_pos;
    pi = iswhite(pi);
    e_pos = pi;
    strcpy(varname, name);

    if((type == 2) || (type == 5))        /* 32/64-Bit Long Integer */
    {   pi++;                             /* skip over 'type' symbol */
        pi = iswhite(pi);
        e_pos = pi;
    }
    /* --- call: recursive descent parser --- */
    Assignment();
    /* --- store data --- */
    write_str(" FISTP ");
    write_str(varname);
    writeln("\t\t; store integer, pop stack");
}
/*---------- end asn_integer ----------*/

void asn_float(int type, char *name) { char ch, varname[VAR_NAME]; int pi; pi = e_pos; pi = iswhite(pi); e_pos = pi; strcpy(varname, name); pi++; /* skip over 'type' symbol */ pi = iswhite(pi); e_pos = pi; /* --- call: recursive descent parser --- */ Assignment(); /* --- store data --- */ write_str(" FSTP "); write_str(varname); writeln("\t\t; store float, pop stack"); } /*---------- end asn_float ----------*/

Save file: Asmvars.c, and close it. Now create a new file and name it: Asmrdp.c. Copy this header to the top of the file:

/* bxbasm : Asmrdp.c : alpha version */ /* special credits to: Jack Crenshaw's "How to Build a Compiler" */ /* ----- function prototypes ----- */ #include "prototyp.h"

Now copy these functions into file: Asmrdp.c:

void Assignment() /* Recursive Descent Parser Main */ { writeln(" FINIT \t\t\t; initialize x87"); Match('='); a_Expression(); } /*---------- end Assignment ----------*/

void a_Expression()                       /* Parse and Translate an Expression */
{   char ch;
    int pi;

    pi = e_pos;
    ch = p_string[pi];
    if(IsAddop(ch))
    {   writeln(" FLDZ\t\t\t\t; push 0 onto stack");
    }
    else
    {   a_Term();
        pi = e_pos;
        ch = p_string[pi];
    }
    while(IsAddop(ch))
    {   switch(ch)
        {   case '+':
                Add();
                break;
            case '-':
                Subtract();
                break;
            default:
                break;
        }
        pi = e_pos;
        ch = p_string[pi];
    }
}
/*---------- end a_Expression ----------*/

void a_Term()                             /* Parse and Translate a Math Term */
{   char ch;
    int pi;

    a_Factor();
    pi = e_pos;
    ch = p_string[pi];
    while(IsMultop(ch))
    {   switch(ch)
        {   case '*':
                Multiply();
                break;
            case '/':
                Divide();
                break;
            default:
                break;
        }
        pi = e_pos;
        ch = p_string[pi];
    }
}
/*---------- end a_Term ----------*/

void a_Factor()                           /* Parse and Translate a Math Factor */
{   char ch, Value[10];
    int pi, ab_code=7, x=line_ndx;

    pi = e_pos;
    ch = p_string[pi];
    if(ch == '(')
    {   Match('(');
        a_Expression();
        Match(')');
    }
    else if(isalpha(ch))                  /* variable name */
    {   Ident();
    }
    else
    {   a_bort(ab_code, x);               /* error */
    }
}
/*---------- end a_Factor ----------*/

void Ident()                              /* Parse and Translate an Identifier */
{   char ch, Name[VAR_NAME];
    int pi, type, ab_code=5, x=line_ndx;

    strcpy(Name, get_varname());
    pi = e_pos;
    /* --- get variable type --- */
    type = get_Nvtype(pi);
    SkipWhite();
    pi = e_pos;
    ch = p_string[pi];
    if(type != 1)
    {   pi++;
        e_pos = pi;
        SkipWhite();
        pi = e_pos;
        ch = p_string[pi];
    }
    if(ch == '(')                         /* function call */
    {   Match('(');
        Match(')');
        write_str(" call ");
        writeln(Name);
    }
    /* --- load integers --- */
    else if((type == 1) || (type == 2) || (type == 5))
    {   write_str(" FILD ");
        write_str(Name);
        writeln("\t\t; load integer");
    }
    /* --- load floating pt. --- */
    else if((type == 3) || (type == 4))
    {   write_str(" FLD ");
        write_str(Name);
        writeln("\t\t; load float");
    }
    else
    {   a_bort(ab_code, x);
    }
}
/*---------- end Ident ----------*/

void Add() /* Recognize and Translate an Add */ { Match('+'); a_Term(); writeln(" FADDP ST(1)\t\t; add to ST(0) and pop"); } /*---------- end Add ----------*/

void Subtract() /* Recognize and Translate a Subtract */ { Match('-'); a_Term(); writeln(" FSUBP ST(1), ST\t; subtract and pop"); } /*---------- end Subtract ----------*/

void Multiply() /* Recognize and Translate a Multiply */ { Match('*'); a_Factor(); writeln(" FMULP ST(1)\t\t; multiply and pop"); } /*---------- end Multiply ----------*/

void Divide() /* Recognize and Translate a Divide */ { Match('/'); a_Factor(); writeln(" FDIVP ST(1)\t\t; divide and pop"); } /*---------- end Divide ----------*/
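If you would like to watch the parser work without building the whole compiler, the sketch below is a cut-down, stand-alone version of the same idea. It is not the Bxbasic code: the function names (translate, expression, term, factor, emit) are made up for this example, the output goes to a text buffer instead of the .asm file, and it assumes only single spaces as white space, variable names with the suffixes none, %, ! and #, and no unary minus or function calls.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static const char *src;          /* input cursor                      */
static char out[1024];           /* emitted assembly text (a sketch)  */

static void emit(const char *op, const char *arg)
{
    size_t used = strlen(out);
    snprintf(out + used, sizeof out - used, "%s %s\n", op, arg);
}

static void skip_white(void) { while (*src == ' ') src++; }

static void expression(void);

static void factor(void)         /* '(' expression ')' or a variable */
{
    skip_white();
    if (*src == '(') {
        src++;
        expression();
        skip_white();
        if (*src == ')') src++;
    } else if (isalpha((unsigned char)*src)) {
        char name[32];
        size_t n = 0;
        char tag = 'I';          /* no suffix: 16-bit integer */

        while (isalnum((unsigned char)*src) && n < sizeof name - 2)
            name[n++] = *src++;
        if (*src == '%') tag = 'L';
        else if (*src == '!') tag = 'F';
        else if (*src == '#') tag = 'D';
        if (tag != 'I') src++;   /* skip over the type symbol */
        name[n++] = tag;         /* IntegerI, LongL, FloatF, DoubleD */
        name[n] = '\0';
        emit((tag == 'F' || tag == 'D') ? "FLD" : "FILD", name);
    }
}

static void term(void)           /* factor { ('*' | '/') factor } */
{
    factor();
    for (;;) {
        skip_white();
        if (*src == '*')      { src++; factor(); emit("FMULP", "ST(1)"); }
        else if (*src == '/') { src++; factor(); emit("FDIVP", "ST(1)"); }
        else break;
    }
}

static void expression(void)     /* term { ('+' | '-') term } */
{
    term();
    for (;;) {
        skip_white();
        if (*src == '+')      { src++; term(); emit("FADDP", "ST(1)"); }
        else if (*src == '-') { src++; term(); emit("FSUBP", "ST(1), ST"); }
        else break;
    }
}

/* Translate one right-hand-side expression; returns the emitted text. */
const char *translate(const char *text)
{
    src = text;
    out[0] = '\0';
    expression();
    return out;
}
```

Calling translate("Double# * (Float! + (Long% - Integer))") yields four loads followed by FSUBP, FADDP and FMULP, in that order. Each operator is emitted only after both of its operands have been pushed, which is why the loads all come first.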

Save file: Asmrdp.c, and close it. Now open file: Input.c, and make the following changes to these functions:

void tmp_byte(int ii)
{   char ch;
    int pi, si, byte;
    int x=ii, ab_code=4;

    /* ----- fill temp_byte[] here ----- */
    pi = e_pos;
    pi = iswhite(pi);
    ch = p_string[pi];
    if(ch == '\'')                        /* it's a comment */
    {   byte = 0;
        strcpy(temp_prog[ii], "\n\0");
    }
    else
    {   if(isupper(ch))                   /* is this a keyword */
        {   e_pos = pi;
            byte = get_byte(ii);          /* call get_byte */
            pi = e_pos;
        }
        else if(isalpha(ch))              /* a possible assignment */
        {   si = pi;                      /* save pointer position */
            while(isalnum(ch))
            {   pi++;
                ch = p_string[pi];
            }
            pi = iswhite(pi);
            ch = p_string[pi];
            if(strchr("=#%!$~", ch))      /* a variable assignment */
            {   byte = 1;
                get_MOD(pi);              /* scan for a MOD expression */
                pi = si;
            }
            else
            {   a_bort(ab_code, x);       /* not an assignment */
            }
        }
        else
        {   a_bort(ab_code, x);           /* not a keyword or variable */
        }
    }
    temp_byte[ii] = byte;
    e_pos = pi;
}
/*---------- end tmp_byte ----------*/

int get_byte(int ii)
{   char ch, keyword[TOKEN_LEN];
    int pi, si=0, byte;
    int x=ii, ab_code=4;

    pi = e_pos;
    ch = p_string[pi];
    while(isalnum(ch))
    {   keyword[si] = ch;
        si++;
        pi++;
        ch = p_string[pi];
    }
    keyword[si] = '\0';

    /* --- assign byte code --- */
    if(strcmp(keyword, "REM") == 0)          byte=0;
    else if(strcmp(keyword, "LET") == 0)
    {   byte=1;
        get_MOD(pi);                      /* scan for a MOD expression */
    }
    else if(strcmp(keyword, "CLEAR") == 0)   byte=2;
    else if(strcmp(keyword, "LOCATE") == 0)  byte=3;
    else if(strcmp(keyword, "PRINT") == 0)   byte=4;
    else if(strcmp(keyword, "GOTO") == 0)    byte=5;
    else if(strcmp(keyword, "BEEP") == 0)    byte=6;
    else if(strcmp(keyword, "CLS") == 0)     byte=7;
    else if(strcmp(keyword, "END") == 0)     byte=8;
    else if(strcmp(keyword, "GOSUB") == 0)   byte=9;
    else if(strcmp(keyword, "RETURN") == 0)  byte=10;
    else if(strcmp(keyword, "FOR") == 0)     byte=11;
    else if(strcmp(keyword, "NEXT") == 0)    byte=12;
    else if(strcmp(keyword, "IF") == 0)      byte=13;
    else if(strcmp(keyword, "ELSEIF") == 0)  byte=14;
    else if(strcmp(keyword, "ELSE") == 0)    byte=15;
    else if(strcmp(keyword, "ENDIF") == 0)   byte=16;
    else if(strcmp(keyword, "INPUT") == 0)   byte=17;
    else if(strcmp(keyword, "LINE") == 0)    byte=18;
    else
    {   pi = iswhite(pi);
        ch = p_string[pi];
        if(strchr("=#%!$~", ch))          /* a variable assignment */
        {   byte = 1;
            get_MOD(pi);                  /* scan for a MOD expression */
            pi = e_pos;                   /* push pointer back */
        }
        else
        {   a_bort(ab_code, x);           /* not a keyword or variable */
        }
    }
    e_pos = pi;
    return byte;
}
/*---------- end get_byte ----------*/

Save file: Input.c, and close it. Now make these changes and additions to file: Prototyp.h:

/* Asminput.c */
void Header(void);
void open_destin(void);
void Prolog(void);
void ScanVars(void);
int  check_VarName(char *);
void InitVars(void);
void GetNewVar(void);
void StorVar(char *);
char *NewVarname(void);
void Epilog(void);
void assemble(void);
void ScanLet(void);
void ScanPrint(void);
void SystemConst(void);
void ScanString(char *);

/* Asmutils.c */
void writeln(char *);
void write_str(char *);
void PostLabel(char *);
/* char *get_varname(void);      Note only */
/* int  get_vtype(int);          Note only */
int  get_Nvtype(int);
int  get_rightExp(int);

/* AsmNInpt.c */
void ScanNumeric(char *);
void GetNewTmp(void);
void StorNVar(char *);
void GetNewFloat(int);
char *NewFVarname(void);
void AddNewVar(void);
void Sav_Destin(int,char *);

/* Asmvars.c */
void Do_let(void);
void copy_string(char *);
void cat_string(int);
void asn_integer(int,char *);
void asn_float(int,char *);

/* Asmrdp.c */
void Assignment(void);
void a_Expression(void);
void a_Term(void);
void a_Factor(void);
void Ident(void);
void Add(void);
void Subtract(void);
void Multiply(void);
void Divide(void);
/* void Match(char);             Note only */
/* void _GetChar(void);          Note only */
/* int  IsAddop(char);           Note only */
/* int  IsMultop(char);          Note only */
/* int  Is_White(char);          Note only */
/* void SkipWhite(void);         Note only */

Save file: Prototyp.h, and close it. There! We're done with all that. Now, compile Bxbasm.c. I hope we haven't introduced any new errors or typos into our code, but with this many additions and changes it's entirely possible that we have. If your compiler turns up any errors, track them down and try to fix them. And please let me know of them as soon as possible, so I can make the same repairs in this document and upload the corrected code.

Now, drop down to the DOS Prompt and, using Bxbasm.exe, compile this new version of Test.bas:

' test.bas version 15.4
Integer = 10
Long% = 100
Float! = 1.5
Double# = 2.123
result = Double# * (Float! + (Long% - Integer))
PRINT result
' ------------------------------------------
END
' ------------------------------------------

As you can see, in this version of Test.bas, we are making use of some of our new data types: 16-bit Integer, 32-bit Long Integer, 32-bit Float and 64-bit Double-Float.

Before executing the assembled "Test.com", let's take a look at the ".asm" source code that was generated by our compiler (and then assembled by the A86 assembler):

; *************BxbAsm Compiler*************
;
        jmp START
;
IntegerI        DW 0
VARX1I          DW 10
LongL           DD 0
VARX2I          DW 100
FloatF          DD 0
VARX3F          DD 1.5
DoubleD         DQ 0
VARX4F          DD 2.123
resultI         DW 0
; --------------------------------------------------
; Do Not Delete !!!: Constants and system variables
; --------------------------------------------------
NEWLINE         DB 13,10,'$'
SIXTN           DB 16                   ; segment multiplier
C10             DW 10                   ; constant for division
C10000          DW 10000                ; constant for division
EndProg         DW 0                    ; ZSEG: last segment
NextFree        DW 0
IntString       DB ' ','$',' '          ; integer string
; --------------------------------------------------
; --------------------------------------------------
START PROC NEAR
; --------------------------------------------------
        mov ax, ZSEG                    ; last byte of program
        div SIXTN
        mov ah, 0
        inc al
        mul SIXTN
        mov EndProg, ax                 ; store value
        mov NextFree, ax
; --------------------------------------------------
        FINIT                           ; initialize x87
        FILD VARX1I                     ; load integer
        FISTP IntegerI                  ; store integer, pop stack
        FINIT                           ; initialize x87
        FILD VARX2I                     ; load integer
        FISTP LongL                     ; store integer, pop stack
        FINIT                           ; initialize x87
        FLD VARX3F                      ; load float
        FSTP FloatF                     ; store float, pop stack
        FINIT                           ; initialize x87
        FLD VARX4F                      ; load float
        FSTP DoubleD                    ; store float, pop stack
        FINIT                           ; initialize x87
        FLD DoubleD                     ; load float
        FLD FloatF                      ; load float
        FILD LongL                      ; load integer
        FILD IntegerI                   ; load integer
        FSUBP ST(1), ST                 ; subtract and pop
        FADDP ST(1)                     ; add to ST(0) and pop
        FMULP ST(1)                     ; multiply and pop
        FISTP resultI                   ; store integer, pop stack
        mov dx, resultI                 ; load integer value
        call PRINTINT
        mov dx, offset(NEWLINE)
        call PRINTSTR
        jmp DONE
;
DONE:
        INT 20H
START ENDP
; --------------------------------------------------
[snip] (print functions omitted)
; --------------------------------------------------
ZSEG:
; --------------------------------------------------

Except for the fact that one of our variable declarations and most of our comments are a bit staggered (we will have to fix that; for clarity, I have straightened them out here), this looks fairly clean. Let's begin by examining the variable declarations:

;
IntegerI        DW 0
VARX1I          DW 10
LongL           DD 0
VARX2I          DW 100
FloatF          DD 0
VARX3F          DD 1.5
DoubleD         DQ 0
VARX4F          DD 2.123
resultI         DW 0
;

As you can see, our compiler correctly translated our variable data types into the right data declarations. As a reminder, here is a list of our data types and memory storage sizes:

IntegerI:   Word Integer:      Declare Word:     16 bits
resultI:    Word Integer:      Declare Word:     16 bits
LongL:      Double Integer:    Declare Double:   32 bits
FloatF:     (regular)Float:    Declare Double:   32 bits
DoubleD:    (double)Float:     Declare Quad:     64 bits

**Note: at this time we are not using the 64-bit Quad Integer.

You will notice, though, if you look closely, that this assignment looks to be in error:

...
LongL           DD 0
VARX2I          DW 100
...

In the Basic script, shown here:

Long% = 100
we declared variable "Long" to be of type "%", and it was correctly declared as a 32-bit Double Integer. However, the temporary variable VARX2 has an "I" appended to it and it is declared as a data-type "WORD", shown here:

VARX2I          DW 100

Actually, there is nothing wrong in the code here. Odd as it may appear, the statement was correctly translated. You have to look at the section of code that performed the translation to see why it happened this way. In file: AsmNInpt.c, the function named GetNewTmp() performs this translation.

Here is the snippet of code that is in question:

[snip]
    /* --- integers --- */
    if(dotflag == 0)
    {   itest = atol(a_string);
        if(itest <= 32767)
        {   strcat(NVar, "I");
            write_str(NVar);              /* NVar DW integer */
            write_str("\tDW ");
        }
        else if(itest <= 2147483647)
        {   strcat(NVar, "L");
            write_str(NVar);              /* NVar DD long */
            write_str("\tDD ");
            strcat(NVar, "%");
        }
        else
        {   strcat(NVar, "Q");
            write_str(NVar);              /* NVar DQ quad */
            write_str("\tDQ ");
            strcat(NVar, "~");
        }
    }
[snip]

In particular, look at these two lines of code:

    if(itest <= 32767)
    else if(itest <= 2147483647)

Variable "itest" holds the value that is to be assigned. Based on that number, either a data-type "I", "L" or "Q" will be written to the .asm source file. In this case, Basic variable "Long" is being assigned the value of 100. Therefore, temporary variable VARX2 is made to be a normal 16-bit word integer. Variable "Long" can, as the program runs, contain any value within the range of a 32-bit integer. However, temporary variable VARX2 isn't really a variable at all, it's a constant. Therefore, we might as well use only as much data-space (ram) as is necessary to contain it. Does that make sense?

Now, let's examine the .asm program code:

[snip]
        FINIT                           ; initialize x87
        FILD VARX1I                     ; load integer
        FISTP IntegerI                  ; store integer, pop stack
        FINIT                           ; initialize x87
        FILD VARX2I                     ; load integer
        FISTP LongL                     ; store integer, pop stack
        FINIT                           ; initialize x87
        FLD VARX3F                      ; load float
        FSTP FloatF                     ; store float, pop stack
        FINIT                           ; initialize x87
        FLD VARX4F                      ; load float
        FSTP DoubleD                    ; store float, pop stack
        FINIT                           ; initialize x87
        FLD DoubleD                     ; load float
        FLD FloatF                      ; load float
        FILD LongL                      ; load integer
        FILD IntegerI                   ; load integer
        FSUBP ST(1), ST                 ; subtract and pop
        FADDP ST(1)                     ; add to ST(0) and pop
        FMULP ST(1)                     ; multiply and pop
        FISTP resultI                   ; store integer, pop stack
        mov dx, resultI                 ; load integer value
        call PRINTINT
        mov dx, offset(NEWLINE)
        call PRINTSTR
        jmp DONE
[snip]

The first part, using the x87 instruction set, just loads the variables with the values, from the constants, needed to perform the math operation. Look back at the original Basic script:

result = Double# * (Float! + (Long% - Integer))
then compare it to the assembly source code shown here:

[snip]
        FLD DoubleD                     ; load float
        FLD FloatF                      ; load float
        FILD LongL                      ; load integer
        FILD IntegerI                   ; load integer
        FSUBP ST(1), ST                 ; subtract and pop
        FADDP ST(1)                     ; add to ST(0) and pop
        FMULP ST(1)                     ; multiply and pop
        FISTP resultI                   ; store integer, pop stack
[snip]
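To follow what the x87 stack is doing through that sequence, here is a small simulation. It is only a sketch: the array st[] and the helpers push(), pop(), fsubp(), faddp() and fmulp() are stand-ins I made up to model the stack, it uses C doubles rather than the FPU's own formats, and it skips all error checking.

```c
/* A toy model of the x87 register stack: st[0] plays the role of ST(0). */
static double st[8];
static int depth;

static void push(double v)               /* models FLD / FILD */
{
    int i;
    for (i = depth; i > 0; i--)
        st[i] = st[i - 1];
    st[0] = v;
    depth++;
}

static double pop(void)                  /* models the "P" (pop) part */
{
    double v = st[0];
    int i;
    depth--;
    for (i = 0; i < depth; i++)
        st[i] = st[i + 1];
    return v;
}

/* Each one combines ST(1) with ST(0), leaves the result on top, and pops. */
static void fsubp(void) { double top = pop(); st[0] = st[0] - top; }
static void faddp(void) { double top = pop(); st[0] = st[0] + top; }
static void fmulp(void) { double top = pop(); st[0] = st[0] * top; }

/* result = Double# * (Float! + (Long% - Integer)) with the Test.bas values */
double run_example(void)
{
    depth = 0;
    push(2.123);     /* FLD  DoubleD                      */
    push(1.5);       /* FLD  FloatF                       */
    push(100.0);     /* FILD LongL                        */
    push(10.0);      /* FILD IntegerI                     */
    fsubp();         /* 100 - 10      = 90                */
    faddp();         /* 1.5 + 90      = 91.5              */
    fmulp();         /* 2.123 * 91.5  = 194.2545          */
    return pop();    /* FISTP would round this to integer */
}
```

The value left on top is about 194.2545. Since "result" is a 16-bit integer, FISTP stores it rounded, so the program should print 194.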

Bear in mind that each FLD (or FILD) is the equivalent of a PUSH onto the x87's stack. Variables DoubleD and FloatF have to be pushed onto the stack before IntegerI can be subtracted from LongL. Now execute Test.com. There is one caveat to our test program though. We only have the ability to display regular 16-bit integers. Our print routines can't display 32-bit integers or floating point numbers. Why not? Well, as far as integers go, we only need to make a small change to our existing print routines. But floating point numbers have to be handled in an entirely different way, due to the way floating point numbers are stored in memory.

We will have to tackle that in the next issue.

CONCLUSION
There's still more to come. Here is a short list of some of the things we will tackle in the next chapter:
• more x87 programming,
• handling floating point numbers in assembly language,
• converting floating point values to a printable form.

CHAPTER - 16
INTRODUCTION
In the last chapter we made considerable progress in our x86 native code compiler by incorporating the x87 FPU. Additionally, we’ve expanded our language to include floating point values and constructed a recursive descent parser that will handle the basic math operations. At the point where we left off we still had no means of displaying the results of a floating point calculation. This will be our focus in this chapter.

DISPLAYING NUMBERS:
First, let's recall what is involved in the processing and displaying of regular Integer values. The x86 cpu's instruction set is capable of performing binary (base-2) and base-10 integer math on 8, 16, 32 or 64-bit numbers, which are then stored in memory as Base-2 (binary) integers. That is to say, the whole (integer) numbers from one to five, when stored in binary form, look like this:

Base-10:   1      2      3      4      5
Base-2:    0001   0010   0011   0100   0101

All we need to do, in order to display a single character integer number is to add 30 Hex, (or 48 decimal) to its value and that leaves us with the correct ascii character code for that particular number. i.e.:

Decimal:   0     1     2     3     4     5
Ascii:     30H   31H   32H   33H   34H   35H

Refer to your ascii code chart for the digits 6 thru 9. Even though the x86 cpu has no instruction for performing this conversion from binary to ascii, the process is relatively simple. For multi digit integers, all we have to do is continually divide the number in the AX register by ten, using the x86 DIV instruction. Then we add 30h to the remainder, which is left in the DL register. The sum in the DL register can then either be displayed on the screen or stored in a display buffer until all the digits have been converted to ascii. Here is a short assembly language snippet from our INTASC function that performs just that:

IA3:    xor dx, dx                      ; clear High Order Word
        div C10                         ; divide by ten
        add dl, '0'                     ; convert to ascii digit

In this example, C10 is a constant which equals "10", and the ascii character '0', which is the same as 30h, is added to the DL register to turn the remainder into an ascii character.
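The same divide-by-ten loop looks like this in C. This is a sketch for illustration only and is not the INTASC routine itself; the name int_to_ascii() and its buffer parameters are my own invention.

```c
#include <stddef.h>

/* Hypothetical helper: convert an unsigned value to decimal text.
   Returns the number of digits written, or 0 if buf is too small. */
size_t int_to_ascii(unsigned value, char *buf, size_t size)
{
    char tmp[16];
    size_t n = 0, i;

    do {
        tmp[n++] = (char)('0' + value % 10);   /* remainder + 30h       */
        value /= 10;                           /* quotient: next pass   */
    } while (value != 0 && n < sizeof tmp);

    if (n + 1 > size)
        return 0;
    for (i = 0; i < n; i++)                    /* digits come out       */
        buf[i] = tmp[n - 1 - i];               /* lowest first: reverse */
    buf[n] = '\0';
    return n;
}
```

Note that the digits are produced in reverse order, lowest digit first, which is why they have to be stored in a buffer (or pushed on a stack) before being displayed.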

The problem is that floating point numbers are not stored in memory in the same way as integer numbers are. In fact, the x86 has no capability for handling and processing floating point numbers. Floating point numbers cannot be loaded into any of the x86 registers in any meaningful way. That doesn't mean that you can't load a floating point number stored in memory into an x86 register. It just means that the x86 will interpret it as an integer value, which would be wrong. As far as the x86 is concerned, you could say that the way in which floating point numbers are stored in memory is more of a software encoding method than anything having to do with the x86 hardware. What I mean here is, that integer numbers are native to the x86 and floating point numbers are not.

WHAT IS A FLOATING POINT NUMBER?:
Floating point numbers are also called “Real Numbers”. To be decoded, floating point numbers have to be converted by means of a software algorithm into three distinct segments:
• the sign,
• the exponent,
• and the mantissa (pronounced: man-tess-ah),
as seen here:

[sign] [exponent] [mantissa]
If you are unfamiliar with this concept, it is called Scientific Notation and often has the written appearance of something like this:

+1.0e2 or +1.0x10²

• the "2" to the right of the "e" (in the first example) and the raised "2" after the "x10" (in the second example) are called the exponent and are a decimal place holder,
• everything to the left of the exponent is the mantissa and that represents the value of the number.

In this example, the exponent (the “2”) is indicating that the decimal point may be moved to the right of its present location by two digits. In other words, the value represented above equals: 100.

Another thing to know about floating point numbers, as they are employed by the Intel processors, is that the smallest storage size used with the x86 is 32 bits or four bytes long, which is called Single Precision. Here is a breakdown of how the Intel processors store a single precision floating point number:

[1-sign bit] [8-exponent bits] [23-mantissa bits]
There are a number of conflicting schemes for storing floating point numbers. This is just one method and it is the method that Intel chose to use and the IEEE has adopted as a standard. An important thing to note is that Intel and the IEEE have agreed on these things, in regard to storing floating point numbers:
• the sign is represented by a "1" or a "0",
• a "1" means negative and a "0" means positive,
• that the exponent would be in an excess-127 format, or have a bias of 127,
• that the value zero would be a null value,
• and that the first "1" bit in the mantissa would be implied.

"Okay...!", you are thinking, "so, what does all that mean ?" Well, • first, the exponent tells you how many decimal places (binary places rather) the decimal point has to be moved. Either to the right for positive whole values, or, to the left for fractional values. The exponent has a size of 8 bits and since eight bits can represent a total value of 255, by splitting 255 in half, you can represent 127 decimal positions above the center position or 127 positions below the center point. That means 127 decimal places to the right and/or 127 decimal places to the left.

0..............127............255
That's all the exponent does, it tells you where to move the decimal point. Actually, where it concerns binary representations of Real Numbers, the decimal point is referred to as the "Binary Point". With the exponent value set at a number equal to or greater than 127, the difference between 127 and the value in the exponent byte is how many binary places to the right the binary point has to be moved. Example: the number "2.0" would be represented by:

sign   exponent   mantissa
[0]    [ 128 ]    [ 1.00000...0]

• the sign is zero, therefore positive,
• subtract 127 from 128 = 1,
• 1 is how many binary places the binary point has to be moved to the right,
• the mantissa is implied to be at least "1.0", therefore, the adjusted binary value in the mantissa is:

[ 10.000000...0]
Why does "10.000...0" equal "2" ? Because it's BINARY:

[ 0000-0010 ] = 2
the leading zeros are imaginary and you have to mentally add them in.
• A value of zero would be represented by a "Null" value:

sign   exponent   mantissa
[0]    [0]        [ 00000...0]

The exponent and mantissa are all zeros. • By implying that the first bit in the mantissa is a "1" means that we don't have to waste a bit to represent it, because we already know it is a "one". What does this mean ?

Well, let's look at a bit-for-bit representation of the number "1.0":

[sign]   [ exponent ]   [. mantissa ]
[0]    - [0111-1111]  - [.0000-0000-0000-0000-0000-000]
Our floating point number is 32 bits long.
• The sign is positive, and consumes one bit.
• The 8 bit exponent equals 7F-hex, which is 127 decimal, so, 127 minus 127 = 0, zero is the number of places to move the Binary Point.
• The 23-bit mantissa equals .0...0. Once again we have to imagine that there is a 24th bit, just to the left of the binary point and that it is always set to "1".

Thus the value of "1.0".
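If you have a C compiler handy, the three fields can be pulled apart with a few shifts and masks. This is a minimal sketch, assuming that your compiler's "float" is the 32-bit IEEE single precision format described above (true of practically every PC compiler); the struct and function names are made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of a 32-bit float */
struct fields {
    unsigned sign;        /* 1 bit:  0 = positive, 1 = negative  */
    unsigned exponent;    /* 8 bits: biased by 127               */
    uint32_t mantissa;    /* 23 bits: the implied "1." not shown */
};

struct fields split_float(float f)
{
    uint32_t bits;
    struct fields r;

    memcpy(&bits, &f, sizeof bits);      /* copy the raw 32 bits */
    r.sign     = (unsigned)(bits >> 31);
    r.exponent = (unsigned)((bits >> 23) & 0xFFu);
    r.mantissa = bits & 0x7FFFFFu;
    return r;
}
```

For 1.0 this gives sign 0, exponent 127 (7F hex) and mantissa 0, matching the bit-for-bit picture above. For 2.0 the exponent comes out as 128.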

23 BIT MANTISSA:
Now, what if we wanted to represent 123456789.0 or 123.456789 or 0.123456789. In any case these are either very large and/or very small numbers to represent using only 23 bits. If we were to rearrange the 23 mantissa bits, so that they conform to a recognizable binary pattern, they would look more like this:

shift

[.000-0000-0000-0000-0000-0000]
Now, if we were to set all 23 bits, the highest value that they could represent would be:

Binary   [111-1111-1111-1111-1111-1111]
Hex        7 -  F -  F -  F -  F -  F

which equals: 8,388,607.0. The addition of just one single bit would increase the maximum value to: 16,777,215.0. Which is a difference of: 8,388,608. The engineers that were working to solve this problem believed that they could give themselves one extra bit, for free, if they could assume, or imply, that a 24th bit existed and that it was always set to a "1". Additionally, the Binary Point follows immediately to the right of the "Implied 1" bit. That leaves the remaining 23 bits, which actually do exist, to make up the remainder of the value. "How does that work?" you may ask. Let's assume for a moment that we are only concerned with whole numbers (no fractional parts). If a number is greater than zero, it has to be at least "one". So, what this means is, that even though the mantissa of a Real Number may appear to be in the form of:

[0000-0000-0000-0000-0000-000]

it must be imagined as being:

[1.000-0000-0000-0000-0000-0000]
Therefore, the floating point value of "1.0", when stored in memory will have the appearance of:

Binary   [0011-1111-1000-0000-0000-0000-0000-0000]
Hex         3    F    8    0    0    0    0    0

Or, more simply put: 3F80-0000

Remember, the left-most bit (or High-Order bit), is the sign-bit. Therefore we have to shift everything to the left by one bit:

shift
[0-0111-1111-0000-0000-0000-0000-0000-000] to get 7 F 0 0 0 0 0 0
or 7F00-0000

Which really translates to:

[0111-1111-1.000-0000-0000-0000-0000-0000]
 7 F - 1.0 0 0 0 0 0
Therefore, you could say that there exists a memory model and a real model that represents a floating point number. Here are the representations of the numbers from two to five in both models:

2-Memory: [0100-0000-0000-0000-0000-0000-0000-0000] = 4000-0000H
2-Real:   [1000-0000-1.000-0000-0000-0000-0000-0000] = 80h, 0010-0000
           8    0     0010.     (shift binary point 1)

3-Memory: [0100-0000-0100-0000-0000-0000-0000-0000] = 4040-0000H
3-Real:   [1000-0000-1.100-0000-0000-0000-0000-0000] = 80h, 0011-0000
           8    0     0011.     (shift binary point 1)

4-Memory: [0100-0000-1000-0000-0000-0000-0000-0000] = 4080-0000H
4-Real:   [1000-0001-1.000-0000-0000-0000-0000-0000] = 81h, 0100-0000
           8    1     0100.     (shift binary point 2)

5-Memory: [0100-0000-1010-0000-0000-0000-0000-0000] = 40A0-0000H
5-Real:   [1000-0001-1.010-0000-0000-0000-0000-0000] = 81h, 0101-0000
           8    1     0101.     (shift binary point 2)

Since the mantissa has only 23 bit positions and the exponent has a range of 127 binary point positions, clearly the range of the exponent is greater than the number of bits in the mantissa. What that means is that theoretical values much greater than those shown above can be represented.
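Going the other way, from the three fields back to the memory image, is just a matter of shifting each field into place. The sketch below assumes the 1-8-23 bit layout described earlier and a compiler whose "float" is that same 32-bit format; pack_bits() and bits_to_float() are names made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the 32-bit memory image from the fields.
   mantissa holds only the 23 stored bits (the implied "1." is left out). */
uint32_t pack_bits(unsigned sign, unsigned exponent, uint32_t mantissa)
{
    return ((uint32_t)(sign & 1u) << 31)
         | ((uint32_t)(exponent & 0xFFu) << 23)
         | (mantissa & 0x7FFFFFu);
}

/* Reinterpret a 32-bit memory image as a float. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

With exponent 81h and only the second mantissa bit set (the ".01" of binary 1.01), pack_bits(0, 0x81, 0x200000) gives 40A0-0000 hex, and bits_to_float() turns that image back into 5.0.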

For instance, a "1" with 127 trailing "0's" before the binary point: 1....<127>....0. Or, a binary point followed by 127 leading "0's" before the "1": .0...<127>....1

FRACTIONAL MANTISSA:
We have so far discussed everything to the left of the binary point. Fractional numbers are represented by the remaining bits that are to the right of the binary point. If the integer or whole number portion of a floating point number is significantly large and consumes a large number of the available 24 bits, the fractional portion can and will lose precision. This is due to the simple fact that there are fewer bits to accurately represent the fraction. Ordinarily, when interpreting binary numbers, beginning with the right-most bit, each bit to the left is double the value of the prior bit. i.e.: ...8, 4, 2, 1. Using binary numbers to count the fractional portion of a mantissa, works in just the opposite way. Beginning with the left-most bit, each bit to the right of the "1." bit is worth one-half the value of the prior bit. Example: assume our 24th bit is "1" and it is followed by the binary point. The next bit, to the right of the binary point, has a value of "0.5". The next bit to the right has a value of "0.25", etc.

1.000-0000-0000-0000-...0
  .5   .25   .125   .0625   .03125   .015625   .0078125

Make a chart by calculating all 23 mantissa bits, keep dividing the last number by two, for all 23 bits.
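If you would rather let the computer build the chart, the halving rule is easy to code. This is only a sketch with made-up names: bit_weight() gives the value of the k-th bit to the right of the binary point, and mantissa_value() adds up the weights of whichever of the 23 stored bits are set, on top of the implied "1.".

```c
/* Weight of the k-th bit to the right of the binary point
   (k = 1 is worth 0.5, k = 2 is worth 0.25, and so on). */
double bit_weight(int k)
{
    double w = 1.0;
    while (k-- > 0)
        w /= 2.0;                /* each bit is half the one before */
    return w;
}

/* Value of "1.fraction" for a 23-bit stored mantissa. */
double mantissa_value(unsigned long mantissa23)
{
    double v = 1.0;              /* the implied 24th bit */
    int k;

    for (k = 1; k <= 23; k++)
        if (mantissa23 & (1UL << (23 - k)))
            v += bit_weight(k);
    return v;
}
```

Calling bit_weight() with k running from 1 to 23 produces the whole chart. As a check, a mantissa with only its first bit set (400000 hex) gives 1.5.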

Quiz:
What is the smallest fractional value possible, using only 23 mantissa bits ?
What is the largest fractional value possible by using 23 mantissa bits ?

If the floating point value was "1.5", then the first bit to the right of the binary point would be set and the floating point number would be stored in memory as:

[0011-1111-1100-0000-0000-0000-0000-0000]
   3 -  F -  C -  0 -  0 -  0 -  0 -  0

Translation: the sign bit is zero, so shift it out, to the left. Now shift all remaining bits to the left one bit.

That leaves:
[0] [0111-1111]-[1000-0000-0000-0000-0000-000]
or:   7 - F  -   8 - 0 - 0 - 0 - 0 - 0
for: [1.1000-0000-0000-0000-0000-000] = 1.5

A value of "1.75" would be stored as:
[0011-1111]-[1110-0000-0000-0000-0000-0000]
  3 - F  -   E - 0 - 0 - 0 - 0 - 0
or:
[0] [0111-1111]-[1100-0000-0000-0000-0000-000]
      7 - F  -   C - 0 - 0 - 0 - 0 - 0
for: [1.1100-0000-0000-0000-0000-000] = 1.5 + .25

How can you verify that ? Copy this snippet of code to a file named "test.asm" in your A86 directory.

;\a86v3\ test.asm...
; --------------------------------------------------
        jmp START
; --------------------------------------------------
float32 dd 1.75
; --------------------------------------------------
START PROC near
; --------------------------------------------------
        FINIT
        FLD float32
Done:
        INT 20H
; --------------------------------------------------
START ENDP

1) assemble the .asm file by dropping down to the DOS prompt and typing:

a86 test.asm
2) using the D86 disassembler, type:

d86 test
3) press the F-10 key

4) your screen will look something like this:

0100  JMP START
0103  ...............
0105  ...............
START
0107  FINIT
010A  FLD FLOAT32
010F  JMP DONE

AX 0000   ......   1:
BX 0000   ......   2:
CX 0000   ......   3:
DX 0000   ......

0:______
1:______
2:______

5) key this in exactly as shown here: 1b,float32 then press [enter]

You will notice the bottom portion has changed to something like this:

AX 0000   ......   1: b,float32,, 00 00 E0 3F ....
BX 0000   ......   2:
CX 0000   ......   3:
DX 0000   ......

What we are looking at is the Hex representation of the value stored in memory location "float32". Which is 1.75 decimal and 3FE0-0000 hex. 6) now press the F1 key, until the value "+1.75" appears at the top right of the screen. Press the letter Q and [Enter] to exit. Now, go back to "test.asm" and change the value of float32 to "1.(whatever)" and repeat the above 6 steps.
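You may have noticed that the debugger showed the bytes as "00 00 E0 3F" while we keep writing the value as 3FE0-0000. That is because the x86 stores the lowest-order byte at the lowest address. The C sketch below reproduces that byte order; it assumes your compiler's "float" is the 32-bit format we have been discussing, and le_bytes() is a name made up for this example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: list the four bytes of a float in the order an
   x86 stores them in memory (lowest-order byte first). */
void le_bytes(float f, unsigned char out[4])
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof bits);          /* raw 32-bit image     */
    out[0] = (unsigned char)(bits & 0xFFu);  /* lowest address       */
    out[1] = (unsigned char)((bits >> 8) & 0xFFu);
    out[2] = (unsigned char)((bits >> 16) & 0xFFu);
    out[3] = (unsigned char)((bits >> 24) & 0xFFu);
}
```

For 1.75 the four bytes come out as 00, 00, E0, 3F, which read from right to left give 3FE0-0000.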

Quiz:
Using the chart you created above and using the available 23 mantissa bits and the implied 24th bit, on paper illustrate which bits need to be set to "1's" in order for the mantissa to equal "1.1...", to the smallest fraction possible. In other words, "1.125" is not the correct answer.

DISPLAYING REAL NUMBERS:
We can devise an algorithm that will left-shift the bits of a floating point number to separate the sign and the exponent, break the mantissa down into the integer and fractional portions, and place them in the various x86 registers. Like so:

[0] [0111-1111]-[1000-0000-0000-0000-0000-000]
AX 0000   BX 007F   CX 0001   DX 8000

But in doing so, we might end up with some complicated code. Trying to keep track of all the shifts and carries can end up with a program that's hard to debug at a later date, long after we've forgotten what it all meant. What I'd like to do, instead, is utilize the x87 FPU to do most of the work for us. And, possibly end up with some code that is easier to understand. Plus we can learn some of the x87 instruction set in the process. The x87 by itself has no direct means of converting floating point numbers into a printable form, but, with a small amount of effort we can make use of some x87 features to make the job easier. Just as integers are native to the x86 CPU, floating point numbers are native to the x87. There is one caveat though about using the x87, and that is that the x87 has no designated "sign flag". When using the x86 you can load an integer value into the AX register and test the "sign flag" to determine if the value it contains is positive or negative. Not exactly so with the x87. The x87 has a 16 bit "status word" to signify the status of the FPU. Four bits of the status word are used to indicate a "greater than" or "lesser than" status. A value on the stack can be compared against the value "0" and the status word will indicate whether the value was greater than or less than zero. If the status word indicates a value is less than zero, then it is safe to presume that it is a negative value. No positive value, integer or fraction, can be less than zero.

WHAT'S YOUR SIGN?
Below is a snippet of code that employs the x87 to deduce the sign of a given value stored in memory. There are three variables used here:
• decsign; which will be our sign flag, 0=positive, 1=negative,
• float32; our 32 bit floating point value,
• sw87; storage space for the x87 status-word.


    mov decsign, 0       ; the sign flag is cleared by setting it to zero
    FLD float32          ; our floating point number is loaded onto the x87 stack
    FLDZ                 ; an x87 instruction to load a "0" onto the stack
    FCOM                 ; compares float32 to "0", causes the status word to be set
    FSTSW sw87           ; stores the status-word in memory variable "sw87"
    FWAIT
    mov ax, sw87
    and ax, 4700h        ; the status word is ANDed with this bit pattern:
                         ; 0100-0111-0000-0000b
    jnz SIGN_POSITIVE    ; to determine if float32 is less than zero,
    mov decsign, 1       ; if it's a negative value, the sign-flag variable is set
SIGN_POSITIVE:
    FINIT
    ret

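You might wonder, “why not simply load the value in float32 into the AX register and test the sign-flag ?” Well, not directly. The x86 integer instructions know nothing about the floating point format, and a 32 bit float will not even fit in the 16 bit AX register. Remember ? (Strictly speaking, the high-order word of float32 could be copied into AX and its top bit tested, since that is where the sign bit is stored, but here we will let the x87 do the work.)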
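The decision made by the snippet above can be sketched in C as follows. This is only an illustration: the function names "get_sign" and "sign_bit" are invented here, and "sign_bit" assumes a 32 bit IEEE float. The first function mirrors the compare-against-zero approach; the second reads the stored sign bit directly.

```c
#include <stdint.h>
#include <string.h>

/* Mirrors the FLDZ / FCOM test: returns 1 when the value is below zero
   and 0 otherwise (a value of zero is treated as positive). */
int get_sign(float value)
{
    int decsign = 0;            /* 0 = positive, 1 = negative */

    if (value < 0.0f)
        decsign = 1;
    return decsign;
}

/* Alternative: read the sign bit straight from the stored pattern.
   Assumes float is a 32 bit IEEE single-precision value. */
int sign_bit(float value)
{
    uint32_t bits;

    memcpy(&bits, &value, sizeof bits);   /* copy the raw bit pattern */
    return (int)(bits >> 31);             /* the top bit is the sign  */
}
```

For ordinary values the two functions agree. They differ only for a negative zero, where the comparison reports "positive" and the bit test reports "negative".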

INTEGERS & FRACTIONS:
For the next procedure, we need to split the floating point number into the whole and fractional parts. We will again make use of the x87. One particular x87 instruction is called FRNDINT, which stands for: Round-to-Integer. Its purpose is to take a float value, such as 1.23456, and round it to an integer value. (After a FINIT the x87 rounds to the nearest integer, which is the mode assumed here.) Assume that float32's value is 1.23456; here is how it would be used:

    FLD float32
    FRNDINT

This will load float32 into the stack(0) position and round it to "1.0", losing the fractional portion. That will work well only about one half of the time, though. If float32's value were 1.65432, there would be a problem: the same instruction will round 1.65432 up to "2.0". This could cause a significant problem if what we really wanted was "1.0". Well, there is a way to deal with that, programmatically. All we need to do is load float32 onto the stack twice, then round stack(0) and compare the rounded value against stack(1). If the contents of stack(0) are greater than stack(1) (they should be less, if anything), then we know that the integer portion grew and it needs to be decremented by one. Example:

    FLD float32          St(0): 1.23456
    FLD float32          St(0): 1.23456   St(1): 1.23456
now round:
    FRNDINT              St(0): 1.0       St(1): 1.23456


now compare:
    FCOM

If Stack(0) is less than Stack(1) we are okay. But, if Stack(0) is greater than Stack(1), then we may have a problem. Let's change the value for float32 to: "1.65432" and try it again:

    FLD float32          St(0): 1.65432
    FLD float32          St(0): 1.65432   St(1): 1.65432
now round:
    FRNDINT              St(0): 2.0       St(1): 1.65432
now compare:
    FCOM

St(0) is greater than St(1), so we decrement the rounded value:

we first load a "1",
    FLD1                 St(0): 1.0       St(1): 2.0       St(2): 1.65432
next, subtract St(0) from St(1) and pop,
    FSUBP St(1)          St(0): 1.0       St(1): 1.65432
save whole number:
    FIST integer

Now, no matter what the value of Stack(0) was after the rounding, it will be correctly adjusted to be the whole number portion of float32. Stack(0) will then be stored in variable "integer". After the Integer-Store instruction, Stack(1) will be left containing the original floating point value of float32. To arrive at the fractional portion, all we have to do is subtract the integer portion, contained in Stack(0), from Stack(1) and we will have the remaining fractional value.

                         St(0): 1.0       St(1): 1.65432
subtract St(0) from St(1) and pop,
leave remaining fractional value in St(0),
    FSUBP St(1)          St(0): 0.65432

Now we need to convert the fractional portion, which is now in Stack(0), into an integer value, so that our display routines can process the numbers. There is one small issue, though: we need to use as much of the fractional part as possible. In some cases, maybe all we need is a fraction of only two decimal places but, generally, that may not be sufficient or accurate. And, then again, twenty or thirty decimal places, though it gives greater accuracy, is far too many. For our purposes, we will use eight decimal places. What we will do is multiply the decimal fraction by "10000" twice, and that will shift each decimal character to the left of the decimal point for a total of eight decimal places. The fraction in Stack(0), which is: "0.65432", will be converted to an integer of: "65432000.". Here is what that operation will look like:

*C10000; integer variable = 10000
    FILD C10000          St(0): 10000.    St(1): 0.65432
exchange registers,
    FXCH                 St(0): 0.65432   St(1): 10000.
multiply, twice & pop
    FMUL St(1)           St(0): 6543.2000
                                ----^
    FMULP St(1)          St(0): 65432000.
                                -------^
now round to an integer,
    FRNDINT
save to variable "fraction",
    FISTP fraction

That's all there is to it. Then all we have to do is call the display routines to print out the sign, integer variable, decimal point and fraction variable:

(sign)[ integer . fraction ]

1.65432000
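The splitting procedure can be sketched in C to check the arithmetic on paper. This is a rough model under stated assumptions, not the Bxbasic code: the function name "split_real" is invented for this illustration, the value is assumed to be non-negative (the sign is handled separately), and adding 0.5 before truncating stands in for FRNDINT, which differs only for values that lie exactly halfway between two integers.

```c
/* Split a non-negative value into a whole part and an eight digit
   fraction, following the same steps as the x87 procedure.
   "integer" and "fraction" play the role of the assembly variables
   of the same names. Assumes value >= 0 and that it fits in a long. */
void split_real(double value, long *integer, long *fraction)
{
    double rounded = (double)(long)(value + 0.5); /* stand-in for FRNDINT */
    double frac;

    if (rounded > value)          /* did the number round upward ?      */
        rounded -= 1.0;           /* yes: decrement the whole number    */

    *integer = (long)rounded;     /* like FIST integer                  */

    frac = value - rounded;       /* remaining fraction, 0 <= frac < 1  */
    frac = frac * 10000.0;        /* multiply x 10,000                  */
    frac = frac * 10000.0;        /* multiply x 10,000 again            */
    *fraction = (long)(frac + 0.5);   /* round to integer and store     */
}
```

With a value of 1.65432 this gives an integer of 1 and a fraction of 65432000. One design point worth noting if the two parts are printed from C: a format such as "%ld.%08ld" pads the fraction with leading zeros, so that a fraction of 5000000 is shown as ".05000000" rather than ".5000000".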


CONVERSION ROUTINES:
Here are the modifications to the existing code and some new functions. In file: Bxbasm.c, add this new variable declaration:

int printfloat=0;

so that it reads as follows:

[snip]
     char **VarIndx;         /* variables name index */
     int VarNdxCnt=0;        /* variables count */
     int stringcopy=0;       /* flag: string copy */
     int stringcat=0;        /* flag: string concatenate */
     int printinteger=0;     /* flag: print integer */
/**/ int printfloat=0;       /* flag: print float */
[snip]

That is the only change for this file. Save Bxbasm.c and close it. Next, open file: Asmfunct.c. Here are the listings for replacements of existing functions and some new ones.

void Do_print()
{
    char ch, temp[VAR_NAME];
    int pi, type;

    pi = e_pos;
    pi = iswhiter(pi);
    ch = p_string[pi];
    e_pos = pi;
    printstring = 1;

    /* --- print newline --- */
    if(strchr(":\n", ch))
    {
        writeln(" mov dx, offset(NEWLINE)");
        writeln(" call PRINTSTR");
        return;
    }

    /* --- LOOP: multiple print statements --- */
    while(strchr("\n\0", ch) == 0)
    {
        strcpy(temp, get_varname());
        pi = e_pos;
        type = get_vtype(pi);


        /* --- write string variable --- */
        if(type == 3)
        {
            strcat(temp, "$)");
            write_str(" mov bx, offset(");
            writeln(temp);
            writeln(" inc bx");
            writeln(" mov dx, [bx]");
            writeln(" call PRINTSTR");
            pi++;
        }
        else
        {
            pi = e_pos;
            type = get_Nvtype(pi);

            /* --- write integer variable --- */
            if((type == 1) || (type == 2))
            {
                if(type == 1)               /* 16-bit integer */
                {
                    strcat(temp, "I");
                }
                else /* if(type == 2) /* 32-bit integer */
                {
                    strcat(temp, "L");
                }
                write_str(" FILD ");
                write_str(temp);
                writeln("\t\t\t; load integer value");
                write_str(" FIST integer");
                writeln("\t\t\t; store in integer print buffer");
                writeln(" call GET_SIGN");
                writeln(" call PRINTINT");
/**/            printinteger = 1;
                if(type == 2)               /* advance pi over type symbol */
                {
                    pi++;
                }
            }
            /* --- write floating point variable --- */
            else /* if(type == 3) /* 32-bit float */
            {
                strcat(temp, "F");
                write_str(" FLD ");
                write_str(temp);
                writeln("\t\t\t; load floating point number");
                write_str(" FST x87buff");
                writeln("\t\t\t; store in print buffer");
                writeln(" call GET_SIGN");
                write_str(" lea si, x87buff");
                writeln("\t\t\t; point SI to 32 bit operand");
                writeln(" call FLOAT2ASC");
                writeln(" mov dx, offset(decbuff)");
                writeln(" call PRINTSTR");
/**/            printfloat = 1;
                pi++;                       /* advance pi over type symbol */
            }


        }
        pi = iswhiter(pi);
        ch = p_string[pi];
        if(ch == ',')
        {
            writeln(" push dx");
            writeln(" mov dl, 9\t\t; code:for tab");
            writeln(" call PRINTCHR");
            writeln(" pop dx");
            printchrctr = 1;
        }
        else if(strchr(":\n", ch))
        {
            writeln(" mov dx, offset(NEWLINE)");
            writeln(" call PRINTSTR");
        }

        /* --- is it end of statement --- */
        if(strchr("\n\0", ch) == 0)
        {
            pi++;
            pi = iswhiter(pi);
            ch = p_string[pi];
            e_pos = pi;
        }
    }
}
/*---------- end Do_print ----------*/

void Do_functions()
{
    if(clrscreen == 1)
    {
        ClrScrn();
    }
    if(printstring == 1)
    {
        PrintStr();
    }
    if(printchrctr == 1)
    {
        PrintChr();
    }
    if(stringcopy == 1)
    {
        StringCopy();
    }
    if(stringcat == 1)
    {
        StringCat();
    }
    if(printinteger == 1)
    {
        PrintInt();
        PrintInt1();
        PrintInt2();
        PrintInt3();
        PrintInt4();
    }
    if(printfloat == 1)


    {
        if(printinteger == 0)
        {
            PrintInt1();
            PrintInt2();
            PrintInt3();
            PrintInt4();
        }
        PrintFlt();
        PrintFlt1();
    }
}
/*---------- end Do_functions ----------*/
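The reason Do_functions tests "printinteger" inside the "printfloat" branch is that both print paths share four helper procedures, and each helper must be written to the assembly file only once. The sketch below models that decision in isolation. The function "list_helpers" is hypothetical and exists only for this illustration; instead of writing assembly code it lists the names of the procedures that the PrintInt and PrintFlt functions would emit.

```c
#include <string.h>

/* Collect the names of the runtime procedures that would be written,
   separated by spaces. "out" must be large enough (100 bytes is ample).
   The two arguments stand for the compiler flags of the same names. */
void list_helpers(int printinteger, int printfloat, char *out)
{
    out[0] = '\0';

    if (printinteger == 1)
    {
        /* PrintInt, PrintInt1, PrintInt2, PrintInt3, PrintInt4 */
        strcat(out, "PRINTINT FIXINT INTASC LOAD_DECBUF1 GET_SIGN ");
    }
    if (printfloat == 1)
    {
        if (printinteger == 0)  /* shared helpers not written yet */
        {
            strcat(out, "FIXINT INTASC LOAD_DECBUF1 GET_SIGN ");
        }
        /* PrintFlt, PrintFlt1 */
        strcat(out, "FLOAT2ASC LOAD_DECBUF2 ");
    }
}
```

Whether a program prints integers, floats or both, each shared procedure name appears in the list exactly once, which is what keeps the assembler from seeing duplicate labels.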

void PrintFlt1() /*NEW*/
{
    writeln("LOAD_DECBUF2 PROC NEAR");
    writeln(" lea si, IntString ; address of integer string");
    writeln(" mov cx, si ; copy source address to CX");
    writeln(" add cx, 10 ; point to end of string");
    writeln("LOADER1:");
    writeln(" inc di ; increment destination");
    writeln(" mov al, [si] ; get ascii number");
    writeln(" cmp al, '$' ; at the end yet ?");
    writeln(" jz DONE_DECBUF");
    writeln(" mov [di], al ; store ascii number");
    writeln(" inc si ; increment source pointer");
    writeln(" cmp si, cx ; is SI at end ?");
    writeln(" jz DONE_DECBUF");
    writeln(" jmp LOADER1 ; repeat");
    writeln("DONE_DECBUF:");
    writeln(" ret");
    writeln("LOAD_DECBUF2 ENDP");
    writeln("; --------------------------------------------------");
}
/*---------- end PrintFlt1 ----------*/

void PrintFlt() /*NEW*/
{
    writeln("FLOAT2ASC PROC NEAR");
    writeln(" xor ax, ax ; zero AX register");
    writeln(" mov al, decsign ; load sign flag");
    writeln(" cmp ax, 0 ; 0=positive, 1=negative");
    writeln(" jz IS_POSVAL");
    writeln(" FLD x87buff ; load floating point number");
    writeln(" FCHS ; change sign to positive");


    writeln(" FSTP x87buff ; store as positive number");
    writeln("IS_POSVAL:");
    writeln(" clc ; EXTRACT WHOLE NUMBER:");
    writeln(" FILD C10000 ; load multiplier");
    writeln(" FLD x87buff ; load floating point number");
    writeln(" FLD x87buff ; twice");
    writeln(" FRNDINT ; round to integer");
    writeln(" FXCH ; exchange st(0) with st(1)");
    writeln(" FCOM ; did number round upward ?");
    writeln(" FSTSW sw87 ; store x87 status word");
    writeln(" FWAIT");
    writeln(" and sw87, 100h ; strip away excess bits");
    writeln(" cmp sw87, 100h ; is H.O. '1' bit set");
    writeln(" jne NOT_GREATER");
    writeln(" FXCH ; exchange st(0) with st(1)");
    writeln(" FLD1 ; load '1'");
    writeln(" FSUBP st(1) ; decrement st(1)");
    writeln(" jmp RESUME_CONV");
    writeln("NOT_GREATER:");
    writeln(" FXCH ; exchange st(0) with st(1)");
    writeln("RESUME_CONV:");
    writeln(" FIST integer ; save integer value");
    writeln(" FSUBP st(1) ; subtract integer from float");
    writeln(" FMUL st(1) ; multiply x 10,000");
    writeln(" FMULP st(1) ; multiply x 10,000 and pop");
    writeln(" FRNDINT ; round to integer");
    writeln(" FISTP fraction ; store fraction as integer");
    writeln("DO_INTEGER:");
    writeln(" mov ax, W integer ; load L.O. word");
    writeln(" mov dx, W integer+2 ; load H.O. word");
    writeln(" mov cx, 10 ; string length");
    writeln(" lea si, IntString ; point to string");
    writeln(" call INTASC");
    writeln(" call FIXINT");
    writeln(" call LOAD_DECBUF1");
    writeln(" push di ; save decbuff pointer");
    writeln("DO_FRACTION:");
    writeln(" mov ax, W fraction ; load L.O. word");
    writeln(" mov dx, W fraction+2 ; load H.O. word");
    writeln(" mov cx, 10 ; string length");
    writeln(" lea si, IntString ; point to string");
    writeln(" call INTASC");
    writeln(" call FIXINT");
    writeln(" pop di ; point DI to decbuff");
    writeln(" call LOAD_DECBUF2");
    writeln(" mov B [di], '$' ; add termination character");
    writeln(" dec di ; push DI back to last character");
    writeln(" mov al, B [di] ; last embedded character");
    writeln(" cmp al, '0' ; is it an ascii '0'");
    writeln(" jz FRAC_END");
    writeln(" mov al, [di-1] ; previous character");
    writeln(" cmp al, '0' ; is this an ascii '0'");
    writeln(" jnz FRAC_END");
    writeln(" mov B [di], '0' ; make final character a '0'");
    writeln("FRAC_END:");


    writeln(" ret");
    writeln("FLOAT2ASC ENDP");
    writeln("; --------------------------------------------------");
}
/*---------- end PrintFlt ----------*/

void PrintInt()
{
    writeln("PRINTINT PROC NEAR");
    writeln(" mov ax, W integer \t\t; load L.O. word");
    writeln(" mov dx, W integer+2 \t\t; load H.O. word");
    writeln(" lea si, IntString");
    writeln(" mov cx, 10");
    writeln(" call INTASC \t\t\t; convert integer to ascii");
    writeln(" call FIXINT");
    writeln(" call LOAD_DECBUF1");
    writeln(" mov B [di], '$' \t\t; add termination character");
    writeln(" mov dx, offset(decbuff)");
    writeln(" call PRINTSTR");
    writeln(" ret");
    writeln("PRINTINT ENDP");
    writeln("; --------------------------------------------------");
}
/*---------- end PrintInt ----------*/

void PrintInt1()
{
    writeln("FIXINT PROC NEAR");
    writeln(" mov ax, offset(IntString)");
    writeln(" mov di, ax");
    writeln(" mov si, ax");
    writeln(" mov cl, 10");
    writeln("Fixint1:");
    writeln(" cmp byte ptr [si], ' ' ; is it a space");
    writeln(" jnz Fixint2 \t\t\t; if not, exit loop");
    writeln(" inc si");
    writeln(" loop Fixint1");
    writeln(" jmp FixEnd");
    writeln("Fixint2:");
    writeln(" mov al, byte ptr [si]");
    writeln(" mov [di], al");
    writeln(" inc si");
    writeln(" inc di");
    writeln(" loop Fixint2");


    writeln(" mov B [di], '$'");
    writeln("FixEnd:");
    writeln(" ret");
    writeln("FIXINT ENDP");
    writeln("; --------------------------------------------------");
}
/*---------- end PrintInt1 ----------*/

void PrintInt2()
{
    writeln("INTASC PROC NEAR");
    writeln(" mov di, si ; save start of string");
    writeln("IA1:");
    writeln(" mov BYTE PTR [si], ' ' ; fill character");
    writeln(" inc si ; point to next field position");
    writeln(" loop IA1 ; loop until done");
    writeln(" div C10000 ; divide by 10,000");
    writeln(" mov bx, ax ; save quotient");
    writeln(" mov ax, dx ; move remainder back to ax");
    writeln("IA2:");
    writeln(" mov cx, 4 ; number of digits to print");
    writeln("IA3:");
    writeln(" xor dx, dx ; clear High Order Word");
    writeln(" div C10 ; divide by ten");
    writeln(" add dl, '0' ; convert to ascii digit");
    writeln(" dec si ; step backwards thru buffer");
    writeln(" cmp si, di ; out of space ?");
    writeln(" jb IAX ; yes, quit");
    writeln(" mov [si], dl ; store digit");
    writeln(" or ax, ax ; all digits printed ?");
    writeln(" jnz IA4 ; no, keep on going");
    writeln(" or bx, bx ; any more work ?");
    writeln(" jz IAX ; no, can quit");
    writeln("IA4:");
    writeln(" loop IA3 ; next digit");
    writeln("IA5:");
    writeln(" or bx, bx ; more work to do ?");
    writeln(" jz IAX ; no, can quit");
    writeln(" mov ax, bx ; get next 4 digits");
    writeln(" xor bx, bx ; show no more digits");
    writeln(" jmp IA2 ; keep on going");
    writeln("IAX:");
    writeln(" ret");
    writeln("INTASC ENDP");
    writeln("; --------------------------------------------------");
}
/*---------- end PrintInt2 ----------*/
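The INTASC procedure can be hard to follow in assembly, so here is a sketch of the same idea in C. It is an approximation written for this explanation, not a translation used by Bxbasic: the name "int_to_field" is invented, and the value is assumed to be below 100,000,000 so that it splits into at most two groups of four digits. The value is first divided by 10,000, which keeps each later division small enough for a 16 bit register, and the digits are then stored right to left into a field that was filled with spaces.

```c
#include <string.h>

/* Write "value" right-justified into a field of "width" characters,
   padded on the left with spaces and followed by a '$' terminator.
   "field" must hold width + 1 bytes. Assumes value < 100,000,000. */
void int_to_field(unsigned long value, char *field, int width)
{
    unsigned group[2];
    int pos = width;
    int g, i;

    memset(field, ' ', (size_t)width);        /* fill character        */
    field[width] = '$';                       /* DOS string terminator */

    group[0] = (unsigned)(value % 10000UL);   /* remainder: low digits */
    group[1] = (unsigned)(value / 10000UL);   /* quotient: high digits */

    for (g = 0; g < 2; g++)
    {
        unsigned part = group[g];

        for (i = 0; i < 4; i++)               /* four digits per group */
        {
            if (pos == 0)                     /* out of space ?        */
                return;
            field[--pos] = (char)('0' + part % 10);
            part /= 10;
            /* quit once no digits remain in this or the next group */
            if (part == 0 && (g == 1 || group[1] == 0))
                return;
        }
    }
}
```

For example, a value of 194 with a width of 10 produces seven spaces followed by "194" and the '$' terminator. The leading spaces are what the FIXINT procedure later removes.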


void PrintInt3() /*NEW*/
{
    writeln("LOAD_DECBUF1 PROC NEAR");
    writeln(" lea di, decbuff ; point to decimal buffer");
    writeln(" push di ; save address");
    writeln(" mov cx, 25 ; length of decimal buffer");
    writeln("CLR_BFFR:");
    writeln(" mov B [di], 0 ; zero buffer");
    writeln(" inc di ; next byte");
    writeln(" loop CLR_BFFR");
    writeln(" pop di ; reload buffer address");
    writeln(" lea si, IntString ; address of integer string");
    writeln(" mov cx, si ; copy address to CX");
    writeln(" add cx, 10 ; point to end of string");
    writeln(" cmp decsign, 0 ; test decsign for positive");
    writeln(" jz POS_SIGN1");
    writeln(" mov B [di], '-' ; make negative");
    writeln(" jmp POS_SIGN2");
    writeln("POS_SIGN1:");
    writeln(" mov B [di], ' ' ; insert space");
    writeln("POS_SIGN2:");
    writeln(" inc di ; point to next character");
    writeln(" mov al, [si] ; get ascii number");
    writeln(" cmp al, '$' ; at the end yet ?");
    writeln(" jz SET_DECIMAL");
    writeln(" mov [di], al ; store ascii number");
    writeln(" inc si ; increment source pointer");
    writeln(" cmp si, cx ; is SI at end ?");
    writeln(" jz SET_DECIMAL");
    writeln(" jmp POS_SIGN2 ; repeat");
    writeln("SET_DECIMAL:");
    writeln(" mov B [di], '.' ; insert decimal");
    writeln(" ret");
    writeln("LOAD_DECBUF1 ENDP");
    writeln("; --------------------------------------------------");
}
/*---------- end PrintInt3 ----------*/

void PrintInt4() /*NEW*/
{
    writeln("GET_SIGN PROC NEAR");
    writeln(" mov decsign, 0 ; clear sign-flag");
    writeln(" FLDZ ; load a zero");
    writeln(" FCOM ; compare st(1) to st(0)");
    writeln(" FSTSW sw87 ; store x87 status word");
    writeln(" FWAIT");
    writeln(" mov ax, sw87 ; copy status register");
    writeln(" and ax, 4700h ; 0100-0111-0000-0000b");


    writeln(" jnz SIGN_POSITIVE");
    writeln(" mov decsign, 1 ; set negative sign-flag");
    writeln("SIGN_POSITIVE:");
    writeln(" FINIT");
    writeln(" ret");
    writeln("GET_SIGN ENDP");
    writeln("; --------------------------------------------------");
}
/*---------- end PrintInt4 ----------*/

Save file: Asmfunct.c, and close it. Now open file: Asminput.c. And replace this existing function:

void SystemConst()
{
    writeln("; --------------------------------------------------");
    writeln("; Do Not Delete !!!: Constants and system variables");
    writeln("; --------------------------------------------------");
    writeln("sw87 dw 0 \t\t; x87 status word");
    writeln("integer dd 0");
    writeln("fraction dd 0");
    writeln("decbuff db 25 dup(0)");
    writeln("decsign db 0");
    writeln("x87buff dq 0");
    writeln("; --------------------------------------------------");
    writeln("NEWLINE DB 13,10,\'$\'");
    writeln("SIXTN DB 16 \t\t; segment multiplier");
    writeln("C10 DW 10 \t\t; constant for division");
    writeln("C10000 DW 10000 \t\t; constant for division");
    writeln("EndProg DW 0 \t\t; ZSEG: last segment");
    writeln("NextFree DW 0");
    writeln("IntString DB '          ','$',' ' ; integer string");
    writeln("; --------------------------------------------------");
}
/*---------- end SystemConst ----------*/

Save Asminput.c and close it.


Next, open file: Prototyp.h. And update this list of prototypes:

/* Asmfunct.c */
void Do_cls(void);
void Do_end(void);
void Do_functions(void);
void ClrScrn(void);
void Do_beep(void);
void Do_print(void);
void PrintStr(void);
void PrintChr(void);
void Do_goto(void);
void Do_label(void);
void Do_gosub(void);
void Do_return(void);
void Do_locate(void);
void StringCopy(void);
void StringCat(void);
void PrintInt(void);
void PrintInt1(void);
void PrintInt2(void);
void PrintInt3(void);
void PrintInt4(void);
void PrintFlt(void);
void PrintFlt1(void);

Save file: Prototyp.h and close it. Now, using your C compiler, compile Bxbasm.c. Next, copy this version of Basic script to file: Test.bas

test.bas version 16.1
Integer = 10
Long% = 100
Float! = 1.5
Double# = 2.123
result! = Double# * (Float! + (Long% - Integer))
PRINT result!
' -----------------------------------------
END
' -----------------------------------------
'

Be sure it's in the same directory that contains A86.com. And, make sure you have a newly compiled copy of Bxbasm.exe in that same directory. Using Bxbasm.exe compile Test.bas.

bxbasm test.bas


Then execute Test.com (display): 194.25448600
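As a cross-check of the displayed result, the same expression can be evaluated in C using single-precision variables. This is only a sketch: the function name "test_expression" is invented here, and "Double#" is held in a float because the generated code loads that constant from a 32 bit (DD) value. The exact product of 2.123 and 91.5 is 194.2545; single precision carries roughly seven significant decimal digits, so any digits displayed beyond that reflect the binary representation rather than the original constants.

```c
/* Evaluate the Test.bas expression in single precision.
   The variable names follow the Basic script. */
float test_expression(void)
{
    int   integer_v = 10;       /* Integer */
    long  long_v    = 100;      /* Long%   */
    float float_v   = 1.5f;     /* Float!  */
    float double_v  = 2.123f;   /* Double#, loaded from a 32 bit constant */

    /* result! = Double# * (Float! + (Long% - Integer)) */
    return double_v * (float_v + (float)(long_v - integer_v));
}
```

Printing the returned value with a format such as "%.8f" shows the same leading digits as the Test.com display, which is a quick way to confirm that the conversion routines are behaving.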

There!
After all of that...! We can now perform math operations using both integers and floating point numbers and display the results. Examine the Assembly Language source code for Test.asm:

;
; *************BxbAsm Compiler*************

    jmp START
;
IntegerI DW 0
VARX1I DW 10
LongL DD 0
VARX2I DW 100
FloatF DD 0
VARX3F DD 1.5
DoubleD DQ 0
VARX4F DD 2.123
resultF DD 0
; --------------------------------------------------
; Do Not Delete !!!: Constants and system variables
; --------------------------------------------------
sw87 dw 0                   ; x87 status word
integer dd 0
fraction dd 0
decbuff db 25 dup(0)
decsign db 0
x87buff dq 0
; --------------------------------------------------
NEWLINE DB 13,10,'$'
SIXTN DB 16                 ; segment multiplier
C10 DW 10                   ; constant for division
C10000 DW 10000             ; constant for division
EndProg DW 0                ; ZSEG: last segment
NextFree DW 0
IntString DB '          ','$',' ' ; integer string
; --------------------------------------------------
; --------------------------------------------------
START PROC NEAR
; --------------------------------------------------
    mov ax, ZSEG            ; last byte of program
    div SIXTN
    mov ah, 0
    inc al
    mul SIXTN
    mov EndProg, ax         ; store value
    mov NextFree, ax
; --------------------------------------------------
    FINIT                   ; initialize x87
    FILD VARX1I             ; load integer


    FISTP IntegerI          ; store integer, pop stack
    FINIT                   ; initialize x87
    FILD VARX2I             ; load integer
    FISTP LongL             ; store integer, pop stack
    FINIT                   ; initialize x87
    FLD VARX3F              ; load float
    FSTP FloatF             ; store float, pop stack
    FINIT                   ; initialize x87
    FLD VARX4F              ; load float
    FSTP DoubleD            ; store float, pop stack
    FINIT                   ; initialize x87
    FLD DoubleD             ; load float
    FLD FloatF              ; load float
    FILD LongL              ; load integer
    FILD IntegerI           ; load integer
    FSUBP ST(1), ST         ; subtract and pop
    FADDP ST(1)             ; add to ST(0) and pop
    FMULP ST(1)             ; multiply and pop
    FSTP resultF            ; store float, pop stack
    FLD resultF             ; load floating point number
    FST x87buff             ; store in print buffer
    call GET_SIGN
    lea si, x87buff         ; point SI to 32 bit operand
    call FLOAT2ASC
    mov dx, offset(decbuff)
    call PRINTSTR
    mov dx, offset(NEWLINE)
    call PRINTSTR
    jmp DONE
;
DONE:
    INT 20H
START ENDP
; --------------------------------------------------
PRINTSTR PROC NEAR
    push ax
    mov ah, 9
    INT 21H
    pop ax
    ret
PRINTSTR ENDP
; --------------------------------------------------
FIXINT PROC NEAR
    mov ax, offset(IntString)
    mov di, ax
    mov si, ax
    mov cl, 10
Fixint1:
    cmp byte ptr [si], ' '  ; is it a space
    jnz Fixint2             ; if not, exit loop
    inc si
    loop Fixint1
    jmp FixEnd
Fixint2:
    mov al, byte ptr [si]


    mov [di], al
    inc si
    inc di
    loop Fixint2
    mov B [di], '$'
FixEnd:
    ret
FIXINT ENDP
; --------------------------------------------------
INTASC PROC NEAR
    mov di, si              ; save start of string
IA1:
    mov BYTE PTR [si], ' '  ; fill character
    inc si                  ; point to next field position
    loop IA1                ; loop until done
    div C10000              ; divide by 10,000
    mov bx, ax              ; save quotient
    mov ax, dx              ; move remainder back to ax
IA2:
    mov cx, 4               ; number of digits to print
IA3:
    xor dx, dx              ; clear High Order Word
    div C10                 ; divide by ten
    add dl, '0'             ; convert to ascii digit
    dec si                  ; step backwards thru buffer
    cmp si, di              ; out of space ?
    jb IAX                  ; yes, quit
    mov [si], dl            ; store digit
    or ax, ax               ; all digits printed ?
    jnz IA4                 ; no, keep on going
    or bx, bx               ; any more work ?
    jz IAX                  ; no, can quit
IA4:
    loop IA3                ; next digit
IA5:
    or bx, bx               ; more work to do ?
    jz IAX                  ; no, can quit
    mov ax, bx              ; get next 4 digits
    xor bx, bx              ; show no more digits
    jmp IA2                 ; keep on going
IAX:
    ret
INTASC ENDP
; --------------------------------------------------
LOAD_DECBUF1 PROC NEAR
    lea di, decbuff         ; point to decimal buffer
    push di                 ; save address
    mov cx, 25              ; length of decimal buffer
CLR_BFFR:
    mov B [di], 0           ; zero buffer
    inc di                  ; next byte
    loop CLR_BFFR
    pop di                  ; reload buffer address
    lea si, IntString       ; address of integer string
    mov cx, si              ; copy address to CX


    add cx, 10              ; point to end of string
    cmp decsign, 0          ; test decsign for positive
    jz POS_SIGN1
    mov B [di], '-'         ; make negative
    jmp POS_SIGN2
POS_SIGN1:
    mov B [di], ' '         ; insert space
POS_SIGN2:
    inc di                  ; point to next character
    mov al, [si]            ; get ascii number
    cmp al, '$'             ; at the end yet ?
    jz SET_DECIMAL
    mov [di], al            ; store ascii number
    inc si                  ; increment source pointer
    cmp si, cx              ; is SI at end ?
    jz SET_DECIMAL
    jmp POS_SIGN2           ; repeat
SET_DECIMAL:
    mov B [di], '.'         ; insert decimal
    ret
LOAD_DECBUF1 ENDP
; --------------------------------------------------
GET_SIGN PROC NEAR
    mov decsign, 0          ; clear sign-flag
    FLDZ                    ; load a zero
    FCOM                    ; compare st(1) to st(0)
    FSTSW sw87              ; store x87 status word
    FWAIT
    mov ax, sw87            ; copy status register
    and ax, 4700h           ; 0100-0111-0000-0000b
    jnz SIGN_POSITIVE
    mov decsign, 1          ; set negative sign-flag
SIGN_POSITIVE:
    FINIT
    ret
GET_SIGN ENDP
; --------------------------------------------------
FLOAT2ASC PROC NEAR
    xor ax, ax              ; zero AX register
    mov al, decsign         ; load sign flag
    cmp ax, 0               ; 0=positive, 1=negative
    jz IS_POSVAL
    FLD x87buff             ; load floating point number
    FCHS                    ; change sign to positive
    FSTP x87buff            ; store as positive number
IS_POSVAL:
    clc                     ; EXTRACT WHOLE NUMBER:
    FILD C10000             ; load multiplier
    FLD x87buff             ; load floating point number
    FLD x87buff             ; twice
    FRNDINT                 ; round to integer
    FXCH                    ; exchange st(0) with st(1)
    FCOM                    ; did number round upward ?
    FSTSW sw87              ; store x87 status word
    FWAIT


    and sw87, 100h          ; strip away excess bits
    cmp sw87, 100h          ; is H.O. '1' bit set
    jne NOT_GREATER
    FXCH                    ; exchange st(0) with st(1)
    FLD1                    ; load '1'
    FSUBP st(1)             ; decrement st(1)
    jmp RESUME_CONV
NOT_GREATER:
    FXCH                    ; exchange st(0) with st(1)
RESUME_CONV:
    FIST integer            ; save integer value
    FSUBP st(1)             ; subtract integer from float
    FMUL st(1)              ; multiply x 10,000
    FMULP st(1)             ; multiply x 10,000 and pop
    FRNDINT                 ; round to integer
    FISTP fraction          ; store fraction as integer
DO_INTEGER:
    mov ax, W integer       ; load L.O. word
    mov dx, W integer+2     ; load H.O. word
    mov cx, 10              ; string length
    lea si, IntString       ; point to string
    call INTASC
    call FIXINT
    call LOAD_DECBUF1
    push di                 ; save decbuff pointer
DO_FRACTION:
    mov ax, W fraction      ; load L.O. word
    mov dx, W fraction+2    ; load H.O. word
    mov cx, 10              ; string length
    lea si, IntString       ; point to string
    call INTASC
    call FIXINT
    pop di                  ; point DI to decbuff
    call LOAD_DECBUF2
    mov B [di], '$'         ; add termination character
    dec di                  ; push DI back to last character
    mov al, B [di]          ; last embedded character
    cmp al, '0'             ; is it an ascii '0'
    jz FRAC_END
    mov al, [di-1]          ; previous character
    cmp al, '0'             ; is this an ascii '0'
    jnz FRAC_END
    mov B [di], '0'         ; make final character a '0'
FRAC_END:
    ret
FLOAT2ASC ENDP
; --------------------------------------------------
LOAD_DECBUF2 PROC NEAR
    lea si, IntString       ; address of integer string
    mov cx, si              ; copy source address to CX
    add cx, 10              ; point to end of string
LOADER1:
    inc di                  ; increment destination
    mov al, [si]            ; get ascii number
    cmp al, '$'             ; at the end yet ?


    jz DONE_DECBUF
    mov [di], al            ; store ascii number
    inc si                  ; increment source pointer
    cmp si, cx              ; is SI at end ?
    jz DONE_DECBUF
    jmp LOADER1             ; repeat
DONE_DECBUF:
    ret
LOAD_DECBUF2 ENDP
; --------------------------------------------------
ZSEG:
; --------------------------------------------------

Load Test.com into D86.com. Single-step your way through the code and make sure you understand what is happening. Use paper and pencil and keep track of what is going on in the x87 Stack registers and variable "decbuff". After you have Test.com loaded into D86, type:

1b,decbuff<enter>
You can use memory windows 2 thru 6 to monitor the contents of any other variables you wish, too. Run a test for negative values by modifying Test.bas to include single-precision variable "Negative", so that it reads as follows:

test.bas version 16.2
Negative! = 0
Integer = 10
Long% = 100
Float! = 1.5
Double# = 2.123
result! = Double# * (Float! + (Long% - Integer))
Negative! = 0 - result!
PRINT Negative!
' -----------------------------------------
END
' -----------------------------------------
'

Again, using Bxbasm.exe compile Test.bas:

bxbasm test.bas


Then execute Test.com (display): -194.25448600

As you can see, even negative floating point values are correctly converted to ascii characters.

CONCLUSION
Well, that's it for this chapter. I hope that this issue has been of some help in understanding Floating Point numbers, how they are stored in memory and the decoding process. We've also taken another glance at some of the x87 instructions and learned how to make use of the "status word". By now we've learned that, because the display is a character based system, there is no direct way to send a numeric value to the output routines without some sort of value-to-ascii conversion beforehand.

With the conclusion of this chapter, it may appear that we took the long way around to printing numbers. But, if you are reading this tutorial with the intention of writing your own programming language compiler or scripting engine, then these are just some of the many things that you need to know. The more you know about the machine (CPU, FPU) and the operating system, the more successful you will be at writing a working compiler that other programmers will want to use.

Steve Arbayo


Acknowledgements
“City_Zen”, who created the QDepartment Yahoo Group (egroups), then handed me the helm.
Darren Turland, who submitted the first bit of code to this group, that began to open my eyes about how a programming language interpreter worked.
Pavel Minayev, who offered the much needed early technical assistance.
Brian Christopher, whose technical assistance has been invaluable.
Jack Crenshaw, whose work I studied, over and over again, until I began to understand Recursive Descent Parsers.

If you find any errors or typos you'd like to report, you can contact me through the QDepartment at: http://tech.groups.yahoo.com/group/QDeprtment/ Or: blunt_axe_basic@yahoo.com



				
DOCUMENT INFO
Description: Bxbasic is presented as a programming tutorial, to develop and construct a Console Mode Scripting Engine and Byte Code Compiler. The Bxbasic dialect, included here, is a subset of the GW-Basic and QBasic programming languages.