Thursday, 20 November 2014

BASIC Interpreter


I have been doodling a BASIC interpreter, written in Processing. A BASIC interpreter is something I've always wanted to do, with varying levels of ambition, resulting in various abandoned attempts.  "Abandoned" meaning they couldn't be said to work as BASIC. Now, the current project is beginning to resemble a working language.
The multiplication table in "hi-res". Note that the FOR...NEXT structure is still absent. Also, the parser currently requires some superfluous brackets.
Instead of trying to create a modern IDE + compiled BASIC (of which there are plenty), I wanted to make a sort of virtual computer inspired by 1980s machines, with limited fixed-memory space and limited character "display modes".

I'm also trying to project how BASIC-based computers might have developed if history had followed another course. This shouldn't be taken too seriously, though. What's here is a mash-up of cherry-picked features benefitting from hindsight. For example, even if I've kept the graphical capacity of the "computer" pretty low, the storage space is near-infinite. This was not at all the situation at the end of the 1980s.

Although I admire various BASIC interpreters from the 1980s, the most significant source of inspiration is the Sinclair QL SuperBASIC environment. I like the idea that a computer would be used not so much for running "apps" as for providing enough building blocks for making interesting things. Arguably Linux/UNIX environments do this already, but woefully they inherit a bit too much from the text-terminal days. The BASIC interpreters, by contrast, were directly coupled to visuals and sound. Currently the emphasis is on a character display-style interaction with the visuals, but other "display modes" could be inserted just as well.
The obligatory "scroll some random garbage on screen" program.
The virtual machine contains 128K of "RAM", which is partially mapped as character display, character set graphics, free memory and system parameters, BASIC listing, variables and stack, just like the 8-bit computers.  I'd say this arrangement was even helpful at times, as it creates a straightforward division between "in-computer" variables and the Processing variables: the entire memory and all the virtual computer parameters are simply inside a byte array in Processing.
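As a rough illustration of the "everything is inside one byte array" idea, here is a minimal sketch in Python rather than Processing. It is not the actual code of this project: the region address, the display width and the helper names (peek, poke, put_char) are all made up for the example.

```python
# Hypothetical sketch: the whole virtual machine lives in one byte array.
MEMORY_SIZE = 128 * 1024   # 128K, as described in the post

# The two constants below are invented for illustration only.
SCREEN_BASE = 0x0400       # assumed start of the character display region
SCREEN_COLS = 40           # assumed display width in characters

memory = bytearray(MEMORY_SIZE)

def peek(address):
    """Return the byte stored at the given address."""
    return memory[address]

def poke(address, value):
    """Store a value at the given address, wrapped to one byte (0-255)."""
    memory[address] = value & 0xFF

def put_char(col, row, char):
    """Write a character code straight into the display region."""
    poke(SCREEN_BASE + row * SCREEN_COLS + col, ord(char))

put_char(0, 0, "A")
print(peek(SCREEN_BASE))   # 65
```

Since the BASIC side only ever sees peek and poke, anything the host program keeps outside the array stays invisible to the "computer".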

Although a Sinclair-style separation of graphics and command entry has merits, I went for full-screen cursor editing. (With some as yet half-baked windowing features.) The BASIC listing is line-numbered, something I chose for simplicity but also for a certain immediacy. If it's not meant to be an IDE, let's not even go halfway there. Supporting some labels might still be viable, and I hope to have Sinclair QL-style procedures unless they prove to be too difficult.
A fragment from a BASIC program for editing the character set. It's a bit slow.
I kept the amount of memory low (128 kilobytes) because I wanted instant saving and restoring of snapshots of the entire machine state, and for this reason the images should not take too much space on the disk. Saving the entire memory is much like saving a snapshot in an emulator: everything is conveniently contained in it. Colours, character display, character set, cursor position, BASIC listing, system variables, just everything.
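Because the whole machine state is one block of bytes, a snapshot can be as simple as writing that block to a file and reading it back. The following is only a sketch under that assumption, in Python rather than Processing; the function names and the file name are hypothetical.

```python
import os
import tempfile

MEMORY_SIZE = 128 * 1024   # 128K, as described in the post

def save_snapshot(memory, path):
    """Write the entire memory to disk as one raw image."""
    with open(path, "wb") as f:
        f.write(memory)

def load_snapshot(memory, path):
    """Overwrite the entire memory with a previously saved image."""
    with open(path, "rb") as f:
        data = f.read()
    if len(data) != len(memory):
        raise ValueError("snapshot size does not match memory size")
    memory[:] = data

memory = bytearray(MEMORY_SIZE)
memory[64] = 1                                   # some machine state
path = os.path.join(tempfile.mkdtemp(), "demo.snap")
save_snapshot(memory, path)
memory[64] = 99                                  # state changes afterwards
load_snapshot(memory, path)                      # and is restored in one go
print(memory[64], os.path.getsize(path))         # 1 131072
```

Each image is exactly 131072 bytes, so even a large pile of snapshots stays small on disk.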

One difference to emulators is that the snapshots can be loaded with commands during run-time. A major reason for this is to enable hypertext elements for traversing between snapshot images. So it's not necessary to "program" anything to make use of the screen editor. The old BASIC systems worked as on-screen scratchpads, and here, before the BASIC was functional, I could use the screen editor as a notepad for future features and to-do lists and general thinking-out-loud.

The system variable reference, as a memory snapshot. The [bracketed] text typed on-screen acts as commands, which may invoke other memory snapshots.
However, I did not go so far as to emulate a processor on which the BASIC would run. This would have been cool but far too difficult and time-consuming for me at this moment. I have to raise my hat to anyone who made these things work in the 1980s and before. Here the Processing code parses the BASIC commands, arranges all the memory reads and writes, and handles the updating of the display.

What's the most difficult part here? The "pipeline" of functions that identify the keywords and elements in a BASIC statement and return values to various expressions and functions.

This kind of line could be entered in the full-screen character display:

CLS: PRINT "Hello world: ";PEEK(32128+PEEK(64));" times"

After the program scoops the ASCII string from the full-screen editor, it is sent to the pre-parser, which decides whether it ought to be added to the BASIC listing (not in this case, because there is no line number) or run immediately.
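The store-or-run decision described above could look roughly like this. It is a Python sketch, not the Processing code of the interpreter, and it assumes the listing is a plain mapping from line number to text; the name pre_parse is hypothetical.

```python
def pre_parse(line, listing):
    """Store a numbered line in the listing, or return the text to run now.

    Returns None when the line was stored, otherwise the immediate command.
    """
    text = line.strip()
    number_part = ""
    for ch in text:
        if ch not in "0123456789":
            break
        number_part += ch
    if number_part:
        # A leading line number means "add to the BASIC listing".
        listing[int(number_part)] = text[len(number_part):].strip()
        return None
    # No line number: hand the text back for immediate execution.
    return text

listing = {}
print(pre_parse('10 PRINT "HI"', listing))       # None
print(listing)                                   # {10: 'PRINT "HI"'}
print(pre_parse('CLS: PRINT "Hello"', listing))  # CLS: PRINT "Hello"
```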
A one-time scheme of the command pipeline, as a memory snapshot.
When run, the pre-parser splits the line into two separate command lines, CLS and PRINT, divided by colons (:), yet ignoring the colon inside the quotation marks. So far quite simple. Then the command lines are sent one by one to the proper parser, which splits the statement into components. The components are separated by spaces, commas, brackets and so on. (This is a bit more complex than I'm letting on.)
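The colon splitting can be sketched with a single pass that tracks whether the current character is inside quotation marks. This is a Python illustration of the idea, not the interpreter's own code, and split_statements is a hypothetical name.

```python
def split_statements(line):
    """Split a line into statements at colons that are outside quotes."""
    statements = []
    current = ""
    in_quotes = False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes      # entering or leaving a string
        if ch == ":" and not in_quotes:
            statements.append(current.strip())
            current = ""
        else:
            current += ch
    statements.append(current.strip())
    return [s for s in statements if s]    # drop empty statements

line = 'CLS: PRINT "Hello world: ";PEEK(32128+PEEK(64));" times"'
for statement in split_statements(line):
    print(statement)
# CLS
# PRINT "Hello world: ";PEEK(32128+PEEK(64));" times"
```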

With PRINT, the parser needs to handle an arbitrary number of expressions, which can be strings or numeric variables or functions, usually separated with a comma. All of these are sent one by one to the Resolver, which identifies some parts as strings and others as something to be sent to the Calculator/Counter. The Counter is a recursive function which breaks down the PEEK(32128+PEEK(64)) part, starting with the PEEK(64), re-working the initial string into PEEK(32128+1) (for example) and then into PEEK(32129), which is finally seen as a straightforward function, returning the contents of the address 32129, which might be 255. This is turned into a string and put on screen.
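The string re-working described above can be imitated in a few lines. This Python sketch is a simplification and not the actual Counter: it only understands PEEK and "+", and the memory contents (1 at address 64, 255 at address 32129) are the example values from the text. The names resolve and add_terms are hypothetical.

```python
import re

memory = bytearray(128 * 1024)
memory[64] = 1          # example values taken from the text above
memory[32129] = 255

# Matches a PEEK whose argument contains no further brackets (innermost).
INNERMOST_PEEK = re.compile(r"PEEK\(([^()]*)\)")

def add_terms(expr):
    """Sum a bracket-free expression of integers joined by '+'."""
    return sum(int(term) for term in expr.split("+"))

def resolve(expr, trace):
    """Recursively rewrite expr until only a number remains."""
    trace.append(expr)
    match = INNERMOST_PEEK.search(expr)
    if match is None:
        return add_terms(expr)
    argument = match.group(1)
    if "+" in argument:
        # First reduce the arithmetic inside the brackets.
        replacement = "PEEK(" + str(add_terms(argument)) + ")"
    else:
        # A plain address: replace the call with the byte stored there.
        replacement = str(memory[int(argument)])
    rewritten = expr[:match.start()] + replacement + expr[match.end():]
    return resolve(rewritten, trace)

steps = []
print(resolve("PEEK(32128+PEEK(64))", steps))   # 255
print(steps)
# ['PEEK(32128+PEEK(64))', 'PEEK(32128+1)', 'PEEK(32129)', '255']
```

Rewriting the string at every step is not efficient, but each intermediate form can be printed and inspected, which suits an improvised parser.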

This is all a bit messy, and my hands-on style of improvisational programming means that a change in one part easily leads to an unexpected result in some other part. It's all been going fairly well, though.
Possibly support for the Cyrillic alphabet...
Is it available? No, at least not yet. Probably never, if past projects are any indication. There's too much work to be done ensuring the program works nicely and does not provide a potential backdoor for destroying somebody's files. If I write programs only for myself, there is a massive amount of work I do not need to care about (better error-handling, some user-friendliness, manuals, support). There's a message here somewhere, I think.

Why? It's an interesting thing to do. Also it might help in devising scripting languages and word parsing for something unrelated.
