Old Machinery: assembler


Sunday, 21 March 2021

Leilei Relay

Baby UFOs in UFOland have to train arduously before they can leave their homeworld.

Historically, the most infamous of these training grounds was the Lei Lei Relay.

Collect the lanterns to advance to the next field, up until you reach the UFO cathedral. There you will attain the title of UFO master.

Guide the UFO using a joystick in joystick port 2. Left and right turn the ship, while both up and fire apply thrust. Pressing the '1' key on the keyboard switches the music on/off.

On and off I have been working on new Commodore 64 games. Most of them do not get much farther than early stages, so I really, really wanted to do something so small that it would get finished.

Even then, this game turned out to be somewhat bloated compared to my original intentions. So it's a small game that became slightly bigger than I wanted it to be.

It does have multicolor character graphics, sprites, smooth vertical scrolling, inertia, polar coordinate tables, multiple SID tunes and sound effects.

My original intention was to make a one-screen lander-type game. Some inspiration has been drawn from lander games and Space Taxi.

But it's closer to a gravity game like Thrust in the sense there's no actual landing involved. Instead, you need to collect all the lanterns from each room to proceed to the next one. The quicker this can be done, the more bonus points can be achieved. And that's it.

The weird shape of the player ship gives the game some additional spice. It needs to be turned into a vertical position to help squeeze it through some of the more narrow passages.

Unlike with Fort Django and Digiloi, there's no shooting, which makes this game simpler. But it's likely to be infernally difficult.

There are no PETSCII character graphics this time. I have some ideas about their use for other games, which may or may not be completed!

Tile map layout

As usual, I have come to trust Tiled for map-making. Instead of PETSCII I now used multicolor character graphics. The pipeline from character editing to "meta-tiles" would deserve a closer look but I'll leave that for another time. Let's just say experience in PETSCII gives a good start for optimizing character graphics, but it does have its own peculiarities.

First level in Tiled

At the center of the game routine is the scrolling screen buffer. Instead of updating screen portions piece by piece, the whole screen is redrawn for each frame.

I abandoned the character-specific colors to make this easier. This gives the game a somewhat monochromatic appearance. I tried to alleviate it by using different colors for different levels, and mixing the colors when possible.

There's a huge ordered buffer in the memory, but at least it made some other things easier for me. The level data in itself takes very little space, but testing levels and putting them in a reasonable order is time consuming so I went for a fairly low number.

More technical parts

The brutal redraw routine means I can scroll the area at whatever speed I like. Drawing changes into the buffer is also rather simple to calculate, as each Y-coordinate works as an increment to the high byte of the address. So if the first line of the buffer is at $8000, the next line is at $8100 and so on.
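To make the address arithmetic concrete, here is a rough Python sketch (not the game's code). It assumes the example layout above, a buffer at $8000 with one 256-byte page per line; the function name and the column parameter are made up for the illustration.

```python
BUFFER_BASE = 0x8000  # example base address from the text


def buffer_address(row, column):
    """Address of a cell when every buffer line sits on its own 256-byte page."""
    hi = (BUFFER_BASE >> 8) + row  # the Y-coordinate is simply added to the high byte
    lo = column                    # the X-coordinate is the low byte
    return (hi << 8) | lo


print(hex(buffer_address(0, 0)))   # 0x8000
print(hex(buffer_address(1, 0)))   # 0x8100
print(hex(buffer_address(5, 12)))  # 0x850c
```

No multiplication is needed, which is the whole point of spending a full page on each line.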

Polar coordinates and inertia were a new thing to me. I had already made the routines for a more complex game idea, but they found a better home in Leilei Relay.

The UFO coordinates are 24-bit values, but this is less complex than it might sound.

24-bit coordinate:

[Low byte][High byte ][Highest byte ]

Sprite coordinate:

[Sprite LSB][Sprite MSB bit]

The high and highest byte are transferred directly to sprite screen coordinates, so that "highest" gives the MSB bit for the sprite coordinates higher than 255.

The low byte works as a kind of sub-pixel coordinate: each time it increments past 255, the sprite has visibly moved one pixel.
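The byte layout can be modelled in a few lines of Python. This is only a sketch of the idea: the coordinate is packed into one integer for convenience, and the function names are invented here.

```python
def sprite_position(lo, hi, highest):
    """Map the three coordinate bytes to a sprite X value and its MSB bit."""
    sprite_lsb = hi           # whole pixels 0..255
    sprite_msb = highest & 1  # set when the pixel position is 256 or more
    return sprite_lsb, sprite_msb


def add_subpixels(coord, amount):
    """Add sub-pixel units to a packed 24-bit coordinate, wrapping at 24 bits."""
    return (coord + amount) & 0xFFFFFF


coord = 0x00FFF0                    # pixel 255, sub-pixel $F0
coord = add_subpixels(coord, 0x20)  # the low byte overflows and the carry ripples up
lo, hi, highest = coord & 0xFF, (coord >> 8) & 0xFF, coord >> 16
print(sprite_position(lo, hi, highest))  # (0, 1), i.e. pixel 256
```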

For thrusting the craft forward, a polar coordinate table is needed. These are 8-bit values centred around the point 128,128 ($80,$80). They have been generated using Processing.

move_anglex:
.byte $80,$80,$81,$81,$82,$82,$82,$83,$83,$84,$84, [...]
move_angley:
.byte $70,$70,$70,$70,$70,$70,$70,$70,$70,$70,$70, [...]

So the first values are $80 and $70, which indicates 128 and 112, denoting a 0,-16 vector.
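The original tables came out of Processing and that sketch is not shown here. As a guess at an equivalent formula, the following Python assumes a circle of radius 16 around 128, with angle 0 pointing up and 256 steps per full turn. Under those assumptions it reproduces the first eleven values quoted above.

```python
import math


def make_tables(steps=256, radius=16, centre=128):
    """Build X and Y thrust tables: a circle of the given radius around (centre, centre)."""
    xs, ys = [], []
    for i in range(steps):
        angle = 2 * math.pi * i / steps  # 0 = pointing up, increasing clockwise on screen
        xs.append(round(centre + radius * math.sin(angle)))
        ys.append(round(centre - radius * math.cos(angle)))
    return xs, ys


move_anglex, move_angley = make_tables()
print(" ".join(f"${v:02X}" for v in move_anglex[:11]))
print(" ".join(f"${v:02X}" for v in move_angley[:11]))
```

Calling make_tables(steps=32) would give one entry per visible angle instead.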

I have 256 values in the table although the UFO has only 32 visible angles. I won't go into the confusion this might create. I initially thought it would be nice to have more functional angles than visible angles, but after experimenting with it, this is really a no-no.

It's really annoying to see the ship move in a direction it is not pointing at, even if the difference is very subtle.

So the table may just as well be assumed to be 32 values long.

Oh, and the above table is not directly applied to the sprite coordinates, but to the ship inertia X/Y values, which are in turn 16-bit values.

These again have the initial value of $8000, $8000. (low byte=$0, high byte=$80)

If the ship angle is stored in register Y, then the above table can be used to alter inertia.

sec
lda inertia_x_lo
sbc #$80
sta inertia_x_lo

lda inertia_x_hi
sbc #$0
sta inertia_x_hi

sec
lda inertia_y_lo
sbc #$80
sta inertia_y_lo

lda inertia_y_hi
sbc #$0
sta inertia_y_hi

clc
lda inertia_x_lo
adc move_anglex,y
sta inertia_x_lo

lda inertia_x_hi
adc #$0
sta inertia_x_hi

clc
lda inertia_y_lo
adc move_angley,y
sta inertia_y_lo

lda inertia_y_hi
adc #$0
sta inertia_y_hi

Sooo the $80 is first subtracted from the inertia values, and then the motion (with the $80 baked in the values) is added. There must be a more clever way but this is the one I used.
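For checking the arithmetic, the same two steps can be written as a small Python model. It is a sketch that mirrors the 16-bit subtract and add with wrap-around; the function name is made up.

```python
def thrust(inertia, table_value):
    """Subtract the $80 bias, then add the biased table value, both as 16-bit operations."""
    inertia = (inertia - 0x80) & 0xFFFF         # SEC / SBC #$80 / SBC #$00
    inertia = (inertia + table_value) & 0xFFFF  # CLC / ADC table,y / ADC #$00
    return inertia


inertia_y = 0x8000                   # resting value
inertia_y = thrust(inertia_y, 0x70)  # table value $70 stands for -16
print(hex(inertia_y))                # 0x7ff0
```

The net effect is that the signed amount (table value minus $80) gets added to the inertia.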

The inertia values are then used similarly to affect the ship (24-bit) coordinates.

This is for the x coordinate; the y coordinate works the same way. It may be useful to keep them separate.

sec
lda ship_x_lo
sbc #$0
sta ship_x_lo

lda ship_x_hi
sbc #$80
sta ship_x_hi

lda ship_x_highest
sbc #$0
sta ship_x_highest

clc
lda ship_x_lo
adc inertia_x_lo
sta ship_x_lo

lda ship_x_hi
adc inertia_x_hi
sta ship_x_hi

lda ship_x_highest
adc #$0
sta ship_x_highest
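Again as a rough Python model of the listing above, with the 24-bit coordinate packed into one integer (the function name and the example numbers are only for illustration):

```python
def move_ship(ship, inertia):
    """Apply a biased 16-bit inertia value to a 24-bit coordinate."""
    ship = (ship - 0x8000) & 0xFFFFFF   # remove the $8000 bias (SBC #$00 / #$80 / #$00)
    ship = (ship + inertia) & 0xFFFFFF  # add inertia, the carry ripples into the highest byte
    return ship


ship_x = 0x00A000                   # pixel 160, sub-pixel 0
ship_x = move_ship(ship_x, 0x8000)  # resting inertia, no movement
print(hex(ship_x))                  # 0xa000
ship_x = move_ship(ship_x, 0x8140)  # +$0140 sub-pixels, 1.25 pixels to the right
print(hex(ship_x))                  # 0xa140
```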

The main game sprite is simple, but it does have a layer of anti-aliasing, using another sprite. These were generated using Processing. There are 32 frames, making a total of 64 sprite frames for the UFO.

For each frame, the sprite graphics data are copied from outside the video bank area to the visible sprite frame, as 64 sprites would take a huge chunk of the video bank and this didn't fit into my plan. I'm now getting to understand how important it is in C64 programming to have a good plan for locating the graphic data and the video bank.

With this technique I could have had 64 frames (128 with anti-aliasing), but I felt it easier for the player if the vertical and horizontal positions can be more clearly discerned.

The thruster flame has only one frame but it is positioned differently in alternating frames to give it more direction and a tiny bit of transparency, again using a polar coordinate shift.

Four sprites are used on the lanterns, although with multiplexing they could have used up fewer. But I didn't want to practice multiplexing this time. It was enough to handle the above issues and sprite-to-sprite and sprite-to-background collisions.

Wrap up

Looking at my notes, I worked on the game intensively for a few weeks in May 2020. Some of the routines had been done previously so I didn't have to "invent" the inertia, polar coordinate and precision coordinate routines then. So adding that, I might have worked on this for about a month.

It appears I considered a release at the end of June 2020, but decided to wait. Then I picked it up this weekend, with the idea I might be heading towards a phase of adding things to the game.

But instead I simply cut off loose ends, adjusted some of the music, added some graphic variety, removed anything that still looked like a bug, tested it on a real C64 and released it.

Design-wise, there was a lot that could have been added, but at this point every addition would have needed another round of testing and assessing the whole game, so it was better to keep the original promise of a "small game".

Leilei Relay at csdb

Wednesday, 6 May 2020

1541 Ultimate II

Finally I got the Ultimate cartridge for the C64.

Not the II+ mind you, but a second hand version of the II, with the tape emulation add-on and the Ethernet-USB dongle thrown in.

The adapter has the label "NO:usb 2.0 LAN JP208 MADE IN CHINA" and that's all I know of the brand and make of it. It's good to have an already proven solution included.

Firmware upgrade

I didn't really buy the cartridge for the Ethernet functions, but I was curious enough to want it to work. It happened the u1541 firmware was 2.6 and needed upgrading before this would work well. In any case it must be a good idea to update the firmware.

At this moment I realized the Ultimate cartridge, despite being highly rated and professional-looking hardware, has rather scattered and somewhat confusing documentation. Well, that's quite common for any of these hobbyist add-ons. As I got this second hand the docs may have been lacking.

So, an upgrade from version 2.6 to 3.07beta needs an update.bin file in the root folder of the SD card (a USB stick is not enough). When booting, the update will run.

Afterwards, I could upgrade using u2u files. (I guess u2p files are for the plus) When I had my 3.4c version going, I could finally see the IP address on screen.

Remote control and file transfers

Now it's possible to telnet to that IP from another computer, using port 23.

PuTTY is fine, but correct settings are needed for display and cursor keys and backspace.

Initially arrow keys or function keys did not work without pressing Control at the same time. Changing PuTTY keyboard settings to VT100+ helped. "Initial state of cursor keys" is "normal".

Local echo and local line editing are both "force off". Backspace is set to "Control-H".

Also, the font did not display correctly despite changing it from UTF-8 to ISO-8859-1:

I fiddled with various settings, but apparently I needed to untick the box that says "override with UTF-8 if locale says so".

After this it started working properly:

The telnet connection is quite impressive and it's nice to have the remote for viewing files in quick succession. The software inside the cartridge does not care what state the C64 is in, so the menu can be operated all the time the power is on.

ftp is also possible, but any attempts to integrate it to Caja (the Linux Mint file browser) were not especially successful. I could see the folder but file transfers failed. The same with FileZilla, an FTP client, which seemed so convoluted I probably wouldn't use it even if I found the correct settings.

Instead, I got it to work using command line ftp to the IP address. First use 'binary' to set all further file transfers to binary, otherwise files will be sent with incorrect file lengths (despite the system saying it's switching to binary for the files).

'ls' gives the remote folder contents. 'cd SD' is likely the first move, to get to the root folder of the drive. Then 'put' sends a file from the local current folder to the current remote folder, and 'get' does the reverse. 'bye' exits the ftp command line.
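The same session could probably be scripted. What follows is an untested sketch using Python's standard ftplib instead of the command line client, not something tried against the cartridge. The IP address and file name are placeholders, and the anonymous login is an assumption.

```python
from ftplib import FTP
import os


def upload(ftp, local_path, remote_dir="SD"):
    """Send one file in binary mode to remote_dir, using an already connected FTP object."""
    ftp.cwd(remote_dir)                         # same as 'cd SD'
    name = os.path.basename(local_path)
    with open(local_path, "rb") as handle:
        ftp.storbinary("STOR " + name, handle)  # binary transfer, like 'binary' and 'put'
    return name


# Usage (placeholder address, use the one shown on the cartridge screen):
# ftp = FTP("192.168.0.64")
# ftp.login()               # anonymous login is an assumption
# upload(ftp, "game.prg")   # hypothetical file name
# ftp.quit()                # same as 'bye'
```

storbinary always transfers in binary mode, so the file length problem mentioned above should not come up.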

Apparently the telnet and ftp functions don't mesh together, so you can't upload a file using ftp while "in" with telnet.

Edit: Apparently they could work together, but it's just that messing back and forth multiple times with either ftp or telnet tends to lock the cartridge. A script that sends a file over ftp and launches it via telnet may work one or two times. This could be related to a bug/oversight in the cartridge.

Some fooling around

One of the main functions of u1541 is the utility cartridge collection. For the purposes of using the cart for running files and general compatibility, the Retro Replay cartridge seems to be the standard.

But there are others, such as Final Cartridge III and my old friend Action Replay VI.

For fun, I did a code snippet on the monitor. (Invoked by MON).

Well, the monitor is in the Retro Replay version of the cart too.

Action Replay VI was the original environment where I learned the first steps of assembler in the early 1990s.

For a long time these were the last steps too, because I didn't understand how to structure these programs at all.

Now I see that it would be possible to write long programs using the monitor, giving enough room for subroutines and using a meticulous paper documentation/mapping.

There is another interesting cart, the Turbo Action ROM by Soundemon/Dekadence. This contains Turbo Assembler:

So the same code snippet can be written this way. The source can be assembled by pressing arrow left (the "esc" key, not the cursor key) followed by 3.

After assembling the source, the cart can be reset, and after activating the Tasm again the source is still in place. (If it's not overwritten I guess).

Perhaps this would have been helpful back in the day. However, I must say that it would be really painful to write long sources with this one, and writing code directly in a machine code monitor might even have some advantages compared to an assembler. There were of course various techniques for splitting the code, and in the end it might require the same kind of forethought and paper mapping as with the monitor.

Overall

The core functions of the cart are supreme and I probably won't look too much back at SD2IEC and IRQHack.

Yet it seems the cart is trying to be a bit too many things, while some obvious things are missing (if I'm not wrong). Effort has been put into areas that are not that interesting to me, like printing or some weird audio extensions.

For example, from what I've read and experienced, the cartridge does not seem to allow straight-up transfer+execution of the transferred file. You have to manually launch the ftp-transferred file from the menu. This doesn't sound like a lot, but it's an unnecessary step and a more direct result would have been better for building and executing code.

This is something the IRQHack could do (although again in a limited way) and I don't see a reason why the Ultimate would not if the software was put into place. Maybe I am wrong and the function is there somewhere.

Sunday, 17 February 2019

Linux, Rasm, Arkos Tracker 2, ZX Spectrum

Looking at a promising recent z80 cross-assembler and Arkos Tracker 2. Both work on Linux and there are now easy-to-use music player routines for ZX Spectrum included.

The assembler compiles the player routines without any modifications so it made sense to me to lump my notes together here.

It couldn't get much easier for the Linux user I think!

Rasm

Rasm is the z80 cross-assembler. (The R stands for Roudoudou I suppose). Download the Rasm archive and extract it to a folder.

Look for the compilation instructions inside the c file, something like

cc rasm_v0108.c -O2 -lm -lrt -march=native -o rasm

(Edit 30.8.2022: These days make seems to work as it should. If compilation stops at stdio.h or string.h, you may need to run sudo apt install build-essential before attempting.)

After copying the output rasm to say /usr/local/bin, you can then use rasm inputfile.asm to produce an instant binary from the source.

The inputfile.asm contents might be something trivial like this:

org $8000

ld hl,#4000
loop:
inc a
inc l
ld (hl),a
jr loop

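This should fill the top third of the screen with garbage. As only L is incremented, the writes stay within $4000-$40FF, which covers the first pixel line of each character row in that third.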
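To see where on the screen those writes land, here is a small Python simulation of the loop. It is a sketch only: it models nothing but the A, H and L registers, runs a fixed number of passes instead of looping forever, and assumes A starts at 0 (on a real machine it is whatever was left there).

```python
def run_loop(iterations=256):
    """Simulate the inc a / inc l / ld (hl),a loop and collect the addresses written."""
    a, h, l = 0, 0x40, 0x00  # A = 0 is an assumption
    written = set()
    for _ in range(iterations):
        a = (a + 1) & 0xFF
        l = (l + 1) & 0xFF   # only L changes, H stays at $40
        written.add((h << 8) | l)
    return written


def pixel_row(address):
    """Decode a ZX Spectrum bitmap address into its pixel row (0..191)."""
    third = (address >> 11) & 3
    scanline = (address >> 8) & 7
    char_row = (address >> 5) & 7
    return third * 64 + char_row * 8 + scanline


addresses = run_loop()
print(hex(min(addresses)), hex(max(addresses)))    # 0x4000 0x40ff
print(sorted({pixel_row(x) for x in addresses}))   # [0, 8, 16, 24, 32, 40, 48, 56]
```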

The black screen comes from the bin2tap BASIC header

When the binary rasmoutput.bin (the default name) has been created, it still needs to be converted to a TAP file that a Spectrum could load.

Using bin2tap (which needs to be compiled too) you can create a runnable tape with a BASIC header. By default it works with the $8000 (32768) address so no need to worry about that either. Again, it ought to be copied to /usr/local/bin (for example).

Then, working with the default output binary name:

bin2tap -b -o mycode.tap rasmoutput.bin

After this, Fuse will happily load the produced mycode.tap and use the BASIC loader to load the binary and execute it at 32768 ($8000).

Rasm is a clever modern macro-assembler with all the repeat functions you'd expect nowadays. With a good editor like sublime-text (with added z80 highlighting) it's a breeze.

For example, the following will generate a table of 192 start addresses for each vertical pixel row, often needed in ZX Spectrum graphics routines:

screenbase=$4000

ytable:
repeat 3,yblock
repeat 8,ychar
repeat 8,ypixel
dw screenbase+(ychar-1)*32+(ypixel-1)*256+(yblock-1)*2048
rend
rend
rend

Using variables, evaluations and local labels inside repeat blocks even quite complex repetitive code can be generated.
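For comparison, the ytable above can be reproduced in Python to check what the nested repeats produce. This is just a cross-check sketch; the loop counters start from 0 here, which is why the -1 terms of the Rasm version are not needed.

```python
SCREENBASE = 0x4000


def make_ytable():
    """Start address of each of the 192 pixel rows, in the same order as the nested repeats."""
    table = []
    for yblock in range(3):          # screen thirds, 2048 bytes apart
        for ychar in range(8):       # character rows, 32 bytes apart
            for ypixel in range(8):  # pixel lines inside a character, 256 bytes apart
                table.append(SCREENBASE + ychar * 32 + ypixel * 256 + yblock * 2048)
    return table


ytable = make_ytable()
print(len(ytable))  # 192
print(hex(ytable[0]), hex(ytable[1]), hex(ytable[8]), hex(ytable[191]))
# 0x4000 0x4100 0x4020 0x57e0
```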

Arkos Tracker 2

More information about Arkos Tracker from here. Extract the archive and simply run.

I'm not going too deeply into song creation here, suffice to say this is the easiest cross platform 8-bit tracker I've seen, with an audible default instrument and preset instrument categories for bass, snare etc.

After the pattern has been filled with bleeps and bloops, you can export the creation from the menu option file/export as generic (AKG), choosing source as the export mode.

Pick the PlayerAkg_SPECTRUM.asm from the players/playerAKG/sources folder, and copypaste the contents to the end of your source. (Of course it would be a better practice to include the player routine and the song.)

For example, this is a minimal main loop that will HALT the Spectrum until the next frame, and call the 'play frame' portion of the player routines.

org $8000

ld hl,Songname_Start
xor a
call PLY_AKG_Init

loop:
halt
call PLY_AKG_Play
jp loop

; paste the PlayerAkg_SPECTRUM.asm contents here

; paste the Songname_AKG.asm contents here

The exported song source can be copypasted after the player source. The Songname obviously refers to the filename it was exported with.

The start address is not needed so the org line may need to be removed from the song source. As can be seen the player nicely accepts a flexible start address - no need to copy the song data to some specific address either.

My example is simple. It's not especially useful to simply HALT and call the player routine, and it might be better to use an interrupt to play the song.

But this is it for now, I'm super-impressed how easy this was to set up and get running.

Wednesday, 24 January 2018

QL assembler

Lately I've turned into a bit of a QL-nut, and now I wanted to try coding assembler on it. Most of the files and initial info are from Dilwyn Jones's comprehensive QL web pages, particularly the assembler and archiver pages.

Although vasmm68k on Linux is very useful, there are limits to what kind of code can be run on an emulator. It's not easy to find an emulator that a) works on Linux, has b) cycle-exact scanline video emulation, has c) flexible transfer between its own filesystem and the PC and is d) free. UQLX is nice to have, but as far as I know it does not do screen flipping/vsync in the way real hardware does.

So I went the route of compiling 68000 code on the QL hardware platform itself. After all, it is a semi-16-bit architecture with 800K of memory. With modern media, it should not be excessively painful to code on this system.

First I tried a tiny assembler called Adder, but it gets past the initial screen only on my unexpanded QL, so I moved to the larger Qmac assembler/linker package. This is a bit of overkill for the things I want to try out, but at least it is convenient and does what one would expect from a full assembler.

I also tested asm20 at some point, but after a promising start it seemed a bit too incomplete. I failed to find a way to include binaries, and as I could not turn off the listing (opt nolist does not work) it becomes quite slow for even small sources.

I'm wondering why a quick editor/assembler/monitor setup like Asm-one or K-Seka on the Amiga/Atari ST does not exist for QL, or if it does, where is it hidden?

QMAC

So, over to the Qmac/Qlink package. Installing is not too hard once you get the hang of transmitting files over to the QL. Just unzip the qmac package on your Sinclair QL drive.

It should be remembered that QL archives need to be unzipped on the QL filesystem, as PC/Linux extraction tends to mess with QL executable file headers. To get around this, there's an already-unzipped version of unzip, and a BASIC file that repairs the header for that file.

After this has been achieved,

exec_w flp1_unzip;'flp1_archive_zip'

will start storing the files from the archive_zip to flp1_, if they are on that drive.

After the qmac file is unarchived, it is a good idea to copy the assembler, the linker and the source to a ram disk, if you have the memory to spare, as it should always be faster than any drive.

Afterwards, the executables can be run directly from the ram drive:

exec_w ram1_qmac;'ram1_source'

where 'source' is the assembler source text file with the name source_asm, here assuming it is in ram.

exec_w ram1_qlink;'ram1_source'

and this would result in a source_bin file.

Running the binary:

a=respr(size): lbytes ram1_source_bin,a: call a

For a relocatable code file, this would then reserve some unused memory space, load the file and call it. The size must be enough to contain the binary size. For non-relocatable code, the code should be loaded to a "safe" address and called from there, such as 196608 on an unexpanded QL.

ALCHP and LRESPR might be more handy but I tend to avoid running the Toolkit 2 extension unless I need it, as it's a few seconds more boot time :)

The commands above also show how to pass parameters to executable files. Note that EXEC_W runs the file and waits for it to finish before returning to BASIC, whereas EXEC would start it as a parallel job.

Running QED text editor

QED is a nice little text editor for writing the assembler sources. Loading QED, the file definition has to be a bit more explicit:

exec_w flp1_qed_exe;'flp1_source_asm'

Use F3 to enter the command mode in QED. X saves & exits the editor, Q exits.

By the way, with a television monitor the QED window size may need changing. Be sure to download a QED version that includes the BASIC configuration file. This config program brutally overwrites the QED binary with your chosen parameters, which is handy in that the QED can then always be copied as one file.

The configuring program may complain of a missing variable, as the window size function calls are part of yet another non-universal extension package. Insert something like this at the beginning:

10 SCR_XLIM=512: SCR_YLIM=256

Boot files

The above won't go very far in establishing an effortless edit/compile cycle.

But with boot files and the superBASIC procedures, much more can be done. Besides, creating useful boot files on Sinclair QL is somehow more fun than scripting on modern computers. Use colors, graphics, whatever you like. Hey, it's BASIC - but with procedures!

My boot file will copy the assembler, linker and my current working source and related binaries to a ram disk. From there they run almost instantly.

They could be copied from the HxC Floppy Emulator disk image, but the QL-SD sdc1_ drive is much faster than the HxC, so the files are copied from there.

20 print "Copying to RAM..."
30 copy sdc1_qed_exe to ram1_qed
40 copy sdc1_qlink to ram1_qlink
50 copy sdc1_qmac to ram1_qmac
60 copy sdc1_source_asm to ram1_source_asm
70 copy sdc1_datafile to ram1_data_bin

Obviously these can be made into their own procedure that gives on-screen information as the files move.

In absence of a real makefile, I have a defined procedure called MAKE, just to have a similar compiler command on QL as on my Linux:

100 def proc make
110 exec_w ram1_qmac;'ram1_source'
120 exec_w ram1_qlink;'ram1_source'
130 end def

After all this, MAKE will assemble and link the file source_asm, and as a result there will be ram1_source_bin.

Executing the result can be made into a procedure too, just remember to reserve enough memory to fit the binary, otherwise it will behave erratically (and likely crash).

I called this procedure SYS to have it short and as a nod to the Commodore computers.

10 a=respr(8192)
200 def proc sys
210 lbytes ram1_source_bin,a
220 call a
230 end def

If the code is to be loaded to a fixed address, the LBYTES should load it directly and no RESPR is needed.

Also bear in mind that apparently plain BASIC is not able to free the memory taken up with RESPR. So the memory should be reserved once in the boot file, not inside the procedure. As long as you don't re-run or load another basic program (and you don't need to) the variable is kept in memory.

Some extensions have commands for allocating/deallocating memory from BASIC, such as the ALCHP mentioned above.

A procedure for launching QED with the current source is also desirable, as a sort of alias for the long command line.

250 def proc qed
260 exec_w ram1_qed_exe;'ram1_source_asm'
270 end def

Development environment options

So, after the boot file has been run, I can type QED to edit my current source file, MAKE to compile it, SYS to run the code. More shortcuts can be created for producing disc backups out of the RAM files, etc. Instead of command procedures I could also create a key-press based environment.

As Sinclair QL is a multitasking system, the QED text editor can also be run as a parallel job with EXEC instead of EXEC_W, flipping between QED and the BASIC prompt using CTRL+C. This has the advantage I don't have to exit QED so the cursor position is not forgotten. The self-defined BASIC procedures remain in effect. Also, I can view and edit the source as the compilation runs in the background.

Whenever a screen update happens in another job, the QED screen gets messed up, and there is no screen refresh when you switch jobs. In QED, the F5 key at least redraws the screen.

Dual QL setup

With the above setup, if the code returns to BASIC neatly, it's possible to go on modifying the source without having to boot up the QL after every code test. Obviously, if the program crashes the system has to be rebooted, but having all the current materials auto-copied at least reduces the chore.

The crashes and re-boots resulting from wrong code can be alleviated by having two Sinclair QLs connected with a net cable, one as the compiler and the other as the target. Although this is probably more trouble than it's worth, I had to try.

One approach might be to assign NET values to each of the QLs and use LBYTES neti_2,address type commands for loading in the resulting binary data.

It gets crowded in here...

But, after fixing my EPROM cartridge with an FSERVE-enabled Toolkit II, I can also finally enable a file server between the Minerva-equipped QL and the unexpanded QL with a TKII cartridge. (The fileserver does not work from a RAM-based toolkit.)

With the file server, it becomes possible to access files from drives connected to the first QL, referring to it with n1_ or n2_, for example. Commands can take the form of EXEC_W n2_flp1_qed_exe

The file server works if you remember it's not all-powerful - trying to do complex things such as compiling across the server and editing text at the same time may result in the computer getting stuck. Trying to poll a file every few seconds over the server was not such a great idea either. Even if I managed to get the QL stuck, in most of the cases the actual server job survives, so files may be recoverable.

So, instead of the LBYTES neti_ - SBYTES neto_ approach I can use the file server to load and run the binary off the ram disk of the other QL.

The second QL does add some unwanted physical complexity to the setup, and obviously the crashing target QL needs to be re-booted too.

Another thing is the other QL does not have Minerva so my page-flipping routines need a different approach and the code can't simply return back to BASIC. But more of that some other time.

68008 assembler

A few snippets of my early code. Here I'm only interested in plain QL coding, and my code likely won't work in the later variants.

Still, I try to be nice-ish, and make relocatable code, mostly because it's useful to have the code return to BASIC intact for further editing (see the setup above). But I've not loaded the screen address using the QDOS call and neither do I care about potential graphics screen variants.

The following code will fill the usual screen area and return to BASIC without an error message:

SECTION CODE
MOVE.L #$20000,A0 ; starting from 131072
MOVE.L #$3FFF,D0 ; $4000 words = $8000 bytes
LOOP
MOVE.W #$AA00,(A0)+ ; bit pattern
DBRA D0,LOOP ; decrement d0 and loop until it goes past 0 (to -1)

MOVEQ #$0,D0 ; return to basic
RTS
END

The Qmac likes to have SECTION and END.

Note that using + with MOVE.W will "hop" the A0 address in 2-byte steps. With MOVE.L it would be 4.

Next thing to ponder is why the following code might be faster than the previous:

SECTION CODE
MOVE.L #$20000,A0
MOVE.L #$7FF,D0
MOVE.L #$AA00AA00,D7
LOOP
MOVE.L D7,(A0)+
MOVE.L D7,(A0)+
MOVE.L D7,(A0)+
MOVE.L D7,(A0)+
DBRA D0,LOOP

MOVEQ #$0,D0
RTS
END

With 68008, I'm told it would be wise to have as much as possible done through the registers, as then the speed issues (versus 68000) are not such a huge problem.
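One part of the answer can be seen by simply counting. The Python below is a back-of-the-envelope sketch that tallies loop passes, bytes written and instructions executed for the two listings. It does not attempt real 68008 cycle timings, and the function name is invented.

```python
def fill_stats(dbra_count, writes_per_pass, bytes_per_write):
    """Passes, bytes written and instructions executed inside a DBRA loop."""
    passes = dbra_count + 1                        # DBRA runs the body count+1 times
    bytes_written = passes * writes_per_pass * bytes_per_write
    instructions = passes * (writes_per_pass + 1)  # the writes plus the DBRA itself
    return passes, bytes_written, instructions


print(fill_stats(0x3FFF, 1, 2))  # (16384, 32768, 32768)
print(fill_stats(0x7FF, 4, 4))   # (2048, 32768, 10240)
```

Both versions cover the same 32768 bytes, but the second one executes far fewer instructions and loop branches, and its writes take the pattern from a register instead of an immediate value fetched from memory each time.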

Some have gone as far as to say the QL is "really 8-bit", but as the opcodes are equal to 68000 and much more than 65536 bytes of memory can be accessed without writing any paging routines, I'd say it's a true 16-bit world for the coder.

The verdict

I'm surprised it's still possible to get accustomed to coding on the QL itself. Today's card readers and the memory expansion are very helpful in this. The largest problem is that the Qmac is not that fast, especially as the source begins to grow, but it has been useful in getting to know 68000 and the QL.

The QED text editor block copy/paste functions are a bit primitive compared to modern editors, but overall the typing experience is not that bad. Sinclair keyboards are a bit of an acquired taste, but I've found I can live with the QL keys.

It's been a long and rocky road to get the Sinclair QL to do these things. But it has also been quite rewarding. I've been working with a sprite routine, something I'll come back to later.

Saturday, 3 December 2016

6502 notes

I've been dabbling in 6502 machine code for a long time, but only recently started to get a hold of it. My initial touchpoint was the machine code monitor on the Action Replay VI module on the C64 and a battered copy of the Commodore 64 Programmer's Reference I found at a flea market in the early 1990s.

Which I obviously still have.

These days, I know a bit more, but certain things are difficult to remember. Here are some.

How the Carry (C) affects the ADC and SBC instructions (and the difference)

For the longest time I did not get this. For additions, carry is cleared (with CLC) and then becomes set if the addition crosses the 255 boundary. The C flag is then carried (duh) over to the next ADC instruction as a +1. With subtraction, it's the reverse: the carry is previously set and then becomes cleared if the subtraction crosses 0. When clear, the SBC #$00 performs a SBC #$01, for example. After this, 16-bit addition and subtraction became easier to understand.

;Incrementing a 16-bit value at $C000-$C001 with ADC

CLC ;Clear Carry
LDA $C000
ADC #$01 ;if $FF is crossed Carry becomes set
STA $C000
LDA $C001
ADC #$00 ;is like 1 if Carry is set
STA $C001

;Decrementing a 16-bit value at $C000-$C001 with SBC

SEC ;Set Carry
LDA $C000
SBC #$01 ;if $00 is crossed Carry becomes clear
STA $C000
LDA $C001
SBC #$00 ;is like 1 if Carry is clear
STA $C001
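The two listings can be modelled in Python to watch the carry travel from the low byte to the high byte. This is a simplified sketch of binary-mode ADC and SBC (decimal mode and the other flags are ignored), and the helper names are made up.

```python
def adc(a, operand, carry):
    """8-bit add with carry; returns (result, carry_out)."""
    total = a + operand + carry
    return total & 0xFF, 1 if total > 0xFF else 0


def sbc(a, operand, carry):
    """8-bit subtract with borrow; a clear carry going in subtracts one extra."""
    total = a - operand - (1 - carry)
    return total & 0xFF, 1 if total >= 0 else 0


def inc16(lo, hi):
    lo, c = adc(lo, 0x01, 0)  # CLC, ADC #$01
    hi, c = adc(hi, 0x00, c)  # ADC #$00 picks up the carry
    return lo, hi


def dec16(lo, hi):
    lo, c = sbc(lo, 0x01, 1)  # SEC, SBC #$01
    hi, c = sbc(hi, 0x00, c)  # SBC #$00 subtracts 1 when the carry is clear
    return lo, hi


print(inc16(0xFF, 0x12))  # (0, 19), $12FF + 1 = $1300
print(dec16(0x00, 0x13))  # (255, 18), $1300 - 1 = $12FF
```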

When is the Overflow (V) flag set?

This flag is not that useful to me, but I don't want to forget what it does.

As two bytes are added together with ADC, as above, the calculation is at the same time evaluated as unsigned and "signed". If the numbers as "signed" cross the boundary of the range (-128...127), the overflow will be set.

I've come across a misconception that says "if the 7th bit of the original value and the resulting calculation are different, then the overflow is set."

However, the following will not trigger the overflow flag, even though the 7th bit clearly changes:

CLC
LDA #$FF ; 0b11111111 (-1) is the original value
ADC #$02 ; 0b00000010
; 0b00000001 (+1) is the result

I won't go around doing the two's complement explanation, which frankly just makes it more difficult to understand for me. Let's just have a look at how the signed numbers work:

Positive values:

00000000 = 0
00000001 = 1
00000010 = 2
00000011 = 3
00000100 = 4
...
00001000 = 8
...
00010000 = 16
...
00100000 = 32
...
01000000 = 64
...
01111111 = 127

The negative numbers have their minus sign bit set:

11111111 = -1
11111110 = -2
11111101 = -3
11111100 = -4
11111011 = -5
...
11110111 = -9
...
11101111 = -17
...
11011111 = -33
...
10111111 = -65
...
10000000 = -128

There's no special "signed mode" for the ADC command. Simply, if within the above representation, you do a calculation that crosses over 127, or under -128, the V flag will be set. Just like the C will be set when crossing over 0...255 in the ordinary representation.

So what the above example does is -1+2, which does not cross this boundary any more than 1+2 would set the C flag.

The reformulated rule: "If values with the same 7th bit are added together, and the 7th bit of the result differs from it, the V flag is set."
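The rule can be checked with a small C sketch. The function name adc_overflow is hypothetical; it only models binary-mode ADC and returns whether V would end up set.

```c
#include <stdint.h>

/* Returns 1 if A + operand + carry_in would set the V flag, else 0. */
static int adc_overflow(uint8_t a, uint8_t operand, int carry_in)
{
    uint8_t result = (uint8_t)(a + operand + carry_in);
    /* V is set when both inputs share bit 7 and the result's bit 7 differs. */
    return ((a ^ result) & (operand ^ result) & 0x80) != 0;
}
```

With this, adc_overflow(0xFF, 0x02, 0) is 0 (that is -1+2), while adc_overflow(0x7F, 0x01, 0) is 1 (127+1 does not fit in the signed range).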

For the most part I don't see the point of the signed-system, as I can usually pretend that 128 or 32768 is the zero point in an unsigned calculation.

Here, have a picture of a silly robot that makes it all easier to understand.

How the Indexed Indirect and Indirect Indexed addressing modes work

Most of these things have no place in fast code, hence I had not really looked at them.

Perhaps the funniest personal discovery was to see that it's possible to point to a 16-bit address using the indirect indexed addressing mode. I did not know the 6502 was capable of doing this so directly (relatively speaking). It's like z80's ld (de),a but not really. Because there is no 16-bit register as such, incrementing the address is a matter of using the ADC on the stored value as described in the first topic above.

Usually a memory fill is done effectively with something like this, which would fill the C64 default screen area with a "space".

LDA #$20
LDX #$00 ;X wraps from $00 to $FF, so all 256 offsets get written
loop:
STA $0400,X
STA $0500,X
STA $0600,X
STA $0700,X
DEX
BNE loop
RTS

We can write the start address to a location in the zero page and use the indirect indexed mode to write to the address, at the same time incrementing the stored value with a 16-bit addition.

Some have suggested that the zero-page works as a bunch of additional "registers" for the 6502 and this seems to validate that idea slightly.

(It is extremely slow, though)

LDA #$04 ;high byte of start address $0400
STA $11 ;store at zero page
LDA #$00 ;low byte of start address $0400
STA $10 ;store at zero page
LDY #$00 ;we won't change the Y in the following
loop:
LDA #$20 ;fill with ASCII space
STA ($10),Y ;write to address stored at zp $10-$11
CLC ;clear Carry
LDA #$01 ;low byte of 16-bit addition
ADC $10 ;add to low byte
STA $10
LDA #$00 ;high byte of 16-bit addition
ADC $11 ;add to high byte (+Carry)
STA $11
CMP #$08 ;are we at $08xx?
BEQ exit
JMP loop
exit:
RTS
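For comparison, here is roughly what that loop does, written as a C model. The mem array and the function name fill_screen are assumptions for this sketch only: mem stands in for the 64K address space, and the pointer lives in mem[0x10] and mem[0x11] just like in the zero page.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 64K address space. */
static uint8_t mem[0x10000];

static void fill_screen(void)
{
    mem[0x10] = 0x00;              /* low byte of start address $0400 */
    mem[0x11] = 0x04;              /* high byte of start address $0400 */
    do {
        /* STA ($10),Y with Y = 0: fetch the pointer, write through it */
        uint16_t addr = (uint16_t)(mem[0x10] | (mem[0x11] << 8));
        mem[addr] = 0x20;
        /* 16-bit increment: a wrap of the low byte carries into the high byte */
        if (++mem[0x10] == 0)
            ++mem[0x11];
    } while (mem[0x11] != 0x08);   /* stop once the pointer reaches $0800 */
}
```

After the call, everything from $0400 to $07FF holds $20 and the stored pointer reads $0800.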

When to use BPL, BMI, BCS, BCC branching operations

Some of my earliest code tended to work with BEQ and BNE only. Quite a lot can be achieved with them. The classic countdown loop, already used above, works without comparison instructions at all. Besides this, it can make a big speed difference how the '0' is used in data/tables.

LDX #$0B
loop:
INC $d020
DEX
BNE loop

My first instinct has been to use the BPL and BMI opcodes as they sounded nice and simple: Branch on PLus and Branch on MInus.

However, BPL and BMI work with the signed interpretation discussed above: they test the Negative flag, which mirrors bit 7 of the result. Parts of code might work as intended if the numbers happened to be in the 0...127 range, which probably confused me in the past.

So the BCC (Branch if Carry Clear) and BCS (Branch if Carry Set) are used instead when dealing with unsigned number comparisons, even if they sound less friendly.

LDA #$20
CMP #$21
;C=0

LDA #$20
CMP #$20
;C=1

LDA #$20
CMP #$1F
;C=1

This ought to branch if the A value is less than the one it is compared to:

LDA #$1F ; $00-$1F will result in C=0
CMP #$20
BCC branch ; (branches in this case)

The next one branches if the A value is equal to or greater than the one it is compared to:

LDA #$20 ; $20-$FF will result in C=1
CMP #$20
BCS branch ; (branches in this case)
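A small C sketch of what CMP leaves in the flags may help here. The struct and function names are made up for the example; the model computes A minus the operand without storing the result.

```c
#include <stdint.h>

/* Hypothetical snapshot of the flags after CMP. */
struct cmp_flags { int c, z, n; };

static struct cmp_flags cmp(uint8_t a, uint8_t operand)
{
    struct cmp_flags f;
    uint8_t diff = (uint8_t)(a - operand);
    f.c = (a >= operand);        /* BCS branches, unsigned "greater or equal" */
    f.z = (a == operand);        /* BEQ branches */
    f.n = (diff & 0x80) != 0;    /* BMI branches, bit 7 of the difference */
    return f;
}
```

Comparing $F0 with $10 gives C=1, which is the right answer for unsigned values, but N=1 as well, so a BMI after that comparison would branch as if A were the smaller one.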

So, yeah.

Strangely enough, many of the webpages and even books about 6502 are incomplete when it comes to explaining in detail what each of the commands does. Even the books might just state that a particular opcode "affects" a certain flag, but may not open up how exactly that flag is affected.

Saturday, 23 July 2016

Fort Django

A little reflection on the making of a Commodore 64 game. It's the first C64 machine-code game project I've really finished. All right, it was made with cc65 C with inline assembler, but what I mean is that it's not made in BASIC. Significantly, it's the first 8-bit game I've made public in some way.

I guess making an 8-bit game is something I would have liked to do for a long time. Fort Django was started in 2014, and after a long pause I found the energy to make it into a release.

Djangooooooooohh....!

The game

You guide the character with a joystick in port 2. You can run, climb, crouch, jump and shoot. Shoot down the baddies, collect money bags and find the exit. The descending bonus timer means the faster you can collect the next bag the more $ you can get.

The game is very short and not at all hard. The only challenge comes from trying to be faster, for example it's possible to break the $10000 barrier on completion.

Get that bag, shoot that baddie

Inspired by Saboteur! from Durell, I at first thought about making a beat 'em up oriented game. As I needed to scale down the project I found it would be simpler to turn the game into a shooter with a western theme. The map is very small, but then again I like short games such as Saboteur! and Bruce Lee, as they have a strange kind of replay value. Much like with Saboteur!, I wanted to ensure there was a definite ending to the game and score could not be milked forever.

Making of

I created the game with the cc65 C compiler, using inline assembler for speed-critical parts such as the sprite routines. The C64 has eight sprites, eight 8-bit addresses for the horizontal coordinate (0-255) and one address that holds the highest bits for all the X coordinates, so the sprites can also reach the right-hand part of the screen (256-320).

For a beginner it can be a bit tricky to decode these 9-bit sprite coordinates, especially in pure assembler. I have eight separate 16-bit memory locations for storing the X coordinates. These are then broken down into the hardware sprite coordinate values. This approach is a compromise between ease of use and speed.

The figure below shows how the 16-bit X coordinates are stored in $C000/$C001, $C008/$C009, $C010/$C011... byte pairs (the grey stuff in the middle). The less significant byte of these 16-bit values can be copied directly to the $D000, $D002, $D004... but the high bit is taken from the lowest bit of the most significant byte of the 16-bit values and combined into a value that is stored in $D010.

This is done once in a frame, so the high-bit issue can be forgotten in other parts of the code. The handiness of this is only really apparent in C, where you can then move the sprite x coordinate around with:

*(unsigned*)0xc000=*(unsigned*)0xc000+1;

Comparisons between coordinates become easier, too. This checks if sprite 7 is right of the sprite 0:

if(*(unsigned*)0xc038>*(unsigned*)0xc000){do_stuff();}

Basically all the "collision detection" is built from this type of statement instead of the hardware sprite collision address, and as such is not pixel perfect. In these box-collision cases it's better to be lenient toward the player and a bit biased against the enemies.
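A box check of this kind could look something like the sketch below. This is not the game's actual code: the function name and the width and height parameters are invented for the illustration, and the coordinates are plain unsigned values as in the comparisons above.

```c
/* Hypothetical helper: non-zero when the two points are closer than
   w pixels horizontally and h pixels vertically. */
static int boxes_overlap(unsigned ax, unsigned ay,
                         unsigned bx, unsigned by,
                         unsigned w, unsigned h)
{
    /* Unsigned distances, computed without going below zero */
    unsigned dx = (ax > bx) ? ax - bx : bx - ax;
    unsigned dy = (ay > by) ? ay - by : by - ay;
    return dx < w && dy < h;
}
```

Being lenient toward the player is then a matter of passing a smaller w and h when testing whether an enemy shot hits the player than when testing the player's shot against an enemy.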

The "16-bit" coordinate ought not to exceed 511, because only one bit is taken from the more significant byte.

The sprite Y coordinates are 8-bit values anyway and can be handled directly with the $D001, $D003, $D005... hardware addresses.

The whole X coordinate copying is achieved with the code below, starting from the less significant byte copying and ending with the high-bit construction. Obviously other locations than $C000- can be used for the coordinate storage.

lda $C038
sta $D00E
lda $C030
sta $D00C
lda $C028
sta $D00A
lda $C020
sta $D008
lda $C018
sta $D006
lda $C010
sta $D004
lda $C008
sta $D002
lda $C000
sta $D000

lda $C039
clc
rol a
ora $C031
rol a
ora $C029
rol a
ora $C021
rol a
ora $C019
rol a
ora $C011
rol a
ora $C009
rol a
ora $C001
sta $D010
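The same split can be sketched in portable C, which may make the bit shuffling easier to follow. The function name and the array parameters are hypothetical, and plain arrays stand in for the $D000... and $D010 registers. Unlike the assembler above, this version masks each high byte down to one bit, so it does not rely on the coordinates staying below 512.

```c
#include <stdint.h>

/* Split eight 16-bit X coordinates into eight low bytes (for the
   sprite X registers) and one combined high-bit byte (for $D010). */
static void split_sprite_x(const uint16_t x[8], uint8_t low[8], uint8_t *msb)
{
    uint8_t bits = 0;
    int i;
    /* Go from sprite 7 down to sprite 0, so sprite 7 ends up in bit 7 */
    for (i = 7; i >= 0; --i) {
        low[i] = (uint8_t)(x[i] & 0xFF);
        bits = (uint8_t)((bits << 1) | ((x[i] >> 8) & 1));
    }
    *msb = bits;
}
```

For example, with sprite 0 at X=300 and sprite 7 at X=256, the low bytes are 44 and 0, and the combined high-bit byte is $81.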

This is a starting point for a fairly generic solution, but of course it can be adapted for any particular needs. For example, the top sprite coordinates of the dudes in the game are copied and transformed from the bottom part coordinates, which changes the above routine a bit. Also, if fewer sprites are used, why bother going through all eight?

About the graphics

The graphics are made with the PETSCII editor. Not only the background tiles, but the game map and even the sprites have been edited there. It goes to show that the PETSCII editor's multiframe editing is a surprisingly powerful way of controlling this type of game "assets".

The game map tiles.

Another PETSCII screen was wasted for defining the movement rules for the above tiles. These are used for building a movement table every time the player enters a room, just as the room is drawn from tiles. How all these sprite, tile and movement table elements exactly relate to each other is a bit too intense to explain here, especially as they are not that well thought out.

The tile movement rules. @=space, A=block, B=platform, C=ladder

The map was also edited in the PETSCII editor. The 40x25 area is divided into 5x3 tile elements, giving 64 simple screens. Coloring indicates enemies and money bags. Not everything is absolutely visible in the picture below, as some spaces can be "colored" too. The game engine allows only certain combinations and positions for the enemies and objects.

The game map.

The map memory area is also used for indicating whether objects or enemies are removed, by changing the status from 1 (white=enemy) and 2 (red=bag) into 255 and 254. After the game is over these are changed back into 1 and 2.
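As a minimal sketch of that bookkeeping, assuming the map is just a byte array: the enum names and both functions are invented for the illustration, only the values 1, 2, 255 and 254 come from the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Status values stored in the map; the names are hypothetical. */
enum { ENEMY = 1, BAG = 2, BAG_REMOVED = 254, ENEMY_REMOVED = 255 };

/* Mark one map cell as removed; other values are left alone. */
static void mark_removed(uint8_t *cell)
{
    if (*cell == ENEMY)
        *cell = ENEMY_REMOVED;
    else if (*cell == BAG)
        *cell = BAG_REMOVED;
}

/* After the game is over, turn the removed markers back into 1 and 2. */
static void restore_map(uint8_t *map, size_t size)
{
    size_t i;
    for (i = 0; i < size; ++i) {
        if (map[i] == ENEMY_REMOVED)
            map[i] = ENEMY;
        else if (map[i] == BAG_REMOVED)
            map[i] = BAG;
    }
}
```

The point of reusing the map this way is that no separate "collected" table is needed, at the cost of one pass over the map when the game restarts.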

Editing a multicolor sprite in PETSCII editor. The sprite export is not a standard feature!

What next?

I dropped many game elements I toyed with at some point. For instance, the chests could have contained items, and the doors could have potentially led to other areas in the fort.

The gold bags were a fairly late addition when I felt that only shooting bad guys would be far too minimal. From adding the bags it was a small leap to increase the jumping elements in the game. The bags also inspired the time-based "bonus dollars" mechanic. Still there might have been a bit more to do.

Code-wise, a music routine would have been nice but in the beginning I was a bit scared to sync the animation with a music interrupt. The sound effects are also sparse due to my inexperience with SID.

Yet, all additions would have also expanded the complexity of the game and the testing time exponentially. I'm glad I could finish it as it is. With this experience I already have some ideas about how to approach this type of project better.

Links

You'll need a Commodore 64 or an emulator to play the game.

Fort Django v1.1 at CSDb
Read the blog post about the v1.1

1.0 (older version):
Direct download
Alternative link
Page at CSDb
A cracked version in a T64 format
