AcheronVM

A Reconfigurable 16-bit Virtual CPU for the 6502 Microprocessor

Status

In recent years, AcheronVM has gone through a substantial rewrite, and is nearly complete. The code is presented in its current state due to interest, and documentation updates are ongoing.

While the project is still being finalized, the core operations do work fine. However, I cannot claim full stability of all features in all combinations yet. It’s a very reconfigurable system, and the WBIT dispatcher is getting the most attention.

Overview

AcheronVM provides a flexible and uniquely optimized virtual 16-bit CPU which runs on the 6502, inline with native assembly code. You could consider it a successor to SWEET16. It seeks to maximize speed, density, and high-level power, without making large sacrifices in any of those 3 aspects.

It is designed to be a rich execution environment for high-level 16-bit code, including pointer-offset structure dereferencing, reentrant functions, exception handling, and more, in a minimal footprint. The ISA is completely reconfigurable, and is intended to be customized per-project for the best performance and applicability.

A presentation on its design and implementation was given at VCF Midwest 14 (2019), and is recommended to orient yourself to the method behind the madness before continuing into the documentation:

AcheronVM on the VCF Midwest YouTube Channel

Re-answering the final question: While no provided Acheron instructions test the entire carry stack, it is easy to create one. The zeropage location cstack holds its byte value, which can be directly read & tested from either 6502 or Acheron code. This example creates a little custom instruction around that.

Documentation

VM Documentation:

  1. Register Model
  2. Building the VM and User Code
  3. Running Acheron Code
  4. Creating New Instructions
  5. Frequently Yelled Questions

Default Instruction Set (Autogenerated Files):

Discussion Forum: Thread on the 6502.org forums, starting from the 2019 version.

License: AcheronVM is licensed under the LGPLv3. For alternate licensing and/or commercial support, email the AcheronVM github account.

Register Model

Registers are 16 bits wide, little-endian, stored in a sliding window in a dedicated zeropage area. They are named from r0 to (theoretically) r127, at successively increasing zeropage addresses.

Register name:    r0    r1    r2    r3    r4  ...
Sample address:  0046  0048  004a  004c  004e ...

The Sliding Window

Within a scope or function call, new uninitialized registers can be slid into view, and slid back out of view when work is complete. The registers of the parent scope/function are still fully visible and accessible, just at higher register locations.

Example:

                 r0   r1   r2   ...  ; register names
                 0000 1ef4 abcd ...  ; register values

Call a function, and it might grow 2 registers.

       <--------|
       r0   r1  |r2   r3   r4   ...  ; r2 = caller's r0
       xxxx xxxx|0000 1ef4 abcd ...

The function can use r0 and r1 as its own local variables. It can read r2-r4 to receive any input parameters, as well as write its return values there. All of the registers are fully orthogonal. The only thing differentiating parameters, locals, or return values is programmer policy.

When the function returns, the register window is slid back to its original place, which was marked at the beginning of the function. (The mgrow and retm instructions perform this marking/growing and function return.) The caller can then continue on as normal, with its registers where they were before, oblivious to the called function’s registers, and without any save/restore work.

      |--------->
      |          r0   r1   r2   ...
      |.... .... 0034 1ef4 abcd ...  ; r0 holds a return value
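
The window movement in the diagrams above can be mimicked with a small Python model. This is only a sketch and not AcheronVM's implementation: the class and its method names are made up for illustration, and it assumes that growing the window moves the address of r0 down by 2 bytes per register.

```python
# Toy model of the sliding register window. This is NOT AcheronVM's code.
# Assumptions: zeropage is a 256-byte array, rptr is the address of r0,
# and growing the window moves rptr down by 2 bytes per register.

class RegisterWindow:
    def __init__(self, rptr=0x46):
        self.zp = bytearray(256)   # simulated zeropage
        self.rptr = rptr           # zeropage address of r0

    def read(self, n):
        addr = self.rptr + 2 * n
        return self.zp[addr] | (self.zp[addr + 1] << 8)   # little-endian

    def write(self, n, value):
        addr = self.rptr + 2 * n
        self.zp[addr] = value & 0xFF
        self.zp[addr + 1] = (value >> 8) & 0xFF

    def grow(self, count):
        self.rptr -= 2 * count     # slide new registers into view

    def shrink(self, count):
        self.rptr += 2 * count     # slide them back out of view


window = RegisterWindow()
for n, value in enumerate([0x0000, 0x1EF4, 0xABCD]):
    window.write(n, value)               # caller's r0, r1, r2

window.grow(2)                           # callee gets fresh r0 and r1
caller_r1_seen_by_callee = window.read(3)   # caller's r1 is now r3
window.write(0, 0x0030)                  # callee-local value in its own r0
window.write(2, window.read(0) + 4)      # return value into caller's r0
window.shrink(2)                         # back to the caller's view

caller_view = [window.read(n) for n in range(3)]
print([f"{v:04x}" for v in caller_view])  # ['0034', '1ef4', 'abcd']
```

The caller's r1 and r2 are untouched, and no save/restore step was needed to get them back.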

Register Space Efficiency

Some functions might not need new variables, solely working in their caller’s register space. This is especially useful in local functions where the parent context is always fully known.

Since registers are in zeropage, this space might be at a premium. Functions should seek to minimize the number of registers that need to be grown, reusing registers wherever possible.

The ‘Prior’ Register, rP

To gain the advantage of the implicit addressing enjoyed by Accumulator or Top-of-Stack systems, AcheronVM remembers the previously used register for easy reuse. This is referred to as rP, and aliases one of the real registers (r0-r127); it is not a separate register itself.

Many instructions implicitly read or write rP, and can remap rP to any other register. The most common remapping instruction is with, which directly specifies which register to use as rP.

Example:

     r0   r1  [r2]  ; rP points to r2
    0f00 0012 0000

Issuing incp increments rP, which can affect any register, in this case r2:

incp
     r0   r1  [r2]
    0f00 0012 0001

When rP points somewhere else, incp will affect a different register:

with r1
incp
      r0  [r1]  r2  ; rP now points to r1
     0f00 0013 0001

Just like any other accumulator-based CPU, two-register instructions often only need to specify 1 of them, as rP is implicit.

add r0
      r0  [r1]  r2  ; r1 += r0
     0f00 0f13 0001

Since rP may point to any register, the instructions remain orthogonal. For performance, the with instruction can be transparently combined with the following instruction, optimizing away its overhead.
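
The rP walkthrough above can be replayed in a few lines of Python. This is a hedged sketch: the class and method names are hypothetical, and it models rP simply as an index into a list of 16-bit values rather than as a zeropage pointer.

```python
# Toy model of rP as an alias into the register file (illustrative only).

class PriorRegisterModel:
    def __init__(self, values, rp):
        self.regs = list(values)   # r0, r1, r2, ...
        self.rp = rp               # index of the register that rP aliases

    def with_(self, n):
        """Model of 'with rN': rN becomes the new rP."""
        self.rp = n

    def incp(self):
        """Model of 'incp': increment whichever register rP aliases."""
        self.regs[self.rp] = (self.regs[self.rp] + 1) & 0xFFFF

    def add(self, n):
        """Model of 'add rN': rP := rP + rN, wrapping at 16 bits."""
        self.regs[self.rp] = (self.regs[self.rp] + self.regs[n]) & 0xFFFF


vm = PriorRegisterModel([0x0F00, 0x0012, 0x0000], rp=2)
vm.incp()       # r2: 0000 -> 0001
vm.with_(1)     # rP now aliases r1
vm.incp()       # r1: 0012 -> 0013
vm.add(0)       # r1 += r0 -> 0f13
print([f"{v:04x}" for v in vm.regs])   # ['0f00', '0f13', '0001']
```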

Interaction With Register Window Changes

When the register window is grown, rP stays at the same physical register. For instance, if it points to r2 and 2 more registers are grown, then rP will point to r4.

                r0  r1 [r2] r3  r4  r5 ...
grow 2
        r0  r1  r2  r3 [r4] r5  r6  r7 ...

When the register window is shrunk, rP automatically moves to r0 for safety, so that rP cannot point into deallocated register space. This behavior may be modified by compile-time options.

        r0  r1  r2  r3 [r4] r5  r6  r7 ...
shrink 2
               [r0] r1  r2  r3  r4  r5 ...
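
The renumbering of rP falls out of plain address arithmetic. The Python sketch below assumes rP is tracked as a zeropage address, so its register number is just its distance from r0; the function names are hypothetical, and the shrink behavior shown is the default one described above.

```python
# Illustrative model: rptr is the address of r0, pptr is the address of rP.

def rp_register_number(rptr, pptr):
    """Register number that rP currently aliases (2 bytes per register)."""
    return (pptr - rptr) // 2

def grow(rptr, pptr, count):
    """Growing moves r0 down; rP keeps its physical address."""
    return rptr - 2 * count, pptr

def shrink(rptr, pptr, count):
    """Shrinking moves r0 up; by default rP snaps to the new r0."""
    new_rptr = rptr + 2 * count
    return new_rptr, new_rptr


rptr = 0x46
pptr = rptr + 2 * 2                              # rP aliases r2
rptr, pptr = grow(rptr, pptr, 2)
after_grow = rp_register_number(rptr, pptr)      # same bytes, now called r4
rptr, pptr = shrink(rptr, pptr, 2)
after_shrink = rp_register_number(rptr, pptr)    # reset to r0
print(after_grow, after_shrink)                  # 4 0
```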

Naming Registers

The regnames macro can define names for the registers, making source code much more readable and maintainable. This should be scoped within .proc or .scope to avoid name clashes.

;; Calculate address = $1000 + (y1 << 6) + x1
.proc calcAddress
  ; This renames r0, r1, and r2.
  regnames address, x1, y1
  with address
  copyr y1
  shl 6
  add x1
  addi $1000
  ret
.endproc  
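
As a cross-check of the formula in the routine's header comment, here is the same calculation in Python, stepping through it in the order the instructions run. It is only a sketch of the arithmetic with 16-bit wraparound; the function name is made up.

```python
def calc_address(x1, y1):
    """16-bit result of $1000 + (y1 << 6) + x1."""
    address = y1 & 0xFFFF                    # copy y1 into address
    address = (address << 6) & 0xFFFF        # shift left by 6, i.e. times 64
    address = (address + x1) & 0xFFFF        # add x1
    address = (address + 0x1000) & 0xFFFF    # add the $1000 base
    return address


print(hex(calc_address(3, 2)))   # 0x1083 = $1000 + 2*64 + 3
```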

Names can be left empty if they’re not needed in the current scope:

  ; Only r0 and r2 are given names
  regnames address,,y1

Building the VM and User Code

Building software that uses AcheronVM is a two-stage process.

First, the VM itself must be built, via make acheron. A critical output of this step is the acheron.inc file, which defines macros for generating all the instructions represented in that build of AcheronVM. Documentation can also be generated separately via make docs, if Lisp is installed (SBCL or CLISP).

Second, the user code is built. This includes acheron.inc, which enables the new instruction syntax. This code is directly linked against the .o files from the first step, in order to resolve any labels from the VM, at the very least the one for jsr acheron. The included make test target performs this linking build with the test.asm file, appending its code to the same memory segment as the AcheronVM runtime.

Custom Configuration

src/acheron.cfg specifies the memory layout, and should be adjusted or ported per project. The ZPSTACK, ZPTEMP, and ACHERON memory areas define the locations of all the memory used. Everything else is for creating the necessary output files.

src/options.inc holds build-time feature and performance declarations, the end-of-line marker for generated text files, and the per-platform binary file type for the final output.

src/acheron.asm currently pulls in the instruction definitions using .include, to standardize the visibility of macros and to specifically control the order for reachability. Instructions may be removed in order to reduce the footprint, free up opcode space, or to use custom replacements. The .include directives can be removed, or individual instructions can be commented out in their respective files. New custom instructions should currently be in their own file and added via .include from inside acheron.asm as well.

Makefile contains the output filename, the Lisp to use in building the documentation, and the test file generation.

Running Acheron Code

6502 and Acheron code can be inlined:

 .include "acheron.inc"
 
 ; Normal 6502 code
 ldx #$01
 stx $0400
 inx
 stx $0401
 jsr acheron   ; Switch to Acheron mode

 grow 1
 with r0
 ldma $0400
 stma $0402
 native        ; Switch to 6502 mode

 rts

The register window position and rP are preserved when switching to 6502 mode and back.
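
To make the data flow of the inline example concrete, this Python sketch traces the memory contents. It assumes that ldma and stma transfer a 16-bit little-endian value between rP and an absolute address; that reading is inferred from the example rather than quoted from the instruction reference.

```python
mem = bytearray(0x10000)            # simulated 64K address space

# 6502 part
x = 0x01                            # ldx #$01
mem[0x0400] = x                     # stx $0400
x = (x + 1) & 0xFF                  # inx
mem[0x0401] = x                     # stx $0401

# Acheron part (assumed semantics, see above)
rp = mem[0x0400] | (mem[0x0401] << 8)   # ldma $0400: 16-bit load into rP
mem[0x0402] = rp & 0xFF                 # stma $0402: low byte
mem[0x0403] = rp >> 8                   #             high byte

print(f"{rp:04x}")                  # 0201
```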

The Acheron call stack resides on the 6502 CPU stack, so 6502 code can be used within nested Acheron functions, as long as the CPU stack is preserved.

Acheron can also call native 6502 routines with calln. No registers need be saved on the 6502 side, as all Acheron register state is automatically persisted into zeropage first.

Currently, 6502 code that wants to call an Acheron function must first switch to Acheron mode and issue the call from there, in order for the stack state to be properly maintained:

; 6502 code
 lda ...
 sta ...
 jsr acheron
 call coolFunc16  ; acheron-mode function call
 native
 lda ...
 sta ...
 rts

; Acheron function
coolFunc16:
 mgrow 3
 ...
 retm

Creating New Instructions

Important Locations

All noted zeropage locations & native calls are documented in instruction-set.html as well.

Name         Type        Description
rptr         zp byte     Zeropage address of r0, the start of the register window
pptr         zp byte     Zeropage address of rP, the ‘prior’ register, usually held in .X
iptr         zp pointer  Base location of the Acheron byte code instruction pointer
iptr_offset  zp byte     When held in .Y, (iptr),Y is the actual instruction pointer
cstack       zp byte     Carry stack, with the MSB being the most recent carry bit
zptemp       3 zp bytes  Temporary zeropage storage for use in instruction implementations

Instruction entry

Register  Value
.X        Zeropage location of rP
.Y        Offset in (iptr),Y to the byte after the opcode
Carry     Clear

Instruction exit

There are multiple return points to the main loop, differentiated by how many operand bytes remain in the instruction stream, and thus how .Y is handled.

In all cases, .X must still point to rP, which the instruction might have changed. Carry is ignored.

Entry             Description
mainLoop0         (iptr),Y points to the byte after the complete instruction
mainLoop1         (iptr),Y points to the last byte of the instruction
mainLoop2         (iptr),Y points to the penultimate byte of the instruction
mainLoopRestoreY  .Y has been saved to iptr_offset, and would point to the last byte of the instruction, like mainLoop1
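
One way to read the entry points above is as "how far is .Y from the next opcode". The Python sketch below encodes that reading. The function is hypothetical and only models the offset bookkeeping, not the dispatcher itself.

```python
def next_opcode_offset(entry, y, iptr_offset=None):
    """Offset of the next opcode, given the main loop entry point used.

    y is the current .Y value; iptr_offset is the saved copy of .Y and is
    only consulted for mainLoopRestoreY.
    """
    if entry == "mainLoopRestoreY":
        return iptr_offset + 1       # saved .Y pointed at the last byte
    bytes_left = {"mainLoop0": 0, "mainLoop1": 1, "mainLoop2": 2}[entry]
    return y + bytes_left


# A 3-byte instruction whose opcode sits at offset 10: operands at 11 and 12,
# so .Y is 11 on entry and the next opcode is at offset 13 in every case.
print(next_opcode_offset("mainLoop2", 11))         # no operand bytes consumed
print(next_opcode_offset("mainLoop1", 12))         # one iny, .Y on last byte
print(next_opcode_offset("mainLoop0", 13))         # .Y already past the end
print(next_opcode_offset("mainLoopRestoreY", 0x4C, iptr_offset=12))
```

In the last call .Y holds something unrelated (here a register address), which is why the saved copy is used instead.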

The convenience macros save_y, tay_save, and others at the bottom of macros.inc save the .Y register into iptr_offset when it needs to be used for other purposes in an instruction.

If .X needs to be repurposed inside an instruction implementation, it should be saved in pptr and restored before jumping back to the main loop.

Defining Instructions

The OP macro labels a location in the source code where the named instruction implementation begins.

OP <name>, <operands>, <category>, <html doc>

This macro does not create any bytes in the code stream where it resides, but places all its generated information into separate ca65 segments, which allows instructions to flow into another’s implementation without issue:

OP incp2, none, regs, "rP := rP + 2"
 sec
OP incp, none, regs, "rP := rP + 1"
 lda #1         ; carry is clear on entry (+1), or set by incp2's sec (+2)
 adc 0,x
 sta 0,x
 bcc :+
  inc 1,x
:jmp mainLoop0

The OP usage site also gains the label op_<name>, in case such a chaining needs to jump to it.
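
The shared body of incp2 and incp works because the 6502's adc adds the carry flag along with its operand. The Python sketch below mimics that arithmetic, assuming the accumulator is loaded with 1 before the adc; the helper names are made up for illustration.

```python
def add_to_rp(value, a, carry):
    """Mimic: adc 0,x / sta 0,x / bcc skip / inc 1,x, with A preloaded."""
    lo, hi = value & 0xFF, value >> 8
    total = a + lo + carry             # adc adds A, memory, and carry
    lo = total & 0xFF
    if total > 0xFF:                   # carry out of the low byte
        hi = (hi + 1) & 0xFF           # inc 1,x
    return (hi << 8) | lo

def incp(value):
    return add_to_rp(value, a=1, carry=0)   # carry is clear on entry

def incp2(value):
    return add_to_rp(value, a=1, carry=1)   # sec, then fall into incp's body


print(f"{incp(0x00FF):04x}")    # 0100
print(f"{incp2(0x00FF):04x}")   # 0101
print(f"{incp2(0xFFFF):04x}")   # 0001
```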

Operand Encoding

The 2nd field of the OP macro defines what operand parameters the instruction takes, and how they’re ordered and encoded. These options are defined in operand-encodings.inc, and can be extended.

Register operands are stored pre-shifted (r0 encodes to 0; r1 to 2; r2 to 4, etc). Options to offset values by 1, calculate relative branch offsets from the current PC to a label, and other conveniences are included.

rA names an auxiliary register which will be referenced, while rD names a destination register that is intended to become the new rP, meaning .X will be updated to point to it. These are naming conventions and must be implemented in code to match the documentation.

Reading Operand Bytes

On instruction entry, lda (iptr),y accesses the first operand byte. The .Y register may be freely incremented without bounds checking, up to a maximum instruction length of 128 bytes. The last read does not have to increment .Y if it jumps to mainLoop1.

When reading register values, the offset is stored in the operand byte, but this must be added to rptr. The convenience macros get_ra and get_ra_y perform this operation, requiring that carry was clear on entry, and assuming carry stays clear as long as the register number is sane and does not wrap past address $ff. The macro get_rd reads a register address and places it in .X, effectively re-aliasing rP.
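
The pre-shifted encoding and the address calculation can be summarized in Python. This is a sketch of the arithmetic only; the function names are hypothetical and the range checks are there to make the stated limits visible.

```python
def encode_register(n):
    """Operand byte for register rN: the register number pre-shifted by 1."""
    if not 0 <= n <= 127:
        raise ValueError("register number out of range")
    return n * 2                     # r0 -> 0, r1 -> 2, r2 -> 4, ...

def register_address(rptr, operand_byte):
    """Zeropage address of the register named by an operand byte."""
    addr = rptr + operand_byte       # a single 8-bit add on the 6502 side
    if addr > 0xFF:
        raise ValueError("register would wrap past $ff")
    return addr


operand = encode_register(3)
print(operand, hex(register_address(0x46, operand)))   # 6 0x4c
```

Because the operand is already a byte offset, no shift is needed at run time, and the add leaves carry clear as long as the result stays within zeropage.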

Examples

Note that while some of these mimic standard instructions, the actual source code shares implementation bodies between instructions and is not verbatim here.

No operands:

OP clrp, none, regs, "rP := 0"
 lda #0         ; set both bytes of rP to zero
 sta 0,x
 sta 1,x
 jmp mainLoop0  ; since .Y is pointing past the full instruction

1-byte literal operand:

OP seti8, imm8, regs, "rP := imm8"
 lda (iptr),y  ; get the imm8 operand
 sta 0,x       ; store in low byte of rP
 lda #0
 sta 1,x       ; high byte
 jmp mainLoop1

2-byte literal operand:

OP seti16, imm16, regs, "rP := imm16"
 lda (iptr),y
 sta 0,x
 iny            ; we need to increment once to get to the 2nd operand byte
 lda (iptr),y
 sta 1,x
 jmp mainLoop1  ; ... but can just leave it there

Register operand:

OP copyr, ra, regs, "rP := rA"
 get_ra_y  ; saves .Y and sets it to point to rA (rptr + operand byte)
 lda 0,y   ; this is absolute,y addressing but oh well
 sta 0,x
 lda 1,y
 sta 1,x
 jmp mainLoopRestoreY  ; .Y was saved by get_ra_y while pointing at the last byte

Both a byte literal and a register operand:

OP copyri, imm8ra, regs, "rP := rA + imm8"
 lda (iptr),y  ; get the imm8 operand
 pha           ;   and stash it
 iny           ; move to the next operand byte
 get_ra_y      ; Save .Y, then read rA into .Y
 pla           ; add the imm8
 adc 0,y       ;   to rA
 sta 0,x       ;   and store into rP
 lda #0        ; high byte
 adc 1,y
 sta 1,x
 jmp mainLoopRestoreY ; .Y was pointing to the last byte when it was saved

Carry Stack Example

We can create a custom instruction to test if any bits are set in the carry stack. This little self-contained block of code is all that’s required to define & document it:

OP bcstack, rel8, flow, "Branch if any bit in the carry stack is set."
 lda cstack
 beq :+
  jmp op_ba          ; jump to the 'branch always' instruction implementation,
                     ;  which will consume the rel8 operand byte
:jmp mainLoop1       ; done, but 1 more byte of the instruction must be passed over

And just for extra convenience and demonstration,

OP clrcstack, none, regs, "Clear the carry stack."
 lda #0
 sta cstack
 jmp mainLoop0

Now Acheron code can deal more idiomatically with the carry stack state as a whole:

 clrcstack           ; start with a clean carry stack

 with r3             ; do some additions
 add r4
 add r5
 add r6

 bcstack overflowed  ; branch if any of the above additions carried
 
 ...
 
overflowed:          ; normal ca65 label
 jump dead
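
For intuition, here is a Python model of what the example above relies on. It is a sketch under assumptions: that each add pushes its carry-out onto the carry stack, and that a push shifts the byte right and inserts the new bit at the MSB (consistent with the MSB being the most recent carry, but not taken from the VM source). Class and function names are made up.

```python
class CarryStack:
    """8-bit carry stack with the most recent carry in the MSB (assumed)."""

    def __init__(self):
        self.value = 0

    def clear(self):                   # model of clrcstack
        self.value = 0

    def push(self, carry):             # assumed push direction, see above
        self.value = ((self.value >> 1) | (carry << 7)) & 0xFF

    def any_set(self):                 # the condition bcstack branches on
        return self.value != 0


def add16(a, b):
    """16-bit add returning (result, carry_out)."""
    total = a + b
    return total & 0xFFFF, total >> 16


cstack = CarryStack()
cstack.clear()
r3 = 0xFFF0
for operand in (0x0020, 0x0001, 0x0002):   # stand-ins for r4, r5, r6
    r3, carry = add16(r3, operand)
    cstack.push(carry)

print(f"{r3:04x} {cstack.value:08b} {cstack.any_set()}")   # 0013 00100000 True
```

Only the first addition carried, yet the bit is still visible two pushes later, which is what lets a single test cover the whole sequence.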

Defining Categories

The instruction reference is broken into categories, each with their own documentation and set of instructions. These categories are also declared via macros in the source code:

OP_CATEGORY <name>, <title> [, <html documentation>]
OP_CATDOC <name>, <html documentation>

Multiple OP_CATDOCs may be defined for a single category. Each documentation string will be enclosed in a <p> tag.

To include literal newlines for <pre> tag use, you must use ca65 brace notation to enclose a list of strings and EOL tags:

OP_CATEGORY flow, "Flow Control"
OP_CATDOC flow, {"To call an Acheron subroutine from native code:<pre> jsr acheron", EOL, " call <i>label</i>", EOL, " native</pre>This supports reentrant usage from within native portions of a running Acheron call chain."}

Defining Zeropage Variables

Instructions may need zeropage space to implement persistent features. The ZPVAR macro allocates the bytes within the zp space allocated for AcheronVM, and adds the variable to the documentation.

ZPVAR <name>, <category>, <byte length>, <html documentation>
ZPVAR pptr, regs, 1, "Pointer to the prior register, rP."

Defining Native Subroutines

Some facilities might interact more with 6502 code. Entry points may be added to the documentation. The NATIVE macro declares the name as a label at the current point in code, so name: does not need to be additionally declared.

NATIVE <name>, <category>, <jsr|jmp>, <html documentation>
NATIVE acheron, flow, jsr, "Enter Acheron mode, interpreting bytecodes immediately after the JSR instruction."
 pla
 sta iptr
 ...

Defining Pseudo-Instructions

Since Acheron source code definitions can only dispatch on the instruction name, and not its operand syntax, different names must be used for the different operand sets that instructions support:

 ; 6502 addressing modes, distinguished by operand syntax:
 lda 0
 lda 0,x
 lda (0),y

 ; Acheron addressing modes, distinguished by instruction name:
 ldm r3
 ldmi r3,6
 ldma $0000

Some of these sets of instructions could be rolled together into single pseudo-instructions, which are simply ca65 macros making assembly-time decisions on the literal parameters. Declaring them in this manner adds them to acheron.inc and the documentation.

PSEUDO <name>, <params string>, <category>, <html documentation>, <macro expansion>

It is best to write and test the macro expansion first, and then pull its lines into the brace format that ca65 handles. This example shows how setp8, setp16, and clrp can all be combined into a single pseudo-instruction, based on the literal value to set into rP:

First, get the macro itself working:

.macro setp imm
 .if (imm > 255) .or (imm < 0)
   setp16 imm
 .elseif (imm > 0)
   setp8 imm
 .else
   clrp
 .endif
.endmacro

Then quote the individual lines, interspersing them with EOL tags in a brace list:

{" .if (imm > 255) .or (imm < 0)",EOL,"   setp16 imm",EOL," .elseif (imm > 0)",EOL,"   setp8 imm",EOL," .else",EOL,"   clrp",EOL," .endif"}

And finally fill in the rest of the fields. Note that the parameters are represented as a string, and may include multiple parameters as "param1, param2".

PSEUDO setp, "imm", regs, "Becomes clrp, setp8, or setp16.", {" .if (imm > 255) .or (imm < 0)",EOL,"   setp16 imm",EOL," .elseif (imm > 0)",EOL,"   setp8 imm",EOL," .else",EOL,"   clrp",EOL," .endif"}
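
Quoting the macro lines by hand is error-prone, so a throwaway script can do it. The Python sketch below is not part of AcheronVM's tooling: both helpers are hypothetical, and brace_list assumes the lines contain no double quotes. The second function mirrors the macro's decision logic so it can be sanity-checked off-target.

```python
def brace_list(lines):
    """Join macro body lines into the {"...",EOL,"..."} form shown above."""
    return "{" + ",EOL,".join(f'"{line}"' for line in lines) + "}"

def setp_choice(imm):
    """Which real instruction the setp pseudo-instruction expands to."""
    if imm > 255 or imm < 0:
        return "setp16"
    elif imm > 0:
        return "setp8"
    else:
        return "clrp"


macro_body = [
    " .if (imm > 255) .or (imm < 0)",
    "   setp16 imm",
    " .elseif (imm > 0)",
    "   setp8 imm",
    " .else",
    "   clrp",
    " .endif",
]
print(brace_list(macro_body))
print([setp_choice(v) for v in (0, 1, 255, 256, -1)])
```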

Frequently Yelled Questions

Where does the name come from‽

AcheronVM is a refined, more tractable version of a nearly impossible-to-use prior attempt at high-level 6502 optimization. Its design still embraces some unorthodox and potentially confusing features necessary to gain its advantages. Like the esoteric programming language Malbolge, AcheronVM’s name also derives from Dante’s Inferno, except that the Acheron is a river that only borders Hell.

The documentation is confusing and/or incomplete!

That’s because AcheronVM is confusing and/or incomplete! ;-) But seriously, any feedback given about what you don’t understand will help me flesh out the documentation better, and is very welcome. I find video explanations help some people break through complexity barriers, and more of those will come as well.

Why don’t you have string handling, memory heap management, etc‽

AcheronVM provides a 16-bit assembly-level language, and is intended to be embedded inside projects that have their own assumptions about datastructures and memory layouts. Various libraries and instruction packs for these kinds of features will be developed, and you are encouraged to develop your own project-specific instructions as well.

Code using instruction $X would be smaller if it had a different addressing mode!

The ISA provided with AcheronVM is intended to be usefully complete, and appropriate for many general workloads. However, any given code situation will have its own unique needs for optimal flow. Feel free to modify and augment AcheronVM’s ISA to best suit your code, especially to combine common multi-instruction tasks into single instructions; that’s the VM’s intent.

But don’t custom instructions break compatibility between Acheron instances‽

Yes. However, most 6502 code is already bespoke per project, and source code reuse isn’t actually that high. Especially in larger, more complex projects, the demands of the design clash with default assumptions that other code might have. Therefore, shared 6502 code is already treated more like a reference implementation to be ported, and less like a drop-in unit; which is the same situation as combining differently-ISA’d Acheron code.

However, if you have an ISA you have settled on for a project, you can reuse its generated acheron.inc file to build external bytecode that is compatible with your specific VM. Other crazier options with shared opcode subsets might come in the future.

Do you accept pull requests with new instructions‽

Sure. The number of included instructions is still fewer than 128. When it gets larger than that, I’ll make a contrib directory of additional instruction packs with all sorts of behavior, including those of narrow purpose. The default ISA will remain under 128 instructions and general purpose, to facilitate an easier path for newbies to use and extend it.

I want a full 256 instruction ISA!

No problem. There are 3 selectable dispatchers included. The default WBIT dispatcher supports 128 instructions + an embedded with flag in the opcode, but the BYTE dispatcher will dispatch on a full 256 instructions. See options.inc for the selection, and OPWITH for adding an optimized with-based entry point to individual byte-dispatched instructions. (The 3rd is the SPARSE dispatcher, which is the smallest and fastest, but only supports 128 instructions without the with flag, and is likely useless.)

Will AcheronVM fit in my tiny embedded environment‽

A maximally minimalist ISA can stuff a working VM in less than 100 bytes. AcheronVM comprises just a dispatcher and a set of instruction implementations, working under an implied register model. There are no internal static support functions taking up space, although the jsr acheron entry point (13 bytes) is usually necessary. The dispatcher is 33-42 bytes without the trap features. Each instruction implementation only takes its code footprint plus a 2 byte entry in the dispatch table. At its largest size with everything enabled, the default distribution takes around 2kB at the time of this writing.
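
The size accounting above is easy to turn into a rough estimator. In this Python sketch the entry point size, the dispatcher range, and the 2-byte table entry come from the paragraph above; the per-instruction code sizes are hypothetical numbers chosen purely for illustration.

```python
ENTRY_POINT_BYTES = 13        # jsr acheron entry point
DISPATCHER_BYTES = (33, 42)   # smallest and largest, without trap features
TABLE_ENTRY_BYTES = 2         # dispatch table entry per instruction

def footprint(instruction_sizes, dispatcher_bytes, include_entry=True):
    """Estimated VM size in bytes for a given set of instruction bodies."""
    total = dispatcher_bytes
    total += sum(size + TABLE_ENTRY_BYTES for size in instruction_sizes)
    if include_entry:
        total += ENTRY_POINT_BYTES
    return total


tiny_isa = [8, 5, 10, 7]      # hypothetical code sizes of four instructions
smallest = footprint(tiny_isa, DISPATCHER_BYTES[0])
largest = footprint(tiny_isa, DISPATCHER_BYTES[1])
print(smallest, largest)      # 84 93
```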

Is the bytecode relocatable‽

Acheron bytecode has the exact same relocation issues as 6502 code, with 16-bit absolute function locations & address references, and relative 8-bit branches. It is built with the ca65 toolchain, though, and that can use the relocatable o65 file format which takes care of everything for you.

Can AcheronVM run from ROM‽

Not currently, though there will likely be an option for that if there’s interest. There is a fair bit of self-modifying code (including the dispatcher itself) which would need to be replaced by larger, slower static versions. A ROM version would likely still need to copy the dispatcher into RAM to run, which would also require a number of branches in the source to be converted to jumps.