In recent years, AcheronVM has gone through a substantial rewrite, and is nearly complete. The code is presented in its current state due to interest, and documentation updates are ongoing.
While the project is still being finalized, the core operations do work fine. However, I cannot claim full stability of all features in all combinations yet. It’s a very reconfigurable system, and the WBIT dispatcher is getting the most attention.
AcheronVM provides a flexible and uniquely optimized virtual 16-bit CPU which runs on the 6502, inline with native assembly code. You could consider it a successor to SWEET16. It seeks to maximize speed, density, and high-level power, without making large sacrifices in any of those 3 aspects.
It is designed to be a rich execution environment for high-level 16-bit code, including pointer-offset structure dereferencing, reentrant functions, exception handling, and more, in a minimal footprint. The ISA is completely reconfigurable, and is intended to be customized per-project for the best performance and applicability.
A presentation on its design and implementation was given at VCF Midwest 14 (2019), and is recommended to orient yourself to the method behind the madness before continuing into the documentation:
Re-answering the final question: While no provided Acheron instructions test the entire carry stack, it is easy to create one. The zeropage location `cstack` holds its byte value, which can be directly read & tested from either 6502 or Acheron code. This example creates a little custom instruction around that.
VM Documentation:
Default Instruction Set (Autogenerated Files):
Discussion Forum: Thread on the 6502.org forums, starting from the 2019 version.
License: AcheronVM is licensed under the LGPLv3. For alternate licensing and/or commercial support, email the AcheronVM github account.
Registers are 16 bits wide, little-endian, stored in a sliding window in a dedicated zeropage area. They are named from `r0` to (theoretically) `r127`, at respectively increasing zeropage addresses.
Register name: r0 r1 r2 r3 r4 ...
Sample address: 0046 0048 004a 004c 004e ...
Within a scope or function call, new uninitialized registers can be slid into view, and slid back out of view when work is complete. The registers of the parent scope/function are still fully visible and accessible, just at higher register locations.
Example:
r0 r1 r2 ... ; register names
0000 1ef4 abcd ... ; register values
Call a function, and it might grow 2 registers.
<--------|
r0 r1 |r2 r3 r4 ... ; r2 = caller's r0
xxxx xxxx|0000 1ef4 abcd ...
The function can use `r0` and `r1` as its own local variables. It can read `r2`-`r4` to receive any input parameters, as well as write its return values there. All of the registers are fully orthogonal. The only thing differentiating parameters, locals, or return values is programmer policy.
When the function returns, the register window is slid back to its original place, which was marked at the beginning of the function. (The `mgrow` and `retm` instructions perform this marking/growing and function return.) The caller can then continue on as normal, with its registers where they were before, oblivious to the called function’s registers, and without any save/restore work.
|--------->
| r0 r1 r2 ...
|.... .... 0034 1ef4 abcd ... ; r0 holds a return value
Some functions might not need new variables, solely working in their caller’s register space. This is especially useful in local functions where the parent context is always fully known.
Since registers are in zeropage, this space might be at a premium. Functions should seek to minimize the number of registers that need to be grown, reusing registers wherever possible.
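The sliding window described above can be modeled in a few lines of Python. This is only an illustrative sketch, not AcheronVM's implementation: the class name, the `marks` list used to remember window positions, and the starting address `$46` (taken from the sample address table) are assumptions made for the example.

```python
class RegisterWindow:
    """Toy model of a sliding window of 16-bit registers in zeropage."""

    def __init__(self, rptr=0x46):
        self.zp = bytearray(256)  # the 6502 zeropage
        self.rptr = rptr          # zeropage address of r0
        self.marks = []           # hypothetical store for marked window positions

    def addr(self, n):
        """Zeropage address of register rN (2 bytes per register)."""
        return self.rptr + 2 * n

    def read(self, n):
        a = self.addr(n)
        return self.zp[a] | (self.zp[a + 1] << 8)  # little-endian

    def write(self, n, value):
        a = self.addr(n)
        self.zp[a] = value & 0xFF
        self.zp[a + 1] = (value >> 8) & 0xFF

    def mgrow(self, count):
        """Mark the current window position, then slide in `count` new registers."""
        self.marks.append(self.rptr)
        self.rptr -= 2 * count    # new registers appear at lower addresses

    def retm(self):
        """Slide the window back to the most recently marked position."""
        self.rptr = self.marks.pop()


w = RegisterWindow()
for n, value in enumerate([0x0000, 0x1EF4, 0xABCD]):
    w.write(n, value)

w.mgrow(2)            # callee gets fresh r0 and r1
callee_view = [w.read(2), w.read(3), w.read(4)]  # the caller's r0..r2
w.write(2, 0x0034)    # return value goes into the caller's r0
w.retm()
returned = w.read(0)
```

After `retm`, the caller reads `0x0034` from its own `r0` without any save or restore step, which is the point of the scheme.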
rP
To gain the advantage of the implicit addressing enjoyed by Accumulator or Top-of-Stack systems, AcheronVM remembers the prior used register for easy reuse. This is referred to as `rP`, and aliases one of the real registers (`r0`-`r127`); it is not a separate register itself.
Many instructions implicitly read or write `rP`, and can remap `rP` to any other register. The most common remapping instruction is `with`, which directly specifies which register to use as `rP`.
Example:
r0 r1 [r2] ; rP points to r2
0f00 0012 0000
Issuing `incp` increments `rP`, which can affect any register, in this case `r2`:
incp
r0 r1 [r2]
0f00 0012 0001
When `rP` points somewhere else, `incp` will affect a different register:
with r1
incp
r0 [r1] r2 ; rP now points to r1
0f00 0013 0001
Just like any other accumulator-based CPU, two-register instructions often only need to specify 1 of them, as `rP` is implicit.
add r0
r0 [r1] r2 ; r1 += r0
0f00 0f13 0001
Since `rP` may point to any register, the instructions remain orthogonal. For performance, the `with` instruction can be transparently combined with the following instruction, optimizing away its overhead.
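The `rP` walkthrough above can be replayed with a small Python model. It is a sketch for illustration only: the class and method names are made up, and carry handling is left out entirely.

```python
class PriorRegisterModel:
    """Toy model: rP is an alias (an index), not a register of its own."""

    def __init__(self, values, p):
        self.regs = list(values)  # 16-bit register values, r0 first
        self.p = p                # index of the register currently aliased by rP

    def with_(self, n):
        """Model of `with rN`: make rN the new rP."""
        self.p = n

    def incp(self):
        """Model of `incp`: increment whichever register rP aliases."""
        self.regs[self.p] = (self.regs[self.p] + 1) & 0xFFFF

    def add(self, n):
        """Model of `add rN`: rP := rP + rN (carry ignored in this sketch)."""
        self.regs[self.p] = (self.regs[self.p] + self.regs[n]) & 0xFFFF


m = PriorRegisterModel([0x0F00, 0x0012, 0x0000], p=2)
m.incp()      # r2: 0000 -> 0001
m.with_(1)    # rP now aliases r1
m.incp()      # r1: 0012 -> 0013
m.add(0)      # r1: 0013 + 0f00 -> 0f13
print(" ".join(f"{v:04x}" for v in m.regs))  # 0f00 0f13 0001
```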
When the register window is grown, `rP` stays at the same physical register. For instance, if it points to `r2` and 2 more registers are grown, then `rP` will point to `r4`.
r0 r1 [r2] r3 r4 r5 ...
grow 2
r0 r1 r2 r3 [r4] r5 r6 r7 ...
When the register window is shrunk, `rP` automatically moves to `r0` for safety, so that `rP` cannot point into deallocated register space. This behavior may be modified by compile-time options.
r0 r1 r2 r3 [r4] r5 r6 r7 ...
shrink 2
[r0] r1 r2 r3 r4 r5 ...
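The way `rP` tracks the window can be sketched in Python using two zeropage addresses, named here after the `rptr` and `pptr` locations. The class name and the starting address `$60` are arbitrary choices for the example, and only the default shrink policy is modeled.

```python
class WindowWithPrior:
    """Toy model of how rP behaves when the register window moves."""

    def __init__(self, rptr=0x60, p_index=2):
        self.rptr = rptr                # zeropage address of r0
        self.pptr = rptr + 2 * p_index  # zeropage address of rP

    def p_index(self):
        """Which register name rP currently aliases."""
        return (self.pptr - self.rptr) // 2

    def grow(self, n):
        # r0 moves down by n registers; pptr is untouched, so rP keeps
        # its physical location and its register number rises by n.
        self.rptr -= 2 * n

    def shrink(self, n):
        # r0 moves back up; rP is reset to r0 so that it cannot point
        # into deallocated space (the default policy).
        self.rptr += 2 * n
        self.pptr = self.rptr


w = WindowWithPrior()
before = w.p_index()        # rP aliases r2
w.grow(2)
after_grow = w.p_index()    # same physical register, now named r4
w.shrink(2)
after_shrink = w.p_index()  # reset to r0
```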
The `regnames` macro can define names for the registers, making source code much more readable and maintainable. This should be scoped within `.proc` or `.scope` to avoid name clashes.
;; Calculate address = $1000 + (y1 << 5) + x1
.proc calcAddress
; This renames r0, r1, and r2.
regnames address, x1, y1
with address
copyr y1
shl 5
add x1
addi $1000
ret
.endproc
Names can be left empty if they’re not needed in the current scope:
; Only r0 and r2 are given names
regnames address,,y1
Building software that uses AcheronVM is a two-stage process.
First, the VM itself must be built, via `make acheron`. A critical output of this step is the `acheron.inc` file, which defines macros for generating all the instructions represented in that build of AcheronVM. Documentation can also be generated separately via `make docs`, if Lisp is installed (SBCL or CLISP).
Second, the user code is built. This includes `acheron.inc`, which enables the new instruction syntax. This code is directly linked against the `.o` files from the first step, in order to resolve any labels from the VM, at the very least the one for `jsr acheron`. The included `make test` target performs this linking build with the `test.asm` file, appending its code to the same memory segment as the AcheronVM runtime.
`src/acheron.cfg` specifies the memory layout, and should be adjusted or ported per project. The ZPSTACK, ZPTEMP, and ACHERON memory areas define the locations of all the memory used. Everything else is for creating the necessary output files.
`src/options.inc` holds build-time feature and performance declarations, the end-of-line marker for generated text files, and the per-platform binary file type for the final output.
`src/acheron.asm` currently pulls in the instruction definitions using `.include`, to standardize the visibility of macros and to specifically control the order for reachability. Instructions may be removed in order to reduce the footprint, free up opcode space, or to use custom replacements. The `.include` directives can be removed, or individual instructions can be commented out in their respective files. New custom instructions should currently be in their own file and added via `.include` from inside `acheron.asm` as well.
`Makefile` contains the output filename, the Lisp to use in building the documentation, and the test file generation.
6502 and Acheron code can be inlined:
.include "acheron.inc"
; Normal 6502 code
ldx #$01
stx $0400
inx
stx $0401
jsr acheron ; Switch to Acheron mode
grow 1
with r0
ldma $0400
stma $0402
native ; Switch to 6502 mode
rts
The register window position and `rP` are preserved when switching to 6502 mode and back.
The Acheron call stack resides on the 6502 CPU stack, so 6502 code can be used within nested Acheron functions, as long as the CPU stack is preserved.
Acheron can also call native 6502 routines with `calln`. No registers need be saved on the 6502 side, as all Acheron register state is automatically persisted into zeropage first.
Currently, 6502 code cannot call an Acheron function directly. It must first enter Acheron mode and issue the call from there, so that the stack state is properly maintained:
; 6502 code
lda ...
sta ...
jsr acheron
call coolFunc16 ; acheron-mode function call
native
lda ...
sta ...
rts
; Acheron function
coolFunc16:
mgrow 3
...
retm
All noted zeropage locations & native calls are documented in `instruction-set.html` as well.
Name | Type | Description |
---|---|---|
`rptr` | zp byte | Zeropage address of `r0`, the start of the register window |
`pptr` | zp byte | Zeropage address of `rP`, the ‘prior’ register, usually held in `.X` |
`iptr` | zp pointer | Base location of the Acheron byte code instruction pointer |
`iptr_offset` | zp byte | When held in `.Y`, `(iptr),Y` is the actual instruction pointer |
`cstack` | zp byte | Carry stack, with the MSB being the most recent carry bit |
`zptemp` | 3 zp bytes | Temporary zeropage storage for use in instruction implementations |
Register | Value |
---|---|
.X | Zeropage location of rP |
.Y | Offset in (iptr),Y to the byte after the opcode |
Carry | Clear |
There are multiple return points to the main loop, differentiated by how operand bytes remain in the instruction stream, and thus how `.Y` is handled.
In all cases, `.X` must still point to `rP`, which the instruction might have changed. Carry is ignored.
Entry | Description |
---|---|
`mainLoop0` | `(iptr),Y` points to the byte after the complete instruction |
`mainLoop1` | `(iptr),Y` points to the last byte of the instruction |
`mainLoop2` | `(iptr),Y` points to the penultimate byte of the instruction |
`mainLoopRestoreY` | `.Y` has been saved to `iptr_offset`, and would point to the last byte of the instruction, like `mainLoop1` |
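As a rough mental model of the table above, each entry point differs only in how far `.Y` still has to advance to reach the next opcode. The Python helper below is a hypothetical illustration of that bookkeeping, not the dispatcher's actual code.

```python
def next_opcode_offset(entry, y, iptr_offset=None):
    """Return the (iptr),Y offset of the next opcode for a given return point.

    `y` is the value of .Y when the instruction jumps back to the main loop;
    `iptr_offset` is the value previously saved from .Y, which is only used
    by mainLoopRestoreY.
    """
    if entry == "mainLoopRestoreY":
        y = iptr_offset       # restore the saved .Y ...
        entry = "mainLoop1"   # ... then behave like mainLoop1
    remaining = {"mainLoop0": 0, "mainLoop1": 1, "mainLoop2": 2}[entry]
    return y + remaining


# A no-operand instruction whose opcode sits at offset 10:
# .Y is 11 on entry, which is already past the complete instruction.
no_operand = next_opcode_offset("mainLoop0", 11)

# A 3-byte instruction (opcode plus 2 operand bytes) at offset 10:
# after one iny, .Y is 12, the last byte of the instruction.
two_operands = next_opcode_offset("mainLoop1", 12)

# An instruction that saved .Y at 11 and then reused the register:
restored = next_opcode_offset("mainLoopRestoreY", 0x4A, iptr_offset=11)
```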
The convenience macros `save_y`, `tay_save`, and others at the bottom of `macros.inc` save the `.Y` register into `iptr_offset` when it needs to be used for other purposes in an instruction.
If `.X` needs to be repurposed inside an instruction implementation, it should be saved in `pptr` and restored before jumping back to the main loop.
The `OP` macro labels a location in the source code where the named instruction implementation begins.
OP <name>, <operands>, <category>, <html doc>
This macro does not create any bytes in the code stream where it resides, but places all its generated information into separate ca65 segments, which allows instructions to flow into another’s implementation without issue:
OP incp2, none, regs, "rP := rP + 2"
sec
OP incp, none, regs, "rP := rP + 1"
lda #1 ; carry is clear on entry (+1); after incp2's sec it adds 2
adc 0,x
sta 0,x
bcc :+
inc 1,x
:jmp mainLoop0
The OP usage site also gains the label `op_<name>`, in case such a chaining needs to jump to it.
The 2nd field of the `OP` macro defines what operand parameters the instruction takes, and how they’re ordered and encoded. These options are defined in `operand-encodings.inc`, and can be extended.
Register operands are stored pre-shifted (`r0` encodes to 0; `r1` to 2; `r2` to 4, etc). Options to offset values by 1, calculate relative branch offsets from the current PC to a label, and other conveniences are included.
`rA` names an auxiliary register which will be referenced, while `rD` names a destination register that is intended to become the new `rP`, meaning `.X` will be updated to point to it. These are naming conventions and must be implemented in code to match the documentation.
On instruction entry, `lda (iptr),y` accesses the first operand byte. The `.Y` register may be freely incremented without bounds checking, up to a maximum instruction length of 128 bytes. The last read does not have to increment `.Y` if it jumps to `mainLoop1`.
When reading register values, the offset is stored in the operand byte, but this must be added to `rptr`. The convenience macros `get_ra` and `get_ra_y` perform this operation, requiring that carry was clear on entry, and assuming carry stays clear as long as the register number is sane and does not wrap past address $ff. The macro `get_rd` reads a register address and places it in `.X`, effectively re-aliasing `rP`.
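The register operand arithmetic can be checked with a short Python sketch. The function names are hypothetical, and the `rptr` value of `$46` is borrowed from the sample address table earlier in this document.

```python
def encode_reg(n):
    """Operand byte for register rN: the index pre-shifted by one bit."""
    if not 0 <= n <= 127:
        raise ValueError("register index must fit a pre-shifted byte")
    return 2 * n


def reg_address(rptr, operand):
    """Zeropage address of a register: rptr plus the pre-shifted operand byte."""
    address = rptr + operand
    if address > 0xFF:
        # The real macros assume this never happens, which is why carry
        # can be treated as still clear afterwards.
        raise ValueError("register address wraps past $ff")
    return address


r2_operand = encode_reg(2)                 # 4
r2_address = reg_address(0x46, r2_operand)  # $4a with r0 at $46
highest_operand = encode_reg(127)           # 254, the largest even byte value
```

Because the operand byte is already a byte offset, the 6502 side needs only a single add with no shift, and the largest even byte value, 254, is consistent with `r127` being the theoretical upper limit.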
Note that while some of these mimic standard instructions, the actual source code shares implementation bodies between instructions and is not verbatim here.
No operands:
OP clrp, none, regs, "rP := 0"
lda #0 ; set both bytes of rP to zero
sta 0,x
sta 1,x
jmp mainLoop0 ; since .Y is pointing past the full instruction
1-byte literal operand:
OP seti8, imm8, regs, "rP := imm8"
lda (iptr),y ; get the imm8 operand
sta 0,x ; store in low byte of rP
lda #0
sta 1,x ; high byte
jmp mainLoop1
2-byte literal operand:
OP seti16, imm16, regs, "rP := imm16"
lda (iptr),y
sta 0,x
iny ; we need to increment once to get to the 2nd operand byte
lda (iptr),y
sta 1,x
jmp mainLoop1 ; ... but can just leave it there
Register operand:
OP copyr, ra, regs, "rP := rA"
get_ra_y ; saves .Y and sets it to point to rA (rptr + operand byte)
lda 0,y ; this is absolute,y addressing but oh well
sta 0,x
lda 1,y
sta 1,x
jmp mainLoopRestoreY ; .Y was saved while pointing at the last operand byte
Both a byte literal and a register operand:
OP copyri, imm8ra, regs, "rP := rA + imm8"
lda (iptr),y ; get the imm8 operand
pha ; and stash it
iny ; move to the next operand byte
get_ra_y ; Save .Y, then read rA into .Y
pla ; add the imm8
adc 0,y ; to rA
sta 0,x ; and store into rP
lda #0 ; high byte
adc 1,y
sta 1,x
jmp mainLoopRestoreY ; .Y was pointing to the last byte when it was saved
We can create a custom instruction to test if any bits are set in the carry stack. This little self-contained block of code is all that’s required to define & document it:
OP bcstack, rel8, flow, "Branch if any bit in the carry stack is set."
lda cstack
beq :+
jmp op_ba ; jump to the 'branch always' instruction implementation,
; which will consume the rel8 operand byte
:jmp mainLoop1 ; done, but 1 more byte of the instruction must be passed over
And just for extra convenience and demonstration,
OP clrcstack, none, regs, "Clear the carry stack."
lda #0
sta cstack
jmp mainLoop0
Now Acheron code can deal more idiomatically with the carry stack state as a whole:
clrcstack ; start with a clean carry stack
with r3 ; do some additions
add r4
add r5
add r6
bcstack overflowed ; branch if any of the above additions carried
...
overflowed: ; normal ca65 label
jump dead
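To make the carry stack behavior concrete, here is a small Python model of the sequence above. It rests on assumptions that should be checked against the source: that each `add` pushes its carry-out onto `cstack`, and that a push shifts the older bits down by one so the newest carry lands in the MSB. The class and function names are made up for the sketch.

```python
class CarryStack:
    """Toy model of the one-byte carry stack."""

    def __init__(self):
        self.cstack = 0

    def push(self, carry):
        # ASSUMPTION: older bits shift toward bit 0, newest carry enters bit 7.
        self.cstack = ((self.cstack >> 1) | (0x80 if carry else 0)) & 0xFF

    def clrcstack(self):
        """Model of the custom `clrcstack` instruction."""
        self.cstack = 0

    def bcstack_taken(self):
        """Model of the custom `bcstack` test: is any carry bit set?"""
        return self.cstack != 0


def add16(cs, a, b):
    """16-bit add that records its carry-out on the carry stack."""
    total = a + b
    cs.push(total > 0xFFFF)
    return total & 0xFFFF


cs = CarryStack()
cs.clrcstack()
r3 = 0xFFF0
r3 = add16(cs, r3, 0x0008)  # no carry
r3 = add16(cs, r3, 0x0010)  # carries out of 16 bits
r3 = add16(cs, r3, 0x0001)  # no carry
overflowed = cs.bcstack_taken()
```

A single test after the whole sequence catches the carry from the middle addition, even though the final addition did not carry.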
The instruction reference is broken into categories, each with their own documentation and set of instructions. These categories are also declared via macros in the source code:
OP_CATEGORY <name>, <title> [, <html documentation>]
OP_CATDOC <name>, <html documentation>
Multiple `OP_CATDOC`s may be defined for a single category. Each documentation string will be enclosed in a `<p>` tag.
To include literal newlines for `<pre>` tag use, you must use ca65 brace notation to enclose a list of strings and EOL tags:
OP_CATEGORY flow, "Flow Control"
OP_CATDOC flow, {"To call an Acheron subroutine from native code:<pre> jsr acheron", EOL, " call <i>label</i>", EOL, " native</pre>This supports reentrant usage from within native portions of a running Acheron call chain."}
Instructions may need zeropage space to implement persistent features. The `ZPVAR` macro allocates the bytes within the zp space allocated for AcheronVM, and adds the variable to the documentation.
ZPVAR <name>, <category>, <byte length>, <html documentation>
ZPVAR pptr, regs, 1, "Pointer to the prior register, rP."
Some facilities might interact more with 6502 code. Entry points may be added to the documentation. The `NATIVE` macro declares the name as a label at the current point in code, so `name:` does not need to be additionally declared.
NATIVE <name>, <category>, <jsr|jmp>, <html documentation>
NATIVE acheron, flow, jsr, "Enter Acheron mode, interpreting bytecodes immediately after the JSR instruction."
pla
sta iptr
...
Since Acheron source code definitions can only dispatch on the instruction name, and not its operand syntax, different names must be used for the different operand sets that instructions support:
; 6502 addressing modes, distinguished by operand syntax:
lda 0
lda 0,x
lda (0),y
; Acheron addressing modes, distinguished by instruction name:
ldm r3
ldmi r3,6
ldma $0000
Some of these sets of instructions could be rolled together into single pseudo-instructions, which are simply ca65 macros making assembly-time decisions on the literal parameters. Declaring them in this manner adds them to `acheron.inc` and the documentation.
PSEUDO <name>, <params string>, <category>, <html documentation>, <macro expansion>
It is best to write and test the macro expansion first, and then pull its lines into the brace format that ca65 handles. This example shows how `setp8`, `setp16`, and `clrp` can all be combined into a single pseudo-instruction, based on the literal value to set into `rP`:
First, get the macro itself working:
.macro setp imm
.if (imm > 255) .or (imm < 0)
setp16 imm
.elseif (imm > 0)
setp8 imm
.else
clrp
.endif
.endmacro
Then quote the individual lines, interspersing them with `EOL` tags in a brace list:
{" .if (imm > 255) .or (imm < 0)",EOL," setp16 imm",EOL," .elseif (imm > 0)",EOL," setp8 imm",EOL," .else",EOL," clrp",EOL," .endif"}
And finally fill in the rest of the fields. Note that the parameters are represented as a string, and may include multiple parameters as `"param1, param2"`.
PSEUDO setp, "imm", regs, "Becomes clrp, setp8, or setp16.", {" .if (imm > 255) .or (imm < 0)",EOL," setp16 imm",EOL," .elseif (imm > 0)",EOL," setp8 imm",EOL," .else",EOL," clrp",EOL," .endif"}
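The selection logic inside the `setp` macro can be mirrored in Python to check the boundary values. The function name is hypothetical, and it only returns the text of the instruction the macro would emit.

```python
def expand_setp(imm):
    """Mirror the assembly-time decision made by the `setp` pseudo-instruction."""
    if imm > 255 or imm < 0:
        return f"setp16 {imm}"  # does not fit in an unsigned byte
    elif imm > 0:
        return f"setp8 {imm}"   # fits in one operand byte
    else:
        return "clrp"           # zero needs no operand at all


expansions = [expand_setp(v) for v in (0, 1, 255, 256, -1)]
print(expansions)
```

Negative literals take the 16-bit form because the macro's first test routes anything outside 0..255 to `setp16`.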
AcheronVM is a refined, more tractable version of a nearly impossible-to-use prior attempt at high-level 6502 optimization. Its design still embraces some unorthodox and potentially confusing features necessary to gain its advantages. Like the esoteric programming language Malbolge, AcheronVM’s name also derives from Dante’s Inferno, except that the Acheron is a river that only borders Hell.
That’s because AcheronVM is confusing and/or incomplete! ;-) But seriously, any feedback given about what you don’t understand will help me flesh out the documentation better, and is very welcome. I find video explanations help some people break through complexity barriers, and more of those will come as well.
AcheronVM provides a 16-bit assembly-level language, and is intended to be embedded inside projects that have their own assumptions about datastructures and memory layouts. Various libraries and instruction packs for these kinds of features will be developed, and you are encouraged to develop your own project-specific instructions as well.
The ISA provided with AcheronVM is intended to be usefully complete, and appropriate for many general workloads. However, any given code situation will have its own unique needs for optimal flow. Feel free to modify and augment AcheronVM’s ISA to best suit your code, especially to combine common multi-instruction tasks into single instructions; that’s the VM’s intent.
Yes. However, most 6502 code is already bespoke per project, and source code reuse isn’t actually that high. Especially in larger, more complex projects, the demands of the design clash with default assumptions that other code might have. Therefore, shared 6502 code is already treated more like a reference implementation to be ported, and less like a drop-in unit; which is the same situation as combining differently-ISA’d Acheron code.
However, if you have an ISA you have settled on for a project, you can reuse its generated `acheron.inc` file to build external bytecode that is compatible with your specific VM. Other crazier options with shared opcode subsets might come in the future.
Sure. The number of included instructions is still less than 128. When it gets larger than that, I’ll make a `contrib` directory of additional instruction packs with all sorts of behavior, including those of narrow purpose. The default ISA will remain less than 128 instructions and general purpose, to facilitate an easier path for newbies to use and extend it.
No problem. There are 3 selectable dispatchers included. The default `WBIT` dispatcher supports 128 instructions + an embedded `with` flag in the opcode, but the `BYTE` dispatcher will dispatch on a full 256 instructions. See `options.inc` for the selection, and `OPWITH` for adding an optimized `with`-based entry point to individual byte-dispatched instructions. (The 3rd is the `SPARSE` dispatcher, which is the smallest and fastest, but only supports 128 instructions without the `with` flag, and is likely useless.)
A maximally minimalist ISA can stuff a working VM in less than 100 bytes. AcheronVM comprises just a dispatcher and a set of instruction implementations, working under an implied register model. There are no internal static support functions taking up space, although the `jsr acheron` entry point (13 bytes) is usually necessary. The dispatcher is 33-42 bytes without the trap features. Each instruction implementation only takes its code footprint plus a 2 byte entry in the dispatch table. At its largest size with everything enabled, the default distribution takes around 2kB at the time of this writing.
Acheron bytecode has the exact same relocation issues as 6502 code, with 16-bit absolute function locations & address references, and relative 8-bit branches. It is built with the ca65 toolchain, though, and that can use the relocatable o65 file format which takes care of everything for you.
Not currently, though there will likely be an option for that if there’s interest. There is a fair bit of self-modifying code (including the dispatcher itself) which would need to be replaced by larger, slower static versions. A ROM version would likely still need to copy the dispatcher into RAM to run, which would also require a number of branches in the source to be converted to jumps.