While I was trying to write a(nother) Perl script to work with BBC disc images that would ultimately analyse the contents of a disc image and determine what each file in it might be, I spent too long -- or maybe not enough time -- thinking about how to tell what might be a 6502 machine code program. I eventually settled on the idea of looking for continuous runs of valid instructions ending with a break in the flow of execution, such as RTS, JMP or a conditional branch.
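To make the idea concrete, here is a minimal sketch in Python (not the Perl the script itself is written in, and not its actual logic). It assumes a deliberately partial opcode table, and anything missing from that table is treated as an invalid opcode. The names `OPCODES`, `FLOW_BREAKS` and `decode_run` are hypothetical. The sample bytes are those of the HELLO demo file whose listing appears later in this post.

```python
# Partial table: opcode byte -> (mnemonic, instruction length in bytes).
# ASSUMPTION: only a handful of NMOS 6502 opcodes are listed here; a real
# tool would need the full set. Anything absent counts as "invalid".
OPCODES = {
    0xA2: ("LDX", 2),  # LDX #imm
    0xBD: ("LDA", 3),  # LDA abs,X
    0xF0: ("BEQ", 2),
    0xD0: ("BNE", 2),
    0x20: ("JSR", 3),
    0xE8: ("INX", 1),
    0x60: ("RTS", 1),
    0x48: ("PHA", 1),
    0x65: ("ADC", 2),  # ADC zp
    0x6C: ("JMP", 3),  # JMP (ind)
    0x4C: ("JMP", 3),  # JMP abs
    0x0D: ("ORA", 3),  # ORA abs
}

# Instructions regarded as a break in the flow of execution.
# JSR is left out on purpose: execution normally resumes after it.
FLOW_BREAKS = {"RTS", "JMP", "BEQ", "BNE"}


def decode_run(data, start=0):
    """Decode consecutive valid instructions from `start`.

    Returns (run, end, clean): the decoded instructions as
    (offset, mnemonic, length) tuples, the offset where decoding stopped,
    and whether the last instruction decoded was a flow break.
    """
    pos = start
    run = []
    while pos < len(data):
        entry = OPCODES.get(data[pos])
        if entry is None:
            break  # invalid (or unlisted) opcode ends the run
        mnemonic, length = entry
        if pos + length > len(data):
            break  # operand would run off the end of the data
        run.append((pos, mnemonic, length))
        pos += length
    clean = bool(run) and run[-1][1] in FLOW_BREAKS
    return run, pos, clean


# The 30 bytes of HELLO: 14 bytes of code, then the text and terminator.
HELLO = bytes.fromhex(
    "A200BD0E09F00620EEFFE8D0F560"
    "48656C6C6F2C20776F726C64210D0A00"
)

if __name__ == "__main__":
    for label, blob in (("whole file", HELLO), ("first 14 bytes", HELLO[:14])):
        run, end, clean = decode_run(blob)
        print(f"{label}: stopped at offset {end}, ends cleanly: {clean}")
```

Run over the whole file, this sketch carries on past the RTS into the text and stops at offset 23 (&0917) with a JSR as its last instruction, so the run does not end cleanly. Restricted to the first 14 bytes, the run ends on the RTS.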
And it somehow turned into this. This is a zip archive with a copy of the script and some demonstration files to explain it:
- hello_world.6502 -- the original BeebAsm source code
- HELLO -- the 6502 program generated from it
- hello_1.ssd -- a disc image with HELLO
- nard -- itself
What it can do is take a file extracted from a disc image, search for chunks of 6502 machine code, and produce a file of BeebAsm-compatible 6502 assembly language code, which will assemble to produce a binary file that is byte-for-byte identical to the original input file. You can rename any of the temporary labels it assigns.
Once you have unzipped the above, open a shell, cd to the folder and type
Code:
$ ./nard -i HELLO -l 0x900

The -i parameter is the input file, and -l is the load address. 0x indicates a hex constant (because & and $ have special meanings to the shell). You can optionally specify an execution address with -e, but in this case, it's the same as the load address.
You should see the following output:
Code:
Load address = &0900 Execution address = &0900
Next available address = &091E
Start of chunk &0900
.tl0900
    0900 A2           <-- Possible code section begins
.tl0900
    0900 A2 00        LDX #&00
    0902 BD 0E 09     LDA tl090e, X
    0905 F0 06        BEQ tl090d
    0907 20 EE FF     JSR tlffee
    090A E8           INX
    090B D0 F5        BNE tl0902
.tl090d
    090D 60           RTS
.tl090e
    090E 48           PHA
    090F 65 6C        ADC tl006c
    0911 6C 6F 2C     JMP (tl2c6f)
    0914 20 77 6F     JSR tl6f77
    0917 72           <-- Invalid opcode
Never mind, it's not code after all.
    0918 6C           <-- Possible code section begins
.tl0918
    0918 6C 64 21     JMP (tl2164)
    091B 0D 0A 00     ORA tl000a
New labels:
    tl000a = &000A 1
    tl0918 = &0918 0
    tl2164 = &2164 1
Chunk finished &091E . Not-code: 24 Code: 6
Labels:
    tl000a = &000A 1
    tl0900 = &0900 0
    tl0918 = &0918 0
    tl2164 = &2164 1

Note it's all been considered "not code", because the code does not end cleanly on a change of flow before the first invalid instruction. But we can see an RTS at &090D, so we can set a gap starting just after it. This time, run
Code:
$ ./nard -i HELLO -l 0x900 -g0x90e,0x91e

-g specifies a gap. The first address is the start of the gap, there is a comma as a delimiter, and the second address is the first address after the gap, just like *SAVE. If you want to specify multiple gaps, you will have to separate them with spaces, which can either be individually escaped with backslashes, or else put the whole lot in speech marks: -g "start1,end1 start2,end2"
This will give the following output:
Code:
Load address = &0900 Execution address = &0900
Next available address = &091E
Gap starts at &090E and ends at &091E.
Start of chunk &0900
.tl0900
    0900 A2           <-- Possible code section begins
.tl0900
    0900 A2 00        LDX #&00
    0902 BD 0E 09     LDA tl090e, X
    0905 F0 06        BEQ tl090d
    0907 20 EE FF     JSR tlffee
    090A E8           INX
    090B D0 F5        BNE tl0902
.tl090d
    090D 60           RTS
Code ends cleanly with RTS, preceded by 0 bytes not-code.
New labels:
    tl0902 = &0902 1
    tl090d = &090D 1
    tl090e = &090E 1
    tlffee = &FFEE 1
Chunk finished &090E . Not-code: 0 Code: 14
Start of chunk &090E
.tl090e
    090E 48           <-- Gap
    090F 65           <-- Gap
    0910 6C           <-- Gap
    0911 6C           <-- Gap
    0912 6F           <-- Gap
    0913 2C           <-- Gap
    0914 20           <-- Gap
    0915 77           <-- Gap
    0916 6F           <-- Gap
    0917 72           <-- Gap
    0918 6C           <-- Gap
    0919 64           <-- Gap
    091A 21           <-- Gap
    091B 0D           <-- Gap
    091C 0A           <-- Gap
    091D 00           <-- Gap
No new labels.
Chunk finished &091E . Not-code: 16 Code: 0
Labels:
    tl0900 = &0900 0
    tl0902 = &0902 1
    tl090d = &090D 1
    tl090e = &090E 1
    tlffee = &FFEE 1

This is looking good, but we can do better.
Notice that labels have been assigned at the start of the code, and to every address in an instruction operand. These are of the form tl followed by a series of hex digits. We can use the parameter -o filename to generate a JSON file with the labels (which will also contain the gap definitions, so we can omit the -g parameter when we come to reload it); edit the JSON to give the labels more meaningful names; and then use the parameter -j filename to load this JSON file back in. We can also use -a filename to create a BeebAsm source file.
Code:
$ ./nard -i HELLO -l 0x900 -g0x90e,0x91e -o hello_labels.json
$ nano hello_labels.json # or use whatever editor you prefer
$ ./nard -i HELLO -l 0x900 -j hello_labels.json -a hello_again.6502

The JSON will look something like this:
Code:
{
   "gaps" : [
      [ "2318", "2334" ]
   ],
   "labels" : {
      "2318" : "tl090e",
      "2317" : "tl090d",
      "2306" : "tl0902",
      "65518" : "tlffee",
      "2304" : "tl0900"
   }
}

though the order may be different, as the JSON object is implemented as a Perl hash, and the ordering of hash elements is subject to change. Addresses are given in decimal, but it is actually possible to specify hex constants with & or 0x.
And here's what the assembler output hello_again.6502 might look like with some renaming of labels:
Code:
\ Recreation of "HELLO"
ORG &0900
\ Labels:
text = &090E
oswrch = &FFEE
.start
    LDX #&00
.loop1
    LDA text, X
    BEQ bye
    JSR oswrch
    INX
    BNE loop1
.bye
    RTS
    EQUB &48 : EQUB &65 : EQUB &6C : EQUB &6C : EQUB &6F : EQUB &2C : EQUB &20
    EQUB &77 : EQUB &6F : EQUB &72 : EQUB &6C : EQUB &64 : EQUB &21 : EQUB &0D
    EQUB &0A : EQUB &00
SAVE "hello.rec",&0900,&091E,&0900

This will assemble to create a file hello.rec which will be a faithful recreation of HELLO.
There's still more work to do: changing runs of EQUBs of printable characters to single EQUS statements, and allowing dot-labels in not-code sections. I was just desperate enough to be delighted when it worked at all.
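For what it's worth, the EQUB-to-EQUS change mentioned above as future work could be approached along these lines. This is only a sketch in Python, not code from the script. The names `is_text` and `data_lines` are hypothetical, the minimum run length of 4 is an arbitrary choice, and it assumes BeebAsm's comma-separated EQUB form. Double quotes are kept out of strings to avoid any escaping questions.

```python
def is_text(b):
    """True for printable ASCII, excluding the double quote (&22)."""
    return 0x20 <= b <= 0x7E and b != 0x22


def equb(pending):
    """Format a list of byte values as one comma-separated EQUB line."""
    return "EQUB " + ", ".join(f"&{b:02X}" for b in pending)


def data_lines(data, min_run=4):
    """Turn a not-code byte string into EQUS / EQUB source lines.

    Runs of at least `min_run` printable characters become EQUS strings;
    everything else is collected into EQUB lines.
    """
    lines = []
    pending = []  # bytes waiting to be written out as EQUB
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and is_text(data[j]):
            j += 1
        if j - i >= min_run:
            if pending:
                lines.append(equb(pending))
                pending = []
            lines.append('EQUS "' + data[i:j].decode("ascii") + '"')
            i = j
        else:
            # Not text, or a run too short to be worth a string.
            step = max(j, i + 1)
            pending.extend(data[i:step])
            i = step
    if pending:
        lines.append(equb(pending))
    return lines


if __name__ == "__main__":
    # The 16 not-code bytes of HELLO, from &090E to &091D.
    gap = bytes.fromhex("48656C6C6F2C20776F726C64210D0A00")
    for line in data_lines(gap):
        print("    " + line)
```

For the 16 bytes in the gap, this gives one EQUS line holding the thirteen printable characters and one EQUB line for the trailing &0D, &0A and &00.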
Statistics: Posted by julie_m — Wed Oct 01, 2025 11:38 pm