Channel: stardot.org.uk

development tools • NARD: Not A Real Disassembler

While I was trying to write a(nother) Perl script to work with BBC disc images that would ultimately analyse the contents of a disc image and determine what each file in it might be, I spent too long -- or maybe not enough time -- thinking about how to tell what might be a 6502 machine code program. I eventually settled on the idea of looking for continuous runs of valid instructions ending with a break in the flow of execution, such as RTS, JMP or a conditional branch.

And it somehow turned into this: a zip archive with a copy of the script and some demonstration files to explain it:
  • hello_world.6502 -- the original BeebASM source code
  • HELLO -- the 6502 program generated from it
  • hello_1.ssd -- a disc image with HELLO
  • nard -- itself
nard is not a real disassembler, insofar as it takes some detective work to track down where code really begins and ends, and lots of manual work renaming labels. It doesn't do any kind of stateful analysis of the code it's looking at. All it does is look for a block of valid instructions that ends with RTS, RTI, JMP or Bxx (i.e., a conditional branch). JSR does not count, because the RTS at the end of the subroutine returns execution to the instruction after the JSR. Only when a block of code ends cleanly with a change in the flow of execution is it considered legitimate.
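
As a rough illustration of that rule, here is a minimal sketch in Python. It is not nard's actual Perl code, and the names OPCODES, FLOW_CHANGERS and scan_run are made up for this sketch. The opcode table is a deliberately tiny subset of the 6502 set, just enough to cover the HELLO example used later in this post; a real tool would need the full table.

```python
# Hypothetical sketch, NOT nard's real implementation.
# opcode -> (mnemonic, instruction length in bytes); small subset only.
OPCODES = {
    0xA2: ("LDX", 2), 0xBD: ("LDA", 3), 0x20: ("JSR", 3), 0xE8: ("INX", 1),
    0x48: ("PHA", 1), 0x65: ("ADC", 2), 0x0D: ("ORA", 3),
    0x60: ("RTS", 1), 0x40: ("RTI", 1), 0x4C: ("JMP", 3), 0x6C: ("JMP", 3),
    0x10: ("BPL", 2), 0x30: ("BMI", 2), 0x50: ("BVC", 2), 0x70: ("BVS", 2),
    0x90: ("BCC", 2), 0xB0: ("BCS", 2), 0xD0: ("BNE", 2), 0xF0: ("BEQ", 2),
}

# Instructions that count as a clean end of a run (JSR is left out on purpose).
FLOW_CHANGERS = {"RTS", "RTI", "JMP",
                 "BPL", "BMI", "BVC", "BVS", "BCC", "BCS", "BNE", "BEQ"}


def scan_run(data, start=0, end=None):
    """Decode instructions from start until an unknown opcode or end.

    Returns (bytes_decoded, ends_cleanly), where ends_cleanly is True only
    if the last instruction decoded was a change in the flow of execution.
    """
    if end is None:
        end = len(data)
    pos = start
    last = None
    while pos < end:
        entry = OPCODES.get(data[pos])
        if entry is None or pos + entry[1] > end:
            break  # unknown opcode, or the operand would run past the end
        last = entry[0]
        pos += entry[1]
    return pos - start, last in FLOW_CHANGERS


# The 30 bytes of the HELLO demonstration file.
HELLO = bytes.fromhex("A200BD0E09F00620EEFFE8D0F560"
                      "48656C6C6F2C20776F726C64210D0A00")

print(scan_run(HELLO))           # whole file
print(scan_run(HELLO, 0, 0x0E))  # stop just after the RTS
```

Scanning the whole of HELLO stops after 23 bytes on a JSR, so that run does not end cleanly; stopping the scan at offset &0E gives 14 bytes ending on the RTS, which does.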

What it can do is take a file extracted from a disc image, search it for chunks of 6502 machine code, and produce a file of BeebAsm-compatible 6502 assembly language, which will assemble to produce a binary file that is byte-for-byte identical to the original input file. You can rename any of the temporary labels it assigns.

Once you have unzipped the above, open a shell, cd to the folder and type

Code:

$ ./nard -i HELLO -l 0x900
The -i parameter is the input file, and -l is the load address. 0x indicates a hex constant (because & and $ have special meanings to the shell). You can optionally specify an execution address with -e, but in this case, it's the same as the load address.

You should see the following output:

Code:

Load address = &0900  Execution address = &0900
Next available address = &091E
Start of chunk &0900
.tl0900     0900 A2 <-- Possible code section begins
.tl0900     0900 A2 00    LDX #&00
            0902 BD 0E 09 LDA tl090e, X
            0905 F0 06    BEQ tl090d
            0907 20 EE FF JSR tlffee
            090A E8       INX
            090B D0 F5    BNE tl0902
.tl090d     090D 60       RTS
.tl090e     090E 48       PHA
            090F 65 6C    ADC tl006c
            0911 6C 6F 2C JMP (tl2c6f)
            0914 20 77 6F JSR tl6f77
            0917 72 <-- Invalid opcode
Never mind, it's not code after all.
            0918 6C <-- Possible code section begins
.tl0918     0918 6C 64 21 JMP (tl2164)
            091B 0D 0A 00 ORA tl000a
New labels:
  tl000a     = &000A     1
  tl0918     = &0918     0
  tl2164     = &2164     1
Chunk finished &091E . Not-code: 24 Code: 6
Labels:
  tl000a     = &000A     1
  tl0900     = &0900     0
  tl0918     = &0918     0
  tl2164     = &2164     1
Note that everything up to the invalid opcode at &0917 has been considered "not code", because the run of instructions does not end cleanly on a change of flow before the first invalid instruction. But we can see an RTS at &090D, so we can set a gap starting just after it, at &090E. This time, run

Code:

$ ./nard -i HELLO -l 0x900 -g0x90e,0x91e
-g specifies a gap. The first address is the start of the gap, a comma acts as the delimiter, and the second address is the first address after the gap, just like *SAVE. If you want to specify multiple gaps, you will have to separate them with spaces, which must either be individually escaped with backslashes or else protected by putting the whole lot in speech marks: -g "start1,end1 start2,end2"
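
To make the "first address after the gap" convention concrete, here is a small Python sketch of how such a gap string could be split up and tested. The helper names parse_gaps and in_gap are hypothetical and not taken from nard, and the sketch assumes every address is written in hex with 0x, as in the command above.

```python
def parse_gaps(spec):
    """Split "start1,end1 start2,end2" into a list of (start, end) pairs."""
    gaps = []
    for pair in spec.split():
        start_text, end_text = pair.split(",")
        # int(..., 16) accepts an optional 0x prefix on the digits.
        gaps.append((int(start_text, 16), int(end_text, 16)))
    return gaps


def in_gap(address, gaps):
    """True if address lies inside any gap; the end address is exclusive."""
    return any(start <= address < end for start, end in gaps)


gaps = parse_gaps("0x90e,0x91e")
print(in_gap(0x90D, gaps))  # the RTS, just before the gap
print(in_gap(0x91D, gaps))  # the last byte of the file
print(in_gap(0x91E, gaps))  # the first address after the gap
```

The RTS at &090D is outside the gap, &091D (the last byte of the file) is inside it, and &091E itself is not, exactly as with the end address given to *SAVE.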

This will give the following output:

Code:

Load address = &0900  Execution address = &0900
Next available address = &091E
Gap starts at &090E and ends at &091E.
Start of chunk &0900
.tl0900     0900 A2 <-- Possible code section begins
.tl0900     0900 A2 00    LDX #&00
            0902 BD 0E 09 LDA tl090e, X
            0905 F0 06    BEQ tl090d
            0907 20 EE FF JSR tlffee
            090A E8       INX
            090B D0 F5    BNE tl0902
.tl090d     090D 60       RTS
Code ends cleanly with RTS, preceded by 0 bytes not-code.
New labels:
  tl0902     = &0902     1
  tl090d     = &090D     1
  tl090e     = &090E     1
  tlffee     = &FFEE     1
Chunk finished &090E . Not-code: 0 Code: 14
Start of chunk &090E
.tl090e     090E 48 <-- Gap
            090F 65 <-- Gap
            0910 6C <-- Gap
            0911 6C <-- Gap
            0912 6F <-- Gap
            0913 2C <-- Gap
            0914 20 <-- Gap
            0915 77 <-- Gap
            0916 6F <-- Gap
            0917 72 <-- Gap
            0918 6C <-- Gap
            0919 64 <-- Gap
            091A 21 <-- Gap
            091B 0D <-- Gap
            091C 0A <-- Gap
            091D 00 <-- Gap
No new labels.
Chunk finished &091E . Not-code: 16 Code: 0
Labels:
  tl0900     = &0900     0
  tl0902     = &0902     1
  tl090d     = &090D     1
  tl090e     = &090E     1
  tlffee     = &FFEE     1
This is looking good, but we can do better.

Notice that labels have been assigned at the start of the code, and to every address in an instruction operand. These are of the form tl and a series of hex digits. We can use the parameter -o filename to generate a JSON file with the labels (which will also contain the gap definitions, so we can omit the -g parameter when we come to reload it); edit the JSON to give the labels more meaningful names; and then use the parameter -j filename to load this JSON file back in. We can also use -a filename to create a BeebAsm source file.

Code:

$ ./nard -i HELLO -l 0x900 -g0x90e,0x91e -o hello_labels.json
$ nano hello_labels.json  #  or use whatever editor you prefer
$ ./nard -i HELLO -l 0x900 -j hello_labels.json -a hello_again.6502
The JSON will look something like this:

Code:

{
   "gaps" : [
      [
         "2318",
         "2334"
      ]
   ],
   "labels" : {
      "2318" : "tl090e",
      "2317" : "tl090d",
      "2306" : "tl0902",
      "65518" : "tlffee",
      "2304" : "tl0900"
   }
}
though the order may be different, as the JSON object is implemented as a Perl hash, and the ordering of hash elements is subject to change. Addresses are given in decimal, but it is actually possible to specify hex constants with & or 0x.
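
If hand-editing the JSON gets tedious, the renaming could also be scripted. This is only a sketch in Python, assuming the file layout shown above; rename_labels is a hypothetical helper, not part of nard, and the new names are simply the ones used in the renamed listing further down. It also prints each decimal key as a BBC-style hex address, which makes the labels easier to match against the listing.

```python
import json


def rename_labels(doc, renames):
    """Return a copy of doc with label names replaced according to renames."""
    labels = {addr: renames.get(name, name)
              for addr, name in doc["labels"].items()}
    return {**doc, "labels": labels}


# Same content as the example JSON above.
doc = json.loads("""
{
   "gaps" : [ [ "2318", "2334" ] ],
   "labels" : {
      "2318" : "tl090e",
      "2317" : "tl090d",
      "2306" : "tl0902",
      "65518" : "tlffee",
      "2304" : "tl0900"
   }
}
""")

renamed = rename_labels(doc, {
    "tl0900": "start", "tl0902": "loop1", "tl090d": "bye",
    "tl090e": "text", "tlffee": "oswrch",
})

# List the labels in address order, converting the decimal keys to hex.
for addr, name in sorted(renamed["labels"].items(), key=lambda kv: int(kv[0])):
    print(f"{name:<10} = &{int(addr):04X}")

# json.dumps(renamed, indent=3) would give text to write back to the file.
```

The gap definitions are carried over untouched, so the result should still be loadable with -j in the same way as a hand-edited file.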

And here's what the assembler output hello_again.6502 might look like with some renaming of labels:

Code:

\ Recreation of "HELLO"
ORG &0900
\  Labels:
text       = &090E
oswrch     = &FFEE
.start
    LDX #&00
.loop1
    LDA text, X
    BEQ bye
    JSR oswrch
    INX
    BNE loop1
.bye
    RTS
    EQUB &48 : EQUB &65 : EQUB &6C : EQUB &6C : EQUB &6F : EQUB &2C : EQUB &20
    EQUB &77 : EQUB &6F : EQUB &72 : EQUB &6C : EQUB &64 : EQUB &21 : EQUB &0D
    EQUB &0A : EQUB &00
SAVE "hello.rec",&0900,&091E,&0900
This will assemble to create a file hello.rec which will be a faithful recreation of HELLO.
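
One way to check that claim, sketched in Python with a hypothetical helper first_difference (not something nard provides), is to compare the two files byte by byte and report the address of the first mismatch.

```python
def first_difference(original, rebuilt, load_address=0):
    """Return the address of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(original, rebuilt)):
        if x != y:
            return load_address + offset
    if len(original) != len(rebuilt):
        # One file is a prefix of the other; they part ways where it ends.
        return load_address + min(len(original), len(rebuilt))
    return None


# Example with in-memory data; real use would read HELLO and hello.rec
# with open(name, "rb").read() and pass the load address 0x900.
same = first_difference(b"\xA2\x00\x60", b"\xA2\x00\x60", 0x900)
diff = first_difference(b"\xA2\x00\x60", b"\xA2\x01\x60", 0x900)
print(same, diff)
```

A result of None means the recreation is byte-for-byte identical; anything else is the address to go and look at in the listing.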

There's still more work to do, changing runs of EQUBs of printable characters to single EQUS statements and allowing dot-labels in not-code sections. I was just desperate enough to be delighted when it worked at all.
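
For what the EQUS change might look like, here is a rough Python sketch. It is purely illustrative and says nothing about how nard will eventually do it; equ_lines and min_run are made-up names. It folds each run of printable ASCII bytes into one EQUS statement and leaves everything else as individual EQUBs. The double-quote character is treated as unprintable so that no escaping is needed inside the string.

```python
def equ_lines(data, min_run=4):
    """Turn bytes into EQUS lines for printable runs and EQUB lines otherwise."""
    lines = []
    i = 0
    while i < len(data):
        # Find how far the printable run starting at i extends.
        j = i
        while j < len(data) and 0x20 <= data[j] <= 0x7E and data[j] != 0x22:
            j += 1
        if j - i >= min_run:
            lines.append('EQUS "%s"' % data[i:j].decode("ascii"))
            i = j
        else:
            # Too short to be worth a string (or not printable): one byte.
            lines.append("EQUB &%02X" % data[i])
            i += 1
    return lines


# The 16 gap bytes of HELLO, from &090E to &091D.
text = bytes.fromhex("48656C6C6F2C20776F726C64210D0A00")
for line in equ_lines(text):
    print("    " + line)
```

For the gap in HELLO this gives one EQUS "Hello, world!" followed by three EQUBs for the carriage return, line feed and terminating zero.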

Statistics: Posted by julie_m — Wed Oct 01, 2025 11:38 pm


