It could do with someone digging into the DNFS source and working out what is done to Rx packets across the tube (though OSGBPB and LOAD should be equivalent from this point of view, both using the block-transfer, two-bytes-at-a-time mode across the tube). Everything filesystem-wise apart from LOAD/SAVE and OSGBPB only receives into host memory.
Logically, you'd expect the tube case to be faster than the host memory case - doing an absolute store into the fixed-address tube register vs having to do an indexed store and increment the address for host memory (even if the latter is done with self-modifying code, you still need the increment). However, it was well known back in the day that this wasn't the case - machines with a 2nd processor needed a slower Econet clock.
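For illustration only, here's a minimal sketch (not the actual DNFS code - a real handler also has to check ADLC status and deal with end-of-frame) of the rough shape the two Rx NMI paths take on the host, assuming the usual host-side addresses of &FEA2 for the ADLC Rx data register and &FEE5 for the Tube R3 data register:

    \ Host-memory case: self-modifying absolute store, address bumped per byte
    .rx_nmi_host
        LDA &FEA2        \ next byte from the ADLC Rx FIFO
    .store
        STA &FFFF        \ operand patched beforehand to point at the buffer
        INC store+1      \ bump the low byte of the destination address...
        BNE rx_done
        INC store+2      \ ...and the high byte on a page crossing
    .rx_done
        RTI

    \ Tube case: fixed-address store into R3, no pointer to maintain
    .rx_nmi_tube
        LDA &FEA2        \ next byte from the ADLC Rx FIFO
        STA &FEE5        \ absolute store into the Tube R3 data register
        RTI

On paper the tube path is the shorter one, which is what makes the real-world slowdown with a 2nd processor attached so odd.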
One possibility is that it is deliberately slugged. The bulk transfer channel has no handshake, and the host is required by the spec to limit the rate at which data is written to the tube, as there are only 2 bytes in the FIFO and the 2P side has to have time to take an NMI and empty it before the next pair of bytes arrives. Originally this was no big deal, as the 2P was typically a faster CPU than the host, and if the host was taking an NMI anyway to generate the bytes (e.g. DFS or Econet) then the 2P was likely to keep up. However, with the advent of the 16032 2P with its terrible interrupt latency (hence inspiring the creation of ARM...), the tube spec was updated to require a much larger delay between pairs of bytes. That delay is probably no longer needed on PiTubes (even when emulating a 16032) and was never needed for real 6502s, so it's conceivable that the released DNFS is optimised for a case that no longer makes sense.
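To make the slugging concrete: when the host is the one pushing data at the parasite (a disc LOAD, say), the spec obliges it to leave a gap after each pair of bytes written to R3, along these lines (a sketch only - ptr as a zero-page pointer and the delay count are placeholders, not values from the Tube application note):

    pair_delay = 10      \ placeholder count, not the spec figure
    ptr = &70            \ placeholder zero-page pointer to the source data

    .send_pair
        LDA (ptr),Y      \ first byte of the pair
        STA &FEE5        \ Tube R3 data register
        INY
        LDA (ptr),Y      \ second byte of the pair
        STA &FEE5
        INY
        LDX #pair_delay
    .delay
        DEX              \ burn cycles so the 2P's NMI handler has time
        BNE delay        \ to empty the 2-deep R3 FIFO before the next pair

If DNFS is using the post-16032 value for that gap, it pays the penalty on every pair even when the parasite could drain the FIFO far sooner.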
The Econet code obviously can't do anything about the data rate in the middle of a packet (even if it wanted to slow the data down deliberately, there'd be no means of doing so without a large buffer to store the arriving bytes), but possibly there are some deliberate delays around the start or end of a packet.
Or possibly I'm raising a red herring by talking about 16032s, and it's a straightforward code problem: the tube code takes longer to set up (installing itself as the NMI handler) than the host-memory version, and this work is done after sending the ack frame.
Either way, the likely difference between L3FS and the Pi Bridge will be the speed of line turnaround: L3FS running on a BBC/Master is likely to let out a flag or two of fill before starting to send the packet data, while the Pi Bridge on a good day maybe just has the opening flag and no fill.
I'm sure this behaviour in DNFS is fixable, but it's unclear whether it's a small, medium or large project to do so.
Statistics: Posted by arg — Tue Dec 09, 2025 6:21 pm