Channel: stardot.org.uk

8-bit acorn software: other • A tale of two spigots - more digits of pi, and faster

Where we're headed is lots of digits of pi, computed surprisingly rapidly on a BBC Micro, and printed progressively faster digit by digit as the calculation proceeds. That's a so-called spigot. Run one here - that's from this previous thread.

This is a long story, so make sure you're sitting comfortably, and then we can begin. Much like a spigot, we started slowly and have sped up progressively...

Once upon a time litwr posted "A mathematical demo and the request for help from the hardware owners", and over many updates he refined a pi spigot program, ported it to many architectures, and published big tabulations of results here. It's a lot of good benchmark data. I think the algorithm used is by Rabinowitz and Wagon, from 1995.

Much more recently, early this year, Dave (hoglet) found a video showing someone's relay-powered computer printing off digits of pi one at a time. You can read more about that, and watch the video, here. There was just enough information in the video to figure out how it was being done: a variation of the Bailey–Borwein–Plouffe formula, also from 1995.
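
For reference, the BBP formula in its usual published form is shown below. The relay computer used a variation, which may arrange the terms differently, so take this as the standard statement rather than a transcription from the video.

```latex
\pi = \sum_{k=0}^{\infty} \frac{1}{16^{k}}
      \left( \frac{4}{8k+1} - \frac{2}{8k+4} - \frac{1}{8k+5} - \frac{1}{8k+6} \right)
```

Each successive term carries another factor of 1/16, so each term settles roughly one more hexadecimal digit of the result.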

From the screenshots of the output of unseen Python code, we worked on a succession of Basic programs, refining them and trying to get close enough to bytes and bits that we could write a 6502 version.

Here's the timeline of what happened next:
  • March 19: The Science Elf posts the relay computer video on YouTube
  • March 20: Dave posts on the retrocomputing forum
  • March 21: Ed starts figuring out the algorithm, posts this Basic program which can compute 10 digits max, in about half a second
  • March 22: Musing on how efficient 6502 might be, running a nibble-serial calculation that needs bit-shifting
  • March 24: A glimmer of the idea that 16 is bigger than 10, so each round which calculates a hex digit of result is doing more than just one decimal digit worth
  • March 28: A glimmer of the idea that our bignums (which we don't yet have) can get shorter as the calculation proceeds
  • March 30: The start of an email discussion. Basic now doing division by shift and subtract
  • March 31: Ed reckons 5 bytes of RAM will be needed for each pi digit (turns out it's less)
  • April 1: Dave convinces Ed to start using PROCs so we can see what we need
  • April 3: Ed needs 7 PROCs and a FN. Looks like we need only 3 bignums not 5. Basic only computes 7 good digits and takes 7 seconds to do it - it's not trying to be fast, but to help us understand what we need to do in assembly.
  • April 16: Ed says he's still regarding it as his current project, and Dave has helped to clarify some tactics
  • May 7: Ed back from holiday, has done nothing for weeks. A new faster formula is noted (Bellard's formula)
  • May 11: Ed starts a github repo so we have a record and can collaborate
  • May 16: Ed visits Dave and is cajoled into making some progress. A whiteboard is brought into play.
  • May 21: Dave wonders if Ed is stuck again
  • May 27: Ed realises he is underwater and calls for assistance. Much progress with Dave on zoom over the next 5 hours. First version of Basic which handles more than 4 bytes - can compute 20 digits in 868s (nearly 15 mins) or 8 digits in 84s
  • May 27: Same day, Dave writes a better comparison function for a 55% speedup - more on larger outputs
  • May 28: Next day, a further 61% speedup by tidying code and shrinking the bignums progressively as calculation proceeds (no point producing bits which will never be used.) In fact more than that, for larger outputs - 84% faster at 32 digits
  • May 28: Same day, Dave gets a 21% speedup by accumulating 8 bits of division results and then accumulating a byte into the result. Can do 32 digits in 464s now.
  • May 29: Next day, Dave implements an idea for streamlining the division. We lose one bignum, so only two now, excellent RAM economy, and we get... a 7 times speedup (600% faster).
  • May 30: Next day, we find a few small improvements to the Basic speed, and Dave codes up the first 6502 version, which worked almost first time, and can do 32 digits in 11 seconds - 80 digits is 46s vs 273s in Basic - nearly 6x speedup (about 500% faster). Not as much as hoped for. But this is still mostly Basic, with just the division and accumulation in assembly.
  • May 31: Next day, Dave writes the rest of the code in 6502. Now 302s for 1000 digits - pretty fast, but not nearly as fast as the benchmark spigot, which takes just 157s.
  • June 2: Dave profiles the 6502 code, and implements an idea to use short integers for as long as we can. Reducing our divisor from 4 bytes to 3 results in a 24% speedup.
  • June 3: Dave converts some arithmetic to table lookup for a 10% speedup
  • June 5: Various improvements committed... 1000 digits in 205s
  • June 5: same day, a bright idea to use self-modifying code, as the division takes such a long time it can be patched to use constants instead of shifting the same multibyte values for each bit of the bignum
  • June 7: Dave implements the bright idea, for a whopping 42% speedup. 1000 digits in 145s, 3000 digits in 1251s (all these algorithms are quadratic in the number of digits)
  • June 10: Dave starts looking at the faster Bellard series - it has much the same structure as BBP so we reuse all our best efforts. Dave codes a Basic version and Ed times it as about 44-50% faster than the BBP Basic, which is about as expected, and very hopeful.
  • June 11 (today) Dave has coded up the Bellard version in 6502 assembly, it runs 37% faster than the BBP version!
Phew, what a journey. And probably more to come.
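
To make the moving parts in the timeline concrete, here is a minimal Python sketch. It is not the Basic or 6502 code from the repo: the function names, the 32 guard bits and the fixed-point sizing are all assumptions made for illustration. It sums the BBP and Bellard series in fixed point, does every division by shift and subtract, and lets the numerator get shorter with each term. Unlike a true spigot it computes everything first and only then converts to decimal.

```python
GUARD_BITS = 32  # assumed safety margin to absorb truncation error


def shift_subtract_div(numerator, divisor):
    """Floor division of non-negative integers, one bit at a time."""
    quotient = 0
    remainder = 0
    for i in range(numerator.bit_length() - 1, -1, -1):
        # Bring down the next bit of the numerator.
        remainder = (remainder << 1) | ((numerator >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient


def _to_decimal(total, bits, n_digits):
    """Convert a fixed-point value with `bits` fraction bits to a string."""
    frac = total & ((1 << bits) - 1)
    digits = (frac * 10 ** n_digits) >> bits
    return f"{total >> bits}.{digits:0{n_digits}d}"


def pi_bbp(n_digits):
    """BBP series: 4 divisions per term, 4 bits gained per term."""
    bits = n_digits * 10 // 3 + GUARD_BITS
    total, divisions, k = 0, 0, 0
    x = 1 << bits  # fixed-point 1.0, shrinks by 4 bits each term
    while x:
        total += (shift_subtract_div(4 * x, 8 * k + 1)
                  - shift_subtract_div(2 * x, 8 * k + 4)
                  - shift_subtract_div(x, 8 * k + 5)
                  - shift_subtract_div(x, 8 * k + 6))
        divisions += 4
        k += 1
        x >>= 4
    return _to_decimal(total, bits, n_digits), divisions


def pi_bellard(n_digits):
    """Bellard's series: 7 divisions per term, 10 bits gained per term."""
    bits = n_digits * 10 // 3 + GUARD_BITS
    total, divisions, n = 0, 0, 0
    x = 1 << bits  # fixed-point 1.0, shrinks by 10 bits each term
    while x:
        term = (shift_subtract_div(256 * x, 10 * n + 1)
                + shift_subtract_div(x, 10 * n + 9)
                - shift_subtract_div(32 * x, 4 * n + 1)
                - shift_subtract_div(x, 4 * n + 3)
                - shift_subtract_div(64 * x, 10 * n + 3)
                - shift_subtract_div(4 * x, 10 * n + 5)
                - shift_subtract_div(4 * x, 10 * n + 7))
        total += term if n % 2 == 0 else -term  # alternating signs
        divisions += 7
        n += 1
        x >>= 10
    # The whole sum carries a factor of 1/64.
    return _to_decimal(total >> 6, bits, n_digits), divisions


if __name__ == "__main__":
    print(pi_bbp(20)[0])      # 3.14159265358979323846
    print(pi_bellard(20)[0])  # 3.14159265358979323846
```

Counting divisions gives a feel for why Bellard looked hopeful: 7 divisions buy 10 bits, against 4 divisions for 4 bits with BBP, so about 0.7 divisions per bit instead of 1. If division dominates the run time, that is in the same region as the Basic timing gain noted on June 10.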

Here are the latest measurements: (We're using beebjit in fast accurate mode, autobooting from an ssd built by beebasm. Great tooling!)

Code:

Summary for BBP Pi Spigot
  80      1.21s
 200      6.52s
 400     24.35s
 800     93.51s
 100      1.78s
1000    144.14s
3000   1245.31s

Summary for Bellard Pi Spigot
  80      0.81s
 200      4.29s
 400     16.3s
 800     65.34s
 100      1.18s
1000    101.94s
3000    910.52s
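
As a quick sanity check on these figures, the sketch below takes the timings exactly as listed and works out two things: how much longer a run takes when the digit count doubles (a purely quadratic algorithm would give a factor of 4), and how the Bellard times compare with BBP at each size. The dictionary and function names are just for illustration.

```python
# Timings in seconds, copied from the measurements above.
bbp = {80: 1.21, 100: 1.78, 200: 6.52, 400: 24.35,
       800: 93.51, 1000: 144.14, 3000: 1245.31}
bellard = {80: 0.81, 100: 1.18, 200: 4.29, 400: 16.3,
           800: 65.34, 1000: 101.94, 3000: 910.52}


def doubling_ratios(times):
    """Ratio time(2n) / time(n) for every n where both were measured."""
    return {n: times[2 * n] / times[n] for n in sorted(times) if 2 * n in times}


def speedups(slow, fast):
    """How many times faster `fast` is than `slow` at each digit count."""
    return {n: slow[n] / fast[n] for n in sorted(slow)}


for name, times in (("BBP", bbp), ("Bellard", bellard)):
    for n, ratio in doubling_ratios(times).items():
        print(f"{name:8} {n:>4} -> {2 * n:>4} digits: x{ratio:.2f}")

for n, s in speedups(bbp, bellard).items():
    print(f"{n:>4} digits: Bellard is x{s:.2f} the speed of BBP")
```

The doubling ratios come out close to 4, as expected for quadratic scaling with some fixed overhead at the small sizes, and the Bellard advantage narrows a little as the digit count grows.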
There's a historical angle which might be interesting or amusing, observations about the progress of the computation of pi...

(We won't expect to win any speed records, because spigots are not built for speed. For that we need Machin-type arctan formulas (polynomials), or AGM (needs a bignum square root), or Chudnovsky (complicated polynomial series).)

Here's a trace of record computations:
  • 1954, Nicholson and Jeenel, NORC, 13 minutes, 3093 digits
  • 1957, Felton, Pegasus, 33 hours, 7480 digits (trying for 10021 had an error)
  • 1958, Genuys, IBM 704, 1.7 hours, 10000 digits
  • 1958, Felton, Pegasus, 33 hours, 10021 digits (rerun without error)
  • 1959, Genuys, IBM 704, 4.3 hours, 16167 digits
  • 1961, Shanks and Wrench, IBM 7090, 8.7 hours, 100265 digits
  • 1961, Gerard, IBM 7090, 38 mins, 20000 digits (speed record of sorts)
  • 1966, Guilloud and Filliatre, IBM 7030, 41.92 hours, 250000 digits
It presently takes us 976s (about 16 mins) to do 3100 digits, which is slower than NORC's 13 minutes. But we might be within fighting distance of this 1954 speed record.
We can (at present) do 10240 digits in 10541s which is a bit under 3 hours. Up to 1959 that would be a capacity record, in 1957 a speed record too.
And we can presently do 16200 digits in 26517s, which is about 7.4 hours, and which would give us the capacity record up to 1961.

We'll not get to 100k digits (1961 record) unless we share memory between our two bignums and also use at least 48k. Sideways RAM would do it, or a second processor would give us maybe 56k.
If we could, we'd have the capacity record up to 1966.
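
A rough way to see where the memory limit bites is sketched below. It assumes each bignum is a binary fraction holding just enough bits for the requested decimal digits, and it ignores guard bytes, code and workspace, so the real program's needs will differ. The function name is hypothetical.

```python
import math


def bignum_bytes(decimal_digits):
    """Bytes for a binary fraction carrying this many decimal digits.

    Assumed sizing: log2(10) bits per decimal digit, no guard bytes.
    """
    return math.ceil(decimal_digits * math.log2(10) / 8)


for digits in (3100, 10240, 16200, 100265):
    one = bignum_bytes(digits)
    print(f"{digits:>6} digits: {one:>6} bytes per bignum, "
          f"{2 * one:>6} bytes for two separate bignums")
```

On that assumption, one bignum for the 1961 record is a little over 40k, so two separate bignums cannot fit in 48k but a single shared region could.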

Statistics: Posted by BigEd — Tue Jun 11, 2024 11:16 pm


