Make Spreadsheets Safe!: 2014

Sunday, 26 January 2014

How to Code Recursion in Excel

In my posting "Earth falls toward a black hole and everyone dies", I said that the science-fiction generator it describes works by traversing a story flowchart, and that the natural way to program this is by recursion. I'm now going to explain. How did I code a recursive Excel spreadsheet?

Here's an example I've written to demonstrate. The spreadsheet is here. It has three columns. The first two, headed input and output, are one row each. Type a number into input, and its factorial will appear in output.

And here is the Excelsior program I generated this spreadsheet from:

  table input : -> general.

  table output : -> general.

  output[ ] = fac[ input[] ].


  type fac_range = 0:100.

  table fac : fac_range -> general.

  fac[ 0 ] = 1.

  fac[ n>0 ] = n * fac[ n-1 ].


  layout( 'Sheet 1'
        , rows( heading
              , row( input as null
                   , output as null
                   , fac as y
                   )  
              )
        ).

The key to understanding this is that Excelsior thinks of a spreadsheet as a collection of tables. The Excelsior keyword for these is table. When Excelsior generates a spreadsheet, it maps each table onto one or more cells, as dictated by the layout statement. In this program, that statement puts fac to the right of output to the right of input, and runs fac down the sheet. Since fac has one dimension — as indicated by the type statement for fac_range, and by the table declaration for fac — that makes it occupy a single column.

The key to understanding the recursion is to think of tables as functions. Indeed, my syntax for table declarations imitates the notation used by mathematicians to specify a function's argument and result types. The two equations for fac then become analogous to the usual recursive definition of factorial.

This, clearly, is recursion. But it's laid out — spread out, you might say — in space rather than in time.

So that generates the factorials. The remaining part of the program is that which gets the right factorial from the table and puts it into the output cell. Namely, the first three statements. The first two declare the input and output cells as tables that have no subscripts, and are therefore one cell each. The third statement uses the input as a subscript to fac, and puts the result into the output.

And that's how to program recursion in Excel.

Kaprekar's Constant in Excel and Excelsior

The integer 6174 is also known as Kaprekar's constant. According to Wikipedia, it is notable for the following property:

Take any four-digit number which contains at least two different digits (so not 1111, 2222, and so on). Leading zeros are allowed.
Arrange the digits in ascending and then in descending order to get two four-digit numbers, adding leading zeros if necessary.
Subtract the smaller number from the bigger number.
Go back to step 2.

The above process, known as Kaprekar's routine, will always reach its fixed point, 6174, in at most 7 iterations.

I'm going to use it as an example. In my last posting, about literate programming and my science-fiction generator spreadsheet, I explained that I made the spreadsheet by writing a program in a language I call Excelsior, and compiling this to Excel. I want now to demonstrate Excelsior, by using it to write a Kaprekar-constant spreadsheet.

I was inspired by Excel developer Jeff Lutes, who posted to the Microsoft Excel Developers List in July 2012, asking for an elegant way to calculate Kaprekar's constant in Excel, preferably without using Visual Basic. I tried this in Excelsior, and generated this spreadsheet. Here below is the Excelsior program, in the HTML documentation format output by Literate Excelsior. Program text is on a grey background: the rest is commentary. To use the spreadsheet, type a four-digit number into cell A2. You can also see the program here, which may be better if the blog styling interferes with this posting.

Program structure

I'm laying the spreadsheet out in columns. Each row corresponds to one iteration, and I shall define a constant called max_iterations to give the depth of each column:

type column = 1:max_iterations.

constant max_iterations = 20.

I hope this will be big enough.

Briefly stated, the columns hold the following tables:

the original number;
its digits, calculated by converting it to text and extracting a substring;
the digits in ascending order, calculated by using the function SMALL, which returns the k-th smallest value in an array or range;
the number in ascending order, calculated by concatenating these;
the number in descending order;
their difference. I feed this back into the first column.

The tables

number

number[1] holds the original number. number[n], where n>1, contains the absolute value of difference calculated in the n-1'th iteration. I made it absolute because I hit problems with the minus sign otherwise.

table number : column -> general.

number[ 1 ] = 4973.
number[ n>1 ] = abs( difference[ n-1 ] ).

number_as_text

number_as_text[n] is number[n] converted to text.

table number_as_text : column -> text.

number_as_text[ n ] = text( number[n], "0000" ).

digits

digits is a four-column table. The d'th column holds the d'th digit of number.

type digit_range = 1:4.

table digits : digit_range column -> general.

digits[ d, n ] = value( mid( number_as_text[n], d, 1 ) ).

digits_ascending

digits_ascending is also a four-column table. The d'th column holds the d smallest digit.

table digits_ascending : digit_range column -> general.

digits_ascending[ d, n ] = small( digits[all,n], d ).

number_ascending

number_ascending[n] concatenates the digits from digits_ascending[n], giving us them as a single number.

table number_ascending : column -> general.

number_ascending[ n ] =
  value( digits_ascending[ 1, n ] &
         digits_ascending[ 2, n ] &
         digits_ascending[ 3, n ] &
         digits_ascending[ 4, n ]
       ).

number_descending

Similarly, number_descending[n] is a single number with the digits in descending order.

table number_descending : column -> general.

number_descending[ n ] =
  value( digits_ascending[ 4, n ] &
         digits_ascending[ 3, n ] &
         digits_ascending[ 2, n ] &
         digits_ascending[ 1, n ]
       ).

difference

difference[n] is the difference between number_ascending[n] and number_descending[n].

table difference : column -> general.

difference[ n ] = number_ascending[ n ] - number_descending[ n ].

layout

This arranges the tables from left to right as named below, and puts an automatically-generated heading above each giving its name.

layout( 'Sheet 1'
      , rows( heading
            , row( number as y
                 , number_as_text as y
                 , digits as xy
                 , digits_ascending as xy
                 , number_ascending as y
                 , number_descending as y
                 , difference as y
                 )
            )
      ).

And that is how to calculate 6174 in Excel. It's the most useless integer in the mathematical universe, but at least the calculations are easy to read.

Friday, 24 January 2014

"Earth falls toward a black hole and everyone dies"

In my post "The Programmer as Essayist", I quoted the famous computer scientist Donald Knuth on literate programming. I now want to give an example from my own work. It's a spreadsheet that generates science-fiction plots. Try it! Clicking on that link will open Excel if you're in Windows, and you can then run the spreadsheet immediately or save it for later. To run it, scroll to the top. The pale yellow column headed Story contains the output. To change it, click on cell B4, below the label saying Recalculate. The cell will then display as a menu containing one option, also called Recalculate. Select this option (even though it's already selected). That will activate a chain of random-number generators, and you'll get a new story in column A.

I wrote this spreadsheet using literate programming. This is where it came from.

In the anthology The Year's Best Science Fiction No. 5 (edited by Harry Harrison and Brian Aldiss, Sphere, 1972), there are stories. That's usual for anthologies. There is also a story generator. That's less usual. The story generator is called "The Science Fiction Horror Movie Pocket Computer" and is written by Gahan Wilson.

Or perhaps I should say drawn, because the SFHMPC is a flowchart. It consists of boxes containing phrases such as "Earth", "is struck by a giant comet and", "destroyed", "is attacked by", "bug(s)", "which (who)", "look upon us only as a source of nourishment", "and are", "radioactive", "and", "can be killed by", "the atomic bomb (The End)". The boxes are connected by lines. Start at the top, follow it round, choose an exit at each box, and you end up with a cheesy SF plot that Hollywood would be proud of.

Wilson's SFHMPC went through several incarnations in my hands before I got it working in Excel. The first was in felt-pen and card. I used to teach Artificial Intelligence at Oxford, and I also used to publicise AI at Freshers' Fair. There's not actually much AI in the SFHMPC, but it's fun and it demonstrates that computers can do interesting things with data other than numbers. So I drew a huge copy of it on a monster piece of card bought from the Broad Canvas art shop, and used it as one of my Freshers' Fair props, together with such tantalisers as the Dennett and Hofstadter "A Conversation with Einstein's Brain", a wodge of output from Eliza, and a photocopy of John Varley's short story "Overdrawn at the Memory Bank".

A bit later, I implemented the SFHMPC in Prolog. That's because Prolog was my main language for teaching AI, and I wanted an amusing demo to show how Prolog can represent networks.

Then I put that up on my Web site, as a PHP script running SWI-Prolog. Go here to try it, to see the Prolog source, or to read about how to call Prolog from PHP.

Then even later, as part of my spreadsheet research, I implemented the SFHMPC in Excel. Now, this is an unusual spreadsheet. That's because the natural way of generating a story from the flowchart would be by recursion. Like this:

To generate a story, go to the box at the top of the flowchart, then generate the rest of the story.

To generate the rest of the story, output the contents of the box you're at, then choose one of its exits, go to the corresponding box, and generate the rest of the story. Stop if there are no exits.

That, indeed, is how the Prolog version works.But you can't do recursion in Excel, unless you descend into Visual Basic, so how did I make the spreadsheet version work? That's a topic for another posting. What I want to say in this one is that I wrote it in Excelsior, a language I've implemented that compiles into Excel.

And you can write Excelsior programs in two modes. One is similar to the way you'd write in a language such as Pascal or Java. Everything is assumed to be source code unless you precede it by comment delimiters, in which case it's comment. When you feed this into Excelsior, it generates a spreadsheet.

The other mode is for literate programming. Everything is assumed to be comment unless you indent it, in which case it's code. When you ask Excelsior to compile this, it generates two pieces of output. One is a spreadsheet. The other, from Excelsior's documentation engine, is a nicely formatted Web page. Here's the documentation engine's output for the SFHMPC.

In this Web page, commentary is left alone. Code is rendered in a monospace font, on a lightly coloured background to make it stand out. I wrote the code and commentary for the SFHMPC as a mathematical essay, thinking of the code as inserts in the same way that equations would be in a normal maths essay. I introduced the data structures in an order which, I hope, makes it easy to understand how the code works. I included some simple examples. And that is literate programming.

.

Saturday, 4 January 2014

The Programmer as Essayist

I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. Hence, my title: "Literate Programming."

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition and excellence of style. Such an author, with thesaurus in hand, chooses the names of variables carefully and explains what each variable means. He or she strives for a program that is comprehensible because its concepts have been introduced in an order that is best for human understanding, using a mixture of formal and informal methods that reinforce each other.

— Donald Knuth. "Literate Programming (1984)" in Literate Programming. CSLI, 1992, pg. 99. This quote is at http://www.literateprogramming.com/ .

Make Spreadsheets Safe!

The blog for an Indiegogo campaign to fund a project on spreadsheet safety. I want to replicate a famous economics spreadsheet, the "Reinhart-Rogoff spreadsheet", using tools I've developed for writing modular, literate, spreadsheets. This will deliver (a) a modular economic model and (b) a demonstration of how to write spreadsheets that are modular, easy to read, and thereby safer than spreadsheets built using only Excel.

Sponsors so far

Ross Gayler.
Owain Shave.
Luís Carvalho
Lukas Graf