Arrays, Part III: Copy-on-write nuts and bolts

In Part I, we discussed the problem with using arrays in the functional paradigm. In Part II, we proposed using copy-on-write behavior as a possible solution. In this part, we will discuss the details of implementing copy-on-write, particularly in a purely-functional, lazy language.

The first thing we need to establish is the rules for how the copy-on-write (COW) flag should be set. We choose to use a static analysis to decide where to insert instructions to enable COW. This isolates the COW logic in code that knows it’s working with arrays, which lets array code compiled by our compiler be linked with, and work alongside, non-array code compiled without any knowledge of arrays. It also means we can reuse the static analysis to report errors to the user when code they expect to be O(1) or O(n) is compiled to be O(n) or O(n^2).

The COW flag prevents an array from being modified in place; so the list of places where we enable COW here is the photographic negative of the cases where in-place modification is allowed.

We enable COW on any array constant / literal which occurs at the top level of a program (statically allocated in the data segment), since we can’t see if it will be used in multiple places or not. On the other hand, we leave COW disabled on arrays allocated and returned from array functions at run time, until the caller decides to enable it.

Obviously, any function λ v. ... v ... v ... which uses an (unlifted) array parameter in multiple places should enable COW on the array before any of those uses. In addition, any function like λ v. ... for f = λ x. ... v ... . ... which allocates a closure over an array parameter should enable COW, since our analysis doesn’t track whether the closure will be called multiple times. Similar considerations apply to binding constructs like for v = ... . ... v ... v ... and for ⌊v⌋ ∝ ... . ... v ... v ..., where an array returned from a function we call is used multiple times (or used in a closure).

Note that, if we have a branching construct:

analyze e.
    case c0 x0. ...,
    case c1 x1. ...,
    case c2 x2. ...,

then only multiple uses within the same branch, or one use outside the case branches combined with one use inside, count as ‘multiple uses’.
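
To make the counting rule concrete, here is a minimal sketch of the analysis in Haskell, over a toy expression type (the type and all the names are hypothetical; Global Script Core’s real IR is richer):

import qualified Data.Map.Strict as M

-- A toy stand-in for the compiler's expression IR.
data Expr
  = Var String
  | App Expr Expr
  | Lam String Expr
  | Case Expr [(String, Expr)]   -- scrutinee, then (pattern variable, branch) pairs

-- For each free variable, the largest number of uses along any one
-- execution path: sibling case branches combine with 'max', everything
-- else combines with '+'.  A use under a lambda counts as 'many' (2),
-- since we don't track how often the closure is called.
uses :: Expr -> M.Map String Int
uses (Var v)     = M.singleton v 1
uses (App f x)   = M.unionWith (+) (uses f) (uses x)
uses (Lam v b)   = M.map (max 2) (M.delete v (uses b))
uses (Case s bs) =
  M.unionWith (+) (uses s)
                  (foldr (M.unionWith max) M.empty
                         [ M.delete v (uses b) | (v, b) <- bs ])

-- COW must be enabled on a parameter iff it may be used more than once.
needsCow :: String -> Expr -> Bool
needsCow v body = M.findWithDefault 0 v (uses body) > 1

So needsCow "v" (App (Var "v") (Var "v")) is True, while a single use of v in each of two sibling case branches leaves COW disabled.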

Before we place an array in a record or in a constructor, we need to enable COW on it, since our analysis doesn’t consider those data types. Similarly, we need to enable COW on any array variable which will be passed to a polymorphic function. If we call a polymorphic higher-order function, and pass it a function that returns an array, e.g. map f xn, where f returns an array, we need to wrap f in logic that enables COW on the result before returning it to map.

While we never enable COW on an array being returned (directly) from a function, we do need to wrap certain expressions in an extra evaluation context that will enable COW on the result before passing it further up the stack. We need to do this for any array expression being passed to a polymorphic function, and for the right-hand side of any array thunk which is used multiple times, stored in a data structure, or passed as an argument to a function (since I don’t know how to have a function enable COW on a lifted array parameter).

None of the cases where we add this wrapper is a tail context, so they don’t interfere with tail recursion. In general, we enable COW on function parameters and on function arguments, but not on function results.

This strategy for copy-on-write is based on a static analysis run just before code generation. It’s possible to run the analysis throughout program optimization, and constrain optimizations to only increase, never decrease, the cases where COW is disabled; but, ultimately, the final decision whether an array will be built in place or not is up to the code generator. In Part IV, I will consider what options there are for programmers to get automated errors when unwanted array copies are inserted into their programs.

Arrays, Part II: Possible solution: copy-on-write

Part I is here.

In part I, we established that we want to be able to use arrays in the functional paradigm, and we want doing so to be efficient; but we ran into a problem, because efficient array algorithms require in-place modification. That conflicts with the functional style, and particularly with functional purity.

In this part, I want to start laying out what I think is the solution.

The problem with in-place modification of arrays is one of aliasing. The functional paradigm is based around persistent data structures (https://en.wikipedia.org/wiki/Persistent_data_structure), which are semantically immutable values. Because they are immutable, the distinction between call-by-reference (fast) and call-by-value (easy to reason about) collapses. Persistent data structures can be passed around as pointers, but the semantics is the same as if they were copied. Unfortunately for us, that means you get pointers to the same array scattered around program memory. Modifying the array in place through one of them changes the value seen by every other pointer, which breaks the value semantics.

The traditional solution to a problem like this in imperative languages is copy-on-write. You add a copy-on-write bit to the array header, and turn it on when you copy a pointer to the array. Then, before modifying the array in place, you check the copy-on-write bit and, if it’s set, you copy the array to new memory and modify that instead.
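
To make that concrete, here is a minimal sketch of the check-the-bit-then-write discipline in Haskell, with mutable vectors standing in for the raw heap object (all the names here are hypothetical, not Global Script’s runtime API):

import Data.IORef
import qualified Data.Vector.Mutable as VM

-- A stand-in for the array header: the COW bit, the element count, and
-- the storage, which may include spare capacity beyond 'used'.
data Array a = Array
  { cowFlag :: IORef Bool
  , used    :: IORef Int
  , buffer  :: VM.IOVector a
  }

-- Copy into fresh storage, leaving room to grow; the copy starts with
-- its COW bit turned off.
copyWithCapacity :: Array a -> IO (Array a)
copyWithCapacity arr = do
  n    <- readIORef (used arr)
  buf' <- VM.new (max 1 (2 * n))
  VM.copy (VM.slice 0 n buf') (VM.slice 0 n (buffer arr))
  Array <$> newIORef False <*> newIORef n <*> pure buf'

-- Update index i in place, unless the COW bit forces a copy first.
set :: Array a -> Int -> a -> IO (Array a)
set arr i x = do
  cow  <- readIORef (cowFlag arr)
  arr' <- if cow then copyWithCapacity arr else pure arr
  VM.write (buffer arr') i x
  pure arr'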

This is particularly natural in functional languages like Global Script, because an expression like xn @@ x is specified as returning a new array value. There’s no expectation that it returns the same array pointer that was passed in, but, as long as the xn argument was the only pointer to the array, there’s no violation of the semantics if it does, since there aren’t any pointers left to the old array value. Copy-on-write is an optimization, reusing the storage for xn when there are no other pointers to it.

This works nicely with another array optimization, related to appending. “Traditional” arrays, as implemented in languages like Fortran and C, have a fixed size; you have to allocate a new array yourself if you need something longer than the array you have. There’s a simple optimization for growing arrays in place, which gives us amortized O(1) append behavior, at the cost of occasionally moving the array in memory. (The copies associated with these moves are amortized O(1) because we space them out – e.g. by doubling the capacity at each move – so that each item added pays for an O(1) portion of a future copy operation.) For mutable arrays – where identity is important, and you can have multiple pointers to the same mutable array – this requires a memory layout like this:
[diagram: dynamic-array]
where every pointer to the same mutable array points at the same header, and moving the data in memory changes the pointer to the data inside the header.

By contrast, with persistent but copy-on-write arrays, we can use this memory layout:
[diagram: copy-on-write-array]
where the data is always stored immediately after the header. If we want, we can still leave extra blank space at the end of the data for the array to grow into, but we can handle that space being missing or exhausted, too. The @@ function can modify the array in place provided two conditions are met: the copy-on-write bit is turned off, and the array has extra capacity to grow into. In that case, it uses the first slot in the extra capacity to store the new element, and returns a pointer to the same array that was passed in. If the array is marked as copy-on-write, or it needs to be re-allocated before it can grow, we just allocate a new array (ensuring the new array has extra capacity for the next call to @@), copy the old array into it along with the new element(s), and return a pointer to it. So we get the ability to grow arrays for free, using the same code path as copy-on-write.
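
Continuing the sketch from above (same hypothetical representation), the append path looks like this; note that the copy-on-write check and the capacity check share a single slow path:

-- Append: in place when the COW bit is off and there is spare capacity;
-- otherwise copy-and-grow – exactly the copy-on-write slow path.
append :: Array a -> a -> IO (Array a)
append arr x = do
  cow <- readIORef (cowFlag arr)
  n   <- readIORef (used arr)
  if not cow && n < VM.length (buffer arr)
    then do                           -- fast path: use the first spare slot
      VM.write (buffer arr) n x
      writeIORef (used arr) (n + 1)
      pure arr
    else do                           -- slow path: reallocate with room to grow
      arr' <- copyWithCapacity arr
      VM.write (buffer arr') n x
      writeIORef (used arr') (n + 1)
      pure arr'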

(There’s no point in having extra capacity in an array that has the copy-on-write bit turned on, so the garbage collector can remove extra capacity from such an array, or add extra capacity to any array it notices is only the target of one pointer.)

(This ‘nicer’ representation isn’t 100% compatible with the slice optimization proposed in Part I for view (@), but maybe we can pick a representation at compile-time – or even dynamically – depending on how the array variable is used.)

Note that the array returned by @@ never has the copy-on-write bit turned on. The only ‘array constructor’ function that ever returns an array with that bit turned on is nil, and copying that array takes O(1) time anyway. So we can guarantee that a loop that just calls @@ repeatedly on ‘the same’ array will copy the input array at most once.

In Part III, we will consider when to actually turn on the copy-on-write bit, and propose a flag to allow programmers to get compile-time errors when the copy-on-write strategy will fail for their code.

Arrays, Part I: the problem

The way to write map on arrays in the functional style is

'map :: ∀ 'α. ∀ 'β. (α → β) → array.t α → array.t β;
'map 'f 'a = w nil a, where:
    'w 'b nil :- b;
    'w 'b ('x@'a1) :- w (b @@ f x) a1;
;

While this is natural, it’s unfortunately O(n^2): the @ view and @@ function in the last line are both O(n) in the array they’re applied to, because they need to make (nearly-) complete copies of the input.
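
For comparison, the same quadratic pattern is easy to reproduce with an off-the-shelf immutable vector type; a sketch using Haskell’s Data.Vector (there, tail is already O(1), because Vector uses a slice representation like the one proposed below, but snoc still copies):

import qualified Data.Vector as V

mapArr :: (a -> b) -> V.Vector a -> V.Vector b
mapArr f = go V.empty
  where
    go b v
      | V.null v  = b
        -- V.snoc copies all of b: O(n) per element, O(n^2) overall
      | otherwise = go (V.snoc b (f (V.head v))) (V.tail v)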

One response functional programmers take to that is to give up on arrays and use linked lists instead. Linked lists are easy to build as algebraic data types and fit quite nicely with the functional paradigm:

'map :: ∀ 'α. ∀ 'β. (α → β) → list.t α → list.t β;
'map 'f nil :- nil;
'map 'f ('x@'xn) :- f x @ map f xn;

On the other hand, while lazy linked lists are quite nice, arrays are (arguably) a better translation of the concept of the free monoid into functional programming. An array.t α is either an element of the free monoid over α or non-terminating – paralleling the behavior of record types, function types, and sum types – whereas a list.t α could be a finite sequence of αs, an infinite sequence, or a finite sequence of αs followed by undefined.

Furthermore, linked lists use substantially more memory than arrays do (not asymptotically, but a linked list will consume something like 3n pointer words for n elements, vs. something like n + 4 pointer words for an n-element array) and have (potentially) worse cache behavior – something that is becoming more important over time, not less.

This leads some functional programmers to use arrays in spite of their inefficiency. The most popular technique (AFAIK) is to use mutable arrays within the ST monad, which means abandoning the functional paradigm entirely in favor of the imperative one for array manipulations. Fortunately, that can be cleverly hidden within array combinators, but, for Global Script, it would be nice if there were a better approach: some way to do array programming within the functional paradigm.

For view (@), the O(n) problem is easy to fix: replace arrays with slices. Essentially, rather than storing just a pointer to a region of memory, store the pointer plus an offset (and length). Then view (@) can return a slice whose start offset points one element further into the array than the input’s does.
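
A minimal sketch of the slice representation (hypothetical field names), with an O(1) view:

import qualified Data.Vector as V

-- A slice: shared storage plus a window into it.  The storage is never
-- written through a slice, so sharing it is safe.
data Slice a = Slice
  { storage :: V.Vector a
  , offset  :: !Int
  , count   :: !Int
  }

-- view (@): O(1); just move the window one element to the right.
viewTail :: Slice a -> Maybe (a, Slice a)
viewTail (Slice s o n)
  | n == 0    = Nothing
  | otherwise = Just (s V.! o, Slice s (o + 1) (n - 1))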

The @@ function is a harder nut to crack. The only way to implement it in O(1) is to modify the input array in place, but this is obviously impossible in a pure-functional language.

That leaves arrays in an awkward place in functional programming languages: basically a foreign data structure, used for efficiency but not really belonging.

Part II, on copy-on-write, is here.

Values, Computations, and Referential Transparency

Riffing off of Programming Paradigms and the Procedural Paradox, by Eric Normand (which you should definitely read, if you haven’t).

I think the most important thing to add is that, in (lazy) purely functional languages (/ styles), functions are completely orthogonal to data, computations, and effects (/ actions).  Data, computations, and actions are types of expressions:

  • 2 is a datum
  • 2 + 2 is a computation
  • print 2 is an action

and then you can classify functions according to what their function body is:

  • λ 'x. [x] is a (parameterized) datum
  • λ 'x. x + x is a (parameterized) computation
  • λ 'x. print x is a (parameterized) action

Now, in a pure-functional language, data and computations have the same type, while data/computations and actions have different types.  So the type system doesn’t reflect the datum / computation / action distinction as well as we’d like (despite that being a really useful distinction).
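
Haskell makes the same conflation, so the situation is easy to exhibit there (a sketch):

two :: Int              -- a datum
two = 2

sumTwo :: Int           -- a computation: the same type as the datum
sumTwo = 2 + 2

printTwo :: IO ()       -- an action: a different type
printTwo = print 2

box :: a -> [a]         -- a parameterized datum
box x = [x]

double :: Int -> Int    -- a parameterized computation
double x = x + x

say :: Int -> IO ()     -- a parameterized action
say = print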

Global Script Core maintains a lifted / unlifted distinction. This distinguishes proper values / data from the value-or-non-termination things that normal lazy languages, like source-level Global Script, use. It lets the optimizer keep track of expressions that are guaranteed to terminate and variables that are always bound to (run-time) values.

The distinction between data and expressions is not the same as the lifted / unlifted distinction! 2 + 2 is a perfectly sensible unlifted expression (it’s guaranteed to terminate), but it remains a computation; while something like for rec 'xn ∝ 3 @ xn. xn is (probably) a value, even though it’s only allowed due to lifting. I don’t think that a recursive definition makes something a computation, unless the right-hand-side is already a computation.

The question is, should values and computations have different types? An expression and its value are clearly different things, or the concept of expression evaluation wouldn’t make sense. On the other hand, when we write ‘2 + 2’ in a Global Script program, we intend it to be replaced with its denotation – which is a value. This is the original meaning of referential transparency, and it’s something Global Script shares with mathematical and (mostly) non-mathematical English (unlike side-effects, which are purely a programming-language invention).

So conflating values and computations in the type system has a respectable pedigree. In natural-language logic, when a statement needs to be referred to directly (as opposed to its meaning or truth-value), it’s put in quotes: “The statement that ‘Elizabeth II is the Queen of the United Kingdom’ is true now, but was false 100 years ago” [giving two examples]. In programming language theory, brackets are used instead: “2 + 2” is 4, but “[| 2 + 2 |]” is an expression. Global Script can do the same thing using the gs{} QLO, which quotes an expression (and explicitly captures variables from its enclosing scope). So λ 'x. 2 + x is a parameterized (computed) number, while λ 'x. gs{2 + x} is a parameterized expression.
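
The same move can be made in any functional language by reifying expressions as an ordinary data type; a toy sketch, with a hypothetical Expr type standing in for whatever gs{} produces:

data Expr = Lit Int | Add Expr Expr
  deriving Show

computedNumber :: Int -> Int    -- like λ 'x. 2 + x: a parameterized number
computedNumber x = 2 + x

quotedExpr :: Expr -> Expr      -- like λ 'x. gs{2 + x}: a parameterized expression
quotedExpr x = Add (Lit 2) x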

I think that’s a reasonably clear way of making the distinction in the language, but the distinction itself is probably worth bringing up with student programmers.

How the Unix User Interface Works, Part I

In the beginning was the typewriter

The traditional Unix console / terminal / command-line user interface ultimately descends from typewriters. I suppose that the younger set (like your author) mostly haven’t seen typewriters before; they look like this:

[image: an Underwood Five typewriter; source: Wikipedia]

and combine the technologies of moveable type, invented by the Chinese around 1040, with the keyboard, invented by the ancient Greeks in the 3rd century BC and developed over the course of the 18th and 19th centuries. Moveable “type” is a mass noun; a “piece of type” is an individual hard artifact with a mirror-image of a letter carved into or onto it, capable of transferring ink to paper. The key idea of the typewriter was to place each piece of type on a lever, and connect the other end to a particular key; pressing the key would move the type forward into a ribbon with ink on it, striking the letter onto the paper behind. Other mechanical actions would (generally) move the paper so the next key press would strike after the letter just typed. Other keys were connected to other actions: the Return key would move the paper up and return to the beginning of the line, the Tab key would move the paper to the next (operator-defined) “tab stop”, and so on. A good video of a mechanical typewriter in action can be seen here on YouTube.

The electric typewriter and the teletype

Traditional typewriters were purely mechanical devices: levers and gears were used to translate the mechanical action of pressing the keys into the motion of the type and paper. Once the typewriter became commercially popular, the idea developed of using the keys to send an electrical signal which would drive the type and move the paper. By this point, vertical motion (e.g., moving to the next line) would be supplied by moving the paper up, but the type itself would be placed on a moveable “carriage” and moved right on a normal keypress or left when the user hit Return.

At the same time, the telegraph had been invented, sending coded signals long distances over wires. It was only natural to connect the two: have a keyboard on one end send a signal for the key that was pressed, and interpret the signal on the other end using pieces of type on levers similar to a typewriter’s. The result was the teletype, which is basically a pair of electric typewriters, but with each keyboard connected to the other typewriter’s type bars over a telephone wire. ASCII was originally designed as a code for transmitting characters between teletypes; several of its character codes are designed to control the teletype printer, and others to format data for transmission.

The printing computer terminal

Once computers were invented, and the idea of using a computer interactively emerged, it was natural to connect a teletype to a computer: pressing a key would send an electrical code to the computer, and the computer could send a matching signal back to the teletype to cause a letter to be printed. This system is partly why the ‘output’ instruction in 1970s languages is called print, and why BASIC had separate PRINT and LPRINT statements, for printing (literally) to the teletype (so to the user) and to a separate line printer, respectively. (Note that a teletype is basically a pair of a keyboard and a line printer anyway.)

There was no echo facility built into the teletype; the computer had to be programmed explicitly to send each character typed back to the terminal so you could see what you were typing. The simplest way to hide passwords and other input you didn’t want saved forever in the printout was to disable this echo in the computer, which is why that’s what Unix traditionally does.
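
That convention survives in the termios interface; a minimal sketch of turning echo off and back on, using the Haskell unix package (a C program would do the same with tcgetattr and tcsetattr):

import System.Posix.IO (stdInput)
import System.Posix.Terminal

-- Run an action with terminal echo disabled, then restore the old modes.
withoutEcho :: IO a -> IO a
withoutEcho act = do
  saved <- getTerminalAttributes stdInput
  setTerminalAttributes stdInput (withoutMode saved EnableEcho) Immediately
  r <- act
  setTerminalAttributes stdInput saved Immediately
  pure r

(A production version would use bracket so the modes are restored even if the action throws an exception.)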

Enter Unix

The Unix system (sometimes Uɴɪx, never UNIX, except in the trademark) was designed under these conditions. Unix software is designed to process simple text files, with records delimited by the ASCII linefeed or ‘newline’ character: decimal 10, hexadecimal 0x0a, sometimes referred to as ‘control-J’, and designated as \n in C. The kernel (or, rather, the teletype device driver, which is major device number 4 on Linux) was responsible for talking to the teletype, and translating between its conventions and this internal format. It would buffer output as needed, or stop programs that would otherwise overflow the buffer, to slow down output to what the teletype could handle; translate newline characters in program output to whatever the terminal needed to tell it to print a new line (frequently an ASCII carriage return (decimal 13, hexadecimal 0x0d, designated as control-M or \r in C) followed by a newline character); and translate whatever the terminal keyboard produced when the operator pressed the Enter or Return key (frequently an ASCII carriage return character) into a newline in the program input.

The Unix hater’s handbook complains that Unix ‘should’ have a program that understands the format of its text files and can display them to the user. But it does! Or did, back when it was first designed. That program was the kernel. Having a separate program for displaying each type of file on your computer is the way of madness; you don’t want to go there. The problem is that, once Unix left Bell Labs and entered the University of California at Berkeley, that’s exactly where it went.

Video display terminals

This is where the madness truly begins. When video display terminals were first hooked up to a Unix system, the right thing would have been to write a new kernel device driver for them. It could have mapped between the VDT and the Unix tradition of plain text files, possibly even providing better full-screen (WYSIWYG) text editing capabilities, while providing an API for programs that wanted to use the full capabilities of the video terminal to do so and abstracting away from differences between video terminals provided by different companies. The actual capabilities it needed to supply were pretty simple, basically just controlling the position of text output on the screen, due to the limitations of terminals at the time.

Instead, the video display terminals were hooked up to the same teletype (glorified electric typewriter) device driver Unix had been using from the beginning. There is no reason or excuse for this decision.

The kernel kept handling traditional Unix programs – which just wanted to read in plain text files and write out plain text files – in the traditional way. Of course, as terminal technology improved, the kernel’s input editing capabilities stayed the same, which wasn’t good, but things functioned.

Programs that wanted full-screen display capabilities used an ioctl to turn the kernel’s device driver off entirely, so they could talk to the terminal directly. This is where the Unix hater’s handbook starts making more sense, because this was a terrible idea. It’s not portable! Despite the existence of an actual ANSI standard, terminal makers didn’t map the same control sequences to exactly the same behavior, or produce exactly the same byte sequences for the new control keys (like arrow keys).
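
Nowadays the same switch is flipped through the termios interface rather than a bare ioctl; a sketch of entering something like raw mode with the Haskell unix package (the exact list of modes to clear varies by program):

import System.Posix.IO (stdInput)
import System.Posix.Terminal

-- Turn off the driver's processing: no line editing, no echo, no output
-- translation, no signal keys; bytes pass through untouched.
enterRawMode :: IO TerminalAttributes
enterRawMode = do
  saved <- getTerminalAttributes stdInput
  let raw = foldl withoutMode saved
        [ ProcessInput, ProcessOutput, EnableEcho
        , KeyboardInterrupts, StartStopOutput, MapCRtoLF ]
  setTerminalAttributes stdInput raw Immediately
  pure saved   -- the caller should restore these attributes on exit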

Programs dealt with this in two ways (simplifying slightly):

  • First, they created a file, called termcap, which could be parsed by full-screen programs to learn (partially) how to program the currently-attached terminal.
  • Then, because parsing and using termcap in every full-screen program causes too much code duplication, they wrote a library on top of termcap [of course!], called ncurses, for programming terminals.

ncurses has an API that’s probably too large to put into the kernel, but the main point is that its implementation is coupled to the particular terminal it’s talking to, and it depends on bypassing the kernel to talk to it.

Mi computer es su computer

In the meantime, Unix needed to support remote logins, to run programs, including interactive programs, on other computers – at first using rsh, and later of course ssh. Interactive programs expect to talk to a terminal device driver, or possibly to talk to the terminal directly; so how do you make that work over remote connections? BSD chose to do the only (non-) sensible thing: create a new type of device driver, called a pseudo-teletype (pty), to use on the remote computer; put the terminal device driver on the local machine into raw mode (bypassing it); and let the kernel on the remote machine handle echoing (and editing) input, translating newlines, etc. If the remote program wanted full-screen capabilities, it put the pty device it was talking to into raw mode.
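
The pty mechanism is still how remote-login daemons work today; a sketch of the allocation step, using the Haskell unix package:

import System.Posix.Terminal (getSlaveTerminalName, openPseudoTerminal)

main :: IO ()
main = do
  (master, _slave) <- openPseudoTerminal
  name <- getSlaveTerminalName master
  putStrLn ("slave pty: " ++ name)
  -- A real daemon would now fork, make the slave end the child's
  -- controlling terminal and stdin/stdout/stderr, and shuttle bytes
  -- between the master end and the network socket.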

Gettysburg Address (Free Speech Protest)

Facebook has deemed the US Declaration of Independence “hate speech”.  Global Script takes no political position; but we are unalterably opposed to censorship in all forms. Free speech is a fundamental value of the Internet, and is necessary to all software development whatsoever. The only possible response to censorship of any kind, especially on the Internet, is to spread the censored material as far and wide as possible. Therefore, we will be posting a representative selection of US founding documents on this site, in protest of Facebook’s decision.

Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.

Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.

But, in a larger sense, we can not dedicate — we can not consecrate — we can not hallow — this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us — that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion — that we here highly resolve that these dead shall not have died in vain — that this nation, under God, shall have a new birth of freedom — and that government of the people, by the people, for the people, shall not perish from the earth.