Arrays, Part I: the problem

Array of satellite dishes

The way to write map on arrays in the functional style is

'map :: ∀ 'α. ∀ 'β. (α → β) → array.t α → array.t β;
'map 'f 'a = w nil a, where:
    'w 'b nil :- b;
    'w 'b ('x@'a1) :- w (b @@ f x) a1;

While this is natural, it’s unfortunately O(n^2): the @ view and @@ function in the last line are both O(n) in the array they’re applied to, because they need to make (nearly-) complete copies of the input.

One response functional programmers take to that is to give up on arrays and use linked lists instead. Linked lists are easy to build as algebraic data types and fit quite nicely with the functional paradigm:

'map :: ∀ 'α. ∀ 'β. (α → β) → list.t α → list.t β;
'map 'f nil :- nil;
'map 'f ('x@'xn) :- f x @ map f xn;

On the other hand, while lazy linked lists are quite nice, arrays are (arguably) a better translation of the concept of the free monoid into functional programming. An array.t α is either an element of the free monoid over α or non-terminating, which is similar to record types, function types, and sum types, whereas a list.t α could be a finite sequence of αs, an infinite sequence, or a finite sequence of αs followed by undefined.

Furthermore, linked lists use substantially more memory than arrays do (not asymptotically, but a linked list will consume something like 3n pointer words for n elements, vs. something like n + 4 pointer words for an n-element array) and have (potentially) worse cache behavior – something that is becoming more important over time, not less.

This leads some functional programmers to use arrays in spite of their inefficiency. The most popular technique (AFAIK) is to use mutable arrays within the ST monad, which means abandoning the functional paradigm entirely in favor of the imperative for array manipulations. Fortunately, that can be cleverly hidden within array combinators, but, for Global Script, it would be nice if there was a better approach; some way to do array programming within the functional paradigm.

For view (@), the O(n) problem is easy to fix: replace arrays with slices. Essentially, rather than storing a pointer to a region of memory, store a region and offset. Then view (@) can return a slice with the start offset pointing one element further into the array than in the input.

The @@ function is a harder nut to crack. The only way to implement it in O(1) is to modify the input array in place, but this is obviously impossible in a pure-functional language.

That leaves arrays in an awkward place in functional programming languages: basically a foreign data structure, used for efficiency but not really belonging.

Part II, on copy-on-write, is here.


2 thoughts on “Arrays, Part I: the problem

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s