### Continuing Integer Sequences

Monday, October 27th, 2014 | Author: Konrad Voelkel

A classic class of "math problem" is the continuation of integer sequences from a finite sample (usually at the beginning). For example:

Continue 2,4,6,8,...

To which the solution is often supposed to be 10,12,14, and so on.

**The problem, as any mathematician knows**, is that there is not the one solution. In fact, given any set of finite numbers one could just talk about the sequence that starts like that and continues with zeroes only, that's a perfectly valid sequence. Of course, one should really figure out the rule behind the finitely many numbers, but there are always many possible choices. The game is not to find any sensible rule, but to find the rule the designer had in mind. It's more about testing your knowledge of culture than an honest test of mathematical ability (or anything else).

**But there is a way to fix this**.

If you look at the sequence above, you could continue with 12,14,16,18,22, and so on - the sequence of all even numbers which are not divisible by 10. This is a valid continuation, of course, and it even follows a sensible rule, just like the sequence of all even numbers. But there is a difference: one description is longer.

While you can always take a finite sequence of numbers and any infinite integer sequence, and craft the rule "the first few numbers are given by the finite sequence, then follows this infinite sequence", this will always be longer than the rule for the infinite sequence alone.

The technical concept that captures the length of descriptions appropriately is *Kolmogorov complexity*, relative to a description language. A description language can be formalized as a Turing machine M that takes descriptions D of integer sequences in some syntax (a list of letters and whitespace, for example) together with a natural number N, and spits out the integer $M(D,N) = a_N$ in the sequence $(a_n)_{n \in \mathbb{N}}$ which is described by D.

One may describe the computation of an integer sequence more or less efficiently, so it makes sense to define the (Kolmogorov) complexity of a sequence relative to a description language as the minimum length of a description in that language. Note that length of words is a natural number, hence the infimum length is always attained and the complexity is a natural number, too.

If we take some description language and bake a new one by requiring everything to start with the symbols "###" and then the same stuff as in the old language, then we have always longer descriptions in the new language. That's why we have to take the complexity relative to a description language. There may be other unexpected behaviour: if you have a single symbol just for the sequence of all even numbers, the description gets very short for this particular sequence (not necessarily for the others). One can compare description languages by looking at the length of the programs that compute the sequences from descriptions. Surely, in the program that interprets the single symbol for the sequence of all even numbers, the computation description for that sequence is hidden. Putting such comparison factors into the Kolmogorov complexity gives a more robust measure of the descriptive complexity. We don't need to go into more details here, as you might as well take as description language "Fortran programs" or "Haskell programs" and be happy with it.

**The proposed fix** to the problem of non-unique solutions to sequences consists now of modifying the question in two steps:

1) Given a finite sequence of integers, what is the shortest description of any rule that produces this sequence (and continues infinitely)? In other words: what is a description whose length attains the Kolmogorov complexity?

2) Since there might be several distinct rules with the same description length, the real question is: What is the minimal Kolmogorov complexity of an infinite sequence that starts with the prescribed values?

The reason why this "fixes" the problem, is the following: suppose someone has a rule for an infinite integer sequence and writes down a few of the first numbers. If you can come up with a rule that starts like that, but has a shorter description, you can say "oh, but if you wanted that more complicated sequence, you should have given more numbers to make it clear!". In some sense, this would mean that the problem was underspecified.

At least this is a much better complaint than just saying "I could continue any finite sequence however I like". And it's a very very hard problem, to find the Kolmogorov complexity explicitly. Even lower bounds are usually impossible. The complexity itself is not a computable number, so it is not possible (in general) to write a computer program to continue a finite sequence as I proposed. It's that hard.

A technical note at the end: Ordinarily, Kolmogorov complexity is defined for finite strings, which we apply to some encoding of a Turing machine that computes an infinite sequence. This excludes any non-computable infinite sequence, but that's sensible for the problem at hand.