Binary Tree Arrays

2021-12-20

This year, I have been taking part in Advent of Code for the first time. Day 18’s problem has required the use of binary trees. Since I’m solving these problems in C, binary tree arrays came to mind as the perfect way to build and manage the binary trees I’d be wrangling.

They’re a beautifully simple way of positioning the elements in predictable locations in memory. So I’d like to take a few minutes to share what makes them so great, in my mind.

Think of a binary tree like the following:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10


               0
             /   \
           /       \
         /           \
       1               2
     /   \           /   \
    /     \         /     \
   3       4       5       6
  / \     / \     / \     / \
 7   8   9  10  11  12  13  14

Excuse the poor proportions. If you lay out the nodes of the binary tree into an array, using the indices given by the diagram above, then some neat numerical properties appear:

The children of a node at index i can be found at: i*2+1 and i*2+2.
The parent of a node at index i can be found at: (i-1)/2

With these two properties, you can trivially construct a binary tree within an array. The required size of the array can be inferred from the maximum depth the tree will have. If you consider a depth of 1 to be a tree with 3 elements then the required size of the array will always be 1+(1<<depth).

The size of the array for a binary tree of a given depth is 1+(1<<depth)

Breadth first traversal is as simple as a for loop over the array, while depth first traversal is the same as any other binary tree. In addition, the tree is densely packed and doesn’t waste space on pointers between nodes, making it very cache friendly.

If you want to iterate through all the nodes at a given depth in the array, the range of indices to iterate through is trivial to calculate with these next two properties:

The number of nodes (“width”) at a given depth is 1<<depth
The first index at a given depth is width-1.

1
2
3
4
5


struct node array[32];
size_t width = 1<<depth;
for (size_t i = width-1; i < 2*width-1; ++i) {
  struct node *n = array + i;
}

If you want to know the depth of node, the calculation can be derived by inverting the previous two properties.

The depth of a node is log2(i+1)

If you don’t want to rely on a maths library, this is easily calculable by finding the index of the left-most 1 bit of i+1.

1
2
3
4
5
6
7
8
9


size_t depth(size_t i)
{
  size_t d = 0;
  i += 1;
  while (i >>= 1) {
    d++;
  }
  return d;
}

On x86 there’s the bsr instruction that accomplishes the same thing more efficiently. Though you probably just want to use the __builtin_clz intrinsic supported by both GCC and clang.

And that’s all there really is to this data structure. The beauty of it is that these properties are all easily rediscovered with a little thought, just by considering a few examples. As a result, the only thing you need to know is the first property.

Neat.