Chunking a List into Segments

This article walks through one way to chunk a list into segments of a specified size. There are several ways to approach the problem, but one of the simplest is to use APL's partitioning function (⊂).

The Approach Using Partitioning#

Here’s how partitioning works: the left argument supplies one element for every element of the right argument. Each left element is a count of how many new segments begin at the corresponding element on the right. When the left argument is restricted to Booleans, each value simply says whether to start a new segment (1) or continue the previous one (0).
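As a small illustration of the Boolean case, here is a sketch that assumes Dyalog APL with default settings, where dyadic ⊂ is partitioned enclose. The character vector v and the result name r are hypothetical sample names, not part of the original example.

```apl
⍝ Assumption: Dyalog APL, default settings (dyadic ⊂ is partitioned enclose)
v ← 'abcdefg'                   ⍝ hypothetical sample data
r ← 1 0 0 1 1 0 1 ⊂ v           ⍝ a 1 starts a new segment, a 0 continues the current one
≢r                              ⍝ 4 (four segments)
r ≡ 'abc' (,'d') 'ef' (,'g')    ⍝ 1 (the segments are abc, d, ef and g)
```

Each segment is itself a list, which is why the single-element segments are written as (,'d') and (,'g') in the comparison.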

Test Case Example#

Consider a list of seven elements, the words “The”, “cat”, “in”, “the”, “hat”, “sat” and “pat”:

We start a segment with the first element, “The.”
We continue with “cat” in the same segment.
“in” also belongs to the same segment.
The lowercase “the” gets its own segment, since we immediately start a new segment with “hat.”
We include “sat” in that segment, while “pat” gets its own segment at the end.

This behavior can be expressed by applying the partitioning function to the list, with the Boolean mask 1 0 0 1 1 0 1 as the left argument:

l ← 'The' 'cat' 'in' 'the' 'hat' 'sat' 'pat'
1 0 0 1 1 0 1 ⊂ l

Constructing a Boolean Vector#

The real task is to construct the Boolean vector for a given chunk size n: a 1 followed by n-1 zeros, repeated until it is as long as the list.

To achieve this, we can create a lambda (anonymous function), known in APL as a dfn. Following the specification, it takes the list as its right argument (⍵) and the chunk size as its left argument (⍺).

3 {⍺} l

This simply returns the left argument, the chunk size 3.

Next, we can take the length of the list with the tally function (≢):

3 {≢⍵} l  ⍝ Returns the length of the list, which is 7

To build the mask, we first use take (↑) with the chunk size on the scalar 1. Take pads with zeros, so 3↑1 gives 1 0 0. We then use the reshape function (⍴, the Greek letter rho) to repeat this 1-and-0 pattern cyclically until it is as long as the list.
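The mask construction can be tried step by step outside the function. This is a minimal sketch assuming Dyalog APL, with the chunk size 3 and the list length 7 written as literals:

```apl
3↑1        ⍝ 1 0 0          (take pads the single 1 with zeros up to length 3)
7⍴1 0 0    ⍝ 1 0 0 1 0 0 1  (reshape cycles through the pattern until it has 7 elements)
7⍴3↑1      ⍝ 1 0 0 1 0 0 1  (both steps combined)
```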

Function Execution#

Note that APL functions have long right scope: a function takes everything to its right as its right argument. We must therefore parenthesize the left argument of the reshape function, but not its right argument.
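A sketch of the mask computation alone, assuming Dyalog APL and the seven-word list from the test case, shows where the parentheses go:

```apl
l ← 'The' 'cat' 'in' 'the' 'hat' 'sat' 'pat'
⍝ (≢⍵) is parenthesized so that the tally is evaluated first and becomes the left argument of ⍴.
⍝ ⍺↑1 needs no parentheses: it is everything to the right of ⍴.
3 {(≢⍵)⍴⍺↑1} l    ⍝ 1 0 0 1 0 0 1
```

Without the parentheses, ≢⍵⍴⍺↑1 would be read as ≢(⍵⍴(⍺↑1)), which passes the list itself to reshape as the shape and does not compute the mask.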

Finally, we will take this boolean vector and use it for partitioning:

3 {((≢⍵)⍴⍺↑1)⊂⍵} l

This APL expression partitions the list into chunks based on the boolean vector.
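To check the result for a chunk size of 3, one can compare it against the expected segments written out by hand. This is a sketch assuming Dyalog APL; r is a hypothetical name for the result.

```apl
l ← 'The' 'cat' 'in' 'the' 'hat' 'sat' 'pat'
r ← 3 {((≢⍵)⍴⍺↑1)⊂⍵} l
≢r      ⍝ 3       (three segments)
≢¨r     ⍝ 3 3 1   (two full chunks and one leftover element)
r ≡ ('The' 'cat' 'in') ('the' 'hat' 'sat') (,⊂'pat')    ⍝ 1
```

The last segment is a one-element list holding 'pat', hence the (,⊂'pat') in the comparison.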

Reducing Parentheses#

To reduce the number of parentheses, I prefer to swap the arguments of the partitioning function, which expects the mask on the left and the data on the right.

There’s a higher-order function (an operator, in APL terms) called commute (⍨), which modifies the function to its immediate left so that it takes its arguments in the opposite order. Thus, we can move the data ⍵ to the left of the partitioning function, commute the function (⊂⍨), and leave the mask expression on the right, where it no longer needs outer parentheses.
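The two forms can be compared directly. This is a sketch assuming Dyalog APL; a and b are hypothetical names for the two results.

```apl
l ← 'The' 'cat' 'in' 'the' 'hat' 'sat' 'pat'
a ← 3 {((≢⍵)⍴⍺↑1)⊂⍵} l    ⍝ mask on the left, parenthesized
b ← 3 {⍵⊂⍨(≢⍵)⍴⍺↑1} l     ⍝ commuted: data on the left, mask expression on the right
a ≡ b                      ⍝ 1 (both forms give the same result)
```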

Testing the Functionality#

Let’s run some tests by adjusting the chunk sizes.

When we set the chunk size to 2, we get pairs, and the single leftover element stands alone:

2 {⍵⊂⍨(≢⍵)⍴⍺↑1} l

Increasing the chunk size to 6 puts all but the last element in the same segment:

6 {⍵⊂⍨(≢⍵)⍴⍺↑1} l

If we increase the chunk size to 7, the whole list falls into a single segment, but we still get an additional level of enclosure: the result is a one-element list containing the full list:

7 {⍵⊂⍨(≢⍵)⍴⍺↑1} l

Increasing the chunk size further makes no difference, as the reshape merely truncates the mask to the length of the list.
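Looking at the masks themselves makes these cases easier to follow. The sketch below assumes Dyalog APL; the names mask and chunk are hypothetical and simply give names to the dfns used in this article.

```apl
l ← 'The' 'cat' 'in' 'the' 'hat' 'sat' 'pat'
mask  ← {(≢⍵)⍴⍺↑1}         ⍝ hypothetical name: the Boolean mask only
chunk ← {⍵⊂⍨(≢⍵)⍴⍺↑1}      ⍝ hypothetical name: the full chunking function

 2 mask l    ⍝ 1 0 1 0 1 0 1
 6 mask l    ⍝ 1 0 0 0 0 0 1
 7 mask l    ⍝ 1 0 0 0 0 0 0
10 mask l    ⍝ 1 0 0 0 0 0 0  (the ten-element pattern is cut to seven)

≢¨2 chunk l                 ⍝ 2 2 2 1
≢¨6 chunk l                 ⍝ 6 1
(7 chunk l) ≡ ,⊂l           ⍝ 1 (a one-element list containing the whole list)
(7 chunk l) ≡ 10 chunk l    ⍝ 1 (larger chunk sizes change nothing)
```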

Thank you for reading this article on chunking lists!