Lookup without Replacement

Lookup without Replacement#

Consider two vectors, e.g. L: 'abacba' and R: 'baabaac'. By now, you should know about dyadic ⍳:

L←'abacba'
R←'baabaac'
L⍳R

2 1 1 2 1 1 4

That finds first location in L of each element in R.

However, what if we wanted the first b in R to “consume” the first b in L so that the second b in R would have to contend with the index of the second b in L?

That is, we want some function which gives 2 1 3 5 6 7 4. You could call it “iota without replacement”. As, in a sense, the iota progresses through L as it finds element from R, it is also known in APL circles as “progressive dyadic iota”.

'a1' 'b1' 'a2' 'c1' 'b2' 'a3' ⍳ 'b1' 'a1' 'a2' 'b2' 'a3' 'a4' 'c1'

2 1 3 5 6 7 4

So, because we numbered the as (which otherwise all match each other) and the bs, the right pairs get matched up.

While ⍋ gives us the indices that will sort, ⍋⍋ gives us the positions that each element will occupy in the sorted result.

↑L (L⍳L) (⍋⍋L⍳L)

a b a c b a 1 2 1 4 2 1 1 4 2 6 5 3

The first line is the data and the second is the indices of the first occurrences (i.e. all identical items will get the same index). The third line is the position that each will occupy when sorted. That means that identical elements get consecutive positions.

E.g. you can see that the first b gets 4 (because there are 3 as) and the second gets 5.

We can label both vectors in this way:

(L⍳L)⍳(R⍳R)

1 2 2 1 2 2 7

This almost solves the problem (although it might not look like it), but this labelling depends on both the order and the frequency of the elements in a vector. However, we need both vectors to be labelled in the same way, i.e the first a in each vector will get same label, and the second b in each will also get matching labels.

Since we’re going to look up elements of R in L anyway, we can use indices into L (that is L⍳R) instead of the lookup of R into itself (R⍳R) This ensures that elements of R are labelled with “L’s labelling system”.

↑L (L⍳L) (⍋⍋L⍳L)

a b a c b a 1 2 1 4 2 1 1 4 2 6 5 3

↑R (L⍳R) (⍋⍋L⍳R)

b a a b a a c 2 1 1 2 1 1 4 5 1 2 6 3 4 7

The only problem that remains is that L and R have their elements with different frequencies. The first b of R is labelled 5, whereas the first b of L is labelled 4. This is because L has one less a.

To solve this we need the second argument of each ⍳ to have the same number of each element when labelling both L and R. The easiest way to do this is with L⍪R and R⍪L. ⍪ is used instead of , so that the method can be used for higher rank arrays.

⍋⍋L⍳L,R

1 8 2 12 9 3 10 4 5 11 6 7 13

⍋⍋L⍳R,L

8 1 2 9 3 4 12 5 10 6 13 11 7

So, now we are working in precisely the same labelling scheme for both L and R, we just need use ↑ to remove the unneeded elements.

(≢L)↑⍋⍋L⍳L⍪R

1 8 2 12 9 3

(≢R)↑⍋⍋L⍳R⍪L

8 1 2 9 3 4 12

((≢L)↑⍋⍋L⍳L⍪R)⍳((≢R)↑⍋⍋L⍳R⍪L)

2 1 3 5 6 7 4

Now, we can define this function:

f←{((≢⍺)↑⍋⍋⍺⍳⍺⍪⍵)⍳((≢⍵)↑⍋⍋⍺⍳⍵⍪⍺)}

A use case of this function is in a first-come, first-served queue. For example you might have First Class, Premium, Business, and Economy seats on a plane. This function will allocate seats based on which class was ordered.

Lets say we have a plane: '11bbbpeepee' where 1 is first class, p is premium, b is business, and e is economy. And we have a bunch of customers coming to buy seats: 'bbepbeppe1ee'.

'11bbbpeepee' f 'bbepbeppe1ee'

3 4 7 6 5 8 9 12 10 1 11 12

Person number 1 gets seat 3, person number 3 gets seat 7, and so on. People 8 and 12 however cannot get a seat, because all of the seats of their type have been allocated already. This is indicated by the result 12 which is out of the bounds of plane (length-11).