Wilful Inference

Prerequisites: Knowledge of in/sur/bijections and the definition of a permutation.

Update (23 Sep. 2020): I just delivered some portion of this material to the Berkeley Math Circle's Intermediate II group. The corresponding notes (Part 1 and Part 2) may provide some helpful background. To bridge the gap between terminology used there and here, note that injection means "1-to-1 function," surjection means "onto function," and bijection means "1-to-1 correspondence."

In this post, I will give some definitions and problems to show that all 3-permutations are Wilf-equivalent. None of the results or proofs are new. My goal is to introduce an interesting topic, centered on a guided exploration. The beginning is framed around my first encounters with the topic, but this can be skipped.

I have loved math (of some sort or another) for as long as I can remember, but when my mom's friend Rick Luttmann encouraged us to attend the 2014 meeting of the MAA Golden Section, I was blown away by the topics being presented, which were unlike anything I had seen before. It was here that I got to see Zvezdelina Stankova deliver a talk on the subject of permutation patterns. The subject enthralled me, largely thanks to Zvezda's compelling presentation of the results. Of course, one hour is not enough to describe a whole area of research and give proofs accessible to a high-school layperson. As such, I left with a keen interest in the material, but little idea of why the results held true.

Fortunately, I was able to revisit the subject later that year. In the summer, I attended a math camp centered around training for competitions. It was immediately obvious that I was in over my head. Everyone was younger, seemingly sharper, and certainly far more acquainted with the world of competition mathematics. On my first day of algebra class, I wanted to run and hide. I was sitting in the back of a large room, filled with students who deftly used formulas I had never encountered to modify symbolic expressions that seemed utterly static to me. But I fortunately stuck with it until the afternoon, when I had my first class in combinatorics. I loved the subject, finding particular interest in enumeration and bijective proofs. While combinatorics is an amazing subject, this nurturing learning environment was mostly due to the instructor, Walter Stromquist. His teaching encouraged participation and teamwork, calling students to the board and having them work out many problems collaboratively. He also adopted a fun system of incentives, where he gave toy cars to students who would present their work at the board. This gave me courage to share my ideas and value what I had to say. Walter was also very willing to talk more outside of class, encouraging exploration and teaching me about more sorts of math that I never knew existed. He took the time to tell me all about graph embeddings on various surfaces and envy-free cake-cutting. But the most impactful thing that Walter did for me was encouraging me to revisit the subject of pattern avoidance, guiding me towards finding a proof myself.

In his class, Walter discussed the occurrences of the Catalan numbers in various enumerations, including triangulations of polygons, 231-avoiding permutations and Dyck paths. He also pointed out that Catalan numbers count sequences 0\leq a_1\leq \dots\leq a_n with a_k< k for every k=1,\dots,n. He then challenged the class to find a bijection between such sequences and 231-avoiding permutations. Recalling my earlier fascination with Zvezda's talk, I took this challenge to heart. I barely finished my proof in time to present it on the last day of class, but I was very happy to have undertaken a somewhat open-ended question and come out the other side. Unfortunately, Walter had to leave early, so I didn't get to show him my proof in person. But we corresponded after the camp ended and he sent an email to Vincent Vatter on my behalf. Vince, who is an expert in permutation patterns, kindly replied and showed us how my proof was just the usual one "turned on its side." However, I still enjoy Walter's suggested approach and I will outline it here. The writing below is more mature than my high school abilities allowed, but it follows the same outline.

We write [n] for the set \{1,\dots,n\}, so that an n-permutation is a bijection \sigma:[n]\rightarrow [n]. We write S_n for the set of n-permutations. We will picture these permutations as n squares filled in on an n\times n grid, so that no two filled squares are in the same row or the same column. Some examples are drawn below.



From left to right, these are 132, 231, 312 and 213. By representing a permutation within a square grid, we can apply any of the symmetries of a square to it (meaning the group D_8 acts on the set S_n). If this symmetry is denoted by R, then we simply write R\sigma for the permutation formed by applying R to the diagram of \sigma. You may check that the above four permutations can all be transformed into each other. 

In permutation patterns, the key notion is that of containment. Given permutations \sigma\in S_n and \tau\in S_k, suppose there are integers 1\leq x_1<\dots<x_k\leq n such that \tau(i)<\tau(j) iff \sigma(x_i)<\sigma(x_j). Then we will say that \sigma contains \tau. If we wish to emphasize the indices x_i, we will also write \tau=\sigma(x_1,\dots,x_k), where this notation implies that x_1<\dots<x_k. Containment means that the diagram of \tau can be formed from the diagram of \sigma by iteratively deleting the row and column corresponding to some filled-in square, a total of n-k times. For example, 21453 contains 132.


If \sigma does not contain \tau, then we say that \sigma avoids \tau. For any permutation \tau, we then define A_n(\tau) to be the number of permutations in S_n avoiding \tau. We say that two permutations \sigma and \tau are Wilf-equivalent (written as \sigma\sim\tau) if A_n(\sigma)=A_n(\tau) for every positive integer n. Even for rather small permutations, determining Wilf-equivalence or lack thereof can be difficult. If the permutations are not Wilf-equivalent, the sequences A_n(\sigma) and A_n(\tau) may still agree for the first several terms. If they are Wilf-equivalent, equality must be demonstrated for all n, despite difficulties in finding closed forms for these sequences.

Problem 1: If \tau\in S_k and \sigma\in S_n are Wilf-equivalent, show that k=n.

Solution:  If \rho\in S_i and i<k, then \rho clearly avoids \tau, because are no integers 1\leq x_1<\dots<x_k\leq i. Thus A_i(\tau)=i! for all i<k. But \tau obviously contains itself, so A_k(\tau)<k!. Hence k can be recovered from the sequence A_j(\tau), as the smallest j for which A_j(\tau)<j!. The same thing is true of n and A_j(\sigma). But we have assumed that these two sequences are the same, so we have k=n as we wished to prove. 

This shows that each Wilf-equivalence class of permutations is contained in some S_n. It turns out that all permutations in S_3 are Wilf-equivalent, which we will prove in the course of this post. In the case of S_4, there are three Wilf-equivalence classes among the 24 permutations. This was first proven by Zvezdelina Stankova in the early 1990's, when she was an undergraduate! But even the six permutations in S_3 seem like a lot to tackle. Thankfully, we need only consider these permutations up to symmetries of the square.

Problem 2: If R is a symmetry of the square, show that R\sigma contains R\tau if and only if \sigma contains \tau.

Solution: First suppose that R is reflection across a horizontal line. For any permutation \rho\in S_k, we have R\rho(i)=k+1-\rho(i). It follows that \rho(i)<\rho(j) if and only if R\rho(i)>R\rho(j). If \sigma\in S_n contains \tau\in S_m, then there exist 1\leq x_1<\dots<x_m\leq n such that \tau=\sigma(x_1,\dots,x_m). This assumption implies that R\tau(i)<R\tau(j)\Longleftrightarrow \tau(i)>\tau (j)\Longleftrightarrow \sigma(x_i)>\sigma(x_j)\Longleftrightarrow R\sigma(x_i)<R\sigma(x_j). This shows that R\tau=R\sigma(x_1,\dots,x_n). It follows that R\sigma contains R\tau. Conversely, if R\sigma contains R\tau, then R(R\sigma)=\sigma contains R(R\tau)=\tau. Now suppose that R is reflection across a vertical line. For any permutation \rho\in S_k, we have R\rho(i)=\rho(k+1-i). If \sigma\in S_n contains \tau\in S_m, then there exist some 1\leq x_1<\dots<x_m\leq n with \tau=\sigma(x_1,\dots,x_m). For i=1,\dots,m, let y_i=n+1-x_i and z_i=y_{m+1-i}. Then we have 1\leq y_m<\dots<y_1\leq n and 1\leq z_1<\dots<z_m\leq n. Note that R\sigma(y_i)=\sigma(x_i) and thus R\sigma(z_i)=R\sigma(y_{m+1-i})=\sigma(x_{m+1-i}). Since \tau(i)<\tau(j)\Longleftrightarrow \sigma(x_i)<\sigma(x_j), we are able to conclude that R\tau(i)=\tau(m+1-i)<\tau(m+1-j)=R\tau(j) \Bigg\Updownarrow R\sigma(z_i)=\sigma(x_{m+1-i})<\sigma(x_{m+1-j})=R\sigma(z_j). This shows that R\tau=R\sigma(z_1,\dots,z_m). It follows that R\sigma contains R\tauConversely, if R\sigma contains R\tau, then R(R\sigma)=\sigma contains R(R\tau)=\tau. Thus we have proven the result when R is a vertical or horizontal reflection. But these two reflections generate all eight symmetries of the square, so the result follows.

Since avoidance is just the negation of containment, it follows that transforming \tau by a symmetry of the square does not change the sequence A_n(\tau). Hence \tau\sim R\tau, for any permutation \tau and symmetry R of the square. In particular, it follows that 132\sim 231\sim 312\sim 213 and 123\sim 321. To show that any two permutations in S_3 are Wilf-equivalent, it then suffices to prove 231\sim 321. To prove this result, we will biject 231-avoiding and 321-avoiding n-permutations for all n, with another type of sequence in between.

Let L_n denote the set of all sequences of integers a_1,\dots,a_n such that 0\leq a_i<i for all i=1,\dots,n. For \sigma\in S_n and k\in [n], define F_k(\sigma) to be the number of indices i<k with \sigma(i)<\sigma(k). Pictorially, this counts the number of filled squares occuring left-and-down from the filled square of the k^\text{th} column. Define F:S_n\rightarrow L_n to take \sigma to the sequence F_k(\sigma) for k=1,\dots,n. For example, F(1342)=0121.


Problem 3: Show that F is a bijection between S_n and L_n

Solution: We prove this result by induction on n. We know that S_n has n! elements. We can also see that L_n has n! elements, because there are precisely k choices for the k^\text{th} entry of a sequence in L_n. To prove that F is a bijection, it therefore suffices to show that it is a surjection. Let a_1,\dots,a_n be a sequence in L_n. Then a_1,\dots,a_{n-1} is in L_{n-1}. By induction, there exists \tau\in S_{n-1} with F(\tau)=(a_1,\dots,a_{n-1}). We will now find some permutation \sigma\in S_n such that F(\sigma)=(a_1,\dots,a_n). We define \sigma(n)=a_n+1. For k<n, we let \sigma(k)=\tau(k) if \tau(k)<a_n+1. Otherwise, we let \sigma(k)=\tau(k)+1. Since \tau is defined to be a permutation, we can see that \sigma:[n]\rightarrow [n] is injective and thus \sigma\in S_n. Since \sigma and \tau are related by \tau=\sigma(12\dots n-1), we also see that F_k(\sigma)=F_k(\tau)=a_k for k<n. Note that there are precisely a_n values k\in [n-1] with \sigma(k)=\tau(k)< a_n+1, since \tau is a permutation. But if \tau(k)\geq a_n+1, we have \sigma(k)=\tau(k)+1>a_n+1. Thus there are exactly a_n indices k<n with \sigma(k)<a_n+1=\sigma(n). This proves that F(\sigma)=(a_1,\dots,a_n). As an example, we illustrate this inductive process to find \sigma\in S_4 corresponding to the sequence 0121. 



Problem 4: Show that \sigma\in S_n is 231-avoiding if and only if F(\sigma) is non-decreasing.

Solution: Fix some permutation \sigma\in S_n. For any j\in [n], we will write I_j=\{i<j:\sigma(i)<\sigma(j)\}, so that F_j(\sigma)=\#(I_j). If there exist 1\leq j<k\leq n with F_k(\sigma)<F_j(\sigma), then \#(I_k)<\#(I_j). Hence, there exists some i\in I_j\setminus I_k. In other words, we have i<j<k such that \sigma(k)\leq \sigma(i)<\sigma(j). Because i< k, we have \sigma(k)\neq \sigma(i) and thus \sigma(k)<\sigma(i). This proves that 231=\sigma(i,j,k). Therefore, we have shown that F(\sigma) decreasing implies that \sigma contains 231. Conversely, suppose \sigma contains 231. Then there exist i<j<k with 231=\sigma(i,j,k). Let k be as small as possible with this property. If h\in I_k satisfies h>j, then we have i<j<h and \sigma(h)<\sigma(k)<\sigma(i). But then 231=\sigma(i,j,h), which contradicts the minimality of k. For any h\in I_k, we therefore have h\leq j and \sigma(h)<\sigma(k)<\sigma(j) and thus h\in I_j. In other words, I_k\subset I_j. The containment is strict, since i\in I_j but i\notin I_k. Thus F_k(\sigma)=\#(I_k)<\#(I_j)=F_j(\sigma), so F(\sigma) decreases.

We define B_n\subset L_n to consist of those sequences that are non-decreasing. The previous problem shows that F maps 231-avoiding permutations bijectively onto B_n. As a high-schooler, this was as far as I could proceed. But we are now halfway to proving 231\sim 321. To biject B_n with 321-avoiding permutations, we follow Rotem's approach as mentioned in this paper. For any permutation \sigma\in S_n, we will recursively define a sequence G_i(\sigma). Let G_1(\sigma)=0. If we have \sigma(j)<\sigma(i) for all j<i, we let G_i(\sigma)=G_{i-1}(\sigma). Otherwise, we define G_i(\sigma)=\sigma(i). To refer to this sequence as a whole, we write G(\sigma). For example, G(154236)=004233 because 5 and 6 are the only values in 154236 greater than all their predecessors.

Problem 5: If \sigma\in S_n is 321-avoiding, show that G(\sigma)\in B_n

Solution: We prove the contrapositive. If G(\sigma)\notin B_n, then either G(\sigma)\notin L_n or G(\sigma) decreases.  Suppose first that G(\sigma) is decreasing. Then there exist 1\leq j<k\leq n with G_j(\sigma)>G_k(\sigma). We choose these indices so that j+k is as small as possible. Note that G_j(\sigma)>G_k(\sigma)\geq 0, so we must have j>1. The minimality assumption on j+k implies that G_j(\sigma)\neq G_{j-1}(\sigma) and G_k(\sigma)\neq G_{k-1}(\sigma). By the definition of G, we then have G_j(\sigma)=\sigma(j) and G_k(\sigma)=\sigma(k). Recall that G_j(\sigma)\neq G_{j-1}(\sigma) also implies that there is some i<j with \sigma(i)>\sigma(j). Therefore, i<j<k and \sigma(i)>\sigma(j)>\sigma(k). Hence 321=\sigma(i,j,k) so \sigma contains 321. Now suppose that G(\sigma)\notin L_n. Then there exists some i\in [n] with G_i(\sigma)\geq i. Choose i to be as small as possible with this property. It is then clear that i>1 and G_i(\sigma)>i-1>G_{i-1}(\sigma). By the definition of G, we conclude that \sigma(i)=G_i(\sigma)\geq i and there is some j<i with \sigma(j)>\sigma(i). For the sake of contradiction, assume that \sigma(k)>\sigma(i) for all k>i. Then \sigma(\{i+1,\dots,n\})\subset \{\sigma(i)+1,\dots,n\} and thus n-i\leq n-\sigma(i), since \sigma is injective. But then i\geq \sigma(i)\geq i, so we must have \sigma(i)=i. Therefore, the sets \{i+1,\dots,n\} and \{\sigma(i)+1,\dots,n\} are equinumerous, so \sigma must biject these sets. But this contradicts the existence of j<i with \sigma(j)>\sigma(i), which we have already demonstrated. Thus there must be some k>i with \sigma(k)<\sigma(i). It follows that 321=\sigma(j,i,k) and thus \sigma contains 321. We have now covered both reasons for G(\sigma)\notin B_n.

We want to show that G is injective, but its image does not (to the best of my knowledge) have a simple characterization. Instead, we will define a useful function H in terms of some \tau:[n]\rightarrow \{0,\dots,n-1\}. We partition [n]=A\sqcup B so that A consists of all indices i\in [n] with 0<\tau(i)\neq \tau(j) for all j<i. Let T be the set of permutations \sigma\in S_n with \sigma(i)=\tau(i) for all i\in A. Then T\neq \emptyset, since \tau maps A injectively into [n-1]\subset [n]. Let B=\{b_1<\dots<b_k\}. Then T consists of permutations in S_n with n-k of their values determined. Since k inputs and outputs remain to be determined, elements of T are in bijection with permutations in S_k. More concretely, every \rho\in S_k corresponds to a unique \pi\in T with \rho=\pi(b_1,\dots,b_k)We write H:S_k\rightarrow T\subset S_n to denote this bijection, so that H(\rho)=\pi. Note that H is injective, since \rho=\pi(b_1,\dots,b_k). Keep in mind that the construction of H:S_k\rightarrow S_n depends on \tau. For example, suppose that \tau=22052. Then A=\{1,4\}, so H(132)=21453 and H(321)=24351.


Problem 6: Show that G is injective by using H to construct \sigma uniquely from G(\sigma).

Solution: Let \tau=G(\sigma) and let A, B and T be as in the construction of H. Applying the definition of G, we can see that \sigma(i)\neq \tau(i) implies either i=1 or \tau(i)=\tau(i-1). But \tau(1)=0, so we have i\notin A for any i\in [n] with \sigma(i)\neq \tau(i). Thus \sigma(i)=\tau(i) for all i\in A. Hence, if \rho\in S_n satisfies G(\rho)=\tau, we have \rho\in T. We wish to prove that such a \rho is unique (we have assumed that \sigma is one such permutation). We consider an arbitrary \upsilon\in S_k and set \rho=H(\upsilon)\in T. Since H:S_k\rightarrow T is a bijection, this is the same as considering an arbitrary \rho\in T. Assume for the sake of contradiction that \upsilon \neq 12\dots k and G(\rho)=\tau. Then there exist 1\leq i<j\leq k with \upsilon(i)>\upsilon(j). Correspondingly, we have b_i<b_j and \rho(b_i)>\rho(b_j). By the definition of G, we have G_{b_j}(\rho)=\rho(b_j)>0. Moreover, the values G_l(\rho) for l<b_j are either 0 or \sigma(p) for some p\leq l<b_j. Since \sigma is a permutation, this gives G_{b_j}(\rho)\neq G_l(\rho) for any l<b_j. Therefore, 0<\tau(b_j)\neq \tau(l) for all l<b_j, which implies that b_j\in A. This contradicts the definition of b_j as an element of B. Thus there is only one \rho\in T which can satisfy G(\rho)=\tau, which is given by \rho=H(12\dots k). But we originally defined \tau=G(\sigma), so this uniqueness gives the free bonus equality \sigma=H(12\dots k).

Problem 7: For any \tau\in B_n, show that there is a 321-avoiding permutation \sigma with \tau=G(\sigma).

Solution: Let A, B and T be as in the construction of H, where \tau is as given in the problem. We define \sigma=H(12\dots k), because the previous solution shows that this is the only viable option. But we don't yet know that \tau is in the image of G, so we cannot use the uniqueness of Problem 6 to immediately conclude that \tau=G(\sigma). First, we assume for the sake of contradiction that there are i<j<l with 321=\sigma(i,j,l). Recall that \tau is non-decreasing and \sigma=\tau on A, so \sigma is increasing on A. Because 12\dots k=\sigma(b_1,\dots,b_k), it follows that \sigma is also increasing on B. But \sigma(i)>\sigma(j)>\sigma(l). Thus, out of the indices i, j, and l, at most one can be in A and at most one can be in B. Since [n]=A\cup B, this is clearly a contradiction. Therefore, \sigma is 321-avoiding. Now assume for the sake of contradiction that G(\sigma)\neq \tauWe then choose the smallest i\in [n] such that G_i(\sigma)\neq \tau(i). Note that i>1, since G_1(\sigma)=0=\tau(1), both sequences being in B_n. First, suppose that i\in A. Then \sigma(i)=\tau(i)\neq G_i(\sigma), so we have G_i(\sigma)=G_{i-1}(\sigma) and \sigma(i)>\sigma(j) for all j<i, by the definition of G. Therefore \sigma(i)\geq i, contradicting \sigma(i)=\tau(i)<i. Next, suppose that i\in B. Then G_i(\sigma)\neq \tau(i)=\tau(i-1)=G_{i-1}(\sigma), so by the definition of G, we have G_i(\sigma)=\sigma(i) and some j<i with \sigma(j)>\sigma(i). But \sigma is increasing on B, so we must have j\in A. Thus G_j(\sigma)=\tau(j)=\sigma(j)>\sigma(i)=G_i(\sigma), contradicting the fact that G(\sigma) is non-decreasing. But we must have i\in A or i\in B and we have reached a contradiction in either case. Therefore, no such index i can exist. This proves that \tau=G(\sigma), as desired.

This result shows that G maps 321-avoiding permutations bijectively onto B_n. Combining this with our final result on F, this gives a bijection between 231-avoiding and 321-avoiding permutations. Therefore A_n(231)=A_n(321) for every n\in \mathbb N. Thus 231\sim 321 and so all 3-permutations are Wilf-equivalent!

Comments