Wilful Inference

Prerequisites: Knowledge of in/sur/bijections and the definition of a permutation.

Update (23 Sep. 2020): I just delivered some portion of this material to the Berkeley Math Circle's Intermediate II group. The corresponding notes (Part 1 and Part 2) may provide some helpful background. To bridge the gap between terminology used there and here, note that injection means "1-to-1 function," surjection means "onto function," and bijection means "1-to-1 correspondence."

In this post, I will give some definitions and problems to show that all 3-permutations are Wilf-equivalent. None of the results or proofs are new. My goal is to introduce an interesting topic, centered on a guided exploration. The beginning is framed around my first encounters with the topic, but this can be skipped.

I have loved math (of some sort or another) for as long as I can remember, but when my mom's friend Rick Luttmann encouraged us to attend the 2014 meeting of the MAA Golden Section, I was blown away by the topics being presented, which were unlike anything I had seen before. It was here that I got to see Zvezdelina Stankova deliver a talk on the subject of permutation patterns. The subject enthralled me, largely thanks to Zvezda's compelling presentation of the results. Of course, one hour is not enough to describe a whole area of research and give proofs accessible to a high-school layperson. As such, I left with a keen interest in the material, but little idea of why the results held true.

Fortunately, I was able to revisit the subject later that year. In the summer, I attended a math camp centered around training for competitions. It was immediately obvious that I was in over my head. Everyone was younger, seemingly sharper, and certainly far more acquainted with the world of competition mathematics. On my first day of algebra class, I wanted to run and hide. I was sitting in the back of a large room, filled with students who deftly used formulas I had never encountered to modify symbolic expressions that seemed utterly static to me. But I fortunately stuck with it until the afternoon, when I had my first class in combinatorics. I loved the subject, finding particular interest in enumeration and bijective proofs. While combinatorics is an amazing subject, this nurturing learning environment was mostly due to the instructor, Walter Stromquist. His teaching encouraged participation and teamwork, calling students to the board and having them work out many problems collaboratively. He also adopted a fun system of incentives, where he gave toy cars to students who would present their work at the board. This gave me courage to share my ideas and value what I had to say. Walter was also very willing to talk more outside of class, encouraging exploration and teaching me about more sorts of math that I never knew existed. He took the time to tell me all about graph embeddings on various surfaces and envy-free cake-cutting. But the most impactful thing that Walter did for me was encouraging me to revisit the subject of pattern avoidance, guiding me towards finding a proof myself.

In his class, Walter discussed the occurrences of the Catalan numbers in various enumerations, including triangulations of polygons, 231-avoiding permutations and Dyck paths. He also pointed out that Catalan numbers count sequences $0\leq a_1\leq \dots\leq a_n$ with $a_k< k$ for every $k=1,\dots,n$. He then challenged the class to find a bijection between such sequences and 231-avoiding permutations. Recalling my earlier fascination with Zvezda's talk, I took this challenge to heart. I barely finished my proof in time to present it on the last day of class, but I was very happy to have undertaken a somewhat open-ended question and come out the other side. Unfortunately, Walter had to leave early, so I didn't get to show him my proof in person. But we corresponded after the camp ended and he sent an email to Vincent Vatter on my behalf. Vince, who is an expert in permutation patterns, kindly replied and showed us how my proof was just the usual one "turned on its side." However, I still enjoy Walter's suggested approach and I will outline it here. The writing below is more mature than my high school abilities allowed, but it follows the same outline.

We write $[n]$ for the set $\{1,\dots,n\}$, so that an $n$-permutation is a bijection $\sigma:[n]\rightarrow [n]$. We write $S_n$ for the set of $n$-permutations. We will picture these permutations as $n$ squares filled in on an $n\times n$ grid, so that no two filled squares are in the same row or the same column. Some examples are drawn below.



From left to right, these are $132$, $231$, $312$ and $213$. By representing a permutation within a square grid, we can apply any of the symmetries of a square to it (meaning the group $D_8$ acts on the set $S_n$). If this symmetry is denoted by $R$, then we simply write $R\sigma$ for the permutation formed by applying $R$ to the diagram of $\sigma$. You may check that the above four permutations can all be transformed into each other. 

In permutation patterns, the key notion is that of containment. Given permutations $\sigma\in S_n$ and $\tau\in S_k$, suppose there are integers $1\leq x_1<\dots<x_k\leq n$ such that $\tau(i)<\tau(j)$ iff $\sigma(x_i)<\sigma(x_j)$. Then we will say that $\sigma$ contains $\tau$. If we wish to emphasize the indices $x_i$, we will also write $\tau=\sigma(x_1,\dots,x_k)$, where this notation implies that $x_1<\dots<x_k$. Containment means that the diagram of $\tau$ can be formed from the diagram of $\sigma$ by iteratively deleting the row and column corresponding to some filled-in square, a total of $n-k$ times. For example, $21453$ contains $132.$


If $\sigma$ does not contain $\tau$, then we say that $\sigma$ avoids $\tau$. For any permutation $\tau$, we then define $A_n(\tau)$ to be the number of permutations in $S_n$ avoiding $\tau$. We say that two permutations $\sigma$ and $\tau$ are Wilf-equivalent (written as $\sigma\sim\tau$) if $A_n(\sigma)=A_n(\tau)$ for every positive integer $n$. Even for rather small permutations, determining Wilf-equivalence or lack thereof can be difficult. If the permutations are not Wilf-equivalent, the sequences $A_n(\sigma)$ and $A_n(\tau)$ may still agree for the first several terms. If they are Wilf-equivalent, equality must be demonstrated for all $n$, despite difficulties in finding closed forms for these sequences.

Problem 1: If $\tau\in S_k$ and $\sigma\in S_n$ are Wilf-equivalent, show that $k=n$.

Solution:  If $\rho\in S_i$ and $i<k$, then $\rho$ clearly avoids $\tau$, because are no integers $1\leq x_1<\dots<x_k\leq i$. Thus $A_i(\tau)=i!$ for all $i<k$. But $\tau$ obviously contains itself, so $A_k(\tau)<k!$. Hence $k$ can be recovered from the sequence $A_j(\tau)$, as the smallest $j$ for which $A_j(\tau)<j!$. The same thing is true of $n$ and $A_j(\sigma)$. But we have assumed that these two sequences are the same, so we have $k=n$ as we wished to prove. 

This shows that each Wilf-equivalence class of permutations is contained in some $S_n$. It turns out that all permutations in $S_3$ are Wilf-equivalent, which we will prove in the course of this post. In the case of $S_4$, there are three Wilf-equivalence classes among the 24 permutations. This was first proven by Zvezdelina Stankova in the early 1990's, when she was an undergraduate! But even the six permutations in $S_3$ seem like a lot to tackle. Thankfully, we need only consider these permutations up to symmetries of the square.

Problem 2: If $R$ is a symmetry of the square, show that $R\sigma$ contains $R\tau$ if and only if $\sigma$ contains $\tau$.

Solution: First suppose that $R$ is reflection across a horizontal line. For any permutation $\rho\in S_k$, we have $R\rho(i)=k+1-\rho(i)$. It follows that $\rho(i)<\rho(j)$ if and only if $R\rho(i)>R\rho(j)$. If $\sigma\in S_n$ contains $\tau\in S_m$, then there exist $1\leq x_1<\dots<x_m\leq n$ such that $\tau=\sigma(x_1,\dots,x_m)$. This assumption implies that $$R\tau(i)<R\tau(j)\Longleftrightarrow \tau(i)>\tau (j)\Longleftrightarrow \sigma(x_i)>\sigma(x_j)\Longleftrightarrow R\sigma(x_i)<R\sigma(x_j).$$ This shows that $R\tau=R\sigma(x_1,\dots,x_n).$ It follows that $R\sigma$ contains $R\tau$. Conversely, if $R\sigma$ contains $R\tau$, then $R(R\sigma)=\sigma$ contains $R(R\tau)=\tau$. Now suppose that $R$ is reflection across a vertical line. For any permutation $\rho\in S_k$, we have $R\rho(i)=\rho(k+1-i)$. If $\sigma\in S_n$ contains $\tau\in S_m$, then there exist some $1\leq x_1<\dots<x_m\leq n$ with $\tau=\sigma(x_1,\dots,x_m)$. For $i=1,\dots,m$, let $y_i=n+1-x_i$ and $z_i=y_{m+1-i}$. Then we have $1\leq y_m<\dots<y_1\leq n$ and $1\leq z_1<\dots<z_m\leq n$. Note that $R\sigma(y_i)=\sigma(x_i)$ and thus $R\sigma(z_i)=R\sigma(y_{m+1-i})=\sigma(x_{m+1-i})$. Since $\tau(i)<\tau(j)\Longleftrightarrow \sigma(x_i)<\sigma(x_j),$ we are able to conclude that $$R\tau(i)=\tau(m+1-i)<\tau(m+1-j)=R\tau(j)$$ $$\Bigg\Updownarrow$$ $$R\sigma(z_i)=\sigma(x_{m+1-i})<\sigma(x_{m+1-j})=R\sigma(z_j).$$ This shows that $R\tau=R\sigma(z_1,\dots,z_m).$ It follows that $R\sigma$ contains $R\tau$. Conversely, if $R\sigma$ contains $R\tau$, then $R(R\sigma)=\sigma$ contains $R(R\tau)=\tau$. Thus we have proven the result when $R$ is a vertical or horizontal reflection. But these two reflections generate all eight symmetries of the square, so the result follows.

Since avoidance is just the negation of containment, it follows that transforming $\tau$ by a symmetry of the square does not change the sequence $A_n(\tau)$. Hence $\tau\sim R\tau,$ for any permutation $\tau$ and symmetry $R$ of the square. In particular, it follows that $132\sim 231\sim 312\sim 213$ and $123\sim 321$. To show that any two permutations in $S_3$ are Wilf-equivalent, it then suffices to prove $231\sim 321$. To prove this result, we will biject 231-avoiding and 321-avoiding $n$-permutations for all $n$, with another type of sequence in between.

Let $L_n$ denote the set of all sequences of integers $a_1,\dots,a_n$ such that $0\leq a_i<i$ for all $i=1,\dots,n$. For $\sigma\in S_n$ and $k\in [n]$, define $F_k(\sigma)$ to be the number of indices $i<k$ with $\sigma(i)<\sigma(k)$. Pictorially, this counts the number of filled squares occuring left-and-down from the filled square of the $k^\text{th}$ column. Define $F:S_n\rightarrow L_n$ to take $\sigma$ to the sequence $F_k(\sigma)$ for $k=1,\dots,n$. For example, $F(1342)=0121$.


Problem 3: Show that $F$ is a bijection between $S_n$ and $L_n$. 

Solution: We prove this result by induction on $n$. We know that $S_n$ has $n!$ elements. We can also see that $L_n$ has $n!$ elements, because there are precisely $k$ choices for the $k^\text{th}$ entry of a sequence in $L_n$. To prove that $F$ is a bijection, it therefore suffices to show that it is a surjection. Let $a_1,\dots,a_n$ be a sequence in $L_n$. Then $a_1,\dots,a_{n-1}$ is in $L_{n-1}$. By induction, there exists $\tau\in S_{n-1}$ with $F(\tau)=(a_1,\dots,a_{n-1})$. We will now find some permutation $\sigma\in S_n$ such that $F(\sigma)=(a_1,\dots,a_n)$. We define $\sigma(n)=a_n+1$. For $k<n$, we let $\sigma(k)=\tau(k)$ if $\tau(k)<a_n+1$. Otherwise, we let $\sigma(k)=\tau(k)+1$. Since $\tau$ is defined to be a permutation, we can see that $\sigma:[n]\rightarrow [n]$ is injective and thus $\sigma\in S_n$. Since $\sigma$ and $\tau$ are related by $\tau=\sigma(12\dots n-1),$ we also see that $F_k(\sigma)=F_k(\tau)=a_k$ for $k<n$. Note that there are precisely $a_n$ values $k\in [n-1]$ with $\sigma(k)=\tau(k)< a_n+1,$ since $\tau$ is a permutation. But if $\tau(k)\geq a_n+1$, we have $\sigma(k)=\tau(k)+1>a_n+1.$ Thus there are exactly $a_n$ indices $k<n$ with $\sigma(k)<a_n+1=\sigma(n)$. This proves that $F(\sigma)=(a_1,\dots,a_n)$. As an example, we illustrate this inductive process to find $\sigma\in S_4$ corresponding to the sequence $0121.$ 



Problem 4: Show that $\sigma\in S_n$ is 231-avoiding if and only if $F(\sigma)$ is non-decreasing.

Solution: Fix some permutation $\sigma\in S_n$. For any $j\in [n]$, we will write $I_j=\{i<j:\sigma(i)<\sigma(j)\},$ so that $F_j(\sigma)=\#(I_j)$. If there exist $1\leq j<k\leq n$ with $F_k(\sigma)<F_j(\sigma),$ then $\#(I_k)<\#(I_j)$. Hence, there exists some $i\in I_j\setminus I_k$. In other words, we have $i<j<k$ such that $\sigma(k)\leq \sigma(i)<\sigma(j)$. Because $i< k$, we have $\sigma(k)\neq \sigma(i)$ and thus $\sigma(k)<\sigma(i)$. This proves that $231=\sigma(i,j,k)$. Therefore, we have shown that $F(\sigma)$ decreasing implies that $\sigma$ contains $231$. Conversely, suppose $\sigma$ contains $231$. Then there exist $i<j<k$ with $231=\sigma(i,j,k)$. Let $k$ be as small as possible with this property. If $h\in I_k$ satisfies $h>j$, then we have $i<j<h$ and $\sigma(h)<\sigma(k)<\sigma(i)$. But then $231=\sigma(i,j,h)$, which contradicts the minimality of $k$. For any $h\in I_k$, we therefore have $h\leq j$ and $\sigma(h)<\sigma(k)<\sigma(j)$ and thus $h\in I_j$. In other words, $I_k\subset I_j$. The containment is strict, since $i\in I_j$ but $i\notin I_k$. Thus $F_k(\sigma)=\#(I_k)<\#(I_j)=F_j(\sigma)$, so $F(\sigma)$ decreases.

We define $B_n\subset L_n$ to consist of those sequences that are non-decreasing. The previous problem shows that $F$ maps 231-avoiding permutations bijectively onto $B_n$. As a high-schooler, this was as far as I could proceed. But we are now halfway to proving $231\sim 321$. To biject $B_n$ with 321-avoiding permutations, we follow Rotem's approach as mentioned in this paper. For any permutation $\sigma\in S_n$, we will recursively define a sequence $G_i(\sigma)$. Let $G_1(\sigma)=0$. If we have $\sigma(j)<\sigma(i)$ for all $j<i$, we let $G_i(\sigma)=G_{i-1}(\sigma).$ Otherwise, we define $G_i(\sigma)=\sigma(i)$. To refer to this sequence as a whole, we write $G(\sigma)$. For example, $G(154236)=004233$ because $5$ and $6$ are the only values in $154236$ greater than all their predecessors.

Problem 5: If $\sigma\in S_n$ is 321-avoiding, show that $G(\sigma)\in B_n$. 

Solution: We prove the contrapositive. If $G(\sigma)\notin B_n$, then either $G(\sigma)\notin L_n$ or $G(\sigma)$ decreases.  Suppose first that $G(\sigma)$ is decreasing. Then there exist $1\leq j<k\leq n$ with $G_j(\sigma)>G_k(\sigma)$. We choose these indices so that $j+k$ is as small as possible. Note that $G_j(\sigma)>G_k(\sigma)\geq 0$, so we must have $j>1$. The minimality assumption on $j+k$ implies that $G_j(\sigma)\neq G_{j-1}(\sigma)$ and $G_k(\sigma)\neq G_{k-1}(\sigma)$. By the definition of $G,$ we then have $G_j(\sigma)=\sigma(j)$ and $G_k(\sigma)=\sigma(k).$ Recall that $G_j(\sigma)\neq G_{j-1}(\sigma)$ also implies that there is some $i<j$ with $\sigma(i)>\sigma(j)$. Therefore, $i<j<k$ and $\sigma(i)>\sigma(j)>\sigma(k).$ Hence $321=\sigma(i,j,k)$ so $\sigma$ contains $321$. Now suppose that $G(\sigma)\notin L_n$. Then there exists some $i\in [n]$ with $G_i(\sigma)\geq i$. Choose $i$ to be as small as possible with this property. It is then clear that $i>1$ and $G_i(\sigma)>i-1>G_{i-1}(\sigma)$. By the definition of $G,$ we conclude that $\sigma(i)=G_i(\sigma)\geq i$ and there is some $j<i$ with $\sigma(j)>\sigma(i).$ For the sake of contradiction, assume that $\sigma(k)>\sigma(i)$ for all $k>i$. Then $\sigma(\{i+1,\dots,n\})\subset \{\sigma(i)+1,\dots,n\}$ and thus $n-i\leq n-\sigma(i),$ since $\sigma$ is injective. But then $i\geq \sigma(i)\geq i,$ so we must have $\sigma(i)=i$. Therefore, the sets $\{i+1,\dots,n\}$ and $\{\sigma(i)+1,\dots,n\}$ are equinumerous, so $\sigma$ must biject these sets. But this contradicts the existence of $j<i$ with $\sigma(j)>\sigma(i),$ which we have already demonstrated. Thus there must be some $k>i$ with $\sigma(k)<\sigma(i)$. It follows that $321=\sigma(j,i,k)$ and thus $\sigma$ contains $321.$ We have now covered both reasons for $G(\sigma)\notin B_n.$

We want to show that $G$ is injective, but its image does not (to the best of my knowledge) have a simple characterization. Instead, we will define a useful function $H$ in terms of some $\tau:[n]\rightarrow \{0,\dots,n-1\}.$ We partition $[n]=A\sqcup B$ so that $A$ consists of all indices $i\in [n]$ with $0<\tau(i)\neq \tau(j)$ for all $j<i.$ Let $T$ be the set of permutations $\sigma\in S_n$ with $\sigma(i)=\tau(i)$ for all $i\in A.$ Then $T\neq \emptyset$, since $\tau$ maps $A$ injectively into $[n-1]\subset [n].$ Let $B=\{b_1<\dots<b_k\}.$ Then $T$ consists of permutations in $S_n$ with $n-k$ of their values determined. Since $k$ inputs and outputs remain to be determined, elements of $T$ are in bijection with permutations in $S_k$. More concretely, every $\rho\in S_k$ corresponds to a unique $\pi\in T$ with $\rho=\pi(b_1,\dots,b_k)$. We write $H:S_k\rightarrow T\subset S_n$ to denote this bijection, so that $H(\rho)=\pi$. Note that $H$ is injective, since $\rho=\pi(b_1,\dots,b_k).$ Keep in mind that the construction of $H:S_k\rightarrow S_n$ depends on $\tau.$ For example, suppose that $\tau=22052$. Then $A=\{1,4\},$ so $H(132)=21453$ and $H(321)=24351.$


Problem 6: Show that $G$ is injective by using $H$ to construct $\sigma$ uniquely from $G(\sigma).$

Solution: Let $\tau=G(\sigma)$ and let $A,$ $B$ and $T$ be as in the construction of $H$. Applying the definition of $G,$ we can see that $\sigma(i)\neq \tau(i)$ implies either $i=1$ or $\tau(i)=\tau(i-1)$. But $\tau(1)=0,$ so we have $i\notin A$ for any $i\in [n]$ with $\sigma(i)\neq \tau(i).$ Thus $\sigma(i)=\tau(i)$ for all $i\in A.$ Hence, if $\rho\in S_n$ satisfies $G(\rho)=\tau,$ we have $\rho\in T$. We wish to prove that such a $\rho$ is unique (we have assumed that $\sigma$ is one such permutation). We consider an arbitrary $\upsilon\in S_k$ and set $\rho=H(\upsilon)\in T.$ Since $H:S_k\rightarrow T$ is a bijection, this is the same as considering an arbitrary $\rho\in T.$ Assume for the sake of contradiction that $\upsilon \neq 12\dots k$ and $G(\rho)=\tau.$ Then there exist $1\leq i<j\leq k$ with $\upsilon(i)>\upsilon(j).$ Correspondingly, we have $b_i<b_j$ and $\rho(b_i)>\rho(b_j).$ By the definition of $G,$ we have $G_{b_j}(\rho)=\rho(b_j)>0.$ Moreover, the values $G_l(\rho)$ for $l<b_j$ are either $0$ or $\sigma(p)$ for some $p\leq l<b_j.$ Since $\sigma$ is a permutation, this gives $G_{b_j}(\rho)\neq G_l(\rho)$ for any $l<b_j$. Therefore, $0<\tau(b_j)\neq \tau(l)$ for all $l<b_j,$ which implies that $b_j\in A.$ This contradicts the definition of $b_j$ as an element of $B.$ Thus there is only one $\rho\in T$ which can satisfy $G(\rho)=\tau,$ which is given by $\rho=H(12\dots k).$ But we originally defined $\tau=G(\sigma)$, so this uniqueness gives the free bonus equality $\sigma=H(12\dots k).$

Problem 7: For any $\tau\in B_n$, show that there is a 321-avoiding permutation $\sigma$ with $\tau=G(\sigma).$

Solution: Let $A,$ $B$ and $T$ be as in the construction of $H,$ where $\tau$ is as given in the problem. We define $\sigma=H(12\dots k),$ because the previous solution shows that this is the only viable option. But we don't yet know that $\tau$ is in the image of $G,$ so we cannot use the uniqueness of Problem 6 to immediately conclude that $\tau=G(\sigma).$ First, we assume for the sake of contradiction that there are $ i<j<l$ with $321=\sigma(i,j,l)$. Recall that $\tau$ is non-decreasing and $\sigma=\tau$ on $A$, so $\sigma$ is increasing on $A.$ Because $12\dots k=\sigma(b_1,\dots,b_k),$ it follows that $\sigma$ is also increasing on $B$. But $\sigma(i)>\sigma(j)>\sigma(l)$. Thus, out of the indices $i,$ $j,$ and $l,$ at most one can be in $A$ and at most one can be in $B$. Since $[n]=A\cup B$, this is clearly a contradiction. Therefore, $\sigma$ is 321-avoiding. Now assume for the sake of contradiction that $G(\sigma)\neq \tau$. We then choose the smallest $i\in [n]$ such that $G_i(\sigma)\neq \tau(i)$. Note that $i>1,$ since $G_1(\sigma)=0=\tau(1)$, both sequences being in $B_n.$ First, suppose that $i\in A$. Then $\sigma(i)=\tau(i)\neq G_i(\sigma),$ so we have $G_i(\sigma)=G_{i-1}(\sigma)$ and $\sigma(i)>\sigma(j)$ for all $j<i,$ by the definition of $G.$ Therefore $\sigma(i)\geq i,$ contradicting $\sigma(i)=\tau(i)<i$. Next, suppose that $i\in B$. Then $G_i(\sigma)\neq \tau(i)=\tau(i-1)=G_{i-1}(\sigma),$ so by the definition of $G,$ we have $G_i(\sigma)=\sigma(i)$ and some $j<i$ with $\sigma(j)>\sigma(i)$. But $\sigma$ is increasing on $B,$ so we must have $j\in A$. Thus $G_j(\sigma)=\tau(j)=\sigma(j)>\sigma(i)=G_i(\sigma),$ contradicting the fact that $G(\sigma)$ is non-decreasing. But we must have $i\in A$ or $i\in B$ and we have reached a contradiction in either case. Therefore, no such index $i$ can exist. This proves that $\tau=G(\sigma),$ as desired.

This result shows that $G$ maps 321-avoiding permutations bijectively onto $B_n$. Combining this with our final result on $F,$ this gives a bijection between 231-avoiding and 321-avoiding permutations. Therefore $A_n(231)=A_n(321)$ for every $n\in \mathbb N$. Thus $231\sim 321$ and so all 3-permutations are Wilf-equivalent!

Comments