Evaluating Gemini 3.0 Pro ‘Deep Think’ on American Mathematical Monthly Problems

January 2, 2026

The purpose of this post is to summarise my findings evaluating Gemini 3.0 Pro ‘Deep Think’ on the problems at the back of the May 2025 issue of the American Mathematical Monthly. My evaluation is available from my website, together with the transcript of all my interactions with Gemini. I used Gemini because while it’s probably second to ChatGPT 5.2 Pro at solving maths problems, I find it marginally more interesting to interact with.

I chose the May 2025 issue at random, but subject to the constraints that the deadline for submission of answers should have passed, and the solutions not yet be published. I think the problems in this issue are representative. None of them require exceptional ingenuity, and only 12535 (on binomial coefficients) could, I think, be said to be routine, even for an expert. In the table below, the headings are short for ‘Could I have done it quicker without the LLM?’, ‘Did the LLM make it fun?’ and ‘Fudge factor’, which I hope is self-explanatory. Stars mark the two problems to which I had already submitted solutions.

[Table: per-problem evaluation, with columns ‘Could I have done it quicker without the LLM?’, ‘Did the LLM make it fun?’ and ‘Fudge factor’]

This evaluation is of course highly subjective, but I hope you will find it of some interest. One highlight for me was when both Gemini 3.0 Pro ‘Deep Think’ and ChatGPT 5.2 ‘Extended Thinking’ rated my solution to 12535 as \alpha- for ‘aesthetic merit’. You can read Gemini’s solution verbatim in the evaluation report linked above.

Before embarking on this project, I was surprised that I hadn’t already seen many similar exercises from other better mathematicians … but half-way through I realised how time-consuming it was, and my surprise ceased. But Gemini and ChatGPT are tools, and like any tool, we need to learn how to use them, double-edged though they may be. In this spirit, I know I have benefitted in my career from being an enthusiast for computer algebra, and for programming; for instance Haskell changed how I think about monads.

Thinking of LLMs as just another tool in our toolkit, we should be honest about their strengths as well as pointing out their weaknesses (laughter optional, but sometimes irresistible). Some strengths I can identify are their

  • encyclopedic command of the basic literature;
  • ability to make connections and find analogies between different areas of mathematics;
  • ability, depending on the subfield, to do maths: of the seven problems, Gemini gave essentially correct solutions to six.

In particular I have found Gemini and its colleagues excellent at definite integration (see 12534, but please also read the aside: ‘How LLMs sometimes cheat at definite integrals’) and at proving identities in enumerative combinatorics (see my write-up for 12535, but I have also had success with significantly harder problems). The weaknesses we all know: see my write-up of 12531 for an example of the superficially plausible ‘proof’ that they love to produce, complete with (mostly) spurious literature references, that turns out on time-consuming inspection to be fundamentally flawed. And as the ‘fun’ column shows, I am confident that in a narrow majority of cases, I would have enjoyed the problem more if I hadn’t used the LLM at all.

I think this is what really worries me: when in a year or so, state-of-the-art LLMs can reliably solve almost every American Mathematical Monthly problem, and even write aesthetically pleasing proofs of the binomial identities I like so much, what place will be left for mathematicians like me who enjoy the shallows as well as the depths of mathematics? Well, this is my problem, and there could be worse ones to have.

Finally, I hope I am not opening up a huge can of worms with this blogpost. For completeness, I will mention, although it is anyway on public record, that I contributed maths problems to the FrontierMath benchmark, organized by Epoch AI. I also set and solve maths problems for Surge AI; they have nothing to do with this post. I have no connection with Google or OpenAI and I no longer do any work for Epoch AI.


The q-Vandermonde Theorem by counting subspaces

October 22, 2025

Fix throughout a prime power q. We define the q-binomial coefficient \binom{d}{a}_q to be the number of a-dimensional subspaces of \mathbb{F}_q^d. If a < 0 or a > d then, by definition, the q-binomial coefficient is zero. The purpose of this post is to use this setting to prove one version of the q-Vandermonde convolution: en route we prove three results of increasing generality counting complementary subspaces.

In the proofs below we use the notation V_b for a varying b-dimensional subspace that is counted in part of the proof; if the subspace is instead fixed, often as part of the hypotheses, we use E_b.

Lemma. Let E_a be a given a-dimensional subspace of \mathbb{F}_q^{a+b}. The number of b-dimensional subspaces of \mathbb{F}_q^{a+b} complementary to E_a is q^{ab}.

Proof. Let e^{(1)}, \ldots, e^{(a+b)} be the canonical basis of \mathbb{F}_q^{a+b} of unit vectors. We may assume without loss of generality that E_a is spanned by the first a unit vectors. If V_b is a complementary subspace then V_b has a unique basis (up to order) of the form

u^{(1)} + e^{(a+1)}, \ldots, u^{(b)} + e^{(a+b)}

where u^{(1)},\ldots, u^{(b)} \in E_a. There are q^a choices for each of u^{(1)}, \ldots, u^{(b)} and so q^{ab} subspaces in total. \Box

If the claim about the basis is not obvious, I suggest thinking about row-reduction on the matrix whose rows are e^{(1)}, \ldots, e^{(a)} followed by a basis for V_b. We already have e^{(1)}, \ldots, e^{(a)} with pivots in the first a columns, so these rows are already perfect. Each of the remaining b columns must be a pivot column, so after row operations (including reordering) on rows a+1, \ldots, a+b we obtain a matrix

\left( \begin{matrix} I_a & 0 \\ M & I_b \end{matrix} \right)

in which different choices for the matrix M correspond to distinct subspaces V_b.
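For a tiny sanity check, take a = b = 1 and q = 2: the line E_1 = \langle e^{(1)} \rangle in \mathbb{F}_2^2 has exactly q^{ab} = 2 complements, namely \langle e^{(2)} \rangle and \langle e^{(1)} + e^{(2)} \rangle, corresponding to the choices M = (0) and M = (1).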

Proposition. Let E_a be a given a-dimensional subspace of \mathbb{F}_q^d. The number of b-dimensional subspaces V_b of \mathbb{F}_q^d such that E_a \cap V_b = 0 is \binom{d-a}{b}_q q^{ab}.

Proof. Observe that E_a \oplus V_b is an (a+b)-dimensional subspace of \mathbb{F}_q^d containing E_a. Such subspaces are in bijection with the b-dimensional subspaces of \mathbb{F}_q^d / E_a. Hence there are \binom{d-a}{b}_q choices for an (a+b)-dimensional subspace V_{a+b} in which E_a and the required V_b are complementary. By the previous lemma, there are q^{ab} choices for the complement V_b in V_{a+b}. \Box

In the following corollary note that, by hypothesis, all the b-dimensional subspaces we count contain the given r-dimensional subspace E_r.

Corollary. Let E_r \subseteq E_s be given r- and s-dimensional subspaces of \mathbb{F}_q^d. The number of b-dimensional subspaces V_b of \mathbb{F}_q^d such that V_b \cap E_s = E_r is \binom{d-s}{b-r}_q q^{(b-r)(s-r)}.

Proof. By the previous proposition, applied to \mathbb{F}_q^d / E_r, there are \binom{(d-r) - (s-r)}{b-r}_q q^{(b-r)(s-r)} subspaces of dimension b-r meeting E_s / E_r trivially. Now again use that subspaces of the quotient space \mathbb{F}_q^d/E_r are in bijection with subspaces of \mathbb{F}_q^d containing E_r: the subspaces V_b/E_r just counted correspond exactly to the V_b with V_b \cap E_s = E_r. \Box

Theorem. [q-Vandermonde theorem] We have

\binom{d}{b}_q = \sum_k q^{k(d-\ell-b+k)} \binom{\ell}{k}_q \binom{d-\ell}{b-k}_q.

Proof. Fix a (d-\ell)-dimensional subspace E_{d-\ell} of \mathbb{F}_q^d. We count the subspaces V_b \subseteq \mathbb{F}_q^d whose intersection with E_{d-\ell} has dimension b-k. There are \binom{d-\ell}{b-k}_q choices for the intersection E_{b-k} \subseteq E_{d-\ell} and, by the corollary, taking r = b-k and s = d-\ell, there are \binom{\ell}{k}_q q^{k(d-\ell-b+k)} subspaces V_b with V_b \cap E_{d-\ell} = E_{b-k}. Summing over k gives the result. \Box

In the final step, we may assume that k \in \{b-d+\ell, \ldots, \min(b,\ell)\}, else V_b cannot have an intersection of dimension b-k with E_{d-\ell}. Correspondingly, the summands for k not in this range in the q-Vandermonde theorem are zero.

Corollary. [q-Vandermonde theorem, alternative version] We have

\binom{d}{b}_q = \sum_k q^{(b-k)(\ell-k)}\binom{\ell}{k}_q\binom{d-\ell}{b-k}_q

Proof. Substitute \ell' = d- \ell and k' = b - k in the original version, cancel in a few places, swap the order of the binomial coefficients in each product, and then erase the primes. \Box
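Both versions are easy to check by machine. Here is a minimal Haskell sketch (with ad hoc names, and computing the q-binomial coefficient by the q-Pascal rule \binom{n}{k}_q = q^{n-k}\binom{n-1}{k-1}_q + \binom{n-1}{k}_q rather than by counting subspaces) that tests the first version at an integer value of q:

qbinom :: Integer -> Int -> Int -> Integer
qbinom q n k
  | k < 0 || k > n = 0
  | k == 0         = 1
  | otherwise      = q ^ (n - k) * qbinom q (n - 1) (k - 1) + qbinom q (n - 1) k

-- the first version of the q-Vandermonde theorem, for 0 <= b, ell <= d
qVandermonde :: Integer -> Int -> Int -> Int -> Bool
qVandermonde q d b ell = qbinom q d b == sum
  [ q ^ (k * (d - ell - b + k)) * qbinom q ell k * qbinom q (d - ell) (b - k)
  | k <- [max 0 (b - d + ell) .. min b ell] ]

-- e.g. and [qVandermonde 2 d b ell | d <- [0..8], b <- [0..d], ell <- [0..d]]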


The continuation monad: a mathematical introduction

August 17, 2025

In the Haskell category, objects are types, and morphisms are Haskell functions. The continuation monad for a fixed return type r is the functor m sending a type a to the type m a of functions (a -> r) -> r. The purpose of this post is to use vector space duality to motivate the definitions of the monadic operations return :: a -> m a and join :: m m a -> m a. We show that

  • return, the unit in the monad, is the analogue of the canonical inclusion of a vector space in its double dual, and
  • monad multiplication is the analogue of the canonical surjection V^{\star\star\star\star} \twoheadrightarrow V^{\star\star} induced by applying the unit on V^\star to get a canonical inclusion V^\star \hookrightarrow V^{\star\star\star}.

Having constructed the continuation monad, and checked one of the two monad laws, we then use the Haskell identity

ma >>= f = join (fmap f ma)

to define the monadic bind operation. (Here fmap :: (a -> c) -> m a -> m c is what the continuation monad does on morphisms.) By unpacking this definition into one that is self-contained, we interpret monadic bind, usually denoted >>=, as a way to compose computations written in continuation passing style.

Thus the order in this post is the opposite of most (all?) of the many good introductions to the continuation monad that can be found on the web: I hope this will make this post of interest to mathematicians, and also perhaps to expert computer scientists as an illustration of how a non-expert mathematician came (at last) to some understanding of the ideas. We end with a few examples of Haskell programs written in the continuation monad, or equivalently, in continuation passing style.

A Haskell module with all the Haskell below and some extras is available from my website.

The double dual monad

Fix a field \mathbb{F} and let Vect be the category of \mathbb{F}-vector spaces. There is a canonical inclusion V \hookrightarrow V^{\star\star} defined by

v maps to ‘evaluate at v’

or in symbols

v \stackrel{\eta_V}{\longmapsto} [\theta \mapsto \theta(v)]\qquad(\star)

for v \in V and \theta \in V^\star. Thus (\star) defines a natural transformation \eta from the identity functor on Vect to the double dual functor T. This will turn out to be the unit in the double dual monad.

Remark. In general V^{\star\star} is a vastly bigger vector space than V: for instance if V is the vector space of all \mathbb{F}-valued sequences with finite support, then V^{\star} is the vector space of all \mathbb{F}-valued sequences. Thus V has a canonical countable basis, whereas V^\star has dimension equal to the continuum \mathfrak{c}. Moreover V^{\star\star} has dimension 2^\mathfrak{c}. (Even the existence of bases for V^{\star} and V^{\star\star} depends on set theoretic foundations: when \mathbb{F} = \mathbb{F}_2 it is implied by the Boolean Prime Ideal Theorem, a weak form of the Axiom of Choice.) For this reason we have to reason formally, mainly by manipulating symbols, rather than use matrices or bases. But this is a good thing: for instance it means our formulae for the monad unit and monad multiplication translate entirely routinely to define the continuation monad in the category Hask.

The unit tells us how to multiply

To make the double dual functor a monad, we require multiplication, that is, a natural transformation \mu : T^2 \rightarrow T. Thus for each vector space V, we require a ‘naturally defined’ linear map \mu_V : V^{\star\star\star\star} \rightarrow V^{\star\star}. As already said, we obtain this as the dual map to \eta_{V^\star} : V^\star \rightarrow V^{\star\star\star}. That is \mu_V = \eta_{V^\star}^\star.

This needs some unpacking. Explicitly, \eta_{V^\star} is the linear map

\theta \stackrel{\eta_{V^\star}}{\longmapsto} [\phi \mapsto \phi(\theta)]

for \theta \in V^{\star} and \phi \in V^{\star\star}. Generally, given a linear map f : U \rightarrow W, the dual map is given by ‘restriction along f’: that is, we regard a dual vector \theta \in W^\star as an element of U^\star by mapping U into W by f, and then applying \theta; in symbols f^\star (\theta) = \theta \circ f. Therefore the dual map to \eta_{V^\star}, which is the V component of the natural transformation \mu, is

w \stackrel{\mu_V}{\longmapsto} [\theta \mapsto w(\eta_{V^\star}(\theta))] = [\theta \mapsto w([\phi \mapsto \phi(\theta)])]  \quad (\dagger)

If you have an intuitive feeling for what this means, well great. I hope in a later post to show that \mu has an intuitive interpretation in the case \mathbb{F} = \mathbb{F}_2 using propositional logic to identify elements of

\mathcal{P}(\mathcal{P}(X)) \leftrightarrow \{0,1\}^{\mathcal{P}(X)} \leftrightarrow \{0,1\}^{\{0,1\}^X}

with boolean formulae in variables indexed by X; multiplication then becomes the observation that interpreting variables in a boolean formula as yet further boolean formulae does not leave the space of boolean formulae. Here I will note that the definition of \mu_V in (\dagger) does at least typecheck: \phi \in V^{\star\star} so the map \phi \mapsto \phi(\theta) is an element of V^{\star\star\star} and therefore in the domain of w : V^{\star\star\star} \rightarrow \mathbb{F}. Naturality of \mu follows very neatly from the naturality of \eta by duality: see the commutative diagrams below.

[Commutative diagrams: naturality of \mu deduced from naturality of \eta]

But why is it a monad?

What we haven’t shown is that the triple (T, \eta, \mu) defines a monad. For this we need to check the monad laws shown in the commutative diagrams below.

[Commutative diagrams: the identity and associativity laws for the monad (T, \eta, \mu)]

The first is the identity law that \mu_V \circ \eta_{V^{\star\star}} = \mathrm{id}_{V^{\star\star}}. This has a short proof, which, as expected from the remark about formalism above, comes down to juggling the order of symbols until everything works out. Recall that \eta_{U} is the canonical embedding of U into U^{\star\star}. Thus if we take \phi \in V^{\star\star} then \eta_{V^{\star\star}}(\phi) is the element of V^{\star\star\star\star} defined by

\Psi \mapsto \mathrm{ev}_\phi(\Psi) = \Psi(\phi).

Moreover, \mu_V is defined by regarding elements of V^{\star\star\star\star} as elements of V^{\star\star} by restriction along the canonical inclusion \eta_{V^\star} : V^{\star} \rightarrow V^{\star\star\star}. This map sends \theta \in V^{\star} to the linear map [\phi \mapsto \phi(\theta)] in V^{\star\star\star}. Therefore

\begin{aligned} (\mu_V \circ \eta_{V^{\star\star}})(\phi) &= \mu_V([\Psi \mapsto \Psi(\phi)]) \\&= [\theta \mapsto (\eta_{V^\star}(\theta))(\phi)] \\ &= [\theta \mapsto \phi(\theta)] \\&= \phi \end{aligned}

and so \mu_V \circ \eta_{V^{\star\star}} is indeed the identity of V^{\star\star}.

A proof in this style of the second law, associativity, seems distinctly fiddly. Making the feeble excuse that there is no ‘super case’ Greek alphabet to represent elements of V^{\star\star\star\star\star}, I will omit it. I plan in a sequel to this post to show that T(V) = V^{\star\star} is the monad induced by the adjunction between the contravariant dual functor and itself; the identity monad law then becomes the triangle law (D\epsilon) \circ (\eta D) = \mathrm{id}_D for the right-hand functor D : \mathbf{Vect} \rightarrow \mathbf{Vect}^\mathrm{op} in the adjunction, and the associativity law becomes an immediate application of the interchange law. If time permits I’ll include the proof by string diagrams: to my surprise I can’t immediately find this on the web, except in this nice video from the Catsters: see the blackboard at the 9 minute mark.

But how could it not be a monad?

As a final remark, for either law one could argue ‘how else could it possibly work?’ Thus in the first, we know that (\mu_{V} \circ \eta_{V^{\star\star}})(\phi) is some kind of linear map V^{\star} \rightarrow \mathbb{F}, and since all we have to hand is \phi \in V^{\star\star}, maybe it should be obvious from naturality that the map is \phi itself? It would be intriguing if this could be made into a rigorous argument either working in Vect or directly in Hask, perhaps using ideas from Wadler’s paper Theorems for free.

The continuation monad in Haskell

We now translate the definition of the unit and multiplication from the double dual monad into Haskell. This is completely routine. For the unit, we translate \eta_V :V \rightarrow V^{\star\star}, as defined in (\star) by \eta_V(v) = \mathrm{ev}_v to

return x = \k -> k x -- return :: a -> ((a -> r) -> r)

It might be more idiomatic to write this as return = flip ($). Either way we get return :: a -> ((a -> r) -> r).

You can join if you can return

For multiplication, we translate \mu_V : V^{\star\star\star\star} \rightarrow V^{\star\star}, as defined in (\dagger) by \mu_V(w) = [\theta \mapsto w(\eta_{V^\star}(\theta))] to

join :: ((((a -> r) -> r) -> r) -> r) -> ((a -> r) -> r)
join w = \k -> w (return k)

Note that this uses return, capturing the sense seen above in which the unit for the double dual monad tells us how to define multiplication. Unpacked to be self-contained join becomes

join w = \k -> w (\φ -> φ k)

which thanks to Haskell’s support for Unicode is still valid code. It is a useful exercise to prove the identity monad law by equation-style reasoning from these definitions.
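For the first law, writing m for an arbitrary continuation runner, one such chain of equations is

join (return m)
  = \k -> (return m) (\φ -> φ k)     -- definition of join
  = \k -> (\k' -> k' m) (\φ -> φ k)  -- definition of return
  = \k -> (\φ -> φ k) m              -- beta reduction
  = \k -> m k                        -- beta reduction
  = m                                -- eta reduction

which is the Haskell form of \mu_V \circ \eta_{V^{\star\star}} = \mathrm{id}_{V^{\star\star}}.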

You can bind if you can join

We now use the Haskell identity ma >>= f = join (fmap f ma) mentioned in the introduction to define bind. First of course we must define fmap, defining the Functor instance for the continuation monad. For the double dual functor with f : V \rightarrow W, we have f^\star (\theta) = \theta \circ f and so f^{\star\star} (\phi) = \phi \circ f^\star or unpacked fully,

f^{\star\star}\phi = [\theta \mapsto \phi(f^\star (\theta))] = [\theta \mapsto \phi(\theta \circ f)].

We therefore define

fmap :: (a -> b) -> ((a -> r) -> r) -> (b -> r) -> r
fmap f run_a = \k_b -> run_a (k_b . f)

and the compiler will now accept

bind run_a a_to_run_b = join $ fmap a_to_run_b run_a

(For why run_a is a reasonable name for values of the monadic type (a -> r) -> r see the final part of this post.) It is now a mechanical exercise to substitute the definitions fmap f run_a = \k_b -> run_a (k_b . f) and then join w = \k -> w (\φ -> φ k) to get a self-contained definition, which we tidy up in three further stages.

bind1 run_a a_to_run_b = join $ \k_b ->
           run_a (k_b . a_to_run_b)
      where join w = \k -> w (\φ -> φ k)

bind2 run_a a_to_run_b = \k -> (\k_b ->
      run_a (k_b . a_to_run_b))(\φ -> φ k)

bind3 run_a a_to_run_b =
      \k -> run_a ((\φ -> φ k) . a_to_run_b)

bind4 run_a a_to_run_b =
      \k -> run_a (\x -> (a_to_run_b x) k)

In the final line, the parentheses around a_to_run_b x could be dropped. Incidentally, with the DeriveFunctor extension, ghc can derive the declaration for fmap just seen.

But don’t monads need an algebraic data type?

Answer: yes if we want to take advantage of Haskell’s do notation, and be consistent with Haskell’s expected types for return, fmap, join and >>=. Below we define an algebraic data type RunCont with inner parameter r for the return type and outer parameter a, so that the ‘curried type’ RunCont r has kind * -> *, as a monad should. The constructor is called RC (it is also common to reuse the type name as the constructor) and the ‘wrapped value’ can be extracted using runCont :: RunCont r a -> (a -> r) -> r.

newtype RunCont r a = RC {runCont :: (a -> r) -> r}

For instance if a_to_run_b :: a -> RunCont r b then runCont (a_to_run_b x) has type (b -> r) -> r so is the correct replacement for a_to_run_b in bind4. Wrapping return is even more routine, and we end up with

instance Monad (RunCont r) where
  return x = RC (\k -> k x)
  RC run_a >>= a_to_run_b = RC (\k ->
    run_a (\x -> runCont (a_to_run_b x) k))

It is a nice fact that bind determines both fmap and join, so, after the unpacking above to avoid using fmap in the definition of bind, everything we need is in the declaration above. That said, versions of GHC from 7.10 onwards require separate instance declarations for Functor, which to illustrate this point we define by

instance Functor (RunCont r) where
  fmap f (RC run_a) = (RC run_a) >>= (return . f)

and also for Applicative, which can be derived from bind, but since I find the result completely unintuitive, instead we’ll use the usual boilerplate

instance Applicative (RunCont r) where
  pure = return
  (<*>) = ap -- ap :: Monad m => m (a -> b) -> m a -> m b

recovering this instance from the Monad declaration. Incidentally, the Functor declaration also has a boilerplate definition: fmap = liftM. I can’t easily find on the web why the ghc authors made this code-breaking change (the page on the Haskell wiki has a dead link for a 2011 proposal along these lines that was rejected) but surely the rise of programming in the applicative style has something to do with it.

Why newtype?

I used newtype rather than data above to make RC strict in its value. Counterintuitively (to me at least) this means that overall the RunCont monad is lazier, because the compiler does not need to inspect the argument to RC to check it has type (a -> r) -> r. Ever since I implemented the Random state monad with data rather than newtype, and many weeks later, found a puzzling instance of what seemed like Haskell being over-eager to evaluate, I have been able to remember this distinction. An alternative to using newtype, which emphasises this point, is data RunCont r a = RC {runCont :: !((a -> r) -> r)}. Note the explicit strictness declaration. There is a further benefit to newtype over data, that the strictness and unique constructor together mean that the compiler can erase the constructors and so generate code that has no overhead; a related fact, which at first I found rather surprising, is that fully compiled Haskell code has ‘no remembrance of types past’.

Continuation passing style is the computational model of the continuation monad

Not before time we mention that the type a -> r is that of continuations: functions that take an input of type a and return the desired final output type of the entire computation, which we always denote r.

Callbacks

Such functions arise naturally as callbacks: ‘function holes’ left in the anticipation that another author or library will be able to fill in the gap. Consider for instance the following program intended to be useful when checking the Fibonacci identity \sum_{k=0}^n F_k = F_{n+2} - 1.

fibSum n = sum [fib k | k <- [0..n]]

Of course this won’t compile, because there is no fib function to hand. Rather than take time off to write one, we instead press on, by bringing fib into scope in the simplest possible way:

run_fibSum n fib = sum [fib k | k <- [0..n]]
    -- run_fibSum :: Int -> (Int -> Integer) -> Integer

Here I’ve supposed fib has type Int -> Integer; this is reasonable because the answer might be more than 2^{64} but surely the input won’t be; the true reason, though, is to make a distinction between the types a (here Int) and r (here Integer). Thus run_fibSum is a way of running the continuation fib to give the answer we want, and \n -> RC (run_fibSum n) is a value of type Int -> RunCont Integer Int, so suitable for use in our monad, most obviously as the second argument to bind >>=. For a minimal example, with exactly the same structure, take

run_fib n fib = fib n
    -- run_fib :: Int -> (Int -> Integer) -> Integer
run_fib' n = \fib -> fib n
    -- run_fib' :: Int -> (Int -> Integer) -> Integer
run_fib'' n = RC (\fib -> fib n)
    -- run_fib'' :: Int -> RunCont Integer Int

where the final function could have been written simply as return n; as we have seen many times, the unit in the continuation monad is ‘x maps to evaluate at x’. In this form we see clearly that run_fib doesn’t really care that the intended continuation is the Fibonacci function; it could be any value of type Int -> Integer. I hope this explains why I’ve used RunCont as the name for the monadic type: its wrapped values are continuation runners of type (a -> r) -> r, and not continuations, of type a -> r. To my mind, using Cont is as bad as conflating elements of V^{\star} (continuations) with elements of V^{\star\star} (continuation runners).
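To make the callback story concrete, here is a quick check, with a deliberately naive Fibonacci function standing in for whatever the library author eventually supplies:

fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

-- e.g. run_fibSum 10 fib and fib 12 - 1 are both 143
checkFibSum :: Int -> Bool
checkFibSum n = run_fibSum n fib == fib (n + 2) - 1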

Continuation passing style

Consider the following very simple Haskell functions addOne x = x+1 and timesTwo y = y * 2 of intended type Int -> Int. We can compose them to get timesTwoAfterAddOne = timesTwo . addOne. To make life harder, we imagine that addOne has to control the entire future course of our program, and to help it out, we’ll give a suitably extended version of it the continuation timesTwo as the second argument:

addOneCPS x timesTwo = timesTwo (x + 1)

and so timesTwoAfterAddOne x = addOneCPS x timesTwo. Of course this generalizes immediately to any continuation of type Int -> r; below I’ve renamed timesTwo to k (as in the derivation of bind above), but the function and type are the same:

addOneCPS :: Int -> (Int -> r) -> r
addOneCPS x k = k (x + 1)

Similarly we can write timesTwo in this style,

timesTwoCPS :: Int -> (Int -> r) -> r
timesTwoCPS y k = k (y * 2)

and now a composed function, still in CPS style so having a continuation that might perhaps be as simple as id :: Int -> Int (when r is Int) or show :: Int -> String (when r is String), is

timesTwoAfterAddOneCPS x k = addOneCPS x
                              (\y -> timesTwoCPS y k)

Even allowing for the extra argument, this is far less attractive than the simple composition timesTwo . addOne we had earlier. But I am sure that by now you see where this is leading. Above our final version of monadic bind for the continuation monad (written without an algebraic data type) was, changing the dummy variable from x to y to avoid an imminent clash,

bind4 run_a a_to_run_b = \k ->
    run_a (\y -> (a_to_run_b y) k)

Substitute addOneCPS x for run_a and timesTwoCPS for a_to_run_b and get

bind4 (addOneCPS x) timesTwoCPS = \k -> (addOneCPS x)
    (\y -> (timesTwoCPS y) k)

which is the \eta-expansion of timesTwoAfterAddOneCPS x. Therefore, when wrapped in the appropriate constructors, we can write timesTwoAfterAddOneCPS x = (addOneCPS x) >>= timesTwoCPS, and even use do notation:

timesTwoAfterAddOneCPS x =
  do y <- addOneCPS x
     -- addOneCPS x = RC (\k -> k (x + 1))
     timesTwoCPS y
     -- timesTwoCPS y = RC (\k -> k (y * 2))
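As a quick sanity check, assuming the wrapped definitions in the comments above:

runCont (timesTwoAfterAddOneCPS 3) id    -- 8, taking r to be Int
runCont (timesTwoAfterAddOneCPS 3) show  -- "8", taking r to be String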

Recursion in continuation passing style

I’ll use the sum function as an example, even though it does not show Haskell in its best light. In fact, it is a notable ‘gotcha’ that both naive ways to write it use an amount of memory proportional to the length of the list. Worst of all is the right-associative fold

sumR xs = foldr (+) 0 xs

which means that sumR [1,2,3,4] expands to 1 + (2 + (3 + (4+0))) and then 4+0 is evaluated, and so on. (This can be checked using the linked Haskell code, which puts a trace on plus.) Still bad, but more easily fixed, is the left-associative fold

sumL xs = foldl (+) 0 xs

which means that sumL [1,2,3,4] expands to (((0+1) + 2) + 3) + 4, and then 0+1 is evaluated, and so on. The correct way to write it uses a strict left-fold, available as foldl'; one can write one’s own using the primitive seq, which tells the compiler ‘I really want the first value computed to WHNF before you do anything more’

foldl' op init [] = init
foldl' op init (x : xs) = let next = init `op` x
    in seq next (foldl' op next xs)

One could also replace init above with !init to force evaluation without using seq. Either way, the init value functions as an accumulator for the sum. In this connection it is worth noting that, for the sum function on common numerical types, WHNF means complete evaluation, but in general one has to fight harder to force full evaluation. For more on these issues see the post beginning To foldr, foldl or foldl’, that is the question! on the Haskell wiki.

The right-associative fold is easily converted to CPS-style:

sumRCPS [] k = k 0
sumRCPS (x : xs) k = sumRCPS xs (k . (+ x))

Thus if the list is empty we apply the continuation to 0, and otherwise we modify the continuation so that it first adds x, then does whatever it was going to do before. Thus sumRCPS [1,2,3,4] id evaluates to sumRCPS [2,3,4] (id . (+1)), and then in three further steps to sumRCPS [] (id . (+1) . (+2) . (+3) . (+4)); now, as already seen, the continuation constructed above is evaluated at 0, giving id . (+1) . (+2) . (+3) . (+4) $ 0 which in turn evaluates to (((0 + 4) + 3) + 2) + 1. Note that although this might look like a left-associative fold, it does the same computation as the right-associative fold.

The only way to convert the left-associative fold to CPS-style that I can see does all the work outside the continuation

sumLCPS [] acc k = k acc
sumLCPS (x : xs) acc k = sumLCPS xs (acc + x) k

where again either seq or a strictness annotation !acc is needed to avoid a space blow-up. So rather disappointingly, CPS-style is no help at all in writing an efficient sum function.

An example where CPS-style is useful

The function takeUntil has the following naive implementation:

takeUntil :: (a -> Bool) -> [a] -> Maybe [a]
takeUntil p [] = Nothing
takeUntil p (x : xs)
    | p x = Just []
    | otherwise = x `mCons` takeUntil p xs
  where x `mCons` Nothing = Nothing
        x `mCons` Just ys = Just (x : ys)

For example, takeUntil (> 4) [1,2,3] evaluates to 1 `mCons` 2 `mCons` 3 `mCons` Nothing, and then to 1 `mCons` 2 `mCons` Nothing, then to 1 `mCons` Nothing and finally to Nothing. (Nothing will come of nothing, as King Lear so inaptly said.) Using continuation passing style, we can short-circuit the evaluation in the ‘Nothing’ case.

takeUntilCPS :: (a -> Bool) -> [a] ->
    (Maybe [a] -> Maybe [a]) -> Maybe [a]
takeUntilCPS p [] _ = Nothing
takeUntilCPS p (x : xs) k
    | p x = k (Just [])
    | otherwise = let k' (Just ys) = k $ Just (x : ys)
                  in (takeUntilCPS p xs k')

Note we do not need a pattern match on ‘Nothing’ to define k' because if the input was Nothing then we’d have discarded the continuation and immediately returned Nothing. This is the first and only example in this post where we use the freedom in continuation passing style not just to modify the continuation but to discard it entirely.
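For example, with id as the initial continuation:

takeUntilCPS (> 4) [1,2,3,5,6] id  -- Just [1,2,3]: at 5 we apply k to Just []
takeUntilCPS (> 4) [1,2,3] id      -- Nothing, returned at once: k is discarded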

Final thought on continuations

Finally, as far as I can see, continuation passing style isn’t really very important in Haskell, and some comments I’ve read suggest it can even be harmful: the linked answer mentions that for GHC,

‘The translation into CPS encodes a particular order of evaluation, whereas direct style does not. That dramatically inhibits code-motion transformations’.

On the other hand, continuations are excellent for stretching the mind, and illuminating how programming languages really work. Hence, no doubt, their ubiquity in computer science courses, and the many tutorials on the web. In this spirit, let me mention that, as in the Yoneda Lemma, if we get to pick the return type r in RunCont r a then there is a very simple hack to show that any continuation runner really is a hidden value of type a (this is a common metaphor in web tutorials), which we can extract by

counit :: RunCont a a -> a
counit (RC run_a) = run_a id

This might suggest an alarmingly simple way to implement the continuation monad: just choose the output type r to be a, extract the value, and do the obvious. But of course the code below only type checks if we (as a text substitution) replace r with a, and so, as I’ve seen many times, somehow the syntactical type checker can guide us away from what is semantically incorrect.

instance Monad (RunCont r) where
  return x = RC (\k -> k x)
  (RC run_a) >>= g = let x = run_a id in g x
  -- • Couldn't match type ‘a’ with ‘r’
  --   ‘a’ is a rigid type variable bound by ...


Sampling from the tail end of the normal distribution

February 3, 2025

The purpose of this post is to present two small observations on the normal distribution, with more speculative applications to academic recruitment. The mathematical part at least is rigorous.

Tails and the exponential approximation

We can sample from the tail end of the standard normal distribution N(0,1) by sampling from N(0,1) as usual, but then rejecting all samples that are not at least some threshold value t. From my mental picture of the bell curve, I think it is intuitive that the mean of the resulting sample will be not much more than t. In fact the mean is

\displaystyle t + \frac{1}{t} - \frac{2}{t^3} + \mathrm{O}(\frac{1}{t^4}).

Please pause for a moment and ask yourself: do you expect the variance of the tail-end sample to increase with t or instead to decrease with t?

The answer, less intuitively I think, is that the variance decreases as 1/t^2. More precisely, it is

\displaystyle \frac{1}{t^2} - \frac{6}{t^4} + \mathrm{O}(\frac{1}{t^6}).

Thus, the larger t is, the more concentrated is the sample. Given that one expects tail events to be rare, and so perhaps more variable, this is maybe a surprise. One can derive the series expansions above by routine but fiddly calculations using the probability density function \frac{1}{\sqrt{2\pi}} \exp(-x^2/2) of the standard normal distribution.

Alternatively, and I like that this can be done entirely in one’s head, observe that for large t and small h, we have

\displaystyle \frac{1}{\sqrt{2\pi}} \exp(-(t+h)^2/2 ) \approx \frac{1}{\sqrt{2\pi}} \exp(-t^2/2) \exp(-th)

and so the normal distribution conditioned on x \ge t should be well approximated by an exponential distribution with parameter t, shifted by t. The probability density function of the unshifted exponential distribution with parameter t is t \exp(-th), and, correspondingly, as is well known, its mean and standard deviation are both 1/t. Thus this approximation suggests that the mean of the tail distribution should be t + 1/t and its variance 1/t^2, agreeing with the two asymptotic series above.
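The asymptotics are also easy to check numerically. In the following Haskell sketch (a rough check rather than anything needed for the argument) the tail integrals are truncated at t + 12 and computed by a crude trapezium rule.

phi :: Double -> Double
phi x = exp (-x * x / 2) / sqrt (2 * pi)

-- trapezium rule for f on [a, b] with n steps
integrate :: (Double -> Double) -> Double -> Double -> Int -> Double
integrate f a b n = h * ((f a + f b) / 2 + sum [f (a + h * fromIntegral i) | i <- [1 .. n - 1]])
  where h = (b - a) / fromIntegral n

-- mean and variance of N(0,1) conditioned on the sample being at least t
tailMoments :: Double -> (Double, Double)
tailMoments t = (mean, meanSq - mean * mean)
  where z      = integrate phi t (t + 12) 100000
        mean   = integrate (\x -> x * phi x) t (t + 12) 100000 / z
        meanSq = integrate (\x -> x * x * phi x) t (t + 12) 100000 / z

-- tailMoments 5 is roughly (5.1865, 0.0327); compare t + 1/t = 5.2 and 1/t^2 = 0.04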

To make this explicit, the table below shows the percentage of ‘outlier’ samples that are t+1 or more standard deviations from the mean, in a sample conditioned to be at least t standard deviations from the mean.

t            0     1     2     3     4     5
% outliers   31.7  14.3  5.9   2.3   0.9   0.3

The University of Erewhon

In the selection process for the University of Erewhon, applicants are invited to sit a fiendishly difficult exam. The modal score is zero (these candidates are of course rejected) and the faculty recruits the tiny minority who exceed the pass mark. The result, of course, is that the faculty recruits not the best mathematicians, but those best at passing their absurd test. (Nonetheless there is a long queue at the examination centre each year: people want jobs!) But more deleteriously, the effect is to create a monoculture: modelling exam-taking ability as an N(0,1) random variable, the analysis above shows that sampling with threshold t gives a distribution in which almost everyone scores about t + 1/t. (We saw the standard deviation was 1/t up to terms of order 1/t^2.) And of course t is chosen so large that only those able to devote years of their life to preparing for the exam, to the exclusion of all other activities, have any chance of passing. And thus the dispiriting result is that all candidates are not only rather alike in their test-taking ability, but the selection by this one statistic creates a culture entirely dominated by the branches of mathematics and the personality traits favoured by the test. The table above shows it also creates the delusion that most people are about as good at maths (or at least at the Erewhonian test) as you: for example, if t = 3 and you have 50 colleagues, then since only 2.3% of the conditioned sample are outliers, lying in the \ge 4 tail, there is a good chance that none of your colleagues is more than 1 standard deviation ahead of you in ability.

I have gone through selection processes in my life that had something in common with this obvious parody (often I fear unjustly benefitting), and believe I can write from experience that the result has never been good. The alternative of course is to use multiple criteria when recruiting, allowing that some successful candidates may be weak in some areas, while very strong in others. Rather obvious, I know, but I think it’s interesting that a very simple mathematical argument backs up this conclusion.

In the next part of the post we’ll see that the effect of luck can mean that Erewhonian style recruitment not only creates a monoculture, it doesn’t even successfully select the most talented candidates.

The effect of luck

If our intuition about tail events is perhaps suspect, our intuition for conditional probability is surely worse. When the two come together, by conditioning on rare events, the effects can be catastrophic. Selection bias is frequently overlooked even by those who should know better. It seems quite possible to me that we’re here today only because in the many worlds in which the delicate balance of nuclear deterrence failed, there is no-one left in the charred embers to read blog posts on selection bias. Or to pick a less extreme example, would you spend £1000 to attend a seminar given by a panel of lottery winners in which they promise to reveal the detail of their gambling strategies?

Returning to the normal distribution, and the University of Erewhon, suppose that with probability 1-p we sample from N(0,1) (people with average luck), and with probability p we sample from N(0,1) + \ell (lucky people). Suppose that \ell is small, say \ell = 1/2, so the effect of luck is relatively modest, boosting one’s performance by just 1/2 a standard deviation. As before, we then reject all samples below the threshold t. Does the effect of luck become more or less significant as t increases?

I think here most people would intuitively guess the right answer, but the magnitude of the effect is striking. The table below shows the number of ‘average’ and ‘lucky’ people when we take a million samples according to the scheme above, keeping only the samples that are at least t. We suppose that there is a probability p = 1/5 of being lucky. The final column is the ratio of lucky to average people.

t      average   lucky   ratio
1       127064   61573    0.48
1.5      53375   31774    0.60
2        18361   13070    0.71
2.5       4924    4562    0.93
3         1054    1176    1.12
3.5        187     250    1.34
4           24      51    2.13

Thus even though average people outnumber lucky people four to one, for a ratio of 0.25, when we sample with threshold t = 3 (with, as just explained, a boost of 0.5 for lucky people), lucky people predominate. The effect is that the University of Erewhon fails in its goal of recruiting people three standard deviations above the mean: instead, of the 2230 successful candidates, 1176 are people who are merely about 2.5 standard deviations above the mean, but happened to get lucky.
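The simulation is easy to reproduce. Here is a minimal Haskell sketch, assuming the random package for System.Random and using the Box–Muller transform for the normal samples:

import System.Random (randomRIO)

-- one N(0,1) sample by the Box-Muller transform
normalIO :: IO Double
normalIO = do
  u1 <- randomRIO (1e-12, 1.0)
  u2 <- randomRIO (0.0, 1.0)
  return (sqrt (-2 * log u1) * cos (2 * pi * u2))

-- count (average, lucky) people among n samples clearing the threshold t,
-- where luck has probability p and adds boost to the score
tally :: Int -> Double -> Double -> Double -> IO (Int, Int)
tally n p boost t = go n (0, 0)
  where
    go 0 counts = return counts
    go m (a, b) = do
      lucky <- (< p) <$> randomRIO (0.0, 1.0)
      x <- normalIO
      let y = if lucky then x + boost else x
      go (m - 1) (if y < t then (a, b)
                  else if lucky then (a, b + 1)
                  else (a + 1, b))

-- tally 1000000 0.2 0.5 3.0 should give counts close to the t = 3 row above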

The effect is robust to different models for luck. For instance suppose, to be fair, we allow people to have a chance to be unlucky, and, to counterbalance this, model luck as an N(0,1/4) random variable, so the standard deviation is 1/2, equal to the effect size in the previous model. Since the sum of N(0,1) and N(0,1/4) random variables has the N(0,5/4) distribution, the effect is to increase the variance, while leaving the mean unchanged. The graphs below show the probability density function for the original N(0,1) random variable, conditioned on the sum being in the \ge t tail, for t \in \{2.5, 3, 3.5, 4\}.

[Graphs: probability density of the N(0,1) ability component, conditioned on the sum being in the \ge t tail, for t \in \{2.5, 3, 3.5, 4\}]

Observe that the density functions become slightly more concentrated, and while they shift to the right, the means are 2.31109, 2.67471, 3.04521 and 3.42096, significantly less in all cases than the threshold t. So, once again, the most obvious effect of being more selective is to increase the importance of being lucky, and we see the monoculture effect from the first scenario. The importance of luck is confirmed by the graph below showing the probability density function for luck, again conditioned on the sum being in the \ge t tail.

[Graph: probability density of the luck component, conditioned on the sum being in the \ge t tail, for the same thresholds]

It is striking that even for t = 2.5 (the blue curve) only 10.09% of the conditional probability mass is in the negative half. Almost everyone recruited gets lucky, most by at least one standard deviation from the mean and many by two or more standard deviations from the mean. The moral is again clear: unless you believe (as is sometimes inaccurately said of Napoleon) that luck is an intrinsic life-long characteristic that you genuinely want to select for, use multiple criteria when recruiting.


Lifting set/multiset duality

November 23, 2024

This is a pointer to a note in the same spirit as the previous post. This time we take n \in \mathbb{N} and lift the identity

\sum_{k=0}^n (-1)^k \left(\!\binom{n}{k}\!\right) \binom{n}{k} = 0

where \left(\!\binom{n}{k}\!\right) = \binom{n+k-1}{k} is the number of k-multisets of a set of size n, to a long exact sequence of polynomial representations of \mathrm{GL}_n(F), working over an arbitrary field F. We then descend to get quick proofs of generalizations of the symmetric function identity

\sum_{r=0}^n (-1)^r h_{n-r}(x_1,\ldots, x_D)e_r(x_1,\ldots, x_D) = 0

for D \ge n and the corresponding q-binomial identity, obtained by specializing at 1,q, \ldots, q^{D-1}, namely

\sum_{r=0}^n (-1)^r q^{r(r-1)/2} \genfrac{[}{]}{0pt}{}{n-r+D-1}{n-r}_q  \genfrac{[}{]}{0pt}{}{D}{r}_q = 0.
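At the level of numbers the first identity is easily machine-checked; for instance, in Haskell:

choose :: Integer -> Integer -> Integer
choose n k
  | k < 0 || k > n = 0
  | otherwise      = product [n - k + 1 .. n] `div` product [1 .. k]

multichoose :: Integer -> Integer -> Integer
multichoose n k = choose (n + k - 1) k

-- and [sum [(-1)^k * multichoose n k * choose n k | k <- [0..n]] == 0 | n <- [1..20]]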


Lifting the ‘stars and bars’ identity

September 21, 2024

Let \left(\!\binom{n}{k}\!\right) denote the number of k-multisubsets of an n-set.
A basic result in enumerative combinatorics is that

\left(\!\binom{n}{k}\!\right) = \binom{n+k-1}{k} \qquad (1)

One proof of this uses the method of stars and bars. The link will take you to Wikipedia; for a similar take on the method, interpreting \left(\!\binom{n}{k}\!\right) as the number of ways to put k unlabelled balls into n numbered urns, see Theorem 2.3.3 in my draft combinatorics textbook.

The purpose of this post is to give a combinatorial proof of the q-binomial version of this identity and then to show that this q-lift can be restated as an equality of two symmetric functions specialized at the powers of q, and so is but a shadow of an algebraic isomorphism between two representations of \mathrm{SL}_2(\mathbb{C}). I finish by explaining how my joint work with Eoghan McDowell shows that this isomorphism exists for representations of \mathrm{SL}_2(\mathbb{F}) over an arbitrary field \mathbb{F}, and then outlining a beautiful construction by Darij Grinberg of an explicit isomorphism, which (deliberately) agrees with the one in our paper. Each lifting step, indicated by increasing equation numbers, gives a more conceptual, and often more memorable result. It is not yet clear what formulation might deserve to be called the ‘ultimate lift’ of the stars and bars identity.

This earlier post has much more on the background multilinear algebra, and representation theory.

Subsets are enumerated by q-binomial coefficients

Given a subset X of \mathbb{N}_0, we define its weight, denoted \mathrm{wt}(X), to be its sum of entries. We shall adopt a slightly unconventional approach and define the q-binomial coefficient \genfrac{[}{]}{0pt}{}{n}{k}_q \in \mathbb{C}[q] to be the enumerator by weight of the k-subsets of \{0,1,\ldots, n-1\}, normalized to have constant term 1. That is,

\genfrac{[}{]}{0pt}{}{n}{k}_q = q^{-k(k-1)/2} \sum_{|X| = k,\, X \subseteq \{0,1,\ldots, n-1\}} q^{\mathrm{wt}(X)}.

(This property of q-binomial coefficients appears to me to be fundamental, but it is not always stressed in introductory accounts.) Set \genfrac{[}{]}{0pt}{}{n}{k}_q = 0 if n or k is negative. Clearly \genfrac{[}{]}{0pt}{}{n}{k}_q is a polynomial in q. For example

\begin{aligned} \genfrac{[}{]}{0pt}{}{4}{2}_q&= q^{-1}(q^{0+1} + q^{0+2} + q^{0+3} + q^{1+2} + q^{1+3} + q^{2+3}) \\ &= 1 + q + 2q^2 + q^3 + q^4, \end{aligned}

and \genfrac{[}{]}{0pt}{}{n}{0}_q = \genfrac{[}{]}{0pt}{}{n}{n}_q = 1 and \genfrac{[}{]}{0pt}{}{n}{1}_q  = 1 + q + \cdots + q^{n-1} for all n \in \mathbb{N}_0. We now recover the usual definition of \genfrac{[}{]}{0pt}{}{n}{k}_q. As a preliminary lemma we need a q-lift of Pascal’s Rule \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}, which we prove by adapting the standard bijective proof.

Lemma. If n \in \mathbb{N} and k \in \mathbb{N}_0 then

\genfrac{[}{]}{0pt}{}{n}{k}_q = q^{n-k} \genfrac{[}{]}{0pt}{}{n-1}{k-1}_q  + \genfrac{[}{]}{0pt}{}{n-1}{k}_q.

Proof. If k = 0 then the result holds by our convention that \genfrac{[}{]}{0pt}{}{n-1}{-1}_q = 0 so we may suppose that k \ge 1. Removing n-1 from a k-subset of \{0,1,\ldots,n-1\} containing n-1 puts such k-subsets in bijection with the (k-1)-subsets of \{0,1,\ldots, n-2\}, enumerated by \genfrac{[}{]}{0pt}{}{n-1}{k-1}_q. This map decreases the weight by n-1. The k-subsets of \{0,1,\ldots, n-1\} not containing n-1 are in an obvious weight preserving bijection with the k-subsets of \{0,1,\ldots, n-2\}, enumerated by \genfrac{[}{]}{0pt}{}{n-1}{k}_q. Therefore,

q^{k(k-1)/2} \genfrac{[}{]}{0pt}{}{n}{k}_q = q^{n-1} q^{(k-1)(k-2)/2} \genfrac{[}{]}{0pt}{}{n-1}{k-1}_q + q^{k(k-1)/2} \genfrac{[}{]}{0pt}{}{n-1}{k}_q.

Cancelling q^{k(k-1)/2}, using that

(k-1)(k-2)/2 + k-1 = k(k-1)/2,

we obtain the required identity. \Box

For n \in \mathbb{N}_0, we define the quantum number [n]_q by [n]_q = 1 + q + \cdots + q^{n-1} and the quantum factorial [n]_q! by [n]_q! = [n]_q[n-1]_q \ldots [1]_q. Observe that specializing q to 1 these become n and n!, respectively. (See the section on characters below for the connection with the alternative convention in which quantum numbers are Laurent polynomials.) Note that in the following proposition, if k = 0 then both products are empty and so \genfrac{[}{]}{0pt}{}{n}{0}_q = 1 for all n \in \mathbb{N}_0.

Proposition. If n, k \in \mathbb{N}_0 then

\displaystyle\genfrac{[}{]}{0pt}{}{n}{k}_ q = \frac{[n]_q[n-1]_q \ldots [n-k+1]_q}{[k]_q!}.

Proof. Working by induction on n+k using the previous lemma we have

\begin{aligned} \genfrac{[}{]}{0pt}{}{n}{k}_q &=  q^{n-k}\genfrac{[}{]}{0pt}{}{n-1}{k-1}_q + \genfrac{[}{]}{0pt}{}{n-1}{k}_q \\ &= q^{n-k} \frac{[n-1]_q\ldots [n-k+1]_q}{[k-1]_q!} \\ &\qquad \qquad \qquad + \frac{[n-1]_q\ldots [n-k+1]_q[n-k]_q}{[k]_q!} \\&= \frac{[n-1]_q\ldots [n-k+1]_q}{[k]_q!} \bigl(  q^{n-k}[k]_q + [n-k]_q  \bigr) \\ &= \frac{[n-1]_q \ldots [n-k+1]_q}{[k]_q!} [n]_q \\ \end{aligned}

where we used the inductive hypothesis to go from line 1 to line 2 and then

\begin{aligned} q^{n-k}[k]_q + [n-k]_q &= q^{n-k}(1+q + \cdots + q^{k-1})  \\ & \qquad\qquad + 1 + q + \cdots + q^{n-k-1} \\ &= 1 + q + \cdots + q^{n-1} \\&= [n]_q \end{aligned}

for the final equality. \Box
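The definition and the product formula are easily compared by machine at an integer value of q \ge 2. The following Haskell sketch (with ad hoc names) does this by brute-force enumeration of subsets:

import Data.List (subsequences)

-- [n]_q and [n]_q! evaluated at an integer q >= 2
quantum :: Integer -> Integer -> Integer
quantum q n = (q ^ n - 1) `div` (q - 1)

qfact :: Integer -> Integer -> Integer
qfact q n = product [quantum q i | i <- [1 .. n]]

-- the definition: normalized weight enumerator of the k-subsets of {0,...,n-1}
defn :: Integer -> Integer -> Integer -> Integer
defn q n k = sum [ q ^ sum xs | xs <- subsequences [0 .. n - 1]
                              , fromIntegral (length xs) == k ]
             `div` q ^ (k * (k - 1) `div` 2)

-- the proposition, for 0 <= k <= n
formula :: Integer -> Integer -> Integer -> Integer
formula q n k = product [quantum q i | i <- [n - k + 1 .. n]] `div` qfact q k

-- e.g. defn 2 4 2 == formula 2 4 2 == 35, matching the example above at q = 2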

Partitions in a box are enumerated by q-multibinomial coefficients

Given a multisubset M of \{0,1,\ldots, n-1\}, we again define its weight to be its sum of entries (now taken with multiplicities). Let

\left[\!\genfrac{[}{]}{0pt}{}{n}{k}\!\right]_q = \sum_M q^{\mathrm{wt}(M)}

be the corresponding enumerator, the sum being over all k-multisubsets M of \{0,1,\ldots, n-1\}. (The double brackets notation is not standard.) For example

\begin{aligned} \left[\!\genfrac{[}{]}{0pt}{}{3}{2}\!\right]_q  &= q^{0+0} + q^{0+1} + q^{0+2} + q^{1+1} + q^{1+2} + q^{2+2} \\ &= 1 + q + 2q^2 + q^3 + q^4  \\ &= \genfrac{[}{]}{0pt}{}{4}{2}_q\end{aligned}

and \left[\!\genfrac{[}{]}{0pt}{}{n}{0}\!\right]_q = 1 and

\left[\!\genfrac{[}{]}{0pt}{}{n}{1}\!\right]_q = q^0 + q^1 + \cdots + q^{n-1} = [n]_q = \genfrac{[}{]}{0pt}{}{n}{1}_q

for all n \in \mathbb{N}_0. This equality \left[\!\genfrac{[}{]}{0pt}{}{3}{2}\!\right]_q = \genfrac{[}{]}{0pt}{}{4}{2}_q is not a coincidence.

The q-stars and bars identity

The k-multisubsets enumerated by \left[\!\genfrac{[}{]}{0pt}{}{n}{k}\!\right]_q are in obvious bijection with sequences (j_1,\ldots, j_k) where 0 \le j_1 \le \ldots \le j_k \le n-1. In turn, such sequences are in bijection with k-subsets of \{0,1,\ldots, n+k-2\} by the map

(j_1,\ldots, j_k) \mapsto \{ j_1+0,j_2+1, \ldots, j_k+(k-1) \}.

The composite map takes a multisubset of weight w to a subset of weight w + k(k-1)/2. By the first section, q^{k(k-1)/2}\genfrac{[}{]}{0pt}{}{n+k-1}{k}_q enumerates k-subsets of \{0,1,\ldots, n+k-2\} by their weight. We therefore have a bijective proof that

\left[\!\genfrac{[}{]}{0pt}{}{n}{k}\!\right]_q = \genfrac{[}{]}{0pt}{}{n+k-1}{k}_q .\qquad(2)

This is the q-lift of (1), that \left(\!\binom{n}{k}\!\right) = \binom{n+k-1}{k}. It seems striking to me that this bijection is easier to describe than the stars and bars bijection, while proving a more general result.
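Continuing the machine check from the previous section, one can also enumerate multisets directly:

-- the k-multisubsets of a list, as weakly increasing lists
multisets :: Integer -> [Integer] -> [[Integer]]
multisets 0 _        = [[]]
multisets _ []       = []
multisets k (x : xs) = map (x :) (multisets (k - 1) (x : xs)) ++ multisets k xs

mdefn :: Integer -> Integer -> Integer -> Integer
mdefn q n k = sum [ q ^ sum m | m <- multisets k [0 .. n - 1] ]

-- q-stars and bars: mdefn q n k == defn q (n + k - 1) k,
-- e.g. mdefn 2 3 2 == defn 2 4 2 == 35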

The abacus interpretation

We digress to give another combinatorial interpretation of \left[\!\genfrac{[}{]}{0pt}{}{n}{k}\!\right]_q. Observe that a partition whose Young diagram is contained in the k \times (n-k) box is uniquely specified by the sequence 0 \le j_1 \le \ldots \le j_k \le n-k of x-coordinates of its k distinct upward steps. (See examples shortly below.) Moreover, since an upward step at x-coordinate j adds j boxes to the partition, the size of the partition is the weight of the multisubset \{j_1,\ldots, j_k\}. Therefore \genfrac{[}{]}{0pt}{}{n}{k}_q enumerates partitions in the k \times (n-k) box by their size. Using this partition setting, the map

(j_1,\ldots, j_k) \mapsto \{ j_1+0,j_2+1, \ldots, j_k+(k-1) \}

from weakly increasing sequences to subsets of \{0,1,\ldots, n-1 \} corresponds to recording a partition not by the x coordinates of its upward steps, but instead by the step numbers of its upward steps, numbering steps from 0.

Illustrative examples with k = 4 and n-k = 5 of the partitions (4,3,3,2) and (5,4,1) are shown in the diagram below with step numbers in bold.

[Diagram: the partitions (4,3,3,2) and (5,4,1) in the 4 \times 5 box, with step numbers in bold, and the corresponding abaci below them]

Below each partition is the corresponding abacus: note that the step numbers of the upward steps are simply the bead positions on the abacus. This is related to Loehr’s beautiful labelled abaci model for antisymmetric functions.

Exercise. Use the abacus to give an alternative proof of the lemma, restated below

\genfrac{[}{]}{0pt}{}{n}{k}_q =  q^{n-k} \genfrac{[}{]}{0pt}{}{n-1}{k-1}_q + \genfrac{[}{]}{0pt}{}{n-1}{k}_q.

Perhaps surprisingly, this is not the unique q-lift of Pascal’s Rule. Use the abacus in a different way to prove that

\genfrac{[}{]}{0pt}{}{n}{k}_q = \genfrac{[}{]}{0pt}{}{n-1}{k-1}_q + q^k \genfrac{[}{]}{0pt}{}{n-1}{k}_q.

Exercise. Prove that

(1 + z)(1 + qz) \ldots (1+q^{n-1}z) = \sum_{k=0}^n q^{k(k-1)/2} \genfrac{[}{]}{0pt}{}{n}{k}_q z^k.

This is one of many identities that generalize the usual Binomial Theorem.

Symmetric functions.

For k \in \mathbb{N}_0 let e_k and h_k be the elementary and complete symmetric functions of degree k in variables x_1,x_2,\ldots. Thus e_k is the sum of all products of k distinct variables and h_k is the sum of all products of k variables, repetitions allowed. More generally, let s_\lambda denote the Schur function labelled by the partition \lambda as defined combinatorially by

\displaystyle s_\lambda= \sum_{t \in \mathrm{SSYT}(\lambda)} x^t

where if the semistandard tableau t has c_i entries of i for each i \in \mathbb{N} then x^t = x_1^{c_1} x_2^{c_2} \ldots . It is a good basic exercise to show that e_k = s_{(1^k)} and h_k = s_{(k)}. It is immediate from our definition of q-binomial coefficients that

e_k(1,q,\ldots, q^{n-1}) = q^{k(k-1)/2} \genfrac{[}{]}{0pt}{}{n}{k}_q

and it is immediate from the q-stars and bars identity that

h_k(1,q,\ldots, q^{n-1}) = \left[\!\genfrac{[}{]}{0pt}{}{n}{k}\!\right]_q = \genfrac{[}{]}{0pt}{}{n+k-1}{k}_q

and so h_k(1,q,\ldots, q^{n-k}) = \genfrac{[}{]}{0pt}{}{n}{k}_q. The q-stars and bars identity becomes

h_k(1,q,\ldots, q^{n-1}) = q^{-k(k-1)/2} e_k(1,q,\ldots, q^{n+k-2}) \qquad(3)

All these identities are special cases of the result, almost immediate from the combinatorial definition of Schur functions, that s_\lambda(1,q,\ldots, q^{k-1}) enumerates semistandard tableaux with entries from \{0,1,\ldots, k-1\} by their weight, i.e. their sum of entries.

Characters of \mathrm{SL}_2(\mathbb{C})

The reason for bringing in symmetric functions is the following theorem which connects representations of \mathrm{SL}_2(\mathbb{C}) with plethysms of symmetric functions and hence enumeration of semistandard tableaux. Let \nabla^\lambda denote the Schur functor labelled by the partition \lambda; if E is a d-dimensional vector space then the trace of the diagonal matrix with entries \alpha_1, \ldots, \alpha_d acting on \nabla^\lambda(E) is s_\lambda(\alpha_1,\ldots, \alpha_d). It is an instructive exercise to verify from this property that \nabla^{(k)}E \cong \mathrm{Sym}^k E and \nabla^{(1^k)}E \cong \bigwedge^k E.

Theorem. Let E be the natural representation of \mathrm{SL}_2(\mathbb{C}). The representations \nabla^\lambda \mathrm{Sym}^\ell E and \nabla^\mu \mathrm{Sym}^m E are isomorphic if and only if the specialized Schur functions s_\lambda(1,q,\ldots, q^\ell) and s_\mu(1,q,\ldots, q^m) are equal up to a power of q.

Proof. See Theorem 3.4(b) and (d) in my joint paper with Rowena Paget. \Box

One reason for not giving full details at this point is that it enables me to pass lightly over one technicality. By our definition, the representation \mathrm{Sym}^\ell E of \mathrm{GL}_2(\mathbb{C}) has character

h_\ell(x_1,x_2) = x_1^\ell + x_1^{\ell-1}x_2 + \cdots + x_2^\ell.

(Or to be precise, the trace of the diagonal matrix with entries \alpha, \beta acting on \mathrm{Sym}^\ell E is this polynomial evaluated at x_1 = \alpha, x_2 = \beta.) Restricting to \mathrm{SL}_2(\mathbb{C}), we should impose the relation x_1x_2 = 1, meaning that if x_1 = Q then x_2 = Q^{-1}, and so we work most naturally with Laurent polynomials in Q. (This is essential since unequal characters of \mathrm{GL}_2(\mathbb{C}) may become equal on restriction to \mathrm{SL}_2(\mathbb{C}). In fact this happens if and only if they differ by a power of the determinant representation.) Thus as a representation of \mathrm{SL}_2(\mathbb{C}), \mathrm{Sym}^\ell\! E has character

s_\ell (Q,Q^{-1}) = Q^\ell + Q^{\ell-2} + \cdots + Q^{-\ell}

and more generally, \nabla^\lambda \mathrm{Sym}^\ell E has character

s_\lambda(Q^{\ell}, Q^{\ell-2}, \ldots, Q^{-\ell}) = (s_\lambda \circ s_\ell)(Q, Q^{-1})

where \circ is the plethysm product. (This is the main idea needed to prove the theorem.) One can always pass from a Laurent polynomial identity in \mathbb{C}[Q,Q^{-1}] to a polynomial identity in \mathbb{C}[q] by setting q = Q^2 and multiplying through by a suitable power of Q, so it is possible to work in either setting.

Exercise. Give as many proofs as you can that the q-binomial coefficient \genfrac{[}{]}{0pt}{}{n}{k}_q is centrally symmetric; that is, writing [q^a]f(q) for the coefficient of q^a in the polynomial f(q), we have

[q^r] \genfrac{[}{]}{0pt}{}{n}{k}_q = [q^{k(n-k)-r}] \genfrac{[}{]}{0pt}{}{n}{k}_q

for all r \in \mathbb{N}_0. [Hint: there is a one-line proof using Q-characters. For a proof at the level of polynomials, show that the reverse of a polynomial f(q) of degree c is q^c f(q^{-1}), and that the palindromic property is preserved by multiplication and division, and then apply this to the formula in the proposition.]

It is worth noting that the sum of centrally symmetric polynomials (as opposed to Laurent series) is centrally symmetric if and only if they have the same degree. This appears to be one obstacle to giving a representation theory interpretation of identities such as the q-lift of Pascal’s Rule in the first lemma.

Exercise. Use the representation theory of \mathrm{SL}_2(\mathbb{C}) to show that the q-binomial coefficients \genfrac{[}{]}{0pt}{}{n}{k}_q are unimodal; that is, the coefficients increase up to the middle degree(s) and then decrease.

The Wronskian isomorphism.

Taking \lambda = (1^k) and \mu = (k) in the theorem, we obtain that the representations \bigwedge^k \mathrm{Sym}^\ell E and \mathrm{Sym}^k \mathrm{Sym}^m E of \mathrm{SL}_2(\mathbb{C}) are isomorphic if and only if e_k(1,q,\ldots, q^{\ell-1}) = q^{k(k-1)/2}\genfrac{[}{]}{0pt}{}{\ell}{k}_q and h_k(1,q,\ldots, q^{m-1}) = \genfrac{[}{]}{0pt}{}{m+k-1}{k}_q are equal up to a power of q. (These equalities were proved in the previous subsection.) This holds if and only if \ell = m + k - 1, so we set m = n - 1 and \ell = n + k - 2 and obtain the Wronskian isomorphism

\bigwedge^k \mathrm{Sym}^{n+k-2} E \cong \mathrm{Sym}^k \mathrm{Sym}^{n-1} E \qquad (4)

of representations of \mathrm{SL}_2(\mathbb{C}), lifting the q-stars and bars identity \left[\!\genfrac{[}{]}{0pt}{}{n}{k}\!\right]_q = \genfrac{[}{]}{0pt}{}{n+k-1}{k}_q to the level of representations. Going down instead to the level of ordinary binomial coefficients, by setting q=1, we obtain \left(\!\binom{n}{k}\!\right) = \binom{n+k-1}{k}, which is where we started. This is however only the start of the story: in my joint paper with Eoghan McDowell, we showed that the modified Wronskian isomorphism

\bigwedge^k \mathrm{Sym}^{n+k-2}\! E \cong \mathrm{Sym}_k \mathrm{Sym}^{n-1} \! E \qquad(5)

in which the symmetric power \mathrm{Sym}^k \! E (defined as a quotient of E^{\otimes k}) is replaced with its dual \mathrm{Sym}_k E (defined as a submodule of E^{\otimes k}), holds as an isomorphism of representations of \mathrm{SL}_2(\mathbb{F}) for an arbitrary field \mathbb{F}. We do this by defining an explicit map from the right-hand side to the left-hand side. This is the most refined lift of stars and bars that I know of to date.
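As a quick sanity check on (4) (over \mathbb{C}, at the level of Q-characters), here is a short Python sketch of my own: it compares the multisets of exponents of Q in the characters of the two sides, using that \mathrm{Sym}^\ell E has exponents \ell, \ell - 2, \ldots, -\ell, that exterior powers sum exponents over k-subsets, and that symmetric powers sum them over k-multisubsets:

```
from itertools import combinations, combinations_with_replacement
from collections import Counter

def sym_E_exponents(l):
    # Q-exponents in the character of Sym^l E: l, l-2, ..., -l
    return range(l, -l - 1, -2)

def wedge_char(l, k):
    # Q-exponents, with multiplicity, of the k-th exterior power of Sym^l E
    return Counter(sum(c) for c in combinations(sym_E_exponents(l), k))

def sym_char(l, k):
    # Q-exponents, with multiplicity, of the k-th symmetric power of Sym^l E
    return Counter(sum(c)
                   for c in combinations_with_replacement(sym_E_exponents(l), k))

for n in range(1, 7):
    for k in range(1, 6):
        assert wedge_char(n + k - 2, k) == sym_char(n - 1, k)   # identity (4)
```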

Exercise. Let E = \langle X, Y \rangle_\mathbb{F} be a 2-dimensional vector space over the field \mathbb{F} = \mathbb{F}_2. Show that, regarded as a representation of \mathrm{GL}_2(\mathbb{F}), \mathrm{Sym}^2 \! E = \langle X^2, Y^2, XY \rangle has \langle X^2, Y^2\rangle as an irreducible 2-dimensional submodule with quotient isomorphic to the determinant representation and, dually,

\mathrm{Sym}_2 E = \langle X\otimes X, Y \otimes Y, X \otimes Y + Y \otimes X \rangle

has \langle X  \otimes Y + Y \otimes X \rangle as a one-dimensional submodule isomorphic to the determinant representation, with quotient the irreducible 2-dimensional representation already seen.

A proof of the Wronskian isomorphism after Darij Grinberg

For a very elegant proof of the Wronskian isomorphism, still working over a field \mathbb{F} of arbitrary characteristic, see these notes of Darij Grinberg, written in response to my joint paper. Here I will give an outline of Grinberg’s beautiful construction.

Setup

Let E = \langle X, Y \rangle_\mathbb{F} as above. Then \mathrm{Sym}^{n-1} E can be thought of as the space of homogeneous polynomials of degree n-1 in X and Y; it is the degree n-1 component of the polynomial algebra

\mathbb{F}[X,Y] = \bigoplus_{r \ge 0} \mathrm{Sym}^r \! E.

(It may be helpful to remember the alternative notation \mathrm{Sym}^\bullet \! E for either side.) Thus we can identify \mathrm{Sym}_k \mathrm{Sym}^{n-1} E with a subspace of \mathrm{Sym}_k \mathbb{F}[X,Y]. Since, given any vector space V, by definition \mathrm{Sym}_k V is the subspace of V^{\otimes k} of symmetric tensors, we can, in turn, identify \mathrm{Sym}_k \mathrm{Sym}^{n-1} E with the subspace of \bigl( \mathbb{F}[X,Y]\bigr)^{\otimes k} spanned by the symmetrizations of all tensors

X^{n-1-j_1}Y^{j_1} \otimes \cdots \otimes X^{n-1-j_k}Y^{j_k}

for 0 \le j_1 \le j_2 \le \ldots \le j_k \le n-1. (Note that this symmetrization should be performed without introducing scalar factors, giving the basis X \otimes X, X \otimes Y + Y \otimes X, Y \otimes Y as in the exercise above when k=2 and n=2.) Finally we use that this tensor product is isomorphic to \mathbb{F}[X_1,Y_1,\ldots,X_k,Y_k] by the very explicit isomorphism defined on a spanning set by

f_1(X,Y) \otimes \cdots \otimes f_k(X,Y) \mapsto f_1(X_1,Y_1)\ldots f_k(X_k,Y_k).

The image of \mathrm{Sym}_k \mathrm{Sym}^{n-1} E is all polynomials in \mathbb{F}[X_1,Y_1,\ldots,X_k,Y_k] of homogeneous degree n-1 with respect to each pair of variables X_i, Y_i that are symmetric with respect to permutation of indices. It might at first seem a little strange that the non-commutative tensor product can be represented in a commutative polynomial algebra, but I think any confusion is resolved when one realises that the indices of the X_i, Y_i serve to identify which tensor factor each variable comes from.

This chain of maps gives the left-hand column of the commutative diagram below.

[Image: the commutative diagram]

The right-hand column is similar, and in fact a little easier to think about, as the map \bigwedge^k V \rightarrow V^{\otimes k} can be defined very simply by

v_1 \wedge \ldots \wedge v_k \mapsto \sum_{\sigma \in \mathfrak{S}_k} \mathrm{sgn}(\sigma) v_{1\sigma^{-1}} \otimes \cdots \otimes v_{k\sigma^{-1}}

without any worries about introducing scalar factors. The image of \bigwedge^k \mathrm{Sym}^{n+k-2} E is all polynomials in \mathbb{F}[X_1,Y_1,\ldots,X_k,Y_k] of homogeneous degree n+k-2 with respect to each pair of variables X_i, Y_i that are antisymmetric with respect to permutation of indices.

Antisymmetrization and Grinberg’s map

Let g \cdot \sigma denote the action of \sigma \in \mathfrak{S}_k on the indices of a polynomial g \in \mathbb{F}[X_1,Y_1,\ldots, X_k, Y_k]. This corresponds to position permutation of tensor factors, as in the displayed equation ending the previous section. Hence the image of the composite map from \bigwedge^k \mathbb{F}[X,Y] to \mathbb{F}[X_1,Y_1,\ldots, X_k, Y_k] is spanned by the polynomials

\sum_{\sigma \in \mathfrak{S}_k} \mathrm{sgn}(\sigma) g \cdot \sigma.

Denote this antisymmetrized polynomial by \mathrm{alt}(g). Observe that if we apply \mathrm{alt} to a polynomial that is already antisymmetric (for instance, one in the image of the maps in the right-hand column) then we multiply it by k!. Thus \mathrm{alt} is a pseudo-projection.

Let \psi : \mathbb{F}[X_1,Y_1,\ldots,X_k,Y_k] \rightarrow \mathbb{F}[X_1,Y_1,\ldots,X_k,Y_k] be the linear map defined by

\psi(g) = \mathrm{alt}\bigl( g (X_1^{k-1}) (X_2^{k-2}Y_2) \ldots (X_{k-1}Y_{k-1}^{k-2}) (Y_k^{k-1}) \bigr)

for g \in \mathbb{F}[X_1,Y_1,\ldots, X_k, Y_k].

Observe that if g is homogeneous of degree n-1 in each pair X_i, Y_i (as are polynomials in the image of the first column) then the multiplication boosts the homogeneous degree of g from n-1 to n+k-2 in each pair. Since we then antisymmetrize, the composite map going down the left-hand column and then across has image in the subspace of \mathbb{F}[X_1,Y_1,\ldots,X_k,Y_k] obtained by going down the right-hand column. Moreover, by the key identity

\mathrm{alt}(f h) = f \mathrm{alt}(h)

for polynomials f symmetric under index permutation, it follows that the restriction of \psi to the subspace of such polynomials is defined by multiplication by

\mathrm{alt}\bigl( (X_1^{k-1}) (X_2^{k-2}Y_2) \ldots (X_{k-1}Y_{k-1}^{k-2}) (Y_k^{k-1}) \bigr).

This element can be rewritten as

\prod_{1 \le i < j \le k} (X_iY_j - X_jY_i)

and so affords the power \mathrm{det}^{k(k-1)/2} of the determinant representation of \mathrm{GL}_2(\mathbb{F}), which is of course the trivial representation of \mathrm{SL}_2(\mathbb{F}). It follows that \psi induces an \mathrm{SL}_2(\mathbb{F})-equivariant map, denoted \tilde{\psi} in the commutative diagram above. Because multiplication by a fixed non-zero element of the integral domain \mathbb{F}[X_1,Y_1,\ldots,X_k,Y_k] is injective, \tilde{\psi} is injective. By dimension counting it is surjective, and we are done.
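The key rewriting of \mathrm{alt}\bigl( (X_1^{k-1}) (X_2^{k-2}Y_2) \ldots (Y_k^{k-1}) \bigr) as \prod_{1 \le i < j \le k} (X_iY_j - X_jY_i) is a homogeneous Vandermonde identity, easy to verify by machine for small k. Here is a minimal Python sketch of such a check (conventions mine: polynomials are dictionaries mapping exponent tuples to coefficients):

```
from itertools import permutations
from collections import defaultdict

# Entry 2i of an exponent tuple is the exponent of X_{i+1}; entry 2i+1 that of Y_{i+1}.

def poly_mul(p, q):
    out = defaultdict(int)
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            out[tuple(a + b for a, b in zip(e1, e2))] += c1 * c2
    return {e: c for e, c in out.items() if c}

def sign(sigma):
    # sign of a permutation as (-1) to the number of inversions
    k = len(sigma)
    return (-1) ** sum(sigma[i] > sigma[j]
                       for i in range(k) for j in range(i + 1, k))

def alt(p, k):
    # alt(g) = sum over sigma in S_k of sgn(sigma) g.sigma, where sigma
    # permutes the index pairs (X_i, Y_i)
    out = defaultdict(int)
    for sigma in permutations(range(k)):
        for e, c in p.items():
            f = [0] * (2 * k)
            for i in range(k):
                f[2 * sigma[i]], f[2 * sigma[i] + 1] = e[2 * i], e[2 * i + 1]
            out[tuple(f)] += sign(sigma) * c
    return {e: c for e, c in out.items() if c}

def staircase(k):
    # the monomial (X_1^{k-1}) (X_2^{k-2} Y_2) ... (X_{k-1} Y_{k-1}^{k-2}) (Y_k^{k-1})
    e = []
    for i in range(k):
        e += [k - 1 - i, i]
    return {tuple(e): 1}

def vandermonde(k):
    # the product over i < j of (X_i Y_j - X_j Y_i)
    prod = {tuple([0] * (2 * k)): 1}
    for i in range(k):
        for j in range(i + 1, k):
            e1, e2 = [0] * (2 * k), [0] * (2 * k)
            e1[2 * i], e1[2 * j + 1] = 1, 1
            e2[2 * j], e2[2 * i + 1] = 1, 1
            prod = poly_mul(prod, {tuple(e1): 1, tuple(e2): -1})
    return prod

for k in range(2, 5):
    assert alt(staircase(k), k) == vandermonde(k)
```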

In fact \tilde{\psi} is precisely the map defined in my joint paper, but described in a much more transparent way, which of course Grinberg exploits in the proof, as I have attempted to show above. See Section 0.1 of his paper for details.


Notes on the Wedderburn decomposition

September 10, 2024

Let G be a finite group and let \mathrm{Irr}(G) denote the set of irreducible representations of G up to isomorphism. Wedderburn’s Theorem is that there is an algebra isomorphism

\mathbb{C}[G] \cong \bigoplus_{V \in \mathrm{Irr}(G)} \mathrm{Mat}_{\dim V}(\mathbb{C})

where \mathbb{C}[G] is the complex group algebra of G. Considering the modules for the right-hand side, we see that if V is a simple \mathbb{C}[G]-module then \mathbb{C}[G] acts on V as a full matrix algebra, and the kernel of the action map is the sum of the other block algebras. Thus Wedderburn’s Theorem implies Jacobson’s Density Theorem for simple \mathbb{C}[G]-modules, namely that the matrices representing the action of G on a simple \mathbb{C}[G]-module V span \mathrm{Mat}_{\dim V}(\mathbb{C}).

This corollary, that irreducible representations of finite groups are simply matrix algebras acting on column vectors, often strikes me as a little surprising: are irreducible representations really so boring? I think the answer is ‘yes they are, from the ring-theoretic perspective’. Consider that many of the interesting problems in representation theory concern the decomposition of big representations into irreducibles: for instance, for the symmetric group, Foulkes’ Conjecture in characteristic zero, or the decomposition matrix problem in prime characteristic. For such problems we care very much about the particular labels of the irreducible representations (these come from the group structure), not just their dimensions (which we have seen are entirely ring-theoretic information).

In this post I shall give three proofs of Wedderburn’s Theorem, with critical commentary. All three proofs need the following basic result from character theory.

Decomposing the regular character of G

By character orthogonality, the regular character of \mathbb{C}[G] defined by \pi(g) = |G|[g = \mathrm{id}_G] decomposes as \pi = \sum_{V \in \mathrm{Irr}(G)} (\dim V) \chi_V. Hence as a left \mathbb{C}[G]-module,

\mathbb{C}[G] \cong \bigoplus_{V \in \mathrm{Irr}(G)}  V^{\oplus \dim V} \qquad (\star)

and in particular

|G| = \sum_{V \in \mathrm{Irr}(G)} (\dim V)^2. \qquad (\dagger)

The inversion theorem proof

Let \rho_V : G \rightarrow \mathrm{GL}(V) for V \in \mathrm{Irr}(G) be the irreducible representations of G. We extend the domain of each \rho_V to \mathbb{C}[G] by linearity, so now \rho_V : \mathbb{C}[G] \rightarrow \mathrm{End}(V) is an algebra homomorphism. Identifying \mathrm{End}(V) with \mathrm{Mat}_{\dim V}(\mathbb{C}) by a fixed choice of basis, we obtain a map

\mathbb{C}[G] \rightarrow \bigoplus_{V \in \mathrm{Irr}(G)} \mathrm{Mat}_{\dim V}(\mathbb{C}).

We must show it is an algebra isomorphism. By dimension counting using (\dagger) it suffices to show that it is injective. Thus we must show that each x \in \mathbb{C}[G] is determined by its actions \rho_V(x) on a full set of irreducible representations. For this, we use the following inversion formula:

x = \sum_{V \in \mathrm{Irr}(G)} \frac{\chi_V(1)}{|G|} \sum_{g \in G} \mathrm{tr} \rho_V(g^{-1} x) g.

To see this is true, it suffices, by linearity, to prove it when x is a group element. In this case \mathrm{tr} \rho_V(g^{-1}x) = \chi_V(g^{-1} x) and the formula follows from the fact that \sum_{V \in \mathrm{Irr}(G)} \frac{\chi_V(1)}{|G|} \chi_V(g^{-1}x) is 1 if g is x, and 0 otherwise. This fact is just column orthogonality between the column of g^{-1}x and the column of the identity element of G.

Critical remarks

The obvious criticism is: ‘where did the inversion formula come from?’ One way to reconstruct it is to use that the central primitive idempotent for the irreducible representation V is

e_V = \frac{\chi_V(1)}{|G|} \sum_{g \in G} \chi_V(g^{-1}) g.

We remark that e_V is clearly in the centre of \mathbb{C}[G] because its coefficients are constant on conjugacy classes; moreover, it is an easy calculation using row orthogonality to show that e_V e_W = 0 if V \not\cong W and e_V^2 = e_V. These properties can also be used to deduce the formula if one has forgotten it.
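These properties are also easy to check by machine in a small case. The following Python sketch (a sanity check of mine, with the character table of S_3 hard-coded) builds each e_V from the formula above and verifies that e_V e_W = 0 for V \not\cong W and e_V^2 = e_V:

```
from itertools import permutations
from fractions import Fraction

G = list(permutations(range(3)))             # S_3 as tuples, with g(i) = g[i]

def mul(g, h):                               # (gh)(i) = g(h(i))
    return tuple(g[h[i]] for i in range(3))

def inv(g):
    out = [0, 0, 0]
    for i in range(3):
        out[g[i]] = i
    return tuple(out)

# character table of S_3 on the classes: identity, transpositions, 3-cycles
chars = [(1, 1, 1), (1, -1, 1), (2, 0, -1)]

def chi_value(chi, g):
    fixed = sum(g[i] == i for i in range(3))
    return chi[{3: 0, 1: 1, 0: 2}[fixed]]

def idempotent(chi):
    # e_V = (chi_V(1)/|G|) sum_g chi_V(g^{-1}) g, as a dict g -> coefficient
    return {g: Fraction(chi[0] * chi_value(chi, inv(g)), 6) for g in G}

def convolve(a, b):
    # the product in the group algebra C[S_3]
    out = {g: Fraction(0) for g in G}
    for g, cg in a.items():
        for h, ch in b.items():
            out[mul(g, h)] += cg * ch
    return out

es = [idempotent(chi) for chi in chars]
zero = {g: Fraction(0) for g in G}
for i, e in enumerate(es):
    for j, f in enumerate(es):
        assert convolve(e, f) == (e if i == j else zero)
```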

Now suppose that x \in \mathbb{C}[G] is in the pre-image of the Wedderburn block \mathrm{Mat}_{\dim V}(\mathbb{C}), so that x acts as zero on every irreducible representation other than V. We then have x e_V = x, and so the coefficient x(g) of g in x is the coefficient of g in x e_V, namely

\begin{aligned} \sum_k x(k) e_V(k^{-1} g) &= \sum_k x(k) \chi_V(1)/|G| \chi_V(g^{-1} k) \\ &= \frac{\chi_V(1)}{|G|} \chi_V(g^{-1} \sum_k x(k) k )  \\ &= \frac{\chi_V(1)}{|G|} \chi_V(g^{-1} x).\end{aligned}

Thus x = \sum_{g \in G} \frac{\chi_V(1)}{|G|} \chi_V(g^{-1} x) g. (This formula is useful also in modular representation theory: see my blog post Modular representations that look ordinary.) We have now proved the inversion formula in the special case; summing over all irreducible representations V gives it in general. Intuitively, borrowing Kac’s phrase ‘one can hear [from its action on irreducible representations] the shape of an element of a semisimple group algebra’.

A less obvious criticism, but one that should be made, is that the proof (particularly when read with the intuition above) can easily be misunderstood as proving more than is possible. The subtle point is that when we are given the image of x under the Wedderburn isomorphism, we get a set of |\mathrm{Irr}(G)| block matrices, not the corresponding labels of irreducible representations. But since the real point is to prove injectivity, not to show that this ‘abstract’ decomposition determines the action of x on each irreducible representation (which of course is false whenever there are two irreducible representations of the same dimension), we can assign the labels as we see fit.

If G is a finite abelian group then the inversion formula states that if x : G \rightarrow \mathbb{C} is any function then

x(g) = \frac{1}{|G|} \sum_{\chi \in \widehat{G}}  \chi(g^{-1})\chi(x)

where we write \widehat{G} rather than \mathrm{Irr}(G) for the character group of G. (To deduce this from the inversion formula, just identify x with the corresponding element of the group algebra, namely \sum_{g \in G} x(g)g.) This can be interpreted as Fourier inversion, reading \chi(x) as ‘the Fourier coefficient of x at the irreducible \chi’. For example, if G = \mathbb{Z}/N\mathbb{Z} then, setting \zeta = \exp (2 \pi i /N), we get

x(m) = \frac{1}{N} \sum_{t = 0}^{N-1} \zeta^{- m t} \chi_t(x)

where \chi_t(x) = \sum_{r = 0}^{N-1} x(r) \zeta^{r t}. The very first post in this blog, from September 2010, has a bit more on this perspective.
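Here is the cyclic case as runnable code: a minimal Python sketch (normalizations chosen to match the two displayed formulas) that computes the coefficients \chi_t(x) and then recovers x by the inversion formula:

```
import cmath

def fourier_coefficients(x):
    # chi_t(x) = sum_r x(r) zeta^{rt}, with zeta = exp(2 pi i / N)
    N = len(x)
    zeta = cmath.exp(2j * cmath.pi / N)
    return [sum(x[r] * zeta ** (r * t) for r in range(N)) for t in range(N)]

def invert(coeffs):
    # x(m) = (1/N) sum_t zeta^{-mt} chi_t(x)
    N = len(coeffs)
    zeta = cmath.exp(2j * cmath.pi / N)
    return [sum(zeta ** (-m * t) * coeffs[t] for t in range(N)) / N
            for m in range(N)]

x = [3, 1, 4, 1, 5]
assert all(abs(a - b) < 1e-9
           for a, b in zip(x, invert(fourier_coefficients(x))))
```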

The Schur’s Lemma proof

By Schur’s lemma applied to (\star) we get

\begin{aligned} \mathrm{End}_{\mathbb{C}[G]} \mathbb{C}[G] &\cong \bigoplus_{V \in \mathrm{Irr}(G)} \mathrm{End}_{\mathbb{C}[G]} (V^{\oplus \dim V})  \\ &\cong \bigoplus_{V \in \mathrm{Irr}(G)} \mathrm{Mat}_{\dim V}(\mathbb{C})\end{aligned}

To illustrate the second isomorphism, consider the case \dim V = 2 and let v \in V be any non-zero vector. Let V' be isomorphic to V by an isomorphism sending v to v'. Clearly any \mathbb{C}[G]-endomorphism of V \oplus V' is determined by its effect on v and v'; less clearly, but as follows from yet more applications of Schur’s Lemma, it is of the form v \mapsto \alpha v + \gamma v' and v' \mapsto \beta v + \delta v' for some \alpha,\beta,\gamma,\delta \in \mathbb{C}, corresponding to the matrix

\left( \begin{matrix} \alpha & \gamma \\ \beta & \delta \end{matrix} \right).

To complete the proof we use that any \mathbb{C}[G]-endomorphism of \mathbb{C}[G] is of the form x \mapsto xa for a unique a \in \mathbb{C}[G] (determined as the image of \mathrm{id}_G), and so \mathrm{End}_{\mathbb{C}[G]}\mathbb{C}[G] \cong \mathbb{C}[G]^{\mathrm{op}}, the opposite algebra to \mathbb{C}[G]. This gives the Wedderburn isomorphism, but with an inconvenient ‘opposite’, which can be removed by noting that X \mapsto X^{\mathrm{tr}} is an isomorphism between \mathrm{Mat}_d(\mathbb{C}) and \mathrm{Mat}_d(\mathbb{C})^{\mathrm{op}}.

Critical remarks

The strength of this proof to my mind is that it has just one idea: ‘look at the endomorphism algebra of \mathbb{C}[G] as a module for itself’. After that, it’s just the remorseless application of Schur’s Lemma. The one idea required is well motivated by experience of computing with representations. For instance, one very important link between character theory and the theory of permutation groups is that if \mathbb{C}[\Omega] is the permutation representation of G acting on a set \Omega, with canonical basis \{ e_\omega : \omega \in \Omega \}, then \dim \mathrm{End}_{\mathbb{C}[G]}\mathbb{C}[\Omega] = 2 and

\mathbb{C}[\Omega] =  \bigl\langle \sum_{\omega \in \Omega} e_\omega \bigr\rangle \oplus V

for an irreducible representation V if and only if the action of G on \Omega is 2-transitive.

The weaknesses are, in my view: (1) it meets the criterion to be a ‘one shot’ proof, but the fuss with the opposite algebra lengthens the proof; (2) the map \mathbb{C}[G] \rightarrow \mathrm{Mat}_{\dim V}(\mathbb{C}) it gives comes not from G acting on V (as we expected) but from G acting on V^{\oplus \dim V}, in a way that commutes with the diagonal left action of \mathbb{C}[G]. Taken together, (1) and (2) make the isomorphism far less explicit than it need be.

Digression on Yoneda’s Lemma

The appearance of the opposite algebra can be predicted: I’d like to digress briefly to show that it’s just a manifestation of the ‘opposite’ that appears in Yoneda’s Lemma. Stated for covariant functors, a special case of Yoneda’s Lemma is that natural transformations between the representable functors \mathrm{Hom}_\mathcal{C}(x, -) and \mathrm{Hom}_\mathcal{C}(y, -) are in bijection with \mathrm{Hom}_\mathcal{C}(y, x); given such a morphism g_{yx} : y \rightarrow x, the natural transformation \mathrm{Hom}_\mathcal{C}(x, -) \rightarrow \mathrm{Hom}_\mathcal{C}(y, -) is defined by f \mapsto f \circ g_{yx}, composing from right-to-left. (What other source of natural maps between these hom-spaces could there possibly be beyond morphisms from y to x?) This makes the Yoneda embedding x \mapsto \mathrm{Hom}_\mathcal{C}(x, -) a contravariant functor from \mathcal{C} to the category of functors from \mathcal{C} to \mathbf{Set}. (Slogan: ‘pre-composition is contravariant’.)

Cayley’s Theorem is the special case where \mathcal{C} is a group (a category with one object and all morphisms invertible): the Yoneda embedding is then the map G \rightarrow \mathrm{Sym}_G defined by g \mapsto \theta_g where \theta_g(x) = xg. Note that \theta_{g'} \circ \theta_{g}, which may be read as ‘do g then g'’, is \theta_{g g'} rather than \theta_{g'g}; we check:

\begin{aligned} (\theta_{g'} \circ \theta_{g})(x) &= \theta_{g'} \bigl( \theta_g(x) \bigr)  \\ & \quad =  \theta_{g'} (xg) = (xg)g' = x(gg') = \theta_{g g'}(x). \end{aligned}

Thus the Yoneda embedding is not a group homomorphism, but becomes one if we regard the codomain as \mathrm{Sym}_G^{\mathrm{op}}. The proof above that \mathrm{End}_{\mathbb{C}[G]}\mathbb{C}[G] \cong \mathbb{C}[G]^{\mathrm{op}} is almost the same ‘dancing around Yoneda’ but this time with a version of Yoneda’s Lemma for categories enriched over \mathbb{C}-vector spaces.

Incidentally, the Yoneda embedding becomes covariant if we instead take natural transformations from \mathrm{Hom}_\mathcal{C}(-, x) to \mathrm{Hom}_\mathcal{C}(-, y) (‘post-composition is covariant’), but the ‘opposite’ pops up in another place, as these hom-into functors are contravariant, or, in the more modern language, ordinary covariant functors but defined on \mathcal{C}^\mathrm{op}.

Finally, Yoneda’s Lemma is often stated as

\mathrm{Nat}(\mathrm{Hom}_\mathcal{C}(x, -), F) \cong F(x)

‘natural transformations from representable functors are representable’; observe that when F = \mathrm{Hom}_\mathcal{C}(y, -), we have F(x) = \mathrm{Hom}_\mathcal{C}(y,x), so this is indeed a generalization of the result as I stated it.

The proof via density

In this ‘two shot’ proof we first prove Jacobson’s Density Theorem for \mathbb{C}[G]-modules. The proof below closely follows Tommaso Scognamiglio’s elegant proof on MathStackExchange. As motivation, recall that a representation of a finite group is irreducible if and only if it is cyclic, generated by any non-zero vector. Given this, to prove density it is very natural to show that \mathrm{End}(V) is an irreducible representation: but since (supposing the conclusion) we know that as a left \mathbb{C}[G]-module

\mathrm{End}(V) \cong V\, \oplus \stackrel{\dim V}{\cdots}\! \oplus\, V

is not irreducible, we are going to have to consider a bigger group.

Density

There is a canonical \mathbb{C}-linear isomorphism \mathrm{End}(V) \cong V \otimes V^\star under which v \otimes \theta maps to the rank 1 endomorphism defined by x \mapsto v \theta(x). Recall that V^\star is a representation of G (acting on the left as usual) by \rho_{V^\star}(g) \theta = \theta \circ \rho_V(g^{-1}). The matrix giving the action of g is then \rho_V(g^{-1})^\mathrm{tr}, the transpose appearing because if \theta_1, \ldots, \theta_n is the dual basis to v_1, \ldots, v_n then from

\rho_V(g^{-1}) v_k = \sum_{j=1}^n \rho_V(g^{-1})_{jk} v_j \qquad (\ddagger)

we get (\theta_j \circ \rho_V(g^{-1}))(v_k) = \rho_V(g^{-1})_{jk}, and so \rho_{V^\star}(g)\theta_j  = \sum_k \rho_V(g^{-1})_{jk} \theta_k, which then requires a transpose to agree with the matrix convention in (\ddagger). Thus V \otimes V^\star is a representation of G \times G with action defined by linear extension of

(f, h)(v \otimes \theta) = \rho_V(f) v \otimes \rho_{V^\star}(h)\theta

in which (f, h) \in G \times G acts with matrix \rho_V(f) \otimes \rho_V(h^{-1})^\mathrm{tr}. In particular \chi_{\mathrm{End}(V)} (f, h) = \chi_V(f) \chi_V(h^{-1}) = \chi_V(f) \overline{\chi_V(h)}. Hence

\begin{aligned} \langle \chi_{\mathrm{End}(V)}, \chi_{\mathrm{End}(V)} \rangle_{G \times G}  &= \frac{1}{|G|^2} \sum_{(f,h) \in G \times G} |\chi_V(f)|^2 |\chi_V(h)|^2   \\ &= \Bigl( \frac{1}{|G|} \sum_{g \in G} |\chi_V(g)|^2 \Bigr)^2  \\ &= 1, \end{aligned}

showing that \chi_{\mathrm{End}(V)} is an irreducible character and \mathrm{End}(V) is an irreducible representation of G \times G, and so (as remarked above) generated by any non-zero element. In particular, the image of the identity \mathrm{id}_V under the action of G \times G spans \mathrm{End}(V). Since, writing \cdot for the module action,

\begin{aligned} (f, h) \cdot \mathrm{id}_V &= \rho_V(f)\, \mathrm{id}_V\, \rho_V(h^{-1}) \\ &= \rho_V(f) \rho_V(h^{-1}) = \rho_V(fh^{-1}), \end{aligned}

this span is the same as the span of all \rho_V(g) for g \in G, as required for density.

Proof of Wedderburn’s Theorem

By density, for each irreducible representation V \in \mathrm{Irr}(G), the induced map \rho_V : \mathbb{C}[G] \rightarrow \mathrm{Mat}_{\dim V}(\mathbb{C}) is surjective. Hence

\bigoplus_{V \in \mathrm{Irr}(G)} \rho_V : \mathbb{C}[G] \rightarrow \bigoplus_{V\in \mathrm{Irr}(G)}  \mathrm{Mat}_{\dim V}(\mathbb{C})

is a surjective algebra homomorphism. But by (\dagger), the dimensions on each side are |G|. Hence this map is an algebra isomorphism, as required.

Critical remarks

A strength of this proof is that you can swap in your own preferred proof of the density theorem. For instance, to prove the generalization to group algebras over an algebraically closed field of arbitrary characteristic, that

\frac{\mathbb{F}[G]}{\mathrm{rad}\ \mathbb{F}[G]} \cong \bigoplus_{V \in \mathrm{Irr}(G)} \mathrm{Mat}_{\dim V}(\mathbb{F})

one can instead prove the density theorem for irreducible representations over an arbitrary field, and deduce that the map is injective because (either by definition, or by an easy equivalent of the definition) an element of \mathbb{F}[G] acting as zero on every irreducible \mathbb{F}[G]-module lies in \mathrm{rad}\ \mathbb{F}[G]. The only disadvantage to my mind is that it’s less suitable for beginners: Jacobson’s Density Theorem is not an obvious result, and although I like Scognamiglio’s proof very much, the use of the G \times G action (on the left, but with one factor twisted over to the right, hence the fiddliness with transposition) could be off-putting.

Why ‘density’? A unitary example

The Weyl algebra \langle \hat{x}, \hat{p} : \hat{x}\hat{p}  - \hat{p}\hat{x}  = i \hbar \rangle models the position and momentum operators in quantum mechanics. The relation above is the algebraic formulation of the uncertainty principle: there can be no simultaneous eigenstate for both the position operator \hat{x} and the momentum operator \hat{p}. In the canonical interpretation of (time-independent) wave functions as elements of L^2(\mathbb{R}), i.e. square-integrable complex-valued functions on \mathbb{R}, there is a representation

\begin{aligned} \hat{x} \psi(x) &= x \psi(x) \\ \hat{p} \psi(x)  &= -i\hbar \frac{\mathrm{d} \psi(x)}{\mathrm{d} x}\end{aligned}

defined on the dense subspace of L^2(\mathbb{R}) of smooth functions in which the Weyl algebra generators act as self-adjoint operators. Exponentiating these operators we obtain \exp(i t \hat{x}) and \exp(i t \hat{p}) defined by

\begin{aligned} \exp(i t \hat{x}) \psi(x) &= \exp(itx) \psi(x) \\ \exp(i t \hat{p}) \psi(x) &= \exp (\hbar t \frac{\mathrm{d}}{\mathrm{d}x})\psi(x) =  \psi(x + \hbar t) \end{aligned}

which generate an irreducible unitary representation of the Heisenberg group by multiplication and translation operators. (The second equation is essentially Taylor’s Theorem, and any worries about its applicability can be assuaged by muttering ‘dense subspace’.) The Stone–von Neumann theorem is that any irreducible unitary representation of the Heisenberg group in which the centre acts in the same way is unitarily equivalent to this representation. An excellent question on MathStackExchange asks whether these maps generate a dense subalgebra of \mathrm{End}(L^2(\mathbb{R})), suitably interpreted. The answer, perhaps surprisingly, is ‘no’ with respect to the operator norm, but ‘yes’ with respect to the strong or weak operator topology; the proof appeals to von Neumann’s double commutant theorem, which is in turn a generalization of the density theorem.

Automorphisms

Suppose that \tau is an algebra automorphism of \mathrm{Mat}_d(\mathbb{C}). Let V be the natural d-dimensional irreducible representation. Then twisting the action by \tau we get a new representation V^\tau in which \mathrm{Mat}_d(\mathbb{C}) acts by X \cdot v = X^\tau v. Since \mathrm{Mat}_d(\mathbb{C}) has a unique irreducible representation up to isomorphism, we have V^\tau \cong V. Hence, by definition of isomorphism of representations, there exists an invertible matrix T such that X^\tau = T^{-1}X T for all X \in \mathrm{Mat}_d(\mathbb{C}). Therefore \tau is conjugation by T. We have proved the special case of the Skolem–Noether theorem that all automorphisms of \mathrm{Mat}_d(\mathbb{C}) are inner, and so the automorphism group is isomorphic to \mathrm{PGL}_d(\mathbb{C}). I am grateful to Mariano for pointing out to me that the automorphism group of \mathbb{C}[G] is more complicated whenever there are several blocks of the same dimension.


Eigenvalues of the Kneser graph

June 8, 2023

A few days ago Ehud Friedgut gave a beautiful talk in the Bristol Combinatorics Seminar, titled ‘The most complicated proof ever of Mantel’s theorem, and other applications of the representation theory of the symmetric group‘ and based on joint work with Yuval Filmus, Nathan Lindzey and Yuval Wigderson. As the title promised, he used the representation theory of the symmetric group to prove Mantel’s Theorem, that if a graph on 2m vertices has more edges than the complete bipartite graph with bipartition into two parts of size m, then it contains a triangle. He illustrated the method by instead proving the Erdős–Ko–Rado Theorem, that the maximum size of a family of r-subsets of \{1,\ldots, n\} such that any two members of the family have a non-trivial intersection is \binom{n-1}{r-1}.

The first purpose of this post is to outline the new proof (as I understand it from Ehud’s talk) of this result, dropping in my own proof of one of the lemmas he used. Let me emphasise that — apart from the minor proof of the lemma, which in any case uses results of James — no originality is claimed for anything in this part. I then show that the methods appear (modulo the proof of the claim below) to adapt to prove the analogue of the Erdős–Ko–Rado Theorem for vector spaces.

A graph for the Erdős–Ko–Rado Theorem

The graph G we need to prove the Erdős–Ko–Rado Theorem is the Kneser graph, having as its vertices the r-subsets of \{1,\ldots, n\}. Two subsets X and Y are connected by an edge if and only if X \cap Y = \varnothing. The Erdős–Ko–Rado Theorem is then the claim that the maximum size of an independent set in G is \binom{n-1}{r-1}. As we shall see below, this follows from the Hoffman bound, that an independent set in a d-regular graph on N vertices with adjacency matrix A and least eigenvalue -\lambda has size at most N \lambda / (d+\lambda). (Note that since the sum of the eigenvalues is the trace of the adjacency matrix, namely 0, and the largest eigenvalue is the degree d, the least eigenvalue is certainly negative.)

Representations

To find the eigenvalues of A, Ehud and his coauthors use the representation theory of the symmetric group. Let V be the permutation module of S_n acting on the r-subsets of \{1,\ldots, n\}. It is well known that V has character

\chi^{(n)} + \chi^{(n-1,1)} + \cdots + \chi^{(n-r,r)}.

Since this character is multiplicity-free, there is a unique direct sum decomposition of V into Specht modules,

V = S^{(n)} \oplus S^{(n-1,1)} \oplus \cdots \oplus S^{(n-r,r)}. \quad (\star)

The adjacency matrix A acts as a linear map on V that commutes with the action of S_n. Therefore, by Schur’s Lemma, for each s with 0\le s \le r there is a scalar \lambda(s) such that A acts on S^{(n-s,s)} as multiplication by \lambda(s). Thus the \lambda(s) are the eigenvalues (possibly with repetition, but this turns out not to be the case) of A.

Lemma. The adjacency matrix A acts on S^{(n-s,s)} by scalar multiplication by (-1)^s\binom{n-r-s}{r-s}.

Proof. It follows from the theory in Chapter 17 of James’ Springer lecture notes on the symmetric groups that V has a filtration

V = V_0 \supset V_1 \supset \ldots \supset V_r \supset 0

such that V_r \cong S^{(n-r,r)} and V_s / V_{s+1} \cong S^{(n-s,s)} for 0\le s < r. The submodule V_r is spanned by certain ‘partially symmetrized’ polytabloids. Rather than introduce this notation, which is a bit awkward for a blog post, let me instead suppose that S_n acts on the set \{1,\ldots, s, -1, \ldots, -s\} \cup Z where |Z| = n-2s. Let X be a fixed (r-s)-subset of Z. Then V_s is a cyclic \mathbb{C}S_n-submodule of V generated (using right modules) by v(s) where

v(s) = \bigl( \{1,\ldots, s\} \cup X\bigr) \bigl(1-(1,-1)\bigr) \ldots \bigl( 1- (s,-s) \bigr)

There is a ‘raising map’ R : V_s \rightarrow S^{(n-s,s)} which — viewing S^{(n-s,s)} as a submodule of the permutation module of S_n acting on s-subsets — is defined on the generator v(s) by

v(s) \mapsto  \{1,\ldots, s\}  \bigl(1-(1,-1)\bigr) \ldots \bigl( 1- (s,-s) \bigr)

Fix an r-set \{\pm 1, \ldots, \pm s\} \cup X appearing in the expansion of v(s) in the canonical basis of V. Applying the adjacency matrix A sends this r-set to all those r-sets that it does not meet. Hence, after applying A and then R we get a scalar multiple of v(s) precisely from those sets \{\mp 1, \ldots, \mp s\} \cup Y in the support of

\bigl( \{\pm 1, \ldots, \pm s\} \cup X \bigr) A

such that Y is an (r-s)-subset of \{1,\ldots, n\} not meeting

\{1,\ldots, s,-1,\ldots, -s\} \cup X.

There are 2s + (r-s) = r+s forbidden elements, hence there are \binom{n-(r+s)}{r-s} choices for Y. Each comes with sign (-1)^s. \Box
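The lemma is easy to test numerically. Assuming numpy is available, the Python sketch below builds the adjacency matrix of the Kneser graph directly and compares its spectrum with the eigenvalues (-1)^s\binom{n-r-s}{r-s} predicted by the lemma, each taken with multiplicity \dim S^{(n-s,s)} = \binom{n}{s} - \binom{n}{s-1}:

```
from itertools import combinations
from math import comb
import numpy as np

def kneser_spectrum(n, r):
    verts = [frozenset(c) for c in combinations(range(n), r)]
    A = np.array([[1.0 if u.isdisjoint(v) else 0.0 for v in verts]
                  for u in verts])
    return sorted(np.linalg.eigvalsh(A).round(6))

def predicted_spectrum(n, r):
    eigs = []
    for s in range(r + 1):
        mult = comb(n, s) - (comb(n, s - 1) if s > 0 else 0)  # dim S^{(n-s,s)}
        eigs += [(-1) ** s * comb(n - r - s, r - s)] * mult
    return sorted(eigs)

for n, r in [(5, 2), (6, 2), (7, 3)]:
    assert kneser_spectrum(n, r) == predicted_spectrum(n, r)
```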

End of proof

The lemma implies that \lambda(1) = -\binom{n-r-1}{r-1} is the least eigenvalue of A. Putting this into Hoffman’s bound and using that the graph G is regular of degree \binom{n-r}{r}, since this is the number of r-subsets of \{1,\ldots, n\} not meeting \{1,\ldots, r\}, we get that the maximum size of an independent set is at most

\begin{aligned} \frac{\binom{n}{r} \binom{n-r-1}{r-1}}{\binom{n-r}{r} + \binom{n-r-1}{r-1}}&= \frac{ \binom{n}{r}}{ \binom{n-r}{r}/\binom{n-r-1}{r-1} + 1} \\ &= \frac{ \binom{n}{r}}{ \frac{n-r}{r} + 1} \\ &= \textstyle \binom{n}{r} \frac{r}{n} \\&= \textstyle \binom{n-1}{r-1} \end{aligned}

which is exactly the Erdős–Ko–Rado bound. Thus we (or rather Ehud Friedgut and his coauthors) have proved the Erdős–Ko–Rado Theorem. Ehud then went on to outline how the method applies to Mantel’s Theorem: the relevant representation is, in the usual notation, M^{(n-3,2,1)}; note one may interpret an (n-3,2,1)-tabloid as an ordered pair \bigl( \{i, j\}, \{k \} \bigr) encoding the two-edge path with edges \{i,k\} and \{j,k\}. There is a proof of Mantel’s Theorem by double counting using these paths: as Ehud remarked (quoting, if I heard him right, Kalai), ‘It’s very hard not to prove Mantel’s Theorem’.

Possible variations

I’ll end with two possible avenues for generalization, both motivated by the classification of multiplicity-free permutation characters of symmetric groups. My paper built on work of Saxl and was published at about the same time as parallel work of Godsil and Meagher.

Foulkes module

The symmetric group S_{2m} acts on the set of set partitions of \{1,\ldots, 2m\} into m sets each of size 2. The corresponding permutation character is multiplicity-free: it has the attractive decomposition \sum_{\lambda \in \mathrm{Par}(m)} \chi^{2\lambda}, where 2\lambda is the partition obtained from \lambda by doubling each part. Moreover, my collaborator Rowena Paget has given an explicit Specht filtration of the corresponding permutation module, analogous to the filtration of M^{(n-r,r)} used above. So everything is in place to prove a version of the Erdős–Ko–Rado Theorem for perfect matchings. Update. I am grateful to Ehud for pointing out that this has already been done (using slightly different methods) by his coauthor Nathan Lindzey.

Finite general linear groups

There is a remarkably close connection between the representations of S_n and the unipotent representations of the finite general linear group \mathrm{GL}_n(\mathbb{F}_q). Perhaps, one day, this will be explained by defining the symmetric group as a group of automorphisms of a vector space over the field with one element. The analogue of the permutation module M^{(n-r,r)} of S_n acting on r-subsets is the permutation module of \mathrm{GL}_n(\mathbb{F}_q) acting on r-subspaces of \mathbb{F}_q^n. This decomposes as a direct sum of irreducible submodules exactly as in (\star) above, replacing the Specht modules with their \mathrm{GL}_n(\mathbb{F}_q)-analogues. So once again the decomposition is multiplicity-free. This suggests there might well be a linear analogue of the Erdős–Ko–Rado Theorem, in which we aim to maximize the size of a family of r-dimensional subspaces of \mathbb{F}_q^n such that any two subspaces have at least a 1-dimensional intersection.

Update. The Erdős–Ko–Rado Theorem for vector spaces was proved in an aptly named paper by Frankl and Wilson, in the general version for t-intersecting families. The special case t=1 is that the maximum number of r-dimensional subspaces of \mathbb{F}_q^n such that any two subspaces have at least a 1-dimensional intersection is \binom{n-1}{r-1}_q for n \ge 2r, where the q-binomial coefficient is the number of (r-1)-dimensional subspaces of \mathbb{F}_q^{n-1}. Thus the maximum is attained by taking all the subspaces containing a fixed line, in precise analogy with the case for sets. (Note also that if n \le 2r-1 then any two r-dimensional subspaces have a non-zero intersection.)

James wrote a second set of lecture notes, Representations of general linear groups, pointing out several times the remarkable analogy with the representation theory of the symmetric group. At the time he wrote, before his work with Dipper on Hecke algebras, this seemed somewhat mysterious. (Very roughly, FS_n is to \mathrm{GL}_d(F) where F is an infinite field as the Hecke algebra \mathcal{H}_q(S_n) is to \mathrm{GL}_d(\mathbb{F}_q).) Chapter 13 of his book has, I think, exactly what we need to prove the analogue of the lemma above. Update: I also just noticed that Section 8.6 of Harmonic analysis on finite groups by Ceccherini-Silberstein, Scarabotti and Tolli, titled ‘The q-Johnson scheme’, is entirely devoted to the Gelfand pair (\mathrm{GL}_d(\mathbb{F}_q), H_e) where H_e is the maximal parabolic subgroup stabilising a chosen e-dimensional subspace.

Here is the setting for the general linear group. Fix r \le n/2 and let M^{(r)} be the permutation module, defined over the complex numbers, of \mathrm{GL}_n(\mathbb{F}_q) acting on r-dimensional subspaces of \mathbb{F}_q^n. For 0 \le s < r, let \psi_s : M^{(r)} \rightarrow M^{(s)} be the map sending a subspace to the formal linear combination of all s-dimensional subspaces that it contains. The \mathrm{GL}_n(\mathbb{F}_q)-Specht module S^{(n-r,r)} is then the intersection of the kernels of all the \psi_s for s < r. Moreover if we set

\displaystyle M_s = \bigcap_{i=0}^{s-1} \ker \psi_i \subseteq M^{(r)}

then M^{(r)} = M_0 \supset M_1 \supset \cdots \supset M_r and M_r \cong S^{(n-r,r)} and M_s / M_{s+1} \cong S^{(n-s,s)} for 0 \le s < r.

The relevant graph, again denoted G, has vertices all r-dimensional subspaces of \mathbb{F}_q^n; two subspaces are adjacent if and only if their intersection is \{0\}.

Lemma. The degree of G is \binom{n-r}{r}_q q^{r^2}.

Proof. Fix a basis e_1, \ldots, e_n for \mathbb{F}_q^n and let t \le n-r. An arbitrary t-dimensional subspace of \mathbb{F}_q^n intersecting \langle e_1, \ldots, e_r\rangle in \{0\} has a unique basis of the form

v_1 + w_1, \ldots, v_t + w_t

where

v_1, \ldots, v_t \in \langle e_1, \ldots, e_r \rangle

and w_1, \ldots, w_t is the reduced row-echelon basis, with pivot columns i_1 < \ldots < i_t satisfying r+1 \le i_1 and i_t \le n, of a t-dimensional subspace of \langle e_{r+1}, \ldots, e_n \rangle. These choices of 'pivot columns' and matrix coefficients uniquely determine the subspace. Hence there are q^{rt} \binom{n-r}{t}_q such subspaces. Taking t = r we get the result. \Box
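For what it is worth, the count can be confirmed by brute force over \mathbb{F}_2 for small n. The Python sketch below (representing vectors as integer bitmasks, conventions mine) enumerates the 2-dimensional subspaces of \mathbb{F}_2^4 and counts those meeting a fixed 2-dimensional subspace trivially:

```
from itertools import combinations

def span(vecs):
    # the F_2-span of the given vectors, as a frozenset of bitmasks
    s = {0}
    for v in vecs:
        s |= {x ^ v for x in s}
    return frozenset(s)

def grassmannian(n, k):
    # all k-dimensional subspaces of F_2^n
    out = set()
    for vecs in combinations(range(1, 2 ** n), k):
        U = span(vecs)
        if len(U) == 2 ** k:
            out.add(U)
    return out

n, r = 4, 2
W = span([0b0001, 0b0010])
count = sum(1 for U in grassmannian(n, r) if U & W == {0})
print(count)   # 16 = q^{r^2} binom(n-r, r)_q with q = 2, n = 4 and r = 2
```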

I have checked the following claim for n \le 7 when q=2 and n \le 5 when q=3 by computer calculations. Note also that the q-binomial factor \binom{n-(r+s)}{r-s}_q counts, up to a power of q, the number of (r-s)-dimensional complements in \mathbb{F}_q^n to a fixed subspace of dimension r+s, in good analogy with the symmetric group case. But it seems some careful analysis of reduced row-echelon forms (like the argument for the degree of G above, but too hard for me to do in an evening) will be needed to nail down the power of q. Examples 11.7 in James' lecture notes seem helpful.

Claim. The adjacency matrix of the graph acts on S^{(n-s,s)} by scalar multiplication by (-1)^s q^{r^2 - rs + \binom{s}{2}}\binom{n-(r+s)}{r-s}_q.

Assuming the claim, the least eigenvalue is the one for s=1, namely -q^{r^2-r} \binom{n-r-1}{r-1}_q. Putting this into Hoffman’s bound, and using that the subspaces graph is regular of degree \binom{n-r}{r}_q q^{r^2} as seen above, we get that the maximum size of an independent set is at most

\begin{aligned} \frac{\binom{n}{r}_q q^{r^2-r} \binom{n-r-1}{r-1}_q}{q^{r^2}\binom{n-r}{r}_q + q^{r^2-r}\binom{n-r-1}{r-1}_q} &= \frac{ \binom{n}{r}_q}{ q^r\binom{n-r}{r}_q/\binom{n-r-1}{r-1}_q + 1} \\ &= \frac{ \binom{n}{r}_q}{ q^r\frac{q^{n-r}-1}{q^r-1} + 1} \\ &= \textstyle \binom{n}{r}_q \frac{q^r-1}{q^n-1} \\&= \textstyle \binom{n-1}{r-1}_q \end{aligned}

which, somewhat spectacularly I feel, again comes out on the nose.

A Putnam problem

The second generalization ties in with another project I’ve worked on but neglected to write up beyond a blog post: to prove the \mathrm{GL}_n(\mathbb{F}_q)-analogue of the spectral results on the Kneser graph motivated by Problem A2 in the 2018 Putnam.


Commuting group actions

April 28, 2023

In Schur–Weyl duality one has commuting actions of two algebras A and B on a vector space V such that each is the other’s centralizer. That is, \mathrm{End}_A(V) = B and A = \mathrm{End}_B(V). In particular the actions of A and B on V commute. Using a left-action for A and a right-action for B, this says that

(av)b = a(vb)

for all a \in A, b \in B and v \in V, a fact one would surely expect from the notation. The purpose of this post is to record a lemma about commuting group actions that arose in this context.

Let G and H be finite groups with G acting on the left on \Omega and H acting on the right on \Omega, such that the actions commute. For a simple example where G and H may be non-abelian, fix a group G and take \Omega = G = H. Then, for each x \in \Omega, we have

(gx)h = g(xh)

by the associativity of group multiplication. The slogan ‘action on places commutes with action on values’ gives another family of examples, typified by \Omega = \{1,\ldots, b\}^a with G = S_a acting on the left on \Omega by position permutation, and H = S_b acting on the right, diagonally on each of the a factors, by permuting values. For instance, if a = 4 and b = 2 then one joint orbit is

\{(1,1,2,2), (1,2,1,2), (2,1,1,2), (1,2,2,1), (2,1,2,1), (2,2,1,1) \}.

The orbit has size 6 so we expect the stabiliser of points in the orbit to have order 4! 2! / 6 = 8; a little thought shows that the stabiliser of (1,1,2,2) is

\langle (1,2)_G, (3,4)_G, (1,3)(2,4)_G (1,2)_H \rangle \cong C_2 \wr C_2.

(Here group elements are tagged by the group they live in.) Note the ‘diagonally embedded’ element (1,3)(2,4)_G (1,2)_H: even though we are working in G \times H, it is not the case that the stabiliser factors as \mathrm{Stab}_G(\omega)\mathrm{Stab}_H(\omega).
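The example is small enough to check exhaustively. Here is a quick Python sketch (conventions mine: positions are permuted first, then values) confirming that the joint orbit of (1,1,2,2) has size 6 and the joint stabiliser has order 8:

```
from itertools import permutations

omega = (1, 1, 2, 2)
G = list(permutations(range(4)))     # S_4, permuting positions
H = list(permutations((1, 2)))       # S_2, permuting the values 1 and 2

def act(w, g, h):
    # act by g on positions, then by h on values
    value = {1: h[0], 2: h[1]}
    return tuple(value[w[g[i]]] for i in range(4))

orbit = {act(omega, g, h) for g in G for h in H}
stab = [(g, h) for g in G for h in H if act(omega, g, h) == omega]
print(len(orbit), len(stab))         # 6 and 8, and 6 x 8 = |S_4 x S_2| = 48
```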

Since there is no established notation for the joint stabiliser in a simultaneous left- and right- action (in fact such a notation would probably just be a standing invitation to commit the diagonal fallacy), I’m going to switch to two commuting faithful right-actions, thought of as a right action of the Cartesian product G \times H, in which, by definition, the two factors commute. Since letters will make it clear which group contains each element, I’ll write gh rather than the formally correct (g, h).

Proposition.

Let G and H be finite groups. Suppose that G \times H acts transitively on a set \Omega. Fix \omega \in \Omega.

  1. The orbits of G on \Omega are blocks for the action of H.
  2. Every point in the G-orbit of \omega has the same H-stabiliser.
  3. We have G\mathrm{Stab}_{G \times H}(\omega) = G\mathrm{Stab}_H(\omega G) where \mathrm{Stab}_H(\omega G) is the subgroup of those h \in H fixing setwise the orbit \omega G. Moreover G is transitive on \Omega if and only if both these groups are G \times H.
  4. There is an isomorphism of right \mathbb{C}G-modules \mathbb{C}(\omega G) \cong \mathbb{C}\bigl\uparrow_{\mathrm{Stab}_G(\omega)}^G.
  5. There are isomorphisms of right \mathbb{C}(G \times H)-modules\begin{aligned}\mathbb{C}\Omega &\cong \mathbb{C}\bigl\uparrow_{\mathrm{Stab}_{G \times H} (\omega)}^{G \times H} \\ &\cong \mathbb{C}\bigl\uparrow_{\mathrm{Stab}_G(\omega)}^G \bigl\uparrow_{G\mathrm{Stab}_{G \times H}(\omega)}^{G \times H} \\ &\cong \mathbb{C}\bigl\uparrow_{\mathrm{Stab}_G(\omega)}^G \bigl\uparrow_{G\mathrm{Stab}_{H}(\omega G)}^{G \times H}\end{aligned} where we use (3) to regard \mathbb{C}\bigl\uparrow_{\mathrm{Stab}_G(\omega)}^G \cong \mathbb{C}(\omega G) as a G \mathrm{Stab}_{G \times H}(\omega)-module in the second line and as a G \mathrm{Stab}_H(\omega G)-module in the third line.

Proof.

  1. More generally, if N \le K is a normal subgroup of a permutation group K then the orbits of N are blocks for the action of K, and for the action of any subgroup of K.
  2. Points in the same orbit have conjugate stabilisers, but here the conjugation action is trivial. Or, to see it very concretely, suppose that \omega h = \omega. Then (\omega g)h = (\omega h)g = \omega g. This shows that \mathrm{Stab}_H(\omega) \le \mathrm{Stab}_H(\omega g) and by symmetry equality holds.
  3. The left-hand side is the set of cosets Gh such that \omega g h = \omega for some g \in G. The right-hand side is the set of cosets Gh such that \omega h = \omega g' for some g' \in G. These are equivalent conditions: take g' = g^{-1} and use that the actions commute. Finally, identifying \Omega with the coset space (G \times H)/\mathrm{Stab}_{G \times H}(\omega) we see that G is transitive if and only if \mathrm{Stab}_{G\times H}(\omega) G = GH. Since G is a normal subgroup of G \times H, we may switch the order to get G \mathrm{Stab}_{G\times H}(\omega) = G \times H, as required.
  4. This is very basic. A nice proof uses the characterisation of induced modules that V \cong U\uparrow_L^K if and only if V\downarrow_L has a submodule isomorphic to U and \dim V = [K : L]\dim U. In this case we take U to be the right \mathbb{C}\mathrm{Stab}_G(\omega)-module \langle \omega \rangle \cong \mathbb{C}.
  5. The first isomorphism follows as in (4) using that G \times H acts transitively on \Omega. The second follows from the third by (3). Finally the third isomorphism can be proved similarly using the characterisation of induced modules. \Box

Note that each part has a dual result swapping G and H. Since there is a reasonable notation for bimodule induction, using, as above, uparrows on the right, and writing \mathrm{Ind} for induction on the left, we can state the following corollary with left- and right-actions.

Corollary.

Let G and H be finite groups with G acting on the left on \Omega and H acting on the right on \Omega, such that the actions commute. Then \mathbb{C}\Omega is a left G- and right H-bimodule and, given any \omega \in \Omega, we have

\begin{aligned} \mathbb{C}\Omega &\cong \bigl( \mathrm{Ind}_{\mathrm{Stab}_G(\omega)}^G \mathbb{C} \bigr)\bigl\uparrow_{G \mathrm{Stab}_H(G\omega)}^{G \times H} \\ &\cong  \mathrm{Ind}_{\mathrm{Stab}_G(\omega H)H}^{G \times H} \bigl(\mathbb{C} \bigl\uparrow_{\mathrm{Stab}_H(\omega)}^H \bigr)\end{aligned}

Proof.

This is immediate from (5) in the proposition and its dual version.\Box

It is interesting to note that if, as in the example below, H is transitive, then the effect of the left induction functor in the isomorphism in the second line is simply to turn a right-module into a bimodule, without changing its dimension or its right-module structure in any way.

Example: permuting matchings.

To finish, here is one of a family of interesting examples that come from taking two wreath products whose top groups are equal (as abstract groups) but whose base groups are different. Let

\begin{aligned} G &= \langle (\overline{1},\overline{2}), (\overline{4},\overline{5}) \rangle \rtimes \langle (\overline{1},\overline{4})(\overline{2},\overline{5}) \rangle \\ H &= (S_{\{1,2,3\}} \times S_{\{4,5,6\}}) \rtimes \langle (1,4)(2,5)(3,6) \rangle. \end{aligned}

The groups G and H act on the set \Omega of four-edge matchings between the parts \{ \overline{1},\overline{2},\overline{4},\overline{5} \} and \{1,2,3,4,5,6\} that respect the blocks \{\overline{1},\overline{2} \}, \{\overline{4}, \overline{5}\}, \{1,2,3\}, \{4,5,6\} shown by the rooted trees in the graph below. The matching \omega shown by black lines is \{ (\overline{1},1), (\overline{2},2), (\overline{4}, 4), (\overline{5}, 5) \}.

[Image: the blocks and the matching \omega]

It is clear that the actions of G and H commute. Moreover, it is an easy exercise to show that |\Omega| = 72 and that both G and H act freely; since |H| = 72 = |\Omega|, it follows that H acts regularly. We have

(\overline{1}, \overline{2}) \omega = \{ (\overline{2},1), (\overline{1},2), (\overline{4}, 4), (\overline{5}, 5) \}

and so (\overline{1}, \overline{2}) \omega (1,2) = \omega. This is shown graphically below.

[Image: the relation (\overline{1}, \overline{2}) \omega (1,2) = \omega shown graphically]

Similarly (\overline{1},\overline{4})(\overline{2},\overline{5}) \omega (1,4)(2,5)(3,6) = \omega. On the other hand, the permutation (2,3) \in H cannot be undone by a permutation in G. In fact, switching to right-actions for both groups, as in the proposition, we have

\begin{aligned} \mathrm{Stab}_{G \times H}(\omega) &= \langle (\overline{1}, \overline{2})(1,2), (\overline{1},\overline{4})(\overline{2},\overline{5})(1,4)(2,5) \rangle \\ &\cong S_2 \wr S_2.\end{aligned}

Note that, as predicted by the dual version of (3) in the proposition, since H is transitive, the stabiliser meets every coset of G in G \times H, and \mathrm{Stab}_{G \times H}(\omega)H = G \times H.


Scrapbook on plethysms

October 6, 2022

I’m visiting OIST in Okinawa, Japan for 6 weeks, in which I plan to work on some of the problems in the (over-)extended research summary I wrote for the Oberwolfach meeting in September. The purpose of this post is to collect some bits and pieces maybe relevant to these problems, or that come up in my reading.

I was briefly tempted to title this post ‘tropical plethysms’, but then it occurred to me that perhaps the idea was not completely absurd: as a start, is there such a thing as a tropical symmetric function?

A generalized deflations recurrence

For this post, let us say that a skew partition \lambda / \mu of \ell m is a horizontal \ell-border strip if there is a border-strip tableau T of shape \lambda / \mu comprising m disjoint \ell-hooks, such that these hooks can be removed working right-to-left in the Young diagram. It is an exercise (see for instance the introduction to my paper on the plethystic Murnaghan–Nakayama rule) to show that at most one such T exists. We define the \ell-sign of the skew partition, denoted \mathrm{sgn}_\ell(\lambda/\mu), to be 0 if \lambda / \mu is not a horizontal \ell-border strip, and otherwise to be (-1)^k where k is the sum of the leg lengths in T. Thus a skew partition is a horizontal 1-border strip if and only if it is a horizontal strip in the usual sense of Young’s rule, a skew partition of \ell is a horizontal \ell-border strip if and only if it is an \ell-hook (and then its \ell-sign is the usual sign), and

\mathrm{sgn}_3\bigl((7,5,4)/(4,3)\bigr) = (-1)^{0+1+0} = -1,

as shown by the tableau below:

[Image: the border-strip tableau for (7,5,4)/(4,3)]

Proposition 5.1 in this joint paper with Anton Evseev and Rowena Paget, restated in the language of symmetric functions, is the following recurrence for the plethysm multiplicities relevant to Foulkes’ Conjecture:

\begin{aligned}\langle s_{(n)}& \circ s_{(m)}, s_\lambda \rangle = \\ &\frac{1}{n} \sum_{\ell=1}^n \sum_{\alpha \in \mathrm{Par}((n-\ell) m)} \mathrm{sgn}_\ell(\lambda / \alpha)\langle s_{(n-\ell)} \circ s_{(m)}, s_\alpha \rangle.\end{aligned}

Our proof of Proposition 5.1 uses the theory of character deflations, as developed earlier in the paper, together with Frobenius reciprocity.
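As a small worked example (mine, for reassurance), take n = m = 2, so that s_{(2)} \circ s_{(2)} = s_{(4)} + s_{(2,2)}. When \ell = 1 the inner product \langle s_{(1)} \circ s_{(2)}, s_\alpha \rangle forces \alpha = (2), and when \ell = 2 it forces \alpha = \varnothing, so the recurrence reads

\displaystyle \langle s_{(2)} \circ s_{(2)}, s_\lambda \rangle = \tfrac{1}{2}\bigl( \mathrm{sgn}_1(\lambda/(2)) + \mathrm{sgn}_2(\lambda) \bigr).

For \lambda = (4) and \lambda = (2,2) both signs are +1, giving multiplicity 1; for \lambda = (3,1) the signs are +1 and -1, giving multiplicity 0, in agreement with the decomposition above.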

Generalizations of the Foulkes recurrence

My former Ph.D. student Jasdeep Kochhar used character deflations to prove a considerable generalization, in which (n) is replaced with an arbitrary partition, and (m) with an arbitrary hook partition. I think the only reason he stopped at hook partitions was that this was the only case where there was a convenient combinatorial interpretation of a certain inner product (see the end of this subsection), because his argument easily generalizes to show that

\begin{aligned}\langle s_{\nu}& \circ s_{\mu}, s_\lambda \rangle = \\ &\frac{1}{n} \sum_{\ell=1}^n \sum_{\alpha} \sum_\beta \langle p_\ell \circ s_\mu, s_{\lambda/\alpha} \rangle \mathrm{sgn}_\ell(\nu/\beta) \langle s_{\beta} \circ s_{\mu}, s_\alpha \rangle \end{aligned}

where \mu is any partition of m, the first sum is over all \alpha \in \mathrm{Par}\bigl( (n-\ell)m \bigr) (as before) and the second sum is over all \beta \in \mathrm{Par}(n-\ell). Since a one-part partition has a unique \ell-hook, if \nu = (n) then the only relevant \beta is (n-\ell). Hence a special case of Kochhar’s result, that generalizes the original result in only one direction, is

\displaystyle \langle s_{(n)} \circ s_{\mu}, s_\lambda \rangle = \frac{1}{n} \sum_{\ell=1}^n \sum_{\alpha} \langle p_\ell \circ s_\mu, s_{\lambda/\alpha} \rangle \langle s_{(n-\ell)} \circ s_{\mu}, s_\alpha \rangle

with the same condition on the sum over \alpha. In the special case where \mu = (m), the plethystic Murnaghan–Nakayama rule states that

\langle p_\ell \circ s_{(m)}, s_{\lambda/\alpha} \rangle = \mathrm{sgn}_\ell(\lambda/\alpha)

and substituting appropriately we recover the original result. More generally, Theorem 6.3 in the joint paper implies that if \mu is a hook partition then \langle p_\ell \circ s_\mu, s_{\lambda/\alpha} \rangle is the product of \mathrm{sgn}_\ell(\lambda / \alpha) and the size of a certain set of \lambda/\alpha border-strip tableaux in which all the border strips have length \ell.

Kochhar’s recurrence can be generalized still further, replacing \nu and \mu with skew partitions \nu / \nu^\star and \mu / \mu^\star. Below we will prove (\star):

\begin{aligned}\langle &s_{\nu/\nu^\star} \circ s_{\mu/\mu^\star}, s_\lambda \rangle = \\ &\!\frac{1}{n} \!\sum_{\ell=1}^n \sum_{\alpha} \sum_\beta \langle p_\ell \circ s_{\mu/\mu^\star}, s_{\lambda/\alpha} \rangle \mathrm{sgn}_\ell(\nu/\beta) \langle s_{\beta/\nu^\star} \circ s_{\mu/\mu^\star}, s_\alpha \rangle \end{aligned}

where \nu/\nu^\star is a skew partition of n, and the sums are over all \alpha \in \mathrm{Par}\bigl( (n-\ell)m \bigr) and \beta \in \mathrm{Par}(n-\ell) such that \lambda/\alpha, \nu/\beta and \beta/\nu^\star are skew partitions.

Preliminaries for a symmetric functions proof of (\star)

We will use several times that if f and g are symmetric functions of degrees a and b, and \nu is a partition of a+b then

\langle s_\nu, fg \rangle = \sum_{\beta \in \mathrm{Par}(b)} \langle s_{\nu / \beta}, f \rangle \langle s_\beta, g \rangle.

One nice proof uses that the coproduct \Delta on the ring of symmetric functions satisfies \Delta s_\nu = \sum_{\beta \in \mathrm{Par}(b)} s_{\nu / \beta} \otimes s_\beta and the general fact

\langle h, fg \rangle = \langle \Delta h, f \otimes g \rangle,

expressing that the coproduct is the dual of multiplication — here one must think of multiplication as the linear map m defined by m (f \otimes g) = fg.

Let p_\gamma be the power sum symmetric function labelled by the partition \gamma. The expansion of an arbitrary homogeneous symmetric function f of degree n in the power sum basis is given by

\displaystyle f = \sum_{\gamma \in \mathrm{Par}(n)} \langle f, p_\gamma \rangle \frac{p_\gamma}{z_\gamma}

where z_\gamma is the size of the centralizer of an element \sigma_\gamma of cycle type \gamma in the symmetric group S_n. (This is in fact the most useful definition for our purposes, but there is also the explicit formula z_\gamma = \prod_{i=1}^n i^{a_i} a_i!, where a_i is the number of parts of size i in \gamma, or equivalently, the number of i-cycles in \sigma_\gamma.) The symmetric functions version of the Murnaghan–Nakayama rule is

\langle s_{\nu/\beta}, p_\gamma \rangle = \chi^{\nu/\beta}(\sigma_\gamma)

where \chi^{\nu/\beta} is the symmetric group character canonically labelled by the skew partition \nu/\beta. Thus

\displaystyle s_\nu = \sum_{\gamma \in \mathrm{Par}(n)} \chi^\nu(\sigma_\gamma) \frac{p_\gamma}{z_\gamma}

expresses a general Schur function in the power sum basis. More typically the Murnaghan–Nakayama rule is applied inductively by repeatedly removing hooks: if \tau \in S_{n-\ell} then, the coproduct relation implies that

\displaystyle \langle s_{\nu/\nu^\star}, p_\ell p_{\mathrm{cyc}(\tau)} \rangle = \sum_{\beta \in \mathrm{Par}(n-\ell)} \langle s_{\nu/\beta}, p_\ell \rangle \langle s_{\beta/\nu^\star}, p_{\mathrm{cyc}(\tau)} \rangle

and hence, interpreting each side as a character value, we get the skew generalization of the familiar relation

\displaystyle \chi^{\nu/\nu^\star}(\pi \tau) = \sum_{\beta \in \mathrm{Par}(n-\ell)} \mathrm{sgn}_\ell(\nu/\beta) \chi^{\beta/\nu^\star}(\tau)

where \pi is an \ell-cycle disjoint from \tau.
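As an aside, the normalization by z_\gamma is easy to get wrong in computations. The Python sketch below (a sanity check of mine) computes z_\gamma from the explicit formula and verifies, for n = 4, that the conjugacy class sizes n!/z_\gamma sum to n!:

```
from math import factorial
from collections import Counter

def z(gamma):
    # z_gamma = prod_i i^{a_i} a_i!, where a_i is the number of parts of size i
    out = 1
    for i, a in Counter(gamma).items():
        out *= i ** a * factorial(a)
    return out

# the class of cycle type gamma in S_n has size n!/z_gamma, so these must sum to n!
partitions_of_4 = [(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]
assert sum(factorial(4) // z(g) for g in partitions_of_4) == factorial(4)
```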

A symmetric functions proof of (\star)

Expressing s_{\nu/\nu^\star} in the power sum basis we get

\begin{aligned}\langle s_{\nu/\nu^\star} \circ s_{\mu/\mu^\star}, s_\lambda \rangle &= \bigl\langle \sum_{\gamma \in \mathrm{Par}(n)} \chi^{\nu/\nu^\star}(\sigma_\gamma) \frac{p_\gamma}{z_\gamma} \circ s_{\mu/\mu^\star}, s_\lambda \bigr\rangle \\ &= \sum_{\gamma \in \mathrm{Par}(n)} \frac{1}{z_\gamma} \chi^{\nu/\nu^\star}(\sigma_\gamma) \langle p_\gamma \circ s_{\mu/\mu^\star}, s_\lambda \rangle \\ &= \frac{1}{n!} \sum_{\sigma \in S_n} \chi^{\nu/\nu^\star}(\sigma) \langle p_{\mathrm{cyc}(\sigma)}\circ s_{\mu/\mu^\star}, s_\lambda \rangle. \end{aligned}

To continue we proceed as in both Kochhar’s proof and the proof of the original Foulkesian recurrence by splitting up the sum according to the length of the cycle of \sigma containing 1. There are

(n-1)\ldots (n-\ell+1)

ways to choose elements a_2, \ldots, a_\ell \in \{1,\ldots, n\} to define an \ell-cycle \pi containing 1; we must then choose a permutation \tau of the remaining n-\ell elements. We saw in the preliminaries that, by the Murnaghan–Nakayama rule, if \pi is an \ell-cycle then \chi^{\nu/\nu^\star}(\pi \tau) = \sum_{\beta \in \mathrm{Par}(n-\ell)} \mathrm{sgn}_\ell(\nu/\beta) \chi^{\beta/\nu^\star}(\tau). Hence the right-hand side displayed above is

\begin{aligned} \frac{1}{n} \sum_{\ell=1}^n \frac{1}{(n-\ell)!} \sum_{\tau \in S_{n-\ell}} \sum_{\beta \in \mathrm{Par}(n-\ell)} &\mathrm{sgn}_\ell(\nu/\beta) \chi^{\beta/\nu^\star}(\tau) \\ & \langle p_\ell p_{\mathrm{cyc}(\tau)} \circ s_{\mu/\mu^\star}, s_\lambda \rangle. \end{aligned}

We now use that (fg)\circ k = (f \circ k)(g \circ k) for any symmetric functions f, g, k (this is clear for k with positive integer monomial coefficients from the interpretation of plethysm as ‘substitute monomials for variables’) and an application of the coproduct relation from the preliminaries to get

\begin{aligned} \displaystyle \frac{1}{n}& \sum_{\ell=1}^n \sum_{\alpha \in \mathrm{Par}((n-\ell)m)} \sum_{\beta \in \mathrm{Par}(n-\ell)} \mathrm{sgn}_\ell(\nu/\beta) \langle p_\ell \circ s_{\mu/\mu^\star}, s_{\lambda/\alpha}\rangle \\ & \quad \frac{1}{(n-\ell)!} \sum_{\tau \in S_{n-\ell}} \chi^{\beta/\nu^\star}(\tau) \langle p_{\mathrm{cyc}(\tau)} \circ s_{\mu/\mu^\star}, s_\alpha \rangle.\end{aligned}

Repeating the first steps of the proof in reverse order, now for n-\ell in place of n, this becomes

\displaystyle \frac{1}{n} \sum_{\ell=1}^n \sum_{\alpha} \sum_{\beta} \mathrm{sgn}_\ell(\nu/\beta) \langle p_\ell \circ s_{\mu/\mu^\star}, s_{\lambda/\alpha} \rangle \langle s_{\beta/\nu^\star} \circ s_{\mu/\mu^\star}, s_\alpha \rangle

where the sums are as before. This completes the proof.

Stability of Foulkes Coefficients

I’m in the process of typing up a joint paper with my collaborator Rowena Paget where we prove a number of stability results on plethysm coefficients. Here I’ll show the method using the plethysm s_{(n)} \circ s_{(m)} relevant to Foulkes’ Conjecture.

Let \mathrm{SSYT}(\mu) be the set of semistandard tableaux of shape \mu. This set is totally ordered by setting s < t if and only if, in the rightmost column in which s and t differ, the larger entry occurs in t rather than s. Identifying semistandard tableaux of shape (1^m) with m-subsets of \mathbb{N}, this order becomes the colexicographic order on sets; similarly identifying semistandard tableaux of shape (m) with m-multisubsets of \mathbb{N}, it becomes the colexicographic order on multisets. An initial segment of < when \mu = (3) is shown below.

Image
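As a quick illustration, here is a minimal Python sketch (the code and names are mine) that reproduces this initial segment: tableaux of shape (3) are identified with weakly increasing triples, and the colexicographic order compares entries from the right.

```python
from itertools import combinations_with_replacement

# Semistandard tableaux of shape (3) are weakly increasing triples,
# i.e. 3-multisubsets of {1, 2, ...}. In the colexicographic order two
# triples are compared at the rightmost entry in which they differ,
# which for weakly increasing tuples is comparison of reversed tuples.
triples = combinations_with_replacement(range(1, 5), 3)
for t in sorted(triples, key=lambda t: tuple(reversed(t)))[:10]:
    print(t)
# (1, 1, 1), (1, 1, 2), (1, 2, 2), (2, 2, 2), (1, 1, 3), ...
```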

Plethystic semistandard tableaux

We can now give the key definition.

Definition. Let \nu be a partition of n and let \mu be a partition of m. A plethystic semistandard tableau of shape (\nu, \mu) is a \nu-tableau with entries from the set \mathrm{SSYT}(\mu), such that the \mu-tableau entries are weakly increasing along the rows and strictly increasing down the columns, with respect to the total order <.

Definition. The weight of a plethystic semistandard tableau is the sum of the weights of its \mu-tableau entries.

Let \mathrm{PSSYT}(\nu, \mu)_\lambda denote the set of plethystic semistandard tableaux of shape (\nu, \mu) and weight \lambda. For example, the three elements of \mathrm{PSSYT}\bigl( (2), (3) \bigr)_{(3,2,1)} are shown below.

Image

It is a nice exercise to show that the set \mathrm{PSSYT}\bigl( (n), (m) \bigr)_{(nm-r,r)} is in bijection with the set of partitions of r contained in an n \times m box, by the map sending a partition \gamma of r to the plethystic semistandard tableau whose ith largest (m)-tableau entry has exactly \gamma_i entries equal to 2 (the rest must then be 1).
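Readers who want to check this exercise by brute force could use something like the following Python sketch; the helper names are mine, and the search simply encodes the definition of a plethystic semistandard tableau of shape ((n),(m)) whose entries involve only 1s and 2s.

```python
from itertools import combinations_with_replacement

def boxed_partitions(r, rows, cols):
    """Partitions of r with at most `rows` parts, each part at most `cols`."""
    def gen(remaining, max_part, parts_left):
        if remaining == 0:
            yield ()
            return
        if parts_left == 0:
            return
        for p in range(min(remaining, max_part), 0, -1):
            for rest in gen(remaining - p, p, parts_left - 1):
                yield (p,) + rest
    return list(gen(r, cols, rows))

def pssyt_two_row_count(n, m, r):
    """|PSSYT((n),(m))_{(nm-r,r)}|: weakly colex-increasing n-tuples of
    m-multisets with entries 1, 2 and r entries equal to 2 in total."""
    entries = sorted(combinations_with_replacement([1, 2], m),
                     key=lambda t: tuple(reversed(t)))  # colex order
    twos = [e.count(2) for e in entries]
    count = 0
    def rec(pos, start, remaining):
        nonlocal count
        if pos == n:
            count += remaining == 0
            return
        for i in range(start, len(entries)):
            if twos[i] <= remaining:
                rec(pos + 1, i, remaining - twos[i])
    rec(0, 0, r)
    return count

n, m, r = 3, 4, 5
print(pssyt_two_row_count(n, m, r), len(boxed_partitions(r, n, m)))  # 4 4
```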

Monomial coefficients in plethysms

The Schur function s_\lambda enumerates semistandard \lambda-tableaux by their weight. Working with variables x_1, x_2, \ldots and writing, as is common, x^t for x_1^{\mathrm{wt}(t)_1} x_2^{\mathrm{wt}(t)_2} \ldots , we have

\displaystyle s_\lambda(x_1,x_2,\ldots ) = \sum_{t \in \mathrm{SSYT}(\lambda)} x^t.

In close analogy, the plethysm s_\nu \circ s_\mu enumerates plethystic semistandard tableaux of shape (\nu, \mu) by their weight. With the analogous definition of x^T for T a plethystic semistandard tableau, we have

\displaystyle (s_\nu \circ s_\mu)(x_1, x_2, \ldots ) = \sum_{T \in \mathrm{PSSYT}(\nu, \mu)}x^T.

Let \mathrm{mon}_\lambda denote the monomial symmetric function labelled by the partition \lambda. (We avoid the usual notation m_\lambda since m is in use as the size of \mu.) For instance \mathrm{mon}_{(3,1)}(x_1,x_2,\ldots) = x_1^3x_2 + x_1x_2^3 + x_1^3x_3 + \cdots .

It is immediate from the previous displayed equation that the coefficient of the monomial symmetric function \mathrm{mon}_\lambda in s_\nu \circ s_\mu is |\mathrm{PSSYT}(\nu,\mu)_\lambda|. Moreover, by the duality

\displaystyle \langle \mathrm{mon}_\lambda, h_\alpha \rangle = [\lambda = \alpha]

(stated using an Iverson bracket) between the monomial symmetric functions and the complete homogeneous symmetric functions h_\alpha, we have

\langle s_\nu \circ s_\mu, h_\lambda \rangle = |\mathrm{PSSYT}(\nu,\mu)_\lambda| \qquad (\dagger).

This equation (\dagger) is the critical bridge we need from the combinatorics of plethystic semistandard tableaux to the decomposition of the plethysm s_\nu \circ s_\mu into Schur functions.

Two row constituents in the Foulkes plethysm

As an immediate example, we use the exercise suggested above to find the multiplicities \langle s_{(n)} \circ s_{(m)}, s_{(mn-r,r)} \rangle. We require the following lemma.

Lemma. If r \ge 1 then s_{(mn-r,r)} = h_{mn-r}h_r - h_{mn-r+1}h_{r-1}.

Stated as above, perhaps the quickest proof of the lemma is to apply the Jacobi–Trudi formula. Recast as a result about characters of the symmetric group, the lemma says that \chi^{(mn-r,r)} = \pi_r - \pi_{r-1}, where \pi_r is the permutation character of S_{mn} acting on r-subsets of \{1,\ldots, mn\}. This leads to an alternative proof by orbit counting.

Proposition. We have \langle s_n \circ s_m, s_{(nm-r,r)} \rangle = q(r) - q(r-1), where q(r) is the number of partitions of r contained in an n \times m box.

Proof. This is immediate from the lemma above and equation (\dagger). \Box
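As a quick sanity check of the proposition, the following Python sketch (function names mine) computes q(r) by direct recursion and prints the multiplicities q(r) - q(r-1) for s_{(4)} \circ s_{(4)}.

```python
def q(r, n, m):
    """Number of partitions of r with at most n parts, each part at most m."""
    def count(remaining, max_part, parts_left):
        if remaining == 0:
            return 1
        if parts_left == 0:
            return 0
        return sum(count(remaining - p, p, parts_left - 1)
                   for p in range(min(remaining, max_part), 0, -1))
    return count(r, m, n)

n, m = 4, 4
for r in range(1, 9):  # r <= mn/2, so that (mn-r, r) is a partition
    print(r, q(r, n, m) - q(r - 1, n, m))
```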

In particular, it follows by conjugating partitions that if n, m \ge r then the multiplicities of s_{(nm-r,r)} in s_n \circ s_m and s_m \circ s_n agree: this verifies Foulkes’ Conjecture in the very special case of two row partitions. Moreover, since only constituents with at most two rows survive the specialization to two variables, it follows that

(s_{(n)} \circ s_{(m)})(x_1,x_2) = (s_{(m)} \circ s_{(n)})(x_1,x_2).

This is a combinatorial statement of Hermite reciprocity. More explicitly, since s_{(2)} \circ s_{(n)} is contained in s_{(n)}^2 which, by Young’s rule, has only constituents with at most 2 parts, we have

s_{(2)} \circ s_{(n)} = s_{(2n)} + s_{(2n-2,2)} + s_{(2n-4,4)} + \cdots,

where the sum ends with s_{(n,n)} if n is even, or s_{(n+1,n-1)} if n is odd.
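In two variables this is easy to verify by computer. The sketch below uses sympy; the plethysm rule s_{(2)}[f](x,y) = (f(x,y)^2 + f(x^2,y^2))/2 follows from s_{(2)} = (p_{(1,1)} + p_{(2)})/2, and the Schur polynomials come from the bialternant formula. (The code and names are mine, written just to check the display above for n = 5.)

```python
from sympy import symbols, expand, cancel

x, y = symbols('x y')

def schur(a, b):
    """s_{(a,b)}(x, y) by the bialternant formula."""
    return expand(cancel((x**(a + 1) * y**b - x**b * y**(a + 1)) / (x - y)))

n = 5
f = sum(x**i * y**(n - i) for i in range(n + 1))  # s_{(n)}(x, y)
f2 = f.subs({x: x**2, y: y**2})                   # f(x^2, y^2)
lhs = expand((f**2 + f2) / 2)                     # (s_{(2)} o s_{(n)})(x, y)
rhs = expand(sum(schur(2 * n - 2 * k, 2 * k) for k in range(n // 2 + 1)))
print(expand(lhs - rhs) == 0)                     # True
```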

Stable constituents in the Foulkes plethysm

By the proposition above, if m \ge r and n \ge r, then the multiplicity of s_{(mn-r,r)} in s_n \circ s_m is equal to |\mathrm{Par}(r)| - |\mathrm{Par}(r-1)|, independently of m and n. The following theorem generalizes this stability result to arbitrary partitions. It was proved, using the partition algebra, as Theorem A in The partition algebra and plethysm coefficients by Chris Bowman and Rowena Paget. I give a very brief outline of their proof in the third section below.

Given a partition \gamma of r, and d \ge \gamma_1 + |\gamma|, let \gamma_{[d]} denote the partition (d-|\gamma|,\gamma_1,\ldots,\gamma_{\ell(\gamma)}). To avoid an unnecessary restriction in the theorem below we also define (1)_{[1]} = (1). For example, the lemma in the previous section that s_{(mn-r,r)} = h_{mn-r}h_r - h_{mn-r+1}h_{r-1} can now be stated more cleanly as s_{(r)_{[mn]}} = h_{(r)_{[mn]}} - h_{(r-1)_{[mn]}}. We generalize this in the first lemma following the theorem below.

Theorem [Foulkes stability]. Let \gamma be a partition of r. The multiplicity \langle s_n \circ s_m, s_{\gamma_{[mn]}} \rangle is independent of m and n for m, n \ge r.

To prove the Foulkes stability theorem we need two lemmas. I am grateful to Martin Forsberg Conde for pointing out that the stability property in the first lemma is critical to the proof and should be emphasised.

Lemma [Schur/homogeneous stability]. Fix r \in \mathbb{N} and d \ge 2r. Let L = \bigcup_{s \le r} \mathrm{Par}(s). For each partition \gamma \in \mathrm{Par}(r) there exist unique coefficients b_\beta for \beta \in L such that

s_{\gamma_{[d]}} = \sum_{\beta \in L} b_{\beta} h_{\beta_{[d]}}.

Moreover b_\gamma = 1 and b_\beta =0 unless (d-|\beta|;\beta) \unrhd (d-r;\gamma).

Proof. Going in the opposite direction, for each \gamma \in L we have

h_{\gamma_{[d]}} = \sum_{\beta \in L} K_{\beta_{[d]}\gamma_{[d]}} s_{\beta_{[d]}}, (\ddagger)

where the Kostka number K_{\lambda\mu} is the number of semistandard Young tableaux of shape \lambda and content \mu. Since K_{\lambda\mu} = 0 unless \lambda \unrhd \mu, and every partition of d dominating \gamma_{[d]} has the form \beta_{[d]} for some \beta \in L, we are justified in summing only over elements of L in (\ddagger).

Let \beta, \gamma \in L and let \mathcal{T}_{\beta\gamma} be the set of semistandard tableaux of shape \beta_{[d]} and content \gamma_{[d]}. Observe that in any t \in \mathcal{T}_{\beta\gamma} there are d-|\gamma| entries equal to 1, and since d-|\gamma| \ge d-r \ge r \ge |\beta| \ge \beta_1, these 1s fill up the first row of t beyond the end of the second row, which has length \beta_1. This leaves (d-|\beta|) - (d-|\gamma|) = |\gamma|-|\beta| boxes in the top row to be occupied by entries \ge 2, where the only restriction is that these entries are weakly increasing. Therefore there is a bijection between \mathcal{T}_{\beta\gamma} and the set of semistandard tableaux of disjoint skew-shape \beta \sqcup (|\gamma|-|\beta|) and content \gamma, relabelling each entry i+1 as i. It follows that

K_{\beta_{[d]}\gamma_{[d]}} = \widetilde{K}_{\beta\gamma},

independently of d. Moreover, since K_{\lambda\lambda} = 1, we have \widetilde{K}_{\gamma\gamma} = 1, and since K_{\lambda\mu} = 0 unless \lambda \unrhd \mu, we have \widetilde{K}_{\beta\gamma} = 0 unless (d-|\beta|;\beta) \unrhd (d-|\gamma|;\gamma). The lemma now follows by inverting the unitriangular matrix \widetilde{K}. \Box
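The stability established in this proof is easy to watch numerically. Below is a minimal Python sketch (all names mine) that computes Kostka numbers by peeling off the horizontal strip occupied by the largest entry, and prints K_{\beta_{[d]}\gamma_{[d]}} for \beta = (2,1) and \gamma = (2,2) as d grows.

```python
from functools import lru_cache

def horizontal_strips(shape, k):
    """All partitions mu obtained from `shape` by removing a horizontal
    strip of k boxes, i.e. shape[i+1] <= mu[i] <= shape[i] in each row."""
    def rec(row, remaining):
        if row == len(shape):
            if remaining == 0:
                yield ()
            return
        lower = shape[row + 1] if row + 1 < len(shape) else 0
        for part in range(lower, shape[row] + 1):
            rem = remaining - (shape[row] - part)
            if rem >= 0:
                for rest in rec(row + 1, rem):
                    yield (part,) + rest if part > 0 else rest
    return list(rec(0, k))

@lru_cache(maxsize=None)
def kostka(shape, content):
    """Number of SSYT of the given shape and content."""
    if sum(shape) != sum(content):
        return 0
    if not content:
        return 1  # shape is empty too, by the sum check
    return sum(kostka(smaller, content[:-1])
               for smaller in horizontal_strips(shape, content[-1]))

def pad(gamma, d):
    """The partition gamma_[d] = (d - |gamma|, gamma_1, gamma_2, ...)."""
    return (d - sum(gamma),) + tuple(gamma)

beta, gamma = (2, 1), (2, 2)   # both in L, with r = 4
for d in range(8, 13):         # d >= 2r
    print(d, kostka(pad(beta, d), pad(gamma, d)))  # always 2
```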

Lemma [PSSYT stability]. Fix r \in \mathbb{N}. For each partition \gamma with |\gamma| \le r, the size of the set \mathrm{PSSYT}\bigl((n), (m)\bigr)_{\gamma_{[mn]}} is independent of m and n, provided that m \ge r and n \ge r.

Outline proof. Observe that if m > r then any (m)-tableau entry in a plethystic semistandard tableau of shape \bigl((n), (m)\bigr) and weight \gamma_{[mn]} has 1 as its leftmost entry. Similarly, if n > r then the first (m)-tableau entry in such a plethystic semistandard tableau is the all-ones tableau. This shows how to define a bijection between the sets \mathrm{PSSYT}\bigl((n),(m)\bigr)_{\gamma_{[mn]}} and \mathrm{PSSYT}\bigl((n-1),(m-1)\bigr)_{\gamma_{[(m-1)(n-1)]}}. \Box
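This too can be checked by brute force: the sketch below (names mine) enumerates plethystic semistandard tableaux of shape ((n),(m)) directly as weakly colex-increasing n-tuples of m-multisets, and confirms that the count of those of weight \gamma_{[mn]} is the same for several values of m, n \ge r.

```python
from itertools import combinations_with_replacement

def pssyt_count(n, m, weight):
    """|PSSYT((n),(m))_weight| by brute force: weakly colex-increasing
    n-tuples of m-multisets whose total content equals `weight`."""
    k = len(weight)  # entries larger than k cannot occur in this weight
    cols = sorted(combinations_with_replacement(range(1, k + 1), m),
                  key=lambda t: tuple(reversed(t)))  # colex order
    count = 0
    for tup in combinations_with_replacement(range(len(cols)), n):
        content = [0] * k
        for i in tup:
            for e in cols[i]:
                content[e - 1] += 1
        count += tuple(content) == weight
    return count

gamma = (2, 1)  # r = 3
for n, m in [(3, 3), (4, 4), (5, 5), (4, 6)]:
    weight = (m * n - sum(gamma),) + gamma
    print((n, m), pssyt_count(n, m, weight))  # same count each time
```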

We are now ready to prove the Foulkes stability theorem.

Proof of Foulkes stability theorem. Let m, n \ge r. By the lemma on homogeneous/Schur stability, it is sufficient to prove that for each partition \beta such that |\beta| < r or \beta \unrhd \gamma, the multiplicities \langle s_n \circ s_m, h_{\beta_{[mn]}}\rangle are independent of m and n. This is immediate from equation (\dagger) and the second lemma on PSSYT stability. \Box

Stability for more general plethysms

Let \mathcal{W}_d be the set of integer sequences (\omega_1,\omega_2, \ldots, \omega_d) such that \sum_{i=1}^j \omega_i \ge 0 for all j \le d and \sum_{i=1}^d \omega_i = 0. Given \omega \in \mathcal{W}_d and a partition \mu such that \mu -\omega is also a partition, we define L_\mu(\omega) to be the maximum number of single box moves, always from longer rows to shorter rows, that take the Young diagram of \mu to the Young diagram of \mu-\omega.

For instance when \omega = (1,2,-2,-1,1,-1) and \mu = (9,7,2,2,2), the maximum number of moves is 6: the unique longest sequence moves one box from row 1 to row 2, then three boxes from row 2 to row 3, then one box from row 3 to row 4, and finally one box from row 5 to row 6. The sequence of partitions, shown after each batch of moves, is

\displaystyle (9,7,2,2,2) \mapsto (8,8,2,2,2) \mapsto (8,5,5,2,2) \mapsto (8,5,4,3,2) \mapsto (8,5,4,3,1,1).

We leave it to the reader to check that L_{(9,6,2,2,2)}(\omega) = 5; since one box must be moved directly from row 2 to row 4, the 6 move sequence above is no longer feasible.

As background and motivation, we remark that \mathcal{W}_d lies in the root lattice of the Lie algebra \mathsf{sl}_d(\mathbb{C}), and an upper bound for L_\mu(\omega) is the length L(\omega) of the unique expression for \omega as a non-negative integer combination of the simple roots \epsilon_j - \epsilon_{j+1}. For instance, again with \omega = (1,2,-2,-1,1,-1) we have

\omega = (\epsilon_1 - \epsilon_2) + 3(\epsilon_2 - \epsilon_3) + (\epsilon_3 - \epsilon_4) + (\epsilon_5-\epsilon_6)

corresponding to the 6 move sequence above, and L(\omega) = 1 + (1+2) + (1+2-2) + 0 + 1 + 0 = 6. In general, we have L(\omega) = \sum_{j=1}^d (\omega_1 + \cdots + \omega_j), or, equivalently,

L(\omega) = \sum_{i=1}^d (d-i)\omega_i.
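In code (with Python's 0-indexing, so the coefficient d-i becomes d-1-i) the two formulas agree on the running example:

```python
omega = (1, 2, -2, -1, 1, -1)
d = len(omega)

by_partial_sums = sum(sum(omega[:j + 1]) for j in range(d))
by_coefficients = sum((d - 1 - i) * omega[i] for i in range(d))
print(by_partial_sums, by_coefficients)  # 6 6
```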

Remark. Brion uses this Lie theoretic interpretation to prove the stability part of the claim below: see Theorem 3.1(ii) in his paper Stable properties of plethysm. Part (i) proves that the multiplicity is increasing, while (iii) gives a geometric interpretation of the stable multiplicity.

Claim. Let \mu be a partition and let \omega \in \mathcal{W}_d be such that \mu-\omega is also a partition. Let \ell = L_\mu(\omega). The plethysm coefficient \langle s_{(n)} \circ s_\mu, s_{n\mu - \omega} \rangle is constant for n \ge \ell and the stable value is \bigl|\mathrm{PSSYT}\bigl( (\ell), \mu \bigr)_{\ell\mu - \omega}\bigr|.

Our proof again uses the machinery of plethystic semistandard tableaux and stable weight space multiplicities, as computed by taking the inner product with complete homogeneous symmetric functions. As a final remark, note that if \omega = (|\gamma|,-\gamma_1, \ldots, -\gamma_{\ell(\gamma)}) for a partition \gamma then L_{(m)}(\omega) = |\gamma| and the claim gives the stability bound in the Foulkes case \mu = (m) originally due to Bowman and Paget and proved above.

The partition algebra and stability

Let E be the natural representation of the general linear group \mathrm{GL}_d(\mathbb{C}). In conventional Schur–Weyl duality, one uses the bimodule E^{\otimes r}, acted on diagonally by \mathrm{GL}_d(\mathbb{C}) on the left and by place permutation by S_r on the right, to pass between polynomial representations of \mathrm{GL}_d(\mathbb{C}) of degree r and representations of S_r. The key property that makes this work is that the two actions commute, and, stronger, they have the double centralizer property:

\begin{aligned} \mathrm{End}_{\mathrm{GL}_d(\mathbb{C})}(E^{\otimes r}) &= \mathbb{C}S_r \\ \mathrm{End}_{S_r}(E^{\otimes r}) &= \mathbb{C}\mathrm{GL}_d(\mathbb{C}). \end{aligned}

It follows that when one restricts the left action of \mathrm{GL}_d(\mathbb{C}) by replacing the general linear group with a smaller subgroup, the group algebra \mathbb{C}S_r must be replaced with some larger algebra. For the orthogonal group one obtains the Brauer algebra, and restricting all the way to the symmetric group S_d, one obtains the partition algebra (each defined with parameter d). In the recent meeting at Oberwolfach, Mike Zabrocki commented that he expected new progress on plethysm problems to be made by applying Schur–Weyl duality in these variants. Incidentally, I highly recommend Zabrocki’s Introduction to symmetric functions for an elegant modern development of the subject.

An outline of the Bowman–Paget method

Given a partition \beta of r, let \mathcal{P}^\beta be the collection of set partitions of \{1,\ldots, r\} into disjoint sets of sizes \beta_1, \ldots, \beta_{\ell(\beta)}. The symmetric group S_r acts transitively on each \mathcal{P}^\beta; let P^\beta be the corresponding permutation module defined over \mathbb{C} and let \rho^\beta be the corresponding permutation character. For instance, the permutation character appearing in Foulkes’ Conjecture, of S_{mn} acting on set partitions of \{1,\ldots, mn\} into n sets each of size m, is \rho^{(m^n)}. In this case, each set partition in \mathcal{P}^{(m^n)} has stabiliser S_m \wr S_n; in general a stabiliser is a direct product of wreath products acting on disjoint subsets.

Let \mathrm{Par}_{\ge 2}(r) denote the set of partitions of r into parts all of size at least 2. The following theorem is an equivalent restatement of Theorem 8.9 in the paper of Bowman and Paget cited above.

Theorem [Bowman–Paget 2018]. Let \gamma be a partition of r and let m, n \ge r. The stable multiplicity \langle s_{(n)} \circ s_{(m)}, s_{\gamma_{[mn]}}\rangle is equal to \sum_{\beta \in \mathrm{Par}_{\ge 2}(r)} \langle \rho^\beta, \chi^\gamma \rangle.

The proof is an impressive application of Schur–Weyl duality and the partition algebra. In outline, the authors start by interpreting the left-hand side as the multiplicity of the Specht module S^{\gamma_{[mn]}} in the permutation module P^{(m^n)}. They then apply Schur–Weyl duality to move to the partition algebra, defined with parameter mn. In this setting, the Specht module S^{\gamma_{[mn]}} becomes the standard module \Delta_r(\gamma) for the partition algebra canonically labelled by \gamma. Conveniently this module is simply the inflation of the Specht module S^\gamma from S_r to the partition algebra. By constructing a filtration of the partition algebra module V^{(m^n)} corresponding to P^{(m^n)}, they are able to show that

[V^{(m^n)} : \Delta_r(\gamma)]_{S_{mn}} = \sum_{\beta \in \mathrm{Par}_{\ge 2}(r)} [P^\beta : S^\gamma]_{S_r}.

Restated using characters, this is the version of their theorem stated above. Bowman and Paget also show that V^{(m^n)} depends on the parameters m and n only through their product mn (provided, as ever, m, n \ge r); this gives an exceptionally elegant proof that the stable multiplicities for s_{(n)} \circ s_{(m)} and s_{(m)} \circ s_{(n)} agree.

A new result obtained by the partition algebra

Using the method of Bowman and Paget I can prove the analogous results for the stable multiplicities in the plethysm s_{(n-1,1)} \circ s_m. Here it is very helpful that the natural permutation representation E of S_n decomposes as \chi^{(n)} + \chi^{(n-1,1)}, making it not too hard to describe all the embeddings of the representation S^{(n-1,1)} into the tensor product E^{\otimes r}.

For the (n-1,1) case, the analogue of the set \mathcal{P}^\beta is the set \mathcal{P}_\star^\epsilon of set partitions of \{1,\ldots,r\} into disjoint sets of sizes \epsilon_1, \ldots, \epsilon_{\ell(\epsilon)}, now with one subset marked. Let \rho_\star^\epsilon be the corresponding permutation character. Let \mathrm{Par}^\star(r) be the set of partitions of r with one marked part, and at most one part of size 1; if there is a part of size 1, it must be the unique marked part, and only the first part of a given size may be marked.

Theorem. Let \gamma be a partition of r and let m, n \in \mathbb{N}. The multiplicity \langle s_{(n-1,1)} \circ s_{(m)}, s_{\gamma_{[mn]}}\rangle is stable for n \ge r+1 and m \ge r. The stable multiplicity is equal to \sum_{\epsilon \in \mathrm{Par}^\star(r)} \langle \rho_\star^\epsilon, \chi^\gamma \rangle.

As a quick example, the marked partitions \epsilon lying in \mathrm{Par}^\star(4) are (3,1^\star), (2^\star, 2) and (4^\star) with corresponding permutation modules induced from S_3 \times S_1, S_2 \times S_2, and S_4. (This example is atypical in that the subgroups are all Young subgroups; in general they are products of wreath products.) The sum of the permutation characters is 3\chi^{(4)} + 2\chi^{(3,1)} + \chi^{(2,2)}, from which one can read off the stable multiplicities of s_{\gamma_{[mn]}} in s_{(n-1,1)} \circ s_m. For instance s_{(4)_{[mn]}} has stable multiplicity 3.

A generalization

This result can be generalized by replacing (n-1,1) with an arbitrary partition. It turned out that Bowman and Paget had already proved this generalization (with bounds on m and n possibly stronger than required) and had a still more general result in which (m) could also be varied, a problem I had no idea how to attack. I’m very happy that we agreed to combine our methods in this joint paper.

Here I’ll record some parts of my original approach. Let \mathrm{MPar}_k(r) be the set of pairs (\alpha, \beta) where \beta is a partition of some t \le r having exactly k parts and \alpha is a partition of r - t into parts all of size \ge 2. We say that the elements of \mathrm{MPar}_k(r) are k-marked partitions. Each k-marked partition is determined by the pair of tuples (a_2,a_3,\ldots, a_r) and (b_1,b_2,\ldots, b_r), where a_i is the multiplicity of i as a part of \alpha and b_j is the multiplicity of j as a part of \beta.

Observe that \prod_{j=1}^r S_{b_j} is a Young subgroup of S_k and \prod_{i=1}^r S_i \wr S_{b_i} is a subgroup of S_t, where t = |\beta| = \sum_{j=1}^r jb_j.

Given a partition \beta of t having exactly k parts, we define a map from the characters of S_k to the characters of S_t by a composition of restriction, then inflation, then induction. Starting with a character \chi of S_k, restrict \chi to the Young subgroup \prod_{i=1}^r S_{b_i}. Then inflate to the product of wreath products \prod_{i=1}^r S_i \wr S_{b_i}. Finally induce the inflated character up to S_t. We denote the composite map by

\mathrm{IndInfRes}_\beta : \mathcal{C}(S_k) \rightarrow \mathcal{C}(S_t).

Theorem. Let \kappa be a partition of k with first part a and let \gamma be a partition of r. Let m, n \in \mathbb{N}. The multiplicity \langle s_{\kappa_{[n]}} \circ s_{(m)}, s_{\gamma_{[mn]}}\rangle is stable for n \ge r+a and m \ge r. The stable multiplicity is equal to

\displaystyle \sum_{(\alpha, \beta) \in \mathrm{MPar}_k(r)} \langle \phi_{(\alpha,\beta)}, \chi^\gamma\rangle

where the character \phi_{(\alpha,\beta)} \in \mathcal{C}(S_r) is defined by

\displaystyle \phi_{(\alpha,\beta)} = \Bigl( \bigl( \prod_{i=2}^r 1_{S_i \wr S_{a_i}}\bigr) \times \mathrm{IndInfRes}_{\beta} \chi^\kappa \Bigr) \uparrow^{S_r}.

We remark that the 0-marked partitions of r are in obvious bijection with the partitions of r having no singleton parts, and in this case the character \phi_{(\alpha, \varnothing)} in the left-hand side of the inner product in the theorem is

\phi_{(\alpha, \varnothing)} = \bigl( \prod_{i=2}^r 1_{S_i \wr S_{a_i}}\bigr) \uparrow^{S_r}.

This is the permutation character of S_r acting on set partitions of \{1,\ldots, r\} into disjoint sets of sizes \alpha_1, \ldots, \alpha_{\ell(\alpha)}. Therefore the case k=0 of the theorem recovers the original result of Bowman and Paget.

Similarly, a 1-marked partition (\alpha,\beta) has b_j = 1 for a unique j; this defines a unique marked part of size j in a corresponding marked partition \alpha \sqcup (j)^\star in the sense of the previous section. In this case

\phi_{(\alpha,\beta)} = \bigl( 1_{S_j} \times \prod_{i=2}^r 1_{S_i \wr S_{a_i}} \bigr) \uparrow^{S_r}

which is the permutation character \rho_\star^{\alpha \sqcup (j)^\star} from the previous section. Therefore again the theorem specializes as expected.

Extended example

We find all stable constituents s_{\gamma_{[mn]}} of the plethysm s_{(n-3,2,1)} \circ s_{(m)} when |\gamma| = 6. It will be a convenient shorthand to write \pi^\epsilon for the Young permutation character induced from the trivial representation of the Young subgroup S_\epsilon. The decomposition of each \pi^\epsilon is given by the Kostka numbers seen earlier in this post.

The set \mathrm{MPar}_3(6) has five marked partitions.

(1) \bigl( (3), (1,1,1) \bigr): here

\mathrm{IndInfRes}_{(1,1,1)} \chi^{(2,1)} = \bigl( \mathrm{Inf}_{S_3}^{S_1 \wr S_3} \chi^{(2,1)} \bigr) \uparrow^{S_3} = \chi^{(2,1)}

and

\phi_{((3),(1,1,1))} = \bigl(1_{S_3} \times \chi^{(2,1)} \bigr) \uparrow^{S_6} = \chi^{(5,1)} \!+\! \chi^{(4,2)} \!+\! \chi^{(4,1,1)} \!+\! \chi^{(3,2,1)}.

(2) \bigl( (2), (2,1,1) \bigr): here b = (2,1,0,0,0,0) and so we compute

\chi^{(2,1)}\downarrow_{S_2 \times S_1} = 1_{S_2} \times 1_{S_1} + \mathrm{sgn}_{S_2} \times 1_{S_1}.

The summands inflate to 1_{S_1 \wr S_2} \times 1_{S_2 \wr S_1} and \bigl(\mathrm{Inf}_{S_2}^{S_1 \wr S_2} \mathrm{sgn}_{S_2}\bigr) \times 1_{S_2 \wr S_1}, respectively. More simply, these are 1_{S_2} \times 1_{S_2} and \mathrm{sgn}_{S_2} \times 1_{S_2}. Hence

\mathrm{IndInfRes}_{(2,1,1)} \chi^{(2,1)} = \bigl(1_{S_2} \times 1_{S_2} \bigr)\uparrow^{S_4} + \bigl(\mathrm{sgn}_{S_2} \times 1_{S_2}\bigr) \uparrow^{S_4}.

Observe that the right-hand side is \pi^{(2,1,1)}. (This is a bit of a coincidence I think, but convenient for calculation.) Therefore \phi_{((2),(2,1,1))} = \bigl(1_{S_2} \times \pi^{(2,1,1)}\bigr) \uparrow^{S_6} = \pi^{(2,2,1,1)} and

\begin{aligned} \phi_{((2), (2,1,1))} &= \chi^{(6)} + 3\chi^{(5,1)}+4\chi^{(4,2)} + 3\chi^{(4,1,1)} + 2 \chi^{(3,3)} \\ & \qquad + 4\chi^{(3,2,1)} + \chi^{(3,1,1,1)} + \chi^{(2,2,2)} + \chi^{(2,2,1,1)}.\end{aligned}

(3) \bigl( \varnothing, (4,1,1) \bigr). A similar argument to the previous case shows that

\mathrm{IndInfRes}_{(4,1,1)} \chi^{(2,1)} = \bigl(1_{S_2} \times 1_{S_4} \bigr) \uparrow^{S_6} + \bigl( \mathrm{sgn}_{S_2} \times 1_{S_4}\bigr) \uparrow^{S_6}

and since this character is \pi^{(4,1,1)},

\phi_{(\varnothing, (4,1,1))} = \chi^{(6)} + 2\chi^{(5,1)} + \chi^{(4,2)} + \chi^{(4,1,1)}.

(4) \bigl( \varnothing, (3,2,1) \bigr): here b = (1,1,1,0,0,0) and so we compute \chi^{(2,1)}\downarrow_{S_1 \times S_1 \times S_1} = 2\bigl( 1_{S_1} \times 1_{S_1} \times 1_{S_1}\bigr). Inflating and inducing we find that

\mathrm{IndInfRes}_{(3,2,1)} \chi^{(2,1)} = 2\bigl( 1_{S_1} \times 1_{S_2} \times 1_{S_3}\bigr) \uparrow^{S_6}

and since this character is 2\pi^{(3,2,1)}, we have

\phi_{(\varnothing, (3,2,1))} = 2\chi^{(6)} \!+\! 4\chi^{(5,1)} \!+\! 4\chi^{(4,2)} \!+\! 2\chi^{(4,1,1)} \!+\! 2\chi^{(3,3)} + 2\chi^{(3,2,1)}.

(5) \bigl( \varnothing, (2,2,2) \bigr): here b = (0,3,0,0,0,0) and so the restriction map does nothing. We then inflate to get \mathrm{Inf}_{S_3}^{S_2 \wr S_3} \chi^{(2,1)}. The induction of this character to S_6 is

\mathrm{IndInfRes}_{(2,2,2)} \chi^{(2,1)} = \chi^{(5,1)} + \chi^{(4,2)} + \chi^{(3,2,1)}.

(These constituents can be computed using symmetric functions to evaluate the plethysm s_{(2,1)} \circ s_{(2)}.) The displayed character above is \phi_{(\varnothing, (2,2,2))}.

We conclude that, provided n \ge 8 and m \ge 6,

\begin{aligned} & s_{(n-3,2,1)} \circ s_{(m)} = \\ &\quad \cdots + 4s_{(6)_{[mn]}} \!+\! 11s_{(5,1)_{[mn]}} \!+\! 11s_{(4,2)_{[mn]}} \!+\! 7s_{(4,1,1)_{[mn]}} \\ &\qquad \!+\! 4s_{(3,3)_{[mn]}}  \!+\! 8s_{(3,2,1)_{[mn]}} \!+\! s_{(3,1,1,1)_{[mn]}} \!+\! s_{(2,2,2)_{[mn]}}\\ &\qquad  \!+\! s_{(2,2,1,1)_{[mn]}} \!+\! \cdots \end{aligned}

where the omitted terms are s_{\gamma_{[mn]}} for partitions \gamma with |\gamma| \not= 6. This decomposition can be verified in about 30 seconds using Magma.
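The final totals are just the sum of the five decompositions computed above; the following Python fragment redoes this bookkeeping.

```python
from collections import Counter

# Multiplicities of the chi^lambda in the five characters phi computed above.
phis = [
    {(5, 1): 1, (4, 2): 1, (4, 1, 1): 1, (3, 2, 1): 1},
    {(6,): 1, (5, 1): 3, (4, 2): 4, (4, 1, 1): 3, (3, 3): 2,
     (3, 2, 1): 4, (3, 1, 1, 1): 1, (2, 2, 2): 1, (2, 2, 1, 1): 1},
    {(6,): 1, (5, 1): 2, (4, 2): 1, (4, 1, 1): 1},
    {(6,): 2, (5, 1): 4, (4, 2): 4, (4, 1, 1): 2, (3, 3): 2, (3, 2, 1): 2},
    {(5, 1): 1, (4, 2): 1, (3, 2, 1): 1},
]

total = Counter()
for phi in phis:
    total.update(phi)
for gamma, mult in total.items():
    print(gamma, mult)
# totals 4, 11, 11, 7, 4, 8, 1, 1, 1, matching the display above
```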

Some corollaries

Setting \psi^\kappa_r = \sum_{(\alpha,\beta) \in \mathrm{MPar}_k(r)} \phi_{(\alpha,\beta)} we have

\langle s_{\kappa_{[n]}} \circ s_{(m)}, s_{\gamma_{[mn]}} \rangle = \langle \psi^\kappa_r, \chi^\gamma \rangle.

Plethysms when \nu has two rows. When \kappa = (k) we have, for each partition \beta of t,

\mathrm{IndInfRes}_{\beta}(1_{S_k}) = \bigl( \prod_{j=1}^r 1_{S_j \wr S_{b_j}} \bigr) \uparrow^{S_{t}}.

This is the permutation character \rho^\beta of S_t acting on set partitions of \{1,\ldots, t\} into parts of sizes \beta_1, \ldots, \beta_k. Hence

\psi^{(k)}_r = \sum_{(\alpha,\beta)\in\mathrm{MPar}_k(r)} (\rho^\alpha \times \rho^\beta)\uparrow^{S_r}

is the permutation character of S_r acting on set partitions of \{1,\ldots,r\} into (non-singleton) unmarked parts together with k further distinguished parts; the summand labelled by (\alpha,\beta), in which the unmarked parts have sizes \alpha_1, \ldots, \alpha_{\ell(\alpha)} and the distinguished parts have sizes \beta_1, \ldots, \beta_k, is denoted \rho^{(\alpha,\beta)}. By the restated version of the theorem,

\langle s_{(k)_{[n]}} \circ s_{(m)}, s_\gamma \rangle = \sum_{(\alpha,\beta)\in\mathrm{MPar}_k(r)} \langle \rho^{(\alpha,\beta)}, \chi^\gamma\rangle.

In particular, taking \gamma = (r), each inner product \langle \rho^{(\alpha,\beta)}, \chi^{(r)} \rangle equals 1, because each \rho^{(\alpha,\beta)} is a transitive permutation character, and so we find that

\langle s_{(k)_{[n]}} \circ s_{(m)}, s_{(r)_{[mn]}}\rangle = |\mathrm{MPar}_k(r)|,

provided that n \ge r+k and m \ge r. This result may also be proved using plethystic semistandard tableaux: the left-hand side of the previous displayed equation is \langle s_{(n-k,k)} \circ s_{(m)}, h_{(mn-r,r)} - h_{(mn-r+1,r-1)} \rangle, which we have seen is

|\mathrm{PSSYT}\bigl((n\!-\!k,k),(m)\bigr)_{\!(mn\!-\!r,r)}| \!-\! |\mathrm{PSSYT}\bigl((n\!-\!k,k),(m)\bigr)_{\!(mn\!-\!r\!+\!1,r\!-\!1)}|.

There is a bijection between plethystic semistandard tableaux T \in \mathrm{PSSYT}\bigl((n-k,k), (m)\bigr)_{(mn-r,r)} and partitions of r into k marked parts and some further unmarked parts: the marked parts record the number of 2s in each (m)-tableau entry in the second row of T, and the unmarked parts the numbers of 2s in the first row entries. The subtraction in the displayed equation above cancels those partitions having an unmarked singleton part, and so we are left with |\mathrm{MPar}_k(r)|, as required.
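Enumerating k-marked partitions is routine; here is a minimal Python sketch (names mine) that generates \mathrm{MPar}_k(r) straight from the definition, recovering the five elements of \mathrm{MPar}_3(6) used in the extended example.

```python
def partitions(r, max_part=None, min_part=1):
    """Partitions of r with all parts in [min_part, max_part]."""
    max_part = r if max_part is None else max_part
    if r == 0:
        yield ()
        return
    for p in range(min(r, max_part), min_part - 1, -1):
        for rest in partitions(r - p, p, min_part):
            yield (p,) + rest

def mpar(k, r):
    """k-marked partitions of r: pairs (alpha, beta) with beta a partition
    of some t <= r having exactly k parts, and alpha a partition of r - t
    with all parts of size at least 2."""
    return [(alpha, beta)
            for t in range(r + 1)
            for beta in partitions(t) if len(beta) == k
            for alpha in partitions(r - t, min_part=2)]

print(mpar(3, 6))       # the five marked partitions listed above
print(len(mpar(0, 7)))  # 4, as used for the stable 7-hooks below
```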

Stable hook constituents of s_{(n-k,k)} \circ s_{(m)}. It is known that a plethysm s_\nu \circ s_\mu has a hook constituent if and only if both \nu and \mu are hooks. A particularly beautiful proof of this result, using the plethystic substitution s_\nu[1-X], was given by Langley and Remmel: see Theorem 3.1 in their paper The plethysm s_\lambda[s_\mu] at hook and near-hook shapes. In particular, the only hook that appears in the Foulkes plethysm s_{(n)} \circ s_{(m)} is s_{(mn)}.

Here we consider the analogous result for stable hooks, i.e. partitions of the form (mn-r,r-\ell,1^\ell). To get started take k=0. Let (\alpha,\varnothing) \in \mathrm{MPar}_0(r). Now \rho^\alpha is a Littlewood–Richardson product of the transitive permutation characters 1_{S_i \wr S_{a_i}}\!\uparrow^{S_{ia_i}} for i \in \{2,\ldots, r\}; a Littlewood–Richardson product has a hook constituent only when every factor has a hook constituent, and by the result just quoted the unique hook constituent of 1_{S_i \wr S_{a_i}}\!\uparrow^{S_{ia_i}} is \chi^{(ia_i)}. Hence the stable hook constituents (r-\ell,1^\ell), counted with multiplicity, are precisely the hook partitions in

\pi^{(2a_2, 3a_3,\ldots, ra_r)}.

Therefore \langle s_{(n)} \circ s_{(m)}, s_{(nm-r,r-\ell,1^\ell)}\rangle is the number of semistandard Young tableaux of shape (r-\ell,1^\ell) and content (2a_2,3a_3,\ldots,ra_r), summed over the elements (\alpha, \varnothing) of \mathrm{MPar}_0(r). The contribution from each \alpha is the Kostka number

K_{(r-\ell,1^\ell), (2a_2,3a_3,\ldots,ra_r)}.

In particular, the longest leg length occurs when the content is (2,3,4,\ldots ), and so the longest leg length of a stable hook grows as \mathcal{O}(\sqrt{r}). (We emphasise that all this follows from the original result of Bowman and Paget.)

If we replace (n) with (n-k,k), using the generalization proved above, then the k marked parts in a marked partition (\alpha,\beta) \in \mathrm{MPar}_k(r) contribute further parts (b_1,2b_2,\ldots,rb_r) to the content above, and so the stable hook multiplicity is the sum over (\alpha,\beta) \in \mathrm{MPar}_k(r) of

K_{(r-\ell,1^\ell), (b_1,2b_2,\ldots, rb_r,2a_2,3a_3,\ldots, ra_r)}.

For example, taking k=0, the stable 7-hooks in s_{(n)} \circ s_{(m)} are (7) with multiplicity 4, and (6,1) with multiplicity 3. The elements of \mathrm{MPar}_0(7) are \bigl( (7), \varnothing \bigr), \bigl( (5,2), \varnothing \bigr), \bigl( (4,3), \varnothing \bigr), \bigl( (3,2,2), \varnothing \bigr), of which the final three each defines a unique semistandard Young tableau counted by the Kostka number above; the contents are (2,5), (3,4) and (4,3) respectively.
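For other values of r the same count is easy to automate. For a hook shape the Kostka number has a closed form: K_{(r-\ell,1^\ell),\mu} = \binom{k-1}{\ell} where k is the number of nonzero parts of the content \mu, since a tableau is determined by choosing which \ell of the values 2, \ldots, k occupy the leg. The sketch below (names mine) uses this to recompute the stable 7-hook multiplicities.

```python
from math import comb

def partitions(r, max_part=None, min_part=1):
    """Partitions of r with all parts in [min_part, max_part]."""
    max_part = r if max_part is None else max_part
    if r == 0:
        yield ()
        return
    for p in range(min(r, max_part), min_part - 1, -1):
        for rest in partitions(r - p, p, min_part):
            yield (p,) + rest

def stable_hook_multiplicity(r, leg):
    """Stable multiplicity of s_{(mn-r, r-leg, 1^leg)} in s_(n) o s_(m):
    sum over alpha in MPar_0(r) of the hook Kostka number binom(k-1, leg),
    where k is the number of distinct part sizes of alpha."""
    return sum(comb(len(set(alpha)) - 1, leg)
               for alpha in partitions(r, min_part=2))

print(stable_hook_multiplicity(7, 0))  # 4
print(stable_hook_multiplicity(7, 1))  # 3
print(stable_hook_multiplicity(7, 2))  # 0: no longer legs when r = 7
```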