Lean 4 and the Curry–Howard correspondence

February 9, 2026

This post is a record of my struggles learning the Lean 4 Proof Assistant, which I’m doing by the inefficient and misleading strategy of interpreting it whenever I can as a souped-up version of Haskell.

Curry–Howard in propositional logic

The formula

\bigl(p \rightarrow (q \rightarrow r)\bigr) \rightarrow \bigl((p \rightarrow q) \rightarrow (p \rightarrow r)\bigr)

is the generic instance of the second axiom scheme in Lukasiewicz’s formulation of propositional logic. As such, it is a tautology, true under any assignment of true and false to the propositional letters. An elegant proof of this is by the Curry–Howard correspondence for propositional logic: a proposition is a tautology if, interpreting the propositional letters as type variables, there is a function of the corresponding type. (The converse fails for classical tautologies such as Peirce’s Law, as we will see below.) Thus the fact that the Haskell compiler accepts

lukasiewicz2 :: (p -> (q -> r))
    -> ((p -> q) -> (p -> r))
lukasiewicz2 h_ptoqr h_pq h_p = (h_ptoqr h_p) (h_pq h_p)
-- h_ptoqr h_p :: q -> r evaluated at h_pq h_p :: q ...
-- has type r

or that the Lean compiler accepts

lemma lukasiewicz2 {p q r : Prop} : (p → (q → r))
    → ((p → q) → (p → r))
  := fun h_ptoqr h_pq h_p => (h_ptoqr h_p) (h_pq h_p)

is a proof that the Lukasiewicz formula is a tautology. Here h_pq can be read as ‘the hypothesis that p \rightarrow q‘, or ‘a proof that p \rightarrow q‘. It is a value of type p \rightarrow q. (For more on the curly braces block see below.) The Lean code may look more familiar in the entirely equivalent form below, which shows that Lean permits multiple arguments in type signatures to be listed by juxtaposition, just as for multiple arguments to functions:

lemma lukasiewicz2' {p q r : Prop}
    (h_ptoqr : p → (q → r)) (h_pq : p → q) : p → r
  := fun h_p => (h_ptoqr h_p) (h_pq h_p)

This proof can be rewritten in tactic style using the intro tactic to introduce the hypothesis h_p of type p, and then the exact tactic to finish:

lemma lukasiewicz2'' {p q r : Prop}
    (h_ptoqr : p → (q → r)) (h_pq : p → q) : p → r
  := by intro h_p; exact (h_ptoqr h_p) (h_pq h_p)

One thing I am struggling with in Lean is getting used to tactics, so examples like this, where the macro-rewriting black box can easily be understood, are particularly welcome. I hope the not-so-subtle neurolinguistic programming of typing fun every so often will keep me motivated.

Some more tautologies

In a similar spirit, the Curry–Howard proofs that p → p and p → (q → p) are tautologies have one-word solutions: id (valid in Haskell and Lean), and const (valid in Haskell). Here p → (q → p) is the generic instance of the first Lukasiewicz axiom scheme.
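For the record (with names of my own choosing), here they are as Haskell programs:

self_implication :: p -> p
self_implication = id

lukasiewicz1 :: p -> (q -> p)
lukasiewicz1 = const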

Negation

In Lean, negation is definitionally the same as ‘implies false’. That is \neg p = p \rightarrow F. We can mimic this in Haskell by writing data False, to introduce a new type False which deliberately does not have any constructors. We then define negation by the type synonym type Not p = p -> False. (As a visual aid, -> is the Haskell arrow; this is allowed by Lean, but Lean also supports →, which I’ve used for all Lean code.) For example, the proposition p \wedge \neg p \rightarrow F is a tautology, with Curry–Howard proof in Haskell

impossible :: (p, Not p) -> False
impossible (h_p, h_notp) = h_notp h_p

given by evaluating the function h_notp :: p → False at h_p :: p. One could curry this definition to get the equivalent tautology p \rightarrow (\neg p \rightarrow F) with definition impossible' h_p h_notp = h_notp h_p. The analogous Lean code for the curried form is

lemma impossible {p : Prop} : p → ¬p → False
  := fun h_p => fun h_np => h_np h_p

which, very similarly to before, can be rewritten as

lemma impossible' {p : Prop} (h_p : p) (h_np : ¬p)
  : False := h_np h_p

Observe that since a proposition cannot be both true and false, it is impossible, without using the special Haskell value undefined, or its Lean colleague sorry, to construct two valid inputs to impossible. (Or for another proof of this, note that a successful evaluation of impossible gives a value of the void type False.) But still, the fact that impossible type checks confirms that it is a tautology.

For a simpler instance, observe that id :: False -> False type checks and so F \rightarrow F is a tautology. Thinking of types as their set of values, this is consistent with |p\rightarrow q| = |q|^{|p|}, since 0^{|p|} = 0 unless p is itself a void type, in which case we have 0^0 = 1; correspondingly id is the unique function of type False -> False.

The contrapositive and vector space duality

Continuing in this mathematical theme, the contrapositive, formulated so that it introduces rather than removes a negation, is (p \rightarrow q) \rightarrow (\neg q \rightarrow \neg p) and correspondingly

contrapositive :: (p -> q) -> (Not q -> Not p)
contrapositive h_pq h_nq h_p = h_nq (h_pq h_p)

is a valid Haskell program that takes h_pq : p -> q and a proof that q is false, as expressed by a value h_nq of type q -> False, to produce a new value, namely the composition h_nq . h_pq of type p -> False, which is then evaluated at h_p : p, the argument of the output Not p. Here it may be helpful to note that the final parentheses are redundant: the type could also be written as (p -> q) -> (q -> False) -> (p -> False). Again the Lean code is almost identical:

lemma contrapositive {p q : Prop} : (p → q) → (¬q → ¬p)
  := fun h_pq h_nq h_p => h_nq (h_pq h_p)
  -- fun h_pq => (fun h_nq => (fun h_p =>
  --       h_nq (h_pq h_p)))

where the commented out lines are the less idiomatic form; like any civilised being, Lean curries by default.

The flip in the direction of implication corresponds to vector space duality: if we read False as \mathbb{F}, for a field \mathbb{F} (please ignore that this is not a void type), then the contrapositive states that, given an \mathbb{F}-linear map V \rightarrow W, there is a map \mathrm{Hom}(W, \mathbb{F}) \rightarrow \mathrm{Hom}(V, \mathbb{F}), namely the dual map W^\star \rightarrow V^\star. The introduction of a double negation corresponds to the canonical inclusion of a vector space in its double dual V \rightarrow V^{\star\star} that sends x \in V to evaluation at x:

double_negation_intro :: p -> Not (Not p)
  -- equivalently p -> (p -> False) -> False
double_negation_intro h_p h_np = h_np h_p

or in Lean

lemma double_negation_intro {p : Prop} : p → ¬¬p
  := fun h_p => (fun h_notp => h_notp h_p)

which works because ¬¬p is (p → False) → False so we can take the given value of type p and evaluate it at the argument of type p → False. As an antidote to the false argument that the number of negations cannot go down, I find it helpful to consider

lemma triple_negation_elimination {p : Prop} : ¬¬¬p → ¬p
  := fun h_nnnp =>
    (fun h_p => h_nnnp (double_negation_intro h_p))

which corresponds to the canonical map V^{\star\star\star} \rightarrow V^\star defined by dualising the canonical inclusion V \rightarrow V^{\star\star} just seen. These maps are not just canonical but also ‘natural’ in the sense of ‘natural transformation’: if f : V \rightarrow W is a linear map then the naturality squares below commute.

[Image: commutative naturality squares for the canonical maps V \rightarrow V^{\star\star} and V^{\star\star\star} \rightarrow V^{\star}]

As a big hint for the question in the following section, can you define a map V^{\star\star} \rightarrow V, natural in the vector space V? If your answer is ‘yes’, you are perhaps working in the category of finite-dimensional vector spaces, which curiously is not the right setting for this question, even though the question does not, on the face of it, involve any form of infinity!

Double negation elimination

The Lukasiewicz system needs a third axiom scheme to become complete, in the sense that all tautologies are provable from the axioms by repeated applications of modus ponens. This third axiom scheme has as its generic instance the contrapositive, this time removing negation:

(\neg p \rightarrow \neg q) \rightarrow (q \rightarrow p)

Does this have a Curry–Howard proof in Haskell? Does it have a Curry–Howard proof in Lean? Using the version of the contrapositive already proved, we can certainly prove (\neg p \rightarrow \neg q) \rightarrow (\neg \neg q \rightarrow \neg\neg p); then using double negation introduction q \rightarrow \neg\neg q, we get

(\neg p \rightarrow \neg q) \rightarrow (q \rightarrow \neg\neg p).

The Curry–Howard proof of this in Haskell is, using our existing functions,

lukasiewicz3_nearly :: (Not p -> Not q)
  -> q -> Not (Not p)
lukasiewicz3_nearly h_npnq h_q =
  contrapositive h_npnq (double_negation_intro h_q)

which can be ‘desugared’ in careful steps to

lukasiewicz3_nearly1 h_npnq h_q = \h_nq ->
  contrapositive h_npnq (double_negation_intro h_q) h_nq
lukasiewicz3_nearly2 h_npnq h_q = \h_nq ->
  (double_negation_intro h_q) (h_npnq h_nq)
lukasiewicz3_nearly3 h_npnq h_q = \h_nq ->
  (\h_nq' -> h_nq' h_q) (h_npnq h_nq)
lukasiewicz3_nearly4 h_npnq h_q = \h_nq
  -> (h_npnq h_nq) h_q
lukasiewicz3_nearly5 h_npnq h_q h_nq = h_npnq h_nq h_q

I find it quite remarkable that, without a type signature, the inferred type for the final version is (a -> b -> c) -> b -> a -> c, which, by parametric polymorphism and the ‘theorems for free’ mantra, has a unique inhabitant

lukasiewicz3_nearly6 g x y = g y x

I have no intuition for why this apparently complicated tautology involving both negation and double negation should be such a simple combinator.
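Indeed, as a further check of my own, flip itself typechecks at the original signature, reusing Not and False from above:

lukasiewicz3_nearly' :: (Not p -> Not q) -> q -> Not (Not p)
lukasiewicz3_nearly' = flip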

It is now clear that the third Lukasiewicz axiom scheme is equivalent in logical power to double negation elimination. But, finally reaching one punchline, the Haskell type Not (Not p) -> p, and its syntactic equivalent ((p -> False) -> False) -> p are uninhabited. Thus there is no way to complete

double_negation_elimination :: Not (Not p) -> p
double_negation_elimination h_nnp = ...

without using undefined. Thus Haskell exhibits the Curry–Howard logic for intuitionistic, rather than classical, logic.

The foundations of Lean are in dependent type theory (the calculus of inductive constructions), which is also intuitionistic. But Lean supports classical logic, and if I understand right, this is automatically enabled by importing Mathlib. Remembering that proof by contradiction is ‘the falsity of \neg p implies p‘ we get a one-line proof of double negation elimination in Lean.

lemma double_negation_elimination {p : Prop} :
  ¬¬p → p := by_contra
  -- by_contra has type (¬?m.1 → False) → ?m.1

If you are playing along and find this gives an error, please add import Mathlib.Tactic.ByContra at the top of your file. Or add import Mathlib, and go away and make some tea. I used a full import of Mathlib for the examples of dependent types involving natural numbers below. Using this we can prove the instance of the third Lukasiewicz scheme:

lemma contrapositive_eliminating_negation {p q : Prop}
  : (¬p → ¬q) → q → p :=
  fun h_npnq h_q => by by_contra h_np;
  -- we have h_npnq : ¬p → ¬q, h_q : q and now h_np : ¬p
  -- so all set up for
  exact h_npnq h_np h_q

where the definition in the body could be rewritten as intro h_npnq h_q; by_contra h_np; exact h_npnq h_np h_q

Peirce’s Law

For a similar example, Peirce’s Law, that ((p \rightarrow q) \rightarrow p) \rightarrow p, is negation free but still unprovable in intuitionistic logic. Peirce’s Law is the type of the dreaded (well, by me anyway) Scheme function call-with-current-continuation. This type is void in Haskell: instead the nearest approximation is callCC

class Monad m => MonadCont m where
  callCC :: ((a -> m b) -> m a) -> m a

where, omitting type constructors, and choosing a fixed return type r, one suitable type for a monad in the class MonadCont is m p = (p -> r) -> r. This might suggest taking r to be False but then m a is Not (Not a), so we stay in the doubly negated world. I found it an instructive exercise to deduce double negation elimination from Peirce’s Law, by using Peirce with q specialised to False to get ((p \rightarrow F) \rightarrow p) \rightarrow p, i.e. (\neg p \rightarrow p) \rightarrow p, and then using the chain of implications \neg\neg p \rightarrow (\neg p \rightarrow p) \rightarrow p; this becomes a function composition below. I will leave filling in the sorry in the sublemma h_aux : ¬¬p → (¬p → p) below as an exercise. As a hint, one solution begins with intro h_nn h_n to get hold of values of the input types ¬¬p and \neg p; now compare the earlier proof by contradiction.

lemma peirce_implies_double_negation_elimination
  : (∀ r q : Prop, ((r → q) → r) → r)
          → (∀ p : Prop, ¬¬p → p) := by
  intro peirce p
  specialize peirce p False
  -- current state is p : Prop,
  -- peirce : ((p → False) → p) → p, goal ¬¬p → p
  have h_aux : ¬¬p → (¬p → p) := by sorry
  exact fun h_nnp => peirce (h_aux h_nnp)

This proof is probably comically long-winded to experts but no matter. At least I now have a better appreciation for why the first year undergraduates doing my introduction to pure maths course used to write proofs that began with a confident ‘suppose for a contradiction’, continued with a contrapositive, flirted perhaps with induction, and so on and so on, until eventually one reached the meat of the proof, which was usually a one line direct proof of the required proposition.

Types of types

Whereas in Haskell there are values, types and kinds (but without extensions the only kind that matters is \star, the common kind of all types), in Lean 4, in order to avoid Girard’s paradox, there is an infinite hierarchy. (The logical system of Principia Mathematica is, I believe, similar.) Nearly at the bottom of the hierarchy are values, just like in Haskell, such as the numbers 3 or 1.234, the list [1,2,3] or the string literal "hello". As in Haskell, these values have various types, for instance Nat for 3, ℝ for 1.234, List Nat for [1,2,3] and String for "hello". (Unlike Haskell, strings are not definitionally the same as lists of characters, so String is more like Haskell’s efficient Text data type.) Thus 3 : Nat, 1.234 : ℝ, [1,2,3] : List Nat and "hello" : String. Each of Nat, ℝ, List Nat, String is a type. The common type of all these types is Type, which can also be notated Type 0. The hierarchy then proceeds: the type of Type is Type 1, the type of Type 1 is Type 2, and so on. Off to one side, but at the lowest level of all, are values of a further type: Prop. This is the only type we have used so far. The type of Prop is Type. Thus

         10 : Nat  : Type : Type 1 : Type 2
h_p : p → p : Prop : Type : Type 1 : Type 2

are chains of valid type signatures (supposing that h_p and p are in scope) and

variable (p q : Prop)
#check p → (q → p)
#check p → q
#check p → p

prints out p → (q → p) : Prop, p → q : Prop, and p → p : Prop, as expected. (Note the second is not an inhabited type.) To avoid having to inspect the interactive output, one can include the expected output at the end, using an extra pair of parentheses. Thus

#check (p → (q → p) : Prop)

is a suitable replacement for the first check; any mismatch gets highlighted in VSCode with the usual red cross. Some of the claims above on types of types can be confirmed using

#check (3 : Nat)
#check (Nat : Type)
#check (Prop : Type)
#check (Type : Type 1)

Impredicativity of the proposition type

One subtle point is that normally the type of a function α → β where α : Type i, β : Type j is Type k where k is the maximum of i and j. But if β : Prop then no matter how high α is in the hierarchy, the resulting type is Prop. Thus the types of the entity following := in the three examples below are respectively Type 2, Prop and Prop

example {α : Type 1} {β : Type 2} : Type 2 := α → β
example : Prop := (n : Nat) → 0 + n = n
example {p : Prop} : Prop := (α : Type 4) → p

This property ‘impredicativity of Prop‘ is needed below so that our functions of dependent type are propositions, and so compound proofs do not quickly leave the world of propositions.

Proof irrelevance

I have the strong feeling that there can be essentially different proofs of the same proposition. For example, the Fundamental Theorem of Algebra can be proved by induction on the 2-power dividing the degree of the polynomial, using only the intermediate value theorem from basic calculus, or instead as a corollary of Liouville’s theorem from complex analysis that a bounded entire function is constant. Lean 4 would disagree:

theorem proof_irrelevance {p : Prop} (h_p h'_p : p)
  : h_p = h'_p := by rfl

Here rfl can, I think, be read as ‘are reflexively equal’. The analogue in Haskell would be a data type P that is made an instance of Eq by instance Eq P where _ == _ = True.
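For concreteness (the type and constructor names here are my own), such a declaration might be:

data P = ProofByIVT | ProofByLiouville

instance Eq P where
  _ == _ = True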

Aside on curly braces in types

The block {p q r : Prop} in the first example lukasiewicz2 declares that p, q, r are implicitly of type Prop; this block can be omitted, if one first declares variable (p q r : Prop) at the top of the file, as in the check example above. This is a bit like declaring p, q, r to be global types, each themselves of type Prop; compare a global declaration const int x = 3 of global values in a traditional imperative language such as C. Lean also has an analogue of Haskell’s type classes, for which square brackets are used. For instance [Ring R] means that the compiler should know (probably from some other file) that R has the structure of a ring. We won’t need type classes at all below.

Function types and forall

So far we have stayed in the world of propositional logic (which unlike predicate logic does not have quantification), and correspondingly, except for double-negation elimination, everything we have done in Lean can also be done in Haskell. Function types, or \Pi-types, are the key place where the Lean type system is far more expressive. Let me show how one can discover the syntax for oneself. Using the tautology r \rightarrow r, the interactive environment prints out identical output ∀ (r : Prop), r → r : Prop for both the #check queries below.

#check (r : Prop) → r → r
#check ∀ r : Prop, r → r

This is not the same as the output p → p : Prop of #check p → p, which is only valid thanks to the variable declaration variable (p q : Prop) above. Thus ∀ r : Prop, r → r is the type of functions whose

  • first argument is a proposition, called r (say)
  • return value has type the proposition r → r

Or equivalently, ∀ r : Prop, r → r is the type of functions whose

  • first argument is a proposition, called r (as it happens)
  • second argument is a value of type r (let’s call it h_r)
  • returning a value of type r (it must be h_r).

As confirmation here are Lean proofs of the tautology p \rightarrow p, first using implicit declaration as before, and then written with each of these interpretations in mind:

lemma self_implication {p : Prop} : p → p := id
lemma self_implication' : (∀ r : Prop, r → r) :=
  by intro r; exact id
lemma self_implication'' : (r : Prop) → r → r :=
  fun r h_r => h_r

Both primed forms have identical type signatures (when read by the compiler) and the values defined, i.e. the identity functions r → r for each r : Prop, are definitionally equivalent. The compiler, at least with my default settings, will object that the argument r in the final function form is unused; it is a nice touch that the warning output is cluttered still further by the instruction set_option linter.unusedVariables false on how to declutter it. But fair enough: it would be more idiomatic to write

lemma self_implication''' : (r : Prop) → r → r :=
  fun _ h_r => h_r

using the same underscore notation as Haskell for the ‘function hole’. Yet further definitions (i.e. the part following :=) are possible, for instance,

lemma self_implication'''' : (r : Prop) → r → r :=
  by simp
lemma self_implication''''' : (r : Prop) → r → r :=
  by simp?

both succeed, and the latter even tells us how it did it:

Try this: simp only [imp_self, implies_true]

Well, if you really must. I think I prefer the identity function.

Dependent types

Observe that the type of the output of self_implication'' depends explicitly on the first argument: in the first slot r, of type Prop, is the proposition to be shown to imply itself, whereas in the second and third slots r is used as a type. The nearest we can get to the type signature in Haskell, supposing a type Prop is defined, is

self_implication_Haskell :: Prop -> p -> p -- what's p?

except this entirely fails to capture that p is not just a proposition, but in fact the first argument of the function. Instead the type signature above specifies a function taking an arbitrary proposition to a fully polymorphic function p -> p, with no connection between the type variable p and the proposition. (To underline this point, equivalent Haskell type signatures in which p appears within the scope of a universal quantifier are :: Prop -> forall p. p -> p and forall p. Prop -> p -> p; you might need to enable the language extension ExplicitForAll for this to be accepted, but it seems that it is on by default in ghci-9.6.7.) Of course since we can give a Curry–Howard proof of self-implication in Haskell, there is an acceptable definition of self_implication_Haskell, namely \_ h_p -> h_p, but from our new dependent-type perspective, this is just a potentially confusing stroke of good luck.
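Here is a minimal sketch of this, with a stand-in Prop type of my own invention:

data Prop = SomeProp  -- a placeholder, just so the signature compiles

self_implication_Haskell :: forall p. Prop -> p -> p
self_implication_Haskell _ h_p = h_p
-- forall needs ExplicitForAll, which is on by default in recent GHCs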

Propositions indexed by natural numbers as a dependent type

For an example where it is not just the type signature but the whole function that cannot be realised in Haskell (at least without special tricks, see the final section below), we consider propositions parametrised by natural numbers. The following are equivalent declarations of a function that takes as input a natural number n, of type Nat, and returns a value of type 0 + n = n, i.e. a proof that adding zero to n gives n.

lemma zero_add_nat : ∀ n : Nat, 0 + n = n := by sorry
lemma zero_add_nat' : (n : Nat) → (0 + n = n) :=
  by sorry

In both cases one suitable replacement for sorry is intro n; ring_nf: after introducing n, the target is 0 + n = n, which falls to the ring normal form tactic. Showing a better appreciation of Mathlib, we could also use intro n; exact zero_add n, or, showing a better appreciation of Peano arithmetic, we could use
lemma zero_add_nat'' : (n : Nat) → 0 + n = n := by
  intro n; cases n
  -- zero case 0 + 0 = 0
  exact (add_zero 0)
  -- succ case: ⊢ 0 + (n' + 1) = n' + 1
  rename_i n
  -- Nat.add_assoc (n m k : ℕ) : n+m+k = n+(m+k)
  rewrite [← Nat.add_assoc]
  -- goal is now 0 + n + 1 = n + 1
  -- which is implicitly (0 + n) + 1 = n + 1
  rewrite [zero_add_nat'' n]
  rfl

Above I edited the Lean output to use n' rather than the dagger from Lean, which did not come through well on WordPress. It would be more idiomatic to replace cases n with induction n, binding the inductive hypothesis to hn in the succ branch. Also it is standard to use rw rather than rewrite, combining multiple rewrites in one line. Since rw includes a weak form of rfl, the final line is then redundant. With these changes we get something you might recognise from the Natural Number Game.

lemma zero_add_nat''' : (n : Nat) → 0 + n = n := by
  intro n; induction n with
  | zero => exact (add_zero 0)
  | succ n hn => rw [← Nat.add_assoc, hn]

This is also a good example of the impredicativity of Prop mentioned in the section above on the type hierarchy, where we saw that example : Prop := (n : Nat) → 0 + n = n type checks. Correspondingly, the type of zero_add_nat, namely ∀ n : Nat, 0 + n = n, is itself of type Prop. This can be verified, taking great care not to confuse types on the left of := with types (as values) on the right, by

def type_of_zero_add_nat := ∀ n : Nat, 0 + n = n
#check (type_of_zero_add_nat : Prop)

Pi-types

Returning to the type signatures, notice that both zero_add_nat : (n : Nat) → 0 + n = n and
lemma self_implication''' : (p : Prop) → p → p
  := fun _ h_p => h_p

taken verbatim from above are of the general form

lemma piTest {N : Type} {f : N → Prop} : (n : N) → f n
  := by sorry

which does indeed type check. (The triangle warning is from the use of sorry, which is of course the only way to complete the definition.) For more on the \Pi-type, see ‘Function types (a.k.a. Pi types)’ in Buzzard’s Formalising mathematics in Lean: one interesting remark I’ll trail here is that a \Pi-type is analogous to a tangent vector field on a manifold, in that each value of f : M \rightarrow TM lies in a different (vector) space: f(x) \in T_xM. Anyway, we have now seen the most general type of a Lean function. Buzzard goes on to discuss inductive types (analogous to algebraic data types in Haskell) and quotient types (with no Haskell analogue, beyond what one could fake using GADTs).

This is the end of the Lean part of this post, and probably a very good place to stop reading. Thank you for getting this far!

Playing the natural number game in Haskell with singletons

There is a website out there with about 25 ‘design patterns’, all of which as far as I can tell are ingenious ways to work around the limitations of the type systems in more traditional languages, of which the prime miscreant seems to be Java. So it is with some dismay that I learned from the wonderful Haskell Unfolder podcast that the singleton design pattern is a good way to bring some of the expressiveness of dependent types to Haskell.

We define Peano naturals as a data type

data Nat = Zero | Succ Nat deriving (Eq, Show)

If like me, you often mistype a constructor when you mean a type, as in makeZero :: Zero as opposed to the correct makeZero :: Nat (with definition makeZero = Zero) then you will have got rather used to seeing variations on the theme of ‘Not in scope: type constructor or class ‘Zero’. A data constructor of that name is in scope; did you mean DataKinds?’ Well the great moment has finally arrived, when we do, in fact, mean DataKinds! (Seriously though, this is a nice example of why shipping without a million extensions enabled is the right choice for producing understandable error messages.) Using this extension, we get access to type level naturals, written as 'Zero,

type One = 'Succ 'Zero

and so on. Confusingly the quotes can be omitted, because (unlike this poor human) the compiler always knows what is at the value level and what is at the type level. Here I am including the quotes every time they are appropriate. The three further extensions required appear as comments in the next code block.

To encode the proposition m + n = p at the type level, we introduce a new generalized algebraic data type (using the GADTs extension) that has ‘wired in’ the Peano axioms that m + 0 = m for all m \in \mathbb{N} and, by definition, m+(n+1) = (m+n) + 1. (Note that addition on the right is idiomatic, because it generalizes to ordinals in the usual way: \omega + 1 = \omega^+ is the successor ordinal to \omega whereas 1 + \omega = \omega.)

data Add :: Nat -> Nat -> Nat -> Type where
  -- Uses DataKinds, GADTs and KindSignatures; Type comes from Data.Kind
  AddZero :: Add m 'Zero m
  AddSucc :: Add m n p -> Add m ('Succ n) ('Succ p)
  
deriving instance Show (Add m n p)
deriving instance Eq (Add m n p)
-- Uses StandaloneDeriving

We now use the Haskell compiler to prove that 0 + n = n for all n \in \mathbb{N}. What we want is to define a function zeroAdd :: (n : Nat) -> Add Zero n n such that zeroAdd n has type Add Zero n n. But note that in zeroAdd n, n is a value, whereas in Add Zero n n it is a type. Thus the type of the output depends on a value in the input, exactly as expected for a dependent type. And this is not possible in Haskell, no matter how many extensions one turns on.

What we need is a bridge from values to types. For instance, we need a value of type SomeNewDataType One such that, when we pattern match on it, we know that the type parameter is One (and not 'Zero, or 'Succ ('Succ 'Zero), or …). This is what singletons do for us.

data TypeValueDict (n :: Nat) where
  TVZero :: TypeValueDict 'Zero
  TVSucc :: TypeValueDict n -> TypeValueDict ('Succ n)

As an illustration of the bridge, I offer

showTypeFromDict :: TypeValueDict nType
  -> String
showTypeFromDict TVZero = "'Zero"
showTypeFromDict (TVSucc nValue) =
  "'Succ " ++ showTypeFromDict nValue
  
tvOneTypeString = showTypeFromDict
  (TVSucc TVZero :: TypeValueDict One)
  -- no quote for One, it is a type (well really a Nat)

where the output is the string "'Succ 'Zero"; earlier we declared One as a type synonym for 'Succ 'Zero; it has kind Nat.

The proof that 0 + n = n is now routine: note that the illegal type signature zeroAdd :: (n : Nat) -> Add Zero n n is made legitimate using the dictionary.

zeroAdd :: TypeValueDict n -> Add Zero n n
zeroAdd TVZero = AddZero
zeroAdd (TVSucc r) = AddSucc (zeroAdd r)
  
zeroAddOne = zeroAdd (TVSucc TVZero) :: Add Zero One One

The fact that this compiles is a proof, at the type level, that 0 + n = n using only the Peano axioms. A truly expert Haskell programmer would probably refrain from running the code at all. Viewed from a sufficiently great distance, it is the same as the inductive proof in Lean.


Evaluating Gemini 3.0 Pro ‘Deep Think’ on American Mathematical Monthly Problems

January 2, 2026

The purpose of this post is to summarise my findings evaluating Gemini 3.0 Pro ‘Deep Think’ on the problems at the back of the May 2025 issue of the American Mathematical Monthly. My evaluation is available from my website, together with the transcript of all my interactions with Gemini. I used Gemini because while it’s probably second to ChatGPT 5.2 Pro at solving maths problems, I find it marginally more interesting to interact with.

I chose the May 2025 issue at random, but subject to the constraints that the deadline for submission of answers should have passed, and the solutions not yet be published. I think the problems in this issue are representative. None of them require exceptional ingenuity, and only 12535 (on binomial coefficients) could, I think, be said to be routine, even for an expert. In the table below, the headings are short for ‘Could I have done it quicker without the LLM?’, ‘Did the LLM make it fun?’ and ‘Fudge factor’, which I hope is self-explanatory. Stars mark the two problems to which I had already submitted solutions.

[Table: problem-by-problem evaluation, with columns ‘Could I have done it quicker without the LLM?’, ‘Did the LLM make it fun?’ and ‘Fudge factor’]

This evaluation is of course highly subjective, but I hope you will find it of some interest. One highlight for me was when both Gemini 3.0 Pro ‘Deep Think’ and ChatGPT 5.2 ‘Extended Thinking’ rated my solution to 12535 as \alpha- for `aesthetic merit’. You can read Gemini’s solution verbatim in the evaluation report linked above.

Before embarking on this project, I was surprised that I hadn’t already seen many similar exercises from other better mathematicians … but half-way through I realised how time-consuming it was, and my surprise ceased. But Gemini and ChatGPT are tools, and like any tool, we need to learn how to use them, double-edged though they may be. In this spirit, I know I have benefitted in my career from being an enthusiast for computer algebra, and for programming; for instance Haskell changed how I think about monads.

Thinking of LLMs as just another tool in our toolkit, we should be honest about their strengths as well as pointing out their weaknesses (laughter optional, but sometimes irresistible). Some strengths I can identify are their

  • encyclopedic command of the basic literature;
  • ability to make connections and find analogies between different areas of mathematics;
  • ability, depending on the subfield, to do maths: of the seven problems, Gemini gave essentially correct solutions to six.

In particular I have found Gemini and its colleagues excellent at definite integration (see 12534, but please also read the aside: ‘How LLMs sometimes cheat at definite integrals’) and at proving identities in enumerative combinatorics (see my write-up for 12535, but I have also had success with significantly harder problems). The weaknesses we all know: see my write-up of 12531 for an example of the superficially plausible ‘proof’ that they love to produce, complete with (mostly) spurious literature references, that turns out on time-consuming inspection to be fundamentally flawed. And as the ‘fun’ column shows, I am confident that in a narrow majority of cases, I would have enjoyed the problem more if I hadn’t used the LLM at all.

I think this is what really worries me: when in a year or so, state-of-the-art LLMs can reliably solve almost every American Mathematical Monthly problem, and even write aesthetically pleasing proofs of the binomial identities I like so much, what place will be left for mathematicians like me that enjoy the shallows as well as the depths of mathematics? Well, this is my problem, and there could be worse ones to have.

Finally, I hope I am not opening up a huge can of worms with this blogpost. For complete completeness, I will mention, although it is anyway on public record, that I contributed maths problems to the FrontierMath benchmark, organized by Epoch AI. I also set and solve maths problems for Surge AI; they have nothing to do with this post. I have no connection with Google or OpenAI and I no longer do any work for Epoch AI.


The q-Vandermonde Theorem by counting subspaces

October 22, 2025

Fix throughout a prime power q. We define the q-binomial coefficient \binom{d}{a}_q to be the number of a-dimensional subspaces of \mathbb{F}_q^d. If a < 0 or a > d then, by definition, the q-binomial coefficient is zero. The purpose of this post is to use this setting to prove one version of the q-Vandermonde convolution: en route we prove three results of increasing generality counting complementary subspaces.

In the proofs below we use the notation V_b for a varying b-dimensional subspace that is counted in part of the proof; if the subspace is instead fixed, often as part of the hypotheses, we use E_b.

Lemma. Let E_a be a given a-dimensional subspace of \mathbb{F}_q^{a+b}. The number of b-dimensional subspaces of \mathbb{F}_q^{a+b} complementary to E_a is q^{ab}.

Proof. Let e^{(1)}, \ldots, e^{(a+b)} be the canonical basis of \mathbb{F}_q^{a+b} of unit vectors. We may assume without loss of generality that E_a is spanned by the first a unit vectors. If V_b is a complementary subspace then V_b has a unique basis (up to order) of the form

u^{(1)} + e^{(a+1)}, \ldots, u^{(b)} + e^{(a+b)}

where u^{(1)},\ldots, u^{(b)} \in E_a. There are q^a choices for each of u^{(1)}, \ldots, u^{(b)} and so q^{ab} subspaces in total. \Box

If the claim about the basis is not obvious, I suggest thinking about row-reduction on the matrix whose rows are e^{(1)}, \ldots, e^{(a)} followed by a basis for V_b. We already have e^{(1)}, \ldots, e^{(a)} with pivots in the first a columns, so these rows are already perfect. Each of the remaining b columns must be a pivot column, so after row operations (including reordering) on rows a+1, \ldots, a+b we obtain a matrix

\left( \begin{matrix} I_a & 0 \\ M & I_b \end{matrix} \right)

in which different choices for the matrix M correspond to distinct subspaces V_b.

Proposition. Let E_a be a given a-dimensional subspace of \mathbb{F}_q^d. The number of b-dimensional subspaces V_b of \mathbb{F}_q^d such that E_a \cap V_b = 0 is \binom{d-a}{b}_q q^{ab}.

Proof. Observe that E_a \oplus V_b is an (a+b)-dimensional subspace of \mathbb{F}_q^d containing E_a. Such subspaces are in bijection with the b-dimensional subspaces of \mathbb{F}_q^d / E_a. Hence there are \binom{d-a}{b}_q choices for a subspace V_{a+b} in which E_a and the required V_b are complementary. By the previous lemma, there are q^{ab} choices for the complement V_b in V_{a+b}. \Box

In the following corollary note that, by hypothesis, all the b-dimensional subspaces we count contain the given r-dimensional subspace E_r.

Corollary. Let E_r \subseteq E_s be given r- and s-dimensional subspaces of \mathbb{F}_q^d. The number of b-dimensional subspaces V_b of \mathbb{F}_q^d such that V_b \cap E_s = E_r is \binom{d-s}{b-r}_q q^{(b-r)(s-r)}.

Proof. By the previous proposition, applied to \mathbb{F}_q^d / E_r, there are \binom{(d-r) - (s-r)}{b-r}_q q^{(b-r)(s-r)} subspaces V_b/E_r of dimension b-r meeting E_s / E_r trivially. Now again use that subspaces of the quotient space \mathbb{F}_q^d/E_r are in bijection with subspaces of \mathbb{F}_q^d containing E_r. \Box

Theorem. [q-Vandermonde theorem] We have

\binom{d}{b}_q = \sum_k q^{k(d-\ell-b+k)} \binom{\ell}{k}_q \binom{d-\ell}{b-k}_q.

Proof. Fix a (d-\ell)-dimensional subspace E_{d-\ell} of \mathbb{F}_q^d. For each k we count the subspaces V_b \subseteq \mathbb{F}_q^d whose intersection with E_{d-\ell} has dimension b-k: there are \binom{d-\ell}{b-k}_q choices for this intersection E_{b-k} \subseteq E_{d-\ell}, and then, by the corollary taking r = b-k and s = d-\ell, there are \binom{\ell}{k}_q q^{k(d-\ell-b+k)} subspaces V_b with V_b \cap E_{d-\ell} = E_{b-k}. Summing over k gives the result. \Box

In the final step, we may assume that k \in \{\max(0, b-d+\ell), \ldots, \min(b, \ell)\}, else V_b cannot have an intersection of dimension b-k with E_{d-\ell}. Correspondingly, the summands for k not in this range in the q-Vandermonde theorem are zero.
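As a quick sanity check, not part of the argument above: take d = 2, b = 1 and \ell = 1. The left-hand side of the q-Vandermonde theorem counts the q + 1 lines in \mathbb{F}_q^2, while the right-hand side is

q^{0 \cdot 0}\binom{1}{0}_q \binom{1}{1}_q + q^{1 \cdot 1}\binom{1}{1}_q \binom{1}{0}_q = 1 + q,

as it should be.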

Corollary [q-Vandermonde theorem, alternative version] We have

\binom{d}{b}_q = \sum_k q^{(b-k)(\ell-k)}\binom{\ell}{k}_q\binom{d-\ell}{b-k}_q

Proof. Substitute \ell' = d- \ell and k' = b - k in the original version, cancel in a few places, swap the order of the binomial coefficients in each product, and then erase the primes. \Box


The continuation monad: a mathematical introduction

August 17, 2025

In the Haskell category, objects are types, and morphisms are Haskell functions. The continuation monad for a fixed return type r is the functor m sending a type a to the type m a of functions (a -> r) -> r. The purpose of this post is to use vector space duality to motivate the definitions of the monadic operations return :: a -> m a and join :: m m a -> m a. We show that

  • return, the unit in the monad, is the analogue of the canonical inclusion of a vector space in its double dual, and
  • monad multiplication is the analogue of the canonical surjection V^{\star\star\star\star} \twoheadrightarrow V^{\star\star} induced by applying the unit on V^\star to get a canonical inclusion V^\star \hookrightarrow V^{\star\star\star}.

Having constructed the continuation monad, and checked one of the two monad laws, we then use the Haskell identity

ma >>= f = join (fmap f ma)

to define the monadic bind operation. (Here fmap :: (a -> c) -> m a -> m c is what the continuation monad does on morphisms.) By unpacking this definition into one that is self-contained, we interpret monadic bind, usually denoted >>=, as a way to compose computations written in continuation passing style.

Thus the order in this post is the opposite of most (all?) of the many good introductions to the continuation monad that can be found on the web: I hope this will make this post of interest to mathematicians, and also perhaps to the expert computer scientists as an illustration of how a non-expert mathematician came (at last) to some understanding of the ideas. We end with a few examples of Haskell programs written in the continuation monad, or equivalently, in continuation passing style.

A Haskell module with all the Haskell below and some extras is available from my website.

The double dual monad

Fix a field \mathbb{F} and let Vect be the category of \mathbb{F}-vector spaces. There is a canonical inclusion V \hookrightarrow V^{\star\star} defined by

v maps to ‘evaluate at v

or in symbols

v \stackrel{\eta_V}{\longmapsto} [\theta \mapsto \theta(v)]\qquad(\star)

for v \in V and \theta \in V^\star. Thus (\star) defines a natural transformation \eta from the identity functor on Vect to the double dual functor T. This will turn out to be the unit in the double dual monad.

Remark. In general V^{\star\star} is a vastly bigger vector space than V: for instance if V is the vector space of all \mathbb{F}-valued sequences with finite support, then V^{\star} is the vector space of all \mathbb{F}-valued sequences. Thus V has a canonical countable basis, whereas V^\star has dimension equal to the continuum \mathfrak{c}. Moreover V^{\star\star} has dimension 2^\mathfrak{c}. (Even the existence of bases for V^{\star} and V^{\star\star} depends on set theoretic foundations: when \mathbb{F} = \mathbb{F}_2 it is implied by the Boolean Prime Ideal Theorem, a weak form of the Axiom of Choice.) For this reason we have to reason formally, mainly by manipulating symbols, rather than use matrices or bases. But this is a good thing: for instance it means our formulae for the monad unit and monad multiplication translate entirely routinely to define the continuation monad in the category Hask.

The unit tells us how to multiply

To make the double dual functor a monad, we require multiplication, that is, a natural transformation \mu : T^2 \rightarrow T. Thus for each vector space V, we require a ‘naturally defined’ linear map \mu_V : V^{\star\star\star\star} \rightarrow V^{\star\star}. As already said, we obtain this as the dual map to \eta_{V^\star} : V^\star \rightarrow V^{\star\star\star}. That is \mu_V = \eta_{V^\star}^\star.

This needs some unpacking. Explicitly, \eta_{V^\star} is the linear map

\theta \stackrel{\eta_{V^\star}}{\longmapsto} [\phi \mapsto \phi(\theta)]

for \theta \in V^{\star} and \phi \in V^{\star\star}. Generally, given a linear map f : U \rightarrow W, the dual map is given by `restriction along f‘: that is, we regard a dual vector \theta \in W^\star as an element of U^\star by mapping U into W by f, and then applying \theta; in symbols f^\star (\theta) = \theta \circ f. Therefore the dual map to \eta_{V^\star}, which is the V component of the natural transformation \mu, is

w \stackrel{\mu_V}{\longmapsto} [\theta \mapsto w(\eta_{V^\star}(\theta))] = [\theta \mapsto w([\phi \mapsto \phi(\theta)])]  \quad (\dagger)

If you have an intuitive feeling for what this means, well great. I hope in a later post to show that \mu has an intuitive interpretation in the case \mathbb{F} = \mathbb{F}_2 using propositional logic to identify elements of

\mathcal{P}(\mathcal{P}(X)) \leftrightarrow \{0,1\}^{\mathcal{P}(X)} \leftrightarrow \{0,1\}^{\{0,1\}^X}

with boolean formulae in variables indexed by X; multiplication then becomes the observation that interpreting variables in a boolean formula as yet further boolean formulae does not leave the space of boolean formulae. Here I will note that the definition of \mu_V in (\dagger) does at least typecheck: \phi \in V^{\star\star} so the map \phi \mapsto \phi(\theta) is an element of V^{\star\star\star} and therefore in the domain of w : V^{\star\star\star} \rightarrow \mathbb{F}. Naturality of \mu follows very neatly from the naturality of \eta by duality: see the commutative diagrams below.

[Image: commutative squares showing that naturality of \mu follows from naturality of \eta]
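In symbols, naturality of \eta and of \mu means that for every linear map f : V \rightarrow W,

\eta_W \circ f = f^{\star\star} \circ \eta_V \quad\text{and}\quad \mu_W \circ f^{\star\star\star\star} = f^{\star\star} \circ \mu_V.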

But why is it a monad?

What we haven’t shown is that the triple (T, \eta, \mu) defines a monad. For this we need to check the monad laws shown in the commutative diagrams below.

[Image: commutative diagrams for the monad identity and associativity laws]
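In symbols, the laws to be checked are the identity law \mu_V \circ \eta_{V^{\star\star}} = \mathrm{id}_{V^{\star\star}} (together with its twin \mu_V \circ \eta_V^{\star\star} = \mathrm{id}_{V^{\star\star}}) and the associativity law \mu_V \circ \mu_V^{\star\star} = \mu_V \circ \mu_{V^{\star\star}}.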

The first is the identity law that \mu_V \circ \eta_{V^{\star\star}} = \mathrm{id}_{V^{\star\star}}. This has a short proof, which, as expected from the remark about formalism above, comes down to juggling the order of symbols until everything works out. Recall that \eta_{U} is the canonical embedding of U into U^{\star\star}. Thus if we take \phi \in V^{\star\star} then \eta_{V^{\star\star}}(\phi) is the element of V^{\star\star\star\star} defined by

\Psi \mapsto \mathrm{ev}_\phi(\Psi) = \Psi(\phi).

Moreover, \mu_V is defined by regarding elements of V^{\star\star\star\star} as elements of V^{\star\star} by restriction along the canonical inclusion \eta_{V^\star} : V^{\star} \rightarrow V^{\star\star\star}. This map sends \theta \in V^{\star} to the linear map [\phi \mapsto \phi(\theta)] in V^{\star\star\star}. Therefore

\begin{aligned} (\mu_V \circ \eta_{V^{\star\star}})(\phi) &= \mu_V([\Psi \mapsto \Psi(\phi)]) \\&= [\theta \mapsto (\eta_{V^\star}(\theta))(\phi)] \\ &= [\theta \mapsto \phi(\theta)] \\&= \phi \end{aligned}

and so \mu_V \circ \eta_{V^{\star\star}} is indeed the identity of V^{\star\star}.

A proof in this style of the second associativity law seems distinctly fiddly. Making the feeble excuse that there is no ‘super case’ Greek alphabet to represent elements of V^{\star\star\star\star\star}, I will omit it. I plan in a sequel to this post to show that T(V) = V^{\star\star} is the monad induced by the adjunction between the contravariant dual and itself; the identity monad law then becomes the triangle law (\eta D) \circ (D\epsilon) = G for the right-hand functor D : \mathbf{Vect} \rightarrow \mathbf{Vect}^\mathrm{op} in the adjunction, and the associativity law becomes an immediate application of the interchange law. If time permits I’ll include the proof by string diagrams: to my surprise I can’t immediately find this on the web, except in this nice video from the Catsters: see the blackboard at the 9 minute mark.

But how could it not be a monad?

As a final remark, for either law one could argue ‘how else could it possibly work?’ Thus in the first, we know that (\mu_{V} \circ \eta_{V^{\star\star}})(\phi) is some kind of linear map V^{\star} \rightarrow \mathbb{F}, and since all we have to hand is \phi \in V^{\star\star}, maybe it should be obvious from naturality that the map is \phi itself? It would be intriguing if this could be made into a rigorous argument either working in Vect or directly in Hask, perhaps using ideas from Wadler’s paper Theorems for free.

The continuation monad in Haskell

We now translate the definition of the unit and multiplication from the double dual monad into Haskell. This is completely routine. For the unit, we translate \eta_V :V \rightarrow V^{\star\star}, as defined in (\star) by \eta_V(v) = \mathrm{ev}_v to

return x = \k -> k x -- return :: a -> ((a -> r) -> r)

It might be more idiomatic to write this as return = flip ($). Either way we get return :: a -> ((a -> r) -> r).

You can join if you can return

For multiplication, we translate \mu_V : V^{\star\star\star\star} \rightarrow V^{\star\star}, as defined in (\dagger) by \mu_V(w) = [\theta \mapsto w(\eta_{V^\star}(\theta))] to

join :: ((((a -> r) -> r) -> r) -> r) -> ((a -> r) -> r)
join w = \k -> w (return k)

Note that this uses return, capturing the sense seen above in which the unit for the double dual monad tells us how to define multiplication. Unpacked to be self-contained join becomes

join w = \k -> w (\φ -> φ k)

which thanks to Haskell’s support for Unicode is still valid code. It is a useful exercise to prove the identity monad law by equation-style reasoning from these definitions.
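For the record, here is one such calculation, for the law join (return w) = w, where w is a continuation runner of type (a -> r) -> r:

-- join (return w)
--   = \k -> (return w) (return k)   -- definition of join
--   = \k -> (return k) w            -- definition of return (outer occurrence)
--   = \k -> w k                     -- definition of return (inner occurrence)
--   = w                             -- eta reduction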

You can bind if you can join

We now use the Haskell identity ma >>= f = join (fmap f ma) mentioned in the introduction to define bind. First of course we must define fmap, defining the Functor instance for the continuation monad. For the double dual functor with f : V \rightarrow W, we have f^\star (\theta) = \theta \circ f and so f^{\star\star} (\phi) = \phi \circ f^\star or unpacked fully,

f^{\star\star}\phi = [\theta \mapsto \phi(f^\star (\theta))] = [\theta \mapsto \phi(\theta \circ f)]

We therefore define

fmap :: (a -> b) -> ((a -> r) -> r) -> (b -> r) -> r
fmap f run_a = \k_b -> run_a (k_b . f)

and the compiler will now accept

bind run_a a_to_run_b = join $ fmap a_to_run_b run_a

(For why run_a is a reasonable name for values of the monadic type (a -> r) -> r see the final part of this post.) It is now a mechanical exercise to substitute the definitions fmap f run_a = \k_b -> run_a (k_b . f) and then join w = \k -> w (\φ -> φ k) to get a self-contained definition, which we tidy up in three further stages.

bind1 run_a a_to_run_b = join $ \k_b ->
           run_a (k_b . a_to_run_b)
      where join w = \k -> w (\φ -> φ k)

bind2 run_a a_to_run_b = \k -> (\k_b ->
      run_a (k_b . a_to_run_b))(\φ -> φ k)

bind3 run_a a_to_run_b =
      \k -> run_a ((\φ -> φ k) . a_to_run_b)

bind4 run_a a_to_run_b =
      \k -> run_a (\x -> (a_to_run_b x) k)

In the final line, the parentheses around a_to_run_b x could be dropped. Incidentally, with the DeriveFunctor extension, ghc can derive the declaration for fmap just seen.

But don’t monads need an algebraic data type?

Answer: yes if we want to take advantage of Haskell’s do notation, and be consistent with Haskell’s expected types for return, fmap, join and >>=. Below we define an algebraic data type RunCont with parameter r for the return type and parameter a for the value type, so that the ‘curried type’ RunCont r has kind * -> *, as a monad should. The constructor is called RC (it is also common to reuse the type name as the constructor) and the ‘wrapped value’ can be extracted using runCont :: RunCont r a -> (a -> r) -> r.

newtype RunCont r a = RC {runCont :: (a -> r) -> r}

For instance if a_to_run_b :: a -> RunCont r b then runCont (a_to_run_b x) has type (b -> r) -> r so is the correct replacement for a_to_run_b in bind4. Wrapping return is even more routine, and we end up with

instance Monad (RunCont r) where
  return x = RC (\k -> k x)
  RC run_a >>= a_to_run_b = RC (\k ->
    run_a (\x -> runCont (a_to_run_b x) k))

It is a nice fact that bind determines both fmap and join, so, after the unpacking above to avoid using fmap in the definition of bind, everything we need is in the declaration above. That said, GHC from version 7.10 onwards requires a separate instance declaration for Functor, which to illustrate this point we define by

instance Functor (RunCont r) where
  fmap f (RC run_a) = (RC run_a) >>= (return . f)

and also for Applicative, which can be derived from bind, but since I find the result completely unintuitive, instead we’ll use the usual boilerplate

instance Applicative (RunCont r) where
  pure = return
  (<*>) = ap -- ap :: Monad m => m (a -> b) -> m a -> m b, from Control.Monad

recovering this instance from the Monad declaration. Incidentally, the Functor declaration also has a boilerplate definition: fmap = liftM is the ubiquitous choice. I can’t easily find on the web why the ghc authors made this code-breaking change (the page on the Haskell wiki has a dead link for a 2011 proposal along these lines that was rejected) but surely the rise of programming in the applicative style has something to do with it.

Why newtype?

I used newtype rather than data above to make RC strict in its value. Counterintuitively (to me at least) this means that overall the RunCont monad is lazier, because the compiler does not need to inspect the argument to RC to check it has type (a -> r) -> r. Ever since I implemented the Random state monad with data rather than newtype, and many weeks later, found puzzling instances of what seemed like Haskell being over-eager to evaluate, I have been able to remember this distinction. An alternative to using newtype, which emphasises this point, is data RunCont r a = RC {runCont :: !((a -> r) -> r)}. Note the explicit strictness annotation. There is a further benefit to newtype over data, that the strictness and unique constructor together mean that the compiler can erase the constructors and so generate code that has no overhead; a related fact, which at first I found rather surprising, is that fully compiled Haskell code has ‘no remembrance of types past’.

Continuation passing style is the computational model of the continuation monad

Not before time, we mention that the type a -> r is that of continuations: functions that take an input of type a and return a value of the desired final output type of the entire computation, which we always denote r.

Callbacks

Such functions arise naturally as callbacks: ‘function holes’ left in the anticipation that another author or library will be able to fill in the gap. Consider for instance the following program intended to be useful when checking the Fibonacci identity \sum_{k=0}^n F_k = F_{n+2} - 1.

fibSum n = sum [fib k | k <- [0..n]]

Of course this won’t compile, because there is no fib function to hand. Rather than take time off to write one, we instead press on, by bringing fib into scope in the simplest possible way:

run_fibSum :: Int -> (Int -> Integer) -> Integer
run_fibSum n fib = sum [fib k | k <- [0..n]]

Here I’ve supposed fib has type Int -> Integer; this is reasonable because the answer might be more than 2^{64} while surely the input won’t be, but the true reason is to make a distinction between the types a (here Int) and r (here Integer). Thus run_fibSum is a way of running the continuation fib to give the answer we want, and \n -> RC (run_fibSum n) is a value of type Int -> RunCont Integer Int, so suitable for use in our monad, most obviously as the second argument to bind >>=. For a minimal example, with exactly the same structure, take

run_fib n fib = fib n
    -- run_fib :: Int -> (Int -> Integer) -> Integer
run_fib' n = \fib -> fib n
    -- run_fib' :: Int -> (Int -> Integer) -> Integer
run_fib'' n = RC (\fib -> fib n)
    -- run_fib'' :: Int -> RunCont Integer Int

where the final function could have been written simply as return n; as we have seen many times, the unit in the continuation monad is ‘x maps to evaluate at x‘. In this form we see clearly that run_fib doesn’t really care that the intended continuation is the Fibonacci function; it could be any value of type Int -> Integer. I hope this explains why I’ve used RunCont as the name for the monadic type: its wrapped values are continuation runners of type (a -> r) -> r, and not continuations, of type a -> r. To my mind, using Cont is as bad as conflating elements of V^{\star} (continuations) with elements of V^{\star\star} (continuation runners).

Continuation passing style

Consider the following very simple Haskell functions addOne x = x+1 and timesTwo y = y * 2 of intended type Int -> Int. We can compose them to get timesTwoAfterAddOne = timesTwo . addOne. To make life harder, we imagine that addOne has to control the entire future course of our program, and to help it out, we’ll give a suitably extended version of it the continuation timesTwo as the second argument:

addOneCPS x timesTwo = timesTwo (x + 1)

and so timesTwoAfterAddOne x = addOneCPS x timesTwo. Of course this generalizes immediately to any continuation of type Int -> r; below I’ve renamed timesTwo to k (as in the derivation of bind above), but the function and type are the same:

addOneCPS :: Int -> (Int -> r) -> r
addOneCPS x k = k (x + 1)

Similarly we can write timesTwo in this style,

timesTwoCPS :: Int -> (Int -> r) -> r
timesTwoCPS y k = k (y * 2)

and now a composed function, still in CPS style so having a continuation that might perhaps be as simple as id :: Int -> Int (when r is Int) or show :: Int -> String (when r is String), is

timesTwoAfterAddOneCPS x k = addOneCPS x
                              (\y -> timesTwoCPS y k)

Even allowing for the extra argument, this is far less attractive than the simple composition timesTwo . addOne we had earlier. But I am sure that by now you see where this is leading. Above our final version of monadic bind for the continuation monad (written without an algebraic data type) was, changing the dummy variable from x to y to avoid an imminent clash,

bind4 run_a a_to_run_b = \k ->
    run_a (\y -> (a_to_run_b y) k)

Substitute addOneCPS x for run_a and timesTwoCPS for a_to_run_b and get

bind4 (addOneCPS x) timesTwoCPS = \k -> (addOneCPS x)
    (\y -> (timesTwoCPS y) k)

which is the \eta-expansion of timesTwoAfterAddOneCPS x. Therefore, when wrapped in the appropriate constructors, we can write timesTwoAfterAddOneCPS x = (addOneCPS x) >>= timesTwoCPS, and even use do notation:

timesTwoAfterAddOneCPS x =
  do y <- addOneCPS x
     -- addOneCPS x = RC (\k -> k (x + 1))
     timesTwoCPS y
     -- timesTwoCPS y = RC (\k -> k (y * 2))
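To actually run the composed computation we supply a final continuation; assuming the RC-wrapped definitions indicated in the comments above, for example:

-- ghci> runCont (timesTwoAfterAddOneCPS 5) id
-- 12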

Recursion in continuation passing style

I’ll use the sum function as an example, even though it does not show Haskell in its best light. In fact, it is a notable ‘gotcha’ that both naive ways to write it use an amount of memory proportional to the length of the list. Worst of all is the right-associative fold

sumR xs = foldr (+) 0 xs

which means that sumR [1,2,3,4] expands to 1 + (2 + (3 + (4+0))) and then 4+0 is evaluated, and so on. (This can be checked using the linked Haskell code, which puts a trace on plus.) Still bad, but more easily fixed, is the left-associative fold

sumL xs = foldl (+) 0 xs

which means that sumL [1,2,3,4] expands to (((0+1) + 2) + 3) + 4, and then 0+1 is evaluated, and so on. The correct way to write it uses a strict left-fold, available as foldl'; one can write one’s own using the primitive seq, which tells the compiler ‘I really want the first value computed to WHNF before you do anything more’

foldl' op init [] = init
foldl' op init (x : xs) = let next = init `op` x
    in seq next (foldl' op next xs)

One could also replace init above with !init to force evaluation without using seq. Either way, the init value functions as an accumulator for the sum. In this connection it is worth noting that, for the sum function on common numerical types, WHNF means complete evaluation, but in general one has to fight harder to force full evaluation. For more on these issues see the post beginning To foldr, foldl or foldl’, that is the question! on the Haskell wiki.

The right-associative fold is easily converted to CPS-style:

sumRCPS [] k = k 0
sumRCPS (x : xs) k = sumRCPS xs (k . (+ x))

Thus if the list is empty we apply the continuation to 0, and otherwise we modify the continuation so that it first adds x, then does whatever it was going to do before. Thus sumRCPS [1,2,3,4] id evaluates to sumRCPS [2,3,4] (id . (+1)), and then in three further steps to sumRCPS [] (id . (+1) . (+2) . (+3) . (+4)); now, as already seen, the continuation constructed above is evaluated at 0, giving id . (+1) . (+2) . (+3) . (+4) $ 0 which in turn evaluates to (((0 + 4) + 3) + 2) + 1. Note that although this might look like a left-associative fold, it does the same computation as the right-associative fold.

The only way to convert the left-associative fold to CPS-style that I can see does all the work outside the continuation

sumLCPS [] acc k = k acc
sumLCPS (x : xs) acc k = sumLCPS xs (acc + x) k

where again either seq or a strictness annotation !acc is needed to avoid a space blow-up. So rather disappointingly, CPS-style is no help at all in writing an efficient sum function.

An example where CPS-style is useful

The function takeUntil has the following naive implementation:

takeUntil :: (a -> Bool) -> [a] -> Maybe [a]
takeUntil p [] = Nothing
takeUntil p (x : xs)
    | p x = Just []
    | otherwise = x `mCons` takeUntil p xs
  where x `mCons` Nothing = Nothing
        x `mCons` Just ys = Just (x : ys)

For example, takeUntil (> 4) [1,2,3] evaluates to 1 `mCons` 2 `mCons` 3 `mCons` Nothing, and then to 1 `mCons` 2 `mCons` Nothing, then to 1 `mCons` Nothing and finally to Nothing. (Nothing will come of nothing, as King Lear so inaptly said.) Using continuation passing style, we can short-circuit the evaluation in the ‘Nothing’ case.

takeUntilCPS :: (a -> Bool) -> [a] ->
    (Maybe [a] -> Maybe [a]) -> Maybe [a]
takeUntilCPS p [] _ = Nothing
takeUntilCPS p (x : xs) k
    | p x = k (Just [])
    | otherwise = let k' (Just ys) = k $ Just (x : ys)
                  in (takeUntilCPS p xs k')

Note we do not need a pattern match on ‘Nothing’ to define k' because if the input was Nothing then we’d have discarded the continuation and immediately returned Nothing. This is the first and only example in this post where we use the freedom in continuation passing style not just to modify the continuation but to discard it entirely.
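
For example (my own quick check), kicking the computation off with the identity continuation:

-- ghci> takeUntilCPS (> 4) [1,2,3,5,6] id
-- Just [1,2,3]
-- ghci> takeUntilCPS (> 4) [1,2,3] id
-- Nothing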

Final thought on continuations

Finally, as far as I can see, continuation passing style isn’t really very important in Haskell, and some comments I’ve read suggest it can even be harmful: the linked answer mentions that for GHC,

‘The translation into CPS encodes a particular order of evaluation, whereas direct style does not. That dramatically inhibits code-motion transformations’.

On the other hand, continuations are excellent for stretching the mind, and illuminating how programming languages really work. Hence, no doubt, their ubiquity in computer science courses, and the many tutorials on the web. In this spirit, let me mention that, as in the Yoneda Lemma, if we get to pick the return type r in RunCont r a then there is a very simple hack to show that any continuation runner really is a hidden value of type a (this is a common metaphor in web tutorials), which we can extract by

counit :: RunCont a a -> a
counit (RC run_a) = run_a id

This might suggest an alarmingly simple way to implement the continuation monad: just choose the output type r to be a, extract the value, and do the obvious. But of course the code below only type checks if we (as a text substitution) replace r with a, and so, as I’ve seen many times, somehow the syntactical type checker can guide us away from what is semantically incorrect.

instance Monad (RunCont r) where
  return x = RC (\k -> k x)
  (RC run_a) >>= g = let x = run_a id in g x
  -- • Couldn't match type ‘a’ with ‘r’
  --   ‘a’ is a rigid type variable bound by ...


Sampling from the tail end of the normal distribution

February 3, 2025

The purpose of this post is to present two small observations on the normal distribution, with more speculative applications to academic recruitment. The mathematical part at least is rigorous.

Tails and the exponential approximation

We can sample from the tail end of the standard normal distribution N(0,1) by sampling from N(0,1) as usual, but then rejecting all samples that are not at least some threshold value t. From my mental picture of the bell curve, I think it is intuitive that the mean of the resulting sample will be not much more than t. In fact the mean is

\displaystyle t + \frac{1}{t} - \frac{2}{t^3} + \mathrm{O}(\frac{1}{t^4}).

Please pause for a moment and ask yourself: do you expect the variance of the tail-end sample to increase with t or instead to decrease with t?

The answer, less intuitively I think, is that the variance decreases as 1/t^2. More precisely, it is

\displaystyle \frac{1}{t^2} - \frac{6}{t^4} + \mathrm{O}(\frac{1}{t^6}).

Thus, the larger t is, the more concentrated is the sample. Given that one expects tail events to be rare, and so perhaps more variable, this is maybe a surprise. One can derive the series expansions above by routine but fiddly calculations using the probability density function \frac{1}{\sqrt{2\pi}} \exp(-x^2/2) of the standard normal distribution.

Alternatively, and I like that this can be done entirely in one’s head, observe that for large t and small h, we have

\displaystyle \frac{1}{\sqrt{2\pi}} \exp(-(t+h)^2/2 ) \approx \frac{1}{\sqrt{2\pi}} \exp(-t^2/2) \exp(-th)

and so the normal distribution conditioned on x \ge t should be well approximated by an exponential distribution with parameter t, shifted by t. The probability density function of the unshifted exponential distribution with parameter t is t \exp(-th), and, correspondingly, as is well known, its mean and standard deviation are both 1/t. Thus this approximation suggests that the mean of the tail distribution should be t + 1/t and its variance 1/t^2, agreeing with the two asymptotic series above.

To make this explicit, the table below shows the percentage of ‘outlier’ samples that are t+1 or more standard deviations from the mean, in a sample conditioned to be at least t standard deviations from the mean.

t            0     1     2     3     4     5
\% outliers  31.7  14.3  5.9   2.3   0.9   0.3
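
The table is easy to reproduce in code; the sketch below is mine and assumes the erf package from Hackage, whose Data.Number.Erf module provides erfc.

import Data.Number.Erf (erfc)

-- upper tail P(X >= t) of the standard normal distribution
normalTail :: Double -> Double
normalTail t = erfc (t / sqrt 2) / 2

-- percentage of the sample conditioned on X >= t that also satisfies X >= t+1
outlierPercent :: Double -> Double
outlierPercent t = 100 * normalTail (t + 1) / normalTail t

-- ghci> map outlierPercent [0, 1, 2, 3, 4, 5]
-- gives (to one decimal place) 31.7, 14.3, 5.9, 2.3, 0.9, 0.3 as in the table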

The University of Erewhon

In the selection process for the University of Erewhon, applicants are invited to sit a fiendishly difficult exam. The modal score is zero (these candidates are of course rejected) and the faculty recruits the tiny minority who exceed the pass mark. The result, of course, is that the faculty recruits not the best mathematicians, but those best at passing their absurd test. (Nonetheless there is a long queue at the examination centre each year: people want jobs!) But more deleteriously, the effect is to create a monoculture: modelling exam-taking ability as an N(0,1) random variable, the analysis above shows that sampling with threshold t gives a distribution in which almost everyone scores about t + 1/t. (We saw the standard deviation was 1/t up to terms of order 1/t^2.) And of course t is chosen so large that only those able to devote years of their life to preparing for the exam, to the exclusion of all other activities, have any chance of passing. And thus the dispiriting result is that all candidates are not only rather alike in their test-taking ability, but the selection by this one statistic creates a culture entirely dominated by the branches of mathematics and the personality traits favoured by the test. The table above shows it also creates the delusion that most people are about as good at maths (or at least at the Erewhonian test) as you: for example, if t = 3 and you have 50 colleagues, then since only 2.3% of the conditioned sample are outliers, lying in the \ge 4 tail, there is a good chance that none of your colleagues is more than 1 standard deviation ahead of you in ability.

I have gone through selection processes in my life that had something in common with this obvious parody (often I fear unjustly benefitting), and believe I can write from experience that the result has never been good. The alternative of course is to use multiple criteria when recruiting, allowing that some successful candidates may be weak in some areas, while very strong in others. Rather obvious, I know, but I think it’s interesting that a very simple mathematical argument backs up this conclusion.

In the next part of the post we’ll see that the effect of luck can mean that Erewhonian style recruitment not only creates a monoculture, it doesn’t even successfully select the most talented candidates.

The effect of luck

If our intuition about tail events is perhaps suspect, our intuition for conditional probability is surely worse. When the two come together, by conditioning on rare events, the effects can be catastrophic. Selection bias is frequently overlooked even by those who should know better. It seems quite possible to me that we’re here today only because in the many worlds in which the delicate balance of nuclear deterrence failed, there is no-one left in the charred embers to read blog posts on selection bias. Or to pick a less extreme example, would you spend £1000 to attend a seminar given by a panel of lottery winners in which they promise to reveal the detail of their gambling strategies?

Returning to the normal distribution, and the University of Erewhon, suppose that with probability 1-p we sample from N(0,1) (people with average luck), and with probability p we sample from N(0,1) + \ell (lucky people). Suppose that \ell is small, say \ell = 1/2, so the effect of luck is relatively modest, boosting one’s performance by just 1/2 a standard deviation. As before, we then reject all samples below the threshold t. Does the effect of luck become more or less significant as t increases?

I think here most people would intuitively guess the right answer, but the magnitude of the effect is striking. The table below shows the number of ‘average’ and ‘lucky’ people when we take a million samples according to the scheme above, keeping only the samples that are at least t. We suppose that there is a probability p = 1/5 of being lucky. The final column is the ratio of lucky to average people.

t     average  lucky  ratio
1     127064   61573  0.48
1.5    53375   31774  0.60
2      18361   13070  0.71
2.5     4924    4562  0.93
3       1054    1176  1.12
3.5      187     250  1.34
4         24      51  2.13

Thus even though average people outnumber lucky people four to one, for a ratio of 0.25, when we sample from the \ge 3 tail of the distribution (with, as just explained, a boost of 0.5 for lucky people), lucky people predominate. The effect is that the University of Erewhon fails in its goal of recruiting people three standard deviations above the mean: instead, of the 2230 successful candidates, 1176 are people who are merely about 2.5 standard deviations above the mean, but happened to get lucky.
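
In fact the expected ratio can be written down directly (this formula is my own gloss on the simulation, writing \overline{\Phi} for the upper tail of the standard normal distribution): it is

\displaystyle \frac{p\, \overline{\Phi}(t - \ell)}{(1-p)\, \overline{\Phi}(t)}.

For t = 3, p = 1/5 and \ell = 1/2 this is \frac{1}{4} \cdot \overline{\Phi}(2.5)/\overline{\Phi}(3) \approx \frac{1}{4} \cdot 0.00621/0.00135 \approx 1.15, consistent with the simulated ratio 1.12.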

The effect is robust to different models for luck. For instance suppose, to be fair, we allow people to have a chance to be unlucky, and, to counterbalance this, model luck as an N(0,1/4) random variable, so the standard deviation is 1/2, equal to the effect size in the previous model. Since the sum of N(0,1) and N(0,1/4) random variables has the N(0,5/4) distribution, the effect is to increase the variance, while leaving the mean unchanged. The graphs below show the probability density function for the original N(0,1) random variable, conditioned on the sum being in the \ge t tail, for t \in \{2.5, 3, 3.5, 4\}.

[Image: conditional probability density functions of the N(0,1) component, for t \in \{2.5, 3, 3.5, 4\}.]

Observe that the density functions become slightly more concentrated, and while they shift to the right, the means are 2.31109, 2.67471, 3.04521 and 3.42096, significantly less in all cases than the threshold t. So, once again, the most obvious effect of being more selective is to increase the importance of being lucky, and we see the monoculture effect from the first scenario. The importance of luck is confirmed by the graph below showing the probability density function for luck, again conditioned on the sum being in the \ge t tail.

[Image: conditional probability density functions for luck, for the same thresholds t.]

It is striking that even for t = 2.5 (the blue curve) only 10.09% of the conditional probability density function is in the negative half. Almost everyone recruited gets lucky, most by at least one standard deviation from the mean and many by two or more standard deviations from the mean. The moral is again clear: unless you believe (as is sometimes inaccurately said of Napoleon) that luck is an intrinsic life-long characteristic that you genuinely want to select for, use multiple criteria when recruiting.


Lifting set/multiset duality

November 23, 2024

This is a pointer to a note in the same spirit as the previous post. This time we take n \in \mathbb{N} and lift the identity

\sum_{k=0}^n (-1)^k \left(\!\binom{n}{k}\!\right) \binom{n}{k} = 0

where \left(\!\binom{n}{k}\!\right) = \binom{n+k-1}{k} is the number of k-multisets of a set of size n, to a long exact sequence of polynomial representations of \mathrm{GL}_n(F), working over an arbitrary field F. We then descend to get quick proofs of generalizations of the symmetric function identity

\sum_{r=0}^n (-1)^r h_{n-r}(x_1,\ldots, x_D)e_r(x_1,\ldots, x_D) = 0

for D \ge n and the corresponding q-binomial identity, obtained by specializing at 1,q, \ldots, q^{D-1}, namely

\sum_{r=0}^n (-1)^r q^{r(r-1)/2} \genfrac{[}{]}{0pt}{}{n-r+D-1}{n-r}_q  \genfrac{[}{]}{0pt}{}{D}{r}_q = 0.


Lifting the ‘stars and bars’ identity

September 21, 2024

Let \left(\!\binom{n}{k}\!\right) denote the number of k-multisubsets of an n-set.
A basic result in enumerative combinatorics is that

\left(\!\binom{n}{k}\!\right) = \binom{n+k-1}{k} \qquad (1)

One proof of this uses the method of stars and bars. The link will take you to Wikipedia; for a similar take on the method, interpreting \left(\!\binom{n}{k}\!\right) as the number of ways to put k unlabelled balls into n numbered urns, see Theorem 2.3.3 in my draft combinatorics textbook.

The purpose of this post is to give a combinatorial proof of the q-binomial version of this identity and then to show that this q-lift can be restated as an equality of two symmetric functions specialized at the powers of q, and so is but a shadow of an algebraic isomorphism between two representations of \mathrm{SL}_2(\mathbb{C}). I finish by explaining how my joint work with Eoghan McDowell shows that this isomorphism exists for representations of \mathrm{SL}_2(\mathbb{F}) over an arbitrary field \mathbb{F}, and then outlining a beautiful construction by Darij Grinberg of an explicit isomorphism, which (deliberately) agrees with the one in our paper. Each lifting step, indicated by increasing equation numbers, gives a more conceptual, and often more memorable result. It is not yet clear what formulation might deserve to be called the ‘ultimate lift’ of the stars and bars identity.

This earlier post has much more on the background multilinear algebra, and representation theory.

Subsets are enumerated by q-binomial coefficients

Given a subset X of \mathbb{N}_0 , we define its weight, denoted \mathrm{wt}(X) , to be its sum of entries. We shall adopt a slightly unconventional approach and define the q-binomial coefficient \genfrac{[}{]}{0pt}{}{n}{k}_q \in \mathbb{C}[q] to be the enumerator by weight of the k-subsets of \{0,1,\ldots, n-1\}, normalized to have constant term 1. That is,

\genfrac{[}{]}{0pt}{}{n}{k}_q = q^{-k(k-1)/2} \sum_{X \subseteq \{0,1,\ldots, n-1\},\ |X| = k} q^{\mathrm{wt}(X)}.

(This property of q-binomial coefficients appears to me to be fundamental, but it is not always stressed in introductory accounts.) Set \genfrac{[}{]}{0pt}{}{n}{k}_q = 0 if n or k is negative. Clearly \genfrac{[}{]}{0pt}{}{n}{k}_q is a polynomial in q. For example

\begin{aligned} \genfrac{[}{]}{0pt}{}{4}{2}_q&= q^{-1}(q^{0+1} + q^{0+2} + q^{0+3} + q^{1+2} + q^{1+3} + q^{2+3}) \\ &= 1 + q + 2q^2 + q^3 + q^4, \end{aligned}

and \genfrac{[}{]}{0pt}{}{n}{0}_q = \genfrac{[}{]}{0pt}{}{n}{n}_q = 1 and \genfrac{[}{]}{0pt}{}{n}{1}_q  = 1 + q + \cdots + q^{n-1} for all n \in \mathbb{N}_0. We now recover the usual definition of \genfrac{[}{]}{0pt}{}{n}{k}_q. As a preliminary lemma we need a q-lift of Pascal’s Rule \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}, which we prove by adapting the standard bijective proof.

Lemma. If n \in \mathbb{N} and k \in \mathbb{N}_0 then

\genfrac{[}{]}{0pt}{}{n}{k}_q = q^{n-k} \genfrac{[}{]}{0pt}{}{n-1}{k-1}_q  + \genfrac{[}{]}{0pt}{}{n-1}{k}_q.

Proof. If k = 0 then the result holds by our convention that \genfrac{[}{]}{0pt}{}{n-1}{-1}_q = 0, so we may suppose that k \ge 1. Removing n-1 from a k-subset of \{0,1,\ldots,n-1\} containing n-1 puts such k-subsets in bijection with the (k-1)-subsets of \{0,1,\ldots, n-2\}, enumerated by \genfrac{[}{]}{0pt}{}{n-1}{k-1}_q. This map decreases the weight by n-1. The k-subsets of \{0,1,\ldots, n-1\} not containing n-1 are in an obvious weight preserving bijection with the k-subsets of \{0,1,\ldots, n-2\}, enumerated by \genfrac{[}{]}{0pt}{}{n-1}{k}_q. Therefore,

q^{k(k-1)/2} \genfrac{[}{]}{0pt}{}{n}{k}_q = q^{n-1} q^{(k-1)(k-2)/2} \genfrac{[}{]}{0pt}{}{n-1}{k-1}_q + q^{k(k-1)/2} \genfrac{[}{]}{0pt}{}{n-1}{k}_q.

Cancelling q^{k(k-1)/2}, using that

(k-1)(k-2)/2 + k-1 = k(k-1)/2,

we obtain the required identity. \Box

For n \in \mathbb{N}_0, we define the quantum number [n]_q by [n]_q = 1 + q + \cdots + q^{n-1} and the quantum factorial [n]_q! by [n]_q! = [n]_q[n-1]_q \ldots [1]_q. Observe that specializing q to 1 these become n and n!, respectively. (See the section on characters below for the connection with the alternative convention in which quantum numbers are Laurent polynomials.) Note that in the following proposition, if k = 0 then both products are empty and so \genfrac{[}{]}{0pt}{}{n}{0}_q = 1 for all n \in \mathbb{N}_0.

Proposition. If n, k \in \mathbb{N}_0 then

\displaystyle\genfrac{[}{]}{0pt}{}{n}{k}_ q = \frac{[n]_q[n-1]_q \ldots [n-k+1]_q}{[k]_q!}.

Proof. Working by induction on n+k using the previous lemma we have

\begin{aligned} \genfrac{[}{]}{0pt}{}{n}{k}_q &=  q^{n-k}\genfrac{[}{]}{0pt}{}{n-1}{k-1}_q + \genfrac{[}{]}{0pt}{}{n-1}{k}_q \\ &= q^{n-k} \frac{[n-1]_q\ldots [n-k+1]_q}{[k-1]_q!} \\ &\qquad \qquad \qquad + \frac{[n-1]_q\ldots [n-k+1]_q[n-k]_q}{[k]_q!} \\&= \frac{[n-1]_q\ldots [n-k+1]_q}{[k]_q!} \bigl(  q^{n-k}[k]_q + [n-k]_q  \bigr) \\ &= \frac{[n-1]_q \ldots [n-k+1]_q}{[k]_q!} [n]_q \\ \end{aligned}

where we used the inductive hypothesis to go from line 1 to line 2 and then

\begin{aligned} q^{n-k}[k]_q + [n-k]_q &= q^{n-k}(1+q + \cdots + q^{k-1})  \\ & \qquad\qquad + 1 + q + \cdots + q^{n-k-1} \\ &= 1 + q + \cdots + q^{n-1} \\&= [n]_q \end{aligned}

for the final equality. \Box
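
As a computational check (the code is mine, not from the post), the q-Pascal rule in the lemma gives a direct way to compute q-binomial coefficients as lists of coefficients, constant term first.

-- polynomials in q as coefficient lists, constant term first
type Poly = [Integer]

addP :: Poly -> Poly -> Poly
addP xs [] = xs
addP [] ys = ys
addP (x : xs) (y : ys) = (x + y) : addP xs ys

-- multiplication by q^m shifts the coefficients up by m places
shiftP :: Int -> Poly -> Poly
shiftP m p = replicate m 0 ++ p

-- the q-binomial coefficient, for n, k >= 0, computed by the rule in the lemma
qBinom :: Int -> Int -> Poly
qBinom _ 0 = [1]
qBinom n k
  | k < 0 || n < 0 || k > n = []
  | otherwise = shiftP (n - k) (qBinom (n - 1) (k - 1)) `addP` qBinom (n - 1) k

-- ghci> qBinom 4 2
-- [1,1,2,1,1], i.e. 1 + q + 2q^2 + q^3 + q^4, agreeing with the example above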

Partitions in a box are enumerated by q-multibinomial coefficients

Given a multisubset M of \{0,1,\ldots, n-1\}, we again define its weight to be its sum of entries (now taken with multiplicities). Let

\left[\!\genfrac{[}{]}{0pt}{}{n}{k}\!\right]_q = \sum_M q^{\mathrm{wt}(M)}

be the corresponding enumerator, the sum being over all k-multisubsets M of \{0,1,\ldots, n-1\}. (The double brackets notation is not standard.) For example

\begin{aligned} \left[\!\genfrac{[}{]}{0pt}{}{3}{2}\!\right]_q  &= q^{0+0} + q^{0+1} + q^{0+2} + q^{1+1} + q^{1+2} + q^{2+2} \\ &= 1 + q + 2q^2 + q^3 + q^4  \\ &= \genfrac{[}{]}{0pt}{}{4}{2}_q\end{aligned}

and \left[\!\genfrac{[}{]}{0pt}{}{n}{0}\!\right]_q = 1 and

\left[\!\genfrac{[}{]}{0pt}{}{n}{1}\!\right]_q = q^0 + q^1 + \cdots + q^{n-1} = [n]_q = \genfrac{[}{]}{0pt}{}{n}{1}_q

for all n \in \mathbb{N}_0. This equality \left[\!\genfrac{[}{]}{0pt}{}{3}{2}\!\right]_q = \genfrac{[}{]}{0pt}{}{4}{2}_q is not a coincidence.

The q-stars and bars identity

The k-multisubsets enumerated by \left[\!\genfrac{[}{]}{0pt}{}{n}{k}\!\right]_q are in obvious bijection with sequences (j_1,\ldots, j_k) where 0 \le j_1 \le \ldots \le j_k \le n-1. In turn, such sequences are in bijection with k-subsets of \{0,1,\ldots, n+k-2\} by the map

(j_1,\ldots, j_k) \mapsto \{ j_1+0,j_2+1, \ldots, j_k+(k-1) \}.

The composite map takes a multisubset of weight w to a subset of weight w + k(k-1)/2. By the first section, q^{k(k-1)/2}\genfrac{[}{]}{0pt}{}{n+k-1}{k}_q enumerates k-subsets of \{0,1,\ldots, n+k-2\} by their weight. We therefore have a bijective proof that

\left[\!\genfrac{[}{]}{0pt}{}{n}{k}\!\right]_q = \genfrac{[}{]}{0pt}{}{n+k-1}{k}_q .\qquad(2)

This is the q-lift of (1), that \left(\!\binom{n}{k}\!\right) = \binom{n+k-1}{k}. It seems striking to me that this bijection is easier to describe than the stars and bars bijection, while proving a more general result.

The abacus interpretation

We digress to give another combinatorial interpretation of \left[\!\genfrac{[}{]}{0pt}{}{n}{k}\!\right]_q. Observe that a partition whose Young diagram is contained in the k \times (n-k) box is uniquely specified by the sequence 0 \le j_1 \le \ldots \le j_k \le n-k of x-coordinates of its k distinct upward steps. (See examples shortly below.) Moreover, since an upward step at x-coordinate j adds j boxes to the partition, the size of the partition is the weight of the multisubset \{j_1,\ldots, j_k\}. Therefore \genfrac{[}{]}{0pt}{}{n}{k}_q enumerates partitions in the k \times (n-k) box by their size. Using this partition setting, the map

(j_1,\ldots, j_k) \mapsto \{ j_1+0,j_2+1, \ldots, j_k+(k-1) \}

from weakly increasing sequences to subsets of \{0,1,\ldots, n-1 \} corresponds to recording a partition not by the x coordinates of its upward steps, but instead by the step numbers of its upward steps, numbering steps from 0.

Illustrative examples with k =4 and n-k = 5 of the partitions (4,3,3,2) and (5,4,1) are shown in the diagram below with step numbers in bold.

[Image: the partitions (4,3,3,2) and (5,4,1) in the 4 \times 5 box, with step numbers in bold, and the corresponding abaci below.]

Below each partition is the corresponding abacus: note that the step numbers of the upward steps are simply the bead positions on the abacus. This is related to Loehr’s beautiful labelled abaci model for antisymmetric functions.

Exercise. Use the abacus to give an alternative proof of the lemma, restated below

\genfrac{[}{]}{0pt}{}{n}{k}_q =  q^{n-k} \genfrac{[}{]}{0pt}{}{n-1}{k-1}_q + \genfrac{[}{]}{0pt}{}{n-1}{k}_q.

Perhaps surprisingly, this is not the unique q-lift of Pascal’s Rule. Use the abacus in a different way to prove that

\genfrac{[}{]}{0pt}{}{n}{k}_q = \genfrac{[}{]}{0pt}{}{n-1}{k-1}_q + q^k \genfrac{[}{]}{0pt}{}{n-1}{k}_q.

Exercise. Prove that

(1 + z)(1 + qz)(1+q^2 z) \ldots (1+q^{n-1}z) = \sum_{k=0}^n q^{k(k-1)/2} \genfrac{[}{]}{0pt}{}{n}{k}_q z^k.

This is one of many identities that generalize the usual Binomial Theorem.

Symmetric functions.

For k \in \mathbb{N}_0 let e_k and h_k be the elementary and complete symmetric functions of degree k in variables x_1,x_2,\ldots. Thus e_k is the sum of all products of k distinct variables and h_k is the sum of all products of k variables, repetitions allowed. More generally, let s_\lambda denote the Schur function labelled by the partition \lambda as defined combinatorially by

\displaystyle s_\lambda= \sum_{t \in \mathrm{SSYT}(\lambda)} x^t

where if the semistandard tableau t has c_i entries of i for each i \in \mathbb{N} then x^t = x_1^{c_1} x_2^{c_2} \ldots . It is a good basic exercise to show that e_k = s_{(1^k)} and h_k = s_{(k)}. It is immediate from our definition of q-binomial coefficients that

e_k(1,q,\ldots, q^{n-1}) = q^{k(k-1)/2} \genfrac{[}{]}{0pt}{}{n}{k}_q

and it is immediate from the q-stars and bars identity that

h_k(1,q,\ldots, q^{n-1}) = \left[\!\genfrac{[}{]}{0pt}{}{n}{k}\!\right]_q = \genfrac{[}{]}{0pt}{}{n+k-1}{k}_q

and so h_k(1,q,\ldots, q^{n-k}) = \genfrac{[}{]}{0pt}{}{n}{k}_q. The q-stars and bars identity becomes

h_k(1,q,\ldots, q^{n-1}) = q^{-k(k-1)/2} e_k(1,q,\ldots, q^{n+k-2}) \qquad(3)

All these identities are special cases of the result, almost immediate from the combinatorial definition of Schur functions, that s_\lambda(1,q,\ldots, q^{k-1}) enumerates semistandard tableaux with entries from \{0,1,\ldots, k-1\} by their weight, i.e. their sum of entries.

Characters of \mathrm{SL}_2(\mathbb{C})

The reason for bringing in symmetric functions is the following theorem which connects representations of \mathrm{SL}_2(\mathbb{C}) with plethysms of symmetric functions and hence enumeration of semistandard tableaux. Let \nabla^\lambda denote the Schur functor labelled by the partition \lambda; if E is a d-dimensional vector space then the trace of the diagonal matrix with entries \alpha_1, \ldots, \alpha_d acting on \nabla^\lambda(E) is s_\lambda(\alpha_1,\ldots, \alpha_d). It is an instructive exercise to verify from this property that \nabla^{(k)}E \cong \mathrm{Sym}^k E and \nabla^{(1^k)}E \cong \bigwedge^k E.

Theorem. Let E be the natural representation of \mathrm{SL}_2(\mathbb{C}). The representations \nabla^\lambda \mathrm{Sym}^\ell E and \nabla^\mu \mathrm{Sym}^m E are isomorphic if and only if the specialized Schur functions s_\lambda(1,q,\ldots, q^\ell) and s_\mu(1,q,\ldots, q^m) are equal up to a power of q.

Proof. See Theorem 3.4(b) and (d) in my joint paper with Rowena Paget. \Box

One reason for not giving full details at this point is that it enables me to pass lightly over one technicality. By our definition, the representation \mathrm{Sym}^\ell E of \mathrm{GL}_2(\mathbb{C}) has character

h_\ell(x_1,x_2) = x_1^\ell + x_1^{\ell-1}x_2 + \cdots + x_2^\ell.

(Or to be precise, the trace of the diagonal matrix with entries \alpha, \beta acting on \mathrm{Sym}^\ell E is this polynomial evaluated at x_1 = \alpha, x_2 = \beta.) Restricting to \mathrm{SL}_2(\mathbb{C}), we should impose the relation x_1x_2 = 1, meaning that if x_1 = Q then x_2 = Q^{-1}, and so we work most naturally with Laurent polynomials in Q. (This is essential since unequal characters of \mathrm{GL}_2(\mathbb{C}) may become equal on restriction to \mathrm{SL}_2(\mathbb{C}). In fact this happens if and only if they differ by a power of the determinant representation.) Thus as a representation of \mathrm{SL}_2(\mathbb{C}), \mathrm{Sym}^\ell\! E has character

s_\ell (Q,Q^{-1}) = Q^\ell + Q^{\ell-2} + \cdots + Q^{-\ell}

and more generally, \nabla^\lambda \mathrm{Sym}^\ell E has character

s_\lambda(Q^{\ell}, Q^{\ell-2}, \ldots, Q^{-\ell}) = (s_\lambda \circ s_\ell)(Q, Q^{-1})

where \circ is the plethysm product. (This is the main idea needed to prove the theorem.) One can always pass from a Laurent polynomial identity in \mathbb{C}[Q,Q^{-1}] to a polynomial identity in \mathbb{C}[q] by setting q = Q^2 and multiplying through by a suitable power of Q, so it is possible to work in either setting.

Exercise. Give as many proofs as you can that the q-binomial coefficient \genfrac{[}{]}{0pt}{}{n}{k}_q is centrally symmetric; that is, writing [q^a]f(q) for the coefficient of q^a in the polynomial f(q), we have

[q^r] \genfrac{[}{]}{0pt}{}{n}{k}_q = [q^{k(n-k)-r}] \genfrac{[}{]}{0pt}{}{n}{k}_q

for all r \in \mathbb{N}_0. [Hint: there is a one-line proof using Q-characters. For a proof at the level of polynomials, show that the reverse of a polynomial f(q) of degree c is q^c f(q^{-1}), and that the palindromic property is preserved by multiplication and division, and then apply this to the formula in the proposition.]

It is worth noting that the sum of centrally symmetric polynomials (as opposed to Laurent series) is centrally symmetric if and only if they have the same degree. This appears to be one obstacle to giving a representation theory interpretation of identities such as the q-lift of Pascal’s Rule in the first lemma.

Exercise. Use the representation theory of \mathrm{SL}_2(\mathbb{C}) to show that the q-binomial coefficients \genfrac{[}{]}{0pt}{}{n}{k}_q are unimodal; that is, the coefficients increase up to the middle degree(s) and then decrease.

The Wronskian isomorphism.

Taking \lambda = (1^k) and \mu = (k) in the theorem, we obtain that the representations \bigwedge^k \mathrm{Sym}^\ell E and \mathrm{Sym}^k \mathrm{Sym}^m E of \mathrm{SL}_2(\mathbb{C}) are isomorphic if and only if e_k(1,q,\ldots, q^{\ell}) = q^{k(k-1)/2}\genfrac{[}{]}{0pt}{}{\ell+1}{k}_q and h_k(1,q,\ldots, q^{m}) = \genfrac{[}{]}{0pt}{}{m+k}{k}_q are equal up to a power of q. (These equalities were proved in the previous subsection.) We therefore set \ell = n + k - 2 and m = n - 1 and obtain the Wronskian isomorphism.

\bigwedge^k \mathrm{Sym}^{n+k-2} E \cong \mathrm{Sym}^k \mathrm{Sym}^{n-1} E \qquad (4)

of representations of \mathrm{SL}_2(\mathbb{C}), lifting the q-stars and bars identity \left[\!\genfrac{[}{]}{0pt}{}{n}{k}\!\right]_q = \genfrac{[}{]}{0pt}{}{n+k-1}{k}_q to the level of representations. Instead, going down to the level of ordinary binomial coefficients, by setting q=1, we obtain \left(\!\binom{n}{k}\!\right) = \binom{n+k-1}{k} which is where we started. This is however only the start of the story: in my joint paper with Eoghan McDowell, we showed that the modified Wronskian isomorphism

\bigwedge^k \mathrm{Sym}^{n+k-2}\! E \cong \mathrm{Sym}_k \mathrm{Sym}^{n-1} \! E \qquad(5)

in which the symmetric power \mathrm{Sym}^k \! E (defined as a quotient of E^{\otimes k}) is replaced with its dual \mathrm{Sym}_k E (defined as a submodule of E^{\otimes k}), holds as an isomorphism of representations of \mathrm{SL}_2(\mathbb{F}) for an arbitrary field \mathbb{F}. We do this by defining an explicit map from the right-hand side to the left-hand side. This is the most refined lift I know to date of stars and bars.

Exercise. Let E = \langle X, Y \rangle_{\mathbb{F}_2} be a 2-dimensional vector space over \mathbb{F}_2. Show that, regarded as a representation of \mathrm{GL}_2(\mathbb{F}_2), \mathrm{Sym}^2 \! E = \langle X^2, Y^2, XY \rangle has \langle X^2, Y^2\rangle as an irreducible 2-dimensional submodule with quotient isomorphic to the determinant representation and, dually,

\mathrm{Sym}_2 E = \langle X\otimes X, Y \otimes Y, X \otimes Y + Y \otimes X \rangle

has \langle X  \otimes Y + Y \otimes X \rangle as a one-dimensional submodule isomorphic to the determinant representation, with quotient the irreducible 2-dimensional representation already seen.

A proof of the Wronskian isomorphism after Darij Grinberg

For a very elegant proof of the Wronskian isomorphism, working still over a field \mathbb{F} of arbitrary characteristic, see these notes of Darij Grinberg, written in response to my joint paper. Here I will give an outline of Grinberg’s very beautiful construction.

Setup

Let E = \langle X, Y \rangle_\mathbb{F} as above. Then \mathrm{Sym}^{n-1} E can be thought of as the space of homogeneous polynomials of degree n-1 in X and Y; it is the degree n-1 component of the polynomial algebra

\mathbb{F}[X,Y] = \bigoplus_{r \ge 0} \mathrm{Sym}^r \! E.

(It may be helpful to remember the alternative notation \mathrm{Sym}^\bullet \! E for either side.) Thus we can identify \mathrm{Sym}_k \mathrm{Sym}^{n-1} E with a subspace of \mathrm{Sym}_k \mathbb{F}[X,Y]. Since, given any vector space V, by definition \mathrm{Sym}_k V is the subspace of V^{\otimes k} of symmetric tensors, we can, in turn, identify \mathrm{Sym}_k \mathrm{Sym}^{n-1} E with the subspace of \bigl( \mathbb{F}[X,Y]\bigr)^{\otimes k} spanned by the symmetrisation of all tensors

X^{n-1-j_1}Y^{j_1} \otimes \cdots \otimes X^{n-1-j_k}Y^{j_k}

for 0 \le j_1 \le j_2 \le \ldots \le j_k \le n-1. (Note that this symmetrization should be performed without introducing scalar factors, giving a basis X \otimes X, X \otimes Y + Y \otimes X, Y \otimes Y as in the exercise above when k=2 and n=2.) Finally we use that this tensor product is isomorphic to \mathbb{F}[X_1,Y_1,\ldots,X_k,Y_k] by the very explicit isomorphism defined on a spanning set by

f_1(X,Y) \otimes \cdots \otimes f_k(X,Y) \mapsto f_1(X_1,Y_1)\ldots f_k(X_k,Y_k).

The image of \mathrm{Sym}_k \mathrm{Sym}^{n-1} E is all polynomials in \mathbb{F}[X_1,Y_1,\ldots,X_k,Y_k] of homogeneous degree n-1 with respect to each pair of variables X_i, Y_i that are symmetric with respect to permutation of indices. It might at first seem a little strange that the non-commutative tensor product can be represented in a commutative polynomial algebra, but I think any confusion is resolved when one realises that the indices of the X_i, Y_i serve to identify which tensor factor each variable comes from.

This chain of maps gives the left-hand column of the commutative diagram below.

[Image: the commutative diagram relating the left-hand and right-hand chains of maps and the maps \psi and \tilde{\psi}.]

The right-hand column is similar, and in fact a little easier to think about, as the map \bigwedge^k V \rightarrow V^{\otimes k} can be defined very simply by

v_1 \wedge \ldots \wedge v_k \mapsto \sum_{\sigma \in \mathfrak{S}_k} \mathrm{sgn}(\sigma) v_{1\sigma^{-1}} \otimes \cdots \otimes v_{k\sigma^{-1}}

without any worries about introducing scalar factors. The image of \bigwedge^k \mathrm{Sym}^{n+k-2} E is all polynomials in \mathbb{F}[X_1,Y_1,\ldots,X_k,Y_k] of homogeneous degree n+k-2 with respect to each pair of variables X_i, Y_i that are antisymmetric with respect to permutation of indices.

Antisymmetrization and Grinberg’s map

Let g \cdot \sigma denote the action of \sigma \in \mathfrak{S}_k on the indices of a polynomial g \in \mathbb{F}[X_1,Y_1,\ldots, X_k, Y_k]. This corresponds to position permutation of tensor factors, as in the displayed equation ending the previous section. Hence the image of the composite map from \bigwedge^k \mathbb{F}[X,Y] to \mathbb{F}[X_1,Y_1,\ldots, X_k, Y_k] is spanned by the polynomials

\sum_{\sigma \in \mathfrak{S}_k} \mathrm{sgn}(\sigma) g \cdot \sigma.

Denote this antisymmetric polynomial by \mathrm{alt}(g). Observe that if we apply \mathrm{alt} to a polynomial that is already antisymmetric (for instance, one in the image of the right-hand column maps) then we multiply by k!. Thus \mathrm{alt} is a pseudo-projection.

Let \psi : \mathbb{F}[X_1,Y_1,\ldots,X_k,Y_k] \rightarrow \mathbb{F}[X_1,Y_1,\ldots,X_k,Y_k] be the linear map defined by

\psi(g) = \mathrm{alt}\bigl( g (X_1^{k-1}) (X_2^{k-2}Y_2) \ldots (X_{k-1}Y_{k-1}^{k-2}) (Y_k^{k-1}) \bigr)

for g \in \mathbb{F}[X_1,Y_1,\ldots, X_k, Y_k].

Observe that if g is homogeneous of degree n-1 in each pair X_i, Y_i (as for polynomials in the image of the left-hand column) then the multiplication boosts the homogeneous degree of g from n-1 to n+k-2. Since we then antisymmetrize, the composite map going down the left-hand column and then across has image in the subspace of \mathbb{F}[X_1,Y_1,\ldots,X_k,Y_k] defined by going down the right-hand column. Moreover, by the key identity

\mathrm{alt}(f h) = f \mathrm{alt}(h)

for polynomials f symmetric under index permutation, it follows that the restriction of \psi to the subspace of such polynomials is defined by multiplication by

\mathrm{alt}\bigl( (X_1^{k-1}) (X_2^{k-2}Y_2) \ldots (X_{k-1}Y_{k-1}^{k-2}) (Y_k^{k-1}) \bigr).

This element can be rewritten as

\prod_{1 \le i < j \le k} (X_iY_j - X_jY_i)

and so affords the power \mathrm{det}^{k(k-1)/2} of the determinant representation of \mathrm{GL}_2(\mathbb{F}), which is of course the trivial representation of \mathrm{SL}_2(\mathbb{F}). It follows that \psi induces an \mathrm{SL}_2(\mathbb{F})-equivariant map, denoted \tilde{\psi} in the commutative diagram above. Because multiplication in the integral domain \mathbb{F}[X_1,Y_1,\ldots,X_k,Y_k] is injective, \tilde{\psi} is injective. By dimension counting it is surjective, and we are done.
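
As a sanity check (mine, not Grinberg's), when k = 2 the multiplier is

\mathrm{alt}(X_1 Y_2) = X_1 Y_2 - X_2 Y_1,

which is the single factor in the product above, and which visibly transforms by the determinant under the action of \mathrm{GL}_2(\mathbb{F}) on each pair of variables.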

In fact \tilde{\psi} is precisely the map defined in my joint paper, but described in a much more transparent way, which of course Grinberg exploits in the proof, as I have attempted to show above. See Section 0.1 of his paper for details.


Notes on the Wedderburn decomposition

September 10, 2024

Let G be a finite group and let \mathrm{Irr}(G) denote the set of irreducible representations of G up to isomorphism. Wedderburn’s Theorem is that there is an algebra isomorphism

\mathbb{C}[G] \cong \bigoplus_{V \in \mathrm{Irr}(G)} \mathrm{Mat}_{\dim V}(\mathbb{C}).

where \mathbb{C}[G] is the complex group algebra of G. Considering the modules for the right-hand side, we see that if V is a simple \mathbb{C}[G]-module then \mathbb{C}[G] acts on V as a full matrix algebra, and the kernel of the action map is the sum of the other block algebras. Thus Wedderburn’s Theorem implies Jacobson’s Density Theorem for simple \mathbb{C}[G]-modules, namely that the matrices representing the action of G on a simple \mathbb{C}[G]-module V span \mathrm{Mat}_{\dim V}(\mathbb{C}).

This corollary, that irreducible representations of finite groups are simply matrix algebras acting on column vectors, often strikes me as a little surprising: are irreducible representations really so boring? I think the answer is ‘yes they are, from the ring-theoretic perspective’. Consider that many of the interesting problems in representation theory concern the decomposition of big representations into irreducibles: for instance, for the symmetric group, Foulkes’ Conjecture in characteristic zero, or the decomposition matrix problem in prime characteristic. For such problems we care very much about the particular label of the irreducible representations (these come from the group structure), not just their dimension (which we have seen is entirely ring-theoretic information).

In this post I shall give three proofs of Wedderburn’s Theorem, with critical commentary. All three proofs need the following basic result from character theory.

Decomposing the regular character of G

By character orthogonality, the regular character of \mathbb{C}[G] defined by \pi(g) = |G|[g = \mathrm{id}_G] decomposes as \pi = \sum_{V \in \mathrm{Irr}(G)} (\dim V) \chi_V. Hence as a left \mathbb{C}[G]-module,

\mathbb{C}[G] \cong \bigoplus_{V \in \mathrm{Irr}(G)}  V^{\oplus \dim V} \qquad (\star)

and in particular

|G| = \sum_{V \in \mathrm{Irr}(G)} (\dim V)^2. \qquad (\dagger)

The inversion theorem proof

Let \rho_V : G \rightarrow \mathrm{GL}(V) for V \in \mathrm{Irr}(G) be the irreducible representations of G. We extend the domain of each \rho_V to \mathbb{C}[G] by linearity, so now \rho_V : \mathbb{C}[G] \rightarrow \mathrm{End}(V) is an algebra homomorphism. Identifying \mathrm{End}(V) with \mathrm{Mat}_{\dim V}(\mathbb{C}) by a fixed choice of basis, we obtain a map

\mathbb{C}[G] \rightarrow \bigoplus_{V \in \mathrm{Irr}(G)} \mathrm{Mat}_{\dim V}(\mathbb{C}).

We must show it is an algebra isomorphism. By dimension counting using (\dagger) it suffices to show that it is injective. Thus we must show that each x \in \mathbb{C}[G] is determined by its actions \rho_V(x) on a full set of irreducible representations. For this, we use the following inversion formula:

x = \sum_{V \in \mathrm{Irr}(G)} \frac{\chi_V(1)}{|G|} \sum_{g \in G} \mathrm{tr} \rho_V(g^{-1} x) g.

To see this is true, it suffices, by linearity, to prove it when x is a group element. In this case \mathrm{tr} \rho_V(g^{-1}x) = \chi_V(g^{-1} x) and the formula follows from the fact that \sum_{V \in \mathrm{Irr}(G)} \frac{\chi_V(1)}{|G|} \chi_V(g^{-1}x) is 1 if g is x, and 0 otherwise. This fact is just column orthogonality between the column of g^{-1}x and the column of the identity element of G.

Critical remarks

The obvious criticism is: ‘where did the inversion formula come from?’ One way to reconstruct it is to use that the central primitive idempotent for the irreducible representation V is

e_V = \frac{\chi_V(1)}{|G|} \sum_{g \in G} \chi_V(g^{-1}) g.

We remark that e_V is clearly in the centre of \mathbb{C}[G] because its coefficients are constant on conjugacy classes; moreover, it is an easy calculation using row orthogonality to show that e_V e_W = 0 if V \not\cong W and e_V^2 = e_V. These properties can also be used to deduce the formula if one has forgotten it.
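
For a tiny worked example (my own), take G = \langle g \rangle of order 2. The trivial and sign characters give e_{\mathrm{triv}} = \frac{1}{2}(1 + g) and e_{\mathrm{sgn}} = \frac{1}{2}(1 - g), and indeed e_{\mathrm{triv}}^2 = \frac{1}{4}(1 + 2g + g^2) = \frac{1}{2}(1 + g) = e_{\mathrm{triv}} and e_{\mathrm{triv}} e_{\mathrm{sgn}} = \frac{1}{4}(1 - g^2) = 0.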

Now suppose that x \in \mathbb{C}[G] is in the pre-image of the Wedderburn block \mathrm{Mat}_{\dim V}(\mathbb{C}), so x acts as zero on every irreducible representation other than V. We then have x e_V = x and so the coefficient x(g) of g in x is the coefficient of g in x e_V, namely

\begin{aligned} \sum_k x(k) e_V(k^{-1} g) &= \sum_k x(k) \chi_V(1)/|G| \chi_V(g^{-1} k) \\ &= \frac{\chi_V(1)}{|G|} \chi_V(g^{-1} \sum_k x(k) k )  \\ &= \frac{\chi_V(1)}{|G|} \chi_V(g^{-1} x).\end{aligned}

Thus x = \sum_{g \in G} \frac{\chi_V(1)}{|G|} \chi_V(g^{-1} x) g. (This formula is useful also in modular representation theory: see my blog post Modular representations that look ordinary.) We have now proved the inversion formula in the special case; summing over all irreducible representations V gives it in general. Intuitively, borrowing Kac’s phrase ‘one can hear [from its action on irreducible representations] the shape of an element of a semisimple group algebra’.

A less obvious criticism, but one that should be made, is that the proof (particularly when read with the intuition above) can easily be misunderstood as proving more than is possible. The subtle point is that when we are given the image of x under the Wedderburn isomorphism, we get a set of |\mathrm{Irr}(G)| block matrices, not the corresponding labels of irreducible representations. But since the real point is to prove injectivity, not to show that this ‘abstract’ decomposition determines the action of x on each irreducible representation (which of course is false whenever there are two irreducible representations of the same dimension), we can assign the labels as we see fit.

If G is a finite abelian group then the inversion formula states that if x : G \rightarrow \mathbb{C} is any function then

x(g) = \frac{1}{|G|} \sum_{\chi \in \widehat{G}}  \chi(g^{-1})\chi(x)

where we write \widehat{G} rather than \mathrm{Irr}(G) for the character group of G. (To deduce this from the inversion formula, just identify x with the corresponding element of the group algebra, namely \sum_{g \in G} x(g)g.) This can be interpreted as Fourier inversion, reading \chi(x) as `the Fourier coefficient of x at the irreducible \chi‘. For example, if G = \mathbb{Z}/N\mathbb{Z} then, setting \zeta = \exp (2 \pi i /N) we get

x(m) = \frac{1}{N} \sum_{t = 0}^{N-1} \zeta^{- m t} \chi_t(x)

where \chi_t(x) = \sum_{r = 0}^{N-1} x(r) \zeta^{r t}. The very first post in this blog, from September 2010, has a bit more on this perspective.
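
The cyclic case is easy to play with in code; the sketch below is mine (the names are not from the post) and just transcribes the two displayed formulas with N = 8.

import Data.Complex

modulus :: Int
modulus = 8

-- the character chi_t, sending r to zeta^(r t) with zeta = exp(2 pi i / N)
chi :: Int -> Int -> Complex Double
chi t r = cis (2 * pi * fromIntegral (r * t) / fromIntegral modulus)

-- the Fourier coefficient chi_t(x) = sum_r x(r) zeta^(r t)
fourierCoeff :: (Int -> Complex Double) -> Int -> Complex Double
fourierCoeff x t = sum [x r * chi t r | r <- [0 .. modulus - 1]]

-- the inversion formula x(m) = (1/N) sum_t zeta^(-m t) chi_t(x)
invert :: (Int -> Complex Double) -> Int -> Complex Double
invert x m = sum [chi t (-m) * fourierCoeff x t | t <- [0 .. modulus - 1]]
               / fromIntegral modulus

-- ghci> invert (\r -> fromIntegral (r * r)) 3
-- recovers x(3) = 9 up to floating point error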

The Schur’s Lemma proof

By Schur’s lemma applied to (\star) we get

\begin{aligned} \mathrm{End}_{\mathbb{C}[G]} \mathbb{C}[G] &\cong \bigoplus_{V \in \mathrm{Irr}(G)} \mathrm{End}_{\mathbb{C}[G]} (V^{\oplus \dim V})  \\ &\cong \bigoplus_{V \in \mathrm{Irr}(G)} \mathrm{Mat}_{\dim V}(\mathbb{C})\end{aligned}

To illustrate the second isomorphism, consider the case \dim V = 2 and let v \in V be any non-zero vector. Let V' be isomorphic to V by an isomorphism sending v to v'. Clearly any \mathbb{C}[G]-endomorphism of V \oplus V' is determined by its effect on v and v', and less clearly, but I hope you’ll see that it follows from yet more Schur’s Lemma, it is of the form v \mapsto \alpha v + \gamma v' and v' \mapsto \beta v + \delta v' for some \alpha,\beta,\gamma,\delta \in \mathbb{C}, corresponding to the matrix

\left( \begin{matrix} \alpha & \gamma \\ \beta & \delta \end{matrix} \right).

To complete the proof we use that any \mathbb{C}[G]-endomorphism of \mathbb{C}[G] is of the form x \mapsto xa for a unique a \in \mathbb{C}[G] (determined as the image of \mathrm{id}_G) and so \mathrm{End}_{\mathbb{C}[G]}\mathbb{C}[G] \cong \mathbb{C}[G]^{\mathrm{op}}, the opposite algebra to \mathbb{C}[G]. This gives the Wedderburn isomorphism, but with an inconvenient ‘opposite’, which can be removed by noting that X \mapsto X^{\mathrm{tr}} is an isomorphism between \mathrm{Mat}_d(\mathbb{C}) and \mathrm{Mat}_d(\mathbb{C})^{\mathrm{op}}.

Critical remarks

The strength of this proof to my mind is that it has just one idea: ‘look at the endomorphism algebra of \mathbb{C}[G] as a module for itself’. After that, it’s just the remorseless application of Schur’s Lemma. The one idea it requires is well motivated by experience with computing with representations. For instance, one very important link between character theory and the theory of permutation groups is that if \mathbb{C}[\Omega] is the permutation representation of G acting on a set \Omega, with canonical basis \{ e_\omega : \omega \in \Omega \}, then \dim \mathrm{End}_{\mathbb{C}[G]}\mathbb{C}[\Omega] = 2 and

\mathbb{C}[\Omega] =  \bigl\langle \sum_{\omega \in \Omega} e_\omega \bigr\rangle \oplus V

for an irreducible representation V if and only if the action of G on \Omega is 2-transitive.

The weaknesses are, in my view: (1) it meets the criterion to be a ‘one shot’ proof, but the fuss with the opposite algebra lengthens the proof; (2) the map \mathbb{C}[G] \rightarrow \mathrm{Mat}_{\dim V}(\mathbb{C}) it gives comes not from G acting on V (as we expected) but from G acting on V^{\oplus \dim V}, in a way that commutes with the diagonal left action of \mathbb{C}[G]. Taken together (1) and (2) make the isomorphism far less explicit than need be the case.

Digression on Yoneda’s Lemma

The appearance of the opposite algebra can be predicted: I’d like to digress briefly to show that it’s just a manifestation of the ‘opposite’ that appears in Yoneda’s Lemma. Stated for covariant functors, a special case of Yoneda’s Lemma states that natural transformations between the representable functors \mathrm{Hom}_\mathcal{C}(x, -) and \mathrm{Hom}_\mathcal{C}(y, -) are in bijection with \mathrm{Hom}_\mathcal{C}(y, x); given such a morphism g_{yx} : y \rightarrow x, the map \mathrm{Hom}_\mathcal{C}(x, -) \rightarrow \mathrm{Hom}_\mathcal{C}(y, -) is defined by f \mapsto f \circ g_{yx}, composing from right-to-left. (What other source of natural maps between these hom-spaces could there possibly be beyond morphisms from y to x?) This makes the Yoneda embedding x \mapsto \mathrm{Hom}_\mathcal{C}(x, -) a contravariant functor from \mathcal{C} to the functor category \mathbf{Set}^{\mathcal{C}}. (Slogan: ‘pre-composition is contravariant’.)

Cayley’s Theorem is the special case where \mathcal{C} is a group (a category with one object and all morphisms invertible): the Yoneda embedding is then the map G \rightarrow \mathrm{Sym}_G defined by g \mapsto \theta_g where \theta_g(x) = xg. Note that g' g which may be read as ‘do g then g'‘ maps to \theta_{g g'} = \theta_{g'} \circ \theta_{g} — we check:

\begin{aligned} (\theta_{g'} \circ \theta_{g})(x) &= \theta_{g'} \bigl( \theta_g(x) \bigr)  \\ & \quad =  \theta_{g'} (xg) = (xg)g' = x(gg') = \theta_{g g'}(x). \end{aligned}

Thus the Yoneda embedding is not a group homomorphism, but becomes one if we regard the codomain as \mathrm{Sym}_G^{\mathrm{op}}. The proof above that \mathrm{End}_{\mathbb{C}[G]}\mathbb{C}[G] \cong \mathbb{C}[G]^{\mathrm{op}} is almost the same ‘dancing around Yoneda’ but this time with a version of Yoneda’s Lemma for categories enriched over \mathbb{C}-vector spaces.

Incidentally, the Yoneda embedding becomes covariant if we instead take natural transformations from \mathrm{Hom}_\mathcal{C}(-, x) to \mathrm{Hom}_\mathcal{C}(-, y) (‘post-composition is covariant’), but the ‘opposite’ pops up in another place, as these hom-into functors are contravariant, or, in the more modern language, ordinary covariant functors but defined on \mathcal{C}^\mathrm{op}.

Finally, Yoneda’s Lemma is often stated as

\mathrm{Nat}(\mathrm{Hom}_\mathcal{C}(x, -), F) \cong F(x)

‘natural transformations from representable functors are representable’; observe that when F = \mathrm{Hom}_\mathcal{C}(y, -), we have F(x) = \mathrm{Hom}_\mathcal{C}(y,x), so this is indeed a generalization of the result as I stated it.
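
In Haskell the covariant form of the lemma for the category of types has a pleasingly short rendering (this is a standard illustration of my own choosing, not anything specific to the post): a natural transformation out of Hom(a, -), i.e. out of the functor (a -> -), is the same thing as an element of f a.

{-# LANGUAGE RankNTypes #-}

-- Nat(Hom(a, -), f) corresponds to forall b. (a -> b) -> f b
toYoneda :: Functor f => f a -> (forall b. (a -> b) -> f b)
toYoneda x = \g -> fmap g x

fromYoneda :: (forall b. (a -> b) -> f b) -> f a
fromYoneda t = t id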

The proof via density

In this ‘two shot’ proof we first prove Jacobson’s Density Theorem for \mathbb{C}[G]-modules. The proof below closely follows Tommaso Scognamiglio’s elegant proof on MathStackExchange. As motivation, recall that a representation of a finite group is irreducible if and only if it is cyclic, generated by any non-zero vector. Given this, to prove density, it is very natural to show that \mathrm{End}(V) is an irreducible representation: but since (supposing the conclusion) we know that as a left \mathbb{C}[G]-module

\mathrm{End}(V) \cong V\, \oplus \stackrel{\dim V}{\cdots}\! \oplus\, V

is not irreducible, we are going to have to consider a bigger group.

Density

There is a canonical \mathbb{C}-linear isomorphism \mathrm{End}(V) \cong V \otimes V^\star under which v \otimes \theta maps to the rank 1 endomorphism defined by x \mapsto v \theta(x). Recall that V^\star is a representation of G (acting on the left as usual) by \rho_{V^\star}(g) \theta = \theta \circ \rho_V(g^{-1}). The matrix giving the action of g is then \rho_V(g^{-1})^\mathrm{tr}, the transpose appearing because if \theta_1, \ldots, \theta_n is the dual basis to v_1, \ldots, v_n then from

\rho_V(g^{-1}) v_k = \sum_{j=1}^n \rho_V(g^{-1})_{jk} v_j \qquad (\ddagger)

we get (\theta_j \circ \rho_V(g^{-1}))(v_k) = \rho_V(g^{-1})_{jk}, and so \rho_{V^\star}(g)\theta_j  = \sum_k \rho_V(g^{-1})_{jk} \theta_k, which then requires a transpose to agree with the matrix convention in (\ddagger). Thus V \otimes V^\star is a representation of G \times G with action defined by linear extension of

(f, h)(v \otimes \theta) = \rho_V(f) v \otimes \rho_{V^\star}(h)\theta

in which (f, h) \in G \times G acts with matrix \rho_V(f) \otimes \rho_V(h^{-1})^\mathrm{tr}. In particular \chi_{\mathrm{End}(V)} (f, h) = \chi_V(f) \chi_V(h^{-1}) = \chi_V(f) \overline{\chi_V(h)}. Hence

\begin{aligned} \langle \chi_{\mathrm{End}(V)}, \chi_{\mathrm{End}(V)} \rangle_{G \times G}  &= \frac{1}{|G|^2} \sum_{(f,h) \in G \times G} |\chi_V(f)|^2 |\chi_V(h)|^2   \\ &= \bigl( \frac{1}{|G|} \sum_{g \in G} |\chi_V(g)|^2 \bigr)^2  \\ &= 1. \end{aligned}

showing that \chi_{\mathrm{End}(V)} is an irreducible character and \mathrm{End}(V) is an irreducible representation of G \times G, and so (as remarked above) generated by any non-zero element. In particular, the image of the identity \mathrm{id}_V under the action of G \times G spans \mathrm{End}(V). Since, writing \cdot for the module action,

\begin{aligned} \bigl( (f, h) \cdot \mathrm{id}_V \bigr) v &= \bigl( \rho_V(f)\, \mathrm{id}_V\, \rho_V(h^{-1}) \bigr) v  \\ &\quad = \rho_V(f) \rho_V(h^{-1}) v = \rho_V(fh^{-1}) v \end{aligned}

this span is the same as the span of all \rho_V(g) for g \in G, as required for density.

Proof of Wedderburn’s Theorem

By density, for each irreducible representation V \in \mathrm{Irr}(G), the induced map \rho_V : \mathbb{C}[G] \rightarrow \mathrm{Mat}_{\dim V}(\mathbb{C}) is surjective. Hence

\bigoplus_{V \in \mathrm{Irr}(G)} \rho_V : \mathbb{C}[G] \rightarrow \bigoplus_{V\in \mathrm{Irr}(G)}  \mathrm{Mat}_{\dim V}(\mathbb{C})

is a surjective algebra homomorphism. But by (\dagger), the dimensions on each side are |G|. Hence this map is an algebra isomorphism, as required.

Critical remarks

A strength of this proof is that you can swap in your own preferred proof of the density theorem. For instance, to prove the generalization to group algebras over an algebraically closed field \mathbb{F} of prime characteristic that

\frac{\mathbb{F}[G]}{\mathrm{rad}\ \mathbb{F}[G]} \cong \bigoplus_{V \in \mathrm{Irr}(G)} \mathrm{Mat}_{\dim V}(\mathbb{F})

one can instead prove the density theorem for irreducible representations over an arbitrary field, and deduce that the map is injective because (either by definition, or an easy equivalent to the definition) an element of \mathbb{F}[G] acting as zero on every irreducible \mathbb{F}[G]-module lies in \mathrm{rad}\ \mathbb{F}[G]. The only disadvantage to my mind is that it’s less suitable for beginners: Jacobson’s Density Theorem is not an obvious result, and although I like Scognamiglio’s proof very much, the use of the G \times G action (on the left, but with one factor twisted over to the right, hence the fiddliness with transposition) could be off-putting.

Why ‘density’? A unitary example

The Weyl algebra \langle \hat{x}, \hat{p} : \hat{x}\hat{p}  - \hat{p}\hat{x}  = i \hbar \rangle models the position and momentum operators in quantum mechanics. The relation above is the algebraic formulation of the uncertainty principle that there can be no simultaneous eigenstate for both the position \hat{x} and momentum \hat{p} operators. In the canonical interpretation of (time independent) wave functions as elements of L^2(\mathbb{R}), i.e. square integrable complex valued functions on \mathbb{R}, there is a representation

\begin{aligned} \hat{x} \psi(x) &= x \psi(x) \\ \hat{p} \psi(x)  &= -i\hbar \frac{\mathrm{d} \psi(x)}{\mathrm{d} x}\end{aligned}

defined on the dense subspace of L^2(\mathbb{R}) of smooth functions in which the Weyl algebra generators act as self-adjoint operators. Exponentiating these operators we obtain \exp(i t \hat{x}) and \exp(i t \hat{p}) defined by

\begin{aligned} \exp(i t \hat{x}) \psi(x) &= \exp(itx) \psi(x) \\ \exp(i t \hat{p}) \psi(x) &= \exp (\hbar t \frac{\mathrm{d}}{\mathrm{d}x})\psi(x) =  \psi(x + \hbar t) \end{aligned}

which generate an irreducible unitary representation of the Heisenberg group by multiplication and translation operators. (The second equation is essentially Taylor’s Theorem, and any worries about its applicability can be assuaged by muttering ‘dense subspace’.) The Stone–von Neumann theorem is that any irreducible unitary representation of the Heisenberg group on which the centre acts in the same way is unitarily equivalent to this representation. An excellent question on MathStackexchange asks whether these maps generate a dense subalgebra of \mathrm{End}(L^2(\mathbb{R})), suitably interpreted. The answer, perhaps surprisingly, is ‘no’ with respect to the operator norm, but ‘yes’ in the strong or weak operator topology; the proof of this appeals to von Neumann’s double commutant theorem, which is in turn a generalization of the density theorem.

Automorphisms

Suppose that \tau is an algebra automorphism of \mathrm{Mat}_d(\mathbb{C}). Let V be the natural d-dimensional irreducible representation. Then twisting the action by \tau we get a new representation V^\tau in which \mathrm{Mat}_d(\mathbb{C}) acts by X \cdot v = X^\tau v. Since \mathrm{Mat}_d(\mathbb{C}) has a unique irreducible representation up to isomorphism, we have V^\tau \cong V. Hence, by definition of isomorphism of representations, there exists an invertible matrix T such that X^\tau = T^{-1}X T for all X \in \mathrm{Mat}_d(\mathbb{C}). Therefore \tau is conjugation by T. We have proved the special case of the Skolem–Noether theorem that all automorphisms of \mathrm{Mat}_d(\mathbb{C}) are inner, and so the automorphism group is isomorphic to \mathrm{PGL}_d(\mathbb{C}). I am grateful to Mariano for pointing out to me that the automorphism group of \mathbb{C}[G] is more complicated whenever there are several blocks of the same dimension.


Eigenvalues of the Kneser graph

June 8, 2023

A few days ago Ehud Friedgut gave a beautiful talk in the Bristol Combinatorics Seminar, titled ‘The most complicated proof ever of Mantel’s theorem, and other applications of the representation theory of the symmetric group‘ and based on joint work with Yuval Filmus, Nathan Lindzey and Yuval Wigderson. As the title promised, he used the representation theory of the symmetric group to prove Mantel’s Theorem, that if a graph on 2m vertices has more edges than the complete bipartite graph with bipartition into two parts of size m, then it contains a triangle. He illustrated the method by instead proving the Erdős–Ko–Rado Theorem, that the maximum size of a family of r-subsets of \{1,\ldots, n\} such that any two members of the family have a non-trivial intersection is \binom{n-1}{r-1}.

The first purpose of this post is to outline the new proof (as I understand it from Ehud’s talk) of this result, dropping in my own proof of one of the lemmas he used. Let me emphasise that — apart from the minor proof of the lemma, which in any case uses results of James — no originality is claimed for anything in this part. I then show that the methods appear (modulo the proof of the claim below) to adapt to prove the analogue of the Erdős–Ko–Rado Theorem for vector spaces.

A graph for the Erdős–Ko–Rado Theorem

The graph G we need to prove the Erdős–Ko–Rado Theorem is the Kneser graph, having as its vertices the r-subsets of \{1,\ldots, n\}. Two subsets X and Y are connected by an edge if and only if X \cap Y = \varnothing. The Erdős–Ko–Rado Theorem is then the claim that the maximum size of an independent set in G is \binom{n-1}{r-1}. As we shall see below, this follows from the Hoffman bound, that an independent set in a d-regular graph on N vertices with adjacency matrix A and least eigenvalue -\lambda has size at most N \lambda / (d+\lambda). (Note that since the sum of the eigenvalues is the trace of the adjacency matrix, namely 0, and the largest eigenvalue is the degree d, the least eigenvalue is certainly negative.)

Representations

To find the eigenvalues of A, Ehud and his coauthors use the representation theory of the symmetric group. Let V be the permutation module of S_n acting on the r-subsets of \{1,\ldots, n\}. It is well known that V has character

\chi^{(n)} + \chi^{(n-1,1)} + \cdots + \chi^{(n-r,r)}.

Since this character is multiplicity-free, there is a unique direct sum decomposition of V into Specht modules,

V = S^{(n)} \oplus S^{(n-1,1)} \oplus \cdots \oplus S^{(n-r,r)}. \quad (\star)

The adjacency matrix A acts as a linear map on V that commutes with the S_n-action. Therefore, by Schur’s Lemma, for each s with 0\le s \le r there is a scalar \lambda(s) such that A acts on S^{(n-s,s)} as multiplication by \lambda(s). Thus the \lambda(s) are the eigenvalues (possibly with repetition, but this turns out not to be the case) of A.

Lemma. The adjacency matrix A acts on S^{(n-s,s)} by scalar multiplication by (-1)^s\binom{n-r-s}{r-s}.

Proof. It follows from the theory in Chapter 17 of James’ Springer lecture notes on the symmetric groups that V has a filtration

V = V_0 \supset V_1 \supset \ldots \supset V_r \supset 0

such that V_r \cong S^{(n-r,r)} and V_s / V_{s+1} \cong S^{(n-s,s)} for 0\le s < r. The submodule V_r is spanned by certain ‘partially symmetrized’ polytabloids. Rather than introduce this notation, which is a bit awkward for a blog post, let me instead suppose that S_n acts on the set \{1,\ldots, s, -1, \ldots, -s\} \cup Z where |Z| = n-2s. Let X be a fixed (r-s)-subset of Z. Then V_s is a cyclic \mathbb{C}S_n-submodule of V generated (using right modules) by v(s) where

v(s) = \bigl( \{1,\ldots, s\} \cup X\bigr) \bigl(1-(1,-1)\bigr) \ldots \bigl( 1- (s,-s) \bigr)

There is a ‘raising map’ R : V_s \rightarrow S^{(n-s,s)} which — viewing S^{(n-s,s)} as a submodule of the permutation module of S_n acting on s-subsets — is defined on the generator v(s) by

v(s) \mapsto  \{1,\ldots, s\}  \bigl(1-(1,-1)\bigr) \ldots \bigl( 1- (s,-s) \bigr)

Fix an r-set \{\pm 1, \ldots, \pm s\} \cup X appearing in the expansion of v(s) in the canonical basis of V. Applying the adjacency matrix A sends this r-set to the sum of all those r-sets that it does not meet. Hence, after applying A and then R, we get a scalar multiple of v(s) precisely from those sets \{\mp 1, \ldots, \mp s\} \cup Y in the support of

\bigl( \{\pm 1, \ldots, \pm s\} \cup X \bigr) A

such that Y is an (r-s)-subset of \{1,\ldots, n\} not meeting

\{1,\ldots, s,-1,\ldots, -s\} \cup X.

There are 2s + (r-s) = r+s forbidden elements, hence there are \binom{n-(r+s)}{r-s} choices for Y. Each comes with sign (-1)^s. \Box
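Since it is easy to slip up with the signs, here is a minimal consistency check of the lemma in Haskell (a sketch, not part of the proof). It uses the standard formula \dim S^{(n-s,s)} = \binom{n}{s} - \binom{n}{s-1} and checks that the claimed eigenvalues, weighted by these multiplicities, give \mathrm{tr}\, A = 0 and \mathrm{tr}\, A^2 = \binom{n}{r}\binom{n-r}{r}, as they must for a loopless \binom{n-r}{r}-regular graph on \binom{n}{r} vertices.

import Control.Monad (forM_)

-- Binomial coefficient, zero outside the usual range.
choose :: Integer -> Integer -> Integer
choose n k
  | k < 0 || k > n = 0
  | otherwise      = product [n - k + 1 .. n] `div` product [1 .. k]

-- Eigenvalue claimed by the lemma on the summand S^(n-s,s).
lambda :: Integer -> Integer -> Integer -> Integer
lambda n r s = (-1) ^ s * choose (n - r - s) (r - s)

-- dim S^(n-s,s), by the standard formula.
mult :: Integer -> Integer -> Integer
mult n s = choose n s - choose n (s - 1)

-- The multiplicities must sum to binom(n,r), and the weighted sums of lambda
-- and lambda^2 must give tr A = 0 and tr A^2 = binom(n,r) * binom(n-r,r).
consistent :: Integer -> Integer -> Bool
consistent n r =
     sum [mult n s                    | s <- [0 .. r]] == choose n r
  && sum [mult n s * lambda n r s     | s <- [0 .. r]] == 0
  && sum [mult n s * lambda n r s ^ 2 | s <- [0 .. r]] == choose n r * choose (n - r) r

main :: IO ()
main = forM_ [(n, r) | n <- [2 .. 16], r <- [1 .. n `div` 2]] $ \(n, r) ->
  putStrLn (show (n, r) ++ ": " ++ show (consistent n r))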

End of proof

The lemma implies that \lambda(1) = -\binom{n-r-1}{r-1} is the least eigenvalue of A. Putting this into Hoffman’s bound and using that the graph G is regular of degree \binom{n-r}{r}, since this is the number of r-subsets of \{1,\ldots, n\} not meeting \{1,\ldots, r\}, we get that the maximum size of an independent set is at most

\begin{aligned} \frac{\binom{n}{r} \binom{n-r-1}{r-1}}{\binom{n-r}{r} + \binom{n-r-1}{r-1}}&= \frac{ \binom{n}{r}}{ \binom{n-r}{r}/\binom{n-r-1}{r-1} + 1} \\ &= \frac{ \binom{n}{r}}{ \frac{n-r}{r} + 1} \\ &= \textstyle \binom{n}{r} \frac{r}{n} \\&= \textstyle \binom{n-1}{r-1} \end{aligned}

which is exactly the Erdős–Ko–Rado bound. Thus we (or rather Ehud Friedgut and his coauthors) have proved the Erdős–Ko–Rado Theorem. Ehud then went on to outline how the method applies to Mantel’s Theorem: the relevant representation is, in the usual notation, M^{(n-3,2,1)}; note one may interpret an (n-3,2,1)-tabloid as an ordered pair \bigl( \{i, j\}, \{k \} \bigr) encoding the two-edge path with edges \{i,k\} and \{j,k\}. There is a proof of Mantel’s Theorem by double counting using these paths: as Ehud remarked (quoting, if I heard him right, Kalai) ‘It’s very hard not to prove Mantel’s Theorem’.
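The chain of equalities above can also be checked mechanically. The following Haskell fragment (again only a sanity check) evaluates the Hoffman bound N\lambda/(d+\lambda) for the Kneser graph as an exact rational and compares it with \binom{n-1}{r-1}:

import Data.Ratio ((%))

choose :: Integer -> Integer -> Integer
choose n k
  | k < 0 || k > n = 0
  | otherwise      = product [n - k + 1 .. n] `div` product [1 .. k]

-- Hoffman bound for the Kneser graph: N = binom(n,r), d = binom(n-r,r),
-- lambda = binom(n-r-1,r-1).
hoffman :: Integer -> Integer -> Rational
hoffman n r = (choose n r * choose (n - r - 1) (r - 1))
            % (choose (n - r) r + choose (n - r - 1) (r - 1))

main :: IO ()
main = print (and [ hoffman n r == fromInteger (choose (n - 1) (r - 1))
                  | n <- [2 .. 30], r <- [1 .. n `div` 2] ])

This prints True.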

Possible variations

I’ll end with two possible avenues for generalization, both motivated by the classification of multiplicity-free permutation characters of symmetric groups. My paper built on work of Saxl and was published at about the same time as parallel work of Godsil and Meagher.

Foulkes module

The symmetric group S_{2m} acts on the set of set partitions of \{1,\ldots, 2m\} into m sets each of size 2. The corresponding permutation character is multiplicity free: it has the attractive decomposition \sum_{\lambda \in \mathrm{Par}(m)} \chi^{2\lambda}, where 2\lambda is the partition obtained from \lambda by doubling each part. Moreover, my collaborator Rowena Paget has given an explicit Specht filtration of the corresponding permutation module, analogous to the filtration used above of M^{(n-r,r)}. So everything is in place to prove a version of the Erdős–Ko–Rado Theorem for perfect matchings. Update. I am grateful to Ehud for pointing out that this has already been done (using slightly different methods) by his coauthor Nathan Lindzey.
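To illustrate the decomposition: when m = 2 the group S_4 permutes the three perfect matchings of \{1,2,3,4\}, and the permutation character is \chi^{(4)} + \chi^{(2,2)}, of degree 1 + 2 = 3; here (4) = 2(2) and (2,2) = 2(1,1), as the formula predicts.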

Finite general linear groups

There is a remarkably close connection between the representations of S_n and the unipotent representations of the finite general linear group \mathrm{GL}_n(\mathbb{F}_q). Perhaps, one day, this will be explained by defining the symmetric group as a group of automorphisms of a vector space over the field with one element. The analogue of the permutation module M^{(n-r,r)} of S_n acting on r-subsets is the permutation module of \mathrm{GL}_n(\mathbb{F}_q) acting on r-subspaces of \mathbb{F}_q^n. This decomposes as a direct sum of irreducible submodules exactly as in (\star) above, replacing the Specht modules with their \mathrm{GL}_n(\mathbb{F}_q)-analogues. So once again the decomposition is multiplicity-free. This suggests there might well be a linear analogue of the Erdős–Ko–Rado Theorem in which we aim to maximize the size of a family of r-dimensional subspaces of \mathbb{F}_q^n such that any two subspaces have at least a 1-dimensional intersection.

Update. The Erdős–Ko–Rado Theorem for vector spaces was proved in an aptly named paper by Frankl and Wilson in the general version for t-intersecting families. The special case t=1 is that the maximum number of r-dimensional subspaces of \mathbb{F}_q^n such that any two subspaces have at least a 1-dimensional intersection is \binom{n-1}{r-1}_q for n \ge 2r, where the q-binomial coefficient is the number of (r-1)-dimensional subspaces of \mathbb{F}_q^{n-1}. Thus the maximum is attained by taking all the r-dimensional subspaces containing a fixed line, in precise analogy with the case for sets. (Note also that if n \le 2r-1 then any two r-dimensional subspaces have a non-zero intersection.)
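For example, when n = 4, r = 2 and q = 2 the bound is \binom{3}{1}_2 = 7: of the \binom{4}{2}_2 = 35 two-dimensional subspaces of \mathbb{F}_2^4, the seven containing a fixed line pairwise intersect non-trivially, and no larger such family exists.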

James wrote a second set of lecture notes Representations of the general linear groups, pointing out several times the remarkable analogy with the representation theory of the symmetric group. At the time he wrote, before his work with Dipper on Hecke algebras, this seemed somewhat mysterious. (Very roughly, FS_n is to \mathrm{GL}_d(F) where F is an infinite field as the Hecke algebra \mathcal{H}_q(S_n) is to \mathrm{GL}_d(\mathbb{F}_q).) Chapter 13 of his book has, I think, exactly what we need to prove the analogue of the lemma above. Update: I also just noticed that Section 8.6 of Harmonic analysis on finite groups by Ceccherini-Silberstein, Scarabotti and Tolli, titled ‘The q-Johnson scheme’, is entirely devoted to the Gelfand pair (\mathrm{GL}_d(\mathbb{F}_q), H_e) where H_e is the maximal parabolic subgroup stabilising a chosen e-dimensional subspace.

Here is the setting for the general linear group. Fix r \le n/2 and let M^{(r)} be the permutation module of \mathrm{GL}_n(\mathbb{F}_q) acting on r-dimensional subspaces of \mathbb{F}_q^n defined over the complex numbers. For 0 \le s < r, let \psi_s : M^{(r)} \rightarrow M^{(s)} be the map sending a subspace to the formal linear combination of all s-dimensional subspaces that it contains. The \mathrm{GL}_n(\mathbb{F}_q)-Specht module S^{(n-r,r)} is then the intersection of the kernels of all the \psi_s for s < r. Moreover if we set

\displaystyle M_s = \bigcap_{i=0}^{s-1} \ker \psi_i \subseteq M^{(r)}

then M^{(r)} = M_0 \supset M_1 \supset \cdots \supset M_r and M_r \cong S^{(n-r,r)} and M_s / M_{s+1} \cong S^{(n-s,s)} for 0 \le s < r.

The relevant graph, again denoted G, has vertices all r-dimensional subspaces of \mathbb{F}_q^n; two subspaces are adjacent if and only if their intersection is \{0\}.

Lemma. The degree of G is \binom{n-r}{r}_q q^{r^2}.

Proof. Fix a basis e_1, \ldots, e_n for \mathbb{F}_q^n and let t \le n-r. Let W be a t-dimensional subspace of \mathbb{F}_q^n intersecting \langle e_1, \ldots, e_r\rangle in \{0\}. The projection onto \langle e_{r+1}, \ldots, e_n \rangle with kernel \langle e_1, \ldots, e_r \rangle is injective on W, and so W has a unique basis of the form

v_1 + u_1, \ldots, v_t + u_t

where u_1, \ldots, u_t is the reduced row-echelon basis of a t-dimensional subspace of \langle e_{r+1}, \ldots, e_n \rangle and

v_1, \ldots, v_t \in \langle e_1, \ldots, e_r \rangle.

Conversely any such choice of echelon basis and 'matrix coefficients' v_1, \ldots, v_t determines a unique such subspace. Hence there are q^{rt} \binom{n-r}{t}_q such subspaces. Taking t = r we get the result. \Box
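Since row-echelon bookkeeping is easy to get wrong, here is a brute-force check of the degree formula over \mathbb{F}_2 in Haskell, a sketch in which vectors are bitmasks and a subspace is recorded by the sorted list of its elements; it is a check for small cases only, not a proof.

import qualified Data.Set as Set
import Data.Bits (xor)
import Data.List (foldl', sort, tails)
import Control.Monad (forM_)

-- Gaussian binomial coefficient [n choose k]_q.
gaussian :: Integer -> Integer -> Integer -> Integer
gaussian q n k
  | k < 0 || k > n = 0
  | otherwise = product [q ^ (n - i) - 1 | i <- [0 .. k - 1]]
          `div` product [q ^ (k - i) - 1 | i <- [0 .. k - 1]]

-- Span over F_2 of a list of vectors (encoded as bitmasks), as a sorted list.
spanF2 :: [Int] -> [Int]
spanF2 = sort . Set.toList . foldl' step (Set.singleton 0)
  where step s v = Set.union s (Set.map (xor v) s)

-- All r-dimensional subspaces of F_2^n.
subspaces :: Int -> Int -> Set.Set [Int]
subspaces n r = Set.fromList
  [ sp | vs <- choices r [1 .. 2 ^ n - 1], let sp = spanF2 vs, length sp == 2 ^ r ]
  where
    choices 0 _  = [[]]
    choices k xs = [ y : ys | y : rest <- tails xs, ys <- choices (k - 1) rest ]

-- Number of r-dimensional subspaces meeting <e_1, ..., e_r> only in 0; a vector
-- lies in <e_1, ..., e_r> exactly when its bitmask is less than 2^r.
degreeBrute :: Int -> Int -> Int
degreeBrute n r = length [ sp | sp <- Set.toList (subspaces n r)
                              , all (\v -> v == 0 || v >= 2 ^ r) sp ]

main :: IO ()
main = forM_ [(4, 2), (5, 2), (6, 2), (6, 3)] $ \(n, r) ->
  putStrLn (show (n, r) ++ ": brute force " ++ show (degreeBrute n r)
            ++ ", formula " ++ show (2 ^ (r * r) * gaussian 2 (toInteger (n - r)) (toInteger r)))

For instance both columns give 16 for (n, r) = (4, 2) and 112 for (5, 2).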

I have checked the following claim for n \le 7 when q=2 and n \le 5 when q=3 by computer calculations. Note also that the q-binomial factor \binom{n-(r+s)}{r-s}_q counts, up to a power of q, the number of (r-s)-dimensional subspaces of \mathbb{F}_q^n intersecting a fixed subspace of dimension r+s in \{0\}, in good analogy with the symmetric group case. But it seems some careful analysis of reduced row-echelon forms (like the argument for the degree of G above, but too hard for me to do in an evening) will be needed to nail down the power of q. Examples 11.7 in James' lecture notes seem helpful.

Claim. The adjacency matrix for the graph acts on S^{(n-s,s)} by scalar multiplication by (-1)^s q^{r^2 - rs + \binom{s}{2}}\binom{n-(r+s)}{r-s}_q.

Assuming the claim, the least eigenvalue is the one for s=1, and it is -q^{r^2-r} \binom{n-r-1}{r-1}_q. Putting this into Hoffman’s bound and using that the graph is regular of degree \binom{n-r}{r}_q q^{r^2} as seen above we get that the maximum size of an independent set is at most

\begin{aligned} \frac{\binom{n}{r}_q q^{r^2-r} \binom{n-r-1}{r-1}_q}{q^{r^2}\binom{n-r}{r}_q + q^{r^2-r}\binom{n-r-1}{r-1}_q} &= \frac{ \binom{n}{r}_q}{ q^r\binom{n-r}{r}_q/\binom{n-r-1}{r-1}_q + 1} \\ &= \frac{ \binom{n}{r}_q}{ q^r\frac{q^{n-r}-1}{q^r-1} + 1} \\ &= \textstyle \binom{n}{r}_q \frac{q^r-1}{q^r(q^{n-r}-1)+q^r-1} \\ &= \textstyle \binom{n}{r}_q \frac{q^r-1}{q^n-1} \\&= \textstyle \binom{n-1}{r-1}_q \end{aligned}

which, somewhat spectacularly I feel, again comes out on the nose.
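For what it is worth, the claim passes the same sort of trace test as the lemma for the symmetric group. Granting that \dim S^{(n-s,s)} = \binom{n}{s}_q - \binom{n}{s-1}_q, which follows from the multiplicity-free decomposition of M^{(s)}, the claimed eigenvalues, weighted by these multiplicities, should sum to 0 and their squares should sum to \binom{n}{r}_q \cdot q^{r^2}\binom{n-r}{r}_q. The following Haskell sketch confirms this for small q, n and r; it is evidence for the claim, not a proof.

import Control.Monad (forM_)

-- Gaussian binomial coefficient [n choose k]_q.
gaussian :: Integer -> Integer -> Integer -> Integer
gaussian q n k
  | k < 0 || k > n = 0
  | otherwise = product [q ^ (n - i) - 1 | i <- [0 .. k - 1]]
          `div` product [q ^ (k - i) - 1 | i <- [0 .. k - 1]]

-- Eigenvalue asserted by the claim on the summand S^(n-s,s).
lambdaQ :: Integer -> Integer -> Integer -> Integer -> Integer
lambdaQ q n r s = (-1) ^ s * q ^ (r * r - r * s + s * (s - 1) `div` 2)
                           * gaussian q (n - r - s) (r - s)

-- dim S^(n-s,s) for GL_n(F_q).
multQ :: Integer -> Integer -> Integer -> Integer
multQ q n s = gaussian q n s - gaussian q n (s - 1)

-- Check the sum of multiplicities, tr A = 0 and tr A^2 = N * d.
consistent :: Integer -> Integer -> Integer -> Bool
consistent q n r =
     sum [multQ q n s                       | s <- [0 .. r]] == nr
  && sum [multQ q n s * lambdaQ q n r s     | s <- [0 .. r]] == 0
  && sum [multQ q n s * lambdaQ q n r s ^ 2 | s <- [0 .. r]]
       == nr * q ^ (r * r) * gaussian q (n - r) r
  where nr = gaussian q n r

main :: IO ()
main = forM_ [(q, n, r) | q <- [2, 3, 4, 5], n <- [2 .. 10], r <- [1 .. n `div` 2]] $
  \(q, n, r) -> putStrLn (show (q, n, r) ++ ": " ++ show (consistent q n r))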

A Putnam problem

The second generalization ties in with another project I’ve worked on but neglected to write up beyond a blog post: to prove the \mathrm{GL}_n(\mathbb{F}_q)-analogue of the spectral results on the Kneser graph motivated by Problem A2 in the 2018 Putnam.


Commuting group actions

April 28, 2023

In Schur–Weyl duality one has commuting actions of two algebras A and B on a vector space V such that each is the other’s centralizer. That is \mathrm{End}_A(V) = B and A = \mathrm{End}_B(V). In particular the actions of A and B on V commute. Using a left-action for A and a right-action for B, this says that

(av)b = a(vb)

for all a \in A, b \in B and v \in V, a fact one would surely expect from the notation. The purpose of this post is to record a lemma about commuting group actions that arose in this context.

Let G and H be finite groups with G acting on the left on \Omega and H acting on the right on \Omega, such that the actions commute. For a simple example where G and H may be non-abelian, fix a group G and take \Omega = G = H. Then, for each x \in \Omega, we have

(gx)h = g(xh)

by the associativity of group multiplication. The slogan ‘action on places commutes with action on values’ gives another family of examples, typified by \Omega = \{1,\ldots, b\}^a with G = S_a acting on the left on \Omega by position permutation, and H = S_b acting on the right, diagonally on each of the a factors, by permuting values. For instance, if a = 4 and b = 2 then one joint orbit is

\{(1,1,2,2), (1,2,1,2), (2,1,1,2), (1,2,2,1), (2,1,2,1), (2,2,1,1) \}.

The orbit has size 6 so we expect the stabiliser of points in the orbit to have order 4! 2! / 6 = 8; a little thought shows that the stabiliser of (1,1,2,2) is

\langle (1,2)_G, (3,4)_G, (1,3)(2,4)_G (1,2)_H \rangle \cong C_2 \wr C_2.

(Here group elements are tagged by the group they live in.) Note the ‘diagonally embedded’ element (1,3)(2,4)_G (1,2)_H: even though we are working in G \times H, it is not the case that the stabiliser factors as \mathrm{Stab}_G(\omega)\mathrm{Stab}_H(\omega).
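Numbers like these are easy to confirm by brute force. Here is a minimal Haskell sketch (it assumes nothing beyond the description above, and only counts the orbit and the stabiliser):

import Data.List (nub, permutations)

-- Omega = {1,2}^4; S_4 permutes positions and S_2 permutes values.
-- A permutation of positions is recorded as a list ps: entry i of the image
-- is taken from position ps !! i of the original tuple.
applyPlaces :: [Int] -> [Int] -> [Int]
applyPlaces ps xs = map (xs !!) ps

-- A permutation of values is recorded as the list [image of 1, image of 2].
applyValues :: [Int] -> [Int] -> [Int]
applyValues vs = map (\x -> vs !! (x - 1))

-- All 48 images of a tuple under S_4 x S_2.
images :: [Int] -> [[Int]]
images x = [ applyValues vs (applyPlaces ps x)
           | ps <- permutations [0 .. 3], vs <- permutations [1, 2] ]

main :: IO ()
main = do
  let x = [1, 1, 2, 2]
  putStrLn ("orbit size: " ++ show (length (nub (images x))))
  putStrLn ("stabiliser order: " ++ show (length (filter (== x) (images x))))

This prints orbit size 6 and stabiliser order 8, as above.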

Since there is no established notation for the joint stabiliser in a simultaneous left- and right-action (in fact such a notation would probably just be a standing invitation to commit the diagonal fallacy), I’m going to switch to two commuting faithful right-actions, thought of as a right action of the Cartesian product G \times H, in which, by definition, the two factors commute. Since letters will make it clear which group contains each element, I’ll write gh rather than the formally correct (g, h).

Proposition.

Let G and H be finite groups. Suppose that G \times H acts transitively on a set \Omega. Fix \omega \in \Omega.

  1. The orbits of G on \Omega are blocks for the action of H.
  2. Every point in the G-orbit of \omega has the same H-stabiliser.
  3. We have G\mathrm{Stab}_{G \times H}(\omega) = G\mathrm{Stab}_H(\omega G) where \mathrm{Stab}_H(\omega G) is the subgroup of those h \in H fixing setwise the orbit \omega G. Moreover G is transitive on \Omega if and only if both these groups are G \times H.
  4. There is an isomorphism of right \mathbb{C}G-modules \mathbb{C}(\omega G) \cong \mathbb{C}\bigl\uparrow_{\mathrm{Stab}_G(\omega)}^G.
  5. There are isomorphisms of right \mathbb{C}(G \times H)-modules \begin{aligned}\mathbb{C}\Omega &\cong \mathbb{C}\bigl\uparrow_{\mathrm{Stab}_{G \times H} (\omega)}^{G \times H} \\ &\cong \mathbb{C}\bigl\uparrow_{\mathrm{Stab}_G(\omega)}^G \bigl\uparrow_{G\mathrm{Stab}_{G \times H}(\omega)}^{G \times H} \\ &\cong \mathbb{C}\bigl\uparrow_{\mathrm{Stab}_G(\omega)}^G \bigl\uparrow_{G\mathrm{Stab}_{H}(\omega G)}^{G \times H}\end{aligned} where we use (3) to regard \mathbb{C}\bigl\uparrow_{\mathrm{Stab}_G(\omega)}^G \cong \mathbb{C}(\omega G) as a G \mathrm{Stab}_{G \times H}(\omega)-module in the second line and as a G \mathrm{Stab}_H(\omega G)-module in the third line.

Proof.

  1. More generally, if N \le K is a normal subgroup of a permutation group K then the orbits of N are blocks for the action of K, and for the action of any subgroup of K.
  2. Points in the same orbit have conjugate stabilisers, but here the conjugation action is trivial. Or, to see it very concretely, suppose that \omega h = \omega. Then (\omega g)h = (\omega h)g = \omega g. This shows that \mathrm{Stab}_H(\omega) \le \mathrm{Stab}_H(\omega g) and by symmetry equality holds.
  3. The left-hand side is the set of cosets Gh such that \omega g h = \omega for some g \in G. The right-hand side is the set of cosets Gh such that \omega h = \omega g' for some g' \in G. These are equivalent conditions: take g' = g^{-1} and use that the actions commute. Finally, identifying \Omega with the coset space (G \times H)/\mathrm{Stab}_{G \times H}(\omega) we see that G is transitive if and only if \mathrm{Stab}_{G\times H}(\omega) G = G \times H. Since G is a normal subgroup of G \times H, we may switch the order to get G \mathrm{Stab}_{G\times H}(\omega) = G \times H, as required.
  4. This is very basic. A nice proof uses the characterisation of induced modules that V \cong U\uparrow_L^K if and only if V\downarrow_L has a submodule isomorphic to U and \dim V = [K : L]\dim U. In this case we take U to be the right \mathbb{C}\mathrm{Stab}_G(\omega)-module \langle \omega \rangle \cong \mathbb{C}.
  5. The first isomorphism follows as in (4) using that G \times H acts transitively on \Omega. The second follows from the third by (3). Finally the third isomorphism can be proved similarly using the characterisation of induced modules. \Box

Note that each part has a dual result swapping G and H. Since there is a reasonable notation for bimodule induction, using, as above, up-arrows for induction on the right and writing \mathrm{Ind} for induction on the left, we can state the following corollary with left- and right-actions.

Corollary.

Let G and H be finite groups with G acting on the left on \Omega and H acting on the right on \Omega, such that the actions commute. Then \mathbb{C}\Omega is a left G- and right H-bimodule and, given any \omega \in \Omega we have

\begin{aligned} \mathbb{C}\Omega &\cong \bigl( \mathrm{Ind}_{\mathrm{Stab}_G(\omega)}^G \mathbb{C} \bigr)\bigl\uparrow_{G \mathrm{Stab}_H(G\omega)}^{G \times H} \\ &\cong  \mathrm{Ind}_{\mathrm{Stab}_G(\omega H)H}^{G \times H} \bigl(\mathbb{C} \bigl\uparrow_{\mathrm{Stab}_H(\omega)}^H \bigr)\end{aligned}

Proof.

This is immediate from (5) in the proposition and its dual version.\Box

It is interesting to note that if, as in the example below, H is transitive, then the effect of the left induction functor in the isomorphism in the second line is simply to turn a right-module into a bimodule, without changing its dimension or its right-module structure in any way.

Example: permuting matchings.

To finish, here is one of a family of interesting examples that come from taking two wreath products whose top groups are equal (as abstract groups) but whose base groups are different. Let

\begin{aligned} G &= \langle (\overline{1},\overline{2}), (\overline{4},\overline{5}) \rangle \rtimes \langle (\overline{1},\overline{4})(\overline{2},\overline{5}) \rangle \\ H &= (S_{\{1,2,3\}} \times S_{\{4,5,6\}}) \rtimes \langle (1,4)(2,5)(3,6) \rangle. \end{aligned}

The groups G and H act on the set \Omega of four-edge matchings between the parts \{ \overline{1},\overline{2},\overline{4},\overline{5} \} and \{1,2,3,4,5,6\} that respect the blocks \{\overline{1},\overline{2} \}, \{\overline{4}, \overline{5}\}, \{1,2,3\}, \{4,5,6\} shown by the rooted trees in the graph below. The matching \omega shown by black lines is \{ (\overline{1},1), (\overline{2},2), (\overline{4}, 4), (\overline{5}, 5) \}.

[Image: the blocks, shown as rooted trees, and the matching \omega shown by black lines.]

It is clear that the actions of G and H commute. Moreover, it is an easy exercise to show that |\Omega| = 72, that H acts regularly, and that G acts freely. We have

(\overline{1}, \overline{2}) \omega = \{ (\overline{2},1), (\overline{1},2), (\overline{4}, 4), (\overline{5}, 5) \}

and so (\overline{1}, \overline{2}) \omega (1,2) = \omega. This is shown graphically below.

[Image: the effect of (\overline{1}, \overline{2}) on the left and (1,2) on the right on the matching \omega.]

Similarly (\overline{1},\overline{4})(\overline{2},\overline{5}) \omega (1,4)(2,5)(3,6) = \omega. On the other hand, the permutation (2,3) \in H cannot be undone by a permutation in G. In fact, switching to right-actions for both groups, as in the proposition, we have

\begin{aligned} \mathrm{Stab}_{G \times H}(\omega) &= \langle (\overline{1}, \overline{2})(1,2), (\overline{1},\overline{4})(\overline{2},\overline{5})(1,4)(2,5)(3,6) \rangle \\ &\cong S_2 \wr S_2.\end{aligned}

Note that, as predicted by the dual version of (3) in the proposition, since H is transitive, the stabiliser meets every coset of H and \mathrm{Stab}_{G \times H}(\omega)H = G \times H.
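None of this is hard to check by hand, but here is a brute-force verification in Haskell of the numbers used above. It is a sketch under the conventions of this example only: a matching is recorded by the images of \overline{1}, \overline{2}, \overline{4}, \overline{5}, in that order.

import Data.List (nub, permutations)

type Matching = [Int]

omega0 :: Matching
omega0 = [1, 2, 4, 5]

-- Omega: the matchings sending {1',2'} and {4',5'} to different blocks of
-- {1,2,3}, {4,5,6}.
bigOmega :: [Matching]
bigOmega = [ [a, b, c, d]
           | (xs, ys) <- [([1,2,3], [4,5,6]), ([4,5,6], [1,2,3])]
           , a <- xs, b <- xs, a /= b, c <- ys, d <- ys, c /= d ]

-- The eight elements of G, acting by permuting the barred points (positions).
gElements :: [Matching -> Matching]
gElements =
  [ t . u . v
  | u <- [id, (\[a, b, c, d] -> [b, a, c, d])]   -- (1', 2')
  , v <- [id, (\[a, b, c, d] -> [a, b, d, c])]   -- (4', 5')
  , t <- [id, (\[a, b, c, d] -> [c, d, a, b])]   -- (1', 4')(2', 5')
  ]

-- The 72 elements of H, as permutations of {1, ..., 6} given by image lists.
hElements :: [[Int]]
hElements = base ++ map (compose swap) base
  where
    base = [ p ++ map (+ 3) r | p <- permutations [1, 2, 3], r <- permutations [1, 2, 3] ]
    swap = [4, 5, 6, 1, 2, 3]                    -- the permutation (1,4)(2,5)(3,6)
    compose f g = map (\i -> f !! (i - 1)) g     -- f after g, as image lists

-- H acts on a matching by permuting the unbarred values.
actH :: [Int] -> Matching -> Matching
actH h = map (\v -> h !! (v - 1))

main :: IO ()
main = do
  putStrLn ("|Omega| = " ++ show (length (nub bigOmega)))
  putStrLn ("H-orbit of omega: " ++ show (length (nub [ actH h omega0 | h <- hElements ]))
            ++ ", H-stabiliser: " ++ show (length [ h | h <- hElements, actH h omega0 == omega0 ]))
  putStrLn ("G acts freely: "
            ++ show (and [ length [ g | g <- gElements, g w == w ] == 1 | w <- bigOmega ]))
  putStrLn ("|Stab_{G x H}(omega)| = "
            ++ show (length [ (g, h) | g <- gElements, h <- hElements
                                     , actH h (g omega0) == omega0 ]))

The output confirms that |\Omega| = 72, that H acts regularly (orbit of size 72, trivial stabiliser), that G acts freely, and that the joint stabiliser has order 8.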