Why are inner products symmetric?

So when I got to grad school, certain gaps in my (very unconventional) math education revealed themselves, and one of them was that I didn’t know anything about vector space duality. I took a graduate linear algebra course that first semester, but I think what actually caught me up to speed was my peers. I have a vivid memory of a conversation with Alex Blumenthal in which he said something to the effect of:

“You know what an inner product really is though?? It’s an isomorphism to the dual!!!”

So, that was dope. Thanks, Alex!

But, once I thought about it, something bothered me a little. Suppose V is a real, finite-dimensional vector space. An inner product on V is a symmetric, positive definite bilinear form B:V\times V\rightarrow\mathbb{R}. But any nondegenerate bilinear form B:V\times V\rightarrow\mathbb{R} yields an isomorphism to the dual. The map v\mapsto B(v,-) lands in V^* because B is linear in the second argument, is itself a linear map V\rightarrow V^* by the linearity of B in the first argument, and is injective by the nondegeneracy of B. Since V and V^* have the same dimension, this proves it is an isomorphism. So, why does an inner product have to be positive definite and symmetric?

Well, ok, look, if you want to use the inner product to define a metric (and, you know, who doesn’t?) then fine, I can see why they are needed. So, I’ve lived happily with the definition of inner products ever since. But still, this feels like geometry or analysis or something. Alex’s elegant explanation of what an inner product “really is” was pure algebra. So, some part of me always wondered: is there something special, from a purely algebraic point of view, about the isomorphism V\rightarrow V^* that comes from an inner product, versus the isomorphisms that come from bilinear forms that are merely nondegenerate?

Well, it took the intervening, what, 12 and a half years? But I figured it out. At least to my satisfaction.

I still see positive definiteness as essentially an analytic or geometric thing, not algebra. It would lose meaning if we switched to vector spaces over a positive-characteristic field, which from an algebraic point of view is not really a different setting. So for the purposes of this inquiry, I view positive definiteness just as a stand-in for nondegeneracy, which it implies. So the real mystery for me was always symmetry. What makes the isomorphism to the dual that arises from a symmetric bilinear form different from ones that arise from forms that are not symmetric?

I will answer below, but to set the stage, let’s talk about some other, more naive, ways to construct an isomorphism V\rightarrow V^*. For starters, because they’re both finite-dimensional vector spaces of equal dimension, one could just pick a basis e_1,\dots,e_n for V and a basis \ell_1,\dots,\ell_n for V^*, and then linearly extend the map e_j \mapsto \ell_j. This would certainly work.

Only slightly less naively, one could stop after the choice of basis e_1,\dots,e_n for V, and realize that this choice already allows the specification of an isomorphism. The chosen basis uniquely identifies a dual basis e_1^*,\dots, e_n^*\in V^*, characterized by the property that e_i^*(e_j) = \delta_{ij} (Kronecker delta). So there is no need to make an independent choice of basis for V^*; one can linearly extend the map e_j\mapsto e_j^* and be done with it.

All of this I have known for years, probably for the same 12.5 years since Alex clued me in in the first place. What I only just very recently figured out, is this:

The isomorphisms that arise from the second method are the same ones that arise from symmetric bilinear forms!!!

In other words, yes, any nondegenerate bilinear form will give you an isomorphism to the dual; but only the symmetric ones will map a basis of V to the dual basis!

I realized this in the context of representation theory. If V is an irreducible representation of a group G (over \mathbb{C}, say), then it is isomorphic as a representation to its dual representation V^* iff there is a nondegenerate bilinear form on V that is invariant under G. Because of irreducibility, this form (when it exists) will be unique up to scalar, and either symmetric or antisymmetric. In the former case, G is a subgroup of an orthogonal group; in the latter, it’s a subgroup of a symplectic group. In both cases, there exist bases for V and V^* according to which the matrices representing G (with respect to those bases) are identical (this is what it means for V and V^* to be isomorphic), but only in the former case can these bases be chosen to be dual to each other.

That said, I feel like this blog post ought to contain a justification for my excited, italicized claim above. And I can give you that justification in a totally elementary-linear-algebra, just-eff-with-matrices kind of a way!

Let’s just take V =\mathbb{R}^n, viewed as a space of column vectors. Then an arbitrary basis for V is given by the columns of a(n arbitrary) nonsingular square matrix M. Writing V^* also as column vectors (so that we can map between V and V^* using matrix multiplication), the dual basis is given by the columns of (M^{-1})^\top. Indeed, the rows of M^{-1} are evidently dual to the columns of M since the matrix product M^{-1}M is the identity; but I want to write V^* in terms of column vectors, so I am obligated to take a transpose.

All this said, the matrix of the transformation that sends our chosen basis to its dual basis is the product (M^{-1})^\top M^{-1}, because M^{-1} sends our chosen basis to the standard basis, and (M^{-1})^\top then sends the standard basis to the dual basis of our chosen one. And the matrix (M^{-1})^\top M^{-1} is evidently symmetric! (It is even positive definite; and conversely, any symmetric positive definite S factors as (M^{-1})^\top M^{-1}, e.g. by taking M^{-1} to be the transpose of the Cholesky factor of S, so every inner product’s isomorphism to the dual does arise this way.) 🦍
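
If you want to see this in action, here is a quick numerical sanity check (a minimal sketch in Python with numpy; a random M is nonsingular with probability 1, and the variable names are just mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))   # columns give an (almost surely nonsingular) basis of R^n
Minv = np.linalg.inv(M)
dual = Minv.T                     # columns give the dual basis, written as column vectors

A = Minv.T @ Minv                 # matrix of the map sending the chosen basis to its dual
assert np.allclose(A @ M, dual)   # it sends each basis column to the matching dual column
assert np.allclose(A, A.T)        # and it is symmetric, as claimed
assert np.allclose(dual.T @ M, np.eye(n))  # sanity: the dual pairing is the Kronecker delta
```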

Gems from Isaacs’ book on characters iii: real and rational elements

This is part iii of the tear I’m on, writing down things I learned from I. Martin Isaacs’ book on characters of finite groups. Parts i and ii focused on controlling the degrees of irreducible characters, and I have more to share about that, but today let me focus on another theme: the group-theoretic conditions on an element x in a finite group G that force all characters to have real, or rational, values on x. Isaacs presents these conditions in Problems (2.11) and (2.12). I can’t believe I never learned this before. For example, I’ve heard for years that the characters of S_n are always ordinary integers, and now I finally know why!

Proposition (real elements): Let x be an element of a finite group G. Every character of G has a real value on x if and only if x is conjugate to its inverse.

This is Problem (2.11) in Isaacs.

Proof: For any character \chi of G and any group element x\in G, we have

\chi(x^{-1}) = \overline{\chi(x)}.

(E.g., Lemma (2.15d) in Isaacs.) If it happens that x and x^{-1} are conjugate, then we also have for any \chi that

\chi(x) = \chi(x^{-1}),

so that \chi(x) = \overline{\chi(x)}, and \chi(x) is real. Conversely, if x and x^{-1} are not conjugate, then they are separated by some character \chi (the irreducible characters span the class functions, so they separate conjugacy classes), so that for this \chi we have \chi(x) \neq \chi(x^{-1}) = \overline{\chi(x)}, and \chi(x) is not real. 🦊
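
To see the dichotomy in the smallest possible setting: in an abelian group, conjugacy is equality, so x is conjugate to x^{-1} exactly when x^2 = 1. Here is a minimal Python check of the proposition for cyclic groups, whose irreducible characters are \chi_j(x^k) = \zeta_n^{jk} (a sketch, with a numerical tolerance of my choosing):

```python
import cmath

for n in range(2, 8):
    zeta = cmath.exp(2j * cmath.pi / n)    # primitive nth root of unity
    for k in range(n):                     # the element x^k of the cyclic group C_n
        conjugate_to_inverse = (k == (-k) % n)   # in an abelian group: x^k = x^{-k}
        all_values_real = all(abs((zeta ** (j * k)).imag) < 1e-9 for j in range(n))
        assert conjugate_to_inverse == all_values_real
```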

Proposition (rational elements): With the same notation as the previous proposition, \chi(x) is rational for every character \chi if and only if x is conjugate to x^m for every integer m prime to the order of G.

This is Problem (2.12) in Isaacs.

Proof: The idea is the same as the above, except instead of complex conjugation (which generates the Galois group of \mathbb{C}/\mathbb{R}), we’re working with the whole Galois group of a cyclotomic field big enough to contain all of the character values. Specifically:

Let the order of G be n, let \zeta_n be a primitive nth root of unity, and consider the Galois field extension \mathbb{Q}(\zeta_n)/\mathbb{Q}. Any character value of G is a sum of nth roots of unity, so it’s contained in this extension. Furthermore, if m is any integer prime to n, then \zeta_n \mapsto \zeta_n^m determines a Galois automorphism \sigma_m, and every Galois automorphism has this form. By choosing a diagonal basis for a given group element x in a given representation with character \chi, so that x is represented as a diagonal matrix with various nth roots of unity on the diagonal, we can see that \chi(x^m) is the result of applying this same Galois automorphism \sigma_m to \chi(x), i.e.,

\chi(x^m) = \sigma_m(\chi(x)).

Thus if x is conjugate to x^m for all m prime to n, then

\chi(x) = \chi(x^m) = \sigma_m(\chi(x)),

and \chi(x) is therefore fixed under all Galois automorphisms of \mathbb{Q}(\zeta_n)/\mathbb{Q}, i.e., it is in \mathbb{Q}. Conversely, if x is not conjugate to x^m for some such m, then there is a character \chi that separates them, and the corresponding Galois automorphism \sigma_m does not fix \chi(x), i.e., it is not in \mathbb{Q}. 👑

This latter result explains why all characters of the symmetric group S_n are rational on all group elements: if m is prime to n!, then m is also prime to the length of any cycle of any element x of S_n, and therefore x^m has the same cycle structure as x, so is conjugate to it! It follows from the above that \chi(x) is rational for any character \chi! (It follows that it’s in fact a rational integer, by the observation that character values are always algebraic integers, because they are sums of roots of unity.) 🌈
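
Here is a brute-force check of that last argument in a small case, as a Python sketch (the helper functions are mine, just for illustration): raising a permutation to a power prime to n! preserves its cycle type, hence its conjugacy class in S_n.

```python
from itertools import permutations
from math import factorial, gcd

def cycle_type(p):
    """Sorted cycle lengths of a permutation p, encoded as a tuple with p[i] = image of i."""
    seen, lengths = set(), []
    for i in range(len(p)):
        if i not in seen:
            j, length = i, 0
            while j not in seen:
                seen.add(j)
                j, length = p[j], length + 1
            lengths.append(length)
    return sorted(lengths)

def perm_power(p, m):
    """The permutation p composed with itself m times."""
    q = tuple(range(len(p)))
    for _ in range(m):
        q = tuple(p[i] for i in q)
    return q

n = 4
for p in permutations(range(n)):
    for m in range(1, factorial(n)):
        if gcd(m, factorial(n)) == 1:
            # same cycle type means conjugate in S_n
            assert cycle_type(perm_power(p, m)) == cycle_type(p)
```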

Gems from Isaacs’ book on characters ii: more degrees

This is a direct followup of yesterday’s post. There is a bunch of great stuff I didn’t know in the exercises of Chapter 2 of Isaacs’ book, again much of it focused on getting control over the degrees of the characters of a finite group. Let me just get right into it:

Proposition: Let A be a finite abelian group. Let \chi be a character of A (not necessarily irreducible). Then

|A|\chi(1) \leq \sum_{a\in A} |\chi(a)|^2 \leq |A|\chi(1)^2.

The first inequality is an equality if all the irreducible components of \chi are distinct; the second is an equality if they are all identical.

This is (essentially) Problem (2.9a). (I’m throwing in the 2nd inequality and the comments about when equality occurs.) This is setup for an epic punchline; see below.

Proof: The quantity

\frac{1}{|A|}\sum_{a\in A} |\chi(a)|^2

is the canonical inner product \langle \chi,\chi\rangle_A on the \mathbb{C}-linear space spanned by the characters of A (equivalently, the space of all functions on A, since class functions on an abelian group are just functions). Now the character \chi is a sum of irreducible characters; because A is abelian, they are all linear, i.e., they each have degree 1, so \chi(1) is how many there are. By the Schur orthogonality relations, the irreducible characters are orthonormal with respect to the inner product \langle \cdot,\cdot\rangle_A.

Picture the space of characters of A as a finite-dimensional (actually |A|-dimensional) Hilbert space, with the irreducible characters an orthonormal basis; then all the characters are the lattice points in the nonnegative orthant. Call the irreducible characters e_1,\dots,e_{|A|}, to suggest thinking of them as a standard orthonormal basis for the space. Then because the character \chi is a sum of \chi(1) (not necessarily distinct) elements of this basis, it satisfies

\chi = \sum x_i e_i,

where each x_i is a nonnegative integer and \sum x_i = \chi(1). In other words, \chi is a lattice point somewhere on the simplex defined by the intersection of the hyperplane \sum x_i = \chi(1) with the nonnegative orthant x_i\geq 0, \;\forall i. If \chi(1) < |A|, then at least |A| - \chi(1) of the x_i’s have to be 0, in which case \chi even lies on one of the lower-dimensional simplices defined by the same equation but summed only over the nonvanishing x_i’s.

In any case, we are now in real Euclidean geometry land! The question is this: given such a point \chi, what’s the biggest and smallest that its norm (which is \sqrt{\langle \chi,\chi\rangle_A}, a.k.a. \sqrt{\sum x_i^2}, because the e_i’s are orthonormal) can be?

I say that the smallest norms occur at the points where \chi(1) distinct x_i’s are 1, the rest zero, and the biggest norms occur at the points where one x_i is \chi(1), the rest zero. My conviction about this rests on picturing the simplices I just described, and maybe you are already convinced by similar geometric reasoning. It amounts to saying that the closest point of a standard simplex \sum x_i = K in Euclidean space to the origin is the center point, and the farthest points from the origin are the vertices. I assume/hope this is a standard thing. (If you prefer algebra: since each x_i is a nonnegative integer, x_i^2 \geq x_i, which gives \sum x_i^2 \geq \chi(1); and \sum x_i^2 \leq \left(\sum x_i\right)^2 = \chi(1)^2, with equality in each case exactly under the stated conditions.)

Anyway, if \chi is a vector in a Euclidean inner product space that is a sum of \chi(1) distinct orthonormal vectors (the minimum case), then \langle \chi,\chi\rangle = \chi(1). If it is \chi(1) times a single unit vector, then \langle \chi,\chi\rangle = \chi(1)^2. So in general, it lies between these two values, reaching the extremes under precisely these two conditions. We get the inequalities in the proposition by multiplying everything by |A|. 🐯
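
If you’d like to see the inequalities numerically, here is a small Python sketch for A = C_4, running over all characters whose irreducible components appear with multiplicity at most 2 (the setup is mine; the characters of C_n are \chi_j(a^k) = \zeta_n^{jk}):

```python
import cmath
from itertools import product

n = 4  # A = C_4
zeta = cmath.exp(2j * cmath.pi / n)
irreducibles = [[zeta ** (j * k) for k in range(n)] for j in range(n)]

for mults in product(range(3), repeat=n):   # multiplicities of the irreducibles in chi
    if sum(mults) == 0:
        continue
    chi = [sum(m * irr[k] for m, irr in zip(mults, irreducibles)) for k in range(n)]
    degree = chi[0].real                    # chi(1) = number of irreducible components
    total = sum(abs(v) ** 2 for v in chi)   # the middle quantity: sum over A of |chi(a)|^2
    assert n * degree - 1e-9 <= total <= n * degree ** 2 + 1e-9
    # left equality exactly when every multiplicity is 0 or 1;
    # right equality exactly when a single irreducible carries all of chi(1)
```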

Okay, you ready for the punchline? Here you go!

Theorem: The degree of any irreducible character of a finite group is bounded above by the index of any abelian subgroup. 😲

This is Problem (2.9b) in Isaacs!

Proof: Let G be a finite group, let A be an abelian subgroup, and let \chi be an irreducible character. Then

|G| = \sum_{g\in G}|\chi(g)|^2 \geq \sum_{a\in A}|\chi(a)|^2 \geq |A|\chi(1),

where the first equality is the fact that \chi is irreducible (so has norm 1 with respect to the inner product on characters of G), the middle inequality is because the summands are all nonnegative reals and the second sum is a sub-sum of the first, and the final inequality is precisely the above proposition applied to the abelian group A\subset G.

Dividing through by |A|, we get [G:A]\geq \chi(1)!!! ✨
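
A sanity check on a small example: S_3 contains the abelian subgroup A_3\cong C_3, of index 2, and indeed the irreducible degrees of S_3 are 1, 1, 2. More generally, any dihedral group contains a cyclic subgroup of index 2, which squares with the fact that all irreducible characters of a dihedral group have degree at most 2.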

Okay again I am out of time! More soon hopefully.

Addendum 6-25-2021: Something I didn’t have time to mention yesterday but wanted to say. Part of the reason the theorem given above was so striking to me is that it resonates a little with a theorem I proved! And I want to discuss the relationship (mainly to give myself the opportunity to properly think through it).

I wrote a paper with Fedor Bogomolov (journal | arXiv) analyzing a pair of representation-theoretic properties we call purely noncommuting (PNC) and strongly purely noncommuting (SPNC). I’m gonna quickly explain our inquiry and mention some of our results, including the relevant one and its proof, and then ponder what it has to do with the above.

Ok:

Two diagonalizable matrices that commute with each other share a full basis of eigenvectors. Two matrices that don’t commute won’t share a full basis of eigenvectors, but they can still share eigenvectors, and their restrictions to the subspace spanned by the common eigenvectors do commute. So you can see “sharing eigenvectors” as “partial commuting”. In this context, we ask: suppose you have two elements x,y in a finite group G that don’t commute. Can you find a representation of G in which the associated matrices “noncommute purely” (i.e., don’t share eigenvectors)? We call a group that has such a representation for each noncommuting pair x,y a purely noncommuting (PNC) group. In this definition, the representation is allowed to depend on the pair x,y. If the group even has a representation that simultaneously avoids any shared eigenvectors for all noncommuting pairs at once, we call it strongly purely noncommuting (SPNC). Our inquiry was an investigation of the questions: Which groups are PNC? Which are SPNC?

The main results of our paper are that: (a) supersolvable groups (including nilpotent groups) are always PNC; (b) nonabelian simple groups are never PNC; (c) metabelian groups are not PNC in general, but if a metabelian group has the property that when its commutator subgroup is decomposed as a product of cyclic groups of prime-power order, the factors are pairwise nonisomorphic, then it is PNC. We also have a pair of other results. One characterizes group-theoretically when a pair of permutations noncommute purely in the standard representation of S_n. The other one is the one I want to talk about.

It’s this: If a finite group G has an irreducible character whose degree exceeds the index of any of its nonabelian subgroups, then it is (not only PNC) but SPNC. (And the representation corresponding to that character realizes it as SPNC, i.e. no two noncommuting elements of G share an eigenvector in that representation.)

This is Proposition 5.1 in both the journal and arXiv versions of our paper. The hypothesis is, incidentally, (sufficient but) not remotely necessary for a group to be SPNC.

Anyway, this theorem is basically a consequence of Frobenius reciprocity. (Proof:) Say the representation space in question (with degree exceeding the index of any nonabelian subgroup) is V, and (hoping for a contradiction) suppose x,y are two noncommuting elements of G that have a common eigenvector in V. Let L be the one-dimensional subspace of V spanned by the common eigenvector. Then L is a one-dimensional representation of the nonabelian subgroup H:=\langle x,y\rangle\subset G, and the number of copies of L inside the restriction of V to H is at least one. On the other hand, by Frobenius reciprocity, the number of copies of L in the restriction of V to H is equal to the number of copies of V in the induced representation \mathrm{Ind}_H^G L. But because L is one-dimensional, this induced representation has dimension [G:H], which by hypothesis is smaller than \dim V because H is nonabelian. So the number of copies of V in \mathrm{Ind}_H^G L must actually be zero, and therefore the number of copies of L in V’s restriction to H must be zero. This is the desired contradiction. 🐞

Okay. So lemme just mention that while this result plays only a minor role in our paper, it is a little nostalgic for me personally because it was chronologically the first fact I proved as part of the research that eventually became my PhD thesis. Anyway, what does it have to do with the above?

Well, ok, the above thing in Isaacs says that irreducible character degrees cannot ever exceed the indices of any abelian subgroups. My theorem just described says that if an irreducible character degree exceeds the indices of all nonabelian subgroups, then something cool (namely SPNCness) happens.

Now it was clear to me when I proved it that the hypothesis of my theorem was a somewhat special property of a group. Being SPNC is already a pretty special property, and the condition of my theorem is sufficient for it, but not necessary, so this condition is strictly rarer. Indeed I did a fair amount of work to construct an interesting example. (The hypothesis is met by minimal nonabelian groups, but there are easier ways to see they are SPNC than the above. The example I constructed, for which the above theorem actually informed me it was SPNC, was a certain subgroup of the affine group AGL(1,\mathbb{F}_{p^2}), where p is a prime congruent to 1 mod 4. See Proposition 1.1.49 of my thesis for the details.) The theorem I learned from Isaacs’ Problem (2.9b) above is making me appreciate a particular dimension of the specialness of this condition.

In general, the little subgroups of a nonabelian finite group tend to be abelian (e.g., the cyclic subgroups), while the bigger subgroups tend to be nonabelian (since as they contain more of the elements, they have a better shot at containing noncommuting pairs). However, it is not in general the case that all the nonabelian subgroups are bigger than all the abelian ones. For example, just take the direct or semidirect product of a little nonabelian group with a big abelian group.

On the other hand, taking into account the above theorem from Isaacs’ Problem (2.9b), you can see that the condition of my theorem actually requires every nonabelian subgroup to be larger than every abelian subgroup. The condition assumes there is an irreducible representation V whose degree is bigger than the index of every nonabelian subgroup H\subset G. But Isaacs’ result says that this degree bounds the index of every abelian subgroup from below. In other words, for the hypothesis to be met, we have to have

\dim V > [G:H]

for every nonabelian subgroup H, but Isaacs tells us that

[G:A] \geq \dim V

for every abelian subgroup A. It follows that |A|<|H| for every abelian A and nonabelian H! 🌵

Gems from Isaacs’ book on characters: controlling degrees

Something made me pick up I. Martin Isaacs’ book on character theory recently, and it is just full of great stuff. As a person who has a published paper in the representation theory of finite groups (journal | arXiv), maybe I expected I wouldn’t get much new info from the introductory chapters, but this is already very false. I’m recording some of what I learned here, and maybe there will be more posts like it in the future.

In this post, I’ll focus on results that gain control over the degrees of irreducible characters of a finite group G.

Definition: Let \chi be a character of G. Then

\mathbf{Z}(\chi) := \{g\in G : |\chi(g)| = \chi(1)\}

is the center of \chi.

The definition is motivated by the fact that if \chi is an irreducible character and \rho is a representation with that character, then \mathbf{Z}(\chi) is exactly the preimage in G of the center of \rho(G). In general (even if \chi is not irreducible), \rho restricts on \mathbf{Z}(\chi) to a linear character times the identity matrix, so that \chi|_{\mathbf{Z}(\chi)} = \chi(1)\lambda for some linear character \lambda of \mathbf{Z}(\chi).

Proposition 1: If \chi is irreducible, then its degree is bounded above by \sqrt{[G:\mathbf{Z}(\chi)]}.

This is Corollary (2.30) in Isaacs. The idea of the proof is to bound the size of the restricted character \chi|_{\mathbf{Z}(\chi)} (with respect to the norm on characters on \mathbf{Z}(\chi)) using the fact that \chi is irreducible on the whole group, and then relate this norm to the degree.

Proof: Because \chi is irreducible, we have

|G| = \sum_{g\in G} |\chi(g)|^2.

Because |\chi(g)|^2 is always nonnegative, we have

\sum_{g\in G} |\chi(g)|^2 \geq \sum_{g\in \mathbf{Z}(\chi)} |\chi(g)|^2.

By the definition of \mathbf{Z}(\chi), we have

\sum_{g\in \mathbf{Z}(\chi)} |\chi(g)|^2 = |\mathbf{Z}(\chi)|\cdot \chi(1)^2.

Chaining this together and dividing by |\mathbf{Z}(\chi)|, we get

[G:\mathbf{Z}(\chi)] \geq \chi(1)^2.

The result follows by taking square roots and noting that \chi(1) is the degree of \chi. 🐳

It also follows from the proof that equality occurs iff \chi vanishes outside of \mathbf{Z}(\chi).

Aside: I just figured out how to put emojis in here and I am very excited to use them as QED symbols.

Proposition 2: We have equality in the situation of Proposition 1 if G/\mathbf{Z}(\chi) is abelian.

This is Theorem (2.31) in Isaacs. The hypothesis can be rephrased as the statement that the image \rho(G) is nilpotent of nilpotency class at most 2. The proof uses this hypothesis to show that \chi must vanish outside of \mathbf{Z}(\chi).

Proof: By the comment following the proof of Proposition 1, this is the same as showing that if G/\mathbf{Z}(\chi) is abelian, then \chi vanishes outside of \mathbf{Z}(\chi). So take g\in G\setminus \mathbf{Z}(\chi).

Because g is not in \mathbf{Z}(\chi) and \chi is irreducible, g’s image in \rho(G) is not central. (By \rho I mean any representation whose character is \chi.) So there exists h\in G with the commutator [g,h] := g^{-1}h^{-1}gh not lying in the kernel of \chi. But on the other hand, the assumption that G/\mathbf{Z}(\chi) is abelian is equivalent to the statement that the entire commutator subgroup [G,G] lies inside \mathbf{Z}(\chi); in particular, [g,h]\in \mathbf{Z}(\chi). So \rho([g,h]) is a scalar matrix that’s not the identity! Call it \zeta I, with \zeta\in\mathbb{C} an appropriate root of unity different from 1.

Then \rho(g[g,h])=\rho(g)\rho([g,h]) = \zeta \rho(g), so that

\chi(g[g,h]) = \zeta \chi(g),

by linearity of the trace. On the other hand, because g[g,h] = h^{-1}gh is conjugate to g, we also have

\chi(g[g,h]) = \chi(g).

Putting these two equalities together, we find that \zeta \chi(g) = \chi(g), and because \zeta \neq 1, this implies \chi(g) = 0. 🌈
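
A concrete example where all of this is visible: the quaternion group Q_8 has a unique irreducible character \chi of degree 2, with \chi(\pm 1) = \pm 2 and \chi = 0 on the six elements outside the center. So \mathbf{Z}(\chi) = Z(Q_8) = \{\pm 1\}, the quotient Q_8/\mathbf{Z}(\chi)\cong C_2\times C_2 is abelian, and sure enough \chi(1) = 2 = \sqrt{[Q_8:\mathbf{Z}(\chi)]}.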

Ok that’s all I have time for today, but I plan on doing more of these soon!

Why nonmodularity makes the co/homology of a quotient so lovely

I had a conversation with Søren Galatius a while back in which he made use of the fact that H^\bullet(X/G;k) \cong H^\bullet(X;k)^G for a particular topological space X that was carrying an action by a particular finite group G. When I wrote down my notes on the conversation, this point struck me as one of those “familiar, but why?” facts. Did I just believe it by analogy with more familiar facts? (Like how if X = \mathrm{Spec}\, A is an affine variety with an action of a finite group G, then X/G = \mathrm{Spec}\, A^G? Or if \Delta is a balanced simplicial poset with an action of a finite group G, then the Stanley-Reisner ring k[\Delta / G] is isomorphic to k[\Delta]^G (see here, Theorem 2.3.1)?) Why was it actually true? And what hypotheses on X and G does it require?

I wrote to Sylvain Cappell with these questions. He said (paraphrasing): in the case that X is a simplicial or CW complex, and the characteristic of k doesn’t divide the order of G, this is basically because if G acts simplicially/cellularly, and if you choose a fine enough triangulation (or CW complex structure) so that the quotient X/G inherits a simplicial (or CW complex) structure, then already C_\bullet(X/G;k)\cong C_\bullet(X;k)^G at the level of chains. There are analogous results for compact Lie groups if the characteristic of k is zero. A reference is Glen Bredon’s book Introduction to Compact Transformation Groups.

So, great! Thank you Sylvain! Clear bird’s eye explanation plus a reference where (when I looked it up) sure enough, it is proven carefully that H^\bullet(X/G;k) \cong H^\bullet(X;k)^G under the hypothesis that \mathrm{char}\,k doesn’t divide |G|. (And the corresponding statement for homology. See Theorem 2.4 in Chapter III of that book.)

But also, something was bothering me. Once I thought about it, it was clear to me that C_\bullet(X/G;k)\cong C_\bullet(X;k)^G at the level of chains. But this conclusion did not seem to require the nonmodularity hypothesis — that \mathrm{char}\,k doesn’t divide |G| — at all! And in fact, it doesn’t! Assuming you’ve cut up X finely enough (and that G acts simplicially/cellularly on this structure), each simplex/cell of X/G is a G-orbit of simplices/cells in X. Meanwhile, chains on X are G-invariant if and only if every simplex/cell in a given orbit has the same coefficient. So mapping a cell of X/G to the sum across the corresponding orbit in X will map a basis of C_\bullet(X/G;k) to a basis of C_\bullet(X;k)^G, implying that C_\bullet(X/G;k) \cong C_\bullet(X;k)^G in all cases. I couldn’t see any reason why the boundary map wouldn’t commute with this isomorphism. Doesn’t that mean that both chain complexes C_\bullet(X/G;k) and C_\bullet(X;k)^G have the same homology?

So why did both Cappell and Bredon (the latter in a textbook) require the nonmodularity hypothesis? (Bredon’s proof needs it — more below. But did Cappell’s sketch show it was actually superfluous?)

I actually wrote Cappell with this question last night. He hasn’t replied yet, but that’s ok because as soon as I clicked “send” and went to brush my teeth, I figured it out.

Yes, C_\bullet(X/G;k) and C_\bullet(X;k)^G are isomorphic as chain complexes in all cases. But the homology and cohomology of the latter aren’t necessarily H_\bullet(X;k)^G and H^\bullet(X;k)^G. That’s the step that requires the nonmodularity hypothesis.

For the rest of this, I’ll concentrate on homology just to keep things simpler. (I know I put cohomology in the headline and the lede! Sorry to be clickbaity! Similar arguments should work for cohomology.) I’ll outline Bredon’s direct proof, under the nonmodularity hypothesis, that H_\bullet(X/G;k) \cong H_\bullet(X;k)^G (it’s short), and then I’ll show how Cappell’s sketch can be completed to a proof by identifying H_\bullet(X;k)^G with the homology of the complex C_\bullet(X;k)^G under the same hypothesis. I think that morally they’re really the same proof: I need to use a little group cohomology to get that H_\bullet(X;k)^G \cong H_\bullet(C_\bullet(X;k)^G) from the nonmodularity assumption, and I think that the reasoning behind the necessary group cohomology is basically the same as Bredon’s argument anyway, so the alternative proof is secretly the same proof hiding behind fancier machinery. But still I wanted to record it because it was a new connection for me.

Okay: consider the map on chains t: C_\bullet(X/G;k) \rightarrow C_\bullet(X;k)^G given on simplices \Delta by

t(\Delta) = \sum_{g\in G} g(\Delta).

(This map is closely related to the isomorphism on chains C_\bullet(X/G;k)\rightarrow C_\bullet(X;k)^G described above, where you send a simplex of X/G to the sum across the corresponding orbit in X, but it differs because if \Delta has a nontrivial stabilizer, then t(\Delta) will be a nontrivial multiple of the sum across the corresponding orbit in X. In general, t is not an isomorphism unless |G| is invertible in k — a first clue that this hypothesis matters.)

This map t can be composed with the inclusion C_\bullet(X;k)^G\hookrightarrow C_\bullet(X;k). Let me also call this composition t since all I did was enlarge the codomain. Then the induced map on homology t_*:H_\bullet(X/G;k) \rightarrow H_\bullet(X;k) is called the transfer. The image lands inside H_\bullet(X;k)^G. Meanwhile, there’s a very canonical map in the other direction: the map \pi_*:H_\bullet(X;k) \rightarrow H_\bullet(X/G;k) induced from the quotient map \pi:X\rightarrow X/G. If you restrict \pi_* to H_\bullet(X;k)^G, both compositions t_*\pi_* and \pi_*t_* end up being multiplication by |G|, respectively on H_\bullet(X;k)^G and H_\bullet(X/G;k). (Basically you find this out by chasing chains through the definitions of \pi and t. I’m omitting the details, but they’re in Bredon if you want to look.) If |G| is invertible in k (the nonmodular assumption), then t_*\pi_* and \pi_*t_* are both invertible, and it follows that t_* and \pi_* are themselves invertible. So they’re isomorphisms between H_\bullet(X/G;k) and H_\bullet(X;k)^G. This is the proof found in Bredon’s book.

Well and good! Ok. What’s the alternative proof based on Cappell’s point that C_\bullet(X/G;k) and C_\bullet(X;k)^G are isomorphic as chain complexes? Well, H_\bullet(X/G;k) is the homology of the complex C_\bullet(X/G;k), by definition. So, our hope is that we can get H_\bullet(X;k)^G as the homology of the chain complex C_\bullet(X;k)^G. Here, we meet an obstruction, but we can control it:

By definition, we’ve got H(X;k) = \mathrm{ker}\,\partial / \mathrm{im}\,\partial, where \partial is the boundary operator of the chain complex C_\bullet(X;k). (I’m suppressing subscripts because I’m finding them distracting to the main point. I hope it doesn’t bother you and I apologize if it does.) In other words, we’ve got a short exact sequence

0\rightarrow \mathrm{im}\,\partial \rightarrow \mathrm{ker}\,\partial \rightarrow H(X;k) \rightarrow 0.

Now take G-invariants. The functor \left(-\right)^G is left-exact (e.g., Exercise 6.1.1 in Weibel), and its right derived functor is the group cohomology H^*(G,-). Therefore, there is a long exact sequence

0\rightarrow(\mathrm{im}\,\partial)^G \rightarrow (\mathrm{ker}\,\partial)^G \rightarrow H(X;k)^G \rightarrow H^1(G,\mathrm{im}\,\partial)\rightarrow \dots

Okay, what is (\mathrm{ker}\,\partial)^G? It consists of those chains that are both G-invariant and are annihilated by the boundary map \partial, i.e., it is \mathrm{ker}\,\partial \cap C_\bullet(X;k)^G, i.e., it is just the kernel of the boundary map of the complex C_\bullet(X;k)^G. This doesn’t require the nonmodularity hypothesis.

What about (\mathrm{im}\,\partial)^G? This consists of chains that are images of \partial and are also G-invariant. We’d like to identify it as the image of the boundary map of the complex C_\bullet(X;k)^G, but a priori, the latter could be smaller as it consists only of chains that are \partial-images of chains that were already G-invariant. For these to be the same, we’d need to know that any \partial-image that is itself invariant is in fact the \partial-image of an already-invariant chain.

And nonmodularity guarantees this! If x is a G-invariant chain that lies in \mathrm{im}\partial, let y be any \partial-preimage, and let

y' = \frac{1}{|G|}\sum_{g\in G} gy.

We can do this because nonmodularity says |G| is invertible in k. Then y' is G-invariant by construction, and

\partial y' = \frac{1}{|G|}\sum_{g\in G} gx = \frac{1}{|G|}|G|x=x

by the G-invariance of x (and the assumption \partial y = x, and the fact that \partial is G-equivariant). So we have found a G-invariant preimage for x, as desired!
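
The averaging trick is worth seeing in miniature. Here is a toy Python sketch (numpy, with G = \mathbb{Z}/2 acting on \mathbb{R}^2 by swapping coordinates, a setup of my choosing): the averaging operator is an idempotent projection onto the invariants, and it manufactures a G-invariant vector out of an arbitrary one, exactly the way y' is manufactured from y above.

```python
import numpy as np

# G = Z/2 acting on R^2 by swapping the two coordinates
g = np.array([[0.0, 1.0], [1.0, 0.0]])
P = (np.eye(2) + g) / 2           # the averaging operator (1/|G|) * sum over G; needs 1/|G|

assert np.allclose(P @ P, P)      # idempotent: P is a projection
assert np.allclose(g @ P, P)      # everything in its image is G-invariant

y = np.array([3.0, 1.0])          # an arbitrary vector, standing in for the preimage y
y_avg = P @ y                     # the averaged vector, standing in for y'
assert np.allclose(g @ y_avg, y_avg)  # G-invariant by construction
```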

We have thus identified (\mathrm{ker}\,\partial)^G and (\mathrm{im}\,\partial)^G as nothing but the kernel and image of the boundary map on the complex C_\bullet(X;k)^G. So the image of (\mathrm{ker}\,\partial)^G in H(X;k)^G is exactly the homology of C_\bullet(X;k)^G. This proves that H_\bullet(C_\bullet(X;k)^G) injects into H_\bullet(X;k)^G. We can complete the argument by showing this injection is also surjective, i.e., that the map (\mathrm{ker}\,\partial)^G \rightarrow H(X;k)^G is surjective. From the long exact sequence, this in turn will follow if H^1(G,\mathrm{im}\,\partial) = 0. Is it?

In general, no, but again yes under the nonmodularity assumption! The chain groups of C_\bullet(X;k) are k-modules, and so are all their submodules such as the \mathrm{im}\,\partial_\bullet’s etc. If |G| is invertible in k, then multiplication by |G| is an automorphism of any k-module, and it follows that the group cohomology H^n(G,M) vanishes for any n>0 and any k-module M! This is, e.g., Corollary III.10.2 in Kenneth Brown’s book on group cohomology. In particular, if |G| is invertible in k, then H^1(G,\mathrm{im}\,\partial)=0 for any of the \mathrm{im}\,\partial’s. So, this does it!

As I said at the beginning, I don’t really think this is exactly a different proof than Bredon’s – in particular, key logic is hidden inside Corollary III.10.2 from Brown’s book, and if you look at the proof of that proposition, it is very reminiscent of Bredon’s proof quoted above. Indeed, it involves a group-theoretic analog to the map t above, also called the transfer. But nonetheless it definitely clarified my thinking to walk through all this, so I’m sharing it.

Addendum 4/20/21: I noticed that Corollary III.10.2 from Brown is also Proposition 9.40 in Joseph Rotman’s book on homological algebra. Rotman gives a lower-tech proof that does not involve the transfer homomorphism. (Heads up that I think Rotman’s proof has some typos in the indexing.) Later (in Corollary 9.89), Rotman does give the transfer-based proof. But if we replace the call to Brown’s III.10.2 in the above with a call to Rotman’s 9.40, I think that makes the proof given here officially a different one than Bredon’s.

Addendum 7/23/22: First of all, I want to note a correction. When I first wrote this down I didn’t notice the role of the nonmodularity assumption in identifying (\mathrm{im}\,\partial)^G with the image of the restriction of \partial to C_\bullet(X;k)^G, so the original blog post incorrectly stated that this is true all the time. I just corrected this, adding an argument that this holds under the nonmodularity assumption.

Secondly, I want to note that while I wrote down that argument in a computational way, it could also have been phrased in terms of the vanishing of H^1. Indeed, we have the short exact sequence

0\rightarrow \mathrm{ker}\,\partial \rightarrow C_\bullet \rightarrow \mathrm{im}\,\partial \rightarrow 0,

and we can take G-invariants to get

0 \rightarrow (\mathrm{ker}\,\partial)^G \rightarrow (C_\bullet)^G \rightarrow (\mathrm{im}\,\partial)^G \rightarrow H^1(G,\mathrm{ker}\,\partial)\rightarrow\dots,

and nonmodularity implies H^1(G,\mathrm{ker}\,\partial) = 0 just as it does for H^1(G,\mathrm{im}\,\partial). So the G-invariant chains surject onto the G-invariant elements of \mathrm{im}\,\partial, allowing us to see (\mathrm{im}\,\partial)^G as the image of the boundary map’s restriction to (C_\bullet)^G, as desired.

[Aside: actually it seems to me that the computation above, by which I showed this same result by averaging a preimage of x over G, is pretty much the general argument that H^i(G,M) vanishes for i>0 when (multiplication by) |G| is invertible on M, because that exact calculation shows that (-)^G preserves surjectivity, and thus is an exact functor, on \mathbb{Z}[|G|^{-1}]-modules.]

One final note. Tracing through the proof, we see that (\mathrm{ker}\,\partial|_{C_\bullet(X;k)^G})/ (\mathrm{im}\,\partial)^G always injects into H(X;k)^G, while \mathrm{im}\,\partial|_{C_\bullet(X;k)^G} always injects into (\mathrm{im}\,\partial)^G, even in the modular situation. Thus, (\mathrm{ker}\,\partial|_{C_\bullet(X;k)^G}) / (\mathrm{im}\,\partial|_{C_\bullet(X;k)^G}) = H(C_\bullet(X;k)^G) always has a map to H(X;k)^G. But nonmodularity is needed (and sufficient) to guarantee both the injectivity and the surjectivity of this map.

Borel subgroups and Sylow subgroups

I’ve been reading T. A. Springer’s book on linear algebraic groups, and it’s really satisfying because it’s filling in a lot of details about things I’ve heard about for years but only in vague terms. More importantly, it’s filling in the story. I feel like I’ve looked up the definitions of the words “parabolic subgroup” and “Borel subgroup” many times in the past, but somehow they never completely stuck, and I realize it’s because the definitions weren’t connected to a whole picture-of-what’s-going-on for me. Well, now they are. This blog post will record a part of that picture.

The structure theory of connected linear algebraic groups is reminiscent of Sylow theory in finite group theory. Given a connected linear algebraic group over an algebraically closed field, it has “Borel subgroups” and they’re all conjugate, kind of like how a finite group has Sylow p-groups and they’re all conjugate. I think I initially wanted one theory to specialize to the other, and this almost happens, but not quite: if I take G = GL(n,k) (k algebraically closed), then the Borel subgroups are the conjugates of the group of upper triangular matrices, while if I take G=GL(n,\mathbb{F}_p), then the Sylow p-subgroups are the conjugates of the group of unipotent upper triangular matrices. The \mathbb{F}_p-points of the Borel subgroup form the Sylow normalizer.

This makes it look like the parallel is superficial, but I’m learning from Springer’s book that it’s richer than I thought.

In both situations, there is a family of subgroups “coming down from the top of the group” — subgroups of index prime to p in the finite case, and parabolic subgroups in the algebraic case. And there is a family of subgroups “coming up from the bottom” — p-subgroups in the finite case and closed connected solvable subgroups in the algebraic case. There is a fixed-point theorem asserting that an action by one of the “bottom” subgroups on a transitive G-space (in the appropriate category) whose point stabilizers are “top” subgroups has a fixed point. This situation implies in both cases that any given one of the “bottom” subgroups has a conjugate contained in any given one of the “top” subgroups. Finally, there is an “existence theorem” asserting that the two families “meet in the middle”: a minimal member of the “top” family is also a maximal member of the “bottom” — these are the Sylow subgroups in the finite group case and the Borel subgroups in the algebraic case. The fact that they’re all conjugate then falls out as a corollary in the exact same way.

Here’s what I mean, with a little more precision and completeness:

A linear algebraic group G is (for the purposes of this discussion) an affine algebraic variety over an algebraically closed field k, whose elements — i.e., the closed points if G is viewed as a k-scheme; equivalently, the k-points — form a group, such that the product and inversion maps are morphisms of algebraic varieties. Given a subgroup H that is also a closed subset in the Zariski topology on G (i.e., a “closed subgroup”), there is a standard way to regard the coset space G/H as an algebraic variety. The subgroup H is parabolic if G/H is complete as a variety. This is the analog in the category of varieties to being compact as a topological space. The canonical family of examples is projective space and its closed subsets. As usual, any transitive action of G on an algebraic variety can be identified with the action of G on G/H if H is taken to be a point stabilizer; thus parabolic subgroups of a linear algebraic group can be thought of as point stabilizers of transitive actions on complete varieties.

Completeness, like compactness in the topological category, is preserved under surjective morphisms. Meanwhile if H\subset K are closed subgroups of G, there is a surjective morphism of algebraic varieties G/H\rightarrow G/K (that commutes with the action of G). It follows that if H is parabolic, so is K; i.e., the family of parabolic subgroups of G is upward closed in the lattice of closed subgroups of G. This is what I mean when I say that the parabolic subgroups of G are “coming down from the top”.

Now, ordinarily, developments of Sylow theory don’t make this explicit, but there is a parallel. Let G now be a finite group, and let’s call any subgroup H of index prime to p a p-primely big subgroup. (This will be analogous to parabolic subgroups in the algebraic case.) To emphasize the parallel, let’s think of this as actually a property of the transitive G-set G/H: it has length (i.e., cardinality) prime to p. By the usual identification of transitive G-sets with coset spaces G/H, p-primely big subgroups can be thought of as point stabilizers of transitive actions on G-sets of length prime to p.

Like completeness in the algebraic case, primeness to p is a property preserved under surjective maps of G-sets, and exactly as above it follows that the property of being p-primely big is upward closed in the lattice of subgroups of G. (Of course one could see this more directly, but I’m trying to draw a parallel, so I’m seeing “G-set with length prime to p” as an analogue of “complete G-space”.)

Now for the subgroups that “come up from the bottom”. In the finite group case, these are the p-subgroups; this is a downward-closed property because factors of p-powers are p-powers. In the algebraic group case, we consider the closed, connected, solvable subgroups. Since solvability is a downward-closed property, this family is downward-closed in the lattice of closed, connected subgroups of G.

Now a counting argument based on divisibility shows that the action of any p-group on any set of length prime to p has a fixed point. Given a finite group G with a p-subgroup Q and a p-primely big subgroup H, the action of G on G/H can be restricted to Q and it thus has a fixed point. By a standard argument (see proof of Theorem 2 at the link), this shows that Q has a conjugate contained in H.
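
Here is the counting argument in action, as a toy Python sketch (the example is of my choosing): the cyclic 2-group generated by multiplication by 2 on \mathbb{Z}/15 acts on a set of size prime to 2, every orbit size is a power of 2, and a fixed point duly appears.

```python
# Multiplication by 2 on Z/15 generates a cyclic 2-group (2 has order 4 mod 15)
n, g = 15, 2
seen, orbit_sizes = set(), []
for start in range(n):
    if start not in seen:
        orbit, x = set(), start
        while x not in orbit:
            orbit.add(x)
            x = (g * x) % n
        seen |= orbit
        orbit_sizes.append(len(orbit))

assert all(size in (1, 2, 4) for size in orbit_sizes)  # orbit sizes are 2-powers
assert 1 in orbit_sizes  # 15 is odd, so 2-power orbit sizes can't cover it without a fixed point
```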

In direct parallel, the Borel fixed point theorem asserts that a connected, solvable linear algebraic group acting algebraically on a complete algebraic variety has a fixed point. Then, given a linear algebraic group G with a connected, solvable subgroup Q and a parabolic subgroup H, the action of G on G/H restricted to Q thus has a fixed point, and the exact same argument shows that Q has a conjugate contained in H.

Finally, the “first Sylow theorem” asserts that a finite group G of order p^am, p\nmid m has a subgroup of order p^a (this is the Sylow subgroup). Evidently such a subgroup is both maximal among p-subgroups and minimal among p-primely big subgroups. The family of p-subgroups and of p-primely big subgroups “meet in the middle”. Since any given p-subgroup has a conjugate contained in any given p-primely big subgroup, and Sylow p-subgroups are both p-subgroups and p-primely big, it follows that any two of them are conjugate.

In the same way, the theory of linear algebraic groups furnishes us a theorem that a maximal closed, connected, solvable subgroup is parabolic. (So the family of parabolic subgroups and connected, solvable closed subgroups “meet in the middle”. In Springer’s book, it’s Theorem 6.2.7.) These are the Borel subgroups, and it follows in the exact same way that any two of them are conjugate.

Under the hood of the Steenrod 5-lemma

I first ran into the 5-lemma in an algebraic topology class. It was just tossed off with a comment like “as you can check” or something.

Lemma (5-lemma): Suppose you have a commutative ladder diagram like so:

[Commutative ladder diagram: two exact rows A \rightarrow B \rightarrow C \rightarrow D \rightarrow E and A' \rightarrow B' \rightarrow C' \rightarrow D' \rightarrow E', joined by vertical maps a, b, c, d, e.]

If the rows are exact, and a, b, d, and e, are isomorphisms, then c is an isomorphism.

Since then I’ve encountered it in at least 3 different textbooks, and they all think this is something the reader should verify. Chuck Weibel (An Introduction to Homological Algebra) and James Munkres (Elements of Algebraic Topology) both leave it as an exercise. Allen Hatcher (Algebraic Topology) does write out a proof, but he introduces it with the words, “There is really no choice about how the argument can proceed, and it would be a good exercise for the reader to close the book now and reconstruct the proofs without looking.”

This is all perfectly within bounds of course. And, generally, I’m pro-giving the reader things to think through. But the thing is, the statement is so absurdly plausible that it’s hard to muster sufficient doubt and worry to really earnestly take the invitation. I think when I first heard it, I not only believed it instantly on faith but had a little trouble believing that the full strength of the hypotheses was necessary. (In fairness to me, it turns out that the hypotheses are slightly stronger than needed. More below. But I probably thought they could be relaxed more than they really can.)

When I encountered the exercise in Weibel, I wrote out an element-chasing proof. Since that stretch of Weibel was also my introduction to abelian categories (which is supposedly the 5-lemma’s “natural setting,” although it also holds in Grp, see below), I also started to write down a proof that avoided thinking of A, B, \dots as having elements and just applied the axioms of an abelian category; I seem to have abandoned the effort after it started to seem like too much work to be worth standing on ceremony. (In view of Mitchell’s embedding theorem, the element chase already had plenty of generality.)

Years later, I was reading Munkres. He does us the courtesy of calling it the Steenrod 5-lemma. It was exciting to me for it to be associated to a name: this gives it a sense of history. As mentioned above, Munkres leaves the proof as an exercise (he declares that it’s “simple diagram-chasing”). I remembered I’d done it in Weibel, but found I didn’t remember the proof idea at all. I had stored only that it could be broken into parts: if b and d are surjective and a, or was it e?? (see below for the answer!), is injective, then c is surjective; and a dual statement for injectivity. I looked it up in my notes on Weibel; there was the proof, clever and clean; but apparently writing this down had not left much of a permanent dent in my consciousness. Just chase elements; the hypotheses are sitting there waiting for you right when you need them; presto. Not much to leave an impression.

I decided to reflect a little more on the Steenrod 5-lemma to see if I couldn’t get a little more intuition for the situation. Here’s what I came up with.

If you use the full strength of the hypotheses, then a,b,d,e are isomorphisms. If you identify the pairs A, A' etc. along these isomorphisms, the diagram becomes

[Diagram: the ladder collapsed along the identifications, with two horizontal paths A \rightarrow B \rightarrow C \rightarrow D \rightarrow E and A \rightarrow B \rightarrow C' \rightarrow D \rightarrow E sharing A, B, D, E, and the vertical map c: C\rightarrow C' in the middle.]

The statement is that if the two horizontal paths are exact, then c is an isomorphism.

As far as concerns c, the relevance of exactness at B is that the two maps B\rightarrow C and B\rightarrow C' must have the same kernel (namely, the image of A\rightarrow B). This means that the image of B in C and C' is the same. Similarly, exactness at D is the statement that C\rightarrow D and C'\rightarrow D have the same image (the kernel of D\rightarrow E). We may as well replace B with its quotient by A, so that it’s being embedded in C and C', and replace D with the common image of C and C':

[Diagram: B \hookrightarrow C \twoheadrightarrow D along the top, B \hookrightarrow C' \twoheadrightarrow D along the bottom, with the vertical map c: C\rightarrow C' in the middle.]

So, finally making use of exactness at C, C', we can conclude that C and C' are both extensions of I = \mathrm{Im} by K = \mathrm{Ker}. The Steenrod 5-lemma thus comes down to the assertion that if you have groups C and C' containing a common normal subgroup K, and a group homomorphism C\xrightarrow{c} C' acting as the identity on K and inducing an isomorphism of the quotients C/K and C'/K, then c is an isomorphism.

(I hopped to the category of [not necessarily abelian] groups there; hope you don’t mind. Abelian groups come out as a special case, and then modules over an arbitrary ring follow immediately because a module homomorphism is an isomorphism if and only if it’s an isomorphism of the underlying abelian groups. I’m happy to stop at module categories, but if you like, use Mitchell’s embedding theorem to get more generality still.)

The above italicized assertion feels to me like the intuition I was going for. The map c is surjective because (1) its restriction to K is surjective, and also (2) it hits every coset of K in C' since the induced map C/K\rightarrow C'/K is surjective. It’s injective because (1) injectivity of C/K\rightarrow C'/K implies the kernel of c is contained in K, whereupon (2) injectivity of the restriction to K shows the kernel of c is trivial.

Since this argument cleanly separates the ways we conclude injectivity and surjectivity for c, it also illuminates the way the lemma can be broken apart into separate statements for injectivity and surjectivity. Back to the original diagram, and assuming exact rows but dropping the assumptions on a,b,d,e for a moment, what the above train of thought shows is that surjectivity of c comes from surjectivity of maps on kernels and quotients, and ditto with injectivity replacing surjectivity everywhere. The kernels are the images of B and B' in C and C' respectively. The quotients are the images of C and C' in D and D'. I’m using boldface to distinguish these kernels and quotients from other kernels and quotients that come up in what follows; the last two sentences should be taken as definitions-of-boldface-kernels-and-quotients. The question is, “what assumptions on a, b, d, e allow us to conclude that the maps on kernels and quotients are surjective (resp. injective)?”

Addressing this question does force me to diagram-chase, but at least for me personally, the framework articulated here gives the diagram-chase more animating intuition. In particular, we can keep track of which hypotheses are needed to guarantee that the map on kernels, resp. quotients, is surjective, resp. injective.

To guarantee that the map on kernels is surjective, what do we want? It is enough for b to be surjective: take any element in C' coming from B'; pull it back along b to B; and then push it forward to C to find a preimage. (The general principle: surjectivity of the map b implies surjectivity of the induced map on quotients of B, B'. The fact that the kernels are quotients of B, B' illustrates why I’m using boldface.) On the other hand, to guarantee the map on quotients is surjective, we need more than surjectivity of d, since these quotients are subgroups of D, D'. (General principle: surjectivity of a map does not imply surjectivity of an induced map on subgroups.) But surjectivity of d plus injectivity of e is enough: take an element of \mathrm{Im}(C'\rightarrow D') = \mathrm{Ker}(D'\rightarrow E') and pull it back to D, possible because d is surjective; does it lie in \mathrm{Im}(C\rightarrow D) = \mathrm{Ker}(D\rightarrow E)? Yes, because it must map to zero after pushing forward to E and then down along e, but since e is injective, it was already zero in E.

To summarize, surjectivity of b implies surjectivity on the kernels, while surjectivity of d plus injectivity of e implies surjectivity on the quotients. So together, these conditions imply surjectivity of c.

Dually, injectivity of d is already enough to imply injectivity of the quotients. (Injectivity of a map does imply injectivity of an induced map on subgroups.) Meanwhile, injectivity of b plus surjectivity of a implies injectivity of the kernels. (Again, just injectivity of b wasn’t enough, because injectivity of a map does not imply injectivity of an induced map on quotients.) We can see this because surjectivity of a means that everything that becomes zero in B'\rightarrow C' came from A, so it already becomes zero in B\rightarrow C. So together, these conditions imply injectivity of c.

Burnside’s counting lemma via characters

I just noticed that there is a very conceptually transparent proof of Burnside’s counting lemma via character theory! Here’s a formulation of the statement. (I have heard it is not really due to Burnside; Frobenius or something.)

Burnside’s counting lemma: If a finite group G acts on a finite set X, the number of orbits for the action is equal to the average number of fixed points of the group elements.

The proof I’ve seen before is a very pretty elementary counting argument. You count the cardinality of the set \{(g,x): gx=x\} two different ways. Indexing over G first, you get

\sum_{g\in G} \left| X^g \right|,

where X^g is the set of fixed points of g in X. Indexing over X first instead, organizing the sum into orbits, and invoking the orbit-stabilizer theorem, you get

\sum_{x\in X}\left|G_x\right| = \sum_{\theta\in \mathcal{O}} |\theta|\left|G_x\right| = |\mathcal{O}||G|,

where \mathcal{O} is the set of orbits, and the x in the middle expression is any element of \theta. Equating the two counts and dividing by |G|, you get the claimed result

|\mathcal{O}| = \frac{1}{|G|}\sum_{g\in G} \left|X^g\right|.

I know I’m late to the party with this, but today was the first time I noticed that the thing on the right looks character-theoretic. I wondered if the result could be understood in those terms. Of course the answer is yes, and the argument is maybe even more straightforward to remember (if less elementary) than the above. The characters do all the work.

Proof by characters: Let V be the permutation representation of G on a complex vector space with basis X, and let \chi be its character. A vector in this representation is fixed iff it has constant coefficients across each orbit; thus the dimension of the fixed-point subspace V^G is equal to the number of orbits |\mathcal{O}|.

On the other hand, the dimension of V^G can also be calculated as the multiplicity of the trivial character of G (call it \psi) in \chi, which is the inner product \langle \psi,\chi\rangle. Meanwhile, \chi(g) is exactly \left| X^g\right| since V is a permutation representation with permutation basis X. Thus,

\langle \psi,\chi\rangle = \frac{1}{|G|} \sum_{g\in G} \overline{1}\cdot \left|X^g\right| = \frac{1}{|G|}\sum_{g\in G} \left|X^g\right|.

So, equating, you get

|\mathcal{O}| = \dim V^G = \frac{1}{|G|}\sum_{g\in G} \left|X^g\right|,

as was to be shown!
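
For good measure, here is Burnside’s lemma verified on a classic example (a minimal Python sketch, with names of my choosing): G = C_4 acting by rotation on necklaces of 4 beads in 2 colors. Both counts give 6 orbits.

```python
from itertools import product

n, c = 4, 2  # the cyclic group C_n acting by rotation on strings of n beads in c colors
necklaces = list(product(range(c), repeat=n))

def rotate(neck, r):
    return neck[r:] + neck[:r]

# count orbits directly, via a canonical (lexicographically least) representative
orbits = {min(rotate(neck, r) for r in range(n)) for neck in necklaces}

# Burnside: average the number of fixed necklaces over the group
total_fixed = sum(1 for r in range(n) for neck in necklaces if rotate(neck, r) == neck)
assert len(orbits) == total_fixed // n == 6
```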

Cohen’s theorem on noetherian rings

I recently inherited a copy of Nagata’s book Local Rings, and I’ve already learned a new theorem!

Theorem of Cohen: A commutative ring is noetherian if and only if all its prime ideals are finitely generated.

This is cool because if you, like me, have ever been sad about the fact that noetherianity is not a local property of a ring, this is a sort of substitute. (NB: the axiom of choice, in the form of Zorn’s lemma, is involved.)

There is a naive reason you might hope the theorem is true, that is not the real reason, but it’s sort of the right idea: you might hope that because any non-finitely-generated (henceforth, “non-f.g.”) ideal is contained in a maximal ideal, non-f.g.ness must show up on a maximal.

This is wrong, and actually, the statement of the theorem becomes false if you reduce the quantification from all primes to just maximals. At the bottom of this post I’ll give an example to show this, due to Cory Colbert. The problem is that a non-f.g. ideal can live inside an f.g. ideal. (Silly example: take your favorite non-f.g. ideal inside your favorite non-noetherian ring. It’s inside the unit ideal. For a more interesting case, see Cory’s example below.) Thus, the family of non-f.g. ideals is not upward-closed, and there’s no reason for a failure of noetherianity to show up on a maximal.

However, it’s approximately the right idea. Non-f.g.ness does not propagate upward because f.g.ness doesn’t simply propagate downward. But what’s true is that f.g.ness does propagate downward to an ideal from two ideals containing it that relate to it in a specific way.

Lemma: If I is an ideal in a commutative ring, and a is an element, and I+(a) and (I:a) are both finitely generated, then I is finitely generated.

Proof: Choose a finite set of generators for I+(a), say f_1,\dots,f_m, and write each one as the sum of an element of I and a multiple of a:

f_i = g_i + ar_i,

with each g_i\in I and each r_i just some ring element. Also choose a finite set of generators h_1,\dots,h_n for (I:a). Then all of g_1,\dots,g_m,ah_1,\dots,ah_n lie in I, and we will show that they generate I. Let x\in I be arbitrary. Then there is a representation

x = \sum p_if_i = \sum \left(p_ig_i + p_iar_i\right)

for x, where the p_i are some ring elements. The key observation is that

x - \sum p_ig_i = \sum p_iar_i

is in I, since both terms on the left are in I. The right side is a\cdot\sum p_ir_i, so it follows that \sum p_ir_i is in (I:a)! Thus, there is a representation

\sum p_ir_i = \sum q_jh_j,

where the q_j are some ring elements. Then we have

x = \sum p_ig_i + \sum q_j(ah_j),

and this is the promised expression of x in terms of the proffered generators g_1,\dots,g_m, ah_1,\dots,ah_n. This completes the proof.
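(To see the recipe of the proof in action on a toy example of my own, where I is of course already f.g.: take I = (x^2,xy) in k[x,y] and a = x. Then I+(a) = (x), with single generator f_1 = x = g_1 + ar_1 where g_1 = 0\in I and r_1 = 1, and (I:a) = (x,y), with generators h_1 = x, h_2 = y. The proof then proffers the generators g_1 = 0, ah_1 = x^2, ah_2 = xy for I, and indeed I = (x^2,xy).)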

With this lemma in hand, the proof of Cohen’s theorem can be briefly summarized as follows. Consider the family of non-f.g. ideals in a ring. It’s not upward closed, so it won’t necessarily reach the maximal ideals themselves, but if it is nonempty, it will have maximal members (this is where Zorn comes in), and what the lemma does (a sort of weaker substitute for upward-closure) is to prove that these maximal members must be prime. Thus non-f.g.ness must show up on a prime if it shows up at all. Here’s the precise argument:

Proof of Cohen’s Theorem: By definition, a noetherian ring can have only f.g. ideals, so in particular, it has only f.g. primes. The converse is the substantive direction. We will (equivalently) prove the inverse: a non-noetherian ring must have a non-f.g. prime. So let A be a non-noetherian ring, and let \mathcal{F} be the family of non-f.g. ideals of A, which is nonempty precisely because A is not noetherian. Note that, given an ascending chain in \mathcal{F}, its union yields an upper bound in \mathcal{F}; we see this as follows. If the union of an ascending chain of ideals in \mathcal{F} were not in \mathcal{F}, i.e. if it were f.g. as an ideal, then the (finite list of) generators would all be found in some finite collection of members of the chain, and then the greatest of these members would contain all the generators and thus equal the whole union. This is a contradiction because the members of the chain were presumed to be in \mathcal{F} whereas the union was presumed not to be. It follows that the union of an ascending chain of non-f.g. ideals is non-f.g. Thus \mathcal{F} satisfies the hypotheses of Zorn’s lemma, so it has a maximal element. Call it I.

We claim I is prime. Indeed, suppose ab\in I, but a\notin I. By the former supposition, b\in (I:a). By the latter supposition, I+(a) contains I properly. Since I is maximal in \mathcal{F}, this proper containment implies that I+(a) is f.g. Since I is not f.g. while I+(a) is, the Lemma implies that (I:a) is not f.g. This means that it cannot contain I properly, again by maximality of I in \mathcal{F}. Since I\subseteq (I:a) always, we conclude I = (I:a). But recalling that b\in(I:a), we can now conclude that b\in I! This proves that I is prime, so there is a non-f.g. prime, completing the proof.

I promised I’d tell you about Cory’s example showing that the statement cannot be quantified only over maximal ideals. Let k be any field. The action will all take place inside k(x,y), the field of rational functions over k in two variables. Let R = k[x,y]_{(x,y)}. Let S = R[y/x,y/x^2,\dots]. Note that (x)S = (x,y)S = (x,y,y/x,y/x^2,\dots)S, because every y/x^i is a multiple of x in S (namely y/x^i = x\cdot(y/x^{i+1})). It follows that S/(x)S = k, so that (x)S is a maximal ideal in S. Finally, consider the localization J = S_{(x)S}.

This is the desired ring. It is local, and its unique maximal ideal is principal (generated by x), so certainly f.g. On the other hand, it is non-noetherian: the non-zero element y is contained in the ideal (x^j) = (x)^j for every j, since y = (y/x^j)x^j; but in a noetherian local ring, the intersection of the powers of the maximal ideal is zero, by the Krull intersection theorem.

Why x^p-a doesn’t factor unless it has a root

I’ve heard the result in the title referred to as “classical,” but I’m actually not sure where to find the proof. It came up in conversation with a collaborator last week, which is why I’m thinking about it. It’s a problem in Michael Artin’s Algebra text, somewhere in the Galois theory chapter, so I have my own argument, which I’m sharing here. I have no idea how it’s usually proven. Same way? Or might there be a proof that handles the characteristic of the field in a uniform way? (Update 8-4-20: there is! It’s essentially the characteristic p proof given here, which it turns out can be easily tweaked to work in all cases. I’ve included it in an addendum. The characteristic-not-p argument given here turns out to be a lot of unnecessary work, although I’m not sorry.)

Let k be a field of arbitrary characteristic. Let a be an arbitrary nonzero element of k, and let f=x^p-a\in k[x], where p is some prime number.

Theorem: Either f is irreducible over k or it has a root in k.
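Before the proof, a quick empirical sanity check (using sympy), factoring x^p-a over small prime fields \mathbb{F}_q, where taking q=p exercises the characteristic p case, and also over \mathbb{Q}:

# Check: x^p - a is irreducible or has a root, over GF(q) and over Q.
from sympy import symbols, factor_list, degree

x = symbols('x')

for p in (2, 3, 5):                  # the prime exponent
    for q in (2, 3, 5, 7, 11):       # the characteristic of the field
        for a in range(1, q):        # the nonzero elements of GF(q)
            _, factors = factor_list(x**p - a, modulus=q)
            degs = [degree(f, x) for f, _ in factors]
            assert 1 in degs or degs == [p], (p, q, a)

for p in (2, 3, 5):                  # the same check over the rationals
    for a in range(1, 40):
        _, factors = factor_list(x**p - a)
        degs = [degree(f, x) for f, _ in factors]
        assert 1 in degs or degs == [p], (p, a)

print("irreducible or has a root, in every case tested")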

Proof: Let \Omega be an algebraic closure of k. In both cases below, the plan is to show that if f is reducible over k, then it has a root in k.

Case 1: \text{char}\, k = p.

In this case, f=(x-\alpha)^p, where \alpha is a pth root of a in \Omega. Suppose there exists a nontrivial factorization of f,

f=gh

with g,h\in k[x]. Then, normalizing g,h to be monic, we have

g=(x-\alpha)^m, h=(x-\alpha)^n,

with m,n\geq 1 and m+n=p. This implies \gcd(m,n)=1: the gcd divides m+n=p, but is at most m<p, so it must be 1. Also, since g,h\in k[x], the constant terms (-1)^m\alpha^m and (-1)^n\alpha^n lie in k, and hence so do \alpha^m and \alpha^n (as -1\in k). Since m,n are relatively prime, there exist integers j,\ell for which jm + \ell n = 1, and then we can conclude (\alpha^m)^j(\alpha^n)^\ell = \alpha^{jm+\ell n} = \alpha\in k. Thus f has a root in k.
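(A concrete instance, for orientation: if p=5 and the factors have degrees m=2 and n=3, take j=-1, \ell=1; then (\alpha^2)^{-1}(\alpha^3)^1 = \alpha, exhibiting \alpha\in k. Note \alpha\neq 0 since a\neq 0.)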

Case 2: \text{char}\, k \neq p.

In this case, f has distinct roots in the algebraic closure \Omega, since the derivative f' = px^{p-1} is relatively prime to f, as p is invertible in k. For a similar reason, the pth roots of unity in \Omega are distinct.

Let L\subset\Omega be a splitting field for f. Let \zeta be a primitive pth root of unity in \Omega, and let \alpha be some root of f in L. Note that \zeta^j\alpha is also a root of f for j=1,\dots,p-1, and is thus in L, so that L contains \zeta^j\alpha / \alpha = \zeta^j; thus L contains k(\zeta). As all roots of f have the form \zeta^j\alpha, we have L=k(\zeta)(\alpha).

Now L/k is a Galois extension. Let \Gamma = \text{Gal}(L/k), and let G be the subgroup \text{Gal}(L/k(\zeta)). If G is nontrivial, then let \tau be a nontrivial element. We have \tau(\alpha) = \zeta^j\alpha for some j\in \{1,\dots,p-1\}: \tau must send \alpha to another root of f, and must move \alpha, since \alpha generates L over k(\zeta). Also, \tau acts trivially on \zeta by definition of G. Thus for any m\in \mathbb{N}, we have

\tau^m(\alpha) = \zeta^{jm}\alpha.

Since p is prime and j\not\equiv 0 \pmod p, the exponent jm traverses all residue classes mod p as m varies, and thus \tau^m(\alpha) = \zeta^{jm}\alpha ranges over every root of f as m varies. Thus the cyclic subgroup \langle \tau\rangle \subset G acts transitively on the roots of f. It follows that f is irreducible over k(\zeta) (each element of the Galois group sends any root to a root of the same irreducible factor of f, so transitivity forces a single factor), and therefore over the smaller field k.

Thus, if f is reducible over k, it must be that G is trivial, and therefore that L=k(\zeta). It follows that \Gamma = \text{Gal}(k(\zeta)/k). An element \tau of \Gamma is therefore determined by its action on \zeta, which it must send to some \zeta^j, j\in \{1,\dots,p-1\}; since composition of such maps corresponds to multiplication of the exponents, the map \tau\mapsto j realizes \Gamma as a subgroup of \mathbb{F}_p^\times. Since the latter group is cyclic (as it is a finite subgroup of the multiplicative group of a field), it follows that \Gamma is cyclic. Let \sigma be a generator.

Now we have \sigma(\alpha) = \zeta^b\alpha for some b\in \{0,\dots,p-1\}, and \sigma(\zeta) = \zeta^m for some m\in \{1,\dots,p-1\}. We will show that f has a root in k.

If m=1, then \zeta is fixed by \sigma, and thus the action of \Gamma = \langle \sigma\rangle on L=k(\zeta) is trivial, in which case L=k(\zeta)=k, and f has a root (in fact, it splits!) in k.

On the other hand, if m\neq 1, then the equation (m-1)j+b = 0, regarded as an equation in \mathbb{F}_p, has a solution for j, since m-1 is nonzero mod p and hence invertible. For this specific j, we have

\sigma(\zeta^j\alpha) = (\zeta^m)^j(\zeta^b\alpha) = \zeta^{mj+b}\alpha = \zeta^j\alpha,

since exponents of \zeta reduce mod p. We conclude that \zeta^j\alpha is fixed by \langle\sigma\rangle = \Gamma. Since this is the entire Galois group of L/k, we conclude that \zeta^j\alpha\in k. Thus f has a root in k. QED.

Addendum 8-4-2020: I encountered the uniform proof since writing this blog post. It’s essentially the characteristic p proof given above, which (I hadn’t noticed) works nearly unchanged in any characteristic. I now forget where I first read it, but just now I’m lifting it from the chapter on Abel’s impossibility theorem in H. Dörrie’s book 100 Great Problems of Elementary Mathematics.

In all cases, the roots of f can be expressed as \zeta^j \alpha, where \alpha is some fixed root and \zeta is a primitive pth root of unity (in the characteristic p case, \zeta=1). But this means that if there is a proper factorization into factors of degrees m,n with m+n=p and m,n\geq 1, then the constant terms of the factors (which reside in k) look like \zeta^M\alpha^m, \zeta^N\alpha^n, for some integers M,N, up to signs, which lie in k and can be absorbed. As above, in Case 1, because m,n are relatively prime, there exist integers j,\ell for which jm + \ell n = 1. Then

(\zeta^M\alpha^m)^j(\zeta^N\alpha^n)^\ell = \zeta^P\alpha

for some integer P. The thing on the right is both in k and a pth root of a! That’s it!
