Advanced macro support and magic \dots#794
Conversation
| "([!-\\[\\]-\u2027\u202A-\uD7FF\uF900-\uFFFF]" + // single codepoint | ||
| "|[\uD800-\uDBFF][\uDC00-\uDFFF]" + // surrogate pair | ||
| "|\\\\(?:[a-zA-Z]+|[^\uD800-\uDFFF])" + // function name | ||
| "|\\\\(?:[a-zA-Z@]+|[^\uD800-\uDFFF])" + // function name |
There was a problem hiding this comment.
This is going to be dangerous. @ is not actually a valid character for macro nams in default TeX. It only works after \makeatletter, and is intended for internal macros that users don't type.
Overall it is probably safe to include @ by default, or wait for complaints before trying a complete fix; the \@ macro is rare in math.
There was a problem hiding this comment.
Good to have at least one negative reaction -- I was curious what others would think. I see three alternatives:
- Add a
Lexeroption for whether to include@as a letter. Then we could actually support\makeatletterand\makeatother(presumably this would need to be carried in theSettingsobject...). - Have the
cdotsmacro return manually parsedTokens instead of strings. Then we don't need to touch theLexer. - Rename
\@cdotsto something else like\cdotsINTERNALor\latexcdots.
Thoughts/preferences?
There was a problem hiding this comment.
Incidentally, the regex change won't affect @{...} support in tabular (the only use I know of @ as an active char), nor will it affect the most typical usage of \@, namely, \@. (not that KaTeX currently supports \@). What it would break is if you used \@ immediately followed by a letter, which ... I can't imagine doing. So the other option is to leave this change as is for now.
There was a problem hiding this comment.
That is where I came down and for a similar reason: keep @ as is in this commit.
| // \let\DOTSX\relax | ||
| defineMacro("\\DOTSI", "\\relax"); | ||
| defineMacro("\\DOTSB", "\\relax"); | ||
| defineMacro("\\DOTSX", "\\relax"); |
There was a problem hiding this comment.
amsmath defines these "null macros" for other macros to indicate the behavior of \dots preceding them. Indeed, I should have added use of these macros to \iff, \implies, and \impliedby. See new commit.
|
I'm going to try to get this reviewed over the next couple of days. |
|
Thanks @kevinbarabash! I just rebased to fix a conflict with the CJK tweak on |
|
|
||
| it("should consume spaces after macro", function() { | ||
| compareParseTree("\\text{\\foo }", "\\text{x}", {"\\foo": "x"}); | ||
| }); |
There was a problem hiding this comment.
Nice tests. I like how these call out differences in behavior between math mode and text mode.
| "\\foo": "\\bar\\bar", | ||
| "\\bar": "a", | ||
| }); | ||
| }); |
There was a problem hiding this comment.
Not necessary for this PR, but it might be good to have some tests that verify things about multiple expansions involving arguments.
| * Recursively expand first token, then return first non-expandable token. | ||
| * Return the next unexpanded token without removing anything from the | ||
| * stack. Similar in behavior to TeX's `\futurelet`. | ||
| */ |
There was a problem hiding this comment.
I find the term unexpanded token a little confusing, because it's possible that future gets called sometime after expandOnce gets called in which case there may be tokens on the stack that were the result of expanding a token. Maybe stating this in the following way:
Returns the topmost token on the stack, without expanding it.
There was a problem hiding this comment.
Nice rewrite. Implemented.
src/MacroExpander.js
Outdated
|
|
||
| /** | ||
| * Expand next token only once, and leave it on the stack. | ||
| * Returns the token or its expansion. |
There was a problem hiding this comment.
This comment could be a bit clearer, maybe:
Expand the next token only once if possible.
If the token is expanded, the resulting tokens will be push
on to the stack in reverse order and will be return in as an
array, also in reverse order.
If not, the next token will be return without removing it from
the stack.
As a result, as long as there are tokens on the stack, the next
token is at the top of the stack.
Feel free to wordsmith this more.
src/MacroExpander.js
Outdated
| expandOnce() { | ||
| const topToken = this.popToken(); | ||
| const name = topToken.text; | ||
| const macro = (name.charAt(0) === "\\"); |
There was a problem hiding this comment.
macro => isMacro to more clearly indicate that it's a boolean.
| let expansion = this.macros[name]; | ||
| if (typeof expansion === "function") { | ||
| expansion = expansion.call(this); | ||
| } |
There was a problem hiding this comment.
If expansion is a function then it must always return a string with valid TeX code which can contain argument placeholders, is that correct?
There was a problem hiding this comment.
Technically, it could also return an Array that's a reverse-ordered list of tokens. This was functionality from before that I'm preserving, though I'm not sure we'd want to use it exactly that way. (In particular, the reverse-order aspect is weird.) But it is a nice way for a macro to be able to force strange parsing.
| if (!(macro && this.macros.hasOwnProperty(name))) { | ||
| // Fully expanded | ||
| this.stack.push(topToken); | ||
| return topToken; |
There was a problem hiding this comment.
This is a little weird to pop topToken at the start just to push it back on. I can't think of demonstrably better way to do this though.
There was a problem hiding this comment.
I agree, unfortunately on both counts. If I didn't have to consumeSpaces, I could use future() and then popToken after the fully expanded case. Maybe there's another place to put the consumeSpaces... but at least this works.
src/MacroExpander.js
Outdated
| /** | ||
| * Expand the next token once (ignoring initial spaces like `get`), | ||
| * without removing anything from the stack, and return the top token | ||
| * on the stack. Similar in behavior to TeX's `\expandafter\futurelet`. |
There was a problem hiding this comment.
How does it ignore initial spaces if it calls expandOnce right away which only ignores trailing spaces?
There was a problem hiding this comment.
That comment was out-of-date. Updated.
| // expandOnce returns Token if and only if it's fully expanded. | ||
| if (expanded instanceof Token) { | ||
| // \relax stops the expansion, but shouldn't get returned (a | ||
| // null return value couldn't get implemented as a function). |
There was a problem hiding this comment.
I'm not sure what a null return value couldn't get implemented as a function means. Could you give an example?
There was a problem hiding this comment.
Functions (as defined by defineFunction) return a ParseNode or an object that gets converted into a ParseNode by the parser. It's not possible for a function to return null that would just disappear in the final parse tree. Hence it would be impossible to implement a \relax that expands to nothing (which is different to how {} expands).
There was a problem hiding this comment.
So if we return null the parser would think that it's an object an convert it to a ParseNode. I wonder when would we ever want to write a function that returns a null?
There was a problem hiding this comment.
Long-term, I vaguely hope we can transition functions over to macros, as that's what TeX really does. So maybe not too critical... though certainly could be done by tweaking group parsing.
src/macros.js
Outdated
| import utils from "./utils"; | ||
|
|
||
| // This function might one day accept additional argument and do more things. | ||
| function defineMacro(name, body) { |
There was a problem hiding this comment.
Once we get flow in place we'll be able to type this as:
function defineMacro(name: string, body: string | () => string) {
|
Sorry for the delay. Finally went through and implemented all the comments. Thanks for the review!! Incidentally, this PR also fixes a bug in master with |
|
|
||
| // This function might one day accept additional argument and do more things. | ||
| function defineMacro(name, body) { | ||
| function defineMacro(name: string, body: string | () => string) { |
There was a problem hiding this comment.
Cool! Mind adding // @flow at the top of this file?
There was a problem hiding this comment.
Oops, indeed! Clearly still a flow noob...
There was a problem hiding this comment.
Me too, for all that's worth. ;)
This is a re-implementation of #599 (
\dotssupport) but extending the MacroExpander as necessary to build it as a macro instead of a function, following @gagern's outline.Along the way, we get some nice new macro features:
defineMacro) can now be JavaScript functions, withthisbound to theMacroExpander. (Note the slightly different interface from functions, which get a complex context object -- up for discussion!) This lets us extend theMacroExpanderin more powerful ways, and use the following features:future()returns the next token without expansion (similar to\futureletbut without the "let" aspect of assigning to a variation)expandAfterFuture()returns the next token after one level of expansion (equivalent to a careful use of\expandafterand\futurelet)\relaxis now implemented! It stops expansion, but doesn't actually get returned from theMacroExpander.We also get fixes to old KaTeX bugs (with new tests to confirm, some of which used to fail):
\text{\textellipsis !}used to include a space in the middle. The trouble is that we were only eating spaces when in math mode, but in text mode we need to be more careful to eat the spaces that make up part of the command name. This is now consumed during macro expansion, even though it's not technically a macro expansion (\textellipsisis an unexpandable character).\text{\foo }where\foois defined to something -- we got a space, but shouldn't have.\def\foo#1{(#1)}\foo\barshould expand to(\bar)and then expand\bar, whereas the old macro expander would first expand\barand then use its tokens for arguments. For example, this didn't work if\def\bar{}or\def\bar{ }(especially the latter because of bad space ignoring). Also, if\foois a multiargument function,\barshould constitute one argument instead of one per token. (I did a bunch of testing in LaTeX to confirm this behavior.)A few other miscellaneous changes:
Lexer.jsoptionally exportsToken, so that I can doinstanceoftests.consumeSpaces()irreversibly consumes space tokens. This is used to fix the first bug listed above.nextToken()got split intoexpandOnce()andexpandNextToken(). So, e.g.,expandAfterFuture()is equivalent toexpandOnce()and thenfuture(). (This makes for some awkward diffs -- sorry!)\cdots@, because amsmath defines a more complex\cdotsin terms of this symbol.@is now considered a valid character for commands, as if\makeatletteris in effect. I don't think we currently define\@but if we did this change would be annoying.\dots,\cdots, etc. correctly.Here's a texcmp confirming I got at least the tested cases right: