Rename "XHTML parsing" etc to more-accurate "XML parsing" #2062
I think there's a worthy goal here in clarifying that certain parts of the spec currently say "XHTML" but are meant to apply more broadly to all XML in web browsers. However, I'm not sure I'd go as far as this PR does in attempting to eradicate the concept of XHTML entirely. (Also I think it's worth being careful about how you phrase things to avoid pitchfork-wielding mobs; you're not "retiring XHTML" in the sense of asking browsers to get rid of it; you're phasing out the name XHTML.) In my opinion the name XHTML is still useful, to clarify that we're talking about a specific XML vocabulary. To the extent people think about standards at all when using the word XHTML, I disagree that they think about XHTML1; I hope that most people are aware that the HTML Standard is what defines XHTML these days as something that swallowed both the HTML4 and XHTML1 efforts. Concretely in terms of this PR, I think most of the changes then are not good, as they just remove specificity. It requires a lot more care to go through and find which parts of the spec are actually talking about XML in general as opposed to the specific XHTML vocabulary. At a glance, the "parsing XHTML documents" and "serializing XHTML fragments" sections are the most obvious changes, but most of the others I find in the PR I'd rather not change.
Stated another way, the changes in this PR seem tantamount to me to replacing all mentions of "SVG" with "XML".
I don't think it's quite the same as that. The vocabulary is HTML and there's an HTML and an XML syntax. This is true for SVG too. The vocabulary is SVG and there's an HTML and an XML syntax. Giving the vocabulary a different name in XML is rather weird and mostly a historical thing because it was new at the time and the focus back then was much more on syntax than systems.
I see your point. Maybe it's just what I'm used to, but I still think there's value in having a simple name like "XHTML" for "the XML syntax of HTML", and using that in various sections in the spec.
It seems to me that’s not apt, for the same reasons pointed out in #2062 (comment). To me it seems the way we’re using the “XHTML” label is similar to using a special name like “HSVG” or whatever to label the case where SVG is embedded in a text/html document rather than an XML one.
You’re right—I’ve changed the issue title now to be less provocative. (And thanks for calling me on that—it’s counterproductive to be flame-baiting here.)
I strongly agree there would be if we had evidence showing we have a lot of authors who are actually doing that. But the evidence we actually have shows the opposite. I’ll post another comment with the details, but in the meantime, I want to explicitly assert something implicit in me raising this PR to begin with: I don’t think the underlying use case is common enough or important enough to merit us continuing to give it a special label.
To close the loop here, I withdraw my objection, but I don't want to be the one reviewing this since my instinct is just to leave things as they are, so I'm not a good judge of whether each of the changes is correct. I think we may want to be more careful about preserving the auto-generated IDs, however. And changing the split-filename has a lot of consequences; either we need to create a redirect, or we should just leave it as-is as a historical note.
zcorpan left a comment
In general I think this is fine, but the prose needs some tweaks here and there and we shouldn't change filenames or ids.
source
| <h3>HTML vs XHTML</h3> | ||
| <h3>HTML vs XML</h3> |
I think this changes the expectation about what this section is about, from "HTML as text/html vs HTML as XML" to "HTML vs XML the meta language".
I think in this section we should add a note that HTML's XML syntax was formerly known as XHTML, but that we decided to abandon that terminology since it does not exist for MathML and SVG either.
I think as we transition it's important to acknowledge that XHTML was indeed a thing of sorts.
- <h3>HTML vs XHTML</h3>
+ <h3>HTML vs XML</h3>
I think this changes the expectation about what this section is about, from "HTML as text/html vs HTML as XML" to "HTML vs XML the meta language".
Agreed, yeah; the change is misleading as is.
Maybe "HTML vs XML syntax"?
I changed it to “HTML syntax vs XML syntax”.
I think in this section we should add a note that HTML's XML syntax was formerly known as XHTML, but that we decided to abandon that terminology since it does not exist for MathML and SVG either.
I added a note saying that.
source
| Web browsers. This specification defines the latest HTML syntax, known simply as "HTML".</p> | ||
| <p>The second concrete syntax is the XHTML syntax, which is an application of XML. When a document | ||
| <p>The second concrete syntax is XML. When a document |
source
| are reminded that the processing for XML and HTML differs; in particular, even minor syntax errors | ||
| will prevent a document labeled as XML from being rendered fully, whereas they would be ignored in | ||
| the HTML syntax. This specification defines the latest XHTML syntax, known simply as "XHTML".</p> | ||
| the HTML syntax.</p> |
source
| the HTML syntax.</p> | ||
| <p>The DOM, the HTML syntax, and the XHTML syntax cannot all represent the same content. For | ||
| <p>The DOM, the HTML syntax, and XML cannot all represent the same content. For |
source
| and in the XHTML syntax. Similarly, documents that use the <code>noscript</code> feature can be | ||
| represented using the HTML syntax, but cannot be represented with the DOM or in the XHTML syntax. | ||
| and in XML. Similarly, documents that use the <code>noscript</code> feature can be | ||
| represented using the HTML syntax, but cannot be represented with the DOM or in XML. |
| <p>Implementations that support <span>the XHTML syntax</span> must support some version of XML, | ||
| as well as its corresponding namespaces specification, because that syntax uses an XML | ||
| serialization with namespaces. <ref spec=XML> <ref spec=XMLNS></p> |
I suppose we still want to require support for Namespaces in XML for UAs that support XML?
Yes, I don't think we should remove this. In fact, we have numerous dependencies on XML/XMLNS in the platform. I doubt it's really optional.
- <p>Implementations that support <span>the XHTML syntax</span> must support some version of XML,
- as well as its corresponding namespaces specification, because that syntax uses an XML
- serialization with namespaces. <ref spec=XML> <ref spec=XMLNS></p>
I suppose we still want to require support for Namespaces in XML for UAs that support XML?
Restored
source
| <div w-nodev> | ||
| <h2 split-filename="xml"><dfn>XML</dfn></h2> |
Don't change the filename or the id.
+ <h2 split-filename="xml"><dfn>XML</dfn></h2>
Don't change the filename or the id.
OK, restored those
source
| <div w-nodev> | ||
| <h3>Parsing XHTML documents</h3> | ||
| <h3>Parsing XML documents</h3> |
id="parsing-xhtml-documents"
Added
source
| <!--en-GB--><h3 id="serialising-xhtml-fragments">Serializing XHTML fragments</h3> | ||
| <!--en-GB--><h3 id="serialising-xml-fragments">Serializing XML fragments</h3> |
- <!--en-GB--><h3 id="serialising-xhtml-fragments">Serializing XHTML fragments</h3>
+ <!--en-GB--><h3 id="serialising-xml-fragments">Serializing XML fragments</h3>
don't change the id
OK, reverted the id
source
| <h3>Parsing XHTML fragments</h3> | ||
| <h3>Parsing XML fragments</h3> |
- <h3>Parsing XHTML fragments</h3>
+ <h3>Parsing XML fragments</h3>
id
Added
I'm also in favor of this. I added some minor comments on top of those of @zcorpan.
source
| <h3>HTML vs XML</h3> | ||
| <h3>HTML syntax vs XML syntax</h3> |
We should preserve the ID here. Also, "HTML vs XML syntax" seems more natural?
- <h3>HTML vs XML</h3>
+ <h3>HTML syntax vs XML syntax</h3>
We should preserve the ID here.
oofs, yeah—fixed
Also, "HTML vs XML syntax" seems more natural?
Yes, quite clear in the context—changed to that
source
| <p class="note">The XML syntax for HTML was formerly referred to as "XHTML", but this | ||
| specification does not use that term (among other reasons, because no corresponding term is used | ||
| for the cases of MathML and SVG).</p> |
Maybe "no such term is used for the HTML syntaxes of MathML and SVG"?
OK—changed it to that
source
| Web browsers. This specification defines the latest HTML syntax, known simply as "HTML".</p> | ||
| <p>The second concrete syntax is the XHTML syntax, which is an application of XML. When a document | ||
| <p id="the-xml-syntax">The second concrete syntax is XML. When a document |
Why was this ID added by the way?
Just so there would be something to point to for anybody who wanted a specific reference for the XML syntax. But it’s not strictly necessary and not used internally, so I can just remove it if you think we should.
Yeah, let's remove it then. There's a couple of sections that can be referenced with exposed IDs.
id="the-xml-syntax"
OK, removed
Yeah I shouldn’t have changed those to begin with. But I think we got them all reverted in review.
Yeah, undid that as well
annevk left a comment
It's not clear whether these nits are worth fixing, but I'm not sure why we should change the tone from the original as well.
source
| example, namespaces cannot be represented using the HTML syntax, but they are supported in the DOM | ||
| and in the XHTML syntax. Similarly, documents that use the <code>noscript</code> feature can be | ||
| represented using the HTML syntax, but cannot be represented with the DOM or in the XHTML syntax. | ||
| and in XML. Similarly, documents that use the <code>noscript</code> feature can be |
Should this not be "the XML syntax"? This happens here and a couple times below where what used to say XHTML syntax is now just XML.
Yup, so changed
source
| and in the XHTML syntax. Similarly, documents that use the <code>noscript</code> feature can be | ||
| represented using the HTML syntax, but cannot be represented with the DOM or in the XHTML syntax. | ||
| and in XML. Similarly, documents that use the <code>noscript</code> feature can be | ||
| represented using the HTML syntax, but cannot be represented with the DOM or in XML. |
source
| represented using the HTML syntax, but cannot be represented with the DOM or in XML. | ||
| Comments that contain the string "<code data-x="">--></code>" can only be represented in the | ||
| DOM, not in the HTML and XHTML syntaxes.</p> | ||
| DOM, not in the HTML syntax or XML.</p> |
And here. Should be "in the HTML and XML syntaxes" I think?
got it and the rest too
source
| <p>Generally, when the specification states that a feature applies to <span>the HTML syntax</span> | ||
| or <span>the XHTML syntax</span>, it also includes the other. When a feature specifically only | ||
| or <span>XML</span>, it also includes the other. When a feature specifically only |
source
| objects and their descendant DOM trees, and to serialized byte streams using the <span data-x="the | ||
| HTML syntax">HTML syntax</span> or <span data-x="the XHTML syntax">XHTML syntax</span>, depending | ||
| on context.</p> | ||
| HTML syntax">HTML syntax</span> or <span>XML</span>, depending on context.</p> |
Happy with this when Anne's nits are fixed
Oh, can you re-wrap to 100 cols also?
yup, will do right now
source
| <div w-nodev> | ||
| <h2 split-filename="xhtml"><dfn id="xhtml">XML</dfn></h2> |
Ah, so if we keep this as "The XML syntax", and use <span>the XML syntax</span> above, it would be all consistent again. So I think we want to do that too.
+ <h2 split-filename="xhtml"><dfn id="xhtml">XML</dfn></h2>
Yup, so changed
source
| <p>The above technique is also useful in XHTML, since <code>noscript</code> is not supported in | ||
| <span>the XHTML syntax</span>.</p> | ||
| <p>The above technique is also useful in XML, since <code>noscript</code> is not supported in | ||
| XML.</p> |
source
| is by essentially "turning off" the parser when scripts are enabled, so that the contents of the | ||
| element are treated as pure text and not as real elements. XML does not define a mechanism by | ||
| which to do this.</p> | ||
| syntax</span>, it has no effect in XML. This is because the way it works is by essentially |
source
| represented using the HTML syntax, but cannot be represented with the DOM or in the XML syntax. | ||
| Comments that contain the string "<code data-x="">--></code>" can only be represented in the | ||
| DOM, not in the HTML and XHTML syntaxes.</p> | ||
| DOM, not in the HTML syntax or the XML syntax.</p> |
source
| or <span>the XHTML syntax</span>, it also includes the other. When a feature specifically only | ||
| applies to one of the two languages, it is called out by explicitly stating that it does not apply | ||
| to the other format, as in "for HTML, ... (this does not apply to XHTML)".</p> | ||
| or the XML syntax, it also includes the other. When a feature specifically only applies to one of |
source
| objects and their descendant DOM trees, and to serialized byte streams using the <span data-x="the | ||
| HTML syntax">HTML syntax</span> or <span data-x="the XHTML syntax">XHTML syntax</span>, depending | ||
| on context.</p> | ||
| HTML syntax">HTML syntax</span> or the XML syntax, depending on context.</p> |
<span data-x="the XML syntax">XML syntax</span> (no need to change this from the original either)
source
| from the <span>HTML namespace</span> found in XML documents as described in this specification, | ||
| so that users can interact with them, unless the semantics of those elements have been | ||
| overridden by other specifications.</p> | ||
| <p>Web browsers that support XML must process elements and attributes from the <span>HTML |
source
| using a <a href="#writing">custom format</a> inspired by SGML (referred to as <span>the HTML | ||
| syntax</span>). Implementations must support at least one of these two formats, although | ||
| supporting both is encouraged.</p> | ||
| two authoring formats: one based on <span data-x="xhtml">XML</span>, and one using a <a |
source
| <p>Implementations that support <span>the XHTML syntax</span> must support some version of XML, | ||
| as well as its corresponding namespaces specification, because that syntax uses an XML | ||
| <p>Implementations that support the XML syntax for HTML must support some version of XML, as |
source
| <span>browsing context</span> are exempt from all document conformance requirements other than the | ||
| <a href="#writing">HTML syntax</a> requirements and <a href="#writing-xhtml-documents">XHTML | ||
| syntax</a> requirements.</p> | ||
| <a href="#writing">HTML syntax</a> requirements and <span data-x="xhtml">XML syntax</span> |
I think we should reinstate the "writing-xhtml-documents" ID and subsection (and link it from here).
OK restored the link and restored the heading as:
<h3 id="writing-xhtml-documents">Writing documents in the XML syntax</h3>
source
| is by essentially "turning off" the parser when scripts are enabled, so that the contents of the | ||
| element are treated as pure text and not as real elements. XML does not define a mechanism by | ||
| which to do this.</p> | ||
| syntax</span>, it has no effect in <span data-x="xhtml">the XML syntax</span>. This is because the |
source
| <p>The above technique is also useful in XHTML, since <code>noscript</code> is not supported in | ||
| <span>the XHTML syntax</span>.</p> | ||
| <p>The above technique is also useful in <span data-x="xhtml">the XML syntax</span>, since | ||
| <code>noscript</code> is not supported in <span data-x="xhtml">the XML syntax</span>.</p> |
source
| <p class="note">This section only describes the rules for resources labeled with an <span>HTML | ||
| MIME type</span>. Rules for XML resources are discussed in the section below entitled "<span>The | ||
| XHTML syntax</span>".</p> | ||
| MIME type</span>. Rules for XML resources are discussed in the <span data-x="xhtml">XML |
annevk left a comment
Many nits left relative to the original text and a couple of referencing errors as far as I can tell.
| <p class="note">This section only describes the rules for XML resources. Rules for | ||
| <code>text/html</code> resources are discussed in the section above entitled "<span>The HTML | ||
| syntax</span>".</p> |
I don't see a reason to remove this note or the "Writing XML documents" section following it.
- <p class="note">This section only describes the rules for XML resources. Rules for
- <code>text/html</code> resources are discussed in the section above entitled "<span>The HTML
- syntax</span>".</p>
I don't see a reason to remove this note or the "Writing XML documents" section following it.
OK—restored both
LGTM, but maybe someone should give it another pass with fresh eyes.
This change replaces references throughout the spec to “the XHTML syntax” and “XHTML parsing”, etc., with references instead to “the XML syntax [of HTML]” and “XML parsing”, while adding a couple of notes to help make clear that the term “the XML syntax” is the same thing the term “XHTML” was formerly used for.
671b92a to 27266ab
To make that easier, I went ahead and squashed the commits so there’s only one diff to look at.
source
| <p>Generally, when the specification states that a feature applies to <span>the HTML syntax</span> | ||
| or <span>the XHTML syntax</span>, it also includes the other. When a feature specifically only | ||
| or the <span>XML syntax</span>, it also includes the other. When a feature specifically only |
The span needs to wrap the "the" as well to xref correctly. (We need to make wattsi fail for this...)
oofs, thanks for catching that
| the same way as for an <span>HTML parser</span>.</p> | ||
| <p>For the purposes of conformance checkers, if a resource is determined to be in <span>the XHTML | ||
| syntax</span>, then it is an <span data-x="XML documents">XML document</span>.</p> |
Why is this no longer necessary? "XML documents" here is the DOM concept, and some document conformance differences are stated in terms of that (e.g. noscript). But it might not be clear how that concept applies to a conformance checker without this paragraph.
- <p>For the purposes of conformance checkers, if a resource is determined to be in <span>the XHTML
- syntax</span>, then it is an <span data-x="XML documents">XML document</span>.</p>
Why is this no longer necessary?
I agree it’s necessary now after the round “XML”->“the XML syntax” review changes, so I’ve restored it.
Initially when I had replaced “the XHTML syntax” with just “XML”, this would have become:
For the purposes of conformance checkers, if a resource is determined to be XML, then it is an XML document.
…which didn’t seem to be expressing anything implementable…
source
| <h2 split-filename="xhtml"><dfn id="xhtml">The XHTML syntax</dfn></h2> | ||
| <h2 split-filename="xhtml" id="xhtml"><dfn>The XML syntax</dfn></h2> |
This is another subtle change. The ID used to be on the <dfn>. I think that distinction is important for Wattsi.
source
| MIME type</span>. Rules for XML resources are discussed in the section below entitled "<span>The | ||
| XHTML syntax</span>".</p> | ||
| MIME type</span>. Rules for XML resources are discussed in the <span data-x="the XML syntax">XML | ||
| syntax</span> section.</p> |
Why not put the "the" inside the span and omit data-x? Same for the note below.
source
| <p>The above technique is also useful in XHTML, since <code>noscript</code> is not supported in | ||
| <span>the XHTML syntax</span>.</p> | ||
| <p>The above technique is also useful in <span>the XML syntax</span>, since <code>noscript</code> | ||
| is not supported in there.</p> |
I'll push a fix for these remaining nits.
…el (speaks about HTML documents)
\o/
See #2056 (comment)
The current Parsing XHTML documents, Serializing XHTML fragments, and Parsing XHTML fragments sections define requirements for XML processing, not anything specific to “XHTML”.
And in general continued use of the term “the XHTML syntax” buys us nothing at this point, so the change in this PR replaces all references to that with just “XML”.
When most authors see the term “XHTML document” they think about XHTML1, not about anything we define in the current HTML spec. It’s ambiguous. So this change clears away that ambiguity.