Make UAs support named character references in all XML docs #2056

sideshowbarker · 2016-11-17T07:25:25Z

Authors choosing to serve HTML documents with an XML mime type shouldn’t forever be forced to put an obsolete XHTML1 doctype in their documents just to be able to use named character references.
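
A quick way to see the problem outside a browser is Python's standard-library expat binding, used here only as a stand-in for "a conforming XML 1.0 parser": a numeric reference parses fine under a bare `<!DOCTYPE html>`, but `&nbsp;` is rejected as an undefined entity because nothing declares it. This is a minimal sketch; the `parse_text` helper name is hypothetical.

```python
import xml.parsers.expat as expat


def parse_text(doc: bytes) -> str:
    """Parse `doc` with a plain expat parser and return its character data."""
    chunks: list[str] = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append  # collect text as it arrives
    parser.Parse(doc, True)  # True = this is the final chunk of input
    return "".join(chunks)


# A numeric character reference needs no DTD, so it always parses.
print(repr(parse_text(b"<!DOCTYPE html><html>a&#160;b</html>")))  # 'a\xa0b'

# A named reference under a bare <!DOCTYPE html> is a well-formedness error.
try:
    parse_text(b"<!DOCTYPE html><html>a&nbsp;b</html>")
except expat.ExpatError as err:
    print(expat.ErrorString(err.code))  # undefined entity
```

This is why authors today fall back on an XHTML1 doctype whose public identifier browsers map to a built-in list of entity declarations.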

This relates to #2048 but is separated out in the interest of ensuring this affects-browsers change doesn’t get overlooked by implementors in the midst of the doesn’t-affect-browsers changes in #2048. But I can fold it into #2048 if we think it’s more valuable to have all these changes in one PR.

annevk · 2016-11-17T07:30:04Z

So this is basically #500. I'm supportive of this change, but it requires interest from implementers.

Relates to #2048

sideshowbarker · 2016-11-17T07:41:03Z

> So this is basically #500.

As far as I understand #500, it’s proposing adding a new public and/or system identifier in addition to the XHTML1 and MathML identifiers that the spec already lists (and which this PR removes).

The change in this PR wouldn’t require authors to specify any special identifier at all (other than just <!DOCTYPE html>)—instead we would just make browser behavior for the HTML-served-with-an-XML-mime-type case consistent with the browser behavior for the text/html case.

That is, browsers would just always consistently recognize named character references in any document that ends up in the DOM with a document element in the HTML namespace—regardless of how it got there.

annevk · 2016-11-17T08:32:21Z

Sorry, yeah, I understood your change. I meant that this change would remove the need for the request in that issue.

zcorpan · 2016-11-18T08:05:16Z

I think if this change is OK, it should be OK to go all the way and just enable the entities always in XML. Why should they not work in SVG, Atom, etc?
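
As a rough illustration of what "enable the entities always in XML" could mean for a non-browser consumer, here is a sketch using expat's `UseForeignDTD()`, which makes the parser load an application-supplied DTD even when the document has no DOCTYPE, or only one without an external subset such as `<!DOCTYPE html>`. The three-entry table and the `parse_text_with_builtin_entities` name are hypothetical; a real implementation would declare every name from the spec's named character references list.

```python
import xml.parsers.expat as expat

# Hypothetical three-entry stand-in for the full named-character-reference
# table; a real consumer would declare every name from the spec's list.
BUILTIN_ENTITIES_DTD = b"""
<!ENTITY nbsp "&#160;">
<!ENTITY eacute "&#233;">
<!ENTITY hellip "&#8230;">
"""


def parse_text_with_builtin_entities(doc: bytes) -> str:
    """Parse `doc`, resolving the built-in entities whatever its DOCTYPE or namespace."""
    chunks: list[str] = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    # Load an application-supplied DTD even if the document has no DOCTYPE,
    # or has one without an external subset such as <!DOCTYPE html>.
    parser.UseForeignDTD(True)
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def load_builtin_dtd(context, base, system_id, public_id):
        if context is None:
            # context is None when expat wants DTD content (the external
            # subset or an external parameter entity): feed the built-in
            # table instead of fetching anything.
            sub = parser.ExternalEntityParserCreate(context)
            sub.Parse(BUILTIN_ENTITIES_DTD, True)
        # Any other external entity is left unretrieved; 1 means "carry on".
        return 1

    parser.ExternalEntityRefHandler = load_builtin_dtd
    parser.Parse(doc, True)
    return "".join(chunks)


print(parse_text_with_builtin_entities(b"<!DOCTYPE html><html>caf&eacute;</html>"))  # café
```

Because the table is supplied regardless of namespace, the same function also resolves `&nbsp;` under an SVG root. Limiting it to documents whose root element is in the HTML namespace would take extra work, since expat appears to load the foreign DTD before it reports the root element.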

annevk · 2016-11-18T08:07:53Z

Oh yeah, I had not noticed this was limited somehow. I'm not even sure that limitation works in practice.

sideshowbarker · 2016-11-18T10:58:41Z

> I think if this change is OK, it should be OK to go all the way and just enable the entities always in XML. Why should they not work in SVG, Atom, etc?

But the HTML spec just defines requirements for HTML documents, right? It doesn’t (yet) anywhere attempt to define requirements for SVG or Atom or any other XML vocabularies.

So as far as the change in this PR goes, are you thinking that we could/should drop the part that says “If the document element of a Document is in the HTML namespace”? And instead have it just say to do that regardless of what the namespace is?

If so, IMHO it would be out of scope for the HTML spec to try to state that requirement for all of XML. And I think others (e.g., the SVG WG) would likely object to the HTML spec stating it.

annevk · 2016-11-18T11:09:26Z

The current bit about DOCTYPEs already affects all of XML. template element processing affects all of XML. script element processing affects all of XML. HTML defines how XML is loaded in browsing contexts too. There's a ton of things HTML defines about XML already and nobody has objected to that thus far.

sideshowbarker · 2016-11-18T11:35:16Z

> There's a ton of things HTML defines about XML already and nobody has objected about that thus far.

Yeah I realize that now. Hadn’t thought it through before I responded.

Anyway I’m not personally opposed to making the spec say the entities should work in UAs for any XML. So I’ll take a shot at refining the patch here to actually say that.

sideshowbarker · 2016-11-18T11:41:01Z

> Anyway I’m not personally opposed to making the spec say the entities should work in UAs for any XML. So I’ll take a shot at refining the patch here to actually say that.

The section the requirement is currently in is “The XHTML syntax”, which is specifically about XHTML and not about XML in general. So I am looking right now for where else in the spec to move it to. In the meantime, guidance is welcome.

annevk · 2016-11-18T11:47:42Z

Well, it says "Parsing XHTML documents" but then it actually defines "XML parser" within that section. Arguably we should rename those sections to more accurately reflect what is going on.

Now that nobody really cares about XHTML anymore, that might be easier to do.

sideshowbarker · 2016-11-18T11:59:59Z

> Well, it says "Parsing XHTML documents" but then it actually defines "XML parser" within that section. Arguably we should rename those sections to more accurately reflect what is going on.
>
> Now that nobody really cares about XHTML anymore, that might be easier to do.

OK yeah I’ve never been fond of continuing to forever call these documents “XHTML documents”. Among other reasons, I think when most authors see the term “XHTML document” they think it is still talking about XHTML1, not about anything we define in the current HTML spec.

I think it would be better to instead consistently use something precise like “HTML documents served with an XML mime type” that makes it clear and unambiguous what we actually mean—and to forever retire the term “XHTML” as far as spec usage goes.

Anyway I would be glad to retitle the entire “The XHTML syntax” section and to rework (or move) its contents, but it seems the change in this PR doesn’t need to wait on that.

So for now, in 2303f8d, I moved the requirement about the entities to the “Page load processing model for XML files” section. Let me know if that works.

annevk · 2016-11-18T13:04:13Z

I think the XML parser section is a better fit. Otherwise this would not work for XMLHttpRequest for instance.

sideshowbarker · 2016-11-18T16:41:53Z

> I think the XML parser section is a better fit. Otherwise this would not work for XMLHttpRequest for instance.

OK b5b9bc6 restores it to there. (And #2062, which can land first without this, actually turns the section into the XML parser section by replacing XHTML in the section titles with just XML.)

domenic · 2016-11-23T23:43:13Z

I know @dominiccooney was working on XML in Blink. Maybe @hsivonen is the correct person to ask for Gecko? Any implementer interest? Seems like a nice simplification.

annevk · 2018-02-05T15:29:06Z

source

+  URL given by this link</a> (this URL is a DTD containing the <a
+  href="https://www.w3.org/TR/xml/#sec-entity-decl">entity declarations</a> for the names listed in
+  the <span>named character references</span> section), and should not attempt to retrieve any other
+  external entity's content. <ref spec=XML></p>


Seems we should make this a MUST if we do this?

hsivonen · 2018-02-05T16:32:08Z

I think the text in the PR isn't specific enough to explain what exactly is expected to happen. @sideshowbarker's comment here indicates that instead of this being an entity resolver hack within the constraints of XML conformance, this would involve patching an XML parser not to be conforming to XML.

I'm reluctant to proceed for four reasons.

1. XML lost on the Web to the point that there isn't the enthusiasm for serving XHTML as application/xhtml+xml that existed in 2006. In that sense, it seems to me it's no longer worthwhile to stir this space by trying to solve problems.
2. If we do want to stir this space, I think we should do XML5 fully with an exact spec instead of ad hoc patching an XML 1.0 parser in an ill-specified manner.
3. I'm worried about interop with non-browser consumers in the ad hoc patching scenario. In the XML5 scenario, it would be clear that a bunch of systematic activity to write XML5 parsers for various programming languages is needed. With the ad hoc patching scenario, people would face interop problems without a clear way to resolve them. ("You need an XML 1.0 parser with this hack" is a worse story than "You need an XML5 parser".)
4. The current solution of having to stick particular aesthetically displeasing boilerplate at the top of the XML source already works in browsers and can be explained in terms of an entity resolver configuration for non-browsers. That's less disruptive than sending everyone on the treadmill to patch software in order to have a more aesthetically pleasing boilerplate.
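
The "entity resolver configuration for non-browsers" in the fourth point can be made concrete. A consumer can emulate today's browser behavior by keying on the DOCTYPE's public identifier, in the spirit of the spec text quoted earlier: use a local DTD of entity declarations and don't retrieve any other external entity. The sketch below again uses Python's expat binding; `parse_with_resolver`, the two public identifiers chosen, and the two-entry DTD are assumptions for illustration, not the spec's full lists.

```python
import xml.parsers.expat as expat

# Assumed subset of the public identifiers that get the built-in entity DTD;
# the spec's real list is longer.
KNOWN_PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
}

# Hypothetical two-entry stand-in for the named-character-reference DTD.
ENTITY_DTD = b'<!ENTITY nbsp "&#160;"><!ENTITY copy "&#169;">'


def parse_with_resolver(doc: bytes) -> str:
    chunks: list[str] = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    # Without this, expat never asks for the external DTD subset at all.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    def resolve(context, base, system_id, public_id):
        if context is None and public_id in KNOWN_PUBLIC_IDS:
            # Recognized doctype: substitute the local table for system_id.
            sub = parser.ExternalEntityParserCreate(context)
            sub.Parse(ENTITY_DTD, True)
        # Never retrieve anything else; returning 1 lets parsing continue.
        return 1

    parser.ExternalEntityRefHandler = resolve
    parser.Parse(doc, True)
    return "".join(chunks)


xhtml1 = (
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    b'<html xmlns="http://www.w3.org/1999/xhtml">&copy; 2016</html>'
)
print(parse_with_resolver(xhtml1))  # © 2016
```

The boilerplate doctype is what selects the table here. The proposal in this PR would drop that selector and supply the table unconditionally.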

annevk · 2021-04-28T13:14:56Z

I suggest we close this. @sideshowbarker?

sideshowbarker · 2021-04-28T14:36:53Z

> I suggest we close this. @sideshowbarker?

Yup

annevk added the "do not merge yet" (Pull request must not be merged per rationale in comment) and "needs implementer interest" (Moving the issue forward requires implementers to express interest) labels Nov 17, 2016

Enable named character refs in XHTML w/o doctype

24d620a

Relates to #2048

sideshowbarker force-pushed the xhtml-named-charrefs-no-doctype branch from 384f772 to 24d620a November 17, 2016 07:30

sideshowbarker mentioned this pull request Nov 17, 2016

Make HTML4/XHTML1 Strict doctypes non-conforming #2048

Merged

sideshowbarker mentioned this pull request Nov 18, 2016

Rename "XHTML parsing" etc to more-accurate "XML parsing" #2062

Merged

Make UAs support named charrefs in all XML docs

b5b9bc6

sideshowbarker force-pushed the xhtml-named-charrefs-no-doctype branch from 2303f8d to b5b9bc6 November 18, 2016 16:35

sideshowbarker changed the title ~~Enable named character refs in XHTML w/o doctype~~ Make UAs support named character references in all XML docs Nov 18, 2016

annevk reviewed Feb 5, 2018

Base automatically changed from master to main January 15, 2021 07:56

sideshowbarker closed this Apr 28, 2021

sideshowbarker deleted the xhtml-named-charrefs-no-doctype branch April 28, 2021 14:36

5 participants
