Make UAs support named character references in all XML docs #2056

Closed
sideshowbarker wants to merge 2 commits into main from xhtml-named-charrefs-no-doctype

@sideshowbarker
Member

@sideshowbarker sideshowbarker commented Nov 17, 2016

Authors choosing to serve HTML documents with an XML MIME type shouldn’t forever be forced to put an obsolete XHTML1 doctype in their documents just to be able to use named character references.

This relates to #2048 but is separated out in the interest of ensuring that this affects-browsers change doesn’t get overlooked by implementors in the midst of the doesn’t-affect-browsers changes in #2048. But I can fold it into #2048 if we think it’s more valuable to have all these changes in one PR.
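
For readers outside the browser context, the underlying problem can be reproduced with any conforming XML parser. The following is a minimal sketch using Python's standard-library `xml.etree.ElementTree` (chosen here only for illustration; it is not what browsers use). The helper name `parse_paragraph` and the sample strings are hypothetical. It shows that an undeclared named reference is rejected, while a numeric reference or an explicitly declared entity is accepted.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def parse_paragraph(source: str) -> str:
    """Parse an XML string and return the text of its root element."""
    return ET.fromstring(source).text


# No doctype: the named reference is undeclared, so the parser rejects it.
no_doctype = f'<p xmlns="{XHTML_NS}">caf&eacute;</p>'
try:
    parse_paragraph(no_doctype)
    named_ref_accepted = True
except ET.ParseError:
    named_ref_accepted = False

# A numeric character reference needs no declaration at all.
numeric = parse_paragraph(f'<p xmlns="{XHTML_NS}">caf&#233;</p>')

# Declaring the entity in an internal DTD subset makes the name usable.
declared = parse_paragraph(
    '<!DOCTYPE p [<!ENTITY eacute "&#233;">]>'
    f'<p xmlns="{XHTML_NS}">caf&eacute;</p>'
)
```

The declaration in the last case plays the same role as the doctype boilerplate discussed in this PR: it is what gives the name a meaning for the parser.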


@annevk
Member

annevk commented Nov 17, 2016

So this is basically #500. I'm supportive of this change, but it requires interest from implementers.

@annevk added the "do not merge yet" (pull request must not be merged per rationale in comment) and "needs implementer interest" (moving the issue forward requires implementers to express interest) labels on Nov 17, 2016
@sideshowbarker force-pushed the xhtml-named-charrefs-no-doctype branch from 384f772 to 24d620a on November 17, 2016 07:30
@sideshowbarker
Member Author

> So this is basically #500.

As far as I understand #500, it’s proposing adding a new public and/or system identifier in addition to the XHTML1 and MathML identifiers the spec already lists (and which this PR removes).

The change in this PR wouldn’t require authors to specify any special identifier at all (other than just <!DOCTYPE html>)—instead we would just make browser behavior for the HTML-served-with-an-XML-mime-type case consistent with the browser behavior for the text/html case.

That is, browsers would just always consistently recognize named character references in any document that ends up in the DOM with a document element in the HTML namespace—regardless of how it got there.

@annevk
Member

annevk commented Nov 17, 2016

Sorry, yeah, I understood your change. I meant that this change would remove the need for the request in that issue.

@zcorpan
Member

zcorpan commented Nov 18, 2016

I think if this change is OK, it should be OK to go all the way and just enable the entities always in XML. Why should they not work in SVG, Atom, etc?

@annevk
Member

annevk commented Nov 18, 2016

Oh yeah, I had not noticed this was limited somehow. I'm not even sure that limitation works in practice.

@sideshowbarker
Member Author

> I think if this change is OK, it should be OK to go all the way and just enable the entities always in XML. Why should they not work in SVG, Atom, etc?

But the HTML spec just defines requirements for HTML documents, right? It doesn’t (yet) anywhere attempt to define requirements for SVG or Atom or any other XML vocabularies.

So as far as the change in this PR goes, are you thinking that we could/should drop the part that says “If the document element of a Document is in the HTML namespace”? And instead have it just say to do that regardless of what the namespace is?

If so, IMHO it would be out of scope for the HTML spec to try to state that requirement for all of XML. And I think others (e.g., the SVG WG) would likely object to the HTML spec stating it.

@annevk
Member

annevk commented Nov 18, 2016

The current bit about DOCTYPEs already affects all of XML. template element processing affects all of XML. script element processing affects all of XML. HTML defines how XML is loaded in browsing contexts too. There's a ton of things HTML defines about XML already and nobody has objected about that thus far.

@sideshowbarker
Member Author

> There's a ton of things HTML defines about XML already and nobody has objected about that thus far.

Yeah I realize that now. Hadn’t thought it through before I responded.

Anyway I’m not personally opposed to making the spec say the entities should work in UAs for any XML. So I’ll take a shot at refining the patch here to actually say that.

@sideshowbarker
Member Author

> Anyway I’m not personally opposed to making the spec say the entities should work in UAs for any XML. So I’ll take a shot at refining the patch here to actually say that.

The section the requirement is currently in is the “The XHTML syntax” section, which is specifically about XHTML and not about XML in general. So I am looking right now for where else in the spec to move it to. In the meantime, guidance is welcome.

@annevk
Member

annevk commented Nov 18, 2016

Well, it says "Parsing XHTML documents" but then it actually defines "XML parser" within that section. Arguably we should rename those sections to more accurately reflect what is going on.

Now that nobody really cares about XHTML anymore, that might be easier to do.

@sideshowbarker
Member Author

> Well, it says "Parsing XHTML documents" but then it actually defines "XML parser" within that section. Arguably we should rename those sections to more accurately reflect what is going on.
>
> Now that nobody really cares about XHTML anymore, that might be easier to do.

OK yeah I’ve never been fond of continuing to forever call these documents “XHTML documents”. Among other reasons, I think when most authors see the term “XHTML document” they think it is still talking about XHTML1, not about anything we define in the current HTML spec.

I think it would be better to instead consistently use something precise like “HTML documents served with an XML mime type” that makes it clear and unambiguous what we actually mean—and to forever retire the term “XHTML” as far as spec usage goes.

Anyway I would be glad to retitle the entire “The XHTML syntax” section and to rework (or move) the contents of it, but it seems the change in this PR doesn’t need to wait on that.

So for now in 2303f8d I moved the requirement about the entities to the Page load processing model for XML files. Let me know if that works.

@annevk
Member

annevk commented Nov 18, 2016

I think the XML parser section is a better fit. Otherwise this would not work for XMLHttpRequest for instance.

@sideshowbarker force-pushed the xhtml-named-charrefs-no-doctype branch from 2303f8d to b5b9bc6 on November 18, 2016 16:35
@sideshowbarker changed the title from "Enable named character refs in XHTML w/o doctype" to "Make UAs support named character references in all XML docs" on Nov 18, 2016
@sideshowbarker
Member Author

> I think the XML parser section is a better fit. Otherwise this would not work for XMLHttpRequest for instance.

OK, b5b9bc6 restores it to there. (And #2062, which can land first without this, actually turns the section into the XML parser section by replacing XHTML in the section titles with just XML.)

@domenic
Member

domenic commented Nov 23, 2016

I know @dominiccooney was working on XML in Blink. Maybe @hsivonen is the correct person to ask for Gecko? Any implementer interest? Seems like a nice simplification.

URL given by this link</a> (this URL is a DTD containing the <a
href="https://www.w3.org/TR/xml/#sec-entity-decl">entity declarations</a> for the names listed in
the <span>named character references</span> section), and should not attempt to retrieve any other
external entity's content. <ref spec=XML></p>

Seems we should make this a MUST if we do this?
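
To make concrete what "a DTD containing the entity declarations" amounts to, here is a rough sketch that generates such declarations and inlines them as an internal DTD subset. This is an illustration only and rests on several assumptions: it uses Python's `html.entities.name2codepoint` table (the HTML 4.01 names, a subset of the full named character references list), the helper names `entity_declarations` and `with_internal_subset` are hypothetical, and inlining the declarations is a different mechanism from the user agent substituting a built-in DTD for an external subset as the quoted spec text describes.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# These five names are predefined by XML itself and need no declaration.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}


def entity_declarations() -> str:
    """Build one <!ENTITY ...> declaration per HTML 4.01 entity name."""
    lines = []
    for name, codepoint in sorted(name2codepoint.items()):
        if name in XML_PREDEFINED:
            continue
        lines.append(f'<!ENTITY {name} "&#{codepoint};">')
    return "\n".join(lines)


def with_internal_subset(root_name: str, body: str) -> str:
    """Prefix an XML document body with a doctype carrying the declarations."""
    return f"<!DOCTYPE {root_name} [\n{entity_declarations()}\n]>\n{body}"


# The declarations are vocabulary-neutral, so they work for SVG as well.
svg_source = with_internal_subset(
    "svg",
    '<svg xmlns="http://www.w3.org/2000/svg">'
    "<title>&copy;&nbsp;2016</title></svg>",
)
title_text = ET.fromstring(svg_source)[0].text
```

Because the declarations do not depend on the document element, the same prefix works for any XML vocabulary, which is the point raised earlier in the thread about SVG and Atom.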

@hsivonen
Member

hsivonen commented Feb 5, 2018

I think the text in the PR isn't specific enough to explain what exactly is expected to happen. @sideshowbarker's comment here indicates that instead of this being an entity resolver hack within the constraints of XML conformance, this would involve patching an XML parser not to be conforming to XML.

I'm reluctant to proceed for four reasons.

  1. XML lost on the Web to the point that there isn't the enthusiasm for serving XHTML as application/xhtml+xml that existed in 2006. In that sense, it seems it's no longer worthwhile to stir this space by trying to solve problems.

  2. If we do want to stir this space, I think we should do XML5 fully with an exact spec instead of ad hoc patching an XML 1.0 parser in an ill-specified manner.

  3. I'm worried about interop with non-browser consumers in the ad hoc patching scenario. In the XML5 scenario, it would be clear that a bunch of systematic activity to write XML5 parsers for various programming languages is needed. With the ad hoc patching scenario, people would face interop problems without a clear way to resolve them. ("You need an XML 1.0 parser with this hack" is a worse story than "You need an XML5 parser".)

  4. The current solution of having to stick particular aesthetically displeasing boilerplate at the top of the XML source already works in browsers and can be explained in terms of an entity resolver configuration for non-browsers. That's less disruptive than sending everyone on the treadmill to patch software in order to have a more aesthetically pleasing boilerplate.
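
As an illustration of the fourth point, the sketch below shows one way a non-browser consumer might configure entity handling while staying within XML 1.0. It assumes Python's standard-library `xml.etree.ElementTree`, whose `XMLParser` exposes an `entity` mapping that is consulted for entities the parser has no declaration for; in practice this only takes effect when the document carries a doctype with an external identifier, which is exactly the boilerplate under discussion. The function name `parse_with_html_entities` and the sample document are hypothetical, and `name2codepoint` covers only the HTML 4.01 names.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

XHTML_NS = "http://www.w3.org/1999/xhtml"


def parse_with_html_entities(source: str) -> ET.Element:
    """Parse XML, resolving HTML entity names through the parser's entity map."""
    parser = ET.XMLParser()
    for name, codepoint in name2codepoint.items():
        parser.entity[name] = chr(codepoint)
    parser.feed(source)
    return parser.close()


# The doctype names an external subset; it is not fetched here, but its
# presence lets undeclared entities reach the parser's entity map.
document = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    f'<html xmlns="{XHTML_NS}"><body><p>a&nbsp;b</p></body></html>'
)
root = parse_with_html_entities(document)
paragraph_text = root.find(f".//{{{XHTML_NS}}}p").text
```

Other XML toolkits expose comparable hooks under different names; the sketch is meant to show the shape of the configuration, not a recommended setup.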

Base automatically changed from master to main January 15, 2021 07:56
@annevk
Member

annevk commented Apr 28, 2021

I suggest we close this. @sideshowbarker?

@sideshowbarker
Member Author

> I suggest we close this. @sideshowbarker?

Yup

@sideshowbarker deleted the xhtml-named-charrefs-no-doctype branch on April 28, 2021 14:36