util wrote in java_dev

Read DOM XHTML without net connection?

We have some code that depends on being able to locate elements by ID within an XHTML document. For example, given Document doc = getPage("a.xhtml"); (getPage is pasted below), our code calls Node right = doc.getElementById("right") to get the element with ID "right" in "a.xhtml".

Doing so seems to require a net connection. If the XHTML file contains a DTD declaration, the parser seems to try to open a connection to download that DTD, and it exits with an exception if a connection isn't available. If we omit the DTD declaration, the parser doesn't attempt to open a net connection, but the Document it returns gives null for every call to getElementById, which makes it useless from our code's perspective. So my problem is to find a way to parse the XHTML into a Document without requiring a net connection while still allowing elements to be located by ID. Any suggestions? Suggestions specific to Apache's Xerces classes are welcome.

Is the DTD required because without it the parser cannot distinguish which tags define elements? If not, why is the DTD required for getElementById to work as expected? Is there a way to provide a local copy of the DTD for the parsing stage while still maintaining a DTD declaration in the XHTML that will make sense to non-local Web browsers?

Thank you.

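import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Parses the named XHTML file into a DOM Document. pathForFile is defined elsewhere in our code.
private static Document getPage(String filename) throws SAXException, IOException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    try {
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(pathForFile(filename));
    } catch (ParserConfigurationException pce) {
        // A parser configuration failure is rethrown unchecked.
        throw new RuntimeException(pce);
    }
}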
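
One direction that might address this, offered as a sketch rather than a tested fix for our setup: the DTD matters because it is what declares an attribute to be of type ID, and getElementById only matches attributes the parser knows are IDs. The standard org.xml.sax.EntityResolver hook lets the parser be handed a local copy of the DTD, so the DOCTYPE in the XHTML can keep pointing at the public W3C URL for browsers. In the sketch below, the class name OfflineXhtml, the STUB_DTD string, and the LOCAL_ONLY resolver are all made up for illustration. The stub stands in for a real local copy of the XHTML DTD (which also pulls in several entity files that would need local copies too), and it assumes the page uses the XHTML 1.0 Strict public identifier.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class OfflineXhtml {
    // Hypothetical stand-in for a local copy of the real XHTML DTD.
    // The important line is the ATTLIST that gives "id" the type ID.
    static final String STUB_DTD =
          "<!ELEMENT html (body)>\n"
        + "<!ELEMENT body (div)*>\n"
        + "<!ELEMENT div (#PCDATA)>\n"
        + "<!ATTLIST html xmlns CDATA #IMPLIED>\n"
        + "<!ATTLIST div id ID #IMPLIED>\n";

    // Assumption: the page declares the XHTML 1.0 Strict public identifier.
    static final String XHTML_STRICT = "-//W3C//DTD XHTML 1.0 Strict//EN";

    // Serves the DTD from memory. Any other external entity resolves to
    // empty content, so the parser never falls back to fetching a URL.
    static final EntityResolver LOCAL_ONLY = (publicId, systemId) -> {
        String text = XHTML_STRICT.equals(publicId) ? STUB_DTD : "";
        InputSource source = new InputSource(new StringReader(text));
        source.setSystemId(systemId);
        return source;
    };

    static Document parse(String xhtml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(LOCAL_ONLY);
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        String page =
              "<!DOCTYPE html PUBLIC \"" + XHTML_STRICT + "\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
            + "<div id=\"right\">hello</div>"
            + "</body></html>";
        Element right = parse(page).getElementById("right");
        System.out.println(right == null ? "not found" : right.getTextContent());
    }
}
```

In getPage the same idea would be a single builder.setEntityResolver(...) call before builder.parse(...), with the resolver returning an InputSource opened on a DTD file shipped alongside the application instead of the in-memory stub.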