A dependency-free HTML5 parser for Swift, following the WHATWG HTML parsing specification.
Swift port of justhtml (Python) and justjshtml (JavaScript).
- Full HTML5 Compliance - Passes all 1,831 html5lib tree construction tests
- Zero Dependencies - Pure Swift using only standard library and Foundation
- Cross-Platform - macOS, iOS, tvOS, watchOS, visionOS, and Linux
- CSS Selectors - Query documents using standard CSS selector syntax
- Multiple Output Formats - Serialize to HTML, plain text, or Markdown
- Streaming API - Memory-efficient event-based parsing
- Fragment Parsing - Parse HTML fragments in specific contexts
Add swift-justhtml to your `Package.swift`:

```swift
dependencies: [
    .package(url: "https://github.com/kylehowells/swift-justhtml.git", from: "0.3.0")
]
```

Then add it to your target's dependencies:

```swift
targets: [
    .target(
        name: "YourTarget",
        dependencies: ["justhtml"]
    )
]
```

In Xcode:

- File > Add Package Dependencies...
- Enter: `https://github.com/kylehowells/swift-justhtml.git`
- Select version: 0.3.0 or later
```swift
import justhtml

// Parse an HTML document
let doc = try JustHTML("<html><body><p>Hello, World!</p></body></html>")

// Access the document tree
print(doc.root.children) // [<html>]
```

```swift
// Query with CSS selectors
let paragraphs = try doc.query("p")
let byClass = try doc.query(".intro")
let byId = try doc.query("#header")
let complex = try doc.query("nav > ul > li > a[href]")

// Check if a node matches a selector
let matches = try node.matches(".highlight")
```

```swift
// Get plain text
let text = doc.toText()

// Serialize to HTML
let html = doc.toHTML()
let prettyHtml = doc.toHTML(pretty: true, indentSize: 4)

// Convert to Markdown
let markdown = doc.toMarkdown()
```

```swift
// Parse HTML as if inside a specific element
let ctx = FragmentContext("tbody")
let fragment = try JustHTML("<tr><td>Cell</td></tr>", fragmentContext: ctx)
```

```swift
// Memory-efficient event-based parsing
for event in HTMLStream("<p>Hello</p>") {
    switch event {
    case .start(let tag, let attrs):
        print("Start: \(tag)")
    case .end(let tag):
        print("End: \(tag)")
    case .text(let content):
        print("Text: \(content)")
    case .comment(let text):
        print("Comment: \(text)")
    case .doctype(let name, let publicId, let systemId):
        print("Doctype: \(name)")
    }
}
```

```swift
// Strict mode - throws on first parse error
do {
    let doc = try JustHTML("<p>Unclosed", strict: true)
} catch let error as StrictModeError {
    print("Error: \(error.parseError.code)")
}

// Collect errors without throwing
let doc = try JustHTML("<p>Unclosed", collectErrors: true)
for error in doc.errors {
    print("\(error.line):\(error.column): \(error.code)")
}
```

swift-justhtml implements the WHATWG HTML parsing specification exactly and passes all tests from the official html5lib-tests suite (used by browser vendors), the same as justhtml.
| Test Suite | Passed | Failed |
|---|---|---|
| Tree Construction | 1,831 | 0 |
| Tokenizer | 6,810 | 0 |
| Serializer | 230 | 0 |
| Encoding | 82 | 0 |
| Total | 8,953 | 0 |
The parser has been fuzz tested with millions of randomized and malformed HTML documents to check that it does not crash or hang. The fuzzing covers:
- Random data fuzzing with varying document sizes
- Fragment context fuzzing
- Deep nesting stress tests
- Malformed tag and entity sequences
Run the fuzzer: `swift test --filter fuzzTest`
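The categories listed above could be exercised by a small seeded input generator along the following lines. This is only a sketch and is not taken from the project's test suite: `SplitMix64`, `fragments`, `randomMarkup`, `deeplyNested`, and `fuzz` are hypothetical names, and the parser is passed in as a closure so the snippet stands on its own.

```swift
// A small seeded generator (SplitMix64) so a failing input can be reproduced
// from its seed. Hypothetical helper, not part of swift-justhtml.
struct SplitMix64: RandomNumberGenerator {
    var state: UInt64

    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Illustrative building blocks: well-formed tags mixed with truncated tags,
// unterminated comments, and broken entities.
let fragments = [
    "<div>", "</div>", "<p", "</", "<!--", "-->",
    "&amp", "&#x;", "<table><tr>", "text", "\"", "<![CDATA[",
]

// Concatenates `pieces` randomly chosen fragments into one input string.
func randomMarkup<G: RandomNumberGenerator>(pieces: Int, using rng: inout G) -> String {
    var out = ""
    for _ in 0..<pieces {
        out += fragments.randomElement(using: &rng)!
    }
    return out
}

// Builds a balanced but very deep document for nesting stress tests.
func deeplyNested(depth: Int) -> String {
    String(repeating: "<div>", count: depth) + String(repeating: "</div>", count: depth)
}

// Feeds `iterations` generated inputs to `parse` and returns how many were tried.
func fuzz(iterations: Int, seed: UInt64, parse: (String) -> Void) -> Int {
    var rng = SplitMix64(state: seed)
    var tried = 0
    for _ in 0..<iterations {
        let pieces = Int.random(in: 1...64, using: &rng)
        parse(randomMarkup(pieces: pieces, using: &rng))
        tried += 1
    }
    return tried
}
```

With the library imported, the closure could be something like `{ _ = try? JustHTML($0) }`, so that a crash or hang surfaces as a failed run that can be replayed with the same seed.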
In the benchmark below, swift-justhtml parses slightly faster than the JavaScript implementation and about 4x faster than the Python one:
| Implementation | Parse Time | Comparison |
|---|---|---|
| Swift | 97ms | - |
| JavaScript | 99ms | 1.02x slower |
| Python | 398ms | 4.1x slower |
Benchmark: Parsing 2.5MB of HTML across 5 Wikipedia articles
See Benchmarks/BENCHMARK_RESULTS.md for detailed performance comparison.
| Implementation | Peak RSS | Comparison |
|---|---|---|
| Swift | 103 MB | - |
| Python | 106 MB | 1.03x more |
| JavaScript | 226 MB | 2.2x more |
Benchmark: Average peak memory across 6 test files including 20MB synthetic HTML
See Benchmarks/MEMORY_RESULTS.md for detailed memory comparison.
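The comparison columns in both tables above read as ratios against the Swift figure. As a quick check of that arithmetic, the sketch below recomputes them from the published numbers; `relative` is a hypothetical helper and the rounding precision per row is assumed from how the tables are printed.

```swift
// Ratio of `value` to `baseline`, rounded to `places` decimal places.
// Hypothetical helper for checking the tables, not part of the benchmarks.
func relative(_ value: Double, to baseline: Double, places: Int) -> Double {
    var scale = 1.0
    for _ in 0..<places { scale *= 10 }
    return (value / baseline * scale).rounded() / scale
}

// Parse time in ms, relative to Swift (97 ms)
print(relative(99, to: 97, places: 2))   // JavaScript
print(relative(398, to: 97, places: 1))  // Python

// Peak RSS in MB, relative to Swift (103 MB)
print(relative(106, to: 103, places: 2)) // Python
print(relative(226, to: 103, places: 1)) // JavaScript
```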
| Library | html5lib Pass Rate | Crashes/Hangs | Dependencies |
|---|---|---|---|
| swift-justhtml | 100% (1831/1831) | None | None |
| Kanna | 94.4% (1542/1633) | None | libxml2 |
| SwiftSoup | 87.9% (1436/1633) | Infinite loop on 197 tests | swift-atomics |
| LilHTML | 47.4% (775/1634) | Crashes on 855 tests | libxml2 |
See notes/comparison.md for detailed library comparison.
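The pass rates in the table are passed tests divided by tests run, and the denominators differ between rows (1831, 1633, and 1634). A minimal sketch for recomputing them, taking the counts exactly as listed; `passRate` is a hypothetical helper:

```swift
// Percentage of passed tests, rounded to one decimal place.
// Hypothetical helper; counts are copied from the table above.
func passRate(passed: Int, total: Int) -> Double {
    (Double(passed) / Double(total) * 1000).rounded() / 10
}

print(passRate(passed: 1831, total: 1831)) // swift-justhtml
print(passRate(passed: 1542, total: 1633)) // Kanna
print(passRate(passed: 1436, total: 1633)) // SwiftSoup
print(passRate(passed: 775, total: 1634))  // LilHTML
```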
| Platform | Minimum Version |
|---|---|
| macOS | 13.0+ |
| iOS | 16.0+ |
| tvOS | 16.0+ |
| watchOS | 9.0+ |
| visionOS | 1.0+ |
| Linux | Swift 6.0+ |
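A consuming package would need deployment targets at least as high as the minimums above. The manifest below is a sketch under a few assumptions: `YourPackage` and `YourTarget` are placeholder names, tools version 5.9 is chosen only because that is where `.visionOS` became available in the manifest API, and the product name `justhtml` is inferred from the `import justhtml` line rather than confirmed against the library's own manifest.

```swift
// swift-tools-version: 5.9
import PackageDescription

let package = Package(
    name: "YourPackage", // placeholder
    platforms: [
        // Minimum versions copied from the table above
        .macOS(.v13),
        .iOS(.v16),
        .tvOS(.v16),
        .watchOS(.v9),
        .visionOS(.v1),
    ],
    dependencies: [
        .package(url: "https://github.com/kylehowells/swift-justhtml.git", from: "0.3.0")
    ],
    targets: [
        .target(
            name: "YourTarget", // placeholder
            dependencies: [
                // Explicit form of the "justhtml" shorthand; product name is assumed
                .product(name: "justhtml", package: "swift-justhtml")
            ]
        )
    ]
)
```

Linux has no entry in `platforms`; there the constraint is the toolchain version.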
MIT License - see LICENSE for details.
- Original Python implementation: justhtml by Emil Stenström
- JavaScript port: justjshtml by Simon Willison
- Test suite: html5lib-tests