A Rust implementation of twitter-text that parses tweet text using a Pest PEG grammar. Includes bindings for Ruby, Python, Java, C++, Swift, and WebAssembly.
- Entity extraction: URLs, @mentions, #hashtags, $cashtags, and emoji
- Tweet validation: 280 weighted character limit with configurable weights
- Autolinking: Convert entities to HTML links
- Hit highlighting: Highlight search terms in tweet text
- Unicode 17.0: Full emoji support including ZWJ sequences and skin tone modifiers
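
To illustrate what a weighted character limit with configurable weights means, here is a minimal, self-contained sketch. It is not this library's API: `Config`, `WeightedRange`, `weighted_length`, and `is_valid` are hypothetical names, and the ranges and weights are assumed to mirror the published twitter-text v3 configuration. It also ignores URL length substitution, emoji handling, and Unicode normalization, all of which a real implementation must handle.

```rust
/// A code point range (inclusive) that carries its own weight.
/// Hypothetical type, for illustration only.
struct WeightedRange {
    start: u32,
    end: u32,
    weight: u32,
}

/// Hypothetical configuration; field names are assumptions, not the crate's API.
struct Config {
    scale: u32,
    default_weight: u32,
    max_weighted_length: u32,
    ranges: Vec<WeightedRange>,
}

impl Config {
    /// Values assumed to match the twitter-text v3 configuration.
    fn v3_like() -> Self {
        Config {
            scale: 100,
            default_weight: 200,
            max_weighted_length: 280,
            ranges: vec![
                WeightedRange { start: 0, end: 4351, weight: 100 },
                WeightedRange { start: 8192, end: 8205, weight: 100 },
                WeightedRange { start: 8208, end: 8223, weight: 100 },
                WeightedRange { start: 8242, end: 8247, weight: 100 },
            ],
        }
    }
}

/// Sum the weight of every code point, then divide by the scale.
fn weighted_length(text: &str, config: &Config) -> u32 {
    let total: u32 = text
        .chars()
        .map(|c| {
            let cp = c as u32;
            config
                .ranges
                .iter()
                .find(|r| r.start <= cp && cp <= r.end)
                .map_or(config.default_weight, |r| r.weight)
        })
        .sum();
    total / config.scale
}

/// Text is valid when it is non-empty and within the weighted limit.
fn is_valid(text: &str, config: &Config) -> bool {
    let len = weighted_length(text, config);
    len > 0 && len <= config.max_weighted_length
}
```

Under these assumed weights, Latin text counts one unit per character while CJK text counts two, so `weighted_length("hello", &Config::v3_like())` is 5 and the three-character string `"日本語"` weighs 6.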
cargo build
cargo test

# Build everything
bazel build //rust/...
# Run all tests
bazel test //rust/...

| Language | Directory | Requirements | Technology |
|---|---|---|---|
| Ruby | rust/ruby-bindings/ | Ruby 3.3+ | Magnus FFI |
| Python | rust/python-bindings/ | Python 3.12 | PyO3 |
| Java | rust/java-bindings/ | JDK 23+ | Foreign Function & Memory API |
| C++ | rust/cpp-bindings/ | C++17 | cxx.rs |
| Swift | rust/swift-bindings/ | Swift 6.0+ | C FFI |
| WebAssembly | rust/wasm-bindings/ | - | wasm-bindgen |

# Ruby
bazel build //rust/ruby-bindings:twittertext
# Python
bazel build //rust/python-bindings:twitter_text
# Java
bazel build //rust/java-bindings:twitter_text_java_ffm
# C++
bazel build //rust/cpp-bindings/...
# Swift
bazel build //rust/swift-bindings:TwitterText
# WebAssembly
bazel build //rust/wasm-bindings:twitter_text_wasm

- PEG Grammar Parser (rust/parser/): Pest grammar for parsing tweet entities
- Nom Parser (rust/twitter-text/src/nom_parser/): High-performance parser using Nom, tested against the PEG grammar for correctness
- Main Library (rust/twitter-text/): Extraction, validation, autolinking, and highlighting
- Configuration (rust/config/): Character weights and URL length settings
- Conformance Tests (rust/conformance/): Tests against canonical twitter-text test suites

The grammar processes entities in this order to resolve ambiguities:
1. URLs (including t.co short URLs)
2. Hashtags
3. Mentions
4. Cashtags
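
The effect of this ordering can be sketched with a toy extractor that tries recognizers in the same priority at each position. This is only an illustration of ordered matching, not the Pest grammar itself: `Kind`, `match_url`, `match_sigil`, and `extract` are hypothetical names, and the matching rules are deliberately simplified (for example, no word-boundary checks).

```rust
#[derive(Debug, PartialEq)]
enum Kind {
    Url,
    Hashtag,
    Mention,
    Cashtag,
}

/// Toy URL recognizer: a scheme followed by at least one non-whitespace char.
/// Returns the byte length of the match at the start of `s`.
fn match_url(s: &str) -> Option<usize> {
    let scheme = ["https://", "http://"].iter().find(|p| s.starts_with(**p))?;
    let rest = &s[scheme.len()..];
    let body = rest.find(char::is_whitespace).unwrap_or(rest.len());
    if body == 0 { None } else { Some(scheme.len() + body) }
}

/// Toy recognizer for `#tag`, `@name`, `$SYM`: a sigil plus word characters.
fn match_sigil(s: &str, sigil: char) -> Option<usize> {
    let rest = s.strip_prefix(sigil)?;
    let body: usize = rest
        .chars()
        .take_while(|c| c.is_alphanumeric() || *c == '_')
        .map(char::len_utf8)
        .sum();
    if body == 0 { None } else { Some(sigil.len_utf8() + body) }
}

/// Try recognizers in priority order at each position; the first match wins
/// and consumes its text, so later recognizers never see it.
fn extract(text: &str) -> Vec<(Kind, &str)> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < text.len() {
        let s = &text[i..];
        let hit = match_url(s)
            .map(|n| (Kind::Url, n))
            .or_else(|| match_sigil(s, '#').map(|n| (Kind::Hashtag, n)))
            .or_else(|| match_sigil(s, '@').map(|n| (Kind::Mention, n)))
            .or_else(|| match_sigil(s, '$').map(|n| (Kind::Cashtag, n)));
        match hit {
            Some((kind, n)) => {
                out.push((kind, &s[..n]));
                i += n;
            }
            // No entity here: skip one character (by its UTF-8 length).
            None => i += s.chars().next().map_or(1, char::len_utf8),
        }
    }
    out
}
```

For the input `"see https://example.com/#top #rust"`, the URL recognizer consumes `https://example.com/#top` first, so the `#top` fragment is never reported as a hashtag, while the standalone `#rust` is.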
Bazel downloads most of these dependencies; they are listed here to document the versions this implementation targets. The Ruby dependencies are a notable exception, because Bazel does not fully manage them.
- Rust: 1.91.1+
- Bazel: 8.4.2+ (for full build)
- Ruby: 3.3+ (requires libyaml: brew install libyaml on macOS)
- Python: 3.12
- Java: JDK 23+
- LLVM: 17.0.6 (hermetic toolchain via Bazel)
This implementation passes the canonical twitter-text conformance tests in conformance/*.yml. These tests cover:
- Autolink (URL/mention/hashtag linking)
- Extract (entity extraction)
- Validation (tweet validity)
- Hit highlighting
Apache 2.0