Inspiration
Reddit moderators constantly battle repost bots and karma-farming accounts that recycle previously successful posts with slightly rewritten titles. Existing moderation systems typically rely on exact text matching, regex rules, or external AI infrastructure, all of which have major limitations.
Traditional approaches fail when:
- Words are reordered
- Synonyms are substituted
- Punctuation changes
- Small edits are introduced
Meanwhile, modern semantic systems often require expensive vector databases, external APIs, GPU infrastructure, and additional backend services that are difficult to operate within Devvit’s lightweight serverless ecosystem.
We wanted to build a fully native Reddit moderation engine capable of detecting rewritten reposts in real time while running entirely inside Devvit using only Redis-native infrastructure.
What it does
Phantom is a launch-ready Devvit moderation engine that detects:
- rewritten reposts,
- karma farming,
- duplicate submissions,
- and repeat spam offenders
using SimHash signatures indexed with LSH banding in Redis, without relying on external AI services or vector databases.
Core Pipeline
    [ New Reddit Post ]
             │
             ▼
┌──────────────────────────┐
│    Text Normalization    │
│  Remove URLs & symbols   │
└──────────────────────────┘
             │
             ▼
┌──────────────────────────┐
│    Weighted Character    │
│    3/4-Gram Shingling    │
└──────────────────────────┘
             │
             ▼
┌──────────────────────────┐
│      64-bit SimHash      │
│   Signature Generation   │
└──────────────────────────┘
             │
             ▼
┌──────────────────────────┐
│  Redis LSH Band Search   │
│   Candidate Retrieval    │
└──────────────────────────┘
             │
             ▼
┌──────────────────────────┐
│     Hamming Distance     │
│       Verification       │
└──────────────────────────┘
             │
      ┌──────┴────────────┐
      ▼                   ▼
 Auto Remove      Moderator Report
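The first two stages of the pipeline can be sketched as follows. This is a minimal illustration rather than Phantom's actual code: the function names `normalize` and `shingles`, the regular expressions, and the weights (1 per 3-gram, 2 per 4-gram) are all assumptions chosen for the example.

```typescript
// Hypothetical helpers: names, regexes, and weights are illustrative assumptions.

/** Lowercase, strip URLs and non-alphanumeric symbols, collapse whitespace. */
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/https?:\/\/\S+/g, " ") // drop URLs
    .replace(/[^a-z0-9\s]/g, " ")    // drop punctuation and symbols (ASCII-only for brevity)
    .replace(/\s+/g, " ")            // collapse runs of whitespace
    .trim();
}

/** Collect character 3-grams and 4-grams, adding a per-length weight for each occurrence. */
function shingles(
  text: string,
  weights: Record<number, number> = { 3: 1, 4: 2 },
): Map<string, number> {
  const out = new Map<string, number>();
  for (const [nStr, weight] of Object.entries(weights)) {
    const n = Number(nStr);
    for (let i = 0; i + n <= text.length; i++) {
      const gram = text.slice(i, i + n);
      out.set(gram, (out.get(gram) ?? 0) + weight);
    }
  }
  return out;
}
```

Because the shingles are overlapping character windows, reordering words or changing punctuation only disturbs the few windows that span the edit, which is why most features survive a light rewrite.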
Main Features
| Feature | Description |
|---|---|
| SimHash Detection | Detects rewritten reposts through locality-sensitive hashing |
| LSH Banding | Enables near-O(1) similarity lookup |
| Two-Tier Enforcement | Auto-removes severe reposts and reports medium-confidence matches |
| Interactive Dashboard | Displays live moderation analytics directly on Reddit |
| Offender Tracking | Escalates repeat offenders automatically |
| Retention Sweeping | Automatically prunes outdated signatures |
| Zero External Services | Fully native to Devvit + Redis |
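Two-tier enforcement could look roughly like the sketch below. The threshold values (remove at distance 3 or less, report at distance 7 or less) and the names `decide` and `similarity` are assumptions for illustration; the write-up does not state the cut-offs Phantom actually uses.

```typescript
type Action = "remove" | "report" | "allow";

// Hypothetical thresholds, in differing bits out of 64. Tune per subreddit.
interface Thresholds {
  removeAtOrBelow: number;
  reportAtOrBelow: number;
}

/** Map a Hamming distance between two signatures to a moderation action. */
function decide(
  distance: number,
  t: Thresholds = { removeAtOrBelow: 3, reportAtOrBelow: 7 },
): Action {
  if (distance <= t.removeAtOrBelow) return "remove"; // near-identical: auto-remove
  if (distance <= t.reportAtOrBelow) return "report"; // plausible match: send to mod queue
  return "allow";
}

/** Score for display: 1 for identical signatures, 0 when all 64 bits differ. */
function similarity(distance: number): number {
  return 1 - distance / 64;
}
```

Keeping the report tier separate means a borderline match costs a moderator a glance rather than costing a user a wrongly removed post.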
How we built it
Phantom combines several information retrieval and distributed systems concepts into a lightweight moderation engine.
Architecture
┌──────────────────────┐
│  Reddit Post Submit  │
└──────────┬───────────┘
           │
           ▼
┌─────────────────────────┐
│ Devvit Trigger Handler  │
└──────────┬──────────────┘
           │
           ▼
┌─────────────────────────────────┐
│ Text Normalization & Shingling  │
└──────────┬──────────────────────┘
           │
           ▼
┌─────────────────────────────────┐
│  SimHash Signature Generation   │
└──────────┬──────────────────────┘
           │
           ▼
┌─────────────────────────────────┐
│ Redis LSH Band Candidate Search │
└──────────┬──────────────────────┘
           │
           ▼
┌─────────────────────────────────┐
│   Exact Hamming Verification    │
└──────────┬──────────────────────┘
           │
┌──────────┴─────────────┐
▼                        ▼
Auto Remove      Moderator Report
Technical Stack
| Layer | Purpose |
|---|---|
| Weighted Character Shingling | Robust typo-resistant feature extraction |
| FNV-1a Hashing | Fast 64-bit deterministic hashing |
| SimHash | Locality-sensitive signature generation |
| LSH Banding | Efficient candidate reduction |
| Hamming Distance | Exact similarity verification |
| Redis Sorted Sets | Native scalable indexing |
| Devvit UI Components | Interactive moderation dashboard |
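To show how sorted sets can serve both as the band index and as the basis for retention sweeping, here is a sketch that uses an in-memory stand-in instead of a real Redis client. The `SortedSet` and `BandIndex` classes, the bucket key strings, and the choice of creation time as the score are assumptions; the comments name the Redis commands each method is modeled on, not the Devvit client's method names.

```typescript
// In-memory stand-in for one Redis sorted set. For illustration only.
class SortedSet {
  private scores = new Map<string, number>();

  // Modeled on ZADD key score member.
  add(member: string, score: number): void {
    this.scores.set(member, score);
  }

  // Modeled on ZRANGE key 0 -1: members ordered by score, ties by member.
  members(): string[] {
    return [...this.scores.entries()]
      .sort((a, b) => a[1] - b[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
      .map(([member]) => member);
  }

  // Modeled on ZREMRANGEBYSCORE key min max. Returns the number removed.
  removeRangeByScore(min: number, max: number): number {
    let removed = 0;
    for (const [member, score] of this.scores) {
      if (score >= min && score <= max) {
        this.scores.delete(member);
        removed++;
      }
    }
    return removed;
  }
}

// One sorted set per LSH band bucket: member = post ID, score = creation time in ms.
class BandIndex {
  private buckets = new Map<string, SortedSet>();

  insert(bucketKeys: string[], postId: string, createdAtMs: number): void {
    for (const key of bucketKeys) {
      let bucket = this.buckets.get(key);
      if (!bucket) {
        bucket = new SortedSet();
        this.buckets.set(key, bucket);
      }
      bucket.add(postId, createdAtMs);
    }
  }

  // Union of all posts that share at least one bucket with the query.
  candidates(bucketKeys: string[]): Set<string> {
    const found = new Set<string>();
    for (const key of bucketKeys) {
      for (const member of this.buckets.get(key)?.members() ?? []) found.add(member);
    }
    return found;
  }

  // Retention sweep: drop entries created before the cutoff. Returns entries removed.
  sweep(cutoffMs: number): number {
    let removed = 0;
    for (const bucket of this.buckets.values()) {
      removed += bucket.removeRangeByScore(-Infinity, cutoffMs - 1);
    }
    return removed;
  }
}
```

Scoring by timestamp is what makes pruning cheap in this layout: a sweep is one range deletion per bucket and never needs to inspect individual signatures.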
Mathematical Foundation
FNV-1a Hashing
For every input byte \(d\), in order:
$$ \text{hash} = \big((\text{hash} \oplus d) \times \text{FNV\_prime}\big) \bmod 2^{64} $$
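The update rule above maps directly onto `bigint` arithmetic. The sketch below uses the standard 64-bit FNV-1a parameters (offset basis `0xcbf29ce484222325`, prime `0x100000001b3`); the function name `fnv1a64` and the choice to hash UTF-8 bytes are assumptions, since the write-up does not say how strings are encoded.

```typescript
const FNV_OFFSET_BASIS = 0xcbf29ce484222325n; // standard 64-bit FNV offset basis
const FNV_PRIME = 0x100000001b3n;             // standard 64-bit FNV prime
const MASK_64 = (1n << 64n) - 1n;             // keeps results within 64 bits

/** 64-bit FNV-1a over the UTF-8 bytes of the input. */
function fnv1a64(input: string): bigint {
  let hash = FNV_OFFSET_BASIS;
  for (const byte of new TextEncoder().encode(input)) {
    hash ^= BigInt(byte);                // XOR the byte in first (the "1a" variant)
    hash = (hash * FNV_PRIME) & MASK_64; // then multiply, reducing mod 2^64
  }
  return hash;
}
```

JavaScript numbers cannot represent 64-bit integers exactly, so `bigint` with an explicit mask stands in for the wraparound that a native 64-bit multiply would give for free.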
SimHash Construction
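For each shingle, compute its 64-bit hash and update a vector of 64 counters \(V_0, \dots, V_{63}\):
- a 1 at bit \(i\) of the hash adds the shingle's weight to \(V_i\)
- a 0 at bit \(i\) subtracts the shingle's weight from \(V_i\)
Final signature bit \(i\):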
$$ V_i > 0 \Rightarrow 1 $$
$$ V_i \leq 0 \Rightarrow 0 $$
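A compact sketch of this construction follows. The name `simhash64` and the `Map<string, number>` input shape (shingle to weight) are assumptions, and a small FNV-1a helper is repeated so the block runs on its own.

```typescript
const MASK_64 = (1n << 64n) - 1n;

/** 64-bit FNV-1a over UTF-8 bytes (standard offset basis and prime). */
function fnv1a64(input: string): bigint {
  let hash = 0xcbf29ce484222325n;
  for (const byte of new TextEncoder().encode(input)) {
    hash ^= BigInt(byte);
    hash = (hash * 0x100000001b3n) & MASK_64;
  }
  return hash;
}

/** Weighted SimHash: each feature votes +weight or -weight on every bit position. */
function simhash64(features: Map<string, number>): bigint {
  const v = new Array<number>(64).fill(0);
  for (const [feature, weight] of features) {
    const h = fnv1a64(feature);
    for (let i = 0; i < 64; i++) {
      const bit = (h >> BigInt(i)) & 1n;
      v[i] += bit === 1n ? weight : -weight;
    }
  }
  // Bit i of the signature is 1 only when the accumulated vote is strictly positive.
  let signature = 0n;
  for (let i = 0; i < 64; i++) {
    if (v[i] > 0) signature |= 1n << BigInt(i);
  }
  return signature;
}
```

Each bit is a weighted majority vote, so changing a handful of shingles flips only the bits whose totals were already close to zero. That is the property that keeps near-duplicate texts close in Hamming distance.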
LSH Banding
We split a 64-bit SimHash into:
$$ 64 = 8 \times 8 $$
creating:
- 8 bands,
- each containing 8 bits.
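Posts whose signatures agree on all 8 bits of at least one band become candidate matches. Because 7 differing bits can touch at most 7 of the 8 bands, any two signatures within Hamming distance 7 are guaranteed to share a band; pairs at distance 8 or more may be missed.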
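The banding step can be sketched like this. The key layout `phantom:band:<index>:<hex value>` and the names `bandKeys` and `sharesBand` are hypothetical, chosen only to make the example concrete.

```typescript
const BANDS = 8;
const BITS_PER_BAND = 8n;

/** Split a 64-bit signature into 8 bucket keys, one per 8-bit band (band 0 = lowest byte). */
function bandKeys(signature: bigint, prefix = "phantom:band"): string[] {
  const keys: string[] = [];
  for (let b = 0; b < BANDS; b++) {
    const value = (signature >> (BigInt(b) * BITS_PER_BAND)) & 0xffn;
    keys.push(`${prefix}:${b}:${value.toString(16).padStart(2, "0")}`);
  }
  return keys;
}

/** True when two signatures land in the same bucket for at least one band. */
function sharesBand(a: bigint, b: bigint): boolean {
  const keysOfB = new Set(bandKeys(b));
  return bandKeys(a).some((key) => keysOfB.has(key));
}
```

In this sketch `sharesBand(0n, 0x7fn)` is true, since all seven differing bits fall in band 0, while `sharesBand(0n, 0x0101010101010101n)` is false because one bit differs in every band. Including the band index in the key keeps equal byte values in different positions from colliding.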
Hamming Distance
Similarity is verified using:
$$ d(x,y)=\sum_{i=1}^{64}(x_i \oplus y_i) $$
Lower distance indicates stronger similarity.
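The sum above is a population count of the XOR of the two signatures. A minimal sketch, assuming signatures are held as `bigint` values (the name `hammingDistance` is illustrative):

```typescript
/** Number of bit positions (out of 64) where x and y differ. */
function hammingDistance(x: bigint, y: bigint): number {
  let diff = (x ^ y) & ((1n << 64n) - 1n); // 1 bits mark the differing positions
  let count = 0;
  while (diff !== 0n) {
    diff &= diff - 1n; // clear the lowest set bit
    count++;
  }
  return count;
}
```

The loop runs once per differing bit rather than once per position, so near-duplicates, which are the common case after LSH filtering, are checked in only a few iterations.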
Challenges we ran into
| Challenge | Solution |
|---|---|
| Detecting rewritten reposts without embeddings | Implemented weighted SimHash with mixed 3/4-gram shingles |
| Preventing expensive full-database scans | Used Redis-native LSH banding |
| Working inside Devvit serverless limits | Built lightweight native indexing structures |
| Reducing false positives | Added multi-tier moderation thresholds |
| Maintaining performance over time | Designed automated retention sweeper jobs |
| Providing moderator transparency | Built a live Reddit-native analytics dashboard |
Accomplishments that we're proud of
| Achievement | Impact |
|---|---|
| Built fully native approximate similarity search | Eliminated dependency on external vector infrastructure |
| Achieved near-O(1) candidate retrieval | Enabled scalable moderation workflows |
| Designed a launch-ready moderation dashboard | Improved moderator visibility and usability |
| Added automated retention sweeping | Maintained stable long-term performance |
| Integrated real-time offender escalation | Reduced manual moderation workload |
| Combined IR algorithms with moderation tooling | Introduced new capabilities to the Devvit ecosystem |
What we learned
Technical Learnings
| Topic | Insight |
|---|---|
| Locality-Sensitive Hashing | Approximate search can outperform heavier AI pipelines for moderation tasks |
| Redis Data Structures | Sorted sets are extremely powerful for temporal indexing |
| SimHash | Character-level similarity is highly robust against repost mutation |
| Devvit Architecture | Native integrations dramatically simplify deployment |
| Probabilistic Search | Candidate filtering is essential for scalable real-time systems |
Product Learnings
| Observation | Takeaway |
|---|---|
| Moderators value reliability | Stability matters more than flashy AI |
| Explainability matters | Similarity scores improve moderator trust |
| False positives are expensive | Multi-tier enforcement is critical |
| Native installs improve adoption | Zero-config deployment reduces friction |
What's next for Phantom
| Future Feature | Goal |
|---|---|
| Image perceptual hashing | Detect reposted memes and edited images |
| Cross-subreddit intelligence | Detect coordinated repost campaigns |
| Adaptive thresholds | Dynamically tune moderation sensitivity |
| Hybrid semantic search | Combine SimHash with embedding workflows |
| Moderator training mode | Learn from moderator feedback |
| Expanded analytics | Add spam heatmaps and trend analysis |
| Federated moderation insights | Surface coordinated spam networks |
Long-Term Vision
We envision Phantom evolving into a fully native Reddit moderation intelligence layer capable of detecting:
- coordinated spam behavior,
- content recycling networks,
- and large-scale karma farming campaigns,
while remaining:
- lightweight,
- explainable,
- privacy-friendly,
- and entirely integrated within the Devvit ecosystem.
Built With
- devvit
- fnv-1a
- hamming-distance
- locality-sensitive-hashing
- node.js
- npm
- reddit-developer-platform
- redis
- simhash
- tsx
- typescript


