Inspiration

Reddit moderators constantly battle repost bots and karma-farming accounts that recycle previously successful posts with slightly rewritten titles. Existing moderation systems typically rely on exact text matching, regex rules, or external AI infrastructure, all of which have major limitations.

Traditional approaches fail when:

  • Words are reordered
  • Synonyms are substituted
  • Punctuation changes
  • Small edits are introduced

Meanwhile, modern semantic systems often require expensive vector databases, external APIs, GPU infrastructure, and additional backend services that are difficult to operate within Devvit’s lightweight serverless ecosystem.

We wanted to build a fully native Reddit moderation engine capable of detecting rewritten reposts in real time while running entirely inside Devvit using only Redis-native infrastructure.


What it does

Phantom is a launch-ready Devvit moderation engine that detects:

  • rewritten reposts,
  • karma farming,
  • duplicate submissions,
  • and repeat spam offenders

using SimHash signatures and LSH banding indexed natively in Redis, without relying on external AI services or vector databases.

Core Pipeline

[ New Reddit Post ]
        │
        ▼
┌──────────────────────────┐
│ Text Normalization       │
│ Remove URLs & symbols    │
└──────────────────────────┘
        │
        ▼
┌──────────────────────────┐
│ Weighted Character       │
│ 3/4-Gram Shingling       │
└──────────────────────────┘
        │
        ▼
┌──────────────────────────┐
│ 64-bit SimHash           │
│ Signature Generation     │
└──────────────────────────┘
        │
        ▼
┌──────────────────────────┐
│ Redis LSH Band Search    │
│ Candidate Retrieval      │
└──────────────────────────┘
        │
        ▼
┌──────────────────────────┐
│ Hamming Distance         │
│ Verification             │
└──────────────────────────┘
        │
 ┌──────┴────────────┐
 ▼                   ▼
Auto Remove      Moderator Report

Main Features

| Feature | Description |
|---|---|
| SimHash Detection | Detects rewritten reposts through locality-sensitive hashing |
| LSH Banding | Enables near-O(1) candidate lookup instead of a full scan |
| Two-Tier Enforcement | Auto-removes severe reposts and reports medium-confidence matches |
| Interactive Dashboard | Displays live moderation analytics directly on Reddit |
| Offender Tracking | Escalates repeat offenders automatically |
| Retention Sweeping | Automatically prunes outdated signatures |
| Zero External Services | Fully native to Devvit + Redis |

How we built it

Phantom combines several information retrieval and distributed systems concepts into a lightweight moderation engine.

Architecture

                 ┌──────────────────────┐
                 │ Reddit Post Submit   │
                 └──────────┬───────────┘
                            │
                            ▼
              ┌─────────────────────────┐
              │ Devvit Trigger Handler  │
              └──────────┬──────────────┘
                         │
                         ▼
         ┌─────────────────────────────────┐
         │ Text Normalization & Shingling  │
         └──────────┬──────────────────────┘
                    │
                    ▼
         ┌─────────────────────────────────┐
         │ SimHash Signature Generation    │
         └──────────┬──────────────────────┘
                    │
                    ▼
         ┌─────────────────────────────────┐
         │ Redis LSH Band Candidate Search │
         └──────────┬──────────────────────┘
                    │
                    ▼
         ┌─────────────────────────────────┐
         │ Exact Hamming Verification      │
         └──────────┬──────────────────────┘
                    │
        ┌───────────┴────────────┐
        ▼                        ▼
 Auto Remove               Moderator Report

Technical Stack

| Layer | Purpose |
|---|---|
| Weighted Character Shingling | Robust, typo-resistant feature extraction |
| FNV-1a Hashing | Fast, deterministic 64-bit hashing |
| SimHash | Locality-sensitive signature generation |
| LSH Banding | Efficient candidate reduction |
| Hamming Distance | Exact similarity verification |
| Redis Sorted Sets | Native, scalable indexing |
| Devvit UI Components | Interactive moderation dashboard |

Mathematical Foundation

FNV-1a Hashing

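Starting from the standard 64-bit offset basis, for every byte d of the input:

$$ \text{hash} \leftarrow \left( (\text{hash} \oplus d) \times p_{\text{FNV}} \right) \bmod 2^{64} $$

with the standard 64-bit parameters:

$$ \text{hash}_0 = 14695981039346656037, \qquad p_{\text{FNV}} = 1099511628211 $$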
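The update rule is short enough to sketch directly. The TypeScript below is a minimal illustration rather than Phantom's source: it assumes the input is hashed as its UTF-8 bytes and uses BigInt so the mod 2^64 reduction is exact; the name `fnv1a64` is our own.

```typescript
// Minimal 64-bit FNV-1a sketch over the UTF-8 bytes of a string.
// BigInt keeps the arithmetic exact; JavaScript numbers lose precision above 2^53.
const FNV_OFFSET_BASIS_64 = 0xcbf29ce484222325n;
const FNV_PRIME_64 = 0x100000001b3n;
const MASK_64 = (1n << 64n) - 1n; // keeping the low 64 bits == reducing mod 2^64

function fnv1a64(text: string): bigint {
  let hash = FNV_OFFSET_BASIS_64;
  for (const byte of new TextEncoder().encode(text)) {
    hash ^= BigInt(byte); // XOR in the next byte first (the "1a" ordering)...
    hash = (hash * FNV_PRIME_64) & MASK_64; // ...then multiply by the FNV prime
  }
  return hash;
}

console.log(fnv1a64("a").toString(16)); // af63dc4c8601ec8c
```

XOR-then-multiply is what distinguishes FNV-1a from the older FNV-1 (multiply-then-XOR); FNV-1a mixes the last input byte into more of the output bits.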


SimHash Construction

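Keep a 64-entry accumulator V, one counter per signature bit. For each shingle s with weight w_s and 64-bit hash h(s), and for each bit position i:

  • if bit i of h(s) is 1, add w_s to V_i;
  • if bit i of h(s) is 0, subtract w_s from V_i.

Equivalently, with h_i(s) denoting bit i of h(s):

$$ V_i = \sum_{s} w_s \left( 2\,h_i(s) - 1 \right) $$

Final signature bit:

$$ V_i > 0 \Rightarrow 1 $$

$$ V_i \leq 0 \Rightarrow 0 $$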
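A sketch of the path from a raw title to a 64-bit signature, covering normalization, 3/4-gram shingling, and the accumulator above. Several details are assumptions, not Phantom's documented behavior: the normalization regexes, treating a shingle's weight as its occurrence count, and the helper names `normalize`, `weightedShingles`, and `simhash64` are all hypothetical.

```typescript
// Sketch: normalize -> weighted character 3/4-gram shingles -> 64-bit SimHash.
// ASSUMPTION (hypothetical): a shingle's weight is its occurrence count.
const MASK_64 = (1n << 64n) - 1n;

function fnv1a64(text: string): bigint {
  let hash = 0xcbf29ce484222325n;
  for (const byte of new TextEncoder().encode(text)) {
    hash = ((hash ^ BigInt(byte)) * 0x100000001b3n) & MASK_64;
  }
  return hash;
}

// Lowercase, drop URLs, turn symbols into spaces, collapse whitespace.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/https?:\/\/\S+/g, " ")
    .replace(/[^\p{L}\p{N}\s]/gu, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Count every character n-gram for each requested size (UTF-16 code units).
function weightedShingles(text: string, sizes: number[] = [3, 4]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const n of sizes) {
    for (let i = 0; i + n <= text.length; i++) {
      const gram = text.slice(i, i + n);
      counts.set(gram, (counts.get(gram) ?? 0) + 1);
    }
  }
  return counts;
}

function simhash64(rawText: string): bigint {
  const v = new Array<number>(64).fill(0); // V_i accumulators
  for (const [gram, weight] of weightedShingles(normalize(rawText))) {
    const h = fnv1a64(gram);
    for (let i = 0; i < 64; i++) {
      // Bit i set -> add the weight; bit i clear -> subtract it.
      v[i] += ((h >> BigInt(i)) & 1n) === 1n ? weight : -weight;
    }
  }
  let signature = 0n;
  for (let i = 0; i < 64; i++) {
    if (v[i] > 0) signature |= 1n << BigInt(i); // V_i > 0 -> 1, otherwise 0
  }
  return signature;
}
```

Because normalization strips case, punctuation, and URLs, `simhash64("Hello, World!")` and `simhash64("hello world")` produce the same signature. A one-word edit only changes the shingles around that word, so most accumulator signs, and therefore most signature bits, typically stay the same.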


LSH Banding

We split a 64-bit SimHash into:

$$ 64 = 8 \times 8 $$

creating:

  • 8 bands,
  • each containing 8 bits.

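Posts that share at least one identical band become candidate matches. Because the 8 bands partition all 64 bits, two signatures that differ in at most 7 bit positions must still agree on at least one band (7 differing bits can touch at most 7 bands), so every such pair is guaranteed to be retrieved.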
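A sketch of 8 × 8 banding and candidate retrieval. To keep it self-contained, `InMemorySortedSets` stands in for Redis sorted sets (in Redis the same operations would be ZADD and ZRANGE on each bucket key). The `band:<index>:<value>` key format and using the post timestamp as the score are assumptions for illustration.

```typescript
// Sketch of 8x8 LSH banding over a 64-bit SimHash.
// ASSUMPTIONS (hypothetical): key format "band:<index>:<value>", score = post timestamp,
// and InMemorySortedSets as a stand-in for Redis sorted sets.
const BANDS = 8;
const BITS_PER_BAND = 8;

// Split a 64-bit signature into 8 band values of 8 bits each (band 0 = lowest bits).
function bands(signature: bigint): number[] {
  const mask = (1n << BigInt(BITS_PER_BAND)) - 1n;
  const out: number[] = [];
  for (let b = 0; b < BANDS; b++) {
    out.push(Number((signature >> BigInt(b * BITS_PER_BAND)) & mask));
  }
  return out;
}

function bandKey(index: number, value: number): string {
  return `band:${index}:${value}`;
}

class InMemorySortedSets {
  private sets = new Map<string, Map<string, number>>();

  zAdd(key: string, member: string, score: number): void {
    let set = this.sets.get(key);
    if (!set) {
      set = new Map<string, number>();
      this.sets.set(key, set);
    }
    set.set(member, score);
  }

  members(key: string): string[] {
    const set = this.sets.get(key);
    return set ? Array.from(set.keys()) : [];
  }
}

// Index a post under each of its 8 band buckets.
function indexPost(store: InMemorySortedSets, postId: string, signature: bigint, timestamp: number): void {
  bands(signature).forEach((value, i) => store.zAdd(bandKey(i, value), postId, timestamp));
}

// Any post that shares at least one band bucket becomes a candidate.
function findCandidates(store: InMemorySortedSets, signature: bigint): Set<string> {
  const candidates = new Set<string>();
  bands(signature).forEach((value, i) => {
    for (const id of store.members(bandKey(i, value))) candidates.add(id);
  });
  return candidates;
}
```

The per-post cost is a fixed eight bucket reads, which is where the near-O(1) lookup comes from. Bucket sizes still grow with the index (each 8-bit band has only 256 possible values), so the candidates are then checked with an exact Hamming distance.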


Hamming Distance

Similarity is verified using:

$$ d(x,y)=\sum_{i=1}^{64}(x_i \oplus y_i) $$

Lower distance indicates stronger similarity.
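Once candidates are retrieved, verification reduces to a popcount of x XOR y. The sketch below also maps the closest match onto the two enforcement tiers. The thresholds `REMOVE_MAX_DISTANCE = 3` and `REPORT_MAX_DISTANCE = 7` and the helper names are hypothetical placeholders, since Phantom's configured values are not given here.

```typescript
// Exact verification of LSH candidates plus a two-tier decision.
// ASSUMPTION (hypothetical): the two thresholds below are illustrative, not Phantom's values.
type Verdict = "remove" | "report" | "allow";

const REMOVE_MAX_DISTANCE = 3; // near-identical rewrite -> auto remove
const REPORT_MAX_DISTANCE = 7; // medium-confidence match -> moderator report

// Hamming distance = number of set bits in x XOR y (restricted to 64 bits).
function hammingDistance(x: bigint, y: bigint): number {
  let diff = (x ^ y) & ((1n << 64n) - 1n);
  let count = 0;
  while (diff !== 0n) {
    diff &= diff - 1n; // clear the lowest set bit
    count++;
  }
  return count;
}

function classify(distance: number): Verdict {
  if (distance <= REMOVE_MAX_DISTANCE) return "remove";
  if (distance <= REPORT_MAX_DISTANCE) return "report";
  return "allow";
}

// Find the closest candidate signature and map its distance onto a tier.
function verify(
  signature: bigint,
  candidates: Map<string, bigint>,
): { postId: string; distance: number; verdict: Verdict } | null {
  let best: { postId: string; distance: number } | null = null;
  for (const [postId, other] of candidates) {
    const d = hammingDistance(signature, other);
    if (best === null || d < best.distance) best = { postId, distance: d };
  }
  return best === null ? null : { ...best, verdict: classify(best.distance) };
}
```

Keeping the report threshold at 7 or below means every pair in either tier is also guaranteed to share a band, so banding never hides a match that verification would have flagged.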


Challenges we ran into

| Challenge | Solution |
|---|---|
| Detecting rewritten reposts without embeddings | Implemented weighted SimHash with mixed 3/4-gram shingles |
| Preventing expensive full-database scans | Used Redis-native LSH banding |
| Working within Devvit's serverless limits | Built lightweight native indexing structures |
| Reducing false positives | Added multi-tier moderation thresholds |
| Maintaining performance over time | Designed automated retention sweeper jobs |
| Providing moderator transparency | Built a live, Reddit-native analytics dashboard |
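The retention sweeper can be sketched as a score-range delete over each band bucket, assuming bucket scores are post timestamps in milliseconds. `InMemorySortedSets` again stands in for Redis, where the per-key equivalent is `ZREMRANGEBYSCORE key -inf (cutoff`. The 30-day window, the `band:<index>:<value>` key format, and the helper names are hypothetical.

```typescript
// Sketch of a retention sweep over LSH band buckets.
// ASSUMPTIONS (hypothetical): keys "band:<index>:<value>", scores = post timestamps (ms),
// 30-day retention. removeOlderThan mirrors Redis ZREMRANGEBYSCORE key -inf (cutoff.
const RETENTION_MS = 30 * 24 * 60 * 60 * 1000;
const BANDS = 8;
const BUCKETS_PER_BAND = 256; // 8-bit band values

class InMemorySortedSets {
  private sets = new Map<string, Map<string, number>>();

  zAdd(key: string, member: string, score: number): void {
    let set = this.sets.get(key);
    if (!set) {
      set = new Map<string, number>();
      this.sets.set(key, set);
    }
    set.set(member, score);
  }

  has(key: string, member: string): boolean {
    return this.sets.get(key)?.has(member) ?? false;
  }

  // Delete members whose score is strictly below the cutoff; returns how many were removed.
  removeOlderThan(key: string, cutoff: number): number {
    const set = this.sets.get(key);
    if (!set) return 0;
    let removed = 0;
    for (const [member, score] of set) {
      if (score < cutoff) {
        set.delete(member); // deleting the current entry during Map iteration is safe in JS
        removed++;
      }
    }
    if (set.size === 0) this.sets.delete(key); // drop empty buckets
    return removed;
  }
}

// Visit every possible bucket key and prune signatures older than the retention window.
function sweepBands(store: InMemorySortedSets, now: number, retentionMs = RETENTION_MS): number {
  const cutoff = now - retentionMs;
  let removed = 0;
  for (let band = 0; band < BANDS; band++) {
    for (let value = 0; value < BUCKETS_PER_BAND; value++) {
      removed += store.removeOlderThan(`band:${band}:${value}`, cutoff);
    }
  }
  return removed;
}
```

With 8 bands of 8 bits there are at most 8 × 256 = 2048 bucket keys, so a scheduled job can enumerate them directly instead of scanning the keyspace.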

Accomplishments that we're proud of

| Achievement | Impact |
|---|---|
| Built fully native approximate similarity search | Eliminated dependency on external vector infrastructure |
| Achieved near-O(1) candidate retrieval | Enabled scalable moderation workflows |
| Designed a launch-ready moderation dashboard | Improved moderator visibility and usability |
| Added automated retention sweeping | Maintained stable long-term performance |
| Integrated real-time offender escalation | Reduced manual moderation workload |
| Combined information retrieval (IR) algorithms with moderation tooling | Introduced new capabilities to the Devvit ecosystem |

What we learned

Technical Learnings

| Topic | Insight |
|---|---|
| Locality-Sensitive Hashing | For near-duplicate detection, approximate search can outperform heavier AI pipelines |
| Redis Data Structures | Sorted sets are extremely powerful for temporal indexing |
| SimHash | Character-level similarity is highly robust against repost mutation |
| Devvit Architecture | Native integrations dramatically simplify deployment |
| Probabilistic Search | Candidate filtering is essential for scalable real-time systems |

Product Learnings

| Observation | Takeaway |
|---|---|
| Moderators value reliability | Stability matters more than flashy AI |
| Explainability matters | Similarity scores improve moderator trust |
| False positives are expensive | Multi-tier enforcement is critical |
| Native installs improve adoption | Zero-config deployment reduces friction |

What's next for Phantom

| Future Feature | Goal |
|---|---|
| Image perceptual hashing | Detect reposted memes and edited images |
| Cross-subreddit intelligence | Detect coordinated repost campaigns |
| Adaptive thresholds | Dynamically tune moderation sensitivity |
| Hybrid semantic search | Combine SimHash with embedding workflows |
| Moderator training mode | Learn from moderator feedback |
| Expanded analytics | Add spam heatmaps and trend analysis |
| Federated moderation insights | Surface coordinated spam networks |

Long-Term Vision

We envision Phantom evolving into a fully native Reddit moderation intelligence layer capable of detecting:

  • coordinated spam behavior,
  • content recycling networks,
  • and large-scale karma farming campaigns,

while remaining:

  • lightweight,
  • explainable,
  • privacy-friendly,
  • and entirely integrated within the Devvit ecosystem.
