Phantom | Devpost

Inspiration

Reddit moderators constantly battle repost bots and karma-farming accounts that recycle previously successful posts with slightly rewritten titles. Existing moderation systems typically rely on exact text matching, regex rules, or external AI infrastructure, all of which have major limitations.

Traditional approaches fail when:

Words are reordered
Synonyms are substituted
Punctuation changes
Small edits are introduced

Meanwhile, modern semantic systems often require expensive vector databases, external APIs, GPU infrastructure, and additional backend services that are difficult to operate within Devvit’s lightweight serverless ecosystem.

We wanted to build a fully native Reddit moderation engine capable of detecting rewritten reposts in real time while running entirely inside Devvit using only Redis-native infrastructure.

What it does

Phantom is a launch-ready Devvit moderation engine that detects:

rewritten reposts,
karma farming,
duplicate submissions,
and repeat spam offenders

using Redis-native SimHash and LSH banding without relying on external AI services or vector databases.

Core Pipeline

[ New Reddit Post ]
        │
        ▼
┌──────────────────────────┐
│ Text Normalization       │
│ Remove URLs & symbols    │
└──────────────────────────┘
        │
        ▼
┌──────────────────────────┐
│ Weighted Character       │
│ 3/4-Gram Shingling       │
└──────────────────────────┘
        │
        ▼
┌──────────────────────────┐
│ 64-bit SimHash           │
│ Signature Generation     │
└──────────────────────────┘
        │
        ▼
┌──────────────────────────┐
│ Redis LSH Band Search    │
│ Candidate Retrieval      │
└──────────────────────────┘
        │
        ▼
┌──────────────────────────┐
│ Hamming Distance         │
│ Verification             │
└──────────────────────────┘
        │
 ┌──────┴────────────┐
 ▼                   ▼
Auto Remove      Moderator Report
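
The first two pipeline stages can be sketched roughly as follows. This is a minimal illustration rather than Phantom's actual code: the function names `normalize` and `weightedShingles`, the ASCII-only character filter, and the choice to weight 4-grams twice as heavily as 3-grams are all assumptions made for the example.

```typescript
// Hypothetical helpers; names, filter rules, and weights are assumptions.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/https?:\/\/\S+/g, " ") // remove URLs
    .replace(/[^a-z0-9\s]/g, " ") // remove symbols (ASCII-only for simplicity)
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
}

// Builds a map from each character 3-gram and 4-gram to its accumulated weight.
function weightedShingles(text: string): Map<string, number> {
  const clean = normalize(text);
  const weights = new Map<string, number>();
  // Assumed weighting: each 3-gram counts 1, each 4-gram counts 2.
  for (const [n, w] of [[3, 1], [4, 2]] as const) {
    for (let i = 0; i + n <= clean.length; i++) {
      const gram = clean.slice(i, i + n);
      weights.set(gram, (weights.get(gram) ?? 0) + w);
    }
  }
  return weights;
}
```

Because shingles are taken over characters rather than words, a small edit such as a changed punctuation mark or a single typo only disturbs the handful of shingles that overlap it.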

Main Features

Feature	Description
SimHash Detection	Detects rewritten reposts through locality-sensitive hashing
LSH Banding	Enables near \(O(1)\) similarity lookup
Two-Tier Enforcement	Auto-removes severe reposts and reports medium-confidence matches
Interactive Dashboard	Displays live moderation analytics directly on Reddit
Offender Tracking	Escalates repeat offenders automatically
Retention Sweeping	Automatically prunes outdated signatures
Zero External Services	Fully native to Devvit + Redis

How we built it

Phantom combines several information retrieval and distributed systems concepts into a lightweight moderation engine.

Architecture

                 ┌──────────────────────┐
                 │ Reddit Post Submit   │
                 └──────────┬───────────┘
                            │
                            ▼
              ┌─────────────────────────┐
              │ Devvit Trigger Handler  │
              └──────────┬──────────────┘
                         │
                         ▼
         ┌─────────────────────────────────┐
         │ Text Normalization & Shingling  │
         └──────────┬──────────────────────┘
                    │
                    ▼
         ┌─────────────────────────────────┐
         │ SimHash Signature Generation    │
         └──────────┬──────────────────────┘
                    │
                    ▼
         ┌─────────────────────────────────┐
         │ Redis LSH Band Candidate Search │
         └──────────┬──────────────────────┘
                    │
                    ▼
         ┌─────────────────────────────────┐
         │ Exact Hamming Verification      │
         └──────────┬──────────────────────┘
                    │
        ┌───────────┴────────────┐
        ▼                        ▼
 Auto Remove               Moderator Report

Technical Stack

Layer	Purpose
Weighted Character Shingling	Robust typo-resistant feature extraction
FNV-1a Hashing	Fast 64-bit deterministic hashing
SimHash	Locality-sensitive signature generation
LSH Banding	Efficient candidate reduction
Hamming Distance	Exact similarity verification
Redis Sorted Sets	Native scalable indexing
Devvit UI Components	Interactive moderation dashboard
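
To illustrate how sorted sets can serve both candidate lookup and retention sweeping, here is a small in-memory stand-in. It is a sketch only: the class `BandIndex`, its method names, and the use of a post timestamp as the score are assumptions, and a real deployment would issue the corresponding Redis sorted-set commands (such as `ZADD`, `ZRANGEBYSCORE`, and `ZREMRANGEBYSCORE`) instead of using a `Map`.

```typescript
// In-memory stand-in for one Redis sorted set per band key (member -> score).
// Hypothetical class; scores are assumed to be post timestamps in milliseconds.
class BandIndex {
  private sets = new Map<string, Map<string, number>>();

  // Comparable to: ZADD key timestamp postId
  add(key: string, postId: string, timestampMs: number): void {
    let set = this.sets.get(key);
    if (!set) {
      set = new Map<string, number>();
      this.sets.set(key, set);
    }
    set.set(postId, timestampMs);
  }

  // Comparable to: ZRANGEBYSCORE key minTimestamp +inf
  candidates(key: string, minTimestampMs: number): string[] {
    const out: string[] = [];
    this.sets.get(key)?.forEach((score, member) => {
      if (score >= minTimestampMs) out.push(member);
    });
    return out;
  }

  // Comparable to: ZREMRANGEBYSCORE key -inf (cutoff, applied to every band key.
  // Returns the number of entries removed.
  sweep(cutoffMs: number): number {
    let removed = 0;
    this.sets.forEach((set, key) => {
      set.forEach((score, member) => {
        if (score < cutoffMs) {
          set.delete(member);
          removed++;
        }
      });
      if (set.size === 0) this.sets.delete(key);
    });
    return removed;
  }
}
```

Using the timestamp as the score means one range operation per key is enough to prune everything older than the retention window.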

Mathematical Foundation

FNV-1a Hashing

Starting from the FNV offset basis, for every input byte \(d\):

$$ \text{hash} \leftarrow \big((\text{hash} \oplus d) \times \text{FNV\_prime}\big) \bmod 2^{64} $$
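
A minimal sketch of 64-bit FNV-1a using `BigInt`, assuming the input is hashed as UTF-8 bytes. The constants are the standard 64-bit FNV offset basis and prime; the function name `fnv1a64` is chosen for this example.

```typescript
const FNV_OFFSET_BASIS = 0xcbf29ce484222325n; // standard 64-bit offset basis
const FNV_PRIME = 0x100000001b3n; // standard 64-bit FNV prime
const MASK_64 = (1n << 64n) - 1n; // keeps the result within 64 bits

function fnv1a64(input: string): bigint {
  let hash = FNV_OFFSET_BASIS;
  for (const byte of new TextEncoder().encode(input)) {
    hash ^= BigInt(byte); // XOR the byte in first (the "1a" ordering)
    hash = (hash * FNV_PRIME) & MASK_64; // multiply, then reduce mod 2^64
  }
  return hash;
}
```

Masking with `MASK_64` after each multiplication is equivalent to the reduction modulo \(2^{64}\), since `BigInt` arithmetic does not overflow on its own.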

SimHash Construction

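For each shingle, compute its 64-bit hash and update a running total \(V_i\) for every bit position \(i\):

a hash bit of 1 adds the shingle's weight to \(V_i\)
a hash bit of 0 subtracts the shingle's weight from \(V_i\)

Final signature bit \(i\):

$$ V_i > 0 \Rightarrow 1 $$

$$ V_i \leq 0 \Rightarrow 0 $$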
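
The construction above could be sketched as follows. The function name `simhash64` is an assumption, and the hash function is passed in as a parameter so the block stays self-contained; in Phantom's pipeline that role would be played by FNV-1a.

```typescript
// Builds a 64-bit SimHash from weighted features.
// `hash` must return a 64-bit value for each feature string.
function simhash64(
  features: Map<string, number>,
  hash: (s: string) => bigint
): bigint {
  const v = new Array<number>(64).fill(0); // running totals V_0..V_63
  features.forEach((weight, shingle) => {
    const h = hash(shingle);
    for (let i = 0; i < 64; i++) {
      const bit = (h >> BigInt(i)) & 1n;
      v[i] += bit === 1n ? weight : -weight; // 1 adds weight, 0 subtracts it
    }
  });
  let signature = 0n;
  for (let i = 0; i < 64; i++) {
    if (v[i] > 0) signature |= 1n << BigInt(i); // V_i > 0 => bit 1, else 0
  }
  return signature;
}
```

Two texts that share most of their weighted shingles push most of the \(V_i\) totals in the same direction, so their signatures differ in only a few bits.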

LSH Banding

We split a 64-bit SimHash into:

$$ 64 = 8 \times 8 $$

creating:

8 bands,
each containing 8 bits.

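Posts sharing at least one band become candidate matches. By the pigeonhole principle, two signatures that differ in at most 7 bits must agree on at least one full band, so every such pair is guaranteed to be retrieved.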
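
A sketch of how the eight band keys might be derived from a signature. The key format `band:<index>:<hex>` and the function names are assumptions made for illustration; the source does not specify how keys are laid out in Redis.

```typescript
const BANDS = 8;
const BITS_PER_BAND = 8;

// Splits a 64-bit signature into 8 band keys, lowest byte first.
// The "band:<index>:<hex>" key format is hypothetical.
function bandKeys(signature: bigint): string[] {
  const keys: string[] = [];
  for (let band = 0; band < BANDS; band++) {
    const value = (signature >> BigInt(band * BITS_PER_BAND)) & 0xffn;
    keys.push(`band:${band}:${value.toString(16).padStart(2, "0")}`);
  }
  return keys;
}

// True when two signatures agree on at least one full band.
function sharesBand(a: bigint, b: bigint): boolean {
  const keysB = new Set(bandKeys(b));
  return bandKeys(a).some((k) => keysB.has(k));
}
```

Including the band index in the key keeps equal byte values in different positions from colliding. A pair that differs by one bit in every band (distance 8) shares no band and would be missed, which is the trade-off of this 8 × 8 split.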

Hamming Distance

Similarity is verified using:

$$ d(x,y)=\sum_{i=1}^{64}(x_i \oplus y_i) $$

Lower distance indicates stronger similarity.
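
The verification step, together with a two-tier decision, might look like the sketch below. The thresholds `removeAt = 3` and `reportAt = 7` and the names `hammingDistance` and `decide` are placeholders chosen for this example, not Phantom's actual settings.

```typescript
// Counts differing bits between two 64-bit signatures.
function hammingDistance(x: bigint, y: bigint): number {
  let diff = BigInt.asUintN(64, x ^ y);
  let count = 0;
  while (diff !== 0n) {
    diff &= diff - 1n; // clears the lowest set bit
    count++;
  }
  return count;
}

type Action = "remove" | "report" | "allow";

// Hypothetical thresholds: small distances are removed automatically,
// medium distances are reported to moderators, everything else passes.
function decide(distance: number, removeAt = 3, reportAt = 7): Action {
  if (distance <= removeAt) return "remove";
  if (distance <= reportAt) return "report";
  return "allow";
}
```

For example, `decide(hammingDistance(a, b))` on two candidate signatures `a` and `b` yields the enforcement tier for that pair.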

Challenges we ran into

Challenge	Solution
Detecting rewritten reposts without embeddings	Implemented weighted SimHash with mixed 3/4-gram shingles
Preventing expensive full-database scans	Used Redis-native LSH banding
Working inside Devvit serverless limits	Built lightweight native indexing structures
Reducing false positives	Added multi-tier moderation thresholds
Maintaining performance over time	Designed automated retention sweeper jobs
Providing moderator transparency	Built a live Reddit-native analytics dashboard

Accomplishments that we're proud of

Achievement	Impact
Built fully native approximate similarity search	Eliminated dependency on external vector infrastructure
Achieved near \(O(1)\) candidate retrieval	Enabled scalable moderation workflows
Designed a launch-ready moderation dashboard	Improved moderator visibility and usability
Added automated retention sweeping	Maintained stable long-term performance
Integrated real-time offender escalation	Reduced manual moderation workload
Combined IR algorithms with moderation tooling	Introduced new capabilities to the Devvit ecosystem

What we learned

Technical Learnings

Topic	Insight
Locality-Sensitive Hashing	Approximate search can outperform heavier AI pipelines for moderation tasks
Redis Data Structures	Sorted sets are extremely powerful for temporal indexing
SimHash	Character-level similarity is highly robust against repost mutation
Devvit Architecture	Native integrations dramatically simplify deployment
Probabilistic Search	Candidate filtering is essential for scalable real-time systems

Product Learnings

Observation	Takeaway
Moderators value reliability	Stability matters more than flashy AI
Explainability matters	Similarity scores improve moderator trust
False positives are expensive	Multi-tier enforcement is critical
Native installs improve adoption	Zero-config deployment reduces friction

What's next for Phantom

Future Feature	Goal
Image perceptual hashing	Detect reposted memes and edited images
Cross-subreddit intelligence	Detect coordinated repost campaigns
Adaptive thresholds	Dynamically tune moderation sensitivity
Hybrid semantic search	Combine SimHash with embedding workflows
Moderator training mode	Learn from moderator feedback
Expanded analytics	Add spam heatmaps and trend analysis
Federated moderation insights	Surface coordinated spam networks