Catches reworded reposts and karma-farm reuploads on Reddit — even when titles are rewritten — using Locality-Sensitive Hashing entirely inside Devvit's native Redis. No external service, no vector DB.
Phantom is a high-performance, fully native Reddit moderation tool built on the Devvit Developer Platform. Reposts and duplicate uploads are the #1 moderation time-sink on Reddit. Karma-farming accounts frequently rephrase or rewrite post titles to evade standard text detectors.
Phantom addresses this by combining SimHash fingerprints with Locality-Sensitive Hashing (LSH). Rather than using fragile exact-string matches or relying on costly external vector databases, Phantom runs entirely inside Devvit's serverless environment and looks up candidate duplicates in milliseconds using Devvit's native Redis sorted sets.
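To make the fingerprinting step concrete, here is a minimal sketch of a 64-bit SimHash over character 3-/4-gram shingles. It is not Phantom's actual code: the function names are hypothetical, every shingle occurrence is assumed to carry weight 1, and the 64-bit FNV-1a hash is computed on four 16-bit limbs only to keep the example free of BigInt.

```typescript
// Illustrative sketch only; helper names are hypothetical, not Phantom's exports.

/** Lowercase, strip URLs and punctuation, collapse whitespace. */
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/https?:\/\/\S+/g, " ")
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

/** Character 3-grams followed by character 4-grams of the given text. */
function shingles(text: string): string[] {
  const out: string[] = [];
  const sizes = [3, 4];
  for (let k = 0; k < sizes.length; k++) {
    const n = sizes[k];
    for (let i = 0; i + n <= text.length; i++) {
      out.push(text.slice(i, i + n));
    }
  }
  return out;
}

/** 64-bit FNV-1a as four 16-bit limbs, lowest limb first. */
function fnv1a64(s: string): number[] {
  // Offset basis 0xcbf29ce484222325 split into 16-bit limbs.
  let a0 = 0x2325, a1 = 0x8422, a2 = 0x9ce4, a3 = 0xcbf2;
  for (let i = 0; i < s.length; i++) {
    a0 ^= s.charCodeAt(i) & 0xff; // input is ASCII after normalize()
    // Multiply by the FNV prime 0x100000001b3 = 2^40 + 0x1b3, modulo 2^64.
    const t0 = a0 * 0x1b3;
    const t1 = a1 * 0x1b3 + (t0 >>> 16);
    const t2 = a2 * 0x1b3 + a0 * 0x100 + (t1 >>> 16);
    const t3 = a3 * 0x1b3 + a1 * 0x100 + (t2 >>> 16);
    a0 = t0 & 0xffff;
    a1 = t1 & 0xffff;
    a2 = t2 & 0xffff;
    a3 = t3 & 0xffff;
  }
  return [a0, a1, a2, a3];
}

/** 64-bit SimHash of a title, returned as a 16-character hex string. */
function simhash(text: string): string {
  const counts: number[] = [];
  for (let b = 0; b < 64; b++) counts.push(0);

  const grams = shingles(normalize(text));
  for (let g = 0; g < grams.length; g++) {
    const limbs = fnv1a64(grams[g]);
    for (let b = 0; b < 64; b++) {
      const bit = (limbs[b >> 4] >> (b & 15)) & 1;
      counts[b] += bit ? 1 : -1; // assumed weight: 1 per shingle occurrence
    }
  }

  // A fingerprint bit is 1 when the weighted vote for that position is positive.
  let hex = "";
  for (let limb = 3; limb >= 0; limb--) {
    let v = 0;
    for (let b = 0; b < 16; b++) {
      if (counts[limb * 16 + b] > 0) v |= 1 << b;
    }
    hex += (0x10000 + v).toString(16).slice(1); // zero-padded to 4 hex digits
  }
  return hex;
}
```

Because each shingle only nudges the per-bit votes, titles that share most of their shingles tend to produce fingerprints that differ in only a few bits, which is what the Hamming-distance thresholds rely on.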
Below is the workflow and execution lifecycle when a post is submitted to the subreddit:
sequenceDiagram
autonumber
actor User as Submitter
participant Reddit as Reddit Core API
participant Phantom as Phantom Trigger
participant Redis as Devvit Redis (Native)
participant Mod as Mod Queue / Modmail
User->>Reddit: Submits new Post
Reddit->>Phantom: Fires PostSubmit Event
Phantom->>Redis: Check alreadySeen (NX Set)
Redis-->>Phantom: Returns false
Phantom->>Phantom: Normalize text (remove URLs, strip punctuation)
Phantom->>Phantom: Generate weighted character 3-/4-gram shingles
Phantom->>Phantom: Compute 64-bit SimHash
Phantom->>Redis: Increment Telemetry Scans
Phantom->>Phantom: Extract 8 Band Keys (8 bits each)
Phantom->>Redis: Query candidates matching bands in lookback window
Redis-->>Phantom: Returns candidate list
loop For each Candidate
Phantom->>Redis: Get Candidate Metadata & Hash
Redis-->>Phantom: Returns Candidate Record
Phantom->>Phantom: Calculate Hamming Distance (bit popcount)
end
alt Hamming Distance <= Auto-Remove Threshold (default <= 5)
Phantom->>Reddit: Auto-remove post
Phantom->>Reddit: Post stickied bot comment (Removed)
Phantom->>Redis: Increment Auto-Removal Telemetry
Phantom->>Redis: Increment Author Dupe Score
else Hamming Distance <= Report Threshold (default <= 15)
Phantom->>Reddit: Report post to mod queue
Phantom->>Reddit: Post stickied bot comment (Reported)
Phantom->>Redis: Increment Mod-Report Telemetry
Phantom->>Redis: Increment Author Dupe Score
end
opt Author Dupe Score >= Escalation Count (default 3)
Phantom->>Reddit: Send Modmail Alert (Repeat Offender)
Phantom->>Redis: Increment Modmail Telemetry
end
Phantom->>Redis: Index post (Hash & Band Sorted Sets)
Phantom->>Reddit: Done
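The comparison and enforcement steps in the diagram can be sketched as follows. This is a hedged illustration, not the project's implementation: fingerprints are assumed to be 16-character hex strings, the names `hammingDistance`, `similarity`, and `decide` are hypothetical, and treating an auto-remove threshold of 0 as "disabled" is one reading of the settings description.

```typescript
type Action = "remove" | "report" | "none";

/** Number of differing bits between two 16-hex-digit (64-bit) fingerprints. */
function hammingDistance(a: string, b: string): number {
  let dist = 0;
  for (let i = 0; i < 16; i += 4) {
    // XOR 16 bits at a time, then count the set bits (popcount).
    let x = parseInt(a.slice(i, i + 4), 16) ^ parseInt(b.slice(i, i + 4), 16);
    while (x !== 0) {
      x &= x - 1; // clears the lowest set bit
      dist++;
    }
  }
  return dist;
}

/** Fraction of matching bits, from 0 (all differ) to 1 (identical). */
function similarity(distance: number): number {
  return 1 - distance / 64;
}

/** Map the closest candidate's distance to a tiered action. */
function decide(
  distance: number,
  autoRemoveThreshold = 5, // documented default
  reportThreshold = 15 // documented default (textThreshold)
): Action {
  // Assumption: a threshold of 0 switches auto-removal off entirely.
  if (autoRemoveThreshold > 0 && distance <= autoRemoveThreshold) return "remove";
  if (distance <= reportThreshold) return "report";
  return "none";
}
```

For example, a candidate at distance 4 would be removed under the defaults, one at distance 12 would be reported, and one at distance 20 would be left alone.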
| Feature | Description |
|---|---|
| Real-time detection | Runs instantly on every new submission using serverless Devvit hooks. |
| SimHash Matching | Character-level 3-gram and 4-gram shingles keep matching tolerant of typos, small edits, and partial rewordings. |
| Locality-Sensitive Hashing | Uses the LSH banding trick (8 bands of 8 bits) to query candidates in near-$O(1)$ time, avoiding slow full-database scans. |
| Tiered Auto-Actions | Auto-removes high-confidence duplicates and reports medium-confidence matches to the mod queue. |
| Sticky Moderation Comments | Posts a stickied, distinguished bot comment on duplicates detailing similarity scores and linking directly to the original post. |
| Modmail Escalation | Monitors repeat duplicate offenders and triggers automated modmail alerts if they exceed the configured count. |
| Interactive Dashboard | Renders a gorgeous live custom post featuring Scans, Caught Duplicates, dynamic SVG histograms of Hamming distance distributions, and top repeat offenders. |
| Administration Menus | Mod-only options to manually find duplicates for any post, check subreddit stats, or deploy the interactive dashboard. |
| Hourly Retention Sweeper | An hourly cron job sweeps expired posts from Redis hashes and LSH indices, keeping memory usage bounded. |
| Installation Backfill | Auto-indexes the last 100 posts upon initial app install so it goes live with active history immediately. |
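As a rough illustration of the LSH banding row above, the sketch below splits a 64-bit fingerprint (assumed here to be a 16-character hex string) into 8 band keys of 8 bits each. The `band:<index>:<bits>` key layout and both function names are hypothetical and are not taken from `src/lib/banding.ts`.

```typescript
/** Split a 16-hex-digit fingerprint into 8 band keys, one per 8-bit band. */
function bandKeys(fingerprintHex: string): string[] {
  const keys: string[] = [];
  for (let band = 0; band < 8; band++) {
    const bits = fingerprintHex.slice(band * 2, band * 2 + 2); // 2 hex digits = 8 bits
    keys.push("band:" + band + ":" + bits); // hypothetical key layout
  }
  return keys;
}

/** True when two fingerprints agree on at least one whole band. */
function sharesBand(a: string, b: string): boolean {
  const ka = bandKeys(a);
  const kb = bandKeys(b);
  for (let i = 0; i < 8; i++) {
    if (ka[i] === kb[i]) return true;
  }
  return false;
}
```

Two posts become candidates for a full Hamming comparison only when they share a band key. With 8 bands, any pair differing in at most 7 bits must agree on at least one band, since 7 flipped bits can touch at most 7 bands. Pairs that are 8 or more bits apart usually still share a band, but that is no longer guaranteed.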
Moderators can easily configure the sensitivity of Phantom from the Reddit App Settings panel:
| Setting Key | Type | Default | Description |
|---|---|---|---|
| `textThreshold` | Number | 15 | Report threshold (Hamming bit distance; lower is stricter). Matches at or below this value are reported. |
| `autoRemoveThreshold` | Number | 5 | Auto-remove threshold. Matches at or below this distance are auto-removed. Set to 0 to disable auto-removals. |
| `modmailEscalationCount` | Number | 3 | Offender limit. Number of caught duplicates by the same author before triggering a modmail ban alert. |
| `lookbackDays` | Number | 30 | Lookback window (in days) to compare submissions against. |
| `ignoreCrossposts` | Boolean | true | Skip analyzing cross-posts from other subreddits. |
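To show how `lookbackDays` could translate into a query window and a retention sweep, here is a small sketch. It uses an in-memory stand-in for a single sorted set rather than the Devvit Redis client, and it assumes that members are post IDs scored by submission time in milliseconds. `TinySortedSet` and `lookbackCutoff` are hypothetical names.

```typescript
interface Entry {
  member: string; // post ID
  score: number; // submission time in ms (assumed scoring scheme)
}

/** Minimal in-memory stand-in for one sorted set; not the Devvit Redis API. */
class TinySortedSet {
  private entries: Entry[] = [];

  add(member: string, score: number): void {
    this.entries = this.entries.filter((e) => e.member !== member);
    this.entries.push({ member: member, score: score });
    this.entries.sort((x, y) => x.score - y.score);
  }

  /** Members whose score lies in [min, max], oldest first. */
  rangeByScore(min: number, max: number): string[] {
    return this.entries
      .filter((e) => e.score >= min && e.score <= max)
      .map((e) => e.member);
  }

  /** Delete members whose score lies in [min, max]; returns how many were removed. */
  removeRangeByScore(min: number, max: number): number {
    const before = this.entries.length;
    this.entries = this.entries.filter((e) => e.score < min || e.score > max);
    return before - this.entries.length;
  }
}

const DAY_MS = 24 * 60 * 60 * 1000;

/** Oldest timestamp still inside the lookback window. */
function lookbackCutoff(nowMs: number, lookbackDays: number): number {
  return nowMs - lookbackDays * DAY_MS;
}
```

Under these assumptions, a candidate query reads `rangeByScore(cutoff, now)`, while the hourly sweeper calls `removeRangeByScore(-Infinity, cutoff - 1)` so that entries older than the window stop consuming memory.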
- Node.js (v18+)
- npm
- Devvit CLI installed and configured (`npm install -g @devvit/cli`)
- Clone the repository and enter it: `git clone https://github.com/omshukla24/Phantom-Mod.git`, then `cd Phantom-Mod`
- Install dependencies: `npm install`
- Log in to your Devvit account: `devvit login`
- Build the application: `npm run build`
- Playtest/deploy the app to your moderated subreddit: `devvit playtest <subreddit-name>`
├── src/
│ ├── handlers/
│ │ ├── dashboard.tsx # Live interactive custom post & statistics UI
│ │ ├── findDuplicates.ts # Context menu action for manual duplicate checking
│ │ ├── onPostSubmit.ts # Main submit trigger pipeline & enforcement actions
│ │ ├── retention.ts # Hourly cron job retention sweeper & install backfill
│ │ └── statsMenu.ts # Context menu action displaying raw telemetry
│ ├── lib/
│ │ ├── banding.ts # LSH 8-bit split banding logic
│ │ ├── hamming.ts # Hamming distance popcount & similarity calculations
│ │ ├── normalize.ts # Regex text normalization & shingle generator
│ │ ├── simhash.ts # FNV-1a weight-accumulating SimHash
│ │ └── store.ts # Redis database key schemas, getters, and setters
│ ├── main.ts # Devvit configuration and handlers import
│ └── settings.ts # App configuration schema definitions
├── tests/ # Algorithmic test suite
└── package.json
This project is licensed under the MIT License - see the LICENSE file for details.