Corporate Narrative Consistency Engine

An end-to-end AI system that detects contradictions in corporate disclosures and executive statements, and measures how those contradictions correlate with stock price movements.

Overview

This system automatically:

Collects corporate filings (SEC 8-K, 10-K, 10-Q) and executive statements from news sources
Extracts structured claims made by companies and executives using NLP
Detects contradictions and narrative shifts across time using Natural Language Inference
Aligns contradiction events with stock price movements
Learns whether contradictions correlate with positive, negative, or neutral stock signals
Provides explainable evidence for each detected signal

Core Research Question

Do contradictions or narrative shifts in corporate disclosures and executive statements predict short-term stock price movements?

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                           DATA INGESTION LAYER                               │
├─────────────────┬─────────────────────────┬─────────────────────────────────┤
│   SEC EDGAR     │    News Sources         │      Market Data                │
│   (8-K, 10-K,   │    (GDELT, NewsAPI,     │      (yfinance)                 │
│    10-Q)        │     RSS Feeds)          │                                 │
└────────┬────────┴───────────┬─────────────┴──────────────┬──────────────────┘
         │                    │                            │
         ▼                    ▼                            ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              NLP PIPELINE                                    │
├─────────────────┬─────────────────────────┬─────────────────────────────────┤
│  Document       │   Claim Extraction      │   Claim Matching                │
│  Parsing        │   (Claude API)          │   (Embeddings)                  │
└────────┬────────┴───────────┬─────────────┴──────────────┬──────────────────┘
         │                    │                            │
         ▼                    ▼                            ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        CONTRADICTION DETECTION                               │
│                    (DeBERTa NLI Model + Temporal Guard)                      │
└─────────────────────────────────┬───────────────────────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           ANALYSIS LAYER                                     │
├─────────────────┬─────────────────────────┬─────────────────────────────────┤
│  Event          │   Market Alignment      │   Feature Engineering           │
│  Generation     │   (Returns, Abnormal)   │                                 │
└────────┬────────┴───────────┬─────────────┴──────────────┬──────────────────┘
         │                    │                            │
         ▼                    ▼                            ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                           MODELING LAYER                                     │
├─────────────────┬─────────────────────────┬─────────────────────────────────┤
│  Baseline       │   Signal Prediction     │   Explainability                │
│  Models         │   (XGBoost)             │   (SHAP)                        │
└────────┬────────┴───────────┬─────────────┴──────────────┬──────────────────┘
         │                    │                            │
         ▼                    ▼                            ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                          EVALUATION LAYER                                    │
├─────────────────┬───────────────┬───────────────┬───────────────────────────┤
│  Metrics        │  Signal Decay │  Robustness   │  Calibration              │
│  (P/R/F1/AUC)   │  Analysis     │  Tests        │  Analysis                 │
└─────────────────┴───────────────┴───────────────┴───────────────────────────┘
                                  │
                                  ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         GRADIO DASHBOARD                                     │
│         (Company Explorer, Contradiction Viewer, Signal Dashboard)           │
└─────────────────────────────────────────────────────────────────────────────┘
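The Claim Matching stage in the diagram pairs claims by embedding similarity before they reach the NLI model. Below is a minimal sketch of that step. It assumes embeddings have already been computed; the embedding model is not specified here. The `min_similarity` cut-off of 0.75 is hypothetical, and `match_claims` and `cosine_similarity` are illustrative names, not the project's API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_claims(
    embeddings: Dict[str, List[float]], min_similarity: float = 0.75
) -> List[Tuple[str, str, float]]:
    """Return (claim_id, claim_id, similarity) pairs worth sending to the NLI model.

    min_similarity is a hypothetical cut-off, not a documented project setting.
    """
    ids = sorted(embeddings)
    pairs = []
    for i, left in enumerate(ids):
        for right in ids[i + 1:]:
            sim = cosine_similarity(embeddings[left], embeddings[right])
            if sim >= min_similarity:
                pairs.append((left, right, sim))
    # Most similar candidate pairs first
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs
```

A real pipeline would probably also restrict candidate pairs to the same company, and possibly the same topic, before comparing embeddings. That keeps the number of comparisons manageable.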

Project Structure

corporate-narrative-engine/
├── .env.template           # Environment variables template
├── .gitignore              # Git ignore rules
├── README.md               # This file
├── requirements.txt        # Python dependencies
├── pyproject.toml          # Project metadata
│
├── configs/                # Configuration files
│   ├── pipeline_config.yaml
│   ├── model_config.yaml
│   └── evaluation_config.yaml
│
├── data/                   # Data storage (gitignored)
│   ├── raw/
│   │   ├── sec_filings/
│   │   └── news_articles/
│   ├── processed/
│   │   ├── claims/
│   │   └── events/
│   └── market_data/
│
├── src/                    # Source code
│   ├── config.py           # Configuration loader
│   ├── database.py         # Database models and connection
│   ├── ingestion/          # Data collection modules
│   ├── parsing/            # Document parsing
│   ├── claim_extraction/   # NLP claim extraction
│   ├── claim_matching/     # Semantic matching
│   ├── contradiction_detection/  # NLI-based detection
│   ├── events/             # Event generation
│   ├── baselines/          # Baseline models
│   ├── modeling/           # Signal prediction
│   ├── evaluation/         # Metrics and analysis
│   ├── advanced_nlp/       # Topic modeling, RAG
│   └── visualization/      # Plotting utilities
│
├── app/                    # Gradio application
│   ├── gradio_app.py       # Main dashboard
│   └── components/         # UI components
│
├── scripts/                # Pipeline execution scripts
├── tests/                  # Unit and integration tests
├── reports/                # Generated reports
│   ├── figures/
│   └── tables/
└── notebooks/              # Jupyter notebooks

Installation

Prerequisites

Python 3.10+
PostgreSQL 14+
CUDA-capable GPU (optional, for faster inference)

Setup

Clone the repository

git clone <repository-url>
cd corporate-narrative-engine

Create a virtual environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

Download spaCy model

python -m spacy download en_core_web_sm

Configure environment variables

cp .env.template .env
# Edit .env with your API keys and database credentials

Set up the database

# Create PostgreSQL database
createdb corporate_narrative_engine

# Run migrations
alembic upgrade head

Configuration

Required API Keys

Service	Purpose	How to Get
Anthropic	Claim extraction	https://console.anthropic.com/
NewsAPI	News articles	https://newsapi.org/register
SEC EDGAR	SEC filings	No key; declare a contact email in the User-Agent header

Configuration Files

Edit configs/pipeline_config.yaml to customize:

# Target companies (tickers)
companies:
  - AAPL
  - MSFT
  - GOOGL
  - TSLA
  - AMZN

# Lookback period (one full year)
lookback:
  start_date: "2025-03-08"
  end_date: "2026-03-08"

# Contradiction thresholds
contradiction:
  high_threshold: 0.8
  medium_threshold: 0.6

# Market reaction thresholds
market_reaction:
  positive_threshold: 0.015   # +1.5%
  negative_threshold: -0.015  # -1.5%
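Below is a minimal sketch of how the thresholds above could be applied, assuming the YAML has already been loaded into plain numbers. `Thresholds`, `contradiction_severity`, and `market_reaction` are hypothetical names. Two further points are assumptions: a value exactly on a threshold counts as crossing it (`>=` here), and returns are expressed as fractions (-0.042 for -4.2%).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Defaults mirror configs/pipeline_config.yaml
    high_contradiction: float = 0.8
    medium_contradiction: float = 0.6
    positive_return: float = 0.015
    negative_return: float = -0.015

    def __post_init__(self) -> None:
        if not 0.0 <= self.medium_contradiction < self.high_contradiction <= 1.0:
            raise ValueError("expected 0 <= medium_threshold < high_threshold <= 1")
        if not self.negative_return < 0.0 < self.positive_return:
            raise ValueError("expected negative_threshold < 0 < positive_threshold")


def contradiction_severity(score: float, t: Thresholds) -> str:
    """Bucket an NLI contradiction score into high / medium / low."""
    if score >= t.high_contradiction:
        return "high"
    if score >= t.medium_contradiction:
        return "medium"
    return "low"


def market_reaction(ret: float, t: Thresholds) -> str:
    """Label a return (as a fraction) as positive, negative, or neutral."""
    if ret >= t.positive_return:
        return "positive"
    if ret <= t.negative_return:
        return "negative"
    return "neutral"
```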

Usage

Run Full Pipeline

python scripts/run_full_pipeline.py

Run Individual Steps

# 1. Data ingestion
python scripts/run_ingestion.py

# 2. Claim extraction
python scripts/run_extraction.py

# 3. Contradiction detection
python scripts/run_contradiction_detection.py

# 4. Train baselines
python scripts/run_baselines.py

# 5. Train signal model
python scripts/run_training.py

# 6. Evaluate
python scripts/run_evaluation.py

Launch Dashboard

python app/gradio_app.py

Then open http://localhost:7860 in your browser.

Dashboard Features

Company Explorer

View all extracted claims for a company
Timeline visualization of statements over time
Filter by topic, speaker, and source

Contradiction Events

Side-by-side comparison of contradicting claims
Contradiction score and NLI breakdown
Price reaction chart showing market impact

Signal Dashboard

Active signals with confidence levels
Historical signal accuracy
Win rate and return statistics

Evaluation Reports

Model vs baseline comparison
ROC and precision-recall curves
Signal decay analysis
Robustness by topic and document type
Calibration curves

Key Concepts

Claim Extraction

Statements are extracted from documents and structured as:

{
  "claim_id": "c001",
  "company": "Tesla",
  "speaker": "Elon Musk",
  "speaker_role": "CEO",
  "claim_text": "Cybertruck production will reach 250,000 units by end of 2025",
  "topic": "projects",
  "direction": "positive",
  "timestamp": "2025-05-10",
  "source": "news"
}
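Below is a minimal sketch of validating one extracted claim record before it is stored. The `Claim` class is hypothetical. The accepted `direction` values other than "positive" are an assumption, since only "positive" appears in the example above. Parsing `timestamp` into a `date` lets claims be ordered in time for the temporal guard.

```python
import json
from dataclasses import dataclass
from datetime import date

# Assumed set of directions; only "positive" is shown in the README example
VALID_DIRECTIONS = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class Claim:
    claim_id: str
    company: str
    speaker: str
    speaker_role: str
    claim_text: str
    topic: str
    direction: str
    timestamp: date
    source: str

    @classmethod
    def from_json(cls, raw: str) -> "Claim":
        data = json.loads(raw)
        if data["direction"] not in VALID_DIRECTIONS:
            raise ValueError(f"unknown direction: {data['direction']!r}")
        # ISO dates such as "2025-05-10" become datetime.date objects
        data["timestamp"] = date.fromisoformat(data["timestamp"])
        # Unknown or missing keys raise TypeError here, which keeps records strict
        return cls(**data)
```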

Contradiction Detection

Uses Natural Language Inference (NLI) to classify relationships:

Entailment: Later claim confirms earlier claim
Neutral: Claims are unrelated or compatible
Contradiction: Later claim contradicts earlier claim
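In practice a DeBERTa NLI checkpoint would produce three class probabilities for each (earlier, later) pair. The sketch below covers only the post-processing step. It assumes those probabilities are already available, with the earlier claim as premise and the later claim as hypothesis. It also assumes the contradiction score is the probability of the contradiction class. `classify_pair` and `NLIResult` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict

NLI_LABELS = ("entailment", "neutral", "contradiction")


@dataclass(frozen=True)
class NLIResult:
    label: str
    contradiction_score: float


def classify_pair(earlier: date, later: date, probs: Dict[str, float]) -> NLIResult:
    """Label an (earlier premise, later hypothesis) pair from NLI class probabilities."""
    # Temporal guard: a claim may only be compared with a strictly later claim
    if not earlier < later:
        raise ValueError("earlier claim must strictly precede the later claim")
    missing = set(NLI_LABELS) - set(probs)
    if missing:
        raise ValueError(f"missing NLI classes: {sorted(missing)}")
    if abs(sum(probs[k] for k in NLI_LABELS) - 1.0) > 1e-6:
        raise ValueError("class probabilities must sum to 1")
    # The predicted relationship is the most probable class
    label = max(NLI_LABELS, key=lambda k: probs[k])
    return NLIResult(label=label, contradiction_score=probs["contradiction"])
```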

Signal Generation

Signals are generated when:

Contradiction score > 0.7 AND model confidence > 0.6 → Bearish Alert
Contradiction score < 0.3 AND positive direction → Bullish Consistency
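The two rules above translate directly into a function. This sketch assumes `later_direction` is the direction of the later claim and that the rules are checked in the order listed. `generate_signal` is a hypothetical name.

```python
from typing import Optional


def generate_signal(
    contradiction_score: float, model_confidence: float, later_direction: str
) -> Optional[str]:
    """Apply the two signal rules in order; return None when neither fires."""
    if contradiction_score > 0.7 and model_confidence > 0.6:
        return "Bearish Alert"
    if contradiction_score < 0.3 and later_direction == "positive":
        return "Bullish Consistency"
    return None
```

Scores between 0.3 and 0.7, or high scores with low model confidence, produce no signal.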

Evaluation Methodology

Temporal Splitting (No Leakage)

Data is split chronologically:

Train: 67% (oldest)
Validation: 16%
Test: 17% (newest)

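Never use random splits for financial time series. Shuffling places later observations in the training set, which leaks future information into the model.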
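Below is a minimal chronological split, assuming each sample carries a single event date. `temporal_split` is a hypothetical helper. Boundaries are rounded, so for small datasets the counts can differ by one from the stated percentages.

```python
from datetime import date
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def temporal_split(
    items: Sequence[T],
    dates: Sequence[date],
    train_frac: float = 0.67,
    val_frac: float = 0.16,
) -> Tuple[List[T], List[T], List[T]]:
    """Split items chronologically: oldest -> train, middle -> validation, newest -> test."""
    if len(items) != len(dates):
        raise ValueError("items and dates must have the same length")
    order = sorted(range(len(items)), key=lambda i: dates[i])
    ordered = [items[i] for i in order]
    n = len(ordered)
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]
```

Events that share a date can straddle a boundary. A stricter version would split on date cut-offs instead of counts.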

Metrics

Category	Metrics
Classification	Precision, Recall, F1, ROC-AUC
Decision	Precision@K, False Alarm Rate
Financial	Abnormal Returns, Win Rate
Calibration	Brier Score, ECE
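Below are sketches of three of the less familiar metrics. They assume binary labels (1 = the outcome of interest, for example a negative market reaction) and predicted probabilities for that class. The ECE variant uses 10 equal-width bins over the positive-class probability, which is one common definition among several. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """Weighted mean gap between observed positive rate and mean predicted probability per bin."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if b:
            mean_prob = sum(p for p, _ in b) / len(b)
            pos_rate = sum(y for _, y in b) / len(b)
            ece += (len(b) / n) * abs(pos_rate - mean_prob)
    return ece


def precision_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of true positives among the k highest-scored events."""
    top = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)[:k]
    return sum(y for _, y in top) / k
```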

Baselines

Keyword Rules: Pattern-based bearish/bullish detection
Sentiment Only: FinBERT sentiment on latest statement
Bag-of-Words: TF-IDF + Logistic Regression
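Below is a minimal sketch of the keyword-rules baseline. The pattern lists are hypothetical placeholders, not the project's actual rules. The baseline counts bearish and bullish cue matches and labels the statement by whichever count is larger.

```python
import re
from typing import List

# Hypothetical cue patterns; the project's real rule set is not documented here
BEARISH_PATTERNS: List[str] = [
    r"\brevised (?:down|lower)\b",
    r"\bdelay(?:ed|s)?\b",
    r"\bsupply chain constraints\b",
    r"\bshortfall\b",
    r"\bimpairment\b",
]
BULLISH_PATTERNS: List[str] = [
    r"\bon track\b",
    r"\bexceed(?:ed|s)?\b",
    r"\brecord\b",
    r"\braised? guidance\b",
]


def count_matches(text: str, patterns: List[str]) -> int:
    """Total number of non-overlapping matches across all patterns."""
    return sum(len(re.findall(p, text)) for p in patterns)


def keyword_baseline(text: str) -> str:
    """Label a statement bearish, bullish, or neutral by counting cue matches."""
    lowered = text.lower()
    bearish = count_matches(lowered, BEARISH_PATTERNS)
    bullish = count_matches(lowered, BULLISH_PATTERNS)
    if bearish > bullish:
        return "bearish"
    if bullish > bearish:
        return "bullish"
    return "neutral"
```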

Example Output

Company: Tesla (TSLA)
Event Date: 2025-08-15

Earlier Claim (2025-05-10, CEO Interview):
  "Cybertruck production will reach 250,000 units by end of 2025"

Later Claim (2025-08-15, 10-Q Filing):
  "Cybertruck production targets revised to 125,000 units due to
   supply chain constraints"

Contradiction Score: 0.87
Topic: projects
Speaker: CEO

Market Reaction:
  Next-day return: -4.2%
  Abnormal return: -3.8%

Model Signal: Bearish Alert
Confidence: 0.81

Explanation:
  Strong contradiction between CEO projection and subsequent filing.
  Historical analysis shows 76% of similar project-related
  contradictions preceded negative returns.
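The abnormal return in the example can be read as the stock's return minus a benchmark's return over the same day, which is a market-adjusted model. This sketch assumes that model. The benchmark choice is not specified here, and the prices in the test are hypothetical, chosen so the numbers reproduce the -4.2% and -3.8% above. A market model with an estimated beta is a common alternative.

```python
from typing import Sequence


def simple_return(prev_close: float, close: float) -> float:
    """Close-to-close return as a fraction (e.g. -0.042 for -4.2%)."""
    return close / prev_close - 1.0


def market_adjusted_abnormal_return(
    stock_closes: Sequence[float], benchmark_closes: Sequence[float]
) -> float:
    """Next-day abnormal return: stock return minus benchmark return.

    Each sequence holds [close on event day, close on next trading day].
    """
    if len(stock_closes) != 2 or len(benchmark_closes) != 2:
        raise ValueError("expected exactly two closes per series")
    stock_ret = simple_return(stock_closes[0], stock_closes[1])
    bench_ret = simple_return(benchmark_closes[0], benchmark_closes[1])
    return stock_ret - bench_ret
```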

Constraints and Safeguards

Temporal Ordering: All comparisons enforce earlier.timestamp < later.timestamp
No Test Leakage: Thresholds tuned only on train/validation data
LLM Isolation: Claude prompts never contain future prices or labels
Minimum Samples: Require n ≥ 30 per category for statistical validity
Statistical Significance: Report p-values and 95% CIs
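Below is a minimal sketch of the minimum-sample and significance checks, assuming per-event returns are independent. It uses a normal approximation for the 95% CI and the p-value, which is reasonable at n ≥ 30; a t-distribution would be slightly more conservative. `summarize_returns` is a hypothetical name.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple

MIN_SAMPLES = 30  # mirrors the "n >= 30 per category" rule


def summarize_returns(returns: Sequence[float]) -> Tuple[float, Tuple[float, float], float]:
    """Mean return, normal-approximation 95% CI, and two-sided p-value for H0: mean == 0."""
    n = len(returns)
    if n < MIN_SAMPLES:
        raise ValueError(f"need at least {MIN_SAMPLES} samples, got {n}")
    m = mean(returns)
    se = stdev(returns) / math.sqrt(n)
    if se == 0.0:
        # Degenerate case: every return is identical
        return m, (m, m), (1.0 if m == 0.0 else 0.0)
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(m) / se))
    return m, (m - z * se, m + z * se), p_value
```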

Development

Running Tests

pytest tests/ -v --cov=src

Code Style

black src/ app/ scripts/
isort src/ app/ scripts/
flake8 src/ app/ scripts/

Type Checking

mypy src/

Troubleshooting

Common Issues

Database connection failed

Ensure PostgreSQL is running
Check credentials in .env
Verify database exists: psql -l | grep corporate_narrative_engine

SEC rate limiting

The system stays under SEC EDGAR's limit of 10 requests per second
If your access is blocked, wait 10 minutes and retry
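Below is a minimal sketch of client-side throttling that keeps requests at or below 10 per second by spacing calls at least 0.1 s apart. `RateLimiter` is a hypothetical class, not part of any SEC client library. The clock and sleep functions are injectable so the limiter can be tested without waiting.

```python
import time
from typing import Callable


class RateLimiter:
    """Space calls at least 1 / max_per_second seconds apart (hypothetical helper)."""

    def __init__(
        self,
        max_per_second: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # the first call never waits

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve the following slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
```

Call `wait()` before each EDGAR request. The same class with `max_per_second=1.0` would keep Claude calls within a 60 requests/minute limit.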

Out of memory during NLI

Reduce batch size in configs/model_config.yaml
Run on CPU instead of GPU by setting TORCH_DEVICE=cpu

Claude API errors

Check API key in .env
Verify account has credits
Check rate limits (default: 60 req/min)

License

MIT License. See the LICENSE file for details.

Contributing

Fork the repository
Create a feature branch
Make changes with tests
Submit a pull request

Acknowledgments

SEC EDGAR for free access to corporate filings
Anthropic for Claude API
HuggingFace for transformer models
GDELT Project for news data
