OSU Campus Energy Investment Prioritization Tool

A data-driven decision-support platform for identifying which campus buildings should be prioritized for energy-efficiency capital investment. Built for the OSU AI Hackathon 2025.

The tool analyzes 60 days of smart meter data (September–October 2025) across 286 OSU campus buildings, combines it with weather and building metadata, and produces a transparent, ranked shortlist with plain-language explanations and estimated savings opportunities.

Project Structure

```
osu_energy/
├── app.py                  # 4-page Streamlit application
├── data_loader.py          # Data ingestion, joining, and daily aggregation
├── feature_engineering.py  # Investment signal computation
├── models.py               # Portfolio clustering and per-building time-series regression
├── scoring.py              # Composite score, normalization, ranking, action recommendations
├── utils.py                # Shared utility functions (infrastructure filter)
├── requirements.txt        # Python dependencies
└── PRD.md                  # Product Requirements Document
```

Requirements

Python 3.10+ is required.

Dependencies (listed in `requirements.txt`):

```text
streamlit>=1.35.0
pandas>=2.0.0
numpy>=1.26.0
scikit-learn>=1.4.0
plotly>=5.20.0
scipy>=1.13.0
```

Install with:

```bash
pip install -r requirements.txt
```

Data Setup

Place the following four files in your ~/Downloads/ directory before running the app:

| File | Description |
|---|---|
| `meter-data-sept-2025.csv` | 15-minute interval meter readings, September 2025 |
| `meter-data-oct-2025.csv` | 15-minute interval meter readings, October 2025 |
| `weather-sept-oct-2025.csv` | Hourly weather data from the Open-Meteo API |
| `building_metadata.csv` | SIMS building database (size, age, location) |

On startup, the data pipeline concatenates the two meter files and joins them with the weather data and building metadata automatically. No manual preprocessing is required.

Running the App

```bash
streamlit run app.py
```

The first load takes approximately 30 seconds while the full pipeline runs. After that, page navigation is near-instant because the pipeline results are cached with Streamlit's `@st.cache_data`.

Application Pages

1. Portfolio Overview

A campus-wide snapshot of energy performance across all 286 buildings.

- Summary metrics: buildings analyzed, meter readings processed, data window
- Top 20 buildings by priority score, color-coded by confidence tier
- Energy intensity vs. building age scatter plot (bubble size = gross square footage)
- Portfolio cluster map: KMeans archetypes projected to 2D via PCA

2. Building Prioritization

The primary decision-support page. Produces a ranked shortlist with adjustable signal weights.

- Signal weight sliders (auto-renormalized to 100%)
- Utility selector (ELECTRICITY, HEAT, GAS)
- Infrastructure building filter (enabled by default)
- Ranked table with priority score, confidence tier, age, area, and recommended next step
- Expandable "Why This Building?" section per building, with a signal breakdown chart and a plain-language explanation
- Estimated savings opportunity (assumes a 20% load reduction at $0.13/kWh, electricity only)
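The savings estimate is simple arithmetic. The sketch below is illustrative only: it assumes the 60-day average daily electricity use is annualized over 365 days, which is an assumption (the tool's actual annualization is not documented here), and `estimate_savings` is a hypothetical helper.

```python
REDUCTION = 0.20         # 20% load reduction (from the README)
RATE_USD_PER_KWH = 0.13  # electricity rate (from the README)


def estimate_savings(avg_daily_kwh: float, years: int = 3) -> dict:
    """Illustrative savings for one ELECTRICITY meter.

    Assumption: annual use = average daily kWh x 365 (hypothetical annualization).
    The multi-year figure is a linear extrapolation with no discount rate.
    """
    annual_kwh = avg_daily_kwh * 365
    annual_usd = annual_kwh * REDUCTION * RATE_USD_PER_KWH
    return {
        "annual_kwh_saved": round(annual_kwh * REDUCTION, 2),
        "annual_usd": round(annual_usd, 2),
        f"{years}yr_usd": round(annual_usd * years, 2),
    }


print(estimate_savings(1000.0))
```

Under these assumptions, a building averaging 1,000 kWh/day yields about 73,000 kWh and $9,490 saved per year, or $28,470 over three years, undiscounted.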

3. Building Deep Dive

Per-building time-series analysis.

- 60-day time series of actual vs. weather-predicted energy use
- Temperature overlay on a secondary axis
- Isolation Forest anomaly detection, which flags statistically unusual days on the chart
- Signal scorecard and recommended action
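A minimal sketch of per-building anomaly flagging with scikit-learn's `IsolationForest` at `contamination=0.10`, the setting listed in the methods table. The feature set (daily kWh/sqft plus a hypothetical `temp_f` column) and the synthetic data are assumptions for illustration, not the app's actual inputs.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest


def flag_anomalous_days(daily: pd.DataFrame, contamination: float = 0.10) -> pd.Series:
    """Boolean Series marking unusual days for one building.

    Assumed columns: 'daily_kwh_per_sqft' and 'temp_f' (hypothetical feature set).
    """
    features = daily[["daily_kwh_per_sqft", "temp_f"]].to_numpy()
    model = IsolationForest(contamination=contamination, random_state=0)
    labels = model.fit_predict(features)  # -1 = anomaly, 1 = normal
    return pd.Series(labels == -1, index=daily.index, name="is_anomaly")


# Synthetic 60-day series with one injected spike (illustrative data only).
rng = np.random.default_rng(0)
idx = pd.date_range("2025-09-01", periods=60, freq="D")
temp = 70 + 10 * np.sin(np.arange(60) / 10)
demo = pd.DataFrame(
    {"temp_f": temp, "daily_kwh_per_sqft": 0.05 + 0.001 * temp + rng.normal(0, 0.002, 60)},
    index=idx,
)
demo.loc[idx[30], "daily_kwh_per_sqft"] = 1.0  # spike resembling a sensor fault
flags = flag_anomalous_days(demo)
print(int(flags.sum()), "of", len(flags), "days flagged")
```

With `contamination=0.10`, roughly one day in ten is flagged by construction, so flags indicate relative unusualness rather than confirmed faults.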

4. Methodology & Limitations

Full transparency page documenting signals, models, assumptions, and limitations.

- Signal definitions and model descriptions
- Confidence tier criteria
- KMeans elbow curve (k=2 through k=10) justifying k=5
- Weight sensitivity analysis: Spearman rank correlation across 300 random weight sets
- Stated assumptions and known limitations
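The weight sensitivity analysis can be sketched as follows. This assumes a matrix of signals already min-max scaled to [0, 1] and draws each random weight set from a flat Dirichlet distribution so it sums to 1; the tool's actual sampling scheme is an assumption here. Each set re-scores the buildings, and `scipy.stats.spearmanr` compares that ordering with the default-weight ordering.

```python
import numpy as np
from scipy.stats import spearmanr

DEFAULT_WEIGHTS = np.array([0.30, 0.25, 0.20, 0.15, 0.10])


def weight_sensitivity(norm_signals: np.ndarray, n_trials: int = 300, seed: int = 0) -> np.ndarray:
    """Spearman rho between default-weight scores and scores under random weights.

    norm_signals: (n_buildings, 5) array of normalized signals.
    Random weights come from a flat Dirichlet (assumption), so each set sums to 1.
    """
    rng = np.random.default_rng(seed)
    baseline = norm_signals @ DEFAULT_WEIGHTS
    rhos = np.empty(n_trials)
    for i in range(n_trials):
        weights = rng.dirichlet(np.ones(norm_signals.shape[1]))
        rho, _ = spearmanr(baseline, norm_signals @ weights)
        rhos[i] = rho
    return rhos


signals = np.random.default_rng(1).random((50, 5))  # synthetic stand-in for 50 buildings
rhos = weight_sensitivity(signals)
print(f"median rho = {np.median(rhos):.2f}, min rho = {rhos.min():.2f}")
```

A median rho close to 1 would suggest the shortlist is not overly sensitive to the exact slider settings.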

Data Pipeline

Stage 1 — Ingestion (`data_loader.py`)

- Loads and concatenates the September and October meter CSVs
- Filters to energy utilities only: ELECTRICITY, HEAT, GAS, STEAM
- Joins meter data to hourly weather on the reading timestamp truncated to the hour
- Joins meter data to building metadata on SIMS building number
- Computes `vintage_age = 2025 - construction_year`
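A minimal pandas sketch of Stage 1. `load_joined` reads the four files from `~/Downloads/` as described in Data Setup; the column names (`readingtime`, `utility`, `simsbuildingnumber`, `timestamp`, `buildingnumber`, `construction_year`) are hypothetical placeholders, and the real CSV headers may differ.

```python
from pathlib import Path

import pandas as pd

ENERGY_UTILITIES = ["ELECTRICITY", "HEAT", "GAS", "STEAM"]
DATA_DIR = Path.home() / "Downloads"


def load_joined(data_dir: Path = DATA_DIR) -> pd.DataFrame:
    """Read the four input files and join them (file names from the README)."""
    meter = pd.concat(
        [pd.read_csv(data_dir / "meter-data-sept-2025.csv"),
         pd.read_csv(data_dir / "meter-data-oct-2025.csv")],
        ignore_index=True,
    )
    weather = pd.read_csv(data_dir / "weather-sept-oct-2025.csv")
    meta = pd.read_csv(data_dir / "building_metadata.csv")
    return join_sources(meter, weather, meta)


def join_sources(meter: pd.DataFrame, weather: pd.DataFrame, meta: pd.DataFrame) -> pd.DataFrame:
    """Filter to energy utilities, then join weather and metadata (hypothetical column names)."""
    meter = meter[meter["utility"].isin(ENERGY_UTILITIES)].copy()
    # Truncate 15-minute timestamps to the hour so they line up with hourly weather rows.
    meter["hour"] = pd.to_datetime(meter["readingtime"]).dt.floor("h")
    weather = weather.assign(hour=pd.to_datetime(weather["timestamp"]).dt.floor("h"))
    out = meter.merge(weather.drop(columns="timestamp"), on="hour", how="left")
    out = out.merge(meta, left_on="simsbuildingnumber", right_on="buildingnumber", how="left")
    out["vintage_age"] = 2025 - out["construction_year"]
    return out
```

Left joins keep every meter reading even when a weather hour or metadata row is missing, which leaves gaps visible as NaN for later confidence scoring.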

Stage 2 — Daily Aggregation (`data_loader.py`)

- Aggregates 15-minute readings to one row per building, utility, and date
- Uses `readingwindowsum` (the true daily total across all 96 intervals), not `readingvalue`
- Applies IQR outlier clipping per building + utility (3×IQR fence) to suppress sensor-fault spikes
- Computes `daily_kwh_per_sqft` for size-normalized comparisons
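The clipping and normalization steps can be sketched in pandas. This starts from a frame that already has one row per building, utility, and date; the column names `building`, `utility`, `daily_kwh`, and `gross_sqft` are hypothetical.

```python
import pandas as pd


def clip_iqr(series: pd.Series, k: float = 3.0) -> pd.Series:
    """Clip values outside the fence [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


def clean_daily(daily: pd.DataFrame) -> pd.DataFrame:
    """Clip per building + utility, then normalize by floor area (assumed column names)."""
    out = daily.copy()
    out["daily_kwh"] = out.groupby(["building", "utility"])["daily_kwh"].transform(clip_iqr)
    out["daily_kwh_per_sqft"] = out["daily_kwh"] / out["gross_sqft"]
    return out
```

Clipping caps a faulty reading at the fence instead of dropping the day, so the 60-day series keeps its length.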

Stage 3 — Signal Engineering (`feature_engineering.py`)

Five signals are computed for each building × utility combination:

| Signal | What It Measures |
|---|---|
| Energy Intensity (kWh/sqft) | Baseline consumption normalized for building size |
| Unexplained Deviation (RMSE) | Energy use not explained by weather or building age |
| Peer Group Excess (z-score) | How far above similar-sized, similar-aged buildings |
| Weather Sensitivity (kWh/sqft/°F) | Strength of HVAC response to temperature |
| Load Variability (CV) | Erratic or unstable daily consumption patterns |
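Four of the five signals can be computed from one building's daily rows with an OLS fit, as sketched below. The sketch regresses daily kWh/sqft on temperature only, because age is constant within a single building's series and adds nothing to a per-building fit. Using the mean for intensity and the hypothetical columns `temp_f`, `daily_kwh`, and `daily_kwh_per_sqft` are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def building_signals(daily: pd.DataFrame) -> dict:
    """Signals for one building x utility from its daily rows (assumed column names)."""
    x = daily[["temp_f"]].to_numpy()
    y = daily["daily_kwh_per_sqft"].to_numpy()
    model = LinearRegression().fit(x, y)
    residuals = y - model.predict(x)
    kwh = daily["daily_kwh"]
    return {
        "energy_intensity": float(y.mean()),                             # mean kWh/sqft/day
        "unexplained_deviation": float(np.sqrt(np.mean(residuals**2))),  # RMSE of the fit
        "weather_sensitivity": float(model.coef_[0]),                    # kWh/sqft per °F
        "r2": float(model.score(x, y)),                                  # feeds the confidence tier
        "load_variability": float(kwh.std() / kwh.mean()),               # coefficient of variation
    }
```

Peer Group Excess needs the whole portfolio rather than one building, so it is computed separately after peer groups are formed.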

Each building also receives a confidence tier:

| Tier | Criteria |
|---|---|
| High | ≥ 45 days of data, R² > 0.3, < 10% missing readings |
| Medium | ≥ 25 days of data OR R² > 0.1 |
| Low | Anything below Medium |

A plausibility check removes buildings with a median daily intensity above 50 kWh/sqft/day — a threshold that indicates a likely unit error (e.g., Wh logged as kWh) rather than genuine consumption.
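The tier rules and the plausibility check translate directly into code. A minimal sketch using the thresholds from the tier table and the 50 kWh/sqft/day cutoff; the `building` and `daily_kwh_per_sqft` column names are hypothetical.

```python
import pandas as pd


def confidence_tier(n_days: int, r2: float, missing_frac: float) -> str:
    """Map data coverage and regression fit to a tier (thresholds from the table)."""
    if n_days >= 45 and r2 > 0.3 and missing_frac < 0.10:
        return "High"
    if n_days >= 25 or r2 > 0.1:
        return "Medium"
    return "Low"


def drop_implausible(daily: pd.DataFrame, max_median: float = 50.0) -> pd.DataFrame:
    """Drop buildings whose median daily kWh/sqft exceeds max_median (likely unit error)."""
    medians = daily.groupby("building")["daily_kwh_per_sqft"].median()
    keep = medians[medians <= max_median].index
    return daily[daily["building"].isin(keep)]
```

Using the median rather than the mean keeps a few spiky days from removing an otherwise plausible building.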

Stage 4 — Portfolio Clustering (`models.py`)

- KMeans (k=5) on all five normalized signals, one row per building
- PCA (2 components) for visualization
- Cluster archetypes assigned by centroid ranking:
  - High Load + Weather-Driven (Priority)
  - High Baseline Load (Priority)
  - Erratic / Unstable Load (Investigate)
  - Efficient / Low Load (Reference)
  - Moderate Load (Monitor)
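A minimal scikit-learn sketch of the clustering step, including the k=2 to k=10 elbow curve shown on the Methodology page. It assumes the input is already a (buildings × 5) matrix of normalized signals; `n_init=10` and `random_state=0` are assumed settings, and the centroid-ranking rule that names archetypes is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def elbow_inertias(x: np.ndarray, k_values=range(2, 11), seed: int = 0) -> dict:
    """Within-cluster sum of squares (inertia) for each candidate k."""
    return {k: KMeans(n_clusters=k, n_init=10, random_state=seed).fit(x).inertia_ for k in k_values}


def cluster_portfolio(x: np.ndarray, k: int = 5, seed: int = 0):
    """Fit KMeans on normalized signals and project to 2D with PCA for plotting."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(x)
    coords = PCA(n_components=2).fit_transform(x)
    return km.labels_, km.cluster_centers_, coords
```

PCA is used only for the 2D picture; cluster membership comes from KMeans in the full five-signal space.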

Stage 5 — Scoring (`scoring.py`)

Each signal is min-max normalized to [0, 1] within utility type. Infrastructure buildings are excluded from the normalization scale to prevent them from compressing campus building scores.
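A minimal sketch of per-utility min-max scaling where only non-infrastructure rows set the scale. The boolean `is_infrastructure` column is hypothetical, and clipping out-of-range values (such as infrastructure) to [0, 1] is an assumption about how those rows are handled.

```python
import pandas as pd


def minmax_by_utility(df: pd.DataFrame, signal: str) -> pd.Series:
    """Scale one signal to [0, 1] within each utility using campus (non-infrastructure) rows.

    Values outside the campus range are clipped to [0, 1] (assumption).
    """
    campus = df[~df["is_infrastructure"]]
    lo = df["utility"].map(campus.groupby("utility")[signal].min())
    hi = df["utility"].map(campus.groupby("utility")[signal].max())
    span = (hi - lo).where(hi > lo)  # NaN when every campus value in a utility is equal
    return ((df[signal] - lo) / span).clip(0, 1).fillna(0.0)
```

Because the scale comes from campus buildings alone, a substation 30 times larger than any campus building cannot push the rest of the portfolio toward zero.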

Composite score formula:

```text
score = 100 × (
    0.30 × norm(energy_intensity)
  + 0.25 × norm(unexplained_deviation)
  + 0.20 × norm(peer_excess)
  + 0.15 × norm(|weather_sensitivity|)
  + 0.10 × norm(load_variability)
)
```

Weights are user-adjustable via the Building Prioritization sliders.
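The composite formula with slider renormalization can be sketched as a plain function. It assumes the inputs are already normalized, with the weather entry holding the normalized absolute sensitivity; `composite_score` is a hypothetical name.

```python
DEFAULT_WEIGHTS = {
    "energy_intensity": 0.30,
    "unexplained_deviation": 0.25,
    "peer_excess": 0.20,
    "weather_sensitivity": 0.15,  # applied to norm(|weather_sensitivity|)
    "load_variability": 0.10,
}


def composite_score(norm: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """0-100 score from signals already min-max scaled to [0, 1].

    Weights are divided by their sum, mirroring sliders that auto-renormalize to 100%.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return 100 * sum(w / total * norm[name] for name, w in weights.items())
```

Renormalizing means only the relative slider positions matter: doubling every weight leaves every score unchanged.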

Action recommendations by score:

| Score | Recommendation |
|---|---|
| ≥ 70 (High/Medium confidence) | Full Energy Audit |
| ≥ 50 | Targeted Investigation |
| ≥ 30 | Monitor |
| < 30 | No Immediate Action |
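The table maps to a short chain of checks. In this sketch, a score of 70 or more with Low confidence falls through to Targeted Investigation; that fallback is an assumption, since the table does not state it.

```python
def recommend(score: float, confidence: str) -> str:
    """Map a 0-100 priority score and confidence tier to a next step.

    Assumption: a score >= 70 with Low confidence falls through to the next band.
    """
    if score >= 70 and confidence in {"High", "Medium"}:
        return "Full Energy Audit"
    if score >= 50:
        return "Targeted Investigation"
    if score >= 30:
        return "Monitor"
    return "No Immediate Action"
```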

AI / ML Methods

| Method | Library | Purpose |
|---|---|---|
| OLS Regression (per building) | `sklearn.linear_model.LinearRegression` | Model expected energy from weather + age; extract RMSE and sensitivity |
| KMeans (peer grouping) | `sklearn.cluster.KMeans` (k=5) | Group buildings by size and age for fair peer comparison |
| KMeans (portfolio clustering) | `sklearn.cluster.KMeans` (k=5) | Group buildings by all 5 signals into behavioral archetypes |
| PCA | `sklearn.decomposition.PCA` (n=2) | Project portfolio clusters to 2D for visualization |
| Isolation Forest | `sklearn.ensemble.IsolationForest` (contamination=0.10) | Flag anomalous days per building in the Deep Dive |
| Min-Max Normalization | Manual (per utility) | Normalize signals to [0, 1] before scoring |
| Spearman Correlation | `scipy.stats.spearmanr` | Validate ranking stability across 300 random weight sets |
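The peer-group excess signal combines the peer-grouping KMeans with a within-group z-score. A minimal sketch: log-scaling square footage, standardizing both features, and the column names `gross_sqft`, `vintage_age`, and `energy_intensity` are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def peer_excess(df: pd.DataFrame, k: int = 5, seed: int = 0) -> pd.Series:
    """z-score of each building's intensity within its size/age peer group."""
    features = np.column_stack([np.log(df["gross_sqft"]), df["vintage_age"]])
    groups = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(
        StandardScaler().fit_transform(features)
    )
    by_group = df["energy_intensity"].groupby(groups)
    mean = by_group.transform("mean")
    std = by_group.transform("std").replace(0, np.nan)
    # Single-member or zero-spread groups get 0 (no measurable excess).
    return ((df["energy_intensity"] - mean) / std).fillna(0.0)
```

Positive values mean a building uses more energy per square foot than its peers of similar size and age.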

Key Design Decisions

Why OLS regression instead of a more complex model? Interpretability. OLS coefficients have a direct meaning: a facilities manager can see that a building's energy use changes by X kWh/sqft per degree Fahrenheit. A black-box model might predict more accurately, but it cannot explain its reasoning.

Why normalize by square footage? Ranking on raw kWh would always push the largest buildings to the top. kWh/sqft makes comparisons fair regardless of building size.

Why score utilities separately? Electricity and heat operate on different absolute scales and cannot be meaningfully compared directly. Each utility is scored independently; scores are then averaged at the building level.
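The building-level roll-up is a grouped mean. A minimal sketch, assuming hypothetical `building`, `utility`, and `score` columns and an unweighted average across utilities:

```python
import pandas as pd


def building_scores(utility_scores: pd.DataFrame) -> pd.Series:
    """Average per-utility priority scores into one score per building, highest first."""
    return utility_scores.groupby("building")["score"].mean().sort_values(ascending=False)
```

For example, a building scoring 80 on ELECTRICITY and 60 on HEAT ends up at 70.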

Why exclude infrastructure from normalization? Substations and chiller plants consume orders of magnitude more energy than campus buildings. Including them in the min-max scale would compress all campus building scores into a narrow band near zero.

Limitations

- 60-day window only: September–October is a shoulder season. Full-year data would capture heating and cooling cycles more completely.
- No occupancy or use-type data: A research lab and a lecture hall of the same size and age are treated identically. The peer grouping partially mitigates this but does not fully resolve it.
- Single weather station: All buildings use the same campus weather feed. Microclimatic variation is not captured.
- Savings estimates are illustrative: The 20% load reduction assumption and $0.13/kWh rate are representative starting points, not guarantees. Dollar figures are shown for ELECTRICITY only.
- No discount rate on 3-year savings: The 3-year projection is a simple linear extrapolation and should not be used as a capital budgeting input without further financial analysis.

Data Source

Smart meter data provided by the OSU Energy Research Data Hub for the OSU AI Hackathon 2025. Weather data sourced from the Open-Meteo API. Building metadata from the OSU SIMS database.
