A data-driven decision-support platform for identifying which campus buildings should be prioritized for energy-efficiency capital investment. Built for the OSU AI Hackathon 2025.
The tool analyzes 60 days of smart meter data (September–October 2025) across 286 OSU campus buildings, combines it with weather and building metadata, and produces a transparent, ranked shortlist with plain-language explanations and estimated savings opportunities.
osu_energy/
├── app.py # 4-page Streamlit application
├── data_loader.py # Data ingestion, joining, and daily aggregation
├── feature_engineering.py # Investment signal computation
├── models.py # Portfolio clustering and per-building time-series regression
├── scoring.py # Composite score, normalization, ranking, action recommendations
├── utils.py # Shared utility functions (infrastructure filter)
├── requirements.txt # Python dependencies
└── PRD.md # Product Requirements Document
- Python 3.10+
- Dependencies:
streamlit>=1.35.0
pandas>=2.0.0
numpy>=1.26.0
scikit-learn>=1.4.0
plotly>=5.20.0
scipy>=1.13.0
Install with:
pip install -r requirements.txt

Place the following four files in your ~/Downloads/ directory before running the app:
| File | Description |
|---|---|
| meter-data-sept-2025.csv | 15-minute interval meter readings, September 2025 |
| meter-data-oct-2025.csv | 15-minute interval meter readings, October 2025 |
| weather-sept-oct-2025.csv | Hourly weather data from Open-Meteo API |
| building_metadata.csv | SIMS building database (size, age, location) |
The data pipeline joins the three sources (meter readings, weather, and building metadata) automatically on startup. No manual preprocessing is required.
streamlit run app.py

The first load takes approximately 30 seconds while the full pipeline runs. Subsequent page navigation is instant due to Streamlit's @st.cache_data caching.
A campus-wide snapshot of energy performance across all 286 buildings.
- Summary metrics: buildings analyzed, meter readings processed, data window
- Top 20 buildings by priority score, color-coded by confidence tier
- Energy intensity vs. building age scatter plot (bubble size = gross square footage)
- Portfolio cluster map: KMeans archetypes projected to 2D via PCA
The primary decision-support page. Produces a ranked shortlist with adjustable signal weights.
- Signal weight sliders (auto-renormalized to 100%)
- Utility selector (ELECTRICITY, HEAT, GAS)
- Infrastructure building filter (enabled by default)
- Ranked table with priority score, confidence tier, age, area, and recommended next step
- Expandable "Why This Building?" section per building — signal breakdown chart and plain-language explanation
- Estimated savings opportunity (assumes 20% load reduction at $0.13/kWh for electricity)
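The savings estimate in the last bullet reduces to a short calculation. The sketch below is illustrative only: the function name, the 365-day annualization from the 60-day window, and the `years` multiplier are assumptions made for the example, not the app's confirmed logic. Only the 20% reduction and the $0.13/kWh rate come from this README.

```python
def estimate_savings_usd(window_kwh, window_days=60, reduction=0.20,
                         rate_usd_per_kwh=0.13, years=1):
    """Illustrative electricity savings estimate.

    Assumption: the data window is scaled linearly to a 365-day year,
    and multi-year figures are a simple multiple with no discounting.
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    annual_kwh = window_kwh / window_days * 365  # assumed annualization
    return annual_kwh * reduction * rate_usd_per_kwh * years


# Hypothetical building that used 120,000 kWh over the 60-day window.
one_year = estimate_savings_usd(120_000)
three_year = estimate_savings_usd(120_000, years=3)
```

Because the result scales linearly with every input, changing the reduction or rate assumption changes the dollar figure proportionally.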
Per-building time-series analysis.
- 60-day time-series of actual vs. weather-predicted energy use
- Temperature overlay on secondary axis
- Isolation Forest anomaly detection — flags statistically unusual days on the chart
- Signal scorecard and recommended action
Full transparency page documenting signals, models, assumptions, and limitations.
- Signal definitions and model descriptions
- Confidence tier criteria
- KMeans elbow curve (k=2 through k=10) justifying k=5
- Weight sensitivity analysis: Spearman rank correlation across 300 random weight sets
- Stated assumptions and known limitations
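The weight sensitivity analysis can be pictured roughly as follows. This is a dependency-free sketch, not the app's code: the project lists scipy.stats.spearmanr for this step, while the example uses a hand-written Spearman correlation (Pearson on average ranks). The way random weights are drawn, the function names, and the toy signal matrix are all assumptions.

```python
import random
from statistics import mean

BASE_WEIGHTS = [0.30, 0.25, 0.20, 0.15, 0.10]  # default weights from this README


def ranks(values):
    """Average ranks (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(a, b):
    """Spearman rank correlation computed as Pearson correlation of ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


def score(signals, weights):
    """Composite score per building from rows of normalized signals."""
    return [100 * sum(w * s for w, s in zip(weights, row)) for row in signals]


def sensitivity(signals, n_sets=300, seed=0):
    """Correlate the baseline ranking with rankings under random weight sets."""
    rng = random.Random(seed)
    base = score(signals, BASE_WEIGHTS)
    rhos = []
    for _ in range(n_sets):
        raw = [rng.random() for _ in BASE_WEIGHTS]
        weights = [r / sum(raw) for r in raw]  # renormalize to sum to 1
        rhos.append(spearman(base, score(signals, weights)))
    return rhos


# Toy data: 30 hypothetical buildings with 5 normalized signals each.
toy_rng = random.Random(42)
toy_signals = [[toy_rng.random() for _ in range(5)] for _ in range(30)]
rhos = sensitivity(toy_signals)
```

A distribution of correlations clustered near 1 would suggest the ranking is not very sensitive to the exact weights chosen.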
- Loads and concatenates September and October meter CSVs
- Filters to energy utilities only: ELECTRICITY, HEAT, GAS, STEAM
- Joins meter data to hourly weather on truncated timestamp
- Joins meter data to building metadata on SIMS building number
- Computes vintage_age = 2025 − construction_year
- Aggregates 15-minute readings to one row per building, utility, and date
- Uses readingwindowsum (the true daily total across all 96 intervals), not readingvalue
- Applies IQR outlier clipping per building+utility (3×IQR fence) to remove sensor faults
- Computes daily_kwh_per_sqft for size-normalized comparisons
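The outlier clipping and size normalization steps might look roughly like this for a single building and utility. It is a stdlib-only sketch under stated assumptions: "clipping" is interpreted as clamping values to the fence rather than dropping rows, quartiles use linear interpolation, and the function name and sample numbers are hypothetical.

```python
from statistics import quantiles


def clip_iqr(values, k=3.0):
    """Clamp values to [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: out-of-fence readings are clamped, not removed.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lo), hi) for v in values]


# Hypothetical daily totals (kWh) with one obvious sensor fault.
daily_kwh = [100, 102, 98, 101, 99, 103, 97, 5000]
clipped = clip_iqr(daily_kwh)

# Size-normalize using a hypothetical gross square footage.
gross_sqft = 50_000
daily_kwh_per_sqft = [v / gross_sqft for v in clipped]
```

With a 3×IQR fence, ordinary day-to-day variation passes through untouched and only extreme readings are pulled back to the fence.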
Five signals are computed for each building × utility combination:
| Signal | What It Measures |
|---|---|
| Energy Intensity (kWh/sqft) | Baseline consumption normalized for building size |
| Unexplained Deviation (RMSE) | Energy use not explained by weather or building age |
| Peer Group Excess (z-score) | How far consumption sits above that of similar-sized, similar-aged buildings |
| Weather Sensitivity (kWh/sqft/°F) | Strength of HVAC response to temperature |
| Load Variability (CV) | Erratic or unstable daily consumption patterns |
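Two of these signals are simple enough to sketch directly. The example below shows one plausible way to compute the coefficient of variation and a peer z-score. The function names are hypothetical, and the use of the population standard deviation is an assumption; the project may use the sample form.

```python
from statistics import mean, pstdev


def coefficient_of_variation(daily_values):
    """Load variability: standard deviation divided by mean."""
    m = mean(daily_values)
    return pstdev(daily_values) / m if m else float("nan")


def peer_excess_z(value, peer_values):
    """Peer group excess: how many standard deviations above the peer mean."""
    sd = pstdev(peer_values)
    return (value - mean(peer_values)) / sd if sd else 0.0


# Hypothetical inputs: a flat load profile and a small peer group.
flat_cv = coefficient_of_variation([10.0, 10.0, 10.0])
z_high = peer_excess_z(3.0, [1.0, 2.0, 3.0])
z_mean = peer_excess_z(2.0, [1.0, 2.0, 3.0])
```

A positive z-score means the building uses more than its peer group average; a score at the peer mean is zero.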
Each building also receives a confidence tier:
| Tier | Criteria |
|---|---|
| High | ≥ 45 days of data, R² > 0.3, < 10% missing readings |
| Medium | ≥ 25 days of data OR R² > 0.1 |
| Low | Anything below Medium |
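Read as code, the tier criteria in the table amount to the following. This is a minimal sketch: the function and parameter names are hypothetical, and the missing-readings share is assumed to be expressed as a fraction between 0 and 1.

```python
def confidence_tier(days_of_data, r_squared, missing_fraction):
    """Map data-quality measures to a confidence tier per the table above."""
    if days_of_data >= 45 and r_squared > 0.3 and missing_fraction < 0.10:
        return "High"
    if days_of_data >= 25 or r_squared > 0.1:
        return "Medium"
    return "Low"


tiers = [
    confidence_tier(60, 0.50, 0.02),  # meets all High criteria
    confidence_tier(60, 0.50, 0.20),  # too many missing readings for High
    confidence_tier(10, 0.05, 0.00),  # too little data and weak fit
]
```

Note that High requires all three conditions, while Medium needs only one of its two.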
A plausibility check removes buildings with a median daily intensity above 50 kWh/sqft/day — a threshold that indicates a likely unit error (e.g., Wh logged as kWh) rather than genuine consumption.
- KMeans (k=5) on all five normalized signals, one row per building
- PCA (2 components) for visualization
- Cluster archetypes assigned by centroid ranking:
- High Load + Weather-Driven (Priority)
- High Baseline Load (Priority)
- Erratic / Unstable Load (Investigate)
- Efficient / Low Load (Reference)
- Moderate Load (Monitor)
Each signal is min-max normalized to [0, 1] within utility type. Infrastructure buildings are excluded from the normalization scale to prevent them from compressing campus building scores.
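A minimal sketch of this normalization for one signal within one utility is shown below. The function name is hypothetical. Clamping infrastructure values into [0, 1] after scaling is an assumption; this README only states that they are excluded from the scale itself.

```python
def min_max_normalize(values, is_infrastructure):
    """Scale values to [0, 1] using only non-infrastructure buildings for min/max.

    Assumption: values outside the reference range are clamped to [0, 1].
    """
    reference = [v for v, infra in zip(values, is_infrastructure) if not infra]
    lo, hi = min(reference), max(reference)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [min(max((v - lo) / span, 0.0), 1.0) for v in values]


# Three hypothetical campus buildings and one chiller plant.
values = [2.0, 4.0, 6.0, 500.0]
flags = [False, False, False, True]
normalized = min_max_normalize(values, flags)

# For contrast: the middle building's score if the plant set the scale.
naive_middle = (4.0 - 2.0) / (500.0 - 2.0)
```

In this toy case the middle building scores 0.5 on the campus-only scale but would score well under 0.01 if the plant were included.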
Composite score formula:
score = 100 × (
0.30 × norm(energy_intensity)
+ 0.25 × norm(unexplained_deviation)
+ 0.20 × norm(peer_excess)
+ 0.15 × norm(|weather_sensitivity|)
+ 0.10 × norm(load_variability)
)
Weights are user-adjustable via the Building Prioritization sliders.
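The formula and the slider renormalization can be combined in a few lines. The sketch below is illustrative: the dictionary keys and function names are hypothetical, and the weather sensitivity input is assumed to have had its absolute value taken before normalization, as the formula indicates.

```python
DEFAULT_WEIGHTS = {
    "energy_intensity": 0.30,
    "unexplained_deviation": 0.25,
    "peer_excess": 0.20,
    "weather_sensitivity": 0.15,
    "load_variability": 0.10,
}


def renormalize(weights):
    """Rescale weights so they sum to 1, as the sliders are described as doing."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: w / total for name, w in weights.items()}


def composite_score(normalized_signals, weights=DEFAULT_WEIGHTS):
    """Weighted sum of normalized signals on a 0-100 scale."""
    w = renormalize(weights)
    return 100 * sum(w[name] * normalized_signals[name] for name in w)


# Hypothetical normalized signals for one building.
example = {
    "energy_intensity": 0.8,
    "unexplained_deviation": 0.5,
    "peer_excess": 0.4,
    "weather_sensitivity": 0.2,
    "load_variability": 0.1,
}
example_score = composite_score(example)
```

Because the weights are renormalized, doubling every slider leaves the score unchanged; only the relative weights matter.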
Action recommendations by score:
| Score | Recommendation |
|---|---|
| >= 70 (High/Medium confidence) | Full Energy Audit |
| >= 50 | Targeted Investigation |
| >= 30 | Monitor |
| < 30 | No Immediate Action |
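The table maps onto a simple cascade of thresholds. In this sketch the function name is hypothetical, and the handling of a Low-confidence building scoring 70 or above (it falls through to Targeted Investigation) is an assumption, since the table does not spell that case out.

```python
def recommend_action(score, confidence):
    """Translate a priority score and confidence tier into a next step."""
    if score >= 70 and confidence in ("High", "Medium"):
        return "Full Energy Audit"
    if score >= 50:  # assumed to include Low-confidence scores of 70+
        return "Targeted Investigation"
    if score >= 30:
        return "Monitor"
    return "No Immediate Action"


actions = [
    recommend_action(82, "High"),
    recommend_action(82, "Low"),
    recommend_action(45, "Medium"),
    recommend_action(12, "High"),
]
```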
| Method | Library | Purpose |
|---|---|---|
| OLS Regression (per building) | sklearn.linear_model.LinearRegression | Model expected energy from weather + age; extract RMSE and sensitivity |
| KMeans — Peer Grouping | sklearn.cluster.KMeans (k=5) | Group buildings by size and age for fair peer comparison |
| KMeans — Portfolio Clustering | sklearn.cluster.KMeans (k=5) | Group buildings by all 5 signals into behavioral archetypes |
| PCA | sklearn.decomposition.PCA (n=2) | Project portfolio clusters to 2D for visualization |
| Isolation Forest | sklearn.ensemble.IsolationForest (contamination=0.10) | Flag anomalous days per building in the Deep Dive |
| Min-Max Normalization | Manual (per utility) | Normalize signals to [0, 1] before scoring |
| Spearman Correlation | scipy.stats.spearmanr | Validate ranking stability across 300 random weight sets |
Why OLS regression instead of a more complex model? Interpretability. OLS coefficients have a direct meaning: a facilities manager can understand that a building's energy use changes by X kWh/sqft per degree Fahrenheit. A black-box model might predict more accurately, but it would not offer that kind of direct explanation.
Why normalize by square footage? Raw kWh always favors large buildings. kWh/sqft makes comparisons fair regardless of building size.
Why score utilities separately? Electricity and heat operate on different absolute scales and cannot be meaningfully compared directly. Each utility is scored independently; scores are then averaged at the building level.
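The building-level roll-up could be as simple as the sketch below. An unweighted mean across utilities is an assumption, as are the function name and the building identifiers; this README only says that scores are averaged.

```python
from collections import defaultdict
from statistics import mean


def building_scores(utility_scores):
    """Average per-utility scores into one score per building.

    utility_scores: iterable of (building_id, utility, score) tuples.
    """
    by_building = defaultdict(list)
    for building_id, _utility, score in utility_scores:
        by_building[building_id].append(score)
    return {b: mean(scores) for b, scores in by_building.items()}


# Hypothetical per-utility scores.
rows = [
    ("B001", "ELECTRICITY", 80.0),
    ("B001", "HEAT", 60.0),
    ("B002", "ELECTRICITY", 40.0),
]
combined = building_scores(rows)
```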
Why exclude infrastructure from normalization? Substations and chiller plants consume orders of magnitude more energy than campus buildings. Including them in the min-max scale would compress all campus building scores into a narrow band near zero.
- 60-day window only: September–October is a shoulder season. Full-year data would capture heating and cooling cycles more completely.
- No occupancy or use-type data: A research lab and a lecture hall of the same size and age are treated identically. The peer grouping partially mitigates this but does not fully resolve it.
- Single weather station: All buildings use the same campus weather feed. Microclimatic variation is not captured.
- Savings estimates are illustrative: The 20% load reduction assumption and $0.13/kWh rate are representative starting points, not guarantees. Dollar figures are shown for ELECTRICITY only.
- No discount rate on 3-year savings: The 3-year projection is a simple linear extrapolation and should not be used as a capital budgeting input without further financial analysis.
Smart meter data provided by the OSU Energy Research Data Hub for the OSU AI Hackathon 2025. Weather data sourced from the Open-Meteo API. Building metadata from the OSU SIMS database.