Inspiration

Mapping textbook sections to educational standards is vital for curriculum design, but doing it by hand is slow, subjective, and nearly impossible to scale. We wanted to know whether natural language processing could act as a co-pilot, automating alignment while keeping curricula consistent and up to date.

In short, we wanted to turn a tedious task into something fast, reliable, and actually enjoyable.

What it does

OpenStaxAlign predicts the most relevant educational standard for each textbook section. It processes hierarchical OpenStax JSON files, preserves parent-section context, and writes its predicted standards for unseen books to a submission-ready CSV file.
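A minimal sketch of the flattening and output steps, assuming a hypothetical JSON schema in which each section has `id`, `title`, `text`, and a nested `children` list (real OpenStax files may use different keys). Each section's text is prefixed with its ancestors' titles so parent context survives flattening. The helper names `flatten_sections` and `write_submission`, the CSV columns `section_id` and `standard`, and the label `STD-001` are all placeholders, not the project's actual code.

```python
import csv
import io
import json
from typing import Iterator, TextIO


def flatten_sections(node: dict, parents: tuple = ()) -> Iterator[dict]:
    """Yield one flat record per section, prefixed with its ancestors' titles.

    Assumes a hypothetical schema: {"id", "title", "text", "children": [...]}.
    """
    title = node.get("title", "")
    path = parents + (title,)
    yield {
        "section_id": node.get("id", ""),
        # e.g. "Chemistry > Atoms" on the first line, then the section body
        "text": " > ".join(path) + "\n" + node.get("text", ""),
    }
    for child in node.get("children", []):
        yield from flatten_sections(child, path)


def write_submission(rows, out: TextIO) -> None:
    """Write (section_id, predicted_standard) pairs as CSV with a header row."""
    writer = csv.writer(out)
    writer.writerow(["section_id", "standard"])  # hypothetical column names
    writer.writerows(rows)


if __name__ == "__main__":
    book = json.loads(
        '{"id": "ch1", "title": "Chemistry", "text": "Intro.",'
        ' "children": [{"id": "ch1.1", "title": "Atoms", "text": "Atoms are small."}]}'
    )
    records = list(flatten_sections(book))
    buf = io.StringIO()
    # "STD-001" is a placeholder; a trained model would supply real predictions.
    write_submission([(r["section_id"], "STD-001") for r in records], buf)
    print(buf.getvalue())
```

Carrying the title path into the text means a short subsection like "Atoms" still sees that it belongs to a chemistry chapter, which is the kind of context a bag-of-words model would otherwise lose.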

The result: curriculum designers can review, validate, and iterate in minutes instead of hours.

How we built it

We extracted structured text fields from the nested JSON, transformed them into TF-IDF features, and trained Logistic Regression and Linear SVM models. We tuned hyperparameters with stratified cross-validation, applied class weighting to handle rare standards, and evaluated performance using classification accuracy and visualizations.
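The modelling steps above could look roughly like this in scikit-learn. This is a sketch under assumptions: the writeup does not name a library, and the `build_search` helper, the n-gram range, the `C` grid, and the fold count are illustrative rather than the project's actual settings. `class_weight="balanced"` is one standard way to implement the class weighting described.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def build_search(n_splits: int = 5) -> GridSearchCV:
    """TF-IDF features plus a linear classifier, tuned with stratified CV."""
    pipe = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)),
        # Placeholder step; the grid below swaps in each candidate model.
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    param_grid = [
        {
            "clf": [LogisticRegression(max_iter=1000, class_weight="balanced")],
            "clf__C": [0.1, 1.0, 10.0],
        },
        {
            "clf": [LinearSVC(class_weight="balanced")],
            "clf__C": [0.1, 1.0, 10.0],
        },
    ]
    # Stratified folds keep each standard's share roughly equal across splits.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    return GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")


if __name__ == "__main__":
    # Toy stand-in data; real inputs would be flattened section texts.
    texts = ["atoms and molecules", "chemical bonds form", "ionic compounds",
             "electrons in atoms", "cell division mitosis", "dna replication",
             "protein synthesis in cells", "genes and heredity"]
    labels = ["chem"] * 4 + ["bio"] * 4
    search = build_search(n_splits=2)
    search.fit(texts, labels)
    print(search.best_params_, round(search.best_score_, 3))
```

Putting both models in one grid lets cross-validation choose between them as well as their regularization strength `C`; `best_params_` reports the winning combination, and the refitted `search` can predict directly on unseen sections.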

Our focus was on building models that were not only effective but also transparent and quick to iterate on.

Challenges we ran into

Preserving hierarchical context during flattening turned out to be essential for strong predictions, and class imbalance required careful filtering and weighting strategies.
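One way to implement the filtering and weighting, sketched in plain Python. The helper names `filter_rare` and `balanced_weights` and the default `min_count` threshold are hypothetical; the writeup does not say what cutoff was used. The weight formula, `n_samples / (n_classes * class_count)`, is the one scikit-learn applies for `class_weight="balanced"`, so each rare standard contributes more per example to the training loss.

```python
from collections import Counter


def filter_rare(texts, labels, min_count=2):
    """Drop examples whose label occurs fewer than min_count times.

    min_count is a hypothetical threshold; in practice it would be at least
    the number of stratified CV folds so every class appears in each fold.
    """
    counts = Counter(labels)
    kept = [(t, y) for t, y in zip(texts, labels) if counts[y] >= min_count]
    return [t for t, _ in kept], [y for _, y in kept]


def balanced_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}
```

For example, six sections labelled with one standard and two with another give weights of about 0.67 and 2.0, so the rarer standard's examples count three times as much.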

API limits and the one-day timeline meant every experiment had to count, pushing us toward fast, reliable approaches over heavier architectures.

Accomplishments that we're proud of

In a single day, we delivered a complete end-to-end pipeline and reached around 0.75 validation accuracy with lightweight, interpretable models.

We also produced polished visualizations and clear documentation so others could easily explore and build on our work.

What we learned

We learned just how much preprocessing and contextual features influence NLP performance, how to make smart trade-offs under time pressure, and how to present technical results clearly to both technical and non-technical audiences.

What's next for OpenStaxAlign

Next up: transformer-based models, multi-label prediction, hierarchical classification, multilingual support, and an educator-facing review interface that puts humans in the loop.

This is only the beginning 🚀
