Inspiration
Students hunt for faculty whose research, courses, or open positions match their passions, but the information is scattered across listings, personal websites, and lab sites. We want students to be able to paste a link to their dream job listing and let us do the rest, helping them make the most of their school's opportunities. Our goal is a one-stop knowledge graph that surfaces who works on what, with direct links to emails, lab pages, and recent publications, so students can discover mentors in seconds.
What it does
CLI + JSON knowledge base that:
- Crawls public “People” pages (faculty, grads, post-docs, researchers) and each person’s Campus Directory profile.
- Extracts name, title, department, email, websites, and rich extras (research interests, courses, awards).
- Enriches each profile with research topics from the Semantic Scholar API (work in progress).
- Embeds the combined text into 1,536-dim OpenAI vectors so downstream apps (chatbots, semantic search, match-making dashboards) can query by meaning.
- Persists everything to data/records.json, ready for search, ranking, or feeding into a vector DB.
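As a rough illustration of what "query by meaning" could look like downstream, the sketch below ranks stored records against a query vector using cosine similarity. The field names `name` and `embedding` are assumptions about the record shape, not the actual schema of data/records.json, and the 3-dimensional vectors stand in for real 1,536-dim embeddings.

```javascript
// Cosine similarity between two equal-length numeric vectors.
// Returns 0 when either vector has zero length, to avoid dividing by zero.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank records by similarity to the query vector, highest score first.
// `name` and `embedding` are hypothetical field names used for illustration.
function rankByMeaning(queryVector, records, topK = 5) {
  return records
    .map((r) => ({ name: r.name, score: cosineSimilarity(queryVector, r.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy data: made-up people with tiny vectors in place of real embeddings.
const toyRecords = [
  { name: "Person A", embedding: [1, 0, 0] },
  { name: "Person B", embedding: [0, 1, 0] },
  { name: "Person C", embedding: [0.9, 0.1, 0] },
];
const topMatches = rankByMeaning([1, 0, 0], toyRecords, 2);
console.log(topMatches.map((m) => m.name));
```

In a real app the query vector would come from embedding the student's job listing or search text with the same model used for the profiles, since vectors from different models are not comparable.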
How we built it
- Node.js script (crawl.js) orchestrates the crawl.
- cheerio parses the HTML DOM for fast, jQuery-style scraping.
- node-fetch retrieves seed pages and profile URLs (with automatic HTTPS→HTTP fallback).
- Semantic Scholar API returns top topics for each author.
- OpenAI Embeddings API converts profile + topic + bio into semantic vectors.
- Output is stored locally; future steps can upsert into PostgreSQL + pgvector or any vector store.
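The embedding step above can be sketched roughly as follows. This is not the code from crawl.js: the field names (`name`, `title`, `department`, `topics`, `bio`) and both function names are hypothetical, and the `embed` argument is a stand-in for whatever function wraps the OpenAI Embeddings API call, passed in so the sketch does not depend on a specific client.

```javascript
// Combine profile fields, topics, and bio into one text blob for embedding.
// Empty or missing fields are skipped so they do not add blank lines.
function buildEmbeddingInput(person) {
  const parts = [
    person.name,
    person.title,
    person.department,
    (person.topics || []).join(", "),
    person.bio,
  ];
  return parts
    .filter((p) => typeof p === "string" && p.trim() !== "")
    .map((p) => p.trim())
    .join("\n");
}

// Attach an embedding to every person, one request at a time.
// `embed` is an injected async function: (text) => Promise<number[]>.
async function embedAll(people, embed) {
  const records = [];
  for (const person of people) {
    const embedding = await embed(buildEmbeddingInput(person));
    records.push({ ...person, embedding });
  }
  return records;
}

// Example input with made-up values; department is missing and bio is empty.
const sampleText = buildEmbeddingInput({
  name: "Ada Example",
  title: "Professor",
  topics: ["robotics", "NLP"],
  bio: "",
});
console.log(sampleText);
```

Processing people sequentially keeps the sketch simple and gentle on rate limits; batching several texts per request would be a reasonable optimization if the crawl grows.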
Challenges we ran into
- Semantic Scholar API: we did not have enough time to fully debug the failing requests. The rate limit for unauthenticated users is probably the cause, and Semantic Scholar was taking too long to approve our access request.
- Directory downtime: occasional DNS hiccups required an HTTP fallback and retry logic.
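One way the fallback and retry logic could be structured is sketched below. The helper names, the retry count, and the delay values are assumptions for illustration, and the sketch takes the fetch function as a parameter (defaulting to the global `fetch` in Node 18+) rather than importing node-fetch, so it is not the project's actual implementation.

```javascript
// Swap the scheme so a failed HTTPS request can be retried over plain HTTP.
function toHttp(url) {
  return url.replace(/^https:/i, "http:");
}

// Ordered list of attempts: the original URL `retries` times,
// then (for HTTPS URLs only) the HTTP variant `retries` times.
function planAttempts(url, retries = 2) {
  const candidates = /^https:/i.test(url) ? [url, toHttp(url)] : [url];
  return candidates.flatMap((u) => Array.from({ length: retries }, () => u));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Try each planned attempt in order, waiting a little longer after each failure.
// `fetchImpl` is injectable; by default it is the global fetch (Node 18+).
async function fetchWithFallback(url, options = {}) {
  const { retries = 2, delayMs = 500, fetchImpl = globalThis.fetch } = options;
  const attempts = planAttempts(url, retries);
  let lastError = new Error(`No attempts made for ${url}`);
  for (let i = 0; i < attempts.length; i++) {
    try {
      const res = await fetchImpl(attempts[i]);
      if (res.ok) return await res.text();
      lastError = new Error(`HTTP ${res.status} for ${attempts[i]}`);
    } catch (err) {
      lastError = err; // DNS or network failure
    }
    if (i < attempts.length - 1) await sleep(delayMs * (i + 1));
  }
  throw lastError;
}

console.log(planAttempts("https://example.edu/people"));
```

Falling back to HTTP gives up transport security, which is a tolerable trade-off only because the pages being crawled are public.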
What we learned
- Different colleges have vastly different HTML and front ends, so we had to focus on just UCSC.
- Departments within a college also differ widely, so we had to focus on just the Engineering department.
What's next for College Matcher
- Get Semantic Scholar fully working so students have access to research
- Expand it to any college with an intelligent scraper that can handle new websites
- Build a UI that lets students type in the careers they want or paste in job listings
Built With
- cheerio
- dotenv
- javascript
- node-fetch
- node.js
- openai
- semanticscholar
- typescript