Inspiration

Khan Academy doesn't offer a public API for the content on its website, so I wanted to make some of that content more accessible to developers.

What it does

KA Scrapes is a module that scrapes the Khan Academy search page, letting you search Khan Academy for guides, practice problems, and videos.
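One way to support all three content types is to classify each scraped result by its URL. This sketch is not taken from KA Scrapes. It assumes results arrive as `{ title, url }` objects and that Khan Academy URLs mark videos, practice exercises, and articles (guides) with `/v/`, `/e/`, and `/a/` path segments. The names `classifyResult` and `groupByKind` are hypothetical.

```javascript
// Assumption: Khan Academy content URLs mark their type with a path segment:
// /v/ for videos, /e/ for exercises (practice problems), /a/ for articles (guides).
const KIND_BY_SEGMENT = { v: 'video', e: 'practice', a: 'guide' };

// Hypothetical helper: map one result URL to a content kind.
function classifyResult(url) {
  const segments = new URL(url).pathname.split('/');
  for (const [segment, kind] of Object.entries(KIND_BY_SEGMENT)) {
    if (segments.includes(segment)) return kind;
  }
  return 'other'; // topic pages, courses, and anything unrecognized
}

// Hypothetical helper: group scraped { title, url } objects by kind,
// so a caller can ask for only guides, only practice, or only videos.
function groupByKind(results) {
  const groups = { guide: [], practice: [], video: [], other: [] };
  for (const result of results) {
    groups[classifyResult(result.url)].push(result);
  }
  return groups;
}
```

With this shape, a caller that only wants videos can read `groupByKind(results).video` instead of re-filtering the raw list.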

How we built it

It's built with Puppeteer, a Node.js library for controlling a headless Chrome or Chromium browser. Because Puppeteer drives a real browser, the page's JavaScript actually runs and user input such as clicks and typing can be simulated, which makes it possible to scrape dynamic, JavaScript-rendered sites (albeit slowly). It's like using a regular web browser such as Chrome, except you navigate with code instead of your mouse and keyboard.
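A minimal sketch of that Puppeteer flow might look like the following. It is not the project's actual code: the `page_search_query` parameter, the `RESULT_SELECTOR` value, and the helper names are assumptions to check against the live site. `page.goto`, `page.waitForSelector`, and `page.$$eval` are standard Puppeteer `Page` methods.

```javascript
// Hypothetical CSS selector for result links; inspect the live search page and adjust.
const RESULT_SELECTOR = 'a[data-test-id="search-result-link"]';

// Assumed search URL shape; the query parameter name is an assumption.
function buildSearchUrl(query) {
  const url = new URL('https://www.khanacademy.org/search');
  url.searchParams.set('page_search_query', query);
  return url.toString();
}

// Drive an already-open Puppeteer page: navigate, wait for client-side
// rendering to finish, then read the rendered result links.
async function scrapeSearchPage(page, query, timeoutMs = 15000) {
  // 'networkidle2' treats navigation as done once there are no more than
  // 2 network connections for at least 500 ms.
  await page.goto(buildSearchUrl(query), { waitUntil: 'networkidle2', timeout: timeoutMs });
  // The results are rendered by JavaScript, so wait until they exist in the DOM.
  await page.waitForSelector(RESULT_SELECTOR, { timeout: timeoutMs });
  // $$eval runs the callback inside the browser against every matching element.
  return page.$$eval(RESULT_SELECTOR, (links) =>
    links.map((link) => ({ title: link.textContent.trim(), url: link.href }))
  );
}

// Convenience wrapper that owns the browser lifecycle. Puppeteer is required
// lazily so scrapeSearchPage can be reused without launching Chromium here.
async function searchKhanAcademy(query) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    return await scrapeSearchPage(page, query);
  } finally {
    await browser.close(); // always release Chromium, even if scraping fails
  }
}
```

Calling `searchKhanAcademy('photosynthesis')` would launch a headless browser, scrape the first page of results, and close the browser. Keeping `scrapeSearchPage` separate from browser setup lets one browser instance be reused across several searches, so Chromium's startup cost is paid only once.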

Accomplishments that we're proud of

The scraper returns the first page of search results in about four seconds. That's much faster than I expected, since the Khan Academy search page has to load a lot before it becomes usable.

What we learned

It had been a long time since I last used Puppeteer, so this project helped me brush up on some old knowledge.

What's next for KA Scrapes

Right now, the search page is the only page KA Scrapes supports. In the future, I'd like to add more pages to expose more of Khan Academy's content.
