FaqToBot

FAQ bot generator for websites

Inspiration

For RamHacks 2017, Octo challenged hackers to take a website and generate a question-answer bot with a voice interface for that website's FAQ pages. The technical challenges are finding the FAQ pages, parsing them, and mapping question intents to answers.

Structure

Flow

On the Android app interface, the user enters a base URL, for example, http://ramhacks.vcu.edu/. The app then calls an AWS Lambda function that crawls the website and returns the pages likely to contain an FAQ. The app attempts to parse each potential FAQ page into question-answer pairs. The user can then ask a question, by typing or by voice, and the app finds and displays an answer, leveraging Microsoft Cognitive Services.
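The last step of the flow, matching a user's question to a parsed question-answer pair, can be illustrated with a minimal sketch. This is not how the app does it: the app delegates matching to Microsoft Cognitive Services, whereas the sketch swaps in plain word-overlap (Jaccard) similarity. The function names and the sample question-answer pairs are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_answer(question, faq_pairs):
    """Return the answer whose stored question overlaps most with `question`.

    Stand-in for the Cognitive Services call: scores each stored question by
    Jaccard similarity and returns None when no words overlap at all.
    """
    asked = tokenize(question)
    best_score, best = 0.0, None
    for stored_question, answer in faq_pairs:
        stored = tokenize(stored_question)
        union = asked | stored
        score = len(asked & stored) / len(union) if union else 0.0
        if score > best_score:
            best_score, best = score, answer
    return best


# Made-up question-answer pairs, standing in for a parsed FAQ page.
faq_pairs = [
    ("When does the hackathon start?", "Check-in opens Saturday morning."),
    ("Is there a cost to attend?", "No, attendance is free."),
]

print(best_answer("What is the cost?", faq_pairs))
```

Word overlap ignores synonyms and phrasing, which is the gap an intent-matching service is meant to close.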

Structure

The Android app interface lives in a separate repository, listed under Links.

The AWS Lambda Python script is described in detail below.

The scraper.py file contains the Lambda function handler. The scraper runs a breadth-first search over the internal links found in the anchor tags of each page. For each page visited, it extracts several features from the HTML, for example the number of times "FAQ" appears in the page. These features form the input vector of a basic neural network, which outputs the likelihood that the page is an FAQ page. If that probability exceeds 50%, the page is tagged as a potential FAQ page. Crawling continues until there are no more unique pages to visit or the page visit limit (set by the initial Lambda function call) is exceeded. The final list of FAQ URLs is joined into a newline-separated string and returned to the caller of the Lambda function.
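The crawl loop described above might look roughly like the following sketch. It is not the code in scraper.py: the names `crawl_for_faq`, `fetch`, and `faq_probability` are assumptions made for illustration, and the page fetcher and classifier are passed in as plain callables so the sketch stays self-contained. Link extraction uses the standard-library HTML parser rather than bs4.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every anchor tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl_for_faq(base_url, fetch, faq_probability, page_limit=50, threshold=0.5):
    """Breadth-first crawl of same-host pages.

    `fetch(url)` returns the page HTML or None; `faq_probability(html)` returns
    a score in [0, 1]. Both are hypothetical hooks. Returns the URLs scoring
    above `threshold`, joined by newlines.
    """
    host = urlparse(base_url).netloc
    queue = deque([base_url])
    seen = {base_url}
    faq_urls = []
    visited = 0
    while queue and visited < page_limit:
        url = queue.popleft()
        html = fetch(url)
        visited += 1
        if html is None:
            continue
        if faq_probability(html) > threshold:
            faq_urls.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments before deduplicating.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return "\n".join(faq_urls)
```

Marking a URL as seen when it is enqueued, rather than when it is visited, keeps the same page from entering the queue twice.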

ANN.py contains the basic neural network. The training data is found in prepro_training.txt and is processed by DataGeneration.py. For each URL in the training set, HtmlFeatureExtractor.py extracts a set of quantitative features from the page and writes them to training.txt. ANN.py can then be trained on the numeric data in training.txt. The weights are saved to and restored from a pickle file.
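As a rough illustration of this pipeline, the sketch below extracts a few numeric features from HTML, trains a classifier on them, and round-trips the weights through pickle. Everything here is an assumption: this README does not describe the network architecture or the actual feature set, so a single sigmoid unit (logistic regression) stands in for the network, and the three features and all function names are hypothetical.

```python
import math
import pickle
import random
import re


def extract_features(html):
    """Turn raw HTML into a small numeric vector (illustrative features only)."""
    text = html.lower()
    return [
        float(text.count("faq")),                  # mentions of "FAQ"
        float(text.count("?")),                    # question marks
        float(len(re.findall(r"<h[1-6]", text))),  # heading tags
    ]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict(weights, features):
    """Probability that the page is an FAQ page, from one sigmoid unit."""
    z = weights["bias"] + sum(w * f for w, f in zip(weights["w"], features))
    return sigmoid(z)


def train(samples, epochs=500, lr=0.1, seed=0):
    """Fit the unit by stochastic gradient descent on log loss."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    weights = {"w": [rng.uniform(-0.1, 0.1) for _ in range(n)], "bias": 0.0}
    for _ in range(epochs):
        for features, label in samples:
            error = predict(weights, features) - label
            weights["w"] = [w - lr * error * f for w, f in zip(weights["w"], features)]
            weights["bias"] -= lr * error
    return weights


def save_weights(weights, path):
    """Store the trained weights in a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(weights, handle)


def load_weights(path):
    """Restore weights previously written by save_weights."""
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

Loading stored weights at prediction time avoids retraining on every call; note that pickle files should only be loaded from a trusted source.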

Links

Android app repository

Credits

Julian Duque - Link between application and Microsoft Cognitive Services
Meredith Lee - Application-user interface
Tony Wang - Amazon Alexa interface through AWS
David Zhao - AWS Python backend

