Project Story: Automating Government Contract Analysis with Custom NLP Model
About the Project
Our project came about from the need to efficiently analyze a large volume of government contracts from the Department of Defense (DOD) website. We aimed to streamline this process by building and training our own Natural Language Processing (NLP) model, leveraging the robust infrastructure of the AWS ecosystem and newer ML models such as BERT.
Inspiration
The inspiration for this project arose from recognizing the time-consuming nature of manually parsing through large blocks of text, whether long articles or government contracts. We saw an opportunity to harness the power of NLP and machine learning to automate this process, allowing for quicker and more accurate analysis of contract data.
What We Learned
Throughout the development journey, we gained valuable insights into various AWS services and machine learning techniques. We deepened our understanding of AWS Lambda and EventBridge for scheduling tasks, AWS SageMaker and API Gateway for model training and deployment, and Python and React TypeScript for the backend and frontend work.
One of the key lessons we learned was the importance of data preprocessing and model optimization. We experimented with different NLP architectures and fine-tuning strategies to ensure the accuracy and efficiency of our model while still being robust and lightweight.
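To make the preprocessing point concrete, here is a minimal sketch of the kind of cleaning a contract announcement might go through before tokenization. The specific rules, the character budget, and the function name are illustrative assumptions, not our exact pipeline.

```python
import re

# Rough character budget so the cleaned text fits a BERT-style 512-token limit.
# This number is an assumption for the sketch.
MAX_CHARS = 2000

def preprocess_contract(text: str) -> str:
    """Normalize whitespace, drop an award-date parenthetical, and truncate."""
    text = re.sub(r"\s+", " ", text).strip()       # collapse runs of whitespace
    text = re.sub(r"\(Awarded [^)]*\)", "", text)  # drop "(Awarded ...)" boilerplate
    return text[:MAX_CHARS].strip()

sample = ("Acme Corp,  Arlington, Virginia, was awarded a $12,000,000 "
          "contract.\n(Awarded June 1, 2023)")
print(preprocess_contract(sample))
# Acme Corp, Arlington, Virginia, was awarded a $12,000,000 contract.
```

Keeping steps like these as small pure functions made it easy to iterate on cleaning rules while fine-tuning.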
Building the Project
Our project was built around React TypeScript on the frontend, with Python and AWS processing and managing the data served to it. The AWS ecosystem centered on AWS Lambda and EventBridge, which triggered a scheduled function to scrape the latest contracts from the DOD website. These contracts were then fed into our custom NLP model, deployed on AWS SageMaker and exposed via API Gateway.
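The scheduled Lambda step can be sketched roughly as below. The API Gateway URL, request shape, and handler structure are assumptions for illustration; only `build_request` is exercised here, since the network call needs a live endpoint.

```python
import json
import urllib.request

# Hypothetical API Gateway URL fronting the SageMaker endpoint; not a real address.
API_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/analyze"

def build_request(contracts: list[str]) -> bytes:
    """Package scraped contract texts as a JSON body for the model endpoint."""
    return json.dumps({"instances": [{"text": t} for t in contracts]}).encode("utf-8")

def invoke_endpoint(contracts: list[str]) -> dict:
    """POST the batch to the model behind API Gateway and return its JSON reply."""
    req = urllib.request.Request(
        API_URL,
        data=build_request(contracts),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # network call; needs the endpoint live
        return json.load(resp)

def lambda_handler(event, context):
    """Entry point EventBridge invokes on a schedule: scrape, then analyze."""
    contracts = event.get("contracts", [])     # scraping step omitted for brevity
    return invoke_endpoint(contracts)
```

In practice the scraping logic ran inside the handler before the inference call; it is elided here to keep the sketch short.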
Once analyzed, the contract data was stored in two DynamoDB tables, providing a scalable and reliable storage solution. This data was then seamlessly accessed and presented on our frontend interface, providing users with valuable insights into government contract details.
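As a sketch of the storage step, one analyzed contract might be shaped into a DynamoDB item like this. The attribute names and key schema are assumptions; the `Decimal` conversion, however, is a real boto3 requirement for numeric attributes.

```python
from datetime import date
from decimal import Decimal

def to_contract_item(contract_id: str, agency: str,
                     amount: float, summary: str) -> dict:
    """Shape one analyzed contract as a DynamoDB item (illustrative schema)."""
    return {
        "contract_id": contract_id,              # partition key
        "scraped_on": date.today().isoformat(),  # sort key: daily scrape date
        "agency": agency,
        "amount_usd": Decimal(str(amount)),      # boto3 rejects float; use Decimal
        "summary": summary,
    }
```

With boto3 the write is then a single call, e.g. `table.put_item(Item=to_contract_item(...))`, and the frontend reads the same items back through queries on the scrape date.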
Challenges Faced
One of the main challenges we faced was deploying our NLP model into our AWS ecosystem, as there was simply too much data to process locally. We struggled for a long time with uploading our model to SageMaker and creating an endpoint for it.
Additionally, given the complexity of some of the contracts, building the test and blind datasets used to evaluate our model against current LLMs required a lot of manual parsing and inspection.
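The blind-set comparison can be sketched as scoring each system's extracted fields against our hand-labeled answers. The field names and the per-field accuracy metric here are illustrative assumptions about the evaluation, not its exact form.

```python
# Fields a system is expected to extract from each contract (illustrative).
FIELDS = ("contractor", "amount", "agency")

def field_accuracy(predictions: list[dict], labels: list[dict]) -> float:
    """Fraction of (contract, field) pairs a system extracted correctly."""
    total = correct = 0
    for pred, gold in zip(predictions, labels):
        for field in FIELDS:
            total += 1
            correct += pred.get(field) == gold.get(field)
    return correct / total if total else 0.0
```

Running the same scorer over our model's output and an LLM's output on the blind set gives directly comparable numbers.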
Lastly, integrating and synchronizing the numerous AWS services required features we had never used in depth, such as IAM.
Accomplishments that we're proud of
- Created, trained, and deployed an NLP model to analyze government contracts that outperformed ChatGPT on our blind evaluation dataset
- Developed a working AWS ecosystem that automatically scrapes data from the DOD contracts website, feeds it into our NLP model, and updates our frontend daily.
- Created a sleek UI to view insights into our NLP model.
Design Diagram
