Project Story: Automating Government Contract Analysis with Custom NLP Model
About the Project
Our project came about from the need to efficiently analyze a large volume of government contracts from the Department of Defense (DOD) website. We aimed to streamline this process by building and training our own Natural Language Processing (NLP) model, leveraging the robust infrastructure of the AWS ecosystem and newer ML models such as BERT.
Inspiration
The inspiration for this project arose from recognizing the time-consuming nature of manually parsing through large blocks of text, whether long articles or government contracts. We saw an opportunity to harness the power of NLP and machine learning to automate this process, allowing for quicker and more accurate analysis of contract data.
What We Learned
Throughout the development journey, we gained valuable insights into various AWS services and machine learning techniques. We deepened our understanding of AWS Lambda and EventBridge for scheduling tasks, AWS SageMaker and API Gateway for model training and deployment, and Python and React TypeScript for the backend and frontend work.
One of the key lessons we learned was the importance of data preprocessing and model optimization. We experimented with different NLP architectures and fine-tuning strategies to ensure the accuracy and efficiency of our model while still being robust and lightweight.
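To make the preprocessing point concrete, here is a minimal sketch of the kind of cleaning a contract announcement might go through before tokenization. The specific rules, the character budget, and the function name are illustrative assumptions, not our exact pipeline.

```python
import re

# Rough character budget so the cleaned text fits a BERT-style 512-token limit.
# This number is an assumption for the sketch.
MAX_CHARS = 2000

def preprocess_contract(text: str) -> str:
    """Normalize whitespace, drop an award-date parenthetical, and truncate."""
    text = re.sub(r"\s+", " ", text).strip()       # collapse runs of whitespace
    text = re.sub(r"\(Awarded [^)]*\)", "", text)  # drop "(Awarded ...)" boilerplate
    return text[:MAX_CHARS].strip()

sample = ("Acme Corp,  Arlington, Virginia, was awarded a $12,000,000 "
          "contract.\n(Awarded June 1, 2023)")
print(preprocess_contract(sample))
# Acme Corp, Arlington, Virginia, was awarded a $12,000,000 contract.
```

Keeping steps like these as small pure functions made it easy to iterate on cleaning rules while fine-tuning.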
Building the Project
Our project was built around React TypeScript on the frontend, with Python and AWS processing and managing the data served to it. The AWS ecosystem centered on AWS Lambda and EventBridge, which triggered a scheduled function to scrape the latest contracts from the DOD website. These contracts were then fed into our custom NLP model, deployed on AWS SageMaker and exposed via API Gateway.
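The scheduled Lambda step can be sketched roughly as below. The API Gateway URL, request shape, and handler structure are assumptions for illustration; only `build_request` is exercised here, since the network call needs a live endpoint.

```python
import json
import urllib.request

# Hypothetical API Gateway URL fronting the SageMaker endpoint; not a real address.
API_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod/analyze"

def build_request(contracts: list[str]) -> bytes:
    """Package scraped contract texts as a JSON body for the model endpoint."""
    return json.dumps({"instances": [{"text": t} for t in contracts]}).encode("utf-8")

def invoke_endpoint(contracts: list[str]) -> dict:
    """POST the batch to the model behind API Gateway and return its JSON reply."""
    req = urllib.request.Request(
        API_URL,
        data=build_request(contracts),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # network call; needs the endpoint live
        return json.load(resp)

def lambda_handler(event, context):
    """Entry point EventBridge invokes on a schedule: scrape, then analyze."""
    contracts = event.get("contracts", [])     # scraping step omitted for brevity
    return invoke_endpoint(contracts)
```

In practice the scraping logic ran inside the handler before the inference call; it is elided here to keep the sketch short.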
Once analyzed, the contract data was stored in two DynamoDB tables, providing a scalable and reliable storage solution. This data was then seamlessly accessed and presented on our frontend interface, providing users with valuable insights into government contract details.
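As a sketch of the storage step, one analyzed contract might be shaped into a DynamoDB item like this. The attribute names and key schema are assumptions; the `Decimal` conversion, however, is a real boto3 requirement for numeric attributes.

```python
from datetime import date
from decimal import Decimal

def to_contract_item(contract_id: str, agency: str,
                     amount: float, summary: str) -> dict:
    """Shape one analyzed contract as a DynamoDB item (illustrative schema)."""
    return {
        "contract_id": contract_id,              # partition key
        "scraped_on": date.today().isoformat(),  # sort key: daily scrape date
        "agency": agency,
        "amount_usd": Decimal(str(amount)),      # boto3 rejects float; use Decimal
        "summary": summary,
    }
```

With boto3 the write is then a single call, e.g. `table.put_item(Item=to_contract_item(...))`, and the frontend reads the same items back through queries on the scrape date.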
Challenges Faced
One of the main challenges we faced was deploying our NLP model into our AWS ecosystem, as there was simply too much data to process locally. We struggled for a long time with uploading our model to SageMaker and creating an endpoint for it.
Additionally, given the complexity of some of the contracts, building the test and blind datasets used to evaluate our model against current LLMs required a lot of manual parsing and inspection.
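The blind-set comparison can be sketched as scoring each system's extracted fields against our hand-labeled answers. The field names and the per-field accuracy metric here are illustrative assumptions about the evaluation, not its exact form.

```python
# Fields a system is expected to extract from each contract (illustrative).
FIELDS = ("contractor", "amount", "agency")

def field_accuracy(predictions: list[dict], labels: list[dict]) -> float:
    """Fraction of (contract, field) pairs a system extracted correctly."""
    total = correct = 0
    for pred, gold in zip(predictions, labels):
        for field in FIELDS:
            total += 1
            correct += pred.get(field) == gold.get(field)
    return correct / total if total else 0.0
```

Running the same scorer over our model's output and an LLM's output on the blind set gives directly comparable numbers.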
Lastly, integrating and synchronizing the numerous AWS services required features we had never used in depth, such as IAM.
Accomplishments that we're proud of
- Created, trained, and deployed an NLP model to analyze government contracts that outperformed ChatGPT on our blind evaluation dataset
- Developed a working AWS ecosystem that automatically scrapes data from the DOD contracts website, feeds it into our NLP model, and updates our frontend daily.
- Created a sleek UI to view insights into our NLP model.
Design Diagram
