Inspiration

Legal contracts are designed to be confusing and arcane. Last month we saw people in Texas freeze while the rates in their power contracts skyrocketed. Providing legal literacy in an easily consumable way can save money and lives.

What it does

When given a legal agreement as a PDF or text file, Legal-ease summarizes and simplifies it into a shorter, more readable document. It also annotates the summary with definitions and plain-language versions of complex phrases.

How we built it

Legal-ease combines word statistics, NLP heuristics, and a neural network to summarize a document in three phases. Phase one calculates a value for each sentence, using three main heuristics:

  • Word occurrence. The more often a word occurs, the more weight it is given. A word gains added value for each standard deviation its count sits above the mean number of occurrences.
  • Natural language processing (NLP). We try to detect "legal jargon" such as sentences with legal conditions and constraints. We also give a sentence additional value when it talks about payments, dates, or definitions, among other key phrases.
  • Recurrent neural network (RNN). Finally, we use an RNN to help classify edge cases that the previous two methods couldn't detect. Our RNN is trained to label sentences as ambiguous, predatory, important, or unimportant. These classifications give the sentence additional value.
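
A minimal sketch of the first two heuristics, under stated assumptions: the stop-word set, the key-phrase patterns, the one-point-per-standard-deviation weighting, and the function names are all illustrative and not taken from the project. The RNN step is left out.

```python
import re
import statistics
from collections import Counter

# Assumed for illustration: a tiny stop-word set and a few key-phrase patterns.
STOP_WORDS = {"the", "a", "an", "is", "are", "may", "of", "to", "and"}
KEY_PATTERNS = [r"\bpay(ment|able)?s?\b", r"\bdue\b", r"\bmeans\b", r"\bshall\b", r"\d{4}"]


def split_sentences(text):
    # Naive splitter: break after ., !, or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_bonus(counts):
    # One point per whole standard deviation a word's count sits above the mean.
    values = list(counts.values())
    if not values:
        return {}
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return {word: 0 for word in counts}
    return {word: max(0, int((n - mean) / stdev)) for word, n in counts.items()}


def score_sentences(text):
    """Return (sentence, score) pairs in document order."""
    sentences = split_sentences(text)
    tokenized = [
        [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOP_WORDS]
        for s in sentences
    ]
    bonus = word_bonus(Counter(w for words in tokenized for w in words))
    scored = []
    for sentence, words in zip(sentences, tokenized):
        score = sum(bonus[w] for w in set(words))  # word-occurrence heuristic
        score += sum(1 for p in KEY_PATTERNS if re.search(p, sentence, re.IGNORECASE))
        scored.append((sentence, score))
    return scored
```

Calling `score_sentences(contract_text)` yields one score per sentence, which is the input the selection phase needs.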

Phase two selects the sentences. We treat this as a modified version of the knapsack problem: pick the subset of sentences with the highest total value that stays under a predetermined word limit or percentage of the original document. If the exact algorithm would take too long to run, we fall back to an approximation algorithm.
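
The selection step can be sketched as a standard 0/1 knapsack, with sentence word counts as weights and phase-one scores as values. This is an illustration, not the project's modified version: the `max_cells` cutoff and the greedy value-per-word fallback are assumptions standing in for the unspecified "too long to run" test and approximation algorithm.

```python
def exact(weights, values, limit):
    # best[i][w]: best total value using the first i sentences within w words.
    n = len(weights)
    best = [[0] * (limit + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        wt, val = weights[i - 1], values[i - 1]
        for w in range(limit + 1):
            best[i][w] = best[i - 1][w]
            if wt <= w and best[i - 1][w - wt] + val > best[i][w]:
                best[i][w] = best[i - 1][w - wt] + val
    # Walk the table backwards to recover which sentences were taken.
    chosen, w = [], limit
    for i in range(n, 0, -1):
        if best[i][w] != best[i - 1][w]:
            chosen.append(i - 1)
            w -= weights[i - 1]
    return chosen


def greedy(weights, values, limit):
    # Assumed fallback: take sentences by value per word while they still fit.
    order = sorted(
        range(len(weights)),
        key=lambda i: values[i] / max(weights[i], 1),
        reverse=True,
    )
    chosen, used = [], 0
    for i in order:
        if values[i] > 0 and used + weights[i] <= limit:
            chosen.append(i)
            used += weights[i]
    return chosen


def select_sentences(scored, word_limit, max_cells=2_000_000):
    """scored: list of (sentence, value) pairs. Returns sentences in document order."""
    weights = [len(sentence.split()) for sentence, _ in scored]
    values = [value for _, value in scored]
    # Use the exact table only when it is small enough to fill quickly.
    if len(scored) * (word_limit + 1) > max_cells:
        chosen = greedy(weights, values, word_limit)
    else:
        chosen = exact(weights, values, word_limit)
    return [scored[i][0] for i in sorted(chosen)]
```

A percentage target can be handled by converting it to a word limit first, for example `word_limit = int(0.2 * total_words)`.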

The last phase is passing the summarized document to a legal-to-English dictionary, to clarify common legal terms.
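
As a rough sketch of this last phase, a glossary lookup can append a plain-English note after each legal term it finds. The glossary entries, the bracketed annotation style, and the `annotate` function are hypothetical; the project's actual dictionary and output format are not described here.

```python
import re

# Hypothetical glossary; keys must be lowercase.
GLOSSARY = {
    "indemnify": "cover someone else's losses",
    "lessee": "tenant",
    "lessor": "landlord",
    "force majeure": "events outside anyone's control",
}


def annotate(text, glossary=GLOSSARY):
    """Append a bracketed plain-English note after each glossary term."""
    if not glossary:
        return text
    # Longest terms first so multi-word terms are matched before their parts.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(
        lambda m: f"{m.group(0)} [{glossary[m.group(0).lower()]}]", text
    )
```

For example, `annotate("The Lessee shall indemnify the Lessor.")` keeps the original wording and adds the definitions inline.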

Challenges we ran into

Our biggest challenge was training and using the RNN. Because of our limited time, we collected only around 160 labeled ground-truth examples, and we didn't have the computing power or time to train the RNN as much as we originally wanted to. However, after many attempts, we were able to achieve reasonable accuracy with the RNN.

Accomplishments that we're proud of

Our biggest accomplishment was how well Legal-ease works. In general, we were extremely proud of the summaries it created. Obviously, it isn't perfect, but we think this is a powerful proof of concept.

We ran a lease agreement and the Amazon Prime TOS through Legal-ease. Both summaries appeared to contain almost every sentence we thought was important, including sentences that seemed predatory and sentences with important information.

What we learned

We learned a lot: different NLP techniques, TensorFlow models, activation functions, and even how to better understand "legal jargon."

We also learned that we're glad we're computer programmers and not lawyers.
