Inspiration
I spent the last 4 years teaching English and composition in the South Bronx and at Rikers Island Penitentiary. I quickly became aware of the pervasive, often overlooked influence that language has. We are judged by the first words we say or write; something as innocuous as a typo can get a resume discarded and cost us a job interview.
I have also been championing the idea that not all bad students are bad in the same way. Some are difficult to motivate and fail because of that. However, there is a subset who are genuinely studious and responsible but who fail because of outside factors. This is especially visible among ESL writers, who may fail to produce grammatically correct English sentences because they are still following the syntax and grammar of their primary language.
It is important for both educators and students to make this distinction. Misattributing a student's academic needs can stall their progress or even discourage them enough that they drop out. Sadly, I've witnessed this too many times. That is why my team and I developed Shibboleth: to help make this distinction in linguistic contexts.
How it works
We pored through several research papers on detecting ESL writing with computational linguistics, selected a handful of methodologies to emulate, and condensed them into a set of Python functions. Our web and mobile interfaces let educators upload student essays en masse for analysis. We maintain a small but well-curated list of detection algorithms. What sets us apart is that we do not try to identify every grammar error. We target only the errors that arise from conflicts between a writer's primary and secondary languages, which we call Interference errors.
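The triage step described above, keeping only the errors attributable to L1/L2 conflict, can be sketched as a small aggregation over flags collected from many uploaded essays. This is a hypothetical illustration, not Shibboleth's implementation. It assumes some upstream checker has already produced `(essay_id, category)` pairs. The category names, `KOREAN_L1_INTERFERENCE`, `EssayReport`, and `triage` are all invented for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

# Hypothetical set of error categories attributed to Korean L1 interference.
# These are illustrative names only, not Shibboleth's actual tracked errors.
KOREAN_L1_INTERFERENCE = {"missing_article", "missing_plural", "subject_omission"}


@dataclass
class EssayReport:
    interference: Counter  # counts per L1-interference category
    general: Counter       # counts for every other flagged category


def triage(flags: Iterable[Tuple[str, str]],
           interference_types: Set[str]) -> Dict[str, EssayReport]:
    """Group (essay_id, category) flags per essay and split them into
    L1-interference errors versus general errors."""
    reports: Dict[str, EssayReport] = {}
    for essay_id, category in flags:
        # Create an empty report the first time an essay id is seen.
        report = reports.setdefault(essay_id, EssayReport(Counter(), Counter()))
        bucket = report.interference if category in interference_types else report.general
        bucket[category] += 1
    return reports


if __name__ == "__main__":
    flags = [
        ("essay1", "missing_article"),
        ("essay1", "spelling"),
        ("essay1", "missing_article"),
        ("essay2", "missing_plural"),
    ]
    for essay_id, report in triage(flags, KOREAN_L1_INTERFERENCE).items():
        print(essay_id, dict(report.interference), dict(report.general))
```

Passing the interference set in as a parameter, instead of hard-coding it, keeps the same triage usable if other L1 libraries are added later.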
In consultation with Jimmy, who is bilingual in Korean and English, we developed a Korean Primary, English Secondary (Korean L1, English L2) detection library. Its rules target the specific grammar, usage, and syntax conflicts between these two languages, which lets us distinguish Interference errors from general errors.
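To make the idea concrete, here is a minimal sketch of what a rule-based Korean L1 detector might look like. It is not Shibboleth's code, and the rules shown are not claimed to be its tracked errors. Both rules are hypothetical examples of well-known Korean-English interference:

- **Omitted articles:** Korean has no articles.
- **Missing plural marking after numerals:** Korean does not require it.

The tiny noun lexicon and the names `detect_interferences`, `missing_article`, and `missing_plural` are also invented. A real library would rely on a part-of-speech tagger rather than a hand-written word list.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical mini-lexicon of singular countable nouns (illustration only).
SINGULAR_COUNT_NOUNS = {"book", "car", "house", "dog", "computer", "teacher", "student"}
NUMBER_WORDS = {"one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"}
# Words that can license a singular countable noun.
DETERMINERS = {"a", "an", "the", "this", "that", "my", "your", "his", "her",
               "our", "their", "its", "each", "every", "another"}


@dataclass
class Interference:
    rule: str   # name of the detector that fired
    index: int  # token position of the flagged noun
    token: str


def tokenize(text: str) -> List[str]:
    # Lowercase, then keep words (with apostrophes) and digit runs.
    return re.findall(r"[a-z']+|\d+", text.lower())


def is_plural_numeral(tok: str) -> bool:
    if tok.isdigit():
        return int(tok) > 1
    return tok in NUMBER_WORDS and tok != "one"


def missing_article(tokens: List[str]) -> List[Interference]:
    """Flag a singular countable noun with no determiner or numeral
    in the two tokens before it (a deliberately crude window)."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok not in SINGULAR_COUNT_NOUNS:
            continue
        window = tokens[max(0, i - 2):i]
        if not any(w in DETERMINERS or w in NUMBER_WORDS or w.isdigit() for w in window):
            hits.append(Interference("missing_article", i, tok))
    return hits


def missing_plural(tokens: List[str]) -> List[Interference]:
    """Flag a singular noun directly after a numeral greater than one ("three book")."""
    return [Interference("missing_plural", i, tokens[i])
            for i in range(1, len(tokens))
            if tokens[i] in SINGULAR_COUNT_NOUNS and is_plural_numeral(tokens[i - 1])]


# Each rule is a plain function, so adding a rule means appending to this list.
DETECTORS: List[Callable[[List[str]], List[Interference]]] = [missing_article, missing_plural]


def detect_interferences(text: str) -> List[Interference]:
    tokens = tokenize(text)
    results: List[Interference] = []
    for detector in DETECTORS:
        results.extend(detector(tokens))
    return results


if __name__ == "__main__":
    for hit in detect_interferences("He is teacher and I bought three book."):
        print(hit)
```

On "He is teacher and I bought three book." this flags "teacher" as `missing_article` and "book" as `missing_plural`. The two-token look-back is intentionally simple: a phrase like "the big old house" would be a false positive.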
Challenges I ran into
Many of the methodologies in the research we drew on are resource-intensive. They require large sample sizes, intricately crafted grammars, and numerous calls to protected APIs. We lacked access to many of these resources and could replicate only a small handful of the methods.
Our access to fluent speakers of other languages and to ESL testing data was also limited. We initially tried to develop an Indonesian L1 detection library but quickly abandoned it, because no one on the team was fluent in Indonesian. Jimmy's fluency in Korean made Korean our only viable choice.
We gathered half a dozen ESL essays from native Korean speakers, but a larger sample would have let us detect more interference errors. The error patterns would have been more prominent and easier to identify.
Accomplishments that I'm proud of
Our team specialized incredibly well. Each member took the task he was most qualified to handle, and we worked in parallel. There was some overlap; Jimmy and Noah proofread and revised much of the copy together. However, no one was ever left waiting on someone else, and we never stepped on each other's toes.
There was also no need for a central leader. We all had equal input into, and control over, the project's progress. We collaborated more smoothly than most teams I've worked on.
What I learned
Goal-driven education is amazingly powerful. Two of our teammates had no experience with most of the languages and frameworks we used. However, because we had a clear, definite goal, they worked tirelessly to master them (namely Bootstrap and HTML/CSS). Their end product is as good as anything the rest of us could have produced.
Another teammate was fairly inexperienced with Android development, but he was still the most experienced of the 4 of us, so he led the mobile side of the project. His end product is impressive and polished.
What's next for Shibboleth
The power of our service comes from its accuracy and its scope. Currently our accuracy is limited because we track only 4 known Interference errors. With more time and resources, we could expand that to 400 tracked errors.
Our scope is also very limited: we work only with Korean ESL writers. In the future, we plan to develop a series of lambda detection algorithms for Spanish ESL writers, German ESL writers, and even Italian speakers learning French as a second language.
Improving these two aspects would have an immediate, measurable impact on our service.