I am a Ph.D. candidate in the Language Technologies Institute of the School of Computer Science at Carnegie Mellon University.
With my advisors, Noah Smith and William Cohen, I research natural language processing (NLP), statistical text analysis, probabilistic topic modeling,
and their interaction with social science.
Much of our work falls into the area of text-driven prediction.
Our recent work focuses mostly on problems arising in American politics and government.
Understanding a large volume of text is difficult for humans, which poses a unique challenge in the face of the recent flood of information.
There is so much information available, yet we often do not know where to begin reading.
I think statistical NLP is uniquely equipped to make a social impact in this context.
Its fundamental pursuit is, in short, to understand linguistic phenomena and language artifacts (e.g., documents) by taking advantage of evidence in large quantities.
We hope our research will have both practical and scholarly importance in this context.
Tae Yano, William Cohen, and Noah A. Smith.
Predicting Response to Political Blog Posts with Topic Models. In Proceedings of the North American Chapter of the Association for Computational Linguistics Human Language Technologies Conference (NAACL). Boulder, CO. May/June 2009.
Congressional Bill Corpus: 51,762 U.S. Congressional bills from the 103rd to 111th Congresses (1993 to 2010), each annotated with whether it survived the Congressional committee process (i.e., was recommended by the committee).
README.txt