Tae Yano

I am a Ph.D. candidate in the Language Technologies Institute of the School of Computer Science at Carnegie Mellon University. I research natural language processing (NLP), statistical text analysis, probabilistic topic modeling, and their interaction with social science with my advisors, Noah Smith and William Cohen. Much of our work falls into the area of text-driven prediction. Recently we have focused mostly on problems arising in American politics and government.

Prior to Carnegie Mellon, I was at Columbia University, where I obtained an MS in Computer Science. There I worked with Becky Passonneau at Columbia's CCLS on the CLiMB Project. Before becoming a full-time graduate student, I was a software engineer, building system applications for large-scale document devices.

Research Interests:

Understanding a large volume of text is difficult for humans, which poses a unique challenge in the face of the recent flood of information. There seems to be so much information, yet we seem not to know where to begin reading.

I think statistical NLP is uniquely equipped to make a social impact in this context. Its fundamental pursuit is, in short, to understand linguistic phenomena and language artifacts (e.g., documents) by taking advantage of evidence in large quantities. We hope our research will be of both practical and scholarly importance.

Some of our attempts are outlined in my dissertation proposal: Text as Actuator: Text-Driven Response Modeling and Prediction in Politics

Refereed Publications:

Textual Predictors of Bill Survival in Congressional Committees

In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL 2012)

Structured Databases of Named Entities from Bayesian Nonparametrics

In Proceedings of the EMNLP Workshop on Unsupervised Learning in NLP

Seeing a Home away from the Home: Distilling proto-Neighborhood from Incidental Data with Topic Modeling

In Proceedings of the Workshop on Computational Social Science and the Wisdom of Crowds, Annual Conference on Neural Information Processing Systems (NIPS).

Shedding (a Thousand Points of) Light on Biased Language

In Proceedings of the NAACL-HLT Workshop on Creating Speech and Language Data With Mechanical Turk.

What's Worthy of Comment? Content and Comment Volume in Political Blogs with Topic Models

In Proceedings of the International AAAI Conference on Weblogs and Social Media 2010.

Predicting Response to Political Blog Posts with Topic Models

In Proceedings of the North American Chapter of the Association for Computational Linguistics Human Language Technologies Conference (NAACL).

Relation between Agreement Measures on Human Labeling and Machine Learning Performance: Results from an Art History Domain

In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC).

Computational Linguistics for Metadata Building: Aggregating Text Processing Technologies for Enhanced Image Access

In Proceedings of the LREC Workshop on Language Resources for Content-Based Image Retrieval (OntoImage 2008).

Functional Semantic Categories for Art History Text: Human Labeling and Preliminary Machine Learning

In Proceedings of the Workshop on Metadata Mining for Image Understanding, 3rd International Conference on Computer Vision Theory and Applications (VISAPP).