
    Tae Yano

    Graduate Research Assistant
    Language Technologies Institute
    Carnegie Mellon University



    About me:


    I am a Ph.D. candidate in the Language Technologies Institute of the School of Computer Science at Carnegie Mellon University. I research natural language processing (NLP), statistical text analysis, probabilistic topic modeling, and their interaction with social science, together with my advisors, Noah Smith and William Cohen. Much of our work falls into the area of text-driven prediction. In recent work, we have focused mostly on problems arising in American politics and government.

    Prior to Carnegie Mellon, I was at Columbia University, where I obtained an MS in Computer Science. There I worked with Becky Passonneau at Columbia's CCLS on the CLiMB Project. Before becoming a full-time graduate student, I was a software engineer, building system applications for large-scale document devices.


    Research Interests:


    Understanding a large volume of text is difficult for humans, which poses a unique challenge in the face of the recent flood of information. There is so much information available, yet we often do not know where to begin reading.

    I think statistical NLP is uniquely equipped to make a social impact in this context. Its fundamental pursuit is, in short, to understand linguistic phenomena and language artifacts (e.g., documents) by taking advantage of evidence in large quantities. We hope our research will be of both practical and scholarly importance.

    Some of our attempts are outlined in my dissertation proposal: Text as Actuator: Text-Driven Response Modeling and Prediction in Politics


    Refereed Publications:



    Tae Yano, Noah A. Smith, and John D. Wilkerson.
    Textual Predictors of Bill Survival in Congressional Committees
    In Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL 2012). Montreal, Quebec. July 2012.

    Jacob Eisenstein, Tae Yano, William W. Cohen, Noah A. Smith, and Eric P. Xing.
    Structured Databases of Named Entities from Bayesian Nonparametrics
    In Proceedings of the EMNLP Workshop on Unsupervised Learning in NLP. Edinburgh, UK. July 2011.

    Justin Cranshaw and Tae Yano.
    Seeing a Home away from the Home: Distilling proto-Neighborhood from Incidental Data with Topic Modeling
    In Proceedings of the Workshop on Computational Social Science and the Wisdom of Crowds, Annual Conference on Neural Information Processing Systems (NIPS). Vancouver, B.C., Canada. Dec 2010

    Tae Yano, Philip Resnik, and Noah A. Smith.
    Shedding (a Thousand Points of) Light on Biased Language
    In Proceedings of the NAACL-HLT Workshop on Creating Speech and Language Data With Mechanical Turk. Los Angeles, CA. June 2010

    Tae Yano and Noah A. Smith.
    What's Worthy of Comment? Content and Comment Volume in Political Blogs with Topic Models
    In Proceedings of the International AAAI Conference on Weblogs and Social Media 2010. Washington D.C. May, 2010

    Tae Yano, William Cohen, and Noah A. Smith.
    Predicting Response to Political Blog Posts with Topic Models
    In Proceedings of the North American Chapter of the Association for Computational Linguistics Human Language Technologies Conference (NAACL). Boulder, CO. May/June, 2009

    Rebecca Passonneau, Tom Lippincott, Tae Yano, Judith Klavans.
    Relation between Agreement Measures on Human Labeling and Machine Learning Performance: Results from an Art History Domain
    In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC). Marrakesh, Morocco. May/Jun, 2008

    Judith Klavans, Carolyn Sheffield, Eileen Abels, Joan Beaudoin, Laura Jenemann, Jimmy Lin, Tom Lippincott, Rebecca Passonneau, Tandeep Sidhu, Dagobert Soergel, and Tae Yano.
    Computational Linguistics for Metadata Building: Aggregating Text Processing Technologies for Enhanced Image Access
    In Proceedings of the LREC Workshop on Language Resources for Content-Based Image Retrieval (OntoImage 2008). Marrakesh, Morocco. May/Jun, 2008

    Rebecca Passonneau, Tae Yano, Tom Lippincott, Judith Klavans.
    Functional Semantic Categories for Art History Text: Human Labeling and Preliminary Machine Learning
    In Proceedings of the Workshop on Metadata Mining for Image Understanding, 3rd International Conference on Computer Vision Theory and Applications (VISAPP). Funchal, Madeira, Portugal. Jan, 2008


    Reports:


    Tae Yano
    KP: A knitting language
    Term project report, Programming Languages and Translators (COMS4115)
    Columbia University, New York, NY. Fall 2005

    Tae Yano and Moonyoung Kang
    Taking advantage of Wikipedia in Natural Language Processing
    Term project report, Language and Statistics II (11-762)
    Carnegie Mellon University, Pittsburgh, PA. Fall 2008


    Data:


    I have released some of the data I collected along the way. Please follow the terms of use, and cite our papers if you use the data in your own work.

    Political Blog Corpora
    Data from five American political blogs, collected from 2007 to 2008. README.txt

    Congressional Bill Corpus
    51,762 U.S. Congressional bills from the 103rd to 111th Congresses (1993 to 2010), each annotated with whether it survived (i.e., was recommended by) the Congressional committee process. README.txt
