Ph.D. Language and Information Technologies, School of Computer Science, Carnegie Mellon University, May 2009 "Using Articulatory Position Data to Improve Voice Transformation" Advisor: Alan W Black
M.S. Language Technologies, School of Computer Science, Carnegie Mellon University, May 2001
I have been working as a Research Scientist at Yap, Inc. since September 21, 2009. I received my Ph.D. on May 17, 2009.
I continued working with Dr. Tanja Schultz for one month (May 2009), but now from Pittsburgh. My task was to construct an on-line system that converts electromyographic data to speech. The surface electromyographic data we used was collected by attaching probes to a person's face to measure the activation potentials of certain muscles used during speech. Because this data can also be collected while a person pantomimes speech, we were investigating its use for silent speech interfaces, which would convert such data into audible speech. The goal of the on-line system was to serve as a demonstration and proof of concept of a silent speech interface based on certain machine learning and signal processing techniques.
From February through April 2009, I worked with Dr. Tanja Schultz in the Cognitive Systems Lab at the University of Karlsruhe. I worked with her group to apply voice transformation techniques to synthesize speech from electromyographic data that they had collected and previously used for speech recognition experiments. This work led to two paper submissions to Interspeech 2009. During this time, Tanja and I also continued our collaboration with Dr. Alan W Black and Dr. Qin Jin. I constructed human listening evaluations on various types of de-identified speech to determine how difficult it was for listeners to identify speakers whose identities we had attempted to obscure. This work was combined with earlier work of ours and became part of another paper we submitted to Interspeech 2009 and of an article we submitted to IEEE Transactions on Audio, Speech, and Language Processing.
From 2005 until January 2009, I worked with Dr. Alan W Black on the TRANSFORM project. My primary work was on using articulatory position data, specifically the MOCHA database, to improve voice transformation. We also investigated and implemented harmonic plus noise and harmonic/stochastic models for speech signals. During our last year and a half, we collaborated with Dr. Qin Jin and Dr. Tanja Schultz, pitting our voice transformation systems against their speaker identification systems. We investigated security issues, such as whether voice transformation could be used to fool speaker identification systems, and privacy issues, such as whether voice transformation could be used to obscure the identity of speech presented to speaker identification systems.
From September 2002 until 2005, I worked with Dr. Alan W Black on the Storyteller project. I worked primarily on the automatic detection of prosodic boundaries in speech, especially in multi-sentence recordings longer than those typically used for constructing concatenative speech synthesizers.
Stefanie Tomko, Thomas K. Harris, Arthur Toth, James Sanders, Alexander Rudnicky, Roni Rosenfeld. Towards Efficient Human Machine Speech Communication: The Speech Graffiti Project. ACM Transactions on Speech and Language Processing. Vol. 2, No. 1. February 2005.