Ph.D. Language and Information Technologies, School of Computer Science, Carnegie Mellon University, May 2009 "Using Articulatory Position Data to Improve Voice Transformation" Advisor: Alan W Black
M.S. Language Technologies, School of Computer Science, Carnegie Mellon University, May 2001
I have been working as a Research Scientist at Yap, Inc. since September 21, 2009. I received my Ph.D. on May 17, 2009.
I continued working with Dr. Tanja Schultz for one month (May 2009), but now from Pittsburgh. My task was to construct an on-line system that converts electromyographic data to speech. The surface electromyographic data we used was collected by attaching probes to a person's face to measure the activation potentials of certain muscles used during speech. Because this data can also be collected while a person pantomimes speech, we were investigating its use for silent speech interfaces, which would convert such data into audible speech. The goal of the on-line system was to serve as a demonstration and proof of concept of a silent speech interface based on certain machine learning and signal processing techniques.
From February through April 2009, I worked with Dr. Tanja Schultz in the Cognitive Systems Lab at the University of Karlsruhe. I worked with her group to apply voice transformation techniques to synthesize speech from electromyographic data that they had collected and previously used for speech recognition experiments. This work led to two paper submissions to Interspeech 2009. During this time, Tanja and I also continued our collaboration with Dr. Alan W Black and Dr. Qin Jin. I constructed human listening evaluations on various types of de-identified speech to determine how difficult it was for listeners to identify speakers whose identities we had attempted to obscure. This work was combined with earlier work of ours and became part of another paper we submitted to Interspeech 2009 and of an article we submitted to IEEE Transactions on Audio, Speech, and Language Processing.
From 2005 until January 2009, I worked with Dr. Alan W Black on the TRANSFORM project. My primary work was on using articulatory position data, specifically the MOCHA database, to improve voice transformation. We also investigated and implemented harmonic plus noise and harmonic/stochastic models for speech signals. During our last year and a half, we collaborated with Dr. Qin Jin and Dr. Tanja Schultz, pitting our voice transformation systems against their speaker identification systems. We investigated security issues, such as whether voice transformation could be used to fool speaker identification systems, and privacy issues, such as whether voice transformation could be used to obscure the identity of speech presented to speaker identification systems.
From September 2002 until 2005, I worked with Dr. Alan W Black on the Storyteller project. I worked primarily on the automatic detection of prosodic boundaries in speech, especially in multi-sentence recordings longer than those typically used for constructing concatenative speech synthesizers.
Stefanie Tomko, Thomas K. Harris, Arthur Toth, James Sanders, Alexander Rudnicky, Roni Rosenfeld. Towards Efficient Human Machine Speech Communication: The Speech Graffiti Project. ACM Transactions on Speech and Language Processing. Vol. 2, No. 1. February 2005.