HOME PAGE of PATRICK SCHONE (Last Update: 05/24/2008)
SPEECH AND HUMAN LANGUAGE TECHNOLOGIES
RESEARCH
The following are publications (including pre-prints):
Human Language Technology |
Paper Title |
Authors |
Subjects |
Reference |
Synopsis |
| 1996 |
A dictionary-based method for
determining topics in text and transcribed speech |
Schone, P., Nelson, D. |
Topic identification |
Proceedings of the
1996 IEEE International Conference on Acoustics, Speech, and Signal
Processing, Atlanta, GA. Vol. 1, pp. 295-298. |
We describe an algorithm which mines electronic dictionaries
and applies that information to the task of tagging incoming documents for
topics. Topics in this case are not some pre-described word set, but rather,
they are lists of words and generalizations about the content of the text. |
| 1997 |
Text Retrieval via
Semantic Forests |
Schone, P., Townsend, J., Crystal, T., Olano, C. |
Speech retrieval |
The 6th Text Retrieval Conference (TREC-6), Gaithersburg,
MD. NIST Special Publication 500-240, pp. 761-773 |
This paper describes using topical descriptions of
documents as the indexing terms for information retrieval. The topics were
generated using electronic dictionaries, whose taxonomic structure
can be exploited much like the structure of an ontology. The paper also describes
how the system was modified to tackle the challenge of known-item
retrieval, an SDR task in TREC-6. |
| 1998 |
Text Retrieval via
Semantic Forests: TREC7. |
Hendrickson, G., Schone, P., Crystal, T. |
Speech retrieval |
The 7th Text Retrieval Conference (TREC-7),
Gaithersburg, MD. NIST Special Publication 500-242, pp. 583-593 |
This paper shows some additional improvements over our
first attempt (TREC-6) at information retrieval. Our precision in the top 30 increased by
50% relative when semantic classes and pseudo-relevance feedback were used in addition to
topics alone. |
| 1999 |
Automatically generating a topic description for text
and searching and sorting text by topic using the same |
Nelson, D., Schone, P., Bates, R. |
Topic identification |
U.S.
PATENT 5,937,422 |
This is a patent on the above-mentioned (1996) topic algorithm. |
| 2000 |
Knowledge-Free
Induction of Morphology Using Latent Semantic Analysis |
Schone, P., Jurafsky, D. |
Morphology induction |
Conference on
Natural Language Learning 2000 (CoNLL-2000), Lisbon, Portugal, September 2000 |
In brief, we identify the morphological conflation sets for
each word in English and compare our results to those of the hand-developed
CELEX lexicon. As a first step, we identify plausible word suffixes by frequency
and then we compare pairs of potential morphological variants (for example,
"dog" to "dogs") using latent semantic analysis to see if the word pairs are
semantically similar. We use this and other distributional information to make a
judgment as to whether words are true conflations of each other. |
| 2001 |
Mandarin-English
Information (MEI): investigating translingual speech retrieval |
Meng, H., Chen, B., Khudanpur, S., Levow, G-A., Lo, W-K., Oard, D., Schone,
P., Tang, K., Wang, H-M., Wang, J. |
Speech retrieval |
Proceedings of the 2001 Human Language Technology (HLT) Conference, San
Diego, 2001 |
This paper describes the results of the JHU Summer Workshop
2000 group working on Mandarin-English Information Retrieval. The task was to
use English news text as query exemplars and to retrieve Broadcast News Mandarin
Chinese on the same topics. |
Multi-scale
audio indexing for translingual spoken document retrieval |
Hsin-min WANG, Helen MENG, Patrick SCHONE, Berlin CHEN and Wai-Kit LO |
Speech retrieval |
Proceedings
of the 26th International Conference on Acoustics, Speech, and Signal Processing
(ICASSP-2001), vol. 1, pp. 605-608, Salt Lake City, 2001 |
This paper describes a subcomponent of the JHU Summer
Workshop group working on Mandarin-English Information Retrieval. The particular
focus was to create multiscale Chinese word recognitions and searches thereon. |
Multi-scale
retrieval in MEI: an English-Chinese translingual speech retrieval system |
Wai-Kit LO, Patrick SCHONE and Helen MENG |
Speech retrieval |
Proceedings of the Seventh European Conference on Speech Communication and
Technology (EUROSPEECH), vol. 2, pp. 1303-1306, Aalborg, 2001 |
This paper describes a separate subcomponent of the JHU
Summer Workshop group working on Mandarin-English Information Retrieval. |
Is
Knowledge-Free Induction of Multiword Unit Dictionary Headwords A Solved
Problem? |
Schone, P., Jurafsky, D. |
Lemmatization, Multiword-Units |
Empirical Methods in Natural Language Processing,
Pittsburgh, PA, 2001 |
This paper explores the issue of identifying which n-grams
from a corpus of text have the appropriate properties of a multiword unit
dictionary headword. We first compare nine different collocation-finding
algorithms and test these using both WordNet and a large compendium of Internet
dictionaries. We then attempt to see whether Latent Semantic Analysis (LSA) can
help to isolate better multiword headwords. Multiword units should be either
non-compositional, non-substitutable, or non-modifiable, so we use LSA to
look for the first and the second of these. We are able to make some performance
gains using LSA to find non-substitutability (though LSA does not help find
non-compositionality). |
Knowledge-Free
Induction of Inflectional Morphologies |
Schone, P., Jurafsky, D. |
Morphology induction |
Proceedings of the North
American Chapter of the Association for Computational Linguistics
(NAACL-2001), Pittsburgh, PA, June 2001 |
This work is an extension of the CoNLL-2000 work presented in Lisbon, Portugal.
In particular, we look explicitly for circumfixes and prefix/affix combinations
to begin with rather than just suffixes. We then incorporate induced syntactic,
orthographic, and transitive properties to decrease the error over our original
algorithm by 25% relative. Additionally, we show performance on German and Dutch
using CELEX again as the gold standard. |
Language
Independent Induction of Part of Speech Class Labels Using only Language
Universals |
Schone, P., Jurafsky, D. |
Part of speech induction |
"Machine Learning: Beyond Supervision," Workshop at
IJCAI-2001, Seattle, WA., August 2001 |
This work looks at the question of whether it is possible to
induce part of speech labels for syntactic clusters. This approach starts with
no lexicon, no hand-marked corpus -- nothing of this kind. Instead, we assume we
start with some (perfect) syntactic clusters from which we extract a number of
features: openness, boundedness, numeracy, optionality, affixation, and cluster
ordering. We then appeal to Bayesian networks and to the body of work that
exists in linguistic typology and language universals to estimate which tag is
most befitting for each cluster. |
| 2002 |
Toward
Knowledge-Free Induction of Machine-Readable Dictionaries |
Schone, P. |
Dissertation |
University of Colorado at Boulder, December, 2001 (Copyright 2002). Advisors: Daniel S. Jurafsky
and James H. Martin |
The goal of this dissertation was
to see how far one could
go toward knowledge-free induction of an electronic dictionary. By
"knowledge-free," I mean that there is no human input ... only a corpus of text.
Where this is not possible, my requirement was that whatever limited human input
is used, it must at least be language independent. This work involved
multiword-unit induction, Chinese and phonetic segmentation, inflectional
morphologies, and parts of speech. Also, the dissertation describes what was
then the state of the art in automatic induction of word hierarchies. |
| 2003 |
Novel
Approaches to Arabic Speech Recognition: Report from the 2002 Johns-Hopkins
Workshop |
K. Kirchhoff, J. Bilmes, S. Das, N. Duta, M. Egan, G. Ji, F. He, J.
Henderson, D. Liu, M. Noamany, P. Schone, R. Schwartz and D. Vergyri |
Arabic speech-to-text |
Proceedings of the International Conference on Acoustics,
Speech and Signal Processing, Hong Kong, April 2003 |
This paper describes efforts of the Conversational Arabic
speech-to-text team at the JHU Summer Workshop of 2002. The big ideas here were
(1) the notion of trying to improve recognition using factored language models
which incorporate various syntactic components of the Arabic language, such as
morphology; and (2) trying to induce diacritization for Arabic and determine
whether or not this will aid in recognition of Arabic. |
Language-reconfigurable universal phone recognition |
Walker, B., Lackey, B., Muller, J., Schone, P. |
Phonetic recognition |
EUROSPEECH-2003, pp. 153-156, Geneva, Switzerland |
This paper describes a universal phone (phonetic) recognizer
for conversational telephone-quality speech. The recognizer automatically
reconfigures itself to apply the strongest language model in its inventory to
whatever language it is used on. We describe the system and performance
measurements for it using extensive testing material both from languages in its
training set as well as from a language it has never seen. The recognizer
produces near-equivalent performance on the two types of data, thus showing
its true universality, and represents a solution for processing conversational,
telephone-quality speech in any language, even low-resourced languages. |
| 2004 |
Mandarin English Information (MEI): Investigating translingual
speech retrieval |
Helen M. MENG, Berlin CHEN, Sanjeev KHUDANPUR, Gina-Anne LEVOW, Wai-Kit LO,
Douglas OARD, Patrick SCHONE, Karen TANG, Hsin-min WANG and Jianqiang WANG |
Speech retrieval |
Computer Speech and Language, vol. 18, iss. 2, pp.
163-179, Elsevier Press, Apr 2004 |
This paper is an expanded overview of the JHU 2000 Summer
Workshop group working on Mandarin-English Information Retrieval. The task was
to use English news text as query exemplars and to retrieve Broadcast News
Mandarin Chinese on the same topics. |
| 2005 |
Question
Answering with QACTIS at TREC 2004 |
Schone, P., Ciany, G., McNamee, P., Mayfield, J., Kulman, A., Bassi, T. |
Question answering |
The 13th Text Retrieval Conference
(TREC-13), Gaithersburg, MD. NIST Special Publication 500-261 |
This paper describes an approach to automatic
question answering of factoid, list, and definitional style questions using a
two-way strategy. A top-down approach uses induced attributed
object-relationship graphs which are mined to discover answers with graphical
properties akin to those of the question. It is top-down in that "one size fits
all": all questions are tackled in much the same way, regardless of whether
they are who, what, where, when, etc. questions. The second strategy is
bottom-up and focuses on only a few types of questions, but it handles them each
very well. It uses a cascade of filters to identify the answer word/phrase which
survives after a number of filtering processes are applied. Lastly, we use Web
validation to further reduce spurious answers, which is particularly beneficial
for list type questions. |
Searching Conversational
Telephone Speech in Any of the World's Languages |
Schone, P., McNamee, P., Morris, G., Ciany, G., Lewis, S. |
Speech retrieval |
International
Conference on Intelligence Analysis. McLean, VA. |
This effort uses rule-based transliteration in 90+ languages, universal phonetic
recognition, and speech-to-text processing to provide speech retrieval in potentially any language.
Results are provided in five languages, including one for which the system had virtually no prior training. |
| 2006 |
QACTIS-based Question Answering
at TREC 2005 |
Schone, P., Ciany, G., Cutts, R., McNamee, P., Mayfield, J., Smith, T. |
Question answering |
The 14th Text Retrieval Conference
(TREC-14), Gaithersburg, MD. NIST Special Publication 500-266 |
Significant advances were made to the base system by teaching the system
about word categorization. |
Low-Resource
Autodiacritization of Abjads for Speech Keyword Search |
Schone, P. |
Rule-based transliteration, speech retrieval |
Interspeech 2006 -- ICSLP,
Pittsburgh, PA, September 2006 |
This paper describes an effort to learn the vowelization of abjadic languages
(i.e., those whose writing generally does not include vowels) for a context-only-based, rule-based transliteration
system. Results are shown in five abjadic languages (Arabic, Farsi, Hebrew, Pashto, and Urdu) with as much as
31.2% relative improvement. |
| 2007 |
QACTIS Enhancements in TREC QA 2006 |
Schone, P., Ciany, G., Cutts, R., McNamee, P., Mayfield, J., Smith, T. |
Question Answering |
The 15th Text Retrieval Conference
(TREC-15), Gaithersburg, MD. NIST Special Publication 500-272 |
Additional improvements were made to the base system to further improve
results. In particular, the focus this year was on subcategorization of class types. For example, for a
question such as 'What team won the 1988 World Series?' the system would determine that 'team'
could refer to a sports team, a comedy team, or various other kinds of team, but 'World Series' is a reference to baseball, so
only a baseball team could satisfy the question. |
| 2008 |
Learning Named Entity Hyponyms for Question Answering |
P. McNamee, R. Snow, P. Schone and J. Mayfield |
Question answering, Hyponym induction |
Third International Joint Conference on Natural Language Processing,
Hyderabad, India, January 2008 |
Since hyponym dictionaries have a major effect on the performance of
question-answering systems, this paper shows how induction of hyponyms can be applied to
question answering and reports the results. |
Mining Wiki Resources for Multilingual Named Entity Recognition |
Richman, A., Schone, P. |
Content Extraction |
ACL-2008, Columbus, OH |
This paper provides a process whereby one can mine freely-accessible
Wikipedia pages as a mechanism of automatically obtaining content-tagged text in potentially any
language for the purpose of training a statistical content extractor. |