SHORT NOTE

AUTOMATED CREOLE ORTHOGRAPHY CONVERSION

Marilyn P. Mason
Mason Integrated Technologies, Boston

Introduction

In 1991, I developed a flexible, semi-automated process[1] for converting texts written in earlier Haitian Creole orthographies to IPN (Institut Pédagogique National), the legal standard established by the Orthography Law of 1979 (see Dejan, 1985; Dejean, 1980; Vernet, 1980).

Why flexible? Although the 1979 Law established a core of fixed rules to which most Haitian Creole writers adhere, usage is much less uniform on the more peripheral issues of apostrophes, hyphens, contractions, punctuation, capitalization, proper names, and nasalization[2][3] (Allen & Hogan, 1998). In the absence of complete orthographical uniformity, there will probably never be a "one-size-fits-all" tool for orthographical conversion. Flexibility, in the form of "modification-of-command" functionality that adapts the conversion to a given publisher's requirements, is essential; it is somewhat analogous to the "training" functionality built into commercial OCR software (Brown, 1998; Decrozant & Voss, 1998).
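
The "modification-of-command" idea can be pictured as a shared base rule list that each publisher adjusts by switching rules off or adding house rules. The Python sketch below is a hypothetical illustration, not the MMHCOC software: the rule ids, the `PublisherProfile` class, the `effective_rules` and `convert` functions, and both base rules (dropping the acute accent on é, and writing a contraction such as m'ap as m ap) are assumptions chosen only to show the mechanism.

```python
from dataclasses import dataclass, field

# Hypothetical base rules as (rule_id, find, replace). They are placeholders
# that illustrate the mechanism, not a verified orthography table.
BASE_RULES = [
    ("acute_e", "é", "e"),              # assumed: é -> e
    ("apostrophe_to_space", "'", " "),  # assumed: m'ap -> m ap (hits every apostrophe)
]


@dataclass
class PublisherProfile:
    """Hypothetical per-publisher adjustments to the shared base rules."""
    disabled: set = field(default_factory=set)  # rule ids this publisher skips
    extra: list = field(default_factory=list)   # house rules, applied last


def effective_rules(profile):
    """Base rules minus the disabled ones, followed by the publisher's extras."""
    kept = [rule for rule in BASE_RULES if rule[0] not in profile.disabled]
    return kept + list(profile.extra)


def convert(text, profile):
    """Apply each rule in order, like successive find-and-replace passes."""
    for _rule_id, find, replace in effective_rules(profile):
        text = text.replace(find, replace)
    return text


default = PublisherProfile()
keeps_apostrophes = PublisherProfile(disabled={"apostrophe_to_space"})
print(convert("m'ap ale lékòl", default))            # m ap ale lekòl
print(convert("m'ap ale lékòl", keeps_apostrophes))  # m'ap ale lekòl
```

Because the rules run in sequence, their order matters: each rule sees the output of the rules before it, just as with repeated find-and-replace passes in a word processor.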

Background

As a typesetter of Haitian Creole texts since 1976,[4] I have focused on the practical aspects of abiding by orthographic principles, to help guarantee consistency of output for my publisher. In pre-computer days, my job was to act computer-like and, as consistently as humanly possible, conform texts to the declared standards. I intuitively identified the modules within words that had to be mixed and matched in each conversion. It all happened very naturally.

In 1988, as an Administrative Secretary in the Chemistry Department at the Massachusetts Institute of Technology, I was finally introduced to computers. Research papers had to be typed, and I had to work with my computer to make it produce the end products that my research-scientist supervisors required.

Once I understood how to work in partnership with a computer to typeset and speed edit chemistry texts, I began to apply the same techniques in my spare time to typesetting and speed editing Haitian Creole texts.

To experiment, however, I needed a large volume of Haitian Creole (HC) texts in my computer. I therefore spent 7 1/2 years typing the entire HC Bible.

Why that book? In 1989, what was to become one of the most widely read books in the Haitian Creole language[5] was published in an outdated orthography, 10 years after a newer standard had been legally established! From day one, it was a prime candidate for orthographical updating, but because the original text had been set using "old technology," it had to be digitized before it could be manipulated and edited.

Why did I type it instead of scanning it? Because the text was printed on both sides of "see-through" India paper, the scanner tried to read both sides of the page at once! Photocopying the pages first did not altogether eliminate the background interference, and the tiny superscripted verse numbers, which to this day are not handled well by Optical Character Recognition (OCR) software, created havoc 10 years ago! So, starting in 1989, I typed it!

Why did I type it in its original orthography, Pressoir-Faublas, when I knew that IPN was the new standard? Common sense. Not having been trained as a linguist, I knew I lacked the expertise to stray from the established starting point. I therefore had to be willing to do all that typing but essentially leave the text alone! My task, it seemed to me, was to deliver an accurate, digitized version of the original to those better qualified to bring it into line with the new legal standard.

I was content to proceed in this manner until I was nearing the midway point. I had typeset 29 of the 66 Bible books when I became increasingly tormented by doubt. Am I painting myself into a corner? What if nobody can unscramble the orthographical-update eggs? Should I quit? Should I blindly continue? I did neither!

Process Experimentation

Using copies of my digitized HC texts, I began to experiment with a very low-tech, intuitive "character matching" approach: speed editing text away from older orthographies toward IPN with the standard editing tools of off-the-shelf word processing software.
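
One pitfall of character matching with successive find-and-replace passes is that a later rule can rewrite text an earlier rule has just produced, and a short pattern can pre-empt a longer one. The Python sketch below shows one possible way to avoid this; it is not the author's method. It normalizes accented characters to a single Unicode encoding, then replaces all patterns in one left-to-right pass, trying longer patterns first. Only the in → en rule is suggested by the source (Ayisyin in the 1989 Bible title versus ayisyen in IPN spelling); the other two rules, and the names `RULES`, `build_pattern`, and `convert`, are hypothetical and serve only to exercise the mechanism.

```python
import re
import unicodedata

# Conversion table: older spelling -> target spelling.
# "in" -> "en" mirrors Ayisyin (1989 title) vs. ayisyen (IPN spelling);
# the other two entries are hypothetical, for illustration only.
RULES = {
    "é": "e",
    "in": "en",
    "inn": "in",  # hypothetical: must take priority over "in"
}


def build_pattern(rules):
    """Regex alternation with longer keys first, so 'inn' wins over 'in'."""
    keys = sorted(rules, key=len, reverse=True)
    return re.compile("|".join(re.escape(key) for key in keys))


def convert(text, rules=RULES):
    # Compose e.g. 'e' + U+0301 into a single 'é' so OCR or word-processor
    # output with decomposed accents still matches the (NFC) table keys.
    text = unicodedata.normalize("NFC", text)
    pattern = build_pattern(rules)
    # Single pass: each matched span is replaced once and never re-matched,
    # so one rule's output cannot feed into another rule.
    return pattern.sub(lambda match: rules[match.group(0)], text)


print(convert("Ayisyin"))  # Ayisyen
print(convert("inn"))      # in  (successive passes doing "in" first would give "enn")
```

A real conversion table would also need rules that look at word boundaries or neighboring letters, since many spelling changes depend on context; this sketch deliberately ignores that.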

It worked! I could convert Pressoir-Faublas text to IPN, IPN text to McConnell, and hybrid texts to other hybrids. Over the years the process matured, from semi-automated (a 250-page book converted in 2 hours) to automated, requiring less than 2 1/2 minutes to convert the same 250-page book!

At this point I realized that by resolving my own personal problem, I had also gone directly to the heart of a major difficulty facing all publishers of Haitian Creole literature: how to keep their work relevant in spite of orthographical shift.

If orthographies could be made interconvertible in a computer environment, without retyping text that had already been typeset, then the name of the game would be to get as many older works as possible into a computer to make the magic happen.

In June 1991, after considerable experimentation, I devised the prototype computer program (Mason, 1991a). By the end of that month, I had expanded the research to include a Haitian Creole text that I had never typed (Mason, 1991b). By scanning with OCR software, I brought printed texts of varying age and print quality, produced by others, into my word processing software. I was then able to use the Mason Method of Haitian Creole Orthography Conversion (MMHCOC) to convert the outdated orthographies of samples from Boukan, Jé Nou Louvri, and Chanmòt la to IPN without retyping any text.

I then improved upon the model, achieving breakthroughs in 1992 and 1993 by extending the methodology beyond the Macintosh environment to MS-DOS and Windows, and beyond Haitian Creole to languages written in the Cyrillic alphabet. In 1994, the semi-automated process was reduced to "one mouse click on a menu item" (Mason, 1994).

This proprietary methodology and software were demonstrated and test-marketed in Haiti in May 1996, and the response from government Haitian Creole literacy specialists, educators, writers, editors, and publishers was unanimous: When can we have it? Nothing else like this exists! This will revolutionize the publishing industry in Haiti! (see Mason, 1997)

Clearly, a mechanism was needed to foster further research and development and to deliver such tools broadly in Haiti, the Diaspora, and other nations (early indications are that this methodology can be applied to other language groups).[6] In 1996, Mason Integrated Technologies Ltd. was formed to enable publishers, writers, educators, and governmental and non-governmental agencies in developing nations to standardize printed materials quickly and efficiently.

Potential Application to Other Creoles and Pidgins

In March 1998, the first formal presentation of these possibilities was made to the creolist community at the Fourth International Creole Language Workshop in Miami, Florida (see Mason, 1998). That talk whetted the appetites of more than Haitians and Haitian Creole specialists. One by one, researchers working on other creoles and pidgins approached me with the same question: Can the tools and methodologies developed for the orthographical realities of Haitian Creole be adapted for use with other creoles?

I see no reason why not, and a growing number of creolists tend to agree.[7] As a result, some of us are moving forward to examine more formally whether we can modify my tools and methodologies to develop appropriate scripts to be added to the toolbars of their word processing programs.

Wouldn't it be nice if work done in the past in your creole or pidgin, regardless of orthographical system, could be redeemed and efficiently incorporated into an ever widening body of orthographically consistent work? (see Mason, 1999a, 1999b, 1999c)

Acknowledgments

I would like to thank creolists Yves Dejean, Pauris Jn-Baptiste, Michel DeGraff, Jeffrey Allen, Emmanuel Védrine, Tometro Hopkins, Dany Adone, Loretto Todd, Suzanne Romaine, Vincent Cooper, and Glenn Gilbert for their encouragement.

References
Allen, J., & Hogan, C. (1998, January). Evaluating Haitian Creole orthographies from a non-literacy-based perspective. Paper presented at the SPCL Meeting, New York City.
Bernard, J. (1980). Ki jan nou ekri kreyòl ayisyen [Reprint of a communiqué on Haitian Creole's official orthography]. Etudes Créoles, 3(1), 101-106.
Brown, R. (1998, October). Improving embedded machine translation with user interaction. Paper presented at the Workshop on Embedded MT Systems, in conjunction with the AMTA 1998 Conference, Langhorne, PA.
Decrozant, L., & Voss, C. (1998, August). Cross-linguistic resources for MT evaluation and language training. Paper presented at the Natural Language Processing and Industrial Applications Conference (NLP+IA-98), Moncton, NB, Canada.
Dejan, I. [Dejean, Y.] (1985). Ann aprann òtograf kreyòl la. Pòtoprens [Port-au-Prince], Haïti: Sekreteri d Eta pou Alfabetizasyon.
Dejean, Y. (1980). Comment écrire le créole d'Haïti [Abridged and revised Ph.D. Thesis, Indiana University, 1977]. Outremont, Quebec: Collectif Paroles.
Mason, M. (1991a). Novel method for orthography conversion in Haitian Creole. Unpublished manuscript.
Mason, M. (1991b). Optical character recognition (OCR) technology widens impact of Mason Method of Haitian Creole Orthography Conversion (MMHCOC). Unpublished manuscript.
Mason, M. (1994). Story behind CCMMHCOC. Unpublished manuscript.
Mason, M. (1997, February). The backgrounder: President's report to the officers of Mason Integrated Technologies Ltd. Boston: Mason Integrated Technologies.
Mason, M. (1998, March). Automated approach to Haitian Creole orthography conversion. Paper presented at the Fourth International Creole Language Workshop on Standardizing the Orthography, Vocabulary, and Structure, Florida International University, Miami.
Mason, M. (1999a, June). Automated approach to Haitian Creole orthography conversion: Can this methodology be adapted to other creoles? Paper presented at the Creole Orthography Workshop, 9e Colloque International des Etudes Creoles, Aix-en-Provence, France.
Mason, M. (1999b, October). Orthography standardization tools: Preparing creole languages for the new millennium (included a computer demonstration). Paper presented at the Seychelles 1999 Creole Symposium, Mahé, Seychelles.
Mason, M. (1999c, October). Kreol + computers + internet = a bright future for Kreol! Paper presented at the 14th Annual Creole Festival, Mahé, Seychelles.
Vernet, P. (1980). Techniques d'écriture du créole haïtien. Port-au-Prince, Haïti: Imprimerie Le Natal.

Footnotes

1. Mason Method of Haitian Creole Orthography Conversion (MMHCOC).

2. Refer to Dejan (1985), Dejean (1980), and Vernet (1980).

3. Michel DeGraff, personal communication, March 1999.

4. Assistant to Rév. Alain Rocourt, Imprimerie Méthodiste, Église Méthodiste d'Haïti, Frères-Pétionville, Haïti, 1976-1977. Typeset his Haitian Creole Sunday School curriculum materials while he was Director of the United Methodist Haitian Mission, Miami, FL, 1989-1991.

5. BIB LA an Ayisyin, Société Biblique Haïtienne, Port-au-Prince, 1989.

6. Dr. Borys Bilokur, a retired professor of Slavic languages at UConn, and I are collaborating on adapting the technology to Ukrainian and Russian.

7. Personal conversations and communications with Dany Adone, Tometro Hopkins, Derek Bickerton, Suzanne Romaine, and Jeffrey Allen, 1998-1999.

ISSN 0920-9034 © John Benjamins Publishing Company